As AI progresses, the line between robots and humans is narrowing. AI challenges us in countless areas and surpasses our ability to complete countless tasks.
And today, companies want us to talk to them through AI: their so-called voice assistants.
As if talking to a robot has become normal!
Recent years have seen an explosion in so-called conversational AI. The problem is that some current systems are still unstable and don’t exactly spark the desire for conversation.
Conversational agents have a poor reputation. They have only an average sense of humor, problems with understanding humans, and slow execution. The list of troubles is long. Designers have a lot of work ahead of them.
People want a real-time response
Imagine talking to a friend who has to think every time you ask a simple question such as “Are you okay?” If this happens, maybe you should look for other friends. I’m kidding, but that’s what sometimes happens today when you pose a question to a voice assistant.
It turns out that instantaneity and big data don’t mix. Very often, when data processing is involved, the system needs time to do it. And that’s something you feel with today’s voice assistants. Even the ones from huge companies like Google or Amazon, which handle it well, show that the demand for instant answers sometimes affects their quality.
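To make the trade-off concrete, here is a minimal sketch of a latency-budget pattern: the assistant starts a slow, higher-quality pipeline, but if it misses a response deadline it falls back to a quick, shallower answer. The function names and timings are invented for illustration; this is not how any particular assistant is built.

```python
import asyncio
import random

# Hypothetical answer sources: a fast shallow lookup and a slower,
# higher-quality pipeline. Names and timings are made up for illustration.
async def quick_answer(query: str) -> str:
    await asyncio.sleep(0.1)                       # near-instant, but shallow
    return f"Quick guess for: {query!r}"

async def deep_answer(query: str) -> str:
    await asyncio.sleep(random.uniform(0.5, 2.0))  # heavy data processing
    return f"Well-researched answer for: {query!r}"

async def respond(query: str, deadline: float = 0.8) -> str:
    """Return the deep answer if it arrives before the deadline,
    otherwise fall back to the quick one so the user isn't kept waiting."""
    try:
        return await asyncio.wait_for(deep_answer(query), timeout=deadline)
    except asyncio.TimeoutError:
        # Quality is sacrificed to stay responsive.
        return await quick_answer(query)

if __name__ == "__main__":
    print(asyncio.run(respond("Are you okay?")))
```

The point of the sketch is simply that a hard deadline forces a choice: either the user waits, or the system answers with whatever it has ready.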
The voice problem
We don’t talk about this problem enough! Synthetic voices are horrible! There have been big improvements in recent years, but we still haven’t found a truly satisfying result.
On this point, the breakthrough will almost certainly come from a Canadian start-up you may have heard of. Lyrebird (now owned by Descript) has developed a machine learning model that can accurately mimic a person’s voice. The most impressive thing is that only a few seconds of recording are needed for the cloning.
I confess that their system scares me. However, it will eliminate the problem of synthetic voices. The risk is that it will be co-opted by bad actors to spread rumors and false information. This is the danger inherent in technology: it allows us to do interesting things and raises ethical questions at the same time. This is why, more than ever, we must cultivate our critical thinking. It’s no longer possible in 2021 to believe what we see or hear without questioning it.
AI must remember the context of the conversation
Today’s AIs forget quickly. A few years ago, voice assistants were programmed to process requests one after the other without putting them in any particular context. If you wanted to ask your assistant a new question, it had already forgotten the first one.
Fortunately, today’s systems are more efficient. They’re equipped with a memory that allows you to hold a conversation consisting of several messages. This is often only short-term memory, but it’s sufficient in many cases. Systems like Google Home or Siri may also include longer-term memories.
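As a rough picture of what “short-term memory” means here, the sketch below keeps only the last few exchanges in a sliding window and hands them back as context for the next turn. The class name and window size are assumptions for illustration, not any vendor’s API.

```python
from collections import deque

class ShortTermMemory:
    """Minimal sketch of multi-turn context: keep only the last few
    exchanges, which is roughly what short-term memory means here."""

    def __init__(self, max_turns: int = 5):
        self.turns = deque(maxlen=max_turns)   # old turns fall off automatically

    def remember(self, user_msg: str, assistant_msg: str) -> None:
        self.turns.append((user_msg, assistant_msg))

    def context(self) -> str:
        # The context handed to the model is just the recent turns, flattened.
        return "\n".join(f"User: {u}\nAssistant: {a}" for u, a in self.turns)

memory = ShortTermMemory(max_turns=3)
memory.remember("Who directed Alien?", "Ridley Scott.")
memory.remember("When was it released?", "In 1979.")
# A follow-up like "And the sequel?" can only be resolved because the
# previous turns are still inside the window.
print(memory.context())
```

Once a turn falls out of the window, the assistant has genuinely “forgotten” it, which is why these memories feel short.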
Still, all these problems make conversational AIs boring. They don’t convert customers for businesses, they don’t make people want to talk, and worse, they can sometimes make people angry. I’ve disabled the AI on my phone. I still can’t chat with a robot, and it’s not something I’m considering. Maybe I will when more interesting solutions are proposed.
And yet, I’m convinced that this decade will be the decade of conversational AI. It’s the most obvious application of AI: voice is the most natural way for humans to interact. This is why these systems will keep becoming more widespread.
The challenges I’ve listed above will soon be overcome. But there’s an essential component to take into account: talking is an art! And, like any art, it’s difficult to master.
Rhetorical questions, jokes, irony: these are things that AIs don’t understand. We all have a friend who has trouble understanding jokes. Tell him a story about a fictional character and he’ll ask, “Who is that?” Don’t laugh. It’s not funny, it’s a scourge!
With AI, it’s the same thing. Models designed today are content to study sentences or paragraphs literally. And if the AI makes a joke, it’s only because the designer programmed it in.
Emotion, carried in the prosody of the voice, the intonation, all those things that are not part of the text but that give life to your subject and make it more meaningful, is an essential aspect of speech. Today’s AIs can detect a person’s emotions more or less accurately, but designing an AI that can adapt its voice to the emotion it wants to simulate is still a bit like science fiction.
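To give an idea of what “detecting emotion in prosody” actually looks at, here is a small sketch using the librosa library to pull out pitch and energy statistics from a recording. The feature set, the threshold, and the file name are all assumptions for illustration; real emotion recognizers use far richer features and a trained classifier.

```python
import numpy as np
import librosa

def prosody_features(path: str) -> dict:
    """Extract a few prosodic cues (pitch and energy) from a recording.
    A toy sketch, not a production emotion recognizer."""
    y, sr = librosa.load(path, sr=16000)
    f0, voiced_flag, _ = librosa.pyin(
        y, fmin=librosa.note_to_hz("C2"), fmax=librosa.note_to_hz("C7"), sr=sr
    )
    energy = librosa.feature.rms(y=y)[0]
    return {
        "pitch_mean": float(np.nanmean(f0)),   # average fundamental frequency
        "pitch_var": float(np.nanstd(f0)),     # how much the intonation moves
        "energy_mean": float(energy.mean()),   # overall loudness
    }

def rough_arousal(features: dict) -> str:
    # Wide pitch movement loosely correlates with an animated delivery;
    # the 30 Hz threshold is arbitrary, purely for illustration.
    return "animated" if features["pitch_var"] > 30.0 else "flat"

# features = prosody_features("utterance.wav")   # hypothetical file
# print(rough_arousal(features))
```

Even this crude analysis shows that the signal is there; the hard part, as noted above, is going the other way and generating a voice that carries the intended emotion.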
Speech is more than a succession of words. It’s what shapes thought. It makes you convincing, makes you stand out, and makes your words interesting. But if you want to talk with an AI, put away your figures of speech.
Language is an art that cannot emerge from a succession of conditions integrated into an assembly of lines of code. It is something superior that the machine will have difficulty mastering. That’s why for now, nobody wants to talk to an AI.