In late 2017, researchers at Google and UC Berkeley developed a new artificial intelligence technology that can generate realistic voices synthesized entirely by a computer.
This technology is now in the wild and is being used to clone celebrities’ voices. After the AI model has been trained on hundreds of hours of speeches by Mark Zuckerberg, Donald Trump, or Boris Johnson, for instance, anyone can simply type a text and let the system read it in that voice. The results are already impressive and will get much better over time.
What can really be done with the tech?
Developers are already using the tech and building applications on top of it. For instance, at the site Vo.codes, you can test it with an array of personalities.
To compound the potential of a fake voice, it is possible to combine it with a video deepfake. Below, you can watch Dr. Phil recite the lyrics from the rap Milkshake, by Kelis, in a 100% fabricated clip.
Why should I care about this tech?
As with any technology ever created, there are potentially good and bad use cases. On the positive side, it may be used to emulate voice actors at a very low cost, so videogames, books, and movies can license, for instance, the iconic voices of Samuel L. Jackson or Sir David Attenborough. We could also bring back legendary voices from the past, such as those of Carl Sagan or June Foray.
The downside of this tech, though, is that it might make the problem of disinformation and fake news even worse. Imagine the damage it could do to politics, international relations, or businesses if any leader’s voice is faked and he or she is "caught" saying things that were never said. The tech has the potential to initiate wars, manipulate the stock market, or blackmail leaders. It is more important than ever to double-check any information you see on WhatsApp, Facebook, TikTok, or Instagram.
This tech also opens doors for the development of AIs able to identify fake audio, which is a huge area of research now opening up.
What do you think this tech could be used for?