The Future of Voice: For Better or Worse, AI Takes Center Stage

Artist's depiction of AI-generated sound (Pete Linforth, TheDigitalArtist)

Generative Artificial Intelligence (AI) involves using AI algorithms to create output from training data. Alongside text, photos, and video, this output can also be in the form of speech/sound. With the interest in AI products increasing worldwide, it is no surprise that innovations like AI voices are appearing in the public limelight as well. For example, a San Francisco-based voice AI startup recently raised $8 million in funding.

Worldwide interest in ‘Artificial Intelligence’ over time, with the Y-axis representing search interest relative to the highest point on the chart. Interest in the topic most recently peaked in June 2023. (Data source: Google Trends).

AI voices have the potential to transform the current landscape for many industries. In the entertainment sector, AI-generated voices can be applied to provide voice-overs for audiobooks, animated movies, and video games. Businesses can streamline customer support operations by using AI voices to engage with customers. The technology can also improve accessibility, with AI voices being utilized in text-to-speech systems to convert text into spoken words for visually impaired individuals. Of course, AI is also responsible for the voices (and thoughts) of digital assistants such as Amazon’s Alexa, Microsoft’s Cortana, Apple’s Siri, and Google Assistant. Although AI-generated voices haven’t completely replaced humans in many areas, they are certainly changing the game.

With any rapidly developing technology, there are always some growing pains, and AI voices are no exception. In his blog, Microsoft co-founder Bill Gates shared his concerns about how “Deepfakes and misinformation generated by AI could undermine elections and democracy.” He discussed how fake, AI-generated audio and video could potentially be used to tilt an election. In fact, an estimated 500,000 video and voice deepfakes will be shared globally on social media sites in 2023.

The fears about malicious use of AI voice are not unfounded. A report by the Identity Theft Resource Center (ITRC) found that 61% of scams resulting in identity theft are conducted through AI voice. AI software has also been used in fake kidnapping scams, where a clone of the alleged victim’s voice is used to demand a ransom from their family.

There are also worries about the ethical implications of using AI voice. The lack of consent when using someone’s voice to train AI-generated voices is a central point of contention. There have already been several cases where voice actors have found their voices being used without permission and in inappropriate contexts. How would you react if you discovered that your voice was being exploited to endorse products to your friends and family? The automation of human jobs through AI is also a subject of heated debate.

Products and services that use AI are not new. However, they are being adopted at an unprecedented rate and scale. AI voices present an amazing opportunity for innovation and exploration. At the same time, there is a large potential for misuse and abuse. Irrespective of the world’s preparedness, Pandora’s box has already been opened. Our only option now is to respond as effectively as possible, implementing guardrails and limitations to minimize harm wherever feasible.

Article by: Terry Li
