OpenAI announced a new AI-based audio cloning tool called Voice Engine on Friday. While the company is obviously proud of the potential of this technology—touting how it could be used to provide reading assistance for kids and give a voice to those who’ve lost theirs—OpenAI is clearly very nervous about how this could be abused. And with good reason.

“OpenAI is committed to developing safe and broadly beneficial AI,” the company said in a statement on Friday, making its concerns clear in the very first sentence.

Voice Engine essentially uses the same tech that’s behind OpenAI’s text-to-speech API and ChatGPT Voice, but this application of the tech is all about cloning a specific voice rather than reading something aloud in a stranger’s tone and inflection. OpenAI notes that its tech is exceptional in that it needs just a 15-second sample to “create emotive and realistic voices.”

“Today we are sharing preliminary insights and results from a small-scale preview of a model called Voice Engine, which uses text input and a single 15-second audio sample to generate natural-sounding speech that closely resembles the original speaker,” the company wrote.

It’s not clear what kind of training data was used to build Voice Engine, a sore spot for AI companies that have been accused of violating copyright laws by training their models on protected works. Companies like OpenAI argue their training methods count as “fair use” under U.S. copyright law, but a number of rights holders have sued, complaining they weren’t compensated for their work.

OpenAI’s website has example audio clips that have been fed through Voice Engine and they’re pretty damn impressive. The ability to change the language someone is speaking is also very cool. But you can’t try it out for yourself just yet.

There are already a number of voice cloning tools available, like ElevenLabs, and translators, like Respeecher. But OpenAI has become a behemoth since it first launched ChatGPT publicly in late 2022. And as soon as it makes Voice Engine a publicly available product (there’s no word on a release date yet), it could open the floodgates for all kinds of new abuses we’ve never even dreamed of.

OpenAI’s statement on Friday noted, “We are taking a cautious and informed approach to a broader release due to the potential for synthetic voice misuse,” emphasizing the worries every major company now faces with this kind of AI tech.

One particularly worrying example of someone using AI voice cloning for nefarious purposes happened earlier this year using President Joe Biden’s voice. Steve Kramer, who worked for longshot Democratic presidential candidate Dean Phillips, cloned Biden’s voice to create a message that said people shouldn’t bother to vote in the New Hampshire primary. Kramer used the ElevenLabs AI voice tool and made it in “less than 30 minutes,” sending the robocall message out to about 5,000 people, according to the Washington Post.

“We hope to start a dialogue on the responsible deployment of synthetic voices, and how society can adapt to these new capabilities,” OpenAI’s statement said. “Based on these conversations and the results of these small scale tests, we will make a more informed decision about whether and how to deploy this technology at scale.”

That, of course, is the double-edged sword of all new technology. Scam artists will always find a way to exploit emerging tools to bilk people out of their hard-earned cash. But you don’t need to use fake AI-generated voices to scam people. As we reported earlier this week, the latest crypto scam uses real actors hired on Fiverr to read a script that helps sell their scam as authentic.
