2024-03-30 02:30:48
Reliably cloning a human voice from a sample of just 15 seconds: that is what the latest artificial intelligence tool from OpenAI achieves. OpenAI is the firm that dazzled the world with ChatGPT, its generative AI language program.
“Today we are sharing information and preliminary results from a small-scale preview of a model called Voice Engine, which uses text and a single 15-second audio sample to generate natural speech that closely resembles the original speaker. It is remarkable that a small model with a single 15-second sample can create emotive and realistic voices,” the firm led by Sam Altman said in a statement.
All the user has to do is provide that sample. Once Voice Engine has it, the user can have it read aloud any text they provide, with the timbre and tone of that voice. The text doesn’t even have to be in the same language: a Spanish speaker can provide a sample in Spanish and then ask the program to read a text in English, Chinese, or another language in their own voice.
It can also be used directly for audio translation. In that case, Voice Engine preserves the speaker’s native accent: for example, generating English from an audio sample of a French speaker produces French-accented speech.
Restricted use
For now, the company has opted for a small-scale test rather than the widespread access it offered with ChatGPT, because it is aware of the risk of identity theft: with the tool, a 15-second recording of someone is enough to reproduce their voice.
“We are taking a cautious and informed approach to a broader release due to the potential for misuse of synthetic voices,” OpenAI says. “We hope to start a dialogue on the responsible deployment of synthetic voices and how society can adapt to these new capabilities. Based on these conversations and the results of these small-scale tests, we will make a more informed decision about whether and how to deploy this technology at scale,” it adds.
OpenAI believes that, before opening up access to the new tool more broadly, decisions must be made on several fronts. For example, it calls for phasing out voice-based authentication as a security measure for accessing bank accounts and other sensitive information, since it would no longer be secure.
It also considers it necessary to explore policies that protect the use of individuals’ voices in artificial intelligence. The risk of manipulation and misinformation is especially acute in the case of public figures, including politicians.
It therefore also calls for educating the public about the capabilities and limitations of AI technologies, including the possibility of deceptive AI-generated content.
Another proposal it puts forward is to accelerate the development and adoption of techniques for tracing the origin of audiovisual content, so that it is always clear whether you are interacting with a real person or with an AI.
“It’s important that people around the world understand where this technology is headed, whether we ultimately deploy it widely ourselves or not. We look forward to continuing to engage in conversations regarding the challenges and opportunities of synthetic voices with policy makers, researchers, developers and creatives,” concludes OpenAI.
Over its run of innovations, OpenAI has launched tools not only for language but also for image and video generation. Last month it presented Sora, a revolutionary video tool that needs only a text prompt to create a short video with the requested content and style.
You can follow EL PAÍS Tecnología on Facebook and X or sign up to receive our weekly newsletter.