OpenAI’s Voice Engine has been introduced as a new text-to-speech technology, capable of generating a synthetic voice from just a 15-second audio sample of an individual’s voice. This innovative tool can vocalize text prompts as requested, either in the original language of the recorded voice or in various other languages.
“These small scale deployments are helping to inform our approach, safeguards, and thinking about how Voice Engine could be used for good across various industries,” OpenAI stated in its blog post.
Among the organizations granted early access are Age of Learning, a company specializing in educational technology; HeyGen, a platform for visual storytelling; Dimagi, a developer of healthcare software for field workers; Livox, which produces an AI-powered communication application; and Lifespan, a healthcare network.
How good is Voice Engine by OpenAI?
Below is one reference audio clip along with three samples generated by Voice Engine, each accompanied by its transcript. You can judge the quality of OpenAI’s Voice Engine for yourself from these examples, although a definitive assessment cannot be made until the feature is widely available to end users.
- The input audio.
- Salt also makes sure we stay hydrated which means there is enough water in our body for it to properly function.
- Let’s make the parts the same by adding one to three!
- Some of the most amazing habitats on Earth are found in the rainforest. A rainforest is a place with a lot of precipitation and it has many kinds of animals trees and other plants. Tropical rainforests are usually not too far from the equator and are warm all year.
OpenAI first developed its Voice Engine technology in late 2022 and has used it to power the preset voices in its text-to-speech API as well as the Read Aloud feature in ChatGPT. OpenAI’s product team has said that the technology was refined using both licensed data and publicly accessible data. OpenAI has indicated that the technology will initially be accessible to roughly 10 developers.
The field of AI-driven text-to-audio conversion is advancing rapidly. Most development so far has focused on instrumental or environmental sounds, while synthetic voice generation has seen less activity, a situation OpenAI attributes to the ethical concerns involved. Companies active in this area include Podcastle and ElevenLabs.
OpenAI has confirmed that its collaborators have agreed to its usage policies, which prohibit using voice generation to impersonate individuals or organizations without consent. These agreements also require collaborators to obtain clear and voluntary consent from the people whose voices are used, to not let individual users create their own voices, and to inform listeners that the voices are AI-generated. To keep its audio outputs traceable, OpenAI has built watermarking into the generated clips and monitors how they are used.
OpenAI proposed a series of measures aimed at mitigating potential risks associated with technologies of this nature. These include transitioning away from the use of voice-based verification for banking access, implementing regulations to safeguard individuals’ voice data in AI applications, enhancing public awareness about AI-generated deepfakes, and creating mechanisms for the monitoring of AI-generated content.
“We recognize that generating speech that resembles people’s voices has serious risks, which are especially top of mind in an election year. We are engaging with U.S. and international partners from across government, media, entertainment, education, civil society and beyond to ensure we are incorporating their feedback as we build,” OpenAI said.
Use cases for OpenAI’s Voice Engine feature
OpenAI presents the following use cases as viable examples of Voice Engine’s application, while emphasizing that its potential uses are limited only by one’s imagination:
- Educational assistance: Voice Engine can be used to provide reading help to non-readers and children by creating natural and emotive voices. This aids in generating pre-scripted voice-over content and real-time, personalized interactions with students, thereby broadening the range of accessible educational content.
- Content translation: This technology can enable the translation of videos and podcasts, allowing creators and businesses to reach a global audience in their own voices. It maintains the original speaker’s native accent across languages, thus preserving the authenticity of the translated content.
- Service delivery in remote communities: Voice Engine might improve essential service delivery by giving community health workers interactive feedback in their primary languages. This supports skill development in essential services, such as maternal health counseling, in the languages and dialects specific to remote communities.
- Support for non-verbal individuals: The technology powers devices that assist non-verbal people in communicating. Users can choose voices that represent them accurately across multiple languages, making communication more personal and less robotic.
- Voice recovery for patients with speech impairments: It offers a solution for individuals suffering from speech impairments due to sudden or degenerative conditions. By requiring only a short audio sample, Voice Engine can recreate the patient’s voice, helping them regain their ability to communicate in their natural voice.
Featured image credit: Kerem Gülen/Midjourney