Dataconomy

OpenAI’s new voice AI can apologize like it actually means it

Jeff Harris, a member of OpenAI's product staff, stated that the objective is to allow developers to customize both the voice experience and context

by Kerem Gülen
March 21, 2025
in Artificial Intelligence, News

According to TechCrunch, OpenAI is bringing new transcription and voice-generating AI models to its API, which the company claims improve on its previous releases. The launch fits OpenAI’s broader aim of building automated systems that can independently carry out tasks on behalf of users.

The new text-to-speech model, “gpt-4o-mini-tts,” delivers more nuanced, realistic-sounding speech and, according to OpenAI, is more “steerable” than its earlier speech-synthesis models. Developers can tell gpt-4o-mini-tts in plain language how to say things, for example “speak like a mad scientist,” or ask it to adopt the serene tone of a mindfulness teacher.

Jeff Harris, a member of OpenAI’s product staff, said the goal is to let developers customize both the voice experience and its context. “In different contexts, you don’t just want a flat, monotonous voice,” he explained. In a customer support scenario where the voice needs to apologize for a mistake, for instance, developers can have it convey that emotion in its delivery. Harris emphasized that developers and users should have substantial control over both what is spoken and how it is spoken.
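For developers, that steering is expressed as a plain-language instruction sent alongside the text to be spoken. The sketch below builds a request for the customer-support apology scenario Harris describes. It assumes the endpoint and field names (`/v1/audio/speech` with `model`, `voice`, `input`, `instructions`) from OpenAI's public Audio API reference at the time of writing; the voice name `coral`, the helper names `build_speech_request` and `synthesize`, and the sample text are illustrative choices, not details from the article. Check OpenAI's current documentation before relying on it.

```python
import json
import urllib.request

# Assumption: endpoint taken from OpenAI's public Audio API reference; verify it is current.
SPEECH_URL = "https://api.openai.com/v1/audio/speech"


def build_speech_request(text: str, instructions: str, voice: str = "coral") -> dict:
    """Build a JSON body asking gpt-4o-mini-tts to speak `text` in a given style.

    `instructions` carries the tone/delivery steering; "coral" is a hypothetical
    default voice choice for this sketch.
    """
    return {
        "model": "gpt-4o-mini-tts",
        "voice": voice,
        "input": text,
        "instructions": instructions,
        "response_format": "mp3",
    }


def synthesize(payload: dict, api_key: str) -> bytes:
    """POST the payload and return raw audio bytes (needs network access and a valid key)."""
    request = urllib.request.Request(
        SPEECH_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return response.read()


if __name__ == "__main__":
    # Hypothetical customer-support apology, steered toward a sincere delivery.
    payload = build_speech_request(
        text="I'm so sorry your order arrived late. Let me fix that for you right away.",
        instructions="Sound sincerely apologetic: warm, calm, and unhurried.",
    )
    print(json.dumps(payload, indent=2))
```

Running the file only prints the JSON body; calling `synthesize(payload, api_key)` would actually send it, which requires network access and an OpenAI API key. Keeping the payload builder separate from the HTTP call makes it easy to inspect or swap the tone instruction (a "mad scientist" or mindfulness-teacher voice, say) without touching the network code.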



The new speech-to-text models, “gpt-4o-transcribe” and “gpt-4o-mini-transcribe,” replace OpenAI’s earlier Whisper transcription model. OpenAI says they were trained on diverse, high-quality audio datasets and are designed to capture varied speech more reliably, even in noisy environments.

According to Harris, the new models also produce far fewer inaccuracies. Whisper was known to hallucinate in its transcriptions, inserting fabricated words and other content that was never spoken. “These models are much improved versus Whisper on that front,” Harris said, adding that accurate speech recognition is essential to a reliable voice experience.


Transcription accuracy still varies by language, however. OpenAI’s internal benchmarks show that gpt-4o-transcribe, the more accurate of the two transcription models, has a “word error rate” approaching 30% for Indic and Dravidian languages such as Tamil, Telugu, Malayalam, and Kannada. Put simply, for every ten words in a human transcription of these languages, roughly three may be wrong, missing, or extra in the model’s output.
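To make the 30% figure concrete, word error rate is conventionally computed as WER = (S + D + I) / N: the number of substituted, deleted, and inserted words needed to turn a reference transcript into the model's output, divided by the number of words N in the reference. The sketch below implements that textbook definition with a word-level edit distance. It is not necessarily OpenAI's exact benchmark procedure; the article does not say how text was normalized before scoring, and the function name and sample sentences are hypothetical.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return (S + D + I) / N using word-level Levenshtein distance.

    Assumption: words are split on whitespace with no further normalization
    (no lowercasing or punctuation stripping).
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # dist[i][j] = minimum edits turning the first i reference words
    # into the first j hypothesis words.
    dist = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dist[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        dist[0][j] = j  # insert all j hypothesis words

    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            mismatch = 0 if ref[i - 1] == hyp[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j] + 1,             # deletion
                dist[i][j - 1] + 1,             # insertion
                dist[i - 1][j - 1] + mismatch,  # substitution (or match)
            )
    return dist[len(ref)][len(hyp)] / len(ref)


if __name__ == "__main__":
    # Hypothetical example: a 10-word human reference and a model output
    # with three substituted words (fox->box, the->a, dog->log).
    reference = "the quick brown fox jumps over the lazy sleeping dog"
    hypothesis = "the quick brown box jumps over a lazy sleeping log"
    print(word_error_rate(reference, hypothesis))  # → 0.3
```

Three substitutions against a ten-word reference give a WER of 0.3, the level reported for Tamil, Telugu, Malayalam, and Kannada. Because insertions also count as errors, WER is not capped at 100%.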

In a departure from past practice, OpenAI is not releasing the new transcription models under an open-source license. Previous versions of Whisper were made available for commercial use under an MIT license. Harris said gpt-4o-transcribe and gpt-4o-mini-transcribe are much larger than Whisper, which makes them impractical to run locally on users’ devices. “[They’re] not the kind of model that you can just run locally on your laptop, like Whisper,” he noted.

Harris added that when OpenAI does release open-source models, it wants to do so thoughtfully, with models that are carefully honed for a specific need.


Featured image credit: Zac Wolff/Unsplash


COPYRIGHT © DATACONOMY MEDIA GMBH, ALL RIGHTS RESERVED.
