Dataconomy

OpenAI’s new voice AI can apologize like it actually means it

Jeff Harris, a member of OpenAI's product staff, stated that the objective is to allow developers to customize both the voice experience and context

by Kerem Gülen
March 21, 2025
in Artificial Intelligence, News

According to TechCrunch, OpenAI is launching upgraded transcription and voice-generating AI models in its API, which the company claims improve on its previous releases. The launch fits OpenAI’s broader aim of building automated systems that can perform tasks for users independently.

The new text-to-speech model, “gpt-4o-mini-tts,” produces more nuanced and realistic-sounding speech and is described as more “steerable” than earlier speech-synthesis models. Developers can instruct gpt-4o-mini-tts to adjust its delivery to the context, for example with a prompt such as “speak like a mad scientist” or by asking for a serene tone like that of a mindfulness teacher.

Jeff Harris, a member of OpenAI’s product staff, stated that the objective is to allow developers to customize both the voice experience and context. “In different contexts, you don’t just want a flat, monotonous voice,” he explained. For instance, in a customer support scenario where an apology is warranted, developers can configure the voice to convey that emotion. Harris emphasized that developers and users should have substantial control over both the content and manner of spoken outputs.
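To make the customer support scenario concrete, here is a minimal sketch of how a developer might pair the words to be spoken with a separate instruction about tone. It only builds a request body and makes no network call. The helper name `build_speech_request` is hypothetical, and the field names (`model`, `voice`, `input`, `instructions`) and the `alloy` voice reflect an assumed shape of OpenAI's speech endpoint that should be checked against the current API reference before use.

```python
import json


def build_speech_request(text: str, instructions: str, voice: str = "alloy") -> dict:
    """Build a request body for a text-to-speech call (hypothetical helper).

    Field names are assumptions about the API's request shape, not confirmed
    by the article: `input` carries what is said, `instructions` how it is said.
    """
    if not text.strip():
        raise ValueError("text must not be empty")
    return {
        "model": "gpt-4o-mini-tts",    # model name taken from the article
        "voice": voice,                # assumed voice identifier
        "input": text,                 # the content of the spoken output
        "instructions": instructions,  # the manner of the spoken output
    }


# Customer support example: same kind of request, apologetic delivery.
apology = build_speech_request(
    text="I'm sorry about the mix-up with your order. Let's fix it right now.",
    instructions="Speak in a calm, sincere, apologetic tone.",
)
print(json.dumps(apology, indent=2))
```

Keeping the content and the manner in separate fields mirrors the control Harris describes: the same sentence can be sent again with a different instruction, such as an upbeat or neutral tone, without changing the text.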



The new speech-to-text models, “gpt-4o-transcribe” and “gpt-4o-mini-transcribe,” replace OpenAI’s previous Whisper transcription model. Trained on diverse, high-quality audio datasets, they are designed to capture varied speech more reliably, even in noisy environments.

They are also significantly less likely to produce inaccurate output, according to Harris. The earlier Whisper model was known to generate false transcriptions, including fabricated words and incorrect content. “These models are much improved versus Whisper on that front,” Harris remarked, asserting that precision in speech recognition is vital for delivering a reliable voice experience.




However, transcription accuracy varies by language. OpenAI’s internal benchmarks indicate that gpt-4o-transcribe, noted for its accuracy, approaches a “word error rate” of 30% for Indic and Dravidian languages such as Tamil, Telugu, Malayalam, and Kannada. In other words, roughly three out of every ten words may differ from a human-produced transcription in these languages.
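For readers unfamiliar with the metric, word error rate is conventionally computed as the minimum number of word substitutions, deletions, and insertions needed to turn the model's output into the reference transcript, divided by the number of words in the reference. The sketch below illustrates that standard definition with made-up sentences. It is not OpenAI's benchmark code, and the function name `word_error_rate` is chosen here for illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # (before the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # i deletions turn i reference words into an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion: reference word missing from output
                curr[j - 1] + 1,     # insertion: extra word in output
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


# Illustrative sentences (not benchmark data): 3 of 10 words are wrong.
human = "the quick brown fox jumps over the lazy dog today"
model = "the quick brown box jumps over a lazy dog yesterday"
print(f"WER: {word_error_rate(human, model):.0%}")  # prints "WER: 30%"
```

Because insertions count as errors too, the rate can exceed 100% when the output is much longer than the reference, so "three out of ten words" is a convenient approximation rather than an exact reading of the metric.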

In a departure from past practice, OpenAI has opted not to release the new transcription models under an open-source license. Historically, new versions of Whisper were made available for commercial use under an MIT license. According to Harris, the gpt-4o-transcribe and gpt-4o-mini-transcribe models are significantly larger than Whisper, which makes running them locally on users’ devices impractical. He noted, “[They’re] not the kind of model that you can just run locally on your laptop, like Whisper.”

Harris concluded by stating that OpenAI aims to release open-source models responsibly and for specific needs, emphasizing the importance of honing such models for particular applications.


Featured image credit: Zac Wolff/Unsplash

Tags: Featured, OpenAI
