OpenAI announced new voice intelligence features for its API, designed to help developers build applications that can talk, transcribe, and translate conversations. The new GPT‑Realtime‑2 model offers realistic vocal simulation and is built on GPT‑5‑class reasoning, enabling it to handle more complex requests than its predecessor, GPT-Realtime-1.5.
Additionally, OpenAI introduced GPT‑Realtime‑Translate, which provides real-time translation services supporting over 70 input languages and 13 output languages. A transcription feature called GPT-Realtime-Whisper was also launched, offering live speech-to-text functionality during interactions.
OpenAI stated, “Together, the models we are launching move real-time audio from simple call-and-response toward voice interfaces that can actually do work: listen, reason, translate, transcribe, and take action as a conversation unfolds.”
The updates target businesses aiming to enhance customer service capabilities. OpenAI noted these features also serve a range of applications in education, media, events, and creator platforms.
To address potential misuse, OpenAI implemented guardrails to prevent spam, fraud, and other online abuses. Built-in triggers halt conversations that violate its guidelines on harmful content.
All of the new voice models are part of OpenAI’s Realtime API. GPT-Realtime-Translate and GPT-Realtime-Whisper are billed by the minute, while GPT-Realtime-2 is billed based on token consumption.





