Microsoft has launched “MAI-Transcribe-1,” an AI model designed for accurate speech-to-text transcription across 25 widely spoken languages. This model is intended for applications such as meetings, closed captioning, and dictation.
MAI-Transcribe-1 will be available on Microsoft Foundry alongside two other models: MAI-Voice-1 and MAI-Image-2. Microsoft stated that this launch will enable customers to evaluate and build with these models across transcription, voice, and image generation.
MAI-Voice-1 features hyper-realistic speech generation and allows for the creation of custom brand voices from just one minute of audio. Meanwhile, MAI-Image-2 specializes in text-to-image generation, excelling in natural lighting, accurate skin tones, and clarity of in-image text.
Microsoft has said it wants to reduce its reliance on OpenAI by developing its own AI models, following criticism of the limitations of OpenAI’s GPT-4 technology. The company is also restructuring its Copilot division into four pillars to simplify management: Copilot experience, Copilot platform, Microsoft 365 apps, and AI models.
Jacob Andreou, a former Snap executive, will lead the Copilot experience pillar, while Mustafa Suleyman, Microsoft’s AI CEO, will focus on in-house model development. Salesforce CEO Marc Benioff has predicted that Microsoft would move away from OpenAI technology amid discontinuations like OpenAI’s Stargate project.
Microsoft’s shift to in-house model development is a strategic response to these pressures and to market demand. Suleyman has acknowledged, however, that the current in-house models remain a secondary option to OpenAI’s more advanced models.
With the launch of MAI-Transcribe-1, Microsoft aims to broaden its capabilities and offer businesses enhanced tools for productivity and communication in the evolving AI ecosystem.





