Microsoft unveils home-made ML models amid OpenAI talks • The Register

Microsoft has introduced two home-grown machine learning models, potentially complicating negotiations with its current favored model supplier, OpenAI.
On Thursday, Microsoft AI (MAI) debuted MAI-Voice-1, which generates realistic sounding speech from text, and MAI-1-preview, a model it’s testing as the future basis of the company’s Copilot service.
In a Semafor video interview, MAI CEO Mustafa Suleyman explained that Microsoft needs its own foundation models because AI is fundamental to the company’s business.
“We have to be able to have the in-house expertise to create the strongest models in the world,” he said.
At the same time, Suleyman insisted Microsoft’s collaboration with OpenAI has been successful so far, and expressed hope it will continue.
If Redmond can create the strongest models in the world, it’s unclear why it would continue paying OpenAI for less capable technology unless it’s contractually obliged to do so.
Microsoft has already invested around $13 billion in OpenAI and the two firms are reportedly trying to renegotiate their contract, set to expire in 2030, so that OpenAI can restructure for a future public offering. Separately, OpenAI is said to be discussing the potential sale of shares owned by employees to investors in a deal that would see the unprofitable firm valued at $500 billion.
Last year, Microsoft opted not to release its VALL-E 2 speech synthesis project to the public because of potential abuses “such as spoofing voice identification or impersonating a specific speaker.” OpenAI took similar steps, limiting access to its Voice Engine for speech synthesis. And when Consumer Reports looked at voice cloning services, it found most firms didn’t do enough to prevent unauthorized impersonation.
Yet MAI-Voice-1 has arrived in Copilot Labs with only a minimalist warning: “Copilot may make mistakes.” It also powers Copilot Daily, an online AI-voiced summary of news and historic events, and Copilot Podcasts.
“MAI-Voice-1 is a lightning-fast speech generation model, with an ability to generate a full minute of audio in under a second on a single GPU, making it one of the most efficient speech systems available today,” said MAI in an online post.
Microsoft let model evaluation platform LMArena test MAI-1-preview, but it isn’t available to the public. Would-be testers in the US can apply for access.
LMArena currently ranks Microsoft’s model joint thirteenth for output quality, behind grok-3-preview-02-24 and ahead of gemini-2.5-flash.
“MAI-1-preview is an in-house mixture-of-experts model, pre-trained and post-trained on ~15,000 NVIDIA H100 GPUs,” MAI said. “This model is designed to provide powerful capabilities to consumers seeking to benefit from models that specialize in following instructions and providing helpful responses to everyday queries.”
That’s significantly fewer GPUs than the 100,000 Nvidia H100s powering xAI’s Colossus supercomputer cluster. And it’s comparable with Meta’s Llama-3.1 model, which required over 16,000 Nvidia H100s.
Microsoft, which says its GB200 cluster is now operational, expects to deploy MAI-1-preview in specific Copilot scenarios in the coming weeks, so it can gather data about the model’s performance.
OpenAI did not respond to a request for comment. ®