Mistral has released a new open-source model, and this time the focus is voice rather than text. Voxtral TTS is a text-to-speech model compact enough to run on a smartwatch.
What Voxtral TTS can do
The model supports nine languages: English, French, German, Spanish, Dutch, Portuguese, Italian, Hindi, and Arabic. It’s built on Mistral’s Ministral 3B, making it small enough for smartphones, laptops, and other edge devices.
The most impressive specs:
- Time-to-first-audio of 90 milliseconds (for a 500-character input)
- Voice cloning from less than 5 seconds of audio
- Captures accents, intonation, and speech flow of the original voice
- Seamless language switching without losing voice characteristics — useful for dubbing and real-time translation
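Time-to-first-audio is the delay between submitting text and receiving the first playable chunk of audio, not the time to synthesize the whole clip. A minimal sketch of how one might measure it for any streaming TTS engine is shown here. The `fake_tts_stream` generator is a hypothetical stand-in that yields dummy bytes; it is not Voxtral's API, and its chunk size and delay are arbitrary assumptions.

```python
import time
from typing import Iterable, Iterator, Tuple


def fake_tts_stream(text: str, chunk_chars: int = 50, delay_s: float = 0.01) -> Iterator[bytes]:
    """Hypothetical stand-in for a streaming TTS engine (not a real API).

    Yields one dummy audio chunk per `chunk_chars` characters of input,
    sleeping briefly before each chunk to simulate synthesis time.
    """
    for _ in range(0, len(text), chunk_chars):
        time.sleep(delay_s)   # simulated per-chunk synthesis time
        yield b"\x00" * 320   # placeholder bytes, not real speech


def time_to_first_audio(stream: Iterable[bytes]) -> Tuple[float, float, int]:
    """Return (seconds until first chunk, total seconds, number of chunks)."""
    start = time.perf_counter()
    first = None
    chunks = 0
    for _ in stream:
        chunks += 1
        if first is None:
            # The first chunk has arrived: this is the latency a listener perceives.
            first = time.perf_counter() - start
    total = time.perf_counter() - start
    if first is None:
        raise ValueError("stream produced no audio")
    return first, total, chunks


if __name__ == "__main__":
    ttfa, total, n = time_to_first_audio(fake_tts_stream("x" * 500))
    print(f"first audio after {ttfa * 1000:.0f} ms, all {n} chunks after {total * 1000:.0f} ms")
```

With a real engine, the same timing function would wrap whatever streaming iterator the engine exposes. The gap between the two numbers is why a streaming model can feel responsive even when the full clip takes much longer to generate.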
What it’s for
Mistral is positioning Voxtral TTS squarely at the enterprise market: voice agents for sales and customer service that sound like real humans. This puts Mistral in direct competition with ElevenLabs, Deepgram, and OpenAI.
The advantage: open source and customizable. Companies can adapt the model however they want, run it on their own servers, and keep costs under control.
The bigger picture
Voxtral TTS is part of a larger strategy. Earlier this year, Mistral released transcription models — one for batch processing, one for real-time use. With the new TTS model, the circle is complete: input (transcription) and output (speech synthesis) now come from a single provider.
Pierre Stock, VP of Science Operations at Mistral, laid out the vision clearly: an end-to-end platform for multimodal streams — audio, text, and image as both input and output. That sounds like a complete agent stack.
My take
Mistral’s strength has always been packing big capabilities into small packages while staying open source. Voxtral TTS fits that pattern perfectly. A speech model that runs on a smartwatch and can clone a voice from less than five seconds of audio is genuinely impressive.
For the European market, this is particularly relevant: nine languages including German right from launch, and the option to run everything on-premises. That’s exactly what GDPR-conscious companies want to hear.