Xiaomi has unveiled OmniVoice, an open-source artificial intelligence model for text-to-speech conversion. The tool supports hundreds of languages and offers voice cloning and customizable speech generation. According to the company, it performs particularly well in Chinese and English, surpassing commercial systems in several tasks. Its main strength is that it can generate speech in languages with limited training data, which makes voice technology more accessible for minority languages.
How OmniVoice handles low-resource languages 🗣️
OmniVoice uses a transformer-based architecture and multi-task training to synthesize speech even when training data is scarce. The model relies on representations shared across languages, so what it learns from resource-rich languages transfers to those with fewer resources. Xiaomi claims that in blind tests OmniVoice matches or exceeds the naturalness of proprietary systems such as those from Google or Microsoft, especially for Chinese tones and intonation. The source code and weights are available on GitHub under the Apache 2.0 license, so developers can adapt the model to their needs.
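To make the idea of shared representations concrete, here is a toy sketch in plain Python. It is not OmniVoice's code or API: the class `SharedMultilingualEncoder`, the phoneme list, the language codes, and the embedding size are all hypothetical, chosen only to illustrate how one table shared by every language can sit next to a small per-language component.

```python
import random

EMBED_DIM = 8  # hypothetical embedding size, chosen only for the demo


def make_vector(rng, dim=EMBED_DIM):
    """Return a random vector standing in for learned parameters."""
    return [rng.uniform(-1.0, 1.0) for _ in range(dim)]


class SharedMultilingualEncoder:
    """Toy encoder: one phoneme table shared by all languages,
    plus one small vector per language (an illustrative assumption)."""

    def __init__(self, phonemes, languages, seed=0):
        rng = random.Random(seed)
        # Shared across every language: this is where knowledge transfers.
        self.phoneme_table = {p: make_vector(rng) for p in phonemes}
        # Language-specific: the only part a new language adds here.
        self.language_table = {lang: make_vector(rng) for lang in languages}

    def encode(self, phoneme_seq, language):
        """Add the language vector to each shared phoneme vector."""
        lang_vec = self.language_table[language]
        return [
            [p + l for p, l in zip(self.phoneme_table[ph], lang_vec)]
            for ph in phoneme_seq
        ]

    def parameter_counts(self):
        """Return (shared parameters, parameters added per language)."""
        return len(self.phoneme_table) * EMBED_DIM, EMBED_DIM


encoder = SharedMultilingualEncoder(
    phonemes=["a", "m", "n", "i", "k"],
    languages=["zh", "en", "sw"],
)
frames_en = encoder.encode(["m", "a", "n"], "en")
frames_sw = encoder.encode(["m", "a", "n"], "sw")
shared, per_language = encoder.parameter_counts()
print(f"shared parameters: {shared}, added per language: {per_language}")
```

In this sketch a language with little data only has to fit its own small vector, while the phoneme table is trained on every language at once. A real transformer-based system is far more elaborate, but the division between shared and language-specific parameters follows the same logic.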
Now even your toaster can complain in 500 languages 🤖
With OmniVoice, any startup with three euros and a laptop can clone their neighbor's voice and make it promise to return the drill. The best part is that if you don't have data to train the model in your local language, Xiaomi promises that four WhatsApp voice notes and one TikTok video will be enough. Soon we'll see voice assistants in fire extinguishers, or the fridge reciting poetry in Swahili. The only thing missing is for it to learn to say "I forgot the groceries" with the right tone of guilt.