Xiaomi has introduced OmniVoice, an open-source artificial intelligence model for text-to-speech conversion. The tool supports hundreds of languages and offers voice cloning and customizable speech generation. According to the company, it performs particularly well in Chinese and English, surpassing commercial systems in several tasks. Its strong point: it can generate speech in languages with limited training data, which makes voice technology more accessible for minority languages.
How OmniVoice handles low-resource languages 🗣️
OmniVoice uses a transformer-based architecture and multi-task training to synthesize speech even when training data is scarce. The model relies on representations shared across languages, so what it learns from resource-rich languages transfers to those with fewer resources. Xiaomi claims that in blind tests OmniVoice matches or exceeds the naturalness of proprietary systems such as those from Google or Microsoft, especially for Chinese tones and intonation. The source code and weights are available on GitHub under the Apache 2.0 license, so developers can adapt the model to their needs.
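To make the shared-representation idea a bit more concrete, here is a toy sketch in plain Python. It is not OmniVoice's code or API: the class `SharedMultilingualEncoder`, its methods, the phoneme set, and the language codes are all hypothetical, and a real system would use a trained transformer rather than random lookup tables. The sketch only shows the general pattern, in which most parameters are shared by every language and each language adds a small vector of its own.

```python
import random


class SharedMultilingualEncoder:
    """Toy illustration (hypothetical, not OmniVoice): one phoneme table
    shared by all languages, plus one small vector per language."""

    def __init__(self, phonemes, languages, dim=8, seed=0):
        rng = random.Random(seed)
        self.dim = dim
        # Shared parameters: in a real model these are learned from
        # the data of every language at once.
        self.phoneme_table = {
            p: [rng.uniform(-1, 1) for _ in range(dim)] for p in phonemes
        }
        # Language-specific parameters: a single small vector each.
        self.language_table = {
            lang: [rng.uniform(-1, 1) for _ in range(dim)] for lang in languages
        }

    def encode(self, phoneme_seq, language):
        """Return one vector per phoneme: shared embedding + language vector."""
        lang_vec = self.language_table[language]
        return [
            [a + b for a, b in zip(self.phoneme_table[p], lang_vec)]
            for p in phoneme_seq
        ]

    def parameter_counts(self):
        """Return (shared parameter count, extra parameters per language)."""
        shared = len(self.phoneme_table) * self.dim
        per_language = self.dim
        return shared, per_language


# Hypothetical usage: "sw" stands in for a low-resource language.
model = SharedMultilingualEncoder(
    phonemes=["a", "i", "k", "m", "n"],
    languages=["en", "zh", "sw"],
)
frames = model.encode(["m", "a"], "sw")
shared, per_language = model.parameter_counts()
print(len(frames), shared, per_language)
```

In this toy setup, adding a language costs only `per_language` new parameters, while the phoneme table is reused as is. That is the intuition behind transfer from resource-rich languages to low-resource ones, under the assumptions stated above.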
Now even your toaster can complain in 500 languages 🤖
With OmniVoice, any startup with three euros and a laptop can clone their neighbor's voice and make it say it's time to return the drill. The best part: if you don't have data to train the model in your local language, Xiaomi promises that four WhatsApp voice notes and a TikTok video will be enough. Soon we'll see voice assistants in fire extinguishers, or in the fridge reciting poetry in Swahili. The only thing missing is for it to learn to say "I forgot the groceries" with the right tone of guilt.