Google has introduced Gemini Omni, a model that transforms one video into another through conversational, natural-language instructions. Unlike Veo, Google's earlier video model, this system edits the original frames while preserving scene coherence and the characters' actions. It currently produces clips of up to 10 seconds with sound, though the company already plans to extend that limit.
Physics and historical context in every frame 🧠
The model draws on the Gemini ecosystem to generate scenes that take historical and scientific context into account. It reproduces phenomena such as gravity and fluid dynamics accurately. For example, you can swap the background of a medieval fight for a space storm without the characters floating off like balloons. It can also create personalized digital avatars, drawing on the system's broad knowledge to keep them visually consistent.
Every YouTuber's dream: editing without opening After Effects 🎬
Now any mere mortal can say "swap that cat for a dancing dinosaur" and the video will obey. The downside: ask for an 11-second clip and Gemini will look at you with digital disdain and remind you that it's still in beta. But hey, while you wait, you can create an avatar that does things you would never do, like cleaning the house. Human laziness finally has its tool.