Google DeepMind has introduced Gemini Omni, a family of generative AI models that take text, images, audio, and video as input to create content. Its first model, Gemini Omni Flash, generates video clips by combining multimodal inputs with knowledge of physical laws. According to company executives, the technology understands the world better than earlier systems, which they describe as a step toward more integrated artificial intelligence.
How the model fuses data and physics 🧠
Gemini Omni Flash uses a unified architecture that processes several input types at once. The model not only recognizes objects in video but also predicts how they will behave, based on principles of gravity, collision, and spatial continuity. This lets it generate coherent sequences in which a glass breaks when it falls or a ball bounces according to its mass. DeepMind trained the system on labeled data from real-world interactions, with the aim of avoiding the hallucinations common in other video generators.
Now the AI knows that an egg doesn't stick to the ceiling 🥚
Finally, an artificial intelligence that doesn't think objects float for no reason. Gemini Omni Flash knows that if you throw an egg, it breaks, and that a cat cannot walk through a wall. The developers at Google DeepMind must be proud: they have built a machine that understands that milk spills and doesn't turn into confetti. Meanwhile, other models keep generating videos where cars fly and people walk on water.