Gemini Omni Flash: the AI that understands the world to create video

Published on May 23, 2026 | Translated from Spanish

Google DeepMind has introduced Gemini Omni, a family of generative AI models that process text, photos, audio, and video to create content. Its first model, Gemini Omni Flash, generates video clips by combining multimodal inputs with knowledge of physical laws. According to company executives, the technology understands the world better than previous systems, marking a step toward more integrated artificial intelligence.

How the model fuses data and physics 🧠

Gemini Omni Flash uses a unified architecture that processes multiple input types simultaneously. The model not only recognizes objects in video but also predicts their behavior based on principles of gravity, collision, and spatial continuity. This allows it to generate coherent sequences in which a glass breaks when it falls or a ball bounces according to its mass. DeepMind trained the system on labeled data from real-world interactions to avoid the hallucinations common in other video generators.
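To make the idea of physical coherence concrete, here is a minimal Python sketch of the kind of rule a bouncing ball should obey. It is only an illustration and not DeepMind's method: the article does not describe how Gemini Omni Flash is implemented, and every name here (`simulate_bounce`, `rebound_peaks`, `is_physically_plausible`, the `restitution` parameter) is hypothetical. The sketch simulates a dropped ball under gravity with a simple floor collision, then checks that no rebound rises higher than the one before it.

```python
G = 9.81  # gravitational acceleration in m/s^2


def simulate_bounce(height, restitution=0.8, dt=0.001, duration=3.0):
    """Return (time, height) samples for a ball dropped from rest.

    `restitution` is the fraction of speed kept after each floor impact
    (an assumed toy parameter, not something taken from the article).
    """
    y, v, t = height, 0.0, 0.0
    samples = [(t, y)]
    for _ in range(int(duration / dt)):
        v -= G * dt              # gravity accelerates the ball downward
        y += v * dt              # semi-implicit Euler position update
        if y < 0.0:              # collision with the floor
            y = 0.0
            v = -v * restitution  # reverse the velocity and scale it
        t += dt
        samples.append((t, y))
    return samples


def rebound_peaks(samples):
    """Return the drop height followed by every later local maximum."""
    heights = [y for _, y in samples]
    peaks = [heights[0]]
    for i in range(1, len(heights) - 1):
        if heights[i - 1] < heights[i] >= heights[i + 1] and heights[i] > 0.0:
            peaks.append(heights[i])
    return peaks


def is_physically_plausible(samples, tolerance=1e-6):
    """Accept a trajectory only if the ball stays above the floor
    and no rebound rises higher than the previous peak."""
    if any(y < 0.0 for _, y in samples):
        return False
    peaks = rebound_peaks(samples)
    return all(later <= earlier + tolerance
               for earlier, later in zip(peaks, peaks[1:]))


if __name__ == "__main__":
    realistic = simulate_bounce(1.0, restitution=0.8)
    impossible = simulate_bounce(1.0, restitution=1.2)  # gains energy on impact
    print(is_physically_plausible(realistic))
    print(is_physically_plausible(impossible))
```

In this toy setup the ball that loses energy on each impact passes the check, while the one that gains energy is rejected, which is the sort of "flying car" error the article attributes to other generators. A learned video model would have to capture such regularities from data rather than from a hand-written rule like this one.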

Now the AI knows that an egg doesn't stick to the ceiling 🥚

Finally, an artificial intelligence that doesn't think objects float for no reason. Gemini Omni Flash knows that a thrown egg breaks and that a cat cannot walk through a wall. Google DeepMind's developers must be proud: they have built a machine that understands that milk spills rather than turning into confetti. Meanwhile, other models keep generating videos in which cars fly and people walk on water.