Gemini Omni Flash: An AI That Understands the World to Create Video

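Google DeepMind has introduced Gemini Omni, a family of generative AI models that take text, photos, audio, and video as input to create content. Its first model, Gemini Omni Flash, generates video clips by combining multimodal data with an advanced understanding of physical laws. Executives say the technology understands the world better than earlier efforts and marks a step toward more integrated AI.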

How the model fuses data and physics 🧠

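Gemini Omni Flash uses a unified architecture that processes multiple input types at once. The model does not just recognize objects in a video; it also predicts how they will behave based on gravity, collisions, and spatial continuity. This lets it generate coherent sequences, such as a cup shattering when it falls or a ball bouncing in a way consistent with its mass. DeepMind trained the system on labeled real-world interaction data, avoiding the hallucination problems common in other video generators.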
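To make the idea of physical consistency concrete, here is a minimal Python sketch of a plausibility check for a falling object. It is purely illustrative and is not DeepMind's method or any Gemini API: the function names `free_fall_heights` and `looks_like_free_fall`, the 30 fps sampling, and the tolerance are all assumptions made for this example. It relies on one well-known fact: for an object in free fall with no air resistance, the second finite difference of its sampled height equals -g·dt².

```python
from typing import List, Sequence

G = 9.81  # gravitational acceleration in m/s^2


def free_fall_heights(y0: float, v0: float, dt: float, steps: int) -> List[float]:
    """Sample the height of an object in free fall (hypothetical helper, no air resistance)."""
    return [y0 + v0 * (i * dt) - 0.5 * G * (i * dt) ** 2 for i in range(steps)]


def looks_like_free_fall(heights: Sequence[float], dt: float, tol: float = 1e-6) -> bool:
    """Return True if every second finite difference of the heights matches -G * dt**2.

    This is a toy check for illustration only, not a description of how
    any video model is trained or evaluated.
    """
    if len(heights) < 3:
        raise ValueError("need at least three samples")
    expected = -G * dt ** 2
    return all(
        abs((heights[i + 1] - 2 * heights[i] + heights[i - 1]) - expected) <= tol
        for i in range(1, len(heights) - 1)
    )


dt = 1 / 30  # assumed frame interval for a 30 fps clip
falling_egg = free_fall_heights(y0=2.0, v0=0.0, dt=dt, steps=10)
floating_egg = [1.5] * 10  # an object that hovers in mid-air

print(looks_like_free_fall(falling_egg, dt))   # True
print(looks_like_free_fall(floating_egg, dt))  # False
```

A hovering object fails the check because its height never changes, so its second difference is zero rather than -g·dt². A real system would need to handle collisions, drag, and noisy measurements, all of which this sketch ignores.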

Now AI knows eggs don't stick to the ceiling 🥚

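At last, an AI that no longer thinks objects float for no reason. Gemini Omni Flash knows that if you throw an egg, it breaks, and that cats cannot walk through walls. The developers at Google DeepMind must be proud: they managed to get a machine to understand that milk spills rather than turning into confetti. Meanwhile, other models are still generating videos of flying cars and people walking on water.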