3D reconstruction of human–scene interactions from videos or images typically yields results that look plausible but are physically unstable. This gap between perception and simulation blocks their use in physics engines and embodied AI applications. We present HSImul3R, a unified framework that closes this gap through bidirectional optimization with active supervision from a physical simulator, producing simulation-ready reconstructions that transfer to real humanoid robots. 🚀
Bidirectional Optimization with Physical Simulator Supervision ⚙️
HSImul3R integrates the physical simulator as an active supervisor in a two-way pipeline. In the forward direction, a scene-directed reinforcement learning stage optimizes human dynamics under dual supervision: fidelity to the captured motion and stability of contacts with objects. In the inverse direction, Simulation Reward Direct Optimization uses simulator feedback on gravitational stability and interaction success to refine the scene geometry. This joint cycle ensures that both the human avatar and the objects comply with physical laws.
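The alternating forward/inverse cycle can be illustrated with a deliberately minimal toy, where the "human" is a single vertical position `z` and the "scene" is a support height `floor`. All function names and the update rules are illustrative assumptions, not the paper's actual method: the forward step pulls `z` toward the captured motion while enforcing contact, and the inverse step adjusts the geometry using the simulated body as feedback.

```python
def forward_step(z, z_captured, floor, lr=0.5):
    """Forward (human) update: dual supervision of fidelity + contact.

    Toy stand-in for the scene-directed RL stage: move toward the
    captured motion, then clamp to the support surface (no sinking).
    """
    z = z + lr * (z_captured - z)   # fidelity to captured motion
    return max(z, floor)            # contact stability: stay above the floor


def inverse_step(floor, z, lr=0.5):
    """Inverse (scene) update: refine geometry from simulator feedback.

    Toy stand-in for Simulation Reward Direct Optimization: pull the
    surface toward the simulated body so contact is actually supported
    (no floating gap under gravity).
    """
    return floor + lr * (z - floor)


def bidirectional_cycle(z, floor, z_captured, n_iters=20):
    """Alternate forward and inverse updates until both agree."""
    for _ in range(n_iters):
        z = forward_step(z, z_captured, floor)
        floor = inverse_step(floor, z)
    return z, floor
```

In this toy, both quantities converge jointly: the body tracks the captured motion and the geometry moves to support it, mirroring the claim that neither the avatar nor the scene is optimized in isolation.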
A Crucial Advance for Robotics and the Metaverse 🤖
This work goes beyond mere visualization, endowing digital humanoids with an essential physical foundation. By producing stable and simulable reconstructions, it enables training AI agents in realistic environments and transferring behaviors directly to physical robots. It is a key step for developing metaverse avatars that interact with physical coherence and for accelerating humanoid robot learning in complex real-world tasks.
How can we ensure physical stability and biomechanical coherence when reconstructing digital humanoids in 3D from video, avoiding artifacts such as sinking into the floor or penetrations between bodies and objects?
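One common way to discourage exactly these two artifacts is a penalty on the signed gap between a body point and its support surface: negative gaps (penetration) and large positive gaps (floating) are both penalized, while a small contact tolerance counts as stable. This sketch is a generic illustration of that idea; the weights, tolerance, and function names are assumptions, not HSImul3R's actual loss.

```python
def stability_penalty(body_z, surface_z, contact_tol=0.01,
                      w_penetration=10.0, w_float=1.0):
    """Penalize penetration (sinking) and floating relative to a surface.

    gap < 0            -> body is below the surface (interpenetration)
    gap > contact_tol  -> body hovers above the surface (floating)
    otherwise          -> within contact tolerance, no penalty
    Penetration is weighted more heavily, since it is the more
    visually and physically severe artifact.
    """
    gap = body_z - surface_z
    if gap < 0:
        return w_penetration * (-gap)
    if gap > contact_tol:
        return w_float * (gap - contact_tol)
    return 0.0
```

A term of this shape can be summed over contact points and added to a reconstruction loss, trading motion fidelity against physical plausibility.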
(P.S.: Digital humanoids have the advantage that they never complain about the rigging.)