A team from the Technical University of Munich has presented a robotic system designed to locate lost everyday objects, such as glasses or a remote control. It combines real-time 3D mapping with contextual knowledge extracted from the internet. The goal is for the robot not only to navigate but also to interpret its environment with human-like logic, so that it searches the most promising places first.
The fusion of spatial vision and language models 🤖
The robot, equipped with a depth camera, builds a detailed three-dimensional map of the space, labeling objects and furniture. The innovation lies in integrating two AI systems: one for visual recognition and a large language model. The latter provides general knowledge about how spaces are used, allowing the robot to infer where an object is most likely to be found. As a result, it prioritizes searching for keys on a table rather than in the fridge.
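To make the idea concrete, here is a minimal Python sketch of how a labeled map and language-model knowledge could be combined into a search order. It is not the team's implementation: the `MappedLocation` class, the `PLAUSIBILITY` table, and the `rank_search_order` function are hypothetical names, and the scores are hard-coded placeholders standing in for what a language model might return.

```python
from dataclasses import dataclass


@dataclass
class MappedLocation:
    """A labeled piece of furniture or surface in the robot's 3D map."""
    label: str
    position: tuple[float, float, float]  # metres in the map frame (illustrative)


# Hypothetical plausibility scores in [0, 1]. In a real system these would
# come from querying a language model; here they are placeholder values.
PLAUSIBILITY = {
    ("keys", "table"): 0.8,
    ("keys", "shelf"): 0.5,
    ("keys", "fridge"): 0.05,
}


def rank_search_order(target, locations, default=0.1):
    """Return locations sorted from most to least plausible for the target."""
    return sorted(
        locations,
        key=lambda loc: PLAUSIBILITY.get((target, loc.label), default),
        reverse=True,
    )


labeled_map = [
    MappedLocation("fridge", (4.0, 1.0, 0.0)),
    MappedLocation("table", (1.5, 2.0, 0.0)),
    MappedLocation("shelf", (0.5, 3.5, 0.0)),
]

search_order = [loc.label for loc in rank_search_order("keys", labeled_map)]
print(search_order)  # ['table', 'shelf', 'fridge']
```

With these placeholder scores the table is visited first and the fridge last. A fuller version would probably also weigh travel distance between locations, but that goes beyond what the announcement describes.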
Goodbye to searching for glasses... that are on your forehead 😅
With this development, perhaps we will soon be able to delegate to a robot that frantic search for the glasses that, invariably, turn out to be on our own head. The irony would be that, after meticulously mapping the house and applying its artificial common sense, the robot points at us with its mechanical arm while emitting a soft beep of disappointment. A technological reminder of our own absent-mindedness.