Google DeepMind Unveils Advanced AI Models for Robots
Google DeepMind has taken a major step forward in the field of robotics with the release of two new artificial intelligence (AI) models designed to enhance the reasoning and problem-solving capabilities of robots. The new models, named Gemini Robotics 1.5 and Gemini Robotics-ER 1.5, represent a significant evolution from the company’s earlier AI initiatives and mark an important milestone in the development of general-purpose robots capable of performing complex, real-world tasks.
From Simple Commands to Multistep Reasoning
Earlier in the year, DeepMind introduced the first version of Gemini Robotics, an AI system derived from its Gemini large language model (LLM) and tailored specifically for robotics. While the original model allowed robots to complete simple tasks like placing a banana in a basket, the new iterations greatly expand on that functionality. Now, robots can handle multistep, long-horizon tasks and explain their actions in natural language.
In one demonstration, a pair of robotic arms — DeepMind’s Aloha 2 robot — successfully sorted a banana, an apple, and a lime onto color-matched plates. As it worked, the robot described what it was doing and provided reasoning for each action. This capability showcases the model’s ability to perceive the environment, plan tasks, and act accordingly.
A New Model for Robotic Intelligence
According to Jie Tan, a senior staff research scientist at DeepMind, “We enable it to think. It can perceive the environment, think step-by-step and then finish this multistep task.” The development is more than a technological novelty — it reflects a paradigm shift in how robots process and act on information.
The two models work together in a supervisor-worker dynamic. Gemini Robotics-ER 1.5 acts as the “brain,” functioning as a vision-language model (VLM) that interprets natural language commands and situational context. It processes this data and sends instructions to Gemini Robotics 1.5, the “hands and eyes,” which is a vision-language-action (VLA) model. This second model executes the physical tasks while providing feedback about its reasoning process.
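The supervisor-worker split described above can be sketched in a few lines of Python. Everything here is illustrative: the function names (`plan_steps`, `execute_step`, `run_task`) and the hard-coded plan are stand-ins invented for this sketch, not the actual Gemini Robotics API.

```python
def plan_steps(instruction: str) -> list[str]:
    """Stand-in for the 'brain' (the VLM role of Gemini Robotics-ER 1.5):
    turn a natural-language command into an ordered list of subtasks."""
    if "sort fruit" in instruction:
        return [
            "pick up the banana and place it on the yellow plate",
            "pick up the apple and place it on the red plate",
            "pick up the lime and place it on the green plate",
        ]
    return [instruction]  # fall back to a single-step plan

def execute_step(step: str) -> str:
    """Stand-in for the 'hands and eyes' (the VLA role of Gemini Robotics 1.5):
    would drive the arms; here it just reports its action in text."""
    return f"Executing: {step}"

def run_task(instruction: str) -> list[str]:
    """Supervisor-worker loop: plan once, then execute each subtask,
    collecting the worker's natural-language feedback."""
    return [execute_step(step) for step in plan_steps(instruction)]

for line in run_task("sort fruit onto color-matched plates"):
    print(line)
```

The point of the structure is the division of labor: the planner reasons over language and scene context, while the executor handles low-level action, reporting back in natural language as the demos describe.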
Leveraging Real-World Data and Tools
What makes these models exceptionally powerful is their ability to integrate external tools such as Google Search. In one experiment, a researcher asked a robot to sort waste into compost, recycling, and trash bins according to local rules. The robot identified the user’s location as San Francisco and used online search to access the city’s recycling guidelines. It then proceeded to sort the waste correctly, demonstrating an impressive level of contextual awareness and adaptability.
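The waste-sorting demo can be approximated as a simple tool-calling flow. This is a hedged sketch only: the `search_local_rules` function and the rules table are invented for illustration, and the real system issues a live Google Search query rather than reading a local dictionary.

```python
# Toy stand-in for city-specific disposal guidelines that the real
# system would retrieve via Google Search.
LOCAL_RULES = {
    "San Francisco": {
        "banana peel": "compost",
        "soda can": "recycling",
        "chip bag": "trash",
    }
}

def search_local_rules(city: str) -> dict[str, str]:
    """Tool call: in the demo this would be an online search for the
    city's recycling guidelines."""
    return LOCAL_RULES.get(city, {})

def sort_waste(city: str, items: list[str]) -> dict[str, str]:
    """Look up the local guidelines, then assign each item to a bin,
    defaulting to trash when the rules don't cover an item."""
    rules = search_local_rules(city)
    return {item: rules.get(item, "trash") for item in items}

print(sort_waste("San Francisco", ["banana peel", "soda can", "chip bag"]))
```

The key idea the demo illustrates is that the correct action depends on retrieved context (the user's city and its rules), not just on what the robot sees in front of it.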
Transferable Learning Across Robotic Systems
Another breakthrough with the Gemini Robotics models is their capacity for generalized learning across robotic platforms. DeepMind states that skills learned on one platform, whether the Aloha 2 dual-arm robot, the Apollo humanoid robot, or the Franka bi-arm system, can be applied to the others. This interoperability is possible because the models take a generalized approach to understanding physical tasks and environments rather than learning embodiment-specific behaviors.
This contrasts sharply with earlier AI systems, which were typically trained to perform narrowly defined tasks on specific hardware. The new models embody a more universal approach, enabling robots to adapt and apply their knowledge across different setups and scenarios.
Real-World Applications and Future Outlook
In another test, the Apollo robot was asked to sort clothing by color into two different bins. Initially, it performed the task without issue. Then, to test its adaptability, researchers moved the bins and clothing. The robot successfully reassessed the situation, demonstrating its ability to re-evaluate and adjust to dynamic environments.
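The re-evaluation behavior in this test amounts to a closed perception-action loop: re-read the scene before every action instead of trusting a stale plan. The sketch below is purely illustrative; the `world` dictionary and function names are invented stand-ins, not part of the Gemini Robotics stack.

```python
def perceive(world: dict) -> dict:
    """Re-read bin locations from the (simulated) environment."""
    return dict(world["bins"])

def place(color: str, bins: dict) -> str:
    return f"put {color} item in bin at {bins[color]}"

def sort_clothing(world: dict, items: list[str]) -> list[str]:
    """Re-perceive before each action, so the plan still succeeds
    if the bins are moved mid-task."""
    actions = []
    for color in items:
        bins = perceive(world)  # reassess the scene every step
        actions.append(place(color, bins))
    return actions

world = {"bins": {"white": "left", "dark": "right"}}
print(sort_clothing(world, ["white", "dark"]))
world["bins"]["white"] = "right"  # researchers move the bin
print(sort_clothing(world, ["white"]))
```

Because perception happens inside the loop rather than once up front, moving a bin between steps changes the next action instead of breaking the task.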
These capabilities are essential for the development of truly general-purpose robots. As the Gemini Robotics Team noted in their technical report, “General-purpose robots need a deep understanding of the physical world, advanced reasoning, and general and dexterous control.”
The models’ approach to task execution — breaking down complex problems into manageable steps — signals a new era in robotics. No longer restricted to repetitive or narrowly defined roles, robots powered by these AI models can potentially assist in a wide variety of everyday tasks, from household chores to industrial applications.
