Google's Gemini AI Experiment With Robots

Can a robot learn its way around an office from a video tour and a few spoken directions? Google’s Gemini AI experiment with robots set out to test exactly that.

Google’s Gemini AI experiment focused on improving robot navigation using Gemini 1.5 Pro. The model can process a large amount of context, up to 1 million tokens, which lets robots interpret and act on human instructions, video tours, and other multimodal inputs when navigating.

The most important feature of Gemini 1.5 Pro for this work is its long context window, which lets robots retain and use detailed spatial information over extended periods. This matters when navigating complex, changing environments without traditional mapping solutions.
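To get a feel for why the context length matters, here is a back-of-the-envelope sketch of whether a video tour fits in a 1 million token window. The function name, the sampling rate, and the tokens-per-frame figure are illustrative assumptions, not numbers from Google's experiment.

```python
def tour_token_budget(tour_seconds, frames_per_second, tokens_per_frame,
                      context_tokens=1_000_000, reserved_tokens=0):
    """Estimate the tokens needed for a sampled video tour.

    All inputs are hypothetical planning values. reserved_tokens leaves
    room for the text instruction and the model's reply.
    """
    frames = int(tour_seconds * frames_per_second)
    needed = frames * tokens_per_frame + reserved_tokens
    return needed, needed <= context_tokens


# Assumed example: a 15-minute tour sampled at 1 frame per second,
# with a round 250 tokens per frame (an assumption for illustration).
needed, fits = tour_token_budget(
    tour_seconds=15 * 60,
    frames_per_second=1,
    tokens_per_frame=250,
    reserved_tokens=2_000,
)
print(needed, fits)  # 227000 True
```

Under these assumed values the whole tour takes up under a quarter of the window, which leaves room for further instructions and reference material.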

How can Gemini 1.5 Pro’s long context window help robots navigate the world? 🤖

A thread of our latest experiments. 🧵 pic.twitter.com/ZRQqQDEw98

— Google DeepMind (@GoogleDeepMind) July 11, 2024

During the experiment, robots received instructions through several types of input:

Human instructions: Clear verbal commands and descriptive cues that guide robots to specific locations within a designated space.
Video tours: Visual representations of the environment, which help robots create a mental map and understand spatial relationships.
Map sketches, audio references, and visual markers: Additional cues such as map sketches on whiteboards, audio instructions referencing key locations, and objects like toys or boxes placed at specific points in the environment.

The experiment took place in a real-world operating area of more than 9,000 square feet. Within this space, the robots were given 57 tasks, each of which required them to navigate autonomously based on the inputs provided.

We took the robots on a tour of specific areas in a real-world setting, highlighting key places to recall – such as "Lewis’s desk" or "temporary desk area". Then, they were asked to lead us to these locations. 🏢

Watch more. ↓ pic.twitter.com/Sptm6q31CL

— Google DeepMind (@GoogleDeepMind) July 11, 2024

Performance and success rate of Gemini-powered robots

According to Google’s findings, the Gemini-enabled robots achieved a 90% success rate across the 57 tasks. The result suggests that Gemini 1.5 Pro can improve robot autonomy and operational efficiency in complex environments.

Behind the scenes, the Gemini AI system processes the multimodal inputs received from the environment. It builds topological graphs, which are simplified representations of spatial connectivity derived from video frames and contextual instructions. These graphs serve as navigational maps that guide robots in real time, so they can navigate without continuous external mapping updates.
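The idea of a topological graph can be sketched in a few lines of Python. This is a minimal illustration, not Google's implementation: it assumes each tour video frame becomes a node, consecutive frames are connected, and a plain dictionary lookup stands in for the model's choice of goal frame. The function names, frame indices, and the extra link are all hypothetical.

```python
from collections import deque


def build_tour_graph(num_frames, extra_links=()):
    """Link consecutive tour frames; extra_links joins frames that show
    the same place from different points in the tour (assumed input)."""
    graph = {i: set() for i in range(num_frames)}
    for i in range(num_frames - 1):
        graph[i].add(i + 1)
        graph[i + 1].add(i)
    for a, b in extra_links:
        graph[a].add(b)
        graph[b].add(a)
    return graph


def shortest_path(graph, start, goal):
    """Breadth-first search; returns a list of frame indices or None."""
    parents = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while node is not None:
                path.append(node)
                node = parents[node]
            return path[::-1]
        for neighbor in sorted(graph[node]):
            if neighbor not in parents:
                parents[neighbor] = node
                queue.append(neighbor)
    return None


# Hypothetical six-frame tour with two places named during the tour.
landmarks = {"Lewis's desk": 2, "temporary desk area": 5}
graph = build_tour_graph(6, extra_links=[(1, 5)])
route = shortest_path(graph, start=0, goal=landmarks["temporary desk area"])
print(route)  # [0, 1, 5]
```

A graph like this stores only which places connect to which, not exact distances or geometry, which is what makes it lighter than a conventional metric map.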

Google's Gemini AI experiment with robots (Credit: Google DeepMind)

Need a recap? Google uses Gemini AI to train its robots for improved navigation and task completion. With Gemini 1.5 Pro’s long context window, robots can process extensive information and respond to natural language instructions more effectively. Researchers teach robots to understand their surroundings by filming video tours of environments such as homes or offices. The Gemini-equipped robots achieved a 90% success rate across 57 tasks in an area of more than 9,000 square feet. Gemini also helps robots plan actions beyond navigation, such as fetching food from the fridge. Processing delays of 10–30 seconds per instruction remain, and Google aims to advance these capabilities in future research.

Featured image credit: Google DeepMind/X

COPYRIGHT © DATACONOMY MEDIA GMBH, ALL RIGHTS RESERVED.