Dataconomy

AI masters language but flunks LEGO 101

by Kerem Gülen
March 27, 2025
in Research

We hear constantly about the incredible feats of AI models like GPT-4o and Gemini: writing code, crafting poetry, acing exams. You might think these powerful Multimodal Large Language Models (MLLMs), which can work with both text and images, are well on their way to mastering everything. But what happens when you ask them to do something seemingly simple, like follow LEGO instructions?

According to a new study from researchers at the Shanghai AI Laboratory and Tongji University, the answer is: they largely fail. These AI wizards, it turns out, are surprisingly clumsy when it comes to understanding and reasoning about objects in space over multiple steps – a skill crucial for interacting with the real world.

Why test AI with LEGOs?

The researchers designed a clever benchmark called LEGO-Puzzles precisely because building LEGOs mirrors how humans develop “spatial intelligence.” Following those little diagrams requires understanding 3D shapes, how they fit together, their orientation, and the correct sequence of actions. If an AI can’t handle that, how can we expect it to guide a robot arm assembling a product or navigate a self-driving car through a complex construction zone?

The LEGO-Puzzles benchmark isn’t child’s play. It includes over 1,100 visual questions spanning 11 different tasks. These range from basic checks (“Is this piece taller than that one?”, “Are these two blocks touching?”) to complex sequences (“Put these assembly steps in the right order,” “Which image shows the wrong step?”).
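What does a benchmark like this look like from the evaluator's side? Below is a minimal Python sketch, assuming each question is multiple choice with a single correct letter, of how items could be stored and scored task by task. The `Item` fields, task names and sample questions are hypothetical illustrations, not the actual LEGO-Puzzles data format.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Item:
    # Hypothetical schema for illustration; the real LEGO-Puzzles files may differ.
    task: str           # task label, e.g. "height" or "ordering"
    question: str
    options: list[str]  # answer choices shown to the model
    answer: str         # correct option letter, e.g. "B"


def per_task_accuracy(items, predictions):
    """Return {task: fraction correct}, matching items to predicted letters by position."""
    if len(items) != len(predictions):
        raise ValueError("need exactly one prediction per item")
    correct = defaultdict(int)
    total = defaultdict(int)
    for item, pred in zip(items, predictions):
        total[item.task] += 1
        if pred.strip().upper() == item.answer:
            correct[item.task] += 1
    return {task: correct[task] / total[task] for task in total}


items = [
    Item("height", "Is brick 1 taller than brick 2?", ["A. Yes", "B. No"], "A"),
    Item("height", "Is brick 3 taller than brick 4?", ["A. Yes", "B. No"], "B"),
    Item("ordering", "Put these steps in order.", ["A. 1-2-3", "B. 2-1-3", "C. 3-1-2"], "C"),
]
scores = per_task_accuracy(items, ["A", "A", "c"])
print(scores)  # {'height': 0.5, 'ordering': 1.0}
```

Scoring per task rather than as one overall number is what lets a study say a model is fine at one kind of question and hopeless at another.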

The surprising scorecard: AI vs humans

So, how did today’s top AI models fare on these LEGO challenges? The results were striking, and frankly, a bit embarrassing for the AI.

  • Massive gap: Even the best models, like OpenAI’s GPT-4o and Google’s Gemini-2.0-Flash, answered only about 50-58% of the questions correctly.
  • Human triumph: Human participants, in contrast, breezed through the puzzles with over 90% accuracy.
  • Open-source struggles: Many open-source MLLMs performed only slightly better than random guessing. Some failed specific tasks outright, such as ordering assembly steps, at times outputting the same wrong letter for almost every question.
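“Slightly better than random guessing” has a precise meaning: with k answer options, a blind guess is right 1/k of the time. The sketch below computes that baseline and also flags the “same letter every time” failure mode by measuring how often a model repeats its favorite answer. It assumes four options per question and a 75% cut-off, both purely illustrative.

```python
from collections import Counter


def chance_accuracy(num_options_per_item):
    """Expected accuracy of uniform random guessing: the mean of 1/k over all items."""
    return sum(1 / k for k in num_options_per_item) / len(num_options_per_item)


def dominant_answer_share(predictions):
    """Return the most frequently predicted letter and the fraction of answers using it."""
    counts = Counter(p.strip().upper() for p in predictions)
    letter, count = counts.most_common(1)[0]
    return letter, count / len(predictions)


chance = chance_accuracy([4] * 100)  # assumes every item offers four options
letter, share = dominant_answer_share(["B", "B", "B", "A", "B"])
print(chance)                        # 0.25
if share > 0.75:                     # hypothetical cut-off for a "stuck" model
    print(f"Model answered {letter} on {share:.0%} of items")  # Model answered B on 80% of items
```

A model stuck on one letter can still land near the chance baseline, which is why the answer distribution is worth checking alongside raw accuracy.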

The AI particularly struggled with tasks involving:

  • Height perception: Models often mistook an object’s apparent height in the 2D image for its real height in 3D (think optical illusions).
  • Rotation: Understanding how objects look after being turned.
  • Multi-step reasoning: The more steps involved in a sequence, the worse the AI performed, highlighting a failure to track changes over time.
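The first two weaknesses have simple geometry behind them. In a pinhole-camera model, an object's height in the image is focal length × true height ÷ distance, so a short brick up close can look exactly as tall as a tall brick farther away; and rotating a brick changes which of its dimensions faces the viewer. The third function is a back-of-the-envelope illustration of why long sequences are hard: if each step were tracked correctly with the same probability, independently of the others (a simplifying assumption, not a finding of the study), errors would compound with every added step. All function names and numbers here are hypothetical.

```python
import math


def projected_height(focal_length, true_height, depth):
    """Pinhole-camera image height of an upright object: f * H / Z."""
    return focal_length * true_height / depth


def rotated_extent(width, depth, angle_deg):
    """Axis-aligned (x, y) extent of a width x depth footprint after rotating about the vertical axis."""
    a = math.radians(angle_deg)
    corners = [(x, y) for x in (0, width) for y in (0, depth)]
    xs = [x * math.cos(a) - y * math.sin(a) for x, y in corners]
    ys = [x * math.sin(a) + y * math.cos(a) for x, y in corners]
    # Round away floating-point noise such as cos(90°) being about 6e-17 instead of 0.
    return round(max(xs) - min(xs), 6), round(max(ys) - min(ys), 6)


def sequence_success(step_accuracy, num_steps):
    """Chance of getting every step right IF steps were independent (a simplifying assumption)."""
    return step_accuracy ** num_steps


# A 2-unit brick 10 units away and a 4-unit brick 20 units away look equally tall.
print(projected_height(50, 2, 10), projected_height(50, 4, 20))  # 10.0 10.0
# Turning a 2x4 footprint by 90 degrees swaps its extents.
print(rotated_extent(2, 4, 90))  # (4.0, 2.0)
# Under the independence assumption, 90% per step falls to about 59% over five steps.
print(round(sequence_success(0.9, 5), 2))  # 0.59
```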

Can AI even show us the next step?

Perhaps even more telling was the image generation test. Researchers asked MLLMs to generate an image showing the result of a specific LEGO assembly step.

The outcome? A near-total failure. Most models either ignored the instructions, simply copied the input image, or generated something completely unrelated. Only Gemini-2.0-Flash and GPT-4o showed a “limited ability” – Gemini was better at editing the existing image accurately, while GPT-4o seemed to regenerate the scene conceptually, often losing visual consistency. The open-source models were hopelessly lost.
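Spotting the “simply copied the input image” failure can be partly automated. This minimal sketch compares input and output pixel by pixel and flags near-identical pairs; the images are tiny grayscale arrays and the threshold is a hypothetical choice. It only catches copying. Whether a changed image shows the correct next step still needs a separate check, and the article doesn't say how the researchers graded these outputs.

```python
def mean_abs_diff(img_a, img_b):
    """Mean absolute per-pixel difference between two same-sized grayscale images (0-255)."""
    if len(img_a) != len(img_b) or any(len(r1) != len(r2) for r1, r2 in zip(img_a, img_b)):
        raise ValueError("images must have the same shape")
    total = sum(abs(p - q) for r1, r2 in zip(img_a, img_b) for p, q in zip(r1, r2))
    count = sum(len(row) for row in img_a)
    return total / count


def looks_copied(input_img, output_img, threshold=2.0):
    """Flag an output that barely differs from the input. The threshold is a hypothetical choice."""
    return mean_abs_diff(input_img, output_img) < threshold


before = [[10, 20], [30, 40]]
print(looks_copied(before, [[10, 21], [30, 40]]))   # True: one pixel changed by 1
print(looks_copied(before, [[200, 20], [30, 40]]))  # False: a visible change
```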

This research exposes a critical weakness in current AI development. While models excel at pattern matching in language and static images, they lack a robust grasp of multi-step spatial reasoning – the dynamic understanding of how things work in physical space and time.

The study found that even prompting techniques like “Chain-of-Thought” (asking the AI to “think step-by-step”), which often help with text problems, provided minimal benefit and sometimes even hindered performance on these spatial tasks, especially complex ones.
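In practice, Chain-of-Thought usually means little more than a different prompt plus a parser that digs the final answer out of the model's reasoning. The sketch below shows both halves for a multiple-choice question; the templates and the assumption that options run from A to D are illustrative, not the prompts used in the study.

```python
import re

DIRECT_TEMPLATE = "{question}\nOptions: {options}\nAnswer with a single letter."
COT_TEMPLATE = (
    "{question}\nOptions: {options}\n"
    "Think step by step, then end with 'Answer: <letter>'."
)


def build_prompt(question, options, chain_of_thought=False):
    """Fill the direct or step-by-step template for one multiple-choice item."""
    template = COT_TEMPLATE if chain_of_thought else DIRECT_TEMPLATE
    return template.format(question=question, options=" ".join(options))


def extract_letter(response):
    """Return the chosen letter (A to D), or None if the reply cannot be parsed."""
    text = response.strip()
    if re.fullmatch(r"[A-Da-d]", text):  # direct prompts: a bare letter
        return text.upper()
    # CoT prompts: take the last explicit "Answer: X" in the reasoning.
    matches = re.findall(r"Answer:\s*([A-D])\b", text, flags=re.IGNORECASE)
    return matches[-1].upper() if matches else None


print(build_prompt("Which brick is taller?", ["A. Left", "B. Right"], chain_of_thought=True))
print(extract_letter(" b "))                                     # B
print(extract_letter("The red brick is closer... Answer: A"))    # A
print(extract_letter("I am not sure."))                          # None
```

With a parser like this, a step-by-step reply that never states a clean “Answer: X” counts as unanswered, so how answers are extracted can itself shape a CoT score.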

It seems that truly understanding our 3D world and how actions unfold within it requires more than just processing massive amounts of text and images. MLLMs need better ways to represent space, track changes sequentially, and perhaps develop a form of “visual memory.”


Featured image credit: Kerem Gülen/Imagen 3
