Dataconomy

Google reveals that leading AI chatbots still fail 30% of the time

Gemini 3 Pro emerged as the top-performing model with a 68.8% score, followed closely by Gemini 2.5 Pro and OpenAI's ChatGPT-5.

by Emre Çıtak
December 19, 2025
in Research

Google reported that even the best-performing artificial intelligence (AI) chatbots answer accurately only about 69% of the time, according to a recent assessment. The company used its new FACTS Benchmark Suite to test the factual reliability of various AI models.

The benchmark quantifies a critical gap in AI performance, indicating that even leading models frequently produce incorrect information. For sectors such as finance, healthcare, and law, this inaccuracy presents significant risks, because erroneous yet confidently delivered responses could cause substantial damage.

The FACTS Benchmark Suite, developed by Google’s FACTS team in collaboration with Kaggle, specifically evaluates factual accuracy across four real-world application areas:

  • Parametric Knowledge: This tests a model’s ability to answer fact-based questions using only its pre-trained knowledge.
  • Search Performance: This assesses how effectively models leverage web tools to retrieve accurate information.
  • Grounding: This measures whether a model adheres to provided documents without introducing false details.
  • Multimodal Understanding: This examines the model’s accuracy in interpreting charts, diagrams, and images.

The assessments highlighted considerable performance differences among models. Gemini 3 Pro achieved the highest FACTS score at 68.8%, or roughly 69%. Gemini 2.5 Pro and OpenAI’s ChatGPT-5 followed at approximately 62%. Grok 4 registered approximately 54%, and Claude 4.5 Opus scored around 51%.

Multimodal tasks were consistently the weakest area, often showing accuracy below 50%. This weaker performance in interpreting visual data, such as sales graphs or numerical information from documents, raises the risk of critical errors that are difficult to detect or correct.
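
To connect these scores to the headline figure, the short Python sketch below converts each reported FACTS score into an approximate error rate (100 minus the score). The numbers are the rounded values quoted in this article, not official leaderboard data, and the names `reported_scores` and `error_rate` are illustrative only. It also assumes that every response not counted as accurate counts as a failure, which is a simplification.

```python
# Approximate FACTS scores as quoted in the article (percent of accurate answers).
# Values marked "approximately" in the text are rounded and may differ from
# the official figures.
reported_scores = {
    "Gemini 3 Pro": 68.8,
    "Gemini 2.5 Pro": 62.0,   # "approximately 62%"
    "ChatGPT-5": 62.0,        # "approximately 62%"
    "Grok 4": 54.0,           # "approximately 54%"
    "Claude 4.5 Opus": 51.0,  # "around 51%"
}


def error_rate(score_percent: float) -> float:
    """Return the share of answers that were not accurate, in percent."""
    return round(100.0 - score_percent, 1)


error_rates = {model: error_rate(score) for model, score in reported_scores.items()}

# Print models from lowest to highest error rate.
for model, rate in sorted(error_rates.items(), key=lambda item: item[1]):
    print(f"{model}: roughly {rate}% of answers inaccurate")
```

On these figures, the top model still misses on roughly 31% of answers, which is where the "30% of the time" in the headline comes from.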

Google stated that while AI technology continues to improve, it requires human oversight, verification, and robust guardrails before users can treat it as a reliable source of truth.


Tags: AI, Google

COPYRIGHT © DATACONOMY MEDIA GMBH, ALL RIGHTS RESERVED.
