Dataconomy

Google reveals that leading AI chatbots still fail 30% of the time

Gemini 3 Pro emerged as the top-performing model with a 68.8% score, followed closely by Gemini 2.5 Pro and OpenAI's ChatGPT-5.

by Emre Çıtak
December 19, 2025
in Research

Google revealed that artificial intelligence (AI) chatbots achieved an accuracy rate of 69% at best, according to a recent assessment. The company used its new FACTS Benchmark Suite to test the factual reliability of various AI models.

The benchmark quantifies a critical gap in AI performance, indicating that even leading models frequently produce incorrect information. In sectors such as finance, healthcare, and law, this inaccuracy poses significant risks, because erroneous yet confidently delivered responses could cause substantial damage.

The FACTS Benchmark Suite, developed by Google’s FACTS team in collaboration with Kaggle, specifically evaluates factual accuracy across four real-world application areas:

  • Parametric Knowledge: This tests a model’s ability to answer fact-based questions using only its pre-trained knowledge.
  • Search Performance: This assesses how effectively models leverage web tools to retrieve accurate information.
  • Grounding: This measures whether a model adheres to provided documents without introducing false details.
  • Multimodal Understanding: This examines the model’s accuracy in interpreting charts, diagrams, and images.

The assessments highlighted considerable performance differences among models. Gemini 3 Pro achieved the highest FACTS score at roughly 69%. Gemini 2.5 Pro and OpenAI’s ChatGPT-5 followed at approximately 62%. Grok 4 registered approximately 54%, and Claude 4.5 Opus scored around 51%. Multimodal tasks were consistently the weakest area, with accuracy often below 50%. This weaker performance in interpreting visual data, such as sales graphs or numerical information in documents, raises the risk of critical errors that are difficult to detect or correct.
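To make the headline figure concrete, here is a minimal Python sketch that turns the scores reported above into implied error rates. It assumes, as a simplification not confirmed by the article, that the share of failed responses is simply 100 minus the FACTS score. The scores are the approximate values quoted in this article, and the names `REPORTED_SCORES`, `implied_error_rate`, and `ranked_error_rates` are hypothetical, chosen only for illustration.

```python
# Approximate FACTS scores (percent) as quoted in this article.
# These are rounded, reported figures, not official benchmark data.
REPORTED_SCORES = {
    "Gemini 3 Pro": 68.8,
    "Gemini 2.5 Pro": 62.0,
    "ChatGPT-5": 62.0,
    "Grok 4": 54.0,
    "Claude 4.5 Opus": 51.0,
}


def implied_error_rate(score_percent: float) -> float:
    """Return 100 minus the score, rounded to one decimal place.

    Assumption: treating the remainder as the "failure rate" is a
    simplification; the benchmark itself may define errors differently.
    """
    if not 0.0 <= score_percent <= 100.0:
        raise ValueError("score_percent must be between 0 and 100")
    return round(100.0 - score_percent, 1)


def ranked_error_rates(scores: dict[str, float]) -> list[tuple[str, float]]:
    """Return (model, implied error rate) pairs, lowest error rate first."""
    pairs = [(model, implied_error_rate(s)) for model, s in scores.items()]
    return sorted(pairs, key=lambda pair: pair[1])


if __name__ == "__main__":
    for model, error in ranked_error_rates(REPORTED_SCORES):
        print(f"{model}: ~{error}% of responses not scored as accurate")
```

Under this assumption, the top score of 68.8% corresponds to an implied error rate of about 31%, which is where the "fail 30% of the time" framing in the headline comes from.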

Google stated that while AI technology continues to improve, it requires human oversight, verification, and robust guardrails before users can treat it as a reliable source of truth.



Tags: AI, Google


COPYRIGHT © DATACONOMY MEDIA GMBH, ALL RIGHTS RESERVED.
