Dataconomy

Google reveals that leading AI chatbots still fail 30% of the time

Gemini 3 Pro emerged as the top-performing model with a 68.8% score, followed closely by Gemini 2.5 Pro and OpenAI's ChatGPT-5.

by Emre Çıtak
December 19, 2025
in Research

Google has reported that even the best artificial intelligence (AI) chatbots reach an accuracy rate of only about 69%, according to a recent assessment. The company used its new FACTS Benchmark Suite to test the factual reliability of various AI models.

The benchmark quantifies a critical gap in AI performance: even leading models frequently produce incorrect information. In sectors such as finance, healthcare, and law, that inaccuracy poses significant risks, because erroneous yet confidently delivered responses could cause substantial damage.

The FACTS Benchmark Suite, developed by Google’s FACTS team in collaboration with Kaggle, specifically evaluates factual accuracy across four real-world application areas:

  • Parametric Knowledge: This tests a model’s ability to answer fact-based questions using only its pre-trained knowledge.
  • Search Performance: This assesses how effectively models leverage web tools to retrieve accurate information.
  • Grounding: This measures whether a model adheres to provided documents without introducing false details.
  • Multimodal Understanding: This examines the model’s accuracy in interpreting charts, diagrams, and images.
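
As a rough illustration of how results from these four areas might be combined into a single figure, the sketch below takes one accuracy value per area and returns their unweighted mean. The article does not say how the overall FACTS score is aggregated, so the equal weighting is an assumption, and the names `FactsArea` and `composite_score` are hypothetical.

```python
from enum import Enum
from statistics import mean


class FactsArea(Enum):
    """The four application areas named in the article."""
    PARAMETRIC = "Parametric Knowledge"
    SEARCH = "Search Performance"
    GROUNDING = "Grounding"
    MULTIMODAL = "Multimodal Understanding"


def composite_score(accuracy_by_area: dict[FactsArea, float]) -> float:
    """Return the unweighted mean accuracy (in percent) across all four areas.

    Assumption: equal weighting. The article does not describe the
    actual aggregation used by the benchmark.
    """
    missing = set(FactsArea) - set(accuracy_by_area)
    if missing:
        names = ", ".join(sorted(area.value for area in missing))
        raise ValueError(f"missing accuracy for: {names}")
    return mean(accuracy_by_area[area] for area in FactsArea)
```

Requiring all four areas avoids silently reporting a composite that leaves out a weak category.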

The assessment showed considerable performance differences among models. Gemini 3 Pro achieved the highest FACTS score at 69%. Gemini 2.5 Pro and OpenAI’s ChatGPT-5 followed at approximately 62%. Grok 4 registered approximately 54%, and Claude 4.5 Opus scored around 51%. Multimodal tasks were consistently the weakest area, often with accuracy below 50%. Weak performance on visual data, such as reading sales graphs or extracting numbers from documents, risks critical errors that are difficult to detect or correct.
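
The headline's "fail 30% of the time" follows from these scores by subtraction. The minimal sketch below uses the rounded figures reported in this article (several of them stated as approximate) and treats everything that is not scored as accurate as a failure, which is a simplifying assumption. The helper names are hypothetical.

```python
# Rounded FACTS scores as reported in the article, in percent.
reported_scores = {
    "Gemini 3 Pro": 69,
    "Gemini 2.5 Pro": 62,
    "ChatGPT-5": 62,
    "Grok 4": 54,
    "Claude 4.5 Opus": 51,
}


def error_rate(accuracy_pct: float) -> float:
    """Share of responses not scored as accurate, in percent."""
    return 100 - accuracy_pct


def rank_by_score(scores: dict[str, float]) -> list[tuple[str, float]]:
    """Sort models from highest to lowest score (ties keep input order)."""
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


for model, score in rank_by_score(reported_scores):
    print(f"{model}: {score}% accurate, about {error_rate(score)}% not accurate")
```

For the top model this gives an error rate of about 31%, which the headline rounds to 30%.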

Google stated that while AI technology continues to improve, it requires human oversight, verification, and robust guardrails before users can treat it as a reliable source of truth.



Tags: AI, Google

COPYRIGHT © DATACONOMY MEDIA GMBH, ALL RIGHTS RESERVED.
