Google Reveals That Leading AI Chatbots Still Fail 30% Of The Time

Gemini 3 Pro emerged as the top-performing model with a 68.8% score, followed closely by Gemini 2.5 Pro and OpenAI's ChatGPT-5.

Google has revealed that even the best artificial intelligence (AI) chatbots reach an accuracy rate of only about 69%, according to a recent assessment. The company used its new FACTS Benchmark Suite to test the factual reliability of various AI models.

This benchmark has quantified a critical gap in AI performance, indicating that even leading models frequently produce incorrect information. In sectors such as finance, healthcare, and law, this inaccuracy poses significant risks, because erroneous yet confidently delivered responses could cause substantial damage.

The FACTS Benchmark Suite, developed by Google’s FACTS team in collaboration with Kaggle, specifically evaluates factual accuracy across four real-world application areas:

Parametric Knowledge: This tests a model’s ability to answer fact-based questions using only its pre-trained knowledge.
Search Performance: This assesses how effectively models leverage web tools to retrieve accurate information.
Grounding: This measures whether a model adheres to provided documents without introducing false details.
Multimodal Understanding: This examines the model’s accuracy in interpreting charts, diagrams, and images.

The assessments highlighted considerable performance differences among models. Gemini 3 Pro achieved the highest FACTS score at 68.8%, or roughly 69%. Gemini 2.5 Pro and OpenAI’s ChatGPT-5 followed at approximately 62%. Grok 4 registered approximately 54%, and Claude 4.5 Opus scored around 51%. Multimodal tasks were consistently the weakest area, often with accuracy below 50%. This weaker performance in understanding visual data, such as sales graphs or numerical information from documents, raises the risk of critical errors that are difficult to detect or correct.
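
The headline's "fail 30% of the time" figure is the complement of the top score. The short Python sketch below makes that arithmetic explicit for each model named in this article. It assumes a FACTS score can be read as a simple share of factually correct responses, which is the headline's interpretation and not a description of how Google computes the score. Every figure except the 68.8% is approximate in the article, and the names `reported_scores` and `error_rate` are illustrative only.

```python
# Scores as reported in this article, in percent. Only the Gemini 3 Pro
# figure is exact; the others are described as approximate.
reported_scores = {
    "Gemini 3 Pro": 68.8,
    "Gemini 2.5 Pro": 62.0,
    "ChatGPT-5": 62.0,
    "Grok 4": 54.0,
    "Claude 4.5 Opus": 51.0,
}


def error_rate(score_percent: float) -> float:
    """Return the complement of a score, in percent, rounded to one decimal.

    Assumption: the score is a plain percentage of correct responses.
    """
    return round(100.0 - score_percent, 1)


# Map each model to its implied share of responses that were not factual.
error_rates = {model: error_rate(score) for model, score in reported_scores.items()}

for model, rate in error_rates.items():
    print(f"{model}: roughly {rate}% of responses fell short")
```

On these figures, the best model's implied error rate is a little over 31%, which the headline rounds to 30%.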

Google stated that while AI technology continues to improve, it requires human oversight, verification, and robust guardrails before users can treat it as a reliable source of truth.



COPYRIGHT © DATACONOMY MEDIA GMBH, ALL RIGHTS RESERVED.
