Dataconomy

Study finds ChatGPT-5 has 25% error rate

New research shows OpenAI’s ChatGPT-5 makes 45% fewer factual errors and six times fewer hallucinations than GPT-4 but still answers incorrectly in about a quarter of cases.

by Kerem Gülen
September 25, 2025
in Artificial Intelligence

A study of OpenAI’s ChatGPT-5 model found that it generates incorrect answers in approximately 25% of cases. According to a Tom’s Guide report, the research attributes these inaccuracies to inherent limitations in the model’s training data and its probabilistic reasoning architecture.

The model shows a notable reduction in errors compared with its predecessor, GPT-4, registering 45% fewer factual mistakes and six times fewer “hallucinated,” or entirely fabricated, answers. Despite these advances, the study finds that ChatGPT-5 can still be overconfident, presenting factually incorrect information with a high degree of certainty. Hallucination, though diminished, remains a core issue affecting its reliability.
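
The two improvement figures are stated in different forms ("45% fewer" and "six times fewer"), which makes them hard to compare at a glance. The short Python sketch below puts both on the same percentage scale. It assumes that "six times fewer" means the new count is one-sixth of the old one; the article does not spell out that definition, and the function names are illustrative only.

```python
def reduction_from_factor(times_fewer: float) -> float:
    """Percent reduction implied by 'N times fewer', assuming new = old / N."""
    if times_fewer <= 0:
        raise ValueError("times_fewer must be positive")
    return round((1 - 1 / times_fewer) * 100, 1)


def remaining_share(percent_fewer: float) -> float:
    """Percent of the old count that remains after a 'P% fewer' reduction."""
    if not 0 <= percent_fewer <= 100:
        raise ValueError("percent_fewer must be between 0 and 100")
    return round(100 - percent_fewer, 1)


# Figures quoted in the article (relative to GPT-4).
hallucination_reduction = reduction_from_factor(6)  # "six times fewer"
factual_errors_remaining = remaining_share(45)      # "45% fewer"

print(f"Hallucinations reduced by about {hallucination_reduction}%")
print(f"Factual errors remaining: about {factual_errors_remaining}% of GPT-4's count")
```

Under that assumption, the hallucination figure corresponds to a much larger relative drop than the factual-error figure, although neither says anything about absolute counts, which the article does not report.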

Performance accuracy varies significantly depending on the specific domain of the task. For example, the model achieved a 94.6% accuracy score on the 2025 AIME mathematics test and a 74.9% success rate on a set of real-world coding assignments. The research indicates that errors become more prevalent in tasks that involve general knowledge or require complex, multi-step reasoning, where the model’s performance is less consistent.

When evaluated against the MMLU Pro benchmark, a rigorous academic test covering a wide range of subjects including science, mathematics, and history, ChatGPT-5 scored approximately 87% accuracy. The study identifies several underlying causes for the remaining errors. These include an inability to fully comprehend nuanced questions, reliance on training data that may be outdated or incomplete, and the model’s fundamental design as a probabilistic pattern-prediction mechanism, which can generate responses that are plausible but not factually correct.
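
To set the benchmark scores beside the headline 25% figure, the accuracy numbers quoted in the article can be turned into implied error rates. The sketch below assumes that every answer not counted as correct is an error (error = 100 − accuracy), which may not match how each benchmark is actually scored; the dictionary labels and function name are illustrative.

```python
# Accuracy figures (in percent) as quoted in the article.
# The MMLU Pro value is described there as approximate.
reported_accuracy = {
    "AIME 2025 (math)": 94.6,
    "Real-world coding tasks": 74.9,
    "MMLU Pro (approx.)": 87.0,
}


def error_rate(accuracy_pct: float) -> float:
    """Error rate in percent implied by an accuracy given in percent."""
    if not 0.0 <= accuracy_pct <= 100.0:
        raise ValueError("accuracy must be between 0 and 100")
    return round(100.0 - accuracy_pct, 1)


implied_errors = {name: error_rate(acc) for name, acc in reported_accuracy.items()}

for name, err in implied_errors.items():
    print(f"{name}: about {err}% incorrect")
```

Read this way, the implied error rate differs widely between tasks, which is consistent with the article's point that accuracy depends heavily on the domain.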

Based on these findings, the report advises users to independently verify any critical information produced by ChatGPT-5. This recommendation is especially pertinent for professional, academic, or health-related inquiries where precision is essential. The persistent error rate, despite marked improvements, underscores the need for cautious use and external validation of the model’s outputs.



