
OpenAI researchers identify the mathematical causes of AI hallucinations

Mathematical analysis shows that error accumulation and binary scoring systems make confident fabrications unavoidable.

by Aytun Çelebi
September 17, 2025
in Research, Artificial Intelligence

Researchers at OpenAI have published a paper diagnosing why large language models like ChatGPT hallucinate, or confidently generate false information.

The study uses mathematical analysis to explain that hallucinations are an unavoidable result of how these models make predictions, even when trained on perfect data. The primary causes are error accumulation and flawed evaluation benchmarks.

How sequential predictions lead to errors

The paper explains that LLMs operate through an autoregressive process, predicting the next word in a sequence based on the words that came before it. This creates a chain where a single early error can propagate and amplify, leading to an entirely incorrect statement. The researchers’ mathematical proof shows that the error rate for generating a full sentence is at least twice the error rate of a simple yes/no question, simply because of this compounding effect.
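
The compounding claim is easiest to see with a toy calculation. The sketch below is not the paper's proof; it simply assumes a fixed, independent per-token error probability (an invented figure of 2 percent) and shows how the chance of at least one error grows with sequence length, whereas a single yes/no judgment risks only one error.

```python
def sentence_error_rate(p_token: float, length: int) -> float:
    """Probability that at least one of `length` generated tokens is wrong,
    assuming independent errors with per-token probability p_token."""
    return 1 - (1 - p_token) ** length

p = 0.02  # assumed per-token error probability (illustrative, not from the paper)
for length in (1, 5, 20, 50):
    print(f"{length:>3} tokens -> error rate {sentence_error_rate(p, length):.3f}")
# Roughly:  1 token -> 0.020 (comparable to a single yes/no judgment)
#           5 tokens -> 0.096
#          20 tokens -> 0.332
#          50 tokens -> 0.636 (errors compound across the sequence)
```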

This structural limitation means that hallucinations cannot be completely eliminated by scaling up computing power or improving training data, as the issue is inherent to the predictive architecture. The problem is worse for facts that appear infrequently in the training data. The study found that about 20% of the birthdays of notable figures appeared only once in the training set, leading to a baseline error rate of at least 20% for those queries.
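
A rough sketch of that singleton argument, using an invented toy corpus: in the paper's framing, the fraction of facts that appear exactly once in training acts as a floor on the error rate for queries about them.

```python
from collections import Counter

# Toy illustration only; the corpus below is invented. Facts seen a single
# time in training give the model no statistical basis for telling the true
# answer apart from a plausible-sounding alternative.
training_facts = ["a", "b", "b", "c", "c", "c", "d", "e", "e", "f"]

counts = Counter(training_facts)
singletons = sum(1 for c in counts.values() if c == 1)
singleton_fraction = singletons / len(training_facts)
print(f"Singleton fraction: {singleton_fraction:.0%}")  # 30% in this toy corpus

# In the paper's framing, this fraction lower-bounds the error rate on queries
# about those facts, which is how a 20% singleton rate for notable figures'
# birthdays translates into a baseline error rate of at least 20%.
```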

As a practical example, the researchers queried state-of-the-art models for the birthday of Adam Kalai, one of the paper’s authors. The models confidently provided several different incorrect dates, demonstrating a pattern of fabricating plausible-sounding details to fill knowledge gaps.

Evaluation benchmarks penalize honesty and encourage guessing

The study also critiques the benchmarks used to evaluate AI models. The researchers reviewed ten prominent AI benchmarks and found that nine of them use a binary grading system: A response is either 100% correct or 100% incorrect.

Under this system, an answer of “I don’t know” receives the same score as a completely wrong answer—zero.

This scoring method creates what the paper calls an “epidemic” of penalizing honesty. A mathematical proof included in the study demonstrates that this system incentivizes models to always guess an answer: any guess has a nonzero probability of being correct, so its expected score is strictly higher than that of abstaining. This explains why even advanced models default to confident fabrications rather than admitting uncertainty.
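
The incentive is easy to quantify. A minimal sketch, assuming the binary grader described above (1 point for a correct answer, 0 for anything else, including “I don’t know”):

```python
# Expected score under a binary grader: correct = 1, wrong or "I don't know" = 0.
def expected_score_if_guessing(p_correct: float) -> float:
    return p_correct * 1.0 + (1.0 - p_correct) * 0.0

def expected_score_if_abstaining() -> float:
    return 0.0

for p in (0.01, 0.10, 0.50):
    print(f"p={p:.2f}: guess={expected_score_if_guessing(p):.2f}, "
          f"abstain={expected_score_if_abstaining():.2f}")
# Even a 1% chance of being right gives guessing a strictly higher expected
# score than abstaining, so a model optimized against this grader always guesses.
```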

Proposed solutions and the trade-off between accuracy and user experience

To address this, the OpenAI researchers propose a new approach that integrates confidence estimation into both the model’s behavior and the evaluation process. Models would be trained to assess their own certainty and would be evaluated with a scoring system that penalizes incorrect answers more heavily than it rewards correct ones. For example, a prompt could instruct the model to “Answer only if you are more than 75 percent confident, since mistakes are penalized 3 points while correct answers receive 1 point.”
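
Under the example scoring in that prompt (+1 for a correct answer, -3 for a wrong one, 0 for abstaining), the 75 percent threshold falls out of a simple expected-value comparison; the sketch below just makes that arithmetic explicit.

```python
REWARD_CORRECT = 1.0   # points for a correct answer (from the article's example)
PENALTY_WRONG = 3.0    # points lost for an incorrect answer
# Abstaining ("I don't know") scores 0.

def expected_score(confidence: float) -> float:
    """Expected score of answering, given the model's confidence it is right."""
    return confidence * REWARD_CORRECT - (1.0 - confidence) * PENALTY_WRONG

def should_answer(confidence: float) -> bool:
    # Answering is worthwhile only when it beats the 0 points for abstaining,
    # i.e. when confidence * 1 - (1 - confidence) * 3 > 0, i.e. confidence > 0.75.
    return expected_score(confidence) > 0.0

for c in (0.60, 0.75, 0.90):
    print(f"confidence {c:.2f}: expected score {expected_score(c):+.2f}, "
          f"answer={should_answer(c)}")
# confidence 0.60: expected score -0.60, answer=False
# confidence 0.75: expected score +0.00, answer=False
# confidence 0.90: expected score +0.60, answer=True
```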

Implementing this would significantly reduce hallucinations, but it comes at a cost. The paper estimates that under such a system, models would respond with “I don’t know” to about 30% of user queries. This could be frustrating for users accustomed to receiving an immediate answer for everything, potentially driving them to less cautious competitor models.

The high computational cost of accurately measuring uncertainty also makes this approach impractical for high-volume consumer services. However, the paper notes that for high-stakes professional applications in fields like finance, medicine, or chip design, the cost of an error is far greater than the cost of computation, making uncertainty-aware systems not just viable but essential. The study concludes that the core incentives in consumer AI, which prioritize user engagement and speed, will ensure that hallucinations remain a persistent issue until those priorities shift.



Tags: AI, Featured, OpenAI, Research
