OpenAI researchers identify the mathematical causes of AI hallucinations

Mathematical analysis shows that error accumulation and binary scoring systems make confident fabrications unavoidable.

by Aytun Çelebi
September 17, 2025
in Research, Artificial Intelligence

Researchers at OpenAI have published a paper diagnosing why large language models like ChatGPT hallucinate, or confidently generate false information.

The study uses mathematical analysis to explain that hallucinations are an unavoidable result of how these models make predictions, even when trained on perfect data. The primary causes are error accumulation and flawed evaluation benchmarks.

How sequential predictions lead to errors

The paper explains that LLMs operate through an autoregressive process, predicting the next word in a sequence based on the words that came before it. This creates a chain where a single early error can propagate and amplify, leading to an entirely incorrect statement. The researchers’ mathematical proof shows that the error rate for generating a full sentence is at least twice the error rate of a simple yes/no question, simply because of this compounding effect.
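
The paper derives this bound formally; as a rough, independent illustration of the compounding effect (not the authors' derivation), the sketch below assumes each next-token prediction is correct with an independent probability of 98% and shows how quickly the chance of a fully error-free sentence falls as the sequence grows.

```python
# Illustrative sketch: how a small per-token error rate compounds over an
# autoregressive generation. Assumes each next-token prediction is
# independently correct with probability 1 - per_token_error (a simplification).

def sentence_error_rate(per_token_error: float, num_tokens: int) -> float:
    """Probability that at least one token in the generated sequence is wrong."""
    p_all_correct = (1.0 - per_token_error) ** num_tokens
    return 1.0 - p_all_correct

for n in (1, 5, 20, 50):
    print(f"{n:>3} tokens: {sentence_error_rate(0.02, n):.1%} chance of at least one error")
```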

This structural limitation means that hallucinations cannot be completely eliminated by scaling up computing power or improving training data, as the issue is inherent to the predictive architecture. The problem is worse for facts that appear infrequently in the training data. The study found that about 20% of the birthdays of notable figures appeared only once in the training set, leading to a baseline error rate of at least 20% for those queries.
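
The intuition behind that floor can be sketched with made-up data (the corpus below is hypothetical, not the paper's): a fact mentioned only once gives the model no repeated signal to learn from, so the share of such "singleton" facts sets a baseline error rate for those queries.

```python
from collections import Counter

# Hypothetical corpus of (person -> birthday) mentions, for illustration only.
birthday_mentions = ["A", "A", "A", "B", "C", "C", "D", "E"]

counts = Counter(birthday_mentions)
# Facts seen exactly once cannot be reliably distinguished from noise,
# so their share acts as a floor on the error rate for such lookups.
singleton_fraction = sum(1 for c in counts.values() if c == 1) / len(counts)
print(f"Singleton fraction (baseline error floor): {singleton_fraction:.0%}")
```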

As a practical example, the researchers queried state-of-the-art models for the birthday of Adam Kalai, one of the paper’s authors. The models confidently provided several different incorrect dates, demonstrating a pattern of fabricating plausible-sounding details to fill knowledge gaps.

Evaluation benchmarks penalize honesty and encourage guessing

The study also critiques the benchmarks used to evaluate AI models. The researchers reviewed ten prominent AI benchmarks and found that nine of them use a binary grading system: A response is either 100% correct or 100% incorrect.

Under this system, an answer of “I don’t know” receives the same score as a completely wrong answer—zero.

This scoring method creates what the paper calls an “epidemic” of penalizing honesty. A mathematical proof included in the study demonstrates that this system incentivizes models to always guess, since any guess has a nonzero probability of being correct and therefore a higher expected score than abstaining. This explains why even advanced models default to confident fabrications rather than admitting uncertainty.
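
The incentive is easy to see numerically. The snippet below is a simple illustration of the argument (not the paper's proof): under a 1-point-or-nothing benchmark, any guess with a nonzero chance of being right beats abstaining.

```python
# Expected scores under a binary benchmark: 1 point for a correct answer,
# 0 for anything else, including an honest "I don't know".

def expected_score_guess(p_correct: float) -> float:
    return p_correct * 1.0  # wrong guesses score 0, same as abstaining

def expected_score_abstain() -> float:
    return 0.0

for p in (0.01, 0.10, 0.50):
    print(f"p(correct)={p:.2f}: guess={expected_score_guess(p):.2f}, "
          f"abstain={expected_score_abstain():.2f}")
# Any nonzero chance of being right makes guessing strictly better than abstaining.
```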

Proposed solutions and the trade-off between accuracy and user experience

To address this, the OpenAI researchers propose a new approach that integrates confidence estimation into both the model’s behavior and the evaluation process. Models would be trained to assess their own certainty and would be evaluated with a scoring system that penalizes incorrect answers more heavily than it rewards correct ones. For example, a prompt could instruct the model to “Answer only if you are more than 75 percent confident, since mistakes are penalized 3 points while correct answers receive 1 point.”
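
To make the arithmetic behind that example prompt concrete, here is a minimal sketch assuming exactly the scoring it describes (+1 for a correct answer, -3 for a wrong one, 0 for abstaining) and a model that reports a calibrated confidence; the 75% figure falls out as the break-even point.

```python
# Break-even confidence under the quoted scoring rule: +1 correct, -3 wrong,
# 0 for abstaining. Answering only pays off above the break-even confidence.

REWARD_CORRECT = 1.0
PENALTY_WRONG = 3.0

def expected_score(p_correct: float) -> float:
    return p_correct * REWARD_CORRECT - (1.0 - p_correct) * PENALTY_WRONG

break_even = PENALTY_WRONG / (REWARD_CORRECT + PENALTY_WRONG)  # 3 / 4 = 75%
print(f"Answer only when confidence exceeds {break_even:.0%}")

for p in (0.60, 0.75, 0.90):
    verdict = "answer" if expected_score(p) > 0 else "say 'I don't know'"
    print(f"confidence {p:.0%}: expected score {expected_score(p):+.2f} -> {verdict}")
```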

Implementing this would significantly reduce hallucinations, but it comes at a cost. The paper estimates that under such a system, models would respond with “I don’t know” to about 30% of user queries. This could be frustrating for users accustomed to receiving an immediate answer for everything, potentially driving them to less cautious competitor models.

The high computational cost of accurately measuring uncertainty also makes this approach impractical for high-volume consumer services. However, the paper notes that for high-stakes professional applications in fields like finance, medicine, or chip design, the cost of an error is far greater than the cost of computation, making uncertainty-aware systems not just viable but essential. The study concludes that the core incentives in consumer AI, which prioritize user engagement and speed, will ensure that hallucinations remain a persistent issue until those priorities shift.


Tags: AI, Featured, OpenAI, Research
