Researchers at OpenAI have published a paper diagnosing why large language models like ChatGPT hallucinate, or confidently generate false information.
The study uses mathematical analysis to argue that hallucinations are an unavoidable consequence of how these models make predictions, even when trained on flawless data. It identifies two primary causes: error accumulation during sequential generation and evaluation benchmarks that reward guessing over honesty.
How sequential predictions lead to errors
The paper explains that LLMs operate through an autoregressive process, predicting each next word in a sequence from the words that came before it. This creates a chain in which a single early error can propagate and amplify, producing an entirely incorrect statement. The researchers’ mathematical proof shows that the error rate for generating a full answer is at least twice the error rate the same model would have on a simple yes/no question about whether a statement is valid, a consequence of this compounding effect.
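To make the compounding intuition concrete, the short sketch below assumes, purely for illustration, a fixed and independent per-token error probability; real model errors are correlated and context-dependent, and this is not the paper’s formal argument.

```python
# Minimal sketch of how per-step errors compound in autoregressive generation.
# Assumes, purely for illustration, a fixed and independent per-token error
# probability; real model errors are correlated and context-dependent.

def sentence_error_rate(per_token_error: float, num_tokens: int) -> float:
    """Probability that at least one token in the generated sequence is wrong."""
    return 1 - (1 - per_token_error) ** num_tokens

if __name__ == "__main__":
    for p in (0.001, 0.01, 0.05):
        for n in (10, 50, 200):
            print(f"per-token error {p:.3f}, {n:3d} tokens -> "
                  f"answer error {sentence_error_rate(p, n):.1%}")
```

Even a very small per-token error rate translates into a substantial chance that something in a long answer is wrong.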
This structural limitation means that hallucinations cannot be completely eliminated by scaling up computing power or improving training data, because the issue is inherent to the predictive architecture. The problem is worse for facts that appear rarely in the training data: the study found that about 20% of notable figures’ birthdays appeared only once in the training set, which implies a baseline error rate of at least 20% on those birthday queries.
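The “appeared only once” argument can be illustrated by counting singleton facts in a toy corpus. The corpus, fact encoding, and function name below are invented for illustration; the paper works with real training-data statistics.

```python
# Toy illustration of the "singleton" argument: the fraction of facts that
# appear exactly once in training data lower-bounds the expected error rate
# on queries about that kind of fact. Corpus and encoding are invented here.
from collections import Counter

def singleton_fraction(fact_mentions: list[str]) -> float:
    """Fraction of distinct facts that occur exactly once in the corpus."""
    counts = Counter(fact_mentions)
    singletons = sum(1 for c in counts.values() if c == 1)
    return singletons / len(counts)

# Each string stands for one mention of a person's birthday in training data.
corpus = [
    "alice:1990-01-01", "alice:1990-01-01",   # seen twice
    "bob:1985-06-15",                         # singleton
    "carol:1972-03-09",                       # singleton
    "dave:2001-12-31", "dave:2001-12-31",     # seen twice
]
print(f"singleton fraction: {singleton_fraction(corpus):.0%}")  # 50% in this toy set
```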
As a practical example, the researchers queried state-of-the-art models for the birthday of Adam Kalai, one of the paper’s authors. The models confidently provided several different incorrect dates, demonstrating a pattern of fabricating plausible-sounding details to fill knowledge gaps.
Evaluation benchmarks penalize honesty and encourage guessing
The study also critiques the benchmarks used to evaluate AI models. The researchers reviewed ten prominent AI benchmarks and found that nine of them use a binary grading system: A response is either 100% correct or 100% incorrect.
Under this system, an answer of “I don’t know” receives the same score as a completely wrong answer—zero.
This scoring method creates what the paper calls an “epidemic” of penalizing honesty. A mathematical proof included in the study demonstrates that this system incentivizes models to always guess: any guess has a nonzero chance of earning points, while abstaining guarantees zero, so guessing always has the higher expected score. This explains why even advanced models default to confident fabrications rather than admitting uncertainty.
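The incentive argument reduces to a few lines of arithmetic. The sketch below compares the expected score of guessing versus abstaining under binary grading; the probabilities are illustrative and the code is ours, not the paper’s.

```python
# Expected-score comparison under binary (right-or-wrong) grading.
# Illustrative numbers; this is our arithmetic, not code from the paper.

def expected_score_binary(p_correct: float) -> float:
    """Expected score of guessing when a correct answer earns 1 and anything else 0."""
    return p_correct * 1.0

ABSTAIN_SCORE = 0.0  # "I don't know" earns the same zero as a wrong answer
for p in (0.01, 0.10, 0.50):
    print(f"guess with p={p:.2f}: expected {expected_score_binary(p):.2f} "
          f"vs abstain {ABSTAIN_SCORE:.2f}")
# Any p > 0 makes guessing strictly better in expectation, so a model tuned to
# maximize benchmark score learns to guess rather than admit uncertainty.
```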
Proposed solutions and the trade-off between accuracy and user experience
To address this, the OpenAI researchers propose a new approach that integrates confidence estimation into both the model’s behavior and the evaluation process. Models would be trained to assess their own certainty and would be evaluated with a scoring system that penalizes incorrect answers more heavily than it rewards correct ones. For example, a prompt could instruct the model to “Answer only if you are more than 75 percent confident, since mistakes are penalized 3 points while correct answers receive 1 point.”
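The 75 percent figure follows directly from the expected score under that rubric: answering pays off only when p × 1 − (1 − p) × 3 > 0, which requires p > 0.75. Here is a short sketch of that calculation; the point values mirror the example prompt above, while the function names are our own.

```python
# Sketch of the proposed confidence-aware rubric: +1 for a correct answer,
# -3 for a wrong one, 0 for abstaining. Point values mirror the example
# prompt quoted above; the function names are our own.

def expected_score(p_correct: float, reward: float = 1.0, penalty: float = 3.0) -> float:
    """Expected score if the model chooses to answer with confidence p_correct."""
    return p_correct * reward - (1.0 - p_correct) * penalty

def should_answer(p_correct: float, reward: float = 1.0, penalty: float = 3.0) -> bool:
    """Answer only when the expected score beats the zero earned by abstaining."""
    return expected_score(p_correct, reward, penalty) > 0.0

# Break-even point: p*1 - (1-p)*3 > 0  =>  p > 0.75, hence the 75% threshold.
for p in (0.60, 0.75, 0.80):
    print(f"confidence {p:.0%}: expected {expected_score(p):+.2f}, answer? {should_answer(p)}")
```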
Implementing this would significantly reduce hallucinations, but at a cost. The paper estimates that under such a system, models would respond with “I don’t know” to about 30% of user queries. That could frustrate users accustomed to getting an immediate answer to every question, potentially driving them toward less cautious competitor models.
The high computational cost of accurately measuring uncertainty also makes this approach impractical for high-volume consumer services. However, the paper notes that for high-stakes professional applications in fields like finance, medicine, or chip design, the cost of an error is far greater than the cost of computation, making uncertainty-aware systems not just viable but essential. The study concludes that the core incentives in consumer AI, which prioritize user engagement and speed, will ensure that hallucinations remain a persistent issue until those priorities shift.