OpenAI announced that GPT-5.5 Instant, the default model for free ChatGPT users, now performs comparably to its frontier Thinking models on health-related questions, according to the company’s evaluations. The update arrives amid growing scrutiny of AI-generated health answers: a recent Guardian investigation found inaccuracies in some Google AI Overviews, which led to the removal of certain Google AI features.
OpenAI claims GPT-5.5 Instant outperforms its predecessor, GPT-5.3 Instant, on its internal benchmarks, HealthBench and HealthBench Professional. The company also reported a 71% decline in the rate of health responses flagged for factuality issues over two months, based on monitoring of live traffic.
In a separate comparison, OpenAI had physicians write responses to representative health queries. A different panel of doctors then rated GPT-5.5 Instant’s responses higher than the physician-written ones on accuracy, communication, and completeness across a pool of 3,500 reviewed interactions. The company stated that the new model exhibited fewer failure modes than both previous versions and the physicians, with fewer instances of missing important red flags or failing to request additional user context.
HealthBench, the benchmark used by OpenAI, was developed with input from more than 260 physicians across 60 countries, who have assessed over 700,000 example responses. The 260-physician figure has not changed since the launch of ChatGPT Health in January, and the evaluation results have not been made available for external review.
OpenAI said that more than 230 million users ask ChatGPT about health and wellness topics each week, a figure that makes health one of the chatbot’s most significant use cases. Health topics also receive special treatment in OpenAI’s policies, which prohibit running advertisements in health-related discussions.
According to an Ahrefs analysis, medical queries have the highest exposure rate to AI-generated answers of any category, which suggests that demand could shift towards ChatGPT’s free tier. Because OpenAI’s accuracy claims rest on in-house evaluations, they are difficult to validate externally. It remains unclear how these developments will affect citations and the responsibility of practitioners for verifying AI responses.