ChatGPT Health Fails To Spot 52% Of Medical Emergencies In Study

A study published in Nature Medicine on February 24 found that ChatGPT Health failed to direct users to emergency care in more than half of the cases that physicians judged to be true emergencies. Researchers at the Icahn School of Medicine at Mount Sinai conducted the evaluation, testing the consumer-facing tool across 960 interactions. The findings raise safety concerns about AI-powered triage as millions of users increasingly rely on chatbots for health guidance.

The research team designed 60 clinical scenarios spanning 21 medical specialties. These cases ranged from minor conditions suitable for home care to genuine emergencies. Three independent physicians established the correct level of urgency for each scenario, using guidelines from 56 medical societies. This consensus approach provided a standardized benchmark for evaluating the AI’s performance. Each scenario was then tested under 16 different contextual conditions, including variations in race, gender, social dynamics, and barriers to care such as lack of insurance. Crossing the 60 scenarios with the 16 conditions produced a total of 960 interactions with ChatGPT Health.
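
The 960 figure follows from pairing every scenario with every contextual condition. A minimal Python sketch of that crossing is shown here; the scenario and condition labels are placeholders invented for illustration, since the article does not list the study's actual materials.

```python
from itertools import product

# Placeholder labels (assumption): the real scenarios and conditions
# are not enumerated in this article.
scenarios = [f"scenario_{i:02d}" for i in range(1, 61)]    # 60 clinical scenarios
conditions = [f"condition_{j:02d}" for j in range(1, 17)]  # 16 contextual conditions

# Pair each scenario with each condition (a full crossing).
interactions = list(product(scenarios, conditions))

print(len(interactions))  # 960
```

A full crossing of this kind means each contextual factor is tested against the same set of clinical cases, so differences in triage can be attributed to the context rather than to the case mix.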

The results revealed what the researchers described as an “inverted U-shaped” pattern of performance, with errors concentrated at the two extremes of urgency. ChatGPT Health handled textbook emergencies like stroke and anaphylaxis correctly. However, the tool under-triaged 52 percent of cases that physicians deemed true emergencies. For conditions such as diabetic ketoacidosis and impending respiratory failure, the AI directed patients toward an evaluation within 24 to 48 hours instead of recommending immediate emergency department care. At the other end of the scale, the system misclassified 35 percent of non-urgent cases.

A significant finding concerned the tool’s susceptibility to anchoring bias. When family members or friends minimized symptoms within the prompts, triage recommendations shifted dramatically toward less urgent care. The study quantified this influence with an odds ratio of 11.7. Dr. Ashwin Ramaswamy, one of the study’s corresponding authors, commented on the specific limitations observed. “ChatGPT Health performed well in textbook emergencies such as stroke or severe allergic reactions,” Ramaswamy said. “But it struggled in more nuanced situations where the danger is not immediately obvious, and those are often the cases where clinical judgment matters most.”
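
An odds ratio compares the odds of an outcome between two groups. The sketch below shows how such a figure is computed from a two-by-two table. The counts are hypothetical, chosen only to make the arithmetic easy to follow, and are not taken from the study; the function name `odds_ratio` is likewise an illustrative choice.

```python
def odds_ratio(exposed_events, exposed_non_events, control_events, control_non_events):
    """Return the odds of the event in the exposed group divided by the odds in the control group."""
    exposed_odds = exposed_events / exposed_non_events
    control_odds = control_events / control_non_events
    return exposed_odds / control_odds

# Hypothetical counts, NOT from the study:
# prompts where a friend or relative minimized symptoms: 30 less-urgent recommendations, 10 otherwise
# prompts without minimization:                          8 less-urgent recommendations, 32 otherwise
ratio = odds_ratio(30, 10, 8, 32)
print(ratio)  # 12.0
```

In this made-up example, the odds of a less urgent recommendation are twelve times higher when symptoms are minimized. The study's reported odds ratio of 11.7 can presumably be read along the same lines, although the article does not give the underlying counts or the statistical model used.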

The study also exposed inconsistencies in the tool’s crisis intervention system. ChatGPT Health is designed to direct users to the 988 Suicide and Crisis Lifeline in high-risk situations. Researchers found that these alerts appeared more reliably when users described no specific method of self-harm than when they articulated a concrete plan. In effect, the safeguard was less likely to activate as the risk level rose. Dr. Girish Nadkarni, Mount Sinai’s Chief AI Officer and the study’s other corresponding author, described the finding as going “beyond inconsistency.” Nadkarni noted that “the system’s alerts were inverted relative to clinical risk.”

The study’s publication coincides with rapid consumer adoption of AI health tools. OpenAI launched ChatGPT Health in January 2026. The company reported that roughly 40 million people were using ChatGPT daily for health-related questions. Earlier in 2026, the nonprofit patient safety organization ECRI ranked misuse of AI chatbots in healthcare as the top health technology hazard. ECRI warned that these tools “can provide false or misleading information that could result in significant patient harm.”

The Mount Sinai team analyzed the influence of demographic and socioeconomic factors on triage outcomes. The data showed no statistically detectable effects from patient race, gender, or barriers to care. However, the study’s confidence intervals did not rule out the possibility of clinically meaningful differences. The researchers indicated plans to continue evaluating updated versions of ChatGPT Health and other consumer AI tools. Future research will expand into pediatric care, medication safety, and non-English-language use.


Researchers at the Icahn School of Medicine at Mount Sinai conducted 960 tests, finding the AI struggled with "nuanced" cases like respiratory failure.


COPYRIGHT © DATACONOMY MEDIA GMBH, ALL RIGHTS RESERVED.
