
ChatGPT has rocketed into health care like a medical prodigy. The artificial intelligence tool correctly answered more than 80% of board exam questions, showing an impressive depth of knowledge in a field that takes even elite students years to master.

But in the hype-heavy days that followed, experts at Stanford University began asking the AI questions drawn from real situations in medicine, and they got very different results. Almost 60% of its answers either disagreed with human specialists or provided information that wasn’t clearly relevant.


The discordance was unsurprising because the specialists’ answers were based on a review of patients’ electronic health records, a data source ChatGPT has never seen; its knowledge is derived from the internet. However, the results pointed to a bigger problem: early testing of the model examined only its textbook knowledge, not its ability to help doctors make faster, better decisions in real-life situations.
