ChatGPT has rocketed into health care like a medical prodigy. The artificial intelligence tool correctly answered more than 80% of board exam questions, showing an impressive depth of knowledge in a field that takes even elite students years to master.
But in the hype-heavy days that followed, experts at Stanford University began to ask the AI questions drawn from real situations in medicine — and got much different results. Almost 60% of its answers either disagreed with human specialists or provided information that wasn’t clearly relevant.
The discordance was unsurprising since the specialists’ answers were based on a review of patients’ electronic health records — a data source ChatGPT, whose knowledge is derived from the internet, has never seen. However, the results pointed to a bigger problem: The early testing of the model only examined its textbook knowledge, and not its ability to help doctors make faster, better decisions in real-life situations.