After an explosion of excitement over the potential for machine learning in medicine, cracks in the foundation are emerging.
More and more research is focusing on the ways that medical models can introduce algorithmic bias into health care. But in a new paper, machine learning researchers caution that such self-reflection is often ad hoc and incomplete. They argue that to get “an unbiased judgment of AI bias,” there needs to be a more routine and robust way of analyzing how well algorithms perform. Without a standardized process, researchers will only find the bias they think to look for.
To make that possible, the researchers propose a new framework designed to support regular, holistic assessment of performance drops, whether a model is still in development or being spot-checked during real-world use.