What if we could learn from massive collections of data while avoiding the privacy and other risks typically associated with sharing such information?
The Mayo Clinic has taken a step toward making that possible with its announcement that the first venture of the Mayo Clinic Platform will use federated learning as a foundational technology of its privacy model.
Federated learning lets a network of participants collaboratively train algorithms on data while keeping each stakeholder’s data within its home location. Instead of sending data to a single, central repository where algorithms are trained, federated learning sends algorithms to the data. The updated algorithms are then shared with the participants.
This is in direct contrast to traditional data science methods, which require aggregating increasingly large amounts and types of data into a central location.
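To make the "send the algorithm to the data" idea concrete, here is a minimal sketch of one common approach, federated averaging, written in plain Python. It is an illustration under stated assumptions, not a description of how Mayo, Google, or any other organization mentioned here implements it: the function names (`local_update`, `federated_round`, `train`), the one-parameter model, the learning rate, and the toy data are all hypothetical.

```python
# Hypothetical sketch of federated averaging; every name and number is illustrative.

def local_update(weight, data, lr=0.01, epochs=5):
    """Train on one site's private (x, y) pairs and return only the new weight."""
    for _ in range(epochs):
        # Gradient of mean squared error for the one-parameter model y = weight * x.
        grad = sum(2 * (weight * x - y) * x for x, y in data) / len(data)
        weight -= lr * grad
    return weight


def federated_round(global_weight, sites):
    """Send the current model to each site, then average the weights they return."""
    updates = [(local_update(global_weight, data), len(data)) for data in sites]
    total = sum(n for _, n in updates)
    # Weight each site's result by the number of records it trained on.
    return sum(w * n for w, n in updates) / total


def train(sites, rounds=50):
    """Repeat the send-train-average cycle and return the shared model weight."""
    weight = 0.0
    for _ in range(rounds):
        weight = federated_round(weight, sites)
    return weight


# Three sites whose private records all follow y = 3x.
# Only the trained weight crosses site boundaries, never the (x, y) pairs.
sites = [
    [(1.0, 3.0), (2.0, 6.0)],
    [(3.0, 9.0)],
    [(0.5, 1.5), (1.5, 4.5), (2.5, 7.5)],
]
learned_weight = train(sites)
```

In this toy setup the coordinator only ever sees each site's updated weight and record count, yet the shared model still converges toward the relationship present in all three datasets. Real deployments typically add further safeguards, since model updates themselves can leak information.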
It is likely that you are already using and benefiting from federated learning without realizing it. Google created federated learning to improve autocorrect on mobile phones. Your phone can correct what you type seemingly instantaneously because the analytics are performed on your phone itself. This has multiple benefits: It keeps the text you type private, the autocorrect algorithm is improved, and predictions are made almost instantly because they don’t need to be beamed to the cloud and back. Apple recently implemented a similar scheme for natural language processing with Siri.
The new Mayo venture, the Clinical Data Analytics Platform, will let many different parties, from universities to private corporations and governments, work together to accelerate the development of knowledge and pharmaceuticals by leveraging this privacy-preserving technology. Applications and algorithms will be run on Mayo’s data in an environment that the clinic controls. The underlying data never leave. Only the insights gleaned from this sensitive information are shared externally. In this way, the Clinical Data Analytics Platform will allow external parties to learn from Mayo Clinic data without Mayo ceding control of its data or sharing it.
Data aggregation has generated unprecedented insights and created value far larger than the sum of individual data sources on their own. At the same time, stockpiling data in a central location exposes that data to newfound privacy risks by creating virtual treasure chests of data, more enticing to hackers than the individual data sources.
Current methods for leveraging data analytics require data to be shared with a central party, such as a startup with a novel algorithm. This forces data generators to give away most of the control they held as the sole possessors of that data. Loss of control is one of the disincentives keeping many health care companies and organizations from sharing their data, and it is precisely the problem that federated learning was created to solve.
Federated learning moved quickly from mobile phones to the lab. Several of the world’s largest drug makers, who are usually fierce competitors, are collaborating in the MELLODDY (MachinE Learning Ledger Orchestration for Drug DiscoverY) project to advance drug discovery. By leveraging federated learning, these companies are able to learn from data of an unprecedented scale: It includes more than 1 billion relevant data points, hundreds of terabytes of image data, and 10 million small molecules.
Similarly, King’s College London announced in December 2019 the formation of a federated learning network that will work on data from three U.K. universities and four of London’s teaching hospitals. This network will focus on developing research, clinical, and operational improvements across many clinical pathways by leveraging patient data encompassing one-third of London’s population, far larger than the reach of any individual organization.
Health care has struggled to find the correct balance between privacy and the desire to innovate. Inherent in this trade-off, and baked into the frameworks and policies governing health care and research, is the assumption that we need to share data to gain its benefits. Technologies like federated learning disprove this premise and challenge us to radically rethink the way we approach data creation, use, disclosure, and analysis. Federated learning promises to unlock the benefits of collaboration without unleashing its compromises.
Marielle S. Gross, M.D., is an OB-GYN and postdoctoral fellow at the Johns Hopkins Berman Institute of Bioethics. Robert C. Miller, Jr., is a senior consultant for ConsenSysHealth and manages Blockchain and Healthcare, a newsletter on emerging technology and health care.