
As money pours into health care startups built around artificial intelligence — more than 350 deals totaling $4 billion in 2019 — the field is generally overlooking the potential litigation risk surrounding the de-identified data exception in HIPAA.

Large volumes of data underpin the development of any AI effort. So it’s no surprise we’ve been seeing partnerships between hospitals and holders of large amounts of consumer data (“big data”). Here are just a few examples:


  • Massachusetts General Hospital and Google Cloud joining to enhance ProofPilot for clinical research.
  • Kaiser Permanente and Samsung working toward a smartwatch-based remote monitoring of cardiac rehabilitation patients.
  • University of Southern California’s Lawrence J. Ellison Institute for Transformative Medicine partnering with AT&T to open a new “smart facility” to promote data-driven initiatives in cancer treatment and education.

These partnerships promise innovation that will reshape health care as we know it. The McKinsey Global Institute estimates that applying artificial intelligence algorithms to medical health records would save the U.S. $100 billion a year by “optimizing innovation, improving the efficiency of research and clinical trials, and building new tools … to meet the promise of more individualized approaches.”

But some of those savings may be offset, at least temporarily, by increased litigation costs, because the treasure trove of electronic medical records so essential to health care AI is protected by powerful privacy laws. The best known of these is the Health Insurance Portability and Accountability Act (HIPAA).

HIPAA was passed in 1996. Its privacy rule establishes permissible uses and disclosures of so-called protected health information, which it defines as “individually identifiable health information” that is transmitted or maintained in any form and is not subject to certain exclusions.


But what is deemed “individually identifiable” may be a shifting target.

When HIPAA was passed in 1996, the information at issue was limited to things like medical records and claims data. Today, big data has greatly expanded what information could be considered “individually identifiable,” particularly when it is in the hands of data giants that already hold so many other data points they can combine to identify an individual.

At its most basic, if health care data is capable of identifying a person, it is considered individually identifiable and thus subject to HIPAA’s protections. However, data that is not individually identifiable is not subject to all of HIPAA’s requirements.

HIPAA defines “de-identified data” as health information that “does not identify an individual and with respect to which there is no reasonable basis to believe that the information can be used to identify an individual.” Health providers frequently provide de-identified data in their partnerships with big data to avoid the need for patient authorization before the data exchange.

There are two ways to establish that data is de-identified. According to the Code of Federal Regulations, the first option is for an appropriate expert to determine “that the risk is very small that the information could be used, alone or in combination with other reasonably available information, by an anticipated recipient to identify an individual who is a subject of the information.” The second option is to remove certain identifiers so the “covered entity does not have actual knowledge that the information could be used alone or in combination with other information to identify an individual who is a subject of the information.”
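The second option, removing identifiers, can be pictured with a minimal sketch. Everything in it is an assumption made for illustration: the field names are hypothetical, the set of fields is a small subset rather than the regulatory list, and reducing dates to a year is a simplification chosen for this example. It is not a compliance tool.

```python
from datetime import date

# Hypothetical field names; an illustrative subset, NOT the regulatory list.
DIRECT_IDENTIFIER_FIELDS = {
    "name", "street_address", "phone", "email", "ssn", "medical_record_number",
}
# Hypothetical date fields that this sketch reduces to a year.
DATE_FIELDS = {"birth_date", "admission_date", "discharge_date"}


def strip_identifiers(record: dict) -> dict:
    """Return a copy of the record without the listed identifier fields."""
    cleaned = {}
    for field, value in record.items():
        if field in DIRECT_IDENTIFIER_FIELDS:
            continue  # drop the identifier entirely
        if field in DATE_FIELDS and isinstance(value, date):
            # Keep only the year (an assumption made for this sketch).
            cleaned[field.replace("_date", "_year")] = value.year
            continue
        cleaned[field] = value
    return cleaned


example = {
    "name": "Jane Doe",
    "phone": "555-0100",
    "admission_date": date(2019, 3, 4),
    "diagnosis": "I25.10",
}
print(strip_identifiers(example))
```

Note that stripping fields says nothing about what a recipient could infer by combining the remaining fields with data it already holds, which is the concern the rest of this article turns to.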

Given the information that big data already has, however, it may be almost impossible to conclude that a data giant could not combine a given dataset with other data to identify an individual. That means it may be practically impossible to say that data can ever fall within HIPAA’s “de-identified data” exception. The Consumer Technology Association’s recently issued standard definition and characteristics of AI data in health care recognizes that even if all personally identifiable information is removed from a dataset, de-identified data might still be at risk of being re-identified. While the Department of Health and Human Services has provided additional guidance on de-identifying data, its guidance is not without ambiguity. And even if it were unambiguous, courts could disagree.

Adding to the problem is the fact that HIPAA currently does not require that anyone actually re-identify data before it is no longer considered de-identified. The mere ability to use the information, in combination with other data, to identify individuals can prevent the information from being considered de-identified.

The take-home message is that companies involved in AI-focused partnerships should not rely on HIPAA’s de-identified data exception without considering the risk of litigation. While there is no private right of action under HIPAA (meaning that individuals cannot sue to enforce it), individuals can still use a provider’s noncompliance with a HIPAA standard as a basis for suing a provider for being negligent with the individual’s information.

The recent class action arising from a partnership between Google and the University of Chicago serves as an example and a warning to providers that are considering relying on HIPAA’s de-identified data exception to share protected health information with big data. This litigation arises from sharing electronic health records and a statement in a recent Google research paper that the company had used de-identified patient data. The plaintiff, a former patient at the University of Chicago Medical Center, alleged that the shared data included dates of service, which Google could combine with the wealth of information it already has on individuals, including geolocation data taken from smartphones, to identify the subject of the health record.
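The kind of combination the plaintiff describes can be illustrated with a toy linkage sketch. It is purely hypothetical: the data, identifiers, and function name are invented for this example, and it does not represent anything Google or the University of Chicago is known to have done. It only shows why dates of service, on their own, can narrow a record down to one person when another dataset records who was at a facility on which days.

```python
from collections import defaultdict
from datetime import date


def link_by_service_dates(health_records, location_visits):
    """Map record IDs to a person ID when exactly one person fits.

    health_records: {record_id: [service dates]}  (no names attached)
    location_visits: [(person_id, date seen at the facility), ...]
    Both inputs are hypothetical structures for this sketch.
    """
    visits_by_person = defaultdict(set)
    for person_id, visit_date in location_visits:
        visits_by_person[person_id].add(visit_date)

    matches = {}
    for record_id, service_dates in health_records.items():
        # A person is a candidate if they were present on every service date.
        candidates = [
            person_id
            for person_id, days in visits_by_person.items()
            if set(service_dates) <= days
        ]
        if len(candidates) == 1:  # a unique fit re-identifies the record
            matches[record_id] = candidates[0]
    return matches


records = {
    "rec-1": [date(2019, 3, 4), date(2019, 3, 18)],
    "rec-2": [date(2019, 3, 4)],
}
visits = [
    ("person-A", date(2019, 3, 4)),
    ("person-A", date(2019, 3, 18)),
    ("person-B", date(2019, 3, 4)),
]
print(link_by_service_dates(records, visits))
```

In this toy data, the record with two service dates fits only one person, while the record with a single date fits two people and stays ambiguous. The more dates a record carries, the more likely a unique fit becomes.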

The Google-University of Chicago litigation is interesting for one other reason: In the unlikely scenario that the case is not settled before trial, it may provide the judiciary with fertile ground to interpret HIPAA’s de-identified information exception. One question almost certain to be raised is whether Google could have re-identified the data.

If, for example, Google was contractually forbidden from re-identifying the data, as it was in a similar partnership through which it received medical information from the University of California, would that mean that Google could not re-identify the data and thus the information was de-identified? Plaintiffs will likely argue that the clear intent of HIPAA’s de-identified data provision was to ask whether the information was capable of being re-identified, not whether doing so would be legal. The court, however, may not be persuaded.

The takeaway is that unless and until HIPAA’s de-identified data exception is amended, entities relying on it for AI partnerships face increased risk of litigation. At this point, the only way to completely avoid this risk is to obtain HIPAA-compliant patient authorization for the disclosure of data.

Those entering into big data-health care provider partnerships should consider the unique challenges of big data in health care. On top of the issues created by HIPAA’s not keeping up with technological advancement, the privacy landscape is also shifting. State privacy laws continue to develop, and they do not always exempt HIPAA-covered entities. For example, the Biometric Information Privacy Act in Illinois has been an area of booming litigation and, despite a Gramm-Leach-Bliley exclusion, does not include a HIPAA exclusion.

One thing is certain: As health care providers and big data combine to develop and implement AI in health care, medical care won’t be the only thing that changes.

Patricia S. Calhoun is a health care attorney with an interest in privacy issues at Carlton Fields, P.A., where Patricia M. Carreiro is a data privacy and cybersecurity litigation attorney.