
I once heard someone quip that “Google is worth three times as much as Facebook because Facebook knows what you tell your friends, but Google knows what you do in the privacy of your room.”
It’s certainly true that people ask search engines about personal and sometimes very private matters. These queries provide the searchers with (mostly) useful information. Records of these searches and other types of Internet data also have a second life. Search companies, for example, use anonymized search records to operate and improve their services. These data can also be a treasure trove for researchers who want to study public health, sociology, and even how earthquakes spread.
Here are a few areas in which aggregated Internet data can generate what I call “crowdsourced health”:
Where individuals interact extensively with online content: Search queries for celebrities known or rumored to have anorexia spike following media reports about them, as shown by a study of approximately 6 million people that I did with Danah Boyd, a colleague at Microsoft. A single such search by an individual increases her or his risk of developing anorexia, unless the report discusses the harms of being underweight, in which case the search has no effect on that risk.
Where Internet data provide better sensors: Most people with the flu never see a health care provider. Many will, however, query a search engine about it or tweet that they are staying home from work or school because of it. These online mentions made it possible to estimate the effectiveness of flu vaccinations among children in the United Kingdom when traditional methods of tracking vaccine effectiveness could not.
Where people have difficulty reporting: Negative reactions to medicines can take a long time to appear, making it difficult for people to realize that a drug is causing problems. My colleague Evgeniy Gabrilovich and I analyzed the searches of people who queried for a medicine and then looked to see if they later searched for negative reactions associated with drugs. By linking anonymized data separated by many months, we found that while serious, early-onset drug reactions are more likely to be reported to doctors and regulatory agencies, milder, later-onset reactions are better captured in online search queries.
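For readers who want to see what linking a medicine query to a later reaction query might look like, here is a minimal Python sketch. It is not the analysis from the study: the log format of (anonymized user ID, day number, query text), the function name later_symptom_rate, the drug name "drugx", and the sample data are all assumptions made up for illustration.

```python
from collections import defaultdict


def later_symptom_rate(log, drug_term, symptom_term, min_gap_days=30):
    """Fraction of users who searched for drug_term and then, at least
    min_gap_days after their first such search, searched for symptom_term.

    log is an iterable of (user_id, day, query) tuples, where user_id is an
    anonymized identifier and day is an integer day number (hypothetical format).
    """
    first_drug_day = {}
    symptom_days = defaultdict(list)
    for user_id, day, query in log:
        text = query.lower()
        if drug_term in text:
            # Remember the earliest day this user asked about the drug.
            first_drug_day[user_id] = min(day, first_drug_day.get(user_id, day))
        if symptom_term in text:
            symptom_days[user_id].append(day)
    if not first_drug_day:
        return 0.0
    followed = sum(
        1
        for user_id, start in first_drug_day.items()
        if any(d - start >= min_gap_days for d in symptom_days[user_id])
    )
    return followed / len(first_drug_day)


# Made-up sample data; "drugx" is a placeholder, not a real medicine.
sample_log = [
    ("u1", 0, "drugx dosage"),
    ("u1", 120, "dry cough causes"),
    ("u2", 5, "DrugX side effects"),
    ("u2", 6, "weather"),
    ("u3", 10, "dry cough"),
    ("u3", 50, "drugx"),
    ("u4", 3, "dry cough remedy"),
]
rate = later_symptom_rate(sample_log, "drugx", "dry cough", min_gap_days=30)
```

In the sample, only u1 counts: u2 never searches for the symptom, u3 searches for it before the drug, and u4 never searches for the drug. A real analysis would also need a comparison group of people who never searched for the medicine, which this sketch leaves out.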
Wait just a second, you might be thinking. Linking “anonymized data separated by many months” sounds like search engines are saving the queries I make and that someone could essentially see my search history.
That’s definitely a concern. We tend to think of our Web searches as things we do in private, not subject to the scrutiny of others. That may not be the case.
Identifying individual users from anonymized search queries has happened. In 2006, AOL released a log of 20 million searches from about 650,000 anonymized users, whose names had been replaced with numbers. Two New York Times reporters were able to cross-reference the searches and relatively easily identify one of the users as a 62-year-old widow living in Lilburn, Ga.
In 2009, when Google introduced its system for tracking influenza through an aggregated count of queries about the flu, the Electronic Privacy Information Center and Patient Privacy Rights questioned whether these data could be used to identify individual users, thus breaching their privacy.
Since then, progress has been made in the way we store and process these data, in our ability to protect user privacy through anonymization and aggregation, and in our understanding of the ethical challenges that arise when using these data. We are also learning more and more about the potential of Internet data to improve public health.
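To make "aggregation" concrete, here is a minimal Python sketch of one common safeguard: counting distinct users per region and week, then suppressing any count that falls below a threshold. It does not describe any search company's actual system. The record format, the function name aggregate_with_threshold, and the min_users value are assumptions for illustration, and small-count suppression alone does not guarantee privacy.

```python
from collections import defaultdict


def aggregate_with_threshold(records, min_users=5):
    """Count distinct users per (region, week) and drop small cells.

    records is an iterable of (user_id, region, week) tuples (hypothetical
    format). Cells with fewer than min_users distinct users are suppressed
    so that no published count describes only a handful of people.
    """
    users_per_cell = defaultdict(set)
    for user_id, region, week in records:
        users_per_cell[(region, week)].add(user_id)
    return {
        cell: len(users)
        for cell, users in users_per_cell.items()
        if len(users) >= min_users
    }


# Made-up sample: three flu searchers in one region, one in another.
sample_records = [
    ("a", "north", 1),
    ("b", "north", 1),
    ("c", "north", 1),
    ("a", "north", 1),  # repeat search by the same user, counted once
    ("d", "south", 1),
]
weekly_counts = aggregate_with_threshold(sample_records, min_users=3)
```

With a threshold of three, the "north" cell is published with a count of three distinct users, while the single-user "south" cell is withheld entirely.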
In the balance of risk and reward for using search data in crowdsourced health, we have clearly moved toward reward. Medical researchers should think of online searches and other Internet data as new tools in their toolbox, especially when traditional methods of medical research cannot help or are prohibitively expensive.
Elad Yom-Tov is a principal researcher at Microsoft Research and a visiting scientist at the Technion-Israel Institute of Technology. His book, “Crowdsourced Health: How What You Do on the Internet Will Improve Medicine,” (MIT Press) was published last month.