In response to the Covid-19 pandemic, tech giants have chipped in on a panoply of efforts to shore up data that can help track, trace, and curb the spread of the virus, from Apple and Google’s collaboration on contact tracing technology to Amazon’s funding for several pandemic-related research projects.
But an independent group of researchers thinks it has a better way to lend a hand.
The group, called the Covid-19 Mobility Data Network, includes public health researchers and data scientists across a smattering of institutions, including Harvard and MIT. Two of its co-founders were early members of a task force involving tech companies and the White House that has struggled to make a meaningful impact. They have since left and are now focused on scaling up the network’s research.
The group is using aggregated movement data from Facebook and other companies to assess potential pitfalls of public health interventions and has been regularly sharing its findings with state and local public health authorities across the country, as well as with foreign governments.
Unlike the contact tracing initiative from Apple and Google, which will require public health authorities to collect a suite of detailed data on individuals’ movements and demographics, the Covid-19 Mobility Data Network’s efforts use zoomed-out, population-level data to trace mobility patterns at the level of ZIP codes, counties, and states.
The data comes from Facebook’s Data for Good program, as well as from advertising firms including Cuebiq, which collect location data whenever someone sees an ad on their smartphone. It then is aggregated and anonymized by analytics company Camber Systems and turned over to researchers. Using population-level data addresses some — though not all — of the privacy concerns that have been raised about the role of technology in the response.
STAT spoke with Andrew Schroeder, one of the group’s co-founders and vice president of research for medical aid nonprofit Direct Relief, about the partnerships, the limitations of digital contact tracing efforts, and what the network has learned so far about the current impact of Covid-19 on the U.S. This interview has been condensed and edited for length and clarity.
How did you go from being the vice president of research at Direct Relief to the co-founder of this new group?
We respond to wildfires and other natural disasters where people are evacuated, and we first got involved with Facebook’s Data for Good program in 2017 during the Thomas Fire in California. We were working with Santa Barbara County, where we’re headquartered, along with Ventura County and the state, to help support communities by providing access to N95 masks.
We began using disaster maps to look at how population patterns were changing in Santa Barbara — where evacuations were happening, how long people were staying away from certain areas, things like that — and lined that up with our strategy to dispense masks.
Then we began building that data into our disaster response activities for everything from hurricanes to earthquakes and tornadoes. In many ways, the biggest challenge of all has been Covid-19, since basically the whole planet is implicated at the same time.
What do you think about current coronavirus contact tracing efforts?
I’m pretty skeptical. I’m not skeptical of traditional contact tracing — I think it’s the most important thing we can do. We’ve worked with contact tracers in sub-Saharan Africa for years and everyone there will tell you the same thing: The most important aspect of contact tracing is community trust. You don’t need an app to solve that problem. What you need is people. You don’t have to have highly trained people, you just need a bunch of them, and trust, so people will talk to them. And you need to build upon the practices that many countries around the world have used for a long time now, with the response to the Ebola outbreak being the best example.
Why use data from Facebook and ad tech companies, as opposed to data from other tech companies, like Apple, Google, or Microsoft?
The thing that makes Covid-19 different from any other event we’ve worked on is the sheer number of places affected. You could do this analysis on every city on Earth.
Facebook’s Data for Good team did a good job getting the data right. It gives us a close to real-time view of large-scale population change. The team has the legal framework in place, they publish a portal which gives access to downloadable data and they’ve done work on the backend through a data science team around calculating metrics, cleaning the data, getting the headers right — all key things when you’re trying to use data at scale.
The other companies we chose had similar programs set up with the right parameters and privacy protections. Other tech companies either didn’t provide access to the data or wouldn’t share how they arrived at their calculations, so we couldn’t check their math in a sense.
How do you do this in a way that protects privacy?
The first key is that no individual information is being accessed in the aggregated analysis at all. For Facebook in particular, the lowest unit of analysis is a 600-meter square. In the county-level information, Facebook has also applied differential privacy rules to the data, [meaning] noise is added to the locations [to mask any individual-level information]. Other groups have different ways to deal with these issues, such as calculating population movement index values as a means of keeping any particular individual’s information non-identifiable.
What still needs to be addressed is the appropriate level of scale. One concern we’ve had is making sure that information which could be used to do things like shame specific local communities or assist with punitive approaches to social distancing enforcement is not released publicly, or to agencies that are not specifically tasked with public health policy, such as law enforcement. We’ve decided that on balance, county-level information offers sufficient analytical value and sufficient protection in terms of scale that releasing it publicly is not a concern.
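As a rough illustration of what adding noise to aggregated counts can look like, here is a minimal Python sketch of the textbook Laplace mechanism. It is not Facebook's actual method, which the interview does not describe; the function names, the area labels, and the `epsilon` and `sensitivity` parameters are all assumptions made for this example.

```python
import random


def laplace_noise(scale, rng):
    """Draw one sample from a Laplace(0, scale) distribution.

    The difference of two independent exponential draws with mean
    `scale` follows a Laplace distribution centered on zero.
    """
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def add_laplace_noise(counts, epsilon, sensitivity=1.0, seed=None):
    """Return a copy of `counts` with Laplace noise added to each value.

    counts:      dict mapping an area label to an aggregated count
                 (hypothetical input format).
    epsilon:     privacy parameter; smaller values mean more noise.
    sensitivity: how much one person can change a single count
                 (assumed to be 1 here).
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    rng = random.Random(seed)
    scale = sensitivity / epsilon
    return {area: n + laplace_noise(scale, rng) for area, n in counts.items()}


if __name__ == "__main__":
    # Made-up counts for two hypothetical areas.
    print(add_laplace_noise({"area_a": 1200, "area_b": 85}, epsilon=0.5, seed=1))
```

With noise of this kind, large counts stay close to their true values while a count driven by only a few people is blurred enough that no single person's presence can be read off it.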
How will your work complement human contact tracing? Are there too many competing data collection efforts for Covid-19?
Our analysis is at the level of counties or ZIP codes or other big units, where the big units represent the overall rate of change for the entire population, while contact tracing is at the individual level. You need both approaches, and you can marry low-tech and high-tech solutions.
We need the means to safely share this data without putting individual and community privacy at risk. All these efforts are collecting different forms of data.
What does your data say about how people in the U.S. are complying with physical distancing recommendations?
We’ve seen a pretty clear correlation between accelerated timelines to reopen in specific areas and changes in case count. If you look at Alabama, Wisconsin, Iowa, Nebraska, Minnesota, Ohio — all places where they changed social distancing orders at earlier stages — you now see a turnaround in the case rate. Where you had falling case rates at the beginning of May, you now have rising case rates.
In Alabama, where most of the counties were flat or declining at the end of April, practically the entire state is now on the increase. In rural areas, where there’s no ICU system, people are flocking to cities like Montgomery. Those hospitals are now in dire condition. Wisconsin is also a concern right now. Most of the southern part of the state has been rebounding pretty significantly.
At the same time there’s good news: New York and New Jersey — where mobility patterns increased much more slowly — have gotten things under control.
California is a weird case. It was the case study for getting things under control early. Practically the entire coast — from the San Francisco Bay Area to San Diego — was showing a very high level of adherence to social distancing orders. Now Los Angeles, Riverside, Central Valley have all been seeing a turnaround. Is it enough to tip the national numbers back towards positive? Not yet, but it’s concerning.
Are you worried about parts of the U.S. reopening too early?
I’m pretty concerned. I’d relate it to the challenge many people have with antibiotics — you take the antibiotics, you start to feel better, and think, “I don’t need to take these anymore” — and that leads to problems. I worry we’re there with social distancing.
What are the most interesting findings you’ve made so far about Covid-19?
We’ve been able to bring some empirical focus into questions about risk and the unequal burden of Covid-19 on different populations. Caroline Buckee [another co-founder of the mobility group] and her team at Harvard, for example, studied pregnant women in New York as a means of doing random sampling [since women don’t get pregnant on a systematic basis], and found a linear correlation between commuting mobility and Covid-19 seroprevalence, [the measure of the virus in blood serum]. You could see pretty clearly that certain populations are at elevated risk of getting coronavirus and developing complications, which can include death.
Who are those at-risk populations, and what can be done to improve the public health response with them in mind?
You see higher rates of commuting mobility in areas including the Bronx and the northern section of Queens; basically, the farther you get from central Manhattan. Essential workers. Middle and lower ranks of health care. People who literally cannot work from home because their job cannot be virtualized.
We’ve shared these insights with public health departments. There is probably a series of public policy interventions that can and should be considered on how to better support workers and communities that cannot stop commuting.
What are some of the limitations of the data you use?
There’s always the question of how representative it is. Does it represent everyone, or are you making predictions based on 20 people in an area? Some places have a high density of people using various social media apps, whereas in other places the share of the population using the app may be far lower.
How do you address those shortcomings?
There are techniques to allow you to get better insights. Some of the ad tech companies we work with, for example, use rolling averages rather than relying on single days, since a single day can provide a more biased or limited view.
It would be nice if we could give a confidence estimate along with the data to say, “This rate of mobility is based on Facebook data in this area and we are confident within 99.9% that this is the true population change rate, as opposed to this area where we’ll give 70% confidence.” Right now we can’t do that.
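The rolling-average idea mentioned in this answer can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the interview does not say what window length or method the ad tech companies use, so the seven-day window, the function name, and the sample values are hypothetical.

```python
from collections import deque


def rolling_mean(values, window=7):
    """Return the trailing rolling mean of `values`.

    Each output is the average of the current value and up to
    `window - 1` values before it, so early entries average over
    fewer days rather than being dropped.
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    recent = deque(maxlen=window)
    total = 0.0
    means = []
    for value in values:
        if len(recent) == window:
            # The oldest value is about to fall out of the window.
            total -= recent[0]
        recent.append(value)
        total += value
        means.append(total / len(recent))
    return means


if __name__ == "__main__":
    # Made-up daily mobility index with one unusual day (the 40).
    daily = [100, 98, 97, 40, 96, 95, 97, 99]
    print(rolling_mean(daily, window=7))
```

Averaging over a window dampens the effect of one unusual day, such as a holiday or a data outage, at the cost of reacting more slowly to a genuine change in movement.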