Nearly 170 years ago, a physician who had been born into a poor family used data to stop a cholera epidemic in London’s then-marginalized Soho neighborhood. By interviewing residents and plotting the locations of those who were ill on a simple map, John Snow, today seen as one of the founders of modern epidemiology, identified a shared water well as the source of cholera. Removing the pump handle, which stopped the epidemic by cutting off the supply of contaminated water to the community, is one of epidemiology’s legendary stories.

Although the scope of the Covid-19 pandemic is orders of magnitude larger than that historic cholera outbreak, those most affected by it are likewise marginalized people. And just as data enabled Snow to ask and answer the right questions to solve a problem, collecting data has helped countries around the world fight the pandemic. But there is a big gap: a lack of data representing communities of Black, Indigenous, and other people of color (BIPOC).


This dearth of data has at least two origins. One is that researchers and data collectors don’t ask about race or ethnicity because of concerns about privacy, sensitivity, and skewing participation. Another is that some members of BIPOC communities, particularly those who are most marginalized, are often reluctant to share data with researchers. Efforts to investigate health disparities often highlight deficits in BIPOC communities yet have limited impact in solving problems at the community or individual level. These harms are real and continue to occur today.

Demonstrating the trustworthiness of the research enterprise through communication is an urgent task during this pandemic and will be needed for community-engaged research long after it ends. It is essential to use a framework that demonstrates trustworthiness, engages with the community, and recognizes and actively attempts to mitigate harms and inequality through bidirectional engagement, not only to collect accurate and complete data but also to use those data to address health inequalities.

The Covid-19 pandemic has made structural inequalities in U.S. communities and in the country’s approach to public health clear and urgent. Collecting and providing high-quality, accurate information to communities in a way that is culturally informed and sensitive, and that builds trust, requires prolonged, intentional effort.


Most data being used to address Covid-19, and public health in general, are missing information from BIPOC communities. Despite federal requirements from the Office of Management and Budget, race and ethnicity data are often incompletely collected or misclassified. In 2020, more than half of U.S. health departments did not report data about all racial and ethnic groups. This lack of representation in data is leading to systemic erasure of already vulnerable populations, as well as making it difficult to assess and mitigate the impact of Covid-19 in these communities.

This matters because a lack of nuance in data about BIPOC communities affects policy decisions. The lack of granularity in public health data also makes it hard to measure the true impact of the pandemic on underserved communities. While public Covid-19 data are reported at the ZIP code or county level, race and ethnicity data are currently reported at the state level. More granular access to race and ethnicity data at the county and ZIP code levels for Covid-19 testing, vaccination, cases, hospitalizations, and deaths would enable a better assessment of the impact of changing policies and public health interventions.

Testing sewage for SARS-CoV-2, the virus that causes Covid-19, offers an example of this problem. Such testing can estimate how much of the virus is circulating in the population served by the sewer system and is quite sensitive to new infections, since the amount of virus that individuals shed is highest early in the course of infection. This is an excellent technique for monitoring the course of the pandemic in the community, city, or metropolitan region served by the sewage system, but it does not provide information about who is infected or where they live.

There is a critical need to engage communities around how and why data affect them and to provide tools, like data visualization, that help them use these data to make informed decisions and to work effectively with local public health officials.

The NIH-funded Rapid Acceleration of Diagnostics-Underserved Populations (RADx-UP) program, which our group at Duke University, UNC-Chapel Hill, and the Community-Campus Partnerships for Health is part of, is working across the United States to do two important things:

  • Evaluate strategies to increase access to and uptake of SARS-CoV-2 tests in communities where the need for them is greatest.
  • Partner with those communities to share data in a standardized way.

The data being collected in and with RADx-UP communities can help identify in real time what historically marginalized communities need to fight Covid-19. For example, where are there gaps in testing resources? Do people know where the closest testing site is? What are their perceptions and behaviors around testing? How good is their access to information about testing and vaccination?

Such information is being gathered by the RADx-UP consortium through forms and surveys using the NIH RADx-UP common data elements. By using uniform data collection during testing and ensuring the inclusion of diverse populations, the program is developing a national resource that will enable researchers to answer these questions as well as provide community partners and local leaders with the evidence to help them advocate for what their community needs, such as establishing closer locations for testing sites; translating information about testing and vaccination into the right languages; implementing policies to protect workers; or launching campaigns to provide more trustworthy, accessible information about the virtues of testing and vaccination.

Beyond the pandemic, data being collected through RADx-UP will benefit communities in the long term by creating a data framework for researchers and policy makers to examine and address factors that contribute to health disparities with a much-needed community focus.

Researchers can choose to be today’s John Snows by engaging BIPOC communities in meaningful discussions about data collection and finding ways to balance data sharing, data sovereignty, and data protection so data can be trusted and accessible to these communities. To be sure, stopping the Covid-19 pandemic won’t be as simple as removing the handle of the Broad Street pump. But by using data to map and knit together community-level experiences across the nation, researchers and public health officials can begin to better understand both the common and the distinct experiences of those hit hardest by the pandemic and shut down all of the pumps that are making people sick.

Warren Kibbe is a biomedical informaticist, chief data officer for the Duke Cancer Institute, and professor and vice chair for the department of biostatistics and bioinformatics at Duke University School of Medicine. Giselle Corbie-Smith is a general internist, health equity researcher, director of the Center for Health Equity Research, and distinguished professor of social medicine and medicine at the University of North Carolina at Chapel Hill School of Medicine.
