As the novel coronavirus continues to infect people around the world, scientists have scrambled to understand its origins and evolution and learn how and where it is spreading. As a developer of gene-sequencing technology, I look to the virus’ genes for answers.
Applied strategically, this type of genomic surveillance can offer great insight into how a virus becomes dispersed in real time. It may also inform policy regarding how and when restrictions might be imposed. One day, gene sequencing might even help prevent an outbreak, or at least nip it in the bud.
I’ve been involved for several years in developing faster, more sensitive, and more portable methods of sequencing viral genomes. My colleagues and I are part of the ARTIC Network, an international consortium of researchers at universities and research institutes in Europe and the U.S.
My involvement with the group began in 2015 during the Ebola outbreak in West Africa (although the network was not officially established until 2017, after receiving funding from the Wellcome Trust). Using the revolutionary MinION sequencer, a portable device for sequencing DNA and RNA, I created a “lab in a suitcase” for sequencing the Ebola genome. As its name implies, this laboratory literally fits in a suitcase, and we’ve transported it all over the world in standard airline luggage. The Ebola work demonstrated for the first time that real-time gene sequencing surveillance was possible with a tiny portable setup, spitting out the sequences of viral genomes in hours.
By analyzing changes in the genome, we could generate a high-resolution view of viral evolution in real time, which made it possible to help identify transmission chains, information that can be used to guide control measures.
We’re now using the same methods of gene sequencing for SARS-CoV-2, the novel coronavirus that causes Covid-19. After the first cases became publicly known at the end of 2019, the complete genome sequence was published in the journal Nature by the Chinese Center for Disease Control and Prevention. From it, I designed a primer scheme (essentially the building blocks for genetic sequencing) and made it available online Jan. 23. I’ve also posted detailed sequencing instructions online that explain how to use the primers, which were made available to researchers by Integrated DNA Technologies (IDT). IDT is a well-known genomics solutions provider already involved in the fight against Covid-19, and it was the first company in the U.S. to have its primer and probe kits approved by the U.S. Centers for Disease Control and Prevention for use in the CDC’s emergency use authorization (EUA) testing protocol for the diagnosis and detection of Covid-19. To date, the company has produced sufficient quantities to enable approximately 40 million tests to be conducted.
As I write this, my collaboration with IDT has helped build gene sequencing capacity for the coronavirus with 100 research groups in more than 40 countries.
SARS-CoV-2 is an RNA virus, like the viruses that cause Ebola, Middle East respiratory syndrome (MERS), severe acute respiratory syndrome (SARS), and influenza. RNA viruses are more prone to replication errors than DNA viruses. As a result, SARS-CoV-2 has a predictable rate of evolution, accumulating on average two mutations per month. Sequencing makes it possible to read these mutations and identify different sub-lineages of the virus as it evolves, branching like a tree. Among other things, this information allows researchers to understand how the virus is being transmitted, for example by identifying community spread so that interventions can be made to block it.
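To make the idea of "reading" mutations concrete, here is a minimal Python sketch that compares a sample genome with a reference and lists the positions where they differ. It is an illustration only: the function name `call_mutations` and the ten-letter sequences are made up for the example, the sequences are assumed to be already aligned and of equal length, and real pipelines work on genomes of roughly 30,000 bases with dedicated alignment and variant-calling tools.

```python
def call_mutations(reference: str, sample: str) -> list[str]:
    """List substitutions as <reference base><1-based position><sample base>.

    Assumes both sequences are already aligned and have the same length.
    """
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned to the same length")
    return [
        f"{ref}{pos}{alt}"
        for pos, (ref, alt) in enumerate(zip(reference, sample), start=1)
        if ref != alt
    ]


# Toy sequences, invented for illustration (not real viral genomes).
reference = "ACGTACGTAC"
sample_a = "ACGTACGAAC"  # differs from the reference at one position
sample_b = "ACTTACGAAC"  # carries the same change plus one more

mutations_a = call_mutations(reference, sample_a)
mutations_b = call_mutations(reference, sample_b)

# A mutation shared by both samples hints that they sit on the same branch.
shared = set(mutations_a) & set(mutations_b)
print(mutations_a, mutations_b, shared)
```

In this toy case the second sample carries everything the first one does plus one extra change, which is the kind of nested pattern that lets sub-lineages be arranged as branches of a tree.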
In addition to tracking the dispersal of a virus in the present, gene sequencing can reconstruct the processes that drove its global spread in the past and determine when it first arose within a population. Using the mutation rate, researchers have dated the most recent common ancestor of all Covid-19 cases to mid-November 2019.
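The reasoning behind that kind of dating can be shown with simple arithmetic. The Python sketch below assumes a strict "molecular clock" ticking at the rate of about two mutations per month mentioned above, and works backward from a sampled genome to an estimated date for its ancestor. The function name, the sample date, the mutation count, and the average month length are all assumptions chosen for illustration; actual estimates rely on phylogenetic models fitted to many genomes and come with uncertainty ranges.

```python
from datetime import date, timedelta

# Assumption taken from the text: roughly two mutations accumulate per month.
MUTATIONS_PER_MONTH = 2.0
# Assumption for unit conversion: average length of a calendar month in days.
DAYS_PER_MONTH = 30.44


def estimate_ancestor_date(sample_date: date, mutations_from_ancestor: int) -> date:
    """Estimate when the ancestor existed, assuming a constant mutation rate."""
    months_elapsed = mutations_from_ancestor / MUTATIONS_PER_MONTH
    days_elapsed = round(months_elapsed * DAYS_PER_MONTH)
    return sample_date - timedelta(days=days_elapsed)


# Hypothetical genome: sampled 15 March 2020, 8 mutations away from the ancestor.
estimate = estimate_ancestor_date(date(2020, 3, 15), 8)
print(estimate)
```

With these made-up inputs, eight mutations correspond to about four months, so the estimate falls in mid-November 2019. The numbers were picked to illustrate the arithmetic and are not real data.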
Combining viral genome data with information about location or people’s movements makes it possible to investigate the factors driving the outbreak and to determine how often the virus has been introduced into a region. An analysis by the COVID-19 Genomics UK Consortium, for example, identified 1,350 separate introductions of SARS-CoV-2 into the United Kingdom in March 2020.
In 2016, a team of British and Brazilian researchers that included me and other ARTIC Network colleagues traveled to Northeast Brazil to study Zika virus with the same portable device used to track genetic changes in the Ebola virus. We found that the first case of Zika virus infection in Brazil had likely occurred a year before the disease was first recognized.
Genomic sequencing of SARS-CoV-2 has shown that the virus was transmitted into the human population in China in mid-November before spreading throughout the world. Because many of the early cases were linked to a wet market in Wuhan, it is most likely that the spillover occurred there. So far, however, no close relative of SARS-CoV-2 that would allow us to identify the animal host has been isolated anywhere in the world.
Gathering real-time data about the genome of a virus during an outbreak provides vital information not just about the virus’ spread and rate of evolution but also about its adaptation to human hosts, antiviral therapies, or vaccines.
One difficulty of the technique is sampling bias. More than 90,000 genome sequences of SARS-CoV-2 have been generated, an astonishing achievement, but two-thirds of them come from the United Kingdom and the United States, meaning that other locations are poorly represented and lineages are missed. ARTIC network methods are helping more groups establish real-time genome sequencing capacity to assist the international effort and reduce the sampling bias.
The larger goal of this work is to guide public health policy. As many areas are now trying to ease lockdown restrictions, countries such as China and South Korea have shown that testing, tracing, and isolation are highly effective measures for controlling the spread of the novel coronavirus. In addition to drawing up a list of contacts a person with Covid-19 might have had, and reaching out to each one, genomic surveillance of individual cases can help fine-tune the picture, separating cases that look identical but may have different origins. Samples collected via home testing kits can be sequenced to detect unknown clusters of related cases, and the technology can also be used to confirm presumptive clusters, such as outbreaks in nursing homes or factories.
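As a rough illustration of how sequence data can separate cases that look alike on paper, the Python sketch below groups cases whose viral genomes differ by at most one mutation. Everything in it is hypothetical: the function name `cluster_cases`, the case names, the mutation labels, and the one-mutation threshold are assumptions made for the example, and real investigations combine phylogenetic analysis with epidemiological information rather than a single cutoff.

```python
from itertools import combinations


def cluster_cases(cases: dict[str, set[str]], max_diff: int = 1) -> list[set[str]]:
    """Group cases whose mutation sets differ by at most max_diff mutations.

    Cases are linked pairwise, and linked cases are merged into one cluster.
    """
    parent = {name: name for name in cases}

    def find(name: str) -> str:
        # Follow parent links to the representative of this case's cluster.
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    for a, b in combinations(cases, 2):
        # Symmetric difference counts mutations present in only one genome.
        if len(cases[a] ^ cases[b]) <= max_diff:
            parent[find(a)] = find(b)

    groups: dict[str, set[str]] = {}
    for name in cases:
        groups.setdefault(find(name), set()).add(name)
    return list(groups.values())


# Hypothetical nursing-home outbreak; the mutation labels are placeholders.
cases = {
    "resident_1": {"m1", "m2", "m3"},
    "resident_2": {"m1", "m2", "m3"},
    "staff_1": {"m1", "m2", "m3", "m4"},
    "visitor_1": {"m1", "m7", "m8", "m9"},
}
clusters = cluster_cases(cases)
print(clusters)
```

Here the two residents and the staff member fall into one cluster, while the visitor’s virus is genetically distinct, which would suggest a separate source of infection even though all four cases came to light at the same place.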
Don’t get me wrong: Genomic surveillance is no substitute for traditional epidemiologic methods, which involve painstaking investigation of all cases. But it can help make the tracing process more efficient. Deploying mini-sequencers in laboratories across the country and analyzing samples in real time will enhance our collective understanding of how the infection picture is changing. My hope is that more and more institutions will begin to adopt this method in both populated and remote areas.
An exciting but further-off application of genomic sequencing is prediction. Someday, DNA sequencers could become so ubiquitous they’d be embedded in people’s smartphones, continually analyzing DNA or RNA from the environment. Potential outbreaks could then be detected before they even get off the ground.
Some have argued for broad genomic surveys to detect animal viruses that have the potential to infect people, but the sheer diversity of viruses in the animal population makes this a big undertaking. If a system had been set up to detect these viruses in advance of the Covid-19 outbreak in Wuhan, China, it might have prevented the pandemic. Although establishing such a sentinel system to detect emerging infectious diseases will require significant investment, I expect this to be a focus of research and development in the wake of the pandemic.
Josh Quick is a UK Research and Innovation Future Leaders fellow at the Institute of Microbiology and Infection at the University of Birmingham.