
An international team of scientists says it has sequenced and assembled the entirety of the human genome, including parts that were missed in the sequencing of the first human genome two decades ago.

The claim, if confirmed, surpasses the achievement laid out by leaders from the Human Genome Project and Celera Genomics on the White House lawn in 2000, when they announced the sequencing of the first draft human genome. That historic draft, and subsequent human DNA sequences, have all missed about 8% of the genome.


The sequencing of the new genome fills in these gaps using new technology. It has different limitations, however, including the type of cell line that the researchers used in order to speed up their effort.

The work was detailed May 27 in a pre-print, meaning it has not yet been peer-reviewed.

“You’re just trying to dig into this final unknown of the human genome,” said Karen Miga, a researcher at the University of California, Santa Cruz, who co-led the international consortium that created the sequence. “It’s just never been done before and the reason it hasn’t been done before is because it’s hard.”


Miga emphasized that she won’t consider the announcement official until the paper is peer-reviewed and published in a medical journal.

The new genome is a leap forward, researchers say, that was made possible by new DNA sequencing technologies developed by two private sector companies: Pacific Biosciences of Menlo Park, Calif., also known as PacBio, and Oxford Nanopore, of Oxford Science Park, U.K. Their technologies for reading out DNA have very specific advantages over the tools that have long been considered researchers’ gold standards.

Ewan Birney, the deputy director general of the European Molecular Biology Laboratory, called the result “a technical tour de force.” The original genome papers were carefully worded because they did not sequence every DNA molecule from one end to the other, he noted. “What this group has done is show that they can do it end-to-end.” That’s important for future research, he said, because it shows what is possible.

George Church, a Harvard biologist and sequencing pioneer, called the work “very important.” He said he likes to note in his talks that up until now no one has sequenced the entire genome of a vertebrate — something that is no longer true, if the new work is confirmed.

One important and unanswered question: How important are these missing pieces of the human puzzle? The consortium said that it increased the number of DNA bases from 2.92 billion to 3.05 billion, a 4.5% increase. But the count of protein-coding genes increased by just 0.4%, to 19,969. That doesn’t mean, researchers emphasized, that the work couldn’t also lead to other new insights, including those related to how genes are regulated.
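For readers who want to check those percentages, here is a minimal sketch of the arithmetic in Python. The function name is purely illustrative, and the earlier gene count is back-calculated from the rounded 0.4% figure, so treat it as an approximation rather than a number reported by the consortium.

```python
def percent_increase(old, new):
    """Relative growth from old to new, expressed in percent."""
    return (new - old) / old * 100

# Base counts quoted in the article.
base_growth = percent_increase(2.92e9, 3.05e9)
print(f"Bases: +{base_growth:.2f}%")  # consistent with the rounded 4.5% figure

# Assumption: inverting the rounded 0.4% figure gives only a rough
# estimate of the earlier protein-coding gene count.
new_genes = 19_969
implied_old_genes = new_genes / 1.004
print(f"Implied earlier gene count: about {implied_old_genes:.0f}")
```

The two results sit side by side to make the article's point: the sequence grew by several percent, while the gene count barely moved.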

The DNA sequence used was not from a person, but from a hydatidiform mole, a growth in a woman’s uterus caused when sperm fertilized an egg that did not have a nucleus. This meant that it contained two copies of the same 23 chromosomes, instead of two differing sets of chromosomes, as normal human cells do.

The researchers chose these cells, which had been kept in a lab, because this made the computational effort of creating the DNA sequence simpler. The original draft genome created in 2003 also contained only 23 chromosomes, but as technologies for DNA sequencing have become cheaper and simpler, researchers have tended to sequence all 46 chromosomes.

Elaine Mardis, co-executive director of the Institute for Genomic Medicine at Nationwide Children’s Hospital, worried that because these cell lines were kept in the lab, potentially mutating, the new genetic information “may be largely the detritus that accumulates as a cell line is propagated over many years in culture.”

Miga said that studies of the cell line had shown it to be similar to human cells, and that the researchers used cells that had been kept frozen, not propagated for many years. “We went to great lengths in the preprints to demonstrate that these new sequences serve as biological reference for human genomes,” Miga wrote in an email. She agreed the next step was for the group to try to sequence all 46 chromosomes, known as a diploid genome.

Why did it take 20 years for this last 8% of the genome to be sequenced, even as the cost of sequencing the rest of the genome dropped from $300 million to as little as $300? The answer has to do with the way DNA sequencing technologies work.

The current workhorse DNA sequencers, made by Illumina, take little fragments of DNA, decode them, and reassemble the resulting puzzle. This works fine for most of the genome, but not in areas where DNA code is the result of long repeating patterns. If a supercomputer only had small fragments, how could it assemble a DNA sequence that repeated “AGAGAGA” for bases upon bases? That’s what the missing 8% of the genome looked like.
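The repeat problem can be illustrated with a toy example. The Python sketch below is an assumption-laden simplification, not how any real assembler works: it treats a "genome" as a plain string, takes every possible read of a fixed length, and ignores read depth and sequencing errors. The flanking sequences, repeat counts, and function name are all made up for illustration.

```python
def read_set(genome, read_len):
    """All distinct reads of length read_len that the genome can produce."""
    return {genome[i:i + read_len] for i in range(len(genome) - read_len + 1)}

# Hypothetical flanking sequences around an "AG" repeat.
left, right = "TTCGA", "CCTAG"
short_repeat = left + "AG" * 10 + right  # 20 repeated bases
long_repeat = left + "AG" * 50 + right   # 100 repeated bases

# Reads shorter than the repeat: both genomes yield exactly the same
# pieces, so the pieces alone cannot reveal how long the repeat is.
same_short = read_set(short_repeat, 8) == read_set(long_repeat, 8)

# Reads long enough to span the whole 20-base repeat plus some of both
# flanks: the two genomes now produce different reads.
same_long = read_set(short_repeat, 28) == read_set(long_repeat, 28)

print(same_short, same_long)
```

In this toy setup the short reads are indistinguishable between the two genomes, while a read that crosses the entire repeat and touches unique sequence on both sides tells them apart. That is the intuition behind using longer reads for repetitive regions.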

Among these “unmappable” regions was one of the most recognizable structures in biology. If you’ve ever looked at chromosomes (think back to high school biology), they look like strings that have been knotted together. Those knots are centromeres, bundles of DNA that hold the chromosomes together. They play a key role in cell division. And they are full of repeats.

It was the centromeres, in fact, that drew Miga to want to see these missing regions.

“Why are the regions that are so fundamental to life, so fundamental to how the cell operates, positioned over parts of our genome that are these giant seas of tandem repeats?” she remembers asking as a grad student.

It was that question that led her, in discussion with Adam Phillippy, a researcher at the National Institutes of Health, to propose in 2019 their current initiative, the Telomere 2 Telomere Consortium, named after the telomeres, which are the ends of the chromosomes. They signed on Evan Eichler, a University of Washington biologist who had been worried about the missing parts of the genome for years, as a co-author.

The work was possible because the Oxford Nanopore and PacBio technologies do not cut the DNA up into tiny puzzle pieces. The Oxford Nanopore technology runs a DNA molecule through a tiny hole, resulting in a very long sequence. The PacBio tech uses lasers to examine the same sequence of DNA again and again, creating a readout that can be highly accurate. Both are more expensive than the existing Illumina technology.

The companies are in a heated race. For this project, the researchers say, the PacBio technology’s accuracy proved invaluable, and they used Oxford Nanopore to finish up some areas. But Oxford Nanopore has already been promising new, more usable tech. “In the here and now, PacBio has the advantage but it’s not clear how long they’ll be able to keep it,” said Michael Schatz, an associate professor at Johns Hopkins University.

All the researchers spoke of a vision of the future where instead of using a single reference genome, they would assemble hundreds of different, complete genomes that are interlinked and ethnically diverse, and can be used as references. Miga is helping lead that work, as well. And this is just a step in that direction.

But until now, Schatz says, there have always been questions about what was missing. “Now finally we have the right data,” he said. “We have the right technology.”

Correction: A previous version of this story incorrectly described the chromosomes of a hydatidiform mole.

  • Dear Matthew,
    Your article cites the following: 
    If a supercomputer only had small fragments, how could it assemble a DNA sequence that repeated “AGAGAGA” for bases upon bases? That’s what the missing 8% of the genome looked like.
    What do you mean by “if a computer only had small fragments” ? I don’t understand why a supercomputer should have trouble with analyzing repeated segments?
    Kind regards, Shahroch Nahrwar

    • I believe the problem lies in the computer not knowing which sequence correctly follows the repeats. Read assemblies are done by matching overlaps in the sequences, which may result in erroneous scaffolds. This would confuse the software down the line. At least, this is the most common problem when it comes to read assemblies.

      Recent tech by PacBio for example sequences the DNA in its entirety without the need for fragmenting, which solves this issue. It’s remarkable, really.

  • illumina technology and PacBio with Oxford Nanopore technology are making us dream the human/human
    Parts and whole body development
    From simpler Genomic structures we are amazed with the development!

  • Worked at UCSC in the Natural Sciences Electronics Shop from 1985 until I retired in 2012. Our shop started Ethernet on campus, so later, after the IT department took over network management, we still were allowed to activate Ethernet ports and provide cables, small hubs, and then switches to the professors and their research groups. Buying cables and switches in bulk allowed us to have some available to “lend” to the Human Genome Project so they could build their supercluster of borrowed Dell desktops.
    I was also consulted by Mark Akeson about electronic techniques for gathering data in their prototype nanopore DNA sequencer. Not sure of the details on this, but that may be the technology that Oxford Nanopore is using.
    It is very satisfying to me that I was able to make tiny contributions to both these breakthroughs. And because of the work of so many others, these breakthroughs will continue to advance and improve our knowledge of life.

  • The hydatidiform mole used in this research is very unusual because of its 23 chromosomes. Most hydatidiform moles have the normal complement of 46 chromosomes and less often 69 chromosomes–46 from an abnormal sperm and 23 from the egg.
    23 chromosomes is quite rare because it means the egg was abnormal and had no nucleus or chromosomes of its own.

    I see this advance as useful but still limited because the source is just one-half of one man’s DNA.
