The first ‘gap-free’ human genome now completed

Published 05 April 2022

Two decades after scientists first announced a reading of the human genome, a US consortium has unveiled a complete version that covers the final 8% of previously unread DNA and provides a gapless reference for studying human genomic variation.

Twenty-one years after its ‘completion’, the human genome has finally been completed, with publication of a full gapless sequence of the bases in our DNA that covers each chromosome from end to end. The original publication of 2001 apparently mapped only 92% of the human genome; the final 8% includes numerous genes and much repetitive DNA, most of it located around the telomeres and centromeres of each chromosome.

This much-heralded completion of the human genome sequence was the work of a US collaboration known as the Telomere to Telomere (T2T) consortium, with six groups reporting their work in a bumper issue of the journal Science on 1 April.(1) The complete mapping of all nucleotides in our DNA, providing the first comprehensive view of the human DNA blueprint, ‘is critical for understanding the full spectrum of human genomic variation and for understanding the genetic contributions to certain diseases’, reports the journal. The new additions will also add significantly to our understanding of chromosomes and how they segregate and divide.

The new T2T genome has now brought to light millions of gene variations — stretches of DNA that differ from person to person — not identified in the original reference. ‘In the future,’ the T2T consortium’s co-chair Adam Phillippy of the US National Human Genome Research Institute said in a press statement, ‘when someone has their genome sequenced, we will be able to identify all of the variants in their DNA and use that information to better guide their healthcare.’ He added that sequencing a person’s entire genome should become less expensive and more straightforward in the coming years.

The new reference genome, called T2T-CHM13, adds nearly 200 million base pairs of new DNA sequences, including 99 genes likely to code for proteins and nearly 2000 candidate genes for further study. The gaps now filled by the new sequence include the entire short arms of five human chromosomes and cover some of the most complex regions of the genome. These include highly repetitive DNA sequences found in and around chromosomal structures, notably the telomeres located at the ends of chromosomes and the centromeres that co-ordinate the separation of replicated chromosomes during cell division. The new sequence also reveals previously undetected segmental duplications, long stretches of DNA that are duplicated in the genome and known to play important roles in evolution and disease.

The many press statements accompanying the announcement explain that completion of the project was made possible by improved techniques for sequencing long stretches of DNA at a time. These allowed the accurate mapping of highly repetitive lengths of DNA, which was not possible with the ‘short-read’ technology of the original project.
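Why read length matters for repetitive DNA can be illustrated with a toy example. The sketch below is not the consortium's method: the sequences, the names `TOY_GENOME` and `ambiguous_fraction`, and the read lengths are all invented for illustration, and it assumes error-free reads and exact matching. It simply counts how many reads of a given length could be placed at more than one position in a short made-up sequence containing a tandem repeat.

```python
def count_occurrences(genome: str, read: str) -> int:
    """Count exact matches of read in genome, including overlapping ones."""
    count = 0
    start = genome.find(read)
    while start != -1:
        count += 1
        start = genome.find(read, start + 1)
    return count


def ambiguous_fraction(genome: str, read_length: int) -> float:
    """Fraction of all possible error-free reads of this length
    that match the genome at more than one position."""
    starts = range(len(genome) - read_length + 1)
    ambiguous = sum(
        1
        for s in starts
        if count_occurrences(genome, genome[s:s + read_length]) > 1
    )
    return ambiguous / len(starts)


# Hypothetical toy "chromosome": two unique flanks around a tandem repeat.
LEFT_FLANK = "ACGTTGCAAGCTTAGGCTAC"
REPEAT_UNIT = "GATTACA"
RIGHT_FLANK = "TTCGGAACCTGAGTCCATGA"
TOY_GENOME = LEFT_FLANK + REPEAT_UNIT * 10 + RIGHT_FLANK

# Compare short reads with reads longer than the whole repeat (70 bases).
for length in (5, 20, 80):
    print(length, round(ambiguous_fraction(TOY_GENOME, length), 2))
```

In this toy case, most short reads fall entirely inside the repeat and match several positions, whereas every 80-base read reaches into a unique flank and can be placed at only one position. Real genomes, sequencing errors and assembly software are far more complicated, but the underlying principle is the same.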

An introductory ‘perspective’ in the journal notes that the new genomic reference will allow the assembly of models that represent all humans, which will ‘better support personalized medicine, population genome analysis and genome editing’.

1. Filling the gaps. Science special issue, 1 April 2022
