Twenty-one years ago, researchers announced the first “draft” sequence of the
human genome. It was a monumental achievement, but the sequence was still
missing about 8 percent of the genome. Now, scientists working together around
the world say they’ve finally filled in that elusive 8 percent.
If their work holds up to peer review and it turns out they really did sequence
and assemble the human genome in its entirety, former gaps and all, it could
change the future of medicine.
Sequencing the human genome has long been a huge project with worthy goals.
Why? Because as researchers understand the genetic code better, they can make
better, more customized medicines, including the kind of gene-focused medicine
that powered the first effective COVID-19 vaccines.
Humans have
46 chromosomes, in 23 pairs, that represent tens of thousands of individual
genes. Each gene consists of some number of base pairs made of adenine (A),
thymine (T), guanine (G), and cytosine (C). There are billions of base pairs in
the human genome.
In June
2000, the Human Genome Project (HGP) and private company Celera Genomics
announced that first “draft” of the human genome. This was the result of years
of work that picked up the pace as humans continued to make better computers
and algorithms for processing the genome. At the time, scientists were
surprised that across more than 3 billion individual “letters” of DNA, they
estimated humans have just 30,000 to 35,000 genes. Today, that estimate is far
lower, hovering around 20,000.
Three years later, HGP completed its mission to map the whole human genome and defined its terms this way:
“‘Finished sequence’ is a technical term meaning that the sequence is highly accurate (with fewer than one error per 10,000 letters) and highly contiguous (with the only remaining gaps corresponding to regions whose sequence cannot be reliably resolved with current technology).”
“Current technology” is doing a lot of heavy lifting here. At the time, HGP
relied on bacterial artificial chromosomes (BACs): scientists used bacteria to
clone pieces of the genome, and then studied those pieces in smaller groups. A
complete “BAC library” is 20,000 carefully prepared bacteria with cloned genes
inside.
But that BAC
process inherently misses some portions of the whole genome. The reason why is
a great lead-in to what the new team of scientists has helped to accomplish.
A Sequencing Breakthrough
What’s lurking in the secretive 8 percent of the genome that the 2000 “draft”
left untouched? The base pairs in this section form many, many repeated
patterns, which made the region too unwieldy to study using the bacterial
cloning method.
BAC and other approaches just weren’t right for the repeats-heavy remaining 8 percent of the genome. “The current workhorse DNA sequencers, made by Illumina, take little fragments of DNA, decode them, and reassemble the resulting puzzle,” Stat’s Matthew Herper reports. “This works fine for most of the genome, but not in areas where DNA code is the result of long repeating patterns.”
That makes
intuitive sense; imagine counting from 1 to 50 versus simply counting 1, 2, 1,
2, . . . over and over again. Part of what made the BAC method successful is
scientists took care to minimize and match up the overlaps, which became almost
impossible in the repeats-heavy unexplored portion of the genome.
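The counting analogy can be made concrete with a minimal Python sketch. Everything in it is invented for illustration: the toy "genome," the read lengths, and the helper name `count_placements` are assumptions, and real assembly software is far more sophisticated than exact string matching.

```python
def count_placements(genome: str, read: str) -> int:
    """Count every position where `read` matches `genome` exactly (overlaps allowed)."""
    return sum(
        1
        for start in range(len(genome) - len(read) + 1)
        if genome[start:start + len(read)] == read
    )


# Toy "genome": unique sequence on both sides of a long two-letter repeat.
unique_left = "GATTACAGGC"
repeat_block = "AT" * 20  # 40 letters of ATAT...
unique_right = "CCGTTAGCAA"
genome = unique_left + repeat_block + unique_right

# A short read taken from inside the repeat fits in many places.
short_read = "ATATAT"

# A long read spanning the whole repeat plus part of each flank fits in one.
long_read = genome[5:55]

print(count_placements(genome, short_read))  # 18 possible positions
print(count_placements(genome, long_read))   # 1 possible position
```

The short read could belong at any of 18 positions, so there is no way to tell where it came from or how long the repeat really is. The long read reaches the unique sequence on both sides of the repeat, which pins it to a single spot.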
So, what’s different in the new approaches? Let’s first look at what they are. The California-based Pacific Biosciences (PacBio) and the U.K.-based Oxford Nanopore have different technologies, but are racing toward the same goal.
PacBio uses a system called HiFi, where DNA fragments are joined into loops,
literally circles, and read around and around until the sequence has been read
in full and in high fidelity, hence the name. The system dates back just a few
years and represents a big step forward in both length and accuracy for those
longer sequences.
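Why reading the same circle repeatedly improves accuracy can be shown with a toy consensus. This is only a sketch under stated assumptions: the passes below are invented, each pass is assumed to be already aligned and equal in length, and a simple per-position majority vote (the hypothetical `majority_consensus` helper) stands in for whatever PacBio's software actually does.

```python
from collections import Counter


def majority_consensus(passes: list[str]) -> str:
    """Per-position majority vote across equal-length, pre-aligned passes."""
    if not passes:
        raise ValueError("need at least one pass")
    length = len(passes[0])
    if any(len(p) != length for p in passes):
        raise ValueError("toy model assumes all passes have equal length")
    # zip(*passes) yields one column of letters per position in the sequence.
    return "".join(
        Counter(column).most_common(1)[0][0]
        for column in zip(*passes)
    )


# Five invented passes over the same circle; three contain one error each.
passes = [
    "GATTACAGATTACA",
    "GATTACAGATTGCA",  # error at index 11
    "GATCACAGATTACA",  # error at index 3
    "GATTACAGATTACA",
    "GATTACATATTACA",  # error at index 7
]

consensus = majority_consensus(passes)
print(consensus)  # GATTACAGATTACA
```

Because the errors land in different places on different passes, each one is outvoted by the correct letter from the other passes.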
Oxford Nanopore, meanwhile, uses electrical current in its proprietary devices.
Strands of DNA are threaded through a microscopic nanopore, just one molecule
at a time, while a current runs through the pore. Changes in that current
reveal which bases are passing through, letting scientists read the full
strand.
In the new study posted to the biology preprint server bioRxiv, an
international consortium of about 100 scientists used both PacBio and Oxford
Nanopore technologies to chase down some of the remaining unknown sections of
the human genome.
The amount of
ground the consortium covered is staggering. “The consortium said that it
increased the number of DNA bases from 2.92 billion to 3.05 billion, a 4.5
[percent] increase. But the count of genes increased by just 0.4 [percent], to
19,969,” Stat reports. This shows how big the heavily repeating base pair
sequences in this zone are compared to the genes they represent.
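The percentages in that quote can be checked with quick arithmetic. The sketch below uses only the rounded figures Stat reported. The article does not give the earlier gene count, so the number of newly counted genes is back-calculated here as a rough estimate, not a reported figure.

```python
# Figures as quoted from Stat; both are rounded in the source.
old_bases = 2.92e9
new_bases = 3.05e9

added_bases = new_bases - old_bases
base_increase_pct = added_bases / old_bases * 100
print(f"Added bases: {added_bases:,.0f}")
print(f"Base increase: {base_increase_pct:.2f}%")

# The article gives only the new gene count and the percentage increase.
# The number of newly counted genes is back-calculated from those two values
# and is an estimate, not a reported figure.
new_genes = 19_969
gene_increase_pct = 0.4
implied_new_genes = new_genes - new_genes / (1 + gene_increase_pct / 100)
print(f"Implied newly counted genes: about {implied_new_genes:.0f}")
```

The base increase works out to about 4.45 percent, which matches the quoted 4.5 after rounding. Roughly 130 million added bases against something on the order of 80 additional genes is what makes the newly sequenced regions look so repeat-heavy.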
The Missing Links
Sequencing godfather George Church, a biologist at Harvard University, told
Stat that if this work passes peer review, it will be the first time any
vertebrate genome has been fully mapped. The reason seems to be simply that
both new technologies allow very long strings of base pairs to be read at once.
Why is the missing gene information so important? Well, the study of genes
experiences a lot of favoritism, with a handful of the most popular genes
taking up the bulk of research interest and funding. The overlooked genes may
hold many key mechanisms that cause disease, for example.
There’s one little snag, although it was also a snag for the 2000 announcement
of the first draft of the genome. Both projects studied cells that had just 23
chromosomes instead of the full 46. That’s because they used cells derived from
the reproductive system, where eggs and sperm each carry half of a full
chromosomal load.
The cell is
from a hydatidiform mole, a kind of reproductive growth that represents an
extremely early, unviable union between a sperm and an egg cell that has no
nucleus. Choosing this kind of cell, which has been kept and cultured as a
“cell line” used for research purposes, cuts the huge sequencing job in half.
The next step is for the study to appear in a peer-reviewed publication. After
that, though, both PacBio and Oxford Nanopore seek to sequence the entire
46-chromosome human genome. But we might be waiting a while.