When the human genome was sequenced a decade ago, scientists hailed the feat as a technical tour de force — but they also knew it was just a start. Our DNA blueprint was finally laid bare, but no one knew what it all meant.
Now an international team has taken the crucial next step by delivering the first in-depth report on what the endless loops and lengths of DNA inside our cells are up to.
The findings, reported Wednesday in a slew of papers in Nature, Science and other journals, move far beyond a straightforward list of genes. They tally, in a super-complicated catalog, all the places along our DNA strands that are biochemically active: sites where proteins attach to DNA to control it, where enzymes move in and make little alterations, and more besides.
Defining this hive of activity is essential, scientists said, because it transforms our picture of the human blueprint from a static list of 3 billion pairs of DNA building blocks into the dynamic master-regulator that it is.
The revelations will be key to understanding how genes are precisely controlled so that they leap into action at the right place and time in our bodies, allowing a whole, healthy human being to develop from a single fertilized egg. In addition, they will help explain how the carefully choreographed process can go awry, causing birth defects, diseases and aging.
“The human genome was a bit like getting ‘War and Peace’ in Russian. It’s a great book containing all of human experience, but [if] I don’t know any Russian it’s very hard to read,” said Ewan Birney, a computational biologist at the European Bioinformatics Institute in England and lead analysis coordinator for the project, which is known as ENCODE. The aim, he said, “is to take the human genome and try to make a usable translation.”
The $123-million effort involved more than 400 scientists and more than 1,600 experiments during five years of work. If presented graphically, the data generated so far would cover a poster 30 kilometers wide and 16 meters high, Birney has estimated.
This is still just a start — akin to “grainy images beamed back to Earth by the first satellite,” said Dr. Eric Green, director of the National Human Genome Research Institute, which funded ENCODE. But already, it’s throwing up surprises.
Strikingly, the data overturn old ideas that the bulk of DNA in our cells is useless, albeit inoffensive, junk just carried along for the evolutionary ride. Back in 2003, when the human genome was published, scientists estimated that less than 2% of it carries instructions for making proteins, and many of them thought the rest didn’t do very much.
But the new analysis shows that more than 80% of the human genome is active in at least one biological process that the ENCODE team measured. Nearly every part of it could end up being active when the data are more complete.
A huge chunk of that activity has to do with gene regulation: dictating whether the instructions each gene carries for making a unique protein will be executed or not. That is key, because pretty much every cell in our bodies carries the entire set of 21,000 genes. To adopt its unique identity, each cell, be it one in the pancreas that makes insulin or one in the skin making pigment or hair, activates only a subset.
Using a variety of laboratory methods and more than 150 types of human cells, the scientists found and mapped a total of almost half a million DNA sites that act as “switches,” turning genes off or on in one cell or another, at various times and intensities.
“You can’t move for switches,” Birney said.
The switches are activated or suppressed when master-regulator proteins bind to them or when chemical “tags” like methyl groups are added. Some of them are right where scientists would expect them to be, near the genes they control, but some are extremely far away, the researchers found.
Though that came as a bit of a surprise, it makes sense, said molecular geneticist Joseph Ecker of the Salk Institute for Biological Studies in La Jolla, who was not involved in ENCODE but wrote a commentary accompanying the report.
“We draw DNA out as this long, linear thing where you can read from one end to the other, but the reality in the cell is that [the] molecule is folded tightly and compactly and jammed into the nucleus of the cell,” Ecker said. When our DNA is crunched up that way, like a hairball, places far apart on a strand could end up very close to each other in physical space.