One could be forgiven for a little genetic déjà vu.
Launched in 1990, the Human Genome Project unveiled its first readout of the human DNA sequence with great fanfare in 2000. The human genome was declared essentially complete in 2003—but it took nearly 20 more years before the final, complete version was released.
This did not mark the end of humankind’s genetic puzzle, however. A new study has mapped the yawning gap between reading our genes and understanding them. Vast parts of the genome—areas the study authors have nicknamed the “Unknome”—are made of genes whose function we still don’t know.
This has important implications for medicine: Genes are the instructions for making the protein building blocks of the body. Plenty of those still shrouded in darkness could have profound medical significance and may hold the keys to disorders of development, cancer, neurodegeneration, and more.
The study makes it embarrassingly clear just how many important genes we know little to nothing about. It estimates that a fifth of human genes with a vital function are still essentially a mystery. The good news is that the research also outlines how scientists can focus on those mystery genes. “We might now be at the beginning of the end of the Unknome,” says Matthew Freeman of the Dunn School of Pathology at the University of Oxford, a coauthor of the study.
The research team used two tools to find the gaps in our knowledge. First, using the plethora of existing databases of genetic information, they compared the genetic codes of many different species to reveal genes that look roughly similar.
These riffs on a genetic theme are known as conserved genes, and even if we don’t understand what they do, we know that they must be important because nature is parsimonious and tends to use the same genetic machinery to do important jobs in different organisms. “The one thing we could be confident of is that, if important, these genes would be quite well-conserved across evolution,” says Freeman.
Once they had found similar genetic riffs in worms, humans, flies, bacteria, and other organisms, the researchers could look at what was known about the function of these clearly important genes and score them accordingly, with a high “knownness” score reflecting solid understanding.
Because so much genetic information is already available on hundreds of genomes and recorded in a standardized way, it was possible to automate this scoring process. “We then asked how many of those [conserved genes] have a score of less than one, where essentially nothing is known about them,” says Freeman. “To our surprise, two decades after the first human genome, it is still an extraordinary number.”
In all, the total number of human genes with a knownness score of 1 or less is currently 1,723 out of 19,664.
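To make the arithmetic concrete, here is a minimal sketch of the counting step, not the study's actual pipeline. The gene names, the toy scores, and the helper function are hypothetical; only the cutoff of 1 and the totals 1,723 and 19,664 come from the figures reported above.

```python
# Illustrative sketch only: gene names and scores below are made up,
# and poorly_known_genes() is a hypothetical helper, not the study's code.
KNOWNNESS_THRESHOLD = 1.0  # cutoff quoted in the article ("1 or less")


def poorly_known_genes(scores, threshold=KNOWNNESS_THRESHOLD):
    """Return, sorted by name, the genes whose score is at or below the threshold."""
    return sorted(gene for gene, score in scores.items() if score <= threshold)


# Toy knownness scores for five imaginary genes.
toy_scores = {
    "GENE_A": 0.0,
    "GENE_B": 0.5,
    "GENE_C": 1.0,
    "GENE_D": 12.0,
    "GENE_E": 47.5,
}

unknown = poorly_known_genes(toy_scores)

# Share implied by the reported totals: 1,723 of 19,664 human genes.
reported_share = 1723 / 19664

print(unknown)
print(f"{reported_share:.1%}")
```

On the reported totals, that works out to just under 9 percent of the human genes counted.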
At the other end of the scale, the 10 highest-scoring genes in the team’s rummage through genetic databases corresponded with “all the most famous genes, which is reassuring,” says Sean Munro of the Laboratory of Molecular Biology in Cambridge, a study coauthor. “We recognized every single one of them, and there are already thousands of papers about each of them.”
When it came to the substantial number that were unknown, the team conducted one more study, using the organism that is best understood at the genetic level: the fruit fly Drosophila melanogaster. Fruit flies have been the subject of research for more than a century because they are easy and inexpensive to breed, have a short life cycle, produce lots of young, and can be genetically modified in numerous ways.
The team used gene editing to dial down the use of around 300 low-scoring genes found in both humans and fruit flies. “We found that one-quarter of these unknown genes were lethal—when knocked out, they caused the flies to die, and yet nobody had ever known anything about them,” says Freeman. “Another 25 percent of them caused changes in the flies—phenotypes—that we could detect in many ways.” These genes were linked with fertility, development, locomotion, protein quality control, and resilience to stress. “That so many fundamental genes are not understood was eye-opening,” Freeman says. It’s possible that variation in these genes could have very big impacts on human health.
All of this “unknomics” information is held in a database, which the team is making available for other researchers to use to discover new biology. The next step may be to hand the data on these mystery genes, and on the mystery proteins they encode, over to AI.
DeepMind’s AlphaFold, for example, can provide important insights into what mystery proteins do, notably by revealing how they interact with other proteins, says Alex Bateman of the European Bioinformatics Institute, based near Cambridge, UK. So can cryo-EM, which is a way of producing images of large, complex molecules, he says. And a University College London team has shown a systematic way to use machine learning to figure out what proteins do in yeast.
The Unknome is unusual in that it’s a biology database that will shrink as we understand it better. The paper shows that over the past decade “we have moved from 40 percent to 20 percent of the human proteome having a certain level of unknownness,” says Bateman. However, at current progress rates, working out the function of all human protein-coding genes could take more than half a century, Freeman estimates.
The discovery that so many genes remain poorly understood reflects what is called the streetlight effect, or the drunkard’s search principle, an observational bias that occurs when people only search for something where it is easiest to look. In this case, it has caused what Freeman and Munro call a “bias in biological research toward the previously studied.”
The same goes for researchers, who tend to get funding for research in relatively well-understood areas, rather than going off into what Freeman calls the wilderness. This is why the database is so important, Munro explains—it fights back against the economics of academia, which avoids things that are very poorly understood. “There is a need for a different type of support to address these unknowns,” says Munro.
But even with the database becoming available and researchers picking through it, there will still be some knowledge blind spots. The study focused on genes that are responsible for proteins. Over the past two decades, uncharted areas of the genome have also been found to harbor the code for small RNAs—scraps of genetic material that can affect other genes, and which are critical regulators of normal development and bodily functions. There may be more “unknown unknowns” lurking in the human genome.
For now, there’s still plenty to get into, and Freeman hopes this work will encourage others to study the genetic Terra Incognita: “There’s more than enough Unknome for anyone who wants to explore genuinely new biology.”