Annual Report 2012

What is Whole Genome Sequencing (WGS)?

Rudolph Tanzi, Ph.D., Chairman, Research Consortium

 

Approximately 35,000 genes provide the blueprints for the roughly 350,000 proteins made in the human body. Genes are chemically made of deoxyribonucleic acid (DNA), whose double-helix structure was first described by James Watson and Francis Crick in 1953. DNA is packaged into 46 chromosomes, such as the X-shaped one on page 7. Chromosomes are found in the nucleus at the center of the cell (in this case, a nerve cell). There are two sets of 23 chromosomes: chromosomes 1–22 plus either an X or a Y. Each parent gives you one set, making a total of 46 per cell.

The structure of DNA is a double helix built from chemical units called nucleotides (bases). The bases are named by letters: A (red), T (green), C (blue) and G (yellow). A on one strand always pairs with T on the other, and C always with G. These pairings are called “base pairs.” The human genome consists of about 3 billion of these base pairs of DNA (counting one set of 23 chromosomes). Whole Genome Sequencing involves determining the base pairs in a subject’s genome as they are laid out, in order, across the chromosomes. This sequential determination of the base pairs is called “sequencing” of the DNA.
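The pairing rule described above (A with T, C with G) can be written as a small lookup table. The following is only an illustrative sketch: the function name `complementary_strand` and the example sequence are invented for this illustration and are not part of any sequencing software.

```python
# Base-pairing rules from the text: A pairs with T, and C pairs with G.
PAIRS = {"A": "T", "T": "A", "C": "G", "G": "C"}


def complementary_strand(strand):
    """Return the base found opposite each position of the given strand.

    The result is listed position by position; it is not reversed, so it
    is the partner strand read alongside the original one.
    """
    return "".join(PAIRS[base] for base in strand.upper())


# A made-up six-base stretch of DNA, used only as an example.
example = "ATCGGA"
print(example)                        # ATCGGA
print(complementary_strand(example))  # TAGCCT
```

Because each base has exactly one partner, knowing one strand is enough to know the other, which is why sequencing only needs to record one letter per base pair.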

To get an idea of the process, picture recording the order of 3 billion red, yellow, green and blue beads on a string. Roughly 4 percent of the DNA in humans is in genes. Most, but not all, genes provide the blueprint used by cells of the body to make a protein. A protein is a chain built from 20 different amino acids arranged in different combinations; a given protein may contain thousands of them. To make a protein, the DNA of a gene in the cell’s nucleus is first copied into RNA, which carries the “genetic code” for that protein. Proteins are then assembled in RNA-based factories called “ribosomes,” which read the code from this RNA copy and use it to guide the assembly of the protein from amino acids. The protein is later processed in various parts of the cell to achieve its final configuration. Some proteins serve as the building blocks of the body, while others carry out specific functions.
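The path from DNA to RNA to protein can be sketched in a few lines of code. This is a simplified illustration under stated assumptions: the gene sequence is made up, the table holds only a handful of entries from the standard genetic code (the full code has 64 three-letter "codons"), and copying DNA into RNA is modeled simply as replacing T with U on the coding strand. The names `transcribe` and `translate` are chosen for this example only.

```python
# A small subset of the standard genetic code: each three-letter RNA
# "codon" names one amino acid, or signals the end of the protein.
CODON_TABLE = {
    "AUG": "Met", "UUU": "Phe", "GGC": "Gly", "AAA": "Lys", "UGG": "Trp",
    "UAA": "STOP", "UAG": "STOP", "UGA": "STOP",
}


def transcribe(coding_dna):
    """Copy DNA into RNA (simplified: RNA uses U wherever DNA has T)."""
    return coding_dna.upper().replace("T", "U")


def translate(mrna):
    """Read the RNA three letters at a time, as a ribosome does."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "STOP":
            break  # a stop codon ends the chain
        protein.append(amino_acid)
    return protein


# Hypothetical gene fragment, invented for illustration.
gene = "ATGTTTGGCAAATGGTAA"
rna = transcribe(gene)
print(rna)             # AUGUUUGGCAAAUGGUAA
print(translate(rna))  # ['Met', 'Phe', 'Gly', 'Lys', 'Trp']
```

A real protein is built the same way, only from a much longer message and the complete codon table.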

While most genes provide the template for a protein, some regulate the activity of other genes without making proteins, e.g., by making only RNA. For example, some genes make “microRNAs” that can control the activity of other genes. About 96 percent of the DNA in the human genome sits between the genes and is called “intergenic” DNA. This used to be called “junk” DNA, but the recent international “ENCODE” project reported that most of this DNA is not junk at all, since it helps regulate the genes themselves.
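To put the percentages in this article into base pairs, the short calculation below applies the 4 percent (in genes) and 96 percent (intergenic) figures to the 3 billion base pairs mentioned earlier. It is rough arithmetic on the rounded numbers used in the text, not a measurement; the helper name `share_of_genome` is made up for this example.

```python
# Rounded figures taken from the text.
GENOME_BASE_PAIRS = 3_000_000_000
PERCENT_IN_GENES = 4
PERCENT_INTERGENIC = 96


def share_of_genome(percent, total=GENOME_BASE_PAIRS):
    """Return how many base pairs a given percentage of the genome covers."""
    return total * percent // 100


genic = share_of_genome(PERCENT_IN_GENES)
intergenic = share_of_genome(PERCENT_INTERGENIC)
print(f"In genes:   {genic:,} base pairs")       # In genes:   120,000,000 base pairs
print(f"Intergenic: {intergenic:,} base pairs")  # Intergenic: 2,880,000,000 base pairs
```

On these figures, sequencing only the genes would leave nearly 2.9 billion base pairs unread, which is why the whole genome is sequenced.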

In our Whole Genome Sequencing project, we will identify functional DNA variants throughout the human genome that are inherited as risk factors for Alzheimer’s disease. This study constitutes Phase III of the Alzheimer’s Genome Project™ and is aimed at identifying all of the DNA variants in the genome (in genes as well as in the intergenic regions) that directly influence risk for the disease. The Whole Genome Sequencing study will also tell us how DNA variants in the intergenic portions of the genome regulate the activities of the Alzheimer’s genes. As part of our road map, these findings will then be used not only to better understand the causes of Alzheimer’s disease at the genetic level, but also to guide drug discovery efforts to slow down, stop or, hopefully, even reverse Alzheimer’s disease pathology.