Julianne Zedalis; John Eggebrecht

17.3 Whole-Genome Sequencing

Learning Objectives

In this section, you will explore the following questions:

What are three types of gene sequencing?
What is whole-genome sequencing?

Connection for AP^® Courses

Information presented in this section is not in scope for AP^®. However, you can study information in the section as optional or illustrative material.

Teacher Support

With older techniques, identification of pathogenic bacteria is a time-consuming process that may take days or weeks. Previously, identification of the tuberculosis bacterium could take up to six weeks. The development of DNA microarrays has enabled clinical laboratories to shorten that time to hours, with better specificity of the identification. This has provided physicians with the information they need to get patients on the most effective antibiotic therapy rapidly, providing better care and preventing the infectious agent from spreading to more hosts.

Although there have been significant advances in the medical sciences in recent years, doctors are still confounded by some diseases, and they are using whole-genome sequencing to get to the bottom of the problem. Whole-genome sequencing is a process that determines the DNA sequence of an entire genome. Whole-genome sequencing is a brute-force approach to problem solving when there is a genetic basis at the core of a disease. Several laboratories now provide services to sequence, analyze, and interpret entire genomes.

Whole-exome sequencing is a lower-cost alternative to whole-genome sequencing. In exome sequencing, only the protein-coding regions of the DNA (the exons) are sequenced. In 2010, whole-exome sequencing was used to save a young boy whose intestines had multiple mysterious abscesses. The child had several colon operations with no relief. Finally, whole-exome sequencing was performed, which revealed a defect in a pathway that controls apoptosis (programmed cell death). A bone-marrow transplant was used to overcome this genetic disorder, leading to a cure for the boy. He was the first person to be successfully treated based on a diagnosis made by whole-exome sequencing.

The Science Practice Challenge Questions contain additional test questions related to the material in this section that will help you prepare for the AP exam. These questions address the following standards:
[APLO 2.23][APLO 3.5][APLO 3.20][APLO 3.21]

Strategies Used in Sequencing Projects

The basic sequencing technique used in all modern-day sequencing projects is the chain termination method (also known as the dideoxy method), which was developed by Fred Sanger in the 1970s. The chain termination method involves DNA replication of a single-stranded template with the use of a primer and a regular deoxynucleotide (dNTP), which is a monomer, or a single unit, of DNA. The primer and dNTP are mixed with a small proportion of fluorescently labeled dideoxynucleotides (ddNTPs). The ddNTPs are monomers that are missing a hydroxyl group (–OH) at the site at which another nucleotide usually attaches to form a chain (Figure 17.12). Each ddNTP is labeled with a different color of fluorophore. Every time a ddNTP is incorporated in the growing complementary strand, it terminates the process of DNA replication, which results in multiple short strands of replicated DNA that are each terminated at a different point during replication. When the reaction mixture is processed by gel electrophoresis after being separated into single strands, the multiple newly replicated DNA strands form a ladder because of the differing sizes. Because the ddNTPs are fluorescently labeled, each band on the gel reflects the size of the DNA strand and the ddNTP that terminated the reaction. The different colors of the fluorophore-labeled ddNTPs help identify the ddNTP incorporated at that position. Reading the gel on the basis of the color of each band on the ladder gives the sequence of the newly synthesized strand, from which the complementary template sequence can be deduced (Figure 17.13).

A deoxynucleotide consists of a deoxyribose sugar, a base, and three phosphate groups. Dideoxyribose is identical to deoxyribose except that the hydroxyl (–OH) group at the 3' position is replaced by H. A 3' hydroxyl is necessary for elongation of the DNA chain, and the chain therefore stops growing if a dideoxyribose instead of deoxyribose is incorporated into the growing chain.

Figure 17.12 A dideoxynucleotide is similar in structure to a deoxynucleotide, but is missing the 3' hydroxyl group (indicated by the box). When a dideoxynucleotide is incorporated into a DNA strand, DNA synthesis stops.

The left part of this illustration shows a parent strand of DNA with the sequence GATTCAGC, and four daughter strands, each of which was made in the presence of a different dideoxynucleotide: ddATP, ddCTP, ddGTP, or ddTTP. The growing chain terminates when a ddNTP is incorporated, resulting in daughter strands of different lengths. The right part of this image shows the separation of the DNA fragments on the basis of size. Each ddNTP is fluorescently labeled with a different color so that the sequence can be read by the size of each fragment and its color.

Figure 17.13 Frederick Sanger's dideoxy chain termination method is illustrated. Using dideoxynucleotides, the DNA fragment can be terminated at different points. The DNA is separated on the basis of size, and these bands, based on the size of the fragments, can be read.
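The ladder-reading logic described above can be sketched in a few lines of Python. This is a simplified illustration, not a model of a real sequencing instrument: it assumes that every possible termination point yields exactly one fragment, it ignores the 5' to 3' orientation of the strands and pairs bases position by position, and the function names are hypothetical. The template GATTCAGC is the parent strand from the illustration.

```python
import random

COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

def daughter_fragments(template):
    """Return one chain-terminated daughter strand for each template position."""
    daughter = "".join(COMPLEMENT[base] for base in template)
    # A ddNTP can be incorporated at any position, so every prefix of the
    # daughter strand appears in the reaction mixture (simplifying assumption).
    return [daughter[:n] for n in range(1, len(daughter) + 1)]

def read_ladder(fragments):
    """Sort fragments by size, then read the labeled last base of each one."""
    ladder = sorted(fragments, key=len)  # electrophoresis separates by size
    return "".join(fragment[-1] for fragment in ladder)  # band color = final ddNTP

template = "GATTCAGC"
fragments = daughter_fragments(template)
random.Random(0).shuffle(fragments)  # the reaction mixture has no inherent order

daughter = read_ladder(fragments)
recovered = "".join(COMPLEMENT[base] for base in daughter)
print("Daughter strand read from ladder:", daughter)
print("Template deduced by complementing:", recovered)
```

Sorting by length stands in for gel electrophoresis, and taking the last base of each fragment stands in for detecting the color of each band.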

Early Strategies: Shotgun Sequencing and Pair-Wise End Sequencing

In the shotgun sequencing method, several copies of a DNA fragment are cut randomly into many smaller pieces (somewhat like what happens to a round shot cartridge when fired from a shotgun). All of the segments are then sequenced using the chain termination method. Then, with the help of a computer, the fragments are analyzed to see where their sequences overlap. By matching up overlapping sequences at the end of each fragment, the entire DNA sequence can be reconstructed. A larger sequence that is assembled from overlapping shorter sequences is called a contig. As an analogy, consider that someone has four copies of a landscape photograph that you have never seen before, so you know nothing about how it should appear. The person then rips up each photograph with their hands, so that different-sized pieces are present from each copy. The person then mixes all of the pieces together and asks you to reconstruct the photograph. In one of the smaller pieces you see a mountain. In a larger piece, you see that the same mountain is behind a lake. A third fragment shows only the lake, but it reveals that there is a cabin on the shore of the lake. Therefore, from looking at the overlapping information in these three fragments, you know that the picture contains a mountain behind a lake that has a cabin on its shore. This is the principle behind reconstructing entire DNA sequences using shotgun sequencing.
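The overlap-matching step can be illustrated with a short Python sketch. It uses a simple greedy strategy (repeatedly merge the two pieces with the longest end overlap), which is one possible approach chosen here for illustration and not necessarily what production assemblers do. The example reads, the function names, and the minimum overlap of three bases are all assumptions made for this sketch, and the reads are error-free and contain no repeats.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of a that matches a prefix of b."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0

def assemble(fragments, min_len=3):
    """Greedily merge the pair of fragments with the longest overlap."""
    pieces = list(fragments)
    while len(pieces) > 1:
        best_len, best_i, best_j = 0, None, None
        for i, a in enumerate(pieces):
            for j, b in enumerate(pieces):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best_len:
                        best_len, best_i, best_j = n, i, j
        if best_len == 0:
            break  # nothing overlaps; the remaining pieces are separate contigs
        merged = pieces[best_i] + pieces[best_j][best_len:]
        pieces = [p for k, p in enumerate(pieces) if k not in (best_i, best_j)]
        pieces.append(merged)
    return pieces

# Three hypothetical reads, listed in scrambled order.
reads = ["TGAAGTTCGA", "ATGGCGTACC", "GTACCTGAAG"]
contigs = assemble(reads)
print(contigs)
```

Each merge corresponds to fitting together two torn pieces of the photograph in the analogy: the shared region tells you how the two pieces sit relative to each other.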

Originally, shotgun sequencing only analyzed one end of each fragment for overlaps. This was sufficient for sequencing small genomes. However, the desire to sequence larger genomes, such as that of a human, led to the development of double-barrel shotgun sequencing, more formally known as pairwise-end sequencing. In pairwise-end sequencing, both ends of each fragment are analyzed for overlap. Pairwise-end sequencing is, therefore, more cumbersome than shotgun sequencing, but it is easier to reconstruct the sequence because there is more available information.

Next-Generation Sequencing

Since 2005, the automated sequencing techniques used by laboratories have fallen under the umbrella of next-generation sequencing, which is a group of automated techniques used for rapid DNA sequencing. These automated low-cost sequencers can generate sequences of hundreds of thousands or millions of short fragments (25 to 500 base pairs) in the span of one day. These sequencers use sophisticated software to get through the cumbersome process of putting all the fragments in order.

Evolution Connection

Comparing Sequences

A sequence alignment is an arrangement of proteins, DNA, or RNA; it is used to identify regions of similarity between cell types or species, which may indicate conservation of function or structures. Sequence alignments may be used to construct phylogenetic trees. The following website uses a software program called BLAST (basic local alignment search tool).

Under “Basic Blast,” click “Nucleotide Blast.” Input the following sequence into the large "query sequence" box: ATTGCTTCGATTGCA. Below the box, locate the "Species" field and type "human" or "Homo sapiens". Then click “BLAST” to compare the inputted sequence against known sequences of the human genome. The result is that this sequence occurs in over a hundred places in the human genome. Scroll down below the graphic with the horizontal bars and you will see a short description of each of the matching hits. Pick one of the hits near the top of the list and click on "Graphics". This will bring you to a page that shows where the sequence is found within the entire human genome. You can move the slider that looks like a green flag back and forth to view the sequences immediately around the selected gene. You can then return to your selected sequence by clicking the "ATG" button.
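To get a feel for what "finding where a query occurs" means, the following Python sketch searches a longer sequence for exact copies of the query on both strands. It is only a toy: BLAST itself uses a heuristic local-alignment algorithm with scoring and tolerates mismatches, none of which is modeled here. The subject sequence is made up for this example and the function names are hypothetical; only the query ATTGCTTCGATTGCA comes from the exercise above.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Return the sequence of the opposite strand, read in its own direction."""
    return seq.translate(COMPLEMENT)[::-1]

def find_hits(query, subject):
    """List (position, strand) for every exact occurrence of the query."""
    hits = []
    for strand, pattern in (("+", query), ("-", reverse_complement(query))):
        start = subject.find(pattern)
        while start != -1:
            hits.append((start, strand))
            start = subject.find(pattern, start + 1)
    return sorted(hits)

query = "ATTGCTTCGATTGCA"
# Made-up subject: one copy of the query on each strand, with filler between.
subject = "GGC" + query + "TTAACC" + reverse_complement(query) + "GA"

hits = find_hits(query, subject)
for position, strand in hits:
    print(f"match at position {position} on the {strand} strand")
```

Positions are counted from zero, and a "-" hit means the query matches the complementary strand at that location.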

(credit: modification of work from MilliporeSigma)

The image shows how Sanger sequencing works. The first step is making more copies of the DNA segment through a PCR reaction. In this part, additional nucleotides are added to the solution to provide the building blocks of DNA segments. Some fluorescently-labeled nucleotides are also added.

What is the function of the fluorescently-labeled nucleotides?

These nucleotides are the basic building blocks that make up the DNA segment.
These nucleotides stop the DNA replication when they are added to the chain. This creates a ladder of differently-sized DNA segments.
These nucleotides promote the PCR reaction and allow it to proceed onward.
These nucleotides cannot be added to DNA fragments and are left out.

Use of Whole-Genome Sequences of Model Organisms

The first genome to be completely sequenced was of a bacterial virus, the bacteriophage φX174 (5,386 base pairs); this was accomplished by Fred Sanger using shotgun sequencing. Several other organelle and viral genomes were later sequenced. The first organism whose genome was sequenced was the bacterium Haemophilus influenzae; this was accomplished by Craig Venter in 1995. Approximately 74 different laboratories collaborated on the sequencing of the genome of the yeast Saccharomyces cerevisiae, which began in 1989 and was completed in 1996, because it was 60 times bigger than any other genome that had been sequenced. By 1997, the genome sequences of two important model organisms were available: the bacterium Escherichia coli K12 and the yeast Saccharomyces cerevisiae. Genomes of other model organisms, such as the mouse Mus musculus, the fruit fly Drosophila melanogaster, and the nematode Caenorhabditis elegans, as well as that of humans (Homo sapiens), are now known. A lot of basic research is performed in model organisms because the information can be applied to genetically similar organisms. A model organism is a species that is studied as a model to understand the biological processes in other species represented by the model organism. Having entire genomes sequenced helps with the research efforts in these model organisms. The process of attaching biological information to gene sequences is called genome annotation. Annotation of gene sequences helps with basic experiments in molecular biology, such as designing PCR primers and RNA targets.

Link to Learning

Click through each step of genome sequencing at this site.

Review the Sanger sequencing method as pictured. Make a case for how deep sequencing offers an improvement on Sanger sequencing.

Deep sequencing allows for much faster sequencing of short DNA strands as compared to Sanger sequencing, which reads only short sequences of DNA at a slow rate, and it avoids Sanger's issues with chain termination and separation.
Sequence coverage is higher in Sanger sequencing as compared to deep sequencing.
Sanger sequencing is suitable when there is only one nucleotide difference between chains, whereas deep sequencing is suitable when there is more than one nucleotide difference between chains.
Sanger sequencing reads and sequences a genome multiple times, whereas deep sequencing accurately reads and sequences the whole genome a single time.

Uses of Genome Sequences

DNA microarrays are tools used to detect gene expression by analyzing an array of DNA fragments that are fixed to a glass slide or a silicon chip to identify active genes and identify sequences. Almost one million genotypic abnormalities can be discovered using microarrays, whereas whole-genome sequencing can provide information about all six billion base pairs in the human genome. Although the study of medical applications of genome sequencing is interesting, this discipline tends to dwell on abnormal gene function. Knowledge of the entire genome will allow future-onset diseases and other genetic disorders to be discovered early, which will allow for more informed decisions to be made about lifestyle, medication, and having children. Genomics is still in its infancy, although someday it may become routine to use whole-genome sequencing to screen every newborn to detect genetic abnormalities.

In addition to disease and medicine, genomics can contribute to the development of novel enzymes that convert biomass to biofuel, which results in higher crop and fuel production, and lower cost to the consumer. This knowledge should allow better methods of control over the microbes that are used in the production of biofuels. Genomics could also improve the methods used to monitor the impact of pollutants on ecosystems and help clean up environmental contaminants. Genomics has allowed for the development of agrochemicals and pharmaceuticals that could benefit medical science and agriculture.

It sounds great to have all the knowledge we can get from whole-genome sequencing; however, humans have a responsibility to use this knowledge wisely. Otherwise, it could be easy to misuse the power of such knowledge, leading to discrimination based on a person's genetics, human genetic engineering, and other ethical concerns. This information could also lead to legal issues regarding health and privacy.
