Julianne Zedalis; John Eggebrecht

15.1 The Genetic Code

Learning Objectives

In this section, you will explore the following questions:

What is the “Central Dogma” of protein synthesis?
What is the genetic code, and how does nucleotide sequence prescribe the amino acid and polypeptide sequence?

Connection for AP® Courses

Since the rediscovery of Mendel’s work in the 1900s, scientists have learned much about how the genetic blueprints stored in DNA are capable of replication, expression, and mutation. Just as the 26 letters of the English alphabet can be arranged into what seems to be a limitless number of words, with new ones added to the dictionary every year, the four nucleotides of DNA—A, T, C, and G—can generate sequences of DNA called genes that specify tens of thousands of polymers of amino acids. In turn, these sequences can be transcribed into mRNA and translated into proteins which orchestrate nearly every function of the cell. The genetic code refers to the DNA alphabet (A, T, C, G), the RNA alphabet (A, U, C, G), and the polypeptide alphabet (20 amino acids). But how do genes located on a chromosome ultimately produce a polypeptide that can result in a physical phenotype such as hair or eye color—or a disease like cystic fibrosis or hemophilia?

The Central Dogma describes the normal flow of genetic information from DNA to mRNA to protein: DNA in genes specifies sequences of mRNA which, in turn, specify amino acid sequences in proteins. The process requires two steps, transcription and translation. During transcription, genes are used to make messenger RNA (mRNA). In turn, the mRNA is used to direct the synthesis of proteins during the process of translation. Translation also requires two other types of RNA: transfer RNA (tRNA) and ribosomal RNA (rRNA). The genetic code is a triplet code, with each RNA codon consisting of three consecutive nucleotides that specify one amino acid or the release of the newly formed polypeptide chain; for example, the mRNA codon CAU specifies the amino acid histidine. The code is degenerate; that is, some amino acids are specified by more than one codon, like synonyms you study in your English class (different word, same meaning). For example, CCU, CCC, CCA, and CCG are all codons for proline. It is important to remember that the same genetic code is universal to almost all organisms on Earth. Small variations in codon assignment exist in mitochondria and some microorganisms.

Deviations from the simple scheme of the central dogma are discovered as researchers explore gene expression with new technology. For example, the human immunodeficiency virus (HIV) is a retrovirus that stores its genetic information in single-stranded RNA molecules. Upon infection of a host cell, RNA is used as a template by the virally encoded enzyme reverse transcriptase to synthesize DNA. The viral DNA is later transcribed into mRNA and translated into proteins. Some RNA viruses, such as the influenza virus, never go through a DNA step. Their RNA genome is replicated by a virally encoded RNA-dependent RNA polymerase.

The content presented in this section supports the Learning Objectives outlined in Big Idea 1 and Big Idea 3 of the AP® Biology Curriculum Framework. The Learning Objectives merge Essential Knowledge content with one or more of the seven Science Practices. These Learning Objectives provide a transparent foundation for the AP® Biology course, along with inquiry-based laboratory experiences, instructional activities, and AP® Exam questions.

Big Idea 1	The process of evolution drives the diversity and unity of life.
Enduring Understanding 1.B	Organisms are linked by lines of descent from common ancestry.
Essential Knowledge	1.B.1 Organisms share many conserved core processes and features that evolved and are widely distributed among organisms today.
Science Practice	3.1 The student can pose scientific questions.
Science Practice	7.2 The student can connect concepts in and across domain(s) to generalize or extrapolate in and/or across enduring understandings and/or big ideas.
Learning Objective	1.15 The student is able to describe specific examples of conserved core biological processes and features shared by all domains or within one domain of life, and how these shared, conserved core processes and features support the concept of common ancestry for all organisms.
Big Idea 3	Living systems store, retrieve, transmit and respond to information essential to life processes.
Enduring Understanding 3.A	Heritable information provides for continuity of life.
Essential Knowledge	3.A.1 DNA, and in some cases RNA, is the primary source of heritable information.
Science Practice	6.5 The student can evaluate alternative scientific explanations.
Learning Objective	3.1 The student is able to construct scientific explanations that use the structure and functions of DNA and RNA to support the claim that DNA and, in some cases, that RNA are the primary sources of heritable information.

Teacher Support

The Central Dogma has been validated by many experiments. The flow of information from DNA to mRNA to polypeptide is the common scheme in all cells, both prokaryotic and eukaryotic. The information in DNA is contained in the sequence of nitrogenous bases. The next question is: how is the sequence of nitrogenous bases translated into amino acids? A combination of two out of the four letters gives only 16 possible sets (4² = 16), for example, AA or AC, but there are 20 amino acids. A combination of three bases gives 64 possible sets (4³ = 64), for example, AAA or AAC. A combination of three bases in a row is called a codon or “triplet.” This gives rise to more than enough combinations for the 20 common amino acids. Some amino acids are specified by a single codon, for example, methionine and tryptophan; others are encoded by up to six different codons, for example, leucine.
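The counting argument above can be checked with a short script. This is a minimal sketch, not part of the original lesson: the names BASES, AMINO_ACIDS, and CODON_TABLE are chosen here for illustration, and the 64-letter string is the standard genetic code written with one-letter amino acid symbols in U, C, A, G order, with "*" marking a stop codon.

```python
from collections import Counter
from itertools import product

BASES = "UCAG"
# Standard genetic code, one letter per codon, codons ordered UUU, UUC, UUA, UUG, UCU, ...
# '*' marks a stop codon. (Illustrative encoding, assumed for this sketch.)
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Every possible two-letter and three-letter combination of the four bases.
doublets = ["".join(p) for p in product(BASES, repeat=2)]
triplets = ["".join(p) for p in product(BASES, repeat=3)]
CODON_TABLE = dict(zip(triplets, AMINO_ACIDS))

# How many codons specify each amino acid (stop codons excluded).
codons_per_amino_acid = Counter(aa for aa in CODON_TABLE.values() if aa != "*")

print(len(doublets), len(triplets))   # 16 64
print(len(codons_per_amino_acid))     # 20
print(codons_per_amino_acid["M"],     # methionine: 1
      codons_per_amino_acid["W"],     # tryptophan: 1
      codons_per_amino_acid["L"])     # leucine: 6
```

Students can change the one-letter symbol in the last lines to see how many codons specify any other amino acid.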

Although protein synthesis follows the same general scheme in prokaryotes and eukaryotes, the detailed mechanism of each can be quite different. The presence of the nuclear membrane adds a layer of complexity to the process. In prokaryotes, transcription and translation are tightly coupled. As soon as the 5' end of an mRNA has been transcribed from the template strand of DNA, ribosomes can latch onto it and polypeptide synthesis begins. Eukaryotic cells use a more complex series of steps. The enzyme RNA polymerase forms the transcription initiation complex with many proteins called transcription factors. The product of transcription, mRNA, undergoes several modifications that change its stability and facilitate export from the nucleus. These extra steps allow greater control over gene expression. Although prokaryotic mRNA is not generally modified, eukaryotic mRNA strands undergo the addition of a methyl-guanosine cap at the 5' end and a poly-adenosine tail at the 3' end, without which they may not exit the nucleus. The mRNA also undergoes splicing to remove introns, the non-protein-coding regions of the gene. Protein translation depends on the presence of ribosomes, mRNA, a full complement of tRNA molecules, many enzymes, and many protein factors. As the polypeptide is synthesized, it starts folding into its three-dimensional structure. Further modifications will ensure that the protein is fully functional and shipped to its destination.

Ask the students what a dogma is. It will serve as an introduction to deviations from the Central Dogma. Viruses show numerous variations. The human immunodeficiency virus (HIV) is a retrovirus. Its genome is encoded in RNA molecules, which serve as a template for the synthesis of DNA by a virally encoded enzyme called reverse transcriptase. Point out that this enzyme, which is not found in humans, is the target of many anti-HIV medications. The flu virus carries non-coding strands of RNA molecules, which are replicated in the host cell by an RNA-dependent RNA polymerase, an enzyme encoded in the viral genome. In the case of the flu virus, there is no DNA stage at all. The flow of information is RNA to RNA to proteins. Closer to “home,” the telomeres, the ends of the linear chromosomes in eukaryotes, are replicated by a special enzyme, telomerase, which synthesizes DNA from an RNA template.

Just as we transfer information using letters and numbers, the cell transfers information using molecules. Emphasize the similarities between writing and the genetic code. Tell the students that much of the vocabulary of molecular genetics is borrowed from editing: transcription, translation, proofreading, missense, nonsense, etc.

Although the chapter does not use the term “open reading frame,” tie it to Figure 15.4. An open reading frame is a DNA sequence that begins with a start codon and ends with a stop codon in the same reading frame. A long open reading frame is likely to be a gene.
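For classes that include some programming, the idea of an open reading frame can be illustrated with a short search routine. The sketch below rests on stated assumptions: the function name find_orfs and the min_codons parameter are hypothetical, the DNA is read on the coding strand (so the start codon appears as ATG and the stop codons as TAA, TAG, and TGA), and only one strand is scanned. Real gene-finding tools do considerably more.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # DNA (coding-strand) spellings of UAA, UAG, UGA

def find_orfs(dna, min_codons=2):
    """Return (start, end) index pairs of open reading frames on this strand.

    An ORF here begins at an ATG and runs, three bases at a time, to the first
    in-frame stop codon (included in the span). min_codons is the minimum
    number of codons before the stop codon.
    """
    orfs = []
    for start in range(len(dna) - 2):
        if dna[start:start + 3] != "ATG":
            continue
        # Step through the sequence in the frame set by this ATG.
        for pos in range(start + 3, len(dna) - 2, 3):
            if dna[pos:pos + 3] in STOP_CODONS:
                if (pos - start) // 3 >= min_codons:
                    orfs.append((start, pos + 3))
                break  # the first in-frame stop ends this ORF
    return orfs

example = "CCATGAAATTTTAGGG"  # made-up sequence for illustration
for start, end in find_orfs(example):
    print(start, end, example[start:end])  # 2 14 ATGAAATTTTAG
```

Because every ATG is treated as a possible start, an ATG inside a longer open reading frame is reported as its own shorter frame; keeping only the longest frame per stop codon is a common refinement.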

Teacher Support

Students often confuse the vocabulary used to describe the Central Dogma. Copying information from DNA to RNA is transcription because the language is the same: both are constructed using nucleotides. When a polypeptide is synthesized, the building blocks or “letters” have switched to amino acids, so it is a translation. Although the analogy is not exact, show students an example similar to the following:

Dog to Dog (transcription) to Canis (translation)

The first two words represent transcription. The letters are just copied. The last word has the same meaning, “dog” in Latin, but now the language is different.

Consider using the word “redundant” to help explain the meaning of the word “degenerate” in this context. Students confuse the fact that the code is degenerate—several codons can encode the same amino acid—with the fact that the genetic code is universal, which means that the same codon, AUG as an example, is translated as methionine in all cells. The confusion arises from students learning the two concepts at the same time. Give examples of changes in the codons which result in the same amino acids. Although the gene sequence is different, the polypeptide is the same. Remind students that each codon specifies one amino acid, but the reverse is not true. Depending on the amino acid, more than one codon will translate to the same amino acid.

Explain that many proteins of interest are synthesized in bacteria and yeast by inserting the genes for the proteins in the host expression systems. This is possible because the code is universal. If a gene coding for human insulin is inserted in the chromosomes of E. coli, the bacteria will synthesize human insulin.

Teacher Support

Give students examples of codons and ask them to find the matching amino acid. Bring to their attention that typographical errors are a great source of mutations. They should proofread their sequences carefully.

The Science Practice Challenge Questions contain additional test questions for this section that will help you prepare for the AP exam. These questions address the following standards:
[APLO 3.4][APLO 3.25]

The cellular process of transcription generates messenger RNA (mRNA), a mobile molecular copy of one or more genes with an alphabet of A, C, G, and uracil (U). Translation of the mRNA template converts nucleotide-based genetic information into a protein product. Protein sequences consist of 20 commonly occurring amino acids; therefore, it can be said that the protein alphabet consists of 20 letters (Figure 15.2). Each amino acid is defined by a three-nucleotide sequence called the triplet codon. Different amino acids have different chemistries (such as acidic versus basic, or polar and nonpolar) and different structural constraints. Variation in amino acid sequence gives rise to enormous variation in protein structure and function.

Structures of the twenty amino acids are given. Six amino acids (glycine, alanine, valine, leucine, methionine, and isoleucine) have R groups that are nonpolar and aliphatic, meaning they do not have a ring. Six amino acids (serine, threonine, cysteine, proline, asparagine, and glutamine) have R groups that are polar but uncharged. Three amino acids (lysine, arginine, and histidine) have R groups that are positively charged. Two amino acids (glutamate and aspartate) have R groups that are negatively charged. Three amino acids (phenylalanine, tyrosine, and tryptophan) have R groups that are nonpolar and aromatic, meaning they have a ring.

Figure 15.2 Structures of the 20 amino acids found in proteins are shown. Each amino acid is composed of an amino group (NH₃⁺), a carboxyl group (COO⁻), and a side chain (blue). The side chain may be nonpolar, polar, or charged, as well as large or small. It is the variety of amino acid side chains that gives rise to the incredible variation of protein structure and function.

The Central Dogma: DNA Encodes RNA; RNA Encodes Protein

The flow of genetic information in cells from DNA to mRNA to protein is described by the Central Dogma (Figure 15.3), which states that genes specify the sequence of mRNAs, which in turn specify the sequence of proteins. The decoding of one molecule to another is performed by specific proteins and RNAs. Because the information stored in DNA is so central to cellular function, it makes intuitive sense that the cell would make mRNA copies of this information for protein synthesis, while keeping the DNA itself intact and protected. The copying of DNA to RNA is relatively straightforward, with one nucleotide being added to the mRNA strand for every nucleotide read in the DNA strand. The translation to protein is a bit more complex because three mRNA nucleotides correspond to one amino acid in the polypeptide sequence. However, the translation to protein is still systematic and colinear, such that nucleotides 1 to 3 correspond to amino acid 1, nucleotides 4 to 6 correspond to amino acid 2, and so on.
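The one-to-one copying in transcription and the three-to-one reading in translation can be sketched in a few lines of code. This is an illustration under stated assumptions rather than a model of the cellular machinery: the function names transcribe and translate are chosen here, the template sequence is made up, and the codon table is the standard genetic code written with one-letter amino acid symbols ("*" marks a stop codon).

```python
BASES = "UCAG"
# Standard genetic code in U, C, A, G order; '*' marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = dict(zip(
    (a + b + c for a in BASES for b in BASES for c in BASES), AMINO_ACIDS))

def transcribe(template_3_to_5):
    """Add one mRNA nucleotide (5'->3') for every DNA nucleotide read (3'->5')."""
    pairing = {"A": "U", "T": "A", "G": "C", "C": "G"}
    return "".join(pairing[base] for base in template_3_to_5)

def translate(mrna):
    """Read codons colinearly: nucleotides 1-3 give amino acid 1, 4-6 give amino acid 2, ...

    This simple version does not look for start or stop codons; any stop codon
    appears as '*' in the result, and leftover nucleotides are ignored.
    """
    usable = len(mrna) - len(mrna) % 3
    return "".join(CODON_TABLE[mrna[i:i + 3]] for i in range(0, usable, 3))

mrna = transcribe("TACGTAGGT")  # hypothetical template strand, written 3'->5'
print(mrna)             # AUGCAUCCA
print(translate(mrna))  # MHP  (methionine, histidine, proline)
```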

To make a protein, genetic information encoded by the DNA must be transcribed onto an mRNA molecule. The RNA is then processed by splicing to remove introns and by the addition of a 5' cap and a poly-A tail. A ribosome then reads the sequence on the mRNA, and uses this information to string amino acids into a protein.

Figure 15.3 Instructions on DNA are transcribed onto messenger RNA. Ribosomes are able to read the genetic information inscribed on a strand of messenger RNA and use this information to string amino acids together into a protein.

The Genetic Code Is Degenerate and Universal

Given the different numbers of “letters” in the mRNA and protein “alphabets,” scientists theorized that combinations of nucleotides corresponded to single amino acids. Nucleotide doublets would not be sufficient to specify every amino acid because there are only 16 possible two-nucleotide combinations (4²). In contrast, there are 64 possible nucleotide triplets (4³), which is far more than the number of amino acids. Scientists theorized that amino acids were encoded by nucleotide triplets and that the genetic code was degenerate. In other words, a given amino acid could be encoded by more than one nucleotide triplet. This was later confirmed experimentally; Francis Crick and Sydney Brenner used the chemical mutagen proflavin to insert one, two, or three nucleotides into the gene of a virus. When one or two nucleotides were inserted, protein synthesis was completely abolished. When three nucleotides were inserted, the protein was synthesized and functional. This demonstrated that three nucleotides specify each amino acid. These nucleotide triplets are called codons. The insertion of one or two nucleotides completely changed the triplet reading frame, thereby altering the message for every subsequent amino acid (Figure 15.4). Though insertion of three nucleotides caused an extra amino acid to be inserted during translation, the integrity of the rest of the protein was maintained.

Illustration shows a frameshift mutation in which the reading frame is altered by the deletion of two nucleotides.

Figure 15.4 The deletion of two nucleotides shifts the reading frame of an mRNA and changes the entire protein message, creating a nonfunctional protein or terminating protein synthesis altogether.
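The effect of a shifted reading frame can be reproduced with a short script. The sketch below uses a made-up mRNA and a hypothetical translate function that reads triplets from the first nucleotide and halts at the first stop codon; the codon table is the standard genetic code in one-letter symbols. It is meant only to illustrate the logic of the insertion experiments and of Figure 15.4.

```python
BASES = "UCAG"
# Standard genetic code in U, C, A, G order; '*' marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = dict(zip(
    (a + b + c for a in BASES for b in BASES for c in BASES), AMINO_ACIDS))

def translate(mrna):
    """Read triplets from the first nucleotide and stop at the first stop codon."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)

original = "AUGCAUCCAGAAUUUAAGUGA"                 # made-up message
insert_one = original[:3] + "G" + original[3:]     # one extra nucleotide after AUG
insert_three = original[:3] + "GGG" + original[3:] # three extra nucleotides after AUG
delete_two = original[:3] + original[5:]           # two nucleotides removed after AUG

print(translate(original))      # MHPEFK
print(translate(insert_one))    # MASRI   (frame shifted, early stop codon)
print(translate(insert_three))  # MGHPEFK (one extra amino acid, rest unchanged)
print(translate(delete_two))    # MSRI    (frame shifted, early stop codon)
```

In this example both frameshifts happen to create a premature stop codon, which is one of the two outcomes named in the caption; with other sequences the shifted frame may instead run on and produce a long string of incorrect amino acids.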

Scientists painstakingly solved the genetic code by translating synthetic mRNAs in vitro and sequencing the proteins they specified (Figure 15.5).

Figure shows all 64 codons. Sixty-one of these code for amino acids, and three are stop codons.

Figure 15.5 This figure shows the genetic code for translating each nucleotide triplet in mRNA into an amino acid or a termination signal in a nascent protein. (credit: modification of work by NIH)

Of the 64 codons, 61 instruct the addition of a specific amino acid to a polypeptide chain; the remaining three terminate protein synthesis and release the polypeptide from the translation machinery. These three triplets are called nonsense codons, or stop codons. Another codon, AUG, also has a special function: in addition to specifying the amino acid methionine, it serves as the start codon to initiate translation. The reading frame for translation is set by the AUG start codon near the 5' end of the mRNA.
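The roles of the start and stop codons can be sketched in code. The example below is a simplification under stated assumptions: the function name translate_from_start and the sample mRNA are made up, the first AUG found is taken as the start codon, and the codon table is the standard genetic code in one-letter symbols with "*" marking the three stop codons.

```python
BASES = "UCAG"
# Standard genetic code in U, C, A, G order; '*' marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = dict(zip(
    (a + b + c for a in BASES for b in BASES for c in BASES), AMINO_ACIDS))

# The three codons that terminate translation.
stop_codons = sorted(codon for codon, aa in CODON_TABLE.items() if aa == "*")
print(stop_codons)  # ['UAA', 'UAG', 'UGA']

def translate_from_start(mrna):
    """Translate from the first AUG to the first in-frame stop codon.

    Returns an empty string if the mRNA contains no AUG.
    """
    start = mrna.find("AUG")  # the start codon sets the reading frame
    if start == -1:
        return ""
    protein = []
    for i in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)

print(translate_from_start("GGAUGCAUCCAUAGCC"))  # MHP
```

The two nucleotides before AUG and the nucleotides after the stop codon UAG are not translated, which is why only three amino acids are produced from a 16-nucleotide message.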

The genetic code is universal. With a few exceptions, virtually all species use the same genetic code for protein synthesis. Conservation of codons means that a purified mRNA encoding the globin protein in horses could be transferred to a tulip cell, and the tulip would synthesize horse globin. That there is only one genetic code is powerful evidence that all of life on Earth shares a common origin, especially considering that there are about 10⁸⁴ possible combinations of 20 amino acids and 64 triplet codons.
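The text does not show how the figure of about 10⁸⁴ is obtained. One plausible reading, offered here as an assumption rather than as the authors' derivation, is to count every way of assigning one of 21 meanings (any of the 20 amino acids, or "stop") to each of the 64 codons, which gives 21⁶⁴ possible codes.

```python
# Assumption for illustration: each of the 64 codons could, in principle, have been
# assigned any of 21 meanings (20 amino acids or a stop signal).
meanings = 20 + 1
codons = 4 ** 3

possible_codes = meanings ** codons   # 21 to the 64th power, computed exactly
digits = len(str(possible_codes))

print(codons)   # 64
print(digits)   # 85 digits, i.e. a number on the order of 10^84
```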

Link to Learning

Transcribe a gene and translate it to protein using complementary pairing and the genetic code at this site.

Some hereditary and age-related diseases are caused by translation errors. Explain why an error in translation may cause disease.

If there is an error in translation, the correct lipids will not be made for signaling, storage of energy or to perform vital functions. This can cause hereditary and age-related diseases.
Translation is the process in which a particular segment of DNA is copied into RNA (mRNA) by the enzyme RNA polymerase. Error in such copying can lead to various hereditary and age-related diseases.
Translation is the process used by ribosomes to synthesize proteins from amino acids. If there is an error in this process, the correct proteins will not be made to build important body tissue or perform vital functions thus leading to hereditary and age-related diseases.
Translation is the process Golgi bodies use to synthesize proteins from amino acids. If there is an error in this process, the correct proteins will not be made to build important body tissue or perform vital functions.

Science Practice Connection for AP® Courses

Think About It

A strand of DNA has the nucleotide sequence 3'……GCT GTC AAA TTC GAT……5'. What is the sequence of mRNA that is complementary to this DNA sequence? Using the chart of codons in the text, determine the sequence of amino acids which can be generated from this strand of DNA.
How does degeneracy of the genetic code make cells less vulnerable to mutations? What is an advantage of degeneracy with respect to the negative impact of random mutations on natural selection and evolution?

Teacher Support

The first question is an application of Learning Objective 3.1 and Science Practice 6.5 because students are explaining how the language of DNA can be transcribed and translated into a sequence of amino acids.

The second set of questions is an application of Learning Objective 1.15 and Science Practice 3.1 because students are asked to raise questions about the universal genetic code and the impact of its degeneracy on mutations.

Answer

3'…GCT GTC AAA TTC GAT…5'
mRNA 5'……CGA CAG UUU AAG CUA……3' ;
peptide…Arg Gln Phe Lys Leu……
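The answer can be checked with a short script. This is a sketch for verification only: the variable names are chosen here, the DNA strand is entered exactly as written in the question (3' to 5'), and the codon table is the standard genetic code in one-letter symbols, converted to three-letter abbreviations for printing.

```python
BASES = "UCAG"
# Standard genetic code in U, C, A, G order; '*' marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = dict(zip(
    (a + b + c for a in BASES for b in BASES for c in BASES), AMINO_ACIDS))

THREE_LETTER = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys", "Q": "Gln",
    "E": "Glu", "G": "Gly", "H": "His", "I": "Ile", "L": "Leu", "K": "Lys",
    "M": "Met", "F": "Phe", "P": "Pro", "S": "Ser", "T": "Thr", "W": "Trp",
    "Y": "Tyr", "V": "Val", "*": "Stop",
}

template = "GCTGTCAAATTCGAT"  # DNA from the question, written 3'->5'
pairing = {"A": "U", "T": "A", "G": "C", "C": "G"}

# The complementary mRNA is built 5'->3' as the template is read 3'->5'.
mrna = "".join(pairing[base] for base in template)
peptide = [THREE_LETTER[CODON_TABLE[mrna[i:i + 3]]] for i in range(0, len(mrna), 3)]

print(mrna)               # CGACAGUUUAAGCUA
print(" ".join(peptide))  # Arg Gln Phe Lys Leu
```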

Degeneracy is believed to be a cellular mechanism to reduce the negative impact of random mutations. Codons that specify the same amino acid typically differ by only one nucleotide. In addition, amino acids with chemically similar side chains are encoded by similar codons. This nuance of the genetic code means that a single-nucleotide substitution might either specify the same amino acid and so have no effect, or specify a similar amino acid, which often keeps the protein from being rendered completely nonfunctional.
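The buffering effect of degeneracy can be made concrete by counting what happens to every possible single-nucleotide substitution. The sketch below is illustrative only: it treats all substitutions as equally likely, which is a simplifying assumption, and it uses the standard genetic code in one-letter symbols. It classifies each change to an amino acid codon as synonymous (same amino acid), missense (different amino acid), or nonsense (new stop codon); it does not measure chemical similarity between amino acids.

```python
from collections import Counter

BASES = "UCAG"
# Standard genetic code in U, C, A, G order; '*' marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = dict(zip(
    (a + b + c for a in BASES for b in BASES for c in BASES), AMINO_ACIDS))

counts = Counter()
for codon, amino_acid in CODON_TABLE.items():
    if amino_acid == "*":
        continue  # start only from the 61 codons that specify an amino acid
    for position in range(3):
        for base in BASES:
            if base == codon[position]:
                continue  # not a substitution
            mutant = codon[:position] + base + codon[position + 1:]
            new_amino_acid = CODON_TABLE[mutant]
            if new_amino_acid == amino_acid:
                counts["synonymous"] += 1
            elif new_amino_acid == "*":
                counts["nonsense"] += 1
            else:
                counts["missense"] += 1

total = sum(counts.values())
print(total)                 # 549 (61 codons x 9 possible substitutions each)
print(counts["synonymous"])  # 134
print(counts["missense"])    # 392
print(counts["nonsense"])    # 23
```

Under these assumptions, roughly one in four substitutions leaves the amino acid unchanged, and most of those fall at the third position of the codon.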

Scientific Method Connection

Which Has More DNA: A Kiwi or a Strawberry?

Photographs show a thin slice of a green kiwi fruit and a bowl of strawberries.

Figure 15.6 Do you think that a kiwi or a strawberry has more DNA per fruit? (credit “kiwi”: "Kelbv"/Flickr; credit “strawberry”: Alisdair McDiarmid)

Question: Would a kiwifruit and strawberry that are approximately the same size (Figure 15.6) also have approximately the same amount of DNA?

Background: Genes are carried on chromosomes and are made of DNA. All mammals are diploid, meaning they have two copies of each chromosome. However, not all plants are diploid. The common strawberry is octoploid (8n) and the cultivated kiwi is hexaploid (6n). Research the total number of chromosomes in the cells of each of these fruits and think about how this might correspond to the amount of DNA in these fruits’ cell nuclei. Read about the technique of DNA isolation to understand how each step in the isolation protocol helps liberate and precipitate DNA.

Hypothesis: Hypothesize whether you would be able to detect a difference in DNA quantity from similarly sized strawberries and kiwis. Which fruit do you think would yield more DNA?

Test your hypothesis: Isolate the DNA from a strawberry and a kiwi that are similarly sized. Perform the experiment in at least triplicate for each fruit.

Prepare a bottle of DNA extraction buffer from 900 mL water, 50 mL dish detergent, and two teaspoons of table salt. Mix by inversion (cap it and turn it upside down a few times).
Grind a strawberry and a kiwifruit by hand in a plastic bag, or using a mortar and pestle, or with a metal bowl and the end of a blunt instrument. Grind for at least two minutes per fruit.
Add 10 mL of the DNA extraction buffer to each fruit, and mix well for at least one minute.
Remove cellular debris by filtering each fruit mixture through cheesecloth or porous cloth and into a funnel placed in a test tube or an appropriate container.
Pour ice-cold ethanol or isopropanol (rubbing alcohol) into the test tube. You should observe white, precipitated DNA.
Gather the DNA from each fruit by winding it around separate glass rods.

Record your observations: Because you are not quantitatively measuring DNA volume, you can record for each trial whether the two fruits produced the same or different amounts of DNA as observed by eye. If one or the other fruit produced noticeably more DNA, record this as well. Determine whether your observations are consistent with several pieces of each fruit.

Analyze your data: Did you notice an obvious difference in the amount of DNA produced by each fruit? Were your results reproducible?

Draw a conclusion: Given what you know about the number of chromosomes in each fruit, can you conclude that chromosome number necessarily correlates to DNA amount? Can you identify any drawbacks to this procedure? If you had access to a laboratory, how could you standardize your comparison and make it more quantitative?