Genetic Code in Biology

[Figure: genetic code diagram]

Core Concepts

In this article, we learn how the genetic code translates DNA triplets into the amino acids of a protein, and how it limits the damage caused by DNA mutations.

What is the Genetic Code?

In biology, the term “genetic code” describes the rules by which specific DNA sequences correspond to specific amino acids during gene expression. Biochemists like to think of DNA as the source of all “biological information”, stored in units called genes. Each gene consists of a sequence of nitrogenous bases that contains the information needed to construct a protein. These DNA-dictated proteins then participate in virtually all the functions necessary to keep the organism alive. This flow of information from DNA to proteins is considered so fundamental to our understanding of molecular biology that biochemists call it the Central Dogma of Molecular Biology.

But how does DNA specify the structure of proteins? Before the discovery of the genetic code, scientists had reasoned for years that cells must have some way to decode the base sequence of DNA into the amino acid sequence of proteins. In 1961, the Crick, Brenner et al. experiment “cracked the code”: it demonstrated that each sequence of three DNA nucleotides codes for one amino acid.

[Figure: transfer of information from DNA to amino acids through the genetic code]
The above graphic shows a sequence of DNA triplets decoded into their respective amino acids.

The genetic code has two characteristics fundamental to its function:

  • The code is nonoverlapping. This means that each nucleotide belongs to only one group of three nucleotides (or “triplet”), and each triplet codes for only one amino acid in the final protein. Triplets form indivisible units that are read one after another without overlapping.
  • The code is degenerate. This means that multiple triplet sequences can code for the same amino acid. After all, there are 64 possible triplets (4 × 4 × 4) and only 20 common amino acids, so most amino acids have multiple triplets.
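
Both characteristics can be illustrated with a few lines of Python. This is only a sketch: the function name `split_into_triplets` and the example sequence are made up for illustration and do not come from any particular library.

```python
from itertools import product

BASES = "ACGT"

# Every ordered choice of three bases: 4 * 4 * 4 = 64 possible triplets.
all_triplets = ["".join(p) for p in product(BASES, repeat=3)]

def split_into_triplets(dna):
    """Split a sequence into consecutive, nonoverlapping triplets.

    Each base lands in exactly one triplet. Trailing bases that do not
    fill a complete triplet are dropped.
    """
    usable = len(dna) - len(dna) % 3
    return [dna[i:i + 3] for i in range(0, usable, 3)]

print(len(all_triplets))                    # 64
print(split_into_triplets("ATGGCCTTTAA"))   # ['ATG', 'GCC', 'TTT']
```

With 64 triplets available for roughly 20 amino acids, some sharing is unavoidable, which is where the degeneracy comes from.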

From Transcription to Translation

For a gene to be decoded and transformed into a protein, it must first be transcribed into mRNA, which in eukaryotes happens in the nucleus. In transcription, RNA polymerase uses Watson-Crick base pairing to assemble an mRNA strand using one strand of the DNA gene as a template. Importantly, each DNA triplet transcribes into its complement: G pairs with C, and A pairs with T (although RNA uses uracil in place of thymine). Biochemists call the mRNA complements of these triplets codons.

The other DNA strand, which does not serve as the template, has the same sequence as the synthesized mRNA. Biochemists call this the sense strand (or coding strand); it contains each mRNA codon, though with thymines instead of uracils.
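
The relationship between the template strand, the mRNA, and the sense strand can be sketched in Python. The function names and the nine-base template are hypothetical, and the sketch pairs bases position by position, ignoring the 5'/3' direction of the strands.

```python
# Watson-Crick partners when RNA is built on a DNA template
# (uracil takes the place of thymine in RNA).
DNA_TO_RNA = {"A": "U", "T": "A", "G": "C", "C": "G"}
# Partners between the two DNA strands.
DNA_TO_DNA = {"A": "T", "T": "A", "G": "C", "C": "G"}

def transcribe(template):
    """Return the mRNA complementary to a DNA template strand."""
    return "".join(DNA_TO_RNA[base] for base in template)

def sense_strand(template):
    """Return the DNA strand complementary to the template strand."""
    return "".join(DNA_TO_DNA[base] for base in template)

template = "TACAACCGT"            # made-up template triplets TAC AAC CGT
mrna = transcribe(template)       # AUGUUGGCA
sense = sense_strand(template)    # ATGTTGGCA

# The sense strand matches the mRNA once T is read as U.
print(mrna, sense, sense.replace("T", "U") == mrna)
```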

Once transcribed, eukaryotic pre-mRNA is processed in the nucleus, where spliceosomes remove its introns; the mature mRNA then leaves the nucleus and moves on to the ribosome for translation. In translation, tRNA molecules enter the ribosome and bind to the mRNA. Each tRNA has a three-nucleotide sequence, called an anticodon, that base pairs with a complementary mRNA codon. Corresponding to its anticodon, each tRNA carries an amino acid, which the ribosome adds to the growing peptide chain that eventually becomes a protein.

[Figure: translation, requiring the genetic code]

To bind specific amino acids to specific tRNAs, enzymes known as aminoacyl-tRNA synthetases perform a reaction called tRNA activation (also known as “charging” or “loading”). In this reaction, the enzyme recognizes features of the tRNA's sequence and attaches the matching amino acid to that tRNA.

[Figure: tRNA activation]

To understand which triplets, codons, and anticodons correspond to specific amino acids, we need to look at a genetic code table.

The Genetic Code Table

[Figure: genetic code table]

This table shows which mRNA codons become translated into which amino acids.

As you may notice, multiple codons often correspond to the same amino acid. This demonstrates the degenerate quality of the genetic code. Additionally, many codons can vary in the third base while still coding for the same amino acid. This corresponds to the ability of tRNA to perform “wobble” base pairing at the third codon position.
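
The degeneracy of the table can be counted directly. The sketch below builds the standard codon table from a compact 64-letter string of one-letter amino acid codes (with `*` marking a stop codon); the variable names are chosen for illustration only.

```python
from collections import Counter
from itertools import product

BASES = "UCAG"
# Standard genetic code, listed in UCAG order for each codon position.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# How many codons share each amino acid (or the stop signal "*")?
codons_per_residue = Counter(CODON_TABLE.values())
print(codons_per_residue["L"])   # 6 codons for leucine
print(codons_per_residue["M"])   # 1 codon for methionine

# Two-base prefixes whose amino acid is the same for every third base.
fourfold = [
    first_two
    for first_two in ("".join(p) for p in product(BASES, repeat=2))
    if len({CODON_TABLE[first_two + third] for third in BASES}) == 1
]
print(len(fourfold))             # 8 of the 16 prefixes
```

For half of all two-base prefixes, the third base makes no difference at all, which is the pattern the wobble position produces.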

In addition to the codons for the 20 amino acids, three mRNA sequences (UAG, UAA, UGA) are known as “stop codons”. Appropriately, these sequences occur at the end of a gene's coding sequence to tell ribosomes to stop translation. There also exists a “start codon” (AUG) at the beginning of the coding sequence, which codes for methionine in eukaryotes. In prokaryotes, the start codon corresponds to the slightly different formylmethionine.
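
Reading the table from start codon to stop codon can be sketched as a small Python function. This is a simplification under stated assumptions: the hypothetical `translate` function starts at the first AUG it finds, whereas real ribosomes use additional signals to choose a start site, and the example mRNA is made up.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code as one-letter amino acid codes; "*" marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

def translate(mrna):
    """Translate from the first AUG until a stop codon or the end of the mRNA."""
    start = mrna.find("AUG")
    if start == -1:
        return ""                      # no start codon, no peptide
    peptide = []
    for i in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":          # UAA, UAG or UGA
            break
        peptide.append(amino_acid)
    return "".join(peptide)

# GG | AUG UUG GCA UAA | GC
print(translate("GGAUGUUGGCAUAAGC"))   # MLA
```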

The Genetic Code and Mutations

With an understanding of the genetic code, we can now begin to understand the harm of DNA mutations. When DNA mutates, its sequence changes. Due to the sensitivity of the genetic code, even slight changes in the DNA sequence can result in a different protein. These mutated proteins may be defective or directly harmful, which can have serious biological consequences.

[Figure: the genetic code illustrating nonsense, missense, and frameshift mutations]

Point Mutations 

In a point mutation, a single base in a gene changes. When the change swaps one amino acid for another, it is called a “missense” mutation. Naturally, a point mutation affects only one triplet, and thus at most one amino acid in the final protein. This can still have serious consequences, as in sickle-cell anemia, which results from a single-base change in a human hemoglobin gene.

However, the genetic code limits the harm of many point mutations. As mentioned above, the genetic code is degenerate, especially in the third base, so many point mutations in the third base leave the amino acid unchanged (a “silent” mutation). Additionally, many chemically similar amino acids have similar codons. For instance, if the DNA template triplet for the non-polar leucine (AAC, transcribed as the codon UUG) mutates in the first base to CAC (codon GUG), the gene instead codes for the similarly non-polar valine. Because the two side chains are chemically similar, this swap often has only minor consequences for the chemistry of the final protein.
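
The effect of a single-base change can be classified by looking up the codon before and after the mutation. In this sketch the function names and category labels are chosen for illustration; the template triplets AAC and CAC are the leucine-to-valine example from the text, and GAG to GUG is the glutamate-to-valine change associated with sickle-cell anemia.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code as one-letter amino acid codes; "*" marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
TEMPLATE_TO_RNA = {"A": "U", "T": "A", "G": "C", "C": "G"}

def codon_from_template(triplet):
    """Transcribe a DNA template triplet into its mRNA codon."""
    return "".join(TEMPLATE_TO_RNA[base] for base in triplet)

def classify_substitution(codon_before, codon_after):
    """Label a codon change by its effect on the encoded amino acid."""
    before, after = CODON_TABLE[codon_before], CODON_TABLE[codon_after]
    if before == after:
        return "silent"
    if after == "*":
        return "nonsense"      # a sense codon became a stop codon
    if before == "*":
        return "stop-loss"     # a stop codon became a sense codon
    return "missense"

# Template AAC -> CAC, i.e. codon UUG (Leu) -> GUG (Val).
print(classify_substitution(codon_from_template("AAC"),
                            codon_from_template("CAC")))   # missense
# Third-base change UUG -> UUA keeps leucine.
print(classify_substitution("UUG", "UUA"))                 # silent
# GAG (Glu) -> GUG (Val).
print(classify_substitution("GAG", "GUG"))                 # missense
```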

Stop/Start Mutations 

In stop/start mutations, a stop or start codon changes to code for a different amino acid. Losing the start codon prevents translation from starting properly, while losing the stop codon lets the ribosome read past the normal end of the gene, producing an abnormally long peptide. In a related variety, called a “nonsense” mutation, a codon in the middle of the gene mutates into a stop codon, which results in a prematurely shortened peptide. In any of these cases, the structure of the peptide changes drastically, often resulting in a non-functioning protein.

To mitigate the risk of start/stop mutations, stop codons have some degree of degeneracy, as three different sequences act as stop signals; a change from UAA to UAG, for example, still ends translation. Nonetheless, these mutations tend to be fairly rare, even compared to other DNA mutations.
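
These cases can be compared on a toy mRNA. The sketch assumes a made-up sequence that begins at its start codon and carries a few extra bases after its stop codon; `translate_in_frame` and `substitute` are hypothetical helper names.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code as one-letter amino acid codes; "*" marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

def translate_in_frame(mrna):
    """Translate from position 0 until a stop codon or the end of the mRNA."""
    peptide = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        peptide.append(amino_acid)
    return "".join(peptide)

def substitute(seq, position, base):
    """Return seq with the base at a 0-based position replaced."""
    return seq[:position] + base + seq[position + 1:]

normal = "AUGUUGGCAUGGUAAGCUUAG"      # AUG UUG GCA UGG UAA | GCU UAG

print(translate_in_frame(normal))                        # MLAW
# UGG -> UGA: a premature stop shortens the peptide.
print(translate_in_frame(substitute(normal, 11, "A")))   # MLA
# UAA -> CAA: the stop is lost and translation runs on to the next stop.
print(translate_in_frame(substitute(normal, 12, "C")))   # MLAWQA
# UAA -> UAG: still a stop codon, so the peptide is unchanged.
print(translate_in_frame(substitute(normal, 14, "G")))   # MLAW
```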

Frameshift Mutations

In frameshift mutations, an insertion or deletion of base pairs, in a number not divisible by three, shifts the reading frame. Because the genetic code is nonoverlapping and read in consecutive triplets, this can have extreme consequences, as every codon after the insertion/deletion changes.

The simplest insertion adds one nucleotide. Though a small change in the code, this one nucleotide shifts every nucleotide that comes after it by one position. The ribosome still reads every three nucleotides as a codon, so each following codon, and therefore each following amino acid, is likely to change.

Deletions can range drastically in scope, but often have similar consequences concerning the frameshift. However, if the deletion involves a number of nucleotides divisible by three, the reading frame is retained. The protein will still lack intermediate amino acids, but this likely will have less serious structural consequences.
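
The difference between a frameshift and an in-frame deletion shows up clearly on a toy example. As before, this is a sketch under stated assumptions: the mRNA is made up, it is assumed to begin at its start codon, and `translate_in_frame` is a hypothetical helper.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code as one-letter amino acid codes; "*" marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

def translate_in_frame(mrna):
    """Translate from position 0 until a stop codon or the end of the mRNA."""
    peptide = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        peptide.append(amino_acid)
    return "".join(peptide)

original = "AUGUUGGCAUGGGAAUAA"                    # AUG UUG GCA UGG GAA UAA

one_inserted = original[:3] + "C" + original[3:]   # one extra base after AUG
one_deleted = original[:3] + original[4:]          # one base removed
three_deleted = original[:3] + original[6:]        # the whole codon UUG removed

print(translate_in_frame(original))       # MLAWE
print(translate_in_frame(one_inserted))   # MLGMGI (frame shifted, stop lost)
print(translate_in_frame(one_deleted))    # MWHGN  (frame shifted, stop lost)
print(translate_in_frame(three_deleted))  # MAWE   (frame kept, one residue missing)
```

In the two shifted sequences every downstream codon is read differently and the original stop codon is no longer in frame, while the three-base deletion only removes a single amino acid.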