Lecture Notes on the Genetic Code

This article provides an overview of the genetic code.

From the study of protein synthesis, it is evident that the amino acid sequence of a protein molecule is determined by DNA, which carries the information for this purpose.

But where does the information lie? If we look at a DNA chain, the deoxyribonucleotide backbone seems to be of no help, as it is a structure of repeating nucleotides: AG CT AT GC and so on. Here the DNA bases adenine, guanine, cytosine and thymine are denoted by the standard letters A, G, C and T respectively.

The code lies only in the four bases (A, T, G and C), which appear many times in the chain and give different base sequences to different DNA strands. In effect, the four bases of the DNA molecule hold, in coded form, the message for the structure of protein molecules.

Through the process of transcription, the information coded in DNA is transferred to the mRNA tape. Thus the ‘four-letter’ language of DNA (the four bases A, T, G and C) is transcribed into a complementary four-letter language in messenger RNA (A, U, G and C).

The next important step is the actual translation of the four-letter language of messenger RNA into the 20-letter (amino acid) language of a specific protein.

Since the gene is involved in the synthesis of protein and since protein represents in its primary structure linear combinations of 20 amino acids, it is logical to conclude that the coded message of the gene must be in the form of words which determine the sequence of particular amino acids. This is what we mean by genetic code.

But how is it that just four bases in DNA codify the different amino acids in proteins? If each base specified one amino acid, then only 4 amino acids could be codified. If two adjacent bases (i.e., two-letter words) were used, the maximum number of two-base combinations (words) would be 4 x 4 = 16, which can account for only 16 different amino acids.

But the number of different amino acids is 20, for which at least 20 different words are needed. So two-letter codes do not serve the purpose. Now, if we consider three-letter words or triplet codes (which include three adjacent nucleotides), then 4 x 4 x 4 = 64 different three-letter code words are possible.

They could account for 64 different amino acids, which is far more than the 20 required. The different types of possible mRNA code words in each case have been shown in Table 21.1.
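The counting argument above can be checked with a minimal Python sketch. The function name code_words and the constant names are illustrative choices made here, not part of the original notes; the sketch simply enumerates every possible word of one, two and three bases.

```python
from itertools import product

BASES = "AUGC"            # the four mRNA bases
AMINO_ACIDS_NEEDED = 20   # number of amino acids to be coded


def code_words(length):
    """Return every possible word of the given length over the four bases."""
    return ["".join(word) for word in product(BASES, repeat=length)]


for n in (1, 2, 3):
    words = code_words(n)
    enough = len(words) >= AMINO_ACIDS_NEEDED
    print(f"{n}-letter words: {len(words):2d}  enough for 20 amino acids: {enough}")
```

Only the three-letter words reach the required number, which is the reasoning behind the triplet code.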

Characteristics of the Genetic Code:

1. In mRNA, the group of nucleotides that specifies one amino acid is known as a code word or “codon”. The message is read in groups of three nucleotides at a time. The code is comma-free.

2. Since in triplet coding the number of codons is greater than the number of amino acids, it was hypothesised that the genetic code would be degenerate. Degenerate coding means that for some amino acids there is more than one codon. With two exceptions (methionine, coded only by AUG, and tryptophan, coded only by UGG), a messenger RNA may therefore specify a particular amino acid by more than one triplet.
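Degeneracy can be illustrated by counting the codons assigned to each amino acid. The sketch below is only an illustration: it builds the standard codon table from a compact 64-character string, uses one-letter amino acid symbols (M = methionine, W = tryptophan, * = stop) rather than the three-letter abbreviations of these notes, and all variable names are assumptions made here.

```python
from collections import Counter

BASES = "UCAG"
# Standard genetic code, one letter per codon, ordered UUU, UUC, UUA, UUG, UCU, ...
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# Number of codons assigned to each amino acid (and to the stop signal "*").
codons_per_residue = Counter(CODON_TABLE.values())

# Amino acids that are NOT degenerately coded, i.e. have exactly one codon.
single_codon = sorted(aa for aa, n in codons_per_residue.items() if n == 1)

for aa in single_codon:
    codon = next(c for c, residue in CODON_TABLE.items() if residue == aa)
    print(f"{aa} is specified only by {codon}")
```

Every other amino acid has between two and six codons in this table.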

In addition to this, one would expect to find more than one transfer RNA for an amino acid that is degenerately coded. Later research has shown that these expectations are correct.

Let us consider here a segment of a single chain of a DNA molecule containing nine bases (ATAGTCGAT).

If it is supposed that the genetic code is commaless, i.e., without spacer words (non-sense words) between sense words, then the coded message would be read in the following two ways:

1. As overlapping words (Fig. 21.1).

2. As non-overlapping words (Fig. 21.1).

If the words are read in the overlapping fashion, then the third nucleotide of the DNA (i.e., A) appears repeatedly in three triplets, ATA, TAG and AGT (Fig. 21.1), but if the code is read in the non-overlapping manner, then the third nucleotide (A) appears in only one code word, ATA.

Now the question arises whether the genetic code is of the overlapping type or the non-overlapping type. The experimental evidence is in favour of a non-overlapping code, that is, adjacent codons do not overlap.

The argument against the concept of an overlapping code system is based on the fact that a change in the third nucleotide of the DNA would bring about changes in three amino acids in the overlapping condition, because all of the first three trinucleotides, ATA, TAG and AGT, would be changed by the substitution of a nucleotide other than A in the third place.

The result would be that the protein molecule formed in the changed condition would differ from the normally formed protein in three amino acids. In the non-overlapping code, since only one word is changed by the substitution of a nucleotide other than A at the third place in the DNA molecule, the protein formed in the changed condition will differ from that synthesized in the normal condition in only a single amino acid.
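The argument can be traced on the nine-base segment ATAGTCGAT with a short Python sketch. The function names and the choice of C as the substituted base are assumptions made here for illustration only.

```python
def overlapping(seq):
    """Read triplets that start at every successive base."""
    return [seq[i:i + 3] for i in range(len(seq) - 2)]


def non_overlapping(seq):
    """Read triplets one after another, with no shared bases."""
    return [seq[i:i + 3] for i in range(0, len(seq) - 2, 3)]


def changed_words(reader, seq, position, new_base):
    """Count the code words altered by substituting one base (0-based position)."""
    mutant = seq[:position] + new_base + seq[position + 1:]
    return sum(a != b for a, b in zip(reader(seq), reader(mutant)))


segment = "ATAGTCGAT"
print(overlapping(segment))      # ATA, TAG, AGT, GTC, TCG, CGA, GAT
print(non_overlapping(segment))  # ATA, GTC, GAT

# Replace the third nucleotide (index 2, an A) with C in each reading mode.
print(changed_words(overlapping, segment, 2, "C"))      # 3 words change
print(changed_words(non_overlapping, segment, 2, "C"))  # 1 word changes
```

A single substitution therefore alters three words under overlapping reading but only one under non-overlapping reading.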

Changes of a single amino acid are what is actually observed, which suggests that a non-overlapping system is involved in genetic coding.

3. The genetic code is universal, that is, all living organisms have the same genetic language. It means that a message from an animal cell will produce the same protein whether it is translated by the protein-synthesis machinery of a bacterial cell or of a plant cell.

4. The genetic code uses a specific initiation codon and specific stop codons. The triplet AUG codes for methionine and is also the initiation signal; if AUG is absent from the 5′ end of an mRNA, that mRNA cannot be translated into protein. Of the 64 triplets, three (UAG, UAA and UGA) are stop or termination signals, also called ‘non-sense’ codons, which stop the synthesis of the polypeptide.
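The roles of the initiation and stop codons can be sketched in a few lines of Python. This is a simplified illustration, not a model of the real translation machinery: it assumes translation begins at the first AUG found in the message, it uses one-letter amino acid symbols (M = methionine, F = phenylalanine, W = tryptophan), and the function name translate and the test message are hypothetical.

```python
BASES = "UCAG"
# Standard genetic code, one letter per codon, ordered UUU, UUC, UUA, UUG, UCU, ...
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(mrna):
    """Translate from the first AUG up to, but not including, the first stop codon."""
    start = mrna.find("AUG")
    if start == -1:
        return ""  # no initiation codon, so nothing is translated
    peptide = []
    for i in range(start, len(mrna) - 2, 3):
        residue = CODON_TABLE[mrna[i:i + 3]]
        if residue == "*":  # UAA, UAG or UGA ends the chain
            break
        peptide.append(residue)
    return "".join(peptide)


print(translate("GGAUGUUUUGGUAAUU"))  # AUG UUU UGG UAA gives MFW
print(repr(translate("UUUUUUUUU")))   # no AUG, so an empty string
```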

5. Wobble Base Pairing:

Sequence analysis has shown that the base at the 5′ end of the anticodon (the base complementary to the third base of the codon) does not pair as strictly or precisely as the bases opposite the first and second positions of the codon. It can pair with more than one base at the 3′ end of the codon, i.e., U at the 5′ position of the anticodon may pair with A or G in the codon, G may pair with C or U, and I (inosinic acid, which has the base hypoxanthine) may pair with U, C or A.

Crick (1966) called this wobble base pairing, and only certain pairings are possible in this regard. Wobble does not allow any single tRNA to recognise four different codons (Fig. 21.2).
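The wobble rules can be written as a small lookup table. In the sketch below the pairings for U, G and I are those given above; the entries for C (pairing only with G) and A (pairing only with U) are the remaining rules of Crick's hypothesis and are added here as an assumption for completeness. Anticodons and codons are both written 5′ to 3′, and the function name is illustrative.

```python
# Codon 3' bases that each anticodon 5' (wobble) base can pair with.
WOBBLE_PAIRS = {"U": "AG", "G": "CU", "I": "UCA", "C": "G", "A": "U"}

# Strict Watson-Crick pairing for the other two positions.
WATSON_CRICK = {"A": "U", "U": "A", "G": "C", "C": "G"}


def codons_recognised(anticodon):
    """List the codons (5' to 3') that one anticodon (5' to 3') can read."""
    wobble_base, middle, last = anticodon
    first = WATSON_CRICK[last]     # codon base 1 pairs with anticodon base 3
    second = WATSON_CRICK[middle]  # codon base 2 pairs with anticodon base 2
    return [first + second + third for third in WOBBLE_PAIRS[wobble_base]]


print(codons_recognised("IGC"))  # GCU, GCC, GCA
print(codons_recognised("GAA"))  # UUC, UUU
```

No entry in the table allows more than three partners, which is why a single tRNA cannot read all four codons of a family under these rules.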

Deciphering of Genetic Code:

If the genetic message is actually contained in mRNA in the form of groups of 3 letter code words, it may be assumed that three consecutive bases of messenger RNA will be responsible for the attachment of one amino acid. Experimental evidence supporting the concept of triplet code was provided by F.H.C. Crick and co-workers (1961).

Marshall Nirenberg and his colleague J. Heinrich Matthaei were the first to break the genetic code, through in vitro studies of protein synthesis. Radioactive amino acids are usually used in such studies. Nirenberg and Matthaei took ribosomes, enzymes and energy-liberating compounds in a test tube and saturated them with tRNAs obtained from a cell extract of the bacterium Escherichia coli.

Thus they simulated the protein-synthesizing mechanism of a cell in the test tube. In the experiment they observed that the incorporation of the amino acid phenylalanine into protein was considerably stimulated by the addition of an artificial RNA composed of the base uracil only (i.e., mRNA containing UUUUUUUUUUUU, or poly U).

This artificial RNA polymer and other similar polymers are synthesized with the help of the enzyme polynucleotide phosphorylase, discovered by the Nobel laureate Ochoa. Since the poly U RNA chain contains only one base, U, it is clear that UUU is the code for the amino acid phenylalanine, if triplet coding is correct. This opened up a new possibility for breaking the genetic code.

After this significant discovery Nirenberg, Ochoa and their co-workers undertook the job of finding codons for the various other amino acids with RNA polymers. These RNA polymers consisted of one type of base, like poly U, or two bases, as in poly AU (AU-AU-AU and so on), or three types of bases, e.g., poly AUC (AUC-AUC-AUC and so on). In some cases the codons for amino acids were identified easily.

Binding technique for breaking genetic code:

In many cases, however, there was ambiguity, as the base sequence in the artificially synthesized messenger RNA could not be predetermined. Further, it was difficult to determine the sequence of bases in a codon; e.g., in an RNA polymer made of the three bases A, U and C, there are six possible ways of arranging one of each base in a triplet: AUC, ACU, UAC, UCA, CAU and CUA.

It was not easy to determine which codon was valid for which amino acid. Answers were partly provided by a discovery made by Nirenberg and Leder (1964). They found that specific tRNA molecules could bind to ribosome-bound mRNA even if the mRNA was composed of a single trinucleotide. The binding of tRNA to mRNA is not possible if the mRNA consists of only mono- or dinucleotides.

The genetic code as established for the bacterium Escherichia coli (C.I. = chain initiation; C.T. = chain termination; NONS = non-sense).

A = Adenine, G = Guanine, C = Cytosine, U = Uracil

ala = Alanine, arg = Arginine, asp = Aspartic acid, asn = Asparagine, cys = Cysteine, glu = Glutamic acid, gln = Glutamine, gly = Glycine, his = Histidine, ile = Isoleucine, leu = Leucine, lys = Lysine, met = Methionine, phe = Phenylalanine, pro = Proline, ser = Serine, thr = Threonine, trp = Tryptophan, tyr = Tyrosine, val = Valine.

This provides further proof of the triplet nature of the code. Simple trinucleotides of known linear arrangement are easy to synthesize. Trinucleotides of known base sequence were tested experimentally for their specificity to a particular amino acid. Thus, by sequence analysis of such a system, the exact codes could be determined.

This is known as the binding technique. It went a long way towards working out the actual code. By this process all 64 possible triplets were separately tested against the 20 amino acids, and codons for all the amino acids have been assigned. Figure 21.3 shows the different amino acids and their respective codons.

In Fig. 21.3 the triplets UAG, UAA and UGA in mRNA are called non-sense codons or chain terminators (C.T.), as they do not code for any amino acid and their presence in mRNA results in the stoppage of polypeptide formation. The triplets AUG and GUG appear to have two roles: firstly, they act as chain initiators (C.I.), and secondly, as specifiers of the amino acids methionine and valine respectively.

The genetic code is universal, that is, the codes that are specific to particular amino acids in bacteria are also specific to those very amino acids in man, mouse, birds and all other living organisms.

Dr. Khorana’s method:

Dr. Khorana was also working independently on the genetic code. He was impressed by Dr. Nirenberg’s technique, but he evolved his own method of cracking the code. He synthesized DNA of known sequence and added it to the cell-free system in order to get messenger RNA of known sequence.

He began by linking just two DNA bases together and then polymerising the doublets to give a long DNA strand containing an alternating sequence of the two bases (say, for example, AC-AC-AC-AC-AC-AC).

Using repeating trinucleotide sequences such as TTC/AAG, and even tetranucleotide sequences such as TTAC/AATG, he synthesized in the laboratory eight kinds of artificial DNA with known base sequences.

The synthetic mRNA transcribed from the synthetic DNA of known base sequence then directed protein synthesis. From the sequence of amino acids in the protein molecules so obtained, the specific codes could be directly established. Using this procedure Khorana established all the 64 codons. Some of the salient features of this experiment are summarized here.

1. With repeating two-nucleotide sequences in the RNA, polypeptides with two amino acids in alternating arrangement were obtained.

2. Repeating trinucleotide sequences yielded a mixture of three proteins, each consisting of one particular amino acid repeated many times; for example, the repeating sequence CUU yielded polyphenylalanine, polyleucine and polyserine.

3. With repeating tetranucleotide sequences he obtained proteins with a repeating sequence of four amino acids.

These experiments established beyond doubt 64 different codes. These are shown in Fig. 21.3.
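Why repeating polymers give these three kinds of product can be seen by reading a synthetic message in each of its three possible frames. The sketch below is an illustration under stated assumptions: the repeating unit CUU is the example given above, while UC and UAUC are units chosen here purely for illustration. Amino acids are shown by one-letter symbols (L = leucine, F = phenylalanine, S = serine, Y = tyrosine, I = isoleucine), and the function name is hypothetical.

```python
BASES = "UCAG"
# Standard genetic code, one letter per codon, ordered UUU, UUC, UUA, UUG, UCU, ...
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate_frames(unit, repeats=12):
    """Translate a repeating polymer in each of its three reading frames."""
    rna = unit * repeats
    peptides = []
    for frame in range(3):
        codons = [rna[i:i + 3] for i in range(frame, len(rna) - 2, 3)]
        peptides.append("".join(CODON_TABLE[codon] for codon in codons))
    return peptides


for unit in ("UC", "CUU", "UAUC"):
    print(unit, translate_frames(unit))
```

The dinucleotide polymer reads as two alternating amino acids in every frame, the trinucleotide polymer gives a different homopolymer in each frame, and the tetranucleotide polymer gives a repeating sequence of four amino acids.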

In 1959, when Nirenberg was thinking of using RNA of known sequence to work out the code words, Dr. Robert Holley and his team at Cornell University took the extremely bold decision to work out the structure of a naturally occurring RNA. They chose the smallest form of RNA (tRNA) and set out to isolate it from yeast. By 1962 they had isolated and purified 3 yeast tRNAs.

The purified tRNAs were then split into segments by special techniques, and the team started to work out, piece by piece, the order of bases along the backbone of one of them. In the base sequence of the tRNA that carries the amino acid alanine, there was a region containing the triplet CGI (I = inosine, a chemical compound similar to G), or practically CGG.

That triplet acted as the adapter nucleotide triplet, or anticodon, for alanine. By March 1965, they had succeeded in their efforts and anticodons for all amino acids had been identified. By determining the structure of a tRNA molecule, Dr. Holley and his colleagues showed that it was possible to analyse other nucleic acids with the same technique.

For their brilliant analytical work Dr. Holley, Dr. Har Gobind Khorana and Dr. Nirenberg were named co-winners of the 1968 Nobel Prize.