Amino acids will be incorporated into proteins from the aminoacyl-tRNAs. To study this process we must acquire a wider knowledge of ribosomes and the messenger RNA, which will form a complex with the aminoacyl-tRNAs.

Ribosomes:

These are ribonucleoprotein particles consisting, in the case of E. coli, of about 65% RNA and 35% protein; they may be obtained from cellular homogenates by simple high-speed centrifugation (105 000 × g).

These particles are usually designated according to their sedimentation constant: the ribosomes present in bacteria (as also in chloroplasts) are called 70 S because such is their sedimentation constant when the Mg2+ ion concentration in the solution is 0.01 M.

If this concentration is lowered to 0.001 M, the 70 S ribosomes dissociate into 50 S and 30 S particles, and this phenomenon is reversible if the Mg2+ ion concentration is raised again. The ribosomes of the cytoplasm of plant and animal cells are slightly heavier (80 S) and also dissociate into 60 S and 40 S particles when the Mg2+ ion concentration is lowered.

The size of mitochondrial ribosomes varies according to their origin between 60 S in animals and about 75 S in yeast and plants.

The 50 S and 30 S, or 60 S and 40 S particles can themselves be dissociated, as shown by the diagram below, into ribosomal RNAs (rRNAs), and proteins (and also reconstituted from these components under certain conditions):

There is one molecule of each protein per ribosome except for L7 and L12 which are present with 2 molecules per ribosome.

rRNAs have a secondary structure comprising many short helical regions; this was shown by studies of accessibility to chemical modifications and enzymatic digestions, as well as by comparisons of the structures of rRNAs of various organisms, which enable the detection of conserved characteristics.

Secondary Structure of 16 S rRNA

Intensive studies on these particles and their constituents are now in progress to determine the structure of these complexes (especially the nature of interactions between rRNAs and ribosomal proteins) and the respective roles of various constituents.

Studies of the architecture are based, on the one hand, on experiments of assembly or reconstitution of ribosomal particles from the rRNAs and proteins (one can, for example, study the effects of omitting a protein) and, on the other hand, on structural analyses using diverse methods, such as the localization of proteins in the particle by observing bound antibodies in the electron microscope, the study of protein-protein or protein-RNA proximity with the help of cross-linking reagents, or neutron scattering by particles containing deuterated proteins.

Studies of the functions of the ribosomal constituents are also based on varied approaches, such as affinity labeling with particularly active analogues of compounds that bind to ribosomes, chemical modification of proteins or RNAs in an attempt to correlate a structural modification with the alteration of a function, and analysis of mutations modifying a protein or an rRNA (for example, those responsible for resistance to an antibiotic).

These studies have made it possible to identify, on the ribosomes, a number of active centres involving certain proteins or RNA regions and playing a role in the different steps of protein synthesis.

For example, the P site (site of peptidyl-tRNA binding) seems to involve proteins L2 and L27; the binding of the 30 S particle to the mRNA seems to involve proteins S1, S18 and S21 as well as the 3′ part of the 16 S rRNA (which pairs with a complementary sequence situated upstream of the initiator codon AUG of bacterial mRNAs); proteins L2 and L16 seem to be involved in the peptidyl-transferase activity; proteins L7/L12 (there are 2 copies of this dimer per ribosome, while all other ribosomal proteins are represented by only one copy) appear to be involved in GTPase activity, etc.

In systems where protein biosynthesis takes place, in vitro as well as in vivo, it is observed that the 70 S (or 80 S) ribosomes are attached to a chain of single-stranded RNA, thus forming aggregates called polyribosomes or polysomes which could be observed using the electron microscope.

The number of ribosomes which can be present on the same RNA chain depends on several factors, especially the size of the mRNA and the efficiency with which the ribosomes bind to the mRNA to translate it.

In optimal conditions for protein synthesis, there is on average one ribosome for every 80 nucleotides. In general, there are fewer than ten ribosomes on eucaryotic mRNAs (which are usually monocistronic), while there can be several tens on procaryotic mRNAs (which are often polycistronic).

These polysomes are the seat of protein synthesis, as we will see in the following, but we must first explain the nature and important characteristics of the RNA to which the ribosomes are bound and which is called messenger RNA.

Messenger RNA (mRNA):

The existence of mRNAs went undetected for a long time because they represent only a very small percentage (generally 2 to 4%) of the total cellular RNA (rRNAs represent about 80% and tRNAs about 15%). mRNA molecules can be revealed by labelling techniques using a radioactive precursor (14C-uracil, for example) which is allowed to act only for a very short time, called a pulse (about one minute for bacteria); at the end of this time, the mRNAs are preferentially radioactive because their turnover is extremely rapid.

In bacteria infected by a phage, one can observe, very soon after the infection, the appearance of RNA capable of hybridizing specifically to the DNA of this phage (but not to the DNA of the host cell or to the DNA of other phages); this is therefore RNA transcribed from the phage DNA, carrying the genetic information necessary for the expression of the genes of this phage, i.e. necessary for the infectious cycle, and which must direct the synthesis of proteins specific to the phage.

By blocking transcription in cells (with the help of an appropriate antibiotic, for example), one can study the lifetime of existing mRNA molecules either by measuring the capacity of the mRNA to program the synthesis of the corresponding proteins, or by measuring the capacity of the mRNA to hybridize to the DNA from which it is transcribed.

In bacteria, the half-life of mRNA molecules (period of time after which half is degraded) is about 1 to 2 minutes; this may appear short, but considering that the time required for the synthesis of a polypeptide chain of average size is less than 10 seconds (the rate of protein synthesis is about 15 amino acids per second), one realizes that this lifetime is sufficient to enable an mRNA molecule to direct the synthesis of numerous protein molecules, especially as a series of ribosomes run along the mRNA, following one another at regular intervals, each of them permitting the synthesis of a polypeptide chain.

The degradation of mRNAs seems to proceed from 5′ to 3′ i.e. in the same direction as the synthesis of mRNAs and their translation. Such rapid synthesis and short life of bacterial mRNAs are necessary for the capability of adaptation that bacteria must possess to face changes of medium; they must be able to rapidly synthesize an inducible enzyme when they need it and cease to synthesize it when it is no longer required.

But the half-life of all mRNAs is not as short; in higher organisms in particular, there is no necessity of such rapid adaptation (the environment is more stable) and the mRNAs can exist for a longer time. The half-life of mRNAs of mammalian cells in culture is in the order of a few hours but can extend to 24 hours (i.e. the duration of the cell cycle).

Furthermore, in the particular case of cells deprived of their nucleus either by a natural process (mammalian red blood cells) or by an experimental intervention (Acetabularia), where DNA is no longer present to serve as template for the synthesis of new mRNA molecules, the mRNAs can function (direct protein synthesis) for hours or even days.

The size of mRNAs is very variable because the size of the proteins whose synthesis they direct is itself very variable. A polycistronic mRNA can result from the uninterrupted transcription of about ten adjacent genes in a given operon; its molecular weight can be of the order of 3.5 × 10^6 (which corresponds to about 10^4 nucleotides) and it can thus be 3 times longer than the largest ribosomal RNA (23 S RNA) and about 120 times longer than a tRNA.

Even in the case of monocistronic mRNA, the length is not limited to the coding region between the initiation codon (generally AUG) and the termination codon (UAG, UAA or UGA): upstream of the initiation codon (i.e. on the 5′ side) there is a leader region, and downstream of the termination codon (i.e. on the 3′ side) there is a tail region.

Besides, in polycistronic mRNAs (i.e. in most bacterial mRNAs), there are intercistronic (or intergenic) regions which can be long (a few tens of nucleotides) or short (1 or 2 nucleotides only). In some cases there is on the contrary, an overlapping of two genes: the last nucleotide of the termination codon (UGA) of one of the genes is also the first nucleotide of the initiation codon (AUG) of the following gene.

Mechanism of Translation:

As observed in bacteria, the polysomes, consisting of 70 S ribosomes bound to an mRNA chain, are not static; on the contrary, each ribosome moves along this chain in the direction 5′ → 3′. When a ribosome has moved a particular distance, a second ribosome binds and begins, in its turn, to move towards the 3′ end of the mRNA, and so on. This movement corresponds to the translation, the deciphering of the message carried by the mRNA, thus permitting protein synthesis.

How is this message, coded in nucleic language, translated into protein language? How are amino acids specified (coded) to permit their incorporation in the correct order? This is the problem of the genetic code. If one groups in twos the 4 types of nucleotides found in the mRNAs, it is clear that there are only 16 combinations (or doublets) possible, which is insufficient to code 20 amino acids.

If on the contrary, one groups the nucleotides in threes, then 64 triplets are possible, which is amply sufficient; there is even an excess of triplets and we will see that this enables most amino acids to be coded by several triplets.

It could be shown that these triplets do not overlap (i.e. a nucleotide is never part of 2 triplets, it cannot, for example, be the last nucleotide of a triplet and the first of the following triplet), so that a polypeptide comprising n amino acids will be coded by a polyribonucleotide chain having 3n nucleotides.
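The counting argument above can be checked with a few lines of Python. This is only an illustrative sketch; the function names `count_words` and `coding_length` are introduced here for convenience and do not come from the text.

```python
from itertools import product

BASES = "UCAG"  # the four types of nucleotides found in mRNA


def count_words(word_length):
    """Number of distinct nucleotide 'words' of the given length."""
    return len(list(product(BASES, repeat=word_length)))


def coding_length(n_amino_acids):
    """Nucleotides needed to code n amino acids with non-overlapping triplets."""
    return 3 * n_amino_acids


for length in (1, 2, 3):
    n = count_words(length)
    print(f"words of length {length}: {n}, enough for 20 amino acids: {n >= 20}")
```

Doublets give 16 words and triplets 64, so the triplet is the shortest word length that can specify 20 amino acids.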

The mRNA must therefore be considered as a succession of nucleotide triplets, or codons. These codons are, so to say, the words of polynucleotide sentences; they will be translated into amino acids which are the words of protein sentences.

Considering a protein of average size, having a molecular weight of 10^5 and consisting of about 800 amino acids, one may calculate that the corresponding mRNA will have 800 × 3 = 2 400 nucleotides, and the corresponding gene 2 400 × 2 = 4 800 nucleotides, which gives a molecular weight of 4 800 × 300 (average molecular weight of a nucleotide) ≈ 1.5 × 10^6.

Knowing that there are about 2 000 different proteins in a bacterial cell, one can calculate that 1.5 × 10^6 × 2 000 = 3 × 10^9 daltons are required to code all these proteins, and one thus obtains the molecular weight of 3 × 10^9 indicated for the DNA of E. coli.

The existence of a triplet code was confirmed by the study of mutations bringing about a change in phase of the translation (frame-shift mutations) and consisting of an addition or, on the contrary, a deletion (loss) of one or several nucleotides.
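The order-of-magnitude estimate above can be reproduced as follows. The sketch uses only the figures quoted in the text (800 amino acids, 300 per nucleotide, 2 000 proteins); the function names are illustrative. The exact products come out slightly below the rounded values given in the text.

```python
NUCLEOTIDE_MW = 300  # average molecular weight of a nucleotide, as quoted in the text


def gene_molecular_weight(n_amino_acids):
    """Molecular weight of the double-stranded gene coding for n amino acids."""
    mrna_nucleotides = 3 * n_amino_acids      # one triplet per amino acid
    gene_nucleotides = 2 * mrna_nucleotides   # two DNA strands
    return gene_nucleotides * NUCLEOTIDE_MW


def genome_molecular_weight(n_proteins, n_amino_acids):
    """DNA needed to code n_proteins proteins of the same average size."""
    return n_proteins * gene_molecular_weight(n_amino_acids)


gene = gene_molecular_weight(800)
genome = genome_molecular_weight(2000, 800)
print(f"gene: {gene:.2e} daltons, genome: {genome:.2e} daltons")
```

The unrounded results are 1.44 × 10^6 for the gene and 2.88 × 10^9 for the genome, which the text rounds to 1.5 × 10^6 and 3 × 10^9.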

It was indeed observed that if two mutations of opposite signs (e.g., an addition of one nucleotide and then a deletion of one nucleotide) take place sufficiently near one another (in the same gene), the correct translation is re-established after the second mutation, and if three mutations of the same sign (e.g., three additions of one nucleotide) take place in the same gene, the correct translation is re-established after the third mutation.
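The compensation between frame-shift mutations can be illustrated on a short invented message. In this sketch the sequence `wild_type`, the mutation positions and the helper names are all hypothetical; only the standard codon table is taken as given, with one-letter amino acid symbols and `*` for a termination codon.

```python
from itertools import product

BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code, one-letter amino acid symbols, '*' = termination codon.
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def translate(rna):
    """Read non-overlapping triplets starting at the first nucleotide."""
    return "".join(CODON_TABLE[rna[i:i + 3]] for i in range(0, len(rna) - 2, 3))


def insert_base(rna, position, base):
    """Addition of one nucleotide (a '+' mutation)."""
    return rna[:position] + base + rna[position:]


def delete_base(rna, position):
    """Deletion of one nucleotide (a '-' mutation)."""
    return rna[:position] + rna[position + 1:]


wild_type = "AUGGCAUCUGACCGUAAAGGCUUC"            # hypothetical message
plus = insert_base(wild_type, 4, "C")              # one addition: frame shifted
plus_minus = delete_base(plus, 10)                 # addition then deletion nearby
triple_plus = insert_base(insert_base(plus, 8, "C"), 12, "C")  # three additions

for name, rna in [("wild type", wild_type), ("+", plus),
                  ("+ then -", plus_minus), ("+ + +", triple_plus)]:
    print(f"{name:10s} {translate(rna)}")
```

With a single addition everything downstream is misread; after an addition followed by a deletion, or after three additions, the end of the chain is again the wild-type one, and only the short region between the mutations is altered.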

Translation is based on the same fundamental principle as the replication of DNA (or viral RNA) and the transcription of DNA into RNA: the pairing of complementary bases. It involves the tRNAs, or more exactly the aminoacyl-tRNAs, since each tRNA carries the amino acid which corresponds to it.

Each tRNA contains a particular nucleotide triplet, situated in a single-stranded loop (see fig. 6-44), whose bases are therefore not paired; this triplet is called the anticodon, precisely because it can pair with a nucleotide triplet of complementary sequence (the codon) present in the mRNA.

Thus, each codon of the mRNA can be “read” (or translated) correctly because it is recognized only by the anticodon of the corresponding aminoacyl-tRNA. It is this specificity of pairing between successive codons of the mRNA and anticodons of the corresponding aminoacyl-tRNAs which ensures the ordered incorporation of amino acids into the protein.

The tRNA therefore plays the role of a molecular adaptor, i.e. it binds an amino acid and brings it to the growing peptide chain, in order that it is incorporated at the correct place specified by the corresponding codon.
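The adaptor role of the tRNA can be sketched as a lookup by anticodon. This is a simplified illustration which assumes strict complementary pairing at all three positions; the message and the small set of charged tRNAs are invented for the example, and the names `anticodon_for` and `decode` are not from the text.

```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def anticodon_for(codon):
    """Strictly complementary anticodon, written 5'->3'.

    Codon and anticodon pair in antiparallel fashion, hence the reversal.
    """
    return "".join(COMPLEMENT[base] for base in reversed(codon))


def decode(mrna, charged_trnas):
    """For each successive codon, pick the aminoacyl-tRNA whose anticodon pairs with it."""
    chain = []
    for i in range(0, len(mrna) - 2, 3):
        codon = mrna[i:i + 3]
        chain.append(charged_trnas[anticodon_for(codon)])
    return chain


# Hypothetical pool of aminoacyl-tRNAs, indexed by anticodon (5'->3').
charged_trnas = {"CAU": "Met", "GAA": "Phe", "UUU": "Lys"}
print(decode("AUGUUCAAA", charged_trnas))
```

The amino acid itself plays no part in the selection: only the anticodon of the tRNA that carries it is compared with the codon.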

The direction of synthesis of polypeptide chains was determined by Dintzis who followed as a function of time, the course of incorporation of tritiated leucine in the α and β chains of the hemoglobin synthesized by reticulocytes in suspension (at low temperature, to slow down protein synthesis).

At regular intervals (between 4 and 60 minutes), he isolated the hemoglobin, separated the α and β chains, subjected them to hydrolysis by trypsin, fractionated the peptides obtained and determined their radioactivity. Knowing the position of the peptides in the globin chains, he could observe that the peptides near the C-terminal end are the first to be labeled (after 4 minutes of exposure to 3H-leucine), and that in the course of time the number of radioactive peptides increases from the C-terminal end towards the N-terminal end of the polypeptide chains, with the result that after 60 minutes the radioactivity of all leucine-containing peptides is roughly equivalent.

The fact that the peptides near the C-terminal end are the first to be labeled indicates that they are synthesized last; for short exposures to 3H-leucine, the labeled amino acid is incorporated only at the end of the chain of almost completed globin molecules, and the only complete molecules which are labeled are those which contain the radioactive amino acid in the last peptides formed (during these short exposures some 3H-leucine is also incorporated at the beginning or in the middle of the chain, but these molecules have no time to be completed and are lost during the isolation of the complete hemoglobin).

From these experiments it could be deduced that the peptide chains are synthesized from the N-terminal end towards the C-terminal end.

Elongation of the Polypeptide Chain:

Figure 6-46a represents a polypeptide chain in formation with the first two amino acids already linked to one another by a peptide linkage. The second amino acid is still bound by its carboxyl to the tRNA no. 2 responsible for its transfer.

This tRNA no. 2 is also linked, on the one hand to codon no. 2 of the mRNA (by its anticodon), and on the other hand, to the ribosome in a site called “peptidyl” because it is the site normally reserved for the peptidyl-tRNA (by opposition to the “aminoacyl” site meant to receive the aminoacyl-tRNA).

The A site (aminoacyl) is also called acceptor site, while the P site (peptidyl) is also called D site (donor). It will be noted that the NH2 group of the first amino acid is not involved in the first peptide bond (it is its carboxyl group which is linked); actually, (as seen above) the protein chain is synthesized from its N-terminal end to its C-terminal end.

For the chain to lengthen by one unit (i.e. by one amino acid), a number of reactions must take place, and 3 steps may be distinguished:

a) Attachment of the Aminoacyl-tRNA:

The next aminoacyl-tRNA is selected thanks to its anticodon which must pair with codon no. 3 of the mRNA. Besides, it binds to the “aminoacyl” site of the ribosome (see fig. 6-46b). The binding requires on one hand, GTP and on the other hand, an elongation factor EF-T which in reality, consists of 2 factors differing by their stability to heat: EF-Ts which is stable and EF-Tu which is unstable.

The factor EF-Tu forms with GTP a binary complex EF-Tu-GTP which binds the aminoacyl-tRNA to give an aminoacyl-tRNA-EF-Tu-GTP complex. This complex will link with the ribosome, permitting the binding of the aminoacyl-tRNA to the A site and the hydrolysis of GTP, which liberates a binary complex EF-Tu-GDP.

The factor EF-Ts displaces the GDP from this complex, thus forming EF-Tu-EF-Ts (i.e. the complete EF-T factor), then EF-Ts is in its turn displaced by GTP, which regenerates the active form EF-Tu-GTP.

Mechanisms of the Attachment of the Aminoacyl-tRNA

b) Formation of the Peptide Linkage:

The formation of the peptide linkage is catalyzed by the peptidyltransferase activity of the 50 S ribosomal particle (also called peptide synthetase); it involves, on the one hand, the NH2 group of amino acid no. 3 and, on the other hand, the carboxyl group of amino acid no. 2, which is still linked to tRNA no. 2 by an ester linkage; this ester linkage therefore breaks, and the peptide leaves tRNA no. 2 and, lengthened by one unit, is attached to tRNA no. 3 (see fig. 6-46c).

c) Translocation:

After the formation of the new peptide linkage, the tripeptide is carried by tRNA no. 3 which is still bound to the “aminoacyl” site. This peptidyl-tRNA must leave the “aminoacyl” site to occupy the “peptidyl” site and displace from it the deacylated tRNA no. 2 which binds transitorily to a third site on the ribosome (E site) before being ejected from the ribosome (it can then bind a new amino acid molecule).

This translocation requires a protein factor, the EF-G factor (“Elongation Factor G”), and GTP (which is split into GDP + Pi), providing the energy required. A 3 nucleotide displacement of the ribosome brings it to codon no. 4 of the mRNA, enabling the arrival of aminoacyl-tRNA no. 4 which will bind to the “aminoacyl” site which has become vacant after the translocation (see fig. 6-46d).

Elongation of the Polypeptide Chain
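The three steps a), b) and c) described above can be followed in a small state model. This is a deliberately simplified sketch: the `Ribosome` class, its field names and the way the E site is reduced to a counter of released tRNAs are assumptions made for illustration, and the elongation factors appear only through the GTP they consume.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Ribosome:
    p_site: List[str]                    # peptide carried by the tRNA in the P site
    a_site: Optional[List[str]] = None   # chain carried by the tRNA in the A site (None = vacant)
    codon_in_a_site: int = 3             # codon number facing the A site, numbered as in the text
    gtp_hydrolyzed: int = 0
    released_trnas: int = 0              # deacylated tRNAs that have left via the E site

    def bind_aminoacyl_trna(self, amino_acid: str) -> None:
        """Step a: EF-Tu-GTP brings the aminoacyl-tRNA to the A site; GTP is hydrolyzed."""
        if self.a_site is not None:
            raise ValueError("the A site is already occupied")
        self.a_site = [amino_acid]
        self.gtp_hydrolyzed += 1

    def form_peptide_bond(self) -> None:
        """Step b: the peptide leaves the P-site tRNA and is attached to the A-site tRNA."""
        if self.a_site is None:
            raise ValueError("no aminoacyl-tRNA in the A site")
        self.a_site = self.p_site + self.a_site
        self.p_site = []                 # the tRNA in the P site is now deacylated

    def translocate(self) -> None:
        """Step c: EF-G and GTP move the peptidyl-tRNA from the A site to the P site."""
        if self.a_site is None:
            raise ValueError("nothing to translocate")
        self.p_site = self.a_site
        self.a_site = None
        self.released_trnas += 1         # deacylated tRNA passes through the E site and leaves
        self.codon_in_a_site += 1        # the ribosome has advanced by 3 nucleotides
        self.gtp_hydrolyzed += 1


def elongate(ribosome: Ribosome, amino_acid: str) -> None:
    """Lengthen the chain by one amino acid."""
    ribosome.bind_aminoacyl_trna(amino_acid)
    ribosome.form_peptide_bond()
    ribosome.translocate()


ribosome = Ribosome(p_site=["aa1", "aa2"])   # situation of fig. 6-46a
elongate(ribosome, "aa3")
print(ribosome.p_site, ribosome.codon_in_a_site, ribosome.gtp_hydrolyzed)
```

In this model each added amino acid costs two GTP, one at the binding step and one at translocation, and leaves the A site vacant opposite the next codon.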

The mechanisms of elongation of the peptide chain are almost the same in the eucaryotic cell, where one finds the eucaryotic elongation factors eEF-1 and eEF-2, which correspond respectively to the bacterial factors EF-T and EF-G. The eEF-2 factor required for translocation is inactivated by diphtheria toxin, which transfers the ADPR (adenosine-diphosphate ribosyl) group from NAD to eEF-2, and this ADP-ribosylation is responsible for the lethal effect of the toxin.

Cycle of Factors Required for Protein Synthesis:

The number of factors of ribonucleic and protein nature required for the mechanism of protein biosynthesis is very large (much greater than 100). We have already seen that in E. coli the translation of procaryotic mRNAs requires tRNAs (more than 70), aminoacyl-tRNA synthetases (about 20), and ribosomes comprising three types of rRNAs and over fifty proteins; moreover, in addition to the elongation factors EF-T and EF-G just mentioned, initiation factors and termination factors are also needed.

Cycle of the Main Factors Necessary for Protein Biosynthesis

After their participation in the synthesis of a polypeptide chain, these factors can be re-utilized a number of times before being degraded. This is illustrated in figure 6-47, which shows a polysome, i.e. an mRNA chain to which are bound several ribosomes, traveling from the 5′ to the 3′ end.

The longer the distance traveled by the ribosome, in other words, the greater the region of the mRNA already translated, the longer the growing polypeptide chain. When the ribosome reaches the end of the message, the terminated protein chain is liberated; the ribosome detaches itself from the mRNA and dissociates into 30 S and 50 S sub-units which will join the pool of particles available for a new translation.

As already mentioned the tRNAs are liberated, after delivering the amino acid they carried and after having for a moment, carried the growing peptide chain; they can then bind another amino acid.

Lastly, after being translated a number of times, the messenger RNA will be enzymatically degraded into ribonucleotides; their longer life notwithstanding, the rRNAs and tRNAs will finally have the same fate, and these ribonucleotides can be used — with those synthesized de novo — for the synthesis of new RNA molecules.

Elucidation of the Genetic Code:

In 1961, Nirenberg and Matthaei carried out a decisive experiment: they added polyuridylic acid or poly U (a homopolynucleotide synthesized from UDP by polynucleotide phosphorylase) to a cell-free extract prepared from E. coli containing ATP, GTP, Mg2+ ions, amino acids (in fact, they carried out 18 experiments, each with a different radioactive amino acid), ribosomes, tRNAs and all the necessary enzymes and factors, but freed from the endogenous messenger RNA by pre-incubation in the presence of DNase (thus, by destruction of the template, mRNA can no longer be formed, and the pre-existing mRNA is rapidly catabolized).

In these conditions they observed the formation of a particular polypeptide, consisting solely of phenylalanine residues (polyphenylalanine). If poly U codes for phenylalanine, it means that the codon UUU is recognized by phenylalanyl-tRNA. Subsequently, it was observed that poly C permits the formation of polyproline (CCC is therefore a codon of proline) and that poly A permits the formation of polylysine (AAA is therefore a codon specifying lysine).

The use of copolymers, like poly UG, synthesized with the help of the polynucleotide phosphorylase could establish the probable composition of codons specifying a number of amino acids (e.g., 2U, 1G) but not their exact sequence.

A number of codons could be elucidated thanks to the use of polyribonucleotides of alternate dinucleotide or trinucleotide sequences, synthesized by Khorana and his team by means of DNA-polymerase and RNA-polymerase.

Thus a polymer consisting of alternate cytidylic and adenylic residues, CpApCpApCpApCpApC…, comprises only 2 codons, CAC and ACA, whatever the place where translation begins, and in a cell-free extract it directs the synthesis of a polypeptide comprising alternating histidyl and threonyl residues.

On the contrary, a polyribonucleotide of alternate trinucleotide sequence like ApApGpApApG…ApApG directs the synthesis of 3 different homopolypeptides: polylysine if the reading begins at the first A (codon AAG), polyarginine if the reading begins at the second A (codon AGA), polyglutamic acid if the reading begins at G (codon GAA).

The fact that only homopolypeptides are obtained shows that translation, once started, proceeds sequentially, i.e. the nucleotides are read regularly in sets of three, till the end, without leaving out any nucleotide.
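The behaviour of these repeating polymers can be reproduced by translating them in the three possible phases. The sketch below assumes the standard codon table (one-letter symbols: H histidine, T threonine, K lysine, R arginine, E glutamic acid); the helper `reading_frames` and the chosen polymer lengths are illustrative.

```python
from itertools import product

BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code, one-letter amino acid symbols, '*' = termination codon.
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def translate(rna):
    """Read non-overlapping triplets starting at the first nucleotide."""
    return "".join(CODON_TABLE[rna[i:i + 3]] for i in range(0, len(rna) - 2, 3))


def reading_frames(rna):
    """Translation obtained when reading starts at the 1st, 2nd or 3rd nucleotide."""
    return [translate(rna[offset:]) for offset in range(3)]


poly_ca = "CA" * 9     # alternate dinucleotide sequence CACACA...
poly_aag = "AAG" * 6   # alternate trinucleotide sequence AAGAAG...

print("poly(CA): ", reading_frames(poly_ca))
print("poly(AAG):", reading_frames(poly_aag))
```

The dinucleotide repeat gives the same alternating His-Thr copolymer in every phase, whereas the trinucleotide repeat gives a different homopolymer for each starting point.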

Codon sequence determination made great advances thanks to another method developed in Nirenberg’s laboratory; it does not involve any protein synthesis, but only the formation of the specific complex between an aminoacyl-tRNA, the corresponding codon and the ribosomes. The formation of the specific complex takes place in presence of trinucleotide (e.g., UUU) and it is not necessary to use an oligo- or polynucleotide or to add GTP.

The principle of the technique is very simple: a given triplet, a 14C-aminoacyl-tRNA and ribosomes are mixed and then filtered through a nitrocellulose membrane (Millipore); the 14C-aminoacyl-tRNA is retained only if its anticodon has paired with the triplet, but if it does not recognize this codon, it passes through the filter; therefore one simply measures the radioactivity present in the complex, i.e. retained on the filter. In this manner, one can determine the amino acid specified by a given codon by finding, among the 20 aminoacyl-tRNAs, the one which can pair with the triplet studied.

We mentioned that the mRNAs are not only synthesized in the direction 5′ → 3′ but also translated in this direction. This latter point could be confirmed experimentally by introducing in a cell-free system the polynucleotide 5′pA(pA)npApC3′ and studying the polypeptide formed in response to this message.

A polylysine is formed (having therefore a lysine at the N-terminal end), but it has an asparagine residue in the C-terminal position. This result is in conformity with a translation of the mRNA in the direction 5′ → 3′. The fact that mRNAs are translated in the direction 5′ → 3′ enables the translation to begin before the completion of the mRNA synthesis (i.e. a transcription-translation coupling) in procaryotes. If the mRNAs were translated in the reverse direction, only the terminated molecules could be translated.
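The argument can be made concrete by reading such a message in both directions. In this sketch the message length is arbitrary, and the reverse reading is a purely hypothetical comparison based on the standard codon table (K lysine, N asparagine, Q glutamine).

```python
from itertools import product

BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code, one-letter amino acid symbols, '*' = termination codon.
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def translate(rna):
    """Read non-overlapping triplets starting at the first nucleotide."""
    return "".join(CODON_TABLE[rna[i:i + 3]] for i in range(0, len(rna) - 2, 3))


message = "AAA" * 5 + "AAC"           # 5' A A A ... A A C 3'

forward = translate(message)          # reading 5' -> 3'
backward = translate(message[::-1])   # hypothetical reading 3' -> 5'

print("5'->3':", forward)
print("3'->5':", backward)
```

Only the 5′ → 3′ reading gives a polylysine ending with asparagine at the C-terminal end, as observed; a reading in the opposite direction would begin with the codon CAA and place a different residue at the N-terminal end.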

All these results are summarized in the genetic code table (see fig. 6-48). This table calls for some remarks.

There are 61 codons having a meaning or sense (i.e. specifying an amino acid) and 3 non-sense codons (which will be studied in the following). Besides, it is observed that amino acids can be coded by several codons; there are two exceptions, tryptophan which is coded only by UGG and methionine coded only by AUG.

Some amino acids, like leucine, serine and arginine, can be specified by 6 different triplets. It must be noted that the triplets YZU and YZC always correspond to the same amino acid; in most cases the same is true of codons YZA and YZG; there are only two exceptions: UG purine and AU purine; in many cases (8 boxes out of 16) the same amino acid is specified whatever the 3rd letter of the codon (e.g., in the box UC: UCU, UCC, UCA and UCG all code for serine).

This suggests that the first two bases are of capital importance in the pairing. This is a factor of genetic stability, because a point mutation (replacement of a base by another) at the third position will often have no effect on the nature of the specified amino acid (“silent” mutation), especially if it is a transition, i.e. the replacement of a purine by another, or a pyrimidine by another (the replacement of a purine by a pyrimidine or vice versa is called a transversion).

Genetic Code
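The remarks made about the table can be verified by enumerating the 64 codons. The sketch below assumes the standard codon table written with one-letter amino acid symbols (`*` for a non-sense codon); the variable names are illustrative.

```python
from collections import Counter, defaultdict
from itertools import product

BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}

# Sense and non-sense codons.
sense_codons = [c for c, a in CODON_TABLE.items() if a != "*"]
nonsense_codons = sorted(c for c, a in CODON_TABLE.items() if a == "*")

# Number of codons per amino acid.
codons_per_aa = Counter(CODON_TABLE[c] for c in sense_codons)
single_codon = sorted(a for a, n in codons_per_aa.items() if n == 1)
six_codons = sorted(a for a, n in codons_per_aa.items() if n == 6)

# Group the table into 16 boxes sharing the first two letters (Y and Z).
boxes = defaultdict(set)
for codon, aa in CODON_TABLE.items():
    boxes[codon[:2]].add(aa)
unmixed_boxes = sorted(yz for yz, aas in boxes.items() if len(aas) == 1)

# Third-letter rules.
u_c_always_same = all(CODON_TABLE[yz + "U"] == CODON_TABLE[yz + "C"] for yz in boxes)
a_g_exceptions = sorted(yz for yz in boxes
                        if CODON_TABLE[yz + "A"] != CODON_TABLE[yz + "G"])

print(len(sense_codons), nonsense_codons)
print(single_codon, six_codons)
print(len(unmixed_boxes), u_c_always_same, a_g_exceptions)
```

The enumeration gives 61 sense codons, two amino acids with a single codon (M and W), three with six codons (L, R and S), 8 boxes in which the third letter is indifferent, and the two exceptions AU and UG for the purine rule.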

Several triplets therefore correspond to one amino acid; it is said that the code is degenerate. One might wonder whether these diverse codons are recognized by a single tRNA or by several isoacceptor tRNAs. In fact, the same tRNA can often recognize several codons, by virtue of a certain flexibility, or wobble, in the pairing of the third nucleotide of the codon.

The wobble hypothesis, originally advanced by Crick, has been confirmed; it was indeed found that the first nucleotide of the anticodon can, in many cases, recognize several nucleotides at the 3rd position of the codon:

For example yeast tRNAala (see fig. 6-44) which has an I at the first position of the anticodon (IGC) can recognize the alanine codons GCU, GCC and GCA, and a tRNAphe having an anticodon GAA can read the two phenylalanine codons UUU and UUC as may be seen in the diagram below, (it must be noted that the sequences are written in the direction 5′ → 3′ and that the sequences of the corresponding codons and anticodons are antiparallel.)
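The two examples just given can be reproduced with a small lookup. The pairing rules in `WOBBLE` are the classic ones proposed by Crick, supplied here as an assumption since the table itself is not reproduced in this text; the function name `codons_read_by` is illustrative.

```python
# First base of the anticodon -> bases it can pair with at the 3rd position of the codon
# (Crick's wobble rules; I stands for inosine).
WOBBLE = {"C": "G", "A": "U", "U": "AG", "G": "UC", "I": "UCA"}
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def codons_read_by(anticodon):
    """Codons (5'->3') recognized by an anticodon written 5'->3'.

    Pairing is antiparallel: the 3rd base of the anticodon faces the 1st base
    of the codon, and the 1st base of the anticodon faces the 3rd (wobble) base.
    """
    first, second, third = anticodon
    prefix = COMPLEMENT[third] + COMPLEMENT[second]   # strict pairing for codon bases 1 and 2
    return [prefix + base for base in WOBBLE[first]]


print("IGC reads", codons_read_by("IGC"))   # yeast tRNA-Ala
print("GAA reads", codons_read_by("GAA"))   # tRNA-Phe
```

Strict pairing is imposed only on the first two bases of the codon, which is why a single tRNA can serve several synonymous codons that differ at the third position.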

Other modified nucleotides are sometimes present at the first position of the anticodon of some tRNAs and can influence the recognition of codons. Lastly, in certain cases, it appears that the pairing of two nucleotides out of three could suffice, especially in the case of two G-C pairs (which are more stable than the other pairs); this mechanism is called “two out of three”.

But the synonymous codons which differ at the level of one of their first two nucleotides must be read by different anticodons, therefore, by different tRNAs; this is the case for example for the leucine codons UUA and CUA, the arginine codons CGA and AGA, and of course for the serine codons UCU and AGU etc. whose first two nucleotides are different.

The genetic code was therefore elucidated by experiments carried out in vitro, with cell-free systems or ribosomes from E.coli. But it was later found that this code is universal, from viruses and bacteria up to higher plants and animals (including man). It was however noted recently that mitochondria (in yeast, mammals) use a genetic code which differs in some cases from the code called “universal”.

Indirect confirmations of the validity of the genetic code thus determined in vitro were obtained by the study of mutations — spontaneous or induced — and of their consequences in the amino acid sequence of proteins corresponding to the mutated genes.

Among the spontaneous mutations, were especially studied those affecting the genes coding for the globin chains in man. These mutations are generally point mutations, i.e. they consist of a modification of a single nucleotide (which is replaced by another) with the result that the initial codon is transformed into a new codon which differs from the former by only one nucleotide.

While studying the substitutions of amino acids resulting from these mutations, it was observed that they were indeed compatible with the replacement of a single nucleotide (in most cases, it is one of the four possible transitions, or an A → U or U → A transversion): thus in the abnormal human hemoglobins were noted the substitutions Gly → Asp (which would result from a GGU → GAU transition), Glu → Lys (GAA → AAA transition), His → Tyr (CAU → UAU transition), Asn → Lys (AAU → AAA transversion), etc., and an indirect confirmation of these codons was thus obtained.

Among the induced mutations, those caused by nitrous acid were studied, particularly in the case of tobacco mosaic virus; nitrous acid is known to act by causing a deamination of bases: cytosine is deaminated into uracil, and adenine is deaminated into hypoxanthine (which pairs with C, with the result that this modification is actually equivalent to an A → G transition).

While studying the substitutions of amino acids in the capsid protein in the progeny of viruses treated with nitrous acid, one notes for example Thr → Ile (ACA → AUA), Ser → Phe (UCU → UUU), Pro → Leu (CCC → CUC), Ile → Val (AUU → GUU), and these amino acid replacements also give an indirect confirmation of the validity of the codons which were assigned to them.

A direct confirmation of the validity of the genetic code could be obtained: the amino acid sequence of the coat protein of phage R17, an RNA-containing phage, was known, and upon determining the sequence of some nucleotide fragments of the corresponding cistron, one found nucleotide sequences coinciding with those which could be written, on the basis of the genetic code, from the amino acid sequences. It is interesting to note that the determination of the nucleotide sequence of the RNA of phage MS2 has shown that in such simple genomes all the 61 sense codons are utilized.
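The consistency of these substitutions with single-nucleotide changes can be checked mechanically. The sketch below takes the codon pairs quoted for the abnormal hemoglobins and for the nitrous acid mutants and compares them with the standard codon table; the function `classify` and the two lists are named here only for illustration.

```python
from itertools import product

BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}
ONE_LETTER = {"Gly": "G", "Asp": "D", "Glu": "E", "Lys": "K", "His": "H", "Tyr": "Y",
              "Asn": "N", "Thr": "T", "Ile": "I", "Ser": "S", "Phe": "F", "Pro": "P",
              "Leu": "L", "Val": "V"}
PURINES = set("AG")


def classify(before, after):
    """Return (old base, new base, kind) for two codons differing by one nucleotide."""
    changes = [(a, b) for a, b in zip(before, after) if a != b]
    if len(changes) != 1:
        raise ValueError("codons must differ by exactly one nucleotide")
    old, new = changes[0]
    same_family = (old in PURINES) == (new in PURINES)
    return old, new, "transition" if same_family else "transversion"


hemoglobin = [("Gly", "Asp", "GGU", "GAU"), ("Glu", "Lys", "GAA", "AAA"),
              ("His", "Tyr", "CAU", "UAU"), ("Asn", "Lys", "AAU", "AAA")]
nitrous_acid = [("Thr", "Ile", "ACA", "AUA"), ("Ser", "Phe", "UCU", "UUU"),
                ("Pro", "Leu", "CCC", "CUC"), ("Ile", "Val", "AUU", "GUU")]

for aa_before, aa_after, codon_before, codon_after in hemoglobin + nitrous_acid:
    # The codons must code for the amino acids observed before and after mutation.
    assert CODON_TABLE[codon_before] == ONE_LETTER[aa_before]
    assert CODON_TABLE[codon_after] == ONE_LETTER[aa_after]
    print(aa_before, "->", aa_after, classify(codon_before, codon_after))
```

Every nitrous acid substitution in the list turns out to be a C → U or an A → G change, which is what the deamination mechanism predicts.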

With the development of the DNA sequencing techniques it then became possible to verify, in the case of numerous genes, that the nucleotide sequence of the coding strand of the DNA corresponded to the amino acid sequence of the protein coded by the gene concerned.

The RNAs of RNA-containing phages and viruses which can be prepared from viral particles, are pure messenger RNAs and it is not surprising that the first protein biosynthesis experiments in vitro directed by natural RNAs were carried out with phage RNAs. They allowed, in a cell-free system of bacterial origin, the synthesis of proteins specific of the phage.

It is, on the contrary, much more difficult to isolate a particular messenger RNA of a higher organism because, as seen in the foregoing, the totality of mRNAs represents less than 5% of the cellular RNAs and, besides, they consist of a mixture of several hundred (or even more) different types of molecules.

However, using specialized cells or tissues which produce only a very limited number of different proteins (for example, reticulocytes for globin, some tumours for immunoglobulins, the eye lens for crystallins, the oviduct for ovalbumin), one could isolate and purify mRNAs and obtain their translation into the corresponding proteins, either by introducing them into a cell-free system (generally prepared from reticulocytes or wheat germ), or by injecting them into Xenopus oocytes.

In the case of cells containing a large number of different mRNAs, when one wants to obtain one type of mRNA present in small quantities, one can selectively precipitate the polysomes containing this mRNA with the help of antibodies raised against the corresponding protein.

Lastly, it may be noted that ambiguities and errors can be observed during translation when organic solvents, an excess of Mg2+ ions or streptomycin are added to the cell-free system; in the presence of streptomycin in particular, poly U stimulates not only the incorporation of phenylalanine, but also that of isoleucine, leucine, serine and tyrosine, when the cell-free system is prepared from streptomycin-sensitive bacteria.

It was possible to determine the difference between bacteria sensitive and resistant to this antibiotic; this difference resides in one of the proteins of the 30 S ribosomal particle, protein S12, whose sequence is modified in the resistant mutants.

Initiation of the Peptide Chain:

It was observed that in E.coli there are 2 tRNAs specific for methionine, tRNAMmet and tRNAFmet, which can be purified by conventional methods of fractionation of isoacceptor tRNAs (bidimensional electrophoresis on polyacrylamide gel, column chromatography).

Both these tRNAs can be aminoacylated into methionyl-tRNA, but only methionyl-tRNAFmet can, as indicated by the letter “F”, be formylated (on the amino group of methionine) by N10-formyl-tetrahydrofolic acid in the presence of a transformylase; it incorporates methionine at the beginning of the chain, in response to the codon AUG (or GUG). As for methionyl-tRNAMmet, it incorporates methionine inside the chains, and only in response to the codon AUG.

Proteins synthesized in a cell-free system prepared from E.coli begin with N-formyl-methionine while methionine (non formylated) represents only 40% of the N-terminal residues of ribosomal proteins isolated from E.coli (it is however the dominant amino acid at this position); this suggests the existence in vivo of an enzyme which deformylates methionine (deformylase) and of an enzyme capable of eliminating methionine (exopeptidase).

N-formyl-methionyl-tRNA was also shown to be present in other bacteria and also in mitochondria and chloroplasts; this represents one more similarity between organelles and bacteria (we have already seen that organelles have 70 S ribosomes, like bacteria).

On the contrary, no N-formyl-methionyl-tRNA is found in the cytoplasm of eucaryotic cells; the initiator tRNA is a methionyl-tRNAFmet (non-formylated but formylatable in vitro with the help of a bacterial transformylase), different from the methionyl-tRNAMmet responsible for the incorporation of methionine inside the chains. Here again, an enzyme can cleave off methionine, which is then no longer found in N-terminal position in the completed protein.

In bacteria, there are 3 initiation factors IF-1, IF-2 and IF-3 which participate in the formation of the initiation complex by binding to the 30 S particle. The binding of GTP to IF-2 enables the mRNA and the initiation tRNA to join the complex while IF-3 is liberated. We then have binding of the 50 S particle, with hydrolysis of GTP and liberation of IF-1 and IF-2.

This hydrolysis of GTP takes place thanks to proteins L7 and L12 which also participate in the hydrolysis of GTP during the elongation of the protein chain. The 30 S particle participates in the formation of the initiation complex by binding to the mRNA on a ribosome binding site situated near the initiation codon AUG.

The formation of this complex implies pairing between an oligonucleotide sequence situated at the 3′ end of the 16 S RNA in the 30 S particle and a complementary sequence situated on the mRNA upstream of the initiation codon AUG (thus differentiating this initiation AUG from an internal AUG).

Formyl-methionyl-tRNAFmet joins the complex by binding at the P site (while other aminoacyl-tRNAs can only bind at the A site of the ribosome), with the result that the A site will be able to receive aminoacyl-tRNA no. 2 in order to permit the formation of the first peptide linkage.

If the ribosomes are allowed to bind to the mRNA but the elongation is blocked, the ribosomes remain in the initiation site, and if a ribonuclease is then made to act, it will spare the mRNA region involved in the initiation complex.

This region can be isolated and studied, and it is observed that the bacterial ribosome protects a region of about 35 to 40 nucleotides, comprising two sequences common to all mRNAs: the initiation codon AUG (or GUG) and a sequence which is complementary to a hexameric sequence present near the 3′ end of the 16 S RNA.

The sequence identified by Shine and Dalgarno is constantly present upstream of the initiation codon in procaryotic mRNAs; this is an argument in favour of its participation in the initiation process of protein biosynthesis.

Besides, a mutation is known in a gene of phage T7 in which the terminal triplet AGG of this sequence is changed into AAG, and the binding of ribosomes to the mRNA is affected in this mutant. The study of a large number of Shine-Dalgarno sequences showed that the number of base pairs between the mRNA and the 16 S rRNA can vary from 3 to 9.

The 3′ region of 16 S rRNA seems to be involved in the initiation because it is very well conserved in bacteria, it is the target of kasugamycin (an inhibitor of initiation), and it is cross-linked with the initiation factors. This region can form a hairpin by intrachain base pairing.

The hexameric sequence (UCCUCC) is involved, either in this intrachain pairing, or in the pairing with the mRNA (this is a temporary pairing, broken after the initiation step, which then allows the ribosome to move along the mRNA).
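The pairing between the mRNA and the hexameric sequence of the 16 S rRNA can be illustrated by a crude count of base pairs. The sketch below is only an illustration under stated assumptions: the hexamer UCCUCC is taken as written 3′ → 5′, only Watson-Crick pairs are counted (no G-U pairs), paired positions need not be contiguous, and both leader sequences are hypothetical (the second one carries the AGG → AAG change described above for the phage T7 mutant).

```python
# Watson-Crick partners in RNA.
PAIR = {"A": "U", "U": "A", "G": "C", "C": "G"}

# Hexameric sequence of the 16 S rRNA, assumed here to be written 3' -> 5',
# so that it aligns position by position with an mRNA written 5' -> 3'.
ANTI_SD_3_TO_5 = "UCCUCC"


def best_sd_match(leader):
    """Return (start, pairs) for the leader window pairing best with the hexamer."""
    n = len(ANTI_SD_3_TO_5)
    best_start, best_pairs = None, 0
    for start in range(len(leader) - n + 1):
        window = leader[start:start + n]
        pairs = sum(PAIR[m] == r for m, r in zip(window, ANTI_SD_3_TO_5))
        if pairs > best_pairs:
            best_start, best_pairs = start, pairs
    return best_start, best_pairs


# Hypothetical leaders (region just upstream of the initiation AUG).
wild_type_leader = "AACAGGAGGAUUAC"
mutant_leader = "AACAGGAAGAUUAC"  # terminal AGG changed into AAG

print(best_sd_match(wild_type_leader))
print(best_sd_match(mutant_leader))
```

In this toy case the mutant loses one of the six possible pairs, which is in line with the weakened ribosome binding reported for the mutant; the real interaction depends on more than a simple pair count.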

In most cases, the ribosomes can bind independently at the beginning of each cistron of a polycistronic mRNA. Thus the tryptophan operon of E.coli, for example, is transcribed into an mRNA of about 7 000 nucleotides coding for 5 enzymes which catalyze various steps of the biosynthetic pathway of this amino acid, and for each of these 5 proteins there is on the mRNA an initiation signal (and a termination signal) of translation.

But since a ribosome covers an mRNA length of about thirty nucleotides (as shown by experiments on the protection of the RNA in the mRNA-ribosome initiation complex against attack by pancreatic RNase), one can imagine that when the intercistronic region is short (and all the more so when there is overlapping) the 30 S particle of a ribosome terminating the translation of a cistron does not leave the mRNA and can immediately initiate the translation of the next cistron.

In some cases, the blocking of translation after the appearance of a non-sense codon in a polycistronic mRNA affects not only the expression of the mutated gene, but also that of the next (distal) cistrons; this phenomenon, called polarity (or polar effect), is mainly due to an indirect effect at the level of transcription, which is interrupted downstream of the mutation, but there can also be a polar effect due to the impossibility for the ribosomes to re-initiate the translation of the next cistrons.

In eucaryotes, near the initiation codon, there is apparently no sequence permitting, by base complementarity, the attachment of the small 40 S ribosomal sub-unit; the latter binds to the 5′ end of the mRNA and then moves up to the initiation site of protein synthesis. On the other hand, the eucaryotic initiation factors (called eIF) are more numerous (nine have been identified in red blood cells), and some are oligomeric, particularly eIF-2 and eIF-3.

The sub-unit α of the eIF-2 factor can be phosphorylated by a kinase (which inhibits the initiation of protein synthesis), either when there is heme deficiency in the red blood cells, or due to the effect of a double stranded RNA (or of interferon, which is itself induced by double stranded RNAs and inhibits the synthesis of viral proteins). This is an example of the control of gene expression at the level of translation.

The 3′ end of the small eucaryotic rRNA (18 S) presents a great similarity with that of the small procaryotic rRNA (16 S): in both cases the 3′ region can form, by intrachain pairing, a hairpin which comprises two adjacent dimethyl A.

On comparing the twenty nucleotides separating these two methylated residues from the 3′ OH end of the RNA, an almost complete homology is observed with only two differences: a dinucleotide sequence UU in bacteria is replaced by an AU or AA sequence in eucaryotes, and in eucaryotes there is a deletion of the pentanucleotide CCUCC (i.e. almost the whole of the sequence complementary to the Shine-Dalgarno hexanucleotide sequence).

The eucaryotic ribosomes do not bind to the mRNAs (which are, it may be recalled, mostly monocistronic) at an initiation site located just upstream of the coding region, but they recognize the methylated cap, situated at the 5′ end of the non-translated leader region (whose length is generally less than 100 nucleotides), thanks to factors called cap binding proteins.

The cap is therefore necessary, and uncapped mRNAs (obtained by blocking cap formation or by enzymatic cleavage of the cap) are, with very rare exceptions, not translated efficiently. Some viral mRNAs (in the case of poliovirus, for example) have no cap and inhibit the translation of the mRNAs of the host cell by blocking the cap binding proteins.

When the leader region is short (less than 40 nucleotides) it may be imagined that the ribosome covers both the cap and the initiation AUG codon. But when the distance is greater (e.g., it can reach 200-300 nucleotides), it is believed that the 40 S particle, after recognizing the cap, migrates along the mRNA till it meets the initiation codon.

The internal initiation sites, for example in the case of some viral mRNAs containing more than one cistron, do not seem to be recognized until a cleavage creates a new 5′ end near the initiation codon.

Termination of the Peptide Chain:

When poly U is used as messenger in a cell-free system, the polyphenylalanine chains synthesized remain bound to the last tRNA and are not liberated from the polysomes. Poly U has no termination signal, a signal which natural messengers must have in order to permit the liberation of polypeptide chains.

Our knowledge of the termination mechanisms benefited in large measure from studies carried out on the premature, accidental termination of the synthesis of a protein caused by a non-sense mutation. This is a mutation which, after transcription of the cistron where it occurred, brings about the appearance in the mRNA of one of the 3 non-sense codons, UAG (amber), UAA (ochre) or UGA (see fig. 6-48). These codons are called “non-sense” because normally the cell has no tRNA with an anticodon able to pair with any one of these 3 codons; such a codon is therefore not translated, and the synthesis of the corresponding protein is blocked at this level.

Thus, the RNA of a non-sense mutant of the bacteriophage f2, having a UAG codon at the seventh position in the cistron coding for the coat protein, permits only the synthesis of a hexapeptide (fMet-Ala-Ser-Asn-Phe-Thr), in vivo as well as in vitro.
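The effect of such a mutation can be reproduced with a short translation routine. The sketch below assumes the standard genetic code; the mRNA used is hypothetical (its first six codons were simply chosen to encode Met-Ala-Ser-Asn-Phe-Thr and are not the actual f2 sequence), and the seventh codon CAG is changed into UAG by a single C → U replacement.

```python
# Standard genetic code in the RNA alphabet, bases ordered U, C, A, G.
BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(mrna):
    """Translate from the first base, in phase, up to the first non-sense codon."""
    peptide = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":  # UAG, UAA or UGA: no tRNA reads it
            break
        peptide.append(amino_acid)
    return "".join(peptide)


# Hypothetical start of a cistron (not the real f2 coat protein sequence).
wild_type = "AUGGCUUCUAACUUUACUCAGUUCGUU"
# Non-sense mutant: seventh codon CAG -> UAG.
mutant = wild_type[:18] + "UAG" + wild_type[21:]

print(translate(wild_type))  # full-length product of this fragment
print(translate(mutant))     # truncated product: six residues only
```

The initiating methionine is written M here; in bacteria it would be N-formyl-methionine.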

The consequences, for the bacterial cell for example, of a non-sense mutation depend on the one hand, on the role — essential or otherwise — of the protein coded by the mutated gene, and on the other hand, on the position of the mutation in this gene.

For example, a non-sense mutation in the gene of β-galactosidase will have no consequence if the bacterium grows in a glucose-containing medium, because in these conditions the gene is not transcribed (see fig. 8-4).

A mutation in the i gene of the lactose operon will block the synthesis of the repressor with the result that the enzymes of the lactose operon will always be synthesized even in the absence of the product to be metabolized (lactose).

Lactose Operon

On the contrary a non-sense mutation in the gene of the RNA polymerase will be lethal. The position of the mutation also has some importance, because one can imagine a non-sense codon near the 3′ end of the cistron, permitting the synthesis of a protein lacking only a few amino acids at its C-terminal end which may not be essential for the biological activity of this protein.

In any case, a non-sense mutation can be corrected in two ways:

(i) By a second mutation in the same gene, more precisely in the same triplet, transforming the non-sense codon into a sense codon (see fig. 6-49a). This is a reversion; there can be a true reversion if the second mutation brings back the codon as it existed before the first mutation and in that case the protein will be necessarily functional; but a new sense codon may also appear, different from the one which was present before the first mutation, and in that case the protein will be biologically active only if the amino acid coded by this new triplet is compatible with biological activity.

Non-Sense Mutation

(ii) By a second mutation in an altogether different gene, the gene of a tRNA, the effect of which is more precisely a modification of the anticodon of this tRNA which becomes capable of pairing with the non-sense codon (fig. 6-49b). The non-sense codon therefore remains in the mRNA of the protein, but now it can be read. This is a suppression; in this case the protein will be active only if the amino acid brought by the suppressor tRNA is compatible with biological activity.
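The reversions described under (i) can be enumerated: for each non-sense codon, one lists the sense codons obtainable by replacing a single base. This is a minimal sketch assuming the standard genetic code; the function names are illustrative.

```python
# Standard genetic code in the RNA alphabet, bases ordered U, C, A, G.
BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def single_base_neighbours(codon):
    """Yield the 9 codons differing from `codon` at exactly one position."""
    for i in range(3):
        for base in BASES:
            if base != codon[i]:
                yield codon[:i] + base + codon[i + 1:]


def revertants(nonsense_codon):
    """Map each sense codon reachable by one point mutation to its amino acid."""
    return {
        codon: CODON_TABLE[codon]
        for codon in single_base_neighbours(nonsense_codon)
        if CODON_TABLE[codon] != "*"
    }


for stop in ("UAG", "UAA", "UGA"):
    print(stop, revertants(stop))
```

Whether a given revertant yields an active protein still depends, as stated above, on the compatibility of the new amino acid with biological activity. The same enumeration shows how easily a single termination triplet can be converted into a sense codon.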

The natural termination of protein synthesis uses non-sense codons but it appears that the signal is not always limited to a single non-sense triplet.

It is logical that a more elaborate signal was selected during evolution, because a single triplet presents the serious disadvantage of easy conversion into a sense codon by a simple point mutation (replacement of one base by another), which would bring about the appearance of a chain consisting of two proteins fused together, and which would perhaps have the properties of neither (so that 2 enzymes would be missing in the cell). It has been possible to determine the sequence of an inter-cistronic region in the RNA of phage R17.

It comprises the 2 non-sense codons UAA and UAG, then an initiation codon AUG followed by 6 sense codons, and lastly the third non-sense codon UGA. Such a termination signal appears to be better protected against point mutations, but it is not sure that the intercistronic regions of cellular mRNAs are as long.

It must be realized that a suppressor tRNA capable of reading a non-sense codon enters in competition with the termination factors (RF) which normally recognize this non-sense codon, not only in the mRNA corresponding to the mutated gene having a premature termination signal, but also in all the mRNAs containing this non-sense triplet as a normal termination signal, thus causing read through beyond the C-terminal end of a large number of proteins.

The translation will continue up to the next non-sense codon in phase, and the proteins thus elongated might lose their usual biological activities, which could be disastrous for the cell.

This explains why the efficiency of suppressor tRNAs is low, in the order of 10 to 40% for the suppressors of UAG, less than 10% for the suppressors capable of reading UAA (and UAG), which suggests that UAA is used more frequently as termination codon (because the cells can tolerate only a limited level of read through beyond the normal termination sites).

Furthermore, termination requires protein factors called RF (Release Factors) which recognize the non-sense codons: RF-1 recognizes UAA and UAG, RF-2 recognizes UAA and UGA and RF-3 stimulates the activity of the other two factors.
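The specificities just stated can be written as a small lookup table, which makes it apparent that UAA is the only non-sense codon recognized by both factors. This is merely a restatement of the sentence above in Python; the names are illustrative.

```python
# Codon specificities of the release factors as stated in the text.
# RF-3 recognizes no codon itself; it stimulates RF-1 and RF-2.
RELEASE_FACTORS = {
    "RF-1": {"UAA", "UAG"},
    "RF-2": {"UAA", "UGA"},
}


def factors_for(codon):
    """Return the sorted names of the release factors recognizing `codon`."""
    return sorted(name for name, codons in RELEASE_FACTORS.items() if codon in codons)


for codon in ("UAA", "UAG", "UGA", "AUG"):
    print(codon, factors_for(codon))
```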

Termination involves the rupture of the ester linkage between the last tRNA and the polypeptide chain, the liberation of this tRNA and the polypeptide chain and the dissociation of the 70 S ribosome into 50 S and 30 S particles which necessitates factor IF-3.

Inhibition of Protein Synthesis:

At the level of the binding of amino acids to the corresponding tRNAs, one antibiotic is known, borrelidin, which specifically inhibits the aminoacylation of tRNAthr.

Puromycin is an antibiotic which presents a great structural similarity with the 3′ end of a tyrosyl-tRNA (see fig. 6-50); this enables it to react with a peptidyl-tRNA during the elongation of the peptide chain and to displace the tRNA to form a peptidyl-puromycin.

But since puromycin comprises an amide linkage and not an ester linkage between the carboxylic group of para-methoxy-phenylalanine (or methyl ether of tyrosine) and the amino group of the pentosamine, this peptidyl-puromycin can no longer allow elongation to continue and is released from the ribosome.

Structures of a Peptidyl-tRNA

Chloramphenicol inhibits protein synthesis on the 70 S ribosomes of procaryotes, mitochondria and chloroplasts, but not that taking place on the 80 S ribosomes of the cytoplasm of eucaryotes. On the contrary, cycloheximide inhibits protein synthesis on 80 S ribosomes, but not that taking place on 70 S ribosomes (procaryotes, mitochondria, chloroplasts).

As regards sensitivity to antibiotics, size of ribosomes and use of a formyl-methionyl-tRNAFmet for the initiation process, protein biosynthesis in mitochondria and chloroplasts resembles the process taking place in procaryotic organisms more than the one occurring in the neighbouring cytoplasm, which supports the hypothesis of an endosymbiotic origin of these organelles; a number of arguments do suggest that mitochondria and chloroplasts could have derived from procaryotic organisms.

Colinearity of the Gene and the Peptide Chain:

Studies carried out on the mutations consisting of an addition or deletion of nucleotides, and particularly, the observation that three mutations of the same sign taking place in the same gene restore a translation in phase had already provided arguments in favour of the colinearity of the gene and the corresponding polypeptide chain.
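The restoration of the reading phase by three mutations of the same sign can be followed on a toy example. The sequence and the inserted bases below are hypothetical; the sketch only shows how the triplets downstream of the insertions are read after one, two and three single-base additions.

```python
def codons(sequence):
    """Split a sequence into complete triplets, read from the first base."""
    usable = len(sequence) - len(sequence) % 3
    return [sequence[i:i + 3] for i in range(0, usable, 3)]


def insert_base(sequence, position, base):
    """Return the sequence with `base` inserted before index `position`."""
    return sequence[:position] + base + sequence[position:]


# Hypothetical message and three successive single-base additions.
original = "AUGGCUUCUAACUUUACUCAGUUCGUU"
one_addition = insert_base(original, 4, "A")
two_additions = insert_base(one_addition, 8, "C")
three_additions = insert_base(two_additions, 12, "G")

for label, seq in [
    ("original", original),
    ("+1", one_addition),
    ("+2", two_additions),
    ("+3", three_additions),
]:
    print(label.ljust(8), " ".join(codons(seq)))
```

After one or two additions all the downstream triplets are changed (in this toy case a non-sense codon even appears in phase after the first addition), whereas after the third addition only the short region between the first and last insertion differs from the original reading.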

This colinearity was confirmed by studies carried out on non-sense mutations in the gene coding for the protein of the head of phage T4; Brenner did show that there is a correlation between the length of the peptide chains formed and the position on the gene of codons mutated into non-sense codons.

Yanofsky and co-workers carried out a similar study on mis-sense mutations (sense codon transformed into another sense codon specifying a different amino acid) that occurred in one of the chains of E.coli tryptophan synthetase which is 267 amino acids long and whose sequence was determined; they observed, in about 20 different mutations, a correspondence between the positions of amino acids substituted in the peptide chain and the positions of sites mutated on the corresponding gene.

This colinearity rule is generally not respected in eucaryotes whose genes may contain coding parts, or exons, and non-coding parts, or introns.

Transport of Newly Synthesized Proteins:

a) Free Ribosomes and Ribosomes Bound to Membranes:

In eucaryotic cells one may distinguish free ribosomes in the cytosol, responsible for synthesizing the proteins which will be released in the cytosol, and ribosomes bound to the membranes of the endoplasmic reticulum, responsible for the synthesis of lysosomal proteins, proteins secreted outside the cell, and proteins which will be finally localized in the membrane, projecting on either side of this membrane. But the ribosomes themselves do not differ; they are free or bound to membranes depending on the protein they synthesize.

b) Signal Sequence or Signal Peptide:

A protein destined to cross the membrane of the endoplasmic reticulum binds to this membrane thanks to a signal sequence of fifteen to thirty amino acids, present at the N-terminal end of the nascent chain, but which is then cut by a signal peptidase and is therefore absent in the protein once secreted.

The signal sequences of more than a hundred secreted proteins originating from diverse eucaryotic organisms are known at present; some common characteristics were observed: there is at least one residue having a positive charge at the N-terminal end, a hydrophobic part of 10 to 15 residues at the centre of the sequence (comprising leu, ile, val, phe residues), and a more polar sequence of about 5 residues upstream of the cleavage site.

However, some secreted proteins have an internal signal sequence (e.g., ovalbumin) not situated at the N-terminal end. The addition of a signal sequence to a normally cytosolic protein enables the latter to be transported through the membrane of the endoplasmic reticulum, as shown by experiments where the signal sequence of β-lactamase of E.coli (an enzyme secreted in the periplasmic space between the external membrane and the inner membrane of the bacterium) was grafted at the N-terminal end of α-globin.

c) Signal Sequence Recognition Particle:

This particle (called SRP: Signal Recognition Particle) consisting of a 7S RNA (305 nucleotides) and 6 different proteins binds to a ribosome comprising a nascent protein chain having a signal sequence and will recognize, on the membrane of the endoplasmic reticulum, a receptor of SRP.

The ribosome carrying the nascent protein chain is thus brought to the level of the translocation system comprising especially 2 membrane proteins, the ribophorins I and II, while the SRP, having done its work, returns to the cytosol. The elongation of the protein chain then continues with the ribosome fixed to the membrane of the endoplasmic reticulum.

The proteins to be secreted and the lysosomal proteins completely pass through the membrane of the endoplasmic reticulum. On the contrary, other proteins must form part of a membrane: some have only one helical region across the membrane, others have several; some have their N-terminal end projecting out of the membrane on the cytosolic side and their C-terminal end projecting on the extracellular side, for other proteins it is the reverse.

d) Glycosylation of Proteins at the Level of the Endoplasmic Reticulum:

At the level of the endoplasmic reticulum, the signal sequences are eliminated by the signal peptidase and a large number of proteins undergo glycosylations to become glycoproteins. As indicated earlier, glycoproteins contain oligosaccharide units bound either by O-glycosidic bonds to the hydroxyl of a serine or threonine, or by N-glycosidic bonds to the amide group of asparagine.

We have also seen how these units are transferred to the polypeptide chain thanks to an activated lipidic carrier, dolichol phosphate. Tunicamycin is an analogue of UDP-N-acetylglucosamine which blocks the fixation of N-acetylglucosamine on dolichol phosphate and therefore prevents the glycosylation of proteins.

e) Transport of Proteins to the Golgi Apparatus:

Proteins are transported by transfer vesicles from the endoplasmic reticulum to the Golgi apparatus, where modifications of oligosaccharide units take place, as well as a sorting of proteins which will be sent either to the plasma membrane, or to the lysosomes, or to the secretory granules (or vesicles).

Various vesicles perform the transfer of proteins from one compartment of the Golgi apparatus to the other (cis, median and trans compartments are distinguished) and from the trans compartment of the Golgi apparatus to various destinations (plasma membrane, lysosomes, secretory granules).

f) Transport of Proteins from the Golgi Apparatus to their Final Destination:

Some enzymatic proteins (particularly hydrolases) undergo phosphorylation at the position 6 of several of their mannose residues; this forms a signal enabling their transport from the Golgi apparatus to the lysosomes. A mucolipidosis, characterized by an accumulation of glycolipids not digested in the lysosomes, is due to the absence of mannose-6-phosphate in several hydrolases which are then exported instead of being sent to the lysosomes.

In these patients, there is a lack of a specific phosphotransferase catalyzing the attack of the hydroxyl at position 6 of the mannose present in glycoprotein on the pyrophosphate bond of UDP-N-acetylglucosamine.

The glycoproteins comprising mannose-6-phosphate bind to a protein receptor of the membrane of the Golgi apparatus, and vesicles containing the glycoprotein-receptor complex detach themselves by budding from the trans compartment of the Golgi apparatus.

After the glycoprotein has left the receptor and a phosphatase has dephosphorylated the mannose-6-phosphate, the transport vesicle can supply to the lysosomes, by fusion, the glycoproteins it contains.

On the contrary, it does not seem that particular monosaccharides serve as markers for the transport of other proteins, namely those which must be either secreted or integrated in the plasma membrane; it is rather believed that elements of three-dimensional structure play a role in directing these glycoproteins to their correct destination.

g) Transport of Proteins to Mitochondria and Plasts:

The mitochondrial genome codes only for a very small number of mitochondrial proteins (about twelve), with the result that the great majority is coded by the nuclear genome, synthesized in the cytosol and then imported into the mitochondria.

The import of a protein into the mitochondrion requires the presence of a particular sequence (pre-sequence) at its N-terminal end, which must be recognized by receptors situated on the external membrane of the mitochondrion, but it appears that a certain destabilization of the three-dimensional structure of the protein is also necessary.

The passage through the external membrane therefore requires a pre-sequence, different from the signal peptide permitting entry in the endoplasmic reticulum (see above), and bringing residues having a positive charge and hydroxylated residues whose organization in space perhaps plays a decisive role.

The imported protein can remain integrated in the external membrane if the pre-sequence is followed by an anchoring sequence and a second sequence charged positively.

But there are three other possible destinations, namely the intermembrane space, the inner membrane and the matrix, and for a protein to reach one of these three destinations there must be, on the one hand, a proton-motive force, because the addition of an uncoupling agent, like dinitrophenol, blocks the transport to these destinations and, on the other hand, a proteolytic cleavage of the amino-terminal pre-sequence. It appears that transport takes place at locations where outer membrane and inner membrane adhere to one another.

Techniques of genetic engineering have made possible the production of chimeric proteins; thus, when dihydrofolate reductase (a cytosolic enzyme) is preceded at its N-terminal end by the pre-sequence (61 amino acids) of cytochrome c1, it is imported into the intermembrane space of the mitochondrion, i.e. at the place where the cytochrome c1 is normally localized. But if this enzyme is preceded by the pre-sequence (27 amino acids) of alcohol dehydrogenase (an enzyme localized in the matrix), dihydrofolate reductase is found in the mitochondrial matrix.

The import of proteins into mitochondria (and plasts) is a post-translational process which may be performed in vitro by incubating organelles in presence of the protein precursor, and therefore differs from the transport of proteins through the membrane of the endoplasmic reticulum which takes place, as already mentioned, concomitantly with the translation (while the protein chain is being synthesized).

Most proteins of chloroplasts are also coded by the nuclear genome, synthesized in the cytosol and imported into the plast. But there are more possible localizations in the plast (6) than in the mitochondrion: outer membrane, intermembrane space, inner membrane, stroma, membrane of the thylakoid, lumen of the thylakoid. As in the case of mitochondria, the import of proteins requires a pre-sequence, also called transit peptide, at the N-terminal end, comprising residues with positive charge and hydroxylated residues.

This pre-sequence is cleaved during the transport, which requires the hydrolysis of ATP rather than a proton-motive force. In the case of chloroplasts also, it has been possible to import a cytosolic protein by grafting to it (at the level of the DNA) the transit peptide of a protein normally imported into the chloroplast (e.g., that of the small sub-unit of ribulose 1,5-bisphosphate carboxylase).

It must be noted that the situation is more complex in the case of plasts, not only because of the existence of 6 different compartments in the chloroplast, but also owing to the presence, in plants, of plasts which are different with regard to their roles and therefore with regard to their proteins: chloroplasts (photosynthesis), chromoplasts (synthesis of carotenoids), amyloplasts (synthesis of starch).

h) Transport of Proteins to the Nucleus:

All nuclear proteins (especially histones, DNA polymerases, RNA polymerases and all proteins participating in the replication of DNA and transcription) are synthesized in the cytosol and must pass through the nuclear envelope of eucaryotes comprising an outer membrane and an inner membrane.

This transport seems to take place thanks to nuclear pores of 70 Å in diameter, for the small proteins (for example, histones), but for larger proteins a short peptide sequence appears to be necessary.

Thus, the transport to the cell nucleus of antigen T of SV40 virus, a protein of molecular weight 92 000 daltons which controls the replication and transcription of viral DNA, requires a heptapeptide comprising 5 consecutive positively charged residues (Pro-Lys-Lys-Lys-Arg-Lys-Val).

The replacement of a Lys residue by Thr or Asn prevents the import of antigen T in the nucleus. It was also possible to bring about the transport of proteins to the nucleus by grafting (at the DNA level) this heptapeptide sequence on pyruvate kinase or on other cytosolic proteins.
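The two observations above (loss of import after replacement of a Lys, and import of a cytosolic protein after grafting of the heptapeptide) can be mimicked with a very crude sequence check. The sketch assumes that the exact heptapeptide, in one-letter code PKKKRKV, is what is recognized; the cargo sequence and the particular Lys replaced are hypothetical, and real nuclear import is certainly not decided by a simple string match.

```python
# Heptapeptide of SV40 antigen T quoted in the text, in one-letter code.
NUCLEAR_SIGNAL = "PKKKRKV"
POSITIVE = {"K", "R"}  # Lys and Arg


def longest_positive_run(peptide):
    """Length of the longest stretch of consecutive Lys/Arg residues."""
    best = run = 0
    for residue in peptide:
        run = run + 1 if residue in POSITIVE else 0
        best = max(best, run)
    return best


def carries_signal(protein):
    """Crude test: does the protein contain the intact heptapeptide?"""
    return NUCLEAR_SIGNAL in protein


mutated_signal = "PKTKRKV"        # one Lys replaced by Thr (position chosen arbitrarily)
cytosolic_cargo = "MSTLQAEVDGLF"  # hypothetical cytosolic protein fragment
chimera = NUCLEAR_SIGNAL + cytosolic_cargo

print(longest_positive_run(NUCLEAR_SIGNAL), longest_positive_run(mutated_signal))
print(carries_signal(cytosolic_cargo), carries_signal(chimera))
```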

Organization and Expression of Mitochondrial and Chloroplastic Genomes:

The mitochondrial genome of various eucaryotic cells, especially yeast cells and human cells, has been extensively studied in recent years; the results obtained are interesting and — for some — unexpected.

The human mitochondrial genome, which is a circular double-stranded DNA, comprises 16 569 base pairs and its sequence has been entirely determined (the mitochondrial genome of the yeast Saccharomyces cerevisiae is about five times larger and contains about 78 000 base pairs). This genome is very compact and characterized by a great economy.

It codes for:

1. 2 rRNAs, which are particularly short: 1 559 and 954 nucleotides respectively (3 200 and 1 660 nucleotides in yeast mitochondria).

2. 22 tRNAs (24 in yeast). This number of tRNAs is clearly smaller than the minimum number of tRNAs needed to read the 61 sense codons, taking into consideration the wobble hypothesis i.e. 31 tRNAs (plus the initiation tRNA). The translation of codons by a reduced number of tRNAs in the mitochondria is however accomplished thanks to different possibilities of codon-anticodon pairing.

Thus, some mitochondrial tRNAs, having a U at the first position of the anticodon, can read 4 codons differing only by their third nucleotide, i.e. the 4 codons of the same family (or the same box, in fig. 6-48), while procaryotic and eucaryotic tRNAs generally have a modified U at the first position of the anticodon, which restricts their possibilities of pairing.

(Fig. 6-48: Genetic code)

3. About thirteen polypeptides which are part of oligomeric membrane proteins: cytochrome oxidase, apocytochrome b, ATPase, NADH dehydrogenase (NADH ubiquinone oxidoreductase, sensitive to rotenone). It is of interest to note that sub-unit no. 9 of ATPase is coded by the mitochondrial genome in yeast, but by the nuclear genome in Neurospora crassa, which suggests that transfers of genes took place during evolution.
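The reading of a whole codon family by a single mitochondrial tRNA, described in item 2 above, can be sketched as follows. Only the "unmodified U reads all four bases" rule comes from the text. The two other pairing rules (a modified U restricted to A and G, and a G pairing with U and C) are common wobble rules assumed here for contrast, and the key names are hypothetical.

```python
BASES = "UCAG"

# Third codon bases that each first anticodon base is allowed to pair with.
READS = {
    "U_unmodified": set("UCAG"),  # mitochondrial case described in the text
    "U_modified": set("AG"),      # assumption: modified U restricted to purines
    "G": set("UC"),               # assumption: standard G wobble pairing
}

def family(prefix):
    """Four codons sharing their first two nucleotides (one box of the code table)."""
    return [prefix + base for base in BASES]

def codons_read(prefix, first_anticodon_base):
    """Codons of a family that one tRNA can read, given its first anticodon base."""
    allowed = READS[first_anticodon_base]
    return [codon for codon in family(prefix) if codon[2] in allowed]

print(codons_read("GC", "U_unmodified"))  # ['GCU', 'GCC', 'GCA', 'GCG']
print(codons_read("GC", "U_modified"))    # ['GCA', 'GCG']
print(codons_read("GC", "G"))             # ['GCU', 'GCC']
```

Under the assumed restricted rules, the alanine family (GCN) needs at least two tRNAs, whereas a single tRNA with an unmodified U suffices in the mitochondrion.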

It was observed that human mitochondrial genes are not separated by intercistronic sequences (punctuation is carried out by the tRNA genes) and that the mRNAs contain neither leader sequence nor tail sequence. More than half of them do not even have, after transcription, any termination codon, the latter being formed only upon polyadenylation of the mRNA (the termination codon UAA is formed by addition of one or two adenylic residues to UA or U).
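The completion of the termination codon by polyadenylation can be illustrated with a small sketch. The function names and the default tail length of 20 residues are hypothetical, chosen only for the illustration.

```python
def stop_after_polyadenylation(partial_codon, tail_length=20):
    """Return the codon formed when a poly(A) tail follows a partial stop codon."""
    if partial_codon not in ("U", "UA"):
        raise ValueError("expected 'U' or 'UA'")
    if tail_length < 2:
        raise ValueError("the tail must supply at least two adenylic residues")
    tailed = partial_codon + "A" * tail_length
    return tailed[:3]

def adenylates_needed(partial_codon):
    """Number of adenylic residues required to complete the codon."""
    return 3 - len(partial_codon)

for end in ("U", "UA"):
    print(end, adenylates_needed(end), stop_after_polyadenylation(end))
# U 2 UAA
# UA 1 UAA
```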

There seems to be only one promoter on each strand of the human mitochondrial DNA, which suggests that the genome is transcribed completely and symmetrically and that the two long precursors are cleaved (generally at the sites where the tRNAs are situated). In the yeast mitochondrion, on the contrary, transcription seems to take place from several different promoters.

Comparing the genes of apocytochrome b in human and yeast mitochondria, it is observed that the coding sequences are of the same size, but while there is no intron in the human mitochondrial gene, there are 5 introns in yeast (the intron no. 2 and the preceding exon seem to code for a maturase, the enzyme responsible for the excision of this intron).

Moreover, while in man there is neither a leader sequence in 5′ nor a tail sequence in 3′ (and not even a complete termination codon), in yeast there is a leader sequence of about 1000 nucleotides and a tail sequence of about 50 nucleotides.

Lastly, it came as a surprise to find out that the genetic code used by the mitochondria is different from the genetic code which was thought to be universal (see fig. 6-48). For instance, UGA codes for tryptophan (instead of being a nonsense codon) and AUA codes for methionine (instead of isoleucine).

AGA and AGG can be used as termination codons (instead of being codons for arginine), AUA and AUU can be used as initiation codons (instead of being codons for isoleucine) and the 4 CUN codons can be recognized by threonyl-tRNA (instead of leucyl-tRNA). These peculiarities of the mitochondrial genetic code appear, however, to vary according to the organism.
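The consequence of these reassignments can be sketched by translating the same RNA with two code tables. The standard table is built from the usual 64-letter summary string. The variant table applies four of the changes quoted above (UGA, AUA, AGA, AGG); grouping them into a single table is an assumption, since the text notes that the peculiarities vary with the organism. The test sequence is hypothetical.

```python
BASES = "UCAG"
# One-letter amino acids for the 64 codons in UCAG order; '*' marks termination.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

STANDARD = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# Reassignments quoted in the text (assumed here to occur together).
MITO = dict(STANDARD, UGA="W", AUA="M", AGA="*", AGG="*")

def translate(rna, table):
    """Translate codon by codon without stopping, so that every change is visible."""
    return "".join(table[rna[i:i + 3]] for i in range(0, len(rna) - 2, 3))

rna = "AUAUGACUUAGA"  # hypothetical sequence: AUA UGA CUU AGA
print(translate(rna, STANDARD))  # I*LR
print(translate(rna, MITO))      # MWL*
```

Adding the CUN reassignment to threonine would only require four more entries in the override table.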

It must be noted that the mitochondrial genome of plants is also circular but is clearly larger than that of mammals (about 16 kbp) or yeast (about 78 kbp): its size varies, according to the species, from 200 to more than 2 000 kbp. It is not known at present whether these large mitochondrial genomes contain more genes.

Indeed, only about fifteen protein-coding genes have been identified and sequenced at present; these are, on the one hand, genes already identified in mitochondrial genomes of mammals or yeast, coding for sub-units I, II and III of cytochrome oxidase, sub-units of NADH-dehydrogenase, some sub-units of ATPase and cytochrome b and, on the other hand, some genes coding for proteins of the small ribosomal particle (S12, S13, S14).

As regards genes coding for rRNAs, one finds genes coding for the 2 large rRNAs and also the gene of mitochondrial 5 S rRNA, which is characteristic of the mitochondrial DNA of plants. Another characteristic of plant mitochondria is that only some mitochondrial tRNAs are coded by the mitochondrial genome; the others are coded by the nuclear genome and are imported into the mitochondrion.

Moreover, a heterogeneity is observed in the size of mitochondrial DNA molecules of the same plant; this is due to the presence, in the mitochondrial genome, of repeated sequences (either in the same direction or in opposite direction) which can generate sub-genomic DNA molecules by recombination.
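How recombination between repeats can yield smaller circles may be sketched with a toy map. The sketch covers only the simplest case, two copies of a repeat in the same direction on a master circle; the segment names and sizes are hypothetical.

```python
def split_on_direct_repeats(circle, repeat):
    """Recombine a circular map across two direct copies of `repeat`.

    `circle` is a list of (name, size_kbp) segments read around the circle.
    Returns two smaller circles, each keeping one copy of the repeat.
    """
    idx = [i for i, (name, _) in enumerate(circle) if name == repeat]
    if len(idx) != 2:
        raise ValueError("expected exactly two copies of the repeat")
    i, j = idx
    return circle[i:j], circle[j:] + circle[:i]

def size(circle):
    """Total size of a circle in kbp."""
    return sum(kbp for _, kbp in circle)

# Hypothetical master circle: repeat R (5 kbp) flanking unique regions A and B.
master = [("R", 5), ("A", 120), ("R", 5), ("B", 80)]
sub1, sub2 = split_on_direct_repeats(master, "R")

print(size(master), size(sub1), size(sub2))  # 210 125 85
```

The two sub-genomic circles together account for the whole master circle, which is one way molecules of different sizes can coexist in the same plant.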

Lastly, it was recently shown that in plant mitochondria, there exists a phenomenon of correction of genetic information at the level of mRNAs, called RNA editing. It consists in converting some Cs into Us and thus changing some codons. This results in the synthesis of a protein whose amino acid sequence is different from the sequence which could be deduced from that of the gene, but has more similarity to the amino acid sequence of the corresponding mitochondrial protein in other organisms (mammals, yeast).
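The effect of C to U editing on the decoded protein can be illustrated with a minimal sketch. The transcript and the edited positions are hypothetical, and the codon table is limited to the codons needed for the example (universal code assignments).

```python
# Partial codon table: only the codons used in this example.
CODONS = {"CGG": "Arg", "UGG": "Trp", "UCA": "Ser", "UUA": "Leu", "CCU": "Pro"}

def edit_c_to_u(rna, positions):
    """Convert the Cs found at the given 0-based positions into Us."""
    bases = list(rna)
    for p in positions:
        if bases[p] != "C":
            raise ValueError(f"position {p} holds {bases[p]}, not C")
        bases[p] = "U"
    return "".join(bases)

def translate(rna):
    """Decode the RNA codon by codon with the partial table above."""
    return [CODONS[rna[i:i + 3]] for i in range(0, len(rna), 3)]

unedited = "CGGUCACCU"                  # hypothetical transcript: CGG UCA CCU
edited = edit_c_to_u(unedited, [0, 4])  # hypothetical editing sites

print(translate(unedited))  # ['Arg', 'Ser', 'Pro']
print(translate(edited))    # ['Trp', 'Leu', 'Pro']
```

Two single-base conversions change two of the three amino acids, while the third codon, left unedited, is decoded as before.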

On the contrary, the size of chloroplast genomes, which are also circular, does not vary considerably from one species to another and is about 120 to 190 kbp. In most of the species studied, the chloroplast DNA comprises a region (of about 10 to 25 kbp) present as two copies in opposite orientations (inverted repeats) which contain the genes of the rRNAs, and are separated by a large single copy region and a small single copy region.

The complete sequence of the chloroplast genome of two plants, namely tobacco (155 kbp) and Marchantia (120 kbp), was recently determined.

The chloroplast DNA codes for the following products:

i. The chloroplast 23 S, 16 S, 5 S and 4.5 S rRNAs;

ii. Thirty chloroplast tRNAs;

iii. 19 chloroplast ribosomal proteins, i.e. 11 proteins of the 30 S particle and 8 of the 50 S particle (the others are imported from the cytoplasm);

iv. The translation initiation factor IF-1 (and probably, the elongation factor EF-Tu in some species only);

v. Three sub-units of the RNA polymerase;

vi. The large sub-unit of ribulose-1,5-bisphosphate carboxylase (the small sub-unit is coded in the nucleus and synthesized in the cytoplasm in the form of a slightly longer precursor which is imported into the chloroplast);

vii. Two proteins of photosystem I, five proteins of photosystem II, six polypeptides forming part of the ATP-synthase complex, and three proteins involved in electron transport (in the cytochrome b6/f complex).

In addition to the genes for these forty or so identified proteins, the chloroplast genome contains some forty open reading frames (ORF) which could also code for polypeptides, but most of the chloroplast proteins are coded by the nuclear genome.

Contrary to the mitochondria, it appears that chloroplasts use the universal genetic code.