In this article we will discuss the process of biosynthesis of RNAs.

The polymerization of ribonucleosides-5′-triphosphates to RNA is a process called transcription, because the genetic information carried by the DNA is transcribed into RNA. This transfer of information is based on the same principle as the duplication of DNA, i.e. the pairing of complementary bases (we will see that this pairing also ensures the fidelity of translation).

We will first study the synthesis of various types of cellular RNAs (ribosomal RNAs, transfer RNAs, messenger RNAs) which are transcription products of the corresponding genes; then we will consider the particular case of the synthesis of ribonucleic acids of RNA-containing viruses and phages.

I. Polynucleotide-Phosphorylase:

Isolated by Grunberg-Manago and Ochoa in 1955, polynucleotide phosphorylase was the first known enzyme capable of catalyzing the formation of polyribonucleotides.

It catalyzes the following reaction:

n1 ADP + n2 GDP + n3 CDP + n4 UDP → [(AMP)n1, (GMP)n2, (CMP)n3, (UMP)n4] + (n1 + n2 + n3 + n4) Pi

This reaction does not require the presence of DNA as template, and the base composition of the product formed in the reaction generally reflects the relative proportions of ribonucleoside 5′-diphosphates added to the reaction mixture.

This is not compatible with the requirements of a faithful transcription of the genetic information from DNA to RNA, and the idea that this enzyme could be responsible for the biosynthesis of ribonucleic acids had to be abandoned.

Although it is widely distributed among bacteria, its physiological role is still unknown. It perhaps enables the degradation of cellular ribonucleic acids since it also catalyzes the phosphorolysis reaction (whence its name) at the intracellular phosphate concentrations.

In any case, this enzyme is extremely useful because it permits the in vitro synthesis of numerous homo- and copolymers, depending on whether one or several ribonucleoside diphosphates are introduced in the reaction mixture: with only UDP, polyuridylic acid (poly U) is obtained; with only ADP, one obtains polyadenylic acid (poly A); ADP + UDP will give poly AU; with the 4 nucleoside diphosphates one will have poly AGUC, etc. The polynucleotides obtained in this manner were used in numerous physico-chemical studies and in the first studies on the deciphering of the genetic code.

II. DNA-Dependent RNA-Polymerase (Transcriptase):

As indicated by its name, this is the enzyme permitting transcription.

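It was discovered in 1960 by Hurwitz in E. coli and by Weiss in rat liver; it catalyzes the following reaction, which requires a DNA template:

n1 ATP + n2 GTP + n3 CTP + n4 UTP → [(AMP)n1, (GMP)n2, (CMP)n3, (UMP)n4] + (n1 + n2 + n3 + n4) PPi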

The presence of the 4 ribonucleoside 5′-triphosphates is necessary for the transcription of DNA to take place. The polymer formed is a polyribonucleotide identical to the natural ribonucleic acids. The DNA necessary for the reaction serves as the template; in other words, the base sequence of this DNA controls the base sequence of the RNA synthesized.

This is shown by comparing the base composition and the nearest-neighbour frequencies of the DNA introduced in the reaction with those of the RNA synthesized.

Besides, if a synthetic copolymer, poly dAT, is added to the reaction mixture instead of a natural DNA, the enzyme catalyzes the synthesis of the complementary copolymer, poly UA, even if the 4 ribonucleoside triphosphates are present in the mixture.

Lastly, the DNA introduced in the reaction and the newly synthesized RNA can form, after heating and slow cooling, hybrids which are the sign of a complementarity. The polarity of the synthesized RNA is the opposite of that of the DNA strand which served as template (see fig. 6-35).
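The template relationship described above (complementary bases, opposite polarity) can be sketched in a few lines of Python. This is only an illustration: the function name `transcribe` and the test sequences are our own assumptions, not part of the experiments cited.

```python
# Watson-Crick pairing between a DNA template base and the RNA base it specifies.
DNA_TO_RNA = {"A": "U", "T": "A", "G": "C", "C": "G"}


def transcribe(template_5_to_3: str) -> str:
    """Return the RNA (written 5'->3') copied from a DNA template written 5'->3'."""
    # The RNA is antiparallel to its template, so the template is read
    # from its 3' end: walk the string backwards and pair each base.
    return "".join(DNA_TO_RNA[base] for base in reversed(template_5_to_3.upper()))


# An alternating poly dAT template gives the alternating copolymer poly UA.
rna_from_poly_dat = transcribe("ATATAT")
# A hypothetical mixed-sequence template.
rna_from_mixed = transcribe("GATTC")
```

With a poly dAT template the product contains only A and U, whatever other triphosphates are supplied, because the template alone dictates which base is paired at each position.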

[Figure: Mechanisms of the Transcription]

As shown by fig. 6-35, the ribonucleic acids are synthesized in the direction 5′ → 3′. It is observed that the first nucleotide incorporated maintains its triphosphate group, contrary to the nucleotides incorporated subsequently; therefore, by introducing a nucleoside triphosphate labeled on its γ phosphate (32Pγ-ATP for example) in the reaction mixture, one can determine the number of polyribonucleotide chains synthesized in the reaction which begin with an adenylic nucleotide, by simply measuring the radioactivity incorporated in the polymer (as only the adenylic nucleotides incorporated at the beginning of the chain conserve their β and γ phosphate groups).

The use of ATP, GTP, CTP and UTP labeled on the γ-phosphate showed that RNA chains have a purine at their 5′ end, i.e., they begin with a pppA or pppG.

This peculiarity also confirmed that the ribonucleic acids are actually synthesized in the direction 5′ → 3′. If some 32Pγ-ATP is introduced at the very first moments of incubation and then “chased” by adding an excess of non-radioactive ATP, radioactivity (due to the triphosphate group) is found at the 5′ end of chains formed in the reaction. This proves that the 5′ ends were constituted at the beginning of the reaction, when the radioactive ATP was present (if the chains were formed in the direction 3′ → 5′, the 5′ ends would be constituted later, when non-radioactive ATP is present in the mixture, and they would not be radioactive).

The use of 3′-deoxy-ATP (labeled on adenine) confirmed that the RNAs are indeed synthesized in the direction 5′ → 3′. This analogue is found incorporated in the ribonucleic chains thanks to its 5′ triphosphate group (but any subsequent polymerization is then blocked), whereas if the elongation took place from 3′ to 5′, it could not be incorporated due to the absence of an OH group at position 3′.

1. Steps of the Transcription:

Several steps can be distinguished in the transcription:

A. Initiation:

This step implies the recognition of a region of the helicoidal double-stranded DNA molecule (region called “promoter”), the binding of the RNA polymerase (holoenzyme) to the DNA, a localized separation of the 2 strands producing a short single-stranded region which will serve as template for the pairing of ribonucleotides, and the selection of the first nucleotide(s) of the RNA chain.

B. Elongation (Lengthening):

Once the first ribonucleotides are polymerized (forming with the strand of template DNA, a short DNA-RNA hybrid), the core enzyme will move along the DNA, separating the two strands. This ensures the continuation of the incorporation of ribonucleotides, while behind, the newly synthesized RNA is displaced from the template DNA strand thus allowing the double helix of the DNA to form again.

C. Termination:

When the last ribonucleotide is incorporated, polymerization stops, and the transcription complex is dissociated; this liberates the enzyme and the newly formed RNA (after dissociation of the DNA-RNA hybrid) so that the DNA can resume its double stranded structure. This step implies the recognition of a DNA sequence called “terminator”.

2. The Bacterial RNA Polymerase:

A. Structure:

In the bacterial cell, a single RNA polymerase permits the synthesis of ribosomal RNAs, transfer RNAs and messenger RNAs. In E. coli the complete enzyme (or holoenzyme) has a molecular weight of about 450 000 and consists of four types of polypeptide chains, namely two α chains (mol. weight: 40 000), one β chain (mol. weight: 150 000), one β′ chain (mol. weight: 155 000) and one σ chain (mol. weight: 70 000).

The core enzyme, of α2ββ′ structure, can perform the elongation of RNA chains, but the factor (or sub-unit) σ is indispensable for the initiation of the transcription at a specific site on a double-stranded DNA. Eucaryotic cells, on the contrary, possess several RNA polymerases which differ by their localization, structure, properties and the type of RNA whose synthesis they catalyze.

B. Mode of Action of the Enzyme:

Whereas the core enzyme can bind to any DNA sequence (called weak binding site), forming a closed complex (enzyme-DNA) in which the DNA remains double-stranded, the σ factor notably modifies the affinity of RNA polymerase for the DNA. Consequently, the holoenzyme has a reduced capacity to recognize the weak binding sites (the complexes formed are less stable) but, on the contrary, recognizes specific binding sites, the promoters, and binds to them strongly (in fact more or less strongly, which determines the efficiency of different promoters in the initiation of transcription), forming an open complex, so called because of the limited opening of the double helix. Out of the 4 × 10⁶ base pairs of the DNA of E. coli, there are about 2 000 promoters.

This binary complex soon becomes a ternary complex (DNA-enzyme-nascent RNA chain) after formation of the first phosphodiester bond, then the σ factor detaches itself from the complex, and the core enzyme continues the polymerization (elongation).

The transitory association of the σ factor with the enzyme decreases the stability of weak complexes (formed between the core enzyme and any region of the DNA) and, on the contrary, increases the stability of strong complexes formed by the complete enzyme at the promoters.

But this high stability would block the reaction if the dissociation of σ did not modify the enzyme, which thus recovers (by again becoming core enzyme) its general affinity for the DNA (without specificity for particular sequences), enabling it to accomplish elongation.

The core enzyme moves along the template and the open region of the double helix accompanies this movement. It is considered that the enzyme covers about 60 DNA base pairs and that the region where the helix is open comprises 17 base pairs.

It is believed that the hybrid formed (between the template DNA strand and the synthesized RNA strand) covers about 12 base pairs and has only a very transitory existence because, following the passage of the enzyme, the RNA strand is displaced and the DNA double helix (duplex) is reconstituted. Transcription is a very rapid process: in E. coli, at 37 °C, the mRNAs are synthesized at the rate of 40 to 50 nucleotides per second.
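To give an order of magnitude, the rate quoted above can be turned into a synthesis time. The sketch below is a simple division; the 1 200-nucleotide transcript length is a hypothetical value chosen for the example, and pauses or initiation delays are ignored.

```python
def transcription_time_s(length_nt: int, rate_nt_per_s: float) -> float:
    """Time (in seconds) needed to polymerize length_nt nucleotides at a constant rate."""
    return length_nt / rate_nt_per_s


# Hypothetical 1 200-nucleotide mRNA at the two ends of the 40-50 nt/s range.
slow = transcription_time_s(1200, 40)
fast = transcription_time_s(1200, 50)
```

Under these assumptions such a transcript would be completed in roughly half a minute.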

It must be noted that RNA polymerase has no exonuclease activity and therefore cannot have a proofreading activity, contrary to DNA polymerase. Therefore, the fidelity of transcription is lower than that of replication (it is estimated that one error occurs for 10⁴ or 10⁵ bases, which is 10⁴ times more than in the case of replication).

This relatively high rate of error is however acceptable, on the one hand because the errors are not transmitted to the progeny, and on the other hand because each gene is transcribed into numerous RNA molecules and the presence of a few defective molecules is not significant.
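The argument can be made quantitative with a short calculation. The sketch below assumes that errors are independent and equally likely at every position (a simplification), and uses a hypothetical transcript of 1 000 nucleotides with the error rate of one per 10⁴ bases.

```python
def expected_errors(length_nt: int, error_rate: float) -> float:
    """Mean number of misincorporated nucleotides in one transcript."""
    return length_nt * error_rate


def fraction_error_free(length_nt: int, error_rate: float) -> float:
    """Probability that a transcript contains no error at all,
    assuming independent errors at each position."""
    return (1.0 - error_rate) ** length_nt


# Hypothetical 1 000-nucleotide transcript, one error per 10 000 bases.
mean_errors = expected_errors(1000, 1e-4)
clean_fraction = fraction_error_free(1000, 1e-4)
```

With these figures about nine transcripts out of ten are free of any error, which illustrates why a few defective copies among many are tolerable.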

C. Role of Various Sub-Units:

Little information is presently available on the architecture of the RNA polymerase of E.coli and the roles of various sub-units.

It is believed that the β chain is involved in the binding of nucleoside triphosphates, because it can be labeled by some analogues of these substrates and because, in reconstitution experiments, the origin of the β chain determines the sensitivity to some antibiotics such as rifamycins (especially rifampicin), which inhibit initiation by acting before the formation of the first phosphodiester bond, or streptolydigins, which inhibit elongation.

It is believed that the β′ chain, which is the most basic, acts by binding the template, because in vitro addition of heparin, a polyanion which binds to the β′ sub-unit, blocks transcription (heparin competes with the DNA for the binding of the enzyme).

Addition of the σ factor changes the conformation of the core enzyme and its capacity to recognize the DNA (see above), but it is possible that sub-unit α also acts in the recognition of promoters, because when E. coli is infected by the phage T4, an arginine of the α chain undergoes an ADP-ribosylation and this modification is accompanied by a decrease of the affinity of the enzyme for the promoters previously recognized by the holoenzyme.

One might wonder why the bacterium needs a multimeric RNA polymerase of this size (mol. weight = 450 000), while some phages code for monomeric RNA polymerases of much smaller size (mol. weight = 110 000 in the case of RNA polymerases of phages T3 and T7), catalyzing the synthesis of RNA at a higher rate (200 nucleotides incorporated per second, at 37 °C) than the enzyme of E. coli (40 nucleotides per second).

But whereas those phage polymerases recognize only some promoters on the DNA of the phage, the bacterial enzyme must catalyze the production of more than a thousand transcripts and its activity is modulated by numerous factors, of bacterial and of phage origin, specific to one or several operons, acting either positively or negatively; the complexity of the bacterial RNA polymerase may therefore be due to the necessary interactions with these diverse promoters and with different regulation factors rather than to the requirements of the catalytic activity proper.

3. Asymmetry of the Transcription:

In vivo, transcription is an asymmetric process, i.e. for a given gene or group of genes, only one of the 2 strands of the DNA is transcribed as per the rule: 1 gene → 1 polypeptide chain.

In the presence of σ factor, it has also been possible to obtain an asymmetric transcription in vitro, which suggests that the signals permitting the RNA polymerase to recognize on the DNA the place where it must bind to begin the transcription have been preserved; this opinion is further supported by the fact that the RNAs formed in vitro can compete with those synthesized in vivo during hybridization experiments, as would be expected if they were actually the transcription products of the same genes.

Thanks to advances made in the methods used for the determination of the nucleotide sequences of the DNA, it was possible to define, for a whole series of bacteriophage and bacterial genes, the structure of regions where the RNA polymerase binds specifically (promoter sites) before starting the RNA synthesis.

The comparison of these sequences revealed structural patterns (motifs) common to the various promoter sites studied. In the same way, it was possible to find a typical nucleotide sequence playing the role of termination signal of the transcription.

Let us add that in some cases, all the ribonucleic acids synthesized in vivo appear to result from the transcription of only one DNA strand (for example in the case of phage T7), but in other cases they are the transcription products of one of the strands for some genes and of the other strand for other genes (for example in the case of phage λ).

4. Sites Involved in the Initiation:

We have seen that the RNA polymerase recognizes a region of DNA called “promoter” and binds to it, thus allowing transcription to be initiated and to proceed until the enzyme reaches the “terminator” sequence, thus producing an RNA molecule called “primary transcription product” or “primary transcript”, which is mono- or polycistronic, and which is generally unstable, because it undergoes maturation processes.

A. Starting Point of the Transcription:

By convention, the starting point of the transcription is numbered +1; the bases situated upstream of this point are numbered -1, -2, etc., and the bases downstream are numbered +2, +3, etc. There is no position 0.
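Because this numbering skips zero, converting an ordinary array index into a promoter coordinate needs a small adjustment. The helper below is a minimal sketch; the function name and the index values are hypothetical.

```python
def promoter_coordinate(index: int, start_index: int) -> int:
    """Convert a 0-based position in a sequence into the +1/-1 convention,
    where start_index is the position of the first transcribed base."""
    offset = index - start_index
    # Downstream positions (including the start itself) are shifted by one
    # so that the start point reads +1; upstream positions stay negative.
    return offset + 1 if offset >= 0 else offset


# Hypothetical fragment whose transcription start sits at index 10.
at_start = promoter_coordinate(10, 10)
just_upstream = promoter_coordinate(9, 10)
just_downstream = promoter_coordinate(11, 10)
```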

The starting point, i.e. the DNA base pair corresponding to the first nucleotide of the RNA, is identified as follows: the template DNA is denatured and hybridized with the RNA obtained by transcription in vivo or in vitro; the non-paired DNA (i.e. the non-transcribed strand as well as the regions of the template strand extending beyond the transcribed sequence on both sides) is degraded by nuclease S1; and the RNA involved in the hybrid is digested by alkaline hydrolysis. It is then possible to determine the sequence of the DNA which was involved in the hybrid and define the starting point of the transcription.

This mapping technique with S1 nuclease was simplified by the use of single-stranded DNA fragments labeled at one end and corresponding to the presumptive transcription start region (see fig. 8-8). This starting point is part of the promoter, and is generally situated in the region of the DNA where the enzyme binds.

[Figure: Control by Attenuation]

B. Binding Sites of E.coli RNA Polymerase:

The site where the RNA polymerase binds to form the initiation complex is part of the promoter; its sequence can be determined after binding the enzyme in vitro to a DNA fragment containing a promoter, digesting by a DNase the DNA not protected by the enzyme, and extracting and sequencing the protected DNA.

This type of analysis provides fragments of 40 to 44 base pairs having at their centre the starting point of the transcription (the protected region therefore extends from -20 to about +20), because the enzyme attached to the protected fragment can in fact initiate the transcription and synthesize a small RNA chain of 17 to 20 nucleotides, terminated when the RNA polymerase is released at the end of the DNA fragment.

The binding site of the RNA polymerase can also be studied by the “foot-printing” method (see diagram below), where the DNA fragment protected by the enzyme (and labeled on one strand at one of the ends) is subjected to a limited endonuclease digestion which will split all the accessible phosphodiester bonds (not those protected by the RNA polymerase), while causing statistically one cleavage per molecule.

As in the method of the sequence gels, the fragments produced are separated according to their size by gel electrophoresis and for each ruptured bond, a band is obtained on the gel corresponding to the fragment extending from the labeled end to the cleavage site.

The comparison of the bands obtained on the gel for the non-protected DNA fragment (where all the phosphodiester bonds can be ruptured) and for the protected fragment shows that some bands are missing on the gel corresponding to the protected fragment; this is the region where the polymerase bound to the DNA has prevented the endonuclease action.
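The reading of such a gel amounts to a set difference between two lanes. The following sketch represents each band by the length of the fragment running from the labeled end to the cleavage site; the function name and the lane contents are hypothetical, chosen only to show the principle.

```python
def protected_region(free_bands, bound_bands):
    """Return (first, last) fragment length present in the free-DNA lane
    but missing from the enzyme-bound lane, or None if nothing is missing."""
    missing = sorted(set(free_bands) - set(bound_bands))
    if not missing:
        return None
    return missing[0], missing[-1]


# Hypothetical fragment with 100 cleavable bonds: in the bound lane the
# enzyme shields bonds 30 to 70 (counted from the labeled end).
free_lane = list(range(1, 101))
bound_lane = [n for n in free_lane if not 30 <= n <= 70]
region = protected_region(free_lane, bound_lane)
```

The two limits returned delimit the footprint; locating them on the known sequence of the fragment gives the position of the binding site.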

By combining foot-printing and sequencing it is possible to determine the position of the binding site and its sequence. Labeling the two DNA strands (in 2 separate experiments) one can even observe that the two strands are not protected with equal efficiency at the ends of the binding site, which suggests an asymmetric conformation of the polymerase bound to the DNA, in conformity with the asymmetry of the transcription itself.

By this technique, the binding site appears slightly larger (upstream of the starting point) than by the method based on the recovery of the protected fragment (see above): it extends from about -50 to +20. Comparable results were obtained by studying the protection of the DNA (by the RNA polymerase) against an exonuclease attack, demarcating a promoter region extending from -44 to +20.

[Figure: Determination of the Position of the Binding Site of RNA Polymerase]

The study of more than a hundred procaryotic promoters revealed the existence, upstream of the transcription initiation site, of common patterns (consensus), each comprising 6 base pairs, called respectively sequence -10 (or Pribnow box, or TATA box) and sequence -35.

Strong (effective) promoters, permitting frequent initiations of the transcription (every 2-5 seconds, for example in E. coli), have sequences very close to the 2 consensus sequences, while weak promoters (permitting an initiation of the transcription of the gene only every 5 or 10 minutes in E. coli) present substitutions in their sequences -10 and -35.

Site-directed mutagenesis experiments, devised to modify in vitro one or several bases of these consensus sequences and introduce into a bacterial cell the gene flanked with a modified promoter, showed that the substitution of only one base, either in the sequence -10 or in the sequence -35, could bring about a considerable decrease or even loss of the promoter activity.
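The idea that promoter strength follows closeness to the consensus can be illustrated by counting substitutions. The hexamers TTGACA (-35) and TATAAT (-10) used below are the commonly cited E. coli consensus sequences; they are not given in the text above and are stated here as an assumption, as are the function names and the example promoter. A mismatch count is only a crude proxy for real promoter activity.

```python
# Commonly cited E. coli consensus hexamers (assumption, not from the text).
CONSENSUS_35 = "TTGACA"
CONSENSUS_10 = "TATAAT"


def mismatches(hexamer: str, consensus: str) -> int:
    """Number of positions at which a promoter hexamer departs from the consensus."""
    if len(hexamer) != len(consensus):
        raise ValueError("hexamer and consensus must have the same length")
    return sum(a != b for a, b in zip(hexamer.upper(), consensus))


def promoter_mismatch_score(box_35: str, box_10: str) -> int:
    """Total substitutions in the -35 and -10 boxes (0 = perfect consensus)."""
    return mismatches(box_35, CONSENSUS_35) + mismatches(box_10, CONSENSUS_10)


perfect = promoter_mismatch_score("TTGACA", "TATAAT")
# Illustrative promoter carrying one substitution at -35 and two at -10.
weaker = promoter_mismatch_score("TTTACA", "TATGTT")
```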

5. Termination of the Transcription:

A. Termination Independent of Rho Factor:

The termination signal is formed on the DNA by a palindromic sequence rich in G-C followed by a region rich in A-T (see fig. 8.7), which, after transcription, permits at the level of the RNA the formation of a hair-pin structure (thanks to the G-C pairs, which are relatively stable) followed by a series of at least four or five uridylic residues.

[Figure: Signals Controlling the Transcription of the Lactose Operon]

The RNA polymerase slows down or pauses when it encounters a hair-pin structure, and the DNA-RNA hybrid formed beyond this structure is less stable as it rests on dA-rU pairs; this causes a dissociation of the DNA-RNA hybrid, a re-association of the two DNA strands, and the release of the core enzyme and of the newly formed RNA chain.
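The two features of this signal (a G-C rich stem able to fold back on itself, then a run of U) lend themselves to a simple search in an RNA sequence. The sketch below is deliberately naive: the fixed stem and loop lengths, the 50% G-C threshold, the function names and the test sequences are all assumptions made for illustration, and real terminators vary in these respects.

```python
PAIR = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna: str) -> str:
    """Sequence able to pair with rna in antiparallel orientation."""
    return "".join(PAIR[base] for base in reversed(rna))


def find_terminator(rna: str, stem_len: int = 6, loop_len: int = 4, min_u: int = 4):
    """Return the index where a perfect G-C rich hair-pin followed by a run
    of U begins, or None if no such structure is found."""
    rna = rna.upper()
    span = 2 * stem_len + loop_len
    for i in range(len(rna) - span - min_u + 1):
        left = rna[i:i + stem_len]
        right = rna[i + stem_len + loop_len:i + span]
        tail = rna[i + span:i + span + min_u]
        gc_fraction = sum(base in "GC" for base in left) / stem_len
        if right == reverse_complement(left) and gc_fraction >= 0.5 and tail == "U" * min_u:
            return i
    return None


# Hypothetical transcript end: stem GCCGCC, loop UUCG, stem GGCGGC, then UUUUU.
with_signal = find_terminator("AAUCGCCGCCUUCGGGCGGCUUUUUA")
without_signal = find_terminator("AUGGCUAACGUAGCUAGCUAGGCUAACG")
```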

B. Termination Dependent on Rho Factor:

At certain sites, termination requires the presence of ρ (rho), a hexameric protein which binds to the single-stranded RNA and detaches it from the DNA-RNA hybrid.

6. Inhibition of Transcription:

Actinomycin D, an antibiotic produced by a strain of Streptomyces, has a phenoxazone ring carrying 2 cyclic peptides; the ring is inserted between 2 successive GC base pairs of the DNA.

This fixation on a double-stranded DNA prevents the DNA from functioning as template in the transcription, but at low concentration it prevents neither the replication of the DNA nor the protein synthesis, which makes actinomycin D a specific inhibitor of the synthesis of new RNA molecules in procaryotes and eucaryotes. This antibiotic has been used as an anti-cancer agent, because it inhibits the growth of rapidly dividing cells.

Rifamycin, isolated from another strain of Streptomyces, and its derivative, rifampicin, inhibit the initiation of new RNA chains, but not the elongation of chains already being synthesized. It is the formation of the first phosphodiester bond which is blocked, and the sub-unit β of RNA polymerase is the target of the antibiotic.

7. Transcription in Eucaryotes:

A. Transcription and Translation Take Place in Different Compartments:

Whereas in procaryotes transcription and translation are coupled, i.e. translation of a mRNA can begin before its synthesis is completed, in eucaryotes transcription takes place in the nucleus while translation takes place in the cytoplasm, so that the RNAs (rRNAs, tRNAs, mRNAs) will have to migrate from the nucleus to the cytoplasm.

The heterogeneous nuclear RNAs (hnRNAs) are partly hydrolyzed by nucleases in the nucleus and partly converted into mRNAs. We shall see that this conversion (maturation) of the primary products of transcription (primary transcripts) into mRNAs implies some modifications at the 5′ end (formation of a cap) and at the 3′ end (addition of a tail of poly A) and the elimination of introns, with the result that sometimes the mature mRNA has only 10% of the size of the corresponding precursor.

It must be noted that these processes of transport and maturation of mRNAs constitute steps during which may come into play mechanisms of regulation of the expression of genes, which obviously do not exist in procaryotes.

B. Existence of 3 RNA Polymerases:

In eucaryotes, the mechanisms controlling transcription are varied and the presence of several RNA polymerases (contrary to what takes place in procaryotes where the synthesis of all RNAs is carried out by a single polymerase) is an important aspect.

Three DNA-dependent RNA polymerases are found in the nuclei of eukaryotic cells:

1. RNA polymerase I (or A), responsible for the synthesis of about 60% of the cellular RNAs, localized in the nucleolus, catalyzes the synthesis of ribosomal RNAs (rRNAs) and is insensitive to α-amanitin (one of the toxins of Amanita phalloides);

2. RNA polymerase II (or B), responsible for the synthesis of about 30% of the cellular RNAs, localized in the nucleoplasm, catalyzes the synthesis of heterogeneous nuclear RNAs (hnRNAs) which are the precursors of messenger RNAs (mRNAs) and is inhibited by small concentrations of α-amanitin (of the order of 0.1 μg/ml);

3. RNA polymerase III (or C), responsible for the synthesis of about 10% of cellular RNAs, also localized in the nucleoplasm, catalyzes the synthesis of transfer RNAs (tRNAs) and 5S RNA; the enzyme of animal cells is sensitive to α-amanitin at concentrations of the order of 500 μg/ml, while the enzyme of yeast or insect is insensitive.

The quaternary structure of these enzymes (molecular weight in the order of 600 000) is based on the same model as that of the bacterial polymerase (two large-sized sub-units of molecular weight 220 000 to 140 000 depending on the enzyme and several sub-units of smaller size).

The total number of sub-units which constitute these molecules is however much greater in eucaryotes (about fifteen), and only the largest ones have preserved resemblance with those of archaebacteria as revealed by their immunological properties. Some of the small sub-units (two or three depending on the species), probably involved in the basic catalytic function (formation of phosphodiester bonds), are common to the three classes (A, B and C) of polymerases.

The more intricate structure of RNA polymerases of eucaryotes most probably reflects the greater complexity of the tasks they perform. The regulation of transcription also involves other protein factors which interact both with the DNA and the polymerases.

As in procaryotes, synthesis of RNAs takes place in eucaryotes in the direction 5′ → 3′, by a nucleophilic attack of the 3′ OH of the growing chain on the Pα of the following ribonucleoside 5′-triphosphate, and no primer is necessary. The eucaryotic RNA polymerases also have no exonuclease activity, which excludes the possibility of correcting the errors occurring during the transcription.

C. Promoters of Eucaryotes:

Upstream of the eucaryotic genes coding for proteins, sequences (or elements in “cis”) important for the transcription were observed, especially a TATA box which is however situated farther away from the site of initiation of the synthesis of mRNAs than in procaryotes, generally at -25 (in yeast, the TATAAA sequence is situated further upstream).

This TATA box is necessary for the promoter activity, as shown by site-directed mutagenesis experiments, but other sequences situated even further upstream (between -40 and -110), called CAAT box and GC box, also come into play.

D. Protein Factors Stimulating Transcription:

A number of proteins are already known which have the property of stimulating transcription by binding specifically to promoter sequences situated upstream of the genes. These proteins are called transcription factors (or elements acting in “trans”) and their binding sites are determined by “foot-printing” experiments.

The characterization of these proteins often uses the gel retardation technique which enables one to watch the migration of a DNA fragment (comprising a promoter sequence) slowed down by the binding of a transcription factor (in comparison to the migration of the DNA fragment alone).

For example, the following transcription factors were characterized:

1. Sp 1, a protein of about 100 kdaltons; in mammals it is required for the transcription of genes whose promoters comprise a GC box (especially in the case of the DNA of SV40 virus);

2. CTF or CCAAT-binding Transcription Factor, a protein of about 60 kdaltons which, in mammals, binds to the CAAT box;

3. Protein B which binds to the TATA box in Drosophila;

4. HSTF, or Heat-Shock Transcription Factor which, in Drosophila, binds to the promoters of heat-shock genes upstream of their TATA box. It is distinct from factor σ32 which combines with the RNA polymerase of E. coli to permit the recognition of bacterial heat-shock genes.

E. Enhancers:

These elements exert their stimulating activity even if they are situated several thousand base pairs away from the gene in question and they may be localized either upstream or downstream or even in the middle of the gene, on the coding DNA strand or on the non-coding strand.

In general a given enhancer is active only in some cells, because it requires a protein which is expressed only in these cells. For example, the enhancer permitting the hormonal action of glucocorticoids must bind the soluble hormone-receptor complex to be able to stimulate the transcription of a battery of genes.

How this long distance effect of enhancers is exerted is not yet known exactly, but it is believed that the binding of regulator proteins could bring nearer regions of the DNA initially distant and permit the formation of an active complex of transcription initiation comprising RNA polymerase II.

8. Maturation of Transcription Products:

A. Maturation of rRNAs and tRNAs:

In procaryotes as in eucaryotes, rRNAs and tRNAs are transcribed as precursors (longer than the functional RNAs) which will have to undergo a series of endo- and exo-nucleolytic cleavages to assume their final size.

For example, in E. coli a primary transcript will supply the 3 rRNAs (16 S, 23 S and 5 S) and 1 or 2 tRNAs. Depending on the operon considered (there are 7 rRNA operons in E. coli), there are 1 or 2 tRNA genes situated in the spacer between the genes of RNA 16 S and RNA 23 S. Other precursors contain several tRNA molecules.

[Figure: tRNA]

The maturation of tRNAs also implies the addition, under the effect of the tRNA nucleotidyl transferase, of the 3′ terminal CCA triplet, when the latter is not present (often it is not coded by the tRNA gene).
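As a minimal sketch of this step, the function below appends the CCA triplet only when the 3′ end does not already carry it. The function name and the two example sequences are hypothetical, and the sketch ignores the case of a partially present triplet.

```python
def add_cca(trna: str) -> str:
    """Return the tRNA sequence (5'->3') with a 3' terminal CCA,
    adding the triplet only if it is not already present."""
    trna = trna.upper()
    return trna if trna.endswith("CCA") else trna + "CCA"


# Hypothetical 3' regions: one lacking the triplet, one already carrying it.
completed = add_cca("GCGGAUUUAGCUC")
unchanged = add_cca("GCGGACCA")
```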

Furthermore, the precursors of rRNAs and especially of tRNAs undergo enzymatic modifications of which the simplest is methylation, either of the OH at position 2′ of ribose, or of the purine or pyrimidine base, thus producing the rare bases a few examples of which we mentioned in the foregoing.

These methylations are catalyzed by tRNA methyltransferases (generally using S-adenosyl-methionine as donor of the CH3 group) which are extremely specific.

This specificity concerns the nature of the base (an enzyme which methylates guanine does not methylate the 3 other bases), the position which is going to be methylated (different enzymes are responsible for the formation of 1-methylguanine, N2-methylguanine, N2-dimethylguanine and 7-methylguanine), and the localization of the base (of the 15 or 20 guanines of a tRNA molecule, only one will be methylated).

Lastly, some precursors of eucaryotic rRNAs and tRNAs (in the cytoplasm as well as in the organelles) contain introns which must be spliced out. It appears that this splicing is controlled more by the secondary structure of the mature RNA than by particular signals present at the ends of the intron (such signals being used, by contrast, in the splicing of introns in precursors of eucaryotic mRNAs).

B. Maturation of Eucaryotic mRNAs:

a) Cap Formation at the 5′ End of mRNAs:

In eucaryotes as in procaryotes, precursors of mRNAs begin with an adenylic or guanylic nucleoside triphosphate, but in eucaryotes this 5′ end of the chain under synthesis is very rapidly blocked by the binding of a guanylic residue in reverse position. This nucleotide, called cap, is added by action of GTP on the purine nucleoside triphosphate, forming an unusual 5′-5′ triphosphate bond.

This cap stabilises the mRNAs by preventing their degradation at the 5′ end by phosphatases or exonucleases, and stimulates the translation of mRNAs by the systems of eucaryotic protein synthesis.

This reaction, catalyzed by a guanylyl transferase (a nuclear enzyme), creates a 5′-5′ triphosphate bond between the first nucleotide and the guanylic residue added:

The 5′ end of the capped mRNA can undergo several methylations: in all eucaryotes, a guanine-7-methyltransferase binds a methyl group at position 7 on the guanylic residue added.

Then, except in unicellular eucaryotes, the ribose of the first nucleotide (the one which was first before the addition of the cap, i.e. the adenylic nucleotide in our diagram) is methylated at position 2′. Some mRNAs are also methylated at the position 6 of the adenine, or on the ribose of the next nucleotide (the first N).

These possibilities of methylation are summarized in the diagram below:

b) Polyadenylation of the 3’end of mRNAs:

The termination signals of transcription in eucaryotes are not fully understood, but some resemble those of procaryotes because they also have a hair-pin structure followed by a series of uridylic residues (fig. 8-7).

The precursors of mRNAs are then cleaved at their 3′ end by an endonuclease which recognizes an AAUAAA sequence situated about twenty nucleotides upstream of the cleavage point (and apparently a UGUGUU sequence situated about twenty nucleotides downstream).
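The recognition rule described above can be illustrated with a minimal sketch. It only scans for the AAUAAA signal and reports an approximate cleavage position; the function name, the example sequence and the fixed offset of 20 nucleotides (counted here from the end of the signal) are assumptions made for illustration, not properties of the real endonuclease.

```python
def find_polya_signals(rna, signal="AAUAAA", offset=20):
    """Return (signal_start, approx_cleavage_pos) for each occurrence of the signal.

    Positions are 0-based. 'offset' is an assumed round value standing in for
    "about twenty nucleotides"; it is counted from the end of the signal.
    """
    rna = rna.upper()
    hits = []
    start = rna.find(signal)
    while start != -1:
        # Approximate cleavage point, capped at the end of the molecule.
        cleavage = min(start + len(signal) + offset, len(rna))
        hits.append((start, cleavage))
        start = rna.find(signal, start + 1)
    return hits


# Hypothetical 3' region of a pre-mRNA (not a real sequence).
example = "GGCUAC" + "AAUAAA" + "C" * 20 + "GAUC"
print(find_polya_signals(example))
```

A real prediction would also have to take into account the downstream element and the variability of the distance, which this sketch ignores.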

This cleavage is followed by the addition of several tens or even several hundred (up to about 300) adenylic residues at the 3′ end of most eucaryotic mRNAs. Although cleavage and polyadenylation are linked, the two reactions are catalyzed by different enzymes (an endonuclease and poly A polymerase) within a complex which also involves a ribonucleoprotein particle containing a small nuclear RNA rich in uridine (RNA U1).

The mRNAs coding for histones are a notable exception to this type of maturation. For these RNAs, the 3′ cleavage seems to involve the pairing of the 5′ end of another small nuclear RNA (RNA U7) with a complementary sequence near the cleavage site.

The cleavage is not followed by polyadenylation, but the existence of a palindromic sequence just upstream of the 3′ end of the RNA brings about the formation of a hair-pin type secondary structure at that place. It is probable that the cap, and the poly A tail or the secondary structure, protect the 5′ and 3′ ends of mRNAs against exonucleases.

This poly A tail is of great practical interest. On the one hand, it enables the separation of mRNAs from other cellular RNAs (and especially from rRNAs) by using a column of oligo U-Sepharose (or oligo dT-Sepharose) to which the mRNAs are bound by pairing of their tail, then eluted by lowering the salt concentration.

On the other hand, when an oligo dT fragment is hybridized to the poly A tail, this fragment can serve as primer for reverse transcriptase to synthesize a strand of complementary DNA (cDNA). This cDNA can in turn be used as template for the synthesis of a second DNA strand complementary to the first, thus providing a double-stranded DNA which corresponds to the mRNA and can be cloned.
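The two practical uses of the poly A tail can be sketched at the level of sequences only. The example below first keeps the RNAs ending with a run of adenylic residues (a stand-in for retention on an oligo dT column), then derives the first and second DNA strands by base complementarity. The function names, the minimum tail length of 10 and the sequences are hypothetical.

```python
RNA_TO_DNA = {"A": "T", "U": "A", "G": "C", "C": "G"}
DNA_TO_DNA = {"A": "T", "T": "A", "G": "C", "C": "G"}


def has_polya_tail(rna, min_len=10):
    """True if the RNA ends with at least min_len adenylic residues (assumed threshold)."""
    return rna.upper().endswith("A" * min_len)


def first_strand_cdna(mrna):
    """DNA strand complementary to the mRNA, written 5'->3'."""
    return "".join(RNA_TO_DNA[base] for base in reversed(mrna.upper()))


def second_strand(cdna):
    """DNA strand complementary to the first cDNA strand, written 5'->3'."""
    return "".join(DNA_TO_DNA[base] for base in reversed(cdna))


# Hypothetical RNA population: one polyadenylated mRNA, one rRNA fragment.
rnas = {"mRNA_1": "AUGGCCUAA" + "A" * 12, "rRNA_fragment": "GGCUUAGCGC"}
selected = {name: seq for name, seq in rnas.items() if has_polya_tail(seq)}

cdna = first_strand_cdna(selected["mRNA_1"])
print(cdna)                 # begins with the T run corresponding to the oligo dT primer
print(second_strand(cdna))  # same sequence as the mRNA, with T in place of U
```

In this simplified picture the second strand has the sequence of the mRNA itself (with T replacing U), which is why the double-stranded product can stand in for the gene's coding information once cloned.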

c) Elimination of Introns:

Numerous works of fine genetic mapping and of nucleotide sequencing showed that in procaryotes the sequence of amino acids in the protein chain reflects exactly that of codons in the messenger RNA and that the latter is a copy strictly complementary to the DNA (gene-protein colinearity).

While the quasi-totality of genome sequences are informational in procaryotes, less than 1/10 of the DNAs seem to be expressed in higher eucaryotes. A large part of the DNA consists of repeated sequences and is not transcribed. The study of the structure of eucaryotic genes also revealed that the “gene-protein colinearity” dogma which applies to procaryotes is not systematically transposable to eucaryotic systems.

But for a few exceptions, most of the known genes coding for proteins in higher eucaryotes contain non-coding sequences (called “introns”) which interrupt the coding sequences (called “exons”). These genes are called “split” or “mosaic” genes. The length of introns present in precursors of eucaryotic mRNAs can vary from 50 to 10 000 nucleotides.

The precise demarcation of exons and introns could be carried out for numerous genes coding for proteins. The comparison of nucleotide sequences at the exon-intron junctions revealed consensus nucleotide sequences present there.

Lastly a motif less strictly conserved but equally important is found inside the intron, 20-55 nucleotides away from its 3′ end; it is called “branch site” because it is involved in the formation of a loop (see below).

Site-directed mutagenesis experiments showed that these particular nucleotide sequences play the role of “signal” sequences indispensable for an adequate maturation process.

On the contrary, the rest of the intron can be modified without affecting the effectiveness of the process of elimination of the intron; one can thus remove a part of the intron, or add sequences of varying lengths, provided one does not modify the “signal” sequences at the cleavage points at the 5′ and 3′ ends of the intron, nor the branch site and provided new “signal” sequences are not created inside the intron.

The primary transcription products of split genes contain both the intron and exon sequences and constitute the pre-messenger RNAs (pre-mRNAs) among the heterogeneous nuclear RNAs (hnRNAs). Before moving into the cytoplasm, and after addition of the cap and the poly A tail, these precursors undergo the third phase of maturation, in which the intron sequences are eliminated.

This operation called splicing takes place within ribonucleoprotein complexes (hnRNP) in which the hnRNAs are included right from their synthesis. The first step of this maturation (fig. 6-36A) is the cleavage of the RNA at the 5′ end of the intron and the appearance of a loop (or lariat) structure by formation of a 2′, 5′ phosphodiester bond between the 5′-terminal G of the intron and an internal adenylic residue of the intron.

A trans-esterification reaction then takes place, in which the 3′ OH group of the free exon (donor site) attacks the phosphodiester bond of the second intron-exon junction (acceptor site), displaces the 3′ end of the intron and establishes the junction between the two exons, liberating the intron which carries the loop.

The precision and efficiency of splicing are conditioned by the consensus sequences situated at the ends of the intron and by the loop connection sequence (or branch site) located inside the intron.
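The net result of splicing (removal of the intron and joining of the two exons) can be sketched as a simple string operation. The check used below, that an intron begins with GU and ends with AG, is the widely known boundary rule for most nuclear pre-mRNA introns; it is not spelled out in the text above, and the function name, coordinates and sequences are hypothetical. The lariat intermediate and the branch site are not modelled.

```python
def splice(pre_mrna, introns):
    """Remove introns given as (start, end) 0-based, end-exclusive intervals.

    Each intron must start with GU and end with AG (assumed boundary rule);
    otherwise a ValueError is raised.
    """
    pre_mrna = pre_mrna.upper()
    exons = []
    pos = 0
    for start, end in sorted(introns):
        intron = pre_mrna[start:end]
        if not (intron.startswith("GU") and intron.endswith("AG")):
            raise ValueError(f"intron {start}-{end} lacks GU...AG boundaries")
        exons.append(pre_mrna[pos:start])  # exon preceding this intron
        pos = end
    exons.append(pre_mrna[pos:])           # last exon
    return "".join(exons)


# Hypothetical pre-mRNA: exon 1, one intron, exon 2.
exon1, intron, exon2 = "AUGGCC", "GUAAGUCCUACUAACUUUCCCAG", "GGAUAA"
pre = exon1 + intron + exon2
mature = splice(pre, [(len(exon1), len(exon1) + len(intron))])
print(mature)
```

Shifting either boundary by a single nucleotide changes the mature sequence, which illustrates why the signal sequences at the junctions must be preserved.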

The nucleus contains small RNAs (having less than 300 nucleotides) called snRNAs (small nuclear RNAs) associated with proteins to form complexes called snRNPs or “snurps” (small nuclear ribonucleoprotein particles) which play a role in the splicing of introns.

In mammals, for example, the U1-snRNP binds at the cleavage site at the 5′ end of the intron (RNA U1 has a sequence complementary to that of this site), the U2-snRNP binds to the branch site and to the series of pyrimidines upstream of the cleavage site at the 3′ end of the intron, and the U5-snRNP recognizes the cleavage site at the 3′ end; a particle comprising the U4 and U6-snRNPs is further added to form, with the precursor of mRNA, a large complex (of about 60S) called the spliceosome.

In patients suffering from lupus erythematosus, an auto-immune disease characterized by the formation of antibodies against various nuclear constituents, the excision of introns is blocked by antibodies specific for these ribonucleoproteins, thus confirming the important role played by snRNPs in the process of elimination of introns.

Intron Splicing Mechanisms

It must be noted that anomalies in the excision of introns can cause hereditary diseases, like some thalassemias (characterized by the synthesis of abnormal hemoglobins); it was for example observed that a point mutation (G → A) about twenty nucleotides upstream of the normal cleavage site at the 3′ end of the intron had created a new cleavage site (UUAG) on the pre-mRNA, bringing about the synthesis of a defective hemoglobin.

In some cases, an auto-catalytic process has been described which needs no intervention of any protein. The best known example is that of the maturation of the precursor (6.4 kb) of the 26S rRNA of a protozoan, Tetrahymena (see fig. 6-36B), where the elimination of the intron needs, in addition to the RNA itself, only a guanylic cofactor: GMP (or guanosine, GDP or GTP).

The 3’OH group of GMP attacks the phosphodiester bond corresponding to the 5′ end of the intron and due to a trans-esterification reaction displaces the 3′ end of the exon. The free 3’OH group of the latter then attacks the phosphodiester bond of the 3′ end of the intron, causing its cleavage and then the junction between the 2 exons.

The selectivity of this mechanism leading to the precise excision of the sequences of the intron seems to be dictated by the secondary structure of the precursor at the level of the intron, structure comprising numerous double helical arms and numerous loops.

The auto-catalytic excision process continues: the 3’OH of the liberated intron of 414 nucleotides attacks a phosphodiester bond near the 5′ end to form a circle of 399 nucleotides and liberate a fragment of 15 nucleotides comprising the guanylic nucleotide incorporated during the first step.

Then, this circle opens, closes while losing a fragment of 4 nucleotides, and opens again to give a linear RNA having thus lost 19 nucleotides compared to the initial intron, called L-19 IVS (Linear minus 19 intervening sequence).
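The nucleotide bookkeeping of these successive steps can be checked with a few lines of arithmetic. All figures come from the text above; only the variable names are introduced here.

```python
# Sizes in nucleotides, as given in the text.
excised_intron = 414
first_fragment = 15    # released when the first circle forms
second_fragment = 4    # released when the circle reopens and closes again

circle = excised_intron - first_fragment   # circular intermediate
l19_ivs = circle - second_fragment         # final linear RNA
total_lost = excised_intron - l19_ivs      # explains the name "L-19"

print(circle, l19_ivs, total_lost)
```

The total of 19 nucleotides lost (15 + 4) is what the name "Linear minus 19" refers to.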

This L-19 IVS is capable of catalyzing the conversion of pentacytidylate into shorter and longer oligonucleotides, and therefore has a nuclease and a polymerase activity (the velocity of pentacytidylate hydrolysis by this RNA is 10^10 times greater than the hydrolysis velocity without catalyst).

This RNA, like the precursor of the rRNA, is therefore endowed with catalytic activity, and such RNAs were named ribozymes. Some believe that the existence of these ribozymes points to the possibility of an auto-replication of RNAs without participation of proteins at the beginning of evolution.

It is interesting to note that the maturation of precursors of bacterial tRNAs calls upon a ribonucleoprotein called RNase P, consisting of a protein of 20 000 daltons and an RNA (RNA M1) of 377 nucleotides, which causes an endonucleolytic cleavage generating the 5′ end of mature tRNAs. The RNA M1 can by itself recognise the cleavage site and catalyze the specific cleavage at an appreciable velocity. This is therefore another example of an RNA endowed with catalytic activity, or ribozyme.

The auto-excision of the intron present in the precursor of rRNA of Tetrahymena implies the recognition and cleavage of the exon-intron junction (at the beginning, i.e. 5′ side of the intron) and the intron-exon junction (at the end of the intron), then the ligation of the two exons.

Figure 6-37 shows the recognition of the exon-intron junction (at the 5′ end of the intron) and the position where cleavage takes place (indicated by an arrow). This recognition involves the nucleotides 22 to 27 of the intron which pair with the last 6 nucleotides of the exon.

Recognition of the Exon-Intron Junction

The excised intron of the precursor of rRNA of Tetrahymena (this intron initially comprises 414 nucleotides) loses 19 nucleotides at its 5′ end to give L-19 IVS (Linear minus 19 intervening sequence).

This fragment can (see fig. 6-38) cleave an RNA at a specific site, like a restriction endonuclease, thanks to the pairing between an internal sequence of this fragment and a complementary sequence on the RNA. This fragment of 395 nucleotides (414 - 19) is a ribozyme; it is found intact at the end of the reaction.
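The analogy with a restriction endonuclease can be made concrete with a small sketch that locates, on a target RNA, the stretches able to pair with a short internal "guide" sequence of the ribozyme. The guide and target sequences below are hypothetical, only strict Watson-Crick pairs are considered, and the catalytic step itself is not modelled.

```python
RNA_COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna):
    """Sequence able to pair with 'rna' in antiparallel orientation, written 5'->3'."""
    return "".join(RNA_COMPLEMENT[base] for base in reversed(rna.upper()))


def find_target_sites(guide, target):
    """0-based start positions of the stretches of 'target' that can pair with 'guide'."""
    site = reverse_complement(guide)
    target = target.upper()
    return [
        i
        for i in range(len(target) - len(site) + 1)
        if target[i:i + len(site)] == site
    ]


guide = "GACUGC"                 # hypothetical internal sequence of the ribozyme
target = "AAGCAGUCUUUGCAGUCA"    # hypothetical substrate RNA
print(find_target_sites(guide, target))
```

As with a restriction enzyme, the number of sites found depends only on how often the recognized sequence occurs in the substrate.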

Cleavage of RNA by L-19IVS

Figure 6-39 illustrates the auto-excision of the RNA (+ strand) of a virusoid of lucerne called LTSV (Lucerne Transient Streak Virus). This RNA of 324 nucleotides can take a hammerhead shape by pairing of complementary bases which permits the auto-excision between the nucleotides 167 and 168 (arrow).

Auto-Excision

Such structures were observed in other plant virusoids, suggesting that the hammerhead shape is a sufficient condition to permit cleavage; this led researchers to synthesize small RNAs (in fact, oligoribonucleotides) having sequences similar to those constituting the hammerhead structures in viroids or virusoids.

A small RNA of only 19 nucleotides (lower strand in figure 6-40) is thus capable of cleaving the strand to which it is associated in a hammerhead structure synthesized according to the model of the structure existing in the ASBV (Avocado Sunblotch Viroid).

Hammerhead Structure

The synthesis of such ribozymes, capable of cleaving RNA molecules at particular sequences, can lead to extremely important applications because they can prevent the expression of certain genes (by destroying the corresponding mRNAs) especially in the case of viruses pathogenic for plants, animals and man.

Important Steps of the Expression of Genes Coding

III. RNA Replicase:

RNA viruses (and RNA phages) use one of the following 4 strategies to replicate their genome:

1. Viruses of the 1st group (like the virus of poliomyelitis, or phages of E.coli, such as f2, MS2, R17, Qβ) contain in their particles a strand of RNA (+) (by convention RNA + is the mRNA). During the infection, a (-) strand is synthesized, which serves as a template for the synthesis of new (+) strands as we will see below.

2. The reverse situation prevails in the 2nd group (which includes, e.g., the virus of rabies). The virion contains a (-) strand of RNA which will serve as template for the synthesis of (+) RNA.

3. The 3rd group of viruses (including the reoviruses) consists of double-stranded RNA viruses, in which the (±) duplex serves as template for the asymmetric synthesis of (+) strands.

4. In the 4th group are found the retroviruses (like the Rous Sarcoma Virus) in which the (+) RNA of the virion permits the intermediate synthesis of a DNA which in its turn serves as template for the synthesis of new (+) RNAs.

The RNA-containing phages mentioned above use, for the replication of their ribonucleic genome, an RNA replicase or RNA synthetase (or RNA-dependent RNA polymerase) which catalyzes the following reaction:

This enzyme is not present in the non-infected cells of E.coli but appears in the cells infected by an RNA-containing phage like f2, MS2, R17, Qβ. An enzyme of this type is also found in cells infected by some plant viruses (e.g., Tobacco Mosaic Virus) or animal viruses (e.g., poliomyelitis virus).

It is coded by the viral genome (if not wholly, at least as far as one of the polypeptide chains is concerned). In the case of the phage Qβ, the enzyme comprises 4 sub-units: three of them are proteins of the host (E.coli), namely the protein S1, which is also found in the 30S ribosomal particles, and the elongation factors EF-Tu and EF-Ts, while only the fourth sub-unit is coded by the RNA of the phage.

It may be noted that the reaction catalyzed is exactly the same as that catalyzed by the RNA polymerase of healthy cells; only the template is different: it is not DNA but RNA (the action in vitro of DNase thus eliminates the synthesis of RNA by the DNA-dependent RNA polymerase, without affecting the activity of the viral RNA replicase).

Besides, the RNA replicase is extremely specific in the sense that it requires the homologous RNA; for example, the RNA replicase of phage Qβ does not accept as template the RNA of phage MS2, though the latter is rather close, and even less the various cellular RNAs.

This specificity offers the virus the advantage of allowing the preferential replication of the viral RNA (and thus permits the infectious process), while this RNA is in an environment where the cellular RNAs very largely predominate.

The RNA synthesized is not complementary but identical to the viral RNA and one wondered whether some special mechanism is involved in this transfer of information.

This is not so; the principle remains that of pairing of complementary bases, so that during the infection one has, in a first phase, synthesis of a (-) strand complementary to the (+) strand from viral particles, and formation of a duplex, or replicative form, consisting of a (+) strand and a (-) strand in double helix (some believe that these duplexes constitute artefacts formed during the isolation of viral RNAs).

The (-) strand then acts as template for the synthesis of numerous (+) strands. The (+) strands function as messengers to direct the synthesis of viral proteins and after encapsidation they form the genome of new viral particles.
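That two successive rounds of complementary copying give back a strand identical to the original can be verified with a short sketch. The sequence is a hypothetical stand-in for a viral (+) strand, and the function name is introduced here for illustration.

```python
RNA_COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def complementary_strand(rna):
    """Strand synthesized on 'rna' as template, written 5'->3'."""
    return "".join(RNA_COMPLEMENT[base] for base in reversed(rna.upper()))


plus_strand = "GGGAUCUAAGCUC"                       # hypothetical (+) RNA
minus_strand = complementary_strand(plus_strand)    # first phase of the infection

# One (-) strand serves as template for several new (+) strands.
progeny = [complementary_strand(minus_strand) for _ in range(3)]

print(minus_strand)
print(all(strand == plus_strand for strand in progeny))
```

No special mechanism is needed: the identity of the progeny with the parental (+) strand follows directly from applying base complementarity twice.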

The genomes of these phages contain only 4 genes: the gene of the capsid protein; the gene of the maturation protein necessary to assemble RNA and protein to constitute the phage particles (the capsid comprises 180 molecules of the capsid protein, which has a molecular weight of 14 kd, and 1 molecule of the maturation protein, which has a molecular weight of 38 kd); the gene of one of the 4 sub-units of the RNA replicase; and the gene of the protein responsible for the lysis of the bacterial cell (this gene overlaps on one hand the gene of the capsid protein and on the other hand the gene of the sub-unit of the replicase).
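The stoichiometry quoted for the phage particle lends itself to a quick calculation of the protein mass per particle. The copy numbers and molecular weights are those given above; the variable names are introduced here, and the result is only as precise as the rounded molecular weights.

```python
# Copy numbers per phage particle and molecular weights in kd, from the text.
capsid_copies, capsid_kd = 180, 14
maturation_copies, maturation_kd = 1, 38

capsid_mass = capsid_copies * capsid_kd
total_protein_mass = capsid_mass + maturation_copies * maturation_kd
capsid_share = capsid_mass / total_protein_mass

print(total_protein_mass)          # protein mass of one particle, in kd
print(round(capsid_share, 3))      # fraction contributed by the capsid protein
```

The capsid protein thus accounts for nearly all of the protein of the particle, which is consistent with its being needed in far greater quantity than the other phage proteins.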

With the help of the RNA replicase of Qβ, Spiegelman could obtain in vitro the synthesis of RNA molecules identical to the RNA extracted from the phage; these RNA molecules synthesized in vitro were in their turn capable of acting as template in the reaction catalyzed by the enzyme (we have seen that the latter is very demanding as far as its template is concerned) and furthermore, they were capable of programming the formation of viral particles in the protoplasts of E.coli (they therefore possessed infectious capacity).

These were the first experiments which enabled the synthesis in vitro of a biologically active nucleic acid with the help of a purified enzyme; this synthesis in vitro of a ribonucleic genome was followed by that of a deoxyribonucleic genome (that of the phage ϕX 174).

Synthesis of Viral RNA by RNA Replicase

Even in the expression of such a small genome, there are regulation phenomena. On one hand, the replication of the (+) strand must not take place together with its translation (otherwise there would be collision between the RNA replicase moving along the (+) strand from 3′ towards 5′ and the ribosomes moving along the (+) strand from 5′ towards 3′).

This is avoided by the inhibiting action of the RNA replicase of phage Qβ on the binding of ribosomes to the (+) strand as long as a sufficient quantity of (-) strands has not been synthesized.

On the other hand, the 4 phage proteins are not synthesized in equal quantities, because much more capsid protein is required (180 molecules per phage particle); this is accomplished by the fact that the ribosomes bind much more strongly to the initiation site of the translation of the capsid protein than to the initiation sites of the other proteins on the (+) RNA. Moreover, the capsid protein inhibits the synthesis of the sub-unit of RNA replicase by blocking the initiation site of the translation of this sub-unit.

As in the case of DNA, there are proofs that the RNA is actually the genome of RNA-containing phages and viruses; in some cases it was possible to cause the infection of the host cell by the RNA isolated and purified from viral particles.

On the other hand, after separating RNA and protein from a virus, one can carry out reconstitution experiments in certain experimental conditions. In the case of the tobacco mosaic virus, for example, one can form a hybrid virus from the RNA of a given strain (x) and the protein of another strain (y). After infection of tobacco by this hybrid, one observes the appearance of viral particles the capsid of which is of the type x, i.e. of the type of the strain having provided the RNA, and not of the type of the protein present in the hybrid which started the infection. Since it is the RNA which specified the characteristics of the viral progeny, it is therefore the actual carrier of the genetic information of the virus.