The synthesis of enzymes, or more generally the synthesis of proteins, implies the incorporation of amino acids in an order dictated by the genetic information.

Control of the Expression of Genes at the Level of Transcription:

In addition to the adaptation mechanisms operating in the cells in response to changes in environmental conditions, cellular differentiation and development imply the sequential expression of genes or groups of genes. It is now established that control of gene expression takes place mainly at the level of transcription.

This transcriptional regulation operates through protein factors which interact with particular nucleotide sequences generally located upstream of genes (promoter sites) and with the RNA polymerase itself. Various methods applicable to procaryotes and eucaryotes can reveal this mode of control, either by a direct quantification of the RNA transcribed from a given gene, or by an estimation of the accumulation of the corresponding protein.

A technique called “analysis by nuclease S1” or S1 mapping (see fig. 8-2) makes it possible to quantify precisely the RNA molecules whose transcription has started at the correct site. In this technique, the RNAs synthesized either in in vitro systems or in transformed cells are hybridized with a single-stranded DNA fragment labeled with ³²P at its 5′ end and including the site of transcription initiation.

This DNA fragment, constituting a radioactive probe, will be placed in the hybridization medium in large excess with respect to the specific RNA. After hybridization, the unpaired molecules or portions of molecules are degraded by a treatment with nuclease S1 of Aspergillus, which hydrolyses selectively the single-stranded nucleic acids.

The fragments of the probe resistant to S1 nuclease are separated by electrophoresis on polyacrylamide gel in denaturing conditions and revealed by autoradiography. The probe fragment protected by the pairing with the specific RNA will be of a precise size equal to the distance between the initiator nucleotide (+1) and the labeling site (5′ end) of the probe. The intensity of the corresponding radioactive band on the autoradiogram will be directly proportional to the quantity of specific RNA present.
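The size and intensity relations stated above can be sketched in a few lines of Python. This is only an illustration: the function names, the coordinate convention (positions relative to the initiation site, with no position 0) and the band intensities in the usage note are assumptions, not part of the original protocol.

```python
def probe_length(upstream_end: int, labeled_end: int) -> int:
    """Full length of a probe spanning upstream_end..labeled_end.

    Coordinates are relative to the initiation site (+1). There is no
    position 0: -1 is immediately followed by +1.
    """
    if not (upstream_end < 0 < labeled_end):
        raise ValueError("the probe must span the initiation site")
    return labeled_end - upstream_end  # e.g. -150..+80 gives 150 + 80


def protected_length(labeled_end: int) -> int:
    """Length of the probe fragment protected by correctly initiated RNA.

    Positions +1..+labeled_end are paired with the RNA, so the protected
    fragment runs from the labeled 5' end back to the +1 nucleotide.
    """
    if labeled_end < 1:
        raise ValueError("the labeled 5' end must lie in the transcribed region")
    return labeled_end


def relative_rna(band_intensities: dict, reference: str) -> dict:
    """Express each band intensity relative to a reference sample.

    Valid only while the probe is in large excess, so that the band
    intensity stays proportional to the quantity of specific RNA.
    """
    ref = band_intensities[reference]
    return {name: value / ref for name, value in band_intensities.items()}
```

For a hypothetical probe covering −150 to +80, `probe_length(-150, 80)` gives 230 nucleotides for the undigested probe and `protected_length(80)` gives 80 nucleotides for the protected band; a transcript started at an incorrect site would give a band of a different size.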

Principle of "Nuclease S1 Mapping"

When the product of the gene studied is easily detectable (either by its enzymatic activity or by its immunological properties), one can estimate the activity of this gene by determining the quantity of protein synthesized. If the detection of this protein is difficult, one can “fuse”, by recombination in vitro, the promoter of the corresponding gene with the nucleotide sequence coding for a “reporter” protein that is easy to estimate, like β-galactosidase or chloramphenicol acetyltransferase (CAT).

These regulation mechanisms involve the action of regulatory proteins which modify the expression of genes by binding in a specific manner to the corresponding promoter sequences. The study of these interactions is now possible thanks to two conventional methods.

1) Method of “Gel Retardation”:

The DNA fragment studied, made radioactive by labeling in vitro, is incubated in presence of proteins which can bind to the DNA. After the formation of specific complexes, the mixture is subjected to electrophoresis on polyacrylamide gel in non-denaturing conditions. The DNA-protein complex will migrate more slowly than the fragment of free DNA and its position can be visualized on the gel by autoradiography.

2) Foot-Printing Method:

This method consists of incubating the protein fractions with a fragment of double-stranded DNA labeled at one end and corresponding to the promoter region of a gene (see fig. 8-3). After the formation of specific complexes, the DNA is subjected to the controlled action of DNase, which will cleave the fragment in the accessible zones not protected by interaction with a factor.

Here, it is important to choose digestion conditions leading, on average, to only one cleavage per DNA molecule. The position of the protected regions is then revealed by studying the sizes of the radioactive fragments of the residual DNA and comparing them with the sizes of the fragments obtained in the same conditions by action of DNase on the naked DNA, in the absence of protein fractions.
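The comparison between the two digestion patterns can be sketched as follows. This is a minimal illustration which assumes that each lane has already been reduced to a set of fragment lengths in nucleotides, measured from the labeled end; the function name and the example lanes are hypothetical.

```python
def protected_regions(naked: set, bound: set) -> list:
    """Return (first, last) fragment lengths missing from the protein lane.

    naked: fragment lengths seen after DNase digestion of naked DNA.
    bound: fragment lengths seen after digestion in presence of protein.
    A run of consecutive lengths present in `naked` but absent from
    `bound` marks a stretch of DNA protected by a bound factor.
    """
    missing = sorted(naked - bound)
    regions = []
    for size in missing:
        if regions and size == regions[-1][1] + 1:
            # Extend the current run of consecutive missing fragments.
            regions[-1] = (regions[-1][0], size)
        else:
            # Start a new protected region.
            regions.append((size, size))
    return regions


# Hypothetical lanes: every position is cut in naked DNA, but the
# cuts 40 to 60 nucleotides from the label vanish when protein is bound.
naked_lane = set(range(1, 101))
bound_lane = naked_lane - set(range(40, 61))
print(protected_regions(naked_lane, bound_lane))
```

Because the fragment length is counted from the labeled end, each missing length translates directly into a protected position on the promoter fragment.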

A. Procaryotic Systems:

It is certainly in these systems (bacterial cells and bacteriophages) that the transcription regulation mechanisms are best understood at the molecular level. We will see that the modulation of the transcription of a gene is performed by the joint action of RNA polymerase and a series of regulatory proteins (activators, repressors, terminators, anti-terminators) which, in presence of appropriate co-factors, interact with the DNA (or mRNA) at the level of particular nucleotide sequences (targets). Two steps of transcription are thus subjected to regulation: initiation (or start of transcription) and termination.

Principle of the Demarcation of the DNA Region

a) Initiation of Transcription:

During this step the RNA polymerase recognizes particular sequences on the double-stranded DNA and binds to them. The sub-unit (or factor) σ confers on the core enzyme the capacity to bind selectively to these sequences and start the transcription. These sequences form the promoter sites.

The study of the sequences of numerous E. coli promoters revealed the presence of two regions of homology (called “consensus sequences”) located about 10 and 35 base pairs upstream of the initiation point (see fig. 8-7) and called, respectively, the Pribnow box or TATA box (because it is rich in nucleotides A and T) and the −35 sequence (owing to its position).

The efficiency of the initiation of transcription is mainly modulated by regulatory proteins which also recognize and bind to specific target nucleotide sequences near the promoter site. Generally, these proteins act in the dimeric or tetrameric form and bind to a region of the DNA presenting a point of symmetry at the sequence level (palindromic sequence): the sequence of nucleotides along one strand is similar to that of the complementary strand read in the other direction (see fig. 8-7).
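The notion of a palindromic target site can be made concrete with a short check. The sketch below counts how many base pairs break the two-fold symmetry of a sequence; the function names are hypothetical and the test sequences are arbitrary illustrations, not operator sequences taken from the text.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Sequence of the complementary strand, read in the other direction."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def symmetry_mismatches(seq: str) -> int:
    """Number of symmetry-related positions that break the palindrome.

    Each position in the left half is compared with its partner in the
    right half. For an odd-length site the central base pair sits on the
    symmetry axis and is ignored.
    """
    seq = seq.upper()
    rc = reverse_complement(seq)
    half = len(seq) // 2
    return sum(1 for a, b in zip(seq[:half], rc[:half]) if a != b)


def is_palindromic(seq: str, max_mismatches: int = 0) -> bool:
    """True if the site is symmetric, tolerating a few imperfect pairs."""
    return symmetry_mismatches(seq) <= max_mismatches
```

A perfect palindrome such as GAATTC returns zero mismatches. Since the text describes the two strands as "similar" rather than identical, the `max_mismatches` parameter lets an imperfectly symmetric site still count as palindromic.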

There are generally two types of control at the transcription initiation level:

i. The Negative Control:

Transcription, which normally requires the presence of RNA polymerase alone, is prevented by the presence of a protein factor (e.g., the repressor of the lactose system);

ii. The Positive Control:

Transcription cannot take place in presence of RNA polymerase alone; other protein factors are needed to permit this transcription, e.g. the activator of the maltose system, or the CRP protein required for a number of operons. This positive control is comparable with the action of specific σ factors, recognizing specific promoters.

Control of Transcription by Induction-Repression:

α) Lactose Operon:

Jacob-Monod Model:

In 1961, these authors proposed a model for the control of the expression of genes, on the basis of the observations made concerning the functioning of the lactose operon in E. coli. This operon contains the structural and regulatory genes enabling the synthesis of enzymes involved in the metabolism of lactose, i.e.: a β-galactosidase (coded by the gene z), a permease (coded by the gene y) and a transacetylase (coded by the gene a).

The 3 enzymes of the lactose operon are called inducible (the term adaptive is also used): if E. coli is grown in a conventional glucose-based medium, they are present only in negligible quantities in the cell, but if E. coli is made to grow in a medium which contains lactose as the only source of carbon, the bacterium adapts the expression of its genome and the concentrations of these 3 enzymes increase about 1000-fold.

The problem is therefore to identify the control mechanisms enabling the 3 structural genes of the lactose operon to be expressed only when needed and not the rest of the time.

Figure 8-4 shows the lactose operon repressed, a situation which exists in the absence of lactose. The regulatory gene i controls the synthesis of a repressor, a protein capable of diffusing in the cytoplasm and which has an affinity for the region o (operator site) of the lactose operon. This binding of the repressor to the operator prevents the transcription of the structural genes of the 3 enzymes by the RNA polymerase and as a result of the absence of messenger RNA, the 3 enzymes are not synthesized.

Lactose Operon in Repressed State

In presence of lactose (or generally, of an appropriate inducer), the repressor forms with the inducer a complex whose structure is modified by allosteric transconformation in such a manner that it can no longer bind to the operator site (see fig. 8-5).

Genes z, y and a can then be transcribed into a polycistronic mRNA, which permits the coordinated synthesis (in a constant quantitative ratio) of the 3 corresponding enzymes. The operon is induced. The initiation site recognized by the RNA polymerase is the promoter site (p).

Lactose Operon in Induced State

Experimental Confirmation of the Predictions of this Model:

Before presenting the experimental arguments supporting the Jacob-Monod model, a word must be said about the inducer which was used in these studies: isopropylthiogalactoside (IPTG), a synthetic galactoside capable of inducing the lactose operon like the natural galactoside, lactose, but which differs from lactose in that it cannot be metabolized. In other words, it can “cheat” the mechanisms regulating the functioning of the operon, but not β-galactosidase.

Such an inducer, which makes the cell produce enzymes of which it cannot be the substrate, is called a gratuitous inducer. The advantage of using a gratuitous inducer is that the metabolites of lactose (which cannot be produced from IPTG) are thus eliminated from the phenomena under study.

One must also get acquainted with the 3 types of mutants used, in which the control of the synthesis of the lactose operon is altered; they are mutants of the gene i or of the region o:

i. i⁻ mutants are bacteria whose i gene has been altered in such a manner that the repressor (which is the product of the i gene) is no longer synthesized, or is modified and has lost its affinity for the operator. In these mutants the repression mechanism therefore cannot function, with the result that the operon is constantly induced and the 3 enzymes of the lactose operon are synthesized continuously, even in the absence of inducer, in maximum quantities;

ii. iˢ mutants are super-repressed, because mutation of the i gene resulted in the synthesis of a repressor which has no affinity for the galactosides and is therefore bound permanently to the operator even in presence of the inducer. The operon therefore cannot be induced and the 3 enzymes are never synthesized;

iii. oᶜ mutants (“constitutive operator”) are characterized by a deletion (more or less extensive) of the DNA of the o region, which can no longer bind the repressor, with the result that the 3 enzymes are synthesized continuously, even in the absence of inducer. In oᶜ mutants, as in i⁻ mutants, the operator-repressor combination cannot take place; in the former case the operator site is defective, in the latter, the repressor.
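The behaviour predicted by the model for the wild type and for the three mutant classes can be summarized as a small decision function. This is a deliberately simplified sketch: it considers a haploid cell, ignores any other level of control, and uses hypothetical names and plain-text genotype labels ("i-", "is", "oc") chosen for this illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LacStrain:
    repressor: str = "wild"  # "wild", "i-" (no functional repressor), "is" (cannot bind inducer)
    operator: str = "wild"   # "wild" or "oc" (cannot bind the repressor)


def operon_transcribed(strain: LacStrain, inducer_present: bool) -> bool:
    """Predict whether genes z, y and a are transcribed."""
    if strain.repressor not in ("wild", "i-", "is"):
        raise ValueError(f"unknown repressor allele: {strain.repressor}")
    if strain.operator not in ("wild", "oc"):
        raise ValueError(f"unknown operator allele: {strain.operator}")
    if strain.operator == "oc":
        return True   # defective operator: no repressor can bind
    if strain.repressor == "i-":
        return True   # no functional repressor: constitutive synthesis
    if strain.repressor == "is":
        return False  # repressor stays on the operator despite the inducer
    return inducer_present  # wild type: inducer releases the repressor


for name, strain in [("wild type", LacStrain()),
                     ("i-", LacStrain(repressor="i-")),
                     ("is", LacStrain(repressor="is")),
                     ("oc", LacStrain(operator="oc"))]:
    print(name, operon_transcribed(strain, False), operon_transcribed(strain, True))
```

The loop prints, for each strain, the predicted state without and with inducer, which reproduces the inducible, constitutive and non-inducible phenotypes described above.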

More than 6 years elapsed before the existence of the repressor postulated by the Jacob-Monod model could be experimentally confirmed, thanks to the experiments of Gilbert and Müller-Hill.

These authors observed that a bacterial extract placed in a dialysis bag binds IPTG: after allowing the cell-free extract to come into equilibrium with a radioactive IPTG solution, they observed that the IPTG concentration was slightly higher inside the bag. Using this binding of IPTG as a test to follow the repressor, they purified the repressor.


It is a protein, which was expected, because “suppressible” non-sense mutations of the i gene were known. Suppression is due to a mutation modifying a tRNA, which becomes capable of reading a non-sense codon (see fig. 6-45); it is therefore a “correction” which takes place at the time of protein synthesis. Since a mutation of the i gene could be “suppressed”, i.e. corrected by a second mutation whose effects manifest themselves at the protein synthesis level, it was clear that the product of the i gene had to be a protein. The purified protein obtained by following the binding of IPTG is indeed the repressor, in other words the product of the i gene, because the extracts of an iˢ mutant (super-repressed, non-inducible) do not bind IPTG.

The other predictions of the model were then verified by the use of repressor labeled with ³⁵S (after culture of E. coli in presence of ³⁵S-sulphate, whose ³⁵S is incorporated into the sulphur-containing amino acids, and therefore into proteins). This labeled repressor was mixed with DNA containing the lactose operon, and the mixture was centrifuged in a sucrose gradient: the DNA sediments faster than the proteins, and it was observed that ³⁵S was present in the DNA band.

It was shown that this radioactivity does reflect the binding of the repressor to the operator by using, in another experiment, the DNA of an oᶜ mutant (whose operator site is defective) and observing that in this case the quantity of ³⁵S recovered after centrifugation in the DNA band was either nil or much smaller (depending on the extent of the lesion in the o region).

Moreover, it must be noted that the repressor binds only to native DNA, and not to denatured DNA. On the other hand, when the DNA extracted from a wild strain is mixed with the radioactive proteins obtained from an i⁻ mutant (which synthesizes no repressor, or a repressor without affinity for the operator), no ³⁵S is observed in the DNA band after centrifugation, which shows that in the first experiment the radioactivity found in the DNA band does represent the binding of the repressor.

Lastly, by mixing an intact repressor with DNA containing an intact operator site, the binding of the labeled repressor to the DNA does not take place in presence of IPTG, which confirms the prediction of the model concerning the mechanism of induction (see fig. 8-5): in presence of an inducer, there is no binding of the repressor to the operator.

It was therefore verified that the repressor, product of the i gene, binds the inducer (except when it originates from a non-inducible iˢ mutant), that it binds to the operator site (except when this site has undergone a deletion), but that the binding of the repressor to the operator does not take place in presence of an inducer; all this is in conformity with the predictions of the model.

With the help of cell-free extracts containing all the required factors, it was possible to obtain in vitro the transcription of the DNA comprising the lactose operon into mRNA specific to this operon, and then the translation of this mRNA into the corresponding specific proteins. The addition of the purified repressor to the incubation mixture blocks the synthesis of mRNA, and therefore of the specific proteins, but addition of the inducer lifts this repression.

β) Tryptophan Operon:

In the case of the lactose operon we have seen that the repressor binds to the operator, except in presence of lactose (or another inducer) which lifts the repression of the operon. Lactose induces the synthesis of enzymes which will enable the cell to metabolize this disaccharide.

In the case of the tryptophan operon which contains the genes coding for the enzymes required for the synthesis of the amino acid by the bacterium, one has the opposite situation: when the amino acid is absent in the cell, the operon is functional (it is said to be derepressed), so that the amino acid is synthesized; on the contrary when the amino acid is present in excess, the operon is repressed.

In spite of this apparently fundamental difference, the model proposed by Jacob and Monod is also applicable: one only has to consider the repressor as a protein molecule which is unable to bind to the operator when it is alone and which requires, in order to bind, a molecule of co-repressor present in the cell when the amino acid is in excess. Hence, in the absence of amino acid, the operator is not blocked by the repressor, and the transcription of the structural genes corresponding to the enzymes can take place.

On the contrary, when the amino acid is present in excess, the co-repressor combines with the repressor, which is then capable of binding to the operator and blocking the transcription of the structural genes. This model was also confirmed experimentally by the isolation of the tryptophan repressor, and by in vitro experiments of transcription and translation of DNA containing the tryptophan operon, experiments which enabled the verification of the role of the repressor.

Let us also point out a variant in terminology: when the amino acid concentration decreases in the cell (and when there is therefore a decrease in the concentration of co-repressor needed by the repressor to repress the operon), one uses the term derepression. This is in fact equivalent to induction, in the sense that both denote a situation in which the RNA polymerase is able to resume the transcription of the structural genes of the operon (in derepression due to the disappearance of the co-repressor, in induction due to the appearance of the inducer).
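The symmetry between induction and derepression can be expressed as two branches of the same rule: only the effect of the small ligand on the repressor differs. The sketch below is an illustration with hypothetical names; "inducible" stands for an operon of the lactose type and "repressible" for one of the tryptophan type.

```python
def repressor_binds_operator(system: str, ligand_present: bool) -> bool:
    """Does the repressor occupy the operator?

    inducible   : the repressor binds alone; the inducer releases it.
    repressible : the repressor binds only with its co-repressor.
    """
    if system == "inducible":
        return not ligand_present
    if system == "repressible":
        return ligand_present
    raise ValueError(f"unknown system: {system}")


def transcription_possible(system: str, ligand_present: bool) -> bool:
    """The RNA polymerase can transcribe only if the operator is free."""
    return not repressor_binds_operator(system, ligand_present)


# Induction: appearance of the inducer frees the operator.
print(transcription_possible("inducible", ligand_present=True))
# Derepression: disappearance of the co-repressor frees the operator.
print(transcription_possible("repressible", ligand_present=False))
```

Both calls lead to the same state, a free operator, which is the sense in which the two terms are equivalent.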

γ) Prophage λ:

When the bacteriophage λ infects E. coli, there are two possibilities:

1. Either the phage multiplies in the host cell leading to the liberation of a number of new phage particles and death of the bacterial cell. This is the infectious cycle, which will not be studied here;

2. Or, the deoxyribonucleic genome of phage λ is integrated in the bacterial chromosome; it is then observed that the genes of λ (which is then called prophage and not phage) are not expressed; there is no infectious process, the bacteria continue to grow and divide, apparently normally.

In fact, the genes of λ are replicated like those of E. coli during the duplication of the chromosome: they are thus transmitted to the daughter cells and the infectious process can start at any moment (even after numerous generations). This is lysogeny.

The problem which arises is therefore to know how the expression of the prophage λ genes is regulated. It was found that there is a repressor, coded by the cI gene of the prophage, which prevents the expression of all the genes of the prophage except that of cI (therefore, the cI gene is to the prophage λ what the i gene is to the lactose operon).

A cell of E. coli carrying the prophage therefore contains molecules of the λ repressor which not only repress the expression of the prophage genes, but also prevent the expression of the genomes of infectious λ phages which could penetrate into the cell (whence the name “immunising substance” sometimes given to the repressor).

Ptashne succeeded in isolating the repressor of the λ phage. He started with 2 cultures of E. coli containing prophage λ, previously irradiated by UV to reduce the synthesis of bacterial proteins to a minimum, and he infected them, one with λ phages having an intact cI gene (in presence of ¹⁴C-leucine), the other with λ phages having an altered cI gene (in presence of ³H-leucine).

After one hour, he stopped the 2 cultures, extracted the proteins and compared their fractionation profiles on a DEAE-cellulose column. He observed a number of peaks where the ratio of the 2 isotopes was constant, but he found that an important peak (labeled with ¹⁴C), resulting from cells infected by λ having an intact cI gene, was absent in the extract (labeled with ³H) resulting from cells infected by λ having the cI gene altered. This protein peak therefore had to be the product of the cI gene, which was confirmed later.
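The reasoning behind this double-label comparison is simply a search for column fractions in which the ratio of the two isotopes departs from the constant value seen elsewhere. The sketch below illustrates it; the function name, the two-fold threshold and the counts are all hypothetical, invented only to show the calculation.

```python
from statistics import median


def fractions_with_excess(c14_counts, h3_counts, fold=2.0):
    """Return indices of fractions whose 14C/3H ratio exceeds the baseline.

    The baseline is the median ratio over all fractions, i.e. the constant
    ratio shown by proteins made equally in both cultures. `fold` is an
    arbitrary threshold chosen for this illustration.
    """
    if len(c14_counts) != len(h3_counts):
        raise ValueError("both profiles must cover the same fractions")
    if any(h <= 0 for h in h3_counts):
        raise ValueError("3H counts must be positive to form a ratio")
    ratios = [c / h for c, h in zip(c14_counts, h3_counts)]
    baseline = median(ratios)
    return [i for i, r in enumerate(ratios) if r > fold * baseline]


# Hypothetical counts per fraction: the ratio is 0.5 everywhere except in
# fraction 3, where a 14C-labeled protein has no 3H-labeled counterpart.
c14 = [100, 220, 150, 400, 180]
h3 = [200, 440, 300, 100, 360]
print(fractions_with_excess(c14, h3))
```

A fraction singled out in this way contains a protein made only by the cells infected with the phage carrying the intact gene, which is the signature expected of the repressor.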

Ptashne then showed that the radioactive repressor is found bound to the DNA of λ after mixing the two and centrifuging in a sucrose gradient; he verified that this binding does involve the repressor (no radioactivity was found in the DNA band when the radioactive proteins originated from a mutant having a non-sense codon in the cI gene) and requires a homologous operator site (there is no radioactivity in the DNA band, if this DNA is extracted from a phage having an operator different from that of the wild type λ phage which was used for preparing the labeled repressor).

It can be seen that the repression of the genes of the prophage λ also uses the mechanisms predicted by the Jacob-Monod model. However, a difference to be noted is the fact that the affinity of protein cI for its operator sites is not modulated, as in the case of the repressor of the lactose operon, by the fixation of a ligand.

It is the fact that cI is (or is not) synthesized from the genome of the phage which will (or not) lead to a repression of the expression of this genome. This is actually a highly simplified representation of the regulation of phage λ, because there is in fact a complex system of regulations in cascade permitting the sequential expression of the phage genes in the host cell once the repression is lifted.

δ) Remarks on the Control of Transcription by Induction-Repression:

Repression blocks the synthesis of new enzyme molecules by preventing the production of the corresponding mRNA, but it does not affect the enzyme molecules existing in the cell, which will continue to function until they disappear by the usual degradation processes. It also does not prevent the already synthesized mRNA molecules from continuing to be translated.

The effects of repression are therefore less rapid than those of feedback inhibition. On the contrary induction, owing to the rates at which transcription and translation take place, produces effects which can be observed very early and one may see enzymes appearing in bacteria a few minutes after the addition of an inducer.

We have seen that an excess of tryptophan (due, for example, to an addition of this amino acid to the medium where the bacterium is grown) causes a repression of the tryptophan operon.

Neither the synthesis of the corresponding polycistronic mRNA, nor the synthesis of the various enzymes required for the formation of the amino acid will take place, and since these processes require much energy, it is clear that repression enables the cell to make a significant economy of energy.

The systems described above (lactose operon, tryptophan operon) were taken as examples because they are in perfect agreement with the Jacob-Monod model. But variant models have been established. The best known is positive regulation, for which the standard example is the maltose system.

In this system, the enzymes required for the catabolism of this disaccharide are synthesized only when a protein activator, coded by a regulatory gene, is activated by maltose and interacts with an initiator site, thus permitting the transcription of the structural genes.

This target site is located upstream of the binding site of RNA polymerase (and not downstream as in the case of the lactose operon), and the interaction of the activator protein with its target makes the start of transcription more efficient.

The same molecule can, depending on the position of its binding site with respect to the polymerase, play the role of repressor or activator; this is in fact the case for protein cI of the λ phage.

In the arabinose operon, the product of the regulatory gene behaves as a repressor in the absence of arabinose, but in presence of arabinose it behaves as an activator and permits, as in the maltose system, the transcription of the structural genes of the arabinose operon.

On the other hand, the target sites of regulatory proteins are often more complex than in the examples described above. They are often numerous and their interaction with the proteins can lead to major modifications (loops) of the DNA double helix.

Interactions between regulatory proteins and DNA are now better understood. Various biochemical, physico-chemical and genetic methods make it possible to determine the nucleotide residues which are in contact with the protein. In most cases the sequences involved have a palindromic organization (see fig. 8-7), related to the symmetrical multimeric structure of the protein.

Signals Controlling the Transcription of the Lactose Operon

It has been possible (especially for the lactose and λ repressors, and the CRP protein) to identify in the protein structure several rather independent domains; only one of them binds strongly to the DNA (another domain acting mainly in the association between protomers).

The study of the three-dimensional structure of purified proteins and protein-DNA complexes, deduced from crystallographic examinations, revealed in the domain which binds to the DNA the presence of a conserved motif consisting of 2 α helices separated by a small β-turn (see fig. 8-6). In this motif, called helix-turn-helix, one of the 2 α helices of the protein is inserted in the major groove of the DNA, establishing in it specific contacts with the atoms of the underlying nucleotides.

The second helix, positioned across the major groove, stabilizes this structure by interacting simultaneously with the DNA and the first helix. Furthermore, the close interaction between the DNA and these proteins produces a slight curvature of the DNA which probably has repercussions on the stability of the double helix. It is now believed that the transcriptional effect observed results from the joint action of contacts between regulatory proteins and RNA polymerase and of slight modifications of the DNA structure.

Complex between the Dimer of the λ Repressor cI and the Operator Site

Data of protein sequence and tertiary structure indicate that there are other families of regulatory proteins which present within themselves major homologies, especially in regions involved in the binding to the DNA.

a) Simultaneous control of the transcription of several operons:

It has been shown that the transcription of several bacterial operons corresponding to different metabolic pathways can be controlled by the same protein. In this case, the regulatory mechanism is generally of the positive type. For example, in the phenomenon of catabolite repression, the presence of glucose during bacterial growth inhibits the synthesis of a very large number of inducible enzymes pertaining to diverse catabolic pathways.

This phenomenon is due to an arrest of the transcription of the corresponding operons, transcription which requires not only the appropriate inducer (lactose for example, in the case of the lactose operon), but also a specific protein, having bound cyclic AMP (cAMP), which is called CAP (Catabolite gene Activator Protein) or CRP (Cyclic AMP Receptor Protein) and which binds to the DNA upstream of the promoter (see fig. 8-7). In presence of glucose, the cAMP content decreases and the deficiency in CRP-cAMP complex causes a block of the transcription.
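The double requirement described here, an inducer to release the repressor and the CRP-cAMP complex to activate the promoter, amounts to a logical AND. The following sketch tabulates it for the lactose operon; the function names are hypothetical, and treating transcription as strictly on or off is a simplification adopted for the illustration.

```python
def crp_camp_available(glucose_present: bool) -> bool:
    """In presence of glucose the cAMP content falls, so the complex is lacking."""
    return not glucose_present


def lac_operon_transcribed(inducer_present: bool, glucose_present: bool) -> bool:
    """Transcription needs a free operator AND the CRP-cAMP complex."""
    operator_free = inducer_present
    return operator_free and crp_camp_available(glucose_present)


# Tabulate the four combinations of inducer and glucose.
for inducer in (False, True):
    for glucose in (False, True):
        state = "on" if lac_operon_transcribed(inducer, glucose) else "off"
        print(f"inducer={inducer!s:5} glucose={glucose!s:5} -> {state}")
```

Only the combination "inducer present, glucose absent" allows transcription, which is why glucose overrides the presence of lactose in the medium.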

Multiple regulation models were also established in the case of regulations by the nitrogen source, phosphate, ionic strength, etc.

On the other hand, the specificity of transcription can also be under a positive type of control by modifications at the level of the RNA polymerase itself. As mentioned earlier, positive control implies the existence of protein substances which permit the expression of some genes (which are not transcribed in the absence of these substances). The sub-unit (or factor) σ of the RNA polymerase of E. coli belongs to this group of substances.

Thus the synthesis of particular σ factors presenting new specificities permits the expression of some operons in E. coli. This is the case, for example, of the σ³² factor (thus named because it has a molecular weight of 32 kdaltons), which recognizes the promoters of heat-shock genes coding for a series of proteins synthesized in response to a thermal shock; these promoters have a −10 sequence different from the usual −10 sequence.

A similar mechanism operates in the sporulation of Bacillus subtilis, which represents a simple model of cell differentiation. It also plays a role in the control of transcription of the viral genome, at different stages of the infection of E. coli by some bacteriophages of the T-even series (one knows, for example, a T4 σ factor which acts during the infection of E. coli by this phage).

Other mechanisms can also play a role in the positive control of transcription: for instance, the appearance of a new RNA polymerase (in the case of phages T3 and T7), or the modification of the core enzyme leading to new specificities.

b) Termination of transcription:

The transcription of an operon ends with the liberation of the RNA synthesized and of the RNA polymerase which leaves its template.

The termination of transcription confers on each transcription unit its individuality with respect to the neighbouring operons but also permits, in certain conditions, the fine regulation of the expression of a given operon (as we shall see while studying the mechanism of attenuation).

In E. coli, termination generally takes place at specific sites on the DNA; these sites are of two types: some, called ρ (rho)-dependent, require the action of a termination protein, the ρ factor.

Others are ρ-independent; in this case a comparison of the 3′ terminal sequences of numerous messenger RNAs reveals the existence of palindromic sequences, rich in G + C, which permit a folding back of the RNA by pairing of complementary bases (see an example in the typical sequence of the terminator of fig. 8-7); these sequences are followed by a series of uridylic residues (poly U) at the 3′ end of the RNAs. In the DNA, this poly U stretch appears as a series of thymidylic residues in the typical terminator sequence of figure 8-7.
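The two features of a ρ-independent site, a G + C rich palindrome able to fold into a hairpin followed by a run of T residues in the DNA, lend themselves to a simple sequence scan. The sketch below is a naive illustration: the function name, the default stem length, loop sizes, G + C threshold and T-run length are arbitrary assumptions, the stem must pair perfectly, and the test sequence is invented.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def find_terminator_candidates(dna, stem=6, min_loop=3, max_loop=8,
                               min_gc=0.6, t_run=4):
    """Scan the non-template strand for hairpin + T-run candidates.

    Returns (start_index, loop_length) for each perfect inverted repeat of
    length `stem`, with at least `min_gc` G + C content, that is
    immediately followed by `t_run` consecutive T residues.
    """
    dna = dna.upper()
    hits = []
    for i in range(len(dna) - 2 * stem - min_loop - t_run + 1):
        left = dna[i:i + stem]
        if sum(base in "GC" for base in left) / stem < min_gc:
            continue  # stem not G + C rich enough
        for loop in range(min_loop, max_loop + 1):
            j = i + stem + loop
            right = dna[j:j + stem]
            tail = dna[j + stem:j + stem + t_run]
            if len(right) < stem or len(tail) < t_run:
                break  # ran off the end of the sequence
            if right == revcomp(left) and tail == "T" * t_run:
                hits.append((i, loop))
                break
    return hits


# Invented sequence: stem GCCGCC, loop TTAA, stem GGCGGC, then a T run.
example = "CCC" + "GCCGCC" + "TTAA" + "GGCGGC" + "TTTTTT" + "AA"
print(find_terminator_candidates(example))
```

On the invented sequence the scan reports a single candidate starting at index 3 with a 4-nucleotide loop. Real terminators tolerate imperfect stems, so a practical search would need a scoring scheme rather than exact matching.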

Diagrammatic representation of signals controlling the transcription of the lactose operon

The exact mechanism of termination is not yet fully understood, but it appears probable that multiple and complex interactions take place, involving the RNA polymerase (core enzyme), the secondary hairpin structure of the RNA, a destabilization of the RNA:DNA hybrid at the poly U residues, the ρ factor, or other cytoplasmic proteins (e.g., protein NusA).

Mutations introducing non-sense codons in the nucleotide sequence of an operon can be the cause of a premature termination of the transcription of distal genes of this operon (the proximal genes are nearer the operator, the distal genes are more distant from the operator). These mutations are called “polar” because they bring about not only the arrest of translation in the mutated gene but also a reduction of the expression of distal genes.

These non-sense mutations (where translation is interrupted by liberation of the ribosomes) probably unmask transcription termination sites situated further downstream on the RNA and which are normally inactivated by the progression of ribosomes.

The majority of these termination sites, called weak, located inside the multicistronic operons, do not play an important physiological role (see however, the case of complex operons). It has however been shown that some internal sites participate in the regulation of transcription by being sensitive to particular components having no effect at the level of the initiation of transcription.

Regulatory proteins will then form specific complexes with RNA (and not DNA). This regulatory mechanism functions by selective reduction of the transcription of the distal portions of an operon. Such a system operates during the lytic cycle of the bacteriophage λ where a particular protein, an antiterminator protein, is required to prevent or lift the premature termination of transcription at precise sites.

Another regulatory mechanism, called attenuation, comes into play in most bacterial operons grouping the genes involved in the biosynthesis of amino acids (tryptophan, histidine, leucine, phenylalanine…).

In this type of operon, the cellular concentration of the corresponding amino acid (whose synthesis is catalyzed by the products of the operon) indirectly influences its own biosynthesis by modulating the efficiency of the premature termination of transcription at a signal situated between the promoter and the first structural gene of the operon.

This attenuation mechanism implies a close coupling between the transcription of this proximal portion of the operon (leader region) and its translation. Because the RNA sequence transcribed from this region (leader RNA) is particularly rich in codons for the amino acid corresponding to the operon, it is ultimately the cellular concentration of the corresponding aminoacyl-tRNA which regulates the translation of the leader RNA.

Whether transcription stops or continues downstream of the leader RNA is controlled by a process involving two alternative secondary structures at the 3′-terminal end of the leader RNA.

The model presently accepted for the attenuation mechanism is the following: when the cellular concentration of the amino acid (and therefore that of the aminoacyl-tRNA) corresponding to the operon is low, the ribosomes moving along the leader RNA pause at the codons corresponding to this amino acid.

The free part of the leader RNA, situated between the first ribosome and the RNA polymerase, can then adopt only one of the two possible conformations: a structure which does not prevent the continuation of transcription and the expression of the structural genes of the operon (fig. 8-8a).

However, when the cellular concentration of the amino acid increases, the ribosomes stop their progression only at the level of the termination codon of the leader peptide. In this case, the only possible conformation for the free part of the leader RNA is a structure typical of the termination signal for the RNA polymerase and the transcription stops at the end of the leader sequence (see fig. 8-8b).
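The decision logic of this model can be summarized in a toy sketch. The function name, the threshold parameter and the numerical levels are illustrative assumptions, not measured quantities, and the real process is kinetic and graded rather than a strict switch.

```python
def leader_rna_outcome(aminoacyl_trna_level: float,
                       stall_threshold: float = 1.0) -> str:
    """Toy model of attenuation (names and threshold are hypothetical).

    Low charged-tRNA level: the ribosome pauses at the leader codons for
    the regulated amino acid, so the free leader RNA adopts the structure
    that lets transcription continue into the structural genes.
    High level: the ribosome runs on to the leader stop codon, the
    terminator structure forms and transcription stops.
    """
    if aminoacyl_trna_level < 0:
        raise ValueError("level must be non-negative")
    ribosome_pauses = aminoacyl_trna_level < stall_threshold
    return "read-through" if ribosome_pauses else "termination"


print(leader_rna_outcome(0.1))  # scarce amino acid
print(leader_rna_outcome(5.0))  # abundant amino acid
```

The single threshold stands in for the competition between ribosome progression and RNA folding; it is only meant to make the two cases of the model explicit.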

Control by Attenuation

For the tryptophan operon there are therefore two successive ‘locks’, one at the level of the operator, the other at the level of the attenuation site, to decrease the quantity of RNA synthesized in response to increasing cellular concentrations of tryptophan and tryptophanyl-tRNA respectively.

Complex Operons:

There are now numerous examples in E. coli where a number of genes are adjacent on the chromosome and can be expressed from several types of polycistronic (or monocistronic) messengers. For this purpose there are several promoters and several terminators, localized at various positions along the DNA and of different efficiencies.

Furthermore, the efficiency of these signals can depend on various external regulatory factors, leading to the preferential increase (or decrease) of the expression of one or several of these genes in particular physiological conditions, while in most cases the expression will be coordinated, taking place from a global polycistronic mRNA.

Such mechanisms enable the cell to appreciably refine its response to variations of the environment.

Transcription of Different mRNAs from the Same Group of Genes

B. Eucaryotic Systems:

The control of eucaryotic genes is indeed far from being as fully understood as that of some procaryotic genes, but it is now increasingly clear that the expression of eucaryotic genes is controlled by a variety of different genetic mechanisms, unknown in procaryotes.

a) Modifications at the Genome Level:

In some protozoans, nematodes, crustaceans and insects, the cellular differentiation of somatic lineages may be accompanied by a decrease of the quantity of chromatin or the loss of particular chromosomes.

The increase of the number of copies per genome of a class of genes is a means used by some cells to produce massive quantities of RNAs or specific proteins. This mechanism of gene amplification operates for example in the oocytes of some batrachians (amphibians) during oogenesis, for the massive synthesis of ribosomal RNA, or in the egg of some insects to enhance the production of chorionic proteins (proteins constituting the outer envelope of the embryo).

Examples are known of genetic rearrangements where several coding regions, scattered on the same chromosome, fuse together during the development of a cell to form a new transcription unit. This phenomenon is called “somatic recombination”. The best known example is the formation of immunoglobulin genes during the differentiation of B lymphocytes, cells specialized in the production of antibodies.

A molecule of antibody or immunoglobulin consists of 4 sub-units: 2 identical light polypeptide chains (220 amino acids) and 2 identical heavy chains (330-400 amino acids), joined to one another by disulphide bridges and non-covalent bonds to constitute a Y-shaped molecule (see box in fig. 8-10).

The amino acid sequence of the NH2-terminal end of these chains determines the specificity of the antibody towards a given antigen and differs from one antibody to another (variable domains). This region represents the idiotype of the antibody, or specific binding site for the antigen.

The remaining parts (constant domains) are not all identical, but exist in a limited number of forms: 2 types for the light chains (λ and κ) and 5 types for the heavy chains (µ, δ, γ, ε and α), the latter determining to which of the 5 classes (or isotypes) IgM, IgD, IgG, IgE and IgA present in mammals an immunoglobulin belongs.

The organization of nucleotide sequences coding for the various regions and their assembly during the differentiation of lymphocytes accounts for the extreme diversity of antibodies which can be produced by an individual from a relatively small number of genes. Thus, the variable domain of the κ light chains, for example, is coded by about three hundred different VL elements and 4 different JL elements, while the constant domain is coded by a single Cκ element.

These elements, all present on the same chromosome, are arranged in the order VLn – JL4 – Cκ indicated in figure 8-10. The arrangement of elements coding for the heavy chains is similar, but between the VH and JH elements one finds about twenty different D elements, which contribute to an increase in the diversity of the variable domains of these chains.

The sequences corresponding to the constant domains Cµ, Cδ, Cγ, Cε and Cα are localized on the same chromosome (different from the one programming the light chains) after the VHn, D20 and JH4 elements (see fig. 8-10).

Structure and Expression of Genes of Immunoglobulins

The assembly of a functional immunoglobulin gene takes place by the recombination of some of the coding elements at the level of the chromosomal DNA of lymphocytes.

In the case of light chains, any one of the VL elements fuses at random with one of the 4 JL elements, leading to the sequence VLi JLi – Cλ or κ. For the heavy chains this first step includes an additional recombination at the level of a D element and leads to the structure VHi Di JHi – Cµ-α.

The mechanism of this somatic recombination is not yet clear, but it involves particular “signal” sequences located near each of the V, D and J elements. This recombination has two important consequences. The first is the activation of transcription from the V element of the unit thus constituted. This stimulation of the constitutive promoter present in the V element is related to its being brought closer to an enhancer situated between J and C.

This enhancer acts only on the nearest juxtaposed V element. The primary transcript resulting from the transcription of this unit then undergoes a maturation which eliminates the intervening sequences according to the splicing mechanism of introns. The second consequence of this recombination is that it allows, thanks to the multiple possible combinations, the synthesis of a large number of chains differing by the sequences of the V, D and J elements.

The combination of a light chain and a heavy chain can thus probably provide nearly 10^7 different binding sites for antigens. There is an additional factor of diversity which results from the lack of precision of the mechanism of juxtaposition of the V, D and J elements: the junction is established with a difference of a few nucleotides more or less and, furthermore, insertions of nucleotides catalyzed by a terminal nucleotidyl transferase can take place during the recombination.

These two phenomena explain why antibodies coded by identical elements are nevertheless able to present slightly different binding sites for antigens. Lastly, recent indications suggest that in V, D and J elements, spontaneous somatic mutations are more frequent than in the rest of the genome. The joint effect of combinatorial mechanisms, junction errors, insertions of nucleotides and somatic mutations probably takes the potential number of different antibodies to nearly one billion.
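The purely combinatorial part of this estimate (pairing of one rearranged light chain with one rearranged heavy chain) can be checked with a short calculation. This is only a sketch: the light-chain counts and the D and J counts of the heavy chain are those quoted in the text, but the number of heavy-chain V elements is not given there and is assumed here, for illustration, to equal the light-chain value.

```python
from math import prod

# Element counts quoted in the text for the kappa light chain.
LIGHT_V, LIGHT_J = 300, 4
# Heavy chain: the D and J counts come from the text; the V count is an
# ASSUMPTION (set equal to the light-chain value for illustration only).
HEAVY_V_ASSUMED, HEAVY_D, HEAVY_J = 300, 20, 4


def joined_variants(*element_counts: int) -> int:
    """Distinct joined segments when one element is drawn from each class."""
    return prod(element_counts)


light = joined_variants(LIGHT_V, LIGHT_J)                   # V-J joining
heavy = joined_variants(HEAVY_V_ASSUMED, HEAVY_D, HEAVY_J)  # V-D-J joining
sites = light * heavy                                       # one light + one heavy chain

print(f"light chains: {light}")
print(f"heavy chains: {heavy}")
print(f"light x heavy combinations: {sites:.1e}")
```

Under these assumptions the product is of the order of ten million; junctional imprecision, nucleotide insertions and somatic mutations are not modelled and account for the further increase described above.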

Another example of diversification of genetic expression from the genomic sequences is provided by the variability of the surface antigens of the trypanosome (a flagellate protozoan, blood parasite of various animals, responsible for sleeping sickness) during its development in the animal organism. This variability, which enables it to escape from the immune system of the host, is due to the successive expression of genes coding for related but appreciably different glycoproteins, which bind to the envelope of the trypanosome.

In the genome of this protozoan there are some hundred “basic versions” of this gene. These genes are generally not transcribed, but one of them can be duplicated, and the copy made (“expression copy”) is expressed if it is transferred by translocation to the end of a chromosome (telomere).

Such a mechanism is applied by the parasite about once in 15 days and the activation of a new copy leads, in most cases, to the destruction of the previous one. In this manner, the parasite modifies its surface antigens before the immune response of the host organism can neutralize it.

Besides these important modifications of the genome there are others which are more discrete, especially modifications of some nucleotide residues of the DNA. A cytosine can for instance be methylated on carbon 5 of the pyrimidine heterocycle (5-methyl-cytosine), if it is adjacent to a guanine (CpG) in the nucleotide sequence of the DNA of eucaryotic cells.

It seems, at least in vertebrates, that the proportion of methylated CpG in a gene or in its immediate neighbourhood reflects the state of activity of this gene: a given gene is generally under-methylated when it is actively transcribed and over-methylated when it is not expressed.
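The notion of "proportion of methylated CpG" can be made concrete with a small sketch. The function names, the example sequence and the set of methylated positions are all hypothetical; in practice the methylation state of each cytosine has to be determined experimentally.

```python
from __future__ import annotations


def cpg_positions(dna: str) -> list[int]:
    """Return 0-based indices of every C immediately followed by G."""
    seq = dna.upper()
    return [i for i in range(len(seq) - 1)
            if seq[i] == "C" and seq[i + 1] == "G"]


def methylated_fraction(dna: str, methylated: set[int]) -> float:
    """Fraction of CpG cytosines whose index is listed in `methylated`."""
    sites = cpg_positions(dna)
    if not sites:
        return 0.0  # no CpG at all: nothing can be methylated
    return sum(1 for i in sites if i in methylated) / len(sites)


# Made-up sequence with three CpG sites, two of them marked as methylated.
example = "ACGTCGGCGA"
print(cpg_positions(example))
print(methylated_fraction(example, {1, 7}))
```

A low fraction for a gene region would correspond to the under-methylated state associated above with active transcription.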

Changes in the rate of methylation seem to involve the replication of the DNA, but the mechanisms operating in this type of control are not fully understood. It is possible, for example, that methylation exerts its effect by modifying the interaction of some regulatory protein factors with their binding sites on the DNA.

Lastly, it must be emphasized that outside the vertebrates, the relation between genetic expression and degree of methylation of the DNA is much less clear: for instance, in insects cytosine is practically never methylated, while in plants almost all CpG remain constantly modified.

It is however possible that in these cases only a limited number of strategic sites undergo a controlled methylation, playing the same role as in vertebrates but having so far escaped detection because the methods are not sufficiently sensitive.

Finally, let us not forget that the DNA of eucaryotic cells is arranged in a nucleosomal structure. Studies on the digestion of chromatin by some nucleases suggest that the genes being transcribed are generally more sensitive to the action of these nucleases than the genes which are not expressed.

Moreover, active genes possess a hypersensitive region situated immediately upstream of the initiation site and probably free from nucleosomes. It is likely that such domains constitute sites accessible to the regulatory proteins in particular, and to the transcription machinery in general.

b) Specialization of RNA Polymerases:

While in procaryotes only one enzyme catalyzes the transcription of all kinds of RNAs (mRNAs, rRNAs and tRNAs), three different classes of RNA polymerases share this task in eucaryotes.

c) Particular Nucleotide Sequences (or Particular Signals):

Conventional genetics has been particularly useful for the study of the expression of procaryotic genes but, owing to the nature of the material, it was not widely applied to eucaryotic systems. It is only with the development of the technology of gene cloning that rapid progress could be made. With these techniques it became possible to isolate a large number of eucaryotic genes of varied functions and origins and to study their structure.

The comparison of their nucleotide sequences revealed the existence of structural motifs common to most of them and the study of the role of these highly conserved sequences was undertaken by “in vitro genetics” experiments.

Purified genes were thus modified in vitro in these regions (deletions or point modifications) and the effect of these “mutations” on the control of the expression of these genes was tested either in in vitro transcription systems, or after reintroduction into live cells (microinjection into the nucleus of cells in culture, or transfer by endocytosis of a calcium phosphate-DNA precipitate deposited on the cells). The level of specific transcription (i.e. starting at the correct site on the DNA) is then determined by S1 mapping (see fig. 8-2).

These studies have shown that the sequences involved in the control of transcription in eucaryotes differ notably from those of procaryotes. Thus, the promoter of eucaryotic genes transcribed by RNA polymerase B and coding for proteins seems to cover a DNA region much larger than in procaryotes, comprising the transcription initiation site, and extending to more than 100 base pairs upstream.

One finds a TATA sequence similar to that of procaryotic promoters but localized at about 30 base pairs from the initiation site (instead of 10 in procaryotes). This TATA box, present in the large majority of eucaryotic genes coding for proteins, seems to be involved in the selection of the site of transcription initiation.

Other elements are required for an effective transcription by RNA polymerase B (see fig. 8-11). They are localized upstream of the TATA box, generally between 40 and 100 base pairs upstream of the initiation site. Their sequence is less conserved but it is often richer in G and C. The elements present in the 100 base pairs which precede the sequences coding for the RNA thus constitute the structure of the “minimum promoter” and provide the base (or constitutive) level of transcription by RNA polymerase B.

While the relative positions of the elements of the constitutive promoter are not very flexible, another type of promoter element, called enhancer, stimulates the transcription of genes (between 50 and more than 1 000 times) independently of its orientation and position with respect to the initiation site, whether it is situated upstream of the latter, in an intron, or even downstream of the gene. This activity, bidirectional and relatively independent of distance, is characteristic of this type of promoter element.

Originally identified in viral genomes, these enhancers have since been found in numerous cellular genes. Of variable length, they are generally formed by an association of distinct, often repeated motifs (enhansons).

Since each of these motifs is the binding site of a particular protein factor, it appears that the enhancer effect results from the combined (often synergistic) action of these factors on the initiation complex formed by the RNA polymerase and the general factors of transcription.

To permit the interaction between proteins that are sometimes separated by considerable distances on the genome, looping of the DNA has been postulated. It is, however, the promoter nearest the enhancer which appears to benefit most from its effect.

Thus, in the genes of immunoglobulins, the enhancer is localized between the elements J and C of the genes coding for the κ light chains and the heavy chains. This particular position of the enhancer, downstream of the initiation site, explains the fact that only the promoter present in the nearest V element, fused with the J or DJ elements during the differentiation of B lymphocytes, is activated.

The promoter elements of the non-rearranged V elements, situated very far upstream, are probably outside the field of the enhancer and remain inactive. Most of these enhancer elements have, in addition, the interesting characteristic of being active only in a particular tissue, thus conferring a tissue-specific expression on the genes to which they are linked. The enhancer of immunoglobulin genes is thus functional only in lymphoid cells.

It is therefore probable that the enhancers are required for the intensive expression of genes coding for a major product of cells having reached (or passing through) a particular stage of differentiation. The presence in a tissue of the adequate population of factors recognizing all the enhansons constituting an enhancer, and the absence or inactivation of some of these factors in other tissues, may explain the cellular specificity of enhancers.

An example of more flexible regulation is provided by the hormonal control of gene expression. The steroid hormones (glucocorticoids, estrogens, progesterone and testosterone in vertebrates and ecdysteroids in insects) can thus selectively induce or stimulate the transcription of some genes in particular tissues.

These hormones penetrate into the target cells and bind to protein receptors localized in the cytoplasm (receptors of glucocorticoids) or in the nucleus (e.g., receptors of estrogens and progesterone).

Binding of the hormone induces the dimerization of the hormone-receptor complex, the activation of the inducer domain (possibly by dissociation of an inhibitor protein bound to the inactive receptor) and the translocation of the hormone-receptor complex into the nucleus, where it combines with particular sequences on the DNA.

These binding sites, generally large in number, are localized at variable distances upstream of the TATA box (or sometimes downstream, in an intron) of controlled genes. The properties of these sites were thus found comparable with those of the enhancer elements described above, and it has been proposed that they are in fact conditional enhancers functioning only in the presence of the hormone-receptor complex.

DNA sequences having properties opposite to those of enhancers were discovered and called silencers. Such elements, which bind particular protein factors and inhibit transcription independently of their orientation and position, were identified near a large number of genes (genes of interferons α and β, gene of ovalbumin…) and sometimes juxtaposed to enhancers (enhancer of the gene of heavy chains of immunoglobulins). There is no doubt that the simultaneous presence of silencers and enhancers must contribute to a finer control of tissue-specific expression.

The salivary glands of the Drosophila larva represent a choice system for the study of the mechanisms of hormonal induction because the stimulation of transcriptional activity of the chromosome can be visualized directly. A polytenization of the DNA (cycle of 9 to 10 successive replications without separation of chromatids and without division of the nucleus) has taken place in these cells leading to the formation of giant chromosomes where the actively transcribed zones appear like swellings called “puffs”.

Thus, in the 30 minutes following a treatment with ecdysteroids, one can observe the formation of half a dozen specific puffs whose induction is independent of protein synthesis. On the contrary other puffs, appearing later after this treatment, seem to depend on neo-synthesized proteins. The study of the mechanism of action of these proteins will reveal whether, as may be imagined, they are initiation factors of transcription presenting new specificities (of the bacterial σ factors type for example).

As for the genes transcribed by RNA polymerase A (genes coding for the ribosomal RNAs, or rDNAs), the promoter sequences extend approximately from -150 to +10 (see fig. 8-11). The sequences which include the initiation site (+1) up to the position -40 however seem to play a dominant role in the selectivity of the transcription. No obvious consensus sequence could be found from the comparison of genes of different species.

In the case of the ribosomal transcription units of Xenopus laevis, sequences possessing properties of enhancer elements were identified in the spacers (about 4 kb) which separate the copies of rDNA.

The structure of these enhancers is however original in that they correspond to more or less perfect repetitions of a sequence of 42 base pairs present in the promoter region, between the positions -114 and -72. It is therefore likely that these repeated elements serve to attract the protein factors which will then bind to the promoter and activate the transcription.

The situation is again different for the genes transcribed by RNA polymerase C where the specificity of initiation is controlled by nucleotide sequences situated inside the coding region (see fig. 8-11). In the case of genes coding for the ribosomal 5S RNA (which has about 120 nucleotides), the responsible sequences are situated between positions +50 and +80.

In the case of tRNA genes (which have about 80 nucleotides), called tDNAs, the essential elements are constituted by two distinct regions, generally between +10 and +30 (block A) and between +50 and +70 (block B), separated by a central element which is not essential for the promoter activity and frequently comprises one or several introns.

These two promoter elements each contain a different sequence of about 10 base pairs found in the genes of all the tRNAs as well as in those of the 5S rRNA. In the latter however, only the second seems to be involved in the promoter. A series of recent observations indicates that some genes transcribed by RNA polymerase C (e.g., the gene coding for U6 RNA) possess control elements situated upstream of the transcription initiation site, and especially a sequence of TATA type.

In some conditions, these elements outside the transcribed sequences are sufficient to bring about an efficient transcription by RNA polymerase C and can even be recognized by RNA polymerase B. These observations which suggest that different polymerases can utilize common transcription factors, must be viewed together with the fact that some polypeptide sub-units which constitute the RNA polymerases are present in the 3 classes of enzymes.

Reconstitution studies of in vitro specific transcription systems from cell extracts or partially purified fractions showed that the promoter elements defined above are binding sites of particular transcription factors required in addition to the RNA polymerase. These specific nucleoprotein interactions were revealed by “footprinting” experiments (see fig. 8-3).

These experiments revealed the existence of multiple factors, specific for different promoter elements, whose combined action seems to be required for optimal promoter activity. The purification of an increasing number of these factors from eucaryotic cells and the cloning of the corresponding genes enabled the determination of some of their structural and functional characteristics.

Concerning the domain involved in the binding of factors to the DNA, the existence of 3 types of structural motifs (see fig. 8-12) could be deduced from the study of the amino acid sequence of these factors. The helix-turn-helix motif (similar to the one identified by crystallographic study in some procaryotic regulatory factors like cI and CRP) was thus found in the homeodomain of most proteins which are involved in the control of the early steps of development.

The zinc finger motif corresponds to a folding back of the peptide chain into a loop of 12 to 14 amino acids, stabilized by coordination bonds between a Zn atom and pairs of cysteine and/or histidine residues. This type of structure was initially discovered in a transcription factor of RNA polymerase C, the factor TFIIIA (which binds to the intragenic promoter of sequences coding for 5S RNA), where this motif is repeated 9 times consecutively.

Similar structures have since been identified in regulatory proteins of polymerase B, like factor Sp1 (3 fingers) binding to the motif GGGCGG (see fig. 8-11), or receptors of steroid hormones (2 fingers). A third type of motif was found in molecules which bind efficiently to the DNA only in the form of dimers. This motif, generally situated near the C-terminal end of the protein, corresponds to a portion of the polypeptide chain in α-helix, comprising 4 or 5 leucines, each separated from the next by 6 amino acids.

Thanks to this periodicity (1 leucine every 7 amino acids), the hydrophobic side chains of these leucines are all localized on the same face of the α-helix. Two molecules having this structure can thus combine together at these helices by interdigitation of the side chains of leucines, the interaction being possibly stabilized by hydrophobic bonds between the leucines themselves, or between these residues and other hydrophobic amino acids positioned between the leucines.

This structure called “leucine zipper”, is believed to be involved in the binding to the DNA, in an indirect manner, by enabling the coupling of proteins having only a low affinity for the DNA separately.

Leucine zippers were found in transcription factors binding to the DNA in the form of homodimers (like factor C/EBP which binds to the consensus sequences GTGGAATT AG, see fig. 8-11, or CAAT present in some promoter elements) or heterodimers (like the proteins coded by the proto-oncogenes c-jun and c-fos, which assemble together before binding to their recognition site on the DNA).

Let us mention lastly, that leucine zipper structures were also found in proteins comprising helix-turn-helix or zinc finger motifs. This observation suggests that leucine zippers have a more general role in the field of protein-protein interactions.
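The periodicity that defines a leucine zipper (one leucine every 7 residues, repeated 4 or 5 times) lends itself to a simple sequence scan. The sketch below is only illustrative: the function name is hypothetical and the test peptide is invented, not taken from a real protein; a genuine prediction would also have to check that the segment is likely to form an α-helix.

```python
from __future__ import annotations


def find_leucine_heptads(protein: str, min_leucines: int = 4) -> list[int]:
    """Return 0-based start positions of runs of leucines spaced 7 apart.

    A position is reported when it holds a leucine (L), the residue 7
    positions before it is not a leucine (so the run begins here), and at
    least `min_leucines` leucines are found at i, i+7, i+14, ...
    """
    seq = protein.upper()
    starts = []
    for i, residue in enumerate(seq):
        if residue != "L":
            continue
        if i >= 7 and seq[i - 7] == "L":
            continue  # interior of a run already counted from its start
        run, j = 0, i
        while j < len(seq) and seq[j] == "L":
            run += 1
            j += 7
        if run >= min_leucines:
            starts.append(i)
    return starts


# Invented peptide: two leading residues, then four heptads starting with L.
peptide = "MK" + "LAEKQRS" * 4
print(find_leucine_heptads(peptide))
```

With the default of 4 leucines the scan follows the lower bound given in the text; raising `min_leucines` to 5 makes it stricter.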

Promoters of the Genes Transcribed by the RNA Polymerases A, B and C

While the DNA binding domain of these transcription factors is responsible (thanks to the structural motifs described above and their immediate environment) for the high specificity of protein-DNA interactions, a different domain is involved in transcriptional activation. It appears that these activating regions often consist of α-helices of amphipathic character, i.e., presenting negatively charged groups on one face of the helix and hydrophobic groups on the other.

Experiments in which chimeric factors were constructed by genetic recombination in vitro confirmed that the DNA binding domains were responsible for the specificity for a particular promoter element, but showed that the activating regions were generally interchangeable (between factors of diverse functions and origins).

These observations emphasize not only the relative independence of the two domains, but also the high degree of conservation of the transactivation mechanism, from yeast to the human cell. At the molecular level, this mechanism is not yet known and its deciphering will necessarily involve the comparative study of the tertiary structure of these specific nucleoprotein complexes.

It however appears that at least some of the interactions of these factors have repercussions on the secondary structure of the DNA at the level of promoter sequences leading to the induction or repression of transcription.

The transmission of topological modifications of the DNA (e.g., transition from the B form, a right-handed double helix, to the Z form, a left-handed double helix) along the molecule (e.g., by the appearance of superhelical turns or denatured zones) is indeed facilitated by the organization in loops of vast genetic domains (more than 100 kb) whose ends are tied to the chromosomal skeleton.

Thus, schematically, a local melting of the DNA would be perceived as a signal of induction, because the separation of strands facilitates the start of the transcription. Conversely, a strengthening of the B structure of the DNA at the initiation site would correspond to a “repression” of the transcription.

Three Types of Structural Motifs

In this perspective, it is quite remarkable that genes whose products are involved in the same function are very frequently grouped together, forming real specialized genetic domains. This is especially the case with ribosomal genes, genes of histones, globins or interferons.

This organization in battery is clearly favourable to a coordinated regulation of the genes concerned. Other families of genes can be controlled simultaneously without being necessarily clustered. In this connection one may cite genes under hormonal control, heat-shock genes, genes induced by light (whose products are involved in photosynthesis systems), genes coding for at least a part of the translation system (proteins constituting ribosomes, elongation factors, sub-units of the RNA polymerase A).

In these examples a promoter sequence specific to each family is recognized by diffusible factors (activated by binding of hormones, by thermal or photonic excitation, or by other stimuli) controlling the level of expression of the corresponding genes.

Lastly, one can mention the particular case of a series of genes (homeotic genes) controlling morphogenesis in Drosophila. Little is known yet about the functioning of these genes, each of which is expressed at a precise stage of embryonic development. It was however found that they present a strong homology in a sequence of 180 base pairs (homeobox), situated in one of their exons.

The homology is even more striking when one compares the sequences of the 60 amino acids coded by these regions. The basic character of this peptide sequence (homeo-region) is remarkable because at least 1 amino acid out of 4 is either an arginine or a lysine. This observation suggests that the homeotic genes code for protein factors which would bind to well defined sequences on the DNA through basic regions corresponding to the homeobox.

These factors would thus control the activity of other genes involved in the embryogenesis. The fact that the homeobox has been conserved in the course of evolution (since it was found in several higher organisms including man) underlines the importance of its function.
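The two numerical statements made about the homeobox (180 base pairs coding for 60 amino acids, and at least one basic residue out of four) can be expressed as a short check. The function names are hypothetical and the peptide used in the example is invented for illustration; it is not a real homeo-region sequence.

```python
HOMEOBOX_BP = 180  # length of the homeobox quoted in the text


def encoded_residues(coding_bp: int) -> int:
    """Amino acids encoded by an in-frame stretch of `coding_bp` base pairs."""
    if coding_bp % 3:
        raise ValueError("length is not a whole number of codons")
    return coding_bp // 3


def basic_fraction(peptide: str) -> float:
    """Fraction of residues that are arginine (R) or lysine (K)."""
    seq = peptide.upper()
    if not seq:
        raise ValueError("empty peptide")
    return sum(seq.count(aa) for aa in "RK") / len(seq)


print(encoded_residues(HOMEOBOX_BP))

# Invented 8-residue peptide, used only to exercise the function.
toy_peptide = "RKQTAELR"
print(basic_fraction(toy_peptide) >= 0.25)
```

Applied to a real 60-residue homeo-region, `basic_fraction` would be expected to return at least 0.25 according to the observation reported above.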

In addition to numerous factors which, in association with precise sites on the DNA, ensure the selectivity of the transcription initiation and control its efficiency, there are also other factors which modulate the rate of elongation of RNA chains.

Little is known at present about the processes of termination of transcription in eucaryotes. While the RNA polymerase C finds, at the 3′ end of all the genes it transcribes, a precise termination signal comparable with that of the ρ-independent procaryotic genes (a short palindromic sequence, followed by a series of T on the non-coding strand), no typical termination sequence could be identified for the genes transcribed by the polymerases A and B.

It however appears that the transcription of ribosomal genes generally stops upstream of 3 consecutive thymidines. The primary transcripts coding for proteins often possess heterogeneous 3′ ends, due to terminations at multiple sites occurring in a DNA zone whose size varies with the genes but which always contains sequences rich in AT.

Post-Transcriptional Control of Gene Expression:

The physical separation of the genetic material (nucleus) and the machinery responsible for its translation (cytoplasm) involves, in eucaryotes, mechanisms of maturation and transport of the transcripts.

Such mechanisms do not exist in procaryotes, where transcription and translation are coupled. In the latter, the post-transcriptional regulation will therefore operate mostly at the translation level, whereas in eucaryotes it can also take place during the steps of maturation and transport of RNAs to the cytoplasm.

A. Maturation and Transport of mRNAs:

In procaryotes, the preferential destruction of the distal parts of messengers makes it possible, in some cases, to uncouple the expression of genes belonging to the same multicistronic operon.

In eucaryotes, each step of the maturation of mRNAs (fig. 6-41) is a potential site of post-transcriptional regulation of gene expression. A particularly striking example of this mode of control is alternative splicing. In many cases a given pre-mRNA can lead to two or more different mature mRNAs, each corresponding to the combination of a different series of exons.

Important Steps of the Expression of Genes Coding for Proteins in Eucaryotes

This phenomenon is mostly observed in viruses, where it permits maximal exploitation of the coding capacity of the DNA by the combined use of the three reading frames. But there are also examples of alternative splicing in animals where, in this manner, the same gene can code for different proteins according to the tissue in which it is expressed.
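The way one pre-mRNA can yield several mature mRNAs can be sketched as a choice of which exons to join. The example below is a minimal illustration under stated assumptions: the exon names, their short sequences, the two "tissues" and the helper `splice` are all hypothetical, and the real splicing machinery (splice sites, regulatory proteins) is not modeled.

```python
def splice(exons: dict, kept: list) -> str:
    """Join the selected exons, in the order given, into a mature mRNA."""
    return "".join(exons[name] for name in kept)


# Hypothetical pre-mRNA reduced to its four exons (introns already left out).
exons = {"E1": "AUGGCU", "E2": "GAACUG", "E3": "CCAUUU", "E4": "GGAUAA"}

# Two hypothetical tissue-specific splicing patterns of the same transcript.
patterns = {
    "tissue_A": ["E1", "E2", "E4"],  # exon E3 skipped
    "tissue_B": ["E1", "E3", "E4"],  # exon E2 skipped
}

mature = {tissue: splice(exons, kept) for tissue, kept in patterns.items()}
for tissue, mrna in mature.items():
    print(tissue, mrna)
```

Both messengers share the first and last exons but differ in their internal exon, so the same gene would give two different proteins.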

Lastly, another way of modifying the expression of a gene is by controlling the stability of its messengers. The binding of a protein (“poly A-binding protein” or PABP) to the poly A sequence at the 3′ end of mRNAs increases the life span of these mRNAs, probably by protecting them against exonucleolytic degradation from this end. Conversely, the presence of an AU-rich zone in the non-translated 3′ region of some mRNAs seems to cause their instability by preventing the binding of the PABP.

It has also been shown that some hormones can affect the stability of particular mRNAs: for example, prolactin prolongs the life of casein mRNAs in mammary glands.
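The quantitative effect of messenger stability can be illustrated with a simple model. The sketch below assumes constant synthesis and first-order decay of the mRNA, a common simplification that is not stated in the text; the function name and the numerical values are hypothetical.

```python
import math


def steady_state_level(synthesis_rate: float, half_life: float) -> float:
    """Steady-state amount of an mRNA, ASSUMING first-order decay.

    synthesis_rate: molecules produced per unit time (hypothetical value)
    half_life: mRNA half-life, in the same time unit
    """
    if half_life <= 0:
        raise ValueError("half_life must be positive")
    decay_constant = math.log(2) / half_life
    # At steady state, synthesis equals degradation: rate = k * level.
    return synthesis_rate / decay_constant


# Hypothetical figures: same transcription rate, half-life doubled by stabilization.
unstable = steady_state_level(synthesis_rate=10.0, half_life=30.0)
stabilized = steady_state_level(synthesis_rate=10.0, half_life=60.0)
print(f"ratio stabilized/unstable = {stabilized / unstable:.1f}")
```

Under this assumption, doubling the half-life of a messenger doubles its steady-state amount without any change in transcription.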

B. Translation:

Some examples illustrating the various possibilities of regulation at the translational level are given below:

a) Modification of the Activity of Translation Initiation Factors:

It has been shown that the phosphorylation of one of the eucaryotic initiation factors (eIF-2) inhibits the initiation step and therefore arrests protein synthesis in the cell concerned. Such a phosphorylation has been observed in cells treated with interferon (resulting in an arrest of translation of viral RNAs) as well as in cells synthesizing hemoglobin (where the action of hemin permits a regulation of the synthesis of globin).

b) Negative Self-Control of Translation:

Some ribosomal proteins of E. coli seem capable, when present in excess in the cell, of binding to the corresponding polycistronic mRNA at a site located near (upstream of) the translation initiation site of the first cistron, and thus of blocking their own production by preventing the binding of the ribosome. A control of the same type was recently proposed for the translation of threonyl-tRNA synthetase.

In some cases, the recognition site on the mRNA has some homology (at the level of the primary or secondary structure) with the region recognized by the protein on the RNA with which it normally interacts, for instance the rRNA in the case of ribosomal proteins, or tRNAThr in the case of threonyl-tRNA synthetase.

c) Role of the Secondary Structure of the RNA:

The best documented example is the expression (in vivo and in vitro) of RNA-containing phages (type Qβ and R17). The RNA of these phages, which acts as messenger, codes for 4 proteins whose expression varies in quantity and in time. This control disappears after treatment of the RNA with denaturing agents.

It appears that this regulation involves the accessibility of the AUG initiation codons, which can be modified by the intervention of proteins having a “regulatory effect”, as well as by the translation of the message itself.

In fact, the secondary structure of mRNAs can create a specific binding site for proteins; the formation of such complexes changes the efficiency of translation and/or the stability of the corresponding mRNA (e.g., regulation of the expression of ribosomal proteins in E. coli).

A very particular mode of modification of gene expression was observed recently: some regions of the DNA can be transcribed in both directions. Besides the “normal” messenger (used for the synthesis of proteins), all or part of a complementary RNA, called “anti-sense RNA”, is synthesized, and its presence perturbs the expression of the “normal” messenger.

The sequence complementarity of the two RNAs (messenger and anti-sense) leads to the formation of hybrids in the cytoplasm of the cell. In such hybrid molecules, the mRNA can no longer adopt its own secondary structure. Moreover, some sequences of the mRNA are no longer accessible to ribosomes, for example the Shine-Dalgarno sequence (indispensable for translation initiation in procaryotes).

It has been shown that the injection of synthetic anti-sense RNA into cultured cells leads to an appreciable decrease in the translation of the corresponding mRNA. Active research is now in progress with a view to the therapeutic or biotechnological use of this method; it is hoped in this way to block the expression of pathogenic viral genomes for which it is difficult to prepare a vaccine (e.g., AIDS virus).
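The masking of a ribosome-binding sequence by an anti-sense RNA can be sketched as follows. This is an idealized illustration, not a model of real hybridization: it assumes perfect, full-length base pairing, and the mRNA sequence and helper names are hypothetical. The motif AGGAGG is used as the Shine-Dalgarno consensus.

```python
COMPLEMENT = str.maketrans("AUGC", "UACG")


def antisense(rna: str) -> str:
    """Return the reverse complement of an RNA sequence, written 5' to 3'."""
    return rna.upper().translate(COMPLEMENT)[::-1]


def paired_region(mrna: str, anti: str):
    """Return (start, end) of the mRNA stretch paired with the anti-sense RNA,
    or None if no stretch is perfectly complementary to it."""
    target = antisense(anti)  # stretch of mRNA that would pair with `anti`
    start = mrna.find(target)
    return None if start == -1 else (start, start + len(target))


def is_masked(mrna: str, anti: str, motif: str = "AGGAGG") -> bool:
    """True if the first occurrence of `motif` lies entirely inside the hybrid."""
    region = paired_region(mrna, anti)
    site = mrna.find(motif)
    if region is None or site == -1:
        return False
    return region[0] <= site and site + len(motif) <= region[1]


# Hypothetical procaryotic messenger: Shine-Dalgarno motif upstream of the AUG.
mrna = "GGCUAGGAGGUUAACAUGGCUAAA"
anti_sd = antisense(mrna[2:13])     # anti-sense RNA covering the motif
anti_orf = antisense(mrna[15:24])   # anti-sense RNA covering only coding sequence

print(is_masked(mrna, anti_sd), is_masked(mrna, anti_orf))
```

Only the first anti-sense RNA covers the Shine-Dalgarno motif; in this simplified picture the second one pairs with the messenger but leaves the ribosome-binding sequence free.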

d) Role of tRNAs and Aminoacyl-tRNA Synthetases:

Some differentiation phenomena lead to the synthesis of very large quantities of specific proteins. This is the case in the silk gland of the silkworm, which synthesizes fibroin, a protein very rich in certain amino acids (glycine, alanine and serine alone represent 90% of the amino acids). Protein synthesis may therefore be limited by the concentrations of the tRNAs charged with these amino acids.

It was shown that in these differentiated cells, the synthesis of the aminoacyl-tRNA synthetases and tRNAs specific for these amino acids (and only these) is increased, thus fulfilling this requirement. This model can be compared with the one suggested to explain the synthesis, from the genome of some phages, of specific tRNAs which permit the translation of phage messengers particularly rich in the codons recognized by these tRNAs.

e) Cleavage of a Polyprotein Precursor:

In procaryotes, the genes of the same operon are transcribed into a single mRNA, called “polycistronic”. Translation starts independently at the signals preceding each cistron of this mRNA. In eucaryotes there is only one translation initiation site per molecule of mRNA, but several proteins can be cleaved from a single precursor polypeptide. The maturation of these “polyprotein” precursors represents another possible control point of the expression of eucaryotic genes.

Importance of the Coordinated Expression of Genes:

An Example of Perturbation: Oncogenesis:

The control mechanisms of gene expression not only maintain the “housekeeping” functions (metabolism, growth and cellular division) taking place in all living cells, but also participate in the processes of cellular differentiation and specialization implied by the ontogenic development (from the egg to the adult) of multicellular organisms.

This closely coordinated set of regulations, in which each gene or group of genes is expressed in a particular tissue, at a particular time and at a precise rate, thus adapts the functioning and multiplication of the cells of each organ to the needs of the entire organism.

However, some cells occasionally acquire new properties that free them from the constraints imposed on normal cells. Such cells, called “transformed” cells, are generally characterized by active and unlimited multiplication which no longer requires attachment to a support.

Within an organism, the appearance of transformed cells whose development escapes all control results in the formation of cancerous tumors (solid or liquid, depending on whether the cells have a structural role or are circulating cells).

The molecular bases of carcinogenesis (or oncogenesis) are being actively investigated. They seem to be related to the perturbation of some cellular genes whose products normally participate in intra- or intercellular communications. These genes (called proto-oncogenes in normal cells) can code, for example, for transcription factors, growth factors or membrane receptors of growth factors.

The activation of these receptors, due to the binding of the corresponding growth factors, produces within the cell a cascade of events (phosphorylation of particular proteins, increase of pH and of concentration of Na+ and Ca++) whose effects contribute to the stimulation of replication and transcription of the genome (mitogenic action). Moreover, the products of other proto-oncogenes can modulate the intermediate reactions leading to these phenomena.

In this context, one can easily imagine that a modification of the activity of these proto-oncogenes can upset the functions controlling, in particular, the cell cycle and lead to malignant transformation.

Thus, point mutations (caused by carcinogenic chemical substances or by ionizing radiation) can alter the coding sequences of proto-oncogenes (which are then called “c-onc”, for cellular oncogenes) and can thus modify the properties of their expression products.

The consequences of these modifications can be spectacular if they inactivate the corresponding protein or prevent the regulation of its activity. Similar consequences can result from an increase in the rate of expression of a c-onc either by stabilization of the corresponding mRNA, or by activation of its transcription (after translocation of the gene near a transcriptional activator or enhancer), or by gene amplification (i.e. multiplication of the number of copies of the gene).

Lastly, some oncogenes are activated by their integration in the genome of a retrovirus. These viruses, whose genome is an RNA molecule but whose replication involves a “retrotranscription” of this RNA into DNA and possibly an insertion in the cellular chromosomes, may have incorporated a proto-oncogene of the host cell in the course of their history. The cellular gene, generally mutated and placed under the control of the viral promoter, can then behave like a viral oncogene (v-onc) when the virus infects other cells.

It therefore appears that cancerous transformation can result from qualitative or quantitative alterations of different proto-oncogenes or of varied combinations of proto-oncogenes. These multiple possibilities, together with the tissue specificity of the expression of these genes, explain the great variety of cancers observed.