Determination of Nucleotide Sequences

The determination of the base sequence of a nucleic acid is much more difficult than the determination of the sequence of amino acids of a protein, because there are only 4 types of monomers instead of 20.

This explains why sequencing began with the transfer ribonucleic acids (more than 200 sequences are known). These offer two great advantages: they are relatively short and they contain, in addition to the 4 normal bases, a few rare bases which act as reference points and considerably facilitate the determination of their primary structure. In the meantime, the structure of other types of ribonucleic acids (ribosomal ribonucleic acids, viral ribonucleic acids) has been determined or is under study, and the nucleotide sequences of deoxyribonucleic acids are now also being studied.

In practice, to determine the base sequence of an RNA or DNA, several selective hydrolyses are carried out and the fragments obtained are analyzed. We will now briefly review the principal modes of hydrolysis of ribonucleic acids and deoxyribonucleic acids.

Contents

Hydrolysis of Ribonucleic Acids and Deoxyribonucleic Acids:
Determination of the Sequence of an Oligoribonucleotide:
Determination of the Sequence of Deoxyribonucleic Acids:

Hydrolysis of Ribonucleic Acids and Deoxyribonucleic Acids:

A. Alkaline Hydrolysis:

Figure 6-9 illustrates the mechanism of this hydrolysis, which proceeds through the intermediate formation of a 2′,3′-cyclic phosphate. This is why the rupture of all the phosphodiester bonds of the RNA molecule (by scission between the phosphate group and carbon 5′ of the adjacent ribose) yields not only nucleoside 3′-phosphates but also nucleoside 2′-phosphates, although the hydroxyl of carbon 2′ is not involved in the internucleotide linkages of the intact RNA chain.

Deoxyribonucleic acids are resistant to alkaline hydrolysis, because there is no hydroxyl group at position 2′ to permit the reaction mechanism illustrated by figure 6-9.

B. Hydrolysis by the Pancreatic Ribonuclease (RNase):

This enzyme is an endonuclease, which splits phosphodiester bonds within RNA chains. It acts by the same mechanism as the alkalis (which explains why deoxyribonucleic acids are resistant), but only when the nucleotide on the 3′ side of the bond, i.e. the one attached to the phosphate through its carbon 3′, is a pyrimidine nucleotide, as shown in figure 6-10. It liberates pyrimidine nucleoside 3′-phosphates and oligonucleotides ending in a pyrimidine nucleoside 3′-phosphate.

C. Hydrolysis by the Ribonuclease T₁:

This RNase has a specificity different from that of the pancreatic enzyme: it is an endonuclease which splits the phosphodiester bonds of ribonucleic acids when the nucleotide on the 3′ side (the one attached to the phosphate through its carbon 3′) is a guanylic nucleotide or an inosinic nucleotide, a rare nucleotide found in the transfer ribonucleic acids, as shown by figure 6-11.

D. Hydrolysis by the Ribonuclease U₂:

It is an endonuclease which splits the phosphodiester bonds of ribonucleic acids when an adenylic or guanylic nucleotide is present on the 3′ side. Used on fragments resulting from hydrolysis by RNase T₁, it splits specifically at the adenylic nucleotides. In addition, conditions have been found under which the enzyme splits ribonucleic acids preferentially at the adenylic nucleotides.

E. Hydrolysis by the Snake Venom Phosphodiesterase:

Contrary to the enzymes seen so far, this enzyme splits the phosphodiester bonds between the phosphate group and the carbon 3′ of the adjacent ribose, thus liberating nucleoside 5′-monophosphates (see fig. 6-12).

Another difference: this is an exonuclease, which attacks oligo- and polynucleotides progressively from their 3′ end, detaching one nucleoside 5′-monophosphate at a time. It attacks deoxyribonucleic acids as well as ribonucleic acids.

F. Hydrolysis by the Spleen Phosphodiesterase:

This is also an exonuclease, but it acts in the direction opposite to the previous one. Attacking ribo- and deoxyribopolynucleotides at their 5′ end, it sequentially liberates nucleoside 3′-monophosphates.
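The opposite directions of the two exonucleases can be made concrete with a small sketch. The function names are hypothetical, the chain is written 5′→3′ as a string of bases, and it is assumed to carry no terminal phosphate; `pN` denotes a nucleoside 5′-monophosphate and `Np` a nucleoside 3′-monophosphate.

```python
def snake_venom_steps(chain):
    """Yield (released product, remaining chain), working in from the 3' end."""
    while len(chain) > 1:
        yield "p" + chain[-1], chain[:-1]   # nucleoside 5'-monophosphate, e.g. pU
        chain = chain[:-1]

def spleen_steps(chain):
    """Yield (released product, remaining chain), working in from the 5' end."""
    while len(chain) > 1:
        yield chain[0] + "p", chain[1:]     # nucleoside 3'-monophosphate, e.g. Gp
        chain = chain[1:]

venom = list(snake_venom_steps("GAU"))   # [('pU', 'GA'), ('pA', 'G')]
spleen = list(spleen_steps("GAU"))       # [('Gp', 'AU'), ('Ap', 'U')]
```

In this model the last residue left over is a free nucleoside: the 5′-terminal one with the snake venom enzyme, the 3′-terminal one with the spleen enzyme.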

G. Action of Alkaline Phosphatase of E. coli:

The alkaline phosphatase of E. coli and the other phosphatases are, contrary to the enzymes studied so far (which were all phosphodiesterases), phosphomonoesterases, which can detach a phosphate group only at the end of a chain, as shown by figure 6-13.

H. Action of Deoxyribonucleases (“DNases”):

The first DNases known were less specific than the RNases.

Two endonucleases were particularly studied:

1. The pancreatic DNase (I), which generates oligodeoxyribonucleotide 5′-monophosphates;

2. The acid DNase (II), which yields oligodeoxyribonucleotide 3′-monophosphates.

More recently, however, restriction endonucleases were discovered. They are present in diverse bacterial species, where they degrade any foreign DNA that has penetrated the cell (while the cell's own DNA is protected by methylation of a base on each strand at the cleavage site).

These restriction enzymes cleave double-stranded DNA molecules at very specific nucleotide sequences, which are characterized by a symmetry and are present in very limited numbers in DNA molecules.

As examples, the sequences recognized by three commonly used restriction endonucleases, extracted respectively from E. coli (EcoRI), Haemophilus aegyptius (HaeIII) and Providencia stuartii (PstI), are given below, together with the places where the cleavages take place (indicated by arrows):

It is interesting to note that cleavage by a restriction endonuclease like EcoRI or PstI produces fragments possessing cohesive ends (i.e. single-stranded ends having complementary sequences). This allows the joining of two DNA fragments of different origins which were cleaved by the same restriction enzyme and therefore possess the same cohesive ends. This property is used in experiments of genetic recombination in vitro, particularly to introduce deoxyribonucleic acids of diverse origins into phages or plasmids.
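The link between the position of the cut in a symmetrical site and the kind of end produced can be sketched as follows. The recognition sites and cut positions in `SITES` are the standard textbook values for these three enzymes, supplied here as an assumption because the figure with the arrows is not reproduced in the text; the function names and the test sequence are hypothetical.

```python
# Recognition site (top strand, 5'->3') and number of bases before the cut.
# Assumed standard values: G^AATTC, GG^CC, CTGCA^G.
SITES = {
    "EcoRI": ("GAATTC", 1),
    "HaeIII": ("GGCC", 2),
    "PstI": ("CTGCAG", 5),
}

_COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    return seq.translate(_COMPLEMENT)[::-1]

def end_type(enzyme):
    """Return (kind of end, single-stranded sequence) for a symmetrical site."""
    site, top_cut = SITES[enzyme]
    if site != reverse_complement(site):
        raise ValueError("site is not symmetrical")
    # By symmetry, the other strand is cut at the mirror position.
    bottom_cut = len(site) - top_cut
    if top_cut == bottom_cut:
        return "blunt", ""
    if top_cut < bottom_cut:
        return "5' overhang", site[top_cut:bottom_cut]
    return "3' overhang", site[bottom_cut:top_cut]

def cut_top_strand(dna, enzyme):
    """Split the top strand of `dna` at every occurrence of the enzyme's site."""
    site, top_cut = SITES[enzyme]
    fragments, start = [], 0
    position = dna.find(site)
    while position != -1:
        fragments.append(dna[start:position + top_cut])
        start = position + top_cut
        position = dna.find(site, position + 1)
    fragments.append(dna[start:])
    return fragments
```

With these values, `end_type("EcoRI")` and `end_type("PstI")` report single-stranded ends (AATT and TGCA) that are their own reverse complements, which is why any two fragments cut by the same enzyme can pair; `end_type("HaeIII")` reports blunt ends.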

Determination of the Sequence of an Oligoribonucleotide:

To illustrate how one can determine a nucleotide sequence with the help of carefully chosen enzymes, we will take the example of the trinucleotide GpApUp. As shown by figure 6-14, phosphatase can be used first; it detaches the terminal phosphate at the 3′ end of the fragment.

Then follows an alkaline hydrolysis which will yield a nucleoside (uridine) easily separable from the 2 nucleotides also liberated; we can now say that there was a uridylic nucleotide at the 3′ end of the original trinucleotide.

In addition, a quantitative analysis of the nucleotides (and nucleoside) obtained by alkaline hydrolysis reveals the base composition of the oligonucleotide. In this case, the trinucleotide GpApU yields Gp, Ap and U in equal quantities, whereas the tetranucleotide GpApApU, for instance, would have given 1 Gp and 1 U but 2 Ap.

In some cases, a quantitative analysis of the bases is also carried out (after hydrolysis with perchloric acid). The action of snake venom phosphodiesterase on another aliquot of the trinucleotide previously treated with phosphatase yields a nucleoside (guanosine), easily separable from the 2 nucleotides liberated; this indicates that there is a guanylic nucleotide at the 5′ end of the trinucleotide.

Knowing the nucleotide of each end, it is deduced that the adenylic nucleotide is in the middle and that the trinucleotide has the sequence GpApUp.
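The deduction just described can be written out as a small sketch. The function names are hypothetical; the inputs are the product lists one would observe after phosphatase treatment, with `Np` standing for a nucleoside 2′- or 3′-phosphate, `pN` for a nucleoside 5′-phosphate and a bare letter for a free nucleoside.

```python
from collections import Counter

def base_composition(alkaline_products):
    """Count each base among the products of alkaline hydrolysis."""
    return Counter(product.rstrip("p") for product in alkaline_products)

def deduce_trinucleotide(alkaline_products, venom_products):
    """Deduce the sequence of a dephosphorylated trinucleotide."""
    # Alkaline hydrolysis leaves the 3'-terminal residue as a free nucleoside.
    three_prime = next(p for p in alkaline_products if "p" not in p)
    # Snake venom phosphodiesterase leaves the 5'-terminal residue as a nucleoside.
    five_prime = next(p for p in venom_products if "p" not in p)
    remaining = base_composition(alkaline_products)
    remaining[three_prime] -= 1
    remaining[five_prime] -= 1
    middle = list(remaining.elements())   # bases still unassigned
    if len(middle) != 1:
        raise ValueError("products do not correspond to a trinucleotide")
    return "p".join([five_prime, middle[0], three_prime])

sequence = deduce_trinucleotide(["Gp", "Ap", "U"], ["G", "pA", "pU"])  # 'GpApU'
```

The same `base_composition` helper applied to the products of GpApApU gives 2 A for 1 G and 1 U, but for a chain of that length the two end determinations alone no longer fix the order of the internal residues in general.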

Determination of the Sequence of Ribonucleic Acids:

Three types of methods are presently used for determining the primary structures of ribonucleic acids.

i. Enzymatic hydrolysis with the help of pancreatic, T₁ or U₂ ribonucleases, or other nucleases, followed by fractionation of the oligonucleotides obtained. Fractionation is done either by column chromatography or by two-dimensional electrophoresis on cellulose acetate, ion exchange paper, or polyacrylamide gel. Column chromatography requires rather large quantities of material, but permits subsequent characterization of nucleotides (especially the rare nucleotides) by their ultraviolet absorption spectrum. Two-dimensional electrophoresis needs smaller quantities, but requires ribonucleic acids labeled with ³²P, because the oligonucleotides are identified by autoradiography, i.e. by applying the paper or the gel against a film for a given time; such labeling is relatively easy in microorganisms, but poses problems in higher organisms.

The ribonucleic acids which are difficult to label in vivo can be labeled in vitro (post-labeling), either at their 5′ end (after removal of the terminal 5′ phosphate, by transfer of a ³²P phosphate through the action of a polynucleotide kinase in the presence of ATP labeled on the third, or γ, phosphate), or at their 3′ end (with ³²P, or by periodate oxidation followed by reduction with tritium-labeled sodium borohydride).

These post-labeling techniques are also applicable to fragments obtained by hydrolysis — partial or total — of an RNA. Oligonucleotides separated by two-dimensional electrophoresis can sometimes (the shorter ones) be identified by their position. Otherwise, they must undergo complementary hydrolysis to be analyzed.

The identified oligonucleotides are placed in order by studying longer fragments obtained by limited hydrolysis (lowering the temperature, the enzyme concentration or the hydrolysis time).

ii. A method recently developed for determining the sequence of either a tRNA or an RNA fragment of comparable length consists in labeling the 5′ end of this RNA with ³²P by means of a polynucleotide kinase and [γ-³²P]ATP, then submitting various aliquots to different specific enzymatic hydrolyses (by pancreatic, T₁ or U₂ RNases, or other nucleases) under conditions in which there will be, statistically, only one cleavage per chain.

The fragments thus obtained are fractionated according to their size by electrophoresis on polyacrylamide gel, and only the fragments carrying the 5′ end labeled with ³²P (*p) are revealed by autoradiography.

At the same time, a partial alkaline hydrolysis which can cleave all phosphodiester bonds (but causes statistically only one cleavage per chain) serves as reference. The sequence of a fragment (which can reach 70 nucleotides or more) can thus be read off directly on the gel. The principle of this reading is illustrated in the following diagram, in the case of a pentanucleotide.

[Diagram: radioactive oligonucleotides obtained after each hydrolysis.]
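The reading principle of method ii can be sketched for a pentanucleotide. Everything named here is an assumption for illustration: the sequence `GACUA`, the function names, and the simplification that each lane contains exactly the 5′-labeled fragments produced by one cut after each susceptible residue. Since pancreatic RNase does not distinguish C from U, pyrimidines are reported as `Y`, and the 3′-terminal residue, which has no bond after it, is not read.

```python
def labeled_fragment_lengths(rna, cut_after):
    """Lengths of 5'-labeled fragments given one cut after a susceptible base."""
    # The last residue is skipped: there is no phosphodiester bond on its 3' side.
    return {i + 1 for i, base in enumerate(rna[:-1]) if base in cut_after}

def read_ladder(lanes):
    """Read the sequence from the shortest band to the longest."""
    calls = []
    for length in sorted(lanes["alkali"]):   # reference ladder: every position
        if length in lanes["T1"]:
            calls.append("G")
        elif length in lanes["U2"]:          # U2 band without a T1 band
            calls.append("A")
        elif length in lanes["pancreatic"]:
            calls.append("Y")                # pyrimidine, C or U
        else:
            calls.append("N")                # no specific enzyme cut here
    return "".join(calls)

rna = "GACUA"   # hypothetical pentanucleotide
lanes = {
    "alkali": labeled_fragment_lengths(rna, "ACGU"),
    "T1": labeled_fragment_lengths(rna, "G"),
    "U2": labeled_fragment_lengths(rna, "AG"),
    "pancreatic": labeled_fragment_lengths(rna, "CU"),
}
reading = read_ladder(lanes)   # 'GAYY'
```

Resolving `Y` into C or U requires one of the other nucleases mentioned in the text, which this sketch does not model.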

iii. Study of a strand of viral RNA synthesized in vitro synchronously, in the presence of RNA replicase, the complementary strand (serving as template) and the 4 radioactive ribonucleoside triphosphates. Interrupting the synthesis after very short times (a few seconds) gives radioactive, newly synthesized fragments (whose lengths increase as the reaction time is extended), the sequence of which can be determined. This method was used for studying the sequence of the RNA of the Qβ phage.

Determination of the Sequence of Deoxyribonucleic Acids:

In recent years, methods were developed for determining DNA sequences. These methods are based on the preparation of specific fragments with the help of restriction endonucleases and on the cloning of these fragments, which provides large quantities of the DNA to be sequenced.

The analysis of these fragments can be made by the method described by Maxam and Gilbert, a method very similar to the one described above for the determination of the sequence of ribonucleic acids. The principle is actually identical, except that various chemical hydrolyses are used (instead of enzymatic hydrolyses).

Four aliquots of a DNA fragment (labeled with ³²P at one end) are subjected to four different reactions, under conditions in which there will be, statistically, only one cleavage (or a small number of cleavages) per chain:

i. A treatment with dimethyl sulphate, which methylates mainly guanine at N7 (and, to a lesser degree, adenine at N3); under the effect of heat, guanine is detached and the polynucleotide chain is broken there. One therefore obtains fragments ending at the point where there was a G in the chain;

ii. A treatment with dimethyl sulphate in acid medium detaches the 2 purines, and therefore yields fragments ending where there were adenines and guanines in the chain; comparison with the previous reaction then identifies the fragments ending with A;

iii. A treatment with hydrazine in 1 M NaCl (the salt suppresses the reaction with thymine) permits the determination of fragments ending with C;

iv. A treatment with hydrazine (without salt) cleaves at the cytosines and thymines; comparison with the previous reaction then identifies the fragments ending with T.

The fragments produced by these four reactions are subjected concurrently to an electrophoresis which separates them according to their size. The shortest fragment is found at the bottom of the gel and, knowing that the size of two successive fragments differs only by one nucleotide, the sequence of the DNA can be read off after autoradiography, as shown by the example of the above diagram.
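The lane-by-lane comparison used to read such a gel can be sketched as follows. The sequence, the function names and the lane labels are assumptions for illustration; the fragment is taken to be labeled at its 5′ end, so a band of length n means that the base at position n + 1 was destroyed. In this toy model the base nearest the label therefore gives no band and is not read.

```python
# Bases attacked in each of the four reactions described above.
REACTIONS = {"G": "G", "A+G": "AG", "C": "C", "C+T": "CT"}

def maxam_gilbert_lanes(dna):
    """Band lengths per lane, assuming one cleavage per 5'-labeled chain."""
    lanes = {name: set() for name in REACTIONS}
    for intact, base in enumerate(dna):   # `intact` = residues 5' of the lost base
        if intact == 0:
            continue                       # nothing but the label would remain
        for name, targets in REACTIONS.items():
            if base in targets:
                lanes[name].add(intact)
    return lanes

def read_gel(lanes):
    """Read from the bottom of the gel (shortest band) to the top."""
    sequence = []
    for length in sorted(set().union(*lanes.values())):
        if length in lanes["G"]:
            sequence.append("G")
        elif length in lanes["A+G"]:      # purine lane only: A
            sequence.append("A")
        elif length in lanes["C"]:
            sequence.append("C")
        elif length in lanes["C+T"]:      # pyrimidine lane only: T
            sequence.append("T")
    return "".join(sequence)

lanes = maxam_gilbert_lanes("TGATCCAG")   # hypothetical fragment
reading = read_gel(lanes)                 # 'GATCCAG'
```

The two "by comparison" steps of the text appear as the `elif` branches: a band present in the A+G lane but not in the G lane is an A, and a band present in the C+T lane but not in the C lane is a T.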

Presently, one can separate, by electrophoresis on a polyacrylamide gel of 40 cm length, fragments ranging from 1 to about 300 nucleotides (or more, by using longer gels). One may verify that no error has been made by sequencing the two strands and checking that their complementarity is perfect.

To establish the sequence of a longer DNA region (one gene can contain several thousands of nucleotides, especially if it has introns), the sequence of adjacent fragments is determined.

To make sure that small fragments were not lost (two restriction sites recognized by a given enzyme can be relatively near one another), one must sequence overlapping fragments, obtained by means of various restriction endonucleases, and the order of which has been established.
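How overlapping fragments whose order is already known are joined into one longer sequence can be sketched as follows. The function names, the minimum overlap of 4 nucleotides and the example fragments are assumptions for illustration; real fragments overlap over much longer stretches.

```python
from functools import reduce

def merge_pair(left, right, min_overlap=4):
    """Join two adjacent fragments on their longest end/start overlap."""
    longest = min(len(left), len(right))
    for size in range(longest, min_overlap - 1, -1):
        if left.endswith(right[:size]):
            return left + right[size:]
    raise ValueError("fragments do not overlap; a small fragment may be missing")

def assemble(ordered_fragments, min_overlap=4):
    """Merge fragments given in their established order, left to right."""
    return reduce(lambda a, b: merge_pair(a, b, min_overlap), ordered_fragments)

region = assemble(["ATGGCCTA", "CCTAGGATTC", "GATTCAA"])   # 'ATGGCCTAGGATTCAA'
```

The `ValueError` corresponds to the situation the text warns about: if two neighbouring fragments share no overlap, a small fragment lying between two nearby restriction sites may have been lost.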

One can also determine the sequence of deoxyribonucleic acids by the dideoxynucleotide method, or Sanger’s method: a short complementary primer (having a free 3′-OH) is hybridized to a single-stranded DNA (which will serve as template); from this primer, DNA polymerase I polymerizes the four deoxyribonucleoside triphosphates (dNTP) and thus synthesizes a DNA strand complementary to the template.

In fact, four independent polymerization reactions are performed, each containing one of the four dideoxyribonucleoside triphosphates (ddNTP). These have no OH at either the 3′ or the 2′ position, so their incorporation in the DNA chain in place of the corresponding dNTP brings about the premature termination of the chain (because there is no 3′-OH to form the next phosphodiester bond).

Each reaction mixture therefore contains the 4 dNTP (of which one is labeled on the α-phosphate) and only one ddNTP. When the analogue is ddATP, for example, a random incorporation of ddATP (and therefore a random termination of the chain) can be obtained by adjusting the ddATP/dATP ratio; the final result is a series of DNA fragments of different sizes, all having the primer at their 5′ end and a ddA at their 3′ end.
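The effect of the ddATP/dATP ratio on the distribution of fragment sizes can be illustrated with a simple probability model. This is an assumption made here, not a statement from the text: the polymerase is supposed to incorporate ddATP and dATP with equal efficiency, so that at each A position the chain terminates with probability p = r / (1 + r), where r is the ddATP/dATP ratio. The function name is hypothetical.

```python
def termination_fractions(n_sites, ratio):
    """Fraction of chains ending at each successive A site, and the read-through.

    Assumes equal incorporation efficiency for ddATP and dATP (a simplification).
    """
    p = ratio / (1 + ratio)                 # chance of stopping at any one A site
    fractions = [p * (1 - p) ** k for k in range(n_sites)]
    read_through = (1 - p) ** n_sites       # chains that pass every A site
    return fractions, read_through

fractions, read_through = termination_fractions(3, 1.0)
# With ratio 1.0: fractions == [0.5, 0.25, 0.125], read_through == 0.125
```

Under this model a high ratio concentrates the fragments near the primer, while a low ratio spreads them towards longer chains, which is why the ratio has to be adjusted to the length one wants to read.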

After the 4 polymerization reactions (each with a different ddNTP) 4 series of fragments are obtained, each series having at the 3′ end, respectively ddA, ddG, ddC, ddT.

These fragments are then separated according to their size by electrophoresis on polyacrylamide gel (the resolution is such that one can distinguish between two fragments whose sizes differ by only one nucleotide). After autoradiography, one can read the sequence on the gel, proceeding from the bottom of the gel (where the shortest fragment is found) to the top, as shown by the diagram.

This gives the sequence of the newly synthesized DNA (which is complementary to the strand used as template) from the primer, in the direction 5′ → 3′.
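The reading of the four lanes, and the relation between the sequence read and the template, can be sketched as follows. The template sequence and the function names are assumptions for illustration, and band sizes are counted in nucleotides added beyond the primer.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    return seq.translate(_COMPLEMENT)[::-1]

def sanger_lanes(template):
    """Band sizes expected in the ddA, ddG, ddC and ddT lanes."""
    new_strand = reverse_complement(template)   # synthesized 5'->3' from the primer
    lanes = {base: set() for base in "ACGT"}
    for size, base in enumerate(new_strand, start=1):
        lanes[base].add(size)                   # chain ended by the dd-analogue of `base`
    return lanes

def read_sanger(lanes):
    """Read the new strand 5'->3', from the shortest band to the longest."""
    band_to_base = {size: base for base, sizes in lanes.items() for size in sizes}
    return "".join(band_to_base[size] for size in sorted(band_to_base))

template = "TTAGCCGA"                           # hypothetical, written 5'->3'
new_strand = read_sanger(sanger_lanes(template))        # 'TCGGCTAA'
recovered_template = reverse_complement(new_strand)     # 'TTAGCCGA'
```

Each band size occurs in exactly one lane, so the lane in which a band appears names the base at that position of the new strand; the template sequence is then obtained by taking the reverse complement.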