This article throws light upon the seven important methods used for DNA sequencing. The seven important methods used for DNA sequencing are:
(1) Sanger’s Method (2) Maxam and Gilbert Method (3) Hybridization Method (4) Pal Nyren’s Method (5) Automatic DNA Sequencer (6) Slab Gel Sequencing Systems and (7) Capillary Gel Electrophoresis.
DNA sequencing is the determination of the precise sequence of nucleotides in a sample of DNA. Before the development of direct DNA sequencing methods, DNA sequencing was difficult and indirect. The DNA had to be converted to RNA, and limited RNA sequencing could be done by the existing cumbersome methods. Thus, only short DNA sequences could be determined in this way. Using this approach, Walter Gilbert and Alan Maxam at Harvard University determined that the Lac operator is a 27 bp long sequence.
The development of direct DNA sequencing techniques changed the scope of biological research. The evolution of DNA sequencing technology from plus-minus sequencing to pyro-sequencing within about 20 years parallels the progress in biology from molecular biology to genomics.
The development of DNA sequencing techniques with enhanced speed, sensitivity and throughput is of utmost importance for the study of biological systems. Sequence determination is most commonly performed using di-deoxy chain termination technology. Pyro-sequencing, a non-electrophoretic real-time bio-luminometric method for DNA sequencing, has emerged as a state of the art sequencing technology.
This technology has the advantage of accuracy, ease of use, and high flexibility for different applications. Pyro-sequencing allows the analysis of genetic variations including SNPs, insertion/deletions and short repeats, as well as assessing RNA allelic imbalance, DNA methylation status and gene copy number.
Method # 1. Sanger’s Method:
The first DNA sequencing method, devised by Sanger and Coulson in 1975, was called plus and minus sequencing. It utilized E. coli DNA pol I and DNA polymerase from bacteriophage T4 with different limiting triphosphates. This technique had a low efficiency. Sanger and co-workers (1977) eventually invented a new method for DNA sequencing via enzymatic polymerization that basically revolutionized DNA sequencing technology.
The most popular method for doing this is called the dideoxy method or Sanger method (named after its inventor, Frederick Sanger, who was awarded the 1980 Nobel prize in chemistry [his second] for this achievement). Finding a single gene amid the vast stretches of DNA that make up the human genome – three billion base-pairs’ worth – requires a set of powerful tools. These tools include genetic maps, physical maps and DNA sequence which is a detailed description of the order of the chemical building blocks, or bases, in a given stretch of DNA.
Scientists need to know the sequence of bases because it tells them the kind of genetic information that is carried in a particular segment of DNA. For example, they can use sequence information to determine which stretches of DNA contain genes, as well as to analyze those genes for changes in sequence, called mutations, that may cause disease.
The first methods for sequencing DNA were developed in the mid-1970s. At that time, scientists could sequence only a few base pairs per year, not nearly enough to sequence a single gene, much less the entire human genome. By the time the HGP began in 1990, only a few laboratories had managed to sequence a mere 100,000 bases, and the cost of sequencing remained very high. Since then, technological improvements and automation have increased speed and lowered cost to the point where individual genes can be sequenced routinely, and some labs can sequence well over 100 million bases per year.
DNA is synthesized from four deoxynucleotide triphosphates. The top formula shows one of them: deoxythymidine triphosphate (dTTP) (Fig. 23.7). Each new nucleotide is added to the 3′ -OH group of the last nucleotide added.
The dideoxy method gets its name from the critical role played by synthetic nucleotides that lack the -OH at the 3′ carbon atom. A dideoxynucleotide (dideoxythymidine triphosphate – ddTTP as shown here) can be added to the growing DNA strand. When it is added it stops chain elongation because there is no 3′ -OH for the next nucleotide to be attached. For this reason, the dideoxy method is also called the chain termination method.
The bottom formula shows the structure of azidothymidine (AZT), a drug used to treat AIDS. AZT (which is also called zidovudine) is taken up by cells where it is converted into the triphosphate. The reverse transcriptase of the human immunodeficiency virus (HIV) prefers AZT triphosphate to the normal nucleotide (dTTP). Because AZT has no 3′ -OH group, DNA synthesis by reverse transcriptase halts when AZT triphosphate is incorporated in the growing DNA strand. Fortunately, the DNA polymerases of the host cell prefer dTTP, so side effects from the drug are not as severe as might have been predicted.
The Procedure:
The DNA to be sequenced is prepared as a single strand (Fig. 23.8).
This template DNA is mixed with the following:
(a) A mixture of all four normal (deoxy) nucleotides in ample quantities
i. dATP
ii. dGTP
iii. dCTP
iv. dTTP
(b) A mixture of all four dideoxynucleotides, each present in limiting quantities and each labeled with a “tag” that fluoresces a different colour:
i. ddATP
ii. ddGTP
iii. ddCTP
iv. ddTTP
(c) DNA polymerase I
Because all four normal nucleotides are present, chain elongation proceeds normally until, by chance, DNA polymerase inserts a dideoxy nucleotide instead of the normal deoxynucleotide. If the ratio of normal nucleotide to the dideoxy versions is high enough, some DNA strands will succeed in adding several hundred nucleotides before insertion of the dideoxy version halts the process.
At the end of the incubation period, the fragments are separated by length from longest to shortest. The resolution is so good that a difference of one nucleotide is enough to separate that strand from the next shorter and next longer strand. Each of the four dideoxynucleotides fluoresces a different colour when illuminated by a laser beam and an automatic scanner provides a printout of the sequence.
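The chain termination procedure above can be illustrated with a small simulation. This is only a sketch under simplifying assumptions: the function names, the termination probability `dd_fraction` and the number of strands are hypothetical, the template is written in the order in which the polymerase copies it, and the fluorescent tag is represented simply by the letter of the terminating base.

```python
import random

# Watson-Crick pairing used to build the newly synthesized strand.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def synthesize_fragments(template, n_strands=5000, dd_fraction=0.2, seed=0):
    """Simulate chain-terminated synthesis on many copies of one template.

    At every position the polymerase inserts a dideoxynucleotide with
    probability dd_fraction (an assumed value); this stops elongation.
    Returns a dict: fragment length -> tagged terminal base.
    """
    rng = random.Random(seed)
    new_strand = [COMPLEMENT[base] for base in template]
    fragments = {}
    for _ in range(n_strands):
        for position, base in enumerate(new_strand):
            if rng.random() < dd_fraction:
                # Chain stops here; the fragment carries the tag of this base.
                fragments[position + 1] = base
                break
    return fragments


def read_sequence(fragments):
    """Read the tags in order of fragment length, shortest first."""
    return "".join(fragments[length] for length in sorted(fragments))


template = "TACGGATCCA"
fragments = synthesize_fragments(template)
synthesized = read_sequence(fragments)
```

Sorting the fragments by length plays the role of the electrophoretic separation, and reading the terminal tag of each length class recovers the sequence of the newly synthesized strand, which is the complement of the template.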
Method # 2. Maxam and Gilbert Method:
In 1977, Maxam and Gilbert described a sequencing method based on chemical degradation at specific locations of the DNA molecule. The end-labeled DNA fragments are subjected to random cleavage at adenine, cytosine, guanine or thymine positions using specific chemical agents, and the products of these four reactions are separated using polyacrylamide gel electrophoresis (PAGE). As in Sanger’s method, the sequence can be easily read from four parallel lanes in the sequencing gel.
Double stranded or single stranded DNA from chromosomal DNA can be used as template. Originally, end labeling was done with 32P phosphate or with a nucleotide linked to 32P and enzymatically incorporated into the end fragment. The read length is up to 500 bp. The chemical reactions in the technique are slow, and the DNA cleavage reaction involves hazardous chemicals that require special handling.
As in Sanger’s method, further drawbacks of the Maxam and Gilbert method include the need to purify and separate the DNA fragments and the longer analysis time. Therefore, this technology is not suitable for high throughput large-scale investigation.
Method # 3. Hybridization Method:
Ed Southern’s (1990) sequencing by hybridization technique relies on detection of specific DNA sequences using hybridization of complementary probes. It utilizes a large number of short nested oligonucleotides immobilized on a solid support to which the labeled sequencing template is hybridized. The target sequence is deduced by computer analysis of the hybridization pattern of the sample DNA.
DNA sequence can also be analyzed by sequencing by synthesis. Sequencing by hybridization makes use of a universal DNA microarray, which harbors all oligonucleotides of length k (called “k-words”, or simply words when k is clear). These oligonucleotides are hybridized to an unknown DNA fragment, whose sequence one would like to determine.
Under ideal conditions, this target molecule will hybridize to all words whose Watson-Crick complements occur somewhere along its sequence. Thus, in principle, one would determine in a single microarray reaction the set of all k-long substrings of the target and try to infer the sequence from those data.
The average length of a uniquely reconstructible sequence using an 8-mer array is <200 bases, far below a single read length on a commercial gel-lane machine. The main weakness of sequencing by hybridization is ambiguous solutions: when several sequences have the same spectrum, there is no way to determine the true sequence.
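The ambiguity problem can be made concrete with a short sketch. The function names and the example sequences below are illustrative assumptions, and the reconstruction assumes that every k-word occurs exactly once in the target, which a real array experiment cannot guarantee.

```python
def spectrum(sequence, k):
    """Set of all k-long substrings (k-words) of a sequence."""
    return {sequence[i:i + k] for i in range(len(sequence) - k + 1)}


def reconstructions(words, k):
    """Enumerate every sequence that uses each k-word exactly once.

    Assumption: no k-word is repeated in the target sequence.
    """
    words = set(words)
    results = set()

    def extend(sequence, remaining):
        if not remaining:
            results.add(sequence)
            return
        suffix = sequence[-(k - 1):]
        for base in "ACGT":
            word = suffix + base
            if word in remaining:
                extend(sequence + base, remaining - {word})

    for start in words:
        extend(start, words - {start})
    return sorted(results)


first = "TGCTGATG"
second = "TGATGCTG"
shared = spectrum(first, 3)
candidates = reconstructions(shared, 3)
```

Here `first` and `second` are different sequences with an identical 3-word spectrum, so `candidates` contains both of them (along with other arrangements), and the hybridization pattern alone cannot decide between them.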
Method # 4. Pal Nyren’s Method:
In 1996, Pal Nyren’s group reported that natural nucleotides can be used to obtain efficient incorporation during a sequencing-by-synthesis protocol. The detection was based on the pyrophosphate (inorganic diphosphate, PPi) released during the DNA polymerase reaction, the quantitative conversion of pyrophosphate to ATP by ATP sulfurylase and the subsequent production of visible light by firefly luciferase.
The first major improvement was the inclusion of dATPαS in place of dATP in the polymerization reaction, which enabled the pyrosequencing reaction to be performed in homogeneous phase in real time.
The non-specific signals seen with dATP were attributed to the fact that dATP is a substrate for luciferase. Conversely, dATPαS was found to be inert for luciferase, yet could be incorporated efficiently by all DNA polymerases tested. The second improvement was the introduction of apyrase to the reaction to make a four-enzyme system. Apyrase allows nucleotides to be added sequentially without any intermediate washing step.
Pyrosequencing is a non-electrophoretic, real-time DNA sequencing method. It is a sequencing-by-synthesis technique that relies on detecting the pyrophosphate (PPi) released during the DNA polymerase reaction.
In a cascade of enzymatic reactions, visible light is generated in proportion to the number of incorporated nucleotides. The cascade starts with a nucleic acid polymerization reaction in which inorganic pyrophosphate (PPi) is released as a result of nucleotide incorporation by the polymerase.
The released PPi is subsequently converted to ATP by ATP sulfurylase, and this ATP provides the energy for luciferase to oxidize luciferin and generate light. The light is captured by a CCD camera and recorded as a series of peaks known as a pyrogram (comparable to the electropherogram in Sanger’s method). Because the identity of each added nucleotide is known, the sequence of the template can be determined.
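How a pyrogram encodes a sequence can be sketched as follows. This is a simplified model, not a description of any instrument: the function names and the cyclic A, C, G, T dispensation order are assumptions, `read` stands for the strand being synthesized, and the peak height is taken to be exactly the number of nucleotides incorporated in one dispensation.

```python
from itertools import cycle


def pyrogram(read, dispensation_order="ACGT"):
    """Return (nucleotide, peak height) pairs for cyclic dispensation.

    A peak height of 0 means the dispensed nucleotide was not incorporated
    and was simply degraded (the role of apyrase in the real reaction).
    """
    if any(base not in dispensation_order for base in read):
        raise ValueError("read contains a base that is never dispensed")
    peaks = []
    position = 0
    for nucleotide in cycle(dispensation_order):
        if position >= len(read):
            break
        run = 0
        # A homopolymer stretch is incorporated in one step: one taller peak.
        while position < len(read) and read[position] == nucleotide:
            run += 1
            position += 1
        peaks.append((nucleotide, run))
    return peaks


def decode(peaks):
    """Rebuild the sequence from the known dispensation and peak heights."""
    return "".join(nucleotide * height for nucleotide, height in peaks)


peaks = pyrogram("GGATTC")
decoded = decode(peaks)
```

For the read GGATTC the first G dispensation gives a peak of height 2, and the decoded string matches the read because each peak is paired with the nucleotide that was dispensed at that step.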
Standard pyrosequencing uses the Klenow fragment of E. coli DNA pol I, which is a relatively slow polymerase. The ATP sulfurylase used in pyrosequencing is a recombinant version from yeast and the luciferase is from the American firefly. The overall reaction from polymerization to light detection takes place within three to four seconds in real time.
One pmol of DNA in a pyrosequencing reaction yields 6 × 10¹¹ ATP molecules which, in turn, generate more than 6 × 10⁹ photons at a wavelength of 560 nm. This amount of light is easily detected by a photodiode, photomultiplier tube or a CCD camera. Pyrosequencing technology has been further improved into an array-based massively parallel microfluidic sequencing platform.
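The figures above can be checked with a short calculation. It assumes that each template molecule releases one PPi, and hence one ATP, per incorporated nucleotide; the variable names are illustrative only.

```python
AVOGADRO = 6.02214076e23  # molecules per mole


def molecules_from_pmol(pmol):
    """Convert an amount in picomoles to a number of molecules."""
    return pmol * 1e-12 * AVOGADRO


# One ATP per template molecule (assumption stated above).
atp_molecules = molecules_from_pmol(1.0)

# Lower bound on the photon count quoted in the text.
photons_min = 6e9

# Implied minimum light yield: photons emitted per ATP consumed.
photons_per_atp = photons_min / atp_molecules
```

One pmol corresponds to about 6 × 10¹¹ molecules, so the quoted photon count implies a yield of roughly one photon for every hundred ATP molecules.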
Method # 5. Automatic DNA Sequencer:
A variant of the above dideoxy-method was developed, which allowed the production of automatic sequencers. In this new approach, different fluorescent dyes are tagged either to the oligonucleotide primer (dye primers) in each of the four reaction tubes (blue for A, red for C, etc), or to each of the four ddNTPs (dye terminators) used in a single reaction tube: when four tubes are used, they are pooled.
After the PCR reaction is over, the synthesized fragments in the reaction mixture are separated by electrophoresis (Fig. 23.9). Depending upon the electrophoretic system used, whether slab gel electrophoresis or capillary electrophoresis, the following two types of automatic sequencing systems have been designed.
Method # 6. Slab Gel Sequencing Systems:
These systems make use of ultrathin (75 µm) slab gels and involve running at least 96 lanes per gel. In these systems, automation in sample loading of sequencing gels has also been achieved, by using a plexiglass block having wells that are the same distance apart as the comb teeth cut in a porous membrane that is used as a comb for drawing samples by capillary action.
Each well in the plexiglass block is filled with a sample (PCR dideoxy-reaction mixture), so that when the porous membrane comb is lowered onto the sample wells in the plexiglass, the samples are drawn up automatically into the comb teeth by capillary action.
Using this approach of employing porous combs, automated loading of up to 192, 384 or 480 samples per gel has been achieved. The porous comb with the samples is placed between the glass plates of the gel apparatus above the flat surface of the polymerized gel and the samples are driven from the comb into the gel by electrophoresis.
Method # 7. Capillary Gel Electrophoresis:
In these systems, slab gel electrophoresis is replaced by capillary gel electrophoresis to analyse DNA samples. Instead of scanning DNA as it migrates through the 96 lanes of a slab gel, DNA fragments are scanned as they pass through a series of 96 capillary tubes.
In the original slab gel machines, gels had to be poured and reagents frequently reloaded, interrupting the sequencing.
In capillary gel sequencing systems, on the other hand, the robot moves the DNA samples and reagents through the tubes continuously, requiring attention only once a day. The system produces a steady flow of data, each signal representing one of the four DNA bases (adenine, cytosine, guanine and thymine).
DNA sequencing by PCR:
DNA sequencing can also be carried out using the asymmetric PCR method.
Examples of sequencing:
Beginning in the late 1990s, the scientific community witnessed a remarkable climax of accomplishments related to DNA sequencing. In addition to the historic sequencing of the human genome, sequences have now been generated for the genomes of several key model organisms, including the mouse (Mus musculus); the rat (Rattus norvegicus); two fruit flies (Drosophila melanogaster and D. pseudoobscura); two roundworms (Caenorhabditis elegans and C. briggsae); yeast (Saccharomyces cerevisiae) and several other fungi; a malaria-carrying mosquito (Anopheles gambiae) along with a malaria-causing parasite (Plasmodium falciparum); two sea squirts (Ciona savignyi and C. intestinalis); a long list of microbes; and a couple of plants, including mustard weed (Arabidopsis thaliana) and rice (Oryza sativa).
Sequencing work is well underway on the honey bee (Apis mellifera), and is just getting started or expected to begin soon on the chimpanzee (Pan troglodytes), the cow (Bos taurus), the dog (Canis familiaris) and the chicken (Gallus gallus).