Read this article to learn about the creation of gene libraries with diagram.
Gene Libraries:
A gene library is a collection of DNA fragments (specifically genes) from a particular species.
Gene libraries (more broadly, genomic libraries) are constructed by isolating the complete genome (the entire DNA of a cell), cutting it into fragments, and cloning the fragments in suitable vectors.
The specific clone carrying the desired (target) DNA can then be identified, isolated and characterized. In this manner, a library of genes or clones (appropriately considered a gene bank) for the entire genome of a species can be constructed. The sizes of genomes in different species are variable (Table 9.1). A complete gene library of an organism contains all of its genomic DNA.
Biotechnologists are particularly interested in isolating genes that encode proteins (and therefore in creating gene libraries). There is a distinct difference between the genes of prokaryotic and eukaryotic cells. In prokaryotic organisms, the structural genes coding for proteins are continuous. In eukaryotes, however, the coding regions (exons) of structural genes are separated by non-coding regions (introns). For this reason, the construction of gene libraries for eukaryotes is more complicated.
Creating a Gene Library:
The DNA from the source organism is digested with a restriction endonuclease (e.g., EcoRI) to produce fragments. It is desirable to create conditions under which partial, rather than complete, digestion occurs. In this way, all possible DNA fragments of variable size can be produced. The partial digestion of a DNA with a restriction endonuclease is depicted in Fig. 9.1.
The cleavages occur at different sites and produce DNA fragments of varying lengths; some are large while others are small. In practice, a combination of restriction enzymes is used to digest the source DNA and release a large number of DNA fragments. The desired fragments can be isolated and cloned.
Some workers use the term shotgun experiment (or shotgun approach) for the creation of random clones from genomic DNA, without necessarily identifying all of them. In the shotgun approach, the DNA is subjected to random cleavage by restriction endonucleases.
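The difference between complete and partial digestion can be illustrated with a short sketch. It assumes a toy sequence invented for illustration, models only the top strand, and treats partial digestion as "any subset of the EcoRI sites may be cut"; the function names are hypothetical and not part of any library.

```python
from itertools import combinations

ECORI_SITE = "GAATTC"  # EcoRI recognition sequence; the enzyme cuts after the G (G^AATTC)


def find_cut_positions(dna, site=ECORI_SITE, offset=1):
    """Return the top-strand indices at which the enzyme would cut."""
    positions = []
    start = dna.find(site)
    while start != -1:
        positions.append(start + offset)
        start = dna.find(site, start + 1)
    return positions


def digest(dna, cuts):
    """Cut the sequence at the given positions and return the fragments in order."""
    bounds = [0, *sorted(cuts), len(dna)]
    return [dna[a:b] for a, b in zip(bounds, bounds[1:])]


def partial_digest_fragments(dna):
    """Collect every distinct fragment that can arise when any subset of sites is cut."""
    sites = find_cut_positions(dna)
    fragments = set()
    for k in range(len(sites) + 1):
        for subset in combinations(sites, k):
            fragments.update(digest(dna, subset))
    return fragments


# Toy sequence with two EcoRI sites (an assumption made for illustration only).
toy_dna = "AAAGAATTCCCCCGAATTCTTTT"
print("complete digest:", digest(toy_dna, find_cut_positions(toy_dna)))
print("partial digest :", sorted(partial_digest_fragments(toy_dna), key=len))
```

With two sites, complete digestion gives three fragments, whereas partial digestion also yields the larger, overlapping fragments (including the uncut molecule) that make a representative library possible.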
Maniatis technique for creating gene library:
In the technique developed by Maniatis et al. (1978), two restriction endonucleases are used to cut the target DNA. Partial digestion yields mostly DNA fragments 10-30 kb in length. The fragments frequently overlap, and they can be fractionated by gel electrophoresis. The isolated fragments (approximately 20 kb in size) are inserted into a λ phage vector and cloned.
Establishing a gene library for humans:
The human cellular DNA (the entire genome) may be subjected to digestion by a restriction endonuclease (e.g., EcoRI). The fragments formed are, on average, about 4 kb in size (i.e., 4,000 base pairs). Each human chromosome, containing approximately 100,000 kb, can be cut into about 25,000 DNA fragments. As humans have 23 different chromosomes (24 in males), a total of about 575,000 fragments of 4 kb length are formed. Among these 575,000 DNA fragments is the DNA or gene of interest (say, the insulin gene).
The next step is the selection of a vector and the cloning process. E. coli, a bacterium harmless to humans, is most commonly used as the host. Plasmids are isolated from E. coli and digested with the same restriction enzyme that was used to cut the human genome, which produces open plasmids. The human chromosomal DNA fragments and the open plasmids are joined to produce recombinant plasmids.
These plasmids contain different fragments of human DNA. The recombinant plasmids are inserted into E. coli and the cells multiply (Fig. 9.2). Together, the E. coli cells possess all the human DNA in fragments. It must, however, be remembered that each E. coli cell contains a different DNA fragment. All the E. coli cells put together collectively represent the genomic library (containing about 575,000 DNA fragments).
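The arithmetic behind these figures can be checked in a few lines. The chromosome size, fragment size and chromosome count are the rounded values quoted in the text; the 4^6 estimate of site spacing is an added assumption that treats the genome as a random sequence with equal base frequencies.

```python
SITE_LENGTH_BP = 6  # EcoRI recognises a 6-bp sequence (GAATTC)

# Assumption: in a random sequence with equal base frequencies, a given
# 6-bp site is expected about once every 4**6 base pairs (roughly 4 kb).
expected_spacing_bp = 4 ** SITE_LENGTH_BP

CHROMOSOME_SIZE_KB = 100_000   # approximate size of one human chromosome (from the text)
FRAGMENT_SIZE_KB = 4           # average fragment size (from the text)
DISTINCT_CHROMOSOMES = 23      # value used in the text

fragments_per_chromosome = CHROMOSOME_SIZE_KB // FRAGMENT_SIZE_KB
total_fragments = fragments_per_chromosome * DISTINCT_CHROMOSOMES

print("expected site spacing (bp):", expected_spacing_bp)
print("fragments per chromosome  :", fragments_per_chromosome)
print("fragments in the library  :", total_fragments)
```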
Other vectors for creating genomic libraries:
In place of phages and plasmids, other vectors are in use for construction of large sized DNA libraries. These include cosmids, bacterial artificial chromosomes (BACs) and yeast artificial chromosomes (YACs). These are considered as high capacity vectors. Although they are ideal for construction of gene libraries, there are many practical difficulties associated with their use.
PCR as an Alternative to Genomic Library Construction:
PCR is a technique for the amplification of a specific DNA sequence. PCR with suitable primers can be used to isolate target DNA directly from the genome. Thus, PCR serves as an alternative to DNA library (gene library) construction by cloning. This is not always possible, since PCR can amplify only short DNA sequences (usually 1-2 kb, with a maximum of about 5 kb). Further, the high temperature used in PCR sometimes damages bases and generates nicks in DNA strands.
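How a primer pair picks a target directly out of a genome, and why the length limit matters, can be sketched as an "in silico PCR". This is a simplified model under stated assumptions: exact primer matches only, a single template strand, and a hypothetical `max_len` cut-off of 5,000 bp standing in for the practical limit mentioned above. The sequences are invented for illustration.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def in_silico_pcr(template, forward_primer, reverse_primer, max_len=5000):
    """Return the amplicon bounded by the two primers, or None if none is produced.

    The forward primer matches the template strand as written; the reverse
    primer anneals to it, so its reverse complement is searched for downstream.
    """
    start = template.find(forward_primer)
    if start == -1:
        return None
    downstream_site = reverse_complement(reverse_primer)
    end = template.find(downstream_site, start + len(forward_primer))
    if end == -1:
        return None
    amplicon = template[start:end + len(downstream_site)]
    return amplicon if len(amplicon) <= max_len else None


# Toy genome and primers (assumptions for illustration only).
genome = "TTTTATGCCGAAAAACCCCCGATTCATTTT"
print(in_silico_pcr(genome, "ATGCCG", "TGAATC"))
print(in_silico_pcr(genome, "ATGCCG", "TGAATC", max_len=10))  # target too long
```

The second call returns `None` because the target exceeds the allowed product length, mirroring the situation in which a gene is too long to be recovered by standard PCR.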
Long PCR:
With some modifications in the PCR, it is now possible to amplify DNA fragments up to a length of 22 kb from the human genomic DNA. This is achieved by using a combination of two DNA polymerase enzymes, besides lowering the reaction temperature. One of the DNA polymerases has proof-reading activity to remove the mismatched bases.
Some commercial companies in fact provide enzyme cocktails ideally suited for long PCR, e.g., the TaqPlus Long PCR system marketed by Stratagene. It contains Taq polymerase and the thermostable proof-reading enzyme Pfu polymerase.
Long PCR has been applied to the structural analysis of human genes and of HIV genomes. Long PCR is unlikely to replace the construction of genomic libraries, because a DNA library is a permanent resource whereas the product of long PCR is temporary. Further, PCR can be employed for amplifying selected DNA fragments (of interest) from genomic libraries.
Fragment libraries:
Biotechnologists often have to work with minute quantities of starting material, e.g., single cells, fixed tissues or fossils. It is quite difficult to apply traditional techniques for the construction of genomic libraries to such samples. PCR is ideally suited for the isolation and amplification of genes from very small samples. Thus, PCR can be used for the creation of random genomic fragment libraries.
Complementary DNA Libraries:
Cloning of eukaryotic genes is rather complicated and requires special techniques. This is mainly due to the non-coding sequences (introns) in the DNA. In the eukaryotic cell as the gene is transcribed, the RNA undergoes several changes in the nucleus (referred to as splicing) to release a mature and functional mRNA into the cytoplasm.
In this manner, the introns are removed. In some genes, introns form the bulk of the gene. For instance, in the human dystrophin gene, as much as 99% of the DNA sequence is composed of introns, which separate as many as 79 exons!
Prokaryotes, particularly bacteria, do not possess the ability to remove introns. Hence the functional mRNA of a eukaryotic gene is not correctly formed in a prokaryotic cell. Thus, cloning of eukaryotic genes becomes a difficult task.
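The effect of splicing can be shown with a minimal sketch. The exon and intron sequences and the helper names are assumptions made for illustration; real splicing depends on sequence signals and cellular machinery that this toy model ignores.

```python
def splice(pre_mrna, exons):
    """Join the exon intervals (start, end) of a primary transcript, dropping introns."""
    return "".join(pre_mrna[start:end] for start, end in exons)


def intron_fraction(pre_mrna, exons):
    """Fraction of the primary transcript that is removed by splicing."""
    exon_length = sum(end - start for start, end in exons)
    return 1 - exon_length / len(pre_mrna)


# Toy transcript: two exons separated by one intron (illustrative sequences).
exon1, intron, exon2 = "AUGGCC", "GUAAGUUUUCAG", "UUUUAA"
pre_mrna = exon1 + intron + exon2
exons = [
    (0, len(exon1)),
    (len(exon1) + len(intron), len(pre_mrna)),
]

print("primary transcript:", pre_mrna)
print("mature mRNA       :", splice(pre_mrna, exons))
print("intron fraction   :", intron_fraction(pre_mrna, exons))
```

A bacterial host performs no such step, so a genomic clone of this toy gene would be expressed from the unspliced sequence, which is why intron-free cDNA is preferred for expression in prokaryotes.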
Synthesis of complementary DNA:
Complementary DNA (cDNA) is a double-stranded complement of an mRNA. cDNA can be synthesized from mRNA by reverse transcription. A functional eukaryotic mRNA, which does not have introns, possesses a G cap at the 5′ end and a poly(A) tail at the 3′ end (approximately 200 adenine residues).
The requisite mRNA is isolated and purified (particularly from cells which are rich in the specific mRNA e.g., pancreatic cells for insulin mRNA). An oligo-dT primer is added to bind to the short segment of poly A tail region (by annealing). This primer provides 3′-hydroxyl group for the synthesis of a DNA strand.
With the addition of the enzyme reverse transcriptase and four deoxynucleotides (dATP, dTTP, dGTP and dCTP), DNA synthesis proceeds. For the bases A, G, C and U in the template (mRNA), the corresponding complementary bases in DNA respectively are T, C, G and A. The newly synthesized first DNA strand has a tendency to fold back on to itself for a few nucleotides to form a hairpin loop (Fig. 9.3).
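The base-pairing rule used by reverse transcriptase can be written out as a short sketch. The function name, the toy mRNA and the minimum tail length are assumptions for illustration; hairpin formation and second-strand synthesis are not modelled.

```python
# Pairing rule from the text: A, G, C and U in the mRNA template
# specify T, C, G and A, respectively, in the new DNA strand.
RNA_TO_DNA = {"A": "T", "G": "C", "C": "G", "U": "A"}


def first_strand_cdna(mrna, min_tail=4):
    """Return the first cDNA strand (written 5'->3') for an mRNA written 5'->3'.

    The mRNA must end in a poly(A) tail so that an oligo-dT primer can anneal.
    """
    if not mrna.endswith("A" * min_tail):
        raise ValueError("no poly(A) tail for the oligo-dT primer to bind")
    # The new strand is antiparallel to the template, so read the mRNA backwards.
    return "".join(RNA_TO_DNA[base] for base in reversed(mrna))


# Toy mRNA: a short coding stretch followed by a poly(A) tail (illustrative).
mrna = "AUGGCUUGA" + "AAAAAA"
print(first_strand_cdna(mrna))
```

The cDNA strand begins with a run of T residues, which corresponds to the oligo-dT primer that started the synthesis.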
The loop of the first DNA strand serves as the primer for the synthesis of the second DNA strand. On addition of E. coli DNA polymerase (Klenow fragment), second-strand synthesis starts from the end of the hairpin loop. On treatment with the enzyme RNase H, the mRNA molecules are degraded. The enzyme S1 nuclease cleaves the hairpin loops and degrades single-stranded DNA extensions. The final products are complementary DNA copies of the original mRNA; some of them are complete while others are incomplete.
Limitation of the technique:
The main disadvantage of the hairpin method is the loss of a small sequence at the 5′ end of the cDNA due to cleavage by S1 nuclease.
Improved method for cDNA synthesis:
To overcome the limitation described above, some improvements have been made in cDNA synthesis. One such improved technique is shown in Fig. 9.4.
As the first strand of cDNA is synthesized, it is tailed with cytidine residues with the help of the enzyme terminal transferase. The mRNA strand is hydrolysed with alkali, and the full length cDNA is recovered. A synthetic oligo-dG primer is then annealed to oligo-dC. This in turn enables the synthesis of the second strand of cDNA. By this improved technique, a full length of cDNA corresponding to mRNA (in turn the gene) is obtained. But the efficiency of this method is comparatively lower.
Construction of cDNA libraries:
The complementary DNA molecules can be cloned in a cloning vector (e.g., a plasmid) to create cDNA libraries. The cDNA must be inserted into the vector in the correct orientation. This is achieved by the addition of a synthetic linker to the double-stranded cDNA. In a technique developed by Okayama and Berg (1982), the mRNA is first linked to the plasmid cloning vector and then cDNA synthesis is carried out.
RT-PCR as an alternative to cDNA cloning:
Reverse transcription followed by PCR (RT-PCR) can amplify the mRNA to give cDNA. RT-PCR is very rapid, so cDNA molecules can be obtained in a short period. Further, even long mRNAs can be conveniently used in RT-PCR. RT-PCR also has some disadvantages. The DNA polymerase used in RT-PCR is error-prone, and even a very minute contamination of the mRNA (with other mRNAs) will give false results.
Screening Strategies:
Once a DNA library or a cDNA library is created, the clones (i.e., the cell lines) must be screened for identification of specific clones. The screening techniques are mostly based on the sequence of the clone or the structure/function of its product.
Screening by DNA Hybridization:
The target sequence in a DNA can be detected with a DNA probe (Fig. 9.5). To start with, the double-stranded DNA of interest is converted into single strands by heat or alkali (denaturation). The two DNA strands are kept apart by binding them to a solid matrix such as a nitrocellulose or nylon membrane.
Now, the single strands of DNA probe (100-1,000 bp) labeled with radioisotope are added. Hybridization (i.e., base pairing) occurs between the complementary nucleotide sequences of the target DNA and the probe. For a stable base pairing, at least 80% of the bases in the two strands (target DNA and the probe) should be matching. The hybridized DNA can be detected by autoradiography.
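The 80% matching rule can be expressed as a small sketch. It is a deliberately crude model: it counts position-by-position matches between the target and the reverse complement of the probe, and ignores gaps, probe length, temperature and salt concentration, all of which affect real hybridization. The sequences and function names are assumptions for illustration.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def match_fraction(probe, target_window):
    """Fraction of positions at which the probe can base-pair with the window."""
    expected = reverse_complement(probe)
    matches = sum(a == b for a, b in zip(expected, target_window))
    return matches / len(probe)


def find_hybridization_sites(target, probe, min_match=0.8):
    """Return (position, match fraction) for every window meeting the threshold."""
    hits = []
    for i in range(len(target) - len(probe) + 1):
        fraction = match_fraction(probe, target[i:i + len(probe)])
        if fraction >= min_match:
            hits.append((i, fraction))
    return hits


probe = "ACGGATCCAT"  # toy probe, complementary to ATGGATCCGT
print(match_fraction(probe, "ATGGATCCGT"))  # perfect match
print(match_fraction(probe, "ATGCATCCGA"))  # two mismatches, still at threshold
print(match_fraction(probe, "ATGCATGCGA"))  # three mismatches, below threshold
print(find_hybridization_sites("TTTTTATGGATCCGTTTTTT", probe))
```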
DNA Probes:
The DNA probes used for screening purpose can be synthesized in many ways.
Random primer method:
Radioisotope-labeled DNA probes can be produced by this technique (Fig. 9.6). The double-stranded DNA containing the sequence needed to serve as a probe is denatured. A mixture of synthetic oligonucleotides, 6 nucleotides long and containing all possible combinations of the bases (A, G, C and T), serves as primers. Some of these primers, those with complementary sequences, will hybridize with the template DNA. This occurs entirely by chance, and the probability is reasonably good.
By the addition of four deoxyribonucleotides (one of them is radiolabeled) and in the presence of the enzyme DNA polymerase of E. coli (Klenow fragment), the primers are extended on the template DNA. Since a radioactive label is used, the newly synthesized DNA fragments are labeled at appropriate places, and these are the DNA probes. A number of labeled DNA probes can be produced from an unlabeled template DNA.
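The size of the random hexamer pool, and the subset of it that can prime on a given template, can be worked out with a short sketch. Exact complementarity is assumed, and the template is a toy sequence invented for illustration.

```python
from itertools import product

COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


# Every possible 6-nucleotide primer: 4 choices at each of 6 positions.
hexamers = ["".join(bases) for bases in product("ACGT", repeat=6)]


def priming_hexamers(template, k=6):
    """Return the hexamers that are exactly complementary to some site on the template."""
    windows = {template[i:i + k] for i in range(len(template) - k + 1)}
    return {reverse_complement(window) for window in windows}


template = "ATGGATCCGTAC"  # toy single-stranded template (illustrative)
primers = priming_hexamers(template)
print("hexamers in the pool       :", len(hexamers))
print("hexamers that can anneal   :", len(primers))
```

Only a small fraction of the pool anneals to any one short template, but because every hexamer is present, some primer is available for every position, which is why extension products cover the whole template.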
Non-isotopic DNA probes:
For the production of non-isotopic DNA probes, one of the four deoxynucleotides (used for primer extension described above) is tagged with a label (e.g., biotin). The label of the DNA probes can be detected by use of chemical and enzymatic reactions.
Screening by Colony Hybridization:
The DNA sequence in the transformed colonies can be detected by hybridization with radioactive DNA probes (sometimes labeled RNA probes can also be used). Colony hybridization technique is also referred to as replica plating by some authors. The technique depicted in Fig. 9.7 is briefly described.
The transformed cells are grown as colonies on a master plate. Samples of each colony are transferred to a solid matrix such as nitrocellulose or nylon membrane. The transfer is carefully carried out to retain the pattern of the colonies on the master plate. Thus, the nitrocellulose paper contains a photocopy pattern of the master plate colonies. The colony cells are lysed and deproteinized.
The DNA is denatured and irreversibly bound to the matrix. Now a radiolabeled DNA probe is added, which hybridizes with the complementary target DNA. The non-hybridized probe molecules are washed away. The colony with the hybridized probe can be identified on the autoradiograph. The cells of this colony (from the master plate) can be isolated and cultured.
Often, multiple colonies are detected on hybridization with a DNA probe. This is due to overlapping sequences. To identify which colony has the complete sequence of the target gene, data from restriction endonuclease analysis are helpful.
Modifications of colony hybridization technique:
Several improvements in the colony hybridization technique described above have been made in recent years. In the plaque lift technique, nitrocellulose paper is applied directly to the upper surface of the master agar plate, making direct contact. In this way, plaques can be lifted and several identical DNA prints can be made from a single plate. This technique increases reliability. More recently, screening of DNA libraries has been carried out by automated techniques.
Screening by PCR:
Polymerase chain reaction (PCR) is as good as the hybridization technique for screening DNA libraries. However, adequate information (on the flanking sequences of the target DNA) must be available to prepare primers for this method. The colonies are maintained in multiwell plates, each well is screened by PCR, and the positive wells are identified.
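Well-by-well screening can be sketched as follows. The plate layout, insert sequences and primers are invented for illustration, and a well is scored positive only when both primer sites are found in the correct order on the same insert; real screening reads the result from the presence of a PCR product.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def well_is_positive(inserts, forward_primer, reverse_primer):
    """True if any cloned insert in the well carries both flanking primer sites."""
    downstream_site = reverse_complement(reverse_primer)
    for insert in inserts:
        start = insert.find(forward_primer)
        if start != -1 and insert.find(downstream_site, start + len(forward_primer)) != -1:
            return True
    return False


# Hypothetical plate: well name -> insert sequences of the clones it holds.
plate = {
    "A1": ["GGGGCCCCGGGG"],                    # unrelated insert
    "A2": ["TTATGCCGAAAAACCCCCGATTCATT"],      # carries the whole target
    "A3": ["ATGCCGTTTT"],                      # forward site only
}

positive_wells = [
    well for well, inserts in plate.items()
    if well_is_positive(inserts, "ATGCCG", "TGAATC")
]
print(positive_wells)
```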
Screening by Immunological Assay:
Immunological techniques can be used for the detection of a protein or a polypeptide synthesized by a gene (through transcription followed by translation). The procedures adopted for the immunological assay and for the hybridization technique (described already) are quite comparable. The screening procedure by immunological assay is depicted in Fig. 9.8 and briefly described below.
The cells are grown as colonies on master plates, which are transferred to a solid matrix (i.e., nitrocellulose). The colonies are then subjected to lysis and the released proteins are bound to the matrix. These proteins are then treated with a primary antibody which specifically binds to the protein (acting as an antigen) encoded by the target DNA. After removing the unbound antibody by washing, a second antibody is added which specifically binds to the first antibody.
Again the unbound antibodies are removed by washing. The second antibody carries an enzyme label (e.g., horseradish peroxidase or alkaline phosphatase) bound to it. The detection process is so devised that when a colourless substrate is acted upon by this enzyme, a coloured product is formed. The colonies which give a positive result (i.e., coloured spots) are identified. The cells of a specific colony can be sub-cultured from the master plate.
Screening by Protein Function:
If the target DNA of the gene library encodes a protein (particularly an enzyme) that is not normally produced by the host cell, the protein activity can be used for screening. A specific substrate is used, and its utilization by a colony of cells indicates the presence of an enzyme that acts on the substrate. For instance, the genes coding for the enzymes α-amylase and β-glucosidase can be identified by this technique.