A genome library consists of a large collection of bacterial cultures, in each of which a different segment of the genomic DNA of an organism has been cloned by recombinant DNA technology, so that the entire collection represents its complete genome.

The analogy between a library of books and a genome library rests on the fact that both genes and books are storehouses of information. The information in a book is contained in sentences made from words, which are in turn formed from the letters of the alphabet. Similarly, a gene is a sequence of DNA bases that is read as three-letter codons.

The size of a genome library depends on the genome size. For example, if a library is to be set up for an organism with a genome of 4 × 10^6 base pairs (bp) using a vector with an insert size of 40,000 bp, the minimum number of bacterial clones (each clone being a separate culture) is (4 × 10^6)/(4 × 10^4), i.e. 100. However, because the cloned fragments are a random sample of the genome, the library has to be about three times larger, i.e. about 300 clones, to give a 95% probability that any given part of the genome is represented in the collection.
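The arithmetic above can be sketched in a few lines of Python. The 95% figure is computed here with the Clarke and Carbon formula, N = ln(1 - P) / ln(1 - f), where f is the fraction of the genome carried by one insert; the text does not name this formula, so treating it as the basis of the "three times larger" rule is an assumption. The function names are illustrative only.

```python
import math


def min_clones(genome_bp: int, insert_bp: int) -> int:
    """Fewest clones that could tile the genome if inserts never overlapped."""
    return (genome_bp + insert_bp - 1) // insert_bp  # integer ceiling division


def clones_for_coverage(genome_bp: int, insert_bp: int, probability: float) -> int:
    """Clarke-Carbon estimate (an assumption, not named in the text):
    N = ln(1 - P) / ln(1 - f), with f = insert size / genome size."""
    f = insert_bp / genome_bp
    return math.ceil(math.log(1 - probability) / math.log(1 - f))


if __name__ == "__main__":
    genome = 4_000_000  # 4 x 10^6 bp, as in the example above
    insert = 40_000     # 40,000 bp insert size
    print("minimum clones:", min_clones(genome, insert))
    print("clones for 95% probability:", clones_for_coverage(genome, insert, 0.95))
```

With the numbers from the example, the first value is 100 and the second comes out close to 300, in line with the rule of thumb that the library should be about three times the minimum size.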

The genome sizes of some viruses, E. coli and some eukaryotic organisms are shown in Table 9.9:

 

Genome Size of Some Viruses and Organisms

In a genome library, each “book” is a separate clone of bacteria in which a specific fragment of the DNA of the organism has been inserted via a vector. So, the number of books (volumes) in the library depends on the size of the total DNA and also on the size of the DNA that can be inserted into a single vector (the insert size). The common restriction enzymes, like EcoRI, HindIII and BamHI, have recognition sequences consisting of 6 base pairs.

On the other hand, some restriction enzymes, like NotI and SfiI, have recognition sites that specify 8 bp. A longer recognition sequence is less likely to occur in a given stretch of DNA than a shorter one. This means that restriction enzymes with a longer recognition sequence cleave a segment of DNA into a smaller number of larger fragments.

The average size of the fragments produced by cleavage with NotI and SfiI is about 66,000 bp (4^8 = 65,536 for a random sequence with equal base frequencies). However, such large fragments cannot be cloned into conventional plasmids or cosmids, and special types of vectors have been developed for this purpose. Larger DNA fragments are suitable for setting up libraries of eukaryotic organisms, because these organisms have larger genomes, and larger inserts reduce the number of clones needed to cover the entire genome.
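The relation between recognition-site length and fragment size can be checked with a short calculation. The sketch below assumes a random DNA sequence in which the four bases are equally frequent, so that a site of n specified bases occurs on average once every 4^n bp; real genomes deviate from this, and the function names are illustrative only.

```python
def expected_fragment_bp(site_length: int) -> int:
    """Average spacing between sites of `site_length` specified bases,
    assuming random sequence with equal base frequencies."""
    return 4 ** site_length


def expected_fragment_count(dna_bp: int, site_length: int) -> float:
    """Approximate number of fragments produced from `dna_bp` of DNA."""
    return dna_bp / expected_fragment_bp(site_length)


if __name__ == "__main__":
    genome = 4_000_000  # same 4 x 10^6 bp genome used earlier in the text
    for name, n in [("6-bp cutter (e.g. EcoRI)", 6), ("8-bp cutter (e.g. NotI)", 8)]:
        size = expected_fragment_bp(n)
        count = expected_fragment_count(genome, n)
        print(f"{name}: ~{size:,} bp per fragment, ~{count:,.0f} fragments")
```

Under these assumptions a 6-bp cutter gives fragments of about 4,000 bp and an 8-bp cutter fragments of about 65,500 bp, sixteen times larger.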

Large DNA fragments exceeding 60 kb in size, such as those produced by NotI, can be separated by a specialized gel electrophoresis technique known as pulsed field gel electrophoresis (PFGE). In conventional gel electrophoresis the electric field is maintained in a constant direction. In PFGE, by contrast, the orientation of the electric field is changed periodically by alternating it between two sets of electrodes at right angles to each other.

This improves the separation of large DNA fragments, because larger fragments take more time to reorient themselves in response to the changed electric field. The separated DNA fragments are restriction fragments; differences between individuals in the lengths of corresponding fragments are known as restriction fragment length polymorphisms (RFLPs), and they can be detected by conventional gel electrophoresis as well as by PFGE.

For cloning large DNA fragments, the common bacterial plasmids are unsuitable as vectors, because their insert size is small. Bacteriophage P1 has a large insert size of about 85 kb. It has been used successfully for cloning genes of Drosophila. For the insertion of still larger fragments, up to 1,000 kb (1 Mb) in size, special vectors have been developed by genetic engineering techniques. One such vector is known as the yeast artificial chromosome (YAC).
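To see why insert size matters for large genomes, the following sketch compares the minimum number of clones needed for a genome of 3 billion bp (the human genome size quoted in this article) with the three insert sizes mentioned in the text: 40,000 bp, 85 kb for P1 and 1,000 kb for a YAC. It counts only non-overlapping inserts, so a real library would need several times more clones; the dictionary labels and function name are illustrative.

```python
def min_clones(genome_bp: int, insert_bp: int) -> int:
    """Fewest non-overlapping inserts needed to span the genome."""
    return (genome_bp + insert_bp - 1) // insert_bp  # integer ceiling division


HUMAN_GENOME_BP = 3_000_000_000  # about 3 billion bp

# Insert sizes taken from the text; labels are for illustration only.
INSERT_SIZES_BP = {
    "40,000 bp vector": 40_000,
    "bacteriophage P1 (85 kb)": 85_000,
    "YAC (1,000 kb)": 1_000_000,
}

if __name__ == "__main__":
    for vector, insert_bp in INSERT_SIZES_BP.items():
        print(f"{vector}: at least {min_clones(HUMAN_GENOME_BP, insert_bp):,} clones")
```

The count falls from tens of thousands of clones with the smaller inserts to a few thousand with YAC-sized inserts.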

A YAC vector contains four types of genetic elements: a cloning site; a yeast centromere and genetic markers that can be selected in yeast; an E. coli plasmid origin of replication and genetic markers that can be selected in E. coli; and a pair of telomere sequences from Tetrahymena. Thus, a YAC is a shuttle vector which can replicate in both E. coli and yeast.

The structure of a YAC vector is shown diagrammatically in Fig. 9.141:

Structure of Yeast Artificial Chromosome Vector

When a YAC vector is used for cloning, the circular vector is isolated after growth in E. coli and digested with two different restriction enzymes, A and B, as shown in Fig. 9.141. Cleavage by B opens the circular vector and eliminates the portion lying between the telomeres. The cleavage site of the other restriction enzyme (A) lies within the cloning site of the vector. This enzyme opens the insertion site, where the large DNA fragment can be inserted and ligated to form a recombinant DNA molecule.

The recombinant YAC vector can then be introduced into yeast cells by transformation. Vectors that carry donor DNA at the cloning site can be selected in yeast cells through the inactivation of a marker gene located at the cloning site. Finally, the yeast cells which have acquired a particular DNA fragment (a gene) can be identified by the colony hybridization technique.

A genome library has the advantage that it can be maintained indefinitely merely by periodic transfer of the cultures of the host organisms, like bacteria or yeast, to fresh media. A useful DNA fragment can be retrieved from the library whenever required, provided the clone carrying that particular DNA is known.

The construction of artificial cloning vectors, like YACs, capable of carrying large DNA fragments has encouraged research in mapping and sequencing the complex genomes of eukaryotic organisms, including humans. The ambitious Human Genome Project was undertaken to map the genes of the human genome, then estimated at about 70,000 (the finished sequence points to roughly 20,000 protein-coding genes), and to determine the nucleotide sequence of the entire genome, which is about 3 billion base pairs long in the haploid set of 23 chromosomes (46 chromosomes in diploid cells, 2n).

The development of automated DNA sequencing methods has made it possible to determine the sequence of more than 1,000 bases in a DNA fragment per day. Complete genomes have been sequenced for a number of other organisms, including yeast, E. coli, Arabidopsis, Caenorhabditis, and several other bacteria.
