In this article we will discuss the plant genome project.
A major revolution in the study of the genomes of different species was brought about by the availability of recombinant DNA and PCR technologies. These techniques helped in the preparation of molecular maps of many plant and animal genomes. The objective of genomic research in any species is to sequence the whole genome and to decipher the functions of all its coding and non-coding sequences.
Large-scale DNA sequencing technology has enabled scientists to undertake genome sequencing projects on a realistic time scale. Since the first 'large' genome, that of bacteriophage λ, was sequenced in 1983, genome projects on many different groups of organisms have been completed.
Some notable examples include the bacterium Escherichia coli, the yeast Saccharomyces cerevisiae, the weed Arabidopsis thaliana, rice (Oryza sativa), the nematode Caenorhabditis elegans, the fruit fly Drosophila melanogaster, the mouse Mus musculus, the chimpanzee Pan troglodytes and the human Homo sapiens sapiens.
In crop science, genomics has been advanced using two model species: Arabidopsis, a weed found all over the world that is easy to grow, has a short life cycle and has one of the smallest genomes among dicot plants; and Oryza sativa (rice), a monocot with a relatively simple genome organisation and close similarity to the other major cereals.
For any genome sequencing programme, the following steps are undertaken (Fig. 22.18):
1. Construction of Linkage Maps with Molecular Markers:
Different kinds of molecular markers, such as RFLPs, RAPDs, microsatellites and AFLPs, are used to construct a linkage map. The RFLP probes are then hybridised to the vectors carrying the DNA fragments that are to be sequenced, which positions the various cloned fragments along the chromosome.
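A linkage map orders markers by how often recombination separates them in a segregating population. As a minimal sketch, assuming hypothetical counts (18 recombinant progeny out of 200) and the Haldane mapping function (one common choice; the article does not say which mapping function was used), a recombination fraction converts to a map distance in centimorgans as follows:

```python
import math


def recombination_fraction(recombinants: int, total: int) -> float:
    """Fraction of progeny in which the two markers were separated by crossing over."""
    return recombinants / total


def haldane_cm(r: float) -> float:
    """Haldane mapping function: map distance in centimorgans for recombination fraction r.

    Assumes crossovers occur independently (no interference). r must be below 0.5,
    because r = 0.5 means the two markers behave as unlinked.
    """
    if not 0 <= r < 0.5:
        raise ValueError("recombination fraction must lie in [0, 0.5)")
    return -50 * math.log(1 - 2 * r)


# Hypothetical example: 18 recombinants among 200 progeny scored for two RFLP markers.
r = recombination_fraction(18, 200)
print(f"r = {r:.2f}, distance = {haldane_cm(r):.1f} cM")
```

For small r the distance is close to 100 × r cM; the mapping function matters more over longer intervals, where double crossovers go undetected.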
2. Gene Libraries:
Genomic libraries are constructed by fragmenting the DNA (by restriction endonuclease digestion or by shearing) and cloning the fragments into vectors such as cosmids or YACs. Such libraries are essential for identifying overlapping clones.
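The required size of a library is commonly estimated with the Clarke-Carbon formula, N = ln(1 - P) / ln(1 - f). Here f is the insert size as a fraction of the genome and P is the desired probability that any given sequence is present. The sketch below is minimal and assumes illustrative insert sizes of 40 kb for a cosmid and 500 kb for a YAC. It also uses the 145 Mbp Arabidopsis genome size quoted later in this article.

```python
import math


def clones_needed(genome_kb: float, insert_kb: float, probability: float = 0.99) -> int:
    """Clarke-Carbon estimate of how many random clones give `probability`
    that any particular sequence is represented in the library."""
    fraction = insert_kb / genome_kb
    return math.ceil(math.log(1 - probability) / math.log(1 - fraction))


genome_kb = 145_000  # 145 Mbp, the Arabidopsis estimate given in this article
for vector, insert_kb in [("cosmid", 40), ("YAC", 500)]:  # illustrative insert sizes
    print(vector, clones_needed(genome_kb, insert_kb))
```

Larger inserts need far fewer clones for the same coverage. That is one reason large-insert vectors are attractive for building the initial contig maps.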
3. Screening of Libraries and Constructing Contigs:
Sequence-tagged sites (STSs) and expressed sequence tags (ESTs) are very useful in the sequencing process: they help in screening the libraries for overlapping clones and facilitate annotation of the final sequence. Once ordered, overlapping clones are available, contigs are constructed; these represent stretches of contiguous, sequence-ready clones.
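One way clones are grouped into contigs is STS-content mapping: two clones that both contain the same STS must overlap. A minimal sketch, assuming the STS content of each clone has already been determined by screening (the clone and marker names are hypothetical), treats shared STSs as links and collects linked clones with a union-find structure:

```python
from collections import defaultdict


def build_contigs(clone_sts: dict[str, set[str]]) -> list[list[str]]:
    """Group clones into contigs, treating clones that share an STS as overlapping."""
    parent = {clone: clone for clone in clone_sts}

    def find(clone: str) -> str:
        # Follow parent links to the representative clone of the group.
        while parent[clone] != clone:
            parent[clone] = parent[parent[clone]]  # path halving keeps trees shallow
            clone = parent[clone]
        return clone

    def union(a: str, b: str) -> None:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    # Index the clones by each STS they contain.
    clones_by_sts: dict[str, list[str]] = defaultdict(list)
    for clone, markers in clone_sts.items():
        for sts in markers:
            clones_by_sts[sts].append(clone)

    # All clones sharing an STS overlap, so link each of them to the first one.
    for clones in clones_by_sts.values():
        for other in clones[1:]:
            union(clones[0], other)

    groups: dict[str, list[str]] = defaultdict(list)
    for clone in clone_sts:
        groups[find(clone)].append(clone)
    return sorted(sorted(group) for group in groups.values())


# Hypothetical STS content for five YAC clones.
clone_sts = {
    "YAC1": {"STS1", "STS2"},
    "YAC2": {"STS2", "STS3"},
    "YAC3": {"STS3", "STS4"},
    "YAC4": {"STS7", "STS8"},
    "YAC5": {"STS8"},
}
print(build_contigs(clone_sts))
```

This gives contig membership only. Ordering the clones within a contig also requires knowing the order of the STSs along the chromosome.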
4. Sequencing:
After the YAC clones have been identified, they are further fragmented with a restriction endonuclease and subcloned into cosmids. Overlapping cosmid clones are then identified, sheared, cloned into M13 vectors or pUC plasmids, and sequenced by the Sanger method.
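Once the shotgun subclones have been sequenced, the reads must be put back together by their overlaps. The toy sketch below uses greedy overlap assembly and repeatedly merges the pair of reads with the longest suffix-prefix overlap. It assumes error-free reads, no repeats, no read contained within another, and a hypothetical minimum overlap of 3 bases. Real assemblers must also handle sequencing errors and repeats.

```python
def overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of `a` that equals a prefix of `b`
    (at least `min_len` long), or 0 if there is none."""
    start = 0
    while True:
        # Look for the first min_len bases of b somewhere in a.
        start = a.find(b[:min_len], start)
        if start == -1:
            return 0
        # Accept only if everything from there to the end of a matches b's start.
        if b.startswith(a[start:]):
            return len(a) - start
        start += 1


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Repeatedly merge the two reads with the longest overlap until none remain."""
    reads = list(reads)
    while len(reads) > 1:
        best_len, best_i, best_j = 0, -1, -1
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    olen = overlap(a, b, min_len)
                    if olen > best_len:
                        best_len, best_i, best_j = olen, i, j
        if best_len == 0:
            break  # no overlaps left: the remaining reads are separate contigs
        merged = reads[best_i] + reads[best_j][best_len:]
        reads = [r for k, r in enumerate(reads) if k not in (best_i, best_j)] + [merged]
    return reads


# Hypothetical error-free reads from a short stretch of a subcloned fragment.
print(greedy_assemble(["ACGTTGC", "TGCAAGG", "AGGCTTAC"]))
```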
5. New Vectors BAC & PAC and the Shotgun Approach:
YAC clones were found to be unstable and often chimeric, so alternative vectors became necessary: the BAC (Bacterial Artificial Chromosome) and the PAC (P1-derived Artificial Chromosome), into which inserts of about 100 kb can be cloned easily.
The ends of every BAC clone are sequenced, and with the aid of computers and specialised software these end sequences are matched against other BAC clones to detect overlaps. This reduces the number of BAC clones that have to be sequenced in full, which makes it a more powerful technique.
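After BAC end sequences have placed the clones relative to one another, only a minimal set of clones covering the region needs to be sequenced in full. The sketch below is minimal and uses the standard greedy interval-cover method to select such a minimal tiling path. It assumes that clone coordinates (in kb) have already been inferred from end-sequence matches. The clone names and coordinates are hypothetical.

```python
def minimal_tiling_path(bacs: dict[str, tuple[int, int]], region_start: int, region_end: int) -> list[str]:
    """Pick the fewest clones that together span [region_start, region_end]."""
    ordered = sorted(bacs.items(), key=lambda item: item[1][0])  # by start coordinate
    path: list[str] = []
    covered = region_start
    i = 0
    while covered < region_end:
        best_name, best_end = None, covered
        # Among clones starting at or before the covered point, take the one reaching furthest.
        while i < len(ordered) and ordered[i][1][0] <= covered:
            name, (_, end) = ordered[i]
            if end > best_end:
                best_name, best_end = name, end
            i += 1
        if best_name is None:
            raise ValueError(f"no clone extends coverage beyond {covered} kb")
        path.append(best_name)
        covered = best_end
    return path


# Hypothetical BAC positions (start, end) in kb, inferred from end-sequence overlaps.
bacs = {
    "BAC_A": (0, 120),
    "BAC_B": (30, 150),
    "BAC_C": (100, 230),
    "BAC_D": (140, 260),
    "BAC_E": (220, 350),
    "BAC_F": (250, 340),
}
print(minimal_tiling_path(bacs, 0, 350))
```

Here three of the six clones suffice to cover the 350 kb region. In this sketch a clone that merely abuts the covered stretch is accepted; a real pipeline would require some overlap so that the sequences can be joined.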
Sequencing of the Arabidopsis thaliana genome was followed by efforts to sequence the genomes of several crop plants, such as cereals, oilseeds, legumes and vegetables. The plant genomes sequenced, or targeted for sequencing, during the first decade of the present century are listed in Table 22.2.
Arabidopsis Genome Project:
Three American groups, led by Meyerowitz, Somerville and Goodman at three different universities, took the first steps towards sequencing the Arabidopsis genome, an effort that later became the Arabidopsis Genome Initiative (AGI).
The first RFLP map was created by the Meyerowitz group and the second by the Goodman group; the two were later integrated. Initially the conventional clone-by-clone approach was dominant, but later a shotgun approach was followed using cDNA libraries and ESTs.
The first reports, on chromosomes 2 and 4, appeared in 1999, and the whole genome has since been sequenced.
Some of the main findings are as follows:
1. The genome is approximately 145 Mbp long, with a gene (coding region about 2-2.5 kb) occurring roughly every 4-5 kb.
2. Telomere and centromere regions are full of repeated DNA.
3. Centromeric region has transposons and pseudogenes.
4. Sometimes entire stretches of DNA and genes are duplicated between chromosomes.
5. Approximately 20% of the genes encode signal sequences that target their products to organelles such as chloroplasts or mitochondria. Essentially the entire mitochondrial genome of Arabidopsis is represented on chromosome 2.
6. The genome has about 20,000-50,000 genes belonging to various functional groups.
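Items 1 and 6 can be cross-checked with simple arithmetic. The sketch below assumes, crudely, that genes are evenly spaced across the whole 145 Mbp. Items 2 and 3 show that the repeat-rich telomeric and centromeric regions are not typical gene-dense sequence, so the assumption likely overestimates the count.

```python
def estimated_gene_count(genome_kb: float, spacing_kb: float) -> float:
    """Genes expected if one gene occurred every `spacing_kb` along the genome."""
    return genome_kb / spacing_kb


genome_kb = 145_000  # approx. 145 Mbp (item 1)
for spacing_kb in (5, 4):  # one gene every 4-5 kb (item 1)
    print(f"one gene per {spacing_kb} kb: about {estimated_gene_count(genome_kb, spacing_kb):,.0f} genes")
```

The resulting 29,000 to 36,250 genes falls within the 20,000-50,000 range given in item 6.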
Rice Genome Project:
Scientists from Japan started the Rice Genome Programme (RGP) in 1991, and its second phase was launched in 1998. Ten countries now participate in the International Rice Genome Sequencing Project (IRGSP).
Work on the rice genome, beginning with Monsanto's 2000 draft of the genome of the Japonica variety, has revealed the following:
a. The rice genome is estimated to comprise 420-466 Mbp of DNA.
b. In 2000 Monsanto produced 399 Mb of sequence from 3,391 BAC clones (not publicly available).
c. In 2002 the Beijing Genomics Institute published a draft sequence of the Indica rice variety, and Syngenta published a draft sequence of the Japonica variety.
d. As with Arabidopsis, early work used random genomic clones, but in recent years cDNAs and ESTs have supplied a large number of markers. IRGSP has made progress in preparing contig maps using YAC and BAC clones.
e. The available sequence data indicate that rice does not have many more genes than Arabidopsis, and that most genes of the two plants show homology.
f. According to GenBank data, 28,282,731 bases of sequence have been submitted by rice EST sequencing projects.
g. The average rice gene is 2.2 kbp long and contains 3.9 exons and 2.9 introns. Gene density in rice is about one gene per 5.7 kbp.
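The figures in item g are internally consistent. A gene with n exons has n - 1 introns, so the two averages differ by exactly one. The short sketch below also converts the density into genes per 100 kb and into the rough share of sequence occupied by genes. It assumes that the 2.2 kbp average gene length and the density of one gene per 5.7 kbp describe the same sequenced regions.

```python
def genes_per_100kb(kb_per_gene: float) -> float:
    """Number of genes in 100 kb at the given gene spacing."""
    return 100 / kb_per_gene


def genic_fraction(mean_gene_kb: float, kb_per_gene: float) -> float:
    """Share of sequence lying within genes, given average gene length and spacing."""
    return mean_gene_kb / kb_per_gene


mean_exons = 3.9
mean_introns = mean_exons - 1  # each gene with n exons has n - 1 introns

print(f"introns per gene: {mean_introns:.1f}")
print(f"rice genes per 100 kb: {genes_per_100kb(5.7):.1f}")
print(f"share of sequence in genes: {genic_fraction(2.2, 5.7):.0%}")
```

By the same arithmetic, the Arabidopsis figure of one gene every 4-5 kb corresponds to 20-25 genes per 100 kb. On these numbers, rice genes are somewhat more widely spaced.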