In this article we will discuss: 1. Organisation and Significance of Arabidopsis Genome 2. Sequencing Strategies of Arabidopsis Genome 3. Complete View 4. Annotation of Flowering Pathway.
Organisation and Significance of Arabidopsis Genome:
Arabidopsis has emerged as one of the most widely used organisms for studying the biology of higher plants. This tiny weed belongs to the family Brassicaceae and is closely related to many food plants such as cabbage, cauliflower, turnip and radish. It offers several advantages to researchers studying plant biology and genomics.
Using Arabidopsis, researchers can manipulate light, soil and nutrients to study their effects on organ development, physiology and seed production. The plant can also serve as a miniature laboratory in which viral, bacterial and fungal pathogens are deployed to assess the damage they cause. With the Arabidopsis genome sequence in hand, finding a crop gene involved in any of the above processes is much simpler than before.
The year 2000 must be remembered as a landmark year in the field of biology. The Arabidopsis genome sequencing consortium sequenced 118.7 million base pairs of this plant's nuclear genome. This is perhaps the most accurate eukaryotic genome sequence obtained so far.
More than 30 megabases of annotated genome sequence had already been deposited in GenBank by a consortium of European laboratories. Other eukaryotes with sequenced genomes are the yeast Saccharomyces cerevisiae, the nematode worm Caenorhabditis elegans and the fruit fly Drosophila melanogaster.
Although Arabidopsis thaliana is a weed with no agronomic value, it has several advantages for genome analysis, including a short life span, tiny size, a small number of chromosomes and a relatively small nuclear genome.
These advantages encouraged the scientific community to form an international collaboration, the Arabidopsis Genome Initiative (AGI), which began sequencing the genome in 1996. This group was well funded to provide a high-quality annotation of the entire genome.
Six research groups in Japan, Europe and the United States collaborated on the sequence. The genome sequencing groups sequenced and annotated their assigned chromosomal regions. The genome analysis group carried out the analysis, and contributing authors interpreted it, incorporating other data and analyses.
Sequencing Strategies of Arabidopsis Genome:
The Arabidopsis Genome Initiative group employed large-insert bacterial artificial chromosome (BAC), phage (P1) and transformation-competent artificial chromosome (TAC) libraries as the primary resources for sequencing.
Similarly, a combined strategy using sequence-tagged site (STS) markers and physical map construction with P1, TAC and BAC clones was used for sequencing chromosome 3. Cytoplasmic tRNAs were identified using tRNAscan-SE. A combination of algorithms was used to define gene structure.
Depending on its sensitivity, gene prediction software predicted nearly 25,498 genes. By comparison, C. elegans has 19,099 genes and Drosophila 13,601 genes. Arabidopsis and C. elegans have similar gene densities, whereas Drosophila has a lower gene density. The majority of the predicted Arabidopsis genes have functions in metabolism, gene regulation and defence.
Only 8 to 23% of Arabidopsis proteins involved in transcription have related genes in other eukaryotic organisms. In contrast, 48 to 60% of the genes involved in protein synthesis have similar genes in other eukaryotic genomes. A total of 11,601 protein types were identified (Fig. 25.2).
The absolute number of Arabidopsis gene families is in the same range as in other eukaryotes, indicating that a set of 11,000 to 15,000 protein types is sufficient for a wide diversity of multicellular life.
Arabidopsis genes are compact and contain exons of about 250 base pairs, separated by short non-coding introns. The genes are closely spaced, about 4.6 kilobases apart. The compact nature of the genes has helped the Arabidopsis genome to undergo extensive reorganisation. The exons in Arabidopsis are richer in guanine and cytosine bases (44%) than the introns (32%). This difference is a distinctive feature of plant genes.
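The GC comparison between exons and introns can be illustrated with a short calculation. The sketch below is only an illustration: the function name `gc_fraction` and the two toy sequences are hypothetical and are not real Arabidopsis data.

```python
def gc_fraction(seq: str) -> float:
    """Return the fraction of G and C among the A/C/G/T bases of seq."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    total = gc + seq.count("A") + seq.count("T")
    if total == 0:
        raise ValueError("sequence contains no A, C, G or T bases")
    return gc / total


# Hypothetical toy sequences, chosen only to show the calculation.
toy_exon = "ATGGCGTCCGATGCAGGCTTAGCAT"
toy_intron = "GTAAGTTTTATATTAATTTTCTGCAG"

print(f"exon GC:   {gc_fraction(toy_exon):.0%}")
print(f"intron GC: {gc_fraction(toy_intron):.0%}")
```

Applied to annotated exon and intron sequences, the same function would give the kind of percentages quoted above. Ambiguous bases such as N are left out of the denominator, which is one possible convention among several.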
Complete sequencing of the Arabidopsis genome provides a broader insight into chromosomal organisation. The genomic sequence revealed 1,528 tandem arrays containing 4,140 individual genes. This shows that nearly 17% of all genes of Arabidopsis are arranged in tandem arrays. BLAST software was used to identify collinear clusters of genes residing in duplicated segments.
Several software tools are used in the pipeline to annotate Arabidopsis thaliana (Table 25.2). Transposons exist in Arabidopsis and account for at least 10% of the genome. The genome has both class I and class II elements. Several passenger genes are known to be present in transposable units. Retrotransposons such as copia and gypsy, and long and short interspersed nuclear elements, are well represented in Arabidopsis.
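The tandem array figures can be checked with simple arithmetic. The sketch below assumes the denominator is the 25,498 predicted genes mentioned earlier in this article; the variable names are illustrative.

```python
# Figures quoted in the text.
TANDEM_ARRAYS = 1_528
GENES_IN_ARRAYS = 4_140
PREDICTED_GENES = 25_498  # assumed denominator: total predicted genes

share = GENES_IN_ARRAYS / PREDICTED_GENES
mean_array_size = GENES_IN_ARRAYS / TANDEM_ARRAYS

print(f"share of genes in tandem arrays: {share:.1%}")
print(f"mean genes per array: {mean_array_size:.2f}")
```

With these inputs the share comes out at about 16%, close to the quoted figure of nearly 17%; the exact value depends on which gene total is used as the denominator. The mean of under three genes per array suggests that most arrays are small.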
Complete View of Arabidopsis Chromosome:
The first chromosome of Arabidopsis measures 29,106,111 base pairs. The top and bottom arms contain sequences of around 14,449,213 and 14,655 respectively. The total number of genes present on the chromosome is around 6,543. The total length of its 35,000 exons is 8,192,559 base pairs. Similarly, the number of introns goes up to 28,430.
Chromosome 2 is 19,646,945 base pairs (19.6 Mb) in length, of which 3,607,090 and 16,039,851 base pairs belong to the top and bottom arms respectively. About 4,030 genes have been found on chromosome 2. Average gene length is around 1,949 base pairs and peptide length (bp) is 42%. Sequencing of chromosome 2 was completed and published in 1999 by the Arabidopsis Genome Initiative group at The Institute for Genomic Research (TIGR). This group contributed one third of the entire sequence.
Chromosome 3 is sub-metacentric and represents about 20% of the Arabidopsis genome. Both yeast artificial chromosome (YAC) and transformation-competent artificial chromosome (TAC) clones were employed in the broader investigation of chromosome 3.
Together, these techniques yielded 23,172,617 base pairs of non-redundant sequence. The bottom arm contains 9,586,343 base pairs, whereas the top arm contains 13,590,268 base pairs. The coding and non-coding regions of chromosome 3 account for 35% and 44% respectively.
Nearly 43% of the DNA potentially encodes proteins. Chromosome 3 harbours an unexpected guest, a 5 kb stretch of chloroplast DNA. It also contains a number of genes that are homologous to human disease genes. Now that the human genome sequence has been published, analysis of human genes can offer several clues about the functions of these genes in plants.
Chromosome 4 has the smallest number of genes (3,825). The total length of the chromosome is 17,549,867 bp, of which the top arm occupies 3,052,106 bp. The average gene length on this chromosome is around 2,138 bp, while the average peptide length is 448 bp. The total numbers of exons and introns on chromosome 4 are 20,090 and 16,240 respectively. The proteins predicted to be encoded by the 3,825 genes on chromosome 4 were characterised with INTERPRO software.
Chromosome 5 is 26 megabases long. It is the second largest Arabidopsis chromosome and represents almost 21% of the sequenced regions of the genome (The Cold Spring Harbor and Washington University sequencing consortium, 2000). In the sequencing strategy, chromosome 5 was analysed using a number of overlapping bacterial artificial chromosome (BAC), phage (P1) and transformation-competent artificial chromosome (TAC) clones.
The total sequence comprises 25,953,409 bp. The average gene density on chromosome 5 is about 1 gene per 4.4 kb. Gene density varies along the chromosome, with low gene density close to the centromere. Chromosome 5 carries 5,874 genes, with an average gene length of 1,974 bp.
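The gene density figure follows directly from the chromosome length and gene count. The short sketch below reproduces it for chromosome 5 and, for comparison, for chromosome 4, using the lengths and gene counts quoted in this article; the helper name `kb_per_gene` is illustrative.

```python
def kb_per_gene(length_bp: int, gene_count: int) -> float:
    """Average spacing between genes, in kilobases per gene."""
    if gene_count <= 0:
        raise ValueError("gene_count must be positive")
    return length_bp / gene_count / 1000


# (length in bp, number of genes) as quoted in the text.
chromosomes = {
    "chromosome 4": (17_549_867, 3_825),
    "chromosome 5": (25_953_409, 5_874),
}

for name, (length_bp, genes) in chromosomes.items():
    print(f"{name}: 1 gene per {kb_per_gene(length_bp, genes):.1f} kb")
```

This is a chromosome-wide average only. As noted above, the local density falls near the centromere, so a windowed count along the chromosome would be needed to show that variation.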
Chromosome 5 contains 4,110 genes encoding proteins of predicted function. The major classes among these are proteins involved in metabolism (21%), transcription (18.6%) and defence (11.9%). The chromosome also contains 37 families of genes encoding proteins of predicted function. Several genes on chromosome 5 show a high degree of similarity to genes of known function in other organisms that had not previously been identified in Arabidopsis.
Interestingly, eighty-eight genes on chromosome 5 have high similarity to the 289 genes involved in human diseases. Most of these genes are also well conserved in other organisms such as Drosophila and Caenorhabditis, revealing the significant potential of Arabidopsis genome analysis to add to our knowledge of the human genome.
Annotation of Flowering Pathway in Arabidopsis:
Annotation can be understood as the attachment of information to a genomic sequence. Annotation of the long-day plant Arabidopsis has revealed several pathways that regulate floral induction. Genes that determine chromatin structure are also involved in floral induction.
Many photoreceptors, such as phytochromes (PHY) and cryptochromes (CRY), are involved in floral induction: Arabidopsis contains five PHY and two CRY genes. In addition, circadian genes are well conserved in Arabidopsis. They include the ARABIDOPSIS PSEUDO-RESPONSE REGULATOR (APRR) gene family.
At least two MYB-related genes, LHY and CCA1, have been identified in the Arabidopsis circadian clock system. Major quantitative trait locus (QTL) genes for flowering time in Arabidopsis are CONSTANS (CO, a circadian clock-related floral regulator) and CONSTANS-LIKE (COL1 and COL2).
FLOWERING LOCUS C (FLC) and other MADS-box genes are key regulators of the flowering and vernalization pathways in Arabidopsis. Certain Arabidopsis genes that affect floral induction belong to the Polycomb group: VRN2, EMBRYONIC FLOWER 2 (EMF2) and FERTILIZATION INDEPENDENT ENDOSPERM (FIE). VRN2 can affect chromatin structure to repress gene expression. Similarly, the LIKE HETEROCHROMATIN PROTEIN 1 (LHP1) gene may be involved in heterochromatin formation and affect flowering time in Arabidopsis.
In rice, the counterparts of CO and FT were recently identified as the flowering time QTLs Hd1 and Hd3a. The PHOTOPERIOD SENSITIVITY 5 (SE5) gene, which encodes a haem oxygenase for phytochrome chromophore biosynthesis, is indispensable for photoperiodic flowering. In addition, the diurnal mRNA expression patterns of circadian clock genes such as LHY are well conserved.