This article provides an overview of the cellular and molecular aspects of biodiversity on Earth.

Introduction:

Biodiversity is the variability among living organisms from all sources, including land-based and aquatic ecosystems, and the ecosystems of which they are part. This includes diversity within species, between species, and of ecosystems.

Diversity is key to the continuance of life on Earth. It is also a fundamental requirement for the adaptation, survival, and continued evolution of species.

The word ‘biodiversity’ is a recent addition to the English language, with roots in both ecology and evolution, and its definition is still evolving. Botanists and zoologists typically consider biodiversity to refer to the variation and frequency of organisms within a given area, whereas evolutionary biologists may prefer to include in this definition the genes that contribute to the variation within a species as well.

In either case, biodiversity can generally be defined as the total composition of evolutionary units in a given environment. Taxonomists who sought to identify and classify the entire consortium of species in a given environment have traditionally influenced studies of biodiversity.

By extensive study of many environments, a list of all of the world’s species could be devised. Studies in this area have consisted of random samplings of a given environment, and have focused primarily on eukaryotic organisms that can be collected and described macroscopically. To date, approximately two million species have been described and named.

This gives rise to an estimate of 5-10 million eukaryotic species globally. We have a reasonable understanding of most of the birds, mammals, reptiles and amphibians, with respect to their numbers and distribution across the globe. The same is not true of plants, fungi, insects and other invertebrate species: virtually nothing is known of the majority of these, except that they exist.

Rise of Biological Diversity:

The history of global biological diversity is best seen in the marine animals, since the ocean is where life started and marine animals are the best represented in the fossil record. Multicellular animals first appeared about 600 million years ago in the early Paleozoic, and there was a rapid rise in the number of families during the Cambrian and Ordovician. Diversity remained relatively constant (perhaps even declining) until about 200 million years ago and then rose again to its current all-time high of close to 800 families.

Four Eras are Recognized, and are Characterized by Typical Life Forms:

1. Precambrian: The origin of life.

2. Paleozoic (“Ancient Life”): The origin of plants, most invertebrate types, the first vertebrates (back-boned animals, including fishes, amphibians, and reptiles).

3. Mesozoic (the “Age of Reptiles”): The origin of flowering plants, dinosaurs, birds, and mammals.

4. Cenozoic (the “Age of Mammals”): The diversification of flowering plants, insects, birds and mammals, and the appearance of humans.

The eras are divided into periods. Biological diversity was dramatically depleted by five mass extinction episodes at the ends of the Ordovician, Devonian, Permian, Triassic and Cretaceous periods. At each of these times a large fraction of existing species was wiped out, leaving the survivors to repopulate the biological world.

The most famous of these was the extinction at the end of the Cretaceous, because it ended the age of dinosaurs and made possible the evolution and dominance of mammals. But it was not the most devastating of the mass extinctions. The Earth was formed about 4.5 billion years ago.

By the end of roughly the first billion years, however, microbes with the ability to produce oxygen were becoming widespread, releasing large quantities of this reactive molecular gas into the oceans and atmosphere. Many of these microbes persist today; for example, blue-green algae (cyanobacteria) use light from the Sun and chlorophyll to convert carbon dioxide and water into “free” molecular oxygen and produce essential organic substances such as carbohydrates.

Other bacteria use bacteriochlorophyll and other photosynthetic proteins to convert light to metabolic energy. Bacteria formed microbial mats on land as early as three billion years ago. Fossilized remnants and other biochemical evidence from South Africa suggest that photosynthetic bacteria (primarily cyanobacteria) may have colonized the wet surface of clay-rich soil during rainy seasons, but were blanketed by aerosol deposits laid down during subsequent dry seasons.

Such mats may have formed in surface pools, water edges, and other wet spots on land. A series of giant meteorites essentially sterilized the planet about 3.8 billion years ago. Rocks 3.5 billion years old contain microfossils of primitive one-celled organisms without a nucleus (“prokaryotes”) resembling bacteria and blue-green algae (cyanobacteria), and carbon isotope ratios characteristic of biological materials, representing the earliest clear signs of life.

The first cells with a nucleus (“eukaryotes”) appeared 2 billion years ago, and the first organisms made up of many cells (multicellular algae) appeared about 1.8 billion years ago. In addition to prokaryotes and eukaryotes, a third major group of organisms, called Archaea, consisting of about 500 species but making up about 30% of the biomass on Earth, was not discovered until 1977.

They live in the most extreme environments on Earth: the hottest, coldest, and highest-pressure environments, so they are sometimes called ‘extremophiles’. Most of their known biomass is in the Antarctic. I have started this article with the biodiversity of photosynthetic prokaryotes because of the pivotal role that they played in the early history of life on Earth.

1. Diversity of Photosynthetic Prokaryotes:

The Major Groups of True Phototrophs:

Based on the sequence diversity of 16S rRNA (also see later), there are five phylogenetically distinct groups of photosynthetic prokaryotes within the domain Bacteria: the cyanobacteria, the purple bacteria, the green sulfur bacteria, the heliobacteria, and the green filamentous bacteria (Stackebrandt et al. 1996).

The taxonomy of prokaryotes faces unique challenges in recognizing distinct species. Different taxonomic systems, the rapid addition of new species, and the changing of names of existing species make it difficult to keep track of all genera and species. There is not a consistent higher-order taxonomy in place at this time for all of the phototrophs.

Some of the phylogenetic groups are discussed here and note is made of some of the genera within them. Among the five groups, the cyanobacteria are distinguished from all the others by being oxygenic. With their chlorophyll-a based dual photosystems, they are able to oxidize water and produce oxygen as a waste product of photosynthesis.

The oxygenic chloroplasts of the photosynthetic Eukarya and the prochlorophytes (presently placed within the cyanobacteria) are also in the same group. There is substantial diversity within the group, and approximately 50 genera are currently recognized. All of the other phototrophs are anoxygenic, cannot use water as a reductant, do not have chlorophyll-a (Chl a) and have only one photosystem.

As far as we know, no photosynthetic organelles exist in the Eukarya derived from any of these anoxygenic photosynthetic groups. The non-photosynthetic mitochondria, however, may have evolved from purple bacterial relatives in the Proteobacteria (Pierson, 2002).

A large variety of characteristics can be analyzed to describe the diversity of phototrophs. The distribution of many of these characteristics crosses the phylogenetic boundaries of the five groups. For this reason it is convenient to discuss the diversity of photosynthetic prokaryotes in terms of their characteristics rather than in terms of the phylogenetic groups.

Major metabolic features, such as the ability to fix nitrogen, tolerance of oxygen and capacity to respire are important characteristics of diversity that are factors in the environmental distribution of many phototrophs. The presence of specific carotenoid pigments that protect against photo-oxidative damage and the biosynthetic pathways for the production of these pigments are also important characteristics.

Likewise, cell wall structures, the presence of gas vesicles, flagella, gliding motility, and the ability to form spores are relevant to the environmental distribution of many phototrophs. In this part of the review, only those characteristics that relate directly to the process of photosynthesis are explored, because that is the process that distinguishes these bacteria from other closely related species.

Because it is impossible to address all aspects of this photosynthetic diversity under one sub-heading, this part focuses on a limited suite of characteristics: diversity of the chlorophyll pigments, diversity of reaction centers, diversity of light-harvesting systems and associated ultrastructural apparatus, diversity of CO2 fixation pathways, and diversity of reductants used in photosynthesis. After considering these characteristics, the diversity of the environmental distribution of phototrophs will be examined.

(a) Diversity of Photosynthetic Pigments:

Photosynthesis is carried out not only by green plants but also by many bacteria that contain photosynthetic pigments. Chlorophyll molecules are essential for the process of photosynthesis in all phototrophs and form the heart of the photosynthetic apparatus. A total of 10 different chlorophyll molecules are found in the photosynthetic prokaryotes.

Many differ only slightly in structure and absorption properties; others have much greater differences. The chlorophyll molecules are derivatives of porphyrins, which are complex ring structures, and they have a magnesium ion in the center of the ring. Recently, some naturally occurring bacterio-chlorophylls containing zinc instead of magnesium were found in photosynthetic bacteria growing under acidic conditions.

In their native state within the cells, chlorophyll molecules are bound to proteins as pigment/protein complexes. These complexes, in turn, are associated with cell membranes or with special light-harvesting structures such as chlorosomes. The interactions among the chlorophylls and the protein molecules alter the absorption properties of the chlorophylls.

Hence the same type of chlorophyll will have different absorption bands when associated with different proteins within the cell or with different proteins present in different species. This is the source of the wide variation in absorption properties among these bacteria (see Table 1).

[Table: Diversity of photosynthetic pigments]

Chlorophylls a, b, and d as major pigments are limited in distribution to the oxygenic phototrophic prokaryotes. Cyanobacteria contain Chl a and phycobilin pigments. The prochlorophytes within the cyanobacterial group contain Chl a and b, and the marine prokaryote Acaryochloris marina (cyanobacteria) contains Chl d (3-desvinyl-3-formyl Chl a) as its major pigment (Miyashita et al. 1997; Schiller et al. 1997).

The primary reaction center pigment is Chl a in the oxygenic phototrophs. It is the only chlorophyll pigment present for light harvesting in the cyanobacteria, but the phycobilins serve as major accessory pigments for light harvesting in these organisms. The anoxygenic phototrophs contain one or more bacterio-chlorophylls (BChl).

Their reaction centers contain different bacterio-chlorophylls (BChl a, b or g). Their light-harvesting machinery also contains BChl a, b or g as well as c, d or e. Photosynthetic prokaryotes depend on light for their livelihood. The enormous range in wavelengths absorbed by these organisms is due to the great diversity of chlorophyll molecules (see Table 1) they possess, enhanced further by the variations in the complexes formed by the pigments with their proteins.

The presence of carotenoid and phycobilin molecules further extends the light-absorbing range of the phototrophs. All the diverse phototrophs taken together can use the entire spectrum of visible and near-IR radiation from 350 to 1,020 nm to sustain photosynthesis. In this way diverse phototrophs can coexist in a wide range of environments without becoming limited by light. Their coexistence is often seen as a discrete vertical stratification in aquatic and sediment environments.

[Figure: Structure of four chlorophylls]

(b) Diversity of Reaction Centers:

The actual photochemical energy conversion occurs in the reaction center, which is a large membrane-spanning protein complex containing one of four chlorophylls (chlorophyll a or bacterio-chlorophyll a, b or g; Fig. 5-1). Because the reaction center is the heart of the photochemical energy-conversion process, the key to the origin and evolution of photosynthesis lies in the origin and evolution of diversity in these complexes.

The evolution of the light-harvesting systems, electron-transport chains, and CO2-fixation pathways is an interesting aspect of the evolution of diversity of photosynthesis as well. While these latter systems are accessory processes that may enhance the overall efficiency of energy conversion and provide certain advantages for growth under various environmental restrictions, they are secondary in importance to the function of the reaction center.

The diversity found among the reaction centers is less than that found in the light-harvesting systems. As noted, only four different chlorophyll molecules function in photochemistry, whereas many more chlorophylls, as well as carotenoids and phycobilins function in light harvesting.

[Figure: Basics of the energy conversion process in photosynthesis]

The essence of energy conversion in photosynthesis is summarized in Fig. 5-2. In photosynthesis the reaction centers (RCs) use light energy to produce charge separation in the form of an oxidized dimer, or special pair, of chlorophyll molecules that has lost an electron while in the excited state. There are two functionally distinct types of reaction centers, which differ in the nature of the electron acceptors.

In pheophytin-quinone reaction centers (RC2), the electron passes from the excited special pair through pheophytin molecules to quinone molecules and then to the electron transport system (ETS). These RCs are found in purple bacteria and green filamentous bacteria and in photosystem II (PS II) of cyanobacteria.

In Fe-S reaction centers (RC1), the electron moves from the special pair to a low-potential Fe-S protein. These RCs are found in green sulfur bacteria and heliobacteria and in photosystem I (PS I) of cyanobacteria. In addition to the functional similarities among the RCs in each of the two groups defined above, there are small but recognizable sequence similarities among the RC proteins that correlate with each group.

Either chlorophyll a or bacterio-chlorophyll a can function as the special pair in both types of reaction centers. All species of anoxygenic photosynthetic bacteria within a phylogenetic group have only one functional type of RC. The cyanobacteria are exceptional in that all species contain both types of RC.
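
The distribution of the two reaction-center types across the five groups can be summarized as a small lookup table. The following is only an illustrative sketch of the assignments described above; the names RC_TYPES, groups_with and has_both_types are invented for this example and do not come from any cited work.

```python
# Reaction-center (RC) types per phototroph group, as described in the text.
# "RC1" = Fe-S type, "RC2" = pheophytin-quinone type.
RC_TYPES = {
    "purple bacteria": {"RC2"},
    "green filamentous bacteria": {"RC2"},
    "green sulfur bacteria": {"RC1"},
    "heliobacteria": {"RC1"},
    "cyanobacteria": {"RC1", "RC2"},  # photosystem I and photosystem II
}


def groups_with(rc_type):
    """Return the sorted names of the groups that contain the given RC type."""
    return sorted(g for g, types in RC_TYPES.items() if rc_type in types)


def has_both_types(group):
    """True if the group carries both an RC1 and an RC2."""
    return RC_TYPES[group] == {"RC1", "RC2"}


if __name__ == "__main__":
    print("Fe-S type (RC1):", groups_with("RC1"))
    print("Pheophytin-quinone type (RC2):", groups_with("RC2"))
    print("Both types:", [g for g in RC_TYPES if has_both_types(g)])
```

Only the cyanobacteria appear in both lists, which is the exception noted in the text.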

The simplest of the RCs is the Fe-S RC1 of the heliobacteria, which contains bacterio-chlorophyll g. The green sulfur bacteria have a homodimeric RC. The cyanobacterial photosystem I RC is a large, complex heterodimeric protein with chlorophyll a as the RC pigment. Cyanobacterial photosystem II has more polypeptide subunits, but the essential photochemical core is composed of two subunits (D1 and D2) that have some sequence homology to the purple bacterial L and M proteins.

Photosystem II differs from all of the other RCs because of its high redox potential and association with the water-oxidizing protein complex that produces oxygen. The cyanobacterial photosystems II and I are linked by an electron-transport chain when water is being oxidized.

Several cyanobacteria, however, can operate photosystem I alone using electron donors, such as hydrogen and sulfide (Cohen et al. 1986). Based on function, the nature of the electron carriers, and limited sequence homology, it appears that all the pheophytin/quinone RC2s are related to each other.

Likewise, the Fe-S RC1s appear to be related to each other. It is not as clear whether these two major types of RC have a common origin. If not, then the process of chlorophyll-based photosynthesis would have arisen twice. These fundamental questions on the origin of photochemical reaction centers are difficult to answer.

(c) Diversity of Light-Harvesting Systems:

The diversity of light-harvesting systems in the photosynthetic bacteria is much greater than that of the RCs. The size and number of light-harvesting units in some bacteria vary with environmental conditions. Photosynthetic bacteria growing under light-limited conditions produce more light-harvesting pigment per cell.

The amount of light-harvesting pigment vastly exceeds the amount of RC pigment, the latter accounting for 1% or less of the total pigment. The function of the light-harvesting pigments is to keep the reaction centers from running out of light energy. Some bacteria have complex and diverse light-harvesting pigments, arranged in highly structured arrays or antennae to efficiently collect and funnel the absorbed light energy to the RCs.

The enormous diversity of light-harvesting pigment/protein complexes among the photosynthetic bacteria is directly responsible for their ability to grow in dense, complex communities, with different members of the community able to use different wavelengths of light. Phototrophs often grow in layered or stacked communities in response to various environmental factors.

The density of particular organisms in one layer can be so great that they attenuate all the incident light in the wavelength ranges absorbed by their light harvesting pigments. Beneath such a layer, the environment becomes dark as far as those wavelengths are concerned. The non-absorbed wavelengths are transmitted through such a thick layer, however.

The diversity in the light-harvesting (LH) complexes allows other bacteria to use different wavelengths of light that may still be available in these situations (Pierson et al. 1990). The diversity of light-harvesting pigments thus has a profound effect on the ecologic distribution of phototrophic bacteria. It permits the growth of light-dependent phototrophs in any early environment exposed to light and suitable for life, no matter what the restriction of wavelengths (light quality) due to environmental conditions, including the presence of other phototrophs.
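
The filtering effect of a dense upper layer can be illustrated with a deliberately crude model in which a layer removes every wavelength inside its absorption bands and transmits the rest. This is only a sketch: the peak positions (565, 620 and 788 nm) are taken from pigments mentioned in this article, but the band half-width of 20 nm, the complete absorption, and the composition of the two hypothetical layers are assumptions made purely for illustration.

```python
def band(peak_nm, half_width_nm=20):
    """Integer wavelengths (nm) within half_width_nm of an absorption peak.

    The 20 nm half-width is an arbitrary assumption for this sketch.
    """
    return set(range(peak_nm - half_width_nm, peak_nm + half_width_nm + 1))


def transmitted(incident, layer_bands):
    """Wavelengths left after a dense layer absorbs everything in its bands."""
    return incident - set().union(*layer_bands)


def usable(available, own_bands):
    """Wavelengths a layer can harvest from the light that reaches it."""
    return available & set().union(*own_bands)


if __name__ == "__main__":
    incident = set(range(350, 1021))   # 350 to 1,020 nm, as in the text
    upper = [band(565), band(620)]     # hypothetical upper layer
    lower = [band(620), band(788)]     # hypothetical lower layer

    below_upper = transmitted(incident, upper)
    harvest = usable(below_upper, lower)
    print("Lower layer can still use", min(harvest), "to", max(harvest), "nm")
```

In this toy case the lower layer is shaded at 620 nm but still receives the wavelengths around 788 nm, which is the kind of complementary use of light that the text describes.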

Because of the different light-harvesting pigments and the density of organisms present, such layered communities are conspicuously colored and readily detected by the naked eye. The evolution of such diversity in light-harvesting systems was most likely driven by the competition for light.

Since shorter-wavelength light penetrates relatively well in water, diversification of shorter-wavelength pigment complexes such as carotenoids and phycobilins could have occurred in aquatic habitats. Since the far-red and near-IR wavelengths penetrate poorly in water, it is more likely that the enormous diversity of red- and near-IR-absorbing Chl and BChl protein complexes evolved in response to light competition in crowded shallow microbial mat communities (Pierson et al. 1990).

The heliobacteria have the simplest and the smallest LH system. BChl g is the major light-harvesting pigment and absorbs maximally in vivo at 788 nm. There are no separate LH protein complexes. The LH BChl g molecules are associated with the same pigment protein core complex making up the RC. The purple bacteria have either BChl a or b serving both LH and RC functions but never both pigments.

The filamentous green bacteria and green sulfur bacteria differ markedly from the other bacteria discussed so far in having accessory chlorophylls for LH. These accessory chlorophylls are housed in distinct antenna structures called chlorosomes. These small ovoid bodies are appressed to the cytoplasmic side of the cell membrane, which houses the RCs (Oelze and Golecki, 1995).

The accessory chlorophylls include BChl c, d, and e. The chlorophylls are aggregated in the chlorosomes with relatively little protein. In the green sulfur bacteria, the chlorosomes are entirely peripheral in location and there are no intra-cytoplasmic membranes. In the green filamentous bacteria the chlorosomes are smaller (106 by 32 nm) (Oelze and Golecki, 1995). The most studied of these organisms, Chloroflexus aurantiacus, has peripheral chlorosomes lining the cell membrane.

The cyanobacteria have highly structured antenna complexes involving several different LH pigments. Phycobilisomes (PBSs) can be seen in electron micrographs covering the thylakoids (photosynthetic membranes) in cyanobacteria. They are composed of two or three pigment protein complexes that are structured to funnel excitation energy to the RCs in the membrane. Phycoerythrin (absorption maximum at 565 nm) and phycocyanin (absorption maximum at 620 nm) are the peripheral LH chromoproteins. Phycocyanin is present in all cyanobacteria, whereas only some contain phycoerythrin (Grossman et al. 1995).

The presence of phycobiliproteins greatly enhances light absorption in the green part of the spectrum. Cyanobacteria can alter the proportions of the different chromoproteins in response to changing environmental conditions by a process of chromatic adaptation (Grossman et al. 1995).

Other oxygenic phototrophic prokaryotes include the prochlorophytes, which contain Chl a/b antenna complexes for light harvesting (Partensky et al. 1997). Some strains of Prochlorococcus marinus also contain small amounts of phycobiliproteins (Hess et al. 1996).

A marine phototrophic prokaryote was described that contained large amounts of Chl d (absorption maximum at 714-718 nm) as its major pigment (Miyashita et al. 1997). Small amounts of Chl a present in the cells may function in the RCs. The cells have peripheral thylakoids like other cyanobacteria and lack phycobilisomes, as do the prochlorophytes.

(d) Diversity of Photoautotrophy:

Metabolic diversity among phototrophs is vast. One of the most interesting aspects of the diversity of photosynthetic bacteria is that while all phototrophs make energy by photosynthesis in the form of ion gradients or ATP, many do not grow primarily as autotrophs. This situation is quite unlike that of their eukaryotic counterparts.

Photoheterotrophy is the major or preferred form of carbon metabolism among many of the purple bacteria (previously described as the non-sulfur purple bacteria). It is also the preferred metabolism for most of the strains of Chloroflexus (filamentous green bacteria) in pure culture.

It is the only form of carbon metabolism for the heliobacteria, which are apparently incapable of autotrophy. Thus it is important to recognize that photosynthesis is not synonymous with autotrophy and the two processes are always recognized as distinct and treated separately by microbiologists who study photosynthetic prokaryotes.

The two processes (photosynthetic energy conversion and autotrophic metabolism) most likely have independent evolutionary histories. The process of photosynthesis generates the ATP and reducing power (though often indirectly) needed for CO2 fixation in the phototrophs that also happen to be autotrophs.

(e) Carbon Dioxide Fixation Pathways:

Among the phototrophs that are autotrophs, there is some interesting diversity in CO2 fixation pathways. The autotrophic purple bacteria and the oxygenic cyanobacteria use the reductive pentose phosphate pathway (Calvin Cycle) for CO2 fixation, just as in higher plant and algal chloroplasts and in many chemolithoautotrophic prokaryotes (Tabita, 1995).

The green sulfur bacteria stand alone among phototrophs in using a reductive tricarboxylic acid (TCA) cycle for their obligate autotrophic growth (Sirevag, 1995). When grown autotrophically, Chloroflexus aurantiacus uses the unique 3-hydroxypropionate pathway for CO2 fixation (Sirevag, 1995).

(f) Sources of Reducing Power:

The diversity of reductants used to sustain photosynthetic CO2 fixation is considerable and distributed across several groups of phototrophs. The only group-specific reductant is water, which is used by the oxygenic prokaryotes, the cyanobacteria. As far as is known, water can be used as a reductant only by the high-potential photosystem II RCs with the higher energy of a Chl a special pair and only when this RC is functioning in series with PSI.

The oxidation of water requires the presence of the manganese-containing water oxidizing complex associated with the PSII RC on the inside surface of the thylakoid membranes (Blankenship and Hartman, 1998). The product of the reaction is oxygen.

Other reductants are not so group specific. Hydrogen and reduced sulfur compounds can serve as reductants for photoautotrophy in the purple bacteria, green sulfur bacteria, and green filamentous bacteria. In some cyanobacteria PSI can function independently of PSII and sustain autotrophy with reducing equivalents from hydrogen or hydrogen sulfide, thus enabling these bacteria to perform anoxygenic photosynthesis (Cohen et al. 1986; Camacho et al. 1996).

Most recently, ferrous iron has been shown to sustain photoautotrophic growth in some species of purple bacteria (Ehrenreich and Widdel, 1994). Ecologic data suggest that it may function as a reductant in other phototrophs as well, including cyanobacteria (Cohen, 1989; Pierson et al. 1999).

2. Microbial Diversity in Soil:

Although most microbiologists agree that the diversity of microbial communities in soil is extraordinary, they do not necessarily agree on how that diversity is best measured. Diversity is composed of two elements: richness and evenness, so that the highest diversity occurs in communities with many different species present (richness) in relatively equal abundance (evenness) (Huston, 1994).
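
To make the two components concrete, the sketch below computes richness, a diversity index, and an evenness value for lists of species counts. The article does not prescribe a particular index; the Shannon index and Pielou's evenness are used here as one common choice, and the function names and example counts are invented for illustration.

```python
import math


def richness(counts):
    """Number of species with at least one individual."""
    return sum(1 for c in counts if c > 0)


def shannon(counts):
    """Shannon index H = -sum(p * ln p) over the species present."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def evenness(counts):
    """Pielou's evenness J = H / ln(S); 1.0 means perfectly even.

    J is undefined for a single species; returning 0.0 in that case is an
    arbitrary convention of this sketch.
    """
    s = richness(counts)
    return shannon(counts) / math.log(s) if s > 1 else 0.0


if __name__ == "__main__":
    even = [25, 25, 25, 25]    # hypothetical community, equal abundances
    skewed = [97, 1, 1, 1]     # same richness, one dominant species
    for name, community in (("even", even), ("skewed", skewed)):
        print(name, richness(community),
              round(shannon(community), 3), round(evenness(community), 3))
```

Both hypothetical communities have the same richness, but the one with equal abundances scores higher on the index, which matches the definition given above.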

However, there are fundamental difficulties associated with determining the richness and evenness of communities composed of microbes whose morphologic traits generally convey little physiologic or phylogenetic information. The traditional way to study microbes is to grow and study them in pure culture. However, this approach has severe limitations for studying soil microbial communities because < 1% of the bacteria present in the soil can be readily grown on standard laboratory media (Torsvik et al. 1990).

Thus soil microbial communities are usually studied by examining the presence of microbial biomarkers in the soil. These biomarkers are frequently molecules such as lipids, proteins, or nucleic acids that convey either phenotypic or genotypic information about the microorganisms from which they originate. Phospholipid fatty acids (PLFAs) make up one group of biomarkers commonly used to study changes in microbial community structure in soil.

Because the PLFA composition of membranes changes in response to the physiologic condition of the cell, these markers provide phenotypic information about microbial communities. PLFA profiles have been used to determine whether soil microbial communities are similar or different, but generally it is difficult to identify the organisms that account for the similarities or differences among these communities (Zelles, 1999).

Additionally, the genes that encode ribosomal RNA (rRNA) have a low rate of evolutionary change and are conserved among all cellular life forms, making them useful in examining phylogenetic relationships among organisms (Woese, 1987). The nucleotide sequences and secondary structures of rRNAs consist of conserved domains found in all living organisms and variable domains that contain sequence motifs specific for groups of related organisms or even individual species.

As a result, the rRNA-encoding genes have proved to be valuable biomarkers for studying the richness and evenness of microbial communities in natural environments. Although methods that rely on polymerase chain reaction (PCR) to amplify nucleic acids can be used to examine the richness of microbial communities, their ability to accurately represent the evenness of species in the community may be limited by biases imposed during amplification and cloning steps (Wintzingerode et al. 1997).
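
One way to see how amplification can distort evenness is a toy model in which each template grows exponentially, N = N0 * (1 + E) ** n, where E is the per-cycle efficiency and n the number of cycles. The sketch below is not taken from the cited study; the efficiencies, starting copy numbers, and cycle count are hypothetical values chosen only to illustrate the effect.

```python
def amplify(initial, efficiencies, cycles):
    """Copies of each template after PCR, assuming N = N0 * (1 + E) ** cycles."""
    return [n0 * (1 + e) ** cycles for n0, e in zip(initial, efficiencies)]


def proportions(values):
    """Relative abundance of each entry."""
    total = sum(values)
    return [v / total for v in values]


if __name__ == "__main__":
    initial = [100, 100]            # two taxa, equally abundant (hypothetical)
    efficiencies = [0.95, 0.85]     # hypothetical per-cycle efficiencies
    before = proportions(initial)
    after = proportions(amplify(initial, efficiencies, cycles=30))
    print("before:", [round(p, 2) for p in before])
    print("after: ", [round(p, 2) for p in after])
```

Under these assumptions a modest difference in efficiency turns an even starting mixture into one dominated by the better-amplified template, while both taxa remain detectable. That is why such methods tend to report richness more faithfully than evenness.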

The evenness of microbial communities can be measured by using DNA probes to determine the abundance of specific microbes or microbial groups within the community by using either nucleic acid hybridization or fluorescent in situ hybridization (FISH) techniques. Nucleic acid hybridization is used to measure the abundance of either DNA or RNA from microorganisms in a community, whereas FISH allows specific microbial cells to be identified by using epifluorescent microscopy.

Scale of Diversity Measurements:

The phylogenetic scale used to measure microbial diversity is no less important a consideration than the physical scale at which diversity is measured. For example, since the sequence of the 16S rRNA changes slowly over time relative to the rate at which microbes evolve new traits, organisms with similar 16S rRNA genes can have different genetic characteristics encoding distinct phenotypic traits.

As a result, studies of microbial diversity that focus solely on the 16S rRNA gene can underestimate community richness. This does not mean that the 16S rRNA gene is a poor choice of a molecule to use in evaluating diversity in microbial communities. On the contrary, its slow rate of change is what makes it a useful measure of the evolutionary history of an organism.

Rather, the physiologic traits predicted by differences in the 16S rRNA gene are those that take a long time to develop. In contrast, the nucleotide sequence of a protein-encoding gene generally changes more rapidly and tends to reveal higher levels of diversity. However, any measure of diversity based on a single functional gene still underestimates the actual diversity present in a soil community, because organisms that have identical DNA sequences at one locus can have multiple differences in other loci.

As a result of these considerations, the extent of microbial diversity that is measured in any system is proportional to the phylogenetic resolution of the method used. The most comprehensive analysis of the genetic diversity of a microbial community requires characterization of every distinct genome present in the community. Recent advances have made possible the analysis of large portions of the genomes of soil organisms (Rondon et al. 2000). Further development in the analysis and interpretation of such data could reveal valuable insights into the diversity of microbial communities in the soil.
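
The dependence of measured richness on resolution can be sketched by grouping sequences at different identity thresholds. The example below uses a simple greedy clustering on short, made-up, pre-aligned sequences; it is an assumption-laden illustration, not the procedure of any cited study, and real analyses work on full-length alignments with dedicated tools.

```python
def identity(a, b):
    """Fraction of matching positions between two aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def count_clusters(seqs, threshold):
    """Greedy clustering: a sequence joins an existing cluster if it matches
    that cluster's representative at >= threshold identity; otherwise it
    becomes the representative of a new cluster."""
    representatives = []
    for s in seqs:
        if not any(identity(s, r) >= threshold for r in representatives):
            representatives.append(s)
    return len(representatives)


# Hypothetical 20-nt aligned sequences, invented for this sketch.
SEQS = [
    "ACGTACGTACGTACGTACGT",
    "ACGTACGTACGTACGTACGA",   # 1 mismatch with the first sequence
    "ACGTACGTACGTACGTTTGA",   # 3 mismatches with the first sequence
    "TGCAACGTACGTACGTACGT",   # 4 mismatches with the first sequence
]

if __name__ == "__main__":
    for threshold in (1.0, 0.90, 0.75):
        print(threshold, count_clusters(SEQS, threshold))
```

The same four sequences count as four, three, or one group depending on the threshold, so the richness reported for a community is partly a property of the method's resolution.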

3. Marine Prokaryote Diversity:

Nearly a decade after the first studies of rRNA genes from planktonic systems, a more complete picture of the diversity of marine prokaryotes is now emerging. A total of 703 sequences of 16S rDNAs of marine bacteria directly cloned from seawater or macro-aggregate samples, as well as those of 454 bacterial strains isolated from seawater and sea ice, are classified according to the RDP system.

In the cultivation-independent studies, 16S rDNAs were retrieved from different depths at several different oceanic and coastal regions. Most cultured isolates were recovered from surface water samples. Since 16S rDNA-based phylogeny is now the standard identification method, nearly all newly isolated cultures of undescribed strains have a corresponding 16S rDNA sequence. As of today, >95% of all officially named species isolated from seawater are represented by a 16S rDNA sequence.

Micro-Heterogeneity in Highly Related rRNA Genes:

Another unexpected finding, discovered in the first cultivation-independent analysis of marine bacteria (Giovannoni et al. 1990), was the high diversity of similar rRNA genes within single populations. Clusters of highly related, co-occurring rRNA genes are now commonly observed in almost all cultivation-independent surveys.

These clusters appear in many disparate phylogenetic lineages. The prevalence and frequency of these clusters in unrelated taxa suggest that they reflect fundamental evolutionary and ecologic processes.

One study examined two co-occurring uncultured Crenarchaeota variants that differed by only 0.08% across their entire 16S and 23S rRNAs (Schleper et al. 1998). Examination of >28 kilo base pairs of DNA sequence flanking the rRNA operons revealed that these variants had identical gene arrangements, and the homologous protein encoding genes shared high similarity.

These data also support the notion that naturally occurring highly related rRNA sequence variation reflects true organism-level variation. Another study showed that naturally co-occurring isolates of Prochlorococcus spp., with 97% 16S rRNA sequence similarity, had strikingly different physiologic properties.

Planktonic Archaea:

Archaeal 16S rDNAs have been cloned from seawater samples, but the microorganisms harboring these rRNA genes have not yet been cultivated. Both major subdivisions of the Archaea, the Crenarchaeota and the Euryarchaeota, are represented in marine plankton.

A preliminary survey of marine plankton using “universal” rRNA gene PCR primers first revealed archaeal rRNA sequences from 100- and 500-m depths in the Pacific Ocean (Fuhrman et al. 1992). These oceanic archaeal rRNA genes were most closely related to those of cultivated Crenarchaeota—a branch of Archaea then thought to consist solely of hyperthermophiles.

After these initial reports, evidence for a widespread distribution of new uncultivated Archaea was found in many different marine plankton samples. In addition, a few rRNA gene clones of a third euryarchaeotal group, referred to as group 3 and peripherally related to group 2 euryarchaeota, were isolated from marine plankton (Fuhrman and Davis, 1997).

More recent studies have revealed the presence of group 3 in association with coastal marine sediments (Munson et al. 1997; Vetriani et al. 1998). As with many other prokaryotic groups, highly related rDNA sequence variants (>97% sequence similarity) of planktonic Archaea have typically been isolated from individual samples. These clusters of highly similar rDNAs apparently reflect the presence of multiple strain variants that coexist and presumably contribute to the overall diversity of the archaeal gene pool.

After their initial detection, relatives of the crenarchaeotal group I plankton were found in many other habitats, including freshwater and marine sediments, animals, terrestrial soil, deep-subsurface paleosols, and an anaerobic digestor (Suzuki and DeLong, 2002). Despite this, all the crenarchaeotal sequences derived from marine plankton cluster closely together and are not found within the soil-, sediment- or freshwater-derived groups.

Most of the euryarchaeotal group 2 sequences have also been derived from marine plankton, but a recent report suggests that these Archaea may also be found in the intestinal microflora of marine fish (van der Maarel et al. 1998).

A few group 3-euryarchaeota rDNAs were isolated from plankton (Fuhrman and Davis, 1997); but much larger proportions of group 3 rDNA clones have been recovered from coastal marine sediments, indicating that marine sediments may be the natural biotope for this group (Munson et al. 1997; Vetriani et al. 1998).

Planktonic Bacteria:

Database comparisons between the 16S rDNA sequences of planktonic clones and those of strains isolated from seawater generally agree with the results of two studies that directly compared 16S rDNA sequences from environmental clones and cultivated strains from the same water samples (Benlloch et al. 1995). In terms of numbers, the major RDP bacterial groups that account for the majority of the 16S rDNA sequences retrieved from cultivated organisms and from clone libraries are similar.

Proteobacterial 16S rDNAs represent ca. 70% and 65% of all 16S rDNA of sequenced isolates and clones, respectively. Recoveries in the other groups, in terms of the percentage of environmental clones versus isolates, are: Flexibacter-Bacteroides-Cytophaga, 16% versus 10%; the gram-positives, 6% versus 4%; and cyanobacteria, 4% versus 3%. It should be emphasized that this agreement between the 16S rDNA of sequenced isolates and environmental clones holds true only when major bacterial divisions are compared.

More detailed analysis reveals that only in a minority of cases do 16S rDNA sequences of cultivated organisms closely match those directly cloned from seawater. In addition, these statistics refer only to a sampling of 16S rDNA sequences in existing databases and may not reflect percentages of organisms in the environment.

A great deal of work has been done on marine planktonic bacterial rRNA sequences, and even a brief mention of all of it is beyond the scope of this review. Readers are encouraged to consult the recent literature in this rapidly advancing area of marine biosciences.

4. Diversity in the rRNA Sequence:

The rRNA sequences of many organisms have rich diversity. The nucleotide sequences and secondary structures of rRNA consist of conserved domains found in all living organisms and variable domains that contain sequence motifs specific for groups of related organisms or even individual species.

As a result, the rRNA-encoding genes have proved to be valuable biomarkers for studying the richness and evenness of microbial communities in natural environments. The rRNA is a particularly good marker for phylogenetic studies involving microorganisms for several reasons.
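
Richness and evenness can be illustrated with a short Python sketch. The text does not say which indices are meant; the Shannon index and Pielou's evenness used below are common choices and are assumptions here, as are the function names and the clone counts, which are invented for illustration.

```python
import math


def richness(counts):
    """Number of groups (e.g. rRNA sequence types) actually observed."""
    return sum(1 for c in counts if c > 0)


def shannon(counts):
    """Shannon index H' = -sum(p_i * ln p_i) over observed groups."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def pielou_evenness(counts):
    """Pielou's J' = H' / ln(S); 1.0 means all groups are equally abundant."""
    s = richness(counts)
    return shannon(counts) / math.log(s) if s > 1 else 0.0


# Hypothetical clone counts per sequence type in two libraries.
even_library = [25, 25, 25, 25]
skewed_library = [97, 1, 1, 1]

print(richness(even_library), round(pielou_evenness(even_library), 3))
print(richness(skewed_library), round(pielou_evenness(skewed_library), 3))
```

Both toy libraries have the same richness (four sequence types), but the second is dominated by one type and therefore has a much lower evenness.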

First, the ribosome, which is responsible for protein synthesis, and its rRNA, are found in all organisms on Earth, microorganisms as well as plants and animals. Second, the ribosome is highly conserved because large changes in sequence affect the normal functioning of the critical and complex process of protein synthesis. Third, two of the rRNAs, 16S (or 18S in eukaryotes) and 23S (28S in eukaryotes) rRNA, are sufficiently long (1500 or more nucleotides) that they contain a large amount of evolutionary information, making them suitable for molecular evolutionary studies. In contrast, 5S rRNA, which has also been used for phylogenetic purposes, has more restricted applications because of its limited information content (only 120 nucleotides).

Universal tree of life (Phyla of Bacteria, Archaea & Eukarya)

The pioneering studies of Woese (1987, 1994) have revolutionized our understanding of biologic evolution. The sequence of 16S and 18S rRNA was obtained from a broad spectrum of organisms. From this phylogenetic information it is possible, for the first time, to construct a scientifically based tree of all life on earth.

This Universal Tree of Life shows that three domains of life exist: Bacteria, Archaea, and Eucarya (Woese et al. 1990). It is important to recognize that there is no time scale on the tree of life (Fig. 5.3). When rRNA-based approaches were first used to detect and identify novel populations in microbial mats, a general reassessment of microbial biodiversity began (Teske and Stahl, 2002), both on the large scale of discovering new microbial kingdoms and on the small scale of tracking down apparently minor but significant variations among closely related strains.

For example, morphologically and physiologically, closely related bacteria encountered within a mat habitat are often composed of several genetically and eco-physiologically distinct populations, each filling related niches in a mat ecosystem.

This has been most clearly demonstrated for “strains” adapted to different temperature optima (Ferris et al. 1996b) using molecular tools such as denaturing gradient gel electrophoresis (DGGE) to track changing population abundance along naturally occurring temperature gradients (Ferris et al. 1996a; Ferris et al. 1997).

Quantitative rRNA hybridization made it possible to quantify the contribution of individual microbial populations to the total rRNA pool of a microbial mat. Since rRNA abundance is well correlated with microbial growth rate, this measure provides a more direct assessment of the metabolically active populations within a system.

This method was used to localize and quantify different phylogenetic groups at specific depth intervals of cyanobacterial mats (Risatti et al. 1994; Minz et al. 1999a). For microorganisms demonstrating some degree of phylogenetic and physiologic cohesion (e.g., sulfate-reducing bacteria, methanogens, and nitrifiers), this approach provides a basis to infer functional contribution.

For example, studies by Minz et al. (1999b) revealed that a population related to described Desulfonema sp. contributed as much as 30% of the biomass in the region defined by the oxygen chemocline of a hypersaline microbial mat. This localization was not anticipated based on the conventional view that this species is restricted to anoxic habitats, and it suggested a possible close coupling between sulfide-oxidizing and sulfate-reducing populations in that mat community. These observations again point to the need for direct observations of organisms inhabiting gradient environments to develop a more complete understanding of their eco-physiology.

5. Diversity of Microbial Heterotrophic Metabolism:

A heterotroph is an organism that utilizes reduced organic compounds as major source of carbon. Fundamentally, at the level of metabolism all heterotrophs have remarkable similarities. For example, the synthesis and role of the nucleic acids are virtually equivalent in all cells, and cells that can synthesize DNA and RNA do so from monomeric substrates.

Similarly, all organisms that can synthesize amino acids and polymerize these to protein do so via equivalent enzymatic reactions. Also, reactions involved in substrate-level phosphorylation and the synthesis of ATP via a proton gradient in diverse species may best be described as variations on a central theme.

However, broad diversity does occur among the heterotrophic micro-organisms but this diversity rests mostly in the mechanisms whereby substrates are altered (anabolically and catabolically) to fit into the central metabolic pathways.

The major metabolic pathways followed in synthesizing the various monomers that are polymerized to assemble the major constituents of the cell are now known. Elucidation of the pathways involved in generating the twenty amino acids, purines/pyrimidines, B vitamins, fatty acids, and other monomers that combine to form a functioning cell was quite revealing.

This research led us to conclude that all the monomers that are combined to form the macromolecules to make up a cell can be synthesized from a limited number of basic intermediates. These intermediates, termed precursor metabolites, are generated in cells by central core reactions.

Any organism that can grow with a single substrate as the carbon source—whether it is propane, glucose, or acetate— will metabolize the compound via pathways that ultimately yield the precursor metabolites. The precursor metabolites are the primary building blocks for the ultimate synthesis of cell parts and lead to the regeneration of self.

Some important precursor metabolites are: glucose-6-phosphate, fructose-6-phosphate, pentose-5-phosphate, erythrose-4-phosphate, glyceraldehyde-3-phosphate, 3-phosphoglycerate, phosphoenolpyruvate, pyruvate, acetyl CoA, oxaloacetate, α-ketoglutarate, and succinyl CoA.

The largest of these consists of six carbon atoms (glucose-6-phosphate and fructose-6-phosphate). An individual microorganism may follow an alternative route when synthesizing a precursor metabolite during growth on a particular substrate. However, the de novo biosynthesis of a given cellular constituent (amino acid, purine, etc.) is virtually the same in all living cells.
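
As a quick check on the list of precursor metabolites and the six-carbon upper limit, the sketch below records the carbon count of each precursor in a Python dictionary. The counts refer to the carbon skeleton only; for acetyl CoA and succinyl CoA the coenzyme A carrier is deliberately excluded, which is an assumption made here so that the numbers match the statement in the text. The name `PRECURSOR_CARBONS` is hypothetical.

```python
# Carbon atoms in the skeleton of each precursor metabolite.
# Assumption: for the CoA thioesters only the acyl group is counted.
PRECURSOR_CARBONS = {
    "glucose-6-phosphate": 6,
    "fructose-6-phosphate": 6,
    "pentose-5-phosphate": 5,
    "erythrose-4-phosphate": 4,
    "glyceraldehyde-3-phosphate": 3,
    "3-phosphoglycerate": 3,
    "phosphoenolpyruvate": 3,
    "pyruvate": 3,
    "acetyl CoA": 2,        # acetyl group only
    "oxaloacetate": 4,
    "alpha-ketoglutarate": 5,
    "succinyl CoA": 4,      # succinyl group only
}

largest = max(PRECURSOR_CARBONS.values())
largest_names = sorted(n for n, c in PRECURSOR_CARBONS.items() if c == largest)

print(len(PRECURSOR_CARBONS), largest, largest_names)
```

The table holds twelve entries, and the only six-carbon members are the two hexose phosphates.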

If a microorganism lacks the enzymatic machinery for synthesizing an essential monomeric compound (B vitamin, amino acid, purine/pyrimidine, fatty acid), that microorganism will grow only when that compound is present in the growth medium. Microorganisms that evolved in niches rich in organics, e.g., lactic acid bacteria, may well be dependent on the environment for a supply of selected essential monomers.

The Core Pathways:

The major metabolic pathways involved in generating the 12 precursor metabolites are the Embden-Meyerhof-Parnas (EMP) pathway; the hexose monophosphate shunt (HMS); and the Krebs, or tricarboxylic acid (TCA), cycle. Some species utilize the Entner-Doudoroff (ED) pathway for the generation of triose phosphate. These pathways proceed in a straightforward manner in an organism growing aerobically with glucose as carbon and energy source.

Glucose is considered the most abundant monomeric product of biological synthesis on Earth. It is therefore not remarkable that glucose has evolved a major role in biosynthesis and energetics. Escherichia coli, Bacillus subtilis, and a broad array of microorganisms that can grow aerobically on glucose employ EMP, HMS, and TCA.

These pathways provide the microbe with both building blocks and a ready source of energy. The first precursor metabolite synthesized in EMP is glucose-6-phosphate via a simple phosphorylation reaction, with ATP as phosphate donor. The other precursors follow.

However, this step is not available to heterotrophs growing on simple substrates other than glucose. This is exactly where metabolic diversity comes in: it allows the organism to manufacture the precursor metabolites by alternative routes. Consider Mycobacterium vaccae utilizing acetate as its sole source of carbon and energy instead of glucose.

The peptidoglycan in the cell wall of this organism has glucose as a constituent, as does that of E. coli growing on glucose. Similarly, the DNA and RNA of both the E. coli and M. vaccae cultures contain pentoses, and the cellular lipids of both are built on triose phosphate. Therefore, even if M. vaccae is not able to obtain glucose from the environment, it must manufacture this metabolite in order to grow.

It is essential, therefore, that an organism utilizing a substrate such as acetate generate the precursor metabolites that originate from EMP or HMS. Such organisms accomplish this by essentially reversing the reaction sequences: acetyl coenzyme A (acetyl CoA) is synthesized from acetate and uncombined coenzyme A (CoASH) and is the starting compound for the synthesis of glucose. Acetyl CoA also enters the inducible glyoxylate cycle for the synthesis of the essential precursors α-ketoglutarate, succinyl CoA, and oxaloacetate.

Heterotrophic microorganisms utilize those parts of the pathways that fit their lifestyle. The glyoxylate cycle would be of no value to a microorganism growing with glucose or α-ketoglutarate as substrate. The glyoxylate cycle is an inducible enzyme system that is functional in microorganisms during growth on selected substrates, such as acetate.

Therefore, it follows that the key to metabolic diversity in the microbial world rests in the ability of a given microorganism to manipulate an available substrate to provide intermediates that can enter into the core pathway(s). This manipulation generally occurs via a limited number of reactions that are unique to the organism in question.

Some Examples of How Microorganisms can Modulate their Metabolic Characters Exemplifying Metabolic Diversity, in Response to the Environments, are given below:

(i) The insolubility and sheer size of many macromolecules, including protein and cellulose, preclude their entry into a cell. Microorganisms that utilize these macromolecules must produce enzymes extracellularly or at the cell surface to reduce them to a size suitable for uptake.

To solve this problem, proteins are cleaved by an array of enzymes produced by many bacteria and fungi. The endospore-forming members of the genus Bacillus are among the more effective producers of proteases, and these microorganisms have been exploited for the commercial production of those enzymes. Actinomycetes, such as Streptomyces griseus and several fungal species, are also employed commercially in protease production. The enzymes involved differ somewhat in character, depending on the source, but functionally they cleave a protein into lower molecular weight polypeptides.

(ii) The glucose polymers glycogen, starch, cellulose, and peptidoglycan are widespread in microbial, plant, and animal cells. Polymers that are derivatives of glucose, such as chitin, cellulose, starch, and peptidoglycan, are regularly introduced into the environment. Xylans, polymers of the pentose xylose, are second in abundance to cellulose among the sugar-based polymers.

The polymers, the Enzymes that Cleave them, and Particular Strains of Bacteria that Possess these Enzymes are as follows:

Carbohydrate substrate, Enzyme and Organism

The mode of action of these enzymes differs somewhat, but the results of their actions are quite similar—the production of monosaccharides. The ability of microorganisms to utilize sugars with three to seven carbons is considerably enhanced by enzymes that interconvert the various sugars. Two such enzymes are the transaldolases and transketolases.

These inter-conversions are indispensable in autotrophs in which a constant supply of C5 sugars is mandatory for a functional Calvin-Benson Cycle. Xylans are abundant in nature and are polymers of xylose, a five-carbon sugar. The ready conversion of xylose or xylulose to the other essential sugars can be accomplished by enzymatic reactions using Transaldolase and Transketolase.

(iii) Microorganisms effectively catabolise all of the naturally occurring aromatic compounds that enter the biosphere. Tyrosine, tryptophan, and riboflavin, for example, are suitable substrates for the growth of selected species of bacteria. In the assimilation of these compounds, a substantial part is released as CO2, and the other carbons are catabolised to intermediates that are incorporated into cellular components via the core pathways.

Microorganisms also readily catabolize many non-biogenic aromatic compounds— components of petroleum and chemical synthesis. The bio-degradative pathways for some of the diverse aromatic compounds, both naturally occurring and synthetic, have been elucidated. A broad coverage of these catabolic pathways is not practical in this chapter, but a couple of examples will be sufficient.

Benzene is a major industrial chemical, and several million metric tonnes are produced in the United States each year. A portion of this toxic compound ultimately enters the environment. Microorganisms, including members of the genus Pseudomonas, readily utilize benzene as growth substrate.

There are two major pathways for benzene catabolism. The meta-fission pathway for catechol dissimilation generates pyruvate and acetate. Both of these enter into the core pathways through the TCA cycle. The products of ortho-fission are succinate and acetyl CoA, which, likewise, enter the TCA cycle directly.

These pathways illustrate an important point: A non-biological compound can enter into the core of metabolism by a limited number of enzymatic reactions. In the case of benzene, only six to eight are required. This limited number applies not only to benzene but to virtually all compounds, both naturally occurring and products of chemical synthesis, that are utilized by microorganisms.

An aromatic polymer of widespread occurrence in nature, lignin, the structural component of woody plants, will be considered here because this macromolecule is disassembled by a unique mechanism. Lignin is a compound of virtually unlimited structural variability and is considered the most abundant renewable aromatic compound on earth. It is a certainty that lignin is not utilized by a single microbial species as substrate.

The three aromatic alcohols that combine to form lignin are coumaryl, coniferyl and sinapyl alcohol. These alcohols are copolymerized not by biosynthetic reactions but by free-radical reactions, resulting in a random structure. No enzymes are known that have the infinitely varied substrate specificity that would be essential to biodegrade lignin.

How, then, is lignin degraded? The haphazard heterogeneous polymer is biodegraded in nature by white rot basidiomycetes via random oxidative attack. Growth on a cosubstrate such as cellulose provides carbon and energy for the fungus. During growth on the cosubstrate, a fungus, such as Phanerochaete chrysosporium, synthesizes peroxidases, which generate peroxides, which in turn “combust” the lignin.

The nonspecific oxidation of lignin yields low molecular weight products that are mineralized by various bacteria and fungi present in these environmental niches. Side chains are cleaved from aromatic products, and the benzoic nucleus is further biodegraded by reactions similar to those discussed for benzene.

Although this example is a prime exception to the dogma that all naturally occurring compounds can serve as substrate for a given species, it is evident that lignin is recycled by the concerted action of a consortium of microbes. Many more unique processes may be revealed as we explore the vast array of microorganisms present in nature that have not yet been characterized.

(iv) The chemical age has provided humans with thousands of chemical compounds. Most of the products of chemical synthesis have some similarity to and can actually mimic naturally occurring intermediates. Chemotherapeutics, insecticides, herbicides, etc. are generally effective because they “substitute” for and thus may interfere with or inhibit natural function.

Other compounds developed by man bear little resemblance to compounds generated by living cells and are effective because they are toxic. While some such toxic non-biological chemicals (benzene and toluene) are readily mineralized by microorganisms, many others are degraded slowly (trichloroethylene, lindane, and dalapon), whereas some others are poorly or virtually non-degradable (mirex or plastics) as these chemicals are quite resistant to the enzymatic machinery available in the microbial world.

Biological mechanisms for hastening the biodegradation of many such molecules are the cooxidative processes akin to those involved in lignin mineralization. Cooxidation occurs when a microorganism, while growing on a utilizable substrate, oxidizes a non-growth substrate it encounters in the immediate environment.

Generally, the enzymes involved in cooxidation are oxygenases that function in a manner similar to the methane or propane monooxygenases. Both propane and methane monooxygenases are effective in removing chlorine atoms from chlorinated hydrocarbons, e.g., trichloroethylene, a major environmental hazard. Co-oxidation is involved in the mineralization of cycloalkanes, which are significant components of paraffin-base crude oil and constantly introduced into the biosphere.

Microorganisms that can utilize the cycloalkanes are not present in most environments, but organisms that express monooxygenases are abundant in most soils. Cyclohexane is oxidized to the corresponding cycloalkanone by these monooxygenases. Microorganisms that utilize cyclohexanol or cyclohexanone as growth substrate are abundant and can be readily isolated from soil.

Cyclohexanone oxygenase adds oxygen to form the lactone, which is ultimately cleaved to adipic acid. Adipic acid is subjected to β-oxidation, forming one molecule of acetyl CoA and one of succinyl CoA. These enter into the core pathways through the TCA cycle.

A combination of co-oxidation by one species and utilization of the oxygenated product by another results in mineralization of a compound that is relatively recalcitrant. These processes can remove from the environment both naturally occurring compounds and products of chemical synthesis.

(v) The methane cycle plays a major role in the recycling of carbonaceous compounds that enter anaerobic environments. Swamps, marshes, and rice fields are major sources of methane, but anoxic microenvironments in grasslands and soil are also important in methanogenesis. Ruminants, termites, and other animals, including humans, have microbial populations in their digestive system that generate methane.

The microorganisms responsible for methane biogenesis are all Archaea. Production of methane from biological sources far exceeds that from coal mines, gas wells, and other abiogenic sources. The methanogens are autotrophs and are not considered further.

Methanotroph is the name given to microorganisms that utilize methane as sole carbon and energy source. These microorganisms are heterotrophs and of widespread occurrence in soil and water. All are obligately aerobic bacteria. The methanotrophic methylotrophs are limited to growth on methane and related one-carbon compounds, such as methanol.

There is also an array of non-methanotrophic methylotrophs in nature that utilize methanol, dimethyl sulfide, or methylamine as source of carbon and energy. These organisms are not obligate for one-carbon substrates and grow well on sugars, amino acids, organic acids, and other substrates. Methane is assimilated by two distinctly different mechanisms: the ribulose-monophosphate pathway (Type I) and the serine pathway (Type II).

The ribulose-monophosphate pathway somewhat resembles the Calvin-Benson cycle, whereby the one-carbon substrate is incorporated into a phosphorylated five-carbon sugar. The addition of the C1 in methylotrophs, however, does not result in cleavage of the sugar, as occurs in the Calvin-Benson mechanism.

The intermediate added to ribulose monophosphate in Type I methylotrophs is at the oxidation level of formaldehyde. CO2 is not utilized as carbon source by these bacteria. The ultimate product of C1 assimilation is fructose-6-phosphate, a precursor metabolite, and this compound readily fits into the core pathway(s).

The serine pathway is markedly different and quite complex. The ultimate product that enters the core pathway is the precursor metabolite acetyl CoA. The non-methanotrophic methylotrophs characterized to date all employ the serine pathway for C1 assimilation. It is interesting to note that Type I and II microorganisms differ not only in carbon assimilation but also in cell structure and classification.

The Type I group has internal membranes that are perpendicular to the long axis of the cell, whereas the internal membranes in Type II are parallel to the cell membrane. Both Type I and II are in the phylum Proteobacteria, but the former are in the γ subdivision and the latter in the α subdivision.

The limited number of examples cited above should serve to define metabolic diversity. Metabolic diversity among heterotrophs rests on the vast array of catabolic processes available to individual microbial species in the biosphere. The concerted action of these microorganisms results in the ultimate mineralization of virtually all the organic material that enters the environment. The preceding examples affirm that the basic difference between species is in the compound(s) that they can utilize as growth substrate, not in the mechanisms utilized in cell synthesis.

6. Genetic Diversity:

DNA is responsible for encoding the phenotypic characteristics of an organism. Differences in DNA sequences between organisms create genetic diversity. These changes are also responsible for the subtle differences (such as hair colour, eye colour, or height) between organisms of the same species.

This genetic diversity is able to manifest itself as biological diversity through the structure, organization, regulation, and expression of DNA. These effects determine how organisms develop physically, assimilate nutrients, interact with the environment, and even, in some cases, how they behave.

It is these properties of genetic diversity that support the effective stability of natural environments. Multiple biological and non-biological components interact through intricate nutrient cycling webs to create a macroscopic global environment. This global environment can be imagined as a pyramid in which the entire structure depends on each of the small blocks that are used to create and support the larger structure.

For this reason, it is often useful to examine the environmental impacts of small ecological components such as microorganisms. As noted above, microorganisms are believed to be the forms in which life originated on this planet. This theory is supported by the fact that they display the highest degree of biological diversity.

Genetic Polymorphism:

Genome sequences allow us to compare the set of genes that make an animal with those found in single-celled eukaryotes (e.g. comparison of a nematode worm or fly with yeast). Yeast has about 5,000 genes. Many multicellular animals have 15,000-20,000 genes (nematode worms, flies, sea squirts). Of these, a basic core of about 2,500 genes is conserved between animals and yeast. These core genes are usually involved in vital metabolic functions, the “housekeeping” functions that are common to all organisms.

Repetitive DNA:

Encoded in human DNA are about 100,000 genes. This represents only about 5% of the total DNA in the chromosomes. The function of the remaining 95% of the genome is not yet understood. The bulk of intergenic DNA is unique or low copy, but large areas of this apparently non-functional genome are also repetitive in nature. Both the unique and the repetitive components of intergenic DNA (and also of introns) are valuable resources of genetic polymorphism.

This is because there appear to be no constraints on introducing and maintaining sequence changes as long as there is no function that can be impeded by such changes. Repetitive DNA can be further subdivided, both with respect to the degree of repetition and with respect to the relative location of the elements of a repeat. Repeat families comprise a continuum of copy numbers per genome, from just a few up to millions.

Genetic polymorphism is partly caused by variable numbers of elements at a given repetitive locus and repeat families differ from one another in the way their elements are arranged in the genome. The latter can be usefully divided into tandem repeats and interspersed repeats.

Tandem Repeats:

Repetitive sequence elements that are arranged in tandem are known as satellite, minisatellite and microsatellite sequences. Currently the three terms refer to different levels of repetition and different repeat unit lengths (see Table 3 below).

Different levels of repetition and different repeat unit length

Satellite DNA is dispersed over almost the entire genome. Satellites, minisatellites and microsatellites can be highly variable and thus form excellent tools for genetic individualization. Their variability is most often due to particular arrays on a given chromosome having different repeat numbers in different people.

Thus they form allelic variants, and for a number of mini- and microsatellites almost every individual is heterozygous. Polymorphism created by such elements is termed variable number of tandem repeats (VNTR) polymorphism. It is worth mentioning that the use of these three terms in the literature is not uniform and is sometimes rather confusing. Minisatellites are sometimes equated with VNTRs, but VNTR is a term applicable to all repeat classes.

Microsatellites are occasionally referred to as ‘simple’ sequences or short tandem repeats (STR), but STR is also used for synthetic tandem repeat probes capable of detecting mini-satellite sequences. Polymorphism due to variation in the number of elements within a given array is thought to be generated during DNA replication, for example, by mutational process of slipped strand mispairing.
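
The idea of alleles that differ only in repeat number can be sketched in a few lines of Python. The sequences, the CA repeat unit, the flanking bases and the helper name `max_tandem_repeats` are all invented for illustration; this is a toy string search, not a laboratory genotyping method.

```python
import re


def max_tandem_repeats(sequence: str, unit: str) -> int:
    """Largest number of back-to-back copies of `unit` found in `sequence`."""
    runs = re.findall(f"(?:{re.escape(unit)})+", sequence)
    return max((len(run) // len(unit) for run in runs), default=0)


# Two hypothetical alleles at one microsatellite locus: identical flanking
# sequence, different numbers of the CA repeat unit.
allele_a = "GGTT" + "CA" * 12 + "AGCT"
allele_b = "GGTT" + "CA" * 15 + "AGCT"

genotype = (max_tandem_repeats(allele_a, "CA"), max_tandem_repeats(allele_b, "CA"))
is_heterozygous = genotype[0] != genotype[1]

print(genotype, is_heterozygous)
```

An individual carrying these two alleles would be scored as heterozygous at this locus, with repeat numbers 12 and 15.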

In addition to allelic variations in repeat number, polymorphism at mini- and microsatellite loci can also be caused by sequence changes in the vicinity of these repeats. Finally, a novel approach aims at exploiting variation within the repeat units of different mini-satellite alleles. This is known as mini-satellite variant repeat polymerase chain reaction (MVR-PCR) or ‘digital fingerprinting’.

Interspersed Repeats:

In the nuclear DNA of humans and most other eukaryotes, repetitive sequences can be found that are not organized in tandem arrays but are more or less regularly interspersed with unique DNA sequences throughout the genome. The Alu and Kpn repeat families are representative examples of short and long interspersed nuclear elements, known as SINEs and LINEs, respectively. In evolutionary terms, they have probably contributed to genetic differences between species and individuals by playing a role in retrotransposition events and in promoting unequal crossing-over.

Characterisation of DNA Polymorphism:

For characterization and analysis of DNA polymorphism, two routes may be pursued. One is to aim at one locus at a time, the single-locus approach, whereas the other is to analyze several loci simultaneously, the multilocus approach. While the latter method yields a DNA fingerprint in one step, the former gives a multilocus DNA profile only by combining a number of locus-specific assays (Krawczak and Schmidtke, 1998).

Example of restriction fragment length polymorphism

(a) Single-Locus Approaches to DNA Polymorphism:

A probe that detects a single hypervariable locus is called a locus-specific or single-locus probe (SLP). By selective cloning of large mini-satellites it has been possible to isolate some of the most variable loci in the human genome (Singh, 1991). Fifteen hypervariable loci, with heterozygosities ranging from 60% to 99.4%, with an average value of 95%, have been isolated (Singh, 1991).

Example of Variable number Tandem Repeat (VNTR) polymorphism

DNA polymorphisms were first studied by analyzing DNA digested with restriction enzymes using Southern blotting and molecular hybridization techniques. An example of such an experiment is illustrated in Fig 5.4a. In this particular case, genomic DNA from three individuals was digested with the restriction enzyme MspI, gel-separated, blotted onto a membrane, and subsequently hybridized with a radiolabeled DNA probe.

Three Alternative Patterns Arose:

(1) Shows a strong band of 5 kb,

(2) Has a strong band of 2 kb and

(3) Exhibits both bands, each at about half the intensity of the other two patterns.

The interpretation of this result is easy when it is remembered that humans are diploid organisms: every individual carries two homologous chromosomes in somatic cells, one inherited from the mother and one from the father. Patterns (1) and (2) are due to both chromosomes being identical with respect to the polymorphism revealed here (homozygosity), and band pattern (3) is due to the two chromosomes being different (heterozygosity). Fig 5.4b depicts these two chromosomes, abbreviated as A and B, showing that the difference between them is due to the presence of an additional MspI recognition site on chromosome B, and the absence of this site on chromosome A.

Thus, pattern (1) corresponds to the presence of two A chromosomes (genotype AA), pattern (2) reflects the presence of two B chromosomes (genotype BB), and pattern (3) is due to the presence of one A and one B chromosome (genotype AB). The type of polymorphism just described is called a restriction fragment length polymorphism (RFLP), because it comprises restriction fragments of different lengths.

The cause of the length difference is sequence variation within the recognition sequence itself. Thus, this type of polymorphism is a restriction site polymorphism (RSP). This sub-classification is necessary because there are other types of RFLPs, which are not due to alterations of the restriction enzyme recognition site itself.
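The relationship between genotype and band pattern in a restriction site polymorphism can be sketched in a few lines of Python. This is an illustration only: it assumes that the probe detects the 5 kb fragment on chromosome A (no additional site) and the 2 kb fragment on chromosome B (additional site present), and that the remainder of the cut fragment is not detected by the probe. The names `FRAGMENT_KB` and `expected_bands` are hypothetical.

```python
# Assumed size (kb) of the probe-detected fragment on each chromosome type:
# chromosome A lacks the additional MspI site, chromosome B carries it.
FRAGMENT_KB = {"A": 5, "B": 2}


def expected_bands(genotype: str) -> dict:
    """Map each band size (kb) to its relative intensity for a diploid genotype."""
    bands = {}
    for chromosome in genotype:
        size = FRAGMENT_KB[chromosome]
        # Each of the two chromosome copies contributes half of the total signal.
        bands[size] = bands.get(size, 0.0) + 0.5
    return bands


for genotype in ("AA", "AB", "BB"):
    print(genotype, expected_bands(genotype))
```

Homozygotes give a single band at full intensity, whereas the heterozygote gives both bands at half intensity each, which is the reasoning used to read genotypes off the blot.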

Fig. 5.5 shows the analysis of genomic DNA from the same three individuals as in Fig. 5.4 following digestion with the restriction enzymes Hinf I and Hae III, Southern blotting, and probing with another DNA probe. All three individuals show different patterns, but these appear to be independent of the enzyme used, differing only with respect to the absolute but not the relative size of the bands.

Fig 5.5b illustrates the molecular basis of this polymorphism. Although the restriction sites themselves are invariant, the space between them is not. The different chromosomes contain different numbers of short, tandemly arranged repetitive sequence elements. This type of polymorphism is called a variable number of tandem repeats (VNTR) polymorphism. Note that a probe has been used that recognizes a unique, i.e. locus-specific, DNA sequence flanking the repeat array.
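Why the band patterns differ in absolute but not in relative size between enzymes can be shown with a small calculation. In this sketch the fragment length is modelled as the flanking DNA between the two restriction sites plus the repeat array itself. The repeat unit length, repeat numbers and flank sizes are hypothetical values chosen for illustration, not measurements from Fig. 5.5.

```python
def fragment_length_bp(flank_bp: int, repeat_count: int, unit_bp: int) -> int:
    """Length of a restriction fragment that contains a tandem repeat array."""
    return flank_bp + repeat_count * unit_bp


UNIT_BP = 30                                # hypothetical repeat unit length
ALLELE_REPEATS = [20, 35, 50]               # hypothetical repeat numbers of three alleles
FLANK_BP = {"HinfI": 400, "HaeIII": 900}    # hypothetical flanking DNA per enzyme

sizes_by_enzyme = {
    enzyme: [fragment_length_bp(flank, n, UNIT_BP) for n in ALLELE_REPEATS]
    for enzyme, flank in FLANK_BP.items()
}

for enzyme, sizes in sizes_by_enzyme.items():
    # The spacing between alleles depends only on the repeat numbers.
    spacing = [b - a for a, b in zip(sizes, sizes[1:])]
    print(enzyme, sizes, spacing)
```

Changing the enzyme shifts every fragment by the same amount, so the spacing between alleles stays the same.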

RFLP analysis Based on PCR

With PCR technology many RFLPs can be visualized, bypassing the rather laborious methods of Southern blotting. A PCR reaction will usually generate enough DNA to be directly visible on a gel after staining. If the PCR-amplified DNA of an RSP-containing region is cut by the corresponding enzyme prior to electrophoresis, band size differences show up directly (Fig. 5.6). Similarly, a VNTR-containing locus may also be investigated directly using the PCR approach (Fig. 5.7).

Analysis of VNTR polymorphism by PCR technique

So far, we have considered only polymorphisms that manifest themselves in inter-chromosomal length differences of the DNA segment under study. In order to characterize such polymorphisms precisely, a technique was required by which the DNA fragments could be size-separated. However, when the question is reduced to determining the presence or absence of a particular DNA sequence, then simpler methods, such as the allele-specific oligonucleotide (ASO) approach, can be used.

There are, of course, many other types of polymorphism than those described above that can be studied at the DNA level. Deletions, insertions and rearrangements of smaller or larger segments occur in the human and other genomes. Outside coding regions, such sequence alterations are usually of no phenotypic consequence, and may thus also attain population frequencies high enough to be useful for DNA fingerprinting. An ever-expanding collection of single-copy probes is available today and is in current use in forensic and medical settings as well as in basic genetic research.

In Order to Obtain DNA Profiles of Optimal Discriminatory Power, as Required, for Example, in Most Forensic Casework, Probes have to be Combined that:

(1) Detect loci not linked to each other and lacking allelic association;

(2) Are specific for a single locus each;

(3) Detect polymorphic loci with a sufficient number of alleles of appropriate frequencies;

(4) Create easily interpretable gel or blot patterns (e.g. one band in homozygotes and two bands in heterozygotes).
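Condition (1) matters because only independent loci allow the per-locus match probabilities to be multiplied. The sketch below illustrates this product rule. The five per-locus probabilities are hypothetical values chosen for illustration, not the published figures for any real probe set.

```python
from math import prod


def combined_match_probability(per_locus_probabilities):
    """Multiply per-locus match probabilities; valid only for independent loci."""
    return prod(per_locus_probabilities)


# Hypothetical chance that two unrelated people match at each of five loci.
per_locus = [0.002, 0.003, 0.001, 0.004, 0.002]

combined = combined_match_probability(per_locus)
print(f"combined match probability: about 1 in {1 / combined:.1e}")
```

If two loci were linked or showed allelic association, the product would understate the true chance of a coincidental match.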

Wong et al. (1987) and Smith et al. (1990) have described a set of five probes, which have a calculated probability of less than one in 3 × 10^13 of producing identical DNA profiles from two unrelated individuals. Fig. 8 shows the results of applying three probes (MS8, MS43 and G3) and two of these probes (MS1 and MS31), respectively, in a small nuclear family with mother, father and child.

It should be noted that, in contrast to DNA fingerprints obtained with multilocus probes, these banding patterns are simple superimpositions of complete single-locus patterns. One of the most variable and thus informative loci described to date, D1S7, is among those detected by the aforementioned probes (it is detected by MS1). It has a 9 bp basic repeat unit. More than 99% of individuals tested so far have been heterozygous, with alleles ranging from 1 kb to 23 kb.

Theoretically, more than 2,400 different allelic lengths can be expected at this locus. Because these cannot all be resolved by conventional agarose gel electrophoresis, quasi-continuous allele length distributions result. Zischler et al. (1992) have reported on a set of similarly efficient single-locus probes which were derived from sequences containing (CAC)n repeat elements.
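The figure of more than 2,400 possible allele lengths follows from the numbers quoted for D1S7: a 9 bp repeat unit and alleles spanning 1 kb to 23 kb. The short calculation below assumes that alleles differ only by whole repeat units and that 1 kb equals 1,000 bp.

```python
REPEAT_UNIT_BP = 9        # basic repeat unit length quoted for D1S7
MIN_ALLELE_BP = 1_000     # smallest allele, 1 kb
MAX_ALLELE_BP = 23_000    # largest allele, 23 kb

# Number of distinct lengths reachable in whole-repeat steps across the range.
possible_alleles = (MAX_ALLELE_BP - MIN_ALLELE_BP) // REPEAT_UNIT_BP + 1
print(possible_alleles)
```

Neighbouring alleles in this model differ by only 9 bp on fragments many kilobases long, which is why a gel shows them as a near-continuous distribution of sizes.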

It has been known for a long time that the constituent repeats of mini-satellites are not necessarily completely identical but may differ slightly from the core sequence (Krawczak and Schmidtke, 1998). It is again thanks to Jeffreys and his co-workers that this observation was systematically exploited by a technique called mini-satellite variant repeat mapping by PCR (MVR-PCR), also referred to as ‘digital DNA fingerprinting’ (Krawczak and Schmidtke, 1998).

The method was established for the mini-satellite locus D1S8, detected by probe MS32, which comprises at least 50 different length alleles with the repeat units measuring 29 bp. Some of the repeat units, however, show base substitutions, and the ordering of wild type and mutant repeat units within an array defines a huge variety of alleles.

Using combinations of PCR primers specific for these sequence variants and primers outside the repeat array common to all alleles, highly individual-specific digital codes can be generated. The great advantage of this approach rests on the fact that phenotype and genotype data are easily computerized. This novel technique is likely to be highly useful both in forensic analyses and in paternity testing.

(b) Multilocus Approaches to DNA Polymorphism:

A multilocus DNA fingerprint can be generated either by the simultaneous application of several probes, each one specific for a particular locus, or by applying a single DNA probe that simultaneously detects several loci. Earlier it was noted that a VNTR polymorphism at a specific locus is normally visualized using a DNA probe flanking the polymorphic repeat array.

However, if the elements of a repeat array occur not only at that one locus but also recur in other parts of the genome (again in tandem repeats and perhaps also in variable numbers), then the repeat sequence itself could be used as a probe in order to obtain a multilocus pattern.

The position and intensity of individual bands in this pattern would depend mainly on two potential variables: (1) the average number of repeat elements covered by an array, and (2) the distance between the repeat array and the nearest recognition site for the restriction enzyme used.

The relative contribution of these two variables depends critically upon what proportion of the restriction fragment is actually occupied by the repeat array. If this proportion is large, the band position and intensity will be roughly proportional to the element number of the array. However, if it is small, the element number will influence only band intensity, whereas the band position will depend mainly on sequence characteristics outside the repeat array.

As with single-locus probes, a large number of multilocus systems have been developed. In contrast to single-locus probes, which, by definition, detect a unique locus each, there is considerable overlap between some multilocus probes with respect to the loci with which they react. This means that expanding a set of single-locus probes will always increase the amount of information, whereas the application of several multilocus probes may not.

As yet, the molecular basis of this phenomenon is only poorly understood. It is interesting to note that many multilocus probes have been shown to cross-hybridize to polymorphic loci in a wide range of animal and plant species. It is not clear whether this obvious sequence conservation also reflects a shared function.

The most widely used multilocus probes, named 33.6 and 33.15, were developed in the laboratory of Alec Jeffreys, who also coined the term DNA fingerprinting (Jeffreys et al., 1985). These probes typically detect 17 variable DNA fragments per individual in the size range of 3.5-20 kb.

Only about 1% of the fragments are co-detected by both probes. Detailed analyses of 33.6 and 33.15 revealed that, at the genomic level, the mini-satellites detected by these probes are composed of tandem arrays of 3-40 elements having a core sequence of GGGCAGGANG (with N standing for any of the four bases).

Between different arrays, considerable variation is seen with respect to the extent of sequence heterogeneity among their constituent elements. It was found that probes derived from arrays with high element homogeneity pick up a high degree of repeat number polymorphism in the population, and vice versa. This is the expected result when the polymorphism is generated by unequal crossing-over events during meiosis.

Another multilocus DNA fingerprinting system that deserves special attention makes use of synthetic oligonucleotides. This approach was devised and developed by Jörg Epplen and his group (1986). Based on the observation that certain repeat sequences such as GATA or GACA are common to all eukaryotic genomes (Singh et al., 1981; Epplen et al., 1982), probes were constructed that are capable of detecting hypervariable loci and thus can generate DNA fingerprints from human DNA.

These probes include multimers of CA, CR, GATA, GACA, GAA, GGAT, or TCC. However, the highest degree of genetic individualization in humans was achieved using a CAC pentamer (Schafer et al., 1988), which is now widely used in paternity testing (Krawczak et al., 1993). It is as yet unclear how polymorphism at loci detected by probe (CAC)5 actually comes about, but it can be speculated that the observed patterns derive from perfect repeats lying adjacent to more degenerate repeats, thus forming simple mini-satellites.

Some other extensively used multilocus probe (MLP) systems are M13 (Vassart et al., 1987), Bkm and the Bkm-derived clone 2(8) (Singh and Jones, 1986; Jones et al., 1987; Singh et al., 1988), a 6-base-pair tandem repeat related to the Drosophila per gene (Georges et al., 1987) and a tandemly repeated core sequence downstream of the α-globin gene (Fowler et al., 1988). Probes 33.6 and 33.15 have already been extensively used in the UK.

In India, the Bkm probe is being used in paternity disputes and criminal investigations. Singh et al. (1981) isolated a sex chromosome-associated repetitive DNA from the female Indian banded krait, Bungarus fasciatus, as a minor satellite DNA component and designated it Bkm. Bkm sequences have been shown to be present in many eukaryotes but absent in any appreciable quantity in prokaryotes.

These are preferentially concentrated in the sex-determining region of the sex chromosomes of Drosophila, snakes, birds, mouse and man (Singh, 1991) and are also present scattered in other parts of the genome. The conserved components of Bkm are long arrays of repeats of the tetranucleotide GATA.

These scattered copies of Bkm can be used to detect RFLPs after restricting human DNA with the restriction enzymes HinfI, AluI and BstNI and Southern blotting with the Bkm probe or the Bkm-associated clone 2(8) (Singh and Jones, 1986). The use of this variability, detected earlier by Singh and co-workers, for DNA fingerprinting has now been thoroughly tested and its potential use in forensic investigations confirmed.

The Bkm probe 2(8) detects qualitatively more polymorphic regions in the human genome than the probe used by Jeffreys (Singh, 1991). Because of the association of Bkm sequences with the sex-determining chromosome, by using the Bkm probe, Singh and co-workers (Singh, 1991) recovered clones from human and mouse genomic libraries that are Y-chromosome-specific and, therefore, can be used for sexing biological samples.

Population analysis of 300 individuals using a combination of Bkm and Bkm-associated probes has revealed a novel amplification of specific DNA fragments that exist as free copies in the cell (Singh, 1991). After extensive population studies and calculation of allele frequencies for different populations in India, Bkm is being successfully used in forensic investigation.

A single MLP provides sufficient numbers of variable bands to establish positive identity of an individual beyond reasonable doubt. It, therefore, constitutes a single powerful test for positive identification of stain or tissue or of parentage, including paternity. The probability of observing identical patterns for two individuals with an MLP is of the order of one in 10^14 to one in 10^30. Considering that the world population is less than 5 × 10^10, DNA fingerprint patterns are effectively unique to the individual. An MLP cannot, however, be used reliably to type mixed samples, for example in the case of multiple rape.
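The uniqueness argument can be made concrete by multiplying the match probability by the size of the population. The sketch below reads the figures in the text as match probabilities of 10^-14 and 10^-30 and a population bound of 5 × 10^10. It treats individuals as unrelated, which is a simplifying assumption, and the function name is hypothetical.

```python
def expected_matching_individuals(population: float, match_probability: float) -> float:
    """Expected number of unrelated people sharing one given MLP pattern."""
    return population * match_probability


POPULATION_UPPER_BOUND = 5e10  # upper bound on world population used in the text

for match_probability in (1e-14, 1e-30):
    expected = expected_matching_individuals(POPULATION_UPPER_BOUND, match_probability)
    print(f"match probability {match_probability:.0e}: expected matches {expected:.1e}")
```

Both expected values are far below one, so a coincidental match between unrelated individuals is not expected anywhere in the population. Close relatives share bands and would need a separate calculation.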

Single Nucleotide Polymorphism (SNP):

Single nucleotide polymorphisms, or SNPs (pronounced “snips”), are DNA sequence variations that occur when a single nucleotide (A, T, C or G) in the genome sequence is altered. For example, a SNP might change the DNA sequence AAGGCTAA to ATGGCTAA. For a variation to be considered a SNP, it must occur in at least 1% of the population. SNPs, which make up about 90% of all human genetic variation, occur every 100 to 300 bases along the 3-billion-base human genome.
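The example substitution above can be located programmatically by comparing two aligned sequences base by base. This is a minimal sketch that assumes both sequences are already aligned and of equal length. The function name `snp_positions` is hypothetical, and a real analysis would also need population frequencies to apply the 1% criterion.

```python
def snp_positions(reference: str, variant: str) -> list:
    """Return (1-based position, reference base, variant base) for each mismatch."""
    if len(reference) != len(variant):
        raise ValueError("sequences must be aligned and of equal length")
    return [
        (position, ref_base, var_base)
        for position, (ref_base, var_base) in enumerate(zip(reference, variant), start=1)
        if ref_base != var_base
    ]


print(snp_positions("AAGGCTAA", "ATGGCTAA"))  # [(2, 'A', 'T')]
```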

Two of every three SNPs involve the replacement of cytosine (C) with thymine (T). SNPs can occur in both coding (gene) and non-coding regions of the genome. Many SNPs have no effect on cell function, but scientists believe others could predispose people to disease or influence their response to a drug.

Although more than 99% of human DNA sequences are the same across the population, variations in DNA sequence can have a major impact on how humans respond to disease; to environmental insults such as bacteria, viruses, toxins and chemicals; and to drugs and other therapies.

This makes SNPs of great value for biomedical research and for developing pharmaceutical products or medical diagnostics. SNPs are also evolutionarily stable, not changing much from generation to generation, which makes them easier to follow in population studies. Scientists believe SNP maps will help them identify the multiple genes associated with such complex diseases as cancer, diabetes, vascular disease, and some forms of mental illness. These associations are difficult to establish with conventional gene-hunting methods because a single altered gene may make only a small contribution to the disease.

Identification of SNPs and their Characterization:

An understanding of human gene variation is required for human disease genetics. A few studies have focused on analysis of individual genes in many individuals, e.g. those encoding beta-globin and lipoprotein lipase. One study examined 49 genes by comparing two independent sequences in public databases. Cargill et al. (1999) surveyed coding sequence diversity of 106 genes with roles in cardiovascular, endocrine and neurological systems in a sample of Europeans, African Americans, African Pygmies and Asians, with an average of 114 chromosomes screened for each gene. In total, 560 SNPs (confirmed by sequencing) were identified in the 196.2 kb analyzed.

Their data suggested that the average gene contains about 4 coding-sequence SNPs (cSNPs) with allele frequencies of a few percent in the human population. From this figure they estimated that 240,000-400,000 common cSNPs exist over the human genome, in line with workers who performed a similar study on 75 candidate genes for human blood pressure and hypertension using the variant detection array (VDA) method.
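The arithmetic behind these figures can be checked in a few lines. The sketch below reads the per-gene figure as about four common cSNPs. The gene-count range it prints is not stated in the text; it is simply what the quoted genome-wide range implies when divided by four, and should be treated as an inference rather than a reported value.

```python
SNPS_FOUND = 560                  # SNPs confirmed by sequencing
SEQUENCE_SURVEYED_BP = 196_200    # 196.2 kb analyzed
COMMON_CSNPS_PER_GENE = 4         # approximate figure for common cSNPs per gene
GENOME_WIDE_CSNPS = (240_000, 400_000)

# Average spacing between SNPs in the surveyed sequence.
bp_per_snp = SEQUENCE_SURVEYED_BP / SNPS_FOUND
print(f"roughly one SNP every {bp_per_snp:.0f} bp surveyed")

# Gene counts implied by the genome-wide estimate (an inference, see lead-in).
implied_gene_counts = [total // COMMON_CSNPS_PER_GENE for total in GENOME_WIDE_CSNPS]
print(f"implied number of genes: {implied_gene_counts[0]:,} to {implied_gene_counts[1]:,}")
```

The first figure counts every SNP found across many individuals, while the per-gene estimate counts only common cSNPs, so the two numbers are not directly comparable.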
