This article provides a close look at the Human Genome Project.
The most important features of a DNA molecule are its nucleotide sequence and the identity and activity of the genes it contains. Since the 1970s, scientists have been working to determine the sequences of pieces of DNA. This work was later extended to the complete sequence determination of small genomes, e.g. the plasmid pBR322 in 1979. The human mitochondrial genome was sequenced in 1981.
The Birth and Activity of Human Genome Project:
The human genome project (HGP) was conceived in 1984 and officially began in October 1990. The primary objective of HGP was to determine the nucleotide sequence of the entire human nuclear genome. In addition, HGP was entrusted with elucidating the genomes of several model organisms, e.g. Escherichia coli, Saccharomyces cerevisiae (yeast), Caenorhabditis elegans (roundworm) and Mus musculus (mouse). James Watson (who co-discovered the structure of DNA) was the first Director of HGP.
In 1997, the United States established the National Human Genome Research Institute (NHGRI). The HGP was an international venture involving research groups from six countries (USA, UK, France, Germany, Japan and China), several individual laboratories, and a large number of scientists and technicians from various disciplines. This collaborative venture was named the International Human Genome Sequencing Consortium (IHGSC) and was headed by Francis Collins.
A total expenditure of $3 billion and a period of 10-15 years were expected for the completion of HGP. A second human genome project was set up in 1998 by a private company, Celera Genomics of Maryland, USA. This team was led by Craig Venter. Rapid and unexpected progress followed, helped by good cooperation between the two teams and by improved sequencing methods.
Announcement of the Draft Sequence of Human Genome:
The date 26th June 2000 will be remembered as one of the most important dates in the history of science, or even of mankind. It was on this day that Francis Collins and Craig Venter, the leaders of the two human genome projects, jointly announced the working drafts of the human genome sequence in the presence of the President of the U.S. The detailed results of the two teams were published in February 2001 in the scientific journals Nature (IHGSC) and Science (Celera Genomics).
The human genome project results attracted worldwide attention. This achievement was hailed with many descriptions in the media.
i. The mystery of life unraveled.
ii. The library of life.
iii. The periodic table of life.
iv. The Holy grail of human genetics.
It may, however, be noted that the draft human genome sequences were not complete and may represent around 90% of the genome. The remaining 10% is made up of sequences where few genes are located.
Mapping of the Human Genome:
One of the most important objectives of the human genome project was to construct a series of maps for each chromosome. An outline of the different types of maps is given in Fig. 12.1.
1. Cytogenetic map:
This is a map of the chromosome in which the active genes respond to a chemical dye and display themselves as bands on the chromosome.
2. Gene linkage map:
A chromosome map in which the active genes are identified by locating closely associated marker genes. The most commonly used DNA markers are restriction fragment length polymorphisms (RFLPs), variable number tandem repeats (VNTRs) and short tandem repeats (STRs). VNTRs are also called minisatellites, while STRs are called microsatellites.
3. Restriction fragment map:
This consists of the random DNA fragments that have been sequenced.
4. Physical map:
This is the ultimate map of the chromosome, with the highest resolution: the base sequence. The physical map depicts the location of the active genes and the number of bases between them.
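The short tandem repeats (STRs) listed under the gene linkage map are useful as markers because the number of repeat units at a given locus varies between individuals. As a rough illustration only, the sketch below counts the longest run of a repeat unit in a sequence. The function name `longest_str_run`, the flanking sequences and the two alleles are all made up for this example and do not come from any real locus.

```python
import re

def longest_str_run(sequence: str, unit: str) -> int:
    """Return the largest number of back-to-back copies of `unit` in `sequence`."""
    # (?:unit)+ matches one or more consecutive copies of the repeat unit.
    pattern = re.compile(f"(?:{re.escape(unit)})+")
    runs = [len(m.group(0)) // len(unit) for m in pattern.finditer(sequence)]
    return max(runs, default=0)

# Two hypothetical alleles of the same locus, differing only in repeat count.
allele_a = "AGGT" + "CA" * 5 + "TTGA"
allele_b = "AGGT" + "CA" * 8 + "TTGA"

print(longest_str_run(allele_a, "CA"))  # 5 repeat units
print(longest_str_run(allele_b, "CA"))  # 8 repeat units
```

Because the two alleles give different repeat counts, the locus can serve as a marker that distinguishes them. Real STR genotyping works on measured fragment lengths or sequencing reads rather than on clean strings like these.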
Organization of Human Genome:
An outline of the organization of the human genome is given in Fig. 12.2. Of the 3200 Mb, only a small fraction (48 Mb) represents the actual genes, while the rest is made up of gene-related sequences (introns, pseudogenes) and intergenic DNA (long interspersed nuclear elements, short interspersed nuclear elements, microsatellites, DNA transposons etc.). Intergenic DNA represents the parts of the genome that lie between the genes and have no known function. This is appropriately regarded as junk DNA.
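The proportions implied by these figures can be checked with simple arithmetic. The sketch below uses only the two numbers quoted in the text (3200 Mb in total, 48 Mb of actual genes); the variable names are illustrative.

```python
# Figures taken from the text; variable names are illustrative only.
GENOME_MB = 3200  # total size of the human genome in megabases
GENES_MB = 48     # portion that represents the actual genes

gene_fraction = GENES_MB / GENOME_MB
other_mb = GENOME_MB - GENES_MB  # gene-related sequences plus intergenic DNA

print(f"Actual genes: {gene_fraction:.1%} of the genome")            # 1.5%
print(f"Everything else: {other_mb} Mb ({1 - gene_fraction:.1%})")   # 3152 Mb (98.5%)
```

The result, 1.5%, sits at the upper end of the 1.1-1.5% protein-coding figure quoted later in this article.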
Genes Present in Human Genome:
The two genome projects differ in their estimates of the total number of genes in humans. Their figures are in the range of 30,000-40,000 genes. The main reason for this variation is that it is rather difficult to distinguish the DNA sequences that are genes from those that are not.
Before the results of the HGP were announced, the best guess for the number of human genes was in the range of 80,000-100,000. This estimate was based on the fact that the number of proteins in human cells is 80,000-100,000, so a similar number of genes was expected. The fact that the number of genes is much lower than the number of proteins suggests that RNA editing (RNA processing) is widespread, so that a single gene may give rise to more than one protein.
A diagrammatic representation of a typical structure of an average human gene is given in Fig. 12.3. It has exons and introns.
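To make the exon and intron structure concrete, the sketch below joins the exons of a toy transcript and then repeats the step with one exon left out. It is an illustration only: the sequence, the exon coordinates and the function name `splice` are invented, and exon skipping is assumed here as one example of the RNA processing that lets one gene yield more than one product.

```python
def splice(pre_mrna: str, exons: list[tuple[int, int]]) -> str:
    """Join the exon segments (0-based, end-exclusive) and drop everything between them."""
    return "".join(pre_mrna[start:end] for start, end in exons)

# Hypothetical toy transcript: uppercase = exon, lowercase = intron.
pre_mrna = "AUGGCC" + "guaaguag" + "GAUUAC" + "guccaag" + "UGGUAA"
exon1, exon2, exon3 = (0, 6), (14, 20), (27, 33)

full_length = splice(pre_mrna, [exon1, exon2, exon3])  # all three exons kept
exon_skipped = splice(pre_mrna, [exon1, exon3])        # middle exon left out

print(full_length)   # AUGGCCGAUUACUGGUAA
print(exon_skipped)  # AUGGCCUGGUAA
```

The same stretch of DNA thus gives two different mature transcripts, which is one way a genome with 30,000-40,000 genes can account for a larger number of proteins.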
A broad categorization of the human gene catalog is depicted as a pie chart in Fig. 12.4. About 17.5% of the genes participate in the general biochemical functions of the cells, 23% in the maintenance of the genome and 21% in signal transduction, while the remaining 38% are involved in the production of structural proteins, transport proteins, immunoglobulins etc.
Human Genes Encoding Proteins:
It is now clear that only 1.1-1.5% of the human genome codes for proteins. This figure of 1.1-1.5% represents the exons of the genome.
As already described, a huge portion of the genome is composed of introns and intergenic sequences (junk DNA).
The major categories of the proteins encoded by human genes are listed in Table 12.4. The functions of at least 40% of these proteins are not known.
Marked Differences in Individual Chromosomes:
The landscape of human chromosomes varies widely. This variation covers many features, such as the number of genes per megabase, GC content, density of SNPs and number of transposable elements. For instance, chromosome 19 has the richest gene content (23 genes per megabase), while chromosome 13 and the Y chromosome have the lowest (5 genes per megabase).
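For comparison, a rough genome-wide average can be derived from figures quoted elsewhere in this article (30,000-40,000 genes over 3200 Mb). The sketch below is only a back-of-the-envelope calculation: the average is a derived value, not one stated in the source, and the variable names are illustrative.

```python
# Figures quoted in the article; the genome-wide average is derived from them.
GENOME_MB = 3200
GENE_ESTIMATES = (30_000, 40_000)  # lower and upper estimates of human gene number
CHR19_DENSITY = 23                 # genes per megabase on chromosome 19
LOW_DENSITY = 5                    # genes per megabase on chromosome 13 and Y

avg_low, avg_high = (n / GENOME_MB for n in GENE_ESTIMATES)
fold = CHR19_DENSITY / LOW_DENSITY

print(f"Genome-wide average: {avg_low:.1f} to {avg_high:.1f} genes per megabase")
print(f"Chromosome 19 is {fold:.1f} times denser than chromosome 13 or Y")
```

On these numbers the average falls between roughly 9 and 13 genes per megabase, so chromosome 19 lies well above it, and chromosome 13 and the Y chromosome well below.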
Other Interesting/Important Features of Human Genome:
For more interesting features of the human genome, refer to Table 12.3.
i. It is surprising to note that the number of genes found in humans is only about twice that present in the roundworm (19,099) and thrice that of the fruit fly (13,001).
ii. Around 200 genes appear to have been derived from bacteria by lateral transfer. Surprisingly, none of these genes are present in non-vertebrate eukaryotes.
iii. The proteins encoded by human genes are more complex than those of invertebrates.
iv. The flood of data from the human genome projects will be highly useful for bioinformatics and biotechnology.
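The "twice" and "thrice" comparisons in point (i) can be checked against the 30,000-40,000 estimate given earlier. The sketch below is a simple calculation on the figures quoted in this article; the function name `fold_range` is illustrative.

```python
# Gene counts as quoted in the article.
HUMAN_ESTIMATES = (30_000, 40_000)
OTHER_GENOMES = {"roundworm": 19_099, "fruit fly": 13_001}

def fold_range(other_count: int) -> tuple[float, float]:
    """Return how many times larger the low and high human estimates are."""
    low, high = HUMAN_ESTIMATES
    return low / other_count, high / other_count

for name, count in OTHER_GENOMES.items():
    low_fold, high_fold = fold_range(count)
    print(f"human vs {name}: {low_fold:.1f}x to {high_fold:.1f}x")
# human vs roundworm: 1.6x to 2.1x
# human vs fruit fly: 2.3x to 3.1x
```

The round figures of twice and thrice therefore correspond to the upper estimate of about 40,000 human genes.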
Genomes of Some Other Organisms Sequenced:
Sequencing of genomes is not confined to humans, although human genome sequencing, for obvious reasons, attracted worldwide attention. In fact, the first genome sequence, that of the bacteriophage φX174, was determined in 1977. Yeast was the first eukaryotic organism to be sequenced (1996). Recently, the mouse, the animal model closest to humans, has been sequenced. A selected list of genomes that have been sequenced is given in Table 12.5.
Ethics and Human Genome:
Research on human genomes will make available very sensitive data that will affect the personal and private lives of individuals. For instance, once it is known that a person carries genes for an incurable disease, what would be the strategy of an insurance company? How will society treat him/her?
There is a possibility that individuals with substandard genome sequences may be discriminated against. Human genome results may also promote racial discrimination by categorizing people as having good or bad genome sequences. Considering the gravity of the ethical issues related to the human genome, about 3% of the HGP budget was earmarked for ethical research.
In the 1990s, there was a move by some scientists to patent the genes they discovered. This created an uproar among the public and the scientific community. Fortunately, the idea of patenting genes identified through human genome sequencing was dropped. The fear still exists that genetic information will be used for commercial purposes.