The following points highlight three important techniques for genome sequencing. The techniques are: 1. Shotgun Approach for Small Bacterial Genome 2. Sequencing of Mapped Clones for S. cerevisiae Genome 3. Mapping and Sequencing the Human Genome.
Technique # 1. Shotgun Approach for Small Bacterial Genome:
The first bacterial sequence to be published, for the 1830 kb genome of the bacterium Haemophilus influenzae, was obtained by the shotgun approach. The details of this genome project were quite astonishing.
The project was carried out by a large team including R.D. Fleischmann, M.D. Adams, O. White and others at The Institute for Genomic Research (TIGR) at Gaithersburg, Maryland. It involved breaking H. influenzae DNA into fragments between 1.6 and 2.0 kb in size, cloning these fragments, and then carrying out 28643 individual sequencing experiments with 19687 clones taken at random from the collection.
Of these experiments, 4339 were deemed to be ‘failures’ as they resulted in less than 400 bp of sequence. The remaining 24304 (i.e., 28643 - 4339) sequences were entered into a computer which, 30 hours later, produced 140 large assemblies of contiguous overlapping sequences called contigs, each contig being a sub-fragment of the genome as a whole.
Individual contigs did not overlap: by chance, the sequences in the gaps between them had not been obtained. The next step was therefore to work out which pairs of contigs were adjacent to one another in the genome.
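The assembly step can be illustrated with a toy example. The sketch below is a simple greedy overlap merge, not the assembler actually used at TIGR; the function names, the minimum overlap of 3 bases and the short reads are all hypothetical and chosen only for illustration. It shows how overlapping reads collapse into contigs, and how contigs remain separate when the sequence between them was never obtained.

```python
def overlap(a, b, min_len):
    """Length of the longest suffix of a that equals a prefix of b (0 if shorter than min_len)."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the pair of sequences with the longest overlap (toy method)."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while True:
        best_len, best_i, best_j = 0, None, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_len:
                        best_len, best_i, best_j = k, i, j
        if best_len == 0:
            return contigs  # no overlaps left: the remaining pieces are the contigs
        merged = contigs[best_i] + contigs[best_j][best_len:]
        contigs = [c for n, c in enumerate(contigs) if n not in (best_i, best_j)]
        contigs.append(merged)


# Hypothetical reads: the first three overlap, as do the last two,
# but nothing links the two groups, so a gap remains between two contigs.
reads = ["ATGGCGT", "GCGTACC", "TACCTTA", "GGGAAAC", "AAACTTT"]
contigs = greedy_assemble(reads)
print(sorted(contigs))
```

With these reads the sketch ends with two contigs rather than one, which mirrors the situation described above: the pieces cannot be joined until the sequence in the gap is obtained.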
For this, several approaches were used. The most useful involved hybridization analysis of a genomic library of H. influenzae DNA cloned in a lambda (λ) vector. The probes in these experiments were short oligonucleotides, synthesized in the laboratory, whose sequences matched the sequences at the ends of each contig.
If two oligonucleotides hybridized to the same lambda (λ) clone, then the ends of the contigs from which they were designed must lie within that clone; in other words, the two contigs must be adjacent in the genome. Once a pair of adjacent contigs was discovered, the gap between them was closed by sequencing the appropriate piece of DNA from within the lambda (λ) clone.
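The logic of this gap-closing step can be sketched in a few lines. The example below assumes the hybridization results are recorded as a table from each end-probe to the lambda clones it hybridized to; the contig names, clone names and data layout are hypothetical. Two contig ends whose probes hit the same clone are reported as adjacent, together with the clone that should contain the gap.

```python
from collections import defaultdict
from itertools import combinations


def adjacent_contigs(hybridization):
    """hybridization maps (contig, end) -> set of lambda clones the end-probe hybridized to.

    Returns a set of ((contig, end), (contig, end), clone) triples for probes from
    different contigs that hybridized to the same clone.
    """
    clone_to_probes = defaultdict(set)
    for probe, clones in hybridization.items():
        for clone in clones:
            clone_to_probes[clone].add(probe)

    links = set()
    for clone, probes in clone_to_probes.items():
        for first, second in combinations(sorted(probes), 2):
            if first[0] != second[0]:  # ignore the two ends of one contig
                links.add((first, second, clone))
    return links


# Hypothetical results for three contigs.
results = {
    ("contig1", "left"): {"lambda3"},
    ("contig1", "right"): {"lambda7"},
    ("contig2", "left"): {"lambda7", "lambda9"},
    ("contig2", "right"): {"lambda12"},
    ("contig3", "left"): {"lambda12"},
}
for first, second, clone in sorted(adjacent_contigs(results)):
    print(first, "is adjacent to", second, "via", clone)
```

Each reported clone is then the template from which the missing sequence between the two contigs would be obtained.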
Shotgun Sequencing of Other Bacterial Genomes:
The advantage of the shotgun approach is that it is relatively rapid, especially if the project is set up on a production-line basis. This is done with a large group of researchers working in a coordinated fashion, each person with his or her task in preparing DNA, carrying out the DNA sequencing experiments, or analysing the data.
Three months after the Haemophilus influenzae sequence, TIGR published the sequence of the Mycoplasma genitalium genome. This genome is relatively short at 580 kb and required only 9846 sequencing experiments, which were carried out by five people over a period of eight weeks.
A year later, the sequence of the more complicated 1739 kb genome of Methanococcus jannaschii (36718 sequencing experiments) was published. These achievements are good examples of the advantages of the shotgun approach.
Technique # 2. Sequencing of Mapped Clones for S. cerevisiae Genome:
From the data described above for the H. influenzae and M. genitalium sequencing projects, it can be estimated that a shotgun approach to sequencing the yeast genome, which is 12520 kb, would require roughly 200000 individual sequencing experiments. If five people can carry out almost 10,000 sequencing experiments in eight weeks, then obtaining 200000 sequences is quite feasible, but is it the ideal approach?
For some organisms with genomes of 10,000 to 20,000 kb the answer might be ‘yes’, presuming that computer technology can deal with the data analysis. But with yeast (S. cerevisiae) an alternative and more efficient approach was possible.
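The estimate for yeast can be checked with simple arithmetic. The sketch below assumes that the number of sequencing experiments scales linearly with genome size, which is a simplification; the function and variable names are hypothetical, while the experiment counts and genome sizes are those quoted above.

```python
# Sequencing experiments and genome sizes (kb) quoted for the two bacterial projects.
PROJECTS = {
    "H. influenzae": (28643, 1830),
    "M. genitalium": (9846, 580),
}
YEAST_GENOME_KB = 12520


def scaled_experiments(experiments, genome_kb, target_kb):
    """Scale an experiment count linearly with genome size (a simplifying assumption)."""
    return round(experiments / genome_kb * target_kb)


estimates = {
    name: scaled_experiments(n, kb, YEAST_GENOME_KB)
    for name, (n, kb) in PROJECTS.items()
}
for name, estimate in estimates.items():
    print(f"Scaled from {name}: about {estimate} experiments for yeast")

# Assumption: five people keep up the M. genitalium rate of 9846 experiments per 8 weeks.
weeks_for_five_people = 200000 / 9846 * 8
print(f"About {weeks_for_five_people:.0f} weeks for a five-person team")
```

Both projects give figures close to 200000 experiments, and at the M. genitalium rate a single five-person team would need roughly three years, which is why an alternative strategy was attractive.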
This was because the yeast genome had been completely mapped by conventional genetic analysis, so the positions of many genes on its 16 chromosomes were already known. The advantage of a complete genetic map is that it allows genomic clones containing particular regions of the genome to be identified simply by establishing which genes are contained in the DNA carried by each clone.
If enough clones are examined then it should be possible to identify a series that spans an entire chromosome, as was done for the first yeast chromosome to be sequenced, number III.
The average size of the DNA molecules carried by the 29 clones that spanned yeast chromosome III was 10.8 kb. Rather than sequencing all these clones in a single laboratory, this particular project was set up as a collaborative venture between 35 research groups in countries throughout Europe.
Each group was responsible for generating 10-20 kb of sequence from different segments of the chromosome. The resultant data were assembled by a central laboratory at Martinsried, Germany, that specializes in DNA sequence handling.
Thus, yeast chromosome III became the first eukaryotic chromosome to be completely sequenced; the sequence was published in 1992 by S.G. Oliver, Q.J.M. van der Aart, M.L. Agostoni-Carbone et al.
The entire chromosome III project took two years. Parallel projects were set up elsewhere in Europe, the USA, Canada and Japan to sequence the other 15 yeast chromosomes, and the complete genome sequence was obtained in just under six years.
Technique # 3. Mapping and Sequencing the Human Genome:
The human genome is 240 times the length of the yeast genome and 1640 times that of Haemophilus influenzae. The Human Genome Project therefore faced problems on a scale far beyond those that confronted the organizers of the bacterial and yeast projects.
However, the basic strategy for sequencing the human genome is not very different from that adopted by the yeast project, i.e., identify a series of clones that span a chromosome and use these to obtain sequences that can be positioned on the human DNA map.
The main problem was that until the Human Genome Project got under way, no detailed map of the human genome was available. The reason for this is quite clear: conventional genetic analysis is not possible with humans because controlled breeding experiments are unthinkable.
Until the early 1980s, alternative means of mapping genes had not been developed. Therefore, the primary objective of the Human Genome Project was to build up the genetic map needed to support the DNA sequencing stage of the project.
For this purpose the following methods have been adopted:
1. A Limited Genetic Map has been Established by Human Pedigree Analysis:
During the 1980s the first attempts were made to develop a human genetic map by examining the inheritance of genetic markers in human families. This is an extension of pedigree analysis, which had been used for several years as a means of establishing the likelihood that unborn children will inherit a disease.
To adapt pedigree analysis for the development of a human genetic map all that was necessary was to follow the inheritance of a variety of markers, and then to use the standard methods for eukaryotic genetic analysis to identify pairs of markers which are linked and to determine the recombination frequencies between them.
The procedure that was adopted was to select a small number of reference families, 21 in the first major study, and examine the inheritance of as many markers as possible by determining the genotypes of all the members of these families. In these studies, multiallelic markers such as blood groups and HLA genes were also used.
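The recombination frequency between two markers is simply the proportion of offspring in which the markers have been separated by recombination. The sketch below is a minimal illustration with hypothetical counts and function names; it treats a frequency below 50% as a sign of linkage and ignores the statistical tests (such as lod scores) that real pedigree studies rely on because human families are small.

```python
def recombination_frequency(recombinants, total_offspring):
    """Fraction of offspring that are recombinant for a pair of markers."""
    if total_offspring <= 0:
        raise ValueError("total_offspring must be positive")
    return recombinants / total_offspring


def appear_linked(recombinants, total_offspring, threshold=0.5):
    """Unlinked markers recombine about 50% of the time; clearly less suggests linkage."""
    return recombination_frequency(recombinants, total_offspring) < threshold


# Hypothetical counts pooled over several reference families.
rf = recombination_frequency(12, 100)
print(f"Recombination frequency: {rf:.2f} (about {rf * 100:.0f} map units)")
print("Markers appear linked:", appear_linked(12, 100))
```

For closely linked markers, the recombination frequency expressed as a percentage is used directly as the map distance between them.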
2. Use of Microsatellite DNA:
In the 1990s, rather than depending on genes as genetic markers, researchers at the Généthon laboratory in France used parts of the extragenic DNA, namely microsatellite clusters. Microsatellite clusters, which are made up of repeats of short sequences such as 5′-CA-3′, exist at numerous places in the human genome, and the number of repeats in a cluster can change as a result of recombination or errors in DNA replication. This means that each cluster is, in effect, a multiallelic locus, in which alleles are distinguished by the number of repeats that are present.
As there are many microsatellite loci scattered around the genome they are ideal markers for obtaining a detailed genetic map: the most recent version of the human map includes locations for 5264 of them.
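The idea that microsatellite alleles differ only in repeat number can be shown with a short sketch. The sequences and the function name below are hypothetical; in practice alleles are typed by measuring the length of a PCR product rather than by reading the sequence, so this is only an illustration of what distinguishes one allele from another.

```python
import re


def longest_repeat_run(sequence, unit="CA"):
    """Number of repeats in the longest uninterrupted run of `unit` in `sequence`."""
    runs = re.findall(f"(?:{re.escape(unit)})+", sequence.upper())
    return max((len(run) // len(unit) for run in runs), default=0)


# Two hypothetical alleles of one locus: same flanking sequence, different repeat number.
allele_1 = "GGT" + "CA" * 12 + "TTG"
allele_2 = "GGT" + "CA" * 15 + "TTG"

print(longest_repeat_run(allele_1), longest_repeat_run(allele_2))
```

Because many different repeat numbers can exist in a population, a single locus of this kind can have many alleles, which is what makes it informative in family studies.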
3. Placing Genomic Clones on the Human Genetic Maps:
Progress in developing the human genetic map was paralleled by equally important developments in DNA cloning. As with the yeast (S. cerevisiae) project, a prerequisite for the sequencing phase of the human project is to obtain a library of genomic clones, each containing a different fragment of DNA, that form an overlapping series spanning each human chromosome.
With S. cerevisiae, this was achieved by cloning DNA in a cosmid vector, which until the early 1990s was the type of vector capable of handling the largest pieces of DNA. But the maximum length of DNA that can be cloned in a cosmid vector is only 40 kb.
This is suitable for a yeast genomic library, as only 1000 clones are required for a 95% chance that every segment of the genome is included (Table 59.1), but the equivalent library with human DNA would contain over 250000 clones. This is simply too many clones to deal with.
Calculated from the following formula:
N = ln(1 - P) / ln(1 - a/b)
Where N = number of clones; P = probability that any given gene is present, set at 0.95 (i.e., 95%) in these calculations; a = average size of the DNA fragments in the clone library; b = total size of the genome.
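The library-size calculation can be reproduced with a short sketch. It uses the standard relationship N = ln(1 - P) / ln(1 - a/b) with the symbols defined above; the function name is hypothetical, and the insert size of 40 kb and the human genome size (taken here as 240 times the yeast genome) are assumptions, so the results only approximate the rounded figures in the text.

```python
import math


def clones_needed(a_kb, b_kb, p=0.95):
    """Clones required for probability p that a given segment is in the library.

    a_kb: average insert size; b_kb: genome size. Implements N = ln(1 - p) / ln(1 - a/b).
    """
    if not 0 < p < 1:
        raise ValueError("p must be between 0 and 1")
    if not 0 < a_kb < b_kb:
        raise ValueError("insert size must be positive and smaller than the genome")
    return math.ceil(math.log(1 - p) / math.log(1 - a_kb / b_kb))


YEAST_KB = 12520
HUMAN_KB = 240 * YEAST_KB  # assumption based on the 240-fold ratio quoted earlier

print("Yeast, 40 kb cosmid inserts:", clones_needed(40, YEAST_KB))
print("Human, 40 kb cosmid inserts:", clones_needed(40, HUMAN_KB))
```

The yeast figure comes out a little under 1000 clones, while the human figure is in the hundreds of thousands; the exact values shift with the insert size and genome size that are assumed.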
The strategy used by the Human Genome Project has been to prepare the clone library in YAC (i.e., yeast artificial chromosome) vectors. These vectors have such high cloning capacities that each can carry 1000 kb or more of DNA, meaning that a complete library requires only 6500 clones.
Although this is still a large number, it is manageable. In fact, to ensure that the coverage of the genome is as complete as possible, the main library that has been used contains 33000 clones.
With YAC libraries of the human genome available, the main activity became placing the individual clones on the genetic map in order to identify those that carry overlapping pieces of DNA. Several procedures have been used, one of these being to identify which clones contain which microsatellite loci.
This can be done relatively easily by PCR with the primers originally used to type the alleles present at an individual locus. If a PCR specific for a single microsatellite is carried out with all the genomic clones in a library, then those clones that contain that microsatellite are revealed. This allows clones to be assigned to the appropriate positions on the genome.
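The bookkeeping behind this screening step can be sketched as follows. The example assumes the PCR results are stored as a table from each microsatellite marker to the clones that gave a product, and that each marker already has a position on the genetic map; the marker names, clone names, positions and function names are all hypothetical.

```python
from itertools import combinations


def place_clones(pcr_positive, marker_position):
    """Assign each clone the sorted map positions of the markers it contains."""
    placement = {}
    for marker, clones in pcr_positive.items():
        for clone in clones:
            placement.setdefault(clone, []).append(marker_position[marker])
    return {clone: sorted(positions) for clone, positions in placement.items()}


def overlapping_pairs(pcr_positive):
    """Clones that share a marker must carry overlapping pieces of DNA."""
    pairs = set()
    for clones in pcr_positive.values():
        pairs.update(combinations(sorted(clones), 2))
    return pairs


# Hypothetical screening results and marker positions (arbitrary map units).
pcr_positive = {
    "marker_A": {"YAC_1"},
    "marker_B": {"YAC_1", "YAC_2"},
    "marker_C": {"YAC_2", "YAC_3"},
}
marker_position = {"marker_A": 10.0, "marker_B": 12.5, "marker_C": 15.0}

print(place_clones(pcr_positive, marker_position))
print(sorted(overlapping_pairs(pcr_positive)))
```

In this toy data set YAC_1, YAC_2 and YAC_3 form an overlapping series, which is the kind of ordered set of clones the sequencing phase requires.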
The main drawback is the worry that, because of problems with the preparation of YAC libraries, the DNA present in some clones may have undergone rearrangements, and some clones may in fact contain pieces of DNA from different parts of the genome.
If this is the case then these libraries clearly cannot be used as a source of DNA for sequencing, and indeed errors may occur in positioning clones on the genetic map. It may, therefore, be necessary to obtain new libraries, for example, in a P1 vector (plasmid vector).
These cannot be used for such large pieces of DNA as YACs, so more clones will have to be examined, but it is hoped that even if the information provided by the YAC libraries is inaccurate, it will provide researchers with a shortcut to positioning the P1 clones on the genetic map.