In this article we will discuss the meaning and classes of DNA polymorphisms.

Meaning of DNA Polymorphisms:

Different alleles of a gene produce different phenotypes, which can be detected by making crosses between parents that carry different alleles of two or more genes. By counting the recombinants among the progeny, a genetic map can then be deduced.
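As a rough illustration of how recombinant counts lead to a map, the sketch below converts progeny counts into a recombination frequency and an approximate map distance. The function names and the progeny counts are hypothetical, and treating 1% recombination as 1 map unit (centimorgan) is the usual approximation, which holds only for closely linked loci.

```python
def recombination_frequency(parental: int, recombinant: int) -> float:
    """Return the fraction of scored progeny that are recombinant."""
    total = parental + recombinant
    if total == 0:
        raise ValueError("no progeny were scored")
    return recombinant / total


def map_distance_cm(parental: int, recombinant: int) -> float:
    """Approximate map distance in centimorgans (1% recombination ~ 1 cM)."""
    total = parental + recombinant
    if total == 0:
        raise ValueError("no progeny were scored")
    return 100 * recombinant / total


# Hypothetical cross: 830 parental-type and 170 recombinant progeny.
print(recombination_frequency(parental=830, recombinant=170))
print(map_distance_cm(parental=830, recombinant=170))
```

With these invented counts, 170 recombinants out of 1,000 progeny place the two loci roughly 17 map units apart.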

These are low-resolution genetic maps that contain genes with observable phenotypic effects, all mapped to their respective loci. The position of a specific gene, or locus, can be found from the map. However, measurements showed that the chromosomal intervals between the mapped genes contain vast amounts of DNA.

These intervals could not be mapped by the recombinant progeny method because there were no markers in those intervening regions. It became necessary to find additional differential markers or genetic differences that fall in the gaps. This need was met by exploitation of various polymorphic DNA markers.

A DNA polymorphism is a DNA sequence variation that need not be associated with any observable phenotypic variation, and it can exist anywhere in the genome, not necessarily in a gene. A polymorphic site has two or more alternative forms (alleles) of a chromosomal region, which differ either in nucleotide sequence or in the number of tandemly repeated nucleotide units.

Thus, it is a site of heterozygosity for any sequence variation. Many DNA polymorphisms are useful for genetic mapping studies, and hence they are referred to as DNA markers. DNA markers can be detected by Southern blot hybridisation or by PCR.

The alleles of DNA markers are co-dominant; that is, they are neither dominant nor recessive, unlike the alleles of most genes. DNA polymorphisms constitute molecularly defined differences between individual human beings.

Classes of DNA Polymorphisms:

There are six major classes of DNA polymorphisms.

They are:

1. Single Nucleotide Polymorphisms:

A SNP is a single base pair change (a point mutation), and the site is referred to as a SNP locus. SNPs are the most common type of DNA polymorphism, occurring with a frequency of one in 350 base pairs and accounting for more than 90 per cent of DNA sequence variation. The majority of SNPs are found in the non-coding regions of the genome and are known as non-coding SNPs. SNPs in the coding regions, that is within genes, are known as coding SNPs (cSNPs).

Detailed studies of cSNPs in humans indicate that each gene has about four cSNPs, half of which result in missense mutations in the encoded protein and half of which produce silent mutations. Whether a cSNP affects the phenotype depends on the amino acid that is changed by the polymorphism.

About one-half of the missense mutations that are SNPs are estimated to cause genetic disease in humans. A non-coding SNP can also affect gene function if it is located in the promoter or in another gene regulatory region.

An individual SNP locus can be analysed by using the technique of allele-specific oligonucleotide (ASO) hybridisation. The search for one particular SNP locus in humans is a challenge, because this is one base pair that is polymorphic out of the three billion base pairs in the human genome.

In the ASO technique, a short oligonucleotide that is complementary to one SNP allele is synthesised and mixed with the target DNA. Hybridisation is performed under high stringency conditions that allow only a perfect match between the probe and the target DNA. That is, the oligonucleotide will not hybridise with target DNA that carries any other SNP allele at that locus. A positive hybridisation result therefore shows precisely which allele is present at the SNP locus.
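The all-or-nothing behaviour of an ASO probe under high stringency can be sketched as an exact string match. The sequences below are invented for illustration, the probe is shorter than a typical laboratory oligonucleotide, and real hybridisation depends on temperature and salt conditions that this simple model ignores.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def aso_hybridises(probe: str, target_strand: str) -> bool:
    """Model high stringency: the probe binds only where it pairs perfectly."""
    return reverse_complement(probe) in target_strand


# Hypothetical target strands differing at a single base (A versus G).
allele_a = "GGATCCTTAGCATGCAAGT"
allele_g = "GGATCCTTGGCATGCAAGT"

# Probe synthesised to be complementary to the A allele.
probe_for_a = reverse_complement("CCTTAGCATGC")

print(aso_hybridises(probe_for_a, allele_a))  # perfect match
print(aso_hybridises(probe_for_a, allele_g))  # one mismatch, no signal
```

In this model a single mismatched base is enough to abolish the signal, which is the property that lets one probe distinguish two alleles.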

A more recent technique, the DNA microarray, can be used for simultaneous typing of hundreds or thousands of SNPs. Details of this technique as used for SNPs and for genome-wide gene expression are described later in this section.

A small number of SNPs change restriction sites, either by creating a restriction site or by eliminating one. Such SNPs can be detected by digesting the DNA with the restriction enzyme for that site and analysing the products by Southern blotting or PCR. The different patterns of restriction sites in different genomes yield fragments of different lengths, called restriction fragment length polymorphisms (RFLPs), which are described below.
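The effect of such a SNP on fragment length can be sketched with a toy digest. The example assumes the enzyme EcoRI (recognition site GAATTC, cut after the first G) and two made-up alleles that differ by one base inside the site; only one strand is modelled.

```python
def digest(seq: str, site: str = "GAATTC", cut_offset: int = 1) -> list[int]:
    """Cut seq at every occurrence of site and return the fragment lengths."""
    lengths = []
    start = 0
    pos = seq.find(site)
    while pos != -1:
        cut = pos + cut_offset
        lengths.append(cut - start)
        start = cut
        pos = seq.find(site, pos + 1)
    lengths.append(len(seq) - start)
    return lengths


# Hypothetical flanking sequences that contain no EcoRI site.
left_flank = "ATGCCGTA" * 5
right_flank = "CCGTTAGC" * 3

allele_with_site = left_flank + "GAATTC" + right_flank
allele_without_site = left_flank + "GAGTTC" + right_flank  # SNP: A -> G

print(digest(allele_with_site))     # [41, 29]
print(digest(allele_without_site))  # [70]
```

One allele gives two short fragments and the other a single longer fragment, and that length difference is what is scored as an RFLP.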

2. Restriction Fragment Length Polymorphisms:

RFLPs arise from restriction enzyme recognition sites that are present in some genomes and absent in others. Consider an organism heterozygous for an RFLP, whose genotype we represent as Rr. This organism is backcrossed with another that is homozygous for the RFLP variant allele (rr). Genomic DNA from the progeny of this cross (Rr x rr gives progeny of which 50% are Rr and 50% are rr) is digested with the restriction enzyme, and the fragments are separated by gel electrophoresis and transferred to a membrane by Southern blotting.

The restriction fragments obtained are hybridised with a probe (a cloned DNA fragment) that will distinguish the various genotypes for an RFLP. The probe is unique because it comes from only one segment of the genome, a segment that overlaps the restriction site. A key point of this technique, therefore, is the use of a cloned single-copy DNA probe that is specific for an individual marker locus.
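The backcross and the banding patterns seen on the blot can be sketched as follows. The assignment of R to the allele carrying the restriction site, and the fragment sizes in kilobases, are assumptions made purely for illustration.

```python
from collections import Counter
from itertools import product


def cross(parent1: str, parent2: str) -> dict[str, float]:
    """Return expected progeny genotype proportions for a one-locus cross."""
    offspring = Counter(
        "".join(sorted(a + b)) for a, b in product(parent1, parent2)
    )
    total = sum(offspring.values())
    return {genotype: n / total for genotype, n in offspring.items()}


# Assumed fragment sizes (kb) detected by a probe that overlaps the site:
# R carries the site (two fragments), r lacks it (one fragment).
BANDS_KB = {"R": (3.0, 2.0), "r": (5.0,)}


def bands(genotype: str) -> list[float]:
    """Return the bands expected on the blot, largest first."""
    sizes = {size for allele in genotype for size in BANDS_KB[allele]}
    return sorted(sizes, reverse=True)


print(cross("Rr", "rr"))  # {'Rr': 0.5, 'rr': 0.5}
print(bands("Rr"))        # [5.0, 3.0, 2.0]
print(bands("rr"))        # [5.0]
```

Because the heterozygote shows the bands of both alleles, the two progeny classes can be told apart directly on the blot, which is what co-dominance means in practice.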

Crosses of the RFLP-positive organism with other RFLP-bearing organisms yield parental and recombinant combinations. From the frequency of recombinants, a detailed RFLP map can be produced. RFLPs were the first DNA markers used for the characterisation of plant and animal genomes. They have now been replaced by markers based on variation in the number of short tandem repeats (STRs), described below.

3. Short Tandem Repeats:

STRs are also known as microsatellites and simple sequence repeats (SSRs). A tandem repeat is a sequence that is repeated end to end in the same orientation. STRs are 2 to 6 base pair DNA sequences tandemly repeated a few times.

For example, the sequence TCACATCACATCACATCACATCACA is a five-fold repeat of the sequence TCACA. There are dinucleotide, trinucleotide, tetranucleotide, pentanucleotide and hexanucleotide STRs in the human genome.
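Counting the copies of a repeat unit is easy to sketch in code. The helper below, whose name is hypothetical, reports the longest uninterrupted run of a given unit; it is applied to the TCACA example above and to a made-up CA repeat with short flanks.

```python
import re


def longest_tandem_run(seq: str, unit: str) -> int:
    """Return the copy number of the longest uninterrupted run of unit."""
    pattern = re.compile(f"(?:{re.escape(unit)})+")
    runs = [match.group(0) for match in pattern.finditer(seq)]
    if not runs:
        return 0
    return max(len(run) for run in runs) // len(unit)


print(longest_tandem_run("TCACATCACATCACATCACATCACA", "TCACA"))  # 5

# Hypothetical dinucleotide STR allele with flanking bases.
print(longest_tandem_run("GG" + "CA" * 12 + "TT", "CA"))  # 12
```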

Microsatellite analysis can be done by PCR, using a primer pair that anneals to single-copy DNA and is specific for each marker locus. In contrast with RFLPs, which have only one or two alleles in a population, STRs have a much larger number of alleles that can be detected in a population analysis.

Consequently, STRs have a higher proportion of heterozygotes, which makes them more suitable for mapping purposes. Polymorphisms in STRs are common in populations, which makes them valuable tools in genetic mapping.
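The link between allele number and heterozygosity can be made concrete with the standard expected-heterozygosity formula, H = 1 minus the sum of squared allele frequencies. This is a sketch that assumes random mating, and the allele frequencies used are invented.

```python
def expected_heterozygosity(allele_freqs: list[float]) -> float:
    """Return H = 1 - sum(p_i^2) for allele frequencies that sum to 1."""
    if abs(sum(allele_freqs) - 1.0) > 1e-9:
        raise ValueError("allele frequencies must sum to 1")
    return 1.0 - sum(p * p for p in allele_freqs)


# Hypothetical two-allele RFLP versus a ten-allele STR, equal frequencies.
rflp = expected_heterozygosity([0.5, 0.5])
str_marker = expected_heterozygosity([0.1] * 10)

print(round(rflp, 3))
print(round(str_marker, 3))
```

Under these assumptions about half of individuals are heterozygous at the two-allele marker, against about nine in ten at the ten-allele marker, so far more crosses are informative with the STR.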

4. Variable Number Tandem Repeats:

In VNTRs, also called minisatellite markers, the repeat unit is a little larger than in STRs, from seven to a few tens of base pairs long. The VNTR loci in humans are 1 to 5 kilobase sequences containing repeat units about 15 to 100 nucleotides long. VNTR loci also show polymorphisms. Because the greater length of VNTR arrays makes PCR unsuitable, analysis of VNTRs relies on restriction digestion and Southern blotting.

The entire genomic DNA is cut with a restriction enzyme that cuts on either side of the VNTR locus but has no target site within the VNTR array, and the digest is analysed by Southern blotting. A VNTR-specific probe directed against a particular repeat sequence will bind at all locations of that repeat sequence in the genome, resulting in a large number of fragments of different sizes.

The number of tandem repeats varies from one individual to another; the Southern blot therefore provides a distinguishing pattern of fragments for each individual. These patterns are also referred to as DNA fingerprints. The technique finds useful application in the identification of individuals and in deciding parentage.
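How repeat number translates into band size can be sketched for a single VNTR locus. The repeat unit length, the flanking DNA between the restriction sites and the array, and the genotypes are all hypothetical values; a real fingerprint combines the bands from many such loci.

```python
def vntr_fragment_bp(repeats: int, unit_bp: int, flank_bp: int) -> int:
    """Fragment length: flanking DNA plus the tandem array itself."""
    return flank_bp + repeats * unit_bp


def locus_bands(genotype: tuple[int, int], unit_bp: int = 30,
                flank_bp: int = 400) -> list[int]:
    """Return the band sizes (bp) for one locus, largest first."""
    sizes = {vntr_fragment_bp(n, unit_bp, flank_bp) for n in genotype}
    return sorted(sizes, reverse=True)


# Hypothetical individuals, given as repeat numbers on the two homologues.
print(locus_bands((40, 65)))  # [2350, 1600]
print(locus_bands((52, 52)))  # [1960]
```

A heterozygote gives two bands and a homozygote one, and because the enzyme cuts outside the array, each band size reflects only the number of repeats.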

5. Microsatellite Markers:

Variable numbers of di-nucleotides repeated in tandem, called microsatellite markers, are dispersed throughout the genome. The most common type is the CA repeat, with GT on the complementary strand. Primers are designed against the DNA regions surrounding individual microsatellite repeats so that each repeat can be detected by PCR.

The procedure is explained by taking the example of human DNA as follows. Human genomic DNA is subjected to restriction digestion by an enzyme such as AluI, which results in fragments about 400 base pairs in length. The fragments are cloned into a vector and Southern blotting is carried out.

To identify genomic inserts that contain CA/GT di-nucleotides, probes specific for these di-nucleotides are used. The sequence of each positive clone is determined, and on this basis PCR primers are designed that will hybridise with the single-copy DNA sequences flanking the tandemly repeated microsatellite sequence. PCR amplification is then carried out using these primer pairs and genomic DNA.

Thus, if any size variation exists in the stretch of tandemly repeated microsatellite sequence, it will be detected by gel electrophoresis of the PCR products from different individuals. The size may differ among individuals, and all of these variations can be determined. Each size variant gives an amplification product of a different length and represents a marker allele.
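The relationship between repeat number and PCR product size can be sketched with a simple in-silico amplification. The flanking sequences, primers and repeat numbers below are invented, and primer binding is modelled as an exact match on a single strand.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def amplicon_length(template: str, forward: str, reverse: str):
    """Return the PCR product length, or None if a primer does not bind."""
    start = template.find(forward)
    if start == -1:
        return None
    reverse_site = reverse_complement(reverse)
    end = template.find(reverse_site, start + len(forward))
    if end == -1:
        return None
    return end + len(reverse_site) - start


# Hypothetical single-copy flanks on either side of a CA repeat.
LEFT = "AGCTTGGATCCGTTAGGCTA"
RIGHT = "TGGACCTAGGTTCGAAGCTT"
forward_primer = LEFT
reverse_primer = reverse_complement(RIGHT)


def allele(ca_repeats: int) -> str:
    """Build a made-up genomic segment carrying the given number of CA units."""
    return "TTT" + LEFT + "CA" * ca_repeats + RIGHT + "GGG"


print(amplicon_length(allele(15), forward_primer, reverse_primer))  # 70
print(amplicon_length(allele(18), forward_primer, reverse_primer))  # 76
```

Three extra CA units lengthen the product by six base pairs, a difference that appears as a separate band on a high-resolution gel.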

6. Randomly Amplified Polymorphic DNA:

RAPDs are based on random PCR amplification. The procedure uses PCR primers of arbitrary sequence, which amplify several different regions of the genome by chance. Such a primer amplifies only those DNA regions that are flanked, within an amplifiable distance, by inverted copies of the primer's own sequence.

The PCR products consist of DNA bands representing different sizes of the amplified DNA. The set of amplified DNA fragments is called randomly amplified polymorphic DNA (RAPD). Certain bands may be unique for an individual and can serve as DNA markers in mapping analysis.
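The requirement for inverted primer copies can be sketched as a search for the primer followed, within an amplifiable distance, by its reverse complement. The 10-base primer, the templates and the maximum product size are all assumptions chosen for illustration, and binding is again modelled as an exact match.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def find_all(seq: str, sub: str) -> list[int]:
    """Return the start positions of every occurrence of sub in seq."""
    hits = []
    pos = seq.find(sub)
    while pos != -1:
        hits.append(pos)
        pos = seq.find(sub, pos + 1)
    return hits


def rapd_products(template: str, primer: str, max_len: int = 2000) -> list[int]:
    """Sizes of products bounded by the primer and its inverted copy."""
    inverted = reverse_complement(primer)
    sizes = []
    for fwd in find_all(template, primer):
        for rev in find_all(template, inverted):
            size = rev + len(inverted) - fwd
            if rev >= fwd + len(primer) and size <= max_len:
                sizes.append(size)
    return sorted(sizes)


primer = "GGTCACGTCA"  # hypothetical arbitrary 10-mer
spacer = "AC" * 50

# Individual A has both primer sites; individual B has a mutated first site.
individual_a = "AT" * 10 + primer + spacer + reverse_complement(primer) + "AT" * 10
individual_b = "AT" * 10 + "GGTCACGTTA" + spacer + reverse_complement(primer) + "AT" * 10

print(rapd_products(individual_a, primer))  # [120]
print(rapd_products(individual_b, primer))  # []
```

A single base change in one primer site removes the band, so the marker is scored here as the presence or absence of the 120 base pair product.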