In this article we will discuss the repeated sequences of chromosomal DNA.
Eukaryotic genomes contain large amounts of repetitive sequences, sometimes present in hundreds or thousands of copies per genome. Our understanding of repetitive sequences is based on studies of the denaturation (separation of the DNA double helix into its two component strands) and renaturation (re-association of the single strands into stable double-stranded DNA molecules) of DNA.
The two strands of a DNA molecule are held together by weak non-covalent bonds. When DNA is warmed in saline solution, a temperature is reached at which the two strands begin to separate, leading to single-stranded molecules in solution. This is called thermal denaturation or DNA melting.
The progress of thermal denaturation can be followed by observing the increase in absorbance of the dissolved DNA. The nitrogenous bases of DNA absorb ultraviolet radiation with an absorbance maximum near 260 nm. In single-stranded DNA, the base-stacking interactions that hold the bases together in the double helix are reduced, which increases the ability of the bases to absorb ultraviolet radiation.
The temperature at which the shift in absorbance is half completed is called the melting temperature (Tm) of DNA. The higher the GC content of the DNA, the higher the Tm. This is because G and C are joined by three hydrogen bonds, which make GC pairs more stable than AT pairs, which are joined by two hydrogen bonds. Thus AT-rich sections of DNA melt before GC-rich sections.
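The two quantities described above, GC content and the midpoint of the absorbance shift, can be sketched in a few lines of Python. This is a minimal illustration and not a laboratory protocol: the function names and the sample readings are hypothetical, and the Tm estimate assumes a simple melting curve in which absorbance at 260 nm rises steadily from a double-stranded baseline to a single-stranded plateau.

```python
def gc_content(seq):
    """Return the fraction (0.0 to 1.0) of bases in seq that are G or C."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return (seq.count("G") + seq.count("C")) / len(seq)


def melting_temperature(temps, absorbances):
    """Estimate Tm as the temperature at which the absorbance rise is half completed.

    Assumes temps are in increasing order, that the first reading is the
    double-stranded baseline and the last reading is the single-stranded plateau.
    Linear interpolation is used between neighbouring readings.
    """
    low, high = absorbances[0], absorbances[-1]
    half = low + (high - low) / 2.0
    points = list(zip(temps, absorbances))
    for (t0, a0), (t1, a1) in zip(points, points[1:]):
        if a0 <= half <= a1:
            if a1 == a0:
                return t0
            return t0 + (half - a0) * (t1 - t0) / (a1 - a0)
    raise ValueError("absorbance never crosses the half-way point")


# Hypothetical readings (temperature in degrees C, absorbance at 260 nm).
temps = [70, 80, 90, 100]
a260 = [1.0, 1.0, 1.4, 1.4]
print(melting_temperature(temps, a260))
print(gc_content("ATGC"))
```

Comparing `gc_content` for two DNA samples with their measured `melting_temperature` values is one way to see the relationship stated above: the sample richer in GC pairs should give the higher Tm.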
When denatured DNA is cooled slowly, the single strands reassociate to form double-stranded molecules, and the properties of double-helical DNA are restored; for example, the DNA again absorbs less ultraviolet light. This is called renaturation or reannealing. As described later, the property of reannealing has led to the development of a methodology called nucleic acid hybridisation.
Britten and Kohne (1967) studied renaturation kinetics of DNA and discovered repeated sequences.
Walker (1969) distinguished 3 kinetic classes of DNA:
Fast reannealing fraction or highly repetitious DNA,
Intermediate reannealing fraction or moderately repetitious DNA, and
Slow reannealing fraction or unique (single-copy) DNA.
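Why copy number separates these three fractions can be illustrated with the ideal second-order reassociation equation used in renaturation studies, C/C0 = 1 / (1 + k·C0·t), where C0 is the starting concentration of single strands and t is time. The sketch below is only an illustration: the function names are hypothetical and the rate constants are invented values, chosen to show that a sequence present in many copies (large k) reassociates at a much lower C0·t than a unique sequence.

```python
def fraction_single_stranded(c0t, k):
    """Fraction of DNA still single-stranded at a given C0*t value.

    Ideal second-order reassociation: C/C0 = 1 / (1 + k * C0 * t).
    """
    return 1.0 / (1.0 + k * c0t)


def half_c0t(k):
    """C0*t value at which half of the DNA has reassociated (equals 1/k)."""
    return 1.0 / k


# Hypothetical rate constants, for illustration only (not measured values).
RATE_CONSTANTS = {
    "highly repetitious": 1e3,
    "moderately repetitious": 1.0,
    "unique": 1e-3,
}

for name, k in RATE_CONSTANTS.items():
    print(name, "half-reassociation C0t:", half_c0t(k))
```

With these assumed constants the fast fraction is half reassociated at a C0·t value a million times lower than the unique fraction, which is the kind of separation that lets the classes be distinguished on a single renaturation curve.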
Kinetic Classes of DNA:
1. Highly Repeated DNA Sequences:
These are also called reiterated or redundant DNA. They consist of sequences present in at least a million copies per genome and constitute about 10% of the total DNA in vertebrates. Such sequences are usually short, up to a few hundred nucleotides long, and are present in clusters in which the given sequence is repeated over and over again without interruption in tandem arrays (end to end). Highly repeated sequences include the satellite DNAs, minisatellite DNAs and microsatellite DNAs.
Satellite DNA:
Satellite DNA consists of short sequences about 5 to 100 bp in length. During density gradient centrifugation, satellite DNA separates into a distinct band, because its base composition differs from that of bulk DNA. A species may have more than one satellite sequence, as in Drosophila virilis, which has 3 satellite sequences, each 7 nucleotides long.
Satellite DNA is present around centromeres in centromeric heterochromatin. In humans, 3 blocks of satellite DNA are present in the secondary constrictions of chromosomes 1, 9 and 16. A fourth block is present at the distal portion of the long arm of the Y chromosome.
Minisatellite DNA:
These usually occur in clusters of about 3000 repeats, with repeat units ranging from 12 to 100 bp in length. Minisatellite sequences occupy shorter stretches of the genome than satellite sequences. Minisatellites are often unstable, and the number of copies can increase or decrease from one generation to the next. The length of a minisatellite locus can vary within the same family and within the population (polymorphism). Changes in minisatellite sequences can affect the expression of nearby genes.
Microsatellite DNA:
These are the shortest repeated sequences, one to five base pairs long, present in clusters of about 50 to 100 base pairs in length. They are dispersed evenly throughout the DNA. The human genome contains about 30,000 different microsatellite loci. Changes in the number of copies of certain microsatellite sequences are responsible for some inherited diseases.
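A microsatellite as described above, a unit of one to five base pairs repeated end to end, can be located in a sequence with a short regular expression. The following is a minimal sketch rather than an established tool: the function name, the `min_copies` threshold and the test sequence are all hypothetical choices made for illustration.

```python
import re


def find_microsatellites(seq, min_copies=5):
    """Return (start, unit, copies) for tandem repeats of 1 to 5 bp units.

    The lazy quantifier makes the regex try the shortest unit first, so a
    run such as CACACACACA is reported as unit "CA" rather than "CACA".
    """
    pattern = re.compile(r"([ACGT]{1,5}?)\1{%d,}" % (min_copies - 1))
    hits = []
    for match in pattern.finditer(seq.upper()):
        unit = match.group(1)
        copies = len(match.group(0)) // len(unit)
        hits.append((match.start(), unit, copies))
    return hits


# Hypothetical sequence: eight copies of CA flanked by non-repetitive DNA.
example = "GGATT" + "CA" * 8 + "TTGAC"
print(find_microsatellites(example))
```

Counting the copies at the same locus in different individuals is the kind of comparison that reveals the copy-number changes mentioned above.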
2. Moderately Repeated DNA Sequences:
These are partially redundant: the sequences are highly similar but may not be identical. This fraction includes sequences that are repeated within the genome from a few times to tens of thousands of times. The genes for ribosomal and transfer RNAs and for histones are of this type. Moderately repeated sequences constitute 15% of the DNA in mouse, 45% in Xenopus, and 80% in wheat, onion and salmon.
3. Unique or Single-Copy Sequences:
These sequences are present only once in the genome or, at most, in a few copies. They have a slow rate of re-association. Most structural genes are found among the unique sequences. The mouse genome contains 70% and the Xenopus genome about 55% single-copy sequences.
Dispersed Repeated Sequences:
Unlike the repeated DNA described above, in which the repeats are clustered in tandem, some repeated sequences are scattered throughout the genome. These are referred to as dispersed or interspersed DNA. Dispersed repeated sequences have been studied in many organisms.
These are families of repeated sequences interspersed throughout the genome with unique-sequence DNA. Often, a small number of families have very high copy numbers and make up most of the dispersed repeated DNA in the genome. In general, two interspersion patterns are encountered, which allow these sequences to be classified as SINEs (short interspersed elements) or LINEs (long interspersed elements).
Families of SINEs have sequences about 100 to 400 bp long, whereas LINEs are about 1000 to 7000 bp long. All eukaryotic organisms have LINEs and SINEs, although their relative proportions vary widely. Drosophila and birds have mostly LINEs, whereas humans and frogs have mostly SINEs. LINEs and SINEs represent a significant proportion of all the moderately repetitive DNA in the genome.
Mammalian diploid genomes have about 500,000 copies of the LINE-1 (L1) family of repeated sequences, representing about 15% of the genome. Other LINE families are much less abundant than LINE-1. Full-length LINE-1 family members are 6 to 7 kilobases long. The full-length LINE-1 elements are transposons, that is, they encode enzymes for the movement of these elements in the genome.
A good example of SINEs is the Alu family of sequences in mammalian genomes, so called because each contains a single site for the restriction endonuclease Alu I. Alu sequences are about 300 base pairs long, and about a million such sequences are dispersed throughout the genome, accounting for nearly 10% of the total cellular DNA.
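The figure of nearly 10% can be checked with simple arithmetic from the copy number and length given above. The sketch below assumes a haploid human genome of roughly 3 billion base pairs; that genome size is not stated in this article and is supplied here only as an assumption, and the function name is hypothetical.

```python
ALU_LENGTH_BP = 300             # approximate Alu length, from the text
ALU_COPIES = 1_000_000          # approximate copy number, from the text
GENOME_SIZE_BP = 3_000_000_000  # assumed haploid human genome size


def genome_fraction(unit_length_bp, copies, genome_size_bp):
    """Fraction of the genome occupied by a repeat family."""
    return unit_length_bp * copies / genome_size_bp


alu_fraction = genome_fraction(ALU_LENGTH_BP, ALU_COPIES, GENOME_SIZE_BP)
print(f"Alu sequences: about {alu_fraction:.0%} of the genome")
```

Under these assumptions a million copies of a 300 bp element occupy about 300 million base pairs, which is consistent with the proportion quoted above.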
Alu sequences are transcribed into RNA, but they do not encode proteins, and their function is not known. Significantly, like the LINE-1 sequences, Alu sequences are transposable elements, capable of moving to different sites in genomic DNA if the enzymes required for movement are supplied by active LINE elements.
In Situ Localisation of Satellite DNA:
The precise locations of repeated DNA sequences on eukaryotic chromosomes have been determined by the technique of in situ hybridisation, first developed by Pardue and Gall (1970). The method is based on the fact that single strands form DNA/DNA or DNA/RNA hybrids only if they have complementary base sequences.
Cytological preparations of chromosome spreads are treated with NaOH, which denatures the DNA. The preparations are then incubated in a solution containing single-stranded nucleic acid molecules (either DNA or transcribed RNA) labelled with tritium. The regions of the chromosomes that contain complementary base sequences hybridise with the corresponding sequences in the single-stranded molecules. Their locations are determined by autoradiography.
Using labelled mouse satellite DNA, Pardue and Gall (1970) determined that the satellite sequences are located in the constitutive heterochromatin adjacent to the centromeres of mitotic chromosomes (Fig. 19.2c). Except for the Y, all mouse chromosomes have satellite DNA at the centromeres. Later studies on many other materials have also shown satellite DNA in constitutive heterochromatin, the chromatin that forms C-bands with Giemsa.
Sometimes there may be more than one type of satellite DNA in a genome. Human chromosomes have 4 satellite sub-fractions, present in chromosomes 1, 9, 16 and Y. All 4 satellite sub-fractions hybridise with chromosome 9. It is also interesting from the evolutionary standpoint that all the human satellite sub-fractions hybridise with monkey and chimpanzee DNA.