In this article we will discuss DNA (deoxyribonucleic acid). After reading this article you will learn about: 1. Molecular Structure of DNA 2. Alternative Forms of DNA 3. Repetitive DNA 4. C-Value of DNA 5. Sequencing of DNA.
Contents:
- Molecular Structure of DNA
- Alternative Forms of DNA
- Repetitive DNA
- C-Value of DNA
- Sequencing of DNA
1. Molecular Structure of DNA:
Watson-Crick Model of DNA:
In 1953, J.D. Watson and F.H.C. Crick deduced the three-dimensional model of DNA structure, which immediately helped them to infer the mechanism of DNA replication. This brilliant contribution soon appeared to be the starting point for the revolution in biological thinking as it threw light on the gene function in molecular terms.
Watson and Crick based their model on X-ray diffraction measurements obtained by Wilkins on the lithium salt of DNA, which showed a repeat distance of 34Å and strong maximal reflections at 3.4Å, together with the chemical data obtained by E. Chargaff (1950). For these contributions, Watson, Crick, and Wilkins received the Nobel Prize in Physiology or Medicine in 1962.
The important features of their model of DNA are:
(i) Two spiral chains are twisted around each other in a regular fashion and coiled around a common axis to form a double helix. The chains are antiparallel: each chain has a direction or polarity, and the two chains run in opposite directions.
(ii) The sugar-phosphate backbones remain on the outside, whereas the purine and pyrimidine bases are on the inside of the helix. The base planes are perpendicular to the helix axis, while the planes of the sugars are almost at right angles to those of the bases.
(iii) The diameter of the DNA double helix is 20Å or 2.0 nm. The bases are stacked at a center-to-center distance of 3.4Å or 0.34 nm from each other and related by a rotation of 36°. The double helix therefore makes a complete turn of 360° after every ten bases.
The length of a complete turn of the helix is 34Å or 3.4 nm. Hence, the helical structure repeats after 10 residues, i.e., there are 10 base pairs per turn. The helix has one shallow groove, the minor groove (approximately 12Å across), and one deep groove, the major groove (approximately 22Å across).
(iv) The two chains are held together by hydrogen bonds between pairs of bases. Adenine is always paired with thymine and guanine with cytosine.
(v) The base sequence along a polynucleotide chain is variable and a specific sequence of bases carries the genetic information.
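The figures quoted in feature (iii) are mutually consistent, and a few lines of arithmetic can confirm this. The following is a minimal sketch in Python; the constant and function names are illustrative and do not come from the original model description.

```python
# B-DNA parameters quoted in the text.
RISE_PER_BP_ANGSTROM = 3.4   # centre-to-centre distance between stacked bases
ROTATION_PER_BP_DEG = 36.0   # rotation relating adjacent base pairs


def base_pairs_per_turn(rotation_deg=ROTATION_PER_BP_DEG):
    """Number of base pairs needed to complete one 360-degree turn."""
    return 360.0 / rotation_deg


def pitch_angstrom(rise=RISE_PER_BP_ANGSTROM, rotation_deg=ROTATION_PER_BP_DEG):
    """Length of one complete helical turn along the axis."""
    return rise * base_pairs_per_turn(rotation_deg)


def helical_turns(n_bp, rotation_deg=ROTATION_PER_BP_DEG):
    """Number of turns made by a duplex of n_bp base pairs."""
    return n_bp / base_pairs_per_turn(rotation_deg)


if __name__ == "__main__":
    print("base pairs per turn:", base_pairs_per_turn())
    print("pitch (angstrom):", pitch_angstrom())
    print("turns in 1000 bp:", helical_turns(1000))
```

A rotation of 36° per base pair gives 10 base pairs per turn, and 10 steps of 3.4Å give the 34Å repeat seen in the diffraction data.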
Watson and Crick deduced the structure of DNA in its B lattice configuration which is stable at 92% relative humidity and in solutions of low ionic strength. It consists of two right-handed helical polynucleotide chains of opposite polarity with plectonemic coiling around the same axis to form a double helix.
The important aspect of the model is the specificity of the base pairing. The pairs are arranged in such a fashion that a pyrimidine of one chain always pairs with a purine of the opposite chain, and vice versa. Due to the steric restriction and hydrogen bonding of the bases, a regular helix of the sugar-phosphate backbone of each polynucleotide chain is formed. The space in between the two helices is always approximately 11Å.
Only certain base pairs can be accommodated in this space. These allowed base pairs are adenine with thymine and guanine with cytosine. The space is insufficient for two purines and more than enough for two pyrimidines. Pairing is also restricted by hydrogen bonding factor.
The hydrogen atoms in the purine and pyrimidine bases have well defined positions. Adenine cannot pair with cytosine because there would be two hydrogens near one of the bonding positions and none at the other.
Likewise, guanine cannot pair with thymine. Adenine forms two hydrogen bonds with thymine, whereas guanine forms three with cytosine. So, specific purine-pyrimidine pairing is a must because of steric as well as hydrogen bonding factors.
This base pairing rule was strongly supported by the earlier studies of E. Chargaff and his collaborators in the period 1950-1953 regarding base composition of DNA isolated from all types of organisms. All ‘normal’ DNAs exhibit certain chemical regularities that are known as Chargaff’s rules according to which A = T and G = C; as a corollary Σ purines (A + G) = Σ pyrimidines (C + T); also (A + C) = (G + T).
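Chargaff's rules follow directly from the pairing rule, which a short script can illustrate. This is a sketch only: the example strand and the helper names `complementary_strand` and `duplex_composition` are made up for illustration.

```python
from collections import Counter

# Watson-Crick pairing: A with T, G with C.
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}


def complementary_strand(strand):
    """Return the antiparallel partner strand, written 5'->3'."""
    return "".join(PAIR[base] for base in reversed(strand))


def duplex_composition(strand):
    """Count each base over both strands of the duplex."""
    return Counter(strand) + Counter(complementary_strand(strand))


if __name__ == "__main__":
    counts = duplex_composition("ATGCATCGGGTA")  # arbitrary example strand
    print(dict(counts))
    print("A == T:", counts["A"] == counts["T"])
    print("G == C:", counts["G"] == counts["C"])
    print("purines == pyrimidines:",
          counts["A"] + counts["G"] == counts["C"] + counts["T"])
```

Whatever single strand is supplied, the duplex built from it satisfies A = T and G = C, and therefore also (A + G) = (C + T) and (A + C) = (G + T).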
In contrast to the base equivalence, there are wide variations in the (A + T)/(G + C) ratio of different DNA species. Higher plants and animals have an excess of A + T over G + C in their DNA, whereas among the viruses, bacteria and lower plants there is much more variation, and both A + T rich and G + C rich species occur. DNA molecules with a high G + C content are more resistant to thermal melting than A + T rich molecules.
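The link between G + C content and thermal stability can be illustrated numerically. The sketch below computes the G + C fraction of a sequence and applies the Wallace rule of thumb (2°C per A or T, 4°C per G or C). The Wallace rule is not part of the discussion above; it is only a rough estimate for short oligonucleotides and is used here purely to show the direction of the effect.

```python
def gc_fraction(seq):
    """Fraction of bases in seq that are G or C."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def wallace_tm_celsius(seq):
    """Rule-of-thumb melting temperature for a short oligonucleotide.

    Assumption: the Wallace rule, Tm = 2*(A+T) + 4*(G+C) in degrees C.
    It is a crude approximation and not valid for long DNA.
    """
    seq = seq.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


if __name__ == "__main__":
    for oligo in ("ATATATATAT", "ATGCATGCAT", "GCGCGCGCGC"):
        print(oligo, gc_fraction(oligo), wallace_tm_celsius(oligo))
```

For oligonucleotides of equal length, the estimate rises with the G + C fraction, which reflects the three hydrogen bonds of a G-C pair against the two of an A-T pair.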
The Watson-Crick model of the DNA double helix is right-handed, i.e., the turns run clockwise looking along the helical axis. This model represents the ‘B form’ of DNA.
2. Alternative Forms of DNA:
Recently it has been found that DNA may exist in other forms depending on the number of nucleotides per turn and the distance between adjacent repeating units. This is mainly achieved by changes in the rotation of groups when appropriate changes are made in the conditions. General characteristics of the different forms of DNA are summarized in the following table.
The A form is found in fibres at 75% relative humidity and requires the presence of Na+, K+ or Ca2+ as the counter ions. The bases do not lie flat, but are tilted 20° away from the perpendicular to the helical axis. The A form is probably very close to the conformation of double-stranded regions of RNA and of hybrid duplexes with one strand of DNA and one strand of RNA.
Watson and Crick constructed the model of DNA in its B lattice configuration found in fibres of very high (92%) relative humidity and in solutions of low ionic strength. This B form of DNA having major and minor grooves is thought to prevail in the living cell.
The C form occurs when DNA fibres are maintained at 66% relative humidity in the presence of lithium ions. It does not occur in vivo. These three forms (A, B and C) can be adopted by any DNA, irrespective of sequence.
The D and E forms have the fewest base pairs per turn (only 8 & 7.5 respectively) and are lacking in guanine.
Z DNA provides the most striking contrast with the classical structural families. It is a left-handed double helical DNA containing about 12 residues per turn. The structure was proposed by Alexander Rich and his coworkers. The path of the sugar-phosphate backbone is zigzag in nature. For that reason it is known as Z DNA.
This structure is found in fibres having alternating purine-pyrimidine sequences. It exists only at very high salt concentration.
A number of other sequence-dependent structural variations have been detected that may serve locally important functions in DNA metabolism. For example, some sequences (four or more adenine residues) cause bends in the DNA helix. This bending is important in the binding of some proteins to DNA. A common type of sequence found in DNA is a palindrome.
The term is applied to regions of DNA in which there are inverted repetitions of base sequence with two fold symmetry occurring over two strands of DNA. Such sequences are self-complementary within each of the strands and therefore have the potential to form hairpin or cruciform structures.
When the inverted sequence occurs within each individual strand of the DNA, the sequence is called a mirror repeat. Mirror repeats cannot form hairpin or cruciform structures.
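The distinction between a palindrome and a mirror repeat is easy to check on one strand written 5′ to 3′. The following sketch assumes that a palindrome reads the same as its reverse complement, while a mirror repeat reads the same backwards on the same strand. The function names and example sequences are illustrative.

```python
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq):
    """Sequence of the opposite strand, written 5'->3'."""
    return "".join(PAIR[base] for base in reversed(seq))


def is_palindrome(seq):
    """True if the two strands read identically 5'->3' (twofold symmetry)."""
    return seq == reverse_complement(seq)


def is_mirror_repeat(seq):
    """True if the sequence is symmetric within a single strand."""
    return seq == seq[::-1]


if __name__ == "__main__":
    for s in ("GAATTC", "TTAGCACCACGATT"):
        print(s, "palindrome:", is_palindrome(s),
              "mirror repeat:", is_mirror_repeat(s))
```

Only the palindrome is self-complementary within each strand, which is why it can fold back into a hairpin or cruciform whereas the mirror repeat cannot.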
A particularly unusual DNA structure, known as H-DNA, is found in polypyrimidine/polypurine tracts that also incorporate a mirror repeat within the sequence. A novel feature of H-DNA is the pairing and inter-winding of three strands of DNA to form a triple helix.
These structural variations serve as the important sites of initiation or regulation in DNA metabolism (replication, recombination and transcription).
3. Repetitive DNA:
Stretches of DNA up to several thousand bases long that occur in multiple copies in an organism’s genome are called repetitive DNA. The repeated sequences often remain tandemly oriented. Repetitive DNA often represents a major component (20-50%) of the eukaryotic genome. The first evidence for repetitive DNA came from density gradient analysis of eukaryotic DNA.
When eukaryotic DNA is isolated, fragmented and centrifuged to equilibrium in a CsCl density gradient, it usually reveals the presence of one large band of DNA (usually called main band DNA), and one to several small bands, called satellite bands. The DNA in these small bands is often referred to as satellite DNA (Fig. 3.50).
Analysis of isolated satellite DNA shows that it contains repeated sequences of several base pairs. The chromosomal locations of several satellite DNAs have been determined by in situ hybridization. This usually involves annealing single strands of isolated radioactive satellite DNA directly to denatured DNA in chromosome squash preparations.
After the non-hybridized radioactive material has been washed out, the locations of satellite DNA sequences in the chromosomes are determined by autoradiography.
Renaturation kinetics studies show that the eukaryotic genome characteristically contains (1) a fraction (up to 90%) of unique or single copy DNA, (2) a fraction of middle repetitive DNA sequences present in 10 to 10⁵ copies, and (3) a fraction of highly repetitive DNA sequences present in more than 10⁵ copies.
The middle repetitive class appears to be quite heterogeneous in certain eukaryotes. The highly repetitive DNA contains both satellite and non-satellite DNA sequences. Much of the genome consists of middle repetitive sequences interspersed with large blocks of single copy or unique DNA.
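The three kinetic classes can be expressed as a simple rule on copy number. The sketch below uses the thresholds given above (10 and 10⁵ copies). How to label sequences present in 2 to 9 copies is not stated in the text, so grouping them with the unique class is an assumption made here for illustration.

```python
def classify_by_copy_number(copies):
    """Assign a sequence to a renaturation-kinetics class by copy number."""
    if copies < 1:
        raise ValueError("copy number must be at least 1")
    if copies > 10**5:
        return "highly repetitive"
    if copies >= 10:
        return "middle repetitive"
    # Assumption: 1 to 9 copies are treated together as unique or low copy.
    return "unique or low copy"


if __name__ == "__main__":
    for n in (1, 250, 2_000_000):
        print(n, "->", classify_by_copy_number(n))
```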
4. C-Value of DNA:
C-value is the total amount of DNA in a haploid genome. A fixed c-value is characteristic of each living species. Among living organisms there is enormous variation in c-value, from less than 10⁶ bp for a mycoplasma to as much as 10¹¹ bp for some plants and amphibians. The c-value also differs among evolutionary groupings.
In general, c-value increases with the complexity of the organism. This rough correlation between the c-value of an organism and the complexity of its morphology and metabolism has numerous exceptions, known as the c-value paradox.
The paradox is defined as the set of exceptions to the rule that the quantity of an organism’s genetic material (c-value) correlates with the complexity of its morphology and metabolism. It is the discrepancy between genome size and genetic complexity.
For example, the genomes of lungfishes are 10 to 15 times larger than those of mammals, although mammals are considered more complex. Likewise, some algae have genomes 10 times larger than those of angiosperms. In an organism whose genome is large but whose complexity is low, much of the extra DNA remains unexpressed, and its function is largely a matter of conjecture.
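The apparent number of genes, like the overall quantity of DNA, roughly parallels an organism’s complexity. Thus humans have far more genes than E. coli, which has 4288: early estimates put the human figure at about 75000, although later genome annotation gives roughly 20000 protein-coding genes.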
The c-value paradox expresses the existence of two features: (1) there is an excess of DNA compared to the amount that codes for proteins, and (2) there are large variations in c-values between certain species whose apparent complexity does not vary much.
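To give a feel for the range of c-values quoted in this section, the sketch below converts a genome size in base pairs into the contour length of the DNA. It assumes that the whole genome is B-form DNA with the 0.34 nm spacing per base pair given in the Watson-Crick model; the function name is illustrative.

```python
RISE_PER_BP_NM = 0.34  # B-form spacing between adjacent base pairs


def contour_length_m(c_value_bp):
    """Approximate end-to-end length, in metres, of a haploid genome."""
    return c_value_bp * RISE_PER_BP_NM * 1e-9


if __name__ == "__main__":
    for label, bp in (("smallest quoted c-value", 1e6),
                      ("largest quoted c-value", 1e11)):
        print(label, bp, "bp ->", contour_length_m(bp), "m")
```

On this assumption the smallest genomes mentioned correspond to a fraction of a millimetre of DNA and the largest to tens of metres, a range of about 10⁵-fold.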
5. Sequencing of DNA:
Nowadays, DNA sequencing is one of the central techniques of molecular biology. Codon usage, mutations, gene regulatory sequences and similar features can only be elucidated by analyzing DNA sequences.
The sequencing procedure consists of the following steps:
(i) Cleavage of the whole DNA strand into specific small fragments.
(ii) Determination of the sequence of nucleotides in each fragment.
(iii) Determination of the order of the fragments in the original DNA polymer by repeating the preceding steps with a different cleavage procedure, which yields a second set of fragments that overlap the cleavage points of the first step.
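Step (iii) can be illustrated with a toy reconstruction. The sketch below merges sequenced fragments from two different digests by repeatedly joining the pair with the longest suffix-to-prefix overlap. This greedy approach is a simplification chosen for illustration, and the example sequence, fragment sets and function names are all hypothetical. Real fragments may share spurious overlaps that a greedy merge would resolve wrongly.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of a that equals a prefix of b."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(fragments, min_len=3):
    """Merge fragments by largest overlap until no overlaps remain."""
    frags = list(dict.fromkeys(fragments))  # drop exact duplicates, keep order
    while len(frags) > 1:
        best_k, best_i, best_j = 0, None, None
        for i, a in enumerate(frags):
            for j, b in enumerate(frags):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            break  # nothing left to join
        merged = frags[best_i] + frags[best_j][best_k:]
        frags = [f for idx, f in enumerate(frags)
                 if idx not in (best_i, best_j)] + [merged]
    return frags


if __name__ == "__main__":
    digest_1 = ["ATGCATC", "GGATTA", "CAGGCT"]   # first cleavage procedure
    digest_2 = ["ATGC", "ATCGGA", "TTACAGGCT"]   # second, overlapping the first
    print(greedy_assemble(digest_1 + digest_2))
```

Neither digest alone fixes the order of its fragments; the order becomes determinable because each fragment of the second digest spans a cut point of the first.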
Fragmentation of DNA Strand by Restriction Endonucleases:
After 1975, DNA sequencing technology advanced greatly with the discovery of restriction endonucleases, which cleave DNA at specific sites. These enzymes were originally identified because their presence in bacteria restricts the growth of bacteriophages, hence the name restriction endonucleases. About 400 such enzymes, isolated from different organisms, are now known.
Restriction endonucleases are named by a standard procedure that refers to the organism from which the enzyme is isolated. The first letter of the enzyme name denotes the genus and the next two letters denote the species of the bacterium.
An additional letter or subscript is used to identify the strain, while the final Roman numeral indicates the order of discovery of the enzyme. For example, Eco RI is from Escherichia coli, while Hae III is from Haemophilus aegyptius.
The bacteria protect their own DNA from nucleolytic attack by methylating the bases at susceptible sites, a chemical modification that blocks the actions of the enzyme. The sequences recognized by these enzymes are 4 to 8 nucleotides long and characterized by a particular type of internal symmetry.
The enzyme Eco RI recognizes the following sequence:
5′ G↓AATTC 3′
3′ CTTAA↑G 5′
This segment is said to have twofold rotational symmetry because it can be rotated 180° without change in base sequence. This sequence is said to be a palindrome. Eco RI cuts each strand between G and A (arrows), so the cleaved DNA fragments have staggered cuts that produce sticky ends (cohesive ends).
Some restriction enzymes (e.g., Hpa I) cleave the two strands of DNA at the symmetry axis to yield restriction fragments with fully base-paired blunt ends. DNA fragments with sticky ends are particularly useful in recombinant DNA technology.
By using different restriction endonucleases for the digestion of a particular DNA, a restriction map with characteristic sites of action can be constructed. The fragmented DNA pieces can be isolated by techniques such as agarose gel electrophoresis.
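A restriction digest of a known sequence can be simulated in a few lines. The sketch below locates Eco RI sites, taking the recognition sequence as GAATTC with the cut between G and A on the strand shown, and returns the resulting fragments of a linear molecule. Only one strand is modelled, so the staggered cut on the opposite strand is not represented. The example DNA and function names are made up for illustration.

```python
def find_sites(dna, site="GAATTC"):
    """Return the start index of every occurrence of site in dna."""
    positions = []
    start = dna.find(site)
    while start != -1:
        positions.append(start)
        start = dna.find(site, start + 1)
    return positions


def digest(dna, site="GAATTC", cut_offset=1):
    """Cut a linear strand cut_offset bases into each site (G^AATTC)."""
    cuts = [p + cut_offset for p in find_sites(dna, site)]
    bounds = [0] + cuts + [len(dna)]
    return [dna[a:b] for a, b in zip(bounds, bounds[1:])]


if __name__ == "__main__":
    dna = "TTGAATTCAAACCGAATTCGG"  # hypothetical linear fragment
    fragments = digest(dna)
    print("sites at:", find_sites(dna))
    print("fragments:", fragments)
    print("fragment lengths:", [len(f) for f in fragments])
```

The list of site positions is in effect a one-enzyme restriction map, and the fragment lengths are what would be compared with the bands on an agarose gel.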
(i) Chemical Cleavage Method:
DNA can be rapidly sequenced by specific chemical cleavage. The method was devised by Allan Maxam and Walter Gilbert. In this method DNA is labelled at one end of one strand with 32P by the enzyme polynucleotide kinase which inserts 32P at the 5′ hydroxyl terminus.
The labelled DNA is partially cleaved at each base to produce a set of radioactive fragments extending from the labelled end to each position of that base. For example, if the sequence is
5’—32P—ATGCATCG—3′
the labelled fragments derived due to specific cleavage on the 5′ side would be
These cleavage products are then separated by polyacrylamide gel electrophoresis, and the gels are autoradiographed. Reading the seven bands in ascending order of fragment length gives the sequence 5′ ATGCATCG 3′. Different reagents are used to cleave the DNA chain.
Dimethylsulphate damages the purine bases by methylation of adenine at N – 3 and guanine at N – 7, the glycosidic bonds of which are then broken at neutral pH leaving the sugar without a base.
Cytosine and thymine are split by hydrazine. The backbone is then cleaved by piperidine, which displaces the products of the hydrazine treatment and catalyzes elimination of the phosphates.
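The logic of reading a chemical cleavage gel can be sketched in code. The model below is deliberately idealised: it assumes one perfectly specific reaction per base, whereas the actual reagents are less selective, and it records only the length of each 5′-labelled fragment left after cleavage on the 5′ side of a base. The function names are illustrative.

```python
def cleavage_ladder(seq):
    """Map each base to the lengths of labelled fragments it produces.

    Cleaving on the 5' side of the base at index i leaves the labelled
    fragment seq[:i], of length i.
    """
    lanes = {base: [] for base in "ACGT"}
    for i, base in enumerate(seq):
        if i > 0:  # cleavage at the first base leaves no labelled oligonucleotide
            lanes[base].append(i)
    return lanes


def read_gel(lanes):
    """Read bands from shortest to longest and name the lane of each."""
    bands = sorted((length, base)
                   for base, lengths in lanes.items()
                   for length in lengths)
    return "".join(base for _, base in bands)


if __name__ == "__main__":
    lanes = cleavage_ladder("ATGCATCG")
    print(lanes)
    print(read_gel(lanes))
```

In this model the eight-base example gives seven bands, and the band of length n reports the base at position n + 1, so the bands read out the bases that follow the labelled 5′ nucleotide.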
(ii) Enzymatic Method:
The base sequence of DNA may also be determined by the enzymatic replication technique developed by F. Sanger and his associates. For this contribution he was awarded the Nobel Prize for the second time in 1980. DNA polymerase I is used to copy a particular sequence of a single-stranded DNA. A complementary primer fragment, obtained from a restriction enzyme digest, is required for synthesis.
The incubation mixture contains, along with the four labelled deoxyribonucleoside triphosphate precursors, a 2′, 3′-dideoxy analog of one of them. The incorporation of this analog blocks further growth of the new chain because it lacks a 3′-hydroxyl terminus to form the next phosphodiester bond. Four such incubations are run, each with the dideoxy analog of a different base.
The four sets of fragments are then separated by gel electrophoresis and the base sequence of the new DNA is determined from the autoradiograph of the gel.
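The chain termination principle can be simulated as follows. The sketch assumes four separate reactions, one per dideoxy analog, and ignores the length of the primer, so fragment lengths count only newly added nucleotides. The template sequence and function names are hypothetical.

```python
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}


def synthesized_strand(template):
    """New strand copied from the template, written 5'->3'."""
    return "".join(PAIR[base] for base in reversed(template))


def dideoxy_lanes(template):
    """For each dideoxy analog, list the lengths of terminated chains."""
    lanes = {base: [] for base in "ACGT"}
    for length, base in enumerate(synthesized_strand(template), start=1):
        # A chain ending in the dideoxy analog of `base` stops at this length.
        lanes[base].append(length)
    return lanes


def read_gel(lanes):
    """Read bands from shortest to longest and name the lane of each."""
    bands = sorted((length, base)
                   for base, lengths in lanes.items()
                   for length in lengths)
    return "".join(base for _, base in bands)


if __name__ == "__main__":
    lanes = dideoxy_lanes("CGATGCAT")  # template written 5'->3'
    print(lanes)
    print(read_gel(lanes))
```

The sequence read from the gel is that of the newly synthesized strand, which is the complement of the template.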
Another excellent technique was developed by Arthur Kornberg, John Josse and Dale Kaiser at Stanford in 1959. The technique is known as nearest neighbour frequency analysis. It reveals how often each of the four bases is located next to each of the other bases in a particular single strand of DNA.
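The quantity measured by nearest neighbour analysis can be computed directly when a sequence is already known. The sketch below counts every adjacent pair of bases along one strand, read 5′ to 3′, and converts the counts to frequencies. It illustrates only the result, not the experimental procedure, and the function name is illustrative.

```python
from collections import Counter


def nearest_neighbour_frequencies(strand):
    """Frequency of each dinucleotide (5'->3') along a single strand."""
    pairs = Counter(strand[i:i + 2] for i in range(len(strand) - 1))
    total = sum(pairs.values())
    return {pair: count / total for pair, count in pairs.items()}


if __name__ == "__main__":
    for pair, freq in sorted(nearest_neighbour_frequencies("ATGCATCG").items()):
        print(pair, round(freq, 3))
```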
Large-scale sequencing is automated. In this method the primers used in the four chain-extension reactions are each linked to a different fluorescent dye. The separately reacted mixtures are then combined and electrophoresed in a single lane.
The terminal base of each fragment is identified by its characteristic fluorescence. The base sequence is read by a computerized fluorescence detector. With this automated detection system, 1000 bases can be identified per day.