This article provides a note on overlapping genes.
It is now becoming apparent that in both prokaryotes and eukaryotes a nucleotide sequence can code for more than one protein. The nucleotide sequence of a gene may be translated into a complete polypeptide chain, while a part of the same sequence also codes for a second, shorter protein.
There are several ways in which this can happen. Sometimes the second polypeptide is initiated or terminated within the sequence of the first protein. An example is the gene coding for the coat protein of the RNA phage Qβ.
Synthesis of most coat protein molecules is terminated at a UGA codon, when the chain is about 14,000 Daltons. However, a protein of about 36,000 Daltons may be synthesised when the sequence beyond UGA is also read in continuation and translated.
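The read-through of a termination codon described above can be sketched in a few lines of Python. This is only an illustration: the message `toy_mrna` is a made-up sequence, not the Qβ coat protein gene, and reading the suppressed UGA as tryptophan (W) is an assumption of the sketch rather than something stated in this article.

```python
# Standard genetic code, indexed in U, C, A, G order ("*" marks a stop codon).
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate(mrna, readthrough=False):
    """Translate from the first base; optionally read through the first UGA."""
    protein = []
    suppressed = False
    for pos in range(0, len(mrna) - 2, 3):
        codon = mrna[pos:pos + 3]
        residue = CODON_TABLE[codon]
        if residue == "*":
            if readthrough and codon == "UGA" and not suppressed:
                # Assumption of this sketch: the suppressed UGA is read as W.
                protein.append("W")
                suppressed = True
                continue
            break  # normal termination
        protein.append(residue)
    return "".join(protein)


# Hypothetical toy message: AUG GCU AAA UGA GGU CUU UAA
toy_mrna = "AUGGCUAAAUGAGGUCUUUAA"
short_product = translate(toy_mrna)                    # stops at UGA
long_product = translate(toy_mrna, readthrough=True)   # continues to UAA
print(short_product, long_product)
```

The short product is a prefix of the long one, which is the relationship the text describes between the 14,000 Dalton and 36,000 Dalton proteins.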
The tumour virus SV40 shows further complexities in the fine structure of a gene. Its genome is a single circular DNA molecule consisting of more than 5,200 nucleotide pairs. Soon after infection, one half of one strand of viral DNA (designated the early region) is transcribed into mRNA, which is translated into 2 proteins: a small t antigen (mol. weight 17,000-20,000) and a large T antigen (mol. weight 94,000).
Subsequent to this the viral DNA replicates itself. After replication, the second half of the viral genome, on the opposite strand (the late region), is transcribed into the late mRNA, which leads to the synthesis of the following 3 structural proteins: VP1 (mol. weight 43,000), VP2 (mol. weight 39,000) and VP3 (mol. weight 27,000).
If the total molecular weight of the proteins synthesised by the SV40 genome is considered, it is much more than a DNA molecule of this size (5,200 bases, i.e. 1733 triplets of bases) can code for. In fact, there are a number of ways by which the virus uses the same gene sequence to code for more than one polypeptide chain. From these situations the concept of overlapping genes has emerged.
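The arithmetic behind this argument can be checked with a short Python sketch. The genome size and protein weights are the figures quoted above (small t is taken at the lower end of its range). The average mass of 110 Daltons per amino acid residue is an assumption added here for illustration and is not given in this article.

```python
GENOME_BASE_PAIRS = 5200   # from the text
AVG_RESIDUE_MASS = 110     # Daltons per residue; assumed average, not from the text

# Molecular weights quoted in the text (small t at its lower bound).
protein_masses = {
    "small t": 17_000,
    "large T": 94_000,
    "VP1": 43_000,
    "VP2": 39_000,
    "VP3": 27_000,
}

max_codons = GENOME_BASE_PAIRS // 3                # triplets available in one frame
total_mass = sum(protein_masses.values())          # combined protein weight
residues_needed = total_mass // AVG_RESIDUE_MASS   # residues that weight implies

print(f"codons available: {max_codons}")
print(f"residues needed:  {residues_needed}")
print(f"shortfall:        {residues_needed - max_codons}")
```

Under these assumptions the proteins need more residues than the genome has triplets, so some nucleotides must contribute to more than one polypeptide.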
The nucleotide sequence that codes for VP3 also codes for the carboxy-terminal end of VP2. In this way VP3 has some amino acids identical to those at the C-terminus of VP2. Another kind of overlap is indicated by the presence of identical amino acid sequences at the C-terminal ends of both VP2 and VP3 on the one hand and at the amino-terminal end of VP1 on the other.
There is a sequence of about 120 nucleotides which first codes for the amino acids of VP2 and VP3, and then codes for the amino-terminal portion of VP1. Similarly, it has also been found that the amino-terminal sequences of the large T and small t antigens (specified by the early region) are identical.
Overlapping Genes in φX174:
The bacteriophage φX174 is an extremely small icosahedral virus containing a single-stranded DNA molecule about 5,400 nucleotides long. Nine genes, A, B, C, D, E, J, F, G and H, have been identified on φX174 DNA and they code for 9 specific proteins.
The combined weight of these proteins is 250,000 Daltons. A genome of about 5,400 bases has a maximum coding capacity of about 1800 amino acid residues, with a combined weight of about 200,000 Daltons.
Obviously, the total mass of proteins coded for is significantly greater than is expected from the amount of DNA contained in the φX174 genome. The work of Sanger and Coulson (1975), Barrell et al. (1976) and Weisbeek (1977) has revealed several mechanisms underlying the compact genetic organisation of φX174 DNA.
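The size of this discrepancy can be estimated from the article's own rounded figures. The Python sketch below is a rough back-of-envelope calculation, not a result from the cited work: it assumes the whole genome is coding and that all proteins share the average residue weight implied by the 200,000 Dalton figure.

```python
GENOME_BASES = 5400        # from the text
CAPACITY_MASS = 200_000    # Daltons, stated coding capacity
OBSERVED_MASS = 250_000    # Daltons, combined weight of the 9 proteins

max_residues = GENOME_BASES // 3   # residues codable in a single frame

# Scale the residue count by the ratio of observed to codable protein mass
# (integer arithmetic keeps the result exact for these round numbers).
residues_needed = OBSERVED_MASS * max_residues // CAPACITY_MASS
extra_residues = residues_needed - max_residues

# Each extra residue needs three nucleotides that are already in use.
doubly_read_bases = extra_residues * 3

print(f"residues codable in one frame: {max_residues}")
print(f"residues actually needed:      {residues_needed}")
print(f"bases that must be read twice: {doubly_read_bases}")
```

On these assumptions, roughly a quarter of the genome would have to be read more than once, which is consistent with whole genes lying inside other genes.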
The authors have employed the techniques of restriction mapping using several different enzymes such as Hind II, Hae III, Hpa II and Alu I. By applying Sanger and Coulson's 'plus and minus' technique for determining DNA sequences, Barrell et al. have found the sequences of genes D, E and J and in fact the entire sequence of φX174 DNA. Weisbeek et al. have sequenced genes A and B.
The termination codon of gene D overlaps the initiation codon of gene J by one nucleotide. Further, the mutation am6, determined genetically to lie in gene J, actually lies 179 nucleotides before the initiating codon for J. The mutation therefore lies not in gene J but in another gene.
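A one-nucleotide overlap between a termination codon and an initiation codon means that the last base of the stop codon is also the A of the following ATG, as in the five-base stretch TAATG. The Python sketch below scans a sequence for such junctions. The function name and the test sequence are hypothetical, and the scan does not check that the stop codon is in frame with any upstream gene.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def shared_base_junctions(dna):
    """Return indices where a stop codon's last base is the A of an ATG."""
    hits = []
    for i in range(len(dna) - 4):
        stop = dna[i:i + 3]        # candidate termination codon
        start = dna[i + 2:i + 5]   # candidate initiation codon, sharing one base
        if stop in STOP_CODONS and start == "ATG":
            hits.append(i)
    return hits


# Hypothetical sequence containing one TAATG junction.
toy_dna = "GCTTAATGTCT"
junctions = shared_base_junctions(toy_dna)
print(junctions)
```

Only stop codons ending in A (TAA and TGA) can share a base with ATG in this way.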
The location of gene E has been identified from two amber mutations, am3 and am6. Both mutations lie within the sequence of gene D. Analysis of the sequence around the amber mutations has shown that the sequence of gene E (273 nucleotides long) overlaps the sequence of gene D (456 nucleotides long), and that the two genes are translated in two different reading frames (phases).
The sequence for gene E is displaced one base to the right from that of gene D. By identifying the initiation and termination codons of gene D, it is concluded that the sequence of gene E lies in the latter part of gene D, and specifies a protein of about 10,000 Daltons. Thus gene E overlaps a portion of gene D; the sequence in the overlap region codes for two proteins in two phases (Fig. 22.12).
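How one stretch of DNA can specify two proteins in two phases can be illustrated with a toy example in Python. The sequence `toy_dna` below is invented for the purpose and is not the φX174 sequence: it carries an outer reading frame starting at base 0 and an inner one that begins at a later ATG, displaced one base to the right.

```python
# Standard genetic code, indexed in T, C, A, G order ("*" marks a stop codon).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate_orf(dna, start):
    """Translate codon by codon from `start` until the first stop codon."""
    protein = []
    for pos in range(start, len(dna) - 2, 3):
        residue = CODON_TABLE[dna[pos:pos + 3]]
        if residue == "*":
            break
        protein.append(residue)
    return "".join(protein)


# Hypothetical sequence: outer frame ATG GCC GAT GCT GAA GTG ACT TAA
toy_dna = "ATGGCCGATGCTGAAGTGACTTAA"

outer_protein = translate_orf(toy_dna, 0)

# The inner gene starts at the next ATG, which lies in a different phase.
inner_start = toy_dna.find("ATG", 1)
inner_protein = translate_orf(toy_dna, inner_start)
phase_offset = inner_start % 3   # displacement relative to the outer frame

print(outer_protein, inner_protein, phase_offset)
```

The two products share no amino acid sequence even though every base of the inner gene is also part of the outer one, which is why a nonsense mutation in one frame need not create a stop codon in the other.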
The genes A and B have been characterised by Weisbeek (1977) by mapping positions of several mutations by a marker rescue technique. All the mutations in gene B have been identified within the sequence of gene A.
However, nonsense mutations in gene A do not impair the function of gene B; similarly, nonsense mutations in gene B do not affect the activity of gene A. Genes A and B therefore belong to different complementation groups.
The study of amber mutations in gene A which result in the synthesis of shorter chains of the A and A* proteins has shown that gene B is completely contained within gene A and that the shared sequence is translated in two different reading frames: one leads to synthesis of the A and A* proteins, the other to synthesis of the B protein. Smith et al. (1977), working in Sanger's laboratory, have shown that gene A not only overlaps gene B but even extends beyond it.
Overlapping genes have also been identified in the single-stranded DNA virus G4, which is closely related to φX174 and has the same order of genes. In G4 also, gene A overlaps gene B, and gene E overlaps gene D. It also has a gene K containing portions of the sequences of genes A and C. The last 86 nucleotides of gene A and the first 89 nucleotides of gene C together constitute the sequence of gene K.