In this article we will discuss the genetic code, gene expression and DNA fingerprinting.

Genetic Code:

Genetic code is the relationship between the sequence of nucleotides on mRNA and the sequence of amino acids in the polypeptide. George Gamow, a physicist, proposed that in order to code for all 20 amino acids, the code should be made up of three nucleotides. With four bases, a triplet code gives 4 × 4 × 4 = 64 codons, which is more than enough for 20 amino acids.

Har Gobind Khorana developed a chemical method for synthesising RNA molecules with defined base combinations (homopolymers and copolymers), which helped in working out the genetic code.

Marshall Nirenberg developed a cell-free system for protein synthesis that helped in deciphering the code.

Severo Ochoa showed that the enzyme polynucleotide phosphorylase also helped in polymerising RNA with defined sequences in a template-independent manner.

A checker board for the genetic code is given below:

Codons for the Various Amino Acids

Salient Features of Genetic Code:

(i) Codon is a triplet. Out of the 64 codons, 61 codons code for 20 amino acids and 3 codons (UAA, UGA and UAG) do not code for any amino acids. Thus, they function as terminating codons.

(ii) Genetic code is unambiguous and specific. Thus, one codon codes for only one amino acid.

(iii) Some amino acids are coded by more than one codon. Hence, the code is degenerate.

(iv) The codon is read in mRNA in a contiguous fashion, i.e., without punctuation. Thus, the code is commaless.

(v) The genetic code is nearly universal, i.e., one codon codes for the same amino acid in all organisms. Exceptions are found in mitochondria and in a few protozoa.

(vi) AUG is a codon with dual functions. It codes for the amino acid methionine (met), and also acts as an initiator codon.
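The features listed above can be checked against the standard codon table. The following Python sketch is illustrative only: it builds the 64 codons in U, C, A, G order and pairs them with one-letter amino acid symbols (`*` marks a stop codon). The names `CODON_TABLE` and `codons_per_amino_acid` are chosen here for illustration, and only the standard code is modelled, not the exceptions mentioned in (v).

```python
from collections import Counter
from itertools import product

BASES = "UCAG"
# One-letter amino acid symbols for the 64 codons in UCAG order; '*' marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

stop_codons = sorted(c for c, aa in CODON_TABLE.items() if aa == "*")
sense_codons = [c for c, aa in CODON_TABLE.items() if aa != "*"]
# Number of different codons that specify each amino acid (degeneracy).
codons_per_amino_acid = Counter(aa for aa in CODON_TABLE.values() if aa != "*")

print(len(CODON_TABLE), "codons in total")
print(len(sense_codons), "codons specify", len(codons_per_amino_acid), "amino acids")
print("stop codons:", stop_codons)
print("codons for leucine (L):", codons_per_amino_acid["L"])
print("AUG codes for:", CODON_TABLE["AUG"])
```

Because the table is a dictionary, each codon maps to exactly one amino acid (the code is unambiguous), while an amino acid such as leucine is reached from several codons (the code is degenerate).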

Mutation:

The sudden inheritable change in the genetic material is defined as mutation.

It is of the following types:

1. Point Mutation:

It is a mutation in which a single base pair is replaced by another base pair, e.g., in sickle-cell anaemia. A point mutation in the gene for the β-globin chain changes the amino acid residue glutamate to valine.

2. Frameshift Mutation:

It is a change in the reading frame caused by the insertion or deletion of base pairs, e.g., ATCGCTTATA.

(i) Insertion:

The addition of one or more nucleotides to a DNA segment is called insertion. If three bases or a multiple of three are added, the reading frame does not change. Instead, one or more new amino acids are added to the polypeptide.

(ii) Deletion:

The removal of one or more nucleotides from a DNA segment is called deletion. Here also, if three bases or a multiple of three are removed, the reading frame does not change, but one or more amino acids are removed, e.g., ACTCTTATA.
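The effect of these mutations on the reading frame can be illustrated with a short Python sketch. The mRNA sequence below is hypothetical and was chosen only so that its second codon is GAG (glutamate); the function `translate` and the variable names are likewise illustrative. For simplicity the changes are applied directly to the mRNA rather than to the DNA.

```python
from itertools import product

# Standard codon table in UCAG order; '*' marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product("UCAG", repeat=3), AMINO_ACIDS)}

def translate(mrna):
    """Read an mRNA in consecutive triplets from its first base until a stop codon."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":  # stop codon: the polypeptide ends here
            break
        protein.append(amino_acid)
    return "".join(protein)

original = "AUGGAGCUUACUCCUUAA"  # hypothetical mRNA: AUG GAG CUU ACU CCU UAA
variants = {
    "original": original,
    "point mutation": original[:4] + "U" + original[5:],        # GAG -> GUG
    "one base inserted": original[:3] + "C" + original[3:],
    "three bases inserted": original[:3] + "GCU" + original[3:],
    "one base deleted": original[:3] + original[4:],
    "three bases deleted": original[:3] + original[6:],
}

for name, sequence in variants.items():
    print(f"{name:22s}{translate(sequence)}")
```

In this example the point mutation changes a single amino acid (E to V), the one-base insertion and deletion change every amino acid after the mutation, and the three-base insertion and deletion add or remove one amino acid while leaving the rest of the sequence unchanged.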

tRNA: The Adapter Molecule:

Francis Crick proposed the presence of an adapter molecule that could read the code at one end and bind to a specific amino acid at the other end. tRNA was known before the genetic code was worked out and was called sRNA (soluble RNA); its role as an adapter molecule was reported later.

Structure of tRNA:

The secondary structure of tRNA looks like a clover leaf, but its three-dimensional structure shows that it is a compact, inverted L-shaped molecule.

tRNA has five arms or loops:

(i) Anticodon Loop:

It has bases complementary to the codon.

(ii) Amino Acid Acceptor End:

At this end, it binds to amino acids.

(iii) T-Loop:

It helps in binding to ribosome.

(iv) D-Loop:

It helps in binding aminoacyl-tRNA synthetase.

(v) Variable Loop:

It is variable in both nucleotide composition and in length.

tRNA-the Adapter Molecule
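The complementarity between the anticodon loop and the codon can be sketched in a few lines of Python. This is a simplified illustration that assumes strict A-U and G-C pairing (wobble pairing and modified bases are ignored); the function name `anticodon_for` is hypothetical.

```python
# Base-pairing partners in RNA.
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}

def anticodon_for(codon):
    """Return the anticodon, written 5'->3', that pairs antiparallel with an mRNA codon."""
    return "".join(COMPLEMENT[base] for base in reversed(codon))

print(anticodon_for("AUG"))  # anticodon of the initiator tRNA under these assumptions
print(anticodon_for("GAG"))
```

The codon is reversed before complementing because the two strands pair in opposite orientations, so the first base of the codon pairs with the last base of the anticodon.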

Initiator tRNA:

A specific tRNA, the initiator tRNA, is used for the process of initiation. There are no tRNAs for stop codons.

Translation:

This process requires transfer of genetic information from a polymer of nucleotides to a polymer of amino acids. The process of polymerisation of amino acids to form a polypeptide is known as translation. Thus, the proteins are synthesised from mRNA with the help of ribosomes.

Ribosome:

It is responsible for protein synthesis. It consists of structural RNAs and around 80 different proteins.

In its inactive state, the ribosome exists as two subunits:

(i) Small Subunit:

When a small subunit encounters an mRNA, translation of mRNA to protein begins.

(ii) Large Subunit:

It has two sites to which successive amino acids can bind, so that they are close enough to each other for the formation of a peptide bond. The ribosome also acts as a catalyst for peptide bond formation (the 23S rRNA in bacteria is the enzyme, a ribozyme).

Translational Unit:

It is the sequence of RNA flanked by the start codon (AUG) and the stop codon. It codes for a polypeptide.

Un-translated Regions (UTR):

These are additional sequences in an mRNA that are not translated. They are present at both ends, i.e., at the 5′ end (before the start codon) and at the 3′ end (after the stop codon).
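The division of an mRNA into a 5′ UTR, a translational unit and a 3′ UTR can be sketched in Python as follows. The sequence is hypothetical, the function name `split_mrna` is illustrative, and the sketch makes the simplifying assumption that the first AUG in the mRNA is the start codon.

```python
STOP_CODONS = {"UAA", "UAG", "UGA"}

def split_mrna(mrna):
    """Split an mRNA into (5' UTR, translational unit, 3' UTR).

    Assumes the first AUG is the start codon; returns None if no start codon
    or no in-frame stop codon is found.
    """
    start = mrna.find("AUG")
    if start == -1:
        return None
    # Read codon by codon from the start codon until an in-frame stop codon.
    for i in range(start, len(mrna) - 2, 3):
        if mrna[i:i + 3] in STOP_CODONS:
            end = i + 3  # the stop codon is part of the translational unit
            return mrna[:start], mrna[start:end], mrna[end:]
    return None

five_prime_utr, translational_unit, three_prime_utr = split_mrna(
    "GGCACCAUGGCUUACUAACCGAAUAAA"  # hypothetical mRNA
)
print("5' UTR:            ", five_prime_utr)
print("Translational unit:", translational_unit)
print("3' UTR:            ", three_prime_utr)
```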

Stages of Protein Synthesis:

Synthesis of proteins takes place through the three stages i.e., initiation, elongation and termination.

Initiation:

In prokaryotes, initiation requires the ribosome (large and small subunits), the mRNA, the initiator tRNA and 3 initiation factors (IFs). For initiation to take place, the ribosome first binds to the mRNA at the start codon (AUG), which is recognised only by the initiator tRNA.

Activation of Amino Acid:

The formation of a peptide bond requires energy. In the first phase, the amino acids are activated in the presence of ATP and linked to their tRNA, a process known as charging of tRNA or aminoacylation of tRNA. This activation is carried out by the enzyme aminoacyl-tRNA synthetase.

Transfer of Amino Acid to tRNA:

The cap region of the mRNA binds to the smaller subunit of the ribosome. The ribosome has two sites, i.e., the A-site and the P-site. First, the smaller subunit binds with the initiator tRNA. Then it binds to the larger subunit, so that the initiation codon (AUG) lies on the P-site. The initiator tRNA (methionyl tRNA) then binds to the P-site.

Elongation of Polypeptide Chain:

Another charged aminoacyl tRNA complex binds to the A-site of the ribosome. A peptide bond is formed between the carboxyl group (-COOH) of the amino acid at the P-site and the amino group (-NH2) of the amino acid at the A-site by the enzyme peptidyl transferase.

During this stage, the ribosome moves from codon to codon along the mRNA in the 5′ → 3′ direction.

Amino acids are added one by one in the order of the codons, so that the mRNA sequence is translated into a polypeptide sequence.

Termination of Polypeptide:

The A-site of ribosome reaches a termination codon. The termination codon does not code for any amino acid. Now no tRNA binds to the A-site of ribosome. At the end, a release factor binds to a stop codon and translation is terminated. Thus, the complete polypeptide will be released from the ribosome.
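The three stages can be traced step by step in a simplified Python sketch. It is a toy model under stated assumptions: the mRNA is hypothetical, the first AUG is taken as the start codon, and the function name `translate_stepwise` and the step labels are illustrative. tRNA charging, initiation factors and release factors are not modelled.

```python
from itertools import product

# Standard codon table in UCAG order; '*' marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product("UCAG", repeat=3), AMINO_ACIDS)}

def translate_stepwise(mrna):
    """Return (polypeptide, steps), where steps records each stage of translation."""
    start = mrna.find("AUG")
    if start == -1:
        raise ValueError("no start codon found")
    chain = ["M"]  # initiation: the initiator tRNA with methionine sits on AUG
    steps = [("initiation", "AUG", "M")]
    position = start + 3
    while position + 3 <= len(mrna):
        codon = mrna[position:position + 3]  # codon currently exposed at the A-site
        amino_acid = CODON_TABLE[codon]
        if amino_acid == "*":
            # termination: no tRNA matches a stop codon, so the chain is released
            steps.append(("termination", codon, None))
            return "".join(chain), steps
        chain.append(amino_acid)  # elongation: a peptide bond extends the chain
        steps.append(("elongation", codon, amino_acid))
        position += 3  # the ribosome moves one codon in the 5' -> 3' direction
    raise ValueError("no in-frame stop codon found")

polypeptide, steps = translate_stepwise("GGCACCAUGGCUUACUAACCG")  # hypothetical mRNA
for stage, codon, amino_acid in steps:
    print(stage, codon, amino_acid)
print("polypeptide:", polypeptide)
```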

Translation

Regulation of Gene Expression:

Gene expression results in the formation of a polypeptide. The regulation of gene expression may occur at various levels. In eukaryotes, regulation of gene expression means controlling the amount and timing of the formation of gene products according to the requirements of the cell.

It takes place at the following levels:

1. Transcriptional Level:

Regulation of the formation of the primary transcript.

2. Processing Level:

Regulation of splicing.

3. Transport of mRNA:

Regulation of the transport of mRNA from the nucleus to the cytoplasm.

4. Translational Level:

Regulation of the translation of mRNA into a polypeptide.

In prokaryotes, gene expression is regulated mainly by controlling the rate of initiation of transcription.

Genes in a cell are expressed to perform a particular function or a set of functions. The metabolic, physiological or environmental conditions regulate expression of genes.

The development and differentiation of an embryo into an adult organism is also a result of the coordinated regulation of expression of several sets of genes.

In a transcriptional unit, the activity of RNA polymerase at a given promoter is in turn regulated by interaction with accessory proteins.

The accessibility of promoter regions of prokaryotic DNA is in many cases regulated by the interaction of proteins with sequences termed operators. The operator sequence binds a repressor protein. Each operon has its specific operator and specific repressor. For example, the lac operator interacts only with the lac repressor.

Lac Operon:

Jacob and Monod first proposed the concept of operon. It is a transcriptionally regulated system, where a polycistronic structural gene is regulated by a common promoter and regulatory genes, e.g., lac (lactose) operon, trp (tryptophan) operon, ara (arabinose) operon, his (histidine) operon and val (valine) operon.

Structure of Lac Operon:

(i) One regulatory gene (the i gene), which codes for the repressor of the lac operon.

(ii) Three structural genes:

(a) z gene codes for beta-galactosidase (β-gal), that helps in the hydrolysis of disaccharide into monomeric units i.e., lactose into galactose and glucose.

(b) y gene codes for permease, that increases the permeability of the cell to β-galactoside.

(c) a gene codes for a transacetylase.

Note:

The 'i' in the i gene does not refer to inducer; rather, it is derived from the word inhibitor.

Lactose Regulates Switching On and Off of the Operon:

Lactose is known to be the inducer and the substrate for the enzyme β-galactosidase. If lactose is provided as the carbon source in the growth medium, in the absence of the preferred carbon source such as glucose, the lactose is transported to the cells by the action of enzyme permease and induces the operon.

Inducing of Operon by Lactose:

The ‘i’ gene synthesises the repressor of the operon. The repressor binds to the operator region of the operon and prevents RNA polymerase from transcribing the operon. The repressor is inactivated by interaction with the inducer (lactose or allolactose).

Once the repressor is inactivated, RNA polymerase can access the promoter and transcription proceeds. Regulation of the lac operon by the repressor is referred to as negative regulation. The lac operon can also work under the control of positive regulation.

Lac Operon

When Lactose is absent:

The ‘i’ gene is transcribed and its mRNA is translated to produce the repressor. In the absence of lactose, the repressor binds to the operator region of the operon and as a result prevents RNA polymerase from transcribing the operon.

The operon is switched off in this situation.

When Lactose is Present:

Lactose acts as an inducer here and binds to the repressor, forming an inactive repressor. The inactive repressor fails to bind to the operator region. RNA polymerase binds to the promoter and transcribes the lac mRNA. The lac mRNA is polycistronic and produces all three enzymes, i.e., β-galactosidase, permease and transacetylase. The operon is switched on in this situation.
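The two situations described above amount to a simple on/off switch, which can be sketched in Python. This is a deliberately minimal Boolean model of negative regulation only: it ignores glucose, positive regulation and the low basal level of expression, and the function name `lac_operon_state` is hypothetical.

```python
def lac_operon_state(lactose_present):
    """Boolean sketch of negative regulation of the lac operon by its repressor."""
    repressor_active = not lactose_present   # the inducer (lactose) inactivates the repressor
    operator_blocked = repressor_active      # an active repressor binds the operator
    transcription_on = not operator_blocked  # RNA polymerase transcribes only if unblocked
    enzymes = (["beta-galactosidase", "permease", "transacetylase"]
               if transcription_on else [])
    return {"operon": "on" if transcription_on else "off", "enzymes": enzymes}

for lactose in (False, True):
    print("lactose present:", lactose, "->", lac_operon_state(lactose))
```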

Human Genome Project (HGP):

It was a 13-year mega project to sequence the human genome. It was launched in 1990 and completed in 2003. It was coordinated by the US Department of Energy and the National Institutes of Health.

The two main factors that contributed to the completion of this project are:

(i) Availability of simple and fast techniques for determining DNA sequences.

(ii) Genetic engineering techniques, which help to isolate and clone any segment of DNA.

Human genome project was called a mega project, because of the following reasons:

(i) The human genome is said to have approximately 3 × 10⁹ bp. If the cost of sequencing is US $ 3 per bp, then the total approximate cost is about US $ 9 billion.

(ii) Approximately 3300 books would be needed to store the complete information, if the sequence were stored in typed form in books in which each page contained 1000 letters and each book contained 1000 pages.

(iii) The enormous amount of data generated also necessitated the use of high-speed computational devices for data storage, retrieval and analysis.
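The arithmetic behind points (i) and (ii) can be checked with a few lines of Python. All figures are taken from the text; the variable names are illustrative.

```python
genome_bp = 3 * 10**9     # approximate size of the human genome (from the text)
cost_per_bp_usd = 3       # sequencing cost per base pair assumed in the text
letters_per_page = 1000
pages_per_book = 1000

total_cost_usd = genome_bp * cost_per_bp_usd
letters_per_book = letters_per_page * pages_per_book
books_needed = genome_bp / letters_per_book

print(f"Total cost: US $ {total_cost_usd / 10**9:.0f} billion")
print(f"Books needed for 3 x 10^9 letters: {books_needed:.0f}")
print(f"Letters held by 3300 books: {3300 * letters_per_book:.1e}")
```

With exactly 3 × 10⁹ letters the calculation gives 3000 books; the figure of about 3300 books corresponds to roughly 3.3 × 10⁹ letters.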

The Human Genome Project was closely associated with the new and rapidly developing area of biology known as bioinformatics.

Goals of HGP:

(i) To identify all the genes (approx. 20,000-25,000) in human DNA.

(ii) To determine the sequences of the chemical base pairs (3 billion) that constitute human DNA.

(iii) To store up this information in databases.

(iv) To improve the tools required for data analysis.

(v) To transfer the related technologies to other sectors (like industries).

(vi) To address the Ethical, Legal and Social Issues (ELSI) that could arise from the project.

Benefits from HGP:

(i) Knowledge of the DNA variations among individuals can lead to new ways of diagnosing, treating and preventing various diseases affecting mankind.

(ii) Learning about the DNA sequences of non-human organisms helps in understanding their natural capabilities.

(iii) These capabilities can be applied to solving challenges in healthcare, agriculture, energy production and environmental remediation.

For example, non-human organisms that have been sequenced include bacteria, yeast, Drosophila (fruit fly) and plants (rice and Arabidopsis).

Methodologies of HGP:

Two major approaches are as follows:

(i) Expressed Sequence Tags (ESTs):

This method focuses on identifying all the genes that are expressed as RNA.

(ii) Sequence Annotation:

This method involves sequencing the whole genome (containing all the coding and non-coding sequences) and then assigning functions to the different regions in the sequence.

Sequencing of Genome:

(i) First the total DNA in a cell is isolated and converted to smaller fragments.

(ii) These fragments are then cloned in a host (e.g., bacteria and yeast) using specialised vectors such as BAC (Bacterial Artificial Chromosome) and YAC (Yeast Artificial Chromosome).

(iii) Cloning leads to the amplification of each DNA fragment, which makes sequencing easier.

(iv) The DNA fragments were sequenced using automated DNA sequencers, which work on the principle developed by Frederick Sanger.

(v) These sequences of DNA fragments were arranged on the basis of some overlapping regions present in them.

(vi) Specialised computer-based programs were developed for the alignment of these sequences.

(vii) These sequences were annotated and assigned to each chromosome.
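The idea in step (v), arranging fragments by their overlapping regions, can be sketched with a simple greedy merge in Python. This is a toy illustration, not the method used in the project: the sequence is hypothetical, the fragments are error-free, and the function names `overlap` and `assemble` are chosen here for illustration.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of a that is also a prefix of b (0 if shorter than min_len)."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0

def assemble(fragments, min_len=3):
    """Greedily merge the pair of fragments with the longest overlap until none overlap."""
    frags = list(fragments)
    while len(frags) > 1:
        best_len, best_i, best_j = 0, None, None
        for i, a in enumerate(frags):
            for j, b in enumerate(frags):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best_len:
                        best_len, best_i, best_j = n, i, j
        if best_len == 0:
            break  # the remaining fragments do not overlap
        merged = frags[best_i] + frags[best_j][best_len:]
        frags = [f for k, f in enumerate(frags) if k not in (best_i, best_j)] + [merged]
    return frags

genome = "ATCGGCTAAGCTTACGGA"                         # hypothetical sequence
fragments = [genome[9:], genome[:10], genome[5:15]]   # overlapping pieces in shuffled order
print(assemble(fragments))
```

Greedy merging is the simplest strategy to read; it can fail on repetitive sequences, which is one reason real assemblies need more careful algorithms.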

Salient Features of Human Genome:

(i) There are 3164.7 million nucleotide bases in the human genome.

(ii) An average gene consists of 3000 bases. The largest known human gene is dystrophin (2.4 million bases).

(iii) The total number of genes in the human genome is estimated at about 30,000. Almost all (99.9%) of the nucleotide bases are exactly the same in every human individual.

(iv) For over 50% of the discovered genes, the functions are unknown.

(v) Less than 2% of the genome codes for proteins.

(vi) Repetitive sequences are stretches of DNA sequences, which are repeated (sometimes 100-1000 times).

(vii) They have no direct coding functions. They help in understanding chromosome structure, dynamics and evolution.

(viii) Chromosome 1 has the maximum number of genes (2968).

(ix) Chromosome-Y has the least number of genes (231).

(x) There are about 1.4 million locations in the human genome where single-base DNA differences occur (SNPs, Single Nucleotide Polymorphisms).

This information is helpful in finding chromosomal locations for the disease-associated sequences.

(xi) The next challenging task was to assign the genetic and physical map on the genome.

These maps were generated using information on the polymorphism of restriction endonuclease recognition sites and certain repetitive DNA sequences called microsatellites.

Applications and Future of HGP:

(i) Its knowledge is helpful in research involving biological systems including human biology.

(ii) With the whole genome sequences and newer technologies, we can be systematic in our approach to research questions on a broader scale.

(iii) All the genes in a genome or all the transcripts in a particular tissue/organ/tumour can be studied.

DNA Fingerprinting:

It involves the identification of differences in repetitive DNA. Repetitive DNA is a specific region in DNA sequence in which a small stretch of DNA is repeated many times. During density gradient centrifugation, these repetitive DNA are separated from the bulk genomic DNA.

Bulk DNA forms a major peak during centrifugation.

Other small peaks are known as Satellite DNA.

These sequences do not code for any proteins normally, but they constitute a great portion of human genome.

Satellite DNA is classified into many categories such as microsatellites, minisatellites, etc., on the basis of the length of the segment, the number of repetitive units, the base composition (A : T-rich or G : C-rich), etc.

They show high degree of polymorphism. Repetitive DNA forms the basis of DNA fingerprinting.

Polymorphism:

It is the variation among individuals at the genetic level, and it arises due to mutations. In an individual, DNA from every tissue (e.g., blood, hair follicle, skin, bone, saliva, etc.) shows the same degree of polymorphism, which makes it a very useful tool in forensic applications. Polymorphisms are inherited by children from their parents, so they are useful for identification (forensic application) and paternity testing. Polymorphism also plays an important role in evolution and speciation.

In a population, if an inheritable mutation is observed at high frequency, it is known as DNA polymorphism. There are different types of polymorphism, from single nucleotide change to large scale changes.

Technique of DNA Fingerprinting:

Alec Jeffreys initially developed this technique to find markers for inherited diseases. It involves Southern blot hybridisation using a radiolabelled VNTR as a probe. The process is also known as DNA typing or DNA profiling. He used a satellite DNA as the probe and called it Variable Number of Tandem Repeats (VNTRs).

VNTRs are the most important factor in DNA fingerprinting. The VNTRs of two persons may be of the same length and sequence at certain sites, but vary at others. Therefore, the pattern differs from person to person, except in monozygotic (identical) twins. VNTRs belong to a class of satellite DNA known as minisatellites. The copy number varies from chromosome to chromosome in an individual.

The size of a VNTR varies from 0.1 to 20 kb. Consequently, after hybridisation with a VNTR probe, the autoradiogram gives many bands of different sizes. These bands give a characteristic pattern for an individual. The sensitivity of this technique can be increased by using PCR (Polymerase Chain Reaction).

The technique has the following steps:

(i) DNA Isolation: DNA is extracted from the cells using a high-speed centrifuge.

(ii) Amplification: Many copies of the extracted DNA can be made by the polymerase chain reaction.

(iii) Digestion of DNA by restriction endonucleases.

(iv) Separation of DNA fragments by electrophoresis.

(v) Blotting: Transfer of the separated DNA fragments to synthetic membranes (like nylon or nitrocellulose).

(vi) Hybridisation: A radiolabelled VNTR probe is used (probes are small segments of DNA that help to detect the presence of a particular sequence in a long DNA sequence). These probes target a specific nucleotide sequence that is complementary to them.

(vii) Autoradiography: Detection of hybridised DNA fragments by autoradiography.

DNA Fingerprinting
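How VNTR copy numbers give rise to a characteristic band pattern, and how such patterns can be compared, can be sketched in Python. Everything in this example is hypothetical: the loci, the copy numbers, the repeat unit of 20 bp and the flanking length of 200 bp are invented for illustration, and the functions `band_pattern` and `consistent_with_parents` are not part of any real forensic tool.

```python
def band_pattern(profile, repeat_unit_bp=20, flank_bp=200):
    """Fragment sizes (bp) expected for a VNTR profile, largest first as on a gel."""
    sizes = {flank_bp + copies * repeat_unit_bp
             for alleles in profile.values() for copies in alleles}
    return sorted(sizes, reverse=True)

def consistent_with_parents(child, mother, father):
    """True if, at every locus, the child has one allele from each parent."""
    for locus, (a, b) in child.items():
        from_mother, from_father = set(mother[locus]), set(father[locus])
        if not ((a in from_mother and b in from_father) or
                (b in from_mother and a in from_father)):
            return False
    return True

# Repeat copy numbers of the two alleles at each VNTR locus (hypothetical data).
mother = {"locus1": (5, 8), "locus2": (12, 12)}
child = {"locus1": (8, 3), "locus2": (12, 7)}
man_a = {"locus1": (3, 9), "locus2": (7, 10)}
man_b = {"locus1": (4, 9), "locus2": (7, 7)}

print("child bands (bp):", band_pattern(child))
print("man A could be the father:", consistent_with_parents(child, mother, man_a))
print("man B could be the father:", consistent_with_parents(child, mother, man_b))
```

Each allele with a different copy number gives a fragment of a different length, and therefore a band at a different position. A child's bands that do not come from the mother must be present in the father's pattern, which is the basis of the paternity comparison sketched here.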

Applications of DNA Fingerprinting:

(a) Used as a tool in forensic investigations.

(b) To settle paternity disputes.

(c) To study evolution, by determining the genetic diversity among populations.
