In this article we will discuss the primary structure of peptides and proteins.

Peptides and proteins consist of chains of amino acids linked by peptide bonds. Their structure is analogous to that of biuret (H2N-CO-NH-CO-NH2), which is formed, as its name indicates, by the condensation of two urea molecules with elimination of one molecule of NH3. This is why polypeptides give the so-called biuret reaction, characterized by the violet coloration observed upon addition of CuSO4 in alkaline solution.

We distinguish, in general:

1. The Oligopeptides:

Dipeptides (formed by the binding of 2 amino acids) and tripeptides (3 amino acids), which do not give the characteristic biuret reaction (some may give an atypical coloration).

2. The Polypeptides:

Tetrapeptides and longer chains, all of which give the biuret reaction.

Proteins are polypeptides, but we will see that, in addition to the peptide bonds linking the amino acids, other types of interactions give the protein a three-dimensional structure. The size beyond which a polypeptide is called a protein is not rigorously fixed; the term is generally reserved for polypeptides with a molecular weight above 10,000 that do not dialyse through a cellophane membrane.

Having isolated a peptide or a protein, one generally determines first the nature of its constituent amino acids and then the order in which they are linked (i.e. the primary structure).

I. Amino Acid Composition:

1. Hydrolysis of Proteins:

To split the peptide bonds and liberate the constituent amino acids of a protein, one may use acids, bases or proteolytic enzymes. Because complete enzymatic hydrolysis of a protein is very difficult to achieve, enzymatic hydrolysis will not be discussed here.

Alkaline hydrolysis causes structural modifications and rearrangements of some amino acids. One therefore carries out acid hydrolysis with 6 N HCl at 105°C, in a tube sealed under vacuum or nitrogen, at a low protein concentration (0.5 to 1%), generally for 24 hours.

This method has the disadvantage of converting glutamine and asparagine into the corresponding dicarboxylic acids (with release of NH3) and of destroying tryptophan, which must therefore be determined separately after hydrolysis with another acid (methanesulphonic acid).

2. Amino Acid Analysis:

Historically, specific methods were first tried for each amino acid. These methods were either colorimetric or enzymatic. All were abandoned because of their lack of specificity or low sensitivity, and modern techniques of chromatographic separation of amino acids were adopted instead: these techniques combine excellent reproducibility with very high sensitivity, and they lend themselves to complete automation.

Three types of chromatography are now used:

a) Ion Exchange Chromatography:

Like electrophoresis, this method divides amino acids into 3 groups:

i. Acidic amino acids, retained by anion-exchange resins (Dowex 2, Amberlite IRA-400), which are often polyamino resins:

$$\mathrm{Resin{-}NH_3^+\,OH^- + R{-}COO^- \longrightarrow Resin{-}NH_3^+\,{}^-OOC{-}R + OH^-}$$

ii. Basic amino acids, retained by cation exchangers, which are often polysulphonated resins (Dowex 50) or polycarboxylic resins (Amberlite IRC 50):

$$\mathrm{Resin{-}SO_3^-\,Na^+ + R{-}NH_3^+ \longrightarrow Resin{-}SO_3^-\,{}^+H_3N{-}R + Na^+}$$

iii. Neutral amino acids, which in principle are retained by neither of the above two categories of resins.

Stein and Moore achieved the complete fractionation of amino acids on a sulphonated polystyrene resin (SO3⁻ Na⁺) at a low pH, where all the amino acids (even the dicarboxylic ones) are in the form of cations (R-NH3⁺). They are therefore strongly retained at the top of the column, where they are concentrated in a thin band even if the volume of the hydrolysate is large.

Elution is carried out with a gradient in which pH and ionic strength increase gradually; the amino acids are eluted according to their degree of ionization as well-separated peaks, which are quantified colorimetrically with ninhydrin. The system can be automated and used for the qualitative and quantitative analysis of a protein hydrolysate (see fig. 1-9).

Profile Obtained after Chromatographic Fractionation
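Quantification of the eluted peaks can be illustrated with a minimal Python sketch. It assumes an external-standard calibration, which the text does not specify: a standard mixture containing a known amount of each amino acid is run under the same conditions, and each sample peak is compared with the peak of the same amino acid in the standard, because the colour yield with ninhydrin is not identical for all amino acids. The function name `quantify` and all peak areas and amounts are hypothetical.

```python
# Hypothetical external-standard calibration for ninhydrin-detected peaks.
# All peak areas and amounts below are invented for illustration.

def quantify(sample_areas, standard_areas, standard_nmol):
    """Return nmol of each amino acid in the sample.

    Each sample peak is scaled by the response (area per nmol) of the
    same amino acid in a standard run, since the ninhydrin colour yield
    differs from one amino acid to another.
    """
    result = {}
    for aa, area in sample_areas.items():
        response = standard_areas[aa] / standard_nmol[aa]  # area per nmol
        result[aa] = area / response
    return result

standard_nmol = {"Asp": 2.5, "Gly": 2.5, "Pro": 2.5}
standard_areas = {"Asp": 100.0, "Gly": 110.0, "Pro": 25.0}
sample_areas = {"Asp": 80.0, "Gly": 220.0, "Pro": 10.0}

amounts = quantify(sample_areas, standard_areas, standard_nmol)
for aa, nmol in amounts.items():
    print(f"{aa}: {nmol:.2f} nmol")
```

Calibrating each amino acid separately is what makes a single detector usable for all peaks, even those with a weak colour yield.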

b) Gas Phase Chromatography:

This is a more recent method for the complete separation of amino acids. The amino acids are first made volatile by chemical modification of their polar groups: methylation of the amine, alcohol or phenol groups and esterification of the carboxylic groups. They are then separated by passage through a capillary column (2 m long, 0.1 mm in diameter) whose inner wall is coated with a liquid film. The column is heated to temperatures generally between 100 and 200°C.

The amino acid derivatives are partly dissolved in this non-volatile stationary liquid phase and partly vaporized. They are carried along by a carrier gas (mobile phase) at different velocities depending on their respective solubilities in the stationary liquid film. This is therefore gas-liquid partition chromatography. Various commercially available instruments apply this technique and allow very small quantities (of the order of one picomole) to be determined.
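The link between solubility in the stationary film and elution order can be made concrete with the standard chromatographic relations k = K·Vs/Vm and t_R = t_M(1 + k), where K is the partition coefficient (concentration in the liquid phase divided by concentration in the gas phase), Vs/Vm the ratio of stationary to mobile phase volumes, and t_M the time taken by an unretained gas to cross the column. This is a minimal sketch: the derivative names, partition coefficients, phase ratio and dead time are all hypothetical, and in practice K depends strongly on column temperature.

```python
# Sketch: elution order in gas-liquid partition chromatography.
# Partition coefficients, phase ratio and dead time are hypothetical.

def retention_time(K, vs_over_vm, dead_time):
    """t_R = t_M * (1 + k), with retention factor k = K * Vs / Vm.

    K: partition coefficient (concentration in the liquid film divided by
    concentration in the gas phase); a more soluble derivative has a larger K.
    """
    k = K * vs_over_vm
    return dead_time * (1.0 + k)

partition = {"derivative_1": 50.0, "derivative_2": 200.0, "derivative_3": 800.0}
vs_over_vm = 0.005  # stationary / mobile phase volume ratio
t_m = 60.0          # time for an unretained gas to cross the column, s

times = {name: retention_time(K, vs_over_vm, t_m) for name, K in partition.items()}
for name in sorted(times, key=times.get):
    print(f"{name}: {times[name]:.0f} s")
```

The more soluble a derivative is in the liquid film, the larger its K and the later it leaves the column.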

c) Adsorption Chromatography:

This is the most recent method. It is a very high resolution liquid-solid adsorption chromatography called high performance liquid chromatography (HPLC). The amino acids are first converted into aromatic derivatives by reaction with a chemical reagent, e.g. Edman's reagent (phenyl isothiocyanate). They are adsorbed on a column of microgranular silica (spherical grains 5 µm in diameter) whose OH groups are substituted by long hydrocarbon chains of 18 carbon atoms.

This highly hydrophobic phase retains the amino acid derivatives according to the hydrophobicity of their side chain R. The column is eluted with gradients of water- and acetonitrile-based solvents, and detection is carried out by simple UV spectrophotometry. Quantities of the order of one picomole can be detected.

It should be noted that the commercial instruments available for these three methods are now entirely automated, thanks to the systematic introduction of microprocessors and microcomputers that control the different operations, including the calculation of the quantities determined and the printing of results (on a paper printer or magnetic tape).

3. Results:

Results of the analysis of amino acid content may be expressed in several ways:

i. One can calculate percentages by weight (g amino acid/100 g protein), or express the results as the number of residues per 100 g of protein or per 100 amino acid residues;

ii. One can draw up a nitrogen balance, i.e. calculate the quantity of nitrogen contributed by each amino acid and refer it either to 100 g of protein (in which case the sum must equal the nitrogen content of the protein, about 16%) or to 100 g of total nitrogen (in which case the sum must be close to 100 if the different amino acids were determined correctly);

iii. One can also, on the basis of weight percentages, calculate the number of residues of each amino acid per molecule of protein when the molecular weight of the protein has been determined (otherwise, the number of residues can be referred to an arbitrary weight of 10⁴ or 10⁵ g). Rounding the figures to the nearest whole number then gives the gross formula of the protein. For example, the gross formula of sheep α-corticotropin is: Ala 3, Arg 3, Asp 2, Glu 5, Gly 3, His 1, Leu 1, Lys 4, Met 1, Phe 3, Pro 4, Ser 3, Trp 1, Tyr 2, Val 3.
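The three ways of expressing the results are linked by simple arithmetic, sketched here in Python. The sketch assumes average residue masses (rounded values from standard tables), assumes that weight percentages refer to the free amino acids released by hydrolysis, and uses the sheep α-corticotropin gross formula quoted above as a test case. The function names are illustrative, not part of any established method.

```python
# Sketch: weight percentages, nitrogen balance and residues per molecule.
# Average masses in g/mol. Asp and Glu may include asparagine and glutamine
# converted during acid hydrolysis, so the molecular weight is approximate.

WATER = 18.02
N_MASS = 14.007

# Average residue mass (free amino acid minus one water), N atoms per residue
RESIDUES = {
    "Ala": (71.08, 1), "Arg": (156.19, 4), "Asp": (115.09, 1),
    "Glu": (129.12, 1), "Gly": (57.05, 1), "His": (137.14, 3),
    "Leu": (113.16, 1), "Lys": (128.17, 2), "Met": (131.19, 1),
    "Phe": (147.18, 1), "Pro": (97.12, 1), "Ser": (87.08, 1),
    "Trp": (186.21, 2), "Tyr": (163.18, 1), "Val": (99.13, 1),
}

def free_mass(aa):
    return RESIDUES[aa][0] + WATER

def molecular_weight(counts):
    """Sum of residue masses plus one water for the two chain ends."""
    return sum(RESIDUES[aa][0] * n for aa, n in counts.items()) + WATER

def weight_percent(counts):
    """Method i: g of free amino acid released per 100 g of protein."""
    mw = molecular_weight(counts)
    return {aa: 100 * n * free_mass(aa) / mw for aa, n in counts.items()}

def nitrogen_percent(wt_pct):
    """Method ii: g of nitrogen per 100 g of protein, summed over amino acids."""
    return sum(p * RESIDUES[aa][1] * N_MASS / free_mass(aa) for aa, p in wt_pct.items())

def residues_per_molecule(wt_pct, mw):
    """Method iii: residues of each amino acid per molecule, rounded."""
    return {aa: round(p * mw / (100 * free_mass(aa))) for aa, p in wt_pct.items()}

acth = {"Ala": 3, "Arg": 3, "Asp": 2, "Glu": 5, "Gly": 3, "His": 1, "Leu": 1,
        "Lys": 4, "Met": 1, "Phe": 3, "Pro": 4, "Ser": 3, "Trp": 1, "Tyr": 2, "Val": 3}

mw = molecular_weight(acth)
pct = weight_percent(acth)
print(f"MW approx. {mw:.0f}; sum of weight percentages = {sum(pct.values()):.1f}")
print(f"nitrogen content approx. {nitrogen_percent(pct):.1f} %")
print("gross formula recovered:", residues_per_molecule(pct, mw) == acth)
```

Two checks follow from the arithmetic: the weight percentages add up to more than 100 because one water molecule is taken up for each peptide bond hydrolysed, and the nitrogen content comes out at roughly 17%, close to the typical protein value of about 16%.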

II. Study of the Sequence of Amino Acids:

1. Determination of Terminal Amino Acids:

Sequence studies generally begin with the determination of the terminal amino acids, because this determination is often comparatively easy and because the number of end groups gives information on the shape of the polypeptide molecule and on the number of polypeptide chains forming the protein molecule: a cyclic chain has no end group, a semi-cyclic chain has one, a linear chain two, two linear chains four, etc.

A. Determination of Free Terminal α-Amino Groups:

Several methods have been used; only three will be presented here, two chemical and one enzymatic:

a) Sanger’s Dinitrophenyl-Amino Acid Method:

While studying the chemical properties of amino acids, we saw (figure 1-4) that the NH2 group can condense with 1-fluoro-2,4-dinitrobenzene. If the NH2 involved is that of the amino acid at the end of the polypeptide chain called the NH2-terminal or N-terminal end (precisely because its amino acid has a free amino group), a dinitrophenyl-polypeptide is obtained.

Action of 1-Fluoro-2,4-Dinitrobenzene

The linkage thus formed is more stable than the peptide linkage, so that after hydrolysis with HCl we obtain all the free amino acids, with the N-terminal amino acid in the form of a dinitrophenyl-amino acid (DNP-amino acid), easily identified by paper chromatography.

One can also react 1-dimethylaminonaphthalene-5-sulphonyl chloride (dansyl chloride) with the NH2-terminal group to obtain, after hydrolysis, a dansyl-amino acid detectable by its yellow fluorescence. The principle is the same, but this method is 100 times more sensitive than the previous one.

b) Edman’s Phenylthiohydantoin Method:

The principle of this method is illustrated in figure 1-10. A phenylthiohydantoin of the N-terminal amino acid is formed, which can be separated and identified by chromatography.

Unlike the previous method, which requires total hydrolysis of the polypeptide chain, here the chain simply loses its N-terminal residue. In acid medium, the first peptide linkage is weakened by the substitution and splits, liberating a phenylthiohydantoin that can be identified by chromatography; amino acid number 2 is now in the N-terminal position. Another molecule of phenyl isothiocyanate is then added; it condenses with this amino acid and allows it to be detached and identified in turn, and so on. This is called a recurring degradation.

This method can therefore allow the determination of the sequence of several amino acids starting from the N-terminal end; the method has even been automated in a device named “Sequenator”.

Reactions in Edman's Method of Phenylthiohydantoins
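The recurring degradation can be simulated to show why it cannot be continued indefinitely. This minimal Python sketch assumes that in each cycle only a fraction of the chains (the "repetitive yield") loses its N-terminal residue, while the rest lags one cycle behind; lagging chains release earlier residues, which blur the signal at later cycles. The peptide (one-letter codes), the yields and the function names are hypothetical, and other sources of background are ignored.

```python
# Sketch: stepwise Edman degradation with incomplete reaction at each cycle.
# The peptide and the repetitive yields are hypothetical.

def edman(sequence, cycles, repetitive_yield):
    """Return, for each cycle, the PTH-amino acids released (residue -> amount).

    state[j] is the fraction of chains that have already lost j residues.
    In every cycle, a fraction `repetitive_yield` of each population loses
    its current N-terminal residue; the rest lags one cycle behind.
    """
    if cycles > len(sequence):
        raise ValueError("cannot run more cycles than there are residues")
    state = {0: 1.0}
    history = []
    for _ in range(cycles):
        released, new_state = {}, {}
        for j, frac in state.items():
            if j == len(sequence):  # chain fully degraded
                new_state[j] = new_state.get(j, 0.0) + frac
                continue
            cleaved = frac * repetitive_yield
            aa = sequence[j]
            released[aa] = released.get(aa, 0.0) + cleaved
            new_state[j + 1] = new_state.get(j + 1, 0.0) + cleaved
            new_state[j] = new_state.get(j, 0.0) + frac - cleaved
        state = new_state
        history.append(released)
    return history

def call_residues(history):
    """At each cycle, call the PTH-amino acid released in the largest amount."""
    return "".join(max(r, key=r.get) for r in history)

peptide = "AGLKVFSE"
print(call_residues(edman(peptide, 8, 0.95)))  # AGLKVFSE
print(call_residues(edman(peptide, 8, 0.60)))
```

With a high repetitive yield every residue is read correctly; with a low one, lagging chains dominate after only a few cycles and the reading goes wrong.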

c) Enzymatic Degradation by Aminopeptidase:

This enzyme hydrolyses the peptide linkage involving the N-terminal amino acid. After detaching the first amino acid, it liberates the second, which now carries the free NH2, and so on: the sequence of amino acids at this end can therefore be deduced from the rate at which the amino acids are liberated.

It was thought for some time that the enzyme was specific for leucine (hence its name, leucine aminopeptidase), but this is not the case: it also detaches other amino acids in the N-terminal position. However, its action on some amino acids is very slow, which is why this method is not a general approach for studying sequences at the N-terminal end.

B. Determination of Free Terminal α-Carboxylic Groups:

We shall present a chemical and an enzymatic method.

a) Hydrazinolysis:

When a protein is treated with hydrazine at 100°C, all the peptide linkages are split and all the residues are converted into hydrazides, except the COOH-terminal (or C-terminal) residue, which remains as a free amino acid that is easily isolated and identified.

b) Enzymatic Degradation by a Carboxypeptidase:

One of the best methods for determining the C-terminal amino acid consists in treating the protein with a carboxypeptidase, a pancreatic enzyme that hydrolyses the peptide linkage involving the C-terminal amino acid.

Once this residue is detached, the second amino acid, which now carries the free COOH, is released, and so on. Here again we have, in principle, a method of recurring degradation, but in practice one can rarely go beyond a few amino acids because, as with the aminopeptidase, some peptide linkages are hydrolysed much more slowly than others.

Liberation of the C-Terminal Amino Acid
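The effect of a slowly hydrolysed linkage can be made quantitative with a simple kinetic model, sketched here under an assumption not made in the text: each release is treated as a first-order step with its own rate constant, and the residues are removed strictly one after another. The same model applies to the aminopeptidase, reading from the N-terminal end. The rate constants and the function name are hypothetical.

```python
# Sketch: stepwise release of residues by an exopeptidase, modelled as
# sequential first-order reactions. Rate constants are hypothetical.

def released_amounts(rate_constants, t_end, dt=0.001):
    """Amount of each residue released (per chain) after t_end minutes.

    rate_constants[i] is a first-order constant (per minute) for removing
    residue i, counted from the attacked end, once it has become terminal.
    p[j] is the fraction of chains that have lost j residues.
    Integration uses a simple explicit Euler scheme (requires k * dt < 1).
    """
    n = len(rate_constants)
    p = [1.0] + [0.0] * n
    for _ in range(int(round(t_end / dt))):
        flux = [rate_constants[j] * p[j] * dt for j in range(n)]
        for j in range(n):
            p[j] -= flux[j]
            p[j + 1] += flux[j]
    # residue i is free in every chain that has lost more than i residues
    return [sum(p[i + 1:]) for i in range(n)]

# Four residues; the linkage releasing the second residue is cleaved slowly
k = [2.0, 0.05, 3.0, 3.0]
for minutes in (1, 10, 60):
    amounts = released_amounts(k, minutes)
    print(minutes, "min:", " ".join(f"{a:.2f}" for a in amounts))
```

The first residue appears almost at once, but every residue behind the slow linkage must wait for it, so their release curves pile up together and their order can no longer be read from the rate of liberation.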

Obviously, the use of a carboxypeptidase, like that of the aminopeptidase, gives easily interpretable results only if there is a single polypeptide chain; if a protein has several polypeptide chains, the liberated amino acids could originate either from the gradual hydrolysis of a single chain or from the simultaneous attack of several chains.

2. Problem of the Number of Peptide Chains:

If the analysis of the terminal groups shows the presence of several chains, we may have two cases:

i. The chains are identical (there is only one N-terminal amino acid and only one C-terminal amino acid). The sequence is then determined on the entire protein.

ii. The chains are different (there is more than one N-terminal amino acid or more than one C-terminal amino acid). In this case we must first separate the chains from one another and study the sequence of each of them.
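The reasoning on end groups and chain number can be condensed into a small hypothetical Python helper. It assumes that the end-group methods give moles of each terminal amino acid per mole of protein, which requires the molecular weight to be known; the function name and the input values are invented for illustration.

```python
# Hypothetical helper: read end-group analysis (moles of each terminal amino
# acid recovered per mole of protein) as number of chains and homogeneity.

def interpret_end_groups(n_terminal, c_terminal):
    """Return (number_of_linear_chains, chains_possibly_identical).

    Each linear chain has one free alpha-NH2 end, so the total moles of
    N-terminal residues per mole of protein estimates the number of chains
    (cyclic chains or blocked ends would be missed). A single N-terminal and
    a single C-terminal amino acid is consistent with identical chains but
    does not prove it; more than one of either means the chains differ.
    """
    chains = round(sum(n_terminal.values()))
    possibly_identical = len(n_terminal) == 1 and len(c_terminal) == 1
    return chains, possibly_identical

# Hypothetical analytical results, with some experimental scatter
print(interpret_end_groups({"Gly": 1.02, "Phe": 0.97}, {"Asn": 1.0, "Ala": 0.95}))
print(interpret_end_groups({"Val": 1.96}, {"Leu": 2.05}))
```

Rounding absorbs the experimental scatter of the end-group yields; the first case calls for separating two different chains, the second allows the sequence to be studied on the whole protein.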

This separation first requires dissociation of the protein, i.e. breaking the inter-chain linkages. If these linkages are non-covalent (ionic, hydrogen, Van der Waals), they are split by a change of pH or ionic strength, or by agents such as urea, guanidine or detergents. If they are covalent (disulphide bridges between two cysteine residues), they are split chemically, for example by oxidation (R-S-S-R′ → R-SO3H + R′-SO3H).

The dissociated chains are then separated using the general methods for fractionating protein mixtures.

3. Determination of the Sequence of Amino Acids:

Certain methods allow a recurring degradation of polypeptide chains, but so far this type of approach has not taken us very far along the chain. To determine the complete sequence of a chain, we must carry out partial hydrolyses, determine the sequences of the small peptides thus obtained and then, as in a Chinese puzzle, reconstitute the complete sequence by cross-checking.

Let us take the example of a hexapeptide whose gross formula AB2C2D has been determined. Sanger's DNP-amino acid method shows that A is the N-terminal amino acid, and carboxypeptidase detaches B and then C in succession. The formula of the hexapeptide can therefore be written as follows, with the amino acids whose order is still unknown shown within parentheses: A(BCD)CB.

A number of partial hydrolyses are then carried out, either chemical (with reduced acid concentration, temperature or heating time) or enzymatic (using proteolytic enzymes such as pepsin, trypsin and chymotrypsin, which preferentially hydrolyse certain peptide linkages, i.e. linkages involving particular amino acids).

One thus obtains dipeptides and tripeptides, which are fractionated by various techniques (electrophoresis, ion-exchange chromatography, counter-current distribution, etc.). Their sequences are very easily determined and, as can be seen below, by comparing these dipeptides and tripeptides one can easily deduce the sequence of the hexapeptide:
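The cross-checking of the hexapeptide example can be sketched as a brute-force search in Python: enumerate every ordering of the composition AB2C2D, keep those that start with A and end with CB (the DNP and carboxypeptidase results), then keep only those containing every partial-hydrolysis fragment. The fragments used here ("AC", "DB", "BC") are hypothetical, invented for illustration, and the function name is illustrative.

```python
from itertools import permutations

def candidate_sequences(composition, n_terminal, c_terminal_suffix, fragments):
    """Return all sequences consistent with the data, in sorted order.

    composition: every residue listed once per occurrence, e.g. "ABBCCD"
    n_terminal: N-terminal residue found by the DNP method
    c_terminal_suffix: C-terminal residues in chain order (from carboxypeptidase)
    fragments: sequences of small peptides from partial hydrolysis
    """
    found = set()
    for perm in set(permutations(composition)):
        seq = "".join(perm)
        if not seq.startswith(n_terminal):
            continue
        if not seq.endswith(c_terminal_suffix):
            continue
        if all(f in seq for f in fragments):
            found.add(seq)
    return sorted(found)

# End-group data alone leave 6 candidates: A + (order of B, C, D) + CB
print(len(candidate_sequences("ABBCCD", "A", "CB", [])))  # 6
# Hypothetical fragments from partial hydrolysis
print(candidate_sequences("ABBCCD", "A", "CB", ["AC", "DB", "BC"]))  # ['ACDBCB']
```

Brute force is fine for a hexapeptide; for real chains the number of orderings grows far too quickly, which is why overlapping fragments are assembled step by step instead.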

4. Results:

Since Sanger's determination of the complete sequence of insulin, the first polypeptide whose exact structure was established, numerous complete sequences have been determined.

These include other polypeptide hormones (oxytocin, vasopressin, adrenocorticotropic hormone, glucagon, etc.), enzymes (lysozyme, ribonuclease, etc.), the globins, i.e. the protein moieties of various hemoglobins and myoglobins (heteroproteins whose prosthetic group is heme), electron carriers such as cytochrome c (also a heteroprotein), coat proteins of several phages and viruses, etc.

Based on these studies, one may draw some general conclusions:

Amino acids can be linked in any order; there is no incompatibility, so that a very large number of sequences is possible for a polypeptide chain of known amino acid composition. For a tripeptide made of three different amino acids A, B and C, there are already 6 possible sequences: ABC, ACB, BAC, BCA, CAB, CBA. We will see that the sequence of amino acids in proteins is genetically determined.
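The number of possible sequences for a known composition is the multinomial coefficient n!/(n1! n2! ...), where n is the chain length and n1, n2, ... are the numbers of each amino acid. A short Python sketch (the function name is illustrative) reproduces the 6 sequences of the tripeptide and applies the same count to the hexapeptide AB2C2D and to the sheep α-corticotropin composition given earlier.

```python
from collections import Counter
from math import factorial

def number_of_sequences(composition):
    """Distinct orderings of a multiset of residues: n! / (n1! n2! ...).

    composition: a string of residue symbols, or a mapping residue -> count.
    """
    counts = Counter(composition)
    total = factorial(sum(counts.values()))
    for c in counts.values():
        total //= factorial(c)
    return total

print(number_of_sequences("ABC"))     # 6
print(number_of_sequences("ABBCCD"))  # 180

acth = {"Ala": 3, "Arg": 3, "Asp": 2, "Glu": 5, "Gly": 3, "His": 1, "Leu": 1,
        "Lys": 4, "Met": 1, "Phe": 3, "Pro": 4, "Ser": 3, "Trp": 1, "Tyr": 2, "Val": 3}
print(f"{number_of_sequences(acth):.3e}")
```

Even for a 39-residue hormone of fixed composition the count is astronomically large, which underlines why the observed sequence must be specified by a hereditary control rather than arising by chance.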

One always finds, except accidentally (mutation), the same sequence of amino acids in the different molecules of a given protein from a given species; this suggests that during protein biosynthesis the sequence of amino acids is controlled and that this control is hereditary.

On the other hand, some differences appear when one compares the sequences of similar proteins extracted from closely or distantly related species (for example, the cytochromes c or the globins of different species), although these proteins have the same biological activity.

This suggests that some amino acids are indispensable to the physiological role of the protein, while others are not and can therefore be replaced. In certain cases, substituting a single amino acid for another is enough to seriously impair the biological activity of the protein.

Resemblances in the primary structure of similar proteins from different species are not due to chance but generally reflect a single ancestral structure: evolution proceeds through mutations, i.e. changes in the genetic heritage, and the main effect of these mutations is a modification of protein structure. The study of protein sequences is therefore of phylogenetic interest, because it makes it possible to follow the filiation of species in the course of evolution.
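The simplest quantitative comparison behind such phylogenetic studies is to count the positions at which two sequences differ. The Python sketch below assumes the sequences are already aligned to the same length; the three "species" and their sequences are hypothetical. Real comparisons also require alignment and corrections for multiple substitutions at the same position, which are not shown.

```python
from itertools import combinations

def differences(seq1, seq2):
    """Count positions where two aligned, equal-length sequences differ."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    return sum(a != b for a, b in zip(seq1, seq2))

def percent_identity(seq1, seq2):
    return 100.0 * (len(seq1) - differences(seq1, seq2)) / len(seq1)

# Hypothetical aligned sequences of the same protein from three species
species = {
    "species_1": "ACDEFGHIKL",
    "species_2": "ACDEYGHIKL",
    "species_3": "SCDEYGHVKM",
}

for (n1, s1), (n2, s2) in combinations(species.items(), 2):
    print(f"{n1} vs {n2}: {differences(s1, s2)} differences, "
          f"{percent_identity(s1, s2):.0f}% identical")
```

Under the reasoning of the text, species whose proteins differ at fewer positions would be expected to share a more recent common ancestor.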
