The following points highlight the four main steps followed in classifying organisms by numerical taxonomy. The steps are: 1. Collection of Data, 2. Coding of the Data, 3. Determination of Similarity and 4. Determination of Taxonomic Relations between OTUs (Operational Taxonomic Units).
Step # 1. Collection of Data:
The organisms to be classified must be chosen so that they give a fair representation of the group, and they must include the type specimens. The cultures must be recently isolated from natural habitats and should come from different parts of the world. Each culture is treated as an Operational Taxonomic Unit (OTU) and given an index number.
Next, each OTU is examined for a large number of phenotypic characters: no fewer than fifty, and preferably several hundred. The characters should cover a broad range, such as morphological, biochemical, physiological, serological and molecular biological features.
In numerical taxonomy, a character (or trait) is defined as any property that can vary among the selected OTUs. The value a character can take is called a character state; for example, the average width of a bacterial cell is a character, and the measured width is its character state.
The chosen characters of each OTU are examined carefully, with sufficient replication. The average difference between replicates should not exceed 5%. Because the quality of the data is of great importance in numerical analysis, any character that varies greatly between replicates is unsuitable and should be rejected.
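The text does not say exactly how the 5% limit is measured. The minimal sketch below assumes binary (+/-) test results and reads the rule as: for each character, the share of OTUs whose replicate runs disagree must not exceed 5%. That reading, the function names and the character names are all assumptions for illustration.

```python
# Sketch of a replicate check, ASSUMING binary (+/-) results and reading the
# 5% rule as "percent of OTUs whose replicate runs disagree" (an assumption).

def disagreement_percent(replicates):
    """Percent of OTUs whose result is not identical across all replicate runs.

    `replicates` is a list of runs; each run is a list of booleans,
    one per OTU (True = positive result).
    """
    n_otus = len(replicates[0])
    inconsistent = sum(
        1 for i in range(n_otus)
        if len({run[i] for run in replicates}) > 1
    )
    return 100.0 * inconsistent / n_otus


def reliable_characters(data, max_percent=5.0):
    """Return the names of characters whose replicate disagreement <= max_percent."""
    return [name for name, reps in data.items()
            if disagreement_percent(reps) <= max_percent]


# Hypothetical data: 20 OTUs, two replicate runs per character.
run = [True] * 10 + [False] * 10
data = {
    "motility": [run, list(run)],                          # 0 of 20 differ
    "gelatin_hydrolysis": [run, [not run[0]] + run[1:]],   # 1 of 20 differ (5%)
    "indole": [run, [not v for v in run[:3]] + run[3:]],   # 3 of 20 differ (15%)
}
print(reliable_characters(data))  # ['motility', 'gelatin_hydrolysis']
```

Under this reading, a character that is exactly at 5% is kept and anything above it is rejected.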
Step # 2. Coding of the Data:
Each chosen character is considered a unit trait. The data collected on these characters are next coded for numerical analysis. The usual way of coding a character is in positive (+) or negative (-) form; sometimes, instead of + and -, the presence of a character in an OTU is denoted by 1 and its absence by 0.
For example, a motile OTU can be coded as either + or 1, and a non-motile one as either - or 0. All the chosen characters of all OTUs must be coded with one of these symbol pairs (+/- or 1/0), without any omissions. The coded data are then arranged in a t x n table, where t denotes the number of OTUs and n the number of unit characters.
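A minimal sketch of the coding step, assuming the raw observations are stored per OTU as "+" or "-" strings. The three OTUs and the character names are hypothetical; the result is a t x n table of 1/0 values with one row per OTU and one column per unit character.

```python
# Hypothetical observations for three OTUs and three characters.
raw = {
    "t1": {"motility": "+", "gram_positive": "-", "catalase": "+"},
    "t2": {"motility": "-", "gram_positive": "-", "catalase": "+"},
    "t3": {"motility": "+", "gram_positive": "+", "catalase": "-"},
}
characters = ["motility", "gram_positive", "catalase"]

CODE = {"+": 1, "-": 0}  # presence -> 1, absence -> 0


def build_table(raw, characters):
    """Return (otu_names, rows): one row of 1/0 codes per OTU, one column per character."""
    otu_names = list(raw)  # keep the OTUs in their index order
    rows = []
    for otu in otu_names:
        # A missing character raises KeyError, flagging a forbidden omission.
        rows.append([CODE[raw[otu][ch]] for ch in characters])
    return otu_names, rows


names, table = build_table(raw, characters)
for name, row in zip(names, table):
    print(name, row)
```

Raising an error on a missing entry, rather than silently filling it, mirrors the requirement that every character of every OTU be coded.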
A hypothetical t x n table is shown in Table 3.5.
Here, t1 to t10 represent 10 OTUs and n1 to n10 the taxonomic characters. The presence or absence of each unit character in each OTU is denoted by + and – respectively. In actual practice, the numbers of both OTUs and characters would be much larger.
Step # 3. Determination of Similarity:
The character data in the t x n table are next analyzed to determine the similarity between the OTUs. The simplest method is to count, for each pair of OTUs, the number of characters that are identical in both (++ or – –) and the number that differ (+ – or – +).
Matches and mismatches are thus counted for every pair of OTUs: t1/t2, t1/t3 up to t1/t10, then t2/t3, t2/t4 up to t2/t10, and so on. The similarity of each pair is then calculated as the number of matching characters divided by the total number of characters studied, and expressed as the simple matching coefficient (SSM). The coefficient is often expressed as percent similarity by multiplying it by 100.
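As a minimal sketch of this pairwise counting, the Python below visits every pair of OTUs in the order t1/t2, t1/t3, ..., t2/t3, ... using `itertools.combinations`, and records matches, mismatches and percent similarity. The three coded OTUs and their four characters are hypothetical, not the data of the document's tables.

```python
from itertools import combinations


def compare_pairs(names, table):
    """For every pair of OTUs, return (matches, mismatches, percent similarity)."""
    results = {}
    for i, j in combinations(range(len(table)), 2):
        matches = sum(1 for p, q in zip(table[i], table[j]) if p == q)  # ++ or --
        mismatches = len(table[i]) - matches                            # +- or -+
        percent = 100.0 * matches / len(table[i])
        results[(names[i], names[j])] = (matches, mismatches, percent)
    return results


# Hypothetical coded data: 3 OTUs x 4 characters (1 = +, 0 = -).
names = ["t1", "t2", "t3"]
table = [[1, 1, 0, 0],
         [1, 0, 0, 0],
         [0, 1, 1, 1]]
for pair, (m, mm, pct) in compare_pairs(names, table).items():
    print(pair, m, mm, f"{pct:.0f}%")
```

For t OTUs this makes t(t - 1)/2 comparisons, so ten OTUs give 45 pairs.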
The simple matching coefficient is calculated as follows:
$$S_{SM} = \frac{a + d}{a + b + c + d}$$
where ‘a’ denotes positive matches (++) between two OTUs, ‘d’ negative matches (- -), ‘b’ mismatches (+ -) and ‘c’ mismatches (- +); a + b + c + d is the total number of characters studied. SSM is therefore a fraction between 0 and 1, which on multiplication by 100 gives the percent similarity between any pair of OTUs.
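A short sketch of the formula, assuming OTUs are coded as rows of 1/0 values. It counts the four categories a, b, c and d explicitly and then applies S_SM = (a + d)/(a + b + c + d). The two OTUs are hypothetical.

```python
def counts(x, y):
    """Return (a, b, c, d) for two coded OTU rows of 1/0 values."""
    pairs = list(zip(x, y))
    a = pairs.count((1, 1))  # positive matches (++)
    b = pairs.count((1, 0))  # mismatches (+ -)
    c = pairs.count((0, 1))  # mismatches (- +)
    d = pairs.count((0, 0))  # negative matches (- -)
    return a, b, c, d


def simple_matching(x, y):
    """S_SM = (a + d) / (a + b + c + d)."""
    a, b, c, d = counts(x, y)
    return (a + d) / (a + b + c + d)


# Two hypothetical OTUs scored for 10 characters.
x = [1, 1, 1, 0, 0, 0, 1, 0, 1, 1]
y = [1, 1, 0, 0, 0, 1, 1, 0, 1, 0]
print(counts(x, y))                           # (4, 2, 1, 3)
print(f"{100 * simple_matching(x, y):.0f}%")  # 70%
```

Here a = 4 and d = 3, so S_SM = 7/10, i.e. 70% similarity.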
One possible alternative way to determine similarity is to calculate Jaccard's coefficient (SJ). Jaccard's coefficient does not take negative matches (‘d’ of SSM) into account:
$$S_{J} = \frac{a}{a + b + c}$$
where a, b and c have the same meaning as in the calculation of SSM. SJ can also be expressed as a percentage by multiplying it by 100.
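The difference between the two coefficients shows up when two OTUs are both negative for most characters. The sketch below, again assuming 1/0 coding and hypothetical data, computes both. Treating the case a + b + c = 0 as undefined is an assumption of this sketch, since the formula would divide by zero.

```python
def counts(x, y):
    """Return (a, b, c, d) for two coded OTU rows of 1/0 values."""
    pairs = list(zip(x, y))
    return (pairs.count((1, 1)), pairs.count((1, 0)),
            pairs.count((0, 1)), pairs.count((0, 0)))


def simple_matching(x, y):
    """S_SM = (a + d) / (a + b + c + d); shared absences count as matches."""
    a, b, c, d = counts(x, y)
    return (a + d) / (a + b + c + d)


def jaccard(x, y):
    """S_J = a / (a + b + c); negative matches (d) are ignored."""
    a, b, c, _ = counts(x, y)
    if a + b + c == 0:
        # Both OTUs are negative for every character: S_J would be 0/0.
        raise ValueError("Jaccard coefficient undefined: no positive characters")
    return a / (a + b + c)


# Hypothetical OTUs that are both negative for 7 of 10 characters.
x = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
y = [1, 0, 1, 0, 0, 0, 0, 0, 0, 0]
print(f"S_SM = {simple_matching(x, y):.2f}")  # 0.80
print(f"S_J  = {jaccard(x, y):.2f}")          # 0.33
```

The seven shared absences raise S_SM to 0.80, while S_J, which ignores them, is only 1/3.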
The matching coefficient values are next tabulated to give a similarity matrix, or S-matrix. This square table is symmetrical about its diagonal, i.e. the lower left triangle is identical to the upper right triangle, so only the lower left triangle is filled in, as shown in Table 3.7.
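Because the S-matrix is symmetrical, only half of it needs computing. A minimal sketch, assuming 1/0-coded hypothetical OTUs: it fills the lower triangle with SSM x 100, mirrors each value into the upper triangle, and prints only the lower triangle in the layout described above.

```python
def ssm_percent(x, y):
    """Simple matching coefficient x 100 for two coded OTU rows."""
    matches = sum(1 for p, q in zip(x, y) if p == q)
    return 100.0 * matches / len(x)


def s_matrix(table):
    """Full symmetric S-matrix; entries with i >= j are computed, then mirrored."""
    t = len(table)
    S = [[0.0] * t for _ in range(t)]
    for i in range(t):
        for j in range(i + 1):
            S[i][j] = S[j][i] = ssm_percent(table[i], table[j])
    return S


def print_lower_triangle(names, S):
    """Print only the lower-left triangle, with OTU names along both edges."""
    for i, name in enumerate(names):
        cells = "".join(f"{S[i][j]:6.0f}" for j in range(i + 1))
        print(f"{name:<4}{cells}")
    print(" " * 4 + "".join(f"{n:>6}" for n in names))


# Hypothetical coded data: 3 OTUs x 4 characters.
names = ["t1", "t2", "t3"]
table = [[1, 1, 0, 0],
         [1, 0, 0, 0],
         [0, 1, 1, 1]]
S = s_matrix(table)
print_lower_triangle(names, S)
```

Every diagonal entry is 100, since each OTU matches itself on all characters.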
The figures entered in this table are SSM x 100 and are based on the hypothetical t x n table shown in Table 3.6.
Step # 4. Determination of Taxonomic Relations between OTUs:
The S-matrix by itself does not reveal the taxonomic relations between the OTUs. To make these relations evident, the matrix must be rearranged so that the OTUs with the highest similarity are brought next to each other, forming clusters.
For this purpose, the table is searched for the highest similarity values between pairs or groups of OTUs. These form the nucleus (or nuclei) of one or more clusters. The search continues with the next highest similarity values, and the corresponding OTUs are added to the cluster nuclei, so that the clusters grow.
The search proceeds through gradually decreasing similarity values until every OTU has joined a cluster. Ultimately, there may be more than one cluster in the group under study. Such clusters of highly similar OTUs are known as phenons in numerical taxonomy, and the procedure for identifying the phenons in a group is commonly known as cluster analysis.
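The search described above does not name a rule for comparing a growing cluster with the remaining OTUs. The sketch below assumes single linkage, where the similarity between two clusters is that of their most similar pair of members; this is one common reading, and average linkage (UPGMA) is another widely used choice that can give different merge levels. The S-matrix values are hypothetical.

```python
def single_linkage(names, S):
    """Agglomerative clustering on a similarity matrix (higher = more similar).

    Returns the merges in the order they happen (highest similarity first),
    each as (members_of_one_cluster, members_of_other_cluster, similarity).
    """
    index = {name: k for k, name in enumerate(names)}
    clusters = [frozenset([name]) for name in names]

    def link(c1, c2):
        # Single linkage: the similarity of the most similar pair of members.
        return max(S[index[p]][index[q]] for p in c1 for q in c2)

    merges = []
    while len(clusters) > 1:
        # Find the pair of current clusters with the highest similarity.
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                s = link(clusters[i], clusters[j])
                if best is None or s > best[0]:
                    best = (s, i, j)
        s, i, j = best
        merges.append((sorted(clusters[i]), sorted(clusters[j]), s))
        merged = clusters[i] | clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return merges


# Hypothetical S-matrix (percent similarity) for four OTUs.
names = ["t1", "t2", "t3", "t4"]
S = [[100, 90, 60, 85],
     [90, 100, 65, 80],
     [60, 65, 100, 70],
     [85, 80, 70, 100]]
for a, b, s in single_linkage(names, S):
    print(f"{a} + {b} join at {s}%")
```

The pair t1/t2 (90%) forms the first nucleus, t4 joins it at 85%, and t3 joins last at 70%. These merge levels are where the branches of a dendrogram would meet.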
The similarity values of the other OTUs were calculated in the same way.
The results of cluster analysis are usually represented as a tree-like diagram called a dendrogram. Generally, the dendrogram is drawn horizontally, and vertical lines drawn across it mark similarity levels that increase from left to right. The branches of the tree join at different similarity levels, and ultimately all the branches join to form the stem of the dendrogram.
A dendrogram representing the phenetic relations of the ten hypothetical OTUs is shown in Fig. 3.7. The similarity level at which a cluster or phenon is treated as a genus or a species of traditional taxonomy is a matter of subjective judgement. Generally, however, OTUs with about 80% similarity are taken as members of the same species.
In the above example, all OTUs except t3 and t10 are linked at 80% or higher similarity and therefore belong to the same species of conventional taxonomy. OTUs t3 and t10 are linked only at the 70% similarity level between them, so these two OTUs do not belong to the same species.
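Applying a species-level cut such as 80% can be sketched as follows, assuming single linkage. Under that assumption, cutting the dendrogram at a given level groups together all OTUs connected by a chain of pairwise similarities at or above that level. The function name and the four-OTU S-matrix are hypothetical and are not the values behind Fig. 3.7.

```python
def phenons(names, S, level):
    """Group OTUs linked, directly or through a chain, at similarity >= level."""
    unassigned = list(range(len(names)))
    groups = []
    while unassigned:
        stack = [unassigned.pop(0)]
        group = []
        while stack:
            i = stack.pop()
            group.append(i)
            # OTUs not yet grouped that are at least `level` similar to OTU i.
            linked = [j for j in unassigned if S[i][j] >= level]
            for j in linked:
                unassigned.remove(j)
            stack.extend(linked)
        groups.append(sorted(names[k] for k in group))
    return groups


# Hypothetical S-matrix (percent similarity) for four OTUs.
names = ["t1", "t2", "t3", "t4"]
S = [[100, 90, 60, 85],
     [90, 100, 65, 80],
     [60, 65, 100, 70],
     [85, 80, 70, 100]]
print(phenons(names, S, 80))  # [['t1', 't2', 't4'], ['t3']]
print(phenons(names, S, 70))  # [['t1', 't2', 't3', 't4']]
```

Lowering the cut merges phenons, which is why the level chosen for a species or genus directly determines how many groups are recognised.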