Notes on Analysis of Variance

This article provides a study note on analysis of variance.

Variance is a measure of variation among treatments, genotypes, replications, etc. It is obtained by dividing the sum of squares (SS) by the corresponding degrees of freedom (n – 1) to give the mean sum of squares (MSS), i.e., the variance. In statistical analysis, variance is widely used to compare different sample means.
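As a minimal sketch of this definition, the mean sum of squares of a single sample can be computed as below. The function name and the sample values are illustrative assumptions, not data taken from this note.

```python
def mean_sum_of_squares(values):
    """Return (SS, DF, MSS) for one sample: SS about the mean, DF = n - 1."""
    n = len(values)
    mean = sum(values) / n
    ss = sum((x - mean) ** 2 for x in values)  # sum of squared deviations
    df = n - 1
    return ss, df, ss / df

# Hypothetical plot yields, for illustration only.
ss, df, mss = mean_sum_of_squares([5, 6, 5, 7])
print(f"SS = {ss:.2f}, DF = {df}, MSS = {mss:.3f}")
```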

Analysis of Variance from Randomized Block Design (RBD):

Randomization is important in any experiment for obtaining a precise result. Comparing the means of treatments (genotypes, families, replications, etc.) requires certain conditions, such as random allocation of the treatments to the various plots (in field experiments) or divisions.

Thus, for successful comparison of different treatments, the following aspects should be carefully dealt with:

1. Random allocation of treatments;

2. Replication of treatments: it averages out, as far as possible, the effects of environmental differences (including edaphic factors) so that the various treatments get an equal opportunity to demonstrate their performance.

For example, if the yield of 4 genotypes has to be compared in a given field condition, those 4 genotypes have to be randomly allocated to plots within replications. The genotypes are assigned so that each one gets an equal scope for performance. Randomization is the non-deliberate allocation of different treatments, and it minimizes (equalizes) environmental effects.
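The random allocation described above can be sketched as follows. This is only one possible way to draw a layout; the function name, genotype labels and seed are assumptions made for illustration.

```python
import random

def rbd_layout(treatments, n_blocks, seed=None):
    """Return one independently shuffled treatment order per block."""
    rng = random.Random(seed)
    layout = []
    for _ in range(n_blocks):
        order = list(treatments)
        rng.shuffle(order)  # each treatment appears exactly once per block
        layout.append(order)
    return layout

# Hypothetical genotype labels; 4 genotypes in 4 replications.
layout = rbd_layout(["G1", "G2", "G3", "G4"], n_blocks=4, seed=1)
for block_no, order in enumerate(layout, start=1):
    print(f"Replication {block_no}: {order}")
```

Shuffling separately inside each replication keeps every genotype present once per block while leaving its plot position to chance.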

Note 1:

In field experiments:

(a) Plot size should be equal;

(b) Distance between rows and plants in a plot should be uniform.

The layout of the experiment should be prepared beforehand and then implemented.

Example 1:

Four rice varieties were grown in 4 replications in a randomized block design and their yield per plot was assessed. From the yield data, do the mean yields of the varieties differ among themselves?

Solution:

Significance:

No bias in the experiment; a wide range of treatments can be accommodated with no restriction on the number of replications.

Demerits:

Sectoral representation of some varieties/treatments (that is, the treatments are not distributed uniformly) may occur.

Yield of the varieties (kg/plot).

In the present case the null hypothesis is assumed as:

(a) Varietal means are the same, i.e., the varieties do not differ among themselves in yield.

(b) Varietal yield does not differ across replications.

First Step:

Add the row and column totals to find the grand total, T = 115.

CF (Correction factor) = T²/n = (115)²/16 = 826.56.

Second Step:

Grand total sum of squares = 5² + 6² + 5² + 7² + . . . + 9² + 9² + 9² + 8² = 1003.

Third Step:

Total SS = Grand total SS – CF = 1003 – 826.56 = 176.44.

Fourth Step:

SS due to variety (column) = (23² + 11² + 45² + 36²)/4 – CF = 166.19.

The sum of squared totals is divided by 4 because each total is the sum of four plot values.

Fifth Step:

SS due to replication (row) = (27² + 31² + 27² + 30²)/4 – CF = 3.19.

Sixth Step:

SS due to error

= Total SS – (SS of variety + SS of replication)

= 176.44 – (166.19 + 3.19) = 7.06.
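The six steps can be retraced with a short script. It is a sketch that starts from the variety totals, replication totals and raw sum of squares quoted in the steps, because the individual plot values are not reproduced here; the variable names are illustrative.

```python
variety_totals = [23, 11, 45, 36]      # column totals (fourth step)
replication_totals = [27, 31, 27, 30]  # row totals (fifth step)
raw_sum_of_squares = 1003              # sum of squared plot yields (second step)
n_plots = 16
values_per_total = 4                   # each total is the sum of four plots

grand_total = sum(variety_totals)
cf = grand_total ** 2 / n_plots        # correction factor, T^2 / n
total_ss = raw_sum_of_squares - cf
variety_ss = sum(t ** 2 for t in variety_totals) / values_per_total - cf
replication_ss = sum(t ** 2 for t in replication_totals) / values_per_total - cf
error_ss = total_ss - (variety_ss + replication_ss)

for name, value in [("CF", cf), ("Total SS", total_ss),
                    ("Variety SS", variety_ss),
                    ("Replication SS", replication_ss),
                    ("Error SS", error_ss)]:
    print(f"{name}: {value:.2f}")
```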

Results indicated that the varieties differed significantly among themselves at the 0.1% level of significance; however, the yield of the varieties did not vary across replications.
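The mean squares and F ratios behind this result can be sketched from the sums of squares above. The degrees of freedom follow from 4 varieties and 4 replications; the 5% table value of F for (3, 9) DF, 3.86, is taken from standard F tables and is an assumption in the sense that it is not quoted in this note.

```python
# SS values from Example 1 and their degrees of freedom.
ss = {"Variety": 166.19, "Replication": 3.19, "Error": 7.06}
df = {"Variety": 3, "Replication": 3, "Error": 9}  # error DF = 3 x 3

ms = {source: ss[source] / df[source] for source in ss}  # mean squares
f_variety = ms["Variety"] / ms["Error"]
f_replication = ms["Replication"] / ms["Error"]

F_5PCT_3_9 = 3.86  # tabulated F at 5% for (3, 9) DF (standard table value)

print(f"Error MS = {ms['Error']:.2f}")
print(f"F (variety) = {f_variety:.2f}, significant at 5%: {f_variety > F_5PCT_3_9}")
print(f"F (replication) = {f_replication:.2f}, "
      f"significant at 5%: {f_replication > F_5PCT_3_9}")
```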

Critical Difference (CD):

The square root of the error mean square measures the standard error per plot due to uncontrolled environmental effects. Here the error mean square is 7.06/9 = 0.78 (error SS divided by its 9 DF). The varietal means were obtained from four plots (replications); therefore, the standard error of a varietal mean in the present case will be

√(0.78/4) = 0.44.

The standard error of difference of means of two varieties will be:

0.44 × √2 = 0.44 × 1.41 = 0.6204.

From the value of standard error of difference we can calculate the value of the difference which will be just significant at a chosen level of significance. The difference is known as the critical difference for the particular level of significance (generally 5% level of significance is considered adequate).

Hence, CD (5%) = t-value at 5% for 9 DF × standard error of difference

= 2.26 × 0.6204

= 1.4.

That is, if the difference between the means of 2 varieties exceeds 1.4 kg/plot, then it is significant at the 5% level.
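A small sketch of the CD calculation, applied to the four varietal means, is given below. It uses the error mean square (0.78) and t-value (2.26) from the text and derives the varietal means from the variety totals of Example 1. Because the script does not round intermediate values, its standard error of difference differs slightly from 0.6204, but the CD still rounds to 1.4.

```python
import math
from itertools import combinations

error_ms = 0.78    # error mean square (7.06 / 9)
replications = 4
t_5pct_9df = 2.26  # t at 5% for 9 DF, as used in the text

se_mean = math.sqrt(error_ms / replications)  # SE of a varietal mean
se_diff = se_mean * math.sqrt(2)              # SE of a difference of two means
cd = t_5pct_9df * se_diff                     # critical difference at 5%

# Varietal means from the variety totals of Example 1 (kg/plot).
means = {f"V{i}": total / replications
         for i, total in enumerate((23, 11, 45, 36), start=1)}

print(f"CD (5%) = {cd:.2f}")
for a, b in combinations(means, 2):
    diff = abs(means[a] - means[b])
    print(f"{a} vs {b}: difference = {diff:.2f}, significant: {diff > cd}")
```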

From CD-value it is apparent that yield varied significantly between the varieties.

Estimation of Heritability from ANOVA:

Genetic Gain = Genetic Advance:

The genetic gain is the difference between the mean of the progeny of selected individuals (X̅_p) and the base population (X̅₀)

R (genetic gain) = X̅_p – X̅₀.

Genetic gain can also be predicted using the formula R = i h² σ_p, where i is the standardized selection differential (at 5% selection intensity the value generally used is 2.06), h² is heritability, i.e., σ²_g/σ²_p, and σ_p is the phenotypic standard deviation.
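The prediction formula can be illustrated with a hedged sketch. The way the variance components are estimated here is an assumption, since this note does not spell it out: the genotypic variance is taken as (genotype MS – error MS)/r and the phenotypic variance as genotypic variance plus error MS, which is a common plot-basis convention. The mean squares are those derived from Example 1 (166.19/3 and 7.06/9), and both function names are hypothetical.

```python
import math

def heritability_from_anova(ms_genotype, ms_error, replications):
    """Return (h2, sigma_p) under the assumed plot-basis estimators."""
    sigma2_g = (ms_genotype - ms_error) / replications  # genotypic variance
    sigma2_p = sigma2_g + ms_error                      # phenotypic variance
    return sigma2_g / sigma2_p, math.sqrt(sigma2_p)

def genetic_gain(i, h2, sigma_p):
    """Predicted genetic gain R = i * h2 * sigma_p."""
    return i * h2 * sigma_p

h2, sigma_p = heritability_from_anova(ms_genotype=55.40, ms_error=0.78,
                                      replications=4)
gain = genetic_gain(i=2.06, h2=h2, sigma_p=sigma_p)
print(f"h2 = {h2:.3f}, sigma_p = {sigma_p:.2f}, R = {gain:.2f}")
```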

RBD (ANOVA Test) with Unequal Replications:

Example 2:

The mean heights of 4 lines of rice are to be compared. Each line was sown in the field as a single row, and the number of plants differed among the lines. Plant heights are in cm.

Conclusion:

Since the observed value of F (5.631) is greater than the table value of F at the 1% level for df n₁ = 3 and n₂ = 42, the test is highly significant, indicating that there are significant differences in plant height between the lines.

Treatment means, i.e., line means:

Line 1 = 199/16 = 12.44 cm;

Line 2 = 94/11 = 8.55 cm;

Line 3 = 74/8 = 9.25 cm;

Line 4 = 151/11 = 13.73 cm.

From the above mean heights of the four lines it is found that the mean value of line 2 is lowest and that of line 4 is highest.
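With unequal plant numbers, each line total is squared and divided by its own number of plants. The sketch below recomputes the line means from the totals and counts listed above and shows how the between-line sum of squares would be formed. The within-line sum of squares needs the individual plant heights, which are not reproduced here, so it is not computed.

```python
line_totals = {"Line 1": 199, "Line 2": 94, "Line 3": 74, "Line 4": 151}
plant_counts = {"Line 1": 16, "Line 2": 11, "Line 3": 8, "Line 4": 11}

means = {line: line_totals[line] / plant_counts[line] for line in line_totals}

grand_total = sum(line_totals.values())
n_total = sum(plant_counts.values())
cf = grand_total ** 2 / n_total  # correction factor

# Each squared total is divided by its own number of observations.
between_ss = sum(line_totals[line] ** 2 / plant_counts[line]
                 for line in line_totals) - cf
df_between = len(line_totals) - 1
df_within = n_total - len(line_totals)

for line, mean in means.items():
    print(f"{line}: mean height = {mean:.2f} cm")
print(f"Between-line SS = {between_ss:.2f} on {df_between} DF; "
      f"within-line DF = {df_within}")
```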

Significance of Difference of Means of Line 2 and Line 4:

Conclusion:

The observed value of t, i.e., 5.128, is greater than the table value of t at the 1% level for 20 DF (n₁ + n₂ – 2 = 11 + 11 – 2 = 20). Therefore, the observed t is highly significant, which indicates that the difference in plant height between Line 2 and Line 4 is highly significant.
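A two-sample t-test with n₁ + n₂ – 2 degrees of freedom is normally based on a pooled variance. The following sketch assumes that form of the test; the function name and the plant heights are hypothetical, since the individual heights of Line 2 and Line 4 are not reproduced here.

```python
import math

def pooled_t(sample_1, sample_2):
    """Two-sample t with pooled variance; returns (t, degrees of freedom)."""
    n1, n2 = len(sample_1), len(sample_2)
    mean_1, mean_2 = sum(sample_1) / n1, sum(sample_2) / n2
    ss_1 = sum((x - mean_1) ** 2 for x in sample_1)
    ss_2 = sum((x - mean_2) ** 2 for x in sample_2)
    df = n1 + n2 - 2
    pooled_var = (ss_1 + ss_2) / df
    se_diff = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean_1 - mean_2) / se_diff, df

# Hypothetical plant heights (cm), for illustration only.
t_value, df = pooled_t([13, 15, 12, 14], [8, 9, 7, 10])
print(f"t = {t_value:.3f} on {df} DF")
```

The absolute value of t is then compared with the table value for the same degrees of freedom.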

Analysis of Variance using Latin Square Design:

Here the treatments are arranged so that each treatment occurs once in each row and once in each column. The number of treatments must be the same as the number of replications.

For example, four genotypes of wheat (A, B, C and D) in four replications give a 4 × 4 arrangement.

As randomization is unbiased allocation, it may so happen that varieties, families, treatments, etc., have sectorial representation in the field, which may be a hindrance to the minimization of error (soil fertility). However, a Latin square design can be accommodated only in a square or rectangular field.
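One simple way to draw such an arrangement is sketched below: start from a cyclic square and permute its rows and columns at random. This is an illustrative method and does not sample every possible Latin square; the function name and seed are assumptions.

```python
import random

def latin_square(treatments, seed=None):
    """Return a k x k layout with each treatment once per row and column."""
    rng = random.Random(seed)
    k = len(treatments)
    # Cyclic base square: row r is the treatment list shifted by r places.
    square = [[treatments[(r + c) % k] for c in range(k)] for r in range(k)]
    rng.shuffle(square)          # permute rows
    columns = list(range(k))
    rng.shuffle(columns)         # permute columns
    return [[row[c] for c in columns] for row in square]

for row in latin_square(["A", "B", "C", "D"], seed=3):
    print(" ".join(row))
```

Permuting whole rows and whole columns preserves the once-per-row, once-per-column property of the base square.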

Example 3:

Seeds of 4 wheat genotypes (A, B, C and D) were given mutagenic treatments of ethyl methane sulphonate for 3 h at 0.25%, 0.50% and 1.0% (treatments: 0 (control), 0.25%, 0.5% and 1.0%) and grown in 4 replications in field plots. On harvest, yield was assessed per plot (kg/plot).

Solution:

Test:

1. Whether the yield of the varieties differed.

2. Whether varietal yield differed in relation to replication.

3. Whether varietal yield differed in relation to treatments.

Given data in tabular form:

Solution:

Null hypothesis: the mean yield of the genotypes is the same

(a) among themselves,

(b) among replications, and

(c) among treatments.

(a) Varieties (Genotypes) in relation to replication:

(b) Genotypes in relation to treatments:

Inference:

1. Mean yield of the genotypes did not differ significantly among themselves.

2. Mean yield of the genotypes did not vary across replications.

3. Mean yield of the genotypes varied significantly across treatments.
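Since the data table is not reproduced here, the partition of the sum of squares in a Latin square can only be sketched with made-up numbers. In the sketch below the function name, the layout and the yields are all hypothetical; rows, columns and letters stand for the three sources of variation being tested.

```python
def latin_square_anova(yields, layout):
    """Return {source: (SS, DF)} for a k x k Latin square."""
    k = len(yields)
    grand = sum(sum(row) for row in yields)
    cf = grand ** 2 / (k * k)  # correction factor
    total_ss = sum(y ** 2 for row in yields for y in row) - cf
    row_ss = sum(sum(row) ** 2 for row in yields) / k - cf
    col_ss = sum(sum(col) ** 2 for col in zip(*yields)) / k - cf
    letter_totals = {}
    for y_row, l_row in zip(yields, layout):
        for y, letter in zip(y_row, l_row):
            letter_totals[letter] = letter_totals.get(letter, 0) + y
    letter_ss = sum(t ** 2 for t in letter_totals.values()) / k - cf
    error_ss = total_ss - row_ss - col_ss - letter_ss
    return {
        "Rows": (row_ss, k - 1),
        "Columns": (col_ss, k - 1),
        "Letters": (letter_ss, k - 1),
        "Error": (error_ss, (k - 1) * (k - 2)),
        "Total": (total_ss, k * k - 1),
    }

# Hypothetical layout and yields (kg/plot), for illustration only.
layout = [list("ABCD"), list("BCDA"), list("CDAB"), list("DABC")]
yields = [[5, 7, 6, 9], [6, 5, 8, 4], [7, 9, 5, 6], [8, 4, 7, 7]]
for source, (ss, df) in latin_square_anova(yields, layout).items():
    print(f"{source}: SS = {ss:.2f}, DF = {df}")
```

Each mean square is its SS divided by its DF, and each F ratio is the mean square of a source divided by the error mean square.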

Analysis of Variance from Split Plot Design:

In a split plot design the field is divided into identical blocks, and such blocks are considered similar to replications. Each block is divided into main plots, to which the levels of the first factor (treatment) are assigned at random; subsequently each main plot is subdivided into sub-plots, which are allotted randomly to the levels of the second factor. It is the breeder's choice which treatment to consider as the first or the second factor.

The significance of this layout is that it gives effective control over treatments and minimizes error. Further, the interaction between the two factors can also be analyzed.

Thus, in the design the field is:

1. Divided into blocks = replications.

2. Each block is divided into main plots, where the main treatment will be assigned.

3. Each main plot is sub-divided in sub-plots = sub-treatment.

4. Size of the main plots and sub-plots should be identical.

5. Random distribution of the treatments into plots.

Layout of the Experiment with a Specific Example:

Treatments:

1. Date of sowing:

2. Methods of sowing:

(a) m₁—Broadcasting

(b) m₂—Drilling

Blocks = Replications = 3.

Diagram showing how field is divided in split plot design.

Example 4:

A split plot design experiment was conducted to evaluate the differences in seed yield (g) in relation to sowing dates (A: 15th November; B: 1st December; C: 15th December; D: 1st January and E: 15th January) in 8 genotypes of Nigella sativa (control and 7 mutant lines). The experiment was conducted in 3 blocks.

Solution:

Main treatment = Sowing dates = Main plots;

Sub-treatment = Genotypes = Sub-plots.
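The two-stage randomization for this example can be sketched as follows. The genotype labels (control, M1 to M7), the function name and the seed are assumptions for illustration; only the numbers of sowing dates, genotypes and blocks come from the example.

```python
import random

def split_plot_layout(main_levels, sub_levels, n_blocks, seed=None):
    """Randomize main plots within blocks, then sub-plots within main plots."""
    rng = random.Random(seed)
    layout = []
    for _ in range(n_blocks):
        mains = list(main_levels)
        rng.shuffle(mains)        # main treatments randomized within the block
        block = []
        for main in mains:
            subs = list(sub_levels)
            rng.shuffle(subs)     # sub-treatments randomized within the main plot
            block.append((main, subs))
        layout.append(block)
    return layout

sowing_dates = ["A", "B", "C", "D", "E"]
genotypes = ["control"] + [f"M{i}" for i in range(1, 8)]  # hypothetical labels
layout = split_plot_layout(sowing_dates, genotypes, n_blocks=3, seed=7)
for block_no, block in enumerate(layout, start=1):
    print(f"Block {block_no}")
    for date, subs in block:
        print(f"  main plot {date}: {subs}")
```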

Results have been tabulated.

Inference:

It was observed that both treatments and their interaction were significant at the 0.1% level. It was also noted that the differences between genotypes were of a much smaller order than those between dates of planting.

The estimates of the two error variances were 0.52 [Error a] and 0.192 [Error b], thus confirming the expectation that the main plot error is likely to be higher than the sub-plot error. Further, it may also be concluded that sowing period A (15th November) gives maximum yield irrespective of genotype.
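The degrees of freedom attached to the two error terms follow from the structure of the design. The sketch below applies the usual split plot partition to Example 4 (3 blocks, 5 sowing dates, 8 genotypes); the function name is hypothetical, and the full ANOVA table itself is not reproduced here.

```python
def split_plot_df(blocks, main_levels, sub_levels):
    """Degrees of freedom for each source in a split plot ANOVA."""
    r, a, b = blocks, main_levels, sub_levels
    return {
        "Blocks": r - 1,
        "Main treatment": a - 1,
        "Error a": (r - 1) * (a - 1),
        "Sub-treatment": b - 1,
        "Interaction": (a - 1) * (b - 1),
        "Error b": a * (r - 1) * (b - 1),
    }

df = split_plot_df(blocks=3, main_levels=5, sub_levels=8)
for source, value in df.items():
    print(f"{source}: {value}")
print(f"Total: {sum(df.values())}")

# Ratio of the two error variances reported in the inference.
print(f"Error a / Error b = {0.52 / 0.192:.2f}")
```

Error a is used to test the main treatment (sowing dates), while Error b is used to test the sub-treatment (genotypes) and the interaction.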
