After reading this article you will learn about the meaning, salient features and applications of the χ²-test, and about its use in testing goodness of fit, linkage, independence and heterogeneity, and in fitting binomial and Poisson distributions.
Contents:
- Meaning of χ²-Test
- Salient Features of χ²-Test
- Applications of χ²-Test
- Testing Goodness of Fit Using the χ²-Test
- Test of Linkage Using the χ²-Test
- Use of χ²-Test in Binomial Distribution
- χ²-Test of Independence
- χ²-Test of Heterogeneity
- Use of χ²-Test in Poisson Distribution
1. Meaning of χ²-Test:
The χ²-test is used to test the agreement between observed frequencies and the frequencies expected under a given hypothesis. In other words, it is a test of the deviation between observed and theoretical frequencies, used to decide whether that deviation is significant. The statistic is χ² = Σ (O − E)²/E, where O and E are the observed and expected frequencies and the sum runs over all classes. If the deviation is significant, the hypothesis on which the test is based is rejected.
2. Salient Features of χ²-Test:
1. Complete agreement between observed and expected frequencies gives a χ²-value of zero; chance deviations arising from sampling fluctuations give positive values.
2. The χ²-test requires that the expected frequency in every class be more than 5.
3. The test is applicable only to comparisons of observed and expected values expressed as absolute frequencies (counts), not as percentages or proportions.
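The first two features can be checked with a minimal Python sketch of the statistic χ² = Σ (O − E)²/E. The helper names `chi_square` and `small_classes` are illustrative, and the counts are hypothetical.

```python
def chi_square(observed, expected):
    """Pearson chi-square: sum of (O - E)^2 / E over all classes."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same number of classes")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def small_classes(expected, minimum=5):
    """Indices of classes whose expected frequency is `minimum` or less."""
    return [i for i, e in enumerate(expected) if e <= minimum]


# Feature 1: perfect agreement gives zero, any deviation gives a positive value.
print(chi_square([30, 30], [30, 30]))        # 0.0
print(chi_square([32, 28], [30, 30]) > 0)    # True

# Feature 2: flag classes that are too small for the test (hypothetical values).
print(small_classes([12.0, 4.0]))            # [1]
```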
3. Applications of χ²-Test:
1. Testing goodness of fit.
2. Determining genetic ratios (and hence the number of genes involved in a particular trait) from F2 or BC1 segregating data.
3. Estimating linkage.
4. Testing independence of classifications.
5. Testing heterogeneity of data.
4. Testing Goodness of Fit Using the χ²-Test:
Example 1:
A chlorophyll mutant (viridis) was spotted in the M2 generation of black cumin (Nigella sativa L.) following treatment of dry seeds with 5 kR gamma-rays. The mutant was crossed with pollen from normal plants and F1 progenies were raised. All F1s were normal, and on selfing, 72 normal and 8 viridis plants were obtained in the F2 segregating population.
Question:
State the nature of inheritance of the mutant trait.
Solution:
(a) First determine the number of phenotypic classes. In the present case there are 2 classes, normal and mutant, so the hypothesis must be chosen from the two-class genetic ratios 1 : 1, 3 : 1, 9 : 7, 13 : 3 and 15 : 1.
(b) Add up the total observations and select the ratio whose expected values are closest to the observed values (as goodness of fit is to be tested by the χ²-test).
(c) As the trait concerned is qualitative, the χ²-test of goodness of fit is applied to ascertain the inheritance of the mutant trait.
(d) The hypothesis considered in the present case is 15 : 1, and it has to be seen whether 15 : 1 is a good fit. Out of 80 plants, the expected frequencies are 80 × 15/16 = 75 normal and 80 × 1/16 = 5 viridis, so χ² = (72 − 75)²/75 + (8 − 5)²/5 = 0.12 + 1.80 = 1.92.
As there are two phenotypic classes, the degrees of freedom are n − 1 = 1 (n = number of classes; the degrees of freedom are the number of classes whose frequencies can vary freely once the total is fixed).
Comment: The χ²-value for 1 DF is 1.92, which is less than the table value of 3.84 at the 5% level (the standard check) for 1 DF. The deviation between observed and expected values is therefore not significant, and the fit is good at a probability level of 0.10 to 0.20 (from the χ²-table). Thus, the assumed hypothesis (15 : 1) is accepted.
Inference:
1. The 15 : 1 genetic ratio indicates that the inheritance of the chlorophyll trait in the present case is digenic (15 : 1 is a modification of Mendel's dihybrid ratio), that is, 2 genes (two pairs of alleles) are involved in the trait.
2. As the mutant crossed with normal gave normal F1 plants, and in F2 the frequency of mutants was much lower than that of normal plants, it may be concluded that the mutation was recessive.
Thus, the mutant trait is digenic and recessive to normal.
Note: Why is it (O − E)²/E?
1. The deviation is squared to eliminate negative signs.
2. As it is the deviation from expectation that is being tested, the squared deviation is divided by the expected frequency.
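The arithmetic of Example 1 can be reproduced with the short Python sketch below. The helper names are illustrative. The p-value uses the standard identity that, for 1 DF, the upper-tail χ² probability equals erfc(√(χ²/2)), so only the standard library is needed; it gives p ≈ 0.17, inside the 0.10 to 0.20 range read from the table. The value 3.841 is the usual 5% table value for 1 DF.

```python
import math


def chi_square(observed, expected):
    """Pearson chi-square statistic: sum of (O - E)^2 / E over all classes."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def expected_from_ratio(total, ratio):
    """Split `total` into expected class frequencies in proportion to `ratio`."""
    parts = sum(ratio)
    return [total * r / parts for r in ratio]


def p_value_1df(chi2):
    """Upper-tail probability of a chi-square value with 1 DF.

    With 1 DF, chi-square is the square of a standard normal variate, so
    P(X > x) = erfc(sqrt(x / 2)).
    """
    return math.erfc(math.sqrt(chi2 / 2))


observed = [72, 8]                                   # normal, viridis in F2
expected = expected_from_ratio(sum(observed), [15, 1])
chi2 = chi_square(observed, expected)

print(expected)                  # [75.0, 5.0]
print(round(chi2, 2))            # 1.92
print(round(p_value_1df(chi2), 3))
print(chi2 < 3.841)              # True: not significant at the 5% level, 1 DF
```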
Example 2:
A brown seed-coat (bb) mutant of Nigella sativa was crossed with plants having black seed-coat (BB). The F1 plants were black-seeded; they were crossed with the brown seed-coat mutants, and in the BC1 generation 32 black seed-coat and 28 brown seed-coat plants were observed.
Question:
Test the hypothesis at 5% level and comment on the nature of inheritance of the mutant trait (brown seed-coat trait).
Solution:
1. As there are only two phenotypic classes and the F1s were crossed with the recessive parent, the assumed hypothesis can only be 1 : 1. The expected frequencies are 30 black and 30 brown out of 60, so χ² = (32 − 30)²/30 + (28 − 30)²/30 = 0.267.
2. The data have been obtained from a BC1 segregating population instead of an F2 generation.
The calculated χ²-value of 0.27 at 1 DF is much less than the table value (3.84) at 5% for 1 DF. The observed and expected values are thus a good fit (p = 0.50 to 0.70), the deviation between them is not significant, and consequently the assumed hypothesis is accepted.
Inference:
As the hypothesis 1 : 1 is accepted, it can be concluded that the mutant trait is monogenic (one pair of alleles) recessive to normal.
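The same check for Example 2 is sketched below in Python, using the 1 : 1 test-cross expectation; `chi_square` is an illustrative helper, not a library function, and 3.841 is the usual 5% table value for 1 DF.

```python
def chi_square(observed, expected):
    """Pearson chi-square statistic: sum of (O - E)^2 / E over all classes."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


observed = [32, 28]               # black, brown seed-coat plants in BC1
n = sum(observed)
expected = [n / 2, n / 2]         # 1 : 1 test-cross expectation
chi2 = chi_square(observed, expected)

print(expected)                   # [30.0, 30.0]
print(round(chi2, 3))             # 0.267
print("significant" if chi2 > 3.841 else "not significant")   # not significant
```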
In practical classes, seeds are given in Petri plates to test the goodness of fit of a particular set of data. Some data call for careful testing, since more than one hypothesis may fit them.
Example 3:
Solution:
Let the assumed hypothesis be 1 : 1. The expected frequencies are then 20 black and 20 brown seeds.
As the χ²-value of 0.4 at 1 DF is much less than the table value of 3.84 at 1 DF, the deviation between expected and observed values is not significant; rather, the fit is good at a probability level of 0.50 to 0.55. The hypothesis 1 : 1 is therefore accepted, implying that seed-coat colour is controlled by one pair of alleles.
But for the same set of data, the hypothesis 9 : 7 may also be considered.
In this case too the hypothesis 9 : 7 is accepted, as the deviation between observed and expected values is not significant and the fit is good at the 0.80 to 0.90 probability level. Acceptance of this hypothesis indicates that seed-coat colour has a digenic mode of inheritance.
Inference:
9 : 7 is the more appropriate hypothesis, as it shows less deviation between observed and expected values than 1 : 1. Thus, in a χ²-test of goodness of fit, the hypothesis should be chosen on the basis of the closeness of expected to observed values.
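The observed counts for Example 3 are not reproduced in the text. The Python sketch below assumes 22 black and 18 brown seeds (40 in total), which are the counts consistent with both reported results (χ² = 0.4 under 1 : 1, and a fit at p = 0.80 to 0.90 under 9 : 7). It compares the two hypotheses and keeps the one with the smaller χ²; the helper names are illustrative.

```python
def chi_square(observed, expected):
    """Pearson chi-square statistic: sum of (O - E)^2 / E over all classes."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def expected_from_ratio(total, ratio):
    """Expected class frequencies for `total` individuals under `ratio`."""
    parts = sum(ratio)
    return [total * r / parts for r in ratio]


# ASSUMPTION: 22 black and 18 brown seeds (inferred, not given in the text).
observed = [22, 18]
results = {}
for ratio in [(1, 1), (9, 7)]:
    expected = expected_from_ratio(sum(observed), ratio)
    results[ratio] = chi_square(observed, expected)
    print(ratio, expected, round(results[ratio], 4))
# (1, 1) [20.0, 20.0] 0.4
# (9, 7) [22.5, 17.5] 0.0254

# The hypothesis with the smaller chi-square is the closer fit.
best = min(results, key=results.get)
print("closer fit:", best)    # closer fit: (9, 7)
```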
5. Test of Linkage Using the χ²-Test:
Example 4:
A tall homozygous pea plant (TT) bearing yellow pods (YY) was crossed with a dwarf plant (tt) having green pods (yy).
The F1 plants were all tall and yellow-podded, and on selfing they gave F2 plants in the following frequencies:
Question:
Comment on the assortment of T and Y genes.
Solution:
The χ²-test of goodness of fit is applied with 9 : 3 : 3 : 1 as the assumed hypothesis, since the data come from a dihybrid cross.
Therefore, χ² = 46.66 at 3 DF (4 phenotypic classes; n − 1 = 3).
As the computed χ²-value of 46.66 at 3 DF is much higher than the table value of 7.81 at 3 DF (5% level), the observed values do not fit the expected values, and the assumed hypothesis 9 : 3 : 3 : 1 is rejected.
This significant deviation may be due to failure of independent assortment of the T and Y genes as a consequence of linkage. The higher frequency of parental combinations than of recombinants also indicates the possibility of linkage.
Linkage can be estimated from F2 data by the following formula:
$P^2 = \frac{E - M}{N}$,
where P = linkage value, E = sum of end classes (parental classes), M = sum of middle classes (recombinant classes) and N = number of progenies.
$P^2 = \frac{140 - 20}{160} = 0.75$; $P = \sqrt{0.75} = 0.8660$.
Therefore, the percentage of linkage in the given data is 86.6, and the recombination value is 100 − 86.6 = 13.4%. As 1% recombination is taken as 1 map unit, the T and Y genes are 13.4 map units apart.
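The linkage estimate of Example 4 can be sketched in Python as below, applying the formula P² = (E − M)/N exactly as the text states it; `linkage_from_f2` is a hypothetical helper name. As a sanity check, the 9 : 3 : 3 : 1 expectation for 160 plants (E = 90 + 10 = 100, M = 30 + 30 = 60) gives P = 0.5, i.e., 50% recombination, which is what independent assortment implies.

```python
import math


def linkage_from_f2(end_classes, middle_classes):
    """Linkage value P and recombination fraction from F2 data.

    Uses P^2 = (E - M) / N, where E is the sum of the end (parental) classes,
    M the sum of the middle (recombinant) classes and N = E + M.
    """
    if end_classes < middle_classes:
        raise ValueError("E < M: the formula gives no real linkage value")
    n = end_classes + middle_classes
    p = math.sqrt((end_classes - middle_classes) / n)
    return p, 1 - p


p, r = linkage_from_f2(140, 20)          # Example 4
print(round(p, 4), round(100 * r, 1))    # 0.866 13.4

# Sanity check: 9:3:3:1 expectation for 160 plants.
print(linkage_from_f2(100, 60))          # (0.5, 0.5)
```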
Determination of Linkage by Partitioning the Components of the Dihybrid Ratio Using Orthogonal Functions:
The method of partitioning χ² is also of great use in detecting linkage. If the data under study concern two traits segregating simultaneously, it is important to know whether the two characters are inherited independently or tend to be associated.
For this purpose, the segregation of each character must first be tested separately for agreement with its expected ratio; if the segregation of each trait is satisfactory, the test of their independence can then proceed.
This can be done systematically by partitioning the total χ² for 3 degrees of freedom into components, each with 1 degree of freedom, by the use of orthogonal functions. Orthogonal functions thus give precise knowledge of the independent comparisons within a dihybrid ratio.
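Supposing the observed frequencies in the four classes, taken in the order A_B_, A_bb, aaB_ and aabb, are a1, a2, a3 and a4 respectively, with n = a1 + a2 + a3 + a4, the following formulae are used (each component has 1 DF):
$\chi^2_{A} = \frac{(a_1 + a_2 - 3a_3 - 3a_4)^2}{3n}$ (segregation of A),
$\chi^2_{B} = \frac{(a_1 - 3a_2 + a_3 - 3a_4)^2}{3n}$ (segregation of B),
$\chi^2_{AB} = \frac{(a_1 - 3a_2 - 3a_3 + 9a_4)^2}{9n}$ (joint segregation, i.e., linkage),
and $\chi^2_{A} + \chi^2_{B} + \chi^2_{AB}$ equals the total χ² for 3 DF.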
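A minimal Python sketch of the orthogonal partition follows. It assumes the class order A_B_, A_bb, aaB_, aabb and the standard orthogonal components of a 9 : 3 : 3 : 1 ratio (written out in the code comments). The counts are hypothetical, chosen so that each gene segregates exactly 3 : 1 while the classes A_B_ and aabb (parental in a coupling cross) are in excess; the whole χ² then falls in the joint component, which is the pattern described in Example 5.

```python
def chi_square(observed, expected):
    """Pearson chi-square statistic: sum of (O - E)^2 / E over all classes."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def partition_dihybrid(a1, a2, a3, a4):
    """Split the 3-DF chi-square for a 9:3:3:1 ratio into three 1-DF parts.

    Classes are assumed to be in the order A_B_, A_bb, aaB_, aabb.
    Returns (chi_A, chi_B, chi_AB, total).
    """
    n = a1 + a2 + a3 + a4
    # Segregation of A (3:1): A_ classes (a1 + a2) against aa classes (a3 + a4).
    chi_a = (a1 + a2 - 3 * a3 - 3 * a4) ** 2 / (3 * n)
    # Segregation of B (3:1): B_ classes (a1 + a3) against bb classes (a2 + a4).
    chi_b = (a1 - 3 * a2 + a3 - 3 * a4) ** 2 / (3 * n)
    # Joint assortment of A and B (the linkage component).
    chi_ab = (a1 - 3 * a2 - 3 * a3 + 9 * a4) ** 2 / (9 * n)
    # Total chi-square against 9:3:3:1, computed directly for comparison.
    total = chi_square([a1, a2, a3, a4], [9 * n / 16, 3 * n / 16, 3 * n / 16, n / 16])
    return chi_a, chi_b, chi_ab, total


# HYPOTHETICAL counts: A and B each segregate exactly 3:1 (120:40),
# but the classes A_B_ and aabb are in excess.
chi_a, chi_b, chi_ab, total = partition_dihybrid(100, 20, 20, 20)
print(chi_a, chi_b)                        # 0.0 0.0
print(round(chi_ab, 3), round(total, 3))   # 17.778 17.778
```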
Example 5:
Here there are two traits:
(a) Stigma colour, controlled by gene A (alleles A and a).
(b) Flower colour, controlled by gene B (alleles B and b).
Thus, we have the results:
The same total would have been obtained had the χ²-value been calculated directly.
Comment:
The results indicate that the segregation of factor A (alleles A and a) and of factor B (alleles B and b), each considered separately, agreed with theoretical expectations, but their joint inheritance did not. The discrepancy was thus due to failure of independent assortment between the A and B factors.
The merit of the method therefore lies in locating precisely the source of the discrepancy; in the present case it was due to linkage and not to failure of segregation of the alleles of the A and B genes.
Example 6:
A tall homozygous pea plant (PP) having yellow pods (BB) was crossed with a dwarf, green-podded plant of the complementary genotype (ppbb).
All F1 plants were tall and yellow-podded (PpBb) and were selfed to obtain F2 progenies:
Question:
Use the χ²-test of goodness of fit to show the assortment (allelic interactions) of the P and B genes.
Solution:
The assumed hypothesis is 9 : 3 : 3 : 1.
Thus, the χ²-value of 1185.77 at 3 DF (table value 7.81 at 3 DF, 5% level of significance) indicates that the observed and expected values are not a good fit, and consequently the hypothesis 9 : 3 : 3 : 1 has to be rejected. This rejection may be due to linkage between the P and B genes, i.e., the P and B factors have not assorted independently.
Let us examine this on the basis of orthogonal functions.
Partitioning the components of the dihybrid ratio shows that the high χ²-value was due not only to linkage but also to failure of segregation of the P (P and p alleles) and B (B and b alleles) factors.
6. Use of χ²-Test in Binomial Distribution:
The binomial distribution is a discrete probability distribution defined by the probability mass function (p.m.f.)
$f(x) = {}^{n}C_{x}\, p^{x} q^{n-x}, \quad x = 0, 1, 2, \ldots, n,$
where p and q are positive fractions with p + q = 1.
The distribution is called binomial because the probabilities are given by the successive terms of the binomial series
$(p + q)^{n} = q^{n} + {}^{n}C_{1}\, p\, q^{n-1} + {}^{n}C_{2}\, p^{2} q^{n-2} + {}^{n}C_{3}\, p^{3} q^{n-3} + \cdots + p^{n}.$
The two constants n and p appearing in the expression for f(x) are the parameters of the binomial distribution. If their values are known, the distribution is completely known (q = 1 − p).
The binomial distribution assumes the following:
(a) the result of any trial can be classified under only two categories (say, success and failure);
(b) the probability of success remains constant from trial to trial;
(c) the trials are independent.
Example 7:
A total of 160 families with 4 children each were surveyed, and the following results were obtained:
Question:
Is the family distribution consistent with the hypothesis of equal numbers of boys and girls?
Solution:
As boys and girls represent two categories, such a problem can be solved by expanding the binomial $(a + b)^n$.
The terms of the expansion of $(a + b)^n$ are $a^n, a^{n-1}b, a^{n-2}b^2, a^{n-3}b^3, \ldots, b^n$.
Each term of the expansion has an appropriate coefficient, which can be calculated from the general formula n!/(s! t!), where n = index of the binomial, s = index of a in the term, t = index of b in the term and ! denotes the factorial [e.g., 4! = 4 × 3 × 2 × 1 = 24; 5! = 5 × 4 × 3 × 2 × 1 = 120; n! = 1 × 2 × 3 × … × n].
In the present problem, $(a + b)^n$ becomes $(a + b)^4$, where a = boy and b = girl.
The hypothesis assumed is that boys and girls are equally (randomly) distributed in families, i.e., the probability of a boy is 1/2 and that of a girl is also 1/2.
The terms of $(a + b)^4$ are $a^4, a^3b, a^2b^2, ab^3$ and $b^4$.
Using the formula n!/(s! t!), the coefficients are 1, 4, 6, 4 and 1, so that $(a + b)^4 = a^4 + 4a^3b + 6a^2b^2 + 4ab^3 + b^4$. With a = b = 1/2, the expected numbers among 160 families are 10 (4 boys), 40 (3 boys, 1 girl), 60 (2 boys, 2 girls), 40 (1 boy, 3 girls) and 10 (4 girls).
There are 5 classes, so the degrees of freedom are (number of classes − 1) = 5 − 1 = 4. As the computed χ²-value of 9.02 at 4 DF is less than the table value at the 5% level (9.49 at 4 DF), the expected and observed frequencies of families are a good fit. The hypothesis is thus accepted, and it can be inferred that boys and girls are distributed equally, i.e., at random, within families.
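A short Python sketch of how the coefficients and expected family frequencies for Example 7 follow from n!/(s! t!) with p = q = 1/2. The observed family counts are not reproduced in the text, so only the expected side is computed here; the resulting χ² would then be compared with 9.49 (5% level, 4 DF). The function names are illustrative.

```python
from math import factorial


def coefficient(n, s):
    """Coefficient of the term a^s b^t in (a + b)^n: n! / (s! t!), t = n - s."""
    t = n - s
    return factorial(n) // (factorial(s) * factorial(t))


def expected_families(n_families, n_children, p_boy=0.5):
    """Expected number of families with s boys, for s = n_children down to 0."""
    q = 1 - p_boy
    return [n_families * coefficient(n_children, s) * p_boy ** s * q ** (n_children - s)
            for s in range(n_children, -1, -1)]


print([coefficient(4, s) for s in range(4, -1, -1)])   # [1, 4, 6, 4, 1]
print(expected_families(160, 4))                       # [10.0, 40.0, 60.0, 40.0, 10.0]
```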
7. χ²-Test of Independence:
Another common use of the χ²-test is in testing the independence of classifications in what are known as contingency tables. The data are set out in a table with rows and columns, each observation being assigned to one of the cells of the table.
If there are r rows and c columns, the table is called an r × c contingency table, where r and c may be any numbers; the simplest table of this kind is the 2 × 2 contingency table. An r × c table has (r − 1)(c − 1) degrees of freedom.
A 2 × 2 table has only one degree of freedom, i.e., only one of the four cell frequencies can be chosen arbitrarily if the row and column totals remain fixed. Since χ² calculated from discrete counts only approximates the continuous chi-square distribution, and this approximation can be poor with 1 degree of freedom, the formula is corrected by subtracting 0.5 from each absolute deviation |O − E| before squaring. This is known as Yates' correction for continuity.
Example 8:
In a survey of 200 boys, 75 were intelligent, and 40 of these had skilled fathers, while 85 of the unintelligent boys had unskilled fathers. Do these figures support the hypothesis that skilled fathers have intelligent boys?
Solution:
The data are shown in the following 2 x 2 table:
Null hypothesis: the two attributes, skill of father and intelligence of son, are independent (in a test of independence, the null hypothesis is that the attributes are independent).
As the calculated χ²-value of 8.02 (with Yates' correction) at 1 DF is greater than the table value (3.84 at 1 DF) at the 5% level of significance, there is a significant deviation between observed and expected cell frequencies, and the null hypothesis is therefore rejected.
Inference:
The χ²-test of independence shows that skill of the father and intelligence of the son are associated (the attributes are dependent), which supports the hypothesis that skilled fathers have intelligent sons.
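The χ²-value of Example 8 can be checked with the shortcut form of Yates' corrected χ² for a 2 × 2 table, N(|ad − bc| − N/2)²/(R₁R₂C₁C₂), sketched in Python below; the function name is illustrative. The cell counts are derived from the survey figures: 40 intelligent boys with skilled fathers, 75 − 40 = 35 intelligent boys with unskilled fathers, 125 − 85 = 40 unintelligent boys with skilled fathers, and 85 unintelligent boys with unskilled fathers.

```python
def yates_chi_square_2x2(a, b, c, d):
    """Chi-square with Yates' continuity correction for the 2 x 2 table

        [[a, b],
         [c, d]]

    using the shortcut N(|ad - bc| - N/2)^2 / (row1 * row2 * col1 * col2).
    """
    n = a + b + c + d
    numerator = n * (abs(a * d - b * c) - n / 2) ** 2
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    return numerator / denominator


# Rows: intelligent (75), unintelligent (125); columns: skilled, unskilled fathers.
chi2 = yates_chi_square_2x2(40, 35, 40, 85)
print(round(chi2, 2))          # 8.02
print(chi2 > 3.841)            # True: reject independence at the 5% level
```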
Example 9:
Anaphase I chromosome segregation in Allium cepa (2n = 16) showed seasonal variation, and the data obtained are presented in a 2 × 2 contingency table. Test whether the attributes are independent.
Solution:
Null hypothesis: anaphase I separation of chromosomes is independent of season:
The calculated χ²-value of 6.42 at 1 DF is greater than the table value of 3.84 (1 DF, 5% level of significance); the hypothesis is therefore rejected, as there is significant deviation between observed and expected cell frequencies.
Inference:
Anaphase I separation of Allium cepa chromosomes varied with season (the attributes are dependent on one another).
Example 10:
500 PMCs of Nigella sativa (black cumin) were assessed in 2 different seasons (summer and winter) for anaphase I segregation of chromosomes (2n = 12), and the data obtained are tabulated in an r × c contingency table. Test the independence of the two characters (anaphase I chromosome segregation and season).
Solution:
Null hypothesis: anaphase I chromosome segregation is independent of season.
The expected value for each cell can be calculated from the following formula:
E = (R × C)/G, where R = row total, C = column total and G = grand total.
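A minimal Python sketch of the r × c calculation follows, with E = (R × C)/G for each cell and (r − 1)(c − 1) degrees of freedom; the function name is illustrative. Because the Example 10 table is not reproduced in the text, the sketch is applied to the 2 × 2 counts of Example 8; without Yates' correction it gives χ² ≈ 8.89 rather than the corrected 8.02.

```python
def contingency_chi_square(table):
    """Pearson chi-square for an r x c contingency table (no continuity correction).

    Each expected count is E = (row total x column total) / grand total.
    Returns (chi-square, table of expected counts, degrees of freedom).
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    expected = [[r * c / grand_total for c in col_totals] for r in row_totals]
    chi2 = sum((o - e) ** 2 / e
               for obs_row, exp_row in zip(table, expected)
               for o, e in zip(obs_row, exp_row))
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, expected, dof


# Example 8 counts: rows = intelligent, unintelligent; columns = skilled, unskilled fathers.
chi2, expected, dof = contingency_chi_square([[40, 35], [40, 85]])
print(expected)              # [[30.0, 45.0], [50.0, 75.0]]
print(round(chi2, 2), dof)   # 8.89 1
```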
Expected values:
The calculated χ²-value of 9.62 at 2 DF is higher than the table value (5.99 at 2 DF) at the 5% level of significance, indicating that the expected and observed frequencies are not a good fit but show significant deviations. The hypothesis is therefore rejected.
Inference:
Anaphase I chromosome segregation is dependent on season.
Example 11:
Yields of wheat varieties (quintal/hectare) under irrigated and non-irrigated conditions are presented in tabular form. Test whether the yield of wheat varieties is related to irrigation.
Solution:
Null hypothesis: the yield of wheat varieties is independent of irrigation.
The following formula can be used to compute the χ²-value:
The calculated χ²-value of 2.01 at 4 DF is less than the table value (9.49 at 4 DF) at the 5% level of significance, so the assumed hypothesis is accepted (the deviation between observed and expected frequencies is non-significant).
Inference:
The yield of wheat varieties is independent of irrigation.
8. χ²-Test of Heterogeneity:
The χ²-test of heterogeneity is very similar to the test of independence and is widely used in genetic experiments to test whether several groups of data are consistent (random) or inconsistent (non-random, i.e., showing significant variation). The advantage of this method is that it shows the actual proportions observed in each class.
Example 12:
Metaphase I chromosome associations in tetraploid Nigella damascena (2n = 12) have been documented over 4 generations. Test whether the chromosome configurations (univalents I, bivalents II and quadrivalents IV) per cell were consistent over the generations.
Thus, the χ²-test of heterogeneity of chromosome configurations over the generations at 3 DF revealed that the frequencies of quadrivalents (p < 0.001) and bivalents (p < 0.01) per cell at metaphase I among the tetraploids were inconsistent (non-random, significant) over the generations, while the univalents were random (p = 0.30) in distribution, i.e., uniformly distributed over the generations.
Example 13:
Univalent and bivalent frequencies per cell in 10 plants of Ocimum basilicum (basil, 2n = 72) are presented in a table.
Question:
Find out from the given data whether univalent and bivalent frequencies per cell were consistent among the plants.
Solution:
A χ²-test of heterogeneity has been performed to assess the distribution of univalents and bivalents per cell over the plants. The results are tabulated (the process of calculation is shown in the previous example).
The results indicate that bivalents per cell were random (consistent) among the plants, while the frequency of univalents per cell was non-random (significant variation occurred among plants).
9. Use of χ²-Test in Poisson Distribution:
The Poisson distribution is a discrete probability distribution defined by the probability mass function (p.m.f.)
$f(x) = \frac{e^{-m} m^{x}}{x!}, \quad x = 0, 1, 2, \ldots,$
where m is positive. The quantity e is a mathematical constant given by the infinite series
$e = 1 + \frac{1}{1!} + \frac{1}{2!} + \frac{1}{3!} + \cdots = 2.71828\ldots$
The constant m is the parameter of the Poisson distribution (it is the mean of the distribution). Note that the variable can take an infinite number of values, x = 0, 1, 2, 3, …
The distribution is named after S.D. Poisson.
Example 14:
In a community, 50 quadrats (100 × 100 m) were laid down and the number of basil plants (Ocimum spp.) in each was counted. The data are given below:
Do you consider that the distribution of basil species in the community is random?
Solution:
Null hypothesis: the distribution of basil species in the community is random.
Mean density of basil per quadrat,
m = [(0 × 12) + (1 × 5) + (2 × 15) + (3 × 10) + (4 × 3) + (5 × 2) + (6 × 3)]/50 = 105/50 = 2.1.
The expected number of quadrats containing 0, 1, 2, 3, … basil plants is obtained by multiplying the number of quadrats (50) by the successive terms of the Poisson series $e^{-m}, e^{-m}m, e^{-m}m^2/2!, e^{-m}m^3/3!, \ldots$, where m is the mean density of basil individuals.
For the Poisson series, m = 2.1 and x = 0, 1, 2, 3, 4, 5, 6; for x = 0, for example, the expected frequency is $50e^{-2.1} \approx 50 \times 0.12246 = 6.12$.
With 5 DF (2 less than the number of terms used, one being lost for the fixed total and one for estimating m from the data; here x = 0 to 6 gives 7 terms), the χ²-value is much higher than the table value at 5% for 5 DF (11.07), indicating significant deviation between observed and expected values. The null hypothesis is therefore rejected, and it may be concluded that the distribution of Ocimum species is non-random.
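Example 14 can be reproduced with the Python sketch below, which estimates m from the data, fits the Poisson series and computes χ² with 7 − 2 = 5 DF against the 5% table value of 11.07; `fit_poisson` is an illustrative name. Like the text, it keeps all seven classes. Several expected frequencies fall below 5, so pooling the tail classes (which would reduce the DF) is a common alternative. With all seven classes the sketch gives χ² ≈ 18.49.

```python
import math


def fit_poisson(freqs):
    """Fit a Poisson series to a frequency distribution.

    freqs[x] is the number of quadrats containing x individuals.
    Returns the mean m and the expected frequencies N * e^-m * m^x / x!.
    """
    n = sum(freqs)
    m = sum(x * f for x, f in enumerate(freqs)) / n
    expected = [n * math.exp(-m) * m ** x / math.factorial(x) for x in range(len(freqs))]
    return m, expected


observed = [12, 5, 15, 10, 3, 2, 3]   # quadrats with 0, 1, ..., 6 basil plants
m, expected = fit_poisson(observed)
chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
dof = len(observed) - 2               # one DF for the total, one for estimating m

print(m)                              # 2.1
print([round(e, 2) for e in expected])
print(round(chi2, 2), dof)            # 18.49 5
print(chi2 > 11.07)                   # True: reject randomness at the 5% level
```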