Statistics in Genetics & Plant Breeding: Frequently Asked Questions

Everything you need to know about the uses of statistics in genetics and plant breeding!

Q. 1. Define statistics.

Ans. Statistics is a branch of applied mathematics which deals with collection, presentation, analysis and interpretation of numerical data.

The statistics commonly used in genetics and plant breeding are divided into two parts, viz.:

(i) Statistical methods, and

(ii) Experimental designs.

There are three types of statistics as given below:

(i) First degree statistics or first order statistics

(ii) Second degree statistics or second order statistics

(iii) Third degree statistics or higher order statistics.

Q. 2. What do you mean by first degree statistics?

Ans. It includes the mean, which is used for the measurement of all types of parameters. First degree statistics are simple to calculate, and estimates based on them are more robust and reliable because of their high level of precision.

Q. 3. What are uses of first order statistics in plant breeding and genetics?

Ans. The arithmetic mean is used in comparing varieties or breeding material. In biometrical genetics, first degree statistics are used for the study of generation means, heterosis and inbreeding depression, combining ability effects, metroglyph analysis, stability analysis, etc.

Q. 4. What is second degree statistics?

Ans. When variances and covariances are used for the estimation of certain parameters, the statistics are called second degree statistics. Parameters based on second degree statistics are not statistically very robust because of the inaccuracies involved in estimating variances and covariances. Moreover, second order statistics are more difficult to calculate than first order statistics.

Q. 5. What are applications of second order statistics in genetics and plant breeding?

Ans. In biometrical genetics, second order statistics are used for the estimation of correlations, path coefficients, discriminant function, D² statistics, heritability, co-heritability, genetic advance, and components of variance in diallel, partial diallel, line × tester cross, triple test cross, biparental cross, triallel cross and quadriallel cross.

Q. 6. What is third degree statistics?

Ans. It includes higher order measures such as skewness and kurtosis, which describe the asymmetry and the peakedness or flatness of a distribution and are used in fitting frequency curves and surfaces. In biometrical genetics, high order statistics are used to study the skewness of F₂, F₃ and biparental progenies.
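
As a rough illustration, skewness and kurtosis can be computed from the central moments of a sample. The sketch below uses the usual moment coefficients (m3/m2^1.5 and m4/m2²), which the text does not spell out, and the plant heights are hypothetical.

```python
def central_moment(values, k):
    """k-th central moment: mean of (x - mean) ** k."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** k for x in values) / n


def skewness(values):
    """Moment coefficient of skewness, m3 / m2**1.5 (0 for a symmetric sample)."""
    return central_moment(values, 3) / central_moment(values, 2) ** 1.5


def kurtosis(values):
    """Moment coefficient of kurtosis, m4 / m2**2 (about 3 for a normal curve)."""
    return central_moment(values, 4) / central_moment(values, 2) ** 2


# Hypothetical plant heights (cm) from an F2 population
heights = [62, 65, 66, 68, 70, 71, 73, 78, 85, 92]
print(round(skewness(heights), 3), round(kurtosis(heights), 3))
```

A positive skewness value, as in this sample, indicates a longer tail towards the higher values.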

Q. 7. What is χ² test?

Ans. χ² is a test of statistical significance which is used to test the significance of difference between observed and expected frequencies or ratios.

The general formula of χ² test is as follows:

χ² = ∑(O – E)²/E, where ∑ = summation, O = observed frequencies and E = expected frequencies.
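
A minimal sketch of this calculation, summing (O – E)²/E over all classes, might look as follows. The function name and the F₂ counts are hypothetical; expected frequencies are derived from the assumed segregation ratio.

```python
def chi_square(observed, ratio):
    """Return the chi-square value for observed counts against an expected ratio."""
    total = sum(observed)
    ratio_sum = sum(ratio)
    # Expected frequency of each class = total * (ratio part / sum of ratio parts)
    expected = [total * part / ratio_sum for part in ratio]
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical F2 counts: 290 tall and 110 dwarf plants, tested against 3:1
print(round(chi_square([290, 110], [3, 1]), 3))  # 1.333
```

Here the expected frequencies are 300 and 100, so χ² = 100/300 + 100/100 = 1.333.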

Q. 8. Who developed χ² test?

Ans. The χ² test was developed by Karl Pearson who was an English mathematician. He applied statistics to biological problems of heredity and evolution.

Q. 9. What are applications of χ² test in genetics and plant breeding?

Ans. χ² test is commonly used in Mendelian genetics and population genetics.

In genetics, the χ² test is used for the following three main purposes:

(i) To test the validity of various segregation ratios.

(ii) For detection of linkage.

(iii) For study of gene frequencies in population genetics.

Q. 10. What is degree of freedom?

Ans. The degree of freedom refers to the number of independent comparisons. Generally, the degree of freedom is one less than the total number of classes being compared. In a two-class segregation (3:1), the degree of freedom is 1, and in a four-class segregation (9:3:3:1), it is 3.
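
The rule "classes minus one" can be sketched in a few lines. The comparison with the table χ² value at the 5% level is the usual way the degree of freedom is put to use. That step is an assumption added here and is not described in the text above, and the helper names are hypothetical.

```python
# Table chi-square values at the 5% level of probability for 1 to 3 degrees of freedom
CHI_SQUARE_5_PERCENT = {1: 3.841, 2: 5.991, 3: 7.815}


def degrees_of_freedom(ratio):
    """Degrees of freedom = number of segregation classes minus one."""
    return len(ratio) - 1


def fits_ratio(chi_square_value, ratio):
    """True when the observed chi-square does not exceed the 5% table value."""
    return chi_square_value <= CHI_SQUARE_5_PERCENT[degrees_of_freedom(ratio)]


print(degrees_of_freedom([3, 1]))        # 1
print(degrees_of_freedom([9, 3, 3, 1]))  # 3
print(fits_ratio(1.33, [3, 1]))          # True, since 1.33 < 3.841
```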

Q. 11. What is arithmetic mean?

Ans. Arithmetic mean is defined as the sum of all observations in a sample divided by their number.

It is denoted by X̅ and is calculated as follows:

Arithmetic mean X̅ = (∑X)/N where ∑ = summation, X = an observation and N = number of observations in a sample.

Q. 12. What are measures of dispersion?

Ans. The degree to which numerical data tend to spread about the mean value is called dispersion. Thus dispersion is a measure of variation in a sample. The measures of dispersion include range, standard deviation, variance, standard error and coefficient of variation. These measures are also known as simple measures of variability.

Q. 13. What do you mean by range?

Ans. Range is the difference between the highest and the lowest values among the observations in a sample. Suppose that in cotton, 20 observations have been recorded on seed oil content, with a highest value of 25% and a lowest value of 15%. The range will be 25 – 15 = 10%. Thus it is a measure of the spread of variation in a sample.

Q. 14. Define standard deviation.

Ans. Standard deviation is the square root of the arithmetic mean of squares of all deviations measured from the mean. In other words, it is the square root of the variance.

It is the best measure of variation in a population and is calculated as follows:

Standard deviation (SD) = √[{∑X² – (∑X)²/N}/N]

where ∑, X, X² and N = summation, an observation, square of an observation and number of observations, respectively.

Q. 15. What is variance?

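Ans. Variance is defined as the average of the squared deviations from the mean; it is the square of the standard deviation.

It is an effective measure of variability and is estimated by the following formula:

Variance = ∑(X – X̅)²/N = [∑X² – (∑X)²/N]/N, where X = an observation, X̅ = mean and N = number of observations.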

Q. 16. What is standard error?

Ans. It is a measure of the average difference between the sample mean (X̅) and the population parameter (µ). It measures the uncontrolled variation present in a sample.

It is estimated as follows:

Standard Error = SD/√N where SD = standard deviation and N = number of observations.

Q. 17. What is coefficient of variation?

Ans. The ratio of standard deviation of a sample to its mean expressed in percentage is called coefficient of variation.

The coefficient of variation has no unit and is estimated as follows:

Coefficient of variation (CV) = (SD/X̅) × 100

where SD = standard deviation and X̅ = mean.
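
The simple measures of variability described in Q. 12 to Q. 17 can be worked through together in a short sketch. It assumes the variance is the average squared deviation from the mean (divisor N), as defined above. Many texts divide by N – 1 for a sample estimate instead. The oil content values and the function name are hypothetical.

```python
import math


def describe(sample):
    """Return the mean and the simple measures of variability for a sample."""
    n = len(sample)
    mean = sum(sample) / n
    # Variance as the average squared deviation from the mean (divisor N).
    # Many texts divide by N - 1 for a sample estimate instead.
    variance = sum((x - mean) ** 2 for x in sample) / n
    sd = math.sqrt(variance)
    return {
        "mean": mean,
        "range": max(sample) - min(sample),
        "variance": variance,
        "sd": sd,
        "se": sd / math.sqrt(n),        # standard error = SD / sqrt(N)
        "cv_percent": sd / mean * 100,  # coefficient of variation in percent
    }


# Hypothetical seed oil content (%) of eight cotton plants
oil = [15, 18, 19, 20, 20, 21, 22, 25]
for name, value in describe(oil).items():
    print(f"{name}: {value:.2f}")
```

For this sample the mean is 20, the range is 10 and the variance is 7.5, giving a coefficient of variation of about 13.7%.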

Q. 18. Who developed the coefficient of variation? How is it useful in plant breeding?

Ans. The coefficient of variation was developed by Karl Pearson. A sample with a higher coefficient of variation has greater variation than one with a lower coefficient. In plant breeding, phenotypic, genotypic and environmental coefficients of variation are estimated from the corresponding variances.

Q. 19. What do you mean by tests of significance?

Ans. Statistical procedures which are used to decide whether differences under study are significant or non-significant are known as tests of significance. The commonly used tests of significance include Z-test, t-test and F-test. These tests are used to decide whether the difference under study is significant or can be attributed to chance or the fluctuation of sampling.

Q. 20. What is null hypothesis?

Ans. A hypothesis of no difference is known as the null hypothesis. It was proposed by Fisher and is denoted by H₀. This hypothesis states that no true difference exists between the sample result and the population parameter. The observed difference is only due to sampling fluctuation.

Q. 21. Define Z-test.

Ans. It is a test of significance which is used to compare two means when the sample size is large (more than 30).

Q. 22. What is t-test?

Ans. It is a test of significance which is used for comparing two means when the sample size is small (up to 30). It is of two types: Student's t, used with paired observations, and Fisher's t, used when observations are unpaired.
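
The two situations can be illustrated with the standard textbook formulas for t, which the text above does not give: the mean difference divided by its standard error for paired observations, and the difference between means divided by a pooled standard error for unpaired observations. This is a sketch only, with hypothetical function names and data.

```python
import math


def sample_variance(values):
    """Variance with divisor n - 1, as used in t-tests."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / (n - 1)


def paired_t(before, after):
    """t for paired observations: mean difference / its standard error (df = n - 1)."""
    diffs = [a - b for a, b in zip(after, before)]
    n = len(diffs)
    mean_diff = sum(diffs) / n
    return mean_diff / math.sqrt(sample_variance(diffs) / n)


def unpaired_t(sample_a, sample_b):
    """t for two independent samples using the pooled variance (df = n1 + n2 - 2)."""
    n1, n2 = len(sample_a), len(sample_b)
    pooled = ((n1 - 1) * sample_variance(sample_a)
              + (n2 - 1) * sample_variance(sample_b)) / (n1 + n2 - 2)
    mean_diff = sum(sample_a) / n1 - sum(sample_b) / n2
    return mean_diff / math.sqrt(pooled * (1 / n1 + 1 / n2))


# Hypothetical yields of the same four plots before and after a treatment
print(round(paired_t([10, 12, 11, 13], [12, 13, 13, 16]), 3))
# Hypothetical yields of two varieties grown on separate plots
print(round(unpaired_t([20, 22, 24], [14, 16, 18]), 3))
```

The calculated t is then compared with the table t value at the stated degrees of freedom.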

Q. 23. What is analysis of variance?

Ans. The statistical procedure which separates the total variation into different components is called analysis of variance.

It has the following four main advantages:

(i) It is useful in estimating components of variance.

(ii) It provides basis for test of significance.

(iii) It permits estimation of phenotypic, genotypic and environmental coefficients of variability.

(iv) It also permits estimation of broad sense heritability and expected genetic advance.

Q. 24. Define analysis of covariance and explain its uses.

Ans. The statistical technique which splits simultaneously the variation of two variables (characters) into various components is called analysis of covariance.

It has the following main advantages along with analysis of variance:

(i) It permits estimation of co-heritability.

(ii) It also permits estimation of correlations, regression, path coefficients, selection indices etc.

Q. 25. What are joint applications of variances and co-variances in plant breeding?

Ans. The estimates of variances and co-variances permit estimation of the following statistics:

(i) Phenotypic, genotypic and environmental correlations

(ii) Phenotypic, genotypic and environmental path coefficients

(iii) Regression coefficient

(iv) Co-heritability

(v) D² statistics

(vi) Selection indices, etc.

Q. 26. What is regression coefficient?

Ans. The regression coefficient is a statistical measure of the average functional relationship between two or more variables. In regression analysis, one variable is considered dependent and the other independent. Thus it measures the degree of dependence of one variable on the other. The regression coefficient was first used by Sir Francis Galton for estimating the relationship between the heights of fathers and their sons. It is expressed in the original units of the data and is denoted by b.

Q. 27. What is correlation?

Ans. The statistical measure of the degree and direction of association between two or more variables is called correlation. It is represented by r and is independent of the unit of measurement. Its value lies between –1 and +1.

The general formula for calculation of correlation is given below:

Correlation r(xy) = Cov(XY)/√(VX × VY)

where r(xy) = correlation between X and Y,

Cov(XY) = covariance between X and Y,

VX = variance of X, and

VY = variance of Y.
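
A short sketch shows how the covariance and the two variances combine into the correlation coefficient. The regression coefficient of Y on X is included using the standard formula b = Cov(XY)/VX, which is an assumption here since the text does not state it. Function names and data are hypothetical.

```python
import math


def covariance(x, y):
    """Mean product of the deviations of X and Y from their means (divisor N)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    return sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n


def correlation(x, y):
    """r = Cov(XY) / sqrt(VX * VY); the variance of X is Cov(X, X)."""
    return covariance(x, y) / math.sqrt(covariance(x, x) * covariance(y, y))


def regression_coefficient(x, y):
    """b of Y on X = Cov(XY) / VX, in units of Y per unit of X."""
    return covariance(x, y) / covariance(x, x)


# Hypothetical paired observations on two characters
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
print(round(correlation(x, y), 3))             # 0.775
print(round(regression_coefficient(x, y), 3))  # 0.6
```

Note that r is unit-free, whereas b changes if either character is measured in different units.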

Q. 28. What are experimental designs?

Ans. Experimental designs are various types of plot arrangements which are used to test a set of treatments to draw valid conclusions about a particular problem.

Experimental designs are used for the following purposes:

(i) For testing the significance of difference among various objects of comparison.

(ii) For testing of treatments in a scientific manner.

(iii) For partitioning of variation into different components.

(iv) For estimation of various statistical estimates.

(v) For proper interpretation of scientific results and drawing valid conclusions.

Q. 29. What is an experiment?

Ans. An experiment is a scientifically planned method of investigation. It is conducted to draw valid conclusions about a particular problem, and the conclusions are based on statistical observations.

Q. 30. Define treatment.

Ans. Treatments are the various objects of comparison. In plant breeding, treatments include a set of varieties, a set of hybrids, or a set of advanced breeding cultures. In agronomic trials, they may include various levels of fertilizers, sowing dates, spacing, seed rate, number of irrigations, etc.

Q. 31. What is experimental unit?

Ans. Experimental unit refers to the group of material to which a treatment is applied in the experiment. It may be a plot of land, a patient in a hospital, a group of cattle in a dairy etc.

Q. 32. What are basic principles of experimental designs?

Ans. There are three basic principles of experimental designs, viz. replication, randomization and local control. All these principles help in reducing the experimental error and increasing accuracy of results.

Q. 33. What is replication?

Ans. The repetition of treatments under investigation is known as replication.

There are three main advantages of replication as follows:

(i) Increase in replication increases the precision by reducing the experimental error to a great extent.

(ii) Replication helps in detection of variation in a treatment and thus helps in comparison of various treatments.

(iii) It helps in estimation of average performance of various treatments in an experiment.

Q. 34. Define randomization and explain its advantages.

Ans. Randomization refers to the allocation of treatments to different plots in an experiment by a random process. Randomization is done with the help of a table of random numbers. It gives every treatment an equal chance of being allotted to a more fertile plot as well as to a less fertile plot.

Q. 35. What is local control?

Ans. The principle of making use of greater homogeneity within groups of experimental units to reduce the experimental error is called local control. It is also known as error control. Fertility variation may occur either as a gradient (in strips) or in patches (sporadic). The first type of fertility effect can be reduced by dividing the field into homogeneous blocks, and the second type is minimized by randomization of treatments within the block.
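
Blocking and randomization within blocks can be sketched as a field layout in which every block receives all treatments in an independently randomized order. The sketch uses Python's pseudo-random generator in place of a random number table. The function name and variety labels are hypothetical.

```python
import random


def rbd_layout(treatments, n_blocks, seed=None):
    """Return one independently randomized order of all treatments per block."""
    rng = random.Random(seed)
    layout = []
    for _ in range(n_blocks):
        block = list(treatments)
        rng.shuffle(block)  # fresh randomization within each block
        layout.append(block)
    return layout


varieties = ["V1", "V2", "V3", "V4", "V5"]  # hypothetical variety names
for number, block in enumerate(rbd_layout(varieties, n_blocks=3, seed=1), start=1):
    print(f"Block {number}: {' '.join(block)}")
```

Fixing the seed only makes the layout reproducible for record keeping; omitting it gives a new randomization on every run.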

Q. 36. Define experimental error.

Ans. The variation due to environmental factors or uncontrolled factors is called experimental error. It can be reduced by adopting replication and randomization.

Q. 37. What is correction factor?

Ans. In the analysis of variance and covariance, the square of the grand total divided by the number of observations is called the correction factor.

It is calculated as follows:

Correction factor (CF) = (GT)²/N

where GT = grand total and N = number of observations.

Q. 38. What is F-test?

Ans. It is a test of significance which is used for testing the significance of differences among several treatments. It differs from Z-test and t-test which are applied to test the significance of difference between two treatment means or between sample mean and population mean.

Q. 39. Define F-value.

Ans. It is the ratio between the treatment variance and error variance.

It is also called variance ratio and is estimated as follows:

F-Value = Treatment variance/Error variance

The observed F value is compared with the table F value at the appropriate degrees of freedom and at the desired level of probability (5% or 1%). If the observed F value is more than the table F value, the differences among treatments are considered significant, and vice versa.
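
To make the variance ratio concrete, the sketch below works through an analysis of variance for a randomized block design. It uses the standard sums-of-squares approach: the correction factor (square of the grand total divided by the number of observations) is subtracted from each uncorrected sum of squares. The layout, function name and yields are hypothetical.

```python
def rbd_anova(data):
    """Analysis of variance for a randomized block design.

    data[i][j] is the observation for treatment i in replication (block) j.
    """
    t, r = len(data), len(data[0])
    grand_total = sum(sum(row) for row in data)
    cf = grand_total ** 2 / (t * r)  # correction factor
    total_ss = sum(x ** 2 for row in data for x in row) - cf
    treatment_ss = sum(sum(row) ** 2 for row in data) / r - cf
    block_ss = sum(sum(row[j] for row in data) ** 2 for j in range(r)) / t - cf
    error_ss = total_ss - treatment_ss - block_ss
    treatment_ms = treatment_ss / (t - 1)        # treatment variance
    error_ms = error_ss / ((t - 1) * (r - 1))    # error variance
    return {"cf": cf, "treatment_ms": treatment_ms, "error_ms": error_ms,
            "f_value": treatment_ms / error_ms}


# Hypothetical yields of three varieties (rows) in three replications (columns)
yields = [[10, 12, 11],
          [14, 15, 16],
          [9, 11, 10]]
result = rbd_anova(yields)
print(round(result["f_value"], 2))  # 63.0
```

Here the treatment variance is 21 with 2 degrees of freedom and the error variance is 0.33 with 4 degrees of freedom. The observed F of 63 far exceeds the 5% table value of 6.94 for 2 and 4 degrees of freedom, so the varieties differ significantly.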

Q. 40. What is critical difference?

Ans. The smallest difference between two treatment means beyond which all differences are significant is known as the critical difference (CD) or least significant difference (LSD).

It is estimated as follows:

Critical difference = SE difference × t

SE difference = √(2Ve/r)

where Ve = error variance, r = number of replications and t = table value of t at error degrees of freedom.

Critical difference is used to compare the observed differences among different treatments. If the difference is greater than critical difference, it is considered as significant and vice-versa.
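
The calculation is short enough to sketch directly. The error variance and number of replications below are hypothetical, and the table t value must be looked up for the error degrees of freedom of the actual experiment (2.179 is the two-tailed 5% value for 12 degrees of freedom).

```python
import math


def critical_difference(error_variance, replications, t_table):
    """CD = SE of difference x table t, with SE difference = sqrt(2 * Ve / r)."""
    se_difference = math.sqrt(2 * error_variance / replications)
    return se_difference * t_table


# Hypothetical trial: Ve = 4.5, r = 4, table t at 5% for 12 error df = 2.179
cd = critical_difference(error_variance=4.5, replications=4, t_table=2.179)
print(round(cd, 2))  # 3.27
```

With these values, two treatment means that differ by 4.0 would be declared significantly different, while a difference of 2.5 would not.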

Q. 41. What is the difference between χ² test and F-test?

Ans. χ² test is used to test the significance of difference between observed and expected frequencies, whereas F test is used to test the significance of difference among several means. Moreover, χ² test is based on single variance, whereas F test is based on ratio of two variances.

Q. 42. What are different types of experimental designs used in plant breeding?

Ans. Various types of experimental designs which are used in genetics and plant breeding are listed below:

(i) Completely randomized design

(ii) Randomized block design

(iii) Latin square design

(iv) Lattice design

(v) Split plot design, and

(vi) Augmented design.

Q. 43. What is completely randomized design?

Ans. The design which is used when the experimental material is limited and experimental unit is homogeneous is called completely randomized design. This design is specially used for pot culture experiments. The principle of local control is not adopted in this design because homogeneous blocks are not formed in this design.

Q. 44. What is randomized block design?

Ans. The experimental design which controls fertility variation in one direction only is called randomized block design. Adoption of this design is useful where the fertility variation between the blocks is significant. The principle of local control is adopted in this design because the experimental field is divided into homogeneous blocks. This design can test up to 20 treatments. Beyond 20, the efficiency of error control decreases because of the increase in heterogeneity within the block.

Q. 45. Explain Latin Square design.

Ans. This experimental design simultaneously controls fertility variation in two directions. In this design, the numbers of treatments, replications, rows and columns are equal. It adopts the principle of local control by forming rows and columns. A maximum of 12 treatments can be tested in this design.
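
One simple way to build such a layout is to start from a cyclic square and then shuffle its rows and columns, which keeps every treatment exactly once in each row and each column. This is only one of several possible randomization procedures, and the function names are hypothetical. The sketch assumes all treatment labels are distinct.

```python
import random


def latin_square(treatments, seed=None):
    """Build a Latin square by cyclic shifts, then shuffle rows and columns."""
    rng = random.Random(seed)
    n = len(treatments)
    square = [[treatments[(i + j) % n] for j in range(n)] for i in range(n)]
    rng.shuffle(square)        # randomize row order
    columns = list(range(n))
    rng.shuffle(columns)       # randomize column order
    return [[row[c] for c in columns] for row in square]


def is_latin_square(square):
    """Check that every treatment occurs once in each row and each column."""
    expected = sorted(square[0])
    rows_ok = all(sorted(row) == expected for row in square)
    cols_ok = all(sorted(col) == expected for col in zip(*square))
    return rows_ok and cols_ok


layout = latin_square(["A", "B", "C", "D"], seed=3)
for row in layout:
    print(" ".join(row))
print(is_latin_square(layout))  # True
```

With four treatments the layout has four rows and four columns, so each treatment is replicated four times.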

Q. 46. What is lattice design?

Ans. Lattice designs are incomplete block designs in which the number of treatments or varieties forms a perfect square.

Q. 47. What is augmented design?

Ans. This is an experimental design which is used to test a large number of germplasm lines in a limited experimental area. In this design, standard or check varieties are replicated among the germplasm lines after a definite number of germplasm lines. The concept of augmented design was developed by Federer (1956).

Q. 48. What is split plot design?

Ans. The experimental design in which experimental plots are divided into main plots, sub-plots and ultimate plots is called split plot design. In this design several factors are studied simultaneously with different levels of precision. It includes factors like irrigation, sowing date, seed rate etc.
