Analysis of Covariance
Any scientific experiment is performed to learn something that is unknown about a group of treatments
and to test certain hypotheses about the corresponding treatment effects.
When the variability of the experimental units is small relative to the treatment differences and the experimenter
does not wish to use an experimental design, it is enough to take a large number of observations on each
treatment and compute their mean. The variation around the mean can be made as small as desired by
taking more observations.
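As a rough illustration of the last point, assume the observations on a treatment are independent with a common standard deviation sigma (an assumption not stated above). Then the standard deviation of their mean is sigma / sqrt(n), which shrinks as n grows. The function name and the value sigma = 2.0 below are hypothetical and chosen only for this sketch.

```python
import math


def standard_error_of_mean(sigma: float, n: int) -> float:
    """Standard deviation of the mean of n independent observations,
    each with standard deviation sigma (assumed setting)."""
    if n <= 0:
        raise ValueError("n must be a positive integer")
    return sigma / math.sqrt(n)


# Quadrupling the number of observations halves the spread of the mean.
for n in (1, 4, 16, 64, 256):
    print(n, standard_error_of_mean(2.0, n))
```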
When there is considerable variation among observations on the same treatment and it is not possible to
take an unlimited number of observations, the techniques used for reducing the variation are
(i) use of proper experimental design and
(ii) use of concomitant variables.
The use of concomitant variables is accomplished through the technique of analysis of covariance. If both
techniques fail to control the experimental variability, then the number of replications of the different
treatments (in other words, the number of experimental units) needs to be increased to the point where
adequate control of variability is attained.
Introduction to analysis of covariance model
In the linear model
$Y = X_1 \beta_1 + X_2 \beta_2 + \ldots + X_p \beta_p + \varepsilon,$
if the explanatory variables include both quantitative variables and indicator variables, i.e., some of them
are qualitative and some are quantitative, then the linear model is termed an analysis of covariance
(ANCOVA) model.
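A minimal sketch of such a model, assuming NumPy is available: the design matrix below combines an intercept, one indicator column (a qualitative treatment) and one quantitative covariate. The data are invented for illustration, and the response is constructed without noise so that least squares recovers the coefficients used to build it.

```python
import numpy as np

# Hypothetical data: 6 experimental units.
treatment = np.array([0, 0, 0, 1, 1, 1])               # indicator: 0 = control, 1 = treated
covariate = np.array([1.0, 2.0, 3.0, 1.0, 2.0, 3.0])   # quantitative concomitant variable

# Response built exactly as 1 + 2 * treatment + 0.5 * covariate (no error term).
y = 1.0 + 2.0 * treatment + 0.5 * covariate

# Design matrix with columns: intercept, indicator variable, quantitative variable.
X = np.column_stack([np.ones_like(covariate), treatment, covariate])

# Ordinary least-squares estimate of the coefficient vector.
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
print(beta_hat)
```

The second coefficient plays the role of the treatment effect adjusted for the covariate, and the third is the slope on the covariate.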
Note that the indicator variables do not provide as much information as the quantitative variables. For
example, the quantitative observations on age can be converted into an indicator variable. Let an indicator
variable be
Analysis of Variance | Chapter 12 | Analysis of Covariance | Shalabh, IIT Kanpur
$D = \begin{cases} 1 & \text{if age} \geq 17 \text{ years}, \\ 0 & \text{if age} < 17 \text{ years}. \end{cases}$
Now the following quantitative values of age can be changed into values of the indicator variable D.
Age (years)    D
14             0
15             0
16             0
17             1
20             1
21             1
22             1
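The conversion in the table can be sketched in a few lines of Python, assuming the threshold rule D = 1 for ages of at least 17 years and D = 0 otherwise, which matches the tabulated values. The function name `age_indicator` is hypothetical.

```python
def age_indicator(age: float, threshold: float = 17) -> int:
    """Return 1 if age is at least the threshold (in years), else 0."""
    return 1 if age >= threshold else 0


ages = [14, 15, 16, 17, 20, 21, 22]
D = [age_indicator(a) for a in ages]

for a, d in zip(ages, D):
    print(a, d)
```

Ages 20, 21 and 22 all map to the same value 1, which illustrates why the indicator variable carries less information than the original quantitative variable.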
In many real applications, some variables may be quantitative and others may be qualitative. In such
cases, ANCOVA provides a way out.
It helps in reducing the sum of squares due to error, which in turn is reflected in better model adequacy
diagnostics.
To see how this works:
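In one way model : $Y_{ij} = \mu + \alpha_i + \varepsilon_{ij}$, we have $TSS_1 = SSA_1 + SSE_1$
In two way model : $Y_{ij} = \mu + \alpha_i + \beta_j + \varepsilon_{ij}$, we have $TSS_2 = SSA_2 + SSB_2 + SSE_2$
In three way model : $Y_{ijk} = \mu + \alpha_i + \beta_j + \gamma_k + \varepsilon_{ijk}$, we have $TSS_3 = SSA_3 + SSB_3 + SS\gamma_3 + SSE_3$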
If we have a given data set, then ideally
$TSS_1 = TSS_2 = TSS_3$;
$SSA_1 = SSA_2 = SSA_3$;
$SSB_2 = SSB_3$.
So $SSE_1 \geq SSE_2 \geq SSE_3$.
Note that in the construction of F-statistics, we use
$F = \dfrac{SS(\text{effects}) / df}{SSE / df}.$
So F-statistic essentially depends on the SSEs.
Smaller SSE $\Rightarrow$ large F $\Rightarrow$ more chance of rejection.
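The chain of reasoning above can be checked numerically. The sketch below, assuming NumPy is available, fits a one way and a two way model to the same invented, balanced data set (3 levels of factor A, 4 levels of factor B, one observation per cell) and compares the sums of squares and the F-statistic for factor A. The data, the helper names `sse` and `dummies`, and the effect sizes are all hypothetical.

```python
import numpy as np


def sse(X, y):
    """Residual sum of squares from the least-squares fit of y on X."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    return float(resid @ resid)


def dummies(levels, n_levels):
    """Indicator columns for levels 1..n_levels-1 (level 0 is the baseline)."""
    return (levels[:, None] == np.arange(1, n_levels)).astype(float)


a, b = 3, 4
A = np.repeat(np.arange(a), b)   # level of factor A for each observation
B = np.tile(np.arange(b), a)     # level of factor B for each observation

# Invented effects: factor B varies strongly, factor A only mildly.
alpha = np.array([0.0, 1.0, 2.0])
beta_eff = np.array([0.0, 10.0, 20.0, 30.0])
noise = np.array([0.1, -0.2, 0.1, 0.0,
                  -0.1, 0.1, 0.0, 0.1,
                  0.0, 0.1, -0.1, -0.1])
y = 5.0 + alpha[A] + beta_eff[B] + noise

n = y.size
ones = np.ones((n, 1))
X_A = np.hstack([ones, dummies(A, a)])                   # one way model
X_B = np.hstack([ones, dummies(B, b)])                   # B only (used to isolate SSA)
X_AB = np.hstack([ones, dummies(A, a), dummies(B, b)])   # two way model

tss = sse(ones, y)          # total sum of squares, identical for both models
sse1 = sse(X_A, y)          # error sum of squares, one way model
sse2 = sse(X_AB, y)         # error sum of squares, two way model
ssa1 = tss - sse1           # sum of squares due to A, one way model
ssa2 = sse(X_B, y) - sse2   # sum of squares due to A, two way model

# F-statistics for factor A with the usual degrees of freedom.
f1 = (ssa1 / (a - 1)) / (sse1 / (n - a))
f2 = (ssa2 / (a - 1)) / (sse2 / ((a - 1) * (b - 1)))

print("SSE one way:", sse1, " SSE two way:", sse2)
print("F one way:", f1, " F two way:", f2)
```

In this balanced example SSA is the same in both models, while the two way model moves the variation due to factor B out of the error sum of squares, so the F-statistic for A becomes much larger. The error degrees of freedom also decrease when terms are added, so in other data sets the gain in F depends on how much variation the added term actually explains.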