General Linear Hypothesis and Analysis of Variance
Regression model for the general linear hypothesis
Let $Y_1, Y_2, \ldots, Y_n$ be a sequence of $n$ independent random variables associated with responses. Then
we can write
$$E(Y_i) = \sum_{j=1}^{p} \beta_j x_{ij}, \quad i = 1, 2, \ldots, n,$$
$$\mathrm{Var}(Y_i) = \sigma^2.$$
This is the linear model in the expectation form, where $\beta_1, \beta_2, \ldots, \beta_p$ are the unknown parameters and
the $x_{ij}$'s are the known values of the independent covariates $X_1, X_2, \ldots, X_p$.
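Alternatively, the linear model can be expressed as
$$Y_i = \sum_{j=1}^{p} \beta_j x_{ij} + \varepsilon_i, \quad i = 1, 2, \ldots, n,$$
where the $\varepsilon_i$'s are independently and identically distributed random error components with mean 0 and
variance $\sigma^2$, i.e., $E(\varepsilon_i) = 0$, $\mathrm{Var}(\varepsilon_i) = \sigma^2$ and $\mathrm{Cov}(\varepsilon_i, \varepsilon_j) = 0$ $(i \neq j)$.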
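In matrix notation, the linear model can be expressed as
$$Y = X\beta + \varepsilon$$
where
$Y = (Y_1, Y_2, \ldots, Y_n)'$ is an $n \times 1$ vector of observations on the response variable,
the matrix
$$X = \begin{pmatrix} X_{11} & X_{12} & \cdots & X_{1p} \\ X_{21} & X_{22} & \cdots & X_{2p} \\ \vdots & \vdots & & \vdots \\ X_{n1} & X_{n2} & \cdots & X_{np} \end{pmatrix}$$
is an $n \times p$ matrix of $n$ observations on the $p$ independent covariates $X_1, X_2, \ldots, X_p$,
$\beta = (\beta_1, \beta_2, \ldots, \beta_p)'$ is a $p \times 1$ vector of unknown regression parameters (or regression
coefficients) associated with $X_1, X_2, \ldots, X_p$, respectively, and
$\varepsilon = (\varepsilon_1, \varepsilon_2, \ldots, \varepsilon_n)'$ is an $n \times 1$ vector of random errors or disturbances.
We assume that $E(\varepsilon) = 0$, the covariance matrix $V(\varepsilon) = E(\varepsilon \varepsilon') = \sigma^2 I_n$, and $\mathrm{rank}(X) = p$.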
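As a minimal sketch of the matrix form, the following pure-Python snippet computes the mean vector of the responses (the product of the design matrix and the parameter vector) and then adds simulated errors. The numeric values of `X`, `beta` and `sigma` are hypothetical, and the errors are drawn from a normal distribution purely for illustration; the model itself only assumes errors with mean 0, common variance and zero covariance.

```python
import random

def mean_response(X, beta):
    """Return the vector X beta, i.e. E(Y_i) = sum_j beta_j * x_ij for each row i."""
    return [sum(b * x for b, x in zip(beta, row)) for row in X]

def simulate_response(X, beta, sigma, rng):
    """Return one draw of Y = X beta + eps with independent N(0, sigma^2) errors.

    Normality is an assumption made only for this illustration.
    """
    return [m + rng.gauss(0.0, sigma) for m in mean_response(X, beta)]

# Hypothetical data: n = 4 observations on p = 2 covariates.
X = [[1.0, 2.0],
     [1.0, 3.0],
     [1.0, 5.0],
     [1.0, 7.0]]
beta = [0.5, 2.0]

expected = mean_response(X, beta)                              # E(Y) = X beta
y = simulate_response(X, beta, sigma=1.0, rng=random.Random(0))  # one simulated Y
```

Setting `sigma` to 0 reproduces the mean vector exactly, which is a quick way to check that the expectation form and the error form of the model agree.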
Analysis of Variance | Chapter 2 | General Linear Hypothesis and Anova | Shalabh, IIT Kanpur
In the context of analysis of variance and design of experiments,
the matrix $X$ is termed the design matrix;
the unknown $\beta_1, \beta_2, \ldots, \beta_p$ are termed effects;
the covariates $X_1, X_2, \ldots, X_p$ are counter variables or indicator variables, where $x_{ij}$ counts
the number of times the effect $\beta_j$ occurs in the $i$th observation $x_i$.
$x_{ij}$ mostly takes the values 1 or 0, but not always.
The value $x_{ij} = 1$ indicates the presence of effect $\beta_j$ in $x_i$, and $x_{ij} = 0$ indicates the absence
of effect $\beta_j$ in $x_i$.
Note that in the linear regression model, the covariates are usually continuous variables.
When some of the covariates are counter variables and the rest are continuous variables, the
model is called a mixed model and is used in the analysis of covariance.
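The indicator coding described above can be sketched in a few lines of Python. The effect labels and the observed sequence below are hypothetical, and the helper `indicator_rows` is an illustrative name, not part of any library. The sketch covers only the common 0/1 case, not covariates that count more than one occurrence of an effect.

```python
def indicator_rows(observed_effects, effects):
    """Return rows x_i where x_ij = 1 if effect j is present in observation i, else 0."""
    return [[1 if effect == present else 0 for effect in effects]
            for present in observed_effects]

# Hypothetical effect labels and the effect present in each of five observations.
effects = ["A", "B", "C"]
observed = ["A", "C", "B", "A", "C"]

X = indicator_rows(observed, effects)
for label, row in zip(observed, X):
    print(label, row)
```

Each row contains a single 1, in the column of the effect that is present in that observation.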
Relationship between the regression model and analysis of variance model
The same linear model is used in the linear regression analysis as well as in the analysis of variance.
So it is important to understand the role of a linear model in the context of linear regression analysis
and analysis of variance.
Consider the multiple linear model
$$Y = \beta_0 + X_1 \beta_1 + X_2 \beta_2 + \ldots + X_p \beta_p + \varepsilon.$$
In the case of analysis of variance model,
the one-way classification considers only one covariate,
two-way classification model considers two covariates,
three-way classification model considers three covariates and so on.
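If $\beta$, $\gamma$ and $\delta$ denote the effects associated with the covariates $X$, $Z$ and $W$, which are the counter
variables, then in
One-way model: $Y = \alpha + X\beta + \varepsilon$
Two-way model: $Y = \alpha + X\beta + Z\gamma + \varepsilon$
Three-way model: $Y = \alpha + X\beta + Z\gamma + W\delta + \varepsilon$, and so on.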
Consider an example of agricultural yield. The study variable $Y$ denotes the yield, which depends on
various covariates $X_1, X_2, \ldots, X_p$. In the case of regression analysis, the covariates $X_1, X_2, \ldots, X_p$ are
different variables like temperature, the quantity of fertilizer, the amount of irrigation, etc.
Now consider the one-way model and its interpretation in terms of the
multiple regression model. The covariate $X$ is now measured at different levels; e.g., if $X$ is the
quantity of fertilizer, suppose there are $p$ possible values, say 1 Kg., 2 Kg., ..., $p$ Kg. Then
$X_1, X_2, \ldots, X_p$ denote these $p$ values in the following way.
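The linear model can now be expressed as
$$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \ldots + \beta_p X_p + \varepsilon$$
by defining
$$X_1 = \begin{cases} 1 & \text{if effect of 1 Kg. fertilizer is present} \\ 0 & \text{if effect of 1 Kg. fertilizer is absent} \end{cases}$$
$$X_2 = \begin{cases} 1 & \text{if effect of 2 Kg. fertilizer is present} \\ 0 & \text{if effect of 2 Kg. fertilizer is absent} \end{cases}$$
$$\vdots$$
$$X_p = \begin{cases} 1 & \text{if effect of } p \text{ Kg. fertilizer is present} \\ 0 & \text{if effect of } p \text{ Kg. fertilizer is absent.} \end{cases}$$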
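If the effect of 1 Kg. of fertilizer is present, then the other effects are necessarily absent and the linear
model is expressible as
$$Y = \beta_0 + \beta_1 (X_1 = 1) + \beta_2 (X_2 = 0) + \ldots + \beta_p (X_p = 0) + \varepsilon = \beta_0 + \beta_1 + \varepsilon.$$
If the effect of 2 Kg. of fertilizer is present, then
$$Y = \beta_0 + \beta_1 (X_1 = 0) + \beta_2 (X_2 = 1) + \ldots + \beta_p (X_p = 0) + \varepsilon = \beta_0 + \beta_2 + \varepsilon.$$
If the effect of $p$ Kg. of fertilizer is present, then
$$Y = \beta_0 + \beta_1 (X_1 = 0) + \beta_2 (X_2 = 0) + \ldots + \beta_p (X_p = 1) + \varepsilon = \beta_0 + \beta_p + \varepsilon,$$
and so on.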
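The reduction above, where only one indicator equals 1 and the mean response collapses to the intercept plus a single effect, can be checked numerically. This is a small sketch with hypothetical parameter values; the function name `one_way_mean` is illustrative.

```python
def one_way_mean(level, beta0, effects):
    """Return E(Y) when only the indicator for `level` (1-based) equals 1."""
    p = len(effects)
    x = [1 if j == level else 0 for j in range(1, p + 1)]   # X_1, ..., X_p
    return beta0 + sum(b * xj for b, xj in zip(effects, x))

# Hypothetical values: intercept beta_0 and effects beta_1, beta_2, beta_3
# for 1 Kg., 2 Kg. and 3 Kg. of fertilizer.
beta0 = 10.0
effects = [1.5, 2.5, 4.0]

means = [one_way_mean(j, beta0, effects) for j in (1, 2, 3)]
print(means)
```

Each entry of `means` equals the intercept plus the effect of the level that is present, since every other term is multiplied by 0.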
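If the experiment with 1 Kg. of fertilizer is repeated $n_1$ times, then $n_1$ observations on the
response variable are recorded, which can be represented as
$$Y_{11} = \beta_0 + \beta_1 \cdot 1 + \beta_2 \cdot 0 + \ldots + \beta_p \cdot 0 + \varepsilon_{11}$$
$$Y_{12} = \beta_0 + \beta_1 \cdot 1 + \beta_2 \cdot 0 + \ldots + \beta_p \cdot 0 + \varepsilon_{12}$$
$$\vdots$$
$$Y_{1n_1} = \beta_0 + \beta_1 \cdot 1 + \beta_2 \cdot 0 + \ldots + \beta_p \cdot 0 + \varepsilon_{1n_1}$$
If $X_2 = 1$ is repeated $n_2$ times, then on the same lines $n_2$ observations on the
response variable are recorded, which can be represented as
$$Y_{21} = \beta_0 + \beta_1 \cdot 0 + \beta_2 \cdot 1 + \ldots + \beta_p \cdot 0 + \varepsilon_{21}$$
$$Y_{22} = \beta_0 + \beta_1 \cdot 0 + \beta_2 \cdot 1 + \ldots + \beta_p \cdot 0 + \varepsilon_{22}$$
$$\vdots$$
$$Y_{2n_2} = \beta_0 + \beta_1 \cdot 0 + \beta_2 \cdot 1 + \ldots + \beta_p \cdot 0 + \varepsilon_{2n_2}$$
The experiment is continued, and if $X_p = 1$ is repeated $n_p$ times, then on the same lines
$$Y_{p1} = \beta_0 + \beta_1 \cdot 0 + \beta_2 \cdot 0 + \ldots + \beta_p \cdot 1 + \varepsilon_{p1}$$
$$Y_{p2} = \beta_0 + \beta_1 \cdot 0 + \beta_2 \cdot 0 + \ldots + \beta_p \cdot 1 + \varepsilon_{p2}$$
$$\vdots$$
$$Y_{pn_p} = \beta_0 + \beta_1 \cdot 0 + \beta_2 \cdot 0 + \ldots + \beta_p \cdot 1 + \varepsilon_{pn_p}$$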
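All these $n_1, n_2, \ldots, n_p$ observations can be represented as
$$
\begin{pmatrix} y_{11} \\ y_{12} \\ \vdots \\ y_{1n_1} \\ y_{21} \\ y_{22} \\ \vdots \\ y_{2n_2} \\ \vdots \\ y_{p1} \\ y_{p2} \\ \vdots \\ y_{pn_p} \end{pmatrix}
=
\begin{pmatrix}
1 & 1 & 0 & \cdots & 0 \\
1 & 1 & 0 & \cdots & 0 \\
\vdots & \vdots & \vdots & & \vdots \\
1 & 1 & 0 & \cdots & 0 \\
1 & 0 & 1 & \cdots & 0 \\
1 & 0 & 1 & \cdots & 0 \\
\vdots & \vdots & \vdots & & \vdots \\
1 & 0 & 1 & \cdots & 0 \\
\vdots & \vdots & \vdots & & \vdots \\
1 & 0 & 0 & \cdots & 1 \\
1 & 0 & 0 & \cdots & 1 \\
\vdots & \vdots & \vdots & & \vdots \\
1 & 0 & 0 & \cdots & 1
\end{pmatrix}
\begin{pmatrix} \beta_0 \\ \beta_1 \\ \beta_2 \\ \vdots \\ \beta_p \end{pmatrix}
+
\begin{pmatrix} \varepsilon_{11} \\ \varepsilon_{12} \\ \vdots \\ \varepsilon_{1n_1} \\ \varepsilon_{21} \\ \varepsilon_{22} \\ \vdots \\ \varepsilon_{2n_2} \\ \vdots \\ \varepsilon_{p1} \\ \varepsilon_{p2} \\ \vdots \\ \varepsilon_{pn_p} \end{pmatrix}
$$
or
$$Y = X\beta + \varepsilon.$$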
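The stacked design matrix for the one-way layout can be generated from the replicate counts alone. The following is a minimal sketch, assuming a leading column of ones for the intercept followed by one indicator column per level; the function name `one_way_design` and the replicate counts are hypothetical.

```python
def one_way_design(replicates):
    """Build X for a one-way layout: a column of ones, then p indicator columns.

    replicates[j - 1] is n_j, the number of observations taken at level j.
    """
    p = len(replicates)
    X = []
    for level, n_j in enumerate(replicates, start=1):
        row = [1] + [1 if j == level else 0 for j in range(1, p + 1)]
        X.extend(list(row) for _ in range(n_j))   # repeat the row n_j times
    return X

# Hypothetical replicate counts n_1 = 2, n_2 = 3, n_3 = 1.
X = one_way_design([2, 3, 1])
for row in X:
    print(row)
```

Each printed row has a 1 in the first column and exactly one 1 among the indicator columns, matching the block pattern of the matrix above.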