Chapter 11
Specification Error Analysis
The specification of a linear regression model consists of a formulation of the regression relationships and of
statements or assumptions concerning the explanatory variables and disturbances. If any of these is violated,
e.g., through an incorrect functional form or an improper introduction of the disturbance term in the model,
then a specification error occurs. In a narrower sense, specification error refers to the choice of explanatory variables.
The complete regression analysis depends on the explanatory variables present in the model. Regression
analysis presumes that only correct and important explanatory variables appear in the model. In
practice, after ensuring the correct functional form of the model, the analyst usually has a pool of explanatory
variables which possibly influence the process or experiment. Generally, not all such candidate variables are
used in the regression modeling; instead, a subset of explanatory variables is chosen from this pool.
While choosing a subset of explanatory variables, there are two possible options:
1. In order to make the model as realistic as possible, the analyst may include as many
explanatory variables as possible.
2. In order to make the model as simple as possible, the analyst may include only a small number of
explanatory variables.
In such selections, there can be two types of incorrect model specifications.
1. Omission/exclusion of relevant variables.
2. Inclusion of irrelevant variables.
Now we discuss the statistical consequences arising from both situations.
1. Exclusion of relevant variables:
In order to keep the model simple, the analyst may delete some of the explanatory variables which may be of
importance from the point of view of theoretical considerations. There can be several reasons behind such
decisions, e.g., it may be hard to quantify variables such as taste or intelligence. Sometimes it may be
difficult to obtain accurate observations on variables such as income.
Econometrics | Chapter 11 | Specification Error Analysis | Shalabh, IIT Kanpur
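Let there be $k$ candidate explanatory variables, of which $r$ variables are included and $(k - r)$
variables are to be deleted from the model. So partition $X$ and $\beta$ as
$$X = (X_1 \;\; X_2), \qquad \beta = \begin{pmatrix} \beta_1 \\ \beta_2 \end{pmatrix},$$
where $X$ is $n \times k$, $X_1$ is $n \times r$, $X_2$ is $n \times (k - r)$, $\beta_1$ is $r \times 1$ and $\beta_2$ is $(k - r) \times 1$.
The model $y = X\beta + \varepsilon$, $E(\varepsilon) = 0$, $V(\varepsilon) = \sigma^2 I$ can be expressed as
$$y = X_1\beta_1 + X_2\beta_2 + \varepsilon,$$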
which is called a full model or true model.
After dropping the $(k - r)$ explanatory variables in $X_2$ from the model, the new model is
$$y = X_1\beta_1 + \delta,$$
which is called a misspecified model or false model.
Applying OLS to the false model, the OLSE of $\beta_1$ is
$$b_{1F} = (X_1'X_1)^{-1}X_1'y.$$
The estimation error is obtained as follows:
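$$\begin{aligned}
b_{1F} &= (X_1'X_1)^{-1}X_1'(X_1\beta_1 + X_2\beta_2 + \varepsilon) \\
&= \beta_1 + (X_1'X_1)^{-1}X_1'X_2\beta_2 + (X_1'X_1)^{-1}X_1'\varepsilon, \\
b_{1F} - \beta_1 &= \theta + (X_1'X_1)^{-1}X_1'\varepsilon,
\end{aligned}$$
where $\theta = (X_1'X_1)^{-1}X_1'X_2\beta_2$.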
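As an illustration only (this special case is an added example, not part of the derivation above), take $r = 1$ and $k = 2$, so that $X_1 = x_1$ and $X_2 = x_2$ are single $n \times 1$ columns with elements $x_{1i}$ and $x_{2i}$. The term $\theta = (X_1'X_1)^{-1}X_1'X_2\beta_2$ then reduces to a scalar:

$$\theta = \frac{x_1'x_2}{x_1'x_1}\,\beta_2 = \beta_2 \, \frac{\sum_{i=1}^{n} x_{1i}x_{2i}}{\sum_{i=1}^{n} x_{1i}^2}.$$

In this case $\theta$ is $\beta_2$ multiplied by the slope obtained from regressing $x_2$ on $x_1$ without an intercept, so it is zero exactly when $\sum_i x_{1i}x_{2i} = 0$.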
Thus
$$E(b_{1F} - \beta_1) = \theta + (X_1'X_1)^{-1}X_1'E(\varepsilon) = \theta, \qquad \theta = (X_1'X_1)^{-1}X_1'X_2\beta_2,$$
which is a linear function of $\beta_2$, i.e., the coefficients of excluded variables. So $b_{1F}$ is biased, in general. The
bias vanishes if $X_1'X_2 = 0$, i.e., $X_1$ and $X_2$ are orthogonal or uncorrelated.
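The bias result can be checked numerically. The following is a minimal sketch, assuming NumPy and simulated data; the helper name `omitted_variable_bias`, the sample size, the coefficient values and the way the omitted regressor is generated are all hypothetical choices made for illustration. It computes $(X_1'X_1)^{-1}X_1'X_2\beta_2$ directly, compares it with the average estimation error of $b_{1F}$ over repeated draws of the disturbances, and then repeats the calculation with an omitted regressor made orthogonal to $X_1$.

```python
import numpy as np


def omitted_variable_bias(X1, X2, beta2):
    """Return theta = (X1'X1)^{-1} X1'X2 beta2, the bias of b_1F."""
    # Solve the normal equations rather than forming an explicit inverse.
    return np.linalg.solve(X1.T @ X1, X1.T @ X2 @ beta2)


# Hypothetical simulated design: r = 2 included and k - r = 1 omitted regressor.
rng = np.random.default_rng(0)
n, r = 200, 2
X1 = rng.normal(size=(n, r))
# The omitted regressor is built to be correlated with the first column of X1.
X2 = (0.8 * X1[:, 0] + rng.normal(scale=0.5, size=n)).reshape(n, 1)
beta1 = np.array([1.0, -2.0])
beta2 = np.array([3.0])
sigma = 1.0

theta = omitted_variable_bias(X1, X2, beta2)

# Monte Carlo: keep X1 and X2 fixed, redraw the disturbances,
# and fit the false model y = X1 beta1 + delta each time.
reps = 5000
estimates = np.empty((reps, r))
for i in range(reps):
    eps = rng.normal(scale=sigma, size=n)
    y = X1 @ beta1 + X2 @ beta2 + eps
    estimates[i] = np.linalg.solve(X1.T @ X1, X1.T @ y)
empirical_bias = estimates.mean(axis=0) - beta1

# Orthogonal case: replace X2 by its residual after projection on X1,
# so that X1'X2 = 0 up to rounding error.
X2_orth = X2 - X1 @ np.linalg.solve(X1.T @ X1, X1.T @ X2)
theta_orth = omitted_variable_bias(X1, X2_orth, beta2)

print("theoretical bias:       ", theta)
print("empirical bias:         ", empirical_bias)
print("bias with orthogonal X2:", theta_orth)
```

With the regressors held fixed, the empirical bias should agree with the theoretical value up to simulation noise, while the orthogonalized regressor should give a bias that is zero up to floating-point error.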
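The mean squared error matrix of $b_{1F}$, with $\theta = (X_1'X_1)^{-1}X_1'X_2\beta_2$, is
$$\begin{aligned}
\mathrm{MSE}(b_{1F}) &= E\left[(b_{1F} - \beta_1)(b_{1F} - \beta_1)'\right] \\
&= E\left[\theta\theta' + \theta\varepsilon'X_1(X_1'X_1)^{-1} + (X_1'X_1)^{-1}X_1'\varepsilon\theta' + (X_1'X_1)^{-1}X_1'\varepsilon\varepsilon'X_1(X_1'X_1)^{-1}\right] \\
&= \theta\theta' + 0 + 0 + \sigma^2 (X_1'X_1)^{-1}X_1' I X_1 (X_1'X_1)^{-1} \\
&= \theta\theta' + \sigma^2 (X_1'X_1)^{-1}.
\end{aligned}$$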
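The mean squared error matrix can be illustrated in the same spirit. This is a sketch under assumed settings, not a definitive implementation: the design, the coefficients and the number of replications are hypothetical, and the disturbances are drawn as normal purely for convenience. It compares the closed form $\theta\theta' + \sigma^2(X_1'X_1)^{-1}$, where $\theta = (X_1'X_1)^{-1}X_1'X_2\beta_2$, with the average outer product of the estimation errors of $b_{1F}$.

```python
import numpy as np

# Hypothetical simulated design: r = 2 included, one omitted regressor.
rng = np.random.default_rng(1)
n, r = 100, 2
X1 = rng.normal(size=(n, r))
X2 = (0.5 * X1[:, 1] + rng.normal(size=n)).reshape(n, 1)
beta1 = np.array([0.5, 1.5])
beta2 = np.array([2.0])
sigma = 1.5

# Closed form: MSE(b_1F) = theta theta' + sigma^2 (X1'X1)^{-1}.
XtX_inv = np.linalg.inv(X1.T @ X1)
theta = XtX_inv @ X1.T @ X2 @ beta2
mse_theory = np.outer(theta, theta) + sigma**2 * XtX_inv

# Monte Carlo with X1 and X2 held fixed: each row of Y is one simulated sample.
reps = 20000
eps = rng.normal(scale=sigma, size=(reps, n))
Y = X1 @ beta1 + X2 @ beta2 + eps   # shape (reps, n) by broadcasting
B = Y @ X1 @ XtX_inv                # row i is b_1F' for sample i
errors = B - beta1                  # estimation errors b_1F - beta1
mse_empirical = errors.T @ errors / reps

print("theoretical MSE matrix:\n", mse_theory)
print("empirical MSE matrix:\n", mse_empirical)
```

The two matrices should agree up to simulation noise. The difference between the MSE matrix and the covariance term $\sigma^2(X_1'X_1)^{-1}$ is the outer product $\theta\theta'$, which is positive semidefinite.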