Chapter 1 Introduction 1
Chapter 2 The Classical Multiple Linear Regression Model 2
Chapter 3 Least Squares 3
Chapter 4 Finite-Sample Properties of the Least Squares Estimator 7
Chapter 5 Large-Sample Properties of the Least Squares and Instrumental Variables Estimators 14
Chapter 6 Inference and Prediction 19
Chapter 7 Functional Form and Structural Change 23
Chapter 8 Specification Analysis and Model Selection 30
Chapter 9 Nonlinear Regression Models 32
Chapter 10 Nonspherical Disturbances - The Generalized Regression Model 37
Chapter 11 Heteroscedasticity 41
Chapter 12 Serial Correlation 49
Chapter 13 Models for Panel Data 53
Chapter 14 Systems of Regression Equations 63
Chapter 15 Simultaneous Equations Models 72
Chapter 16 Estimation Frameworks in Econometrics 78
Chapter 17 Maximum Likelihood Estimation 84
Chapter 18 The Generalized Method of Moments 93
Chapter 19 Models with Lagged Variables 97
Chapter 20 Time Series Models 101
Chapter 21 Models for Discrete Choice 106
Chapter 22 Limited Dependent Variable and Duration Models 112
Appendix A Matrix Algebra 115
Appendix B Probability and Distribution Theory 123
Appendix C Estimation and Inference 134
Appendix D Large Sample Distribution Theory 145
Appendix E Computation and Optimization 146
In the solutions, we denote:
• scalar values with italic, lower case letters, as in $a$ or $\alpha$,
• column vectors with boldface lower case letters, as in $\mathbf{b}$,
• row vectors as transposed column vectors, as in $\mathbf{b}'$,
• single population parameters with Greek letters, as in $\beta$,
• sample estimates of parameters with English letters, as in $b$ as an estimate of $\beta$,
• sample estimates of population parameters with a caret, as in $\hat{\beta}$,
• matrices with boldface upper case letters, as in $\mathbf{M}$ or $\boldsymbol{\Sigma}$,
• cross section observations with subscript i, time series observations with subscript t.
These are consistent with the notation used in the text.
Chapter 3
Least Squares
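1. (a) Let
\[
\mathbf{X} = \begin{bmatrix} 1 & x_1 \\ \vdots & \vdots \\ 1 & x_n \end{bmatrix}.
\]
The normal equations are given by (3-12), $\mathbf{X}'\mathbf{e} = \mathbf{0}$, hence for each of the columns of $\mathbf{X}$, $\mathbf{x}_k$, we know that $\mathbf{x}_k'\mathbf{e} = 0$. This implies that $\sum_i e_i = 0$ and $\sum_i x_i e_i = 0$.
(b) Use $\sum_i e_i = 0$ to conclude from the first normal equation that $a = \bar{y} - b\bar{x}$.
(c) We know that $\sum_i e_i = 0$ and $\sum_i x_i e_i = 0$. It follows that $\sum_i (x_i - \bar{x}) e_i = 0$. Further, the latter implies $\sum_i (x_i - \bar{x})(y_i - a - b x_i) = 0$ or $\sum_i (x_i - \bar{x})\bigl(y_i - \bar{y} - b(x_i - \bar{x})\bigr) = 0$, from which the result follows.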
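The three parts above can be checked numerically. The following is a minimal sketch on simulated data (the seed, sample size, and coefficients are arbitrary assumptions, not from the text). It also assumes that the "result" referred to in part (c) is the usual slope formula $b = \sum_i (x_i - \bar{x})(y_i - \bar{y}) / \sum_i (x_i - \bar{x})^2$.

```python
import numpy as np

# Illustrative data only: seed, n, and true coefficients are arbitrary.
rng = np.random.default_rng(0)
n = 50
x = rng.normal(size=n)
y = 1.0 + 2.0 * x + rng.normal(size=n)

# Least squares fit of y on a constant and x.
X = np.column_stack([np.ones(n), x])
a, b = np.linalg.lstsq(X, y, rcond=None)[0]
e = y - a - b * x

# (a) the two normal equations: sum of e_i and sum of x_i * e_i
sum_e = e.sum()
sum_xe = (x * e).sum()

# (b) intercept recovered from the means
intercept_check = y.mean() - b * x.mean()

# (c) slope from deviations (assumed to be the result the text refers to)
xd = x - x.mean()
slope_check = (xd * (y - y.mean())).sum() / (xd ** 2).sum()

print(sum_e, sum_xe, a - intercept_check, b - slope_check)
```

All four printed quantities should be zero up to floating-point rounding.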
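2. Suppose $\mathbf{b}$ is the least squares coefficient vector in the regression of $\mathbf{y}$ on $\mathbf{X}$ and $\mathbf{c}$ is any other $K \times 1$ vector. Prove that the difference in the two sums of squared residuals is
\[
(\mathbf{y}-\mathbf{X}\mathbf{c})'(\mathbf{y}-\mathbf{X}\mathbf{c}) - (\mathbf{y}-\mathbf{X}\mathbf{b})'(\mathbf{y}-\mathbf{X}\mathbf{b}) = (\mathbf{c}-\mathbf{b})'\mathbf{X}'\mathbf{X}(\mathbf{c}-\mathbf{b}).
\]
Prove that this difference is positive.
Write $\mathbf{c}$ as $\mathbf{b} + (\mathbf{c} - \mathbf{b})$. Then, the sum of squared residuals based on $\mathbf{c}$ is
\[
\begin{aligned}
(\mathbf{y}-\mathbf{X}\mathbf{c})'(\mathbf{y}-\mathbf{X}\mathbf{c})
&= [\mathbf{y}-\mathbf{X}(\mathbf{b}+(\mathbf{c}-\mathbf{b}))]'[\mathbf{y}-\mathbf{X}(\mathbf{b}+(\mathbf{c}-\mathbf{b}))] \\
&= [(\mathbf{y}-\mathbf{X}\mathbf{b}) - \mathbf{X}(\mathbf{c}-\mathbf{b})]'[(\mathbf{y}-\mathbf{X}\mathbf{b}) - \mathbf{X}(\mathbf{c}-\mathbf{b})] \\
&= (\mathbf{y}-\mathbf{X}\mathbf{b})'(\mathbf{y}-\mathbf{X}\mathbf{b}) + (\mathbf{c}-\mathbf{b})'\mathbf{X}'\mathbf{X}(\mathbf{c}-\mathbf{b}) - 2(\mathbf{c}-\mathbf{b})'\mathbf{X}'(\mathbf{y}-\mathbf{X}\mathbf{b}).
\end{aligned}
\]
But the third term is zero, as $2(\mathbf{c}-\mathbf{b})'\mathbf{X}'(\mathbf{y}-\mathbf{X}\mathbf{b}) = 2(\mathbf{c}-\mathbf{b})'\mathbf{X}'\mathbf{e} = 0$. Therefore,
\[
(\mathbf{y}-\mathbf{X}\mathbf{c})'(\mathbf{y}-\mathbf{X}\mathbf{c}) = \mathbf{e}'\mathbf{e} + (\mathbf{c}-\mathbf{b})'\mathbf{X}'\mathbf{X}(\mathbf{c}-\mathbf{b})
\]
or
\[
(\mathbf{y}-\mathbf{X}\mathbf{c})'(\mathbf{y}-\mathbf{X}\mathbf{c}) - \mathbf{e}'\mathbf{e} = (\mathbf{c}-\mathbf{b})'\mathbf{X}'\mathbf{X}(\mathbf{c}-\mathbf{b}).
\]
The right hand side can be written as $\mathbf{d}'\mathbf{d}$ where $\mathbf{d} = \mathbf{X}(\mathbf{c}-\mathbf{b})$, so it is nonnegative, and it is strictly positive whenever $\mathbf{c} \neq \mathbf{b}$ because $\mathbf{X}$ has full column rank. This confirms what we knew at the outset: least squares is least squares.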
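The identity in Exercise 2 can be illustrated with a short numerical sketch. The data, the seed, and the choice of the competing vector `c` are arbitrary assumptions made for the illustration.

```python
import numpy as np

# Illustrative data only: seed and dimensions are arbitrary.
rng = np.random.default_rng(1)
n, K = 40, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, K - 1))])
y = rng.normal(size=n)

# Least squares coefficients and an arbitrary competitor c != b.
b = np.linalg.solve(X.T @ X, X.T @ y)
c = b + rng.normal(size=K)

# Sums of squared residuals under c and under b.
ssr_c = (y - X @ c) @ (y - X @ c)
ssr_b = (y - X @ b) @ (y - X @ b)

# Right hand side of the identity, written as d'd with d = X(c - b).
d = X @ (c - b)
quad = d @ d

print(ssr_c - ssr_b, quad)
```

The two printed numbers should agree up to rounding, and both should be positive because `c` differs from `b`.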
3. Consider the least squares regression of y on K variables (with a constant), X. Consider an alternative set of
regressors, Z = XP, where P is a nonsingular matrix. Thus, each column of Z is a mixture of some of the
columns of X. Prove that the residual vectors in the regressions of y on X and y on Z are identical. What
relevance does this have to the question of changing the fit of a regression by changing the units of
measurement of the independent variables?
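The residual vector in the regression of $\mathbf{y}$ on $\mathbf{X}$ is $\mathbf{M}_X\mathbf{y} = [\mathbf{I} - \mathbf{X}(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}']\mathbf{y}$. The residual vector in the regression of $\mathbf{y}$ on $\mathbf{Z}$ is
\[
\begin{aligned}
\mathbf{M}_Z\mathbf{y} &= [\mathbf{I} - \mathbf{Z}(\mathbf{Z}'\mathbf{Z})^{-1}\mathbf{Z}']\mathbf{y} \\
&= [\mathbf{I} - \mathbf{X}\mathbf{P}((\mathbf{X}\mathbf{P})'(\mathbf{X}\mathbf{P}))^{-1}(\mathbf{X}\mathbf{P})']\mathbf{y} \\
&= [\mathbf{I} - \mathbf{X}\mathbf{P}\mathbf{P}^{-1}(\mathbf{X}'\mathbf{X})^{-1}(\mathbf{P}')^{-1}\mathbf{P}'\mathbf{X}']\mathbf{y} \\
&= \mathbf{M}_X\mathbf{y}.
\end{aligned}
\]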
Since the residual vectors are identical, the fits must be as well. Changing the units of measurement of the
regressors is equivalent to postmultiplying by a diagonal P matrix whose kth diagonal element is the scale
factor to be applied to the kth variable (1 if it is to be unchanged). It follows from the result above that this
will not change the fit of the regression.
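As a numerical illustration of Exercise 3, the sketch below compares residuals from regressing on X and on Z = XP. The data, the seed, and the particular P are assumptions; P is built as an upper triangular matrix with a nonzero diagonal so that it is nonsingular by construction, and its diagonal mimics a change of units.

```python
import numpy as np

# Illustrative data only: seed and dimensions are arbitrary.
rng = np.random.default_rng(2)
n, K = 30, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, K - 1))])
y = rng.normal(size=n)

# Upper triangular P with nonzero diagonal, hence nonsingular.
# The diagonal entries act like unit changes (e.g. times 100, times 0.01).
P = np.triu(rng.normal(size=(K, K)), k=1) + np.diag([1.0, 100.0, 0.01])
Z = X @ P


def residuals(A, v):
    """Least squares residuals from regressing v on the columns of A."""
    coef = np.linalg.lstsq(A, v, rcond=None)[0]
    return v - A @ coef


e_X = residuals(X, y)
e_Z = residuals(Z, y)
print(np.max(np.abs(e_X - e_Z)))
```

The printed maximum difference should be zero up to rounding, although the coefficient vectors themselves differ between the two regressions.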
4. In the least squares regression of y on a constant and X, in order to compute the regression coefficients on
X, we can first transform y to deviations from the mean, $\bar{y}$, and, likewise, transform each column of X to
deviations from the respective column means; second, regress the transformed y on the transformed X without
a constant. Do we get the same result if we only transform y? What if we only transform X?
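In the regression of $\mathbf{y}$ on $\mathbf{i}$ and $\mathbf{X}$, the coefficients on $\mathbf{X}$ are $\mathbf{b} = (\mathbf{X}'\mathbf{M}^0\mathbf{X})^{-1}\mathbf{X}'\mathbf{M}^0\mathbf{y}$. $\mathbf{M}^0 = \mathbf{I} - \mathbf{i}(\mathbf{i}'\mathbf{i})^{-1}\mathbf{i}'$ is the matrix which transforms observations into deviations from their column means. Since $\mathbf{M}^0$ is idempotent and symmetric we may also write the preceding as $[(\mathbf{X}'\mathbf{M}^{0\prime})(\mathbf{M}^0\mathbf{X})]^{-1}(\mathbf{X}'\mathbf{M}^{0\prime}\mathbf{M}^0\mathbf{y})$, which implies that the regression of $\mathbf{M}^0\mathbf{y}$ on $\mathbf{M}^0\mathbf{X}$ produces the least squares slopes. If only $\mathbf{X}$ is transformed to deviations, we would compute $[(\mathbf{X}'\mathbf{M}^{0\prime})(\mathbf{M}^0\mathbf{X})]^{-1}(\mathbf{X}'\mathbf{M}^{0\prime})\mathbf{y}$ but, of course, this is identical. However, if only $\mathbf{y}$ is transformed, the result is $(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{M}^0\mathbf{y}$, which is likely to be quite different. We can extend the result in (6-24) to derive what is produced by this computation. In the formulation, we let $\mathbf{X}_1$ be $\mathbf{X}$ and $\mathbf{X}_2$ be the column of ones, so that $\mathbf{b}_2$ is the least squares intercept. Thus, the coefficient vector $\mathbf{b}$ defined above would be $\mathbf{b} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'(\mathbf{y} - a\mathbf{i})$. But $a = \bar{y} - \mathbf{b}'\bar{\mathbf{x}}$, so $\mathbf{b} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'(\mathbf{y} - \mathbf{i}(\bar{y} - \mathbf{b}'\bar{\mathbf{x}}))$. We can partition this result to produce
\[
(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'(\mathbf{y} - \mathbf{i}\bar{y}) = \mathbf{b} - (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{i}(\mathbf{b}'\bar{\mathbf{x}}) = (\mathbf{I} - n(\mathbf{X}'\mathbf{X})^{-1}\bar{\mathbf{x}}\bar{\mathbf{x}}')\mathbf{b}.
\]
(The last result follows from $\mathbf{X}'\mathbf{i} = n\bar{\mathbf{x}}$.) This does not provide much guidance, of course, beyond the observation that if the means of the regressors are not zero, the resulting slope vector will differ from the correct least squares coefficient vector.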
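The three cases in Exercise 4 can be compared numerically. The sketch below uses simulated regressors with nonzero means (the data, seed, and coefficients are assumptions) and checks the closed form derived above for the case in which only y is transformed.

```python
import numpy as np

# Illustrative data only: seed, means, and coefficients are arbitrary.
rng = np.random.default_rng(3)
n, K = 60, 2
X = rng.normal(loc=[1.0, -2.0], size=(n, K))  # regressors, no constant
y = 0.5 + X @ np.array([1.0, -1.0]) + rng.normal(size=n)


def ols(A, v):
    """Least squares coefficients from regressing v on the columns of A."""
    return np.linalg.lstsq(A, v, rcond=None)[0]


# Reference: slopes from the regression that includes a constant.
b_full = ols(np.column_stack([np.ones(n), X]), y)[1:]

# Deviations from means.
xbar = X.mean(axis=0)
Xd = X - xbar
yd = y - y.mean()

b_both = ols(Xd, yd)    # transform y and X
b_only_X = ols(Xd, y)   # transform X only
b_only_y = ols(X, yd)   # transform y only

# Closed form for the y-only case: (I - n (X'X)^{-1} xbar xbar') b
adjust = np.eye(K) - n * np.linalg.solve(X.T @ X, np.outer(xbar, xbar))
predicted = adjust @ b_full

print(b_full, b_both, b_only_X, b_only_y, predicted)
```

In this sketch the first three coefficient vectors should coincide, while the y-only result should match `predicted` rather than `b_full`.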
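5. What is the result of the matrix product $\mathbf{M}_1\mathbf{M}$ where $\mathbf{M}_1$ is defined in (3-19) and $\mathbf{M}$ is defined in (3-14)?
\[
\mathbf{M}_1\mathbf{M} = (\mathbf{I} - \mathbf{X}_1(\mathbf{X}_1'\mathbf{X}_1)^{-1}\mathbf{X}_1')(\mathbf{I} - \mathbf{X}(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}') = \mathbf{M} - \mathbf{X}_1(\mathbf{X}_1'\mathbf{X}_1)^{-1}\mathbf{X}_1'\mathbf{M}
\]
There is no need to multiply out the second term. Each column of $\mathbf{M}\mathbf{X}_1$ is the vector of residuals in the regression of the corresponding column of $\mathbf{X}_1$ on all of the columns in $\mathbf{X}$. Since that column is itself one of the columns in $\mathbf{X}$, this regression provides a perfect fit, so the residuals are zero. Thus, $\mathbf{M}\mathbf{X}_1$ is a matrix of zeroes, and so is its transpose $\mathbf{X}_1'\mathbf{M}$ (because $\mathbf{M}$ is symmetric), which implies that $\mathbf{M}_1\mathbf{M} = \mathbf{M}$.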
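A quick numerical sketch of this result follows. It assumes, as the argument above does, that X1 consists of a subset of the columns of X; the data, seed, and the helper name `residual_maker` are illustrative assumptions.

```python
import numpy as np

# Illustrative data only: seed and dimensions are arbitrary.
rng = np.random.default_rng(4)
n, K = 25, 4
X = rng.normal(size=(n, K))
X1 = X[:, :2]  # assumed: X1 is the first two columns of X


def residual_maker(A):
    """Return I - A (A'A)^{-1} A' for a full column rank matrix A."""
    return np.eye(A.shape[0]) - A @ np.linalg.solve(A.T @ A, A.T)


M = residual_maker(X)
M1 = residual_maker(X1)

# M X1 should be a matrix of zeros, so M1 M should equal M.
print(np.max(np.abs(M @ X1)), np.max(np.abs(M1 @ M - M)))
```

Both printed values should be zero up to rounding.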
6. Adding an observation. A data set consists of $n$ observations on $\mathbf{X}_n$ and $\mathbf{y}_n$. The least squares estimator based on these $n$ observations is $\mathbf{b}_n = (\mathbf{X}_n'\mathbf{X}_n)^{-1}\mathbf{X}_n'\mathbf{y}_n$. Another observation, $\mathbf{x}_s$ and $y_s$, becomes available. Prove that the least squares estimator computed using this additional observation is
\[
\mathbf{b}_{n,s} = \mathbf{b}_n + \frac{1}{1 + \mathbf{x}_s'(\mathbf{X}_n'\mathbf{X}_n)^{-1}\mathbf{x}_s}\,(\mathbf{X}_n'\mathbf{X}_n)^{-1}\mathbf{x}_s\,(y_s - \mathbf{x}_s'\mathbf{b}_n).
\]
Note that the last term is $e_s$, the residual from the prediction of $y_s$ using the coefficients based on $\mathbf{X}_n$ and $\mathbf{b}_n$. Conclude that the new data change the results of least squares only if the new observation on $y$ cannot be perfectly predicted using the information already in hand.
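The updating formula in Exercise 6 can be checked against a direct recomputation. This is a minimal sketch on simulated data; the seed, dimensions, and variable names are assumptions made for the illustration, and it is a numerical check rather than the requested proof.

```python
import numpy as np

# Illustrative data only: seed and dimensions are arbitrary.
rng = np.random.default_rng(5)
n, K = 20, 3
Xn = rng.normal(size=(n, K))
yn = rng.normal(size=n)

# The additional observation (x_s, y_s).
xs = rng.normal(size=K)
ys = rng.normal()

# Estimator based on the first n observations.
A_inv = np.linalg.inv(Xn.T @ Xn)
bn = A_inv @ Xn.T @ yn

# Updating formula: b_n + (X'X)^{-1} x_s e_s / (1 + x_s'(X'X)^{-1} x_s)
es = ys - xs @ bn
b_update = bn + (A_inv @ xs) * es / (1.0 + xs @ A_inv @ xs)

# Direct least squares on all n + 1 observations.
X_all = np.vstack([Xn, xs])
y_all = np.append(yn, ys)
b_direct = np.linalg.lstsq(X_all, y_all, rcond=None)[0]

print(b_update, b_direct)
```

The two coefficient vectors should agree up to rounding. If `es` were exactly zero, `b_update` would equal `bn`, which is the conclusion the exercise asks for.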
7. A common strategy for handling a case in which an observation is missing data for one or more variables is to fill those missing variables with 0s and add a variable to the model that takes the value 1 for that one observation and 0 for all other observations. Show that this 'strategy' is equivalent to discarding the observation as regards the computation of $\mathbf{b}$ but it does have an effect on $R^2$. Consider the special case in which $\mathbf{X}$ contains only a constant and one variable. Show that replacing the missing values of $\mathbf{X}$ with the mean of the complete observations has the same effect as adding the new variable.
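The claims in Exercise 7 can be explored numerically before attempting the proof. The sketch below is an illustration on simulated data, not a solution: the seed and coefficients are arbitrary, the last observation is treated as the one with a missing regressor, and its y value is shifted so that the difference in $R^2$ is easy to see. The mean-fill comparison is made on the slope only.

```python
import numpy as np

# Illustrative data only: seed and coefficients are arbitrary.
rng = np.random.default_rng(6)
n = 30
x = rng.normal(loc=2.0, size=n)
y = 1.0 + 0.5 * x + rng.normal(size=n)
y[-1] += 5.0  # assumed: make the incomplete observation's y unusual


def fit(A, v):
    """Return least squares coefficients and R^2 (constant is in A)."""
    coef = np.linalg.lstsq(A, v, rcond=None)[0]
    resid = v - A @ coef
    r2 = 1.0 - (resid @ resid) / ((v - v.mean()) @ (v - v.mean()))
    return coef, r2


# (i) Discard the last observation, whose x is treated as missing.
b_drop, r2_drop = fit(np.column_stack([np.ones(n - 1), x[:-1]]), y[:-1])

# (ii) Fill the missing x with 0 and add a dummy for that observation.
x_zero = x.copy()
x_zero[-1] = 0.0
dummy = np.zeros(n)
dummy[-1] = 1.0
b_dummy, r2_dummy = fit(np.column_stack([np.ones(n), x_zero, dummy]), y)

# (iii) Fill the missing x with the mean of the complete observations.
x_mean = x.copy()
x_mean[-1] = x[:-1].mean()
b_mean, _ = fit(np.column_stack([np.ones(n), x_mean]), y)

print(b_drop, b_dummy[:2], b_mean[1], r2_drop, r2_dummy)
```

In this sketch the constant and slope from (ii) should match those from (i), the slope from (iii) should match as well, and the two $R^2$ values should differ.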
8. Let Y denote total expenditure on consumer durables, nondurables, and services, and Ed, En, and Es are the
expenditures on the three categories. As defined, Y = Ed + En + Es. Now, consider the expenditure system
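\[
\begin{aligned}
E_d &= \alpha_d + \beta_d Y + \gamma_{dd}P_d + \gamma_{dn}P_n + \gamma_{ds}P_s + \varepsilon_d \\
E_n &= \alpha_n + \beta_n Y + \gamma_{nd}P_d + \gamma_{nn}P_n + \gamma_{ns}P_s + \varepsilon_n \\
E_s &= \alpha_s + \beta_s Y + \gamma_{sd}P_d + \gamma_{sn}P_n + \gamma_{ss}P_s + \varepsilon_s.
\end{aligned}
\]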
Prove that if all equations are estimated by ordinary least squares, then the sum of the income coefficients will
be 1 and the four other column sums in the preceding model will be zero.
For convenience, reorder the variables so that X = [i, Pd, Pn, Ps, Y]. The three dependent variables
are Ed, En, and Es, and Y = Ed + En + Es. The coefficient vectors are
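$\mathbf{b}_d = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{E}_d$, $\mathbf{b}_n = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{E}_n$, and $\mathbf{b}_s = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{E}_s$.
The sum of the three vectors is
$\mathbf{b} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'[\mathbf{E}_d + \mathbf{E}_n + \mathbf{E}_s] = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{Y}$.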
Now, Y is the last column of X, so the preceding sum is the vector of least squares coefficients in the
regression of the last column of X on all of the columns of X, including the last. Of course, we get a perfect