Econometrics
Max Batstra
April 2024
Lecture 1
Background Material
For a random vector, the expectation is defined as a vector composed of the expected values of its corresponding elements:
\[
EX = E\begin{pmatrix} X_1 \\ X_2 \\ \vdots \\ X_n \end{pmatrix}
= \begin{pmatrix} EX_1 \\ EX_2 \\ \vdots \\ EX_n \end{pmatrix}
\]
Definition 1. [Covariance Matrix]
Let $X' \equiv X - EX$. For $X \in \mathbb{R}^n$ the covariance matrix is defined as:
\[
\operatorname{Var}(X) = E\left[(X - EX)(X - EX)^\top\right]
= E\begin{pmatrix}
X_1' X_1' & X_1' X_2' & \cdots & X_1' X_n' \\
X_2' X_1' & X_2' X_2' & \cdots & X_2' X_n' \\
\vdots & \vdots & \ddots & \vdots \\
X_n' X_1' & X_n' X_2' & \cdots & X_n' X_n'
\end{pmatrix}
= \begin{pmatrix}
\operatorname{Var}(X_1) & \operatorname{Cov}(X_1, X_2) & \cdots & \operatorname{Cov}(X_1, X_n) \\
\operatorname{Cov}(X_2, X_1) & \operatorname{Var}(X_2) & \cdots & \operatorname{Cov}(X_2, X_n) \\
\vdots & \vdots & \ddots & \vdots \\
\operatorname{Cov}(X_n, X_1) & \operatorname{Cov}(X_n, X_2) & \cdots & \operatorname{Var}(X_n)
\end{pmatrix}
\]
Some useful properties:
• Var(X) = E[XX⊤] − (EX)(EX)⊤
• Var(AX) = A Var(X)A⊤ for a compatible non-stochastic matrix A (checked numerically after this list)
• Cov(X, Y ) = (Cov(Y, X))⊤
• Var(X + Y ) = Var(X) + Var(Y ) + Cov(X, Y ) + Cov(Y, X)
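The identity Var(AX) = A Var(X)A⊤ also holds exactly for sample covariance matrices, which gives a quick numerical check. A minimal numpy sketch; the simulated sample and the matrix A are arbitrary illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 1000
X = rng.standard_normal((n, 3))          # n draws of a 3-dimensional random vector
A = np.array([[1.0, 2.0, 0.0],
              [0.0, -1.0, 3.0]])         # non-stochastic 2x3 matrix

S = np.cov(X, rowvar=False)              # sample covariance of X   (3x3)
S_AX = np.cov(X @ A.T, rowvar=False)     # sample covariance of AX  (2x2)
print(np.allclose(S_AX, A @ S @ A.T))    # True: Var(AX) = A Var(X) A^T
```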
Some useful properties of conditional expectations:
• For any functions f and h it holds that: E[f (X)h(Y ) | Y ] = E[f (X) | Y ]h(Y )
• If X and Y are independent, then E[X | Y ] = E[X]
• If E[X | Y ] = E[X], then Cov(X, Y ) = 0. The property E[X | Y ] = E[X] is called mean independence
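To see why the last property holds, combine the law of iterated expectations with the first property above (taking f(X) = X and h(Y ) = Y ):
\[
\operatorname{Cov}(X, Y) = E[XY] - E[X]\,E[Y]
= E\big[\,E[XY \mid Y]\,\big] - E[X]\,E[Y]
= E\big[\,E[X \mid Y]\,Y\,\big] - E[X]\,E[Y]
= E[X]\,E[Y] - E[X]\,E[Y] = 0,
\]
where the last step uses E[X | Y ] = E[X].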
The following distributions are related and are extensively used in statistical inference:
• Multivariate normal distribution $X \sim N(\mu, \Sigma)$, $X \in \mathbb{R}^n$, $\mu \in \mathbb{R}^n$:
\[
f(x) = \frac{1}{\sqrt{(2\pi)^n \det(\Sigma)}} \exp\left( -\frac{1}{2} (x - \mu)^\top \Sigma^{-1} (x - \mu) \right), \qquad x \in \mathbb{R}^n
\]
Suppose that:
\[
\begin{pmatrix} X_1 \\ X_2 \end{pmatrix} \sim N\left( \begin{pmatrix} \mu_1 \\ \mu_2 \end{pmatrix}, \begin{pmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{pmatrix} \right)
\]
Then $X_1 \sim N(\mu_1, \Sigma_{11})$ and $X_2 \sim N(\mu_2, \Sigma_{22})$. And:
\[
X_2 \mid X_1 = x \sim N\left( \mu_{2|1}(x), \Sigma_{2|1} \right),
\]
where $\mu_{2|1}(x) = \mu_2 + \Sigma_{21} \Sigma_{11}^{-1} (x - \mu_1)$ and $\Sigma_{2|1} = \Sigma_{22} - \Sigma_{21} \Sigma_{11}^{-1} \Sigma_{12}$ (see the numerical sketch after this list).
• Chi-square distribution $X \sim \chi^2_n$, with $E[X] = n$ and $\operatorname{Var}(X) = 2n$.
Suppose $Z \sim N(0, I_n)$, so that the elements of $Z$ are i.i.d. standard normal random variables; then
\[
X = Z^\top Z = \sum_{i=1}^{n} Z_i^2 \sim \chi^2_{\sum_{i=1}^{n} 1} = \chi^2_n
\]
• Student's t-distribution $X \sim t_n$.
Let $Z \sim N(0, 1)$ and $Y \sim \chi^2_n$ be independent random variables; then:
\[
X = \frac{Z}{\sqrt{Y/n}} \sim t_n
\]
For large $n$ the density of $t_n$ approaches that of $N(0, 1)$. Let $X \sim t_n$; then $E[X]$ does not exist for $n = 1$, and $E[X] = 0$ for $n > 1$, while $\operatorname{Var}(X) = \frac{n}{n-2}$ for $n > 2$. A t-distribution is symmetric around its mean, hence the skewness is zero and the third central moment $\mu_3 = 0$.
• F-distribution $X \sim F_{v_1, v_2}$.
Let $X_1 \sim \chi^2_{v_1}$ and $X_2 \sim \chi^2_{v_2}$ be independent; then
\[
X = \frac{X_1 / v_1}{X_2 / v_2} \sim F_{v_1, v_2}
\]
Note that if $X \sim t_n$, then $X^2 \sim F_{1,n}$ (also checked numerically below).
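The conditional-normal formulas and the $t^2$–$F$ relationship above can be checked numerically. A minimal sketch using numpy and scipy.stats; the particular µ, Σ, conditioning point, and degrees of freedom are arbitrary illustrative choices:

```python
import numpy as np
from scipy import stats

# --- Conditional distribution of a partitioned multivariate normal ---
# Illustrative 3-dimensional example with dim(X1) = 1 and dim(X2) = 2.
mu = np.array([0.0, 1.0, -1.0])
Sigma = np.array([[2.0, 0.5, 0.3],
                  [0.5, 1.0, 0.2],
                  [0.3, 0.2, 1.5]])
mu1, mu2 = mu[:1], mu[1:]
S11, S12 = Sigma[:1, :1], Sigma[:1, 1:]
S21, S22 = Sigma[1:, :1], Sigma[1:, 1:]

x = np.array([1.2])                                  # conditioning value X1 = x
mu_2g1 = mu2 + S21 @ np.linalg.solve(S11, x - mu1)   # mu_{2|1}(x)
Sigma_2g1 = S22 - S21 @ np.linalg.solve(S11, S12)    # Sigma_{2|1}
print(mu_2g1)
print(Sigma_2g1)

# --- If X ~ t_n then X^2 ~ F_{1,n}: the tail probabilities must agree ---
n, c = 7, 2.5
p_F = stats.f.sf(c, 1, n)                    # P(F_{1,n} > c)
p_t = 2 * stats.t.sf(np.sqrt(c), n)          # P(X^2 > c) = P(|X| > sqrt(c))
print(np.isclose(p_F, p_t))                  # True
```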
Taking derivatives with respect to vectors has the following properties.
Let $x \in \mathbb{R}^n$, $a \in \mathbb{R}^n$, and $A \in \mathbb{R}^{n \times m}$, where $a_i$ denotes the $i$-th element of $a$ and $a_j$ the $j$-th column of $A$; then:
• $\dfrac{\partial a^\top x}{\partial x^\top} = \begin{bmatrix} \dfrac{\partial a^\top x}{\partial x_1} & \cdots & \dfrac{\partial a^\top x}{\partial x_n} \end{bmatrix} = \begin{bmatrix} a_1 & \cdots & a_n \end{bmatrix} = a^\top$
• $\dfrac{\partial a^\top x}{\partial x} = \begin{pmatrix} \dfrac{\partial a^\top x}{\partial x_1} \\ \vdots \\ \dfrac{\partial a^\top x}{\partial x_n} \end{pmatrix} = \begin{pmatrix} a_1 \\ \vdots \\ a_n \end{pmatrix} = a$
• $\dfrac{\partial A^\top x}{\partial x^\top} = \begin{pmatrix} \dfrac{\partial a_1^\top x}{\partial x^\top} \\ \vdots \\ \dfrac{\partial a_m^\top x}{\partial x^\top} \end{pmatrix} = \begin{pmatrix} a_1^\top \\ \vdots \\ a_m^\top \end{pmatrix} = A^\top$
• $\dfrac{\partial x^\top A}{\partial x} = \begin{bmatrix} \dfrac{\partial x^\top a_1}{\partial x} & \cdots & \dfrac{\partial x^\top a_m}{\partial x} \end{bmatrix} = \begin{bmatrix} a_1 & \cdots & a_m \end{bmatrix} = A$
If $m = n$, the product rule gives $\frac{\partial x^\top A x}{\partial x} = Ax + A^\top x$; if $A$ is in addition symmetric, this simplifies to:
\[
\frac{\partial x^\top A x}{\partial x} = 2Ax
\]
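The last identity can be verified with a finite-difference gradient check. A minimal numpy sketch; the matrix $A$ and the point $x$ are arbitrary illustrative choices:

```python
import numpy as np

# Verify d(x^T A x)/dx = 2Ax for a symmetric A via central differences.
rng = np.random.default_rng(0)
n = 4
B = rng.standard_normal((n, n))
A = (B + B.T) / 2                        # make A symmetric
x = rng.standard_normal(n)

f = lambda v: v @ A @ v                  # quadratic form x^T A x
eps = 1e-6
grad_fd = np.array([(f(x + eps * e) - f(x - eps * e)) / (2 * eps)
                    for e in np.eye(n)]) # central finite differences
print(np.allclose(grad_fd, 2 * A @ x, atol=1e-6))  # True
```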
Lecture 2
Linear Regression Model and OLS
A pair (Yi , Xi ) is called an observation and the collection of observations is called a sample. The joint
distribution of the sample is called a population. In the linear regression model, it is assumed that
\[
E[Y_i \mid X_i] = \beta_1 X_{i1} + \beta_2 X_{i2} + \cdots + \beta_k X_{ik} = X_i^\top \beta
\]
LRM Assumptions
In this section we formally define the linear regression model. First define:
\[
X \equiv \begin{bmatrix} X_1 & X_2 & \cdots & X_n \end{bmatrix}^\top, \quad
y \equiv \begin{bmatrix} Y_1 & Y_2 & \cdots & Y_n \end{bmatrix}^\top, \quad
\varepsilon \equiv \begin{bmatrix} \varepsilon_1 & \varepsilon_2 & \cdots & \varepsilon_n \end{bmatrix}^\top
\]
Note that here X is a matrix, and y, ε are vectors.
Assumptions (A1)-(A4) are the four classical regression assumptions; assumptions (A1)-(A5) define the classical normal linear regression model (LRM):
(A1) y = Xβ + ε =⇒ the model is linear in β
(A2) E[ε | X] = 0 =⇒ ε is mean independent of X
(A3) Var(ε | X) = σ²In =⇒ the εi have the same variance and are uncorrelated
(A4) rank(X) = k =⇒ X⊤X is invertible
(A5) ε | X ∼ N(0, σ²In) =⇒ independence of the residuals
(A6) {(Yi, Xi) : i = 1, . . . , n} are i.i.d. =⇒ {εi : i = 1, . . . , n} are i.i.d.
(A2∗) E[εi Xi] = 0 =⇒ uncorrelatedness only, a weaker assumption than (A2)
From (A5) it also follows that:
y | X ∼ N(Xβ, σ²In)
Independence of the residuals may be achieved through (A5) or through (A1) and (A6).
Estimation by the Method of Moments
One of the oldest methods of finding estimators is called the Method of Moments (MM). The MM involves
equating theoretical moments with the corresponding sample moments. For MM we need (A2) or (A2∗); the first theoretical moment condition is then:
\[
0 = E[X_i \varepsilon_i] = E\left[ X_i \left( Y_i - X_i^\top \beta \right) \right]
\]
and the corresponding first sample moment condition is (note that from now on we use β̂):
\[
0 = \frac{1}{n} \sum_{i=1}^{n} X_i \left( Y_i - X_i^\top \hat{\beta} \right)
\]
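Solving this sample moment condition for β̂ gives the familiar closed form β̂ = (X⊤X)⁻¹X⊤y, which exists under (A4). A minimal numpy sketch; the true β, the dimensions, and the error variance are arbitrary illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(42)
n, k = 500, 3
beta_true = np.array([1.0, -2.0, 0.5])   # illustrative true coefficients

# Simulate data satisfying (A1)-(A5): y = X beta + eps, eps ~ N(0, sigma^2 I_n)
X = rng.standard_normal((n, k))
y = X @ beta_true + rng.normal(size=n)

# Method-of-moments / OLS estimator: solve (X^T X) beta_hat = X^T y
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
print(beta_hat)                           # close to beta_true

# Verify the sample moment condition (1/n) sum_i X_i (Y_i - X_i^T beta_hat) = 0
moment = X.T @ (y - X @ beta_hat) / n
print(np.allclose(moment, 0.0))           # True up to floating-point error
```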