This walkthrough of the course is relevant for the academic year 25/26; the material can differ per year. The midterm has only multiple-choice (MC) questions. The final exam is 1/3 MC questions and 2/3 open questions. Stata is not involved in either exam.
Week 1: Introduction to Econometrics
Econometrics is the use of statistical methods to estimate and analyze economic relationships using data. It is used to empirically estimate economic relationships, test economic theories, make economic predictions, and evaluate policies.
With observational data (not from a lab), we often see correlation but want causality.
To interpret a regression coefficient causally, we need the ceteris paribus condition (everything else constant) to hold. We want the change in Y due only to X, not because other things moved too. The important difference between correlation and causation is that correlation shows that two variables move together, while causation means that changing X creates a change in Y: X → Y, a causal effect.
In a simple linear regression model, we have one dependent (output) variable (Y) and one independent variable (X). The model looks as follows: Y = β0 + β1X + u, where both β0 and β1 are population parameters. Population parameters are the true (unknown) numbers that describe the relationship in the entire population, not just the sample. In this simple regression model, u is the error term. The error term (u) includes everything that is not inside the model itself; Y often depends on many more things besides just X, and these are bundled into u.
It is important to remember that we can only causally interpret β1 if X changes while nothing inside the error term changes systematically. This means that β1 is causal if and only if E[u | X] = 0. An example will make this really clear.
Consider the following economic model, where we want to estimate the effect of education on wages: Wages = β0 + β1Education + u. Suppose that u contains ability, which you did not measure; ability affects both education and wages. More able people tend to get more education, and more able people tend to earn higher wages independent of their educational achievements. Therefore, people with high education tend to have higher ability, which means that E[u | X] = 0 does NOT hold. If education is correlated with the error term (as is the case here), we have an endogeneity problem. It can be solved using exogenous variation, for example a policy rule: people born after a certain date must stay in school longer. This creates a jump in education that is independent of ability (u).
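A minimal simulation sketch of this omitted-ability story (all variable names and numbers are invented for illustration, not taken from the course): ability drives both education and wages, so the simple OLS slope overstates the true effect of education.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 10_000

# Unobserved ability drives both education and wages (illustrative numbers)
ability = rng.normal(size=n)
education = 12 + 2.0 * ability + rng.normal(size=n)                # more able -> more schooling
wages = 5 + 1.0 * education + 3.0 * ability + rng.normal(size=n)   # true effect of education is 1.0

# Simple OLS of wages on education: slope = sample cov(x, y) / sample var(x)
x, y = education, wages
beta1_hat = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)

# Ability sits in the error term and is correlated with education,
# so the estimate comes out well above the true value of 1.0.
print(beta1_hat)
```

Because education and the error term move together here, the regression attributes part of ability's effect on wages to education; this is exactly the endogeneity problem described above.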
There exist three different types of data;
1. Cross-sectional data; many units, one time point
2. Time series data; one unit, many time points
3. Panel data; Many units, many time points (most useful)
It is important that the observations in a dataset are independent and identically distributed (i.i.d.).
Independent means that Yi does not tell you anything about Yj. Identically distributed
means that every observation is drawn from the same population, with the same
population parameters.
If we have the simple linear regression model Y = β0 + β1X + u, we can consider a change in x: Δy = β1Δx + Δu. Dividing by Δx gives Δy/Δx = β1, but only if Δu = 0. This means that we can only speak of causality if the ceteris paribus condition holds (holding everything else (u) constant). β1 is the change in y from a one-unit change in X, if the error term does not change. The error term can contain omitted variables, randomness, non-linearities, and measurement errors.
Using the normalization E[u] = 0, the key assumption is the zero conditional mean: E[u | X] = 0, meaning the average value of the error term is zero for every value of X. So, for people with a low X and people with a high X, the average value of the error term is the same. If this holds, X is exogenous (not contaminated by the error). If it fails, X is endogenous and β1 is not causal. If E[u | X] = 0, then Cov(X, u) = 0.
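As a small worked step (not spelled out in the notes, but a standard textbook derivation), the law of iterated expectations shows why the zero conditional mean implies zero covariance:

```latex
% Why E[u | X] = 0 implies Cov(X, u) = 0, via the law of iterated expectations
\begin{align*}
E[u]  &= E\big[\, E[u \mid X] \,\big] = E[0] = 0 \\
E[Xu] &= E\big[\, X \, E[u \mid X] \,\big] = E[X \cdot 0] = 0 \\
\operatorname{Cov}(X, u) &= E[Xu] - E[X]\,E[u] = 0
\end{align*}
```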
In economic models, our goal is to estimate the population parameters. We can do this using OLS (the ordinary least squares principle). OLS chooses β̂0 and β̂1 in such a way that they minimize the sum of squared residuals. The hatted symbols (β̂0, β̂1) denote the OLS estimates computed from the sample (they are the best guesses of the population parameters). x̄ and ȳ are the sample means of the corresponding variables.
We can estimate the population parameters as follows:

β̂1 = Σᵢ (xᵢ − x̄)(yᵢ − ȳ) / Σᵢ (xᵢ − x̄)², with the sums running over i = 1, …, n; this is the OLS slope estimator.

β̂0 = ȳ − β̂1x̄; this is the OLS intercept estimator.

Where x̄ = the mean of the x values and ȳ = the mean of the y values.
The numerator in the calculation of β̂1 is the sample covariance (how x and y move together). The denominator is the sample variance of x (how much spread there is in x). If β̂1 > 0, then x and y are positively correlated, but correlation is not causality. Causality (again) only holds if E[u | X] = 0.
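A minimal numerical sketch (on a made-up toy dataset; the numbers are illustrative, not course material) showing how the slope and intercept formulas are computed by hand, and that they match numpy's built-in least-squares fit:

```python
import numpy as np

# Toy data (hypothetical): years of education (x) and hourly wage (y)
x = np.array([10, 12, 12, 14, 16, 16, 18, 20], dtype=float)
y = np.array([9.0, 11.5, 10.8, 13.0, 15.2, 14.1, 17.5, 19.0])

x_bar, y_bar = x.mean(), y.mean()

# OLS slope: sample covariance of (x, y) divided by sample variance of x
beta1_hat = np.sum((x - x_bar) * (y - y_bar)) / np.sum((x - x_bar) ** 2)

# OLS intercept: forces the regression line through (x_bar, y_bar)
beta0_hat = y_bar - beta1_hat * x_bar

# Cross-check against numpy's least-squares polynomial fit of degree 1
slope_np, intercept_np = np.polyfit(x, y, deg=1)

print(beta0_hat, beta1_hat)     # manual OLS estimates
print(intercept_np, slope_np)   # same numbers from np.polyfit
```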
The key difference between covariance and correlation is that covariance is measured in units (the units of X times the units of Y), while correlation is unit-free and is a number between −1 and 1.
The OLS slope estimate can also be computed using β̂1 = ρ̂XY ⋅ (σ̂Y / σ̂X), where ρ̂XY is the sample correlation (direction and strength, but no units), while σ̂Y / σ̂X converts the unit-free measure into the units of the slope. Therefore, the slope of a regression line is basically correlation, but shown in the units of the variables selected. If Y is measured in euros/hour and X in years of education, then β̂1 is euros/hour per year of education. Correlation could not say that, since it is not measured in units. This relates back to scatterplots: if the data forms an upward-sloping cloud, the covariance is positive, which implies that the correlation is positive, which means that the OLS estimator of the slope (β̂1) tends to be positive as well. We require the covariance between X and the error term to be zero, otherwise the observed change in Y mixes two parts: the causal part from X to Y and the part of u moving with X (confounding). The population correlation can be calculated as follows:

ρXY = Cov(X, Y) / (σX σY)
The key point to keep in mind here is that if β̂1 > 0, then X and Y are positively correlated (in the sample). But correlation alone is not causality (back to endogeneity).
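A quick check (again on invented toy data, for illustration only) that the slope from the covariance formula equals the sample correlation times the ratio of standard deviations:

```python
import numpy as np

# Toy data (hypothetical): x and y with a positive relationship
rng = np.random.default_rng(0)
x = rng.normal(size=200)
y = 2.0 + 0.5 * x + rng.normal(scale=0.8, size=200)

# Slope via sample covariance / sample variance
beta1_cov = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)

# Slope via correlation * (sd of y / sd of x)
rho_xy = np.corrcoef(x, y)[0, 1]
beta1_corr = rho_xy * np.std(y, ddof=1) / np.std(x, ddof=1)

print(np.isclose(beta1_cov, beta1_corr))  # True: the two expressions coincide
```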
The intercept in a regression model (the constant) is the predicted value of Y when X is zero. Again, OLS focuses on minimizing the sum of squared residuals, as shown in the following formula:
min over (β̂0, β̂1) of Σᵢ (yᵢ − β̂0 − β̂1xᵢ)², which is the same as saying min over (β̂0, β̂1) of Σᵢ (yᵢ − ŷᵢ)² = SSR
(the sum of squared residuals); this shows how far the observed points are from the regression line. The smaller the SSR, the better the model fits and the larger the explained variance.
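To illustrate that the OLS formulas really do minimize the SSR, here is a small sketch (toy data, illustrative only) that minimizes the SSR numerically with scipy and compares the result to the closed-form estimates:

```python
import numpy as np
from scipy.optimize import minimize

# Toy data (hypothetical)
rng = np.random.default_rng(1)
x = rng.uniform(0, 10, size=50)
y = 1.0 + 2.0 * x + rng.normal(scale=1.5, size=50)

def ssr(params):
    """Sum of squared residuals for a candidate intercept and slope."""
    b0, b1 = params
    residuals = y - b0 - b1 * x
    return np.sum(residuals ** 2)

# Numerical minimization of the SSR over (beta0, beta1)
result = minimize(ssr, x0=np.array([0.0, 0.0]))

# Closed-form OLS estimates
beta1_hat = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
beta0_hat = y.mean() - beta1_hat * x.mean()

print(result.x)               # (b0, b1) found by the optimizer
print(beta0_hat, beta1_hat)   # same values from the OLS formulas
```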
Other useful measures: SST and SSE;
- SST = Σᵢ (yᵢ − ȳ)², how much the y values vary around their average ȳ.
- SSE = Σᵢ (ŷᵢ − ȳ)², how much variation in y the regression line captures using x.
The difference between SSR and SSE: SSR is the estimated error from your fitted regression. SSR can never increase when you add a variable; it will either stay the same or decrease. SSE is the explained sum of squares; ȳ is the baseline prediction if you used no x: you'd predict everyone's y as the same number, the mean. The ŷᵢ are the model's predictions using x. The differences (ŷᵢ − ȳ) show how much your predictions move away from the baseline. If SSE is small, the predictions stay close to ȳ, so x does not add much beyond the mean.
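A final sketch (same toy-data style as above, not course material) computing the three sums of squares and checking the decomposition SST = SSE + SSR, which holds exactly for OLS with an intercept:

```python
import numpy as np

# Toy data (hypothetical)
rng = np.random.default_rng(2)
x = rng.uniform(0, 10, size=100)
y = 3.0 + 1.5 * x + rng.normal(scale=2.0, size=100)

# Fit OLS and form the fitted values
beta1_hat = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
beta0_hat = y.mean() - beta1_hat * x.mean()
y_hat = beta0_hat + beta1_hat * x

sst = np.sum((y - y.mean()) ** 2)       # total variation around the mean
sse = np.sum((y_hat - y.mean()) ** 2)   # variation captured by the regression line
ssr = np.sum((y - y_hat) ** 2)          # leftover (residual) variation

print(np.isclose(sst, sse + ssr))   # True: SST = SSE + SSR
print(sse / sst)                    # share of variation explained by x (R-squared)
```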