Arnaud Robyns KU Leuven
APPLICATION OF STATISTICS
Schedule:
General Information:
• Exam is written open book (16 out of the 20 points)
• Tasks account for 4 out of the 20 points
• Retake: same but tasks are replaced by PC-room practical exam
• Use of RStudio is important
• Core course: "Applications of Statistics 2023-2024.pdf"
• Your own notes can also be brought to the exam (but no loose sheets)
• Mock exam with answers is available on Toledo (ATSTAT)
PART 1
Regression: Introduction
Data can be classified in two ways. The first is by how the data were obtained: observational
(empirical) data versus experimental data. The second is by how the observations are organized:
cross-sectional data, time series data, and panel data (a combination of the two previous types).
Regression is mainly used for observational data and ANOVA for experimental data.
Linear Regression Model:
The sign of the coefficient attached to an independent variable determines whether an increase
in that variable has a positive or negative effect on the dependent variable. In the linear
models considered here, exponents and interactions (multiplying or dividing variable terms) do
not appear. Logarithms can still appear, on the dependent or the independent side
(recall: lin-lin, log-lin, lin-log, log-log).
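As a reminder of the four functional forms named above, here is a sketch for a single regressor x. The symbols \beta_0, \beta_1 and \varepsilon are assumed notation, not taken from the course text, and the percentage interpretations are the usual approximations for small changes.

```latex
\begin{align*}
\text{lin-lin:} \quad & y = \beta_0 + \beta_1 x + \varepsilon
  && \text{one unit more of } x \text{ changes } y \text{ by } \beta_1 \text{ units} \\
\text{log-lin:} \quad & \ln y = \beta_0 + \beta_1 x + \varepsilon
  && \text{one unit more of } x \text{ changes } y \text{ by about } 100\,\beta_1 \,\% \\
\text{lin-log:} \quad & y = \beta_0 + \beta_1 \ln x + \varepsilon
  && 1\,\% \text{ more of } x \text{ changes } y \text{ by about } \beta_1 / 100 \text{ units} \\
\text{log-log:} \quad & \ln y = \beta_0 + \beta_1 \ln x + \varepsilon
  && 1\,\% \text{ more of } x \text{ changes } y \text{ by about } \beta_1 \,\% \text{ (elasticity)}
\end{align*}
```

In every form the sign of \beta_1 still gives the direction of the effect.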
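How to find the values? Choose the OLS estimators b that minimize the sum of squared residuals:
$\min_{b} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$, where $\hat{y}_i$ is the value predicted by the fitted line.
The goal is to make the residuals, i.e. the deviations of the observed values from the fitted
line, as small as possible overall.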
Some explanations:
• ‘b’ is an estimator of β, also denoted β̂
• ‘b’ is the value of a random variable (it is stochastic): a different sample gives a different b
• There is a big difference between the sample regression line and the population regression
line (a population includes all members of a defined group, whereas a sample is a subset of
that population used to represent the whole)
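The minimization described above can be sketched for the case of one regressor, where the minimizing values have a closed form. This is a minimal illustration in plain Python, not the course's own code: the function names `ols_simple` and `ssr` and the five data points are made up for the example.

```python
# Minimal sketch: OLS for one regressor via the closed-form solution.
# The data and function names are hypothetical, chosen for illustration.

def ols_simple(x, y):
    """Return (b0, b1) that minimize sum((y_i - b0 - b1 * x_i) ** 2)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Sum of cross-deviations and sum of squared deviations of x
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = s_xy / s_xx           # slope estimate
    b0 = y_bar - b1 * x_bar    # intercept estimate
    return b0, b1

def ssr(x, y, b0, b1):
    """Sum of squared residuals for the line y = b0 + b1 * x."""
    return sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))

x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

b0, b1 = ols_simple(x, y)
print("intercept b0:", round(b0, 4))
print("slope b1:", round(b1, 4))

# Any other line gives a larger sum of squared residuals
print("SSR at OLS estimates:", ssr(x, y, b0, b1))
print("SSR with shifted slope:", ssr(x, y, b0, b1 + 0.1))
```

Changing the data changes b0 and b1, which is the sense in which b is stochastic. In R, `lm(y ~ x)` performs the same fit.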
Regression: Properties of OLS Estimators
Ordinary Least Squares (OLS) estimators minimize the sum of the squared differences between
observed and predicted values. The resulting estimates define the "best fit" line through the
data points, which represents the relationship between the variables.
(A1) The expected value of the error terms is zero
(A2) The variance of the error terms is constant (homoscedasticity)
(A3) No correlation between error terms (recall: cov(x, y) = corr(x, y) · sd(x) · sd(y), so zero
covariance means zero correlation)
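Written formally, the three assumptions read as follows. The symbols \varepsilon_i (error term of observation i) and \sigma^2 (the common error variance) are assumed notation, not taken from the course text.

```latex
\begin{align*}
\text{(A1)} \quad & E(\varepsilon_i) = 0 \\
\text{(A2)} \quad & \operatorname{Var}(\varepsilon_i) = \sigma^2 \quad \text{for all } i \\
\text{(A3)} \quad & \operatorname{Cov}(\varepsilon_i, \varepsilon_j) = 0 \quad \text{for } i \neq j
\end{align*}
```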
In short, OLS estimators are unbiased (E(b) = β), consistent (the estimate converges to the
true value as the sample grows), efficient (smallest variance among all linear unbiased
estimators), BLUE (given homoscedasticity and no perfect multicollinearity), and asymptotically
normal (in very large samples, their distribution can be approximated by a normal distribution).
1. Unbiasedness: On average, OLS estimators provide accurate estimates that are neither
systematically too high nor too low.
2. Consistency: As the sample size increases, the estimates converge to the true population
values.
3. Efficiency: OLS estimators are statistically efficient, meaning they have the smallest
variance among all linear unbiased estimators.
4. Asymptotic Normality: In large samples, the sampling distribution of OLS estimators
approaches a normal distribution.
5. Gauss-Markov Theorem: OLS estimators are best linear unbiased estimators (BLUE)
when certain assumptions (such as no perfect multicollinearity, homoscedasticity, etc.)
are met.
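Unbiasedness and consistency from the list above can be illustrated with a small simulation. This is a sketch under stated assumptions, not a proof: the true values (intercept 1, slope 2, error standard deviation 1), the sample sizes, and the function names `ols_slope` and `simulate_slopes` are all chosen for the example, and the errors are drawn so that A1 to A3 hold.

```python
# Minimal simulation sketch: the OLS slope is right on average (unbiased)
# and varies less around the true value in larger samples (consistent).
# All parameter values below are hypothetical.
import random
import statistics

def ols_slope(x, y):
    """Closed-form OLS slope for one regressor."""
    x_bar = sum(x) / len(x)
    y_bar = sum(y) / len(y)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    return s_xy / s_xx

def simulate_slopes(n, reps, beta0, beta1, sigma, rng):
    """Draw `reps` samples of size n and return the slope estimate of each."""
    slopes = []
    for _ in range(reps):
        x = [rng.uniform(0.0, 10.0) for _ in range(n)]
        # Errors: mean zero, constant variance, independent draws (A1 to A3)
        y = [beta0 + beta1 * xi + rng.gauss(0.0, sigma) for xi in x]
        slopes.append(ols_slope(x, y))
    return slopes

rng = random.Random(42)  # fixed seed so the run is reproducible
BETA0, BETA1, SIGMA = 1.0, 2.0, 1.0

small = simulate_slopes(25, 1000, BETA0, BETA1, SIGMA, rng)
large = simulate_slopes(400, 1000, BETA0, BETA1, SIGMA, rng)

mean_small = statistics.mean(small)
mean_large = statistics.mean(large)
sd_small = statistics.stdev(small)
sd_large = statistics.stdev(large)

print("n = 25:  mean of b1 =", round(mean_small, 3), " sd =", round(sd_small, 3))
print("n = 400: mean of b1 =", round(mean_large, 3), " sd =", round(sd_large, 3))
```

Both means should lie close to the true slope of 2, while the spread of the estimates shrinks as n grows. A histogram of `large` would also look roughly bell-shaped, in line with asymptotic normality.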