Summary: Complete exam material for Data Science Methods, Master Econometrics and Master Data Science & Business Analytics, University of Amsterdam

34 pages · Uploaded 16-10-2025 · Written in 2025/2026
Complete summary of the exam material for the course Data Science Methods in the Master Econometrics and Master Data Science and Business Analytics at the University of Amsterdam. The summary is in English and covers all lectures, with extra information on some of the more complex topics.

Data Science Methods Overview CHoogteijling



Data Science Methods
Contents
1 Model Evaluation
  1.1 Linear Models for Regression
  1.2 Generalization Error
  1.3 The Bias-Variance Decomposition
  1.4 Estimating the Expected Prediction Error
  1.5 In-Sample Measures for Generalization Error: AIC and BIC
  1.6 K-fold Cross-Validation (CV)

2 Shrinkage methods
  2.1 Ridge Regression
  2.2 Lasso Regression

3 Dimension reduction
  3.1 Curse of Dimensionality
  3.2 Feature Selection
  3.3 Principal Component Analysis
  3.4 Selecting the Number of Factors L
  3.5 PCA versus Factor Analysis

4 Nonparametric Regression: k-Nearest Neighbors and Kernel Regression
  4.1 k-Nearest Neighbors Method
  4.2 Kernel Regression
  4.3 The MSE of the NW Estimator
  4.4 Local Linear Methods

5 Linear Discriminant Analysis
  5.1 Classification
  5.2 Decision Theory for Classification
  5.3 Linear Methods for Classification
  5.4 Linear Probability Model
  5.5 LDA for Classification
  5.6 Reduced Rank LDA
  5.7 Fisher's Linear Discriminant
  5.8 QDA and Regularized Discriminant Analysis
  5.9 Model Evaluation applied to Classification Problems

6 Logistic Regression and Stochastic Gradient Descent
  6.1 Logistic Regression
  6.2 Training Logistic Regression Models
  6.3 Regularisation of Logistic Regression Models
  6.4 Comparison of Logistic Regression and LDA
  6.5 Newton-Raphson Method
  6.6 Stochastic Gradient Descent

7 Clustering Methods
  7.1 K-Means Clustering
  7.2 Hierarchical Clustering

8 Bayesian Updating
  8.1 Bayes' Rule
  8.2 Bayes Estimators
  8.3 Bayesian Learning: Recursion
  8.4 Bayesian Learning and Ridge Regression

9 Model Averaging
  9.1 Weighting Schemes
  9.2 Consistency and Asymptotic RMSE Optimality
  9.3 Model Averaging for Gaussian Mixture Model

A Background
  A.1 Jensen's Inequality
  A.2 Rayleigh Quotient
  A.3 Logarithm Cribsheet
  A.4 Distributions
  A.5 Eigenvectors and Eigenvalues
  A.6 The Lagrangian Method
  A.7 Matrix Inverse

B Test Questions






1 Model Evaluation
Model performance is measured by how well a model generalizes. There are two potential objectives for model evaluation, and both can play a role.

• Model selection is comparing the performance of different models to identify the best model.
• Model assessment is estimating the ability of a model to perform on new data.

In data-rich situations we can use a train-validation-test split; in cases of insufficient data we can use cross-validation.
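In the data-rich case, the split can be sketched as follows. This is a minimal NumPy sketch; the 60/20/20 ratio and the data sizes are illustrative assumptions, not prescribed by the text:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data set: N observations with p features (sizes are illustrative).
N, p = 1000, 5
X = rng.normal(size=(N, p))
y = X @ rng.normal(size=p) + rng.normal(scale=0.5, size=N)

# Shuffle once, then carve out 60% train / 20% validation / 20% test.
idx = rng.permutation(N)
n_train, n_val = int(0.6 * N), int(0.2 * N)
train_idx, val_idx, test_idx = np.split(idx, [n_train, n_train + n_val])

X_train, y_train = X[train_idx], y[train_idx]   # fit candidate models here
X_val, y_val = X[val_idx], y[val_idx]           # model selection
X_test, y_test = X[test_idx], y[test_idx]       # model assessment, used once
```

The validation set drives model selection; the test set stays untouched until the end, so the final error estimate is not biased by the selection step.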


1.1 Linear Models for Regression
Suppose we have p features X = (X_1, . . . , X_p)^T in the feature space and the target variable Y. We consider the regression model of the form

Y = f(X, β) + ε,   with

f(X, β) = ∑_{m=0}^{M} β_m h_m(X),   ε ~ N(0, σ_ε²)   (error term)

• A linear regression model uses basis functions h_m(X), m = 1, . . . , M, as features, spanning an M-dimensional feature space; the sum starts at m = 0 because h_0(X) = 1 provides the intercept term.

We set up the log-likelihood function to obtain the least squares problem, and then maximize with respect to the noise variance:

1. We have the likelihood function, which determines the model parameters β_m and σ_ε. Here X is the N × (M + 1) matrix with elements X_nm = h_m(x_n), and y = (y_1, . . . , y_N)^T.

   P(y | X, β, σ_ε) = ∏_{n=1}^{N} N(y_n | f(x_n, β), σ_ε²)

2. We take the logarithm of P(y | X, β, σ_ε), where E_D(β) is the sum-of-squared-errors function. This shows that maximizing the likelihood with respect to the β_m is equivalent to minimizing the sum of squared errors.

   ln P(y | X, β, σ_ε) = −N ln σ_ε − (N/2) ln(2π) − E_D(β)/σ_ε²

   E_D(β) = (1/2) ∑_{n=1}^{N} (y_n − f(x_n, β))² = (1/2) ∑_{n=1}^{N} (y_n − β^T h(x_n))²

3. We differentiate the log-likelihood function with respect to β_m.

   ∂ ln P(y | X, β, σ_ε) / ∂β_m = (1/σ_ε²) ∑_{n=1}^{N} (y_n − β^T h(x_n)) h_m(x_n)

4. We set these to zero for m = 0, . . . , M and solve for βm , then we have the normal equations for the
least squares problem.
   β̂ = (X^T X)^{−1} X^T y = X⁺ y,   with X⁺ = (X^T X)^{−1} X^T the Moore-Penrose pseudoinverse

5. We maximize the log-likelihood function with respect to the noise variance σ_ε²:

   σ̂_ε² = (1/N) ∑_{n=1}^{N} (y_n − β̂^T h(x_n))²
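Steps 4 and 5 can be checked numerically. Below is a minimal NumPy sketch in which the basis functions h_m(x) = x^m (a polynomial basis) and all sizes are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulate data from y_n = beta^T h(x_n) + eps_n with h_m(x) = x^m, m = 0..M.
N, M = 500, 3
x = rng.uniform(-1, 1, size=N)
beta_true = np.array([1.0, -2.0, 0.5, 3.0])
X = np.vander(x, M + 1, increasing=True)   # N x (M+1) design matrix, X[n, m] = h_m(x_n)
y = X @ beta_true + rng.normal(scale=0.1, size=N)

# Step 4: normal equations via the Moore-Penrose pseudoinverse X+.
beta_hat = np.linalg.pinv(X) @ y           # same solution as solving X^T X beta = X^T y

# Step 5: maximum-likelihood estimate of the noise variance.
resid = y - X @ beta_hat
sigma2_hat = np.mean(resid ** 2)
```

With enough data, beta_hat recovers beta_true and sigma2_hat approaches the true noise variance 0.01.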




1.2 Generalization Error
We have the loss functions for a trained regression model f̂(X):

L(Y, f̂(X)) = (Y − f̂(X))²   (squared error)
L(Y, f̂(X)) = |Y − f̂(X)|   (absolute error)

The generalization error shows how well the model predicts responses for new data independently drawn from the same population distribution. For the data set T = {(x_n, y_n)}_{n=1}^{N}:

err_T = E_{(X,Y)}[L(Y, f̂(X)) | T]

The expected prediction error quantifies how well a predictive model is expected to perform on new,
unseen data.

err = E_{T,(X,Y)}[L(Y, f̂(X))]
err = E_T[err_T]

The training error err_train is the average loss on the set T the model was trained on (written err_train here to distinguish it from the expected prediction error err):

err_train = (1/N) ∑_{n=1}^{N} L(y_n, f̂(x_n))

• The prediction error is the average discrepancy between the model's predictions and the true values of the dependent variable for new observations.
• The prediction error is the expectation of the generalization error when averaged over all possible sets of observations T, because the observations are drawn independently from the same joint distribution as (X, Y).
• The generalization error should be small to ensure low prediction error on unseen data.
• The generalization error can often not be estimated directly, so we use an estimate of the expected prediction error instead.
• The training error is not a reliable indicator of generalization performance: it can be made arbitrarily small without improving generalization performance.
• Overfitting is when the model is too tailored to the specifics of the noise in the training set.
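The overfitting point can be made concrete by comparing training and test error of a low- and a high-degree polynomial fit. The degrees, sample sizes, and noise level below are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(2)

def fit_poly(x, y, degree):
    """Least-squares polynomial fit via the pseudoinverse; returns coefficients."""
    return np.linalg.pinv(np.vander(x, degree + 1, increasing=True)) @ y

def mse(x, y, beta):
    """Average squared-error loss of the fitted polynomial on (x, y)."""
    return np.mean((y - np.vander(x, len(beta), increasing=True) @ beta) ** 2)

f = lambda x: 1.0 + x - 2.0 * x ** 2            # true mean function
x_tr = rng.uniform(-1, 1, 30)                    # small training set T
y_tr = f(x_tr) + rng.normal(scale=0.3, size=30)
x_te = rng.uniform(-1, 1, 1000)                  # large test set (new data)
y_te = f(x_te) + rng.normal(scale=0.3, size=1000)

train_err = {d: mse(x_tr, y_tr, fit_poly(x_tr, y_tr, d)) for d in (2, 15)}
test_err = {d: mse(x_te, y_te, fit_poly(x_tr, y_tr, d)) for d in (2, 15)}

# Degree 15 drives the training error down by fitting the noise in T,
# yet predicts new observations worse than the degree-2 fit.
assert train_err[15] < train_err[2]
assert test_err[15] > test_err[2]
```

This is exactly the pattern described above: the training error keeps shrinking with model flexibility while generalization deteriorates.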


1.3 The Bias-Variance Decomposition
The prediction error can be decomposed into three terms: the bias (squared) of the estimated model,
plus the variance of the estimated model, plus the variance of the Gaussian noise.

• The bias term measures how much on average our estimated model deviates from the true mean,
given by the function f (X).
• The variance term is the expected (squared) deviation of the estimated model around its mean.
• The third term is an irreducible error, due to the inherent variance in the data-generating process
around its true mean f (X).


err(x_0) = E[(Y − f̂(X))² | X = x_0]
         = (E[f̂(x_0)] − f(x_0))² + E[(f̂(x_0) − E[f̂(x_0)])²] + σ_ε²
         = bias²(f̂(x_0)) + Var(f̂(x_0)) + σ_ε²
         = bias² + variance + σ_ε²
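The decomposition can be verified by Monte Carlo simulation. In the sketch below, the sine-shaped truth, the deliberately biased straight-line model class, and all constants are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(3)

f = lambda x: np.sin(2 * np.pi * x)   # true mean function f(X)
sigma = 0.2                           # sd of the irreducible Gaussian noise
x_grid = np.linspace(0, 1, 50)        # fixed design points
x0 = 0.25                             # point at which we evaluate err(x0)
R = 4000                              # number of simulated training sets T

# For each training set T, fit a straight line and predict at x0.
preds = np.empty(R)
for r in range(R):
    y = f(x_grid) + rng.normal(scale=sigma, size=x_grid.size)
    slope, intercept = np.polyfit(x_grid, y, 1)
    preds[r] = intercept + slope * x0

bias2 = (preds.mean() - f(x0)) ** 2   # squared bias of the estimated model
variance = preds.var()                # variance of the estimated model

# Direct estimate of err(x0): fresh noisy observations at x0, one per fit.
y0 = f(x0) + rng.normal(scale=sigma, size=R)
err_x0 = np.mean((y0 - preds) ** 2)

# err(x0) is approximately bias^2 + variance + sigma_eps^2.
assert abs(err_x0 - (bias2 + variance + sigma ** 2)) < 0.01
```

Because the line cannot track the sine curve, the bias term dominates here; a more flexible model class would trade bias for variance while the σ_ε² term stays irreducible.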


