Summary: Complete exam material for Data Science Methods, Master Econometrics and Master Data Science & Business Analytics, University of Amsterdam

34 pages · Uploaded 16-10-2025 · Written in 2025/2026
Complete summary of the exam material for the course Data Science Methods in the Master Econometrics and Master Data Science and Business Analytics at the University of Amsterdam. The summary is in English and covers all lectures, with extra information on some of the more complex topics.

Data Science Methods Overview CHoogteijling



Data Science Methods
Contents
1 Model Evaluation
  1.1 Linear Models for Regression
  1.2 Generalization Error
  1.3 The Bias-Variance Decomposition
  1.4 Estimating the Expected Prediction Error
  1.5 In-Sample Measures for Generalization Error: AIC and BIC
  1.6 K-fold Cross-Validation (CV)

2 Shrinkage methods
  2.1 Ridge Regression
  2.2 Lasso Regression

3 Dimension reduction
  3.1 Curse of Dimensionality
  3.2 Feature Selection
  3.3 Principal Component Analysis
  3.4 Selecting the Number of Factors L
  3.5 PCA versus Factor Analysis

4 Nonparametric Regression: k-Nearest Neighbors and Kernel Regression
  4.1 k-Nearest Neighbors Method
  4.2 Kernel Regression
  4.3 The MSE of the NW Estimator
  4.4 Local Linear Methods

5 Linear Discriminant Analysis
  5.1 Classification
  5.2 Decision Theory for Classification
  5.3 Linear Methods for Classification
  5.4 Linear Probability Model
  5.5 LDA for Classification
  5.6 Reduced Rank LDA
  5.7 Fisher's Linear Discriminant
  5.8 QDA and Regularized Discriminant Analysis
  5.9 Model Evaluation applied to Classification Problems

6 Logistic Regression and Stochastic Gradient Descent
  6.1 Logistic Regression
  6.2 Training Logistic Regression Models
  6.3 Regularisation of Logistic Regression Models
  6.4 Comparison of Logistic Regression and LDA
  6.5 Newton-Raphson Method
  6.6 Stochastic Gradient Descent

7 Clustering Methods
  7.1 K-Means Clustering
  7.2 Hierarchical Clustering

8 Bayesian Updating
  8.1 Bayes' Rule
  8.2 Bayes Estimators
  8.3 Bayesian Learning: Recursion
  8.4 Bayesian Learning and Ridge Regression

9 Model Averaging
  9.1 Weighting Schemes
  9.2 Consistency and Asymptotic RMSE Optimality
  9.3 Model Averaging for Gaussian Mixture Model

A Background
  A.1 Jensen's Inequality
  A.2 Rayleigh Quotient
  A.3 Logarithm Cribsheet
  A.4 Distributions
  A.5 Eigenvectors and Eigenvalues
  A.6 The Lagrangian Method
  A.7 Matrix Inverse

B Test Questions






1 Model Evaluation
Model performance is measured by how well a model generalizes. There are two potential objectives for model evaluation, and both can play a role.

• Model selection is comparing the performance of different models to identify the best model.
• Model assessment is estimating the ability of a model to perform on new data.

In data-rich situations we can use a train-validation-test split; in cases of insufficient data we can use cross-validation.
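In the data-rich case, the split can be sketched as follows. This is a minimal NumPy sketch; the 60/20/20 ratio and the data sizes are illustrative assumptions, not prescribed by the text:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data set: N observations with p features (sizes are illustrative).
N, p = 1000, 5
X = rng.normal(size=(N, p))
y = X @ rng.normal(size=p) + rng.normal(scale=0.5, size=N)

# Shuffle once, then carve out 60% train / 20% validation / 20% test.
idx = rng.permutation(N)
n_train, n_val = int(0.6 * N), int(0.2 * N)
train_idx, val_idx, test_idx = np.split(idx, [n_train, n_train + n_val])

X_train, y_train = X[train_idx], y[train_idx]   # fit candidate models here
X_val, y_val = X[val_idx], y[val_idx]           # model selection
X_test, y_test = X[test_idx], y[test_idx]       # model assessment, used once
```

The validation set drives model selection; the test set stays untouched until the end, so the final error estimate is not biased by the selection step.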


1.1 Linear Models for Regression
Suppose we have p features X = (X_1, . . . , X_p)^T in the feature space and the target variable Y. We consider the regression model of the form

Y = f(X, β) + ε,   with

f(X, β) = ∑_{m=0}^{M} β_m h_m(X),   ε ~ N(0, σ_ε²)   (error term)

• A linear regression model uses basis functions h_m(X), m = 1, . . . , M, as features, spanning an M-dimensional feature space; the sum starts at m = 0 because h_0(X) = 1 provides the intercept term.

We set up the log-likelihood function to obtain the least squares problem, and then maximize with respect to the noise variance:

1. We have the likelihood function, which determines the model parameters β_m and σ_ε. Here X is the N × (M + 1) matrix with elements X_nm = h_m(x_n), and y = (y_1, . . . , y_N)^T.

   P(y | X, β, σ_ε) = ∏_{n=1}^{N} N(y_n | f(x_n, β), σ_ε²)

2. We take the logarithm of P(y | X, β, σ_ε), where E_D(β) is the sum-of-squared-errors function. This shows that maximizing the likelihood with respect to the β_m is equivalent to minimizing the sum of squared errors.

   ln P(y | X, β, σ_ε) = −N ln σ_ε − (N/2) ln(2π) − E_D(β)/σ_ε²

   E_D(β) = (1/2) ∑_{n=1}^{N} (y_n − f(x_n, β))² = (1/2) ∑_{n=1}^{N} (y_n − β^T h(x_n))²

3. We differentiate the log-likelihood function with respect to β_m.

   ∂ ln P(y | X, β, σ_ε) / ∂β_m = (1/σ_ε²) ∑_{n=1}^{N} (y_n − β^T h(x_n)) h_m(x_n)

4. We set these to zero for m = 0, . . . , M and solve for βm , then we have the normal equations for the
least squares problem.
   β̂ = (X^T X)^{−1} X^T y = X⁺ y,   with X⁺ = (X^T X)^{−1} X^T the Moore-Penrose pseudoinverse

5. We maximize the log-likelihood function with respect to the noise variance σ_ε²:

   σ̂_ε² = (1/N) ∑_{n=1}^{N} (y_n − β̂^T h(x_n))²
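Steps 4 and 5 can be checked numerically. Below is a minimal NumPy sketch in which the basis functions h_m(x) = x^m (a polynomial basis) and all sizes are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulate data from y_n = beta^T h(x_n) + eps_n with h_m(x) = x^m, m = 0..M.
N, M = 500, 3
x = rng.uniform(-1, 1, size=N)
beta_true = np.array([1.0, -2.0, 0.5, 3.0])
X = np.vander(x, M + 1, increasing=True)   # N x (M+1) design matrix, X[n, m] = h_m(x_n)
y = X @ beta_true + rng.normal(scale=0.1, size=N)

# Step 4: normal equations via the Moore-Penrose pseudoinverse X+.
beta_hat = np.linalg.pinv(X) @ y           # same solution as solving X^T X beta = X^T y

# Step 5: maximum-likelihood estimate of the noise variance.
resid = y - X @ beta_hat
sigma2_hat = np.mean(resid ** 2)
```

With enough data, beta_hat recovers beta_true and sigma2_hat approaches the true noise variance 0.01.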




1.2 Generalization Error
We have the loss functions for a trained regression model f̂(X):

L(Y, f̂(X)) = (Y − f̂(X))²   (squared error)
L(Y, f̂(X)) = |Y − f̂(X)|   (absolute error)

The generalization error shows how well the model predicts responses for new data independently drawn from the same population distribution. For the data set T = {(x_n, y_n)}_{n=1}^{N}:

err_T = E_{(X,Y)}[L(Y, f̂(X)) | T]

The expected prediction error quantifies how well a predictive model is expected to perform on new,
unseen data.

err = E_{T,(X,Y)}[L(Y, f̂(X))]
err = E_T[err_T]

The training error err_train is the average loss on the set T the model was trained on (written err_train here to distinguish it from the expected prediction error err):

err_train = (1/N) ∑_{n=1}^{N} L(y_n, f̂(x_n))

• The prediction error is the average discrepancy between the model's predictions and the true values of the dependent variable for new observations.
• The prediction error is the expectation of the generalization error when averaged over all possible sets of observations T, because the observations are drawn independently from the same joint distribution as (X, Y).
• The generalization error should be small to ensure low prediction error on unseen data.
• The generalization error can often not be estimated directly, so we use an estimate of the expected prediction error instead.
• The training error is not a reliable indicator of generalization performance: it can be made arbitrarily small without improving generalization performance.
• Overfitting is when the model is too tailored to the specifics of the noise in the training set.
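The overfitting point can be made concrete by comparing training and test error of a low- and a high-degree polynomial fit. The degrees, sample sizes, and noise level below are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(2)

def fit_poly(x, y, degree):
    """Least-squares polynomial fit via the pseudoinverse; returns coefficients."""
    return np.linalg.pinv(np.vander(x, degree + 1, increasing=True)) @ y

def mse(x, y, beta):
    """Average squared-error loss of the fitted polynomial on (x, y)."""
    return np.mean((y - np.vander(x, len(beta), increasing=True) @ beta) ** 2)

f = lambda x: 1.0 + x - 2.0 * x ** 2            # true mean function
x_tr = rng.uniform(-1, 1, 30)                    # small training set T
y_tr = f(x_tr) + rng.normal(scale=0.3, size=30)
x_te = rng.uniform(-1, 1, 1000)                  # large test set (new data)
y_te = f(x_te) + rng.normal(scale=0.3, size=1000)

train_err = {d: mse(x_tr, y_tr, fit_poly(x_tr, y_tr, d)) for d in (2, 15)}
test_err = {d: mse(x_te, y_te, fit_poly(x_tr, y_tr, d)) for d in (2, 15)}

# Degree 15 drives the training error down by fitting the noise in T,
# yet predicts new observations worse than the degree-2 fit.
assert train_err[15] < train_err[2]
assert test_err[15] > test_err[2]
```

This is exactly the pattern described above: the training error keeps shrinking with model flexibility while generalization deteriorates.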


1.3 The Bias-Variance Decomposition
The prediction error can be decomposed into three terms: the bias (squared) of the estimated model,
plus the variance of the estimated model, plus the variance of the Gaussian noise.

• The bias term measures how much on average our estimated model deviates from the true mean,
given by the function f (X).
• The variance term is the expected (squared) deviation of the estimated model around its mean.
• The third term is an irreducible error, due to the inherent variance in the data-generating process
around its true mean f (X).


err(x_0) = E[(Y − f̂(X))² | X = x_0]
         = (E[f̂(x_0)] − f(x_0))² + E[(f̂(x_0) − E[f̂(x_0)])²] + σ_ε²
         = bias²(f̂(x_0)) + Var(f̂(x_0)) + σ_ε²
         = bias² + variance + σ_ε²
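The decomposition can be verified by Monte Carlo simulation. In the sketch below, the sine-shaped truth, the deliberately biased straight-line model class, and all constants are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(3)

f = lambda x: np.sin(2 * np.pi * x)   # true mean function f(X)
sigma = 0.2                           # sd of the irreducible Gaussian noise
x_grid = np.linspace(0, 1, 50)        # fixed design points
x0 = 0.25                             # point at which we evaluate err(x0)
R = 4000                              # number of simulated training sets T

# For each training set T, fit a straight line and predict at x0.
preds = np.empty(R)
for r in range(R):
    y = f(x_grid) + rng.normal(scale=sigma, size=x_grid.size)
    slope, intercept = np.polyfit(x_grid, y, 1)
    preds[r] = intercept + slope * x0

bias2 = (preds.mean() - f(x0)) ** 2   # squared bias of the estimated model
variance = preds.var()                # variance of the estimated model

# Direct estimate of err(x0): fresh noisy observations at x0, one per fit.
y0 = f(x0) + rng.normal(scale=sigma, size=R)
err_x0 = np.mean((y0 - preds) ** 2)

# err(x0) is approximately bias^2 + variance + sigma_eps^2.
assert abs(err_x0 - (bias2 + variance + sigma ** 2)) < 0.01
```

Because the line cannot track the sine curve, the bias term dominates here; a more flexible model class would trade bias for variance while the σ_ε² term stays irreducible.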


