Class notes

College Notes Machine Learning Week 1 - 5

Uploaded on

21-03-2021

Written in

2019/2020

Lecture notes Machine Learning Week 1 - 5 (2 lectures per week)

Machine Learning Week 1:

Lecture 1: Introduction

Deductive reasoning: discrete, unambiguous, known rules.
Inductive reasoning: ambiguous, experimental, unknown rules -> the problem has to be broken
down to a few rules.
Machine learning: gives systems the ability to automatically learn and improve from
experience without being explicitly programmed. It is used inside other software, in
analytics/data mining, and in science/statistics.
What makes a good ML problem?
● can’t solve it explicitly, approximate solutions are fine
● limited reliability, predictability, interpretability is fine
● plenty of examples to learn from
- e.g. recommending a movie, predicting driving time
Abstract tasks generalize the problem of learning
Online learning: acting and learning at the same time
Reinforcement learning: online learning in a world based on delayed feedback
Offline learning: separate learning and acting - take a fixed dataset of examples, train a
model to learn from these examples, test the model to see if it works
Solve ML problems in general by using a few abstract tasks like classification, regression or
clustering.
1. Supervised: explicit examples of input and output (labeled data), learn to predict the
output for an unseen input, e.g. sorting emails into spam or not spam.
- Classification: assign a class to each example
Instances : data we provide our system (examples)
Features of the instance: things we measure
Target value: thing we are trying to learn
Dataset fed to learning algorithm -> produce a classifier: a small machine that
attempts to solve the learning problem.

Example: handwriting recognition, chess problem -> translate problems into
tasks.
Feature space: features plotted against each other
1. Linear classifier: draw a line through feature space to separate the classes. Since a
line is defined by two parameters, every line is also a single point in model space
(the space of the two parameters of the line). We define a loss function which tells
us how well a particular model does on our data.
Loss function: loss(model) = performance of the model on the data. It takes the
model as input and treats the data as a constant.
2. Decision Tree classifier: studies one feature in isolation at every node.
3. K-nearest neighbours: for a new point, it looks at the k points that
are closest and assigns the class that is most frequent in that set.
k is a hyperparameter: you have to choose it yourself.
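
The k-nearest-neighbours rule above can be sketched in a few lines. This is a minimal illustration, not the course's implementation: the function name `knn_predict`, the Euclidean distance and the toy data are all assumptions made for the example.

```python
from collections import Counter
import math


def knn_predict(train_points, train_labels, query, k=3):
    """Return the most frequent label among the k training points nearest to query."""
    # Rank training instances by Euclidean distance to the query point.
    order = sorted(range(len(train_points)),
                   key=lambda i: math.dist(train_points[i], query))
    nearest_labels = [train_labels[i] for i in order[:k]]
    # Majority vote over the k nearest labels.
    return Counter(nearest_labels).most_common(1)[0][0]


# Toy dataset (made up): two features per instance, two classes.
points = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.0), (6.0, 6.0), (7.0, 5.5), (6.5, 7.0)]
labels = ["a", "a", "a", "b", "b", "b"]
prediction = knn_predict(points, labels, (1.2, 1.4), k=3)
```

Note that there is no training step: the "model" is the dataset itself plus the chosen value of k.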

Variations:
Features are usually numerical or categorical.
Binary classification: a task with two classes
Multiclass classification: more than two classes
Multilabel classification: none, some or all classes may be true (not
part of course)
Class probabilities/scores: the classifier reports a probability for each
class

- Regression: assign a number to each example, i.e. predict a number instead of a
class. Output space and feature space are shown in the same plot. We want to model the
relation between the features and the target.
Loss function for regression (MSE): $\text{loss}(f) = \frac{1}{n} \sum_i (f(x_i) - y_i)^2$
Regression tree: segment feature space into blocks and assign each a
number.
-> Overfitting: memorizing the data instead of generalizing. Training loss is the
loss of a given model on the training data. Never judge performance on the
training data; hold out some unseen data for that.

The aim of machine learning is not to minimise the loss on the training data,
but to minimise the loss on your test data. You don’t get to see the test data
until you’ve chosen your model!
The problem of machine learning is to choose a model that fits the pattern,
and ignores the noise.
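
A small sketch of why training loss is misleading. Everything here is an assumption made for illustration: the synthetic data (pattern y = 2x plus alternating noise of +/- 0.5), the hold-out split, and the two hypothetical models `memorizer` and `pattern`.

```python
def mse(model, data):
    """Mean squared error of a model (a function x -> prediction) on (x, y) pairs."""
    return sum((model(x) - y) ** 2 for x, y in data) / len(data)


# Synthetic data: the pattern is y = 2x, the "noise" alternates between +0.5 and -0.5.
data = [(x, 2.0 * x + (0.5 if x % 2 == 0 else -0.5)) for x in range(20)]

# Hold out every fourth instance as unseen test data.
test = [pair for pair in data if pair[0] % 4 == 3]
train = [pair for pair in data if pair[0] % 4 != 3]

# Model 1 memorizes the training data (lookup table, 0.0 for unseen inputs).
table = dict(train)


def memorizer(x):
    return table.get(x, 0.0)


# Model 2 captures the pattern and ignores the noise.
def pattern(x):
    return 2.0 * x


train_loss_memorizer = mse(memorizer, train)  # perfect on the training data
train_loss_pattern = mse(pattern, train)      # not perfect: it ignores the noise
test_loss_memorizer = mse(memorizer, test)    # large: nothing was learned
test_loss_pattern = mse(pattern, test)        # small: the pattern generalizes
```

Judged on training loss, the memorizer looks like the better model; on the held-out data the ranking flips.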

2. Unsupervised: only inputs provided, find any pattern that explains something about
the data
- Clustering: ask the learner to split the instances into a number of clusters.
(unsupervised classification)
K-means: repeatedly assign each instance to the nearest mean and recompute the means,
until the algorithm converges to a stable state.
- Density Estimation: we want to learn how likely new data is. It thus produces
a number. The task of modelling the probability distribution behind your data.
- Generative Modeling: building a model from which you can sample new
examples is called generative modeling. -> deep learning
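
The K-means loop from the clustering item can be sketched as follows. This is a minimal version under stated assumptions: the initial means are chosen by hand, distance is Euclidean, and the function name `kmeans` and the toy points are made up for the example.

```python
import math


def kmeans(points, means, max_iters=100):
    """Alternate nearest-mean assignment and mean recomputation until nothing changes."""
    for _ in range(max_iters):
        # Assignment step: put each point in the cluster of its nearest mean.
        clusters = [[] for _ in means]
        for p in points:
            nearest = min(range(len(means)), key=lambda j: math.dist(p, means[j]))
            clusters[nearest].append(p)
        # Update step: move each mean to the centroid of its cluster
        # (an empty cluster keeps its old mean).
        new_means = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster)) if cluster else m
            for cluster, m in zip(clusters, means)
        ]
        if new_means == means:  # stable state reached
            break
        means = new_means
    return means, clusters


points = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (8.0, 8.0), (9.0, 8.0), (8.0, 9.0)]
final_means, final_clusters = kmeans(points, means=[(0.0, 0.0), (10.0, 10.0)])
```

The number of clusters is a hyperparameter here, just like k in k-nearest neighbours.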

Lecture 2: Linear Models 1

Linear models and search: search methods are extremely versatile.
- Linear Regression: with 1 feature, f(x) = wx + b. With 2 features, each feature gets its
own weight. With multiple features, we arrange the features of a single instance into a
vector.
-> Arbitrary number of features: take the dot product of the feature vector and the
weights and add a bias, $f(x) = w^T x + b$. The weights and the bias are the parameters
of the model.
Dot product!
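
As a quick illustration of the dot-product form, a linear model for any number of features fits in one line. The function name `linear_model` and the numbers are made up for the example.

```python
def linear_model(x, w, b):
    """f(x) = w . x + b for a feature vector x, weight vector w and bias b."""
    return sum(w_j * x_j for w_j, x_j in zip(w, x)) + b


# Two features, so two weights plus one bias: 0.5 * 2.0 + (-1.0) * 3.0 + 4.0
prediction = linear_model([2.0, 3.0], w=[0.5, -1.0], b=4.0)
```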

Which model fits our data best?
1. Loss function: which tells us how well a particular choice of model does
2. Search method: search the space of all models for a particular model that
results in a low loss.
● Common Loss function for regression: MSE. Squares ensure that negative and
positive residuals don’t cancel out, but also that big errors affect the loss more
heavily than small errors.
-> the loss function maps every point in the model space to a loss value.
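
A minimal sketch of the MSE loss for a one-feature linear model, viewed as a function of the model parameters (w, b) with the data held fixed. The function name `mse_loss` and the toy data are assumptions for illustration.

```python
def mse_loss(w, b, xs, ys):
    """Mean squared error of the model f(x) = w * x + b on the data (xs, ys)."""
    residuals = [(w * x + b) - y for x, y in zip(xs, ys)]
    return sum(r ** 2 for r in residuals) / len(residuals)


# Toy data generated from y = 2x + 1 (no noise).
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]

loss_good = mse_loss(2.0, 1.0, xs, ys)  # the point (w=2, b=1) in model space
loss_bad = mse_loss(1.0, 1.0, xs, ys)   # residuals are 0, -1, -2, -3
```

For the second model the squared residuals are 0, 1, 4 and 9: the single largest error contributes 9 of the total 14, which is the "big errors weigh more heavily" effect of squaring.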
● Optimization: $\hat{p} = \arg\min_p \text{loss}_{X,Y}(p)$. We are trying to find the input (the model
parameters) for which a particular function (the loss) is at its minimum, i.e. the lowest
loss value possible.
1. Random search: start with a random point p, then loop: pick a point p’ close to p; if
loss(p’) is smaller than loss(p), set p <- p’. Analogy: hiker in a snowstorm!
It chooses the next point by sampling uniformly among all points at some
pre-chosen distance.
It works well because our problem is convex: a surface is convex if a line drawn between
any two points on the surface lies on or above the surface. This implies there are no
local minima besides the global minimum.
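
The random search loop can be sketched for a model with two parameters (w, b). This is an illustration under assumptions: the names `random_search` and `mse_loss`, the radius, the step count, the seed and the toy data are all made up; "points at a pre-chosen distance" is implemented as a uniformly random direction on a circle of fixed radius.

```python
import math
import random


def mse_loss(p, xs, ys):
    """MSE of the linear model f(x) = w * x + b, with p = (w, b)."""
    w, b = p
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def random_search(loss, p, radius=0.1, steps=5000, seed=0):
    """Move to a random point at distance `radius` whenever it has a lower loss."""
    rng = random.Random(seed)
    current_loss = loss(p)
    for _ in range(steps):
        # Sample uniformly among the points at the pre-chosen distance from p.
        angle = rng.uniform(0.0, 2.0 * math.pi)
        candidate = (p[0] + radius * math.cos(angle), p[1] + radius * math.sin(angle))
        candidate_loss = loss(candidate)
        if candidate_loss < current_loss:
            p, current_loss = candidate, candidate_loss
    return p, current_loss


xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]  # generated from y = 2x + 1


def loss(p):
    return mse_loss(p, xs, ys)


initial_loss = loss((0.0, 0.0))
best_p, final_loss = random_search(loss, (0.0, 0.0))
```

With a fixed radius the search cannot get arbitrarily close to the minimum: once it is nearer than about half a step, no candidate improves the loss any more.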

-> What if the loss surface has multiple local minima? The random search will
go to one of the local minima and get stuck there.
Trick: simulated annealing: also move to a next point that isn’t better than the
current one, but only with a small probability. The search then still has some
probability of escaping a local minimum.
In many situations local minima are fine.
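
A minimal sketch of the difference between plain random search and the simulated-annealing trick as described in these notes (a fixed small probability of accepting a worse point). The 1-D loss below is hypothetical, chosen because it has a local minimum near x = 1.13 and a lower one near x = -1.30; the name `local_search` and all parameter values are assumptions as well.

```python
import random


def loss(x):
    """Hypothetical non-convex loss with two minima (near x = 1.13 and x = -1.30)."""
    return x ** 4 - 3 * x ** 2 + x


def local_search(loss, x, accept_worse_prob, step=0.1, steps=20000, seed=0):
    """Random search that also accepts a worse point with probability accept_worse_prob."""
    rng = random.Random(seed)
    best_x = x
    for _ in range(steps):
        candidate = x + rng.uniform(-step, step)
        # Always accept an improvement; accept a worse point only occasionally.
        if loss(candidate) < loss(x) or rng.random() < accept_worse_prob:
            x = candidate
            if loss(x) < loss(best_x):
                best_x = x  # remember the best point seen so far
    return x, best_x


# accept_worse_prob = 0.0 is plain random search: it can only go downhill.
greedy_x, _ = local_search(loss, 2.0, accept_worse_prob=0.0)
# A small acceptance probability allows occasional uphill moves.
annealed_x, annealed_best = local_search(loss, 2.0, accept_worse_prob=0.1)
```

Started at x = 2.0 with steps of at most 0.1, the plain search is confined to the basin of the local minimum. Whether the annealed run actually crosses over to the lower minimum depends on the seed, the step size and the acceptance probability, which is why the sketch keeps track of the best point seen.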
● Variations in step size: fixed radius, random uniform or normally distributed.
● The space of linear models is continuous. If the space is discrete, as with tree models,
you need to define which models are close to each other.
● Black Box Optimization: random search, simulated annealing. Very simple. We only
need to compute the loss function for each model. Can require many iterations. Also
works for discrete model spaces.
● Parallel Search: run random search a couple of times independently.
- Introduce some form of communication between the searches to make it
more useful: population methods.
Evolutionary Algorithms: rank the population by loss, remove the worst half,
‘breed’ (crossover operator) a new population.
-> Slow + expensive for complex models, difficult to tune.
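
The rank / remove / breed loop can be sketched for a population of (w, b) models. This is one possible reading, not the course's algorithm: the crossover (w from one parent, b from the other), the small Gaussian mutation added to each child, the population size and the names `evolve` and `mse_loss` are all assumptions.

```python
import random


def mse_loss(p, xs, ys):
    """MSE of the linear model f(x) = w * x + b, with p = (w, b)."""
    w, b = p
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def evolve(population, loss, generations=50, mutation=0.1, seed=0):
    """Rank by loss, drop the worst half, refill the population by breeding survivors."""
    rng = random.Random(seed)
    size = len(population)
    for _ in range(generations):
        ranked = sorted(population, key=loss)
        survivors = ranked[: size // 2]
        children = []
        while len(survivors) + len(children) < size:
            mother, father = rng.sample(survivors, 2)
            # Crossover: w from one parent, b from the other, plus a small mutation.
            child = (mother[0] + rng.gauss(0.0, mutation),
                     father[1] + rng.gauss(0.0, mutation))
            children.append(child)
        population = survivors + children
    return min(population, key=loss)


xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]  # generated from y = 2x + 1


def loss(p):
    return mse_loss(p, xs, ys)


init_rng = random.Random(1)
initial = [(init_rng.uniform(-5.0, 5.0), init_rng.uniform(-5.0, 5.0)) for _ in range(20)]
best = evolve(initial, loss)
```

Because the best half always survives, the best loss in the population never gets worse from one generation to the next. Each generation costs one loss evaluation per individual per ranking, which is where the expense for complex models comes from.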
● Branching Search: the more closely we inspect our local neighbourhood, the faster
we converge.

Written for

Institution: Vrije Universiteit Amsterdam (VU)
Study: Business Analytics
Course: Machine Learning

Document information

Uploaded on: March 21, 2021
Number of pages: 20
Written in: 2019/2020
Type: Class notes
Professor(s): Peter Bloem
Contains: Week 1 - 5 (2 colleges per week)
