Summary

Summary Machine Learning

Pages: 61
Uploaded on: 01-02-2023
Written in: 2022/2023

English summary of the Machine Learning course of the Master Data Science and Society at Tilburg University. It summarizes lecture materials, readings, and notes.

Preview of the content

Machine learning
Lecture 1

Machine learning is about automating problem solving. It is the study of computer
algorithms that improve automatically through experience: a program becomes better at a
task T, based on some experience E, with respect to some performance measure P.
Examples:
- Spam detection
- Movie recommendation
- Speech recognition
- Credit risk analysis
- Autonomous driving
- Medical diagnosis.
In each case the system comes up with a learned algorithm, derived from experience.

What does it involve?
- ML involves a notion of generalization. When the machine learns relationships
between the input and the output, we want these to hold on unseen data as well; this is
the concept of generalization. The open question is whether it is safe to assume that
current observations generalize to future observations.
- Some critical components are annotated data, an objective, an optimization algorithm,
features/representations, and assumptions.
- We assume the dataset represents the population. As a rule, the more data we have, the
better the output becomes.
- An optimization algorithm incrementally works towards the best outcome.
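
As a minimal sketch of an optimization algorithm that works incrementally towards the best outcome, the snippet below fits a single weight w in the model ŷ = w · x by gradient descent on the mean squared error. The toy data, the learning rate and the number of steps are assumptions chosen for illustration; they do not come from the lecture.

```python
# Toy data (assumed for illustration): the true relationship is y = 2 * x.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]


def mse_loss(w):
    """Mean squared error of the model y_hat = w * x on the toy data."""
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def mse_gradient(w):
    """Derivative of the mean squared error with respect to w."""
    return sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)


w = 0.0                # initial guess
learning_rate = 0.01   # assumed step size
losses = []
for _ in range(200):
    losses.append(mse_loss(w))
    w -= learning_rate * mse_gradient(w)  # small step against the gradient

print(round(w, 4))             # close to 2.0
print(losses[0] > losses[-1])  # True: the loss went down
```

Each step only nudges w a little, but the loss shrinks with every pass until w settles near the value that fits the data best.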

Different types of learning, by starting point:
- Supervised learning: annotated/labelled dataset (ground truth)
o Classification: the target is a discrete variable
o Regression: the target is a continuous variable
- Unsupervised learning: unlabelled dataset
o Clustering

Examples:
Spam vs non-spam?




This is usually a text mining problem. The emails have to be pre-processed so that
features can be created from them. It is a binary classification problem: the learning
algorithm should come up with a function that maps the representation of an email to its
label (spam or non-spam).
- Find examples of spam and non-spam
- Come up with a learning algorithm
- A learning algorithm infers rules from examples: if (A or B or C) and not D, then spam
- These rules can then be applied to new data (emails)
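
To make the rule format concrete, here is a small sketch of a classifier of the form "if (A or B or C) and not D, then spam". The four conditions are hypothetical keyword checks invented for this example; a real learning algorithm would infer such conditions from labelled emails rather than have them written by hand.

```python
def is_spam(email_text: str) -> bool:
    """Toy rule: spam if (A or B or C) and not D."""
    text = email_text.lower()
    a = "free money" in text   # hypothetical condition A
    b = "click here" in text   # hypothetical condition B
    c = "winner" in text       # hypothetical condition C
    d = "meeting" in text      # hypothetical condition D (looks like work mail)
    return (a or b or c) and not d


# Applying the rule to new emails
print(is_spam("You are a WINNER, click here"))    # True
print(is_spam("Click here to join the meeting"))  # False
print(is_spam("Lunch tomorrow?"))                 # False
```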

Learning algorithms:
- See several different learning algorithms
- Implement 2-3 simple ones from scratch in Python
- Learn about Python libraries for ML (scikit-learn)
- Learn how to apply them to real-world problems

Machine learning examples:
- Recognize handwritten numbers and letters
- Recognize faces in photos
- Determine whether text expresses positive, negative or no opinion
- Guess a person’s age based on a sample of their writing
- Flag suspicious credit-card transactions
- Recommend books and movies to users based on their own and others’ purchase
history
- Recognize and label mentions of people’s or organization names in text

Types of learning problems:
Regression:
- Response: a (real) number
- Predict a person’s age
- Predict the price of a stock
- Predict a student’s score on an exam
Binary classification:
- Response: Yes/No answer
- Detect spam
- Predict polarity of product review: positive vs negative
Multiclass classification:
- Response: one of a finite set of options
- Classify newspaper article as:
o Politics, sports, science, technology, health, finance
- Detect species based on photo
o Passer domesticus, Calidris alba, Streptopelia decaocto, Corvus corax
Multilabel classification:
- The output does not have to be a single label; it can be several labels at once
(this is the difference from multiclass classification)
- Assign songs to one or more genres (rock, pop, metal)
- The goal is not necessarily to get every label of every example right, but to get as
many labels as possible correct during training.
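
One common way to represent the difference between multiclass and multilabel targets is a 0/1 indicator per label. The sketch below uses the three genres from the example; the songs and the helper name `to_indicator` are made up for illustration.

```python
GENRES = ["rock", "pop", "metal"]  # label set from the genre example


def to_indicator(labels):
    """Encode a set of genres as one 0/1 entry per genre."""
    return [1 if genre in labels else 0 for genre in GENRES]


# Multiclass: exactly one genre per song. Multilabel: any subset of genres.
multiclass_target = "rock"
multilabel_target = {"rock", "metal"}  # hypothetical song with two genres

print(to_indicator({multiclass_target}))  # [1, 0, 0]
print(to_indicator(multilabel_target))    # [1, 0, 1]
```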
Autonomous behavior (example of a car):
- Input: measurements from sensors (camera, microphone, radar, accelerometer)
- Response: instructions for actuators (steering, accelerator, brake)

Evaluation:
- Choose a baseline, choose a metric, compare!
- Different tasks, different metrics:
o Predicting age
o Flagging spam

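Two metrics that we often use in regression problems ($y_n$ is the true value or ground
truth, $\hat{y}_n$ the predicted value, $N$ the number of examples):
- Mean absolute error (MAE): the average absolute difference between true value and
predicted value, $\mathrm{MAE} = \frac{1}{N} \sum_{n=1}^{N} |y_n - \hat{y}_n|$
- Mean squared error (MSE): the average squared difference between true value and
predicted value, $\mathrm{MSE} = \frac{1}{N} \sum_{n=1}^{N} (y_n - \hat{y}_n)^2$. It is
more sensitive to outliers, but it is differentiable everywhere, whereas MAE is not
differentiable where the error is zero.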
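
The two regression metrics can be sketched in plain Python as follows. The ages are invented for illustration; the last prediction is deliberately far off to show how much more strongly the squared error reacts to an outlier.

```python
def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between true and predicted values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mean_squared_error(y_true, y_pred):
    """Average squared difference between true and predicted values."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical ages: the last prediction is an outlier (off by 20 years).
ages_true = [20, 30, 40, 50]
ages_pred = [22, 28, 40, 70]

mae = mean_absolute_error(ages_true, ages_pred)
mse = mean_squared_error(ages_true, ages_pred)
print(mae)  # 6.0
print(mse)  # 102.0
```

The single 20-year miss contributes 20 of the 24 total absolute error, but 400 of the 408 total squared error.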



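For binary classification problems, the metrics often used are:
- Accuracy: the fraction of examples classified correctly
- Error rate: the fraction of examples classified incorrectly (1 minus accuracy)
These are not very informative on their own, especially if the dataset is not balanced.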
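
A small sketch of why accuracy can mislead on an imbalanced dataset: the data below is hypothetical (5 spam emails out of 100), and the "classifier" is a trivial baseline that never flags anything.

```python
# Hypothetical imbalanced dataset: 5 spam emails (1) and 95 non-spam emails (0).
labels = [1] * 5 + [0] * 95

# A trivial baseline that never flags anything as spam.
predictions = [0] * len(labels)

correct = sum(p == t for p, t in zip(predictions, labels))
accuracy = correct / len(labels)
error_rate = 1 - accuracy
spam_caught = sum(p == 1 and t == 1 for p, t in zip(predictions, labels))

print(accuracy)     # 0.95
print(spam_caught)  # 0
```

The baseline reaches 95% accuracy without catching a single spam email, which is why a baseline comparison and metrics beyond accuracy are needed.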



Classification (spam example):
- False positive: flagged as spam, but not spam
- False negative: not flagged, but is spam
- False positives are a bigger issue for this problem!
- True positive: spam classified as spam
- True negative: not-spam classified as not-spam

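Precision and recall (TP = true positives, FP = false positives, FN = false negatives):
- Metrics which focus on one kind of mistake
- Precision: what fraction of flagged emails were real spam?
$\mathrm{Precision} = \frac{TP}{TP + FP}$
- Recall: what fraction of real spam emails were flagged?
$\mathrm{Recall} = \frac{TP}{TP + FN}$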
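
As a sketch, precision and recall can be computed from the four confusion-matrix counts. The labels below are invented for illustration (1 = spam, 0 = not spam), and the function names are not taken from any particular library.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, fn, tn) for a binary problem."""
    tp = fp = fn = tn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive and t == positive:
            tp += 1   # spam flagged as spam
        elif p == positive:
            fp += 1   # non-spam flagged as spam
        elif t == positive:
            fn += 1   # spam that was not flagged
        else:
            tn += 1   # non-spam left alone
    return tp, fp, fn, tn


def precision(tp, fp):
    """Fraction of flagged emails that were real spam."""
    return tp / (tp + fp) if tp + fp else 0.0


def recall(tp, fn):
    """Fraction of real spam emails that were flagged."""
    return tp / (tp + fn) if tp + fn else 0.0


# Hypothetical ground truth and predictions for 10 emails.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]

tp, fp, fn, tn = confusion_counts(y_true, y_pred)
print(tp, fp, fn, tn)     # 3 1 1 5
print(precision(tp, fp))  # 0.75
print(recall(tp, fn))     # 0.75
```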


Example:

Confusion matrix example:




F-score:
- Harmonic mean of precision $P$ and recall $R$ (a kind of average):
$F_1 = \frac{2 \cdot P \cdot R}{P + R}$
- Also known as the F-measure
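
The sketch below contrasts the harmonic mean with the ordinary arithmetic mean. The precision and recall values are made up; they describe a classifier that flags very few emails, all of them correctly.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical classifier: perfect precision, very low recall.
precision, recall = 1.0, 0.1

arithmetic_mean = (precision + recall) / 2
print(round(arithmetic_mean, 3))              # 0.55
print(round(f1_score(precision, recall), 3))  # 0.182
```

The harmonic mean stays close to the smaller of the two values, so a classifier cannot get a high F-score by doing well on only one of them.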

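Fβ:
- The parameter β quantifies how much more we care about recall $R$ than about precision
$P$: $F_\beta = \frac{(1 + \beta^2) \cdot P \cdot R}{\beta^2 \cdot P + R}$
- When β is greater than 1, recall is weighted more; when β is smaller than 1, precision
is weighted more; β = 1 gives the harmonic mean of the two.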
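
A minimal sketch of the Fβ score, using the standard definition (1 + β²) · P · R / (β² · P + R). The precision and recall values are hypothetical and chosen so that recall is the higher of the two.

```python
def f_beta(precision, recall, beta):
    """F-beta score: beta > 1 favours recall, beta < 1 favours precision."""
    beta_sq = beta ** 2
    denominator = beta_sq * precision + recall
    if denominator == 0:
        return 0.0
    return (1 + beta_sq) * precision * recall / denominator


# Hypothetical classifier with low precision and perfect recall.
precision, recall = 0.5, 1.0

for beta in (0.5, 1.0, 2.0):
    print(beta, round(f_beta(precision, recall, beta), 3))
# 0.5 0.556
# 1.0 0.667
# 2.0 0.833
```

Because recall is the stronger value here, the score rises as β grows and recall receives more weight.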



Multiclass classification:
A confusion matrix can still be built for multiclass classification.




When there are more than two classes, precision and recall have to be aggregated across
classes to rate the learning outcome. Two options are the macro-average and the micro-average.

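Macro-average:
Per class, precision is true positives over predicted positives; recall is true positives
over actual positives.
- Compute precision and recall per class, and average over the k classes:
$P_{\mathrm{macro}} = \frac{1}{k} \sum_{c=1}^{k} P_c$, $R_{\mathrm{macro}} = \frac{1}{k} \sum_{c=1}^{k} R_c$
- Rare classes have the same impact as frequent classes
- For that reason it is easiest to interpret when the data is balanced; on imbalanced
data a rare class counts as much as a frequent one.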
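
The macro-average can be sketched as follows. The per-class counts are hypothetical, with "science" as a deliberately rare and poorly predicted class.

```python
# Hypothetical per-class counts: (true positives, false positives, false negatives).
counts = {
    "politics": (90, 10, 10),
    "sports": (80, 20, 20),
    "science": (1, 4, 4),  # rare class
}


def precision(tp, fp):
    return tp / (tp + fp) if tp + fp else 0.0


def recall(tp, fn):
    return tp / (tp + fn) if tp + fn else 0.0


# One precision and one recall value per class ...
per_class_precision = {c: precision(tp, fp) for c, (tp, fp, fn) in counts.items()}
per_class_recall = {c: recall(tp, fn) for c, (tp, fp, fn) in counts.items()}

# ... averaged with equal weight for every class.
macro_precision = sum(per_class_precision.values()) / len(counts)
macro_recall = sum(per_class_recall.values()) / len(counts)

print(round(macro_precision, 3))  # 0.633
print(round(macro_recall, 3))     # 0.633
```

The rare class has only 5 examples, yet its precision of 0.2 pulls the average down as much as a class with 100 examples would.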

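Micro-average:
- Gives every data point equal importance (this is the difference from the macro-average).
- Micro-averaging treats the entire dataset as one aggregate: it pools the true positives,
false positives and false negatives of all classes and calculates 1 metric from the totals,
rather than k per-class metrics that get averaged together, e.g.
$P_{\mathrm{micro}} = \frac{\sum_{c} TP_c}{\sum_{c} (TP_c + FP_c)}$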
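
A sketch of the micro-average on hypothetical per-class counts, where "science" is a rare and poorly predicted class. The counts are pooled first and a single precision and recall are computed from the totals.

```python
# Hypothetical per-class counts: (true positives, false positives, false negatives).
counts = {
    "politics": (90, 10, 10),
    "sports": (80, 20, 20),
    "science": (1, 4, 4),  # rare class
}

# Pool the counts of all classes into one aggregate.
total_tp = sum(tp for tp, fp, fn in counts.values())
total_fp = sum(fp for tp, fp, fn in counts.values())
total_fn = sum(fn for tp, fp, fn in counts.values())

# One metric computed from the totals instead of k averaged metrics.
micro_precision = total_tp / (total_tp + total_fp)
micro_recall = total_tp / (total_tp + total_fn)

print(total_tp, total_fp, total_fn)  # 171 34 34
print(round(micro_precision, 3))     # 0.834
print(round(micro_recall, 3))        # 0.834
```

Because every data point counts equally, the frequent classes dominate the result and the rare class barely moves it. On these counts the macro-averaged precision would be about 0.633.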

Seller: liekebuuron, Avans Hogeschool