Summary Machine Learning (880083-M-6)

Uploaded on: 6 October 2021

Written in: 2020/2021

Summary Machine Learning (-M-6), written in the spring semester of 2021 for Data Science & Society at Tilburg University.

Machine learning
Machine Learning is the study of computer algorithms that improve automatically through
experience: the algorithm becomes better at a task T, based on some experience E, with respect to
some performance measure P. In other words, the computer learns from its mistakes and improves as
it sees more examples.
For example, the steps in flagging spam are:
1. You find examples of spam and non-spam (training set)
2. You come up with a learning algorithm
3. The learning algorithm infers rules from the examples, for example that certain words
indicate spam
4. These rules can then be applied to new and unseen data (test set). This is very important
for generalization: our goal is to make the algorithm useful for new e-mails, not for the
e-mails in the training data. We want the algorithm that most correctly classifies NEW data.
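
The four steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the e-mails are made-up toy data, and the "learning algorithm" (count in how many spam and non-spam e-mails each word occurs) and the helper names `learn_spam_words` and `predict` are hypothetical choices, not the method from the course.

```python
from collections import Counter

def learn_spam_words(training_set, min_ratio=2.0):
    """Step 3: infer simple rules (a set of 'spam words') from labelled examples."""
    spam_counts, ham_counts = Counter(), Counter()
    for text, is_spam in training_set:
        target = spam_counts if is_spam else ham_counts
        target.update(set(text.lower().split()))  # count each word once per e-mail
    # Keep a word if it occurs in at least min_ratio times as many spam e-mails
    # as non-spam e-mails (with +1 smoothing on the non-spam count).
    return {w for w, c in spam_counts.items() if c >= min_ratio * (ham_counts[w] + 1)}

def predict(spam_words, text):
    """Step 4: apply the learned rules to a new e-mail."""
    return any(word in spam_words for word in text.lower().split())

# Step 1: examples of spam (True) and non-spam (False). Toy data, invented here.
training_set = [
    ("win free money now", True),
    ("free prize claim now", True),
    ("meeting at noon", False),
    ("lunch money for the meeting", False),
]
# Unseen e-mails that were NOT used to learn the rules.
test_set = [
    ("claim your free prize", True),
    ("see you at the meeting", False),
]

spam_words = learn_spam_words(training_set)
correct = sum(predict(spam_words, text) == label for text, label in test_set)
test_accuracy = correct / len(test_set)
print(spam_words, test_accuracy)
```

The accuracy is computed on the test set only, because performance on new e-mails is what matters.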
Examples of machine learning:
- Flagging spam e-mails
- Flagging suspicious credit card transactions
- Recommending books based on earlier purchases (from both the individual and other people)
- Recognizing and labeling the names of people and organizations in text

In machine learning, you go through the following steps:
1. Define an ML problem and propose a solution: decide whether it is classification or regression,
supervised or unsupervised, what the expected outcome is and which measures evaluate performance.
2. Construct your dataset: collect and clean the raw data, and then split the data.
3. Use feature engineering: think about how you will represent the data.
4. Train a model: optimize the parameters by minimizing the loss function.
5. Use the trained model to make predictions, and evaluate the performance of your model.
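
As a rough sketch of these steps, the example below treats a tiny regression problem, with the mean squared error as the performance measure (step 1). Everything in it is an assumption made for illustration: the synthetic data, the 15/5 split, the one-parameter model `w * feature(x)` and the learning rate.

```python
import random

# Step 2: construct a dataset (synthetic: y is roughly 3 * x) and split it.
random.seed(0)
data = [(x, 3.0 * x + random.uniform(-0.1, 0.1)) for x in range(1, 21)]
random.shuffle(data)
train, test = data[:15], data[15:]

# Step 3: feature engineering (here: rescale x to (0, 1] so training is stable).
def feature(x):
    return x / 20.0

def mse(w, rows):
    """Loss function: mean squared error of the model w * feature(x)."""
    return sum((w * feature(x) - y) ** 2 for x, y in rows) / len(rows)

# Step 4: train the model, i.e. minimise the loss with gradient descent on w.
w = 0.0
for _ in range(500):
    grad = sum(2 * (w * feature(x) - y) * feature(x) for x, y in train) / len(train)
    w -= 0.5 * grad

# Step 5: predict on unseen data and evaluate.
test_mse = mse(w, test)
print(w, test_mse)
```

Because y is about 3 * x and feature(x) = x / 20, the learned w should end up close to 60.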

Feature transformation
Feature transformation means changing the features, for example through feature extraction or
feature selection. You do this for the following reasons:
- Mandatory transformations for data compatibility: these are actions you have to perform
because otherwise your calculations won’t work (convert non-numeric data into numeric data,
resize inputs etc.)
- Optional quality transformations that may help the model perform better: these are actions
you might want to perform to get better performance (such as normalizing, or handling
out-of-vocabulary (OOV) items)
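
A minimal sketch of one mandatory and one optional transformation. One-hot encoding and min-max scaling are picked here as common examples; the function names and the toy columns are assumptions for illustration.

```python
def one_hot(values):
    """Mandatory: turn a non-numeric (categorical) column into numeric columns."""
    categories = sorted(set(values))
    rows = [[1 if v == c else 0 for c in categories] for v in values]
    return rows, categories

def min_max_normalize(values):
    """Optional: rescale a numeric column to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # constant column: nothing to rescale
    return [(v - lo) / (hi - lo) for v in values]

colors = ["red", "green", "red", "blue"]
ages = [20, 30, 40, 60]

encoded, categories = one_hot(colors)
scaled = min_max_normalize(ages)
print(categories, encoded, scaled)
```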

Feature extraction
Feature extraction refers to the process of transforming raw data into numerical features that can be
processed while preserving the information in the original data set. You update your existing
features or create new ones, because the existing features are not informative enough. It can be
done manually or automatically.
Manually: requires identifying and describing the features that are relevant for a given problem and
implementing a way to extract those features (you define the meaningful features yourself).
Automatically: uses specialized algorithms or deep networks to extract features automatically from
signals or images without the need for human intervention (meaningful features are extracted
within the algorithm).
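
Manual feature extraction could look like the sketch below, which turns the raw text of an e-mail into three numbers. The three features (word count, number of exclamation marks, share of fully capitalized words) are hypothetical choices for a spam problem, not features taken from the course.

```python
def extract_features(email_text):
    """Map raw text to a small numeric feature vector using hand-defined features."""
    words = email_text.split()
    n_words = len(words)
    n_exclaim = email_text.count("!")
    # Words written fully in capitals; single letters such as "I" are ignored.
    n_upper = sum(1 for w in words if w.isupper() and len(w) > 1)
    upper_share = n_upper / n_words if n_words else 0.0
    return [n_words, n_exclaim, upper_share]

features = extract_features("WIN a FREE prize now!!!")
print(features)
```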

Feature selection
In contrast to feature extraction, you do not update features or create new ones; you simply select
a subset of the features you already have. This reduces the dimensionality of your dataset, so it
becomes smaller. If features are noisy or redundant, you should just delete them.
Advantages:
- Simplified models
- Shorter training times
- Potentially improved performance (irrelevant or redundant features may hurt the performance,
for example if two features are both informative but highly correlated with each other)
- Reduced overfitting
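
One simple way to remove redundant features is to drop a feature when it is highly correlated with a feature that was already kept. The sketch below does this with the Pearson correlation. The threshold of 0.95, the function names and the toy columns are assumptions; other selection methods exist.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation between two equally long numeric columns."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy) if sx and sy else 0.0

def select_features(columns, threshold=0.95):
    """Keep a feature only if it is not highly correlated with one already kept."""
    kept = []
    for name, values in columns.items():
        if all(abs(pearson(values, columns[k])) < threshold for k in kept):
            kept.append(name)
    return kept

columns = {
    "height_cm": [150, 160, 170, 180, 190],
    "height_inch": [59.1, 63.0, 66.9, 70.9, 74.8],  # same information as height_cm
    "age": [30, 22, 41, 35, 28],
}
selected = select_features(columns)
print(selected)
```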

Binary classification
Type of learning problem. With binary classification, the response is one of two classes: yes/no,
positive/negative, etcetera. For instance, finding out whether an e-mail is spam or not.

Multilabel classification
Type of learning problem. This is similar to binary classification, but instead of a single yes/no
question there are multiple labels to choose from. The response is a finite set of yes/no decisions,
one per label.

Regression
Type of learning problem. With regression, the response is a real number (for example, predicting
people’s age or predicting sales).

Ranking
Type of learning problem. With ranking, we want to order objects according to their relevance. For
instance, Google search results are ranked from most relevant to least relevant. You will probably
never view the least relevant ones.

Sequence labeling
Type of learning problem. With sequence labeling, the input is a sequence of elements (for instance,
words). The response is a corresponding sequence of labels.

Sequence-to-sequence modeling
Type of learning problem. Similar to sequence labeling, but with sequence-to-sequence modeling,
the response is another sequence of elements (for example, of a different length or from a
different source).

Autonomous behavior
Type of learning problem. With autonomous behavior, the inputs can be almost anything. The idea
is that the system learns by itself. For example, a self-driving car takes measurements from sensors
etcetera as input and gives driving instructions as response.

Evaluation metrics
- MSE: evaluation metric, this is the mean squared error, the average of (y_pred - y_true)^2.
You use the MSE instead of the MAE when you want to punish large errors more heavily; this
also makes the MSE more sensitive to outliers.

- MAE: evaluation metric, this is the mean absolute error, the average of |y_pred - y_true|.

- Error rate: this is the opposite of accuracy ((TP + TN) / all), namely (FP + FN) / all, so
error rate = 1 - accuracy. It indicates the fraction of wrong classifications.
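
The three metrics can be written out as below. This is a minimal sketch with made-up numbers; the two prediction lists are chosen so that they have the same MAE but a different MSE, which shows how squaring punishes one large error more than several small ones.

```python
def mse(y_true, y_pred):
    """Mean squared error: average of the squared differences."""
    return sum((p - t) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def mae(y_true, y_pred):
    """Mean absolute error: average of the absolute differences."""
    return sum(abs(p - t) for t, p in zip(y_true, y_pred)) / len(y_true)

def error_rate(y_true, y_pred):
    """Fraction of wrong classifications, i.e. 1 - accuracy."""
    return sum(t != p for t, p in zip(y_true, y_pred)) / len(y_true)

y_true = [10.0, 12.0, 14.0, 16.0]
small_errors = [11.0, 13.0, 15.0, 17.0]  # every prediction is off by 1
one_outlier = [10.0, 12.0, 14.0, 20.0]   # one prediction is off by 4

print(mae(y_true, small_errors), mse(y_true, small_errors))  # 1.0 1.0
print(mae(y_true, one_outlier), mse(y_true, one_outlier))    # 1.0 4.0
print(error_rate([1, 0, 1, 1], [1, 1, 1, 0]))                # 0.5
```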

Accuracy/Recall/Precision/F score
These evaluation metrics focus on a specific kind of mistake.
Accuracy = (TP + TN) / all
Precision = TP / (TP + FP)
Recall = TP / (TP + FN). You use this when a FN is much worse than a FP, for example when testing
for COVID-19.
F score: you use this with unbalanced classes. It is the harmonic mean of recall and precision:
F_beta = (1 + beta^2) * Precision * Recall / (beta^2 * Precision + Recall).
With beta, you can specify how many times more you care about recall than about precision.
The standard is beta = 1 (F1 score), meaning that you value recall and precision equally.
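
A small sketch of these metrics for a binary problem, where 1 is the positive class. The labels and predictions are invented, and `f_beta` uses the usual weighted harmonic mean, (1 + beta^2) * P * R / (beta^2 * P + R), which is an assumption about which F-beta definition the course uses.

```python
def confusion_counts(y_true, y_pred):
    """Count TP, FP, FN and TN for a binary problem with positive class 1."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)
    fn = sum(t == 1 and p == 0 for t, p in pairs)
    tn = sum(t == 0 and p == 0 for t, p in pairs)
    return tp, fp, fn, tn

def f_beta(precision, recall, beta=1.0):
    """Weighted harmonic mean of precision and recall; beta > 1 favours recall."""
    if precision == 0 and recall == 0:
        return 0.0
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)

y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 1, 0, 0, 0, 0, 0]

tp, fp, fn, tn = confusion_counts(y_true, y_pred)
accuracy = (tp + tn) / len(y_true)
precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = f_beta(precision, recall)
f2 = f_beta(precision, recall, beta=2.0)
print(accuracy, precision, recall, f1, f2)
```

Here recall (0.5) is lower than precision (2/3), so the F2 score, which weights recall more heavily, comes out lower than the F1 score.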

Macro-average/Micro-average
Macro-average = compute precision and recall per class, and then average over the classes. Rare
classes have the same impact as frequent classes. Macro-average is the average of the per-class
precision/recall etc.
Micro-average = treat each correct prediction as a TP, each missing classification as a FN and each
incorrect prediction as a FP, and compute the metric on these pooled counts. Micro-averaging is used
in single-label classification. We pool over all classes, including the null/default class (so
precision = recall = F score = accuracy). Micro-average is the overall precision/recall etc. over
all predictions.
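
The sketch below computes both averages for a single-label problem with one frequent class ("a") and one rare class ("b"). In it, macro-averaging takes the mean of the per-class scores, while micro-averaging pools the TP/FP/FN counts of all classes before computing one score. The data and function names are made up for illustration.

```python
def per_class_counts(y_true, y_pred, label):
    """TP, FP and FN when `label` is treated as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == label and p == label for t, p in pairs)
    fp = sum(t != label and p == label for t, p in pairs)
    fn = sum(t == label and p != label for t, p in pairs)
    return tp, fp, fn

def safe_div(a, b):
    return a / b if b else 0.0

def macro_micro(y_true, y_pred):
    labels = sorted(set(y_true) | set(y_pred))
    counts = [per_class_counts(y_true, y_pred, label) for label in labels]
    # Macro: score each class separately, then average (every class counts equally).
    macro_p = sum(safe_div(tp, tp + fp) for tp, fp, fn in counts) / len(labels)
    macro_r = sum(safe_div(tp, tp + fn) for tp, fp, fn in counts) / len(labels)
    # Micro: pool the counts of all classes, then compute one score.
    tp_sum = sum(tp for tp, fp, fn in counts)
    fp_sum = sum(fp for tp, fp, fn in counts)
    fn_sum = sum(fn for tp, fp, fn in counts)
    micro_p = safe_div(tp_sum, tp_sum + fp_sum)
    micro_r = safe_div(tp_sum, tp_sum + fn_sum)
    return macro_p, macro_r, micro_p, micro_r

y_true = ["a"] * 8 + ["b", "b"]
y_pred = ["a"] * 8 + ["a", "b"]  # one rare-class example is misclassified

macro_p, macro_r, micro_p, micro_r = macro_micro(y_true, y_pred)
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(macro_p, macro_r, micro_p, micro_r, accuracy)
```

The single mistake on the rare class pulls macro recall down to 0.75, while micro precision and micro recall both equal the accuracy of 0.9.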

Decision tree
A decision tree is a supervised learning method. Trees are recursively defined data structures: there
is a base case (leaf node) and there are recursive cases (branch nodes). You should start with the
question that eliminates the most options: if we had to classify the data based on only one
question, which question would do best? Therefore, the best attribute is the one which has the
highest information gain, meaning the lowest entropy left after the split (IG = entropy before the
split - weighted entropy after the split, which is IG = 1 - E when the entropy before the split
is 1). A tree consists of:
- Root: top of the tree (root node)
- Nodes: check the value of a feature (internal nodes)
- Edges: correspond to the outcome of a test and connect to the next node or leaf; these
are the arrows
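
Choosing the first question by information gain can be sketched as follows. The code uses the general definition (entropy of the labels before the split minus the weighted entropy after the split). With two equally frequent classes the entropy before the split is 1, so this matches IG = 1 - E. The toy rows and the feature names `contains_free` and `has_greeting` are hypothetical.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def information_gain(rows, labels, feature):
    """Entropy before the split minus the weighted entropy after splitting on feature."""
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[feature], []).append(label)
    remainder = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder

rows = [
    {"contains_free": True, "has_greeting": True},
    {"contains_free": True, "has_greeting": False},
    {"contains_free": False, "has_greeting": True},
    {"contains_free": False, "has_greeting": False},
]
labels = ["spam", "spam", "ham", "ham"]

gains = {f: information_gain(rows, labels, f) for f in rows[0]}
best = max(gains, key=gains.get)  # the question to ask at the root
print(gains, best)
```

Splitting on `contains_free` separates the two classes perfectly (gain 1), while `has_greeting` tells us nothing (gain 0), so `contains_free` would be the root question.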

Written for

Institution: Tilburg University (UVT)
Study: Data Science & Society
Course: Machine Learning (880083M6)

Document information

Uploaded on: 6 October 2021
Number of pages: 61
Written in: 2020/2021
Type: Summary
