Summary

Machine Learning Course Summary — Supervised Learning, Clustering, Neural Networks, and Reinforcement Learning

Uploaded on: 27-04-2026

Written in: 2024/2025

A complete and structured Machine Learning summary covering the core concepts needed for revision, including data preprocessing, regression, classification, Naive Bayes, decision trees, ensemble methods, clustering, representation learning, multilayer perceptrons, optimization, and reinforcement learning. Ideal for exam preparation and quick review.

Preview of the content

Every machine learning algorithm has 3 components:

1) Representation: the space of allowed models
• Linear regression / decision trees / sets of rules / instances / graphical models / neural networks...

2) Evaluation: how to judge one model vs. another
• Accuracy / precision & recall / mean squared error / likelihood

3) Optimization: a method to search among the models for the highest-scoring one
• Combinatorial optimization / convex optimization / constrained / nonconvex
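
As a minimal sketch of how the three components fit together, the Python below assumes a one-feature linear model as the representation, mean squared error as the evaluation, and plain gradient descent as the optimization. The toy data, learning rate, and function names are illustrative assumptions and do not come from the course.

```python
# Toy data generated from y = 2x + 1 (an assumption made for illustration).
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]

def predict(w, b, x):
    """Representation: the space of allowed models is all lines y = w*x + b."""
    return w * x + b

def mse(w, b):
    """Evaluation: mean squared error of one model on the data."""
    return sum((predict(w, b, x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def fit(steps=5000, lr=0.01):
    """Optimization: gradient descent searches for the lowest-error line."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        grad_w = sum(2 * (predict(w, b, x) - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (predict(w, b, x) - y) for x, y in zip(xs, ys)) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

w, b = fit()
print(f"w={w:.3f}, b={b:.3f}, mse={mse(w, b):.6f}")
```

Swapping any one component (for example a decision tree as representation, or likelihood as evaluation) gives a different learning algorithm while the overall structure stays the same.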

Supervised learning
• Correct output is known for each training example
• Learn to predict the output when given an input vector
• The most used type, with a wide area of academic and industrial applications
• Methods: Support Vector Machines, Artificial Neural Networks, Decision Trees...
• Examples of specific tasks: classification (discrete output), regression (real-valued output)
• Learns from data: training data include desired outputs

Unsupervised learning
• Correct output is not known
• Create an internal representation of the input, capturing regularities/structure in the data
• Methods: K-means clustering, Restricted Boltzmann Machine
• Examples: discover clusters, discover factors/structures
• Learns from data: training data does not include desired outputs

Reinforcement learning
• Learn actions to maximize payoff
• Learning is based on new and ...
• Relation with game theory, control, ...
• Methods: Q-learning, SARSA
• Examples: decision-making process, control, strategic optimization
• On-line learning: learns from data collected by exploring the agent environment
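
To make the difference between training data with and without desired outputs concrete, here is a small sketch on made-up one-dimensional data. The nearest-class-mean classifier and the min/max initialization of k-means are simplifications chosen for brevity, not methods prescribed by the course.

```python
points = [1.0, 1.2, 0.8, 5.0, 5.2, 4.8]

# Supervised: the desired outputs (labels) come with the training data.
labels = ["low", "low", "low", "high", "high", "high"]

def fit_class_means(xs, ys):
    """Learn one mean per class from labelled examples."""
    groups = {}
    for x, y in zip(xs, ys):
        groups.setdefault(y, []).append(x)
    return {label: sum(v) / len(v) for label, v in groups.items()}

def classify(x, class_means):
    """Predict the class whose mean is closest to x."""
    return min(class_means, key=lambda label: abs(x - class_means[label]))

# Unsupervised: no labels; k-means with k=2 has to find the groups itself.
def kmeans_1d(xs, iterations=10):
    centroids = [min(xs), max(xs)]  # simple deterministic initialization
    for _ in range(iterations):
        clusters = [[], []]
        for x in xs:
            nearest = min((0, 1), key=lambda i: abs(x - centroids[i]))
            clusters[nearest].append(x)
        centroids = [sum(c) / len(c) for c in clusters]
    return centroids

class_means = fit_class_means(points, labels)
print(classify(1.1, class_means))
print(kmeans_1d(points))
```

Both approaches end up with roughly the same two group centres here, but only the supervised one can attach the names "low" and "high" to them, because only it was given the desired outputs.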

Designing a learning system
[Diagram: training data from the environment/experience feed the learner, which produces knowledge; the performance element applies that knowledge to testing data.]

Underfitting
The model is too simple to capture the underlying patterns in the data. It fails to learn from the training data adequately,
resulting in poor performance on both training and testing datasets.
Causes:
• An overly simplistic model
• Insufficient training
• Lack of relevant features
Consequences: The model has high bias, leading to inaccurate predictions. It may exhibit a high error rate even on the
training set.

Overfitting
The model learns the noise and random fluctuations in the training data instead of the actual underlying patterns. This results in
a model that performs exceptionally well on the training set but poorly on unseen data.
Causes:
• A model that is too complex
• Excessive training
• Insufficient data to train the model adequately
Consequences: The model has high variance, meaning it is sensitive to small changes in the training data. It may
generalize poorly to new data.
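
The two failure modes can be seen side by side in a small sketch. It assumes synthetic data from a noisy linear pattern, a constant predictor as the overly simple model, and a 1-nearest-neighbour predictor as the overly flexible one; all of these choices are illustrative.

```python
import random

random.seed(0)

def true_fn(x):
    return 2.0 * x  # assumed underlying pattern

def make_data(n):
    # Distinct inputs, with Gaussian noise added to the outputs.
    xs = random.sample(range(100), n)
    return [(x / 10.0, true_fn(x / 10.0) + random.gauss(0.0, 1.0)) for x in xs]

train = make_data(30)
test = make_data(30)

mean_y = sum(y for _, y in train) / len(train)

def constant_model(x):
    """Too simple: always predicts the mean training output (high bias)."""
    return mean_y

def one_nn_model(x):
    """Too flexible: copies the output of the nearest training point, noise included."""
    return min(train, key=lambda p: abs(p[0] - x))[1]

def mse(model, data):
    return sum((model(x) - y) ** 2 for x, y in data) / len(data)

for name, model in [("constant", constant_model), ("1-NN", one_nn_model)]:
    print(name, "train:", round(mse(model, train), 3), "test:", round(mse(model, test), 3))
```

The constant model has a large error even on the training set, which is the signature of underfitting. The 1-NN model reproduces the training set exactly, so its training error is zero, and the gap to its test error shows how much of what it memorised was noise.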

Lesson 2

Data preparation for ML
Types of data attributes:
• Nominal -> ID numbers, eye color, zip codes
• Ordinal -> rankings, grades, height in {tall, medium, short}
• Interval -> calendar dates
• Ratio -> length, time, mass
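
The attribute type determines which operations are meaningful: equality for nominal, order for ordinal, differences for interval, and ratios for ratio attributes. The sketch below illustrates this with made-up category lists; the one-hot and rank encodings are common conventions assumed here, not something the notes prescribe.

```python
from datetime import date

# Nominal: only equality is meaningful, so use one indicator per category.
EYE_COLORS = ["blue", "brown", "green"]

def one_hot(value, categories):
    return [1 if value == c else 0 for c in categories]

# Ordinal: order is meaningful, so map each level to its rank.
HEIGHT_LEVELS = ["short", "medium", "tall"]

def rank(value, levels):
    return levels.index(value)

# Interval: differences are meaningful, but the zero point is arbitrary.
days_between = (date(2025, 1, 10) - date(2025, 1, 1)).days

# Ratio: there is a true zero, so ratios are meaningful as well.
length_ratio = 4.0 / 2.0  # "twice as long" is a valid statement

print(one_hot("brown", EYE_COLORS))  # [0, 1, 0]
print(rank("tall", HEIGHT_LEVELS))   # 2
print(days_between)                  # 9
print(length_ratio)                  # 2.0
```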

Discrete attributes:
• Have only a finite or countably infinite set of values
• Zip codes, counts, or the set of words in a collection of documents
• Often represented as integer variables
• Binary attributes are a special case of discrete attributes

Continuous attributes:
• Have real numbers as attribute values
• Temperature, height, weight
• Represented as floating-point variables

Types of data sets
• Record: data matrix, document data, transaction data
• Graph: World Wide Web, molecular structures
• Ordered: spatial data, temporal data, sequential data, genetic sequence data

Characteristics of data
• Dimensionality: high dimensional data brings a number of challenges
• Sparsity: only presence counts
• Resolution: patterns depend on the scale
• Size: type of analysis may depend on size of data

Data Preprocessing
• Aggregation: combining two or more attributes (or objects) into a single attribute
• Sampling: main technique employed for data reduction
• Feature extraction: transforms the data in the high-dimensional space to a space of fewer dimensions
• Feature subset selection: tries to find a representative subset of the original variables
• Feature creation: create new attributes that can capture the important information in a data set much more efficiently
than the original attributes
• Discretization and binarization: discretization converts a continuous attribute into an ordinal attribute; binarization maps an attribute to one or more binary attributes
• Attribute transformation: a function that maps the entire set of values of a given attribute to a new set of replacement
values such that each old value can be identified with one of the new values
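
Two of the steps above can be sketched in a few lines of Python. Equal-width binning stands in for discretization and min-max scaling for attribute transformation; both are common choices assumed here for illustration, and both assume the attribute takes at least two distinct values.

```python
def equal_width_bins(values, k):
    """Discretization: map each continuous value to a bin index 0..k-1."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    # The maximum value is clamped into the last bin.
    return [min(int((v - lo) / width), k - 1) for v in values]

def min_max_scale(values):
    """Attribute transformation: rescale values linearly to [0, 1]."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

temperatures = [10.0, 15.0, 20.0, 25.0, 30.0]
print(equal_width_bins(temperatures, 2))  # [0, 0, 1, 1, 1]
print(min_max_scale(temperatures))        # [0.0, 0.25, 0.5, 0.75, 1.0]
```

Each old value maps to exactly one new value in both functions, which is the defining property of an attribute transformation given in the list.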

Noise
Random errors or variance in the data that do not reflect the true underlying patterns. It can arise from various sources,
such as measurement errors, data entry errors or inconsistencies in data collection processes.
Impact: noise obscures the true relationship between the input features and the target variable.

Outliers
Data points that differ significantly from the majority of the dataset. They can occur due to variability in the measurement,
data entry errors, or they may represent significant anomalies.
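
One common way to flag candidate outliers is the interquartile-range (IQR) rule, sketched below. The rule and the multiplier 1.5 are a widely used convention assumed here, not a method named in the notes, and the readings are made up.

```python
import statistics

def iqr_outliers(values, k=1.5):
    """Return the values lying more than k * IQR outside the quartiles."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]

readings = [10, 12, 11, 13, 12, 11, 10, 12, 95]
print(iqr_outliers(readings))  # [95]
```

Whether a flagged point is a data entry error or a significant anomaly still has to be decided by inspecting it.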

Missing values
Occur when data for a particular observation is not available. They can result from various factors, including data collection
errors, participant non-responses, or system failures.

Types of missing values:
1. Missing Completely at Random (MCAR)
• Missingness of a value is independent of the attributes
• Fill in values based on the attribute
• Analysis may be unbiased overall
2. Missing at Random (MAR)
• Missingness is related to other variables
• Fill in values based on other values
• Almost always produces a bias in the analysis
3. Missing Not at Random (MNAR)
• Missingness is related to unobserved measurements
• Informative or non-ignorable missingness
It is not possible to know which of these situations applies from the data.
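
The two fill-in strategies mentioned above can be sketched as follows. Missing entries are represented by None, the records and field names are hypothetical, and the group-based version assumes every group has at least one observed value.

```python
def impute_with_mean(values):
    """Fill based on the attribute itself: use the mean of the observed values."""
    observed = [v for v in values if v is not None]
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in values]

def impute_with_group_mean(rows, group_key, target_key):
    """Fill based on other values: use the mean within the same group."""
    sums, counts = {}, {}
    for r in rows:
        if r[target_key] is not None:
            g = r[group_key]
            sums[g] = sums.get(g, 0.0) + r[target_key]
            counts[g] = counts.get(g, 0) + 1
    return [
        {**r, target_key: sums[r[group_key]] / counts[r[group_key]]}
        if r[target_key] is None else dict(r)
        for r in rows
    ]

# Hypothetical records with one missing income.
rows = [
    {"job": "nurse", "income": 30.0},
    {"job": "nurse", "income": None},
    {"job": "engineer", "income": 50.0},
    {"job": "engineer", "income": 54.0},
    {"job": "nurse", "income": 34.0},
]

print(impute_with_mean([r["income"] for r in rows])[1])         # 42.0
print(impute_with_group_mean(rows, "job", "income")[1]["income"])  # 32.0
```

The two strategies give different fills for the same record, which is why the assumed missingness mechanism matters for the analysis.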

Imbalanced data
The number of objects in some classes is much smaller than the number of objects in the other classes.
Possible approaches: resampling, collecting more data, choosing the right evaluation metrics and the right models.
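
A short sketch of why the evaluation metric matters on imbalanced data, followed by one simple resampling approach. The labels are made up, and random oversampling is just one of several possible resampling schemes.

```python
import random

y_true = [1] * 5 + [0] * 95  # 5 positives, 95 negatives (made-up labels)
y_pred = [0] * 100           # a model that always predicts the majority class

def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def recall(y_true, y_pred, positive=1):
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    return tp / (tp + fn) if tp + fn else 0.0

def random_oversample(labels, minority=1, seed=0):
    """Return sample indices with minority examples repeated until classes balance."""
    rng = random.Random(seed)
    minority_idx = [i for i, y in enumerate(labels) if y == minority]
    majority_count = len(labels) - len(minority_idx)
    extra = [rng.choice(minority_idx) for _ in range(majority_count - len(minority_idx))]
    return list(range(len(labels))) + extra

print(accuracy(y_true, y_pred))  # 0.95
print(recall(y_true, y_pred))    # 0.0

balanced = [y_true[i] for i in random_oversample(y_true)]
print(balanced.count(1), balanced.count(0))  # 95 95
```

The always-negative model looks good on accuracy while finding none of the minority class, which is why recall (or precision and recall together) is the more informative metric here.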

Similarity and Dissimilarity Measures
Similarity measures
• Numerical measure of how alike two data objects are
• Higher when objects are more alike
Dissimilarity measures
• Numerical measure of how different two data objects are
• Lower when objects are more alike
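
As a concrete instance of each kind of measure, the sketch below uses Euclidean distance as a dissimilarity and cosine similarity as a similarity. The conversion 1 / (1 + d) is one common way to turn a distance into a similarity and is an assumption here, not a formula from the notes.

```python
import math

def euclidean_distance(a, b):
    """Dissimilarity: 0 for identical points, larger when points are further apart."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cosine_similarity(a, b):
    """Similarity: 1 when the vectors point in the same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def distance_to_similarity(d):
    """Map a distance in [0, inf) to a similarity in (0, 1]."""
    return 1.0 / (1.0 + d)

print(euclidean_distance((0, 0), (3, 4)))  # 5.0
print(cosine_similarity((1, 0), (0, 1)))   # 0.0
print(distance_to_similarity(0.0))         # 1.0
```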

Written for

Institution: University of Luxembourg
Course: Machine Learning

Document information

Uploaded on: 27 April 2026
Number of pages: 45
Written in: 2024/2025
Type: Summary

Topics

machine learning
supervised learning
clustering
neural networks
reinforcement learning
regression
classification
decision trees
naive bayes
exam revision

Seller: eugniedelaunay (Computer Science)
