Summary

Machine Learning Course Summary — Supervised Learning, Clustering, Neural Networks, and Reinforcement Learning

Pages: 45
Uploaded on: 27-04-2026
Written in: 2024/2025

A complete and structured Machine Learning summary covering the core concepts needed for revision, including data preprocessing, regression, classification, Naive Bayes, decision trees, ensemble methods, clustering, representation learning, multilayer perceptrons, optimization, and reinforcement learning. Ideal for exam preparation and quick review.


Preview of the content

Every machine learning algorithm has 3 components:

1) Representation: the space of allowed models
• Linear regression / decision tree / sets of rules / instances / graphical models / neural networks...

2) Evaluation: how to judge one model vs. another
• Accuracy / precision & recall / mean squared error / likelihood

3) Optimization: a method to search among the models for the highest-scoring one
• Combinatorial optimization / convex optimization / constrained / nonconvex
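The three components can be made concrete with a very small example. The sketch below is only an illustration under stated assumptions: the representation is a one-variable linear model, the evaluation is mean squared error, and the optimization is plain gradient descent. The function names, learning rate, step count, and toy data are all hypothetical choices, not taken from the course.

```python
# Representation: the space of allowed models is y = w * x + b.
def predict(w, b, x):
    return w * x + b


# Evaluation: mean squared error scores one (w, b) pair against the data.
def mse(w, b, xs, ys):
    return sum((predict(w, b, x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Optimization: gradient descent searches the model space for a low-error model.
def fit(xs, ys, lr=0.05, steps=2000):
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        grad_w = sum(2 * (predict(w, b, x) - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (predict(w, b, x) - y) for x, y in zip(xs, ys)) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


# Toy data generated from y = 2x + 1 (an assumption made for illustration).
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]
w, b = fit(xs, ys)
print(round(w, 3), round(b, 3), mse(w, b, xs, ys))
```

Swapping any one component (for example, a decision tree as representation, or accuracy as evaluation) gives a different learning algorithm while the other two roles stay the same.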



Supervised learning
• Correct output known for each training example
• Learn to predict output when given an input vector
• The most used type, with a wide area of academic and industrial applications
Methods
• Support Vector Machine
• Artificial Neural Networks
• Decision Trees...
Examples of specific tasks
• Classification: discrete output
• Regression: real-valued output
Learns from data
• Training data include desired outputs

Unsupervised learning
• Correct output is not known
• Create an internal representation of the input, capturing regularities/structure in data
Methods
• K-means clustering
• Restricted Boltzmann Machine
Examples
• Discover clusters
• Discover factors/structures
Learns from data
• Training data does not include desired outputs

Reinforcement learning
• Learn action to maximize payoff
• Learning is based on new and [...]
• Relation with game theory, control, ...
Methods
• Q-learning
• SARSA
Examples
• Decision making process
• Control
• Strategic optimization
On-line learning
• Learns from collected data, by exploring the agent environment



Designing a learning system

[Diagram: Environment/Experience -> Training data -> Learner -> Knowledge -> Performance Element; Testing data -> Performance Element]

Underfitting
Model is too simple to capture the underlying patterns in the data. It fails to learn from the training data adequately,
resulting in poor performance on both training and testing datasets.
Causes:
• An overly simplistic model
• Insufficient training
• Lack of relevant features
Consequences: The model has high bias, leading to inaccurate predictions. It may exhibit a high error rate even on the
training set.


Overfitting
Model learns the noise and random fluctuations in the training data instead of the actual underlying patterns. This results in
a model that performs exceptionally well on the training set but poorly on unseen data
Causes:
• A model that is too complex
• Excessive training
• Insufficient data to train the model adequately
Consequences: The model has high variance, meaning it is sensitive to small changes in the training data. It may
generalize poorly to new data.
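The contrast between the two failure modes can be seen on a toy dataset. The sketch below is an illustration only: the data are made up, the "too simple" model is a constant predictor, and the "too complex" model simply memorizes the training set (a 1-nearest-neighbour lookup). Neither model is taken from the course; they are stand-ins chosen because their behaviour is easy to verify by hand.

```python
def mse(model, xs, ys):
    """Mean squared error of a model (a function x -> prediction)."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Hypothetical data: roughly y = x, with noise only in the training targets.
train_x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
train_y = [0.2, 0.9, 2.3, 2.8, 4.1, 5.2]
test_x = [0.5, 1.5, 2.5, 3.5, 4.5]
test_y = [0.5, 1.5, 2.5, 3.5, 4.5]

mean_y = sum(train_y) / len(train_y)


def constant_model(x):
    """Underfitting stand-in: ignores the input and always predicts the mean."""
    return mean_y


def memorizer(x):
    """Overfitting stand-in: returns the target of the closest training point."""
    nearest = min(range(len(train_x)), key=lambda i: abs(train_x[i] - x))
    return train_y[nearest]


for name, model in [("constant", constant_model), ("memorizer", memorizer)]:
    print(name, mse(model, train_x, train_y), mse(model, test_x, test_y))
```

The constant model has a high error on both sets (high bias), while the memorizer reproduces the training targets exactly, noise included, and does worse on the unseen points than on the training set.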

Lesson 2


Data preparation for ML
Types of data attributes:
• Nominal -> ID numbers, eye color, zip codes
• Ordinal -> rankings, grades, height in {tall, medium, short}
• Interval -> calendar dates
• Ratio -> length, time, mass


Discrete attributes
• Have only a finite or countably infinite set of values
• Zip codes, counts, or the set of words in a collection of documents
• Often represented as integer variables
• Binary attributes are a special case of discrete attributes


Continuous attributes:
• Have real numbers as attribute values
• Temperature, height, weight
• Represented as floating point variables


Types of data sets
• Record: data matrix, document data, transaction data
• Graph: World Wide Web, molecular structures
• Ordered: spatial data, temporal data, sequential data, genetic sequence data


Characteristics of data
• Dimensionality: high-dimensional data brings a number of challenges
• Sparsity: only presence counts
• Resolution: patterns depend on the scale
• Size: type of analysis may depend on size of data


Data Preprocessing
• Aggregation: combining two or more attributes (or objects) into a single attribute
• Sampling: main technique employed for data reduction
• Feature extraction: transforms the data in the high-dimensional space to a space of fewer dimensions
• Feature subset selection: tries to find a representative subset of the original variables
• Feature creation: create new attributes that can capture the important information in a data set much more efficiently
than the original attributes
• Discretization and binarization: converting a continuous attribute into an ordinal attribute (discretization) or into binary attributes (binarization)
• Attribute transformation: a function that maps the entire set of values of a given attribute to a new set of replacement
values such that each old value can be identified with one of the new values
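Three of the preprocessing steps listed above are easy to sketch in a few lines. The code below is a minimal illustration, assuming hypothetical helper names (`discretize`, `binarize`, `min_max_scale`), made-up cut points, and toy values; one-hot encoding and min-max scaling are common choices for binarization and attribute transformation, not necessarily the ones used in the course.

```python
def discretize(value, cut_points, labels):
    """Discretization: map a continuous value to an ordinal label.

    cut_points must be sorted; labels has one more entry than cut_points.
    """
    for cut, label in zip(cut_points, labels):
        if value < cut:
            return label
    return labels[-1]


def binarize(value, categories):
    """Binarization (one-hot): one 0/1 flag per possible category."""
    return [1 if value == c else 0 for c in categories]


def min_max_scale(values):
    """Attribute transformation: rescale linearly to [0, 1].

    Assumes the values are not all identical (otherwise hi - lo is zero).
    """
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


heights_cm = [150.0, 165.0, 180.0, 195.0]
binned = [discretize(h, [160.0, 185.0], ["short", "medium", "tall"]) for h in heights_cm]
one_hot = binarize("green", ["blue", "green", "brown"])
scaled = min_max_scale(heights_cm)
print(binned, one_hot, scaled)
```

Each old value still maps to exactly one new value, which is the property the attribute-transformation definition asks for.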

Noise
Random errors or variance in the data that do not reflect the true underlying patterns. It can arise from various sources,
such as measurement errors, data entry errors or inconsistencies in data collection processes.
Impact: obscures the true relationship between input features and the target variable


Outliers
Data points that differ significantly from the majority of the dataset. They can occur due to variability in the measurement,
data entry errors, or they may represent significant anomalies.
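One simple way to flag candidate outliers is a z-score rule: mark values that lie more than a chosen number of standard deviations from the mean. This is a swapped-in illustration, not a method named in the notes; the threshold of 2.0, the function name, and the sample readings are assumptions.

```python
from statistics import mean, pstdev


def zscore_outliers(values, threshold=2.0):
    """Return the values whose absolute z-score exceeds the threshold."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation
    if sigma == 0:
        return []  # all values identical: nothing stands out
    return [v for v in values if abs(v - mu) / sigma > threshold]


readings = [10.0, 11.0, 9.0, 10.0, 12.0, 10.0, 50.0]
outliers = zscore_outliers(readings)
print(outliers)
```

A flagged value still needs inspection, since it may be a data entry error or a genuine anomaly worth keeping. Note that extreme values inflate the mean and standard deviation themselves, so this rule can miss outliers in small samples.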


Missing values
Occur when data for a particular observation is not available. They can result from various factors, including data collection
errors, participant non-responses, or system failures


Types of missing values:
1. Missing Completely at Random (MCAR)
• Missingness of a value is independent of attributes
• Fill in values based on the attribute
• Analysis may be unbiased overall
2. Missing at Random (MAR)
• Missingness is related to other variables
• Fill in values based on other values
• Almost always produces a bias in the analysis
3. Missing Not at Random (MNAR)
• Missingness is related to unobserved measurements
• Informative or non-ignorable missingness
It is not possible to know which situation applies from the data alone.
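"Fill in values based on the attribute" can be read as replacing each missing entry with a summary of the observed values of that same attribute. The sketch below uses the mean; that choice, the function name, the use of `None` as the missing marker, and the sample column are assumptions for illustration.

```python
from statistics import mean


def impute_mean(column):
    """Replace None entries with the mean of the observed entries."""
    observed = [v for v in column if v is not None]
    fill = mean(observed)  # raises StatisticsError if nothing was observed
    return [fill if v is None else v for v in column]


ages = [20.0, None, 30.0, None, 40.0]
filled = impute_mean(ages)
print(filled)
```

This kind of fill is most defensible under MCAR; under MAR or MNAR the missing entries differ systematically from the observed ones, so a plain attribute mean can bias the analysis.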


Imbalanced data
The number of objects in some classes is much smaller than the number of objects in the other classes.
Possible approaches: resampling, collecting more data, choosing the right evaluation metrics and the right models
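Of the approaches listed, resampling is the easiest to sketch. The code below shows random oversampling, where minority-class rows are duplicated until every class matches the largest one. It is a minimal illustration: the function name, the fixed seed, and the toy rows are hypothetical, and undersampling the majority class would be an equally valid form of resampling.

```python
import random
from collections import Counter


def oversample(rows, labels, seed=0):
    """Duplicate randomly chosen minority rows until all classes are equal in size."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        pool = [r for r, lab in zip(rows, labels) if lab == cls]
        for _ in range(target - n):
            out_rows.append(rng.choice(pool))
            out_labels.append(cls)
    return out_rows, out_labels


rows = [[0.1], [0.2], [0.3], [0.4], [0.5], [5.0]]
labels = ["neg"] * 5 + ["pos"]
balanced_rows, balanced_labels = oversample(rows, labels)
print(Counter(balanced_labels))
```

Resampling should be applied to the training split only, so that duplicated rows do not leak into the data used for evaluation.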


Similarity and Dissimilarity Measures
Similarity measures
• Numerical measure of how alike two data objects are
• Higher when objects are more alike
Dissimilarity measures
• Numerical measure of how different two data objects are
• Lower when objects are more alike
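As a concrete pair of examples, Euclidean distance is a dissimilarity measure and cosine similarity is a similarity measure. The sketch below is an illustration only; the choice of these two measures and the `1 / (1 + d)` conversion from distance to similarity are assumptions, since the preview does not say which measures the course covers.

```python
import math


def euclidean(a, b):
    """Dissimilarity: 0 for identical points, larger as points move apart."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def similarity_from_distance(d):
    """One common way to turn a distance into a similarity in (0, 1]."""
    return 1.0 / (1.0 + d)


def cosine_similarity(a, b):
    """Similarity: 1 for the same direction, 0 for orthogonal vectors.

    Assumes neither vector is all zeros.
    """
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


print(euclidean((0.0, 0.0), (3.0, 4.0)))
print(similarity_from_distance(0.0))
print(cosine_similarity((1.0, 0.0), (0.0, 1.0)))
```

Note the opposite orientation of the two kinds of measure: the distance grows as objects become less alike, while both similarity functions shrink.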

Seller: eugniedelaunay (Computer Science)