MACHINE LEARNING
Regulation: 2022 R
II Year / IV Semester
AM3403
Machine Learning: Concepts and Applications
Unit-IV
Lecture Notes
Syllabus
UNSUPERVISED LEARNING AND OPTIMISATION
Unsupervised learning: Expectation maximization - Gaussian mixture models -
K-means / K-medoid - Hierarchical clustering: top-down, bottom-up - single linkage -
multiple linkage. Dimensionality Reduction: Linear Discriminant Analysis,
Principal Components Analysis, Factor Analysis, Independent Component
Analysis. Optimization: Going Downhill, Least-Squares optimization, Conjugate
Gradients
4.1 Introduction
Unsupervised learning is a type of machine learning where the model is trained on unlabeled
data. Unlike supervised learning, where the model learns from labeled input-output pairs,
unsupervised learning algorithms explore the data’s underlying structure, patterns, and
relationships without explicit guidance.
The workflow of unsupervised learning algorithms
The typical workflow of a Machine Learning (ML) pipeline is divided into two key stages: Training &
Validation, and Prediction.
1. Training & Validation Phase
In this phase, the model is trained using historical data and evaluated for accuracy. The steps are:
Historical Data:
Collected data from past events or records.
This data often includes features (inputs) and outcomes (outputs).
Data Preprocessing:
Cleaning, transforming, and preparing the data for training.
Steps may include handling missing values, normalization, encoding categorical
variables, etc.
Machine Learning:
An unsupervised or supervised model is trained to learn patterns or rules from the
processed data.
Algorithms like clustering, decision trees, or neural networks are commonly used.
Pattern/Rules Generation:
The trained model extracts meaningful patterns or rules from the data.
For example, in an anomaly detection model, these patterns identify what constitutes
"normal" behavior.
Validation (Business Context):
The generated patterns/rules are tested in real-world scenarios.
This step ensures the model aligns with business objectives and identifies false positives
or negatives.
The red or blue symbols at this step indicate whether the model's predictions align with the desired
outcomes.
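The training steps above can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed implementation: the toy `historical` readings, the choice of mean imputation, and the three-standard-deviation threshold `K` are all assumptions made for the example.

```python
from statistics import mean, pstdev

# Hypothetical historical data: one numeric feature, None marks a missing value.
historical = [10.0, 11.0, None, 9.0, 10.5, 9.5, 10.0]

def preprocess(values, fill):
    """Replace missing entries with a fill value (simple imputation)."""
    return [fill if v is None else v for v in values]

# Data preprocessing: impute missing values with the mean of the observed ones.
observed = [v for v in historical if v is not None]
fill_value = mean(observed)
clean = preprocess(historical, fill_value)

# "Training": learn what normal looks like (centre and spread of the data).
mu = mean(clean)
sigma = pstdev(clean)

# Pattern/rule generation: a point is "normal" if it lies within K * sigma of mu.
K = 3.0  # assumed threshold, to be tuned during validation

def is_normal(x):
    return abs(x - mu) <= K * sigma

# Validation: check the rule against cases whose expected outcome is known.
print(is_normal(10.2), is_normal(25.0))  # True False
```

In a real project the validation step would compare many such decisions against business expectations and adjust `K` to trade false positives against false negatives.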
2. Prediction Phase
This phase involves applying the trained model to new data for making predictions.
New Data:
Incoming real-time or unseen data.
Data Preprocessing:
The new data undergoes the same cleaning and transformation steps applied during
training to maintain consistency.
Pattern/Rules Application:
The trained model’s learned patterns/rules are applied to predict outcomes for the new
data.
Prediction:
The model generates a prediction (e.g., classifying data points, detecting anomalies, etc.).
Validation (Business Context):
The prediction is validated against business goals or ground truth if available.
Again, the red or blue symbol indicates whether the model’s prediction is acceptable or
needs improvement.
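The prediction phase can be sketched as follows. The point of the example is that new data passes through the same preprocessing as the training data, using parameters stored at training time. The `model` dictionary and its values are hypothetical, as is the simple threshold rule used for prediction.

```python
# Parameters assumed to have been saved at the end of training (hypothetical values).
model = {"fill": 10.0, "mu": 10.0, "sigma": 0.6, "k": 3.0}

def preprocess(values, fill):
    """Same imputation step as in training, using the stored fill value."""
    return [fill if v is None else v for v in values]

def predict(values, model):
    """Label each new point as 'normal' or 'anomaly' using the learned rule."""
    labels = []
    for x in preprocess(values, model["fill"]):
        within = abs(x - model["mu"]) <= model["k"] * model["sigma"]
        labels.append("normal" if within else "anomaly")
    return labels

new_data = [10.4, None, 14.2]  # incoming, previously unseen readings
predictions = predict(new_data, model)
print(predictions)  # ['normal', 'normal', 'anomaly']
```

Recomputing the fill value or the mean from the new data instead of reusing the stored ones would break the consistency between training and prediction that this phase depends on.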
4.2 Key Characteristics of Unsupervised Learning
No Labels: The dataset contains only input data without corresponding output labels.
Pattern Discovery: The model identifies hidden patterns, groupings, or trends within the
data.
Autonomous Learning: The algorithm determines data insights independently.
Useful for Exploratory Data Analysis (EDA): Often used to uncover insights or
prepare data for further modeling.
4.3 Common Techniques in Unsupervised Learning
1. Clustering:
o Groups similar data points into clusters based on shared characteristics.
o Examples:
k-Means
Hierarchical Clustering
DBSCAN (Density-Based Spatial Clustering)
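As an illustration of clustering, the sketch below implements the basic k-Means loop (assign each point to its nearest centroid, then move each centroid to the mean of its points) in plain Python. The sample points, the fixed number of iterations, and the choice of the first k points as initial centroids are assumptions made to keep the example deterministic; practical implementations usually use random or smarter initialisation.

```python
def kmeans(points, k, iterations=10):
    """Plain k-means on 2-D points; centroids start at the first k points."""
    centroids = [points[i] for i in range(k)]
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            dists = [(p[0] - c[0]) ** 2 + (p[1] - c[1]) ** 2 for c in centroids]
            clusters[dists.index(min(dists))].append(p)
        # Update step: move each centroid to the mean of its cluster.
        for i, members in enumerate(clusters):
            if members:  # keep the old centroid if a cluster is empty
                centroids[i] = (
                    sum(m[0] for m in members) / len(members),
                    sum(m[1] for m in members) / len(members),
                )
    return centroids, clusters

# Hypothetical data: two visually separate groups of points.
points = [(1.0, 1.0), (8.0, 8.0), (1.5, 2.0), (9.0, 9.5), (1.0, 0.5), (8.5, 9.0)]
centroids, clusters = kmeans(points, k=2)
print(centroids)
print(clusters)
```

No labels are supplied anywhere: the grouping emerges only from the distances between points.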
2. Association Rule Learning:
o Identifies relationships between variables in large datasets.
o Examples:
Apriori Algorithm
Eclat Algorithm
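To make the idea of a relationship between variables concrete, the sketch below computes support and confidence, the two measures that algorithms such as Apriori use to rank rules. It does not implement Apriori or Eclat themselves (there is no candidate generation or pruning), and the market-basket transactions are made up for the example.

```python
# Hypothetical market-basket transactions.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

def support(itemset):
    """Fraction of transactions that contain every item in itemset."""
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)

def confidence(antecedent, consequent):
    """Estimated P(consequent | antecedent) for the rule antecedent -> consequent."""
    return support(antecedent | consequent) / support(antecedent)

s = support({"bread", "milk"})       # how often bread and milk occur together
c = confidence({"bread"}, {"milk"})  # how often milk appears when bread does
print(s, c)
```

A rule such as bread -> milk is normally kept only if both values exceed thresholds chosen by the analyst.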
3. Dimensionality Reduction:
o Reduces the number of features in a dataset while retaining important
information.
o Examples:
PCA (Principal Component Analysis)
t-SNE (t-Distributed Stochastic Neighbor Embedding)
UMAP (Uniform Manifold Approximation and Projection)
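The following sketch shows the core of PCA on a tiny 2-D dataset: centre the data, form the covariance matrix, find its leading eigenvector, and project onto it so that two features become one. The eigenvector is estimated here by power iteration purely to keep the example dependency-free; that choice, the sample data, and the iteration count are assumptions, and library implementations typically use a singular value decomposition instead.

```python
import math

def first_principal_component(data, iterations=100):
    """Estimate the leading principal direction of 2-D data by power iteration."""
    n = len(data)
    mx = sum(x for x, _ in data) / n
    my = sum(y for _, y in data) / n
    centered = [(x - mx, y - my) for x, y in data]
    # 2x2 sample covariance matrix [[sxx, sxy], [sxy, syy]].
    sxx = sum(x * x for x, _ in centered) / (n - 1)
    syy = sum(y * y for _, y in centered) / (n - 1)
    sxy = sum(x * y for x, y in centered) / (n - 1)
    v = (1.0, 0.0)  # arbitrary starting direction
    for _ in range(iterations):
        # Multiply by the covariance matrix, then renormalise to unit length.
        w = (sxx * v[0] + sxy * v[1], sxy * v[0] + syy * v[1])
        norm = math.hypot(w[0], w[1])
        v = (w[0] / norm, w[1] / norm)
    return v, centered

# Hypothetical data lying close to the line y = x.
data = [(1.0, 1.1), (2.0, 1.9), (3.0, 3.2), (4.0, 3.8), (5.0, 5.0)]
direction, centered = first_principal_component(data)

# Project each centred point onto the direction: 2 features become 1.
reduced = [x * direction[0] + y * direction[1] for x, y in centered]
print(direction)
print(reduced)
```

Because the points lie almost on a straight line, the single projected coordinate retains nearly all of the variation in the original two features.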
4. Anomaly Detection:
o Identifies data points that deviate significantly from the norm.