Class notes

Machine Learning and Data mining

Rating

Sold

Pages

134

Uploaded on

19-03-2023

Written in

2022/2023

It is the well written notes for Machine learning and Data mining

Institution

Course

Content preview

Machine Learning and Data Mining
Lecture Notes
CSC 411/D11
Computer Science Department
University of Toronto
Version: February 6, 2012

Copyright c 2010 Aaron Hertzmann and David Fleet

,CSC 411 / CSC D11 CONTENTS

Contents
Conventions and Notation iv

1 Introduction to Machine Learning 1
1.1 Types of Machine Learning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
1.2 A simple problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2

2 Linear Regression 5
2.1 The 1D case . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
2.2 Multidimensional inputs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
2.3 Multidimensional outputs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8

3 Nonlinear Regression 9
3.1 Basis function regression . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
3.2 Overfitting and Regularization . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
3.3 Artificial Neural Networks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
3.4 K-Nearest Neighbors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15

4 Quadratics 17
4.1 Optimizing a quadratic . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18

5 Basic Probability Theory 21
5.1 Classical logic . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
5.2 Basic definitions and rules . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
5.3 Discrete random variables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
5.4 Binomial and Multinomial distributions . . . . . . . . . . . . . . . . . . . . . . . 25
5.5 Mathematical expectation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26

6 Probability Density Functions (PDFs) 27
6.1 Mathematical expectation, mean, and variance . . . . . . . . . . . . . . . . . . . . 28
6.2 Uniform distributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
6.3 Gaussian distributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
6.3.1 Diagonalization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
6.3.2 Conditional Gaussian distribution . . . . . . . . . . . . . . . . . . . . . . 33

7 Estimation 35
7.1 Learning a binomial distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
7.2 Bayes’ Rule . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
7.3 Parameter estimation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
7.3.1 MAP, ML, and Bayes’ Estimates . . . . . . . . . . . . . . . . . . . . . . . 38
7.4 Learning Gaussians . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39

Copyright c 2011 Aaron Hertzmann and David Fleet i

,CSC 411 / CSC D11 CONTENTS

7.5 MAP nonlinear regression . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40

8 Classification 42
8.1 Class Conditionals . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42
8.2 Logistic Regression . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44
8.3 Artificial Neural Networks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46
8.4 K-Nearest Neighbors Classification . . . . . . . . . . . . . . . . . . . . . . . . . 46
8.5 Generative vs. Discriminative models . . . . . . . . . . . . . . . . . . . . . . . . 47
8.6 Classification by LS Regression . . . . . . . . . . . . . . . . . . . . . . . . . . . 48
8.7 Naı̈ve Bayes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49
8.7.1 Discrete Input Features . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49
8.7.2 Learning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51

9 Gradient Descent 53
9.1 Finite differences . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55

10 Cross Validation 56
10.1 Cross-Validation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56

11 Bayesian Methods 59
11.1 Bayesian Regression . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60
11.2 Hyperparameters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63
11.3 Bayesian Model Selection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63

12 Monte Carlo Methods 69
12.1 Sampling Gaussians . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70
12.2 Importance Sampling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70
12.3 Markov Chain Monte Carlo (MCMC) . . . . . . . . . . . . . . . . . . . . . . . . 73

13 Principal Components Analysis 75
13.1 The model and learning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75
13.2 Reconstruction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 76
13.3 Properties of PCA . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77
13.4 Whitening . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 78
13.5 Modeling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79
13.6 Probabilistic PCA . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79

14 Lagrange Multipliers 83
14.1 Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 84
14.2 Least-Squares PCA in one-dimension . . . . . . . . . . . . . . . . . . . . . . . . 87
14.3 Multiple constraints . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 90
14.4 Inequality constraints . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 90

Copyright c 2011 Aaron Hertzmann and David Fleet ii

, CSC 411 / CSC D11 CONTENTS

15 Clustering 92
15.1 K-means Clustering . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 92
15.2 K-medoids Clustering . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 94
15.3 Mixtures of Gaussians . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 95
15.3.1 Learning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 96
15.3.2 Numerical issues . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 97
15.3.3 The Free Energy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 98
15.3.4 Proofs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 99
15.3.5 Relation to K-means . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101
15.3.6 Degeneracy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101
15.4 Determining the number of clusters . . . . . . . . . . . . . . . . . . . . . . . . . 101

16 Hidden Markov Models 103
16.1 Markov Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 103
16.2 Hidden Markov Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 104
16.3 Viterbi Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 106
16.4 The Forward-Backward Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . 107
16.5 EM: The Baum-Welch Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . 110
16.5.1 Numerical issues: renormalization . . . . . . . . . . . . . . . . . . . . . . 110
16.5.2 Free Energy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 112
16.6 Most likely state sequences . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 114

17 Support Vector Machines 115
17.1 Maximizing the margin . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 115
17.2 Slack Variables for Non-Separable Datasets . . . . . . . . . . . . . . . . . . . . . 117
17.3 Loss Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 118
17.4 The Lagrangian and the Kernel Trick . . . . . . . . . . . . . . . . . . . . . . . . . 120
17.5 Choosing parameters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 121
17.6 Software . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 122

18 AdaBoost 123
18.1 Decision stumps . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 126
18.2 Why does it work? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 126
18.3 Early stopping . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 128

Copyright c 2011 Aaron Hertzmann and David Fleet iii

Report Copyright Violation

Written for

Course: Machine learning and Data mining

All documents for this subject (3)

Document information

Uploaded on: March 19, 2023
Number of pages: 134
Written in: 2022/2023
Type: Class notes
Professor(s): Dr. k. gunasekharan
Contains: All classes

Subjects

machine learning
data mining
artificial intelligence
ai
ml
dm
machine learning and data mining
aiampml
mlampdm

$10.99

Get access to the full document:

Written by students who passed

Immediately available after payment

Read online or as PDF

Get to know the seller

kalyan

Get to know the seller

kalyan

View profile

Sold

Member since

3 year

Number of followers

Documents

Last sold

0.0

0 reviews

Why students choose Stuvia

Created by fellow students, verified by reviews

Quality you can trust: written by students who passed their tests and reviewed by others who've used these notes.

Didn't get what you expected? Choose another document

No worries! You can instantly pick a different document that better fits what you're looking for.

Pay as you like, start learning right away

No subscription, no commitments. Pay the way you're used to via credit card and download your PDF document instantly.

“Bought, downloaded, and aced it. It really can be that simple.”

Alisha Student

Frequently asked questions

What do I get when I buy this document?

You get a PDF, available immediately after your purchase. The purchased document is accessible anytime, anywhere and indefinitely through your profile.

Satisfaction guarantee: how does it work?

Our satisfaction guarantee ensures that you always find a study document that suits you well. You fill out a form, and our customer service team takes care of the rest.

Who am I buying these notes from?

Stuvia is a marketplace, so you are not buying this document from us, but from seller kalyan. Stuvia facilitates payment to the seller.

Will I be stuck with a subscription?

No, you only buy these notes for $10.99. You're not tied to anything after your purchase.

Can Stuvia be trusted?

4.6 stars on Google & Trustpilot (+1000 reviews) 47055 documents were sold in the last 30 days Founded in 2010, the go-to place to buy study notes for 16 years now

Machine Learning and Data mining

Content preview

Written for

Document information

Subjects

Get to know the seller

Recently viewed by you

Why students choose Stuvia

Created by fellow students, verified by reviews

Didn't get what you expected? Choose another document

Pay as you like, start learning right away

Working on your references?

Frequently asked questions

What do I get when I buy this document?

Satisfaction guarantee: how does it work?

Who am I buying these notes from?

Will I be stuck with a subscription?

Can Stuvia be trusted?