Class notes

Machine learning

Pages: 227
Uploaded on: 16-10-2024
Written in: 2024/2025

This document contains lecture notes on machine learning.


Content preview

CS229 Lecture Notes

Andrew Ng and Tengyu Ma

June 11, 2023

Contents

I Supervised learning 5
1 Linear regression 8
1.1 LMS algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . 9
1.2 The normal equations . . . . . . . . . . . . . . . . . . . . . . . 13
1.2.1 Matrix derivatives . . . . . . . . . . . . . . . . . . . . . 13
1.2.2 Least squares revisited . . . . . . . . . . . . . . . . . . 14
1.3 Probabilistic interpretation . . . . . . . . . . . . . . . . . . . . 15
1.4 Locally weighted linear regression (optional reading) . . . . . . 17

2 Classification and logistic regression 20
2.1 Logistic regression . . . . . . . . . . . . . . . . . . . . . . . . 20
2.2 Digression: the perceptron learning algorithm . . . . . . . . . 23
2.3 Multi-class classification . . . . . . . . . . . . . . . . . . . . . 24
2.4 Another algorithm for maximizing ℓ(θ) . . . . . . . . . . . . . 27

3 Generalized linear models 29
3.1 The exponential family . . . . . . . . . . . . . . . . . . . . . . 29
3.2 Constructing GLMs . . . . . . . . . . . . . . . . . . . . . . . . 31
3.2.1 Ordinary least squares . . . . . . . . . . . . . . . . . . 32
3.2.2 Logistic regression . . . . . . . . . . . . . . . . . . . . 33

4 Generative learning algorithms 34
4.1 Gaussian discriminant analysis . . . . . . . . . . . . . . . . . . 35
4.1.1 The multivariate normal distribution . . . . . . . . . . 35
4.1.2 The Gaussian discriminant analysis model . . . . . . . 38
4.1.3 Discussion: GDA and logistic regression . . . . . . . . 40
4.2 Naive Bayes (optional reading) . . . . . . . . . . . . . . . . . 41
4.2.1 Laplace smoothing . . . . . . . . . . . . . . . . . . . . 44
4.2.2 Event models for text classification . . . . . . . . . . . 46




CS229 Spring 2023


5 Kernel methods 48
5.1 Feature maps . . . . . . . . . . . . . . . . . . . . . . . . . . . 48
5.2 LMS (least mean squares) with features . . . . . . . . . . . . . 49
5.3 LMS with the kernel trick . . . . . . . . . . . . . . . . . . . . 49
5.4 Properties of kernels . . . . . . . . . . . . . . . . . . . . . . . 53

6 Support vector machines 59
6.1 Margins: intuition . . . . . . . . . . . . . . . . . . . . . . . . . 59
6.2 Notation (optional reading) . . . . . . . . . . . . . . . . . . . 61
6.3 Functional and geometric margins (optional reading) . . . . . . 61
6.4 The optimal margin classifier (optional reading) . . . . . . . . 63
6.5 Lagrange duality (optional reading) . . . . . . . . . . . . . . . 65
6.6 Optimal margin classifiers: the dual form (optional reading) . . 68
6.7 Regularization and the non-separable case (optional reading) . 72
6.8 The SMO algorithm (optional reading) . . . . . . . . . . . . . 73
6.8.1 Coordinate ascent . . . . . . . . . . . . . . . . . . . . . 74
6.8.2 SMO . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75


II Deep learning 79
7 Deep learning 80
7.1 Supervised learning with non-linear models . . . . . . . . . . . 80
7.2 Neural networks . . . . . . . . . . . . . . . . . . . . . . . . . . 84
7.3 Modules in Modern Neural Networks . . . . . . . . . . . . . . 92
7.4 Backpropagation . . . . . . . . . . . . . . . . . . . . . . . . . 98
7.4.1 Preliminaries on partial derivatives . . . . . . . . . . . 99
7.4.2 General strategy of backpropagation . . . . . . . . . . 102
7.4.3 Backward functions for basic modules . . . . . . . . . . 105
7.4.4 Back-propagation for MLPs . . . . . . . . . . . . . . . 107
7.5 Vectorization over training examples . . . . . . . . . . . . . . 109


III Generalization and regularization 112
8 Generalization 113
8.1 Bias-variance tradeoff . . . . . . . . . . . . . . . . . . . . . . . 115
8.1.1 A mathematical decomposition (for regression) . . . . . 120
8.2 The double descent phenomenon . . . . . . . . . . . . . . . . . 121
8.3 Sample complexity bounds (optional readings) . . . . . . . . . 126

8.3.1 Preliminaries . . . . . . . . . . . . . . . . . . . . . . . 126
8.3.2 The case of finite H . . . . . . . . . . . . . . . . . . . . 128
8.3.3 The case of infinite H . . . . . . . . . . . . . . . . . . 131

9 Regularization and model selection 135
9.1 Regularization . . . . . . . . . . . . . . . . . . . . . . . . . . . 135
9.2 Implicit regularization effect (optional reading) . . . . . . . . . 137
9.3 Model selection via cross validation . . . . . . . . . . . . . . . 139
9.4 Bayesian statistics and regularization . . . . . . . . . . . . . . 142


IV Unsupervised learning 144
10 Clustering and the k-means algorithm 145

11 EM algorithms 148
11.1 EM for mixture of Gaussians . . . . . . . . . . . . . . . . . . . 148
11.2 Jensen’s inequality . . . . . . . . . . . . . . . . . . . . . . . . 151
11.3 General EM algorithms . . . . . . . . . . . . . . . . . . . . . . 152
11.3.1 Other interpretation of ELBO . . . . . . . . . . . . . . 158
11.4 Mixture of Gaussians revisited . . . . . . . . . . . . . . . . . . 158
11.5 Variational inference and variational auto-encoder (optional reading) . . . . . . . . . . . . . . 160

12 Principal components analysis 165

13 Independent components analysis 171
13.1 ICA ambiguities . . . . . . . . . . . . . . . . . . . . . . . . . . 172
13.2 Densities and linear transformations . . . . . . . . . . . . . . . 173
13.3 ICA algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . 174

14 Self-supervised learning and foundation models 177
14.1 Pretraining and adaptation . . . . . . . . . . . . . . . . . . . . 177
14.2 Pretraining methods in computer vision . . . . . . . . . . . . . 179
14.3 Pretrained large language models . . . . . . . . . . . . . . . . 181
14.3.1 Open up the blackbox of Transformers . . . . . . . . . 183
14.3.2 Zero-shot learning and in-context learning . . . . . . . 186
