UNIT – 3: INTRODUCTION TO STATISTICAL LEARNING THEORY
FEATURE EXTRACTION
1. Principal Component Analysis: Principal Component Analysis (PCA) is an unsupervised learning
algorithm used for dimensionality reduction in machine learning. It is a statistical procedure that
converts the observations of correlated features into a set of linearly uncorrelated features with
the help of an orthogonal transformation.
These new transformed features are called the Principal Components. PCA is a technique for drawing
out the strong patterns in a dataset by reducing the number of dimensions while retaining as much of
the variance as possible. The PCA algorithm is based on some mathematical concepts such as:
➢ Variance and Covariance
➢ Eigenvalues and Eigenvectors
Some common terms used in the PCA algorithm:
(i) Dimensionality: It is the number of features or variables present in the given dataset; more
simply, it is the number of columns in the dataset.
(ii) Correlation: It signifies how strongly two variables are related to each other, i.e., if one
changes, the other also changes. The correlation value ranges from -1 to +1. Here, -1 indicates that
the variables are inversely related (one increases as the other decreases), and +1 indicates that
they are directly related.
(iii) Orthogonal: It means that the variables are uncorrelated with each other, and hence the
correlation between each pair of variables is zero.
(iv) Eigenvectors: Given a square matrix M and a non-zero vector v, v is an eigenvector of M if Mv
is a scalar multiple of v, i.e., Mv = λv, where the scalar λ is the corresponding eigenvalue.
(v) Covariance Matrix: A matrix containing the covariances between each pair of variables is called
the covariance matrix; a short sketch illustrating (iv) and (v) follows.
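To make definitions (iv) and (v) concrete, here is a minimal NumPy sketch (the toy data values are
made up purely for illustration) that builds a covariance matrix and checks the eigenvector property
Mv = λv:

```python
import numpy as np

# Toy data: 3 features (rows) observed over 5 examples (columns).
X = np.array([[2.5, 0.5, 2.2, 1.9, 3.1],
              [2.4, 0.7, 2.9, 2.2, 3.0],
              [1.0, 0.3, 1.2, 0.9, 1.4]])

# Covariance matrix: np.cov treats each row as one variable by default.
C = np.cov(X)                       # shape (3, 3), symmetric

# Eigendecomposition of the symmetric covariance matrix.
# eigh returns eigenvalues in ascending order with matching eigenvectors.
eigvals, eigvecs = np.linalg.eigh(C)

# Verify the defining property M v = lambda v for the largest eigenvalue.
v, lam = eigvecs[:, -1], eigvals[-1]
print(np.allclose(C @ v, lam * v))  # True
```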
Principal Components in PCA: The transformed new features, i.e., the output of PCA, are the
Principal Components. The number of these PCs is either equal to or less than the number of original
features in the dataset. Some properties of these principal components are given below (a short
sketch follows this list):
➢ Each principal component must be a linear combination of the original features.
➢ These components are orthogonal, i.e., the correlation between any pair of them is zero.
➢ The importance of each component decreases in going from 1 to n: the 1st PC captures the most
variance and is the most important, while the nth PC captures the least.
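These properties can be observed directly with scikit-learn's PCA; a minimal sketch (the random data
is only for illustration):

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))       # 100 examples, 4 features
X[:, 1] += 0.8 * X[:, 0]            # introduce correlation between two features

pca = PCA().fit(X)

# Importance decreases from PC 1 to PC n.
print(pca.explained_variance_ratio_)          # sorted in decreasing order

# Components are orthogonal: their pairwise dot products are ~0.
print(np.round(pca.components_ @ pca.components_.T, 6))   # ~identity matrix
```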
Steps for the PCA algorithm
(i) Getting the dataset: Firstly, we need to take the input dataset and divide it into two subparts X
and Y, where X is the training set, and Y is the validation set.
(ii) Representing data in a structure: Next, we represent the dataset in a structure: a
two-dimensional matrix of the independent variables X, where each row corresponds to a data item
and each column corresponds to a feature. The number of columns is the dimensionality of the
dataset.
(iii) Standardizing the data: In this step, we standardize the dataset. Within a given column,
features with high variance would otherwise dominate features with lower variance. If the
importance of a feature should be independent of its variance, we subtract the mean of each column
and divide each entry by the standard deviation of that column. The resulting matrix is named Z.
(iv) Calculating the covariance of Z: To calculate the covariance of Z, we take the matrix Z and
transpose it, then multiply the transpose by Z. Because Z is mean-centered, the product ZᵀZ,
divided by N−1 (where N is the number of observations), is the covariance matrix of Z.
(v) Calculating the eigenvalues and eigenvectors: Now we calculate the eigenvalues and eigenvectors
of the resultant covariance matrix. The eigenvectors of the covariance matrix are the directions of
the axes carrying the most information (variance), and the corresponding eigenvalues measure the
amount of variance along each of those directions.
(vi) Sorting the eigenvectors: In this step, we take all the eigenvalues and sort them in
decreasing order, i.e., from largest to smallest, and simultaneously sort the eigenvectors
accordingly into the columns of a matrix P. The resultant sorted matrix is named P*.
(vii) Calculating the new features, or principal components: Here we calculate the new features by
multiplying Z by the P* matrix, i.e., Z* = Z·P*. In the resultant matrix Z*, each observation is a
linear combination of the original features, and the columns of Z* are independent of each other.
(viii) Removing less important features from the new dataset: With the new feature set in hand, we
decide what to keep and what to remove: we keep only the relevant or important features (the
components with the largest eigenvalues) in the new dataset, and remove the unimportant ones. A
code sketch of steps (iii)–(viii) follows below.
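The following is a minimal NumPy sketch of steps (iii)–(viii), under the assumption that the data
matrix has one row per observation and one column per feature (as in step (ii)); the variable names
Z, P_star, Z_star, and k mirror the notation above:

```python
import numpy as np

def pca(X, k):
    """Project X (N observations x n features) onto its top-k principal components."""
    # Step (iii): standardize -- subtract each column's mean, divide by its std.
    # ddof=1 (sample standard deviation) is one common convention, assumed here.
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

    # Step (iv): covariance matrix of Z (n x n).
    N = Z.shape[0]
    C = (Z.T @ Z) / (N - 1)

    # Step (v): eigenvalues and eigenvectors of the covariance matrix.
    eigvals, eigvecs = np.linalg.eigh(C)   # eigh: ascending order for symmetric C

    # Step (vi): sort eigenvectors by decreasing eigenvalue into P*.
    order = np.argsort(eigvals)[::-1]
    P_star = eigvecs[:, order]

    # Step (vii): project onto the components; step (viii): keep the top k.
    Z_star = Z @ P_star
    return Z_star[:, :k]

# Usage on toy data: reduce 5 features to 2 components.
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 5))
print(pca(X, 2).shape)    # (200, 2)
```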
Example:
Step 1: Data
➢ We consider a dataset having n features or variables, denoted by X1, X2, …, Xn
➢ Let there be N examples
➢ Let the values of the ith feature Xi be Xi1, Xi2, …, XiN
Features   Example 1   Example 2   …   Example N
X1         X11         X12         …   X1N
X2         X21         X22         …   X2N
…          …           …           …   …
Xi         Xi1         Xi2         …   XiN
…          …           …           …   …
Xn         Xn1         Xn2         …   XnN
Step 2: Compute the means of the variables
For each feature Xi, compute its mean over the N examples:
X̄i = (Xi1 + Xi2 + … + XiN) / N, for i = 1, 2, …, n
(A short code sketch of this step follows.)
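A one-line NumPy sketch of Step 2, assuming the data is stored with one row per feature and one
column per example, as in the table above (the values are made up for illustration):

```python
import numpy as np

# Rows are features X1..Xn, columns are examples 1..N (as in the table above).
X = np.array([[2.5, 0.5, 2.2, 1.9, 3.1],
              [2.4, 0.7, 2.9, 2.2, 3.0]])

means = X.mean(axis=1)   # mean of each feature across the N examples
print(means)             # [X̄1, X̄2]
```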