ISYE 6501 MIDTERM 1 LATEST UPDATES ACTUAL
QUESTIONS AND CORRECT ANSWERS ALREADY
GRADED A+ GUARANTEED SUCCESS
What do you want to find with an SVM model?
Find values of a0, a1, ..., am that classify the points correctly and give the
maximum gap (margin) between the two parallel lines.
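A minimal sketch of this idea, assuming made-up coefficients and illustrative function names (none of them come from the course material). The classifier value for a point is a0 + a1*x1 + ... + am*xm, and the gap between the two parallel lines (where the value is +1 and -1) is 2 / sqrt(a1^2 + ... + am^2).

```python
import math

def classifier_value(a0, a, x):
    """Return a0 + a1*x1 + ... + am*xm for one point x."""
    return a0 + sum(ai * xi for ai, xi in zip(a, x))

def margin_width(a):
    """Distance between the lines where the classifier equals +1 and -1."""
    return 2.0 / math.sqrt(sum(ai * ai for ai in a))

# Hypothetical coefficients, chosen only for illustration.
print(classifier_value(-3.0, [1.0, 1.0], [3.0, 2.0]))  # 2.0
print(margin_width([3.0, 4.0]))                        # 0.4
```

Smaller coefficients a1..am give a wider margin, which is why maximizing the margin amounts to minimizing the sum of squared coefficients.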
What if it's not possible to separate green and red points in a SVM model?
Use a soft classifier. In a soft classification context, we might add an extra
multiplier for each type of error, with a larger penalty the less we want to
accept misclassifying that type of point.
Soft Classifier
What should the classifier sum be for the green points in an SVM model?
For every green point j, the sum a1*x1j + ... + am*xmj + a0 should be greater than or equal to 1.
Does a SVM classifier need to be a straight line?
No, SVM can be generalized using kernel methods that allow for nonlinear
classifiers. Software has a kernel SVM function that you can use to solve for both
linear and nonlinear classifiers.
Can classification questions be answered as probabilities in SVM?
Yes.
What should the classifier sum be for the red points in an SVM model?
For every red point j, the sum a1*x1j + ... + am*xmj + a0 should be less than or equal to -1.
What single condition covers both green and red points?
For every point j, (a1*x1j + ... + am*xmj + a0) * yj should be greater than or
equal to 1, because yj is 1 for green and -1 for red.
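The condition can be checked point by point. This is a small sketch with made-up data; the function name and the sample points are assumptions for illustration only.

```python
def satisfies_margin(a0, a, points, labels):
    """True if (a0 + a1*x1 + ... + am*xm) * y >= 1 holds for every point.

    labels holds y = 1 for green points and y = -1 for red points.
    """
    for x, y in zip(points, labels):
        value = a0 + sum(ai * xi for ai, xi in zip(a, x))
        if value * y < 1:
            return False
    return True

# Hypothetical classifier and points.
a0, a = -3.0, [1.0, 1.0]
print(satisfies_margin(a0, a, [[3.0, 2.0], [1.0, 0.5]], [1, -1]))  # True
print(satisfies_margin(a0, a, [[2.0, 1.5]], [1]))                  # False
```

The second call fails because the green point gives a value of 0.5: it is on the correct side of the classifier but inside the margin.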
First principal component
PCA -- a linear combination of the original predictor variables which captures the
maximum variance in the data set. It determines the direction of highest variability
in the data. The larger the variability captured by the first component, the more
information that component captures. No other component can have variability
higher than the first principal component.
Equivalently, it minimizes the sum of squared distances between the data points and the line.
Second principal component
PCA -- also a linear combination of the original predictors, which captures as much
as possible of the remaining variance in the data set and is uncorrelated with Z¹.
In other words, the correlation between the first and second components is zero.
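Both properties can be seen in a small sketch. It assumes only two predictors, so the direction of the first component can be computed in closed form from the covariance terms. The function name and data are made up for illustration; real software handles any number of predictors.

```python
import math

def pca_2d(points):
    """Return (first direction, scores z1, scores z2) for 2-attribute data."""
    n = len(points)
    mean_x = sum(p[0] for p in points) / n
    mean_y = sum(p[1] for p in points) / n
    centered = [(p[0] - mean_x, p[1] - mean_y) for p in points]
    sxx = sum(cx * cx for cx, _ in centered) / (n - 1)
    syy = sum(cy * cy for _, cy in centered) / (n - 1)
    sxy = sum(cx * cy for cx, cy in centered) / (n - 1)
    # Angle of the direction with the highest variance.
    theta = 0.5 * math.atan2(2.0 * sxy, sxx - syy)
    d1 = (math.cos(theta), math.sin(theta))   # first component direction
    d2 = (-math.sin(theta), math.cos(theta))  # perpendicular direction
    z1 = [cx * d1[0] + cy * d1[1] for cx, cy in centered]
    z2 = [cx * d2[0] + cy * d2[1] for cx, cy in centered]
    return d1, z1, z2

data = [(1.0, 1.0), (2.0, 2.1), (3.0, 2.9), (4.0, 4.2), (5.0, 4.8)]
direction, z1, z2 = pca_2d(data)
print(direction)
print(sum(a * b for a, b in zip(z1, z2)))  # approximately 0: uncorrelated
```

The scores z1 have a larger spread than z2, and their cross-product sum is zero up to rounding, matching the two definitions above.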
A soft classifier accounts for errors in SVM classification by trading off
minimizing the errors we make against maximizing the margin.
To trade off between them, we pick a value lambda and minimize a combination of
error and margin, with lambda multiplying the margin term. As lambda gets large,
the margin term gets large, so the importance of a large margin outweighs avoiding
mistakes in classifying known data points. As lambda drops toward 0, avoiding
mistakes matters more than a large margin.
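One way to write that combination, as a sketch: the error for each point is max(0, 1 - (a0 + a1*x1 + ... + am*xm) * y), and the margin term is lambda times the sum of squared coefficients. This specific form is an assumption about what "combination of error and margin" means here, and the function name and numbers are made up.

```python
def soft_margin_objective(a0, a, points, labels, lam):
    """Total error plus lam * (a1^2 + ... + am^2); smaller is better."""
    error = 0.0
    for x, y in zip(points, labels):
        value = a0 + sum(ai * xi for ai, xi in zip(a, x))
        error += max(0.0, 1.0 - value * y)  # zero when the point clears the margin
    margin_term = lam * sum(ai * ai for ai in a)
    return error + margin_term

points = [[3.0, 2.0], [2.0, 1.5]]
labels = [1, 1]
print(soft_margin_objective(-3.0, [1.0, 1.0], points, labels, lam=0.5))  # 1.5
```

Here the first point contributes no error, the second contributes 0.5, and the margin term adds 0.5 * 2 = 1.0.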
Should you scale your data in a SVM model?
Yes, so the orders of magnitude of all attributes are approximately the same.
Data must be in a bounded range.
Common scaling: data between 0 and 1
a. Scale factor by factor
b. Linearly
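A minimal sketch of linear scaling to the range 0 to 1, done factor by factor (column by column). The function name and the sample rows are illustrative assumptions.

```python
def min_max_scale(rows):
    """Scale each column linearly so its minimum maps to 0 and its maximum to 1."""
    columns = list(zip(*rows))
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]
    scaled = []
    for row in rows:
        scaled.append([
            (v - lo) / (hi - lo) if hi > lo else 0.0  # constant column maps to 0
            for v, lo, hi in zip(row, lows, highs)
        ])
    return scaled

rows = [[1, 100], [2, 300], [3, 200]]
print(min_max_scale(rows))  # [[0.0, 0.0], [0.5, 1.0], [1.0, 0.5]]
```

After scaling, both attributes span the same range even though the second one was originally two orders of magnitude larger.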
How can you tell which attributes matter from the coefficients of an SVM model?
If a coefficient's value is very close to 0, the corresponding
attribute is probably not relevant for classification.
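As a sketch, flagging such attributes is a simple filter. The tolerance, attribute names, and coefficient values below are assumptions for illustration, and the comparison is only meaningful if the data was scaled first.

```python
def near_zero_attributes(names, coefficients, tolerance=0.01):
    """Return names of attributes whose coefficient is within tolerance of 0."""
    return [name for name, a in zip(names, coefficients) if abs(a) < tolerance]

# Hypothetical attribute names and fitted coefficients.
print(near_zero_attributes(["income", "age", "zip"], [0.8, -0.002, 0.3]))  # ['age']
```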
Does SVM work the same for multiple dimensions?
Yes
K Nearest Neighbor Algorithm
To find the class of a new point, pick the k closest points to the new one; the new
point's class is the most common class among those k neighbors.
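A minimal sketch of the algorithm, assuming straight-line distance and made-up points. The function name is illustrative.

```python
import math
from collections import Counter

def knn_classify(train_points, train_labels, new_point, k):
    """Return the most common label among the k training points nearest new_point."""
    distances = sorted(
        (math.dist(p, new_point), label)
        for p, label in zip(train_points, train_labels)
    )
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]

points = [(0, 0), (0, 1), (1, 0), (5, 5), (6, 5)]
labels = ["red", "red", "red", "green", "green"]
print(knn_classify(points, labels, (0.5, 0.5), k=3))  # red
print(knn_classify(points, labels, (5.5, 5.0), k=3))  # green
```

In the second call the three nearest neighbors are two green points and one red point, so green wins the vote.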
What should you do about varying level of importance across attributes with K
Nearest Neighbors?
Some attributes might be more important than others to the classification --- can
deal with this by weighting each dimension's distance differently.
Attributes that contribute little to the classification may be removed
altogether, which is the same as giving them a weight of 0.
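A sketch of a weighted distance, assuming the weights multiply each squared difference. The weights and the function name are made up for illustration.

```python
import math

def weighted_distance(x, y, weights):
    """Straight-line distance with a separate weight on each dimension."""
    return math.sqrt(sum(w * (xi - yi) ** 2 for w, xi, yi in zip(weights, x, y)))

print(weighted_distance([0, 0], [3, 4], [1, 1]))  # 5.0 (ordinary distance)
print(weighted_distance([0, 0], [3, 4], [1, 0]))  # 3.0 (second attribute removed)
```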
What is the difference between real and random effects in validation?
Real effects: same in all data sets
Random effects: different in all data sets
How should one generally split their data set?
Training (building models) / Validation (picking model) / Test (estimate
performance)
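A sketch of a random three-way split. The 70/15/15 proportions, the seed, and the function name are assumptions chosen for illustration, not values stated above.

```python
import random

def split_data(rows, train_frac=0.7, valid_frac=0.15, seed=0):
    """Shuffle rows, then cut them into training, validation, and test sets."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_train = round(len(shuffled) * train_frac)
    n_valid = round(len(shuffled) * valid_frac)
    train = shuffled[:n_train]
    valid = shuffled[n_train:n_train + n_valid]
    test = shuffled[n_train + n_valid:]  # whatever remains
    return train, valid, test

train, valid, test = split_data(range(100))
print(len(train), len(valid), len(test))  # 70 15 15
```

Build models on the training set, pick among them with the validation set, and report the chosen model's performance on the test set only.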
Rotating versus randomness when validating data?
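Rotation: makes sure each part of the data is spread equally across the sets, but can introduce bias if the data has a repeating pattern
Randomness: avoids that systematic bias, but a split can come out uneven by chance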
K-fold Cross-Validation
splits the data into k sections; each section is used once for validation while the
model is trained on the other k-1, and the k results are averaged, so you don't have
to worry about what is being left out. Gives a better estimate of model quality.
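A sketch of how the k sections can be formed and rotated, assuming points are assigned to sections in turn by index. The function name is illustrative.

```python
def k_fold_splits(n_points, k):
    """Yield (training indices, validation indices) once for each of k sections."""
    indices = list(range(n_points))
    folds = [indices[i::k] for i in range(k)]  # deal indices out in rotation
    for i in range(k):
        validation = folds[i]
        training = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield training, validation

for training, validation in k_fold_splits(6, 3):
    print(training, validation)
# [1, 4, 2, 5] [0, 3]
# [0, 3, 2, 5] [1, 4]
# [0, 3, 1, 4] [2, 5]
```

Every point is used for validation exactly once; averaging the k validation scores gives the quality estimate.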
Clustering
takes a set of data points and divides them into groups so that each group contains
points that are close to each other or similar.
Distance Norms
Given two points x and y with coordinates (x1, x2) and (y1, y2), the straight-line
distance between them is sqrt((x1 - y1)^2 + (x2 - y2)^2).
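The formula above is the p = 2 case of the general p-norm distance, (|x1 - y1|^p + ... + |xn - yn|^p)^(1/p). The text only states the straight-line case; the other values of p in this sketch are general background, and the function name is illustrative.

```python
def p_norm_distance(x, y, p=2):
    """Distance between x and y under the p-norm (p=2 is straight-line distance)."""
    return sum(abs(xi - yi) ** p for xi, yi in zip(x, y)) ** (1.0 / p)

print(p_norm_distance([0, 0], [3, 4]))       # 5.0 (straight-line)
print(p_norm_distance([0, 0], [3, 4], p=1))  # 7.0 (sum of absolute differences)
```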