ANSWERS(RATED A+)
Rows - ANSWERData points: each row of a data table is one data point
Columns - ANSWERAttributes (features/predictors) of each data point; the response/outcome column holds the 'answer' for each data point
Structured Data - ANSWERQuantitative, Categorical, Binary, Unrelated, Time Series
Unstructured Data - ANSWERText
Second principal component - ANSWERPCA -- also a linear combination of the original
predictors; it captures as much of the remaining variance in the data set as possible
and is uncorrelated with the first component, Z¹. In other words, the correlation
between the first and second components should be zero.
What if it's not possible to separate green and red points in a SVM model? -
ANSWERUse a soft classifier. In a soft classification context, we might add an
extra multiplier for each type of error, with a larger penalty the less we want to
accept misclassifying that type of point.
Support Vector Machine - ANSWERSupervised machine learning algorithm used for
both classification and regression challenges.
Mostly used in classification problems by plotting each data item as a point in
n-dimensional space (n is the number of features you have) with the value of each
feature being the value of a particular coordinate.
Then you classify by finding a hyperplane that differentiates the 2 classes very well.
Support vectors are simply the coordinates of individual observations; the support
vector machine is the frontier (hyperplane / line) that best segregates the two classes.
What do you want to find with a SVM model? - ANSWERFind values of a0, a1, ..., am
that classify the points correctly and have the maximum gap (margin) between the
two parallel lines.
What should the sum of the green points in a SVM model be? - ANSWERFor each green
point j, the sum a0 + a1*x1j + ... + am*xmj should be greater than or equal to 1
What should the sum of the red points in a SVM model be? - ANSWERFor each red
point j, the sum a0 + a1*x1j + ... + am*xmj should be less than or equal to -1
What should the total sum of green and red points be? - ANSWERFor every point j,
green or red, (a0 + a1*x1j + ... + am*xmj) * yj should be greater than or equal to 1,
because yj is 1 for green and -1 for red.
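The constraints in the three cards above can be checked numerically. The following is a minimal sketch, assuming a tiny made-up data set; the function names `margin_constraints_hold` and `margin_width` and all numbers are hypothetical and chosen only for illustration.

```python
import math

def margin_constraints_hold(a0, a, points, labels):
    """Return True if every point satisfies y_j * (a0 + sum_i a_i * x_ij) >= 1."""
    for x, y in zip(points, labels):
        score = a0 + sum(ai * xi for ai, xi in zip(a, x))
        if y * score < 1:
            return False
    return True

def margin_width(a):
    """Distance between the parallel lines a.x + a0 = 1 and a.x + a0 = -1."""
    return 2 / math.sqrt(sum(ai * ai for ai in a))

# Hypothetical toy data: green points are labelled +1, red points -1.
points = [(2.0, 2.0), (3.0, 3.0), (0.0, 0.0), (-1.0, 0.0)]
labels = [1, 1, -1, -1]

# Two candidate classifiers that both separate the data.
narrow_ok = margin_constraints_hold(-2.0, (1.0, 1.0), points, labels)
wide_ok = margin_constraints_hold(-1.0, (0.5, 0.5), points, labels)
narrow_width = margin_width((1.0, 1.0))
wide_width = margin_width((0.5, 0.5))
```

Both candidates satisfy the constraints, but the second has smaller coefficients and therefore a wider margin, which is the one an SVM would prefer between these two.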
First principal component - ANSWERPCA -- a linear combination of the original predictor
variables that captures the maximum variance in the data set. It determines the
direction of highest variability in the data. The larger the variability captured by the
first component, the more information that component captures. No other component
can have higher variability than the first principal component.
It also minimizes the sum of squared distances between the data points and the line.
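The properties of principal components can be illustrated in two dimensions, where the direction of maximum variance has a closed form. This is only a sketch under assumptions: the data set is made up, and the closed-form angle applies to the 2x2 case only; real analyses would normally use a library routine.

```python
import math

# Hypothetical 2-D data set with two correlated predictors.
data = [(2.5, 2.4), (0.5, 0.7), (2.2, 2.9), (1.9, 2.2), (3.1, 3.0),
        (2.3, 2.7), (2.0, 1.6), (1.0, 1.1), (1.5, 1.6), (1.1, 0.9)]
n = len(data)

# Center each predictor on its mean.
mean_x = sum(x for x, _ in data) / n
mean_y = sum(y for _, y in data) / n
centered = [(x - mean_x, y - mean_y) for x, y in data]

# Entries of the sample covariance matrix.
sxx = sum(x * x for x, _ in centered) / (n - 1)
syy = sum(y * y for _, y in centered) / (n - 1)
sxy = sum(x * y for x, y in centered) / (n - 1)

# Angle of the direction of maximum variance (2x2 case only).
theta = 0.5 * math.atan2(2 * sxy, sxx - syy)
pc1 = (math.cos(theta), math.sin(theta))    # first principal direction
pc2 = (-math.sin(theta), math.cos(theta))   # perpendicular second direction

# Component scores are linear combinations of the centered predictors.
z1 = [x * pc1[0] + y * pc1[1] for x, y in centered]
z2 = [x * pc2[0] + y * pc2[1] for x, y in centered]

var_z1 = sum(v * v for v in z1) / (n - 1)
var_z2 = sum(v * v for v in z2) / (n - 1)
cov_z1_z2 = sum(u * v for u, v in zip(z1, z2)) / (n - 1)
```

Here `var_z1` is at least as large as `var_z2`, the two variances add up to the total variance `sxx + syy`, and `cov_z1_z2` is zero up to rounding, which is the uncorrelatedness property of the components.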
Soft Classifier - ANSWERAccounts for errors in SVM classification by trading off
minimizing the errors we make against maximizing the margin.
To trade off between them, we pick a lambda value and minimize a combination of
error and margin. As lambda gets large, the margin term gets large, so the
importance of a large margin outweighs avoiding mistakes in classifying known data
points. As lambda drops toward 0, avoiding mistakes outweighs having a large margin.
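One common way to write the combination of error and margin is the total hinge error plus lambda times the sum of squared coefficients. The sketch below assumes that form; the function name `soft_margin_objective` and the toy data are hypothetical.

```python
def soft_margin_objective(a0, a, points, labels, lam):
    """Total hinge error plus lam times the sum of squared coefficients."""
    error = 0.0
    for x, y in zip(points, labels):
        score = a0 + sum(ai * xi for ai, xi in zip(a, x))
        # Zero when the point is on its own side and outside the margin.
        error += max(0.0, 1 - y * score)
    margin_term = lam * sum(ai * ai for ai in a)
    return error + margin_term

# Hypothetical toy data: the third point sits on the boundary, so it adds error.
points = [(2.0, 2.0), (0.0, 0.0), (1.0, 1.0)]
labels = [1, -1, -1]
a0, a = -1.0, (0.5, 0.5)

low_lambda = soft_margin_objective(a0, a, points, labels, lam=0.1)
high_lambda = soft_margin_objective(a0, a, points, labels, lam=2.0)
```

With the small lambda the objective is dominated by the error of the one badly placed point; with the large lambda the margin term contributes as much as the error, so the minimizer would be pushed toward smaller coefficients (a wider margin).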
Should you scale your data in a SVM model? - ANSWERYes, so the orders of
magnitude are approximately the same.
Data must be in a bounded range.
Common scaling: data between 0 and 1
a. Scale factor by factor
b. Linearly
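Linear scaling to the range 0 to 1, applied factor by factor, can be sketched as follows. The helper name `min_max_scale` and the example columns are hypothetical.

```python
def min_max_scale(column):
    """Linearly rescale one factor (column) so its values lie between 0 and 1."""
    lo, hi = min(column), max(column)
    if hi == lo:
        # A constant column carries no information; map it to 0 to avoid dividing by zero.
        return [0.0 for _ in column]
    return [(v - lo) / (hi - lo) for v in column]

# Hypothetical factors on very different orders of magnitude.
incomes = [20000, 50000, 80000]
ages = [20, 30, 60]

scaled_incomes = min_max_scale(incomes)  # [0.0, 0.5, 1.0]
scaled_ages = min_max_scale(ages)        # [0.0, 0.25, 1.0]
```

After scaling, both factors lie in the same range, so neither dominates the classifier simply because of its units.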
How should you find which coefficients hold value in a SVM model? - ANSWERIf
there is a coefficient whose value is very close to 0, the corresponding
attribute is probably not relevant for classification.
Does SVM work the same for multiple dimensions? - ANSWERYes
Does a SVM classifier need to be a straight line? - ANSWERNo, SVM can be
generalized using kernel methods that allow for nonlinear classifiers. Software has a
kernel SVM function that you can use to solve for both linear and nonlinear
classifiers.
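The idea behind nonlinear classifiers can be illustrated with an explicit feature map. This is not a kernel SVM implementation, only a hand-built sketch: the feature map, the coefficients, and the one-dimensional data are all assumptions chosen so that a classifier that is linear in the mapped space is nonlinear in the original one.

```python
def feature_map(x):
    """Hypothetical explicit feature map: add the squared value as a second feature."""
    return (x, x * x)

def classify(x, a0=-2.5, a=(0.0, 1.0)):
    """Linear classifier in the mapped space: sign of a0 + a1*x + a2*x^2."""
    z = feature_map(x)
    score = a0 + sum(ai * zi for ai, zi in zip(a, z))
    return 1 if score >= 0 else -1

# Hypothetical 1-D data: the outer points are green (+1), the inner points red (-1).
# No single threshold on x separates them, but a threshold on x^2 does.
xs = [-2.0, -1.0, 0.0, 1.0, 2.0]
labels = [1, -1, -1, -1, 1]
predictions = [classify(x) for x in xs]
```

Kernel methods achieve the same effect without building the mapped features explicitly, which is why software can offer one kernel SVM function for both linear and nonlinear classifiers.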
Can classification questions be answered as probabilities in SVM? - ANSWERYes.
K Nearest Neighbor Algorithm - ANSWERTo find the class of a new point, pick the k
closest points to the new one; the new point's class is the most common class among
those k neighbors.
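The procedure above can be sketched in a few lines. This version assumes straight-line (Euclidean) distance and breaks ties by whichever class was encountered first among the nearest neighbors; the function name `knn_classify` and the data are hypothetical.

```python
from collections import Counter
import math

def knn_classify(new_point, points, labels, k):
    """Return the most common label among the k points closest to new_point."""
    by_distance = sorted(zip(points, labels),
                         key=lambda pair: math.dist(new_point, pair[0]))
    nearest_labels = [label for _, label in by_distance[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]

# Hypothetical training data: two well-separated clusters.
points = [(1, 1), (1, 2), (2, 1), (6, 6), (7, 6), (6, 7)]
labels = ["green", "green", "green", "red", "red", "red"]

near_green = knn_classify((2, 2), points, labels, k=3)
near_red = knn_classify((6, 5), points, labels, k=3)
```

Choosing an odd k is a simple way to avoid ties when there are only two classes.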
What should you do about varying levels of importance across attributes with K
Nearest Neighbors? - ANSWERSome attributes might be more important than others
to the classification --- can deal with this by weighting each dimension's distance
differently.
Unimportant attributes may be removed altogether, which is the same as giving them
a weight of zero.
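Weighting each dimension's distance differently can be sketched as a weighted Euclidean distance. The function name `weighted_distance` and the weights are hypothetical; a weight of 0 has the same effect as removing the attribute.

```python
import math

def weighted_distance(p, q, weights):
    """Euclidean distance in which each dimension's squared difference is scaled by a weight."""
    return math.sqrt(sum(w * (pi - qi) ** 2 for w, pi, qi in zip(weights, p, q)))

p, q = (0.0, 0.0), (3.0, 4.0)

equal_weights = weighted_distance(p, q, (1.0, 1.0))   # ordinary distance: 5.0
second_removed = weighted_distance(p, q, (1.0, 0.0))  # second attribute ignored: 3.0
```

Such a function could be used in place of the plain distance when ranking neighbors, so that the more important attributes have more influence on which points count as closest.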
What is the difference between real and random effects in validation? -
ANSWERReal effects: same in all data sets
Random effects: different in all data sets