DSCI 100 Study Guide | Questions and answers
How KNN regression differs from KNN classification:
- KNN regression uses the average of the nearest neighbours to predict a continuous value.
- KNN classification uses the majority vote of the nearest neighbours to predict a category/class.
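The two prediction rules above can be sketched side by side with a tiny pure-Python helper (a toy 1-D example; the function name and data are illustrative, not from the course materials):

```python
from collections import Counter

def knn_predict(train_x, train_y, x, k, mode):
    # Sort training points by distance to x and keep the K nearest (1-D toy case).
    neighbours = sorted(zip(train_x, train_y), key=lambda p: abs(p[0] - x))[:k]
    values = [y for _, y in neighbours]
    if mode == "regression":
        # Regression: average of the K nearest response values.
        return sum(values) / k
    # Classification: majority vote among the K nearest labels.
    return Counter(values).most_common(1)[0][0]

X = [1, 2, 3, 10, 11, 12]
y_reg = [100, 110, 120, 400, 410, 420]
y_cls = ["small", "small", "small", "big", "big", "big"]

print(knn_predict(X, y_reg, 2.5, k=3, mode="regression"))      # 110.0
print(knn_predict(X, y_cls, 2.5, k=3, mode="classification"))  # small
```

The only difference between the two modes is the final aggregation step: averaging versus voting.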
Interpret the output of a KNN regression.
The output should be a value that is the prediction of the response variable made by the regression model.
To find the K that gives the smallest RMSPE, you can use: best_params_ (e.g. sacr_fit.best_params_, where sacr_fit is the GridSearchCV object).
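A minimal sketch of this workflow, assuming scikit-learn is available (the synthetic data and the choice of a K range from 1 to 10 are illustrative; the object name sacr_fit matches the study guide's example):

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor
from sklearn.model_selection import GridSearchCV

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(60, 1))
y = 2 * X.ravel() + rng.normal(0, 1, size=60)

# Search over K; scoring="neg_root_mean_squared_error" means the
# "best" K is the one with the smallest cross-validated RMSPE.
param_grid = {"n_neighbors": list(range(1, 11))}
sacr_fit = GridSearchCV(KNeighborsRegressor(), param_grid,
                        scoring="neg_root_mean_squared_error", cv=5)
sacr_fit.fit(X, y)
print(sacr_fit.best_params_)  # e.g. {'n_neighbors': 4}
```

best_params_ is only available after fit() has run the cross-validated grid search.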
RMSPE is measured in the same units as: the response variable (e.g. when predicting house price, an RMSPE of $83,825 means that on new observations we expect the error in our prediction to be roughly $83,825).
In the context of KNN regression, compare and contrast goodness of fit and prediction properties (namely RMSE vs RMSPE).
Small RMSPE = good; large RMSPE = bad.
RMSE and RMSPE difference?
They have the exact same equation:
- RMSE is for evaluating prediction quality on the training data.
- RMSPE is for evaluating prediction quality on the testing or validation data.
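Since the two quantities share one formula, a single function can compute both; which name applies depends only on which data you pass in (the function name and toy values below are illustrative):

```python
import numpy as np

def root_mean_squared_error(y_true, y_pred):
    # One formula serves both RMSE (training data) and RMSPE
    # (testing/validation data): sqrt of the mean squared error.
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return np.sqrt(np.mean((y_true - y_pred) ** 2))

# Passing training predictions gives RMSE; passing test predictions gives RMSPE.
print(root_mean_squared_error([100, 200, 300], [110, 190, 310]))  # 10.0
```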
Describe the advantages of K-nearest neighbours regression (3)
1. It is a simple, intuitive algorithm,
2. requires few assumptions about what the data must look like, and
3. works well with non-linear relationships (i.e., if the relationship is not a straight line), since it just uses the nearest neighbours to predict values.
Describe the disadvantages of K-nearest neighbours regression (3)
1. It becomes very slow as the training data gets larger,
2. may not perform well with a large number of predictors, and
3. may not predict well beyond the range of values in your training data.
Simple linear regression chooses the straight line of best fit by choosing the line that minimizes the: average squared vertical distance between itself and each of the observed data points in the training data.
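The minimizing line has a closed-form least-squares solution, sketched below with NumPy on a toy data set (the data are illustrative; the formulas are the standard least-squares slope and intercept):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.0, 4.0, 6.0, 8.0])  # lies exactly on y = 2x

# Closed-form least-squares estimates: the slope and intercept of the
# line minimizing the average squared vertical distance to the points.
slope = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
intercept = y.mean() - slope * x.mean()
print(slope, intercept)  # 2.0 0.0
```

Because the toy points fall exactly on a line, the fitted line recovers it with zero error; with noisy data the same formulas give the best-fitting compromise line.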