QUESTIONS AND VERIFIED CORRECT ANSWERS [LATEST 2026-2027]
Scaling: scaling data between 0 and 1 - CORRECT ANSWER-To scale a variable linearly to the
range 0 to 1, find its min and max, subtract the min from each data point, and divide the
result by the range (max - min). The smallest value maps to 0 and the largest maps to 1.
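The scaling rule above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function name `min_max_scale`, the sample scores, and the choice to return zeros when all values are identical are assumptions made for the example.

```python
def min_max_scale(values):
    """Linearly rescale a list of numbers to the range [0, 1]."""
    lo, hi = min(values), max(values)
    span = hi - lo
    if span == 0:
        # Assumed convention: if every value is the same, the range is zero,
        # so return zeros instead of dividing by zero.
        return [0.0 for _ in values]
    # Subtract the min from each data point, then divide by the range.
    return [(v - lo) / span for v in values]


scores = [400, 800, 1000, 1600]  # made-up SAT-style scores
scaled = min_max_scale(scores)
print(scaled)
```

The smallest score becomes 0.0, the largest becomes 1.0, and 1000 lands at 0.5 because it sits halfway between 400 and 1600.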
Standardizing: standardizing to a normal distribution - CORRECT ANSWER-We first find the
mean and standard deviation of the variable. Then we subtract the mean from each data point
and divide by the standard deviation. The result has mean 0 and standard deviation 1. We would
do this if we want to understand how far from the mean each data point is, measured in
standard deviations.
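A small sketch of standardizing, under stated assumptions: the function name `standardize` and the sample data are made up, and the population standard deviation (`statistics.pstdev`) is used here, although the sample standard deviation (`statistics.stdev`) is an equally common choice.

```python
import statistics


def standardize(values):
    """Return each value's distance from the mean in standard deviations."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)  # assumption: population standard deviation
    if sd == 0:
        # Assumed convention: no spread at all, so every point is 0 SDs away.
        return [0.0 for _ in values]
    return [(v - mean) / sd for v in values]


data = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up data with mean 5 and SD 2
z = standardize(data)
print(z)
```

Here the value 9 gets a standardized score of 2.0, meaning it is two standard deviations above the mean. Unlike scaling, nothing forces the results to stay inside a fixed range.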
When to use scaling and standardization - CORRECT ANSWER-Generally, you want to use scaling
if your data has a bounded range (e.g. SAT scores), since standardization won't guarantee
that a data point stays within a fixed range.
k-Nearest Neighbor (KNN) - CORRECT ANSWER-KNN is a classifier that assigns a class to a data
point based on the k data points nearest to it. To find the class of a new point, you pick the
k closest points (neighbors) to the new one. The new point's class is the most common class
among the k neighbors.
The calculation for KNN is straightforward. The main choices are how distance is calculated
(typically straight-line, i.e. Euclidean) and what value of k to use.
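The KNN procedure described above can be sketched as follows. This is an illustrative version only: the function name `knn_classify`, the toy points, and the labels are assumptions. Ties in the vote are broken by whichever tied class appears first among the nearest neighbors.

```python
import math
from collections import Counter


def knn_classify(train_points, train_labels, new_point, k=3):
    """Classify new_point by majority vote among its k nearest neighbors."""
    # Straight-line (Euclidean) distance from the new point to each training point.
    distances = [
        (math.dist(p, new_point), label)
        for p, label in zip(train_points, train_labels)
    ]
    distances.sort(key=lambda pair: pair[0])
    nearest_labels = [label for _, label in distances[:k]]
    # The most common class among the k neighbors wins.
    return Counter(nearest_labels).most_common(1)[0][0]


# Made-up training data: two well-separated groups.
points = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.0), (6.0, 6.0), (7.0, 7.5), (6.5, 5.5)]
labels = ["red", "red", "red", "blue", "blue", "blue"]

print(knn_classify(points, labels, (1.8, 1.4), k=3))  # red
print(knn_classify(points, labels, (6.2, 6.4), k=3))  # blue
```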
KNN vs SVM - CORRECT ANSWER-KNN is better when there are more than 2 classes present.
However, SVM is faster at classifying.
Is scaling important in KNN? Why or why not? - CORRECT ANSWER-Scaling is very important in
KNN since KNN is a distance-based algorithm. Without scaling, a feature with a larger numeric
range would have a much larger impact than the others on which data points count as closest.
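To see why this matters, here is a small sketch with made-up (income, age) rows. The helper names `min_max_params`, `scale_row`, and `nearest_index` are hypothetical, and the sketch assumes every feature has at least two distinct values so the range is never zero.

```python
import math


def min_max_params(rows):
    """Return (min, max) for each feature column."""
    return [(min(col), max(col)) for col in zip(*rows)]


def scale_row(row, params):
    """Scale one row using precomputed per-feature (min, max) pairs."""
    # Assumes max > min for every feature.
    return tuple((v - lo) / (hi - lo) for v, (lo, hi) in zip(row, params))


def nearest_index(rows, query):
    """Index of the row closest to query by straight-line distance."""
    return min(range(len(rows)), key=lambda i: math.dist(rows[i], query))


# Made-up data: (income in dollars, age in years).
people = [(30000, 25), (31000, 60), (90000, 40)]
query = (30600, 26)

raw_nearest = nearest_index(people, query)

params = min_max_params(people)
scaled_people = [scale_row(row, params) for row in people]
scaled_nearest = nearest_index(scaled_people, scale_row(query, params))

print(raw_nearest, scaled_nearest)  # 1 0
```

On the raw numbers, the 26-year-old's nearest neighbor is the 60-year-old, because a 400-dollar income gap outweighs a 34-year age gap. After scaling both features to 0 to 1, the nearest neighbor is the 25-year-old.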
Is it a good idea for you to use predictions from your training set for model validation? -
CORRECT ANSWER-No. Predictions made from a training data set are often too optimistic since
it is likely that your model is picking up random effects present in the training data. Training
data is just used for training the model and should not be used to measure how well the model
performs.
What are the two types of patterns that exist in data? - CORRECT ANSWER-Real effects: real
relationships between attributes and the response variable
Random effects: random patterns that look like real effects
Why does fitting a model on different data sets remove the impacts of random effects? -
CORRECT ANSWER-Real effects are the same in all data sets. If there is truly a relationship
between two variables, then that effect will always be present even if you change data sets.
Random effects are different in all data sets. When you change your data set, any random
effects your model picked up in training won't help it when it sees new data with different
random effects.
Are model scores derived from validation data sets typically higher or lower than ones derived
from training data sets? Why or why not? - CORRECT ANSWER-They are almost always lower than
scores derived from training sets. The predictions made on a training set reflect both real
effects and random effects from that data. When that same model is run on a new
validation set, only the model's ability to pick up real effects should remain.
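A tiny hand-built illustration of this gap, using a 1-nearest-neighbor model that memorizes its training data. Everything here is made up for the example: the real pattern is assumed to be "label 1 when x > 5", two training labels are deliberately wrong to stand in for random effects, and the validation labels are assumed to follow the real pattern.

```python
def predict_1nn(train_x, train_y, x):
    """Predict the label of the single closest training point."""
    nearest = min(range(len(train_x)), key=lambda i: abs(train_x[i] - x))
    return train_y[nearest]


def accuracy(train_x, train_y, xs, ys):
    """Fraction of (x, y) pairs the 1-NN model predicts correctly."""
    hits = sum(predict_1nn(train_x, train_y, x) == y for x, y in zip(xs, ys))
    return hits / len(xs)


# Assumed real effect: label is 1 when x > 5.
train_x = [1, 2, 3, 4, 6, 7, 8, 9]
train_y = [0, 0, 1, 0, 1, 1, 0, 1]  # x=3 and x=8 are mislabeled (random effects)

val_x = [1.2, 2.9, 3.1, 4.2, 5.8, 7.1, 7.9, 8.8]
val_y = [0, 0, 0, 0, 1, 1, 1, 1]  # follows the real pattern

train_score = accuracy(train_x, train_y, train_x, train_y)
val_score = accuracy(train_x, train_y, val_x, val_y)
print(train_score, val_score)  # 1.0 0.625
```

The model scores perfectly on the training set because it has memorized the two noisy labels along with the real pattern. On the validation set those memorized random effects only hurt, so the score drops.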
Training and Validation Sets: What are the objectives of each and which should be larger or
smaller? - CORRECT ANSWER-Training sets should be larger and are used to fit the model.
Validation sets should be smaller and are used to estimate the model's effectiveness.
Training and Validation Sets: Choosing the best model from a group? - CORRECT ANSWER-When
choosing the best model among a group, you would use the model scores from the validation
set to compare results. However, you would not use the score from the validation set to
measure the chosen model's overall accuracy. You would need to run the model on a third,
test data set to evaluate its performance.
Why can't you use the validation score to measure a model's performance when choosing the
best model from a group? - CORRECT ANSWER-It is likely that the model that performed the
best during validation did so because it happened to be better at picking up the random effects
in your validation set than other models. As a result, the validation score it produced is probably
too optimistic.
Model scores are always a sum of fit to real patterns and fit to random patterns. If several
models are pretty close to each other in how well they pick up real patterns, the deciding factor
often becomes how well they fit random patterns in the validation data.
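A rough simulation of this effect, under assumptions chosen only for illustration: 20 candidate models that all have the same true accuracy of 0.70, a validation set of 100 points, and a test set of 10,000 points. Each prediction is modeled as an independent coin flip that is correct with probability 0.70.

```python
import random

TRUE_ACCURACY = 0.70  # assumed: all models fit real patterns equally well
N_MODELS = 20
N_VALIDATION = 100
N_TEST = 10_000

rng = random.Random(0)  # fixed seed so the run is repeatable


def observed_score(n_points):
    """Fraction correct on n_points for a model with TRUE_ACCURACY."""
    correct = sum(rng.random() < TRUE_ACCURACY for _ in range(n_points))
    return correct / n_points


# Every model has the same real skill, so differences here are random effects.
validation_scores = [observed_score(N_VALIDATION) for _ in range(N_MODELS)]
best_model = max(range(N_MODELS), key=lambda i: validation_scores[i])
best_validation = validation_scores[best_model]

# Score the chosen model on fresh test data that played no part in selection.
test_score = observed_score(N_TEST)

print("best validation score:", best_validation)
print("test score of chosen model:", test_score)
```

The winning validation score tends to sit above 0.70 because picking the maximum also picks the luckiest fit to random patterns. The test score is not subject to that selection, so it lands close to the true 0.70.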
Training, validation, and test sets - CORRECT ANSWER-Training: used to fit the models
Validation: used to compare and choose the best model
Test: used to estimate the performance of the chosen model
Note: Validation sets are only used when we are comparing multiple models. If only one model
was built, then we do not need a validation step and just need a Training and Test set.
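One way to carve a data set into the three parts is sketched below. The function name `train_validation_test_split` and the 60/20/20 proportions are assumptions for the example. The notes only say that the training set should be the largest.

```python
import random


def train_validation_test_split(rows, train_frac=0.6, validation_frac=0.2, seed=0):
    """Shuffle rows and split them into training, validation, and test lists."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # shuffle a copy, reproducibly
    n_train = round(len(shuffled) * train_frac)
    n_validation = round(len(shuffled) * validation_frac)
    train = shuffled[:n_train]
    validation = shuffled[n_train:n_train + n_validation]
    test = shuffled[n_train + n_validation:]  # whatever remains
    return train, validation, test


rows = list(range(100))  # stand-in for 100 data points
train, validation, test = train_validation_test_split(rows)
print(len(train), len(validation), len(test))  # 60 20 20
```

Each data point ends up in exactly one of the three sets, so the test score is computed on data that was used neither for fitting nor for choosing the model. If only one model is built, the validation part can be dropped by setting `validation_frac=0`.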