VERSION EXAM 1 & 2 WITH COMPLETE 600 QUESTIONS AND
CORRECT DETAILED ANSWERS\LATEST UPDATE \BRAND NEW!!
How do we find the best value of k in k-means?
Elbow method: for each candidate value of k, we calculate the total distance
of each data point to its cluster center and plot that total against k. We
look for the kink (the "elbow") in the graph.
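A minimal sketch of the elbow method, assuming a toy two-dimensional data set and a bare-bones k-means written with only the Python standard library. The function names, the data, and the deterministic "farthest-first" choice of starting centers are assumptions made for illustration (ordinary k-means usually starts from random centers).

```python
import math

def farthest_first(points, k):
    """Deterministic starting centers (an assumption for this sketch): begin
    with the first point, then keep adding the point farthest from the
    centers chosen so far."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(math.dist(p, c) for c in centers)))
    return centers

def k_means_total_distance(points, k, max_iterations=100):
    """Run basic k-means and return the total point-to-center distance."""
    centers = farthest_first(points, k)
    for _ in range(max_iterations):
        # Assignment step: each point joins its nearest center.
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centers[i]))
            clusters[nearest].append(p)
        # Update step: move each center to the mean of its points.
        new_centers = [
            tuple(sum(axis) / len(members) for axis in zip(*members)) if members else centers[i]
            for i, members in enumerate(clusters)
        ]
        if new_centers == centers:
            break  # assignments no longer change
        centers = new_centers
    return sum(min(math.dist(p, c) for c in centers) for p in points)

# Hypothetical data: three tight groups of four points each.
points = [
    (0, 0), (1, 0), (0, 1), (1, 1),
    (10, 10), (11, 10), (10, 11), (11, 11),
    (20, 0), (21, 0), (20, 1), (21, 1),
]

totals = {k: k_means_total_distance(points, k) for k in range(1, 7)}
for k, total in totals.items():
    print(f"k={k}  total distance={total:.2f}")
```

On this toy data the total distance falls sharply up to k = 3 and only slightly after that, so the kink is at k = 3.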
When using clustering for prediction, how do we choose the prediction?
When we see a new point, we just choose whichever cluster center is
closest.
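As a small illustration of the rule above, the sketch below assigns a new point to the nearest cluster center. The function name, the centers, and the sample points are hypothetical.

```python
import math

def predict_cluster(point, centers):
    """Return the index of the cluster center closest to `point`."""
    return min(range(len(centers)), key=lambda i: math.dist(point, centers[i]))

# Hypothetical cluster centers from an earlier k-means run.
centers = [(0.5, 0.5), (10.5, 10.5), (20.5, 0.5)]

new_point = (9.0, 8.0)
label = predict_cluster(new_point, centers)
print(f"{new_point} is assigned to cluster {label}")
```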
What is the difference between classification and clustering?
With classification models, we know each data point's attributes and we
already know the right classification for the data points (supervised). In
clustering (unsupervised) we know the attributes but we don't know what
group any of these data points are in.
What is the difference between supervised learning and unsupervised learning?
Supervised - the response is known.
Unsupervised - the response is not known.
The k-means algorithm for clustering is a "heuristic" because...
...it isn't guaranteed to get the best answer but it will get to a solution quickly.
A group of astronomers has a set of long-exposure CCD images of various
distant objects. They do not yet know which type of object each one is, and
would like your help using analytics to determine which ones look similar.
Which is more appropriate: classification or clustering?
clustering
Suppose one astronomer has categorized hundreds of the images by
hand, and now wants your help using analytics to automatically
determine which category each new image belongs to. Which is more
appropriate: classification or clustering?
classification
Which of these is generally a good reason to remove an outlier from your
data set?
A. The outlier is incorrectly entered data, not real data.
B. Outliers like this only happen occasionally.
If the data point isn't a true one, you should remove it from your data set.
What is an outlier?
A data point that is very different from the rest of the data.
What graph or plot can we use to find outliers?
box-and-whisker plot
What are the parts of a box-and-whisker plot?
The bottom and top of the box are the 25th and 75th percentiles. The
line in the middle of the box is the median. The whiskers stretch up and
down to the most extreme non-outlier values.
Where would outliers appear in a box-and-whisker plot?
Outside of the whiskers.
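The sketch below computes the parts of a box-and-whisker plot and lists the points that fall outside the whiskers. The notes do not say how far the whiskers may reach, so this example assumes the common convention of 1.5 times the interquartile range beyond the box; the function name and the data are hypothetical.

```python
import statistics

def box_plot_summary(data, whisker_factor=1.5):
    """Summarize data as a box plot would. The 1.5 * IQR whisker limit is a
    common convention and an assumption here, not something the notes state."""
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    low_limit = q1 - whisker_factor * iqr
    high_limit = q3 + whisker_factor * iqr
    inside = [x for x in data if low_limit <= x <= high_limit]
    outliers = [x for x in data if x < low_limit or x > high_limit]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        # Whiskers end at the most extreme values that are not outliers.
        "whiskers": (min(inside), max(inside)),
        "outliers": outliers,
    }

# Hypothetical measurements with one suspicious value.
data = [2, 3, 4, 5, 5, 6, 7, 8, 30]
summary = box_plot_summary(data)
print(summary)
```

Whether a flagged point such as 30 should be removed still depends on whether it is bad data or a real observation.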
What are some ways to deal with outliers that are bad data?
Omit them or use imputation
What can change detection be used for?
Determining whether action might be needed, determining impact of
past action, determining changes to help plan.
What is the cumulative sum (CUSUM) method used for?
Detecting an increase, a decrease, or both.
What is C used for in the CUSUM formula?
Since we expect there to be some randomness, we include a value C to
pull the running total S_t down at each step.
If we have a larger C ...
the harder it is for S_t to get large, and the less sensitive the method will be.
If we have a smaller C ...
the more sensitive the method is, because S_t can get large faster.
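To make the effect of C concrete, here is a minimal sketch of CUSUM for detecting an increase. It assumes the usual form of the running total, S_t = max(0, S_{t-1} + (x_t - mu - C)), with a change flagged once S_t reaches the threshold T; mu (the expected value when nothing has changed), the function name, and the data are assumptions for illustration.

```python
def cusum_increase(values, mu, C, T):
    """Return the running totals S_t and the index of the first detection
    (None if S_t never reaches T). Assumes the standard increase-detecting
    form S_t = max(0, S_{t-1} + (x_t - mu - C))."""
    s = 0.0
    history = []
    first_alarm = None
    for t, x in enumerate(values):
        s = max(0.0, s + (x - mu - C))  # C pulls the running total down
        history.append(s)
        if first_alarm is None and s >= T:
            first_alarm = t
    return history, first_alarm

# Hypothetical observations: steady around 10, then drifting upward.
values = [10, 11, 9, 10, 12, 13, 14, 13, 15]

for C in (0, 1, 3):
    history, alarm = cusum_increase(values, mu=10, C=C, T=5)
    print(f"C={C}: S_t={history}, first detection at index {alarm}")
```

With these numbers the smallest C detects the change earliest, the middle value detects it one step later, and the largest C never reaches T.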
What factors go into finding the right values of C and T?
How costly it is if the model takes a long time to notice a change, and how
costly it is if the model thinks it has found a change that really isn't there.