What is a method to pick the number of clusters to use for k-means clustering? -
ANSWER An elbow diagram. Plot the total within-cluster distance against the number of
clusters and find the point where the benefit of adding another cluster gets really small
(the curve flattens).
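A minimal sketch of reading the elbow off the numbers instead of the plot. The function name pick_elbow, the 10% cutoff, and the example values are all assumptions made for illustration; in practice the cutoff is a judgment call.

```python
def pick_elbow(wcss_by_k, min_relative_gain=0.10):
    """Return the first k after which one more cluster helps by less than the cutoff.

    wcss_by_k maps k -> total within-cluster sum of squared distances.
    min_relative_gain is a hypothetical cutoff, not a standard value.
    """
    ks = sorted(wcss_by_k)
    for k, next_k in zip(ks, ks[1:]):
        current = wcss_by_k[k]
        if current == 0:
            return k  # already a perfect fit, more clusters cannot help
        gain = (current - wcss_by_k[next_k]) / current
        if gain < min_relative_gain:
            return k  # the curve has flattened here
    return ks[-1]

# Made-up values for illustration only (not from real data).
example_wcss = {1: 1000.0, 2: 400.0, 3: 150.0, 4: 140.0, 5: 135.0}
elbow_k = pick_elbow(example_wcss)
```

With these made-up values, going from 3 to 4 clusters improves the fit by less than 10%, so the sketch picks k = 3.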
How do you use k-means for predictive analytics? - ANSWER For a new point, find the
nearest cluster center and assign the point to that cluster.
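A minimal sketch of that assignment step, assuming the cluster centers have already been found and using Euclidean distance. The function name nearest_cluster and the example centers are hypothetical.

```python
import math

def nearest_cluster(point, centers):
    """Return the index of the center closest to point (Euclidean distance)."""
    distances = [math.dist(point, center) for center in centers]
    return distances.index(min(distances))

# Hypothetical centers from an earlier k-means run.
centers = [(0.0, 0.0), (10.0, 10.0)]
label = nearest_cluster((8.0, 9.0), centers)
```

Here the new point (8, 9) lands in the cluster with index 1, since it is much closer to (10, 10) than to (0, 0).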
When would you choose a classification model over a clustering model? - ANSWER
When you know the classification of your data. Then you can build your classification
model and predict new points.
What is the difference between supervised and unsupervised learning? - ANSWER The
response is known for supervised learning (classification) and unknown for
unsupervised learning (clustering)
Which is more common: supervised or unsupervised learning? - ANSWER Supervised
Point outlier - ANSWER values that are far from the rest of the data
Contextual outlier - ANSWER Value isn't far from the rest overall, but is far from points
nearby in time
Collective outlier - ANSWER something is missing in a range of points but we can't tell
exactly where
Box and whisker plot - ANSWER A way to find outliers in a single dimension
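A minimal sketch of the rule a box and whisker plot usually encodes. It assumes the common convention that whiskers extend 1.5 times the interquartile range beyond the quartiles; the function name and the readings are made up for illustration.

```python
import statistics

def iqr_outliers(values, whisker=1.5):
    """Return the values that fall outside the whiskers of a box plot.

    whisker=1.5 is the usual convention, assumed here.
    """
    q1, _median, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low = q1 - whisker * iqr
    high = q3 + whisker * iqr
    return [v for v in values if v < low or v > high]

# Made-up one-dimensional readings with one obvious point outlier.
readings = [10, 12, 12, 13, 12, 11, 14, 13, 15, 102]
outliers = iqr_outliers(readings)
```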
What is a way to find outliers in multi-dimensional data? - ANSWER You could build a
model (for example, an exponential smoothing model for time series data) and look for
points where the error is very large
Two approaches to dealing with outliers that are bad data - ANSWER 1. Omit those
points
2. Use imputation
Why are hypothesis tests often not sufficient for change detection? - ANSWER They
are often too slow to detect change.
In CUSUM, a higher T value makes the model... - ANSWER ...detect changes more slowly
and less likely to make false detections.
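A minimal sketch of CUSUM for detecting an increase, to show the effect of T. It assumes the usual running sum S_t = max(0, S_{t-1} + (x_t - mu - C)) with a detection when S_t >= T; mu (the expected value with no change), C (the slack term), and the series are assumptions for illustration.

```python
def cusum_first_detection(xs, mu, c, threshold):
    """Return the index of the first point where S_t >= threshold, or None."""
    s = 0.0
    for t, x in enumerate(xs):
        # Accumulate how far x is above mu, minus the slack c; never go below 0.
        s = max(0.0, s + (x - mu - c))
        if s >= threshold:
            return t
    return None

# Made-up series: the mean shifts from 0 to 2 at index 4.
series = [0, 0, 0, 0, 2, 2, 2, 2, 2, 2]
low_t = cusum_first_detection(series, mu=0.0, c=0.5, threshold=3.0)
high_t = cusum_first_detection(series, mu=0.0, c=0.5, threshold=6.0)
```

On this series the lower threshold flags the change at index 5 and the higher threshold at index 7: the higher T is slower, but it would also need a larger run of noise to trigger a false detection.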
In exponential smoothing, an alpha value closer to 1 is chosen if... - ANSWER There's
less randomness so we're more willing to trust observation x_t
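A minimal sketch of simple exponential smoothing, S_t = alpha * x_t + (1 - alpha) * S_{t-1}. Starting with S_0 = x_0 is an assumption (one common initialization), and the function name is hypothetical.

```python
def exponential_smoothing(xs, alpha):
    """Return the smoothed series, starting from S_0 = x_0 (assumed)."""
    smoothed = [xs[0]]
    for x in xs[1:]:
        # Blend the new observation with the previous smoothed value.
        smoothed.append(alpha * x + (1 - alpha) * smoothed[-1])
    return smoothed

half_trust = exponential_smoothing([10, 20, 20], alpha=0.5)
full_trust = exponential_smoothing([10, 20, 20], alpha=1.0)
```

With alpha = 1 the smoothed series simply repeats the observations, which is the extreme case of fully trusting x_t.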
Multiplicative seasonality, as in Holt-Winters, means that the seasonal effect is... -
ANSWER ...proportional to the baseline.
How do you find good values for alpha, beta and gamma for forecasting? - ANSWER
Choose the values that minimize the sum of squared forecast errors
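A minimal sketch of that idea for alpha alone, using simple exponential smoothing and a grid search. The function names, the grid, and the series are assumptions for illustration; beta and gamma would be searched the same way, and real tools typically use a numerical optimizer rather than a grid.

```python
def sse_for_alpha(xs, alpha):
    """Sum of squared one-step-ahead forecast errors for a given alpha."""
    level = xs[0]  # assumed initialization: S_0 = x_0
    sse = 0.0
    for x in xs[1:]:
        sse += (x - level) ** 2          # the forecast for x is the previous level
        level = alpha * x + (1 - alpha) * level
    return sse

def best_alpha(xs, candidates):
    """Return the candidate alpha with the smallest squared error."""
    return min(candidates, key=lambda a: sse_for_alpha(xs, a))

grid = [i / 10 for i in range(1, 10)]      # 0.1, 0.2, ..., 0.9
trend_series = [1, 2, 3, 4, 5, 6, 7, 8]    # made-up series with no randomness
chosen = best_alpha(trend_series, grid)
```

Because the made-up series has no randomness, the search settles on the largest alpha in the grid.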
What does ARIMA stand for? - ANSWER AutoRegressive Integrated Moving Average
The key parts of ARIMA - ANSWER 1. Differences
2. Autoregression
3. Moving Average
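A minimal sketch of the first part, differencing, which is the "Integrated" piece of the name. The function name and the series are hypothetical.

```python
def difference(xs, order=1):
    """Apply first differences (x_t - x_{t-1}) to the series `order` times."""
    for _ in range(order):
        xs = [later - earlier for earlier, later in zip(xs, xs[1:])]
    return xs

squares = [1, 4, 9, 16, 25]
first_diff = difference(squares, order=1)
second_diff = difference(squares, order=2)
```

The first differences of this series still trend upward, while the second differences are constant, which is the kind of stationary series the autoregression and moving average parts are then fitted to.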
What is regression? - ANSWER Predicting the value of something based on other
factors
What is autoregression? - ANSWER Using earlier values of the same thing we're trying
to predict. (only works with time series data)
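A minimal sketch of the simplest case, an AR(1) model x_t = c + phi * x_{t-1}, fitted by ordinary least squares on pairs of consecutive values. The function name and the series are assumptions for illustration; the series was built to follow x_t = 1 + 2 * x_{t-1} exactly.

```python
def fit_ar1(xs):
    """Fit x_t = intercept + phi * x_{t-1} by least squares; return (intercept, phi)."""
    prev, curr = xs[:-1], xs[1:]   # predictor is the earlier value of the same series
    n = len(prev)
    mean_prev = sum(prev) / n
    mean_curr = sum(curr) / n
    sxy = sum((p - mean_prev) * (c - mean_curr) for p, c in zip(prev, curr))
    sxx = sum((p - mean_prev) ** 2 for p in prev)
    phi = sxy / sxx
    intercept = mean_curr - phi * mean_prev
    return intercept, phi

series = [1.0, 3.0, 7.0, 15.0, 31.0]   # made-up: x_t = 1 + 2 * x_{t-1}
intercept, phi = fit_ar1(series)
forecast = intercept + phi * series[-1]  # one-step-ahead prediction
```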
What does GARCH stand for? - ANSWER Generalized Autoregressive Conditional
Heteroskedasticity
What is GARCH used for? - ANSWER Estimating forecast variance. (especially
important in investments)
What is simple linear regression? - ANSWER Linear regression with one predictor.
When would regression be used instead of a time series model? - ANSWER When
there are other factors or predictors that affect the response.
What is the difference between AIC and AICc? - ANSWER AICc has a correction term to
handle smaller data sets, because AIC's justification assumes a very large data set
When does BIC start to break down? - ANSWER When the number of parameters is
close to the number of data points.
What is the main difference between AIC and BIC? - ANSWER BIC's penalty term is
bigger (k ln(n) versus 2k, once there are more than about 7 data points), so BIC favors
simpler models.
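A minimal sketch of the three criteria from the cards above, using the standard formulas with k parameters, n data points, and a maximized log-likelihood. The numeric inputs are made up for illustration and do not come from a fitted model.

```python
import math

def aic(log_likelihood, k):
    """AIC = 2k - 2 ln(L)."""
    return 2 * k - 2 * log_likelihood

def aicc(log_likelihood, k, n):
    """AICc = AIC + 2k(k + 1) / (n - k - 1); requires n > k + 1."""
    return aic(log_likelihood, k) + 2 * k * (k + 1) / (n - k - 1)

def bic(log_likelihood, k, n):
    """BIC = k ln(n) - 2 ln(L)."""
    return k * math.log(n) - 2 * log_likelihood

# Hypothetical model: 3 parameters, log-likelihood of -50.
small_gap = aicc(-50.0, k=3, n=1000) - aic(-50.0, k=3)
large_gap = aicc(-50.0, k=3, n=10) - aic(-50.0, k=3)
bic_minus_aic = bic(-50.0, k=3, n=100) - aic(-50.0, k=3)
```

The AICc correction is large for n = 10 and nearly vanishes for n = 1000, and for n = 100 the BIC value exceeds the AIC value because ln(100) is greater than 2.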
What is the only type of analytics question that regression models don't answer?
-