Why would we want to estimate the variance?
- Knowing the variance can help us estimate the amount of error
Why is GARCH different from ARIMA and exponential smoothing?
- GARCH estimates variance
- ARIMA and exponential smoothing both estimate the value of an attribute; GARCH
estimates the variance
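To make "estimating the variance" concrete, here is a minimal sketch of the standard GARCH(1,1) recursion, in which each variance estimate depends on the previous squared error and the previous variance estimate. The error series and the parameter values (omega, alpha, beta, initial_variance) are made up for illustration; in practice they would be fitted to data.

```python
def garch_variances(errors, omega, alpha, beta, initial_variance):
    """Return GARCH(1,1) variance estimates: the starting value, then one per observed error."""
    variances = [initial_variance]
    for e in errors:
        # New variance = baseline + weight on last squared error + weight on last variance
        variances.append(omega + alpha * e ** 2 + beta * variances[-1])
    return variances

# Hypothetical forecast errors; the third one is a large shock
errors = [0.1, -0.2, 1.5, -1.2, 0.3]
variances = garch_variances(errors, omega=0.05, alpha=0.2, beta=0.7, initial_variance=0.25)
```

The variance estimate jumps right after the large error, which is the kind of changing uncertainty that ARIMA and exponential smoothing do not report.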
When would regression be used instead of a time series model?
- When there are other factors or predictors that affect the response.
- Regression helps show the relationships between factors and a response
If two models are approximately equally good, measures like AIC and BIC will favor the simpler
model. Simpler models are often better because...
- Simpler models are less likely to be over-fit, easier to understand, and easier to explain
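As a rough illustration of how AIC and BIC penalize complexity, the sketch below uses the usual least-squares forms (valid up to an additive constant when errors are assumed normal). The two models, their residual sums of squares, and their parameter counts are hypothetical numbers chosen so the fits are almost equally good.

```python
import math

def aic(n, rss, k):
    """AIC for a least-squares fit: n*ln(RSS/n) + 2k (additive constant dropped)."""
    return n * math.log(rss / n) + 2 * k

def bic(n, rss, k):
    """BIC for a least-squares fit: n*ln(RSS/n) + k*ln(n) (additive constant dropped)."""
    return n * math.log(rss / n) + k * math.log(n)

n = 100  # number of data points (hypothetical)
simple_model = {"rss": 50.0, "k": 3}    # slightly worse fit, few parameters
complex_model = {"rss": 49.5, "k": 8}   # slightly better fit, many parameters

aic_simple = aic(n, **simple_model)
aic_complex = aic(n, **complex_model)
bic_simple = bic(n, **simple_model)
bic_complex = bic(n, **complex_model)
```

Lower is better for both measures. Here the small improvement in fit does not make up for the five extra parameters, so both measures prefer the simpler model.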
What is not a common use of regression?
- Prescriptive analytics: Determining the best course of action
- Regression is often good for describing and predicting, but is not as helpful for suggesting a
course of action
True or false: regression is a way to determine whether one thing causes another.
- False. Regression can show relationships between factors and a response, but it doesn't show
whether one thing causes another
Suppose our regression model to estimate how tall a 2-year-old will be as an adult has the
following coefficients:
0.56 × FatherHeight + 0.51 × MotherHeight - 0.02 × FatherHeight × MotherHeight
The negative sign on the coefficient of FatherHeight × MotherHeight means:
- People with two taller-than-average parents won't be as tall as the individual effects of
father's height and mother's height add up to
- The negative coefficient for the interaction term brings down the overall estimate
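A small worked example of the interaction effect, using the three coefficients from the question. The notes don't say how the heights are measured, so this sketch assumes the inputs are deviations from average height (for example, inches above average); the input values 4.0 and 3.0 are hypothetical.

```python
def height_effects(father, mother):
    """Split the model's estimate into individual effects and the interaction effect.

    Assumption: father and mother are deviations from average height.
    """
    individual = 0.56 * father + 0.51 * mother
    interaction = -0.02 * father * mother
    return individual, interaction, individual + interaction

# Both parents taller than average (hypothetical values)
individual, interaction, total = height_effects(father=4.0, mother=3.0)
```

When both parents are above average, the product of their deviations is positive, so the negative coefficient subtracts from the sum of the two individual effects.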
What does "heteroscedasticity" mean?
- The variance is different in different ranges of the data
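A quick way to see heteroscedasticity is to simulate it. The sketch below generates noise whose spread grows with x and then compares the variance in the lower and upper halves of the range. The noise model (standard deviation 0.2 + 0.5x) is an arbitrary assumption for illustration.

```python
import random
import statistics

random.seed(0)  # fixed seed so the example is repeatable

xs = [random.uniform(0, 10) for _ in range(1000)]
# Noise whose standard deviation grows with x (illustrative assumption)
noise = [random.gauss(0, 0.2 + 0.5 * x) for x in xs]

low_range = [e for x, e in zip(xs, noise) if x < 5]
high_range = [e for x, e in zip(xs, noise) if x >= 5]

var_low = statistics.variance(low_range)
var_high = statistics.variance(high_range)
```

If the data were homoscedastic, the two variances would be roughly equal; here the upper range is noticeably noisier.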
You might want to de-trend data before...
- ...using time-series data in a regression model
Factor-based models like regression generally don't account for time-based effects like trend.
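One simple way to de-trend is to fit a straight line against time and subtract it. This is only a sketch under the assumption that the trend is linear; the series below is made up, and other de-trending approaches exist.

```python
def fit_line(values):
    """Least-squares slope and intercept of values against time steps 0, 1, 2, ..."""
    n = len(values)
    t_mean = (n - 1) / 2
    y_mean = sum(values) / n
    numerator = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(values))
    denominator = sum((t - t_mean) ** 2 for t in range(n))
    slope = numerator / denominator
    return slope, y_mean - slope * t_mean

def detrend(values):
    """Subtract the fitted linear trend, leaving only the variation around it."""
    slope, intercept = fit_line(values)
    return [y - (intercept + slope * t) for t, y in enumerate(values)]

# Hypothetical series: upward trend of 2 per step plus an alternating wiggle
series = [10 + 2 * t + (1 if t % 2 == 0 else -1) for t in range(12)]
detrended = detrend(series)
```

The de-trended values can then be used as a response or predictor in a regression without the trend dominating the fit.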
Which of the following does principal component analysis (PCA) do?
- Transform data so there's no correlation between dimensions and rank the new dimensions
in likely order of importance.
If you use principal component analysis (PCA) to transform your data and then you run a
regression model on it, how can you interpret the regression coefficients in terms of the
original attributes?
- Each original attribute's implied regression coefficient is equal to a linear combination of the
principal components' regression coefficients.
This is equivalent to using the inverse transformation.
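The two PCA answers above can be sketched with NumPy: transform the data so the new dimensions are uncorrelated, regress on the principal components, then map the coefficients back to the original attributes. The data here is randomly generated for illustration, and all three components are kept so the result can be compared against a direct regression; with fewer components the implied coefficients would only approximate it.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
X[:, 2] = X[:, 0] + 0.1 * rng.normal(size=200)  # make two attributes correlated
y = 2.0 * X[:, 0] - 1.0 * X[:, 1] + 0.5 * X[:, 2] + rng.normal(scale=0.1, size=200)

# Center the data; PCA directions come from the SVD of the centered matrix
Xc = X - X.mean(axis=0)
yc = y - y.mean()
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
V = Vt.T        # columns are principal directions, ordered by variance explained
Z = Xc @ V      # data expressed in principal-component coordinates

# Regress on the principal components
pc_coefs, *_ = np.linalg.lstsq(Z, yc, rcond=None)

# Each original attribute's implied coefficient is a linear combination of the
# principal components' coefficients (the inverse transformation)
implied_coefs = V @ pc_coefs

# Direct regression on the original attributes, for comparison
direct_coefs, *_ = np.linalg.lstsq(Xc, yc, rcond=None)
```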
True or false: In a regression tree, every leaf of the tree has a different regression model that
might use different attributes, have different coefficients, etc.
- True. Each leaf's individual model is tailored to the subset of data points that follow all of
the branches leading to the leaf.
True or false: Tree-based approaches can be used for other models besides regression.
- True. For example, a classification tree might have a different SVM or KNN model at each
leaf. It might even use SVM at some leaves and KNN at others (though that's probably rare).
A common rule of thumb is to stop branching if a leaf would contain less than 5% of the data
points. Why not keep branching and allow models to find very close fits to each very small
subset of data?
- Fitting to very small subsets of data will cause overfitting. With too few data points, the
models will fit to random patterns as well as real ones
True or False: When using a random forest model, it's easy to interpret how its results are
determined.
- False. Unlike a model like regression where we can show the result as a simple linear
combination of each attribute times its regression coefficient, in a random forest model there
are so many different trees used simultaneously that it's difficult to interpret exactly how any
factor or factors affect the result.
A logistic regression model can be especially useful when the response...
- ...is a probability (a number between zero and one) or is binary (either zero or one).
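A minimal sketch of why logistic regression suits these responses: the logistic function squeezes any linear combination of attributes into a number between zero and one, which can be read as a probability or thresholded into a binary answer. The coefficients, intercept, attribute values, and the 0.5 threshold below are hypothetical.

```python
import math

def logistic(z):
    """Map any real number to a value strictly between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_probability(attributes, coefficients, intercept):
    """Linear combination of attributes passed through the logistic function."""
    z = intercept + sum(a * x for a, x in zip(coefficients, attributes))
    return logistic(z)

# Hypothetical fitted model and data point
p = predict_probability([2.0, -1.0], coefficients=[0.8, 0.5], intercept=-0.3)
label = 1 if p >= 0.5 else 0  # threshold chosen for illustration
```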
A model is built to determine whether data points belong to a category or not. A "true
negative" result is:
- A data point that is not in the category, and the model correctly says so. 'True' and 'false'
refer to whether the model is correct or not, and 'positive' and 'negative' refer to whether the
model says the point is in the category.
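The naming rule can be written directly as code: the first letter records whether the model was correct, and the second records what the model said. The labels below are made-up examples.

```python
def confusion_counts(actual, predicted):
    """Count true/false positives/negatives from lists of booleans."""
    counts = {"TP": 0, "FP": 0, "TN": 0, "FN": 0}
    for in_category, model_says_in in zip(actual, predicted):
        correctness = "T" if in_category == model_says_in else "F"  # was the model right?
        answer = "P" if model_says_in else "N"                      # what did the model say?
        counts[correctness + answer] += 1
    return counts

# Hypothetical data: True means "in the category"
actual    = [True, True,  False, False, False, True]
predicted = [True, False, False, True,  False, True]
counts = confusion_counts(actual, predicted)
```

In this example there are two true negatives: the points that are not in the category and that the model also places outside it.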
True or False: The most useful classification models are the ones that correctly classify the
highest fraction of data points.
- False. Sometimes the cost of a false positive is so high that it's worth accepting more false
negatives, or vice versa.
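To illustrate the trade-off, the sketch below compares classification thresholds by accuracy and by total cost. The predicted probabilities, true labels, candidate thresholds, and the costs (a false positive assumed ten times as costly as a false negative) are all hypothetical.

```python
def total_cost(probabilities, actual, threshold, cost_fp, cost_fn):
    """Sum the cost of every false positive and false negative at a threshold."""
    cost = 0.0
    for p, in_category in zip(probabilities, actual):
        predicted = p >= threshold
        if predicted and not in_category:
            cost += cost_fp
        elif in_category and not predicted:
            cost += cost_fn
    return cost

def accuracy(probabilities, actual, threshold):
    """Fraction of points classified correctly at a threshold."""
    correct = sum((p >= threshold) == a for p, a in zip(probabilities, actual))
    return correct / len(actual)

probabilities = [0.1, 0.2, 0.4, 0.6, 0.65, 0.7, 0.9, 0.95]
actual        = [False, False, False, False, True, True, True, True]
thresholds = [0.3, 0.5, 0.8]

best_by_accuracy = max(thresholds, key=lambda t: accuracy(probabilities, actual, t))
best_by_cost = min(
    thresholds,
    key=lambda t: total_cost(probabilities, actual, t, cost_fp=10.0, cost_fn=1.0),
)
```

With these numbers the most accurate threshold still produces one expensive false positive, while the cheapest threshold misclassifies more points but only as inexpensive false negatives.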
What do descriptive questions ask?
- What happened? (e.g., which customers are most alike)
What do predictive questions ask?
- What will happen? (e.g., what will Google's stock price be?)
What do prescriptive questions ask?
- What action(s) would be best? (e.g., where to put traffic lights)
What is a model?
- A real-life situation expressed as math.
What do classifiers help you do?
- Differentiate between categories, i.e., decide which group a data point belongs to.
What is a soft classifier and when is it used?
- A soft classifier allows some points to be misclassified. It is used when no line separates
all of the labeled examples, so we choose the classifier that makes the fewest mistakes.
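One common way to formalize a soft classifier is the soft-margin objective used by support vector machines: add up how badly each point violates the margin (the hinge loss) and add a penalty on the size of the coefficients. The notes only say "minimizes the number of mistakes", so treating it as a hinge-loss objective is an assumption here, as are the data, the two candidate classifiers, and the trade-off weight lam.

```python
def soft_margin_objective(points, labels, coefficients, intercept, lam):
    """Total hinge-loss error plus lam times the squared size of the coefficients."""
    error = 0.0
    for x, y in zip(points, labels):
        score = intercept + sum(a * xi for a, xi in zip(coefficients, x))
        # Zero when the point is on the correct side with enough margin
        error += max(0.0, 1.0 - y * score)
    size = sum(a * a for a in coefficients)
    return error + lam * size

# One attribute; the classes overlap, so no single cut point separates them
points = [(1.0,), (2.0,), (3.5,), (3.0,), (4.0,), (5.0,)]
labels = [-1, -1, -1, 1, 1, 1]

# Two hypothetical candidates: boundaries at x = 3.25 and x = 1.5
objective_a = soft_margin_objective(points, labels, (2.0,), -6.5, lam=0.1)
objective_b = soft_margin_objective(points, labels, (2.0,), -3.0, lam=0.1)
```

The candidate with the smaller objective is preferred; here that is the boundary near the overlap, which makes smaller mistakes overall.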
What does it mean when the classifier/decision boundary is almost parallel to the vertical
axis?
- The horizontal attribute is all that is needed.