1. A common rule of thumb is to stop branching if a leaf would contain fewer than
5% of the data points. Why not keep branching and allow models to find very close
fits to each very small subset of data? - ANSWER Fitting to very small subsets of
data will cause overfitting.
Explanation: With too few data points, the models will fit to random patterns as
well as real ones. Lesson 10.2
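The stopping rule above can be sketched as a simple check. This is only an illustration: the function name `split_allowed` and the example sizes are hypothetical, and the 5% threshold is the rule of thumb from the question, not a fixed requirement.

```python
def split_allowed(child_sizes, total_size, min_fraction=0.05):
    """Allow a split only if every resulting leaf keeps enough data points."""
    # Smallest number of points a leaf may hold under the rule of thumb.
    min_points = min_fraction * total_size
    return all(size >= min_points for size in child_sizes)


# Hypothetical data set of 1000 points; a node holding 200 of them is split.
ok_split = split_allowed([120, 80], 1000)    # both leaves hold at least 5%
bad_split = split_allowed([170, 30], 1000)   # one leaf would hold only 3%
print(ok_split, bad_split)
```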
2. True or false: In a regression tree, every leaf of the tree has a different regression
model that might use different attributes, have different coefficients, etc. -
ANSWER True.
Explanation: Each leaf's individual model is tailored to the subset of data points
that follow all of the branches leading to the leaf. Lesson 10.1
3. True or false: Tree-based approaches can be used for other models besides
regression. - ANSWER True.
Explanation: For example, a classification tree might have a different SVM or
KNN model at each leaf. It might even use SVM at some leaves and KNN at others
(though that's probably rare). Lesson 10.1
4. What does "heteroscedasticity" mean? - ANSWER The variance is different in
different ranges of the data. Lesson 9.1
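One rough way to see heteroscedasticity is to compare the variance of residuals in different ranges of the data. The sketch below uses made-up (x, residual) pairs, so the numbers are purely illustrative.

```python
from statistics import pvariance

# Hypothetical (x, residual) pairs: the spread grows as x grows.
data = [(1, 0.1), (2, -0.2), (3, 0.1), (4, -0.1), (5, 0.2),
        (6, 1.5), (7, -2.0), (8, 2.5), (9, -1.8), (10, 2.2)]

# Split the residuals into a low-x range and a high-x range.
low = [r for x, r in data if x <= 5]
high = [r for x, r in data if x > 5]

var_low = pvariance(low)
var_high = pvariance(high)

# A much larger variance in one range suggests heteroscedasticity.
print(var_low, var_high)
```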
5. You might want to de-trend data before... - ANSWER ...using time-series data in
a regression model.
Explanation: Factor-based models like regression generally don't account for
time-based effects like trend. Lesson 9.2
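A minimal sketch of de-trending, assuming the trend is a straight line in time: fit a least-squares line against the time index and subtract it. The function name `detrend` and the sample series are hypothetical, and other de-trending approaches exist.

```python
def detrend(values):
    """Subtract a least-squares straight line fitted against the time index.

    Assumes at least two observations at equally spaced time steps.
    """
    n = len(values)
    t_mean = (n - 1) / 2
    y_mean = sum(values) / n
    slope = (sum((t - t_mean) * (y - y_mean) for t, y in enumerate(values))
             / sum((t - t_mean) ** 2 for t in range(n)))
    intercept = y_mean - slope * t_mean
    # What remains after removing the fitted line is the de-trended series.
    return [y - (intercept + slope * t) for t, y in enumerate(values)]


series = [10.0, 12.5, 13.0, 15.5, 16.0, 18.5]  # made-up upward-trending data
residuals = detrend(series)
print(residuals)
```

The de-trended values could then be used as a factor or response in a regression model.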
6. Which of the following does principal component analysis (PCA) do? -
ANSWER Transform data so there's no correlation between dimensions.
Rank the new dimensions in likely order of importance.
Lesson 9.3
7. If you use principal component analysis (PCA) to transform your data and then
you run a regression model on it, how can you interpret the regression coefficients
in terms of the original attributes? - ANSWER Each original attribute's implied
regression coefficient is equal to a linear combination of the principal components'
regression coefficients.
Explanation: This is equivalent to using the inverse transformation. Lesson 9.4
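The linear combination can be sketched as follows. The loadings and coefficients are made-up values, and the sketch assumes the attributes were not rescaled before PCA (if they were, the scaling would also need to be undone).

```python
# Hypothetical loadings: row k holds principal component k's weights on the
# original attributes x1 and x2 (here, a rotation by 45 degrees).
s = 2 ** -0.5
components = [[s, s],
              [s, -s]]
pc_coefs = [3.0, 1.0]  # hypothetical regression coefficients on PC1 and PC2

# Implied coefficient of attribute j = sum over k of pc_coefs[k] * components[k][j].
implied = [sum(b * row[j] for b, row in zip(pc_coefs, components))
           for j in range(len(components[0]))]

# Check on one (centered) point: both routes give the same prediction.
x = [1.0, 2.0]
scores = [sum(w * xj for w, xj in zip(row, x)) for row in components]
pred_via_pcs = sum(b * z for b, z in zip(pc_coefs, scores))
pred_via_attrs = sum(a * xj for a, xj in zip(implied, x))
print(implied, pred_via_pcs, pred_via_attrs)
```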
8. When would regression be used instead of a time series model? - ANSWER
When there are other factors or predictors that affect the response.
Explanation: Regression helps show the relationships between factors and a
response. Lesson 8.1
9. If two models are approximately equally good, measures like AIC and BIC will
favor the simpler model. Simpler models are often better because... - ANSWER 1.
Simple models are easier to explain and "sell" to managers and executives
2. The effects observed in simple models are easier for everyone, including
analytics professionals, to understand
3. Simple models are less likely to be over-fit to random effects
Explanation: Simpler models are less likely to be over-fit, easier to understand, and
easier to explain. Lesson 8.2
10. Which of the following is not a common use of regression? - ANSWER
Prescriptive analytics: Determining the best course of action.
Explanation: Regression is often good for describing and predicting, but is not as
helpful for suggesting a course of action. Lesson 8.3
11. True or false: regression is a way to determine whether one thing causes
another. - ANSWER False.
Explanation: Regression can show relationships between observations, but it
doesn't show whether one thing causes another. Lesson 8.4
12. Suppose our regression model to estimate how tall a 2-year-old will be as an adult
has the following coefficients:
0.56 * FatherHeight + 0.51 * MotherHeight - 0.02 * FatherHeight * MotherHeight
13. Classification problems are commonly solved using what model(s)? -
ANSWER Support Vector Machine
14. Clustering problems are commonly solved using what model(s)? - ANSWER
k-means
15. Response Prediction questions are commonly solved using what model(s)? -
ANSWER -ARIMA
-CART
-Exponential smoothing
-linear regression
-logistic regression
-Random Forest
16. Validation questions are commonly solved using what model(s)? - ANSWER
-Cross Validation
17. Variance Estimation questions are commonly solved using what model(s)? -
ANSWER -GARCH
18. Examples of models that are designed for use with time series data - ANSWER
-ARIMA
-CUSUM
-Exponential Smoothing
-GARCH
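19. In the soft classification SVM model, we select coefficients a_0 ... a_m to
minimize sum_j(max(0, 1 - (sum_i(a_i * x_ij) + a_0) * y_j)) + C * sum_i(a_i ^ 2). If
we want to have a larger margin even though it means possibly having more
classification error, the value of C should get: - ANSWER Larger
Explanation: In this formulation C multiplies the margin term sum_i(a_i ^ 2), so a
larger C pushes the coefficients toward smaller values, which widens the margin.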
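The effect of C in this formulation can be checked numerically. The sketch below evaluates the objective for two hand-picked candidate classifiers on made-up points; it does not solve the optimization, and the function name and data are hypothetical.

```python
def soft_margin_objective(a0, a, points, labels, C):
    """Hinge-loss sum plus C times the sum of squared coefficients."""
    hinge = sum(
        max(0.0, 1.0 - (sum(ai * xi for ai, xi in zip(a, x)) + a0) * y)
        for x, y in zip(points, labels)
    )
    return hinge + C * sum(ai ** 2 for ai in a)


# Made-up data: two attributes, labels in {+1, -1}.
points = [(2.0, 0.0), (-2.0, 0.0), (0.6, 0.0)]
labels = [1, -1, 1]

narrow = (2.0, 0.0)  # large coefficients: narrow margin, no margin violations
wide = (0.5, 0.0)    # small coefficients: wide margin, one margin violation

for C in (0.1, 1.0):
    obj_narrow = soft_margin_objective(0.0, narrow, points, labels, C)
    obj_wide = soft_margin_objective(0.0, wide, points, labels, C)
    print(C, obj_narrow, obj_wide)
```

With the small C the narrow-margin candidate has the lower objective; with the larger C the wide-margin candidate does.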
20. A common rule of thumb for splitting data into training, validation, and test sets - ANSWER -70% for training
-15% for validation
-15% for test
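A minimal sketch of such a split, assuming a simple random shuffle is appropriate (it would not be for time series data). The function name `split_data`, its parameters, and the fixed seed are illustrative choices.

```python
import random


def split_data(rows, train=0.70, valid=0.15, seed=0):
    """Shuffle the rows, then cut them into training, validation, and test sets."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # fixed seed for reproducibility
    n_train = round(train * len(shuffled))
    n_valid = round(valid * len(shuffled))
    # Whatever remains after training and validation becomes the test set.
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_valid],
            shuffled[n_train + n_valid:])


train_set, valid_set, test_set = split_data(range(100))
print(len(train_set), len(valid_set), len(test_set))
```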