MAS II Questions and Answers
1. Explain two benefits of using a tree-based method (Statistical Learning, Trees): Tree-based methods are
-simple
-easy to interpret
2. Explain one disadvantage of using a tree-based method (Statistical Learning, Trees): Tree-based methods are
typically not competitive with the best supervised learning approaches in terms of prediction accuracy
3. Explain how one could increase prediction accuracy when modeling with trees (Statistical Learning, Trees):
Combining a large number of trees can result in dramatic improvements in prediction accuracy, at the expense of some
loss in interpretability.
4. Explain what kinds of problems decision trees can be applied to (Statistical Learning, Trees): Decision trees
can be applied to both regression and classification problems
5. Recursive Binary Splitting (Statistical Learning, Trees): -Begin at the top of the tree, where all
observations belong to a single region, then successively split the predictor space. At each step, choose the predictor
and cutpoint whose split gives the smallest RSS (or the smallest Gini index for a classification tree) among all
candidate splits.
-Greedy because at each step of the tree-building process, the best split is made at that particular step, rather than looking
ahead and picking a split that will lead to a better tree in some future step.
-Left unchecked, it leads to overfitting because the resulting tree is too complex (has too many splits)
-The best splits group observations with similar responses together.
-A tree with no splits has the highest training RSS. Yhat is the mean of the entire training data (or the majority class
for classification).
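The search for one greedy split can be sketched as follows. This is a minimal illustration for a single numeric predictor, not a full tree-growing routine; the function names `rss` and `best_split`, the choice of midpoints as candidate cutpoints, and the toy data are assumptions made for the example.

```python
def rss(values):
    """Residual sum of squares around the mean of `values` (0 for an empty list)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(x, y):
    """Return (cutpoint, total_rss) for the single split x < s that minimizes RSS.

    Candidate cutpoints are midpoints between consecutive distinct x values.
    Returns (None, rss(y)) when no split lowers the RSS.
    """
    pairs = sorted(zip(x, y))
    best_s, best_total = None, rss(list(y))  # the no-split tree is the baseline
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no cutpoint between identical x values
        s = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [yy for xx, yy in pairs if xx < s]
        right = [yy for xx, yy in pairs if xx >= s]
        total = rss(left) + rss(right)
        if total < best_total:
            best_s, best_total = s, total
    return best_s, best_total


# Toy data (made up): two clusters of similar responses.
x = [1, 2, 3, 10, 11, 12]
y = [5.0, 6.0, 5.5, 20.0, 21.0, 19.5]
cut, total = best_split(x, y)
print(cut, total, rss(y))
```

On this toy data the chosen cutpoint falls between the two clusters, and the RSS after the split is far below the no-split RSS. A full tree would repeat the same search inside each resulting region and across all predictors.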
6. Regression Tree (Statistical Learning, Trees): -Used to predict a quantitative response
-The predicted response is the mean response of the training observations in the same region
-Stopping rule: restrict terminal nodes to a minimum number of observations (e.g., at least 5)
7. Classification Tree (Statistical Learning, Trees): -Used to predict a qualitative response (e.g., whether a person
with certain characteristics will have good or bad health)
-The predicted class of an observation is the most commonly occurring class of training observations in its region
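8. Explain when it is appropriate to use Gini Index, Entropy, or Classification Error: -Growing the tree (to get T0):
Gini index or entropy, because they are more sensitive to node purity than the classification error rate
-Pruning: any of Gini index, entropy, or classification error can be used
-Pruning when prediction accuracy of the final pruned tree is the goal: classification error rate is preferable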
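A small numeric sketch of why the classification error rate is a weak criterion for growing a tree. The node class counts below are illustrative (a two-class problem with 400 observations per class), and the helper names are assumptions for this example. Both candidate splits have the same weighted classification error, yet the Gini index and entropy both prefer the split that produces a pure node.

```python
import math


def class_proportions(counts):
    """Convert per-class counts in a node to class proportions."""
    total = sum(counts)
    return [c / total for c in counts]


def gini(counts):
    """Gini index: sum over classes of p * (1 - p)."""
    return sum(p * (1 - p) for p in class_proportions(counts))


def entropy(counts):
    """Entropy: -sum over classes of p * log(p), skipping empty classes."""
    return -sum(p * math.log(p) for p in class_proportions(counts) if p > 0)


def classification_error(counts):
    """Share of observations not in the node's most common class."""
    return 1 - max(class_proportions(counts))


def weighted(measure, nodes):
    """Size-weighted average of an impurity measure over the child nodes of a split."""
    n = sum(sum(node) for node in nodes)
    return sum(sum(node) / n * measure(node) for node in nodes)


# Each tuple is (class 1 count, class 2 count) in one child node.
split_a = [(300, 100), (100, 300)]
split_b = [(200, 400), (200, 0)]  # second node is pure

for name, measure in [("error", classification_error), ("gini", gini), ("entropy", entropy)]:
    print(name, round(weighted(measure, split_a), 4), round(weighted(measure, split_b), 4))
```

Classification error cannot tell the two splits apart, while Gini and entropy are both lower for `split_b`, which is the behavior wanted when growing the tree.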
9. For a classification tree, why is node purity important when growing a tree?: The purer a node is, the more
confident we are in our classification of a test observation that falls into it.
Translation: If the node is primarily one class, then we will feel confident that the new observation belongs to that class.
10. Explain when a tree may outperform a linear model (Statistical Learning, Trees): If there is a highly
non-linear, complex relationship between the features and the response, then decision trees may outperform classical
approaches.
11. List Pros and Cons of Using Trees (Statistical Learning, Trees): Pros:
-Trees are very easy to explain
-Trees are more intuitive because they more closely mirror human decision making
-Trees can be displayed graphically
-Trees can easily handle qualitative predictors without the need to create dummy variables
Cons:
-Trees do not have the same level of predictive accuracy as some of the other regression and classification approaches
-Trees are non-robust. A small change in the data can cause a large change in the final estimated tree.
12. Explain how you can improve the predictive accuracy of a tree (Statistical Learning, Trees): By aggregating
many decision trees, using methods like bagging, random forests, and boosting, the predictive performance of trees can
be substantially improved
13. Explain why using a single decision tree would be a bad idea (Statistical Learning, Trees): -A single tree, even
after being pruned, tends to have high variance.
-High variance means the fitted tree changes a lot from one training set to another, so its predictions on new data are often poor.
-To get better prediction accuracy, we need to reduce the variance. Averaging does this: the mean of n independent
observations, each with variance sigma^2, has variance sigma^2/n.
14. Bagging (Statistical Learning, Trees): -Create B bootstrapped training sets (each drawn with replacement from the
original data)
-Fit one tree to each set, then average the B predictions to get the final prediction if regression is the goal. Take a
majority vote if classification is the goal.
-The B trees are grown deep and not pruned, so each one has low bias but high variance (it overfits its own training set);
averaging reduces the variance.
-The B trees need not have the same number of nodes
-The number of trees B is not critical: a very large value of B will not lead to overfitting.
-But B should be sufficiently large for the error to settle down. When it is, the Out-of-Bag error (OOB error) is virtually
equivalent to leave-one-out cross-validation (LOOCV) error.
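The bagging steps above can be sketched for regression as follows. To keep the example short, the base learner is a one-split tree (a stump) on a single predictor rather than a deep unpruned tree; that simplification, the function names `fit_stump` and `bagged_predict`, the fixed seed, and the toy data are all assumptions for illustration.

```python
import random


def fit_stump(x, y):
    """Fit a one-split regression tree; return a function mapping x0 to a prediction."""
    overall = sum(y) / len(y)
    best = (float("inf"), None, overall, overall)  # (rss, cutpoint, left mean, right mean)
    for s in sorted(set(x))[1:]:  # skipping the smallest value keeps both sides non-empty
        left = [yi for xi, yi in zip(x, y) if xi < s]
        right = [yi for xi, yi in zip(x, y) if xi >= s]
        ml, mr = sum(left) / len(left), sum(right) / len(right)
        err = sum((v - ml) ** 2 for v in left) + sum((v - mr) ** 2 for v in right)
        if err < best[0]:
            best = (err, s, ml, mr)
    _, s, ml, mr = best
    if s is None:  # all x values identical: predict the overall mean
        return lambda x0: overall
    return lambda x0: ml if x0 < s else mr


def bagged_predict(x, y, x0, B=100, seed=0):
    """Average the predictions at x0 of B stumps, each fit to a bootstrap sample."""
    rng = random.Random(seed)
    n = len(x)
    preds = []
    for _ in range(B):
        idx = [rng.randrange(n) for _ in range(n)]  # n draws with replacement
        tree = fit_stump([x[i] for i in idx], [y[i] for i in idx])
        preds.append(tree(x0))
    return sum(preds) / B  # for classification, take a majority vote instead


# Toy data (made up).
x = [1, 2, 3, 10, 11, 12]
y = [5.0, 6.0, 5.5, 20.0, 21.0, 19.5]
low, high = bagged_predict(x, y, 2), bagged_predict(x, y, 11)
print(low, high)
```

Increasing `B` only adds more trees to the average, which is why a large `B` costs computation but does not cause overfitting.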
15. Out-of-Bag error (Statistical Learning, Trees): -A way to estimate the test error of a bagged model without
cross-validation or a separate validation set.
-Each bootstrap sample uses about 2/3 of the observations on average, so for every observation about 1/3 of the bagged
trees (roughly B/3) were built without it. For those trees, the observation is "out of bag."
-Test error requires "new data," and to those B/3 trees the observation is new. Predict its response by averaging the
predictions of those trees (if regression is the goal) or by taking their majority vote (if classification is the goal).
Doing this for all n observations gives one OOB prediction each, from which the overall OOB MSE (regression) or OOB
classification error is computed.
-The OOB approach for estimating test error is particularly convenient when performing bagging on large data sets, where
cross-validation such as LOOCV would take a long time.
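The "about 1/3" figure comes from the chance that a given observation is never drawn in n draws with replacement, which is (1 - 1/n)^n and approaches e^(-1), roughly 0.368, as n grows. A short check, with function names and simulation settings chosen only for this example:

```python
import math
import random


def oob_probability(n):
    """Chance that a given observation is absent from one bootstrap sample of size n."""
    return (1 - 1 / n) ** n


def simulated_oob_fraction(n, B=200, seed=0):
    """Share of (observation, tree) pairs that are out of bag across B bootstrap samples."""
    rng = random.Random(seed)
    oob = 0
    for _ in range(B):
        in_bag = {rng.randrange(n) for _ in range(n)}  # distinct rows drawn
        oob += n - len(in_bag)
    return oob / (n * B)


print(oob_probability(10), oob_probability(1000), math.exp(-1))
print(simulated_oob_fraction(500))
```

The exact probability and the simulated fraction both sit close to 0.37, so each observation has roughly B/3 trees available for its OOB prediction.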
16. List Pros and Cons for Bagging (Statistical Learning, Trees): Pro:
-Increases predictive accuracy
-Can obtain a summary of the importance of each predictor: the total decrease in RSS (if regression) or Gini index (if
classification) from splits on that predictor, averaged over all B trees
Con:
-Loses interpretability
-No picture
-Not as big a decrease in variance as with a random forest, because the bagged trees can be highly correlated (i.e., they all look similar)
17. Forests (Statistical Learning, Trees): -Same idea as bagging, except it decorrelates the trees: at each split, only a
random subset of m of the p predictors is considered as split candidates, and a fresh subset is drawn at every split.
In bagging, if one predictor is very strong, most trees use it in the top split and end up looking alike. With a random
forest, on average (p-m)/p of the splits do not even consider the strong predictor, so other predictors have more of a
chance.
-You get more diverse trees, a larger reduction in variance, and typically a lower test error and OOB error than with
bagging, because averaging many highly correlated quantities does not reduce variance as much as averaging many
uncorrelated quantities.
-A typical choice is m = square root of p (the number of predictors).
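The effect of correlation on averaging can be made concrete with a standard idealization: if B quantities each have variance sigma^2 and every pair has correlation rho, the variance of their average is rho * sigma^2 + (1 - rho) * sigma^2 / B. Real bagged trees do not satisfy these assumptions exactly, so the sketch below (with a hypothetical function name and made-up numbers) only illustrates the direction of the effect.

```python
def variance_of_average(sigma2, rho, B):
    """Variance of the mean of B identically distributed quantities,
    each with variance sigma2 and common pairwise correlation rho."""
    return rho * sigma2 + (1 - rho) * sigma2 / B


sigma2, B = 4.0, 100
for rho in (0.0, 0.5, 0.9):
    print(rho, variance_of_average(sigma2, rho, B))
```

With rho = 0 the variance shrinks by the full factor of B; as rho grows, the first term puts a floor of rho * sigma^2 under the variance no matter how many trees are averaged, which is the motivation for decorrelating the trees.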
18. List Pros and Cons for Forests (Statistical Learning, Trees): Pros:
-Greater reduction in variance than bagging, especially when there are many correlated predictors
-Typically lower test error and OOB error than bagging
Cons:
-Still abstract, no picture
19. Explain how Forests are different from Bagged Trees (Statistical Learning, Trees): Same thing as bagging,
except the trees are decorrelated.
-Note that if the subset of predictors in a forest is equal to the total number of predictors (m=p), then the forest is
equivalent to the bagged trees.
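The only change a forest makes to bagging is the per-split draw of candidate predictors, which can be sketched as below. The function names, the choice of p = 9, and the seed are assumptions for this example.

```python
import math
import random


def candidate_predictors(p, m, rng):
    """Pick the m predictor indices (out of p) that one split is allowed to consider."""
    return sorted(rng.sample(range(p), m))


def share_of_splits_skipping(p, m):
    """Expected share of splits whose candidate set leaves out one given predictor."""
    return (p - m) / p


p = 9
m = round(math.sqrt(p))  # the usual default, m = sqrt(p)
rng = random.Random(0)

forest_candidates = candidate_predictors(p, m, rng)  # a fresh draw is made at every split
bagging_candidates = candidate_predictors(p, p, rng)  # m = p: every predictor, i.e. bagging
print(forest_candidates, bagging_candidates, share_of_splits_skipping(p, m))
```

With m = p the candidate set is always the full list of predictors, so the procedure reduces to bagging; with m = 3 of 9 predictors, two thirds of the splits never see any particular strong predictor.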
20. Boosting (Statistical Learning, Trees): -Trees are grown sequentially: each tree is grown using information
from previously grown trees. Boosting does not involve bootstrap sampling; instead each tree is fit on a modified
version of the original data set.
-Fit a small tree using the current residuals, rather than the outcome Y, as the response. Update yhat by adding in a
shrunken version of the new tree. Update the residuals using the updated yhat. Then fit a new small tree to the updated
residuals. Repeat.
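The residual-fitting loop described above can be sketched for regression as follows. The small trees here are one-split stumps on a single predictor, and the function names, the shrinkage value `lam = 0.1`, the number of trees, and the toy data are assumptions for illustration rather than recommended settings.

```python
def fit_stump(x, r):
    """Fit a one-split regression tree to responses r; return a prediction function."""
    overall = sum(r) / len(r)
    best = (float("inf"), None, overall, overall)  # (rss, cutpoint, left mean, right mean)
    for s in sorted(set(x))[1:]:  # skipping the smallest value keeps both sides non-empty
        left = [ri for xi, ri in zip(x, r) if xi < s]
        right = [ri for xi, ri in zip(x, r) if xi >= s]
        ml, mr = sum(left) / len(left), sum(right) / len(right)
        err = sum((v - ml) ** 2 for v in left) + sum((v - mr) ** 2 for v in right)
        if err < best[0]:
            best = (err, s, ml, mr)
    _, s, ml, mr = best
    if s is None:
        return lambda x0: overall
    return lambda x0: ml if x0 < s else mr


def boost(x, y, B=50, lam=0.1):
    """Boosted regression stumps; returns (list of trees, fitted values on the training data)."""
    fhat = [0.0] * len(x)  # start from yhat = 0
    resid = list(y)        # so the first residuals equal y
    trees = []
    for _ in range(B):
        tree = fit_stump(x, resid)                              # fit to residuals, not to y
        trees.append(tree)
        fhat = [f + lam * tree(xi) for f, xi in zip(fhat, x)]   # add a shrunken copy of the tree
        resid = [yi - f for yi, f in zip(y, fhat)]              # update the residuals
    return trees, fhat


# Toy data (made up).
x = [1, 2, 3, 10, 11, 12]
y = [5.0, 6.0, 5.5, 20.0, 21.0, 19.5]


def training_mse(B):
    """Training MSE after B boosting iterations."""
    _, fhat = boost(x, y, B=B)
    return sum((yi - f) ** 2 for yi, f in zip(y, fhat)) / len(y)


print(training_mse(0), training_mse(5), training_mse(50))
```

A prediction for a new point is the sum of `lam * tree(x0)` over all trees. The training error falls as trees are added; unlike bagging, the number of trees in boosting is usually chosen with care (for example by cross-validation), since a very large number of trees can overfit.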