BADM 211 Final Exam Random Forests
1. Understand what the out-of-bag score indicates - answer:
-out-of-bag (OOB) observations are the observations not used to fit a given bagged tree. Bagging fits each tree on a bootstrap sample of the observations (and does this many times), so every tree has some observations left over.
-each observation is predicted using only the trees that did not see it, which gives an estimate of the test error
-(note: bagging is just bootstrapping multiple times and aggregating the resulting models)
-the OOB score is used to estimate the test error of the bagged model without a separate validation set
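A quick sketch of why there are always leftover observations. It draws one bootstrap sample and counts how many observations were never picked. The function name `oob_fraction` and the sample size are made up for illustration. For large n the share left out is about (1 - 1/n)^n, which is roughly 0.37.

```python
import random

def oob_fraction(n_obs, seed=0):
    """Draw one bootstrap sample of size n_obs (with replacement) and
    return the share of observations that were never drawn."""
    rng = random.Random(seed)
    # Indices that ended up "in the bag" for this one tree.
    in_bag = {rng.randrange(n_obs) for _ in range(n_obs)}
    return 1 - len(in_bag) / n_obs

frac = oob_fraction(10_000)
print(f"out-of-bag share for one tree: {frac:.3f}")
```

So each tree has roughly a third of the data available as a built-in test set.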
2. Know the differences between Decision Trees and Random Forests - answer:
decision trees:
-create a single tree
-easy to interpret
-but they're unstable (small changes in the data can produce a very different tree)
-but they tend to have poor predictive performance compared with ensembles
-but they have no built-in estimate of test error (must be validated separately, e.g. with cross-validation)
random forests:
-average a bunch of trees (e.g. when predicting a numerical value, the forest takes the average of the values predicted by the individual trees)
-this is not cross-validation, but the OOB observations give a built-in estimate of test error that plays a similar role
-is an ensemble method
-usually predicts more accurately than a single tree
-multiple trees are fit, one per bootstrap sample, and their predictions/classifications are tabulated
-reduces variance in predictions
-errors of individual trees tend to cancel out
-but you lose the interpretability and the rules embodied in a single tree
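A small simulation of the "reduces variance" point. This is only a sketch under a strong assumption: each "tree" is a hypothetical predictor that returns the true value plus independent random noise. Real trees fit on overlapping bootstrap samples are correlated, so the reduction in practice is smaller than what this shows.

```python
import random
import statistics

rng = random.Random(1)
TRUE_VALUE = 10.0  # hypothetical quantity being predicted

def noisy_prediction():
    """One unstable 'tree': the truth plus independent noise (assumption)."""
    return TRUE_VALUE + rng.gauss(0, 2.0)

def averaged_prediction(n_models=25):
    """An 'ensemble': the average of n_models noisy predictions."""
    return statistics.mean([noisy_prediction() for _ in range(n_models)])

single = [noisy_prediction() for _ in range(2000)]
averaged = [averaged_prediction() for _ in range(2000)]

var_single = statistics.pvariance(single)
var_avg = statistics.pvariance(averaged)
print(f"variance of one predictor:  {var_single:.2f}")
print(f"variance of the average:    {var_avg:.2f}")
```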
3. Know the differences between bagging and Random Forests - answer:
bagging (bootstrap aggregating):
-multiplier effect comes from multiple bootstrap samples, rather than multiple methods
(bootstrapping is taking resamples, with replacement, from the original data)
1. generate multiple bootstrap samples
2. run the algorithm on each and produce scores
3. average those scores (regression) or take a majority vote (classification)
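The three steps above can be sketched in plain Python. Assumptions for illustration only: the "algorithm" is a one-split regression tree (a stump) on a single predictor, the data are simulated, and the names `fit_stump`, `bagging_fit`, and `bagged_predict` are made up. Only the regression case (averaging) is shown.

```python
import random
import statistics

def fit_stump(points):
    """Fit a one-split regression tree: choose the threshold on x that
    minimises squared error, and predict the mean of y on each side."""
    xs = sorted({x for x, _ in points})
    best = None
    for lo, hi in zip(xs, xs[1:]):
        t = (lo + hi) / 2
        left = [y for x, y in points if x <= t]
        right = [y for x, y in points if x > t]
        ml, mr = statistics.mean(left), statistics.mean(right)
        sse = sum((y - ml) ** 2 for y in left) + sum((y - mr) ** 2 for y in right)
        if best is None or sse < best[0]:
            best = (sse, t, ml, mr)
    if best is None:  # only one distinct x in the sample: predict the mean
        m = statistics.mean([y for _, y in points])
        return lambda x: m
    _, t, ml, mr = best
    return lambda x: ml if x <= t else mr

def bagging_fit(points, n_trees=25, seed=0):
    rng = random.Random(seed)
    models = []
    for _ in range(n_trees):
        sample = rng.choices(points, k=len(points))  # step 1: bootstrap sample
        models.append(fit_stump(sample))             # step 2: run the algorithm
    return models

def bagged_predict(models, x_new):
    # step 3: average the scores (regression)
    return statistics.mean([m(x_new) for m in models])

# Simulated data: y jumps from about 1 to about 5 at x = 0.5.
rng = random.Random(42)
data = []
for _ in range(100):
    x = rng.random()
    y = (5.0 if x > 0.5 else 1.0) + rng.gauss(0, 0.5)
    data.append((x, y))

models = bagging_fit(data)
low = bagged_predict(models, 0.2)
high = bagged_predict(models, 0.8)
print(f"prediction at x=0.2: {low:.2f}, at x=0.8: {high:.2f}")
```

For classification, step 3 would take the most common predicted class instead of the mean.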
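random forests:
-improve on bagging by decorrelating the trees: at each split, only a random subset of the predictors is considered, so the trees differ more from one another and averaging them reduces variance further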
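A sketch of the one step that separates a random forest from bagging: which predictors a split is allowed to look at. The predictor names and the helper `candidate_features` are hypothetical. Using about sqrt(p) predictors per split is a common default for classification, treated here as an assumption.

```python
import math
import random

def candidate_features(feature_names, rng, m=None):
    """Return the predictors one split may consider.
    m = number of predictors to sample; defaults to about sqrt(p)."""
    if m is None:
        m = max(1, round(math.sqrt(len(feature_names))))
    return rng.sample(feature_names, m)

# Hypothetical predictors (p = 9).
features = ["age", "income", "tenure", "region", "visits",
            "spend", "returns", "clicks", "rating"]
rng = random.Random(0)

# Bagging: every split sees all p predictors.
bagging_split = candidate_features(features, rng, m=len(features))

# Random forest: each split sees a fresh random subset.
forest_splits = [candidate_features(features, rng) for _ in range(3)]

print("bagging split considers:", len(bagging_split), "predictors")
for i, subset in enumerate(forest_splits, start=1):
    print(f"forest split {i} considers:", subset)
```

Because a strong predictor is sometimes left out of the subset, different trees end up splitting on different variables, which is what makes them less correlated.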