and Answers
Denning
Factor-based models - Correct Answer: classification, clustering, regression. They implicitly assume
that the final model contains a lot of factors.
Why limit the number of factors in a model? 2 reasons - Correct Answer:
1. Overfitting: when the # of factors is close to or larger than the # of data points, the model may
fit too closely to random effects.
2. Simplicity: simple models are usually better.
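A minimal sketch of the overfitting point, assuming NumPy and purely synthetic data (all names and sizes here are made up for illustration). The response is pure noise, yet with 19 factors plus an intercept and only 20 data points, least squares can reproduce the training data essentially perfectly.

```python
import numpy as np

rng = np.random.default_rng(0)

def r_squared(X, y, coef):
    """Fraction of the variance in y explained by the linear fit X @ coef."""
    resid = y - X @ coef
    centered = y - y.mean()
    return 1.0 - (resid @ resid) / (centered @ centered)

n_points, n_factors = 20, 19   # factors + intercept = number of data points

# Training data: random factors and a response that is pure noise
X_train = np.column_stack([np.ones(n_points), rng.normal(size=(n_points, n_factors))])
y_train = rng.normal(size=n_points)
coef, *_ = np.linalg.lstsq(X_train, y_train, rcond=None)

# Fresh data from the same pure-noise process
X_new = np.column_stack([np.ones(n_points), rng.normal(size=(n_points, n_factors))])
y_new = rng.normal(size=n_points)

r2_train = r_squared(X_train, y_train, coef)
r2_new = r_squared(X_new, y_new, coef)
print(f"R-squared on training data: {r2_train:.3f}")
print(f"R-squared on new data:      {r2_new:.3f}")
```

The near-perfect training fit comes entirely from random effects, so it should not be expected to carry over to the new data.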
Classical variable selection approaches - Correct Answer:
1. Forward selection
2. Backward elimination
3. Stepwise regression
All three are greedy algorithms.
Backward elimination - Correct Answer: variable selection; classical
Opposite of forward selection. Start with a model that has all factors; at each step, find the worst
factor and remove it from the model. Continue until none are bad enough to remove and the # of factors
threshold is satisfied. At the end, remove any remaining factors that were not good enough.
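Backward elimination can be sketched as follows. This is an illustrative sketch, not a reference implementation: it assumes NumPy, uses AIC as the "good enough" measure (one of several possible choices), and the names `aic`, `backward_elimination`, and `max_factors` are hypothetical.

```python
import numpy as np

def aic(X, y, cols):
    """AIC, up to an additive constant, of a least-squares fit on `cols` plus an intercept."""
    n = len(y)
    A = np.column_stack([np.ones(n)] + [X[:, j] for j in cols])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    rss = np.sum((y - A @ coef) ** 2)
    return n * np.log(rss / n) + 2 * A.shape[1]

def backward_elimination(X, y, max_factors=None):
    kept = list(range(X.shape[1]))   # start with every factor in the model
    best = aic(X, y, kept)
    while kept:
        # The "worst" factor is the one whose removal leaves the lowest AIC
        scores = {j: aic(X, y, [k for k in kept if k != j]) for j in kept}
        worst = min(scores, key=scores.get)
        over_limit = max_factors is not None and len(kept) > max_factors
        if scores[worst] >= best and not over_limit:
            break                    # nothing is bad enough to remove
        kept.remove(worst)
        best = scores[worst]
    return kept

# Synthetic data (made up for illustration): only factors 0 and 3 drive y
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 6))
y = 3 * X[:, 0] - 2 * X[:, 3] + rng.normal(scale=0.5, size=200)

kept = backward_elimination(X, y)
kept_two = backward_elimination(X, y, max_factors=2)
print("kept:", sorted(kept))
print("kept with max_factors=2:", sorted(kept_two))
```

Without a threshold, a noise factor can occasionally survive because removing it does not lower AIC; setting `max_factors` forces removal to continue until the limit is met.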
Forward selection - Correct Answer: variable selection; classical
Start with a model that has no factors; at each step, find the best new factor and add it. Continue
until none are good enough to add or the # of factors threshold is reached. At the end, remove any
factors that were not good enough.
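A minimal sketch of forward selection, again assuming NumPy and AIC as the criterion; the function names and the `max_factors` parameter are hypothetical and chosen only for illustration. Each pass is a greedy step: it adds the single factor that improves the score most and never looks ahead.

```python
import numpy as np

def aic(X, y, cols):
    """AIC, up to an additive constant, of a least-squares fit on `cols` plus an intercept."""
    n = len(y)
    A = np.column_stack([np.ones(n)] + [X[:, j] for j in cols])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    rss = np.sum((y - A @ coef) ** 2)
    return n * np.log(rss / n) + 2 * A.shape[1]

def forward_selection(X, y, max_factors=None):
    selected = []                              # start with no factors
    remaining = list(range(X.shape[1]))
    best = aic(X, y, selected)
    limit = X.shape[1] if max_factors is None else max_factors
    while remaining and len(selected) < limit:
        # Greedy step: try each unused factor and take the one giving the lowest AIC
        scores = {j: aic(X, y, selected + [j]) for j in remaining}
        j_best = min(scores, key=scores.get)
        if scores[j_best] >= best:
            break                              # no factor is good enough to add
        selected.append(j_best)
        remaining.remove(j_best)
        best = scores[j_best]
    return selected

# Synthetic data (made up for illustration): only factors 0 and 3 drive y
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 6))
y = 3 * X[:, 0] - 2 * X[:, 3] + rng.normal(scale=0.5, size=200)

selected = forward_selection(X, y)
print("selected, in order added:", selected)
```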
Stepwise regression - Correct Answer: variable selection; classical
Combination of forward selection and backward elimination. Start with all or no factors. At each step,
remove or add a factor. As it continues, after adding a new factor we eliminate right away any factors
that no longer appear good enough. This helps the model adjust when new factors are added and the
goodness values of existing factors change.
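One way to sketch stepwise regression, starting from no factors: each round takes a forward step and then immediately re-checks the factors already in the model. This assumes NumPy and AIC as the criterion, and all names are hypothetical. In the synthetic data, factor 2 is a noisy proxy for factors 0 and 1, so it may look good early and become redundant once both are in the model.

```python
import numpy as np

def aic(X, y, cols):
    """AIC, up to an additive constant, of a least-squares fit on `cols` plus an intercept."""
    n = len(y)
    A = np.column_stack([np.ones(n)] + [X[:, j] for j in cols])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    rss = np.sum((y - A @ coef) ** 2)
    return n * np.log(rss / n) + 2 * A.shape[1]

def stepwise(X, y):
    selected, best = [], aic(X, y, [])
    changed = True
    while changed:                 # every change strictly lowers AIC, so this terminates
        changed = False
        # Forward step: add the unused factor that lowers AIC the most, if any does
        candidates = [j for j in range(X.shape[1]) if j not in selected]
        if candidates:
            scores = {j: aic(X, y, selected + [j]) for j in candidates}
            j_add = min(scores, key=scores.get)
            if scores[j_add] < best:
                selected.append(j_add)
                best, changed = scores[j_add], True
        # Backward step: right away drop factors whose removal now lowers AIC
        while selected:
            scores = {j: aic(X, y, [k for k in selected if k != j]) for j in selected}
            j_drop = min(scores, key=scores.get)
            if scores[j_drop] >= best:
                break
            selected.remove(j_drop)
            best, changed = scores[j_drop], True
    return selected

# Synthetic data (made up for illustration)
rng = np.random.default_rng(3)
X = rng.normal(size=(200, 5))
X[:, 2] = X[:, 0] + X[:, 1] + rng.normal(size=200)   # noisy proxy for factors 0 and 1
y = X[:, 0] + X[:, 1] + rng.normal(scale=0.5, size=200)

selected = stepwise(X, y)
print("selected:", sorted(selected))
```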
Ways of determining if factors are good enough in variable selection - Correct Answer: p-value,
R-squared, AIC, BIC
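As a rough illustration of three of these measures, the sketch below computes R-squared, adjusted R-squared, AIC, and BIC for a least-squares fit, assuming NumPy and Gaussian errors. The function name `fit_metrics` is hypothetical, the AIC and BIC values drop an additive constant, and p-values are left out because they need a t-distribution from a statistics library.

```python
import numpy as np

def fit_metrics(X, y):
    """Least-squares fit with an intercept; returns R-squared, adjusted R-squared, AIC and BIC."""
    n, p = X.shape
    A = np.column_stack([np.ones(n), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    rss = np.sum((y - A @ coef) ** 2)          # residual sum of squares
    tss = np.sum((y - y.mean()) ** 2)          # total sum of squares
    k = p + 1                                  # estimated coefficients, including the intercept
    r2 = 1 - rss / tss
    return {
        "r2": r2,
        "adj_r2": 1 - (1 - r2) * (n - 1) / (n - p - 1),
        "aic": n * np.log(rss / n) + 2 * k,          # penalty of 2 per coefficient
        "bic": n * np.log(rss / n) + k * np.log(n),  # penalty of ln(n) per coefficient
    }

# Synthetic data (made up for illustration): only factors 0 and 3 drive y
rng = np.random.default_rng(2)
X = rng.normal(size=(200, 6))
y = 3 * X[:, 0] - 2 * X[:, 3] + rng.normal(scale=0.5, size=200)

small = fit_metrics(X[:, [0, 3]], y)   # only the two factors that generate y
full = fit_metrics(X, y)               # all six factors, four of them pure noise
for name in ("r2", "adj_r2", "aic", "bic"):
    print(f"{name:7s} small={small[name]:.4f}  full={full[name]:.4f}")
```

Plain R-squared never decreases when a factor is added, which is why adjusted R-squared, AIC, and BIC each include a penalty for the number of factors. Lower AIC and BIC are better; higher R-squared is better.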
Greedy algorithm - Correct Answer: At each step, it does the one thing that looks best
without taking future options into consideration. Good for initial analysis.