When might overfitting occur?
- when the # of factors is close to or larger than the # of data points causing the model to
potentially fit too closely to random effects
Why are simple models better than complex ones?
- less data is required; less chance of insignificant factors and easier to interpret
What is forward selection?
- we select the best new factor and see if it's good enough (R^2, AIC, or p-value) add it to our
model and fit the model with the current set of factors. Then at the end we remove factors
that are lower than a certain threshold
What is backward elimination?
,- we start with all factors and find the worst on a supplied threshold (p = 0.15). If it is worse
we remove it and start the process over. We do that until we have the number of factors that
we want and then we move the factors lower than a second threshold (p = .05) and fit the
model with all set of factors
What is stepwise regression?
- it is a combination of forward selection and backward elimination. We can either start with
all factors or no factors and at each step we remove or add a factor. As we go through the
procedure after adding each new factor and at the end we eliminate right away factors that
no longer appear.
What type of algorithms are stepwise selection?
- Greedy algorithms - at each step they take one thing that looks best
, What is LASSO?
- a variable selection method where the coefficients are determined by both minimizing the
squared error and the sum of their absolute value not being over a certain threshold
How do you choose t in LASSO?
- use the lasso approach with different values of t and see which gives the best trade off
Why do we have to scale the data for LASSO?
- if we don't the measure of the data will artificially affect how big the coefficients need to be
What is elastic net?
- A variable selection method that works by minimizing the squared error and constraining
the combination of absolute values of coefficients and their squares
What is a key difference between stepwise regression and lasso regression?
- If the data is not scaled, the coefficients can have artificially different orders of magnitude,
which means they'll have unbalanced effects on the lasso constraint.
Why doesn't Ridge Regression perform variable selection?
- The coefficients values are squared so they go closer to zero or regularizes them
What are the pros and cons of Greedy Algorithms (Forward selection, stepwise elimination,
stepwise regression)