DATA MINING AND STAT LEARN ACTUAL TEST
SCRIPT 2026 VERIFIED SOLUTIONS
◉ Forward Selection. Answer: Start with a model that has no factors.
At each step we find the best new factor to add to the model
and put it in as long as it's a good enough improvement. When
there's no factor that's good enough to add, or if we've added as
many factors as we want to have, we stop.
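The loop described above can be sketched in plain Python. This is a minimal illustration, not a reference implementation: the helper `sse_of_fit`, the `min_r2_gain` threshold (the share of the intercept-only SSE a new factor must remove to count as "good enough"), and the toy data are all assumptions made for the example. Other criteria such as p-values or AIC are common choices for "good enough".

```python
def sse_of_fit(columns, y):
    """SSE of a least-squares fit of y on an intercept plus `columns`.

    Uses Gram-Schmidt: each column is made orthogonal to the ones before it,
    then its contribution is subtracted from the residual.
    """
    n = len(y)
    basis, resid = [], list(y)
    for col in [[1.0] * n] + list(columns):
        v = list(col)
        for b in basis:
            coef = sum(vi * bi for vi, bi in zip(v, b)) / sum(bi * bi for bi in b)
            v = [vi - coef * bi for vi, bi in zip(v, b)]
        norm2 = sum(vi * vi for vi in v)
        if norm2 < 1e-12:  # column adds nothing new (collinear), skip it
            continue
        basis.append(v)
        coef = sum(ri * vi for ri, vi in zip(resid, v)) / norm2
        resid = [ri - coef * vi for ri, vi in zip(resid, v)]
    return sum(r * r for r in resid)


def forward_selection(X, y, min_r2_gain=0.01, max_factors=None):
    """X maps factor name -> list of values. Returns chosen names in order."""
    total = sse_of_fit([], y)  # SSE of the model with no factors
    if total == 0:
        return []  # y is constant, nothing to explain
    chosen, current = [], total
    remaining = sorted(X)
    while remaining and (max_factors is None or len(chosen) < max_factors):
        # Best new factor = the one that gives the lowest SSE when added.
        scores = {f: sse_of_fit([X[g] for g in chosen + [f]], y) for f in remaining}
        best = min(scores, key=scores.get)
        if (current - scores[best]) / total < min_r2_gain:
            break  # no factor is a good enough improvement
        chosen.append(best)
        remaining.remove(best)
        current = scores[best]
    return chosen


# Toy data (hypothetical): y is roughly 3*x1 + 1, x2 and x3 are unrelated.
X = {
    "x1": [1, 2, 3, 4, 5, 6],
    "x2": [2, 7, 1, 8, 2, 8],
    "x3": [3, 1, 4, 1, 5, 9],
}
y = [4.1, 6.9, 9.9, 13.1, 16.1, 18.9]
selected = forward_selection(X, y)
print(selected)
```

On this toy data only `x1` clears the threshold, so the search stops after one step. Setting `max_factors` caps the model size regardless of how much later factors would help.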
◉ Backward Elimination. Answer: We start with a model that
includes all factors and at each step, we find the worst factor and
remove it from the model. We continue until there's no factor bad
enough to remove, and the model doesn't have any more factors
than we want.
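A sketch of the same idea in Python, assuming the model quality is available through a user-supplied function `error_of(factors)` (for instance cross-validated SSE or AIC; lower is better). The function name, the `tolerance` parameter, and the toy error table are hypothetical and only serve the illustration.

```python
def backward_elimination(factors, error_of, tolerance=0.01, max_factors=None):
    """Start with all factors; repeatedly drop the one that matters least."""
    kept = list(factors)
    current = error_of(kept)
    while kept:
        # Worst factor = the one whose removal leaves the lowest error.
        worst = min(kept, key=lambda f: error_of([g for g in kept if g != f]))
        new = error_of([g for g in kept if g != worst])
        too_many = max_factors is not None and len(kept) > max_factors
        # Stop when removal hurts too much AND the model is small enough.
        if new - current > tolerance and not too_many:
            break
        kept.remove(worst)
        current = new
    return kept


# Hypothetical toy model: each factor removes a fixed amount of error.
contribution = {"price": 6.0, "ads": 3.0, "weather": 0.005, "day": 0.0}

def error_of(factors):
    return 10.0 - sum(contribution[f] for f in factors)

kept = backward_elimination(contribution, error_of)
print(kept)
```

Here `day` and `weather` are removed because dropping them barely changes the error, while removing `ads` or `price` would cost far more than the tolerance. Passing `max_factors=1` forces removal to continue until one factor is left.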
◉ Stepwise Regression. Answer: Comes in several forms, but is
essentially a combination of forward selection and backward
elimination. Because each step looks only at the best current
option and doesn't take future possibilities into account, these
methods are known as greedy algorithms.
◉ Lasso Approach. Answer: We add a constraint to the standard
regression equation. The goal is still to minimize SSE, but the
regression is given a budget t to use on coefficients: the sum of
their absolute values cannot exceed t. It'll use that budget on the
most important coefficients, which means the less important factors
will have zero coefficients and so won't be part of the model.
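Why a budget on absolute values drives some coefficients to exactly zero can be seen in a special case. The sketch below assumes scaled, uncorrelated (orthonormal) factors and the equivalent penalized form of lasso, where a penalty `lam` plays the role of the budget t (a larger penalty corresponds to a smaller budget). Under those assumptions each lasso coefficient is the ordinary least-squares coefficient shrunk toward zero by `lam`. The coefficient values are hypothetical.

```python
def soft_threshold(b, lam):
    """Shrink b toward zero by lam; anything within lam of zero becomes 0."""
    if b > lam:
        return b - lam
    if b < -lam:
        return b + lam
    return 0.0


# Hypothetical least-squares coefficients for four scaled factors.
ols = {"x1": 4.0, "x2": -2.5, "x3": 0.3, "x4": -0.1}
lam = 0.5  # assumed penalty strength

lasso = {name: soft_threshold(b, lam) for name, b in ols.items()}
budget_used = sum(abs(b) for b in lasso.values())

print(lasso)        # x3 and x4 drop out of the model
print(budget_used)  # the budget t this penalty corresponds to
```

The two weakest factors receive a coefficient of exactly zero, while the strong ones keep most of their size. With correlated factors no such closed form exists and an iterative solver is needed.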
◉ Elastic Net. Answer: Elastic net is effectively a combination of
LASSO and ridge regression that trades some bias in order to
reduce variance and ultimately reduce total prediction error.
It constrains a combination of the absolute values of the
coefficients and their squares.
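Written out, the three constraints differ only in what the budget t is spent on. The notation here is an assumption for illustration: a_0 is the intercept, a_j the coefficient of factor j, x_ij the scaled value of factor j for data point i, and λ between 0 and 1 the mixing weight.

```latex
\min_{a_0,\dots,a_m} \; \sum_{i=1}^{n} \Bigl( y_i - a_0 - \sum_{j=1}^{m} a_j x_{ij} \Bigr)^2
\quad \text{subject to} \quad
\begin{cases}
\sum_{j=1}^{m} |a_j| \le t & \text{(LASSO)} \\[4pt]
\sum_{j=1}^{m} a_j^2 \le t & \text{(ridge)} \\[4pt]
\lambda \sum_{j=1}^{m} |a_j| + (1-\lambda) \sum_{j=1}^{m} a_j^2 \le t & \text{(elastic net)}
\end{cases}
```

Setting λ = 1 recovers LASSO and λ = 0 recovers ridge regression.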
◉ Elastic Net Pros/Cons. Answer:
Pros: Variable selection benefits of LASSO
Predictive benefits of ridge regression
Cons: Arbitrarily rules out some correlated variables (as LASSO does)
Underestimates coefficients of very predictive variables (as ridge
regression does)
◉ A/B Testing. Answer: Analytic method used to choose between
alternatives, typically two (A and B). Best used when data can be
collected quickly, from a representative population, and the amount
of data needed is small relative to the whole population.
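One common way to decide between A and B once the data is in is a two-proportion z-test. The sketch below assumes the outcome is a yes/no conversion, and the counts are hypothetical. It is one possible analysis, not the only valid one.

```python
from math import sqrt, erfc


def two_proportion_z_test(success_a, n_a, success_b, n_b):
    """Return (z, two-sided p-value) for the difference in conversion rates."""
    p_a, p_b = success_a / n_a, success_b / n_b
    pooled = (success_a + success_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided tail probability of a standard normal beyond |z|.
    p_value = erfc(abs(z) / sqrt(2))
    return z, p_value


# Hypothetical results: version A converts 200 of 2000, version B 260 of 2000.
z, p_value = two_proportion_z_test(200, 2000, 260, 2000)
print(f"z = {z:.2f}, p = {p_value:.4f}")
if p_value < 0.05:
    print("Difference is unlikely to be chance; prefer the better version.")
```

A small p-value suggests the difference in conversion rates is unlikely to be due to chance alone. The 0.05 cutoff is a convention, not a rule.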
◉ Factorial Design Tests. Answer: Design of experiment method
used to test combinations of several factors. E.g., 2 fonts x 2
wordings x 2 backgrounds gives 8 combinations to compare for efficacy.
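The combinations in a full factorial design can be enumerated directly. The factor levels below are hypothetical placeholders for the fonts, wordings, and backgrounds in the example.

```python
from itertools import product

# Hypothetical levels for each of the three factors.
fonts = ["serif", "sans-serif"]
wordings = ["Buy now", "Learn more"]
backgrounds = ["white", "blue"]

# Full factorial design: every level of every factor paired with every other.
full_factorial = list(product(fonts, wordings, backgrounds))

for font, wording, background in full_factorial:
    print(font, "|", wording, "|", background)
print(len(full_factorial), "combinations")
```

Each tuple is one version to test. The count grows multiplicatively with the number of factors and levels, which is why large designs often test only a subset of combinations.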