Regression Analysis - PSTAT 126
Dr. Xiyue Liao
Department of Statistics and Applied Probability
University of California, Santa Barbara
March 6, 2018
Review of the Previous Class
In the previous lecture, we learned how to select variables and build a “correctly specified”
model. There are four possible outcomes of building a model:
1 A regression model is correctly specified (outcome 1) if the regression equation contains
all of the relevant predictors, including any necessary transformations and interaction terms.
This is the best outcome: unbiased regression coefficients and unbiased predictions of the
response.
2 A regression model is underspecified (outcome 2) if the regression equation is missing
one or more important predictor variables.
This is the worst outcome: biased regression coefficients and a biased MSE.
3 A regression model contains one or more extraneous variables (outcome 3).
Extraneous variables are neither related to the response nor to any of the other predictors.
Good news: unbiased regression coefficients and an unbiased MSE. Bad news: confidence
intervals tend to be wider and our hypothesis tests tend to have lower power.
4 A regression model is over-specified (outcome 4) if the regression equation contains
one or more redundant predictor variables.
Good news: unbiased regression coefficients and an unbiased MSE. Bad news: the model
is more complicated and harder to understand than necessary.
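To see the cost of an underspecified model concretely, here is a small simulation sketch (all numbers and variable names are made up for illustration). The hypothetical true model is y = 1 + 2x1 + 3x2 + noise, with x1 and x2 correlated; we compare the average fitted slope on x1 when x2 is included versus omitted:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical true model: y = 1 + 2*x1 + 3*x2 + noise, with x1 and x2 correlated.
n_sims, n = 1000, 200
slopes_full, slopes_under = [], []
for _ in range(n_sims):
    x1 = rng.normal(size=n)
    x2 = 0.7 * x1 + rng.normal(size=n)        # x2 is correlated with x1
    y = 1 + 2 * x1 + 3 * x2 + rng.normal(size=n)

    # Correctly specified model: regress y on an intercept, x1, and x2.
    X_full = np.column_stack([np.ones(n), x1, x2])
    slopes_full.append(np.linalg.lstsq(X_full, y, rcond=None)[0][1])

    # Underspecified model: omit x2.
    X_under = np.column_stack([np.ones(n), x1])
    slopes_under.append(np.linalg.lstsq(X_under, y, rcond=None)[0][1])

print(f"mean slope on x1, full model: {np.mean(slopes_full):.2f}")   # close to the true value 2
print(f"mean slope on x1, x2 omitted: {np.mean(slopes_under):.2f}")  # close to 2 + 3*0.7 = 4.1: biased
```

The underspecified fit absorbs the effect of the omitted x2 into the slope on x1, illustrating the biased-coefficient outcome above.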
Strategy to Build a Model
1 Know your goal, know your research question. Knowing how you plan to use your
regression model can assist greatly in the model building stage.
2 Identify all of the possible candidate predictors.
1. Don’t worry about interactions or the appropriate functional form – such as x^2 and log x
– just yet.
2. Just make sure you identify all the possible important predictors.
3 Use variable selection procedures to find the middle ground between an
underspecified model and a model with extraneous or redundant variables.
Two possible variable selection procedures are stepwise regression and best subsets
regression.
4 Fine-tune the model to get a correctly specified model.
Iterate back and forth between formulating different regression models and checking the
behavior of the residuals until you are satisfied with the model.
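Best subsets regression, mentioned in step 3, can be sketched in a few lines: fit every possible subset of the candidate predictors and keep the subset with the lowest AIC. The following is a hypothetical illustration with simulated data; the AIC here is computed only up to an additive constant, which does not affect the ranking:

```python
import itertools
import numpy as np

rng = np.random.default_rng(1)
n = 100
X = rng.normal(size=(n, 4))                                # four candidate predictors
y = 1 + 2 * X[:, 0] - 1.5 * X[:, 2] + rng.normal(size=n)   # only predictors 0 and 2 matter

def aic(X_sub, y):
    """AIC, up to an additive constant, for an OLS fit with an intercept."""
    n = len(y)
    Xd = np.column_stack([np.ones(n), X_sub])
    beta, *_ = np.linalg.lstsq(Xd, y, rcond=None)
    rss = np.sum((y - Xd @ beta) ** 2)
    return n * np.log(rss / n) + 2 * Xd.shape[1]

# Evaluate every nonempty subset of the four candidates and keep the best one.
best = min(
    (s for r in range(1, 5) for s in itertools.combinations(range(4), r)),
    key=lambda s: aic(X[:, list(s)], y),
)
print("lowest-AIC subset:", best)   # should contain predictors 0 and 2
```

With many candidate predictors this exhaustive search grows exponentially, which is one motivation for stepwise procedures.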
Overview of Stepwise Regression
1 First, we start with no predictors in our “stepwise model.”
2 Then, at each step along the way we either enter or remove a predictor based on some
criteria, for example:
1 the partial F-tests (equivalently, the t-tests for the individual slope parameters)
obtained at each step
2 Akaike’s Information Criterion (AIC)
3 Bayesian Information Criterion (BIC)
3 We stop when no more predictors can be justifiably entered into or removed from our
stepwise model.
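The steps above can be sketched as a forward-stepwise procedure using AIC as the entry criterion (simulated data; a full stepwise procedure would also consider removing previously entered predictors at each step, which this sketch omits):

```python
import numpy as np

def ols_aic(cols, y):
    """AIC, up to an additive constant, for an OLS fit of y on an intercept plus cols."""
    n = len(y)
    Xd = np.column_stack([np.ones(n)] + cols) if cols else np.ones((n, 1))
    beta, *_ = np.linalg.lstsq(Xd, y, rcond=None)
    rss = np.sum((y - Xd @ beta) ** 2)
    return n * np.log(rss / n) + 2 * Xd.shape[1]

def forward_stepwise(X, y):
    """Start with no predictors; at each step enter the predictor that lowers AIC most."""
    remaining = list(range(X.shape[1]))
    chosen = []
    current = ols_aic([], y)
    while remaining:
        scores = [(ols_aic([X[:, k] for k in chosen + [cand]], y), cand)
                  for cand in remaining]
        best_score, best_cand = min(scores)
        if best_score >= current:       # stop: no entry improves the criterion
            break
        current = best_score
        chosen.append(best_cand)
        remaining.remove(best_cand)
    return chosen

rng = np.random.default_rng(2)
n = 100
X = rng.normal(size=(n, 5))                          # five candidate predictors
y = 3 * X[:, 1] + 2 * X[:, 4] + rng.normal(size=n)   # only predictors 1 and 4 matter
print("entered, in order:", forward_stepwise(X, y))  # predictors 1 and 4 should enter first
```

Swapping `2 * Xd.shape[1]` for `np.log(n) * Xd.shape[1]` turns the entry criterion into BIC, which penalizes extra parameters more heavily for large n.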