Lecture 8: Logistic Regression
Logistic Regression: focuses on categorical outcomes with continuous and/or categorical predictors
Can we predict a categorical outcome from a set of predictors?
Is success on a statistics exam predicted by the number of hours spent studying, or by whether or not
the student has a math A-level?
What is the relative strength of individual predictors?
Is smoking more strongly associated with heart disease than watching TV for >20 hours per week?
Are there interactions amongst predictor variables?
Perhaps physical inactivity predicts heart disease, but only in non-smokers
How good is our statistical model (the regression model) at classifying cases for which the outcome is
known?
Linear vs Logistic: the fitted values the model returns should be probabilities of membership in a certain group
A binary outcome (0/1) cannot form a normal distribution
The variance of a binary outcome depends on its mean (variance = p(1 - p)), so groups with different means will not have equal variance
Linear regression returns values from -infinity to +infinity; probabilities must be between 0 and 1
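A minimal sketch of the points above, not taken from the lecture: the coefficients b0 and b1 and the hours values are hypothetical, chosen only to show that a straight-line predictor leaves the 0 to 1 range while the logistic function keeps every value inside it. It also shows that the variance of a binary outcome changes with its mean.

```python
import numpy as np

# Hypothetical coefficients for illustration only (not estimated from data)
b0, b1 = -4.0, 1.0

def linear_predictor(hours):
    """Straight-line prediction: unbounded, from -infinity to +infinity."""
    return b0 + b1 * hours

def logistic(z):
    """Squash any real number into the open interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def binary_variance(p):
    """Variance of a 0/1 outcome whose mean (probability of 1) is p."""
    return p * (1.0 - p)

hours = np.array([0.0, 2.0, 4.0, 6.0, 10.0])  # hypothetical study hours
linear_values = linear_predictor(hours)       # some fall below 0 or above 1
probabilities = logistic(linear_values)       # all strictly between 0 and 1

for h, lin, p in zip(hours, linear_values, probabilities):
    print(f"hours={h:4.1f}  linear={lin:5.1f}  logistic={p:.3f}")

# Different means give different variances, so equal variance cannot hold
print(binary_variance(0.5), binary_variance(0.9))
```

The straight line is only usable as a probability after it has been passed through the logistic function, which is what logistic regression does with its linear combination of predictors.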
Categorical outcomes:
Binary: only two possible outcomes (pass/fail)
Dichotomous: only two options
Mutually exclusive: each case must be a member of one group only
Multinomial: several outcomes without an order (hair color, geographical region)
Ordinal: several possible outcomes that are ordered
Logistic Function
The probability of the categorical outcome changes across values of a continuous predictor along an S-shaped curve
Log odds transformation: gives an odds value a symmetrical scale from -infinity to +infinity
Probabilities range from 0 to 1
Odds: p / (1 - p), ranging from 0 to infinity
Odds from 0 to 1 reflect a probability of less than 0.5, but odds from 1 to infinity reflect p > 0.5
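The relationship between probability, odds, and log odds can be sketched as below. The function names are illustrative, not from the lecture. The sketch shows why odds are asymmetric around 1 while log odds are symmetric around 0, and that the logistic function reverses the log odds transformation.

```python
import math

def odds(p):
    """Odds of an event with probability p (0 < p < 1)."""
    return p / (1.0 - p)

def log_odds(p):
    """Natural log of the odds (the logit): ranges over all real numbers."""
    return math.log(odds(p))

def inverse_log_odds(z):
    """Logistic function: maps log odds back to a probability."""
    return 1.0 / (1.0 + math.exp(-z))

for p in (0.1, 0.25, 0.5, 0.75, 0.9):
    print(f"p={p:.2f}  odds={odds(p):.3f}  log odds={log_odds(p):+.3f}")
```

Probabilities equally far from 0.5 (for example 0.25 and 0.75) give odds that sit unevenly either side of 1, but log odds of equal size and opposite sign.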