The linearity assumption can be evaluated by plotting the logit of the success rate
versus the predicting variables.
If there is curvature or some other non-linear pattern, the lack of fit may be due to
non-linearity with respect to some of the predicting variables.
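A minimal sketch of the values to plot, assuming grouped (binomial) data where each group shares one predictor value. The function name `empirical_logits`, the counts, and the 0.5 adjustment (which keeps the logit finite when a group has 0 or all successes) are illustrative assumptions. For ungrouped 0/1 data you would first bin the predictor into groups.

```python
import math

def empirical_logits(successes, trials):
    """Empirical logit of the success rate for each group (hypothetical helper).

    Adds 0.5 to the success and failure counts (an assumed adjustment) so the
    logit stays finite when a group has 0 or all successes.
    """
    return [math.log((y + 0.5) / (n - y + 0.5)) for y, n in zip(successes, trials)]

# Hypothetical grouped data: predictor value, successes, and trials per group.
x = [1.0, 2.0, 3.0, 4.0]
successes = [5, 20, 50, 80]
trials = [100, 100, 100, 100]

# Plot these pairs; a roughly straight line supports linearity in the logit.
for xi, lg in zip(x, empirical_logits(successes, trials)):
    print(f"x = {xi:.1f}  empirical logit = {lg:+.3f}")
```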
Logistic Regression Coefficient - CORRECT ANSWER-We interpret the regression
coefficient beta as the log of the odds ratio for an increase of one unit in the predicting
variable, holding the other predicting variables fixed; equivalently, exp(beta) is the odds ratio.
We do not interpret beta with respect to the response variable but with respect to the
odds of success.
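A quick numerical check of this interpretation for a model with one predictor; the coefficient values are hypothetical. The odds at x + 1 divided by the odds at x equal exp(beta1), whatever x is.

```python
import math

def odds(beta0, beta1, x):
    """Odds of success p / (1 - p) under logit(p) = beta0 + beta1 * x."""
    p = 1.0 / (1.0 + math.exp(-(beta0 + beta1 * x)))
    return p / (1.0 - p)

beta0, beta1 = -1.0, 0.4   # hypothetical coefficients
x = 2.0
ratio = odds(beta0, beta1, x + 1) / odds(beta0, beta1, x)
print(ratio, math.exp(beta1))  # the two values agree: exp(beta1) is the odds ratio
```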
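The MLEs of the regression coefficients in logistic regression are approximately unbiased
for large samples, and thus the mean of the approximate normal distribution is beta. The
variance of the estimator does not have a closed-form expression; it is approximated by the
inverse of the Fisher information matrix evaluated at the estimates.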
Model parameters - CORRECT ANSWER-The model parameters are the regression
coefficients.
There is no additional parameter to model the variance, since there is no error term.
For p predictors, we have p + 1 regression coefficients for a model with an intercept (beta 0).
We estimate the model parameters using maximum likelihood estimation (MLE).
Response variable - CORRECT ANSWER-The response data are Bernoulli (binomial with
one trial) with probability of success p.
MLE - CORRECT ANSWER-The resulting log-likelihood function to be maximized is
very complicated, and it is non-linear in the regression coefficients beta 0, beta 1, ...,
beta p.
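A minimal sketch of the Bernoulli log-likelihood being maximized. The function name and data layout are hypothetical. Each observation contributes y * eta - log(1 + e^eta), where eta = beta 0 + beta 1 x1 + ... + beta p xp. The exp and log terms are what make it non-linear in the betas.

```python
import math

def log_likelihood(betas, X, y):
    """Bernoulli log-likelihood of a logistic regression model (hypothetical helper).

    betas: [beta0, beta1, ..., betap]; each row of X holds the p predictor
    values (the intercept is handled separately); y holds 0/1 responses.
    """
    total = 0.0
    for row, yi in zip(X, y):
        eta = betas[0] + sum(b * xj for b, xj in zip(betas[1:], row))
        # y*eta - log(1 + e^eta) equals y*log(p) + (1 - y)*log(1 - p).
        # Note: math.exp overflows for eta above roughly 709; fine for a sketch.
        total += yi * eta - math.log1p(math.exp(eta))
    return total

X = [[1.0], [2.0], [3.0]]   # hypothetical single predictor
y = [0, 1, 1]
print(log_likelihood([0.0, 0.0], X, y), log_likelihood([-2.0, 1.0], X, y))
```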
MLE has good statistical properties under the assumption of a large sample size, i.e.
large N.
For large N, the sampling distribution of the MLEs can be approximated by a normal
distribution.
The least squares estimation for the standard regression model is equivalent to MLE
under the assumption of normality.
MLE is the most widely applied estimation approach.
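A short derivation of the least squares equivalence, assuming the standard regression model y_i = x_i^T beta + epsilon_i with independent errors epsilon_i ~ N(0, sigma^2). The log-likelihood is:

```latex
\ell(\beta, \sigma^2) = -\frac{n}{2}\log\left(2\pi\sigma^2\right)
  - \frac{1}{2\sigma^2}\sum_{i=1}^{n}\left(y_i - x_i^\top\beta\right)^2
```

For any fixed sigma^2, the first term does not depend on beta and the second is a negative multiple of the sum of squared errors, so maximizing the log-likelihood over beta is the same as minimizing the sum of squared errors.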
Parameter estimation - CORRECT ANSWER-Maximizing the log-likelihood function with
respect to beta 0, beta 1, etc. in closed (exact) form is not possible, because the
log-likelihood function is non-linear in the model parameters, i.e. we cannot derive the
estimated regression coefficients in exact form.
We use a numerical algorithm to estimate the betas (to maximize the log-likelihood function).
The estimated parameters and their standard errors are therefore approximate.
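A minimal sketch of one such numerical algorithm, Newton-Raphson (for logistic regression this coincides with Fisher scoring, i.e. iteratively reweighted least squares). It is restricted to a single predictor so the 2x2 matrix inverse can be written out; the function names are hypothetical, and there is no safeguard against non-convergence (e.g. perfectly separated data). Standard errors are the square roots of the diagonal of the inverse information matrix at the estimates.

```python
import math

def score_and_info(b0, b1, x, y):
    """Score vector and Fisher information entries for logit(p) = b0 + b1*x."""
    g0 = g1 = i00 = i01 = i11 = 0.0
    for xi, yi in zip(x, y):
        p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
        w = p * (1.0 - p)          # Bernoulli variance, the IRLS weight
        g0 += yi - p
        g1 += (yi - p) * xi
        i00 += w
        i01 += w * xi
        i11 += w * xi * xi
    return (g0, g1), (i00, i01, i11)

def fit_logistic(x, y, tol=1e-10, max_iter=100):
    """Maximize the log-likelihood numerically with Newton-Raphson."""
    b0 = b1 = 0.0
    for _ in range(max_iter):
        (g0, g1), (i00, i01, i11) = score_and_info(b0, b1, x, y)
        det = i00 * i11 - i01 * i01
        # Newton step = inverse(information) @ score, with the 2x2 inverse written out
        s0 = (i11 * g0 - i01 * g1) / det
        s1 = (i00 * g1 - i01 * g0) / det
        b0, b1 = b0 + s0, b1 + s1
        if max(abs(s0), abs(s1)) < tol:
            break
    # Approximate standard errors from the information at the final estimates
    _, (i00, i01, i11) = score_and_info(b0, b1, x, y)
    det = i00 * i11 - i01 * i01
    return (b0, b1), (math.sqrt(i11 / det), math.sqrt(i00 / det))

x = [0, 0, 0, 0, 1, 1, 1, 1]   # hypothetical binary predictor
y = [0, 0, 0, 1, 0, 1, 1, 1]
(b0, b1), (se0, se1) = fit_logistic(x, y)
print(b0, b1, se0, se1)
```

With a binary predictor the fit reproduces the group log-odds (log(1/3) at x = 0 and log(3) at x = 1), which makes the result easy to sanity-check.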
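Binomial Data - CORRECT ANSWER-This is binary data with repetitions: the number of
successes out of n repeated trials that share the same values of the predicting variables.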
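A small check, with hypothetical numbers, that a binomial record and the same outcomes written as individual Bernoulli trials give log-likelihoods that differ only by the constant log C(n, y). That constant does not involve the betas, so both forms lead to the same MLEs.

```python
import math

def binomial_loglik(p, y, n):
    """Log-likelihood of y successes in n trials with success probability p."""
    return math.log(math.comb(n, y)) + y * math.log(p) + (n - y) * math.log(1 - p)

def bernoulli_loglik(p, outcomes):
    """Log-likelihood of individual 0/1 outcomes sharing the same p."""
    return sum(math.log(p) if o == 1 else math.log(1 - p) for o in outcomes)

p, y, n = 0.3, 4, 10           # hypothetical group: 4 successes in 10 trials
outcomes = [1] * y + [0] * (n - y)
diff = binomial_loglik(p, y, n) - bernoulli_loglik(p, outcomes)
print(diff, math.log(math.comb(n, y)))  # differ only by the constant log C(n, y)
```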
Marginal Relationship - CORRECT ANSWER-Capturing the association of a predicting
variable with the response variable without consideration of other factors
Conditional Relationship - CORRECT ANSWER-Capturing the association of a
predicting variable with the response variable conditional on the other predicting
variables in the model
Simpson's paradox - CORRECT ANSWER-This is when the addition of a predicting
variable reverses the sign of the coefficient of an existing predicting variable.
It refers to the reversal of an association when looking at a marginal relationship versus a
partial or conditional one. This is a situation where the marginal relationship has the
wrong sign.
This can happen when the two predicting variables are correlated.
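A toy illustration with hypothetical counts (not real data). Within each stratum the treatment has higher odds of success (positive conditional log odds ratio), but pooling the strata flips the sign because treatment and stratum are correlated: most treated units fall in the hard stratum.

```python
import math

def log_odds_ratio(s1, n1, s0, n0):
    """Log odds ratio of success for group 1 versus group 0 (s successes of n)."""
    return math.log((s1 / (n1 - s1)) / (s0 / (n0 - s0)))

# Hypothetical (successes, trials) for treated and control units in each stratum.
strata = {
    "easy": {"treated": (18, 20), "control": (160, 200)},
    "hard": {"treated": (60, 200), "control": (4, 20)},
}

# Conditional association: compare within each level of the second variable.
for name, g in strata.items():
    print(name, round(log_odds_ratio(*g["treated"], *g["control"]), 3))

# Marginal association: pool the strata, ignoring the second variable.
treated = [sum(g["treated"][i] for g in strata.values()) for i in (0, 1)]
control = [sum(g["control"][i] for g in strata.values()) for i in (0, 1)]
marginal = log_odds_ratio(*treated, *control)
print("marginal", round(marginal, 3))   # negative: the sign is reversed
```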
Normal Distribution - CORRECT ANSWER-The normal approximation of the sampling distribution
of the estimated coefficients relies on a large sample of data. Using this approximate normal
distribution, we can derive confidence intervals.
Since the distribution is normal, the confidence interval is the z-interval.
**Applies to Logistic & Poisson Regression
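A minimal sketch of the z-interval beta_hat +/- z_(1 - alpha/2) * SE, using Python's `statistics.NormalDist` for the normal quantile. The function name and the estimate and standard error are hypothetical.

```python
from statistics import NormalDist

def wald_ci(beta_hat, se, alpha=0.05):
    """Approximate (1 - alpha) z-interval for a regression coefficient."""
    z = NormalDist().inv_cdf(1 - alpha / 2)   # z_(1 - alpha/2), about 1.96 for alpha = 0.05
    return beta_hat - z * se, beta_hat + z * se

lower, upper = wald_ci(0.8, 0.25)   # hypothetical estimate and standard error
print(lower, upper)
```

In logistic regression, exponentiating both endpoints gives an approximate interval for the odds ratio exp(beta).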
Hypothesis Testing (coefficient == 0) - CORRECT ANSWER-To perform hypothesis
testing, we can use the approximate normal sampling distribution.
The resulting hypothesis test is also called the Wald test, since it relies on the large
sample normal approximation of the MLEs.
To test whether the coefficient beta j = 0 or not, we can use the z-value.
**Applies to Logistic & Poisson Regression
Wald Test (Z-test) - CORRECT ANSWER-The z-value is the estimated coefficient minus 0
(the null value), divided by its standard error.
We reject the null hypothesis that the regression coefficient is 0 if the z-value is larger
in absolute value than the z critical point, i.e. the 1 - alpha/2 quantile of the standard
normal distribution.
We then conclude that the coefficient is statistically significant.
**Applies to Logistic & Poisson Regression
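A minimal sketch of the decision rule, with a hypothetical function name and hypothetical numbers: compute z = (beta_hat - 0) / SE and reject when |z| exceeds the 1 - alpha/2 standard normal quantile.

```python
from statistics import NormalDist

def wald_test(beta_hat, se, alpha=0.05):
    """Wald z-test of H0: beta_j = 0 against H1: beta_j != 0."""
    z = (beta_hat - 0.0) / se                     # estimate minus null value, over SE
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # critical point z_(1 - alpha/2)
    return z, abs(z) > z_crit                     # True means reject H0

z, reject = wald_test(0.8, 0.25)   # hypothetical estimate and standard error
print(z, reject)
```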
Hypothesis Testing (coefficient == constant) - CORRECT ANSWER-To test whether the
regression coefficient is equal to a constant b, the z-value changes:
we subtract b from the estimated coefficient in the numerator.
We decide whether to reject the null hypothesis using the P-value.
The P-value is 2 times the upper tail of the standard normal distribution beyond the
absolute value of the z-value:
P-value = 2P(Z > |z-value|)
**Applies to Logistic & Poisson Regression
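A minimal sketch of the two-sided P-value for H0: beta_j = b; the function name and numbers are hypothetical. By symmetry, 2P(Z > |z|) equals 2P(Z < -|z|), and the lower tail is used here because it stays accurate for large |z|.

```python
from statistics import NormalDist

def wald_pvalue(beta_hat, se, b=0.0):
    """Two-sided P-value for H0: beta_j = b based on the Wald z-value."""
    z = (beta_hat - b) / se                      # subtract b in the numerator
    p_value = 2 * NormalDist().cdf(-abs(z))      # = 2 * P(Z > |z|)
    return z, p_value

z, p_value = wald_pvalue(0.8, 0.25, b=0.5)   # hypothetical estimate, SE, and constant b
print(z, p_value)
```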
Hypothesis testing (statistical significance: +/-) - CORRECT ANSWER-To test whether the
coefficient is significantly positive or negative, the z-value is the same but the
P-value changes.
Positive (alternative: beta j > 0):
P-value = P(Z > z-value)
Negative (alternative: beta j < 0):
P-value = P(Z < z-value)
**Applies to Logistic & Poisson Regression
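A minimal sketch of both one-sided P-values, with a hypothetical function name and numbers and a null value of 0. The z-value is computed once; only the tail changes. P(Z > z) is written as P(Z < -z) for numerical accuracy.

```python
from statistics import NormalDist

def one_sided_pvalues(beta_hat, se):
    """One-sided Wald P-values for H1: beta_j > 0 and H1: beta_j < 0."""
    z = beta_hat / se
    std_normal = NormalDist()
    p_positive = std_normal.cdf(-z)   # P(Z > z)
    p_negative = std_normal.cdf(z)    # P(Z < z)
    return z, p_positive, p_negative

z, p_pos, p_neg = one_sided_pvalues(0.8, 0.25)   # hypothetical estimate and SE
print(z, p_pos, p_neg)
```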
Statistical Inference - CORRECT ANSWER-Logistic Regression: Normal Distribution.
Statistical inference based on the normal distribution applies only to large samples.
If the sample size n is small, the statistical inference is not reliable, i.e. we should
warn about the lack of reliability of the results.