ISYE6414 MIDTERM 2 QUESTIONS & ANSWERS (RATED A+)
In logistic regression, the relationship between the probability of success and the
predicting variables is nonlinear. - ANSWER: TRUE. The equation that links the
predictors to the probability is:
$$p(x_1,\dots,x_p)=\frac{\exp(\beta_0+\beta_1x_1+\dots+\beta_px_p)}{1+\exp(\beta_0+\beta_1x_1+\dots+\beta_px_p)}$$
This relationship is not linear.
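To see the nonlinearity numerically, here is a minimal sketch that evaluates the logistic function for a single predictor. The coefficient values b0 = 0 and b1 = 1 are made up for illustration and do not come from any fitted model.

```python
import math

def success_probability(x, b0=0.0, b1=1.0):
    """Logistic model with one predictor: p(x) = exp(b0 + b1*x) / (1 + exp(b0 + b1*x))."""
    eta = b0 + b1 * x  # linear predictor (illustrative coefficients, not fitted values)
    return math.exp(eta) / (1.0 + math.exp(eta))

# The same one-unit step in x changes p by different amounts depending on where x is.
step_near_zero = success_probability(1) - success_probability(0)
step_in_tail = success_probability(5) - success_probability(4)
print("change in p from x=0 to x=1:", step_near_zero)
print("change in p from x=4 to x=5:", step_in_tail)
```

If the relationship were linear, the two printed changes would be equal; here the step near x = 0 is much larger than the step in the tail.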
In logistic regression, the error terms are assumed to follow a normal distribution. -
ANSWER: FALSE. There are no error terms in logistic regression.
The logit function is the log of the ratio of the probability of success to the probability
of failure. It is also known as the log odds function. - ANSWER: TRUE.
$g(p)=\ln\left(\frac{p}{1-p}\right)$
The logit link function is also known as the log odds function.
In Poisson regression, we assume a nonlinear relationship between the log rate and
the predicting variables. - ANSWER: FALSE. In Poisson regression, we assume a
linear relationship between the log rate and the predicting variables.
Linearity Assumption: $\log(E(Y \mid x_1,\dots,x_p))=\beta_0+\beta_1x_1+\dots+\beta_px_p$
Interpret the coefficient for concCu. - ANSWER: A 1-unit increase in the concentration
of copper decreases the log odds of botrytis blight surviving by 0.27483, holding the
concentration of sulfur fixed.
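The same coefficient can also be read on the odds scale by exponentiating it. The sketch below only uses the estimate -0.27483 quoted in the answer; the variable names are illustrative.

```python
import math

coef_conc_cu = -0.27483  # estimated coefficient for concCu, as quoted in the answer

# exp(coefficient) is the factor by which the odds are multiplied
# for a 1-unit increase in copper concentration, sulfur held fixed.
odds_multiplier = math.exp(coef_conc_cu)
percent_change_in_odds = (odds_multiplier - 1.0) * 100.0

print("odds multiplier:", odds_multiplier)
print("percent change in odds:", percent_change_in_odds)
```

A multiplier below 1 means the odds of survival go down as copper concentration goes up, which agrees with the negative sign of the coefficient.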
Suppose you wanted to test if the coefficient for concCu is equal to -0.3. What
z-value would you use for this test? - ANSWER: z-value = (estimated coefficient - null
value)/standard error of estimated coefficient = (-0.27483+0.3)/0.01784 = 1.411
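The arithmetic can be checked with a short sketch. The estimate, null value, and standard error are the ones quoted in the answer; the two-sided p-value is an added step that the answer does not ask for, computed from the standard normal distribution via `math.erfc`.

```python
import math

estimate = -0.27483   # estimated coefficient for concCu (from the answer)
null_value = -0.3     # hypothesized value under the null
std_error = 0.01784   # standard error of the estimate (from the answer)

z_value = (estimate - null_value) / std_error

# Two-sided p-value for a standard normal: P(|Z| > |z|) = erfc(|z| / sqrt(2)).
p_two_sided = math.erfc(abs(z_value) / math.sqrt(2.0))

print("z-value:", round(z_value, 3))
print("two-sided p-value:", p_two_sided)
```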
Construct an approximate 95% confidence interval for the coefficient of concS. -
ANSWER: 95% confidence interval = (estimated coefficient - z critical point * standard
error of estimated coefficient, estimated coefficient + z critical point * standard error
of estimated coefficient) = (-4.32735 - 1.96*0.26518, -4.32735 + 1.96*0.26518) =
(-4.847, -3.808)
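A minimal sketch of the same interval calculation, using only the numbers quoted in the answer (the variable names are illustrative):

```python
estimate = -4.32735   # estimated coefficient for concS (from the answer)
std_error = 0.26518   # standard error of the estimate (from the answer)
z_critical = 1.96     # approximate 97.5th percentile of the standard normal

margin = z_critical * std_error
lower = estimate - margin
upper = estimate + margin

print("95% CI:", (round(lower, 3), round(upper, 3)))
```

Because the interval lies entirely below zero, the coefficient for concS is statistically significant at the 5% level under this normal approximation.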
The number of parameters that need to be estimated in a logistic regression model
with 6 predicting variables and an intercept is the same as the number of parameters
that need to be estimated in a standard linear regression model with an intercept and
the same predicting variables. - ANSWER: FALSE. As there is no error term in a logistic
regression model, there is no additional parameter for the variance of the error
terms. As a result, the number of parameters that need to be estimated in a logistic
regression model with 6 predicting variables and an intercept is 7 (6 slopes plus the
intercept). The number of parameters that need to be estimated in a standard linear
regression model with an intercept and the same predicting variables is 8 (the same 7
coefficients plus the error variance).
The log-likelihood function is a linear function with a closed-form solution. -
ANSWER: FALSE. The log-likelihood function is a non-linear function of the
coefficients, so its maximizer has no closed-form expression. A numerical algorithm
is needed in order to maximize it.
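As an illustration of such a numerical algorithm, here is a minimal Newton-Raphson sketch for a logistic regression with one predictor. This is one common choice, not necessarily the algorithm used in the course software. The data in `x` and `y` are made up, and the function names are hypothetical.

```python
import math

def sigmoid(eta):
    return 1.0 / (1.0 + math.exp(-eta))

def log_likelihood(b0, b1, x, y):
    """Bernoulli log-likelihood of a one-predictor logistic model."""
    total = 0.0
    for xi, yi in zip(x, y):
        p = sigmoid(b0 + b1 * xi)
        total += yi * math.log(p) + (1 - yi) * math.log(1.0 - p)
    return total

def fit_logistic_newton(x, y, max_iter=50, tol=1e-10):
    """Maximize the log-likelihood by Newton-Raphson, starting from (0, 0)."""
    b0, b1 = 0.0, 0.0
    for _ in range(max_iter):
        p = [sigmoid(b0 + b1 * xi) for xi in x]
        # Score vector (gradient of the log-likelihood).
        g0 = sum(yi - pi for yi, pi in zip(y, p))
        g1 = sum(xi * (yi - pi) for xi, yi, pi in zip(x, y, p))
        # Negative Hessian (2x2 information matrix).
        w = [pi * (1.0 - pi) for pi in p]
        h00 = sum(w)
        h01 = sum(wi * xi for wi, xi in zip(w, x))
        h11 = sum(wi * xi * xi for wi, xi in zip(w, x))
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 linear system by hand.
        step0 = (h11 * g0 - h01 * g1) / det
        step1 = (h00 * g1 - h01 * g0) / det
        b0 += step0
        b1 += step1
        if max(abs(step0), abs(step1)) < tol:
            break
    return b0, b1

# Made-up, non-separable toy data (illustration only).
x = [0, 1, 2, 3, 4, 5, 6, 7]
y = [0, 0, 1, 0, 1, 1, 0, 1]
b0_hat, b1_hat = fit_logistic_newton(x, y)
print("estimates:", b0_hat, b1_hat)
print("log-likelihood at estimates:", log_likelihood(b0_hat, b1_hat, x, y))
```

Each iteration needs the current fitted probabilities, which is why the estimates cannot be written down in a single formula the way least-squares estimates can.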
In logistic regression, the estimated value for a regression coefficient $\beta_i$ represents
the estimated expected change in the response variable associated with one unit
increase in the corresponding predicting variable, $x_i$, holding all else in the model
fixed. - ANSWER: FALSE. We interpret logistic regression coefficients with respect to
the odds of success: the estimate is the change in the log odds of success for a one
unit increase in $x_i$, holding all else in the model fixed.
Under logistic regression, the sampling distribution used for a coefficient estimator is
a Chi-squared distribution when the sample size is large. - ANSWER: FALSE. The
coefficient estimator follows an approximate normal distribution.
When testing a subset of coefficients, deviance follows a chi-square distribution with
q degrees of freedom, where q is the number of regression coefficients in the
reduced model. - ANSWER: FALSE. When testing a subset of coefficients, the
difference in deviance between the reduced and full models follows an approximate
chi-square distribution with q degrees of freedom, where q is the number of
regression coefficients discarded from the full model to get the reduced model.
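The test can be sketched as follows. The two deviance values and q = 2 are hypothetical numbers chosen for illustration, and `chi2_upper_tail` is a small helper written here (a series expansion of the regularized incomplete gamma function) so that the example needs only the standard library.

```python
import math

def chi2_upper_tail(x, df):
    """P(X > x) for X ~ chi-square(df), via the series for the lower incomplete gamma."""
    a, z = df / 2.0, x / 2.0
    if z <= 0.0:
        return 1.0
    term = 1.0 / a
    total = term
    n = 1
    while term > 1e-16 * total:
        term *= z / (a + n)
        total += term
        n += 1
    lower = total * math.exp(-z + a * math.log(z) - math.lgamma(a))
    return 1.0 - lower

# Hypothetical deviances (not from any real fit).
deviance_reduced = 45.2  # model without the q coefficients being tested
deviance_full = 38.1     # model with all coefficients
q = 2                    # number of coefficients dropped from the full model

test_statistic = deviance_reduced - deviance_full
p_value = chi2_upper_tail(test_statistic, q)
print("test statistic:", test_statistic, "df:", q, "p-value:", p_value)
```

A small p-value suggests that at least one of the q discarded coefficients is nonzero, so the reduced model is not adequate.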
Logistic regression deals with the case where the dependent variable is binary, and
the conditional distribution $Y_i \mid X_{i,1},\dots,X_{i,p}$ is Binomial. - ANSWER: TRUE. Logistic
regression is the generalization of the standard regression model that is used when
the response variable y is binary or binomial.
In logistic regression, if the p-value of the deviance test for goodness-of-fit is smaller
than the significance level 𝛼, then it is plausible that the model is a good fit. -
ANSWER: FALSE. For logistic regression, if the p-value of the deviance test for
goodness-of-fit is large, then it is an indication that the model is a good fit.
If a logistic regression model provides accurate classification, then we can conclude
that it is a good fit for the data. - ANSWER: FALSE. "Goodness of fit doesn't
guarantee good prediction." And conversely, good prediction doesn't guarantee that
the model is a good fit.
To evaluate whether the model is a good fit, or equivalently whether the assumptions
hold, we can check whether the Pearson or deviance residuals are approximately
normally distributed, using a histogram and a normal probability plot. If they are
approximately normally distributed, then we conclude that the model is a good fit.
Another approach to evaluating goodness of fit is through hypothesis testing. In the
goodness of fit test, the null hypothesis is that the model fits well, and the alternative
is that the model does not fit well. The test statistic for the goodness of fit test is the
sum of squared deviance residuals. Under the null hypothesis of good fit, the test
statistic has an approximate Chi-Square distribution with n-p-1 degrees of freedom.
It is very important to remember that if the p-value is small, we reject the null
hypothesis of good fit, and thus we conclude that the model is not a good fit.
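The test statistic described above can be sketched for grouped binomial data. The counts and fitted probabilities below are hypothetical, the helper names are illustrative, and the residual formula is the standard binomial deviance residual; the course materials may present it in a different form.

```python
import math

def xlogy_ratio(a, b):
    """a * ln(a / b), with the convention that the term is 0 when a == 0."""
    return 0.0 if a == 0 else a * math.log(a / b)

def deviance_residual(successes, trials, fitted_p):
    """Binomial deviance residual for one group of replicated trials."""
    expected = trials * fitted_p
    d_squared = 2.0 * (xlogy_ratio(successes, expected)
                       + xlogy_ratio(trials - successes, trials - expected))
    # Sign follows observed minus expected successes.
    return math.copysign(math.sqrt(max(d_squared, 0.0)), successes - expected)

# Hypothetical grouped data and fitted probabilities (illustration only).
trials = [20, 20, 20, 20]
successes = [2, 7, 13, 18]
fitted = [0.12, 0.33, 0.64, 0.88]
num_predictors = 1  # p in the n-p-1 degrees of freedom

residuals = [deviance_residual(y, n, p) for y, n, p in zip(successes, trials, fitted)]
test_statistic = sum(r * r for r in residuals)
degrees_of_freedom = len(trials) - num_predictors - 1

print("deviance residuals:", residuals)
print("test statistic:", test_statistic, "df:", degrees_of_freedom)
```

The p-value is the upper-tail probability of a chi-square distribution with the printed degrees of freedom, evaluated at the test statistic; a large p-value gives no evidence against a good fit.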