MGT 6203 EXAM MASTERY PACK - LATEST QUESTIONS WITH
VERIFIED ANSWERS - COMPLETE GUIDE
The most common problems when fitting a linear regression: ANSWER
1. Nonlinearity of the response-predictor relationships
2. Correlation of error terms
3. Non-constant variance of error terms
4. Outliers
5. High-leverage points
6. Collinearity
How to check linearity: ANSWER 1) Examine the scatterplot of Y against each X
variable. Does it follow a linear pattern?
2) Examine the residuals vs. fitted plot (particularly helpful in multiple
regression). This plot should show no trend.
If the error terms are correlated... ANSWER 1) The estimated standard errors
will underestimate the true standard errors.
2) P-values will be lower than they should be, and confidence and prediction
intervals will be narrower than they should be.
3) We may have an unwarranted sense of confidence in the model.
To detect autocorrelation in a linear model, use the Durbin-Watson test.
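As a rough illustration of what the Durbin-Watson test measures, the sketch below computes the Durbin-Watson statistic directly from a list of residuals. The statistic lies between 0 and 4; values near 2 suggest no first-order autocorrelation, values well below 2 suggest positive autocorrelation, and values well above 2 suggest negative autocorrelation. The function name and both residual series are made up for this example, and the sketch computes the statistic only, not a p-value.

```python
def durbin_watson(residuals):
    """Sum of squared successive differences divided by the sum of squared residuals."""
    num = sum((residuals[t] - residuals[t - 1]) ** 2
              for t in range(1, len(residuals)))
    den = sum(e ** 2 for e in residuals)
    return num / den


# Hypothetical residuals, invented for illustration only.
smooth = [1.0, 0.8, 0.5, 0.1, -0.3, -0.6, -0.9, -0.7]  # neighbours look alike
jumpy = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0, 1.0, -1.0]   # neighbours alternate in sign

print(durbin_watson(smooth))  # well below 2: positive autocorrelation
print(durbin_watson(jumpy))   # well above 2: negative autocorrelation
```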
Heteroskedasticity (non-constant error variance): ANSWER The model assumes
constant error variance. Check the residuals vs. fitted plot: a funnel shape
(spread that changes with the fitted values) indicates heteroskedasticity.
Confidence intervals and hypothesis tests can be misleading if the error
variance is not constant.
If heteroskedasticity is present, Y can be transformed, for example to ln(Y).
Autocorrelation is correlation among the error terms e_i.
Plot standardized residuals against the fitted Y values to spot outliers.
High leverage: ANSWER An observation with an unusual predictor (X) value. Check
whether removing it (possibly along with two or three other points) noticeably
changes the fitted model.
Cook's Distance (identifying influential points): ANSWER Measures how much the
estimated regression coefficients change when observation i is removed. Any
observation with a Cook's distance (Ci) greater than 1 is considered highly
influential.
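To make the Ci > 1 rule concrete, here is a minimal sketch that computes Cook's distance for a simple (one-predictor) regression using the standard closed form Ci = (e_i^2 / (p * s^2)) * h_i / (1 - h_i)^2, where h_i is the leverage and p = 2 coefficients. The function name and the data are hypothetical; the last point is deliberately placed far from the other X values.

```python
def cooks_distances(x, y):
    """Cook's distance of every observation in a simple linear regression."""
    n = len(x)
    p = 2  # number of coefficients: intercept and slope
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    intercept = y_bar - slope * x_bar
    resid = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    s2 = sum(e ** 2 for e in resid) / (n - p)               # residual variance
    lev = [1 / n + (xi - x_bar) ** 2 / sxx for xi in x]     # leverage h_i
    return [(e ** 2 / (p * s2)) * h / (1 - h) ** 2 for e, h in zip(resid, lev)]


# Hypothetical data: the last observation has an extreme X value.
x = [1, 2, 3, 4, 5, 15]
y = [1.1, 1.9, 3.2, 3.9, 5.1, 6.0]

for i, d in enumerate(cooks_distances(x, y), start=1):
    flag = "influential" if d > 1 else ""
    print(i, round(d, 3), flag)
```

Only the sixth observation exceeds the threshold here, because it combines high leverage with a non-zero residual.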
Diagnostic plots from the R plot() function: ANSWER 1) Residuals vs. Fitted
(look for non-linear patterns)
2) Normal Q-Q (check whether the residuals follow a normal distribution)
3) Scale-Location (check that the standardized residuals are spread evenly
across the range of fitted values)
4) Residuals vs. Leverage (identify influential points, Ci > 1)
There are two kinds of outliers: ANSWER 1) Outlier in Y (the response)
2) Outlier in X (a predictor), i.e. a leverage point
More on outliers: ANSWER 1) A response value is an outlier if it deviates from
the mean by more than two to three standard deviations.
2) An outlier is influential if removing it substantially changes the
regression results.
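The two-to-three standard deviation rule above can be sketched as a small filter. The function name, the threshold default, and the data are assumptions made for this illustration; the sample standard deviation from Python's statistics module is used.

```python
from statistics import mean, stdev


def flag_outliers(values, threshold=2.0):
    """Return the values lying more than `threshold` standard deviations from the mean."""
    m = mean(values)
    s = stdev(values)  # sample standard deviation
    return [v for v in values if abs(v - m) > threshold * s]


# Hypothetical response values, invented for illustration only.
responses = [10, 11, 9, 10, 12, 10, 11, 30]

print(flag_outliers(responses, threshold=2.0))
print(flag_outliers(responses, threshold=3.0))
```

With a sample this small, a single extreme value inflates the standard deviation, so the stricter three-standard-deviation cutoff may flag nothing; that is one reason the rule is stated as a range.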
Multicollinearity: two or more predictors are highly correlated with each
other. Variance Inflation Factors (VIF) are used to detect it.
Principal Component Analysis and Factor Analysis are possible remedies.
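Variance Inflation Factor (VIF): ANSWER VIF_j = 1/(1 - R_j^2), where R_j^2 is
the R-squared from regressing predictor X_j on all the other predictors. If
that relationship is strong, R_j^2 is close to 1 and the VIF is large. A VIF
greater than 5 signals problematic collinearity.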
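A small sketch of the VIF calculation, under the simplifying assumption that there are only two predictors, so the R-squared of one on the other is just their squared correlation. The function names and the three predictor columns are hypothetical.

```python
def r_squared(x, y):
    """R-squared from a simple regression of y on x (the squared correlation)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    sxx = sum((a - x_bar) ** 2 for a in x)
    syy = sum((b - y_bar) ** 2 for b in y)
    return sxy ** 2 / (sxx * syy)


def vif(r2):
    """Variance inflation factor for a predictor whose auxiliary R-squared is r2."""
    return 1.0 / (1.0 - r2)


# Hypothetical predictors, invented for illustration only.
x1 = [1, 2, 3, 4, 5, 6]
x2 = [2.1, 3.9, 6.2, 8.1, 9.8, 12.2]  # almost exactly 2 * x1
x3 = [5, 1, 4, 2, 6, 3]               # only weakly related to x1

print(vif(r_squared(x1, x2)))  # far above 5: strong collinearity
print(vif(r_squared(x1, x3)))  # close to 1: little collinearity
```

Note that an R-squared of 0.8 corresponds to a VIF of 5, which is where the rule of thumb draws the line.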
Dummy (indicator) variables: the base case is the category for which every
dummy equals 0. With dummies Middle in {0,1} and Old in {0,1}, Young is the
base case, coded (0,0). "Because the other two are both zero,
the young categorical dummy variable is unique."
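The coding above can be sketched as follows: three categories need only two dummy columns, and the base case is the row where both are zero. The function name is made up for this example.

```python
def encode_age_group(group):
    """Return the (Middle, Old) dummy pair; Young is the base case (0, 0)."""
    return (int(group == "Middle"), int(group == "Old"))


for g in ["Young", "Middle", "Old"]:
    print(g, encode_age_group(g))
```

Adding a third dummy for Young would make the three columns sum to 1 for every row, which is perfectly collinear with the intercept, so one category is always left out as the base.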
Log Transformation: ANSWER 1) Level-level model: no transformation
2) Linear-log model: the X variables are log-transformed
3) Log-linear model: Y is log-transformed, so log(Y) is the dependent
variable
4) Log-log model: both X and Y are log-transformed
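The practical difference between these models is how a coefficient is read. The sketch below, with hypothetical coefficients b0 and b1 and made-up function names, checks two standard interpretations numerically: in a log-log model a 1% increase in X changes Y by roughly b1 percent, and in a log-linear model a one-unit increase in X changes Y by roughly 100 * b1 percent (exactly exp(b1) - 1).

```python
import math


def loglog_predict(x, b0, b1):
    """Log-log model ln(y) = b0 + b1 * ln(x), returned on the original y scale."""
    return math.exp(b0 + b1 * math.log(x))


def loglinear_predict(x, b0, b1):
    """Log-linear model ln(y) = b0 + b1 * x, returned on the original y scale."""
    return math.exp(b0 + b1 * x)


# Hypothetical coefficients, chosen only for illustration.
b0 = 1.0

# Log-log with b1 = 0.5: raise x by 1% (100 -> 101).
pct_loglog = (loglog_predict(101, b0, 0.5) / loglog_predict(100, b0, 0.5) - 1) * 100

# Log-linear with b1 = 0.05: raise x by one unit (10 -> 11).
pct_loglinear = (loglinear_predict(11, b0, 0.05) / loglinear_predict(10, b0, 0.05) - 1) * 100

print(round(pct_loglog, 3))     # about 0.5 percent
print(round(pct_loglinear, 3))  # about 5 percent
```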
Motives for Data Transformation: ANSWER 1) To attain a (more)
Notes on data interpretation: ANSWER
Quadratic/Polynomial Model: ANSWER The coefficients of a quadratic model cannot
be interpreted in isolation, because a change in X also changes X^2.
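One way to see this: for y = b0 + b1*x + b2*x^2, the effect of a small change in x is the derivative b1 + 2*b2*x, which depends on where x currently is. The coefficients and the function name below are hypothetical.

```python
def marginal_effect(x, b1, b2):
    """Slope of y = b0 + b1*x + b2*x**2 at a given x (the derivative b1 + 2*b2*x)."""
    return b1 + 2 * b2 * x


# Hypothetical coefficients, chosen only for illustration.
b1, b2 = 4.0, -0.5

for x in [1, 4, 6]:
    print(x, marginal_effect(x, b1, b2))
```

With these values the effect of x is positive at x = 1, zero at x = 4 (the turning point -b1 / (2*b2)), and negative at x = 6, so b1 alone does not describe the relationship.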