Inferential statistics – Test 2
Contents
Unit 550 – Multiple regression addition: the effect of two variables
  The effect of both a dummy and a ratio variable on another scale (ratio) variable
  Checking residuals
Unit 553 – Multiple regression interaction: the combined effect of a ratio variable and one or more dummies
  Interaction/moderation
Unit 554 – Multiple regression and non-linearity
  Introducing non-linearity
  Detecting and dealing with non-linearity
Unit 560 – Non normality of residuals and omitted variables
  Normality of errors
    Normality of the residuals
    Non-normality of residuals
    Why are deviations from normality problematic?
    Detecting non-normality of the residuals
    What can be done about it?
  Studying the normality of errors using QQ plots
    Normal
    Uniform
    Bimodal
    Skewed right
    Skewed left
  Testing for normality: Shapiro-Wilk test
    Shapiro-Wilk test
  Fixing the non-normality problem by transforming the y-variable
  Checking and fixing normality in R
    1. Creating and storing residuals
    2. Displaying residuals (histogram)
    3. Displaying residuals (QQ plot)
    4. Testing the normality
    5. Repairing the non-normality problem
    6. Repairing the non-normality problem
Unit 561 – Heteroscedasticity, non-equal variances and interaction effects
  Homogeneity of errors
    Why is heteroscedasticity “bad”?
    Why does unequal error variance occur in a model?
    How to detect?
    How to solve the problem?
  Testing for homogeneity of errors
    Levene’s test
    Levene’s test and Welch t-test
    Breusch Pagan test
    Take aways
  Checking homogeneity using R
Unit 563 – Dealing with outliers, influential cases and multicollinearity
  Residuals, leverage and influential cases in linear regression
    Detecting cases with leverage in R
    Detecting cases with influence in R
    Solutions
  Detecting outliers (leverage) & influential cases using R
  Multicollinearity
    Solutions multicollinearity
  Detecting multicollinearity using R
    Detecting multicollinearity
    VIF in R
    Solution
Unit 470b – Describing and assessing the relationship between a scale and a dummy
  Dealing with dummy outcomes
    Solving the problem of a dichotomous variable
  Predicting dichotomous outcomes
  Assessing hypotheses about dichotomous outcomes
  Logistic regression in R
  Assessing a logistic regression model: the confusion matrix & the quality of models
    Data: prevalence
    Model quality 1: accuracy and error rate
    Model quality 2: specificity
    Model quality 3: sensitivity
    Model quality 4: precision
  Assessing a logistic regression model: explanatory power
    Model quality in multivariate regression: (adjusted) R-squared
    Model quality in logistic regression
Unit 510 – nonparametric alternative for means testing: Wilcoxon signed-rank test
  Introducing non-parametric tests
  Ranks and signed ranks
    When using ranks?
    Signed ranks
  Wilcoxon signed-rank test
  Wilcoxon signed-rank test using R
Unit 545 – a nonparametric alternative for testing the effect of a dichotomous variable (Mann-Whitney-Wilcoxon) or a nominal variable (Kruskal Wallis test)
  Mann Whitney Wilcoxon test
  Kruskal Wallis test
    Steps in the test
Unit 550 – Multiple regression addition: the effect of two variables
The effect of both a dummy and a ratio variable on another scale (ratio) variable
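Addition: both variables independently affect the dependent variable
Y = b0 + b2*type + b1*education
Type is a dummy and can be 0 or 1
This simplifies to one equation per group:
Y = b0 + b1*education (type = 0)
Y = (b0 + b2) + b1*education (type = 1)
Because b1*education is the same in both groups, education has the same effect in both groups: the two lines are parallel and only the intercept differs, by b2.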
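The parallel-lines point can be made concrete with a minimal sketch, written here in plain Python purely for illustration. The coefficient values and function names are hypothetical and not taken from the course data; the dummy is called `dummy` in the code because `type` is a reserved name in Python. The sketch evaluates the additive model for both groups and shows that the gap between them is b2 at every education level, while the slope stays b1.

```python
# Hypothetical coefficients, chosen only for illustration (not estimated from data).
B0 = 10.0  # intercept of the reference group (type = 0)
B1 = 2.0   # slope of education, shared by both groups
B2 = 5.0   # shift in the intercept for the group with type = 1


def predict(dummy, education):
    """Additive model: Y = b0 + b2*type + b1*education."""
    return B0 + B2 * dummy + B1 * education


def group_gap(education):
    """Difference between the type = 1 and type = 0 predictions."""
    return predict(1, education) - predict(0, education)


def slope(dummy, education):
    """Change in the prediction for one extra unit of education."""
    return predict(dummy, education + 1) - predict(dummy, education)


if __name__ == "__main__":
    # Columns: education, prediction for type = 0, prediction for type = 1, gap
    for education in (0, 8, 16):
        print(education, predict(0, education), predict(1, education),
              group_gap(education))
```

If the gap depended on education, the lines would no longer be parallel and an interaction term would be needed, which is the topic of Unit 553.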
When analyzing data, always check
1. Independent cases condition
2. Random selection of cases
3. (10% condition)
4. ‘Even distribution’ condition
In multiple regression, there are two types of expectations to test:
1. The model as a whole has some effect → R squared and the F-test
2. Each of the two variables does or does not have an effect → b-coefficients and the t-test
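How the two tests relate to the quantities they use can be sketched as follows. This is a minimal illustration in plain Python using the standard textbook relations F = (R²/k) / ((1 − R²)/(n − k − 1)), with k predictors and n cases, and t = b / SE(b). The function names and the example numbers are assumptions made for this sketch and do not come from the course material.

```python
def f_statistic(r_squared, n_cases, n_predictors):
    """F-statistic for H0 'the model as a whole has no effect', from R squared."""
    df_model = n_predictors
    df_resid = n_cases - n_predictors - 1
    if df_model <= 0 or df_resid <= 0:
        raise ValueError("need at least one predictor and n > k + 1 cases")
    if not 0 <= r_squared < 1:
        raise ValueError("R squared must lie in [0, 1)")
    explained_per_df = r_squared / df_model
    unexplained_per_df = (1 - r_squared) / df_resid
    return explained_per_df / unexplained_per_df


def t_statistic(coefficient, standard_error):
    """t-statistic for H0 'this single b-coefficient is zero'."""
    if standard_error <= 0:
        raise ValueError("standard error must be positive")
    return coefficient / standard_error


if __name__ == "__main__":
    # Hypothetical model: 2 predictors, 23 cases, R squared of 0.5.
    print(f_statistic(0.5, 23, 2))
    # Hypothetical coefficient b = 3.0 with standard error 1.5.
    print(t_statistic(3.0, 1.5))
```

Turning these statistics into p-values requires the F and t distributions with the matching degrees of freedom, which this sketch leaves to a statistics package.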