Table of Contents
Chapter 2 – The Spine of Statistics
Chapter 6 – The Beast of Bias
Chapter 8 – Correlation
Chapter 9 – The Linear Model (Regression)
Chapter 11 – Moderation, Mediation, and Multicategory Predictors
Chapter 12 – GLM 1: Comparing Several Independent Means
Chapter 13 – GLM 2: Comparing Means Adjusted for Other Predictors
Chapter 14 – GLM 3: Factorial Designs
Chapter 15 – GLM 4: Repeated-Measures Designs
Chapter 16 – GLM 5: Mixed Designs
Assumptions + Violations
Effect Sizes
Null hypotheses
Contrasts
Lectures
Field, Statistics, 2019 | Field 5th ed. | Romy V.
Chapter 2 – The Spine of Statistics
The standard error
The standard error of the mean is the standard deviation of sample means. As such, it is a
measure of how representative of the population a sample mean is likely to be. A large
standard error (relative to the sample mean) means that there is a lot of variability
between the means of different samples and so the sample mean we have might not be
representative of the population mean. A small standard error indicates that most sample
means are similar to the population mean (i.e., our sample mean is likely to accurately
reflect the population mean).
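As a quick illustration, the standard error of the mean can be estimated as the sample standard deviation divided by the square root of the sample size. A minimal sketch in Python; the scores are made up for illustration:

```python
import math
import statistics

# Hypothetical sample of scores (illustrative data, not from the text).
scores = [1, 2, 3, 4, 5]

n = len(scores)
s = statistics.stdev(scores)   # sample SD (n - 1 in the denominator)
se = s / math.sqrt(n)          # standard error of the mean: SE = s / sqrt(n)

print(round(se, 2))            # prints 0.71
```

Note that the standard error shrinks as n grows: larger samples give more precise estimates of the population mean.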
Confidence intervals
A confidence interval for the mean is a range of scores constructed such that the
population mean will fall within this range in 95% of samples.
The confidence interval is not an interval within which we are 95% confident that the
population mean will fall.
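For large samples, a 95% confidence interval can be sketched as the sample mean plus or minus 1.96 standard errors (an assumption for simplicity; small samples should use the t-distribution instead of 1.96). Using the same kind of made-up data:

```python
import math
import statistics

# Hypothetical sample (illustrative data, not from the text).
scores = [1, 2, 3, 4, 5]

mean = statistics.mean(scores)
se = statistics.stdev(scores) / math.sqrt(len(scores))

# 95% CI under a normal approximation: mean ± 1.96 * SE.
lower = mean - 1.96 * se
upper = mean + 1.96 * se

print(round(lower, 2), round(upper, 2))  # prints 1.61 4.39
```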
Null hypothesis significance testing (NHST)
NHST is a method for assessing scientific theories. The basic idea is that we have two
competing hypotheses: one says that an effect exists (the alternative hypothesis) and the
other says that the effect doesn’t exist (the null hypothesis). We compute a test statistic
that represents the alternative hypothesis and calculate the probability that we would get
a value as big as the one we have if the null hypothesis were true. If this probability is less
than 0.05 we reject the idea that there is no effect and say that we have a statistically
significant finding. If the probability is greater than 0.05 we do not reject the idea that
there is no effect, and we say that we have a non-significant finding.
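The decision rule above can be sketched with a one-sample z-test (a simplification: Field's examples typically use t-tests, but the logic is identical). The scores and the null value of 100 are hypothetical:

```python
import math
from statistics import NormalDist, mean, stdev

# Hypothetical data: do these scores differ from a null value of 100?
scores = [104, 108, 99, 112, 107, 103, 110, 105, 101, 109]
null_mean = 100

# Test statistic: observed effect divided by its standard error.
z = (mean(scores) - null_mean) / (stdev(scores) / math.sqrt(len(scores)))

# Two-tailed p-value under the null (normal approximation; a small sample
# like this would properly use the t-distribution).
p = 2 * (1 - NormalDist().cdf(abs(z)))

print("significant" if p < 0.05 else "non-significant")  # prints significant
```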
We can make two types of error: we can
believe that there is an effect when, in reality,
there isn’t (a Type I error); and we can believe
that there is not an effect when, in reality, there
is (a Type II error).
The power of a statistical test is the probability
that it will find an effect when one exists.
The significance of a test statistic is directly linked to the sample size: the same effect will have different p-values in different-sized samples. Small differences can be deemed ‘significant’ in large samples, and large effects might be deemed ‘non-significant’ in small samples.
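This dependence on sample size can be demonstrated by testing the same standardized effect at two sample sizes (a sketch using a one-sample z-test under a normal approximation; the effect size d = 0.2 and the sample sizes are illustrative):

```python
import math
from statistics import NormalDist

# The same 'small' standardized effect (d = 0.2) tested at two sample sizes.
d = 0.2
for n in (20, 1000):
    z = d * math.sqrt(n)                 # the test statistic grows with sqrt(n)
    p = 2 * (1 - NormalDist().cdf(z))    # two-tailed p-value under the null
    print(n, round(p, 4))
```

With n = 20 the effect is non-significant (p ≈ 0.37); with n = 1000 the identical effect is highly significant (p < 0.001).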
Exercises
The sum of squared errors (SS) is a ‘total’ and is, therefore, affected by the number of data points. The variance is the ‘average’ variability, but in squared units. The standard deviation is the square root of the variance, which converts it back to the original units of measurement. As such, the size of the standard deviation can be compared to the mean (because they are in the same units of measurement).
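The three quantities can be computed directly; the scores below are toy data for illustration:

```python
# Toy sample (illustrative data, not from the text).
scores = [2, 4, 4, 4, 5, 5, 7, 9]

mean = sum(scores) / len(scores)             # = 5.0
ss = sum((x - mean) ** 2 for x in scores)    # SS: a total, grows with n
variance = ss / (len(scores) - 1)            # 'average' variability, units squared
sd = variance ** 0.5                         # back in the original units

print(ss, round(variance, 2), round(sd, 2))  # prints 32.0 4.57 2.14
```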
Chapter 6 – The Beast of Bias
Skewness and kurtosis
To check that the distribution of scores is
approximately normal, look at the values of
skewness and kurtosis in the output.
Positive values of skewness indicate a pile-up of low
scores in the distribution (skewed to the right: the tail
points towards high scores), whereas negative values
indicate a build-up of high scores (skewed to the left).
Positive values of kurtosis indicate a heavy-tailed distribution,
whereas negative values indicate a light-tailed distribution.
The further the value is from zero, the more likely it is that the
data are not normally distributed.
You can convert these values to z-scores by dividing
them by their standard error. If the resulting score (when
you ignore the minus sign) is greater than 1.96 then it is significant (p < 0.05).
Significance tests of skew and kurtosis should not be used in large samples (because they
are likely to be significant even when skew and kurtosis are not too different from normal).
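A sketch of the conversion, using hypothetical skewness and kurtosis values and standard errors of the kind SPSS reports in its descriptives output:

```python
# Hypothetical values as SPSS would report them (illustrative numbers).
skewness, se_skewness = 1.23, 0.34
kurtosis, se_kurtosis = -0.48, 0.67

# Divide each statistic by its standard error to get a z-score.
z_skew = skewness / se_skewness
z_kurt = kurtosis / se_kurtosis

for name, z in [("skewness", z_skew), ("kurtosis", z_kurt)]:
    verdict = "significant" if abs(z) > 1.96 else "not significant"
    print(name, round(z, 2), verdict)
```

Here the skewness z-score (3.62) exceeds 1.96 and is significant, while the kurtosis z-score (-0.72) is not; remember that this shortcut is unreliable in large samples.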
Normality tests
The Kolmogorov-Smirnov (K-S) test can be used (but shouldn’t be) to see if a distribution
of scores significantly differs from a normal distribution.
If the K-S test is significant (Sig. in the SPSS table is less than 0.05) then the scores are
significantly different from a normal distribution. Otherwise, scores are approximately
normally distributed.
The Shapiro-Wilk test does much the same thing, but it has more power to detect
differences from normality (so this test might be significant when the K-S test is not).
Warning: In large samples these tests can be significant even when the scores are only
slightly different from a normal distribution. Therefore, I don’t particularly recommend
them, and they should always be interpreted in conjunction with histograms, P-P¹ or Q-Q²
plots, and the values of skew and kurtosis.
Homogeneity of variance
Homogeneity of variance/homoscedasticity is the assumption that the spread of outcome
scores is roughly equal at different points on the predictor variable.
The assumption can be evaluated by looking at a plot of the standardized predicted values
from your model against the standardized residuals (zpred vs. zresid).
When comparing groups, this assumption can be tested with Levene’s test and the
variance ratio (Hartley’s Fmax).
¹ P-P plot: a probability plot for assessing how closely two data sets agree, which plots the two cumulative distribution functions against each other. Used to evaluate the skewness of a distribution.
² Q-Q plot: a probability plot, which is a graphical method for comparing two probability distributions by plotting their quantiles against each other. Also used to compare the shape of distributions.
o If Levene’s test is significant (Sig. in the SPSS table is less than 0.05) then the
variances are significantly different in different groups. Otherwise, homogeneity of
variance can be assumed.
o The variance ratio is the largest group variance divided by the smallest. This value
needs to be smaller than the critical values in the additional material.
Warning: There are good reasons not to use Levene’s test or the variance ratio. In large
samples they can be significant when group variances are similar, and in small samples
they can be non-significant when group variances are very different.
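The variance ratio itself is simple to compute (hypothetical groups for illustration; the critical value it is compared against depends on the number of groups and the group size):

```python
import statistics

# Hypothetical scores in three groups (illustrative data, not from the text).
groups = {
    "A": [4, 5, 6, 5, 4],
    "B": [2, 5, 8, 3, 7],
    "C": [5, 5, 6, 4, 5],
}

# Sample variance of each group.
variances = {name: statistics.variance(xs) for name, xs in groups.items()}

# Hartley's Fmax: largest group variance divided by the smallest.
fmax = max(variances.values()) / min(variances.values())

print(round(fmax, 2))  # prints 13.0
```

Here group B is far more spread out than group C, so the ratio is large and homogeneity of variance looks doubtful.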
Exercises
An effect size is an objective and standardized measure of the magnitude of an observed
effect. Measures include Cohen’s d, Pearson’s correlation coefficient r, and eta squared (η²).
o An important advantage of effect sizes is that they are not directly affected by sample
size. In contrast, p-values tend to get smaller (for a given effect size) as the sample
size increases.
Effect sizes are based on the standard deviation (e.g., Cohen’s d expresses the difference
between two group means in units of standard deviation), whereas test statistics divide
the raw effect by the standard error. Thus, small effects can be statistically significant as
long as the sample is large. As a consequence, statistically significant effects are not
always practically relevant. It is recommended to report p-values, CIs, and effect sizes,
because the three measures provide complementary information.
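As a sketch, Cohen’s d for two hypothetical groups, using the pooled standard deviation (one common choice; using the control group’s SD instead is another):

```python
import math
import statistics

# Hypothetical scores for two groups (illustrative data, not from the text).
group1 = [10, 12, 14, 16, 18]
group2 = [13, 15, 17, 19, 21]

m1, m2 = statistics.mean(group1), statistics.mean(group2)
v1, v2 = statistics.variance(group1), statistics.variance(group2)
n1, n2 = len(group1), len(group2)

# Pooled standard deviation across the two groups.
s_pooled = math.sqrt(((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2))

# Cohen's d: mean difference in units of standard deviation.
d = (m2 - m1) / s_pooled

print(round(d, 2))  # prints 0.95
```

Note that d depends only on the means and standard deviations, not directly on the sample size, which is why it complements the p-value.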
Power is the probability that a test will detect an effect of a particular size (a value of 0.8 is
a good level to aim for).