(0) Fundamentals in Stats1
● Sampling from a population → simple random, systematic, stratified, cluster, convenience etc
● Descriptive statistics: summarise sample or population data with numbers, tables and graphs
● Inferential statistics: make predictions about population parameters, based on sample data
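As a rough illustration of three of the sampling schemes listed above, the sketch below draws a simple random, a systematic and a stratified sample. The population of 100 pupils over 4 grades and all function names are hypothetical, chosen only for this example.

```python
import random

def simple_random_sample(population, n, rng):
    """Every unit has the same chance of being selected."""
    return rng.sample(population, n)

def systematic_sample(population, n, rng):
    """Pick a random start, then take every k-th unit."""
    k = len(population) // n              # sampling interval
    start = rng.randrange(k)
    return [population[start + i * k] for i in range(n)]

def stratified_sample(population, stratum_of, n_per_stratum, rng):
    """Draw a simple random sample separately within each stratum."""
    strata = {}
    for unit in population:
        strata.setdefault(stratum_of(unit), []).append(unit)
    sample = []
    for units in strata.values():
        sample.extend(rng.sample(units, n_per_stratum))
    return sample

rng = random.Random(1)
# Hypothetical population: (pupil_id, grade) for 100 pupils in grades 1..4
population = [(pupil_id, pupil_id % 4 + 1) for pupil_id in range(100)]

srs = simple_random_sample(population, 20, rng)
sys_s = systematic_sample(population, 20, rng)
strat = stratified_sample(population, lambda unit: unit[1], 5, rng)
```

Stratifying guarantees that each grade contributes exactly 5 pupils, which a simple random sample does not.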
(1) Associations and Causality
Spurious association: a connection between two variables that appears to be causal but is not
● Correlation ≠ causation
1.1 Criteria for Establishing Causality
John Stuart Mill (1843)
We can only argue that B is caused by A if…
1. There is a relationship between A and B (association)
2. B must take place after A (appropriate time order)
3. The association between A and B is not explained by other factors (elimination of alternative
explanations)
1.1.1 Eliminating Alternative Explanations
→ control for other variables to eliminate their effects
Experimental Control: in research design
● Randomised Controlled Trial (RCT) is often considered the gold standard
○ Time order is manipulated by the researcher (criterion 2)
○ Alternative explanations are (partially) excluded through randomisation (criterion 3)
○ Groups must be equal, on average, on both observable and non-observable characteristics
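A minimal simulation of why randomisation helps with non-observable characteristics. The trait "motivation", the sample size and the seed are assumptions made for this sketch only.

```python
import random
from statistics import mean

rng = random.Random(42)
n = 10_000
# Hypothetical unobserved trait (e.g. motivation), never seen by the researcher
motivation = [rng.gauss(0, 1) for _ in range(n)]

# Random assignment: shuffle participant indices, the first half gets the treatment
indices = list(range(n))
rng.shuffle(indices)
treated = set(indices[: n // 2])

mean_treated = mean(motivation[i] for i in treated)
mean_control = mean(motivation[i] for i in range(n) if i not in treated)
difference = mean_treated - mean_control
print(f"difference in unobserved trait: {difference:.3f}")
```

With groups this large the difference should be close to zero, even though the trait was never measured or used in the assignment.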
Statistical Control: in data-analysis strategy
● Option 1: examine X-Y relationship within subgroups (based on other variables)
→ often unrealistic
● Option 2: include alternative explanations in statistical model
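Option 2 is commonly written as a regression model with the alternative explanation added as a covariate. The symbol Z for that third variable, and the linear form, are assumptions of this sketch.

```latex
Y_i = \beta_0 + \beta_1 X_i + \beta_2 Z_i + \varepsilon_i
```

Here \beta_1 is the X-Y association with Z held constant, so it can differ from the slope obtained when Z is left out of the model.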
1.2 Multivariate Associations
Involves evaluating multiple variables (more than two) to identify any possible association
among them
● Important to recognise relevant alternative explanations → know your theory
● Adjust your statistical analyses and interpretation accordingly → know your statistics
→ to avoid biased results due to lurking variables
Types of Multivariate Associations
1.2.1 Spurious Associations
When both variables are also related to a third variable and the association between X
and Y disappears (mostly) when controlling for the third variable
Eg the association between height and maths skills is fully explained by school grade
→ Consequently, estimated association between variables can change dramatically depending on the
data analysis strategy
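The height and maths example can be sketched with made-up data. The numbers below are constructed by hand (not real measurements) so that height and maths are unrelated within each grade, while both increase with grade.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long lists."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical within-grade deviations, chosen to be uncorrelated
height_dev = [-2, -1, 0, 1, 2]
maths_dev = [1, -2, 2, -2, 1]

pupils = []  # (grade, height_cm, maths_score)
for grade in range(1, 5):
    for dh, dm in zip(height_dev, maths_dev):
        pupils.append((grade, 114 + 6 * grade + dh, 30 + 10 * grade + dm))

# Ignoring grade: strong positive association
pooled_r = pearson_r([p[1] for p in pupils], [p[2] for p in pupils])

# Controlling for grade by looking within each grade: association disappears
within_r = {
    g: pearson_r([p[1] for p in pupils if p[0] == g],
                 [p[2] for p in pupils if p[0] == g])
    for g in range(1, 5)
}
print(pooled_r, within_r)
```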
1.2.2 Suppression
Sometimes, we find (almost) no association between X and Y until we control for a
third variable
→ eg, the association between an intervention and language skills was suppressed because the
intervention group scored lower on the language-skills pretest
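A small sketch of suppression with invented scores: the intervention adds 5 points, but the intervention group starts 5 points lower on the pretest, so the raw posttest difference is zero. The adjusted coefficient is computed by residualising on the pretest (the Frisch-Waugh-Lovell approach), one possible way to control for a covariate with simple regressions.

```python
def slope(xs, ys):
    """Least-squares slope of ys regressed on xs."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sum((x - mx) ** 2 for x in xs)

def residuals(xs, ys):
    """Residuals of ys after regressing on xs (with intercept)."""
    b = slope(xs, ys)
    a = sum(ys) / len(ys) - b * sum(xs) / len(xs)
    return [y - (a + b * x) for x, y in zip(xs, ys)]

def adjusted_slope(x, y, z):
    """Coefficient of x in the regression y ~ x + z."""
    return slope(residuals(z, x), residuals(z, y))

# Hypothetical data: first 4 pupils are controls, last 4 received the intervention
treat = [0, 0, 0, 0, 1, 1, 1, 1]
pretest = [50, 55, 60, 65, 45, 50, 55, 60]           # intervention group starts lower
posttest = [p + 5 * t for p, t in zip(pretest, treat)]

raw_effect = slope(treat, posttest)                   # equals the difference in group means
adjusted_effect = adjusted_slope(treat, posttest, pretest)
print(raw_effect, adjusted_effect)
```

The raw effect is zero, while the effect adjusted for the pretest recovers the 5 points built into the data.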
1.2.3 Simpson’s Paradox – direct-indirect
Relationship between X and Y is even reversed within levels of a third variable
→ on average, there is a negative association between typing speed and typos: experienced typists type
faster and make fewer typos. At the individual level, there is a positive association: the faster you
type, the more typos you make
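The typing example can be mimicked with invented numbers. As a simplification, this sketch uses experience group (novice vs expert) as the third variable; all values are hypothetical.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long lists."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical data: typing speed (words per minute) and typos per page
novice_speed = [30, 35, 40, 45, 50]
novice_typos = [10, 12, 11, 14, 13]
expert_speed = [60, 65, 70, 75, 80]
expert_typos = [2, 4, 3, 6, 5]

# Ignoring experience: faster typing goes with fewer typos
r_pooled = pearson_r(novice_speed + expert_speed, novice_typos + expert_typos)

# Within each experience group: faster typing goes with more typos
r_novice = pearson_r(novice_speed, novice_typos)
r_expert = pearson_r(expert_speed, expert_typos)
print(r_pooled, r_novice, r_expert)
```

The sign of the association flips once the data are split by the third variable.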
1.2.4 Chain Relations – mediation
X1 causes X2, X2 causes Y
→ important to identify the ‘working mechanisms’ of an intervention
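A simulated chain X1 → X2 → Y, as a sketch only. The path strengths (0.5 and 2.0), the sample size and the seed are arbitrary assumptions; the decomposition into direct and indirect effects shown here is the usual product-of-coefficients approach for linear models.

```python
import random

def slope(xs, ys):
    """Least-squares slope of ys regressed on xs."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sum((x - mx) ** 2 for x in xs)

def residuals(xs, ys):
    """Residuals of ys after regressing on xs (with intercept)."""
    b = slope(xs, ys)
    a = sum(ys) / len(ys) - b * sum(xs) / len(xs)
    return [y - (a + b * x) for x, y in zip(xs, ys)]

def adjusted_slope(x, y, z):
    """Coefficient of x in the regression y ~ x + z."""
    return slope(residuals(z, x), residuals(z, y))

rng = random.Random(0)
n = 5000
x1 = [rng.gauss(0, 1) for _ in range(n)]               # e.g. intervention dose
x2 = [0.5 * v + rng.gauss(0, 1) for v in x1]           # mediator, caused by X1
y = [2.0 * m + rng.gauss(0, 1) for m in x2]            # outcome, caused only by X2

total = slope(x1, y)                    # X1-Y association, ignoring X2
direct = adjusted_slope(x1, y, x2)      # X1-Y association, controlling for X2
a = slope(x1, x2)                       # path X1 -> X2
b = adjusted_slope(x2, y, x1)           # path X2 -> Y, controlling for X1
indirect = a * b
print(total, direct, indirect)
```

Because Y depends on X1 only through X2, the direct effect should be near zero and the total effect should be carried almost entirely by the indirect path.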
1.2.5 Statistical Interaction (Moderation)
The association between X1 and Y differs across levels of X2
Options
● No association between X and Y
○ But an association does exist within subpopulations based on X2: eg, positive and negative
effects in different subpopulations cancel each other out
● Positive association between X and Y
○ But with different strengths, or even a negative or non-existent association, within
subpopulations based on X2
● Negative association between X and Y
○ But with different strengths, or even a positive or non-existent association, within
subpopulations based on X2
Note: the average X-Y association does not necessarily reflect the association in all subpopulations
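The first option above (no overall association, but opposite effects within subpopulations) can be sketched with constructed data. The two subgroups and their slopes of +2 and -2 are hypothetical.

```python
def slope(xs, ys):
    """Least-squares slope of ys regressed on xs."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sum((x - mx) ** 2 for x in xs)

x = [1, 2, 3, 4, 5]
# Hypothetical moderator X2 with two levels, "a" and "b"
y_a = [2 * v for v in x]     # subpopulation a: positive effect of X on Y
y_b = [-2 * v for v in x]    # subpopulation b: negative effect of X on Y

slope_a = slope(x, y_a)
slope_b = slope(x, y_b)
slope_pooled = slope(x + x, y_a + y_b)   # ignoring X2
print(slope_a, slope_b, slope_pooled)
```

The pooled slope is zero even though X is clearly related to Y in both subpopulations, which is why the average association can be misleading.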