➔Introduction Lecture:
Aim of the course:
1. Prepare you for the use of statistics in practical work and for the period after
2. Learn how to apply the most commonly used statistical analysis techniques in a
responsible way (i.e., by checking the underlying assumptions)
3. At the end, you should also be better able to judge the statistical facets of research
carried out by others.
Why do we need advanced techniques in statistics?
- Multivariable regression models can be used to adjust for confounding and to
assess effect modification.
- Multivariable = multiple = many independent variables
- Some research questions involve:
Course content:
- ANOVA and ANCOVA
- Linear regression
- Logistic regression
- Survival analysis
- Linear marginal and multilevel models
NB: These techniques are not only tests; they are statistical models!
Statistical models:
- A statistical model is NOT a mere test, but a mathematical “simplification or
approximation of reality”.
- George Box said that “All models are wrong, but some are useful”.
- There are three possible purposes for a statistical model:
1. Prediction = to accurately predict an outcome from a set of predictors →
prognostic research
2. Explanation = to estimate (causal) effects of risk factors by means of
adjusted effect estimation → etiological research
3. Description = to capture the association between outcome and independent
variables → lack of (formal) underlying causal theory.
How to choose the appropriate statistical technique for a given research question:
1. Identify the dependent variable
2. Identify the independent variable(s)
3. Determine the nature of each variable:
- Level of measurement:
1. Nominal: Unordered categories
2. Ordinal: Ordered categories
3. Interval: numerical variables where the intervals between values
are meaningful, but there is no true absolute zero
4. Ratio: numerical variables with equal intervals between values
and a true absolute zero.
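The difference between an interval and a ratio scale can be made concrete with temperature, a standard textbook illustration (not taken from the lecture itself). The sketch below, using a hypothetical helper `celsius_to_kelvin`, shows that differences agree on the Celsius (interval) and Kelvin (ratio) scales, while ratios only make sense on Kelvin, which has a true absolute zero.

```python
# Illustration only: Celsius is an interval scale (0 C is not "no temperature"),
# Kelvin is a ratio scale (0 K is a true absolute zero).

def celsius_to_kelvin(c: float) -> float:
    """Hypothetical helper: convert degrees Celsius to kelvin."""
    return c + 273.15

low_c, high_c = 10.0, 20.0

# Differences are meaningful on both scales: a 10 C step is a 10 K step.
diff_c = high_c - low_c
diff_k = celsius_to_kelvin(high_c) - celsius_to_kelvin(low_c)

# Ratios are not: 20 C is not "twice as warm" as 10 C.
ratio_c = high_c / low_c
ratio_k = celsius_to_kelvin(high_c) / celsius_to_kelvin(low_c)

print(f"difference: {diff_c} C vs {diff_k:.2f} K")
print(f"ratio: {ratio_c} on Celsius vs {ratio_k:.3f} on Kelvin")
```

Only the Kelvin ratio describes a physically meaningful "how many times as much", which is why ratio statements require a ratio scale.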
- How many within-subject and between-subject factors:
1. Within-subject factor: For each subject, the outcome is measured
repeatedly under different levels or conditions of the factor (e.g.,
time, experimental conditions)
2. Between-subject factor: For each subject, the outcome is
measured under only one level or condition of the factor (e.g.,
exposure in a cross-sectional study, treatment group in a randomized
controlled trial).
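The definitions above suggest a simple rule: a factor is within-subject if each subject is observed at more than one of its levels, and between-subject if each subject is observed at exactly one level. The sketch below applies that rule to data in long format (one row per measurement). The function `classify_factor`, its column names, and the small heartbeat data set are hypothetical and only illustrate the idea.

```python
from collections import defaultdict

def classify_factor(records, subject_key, factor_key):
    """Hypothetical helper: classify a factor as within- or between-subject.

    records: list of dicts, one per measurement (long format).
    Returns "within" if every subject is measured at more than one level,
    "between" if every subject is measured at exactly one level,
    and "mixed" otherwise (e.g., partly repeated or unbalanced designs).
    """
    levels_per_subject = defaultdict(set)
    for row in records:
        levels_per_subject[row[subject_key]].add(row[factor_key])
    counts = [len(levels) for levels in levels_per_subject.values()]
    if all(c > 1 for c in counts):
        return "within"
    if all(c == 1 for c in counts):
        return "between"
    return "mixed"

# Made-up example: each subject is measured at two time points
# but belongs to a single stress group.
data = [
    {"id": 1, "time": "baseline", "stress": "low", "bpm": 68},
    {"id": 1, "time": "follow-up", "stress": "low", "bpm": 70},
    {"id": 2, "time": "baseline", "stress": "high", "bpm": 81},
    {"id": 2, "time": "follow-up", "stress": "high", "bpm": 84},
]

print(classify_factor(data, "id", "time"))    # -> within
print(classify_factor(data, "id", "stress"))  # -> between
```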
Choice of statistical technique:
- The choice of statistical technique depends on the measurement level of the variables
and on the number of within- and between-subject factors.
➔ANOVA and ANCOVA Theoretical Lecture:
Outline of this topic:
1. Two-sample t-test
2. One-way ANOVA
3. Multi-way ANOVA
4. ANCOVA
The statistical technique to be used mainly depends on:
1. How the study is designed
2. The measurement level of the variables (in the example below: stress and heartbeat)
Example with a simple research question: Does the stress level affect heartbeat?
Heartbeat (bpm) is measured in two groups: people with low stress and people with high
stress. Which statistical technique should we use to analyze the data?
- Two-sample t-test
What if we define 3 levels of stress (e.g., low, medium, high)?
- One-way ANOVA
What if we also consider regular physical exercise as a second factor?
- Multi-way ANOVA
What if we adjust for age (years)?
- ANCOVA
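The sequence of questions above follows a simple decision path. The sketch below encodes that path for the heartbeat example only; the function name `choose_technique` and its parameters are hypothetical, and it assumes a continuous outcome (heartbeat in bpm) with independent observations, i.e., no within-subject factors.

```python
def choose_technique(n_groups: int, n_factors: int = 1, n_covariates: int = 0) -> str:
    """Hypothetical sketch of the decision path in the heartbeat example.

    n_groups:     number of levels of the main categorical factor (e.g., stress)
    n_factors:    number of categorical between-subject factors
    n_covariates: number of continuous covariates to adjust for (e.g., age)
    Assumes a continuous outcome and independent observations.
    """
    if n_groups < 2:
        raise ValueError("need at least two groups to compare")
    if n_covariates > 0:
        return "ANCOVA"            # adjusting for a continuous covariate
    if n_factors > 1:
        return "Multi-way ANOVA"   # more than one categorical factor
    if n_groups == 2:
        return "Two-sample t-test" # one factor with two levels
    return "One-way ANOVA"         # one factor with three or more levels

print(choose_technique(2))                                # low vs high stress
print(choose_technique(3))                                # low, medium, high
print(choose_technique(3, n_factors=2))                   # + physical exercise
print(choose_technique(3, n_factors=2, n_covariates=1))   # + age
```

Checking for covariates first mirrors the lecture's last step: once age is added, ANCOVA applies regardless of how many categorical factors are in the model.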
Part 1: Two-sample t-test:
- The two-sample t-test compares the means of 2 groups
- The quantity of interest is the difference between the two group means.
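For reference, the pooled (equal-variance) two-sample t statistic in its standard textbook form divides the observed mean difference by its estimated standard error, under H0: μ1 = μ2. Here x̄, s² and n denote each group's sample mean, sample variance and size.

```latex
t = \frac{\bar{x}_1 - \bar{x}_2}{s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}},
\qquad
s_p^2 = \frac{(n_1 - 1)\, s_1^2 + (n_2 - 1)\, s_2^2}{n_1 + n_2 - 2},
\qquad
df = n_1 + n_2 - 2
```

Under H0, and given the assumptions below, t follows a t distribution with n1 + n2 − 2 degrees of freedom.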
Two-sample t-test: Assumptions:
1. Independence: Observations are independent within and between groups (e.g., no
repeated measures of heartbeat per participant).
2. Normality: The dependent variable follows (approximately) a normal distribution
within each group.
3. Equal variances: The variance of the dependent variable is (approximately) the
same in the two groups (required only for the pooled t-test; Welch's t-test does not
assume equal variances).
4. Measurement level: The dependent variable is measured on at least an interval scale.
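Assumption 3 decides which version of the test statistic is used. The sketch below computes both the pooled t statistic (equal variances) and Welch's t statistic (unequal variances, with Welch-Satterthwaite degrees of freedom) using only the Python standard library. The function names `pooled_t` and `welch_t` are hypothetical; in practice a statistics package would be used.

```python
import math
from statistics import mean, variance  # variance() uses the n - 1 denominator

def pooled_t(x, y):
    """Pooled (equal-variance) two-sample t statistic and its degrees of freedom."""
    n1, n2 = len(x), len(y)
    # pooled variance: weighted average of the two sample variances
    sp2 = ((n1 - 1) * variance(x) + (n2 - 1) * variance(y)) / (n1 + n2 - 2)
    se = math.sqrt(sp2 * (1 / n1 + 1 / n2))
    return (mean(x) - mean(y)) / se, n1 + n2 - 2

def welch_t(x, y):
    """Welch (unequal-variance) t statistic with Welch-Satterthwaite df."""
    n1, n2 = len(x), len(y)
    v1, v2 = variance(x) / n1, variance(y) / n2  # squared standard errors
    t = (mean(x) - mean(y)) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df

print(pooled_t([1, 2, 3], [4, 5, 6]))
print(welch_t([1, 2, 3], [4, 5, 6]))
```

With equal group sizes and equal sample variances, as in this toy input, both versions give the same t and df. When the variances differ, Welch's df is smaller, which makes the test more conservative. Each group needs at least two observations for `variance()`.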
Statistical hypothesis testing:
1. State the null and alternative hypothesis
2. Collect the data and provide a summary
3. Choose a significance level (α), e.g., 0.05
4. Compute the test statistic
5. Take a decision
6. Draw a conclusion
→ The assumptions underlying the test should be evaluated before drawing
conclusions!
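Steps 5 and 6 can be sketched as follows, assuming a t statistic and its degrees of freedom have already been computed in step 4. To stay within the standard library, the two-sided p-value is obtained by numerically integrating the Student t density with Simpson's rule; this is a teaching sketch, not how statistical software computes it. The names `t_two_sided_p` and `decide` are hypothetical.

```python
import math

def t_two_sided_p(t: float, df: float, steps: int = 2000) -> float:
    """Two-sided p-value P(|T| >= |t|) for a Student t distribution with df degrees of freedom.

    Integrates the t density from 0 to |t| with composite Simpson's rule
    (steps must be even). Accurate enough for illustration only.
    """
    # log of the density's normalising constant; lgamma avoids overflow for large df
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))

    def density(x: float) -> float:
        return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

    a, b = 0.0, abs(t)
    h = (b - a) / steps
    total = density(a) + density(b)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * density(a + i * h)
    area = total * h / 3  # P(0 <= T <= |t|)
    return max(0.0, 1.0 - 2.0 * area)

def decide(t: float, df: float, alpha: float = 0.05) -> str:
    """Step 5: compare the p-value with the chosen significance level."""
    p = t_two_sided_p(t, df)
    if p < alpha:
        return f"p = {p:.4f} < {alpha}: reject H0 (means differ)"
    return f"p = {p:.4f} >= {alpha}: do not reject H0"

print(decide(2.5, 18))
print(decide(0.5, 18))
```

The decision in step 5 is only the statistical part; step 6 translates it back into the research question (e.g., whether stress level affects heartbeat), after the assumptions have been checked.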