Table of contents
Lecture 1 – Basic stats
Lecture 2: Introduction and Linear Regression
Lecture 3 – Refreshing Linear Regression
Lecture 4: Multiple regression
Lecture 5: Regression with categorical predictors
Lecture 6: Regression Assumptions
Lecture 7: How do we approach causality?
Lecture 8: Mediation
Lecture 9: Moderation and interaction
Lecture 10: Moderation with PROCESS ()
Lecture 11 & 12: Logistic regression
Lecture 1 – Basic stats
Two types of statistics
1. Descriptive
2. Inferential statistics (central limit theorem & null hypothesis)
Descriptive statistics
We want to summarize the variables in as little information as possible, while
still providing enough information to understand them. We use different methods
for each type of variable.
Variables fall into two broad types, each with two levels of measurement:
1. Continuous
   Interval: numeric without a meaningful 0
   Ratio: numeric with a meaningful 0
2. Categorical
   Nominal: categories without a rank order
   Ordinal: categories with a rank order
| Type | Level of measurement | Properties |
|---|---|---|
| Categorical / Qualitative | Nominal – names (no order; e.g. gender, eye colour) | Category without order, no meaningful intervals, and no true zero. |
| Categorical / Qualitative | Ordinal – order (ordered, but no clear distance between values; e.g. education level) | Category with order. You do not know the exact differences between categories. Calculating a mean is not meaningful. |
| Quantitative | Interval – interval between values (equal distances, no true zero; e.g. temperature) | Ordered with equal intervals, but no true zero. |
| Quantitative | Ratio – starts at 0 (equal distances; e.g. length, weight, age) | Ordered with equal intervals and a meaningful zero. |
THIS IS IMPORTANT!!!
| Type of data | Goal | Type of chart | Properties |
|---|---|---|---|
| Categorical (nominal) | Compare frequencies | Bar chart / Pie chart | Shows the number of observations per category; illustrates the proportion between categories |
| Categorical | See the relationship between two variables | Cross-table | Shows the relationship between two or more categorical variables by displaying the frequencies of different combinations of values |
| Categorical (ordinal) | Show distribution | Bar chart | Shows the number of observations per category |
| Categorical × Categorical | Relationship between two ordinal variables | Bar chart | Shows the number of observations per category |
| Quantitative (interval / ratio) | Show spread or distribution | Histogram | For continuous data such as height or temperature; shows how values are distributed |
| Quantitative × Quantitative | Relationship between two quantitative variables | Scatterplot | Shows the relationship between two quantitative variables by plotting points on a graph, with one variable on the x-axis and the other on the y-axis |
Descriptive statistics – categorical
Frequency table: A list of all possible options for a variable, along with the
number of observations for each option. You can report absolute frequencies
(counts), relative frequencies (proportions or percentages), or both.
The frequency table shows at a glance which categories are most common and
which are least common.
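As a minimal sketch of a frequency table, the Python snippet below counts a small, made-up list of eye colours and reports absolute and relative frequencies. The data and the helper name `frequency_table` are assumptions for illustration only; they do not come from the lecture.

```python
from collections import Counter

# Hypothetical observations of a nominal variable (eye colour).
eye_colours = ["brown", "blue", "brown", "green", "brown", "blue"]


def frequency_table(values):
    """Return {category: (absolute count, relative frequency)}, most common first."""
    counts = Counter(values)
    total = len(values)
    return {category: (n, n / total) for category, n in counts.most_common()}


table = frequency_table(eye_colours)
for category, (absolute, relative) in table.items():
    # One row per category: name, count, percentage of all observations.
    print(f"{category:<6} {absolute:>3} {relative:6.1%}")
```

Sorting by count puts the most common category first, which is usually the first thing you want to read off the table.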
Cross-tables: Show the relationship between two or more categorical variables
by displaying the frequencies of the different combinations of values.
Row proportions: Each cell count divided by its row total, often expressed as a
percentage. They show how the observations in one row are distributed over the
column categories, so each row adds up to 100%.
Column proportions: Each cell count divided by its column total, often
expressed as a percentage. They show how the observations in one column are
distributed over the row categories, so each column adds up to 100%.
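The difference between row and column proportions is easiest to see on a small example. The sketch below builds a cross-table from made-up (gender, education) pairs and divides each cell once by its row total and once by its column total. All data and variable names are hypothetical.

```python
from collections import Counter

# Hypothetical paired observations: (gender, education level).
observations = (
    [("female", "high")] * 3
    + [("female", "low")] * 1
    + [("male", "high")] * 2
    + [("male", "low")] * 4
)

counts = Counter(observations)  # cell frequencies of the cross-table
rows = sorted({r for r, _ in observations})
cols = sorted({c for _, c in observations})

row_totals = {r: sum(counts[(r, c)] for c in cols) for r in rows}
col_totals = {c: sum(counts[(r, c)] for r in rows) for c in cols}

# Row proportions: cell / row total, so every row sums to 1.
row_props = {(r, c): counts[(r, c)] / row_totals[r] for r in rows for c in cols}
# Column proportions: cell / column total, so every column sums to 1.
col_props = {(r, c): counts[(r, c)] / col_totals[c] for r in rows for c in cols}

for r in rows:
    for c in cols:
        print(f"{r:<6} {c:<4} n={counts[(r, c)]} "
              f"row={row_props[(r, c)]:.2f} col={col_props[(r, c)]:.2f}")
```

In this toy data, 75% of the women are highly educated (row proportion), while 60% of the highly educated respondents are women (column proportion). The same cell answers two different questions depending on the total you divide by.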
When you describe continuous data, focus on three key aspects:
1. Central Tendency – Indicates the “center” of the data
or the most typical value. Common measures: mean
and median.
Mean: the arithmetic average
Median: the middle value of the sorted data
Mode: the most common value
2. Variability – Shows how much the values differ from
each other. Common measures: standard deviation
and quantiles.
3. Distribution Shape – Examines the form of the data
distribution. Check for normality, skewness, or the
presence of tails using histograms or density plots.
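The measures of central tendency and variability listed above can be computed with Python's standard `statistics` module. The ages below are made-up example data, and this is only one possible way to calculate these summaries.

```python
import statistics

# Hypothetical ages of ten respondents.
ages = [21, 22, 22, 23, 24, 25, 25, 25, 30, 43]

# Central tendency
mean = statistics.mean(ages)      # arithmetic average -> 26
median = statistics.median(ages)  # middle of the sorted values -> 24.5
mode = statistics.mode(ages)      # most common value -> 25

# Variability
sd = statistics.stdev(ages)                   # sample SD (divides by n - 1)
q1, q2, q3 = statistics.quantiles(ages, n=4)  # quartiles; q2 is the median

print(f"mean={mean}, median={median}, mode={mode}")
print(f"sd={sd:.2f}, quartiles=({q1}, {q2}, {q3})")
```

The single value 43 pulls the mean above the median, which is a first hint that the distribution has a tail to the right.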
A histogram has a bell shape if the data are symmetrically distributed around
the center. It can also be skewed to the right (positively skewed): the bars are
higher on the left and a long tail stretches out to the right. In that case the
measures typically order as mode < median < mean. If the distribution is skewed
to the left (negatively skewed), the bars are higher on the right with a long
tail to the left, and the order is typically mean < median < mode.
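The ordering of mode, median and mean under skew can be checked on a tiny data set. The numbers below are invented so that one list has a long right tail; mirroring it gives a list with a long left tail. The helper name `centre_order` is an assumption for this sketch. The ordering shown is what these particular data produce, not a guarantee for every skewed distribution.

```python
import statistics


def centre_order(values):
    """Return (mode, median, mean) for a list of numbers."""
    return (
        statistics.mode(values),
        statistics.median(values),
        statistics.mean(values),
    )


# Hypothetical right-skewed data: most values are low, one is very high.
right_skewed = [20, 20, 20, 25, 30, 35, 40, 60, 110]
# Mirror the values to obtain a left-skewed version of the same shape.
left_skewed = [130 - x for x in right_skewed]

print(centre_order(right_skewed))  # mode < median < mean
print(centre_order(left_skewed))   # mode > median > mean
```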
Central limit theorem: When the sample size is large enough (rule of thumb: at
least n = 30) and the samples are randomly drawn, the means of those samples
will be approximately normally distributed. This holds true even if the original
distribution of the data is not normal.
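A small simulation can make the central limit theorem tangible. The sketch below assumes a strongly right-skewed population (an exponential distribution with mean 1, chosen only for illustration), draws many random samples of size n = 30, and compares the skewness of the raw values with the skewness of the sample means. The `skewness` helper is a simple hand-written measure, not a library function.

```python
import random
import statistics


def skewness(values):
    """Mean cubed z-score: about 0 for a symmetric distribution."""
    m = statistics.fmean(values)
    s = statistics.pstdev(values)
    return statistics.fmean([((v - m) / s) ** 3 for v in values])


random.seed(42)  # fixed seed so the sketch is reproducible

# Individual values from a skewed population (exponential, mean 1).
raw = [random.expovariate(1.0) for _ in range(2000)]

# 2000 random samples of size n = 30; keep only the mean of each sample.
n = 30
sample_means = [
    statistics.fmean([random.expovariate(1.0) for _ in range(n)])
    for _ in range(2000)
]

print(f"skewness of raw values:   {skewness(raw):.2f}")
print(f"skewness of sample means: {skewness(sample_means):.2f}")
print(f"mean of sample means:     {statistics.fmean(sample_means):.3f}")
```

The raw values stay heavily skewed, while the sample means are much closer to symmetric and cluster around the population mean of 1. Plotting a histogram of `sample_means` would show the familiar bell shape.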
Sampling distribution: The distribution of a statistic, such as the sample
mean, over many repeated samples of the same size. For the sampling distribution
to be approximately normal, the samples must be randomly drawn. This assumption
is necessary for inferential statistics. However, it is important to realize
that a fully randomized sample is rarely achieved in practice in social science
research. You draw a sample of size 𝑛 from a population of size 𝑁. The SD of the
sampling distribution is called the standard error (SE). It is smaller than the
population SD: SE = SD / √𝑛.
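To see why the standard error is smaller than the population SD, the sketch below compares the standard formula SE = SD / √n with a simulation. It assumes a normally distributed population with mean 100 and SD 15 and samples of size 25; these values are hypothetical and chosen only to keep the arithmetic easy.

```python
import math
import random
import statistics

random.seed(1)  # fixed seed so the sketch is reproducible

mu, sigma = 100.0, 15.0  # hypothetical population mean and SD
n = 25                   # size of each sample

# Theoretical standard error of the mean: 15 / sqrt(25) = 3.0
standard_error = sigma / math.sqrt(n)

# Empirical check: the SD of many sample means should be close to the SE.
means = [
    statistics.fmean([random.gauss(mu, sigma) for _ in range(n)])
    for _ in range(5000)
]
observed_sd = statistics.stdev(means)

print(f"theoretical SE:      {standard_error:.2f}")
print(f"SD of sample means:  {observed_sd:.2f}")
```

Individual values vary with an SD of 15, but sample means vary far less, because high and low values within a sample partly cancel out. Larger samples shrink the standard error further.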
Inferential statistics