Class 1
• Mean
• SD: standard deviation measures how spread-out individual data points are around the
mean.
• SE: standard error measures how accurately the sample mean estimates the population
mean.
• 95% confidence interval: range in which we are 95% confident that the true population
mean lies.
SE = SD / √n
95% CI = M ± 1.96 × SE
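As a minimal sketch of the two formulas, the snippet below computes the mean, SD, SE and a 95% CI using the conventional multiplier 1.96. The function name mean_se_ci and the apple counts are hypothetical, chosen only for illustration.

```python
import math
import statistics

def mean_se_ci(values, z=1.96):
    """Return (mean, SD, SE, (ci_low, ci_high)) for a list of numbers."""
    n = len(values)
    m = statistics.mean(values)
    sd = statistics.stdev(values)   # sample SD (n - 1 in the denominator)
    se = sd / math.sqrt(n)          # SE = SD / sqrt(n)
    return m, sd, se, (m - z * se, m + z * se)

# Hypothetical data: apples eaten in the last 7 days by 9 people
apples = [2, 4, 4, 4, 5, 5, 7, 9, 5]
m, sd, se, (low, high) = mean_se_ci(apples)
print(f"M = {m:.2f}, SD = {sd:.2f}, SE = {se:.2f}, 95% CI = [{low:.2f}, {high:.2f}]")
# prints: M = 5.00, SD = 2.00, SE = 0.67, 95% CI = [3.69, 6.31]
```

With small samples a t-based multiplier is often used instead of 1.96; the z-based version is kept here because it matches the formula in the notes.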
Frequency: how often a value occurs in your data.
Mode: value that occurs the most often.
Percentiles: Divide your data into 100 equal parts.
The xᵗʰ percentile is the value below which x% of the data falls.
Q₁ = 25th percentile → 25% of the data fall below it
Q₃ = 75th percentile → 75% of the data fall below it
IQR = Q₃ – Q₁
Outliers: data points that are unusually far from the rest of the data.
Often defined as below Q₁ – 1.5×IQR or above Q₃ + 1.5×IQR
50th percentile = median
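The quartiles, the IQR and the 1.5×IQR outlier rule can be sketched as follows. The data and the function name iqr_outliers are hypothetical, and the "inclusive" quartile method is an assumption: statistical packages use several slightly different conventions, so Q₁ and Q₃ may differ a little between tools.

```python
import statistics

def iqr_outliers(values):
    """Return (q1, median, q3, iqr, outliers) using the 1.5 x IQR rule."""
    # method="inclusive" is one of several quartile conventions (an assumption here)
    q1, q2, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    outliers = [v for v in values if v < low_fence or v > high_fence]
    return q1, q2, q3, iqr, outliers

# Hypothetical data with one unusually large value
data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 30]
q1, med, q3, iqr, outliers = iqr_outliers(data)
print(f"Q1 = {q1}, median = {med}, Q3 = {q3}, IQR = {iqr}, outliers = {outliers}")
# prints: Q1 = 3.25, median = 5.5, Q3 = 7.75, IQR = 4.5, outliers = [30]
```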
Class 2: inferential statistics
Statistical Principles
Levels of measurement
Categorical
Categorical → qualitative: numbers represent categories
Nominal / binary
Objects are either identical or different
Do you like apples?
• Yes =1
• No = 0
Univariate statistics: frequency & mode
Ordinal
Objects are greater or smaller
How would you rank order those 3 apples based on your preferences?
Univariate statistics: frequency, mode & median
Quantitative
Interval: differences between consecutive ranks are equal
Would you eat an apple right now?
• 1 = Not at all
• 5 = Very much
Univariate statistics: frequency, mode, median & mean
Common to treat using scales
Ratio
All properties of interval; the 0 value is meaningful
How many apples did you have in the last 7 days?
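To illustrate which univariate statistics fit which level of measurement, here is a small sketch using Python's standard library. All responses are made up for the apple questions above; the variable names are hypothetical.

```python
import statistics
from collections import Counter

# Hypothetical answers from 7 respondents
likes_apples = [1, 0, 1, 1, 0, 1, 1]   # nominal / binary: 1 = Yes, 0 = No
apple_rank = [1, 3, 2, 1, 1, 2, 3]     # ordinal: rank given to one of the 3 apples
eat_now = [1, 5, 4, 4, 2, 4, 1]        # interval: 1 = Not at all ... 5 = Very much
apples_7_days = [0, 2, 7, 3, 1, 4, 4]  # ratio: count with a meaningful zero

# Nominal: frequency and mode only
freq = Counter(likes_apples)
mode_likes = statistics.mode(likes_apples)

# Ordinal: the median becomes meaningful as well
median_rank = statistics.median(apple_rank)

# Interval and ratio: the mean becomes meaningful as well
mean_eat_now = statistics.mean(eat_now)
mean_apples = statistics.mean(apples_7_days)

print(freq, mode_likes, median_rank, mean_eat_now, mean_apples)
```

Nothing in the code prevents taking the mean of the binary or ranked variable; the restriction is about interpretation, not about what the software will compute.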
Sampling
Sampling error
• Statistics of the sample generally differ from the parameters of the entire population
• Sampling error corresponds to the difference between the sample statistic and the
population parameter
• Sampling error is always present when making statements about a population based on a sample
• The main strategy for reducing sampling error is increasing the sample size: the larger
the sample, the lower the sampling error
SE is the measure of the sampling error
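A small simulation can make the link between sample size and sampling error concrete. This is a sketch under stated assumptions: the population (normally distributed, mean 100, SD 15), the seed and the number of repeats are all hypothetical choices.

```python
import math
import random
import statistics

rng = random.Random(0)  # fixed seed so the run is repeatable

# Hypothetical population of 50,000 scores
population = [rng.gauss(100, 15) for _ in range(50_000)]
mu = statistics.mean(population)
sigma = statistics.pstdev(population)

def mean_abs_sampling_error(n, repeats=300):
    """Average distance between the sample mean and the population mean."""
    errors = []
    for _ in range(repeats):
        sample = rng.sample(population, n)  # simple random sample of size n
        errors.append(abs(statistics.mean(sample) - mu))
    return statistics.mean(errors)

results = {n: mean_abs_sampling_error(n) for n in (10, 100, 1000)}
for n, err in results.items():
    print(f"n = {n:4d}: average sampling error = {err:.2f}, "
          f"SD / sqrt(n) = {sigma / math.sqrt(n):.2f}")
```

The average sampling error should shrink roughly in proportion to 1/√n, which is the same pattern the SE formula describes.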
Sampling bias
• Sampling bias: a systematic tendency toward sampling error
• Self-selection bias: participants with specific characteristics are more likely to take part in
a study than others
• Pre-screening: the way participants are pre-screened may lead to sampling bias
• Under-coverage: specific members of the population are inadequately represented in the
sample
• Survivorship: successful observations are more likely to be represented in the sample
than unsuccessful observations
Sampling bias is a threat to external validity because it limits the generalizability of research
findings to the broader population
Sampling techniques
Probability samples
Reduce risk of sampling bias and enhance internal and external validity
Random
Stratified or representative
Non-probability samples
Anyone who passes by can be selected
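The two probability sampling techniques can be sketched as below. The population of students and employees, the helper names and the proportional allocation rule are assumptions made for illustration only.

```python
import random
from collections import Counter

rng = random.Random(1)  # fixed seed so the run is repeatable

# Hypothetical population: 700 students and 300 employees
population = [("student", i) for i in range(700)] + \
             [("employee", i) for i in range(300)]

def simple_random_sample(pop, n):
    """Every unit has the same chance of being selected."""
    return rng.sample(pop, n)

def stratified_sample(pop, n, key):
    """Sample each stratum in proportion to its share of the population."""
    strata = {}
    for unit in pop:
        strata.setdefault(key(unit), []).append(unit)
    sample = []
    for members in strata.values():
        k = round(n * len(members) / len(pop))  # proportional allocation
        sample.extend(rng.sample(members, k))
    return sample

srs = simple_random_sample(population, 50)
strat = stratified_sample(population, 50, key=lambda unit: unit[0])

print("random:    ", Counter(group for group, _ in srs))
print("stratified:", Counter(group for group, _ in strat))
```

In the stratified sample the 70/30 split is reproduced exactly, while in the simple random sample it only holds on average. Note that rounding in the allocation step can make the total differ slightly from n when the shares do not divide evenly.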
Inference
Inferential statistics
H0: ABC = 0
H1: ABC ≠ 0
Example: correlation coefficient “R”
H0: R = 0
H1: R ≠ 0
The decisions: based on the test results we can decide if
H0 is not rejected → the effect is not significant
H0 is rejected → the effect is significant
The decision criteria
1. P-value
2. 95% confidence interval
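As an illustration of both decision criteria for the correlation example, the sketch below tests H0: R = 0 with a p-value and a 95% confidence interval. It assumes the Fisher z approximation (a normal approximation, not the exact t-test that statistical packages usually report), a significance level of .05, and made-up data; the function names are hypothetical.

```python
import math
from statistics import NormalDist

def pearson_r(x, y):
    """Pearson correlation coefficient of two equally long lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def correlation_test(x, y, alpha=0.05):
    """Return (r, p_value, (ci_low, ci_high)) via the Fisher z approximation."""
    n = len(x)
    r = pearson_r(x, y)
    z = math.atanh(r)            # Fisher z-transform of r (requires |r| < 1)
    se = 1 / math.sqrt(n - 3)    # approximate standard error of z
    norm = NormalDist()
    p_value = 2 * (1 - norm.cdf(abs(z) / se))   # two-sided p-value
    z_crit = norm.inv_cdf(1 - alpha / 2)        # about 1.96 for alpha = .05
    ci = (math.tanh(z - z_crit * se), math.tanh(z + z_crit * se))
    return r, p_value, ci

# Hypothetical paired observations
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
y = [2, 1, 4, 3, 6, 5, 8, 7, 10, 9]
r, p_value, ci = correlation_test(x, y)

reject_by_p = p_value < 0.05            # criterion 1: p-value
reject_by_ci = not (ci[0] <= 0 <= ci[1])  # criterion 2: 95% CI excludes 0
print(f"r = {r:.3f}, 95% CI = [{ci[0]:.3f}, {ci[1]:.3f}]")
print("H0 rejected (p-value):", reject_by_p)
print("H0 rejected (95% CI): ", reject_by_ci)
```

Because both criteria here rest on the same normal approximation, they lead to the same decision: H0 is rejected exactly when the 95% CI does not contain 0.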