Statistics 1:
Population and Sample
Central Limit Theorem
Standard Error = σ/√n
Sampling:
Variables: measurement levels
Formula: For a population
Formula: For a sample:
Measures of Centrality
Rule of Thumb: Confidence interval
Symbols and meanings
Common Charts
Inferential statistics
Overview of Inference
Classical Hypothesis Testing
Confidence interval
Prob-value (or p-value)
Distribution
Step-by-Step to Assess Distribution
Exploratory Data Analysis (EDA)
Z-testing and T-testing
Z-testing
T-Testing
What is the t-Statistic?
Steps for Z- and T-Test
Key Differences
Kind of Tests
Summary Table: Must-Know Tests
HOW TO ANSWER:
General Approach
Difference between errors:
Non-parametric methods
Types of Non-parametric Tests
Sign Test
Steps:
Wilcoxon-Signed-Rank Test
Mann-Whitney Test
Requirements:
Binomial Test
Steps to Perform a Binomial Test
Special Notes
Population and Sample
Population: The entire set of elements.
Sample: A subset of elements from the population, taken with the intention of making
inferences about the population.
Parameter: A numerical property of the population.
Statistic: A numerical property of a sample.
Variability: The phenomenon whereby repeated sampling from the same population results
in different values for the statistic.
Sampling distribution: Describes how the statistic varies when sampling is repeated; in other
words, it describes the extent of the variability. This is the basis for inference.
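The idea of a sampling distribution can be made concrete with a small simulation. The sketch below is only an illustration: the population (the integers 1 to 1000), the sample size of 25 and the helper name sample_means are all assumptions chosen for the example, not part of these notes.

```python
import random
import statistics

def sample_means(population, n, repeats, seed=0):
    """Draw `repeats` simple random samples of size n and return each sample's mean."""
    rng = random.Random(seed)
    return [statistics.mean(rng.sample(population, n)) for _ in range(repeats)]

# Hypothetical population: the integers 1..1000 (parameter: mean = 500.5)
population = list(range(1, 1001))
means = sample_means(population, n=25, repeats=2000)

print("population mean (parameter):", statistics.mean(population))
print("first five sample means (statistics):", [round(m, 1) for m in means[:5]])
print("spread of the sample means:", round(statistics.stdev(means), 2))
```

Each sample gives a different mean (variability); the collection of all 2000 means approximates the sampling distribution of the mean.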
Central Limit Theorem
Even if a variable X is not normally distributed in the population, the Central Limit Theorem
tells us that:
- The sampling distribution of the mean will be approximately normal if the sample size is
large enough and the population standard deviation (σ) is finite.
- The standard deviation of that sampling distribution is the standard error:
Standard Error = σ/√n
where σ is the population standard deviation and n is the sample size.
This means we can use normal distribution methods for the sample mean, even when the
population data is not normally distributed.
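A quick simulation can be used to check both claims. This is a sketch under stated assumptions: the population is taken to be exponential with rate 1 (strongly right-skewed, mean 1, σ = 1), and the function name simulate_sample_means, the sample sizes and the number of repeats are arbitrary choices for illustration.

```python
import math
import random
import statistics

SIGMA = 1.0  # standard deviation of an exponential population with rate 1

def simulate_sample_means(n, repeats=5000, seed=1):
    """Return the means of `repeats` independent samples of size n."""
    rng = random.Random(seed)
    return [
        statistics.fmean([rng.expovariate(1.0) for _ in range(n)])
        for _ in range(repeats)
    ]

results = {}
for n in (5, 30, 100):
    means = simulate_sample_means(n)
    theoretical_se = SIGMA / math.sqrt(n)      # sigma / sqrt(n)
    observed_se = statistics.stdev(means)      # spread of the simulated means
    # Share of sample means within 1.96 standard errors of the population mean (1.0);
    # for a normal sampling distribution this would be about 0.95.
    within = sum(abs(m - 1.0) <= 1.96 * theoretical_se for m in means) / len(means)
    results[n] = (theoretical_se, observed_se, within)
    print(f"n={n:3d}  sigma/sqrt(n)={theoretical_se:.4f}  "
          f"observed={observed_se:.4f}  within 1.96 SE={within:.3f}")
```

The observed spread of the sample means should track σ/√n, shrinking as n grows, even though the underlying population is far from normal.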
Sampling Bias: Occurs when the method of collecting the sample makes individuals with certain
characteristics more likely to be included than others, so the sample does not represent
the population.
Sampling:
| Method | Example | Description / Best For |
|---|---|---|
| Systematic | Survey every 5th customer in a queue | Simple and evenly spaced samples; avoids random clustering. |
| Cluster | Survey all students in 3 randomly chosen schools | The population is divided into clusters, and a random selection of entire clusters is made. Then, all individuals within the selected clusters are surveyed. |
| Stratified | Survey 10% from each income bracket | The population is divided into strata (groups based on specific characteristics, like age or income), and a random sample is drawn from each stratum. |
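The three methods in the table can be sketched in a few lines. Everything here is illustrative: the population of 200 records, the field names (school, bracket) and the three helper functions are hypothetical and only meant to show how the selection rules differ.

```python
import random

rng = random.Random(42)

# Hypothetical population: 200 people, each with an id, a school and an income bracket
population = [
    {"id": i, "school": f"school_{i % 10}", "bracket": ("low", "middle", "high")[i % 3]}
    for i in range(200)
]

def systematic_sample(units, step, rng):
    """Take every `step`-th unit, starting from a random offset."""
    start = rng.randrange(step)
    return units[start::step]

def cluster_sample(units, key, n_clusters, rng):
    """Randomly pick whole clusters and keep everyone inside them."""
    clusters = sorted({u[key] for u in units})
    chosen = set(rng.sample(clusters, n_clusters))
    return [u for u in units if u[key] in chosen]

def stratified_sample(units, key, fraction, rng):
    """Draw the same fraction at random from every stratum."""
    sample = []
    for stratum in sorted({u[key] for u in units}):
        members = [u for u in units if u[key] == stratum]
        sample.extend(rng.sample(members, round(len(members) * fraction)))
    return sample

every_5th = systematic_sample(population, step=5, rng=rng)
three_schools = cluster_sample(population, key="school", n_clusters=3, rng=rng)
ten_percent = stratified_sample(population, key="bracket", fraction=0.10, rng=rng)
print(len(every_5th), len(three_schools), len(ten_percent))
```

Note the difference in what is randomized: the starting point (systematic), which whole groups are taken (cluster), or which individuals are taken within every group (stratified).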
Other types of sampling
- Simple Random Sampling: Every member of the population has an equal chance of
being selected.
- Convenience Sampling: Selection based on ease of access, without randomization,
which may lead to bias.
- Quota Sampling: Non-random selection to ensure the sample reflects certain
characteristics of the population.
- Snowball Sampling: Existing study subjects recruit future subjects from among their
acquaintances, useful for hard-to-reach populations.
Types of data:
1. Qualitative: Non-numerical values
2. Quantitative: Numerical values (counts, measurements)
- Discrete: Only separate, countable values are possible (e.g. number of children).
- Continuous: Any value within a range is possible, including intermediate values (e.g. height).
Variables: measurement levels
| Scale | Characteristics | Examples |
|---|---|---|
| Nominal | Categorical, no order or ranking | Eye color, car types, Köppen climate classification |
| Ordinal | Categorical, ranked; differences between ranks unknown | Education levels, satisfaction, Beaufort scale |
| Interval | Measurable, equal intervals; a true zero does not exist (0 doesn’t mean “none”) | Temperature (°C or °F), IQ scores, calendar years |
| Ratio | Measurable, equal intervals; allows meaningful ratio comparisons (0 means none) | Temperature (K), height, income |
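The difference between the interval and ratio levels shows up as soon as values are divided. A small sketch, using two made-up temperatures (10 °C and 20 °C) as an assumed example:

```python
def celsius_to_kelvin(c):
    """Convert degrees Celsius (interval scale) to kelvin (ratio scale)."""
    return c + 273.15

cold_c, warm_c = 10.0, 20.0  # hypothetical measurements

# Differences are meaningful on both scales and agree with each other.
diff_c = warm_c - cold_c
diff_k = celsius_to_kelvin(warm_c) - celsius_to_kelvin(cold_c)

# Ratios are only meaningful on the ratio scale (kelvin has a true zero).
ratio_celsius = warm_c / cold_c                                      # 2.0, but 20 °C is not "twice as hot"
ratio_kelvin = celsius_to_kelvin(warm_c) / celsius_to_kelvin(cold_c)

print(f"difference: {diff_c:.1f} °C = {diff_k:.1f} K")
print(f"ratio in °C: {ratio_celsius:.2f}, ratio in K: {ratio_kelvin:.4f}")
```

The ratio computed in °C depends on where the arbitrary zero was placed, which is why ratio statements are reserved for ratio-level variables.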
Binary variable (a.k.a. dummy or Boolean variable):
A variable that can take on only two possible values.
- True or not true, yes or no, 1 or 0
- Special case of a nominal variable. When coded as 0/1, the mean equals the proportion of “1” values.
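The last point is easy to verify on a small made-up data set. The answers below are hypothetical survey responses, used only to show that the mean of the 0/1 coding equals the proportion of "yes":

```python
import statistics

# Hypothetical yes/no answers from eight respondents
answers = ["yes", "no", "yes", "yes", "no", "yes", "no", "yes"]

# Code the binary variable as 1 ("yes") or 0 ("no")
coded = [1 if a == "yes" else 0 for a in answers]

mean_coded = statistics.mean(coded)
proportion_yes = answers.count("yes") / len(answers)

print("mean of 0/1 coding:", mean_coded)
print("proportion of 'yes':", proportion_yes)
```

With 5 "yes" answers out of 8, both values come out as 0.625.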