Statistics – Complete Notes
By Rujul Nayak (University of Cambridge Offer-Holder)
Section 1: Data Collection 2
Section 2: Data Presentation 3
Section 3: Data Interpretation 7
Section 4: Basics of Probability 9
Section 5: The Binomial Distribution 9
Section 6: The Normal Distribution 10
Section 7: Hypothesis Testing 11
OCR A Level Mathematics A Notes by Rujul Nayak
Section 1: Data Collection
When obtaining data about a population (a group of individuals or items that you intend to study),
it may be impractical to survey every member of the population (doing so is called a census). Therefore
we have to use sampling techniques to choose a subset of the population (a sample) to study.
For each method, let n be the size of the population and k the size of the sample. Also let
a = ⌊n/k⌋ (n divided by k, rounded down).
The easiest method is opportunity sampling, where members are surveyed based on their
availability to take part (e.g. by selecting the first k people to walk into the room). This can lead to
many issues with bias.
An alternative method is simple random sampling: the members of the population are ordered
from 1 to n, and then random numbers are chosen to determine the sample (ignoring repeats).
While this eliminates bias, it may be time-consuming for large populations and sample sizes.
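
As a rough illustration of simple random sampling, here is a minimal Python sketch. The function name simple_random_sample and the seed parameter are assumptions made for this example only. It uses random.sample, which draws without replacement, so it has the same effect as choosing random numbers and ignoring repeats.

```python
import random

def simple_random_sample(n, k, seed=None):
    """Choose k distinct member numbers from 1..n, each equally likely."""
    rng = random.Random(seed)  # seed is optional, only for repeatable runs
    # random.sample draws without replacement, so no member appears twice
    return sorted(rng.sample(range(1, n + 1), k))

sample = simple_random_sample(100, 7, seed=1)
print(sample)
```

Every member has the same chance of being chosen, and each of the k choices needs its own random number, which is where the time cost for large samples comes from.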
These issues may be fixed by systematic sampling. The population is again ordered from 1 to n,
and then a single random number is chosen from 1 to a. That numbered member, and then every
a-th member after that, is chosen as the sample until k members have been chosen. For example,
with n = 100 and k = 7, if the random number chosen was 9 then the sample would consist of
members 9, 23, 37, 51, 65, 79, 93. Here, only one random number needs to be chosen, but the
choices are not independent of each other.
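
The worked example above can be checked with a short Python sketch. The function name systematic_sample and its start parameter are assumptions for illustration. The interval is a = ⌊n/k⌋, which is 14 when n = 100 and k = 7.

```python
import random

def systematic_sample(n, k, start=None):
    """Pick a start from 1..a, then take every a-th member until k are chosen."""
    a = n // k  # sampling interval, n / k rounded down
    if start is None:
        start = random.randint(1, a)  # randint includes both endpoints
    return [start + i * a for i in range(k)]

print(systematic_sample(100, 7, start=9))  # [9, 23, 37, 51, 65, 79, 93]
```

Because the start is at most a, the last member chosen is at most k × a, which never exceeds n.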
In stratified sampling, the population is split into groups (called “strata”, singular “stratum”)
based on characteristics such as age or gender. Members from each stratum are then sampled
such that the proportion of the sample that falls into a particular stratum is the same as the
proportion of the full population that falls into that stratum. This allows all groups of the population
to be represented in the sample. Note that a different sampling method such as systematic
sampling must be used to choose the members of each stratum to add to the sample.
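
To show how the proportions work, here is a minimal Python sketch that calculates how many members to take from each stratum. The strata names and sizes are made up for illustration, and the function name stratified_allocation is an assumption. Each stratum's share is k × (stratum size ÷ n), rounded to the nearest whole number.

```python
def stratified_allocation(strata_sizes, k):
    """Return how many sample members each stratum should contribute."""
    n = sum(strata_sizes.values())  # total population size
    # same proportion in the sample as in the population
    return {name: round(k * size / n) for name, size in strata_sizes.items()}

# Hypothetical population of 200 split by age, with a sample of size 40
strata = {"under 18": 80, "18 to 40": 70, "over 40": 50}
allocation = stratified_allocation(strata, 40)
print(allocation)  # {'under 18': 16, '18 to 40': 14, 'over 40': 10}
```

In this example the shares are whole numbers. In general the rounded values may not add up to exactly k, so one of them may need adjusting by hand.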
A similar method is quota sampling, where instead of assigning proportions of the sample based
on proportions of the characteristic in the whole population, a quota is set by the researchers for
each stratum (e.g. 60% should be female). Furthermore, non-random methods like opportunity
sampling are often used as the sampling method within each stratum, so this is often less rigorous
than stratified sampling.
Cluster sampling is a method where the population is instead split into “clusters” based on
factors like geography, such that each cluster is a reasonable representation of the entire
population. However, instead of sampling from every cluster, a number of clusters are chosen,
and then either some or all of the members in those clusters are surveyed (the other clusters are
ignored). This can be more convenient than other methods if the population is spread out over a
large geographical area.
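
A minimal Python sketch of cluster sampling, assuming the population is already stored as a dictionary of clusters. The town names, member labels and the function name cluster_sample are all hypothetical. Here every member of each chosen cluster is surveyed.

```python
import random

def cluster_sample(clusters, num_clusters, seed=None):
    """Choose whole clusters at random; the other clusters are ignored."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), num_clusters)
    return {name: clusters[name] for name in chosen}

# Hypothetical clusters based on geography
towns = {
    "Town A": ["A1", "A2", "A3"],
    "Town B": ["B1", "B2"],
    "Town C": ["C1", "C2", "C3", "C4"],
    "Town D": ["D1", "D2", "D3"],
}
surveyed = cluster_sample(towns, 2, seed=0)
print(surveyed)
```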
Section 2: Data Presentation
Data may be either continuous (could take any numerical value in a range) or discrete (one of a
finite number of groups). We can represent such data in a number of ways. Note that some
methods only work with discrete data, so we would need to split continuous data into groups in
order to use those methods.
The simplest way of representing data is using a frequency table. Here, each group (or interval of
the continuous data) is shown on the left next to its frequency on the right.
The first table below shows some data about the colours of 60 cars seen on the road (discrete),
and the second one shows some data about the masses of 75 apples (grouped continuous).
Colour    Frequency
White     17
Black     15
Silver    14
Blue      10
Red       4

Mass, m grams    Frequency
60 ≤ m < 80      14
80 ≤ m < 95      13
95 ≤ m < 105     22
105 ≤ m < 120    19
120 ≤ m < 140    7
Note that the groups don’t have to be of equal sizes for the continuous data.
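
As a sketch of how continuous data can be split into groups, the Python below counts masses into the same unequal intervals as the apple table. The list of masses is invented for illustration (it is not the data behind the table), and the function name grouped_frequencies is an assumption. Each interval includes its lower boundary but not its upper one, matching 60 ≤ m < 80 and so on.

```python
from bisect import bisect_right

# Class boundaries taken from the apple table: 60-80, 80-95, 95-105, 105-120, 120-140
BOUNDARIES = [60, 80, 95, 105, 120, 140]

def grouped_frequencies(masses):
    """Count how many masses fall in each interval lower <= m < upper."""
    counts = [0] * (len(BOUNDARIES) - 1)
    for m in masses:
        if BOUNDARIES[0] <= m < BOUNDARIES[-1]:  # skip values outside the table
            counts[bisect_right(BOUNDARIES, m) - 1] += 1
    return counts

# Hypothetical masses in grams
masses = [60, 79.9, 80, 94, 101.5, 104.9, 105, 119, 120, 139]
print(grouped_frequencies(masses))  # [2, 2, 2, 2, 2]
```

Note that a mass of exactly 80 g goes into the 80 ≤ m < 95 group, not the 60 ≤ m < 80 group.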
Bar charts and vertical line graphs both show the frequency of each group of data as a vertical
bar (with the height of each bar proportional to the frequency). Bar charts have thicker bars while
vertical line graphs have thin lines. Also, bar charts are often used to represent qualitative (non-
numerical) data, while vertical line graphs are used to represent quantitative (numerical) groups.
The bar chart below shows average monthly temperatures in London and New York. As there are
two series, the two bars for each month are placed next to each other.
[Figure: bar chart of average temperature /°C (vertical axis, 0 to 30) for each month from Jan to Dec (horizontal axis), with two series: London and New York]