Edexcel GCSE (9-1)
Statistics Paper 2
Higher Tier Mastery
PART 0: THE NAVIGATOR
● PART I: THE PRIMER
○ Welcome to the Big Leagues
○ The "Critical Action" Cheat Sheet
● PART II: THE ELITE TEST BANK
○ Questions 1–28: Foundational Syntax & Application
○ Questions 29–58: Professional Simulation
○ Questions 59–88: Grandmaster Synthesis
PART I: THE PRIMER
Welcome to the big leagues. This test bank intercepts high-stakes procedural and conceptual
errors, forging your raw mathematical ability into top-tier professional statistical intuition for the
2026/2027 landscape. By utilizing this protocol, you will cease rote memorization and begin
dynamically evaluating why data behaves the way it does, effectively eliminating amateur
analytical traps and integrating directly with advanced UT Austin STA 301 data science
standards.
The "Critical Action" Cheat Sheet:
● Histograms (The Area Rule): Frequency Density = Frequency / Class Width. The area of
the bar, not its height, represents the frequency.
● Cumulative Frequency (The Upper Bound Rule): Always plot points at the strict upper
bound of the class interval. Plotting at the midpoint is a catastrophic procedural failure.
● Outlier Detection (The 1.5 IQR Rule): An outlier is strictly any value below Q1 - 1.5(IQR)
or above Q3 + 1.5(IQR).
● Probability Syntax: Mutually Exclusive means events cannot happen simultaneously
($P(A \cap B) = 0$). Independent means one event does not affect the probability of another
($P(A|B) = P(A)$).
● Standardisation: Z-scores ($z = \frac{X - \mu}{\sigma}$) strip away raw units, expressing each
value as a number of standard deviations above or below the mean of its own cohort.
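The rules in the cheat sheet can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not an exam method: the function names and every figure in the example are made up for demonstration, and Q1 and Q3 are passed in directly because quartile conventions differ between calculators and software.

```python
from itertools import accumulate


def frequency_densities(classes):
    """classes is a list of (lower, upper, frequency) tuples.
    Returns frequency / class width for each class (the bar heights)."""
    return [freq / (upper - lower) for lower, upper, freq in classes]


def cumulative_frequency_points(classes):
    """Returns (upper bound, cumulative frequency) pairs, the points to plot."""
    running_totals = accumulate(freq for _, _, freq in classes)
    return [(upper, total) for (_, upper, _), total in zip(classes, running_totals)]


def outlier_fences(q1, q3):
    """Returns the (lower, upper) fences from the 1.5 IQR rule."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr


def z_score(x, mu, sigma):
    """Standardised score: distance from the mean in standard deviations."""
    return (x - mu) / sigma


# Hypothetical grouped data: (lower bound, upper bound, frequency).
classes = [(0, 10, 5), (10, 30, 20), (30, 40, 15)]

print(frequency_densities(classes))          # [0.5, 1.0, 1.5]
print(cumulative_frequency_points(classes))  # [(10, 5), (30, 25), (40, 40)]
print(outlier_fences(20, 36))                # (-4.0, 60.0)
print(z_score(70, 60, 8))                    # 1.25
```

Note that the middle class is twice as wide as the others, so its bar is lower than the third bar even though its frequency is higher. That is the area rule in action.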
PART II: THE ELITE TEST BANK
Questions 1–28: Foundational Syntax & Application
Q1: A practitioner is designing a database for a 2026 Kenyan demographic study, specifically
recording the exact land area (in square kilometers) of agricultural plots in Kajiado County.
Which term BEST describes this type of data? A) Qualitative, ordinal, and discrete. B)
Quantitative, continuous, and univariate. C) Quantitative, discrete, and bivariate. D) Qualitative,
categorical, and continuous.
● The Answer: B (Quantitative, continuous, and univariate.)
● Distractor Analysis:
○ A is incorrect: Area is numerical (quantitative), not descriptive.
○ C is incorrect: Area can theoretically be measured to infinite decimal places
(continuous), not just whole numbers (discrete). Only one variable is tracked here
(univariate).
○ D is incorrect: Categorical data cannot be continuous.
The Mentor's Analysis: Data classification sets the hard deck for your methodology. Land
area, like time or weight, can take any value within a range; only the precision of your GPS or
surveying instrument limits how finely it is recorded, making it strictly continuous.
Professional Intuition: If you can theoretically add decimal places forever, the data is continuous.
Q2: A retail auditor in Nairobi surveys the first 50 customers who exit a store on a Tuesday
morning regarding their 2026 purchasing habits. Which sampling methodology is the MOST
ACCURATE classification of this approach? A) Simple random sampling. B) Quota sampling. C)
Opportunity (convenience) sampling. D) Stratified random sampling.
● The Answer: C (Opportunity (convenience) sampling.)
● Distractor Analysis:
○ A is incorrect: There is no equal-probability selection mechanism utilized across the
entire customer base.
○ B is incorrect: No sub-group demographic targets (quotas) were established prior to
the survey.
○ D is incorrect: The population was not divided into non-overlapping strata and then
sampled randomly in proportion to each stratum's size.
The Mentor's Analysis: Selecting whoever is immediately available is convenience sampling. It
is inexpensive but injects massive selection bias (e.g., ignoring weekend shoppers).
Professional Intuition: Operational ease usually comes at the direct cost of statistical
representativeness.
Q3: A statistician at UT Austin is formulating a study for STA 301 to investigate the relationship
between algorithmic processing time and the volume of synthetic data ingested. Which
statement represents the MOST APPROPRIATE hypothesis? A) "Larger datasets take longer."
B) "As the volume of synthetic data ingested increases, the algorithmic processing time tends to
increase." C) "Data volume and processing time will be recorded and plotted on a scatter
diagram." D) "There is no correlation between the type of algorithm and the server temperature."
● The Answer: B ("As the volume of synthetic data ingested increases, the algorithmic
processing time tends to increase.")
● Distractor Analysis:
○ A is incorrect: This is too vague and lacks formal comparative structure.
○ C is incorrect: This describes a methodology, not a testable statistical prediction.
○ D is incorrect: This tests the wrong variables, entirely ignoring the stated objective.
The Mentor's Analysis: A valid hypothesis explicitly defines the expected relationship between
the explanatory variable and the response variable. Professional Intuition: A hypothesis is a
declarative prediction of a relationship, not a question or a summary of your actions.
Q4: A 2026 clinical questionnaire asks: "How often do you utilize telehealth services? [ ] Rarely [
] Sometimes [ ] Frequently." What is the PRIMARY flaw in this question design? A) It uses
overlapping response boxes. B) It lacks an option for "Never." C) It uses subjective, undefined
timeframes. D) It asks a leading question that introduces bias.
● The Answer: C (It uses subjective, undefined timeframes.)
● Distractor Analysis:
○ A is incorrect: The options are verbal categories, not numeric ranges, so they do not overlap.
○ B is incorrect: While "Never" is missing, the more critical structural failure is the lack
of an objective metric.
○ D is incorrect: The question is neutrally phrased.
The Mentor's Analysis: "Sometimes" to a chronically ill patient might mean three times a month;
to a healthy individual, it might mean once a year. Undefined metrics yield unanalyzable
qualitative data. Professional Intuition: Always force quantitative boundaries (e.g., "1-2 times
per month") to eliminate respondent subjectivity.
Q5: In a 2026 study evaluating the efficacy of a new AI-driven agritech fertilizer, 100 plots are
given the new mixture, while another 100 plots are given standard water. What is the PRIMARY
statistical purpose of the second group of plots? A) To act as an explanatory variable. B) To act
as a control group to establish a baseline for comparison. C) To increase the overall sample size
to n=200. D) To utilize the matched pairs technique.
● The Answer: B (To act as a control group to establish a baseline for comparison.)
● Distractor Analysis:
○ A is incorrect: The explanatory variable is the presence of the fertilizer.
○ C is incorrect: While the sample size is larger, the specific function of the
untreated group is to provide a baseline that accounts for extraneous environmental variables.
○ D is incorrect: Matched pairs require linking specific individuals across groups,
which is not indicated here.
The Mentor's Analysis: A control group isolates the impact of the explanatory variable (here,
the fertilizer). Without a baseline, you cannot show that the fertilizer, rather than an
unusually rainy season, caused any change in crop yield. Professional Intuition: If you cannot
isolate the variable, you cannot validate the causation.
Q6: A researcher wishes to survey 500 citizens in Kajiado County. They divide the population by
Sub-County (Isinya, Loitokitok, Mashuuru) and randomly select a sample from each Sub-County
that is strictly proportional to its actual population size. Which sampling technique is EXACTLY
being utilized? A) Systematic sampling. B) Quota sampling. C) Stratified random sampling. D)
Cluster sampling.
● The Answer: C (Stratified random sampling.)
● Distractor Analysis:
○ A is incorrect: Systematic sampling selects every n-th item from a comprehensive
list.
○ B is incorrect: Quota sampling is non-random; the interviewer chooses who fills the
demographic quota.
○ D is incorrect: Cluster sampling randomly selects entire pre-existing geographical
groups, not proportional samples from within all groups.
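The proportional allocation described in Q6 can be sketched as follows. This is a minimal illustration, not a prescribed method: the Sub-County population figures are invented purely for the example (the question does not supply them), the function names are hypothetical, and leftover units after rounding down are handed to the strata with the largest fractional shares, which is one common convention among several.

```python
import random


def proportional_allocation(strata_sizes, sample_size):
    """Split sample_size across strata in proportion to their sizes.
    Rounds down first, then gives leftover units to the largest remainders."""
    total = sum(strata_sizes.values())
    exact = {name: sample_size * size / total for name, size in strata_sizes.items()}
    allocation = {name: int(share) for name, share in exact.items()}
    leftover = sample_size - sum(allocation.values())
    by_remainder = sorted(exact, key=lambda name: exact[name] - allocation[name], reverse=True)
    for name in by_remainder[:leftover]:
        allocation[name] += 1
    return allocation


def stratified_sample(frames, sample_size, seed=None):
    """frames maps each stratum name to a list of its units.
    Draws a simple random sample of the allocated size from every stratum."""
    rng = random.Random(seed)
    sizes = {name: len(units) for name, units in frames.items()}
    allocation = proportional_allocation(sizes, sample_size)
    return {name: rng.sample(units, allocation[name]) for name, units in frames.items()}


# Made-up population sizes, for illustration only.
populations = {"Isinya": 2000, "Loitokitok": 5000, "Mashuuru": 3000}
print(proportional_allocation(populations, 500))
# {'Isinya': 100, 'Loitokitok': 250, 'Mashuuru': 150}
```

Every Sub-County contributes to the sample, and each contributes in proportion to its size. That is what separates this design from cluster sampling, where whole groups are either in or out.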