Mean is the average: x̄ = sum of values / number of values, or x̄ = Σx / n. Data must be numeric and should not contain outliers, since outliers pull the mean towards them.
Median is the midpoint: the middle value of the ordered data, at position (n + 1) / 2; data is numeric and may contain outliers.
Mode is the most frequent value: data can be numeric or categorical and may contain outliers.
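The three measures of centre can be sketched in a few lines of Python. This is a minimal illustration with hypothetical function names and made-up data. The value 40 is included to show how an outlier drags the mean upward while the median stays put.

```python
from collections import Counter

def mean(values):
    # x̄ = Σx / n
    return sum(values) / len(values)

def median(values):
    # Sort, then take the value at position (n + 1) / 2.
    # For even n that position falls between two values, so average them.
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

def modes(values):
    # Every value sharing the highest frequency (numeric or categorical data).
    counts = Counter(values)
    top = max(counts.values())
    return [v for v, c in counts.items() if c == top]

scores = [4, 7, 7, 9, 10, 12, 40]  # hypothetical data; 40 is an outlier
print(mean(scores))    # 89 / 7, about 12.71, pulled up by the outlier
print(median(scores))  # 9
print(modes(scores))   # [7]
```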
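IQR is the difference between the 75th and 25th percentiles: IQR = Q3 – Q1 (Q2 is the median).
Range is the difference between the maximum and minimum values: range = max – min.
Outliers: x < Q1 – 1.5 × IQR or x > Q3 + 1.5 × IQR; values with Q1 – 1.5 × IQR ≤ x ≤ Q3 + 1.5 × IQR are not outliers.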
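A sketch of the spread measures and the 1.5 × IQR outlier rule follows. It assumes the quartile method of taking the medians of the lower and upper halves, and leaving out the middle value when n is odd. Calculators and other software may interpolate differently and give slightly different quartiles. The function names and data are hypothetical.

```python
def median(values):
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

def quartiles(values):
    # Assumed method: Q1 and Q3 are the medians of the lower and upper
    # halves, excluding the middle value when n is odd.
    ordered = sorted(values)
    n = len(ordered)
    lower = ordered[: n // 2]
    upper = ordered[(n + 1) // 2 :]
    return median(lower), median(ordered), median(upper)

def spread_summary(values):
    q1, q2, q3 = quartiles(values)
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    # Outliers lie strictly outside the fences.
    outliers = [x for x in values if x < lower_fence or x > upper_fence]
    return {
        "range": max(values) - min(values),
        "Q1": q1, "Q2": q2, "Q3": q3,
        "IQR": iqr,
        "fences": (lower_fence, upper_fence),
        "outliers": outliers,
    }

print(spread_summary([4, 7, 7, 9, 10, 12, 40]))  # hypothetical data
```

Here Q1 = 7 and Q3 = 12, so IQR = 5 and the fences are –0.5 and 19.5. Only 40 is flagged as an outlier.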
Categorical data: qualities and words; ordinal: categories with a natural order (star rating), nominal: categories without
order (eye colour). Displays include frequency tables, column/bar graphs, and dot plots.
Numerical data: quantities and numbers; discrete: only specific values such as whole numbers (test scores),
continuous: any value within a range (height of a tree). Displays include frequency tables, dot plots, stem-and-leaf plots, boxplots, and histograms.
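A frequency table, which works for both categorical and discrete numerical data, can be sketched with Python's `collections.Counter`. The helper name and the eye-colour data below are hypothetical.

```python
from collections import Counter

def frequency_table(values):
    # Returns (value, count, percentage) rows, most frequent first.
    counts = Counter(values)
    total = len(values)
    return [(value, count, 100 * count / total) for value, count in counts.most_common()]

eye_colours = ["brown", "blue", "brown", "green", "brown", "blue"]  # hypothetical
for value, count, pct in frequency_table(eye_colours):
    print(f"{value:<6} {count:>3} {pct:6.1f}%")
```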
Explanatory variable (EV): independent or x variable
Response variable (RV): dependent or y variable
Form: linear, non-linear, or no relationship
Direction: positive or negative
Strength: strong, moderate, weak
Perfect ± relationship: ±1
Strong ± relationship: ±0.75 to ±1; it can be concluded that y inc/dec as x increases
Moderate ± relationship: ±0.5 to ±0.75; there is some evidence to suggest that y inc/dec as x increases
Weak ± relationship: ±0.25 to ±0.5; there is limited evidence to suggest that y inc/dec as x increases
No relationship: –0.25 to 0.25
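The strength cut-offs above can be turned into a small lookup. This is a sketch with a hypothetical function name. The notes do not say which category a boundary value such as exactly 0.75 belongs to, so the code assumes it goes to the stronger category.

```python
def describe_r(r):
    # Cut-offs from the notes; boundary values are assumed to belong
    # to the stronger category (e.g. r = 0.75 counts as strong).
    if not -1 <= r <= 1:
        raise ValueError("r must lie between -1 and 1")
    size = abs(r)
    direction = "positive" if r > 0 else "negative"
    if size == 1:
        return f"perfect {direction}"
    if size >= 0.75:
        return f"strong {direction}"
    if size >= 0.5:
        return f"moderate {direction}"
    if size >= 0.25:
        return f"weak {direction}"
    return "no relationship"

print(describe_r(-0.42))  # weak negative
```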
Correlation coefficient (r) measures the direction and strength of a linear relationship and is only reliable if the data is linear with no outliers.
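Pearson's r can be computed from the deviations about the means: r = Σ(x – x̄)(y – ȳ) / √(Σ(x – x̄)² · Σ(y – ȳ)²). The sketch below uses a hypothetical function name and made-up data.

```python
from math import sqrt

def pearson_r(xs, ys):
    # r = Σ(x - x̄)(y - ȳ) / sqrt(Σ(x - x̄)² · Σ(y - ȳ)²)
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 points")
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

ages = [6, 8, 10, 12]           # hypothetical
heights = [116, 119, 126, 129]  # hypothetical
print(round(pearson_r(ages, heights), 3))  # about 0.985
```

The formula returns a number for any data, so the scatterplot still needs to be checked first for a linear form and for outliers. On Python 3.10 and later, `statistics.correlation` gives the same Pearson value.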
The coefficient of determination (r²) gives the percentage of variation in the RV that can be explained by variation in the EV. The higher
the r², the stronger the linear relationship between x and y, and the better the regression line fits.
ex. if r = –0.42, r² = (–0.42)² = 0.1764; about 17.6% of the variation in y can be explained by the variation in x, and about 82.4% is unexplained.
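The r² arithmetic can be checked with a short sketch. The helper name is hypothetical. It squares r and splits 100% into explained and unexplained variation.

```python
def explained_split(r):
    # r² as a proportion, then as percentages explained / unexplained.
    r_squared = r ** 2
    explained = 100 * r_squared
    return r_squared, explained, 100 - explained

r2, explained, unexplained = explained_split(-0.42)
print(f"r² = {r2:.4f}: {explained:.1f}% explained, {unexplained:.1f}% unexplained")
```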
Least squares regression line: y = a + bx, where a is the y-intercept and b is the slope; e.g. height = 100 + 2.5 × age. This can be used to make predictions; ex.
what is the predicted height for an eight-year-old? 100 + 2.5 × 8 = 120
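The least squares slope and intercept can be sketched with b = Σ(x – x̄)(y – ȳ) / Σ(x – x̄)² and a = ȳ – b·x̄. The function names are hypothetical. The age and height values are made up so that they lie exactly on height = 100 + 2.5 × age, which makes the output easy to check. Real data would scatter around the line.

```python
def least_squares(xs, ys):
    # b = Σ(x - x̄)(y - ȳ) / Σ(x - x̄)²,  a = ȳ - b·x̄
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx
    a = y_bar - b * x_bar
    return a, b

def predict(a, b, x):
    # y = a + bx
    return a + b * x

ages = [6, 8, 10, 12]             # hypothetical
heights = [115, 120, 125, 130]    # hypothetical, exactly on the line
a, b = least_squares(ages, heights)
print(a, b)                # 100.0 2.5
print(predict(a, b, 8))    # 120.0
```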
ex. interpret the slope for the equation p = 0.5 – 31.4w. The slope is –31.4, so for every 1 unit increase in w (EV), p (RV) decreases by 31.4 on average.
Interpolation: prediction made within the original range of the data given; more reliable.
Extrapolation: prediction made outside the original range of the data given; less reliable.
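The interpolation/extrapolation check only compares the new x value with the range of the original x data. The sketch below uses a hypothetical function name and data range.

```python
def prediction_type(x, observed_xs):
    # Inside [min, max] of the original x data: interpolation.
    # Outside that range: extrapolation (less reliable).
    low, high = min(observed_xs), max(observed_xs)
    return "interpolation" if low <= x <= high else "extrapolation"

ages = [6, 8, 10, 12]  # hypothetical original data, range 6 to 12
print(prediction_type(8, ages))   # interpolation
print(prediction_type(30, ages))  # extrapolation
```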
Residuals: the difference between the actual value and the predicted value (residual = actual – predicted). A positive residual means the predicted
value is below the actual result (an underestimate); a negative residual means the predicted value is above the
actual result (an overestimate). ex. calculate the residual of a 12-year-old who is 142 cm tall. 100 + 2.5 × 12 = 130 →
142 – 130 = 12; the residual is positive because the data point lies above the regression line.
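The residual calculation and its sign interpretation can be sketched as follows. The helper names are hypothetical. The example reuses height = 100 + 2.5 × age for a 12-year-old who is 142 cm tall.

```python
def residual(actual, predicted):
    # residual = actual - predicted
    return actual - predicted

def describe_residual(res):
    if res > 0:
        return "positive: point above the line, model underestimates"
    if res < 0:
        return "negative: point below the line, model overestimates"
    return "zero: point lies on the line"

predicted = 100 + 2.5 * 12   # 130.0
res = residual(142, predicted)
print(res, describe_residual(res))  # 12.0 positive: ...
```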
Sample questions
Comment on the appropriateness of fitting a linear model. If the residual plot shows no clear pattern (points randomly scattered around zero), a linear model is appropriate.
Comment on the validity of the prediction. State whether it is interpolation or extrapolation.
X has a residual of ±2.6. What information does this provide about the RV? X's actual RV value is 2.6 units above (+2.6) or below (–2.6) the value predicted by the regression line.