H1. Descriptive and inferential statistics
Statistics is a branch of mathematics used to summarize, analyse and interpret a group of
numbers or observations.
Descriptive statistics are procedures used to summarize, organize and make sense of a set
of scores (data) or observations. They are presented graphically, in tables or as summary
statistics (statistics that describe or summarize numeric observations or a set of scores;
data = sets of individual measurements).
Inferential statistics: statistics used to interpret the meaning of descriptive statistics,
typically by generalizing from a sample to the population it was drawn from.
H2. Summarizing data
Measures of central tendency: statistical measures for locating a single score that is most
representative or descriptive of all scores in a distribution; these are values at or near the
center of a distribution.
N= population size
n= sample size
x= set of scores
Mean (the 'balance point', also called the average or arithmetic mean): the sum of a set of
scores in a distribution, divided by the total number of scores summed.
population mean: μ = ∑x / N
sample mean: M = ∑x / n
The mean is misleading when there is an outlier in the data set; the median is not.
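A minimal Python sketch of the mean as "sum of scores divided by number of scores", and of how one outlier pulls the mean while the median barely moves. The scores and the helper name arithmetic_mean are made up for illustration.

```python
from statistics import median

def arithmetic_mean(scores):
    """Sum of the scores divided by the number of scores summed."""
    return sum(scores) / len(scores)

scores = [2, 3, 4, 5, 6]            # made-up scores, no outlier
with_outlier = scores + [100]       # same scores plus one extreme score

print(arithmetic_mean(scores))        # 4.0
print(median(scores))                 # 4
print(arithmetic_mean(with_outlier))  # 20.0  (pulled towards the outlier)
print(median(with_outlier))           # 4.5   (hardly changes)
```

The same formula serves for a population (divide by N) and a sample (divide by n); only the set of scores passed in differs.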
Median: middle value (midpoint) in a distribution of data listed in numeric order. 50% of the
scores in a distribution fall above the value and 50% fall below the value.
Median position (middle score): (n + 1) / 2
Graphically: the median is the score at which the cumulative percentage reaches 50% (or first
rises above 50%).
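The position rule (n + 1) / 2 can be sketched in Python as below. The function name median_by_position is hypothetical. Averaging the two middle scores when n is even is the usual convention and is assumed here; the notes do not state it.

```python
def median_by_position(scores):
    """Median via the 1-based position (n + 1) / 2 in the ordered scores."""
    if not scores:
        raise ValueError("need at least one score")
    ordered = sorted(scores)             # scores must be in numeric order
    n = len(ordered)
    position = (n + 1) / 2               # 1-based position of the middle score
    if position.is_integer():            # odd n: one middle score
        return ordered[int(position) - 1]
    lower = ordered[int(position) - 1]   # even n: position falls between
    upper = ordered[int(position)]       # two scores (assumed: average them)
    return (lower + upper) / 2

print(median_by_position([7, 1, 5, 3, 9]))  # 5    (position 3 of 1, 3, 5, 7, 9)
print(median_by_position([7, 1, 5, 3]))     # 4.0  (position 2.5, between 3 and 5)
```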
Mode: the value in a data set that occurs most often or most frequently in a distribution.
Five characteristics of the mean:
1. Changing an existing score will change the mean (every score affects the mean) ->
increasing/decreasing a value = increasing/decreasing mean
2. Adding a new score or removing an existing score:
Adding score above the mean = increasing mean
Adding score below the mean = decreasing mean
Removing score above the mean = decreasing mean
Removing score below the mean = increasing mean
Adding or removing a score equal to the mean = no change in the mean
3. Adding, subtracting, multiplying or dividing each score by a constant changes the mean by
that same constant: when every score in the distribution is changed by the same constant, the
mean is increased, decreased, multiplied or divided by that constant as well.
4. Summing the differences of scores from the mean gives zero, which marks the balance
point: the summed differences of scores above the mean equal the summed differences of
scores below the mean. Notation: ∑(x − M) = 0
5. Summing the squared differences of scores from their mean gives a minimal value: it is the
smallest possible positive solution, and any other constant (than the mean) will produce a
larger result. Notation for the sum of the squared differences of scores from their mean:
∑(x − M)²
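The five characteristics can be checked numerically. The Python sketch below uses a made-up set of scores; the helper names mean and sum_of_squares are introduced only for illustration.

```python
def mean(scores):
    return sum(scores) / len(scores)

def sum_of_squares(scores, c):
    """Sum of squared differences of the scores from a constant c."""
    return sum((x - c) ** 2 for x in scores)

scores = [1, 3, 5, 7, 9]                   # made-up scores
M = mean(scores)
print(M)                                   # 5.0

# 1. Changing an existing score changes the mean (9 raised to 14).
print(mean([1, 3, 5, 7, 14]))              # 6.0

# 2. Adding or removing a score.
print(mean(scores + [11]))                 # 6.0  added above the mean
print(mean(scores + [-1]))                 # 4.0  added below the mean
print(mean([1, 3, 5, 7]))                  # 4.0  removed 9 (above the mean)
print(mean([3, 5, 7, 9]))                  # 6.0  removed 1 (below the mean)
print(mean(scores + [5]))                  # 5.0  added a score equal to the mean

# 3. Changing every score by a constant changes the mean by that constant.
print(mean([x + 10 for x in scores]))      # 15.0
print(mean([x * 2 for x in scores]))       # 10.0

# 4. The differences from the mean sum to zero.
print(sum(x - M for x in scores))          # 0.0

# 5. The sum of squared differences is smallest around the mean.
print(sum_of_squares(scores, M))           # 40.0
print(sum_of_squares(scores, 4))           # 45
print(sum_of_squares(scores, 6))           # 45
```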
The mean is used to describe data that are normally distributed and measured on an
interval/ratio scale. On these scales the distance that scores deviate from the mean (the
differences) is meaningful.
Normal distribution: a theoretical distribution with data that are symmetrically distributed
around the mean, median and mode (all three are the same value, at the centre of the
distribution). The mean is used to summarize the data.
The median is used to describe skewed distributions of data and data measured on an ordinal
scale. On an ordinal scale the distance between scores is not meaningful, so the median is
appropriate.
A skewed distribution is a distribution of scores that includes outliers, or scores that fall
substantially above or below most other scores in a data set.
Positively skewed distribution: a few outliers are larger (right tail in graph) than most other
scores.
Negatively skewed distribution: a few outliers are smaller (left tail in graph) than most other
scores.
The median is the appropriate measure of central tendency to describe data from a skewed
distribution because the outliers distort the value of the mean.
The mode is used to describe modal distributions and data measured on a nominal scale, where
values identify something or someone and carry no quantity.
Modal distribution is a distribution of scores where one or more scores occur most often or
most frequently.
1. unimodal= one mode
2. bimodal= two modes
3. multimodal = more than two modes
4. nonmodal = no mode
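The four labels above can be sketched as a small Python classifier. The function names find_modes and classify are hypothetical. As an assumption, a distribution in which every distinct score occurs equally often is treated as having no mode. The input is assumed to be non-empty.

```python
from collections import Counter

def find_modes(scores):
    """Return the score(s) that occur most often, or [] if there is no mode."""
    counts = Counter(scores)
    highest = max(counts.values())
    # Assumption: all distinct scores equally frequent -> no mode.
    if len(counts) > 1 and all(c == highest for c in counts.values()):
        return []
    return sorted(value for value, c in counts.items() if c == highest)

def classify(scores):
    number_of_modes = len(find_modes(scores))
    if number_of_modes == 0:
        return "nonmodal"
    if number_of_modes == 1:
        return "unimodal"
    if number_of_modes == 2:
        return "bimodal"
    return "multimodal"

print(classify([1, 2, 2, 3]))            # unimodal   (mode: 2)
print(classify([1, 1, 2, 3, 3]))         # bimodal    (modes: 1 and 3)
print(classify([1, 1, 2, 2, 3, 3, 4]))   # multimodal (modes: 1, 2 and 3)
print(classify([1, 2, 3, 4]))            # nonmodal
print(find_modes(["red", "blue", "red"]))  # ['red']  (works for nominal data)
```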
H4. Summarizing data variability
Variability is a measure of the dispersion or spread of scores in a distribution and ranges
from 0 to +∞; the question here is how far scores vary from the mean and how they vary in
general. Measures of variability include the range, variance and standard deviation.
Range: the difference between the largest value (L) and the smallest value (S) in a data set;
it is most informative for data sets without outliers. Range = L - S
Range and mean are often reported together.
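A short Python sketch of Range = L - S, with made-up scores, showing why the range is less informative once an outlier is present. The helper name value_range is hypothetical.

```python
def value_range(scores):
    """Largest value (L) minus smallest value (S)."""
    return max(scores) - min(scores)

scores = [4, 6, 7, 8, 10]             # made-up scores, no outlier
print(value_range(scores))            # 6
print(value_range(scores + [95]))     # 91  (one outlier dominates the range)
```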
Fractiles are measures that divide data sets into two or more equal parts (examples:
median, quartiles, deciles and percentiles).
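As an illustration, Python's standard statistics.quantiles function (Python 3.8+) returns the cut points that split a data set into n equal parts. The data are made up. The cut points depend on the interpolation convention, and the function's default ("exclusive") method is assumed here, so other software may give slightly different values.

```python
from statistics import median, quantiles

data = list(range(1, 12))                # made-up scores 1..11

quartiles = quantiles(data, n=4)         # 3 cut points -> 4 equal parts
deciles = quantiles(data, n=10)          # 9 cut points -> 10 equal parts
percentiles = quantiles(data, n=100)     # 99 cut points -> 100 equal parts

print(quartiles)                         # [3.0, 6.0, 9.0]
print(quartiles[1] == median(data))      # True: the middle quartile is the median
print(len(deciles), len(percentiles))    # 9 99
```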