Statistics: the study of how we describe data and make inferences from it (an inference is a conclusion
reached on the basis of evidence and reasoning)
Levels of measurement:
- Nominal
o Group classification
o No meaningful ranking
o Numerical coding arbitrary
- Ordinal
o Meaningful ranking
o Distances between categories not necessarily equal
- Interval
o Meaningful ranking
o Distances are equal
- Ratio
o Meaningful ranking
o Distances are equal
o True zero point (zero means absence of the property)
Discrete variable: measured in whole units or categories (e.g. number of children)
Continuous variable: measured along a continuum (e.g. a person's height)
Mean (interval/ratio):
1. Changing any score will change the mean
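2. Adding or removing a score will change the mean (unless that score equals the mean)
3. Adding, subtracting, multiplying or dividing each score by a constant changes the mean in
the same way (e.g. adding 3 to every score adds 3 to the mean)
4. The sum of the differences from the mean is 0
5. The sum of squared differences from the mean (SS) is smaller than around any other value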
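The properties of the mean listed above can be checked on a small example. This is a minimal sketch using Python's standard `statistics` module; the scores are hypothetical and `sum_of_squares` is a helper introduced only for illustration.

```python
from statistics import mean

# Hypothetical scores, chosen only for illustration
scores = [2, 4, 4, 5, 10]
m = mean(scores)  # 5

# Property 2: adding a score equal to the mean leaves the mean unchanged,
# while adding any other score moves it
same = mean(scores + [5])    # still 5
moved = mean(scores + [11])  # 6

# Property 3: transforming every score transforms the mean the same way
shifted = mean([x + 3 for x in scores])  # 8 (mean + 3)
scaled = mean([x * 2 for x in scores])   # 10 (mean * 2)

# Property 4: differences from the mean sum to zero
dev_sum = sum(x - m for x in scores)  # 0

def sum_of_squares(data, center):
    """Hypothetical helper: sum of squared differences between each score and `center`."""
    return sum((x - center) ** 2 for x in data)

# Property 5: SS is smallest when measured around the mean
ss_mean = sum_of_squares(scores, m)  # 36
```

Trying any other center, such as 4 or 6, gives a larger sum of squares (41 in both cases).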
Median (ordinal/interval/ratio):
1. Sort all cases by value and take the middle value (with an even number of cases, the
average of the two middle values)
2. To determine the median from a frequency table, identify the first category whose
cumulative percentage exceeds 50%
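Both ways of finding the median can be sketched in a few lines. This assumes Python's `statistics.median` for raw scores; `median_category` is a hypothetical helper that applies the cumulative-percentage rule to a frequency table given as (category, frequency) pairs in rank order, and the example data are invented.

```python
from statistics import median

# Raw data (hypothetical): sort the cases and take the middle value
odd_median = median([7, 1, 5, 3, 9])  # 5
even_median = median([1, 3, 5, 7])    # 4.0, the average of 3 and 5

def median_category(freq_table):
    """Return the first category whose cumulative percentage exceeds 50%.

    `freq_table` is a list of (category, frequency) pairs in rank order.
    """
    total = sum(freq for _, freq in freq_table)
    if total == 0:
        raise ValueError("frequency table has no cases")
    cumulative = 0
    for category, freq in freq_table:
        cumulative += freq
        if cumulative / total * 100 > 50:
            return category

# Hypothetical ordinal variable with cumulative percentages 20%, 65%, 100%
education = [("low", 20), ("middle", 45), ("high", 35)]
print(median_category(education))  # middle
```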
Measures of variability: measures of central tendency alone carry too little meaning to
adequately describe the distribution of a variable (two distributions can share the same mean
yet differ greatly in spread), so measures of variability are also needed
Range (ordinal/ interval/ ratio)
1. Range is the distance between the highest and lowest scores (maximum - minimum)
2. It is always reported together with the minimum and maximum scores themselves
3. It is sensitive to outliers
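A short sketch of why the range is sensitive to outliers: a single extreme score changes it dramatically. The scores are hypothetical, and `value_range` is an illustrative name chosen to avoid shadowing Python's built-in `range`.

```python
def value_range(data):
    """Range: the highest score minus the lowest score."""
    return max(data) - min(data)

# Hypothetical scores; the second list adds a single extreme value
scores = [4, 5, 6, 7, 8]
with_outlier = scores + [40]

print(min(scores), max(scores), value_range(scores))  # 4 8 4
print(value_range(with_outlier))                       # 36
```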
Interquartile range (ordinal/interval/ratio):
1. Based on quartiles, which split the ordered data into four equal groups of cases
2. Based on the distance between Q3 and Q1
3. IQR = Q3 – Q1
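The IQR formula can be sketched with Python's `statistics.quantiles`, which by default uses its `'exclusive'` method. Other software (and some textbooks) define quartiles slightly differently, so exact values may differ a little. The hypothetical data also show that, unlike the range, the IQR is unaffected here when the top score is replaced by an outlier.

```python
from statistics import quantiles

def iqr(data):
    """Interquartile range Q3 - Q1, using statistics.quantiles (default 'exclusive' method)."""
    q1, _q2, q3 = quantiles(data, n=4)
    return q3 - q1

# Hypothetical scores; the second list replaces the top score with an outlier
scores = [1, 2, 3, 4, 5, 6, 7, 8]
with_outlier = [1, 2, 3, 4, 5, 6, 7, 80]

print(quantiles(scores, n=4))          # [2.25, 4.5, 6.75]
print(iqr(scores), iqr(with_outlier))  # 4.5 4.5
```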
Variance (interval/ratio): based on the sum of squares (SS); for a sample, variance = SS / (N - 1),
and for a whole population, variance = SS / N
Standard deviation (interval/ ratio)
1. Standard deviation is the square root of the variance
2. Standard deviation is roughly the typical distance of scores from the mean, expressed in the
original units of the variable
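The chain from sum of squares to variance to standard deviation can be worked through by hand and cross-checked against Python's `statistics` functions (`variance`/`stdev` divide by N - 1, `pvariance`/`pstdev` divide by N). The scores are hypothetical.

```python
import math
from statistics import pstdev, pvariance, stdev, variance

# Hypothetical scores
scores = [2, 4, 4, 4, 5, 5, 7, 9]
n = len(scores)
m = sum(scores) / n  # 5.0

ss = sum((x - m) ** 2 for x in scores)  # sum of squares: 32.0

sample_var = ss / (n - 1)  # divide by N - 1 for a sample
pop_var = ss / n           # divide by N for a whole population: 4.0
sample_sd = math.sqrt(sample_var)
pop_sd = math.sqrt(pop_var)  # 2.0

# Cross-check the hand calculation against the standard library
print(math.isclose(sample_sd, stdev(scores)), math.isclose(pop_sd, pstdev(scores)))  # True True
```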
Independent variable (IV): what we expect to have an influence on another variable
Dependent variable (DV): what we expect to be influenced by at least one IV
Confounding variable: an unanticipated variable, not accounted for in a study, that is related
to both the IV and the DV and could therefore cause or explain an observed association between them
Making causal claims:
1. Empirical evidence of an association between the variables
2. Temporal sequence: the cause precedes the effect
3. Causality supported by theory and reason
Reverse causality: when the causal relationship between two variables could run in the opposite
direction from the one assumed (the DV influencing the IV), which makes causal inferences difficult
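Pearson’s r (interval/ratio): scatterplots give only a rough visual impression of a relationship,
so a measure of association is needed to express its strength and direction; r ranges from -1
(perfect negative) through 0 (no linear relationship) to +1 (perfect positive)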
Assumptions of Pearson’s r:
1. Linearity: the relationship between X and Y follows a straight line in the scatterplot
2. Homoscedasticity: the variance of Y is roughly equal across the different values of X
3. Normality: X and Y are approximately normally (symmetrically) distributed
Crosstabs (nominal/ordinal)
1. A table that depicts a possible relationship between IV and DV
2. IV on columns and DV on rows
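A crosstab can be built by counting (IV, DV) pairs, with IV categories as columns and DV categories as rows. The sketch also converts counts to percentages within each IV column, a common way to compare DV outcomes across IV groups. The helper names and the data are hypothetical.

```python
from collections import Counter

def crosstab(pairs):
    """Count (iv, dv) pairs into {dv_row: {iv_column: count}}."""
    counts = Counter(pairs)
    iv_levels = sorted({iv for iv, _ in pairs})
    dv_levels = sorted({dv for _, dv in pairs})
    return {dv: {iv: counts[(iv, dv)] for iv in iv_levels} for dv in dv_levels}

def column_percentages(table):
    """Convert counts to percentages within each IV column."""
    columns = next(iter(table.values())).keys()
    totals = {iv: sum(row[iv] for row in table.values()) for iv in columns}
    return {
        dv: {iv: 100 * count / totals[iv] for iv, count in row.items()}
        for dv, row in table.items()
    }

# Hypothetical cases: (IV = group, DV = exam result)
cases = [("A", "pass")] * 3 + [("A", "fail")] + [("B", "pass")] + [("B", "fail")] * 3

table = crosstab(cases)
pct = column_percentages(table)
for dv, row in pct.items():
    print(dv, row)
# fail {'A': 25.0, 'B': 75.0}
# pass {'A': 75.0, 'B': 25.0}
```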
Population: the group about which we want to generalize