Chapter 1 – Looking at data: distributions
Categorical or quantitative variable
A categorical variable places a case into one of several groups or categories.
A quantitative variable takes numerical values for which arithmetic operations such as
adding and averaging make sense.
Cases, labels and variables
Cases = the objects described by a set of data.
Label = a special variable used in some data sets to distinguish the different cases.
Variable = characteristic of a case.
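As an illustration of these terms, here is a minimal sketch of a small data set in Python. The data set and all names in it (students, id, major, exam_score) are hypothetical and chosen only for this example: each dictionary is one case, "id" is the label, "major" is a categorical variable and "exam_score" is a quantitative variable.

```python
from collections import Counter

# Hypothetical data set: each dictionary is one case.
students = [
    {"id": "S01", "major": "biology", "exam_score": 7.5},
    {"id": "S02", "major": "physics", "exam_score": 6.0},
    {"id": "S03", "major": "biology", "exam_score": 9.0},
]

# The label only distinguishes the cases from each other.
labels = [case["id"] for case in students]

# Categorical variable: counting cases per category makes sense.
major_counts = Counter(case["major"] for case in students)

# Quantitative variable: arithmetic such as averaging makes sense.
mean_score = sum(case["exam_score"] for case in students) / len(students)

print(labels)
print(major_counts)
print(mean_score)
```

Averaging the "major" column would be meaningless, which is the practical difference between the two kinds of variable.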
Displaying distributions with graphs
Categorical variables
Bar graph
Pie chart
Quantitative variables
Stemplot
Back-to-back stemplot
Histogram
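A stemplot can be drawn as plain text. The sketch below is one possible way to do it, assuming non-negative whole numbers with the tens as the stem and the units as the leaf. The function name stemplot and the example values are hypothetical.

```python
from collections import defaultdict


def stemplot(values):
    """Return the lines of a text stemplot (stem = tens, leaf = units).

    Assumes a non-empty list of non-negative integers.
    """
    groups = defaultdict(list)
    for value in sorted(values):
        stem, leaf = divmod(value, 10)
        groups[stem].append(leaf)

    lines = []
    # Include empty stems as well, so gaps in the distribution stay visible.
    for stem in range(min(groups), max(groups) + 1):
        leaves = "".join(str(leaf) for leaf in groups.get(stem, []))
        lines.append(f"{stem:>2} | {leaves}".rstrip())
    return lines


for line in stemplot([12, 15, 21, 23, 23, 40]):
    print(line)
```

Stems without observations are still printed, so gaps and possible outliers remain visible in the plot.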
Examining distributions
(1) Overall pattern + deviations.
(2) Shape, center + spread.
(3) Outliers.
Describing distributions with numbers
The mean (average value).
The median (middle value – midpoint of a distribution).
The quartiles (Q1 = median of the observations to the left of the median & Q3 = median
of the observations to the right of the median, in the ordered data).
The five-number summary (Minimum – Q1 – Median – Q3 – Maximum).
IQR (interquartile range = Q3 – Q1).
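The measures above can be computed as in the following minimal sketch. It follows the quartile rule given in these notes (the median itself is left out of both halves when the number of observations is odd); statistical software may use slightly different quartile rules. The function name five_number_summary and the example values are hypothetical, and at least two observations are assumed.

```python
from statistics import mean, median


def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum) for at least two values."""
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]          # observations to the left of the median
    upper = data[half + n % 2:]  # observations to the right of the median
    return (data[0], median(lower), median(data), median(upper), data[-1])


scores = [2, 4, 4, 5, 7, 8, 9, 10, 12]  # hypothetical observations

minimum, q1, med, q3, maximum = five_number_summary(scores)
iqr = q3 - q1

print("mean:", mean(scores))
print("five-number summary:", (minimum, q1, med, q3, maximum))
print("IQR:", iqr)
```

For these nine values the median is 7, Q1 is 4, Q3 is 9.5 and the IQR is 5.5.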
When is something an outlier?
Upper limit = Q3 + 1.5 x IQR & lower limit = Q1 – 1.5 x IQR.
Everything above the upper limit or below the lower limit = (suspected) outlier.
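The 1.5 x IQR rule can be sketched as follows, using the quartile rule from these notes (median of each half of the ordered data). The function names quartiles and find_outliers and the example values are hypothetical, and at least two observations are assumed.

```python
from statistics import median


def quartiles(values):
    """Return (Q1, Q3) as the medians of the lower and upper halves."""
    data = sorted(values)
    n = len(data)
    half = n // 2
    return median(data[:half]), median(data[half + n % 2:])


def find_outliers(values):
    """Return the observations that lie outside the 1.5 x IQR limits."""
    q1, q3 = quartiles(values)
    iqr = q3 - q1
    lower_limit = q1 - 1.5 * iqr
    upper_limit = q3 + 1.5 * iqr
    return [x for x in values if x < lower_limit or x > upper_limit]


data = [2, 4, 4, 5, 7, 8, 9, 10, 30]  # hypothetical observations
print(find_outliers(data))
```

Here Q1 = 4 and Q3 = 9.5, so IQR = 5.5 and the limits are -4.25 and 17.75. Only the value 30 falls outside them.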
Standard deviation