Summary statistics
Summary statistics - correct answer-- The first step in data analysis
- Describe data sets or relationships between variables with one or a few numbers:
- What is the central value?
- How widely are values spread from the center?
- Are there data that are very atypical?
- Are variables related?
Why central tendency? - correct answer-- You can't report all 1,000 values to your boss!
- Describe a whole set of data with a single value that represents a central or typical value
for the distribution.
Mean - correct answer-- the sum of the values of all observations in a dataset divided by the
number of observations
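Median - correct answer-- the "middle" number when the values are listed in order from
smallest to greatest. When dataset x with n values is ordered: if n is odd, the median is
the value at position (n + 1)/2; if n is even, it is the average of the values at
positions n/2 and n/2 + 1.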
Mode - correct answer-- the value that appears most often in a set of data values
three main measures of central tendency - correct answer-- mean
- median
- mode
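The three measures above can be computed with Python's standard `statistics` module. This is a minimal sketch; the `scores` list is made-up data used only for illustration.

```python
import statistics

# Hypothetical data, chosen only to illustrate the three measures.
scores = [2, 3, 3, 5, 7, 10]

# Mean: sum of the values divided by the number of values (30 / 6).
mean_value = statistics.mean(scores)

# Median: n is even here, so it is the average of the two middle values (3 and 5).
median_value = statistics.median(scores)

# Mode: the value that appears most often (3 appears twice).
mode_value = statistics.mode(scores)

print(mean_value, median_value, mode_value)
```

Note that the three measures give three different "typical" values for the same data, which is why it helps to report more than one.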
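Limitations of central tendency - correct answer-- not every measure works on every scale:
the mean is generally not meaningful for nominal or ordinal data, the median needs at least
ordinal data, and the mode works on any scale
- sensitivity to outliers (mainly the mean; the median and mode are more robust)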
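A small sketch of the outlier point, assuming made-up values: adding one extreme observation moves the mean a lot, while the median barely changes.

```python
import statistics

# Hypothetical values; 500 is an artificial outlier added for illustration.
typical = [40, 42, 45, 47, 50]
with_outlier = typical + [500]

# How far each measure moves once the outlier is included.
mean_shift = statistics.mean(with_outlier) - statistics.mean(typical)
median_shift = statistics.median(with_outlier) - statistics.median(typical)

print(f"mean shifts by {mean_shift:.2f}, median shifts by {median_shift:.2f}")
```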
Measures of variability - correct answer-- Dispersion: variance, standard deviation,
interquartile range, and range
Variance - correct answer-- The sum of the squared deviations from the mean divided by the
number of observations minus one (this is the sample variance).
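The definition above can be followed step by step. This is a sketch on made-up data; it also compares the result with `statistics.variance`, which uses the same n - 1 divisor.

```python
import statistics

# Hypothetical data for illustration.
values = [2, 4, 4, 4, 5, 5, 7, 9]

n = len(values)
mean_value = sum(values) / n

# Squared deviation of each observation from the mean.
squared_deviations = [(x - mean_value) ** 2 for x in values]

# Sample variance: divide by n - 1, not n.
sample_variance = sum(squared_deviations) / (n - 1)

# Cross-check against the standard library's sample variance.
library_variance = statistics.variance(values)

print(sample_variance, library_variance)
```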
Interquartile range - correct answer-- Difference between the first and third quartiles of the
distribution: Q3 - Q1
- focuses on the central portion of the dataset and is less influenced by outliers than the
range.
Range - correct answer-- The maximum value for a variable minus the minimum value for that
variable.
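To see why the interquartile range is less influenced by outliers than the range, here is a sketch on made-up data. The helper names `iqr` and `value_range` are hypothetical. Quartile conventions differ between tools; this uses the default method of `statistics.quantiles`, so other software may report slightly different Q1 and Q3 values.

```python
import statistics

def iqr(data):
    """Interquartile range Q3 - Q1, using statistics.quantiles' default method."""
    q1, _q2, q3 = statistics.quantiles(data, n=4)
    return q3 - q1

def value_range(data):
    """Range: maximum value minus minimum value."""
    return max(data) - min(data)

# Hypothetical data: the two lists differ only in the largest value.
base = [1, 3, 5, 7, 9, 11, 13, 15]
with_outlier = [1, 3, 5, 7, 9, 11, 13, 100]

print("range:", value_range(base), "->", value_range(with_outlier))
print("IQR:  ", iqr(base), "->", iqr(with_outlier))
```

In this example the range jumps when the largest value becomes extreme, while the IQR stays the same because it only looks at the central portion of the ordered data.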
Information from SD - correct answer-- Measure of Variability: SD helps quantify the spread
of data points. Are they tightly clustered or widely scattered?
- Risk/uncertainty assessment: a high SD indicates lower consistency (usually a bad signal
for customer feedback or brand perception)
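As an illustration of reading consistency from the SD, here is a sketch with two made-up sets of customer ratings (hypothetical names `product_a` and `product_b`) that share the same mean but differ in spread.

```python
import statistics

# Hypothetical 1-to-6 ratings; both sets have a mean of 4.
product_a = [3, 4, 4, 4, 5]   # ratings tightly clustered
product_b = [1, 3, 5, 5, 6]   # ratings widely scattered

mean_a = statistics.mean(product_a)
mean_b = statistics.mean(product_b)

# Sample standard deviation: square root of the sample variance.
sd_a = statistics.stdev(product_a)
sd_b = statistics.stdev(product_b)

print(f"A: mean={mean_a}, SD={sd_a:.2f}")
print(f"B: mean={mean_b}, SD={sd_b:.2f}")
```

The means alone would suggest the two products are rated the same; the larger SD for `product_b` shows that its customers disagree much more.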