Made by Minjeong Kim
2024475001
Chapter 2 Visual Description of Data:
1) Sturges' Rule: Sturges' rule is a formula used to determine the number of bins, or classes,
to use when creating a histogram for a dataset:
Formula: $k = 1 + 3.322 \log_{10} n$
k = number of classes (bins), n = number of observations in the dataset, $\log_{10} n$ = logarithm
of n to base 10
Example Question: $2^3 = 8$, so $3 = \log_2 8$
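A minimal Python sketch of Sturges' rule, to show how the formula is evaluated. The function name `sturges_k` is made up for illustration, and rounding the result to the nearest whole number of classes is an assumption; the notes do not say how to round.

```python
import math


def sturges_k(n: int) -> float:
    """Return k = 1 + 3.322 * log10(n), the unrounded number of classes."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return 1 + 3.322 * math.log10(n)


# Example: a dataset with 100 observations.
k = sturges_k(100)
# Assumption: round to the nearest integer to get a usable bin count.
bins = round(k)
print(k, bins)
```

For n = 100, log base 10 of n is 2, so k = 1 + 3.322 x 2 = 7.644, which suggests about 8 classes.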
Chapter 3 Statistical Description of Data:
1) Measures of Central tendency (the center)
a. Mean, population mean = $\mu$, sample mean = $\bar{x}$
Mean = (sum of all values) / (the number of values)
Population mean: $\mu = \frac{\sum x_i}{N}$
Sample mean: $\bar{x} = \frac{\sum x_i}{n}$
b. Weighted mean: (When you have grouped data)
$\mu = \frac{\sum x_i w_i}{\sum w_i}$
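The mean and weighted mean can be sketched in Python as follows. The function names and the sample numbers are illustrative assumptions, not part of the notes.

```python
def mean(values):
    """Sum of all values divided by the number of values."""
    return sum(values) / len(values)


def weighted_mean(values, weights):
    """Sum of x_i * w_i divided by the sum of the weights w_i."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    return sum(x * w for x, w in zip(values, weights)) / sum(weights)


# Hypothetical grouped data: class midpoints with their frequencies as weights.
midpoints = [10, 20, 30]
frequencies = [1, 2, 3]
print(mean(midpoints))
print(weighted_mean(midpoints, frequencies))
```

Here the plain mean is 60 / 3 = 20, while the weighted mean is (10 + 40 + 90) / 6 = 140 / 6, because the larger values carry more weight.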
c. Median:
1) Put the data in an ordered array (smallest to largest).
2A) If the data set has an odd number of values -> median = middle value
2B) If the data set has an even number of values -> median = average of the two middle values
d. Mode: the most frequent value
A given data set has just one mean and one median, but there may be more than one mode.
-If mean = median = mode: the shape of the distribution is symmetric.
-If mode < median < mean: the shape of the distribution trails to the right. (Positively skewed)
-If mean < median < mode: the shape of the distribution trails to the left. (Negatively skewed)
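A short Python sketch of the median and mode rules above, with a small made-up data set that trails to the right. The function names and the data are assumptions chosen for illustration.

```python
from collections import Counter


def median(values):
    """Middle value of the sorted data, or the average of the two middle values."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


def modes(values):
    """All values that share the highest frequency (there may be several)."""
    counts = Counter(values)
    top = max(counts.values())
    return sorted(v for v, c in counts.items() if c == top)


# Hypothetical data with one large value pulling the mean to the right.
data = [1, 2, 2, 3, 4, 12]
data_mean = sum(data) / len(data)
print(modes(data), median(data), data_mean)
```

For this data the mode is 2, the median is 2.5, and the mean is 4, so mode < median < mean, which matches the positively skewed case.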
2) The Spread
a. Range: Largest value minus smallest value
b. Residuals: Difference between each data value in the set and the group mean
Population: $x_i - \mu$
Sample: $x_i - \bar{x}$
c. MAD (Mean Absolute Deviation): Sum the absolute values of all residuals and
divide by the number of values in the set
Population MAD: $\frac{\sum |x_i - \mu|}{N}$
Sample MAD: $\frac{\sum |x_i - \bar{x}|}{n}$
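The range, residuals, and MAD can be sketched in Python like this. The function names and the data set are illustrative assumptions.

```python
def value_range(values):
    """Largest value minus smallest value."""
    return max(values) - min(values)


def residuals(values):
    """Difference between each value and the mean of the set."""
    m = sum(values) / len(values)
    return [x - m for x in values]


def mean_absolute_deviation(values):
    """Sum of the absolute residuals divided by the number of values."""
    return sum(abs(r) for r in residuals(values)) / len(values)


# Hypothetical data set with mean 5.
data = [2, 4, 6, 8]
print(value_range(data))
print(residuals(data))
print(mean_absolute_deviation(data))
```

With this data the residuals are -3, -1, 1, and 3. They sum to zero, which is why the MAD takes absolute values first: (3 + 1 + 1 + 3) / 4 = 2.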
d. Variance (The Spread)
Population: $\sigma^2 = \frac{\sum (x_i - \mu)^2}{N}$
Sample: $s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1}$
e. Standard Deviation
Population: $\sigma = \sqrt{\sigma^2}$ Sample: $s = \sqrt{s^2}$
Coefficient of Variation (CV)
$CV = \frac{\sigma}{\mu} \times 100\%$ (relative amount of dispersion in the data)
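A Python sketch of the variance, standard deviation, and CV formulas, written by hand so the difference between dividing by N (population) and n - 1 (sample) is visible. The function names, the `sample` flag, and the data are assumptions made for illustration.

```python
import math


def variance(values, sample=False):
    """Sum of squared residuals divided by N (population) or n - 1 (sample)."""
    n = len(values)
    m = sum(values) / n
    squared = sum((x - m) ** 2 for x in values)
    return squared / (n - 1 if sample else n)


def std_dev(values, sample=False):
    """Square root of the variance."""
    return math.sqrt(variance(values, sample))


def coefficient_of_variation(values):
    """Population standard deviation divided by the mean, as a percentage."""
    return std_dev(values) / (sum(values) / len(values)) * 100


# Hypothetical data set with mean 5 and squared residuals summing to 32.
data = [2, 4, 4, 4, 5, 5, 7, 9]
print(variance(data), variance(data, sample=True))
print(std_dev(data))
print(coefficient_of_variation(data))
```

For this data the population variance is 32 / 8 = 4 and the sample variance is 32 / 7, so the sample figure is slightly larger. The population standard deviation is 2 and the CV is 2 / 5 x 100% = 40%.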
3) Relative Position-Quartiles
a) Quartiles
Most used quantiles = quartiles
Quartiles -> divide the values of the data set into 4 subsets of equal size. Each accounts for
25%.
1) Arrange the N data values from smallest to largest.
2) First Quartile $Q_1$ = data value at position $\frac{N+1}{4}$
3) Second Quartile $Q_2$ = data value at position $\frac{2(N+1)}{4}$
4) Third Quartile $Q_3$ = data value at position $\frac{3(N+1)}{4}$
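The quartile steps above can be sketched in Python as follows. The notes do not say what to do when a position such as (N+1)/4 is not a whole number, so this sketch assumes linear interpolation between the two neighbouring values; that choice, the function names, and the data are all assumptions.

```python
def value_at_position(ordered, position):
    """Value at a 1-based position in sorted data.

    Assumption: fractional positions are resolved by linear interpolation,
    and positions outside 1..N are clamped to the ends.
    """
    position = min(max(position, 1), len(ordered))
    lower = int(position)
    fraction = position - lower
    if fraction == 0:
        return ordered[lower - 1]
    return ordered[lower - 1] + fraction * (ordered[lower] - ordered[lower - 1])


def quartiles(values):
    """Return (Q1, Q2, Q3) using positions k * (N + 1) / 4 for k = 1, 2, 3."""
    ordered = sorted(values)
    n = len(ordered)
    return tuple(value_at_position(ordered, k * (n + 1) / 4) for k in (1, 2, 3))


# Hypothetical data: N = 7 gives whole-number positions 2, 4 and 6.
print(quartiles([13, 1, 7, 3, 11, 5, 9]))
# N = 8 gives fractional positions 2.25, 4.5 and 6.75.
print(quartiles([1, 2, 3, 4, 5, 6, 7, 8]))
```

With N = 7 the quartiles fall exactly on the 2nd, 4th, and 6th sorted values. With N = 8 the positions are fractional, and this is where the interpolation assumption matters; other textbooks round the position instead.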
4) What is a standardized value?
How far above or below the population mean an individual value lies, in units of
standard deviation: $z = \frac{x - \mu}{\sigma}$
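A minimal Python sketch of a standardized value, assuming the usual z-score form z = (x - mu) / sigma. The function name `z_score` and the example numbers are hypothetical.

```python
def z_score(x, mu, sigma):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    return (x - mu) / sigma


# Hypothetical population with mean 70 and standard deviation 10.
print(z_score(85, 70, 10))
print(z_score(55, 70, 10))
```

A value of 85 is 15 points above the mean, which is 1.5 standard deviations, so z = 1.5. A value of 55 is the same distance below the mean, so z = -1.5.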