Show all of your work, and upload this homework to Canvas.
1. The following data set represents the number of new computer accounts registered during ten consecutive
days:
43, 37, 50, 51, 58, 52, 45, 45, 58, 130
Answer: The ordered data is: 37, 43, 45, 45, 50, 51, 52, 58, 58, 130
(a) Compute the mean, median, IQR, and standard deviation
Answer:
• mean = $\bar{x} = \frac{1}{10}\sum_{i=1}^{10} x_i = 56.9$
• median = $\frac{50+51}{2} = 50.5$
• $Q_1 = \frac{43+45}{2} = 44$; $Q_3 = \frac{58+58}{2} = 58$
→ IQR = $Q_3 - Q_1 = 58 - 44 = 14$
• variance = $s^2 = \frac{1}{10-1}\sum_{i=1}^{10} (x_i - \bar{x})^2 = 702.7667$
→ standard deviation = $s = \sqrt{s^2} = \sqrt{702.7667} = 26.5097$
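The arithmetic in (a) can be cross-checked with a minimal Python sketch that uses only the standard library. The helper name `quartile` is hypothetical. The quartile convention it implements is an assumption inferred from the worked values Q1 = 44 and Q3 = 58: take position (n + 1)p in the ordered data and average the two neighbouring values when that position is not a whole number. Other software defaults may return slightly different quartiles.

```python
import math
import statistics

data = [43, 37, 50, 51, 58, 52, 45, 45, 58, 130]

def quartile(values, p):
    """Value at position (n + 1) * p in the ordered data, averaging the
    two neighbouring values when the position is not a whole number.
    This convention is assumed; it reproduces the worked answers."""
    xs = sorted(values)
    pos = (len(xs) + 1) * p              # 1-based position
    lo, hi = math.floor(pos), math.ceil(pos)
    return (xs[lo - 1] + xs[hi - 1]) / 2

mean = statistics.mean(data)
median = statistics.median(data)
q1, q3 = quartile(data, 0.25), quartile(data, 0.75)
iqr = q3 - q1
variance = statistics.variance(data)     # sample variance, divisor n - 1
sd = math.sqrt(variance)

print(mean, median, q1, q3, iqr, round(variance, 4), round(sd, 4))
```

`statistics.variance` divides by n − 1, which matches the formula for s² used above.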
(b) Check for outliers using the 1.5(IQR) rule, and indicate which data points are outliers.
Answer:
• Q1 − 1.5(IQR) = 44 − 1.5(14) = 23
• Q3 + 1.5(IQR) = 58 + 1.5(14) = 79
• Any values less than 23 or greater than 79 are outliers.
• Outlier: 130
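The 1.5(IQR) rule is easy to express in code. The following is a small sketch that takes the quartiles from part (a) as given rather than recomputing them; the variable names are illustrative.

```python
data = [43, 37, 50, 51, 58, 52, 45, 45, 58, 130]

q1, q3 = 44, 58                  # quartiles taken from part (a)
iqr = q3 - q1

lower_fence = q1 - 1.5 * iqr     # values below this are flagged
upper_fence = q3 + 1.5 * iqr     # values above this are flagged

outliers = [x for x in data if x < lower_fence or x > upper_fence]
print(lower_fence, upper_fence, outliers)
```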
(c) Remove the detected outliers and compute the new mean, median, IQR, and standard deviation.
Answer: The new ordered data: 37, 43, 45, 45, 50, 51, 52, 58, 58
• mean = $\bar{x} = \frac{1}{9}\sum_{i=1}^{9} x_i = 48.78$
• median = 50
• $Q_1 = \frac{43+45}{2} = 44$; $Q_3 = \frac{52+58}{2} = 55$
→ IQR = $Q_3 - Q_1 = 55 - 44 = 11$
• variance = $s^2 = \frac{1}{9-1}\sum_{i=1}^{9} (x_i - \bar{x})^2 = 48.4444$
→ standard deviation = $s = \sqrt{s^2} = \sqrt{48.4444} = 6.9602$
(d) Make a conclusion about the effect of outliers on the basic descriptive statistics from (a) and (c).
Answer: The outlier inflated the mean (48.78 → 56.9) and the variance (48.4444 → 702.7667).
The median (50 → 50.5) and the IQR (11 → 14) increased only slightly. Thus the mean and variance
are strongly affected by the outlier, whereas the median and IQR are not (they are robust).
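The comparison between (a) and (c) can be reproduced side by side with a short sketch. The helper names `quartile` and `summary` are hypothetical, and the quartile convention (position (n + 1)p, averaging neighbours) is again an assumption chosen because it matches the worked answers.

```python
import math
import statistics

def quartile(values, p):
    # Assumed convention: position (n + 1) * p in the ordered data,
    # averaging the two neighbouring values when it is not an integer.
    xs = sorted(values)
    pos = (len(xs) + 1) * p
    return (xs[math.floor(pos) - 1] + xs[math.ceil(pos) - 1]) / 2

def summary(values):
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "IQR": quartile(values, 0.75) - quartile(values, 0.25),
        "sd": statistics.stdev(values),      # sample sd, divisor n - 1
    }

full = [43, 37, 50, 51, 58, 52, 45, 45, 58, 130]
trimmed = [x for x in full if x != 130]      # drop the outlier found in (b)

with_outlier, without_outlier = summary(full), summary(trimmed)
for name in with_outlier:
    ratio = with_outlier[name] / without_outlier[name]
    print(f"{name:>6}: {without_outlier[name]:8.4f} -> "
          f"{with_outlier[name]:8.4f}  (ratio {ratio:.2f})")
```

The ratio column makes the contrast concrete: the standard deviation grows several-fold when 130 is included, while the median barely moves.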
2. A histogram of the price of diamonds, and a scatterplot of carat vs. price of diamonds are given below.
[Figure: two panels. Left: histogram "Diamond Price", with Price (dollars) on the x-axis and Count on the y-axis. Right: scatterplot "Diamond Carat vs. Price", with Carat on the x-axis and Price (dollars) on the y-axis.]
(a) Describe the shape of the histogram of price of diamonds. (Where are the majority of diamond
prices located? Where are the minority of diamond prices located?)
Answer: The histogram is right-skewed. The majority of diamond prices are at the lower end of the
price range, between about 0 and 5,000 dollars. Above about 5,000 dollars, the number of diamonds drops off.
(b) Are exponential, normal, or uniform distributions reasonable as the population distribution for the
price of diamonds? Justify your answer.
Answer: Since the histogram of our sample has a similar shape to an exponential distribution, an
exponential distribution is a reasonable choice for the population distribution.
(c) Describe the relationship between carat and price of diamonds. (What happens to price as number
of carats increases? What happens to the variability as number of carats increases?)
Answer: In general, as the number of carats increases, the price of the diamond also increases. The
data points are more tightly clustered for smaller diamonds (< 2 carats), indicating lower variability
in price. Conversely, the data points are more spread out for larger diamonds (> 2 carats), indicating
higher variability in price.
3. Suppose $X_i \overset{iid}{\sim} \text{Unif}(0, \theta)$ for $i = 1, \dots, n$. Suppose we propose an estimator for $\theta$ as $\hat{\theta} = \frac{2}{n}\sum_{i=1}^{n} X_i$.
(a) Is θ̂ an unbiased estimator for θ?
(b) Calculate se(θ̂) (Recall the standard error of an estimator is the square root of the variance of an
estimator).
Answer:
(a)
$E(\hat{\theta}) = \frac{2}{n}\sum_{i=1}^{n} E(X_i) = \frac{2}{n} \cdot n \cdot \frac{\theta}{2} = \theta$
since the $X_i$ are i.i.d. So $\hat{\theta}$ is unbiased.
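(b)
$\mathrm{Var}(\hat{\theta}) = \frac{4}{n^2}\sum_{i=1}^{n} \mathrm{Var}(X_i) = \frac{4}{n^2} \cdot n \cdot \frac{\theta^2}{12} = \frac{\theta^2}{3n}$,
using independence of the $X_i$ and $\mathrm{Var}(X_i) = \theta^2/12$ for a Unif(0, θ) variable.
So
$se(\hat{\theta}) = \mathrm{Var}(\hat{\theta})^{1/2} = \theta/\sqrt{3n}$.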
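A quick Monte Carlo simulation gives a sanity check on both results. This is only a sketch: the values of `theta`, `n`, and `reps`, and the fixed seed, are arbitrary illustrative choices and are not part of the problem.

```python
import math
import random
import statistics

random.seed(0)                        # fixed seed so the run is repeatable
theta, n, reps = 3.0, 5, 100_000      # illustrative values (assumed)

# Draw `reps` samples of size n and compute the estimator 2 * (sample mean).
estimates = [
    2 / n * sum(random.uniform(0, theta) for _ in range(n))
    for _ in range(reps)
]

sim_mean = statistics.fmean(estimates)    # should be close to theta
sim_se = statistics.stdev(estimates)      # should be close to theta / sqrt(3n)
theory_se = theta / math.sqrt(3 * n)

print(f"simulated mean {sim_mean:.4f} vs theta {theta}")
print(f"simulated se   {sim_se:.4f} vs theory {theory_se:.4f}")
```

Agreement within simulation noise is consistent with the derivation but does not replace it; the algebra above is what establishes unbiasedness and the standard error.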
4. Let $X_1, \dots, X_4 \overset{iid}{\sim} \text{Bern}(p)$. Suppose we propose two estimators for $p$:
$\hat{p}_1 = \frac{X_1 + X_2 + X_3 + X_4}{4}$
$\hat{p}_2 = \frac{X_1 + 2X_2 + X_3}{4}$
(a) Show that both estimators are unbiased estimators of p.
(b) Which estimator is “best” in terms of having a smaller MSE? Calculate MSE(p̂1 ) and MSE(p̂2 )
(Recall that if an estimator θ̂ is unbiased, MSE(θ̂) = Var(θ̂)).
Answer:
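(a)
$E(\hat{p}_1) = \frac{1}{4}E(X_1 + X_2 + X_3 + X_4) = \frac{1}{4} \cdot 4E(X) = p \implies \mathrm{Bias}(\hat{p}_1) = E(\hat{p}_1 - p) = E(\hat{p}_1) - p = p - p = 0$
$E(\hat{p}_2) = \frac{1}{4}E(X_1 + 2X_2 + X_3) = \frac{1}{4}(p + 2p + p) = p \implies \mathrm{Bias}(\hat{p}_2) = E(\hat{p}_2 - p) = E(\hat{p}_2) - p = p - p = 0$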
So both estimators are unbiased for p.
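Because four Bernoulli variables have only 16 joint outcomes, the expectations can also be checked exactly by enumeration. The sketch below uses exact rational arithmetic. The value of `p` is an arbitrary illustrative choice, and the function names are hypothetical.

```python
from fractions import Fraction
from itertools import product

p = Fraction(3, 10)      # illustrative success probability (assumed)

def expectation(estimator):
    """Exact E[estimator(X1..X4)] by summing over all 16 outcomes."""
    total = Fraction(0)
    for xs in product((0, 1), repeat=4):
        prob = Fraction(1)
        for x in xs:                       # independence: multiply marginals
            prob *= p if x == 1 else 1 - p
        total += prob * estimator(xs)
    return total

def p_hat_1(xs):
    return Fraction(xs[0] + xs[1] + xs[2] + xs[3], 4)

def p_hat_2(xs):
    return Fraction(xs[0] + 2 * xs[1] + xs[2], 4)

print(expectation(p_hat_1), expectation(p_hat_2))
```

Both expectations equal `p` exactly, in agreement with the derivation; changing `p` to any other probability gives the same conclusion.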