SOLUTIONS
Chapter 1
Exercises
Section 1.1
1.1 From the yield data in Table 1.1 in the text, and using the given expression,
we obtain
$s_A^2 = 2.05$
$s_B^2 = 7.64$
from which we observe that $s_A^2$ is less than $s_B^2$.
1.2 A table of values for $d_i$ is easily generated; the histogram, along with
summary statistics obtained using MINITAB, is shown in the figure below.
[Figure 1.1: Histogram for d = YA − YB data with superimposed theoretical distribution.
Summary for d: Mean 3.0467; Variance 11.0221; N 50; 1st Quartile 1.0978; 3rd Quartile 5.2501; Maximum 9.1111]
From the data, the arithmetic average, $\bar{d}$, is obtained as
$\bar{d} = 3.05$ (1.1)
The fact that this average is positive, not zero, suggests the possibility that
$Y_A$ may be greater than $Y_B$. However, conclusive evidence requires a measure of
intrinsic variability.
1.3 Directly from the data in Table 1.1 in the text, we obtain $\bar{y}_A = 75.52$; $\bar{y}_B = 72.47$;
and $s_A^2 = 2.05$; $s_B^2 = 7.64$. Also directly from the table of differences, $d_i$,
generated for Exercise 1.2, we obtain $\bar{d} = 3.05$; however, $s_d^2 = 11.02$, not 9.71.
Thus, even though for the means,
$\bar{d} = \bar{y}_A - \bar{y}_B$
for the variances,
$s_d^2 \neq s_A^2 + s_B^2$
The reason for this discrepancy is that for the variance equality to hold, $Y_A$
must be completely independent of $Y_B$ so that the covariance between $Y_A$ and $Y_B$
is precisely zero. While this may be true of the actual random variables, it is
not always strictly the case with data. The more general expression, which is valid
in all cases, is as follows:
$s_d^2 = s_A^2 + s_B^2 - 2 s_{AB}$ (1.2)
where $s_{AB}$ is the covariance between $y_A$ and $y_B$ (see Chapters 4 and 12). In
this particular case, the covariance between the $y_A$ and $y_B$ data is computed as
$s_{AB} = -0.67$
Observe that the value computed for $s_d^2$ (11.02) is obtained by adding $-2 s_{AB}$
to $s_A^2 + s_B^2$, as in Eq (1.2).
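The identity in Eq (1.2) holds exactly for any paired data set, which can be checked numerically. The sketch below is only an illustration: the Table 1.1 data are not reproduced here, so the arrays `y_a` and `y_b` are hypothetical stand-ins drawn at random, and the function name is likewise an assumption.

```python
import numpy as np

def difference_variance_terms(y_a, y_b):
    """Return (s_A^2, s_B^2, s_AB, s_d^2) for paired samples, using n - 1 divisors."""
    y_a = np.asarray(y_a, dtype=float)
    y_b = np.asarray(y_b, dtype=float)
    var_a = y_a.var(ddof=1)
    var_b = y_b.var(ddof=1)
    cov_ab = np.cov(y_a, y_b, ddof=1)[0, 1]   # sample covariance s_AB
    var_d = (y_a - y_b).var(ddof=1)           # variance of the differences d_i
    return var_a, var_b, cov_ab, var_d

# Hypothetical stand-in data (NOT the Table 1.1 yields).
rng = np.random.default_rng(0)
y_a = rng.normal(75.5, 1.4, size=50)
y_b = rng.normal(72.5, 2.8, size=50)

var_a, var_b, cov_ab, var_d = difference_variance_terms(y_a, y_b)
print("s_d^2 computed directly     :", var_d)
print("s_A^2 + s_B^2 - 2 s_AB      :", var_a + var_b - 2.0 * cov_ab)
print("s_A^2 + s_B^2 (no covariance):", var_a + var_b)
```

With the rounded values quoted above, 2.05 + 7.64 − 2(−0.67) gives about 11.03, which agrees with the reported 11.02 to within rounding.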
Section 1.2
1.4 From the data in Table 1.2 in the text, $s_x^2 = 1.2$.
1.5 In this case, the mean is $\bar{x} = 1.02$ and the variance is $s_x^2 = 1.2$. A Poisson
random variable has a mean equal to its variance; even though these two numbers
are not exactly equal, they appear close enough, within the limits of random
variation, to suggest that X may in fact be a Poisson random variable.
Section 1.3
1.6 The histograms obtained with bin sizes of 0.75, shown below, contain 10
bins for YA versus 8 bins for the histogram of Fig 1.1 in the text, and 14 bins
for YB versus 11 bins in Fig 1.2 in the text. These new histograms show a bit
more detail but the general features displayed for the data sets are essentially
unchanged.
[Figure 1.2: Histogram for YA, YB data with small bin size (0.75). Panels: Histogram of YA (Bin size 0.75); Histogram of YB (Bin size 0.75). Vertical axis: Frequency.]
When the bin sizes are expanded to 2.0, things are slightly different,
as shown below. These histograms now contain fewer bins (5 for YA and 7 for
YB) and hence, in general, show less of the true character of the data sets.
[Figure 1.3: Histogram for YA, YB data with larger bin size (2.0). Panels: Histogram of YA (Bin size 2.0); Histogram of YB (Bin size 2.0). Vertical axis: Frequency.]
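The effect of bin size on the number of bins can be explored with a short script. This is a minimal sketch, not the MINITAB procedure used for the figures: the helper `histogram_with_bin_width` and the randomly generated stand-in data are assumptions, since the yield data themselves are not listed here.

```python
import numpy as np

def histogram_with_bin_width(values, width):
    """Count data in contiguous bins of a fixed width that cover the data range."""
    values = np.asarray(values, dtype=float)
    start = np.floor(values.min() / width) * width
    stop = np.ceil(values.max() / width) * width
    if stop == start:                      # all values identical: use a single bin
        stop = start + width
    n_bins = int(round((stop - start) / width))
    edges = start + width * np.arange(n_bins + 1)
    counts, edges = np.histogram(values, bins=edges)
    return counts, edges

# Hypothetical stand-in for the 50 YB yields (NOT the Table 1.1 data).
rng = np.random.default_rng(2)
y_b = rng.normal(72.5, 2.8, size=50)

counts_small, edges_small = histogram_with_bin_width(y_b, 0.75)
counts_large, edges_large = histogram_with_bin_width(y_b, 2.0)
print("bins with width 0.75:", len(counts_small))
print("bins with width 2.0 :", len(counts_large))
```

Narrow bins resolve more detail but leave fewer points per bin; wide bins smooth the picture at the cost of detail, which is the trade-off described above.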
1.7 The values computed from the data for $\bar{y}_A$ and $s_A$ imply that the interval
of interest, $\bar{y}_A \pm 1.96 s_A$, is 75.52 ± 2.81, or (72.71, 78.33). From the frequency
distribution of Table 1.3 in the text, 48 of the 50 points lie in this range, the
excluded points being (i) the single point in the 71.51–72.50 bin and (ii) the
single point in the 78.51–79.50 bin. Thus, this interval contains 96% of the data.
1.8 For the YB data, the interval of interest, $\bar{y}_B \pm 1.96 s_B$, is 72.47 ± 5.41, or
(67.06, 77.88). From Table 1.4 in the text, we see that approximately 48 of the 50
points lie in this range (excluding the 2 points in the 77.51–78.50 bin). Thus, this
interval also contains approximately 96% of the data.
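The two intervals above follow from simple arithmetic, sketched below. The function name `interval_about_mean` is hypothetical. As an assumption about rounding, $s_A$ is taken as the square root of 2.05, and $s_B$ as the rounded value 2.76 quoted in Exercise 1.11; these choices reproduce the quoted endpoints.

```python
import math

def interval_about_mean(mean, std_dev, k=1.96):
    """Return (lower, upper) for the interval mean +/- k * std_dev."""
    half_width = k * std_dev
    return mean - half_width, mean + half_width

# YA: mean 75.52, variance 2.05 (so s_A = sqrt(2.05)).
lo_a, hi_a = interval_about_mean(75.52, math.sqrt(2.05))
# YB: mean 72.47, s_B taken as the rounded value 2.76.
lo_b, hi_b = interval_about_mean(72.47, 2.76)

print("YA interval:", (round(lo_a, 2), round(hi_a, 2)))
print("YB interval:", (round(lo_b, 2), round(hi_b, 2)))

# Fraction of the 50 observations inside each interval (48 of 50 in both cases).
fraction_inside = 48 / 50
print("fraction inside:", fraction_inside)
```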
1.9 From Table 1.5 in the text, we observe that the relative frequency associated
with x = 4 is 0.033; that associated with x = 5 is 0.017, and 0 thereafter. The
implication is that the relative frequency associated with x > 3 is 0.050. Hence,
the value of x such that only 5% of the data exceeds this value is x = 3.
1.10 Using µ = 75.52 and σ = 1.43, the theoretical values computed for the
function in Eq 1.3 in the text (for y = 72, 73, . . . , 79) are shown in the table
below along with the corresponding relative frequency values from Table 1.3
in the text.
YA Group         y     Theoretical f(y)    Relative Frequency
71.51-72.50      72    0.014               0.02
72.51-73.50      73    0.059               0.04
73.51-74.50      74    0.159               0.18
74.51-75.50      75    0.261               0.34
75.51-76.50      76    0.264               0.14
76.51-77.50      77    0.163               0.16
77.51-78.50      78    0.062               0.10
78.51-79.50      79    0.014               0.02
TOTAL (N = 50)         0.996               1.00
The agreement between the theoretical values and the relative frequency is
reasonable but not perfect.
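The theoretical column can be regenerated with a few lines of code. This sketch assumes that Eq 1.3 in the text is the Gaussian (normal) density with mean µ and standard deviation σ; the equation itself is not reproduced here, and the function name `normal_pdf` is illustrative. Values may differ from the table in the last digit depending on how σ was rounded.

```python
import math

def normal_pdf(y, mu, sigma):
    """Gaussian density evaluated at y (assumed form of Eq 1.3)."""
    z = (y - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

mu, sigma = 75.52, 1.43
theoretical = {y: normal_pdf(y, mu, sigma) for y in range(72, 80)}
for y, f in theoretical.items():
    print(y, round(f, 3))
print("sum over y = 72..79:", round(sum(theoretical.values()), 3))
```

Because each bin is one unit wide, the density value at the bin center approximates the probability of the bin, which is why it can be compared directly with the relative frequency.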
1.11 This time, with µ = 72.47 and σ = 2.76 and for y = 67, 68, 69, . . . , 79,
we obtain the table shown below for the YB data (along with the corresponding
relative frequency values from Table 1.4 in the text).
YB Group         y     Theoretical f(y)    Relative Frequency
66.51-67.50      67    0.020               0.02
67.51-68.50      68    0.039               0.06
68.51-69.50      69    0.066               0.08
69.51-70.50      70    0.097               0.16
70.51-71.50      71    0.125               0.04
71.51-72.50      72    0.142               0.14
72.51-73.50      73    0.142               0.08
73.51-74.50      74    0.124               0.12
74.51-75.50      75    0.095               0.10
75.51-76.50      76    0.064               0.12
76.51-77.50      77    0.038               0.00
77.51-78.50      78    0.019               0.04
78.51-79.50      79    0.009               0.00
TOTAL (N = 50)         0.980               1.00
There is reasonable agreement between the theoretical values and the relative
frequency.
1.12 Using λ = 1.02, the theoretical values of the function f (x|λ) of Eq 1.4
in the text at x = 0, 1, 2, . . . 6 are shown in the table below along with the
corresponding relative frequency values from Table 1.5 in the text.
X        Theoretical f(x|λ = 1.02)    Relative Frequency
0        0.3606                       0.367
1        0.3678                       0.383
2        0.1876                       0.183
3        0.0638                       0.017
4        0.0163                       0.033
5        0.0033                       0.017
6        0.0006                       0.000
TOTAL    1.0000                       1.000
The agreement between the theoretical f (x) and the data relative frequency is
reasonable. (This pdf was plotted in Fig 1.6 of the text.)
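The theoretical column of the table above can be reproduced as follows. The sketch assumes that Eq 1.4 in the text is the Poisson probability function with parameter λ, which is consistent with the tabulated values; the function name `poisson_pmf` is illustrative.

```python
import math

def poisson_pmf(x, lam):
    """Poisson probability of observing exactly x events when the mean is lam."""
    return math.exp(-lam) * lam ** x / math.factorial(x)

lam = 1.02
pmf_values = [poisson_pmf(x, lam) for x in range(7)]
for x, p in enumerate(pmf_values):
    print(x, round(p, 4))
print("sum over x = 0..6:", round(sum(pmf_values), 4))
```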
Application Problems
1.13 (i) The following is one way to generate a frequency distribution for these
data:
X              Frequency    Relative Frequency
1.00-3.00      4            0.047
3.01-5.00      9            0.106
5.01-7.00      11           0.129
7.01-9.00      20           0.235
9.01-11.00     10           0.118
11.01-13.00    9            0.106
13.01-15.00    3            0.035
15.01-17.00    6            0.070
17.01-19.00    6            0.070
19.01-21.00    5            0.059
21.01-23.00    1            0.012
23.01-25.00    1            0.012
TOTAL          85           0.999
The histogram resulting from this frequency distribution is shown below where
we observe that it is skewed to the right. Superimposed on the histogram is a
theoretical gamma distribution, which fits the data quite well. The variable in
question, time-to-publication, is (a) non-negative, (b) continuous, and (c) has the
potential to be a large number (if a paper goes through several revisions before it
is finally accepted, or if the reviewers are tardy in completing their reviews in the
first place). It is therefore not surprising that the histogram will be skewed to the
right as shown.
[Figure 1.4: Histogram for time-to-publication data, with superimposed gamma distribution (Shape 3.577; Scale 2.830; N 85). Vertical axis: Frequency.]
(ii) From this frequency distribution and the histogram, we see that the “most
popular” time-to-publication is in the range from 7-9 months (centered at 8
months); from the relative frequency values, we note that 41/85 or 0.482 is the
fraction of the papers that took longer than this to publish.
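The relative frequencies and the fraction quoted in (ii) follow directly from the bin counts of the frequency distribution in (i). A minimal sketch, using only those counts (the variable names are illustrative):

```python
# Bin counts from the frequency distribution in part (i), in order of increasing X.
bin_labels = ["1.00-3.00", "3.01-5.00", "5.01-7.00", "7.01-9.00",
              "9.01-11.00", "11.01-13.00", "13.01-15.00", "15.01-17.00",
              "17.01-19.00", "19.01-21.00", "21.01-23.00", "23.01-25.00"]
counts = [4, 9, 11, 20, 10, 9, 3, 6, 6, 5, 1, 1]

total = sum(counts)
relative_frequency = [c / total for c in counts]

# The "most popular" bin is the one with the largest count.
modal_index = max(range(len(counts)), key=counts.__getitem__)
# Papers that took longer than the upper edge of the modal bin.
longer_than_modal = sum(counts[modal_index + 1:])

print("total papers:", total)
print("modal bin:", bin_labels[modal_index])
print("fraction longer than modal bin:", round(longer_than_modal / total, 3))
```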
1.14 (i) A plot of the histogram for the 20-sample averages, yi , generated as
prescribed is shown in the top panel of the figure below. We note the narrower
range occupied by this data set as well as its more symmetric nature. (Superimposed
on this histogram is a theoretical normal distribution.)
(ii) A histogram of the average of averages, zi, is shown in the bottom panel of
the figure. The “averaging” significantly narrows the range of the data and also
makes the data set somewhat more symmetric.
[Figure 1.5: Histogram for time-to-publication data. Top panel: Histogram of y (Mean 10.12; StDev 0.8088; N 85). Bottom panel: Histogram of z (Mean 10.38; StDev 0.7148; N 85). Vertical axis: Frequency.]
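The narrowing effect of averaging can be illustrated with a small simulation. Everything in this sketch is an assumption made for illustration: the raw time-to-publication data are replaced by random draws from a gamma distribution with the shape and scale reported in Figure 1.4, and each average is formed from 20 values drawn at random with replacement, which may differ from the sampling scheme prescribed in the problem statement.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical stand-in for the 85 raw observations (NOT the actual data).
x = rng.gamma(shape=3.577, scale=2.830, size=85)

def resampled_means(data, sample_size, n_means, rng):
    """Average `sample_size` randomly chosen values, repeated `n_means` times."""
    data = np.asarray(data, dtype=float)
    draws = rng.choice(data, size=(n_means, sample_size), replace=True)
    return draws.mean(axis=1)

y = resampled_means(x, sample_size=20, n_means=85, rng=rng)

print("raw data: range = %.2f, std = %.2f" % (x.max() - x.min(), x.std(ddof=1)))
print("averages: range = %.2f, std = %.2f" % (y.max() - y.min(), y.std(ddof=1)))
```

Under this scheme the spread of the averages is expected to shrink roughly in proportion to the square root of the number of values averaged, and their histogram is typically more symmetric than that of the raw data.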
1.15 (i) Average number of safety incidents per month, $\bar{x} = 0.500$; the associated
variance, $s^2 = 0.511$. The frequency table is shown below:
X        Frequency    Relative Frequency
0        30           0.625
1        12           0.250
2        6            0.125
3        0            0.000
TOTAL    48           1.000
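The mean and variance quoted in (i) can be recovered from the frequency table alone. A minimal sketch, assuming the variance uses the usual n − 1 divisor (the variable names are illustrative):

```python
# Frequency table for the number of safety incidents per month.
values = [0, 1, 2, 3]
counts = [30, 12, 6, 0]

n = sum(counts)
mean = sum(v * c for v, c in zip(values, counts)) / n
# Sample variance with the n - 1 divisor.
variance = sum(c * (v - mean) ** 2 for v, c in zip(values, counts)) / (n - 1)
relative_frequency = [c / n for c in counts]

print("n =", n)
print("mean =", round(mean, 3))
print("variance =", round(variance, 3))
print("relative frequencies =", relative_frequency)
```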
The resulting histogram is shown below.
[Figure 1.6: Histogram for safety incidents data (Histogram of SafetyIncidents). Vertical axis: Frequency.]
(ii) It is reasonable to consider the relative frequency of occurrence of the safety
incidents as an acceptable measure of the “chances” of obtaining each indicated
number of occurrences: since $f_r(0) = 0.625$, $f_r(1) = 0.250$, $f_r(2) = 0.125$, and
$f_r(3) = f_r(4) = f_r(5) = 0.000$, these may then be considered as reasonable
estimates of the chances of observing the indicated occurrences.
(iii) From the postulated model:
$f(x) = \frac{e^{-0.5}\, 0.5^{x}}{x!}$
we obtain the following table, which shows the theoretical probability of occurrence
side-by-side with the relative frequency data; it indicates that the model
actually fits the data quite well.
X        Theoretical Probability, f(x)    Relative Frequency
0        0.607                            0.625
1        0.303                            0.250
2        0.076                            0.125
3        0.012                            0.000
4        0.002                            0.000
5        0.000                            0.000
TOTAL    1.000                            1.000
(iv) Assuming that this is a reasonable model, we may use it to compute the
“probability” of observing 1, 3, 2, 3 safety incidents (by pure chance alone),
respectively, over a period of 4 consecutive months. From the theoretical results
in (iii) above, we note that the probability of observing 1 incident (by pure
chance alone) is a reasonable 0.303; for 2 incidents, the probability is 0.076;
observing 3 incidents by pure chance alone, however, is a rare event, with
probability 0.012 or 1.2%. Observing another set of 3 incidents just two months after
observing the first set of 3 incidents seems to suggest that something more
systematic than pure chance alone might be responsible. However, these statements
are not meant to be “definitive” or conclusive; they merely illustrate how
one may use this model to answer the posed question.
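The individual probabilities used in this argument come from the postulated model with mean 0.5. The sketch below evaluates them and, as an additional assumption that is not part of the solution above, multiplies them together as if the monthly counts were independent, to give a rough sense of how unusual the whole sequence would be under pure chance. The function name `poisson_pmf` is illustrative.

```python
import math

def poisson_pmf(x, lam):
    """Poisson probability of exactly x incidents in a month with mean lam."""
    return math.exp(-lam) * lam ** x / math.factorial(x)

lam = 0.5
observed = [1, 3, 2, 3]            # incidents in 4 consecutive months
probs = [poisson_pmf(k, lam) for k in observed]

for k, p in zip(observed, probs):
    print("P(X = %d) = %.4f" % (k, p))

# ASSUMPTION: months are independent, so the joint probability is the product.
joint = math.prod(probs)
print("joint probability under independence: %.2e" % joint)
```

The joint probability is very small, which is consistent with the qualitative conclusion that something more systematic than chance may be at work; it remains an illustration rather than a formal test.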
1.16 (i) The histograms for XB and XA are shown below, plotted side-by-side
and on the same x-axis scale. The histograms cover the same range (from about
200 to about 360), and the frequencies are similar. Strictly on the basis of a
visual inspection, therefore, it is difficult to say anything concrete about the
effectiveness of the weight-loss program. It is difficult to spot any real difference
between the two histograms.
[Figure 1.7: Histograms for XB and XA. Side-by-side panels (XB, XA) on a common x-axis from 200 to 360. Vertical axis: Frequency.]