Sources of Variation
Section 1.1
1.1.1 B.
1.1.2 B & C.
1.1.3 A.
1.1.4 C.
1.1.5 E.
1.1.6 B.
1.1.7
$$\text{predicted number of uses for items} = \begin{cases} 60.34 & \text{if rigid librarian} \\ 92.19 & \text{if eccentric poet} \end{cases}$$
1.1.8
a. The inclusion criteria are having a clinical diagnosis of mild to moderate depression and receiving no treatment during the four weeks prior to the study or during the study.
b. The purpose of randomly assigning subjects to the groups is to make the groups very similar except for the one variable (swimming with dolphins or not) that the researchers impose. Letting subjects volunteer for a group could introduce a confounding variable.
c. It was important that the subjects in the control group swim every day without dolphins so that this control group does everything (including swimming) that the experimental group does, except that when they swim they don't do it in the presence of dolphins. Without this we wouldn't know whether just swimming causes the difference in the reduction of depression symptoms.
d. Yes, this is an experiment because the subjects were randomly assigned to the two groups.
1.1.9
Observed variation in: d. substantial reduction in depression symptoms
Sources of explained variation: a. swimming with dolphins or not
Sources of unexplained variation:
• g. problems in the personal lives of the subjects during the study
• h. illness of subjects during the study
Inclusion criteria:
• b. mild to moderate depression
• c. no use of antidepressant drugs or psychotherapy four weeks prior to the study
Design:
• e. swimming
• f. staying on an island for two weeks during the study
1.1.10 Color of a sign is the explanatory variable, with white, yellow, and red being the levels.
1.1.11
Observed variation in: f. whether the student obeyed the sign
Sources of explained variation: a. color of the sign
Sources of unexplained variation:
• b. whether the subject was left-handed or right-handed
• d. attitude of student
• e. age of subject
Inclusion criteria:
• c. time of day
• e. age of subject
1.1.12
a. The value 6.21 represents the overall mean quiz score, 5.50 represents the group mean quiz score for people who used computer notes, and 6.92 represents the group mean quiz score for people who used paper notes.
b. We look at how far 6.92 and 5.50 are from one another, or from the overall mean of 6.21, to determine whether the note-taking method might affect the score.
c. The number 1.76 represents the typical deviation of an observation from the expected value, in this case the overall mean. The number 1.61 represents the typical deviation of an observation from its predicted value after fitting a model that takes into account whether the person is using computer or paper notes.
d. Because the standard deviation of the residuals represents the leftover variation, we can see that after including the type of notes as an explanatory variable in our model, the unexplained variation has been reduced (down to 1.61 from 1.76). This tells us that knowing the note-taking method enables us to better predict scores.
1.1.13 Random assignment should make the two groups very similar with regard to variables like intelligence, previous knowledge, or any other variable, and thus likely eliminates possible confounding variables.
1.1.14
a. This table shows us possible confounding variables, but then shows that subjects in the two groups are quite similar with regard to these characteristics, thus ruling out these possible confounding variables.
b. We would want the p-values to be large, so we could say that we have little to no evidence of a difference in mean age, proportion of males, etc. between the two groups. We want our groups to be very similar going into the study, so that a causal conclusion is possible if we find a small p-value after applying the treatment(s).
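The comparison in 1.1.12 can be sketched in Python: the typical deviation from the overall mean versus the typical deviation after accounting for note-taking method. The scores below are hypothetical and made up for illustration, not the study's data. The function names are our own. The sketch fits the separate-means model, reports each group's effect (group mean minus overall mean), and compares the residual standard deviation without and with the explanatory variable.

```python
from math import sqrt
from statistics import mean


def fit_separate_means(scores_by_group):
    """Return the overall mean, each group mean, and each group's effect
    (group mean minus overall mean)."""
    all_scores = [y for ys in scores_by_group.values() for y in ys]
    overall = mean(all_scores)
    group_means = {g: mean(ys) for g, ys in scores_by_group.items()}
    effects = {g: m - overall for g, m in group_means.items()}
    return overall, group_means, effects


def residual_sds(scores_by_group):
    """Typical deviation before and after accounting for group membership."""
    overall, group_means, _ = fit_separate_means(scores_by_group)
    n = sum(len(ys) for ys in scores_by_group.values())
    k = len(scores_by_group)
    # Null model: every prediction is the overall mean (n - 1 degrees of freedom)
    ss_total = sum((y - overall) ** 2
                   for ys in scores_by_group.values() for y in ys)
    # Separate-means model: each prediction is its group mean (n - k df; n - 2 for two groups)
    ss_error = sum((y - group_means[g]) ** 2
                   for g, ys in scores_by_group.items() for y in ys)
    return sqrt(ss_total / (n - 1)), sqrt(ss_error / (n - k))


# Hypothetical quiz scores, invented for illustration only
scores = {"paper": [6, 7, 8, 9], "computer": [4, 5, 6, 7]}
overall, group_means, effects = fit_separate_means(scores)
sd_before, sd_after = residual_sds(scores)
print(overall, effects)                   # 6.5 {'paper': 1.0, 'computer': -1.0}
print(f"{sd_before:.2f} {sd_after:.2f}")  # 1.60 1.29
```

As in 1.1.12 d, the residual standard deviation drops once group membership is part of the model. With real data the size of the drop depends on how far apart the group means are.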
1.1.15 It is likely that 3- to 5-year-olds have different preferences when choosing between a toy and candy than 12- to 14-year-olds. The older group is probably much more likely to prefer the candy over the toy, and the opposite could be true for the younger group. We would not see this difference if the results for all the ages were combined.
Section 1.2
1.2.1 B.
1.2.2 A, D.
1.2.3 C.
1.2.4 A.
1.2.5 C.
1.2.6 D.
1.2.7 B.
1.2.8 Using the effects model, because 4.48 + 0.65 = 5.13 (the mean of the scent group) and 4.48 − 0.65 = 3.83 (the mean of the non-scent group), the models are equivalent.
1.2.9
a. SSModel.
b. SSError.
1.2.10
a. R² = SSModel/SSTotal = 0.4651.
b. R² = 1 − SSError/SSTotal = 0.7111.
1.2.11
a. 8.
b. 6 − 8 = −2, 10 − 8 = 2.
c. 74.
d. 40.
e. 34.
f. 0.5405.
1.2.12
a. The explanatory variable is the type of testing environment; it is categorical.
b. The response variable is the test score; it is quantitative.
c. The two levels are quiet environment and distracting environment.
1.2.13
a. SSTotal would probably be larger with these 10 subjects because, with the wide variety of ages, there would probably be more variability in the test scores.
b. SSModel would probably be the same because it would still represent the difference between testing environments.
c. SSError would probably be larger because there would probably be more variability in the test scores within each group due to the variability in ages.
1.2.14 The variances of the scores in the two testing environments are 2.5 and 6. The square root of the average of these two variances is √4.25 = 2.06. The SSError is 34, so the standard error of the residuals is √(34/8) = 2.06.
1.2.15
a. The explanatory variable is whether the name of the hurricane is male or female, and the response variable is the perceived risk level.
b. The effect of naming the hurricane Christina is 5.01 − 5.29 = −0.28, and the effect of naming the hurricane Christopher is 5.57 − 5.29 = 0.28. The SSModel is 142 × 0.28² = 11.1328.
c. R² = 11.1328/199.62 = 0.0558. We can interpret this by saying that 5.58% of the variation in the perceived level of risk is explained by whether the name of the hurricane is male or female.
d. SSError = 199.62 − 11.13 = 188.49.
e. √(188.4872/140) = 1.16.
f.
$$\text{predicted hurricane risk rating} = 5.29 + \begin{cases} 0.28 & \text{if male name} \\ -0.28 & \text{if female name} \end{cases}$$
SE of residuals = 1.16.
1.2.16
a. The explanatory variable is the note-taking method and the response variable is the quiz score.
b. The effect of taking notes on paper is 0.71 and the effect of taking notes on the computer is −0.71.
c. SSModel = 40 × 0.71² = 20.164.
d. R² = 20.164/120.92 = 0.16675. We can interpret it by saying that 16.675% of the variation in quiz score is explained by the note-taking method.
e. SSError = 120.92 − 20.164 = 100.756.
f. √(100.756/38) = 1.628.
g.
$$\text{predicted quiz score} = 6.21 + \begin{cases} 0.71 & \text{if using paper notes} \\ -0.71 & \text{if using computer notes} \end{cases}$$
1.2.17
a. Because the sample sizes of the two groups are the same, the sample size of each group is just half of the total sample size, n/2.
b. Let the x's be the observations in one group and the y's the observations in the other. Averaging the two group variances, each of which has n/2 − 1 in its denominator, gives
$$\frac{1}{2}\left(\frac{\sum_{\text{all obs}}(x_i-\bar{x})^2}{\frac{n}{2}-1}+\frac{\sum_{\text{all obs}}(y_i-\bar{y})^2}{\frac{n}{2}-1}\right)=\frac{1}{2}\left(\frac{\sum_{\text{all obs}}(x_i-\bar{x})^2+\sum_{\text{all obs}}(y_i-\bar{y})^2}{\frac{n}{2}-1}\right)=\frac{\sum_{\text{all obs}}(x_i-\bar{x})^2+\sum_{\text{all obs}}(y_i-\bar{y})^2}{n-2}$$
Taking the square root, we get
$$\sqrt{\frac{\sum_{\text{all obs}}(x_i-\bar{x})^2+\sum_{\text{all obs}}(y_i-\bar{y})^2}{n-2}}$$
Written with explicit indices, each sum runs over the n/2 observations in its group, so the same result is
$$\sqrt{\frac{\sum_{i=1}^{n/2}(x_i-\bar{x})^2+\sum_{i=1}^{n/2}(y_i-\bar{y})^2}{n-2}}$$
The numerator is SSError, so this is the standard error of the residuals, √(SSError/(n − 2)).
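The Section 1.2 arithmetic can be checked with a short Python sketch. It covers the SSTotal = SSModel + SSError decomposition, R², and the standard error of the residuals in 1.2.15 and 1.2.16, plus the equal-sample-size identity from 1.2.14 and 1.2.17. The function names are our own. `two_group_summary` assumes the two group effects are +effect and −effect, as they are in both exercises. Under that assumption, every observation contributes effect² to SSModel, regardless of how the n observations split between the groups.

```python
from math import sqrt


def two_group_summary(n_total, effect, ss_total):
    """Summary for a two-group separate-means model whose group effects are
    assumed to be +effect and -effect, so SSModel = n_total * effect**2."""
    ss_model = n_total * effect ** 2          # e.g., 142 x 0.28^2 in 1.2.15
    ss_error = ss_total - ss_model            # SSTotal = SSModel + SSError
    r_squared = ss_model / ss_total           # proportion of variation explained
    se_residuals = sqrt(ss_error / (n_total - 2))
    return ss_model, ss_error, r_squared, se_residuals


def pooled_sd_equal_sizes(var_a, var_b):
    """With equal group sizes, the SE of the residuals equals the square root
    of the average of the two group variances (the identity in 1.2.17 b)."""
    return sqrt((var_a + var_b) / 2)


# Summary statistics quoted in 1.2.15 (hurricane names) and 1.2.16 (note-taking)
for n, effect, ss_total in [(142, 0.28, 199.62), (40, 0.71, 120.92)]:
    ss_model, ss_error, r2, se = two_group_summary(n, effect, ss_total)
    print(f"SSModel={ss_model:.4f} SSError={ss_error:.4f} R2={r2:.4f} SE={se:.3f}")
# SSModel=11.1328 SSError=188.4872 R2=0.0558 SE=1.160
# SSModel=20.1640 SSError=100.7560 R2=0.1668 SE=1.628

# 1.2.14: group variances 2.5 and 6, SSError = 34, n = 10
print(f"{pooled_sd_equal_sizes(2.5, 6):.2f} {sqrt(34 / (10 - 2)):.2f}")  # 2.06 2.06
```

If the two effects had different magnitudes (for example, with unequal group sizes), SSModel would instead be the sum over groups of group size times effect squared.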
Section 1.3
1.3.1 D.
1.3.2 A.
1.3.3 D.
1.3.4 A.
1.3.5 A.
1.3.6 The validity conditions are not met because the
male sample size is small and the distribution of the
number of flip-flops owned by the males is quite skewed
to the right.
1.3.7
a. √((24.38² + 36.99²)/2) = 31.33.
b.
$$t = \frac{92.16 - 60.34}{31.33\sqrt{1/32 + 1/32}} = 4.06$$
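The two calculations in 1.3.7 can be reproduced with a short Python sketch. The function names are our own. The pooling formula assumes equal group sizes (32 and 32, matching the denominator in part b), which is the case where pooling reduces to averaging the two variances.

```python
from math import sqrt


def pooled_sd_equal_sizes(sd_a, sd_b):
    """Pooled SD when both groups have the same size: root of the mean variance."""
    return sqrt((sd_a ** 2 + sd_b ** 2) / 2)


def two_sample_t(mean_a, mean_b, sd_pooled, n_a, n_b):
    """Standardized statistic: difference in group means over its standard error."""
    return (mean_a - mean_b) / (sd_pooled * sqrt(1 / n_a + 1 / n_b))


sp = pooled_sd_equal_sizes(24.38, 36.99)
t = two_sample_t(92.16, 60.34, sp, 32, 32)
print(f"pooled SD = {sp:.2f}, t = {t:.2f}")  # pooled SD = 31.33, t = 4.06
```

Here the unrounded pooled SD is passed to the t calculation. Plugging in the rounded 31.33 instead, as the hand calculation does, also gives 4.06.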