questions
Linear Modeling (Arizona University)
Paper 4, Section I
5K Statistical Modelling
Consider the normal linear model where the $n$-vector of responses $Y$ satisfies
$Y = X\beta + \varepsilon$ with $\varepsilon \sim N_n(0, \sigma^2 I)$ and $X$ is an $n \times p$ design matrix with full column
rank. Write down a $(1 - \alpha)$-level confidence set for $\beta$.
Define the Cook’s distance for the observation $(Y_i, x_i)$, where $x_i^T$ is the $i$th row of $X$,
and give its interpretation in terms of confidence sets for $\beta$.
In the model above with n = 100 and p = 4, you observe that one observation has
Cook’s distance 3.1. Would you be concerned about the influence of this observation?
Justify your answer.
[Hint: You may find some of the following facts useful:
1. If $Z \sim \chi^2_4$, then $P(Z \le 1.06) = 0.1$, $P(Z \le 7.78) = 0.9$.
2. If $Z \sim F_{4,96}$, then $P(Z \le 0.26) = 0.1$, $P(Z \le 2.00) = 0.9$.
3. If $Z \sim F_{96,4}$, then $P(Z \le 0.50) = 0.1$, $P(Z \le 3.78) = 0.9$.]
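As a numerical companion to the question above, the following R sketch simulates a data set with n = 100 and p = 4, computes Cook's distances from their leverage form, and compares them with the 0.9 quantile of the $F_{4,96}$ distribution quoted in the hint. The simulated data and all object names (`y_sim`, `fit_sim`, `cook_manual`, and so on) are assumptions made for illustration; they are not part of the question.

```r
# Simulated data; illustrative only, not the data behind the question.
set.seed(1)
n <- 100
p <- 4                                   # intercept plus three covariates
x1 <- rnorm(n); x2 <- rnorm(n); x3 <- rnorm(n)
y_sim <- 1 + 2 * x1 - x2 + 0.5 * x3 + rnorm(n)
fit_sim <- lm(y_sim ~ x1 + x2 + x3)

h <- hatvalues(fit_sim)                  # leverages, the diagonal of the hat matrix
r <- rstandard(fit_sim)                  # internally studentised residuals
cook_manual <- r^2 * h / (p * (1 - h))   # Cook's distance in leverage form

cook_builtin <- cooks.distance(fit_sim)  # built-in version for comparison
threshold <- qf(0.9, df1 = p, df2 = n - p)
flagged <- which(cook_builtin > threshold)
```

Under this comparison, a Cook's distance above `threshold` means that deleting the observation moves the least-squares estimate outside the 90% confidence set for $\beta$ centred at the full-data estimate.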
Part II, 2014 List of Questions
Paper 3, Section I
5K Statistical Modelling
In an experiment to study factors affecting the production of the plastic polyvinyl
chloride (PVC), three experimenters each used eight devices to produce the PVC and
measured the sizes of the particles produced. For each of the 24 combinations of device
and experimenter, two size measurements were obtained.
The experimenters and devices used for each of the 48 measurements are stored in
R as factors in the objects experimenter and device respectively, with the measurements
themselves stored in the vector psize. The following analysis was performed in R.
> fit0 <- lm(psize ~ experimenter + device)
> fit <- lm(psize ~ experimenter + device + experimenter:device)
> anova(fit0, fit)
Analysis of Variance Table
Model 1: psize ~ experimenter + device
Model 2: psize ~ experimenter + device + experimenter:device
  Res.Df    RSS Df Sum of Sq      F Pr(>F)
1     38 49.815
2     24 35.480 14    14.335 0.6926 0.7599
Let X and X0 denote the design matrices obtained by model.matrix(fit) and
model.matrix(fit0) respectively, and let Y denote the response psize. Let P and P0
denote orthogonal projections onto the column spaces of X and X0 respectively.
For each of the following quantities, write down its numerical value if it appears
in the analysis of variance table above; otherwise write ‘unknown’.
1. $\|(I - P)Y\|^2$
2. $\|X(X^T X)^{-1} X^T Y\|^2$
3. $\|(I - P_0)Y\|^2 - \|(I - P)Y\|^2$
4. $\dfrac{\|(P - P_0)Y\|^2 / 14}{\|(I - P)Y\|^2 / 24}$
5. $\sum_{i=1}^{48} Y_i / 48$
Out of the two models that have been fitted, which appears to be the more
appropriate for the data according to the analysis performed, and why?
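The relationship between the projections and the columns of the analysis of variance table can be explored on stand-in data. The sketch below is only an illustration under stated assumptions: the data are simulated noise with the same 3 × 8 × 2 layout, and every name ending in `_sim` is hypothetical. It checks that the residual degrees of freedom match the table, and recomputes the F statistic both from projections and from the figures printed in the question.

```r
# Hypothetical stand-in data: 3 experimenters x 8 devices x 2 replicates.
set.seed(1)
experimenter_sim <- gl(3, 16)              # 16 measurements per experimenter
device_sim <- gl(8, 2, length = 48)        # devices 1..8, two replicates each, cycled
psize_sim <- rnorm(48, mean = 35)          # simulated noise, not the real measurements

fit0_sim <- lm(psize_sim ~ experimenter_sim + device_sim)
fit_sim <- lm(psize_sim ~ experimenter_sim + device_sim +
                experimenter_sim:device_sim)
tab <- anova(fit0_sim, fit_sim)

rss0 <- sum(resid(fit0_sim)^2)             # squared norm of (I - P0) Y
rss1 <- sum(resid(fit_sim)^2)              # squared norm of (I - P) Y
diff_fitted <- sum((fitted(fit_sim) - fitted(fit0_sim))^2)  # squared norm of (P - P0) Y
f_manual <- (diff_fitted / 14) / (rss1 / 24)

# F statistic recomputed from the RSS figures printed in the question's table.
f_from_table <- (14.335 / 14) / (35.480 / 24)
p_from_table <- pf(f_from_table, df1 = 14, df2 = 24, lower.tail = FALSE)
```

Because the column space of the smaller design matrix is contained in that of the larger one, `diff_fitted` agrees with `rss0 - rss1`, which is the quantity reported in the "Sum of Sq" column.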
Paper 2, Section I
5K Statistical Modelling
Define the concept of an exponential dispersion family. Show that the family of
scaled binomial distributions $\frac{1}{n}\mathrm{Bin}(n, p)$, with $p \in (0, 1)$ and $n \in \mathbb{N}$, is of exponential
dispersion family form.
Deduce the mean of the scaled binomial distribution from the exponential dispersion
family form.
What is the canonical link function in this case?
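One candidate representation can be checked numerically. The sketch below assumes the parametrisation $f(y; \theta, \sigma^2) = a(\sigma^2, y) \exp\{(y\theta - K(\theta))/\sigma^2\}$ with $\theta = \log\{p/(1-p)\}$, $K(\theta) = \log(1 + e^\theta)$, $\sigma^2 = 1/n$ and $a(\sigma^2, y) = \binom{n}{ny}$; this notation and these choices are assumptions, not taken from the question. It compares the resulting mass function with `dbinom` for one hypothetical pair of values of n and p.

```r
# Hypothetical values chosen for illustration.
n <- 12
p <- 0.3
k <- 0:n
y <- k / n                               # support of the scaled binomial

theta <- log(p / (1 - p))                # assumed natural parameter
K <- function(t) log1p(exp(t))           # assumed cumulant function log(1 + e^t)
sigma2 <- 1 / n                          # assumed dispersion parameter
a <- choose(n, k)                        # assumed normalising term, choose(n, n * y)

pmf_edf <- a * exp((y * theta - K(theta)) / sigma2)
pmf_binom <- dbinom(k, size = n, prob = p)

mean_edf <- sum(y * pmf_edf)             # mean computed directly from the mass function
K_prime <- plogis(theta)                 # derivative of K at theta, e^theta / (1 + e^theta)
```

Under these assumptions `pmf_edf` matches `pmf_binom`, and both `mean_edf` and `K_prime` return the value of `p`.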
Paper 1, Section I
5K Statistical Modelling
Write down the model being fitted by the following R command, where $y \in \{0, 1, 2, \ldots\}^n$
and X is an n × p matrix with real-valued entries.
fit <- glm(y ~ X, family = poisson)
Write down the log-likelihood for the model. Explain why the command
sum(y) - sum(predict(fit, type = "response"))
gives the answer 0, by arguing based on the log-likelihood you have written down.
[Hint: Recall that if $Z \sim \mathrm{Pois}(\mu)$ then
$$P(Z = k) = \frac{\mu^k e^{-\mu}}{k!}$$
for $k \in \{0, 1, 2, \ldots\}$.]
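The claim about the two sums can be observed on simulated data. In the sketch below the data, the coefficients used to generate them, and all names ending in `_sim` are assumptions made for illustration. It fits the same kind of model with the default log link and evaluates the difference of sums together with the full score vector at the fitted coefficients.

```r
# Simulated counts; illustrative only.
set.seed(1)
n <- 50
X_sim <- matrix(rnorm(n * 2), nrow = n, ncol = 2)
y_sim <- rpois(n, lambda = exp(0.5 + 0.3 * X_sim[, 1] - 0.2 * X_sim[, 2]))

fit_sim <- glm(y_sim ~ X_sim, family = poisson)
mu_hat <- predict(fit_sim, type = "response")   # fitted means

gap <- sum(y_sim) - sum(mu_hat)                 # the quantity from the question

# Score vector for the log link: t(design matrix) %*% (y - mu_hat).
# Its first entry corresponds to the intercept column and equals gap.
score <- crossprod(model.matrix(fit_sim), y_sim - mu_hat)
```

Both `gap` and every entry of `score` are zero only up to the numerical tolerance of the fitting algorithm, so small non-zero values of the order of rounding error are to be expected.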