8 Variance Estimation
8.1 Principal techniques of variance estimation
There are several approaches to estimating the variances of estimators. The
two main techniques are, on the one hand, the analytical approach, that is,
the derivation of explicit expressions for variance estimators, and, on the
other hand, replication methods, which rely on re-samples drawn from the
initially selected sample.
The analytical approach encounters two types of difficulty. First, one must
deal with the very complex calculation of the second-order (double) inclusion
probabilities $\pi_{k\ell}$, a problem that arises in the majority of sampling
designs without replacement. Second, one must get around the difficulty posed
by non-linear estimators: we know how to express the variance of a linear
expression mathematically, but exact calculations are no longer possible when
products, ratios, powers and roots are involved. The treatment of
second-order inclusion probabilities is quite complex and requires us both to
formulate simplifying assumptions on the design and to explore completely the
tree structure describing the sampling design. It is possible to use a
recursive formula (see Raj, 1968) to construct variance estimators. This
technique was used at the Institut National de la Statistique et des Études
Économiques (INSEE, France) in the POULPE software program, which estimates
variances in complex designs (see Caron, 1999). The problem posed by the
non-linearity of the estimators, by contrast, is more tractable thanks to the
linearisation technique (see Deville, 1999), provided the sample size is
‘sufficiently large’.
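To make the role of the second-order inclusion probabilities concrete, here is a minimal sketch for the one design where they are trivial: simple random sampling without replacement. This design, the Sen-Yates-Grundy form of the variance estimator, and the function names are illustrative assumptions, not taken from the text. In complex designs the difficulty described above lies precisely in obtaining the joint probabilities.

```python
from itertools import combinations


def srs_inclusion_probabilities(n, N):
    """First- and second-order inclusion probabilities under simple random
    sampling without replacement (assumed design, chosen for illustration)."""
    pi_k = n / N
    pi_kl = n * (n - 1) / (N * (N - 1))
    return pi_k, pi_kl


def syg_variance_estimate(ys, N):
    """Sen-Yates-Grundy variance estimator of the Horvitz-Thompson total.

    Sums over unordered pairs (k, l) of sampled units, which is equivalent
    to half the sum over ordered pairs with k != l.
    """
    n = len(ys)
    pi_k, pi_kl = srs_inclusion_probabilities(n, N)
    total = 0.0
    for y_k, y_l in combinations(ys, 2):
        weight = (pi_k * pi_k - pi_kl) / pi_kl
        total += weight * (y_k / pi_k - y_l / pi_k) ** 2
    return total


if __name__ == "__main__":
    # Hypothetical sample of n = 4 values from a population of N = 8.
    print(syg_variance_estimate([1.0, 2.0, 3.0, 4.0], N=8))
```

Under this design the result coincides with the familiar expression N^2 (1 - n/N) s^2 / n, where s^2 is the sample variance with divisor n - 1.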
Replication methods such as the jackknife, the bootstrap and balanced
half-samples are used with ‘sufficiently large’ samples and permit the
estimation of variances for non-linear estimators. Nevertheless, notable
difficulties exist when the sampling is complex (multi-stage designs, unequal
probability designs, multi-phase designs) and, above all, the properties of
the variance estimator (bias, in particular) are not as well controlled as in
the analytical approach when the sampling design is no longer simple random.
The reader interested in these methods can refer to Wolter (1985), Efron and
Tibshirani (1993) and Rao and Sitter (1995).
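As an illustration of a replication method, the following sketch applies the delete-one jackknife to a ratio of two sample totals. It assumes a simple random sample with a negligible sampling fraction (no finite population correction); the function names are hypothetical. It is not a recipe for the complex designs mentioned above, where the replicates must respect the design.

```python
def ratio(ys, xs):
    """Ratio of sample totals, a simple non-linear estimator."""
    return sum(ys) / sum(xs)


def jackknife_variance(ys, xs):
    """Delete-one jackknife variance estimate of the ratio sum(ys)/sum(xs).

    Assumes a simple random sample with a negligible sampling fraction.
    """
    n = len(ys)
    total_y, total_x = sum(ys), sum(xs)
    # Replicate k is the ratio recomputed without unit k.
    replicates = [(total_y - y) / (total_x - x) for y, x in zip(ys, xs)]
    mean_replicate = sum(replicates) / n
    return (n - 1) / n * sum((r - mean_replicate) ** 2 for r in replicates)


if __name__ == "__main__":
    # Hypothetical data for illustration only.
    ys = [3.0, 5.0, 4.0, 8.0, 6.0]
    xs = [2.0, 4.0, 3.0, 5.0, 5.0]
    print(ratio(ys, xs), jackknife_variance(ys, xs))
```

When every x equals 1 the ratio reduces to the sample mean, and the jackknife estimate reduces to s^2 / n, which gives a quick sanity check.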
8.2 Method of estimator linearisation
The idea is to linearise a complex estimator and, under certain conditions,
to equate its variance with that of its linear approximation. We are then
back to the standard problem of variance estimation for a linear estimator.
It is this approach that allows us to calculate the precision of the
calibrated estimators presented in Chapters 6 and 7 (ratio estimator,
regression estimator, marginal calibration estimators), and of estimators of
complex parameters such as correlation coefficients, regression coefficients
and inequality indicators. To estimate a parameter
$\theta = f(Y_1, Y_2, \ldots, Y_p)$, where $Y_i$ is the true total of a
variable $y_i$ ($i = 1$ to $p$), we use the substitution estimator built on
the same functional form, that is:
\[
\hat{\theta} = f(\hat{Y}_1, \hat{Y}_2, \ldots, \hat{Y}_p),
\]
where $\hat{Y}_i$ is a linear estimator of $Y_i$ and therefore of type
\[
\hat{Y}_i = \sum_{k \in S} w_k(S)\, y_{ki}
\]
(for example, the unbiased Horvitz-Thompson estimator), and $f$ is a
reasonably smooth function from $\mathbb{R}^p$ to $\mathbb{R}$, in practice
of class $C^2$ (twice differentiable, the second-order derivative being
continuous). If the mean estimators $\hat{Y}_i/N$ have a mean square error
of order $1/n$ (which is always the case in practice), and if $n$ is
sufficiently large so that $1/n^{3/2}$ is negligible compared to $1/n$ (it is
therefore an ‘asymptotic’ view where $n$ and $N$ become very large), then we
show that $\mathrm{var}(\hat{\theta}) \approx \mathrm{var}(V)$, where $V$ is
built on the model of $\hat{Y}_i$ (thus with the same weights), being
\[
V = \sum_{k \in S} w_k(S)\, v_k,
\]
with, for all $k \in S$,
\[
v_k = \sum_{j=1}^{p} y_{kj}
\left. \frac{\partial f(a_1, a_2, \ldots, a_p)}{\partial a_j}
\right|_{(Y_1, Y_2, \ldots, Y_p)}.
\]
The new variable $v_k$ is called the ‘linearisation’ of $\theta$. The
variance estimator of $\hat{\theta}$ is naturally obtained from a variance
estimator of $V$ by replacing $v_k$ (which cannot be calculated) with:
\[
\hat{v}_k = \sum_{j=1}^{p} y_{kj}
\left. \frac{\partial f(a_1, a_2, \ldots, a_p)}{\partial a_j}
\right|_{(\hat{Y}_1, \hat{Y}_2, \ldots, \hat{Y}_p)}.
\]
We can show that this substitution is judicious ($p$ remaining fixed when $n$
increases). We can also proceed stepwise: if
$\theta = f(Y_1, Y_2, \ldots, Y_p, \psi)$, where $\psi$ is a function of
totals $(Y_{p+1}, Y_{p+2}, \ldots, Y_q)$, for which we already calculated a
linearised variable $u_k$, then the linearisation of $\theta$ is:
\[
v_k = \sum_{j=1}^{p} y_{kj}
\left. \frac{\partial f(a_1, a_2, \ldots, a_p, z)}{\partial a_j}
\right|_{(Y_1, Y_2, \ldots, Y_p, \psi)}
+ u_k
\left. \frac{\partial f(a_1, a_2, \ldots, a_p, z)}{\partial z}
\right|_{(Y_1, Y_2, \ldots, Y_p, \psi)}.
\]
It is then sufficient to form $\hat{v}_k$ by replacing all the unknown totals
with their respective estimators.
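As a worked illustration of the method, consider the ratio of two totals, f(a1, a2) = a1/a2. The partial derivatives are 1/a2 and -a1/a2^2, so the estimated linearised variable is (y_k - R x_k)/X, with R and X replaced by their estimators. The sketch below then applies the usual variance estimator of a total to this variable. It assumes simple random sampling without replacement with weights N/n; that design, the data and the function name are assumptions made for the example, not part of the text.

```python
def linearised_ratio_variance(ys, xs, N):
    """Estimate R = Y/X and its variance by linearisation.

    Assumes simple random sampling without replacement of n units out of N,
    so every weight w_k(S) equals N/n.
    """
    n = len(ys)
    w = N / n
    Y_hat = w * sum(ys)
    X_hat = w * sum(xs)
    R_hat = Y_hat / X_hat
    # Estimated linearised variable: the partial derivatives of
    # f(a1, a2) = a1 / a2 evaluated at (Y_hat, X_hat).
    v = [(y - R_hat * x) / X_hat for y, x in zip(ys, xs)]
    v_bar = sum(v) / n
    s2_v = sum((v_k - v_bar) ** 2 for v_k in v) / (n - 1)
    # Variance estimator of the estimated total of v under the assumed design.
    var_hat = N ** 2 * (1 - n / N) * s2_v / n
    return R_hat, var_hat


if __name__ == "__main__":
    # Hypothetical sample of n = 5 units from a population of N = 100.
    ys = [3.0, 5.0, 4.0, 8.0, 6.0]
    xs = [2.0, 4.0, 3.0, 5.0, 5.0]
    print(linearised_ratio_variance(ys, xs, N=100))
```

For another design only the last step changes: the linearised variable is fed into whatever variance estimator is appropriate for a total under that design.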
EXERCISES
Exercise 8.1 Variances in an employment survey
The 1989 INSEE employment survey leads to Table 8.1, expressed in thousands
of people. The sample size is larger than 10000, and the confidence intervals
are given under the assumption of asymptotic normality of estimators.
Table 8.1. Labour force, employed and unemployed: Exercise 8.1

                Estimated size    95% confidence interval
Labour force             24062                      ± 129
Employed                 21754                      ± 149
Unemployed                2308                       ± 76
1. Estimate the unemployment rate defined as the percentage of unemployed
people among the labour force (the labour force is the sum of those
employed and unemployed). What type of estimator is this?
2. Give the approximate mathematical expression for the estimated mean
square error (MSE) of the estimated unemployment rate, as a function of:
• the estimated variance of the estimator for the labour force,
• the estimated variance of the estimator for the number of unemployed,
• the estimated covariance between the estimators for the labour force
and the number of unemployed,
• the estimator of the labour force,
• and the estimator of the unemployment rate.