5 Multi-stage Sampling
5.1 Definitions
We consider a partition of the population $U$ into $M$ parts, called primary
units (PU). Each PU $i$ is itself partitioned into $N_i$ parts, called secondary
units (SU), identified by the pair $(i, k)$, where $k$ varies from 1 to $N_i$. The
population of secondary units in PU $i$ is denoted $U_i$. Each SU can be
partitioned in turn, and the process iterated. We sample $m$ PU (sample $S$);
then, in general independently from one PU to another, we sample $n_i$ SU in
PU $i$ if it has been selected (sample $S_i$): this is two-stage sampling. If the
final stage is sampled exhaustively, the design is called 'cluster sampling'.
5.2 Estimator, variance decomposition, and variance
In a two-stage sampling design without replacement, if PU $i$ is selected with
inclusion probability $\pi_i$, and if SU $(i, k)$ that it contains is selected with
probability $\pi_{k|i}$, then we estimate the total
\[
Y = \sum_{i=1}^{M} \sum_{k \in U_i} y_{i,k}
\]
without bias by
\[
\hat{Y} = \sum_{i \in S} \sum_{k \in S_i} \frac{y_{i,k}}{\pi_i \, \pi_{k|i}}.
\]
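The estimator above can be sketched in a few lines of Python. This is only an illustration: the function name `two_stage_total`, the nested-dictionary layout, and the toy values are assumptions, not part of the text.

```python
def two_stage_total(sample, pi_primary, pi_secondary):
    """Estimate the total Y from a two-stage sample.

    sample:       {PU id: {SU id: y value}} for the selected PU and SU
    pi_primary:   {PU id: pi_i}, first-stage inclusion probabilities
    pi_secondary: {PU id: {SU id: pi_k|i}}, second-stage probabilities
    (This data layout is a hypothetical choice made for the sketch.)
    """
    total = 0.0
    for i, units in sample.items():
        for k, y in units.items():
            # Each observation is weighted by the inverse of its
            # overall inclusion probability pi_i * pi_k|i.
            total += y / (pi_primary[i] * pi_secondary[i][k])
    return total


# Made-up toy data: two selected PU, "a" and "b".
sample = {"a": {1: 10.0, 2: 14.0}, "b": {1: 7.0}}
pi_primary = {"a": 0.5, "b": 0.25}
pi_secondary = {"a": {1: 0.4, 2: 0.4}, "b": {1: 0.5}}

estimate = two_stage_total(sample, pi_primary, pi_secondary)
print(estimate)
```

With these toy values the three weighted terms are 10/0.2, 14/0.2 and 7/0.125.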
The variance $\mathrm{var}(\hat{Y})$ is the sum of two terms, namely the 'inter-class'
variance $\mathrm{var}_1(E_{2|1}(\hat{Y}))$ and the 'intra-class' variance
$E_1(\mathrm{var}_{2|1}(\hat{Y}))$, where 1 and 2 are the indices of the two
successive sampling stages. In the case of simple random sampling at each
stage, when $n_i$ depends only on $i$, we can show that:
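\[
\mathrm{var}(\hat{Y}) = M^2 \left(1 - \frac{m}{M}\right) \frac{S_T^2}{m}
+ \frac{M}{m} \sum_{i=1}^{M} N_i^2 \left(1 - \frac{n_i}{N_i}\right) \frac{S_{2,i}^2}{n_i},
\]
where
\[
S_T^2 = \frac{1}{M-1} \sum_{i=1}^{M} \left(Y_i - \overline{Y}\right)^2,
\qquad
\overline{Y} = \frac{1}{M} \sum_{i=1}^{M} Y_i,
\]
and
\[
S_{2,i}^2 = \frac{1}{N_i - 1} \sum_{k \in U_i} \left(y_{i,k} - \overline{Y}_i\right)^2,
\]
with
\[
\overline{Y}_i = \frac{Y_i}{N_i}
\qquad \text{and} \qquad
Y_i = \sum_{k \in U_i} y_{i,k}.
\]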
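This variance can be estimated without bias by:
\[
\widehat{\mathrm{var}}(\hat{Y}) = M^2 \left(1 - \frac{m}{M}\right) \frac{s_T^2}{m}
+ \frac{M}{m} \sum_{i \in S} N_i^2 \left(1 - \frac{n_i}{N_i}\right) \frac{s_{2,i}^2}{n_i},
\]
where
\[
s_T^2 = \frac{1}{m-1} \sum_{i \in S} \left(\hat{Y}_i - \frac{1}{m} \sum_{j \in S} \hat{Y}_j\right)^2,
\]
and
\[
s_{2,i}^2 = \frac{1}{n_i - 1} \sum_{k \in S_i} \left(y_{i,k} - \widehat{\overline{Y}}_i\right)^2,
\]
with
\[
\hat{Y}_i = N_i \, \widehat{\overline{Y}}_i
\qquad \text{and} \qquad
\widehat{\overline{Y}}_i = \frac{1}{n_i} \sum_{k \in S_i} y_{i,k}.
\]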
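As an illustration of the variance estimator for simple random sampling at both stages, the following Python sketch computes the estimated total together with its estimated variance. The function name `two_stage_srs`, its argument layout, and the toy data are assumptions made for this example; each sampled PU needs at least two observed SU, and at least two PU must be sampled, so that the sample variances are defined.

```python
from statistics import mean, variance


def two_stage_srs(M, sizes, samples):
    """Estimated total and estimated variance, SRS without replacement
    at both stages.

    M:       number of PU in the population
    sizes:   {PU id: N_i} for the sampled PU
    samples: {PU id: list of observed y values in that PU}
    (Hypothetical interface chosen for the sketch.)
    """
    m = len(samples)
    # Estimated PU totals: N_i times the sample mean within PU i.
    totals_hat = {i: sizes[i] * mean(ys) for i, ys in samples.items()}
    total_hat = (M / m) * sum(totals_hat.values())

    # First term: dispersion of the estimated PU totals (s_T^2).
    s2_T = variance(list(totals_hat.values()))
    between = M**2 * (1 - m / M) * s2_T / m

    # Second term: within-PU sample variances (s_{2,i}^2).
    within = (M / m) * sum(
        sizes[i] ** 2 * (1 - len(ys) / sizes[i]) * variance(ys) / len(ys)
        for i, ys in samples.items()
    )
    return total_hat, between + within


# Made-up toy data: 2 PU sampled out of M = 4, 3 SU observed in each.
sizes = {1: 10, 2: 10}
samples = {1: [1, 2, 3], 2: [2, 4, 6]}
total_hat, var_hat = two_stage_srs(4, sizes, samples)
print(total_hat, var_hat)
```

The two terms are kept in separate variables so that the contribution of each sampling stage can be inspected.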
5.3 Specific case of sampling of PU with replacement
When the primary units are selected with replacement, we have a remarkable
result. We denote by $m$ the number of PU drawn, by $j$ the order number of the
drawing, and by $i_j$ the identifier of the PU selected at the $j$th drawing.
We also denote:
• $p_i$ as the sampling probability of PU $i$ at the time of any drawing, with
\[
\sum_{i=1}^{M} p_i = 1;
\]
• $\hat{Y}_i$ as the unbiased estimator of the true total $Y_i$ (its expression
depends on the sampling design within PU $i$).
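We then estimate without bias the true total with the Hansen-Hurwitz
estimator:
\[
\hat{Y}_{HH} = \frac{1}{m} \sum_{j=1}^{m} \frac{\hat{Y}_{i_j}}{p_{i_j}},
\]
and we estimate without bias its variance by:
\[
\widehat{\mathrm{var}}\left(\hat{Y}_{HH}\right) = \frac{1}{m(m-1)} \sum_{j=1}^{m}
\left(\frac{\hat{Y}_{i_j}}{p_{i_j}} - \hat{Y}_{HH}\right)^2.
\]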
This very simple expression is valid whatever the sampling design used within
the PU (we only require that $\hat{Y}_i$ be unbiased for $Y_i$).
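The Hansen-Hurwitz estimator and its variance estimator can be sketched as follows. The function name `hansen_hurwitz` and the toy values are assumptions for illustration; the inputs are listed in drawing order, so a PU drawn several times appears several times.

```python
def hansen_hurwitz(totals_hat, probs):
    """Hansen-Hurwitz estimate of the total and its estimated variance.

    totals_hat: estimated PU totals, one entry per drawing j = 1..m
    probs:      drawing probabilities p_{i_j} of the PU drawn each time
    (Hypothetical interface chosen for the sketch; requires m >= 2.)
    """
    m = len(totals_hat)
    # Expanded values: estimated PU total divided by drawing probability.
    ratios = [t / p for t, p in zip(totals_hat, probs)]
    y_hh = sum(ratios) / m
    # Variance estimator: dispersion of the expanded values around y_hh.
    var_hat = sum((r - y_hh) ** 2 for r in ratios) / (m * (m - 1))
    return y_hh, var_hat


# Made-up toy data: m = 3 drawings.
y_hh, var_hat = hansen_hurwitz([20.0, 30.0, 20.0], [0.2, 0.25, 0.2])
print(y_hh, var_hat)
```

Nothing in the function depends on how each estimated PU total was obtained, which mirrors the remark that the expression holds whatever the design within the PU.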
5.4 Cluster effect
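The cluster effect refers to a certain 'similarity' among the individuals of
the same PU with respect to the variable of interest $y$. We can formalise it
by the coefficient
\[
\rho = \frac{\displaystyle \sum_{i=1}^{M} \sum_{k=1}^{N_i} \sum_{\substack{\ell=1 \\ \ell \neq k}}^{N_i}
\left(y_{i,k} - \overline{Y}\right)\left(y_{i,\ell} - \overline{Y}\right)}
{\displaystyle \sum_{i=1}^{M} \sum_{k \in U_i} \left(y_{i,k} - \overline{Y}\right)^2}
\; \frac{1}{\overline{N} - 1},
\]
where
\[
\overline{N} = \frac{N}{M}.
\]
With simple random sampling without replacement at each of the two stages
and with PU of the same size, we can show that
\[
\mathrm{var}(\hat{Y}) = N^2 \frac{S_y^2}{m \bar{n}} \left(1 + \rho(\bar{n} - 1)\right),
\]
provided that $n_i = \bar{n}$ for every PU $i$ and that the sampling rate of the
PU is neglected. When $\rho$ is positive, the cluster effect increases the
variance, all the more so as $\bar{n}$ is large.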
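To make the coefficient concrete, here is a small Python sketch that computes ρ for a fully known population of equal-sized PU, together with the factor 1 + ρ(n̄ − 1). The function names `intracluster_rho` and `variance_inflation` and the toy populations are assumptions for illustration, and the mean subtracted from each value is taken to be the overall mean per individual.

```python
def intracluster_rho(clusters):
    """Coefficient rho for a population given as a list of PU,
    each PU being a list of y values (all PU of the same size)."""
    M = len(clusters)
    N = sum(len(c) for c in clusters)
    n_bar_pop = N / M                    # mean PU size, N / M
    y_bar = sum(sum(c) for c in clusters) / N
    num = 0.0
    den = 0.0
    for c in clusters:
        dev = [y - y_bar for y in c]
        s = sum(dev)
        sq = sum(d * d for d in dev)
        # Sum over pairs k != l of dev_k * dev_l = (sum dev)^2 - sum dev^2.
        num += s * s - sq
        den += sq
    return num / den / (n_bar_pop - 1)


def variance_inflation(rho, n_bar):
    """Factor 1 + rho * (n_bar - 1) appearing in the variance formula."""
    return 1 + rho * (n_bar - 1)


# Made-up toy populations of two PU with two individuals each.
rho_homogeneous = intracluster_rho([[1, 1], [3, 3]])    # identical within PU
rho_heterogeneous = intracluster_rho([[1, 3], [1, 3]])  # identical PU means
print(rho_homogeneous, rho_heterogeneous)
print(variance_inflation(0.1, 5))
```

The two toy populations reach the extreme values of ρ for PU of size 2: perfect similarity within PU on the one hand, and PU that each reproduce the whole population on the other.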
EXERCISES
Exercise 5.1 Hard disk
On a micro-computer hard disk, we count 400 files, each consisting of
exactly 50 records. To estimate the average number of characters per record,
we select 80 files by simple random sampling, then 5 records in each selected
file. We denote $m = 80$ and $\bar{n} = 5$. After sampling we find:
• the sample variance of the estimators of the total number of characters
per file is $s_T^2 = 905\,000$;
• the mean of the $m$ sample variances $s_{2,i}^2$ is equal to 805, where
$s_{2,i}^2$ is the sample variance of the number of characters per record in
file $i$.
1. How do we estimate without bias the mean number $\overline{Y}$ of characters
per record?
2. How do we estimate without bias the accuracy of the previous estimator?
3. Give a 95% confidence interval for $\overline{Y}$.
Solution
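1. We denote $y_{i,k}$ as the number of characters in record $k$ of file $i$. We have
\[
\overline{Y} = \frac{1}{N} \sum_{i=1}^{M} \sum_{k \in U_i} y_{i,k}
= \frac{1}{N} \sum_{i=1}^{M} \overline{N} \, \overline{Y}_i
= \frac{1}{M} \sum_{i=1}^{M} \overline{Y}_i,
\]
where
• $M = 400$ is the number of files (primary units),
• $\overline{N} = 50$ is the number of records per file,
• $N = M \times \overline{N} = 400 \times 50 = 20\,000$ is the total number of records,
• $\overline{Y}_i$ is the mean number of characters per record in file $i$,
• $U_i$ is the set of identifiers for the records of file $i$.
We estimate $\overline{Y}$ without bias by
\[
\widehat{\overline{Y}} = \frac{\hat{Y}}{N} = \frac{1}{N} \sum_{i \in S_1} \frac{\hat{Y}_i}{m/M},
\]
where
• $S_1$ is the sample of files,
• $\hat{Y}_i$ is the unbiased estimator of the total number of characters in file $i$:
\[
\hat{Y}_i = \sum_{k \in S_i} \frac{y_{i,k}}{\bar{n}/\overline{N}}
= \frac{\overline{N}}{\bar{n}} \sum_{k \in S_i} y_{i,k},
\]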