3 Sampling with Unequal Probabilities
3.1 Calculation of inclusion probabilities
If an auxiliary variable $x_k > 0$, $k \in U$, is ‘sufficiently’ proportional to the
variable $y_k$, it is often advantageous to select the units with unequal probabilities
proportional to $x_k$. To do this, we first calculate the inclusion probabilities
according to
$$\pi_k = \frac{n x_k}{\sum_{\ell \in U} x_\ell}. \tag{3.1}$$
If Expression (3.1) gives $\pi_k > 1$ for some units, these units are selected in the
sample with certainty (inclusion probability equal to 1), and the $\pi_k$ are then
recalculated according to (3.1) on the remaining units, with $n$ reduced by the
number of units already selected.
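The recalculation rule above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `inclusion_probabilities` and the example sizes are hypothetical, and exact rational arithmetic (`fractions.Fraction`) is used so that the probabilities sum exactly to $n$.

```python
from fractions import Fraction


def inclusion_probabilities(x, n):
    """Inclusion probabilities proportional to x (all x_k > 0) for sample size n.

    Units for which (3.1) would give a value above 1 receive probability 1,
    and (3.1) is applied again to the remaining units with the remaining
    sample size, until no value exceeds 1.
    """
    if n > len(x):
        raise ValueError("n cannot exceed the population size")
    x = [Fraction(v) for v in x]
    pi = [None] * len(x)
    remaining = set(range(len(x)))
    n_left = n
    while True:
        total = sum(x[k] for k in remaining)
        # n_left * x_k > total is the same as pi_k > 1 in Expression (3.1).
        too_large = {k for k in remaining if n_left * x[k] > total}
        if not too_large:
            break
        for k in too_large:
            pi[k] = Fraction(1)
        remaining -= too_large
        n_left -= len(too_large)
    for k in remaining:
        pi[k] = n_left * x[k] / total
    return pi


# Hypothetical auxiliary values: the last unit is far larger than the others.
pi = inclusion_probabilities([1, 2, 3, 4, 40], n=3)
print(pi)       # the last unit is selected with certainty
print(sum(pi))  # the probabilities sum to n
```

In this example the first pass of (3.1) gives the last unit a value above 1, so it is taken with certainty and the other four units share the remaining sample size of 2.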
3.2 Estimation and variance
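The Horvitz-Thompson estimator of the total is
$$\widehat{Y}_\pi = \sum_{k \in S} \frac{y_k}{\pi_k},$$
and its variance is
$$\operatorname{var}(\widehat{Y}_\pi) = \sum_{k \in U} \sum_{\ell \in U} \frac{y_k}{\pi_k} \frac{y_\ell}{\pi_\ell} \Delta_{k\ell},$$
where $\Delta_{k\ell} = \pi_{k\ell} - \pi_k \pi_\ell$, and $\pi_{k\ell}$ is the second-order inclusion probability.
If $k = \ell$, then $\pi_{kk} = \pi_k$. To obtain a positive estimate of the variance (see
page 4), a sufficient constraint is to have $\Delta_{k\ell} \le 0$ for all $k \ne \ell$ in $U$. This
constraint is called the Sen-Yates-Grundy constraint.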
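As a minimal numerical sketch of these two formulas, the following Python code enumerates a small design and compares the double-sum variance with the variance computed directly from the sampling distribution of the estimator. The design, the unit labels and the values `y` are hypothetical and chosen only for illustration.

```python
from fractions import Fraction as F
from itertools import product

# Hypothetical design on U = {"a", "b", "c"}: sample -> probability p(s).
design = {
    frozenset({"a", "b"}): F(1, 2),
    frozenset({"a", "c"}): F(1, 3),
    frozenset({"b", "c"}): F(1, 6),
}
units = ["a", "b", "c"]
y = {"a": 10, "b": 6, "c": 2}  # hypothetical values of the variable of interest


def joint_probability(k, l):
    """pi_kl: total probability of the samples containing both k and l."""
    return sum((p for s, p in design.items() if k in s and l in s), F(0))


pi = {k: joint_probability(k, k) for k in units}  # pi_kk = pi_k


def ht_estimate(sample):
    """Horvitz-Thompson estimate of the total computed from one sample."""
    return sum(F(y[k]) / pi[k] for k in sample)


def ht_variance():
    """Double sum over U of (y_k / pi_k)(y_l / pi_l) Delta_kl."""
    total = F(0)
    for k, l in product(units, repeat=2):
        delta_kl = joint_probability(k, l) - pi[k] * pi[l]
        total += F(y[k]) / pi[k] * F(y[l]) / pi[l] * delta_kl
    return total


# Cross-check against expectation and variance taken over the design itself.
expected = sum(p * ht_estimate(s) for s, p in design.items())
direct_variance = sum(p * (ht_estimate(s) - expected) ** 2 for s, p in design.items())

print(expected == sum(y.values()))       # the estimator is unbiased
print(ht_variance() == direct_variance)  # both variance computations agree
```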
Several algorithms allow the selection of units with unequal probabilities.
Two books give a brief overview of such methods: Brewer and Hanif (1983)
and Gabler (1990). The best-known methods are systematic sampling
(Madow, 1948), sampling with replacement (Hansen and Hurwitz, 1943), and
the method of Sunter (1977, 1986). The method of Brewer (1975) also presents
an interesting approach. The representation through a splitting method (see
Deville and Tillé, 1998) allows methods to be rewritten in a standardised
manner and new algorithms to be created.
EXERCISES
Exercise 3.1 Design and inclusion probabilities
Consider a population $U = \{1, 2, 3\}$ with the following design:
$$p(\{1, 2\}) = \frac{1}{2}, \quad p(\{1, 3\}) = \frac{1}{4}, \quad p(\{2, 3\}) = \frac{1}{4}.$$
Give the first-order inclusion probabilities. Give the variance-covariance
matrix $\Delta$ of the indicator variables for inclusion in the sample. Give the
variance of the unbiased estimator of the total in matrix form.
Solution
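Clearly, we have:
$$\pi_1 = \frac{3}{4}, \quad \pi_2 = \frac{3}{4}, \quad \pi_3 = \frac{1}{2}.$$
Notice that $\pi_1 + \pi_2 + \pi_3 = 2$. In fact, the design is of fixed size and $n = 2$.
Finally, since each pair of units forms exactly one sample, $\pi_{12} = 1/2$, $\pi_{13} = 1/4$
and $\pi_{23} = 1/4$, and we directly obtain the $\Delta_{k\ell}$ from
$$\Delta_{k\ell} = \operatorname{cov}(I_k, I_\ell) =
\begin{cases}
\pi_{k\ell} - \pi_k \pi_\ell & \text{if } k \ne \ell \\
\pi_k (1 - \pi_k) & \text{if } k = \ell,
\end{cases}$$
$$\begin{aligned}
\Delta_{11} &= \frac{3}{4}\left(1 - \frac{3}{4}\right) = \frac{3}{16}, &
\Delta_{12} &= \frac{1}{2} - \frac{3}{4} \times \frac{3}{4} = -\frac{1}{16}, \\
\Delta_{13} &= \frac{1}{4} - \frac{3}{4} \times \frac{1}{2} = -\frac{1}{8}, &
\Delta_{22} &= \frac{3}{4}\left(1 - \frac{3}{4}\right) = \frac{3}{16}, \\
\Delta_{23} &= \frac{1}{4} - \frac{3}{4} \times \frac{1}{2} = -\frac{1}{8}, &
\Delta_{33} &= \frac{1}{2}\left(1 - \frac{1}{2}\right) = \frac{1}{4},
\end{aligned}$$
which gives the symmetric positive semi-definite matrix:
$$\Delta = \begin{pmatrix}
3/16 & -1/16 & -1/8 \\
-1/16 & 3/16 & -1/8 \\
-1/8 & -1/8 & 1/4
\end{pmatrix}.$$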
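If we denote by $\mathbf{u}$ the column vector of the $y_k/\pi_k$, $k = 1, \dots, N$, and by $\mathbf{I}$ the
column vector of the $I_k$, $k = 1, \dots, N$, we have
$$\operatorname{var}\left(\sum_{k \in S} \frac{y_k}{\pi_k}\right) = \operatorname{var}(\mathbf{u}'\mathbf{I}) = \mathbf{u}' \operatorname{var}(\mathbf{I})\, \mathbf{u} = \mathbf{u}' \Delta \mathbf{u}.$$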
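The solution of Exercise 3.1 can be checked by enumerating the design. The sketch below recomputes the $\pi_k$ and the matrix $\Delta$ exactly, then evaluates the quadratic form $\mathbf{u}'\Delta\mathbf{u}$ for two vectors `y` that are assumptions made for illustration only. Because every row of $\Delta$ sums to zero here, the variance vanishes whenever $y_k$ is exactly proportional to $\pi_k$.

```python
from fractions import Fraction as F

# Design of Exercise 3.1: sample -> probability p(s).
design = {
    frozenset({1, 2}): F(1, 2),
    frozenset({1, 3}): F(1, 4),
    frozenset({2, 3}): F(1, 4),
}
units = [1, 2, 3]


def joint_probability(k, l):
    """pi_kl: total probability of the samples containing both k and l."""
    return sum((p for s, p in design.items() if k in s and l in s), F(0))


pi = [joint_probability(k, k) for k in units]
delta = [
    [joint_probability(k, l) - joint_probability(k, k) * joint_probability(l, l)
     for l in units]
    for k in units
]
row_sums = [sum(row) for row in delta]


def quadratic_form(y):
    """u' Delta u, with u_k = y_k / pi_k."""
    u = [F(yk) / pk for yk, pk in zip(y, pi)]
    n_units = len(u)
    return sum(u[i] * delta[i][j] * u[j] for i in range(n_units) for j in range(n_units))


print(pi)
print(delta)
print(row_sums)                   # all zero: consistent with a fixed-size design
print(quadratic_form([3, 3, 2]))  # y proportional to pi: the variance is zero
print(quadratic_form([1, 0, 0]))  # hypothetical y far from proportional to pi
```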
Exercise 3.2 Variance of indicators and design of fixed size
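Given a sampling design for a population $U$, we denote by $I_k$ the random
indicator variable for the presence of unit $k$ in the sample, and
$$\Delta_{k\ell} =
\begin{cases}
\operatorname{var}(I_k) & \text{if } \ell = k \\
\operatorname{cov}(I_k, I_\ell) & \text{if } k \ne \ell.
\end{cases}$$
Show that if
$$\sum_{k \in U} \sum_{\ell \in U} \Delta_{k\ell} = 0,$$
then the design is of fixed size.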
Solution
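Denoting by $n_S$ the size, a priori random, of the sample $S$:
$$\sum_{k \in U} \sum_{\ell \in U} \Delta_{k\ell} = \sum_{k \in U} \sum_{\ell \in U} \operatorname{cov}(I_k, I_\ell) = \operatorname{var}\left(\sum_{k \in U} I_k\right) = \operatorname{var}(n_S).$$
The double sum is zero by hypothesis, so $\operatorname{var}(n_S) = 0$: the sample size is
constant with probability 1, and the design is of fixed size.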
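The identity used in this solution can be illustrated numerically. The sketch below compares the double sum of the $\Delta_{k\ell}$ with $\operatorname{var}(n_S)$ for two designs: a design with independent inclusions (random size), whose inclusion probabilities are hypothetical, and the fixed-size design of Exercise 3.1. The helper names are assumptions made for this example.

```python
from fractions import Fraction as F
from itertools import combinations


def independent_design(incl):
    """Design in which each unit k enters the sample independently with prob incl[k]."""
    design = {}
    ks = list(incl)
    for r in range(len(ks) + 1):
        for s in combinations(ks, r):
            p = F(1)
            for k in ks:
                p *= incl[k] if k in s else 1 - incl[k]
            design[frozenset(s)] = p
    return design


def sum_of_delta(design, units):
    """Sum over k and l of Delta_kl = pi_kl - pi_k pi_l."""
    def joint(k, l):
        return sum((p for s, p in design.items() if k in s and l in s), F(0))
    return sum(joint(k, l) - joint(k, k) * joint(l, l) for k in units for l in units)


def size_variance(design):
    """Variance of the sample size n_S under the design."""
    mean = sum(p * len(s) for s, p in design.items())
    return sum(p * (len(s) - mean) ** 2 for s, p in design.items())


units = [1, 2, 3]
random_size = independent_design({1: F(1, 2), 2: F(1, 3), 3: F(1, 4)})  # hypothetical
fixed_size = {
    frozenset({1, 2}): F(1, 2),
    frozenset({1, 3}): F(1, 4),
    frozenset({2, 3}): F(1, 4),
}

for design in (random_size, fixed_size):
    # The two quantities coincide; they are zero only for the fixed-size design.
    print(sum_of_delta(design, units), size_variance(design))
```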
Exercise 3.3 Variance of indicators and sampling design
Consider the variance-covariance matrix $\Delta = [\Delta_{k\ell}]$ of the indicators for the
presence of observation units in the sample for a design $p(s)$,
$$\Delta = \begin{pmatrix}
1 & 1 & 1 & -1 & -1 \\
1 & 1 & 1 & -1 & -1 \\
1 & 1 & 1 & -1 & -1 \\
-1 & -1 & -1 & 1 & 1 \\
-1 & -1 & -1 & 1 & 1
\end{pmatrix} \times \frac{6}{25}.$$
1. Is this a design of fixed size?
2. Does this design satisfy the Sen-Yates-Grundy constraints?
3. Calculate the inclusion probabilities of this design knowing that
$\pi_1 = \pi_2 = \pi_3 > \pi_4 = \pi_5$.
4. Give the second-order inclusion probability matrix.
5. Give the probabilities associated with all possible samples.
Solution
1. If we denote by $I_k$ the indicator random variable for the presence of unit
$k$ in the sample, we have:
$$\Delta_{k\ell} = \operatorname{cov}(I_k, I_\ell).$$
If the design is of fixed size,
$$\sum_{k \in U} I_k = n$$
(with $n$ fixed). We then have, for all $\ell \in U$:
$$\sum_{k \in U} \Delta_{k\ell} = \sum_{k \in U} \operatorname{cov}(I_k, I_\ell) = \operatorname{cov}\left(\sum_{k \in U} I_k, I_\ell\right) = \operatorname{cov}(n, I_\ell) = 0.$$
In a design of fixed size, every row sum and every column sum of $\Delta$ is
therefore zero. This is not the case here (the first row, for example, sums
to $6/25$), and thus the design is not of fixed size.
2. No, because we have some $\Delta_{k\ell} > 0$ for $k \ne \ell$ (for example, $\Delta_{12} = 6/25$).
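3. Since $\operatorname{var}(I_k) = \pi_k (1 - \pi_k) = 6/25$ for all $k$, we have
$$\pi_k^2 - \pi_k + \frac{6}{25} = 0.$$
Therefore
$$\pi_k = \frac{1 \pm \sqrt{1 - 4 \times \frac{6}{25}}}{2} = \frac{1 \pm \frac{1}{5}}{2},$$
and
$$\pi_1 = \pi_2 = \pi_3 = \frac{3}{5} > \pi_4 = \pi_5 = \frac{2}{5}.$$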
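4. Since $\pi_{k\ell} = \Delta_{k\ell} + \pi_k \pi_\ell$ for all $k, \ell \in U$, if we let $\boldsymbol{\pi}$ be the column vector
of the $\pi_k$, $k \in U$, the second-order inclusion probability matrix is:
$$\begin{aligned}
\Pi &= \Delta + \boldsymbol{\pi}\boldsymbol{\pi}' \\
&= \begin{pmatrix}
1 & 1 & 1 & -1 & -1 \\
1 & 1 & 1 & -1 & -1 \\
1 & 1 & 1 & -1 & -1 \\
-1 & -1 & -1 & 1 & 1 \\
-1 & -1 & -1 & 1 & 1
\end{pmatrix} \times \frac{6}{25}
+ \begin{pmatrix}
9 & 9 & 9 & 6 & 6 \\
9 & 9 & 9 & 6 & 6 \\
9 & 9 & 9 & 6 & 6 \\
6 & 6 & 6 & 4 & 4 \\
6 & 6 & 6 & 4 & 4
\end{pmatrix} \times \frac{1}{25} \\
&= \begin{pmatrix}
3 & 3 & 3 & 0 & 0 \\
3 & 3 & 3 & 0 & 0 \\
3 & 3 & 3 & 0 & 0 \\
0 & 0 & 0 & 2 & 2 \\
0 & 0 & 0 & 2 & 2
\end{pmatrix} \times \frac{1}{5}.
\end{aligned}$$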
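The answers to questions 1 to 4 of Exercise 3.3 can be verified with a short Python sketch. The variable names are assumptions made for this illustration, and the square root is taken with `math.isqrt`, which works here only because the discriminant $1/25$ is a perfect square.

```python
from fractions import Fraction as F
from math import isqrt

# Delta = (6/25) * (sign_k * sign_l), which reproduces the block pattern above.
signs = [1, 1, 1, -1, -1]
delta = [[F(6, 25) * a * b for b in signs] for a in signs]
N = len(delta)

# Question 1: a fixed-size design requires every row (and column) sum to be zero.
row_sums = [sum(row) for row in delta]
fixed_size_possible = all(s == 0 for s in row_sums)

# Question 2: Sen-Yates-Grundy requires Delta_kl <= 0 for all k != l.
syg_satisfied = all(
    delta[k][l] <= 0 for k in range(N) for l in range(N) if k != l
)

# Question 3: roots of pi^2 - pi + 6/25 = 0, with the larger root for units 1 to 3.
disc = 1 - 4 * delta[0][0]
root = F(isqrt(disc.numerator), isqrt(disc.denominator))
pi_high, pi_low = (1 + root) / 2, (1 - root) / 2
pi = [pi_high] * 3 + [pi_low] * 2

# Question 4: second-order inclusion probabilities, Pi = Delta + pi pi'.
second_order = [[delta[k][l] + pi[k] * pi[l] for l in range(N)] for k in range(N)]

print(row_sums, fixed_size_possible)
print(syg_satisfied)
print(pi)
for row in second_order:
    print(row)
```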