7 Calibration with Several Auxiliary Variables
7.1 Calibration estimation
The totals of $p$ auxiliary variables $x_1, \dots, x_p$ are assumed to be known for the population $U$. Let us consider the vector $x_k = (x_{k1}, \dots, x_{kj}, \dots, x_{kp})^\top$ of values taken by the $p$ auxiliary variables on unit $k$. The total
$$X = \sum_{k \in U} x_k$$
is assumed to be known. The objective is still to estimate the total
$$Y = \sum_{k \in U} y_k,$$
using the information given by $X$. Furthermore, we denote by
$$\widehat{Y}_\pi = \sum_{k \in S} \frac{y_k}{\pi_k} \quad \text{and} \quad \widehat{X}_\pi = \sum_{k \in S} \frac{x_k}{\pi_k}$$
the Horvitz-Thompson estimators of $Y$ and $X$. The general idea of calibration methods (see on this topic Deville and Särndal, 1992) consists of defining weights $w_k$, $k \in S$, that satisfy a calibration property, in other words such that
$$\sum_{k \in S} w_k x_k = \sum_{k \in U} x_k. \qquad (7.1)$$
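As a small numerical illustration, the sketch below computes the Horvitz-Thompson estimators and shows that the design weights $d_k = 1/\pi_k$ do not, in general, satisfy (7.1). The sample values, inclusion probabilities and population totals are hypothetical and chosen only for the example; the first auxiliary variable is the constant 1, so its population total is the population size $N$.

```python
import numpy as np

# Hypothetical sample: inclusion probabilities pi_k, study variable y_k,
# and p = 2 auxiliary variables x_k (one row per sampled unit k).
pi = np.array([0.2, 0.25, 0.5, 0.4])
y = np.array([12.0, 7.0, 20.0, 15.0])
x = np.array([[1.0, 3.0],
              [1.0, 2.0],
              [1.0, 6.0],
              [1.0, 4.0]])
X_pop = np.array([12.0, 38.0])  # hypothetical known population totals X (N = 12)

d = 1.0 / pi                    # design weights d_k = 1 / pi_k
Y_ht = np.sum(d * y)            # Horvitz-Thompson estimator of Y
X_ht = d @ x                    # Horvitz-Thompson estimator of X (length p)

# The design weights are not calibrated: sum_k d_k x_k differs from X.
calibration_gap = X_pop - X_ht
print(Y_ht, X_ht, calibration_gap)
```

Here the design weights overestimate both known totals, which is exactly what calibration corrects.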
To obtain such weights, we minimise a pseudo-distance $G_k(\cdot,\cdot)$ between $w_k$ and $d_k = 1/\pi_k$,
$$\min_{w_k} \sum_{k \in S} \frac{G_k(w_k, d_k)}{q_k},$$
under the calibration constraints given in (7.1). The weights $q_k$, $k \in S$, form a set of strictly positive known coefficients. The function $G_k(\cdot,\cdot)$ is assumed to be strictly convex, non-negative and such that $G_k(d_k, d_k) = 0$. The weights $w_k$ are then given by
$$w_k = d_k F_k(\lambda^\top x_k),$$
where $d_k F_k(\cdot)$ is the inverse function of $g_k(\cdot, d_k)/q_k$, with
$$g_k(w_k, d_k) = \frac{\partial G_k(w_k, d_k)}{\partial w_k},$$
and $\lambda$ is the vector of Lagrange multipliers associated with the constraints. The vector $\lambda$ is obtained by solving the calibration equations
$$\sum_{k \in S} d_k F_k(\lambda^\top x_k)\, x_k = \sum_{k \in U} x_k.$$
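The calibration equations are generally non-linear in $\lambda$ and must be solved numerically. The following is a minimal sketch using Newton's method, which is one possible solver among others. The helper name `calibrate` and the two example choices $F_k(u) = 1 + q_k u$ (linear) and $F_k(u) = \exp(q_k u)$ (exponential) are assumptions for illustration; the Jacobian of the equations with respect to $\lambda$ is $\sum_{k \in S} d_k F_k'(\lambda^\top x_k)\, x_k x_k^\top$.

```python
import numpy as np

def calibrate(d, x, X_pop, F, dF, q=None, tol=1e-10, max_iter=50):
    """Solve sum_k d_k F_k(lambda' x_k) x_k = X for lambda by Newton's method.

    d: design weights (n,), x: auxiliary values (n, p), X_pop: known totals (p,).
    F(u, q) and dF(u, q) return F_k(u_k) and its derivative in u, elementwise.
    """
    n, p = x.shape
    q = np.ones(n) if q is None else q
    lam = np.zeros(p)                      # lambda = 0 gives w_k = d_k when F_k(0) = 1
    for _ in range(max_iter):
        u = x @ lam                        # u_k = lambda' x_k
        w = d * F(u, q)                    # current weights w_k = d_k F_k(u_k)
        phi = w @ x - X_pop                # residual of the calibration equations
        if np.max(np.abs(phi)) < tol:
            return w, lam
        # Jacobian: sum_k d_k F_k'(u_k) x_k x_k'
        jac = (x * (d * dF(u, q))[:, None]).T @ x
        lam = lam - np.linalg.solve(jac, phi)
    raise RuntimeError("Newton iterations did not converge")

# Two example choices of F_k (assumptions for illustration).
linear = (lambda u, q: 1 + q * u, lambda u, q: q)
exponential = (lambda u, q: np.exp(q * u), lambda u, q: q * np.exp(q * u))

# Hypothetical data.
pi = np.array([0.2, 0.25, 0.5, 0.4])
x = np.array([[1.0, 3.0], [1.0, 2.0], [1.0, 6.0], [1.0, 4.0]])
X_pop = np.array([12.0, 38.0])

w_lin, _ = calibrate(1 / pi, x, X_pop, *linear)
w_exp, _ = calibrate(1 / pi, x, X_pop, *exponential)
print(w_lin @ x, w_exp @ x)   # both reproduce X_pop up to the tolerance
```

With the linear choice, Newton's method converges in a single step because the equations are linear in $\lambda$; the exponential choice always yields positive weights.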
7.2 Generalised regression estimation
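If the function $G_k(\cdot,\cdot)$ is the chi-square distance,
$$G_k(w_k, d_k) = \frac{(w_k - d_k)^2}{d_k},$$
then the calibrated estimator $\sum_{k \in S} w_k y_k$ is equal to the generalised regression estimator
$$\widehat{Y}_{reg} = \widehat{Y}_\pi + (X - \widehat{X}_\pi)^\top \widehat{b},$$
where
$$\widehat{b} = \left( \sum_{k \in S} \frac{x_k x_k^\top q_k}{\pi_k} \right)^{-1} \sum_{k \in S} \frac{x_k y_k q_k}{\pi_k}.$$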
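To see why the two estimators coincide: with the chi-square distance, the first-order condition gives $w_k = d_k(1 + q_k \lambda^\top x_k / 2)$, so the weights are linear in $\lambda$ and the calibration equations can be solved in closed form. The sketch below, on hypothetical data, checks numerically that $\sum_{k \in S} w_k y_k$ equals $\widehat{Y}_{reg}$. The matrix name `T` $= \sum_{k \in S} x_k x_k^\top q_k/\pi_k$ is introduced only for the example.

```python
import numpy as np

# Hypothetical sample: inclusion probabilities, study variable, auxiliaries, q_k.
pi = np.array([0.2, 0.25, 0.5, 0.4])
y = np.array([12.0, 7.0, 20.0, 15.0])
x = np.array([[1.0, 3.0], [1.0, 2.0], [1.0, 6.0], [1.0, 4.0]])
q = np.ones(len(y))
X_pop = np.array([12.0, 38.0])            # hypothetical known totals X

d = 1.0 / pi
Y_ht, X_ht = d @ y, d @ x                 # Horvitz-Thompson estimators

# b_hat = (sum_k x_k x_k' q_k / pi_k)^{-1} sum_k x_k y_k q_k / pi_k
T = (x * (d * q)[:, None]).T @ x
b_hat = np.linalg.solve(T, (x * (d * q)[:, None]).T @ y)
Y_reg = Y_ht + (X_pop - X_ht) @ b_hat

# Linear calibration weights: w_k = d_k (1 + q_k x_k' T^{-1} (X - X_ht)).
w = d * (1 + q * (x @ np.linalg.solve(T, X_pop - X_ht)))
print(Y_reg, w @ y)                       # the two estimates coincide
```

Because $T$ is symmetric, $\sum_{k} w_k y_k = \widehat{Y}_\pi + (X - \widehat{X}_\pi)^\top T^{-1} \sum_k x_k y_k q_k/\pi_k$, which is exactly $\widehat{Y}_{reg}$.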
7.3 Marginal calibration
A particularly important case is obtained when the auxiliary variables are the indicator variables of the strata and $G_k(w_k, d_k) = w_k \log(w_k/d_k)$. One can show that the resulting weights are equivalent to those given by the algorithm of calibration on the margins (also known as raking ratio). When the sample yields a table of estimated real values $\widehat{N}_{ij}$, $i = 1, \dots, I$, $j = 1, \dots, J$, and the true margins $N_{i.}$, $i = 1, \dots, I$, and $N_{.j}$, $j = 1, \dots, J$, of this table are known in the population, the equivalent calibration method consists of adjusting the estimated table successively by row and by column. The algorithm is as follows. We initialise with
$$N_{ij}^{(0)} = \widehat{N}_{ij}, \quad \text{for all } i = 1, \dots, I,\ j = 1, \dots, J.$$
Next, we successively adjust the rows and the columns. For $t = 1, 2, 3, \dots$,
$$N_{ij}^{(2t-1)} = N_{ij}^{(2t-2)} \frac{N_{i.}}{\sum_{\ell=1}^{J} N_{i\ell}^{(2t-2)}}, \quad \text{for all } i = 1, \dots, I,\ j = 1, \dots, J,$$
$$N_{ij}^{(2t)} = N_{ij}^{(2t-1)} \frac{N_{.j}}{\sum_{\ell=1}^{I} N_{\ell j}^{(2t-1)}}, \quad \text{for all } i = 1, \dots, I,\ j = 1, \dots, J.$$
The algorithm converges rapidly if the table $\widehat{N}_{ij}$ contains no zero cells.
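The alternating row and column adjustments can be sketched in plain Python as follows. The function name `raking_ratio`, the stopping tolerance and the iteration cap are assumptions; the sketch also assumes that no row or column of the table sums to zero, since the ratios would otherwise be undefined.

```python
def raking_ratio(table, row_totals, col_totals, tol=1e-8, max_iter=1000):
    """Adjust an I x J table alternately on its rows and columns (raking ratio).

    Returns the adjusted table and the number of half-steps performed.
    """
    n = [list(map(float, row)) for row in table]   # N_ij^(0) = estimated table
    I, J = len(n), len(n[0])
    for step in range(1, 2 * max_iter + 1):
        if step % 2 == 1:
            # Odd step: multiply row i by N_i. / (current row sum).
            for i in range(I):
                s = sum(n[i])
                n[i] = [v * row_totals[i] / s for v in n[i]]
        else:
            # Even step: multiply column j by N_.j / (current column sum).
            for j in range(J):
                s = sum(n[i][j] for i in range(I))
                for i in range(I):
                    n[i][j] *= col_totals[j] / s
            # Columns are now exact; stop once the rows are exact as well.
            gap = max(abs(sum(n[i]) - row_totals[i]) for i in range(I))
            if gap < tol:
                return n, step
    raise RuntimeError("raking ratio did not converge")

# Usage on a small hypothetical 2 x 2 table.
adjusted, steps = raking_ratio([[10, 20], [30, 40]], [40, 60], [50, 50])
print(adjusted, steps)
```

The row and column totals must share the same grand total, otherwise the two sets of constraints cannot be met simultaneously.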
EXERCISES
Exercise 7.1 Adjustment of a table on the margins
Using a sampling procedure, we obtain the Horvitz-Thompson estimators $\widehat{N}_{ij}$ of the cells of a contingency table (see Table 7.1). The margins of this table are known for the entire population: the true row totals are (430, 360, 210) and the true column totals are (150, 300, 550). Adjust the table obtained by sampling to the known population margins using the raking ratio method.

Table 7.1. Table obtained through sampling: Exercise 7.1

         Col. 1   Col. 2   Col. 3    Total
Row 1        80      170      150      400
Row 2        90       80      210      380
Row 3        10       80      130      220
Total       180      330      490     1000
Solution
We may start with an adjustment on either the rows or the columns. Here, we choose to start with an adjustment by row.
Calibration by row: iteration 1

         Col. 1   Col. 2   Col. 3     Total
Row 1     86.00   182.75   161.25    430.00
Row 2     85.26    75.79   198.95    360.00
Row 3      9.55    76.36   124.09    210.00
Total    180.81   334.90   484.29   1000.00

Next, we adjust the columns.

Calibration by column: iteration 2

         Col. 1   Col. 2   Col. 3     Total
Row 1     71.35   163.70   183.13    418.18
Row 2     70.73    67.89   225.94    364.57
Row 3      7.92    68.41   140.93    217.25
Total    150.00   300.00   550.00   1000.00
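The first two steps of this solution can be reproduced with a few lines of Python; the helper names `adjust_rows`, `adjust_cols` and `show` are hypothetical. Printed values are rounded to two decimals, as in the tables of the solution, and the final loop simply keeps alternating the two adjustments until the row margins are also met.

```python
# Table 7.1 (estimated counts) and the known population margins.
table = [[80, 170, 150], [90, 80, 210], [10, 80, 130]]
row_totals = [430, 360, 210]
col_totals = [150, 300, 550]

def adjust_rows(n, totals):
    """Row step: multiply row i by N_i. / (current row sum)."""
    out = []
    for row, t in zip(n, totals):
        s = sum(row)
        out.append([v * t / s for v in row])
    return out

def adjust_cols(n, totals):
    """Column step: multiply column j by N_.j / (current column sum)."""
    col_sums = [sum(col) for col in zip(*n)]
    return [[v * t / s for v, t, s in zip(row, totals, col_sums)] for row in n]

def show(n):
    """Print the table with its row totals, column totals and grand total."""
    for row in n:
        print(" ".join(f"{v:8.2f}" for v in row), f"{sum(row):8.2f}")
    print(" ".join(f"{sum(col):8.2f}" for col in zip(*n)), f"{sum(map(sum, n)):8.2f}")

it1 = adjust_rows(table, row_totals)   # iteration 1: calibration by row
show(it1)
it2 = adjust_cols(it1, col_totals)     # iteration 2: calibration by column
show(it2)

# Keep alternating row and column steps until both margins are met.
n = it2
for _ in range(100):
    n = adjust_cols(adjust_rows(n, row_totals), col_totals)
show(n)
```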