2 Simple Random Sampling
2.1 Simple random sampling without replacement
A design is simple without replacement of fixed size $n$ if and only if, for all $s$,
$$p(s) = \begin{cases} \dbinom{N}{n}^{-1} & \text{if } \#s = n \\ 0 & \text{otherwise,} \end{cases}$$
where
$$\binom{N}{n} = \frac{N!}{n!(N-n)!}.$$
We can derive the inclusion probabilities
$$\pi_k = \frac{n}{N}, \quad \text{and} \quad \pi_{k\ell} = \frac{n(n-1)}{N(N-1)}, \quad k \neq \ell.$$
Finally,
$$\Delta_{k\ell} = \frac{n(N-n)}{N^2} \times \begin{cases} 1 & \text{if } k = \ell \\ \dfrac{-1}{N-1} & \text{if } k \neq \ell. \end{cases}$$
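The inclusion probabilities and the values of $\Delta_{k\ell}$ can be checked by brute force on a small case. The following is a minimal sketch, not part of the original text: it assumes the usual definition $\Delta_{k\ell} = \pi_{k\ell} - \pi_k \pi_\ell$ (with $\pi_{kk} = \pi_k$), and the sizes `N = 5`, `n = 3` and the helper name `pi` are purely illustrative.

```python
from fractions import Fraction
from itertools import combinations

N, n = 5, 3  # small illustrative sizes (assumption, not from the text)

# Every subset of size n is a possible sample, each with probability 1 / C(N, n).
samples = list(combinations(range(N), n))
p = Fraction(1, len(samples))

def pi(k, l=None):
    """First-order (l is None) or joint inclusion probability, by enumeration."""
    units = {k} if l is None else {k, l}
    return sum(p for s in samples if units <= set(s))

pi_k = pi(0)        # first-order inclusion probability of unit 0
pi_kl = pi(0, 1)    # joint inclusion probability of units 0 and 1

# Assumed definition: Delta_kl = pi_kl - pi_k * pi_l, with pi_kk = pi_k.
delta_kk = pi_k - pi_k * pi_k
delta_kl = pi_kl - pi_k * pi_k

print(pi_k, pi_kl, delta_kk, delta_kl)
```

Exact fractions are used so that the enumerated values can be compared with the closed-form expressions without rounding error.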
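The Horvitz-Thompson estimator of the total becomes
$$\widehat{Y}_\pi = \frac{N}{n} \sum_{k \in S} y_k.$$
That for the mean is written as
$$\widehat{\overline{Y}}_\pi = \frac{1}{n} \sum_{k \in S} y_k.$$
The variance of $\widehat{Y}_\pi$ is
$$\operatorname{var}(\widehat{Y}_\pi) = N^2 \left(1 - \frac{n}{N}\right) \frac{S_y^2}{n},$$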
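and its unbiased estimator is
$$\widehat{\operatorname{var}}(\widehat{Y}_\pi) = N^2 \left(1 - \frac{n}{N}\right) \frac{s_y^2}{n},$$
where
$$s_y^2 = \frac{1}{n-1} \sum_{k \in S} \left(y_k - \widehat{\overline{Y}}_\pi\right)^2.$$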
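The unbiasedness of both the estimator of the total and its variance estimator can be illustrated by enumerating every sample of a tiny population. This is a sketch under stated assumptions: the population values `[2, 5, 7, 10]` are hypothetical, the function names are illustrative, and $S_y^2$ is taken to be the population variance with divisor $N-1$.

```python
from fractions import Fraction
from itertools import combinations

def ht_total(sample, N):
    """Horvitz-Thompson estimator of the total: (N / n) * sum of sampled y_k."""
    n = len(sample)
    return Fraction(N, n) * sum(sample)

def sample_variance(values):
    """Variance with divisor (number of values - 1)."""
    n = len(values)
    mean = Fraction(sum(values), n)
    return sum((v - mean) ** 2 for v in values) / (n - 1)

def ht_total_var_est(sample, N):
    """Variance estimator N^2 * (1 - n/N) * s_y^2 / n."""
    n = len(sample)
    return N ** 2 * (1 - Fraction(n, N)) * sample_variance(sample) / n

population = [2, 5, 7, 10]          # hypothetical values, not from the text
N, n = len(population), 2
samples = list(combinations(population, n))   # all equally likely samples

estimates = [ht_total(s, N) for s in samples]
expectation = sum(estimates) / len(samples)
true_var = sum((e - expectation) ** 2 for e in estimates) / len(samples)
mean_var_est = sum(ht_total_var_est(s, N) for s in samples) / len(samples)

# Closed-form variance, with S_y^2 computed over the whole population.
S2 = sample_variance(population)
formula_var = N ** 2 * (1 - Fraction(n, N)) * S2 / n

print(expectation, true_var, mean_var_est, formula_var)
```

Averaging over all $\binom{N}{n}$ samples is only feasible for very small populations; it is used here because it gives exact expectations rather than simulated ones.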
The Horvitz-Thompson estimator of the proportion $P_D$ of the total population that belongs to a subpopulation $D$ is
$$p = \frac{n_D}{n},$$
where $n_D = \#(S \cap D)$, so that $p$ is the proportion of individuals of $D$ in $S$. We can verify that
$$\operatorname{var}(p) = \left(1 - \frac{n}{N}\right) \frac{P_D(1 - P_D)}{n} \, \frac{N}{N-1},$$
and this variance is estimated without bias by
$$\widehat{\operatorname{var}}(p) = \left(1 - \frac{n}{N}\right) \frac{p(1-p)}{n-1}.$$
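The two formulas for the proportion can be checked in the same exhaustive way. The sketch below is illustrative only: the membership indicators `in_D` and the sizes are hypothetical, and the function names `p_hat` and `var_hat` are not from the text.

```python
from fractions import Fraction
from itertools import combinations

in_D = [1, 1, 0, 0, 0]      # hypothetical indicators: 1 if the unit belongs to D
N, n = len(in_D), 2
P_D = Fraction(sum(in_D), N)

def p_hat(sample):
    """Sample proportion n_D / n."""
    return Fraction(sum(sample), len(sample))

def var_hat(sample, N):
    """Variance estimator (1 - n/N) * p * (1 - p) / (n - 1)."""
    n = len(sample)
    p = p_hat(sample)
    return (1 - Fraction(n, N)) * p * (1 - p) / (n - 1)

samples = list(combinations(in_D, n))   # all equally likely samples
p_values = [p_hat(s) for s in samples]

expected_p = sum(p_values) / len(samples)
var_p = sum((p - expected_p) ** 2 for p in p_values) / len(samples)
expected_var_hat = sum(var_hat(s, N) for s in samples) / len(samples)

# Closed-form variance of p from the text.
formula_var = (1 - Fraction(n, N)) * P_D * (1 - P_D) / n * Fraction(N, N - 1)

print(expected_p, var_p, expected_var_hat, formula_var)
```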
2.2 Simple random sampling with replacement
If $m$ units are selected with replacement and with equal probabilities at each trial in the population $U$, then we define $\tilde{y}_i$ as the value of the variable $y$ for the $i$-th selected unit in the sample. The same unit can be selected several times in the sample. The mean estimator
$$\widehat{\overline{Y}}_{WR} = \frac{1}{m} \sum_{i=1}^{m} \tilde{y}_i$$
is unbiased, and its variance is
$$\operatorname{var}(\widehat{\overline{Y}}_{WR}) = \frac{\sigma_y^2}{m}.$$
In a simple design with replacement, the sample variance
$$\tilde{s}_y^2 = \frac{1}{m-1} \sum_{i=1}^{m} \left(\tilde{y}_i - \widehat{\overline{Y}}_{WR}\right)^2$$
estimates $\sigma_y^2$ without bias. It is possible, however, to show that if we retain only the $n_S$ distinct units of the sample $S$, then the estimator
$$\widehat{\overline{Y}}_{DU} = \frac{1}{n_S} \sum_{k \in S} y_k$$
is unbiased for the mean and has a smaller variance than that of $\widehat{\overline{Y}}_{WR}$. Table 2.1 presents a summary of the main results under simple designs.
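The comparison between the with-replacement mean and the mean over distinct units can be illustrated by enumerating all $N^m$ ordered draws of a tiny population. This is a sketch under stated assumptions: the population values are hypothetical, and $\sigma_y^2$ is taken to be the population variance with divisor $N$, which is the definition this sketch assumes.

```python
from fractions import Fraction
from itertools import product

y = [2, 5, 7, 10]    # hypothetical population values, not from the text
N, m = len(y), 3     # m draws with replacement

def mean(values):
    return Fraction(sum(values), len(values))

def variance(values):
    """Mean squared deviation of the values around their own mean."""
    mu = mean(values)
    return mean([(v - mu) ** 2 for v in values])

# All N**m ordered draws of unit indices are equally likely.
draws = list(product(range(N), repeat=m))

# Mean over all m draws (repeated units counted each time).
wr_means = [mean([y[i] for i in d]) for d in draws]
# Mean over the distinct units only.
du_means = [mean([y[i] for i in set(d)]) for d in draws]
# Sample variance with divisor m - 1 for each draw.
s2_tilde = [
    sum((y[i] - wr) ** 2 for i in d) / (m - 1)
    for d, wr in zip(draws, wr_means)
]

pop_mean = mean(y)
sigma2 = variance(y)   # assumed definition: divisor N

print(mean(wr_means), mean(du_means), pop_mean)
print(variance(wr_means), sigma2 / m, variance(du_means))
print(mean(s2_tilde), sigma2)
```

Because every ordered draw has the same probability, plain averages over `draws` are exact expectations.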
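Table 2.1. Simple designs: summary table

| Simple sampling design | Without replacement | With replacement |
|---|---|---|
| Sample size | $n$ | $m$ |
| Mean estimator | $\widehat{\overline{Y}} = \dfrac{1}{n} \sum_{k \in S} y_k$ | $\widehat{\overline{Y}}_{WR} = \dfrac{1}{m} \sum_{i=1}^{m} \tilde{y}_i$ |
| Variance of the mean estimator | $\operatorname{var}(\widehat{\overline{Y}}) = \dfrac{N-n}{nN} S_y^2$ | $\operatorname{var}(\widehat{\overline{Y}}_{WR}) = \dfrac{\sigma_y^2}{m}$ |
| Expected sample variance | $E(s_y^2) = S_y^2$ | $E(\tilde{s}_y^2) = \sigma_y^2$ |
| Variance estimator of the mean estimator | $\widehat{\operatorname{var}}(\widehat{\overline{Y}}) = \dfrac{N-n}{nN} s_y^2$ | $\widehat{\operatorname{var}}(\widehat{\overline{Y}}_{WR}) = \dfrac{\tilde{s}_y^2}{m}$ |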
EXERCISES
Exercise 2.1 Cultivated surface area
We want to estimate the surface area cultivated on the farms of a rural township. Of the $N = 2010$ farms that comprise the township, we select 100 using simple random sampling. We measure $y_k$, the surface area cultivated on farm $k$ in hectares, and we find
$$\sum_{k \in S} y_k = 2907 \text{ ha} \quad \text{and} \quad \sum_{k \in S} y_k^2 = 154593 \text{ ha}^2.$$
1. Give the value of the standard unbiased estimator of the mean
$$\overline{Y} = \frac{1}{N} \sum_{k \in U} y_k.$$
2. Give a 95% confidence interval for $\overline{Y}$.
Solution
In a simple design, the unbiased estimator of $\overline{Y}$ is
$$\widehat{\overline{Y}} = \frac{1}{n} \sum_{k \in S} y_k = \frac{2907}{100} = 29.07 \text{ ha}.$$
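The estimator of the dispersion $S_y^2$ is
$$s_y^2 = \frac{n}{n-1} \left( \frac{1}{n} \sum_{k \in S} y_k^2 - \widehat{\overline{Y}}^2 \right) = \frac{100}{99} \left( \frac{154593}{100} - 29.07^2 \right) = 707.945.$$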
The sample size $n$ being 'sufficiently large', the 95% confidence interval is estimated in hectares as follows:
$$\widehat{\overline{Y}} \pm 1.96 \sqrt{\frac{N-n}{N} \, \frac{s_y^2}{n}} = 29.07 \pm 1.96 \times \sqrt{\frac{2010-100}{2010} \times \frac{707.945}{100}} = [23.99; 34.15].$$
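The arithmetic of this solution can be reproduced in a few lines. The sketch below uses only the figures given in the exercise ($N = 2010$, $n = 100$, the two sample sums, and the normal quantile 1.96); the variable names are illustrative.

```python
import math

N, n = 2010, 100
sum_y = 2907.0        # sum of y_k over the sample, in ha
sum_y2 = 154593.0     # sum of y_k squared over the sample, in ha^2
z = 1.96              # normal quantile used in the text for a 95% interval

# Unbiased estimator of the mean.
mean = sum_y / n

# s_y^2 computed from the two sums: n/(n-1) * (mean of squares - squared mean).
s2 = n / (n - 1) * (sum_y2 / n - mean ** 2)

# Half-width of the interval, with the finite population correction (N - n) / N.
half_width = z * math.sqrt((N - n) / N * s2 / n)
lower, upper = mean - half_width, mean + half_width

print(f"mean = {mean:.2f} ha, s2 = {s2:.3f}")
print(f"95% CI = [{lower:.2f}; {upper:.2f}]")
```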
Exercise 2.2 Occupational sickness
We are interested in estimating the proportion of men $P$ affected by an occupational sickness in a business of 1500 workers. In addition, we know that three out of 10 workers are usually affected by this sickness in businesses of the same type. We propose to select a sample by simple random sampling.
1. What sample size must be selected so that the total length of a confidence interval with a 0.95 confidence level is less than 0.02, for simple designs with replacement and without replacement?
2. What should we do if we do not know the proportion of men usually affected by the sickness (for the case of a design without replacement)?
To avoid confusion in notation, we will use the subscript $WR$ for estimators with replacement, and the subscript $WOR$ for estimators without replacement.
Solution
1. a) Design with replacement.
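If the design is of size $m$, the (estimated) confidence interval at level $1 - \alpha$ for a mean is given by
$$CI(1-\alpha) = \left[ \widehat{\overline{Y}} - z_{1-\alpha/2} \sqrt{\frac{\tilde{s}_y^2}{m}}, \; \widehat{\overline{Y}} + z_{1-\alpha/2} \sqrt{\frac{\tilde{s}_y^2}{m}} \right],$$
where $z_{1-\alpha/2}$ is the quantile of order $1 - \alpha/2$ of a standard normal random variable. If we denote by $\widehat{P}_{WR}$ the estimator of the proportion for the design with replacement, we can write
$$CI(1-\alpha) = \left[ \widehat{P}_{WR} - z_{1-\alpha/2} \sqrt{\frac{\widehat{P}_{WR}(1 - \widehat{P}_{WR})}{m-1}}, \; \widehat{P}_{WR} + z_{1-\alpha/2} \sqrt{\frac{\widehat{P}_{WR}(1 - \widehat{P}_{WR})}{m-1}} \right].$$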
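The total length of this interval is twice its half-width, so a required sample size can be found numerically. The following is only a sketch under stated assumptions: it plugs the planning value 0.3 (three workers out of 10) in place of $\widehat{P}_{WR}$, uses $z_{0.975} \approx 1.96$, and searches for the smallest $m$ whose interval length is below 0.02. The function names are illustrative and do not come from the text.

```python
import math

def ci_length_wr(p, m, z=1.96):
    """Total length of the interval: 2 * z * sqrt(p * (1 - p) / (m - 1))."""
    return 2 * z * math.sqrt(p * (1 - p) / (m - 1))

def min_size_wr(p, max_length, z=1.96):
    """Smallest m (with replacement) whose interval length is below max_length."""
    m = 2  # m - 1 must be positive
    while ci_length_wr(p, m, z) >= max_length:
        m += 1
    return m

p_planning = 0.3      # assumed planning value, from "three out of 10 workers"
m_required = min_size_wr(p_planning, 0.02)

print(m_required, ci_length_wr(p_planning, m_required))
```

A linear search is used instead of inverting the formula by hand so that the strict inequality on the length is checked directly. Note that the size found this way exceeds the 1500 workers in the business, which is possible only because units can be drawn more than once.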