S.Y.B.Sc. – SEM – III Notes (Paper – II)
Course code: USST302
THEORY OF SAMPLING
Unit I
a) Introduction to Design of Sample Surveys.
A major aspect of any research is the gathering of information which will be used as the basis
for decision-making or to determine the direction of future operations. Researchers often use sample
survey methodology to obtain information about a large population by selecting and measuring a
sample from that population.
A sample survey is the process of selecting a sample from a population according to some
sampling technique, collecting data from the selected units, and using the data to estimate
characteristics of the entire population.
A sample design, or sampling technique, encompasses the rules and operations by which we
select sampling units from the population, together with the computation of sample statistics, which are
estimates of the population values of interest.
Basic Definitions: -
1) Population: Population refers to the group of objects of interest for study. The collection of
all the sampling units in a given region at a particular point of time is called the population. It is an
aggregate of objects, animate or inanimate, under study, and it may be finite or infinite.
E.g. 1) students of K J Somaiya college. 2) stars in the sky.
2) Population unit: Every object of the population is called a population unit. The total number of units
in the population is called the size of the population. Population size is denoted by ‘N’.
For most statistical investigations, complete enumeration of the population is
impracticable. Even if the population is finite, 100% inspection is not always possible because of a
multiplicity of causes such as administrative and financial implications, the time factor, etc., and we
have to resort to sampling techniques.
3) Sample: A finite subset of sampling units from the population is called a sample, and the number of
individuals or sampling units in the sample is called the sample size.
4) Sampling unit: A sampling unit is an element or an individual in the target population. The total
number of units in the sample is called the size of the sample. Sample size is denoted by ‘n’.
5) Census: The complete count of the population is called a census. In a census survey, observations
are collected on all the sampling units in the population. E.g. In India a census is conducted every 10 years,
in which observations on all the people staying in India are collected.
6) Sampling frame: Before we use the survey procedures, we should have a well-defined target
population, sampling units and an appropriate sample design or sampling technique. In order to
select a sample according to our sample design, we need to have a list of the sampling units in the
population. This list is called a sampling frame. E.g. The list of all the students in a college, along with
their roll numbers, constitutes a sampling frame.
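As a small illustration of how a sampling frame is used, the sketch below draws a sample of n units from a frame of N roll numbers. The frame size (20), the sample size (5) and the seed are hypothetical values chosen for the example, and simple random sampling without replacement is assumed as the sampling technique.

```python
import random

# Hypothetical sampling frame: roll numbers 1..N for a class of N = 20 students.
N = 20
frame = list(range(1, N + 1))

# Sample size n (assumed for illustration).
n = 5

# Fixed seed so that the same sample is drawn on every run.
rng = random.Random(42)

# random.Random.sample selects n distinct units, i.e. without replacement.
sample = rng.sample(frame, n)

print("Population size N =", len(frame))
print("Sample size n =", len(sample))
print("Selected roll numbers:", sorted(sample))
```

Every selected roll number comes from the frame, and no unit appears twice, which is what makes the frame the working definition of the population for the survey.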
Concepts in sampling: -
1. Parameter: A statistical constant like the mean, variance, etc. which is calculated using the values
of all population units is called a parameter. A parameter is a value, usually unknown (and which
therefore has to be estimated), used to represent a certain population characteristic. Within a
population, a parameter is a fixed value which does not vary.
2. Statistic: A statistical measure which is calculated using the values of only the sample units is
called a statistic. It is used to give information about unknown values in the corresponding population.
It is possible to draw more than one sample from the same population, and the value of a statistic will
in general vary from sample to sample.
3. Estimator: An estimator is a function of the sample observations which is used to give
information about an unknown quantity (i.e. a parameter) in the population. Therefore, we can say that
an estimator is a statistic.
4. Sampling Distribution: The number of possible samples of size ‘n’ that can be drawn without
replacement from a finite population of size ‘N’ is ‘NCn’ = N! / [n! (N - n)!]. For each of these samples,
we can compute a statistic, say ‘t’ (e.g. mean, variance, etc.), which will in general vary from sample to
sample. The aggregate of the various values of the statistic may be grouped into a frequency distribution,
which is known as the sampling distribution of the statistic.
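The construction of a sampling distribution can be sketched for a very small population. The population values below (2, 4, 6, 8) and the sample size n = 2 are hypothetical and chosen only so that all NCn samples can be listed; the statistic used is the sample mean.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations
from math import comb

# Hypothetical population of N = 4 values and sample size n = 2.
population = [2, 4, 6, 8]
n = 2

# All possible samples of size n drawn without replacement (order ignored).
samples = list(combinations(population, n))
print("Number of samples:", len(samples), "| NCn =", comb(len(population), n))

# Statistic t = sample mean, computed exactly for every sample.
means = [Fraction(sum(s), n) for s in samples]

# Sampling distribution: each distinct value of t with its frequency.
sampling_distribution = Counter(means)
for value in sorted(sampling_distribution):
    print("t =", value, " frequency =", sampling_distribution[value])
```

The frequencies add up to NCn, and dividing each frequency by NCn gives the probability of each value of the statistic when every sample is equally likely.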
Bias and Unbiasedness: If the expected value of a statistic (estimator), say ‘t’, is equal to the
parameter θ, then ‘t’ is said to be an unbiased estimator of θ. That is, if E(t) = θ, i.e. if
E(statistic) = parameter, then ‘t’ is an unbiased estimator of θ. An estimator
‘t’ of the parameter θ is also denoted by 𝜃̂.
Therefore if E( 𝜃̂ ) = θ ⇒ E( 𝜃̂ ) - θ = 0 ⇒ E( 𝜃̂ - θ ) = 0, then 𝜃̂ is said to be an unbiased estimator
of θ.
If E( 𝜃̂ - θ ) ≠ 0, then 𝜃̂ is said to be a biased estimator of θ.
∴ Bias = E( 𝜃̂ - θ ) = E( 𝜃̂ ) - θ
The difference 𝜃̂ - θ is called the error.
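The definition of bias can be checked numerically by averaging an estimator over every possible sample. The sketch below assumes a hypothetical population (2, 4, 6, 8), samples of size n = 2 drawn without replacement with equal probability, and two estimators chosen for illustration: the sample mean (for the population mean) and the sample variance with divisor n (for the population variance with divisor N).

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical population and sample size.
population = [2, 4, 6, 8]
n = 2

def mean(values):
    """Arithmetic mean, kept exact with Fraction."""
    return Fraction(sum(values), len(values))

def variance_divisor_n(values):
    """Variance using the number of values as divisor."""
    m = mean(values)
    return sum((Fraction(v) - m) ** 2 for v in values) / len(values)

def expectation(statistic):
    """E(t): average of the statistic over all equally likely samples."""
    samples = list(combinations(population, n))
    return sum(statistic(s) for s in samples) / len(samples)

# Parameters computed from all population units.
theta_mean = mean(population)
theta_var = variance_divisor_n(population)

# Bias = E(estimator) - parameter.
bias_mean = expectation(mean) - theta_mean
bias_var = expectation(variance_divisor_n) - theta_var

print("Bias of sample mean:", bias_mean)
print("Bias of divisor-n sample variance:", bias_var)
```

For this population the sample mean has zero bias, while the divisor-n sample variance has a non-zero (negative) bias, so it is a biased estimator of the population variance as defined here.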
Mean square error: In statistics, the mean squared error or MSE of an estimator is one of
many ways to quantify the amount by which an estimator differs from the true value of the quantity
being estimated. MSE measures the average of the square of the "error." The error is the amount by
which the estimator differs from the quantity to be estimated. The difference occurs because of
randomness or because the estimator doesn't account for information that could produce a more
accurate estimate. The MSE of an estimator 𝜃̂ with respect to the estimated parameter θ is defined as
MSE( 𝜃̂ ) = E[ ( 𝜃̂ - θ )² ]
The MSE can be written as the sum of the variance and the squared bias of the estimator.
MSE( 𝜃̂ ) = V( 𝜃̂ ) + [ Bias( 𝜃̂ ) ]²
For an unbiased estimator, the MSE is the variance. Since MSE is an expectation, it is a
number, and not a random variable. It may be a function of the unknown parameter θ, but it does not
depend on any random quantities.
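The decomposition of the MSE into variance plus squared bias can be verified by direct enumeration. The sketch below is only an illustration under assumed inputs: a hypothetical population (2, 4, 6, 8), equally likely samples of size n = 2 drawn without replacement, and the divisor-n sample variance as a (biased) estimator of the population variance with divisor N.

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical population and sample size.
population = [2, 4, 6, 8]
n = 2

def variance_divisor_n(values):
    """Variance using the number of values as divisor (exact arithmetic)."""
    m = Fraction(sum(values), len(values))
    return sum((Fraction(v) - m) ** 2 for v in values) / len(values)

# Parameter theta: population variance.
theta = variance_divisor_n(population)

# Value of the estimator for every possible sample.
estimates = [variance_divisor_n(s) for s in combinations(population, n)]
k = len(estimates)

expected = sum(estimates) / k                            # E(estimator)
variance = sum((t - expected) ** 2 for t in estimates) / k   # V(estimator)
bias = expected - theta                                  # E(estimator) - theta
mse = sum((t - theta) ** 2 for t in estimates) / k       # E[(estimator - theta)^2]

print("V =", variance, " Bias =", bias, " MSE =", mse)
print("V + Bias^2 =", variance + bias ** 2)
```

Both printed quantities agree, and because the bias is not zero here the MSE is strictly larger than the variance.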