Lecture 1
Population: the group that you wish to describe (firms, people, …)
→ The entire set of elements
Sample: the group for which you have data
→ A subset of elements from the population, taken with the intention of making inferences about the
population
Describing the whole population is often:
too expensive
impossible
destructive: sampling might be destructive (physical geography), or people start thinking about
things differently because of the questions you asked
impractical
unnecessary
Parameter: numerical property of a population
Statistic: numerical property of a sample
Sampling error → the difference between the value of a parameter and the statistic computed to
estimate that parameter. Result of:
Variability (chance)
→ reduce by increasing the sample size n
Sampling Bias (e.g. a sample of office buildings for a population of houses)
→ reduce through the design of the sampling procedure
Statistics 1 1
Nonsampling Error (sloppy research process)
Validity, accuracy and precision of variables
Prevent coding errors
Prevent interpretation errors
Also: good labelling, metadata
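A minimal sketch of the chance component of sampling error, assuming a made-up population of 10,000 incomes (the values, the seed and the helper name mean_abs_sampling_error are illustrative only). It compares the average size of the sampling error for a small and a large n:

```python
import random
import statistics

random.seed(1)  # fixed seed so the sketch is reproducible

# Hypothetical population: 10,000 incomes (made-up values).
population = [random.gauss(30000, 8000) for _ in range(10000)]
mu = statistics.fmean(population)  # parameter: numerical property of the population


def mean_abs_sampling_error(n, repeats=200):
    """Average |statistic - parameter| over repeated random samples of size n."""
    errors = []
    for _ in range(repeats):
        sample = random.sample(population, n)  # simple random sample, no bias
        x_bar = statistics.fmean(sample)       # statistic: numerical property of the sample
        errors.append(abs(x_bar - mu))
    return statistics.fmean(errors)


error_small_n = mean_abs_sampling_error(10)
error_large_n = mean_abs_sampling_error(1000)
print(f"mean |sampling error| for n=10:   {error_small_n:.0f}")
print(f"mean |sampling error| for n=1000: {error_large_n:.0f}")
```

Because the samples are drawn at random, only chance variability is at work here, so the error shrinks as n grows. Sampling bias and nonsampling error would not shrink this way.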
Variability: The phenomenon whereby repeated sampling from the same population results in
different values for the statistics
Sampling distribution: describes how the statistic varies when sampling is repeated. In other words:
it describes the extent of variability → This is the basis for inference
Central Limit Theorem:
even if a variable X is not normally distributed in the population, we may assume that (under certain
conditions, such as a large number of cases n and a fixed, finite standard deviation σ)
→ the Sampling Distribution of the mean is approximately normal with standard error σ/√n
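The theorem can be checked with a small simulation. This is a sketch under assumptions: the population is taken to be exponential with rate 1 (clearly not normal, with mean 1 and σ = 1), and n = 50 and 2000 repetitions are arbitrary choices.

```python
import math
import random
import statistics

random.seed(42)  # fixed seed so the sketch is reproducible

n = 50          # cases per sample
repeats = 2000  # number of repeated samples
sigma = 1.0     # SD of an exponential distribution with rate 1 (its mean is also 1)

# Draw repeated samples from the skewed population and keep each sample mean.
sample_means = [
    statistics.fmean([random.expovariate(1.0) for _ in range(n)])
    for _ in range(repeats)
]

standard_error = sigma / math.sqrt(n)         # theoretical standard error: sigma / sqrt(n)
observed_sd = statistics.stdev(sample_means)  # spread of the simulated sampling distribution

# In a normal distribution about 95% of values lie within 1.96 standard errors of the mean.
share_within = sum(abs(m - 1.0) <= 1.96 * standard_error for m in sample_means) / repeats

print(f"theoretical standard error: {standard_error:.4f}")
print(f"SD of the sample means:     {observed_sd:.4f}")
print(f"share within 1.96 SE:       {share_within:.3f}")
```

If the approximation holds, the SD of the sample means should be close to σ/√n and the share within 1.96 standard errors close to 0.95.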
Sampling Bias: the result of procedures which favor the inclusion, in your sample, of
elements from the population with certain characteristics.
Imagine you survey 50 people in the Grote Markt over a weekend in December about the
atmosphere in the city centre around Christmas time. The sample will contain many tourists and
people on vacation; people who are in the city centre tend to like the atmosphere, while people who
don't like it won't be there; and some people are more willing to participate than others.
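The survey example can be mimicked in a short simulation. This is a sketch only: the ratings are made up, and the rule that a resident with rating r is in the city centre with probability r / 10 is an assumption chosen to make the bias visible.

```python
import random
import statistics

random.seed(7)  # fixed seed so the sketch is reproducible

# Hypothetical population: each resident rates the atmosphere from 1 (dislike) to 10 (love).
population = [random.randint(1, 10) for _ in range(20000)]
true_mean = statistics.fmean(population)  # parameter

# Assumption: a resident with rating r is present in the city centre with probability r / 10,
# so people who dislike the atmosphere are mostly absent.
present = [r for r in population if random.random() < r / 10]
present_mean = statistics.fmean(present)

biased_sample = random.sample(present, 50)     # survey of 50 people on the square
random_sample = random.sample(population, 50)  # simple random sample of all residents

biased_mean = statistics.fmean(biased_sample)
random_mean = statistics.fmean(random_sample)

print(f"population mean:       {true_mean:.2f}")
print(f"mean of people present: {present_mean:.2f}")
print(f"survey on the square:   {biased_mean:.2f}")
print(f"simple random sample:   {random_mean:.2f}")
```

Increasing the number of people surveyed on the square would not remove the gap, because the procedure itself favors people who like the atmosphere.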
→ Sources of Sampling Bias: (a combination of) the
population
researcher (unintentionally approaching only certain groups)
research design
research topic
respondent
→ May result in:
incomplete coverage: relevant elements not in sampling frame
nonresponse: refusal or missing data
Sampling: Steps 1-5: all about the reduction of Sampling Bias
Processing of data
How to deal with nonresponse
Distinguish:
Choice of respondent: can still be regarded as a value
- “no opinion” still informs about the respondent's opinion
- “don’t know” still informs about the reason for nonresponse
Other causes:
- “no answer” does not inform about the position of the respondent
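One possible way to apply this distinction when coding data, as a sketch with made-up answers (the answer labels and the name MISSING_CODES are illustrative):

```python
from collections import Counter

# Hypothetical raw answers to one opinion question.
raw_answers = ["agree", "no opinion", "disagree", "no answer",
               "don't know", "agree", "no answer"]

# "no opinion" and "don't know" are the respondent's own choice: keep them as values.
# "no answer" has another cause and says nothing about the respondent: code it as missing.
MISSING_CODES = {"no answer"}

coded = [None if answer in MISSING_CODES else answer for answer in raw_answers]

valid = [answer for answer in coded if answer is not None]
counts = Counter(valid)        # frequency table of the valid values
n_missing = coded.count(None)  # number of missing values

print(counts)
print("missing:", n_missing)
```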
Qualitative: Non-numerical values
Quantitative: Numerical values (counts, measurements)
Discrete: the set of possible values is limited (e.g. counts)
Continuous: intermediate values are also possible
Measurement levels (Typology: Stevens (1946))
Nominal
Categorical, no ranking → no universal ordering (e.g. the Köppen climate system: there is no
single way to order the classes, several orderings are possible)
Measure of central tendency: mode
Ordinal
Categorical, ranked
Degrees of a certain phenomenon
Width of intervals unknown
can't calculate the mean, because the width of the intervals is unknown
so use the median (can't use a Z-test, because it is based on the mean, not the median)
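A small sketch of the measures that fit the first two levels, using made-up data (the station classes and the five-point scale are illustrative assumptions):

```python
import statistics

# Nominal: hypothetical Köppen main groups of six weather stations.
# There is no ranking, so only the mode is meaningful.
climate = ["C", "D", "C", "B", "C", "A"]
most_common = statistics.mode(climate)

# Ordinal: hypothetical answers on a ranked scale. The order is known, the distances are not.
scale = ["very poor", "poor", "neutral", "good", "very good"]
answers = ["good", "poor", "neutral", "good", "very good", "neutral", "good"]

# Replace each answer by its position in the ranking and take the median position.
ranks = sorted(scale.index(answer) for answer in answers)
median_answer = scale[statistics.median_low(ranks)]  # median_low returns an observed value

print("mode of climate groups:", most_common)
print("median answer:", median_answer)
```

The positions 0 to 4 are used only for ordering; averaging them would assume equal interval widths, which an ordinal scale does not provide.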
Interval
Width of intervals known (= equidistance)
We can compute differences
no true zero point (e.g. 0 °C does not mean "no temperature", so ratios are not meaningful)
Ratio