(English & Dutch)
WUR (MAT-15303)
(Statistical) research in general:
- Research question
- Population of interest (doelgroep)
- Sample: the part of the population that we will study
[Figure: diagram with the labels population, sample, units]
Population: every member of a group (persons, objects, etc.) about which we would like to collect
information.
Sample: part of the population that we will study and collect information from.
Units: the elements of a sample which we are gathering information from.
Variable: measured property of an element of the sample.
There are two kinds of variables
- Quantitative variable (continuous/discrete)
- Qualitative variable (nominal/ordinal)
Numbers? -> Quantitative
- Measure? -> continuous
- Count? -> discrete
Categories? -> Qualitative
- No order? -> nominal
- Order? -> ordinal
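The decision scheme above can be sketched as a small function. This is only an illustration: the function name, its parameters, and the four example variables are assumptions, not part of the course material.

```python
def variable_type(is_number, is_measured=False, is_ordered=False):
    """Classify a variable by following the questions in the scheme above.

    is_number   : the values are numbers (otherwise they are categories)
    is_measured : for numbers, the value is measured (otherwise counted)
    is_ordered  : for categories, the categories have a natural order
    """
    if is_number:
        return "quantitative, continuous" if is_measured else "quantitative, discrete"
    return "qualitative, ordinal" if is_ordered else "qualitative, nominal"


# Hypothetical example variables
height_cm = variable_type(is_number=True, is_measured=True)
number_of_siblings = variable_type(is_number=True, is_measured=False)
eye_colour = variable_type(is_number=False, is_ordered=False)
education_level = variable_type(is_number=False, is_ordered=True)

print(height_cm)
print(number_of_siblings)
print(eye_colour)
print(education_level)
```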
Drawing a sample from a population:
- Sampling bias: certain parts of the population might be overrepresented compared to
other parts.
Good/recommended way of sampling: SRS
- Simple Random Sampling (SRS): in SRS, units are drawn at random from a population.
Every sample (of a certain size) has an equal chance of being selected (and every unit in the
population has the same chance of being selected into the sample).
SRS avoids sampling bias.
No preselection is made; as soon as a preselection is made, it is no longer SRS.
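A minimal sketch of drawing an SRS in Python, assuming the population can be listed as numbered units. The population of 100 units, the sample size of 10, and the helper name `simple_random_sample` are made up for illustration.

```python
import random


def simple_random_sample(population, size, seed=None):
    """Draw `size` distinct units at random, without any preselection."""
    rng = random.Random(seed)  # a seed only makes the draw reproducible
    return rng.sample(population, size)


# Hypothetical population: 100 units numbered 1 to 100
population = list(range(1, 101))
sample = simple_random_sample(population, size=10, seed=1)
print(sample)
```

`random.sample` draws without replacement, so no unit can appear twice in the sample.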
Important note:
- Undersampling: certain groups are excluded from the sample.
- Non-response: selected units do not participate or are not successfully contacted.
- Voluntary participation (in a survey): might result in particularly positive or negative
answers.
- Response bias: social desirability bias (self-reported personal traits, questions about
income).
Observational vs. experimental research
- Observational study: observe the unit/process without influencing it.
- Experimental study: apply a treatment to the unit in order to observe a reaction.
- A cause-effect relationship can only be concluded from an experimental study.
Measures of central tendency (centrummaat)
Mean (gemiddelde)
Example:
4 7 3 9 6   mean = (4+7+3+9+6)/5 = 29/5 = 5.8
n = number of … (sample size); here n = 5
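The mean example can be checked with a few lines of Python. This is only a sketch using the five values from the example; the variable names are chosen for illustration.

```python
import statistics

data = [4, 7, 3, 9, 6]
n = len(data)               # sample size
mean_by_hand = sum(data) / n
mean_by_library = statistics.mean(data)

print(n, mean_by_hand, mean_by_library)
```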
Median M (mediaan)
- Order the data from smallest to largest
Example:
Odd number of data
- 4 7 3 9 6 in increasing order
3 4 6 7 9   M = 6
Even number of data
- 4 7 3 9 6 5 in increasing order
3 4 5 6 7 9 M = (5+6) / 2 = 5.5
The median is not sensitive to outliers, the mean is very sensitive to outliers!
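The two median examples, and the difference in outlier sensitivity, can be reproduced in Python. A small sketch: the outlier value 90 is made up for illustration and does not come from the notes.

```python
import statistics

odd_data = [4, 7, 3, 9, 6]
even_data = [4, 7, 3, 9, 6, 5]

median_odd = statistics.median(odd_data)    # middle value of 3 4 6 7 9
median_even = statistics.median(even_data)  # average of the two middle values 5 and 6

# Replace the largest value by a hypothetical outlier
with_outlier = [4, 7, 3, 90, 6]
mean_outlier = statistics.mean(with_outlier)
median_outlier = statistics.median(with_outlier)

print(median_odd, median_even)
print(mean_outlier, median_outlier)
```

With the outlier the mean moves from 5.8 to 22, while the median stays at 6.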
Measures of variability (spreidingsmaat)
- Minimum
- Maximum
- Range = maximum - minimum
- Standard deviation (sd, standaardafwijking) s = √(variance)
The standard deviation indicates how far, on average, an observation lies from the mean.
So s = standard deviation
s² = variance
- Interquartile range IQR = Q3 – Q1
Q1 = 1st quartile = 25th percentile = lower quartile
Q3 = 3rd quartile = 75th percentile = upper quartile
The interquartile range is not sensitive to outliers, the variance is sensitive to outliers!
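The measures above can be computed as follows. This sketch rests on two assumptions the notes do not spell out: the variance is the sample variance (dividing by n - 1, as `statistics.variance` does), and Q1 and Q3 are taken as the medians of the lower and upper half of the ordered data. Other quartile conventions exist and can give slightly different values. The helper name `quartiles` and the outlier value 90 are illustrative.

```python
import math
import statistics


def quartiles(values):
    """Return (Q1, Q3) as the medians of the lower and upper half.

    Assumed convention; for an odd number of values the middle value
    is left out of both halves. Needs at least two values.
    """
    ordered = sorted(values)
    half = len(ordered) // 2
    return statistics.median(ordered[:half]), statistics.median(ordered[-half:])


data = [3, 4, 5, 6, 7, 9]

data_range = max(data) - min(data)
variance = statistics.variance(data)  # sample variance s^2 (divides by n - 1)
sd = math.sqrt(variance)              # s = sqrt(variance)
q1, q3 = quartiles(data)
iqr = q3 - q1

# Same data with the largest value replaced by a hypothetical outlier
outlier_data = [3, 4, 5, 6, 7, 90]
variance_outlier = statistics.variance(outlier_data)
q1_out, q3_out = quartiles(outlier_data)
iqr_outlier = q3_out - q1_out

print(data_range, variance, sd, iqr)
print(variance_outlier, iqr_outlier)
```

The IQR is the same for both data sets, while the variance increases sharply once the outlier is included.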