Lecture 2: Descriptive Statistics
Statistics: why and when?
Statistical toolkit
Example: measuring differences in wind
Measuring wind
Lecture 3: Explained variation
Prediction errors
Variation analysis
Eta2 = proportion explained variation
Linear regression
Testing the model: R^2
Lecture 4: Theory of estimates and testing
Population vs. sample
Characteristics probability distribution sample mean M
Standard error of the mean
Theory of estimates
T-value for M
Estimate
Interval estimate - Confidence interval (CI)
90% Confidence interval for µ if σ is known
90% Confidence interval for µ if σ is unknown
Testing hypotheses
Testing with exceedance probability
Testing with critical value
Testing with CI
Lecture 5: Comparing two groups
Type I and Type II error
Comparing two groups
2 groups of paired measurements
Independent groups
Sample-effect size
Lecture 6: Comparing two groups
Two-tailed vs one-tailed?
Analysis of Variance (ANOVA)
Planned comparisons 2 combinations
Lecture 7: ANOVA with controls
Repetition one-way ANOVA
Orthogonal contrasts
Polynomial contrast
Post-hoc comparisons
Control for other variable(s)
Decomposition 2-way ANOVA
2-way ANOVA - Follow up analysis
Lecture 2: Descriptive Statistics
Statistics: why and when?
- techniques for processing (large amounts of) data in different situations
- climate data (climate research) KNMI
- experimental data (treatment-control groups)
- survey data etc.
- less common in qualitative research
- open interviews result in data that is less structured, and less quantitative
Statistical toolkit
Lots of tools!
- different ways to measure
- different types of data
- different types of questions
- number of groups (1 or more)
- number of explanatory (independent) variables
- etc.
- per situation:
- what tool is most appropriate?
- how to use this tool?
- how to interpret the results?
- how to draw your conclusions?
Example: measuring differences in wind
- Are winds stronger at the coast, compared to the interior?
- problem: how to measure?
- at what height?
- using what instrument?
- using what scale?
- problem: how to deal with variability?
- many places
- many moments (days, months, seasons)
- many times of the day
Limitations of measurements
- coast = Den Helder
- interior = De Bilt
- measurement at every hour in both places
- number of measurements = 2 x 20 x 365 x 24 = 350,400
- by means of a sample you can try to detect differences and similarities between the coast (Den Helder) and the interior (De Bilt)
- this will not give the answer to the general question, since only two places are measured
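The measurement count above is a plain product. A minimal sketch of the arithmetic follows; the factor 20 has no unit in the notes and is assumed here to be a number of years.

```python
# Number of hourly wind measurements at the two stations.
n_places = 2          # Den Helder (coast) and De Bilt (interior)
n_years = 20          # assumption: the notes give 20 without a unit
days_per_year = 365   # leap days are ignored, as in the notes
hours_per_day = 24    # one measurement every hour

n_measurements = n_places * n_years * days_per_year * hours_per_day
print(n_measurements)  # 350400
```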
Statistical techniques
1. describe / summarize the data pertaining to the two groups
- tables, graphs, metrics = draw your conclusions regarding similarities / differences
= descriptive statistics
2. can you generalize the findings for the sample to your population?
- is the observed difference more than a coincidence? (is it statistically significant?)
- what is the estimated size of the difference between the populations?
= inductive (inferential) statistics
Measuring wind
Measurement 1: Beaufort scale
- from 0 to 12 Bft
- 0: smoke rises straight up
- 6: difficult to hold on to your umbrella
- 9: roof tiles are blown away, small children can hardly stay upright
- higher score = stronger wind
- level of measurement = ordinal (intervals between numbers are not equal)
Measurement 2: Wind velocity in m/s or km/h
- scale from 0 to infinity (in practice up to about 50 m/s or 200 km/h)
- similar intervals on the scale indicate similar difference in wind velocity
- level of measurement = interval
- absolute zero is meaningful in this case
- a score that is p times as high, indicates a wind velocity that is p times as high
- level of measurement = ratio
Compare measurements:
- km/h scale = similar intervals represent similar differences in wind strength
- Beaufort scale = order is correct, but differences between higher scale values are much larger than differences between lower scale values
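The unequal steps of the Beaufort scale can be made visible with the commonly used empirical relation v = 0.836 x B^(3/2), with v in m/s and B the Beaufort number. This relation is not part of these notes; it is brought in only as an illustration, and the function name `beaufort_to_ms` is hypothetical.

```python
def beaufort_to_ms(b: int) -> float:
    """Approximate wind speed in m/s for Beaufort number b (empirical relation)."""
    return 0.836 * b ** 1.5


speeds = [beaufort_to_ms(b) for b in range(13)]           # 0 to 12 Bft
steps = [hi - lo for lo, hi in zip(speeds, speeds[1:])]    # m/s gained per Bft step

for b, step in enumerate(steps):
    print(f"{b} -> {b + 1} Bft: +{step:.2f} m/s")

# Ratio check: 8 Bft is not simply "twice as fast" as 4 Bft.
ratio = beaufort_to_ms(8) / beaufort_to_ms(4)
print(f"speed at 8 Bft / speed at 4 Bft = {ratio:.2f}")
```

Under this relation every step up the scale adds more m/s than the previous one, which is why Beaufort numbers are treated as ordinal, while m/s and km/h support statements such as "p times as high".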
Measurement 3: used for windsurfing
- 0 = too strong to windsurf
- 1 = too weak to windsurf
- 2 = good for novice surfers
- 3 = good for experienced surfers
- 4 = what Dorian van Rijsselberghe likes
- order of scores not congruent with order in strength of wind
- level of measurement = nominal
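Because the windsurf codes are only labels, counting how often each code occurs (and reporting the most frequent one) is meaningful, while averaging the codes is not. A small sketch with made-up observations; the list `observations` is hypothetical data, not taken from the lecture.

```python
from collections import Counter

labels = {
    0: "too strong to windsurf",
    1: "too weak to windsurf",
    2: "good for novice surfers",
    3: "good for experienced surfers",
    4: "what Dorian van Rijsselberghe likes",
}

# Hypothetical codes for ten days (illustration only).
observations = [1, 1, 2, 2, 2, 3, 0, 4, 2, 1]

# Frequencies and the mode are valid summaries for nominal data.
counts = Counter(observations)
mode_code, mode_count = counts.most_common(1)[0]
print(f"most frequent: {mode_code} = {labels[mode_code]} ({mode_count} days)")

# The arithmetic mean can be computed, but it has no interpretation:
# code 0 stands for wind that is too strong, yet it pulls the "average" down.
meaningless_mean = sum(observations) / len(observations)
print(f"mean of the codes = {meaningless_mean} (not interpretable)")
```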
Data matrix
- represent scores in a spreadsheet
- column = variable (a characteristic that is measured)
- row = case or observation, with its scores on the variables
- Frequency tables
- make different classes
- Bar chart
- graphic representation of the frequency table
- polygons
- connected lines (continuous phenomena)
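A frequency table groups the scores into classes and counts the cases in each class; a bar chart then draws one bar per class. A minimal sketch with made-up wind speeds follows. The data, the class width of 5 m/s, and the text-based bars are all assumptions for illustration.

```python
from collections import Counter

# Hypothetical hourly wind speeds in m/s (illustration only).
wind_speeds = [2.1, 3.4, 4.8, 5.0, 6.2, 7.7, 8.1, 9.9, 10.3, 12.5, 3.3, 6.8]
class_width = 5  # assumed width of each class in m/s

# Assign each score to the lower bound of its class: 0, 5, 10, ...
class_of = [int(v // class_width) * class_width for v in wind_speeds]
frequency = Counter(class_of)

# Print the frequency table with a simple text bar per class.
for lower in sorted(frequency):
    upper = lower + class_width
    n = frequency[lower]
    print(f"{lower:>2} to <{upper:<2} m/s | {n:>2} | {'#' * n}")
```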
1. Cumulative distribution
Difference measure Change = Max CP
2. differences between centres relative to distribution
- difference means
Statistical toolbox
- mean
- dispersion
- variance
- standard deviation
Example: 2 movies, both movies are graded by a group of 5 friends
Scores:
- movie 1: 9, 6, 7.5, 5.5, 9.5
- movie 2: 7, 8, 8.5, 6.5, 7.5
Calculate mean = sum of the scores / number of scores given
- Both give 7.5
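The mean calculation can be checked with a few lines of Python. Movie 1 is taken here as the five scores 9, 6, 7.5, 5.5, 9.5, an assumed reading chosen because it matches five graders and a mean of 7.5; the helper name `mean` is only for illustration.

```python
movie_1 = [9, 6, 7.5, 5.5, 9.5]   # assumed reading of the movie 1 scores
movie_2 = [7, 8, 8.5, 6.5, 7.5]


def mean(scores):
    """Sum of the scores divided by the number of scores."""
    return sum(scores) / len(scores)


mean_1 = mean(movie_1)
mean_2 = mean(movie_2)
print(mean_1, mean_2)  # 7.5 7.5
```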
Calculate dispersion = deviation of the individual observations from the mean
- dev = x - mean
- Score of 9, so 9 - 7.5 = 1.5
- summarize as: mean deviation (always 0, because positive and negative deviations cancel), mean absolute deviation, or mean squared deviation
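The three ways of summarizing the deviations can be compared directly. A sketch for movie 1, again assuming its five scores are 9, 6, 7.5, 5.5, 9.5 (the reading consistent with a mean of 7.5); the variable names are illustrative.

```python
movie_1 = [9, 6, 7.5, 5.5, 9.5]   # assumed reading of the movie 1 scores
n = len(movie_1)
m = sum(movie_1) / n              # mean = 7.5

# Deviation of each score from the mean: [1.5, -1.5, 0.0, -2.0, 2.0]
deviations = [x - m for x in movie_1]

# Plain mean deviation: positive and negative deviations cancel out.
mean_deviation = sum(deviations) / n

# Two ways to stop the cancelling: absolute values or squares.
mean_abs_deviation = sum(abs(d) for d in deviations) / n
mean_sq_deviation = sum(d ** 2 for d in deviations) / n

print(deviations)
print(mean_deviation, mean_abs_deviation, mean_sq_deviation)
```

The plain mean deviation is zero for any data set, so only the absolute or squared versions say something about spread.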
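Calculate dispersion (variance): the sum of all squared deviations is the variation (SS); dividing by the degrees of freedom gives the variance of a sample, s^2
- Variance = s^2 = SS/df
SS = sum of squares (sum of the squared deviations from the mean)
df = degrees of freedom = number of deviations that are free to vary (n - 1 for a sample of n scores)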
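As a worked check of s^2 = SS/df, the sketch below computes the sample variance of the movie 2 scores step by step and compares it with Python's `statistics.variance`, which also divides by n - 1. Taking df = n - 1 is the usual choice for a sample; the variable names are illustrative.

```python
import math
import statistics

movie_2 = [7, 8, 8.5, 6.5, 7.5]
n = len(movie_2)
mean = sum(movie_2) / n                        # 7.5

ss = sum((x - mean) ** 2 for x in movie_2)     # sum of squares = 2.5
df = n - 1                                     # degrees of freedom = 4
variance = ss / df                             # s^2 = 0.625

print(ss, df, variance)
# Cross-check against the standard library's sample variance.
print(math.isclose(variance, statistics.variance(movie_2)))
```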
Dispersion: variance
- variance is a measure for the dispersion of the data
- roughly the average of the squared deviations from the mean (SS is divided by df rather than by n)
- squaring makes each term positive so that values above the mean do not cancel values below the mean
- gives you a very general idea of the spread of your data
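The movie example shows why a measure of spread is needed next to the mean: both movies average 7.5, but the grades for one are far more spread out. A sketch using Python's `statistics` module; movie 1 is again assumed to be 9, 6, 7.5, 5.5, 9.5, and the standard deviation (the square root of the variance) is added because it appears in the toolbox list.

```python
import statistics

movies = {
    "movie 1": [9, 6, 7.5, 5.5, 9.5],   # assumed reading of the movie 1 scores
    "movie 2": [7, 8, 8.5, 6.5, 7.5],
}

summary = {}
for name, scores in movies.items():
    m = statistics.mean(scores)
    s2 = statistics.variance(scores)    # sample variance, SS / (n - 1)
    s = statistics.stdev(scores)        # standard deviation = sqrt(s2)
    summary[name] = (m, s2, s)
    print(f"{name}: mean = {m}, s^2 = {s2}, s = {s:.2f}")
```

With these scores the means are equal, while the variance of movie 1 is five times that of movie 2.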