Statistics 2
- Tutorial 1
Are observations normally distributed?
Q-Q plot = quantile-quantile plot
- Tutorial 2
A method that also shows the uncertainty of the estimated mean: the confidence interval.
Biased: low accuracy (systematically off target; not to be confused with precision)
Unbiased: high accuracy
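$\bar{y}$ is an unbiased estimator for $\mu_y$, because $\mu_{\bar{y}} = \mu_y$.
$\sigma^2_{\bar{y}} = \frac{\sigma^2_y}{n}$, so $\sigma_{\bar{y}} = \frac{\sigma_y}{\sqrt{n}}$
$\bar{y}$ is a consistent estimator for $\mu_y$ (the larger the sample, the closer we tend to get to the unknown true value $\mu_y$).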
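The two properties above can be illustrated with a small simulation. This is only a sketch: the population values (mu = 10, sigma = 2), the sample size and the number of repetitions are hypothetical choices, not values from the course.

```python
import math
import random
import statistics

def simulate_sample_means(mu, sigma, n, repetitions, seed=0):
    """Draw `repetitions` samples of size n from N(mu, sigma) and return their means."""
    rng = random.Random(seed)
    return [
        statistics.fmean(rng.gauss(mu, sigma) for _ in range(n))
        for _ in range(repetitions)
    ]

# Hypothetical population and sample size, chosen only for illustration.
mu, sigma, n = 10.0, 2.0, 25
means = simulate_sample_means(mu, sigma, n, repetitions=20_000)

# Unbiased: the average of the sample means should be close to mu.
print("mean of sample means:", round(statistics.fmean(means), 3))
# Spread of the sample means should be close to sigma / sqrt(n).
print("sd of sample means:  ", round(statistics.pstdev(means), 3))
print("sigma / sqrt(n):     ", sigma / math.sqrt(n))
```

Increasing n in this sketch makes the sample means cluster more tightly around mu, which is what consistency means in practice.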
The outcome of $\bar{y}$ is an estimate of $\mu_y$. How precise this estimate is, is indicated by the confidence interval (CI). A CI has the form estimator ± error margin. The confidence coefficient (1 − α) reflects the degree of trust. E.g. 1 − α = 0.95 means that in 95% of the cases the procedure yields a correct statement (an interval that contains μ).
For a normal distribution, the interval $(\mu - 2\sigma,\ \mu + 2\sigma)$ contains about 95% of the observations.
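$\bar{y} \sim N(\mu_{\bar{y}}, \sigma_{\bar{y}}) = N\left(\mu, \frac{\sigma_y}{\sqrt{n}}\right) \rightarrow z = \frac{\bar{y} - \mu}{\sigma_y / \sqrt{n}}$
Limits of the (1 − α) × 100% confidence interval for μ: $\bar{y} \pm \frac{z_{\alpha/2}\,\sigma_y}{\sqrt{n}}$.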
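A minimal sketch of this interval for the case where σ is known. The function name `z_confidence_interval` and the input numbers are hypothetical; the critical value comes from Python's standard-library `statistics.NormalDist`.

```python
import math
from statistics import NormalDist

def z_confidence_interval(sample_mean, sigma, n, confidence=0.95):
    """Return (lower, upper) limits of the CI for mu when sigma is known."""
    alpha = 1.0 - confidence
    # z_{alpha/2}: value with right-tail probability alpha/2 under N(0,1).
    z_crit = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    margin = z_crit * sigma / math.sqrt(n)
    return sample_mean - margin, sample_mean + margin

# Hypothetical numbers: sample mean 10, known sigma 2, n = 16.
lower, upper = z_confidence_interval(sample_mean=10.0, sigma=2.0, n=16)
print(f"95% CI for mu: ({lower:.3f}, {upper:.3f})")  # (9.020, 10.980)
```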
Computing the confidence interval:
1. Decide whether to use t or z:
   $\bar{y} \pm \frac{z_{\alpha/2}\,\sigma}{\sqrt{n}}$ or $\bar{y} \pm \frac{t_{\alpha/2,\,n-1}\,s}{\sqrt{n}}$
2. Fill in the formula.
3. Subtract the error margin from, and add it to, the estimator (estimator $\bar{y}$ ± error margin).
4. …% confidence interval for μ: (results from step 3).
5. In other words: in …% of all possible samples the confidence interval based on the sample will contain the population mean … μ.
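As an illustration of the five steps, here is a worked example with made-up numbers (n = 10, sample mean 50, s = 5, 95% confidence). Since σ is unknown in this hypothetical case, t is used; the critical value 2.262 is the usual table value for df = 9 and right-tail probability 0.025.

```latex
% Hypothetical sample: n = 10, \bar{y} = 50, s = 5, confidence level 95% (\alpha = 0.05)
% Step 1: \sigma is unknown, so use t with df = n - 1 = 9; table value t_{0.025,\,9} = 2.262
\begin{align*}
\text{error margin} &= t_{\alpha/2,\,n-1}\cdot\frac{s}{\sqrt{n}}
                     = 2.262 \cdot \frac{5}{\sqrt{10}} \approx 3.58 \\
\bar{y} \pm \text{error margin} &= 50 \pm 3.58 \\
95\%\ \text{CI for } \mu &= (46.42,\ 53.58)
\end{align*}
```

Step 5 for this example: in 95% of all possible samples, an interval built this way will contain the population mean μ.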
We estimate σ by using the sample standard deviation s, the square root of the sample variance $s^2$.
$s^2 = \frac{\sum_{i=1}^{n} (y_i - \bar{y})^2}{n - 1}$
We estimate the standard deviation of the mean:
$\hat{\sigma}_{\bar{y}} = \frac{\hat{\sigma}_y}{\sqrt{n}} = \frac{s}{\sqrt{n}}$
The standard deviation of the sample mean is a measure of the precision of the sample mean $\bar{y}$ as an estimator for the population mean μ. That is why we call it the standard error of the sample mean, denoted $SE(\bar{y}) = \frac{s}{\sqrt{n}}$.
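A short sketch of how s and the standard error could be computed from a sample. The observations are hypothetical; `statistics.stdev` divides by n − 1, matching the sample variance used in these notes.

```python
import math
import statistics

# Hypothetical observations, chosen only for illustration.
sample = [2, 4, 4, 4, 5, 5, 7, 9]

n = len(sample)
y_bar = statistics.mean(sample)   # sample mean
s = statistics.stdev(sample)      # sample standard deviation (divides by n - 1)
se = s / math.sqrt(n)             # standard error of the sample mean

print(f"n = {n}, mean = {y_bar}, s = {s:.4f}, SE = {se:.4f}")
```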
When σ is unknown and estimated by s, the standard normal distribution is replaced by the t-distribution with a certain number of degrees of freedom (df).
df = ∞ → standard normal distribution
σ known                                             | σ unknown
100(1 − α)% CI for μ equals:                        | 100(1 − α)% CI for μ equals:
$\bar{y} \pm \frac{z_{\alpha/2}\,\sigma}{\sqrt{n}}$ | $\bar{y} \pm \frac{t_{\alpha/2}\,s}{\sqrt{n}}$
$z_{\alpha/2}$ from the N(0,1) distribution         | $t_{\alpha/2}$ from the t(n − 1) distribution
(df = ∞, right-tail p = α/2)                        | (df = n − 1, right-tail p = α/2)
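The relation between the two columns (the t critical value approaches the z critical value as df grows) can be checked numerically. The sketch below uses only the standard library, so the t-distribution is evaluated by numerical integration and bisection; the helper names `t_pdf`, `t_cdf` and `t_critical` are hypothetical, and in practice one would normally read these values from a t-table or a statistics package.

```python
import math
from statistics import NormalDist

def t_pdf(t, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(t * t / df))

def t_cdf(t, df, steps=1000):
    """P(T <= t), using Simpson's rule on [0, |t|] and symmetry around 0."""
    if t == 0:
        return 0.5
    a = abs(t)
    h = a / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if t > 0 else 0.5 - area

def t_critical(alpha, df):
    """Value with right-tail probability alpha/2 under t(df), by bisection."""
    target = 1 - alpha / 2
    lo, hi = 0.0, 1.0
    while t_cdf(hi, df) < target:  # widen the bracket until it contains the answer
        hi *= 2
    for _ in range(50):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

alpha = 0.05
z_crit = NormalDist().inv_cdf(1 - alpha / 2)
for df in (4, 9, 29, 1000):
    print(f"df = {df:4d}: t critical = {t_critical(alpha, df):.3f}")
print(f"df = inf : z critical = {z_crit:.3f}")
```

For small samples the t critical value is clearly larger than the z value, so the interval based on s is wider than the one based on a known σ.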
! note: s = the standard deviation computed from your sample. σ = the standard deviation of the whole population.
When constructing a confidence interval we assume a normal distribution for the response variable y. Check whether this assumption is reasonable in practice with a Q-Q plot. Observations must be approximately normally distributed. Additionally, the observations have to be mutually independent (as if from a random sample).
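A sketch of how the points of a normal Q-Q plot could be computed by hand. The data are hypothetical, and the plotting positions (i − 0.5)/n are one common convention, assumed here; other conventions exist. If the observations are approximately normal, the points lie roughly on a straight line.

```python
from statistics import NormalDist

def normal_qq_points(observations):
    """Return (theoretical quantile, observed value) pairs for a normal Q-Q plot."""
    ordered = sorted(observations)
    n = len(ordered)
    # Assumed plotting positions: (i - 0.5) / n for i = 1..n.
    theoretical = [NormalDist().inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(theoretical, ordered))

# Hypothetical observations, chosen only for illustration.
sample = [4.8, 5.1, 5.6, 4.4, 5.0, 5.3, 4.9, 5.2, 4.7, 5.5]
points = normal_qq_points(sample)

for q, y in points:
    print(f"theoretical quantile {q:6.3f}   observed {y}")
```

The pairs can be passed to any plotting tool, with the theoretical quantiles on the horizontal axis and the observed values on the vertical axis.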
- Tutorial 3
Define parameters and specify sequentially:
1. Null hypothesis H0 and alternative hypothesis Ha
2. The Test Statistic as a formula, fill in allowed parts
3. The probability distribution of the T.S. under H0 (when H0 is true)
4. The behavior of the T.S. under Ha (under Ha the T.S. tends to higher / lower / higher or lower values than under H0)
5. The type of P-value (right-, left-, two-sided)
! note: determine steps 1 to 5 prior to the experiment !
6. The outcome of the T.S.
7. There are two options: conclusion with P-value or with Rejection Region (R.R.)
a. The appropriate P-value
b. The rejection region (R.R.)
8. The conclusion (also in non-statistical terms)
a. P-value ≤ α → reject H0, Ha has been shown
   P-value > α → do not reject H0, Ha has not been shown
b. Outcome T.S. in R.R. → reject H0, Ha has been shown
   Outcome T.S. not in R.R. → do not reject H0, Ha has not been shown
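Steps 5, 7 and 8 can be sketched for the case where the T.S. follows the standard normal distribution under H0 (an assumption made here for simplicity; for a t-distributed T.S. the same logic applies with t-values). The function names and the outcome 2.4 are hypothetical.

```python
from statistics import NormalDist

def p_value(ts_outcome, kind):
    """P-value for a T.S. that is N(0,1) under H0; kind is 'right', 'left' or 'two-sided'."""
    cdf = NormalDist().cdf(ts_outcome)
    if kind == "right":
        return 1 - cdf
    if kind == "left":
        return cdf
    if kind == "two-sided":
        return 2 * min(cdf, 1 - cdf)
    raise ValueError("kind must be 'right', 'left' or 'two-sided'")

def in_rejection_region(ts_outcome, kind, alpha):
    """True if the T.S. outcome lies in the rejection region of size alpha."""
    z = NormalDist()
    if kind == "right":
        return ts_outcome >= z.inv_cdf(1 - alpha)
    if kind == "left":
        return ts_outcome <= z.inv_cdf(alpha)
    if kind == "two-sided":
        return abs(ts_outcome) >= z.inv_cdf(1 - alpha / 2)
    raise ValueError("kind must be 'right', 'left' or 'two-sided'")

alpha = 0.05
z_outcome = 2.4  # hypothetical outcome of the T.S.

p = p_value(z_outcome, "two-sided")
reject_by_p = p <= alpha
reject_by_rr = in_rejection_region(z_outcome, "two-sided", alpha)
print(f"P-value = {p:.4f}, reject H0 (P-value): {reject_by_p}, reject H0 (R.R.): {reject_by_rr}")
```

Both routes lead to the same decision, which is why step 7 offers them as alternatives.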
Model assumptions: based on a random sample of size n from N(μ,σ) population; observations y1, y2, …, yn
z-test (σ known)                                           | t-test (σ unknown)
Define μ = …                                               | Define μ = …
1. H0: μ = μ0                                              | 1. H0: μ = μ0
2. T.S.: $z = \frac{\bar{y} - \mu_0}{\sigma_y / \sqrt{n}}$ | 2. T.S.: $t = \frac{\bar{y} - \mu_0}{s_y / \sqrt{n}}$
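The two test statistics can be computed from a sample as in the sketch below. The data, μ0 = 4 and the "known" σ = 2 are hypothetical values used only to show the arithmetic; the t statistic would be compared with the t(n − 1) distribution, here df = 7.

```python
import math
import statistics

def z_statistic(sample, mu0, sigma):
    """z = (y_bar - mu0) / (sigma / sqrt(n)), for known population sigma."""
    n = len(sample)
    return (statistics.fmean(sample) - mu0) / (sigma / math.sqrt(n))

def t_statistic(sample, mu0):
    """t = (y_bar - mu0) / (s / sqrt(n)), with s the sample standard deviation."""
    n = len(sample)
    s = statistics.stdev(sample)
    return (statistics.fmean(sample) - mu0) / (s / math.sqrt(n))

# Hypothetical observations and hypothesized mean.
sample = [2, 4, 4, 4, 5, 5, 7, 9]
mu0 = 4

z = z_statistic(sample, mu0, sigma=2.0)  # sigma = 2 is assumed known here
t = t_statistic(sample, mu0)
print(f"z = {z:.4f}, t = {t:.4f}, df = {len(sample) - 1}")
```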