Statistics
Lecture 1 - 04-09-2023
A statistical test has two possible outcomes: it rejects H0 and claims evidence for H1, or it fails to reject H0, which does not confirm H0 either.
If the test rejects the null hypothesis, statisticians say "the test was significant at level α". A common test level is α = 0.05.
A random sample of size n is a collection of n random variables X1, . . . , Xn that are independent and identically distributed (iid).
A statistic is an observable function T of a collection of random variables such that T
does not depend on any unknown parameters.
More generally we write X1 , . . . , Xn ∼ Fθ to indicate that X1 , . . . , Xn is a random sample
of size n from a distribution Fθ that depends on the parameter(s) θ. For the density of
the distribution we use the letter f .
For a random sample X1 , . . . , Xn ∼ Fθ we have the relationship
fθ(x1, . . . , xn) = ∏_{i=1}^{n} fθ(xi)
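For instance, the product rule can be evaluated numerically; here is a minimal sketch, assuming (hypothetically) a standard normal model and using SciPy:

    import numpy as np
    from scipy.stats import norm

    # Joint density of an iid sample as the product of the marginal densities,
    # for a hypothetical standard normal model.
    x = np.array([0.3, -1.2, 0.8])    # realizations x1, x2, x3
    joint = np.prod(norm.pdf(x))      # f(x1, x2, x3) = f(x1) * f(x2) * f(x3)
    print(joint)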
We use capital letters for random variables and lowercase letters for their realizations. X1 = x1 indicates that the random variable X1 takes the value x1.
Given realizations x1 , . . . , xn we define
x̄n := (1/n) ∑_{i=1}^{n} xi
Given random variables X1 , . . . , Xn we define
X̄n := (1/n) ∑_{i=1}^{n} Xi
A statistic T is called sufficient for θ if we do not lose any information about θ when "summarizing" x1, . . . , xn → T(x1, . . . , xn).
Given random variables X1 , . . . , Xn ∼ Fθ with joint density fθ (·), the conditional density
of X1 , . . . , Xk given Xk+1 , . . . , Xn is defined as
fθ(x1, . . . , xk | xk+1, . . . , xn) = fθ(x1, . . . , xn) / fθ(xk+1, . . . , xn)
Let T : R^n → R be a statistic. Then T(X1, . . . , Xn) is a random variable too. For the density of T(X1, . . . , Xn) we write fθ(t(x1, . . . , xn)).
The conditional density of X1, . . . , Xn given T(X1, . . . , Xn) is
fθ(x1, . . . , xn | t(x1, . . . , xn)) = fθ(x1, . . . , xn, t(x1, . . . , xn)) / fθ(t(x1, . . . , xn))
Since t(x1, . . . , xn) is determined by x1, . . . , xn, the numerator equals fθ(x1, . . . , xn).
Notation: X denotes the collection X1, . . . , Xn, x denotes the realizations x1, . . . , xn, X = x denotes that X1 = x1, . . . , Xn = xn, and X̄n = x̄n denotes that
(1/n) ∑_{i=1}^{n} Xi = (1/n) ∑_{i=1}^{n} xi
We can now state the formal definition of a sufficient statistic: a statistic T is called sufficient for θ if the conditional density of X given T(X), i.e. fθ(x | t(x)), does not depend on θ. That is, if we have
fθ(x | t(x)) = f(x | t(x))
Factorization theorem: given a random sample X ∼ Fθ, the statistic T is sufficient for θ if and only if the joint density fθ(x) of X can be factorized as
fθ(x) = g(t(x); θ) · h(x)   for all x = (x1, . . . , xn) ∈ SX
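A standard illustration (not spelled out in the notes): for a Bernoulli(θ) sample X1, . . . , Xn with xi ∈ {0, 1},
fθ(x) = ∏_{i=1}^{n} θ^{xi} (1 − θ)^{1−xi} = θ^{t(x)} (1 − θ)^{n−t(x)}   with t(x) = ∑_{i=1}^{n} xi
This is of the form g(t(x); θ) · h(x) with h(x) = 1, so T(X) = ∑_{i=1}^{n} Xi is sufficient for θ. Consistently with the definition above, given ∑ Xi = t every arrangement of the t ones is equally likely, so the conditional density equals 1/(n choose t) and is free of θ.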
A distribution Fθ with θ containing d parameters (|θ| = d) belongs to the exponential
family if the density fθ of Fθ can be decomposed into
fθ(x) = h(x) · exp{ ∑_{j=1}^{d} ηj(θ) Tj(x) − A(θ) }
Given a random sample from a distribution of an exponential family with d parameters:
X1 , . . . , Xn ∼ Fθ , the sufficient statistics for θ are
( ∑_{i=1}^{n} T1(Xi), . . . , ∑_{i=1}^{n} Td(Xi) )
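For example (a standard case, added here for concreteness): the Poisson(θ) density can be decomposed as
fθ(x) = e^{−θ} θ^x / x! = (1/x!) · exp{ x · log(θ) − θ }
so d = 1, h(x) = 1/x!, η1(θ) = log(θ), T1(x) = x and A(θ) = θ, and the sufficient statistic for a random sample is ∑_{i=1}^{n} Xi.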
Lecture 2 - 06-09-2023
An estimator is a statistic T(X) that is used to estimate the unknown parameter θ. The statistic is usually denoted θ̂(X), or θ̂ for short. Note that θ̂ depends on X, but not on θ.
There are two estimation concepts: the Method of Moments (MM) and Maximum Likelihood (ML).
The method of moments relies on the law of large numbers: for any k ∈ N we have, as n → ∞,
(1/n) ∑_{i=1}^{n} Xi^k →P E[X1^k]
Convergence in probability: recall that X̄n →P E[X1] means that for every ε > 0 we have P(|X̄n − E[X1]| > ε) → 0 as n → ∞.
Recall from probability that X1, . . . , Xn iid implies that g(X1), . . . , g(Xn) are iid for every measurable function g, so the law of large numbers also applies to the sample g(X1), . . . , g(Xn).
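A minimal method-of-moments sketch in Python (a hypothetical example, assuming an Exponential model with rate λ, where E[X1] = 1/λ, so matching the first moment gives λ̂ = 1/x̄n):

    import numpy as np

    rng = np.random.default_rng(seed=0)
    lam_true = 2.0
    x = rng.exponential(scale=1.0 / lam_true, size=10_000)  # sample from Exp(lambda)

    # Match the first sample moment to E[X1] = 1/lambda and solve for lambda:
    lam_hat = 1.0 / x.mean()
    print(lam_hat)  # close to lam_true = 2.0 by the law of large numbers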
The expectation of X:
E[X] = ∫_S x · fθ(x) dx   if X is continuous
E[X] = ∑_{x∈S} x · fθ(x)   if X is discrete
The variance of X:
V(X) = ∫_S (x − E[X])² · fθ(x) dx   if X is continuous
V(X) = ∑_{x∈S} (x − E[X])² · fθ(x)   if X is discrete
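As a quick numerical illustration of the discrete formulas (a hypothetical example, a fair six-sided die):

    import numpy as np

    s = np.arange(1, 7)       # sample space S = {1, ..., 6}
    f = np.full(6, 1 / 6)     # fair die: f(x) = 1/6 for every x in S

    e_x = np.sum(s * f)                # E[X] = sum of x * f(x)            -> 3.5
    v_x = np.sum((s - e_x) ** 2 * f)   # V(X) = sum of (x - E[X])^2 * f(x) -> 35/12
    print(e_x, v_x)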
Rules for expectations and variances (verified numerically in the sketch after this list):
• E[X + Y] = E[X] + E[Y]
• E[cX] = c · E[X]
• V(X + Y) = V(X) + V(Y) + 2 · Cov(X, Y). Recall that the covariance between X and Y is
Cov(X, Y) = E[(X − E[X])(Y − E[Y])]
• V(cX) = c² · V(X)
• If X and Y are independent, then Cov(X, Y) = 0
• We have the relationship V(X) = E[X²] − E[X]²
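A minimal simulation sketch (hypothetical, using correlated normal samples) that checks the V(X + Y) rule empirically:

    import numpy as np

    rng = np.random.default_rng(seed=1)
    x = rng.normal(size=100_000)
    y = 0.5 * x + rng.normal(size=100_000)   # y is correlated with x

    lhs = np.var(x + y)
    rhs = np.var(x) + np.var(y) + 2 * np.cov(x, y, bias=True)[0, 1]
    print(lhs, rhs)  # the two sides agree up to sampling noise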
Expectation of the mean (independence is not needed here):
E[X̄n] = E[(1/n) ∑_{i=1}^{n} Xi] = (1/n) ∑_{i=1}^{n} E[Xi] = E[X1]
Variance of the mean (assuming independence):
V(X̄n) = V((1/n) ∑_{i=1}^{n} Xi) = (1/n²) ∑_{i=1}^{n} V(Xi) = (1/n) · V(X1)
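A small simulation sketch (hypothetical) of the 1/n scaling of the variance of the mean:

    import numpy as np

    rng = np.random.default_rng(seed=2)
    n, reps = 50, 20_000
    samples = rng.normal(size=(reps, n))  # reps independent samples of size n, V(X1) = 1
    means = samples.mean(axis=1)          # one sample mean per row

    print(np.var(means))  # approximately V(X1)/n = 1/50 = 0.02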
Chebyshev inequality: for any random variable Y with finite variance and any ε > 0,
P(|Y − E[Y]| > ε) ≤ V(Y) / ε²
Markov inequality: for a single random variable X ∼ Fθ with sample space SX ⊆ [0, ∞), we have for all r > 0
Pθ(X ≥ r) ≤ E[X] / r
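Both inequalities can be checked numerically; a hypothetical sketch with an Exponential(1) distribution, where E[X] = V(X) = 1:

    import numpy as np

    rng = np.random.default_rng(seed=3)
    x = rng.exponential(scale=1.0, size=1_000_000)  # Exp(1): E[X] = V(X) = 1

    r = 3.0
    print(np.mean(x >= r), 1.0 / r)  # Markov: P(X >= r) <= E[X]/r holds
    eps = 2.0
    print(np.mean(np.abs(x - 1.0) > eps), 1.0 / eps ** 2)  # Chebyshev bound V(X)/eps^2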