Max Batstra
March 29, 2026
Recap Asymptotics & Estimators
Definition 1. [Convergence in Distribution]
If Yn ∼ Gn(y) for Y1, Y2, . . . (sample size n) and if for some CDF G with Y ∼ G(y) it holds that

lim_{n→∞} Gn(y) = G(y)

at every continuity point y of G, then the sequence of random variables Y1, Y2, . . . is said to converge in distribution to Y ∼ G(y), denoted by Yn →d Y.
Definition 2. [Convergence in Probability]
Yn is said to converge in probability to Y if:

∀ε > 0 : P(|Yn − Y| ≥ ε) → 0 as n → ∞

Notation: Yn →P Y.
To prove Yn →P Y, show that ∀ε > 0 ∃nε ∈ N ∀n > nε:

P(|Yn − Y| ≥ ε) ≤ ε
Theorem 1. If Yn →P Y, then also Yn →d Y.

Theorem 2. If Yn →d c for a constant c, then also Yn →P c.
Theorem 3. [Slutsky’s Theorem]
If Xn →d X and Yn →d c, and Xn and Yn are defined on the same sample space, then:

Xn + Yn →d X + c    and    Xn Yn →d cX

Note: Slutsky’s theorem can only be applied if one of the variables’ limiting distributions is degenerate (i.e. the limiting distribution is a deterministic number c); if both are non-degenerate it cannot be applied.
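As a quick illustration of Slutsky’s theorem, the Monte Carlo sketch below (not part of the notes; the distributions and sample sizes are arbitrary choices) takes an Xn that is asymptotically N(0, 1) and a Yn that converges in probability to the constant c = 2, and checks that Xn + Yn behaves like N(2, 1) and Xn·Yn like N(0, 4):

```python
import numpy as np

# Illustrative sketch of Slutsky's theorem (assumed example, not from the notes):
# Xn: standardized sample mean of Uniform(0,1) draws -> N(0,1) in distribution (CLT).
# Yn: sample mean of Exponential(scale=2) draws -> the constant c = 2 in probability (WLLN).
rng = np.random.default_rng(0)
n, reps = 500, 10_000
u = rng.uniform(0.0, 1.0, size=(reps, n))
x_n = (u.mean(axis=1) - 0.5) / np.sqrt((1 / 12) / n)       # approx N(0, 1)
y_n = rng.exponential(2.0, size=(reps, n)).mean(axis=1)    # approx the constant 2

s = x_n + y_n   # Slutsky: -> N(2, 1) in distribution
p = x_n * y_n   # Slutsky: -> N(0, 4) in distribution
print(round(s.mean(), 2), round(s.std(), 2))   # close to 2 and 1
print(round(p.mean(), 2), round(p.std(), 2))   # close to 0 and 2
```

Note that the theorem is doing real work here: Yn is random at every finite n, yet because its limit is degenerate, the limiting distribution of the sum and product only shifts/scales by the constant c.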
Theorem 4. [Weak Law of Large Numbers]
Let X1, X2, . . . be i.i.d. with E[Xi] = µ. Then:

X̄n = (1/n) Σ_{i=1}^{n} Xi →P µ
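The WLLN is easy to see numerically. In the sketch below (an assumed example, not from the notes), the sample mean of i.i.d. Exponential draws with µ = 1 gets closer to µ as n grows:

```python
import numpy as np

# Illustrative sketch of the WLLN: deviations |X̄n - µ| shrink as n grows.
# Distribution and sample sizes are arbitrary choices for the demo.
rng = np.random.default_rng(1)
mu = 1.0
for n in (10, 1_000, 100_000):
    xbar = rng.exponential(mu, size=n).mean()
    print(n, round(abs(xbar - mu), 4))  # deviation from µ, typically of order 1/sqrt(n)
```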
Theorem 5. [Central Limit Theorem]
Let X1, X2, . . . be i.i.d. with E[Xi] = µ and 0 < Var(Xi) = σ² < ∞. Then:

(X̄n − µ) / (σ/√n) →d Z ∼ N(0, 1)
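The striking part of the CLT is that the underlying distribution may be far from normal. The sketch below (an assumed example, not from the notes) standardizes means of heavily skewed Exponential(1) draws (µ = σ = 1) and checks they look standard normal:

```python
import numpy as np

# Illustrative CLT sketch: standardized means of skewed Exponential(1) data
# are approximately N(0,1). Sample sizes are arbitrary demo choices.
rng = np.random.default_rng(2)
n, reps = 500, 20_000
means = rng.exponential(1.0, size=(reps, n)).mean(axis=1)
z = (means - 1.0) / (1.0 / np.sqrt(n))

# Compare with N(0,1): mean near 0, std near 1, P(Z <= 1.96) near 0.975.
print(round(z.mean(), 2), round(z.std(), 2), round((z <= 1.96).mean(), 3))
```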
Theorem 6. [Continuous Mapping Theorem (CMT)]
If g : R → R is continuous and Yn →d Y, then:

g(Yn) →d g(Y)
Theorem 7. [Delta Theorem]
If √n (Yn − m)/c →d Z ∼ N(0, 1), where Yn is a function of a random sample, and if g(y) has a nonzero derivative g′(m) ≠ 0, then:

√n [g(Yn) − g(m)] / |c g′(m)| →d Z ∼ N(0, 1)
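A concrete instance (an assumed example, not from the notes): for Xi ∼ Exponential with mean m = 2 we have √n(X̄n − 2)/2 →d N(0, 1), and with g(y) = 1/y (the rate), g′(2) = −1/4, so the delta theorem gives √n [1/X̄n − 1/2] / |2 · (−1/4)| →d N(0, 1). The sketch checks this numerically:

```python
import numpy as np

# Illustrative delta-method sketch with m = 2, c = sigma = 2, g(y) = 1/y.
# Sample sizes are arbitrary demo choices.
rng = np.random.default_rng(3)
n, reps = 1_000, 10_000
xbar = rng.exponential(2.0, size=(reps, n)).mean(axis=1)  # X̄n, mean m = 2
z = np.sqrt(n) * (1.0 / xbar - 0.5) / abs(2.0 * (-0.25))  # |c g'(m)| = 0.5
print(round(z.mean(), 2), round(z.std(), 2))  # close to 0 and 1
```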
Definition 3. [Estimator]
A statistic, τ̂ = τ̂(X1, X2, . . . , Xn), that is used to estimate the value of τ(θ) is called an estimator of τ(θ).
Definition 4. [Unbiased Estimator]
An estimator τ̂ is said to be an unbiased estimator of τ (θ) if
E(τ̂ ) = τ (θ)
for all θ ∈ Ω. Otherwise, we say that τ̂ is a biased estimator of τ (θ).
Definition 5. [Bias and Mean Squared Error]
If τ̂ is an estimator of τ (θ), then the bias is given by
bθ (τ̂ ) = E(τ̂ ) − τ (θ)
and the mean squared error (MSE) of τ̂ is given by
MSE(τ̂) = E[(τ̂ − τ(θ))²] = Varθ(τ̂) + b²θ(τ̂)
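The bias–variance decomposition of the MSE can be verified by simulation. A standard biased estimator (an assumed example, not from the notes) is s² = (1/n) Σ (Xi − X̄)², which has bias −σ²/n:

```python
import numpy as np

# Illustrative sketch of MSE = Var + bias² for the biased variance estimator
# s² = (1/n) Σ (Xi - X̄)² with N(0,1) data (true σ² = 1), so bias = -1/n.
rng = np.random.default_rng(4)
n, reps = 10, 200_000
x = rng.normal(0.0, 1.0, size=(reps, n))
s2 = x.var(axis=1)              # ddof=0 (default): divides by n, hence biased
bias = s2.mean() - 1.0          # theory: -1/n = -0.1
mse = np.mean((s2 - 1.0) ** 2)
print(round(bias, 3), round(mse, 3), round(s2.var() + bias**2, 3))
```

The last two printed numbers agree: the decomposition MSE = Var + bias² holds as an algebraic identity over the Monte Carlo draws, not just in the limit.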
We will now discuss the Method of Maximum Likelihood, which often leads to estimators possessing desirable properties, particularly large-sample properties. The idea is to use, as an estimate of an unknown parameter, the value in the parameter space that corresponds to the largest ‘likelihood’ for the observed data. More precisely, for a set of discrete random variables, the joint density function of a random sample evaluated at a particular set of sample data, say f(x1, . . . , xn; θ), represents the probability that the observed set of data x1, . . . , xn will occur. For continuous random variables, f(x1, . . . , xn; θ) is not a probability, but it still reflects the relative ‘likelihood’ that the set of data will occur, and this likelihood depends on the true value of the parameter.
Definition 6. [Likelihood Function]
Consider n random variables X1, . . . , Xn with joint density f(X1 = x1, . . . , Xn = xn; θ). For fixed x1, . . . , xn, f is referred to as a likelihood function; it is a function of θ, denoted by L(θ). So:

L(θ) = f(X1 = x1, . . . , Xn = xn; θ)
Definition 7. [Maximum Likelihood Estimator]
Let L(θ) = f(X1 = x1, . . . , Xn = xn; θ), θ ∈ Ω, be the joint pdf of X1, . . . , Xn. For a given set of observations (x1, . . . , xn), if θ̂ maximizes L(θ), then θ̂ is called a maximum likelihood estimate (MLE) of θ. So θ̂ solves:

max_{θ∈Ω} f(x1, . . . , xn; θ)

which is equivalent to solving max_{θ∈Ω} L(θ).
Algorithm for finding the MLE:
Let L(θ) = f(X1 = x1, . . . , Xn = xn; θ), θ ∈ Ω, be the joint pdf of X1, . . . , Xn. Then we find the MLE as follows:
1. Take the log-likelihood function: ln L(θ)
2. Find the first and second derivatives of the log-likelihood function: ∂/∂θ ln L(θ) and ∂²/∂θ² ln L(θ).
   If ∂²/∂θ² ln L(θ) < 0, the log-likelihood is concave, hence there exists one global maximum and thus θ̂ will be unique.
3. Solve the FOC (first-order condition): ∂/∂θ ln L(θ) = 0
4. Conclude: θ̂ is the MLE for θ ∈ Ω
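The steps above can be walked through for the Exponential(rate θ) model (an assumed example, not from the notes): ln L(θ) = n ln θ − θ Σxi, the FOC n/θ − Σxi = 0 gives θ̂ = 1/x̄, and ∂²/∂θ² ln L(θ) = −n/θ² < 0, so the log-likelihood is concave and θ̂ is unique. A numerical check:

```python
import numpy as np

# Illustrative MLE sketch for Exponential(rate θ): θ̂ = 1/x̄ solves the FOC,
# and the second derivative -n/θ² < 0 guarantees a unique global maximum.
rng = np.random.default_rng(5)
x = rng.exponential(1.0 / 3.0, size=50_000)   # simulated data, true rate θ = 3
theta_hat = 1.0 / x.mean()                    # closed-form solution of the FOC

def loglik(t):
    """Log-likelihood n ln(t) - t * sum(x) of the exponential sample."""
    return x.size * np.log(t) - t * x.sum()

print(round(theta_hat, 2))  # close to the true rate 3
```

A quick sanity check is that loglik(theta_hat) exceeds the log-likelihood at nearby parameter values, confirming θ̂ sits at the maximum.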
Theorem 8. [Invariance Property, General Case]
If θ̂ = (θ̂1, . . . , θ̂k) denotes the MLE of θ = (θ1, . . . , θk), then the MLE of τ = (τ1(θ), . . . , τr(θ)) is τ̂ = (τ̂1, . . . , τ̂r) = (τ1(θ̂), . . . , τr(θ̂)) for 1 ⩽ r ⩽ k.
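A standard use of the invariance property (an assumed example, not from the notes): for Poisson(λ) data the MLE of λ is X̄, so the MLE of τ(λ) = P(X = 0) = e^{−λ} is simply e^{−X̄}, with no new optimization needed:

```python
import numpy as np

# Illustrative invariance sketch: MLE of λ is X̄, so by invariance the MLE
# of τ(λ) = P(X = 0) = exp(-λ) is exp(-X̄).
rng = np.random.default_rng(6)
lam = 2.0
x = rng.poisson(lam, size=100_000)  # simulated data, true λ = 2
lam_hat = x.mean()                  # MLE of λ
tau_hat = np.exp(-lam_hat)          # MLE of exp(-λ), by invariance
print(round(lam_hat, 2), round(tau_hat, 3))  # close to 2 and exp(-2) ≈ 0.135
```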
To determine whether an estimator is a UMVUE, it is useful to have a lower bound for the variance of an unbiased estimator: if the variance of, say, an unbiased estimator τ̂ equals this lower bound, then we know for sure that τ̂ is a UMVUE. So suppose we know that Varθ(τ̃) ≥ LB(θ) for all unbiased estimators τ̃ of τ(θ). Then if Varθ(τ̂) = LB(θ), τ̂ is a UMVUE. The so-called CRLB, or Cramér-Rao Lower Bound, provides such a lower bound.
Theorem 9. [Cramér-Rao Lower Bound]
Let X1, . . . , Xn be a random sample from fθ and let τ : Θ → R be a known function. Suppose that τ̂ = τ̂(X1, . . . , Xn) ∈ R is an unbiased estimator of τ(θ) for which Varθ(τ̂) exists for all θ ∈ Θ ⊂ R. Then, under regularity,

Varθ(τ̂) ≥ (τ′(θ))² / (n I(θ)),   for all θ ∈ Θ

Here I(θ) is the Fisher information.
Note: This means that CRLB = (τ′(θ))² / (n I(θ)), for all θ ∈ Θ.
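A worked instance (an assumed example, not from the notes): for Bernoulli(p) with τ(p) = p, the Fisher information is I(p) = 1/(p(1 − p)), so CRLB = (τ′(p))²/(n I(p)) = p(1 − p)/n. Since Var(X̄n) = p(1 − p)/n exactly, X̄n attains the bound and is a UMVUE of p. The sketch verifies the variance matches the CRLB:

```python
import numpy as np

# Illustrative CRLB sketch for Bernoulli(p): CRLB = p(1-p)/n, which Var(X̄n)
# attains exactly, so X̄n is a UMVUE of p.
rng = np.random.default_rng(7)
p, n, reps = 0.3, 100, 200_000
xbar = rng.binomial(n, p, size=reps) / n     # reps independent draws of X̄n
crlb = p * (1 - p) / n                       # = 0.0021 here
print(round(xbar.var(), 5), round(crlb, 5))  # the two agree
```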
Corollary 10. [Fisher Information]
Let X1, . . . , Xn be a random sample from fθ and θ ∈ Θ ⊂ R. Then the Fisher information can be expressed in the following 3 equivalent ways:

1. I(θ) = Eθ[(∂/∂θ ln fθ(X1))²]

2. I(θ) = Varθ(∂/∂θ ln fθ(X1))

3. I(θ) = −Eθ[∂²/∂θ² ln fθ(X1)]
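The three expressions can be checked against each other numerically (an assumed example, not from the notes): for X1 ∼ N(θ, 1), the score is ∂/∂θ ln fθ(x) = x − θ and ∂²/∂θ² ln fθ(x) = −1, so all three expressions give I(θ) = 1:

```python
import numpy as np

# Illustrative sketch: the three Fisher-information expressions agree for N(θ, 1),
# where the score is (x - θ), its second derivative is -1, and I(θ) = 1.
rng = np.random.default_rng(8)
theta = 0.7
x = rng.normal(theta, 1.0, size=500_000)
score = x - theta                        # ∂/∂θ ln f_θ(x)
i1 = np.mean(score ** 2)                 # E[(score)²]
i2 = score.var()                         # Var(score); E[score] = 0 here
i3 = -np.mean(np.full_like(x, -1.0))     # -E[∂²/∂θ² ln f_θ(X1)] = 1 exactly
print(round(i1, 3), round(i2, 3), round(i3, 3))  # all close to 1
```

Note that expressions 1 and 2 agree because the score has mean zero under regularity, so its second moment equals its variance.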