ADVANCED
CLASS
STATISTICS
Ad Stats: Stats Inference
Condition/Theorem | Description | Logical Statement
Definition of Lemma: A proven proposition that is used as a stepping stone to a more important result.
Definition of Corollary: A statement that follows readily from a previous statement.
Order Statistics: Represent the observed values when placed in ascending order.
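A minimal sketch of order statistics in Python, assuming a small hypothetical sample (the values and the helper name `order_statistics` are illustrative only):

```python
def order_statistics(sample):
    """Return the order statistics X_(1) <= ... <= X_(n) of a sample."""
    return sorted(sample)

sample = [4.2, 1.5, 3.3, 2.8, 5.0]   # hypothetical observed values
ordered = order_statistics(sample)
minimum, maximum = ordered[0], ordered[-1]   # X_(1) and X_(n)
median = ordered[len(ordered) // 2]          # middle order statistic (n is odd here)
```

The minimum, maximum, and median are all functions of the order statistics, which is why they are often read straight off the sorted sample.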
Inferential Errors: Can be made in the form of Type I errors (false positives) and Type II errors (false negatives).
Sufficiency: Sufficiency considers a statistic that summarizes, or captures, all of the relevant information about θ contained in X.
⭐ Definition of Sufficient Statistic: A statistic U = T(X) is a sufficient statistic for θ if the conditional distribution of X given U does not depend on θ, i.e. $f_{X \mid U}(x \mid u)$ is not a function of θ.
Logical statement: Every statistic that is a one-to-one function of a sufficient statistic is itself a sufficient statistic.
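The definition can be checked by brute force in a small case. The sketch below assumes an i.i.d. Bernoulli(θ) sample and the statistic U = ΣX_i (both are illustrative choices, not taken from the table) and computes the conditional pmf of X given U exactly for several values of θ:

```python
from fractions import Fraction
from itertools import product

def joint_pmf(x, theta):
    """P(X = x) for i.i.d. Bernoulli(theta) observations."""
    u = sum(x)
    return theta**u * (1 - theta)**(len(x) - u)

def conditional_pmf(x, theta):
    """P(X = x | sum(X) = sum(x)), with the marginal found by enumeration."""
    n, u = len(x), sum(x)
    marginal = sum(
        joint_pmf(y, theta) for y in product([0, 1], repeat=n) if sum(y) == u
    )
    return joint_pmf(x, theta) / marginal

thetas = [Fraction(1, 5), Fraction(1, 2), Fraction(9, 10)]  # hypothetical values
x = (1, 0, 1)
conditionals = [conditional_pmf(x, t) for t in thetas]
```

Every entry of `conditionals` is the same fraction, so in this example the conditional distribution of X given U carries no information about θ.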
Sufficiency Principle: If U = T(X) is a sufficient statistic for θ, then any inference about θ should depend on the random sample X only through T(X), since T(X) contains all the information about θ that the sample has to offer.
⭐ Factorization Theorem: Consider X = {X1, ..., Xn} a random sample with joint pdf/pmf $f_X(x; \theta)$, and a statistic U = T(X). Then the statistic U = T(X) is sufficient for the vector of parameters θ if and only if we can find functions g and h such that $f_X(x; \theta) = g(T(x), \theta)\, h(x)$ for all $x \in \mathbb{R}^n$ and $\theta \in \Theta$.
Logical statement: A "only if" B ≡ A ⇒ B: if there is A, there must be B. A "if" B ≡ B ⇒ A: if there is B, there would be A. A "if and only if" B ≡ A ⇔ B.
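As a worked illustration of the factorization, assume an i.i.d. Bernoulli(θ) sample (an example chosen here, not one given in the table). The joint pmf splits into a factor that depends on x only through T(x) = Σx_i and a factor free of θ:

```latex
f_X(x;\theta) = \prod_{i=1}^{n} \theta^{x_i}(1-\theta)^{1-x_i}
             = \underbrace{\theta^{T(x)}(1-\theta)^{\,n-T(x)}}_{g(T(x),\,\theta)}
               \cdot \underbrace{1}_{h(x)},
\qquad T(x) = \sum_{i=1}^{n} x_i .
```

Under that assumption, the sample total is sufficient for θ.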
Minimal Sufficient Statistic: A sufficient statistic U is a minimal sufficient statistic if, for any other sufficient statistic V = T(X), U can be written as a function of V, U = b(V). Since U is a function of V, V must contain at least as much information as U.
Logical statement: If U is sufficient for all parameters of interest, we have not lost any relevant information moving from V to U.
Condition of fx ( x;θ)
We can derive a minimal suff
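Mean Square Error: The expected squared distance between an estimator and the true parameter value: $\mathrm{MSE}_\theta(\hat\theta) = E_\theta[(\hat\theta - \theta)^2] = \mathrm{Var}(\hat\theta) + (\mathrm{Bias}(\hat\theta))^2$.
Logical statement: $\mathrm{Bias}(\hat\theta) = E(\hat\theta) - \theta$. An estimator with a small mean squared error will have a sampling distribution with probability mass concentrated near θ.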
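The decomposition MSE = Var + Bias² can be verified numerically. The sketch below assumes S ~ Binomial(n, p) and a hypothetical shrinkage estimator (S + 1)/(n + 2); both are illustrative assumptions, and all expectations are computed by summing over the binomial support rather than by simulation:

```python
from math import comb

def binomial_pmf(s, n, p):
    """P(S = s) for S ~ Binomial(n, p)."""
    return comb(n, s) * p**s * (1 - p)**(n - s)

def estimator(s, n):
    """Hypothetical shrinkage estimator of p from s successes in n trials."""
    return (s + 1) / (n + 2)

n, p = 10, 0.3                      # illustrative values
support = range(n + 1)
mean = sum(estimator(s, n) * binomial_pmf(s, n, p) for s in support)
variance = sum((estimator(s, n) - mean) ** 2 * binomial_pmf(s, n, p) for s in support)
bias = mean - p
mse = sum((estimator(s, n) - p) ** 2 * binomial_pmf(s, n, p) for s in support)
```

Here `mse` agrees with `variance + bias**2` up to floating-point rounding, and the bias is nonzero because the estimator shrinks toward 1/2.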
Definition of Consistency: The sequence $\{\hat\theta_n : n = 1, 2, \dots\}$ is a consistent sequence of estimators of θ if it converges in probability to θ, i.e. for every ε > 0: $\lim_{n\to\infty} P_\theta(|\hat\theta_n - \theta| < \epsilon) = 1$.
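⭐ Relationship between MSE & Consistency: By the definition of consistency, for every ε > 0 and δ > 0 we need $P_\theta(|\hat\theta_n - \theta| \ge \epsilon) \le \delta$ for all sufficiently large n. Applying the Chebyshev inequality: $P_\theta(|\hat\theta_n - \theta| \ge \epsilon) \le E_\theta[(\hat\theta_n - \theta)^2] / \epsilon^2$. Hence if $\lim_{n\to\infty} \mathrm{MSE}_\theta(\hat\theta_n) = 0$, then $\{\hat\theta_n\}$ is a consistent sequence of estimators.
Logical statement: By definition of MSE, $E_\theta[(\hat\theta_n - \theta)^2] / \epsilon^2 = \mathrm{MSE}_\theta(\hat\theta_n) / \epsilon^2$.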
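To see the Chebyshev bound at work, the sketch below assumes the estimator is the sample mean of n Bernoulli(p) draws, so that its MSE is p(1 − p)/n (an illustrative case, not one from the table). It compares the tail probability with the bound MSE/ε² for growing n:

```python
from math import comb

def tail_probability(n, p, eps):
    """P(|S/n - p| >= eps) for S ~ Binomial(n, p), summed over the support."""
    return sum(
        comb(n, s) * p**s * (1 - p)**(n - s)
        for s in range(n + 1)
        # small tolerance so boundary cases are not lost to float rounding
        if abs(s / n - p) >= eps - 1e-12
    )

def chebyshev_bound(n, p, eps):
    """MSE of the sample mean divided by eps^2 (the sample mean is unbiased)."""
    mse = p * (1 - p) / n
    return mse / eps**2

p, eps = 0.3, 0.1                   # illustrative values
rows = [
    (n, tail_probability(n, p, eps), chebyshev_bound(n, p, eps))
    for n in (10, 50, 200)
]
```

In each row the tail probability sits below the bound, and the bound itself shrinks like 1/n, which is the mechanism behind "MSE → 0 implies consistency".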
⭐ Rao-Blackwell Theorem: A method for finding an improved estimator in terms of mean squared error by taking the conditional expectation given a sufficient statistic. Consider X a random sample, U = Y(X) a statistic of the same dimension as θ, and S(X) a sufficient statistic. If we consider T = E(U∣S(X)), then T is a statistic that depends on X only through S(X), with $\mathrm{MSE}_\theta(T) \le \mathrm{MSE}_\theta(U)$.
Logical statement: MSE(T(X)) ≤ MSE(Y(X)).
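A small exact illustration of Rao-Blackwellization, assuming an i.i.d. Bernoulli(p) sample, the crude estimator U = X_1, and the sufficient statistic S = ΣX_i (all illustrative choices; the function names are hypothetical). The conditional expectation is computed by enumerating every possible sample:

```python
from itertools import product

def joint_pmf(x, p):
    """P(X = x) for i.i.d. Bernoulli(p) observations."""
    s = sum(x)
    return p**s * (1 - p)**(len(x) - s)

def rao_blackwellize(u, n, p):
    """Map s -> E(u(X) | sum(X) = s) by enumerating all samples of size n."""
    numer, denom = {}, {}
    for x in product([0, 1], repeat=n):
        s, w = sum(x), joint_pmf(x, p)
        numer[s] = numer.get(s, 0.0) + u(x) * w
        denom[s] = denom.get(s, 0.0) + w
    return {s: numer[s] / denom[s] for s in denom}

def mse(estimate, n, p):
    """Exact mean squared error of an estimator of p."""
    return sum(
        (estimate(x) - p) ** 2 * joint_pmf(x, p)
        for x in product([0, 1], repeat=n)
    )

n, p = 4, 0.3                       # illustrative values

def u(x):
    return x[0]                     # crude estimator: first observation only

cond = rao_blackwellize(u, n, p)

def t(x):
    return cond[sum(x)]             # T = E(U | S); equals S / n in this example

mse_u, mse_t = mse(u, n, p), mse(t, n, p)
```

In this case T turns out to be the sample mean, and its MSE is smaller than that of U by a factor of n.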
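⭐ Conditional Expectations for Multivariate Normal: $E(Y \mid X) = E(Y) + \mathrm{Cov}(Y, X)\{\mathrm{Var}(X)\}^{-1}(X - E(X))$ and $\mathrm{Var}(Y \mid X) = \mathrm{Var}(Y) - \mathrm{Cov}(Y, X)\{\mathrm{Var}(X)\}^{-1}\mathrm{Cov}(X, Y)$.
Logical statement: Σ = covariance matrix.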
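For intuition, the formulas reduce to simple arithmetic when Y and X are both scalars. The sketch below covers only that bivariate case, with hypothetical parameter values; the general case replaces the division by `var_x` with a matrix inverse:

```python
def conditional_normal(mu_y, mu_x, var_y, var_x, cov_yx, x):
    """Mean and variance of Y | X = x for a bivariate normal (Y, X)."""
    cond_mean = mu_y + cov_yx / var_x * (x - mu_x)
    cond_var = var_y - cov_yx / var_x * cov_yx
    return cond_mean, cond_var

# Hypothetical parameters chosen only for illustration.
cond_mean, cond_var = conditional_normal(
    mu_y=1.0, mu_x=2.0, var_y=4.0, var_x=1.0, cov_yx=1.5, x=3.0
)
```

Note that the conditional variance does not depend on the observed x, and it is never larger than Var(Y).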
⭐ Sample Moments & Central Sample Moments: r-th sample moment: $M'_r = \frac{1}{n}\sum_{i=1}^{n} X_i^r$. r-th central sample moment: $M_r = \frac{1}{n}\sum_{i=1}^{n} (X_i - M'_1)^r$.
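Both definitions translate directly into code. A minimal sketch, with hypothetical function names and data:

```python
def sample_moment(xs, r):
    """r-th sample moment: (1/n) * sum of x_i ** r."""
    return sum(x**r for x in xs) / len(xs)

def central_sample_moment(xs, r):
    """r-th central sample moment: (1/n) * sum of (x_i - M'_1) ** r."""
    m1 = sample_moment(xs, 1)
    return sum((x - m1) ** r for x in xs) / len(xs)

data = [1.0, 2.0, 3.0, 4.0]         # hypothetical observations
m1 = sample_moment(data, 1)         # sample mean
m2 = sample_moment(data, 2)
c2 = central_sample_moment(data, 2) # divides by n, not n - 1
```

For this data the second central moment equals `m2 - m1**2`, the usual shortcut relating raw and central second moments.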
Weak Law of Large Numbers: Let X = {X1, X2, ..., Xn} be a random sample from a population distribution with finite mean μ. Then $\bar{X}_n \xrightarrow{P} \mu$ as n → ∞. Hence, by the definition of consistency, $\bar{X}_n$ is a consistent estimator of μ.
Logical statement: The condition for consistency requires convergence in probability. The direct consequence of the Weak Law results in the Law of Large Numbers or the convergence in distribution.
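A quick simulation can make the Weak Law concrete. The sketch below assumes Uniform(0, 1) draws, so μ = 0.5, and a fixed seed for repeatability; the sample size and the helper name `running_means` are arbitrary choices:

```python
import random

def running_means(draws):
    """Return the sample mean after each successive observation."""
    total, means = 0.0, []
    for i, x in enumerate(draws, start=1):
        total += x
        means.append(total / i)
    return means

rng = random.Random(0)                        # fixed seed, illustrative only
draws = [rng.random() for _ in range(20000)]  # Uniform(0, 1), so mu = 0.5
means = running_means(draws)
final_error = abs(means[-1] - 0.5)
```

Plotting or printing `means` at n = 10, 100, 1000, ... shows the sample mean settling near 0.5, though a single run illustrates the law rather than proving it.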
Central Limit Theorem: A remarkable result that allows statistical inference about the sample mean using the normal distribution, regardless of the population distribution $F_X$ (with finite variance) from which the random sample is drawn. The central limit theorem itself does not provide any indication about how good a normal approximation would be in finite samples.
Logical statement: The rate of convergence of the sampling distribution of $\bar{X}$ to normality depends on the shape of the underlying distribution $F_X$.
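The normal approximation can be probed by simulation. The sketch below assumes a skewed population, Exponential(1) with μ = σ = 1, and compares the empirical distribution of the standardized sample mean with the standard normal at a single point; the sample size, replication count, and seed are arbitrary:

```python
import random
from statistics import NormalDist

def standardized_means(n, reps, rng):
    """Standardized sample means of Exponential(1) draws (mu = sigma = 1)."""
    out = []
    for _ in range(reps):
        xbar = sum(rng.expovariate(1.0) for _ in range(n)) / n
        out.append((xbar - 1.0) * n**0.5)
    return out

rng = random.Random(1)                           # fixed seed, illustrative only
z = standardized_means(n=50, reps=4000, rng=rng)
empirical = sum(v <= 1.0 for v in z) / len(z)    # empirical P(Z_n <= 1)
normal = NormalDist().cdf(1.0)                   # standard normal P(Z <= 1)
```

Rerunning with a smaller `n` gives a rough feel for how the quality of the approximation depends on the sample size and on the skewness of the population.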
Suppose the distribution FX
Four representations of the Sample Variance 1) (n −