Estimation in Data Science
Probability Mass Function (PMF)
f(x) = P(X=x); e.g., for a fair coin flip, f(x) = {1/2 if x=0, 1/2 if x=1, 0 otherwise}
Bernoulli Distribution
pmf: f(x) = (pi^x) * (1-pi)^(1-x) * I{0,1}(x)
E[X] = pi
Var[X] = pi(1-pi)
Geometric Distribution
Let Y = the number of trials required to get the first success;
pmf: f(y) = ((1-p)^(y-1)) p I{1,2,...}(y)
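As a quick numerical check, assuming the trial-counting convention (Y takes the values 1, 2, ..., so P(Y = y) = (1-p)^(y-1) p), the sketch below sums the pmf over a truncated support and compares the mean with the known value 1/p. The function name `geometric_pmf` and the value p = 0.3 are illustrative assumptions, not part of the card.

```python
def geometric_pmf(y: int, p: float) -> float:
    """P(Y = y) where Y counts trials up to and including the first success."""
    if y < 1:
        return 0.0
    return (1.0 - p) ** (y - 1) * p

p = 0.3                  # assumed success probability
support = range(1, 200)  # truncated support; the remaining tail mass is negligible

total = sum(geometric_pmf(y, p) for y in support)
mean = sum(y * geometric_pmf(y, p) for y in support)

print(total)  # close to 1
print(mean)   # close to 1 / p
```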
Probability Density Function (PDF)
a function f(x) whose integral over an interval gives the probability that a continuous random variable falls in that interval
PDF Properties
1.) f(x) >= 0 , 2.) Integral (-inf, inf) f(x) dx = 1
normal distribution
A function that represents the distribution of variables as a symmetrical bell-shaped graph.
Exponential Distribution
A probability distribution associated with the time between arrivals;
f(x) = lambda * exp(-lambda*x) * I(0,inf)(x)
Binomial Distribution
a frequency distribution of the possible number of successful outcomes in a given number of
trials in each of which there is the same probability of success;
f(x) = (n choose x) p^x (1-p)^(n-x) * I{0,...,n}(x)
Poisson Distribution
Probability distribution for the number of arrivals during each time period;
f(x) = (exp(-lambda)lambda^x)/ x! I{0, 1,...}(x)
uniform distribution
the frequency of each value of the variable is evenly spread out across the values of the
variable;
f(x) = 1/(b-a) * I(a,b)(x)
Gamma Distribution
used to model waiting time until kth event occurs
X ~ GAM(c,d), 0<c (scale), 0<d (shape)
pdf: f(x) = 1/[(c^d) Gamma(d)] * x^(d-1) * e^(-x/c), 0<x
mean: dc
variance: dc^2
MGF: M_X(t) = [1/(1-ct)]^d, t < 1/c
Joint PMF
f(x,y) = P(X=x, Y=y)
Marginal Probability
the values in the margins of a joint probability table that provide the probabilities of each event
separately
expected value
the average of each possible outcome of a future event, weighted by its probability of occurring
Expected value of normal distributions
mu
Expected value of exponential distribution
1/lambda
Law of the Unconscious Statistician
E[g(X)] = integral(-inf, inf) g(x) f_X(x) dx
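A minimal numerical sketch of this rule: E[g(X)] is computed by integrating g(x) against the density of X, without ever deriving the distribution of g(X). The helper `lotus_expectation`, the midpoint rule, the rate lam = 2 and the truncation point 40 are all assumptions made for illustration.

```python
import math

def lotus_expectation(g, pdf, lo, hi, steps=200_000):
    """Approximate E[g(X)] = integral of g(x) * pdf(x) dx with the midpoint rule."""
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps):
        x = lo + (i + 0.5) * h
        total += g(x) * pdf(x)
    return total * h

lam = 2.0  # assumed rate parameter of an exponential distribution

def exp_pdf(x):
    return lam * math.exp(-lam * x)

# E[X^2] for the exponential; the upper limit 40 cuts off a negligible tail.
second_moment = lotus_expectation(lambda x: x * x, exp_pdf, 0.0, 40.0)
print(second_moment)  # close to 2 / lam**2
```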
Expected Value Property when X and Y are independent
E[XY] = E[X]E[Y]
Variance
Measure of "spread" of a distribution; standard deviation squared; Var[X] = E[(X-mu)^2] =
E[X^2] - (E[X])^2, where mu = E[X]
Properties of Variance
Var[X] >= 0, Var[aX+b] = a^2 Var[X]
True or False: Var[X+Y] = Var[X] + Var[Y]
True when X and Y are uncorrelated (in particular, when they are independent); in general
Var[X+Y] = Var[X] + Var[Y] + 2 Cov(X,Y)
Covariance
A measure of linear association between two variables. Positive values indicate a positive
relationship; negative values indicate a negative relationship; Cov(X,Y) = E[XY] - E[X]E[Y]
Correlation
A measure of the extent to which two factors vary together, and thus of how well either factor
predicts the other; Corr(X,Y) = Cov(X,Y) / sqrt(Var(X) Var(Y))
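The covariance and correlation formulas, together with the general identity Var[X+Y] = Var[X] + Var[Y] + 2 Cov(X,Y), can be checked on a small data set. This is only a sketch: the data values are made up, and the helpers `mean` and `cov` use population-style averages (dividing by n) as an assumption.

```python
import math

def mean(values):
    return sum(values) / len(values)

def cov(xs, ys):
    """Population-style covariance: average of (x - mean_x) * (y - mean_y)."""
    mx, my = mean(xs), mean(ys)
    return mean([(x - mx) * (y - my) for x, y in zip(xs, ys)])

xs = [1.0, 2.0, 3.0, 4.0, 5.0]   # made-up illustrative data
ys = [2.0, 4.0, 5.0, 4.0, 5.0]

var_x, var_y, cov_xy = cov(xs, xs), cov(ys, ys), cov(xs, ys)
corr_xy = cov_xy / math.sqrt(var_x * var_y)

# Variance of the sum, computed directly and via the covariance identity.
sums = [x + y for x, y in zip(xs, ys)]
var_sum = cov(sums, sums)
identity = var_x + var_y + 2 * cov_xy

print(cov_xy, corr_xy, var_sum, identity)
```

Because these x and y values are positively related, the variance of the sum exceeds Var[X] + Var[Y] by exactly twice the covariance.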
True or False: if Corr(X,Y) = 0 , then X and Y are uncorrelated.
True
True or False: A random sample means that it is independent and identically distributed.
True
True or False: if E[xbar]=mu, then xbar is an unbiased estimator of mu
True
Moment Generating Function
M_X(t) = E[e^(tX)]
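One reason the MGF is useful: its derivatives at t = 0 give the moments, M'(0) = E[X] and M''(0) = E[X^2]. The sketch below checks this numerically with central differences on the Gamma MGF M(t) = (1 - ct)^(-d). The parameter values c = 2 and d = 3 and the step size h are arbitrary choices for illustration.

```python
c, d = 2.0, 3.0   # assumed scale and shape of a Gamma distribution
h = 1e-4          # finite-difference step

def mgf(t):
    """Gamma MGF, valid for t < 1/c."""
    return (1.0 - c * t) ** (-d)

first_moment = (mgf(h) - mgf(-h)) / (2 * h)                # approximates E[X]
second_moment = (mgf(h) - 2 * mgf(0.0) + mgf(-h)) / h**2   # approximates E[X^2]
variance = second_moment - first_moment**2

print(first_moment, variance)  # close to d*c and d*c**2
```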
Method of Moments Estimator
An estimator obtained by using the sample analog of population moments; ordinary least
squares and two stage least squares are both method of moments estimators
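A small sketch of the idea for a Gamma sample with scale c and shape d: equate the sample mean and variance to the population values dc and dc^2, then solve for the parameters. The true values, the seed and the sample size below are assumptions chosen only to make the example reproducible.

```python
import random

random.seed(0)
true_c, true_d = 2.0, 3.0   # assumed scale c and shape d
n = 50_000

# random.gammavariate(alpha, beta) takes the shape first, then the scale.
sample = [random.gammavariate(true_d, true_c) for _ in range(n)]

m1 = sum(sample) / n                   # first sample moment
m2 = sum(x * x for x in sample) / n    # second sample moment
sample_var = m2 - m1**2

# Solve mean = d*c and variance = d*c^2 for the two parameters.
c_hat = sample_var / m1
d_hat = m1**2 / sample_var

print(c_hat, d_hat)  # should land near the true values
```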
maximum likelihood estimation
a class of estimators that chooses the parameter values under which the observed data are
most probable
likelihood function
joint probability distribution of the data, treated as a function of the unknown coefficients
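To make the likelihood function concrete, the sketch below treats the joint density of an exponential sample as a function of the rate lambda and maximises its logarithm, once by a crude grid search and once with the closed form n / sum(x). The data values and the grid are made up for illustration.

```python
import math

data = [0.5, 1.2, 0.3, 2.0, 1.0]   # made-up waiting times

def log_likelihood(lam, xs):
    """Log of the joint density prod(lam * exp(-lam * x)) over the sample."""
    return len(xs) * math.log(lam) - lam * sum(xs)

# Crude grid search over candidate rates 0.01, 0.02, ..., 5.00.
grid = [i / 100 for i in range(1, 501)]
lam_grid = max(grid, key=lambda lam: log_likelihood(lam, data))

# Closed form from setting the derivative n/lam - sum(x) to zero.
lam_closed = len(data) / sum(data)

print(lam_grid, lam_closed)
```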
Asymptotically unbiased
an estimator whose bias tends to zero as the sample size tends to infinity. Some biased
estimators are asymptotically unbiased, and every unbiased estimator is asymptotically unbiased.
Invariance property of MLEs
If θ^ is the MLE of θ, then τ(θ^) is the MLE of τ(θ).
There are some constraints on the choice of τ(θ). If τ(θ) is one-to-one then this definition
is fine. In this case, denote η = τ(θ); then the inverse function τ^(-1)(η) = θ exists.
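A hedged illustration with Bernoulli data, where the MLE of the success probability p is the sample proportion. Taking the one-to-one map τ(p) = p / (1 - p) (the odds), the plug-in value τ(p^) should agree with maximising the likelihood directly over the odds. The counts and the grid below are made up.

```python
import math

successes, n = 7, 10      # made-up Bernoulli data
p_hat = successes / n     # MLE of p

def log_lik_odds(eta):
    """Bernoulli log-likelihood re-expressed in terms of the odds eta = p / (1 - p)."""
    p = eta / (1.0 + eta)
    return successes * math.log(p) + (n - successes) * math.log(1.0 - p)

# Maximise directly over the odds on a grid 0.01, 0.02, ..., 10.00.
grid = [i / 100 for i in range(1, 1001)]
eta_grid = max(grid, key=log_lik_odds)

eta_plugin = p_hat / (1.0 - p_hat)   # tau(p_hat), the invariance shortcut

print(eta_grid, eta_plugin)  # agree up to the grid spacing
```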
Mean Squared Error (MSE)
the average of the squared differences between the forecasted and observed values; for an
estimator θ^ of θ, MSE(θ^) = E[(θ^ - θ)^2] = Var(θ^) + [Bias(θ^)]^2
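For an estimator, the split of MSE into variance plus squared bias can be checked by simulation. The sketch below compares two estimators of a normal variance, dividing the sum of squares by n or by n - 1. The normal model, true variance 4, sample size 10 and seed are assumptions for illustration.

```python
import random

random.seed(1)
true_var = 4.0           # assumed: N(0, 2^2) data
n, reps = 10, 20_000

def sample_variances():
    xs = [random.gauss(0.0, 2.0) for _ in range(n)]
    xbar = sum(xs) / n
    ss = sum((x - xbar) ** 2 for x in xs)
    return ss / n, ss / (n - 1)    # biased and unbiased estimators

biased, unbiased = zip(*(sample_variances() for _ in range(reps)))

def summarize(estimates, target):
    """Return (mse, variance, bias) of a list of simulated estimates."""
    mean_est = sum(estimates) / len(estimates)
    bias = mean_est - target
    var = sum((e - mean_est) ** 2 for e in estimates) / len(estimates)
    mse = sum((e - target) ** 2 for e in estimates) / len(estimates)
    return mse, var, bias

mse_b, var_b, bias_b = summarize(biased, true_var)
mse_u, var_u, bias_u = summarize(unbiased, true_var)

print(mse_b, var_b + bias_b**2)   # the two numbers match
print(mse_u, var_u + bias_u**2)
```

In this setup the divide-by-n estimator is biased downward but still tends to show the smaller MSE, which is why unbiasedness alone does not settle which estimator is better.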
relative efficiency
given two unbiased point estimators of the same population parameter, the point estimator
with the smaller standard error is more efficient
Cramer Rao Lower Bound
Var[tau^(theta)] >= [tau'(theta)]^2 / I_n(theta), where tau^(theta) (tau-hat) is an unbiased
estimator of tau(theta) and I_n(theta) is the Fisher information in a sample of size n
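As a worked example, take n independent Bernoulli observations with success probability pi and the target tau(pi) = pi. The symbols \ell (log-likelihood) and \bar{X} (sample mean) are notation introduced here for the sketch.

```latex
\begin{aligned}
\ell(\pi) &= \sum_{i=1}^{n}\bigl[x_i\log\pi + (1-x_i)\log(1-\pi)\bigr],\\
I_n(\pi) &= -E\!\left[\frac{\partial^2 \ell}{\partial \pi^2}\right]
          = \frac{n\pi}{\pi^2} + \frac{n(1-\pi)}{(1-\pi)^2}
          = \frac{n}{\pi(1-\pi)},\\
\operatorname{Var}[\hat\pi] &\ge \frac{[\tau'(\pi)]^2}{I_n(\pi)}
          = \frac{\pi(1-\pi)}{n} \quad\text{for } \tau(\pi)=\pi,\\
\operatorname{Var}[\bar X] &= \frac{\pi(1-\pi)}{n}.
\end{aligned}
```

The sample mean is unbiased for pi and its variance equals the bound, so in this model no unbiased estimator of pi has smaller variance.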
Cauchy-Schwarz Inequality
If x and y are vectors in R^n, then |x . y| <= ||x|| ||y||
UMVUE
Uniformly Minimum Variance Unbiased Estimator
Weak Law of Large Numbers
for i.i.d. observations with finite mean mu, the sample mean xbar_n converges in probability
to mu as the sample size n increases: P(|xbar_n - mu| > eps) -> 0 for every eps > 0
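A short simulation sketch of this behaviour, assuming fair coin flips coded as 0/1 (so the population mean is 0.5). The seed and the sample sizes are arbitrary choices.

```python
import random

random.seed(42)
mu = 0.5  # mean of a fair coin flip coded as 0/1

def sample_mean(n):
    """Mean of n independent fair coin flips."""
    return sum(random.randint(0, 1) for _ in range(n)) / n

# Absolute error of the sample mean for increasing sample sizes.
errors = {n: abs(sample_mean(n) - mu) for n in (10, 1_000, 100_000)}
for n, err in errors.items():
    print(n, err)
```

The error for the largest sample is typically far smaller than for the smallest one, although any single run can deviate because the law is a statement about probabilities.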
Markov's Inequality
P(X >= c) <= E(X)/c, for a nonnegative random variable X and any c > 0
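The bound can be compared with an exact tail probability. The sketch below assumes an exponential variable with rate 1.5, for which P(X >= c) = exp(-lambda*c) and E[X] = 1/lambda. The rate and the thresholds are arbitrary.

```python
import math

lam = 1.5                      # assumed rate of an exponential variable
mean = 1.0 / lam               # E[X] = 1/lambda
thresholds = (0.5, 1.0, 2.0, 5.0)

for c in thresholds:
    tail = math.exp(-lam * c)  # exact P(X >= c) for the exponential
    bound = mean / c           # Markov's bound E[X]/c
    print(f"c={c}: P(X>=c)={tail:.4f}, bound={bound:.4f}")

holds = all(math.exp(-lam * c) <= mean / c for c in thresholds)
```

The bound holds at every threshold but is loose, which is expected since it uses only the mean of the distribution.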