Compiled by William Chen (http://wzchen.com) and Joe Blitzstein, with contributions from Sebastian Chiu, Yuan Jiang, Yuqi Hou, and Jessy Hwang. Material based on Joe Blitzstein's (@stat110) lectures (http://stat110.net) and Blitzstein/Hwang's Introduction to Probability textbook (http://bit.ly/introprobability). Licensed under CC BY-NC-SA 4.0. Please share comments, suggestions, and errors at http://github.com/wzchen/probability_cheatsheet.

Last Updated September 4, 2015
Counting

Multiplication Rule

Let's say we have a compound experiment (an experiment with multiple components). If the 1st component has n1 possible outcomes, the 2nd component has n2 possible outcomes, . . . , and the rth component has nr possible outcomes, then overall there are n1 n2 . . . nr possibilities for the whole experiment.

[Figure: tree diagram of a compound experiment with two cone choices (cake, waffle) and three ice cream flavors (C, V, S), giving 2 · 3 = 6 possible cones.]
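A quick numerical check (a minimal Python sketch; the cone and flavor names just echo the figure above): enumerating the outcomes of a compound experiment reproduces the n1 n2 . . . nr count.

from itertools import product

# Hypothetical compound experiment: choose a cone (2 options), then a flavor (3 options).
cones = ["cake", "waffle"]
flavors = ["chocolate", "vanilla", "strawberry"]

outcomes = list(product(cones, flavors))
print(len(outcomes))   # 6, matching n1 * n2 = 2 * 3
print(outcomes[0])     # ('cake', 'chocolate')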
Sampling Table

The sampling table gives the number of possible samples of size k out of a population of size n, under various assumptions about how the sample is collected.

                         Order Matters       Order Doesn't Matter
With Replacement         n^k                 (n + k - 1 choose k)
Without Replacement      n!/(n - k)!         (n choose k)
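A minimal Python sketch evaluating the four entries of the sampling table (n and k are arbitrary illustrative values):

from math import comb, perm

n, k = 5, 3

print(n ** k)              # with replacement, order matters: 125
print(perm(n, k))          # without replacement, order matters: n!/(n-k)! = 60
print(comb(n + k - 1, k))  # with replacement, order doesn't matter: 35
print(comb(n, k))          # without replacement, order doesn't matter: 10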
Naive Definition of Probability

If all outcomes are equally likely, the probability of an event A happening is:

P_naive(A) = (number of outcomes favorable to A) / (number of outcomes)
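For example (a minimal Python sketch; the two-dice setup is a hypothetical illustration), the naive definition can be applied by directly enumerating equally likely outcomes:

from itertools import product
from fractions import Fraction

# Two fair six-sided dice; event A = {the sum equals 7}. All 36 outcomes are equally likely.
outcomes = list(product(range(1, 7), repeat=2))
favorable = [o for o in outcomes if sum(o) == 7]

print(Fraction(len(favorable), len(outcomes)))   # 1/6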
Independence

Independent Events A and B are independent if knowing whether A occurred gives no information about whether B occurred. More formally, A and B (which have nonzero probability) are independent if and only if one of the following equivalent statements holds:

P(A ∩ B) = P(A)P(B)
P(A|B) = P(A)
P(B|A) = P(B)

Conditional Independence A and B are conditionally independent given C if P(A ∩ B|C) = P(A|C)P(B|C). Conditional independence does not imply independence, and independence does not imply conditional independence.
Unions, Intersections, and Complements

De Morgan's Laws A useful identity that can make calculating probabilities of unions easier by relating them to intersections, and vice versa. Analogous results hold with more than two sets.

(A ∪ B)^c = A^c ∩ B^c
(A ∩ B)^c = A^c ∪ B^c
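A minimal Python sketch (the sample space and events are arbitrary) confirming De Morgan's laws with set operations:

# Complements are taken relative to a small sample space omega.
omega = set(range(1, 11))
A = {1, 2, 3, 4}
B = {3, 4, 5, 6}

print(omega - (A | B) == (omega - A) & (omega - B))   # (A ∪ B)^c = A^c ∩ B^c -> True
print(omega - (A & B) == (omega - A) | (omega - B))   # (A ∩ B)^c = A^c ∪ B^c -> True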
Joint, Marginal, and Conditional

Joint Probability P(A ∩ B) or P(A, B) – Probability of A and B.

Marginal (Unconditional) Probability P(A) – Probability of A.

Conditional Probability P(A|B) = P(A, B)/P(B) – Probability of A, given that B occurred.

Conditional Probability is Probability P(A|B) is a probability function for any fixed B. Any theorem that holds for probability also holds for conditional probability.
Probability of an Intersection or Union

Intersections via Conditioning

P(A, B) = P(A)P(B|A)
P(A, B, C) = P(A)P(B|A)P(C|A, B)

Unions via Inclusion-Exclusion

P(A ∪ B) = P(A) + P(B) − P(A ∩ B)
P(A ∪ B ∪ C) = P(A) + P(B) + P(C)
             − P(A ∩ B) − P(A ∩ C) − P(B ∩ C)
             + P(A ∩ B ∩ C).
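Inclusion-exclusion is easy to verify numerically. A minimal Python sketch on a finite sample space of equally likely outcomes (the events are chosen arbitrarily):

from fractions import Fraction

omega = set(range(1, 13))
A, B, C = {1, 2, 3, 4, 5}, {4, 5, 6, 7}, {5, 7, 9, 11}

def p(event):
    return Fraction(len(event), len(omega))

lhs = p(A | B | C)
rhs = p(A) + p(B) + p(C) - p(A & B) - p(A & C) - p(B & C) + p(A & B & C)
print(lhs, rhs, lhs == rhs)   # 3/4 3/4 True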
Simpson's Paradox

[Figure: Dr. Hibbert and Dr. Nick each performing heart surgery and band-aid removal.]

It is possible to have

P(A | B, C) < P(A | B^c, C) and P(A | B, C^c) < P(A | B^c, C^c)

yet also P(A | B) > P(A | B^c).
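A minimal Python sketch of the paradox in the spirit of the figure (the counts are made up for illustration): one doctor can have a lower success rate within every surgery type, yet a higher overall success rate, because the two doctors take on very different mixes of cases.

from fractions import Fraction

# (successes, attempts) per doctor and surgery type; purely hypothetical numbers.
hibbert = {"heart": (70, 90), "band-aid": (10, 10)}
nick = {"heart": (2, 10), "band-aid": (81, 90)}

def rate(successes, attempts):
    return Fraction(successes, attempts)

def overall(doctor):
    return Fraction(sum(s for s, _ in doctor.values()), sum(t for _, t in doctor.values()))

for kind in ("heart", "band-aid"):
    print(kind, rate(*hibbert[kind]) > rate(*nick[kind]))   # True: Hibbert is better within each type

print(overall(hibbert) > overall(nick))                      # False: Nick's overall rate is higher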
Law of Total Probability (LOTP)

Let B1, B2, B3, . . . Bn be a partition of the sample space (i.e., they are disjoint and their union is the entire sample space).

P(A) = P(A|B1)P(B1) + P(A|B2)P(B2) + · · · + P(A|Bn)P(Bn)
P(A) = P(A ∩ B1) + P(A ∩ B2) + · · · + P(A ∩ Bn)

For LOTP with extra conditioning, just add in another event C!

P(A|C) = P(A|B1, C)P(B1|C) + · · · + P(A|Bn, C)P(Bn|C)
P(A|C) = P(A ∩ B1|C) + P(A ∩ B2|C) + · · · + P(A ∩ Bn|C)

Special case of LOTP with B and B^c as partition:

P(A) = P(A|B)P(B) + P(A|B^c)P(B^c)
P(A) = P(A ∩ B) + P(A ∩ B^c)
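A minimal Python sketch of the special case with the partition {B, B^c} (the three input probabilities are arbitrary illustrative values):

from fractions import Fraction

p_B = Fraction(3, 10)
p_A_given_B = Fraction(1, 2)
p_A_given_Bc = Fraction(1, 5)

p_A = p_A_given_B * p_B + p_A_given_Bc * (1 - p_B)   # LOTP with B and B^c
print(p_A)   # 29/100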
Bayes' Rule

Bayes' Rule, and with extra conditioning (just add in C!)

P(A|B) = P(B|A)P(A) / P(B)

P(A|B, C) = P(B|A, C)P(A|C) / P(B|C)

We can also write

P(A|B, C) = P(A, B, C) / P(B, C) = P(B, C|A)P(A) / P(B, C)

Odds Form of Bayes' Rule

P(A|B) / P(A^c|B) = [P(B|A) / P(B|A^c)] · [P(A) / P(A^c)]

The posterior odds of A are the likelihood ratio times the prior odds.
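A minimal Python sketch of Bayes' rule and its odds form, where A is "has the condition" and B is "tests positive" (the prior and test accuracies are hypothetical):

from fractions import Fraction

# Hypothetical prior and test accuracies.
p_A = Fraction(1, 100)             # prior P(A)
p_B_given_A = Fraction(95, 100)    # P(B|A)
p_B_given_Ac = Fraction(5, 100)    # P(B|A^c)

p_B = p_B_given_A * p_A + p_B_given_Ac * (1 - p_A)   # P(B) by LOTP
p_A_given_B = p_B_given_A * p_A / p_B                # Bayes' rule
print(p_A_given_B)                                   # 19/118, about 0.16

# Odds form: posterior odds = likelihood ratio * prior odds.
posterior_odds = (p_B_given_A / p_B_given_Ac) * (p_A / (1 - p_A))
print(posterior_odds / (1 + posterior_odds) == p_A_given_B)   # True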
Random Variables and their Distributions

PMF, CDF, and Independence

Probability Mass Function (PMF) Gives the probability that a discrete random variable takes on the value x.

p_X(x) = P(X = x)

[Figure: example PMF of a discrete r.v. taking the values 0, 1, 2, 3, 4.]

The PMF satisfies

p_X(x) ≥ 0 and Σ_x p_X(x) = 1

Cumulative Distribution Function (CDF) Gives the probability that a random variable is less than or equal to x.

F_X(x) = P(X ≤ x)

[Figure: the corresponding CDF, a right-continuous step function increasing from 0 to 1.]

The CDF is an increasing, right-continuous function with

F_X(x) → 0 as x → −∞ and F_X(x) → 1 as x → ∞

Independence Intuitively, two random variables are independent if knowing the value of one gives no information about the other. Discrete r.v.s X and Y are independent if for all values of x and y

P(X = x, Y = y) = P(X = x)P(Y = y)
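A minimal Python sketch of a PMF and its CDF (the Binomial(4, 1/2) choice is an arbitrary example):

from math import comb
from fractions import Fraction

n, p = 4, Fraction(1, 2)
pmf = {x: comb(n, x) * p**x * (1 - p)**(n - x) for x in range(n + 1)}
cdf = {x: sum(pmf[k] for k in range(x + 1)) for x in range(n + 1)}

print(sum(pmf.values()) == 1)   # True: the PMF sums to 1
print(cdf)                      # increasing from 1/16 up to 1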
Expected Value and Indicators

Expected Value and Linearity

Expected Value (a.k.a. mean, expectation, or average) is a weighted average of the possible outcomes of our random variable. Mathematically, if x1, x2, x3, . . . are all of the distinct possible values that X can take, the expected value of X is

E(X) = Σ_i x_i P(X = x_i)

The table below illustrates linearity: averaging the X column and the Y column separately and then adding gives the same result as adding within each row and then averaging.

  X     Y    X + Y
  3     4      7
  2     2      4
  6     8     14
 10    23     33
  1    -3     -2
  1     0      1
  5     9     14
  4     1      5
 ...   ...    ...

(1/n) Σ x_i + (1/n) Σ y_i = (1/n) Σ (x_i + y_i)

E(X) + E(Y) = E(X + Y)

Linearity For any r.v.s X and Y, and constants a, b, c,

E(aX + bY + c) = aE(X) + bE(Y) + c

Same distribution implies same mean If X and Y have the same distribution, then E(X) = E(Y) and, more generally,

E(g(X)) = E(g(Y))

Conditional Expected Value is defined like expectation, only conditioned on any event A.

E(X|A) = Σ_x x P(X = x|A)
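Linearity holds even when X and Y are dependent, which a quick simulation makes concrete (a minimal Python sketch; the dice setup is arbitrary):

import random

random.seed(0)
n = 100_000
pairs = []
for _ in range(n):
    x = random.randint(1, 6)
    y = 7 - x                      # Y is completely determined by X (strongly dependent)
    pairs.append((x, y))

def mean(values):
    return sum(values) / len(values)

print(mean([x for x, _ in pairs]) + mean([y for _, y in pairs]))   # approx 7
print(mean([x + y for x, y in pairs]))                             # exactly 7 here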
Indicator Random Variables

Indicator Random Variable is a random variable that takes on the value 1 or 0. It is always an indicator of some event: if the event occurs, the indicator is 1; otherwise it is 0. They are useful for many problems about counting how many events of some kind occur. Write

I_A = 1 if A occurs, 0 if A does not occur.

Note that I_A^2 = I_A, I_A I_B = I_(A ∩ B), and I_(A ∪ B) = I_A + I_B − I_A I_B.

Distribution I_A ∼ Bern(p) where p = P(A).

Fundamental Bridge The expectation of the indicator for event A is the probability of event A: E(I_A) = P(A).

Variance and Standard Deviation

Var(X) = E((X − E(X))^2) = E(X^2) − (E(X))^2

SD(X) = √Var(X)
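A minimal Python sketch (the fair-die setup is arbitrary) estimating the fundamental bridge E(I_A) = P(A) and the variance identity by simulation:

import random

random.seed(1)
xs = [random.randint(1, 6) for _ in range(200_000)]   # X = value of a fair die

# A = {X >= 5}, so P(A) = 1/3; the average of the indicator estimates E(I_A).
print(sum(1 for x in xs if x >= 5) / len(xs))          # approx 0.333

mean = sum(xs) / len(xs)
print(sum(x**2 for x in xs) / len(xs) - mean**2)       # Var(X) = E(X^2) - (E(X))^2, approx 2.92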
Continuous RVs, LOTUS, UoU

Continuous Random Variables (CRVs)

What's the probability that a CRV is in an interval? Take the difference in CDF values (or use the PDF as described later).

P(a ≤ X ≤ b) = P(X ≤ b) − P(X ≤ a) = F_X(b) − F_X(a)

For X ∼ N(µ, σ^2), this becomes

P(a ≤ X ≤ b) = Φ((b − µ)/σ) − Φ((a − µ)/σ)

What is the Probability Density Function (PDF)? The PDF f is the derivative of the CDF F.

F'(x) = f(x)

A PDF is nonnegative and integrates to 1. By the fundamental theorem of calculus, to get from PDF back to CDF we can integrate:

F(x) = ∫_{−∞}^{x} f(t) dt

[Figure: the PDF and CDF of a continuous r.v.]

To find the probability that a CRV takes on a value in an interval, integrate the PDF over that interval.

F(b) − F(a) = ∫_{a}^{b} f(x) dx

How do I find the expected value of a CRV? Analogous to the discrete case, where you sum x times the PMF, for CRVs you integrate x times the PDF.

E(X) = ∫_{−∞}^{∞} x f(x) dx
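A minimal Python sketch of the Normal interval formula, using Φ(z) = (1 + erf(z/√2))/2 (the parameters and endpoints are arbitrary):

from math import erf, sqrt

def Phi(z):
    # standard Normal CDF
    return 0.5 * (1 + erf(z / sqrt(2)))

mu, sigma = 10.0, 2.0
a, b = 9.0, 13.0
print(Phi((b - mu) / sigma) - Phi((a - mu) / sigma))   # P(a <= X <= b), approx 0.62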
LOTUS

Expected value of a function of an r.v. The expected value of X is defined this way:

E(X) = Σ_x x P(X = x)  (for discrete X)

E(X) = ∫_{−∞}^{∞} x f(x) dx  (for continuous X)

The Law of the Unconscious Statistician (LOTUS) states that you can find the expected value of a function of a random variable, g(X), in a similar way, by replacing the x in front of the PMF/PDF by g(x) but still working with the PMF/PDF of X:

E(g(X)) = Σ_x g(x) P(X = x)  (for discrete X)

E(g(X)) = ∫_{−∞}^{∞} g(x) f(x) dx  (for continuous X)

What's a function of a random variable? A function of a random variable is also a random variable. For example, if X is the number of bikes you see in an hour, then g(X) = 2X is the number of bike wheels you see in that hour and h(X) = (X choose 2) = X(X − 1)/2 is the number of pairs of bikes such that you see both of those bikes in that hour.

What's the point? You don't need to know the PMF/PDF of g(X) to find its expected value. All you need is the PMF/PDF of X.
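A minimal Python sketch of discrete LOTUS (the fair-die PMF and the choice g(x) = x(x − 1)/2 are arbitrary): E(g(X)) is computed directly from the PMF of X, without ever finding the PMF of g(X).

from fractions import Fraction

pmf = {x: Fraction(1, 6) for x in range(1, 7)}   # X = value of a fair die

def g(x):
    return Fraction(x * (x - 1), 2)

print(sum(g(x) * px for x, px in pmf.items()))   # E(g(X)) = 35/6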
Universality of Uniform (UoU)

When you plug any CRV into its own CDF, you get a Uniform(0,1) random variable. When you plug a Uniform(0,1) r.v. into an inverse CDF, you get an r.v. with that CDF. For example, let's say that a random variable X has CDF

F(x) = 1 − e^(−x), for x > 0

By UoU, if we plug X into this function then we get a uniformly distributed random variable.

F(X) = 1 − e^(−X) ∼ Unif(0, 1)

Similarly, if U ∼ Unif(0, 1) then F^(−1)(U) has CDF F. The key point is that for any continuous random variable X, we can transform it into a Uniform random variable and back by using its CDF.
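A minimal Python sketch of both directions for this particular CDF (the sample size and the checks are arbitrary): F^(−1)(U) = −log(1 − U) generates X with CDF F, and plugging X back into F gives values that behave like Uniform(0, 1).

import random
from math import exp, log

random.seed(2)
us = [random.random() for _ in range(100_000)]   # U ~ Unif(0, 1)
xs = [-log(1 - u) for u in us]                    # X = F^{-1}(U) has CDF F(x) = 1 - e^{-x}
fs = [1 - exp(-x) for x in xs]                    # F(X), which should look Uniform(0, 1)

print(sum(1 for f in fs if f < 0.3) / len(fs))    # approx 0.3
print(sum(fs) / len(fs))                          # approx 0.5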
Moments and MGFs

Moments

Moments describe the shape of a distribution. Let X have mean µ and standard deviation σ, and Z = (X − µ)/σ be the standardized version of X. The kth moment of X is µk = E(X^k) and the kth standardized moment of X is mk = E(Z^k). The mean, variance, skewness, and kurtosis are important summaries of the shape of a distribution.

Mean E(X) = µ1

Variance Var(X) = µ2 − µ1^2

Skewness Skew(X) = m3

Kurtosis Kurt(X) = m4 − 3
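A minimal Python sketch (the Exponential(1) choice is arbitrary) estimating the standardized moments from a sample; for Exponential(1) the skewness is 2 and the excess kurtosis is 6, so the estimates should land near those values.

import random

random.seed(3)
xs = [random.expovariate(1.0) for _ in range(200_000)]

n = len(xs)
mu = sum(xs) / n
sigma = (sum((x - mu) ** 2 for x in xs) / n) ** 0.5
zs = [(x - mu) / sigma for x in xs]

print(sum(z ** 3 for z in zs) / n)        # m3 estimate (skewness), roughly 2
print(sum(z ** 4 for z in zs) / n - 3)    # m4 - 3 estimate (excess kurtosis), roughly 6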