1. Set: a collection of members
   S = {1, 2, 3, 4, 5, 6}, the integers from 1 to 6
2. A ⊆ B: A is a subset of B
3. Union (∪) OR +
   Intersection (∩) AND *
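A minimal R sketch of these set operations (the sets A and B are made up for illustration):
S <- 1:6                  # S = {1,2,3,4,5,6}
A <- c(2, 4); B <- c(4, 5, 6)
all(A %in% S)             # TRUE: A is a subset of S
union(A, B)               # 2 4 5 6  (OR)
intersect(A, B)           # 4        (AND)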
Chapter 2
Choose - Order does not matter. R: choose(n, k)
Permutation - Order matters. R: prod((n-k+1):n), i.e. n!/(n-k)!
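For example, choosing or arranging 2 items out of 6 (numbers picked for illustration):
choose(6, 2)                   # combinations, order does not matter: 15
prod((6-2+1):6)                # permutations, order matters: 5*6 = 30
factorial(6) / factorial(6-2)  # the same permutation count: 30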
Independence test: P(A|B) = P(A ∩ B) / P(B) = P(A)
Mutually exclusive test: P(A ∩ B) = 0
Addition rule (general): P(A ∪ B) = P(A) + P(B) − P(A ∩ B)
Addition rule (mutually exclusive): P(A ∪ B) = P(A) + P(B)
Conditional probability: P(A|B)·P(B) = P(B|A)·P(A) = P(A ∩ B)
Independence: P(A ∩ B) = P(A)·P(B)
R test for P(X|Y), e.g. P(X = 1 | Y = 2):
sum(tbl$probability[tbl$x == 1 & tbl$y == 2]) / sum(tbl$probability[tbl$y == 2])
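A worked version of the R test, assuming a data frame tbl with columns x, y and probability (the joint probabilities below are invented):
tbl <- expand.grid(x = 1:2, y = 1:2)
tbl$probability <- c(0.10, 0.20, 0.30, 0.40)   # hypothetical joint p.m.f., sums to 1
sum(tbl$probability[tbl$x == 1 & tbl$y == 2]) / sum(tbl$probability[tbl$y == 2])   # P(X=1 | Y=2) ≈ 0.43
p_joint <- sum(tbl$probability[tbl$x == 1 & tbl$y == 2])
p_joint == sum(tbl$probability[tbl$x == 1]) * sum(tbl$probability[tbl$y == 2])     # FALSE here: not independent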
Experiment: the process from which we obtain data.
Sample space: the collection of all possible outcomes of an experiment.
Event: a collection of some of the outcomes in a sample space.
Fallacy: the theoretical probability refers to indefinite repetition of the experiment; if we repeat the experiment infinitely often, the observed relative frequency gets closer and closer to the theoretical value (see the simulation sketch below).
A simple sample space is one in which every outcome has the same probability.
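A simulation sketch of that long-run idea, using an arbitrary number of die rolls:
set.seed(1)
rolls <- sample(1:6, 10000, replace = TRUE)   # repeat the experiment many times
mean(rolls == 6)                              # relative frequency of a six, close to 1/6 ≈ 0.167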
Chapter 3
dbinom: binomial p.m.f. (2 outcomes per trial), e.g. sum(dbinom(40:45, 50, 0.7))
dunif: uniform density (non-cumulative); punif: uniform c.d.f.
dnorm: normal p.d.f. f(x); pnorm: normal c.d.f. F(q); qnorm: inverse of the c.d.f. F(q); rnorm: random draws
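Hedged usage examples (all parameter values are arbitrary):
sum(dbinom(40:45, 50, 0.7))                 # P(40 <= X <= 45) for X ~ Binomial(50, 0.7)
pbinom(45, 50, 0.7) - pbinom(39, 50, 0.7)   # same probability via the c.d.f.
dnorm(0); pnorm(1.96); qnorm(0.975)         # p.d.f. f(0), c.d.f. F(1.96) ≈ 0.975, inverse c.d.f.
rnorm(3)                                    # three random draws from N(0, 1)
dunif(0.5, 0, 1); punif(0.5, 0, 1)          # uniform density and c.d.f. at 0.5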
Chapter 4
E(X1 + X2 + X3) = E(X1) + E(X2) + E(X3)
E(aX1 + bX2 + c) = aE(X1) + bE(X2) + c
E(X1·X2) = E(X1)·E(X2), if independent
Var(X1 + X2) = Var(X1) + Var(X2), if independent
Var(aX + c) = a²Var(X)
Variance measures the spread of the distribution.
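A quick simulation check of these rules (distributions and constants chosen arbitrarily):
set.seed(2)
x1 <- rnorm(1e5, mean = 1); x2 <- rnorm(1e5, mean = 2)   # independent draws
mean(3*x1 + 2*x2 + 5); 3*mean(x1) + 2*mean(x2) + 5       # both close to 12
var(x1 + x2); var(x1) + var(x2)                          # close, since x1 and x2 are independent
var(3*x1 + 5); 9*var(x1)                                 # Var(aX + c) = a²Var(X)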
Uniform (die): equal chance. E(X) = (n+1)/2, Var(X) = (n−1)(n+1)/12
Bernoulli (coin): 2 outcomes, P(X=1) = p, P(X=0) = 1−p. E(X) = p, Var(X) = p(1−p)
Binomial (MCQ): finite trials, P(X=x) = nCx · p^x · (1−p)^(n−x). E(X) = np, Var(X) = np(1−p)
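A numerical check of this table, e.g. for a fair six-sided die and a Binomial(10, 0.3) (parameters chosen for illustration):
# Fair die (discrete uniform on 1..6): E(X) = 3.5, Var(X) = (6-1)(6+1)/12 ≈ 2.92
x <- 1:6; fx <- rep(1/6, 6)
sum(x*fx); sum(fx*(x - sum(x*fx))^2)
# Binomial(10, 0.3): E(X) = np = 3, Var(X) = np(1-p) = 2.1
x <- 0:10; fx <- dbinom(x, 10, 0.3)
sum(x*fx); sum(fx*(x - sum(x*fx))^2)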
An appliance has a maximum lifetime of one year. The time X until it fails is a random variable with a continuous distribution with the p.d.f.
Given: f(x) = 2x for 0 < x < 1, and 0 otherwise
Ans: E(X) = ∫₀¹ x f(x) dx = ∫₀¹ x(2x) dx ≈ 0.667
R: fx <- function(x) x*(2*x)    # multiply by x in front of the p.d.f.
   integrate(fx, 0, 1)          # 0.667
A product has a warranty of one year. Let X be the time at which the product fails. Suppose that X has a continuous distribution with the p.d.f.
Given: f(x) = 0 for x < 1, and 3/x³ for x ≥ 1
Ans: E(X) = ∫₁^∞ x f(x) dx = ∫₁^∞ (3/x²) dx = 3
R: fx <- function(x) x*3/(x^3)
   integrate(fx, 1, Inf)        # 3 (expected value)
Mean and standard deviation of a discrete distribution in R:
x <- c(1, 2, 3, 4, 5)
fx <- rep(0.2, 5)               # or c(0.1, 0.3, 0.2, 0.3, 0.1)
xfx <- x*fx
m <- sum(xfx)                   # mean
var <- sum(fx*(x - m)^2)        # variance
sigma <- sqrt(var)              # standard deviation
Chapter 5
A bivariate distribution shows the probabilities for different value combinations of two random variables X and Y.
If X and Y are statistically independent, their bivariate (joint) distribution is f(x)·f(y): the probability of picking a value for X times the probability of independently picking a value for Y.
Check for independence in a bivariate distribution: P(A ∩ B) = P(A)·P(B) must hold for every combination of values.
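A sketch of that check on a full bivariate table in R, using an invented 3×2 joint distribution (rows are x values, columns are y values):
joint <- matrix(c(0.10, 0.20,
                  0.15, 0.30,
                  0.05, 0.20), nrow = 3, byrow = TRUE)  # hypothetical joint probabilities, sum to 1
px <- rowSums(joint); py <- colSums(joint)              # marginal distributions of X and Y
isTRUE(all.equal(joint, outer(px, py)))                 # TRUE only if f(x,y) = f(x)·f(y) in every cell (FALSE here)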
Load file: > car = read.csv(file.choose(), header = T)
Note: R's var() and cov() are the sample versions.
Covariance:
Cov(X, Y) = E[(X − μX)(Y − μY)]
(X − μX): deviation of X from its mean; (Y − μY): deviation of Y from its mean
Positive correlation: Cov(X, Y) > 0; negative correlation: Cov(X, Y) < 0; no correlation: Cov(X, Y) = 0
Covariance is an expectation: its value is a probability-weighted average of the random variable inside the expectation.
A large positive covariance indicates a strong positive relationship: the points cluster around an upward-sloping line.
Variance of sums:
Var(X1 + X2) = Var(X1) + Var(X2), if independent
Otherwise, Var(aX1 + bX2 + c) = a²Var(X1) + b²Var(X2) + 2ab·Cov(X1, X2)
Equivalently, Var(aX1 + bX2) = a²Var(X1) + b²Var(X2) + 2ab·ρ·σX1·σX2
Correlation coefficient: ρ = Cov(X, Y) / (σX·σY), takes values from −1 to +1
In R Studio, the sample Cov(X, Y) has to be converted to the population Cov(X, Y):
Cov_population(X, Y) = Cov_sample(X, Y) · (n − 1)/n
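A small sketch of these quantities in R (x and y are invented paired data):
x <- c(2, 4, 6, 8, 10); y <- c(1, 3, 2, 7, 9)
n <- length(x)
cov(x, y)                    # sample covariance (divides by n - 1)
cov(x, y) * (n - 1) / n      # converted to the population covariance
cor(x, y)                    # correlation coefficient, between -1 and +1
cov(x, y) / (sd(x) * sd(y))  # same value, from ρ = Cov(X,Y) / (σX·σY)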
rowsum(zfile$probabilities, group = zfile$x): groups and sums the probabilities by the values of x (probabilities is a column).
sum(file$probabilities[file$x > file$y]): probability that X > Y.
sum(file$probabilities[file$x == 1]): probability that X = 1.
sum(file$probabilities[file$x <= 2 & file$y <= 2]): probability that X <= 2 and Y <= 2.
Covariance ≠ causality.
Correlation is about the observed co-movement of two random variables, while causality is about understanding the underlying process/mechanism of the co-movement (which can involve a third variable or more).
Chapter 6
Statistical inference is the use of partial data (a sample) to draw conclusions about the population.
1. Assume the income distribution in the world follows a normal distribution.
2. Collect income data from a number of working adults.
3. Use the data to estimate the shape and location of the hypothetical normal distribution, such as the population-level average income (see the sketch below).
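A toy version of steps 2-3, with simulated income data standing in for real observations (all numbers invented):
set.seed(3)
income <- rnorm(200, mean = 4000, sd = 800)   # stand-in for the collected income data
mean(income); sd(income)                      # estimates of the population average and spread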
Random sample of size n: the observations are (1) mutually independent and (2) each has the same marginal distribution.
Independent and Identically Distributed (i.i.d.): (1) identical distribution, (2) independence.
Random integers: > sample(1:24, n)
Random sequence of numbers: > runif(n, min = 1, max = 10) / rbinom / rnorm
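For example (seed and parameters arbitrary):
set.seed(4)
sample(1:24, 5)                    # 5 random integers between 1 and 24
runif(5, min = 1, max = 10)        # 5 draws from Uniform(1, 10)
rbinom(5, size = 10, prob = 0.5)   # 5 binomial draws
rnorm(5, mean = 0, sd = 1)         # 5 normal draws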
Properties of data in R:
> numbers = c(1, 2, 3, 4, 5, 6, 7, 8, 9)
> summary(numbers)            # min, Q1, median, mean, Q3, max
> IQR(numbers)                # Q3 − Q1
> library("datasets")         # ChickWeight is in "datasets"
> boxplot(chick$weight ~ chick$diet)   # Y = weight, X = diet type
The median helps reduce the influence of outliers, while the mean uses every value in the sample.
Boxplot whiskers: lower limit = the higher of (Q1 − 1.5·IQR) and the minimum; upper limit = the lower of (Q3 + 1.5·IQR) and the maximum.
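A quick numerical check of the whisker-limit rule for the numbers vector (base R quantile defaults assumed; boxplot() itself uses hinges, which can differ slightly):
numbers <- c(1, 2, 3, 4, 5, 6, 7, 8, 9)
q1 <- quantile(numbers, 0.25); q3 <- quantile(numbers, 0.75)   # Q1 = 3, Q3 = 7
iqr <- q3 - q1                                                 # same as IQR(numbers)
max(q1 - 1.5*iqr, min(numbers))                                # lower whisker limit = 1
min(q3 + 1.5*iqr, max(numbers))                                # upper whisker limit = 9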