Definition 4.2.1 Joint pdf
The joint pdf of the 𝑘-dimensional discrete random variable 𝑋 = (𝑋1 , … , 𝑋𝑘 ) is defined as:
𝑓𝑋1 ,..,𝑋𝑘 (𝑥1 , … , 𝑥𝑘 ) = 𝑃[𝑋1 = 𝑥1 , … , 𝑋𝑘 = 𝑥𝑘 ]
Theorem 4.2.1/ 4.3.1
A function 𝑓𝑋1 ,..,𝑋𝑘 (𝑥1 , … , 𝑥𝑘 ) is the joint pdf for some vector-valued random variable
𝑋 = (𝑋1 , 𝑋2 , … , 𝑋𝑘 ) if and only if the following properties are satisfied:
• 𝑓𝑋1 ,..,𝑋𝑘 (𝑥1 , … , 𝑥𝑘 ) ⩾ 0 for all possible values (𝑥1 , 𝑥2 , … , 𝑥𝑘 )
• For the discrete case:
$$\sum_{x_1} \cdots \sum_{x_k} f_{X_1,\dots,X_k}(x_1,\dots,x_k) = 1$$
• For the continuous case:
$$\int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} f_{X_1,\dots,X_k}(x_1,\dots,x_k)\,dx_1 \cdots dx_k = 1$$
Definition 4.2.2 Marginal pdf
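Let 𝑋 = (𝑋1 , 𝑋2 , … , 𝑋𝑘 ) be a discrete random vector. The marginal pdf of 𝑋𝑖 is obtained by summing the joint pdf over all other coordinates:
$$f_{X_i}(x_i) = \sum \cdots \sum_{x_j:\, j \neq i} f_{X_1,\dots,X_k}(x_1,\dots,x_k)$$
Let 𝑋 = (𝑋1 , 𝑋2 , … , 𝑋𝑘 ) be a continuous random vector. The marginal pdf of 𝑋𝑖 is obtained by integrating the joint pdf over all other coordinates:
$$f_{X_i}(x_i) = \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} f_{X_1,\dots,X_k}(x_1,\dots,x_k) \prod_{j \neq i} dx_j$$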
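The discrete marginal can be sketched in a few lines of Python. This is only an illustration: the joint pmf table `joint` and the helper name `marginal` are made up for the example and do not come from the notes.

```python
from fractions import Fraction as F

# Hypothetical joint pmf of (X1, X2) on {0, 1} x {0, 1, 2} (illustrative values).
joint = {
    (0, 0): F(1, 12), (0, 1): F(2, 12), (0, 2): F(3, 12),
    (1, 0): F(3, 12), (1, 1): F(2, 12), (1, 2): F(1, 12),
}

def marginal(joint, i):
    """Marginal pmf of coordinate i: sum the joint pmf over all other coordinates."""
    out = {}
    for point, p in joint.items():
        out[point[i]] = out.get(point[i], 0) + p
    return out

f1 = marginal(joint, 0)  # marginal pmf of X1
f2 = marginal(joint, 1)  # marginal pmf of X2
```

Exact fractions are used so that the check that each pmf sums to 1 is not affected by rounding.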
Definition 4.2.3 Joint cdf
The joint cumulative distribution function of 𝑘 random variables 𝑋1 , 𝑋2 , … , 𝑋𝑘 is the function:
𝐹(𝑥1 , … , 𝑥𝑘 ) = 𝑃[𝑋1 ⩽ 𝑥1 , … , 𝑋𝑘 ⩽ 𝑥𝑘 ]
Theorem 𝑪𝑫𝑭 → 𝒑𝒅𝒇 (continuous)
$$f_{X_1,\dots,X_k}(x_1,\dots,x_k) = \frac{\partial^k}{\partial x_1 \cdots \partial x_k} F_{X_1,\dots,X_k}(x_1,\dots,x_k)$$
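As a small worked example for 𝑘 = 2 (the CDF below is chosen for illustration and is not taken from the notes), differentiating once with respect to each argument recovers the joint pdf:

```latex
F_{X_1,X_2}(x_1,x_2) = (1 - e^{-x_1})(1 - e^{-x_2}), \qquad x_1, x_2 > 0
\\[4pt]
\frac{\partial}{\partial x_1} F_{X_1,X_2}(x_1,x_2) = e^{-x_1}(1 - e^{-x_2})
\\[4pt]
f_{X_1,X_2}(x_1,x_2) = \frac{\partial^2}{\partial x_1 \partial x_2} F_{X_1,X_2}(x_1,x_2) = e^{-x_1} e^{-x_2}, \qquad x_1, x_2 > 0
```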
Definition 4.3.3 Marginal cdf (continuous)
If a continuous 𝑘-dimensional vector-valued random variable 𝑋 = (𝑋1 , 𝑋2 , … , 𝑋𝑘 ) has CDF
𝐹𝑋1 ,..,𝑋𝑘 (𝑥1 , … , 𝑥𝑘 ) then the marginal CDF of 𝑋𝑗 is:
$$F_j(x_j) = \lim_{\substack{x_i \to \infty \\ \text{all } i \neq j}} F_{X_1,\dots,X_k}(x_1,\dots,x_j,\dots,x_k)$$
Theorem 4.4.1 Independence
𝑋1 , 𝑋2 , … , 𝑋𝑘 are independent random variables if and only if the following properties hold:
𝐹𝑋1 ,..,𝑋𝑘 (𝑥1 , … , 𝑥𝑘 ) = 𝐹1 (𝑥1 ) ⋯ 𝐹𝑘 (𝑥𝑘 )
𝑓𝑋1 ,..,𝑋𝑘 (𝑥1 , … , 𝑥𝑘 ) = 𝑓1 (𝑥1 ) ⋯ 𝑓𝑘 (𝑥𝑘 )
Note that the right-hand sides are the products of the marginal CDFs and of the marginal pdfs, respectively.
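For a discrete pair, the product condition on the pdfs can be checked point by point. The sketch below assumes a joint pmf stored as a dictionary; the tables `dependent`, `independent` and `triangle` and the function names are hypothetical.

```python
from fractions import Fraction as F
from itertools import product

def marginals(joint):
    """Return the marginal pmfs (f1, f2) of a two-dimensional joint pmf."""
    f1, f2 = {}, {}
    for (x1, x2), p in joint.items():
        f1[x1] = f1.get(x1, 0) + p
        f2[x2] = f2.get(x2, 0) + p
    return f1, f2

def is_independent(joint):
    """True if f(x1, x2) == f1(x1) * f2(x2) at every point of the product grid."""
    f1, f2 = marginals(joint)
    # Points missing from the table have joint probability 0.
    return all(joint.get((x1, x2), 0) == f1[x1] * f2[x2]
               for x1, x2 in product(f1, f2))

dependent = {
    (0, 0): F(1, 12), (0, 1): F(2, 12), (0, 2): F(3, 12),
    (1, 0): F(3, 12), (1, 1): F(2, 12), (1, 2): F(1, 12),
}
independent = {(x1, x2): F(1, 2) * F(1, 3) for x1 in (0, 1) for x2 in (0, 1, 2)}
triangle = {(0, 0): F(1, 3), (0, 1): F(1, 3), (1, 1): F(1, 3)}  # non-rectangular support
```

Here `is_independent(independent)` holds, while the other two tables fail the check. The `triangle` table fails because the point (1, 0) has probability 0 although both marginals are positive there.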
Before the next theorem we define a Cartesian product:
Let 𝐴, 𝐵 be sets. Then the Cartesian product of 𝐴 and 𝐵 is denoted: 𝐴 × 𝐵 = {(𝑎, 𝑏): 𝑎 ∈ 𝐴, 𝑏 ∈ 𝐵}
For example, ℝ² = ℝ × ℝ = {(𝑥, 𝑦): 𝑥 ∈ ℝ, 𝑦 ∈ ℝ}, so every point (𝑥, 𝑦) ∈ ℝ² is an element of this Cartesian product.
Theorem 4.4.2 Independence
Two random variables 𝑋1 , 𝑋2 with joint pdf 𝑓𝑋1 𝑋2 (𝑥1 , 𝑥2 ) are independent ⇔
• The support set {(𝑥1 , 𝑥2 ) ∣ 𝑓𝑋1 𝑋2 (𝑥1 , 𝑥2 ) > 0} is a Cartesian product 𝐴 × 𝐵 (so a rectangle)
and
• The joint pdf can be factorized into the product of functions of 𝑥1 and 𝑥2 :
𝑓𝑋1 𝑋2 (𝑥1 , 𝑥2 ) = 𝑔𝑋1 (𝑥1 )ℎ𝑋2 (𝑥2 )
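Both conditions are needed. As an illustration (the two densities below are standard textbook-style examples and are not taken from the notes), compare a density on a triangle with one on the unit square:

```latex
f(x_1,x_2) = 8 x_1 x_2, \quad 0 < x_1 < x_2 < 1
\quad\Rightarrow\quad \text{factorizes, but the support is a triangle, so } X_1, X_2 \text{ are not independent}
\\[4pt]
f_1(x_1) = \int_{x_1}^{1} 8 x_1 x_2 \, dx_2 = 4 x_1 (1 - x_1^2), \qquad
f_2(x_2) = \int_{0}^{x_2} 8 x_1 x_2 \, dx_1 = 4 x_2^3, \qquad
f_1(x_1) f_2(x_2) \neq 8 x_1 x_2
\\[4pt]
f(x_1,x_2) = 4 x_1 x_2, \quad (x_1,x_2) \in (0,1) \times (0,1)
\quad\Rightarrow\quad f = (2 x_1)(2 x_2) \text{ on a rectangle, so } X_1, X_2 \text{ are independent}
```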
Definition Conditional Distribution
For both the discrete and the continuous case, the conditional pdf of 𝑋2 given 𝑋1 = 𝑥1 is defined, for all 𝑥1 with 𝑓𝑋1 (𝑥1 ) > 0, as follows:
$$f_{X_2 \mid X_1}(x_2 \mid x_1) = \frac{f_{X_1 X_2}(x_1, x_2)}{f_{X_1}(x_1)}$$
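In the discrete case the definition amounts to renormalizing one "row" of the joint pmf. A minimal sketch, assuming a made-up joint pmf table `joint` and a hypothetical helper `conditional_x2_given_x1`:

```python
from fractions import Fraction as F

# Hypothetical joint pmf of (X1, X2) (illustrative values).
joint = {
    (0, 0): F(1, 12), (0, 1): F(2, 12), (0, 2): F(3, 12),
    (1, 0): F(3, 12), (1, 1): F(2, 12), (1, 2): F(1, 12),
}

def conditional_x2_given_x1(joint, x1):
    """Conditional pmf of X2 given X1 = x1: joint pmf divided by the marginal of X1."""
    f1 = sum(p for (a, _), p in joint.items() if a == x1)
    if f1 == 0:
        raise ValueError("conditioning event has probability zero")
    return {b: p / f1 for (a, b), p in joint.items() if a == x1}

cond = conditional_x2_given_x1(joint, 0)
```

The result is itself a pmf in 𝑥2: its values are non-negative and sum to 1.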
Definition Order Statistics and Empirical Distribution Functions
Let 𝑋1 , … , 𝑋𝑛 be a random sample, then we can rearrange this sample from smallest to largest:
𝑋1:𝑛 ≤ 𝑋2:𝑛 ≤ ⋯ ≤ 𝑋𝑛:𝑛
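These are called the order statistics of the sample.
From these follow the empirical CDF and the empirical pdf of the sample:
$$F_n(x) = \frac{1}{n} \sum_{i=1}^{n} \mathbb{I}_{(-\infty,x]}(X_{i:n}), \qquad f_n(X_{i:n}) = \frac{1}{n}, \quad i = 1,\dots,n$$
Note that for 𝑥 ∈ ℝ, where 𝐹 denotes the common CDF of the 𝑋𝑖 , we have that
$$n F_n(x) = \sum_{i=1}^{n} \mathbb{I}_{(-\infty,x]}(X_{i:n}) = \sum_{i=1}^{n} \mathbb{I}_{(-\infty,x]}(X_i) \sim \mathrm{Bin}(n, F(x))$$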
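The empirical CDF is just the fraction of observations that are at most 𝑥. A minimal sketch, with made-up data and a hypothetical helper name `ecdf`:

```python
from bisect import bisect_right

def ecdf(sample):
    """Return the empirical CDF F_n of the given sample as a function of x."""
    ordered = sorted(sample)  # order statistics X_{1:n} <= ... <= X_{n:n}
    n = len(ordered)

    def F_n(x):
        # bisect_right counts how many order statistics are <= x.
        return bisect_right(ordered, x) / n

    return F_n

sample = [2.3, 0.7, 1.5, 0.7, 3.1]  # made-up observations
F_n = ecdf(sample)
```

Sorting once and using binary search makes each evaluation of `F_n` logarithmic in 𝑛. Tied observations, such as the two values 0.7 here, each contribute a jump of 1/𝑛.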
Definition Sample Mean and Variance
The sample mean of a random sample 𝑋1 , … , 𝑋𝑛 is given by:
$$\bar{X} = \frac{1}{n} \sum_{i=1}^{n} X_i$$
The sample variance is given by:
$$S^2 = \frac{1}{n-1} \sum_{i=1}^{n} (X_i - \bar{X})^2$$
For observed values 𝑥1 , … , 𝑥𝑛 the same formulas give the realizations 𝑥̅ and 𝑠².
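Both formulas translate directly into code. The sketch below uses made-up data and compares the result with `statistics.variance` from the Python standard library, which also divides by 𝑛 − 1.

```python
import statistics

def sample_mean(xs):
    """Arithmetic mean of the observations."""
    return sum(xs) / len(xs)

def sample_variance(xs):
    """Sample variance with the n - 1 denominator."""
    m = sample_mean(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # made-up observations
mean = sample_mean(data)
var = sample_variance(data)
```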
Theorem 5.2.1
If 𝑋 = (𝑋1 , … , 𝑋𝑘 ) has joint pdf 𝑓(𝑥1 , … , 𝑥𝑘 ) and if 𝑌 = 𝑢(𝑋1 , … , 𝑋𝑘 ) is a function of 𝑋, then:
$$E(Y) = E_X[u(X_1,\dots,X_k)] = \sum_{x_1} \cdots \sum_{x_k} u(x_1,\dots,x_k)\, f(x_1,\dots,x_k)$$
if 𝑋 is discrete, and
$$E(Y) = E_X[u(X_1,\dots,X_k)] = \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} u(x_1,\dots,x_k)\, f(x_1,\dots,x_k)\,dx_1 \cdots dx_k$$
if 𝑋 is continuous.
Definition 5.2.1 Covariance
The covariance of a pair of random variables 𝑋 and 𝑌 is defined by:
Cov(𝑋, 𝑌) = 𝐸[(𝑋 − 𝜇𝑋 )(𝑌 − 𝜇𝑌 )] = 𝐸[𝑋𝑌] − 𝐸[𝑋]𝐸[𝑌]
The covariance is also denoted by 𝜎𝑋𝑌
Now if 𝑋 and 𝑌 are independent it follows that
Cov(𝑋, 𝑌) = 𝐸[𝑋𝑌] − 𝐸[𝑋]𝐸[𝑌] = 𝐸[𝑋]𝐸[𝑌] − 𝐸[𝑋]𝐸[𝑌] = 0
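The converse does not hold: zero covariance does not imply independence. A small sketch with an example chosen for illustration (𝑋 uniform on {−1, 0, 1} and 𝑌 = 𝑋²; the helper `expect` is hypothetical):

```python
from fractions import Fraction as F

# X is uniform on {-1, 0, 1} and Y = X**2, so Y is a function of X.
joint = {(-1, 1): F(1, 3), (0, 0): F(1, 3), (1, 1): F(1, 3)}

def expect(u):
    """E[u(X, Y)] computed by summing u(x, y) * f(x, y) over the support."""
    return sum(u(x, y) * p for (x, y), p in joint.items())

cov = expect(lambda x, y: x * y) - expect(lambda x, y: x) * expect(lambda x, y: y)

# Independence would require P[X=0, Y=0] == P[X=0] * P[Y=0].
p_x0 = sum(p for (x, _), p in joint.items() if x == 0)
p_y0 = sum(p for (_, y), p in joint.items() if y == 0)
```

The covariance is exactly 0, yet `joint[(0, 0)]` is 1/3 while `p_x0 * p_y0` is 1/9, so 𝑋 and 𝑌 are dependent.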
Theorem 5.2.6
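If 𝑋1 , … , 𝑋𝑘 are random variables with joint pdf 𝑓𝑋1 ,…,𝑋𝑘 (𝑥1 , … , 𝑥𝑘 ) and 𝑎1 , … , 𝑎𝑘 are constants, then
$$\operatorname{Var}\left(\sum_{i=1}^{k} a_i X_i\right) = \sum_{i=1}^{k} a_i^2 \operatorname{Var}(X_i) + 2 \sum\sum_{i<j} a_i a_j \operatorname{Cov}(X_i, X_j)$$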
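For two variables the formula reduces to the familiar special cases below, written out here as an illustration:

```latex
\operatorname{Var}(a_1 X_1 + a_2 X_2) = a_1^2 \operatorname{Var}(X_1) + a_2^2 \operatorname{Var}(X_2) + 2 a_1 a_2 \operatorname{Cov}(X_1, X_2)
\\[4pt]
\operatorname{Var}(X_1 - X_2) = \operatorname{Var}(X_1) + \operatorname{Var}(X_2) - 2 \operatorname{Cov}(X_1, X_2)
\\[4pt]
X_1, X_2 \text{ independent} \;\Rightarrow\; \operatorname{Var}(X_1 \pm X_2) = \operatorname{Var}(X_1) + \operatorname{Var}(X_2)
```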
Theorem Bilinearity property of Covariance
$$\operatorname{Cov}\left(\sum_{i=1}^{k} a_i X_i,\ \sum_{j=1}^{\ell} b_j Y_j\right) = \sum_{i=1}^{k} \sum_{j=1}^{\ell} a_i b_j \operatorname{Cov}(X_i, Y_j)$$
Definition Correlation Coefficient
The correlation coefficient of random variables 𝑋 and 𝑌 is defined by:
$$\rho_{XY} = \frac{\sigma_{XY}}{\sigma_X \sigma_Y} = \frac{\operatorname{Cov}(X, Y)}{\sqrt{\operatorname{Var}(X)\operatorname{Var}(Y)}}$$
Theorem 5.3.1
For the correlation coefficient of random variables 𝑋 and 𝑌, it holds that:
−1 ≤ 𝜌𝑋𝑌 ≤ 1
Note: 𝜌𝑋𝑌 = ±1 ⇔ 𝑃(𝑌 = 𝑎𝑋 + 𝑏) = 1 for some 𝑎, 𝑏 ∈ ℝ with 𝑎 ≠ 0
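The correlation coefficient of a discrete pair can be computed from the joint pmf using the covariance and variance formulas. A minimal sketch with a made-up table (the helper `expect` is hypothetical):

```python
import math

# Hypothetical joint pmf of (X, Y) (illustrative values).
joint = {
    (0, 0): 1 / 12, (0, 1): 2 / 12, (0, 2): 3 / 12,
    (1, 0): 3 / 12, (1, 1): 2 / 12, (1, 2): 1 / 12,
}

def expect(u):
    """E[u(X, Y)] as a sum over the support of the joint pmf."""
    return sum(u(x, y) * p for (x, y), p in joint.items())

mu_x = expect(lambda x, y: x)
mu_y = expect(lambda x, y: y)
cov = expect(lambda x, y: x * y) - mu_x * mu_y
var_x = expect(lambda x, y: x * x) - mu_x ** 2
var_y = expect(lambda x, y: y * y) - mu_y ** 2
rho = cov / math.sqrt(var_x * var_y)
```

For this table the correlation is negative, since larger values of 𝑋 put more mass on smaller values of 𝑌, and it lies strictly inside [−1, 1] because 𝑌 is not a linear function of 𝑋.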
Definition 5.4.1 Conditional Expectation
If 𝑋 and 𝑌 are jointly distributed random variables, then the conditional expectation of 𝑌 given 𝑋 = 𝑥 is given by:
$$E[Y \mid X = x] = \sum_{y} y\, f_{Y \mid X}(y \mid x) \quad \text{if } X \text{ and } Y \text{ are discrete}$$
$$E[Y \mid X = x] = \int_{-\infty}^{\infty} y\, f_{Y \mid X}(y \mid x)\,dy \quad \text{if } X \text{ and } Y \text{ are continuous}$$
Theorem 5.4.1 Law of Iterated Expectations (𝑳𝑰𝑬)
𝐸[𝑌] = 𝐸[𝐸[𝑌|𝑋]]
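Proof (continuous case):
$$\begin{aligned}
E[E[Y \mid X]] &= \int_{-\infty}^{\infty} E[Y \mid X = x]\, f_X(x)\,dx \\
&= \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} y\, f_{Y \mid X}(y \mid x)\, f_X(x)\,dx\,dy \\
&= \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} y\, \frac{f_{X,Y}(x, y)}{f_X(x)}\, f_X(x)\,dx\,dy \\
&= \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} y\, f_{X,Y}(x, y)\,dx\,dy \\
&= \int_{-\infty}^{\infty} y \left( \int_{-\infty}^{\infty} f_{X,Y}(x, y)\,dx \right) dy \\
&= \int_{-\infty}^{\infty} y\, f_Y(y)\,dy \\
&= E[Y]
\end{aligned}$$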
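The discrete version of the identity can be verified numerically. The sketch below assumes a made-up joint pmf; the names `joint`, `f_x` and `cond_exp_y` are hypothetical.

```python
from fractions import Fraction as F

# Hypothetical joint pmf, keyed by (x, y) (illustrative values).
joint = {
    (0, 0): F(1, 12), (0, 1): F(2, 12), (0, 2): F(3, 12),
    (1, 0): F(3, 12), (1, 1): F(2, 12), (1, 2): F(1, 12),
}

# Marginal pmf of X.
f_x = {}
for (x, _), p in joint.items():
    f_x[x] = f_x.get(x, 0) + p

def cond_exp_y(x):
    """E[Y | X = x] = sum over y of y * f(x, y) / f_X(x)."""
    return sum(y * p for (a, y), p in joint.items() if a == x) / f_x[x]

e_y = sum(y * p for (_, y), p in joint.items())                  # E[Y]
e_cond = sum(cond_exp_y(x) * px for x, px in f_x.items())        # E[E[Y | X]]
```

Here the two conditional means differ (4/3 and 2/3), but averaging them with the weights 𝑓𝑋 (𝑥) gives back 𝐸[𝑌] = 1.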
Definition 5.4.2 Conditional Variance
The conditional variance of 𝑌 given 𝑋 = 𝑥 is given by:
$$\operatorname{Var}(Y \mid X = x) = E\big[(Y - E(Y \mid X = x))^2 \mid X = x\big]$$
Equivalently,
$$\operatorname{Var}(Y \mid X = x) = E(Y^2 \mid X = x) - [E(Y \mid X = x)]^2$$
Theorem 5.4.3 Law of Total Variance
Var(𝑌) = 𝐸(Var( 𝑌 ∣ 𝑋 )) + Var(𝐸( 𝑌 ∣ 𝑋 ))
Proof
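By the law of iterated expectations, 𝐸[𝐸[𝑌²|𝑋]] = 𝐸[𝑌²] and 𝐸[𝐸[𝑌|𝑋]] = 𝐸[𝑌], so:
$$E[\operatorname{Var}(Y \mid X)] = E[E[Y^2 \mid X]] - E[(E[Y \mid X])^2] = E[Y^2] - E[(E[Y \mid X])^2]$$
$$\operatorname{Var}(E[Y \mid X]) = E[(E[Y \mid X])^2] - (E[E[Y \mid X]])^2 \iff \operatorname{Var}(E[Y \mid X]) + (E[Y])^2 = E[(E[Y \mid X])^2]$$
It follows that:
$$E[\operatorname{Var}(Y \mid X)] = E[Y^2] - E[(E[Y \mid X])^2]$$
$$\iff E[\operatorname{Var}(Y \mid X)] = E[Y^2] - (E[Y])^2 - \operatorname{Var}(E[Y \mid X])$$
$$\iff E[\operatorname{Var}(Y \mid X)] + \operatorname{Var}(E[Y \mid X]) = \operatorname{Var}(Y)$$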
∎
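The decomposition can be checked on a small discrete example. The sketch below assumes a made-up joint pmf; all names are hypothetical.

```python
from fractions import Fraction as F

# Hypothetical joint pmf, keyed by (x, y) (illustrative values).
joint = {
    (0, 0): F(1, 12), (0, 1): F(2, 12), (0, 2): F(3, 12),
    (1, 0): F(3, 12), (1, 1): F(2, 12), (1, 2): F(1, 12),
}

# Marginal pmf of X.
f_x = {}
for (x, _), p in joint.items():
    f_x[x] = f_x.get(x, 0) + p

def cond_moment(x, r):
    """E[Y**r | X = x] computed from the joint pmf."""
    return sum(y ** r * p for (a, y), p in joint.items() if a == x) / f_x[x]

cond_mean = {x: cond_moment(x, 1) for x in f_x}
cond_var = {x: cond_moment(x, 2) - cond_mean[x] ** 2 for x in f_x}

# Var(Y) computed directly from the marginal moments of Y.
e_y = sum(y * p for (_, y), p in joint.items())
var_y = sum(y ** 2 * p for (_, y), p in joint.items()) - e_y ** 2

# The two terms of the decomposition.
mean_of_cond_var = sum(cond_var[x] * f_x[x] for x in f_x)        # E[Var(Y | X)]
mean_of_cond_mean = sum(cond_mean[x] * f_x[x] for x in f_x)
var_of_cond_mean = (sum(cond_mean[x] ** 2 * f_x[x] for x in f_x)
                    - mean_of_cond_mean ** 2)                    # Var(E[Y | X])
```

For this table the "within" term is 5/9 and the "between" term is 1/9, which add up to Var(𝑌) = 2/3.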
Definition 5.5.1 Joint MGF
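If the joint 𝑀𝐺𝐹 of 𝑋 = (𝑋1 , … , 𝑋𝑘 ) exists, it is defined, for 𝑡 = (𝑡1 , … , 𝑡𝑘 ), to be:
$$\mathrm{MGF}_{X_1,\dots,X_k}(t) = E\left[e^{t_1 X_1 + \cdots + t_k X_k}\right]$$
To obtain the marginal MGF of, say, 𝑋𝑖 , we set 𝑡𝑗 = 0 for all 𝑗 ≠ 𝑖:
$$\mathrm{MGF}_{X_i}(t_i) = E\left[e^{0 X_1 + \cdots + t_i X_i + \cdots + 0 X_k}\right] = E\left[e^{t_i X_i}\right]$$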
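For a discrete pair, the step of setting the other argument to zero can be checked numerically. A minimal sketch, assuming a made-up joint pmf; the function names are hypothetical.

```python
import math

# Hypothetical joint pmf of (X1, X2) (illustrative values).
joint = {
    (0, 0): 1 / 12, (0, 1): 2 / 12, (0, 2): 3 / 12,
    (1, 0): 3 / 12, (1, 1): 2 / 12, (1, 2): 1 / 12,
}

def joint_mgf(t1, t2):
    """E[exp(t1*X1 + t2*X2)] as a finite sum over the support."""
    return sum(p * math.exp(t1 * x1 + t2 * x2) for (x1, x2), p in joint.items())

def marginal_mgf_x1(t):
    """E[exp(t*X1)] computed from the marginal pmf of X1."""
    f1 = {}
    for (x1, _), p in joint.items():
        f1[x1] = f1.get(x1, 0.0) + p
    return sum(p * math.exp(t * x1) for x1, p in f1.items())

lhs = joint_mgf(0.3, 0.0)      # joint MGF with t2 set to 0
rhs = marginal_mgf_x1(0.3)     # marginal MGF of X1
```

The two values agree up to floating-point rounding, and `joint_mgf(0.0, 0.0)` equals 1, as it must for any MGF.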