
Summary Statistical Computing (JBM050)

Pages
7
Uploaded on
12-06-2022
Written in
2020/2021

A short summary discussing the Statistical Computing JBM050 course for the bachelor Data Science in Tilburg and Eindhoven. This summary is based on the lectures and the reading materials.


Sampling distribution
Statistics: The use of data in the context of uncertainty; a branch of mathematics that uses probability theory.

Bernoulli trial $X \sim \mathrm{Bern}(\pi)$
A random experiment with exactly 2 outcomes (a binary variable): “success” [$P(X=1)=\pi$] and “failure” [$P(X=0)=1-\pi$].

Binomial trial $X \sim \mathrm{Bin}(n, \pi)$
A repetition of the Bernoulli trial. Probability of k successes in n repetitions:
$$P(X = k) = \frac{n!}{k!\,(n-k)!}\,\pi^k (1-\pi)^{n-k} = \binom{n}{k}\,\pi^k (1-\pi)^{n-k}$$
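As a small illustration of the binomial formula, the sketch below evaluates it with Python's standard library. The function name `binomial_pmf` and the example values (n = 4, π = 0.5) are chosen here for illustration and do not come from the course material.

```python
from math import comb


def binomial_pmf(k: int, n: int, pi: float) -> float:
    """P(X = k) for X ~ Bin(n, pi): k successes in n Bernoulli trials."""
    return comb(n, k) * pi**k * (1 - pi) ** (n - k)


# Probabilities of 0..4 successes in 4 fair coin flips.
probs = [binomial_pmf(k, 4, 0.5) for k in range(5)]
print(probs)
```

The probabilities over all possible k add up to 1, which is a quick sanity check for any probability mass function.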

Hypergeometric trial $X \sim \mathrm{Hypergeometric}(N, K, n)$
The probability of drawing k items with a certain feature when n items are drawn without replacement from a set of N items, of which K have the feature:
$$P(X = k) = \frac{\binom{K}{k}\binom{N-K}{n-k}}{\binom{N}{n}}$$
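The hypergeometric probability can be checked numerically in the same way. This is a minimal sketch; the function name `hypergeom_pmf` and the example (a set of 10 items, 4 of them with the feature, 3 drawn) are assumptions made for illustration.

```python
from math import comb


def hypergeom_pmf(k: int, N: int, K: int, n: int) -> float:
    """P(X = k): k featured items among n draws without replacement
    from N items, K of which have the feature."""
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)


# Exactly 1 featured item among 3 draws from 10 items, 4 of them featured.
p_one = hypergeom_pmf(1, N=10, K=4, n=3)
print(p_one)
```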



Estimator: An approximation of a population parameter that is computed from observed data (a statistic).
$\mu \to \hat{\mu}$ or $\bar{x}$, and $\sigma^2 \to \hat{\sigma}^2$ or $s^2$
The population parameter is often denoted by $\theta$, the sample estimate by $\hat{\theta}$.

Normal distribution $X \sim N(\mu, \sigma^2)$
A distribution with the shape of a bell curve. Usually the model parameters need to be estimated, as the population model is unknown.

Sampling distribution
Probability distribution of the sample statistic. The statistic is the random variable in the distribution. The standard deviation of a sample statistic is called the standard error. It expresses the uncertainty about the statistic.
If $X \sim N(\mu, \sigma^2)$, then $\bar{x} \sim N(\mu, \sigma^2/n)$.



Bias, Variance, MSE
A statistic is unbiased if the mean of the sampling distribution coincides with the population parameter.
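A statistic has low variance if its values deviate only a little from the mean of its sampling distribution.
Bias squared: $\mathrm{bias}^2 = \big(\theta - E(\hat{\theta})\big)^2$
Variance: $\mathrm{var} = E\big((E(\hat{\theta}) - \hat{\theta})^2\big)$
Standard error: $SE = \sqrt{\mathrm{var}} = \sqrt{E\big((E(\hat{\theta}) - \hat{\theta})^2\big)}$
Mean squared error of estimation: $MSE(\hat{\theta}) = E\big((\theta - \hat{\theta})^2\big) = \mathrm{bias}^2 + \mathrm{var}$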

Efficient estimators have a low MSE. Minimizing the MSE is difficult, because lowering the bias tends to raise the variance and vice versa (the Bias-Variance Tradeoff). A lower variance is better, and the bias is ideally equal to 0. When comparing statistics, the MSE is the criterion used most often.

The variance measures the spread of the estimator around its own mean $E(\hat{\theta})$; the MSE measures the spread around the true population parameter $\theta$.
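The decomposition of the MSE into squared bias and variance follows by adding and subtracting $E(\hat{\theta})$ inside the square. A short derivation, using only the definitions given above:

$$
\begin{aligned}
MSE(\hat{\theta}) &= E\big((\theta - \hat{\theta})^2\big) \\
&= E\big((\theta - E(\hat{\theta}) + E(\hat{\theta}) - \hat{\theta})^2\big) \\
&= \big(\theta - E(\hat{\theta})\big)^2 + 2\big(\theta - E(\hat{\theta})\big)\,E\big(E(\hat{\theta}) - \hat{\theta}\big) + E\big((E(\hat{\theta}) - \hat{\theta})^2\big) \\
&= \mathrm{bias}^2 + 0 + \mathrm{var}
\end{aligned}
$$

The middle term vanishes because $E\big(E(\hat{\theta}) - \hat{\theta}\big) = E(\hat{\theta}) - E(\hat{\theta}) = 0$.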

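Sample mean of $X \sim N(\mu, \sigma^2)$
$\theta$: $\mu$
$\hat{\theta}$: $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$
Sampling distribution: $\bar{x} \sim N(\mu, \sigma^2/n)$
Bias: $E(\bar{x}) = \mu$, so Bias = 0
Variance: $\sigma^2/n$
MSE: $\mathrm{bias}^2 + \mathrm{var} = \sigma^2/n$

Sample mean of $X \sim N(\mu, ?)$ (unknown variance)
$\theta$: $\mu$
$\hat{\theta}$: $\bar{x}$
Sampling distribution: t-distribution, $\frac{\bar{x} - \mu}{s/\sqrt{n}} \sim t_{n-1}$
Bias: $E(\bar{x}) = \mu$, so Bias = 0
Variance: estimated by $s^2/n$, with $s^2 = \frac{1}{n-1}\sum (x_i - \bar{x})^2$
MSE: $\mathrm{bias}^2 + \mathrm{var}$, estimated by $s^2/n$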


Student’s t distribution $t \sim t_{n-1}$
t is the t-statistic $t = \frac{\bar{x} - \mu}{s/\sqrt{n}}$.
This is the sampling distribution of the t-statistic when the population distribution is Normal and the variance is unknown.
$n - 1$ is the number of degrees of freedom; the higher the degrees of freedom, the closer the t distribution gets to the standard normal $N(0, 1)$.
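The effect of the degrees of freedom can be made visible with a small simulation. The sketch below draws normal samples, computes the t-statistic for each, and counts how often it falls outside ±1.96 (the cutoff that leaves 5% in the tails of a standard normal). The function names, the seed, and the sample sizes 5 and 100 are illustrative choices, not part of the course material.

```python
import random
from math import sqrt
from statistics import mean, stdev


def t_statistic(sample, mu):
    """t = (xbar - mu) / (s / sqrt(n)), with s the sample standard deviation."""
    return (mean(sample) - mu) / (stdev(sample) / sqrt(len(sample)))


def tail_fraction(n, S=5000, cutoff=1.96, seed=1):
    """Fraction of S simulated t-statistics with |t| > cutoff for samples of size n."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(S):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        if abs(t_statistic(sample, 0.0)) > cutoff:
            hits += 1
    return hits / S


small = tail_fraction(n=5)    # 4 degrees of freedom: clearly heavier tails
large = tail_fraction(n=100)  # 99 degrees of freedom: close to the normal 5%
print(small, large)
```

With few degrees of freedom the tails are much heavier than the normal 5%, which is why normal critical values are too optimistic for small samples.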

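Maximum Likelihood estimator: $\hat{\sigma}^2_{ML} = \frac{1}{n}\sum (x_i - \bar{x})^2$
Unbiased estimator: $s^2 = \frac{1}{n-1}\sum (x_i - \bar{x})^2$

When the MSE is used to compare the two estimators on normally distributed data, the Maximum Likelihood estimator is more efficient, most noticeably in smaller samples: it is biased, but its lower variance outweighs the bias.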
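This comparison can be checked by simulation. The sketch below is a minimal Monte Carlo experiment under assumed conditions (normal data with true variance 1, sample size 5, 20000 replications); all function names and settings are illustrative.

```python
import random


def variance_estimators(sample):
    """Return (ML estimate, unbiased estimate) of the variance."""
    n = len(sample)
    xbar = sum(sample) / n
    ss = sum((x - xbar) ** 2 for x in sample)
    return ss / n, ss / (n - 1)


def compare_variance_estimators(n=5, sigma2=1.0, S=20000, seed=1):
    """Approximate bias and MSE of both estimators from S simulated samples."""
    rng = random.Random(seed)
    sum_ml = sum_unb = sq_ml = sq_unb = 0.0
    for _ in range(S):
        sample = [rng.gauss(0.0, sigma2**0.5) for _ in range(n)]
        ml, unb = variance_estimators(sample)
        sum_ml += ml
        sum_unb += unb
        sq_ml += (ml - sigma2) ** 2
        sq_unb += (unb - sigma2) ** 2
    return {
        "bias_ml": sum_ml / S - sigma2,
        "bias_unbiased": sum_unb / S - sigma2,
        "mse_ml": sq_ml / S,
        "mse_unbiased": sq_unb / S,
    }


result = compare_variance_estimators()
print(result)
```

In this setting the Maximum Likelihood estimator shows a clearly negative bias but a smaller MSE than the unbiased estimator.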

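If you have 2 samples from normally distributed data with equal variance: $X_1 \sim N(\mu_1, \sigma^2)$ and $X_2 \sim N(\mu_2, \sigma^2)$
The difference in sample means $\bar{x}_2 - \bar{x}_1$ is centered at $\mu_2 - \mu_1$, and the standardized difference follows a t distribution with $n_1 + n_2 - 2$ degrees of freedom:
$$\frac{(\bar{x}_2 - \bar{x}_1) - (\mu_2 - \mu_1)}{SE(\bar{x}_2 - \bar{x}_1)} \sim t_{n_1 + n_2 - 2}$$
The standard error is calculated using the pooled standard deviation $s_p$:
$$SE(\bar{x}_2 - \bar{x}_1) = s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}, \qquad s_p = \sqrt{\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}}$$
When $\mu_1 = \mu_2$, the t-statistic reduces to
$$t = \frac{\bar{x}_2 - \bar{x}_1}{SE(\bar{x}_2 - \bar{x}_1)}$$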
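As a worked example of the pooled standard error and the resulting t-statistic, here is a minimal sketch using only the standard library. The function name `pooled_t` and the two tiny samples are made up for illustration.

```python
from math import sqrt
from statistics import mean, variance


def pooled_t(x1, x2):
    """Return (t, df) for the equal-variance two-sample t-statistic."""
    n1, n2 = len(x1), len(x2)
    # Pooled standard deviation s_p from the two sample variances.
    sp = sqrt(((n1 - 1) * variance(x1) + (n2 - 1) * variance(x2)) / (n1 + n2 - 2))
    # Standard error of the difference in sample means.
    se = sp * sqrt(1 / n1 + 1 / n2)
    return (mean(x2) - mean(x1)) / se, n1 + n2 - 2


t, df = pooled_t([1, 2, 3], [2, 3, 4])
print(t, df)
```

Here both sample variances equal 1, so $s_p = 1$, $SE = \sqrt{2/3}$ and $t = \sqrt{3/2} \approx 1.22$ with 4 degrees of freedom.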


Central Limit Theorem
If n is large enough, the sample mean of X coming from any distribution with mean $\mu$ and finite variance $\sigma^2$ is approximately normally distributed: $\bar{x} \sim N(\mu, \sigma^2/n)$
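The theorem can be illustrated with a clearly non-normal population. The sketch below assumes an exponential distribution with rate 1 (so $\mu = 1$ and $\sigma^2 = 1$), sample size 50 and 5000 replications; these settings and the function name are illustrative choices.

```python
import random
from statistics import mean, pvariance


def sample_means(n, S, seed=1):
    """S sample means, each from n draws of an exponential(1) variable."""
    rng = random.Random(seed)
    return [sum(rng.expovariate(1.0) for _ in range(n)) / n for _ in range(S)]


means = sample_means(n=50, S=5000)
center = mean(means)       # should be close to mu = 1
spread = pvariance(means)  # should be close to sigma^2 / n = 1 / 50 = 0.02
print(center, spread)
```

A histogram of `means` would look close to a bell curve even though the exponential distribution itself is strongly skewed.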




Monte Carlo Simulation
Computer simulation: A numerical technique for conducting experiments on the computer. A tool to virtually investigate the behavior of a system.
Monte Carlo Simulation: A computer experiment involving random sampling from probability distributions. Used for estimators and for hypothesis testing (in the absence of analytical results).

MC simulations for estimators
An estimator or test statistic has a true sampling distribution under a particular set of conditions. We want to know
this distribution. The derivation is however not always tractable. The MC simulation can be used to approximate the
distribution.
Step 1: Create approximate sampling distribution
Generate S independent data sets of given sample size n under the conditions of interest
Compute the numerical value of the estimator/test statistic 𝜃̂ for each dataset.
Step 2: Derive bias, var, MSE, relative efficiency
If S is large enough, the summary statistics should be a good approximation to the true sampling properties.

Relative efficiency¹: $RE = \frac{MSE(\hat{\theta}^{(1)})}{MSE(\hat{\theta}^{(2)})}$; if $RE < 1$, estimator 1 is preferred.
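Steps 1 and 2 can be sketched in a few lines of Python. This is a minimal illustration, not the course's own implementation: the function name `mc_estimator_properties`, its parameters, and the example conditions (sample mean of N(0, 1) data, n = 20, S = 5000) are all assumptions.

```python
import random
from statistics import mean, pvariance


def mc_estimator_properties(estimator, sampler, theta, n, S=5000, seed=1):
    """Approximate bias, variance and MSE of an estimator by simulation."""
    rng = random.Random(seed)
    # Step 1: S independent data sets of size n, one estimate per data set.
    estimates = [estimator([sampler(rng) for _ in range(n)]) for _ in range(S)]
    # Step 2: summarise the approximate sampling distribution.
    bias = mean(estimates) - theta
    var = pvariance(estimates)
    mse = mean((est - theta) ** 2 for est in estimates)
    return {"bias": bias, "variance": var, "mse": mse}


result = mc_estimator_properties(
    estimator=mean,
    sampler=lambda rng: rng.gauss(0.0, 1.0),
    theta=0.0,
    n=20,
)
print(result)
```

For the sample mean under these conditions the theory gives bias 0 and MSE $\sigma^2/n = 0.05$, so the simulated values can be compared against a known answer before the same function is used for an estimator without an analytical result.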

The sample median is more efficient than the sample mean for distributions with thick tails.
If the distribution is more similar to a normal distribution, the mean is more efficient.
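The mean-versus-median comparison can be reproduced with a small simulation of the relative efficiency. As an assumed example of a thick-tailed distribution, the sketch uses the Laplace distribution (generated as the difference of two exponential draws); the function names, n = 25 and S = 4000 are illustrative.

```python
import random
from statistics import mean, median


def mc_mse(estimator, sampler, theta, n, S, rng):
    """Monte Carlo MSE of an estimator over S simulated data sets of size n."""
    estimates = (estimator([sampler(rng) for _ in range(n)]) for _ in range(S))
    return mean((est - theta) ** 2 for est in estimates)


def relative_efficiency(sampler, n=25, S=4000, seed=1):
    """RE = MSE(mean) / MSE(median) for a distribution centred at 0."""
    rng = random.Random(seed)
    return mc_mse(mean, sampler, 0.0, n, S, rng) / mc_mse(median, sampler, 0.0, n, S, rng)


def normal(rng):
    return rng.gauss(0.0, 1.0)


def laplace(rng):
    # The difference of two exponential(1) draws is Laplace(0, 1) distributed.
    return rng.expovariate(1.0) - rng.expovariate(1.0)


re_normal = relative_efficiency(normal)    # RE < 1: the mean is preferred
re_laplace = relative_efficiency(laplace)  # RE > 1: the median is preferred
print(re_normal, re_laplace)
```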

MC simulations for hypothesis testing
There are two types of hypothesis testing situations:
1) Randomness ($H_0$) vs. Non-randomness ($H_1$) of data
2) No effect ($H_0$) vs. Effect ($H_1$)
$H_0$ is rejected if the observed data/statistics are very unlikely under the assumption of randomness or no effect.

t-statistic: $t_{obs} = \frac{\bar{x}_2 - \bar{x}_1}{SE(\bar{x}_2 - \bar{x}_1)}$
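One way to judge how unlikely an observed t-statistic is under $H_0$ without a t-table is to simulate its null distribution. The sketch below assumes normal data with equal variances, under which the distribution of the t-statistic does not depend on the common mean or variance, so the null data can be drawn from N(0, 1). The function names and example samples are illustrative.

```python
import random
from math import sqrt
from statistics import mean, variance


def pooled_t(x1, x2):
    """Equal-variance two-sample t-statistic for the difference in means."""
    n1, n2 = len(x1), len(x2)
    sp = sqrt(((n1 - 1) * variance(x1) + (n2 - 1) * variance(x2)) / (n1 + n2 - 2))
    return (mean(x2) - mean(x1)) / (sp * sqrt(1 / n1 + 1 / n2))


def mc_p_value(x1, x2, S=5000, seed=1):
    """Two-sided Monte Carlo p-value: share of null t-statistics at least as extreme."""
    rng = random.Random(seed)
    t_obs = pooled_t(x1, x2)
    extreme = 0
    for _ in range(S):
        # Generate both samples under H0 (no difference in means).
        sim1 = [rng.gauss(0.0, 1.0) for _ in range(len(x1))]
        sim2 = [rng.gauss(0.0, 1.0) for _ in range(len(x2))]
        if abs(pooled_t(sim1, sim2)) >= abs(t_obs):
            extreme += 1
    return extreme / S


p = mc_p_value([1, 2, 3], [2, 3, 4])
print(p)
```

A small p-value means the observed statistic is rare under $H_0$, which is the condition for rejecting it.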

Confidence intervals
Confidence intervals: This expresses sampling uncertainty. Often this is reported instead of the point estimate. Over repeated samples, an interval constructed this way contains the true population parameter $\theta$ with a probability of C.

Two-sided t-confidence interval:
$$\big[(\bar{x}_2 - \bar{x}_1) - t_{C;\,n_1+n_2-2}\,SE(\bar{x}_2 - \bar{x}_1);\ (\bar{x}_2 - \bar{x}_1) + t_{C;\,n_1+n_2-2}\,SE(\bar{x}_2 - \bar{x}_1)\big]$$

A two-sample Student’s t-test relies on some assumptions: the samples must come from a normal distribution, and the variances must be equal. If these assumptions are violated, the quality of the hypothesis test can suffer. If the variances are not equal, Welch’s test applies.
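The meaning of the confidence level C can be checked by simulating the coverage of the interval. The sketch below assumes C = 0.95, two samples of size 10 (18 degrees of freedom), and takes the critical value 2.101 from a t-table, because the standard library has no t quantile function. The true means, the common standard deviation and the function names are illustrative assumptions.

```python
import random
from math import sqrt
from statistics import mean, variance

T_CRIT = 2.101  # two-sided 95% critical value of t with 18 df, from a t-table


def t_interval(x1, x2, t_crit):
    """Two-sided pooled-variance confidence interval for mu2 - mu1."""
    n1, n2 = len(x1), len(x2)
    sp = sqrt(((n1 - 1) * variance(x1) + (n2 - 1) * variance(x2)) / (n1 + n2 - 2))
    se = sp * sqrt(1 / n1 + 1 / n2)
    diff = mean(x2) - mean(x1)
    return diff - t_crit * se, diff + t_crit * se


def coverage(mu1=0.0, mu2=1.0, sigma=2.0, n=10, S=4000, seed=1):
    """Fraction of simulated intervals that contain the true difference mu2 - mu1."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(S):
        x1 = [rng.gauss(mu1, sigma) for _ in range(n)]
        x2 = [rng.gauss(mu2, sigma) for _ in range(n)]
        low, high = t_interval(x1, x2, T_CRIT)
        if low <= mu2 - mu1 <= high:
            hits += 1
    return hits / S


cov = coverage()
print(cov)
```

When the assumptions hold, the simulated coverage should be close to 0.95; repeating the experiment with unequal variances is a simple way to see how violations affect the interval.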

Power of a test (complement of the Type II error): $P(\text{test correctly rejects } H_0 \mid H_1 \text{ true}) = 1 - \beta$
The probability of correctly rejecting $H_0$.
Generate data under $H_1: \mu \neq \mu_0$ and calculate the proportion of rejections.

Significance level (Type I error): $P(\text{test rejects } H_0 \mid H_0 \text{ true}) = \alpha$
Generate data under $H_0: \mu = \mu_0$ and calculate how often $H_0$ is rejected; this approximates $\alpha$.
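Both recipes amount to counting rejections over many simulated data sets. The sketch below assumes a one-sample two-sided t-test of $H_0: \mu = 0$ at the 5% level with n = 10, using the table value 2.262 for 9 degrees of freedom; the alternative $\mu = 1$, the unit standard deviation and the function names are illustrative.

```python
import random
from math import sqrt
from statistics import mean, stdev

T_CRIT = 2.262  # two-sided 5% critical value of t with 9 df, from a t-table


def rejects(sample, mu0, t_crit):
    """True if the one-sample t-test rejects H0: mu = mu0."""
    t = (mean(sample) - mu0) / (stdev(sample) / sqrt(len(sample)))
    return abs(t) > t_crit


def rejection_rate(true_mu, mu0=0.0, n=10, S=4000, seed=1):
    """Proportion of S simulated samples for which H0 is rejected."""
    rng = random.Random(seed)
    count = sum(
        rejects([rng.gauss(true_mu, 1.0) for _ in range(n)], mu0, T_CRIT)
        for _ in range(S)
    )
    return count / S


alpha_hat = rejection_rate(true_mu=0.0)  # data generated under H0
power_hat = rejection_rate(true_mu=1.0)  # data generated under H1
print(alpha_hat, power_hat)
```

The first rate approximates $\alpha$ and should be near 0.05; the second approximates the power $1 - \beta$ for this particular alternative.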


¹ Compare two estimators, e.g. $\hat{\theta}^{(1)}$ is the mean and $\hat{\theta}^{(2)}$ is the median.

Seller: NienkeUr, Technische Universiteit Eindhoven