Summary

Summary Statistical Computing (JBM050)

Name: Summary Statistical Computing (JBM050)
SKU: doc_1788809
Rating: 4.00 (1 reviews)
Author: NienkeUr

Uploaded on

12-06-2022

Written in

2020/2021

A short summary discussing the Statistical Computing JBM050 course for the bachelor Data Science in Tilburg and Eindhoven. This summary is based on the lectures and the reading materials.

Content preview

Sampling distribution
Statistics: The use of data in the context of uncertainty; a branch of mathematics that uses probability theory.

Bernoulli trial, $X \sim \mathrm{Bern}(\pi)$
A random experiment with exactly 2 outcomes (binary variable): "success" [$P(X=1)=\pi$] and "failure" [$P(X=0)=1-\pi$].

Binomial trial, $X \sim \mathrm{Bin}(n, \pi)$
A repetition of the Bernoulli trial. Probability of k successes in n repetitions:
$P(X = k) = \frac{n!}{k!(n-k)!} \pi^k (1-\pi)^{n-k} = \binom{n}{k} \pi^k (1-\pi)^{n-k}$
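
As a quick check of the binomial formula, the probability can be evaluated directly. This is a minimal sketch using only the Python standard library; the function name `binomial_pmf` is chosen here for illustration and does not come from the course material.

```python
from math import comb

def binomial_pmf(k: int, n: int, pi: float) -> float:
    """P(X = k) for X ~ Bin(n, pi): k successes in n Bernoulli trials."""
    return comb(n, k) * pi**k * (1 - pi) ** (n - k)

# Probabilities for k = 0..4 successes in n = 4 fair trials.
probs = [binomial_pmf(k, 4, 0.5) for k in range(5)]
print(probs)
print(sum(probs))  # the probabilities over all k add up to 1
```

With n = 1 the function reduces to the Bernoulli trial: `binomial_pmf(1, 1, pi)` returns `pi`.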

Hypergeometric trial, $X \sim \mathrm{hypergeometric}(N, K, n)$
Calculate the probability of drawing k of the K items with a certain feature when n items are drawn without replacement from a set of N items:
$P(X = k) = \frac{\binom{K}{k} \binom{N-K}{n-k}}{\binom{N}{n}}$
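
The hypergeometric probability can be computed the same way. The sketch below assumes the parameter meaning given above (N items in total, K of them with the feature, n drawn); the function name `hypergeom_pmf` is hypothetical.

```python
from math import comb

def hypergeom_pmf(k: int, N: int, K: int, n: int) -> float:
    """P(X = k): k featured items among n drawn without replacement
    from N items of which K have the feature. Requires 0 <= k <= n."""
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)

# Example: 5 items, 2 with the feature, draw 2, exactly 1 featured item.
p = hypergeom_pmf(k=1, N=5, K=2, n=2)
print(p)
```

In the example there are 2 * 3 = 6 favourable draws out of 10 possible draws.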

Estimator: An approximation of a population parameter that uses observed data (statistics), e.g. $\mu \rightarrow \hat{\mu} = \bar{x}$ and $\sigma^2 \rightarrow \hat{\sigma}^2 = s^2$.
The population parameter is often denoted by $\theta$, the sample estimate by $\hat{\theta}$.

Normal distribution, $X \sim N(\mu, \sigma^2)$
A distribution with the shape of a bell curve. Usually the model parameters need to be estimated, as the population model is unknown.

Sampling distribution
Probability distribution of the sample statistic. The statistic is the random variable in the distribution.
If $X \sim N(\mu, \sigma^2)$, then $\bar{x} \sim N(\mu, \frac{\sigma^2}{n})$.
The standard deviation of a sample statistic is the same as the standard error. It expresses the uncertainty about the statistic.

Bias, Variance, MSE
A statistic is unbiased if the mean of the sampling distribution coincides with the population parameter.
A statistic has low variance if the deviation from the mean is very small.
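$\text{Bias}^2 = (\theta - E(\hat{\theta}))^2$
$\text{Variance} = E\big((E(\hat{\theta}) - \hat{\theta})^2\big)$
$\text{Standard error} = \sqrt{\text{var}} = \sqrt{E\big((E(\hat{\theta}) - \hat{\theta})^2\big)}$
$\text{Mean squared error of estimation} = MSE(\hat{\theta}) = E\big((\theta - \hat{\theta})^2\big) = \text{bias}^2 + \text{variance}$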

Efficient estimators have a low MSE. Minimizing the MSE is difficult because it involves a bias-variance tradeoff: a lower variance is better, but the bias is ideally equal to 0, and reducing one can increase the other. When comparing statistics, the MSE is used most often.

The variance measures the spread of the estimator around its own expected value $E(\hat{\theta})$; the MSE measures its spread around the true population parameter $\theta$.
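
The decomposition of the MSE into squared bias and variance follows from adding and subtracting $E(\hat{\theta})$ inside the square. A short derivation sketch, using only the definitions from this section:

```latex
\begin{aligned}
MSE(\hat{\theta})
  &= E\big((\theta - \hat{\theta})^2\big) \\
  &= E\Big(\big((\theta - E(\hat{\theta})) + (E(\hat{\theta}) - \hat{\theta})\big)^2\Big) \\
  &= (\theta - E(\hat{\theta}))^2
     + 2\,(\theta - E(\hat{\theta}))\,E\big(E(\hat{\theta}) - \hat{\theta}\big)
     + E\big((E(\hat{\theta}) - \hat{\theta})^2\big) \\
  &= \text{bias}^2 + \text{variance}
\end{aligned}
```

The cross term vanishes because $E\big(E(\hat{\theta}) - \hat{\theta}\big) = 0$ and $\theta - E(\hat{\theta})$ is a constant.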

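Sample mean of $X \sim N(\mu, \sigma^2)$
$\theta$: $\mu$
$\hat{\theta}$: $\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i$
Sampling distribution: $\bar{x} \sim N(\mu, \frac{\sigma^2}{n})$
Bias: $E(\bar{x}) = \mu$, so Bias = 0
Variance: $\frac{\sigma^2}{n}$
MSE: $\text{bias}^2 + \text{var} = \frac{\sigma^2}{n}$
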
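Sample mean of $X \sim N(\mu, ?)$ (unknown variance)
$\theta$: $\mu$
$\hat{\theta}$: $\bar{x}$
Sampling distribution: t-distribution, $\frac{\bar{x} - \mu}{s/\sqrt{n}} \sim t_{n-1}$
Bias: $E(\bar{x}) = \mu$, so Bias = 0
Variance: $\frac{s^2}{n}$, with $s^2 = \frac{1}{n-1} \sum (x_i - \bar{x})^2$
MSE: $\text{bias}^2 + \text{var} = \frac{s^2}{n}$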

Student's t distribution, $t \sim t_{n-1}$
$t$ is the t-statistic: $t = \frac{\bar{x} - \mu}{s/\sqrt{n}}$.
This is the sampling distribution of the t-statistic when the population distribution is Normal and the variance is unknown.
$n - 1$ is the number of degrees of freedom; the higher the degrees of freedom, the closer the t distribution gets to the standard normal distribution $N(0, 1)$.

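Maximum Likelihood estimator: $\hat{\sigma}^2 = \frac{1}{n} \sum (x_i - \bar{x})^2$
Unbiased estimator: $s^2 = \frac{1}{n-1} \sum (x_i - \bar{x})^2$

When the MSE is used to compare the two estimators, the Maximum Likelihood estimator is more efficient, most noticeably for smaller samples: its bias is outweighed by its lower variance.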
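
This comparison can be checked with a small simulation. The sketch below assumes normally distributed data with n = 5 and $\sigma^2 = 1$; the function name, the seed and the number of replications S are arbitrary choices for illustration.

```python
import random

def mse_of_variance_estimators(n=5, sigma2=1.0, S=20000, seed=1):
    """Approximate the MSE of the ML (1/n) and unbiased (1/(n-1))
    variance estimators from S simulated normal samples of size n."""
    rng = random.Random(seed)
    sigma = sigma2 ** 0.5
    sq_err_ml = 0.0
    sq_err_unbiased = 0.0
    for _ in range(S):
        x = [rng.gauss(0.0, sigma) for _ in range(n)]
        xbar = sum(x) / n
        ss = sum((xi - xbar) ** 2 for xi in x)  # sum of squared deviations
        sq_err_ml += (ss / n - sigma2) ** 2
        sq_err_unbiased += (ss / (n - 1) - sigma2) ** 2
    return sq_err_ml / S, sq_err_unbiased / S

mse_ml, mse_unbiased = mse_of_variance_estimators()
print(mse_ml, mse_unbiased)
```

For normal data the exact values are $(2n-1)\sigma^4/n^2$ for the ML estimator and $2\sigma^4/(n-1)$ for the unbiased one, so the simulated values should land near 0.36 and 0.5.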

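If you have 2 samples from normally distributed data with equal variance, $X_1 \sim N(\mu_1, \sigma^2)$ and $X_2 \sim N(\mu_2, \sigma^2)$, the sampling distribution of the difference in sample means $\bar{x}_2 - \bar{x}_1$ is centered at $\mu_2 - \mu_1$. Standardized by its estimated standard error, the difference follows a t distribution with $n_1 + n_2 - 2$ degrees of freedom: $t \sim t_{n_1+n_2-2}$.

The standard error is calculated using the pooled standard deviation $s_p$:
$SE(\bar{x}_2 - \bar{x}_1) = s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$, $s_p = \sqrt{\frac{(n_1-1)s_1^2 + (n_2-1)s_2^2}{n_1+n_2-2}}$
The t-statistic (under $H_0$: $\mu_2 - \mu_1 = 0$) is:
$t = \frac{\bar{x}_2 - \bar{x}_1}{SE(\bar{x}_2 - \bar{x}_1)}$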
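
The pooled standard error and the t-statistic can be computed in a few lines. A minimal sketch with the standard library; the function name `pooled_t_statistic` and the example data are made up for illustration.

```python
from math import sqrt
from statistics import mean, variance

def pooled_t_statistic(x1, x2):
    """Return (t, SE, degrees of freedom) for the equal-variance
    two-sample t-statistic of the difference mean(x2) - mean(x1)."""
    n1, n2 = len(x1), len(x2)
    s1_sq, s2_sq = variance(x1), variance(x2)  # sample variances (n-1 denominator)
    sp = sqrt(((n1 - 1) * s1_sq + (n2 - 1) * s2_sq) / (n1 + n2 - 2))
    se = sp * sqrt(1 / n1 + 1 / n2)
    t = (mean(x2) - mean(x1)) / se
    return t, se, n1 + n2 - 2

t, se, df = pooled_t_statistic([1, 2, 3], [2, 3, 4])
print(t, se, df)
```

Both example samples have variance 1, so $s_p = 1$, $SE = \sqrt{2/3}$ and $t = \sqrt{3/2}$ with 4 degrees of freedom.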

Central Limit Theorem
If n is large enough, the sample mean of X coming from any distribution with mean $\mu$ and variance $\sigma^2$ is approximately normally distributed: $\bar{x} \sim N(\mu, \frac{\sigma^2}{n})$
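
The theorem can be illustrated by simulating sample means from a clearly non-normal distribution. The sketch below assumes an exponential distribution with rate 1 (mean 1, variance 1); n, S and the seed are arbitrary choices.

```python
import random
from statistics import mean, pvariance

def simulate_sample_means(n=50, S=5000, rate=1.0, seed=7):
    """Draw S samples of size n from Exp(rate) and return the S sample means."""
    rng = random.Random(seed)
    return [sum(rng.expovariate(rate) for _ in range(n)) / n for _ in range(S)]

n = 50
means = simulate_sample_means(n=n)
mu, sigma2 = 1.0, 1.0        # mean and variance of Exp(1)
se = (sigma2 / n) ** 0.5     # standard error predicted by the CLT
# Share of sample means within 1.96 standard errors of mu (about 95% if normal).
inside = sum(abs(m - mu) <= 1.96 * se for m in means) / len(means)
print(mean(means), pvariance(means), inside)
```

The mean of the simulated sample means should be close to $\mu = 1$ and their variance close to $\sigma^2/n = 0.02$.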

Monte Carlo Simulation
Computer simulation: A numerical technique for conducting experiments on the computer. A tool to virtually investigate the behavior of a system.
Monte Carlo Simulation: A computer experiment involving random sampling from probability distributions. Used for estimators and for hypothesis testing (in the absence of analytical results).

MC simulations for estimators
An estimator or test statistic has a true sampling distribution under a particular set of conditions. We want to know this distribution, but deriving it analytically is not always tractable. An MC simulation can be used to approximate it.
Step 1: Create an approximate sampling distribution
Generate S independent data sets of the given sample size n under the conditions of interest.
Compute the numerical value of the estimator/test statistic $\hat{\theta}$ for each data set.
Step 2: Derive bias, variance, MSE, relative efficiency
If S is large enough, the summary statistics of the S values should be a good approximation to the true sampling properties.
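
The two steps can be sketched as a small generic function. This is an illustration under assumed conditions (normal data, the sample mean as estimator); the names `mc_estimator_properties`, `estimator` and `generate` are hypothetical and not taken from the course.

```python
import random
from statistics import mean, pvariance

def mc_estimator_properties(estimator, generate, theta, S=5000, seed=42):
    """Step 1: compute the estimator on S simulated data sets.
    Step 2: summarize the S estimates as bias, variance and MSE."""
    rng = random.Random(seed)
    estimates = [estimator(generate(rng)) for _ in range(S)]
    bias = mean(estimates) - theta
    var = pvariance(estimates)  # spread around the mean of the estimates
    mse = mean((est - theta) ** 2 for est in estimates)  # spread around theta
    return {"bias": bias, "variance": var, "mse": mse}

n, mu, sigma = 25, 10.0, 2.0  # assumed conditions of interest
result = mc_estimator_properties(
    estimator=lambda x: sum(x) / len(x),  # sample mean
    generate=lambda rng: [rng.gauss(mu, sigma) for _ in range(n)],
    theta=mu,
)
print(result)
```

For the sample mean the theory predicts bias 0 and variance $\sigma^2/n = 0.16$, which gives a reference for judging whether S is large enough.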

Relative efficiency (see footnote 1): $RE = \frac{MSE(\hat{\theta}^{(1)})}{MSE(\hat{\theta}^{(2)})}$
If $RE < 1$, estimator 1 is preferred.

The sample median is more efficient than the sample mean for distributions with thick tails.
If the distribution is more similar to a normal distribution, the mean is the more efficient choice.
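
The mean-versus-median comparison can be reproduced with a short simulation of the relative efficiency. The sketch assumes n = 25 and uses a Laplace distribution as one example of a thicker-tailed distribution (generated as the difference of two exponentials); these choices are illustrative, not from the source.

```python
import random
from statistics import median

def relative_efficiency(generate, theta=0.0, S=4000, seed=3):
    """RE = MSE(mean) / MSE(median), estimated from S simulated data sets."""
    rng = random.Random(seed)
    sq_err_mean = 0.0
    sq_err_median = 0.0
    for _ in range(S):
        x = generate(rng)
        sq_err_mean += (sum(x) / len(x) - theta) ** 2
        sq_err_median += (median(x) - theta) ** 2
    return sq_err_mean / sq_err_median  # the 1/S factors cancel

n = 25
re_normal = relative_efficiency(lambda rng: [rng.gauss(0, 1) for _ in range(n)])
re_laplace = relative_efficiency(
    lambda rng: [rng.expovariate(1) - rng.expovariate(1) for _ in range(n)]
)
print(re_normal, re_laplace)
```

Here estimator 1 is the mean, so an RE below 1 favours the mean and an RE above 1 favours the median.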

MC simulations for hypothesis testing
There are two types of hypothesis testing situations:
1) Randomness ($H_0$) vs. Non-randomness ($H_1$) of data
2) No effect ($H_0$) vs. Effect ($H_1$)
t-statistic: $t_{obs} = \frac{\bar{x}_2 - \bar{x}_1}{SE(\bar{x}_2 - \bar{x}_1)}$
$H_0$ is rejected if the observed data/statistics are very unlikely under the assumption of randomness or no effect.
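
One common way to judge how unlikely $t_{obs}$ is without analytical results is to reshuffle the group labels many times and recompute the statistic. The sketch below uses this permutation approach as an assumption; the source does not state which simulation scheme the course uses, and the data and function names are made up.

```python
import random
from math import sqrt
from statistics import mean, variance

def t_stat(x1, x2):
    """Equal-variance two-sample t-statistic for mean(x2) - mean(x1)."""
    n1, n2 = len(x1), len(x2)
    sp = sqrt(((n1 - 1) * variance(x1) + (n2 - 1) * variance(x2)) / (n1 + n2 - 2))
    return (mean(x2) - mean(x1)) / (sp * sqrt(1 / n1 + 1 / n2))

def permutation_p_value(x1, x2, S=2000, seed=11):
    """Approximate the two-sided p-value of t_obs under H0 (no effect)
    by randomly reassigning observations to the two groups S times."""
    rng = random.Random(seed)
    t_obs = t_stat(x1, x2)
    pooled = list(x1) + list(x2)
    n1 = len(x1)
    extreme = 0
    for _ in range(S):
        rng.shuffle(pooled)
        if abs(t_stat(pooled[:n1], pooled[n1:])) >= abs(t_obs):
            extreme += 1
    return t_obs, (extreme + 1) / (S + 1)  # +1 counts the observed data itself

x1 = [4.1, 5.0, 4.7, 5.3, 4.4, 4.9]
x2 = [6.0, 6.4, 5.8, 6.9, 6.1, 6.5]
t_obs, p_value = permutation_p_value(x1, x2)
print(t_obs, p_value)
```

A small p-value means the observed statistic is rare among the reshuffled data sets, which is the situation in which $H_0$ is rejected.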

Confidence intervals
Confidence intervals: These express sampling uncertainty. Often a confidence interval is reported instead of the point estimate.
An interval constructed this way contains the true population parameter $\theta$ with probability C (over repeated sampling).
Two-sided t-confidence interval:
$[(\bar{x}_2 - \bar{x}_1) - t_{C;n_1+n_2-2} \, SE(\bar{x}_2 - \bar{x}_1); \; (\bar{x}_2 - \bar{x}_1) + t_{C;n_1+n_2-2} \, SE(\bar{x}_2 - \bar{x}_1)]$
A two-sample Student's t-test relies on some assumptions: the samples must come from normal distributions, and the variances must be equal. If these are violated, it can impact the quality of the hypothesis test. If the variances are not equal, Welch's test applies.
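
The two-sided interval can be computed once the critical value is known. In this sketch the critical value is passed in as an argument, because the Python standard library has no t quantile function; the value 2.776 used in the example is the usual table value for a 95% interval with 4 degrees of freedom. Function name and data are illustrative.

```python
from math import sqrt
from statistics import mean, variance

def pooled_t_confidence_interval(x1, x2, t_crit):
    """Two-sided equal-variance t interval for mu2 - mu1.
    t_crit is the critical value t_{C; n1+n2-2}, supplied by the caller."""
    n1, n2 = len(x1), len(x2)
    sp = sqrt(((n1 - 1) * variance(x1) + (n2 - 1) * variance(x2)) / (n1 + n2 - 2))
    se = sp * sqrt(1 / n1 + 1 / n2)
    diff = mean(x2) - mean(x1)
    return diff - t_crit * se, diff + t_crit * se

# 95% interval for n1 = n2 = 3, so 4 degrees of freedom (table value 2.776).
lower, upper = pooled_t_confidence_interval([1, 2, 3], [2, 3, 4], t_crit=2.776)
print(lower, upper)
```

The interval is centered at the observed difference of 1 and here includes 0, so this tiny example would not reject $H_0$ at the 5% level.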

Power of a test (complement of the Type II error)
$P(\text{test correctly rejects } H_0 \mid H_1 \text{ true}) = 1 - \beta$
The probability of correctly rejecting $H_0$.
Generate data under $H_1$: $\mu \neq \mu_0$ and calculate the proportion of rejections.

Significance level (Type I error)
$P(\text{test rejects } H_0 \mid H_0 \text{ true}) = \alpha$
Generate data under $H_0$: $\mu = \mu_0$ and calculate how often $H_0$ is rejected; this approximates $\alpha$.
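
Both procedures use the same loop and differ only in the mean used to generate the data. The sketch below assumes a two-sided one-sample t-test with n = 20 at the 5% level, with the critical value 2.093 taken from a t table for 19 degrees of freedom; the effect size 0.5 under $H_1$ is an arbitrary choice.

```python
import random
from math import sqrt
from statistics import mean, stdev

T_CRIT = 2.093  # table value t_{0.975; 19}, valid only for n = 20

def rejection_rate(true_mu, mu0=0.0, n=20, sigma=1.0, S=4000, seed=5):
    """Proportion of S simulated samples in which H0: mu = mu0 is rejected."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(S):
        x = [rng.gauss(true_mu, sigma) for _ in range(n)]
        t = (mean(x) - mu0) / (stdev(x) / sqrt(n))
        if abs(t) > T_CRIT:
            rejections += 1
    return rejections / S

alpha_hat = rejection_rate(true_mu=0.0)  # data generated under H0
power_hat = rejection_rate(true_mu=0.5)  # data generated under one H1 scenario
print(alpha_hat, power_hat)
```

The first value should be close to the nominal $\alpha = 0.05$; the second approximates the power $1 - \beta$ for this particular alternative.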

1 Compare two estimators, e.g. 𝜃̂ (1) is the mean and 𝜃̂ (2) is the median

Written for

Institution: Technische Universiteit Eindhoven (TUE)
Study: Data Science
Course: Statistical Computing (JBM050)

Document information

Uploaded on: June 12, 2022
Number of pages: 7
Written in: 2020/2021
Type: SUMMARY

Subjects

statistical computing
jbm050
statistics
data science
monte carlo
statistical modeling
cross validation
optimization
optimization methods
