Lecture notes: Experimental Design & Data Analysis (X_405078), VU AI

Uploaded on: 23-02-2022

Written in: 2020/2021

All lecture notes for the EDDA course; you do not need to watch the lectures if you study these notes.

Experimental Design & Data Analysis
Created @February 2, 2021 11:10 AM

Class S4

Type S4

Materials

Lecture 1:
Introduction

Statistics allows us to generalize from data to a true state of nature. Statistical inference requires assumptions and
mathematical modelling.

The data should be obtained by a carefully designed (chance) experiment.

“Experimental units” are assigned to “treatments” by chance, that is, by randomization, in order to exclude other
possible explanations of an observed difference.

Data obtained by registering an ongoing phenomenon, without randomization or other controls, are
called observational.

randomization R code:
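
A minimal sketch of randomization in R, since the original code is not shown in this preview. The setup (20 units, two treatments "A" and "B" of equal size) and the variable names are assumptions made for illustration only.

```r
# Hypothetical setup: 20 experimental units, two treatments with 10 units each.
set.seed(1)                                        # only to make the sketch reproducible
units <- 1:20
treatment <- sample(rep(c("A", "B"), each = 10))   # random permutation of the labels
assignment <- data.frame(unit = units, treatment = treatment)
table(assignment$treatment)                        # each treatment receives 10 units
```

Permuting a fixed vector of labels keeps the design balanced; drawing each label independently with sample(c("A", "B"), 20, replace = TRUE) would also randomize, but would not guarantee equal group sizes.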

Recap of basic statistical concepts:

data summaries

histograms: a histogram of a sample of observed values is a bar plot in which the area of the bar over a cell C
corresponds to the fraction of observations that fall in C:

(number of observations in cell C) / (sample size)

A larger sample size N means more information about the underlying distribution.
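
The area rule above can be checked numerically. This is a sketch on simulated data (the sample and the default cells chosen by hist() are assumptions, not part of the notes).

```r
set.seed(2)
x <- rnorm(200)                       # simulated sample, N = 200
h <- hist(x, plot = FALSE)            # compute the cells without drawing
cell_width <- diff(h$breaks)          # width of each cell C
bar_area <- h$density * cell_width    # area of the bar on the density scale
fraction <- h$counts / length(x)      # observations in C / sample size
all.equal(bar_area, fraction)         # the two agree
sum(bar_area)                         # the areas add up to 1
plot(h, freq = FALSE)                 # draw the histogram on the density scale
```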

x_(1) denotes the smallest observation in the ordered sample, not the original first observation x_1

any normal distribution can be transformed to the standard normal distribution, and back

confidence interval: a range of values computed from the sample that contains the population parameter in a
given proportion (the confidence level) of repeated samples.

two important characteristics of a population are location (= mean) μ and scale (= standard deviation, which
measures the dispersion of the data relative to the mean) σ

Density, probability and quantiles of distributions in R: dnorm(u,par), pnorm(q,par), qnorm(a,par),
rnorm(size,par), etc.

Numerical summaries: sample mean, sample variance, sample median, sample standard deviation,
sample α-quantile, etc. R: mean(x), var(x), median(x), sd(x), quantile(x,a), summary(x), range(x), etc.

Graphical summaries: histogram, boxplot, (normal) QQ-plot, scatter plot(s), empirical distribution
function (cumulative histogram), etc. Commands in R: hist(x), boxplot(x), qqnorm(x), plot(x,y),
plot(ecdf(x)), etc.
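
A short sketch of the distribution functions and summaries listed above, applied to the standard normal distribution and to a simulated sample. The sample x (mean 10, sd 2, size 50) is hypothetical.

```r
# Standard normal: density, distribution function, quantile function.
dnorm(0)        # density at 0, equal to 1 / sqrt(2 * pi)
pnorm(1.96)     # P(Z <= 1.96), about 0.975
qnorm(0.975)    # 0.975-quantile, about 1.96

set.seed(3)
x <- rnorm(50, mean = 10, sd = 2)   # hypothetical sample of size 50

# Numerical summaries.
mean(x); var(x); median(x); sd(x)
quantile(x, 0.25)                   # sample 0.25-quantile
summary(x); range(x)

# Graphical summaries.
hist(x); boxplot(x); qqnorm(x); plot(ecdf(x))
```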

QQ-plots:

can reveal whether data (approximately) follow a certain distribution P (normal distribution:
qqnorm(x))

the ordered data x_(1) ≤ ... ≤ x_(N) are plotted against the corresponding quantiles of P (for example q_(i/N))

If the x_i’s are from distribution P, then the plot will roughly follow a straight line. In that case it is
reasonable to assume that the data are sampled from P.
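
To illustrate the straight-line criterion, the sketch below compares a normal sample with a skewed one. Both samples are simulated, and the correlation of the QQ-plot points is used here only as a rough, informal measure of straightness (an addition for illustration, not a method from the notes).

```r
set.seed(4)
x_norm <- rnorm(100, mean = 5, sd = 2)   # sample from a normal distribution
x_exp  <- rexp(100)                      # skewed, non-normal sample

par(mfrow = c(1, 2))
qqnorm(x_norm, main = "normal sample");      qqline(x_norm)
qqnorm(x_exp,  main = "exponential sample"); qqline(x_exp)
par(mfrow = c(1, 1))

# Informal straightness check: correlation between theoretical and sample quantiles.
qq_norm <- qqnorm(x_norm, plot.it = FALSE)
qq_exp  <- qqnorm(x_exp,  plot.it = FALSE)
cor_norm <- cor(qq_norm$x, qq_norm$y)    # close to 1 for the normal sample
cor_exp  <- cor(qq_exp$x,  qq_exp$y)     # noticeably lower for the skewed sample
c(normal = cor_norm, exponential = cor_exp)
```

The mean and standard deviation of the normal sample only change the intercept and slope of the line, not its straightness.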

estimation, confidence intervals

sample mean and its distribution

sample mean: X̄ = (X_1 + ... + X_n)/n

distribution of the sample mean:

X̄ ∼ N(µ, σ²/n) when the sample size is n and the sample is taken from N(µ, σ²)

when the sample is taken from another distribution with expectation µ and variance σ², then X̄ has
approximately the N(µ, σ²/n) distribution (X̄ is asymptotically normal) because of the Central
Limit Theorem.

The mean varies less than the individual observations: the standard deviation σ is replaced by σ/√n.
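
A small simulation can make the σ/√n statement concrete. The values of µ, σ, n and the number of repetitions below are arbitrary choices for this sketch.

```r
set.seed(5)
n <- 25; mu <- 10; sigma <- 2

# Sample means of 10000 normal samples of size n.
means <- replicate(10000, mean(rnorm(n, mean = mu, sd = sigma)))
sd(means)            # close to sigma / sqrt(n) = 0.4

# Central Limit Theorem: Exp(1) has expectation 1 and variance 1,
# so the mean of n = 25 observations is approximately N(1, 1/25).
means_exp <- replicate(10000, mean(rexp(n, rate = 1)))
mean(means_exp)      # close to 1
sd(means_exp)        # close to 1 / sqrt(25) = 0.2
hist(means_exp, freq = FALSE)   # roughly bell-shaped
```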

standardizing the mean

any normal random variable X ∼ N(µ, σ²) can be standardized into a standard N(0, 1) variable by
Z = (X − µ)/σ ∼ N(0, 1); the converse is also possible: X = µ + σZ

in a real data set the population standard deviation σ is unknown and needs to be estimated by the
sample standard deviation s

this uncertainty influences the distribution of the resulting statistic

for a normal sample, the random variable T = (X̄ − µ)/(s/√n) has a t-distribution with n−1 degrees of
freedom instead of the N(0, 1) distribution
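
The effect of estimating σ by s can be seen in a simulation. This is a sketch with an arbitrarily small sample size (n = 5) so that the difference between the t- and the normal distribution is visible.

```r
set.seed(6)
n <- 5; mu <- 0

# Studentized mean T = (mean - mu) / (s / sqrt(n)) for 20000 normal samples.
t_stat <- replicate(20000, {
  x <- rnorm(n, mean = mu, sd = 1)
  (mean(x) - mu) / (sd(x) / sqrt(n))
})

q_emp <- quantile(t_stat, 0.975)   # empirical 0.975-quantile of T
q_t   <- qt(0.975, df = n - 1)     # t-quantile with n - 1 = 4 degrees of freedom
q_z   <- qnorm(0.975)              # standard normal quantile
c(empirical = unname(q_emp), t = q_t, normal = q_z)
```

The empirical quantile lands near the t-quantile and well above the normal one, which reflects the heavier tails of the t-distribution.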

estimation - the concepts

suppose that the population of interest has a certain distribution with an unknown parameter

a point estimator for this parameter is a function of only the observed data, seen as a random variable;
its value for the data at hand is the point estimate

we denote estimators by a hat: μ̂, p̂, etc.

a confidence interval (CI) of level 1 − α for the unknown parameter is a random interval based
only on the observed data that contains the true value of the parameter with probability at least
1 − α.

estimating the mean, CI

the upper quantile zα of the N(0, 1)-distribution is the value such that P(Z ≥ zα) = α for Z ∼
N(0, 1) (in R: zα = qnorm(1-alpha)).

if the standard deviation σ is unknown, we estimate it by s and the confidence interval is based on a
t-distribution and the upper t-quantile tα = qt(1-alpha,df=n-1)

the t-confidence interval of level 1 − α for μ then becomes:

[X̄ − tα/2 · s/√n, X̄ + tα/2 · s/√n], with tα/2 = qt(1-alpha/2,df=n-1)
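
A sketch of the t-confidence interval for μ, computed by hand as mean ± t_{α/2}·s/√n and compared with the interval that t.test() reports. The sample is simulated and its parameters are arbitrary.

```r
set.seed(7)
x <- rnorm(30, mean = 50, sd = 8)      # hypothetical sample
n <- length(x)
alpha <- 0.05

t_crit <- qt(1 - alpha / 2, df = n - 1)        # upper t-quantile t_{alpha/2}
half_width <- t_crit * sd(x) / sqrt(n)
ci <- c(mean(x) - half_width, mean(x) + half_width)
ci

# The same interval as computed by the built-in t-test.
ci_builtin <- as.numeric(t.test(x, conf.level = 1 - alpha)$conf.int)
ci_builtin
```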

hypothesis testing

concept

two claims: the null hypothesis (H0) and the alternative hypothesis (H1), which do not overlap.
A statistical test chooses between those two.

the claim of interest is usually represented by H1

to perform the test, one needs a test statistic T = T(X), which summarizes the data X = (X_1,..,X_n) in
a relevant way

A test has two possible outcomes:

the strong outcome: H0 is rejected and the alternative hypothesis is assumed to be true. This happens if
the value of the test statistic is too extreme compared to what is expected under H0: reject H0 if
T(X) ∈ K, for a critical region K

the weak outcome: H0 is not rejected

to perform a test, we need to know the distribution of T(X) under H_0

the test statistic is not unique: we can choose different test statistics, leading to different tests for the
same hypothesis H_0

p-value

3 ways to test, say, H0 : μ = μ0, with test statistic T(X) and level alpha:
1. checking whether T(X) ∈ Kα, that is, whether |T(X)| ≥ tα/2 or not;
2. most common one: comparing the p-value to α, that is, whether P(|T(X)| ≥ |t|) ≤ α or not, where t is
the observed value of the test statistic; the value of the test statistic T(X) is converted into a p-value;
3. checking whether μ0 is in the (1 − α)-CI (for μ) or not.
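
The three ways of testing H0 : μ = μ0 lead to the same decision for the two-sided t-test. The sketch below computes each of them by hand; the data, μ0 and alpha are assumptions chosen for illustration.

```r
set.seed(8)
x <- rnorm(20, mean = 1, sd = 2)   # hypothetical sample
mu0 <- 0
alpha <- 0.05
n <- length(x)

t_obs  <- (mean(x) - mu0) / (sd(x) / sqrt(n))   # observed test statistic
t_crit <- qt(1 - alpha / 2, df = n - 1)         # critical value t_{alpha/2}

# 1. Critical region: reject if |T| >= t_{alpha/2}.
reject_critical <- abs(t_obs) >= t_crit

# 2. p-value: reject if P(|T| >= |t_obs|) <= alpha.
p_value  <- 2 * pt(abs(t_obs), df = n - 1, lower.tail = FALSE)
reject_p <- p_value <= alpha

# 3. Confidence interval: reject if mu0 lies outside the (1 - alpha)-CI.
ci <- mean(x) + c(-1, 1) * t_crit * sd(x) / sqrt(n)
reject_ci <- mu0 < ci[1] || mu0 > ci[2]

c(critical = reject_critical, p_value = reject_p, ci = reject_ci)
```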

The p-value of a test is the probability, computed in the situation that H_0 is true, that an experiment
delivers data at least as extreme as the data actually observed. A small p-value says that the observed
data would be unlikely if H_0 were true.

p-value at or below the significance level alpha -> reject H_0; otherwise do not reject H_0

If H_0 is rejected, the data are said to be statistically significant at level alpha. This is about
generalization: the observed effect is unlikely to be due to chance alone, so it should be observed again if
a new experiment were performed. Data can be statistically significant even though the deviation from H0 is
very small!

Whether such a small deviation matters is a question of practical significance, which is about the relation
between the size of the effect and the available information.
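
The difference between statistical and practical significance shows up clearly with a very large sample. In this sketch the true mean deviates from the hypothesized value 0 by only 0.01 (an arbitrary choice), yet the test rejects H0 because a million observations carry a lot of information.

```r
set.seed(9)
x <- rnorm(1e6, mean = 0.01, sd = 1)   # true mean differs from 0 by only 0.01
res <- t.test(x, mu = 0)
res$p.value     # far below 0.05: statistically significant
res$estimate    # estimated mean, close to 0.01: a tiny effect
```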

t-test, example

the t-test tests a hypothesis about the population mean of a normal population
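
The example itself is not part of this preview, so the following is a stand-in sketch using R's built-in t.test(). The data (15 simulated heights in cm) and the hypothesized mean 170 are hypothetical.

```r
set.seed(10)
x <- rnorm(15, mean = 172, sd = 7)   # hypothetical heights in cm
mu0 <- 170

qqnorm(x); qqline(x)                 # check the normality assumption first

res <- t.test(x, mu = mu0)           # two-sided test of H0: mu = 170
res$statistic                        # observed value of T
res$parameter                        # degrees of freedom, n - 1 = 14
res$p.value

# The statistic agrees with the formula T = (mean - mu0) / (s / sqrt(n)).
t_manual <- (mean(x) - mu0) / (sd(x) / sqrt(length(x)))

# One-sided alternative H1: mu > 170.
p_greater <- t.test(x, mu = mu0, alternative = "greater")$p.value
```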

types of errors, power of the test

two types of errors:

type I error: rejecting H_0 while it is true

type II error: not rejecting H_0 while it is false

tests are constructed to have a small P(type I error), typically at most 5%

P(type II error) depends on the amount of data

The probability of rejecting H0 correctly (that is, when H0 is not true) is called
the power of the test. Under H1, power = 1 − P(type II error).

Different test statistics can yield different statistical power of the test.

higher sample size -> higher power
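
Type I error rate and power can be estimated by simulation. This is a sketch for the one-sample t-test of H0 : μ = 0; the helper reject_rate, the effect size 0.5, the sample sizes and the number of repetitions are all assumptions made for illustration.

```r
set.seed(11)
alpha <- 0.05

# Hypothetical helper: fraction of simulated samples in which H0: mu = 0 is rejected.
reject_rate <- function(n, mu, reps = 2000) {
  mean(replicate(reps, t.test(rnorm(n, mean = mu, sd = 1), mu = 0)$p.value <= alpha))
}

type1    <- reject_rate(20, mu = 0)     # H0 true: estimates P(type I error), near alpha
power_20 <- reject_rate(20, mu = 0.5)   # H0 false: estimates the power for n = 20
power_50 <- reject_rate(50, mu = 0.5)   # larger sample: higher power
c(type1 = type1, power_20 = power_20, power_50 = power_50)

# Theoretical power of the one-sample t-test for comparison.
power_theory <- sapply(c(20, 50), function(n)
  power.t.test(n = n, delta = 0.5, sd = 1, sig.level = alpha,
               type = "one.sample")$power)
power_theory
```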

Written for

Institution: Vrije Universiteit Amsterdam (VU)
Programme: MSc Artificial Intelligence
Course: Experimental Design & Data Analysis (X_405078)

Document information

Uploaded on: 23 February 2022
Number of pages: 88
Written in: 2020/2021
Type: Lecture notes
Lecturer(s): Eduard Belitser
Contains: All lectures

Seller: MeldaMalkoc, Vrije Universiteit Amsterdam
