Lecture notes

Lecture summaries of statistics and methodology spring 2025

Uploaded on: 25-07-2025

Written in: 2024/2025

Lecture summaries of Statistics and Methodology, spring 2025, lectures 1-7. A lot of things are written down verbatim, that's why it's so extensive. Some added notes on things that were unclear. I got an 8.7 on the final exam and a 9.2 on the quizzes.

Preview of the content

Lecture 1
Introduction to statistical inference (forming judgement on parameters/population and reliability
of statistical relationships):
Motivating example (lap times under two setups, A and B):
- Mean lap time: A = 118 seconds and B = 110 seconds
- B is faster on average, but we have to think about variability
- SD A = 7 seconds and B = 5 seconds; and what if SD A = 35 seconds and B = 25 seconds?
- In the first scenario we can say B is faster with more confidence ⇒ much greater precision
- An SD of 25 seconds is not precise
- Statistical reasoning: carefully think about certainty of measurement; precision of
measurements; foundation of good statistical analysis
- In the previous example, the mean lap time for setup A is clearly longer than the mean for setup B
- If the times are highly variable relative to the size of the mean difference, we may
not care much about the mean difference
- The purpose of statistics is to systematize the way that we account for uncertainty when
making data-based decisions
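To make the role of variability concrete, the sketch below expresses the same 8-second mean difference in units of a pooled standard deviation for both scenarios from the example above. The function name is hypothetical, and equal group sizes are an assumption, since the notes do not state sample sizes.

```python
import math

def standardized_difference(mean_a, mean_b, sd_a, sd_b):
    """Mean difference divided by a pooled SD (equal group sizes assumed)."""
    pooled_sd = math.sqrt((sd_a ** 2 + sd_b ** 2) / 2)
    return (mean_a - mean_b) / pooled_sd

# Scenario 1: small spread around the means from the example above.
low_variability = standardized_difference(118, 110, 7, 5)
# Scenario 2: same means, five times the spread.
high_variability = standardized_difference(118, 110, 35, 25)

print(f"SD 7 and 5:   difference = {low_variability:.2f} pooled SDs")
print(f"SD 35 and 25: difference = {high_variability:.2f} pooled SDs")
```

The mean difference is identical in both scenarios; only its size relative to the spread changes, which is what a statistical test later takes into account.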

Recap of probability distributions:
Probability distribution:
- Mathematical function
- Quantify how likely it is to observe each possible value of some probabilistic entity (e.g., height)
- Probability distributions are re-scaled frequency distributions
- We can build up the intuition of a probability density by beginning with a histogram
- With an infinite number of bins, a histogram smooths into a continuous curve
- In a loose sense, each point on the curve gives the probability of observing the
corresponding x value in any given sample
- The area under the curve must integrate (combine) to 1.0 ⇒ total probability is 1
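The re-scaling from a frequency distribution to a density can be sketched as follows. The heights are simulated, and the mean of 175 cm, SD of 8 cm, and 30 bins are illustrative assumptions, not values from the lecture.

```python
import random

random.seed(1)  # fixed seed so the sketch is repeatable

# Hypothetical heights in cm; the mean and SD are illustrative assumptions.
heights = [random.gauss(175, 8) for _ in range(10_000)]

n_bins = 30
lowest, highest = min(heights), max(heights)
bin_width = (highest - lowest) / n_bins

# Frequency distribution: how many observations fall in each bin.
counts = [0] * n_bins
for h in heights:
    index = min(int((h - lowest) / bin_width), n_bins - 1)  # maximum goes in the last bin
    counts[index] += 1

# Re-scale the counts so that bar height times bin width is a proportion.
density = [c / (len(heights) * bin_width) for c in counts]

total_area = sum(d * bin_width for d in density)
print(f"Total area under the re-scaled histogram: {total_area:.3f}")
```

Dividing each count by the sample size times the bin width is what makes the bar areas sum to 1, mirroring the requirement that the total probability is 1.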

Statistical testing:
- In practice we may want to distill the information in the preceding plots into a simple statistic so
we can make a judgment
- One way to distill this information and control for uncertainty when generating knowledge
is through statistical testing
- When we conduct statistical tests, we weight the estimated effect by the precision
of the estimate
- A common type of statistical test, the Wald test, follows this pattern:

- This is the form of a t-test: an estimate, minus the null-hypothesis value for this estimate (the value it would
take if there were no effect in the population), divided by some measure of variability (how well we
have estimated this effect)
- If we want to test the null hypothesis of a zero mean difference, applying Wald test logic to control
for the uncertainty in our estimate results in the familiar t-test:
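Written out in standard textbook notation (an assumption, since the formulas themselves are not reproduced in these notes), the Wald pattern and the equal-variance t-statistic for a zero mean difference can be sketched as below. The symbols \hat{\theta}, \theta_0, s_p^2, n_A and n_B are introduced here for illustration.

```latex
% Wald pattern: estimate minus its null-hypothesized value, scaled by its variability
T = \frac{\hat{\theta} - \theta_0}{\mathrm{SE}(\hat{\theta})}

% Equal-variance t-statistic for a zero mean difference between setups A and B
t = \frac{(\bar{X}_A - \bar{X}_B) - 0}{\sqrt{s_p^2 \left( \frac{1}{n_A} + \frac{1}{n_B} \right)}},
\qquad
s_p^2 = \frac{(n_A - 1)\, s_A^2 + (n_B - 1)\, s_B^2}{n_A + n_B - 2}
```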

Generally, the larger the test statistic (in absolute value), the stronger the evidence against the null hypothesis

In R, assume equal variances ⇒ easier to compare to the modelling approaches later on
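As a minimal sketch of the equal-variance t-test logic, the Python function below computes the test statistic by hand. The course itself works in R; the function name and the lap times are hypothetical and made up for illustration.

```python
import math
import statistics

def pooled_t_statistic(sample_a, sample_b, null_difference=0.0):
    """Equal-variance two-sample t-statistic and its degrees of freedom."""
    n_a, n_b = len(sample_a), len(sample_b)
    mean_difference = statistics.mean(sample_a) - statistics.mean(sample_b)
    # Pool the two sample variances, weighting each by its degrees of freedom.
    pooled_variance = (
        (n_a - 1) * statistics.variance(sample_a)
        + (n_b - 1) * statistics.variance(sample_b)
    ) / (n_a + n_b - 2)
    standard_error = math.sqrt(pooled_variance * (1 / n_a + 1 / n_b))
    t_value = (mean_difference - null_difference) / standard_error
    return t_value, n_a + n_b - 2

# Hypothetical lap times in seconds (made-up numbers, not course data).
laps_a = [121, 115, 118, 124, 112]
laps_b = [111, 108, 114, 105, 112]

t_value, df = pooled_t_statistic(laps_a, laps_b)
print(f"t = {t_value:.3f} on {df} degrees of freedom")
```

The numerator is the estimate minus its null value and the denominator is its standard error, so the function is one instance of the Wald pattern.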

Statistical testing:
- We’ve computed a test statistic, but how do we use it to compare lap times under setups A and B?
- A test statistic, by itself, is just an arbitrary number
- To conduct the test, we need to compare the test statistic to some objective reference
- This objective reference needs to tell us something about how exceptional our test
statistic is
- The specific reference we will be employing is known as a sampling distribution of the
test statistic

Sampling distribution:
- A sampling distribution (=probability distribution) is simply the probability distribution of a statistic
- The sampling distribution quantifies the possible values of the test statistic over infinite
repeated sampling (each sample gives a different mean difference; the distribution of these
different values is the sampling distribution)
- The area of a region under the curve represents the probability of observing a test
statistic within the corresponding interval
- Note that a sampling distribution is a slightly different concept than the distribution of a random
variable
- The sampling distribution quantifies the possible values of a statistic (e.g., mean,
t-statistic, correlation coefficient, etc.)
- The distribution of a random variable quantifies the possible values of a variable (e.g.,
age, gender, income, movie preferences, etc.).
- The t-test we’ve been considering is a way to summarize the comparison of two variables’
distributions
- The t-statistic also has a sampling distribution that quantifies the possible t-values we
could get if we repeatedly drew samples from the variables’ distributions and
re-computed a t-statistic each time
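Repeated sampling can be imitated with a small simulation. The sketch below draws two samples from the same hypothetical population many times and re-computes the t-statistic each time; the population mean of 114, SD of 6, sample size of 10, and 5,000 repetitions are all assumptions chosen for illustration.

```python
import math
import random
import statistics

random.seed(2025)  # fixed seed so the sketch is repeatable

def t_statistic(sample_a, sample_b):
    """Equal-variance t-statistic for two samples of the same size."""
    n = len(sample_a)
    pooled_variance = (statistics.variance(sample_a) + statistics.variance(sample_b)) / 2
    standard_error = math.sqrt(pooled_variance * 2 / n)
    return (statistics.mean(sample_a) - statistics.mean(sample_b)) / standard_error

def draw_sample(n, mean=114.0, sd=6.0):
    """Draw n hypothetical lap times from a normal population (assumed values)."""
    return [random.gauss(mean, sd) for _ in range(n)]

# Both setups share one population mean, so the null hypothesis is true here.
null_t_values = [t_statistic(draw_sample(10), draw_sample(10)) for _ in range(5_000)]

# 2.101 is the two-sided 5% critical value of a t distribution with 18 df.
share_extreme = sum(abs(t) > 2.101 for t in null_t_values) / len(null_t_values)
print(f"Mean of simulated t-values: {statistics.mean(null_t_values):.3f}")
print(f"Share with |t| > 2.101:     {share_extreme:.3f}")
```

The list of simulated t-values is a finite approximation of the sampling distribution of the t-statistic when there is no true difference.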

Statistical testing:
- To quantify how exceptional our estimated t-statistic is, we compare the estimated value to a
sampling distribution of t-statistics assuming no effect (if null hypothesis is true)
- This distribution quantifies the null hypothesis
- The special case of a null hypothesis of no effect is called the nil-null (mean
difference in lap time = 0)
- If our estimated statistic would be very unusual in a population where the null hypothesis
is true, we reject the null and claim a “statistically significant” effect

Computing the probability of events:
- We can find the probability associated with a range of values (i.e., a range of possible events,
variable values, or statistics) by computing the area of the corresponding slice from the
distribution
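Computing the area of a slice amounts to taking a difference of cumulative probabilities. The sketch below uses the standard normal distribution from Python's `statistics.NormalDist` purely as an illustrative assumption; the helper name is hypothetical.

```python
from statistics import NormalDist

# Standard normal distribution, used here purely as an illustrative assumption.
distribution = NormalDist(mu=0.0, sigma=1.0)

def probability_between(dist, lower, upper):
    """Area under the density between lower and upper."""
    return dist.cdf(upper) - dist.cdf(lower)

central_area = probability_between(distribution, -1.96, 1.96)
upper_tail = 1.0 - distribution.cdf(1.96)

print(f"P(-1.96 < Z < 1.96) = {central_area:.3f}")
print(f"P(Z > 1.96)         = {upper_tail:.3f}")
```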

P-values:
- By calculating the area in the null distribution that exceeds our estimated test statistic, we can
compute the probability of observing the given test statistic, or one more extreme, if the null
hypothesis were true
- The test statistic (e.g., the t-statistic) is the objective statistic that we use for testing, based on the estimated effect
- The p-value tells us the probability of a t-statistic at least as large as the estimated one, given that H0 is true
- In other words, we can compute the probability of having sampled the data we observed,
or more unusual data, from a population wherein there is no true difference in lap times

- The preceding test is one-tailed
- We use a one-tailed test when we have a directional hypothesis (we expect the mean to be
greater or less than a certain value)
- Ex. students who study with new method score higher than those who
don’t (H0 = new method has no effect; H1 = new method leads to higher
scores)
- Since we didn’t expect setup B to out-perform setup A, we need to use a two-tailed test
- Use a two-tailed test when you have a non-directional hypothesis (you only care whether there is
a difference, not its direction)
- Ex. a new drug has a different effect on blood pressure compared to a
placebo (H0 = drug has no effect; H1 = the drug changes blood
pressure)
- We cannot first compute the means and then choose a directional test based on the
results, since this inflates the Type I error rate ⇒ the null hypothesis is rejected too easily
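The difference between the two kinds of test, and the cost of picking a direction after seeing the data, can be sketched with a simulation. The code uses a standard normal null distribution as a large-sample stand-in for the t distribution (an assumption), and all function names are hypothetical.

```python
import random
from statistics import NormalDist

random.seed(7)  # fixed seed so the sketch is repeatable
standard_normal = NormalDist()

def one_tailed_p(z):
    """P(Z >= z) under the null; the direction is fixed before seeing the data."""
    return 1.0 - standard_normal.cdf(z)

def two_tailed_p(z):
    """P(|Z| >= |z|) under the null; no direction is assumed."""
    return 2.0 * (1.0 - standard_normal.cdf(abs(z)))

def post_hoc_one_tailed_p(z):
    """One-tailed p-value whose direction is picked after looking at the result."""
    return 1.0 - standard_normal.cdf(abs(z))

# Simulate test statistics from a world in which the null hypothesis is true.
null_statistics = [random.gauss(0.0, 1.0) for _ in range(20_000)]
alpha = 0.05

def rejection_rate(p_value_function):
    """Share of null test statistics that would be declared significant."""
    rejections = sum(p_value_function(z) < alpha for z in null_statistics)
    return rejections / len(null_statistics)

print(f"Pre-specified one-tailed:  {rejection_rate(one_tailed_p):.3f}")
print(f"Two-tailed:                {rejection_rate(two_tailed_p):.3f}")
print(f"Direction chosen post hoc: {rejection_rate(post_hoc_one_tailed_p):.3f}")
```

With the direction chosen after the fact, either tail can trigger a rejection, so the false-positive rate is roughly double the nominal 5% level.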

Written for

Institution: Tilburg University (UVT)
Programme: Data Science & Society
Course: Statistics and Methodology (880670M6)

Document information

Uploaded on: 25 July 2025
Number of pages: 62
Written in: 2024/2025
Type: Lecture notes
Lecturer(s): Dr. leonie v.d.e. vogelsmeier
Contains: All lectures

Topics

lectures
quizzes
test
spring2025
statistics
regression
coding
practical
summary
exam
