Summary Advanced Statistics

Uploaded on
21-03-2025
Written in
2024/2025

Summary of the advanced statistics course.

Some definitions
P-value => the probability, assuming H0 is true, of observing a difference at least as extreme as the
one in your sample.
→ P-hacking: performing a large number of statistical tests and reporting only the ones that are
statistically significant, which increases the risk of false-positive results.
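
To make the p-hacking risk concrete, here is a minimal sketch. It assumes the tests are independent and that every null hypothesis is true; the function name is only illustrative.

```python
def familywise_false_positive_rate(n_tests, alpha=0.05):
    """Probability of at least one false positive among n_tests independent
    tests when every null hypothesis is true (an assumption of this sketch)."""
    return 1.0 - (1.0 - alpha) ** n_tests


# The risk grows quickly with the number of tests performed.
for m in (1, 5, 20):
    print(m, round(familywise_false_positive_rate(m), 3))
```

Under these assumptions, running 20 tests at alpha = 0.05 gives roughly a 64% chance of at least one "significant" result by chance alone.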

Standard Error (SE) => a measure of the uncertainty of an estimate: how much the estimate is
expected to vary from sample to sample around the true population value.
• It helps us understand how reliable our sample estimate is as an estimate of the
population value.
• A smaller standard error suggests a more precise estimate, while a larger one indicates more
uncertainty.
Standard deviation (SD) => tells us how spread out or varied a set of data points is from the average
(mean).
• It helps understand the degree of variability or dispersion in a dataset.
• A larger standard deviation means the data points are more spread out, while a smaller one
indicates they are closer to the mean.
Degrees of freedom (DF) => represent the number of values in the final calculation of a statistic that
are free to vary.
• It measures the flexibility or constraints in data.
• It's the number of data points minus the number of parameters estimated or restrictions
imposed in a statistical analysis.
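
The relation between SD, SE and degrees of freedom can be sketched for the simplest case, the mean of a sample. The data values below are made up for illustration, and the formula SE = SD / sqrt(n) applies to the sample mean specifically.

```python
import math
import statistics


def sd_se_df(values):
    """Return the sample SD, the SE of the mean, and the degrees of freedom."""
    n = len(values)
    sd = statistics.stdev(values)  # sample SD: divides by n - 1
    se = sd / math.sqrt(n)         # SE of the mean shrinks as n grows
    df = n - 1                     # one parameter (the mean) was estimated
    return sd, se, df


sample = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical measurements
sd, se, df = sd_se_df(sample)
print(sd, se, df)
```

The SD describes the spread of the data and does not shrink with more data, while the SE describes the uncertainty of the estimate and does.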

Tidy data => every row is one measurement in space and time; columns are variables with meaning
in the context of a hypothesis or model. Long format!
o Minimal number of columns = degrees of freedom of the model
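
As an illustration of the long format, the sketch below reshapes a small wide table into one row per measurement. The column names (`plant`, `day1`, `day2`, `height`) and the values are hypothetical.

```python
# Wide format: one row per plant, one column per measurement day.
wide = [
    {"plant": "A", "day1": 2.1, "day2": 2.9},
    {"plant": "B", "day1": 1.8, "day2": 2.5},
]


def to_long(rows, id_col, var_name, value_name):
    """Turn every non-id column into its own row (one measurement per row)."""
    long_rows = []
    for row in rows:
        for key, value in row.items():
            if key == id_col:
                continue
            long_rows.append({id_col: row[id_col], var_name: key, value_name: value})
    return long_rows


long_rows = to_long(wide, id_col="plant", var_name="day", value_name="height")
for r in long_rows:
    print(r)
```

In the long table, `day` becomes an explanatory variable that a model can use directly.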
Power => the probability that a statistical test or analysis will correctly detect a true effect or
difference when it exists. It measures the ability of a test to avoid a "false negative" or Type II error,
indicating the test's sensitivity to finding real effects.

Type I Error => incorrectly rejecting a true null hypothesis. In other words, it's a false positive:
concluding that there is an effect or difference when there isn't one. An underestimated SE increases this risk.
Type II Error => failing to reject a false null hypothesis. In other words, it's a false negative:
concluding that there is no effect or difference when there actually is one. An overestimated SE increases this risk.
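
Power and the two error types can be estimated by simulation. The following is a minimal sketch, assuming two groups with normally distributed data, a known SD, and a two-sided z-test; the effect size, group size and seed are arbitrary choices for illustration.

```python
import random
from statistics import NormalDist, fmean


def rejection_rate(effect, n_per_group, sigma=1.0, alpha=0.05, n_sim=2000, seed=1):
    """Fraction of simulated experiments in which H0 (no difference) is rejected."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    se = sigma * (2 / n_per_group) ** 0.5  # SE of a difference of two means
    rejections = 0
    for _ in range(n_sim):
        a = [rng.gauss(0.0, sigma) for _ in range(n_per_group)]
        b = [rng.gauss(effect, sigma) for _ in range(n_per_group)]
        z = (fmean(b) - fmean(a)) / se
        if abs(z) > z_crit:
            rejections += 1
    return rejections / n_sim


type_i_rate = rejection_rate(effect=0.0, n_per_group=20)  # H0 true: false positives
power = rejection_rate(effect=1.0, n_per_group=20)        # H0 false: true positives
type_ii_rate = 1 - power                                  # H0 false: false negatives
print(type_i_rate, power, type_ii_rate)
```

With no true effect the rejection rate should sit near alpha; with a true effect the rejection rate is the power, and its complement is the Type II error rate.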

Null deviance => the deviance of the null (intercept-only) model; the maximum deviance a model can explain.
Residual deviance => the deviance left in the residuals (variation not explained by the model).
- Residual deviance the same as or close to the residual degrees of freedom = the model fits well
Deviance explained: (Null deviance - residual deviance)/Null deviance
- Overdispersion => having more variation or "spread" in the data than the model predicts,
which can lead to inaccurate model results and conclusions.
o You can deal with this in different ways:
▪ The dispersion parameter can be used to correct for the
underestimate/overestimate of SE.
▪ Quasi-Poisson (Poisson but with more variance)
▪ Negative binomial (Poisson but with more variance, more complex, separate
parameters for mean and variance)
▪ Mixed models, but only if there is a random effect factor.
- Underdispersion => having less variation or "spread" than the model predicts, which can
also affect the accuracy of model results and conclusions.
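
The deviance quantities above can be computed by hand for a small Poisson example. This is a sketch under stated assumptions: the counts are made up, the model has a single two-level group factor (so the fitted values are simply the group means), and the dispersion is estimated as the Pearson chi-square divided by the residual degrees of freedom.

```python
import math


def poisson_deviance(y, mu):
    """Poisson deviance of observed counts y against fitted means mu."""
    total = 0.0
    for yi, mi in zip(y, mu):
        log_term = yi * math.log(yi / mi) if yi > 0 else 0.0
        total += log_term - (yi - mi)
    return 2.0 * total


def pearson_dispersion(y, mu, n_params):
    """Pearson chi-square divided by the residual degrees of freedom."""
    chi2 = sum((yi - mi) ** 2 / mi for yi, mi in zip(y, mu))
    return chi2 / (len(y) - n_params)


# Hypothetical counts: four observations in group A, four in group B.
y = [2, 3, 1, 4, 8, 12, 7, 13]
mu_null = [sum(y) / len(y)] * len(y)  # null model: one overall mean
mu_fit = [2.5] * 4 + [10.0] * 4       # group model: one mean per group

null_dev = poisson_deviance(y, mu_null)
resid_dev = poisson_deviance(y, mu_fit)
explained = (null_dev - resid_dev) / null_dev
dispersion = pearson_dispersion(y, mu_fit, n_params=2)
print(null_dev, resid_dev, explained, dispersion)
```

A dispersion value well above 1 would point to overdispersion and a value well below 1 to underdispersion.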

Fisher scoring iterations => how many steps the fitting algorithm took to find the best fit (4-8 is good, above 15 is bad).
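
To show what is being counted, here is a minimal sketch of Fisher scoring (iteratively reweighted least squares) for a Poisson model with a log link and a single predictor. The data, the starting values and the convergence rule (relative change in deviance) are assumptions of this sketch, not the exact procedure of any particular software.

```python
import math


def poisson_deviance(y, mu):
    return 2.0 * sum(
        (yi * math.log(yi / mi) if yi > 0 else 0.0) - (yi - mi)
        for yi, mi in zip(y, mu)
    )


def fit_poisson_irls(x, y, tol=1e-8, max_iter=25):
    """Fit log(mu) = b0 + b1 * x; return (b0, b1, iterations, converged)."""
    b0, b1 = math.log(sum(y) / len(y)), 0.0  # start from the null model
    mu = [math.exp(b0 + b1 * xi) for xi in x]
    dev_old = poisson_deviance(y, mu)
    for iteration in range(1, max_iter + 1):
        sw = swx = swxx = swz = swxz = 0.0
        for xi, yi, mi in zip(x, y, mu):
            z = math.log(mi) + (yi - mi) / mi  # working response
            w = mi                             # working weight for the log link
            sw += w
            swx += w * xi
            swxx += w * xi * xi
            swz += w * z
            swxz += w * xi * z
        det = sw * swxx - swx * swx
        b0 = (swxx * swz - swx * swxz) / det   # weighted least-squares solution
        b1 = (sw * swxz - swx * swz) / det
        mu = [math.exp(b0 + b1 * xi) for xi in x]
        dev = poisson_deviance(y, mu)
        if abs(dev - dev_old) / (abs(dev) + 0.1) < tol:
            return b0, b1, iteration, True
        dev_old = dev
    return b0, b1, max_iter, False


x = [0, 1, 2, 3, 4, 5]
y = [1, 2, 4, 6, 12, 20]  # hypothetical counts
b0, b1, n_iter, converged = fit_poisson_irls(x, y)
print(b0, b1, n_iter, converged)
```

A fit that needs many iterations, or never converges, usually signals a poorly specified model or problematic data.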

Studies where data is not independent:
- Longitudinal studies: Subject is measured over time
- Repeated measurement: Subject receives multiple treatments.
- Nested designs: Each subject is nested within one treatment (not a factorial design).
- Split plot design: Combination of factorial and nested design.

Statistical Considerations of Study Design
• Balance => equal sample size per category
o Not always possible, but it increases the power and simplicity of the analysis
• Replication => true replication is absolutely essential
o The required sample size depends on…
▪ Variance (the stochastic part of the process)
▪ Effect size (how large the true differences are)
▪ Model complexity (more parameters require more samples)
o Variance and effect size can be estimated from a pilot study, previous research, or
expert knowledge.
o Model complexity depends on what kind of comparison you want to make, what
distribution you think the outcome has conditional on the explanatory variables,
whether you believe there to be potential confounders that have to be included, etc.
o No pseudoreplication (repeated measurements on the same experimental unit, like several
leaves on one tree instead of leaves from multiple trees).
• Randomization => random allocation of treatments, locations, or even the order in which you
process samples.
o Avoids confounding effects
o Without randomization, samples run first may have systematically different measurement
error than samples run last.
• Blocking => a way to group similar things or subjects together.
o For estimating confounding effects
o A block is a subset of the experimental material within which experimental units are
expected to be homogeneous (e.g., a microarray is a block).
o Blocking can make it easier to detect the true effects of the factors you're studying by
reducing the influence of other variables that could muddy the result.
o Nested mixed models can use blocking (blocks nested in blocks).
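
Randomization and blocking can be combined in a randomized complete block design: every treatment appears once in each block, in a random order. The sketch below assumes hypothetical block and treatment names.

```python
import random


def randomized_block_design(blocks, treatments, seed=None):
    """Assign every treatment once per block, in a random order within each block."""
    rng = random.Random(seed)
    plan = {}
    for block in blocks:
        order = list(treatments)
        rng.shuffle(order)  # randomize the order within this block only
        plan[block] = order
    return plan


blocks = ["block1", "block2", "block3"]   # e.g. three trays or three arrays
treatments = ["control", "low", "high"]
plan = randomized_block_design(blocks, treatments, seed=42)
for block, order in plan.items():
    print(block, order)
```

The design stays balanced because each treatment occurs equally often, while the shuffle guards against order effects within a block.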




Required sample size (n) depends on the complexity of the study design:
- Groups
- Natural variability
- Experimental techniques

• Small n => significance testing
• Medium n => regularized linear models (also suitable for small n)
• Large n => predictive models
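
How variance and effect size drive the required sample size can be sketched with the usual normal-approximation formula for comparing two group means. It assumes equal group sizes, a known SD, and a two-sided test; the default alpha and power values are conventional choices, not values from these notes.

```python
import math
from statistics import NormalDist


def n_per_group(effect, sigma, alpha=0.05, power=0.8):
    """Approximate sample size per group for a two-sided two-sample z-test."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    return math.ceil(2 * ((z_alpha + z_power) * sigma / effect) ** 2)


print(n_per_group(effect=1.0, sigma=1.0))
print(n_per_group(effect=0.5, sigma=1.0))
```

Halving the effect size (or doubling the SD) roughly quadruples the required n, which is why pilot estimates of both matter so much.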

mayastelzer Universiteit Leiden