STATISTICS AND METHODOLOGY
Recap of RBMS:
- RQ → hypotheses about a population → based on that we design a study and collect
data on a sample of the population → descriptive statistics to describe what we see in
the sample → inferential statistics to determine how likely it is that the things we
observe in the sample are true for the population.
- p-value: if the data are unlikely to occur given that the H0 is true, we reject the H0;
if they are likely to occur, we retain the H0.
- We can’t prove a negative; we support a positive by rejecting a negative. And only
‘probably’, because there is a certain level of uncertainty; statistics help us quantify
that uncertainty, i.e. how probable the observation is when the H0 is true. It doesn’t
tell us about reality: we only find support for hypotheses, we never prove things.
- Never say that the H0 is true, only that we retain it.
- RQ: PICOS: population, intervention, comparison, outcomes, study design.
- H0: no effect, no difference, no association; H1: an effect in either direction. Based
on literature.
- Two-sided: an effect in either direction; One-sided: only one direction, only when the
other direction is biologically impossible.
- Study design: is it about a causal effect or an association; what are the dependent and
independent variables (for the independent variable, also how it is manipulated); is the
design paired or unpaired.
- Observational: cross-sectional (all measures at the same time), case-control (you look
back in time), prospective (you follow the sample over time); or experimental: RCT
(participants are assigned to one condition only), cross-over design (participants get all
conditions, but the order is randomly assigned). Experimental designs are the only ones
from which you can draw causal conclusions.
- Descriptive statistics: mean, median, mode; range, interquartile range, variance, SD;
graphs and figures.
- Inferential statistics: the p-value represents the probability of the data given that the
null hypothesis is true; if the data are unlikely, we reject H0 and accept H1, namely
when p-value < α (e.g. 0.05). TS = (point estimate - expected value)/SE, i.e. the
deviation of the observed data from the data expected under the H0.
- p-value: we use probability distributions, empirical is based on the data in the
sample, while the theoretical is the hypothetical population distribution. If the
empirical data resemble the normal distribution, then the properties of the normal
distribution can be used to draw conclusions about the population based on the
sample using parametric statistics, and if not then we use nonparametric.
- Normal distribution: symmetrical around the midpoint; 95% of the observations are
between the mean - 1.96 SD and the mean + 1.96 SD, 2.5% are lower than the (-)
and 2.5% are higher than the (+).
- Z-scores: used to standardize a normal distribution to the standard normal distribution,
so that scores from two different normal distributions can be compared: z = (x - mean)/SD.
We can then calculate the probability of a value relative to this z-score, either from the
area under the curve or from the standard normal table (see the sketch after this list).
- If the test statistic t lies between the critical t values, the probability of observing
the data given that the H0 is true is not small (> 5%), so we retain the H0: a
non-significant result. If t is more extreme than the critical t, the data are very unlikely
under the H0, so we reject the H0: a significant result.
- Failing to reject the H0 doesn’t mean you can accept the H0 as true; you can only
say that there is not enough evidence to conclude that it is untrue. Also, you do not
know the size of the effect, i.e. whether it is a strong signal or just little noise.
- Test selection: difference in means, proportions, or an association? DV and IV level
of measurements? DV normally distributed? IV levels? Paired or unpaired?
- Errors: type I error: rejecting the H0 when we should have retained it; type II error:
retaining the H0 when we should have rejected it.
- Confidence intervals: we check whether the value hypothesized under the H0 falls in an
interval around the sample estimate, an interval of which we are very confident that it
contains the population parameter. CI = point estimate ± margin of error (critical statistic
value × SE). We retain the H0 when the hypothesized value lies in the CI. The interpretation
is that we are 95% confident that the population mean lies within the CI around the sample
mean, which is an advantage over a bare p-value (illustrated in the sketch after this list).
- Correlation: a measure of covariation, i.e. whether a change in one variable is associated
with a change in the other variable. Correlation coefficients tell us the direction of the
relationship (+ or -) and its strength (from -1 to 1): Pearson’s product-moment correlation
(parametric) or Spearman’s rho (non-parametric).
- Covariation is not causality: the direction of the effect is unknown, and another variable
or a triangular explanation may account for the association.
- Correlation coefficients are measures of linear association; for a nonlinear association
the coefficient can be (close to) 0, which then has no meaning.
- Correlation coefficients are sensitive to outliers, so always draw the data in a plot.
- Regression adds something: with the regression line it describes the direction and form
of the relationship. Still, it does not imply causality, which is determined only by the
research design (experimental only!).
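A minimal numerical sketch tying together the z-score, confidence-interval and correlation bullets above, assuming Python with NumPy and SciPy available; the numbers (mean 100, SD 15, the small samples) are hypothetical illustrations, not course data.

```python
import numpy as np
from scipy import stats

# Z-score: where does x = 130 sit in a normal distribution with mean 100, SD 15?
x, mu, sd = 130.0, 100.0, 15.0
z = (x - mu) / sd                      # standardize: (x - mean) / SD
p_above = 1 - stats.norm.cdf(z)        # area under the curve above z (upper tail)
print(f"z = {z:.2f}, P(X > {x}) = {p_above:.3f}")

# 95% CI around a sample mean: point estimate +/- critical value * SE
sample = np.array([102, 95, 110, 99, 104, 97, 108, 101])   # hypothetical sample
m, s, n = sample.mean(), sample.std(ddof=1), len(sample)
se = s / np.sqrt(n)                    # standard error of the mean
t_crit = stats.t.ppf(0.975, df=n - 1)  # two-sided 95% critical t
print(f"mean = {m:.1f}, 95% CI = ({m - t_crit * se:.1f}, {m + t_crit * se:.1f})")
# Retain H0 (e.g. H0: population mean = 100) if that value lies inside the CI.

# Correlation: Pearson (parametric) vs Spearman's rho (non-parametric)
x_var = np.array([1, 2, 3, 4, 5, 6], dtype=float)
y_var = np.array([2.1, 2.9, 4.2, 4.8, 6.1, 7.0])
r, p_r = stats.pearsonr(x_var, y_var)
rho, p_rho = stats.spearmanr(x_var, y_var)
print(f"Pearson r = {r:.2f} (p = {p_r:.3f}), Spearman rho = {rho:.2f} (p = {p_rho:.3f})")
```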
Regression I:
Statistical models: understanding the relationship between variables, both for experimental and
observational studies. Statistics tell you whether an observed relationship between X and Y
in your sample is likely to reflect a true relationship in the unobserved population, accounting
for chance effects of random sampling and uncovering possible biases. All statistical tests
develop a model that describes the data well and makes accurate predictions about new
data points in the population.
Outcome_i = model + error_i (for each individual data point).
The means model is the simplest model: the mean is the best guess, giving us the most accurate
prediction if we don't have any additional information. If we do have additional useful
information, we can make a better, more predictive model by including it:
outcome_i = (mean + b*Xi) + error_i, where b*Xi is the contribution of the additional
information, i.e. the deviation predicted for a particular individual (a minimal sketch
comparing the two models follows below).
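A minimal sketch of the "model + error" idea on a hypothetical toy dataset: the mean-only model versus a model that adds a predictor, compared by the squared error each leaves unexplained.

```python
import numpy as np

# Hypothetical data: outcome y and an additional piece of information x
x = np.array([1, 2, 3, 4, 5, 6, 7, 8], dtype=float)
y = np.array([3.1, 4.0, 4.8, 6.2, 6.9, 8.1, 8.8, 10.2])

# Means model: predict the same value (the mean) for everyone
pred_mean = np.full_like(y, y.mean())
sse_mean = np.sum((y - pred_mean) ** 2)     # error left over after the mean

# Model with a predictor: outcome_i = a + b * x_i + error_i
b, a = np.polyfit(x, y, deg=1)              # least-squares slope and intercept
pred_reg = a + b * x
sse_reg = np.sum((y - pred_reg) ** 2)       # error left over after using x

print(f"SSE mean-only model: {sse_mean:.2f}")
print(f"SSE regression model: {sse_reg:.2f}  (smaller = better predictions)")
```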
Regression uses predictors to fit a linear association model. Which statistical test applies
depends on whether the independent and dependent variables are continuous or categorical:
linear regression relates a continuous dependent variable to a continuous independent variable,
ANOVA compares a continuous dependent variable across the levels of one or more categorical
independent variables, and logistic regression is used when the dependent variable is
categorical (a small sketch follows below).
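A hedged sketch of how the variable types map onto analyses, with hypothetical toy data: SciPy's linregress and f_oneway for the continuous and group comparisons, and a simple logistic fit via scikit-learn (assuming scikit-learn is installed; the data are made up for illustration).

```python
import numpy as np
from scipy import stats
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Continuous DV ~ continuous IV -> linear regression
x = rng.normal(size=50)
y = 2 + 0.5 * x + rng.normal(scale=0.5, size=50)
print("linear regression slope:", stats.linregress(x, y).slope)

# Continuous DV compared across levels of a categorical IV -> ANOVA
group_a = rng.normal(loc=5.0, size=30)
group_b = rng.normal(loc=6.0, size=30)
group_c = rng.normal(loc=5.5, size=30)
print("ANOVA:", stats.f_oneway(group_a, group_b, group_c))

# Categorical (binary) DV -> logistic regression
x2 = rng.normal(size=(80, 1))
y2 = (x2[:, 0] + rng.normal(scale=0.8, size=80) > 0).astype(int)
print("logistic coefficient:", LogisticRegression().fit(x2, y2).coef_)
```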
Statistical inference: our data are just a sample from a much bigger population and might differ
from the population by chance, so we use statistical testing to check that our result and
conclusion are not likely due to lucky sampling. The sampling distribution is the distribution
of the many means found in many different samples taken from the population, and it is centred
around the population mean. But as researchers we can only work with ONE sample, which is why
we use the SE: a theoretical estimate of how much a sample estimate would deviate across
repeated random samples drawn from that population. The SE is the standard deviation of the
sampling distribution of the mean (the many means found in the many hypothetical samples), and
it indicates how far we expect the sample mean to be off from the population mean. Central
limit theorem: if samples are large enough, the sampling distribution is approximately normal
with SE = s/√N (illustrated in the simulation below).
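A small simulation sketch of the sampling distribution and SE = s/√N, using a hypothetical skewed population to show the central limit theorem at work; all numbers are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(42)
population = rng.exponential(scale=2.0, size=100_000)    # skewed, clearly non-normal

n = 50                                                   # size of each sample
draws = rng.choice(population, size=(5_000, n))          # 5000 hypothetical samples
sample_means = draws.mean(axis=1)                        # the sampling distribution

one_sample = rng.choice(population, size=n)
se_from_one_sample = one_sample.std(ddof=1) / np.sqrt(n) # SE = s / sqrt(N)

print(f"SD of the 5000 sample means (empirical SE): {sample_means.std(ddof=1):.3f}")
print(f"SE estimated from one single sample:        {se_from_one_sample:.3f}")
# Both are close, and the sample means are roughly normal even though the
# population is skewed: that is the central limit theorem.
```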
The same logic holds when we test a linear association. The test statistic also tells you the
effect size and its uncertainty, expressed in units on a known distribution. Degrees of freedom
shape that distribution: they represent the amount of available information relative to the
information estimated, and usually equal the number of observations minus the number of
parameters estimated.
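A tiny sketch of the "observations minus parameters" rule, assuming a simple linear regression (two estimated parameters: intercept and slope); the sample size is hypothetical.

```python
from scipy import stats

n_observations = 25
n_parameters = 2                      # intercept and slope in simple regression
df = n_observations - n_parameters    # degrees of freedom shape the t-distribution

t_crit = stats.t.ppf(0.975, df=df)    # two-sided 5% critical value for this df
print(f"df = {df}, critical t = {t_crit:.2f}")
```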
Regression: the relationship between two quantitative variables: when the independent variable
increases, what happens to the dependent one, and what are the magnitude and direction of the
association?
Option A describes it well but cannot be used to make predictions, which is why option C is the
best option. The purpose of regression is to find the line that best fits the points, minimizing
the squared error between the points and the line. In formula form: y = a + b*x + e, where a is
the intercept (the average y when x = 0), b is the slope (the effect of x on y), and e is the
error or residual, the difference between the observed and predicted value of the dependent
variable. The purpose of drawing a regression line is to fit a statistical model to our data and
describe the linear pattern of association between x and y, and we can use the formula to
predict the DV when the IV is known. If we change the units, the slope coefficient changes
accordingly, but the relation stays the same; in general, linear regression is insensitive to
linear transformations. For nonlinear transformations, such as a logarithmic transformation, the
relation is affected and the conclusion can change. In general, b = r * sy/sx; after
standardizing both predictor x and outcome y, b = r and a = 0 (checked in the sketch below).
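A minimal sketch that fits a least-squares line to hypothetical data and checks the relations above: b = r * sy/sx for the unstandardized variables, and b = r with a = 0 after standardizing both.

```python
import numpy as np
from scipy import stats

# Hypothetical data
x = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10], dtype=float)
y = np.array([2.3, 2.9, 4.1, 4.5, 5.8, 6.1, 7.4, 7.9, 9.2, 9.6])

fit = stats.linregress(x, y)
r = fit.rvalue
print(f"a (intercept) = {fit.intercept:.3f}, b (slope) = {fit.slope:.3f}")

# b = r * sy / sx (unstandardized variables)
b_from_r = r * y.std(ddof=1) / x.std(ddof=1)
print(f"r * sy/sx = {b_from_r:.3f}  (matches the slope)")

# After standardizing both x and y: b = r and a = 0
zx = (x - x.mean()) / x.std(ddof=1)
zy = (y - y.mean()) / y.std(ddof=1)
fit_z = stats.linregress(zx, zy)
print(f"standardized: a = {fit_z.intercept:.3f}, b = {fit_z.slope:.3f}, r = {r:.3f}")
```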