Mann-Whitney/Wilcoxon rank sum test: 2 independent groups, non-parametric (= not normal), unequal variances.
H0: median(male) = median(female)
Kruskal-Wallis: 3+ groups, non-parametric, not normal, unequal variances.
H0: median(group1) = median(group2) = median(group3)
Wilcoxon signed rank test: 2 measurements of 1 group (e.g. the post-pre test difference), not normal, compares medians, non-parametric.
Option 1: H0: median(jan_2023) = median(jan_2022); option 2: H0: median(diff) = 0
ANOVA: 3+ groups, normal, equal variances. Focuses on means; e.g. examining the effect of 3 lessons on school performance.
H0: mean(group1) = mean(group2) = mean(group3)
Independent sample t-test: means of 2 independent groups, normal and equal variances. H0: mean(group1) = mean(group2)
One sample t-test: mean of 1 group compared to the population mean; normal distribution. H0: mean(sample) = mean(population)
Welch t-test: compares means of 2 independent groups, unequal variances. H0: mean(group1) = mean(group2)
Hypotheses – 1-sample t-test of differences: notations
Population proportion: π; sample proportion: p; population mean: μ (β1 for a slope)
H0: μ2 − μ1 = 0 (no change); HA: μ2 − μ1 ≠ 0 (change)
2-sided hypothesis: H0: μ = 100; HA: μ ≠ 100 (for a slope: H0: β1 = 0; HA: β1 ≠ 0; likewise H0: β2 = 0)
1-sided hypothesis: H0: μ ≥ 100; HA: μ < 100
Code for scatterplot: Healthdata %>% ggplot(aes(x = pred, y = res)) + geom_jitter() OR plot(model2, 1)
t-value (standardised effect of b): t = (b − β) / s.e., where s.e. = the SD of the sampling distribution
Standardised t-value for a mean: t = (x̄ − μ) / s.e.
Chi-square statistic: χ² = Σ (O − E)² / E
Standard error (mean): s.e. = s / √n; if you use s, use the t-distribution with n − 1 degrees of freedom
S.E. (proportion): √(p(1 − p) / n)
S.E. of r when ρ = 0: evaluated with the t-distribution
Effect size: horizontal % difference between 2 horizontal values (calculate the % first from the column totals)
Load data: dataset <- read.csv("data_scot.csv", sep = ",")
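A minimal sketch of how these tests are called in base R (the variables health, gender, group, grade, lesson and the columns jan_2022/jan_2023 are hypothetical, matching the hypotheses above):
wilcox.test(health ~ gender, data = dataset)                    # Mann-Whitney / rank sum
kruskal.test(health ~ group, data = dataset)                    # Kruskal-Wallis, 3+ groups
wilcox.test(dataset$jan_2023, dataset$jan_2022, paired = TRUE)  # Wilcoxon signed rank
summary(aov(grade ~ lesson, data = dataset))                    # ANOVA (lesson as factor)
t.test(health ~ gender, data = dataset, var.equal = TRUE)       # independent sample t-test
t.test(health ~ gender, data = dataset)                         # Welch (default var.equal = FALSE)
t.test(dataset$health, mu = 100)                                # one sample t-test, 2-sided
t.test(dataset$health, mu = 100, alternative = "less")          # one sample t-test, 1-sided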
t-value > 2: it is very unlikely these data come from a population where β is 0 (different groups) -> we reject the null hypothesis.
t-value < 2: it is very likely these data come from a population where there is no association between the variables (similar groups, β = 0).
The bigger t is, the smaller the p-value will be. t is calculated with: t = estimate / standard error. If the estimate is big -> t will be big -> the p-value small.
Levene's test tests variances: p > 0.05 = homoskedasticity in the residuals = equal spread.
Breusch-Pagan tests variances: p > 0.05 = homoskedasticity = equal variances = do not reject H0.
In general: p-value < 0.05 = reject H0 = violation of equal variances = heteroskedasticity; p-value > 0.05 = do not reject H0 = equal variances, homoskedasticity.
Shapiro-Wilk tests normality: p < 0.05 = reject H0 = not normal. W runs from 0 to 1; the closer to 1, the more normal.
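As a quick check of the t = estimate / standard error logic, a sketch with made-up numbers (the estimate 0.8, the s.e. 0.3 and the 100 degrees of freedom are hypothetical):
t <- 0.8 / 0.3              # t ≈ 2.67, so |t| > 2
2 * pt(-abs(t), df = 100)   # two-sided p-value ≈ 0.009 -> p < 0.05, reject H0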
"The p-value associated with X1 is smaller than the p-value associated with X2, because the estimate (see output) of X1 is larger (in absolute terms) than X2." = incorrect: t (and hence p) also depends on the standard error, so a larger estimate alone is not enough.
Looking at X1 and X2: if p is low (< 0.05), X1 is statistically significantly associated with the dependent variable; if p > 0.05, it is not associated with the dependent variable.
Ways to phrase a non-significant result for H0: median(male) = median(female):
-there is no difference in health between genders
-the medians between genders are not significantly different
-the medians between genders are similar
-we are 95% sure that NO significant health differences exist between males and females
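These phrasings correspond to a non-significant test; a sketch of that check in R (same hypothetical health/gender variables as above):
res <- wilcox.test(health ~ gender, data = dataset)
res$p.value > 0.05   # TRUE -> do not reject H0: medians not significantly different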
What does it mean that a case is influential? And why do we need to check whether they are present?
Means: it affects the estimates in the model.
Check: because it means that the estimates are strongly affected by only one or two data points.
What is the expected probability of smoking for someone who is 50 years old?
b0 = -2.87
b1 = -0.015
x = 50
logodds = b0 + b1*x
p = exp(logodds)/(1 + exp(logodds))
Log odds are a way of expressing a probability (p) as the logarithm of the ratio between success (p) and non-success (1 − p); p lies between 0 and 1.
By hand: log odds = b0 + b1*x; probability = e^(logodds) / (1 + e^(logodds)) = ....
Income = B2 + B3*Private − B4*Unemp + B5*Education − B6*Private*Education − B7*Unemp*Education
R studio – interaction/moderation
(1) Create dummies: data$x_dummy_1 <- ifelse(data$x == "public", 1, 0) (codes public as 1; the remaining category serves as the reference category)
(2) Use lm() to estimate the model parameters: data_sample %>% lm(income ~ private + unemp + educ + educ*private + educ*unemp, data = .) %>% summary() (check: are the t-values > 2 and the p-values < 0.05?)
(3) Add residuals: data_clean$residuals <- model$residuals
(4) Add predictions: data_clean$pred <- model$fitted.values
OR in one go: data <- data %>% add_predictions(model1) %>% add_residuals(model1)
Check residuals: data_clean %>% ggplot(aes(x = pred, y = residuals)) + geom_point()
Low vs high level of education (1): model_low_educ <- data_clean %>% filter(education == 0) %>% lm(support ~ campaign, data = .)
Create the interaction term: data_clean <- data_clean %>% mutate(interaction = campaign*education)
model_withinteraction <- data_clean %>% lm(support ~ campaign + education + interaction, data = .)
summary(model_withinteraction)
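Filled in, this gives (the coefficients come from the example above; plogis() is the base R shortcut for the same formula):
b0 <- -2.87
b1 <- -0.015
logodds <- b0 + b1 * 50             # = -3.62
exp(logodds) / (1 + exp(logodds))   # ≈ 0.026, about a 2.6% probability of smoking
plogis(logodds)                     # same result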
Unit 554 – Multiple regression and non-linearity
Studying improves your grade (non-linearity): the effect decreases when you study more.
Grade = β0 + β1 * HoursStud
β1 = β2 − β3 * HoursStud
Grade = β0 + (β2 − β3*HoursStud) * HoursStud = β0 + β2*HoursStud − β3*HoursStud^2
Non-linearity: (1) inspect via a scatterplot, (2) the residuals don't have equal variances -> solution: add a squared (^2) term.
For example (X = 75): Y = 20 + 0.6*X + 0.002*X^2 -> 20 + 0.6*75 + 0.002*(75*75) = 76.25
R studio – non-linearity
Scatterplot: data_sample %>% ggplot(aes(x = size, y = conflicts)) + geom_point() + geom_smooth(method = "lm")
Estimate model: model1 <- data_sample %>% lm(conflicts ~ size, data = .)
Add residuals: data_sample <- data_sample %>% add_residuals(model1) OR data_sample$resid2 <- model1$residuals
Detecting non-linearity: data_sample %>% ggplot(aes(x = size, y = resid2)) + geom_point() + geom_smooth()
Unit 560 – Non-normality of residuals & omitted variables (assumptions of normality & equal variance)
Errors (εi) are in the population; residuals (ei) are in the sample.
If a model is good the errors will be random. Deviations are problematic: (1) the mean of the residuals (zero) is affected by outliers/skew, (2) this mean is associated with b (the estimate), (3) we are less confident in the S.E. based on these means -> execute the steps below.
1. Visual histogram inspection.
2. QQ plot: shows the relationship between what you would expect to find (x-axis) under a normal distribution and what you observe (y-axis). It can answer whether the distribution is normal (a straight line); if the data deviate from normality, the line will display strong curvature. (The formal test for normality is the Shapiro-Wilk test.)
3. Shapiro-Wilk test (goodness-of-fit test): the chance of finding a W in a sample smaller than the critical value. It tests the hypothesis that the distribution of the data deviates from a comparable normal distribution. If p < 0.05, reject the null hypothesis: the data are not normally distributed. As the sample size increases, SW will reject the null hypothesis more easily. H0 = normal distribution.
R studio
Create a model, create & store the residuals (steps 3 and 4 above), then make a histogram: data %>% ggplot(aes(x = residuals)) + geom_histogram()
Transforming Y: model2 <- data %>% lm(log(punish + 4) ~ crime, data = .) (this isn't easy) OR the same query with sqrt instead of log; then data$residuals <- model2$residuals
QQ plot: data %>% ggplot(aes(sample = residuals)) + geom_qq() + geom_qq_line()
Shapiro-Wilk test (test normality): shapiro.test(data$residuals)
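A sketch of fitting the squared term directly in lm() (the grade/hours names are hypothetical; I() keeps ^2 as arithmetic inside the formula):
model_sq <- lm(grade ~ hours + I(hours^2), data = data_sample)  # quadratic model
summary(model_sq)
20 + 0.6*75 + 0.002*75^2   # the worked example above, by hand: 76.25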
Unit 561 – Heteroscedasticity – non-equal variances and interaction effects
Homoscedasticity (residual variance) = homogeneity (of variances) = equal variances.
Heteroscedasticity = heterogeneity (of variances) = unequal variances. It is bad because: (1) we only have one S.E. for the slope, and (2) the S.E. is used to evaluate the 'quality' of the slope / find p-values. It occurs because of (1) measurement error in Y (which is related to X) and (2) interaction effects.
Detect: make a graph with the predicted values on X and the residuals on Y; save pred and resid and create it.
R studio
Create the differences t2_t1, create dummies, estimate the lm with read_diff, add residuals.
Levene's (groups only): leveneTest(resid ~ as.factor(DV), data) OR use read_diff instead of resid: levenetest <- leveneTest(read_diff ~ treatment, data)
Breusch-Pagan (for lm): bptest <- bptest(model1); bptest
Unit 470
True positives (TP): 108 (correctly predicted 1s)
True negatives (TN): 84 (correctly predicted 0s)
False positives (FP): 60 (0s wrongly predicted as 1s)
False negatives (FN): 69 (1s wrongly predicted as 0s)
Prevalence: total number of 1s : total observations = (108+69):321 = 177:321
Accuracy: correctly predicted observations : total observations = (108+84):321
Specificity: true negatives : (true negatives + false positives) = 84:(84+60)
Sensitivity: true positives : (true positives + false negatives) = 108:(108+69)
Precision: true positives : (true positives + false positives) = 108:(108+60)
In R:
TP <- 108
TN <- 84
FP <- 60
FN <- 69
Total <- 321
prevalence <- (TP + FN) / Total
accuracy <- (TP + TN) / Total
specificity <- TN / (TN + FP)
sensitivity <- TP / (TP + FN)
precision <- TP / (TP + FP)
list(prevalence = prevalence, accuracy = accuracy, specificity = specificity, sensitivity = sensitivity, precision = precision)
Unit 563 – Outliers, Influential cases, and Multicollinearity
Residual: the extent to which a data point is away from the estimated line.
Leverage: an outlier on the x (IV), say beyond +/- 2 S.E.; how much the observation's value on the predictor variable differs from the mean of the predictor variable. Look at whether it is different from the rest.
Influence: the extent to which the slope of the line is affected by the data point. Determined by residual and leverage; when both are high, you have influence.
Cook's distance in R studio (unit 563):
1st: Create a model, for example: model <- lm(y ~ x, data = data563)
2nd: Plot the graph using one of the following: plot(model) # then hit enter to step through the diagnostic plots, OR plot(model, 4) for the Cook's distance plot directly.
3rd (optional): Add all the Cook's distances to the dataset so you can find the influential case directly in the table: data563$cd <- cooks.distance(model). This can help to identify other information about this case.
A case is influential if:
-high leverage: far out on the x-axis
-high residual: far out on the y-axis
-high impact: removing/including it would change the slope & estimate
If a case has high leverage, residual and impact, Cook's distance will be > 0.5 and may even exceed 1.
Two curved lines = a quadratic relationship (not linear).
Do not square the dummy variable!
Always start with b0 (the intercept).
Interaction = differences in effects between groups; addition = each effect is considered separately, e.g. lm(health ~ bmi + smoking)
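A minimal sketch contrasting that additive model with an interaction (moderation) model (the dataset name is hypothetical):
model_add <- lm(health ~ bmi + smoking, data = dataset)   # addition: each effect separate
model_int <- lm(health ~ bmi * smoking, data = dataset)   # expands to bmi + smoking + bmi:smoking
summary(model_int)                                        # the bmi:smoking row tests the moderation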