Cheatsheet test 1 – Inferential Statistics, pre-master Psychology, Universiteit Twente
Written in 2024/2025, 3 pages, uploaded 23-04-2026
Mann-Whitney / Wilcoxon rank-sum test: 2 independent groups, non-parametric (not normal, unequal variances).
H0: median(male) = median(female)
Kruskal-Wallis: 3+ groups, non-parametric (not normal, unequal variances).
H0: median(group1) = median(group2) = median(group3)
Wilcoxon signed-rank test: 2 measurements of 1 group (e.g. post-pre test difference), not normal, compares medians, non-parametric.
Option 1: H0: median(jan_2023) = median(jan_2022); option 2: H0: median(diff) = 0
ANOVA: 3+ groups, normal, equal variances. Focused on means, e.g. examining the effect of 3 lessons on school performance.
H0: mean(group1) = mean(group2) = mean(group3)
Independent-samples t-test: means of 2 independent groups, normal, equal variances.
H0: mean(group1) = mean(group2)
One-sample t-test: mean of 1 group compared to a population mean; normal distribution.
H0: mean(sample) = mean(population)
Welch t-test: compares means of 2 independent groups, unequal variances.
H0: mean(group1) = mean(group2)
One-sample t-test of differences: H0: μ2 − μ1 = 0 (no change); HA: μ2 − μ1 ≠ 0 (change)

Notation: population proportion: π; sample proportion: p; population mean: μ; population slope: β1 (sample estimate: b1).
Two-sided hypothesis: H0: μ = 100; HA: μ ≠ 100 (or H0: β1 = 0; HA: β1 ≠ 0)
One-sided hypothesis: H0: μ ≥ 100; HA: μ < 100
t-value (standardised effect of b): t = (b − β) / s.e., where β is the value under H0 (usually 0).
Standardised t-value (mean): t = (x̄ − μ) / s.e.
Standard error (mean): s.e. = s / √n, the SD of the sampling distribution. If you use the sample SD (computed with n − 1) instead of σ, use the t-distribution.
Chi-square statistic: χ² = Σ (observed − expected)² / expected.
Effect size (chi-square table): horizontal % difference between 2 values (calculate the % first from the totals in each column).
Code for scatterplot: Healthdata %>% ggplot(aes(x = pred, y = res)) + geom_jitter(), or: plot(model2, 1)
Load data: dataset <- read.csv("data_scot.csv", sep = ",")
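As a quick check of the formulas above, here is a minimal Python sketch (the sample values n = 16, x̄ = 104, s = 8 and H0: μ = 100 are made up for illustration) computing the standard error and the one-sample t-statistic:

```python
import math

# Hypothetical sample: n = 16 observations, sample mean 104, sample SD 8,
# testing H0: mu = 100.
n, x_bar, s, mu0 = 16, 104.0, 8.0, 100.0

se = s / math.sqrt(n)     # standard error of the mean = SD of the sampling distribution
t = (x_bar - mu0) / se    # standardised t-value, compare to a t-distribution with n - 1 df

print(se)  # 2.0
print(t)   # 2.0
```

With these numbers t sits exactly at the rule-of-thumb boundary of 2; a larger sample or a bigger difference from μ0 would push it further out.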

* t-value > 2: it is very unlikely that these data come from a population where β is 0 (different groups) → we reject the null hypothesis.
* t-value < 2: it is very likely that these data come from a population where there is no association between the variables (similar groups, β = 0).
The bigger t is, the smaller the p-value will be. t is calculated as t = estimate / standard error; if the estimate is big → t will be big → p-value small.
"The p-value associated with X1 is smaller than the p-value associated with X2, because the estimate (see output) of X1 is larger (in absolute terms) than that of X2" = incorrect: the p-value depends on the ratio of the estimate to its standard error, not on the estimate alone.
Looking at X1 and X2: if p is low (< 0.05), X1 is statistically significantly associated with the dependent variable; if p > 0.05, it is not associated with the dependent variable.

Levene's test measures variances (groups): p > 0.05 = homoskedasticity in the residuals = equal spread = equal variances.
Breusch-Pagan test measures variances: p > 0.05 = homoskedasticity = equal variances = do not reject H0.
In general: p < 0.05 = reject H0 = violation of equal variances = heteroskedasticity; p > 0.05 = do not reject H0 = equal variances = homoskedasticity.
Shapiro-Wilk measures normality: p < 0.05 = reject H0 = not normal. W lies between 0 and 1; the closer to 1, the more normal.

Possible conclusions for a non-significant test of health by gender:
- there is no difference in health between genders
- the medians between genders are not significantly different
- the medians between genders are similar
- "we are 95% sure that no significant health differences exist between males and females"
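The t = estimate / standard error rule of thumb above can be illustrated with a few lines of Python (the regression output b = 0.84, s.e. = 0.21 is made up):

```python
def t_value(estimate, se):
    # standardised t-value of a regression coefficient
    return estimate / se

# Hypothetical output: b = 0.84, s.e. = 0.21
t = t_value(0.84, 0.21)
print(round(t, 2))  # 4.0
print(t > 2)        # True: very unlikely under beta = 0, so reject H0
```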
What does it mean that a case is influential? And why do we need to check whether they are present?
Means = it affects the estimates in the model.
Check = because it means that the estimates are strongly affected by only one or two data points.

What is the expected probability of smoking for someone who is 50 years old?
b0 = -2.87
b1 = -0.015
x = 50
logodds = b0 + b1*x
p = exp(logodds) / (1 + exp(logodds))
Log-odds are a way to express a probability (p) as the logarithm of the ratio between success (p) and non-success (1 − p); p lies between 0 and 1.
By hand: log odds = b0 + b1*x → probability = e^(logodds) / (1 + e^(logodds)) = ….

R studio – interaction/moderation
Model: Income = B2 + B3*Private − B4*Unemp + B5*Education − B6*Private*Education − B7*Unemp*Education
(1) Create dummies: data$x_dummy_1 <- ifelse(data$x == "public", 1, 0)  # select public as reference category
(2) Use lm() to estimate model parameters: data_sample %>% lm(income ~ private + unemp + educ + educ*private + educ*unemp, data = .) %>% summary() → check: t-values > 2 and p-values < 0.05?
(3) Add residuals: data_clean$residuals <- model$residuals
(4) Add predictions: data_clean$pred <- model$fitted.values
OR in this way: data <- data %>% add_predictions(model1) %>% add_residuals(model1)
Check residuals: data_clean %>% ggplot(aes(x = pred, y = residuals)) + geom_point()
Low vs high level of education: model_low_educ <- data_clean %>% filter(education == 0) %>% lm(support ~ campaign, data = .)
Create interaction term: data_clean <- data_clean %>% mutate(interaction = campaign*education)
model_withinteraction <- data_clean %>% lm(support ~ campaign + education + interaction, data = .)
summary(model_withinteraction)
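The smoking probability example above can be worked through in a short Python sketch (the coefficients b0 = -2.87 and b1 = -0.015 are taken from the text):

```python
import math

b0, b1, x = -2.87, -0.015, 50

logodds = b0 + b1 * x                             # -2.87 + (-0.015 * 50) = -3.62
p = math.exp(logodds) / (1 + math.exp(logodds))   # inverse logit turns log-odds into a probability

print(round(logodds, 2))  # -3.62
print(round(p, 3))        # 0.026: expected probability of smoking at age 50
```

The negative log-odds translate to a probability well below 0.5, as expected when both coefficients pull downward.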

Unit 554 – Multiple regression and non-linearity
Studying improves your grade (non-linearity): the effect decreases when you study more.
Grade = β0 + β1 * HoursStud
β1 = β2 − β3 * HoursStud
Grade = β0 + (β2 − β3*HoursStud) * HoursStud = β0 + β2*HoursStud − β3*HoursStud²
Non-linearity: (1) inspect via scatterplot, (2) residuals don't have equal variances → solution: add a squared (^2) term.
For example (X = 75): Y = 20 + 0.6*X + 0.002*X² → 20 + 0.6*75 + 0.002*(75*75) = 76.25

R studio – non-linearity
Scatterplot: data_sample %>% ggplot(aes(x = size, y = conflicts)) + geom_point() + geom_smooth(method = "lm")
Estimate model: model1 <- data_sample %>% lm(conflicts ~ size, data = .)
Add residuals: data_sample <- data_sample %>% add_residuals(model1) OR data_sample$resid2 <- model1$residuals
Detecting non-linearity: data_sample %>% ggplot(aes(x = size, y = resid2)) + geom_point() + geom_smooth()

Unit 560 – Non-normality of residuals & omitted variables (assumptions of normality & equal variance)
Errors (εᵢ) are in the population; residuals (eᵢ) are in the sample.
If a model is good, the errors will be random. Deviations are problematic because: (1) the mean of the residuals (zero) is affected by outliers/skew, (2) this mean is associated with b (the estimate), (3) we are less confident in the S.E. based on these means → execute the steps below.
1. Visual histogram inspection.
2. QQ plots (range in variable): show the relationship between what you expect to find (x-axis) when the distribution is normal and what you observe (y-axis). Can answer whether the distribution is normal (straight line). If the data deviate from normality, the line will display strong curvature. (Formal test for normality: Shapiro-Wilk test.)
3. Shapiro-Wilk test (goodness of fit): the chance of finding a W in a sample smaller than the critical value. Tests whether the distribution of the data deviates from a comparable normal distribution (H0 = normal distribution). If p < 0.05 → reject the null hypothesis → data are not normally distributed. When sample size increases, SW will lead to a greater probability of rejecting the null hypothesis.

R studio
Create a model → create & store residuals (steps 3 and 4 above) → histogram (x = residuals)
Transforming Y: model2 <- data %>% lm(log(punish + 4) ~ crime, data = .) (this isn't easy); data$residuals <- model2$residuals, OR the same query but with sqrt instead of log.
QQ plot: data %>% ggplot(aes(sample = residuals)) + geom_qq() + geom_qq_line()
Shapiro-Wilk test (test normality): shapiro.test(data$residuals)
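The worked quadratic prediction from Unit 554 (X = 75) can be verified with a few lines of Python:

```python
# Quadratic model from the example: Y = 20 + 0.6*X + 0.002*X^2
def predict(x):
    return 20 + 0.6 * x + 0.002 * x ** 2

print(predict(75))  # 76.25, i.e. 20 + 45 + 11.25
```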
Unit 561 – Heteroscedasticity – non-equal variances and interaction effects
Homoscedasticity (residual variance) = homogeneity (of variances) = equal variances.
Heteroscedasticity = heterogeneity (of variances) = unequal variances → this is bad because: (1) we only have one S.E. for the slope, and (2) the S.E. is used to evaluate the 'quality' of the slope and to find p-values.
It occurs because of (1) measurement error in Y (which is related to X) and (2) interaction effects.
→ Detect it by plotting X (predicted values) against Y (residuals): save pred and resid and create the graph.
R studio
Create the differences t2_t1, create dummies, estimate lm with read_diff, add residuals →
Levene's (groups only): leveneTest(resid ~ as.factor(group), data), OR instead of resid use read_diff:
levenetest <- leveneTest(read_diff ~ treatment, data)
Breusch-Pagan (for lm): bptest <- bptest(model1) → bptest

Unit 563 – Outliers, influential cases, and multicollinearity
Residual: the extent to which a data point lies away from the estimated line.
Leverage: an outlier on the x (IV), say beyond ±2 S.E.: how much the observation's value on the predictor variable differs from the mean of the predictor variable. Look at whether it is different from the rest.
Influence: the extent to which the slope of the line is affected by the data point. Determined by residual and leverage: when both are high, the point has influence.
Cook's distance in R studio (unit 563):
1st: Create a model, for example: model <- lm(y ~ x, data = data563)
2nd: Plot the graph using one of the following lines of code:
plot(model)  # press Enter to cycle through the diagnostic plots
plot(model, 4)  # shows the Cook's distance plot directly
3rd (optional): Add the Cook's distances to the dataset so you can find the influential case directly in the table: data563$cd <- cooks.distance(model). This can help to identify other information about this case.
A case is influential if it has:
- high leverage: far out on the x-axis
- high residual: far out on the y-axis
- high impact: removing/including it would change the slope & estimate
If a case has high leverage, residual and impact, Cook's distance will be > 0.5 and can even exceed 1.

Unit 470 – Confusion matrix
True positives (TP): 108 (correctly predicted 1's)
True negatives (TN): 84 (correctly predicted 0's)
False positives (FP): 60 (0's wrongly predicted as 1's)
False negatives (FN): 69 (1's wrongly predicted as 0's)
Prevalence: total number of 1's : total observations = (108 + 69) : 321
Accuracy: correctly predicted observations : total observations = (108 + 84) : 321
Specificity: true negatives : (true negatives + false positives) = 84 : (84 + 60)
Sensitivity: true positives : (true positives + false negatives) = 108 : (108 + 69)
Precision: true positives : (true positives + false positives) = 108 : (108 + 60)
In R:
TP <- 108
TN <- 84
FP <- 60
FN <- 69
Total <- 321
prevalence <- (TP + FN) / Total
accuracy <- (TP + TN) / Total
specificity <- TN / (TN + FP)
sensitivity <- TP / (TP + FN)
precision <- TP / (TP + FP)
list(prevalence = prevalence, accuracy = accuracy, specificity = specificity, sensitivity = sensitivity, precision = precision)

Notes:
- 2 curved lines = a quadratic relationship (not linear)
- for the dummy variable, no squared term!
- always start with b0 (the intercept)
- interaction = differences between groups/effects; addition = each effect is considered separately, e.g. lm(health ~ bmi + smoking)
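The same confusion-matrix metrics as in the R block above, mirrored in Python with the same counts (TP = 108, TN = 84, FP = 60, FN = 69):

```python
tp, tn, fp, fn = 108, 84, 60, 69
total = tp + tn + fp + fn      # 321 observations in total

prevalence = (tp + fn) / total   # share of actual 1's
accuracy = (tp + tn) / total     # share of correct predictions
specificity = tn / (tn + fp)     # correct 0's among actual 0's
sensitivity = tp / (tp + fn)     # correct 1's among actual 1's (recall)
precision = tp / (tp + fp)       # correct 1's among predicted 1's

for name, value in [("prevalence", prevalence), ("accuracy", accuracy),
                    ("specificity", specificity), ("sensitivity", sensitivity),
                    ("precision", precision)]:
    print(f"{name}: {value:.3f}")
```

Running this gives accuracy ≈ 0.598, specificity ≈ 0.583, sensitivity ≈ 0.610 and precision ≈ 0.643, matching what the R list() call would return.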
