Goodness of fit: R-squared = how well the model fits the data (proportion of variance in the
outcome variable that is explained by the predictors, between 0 and 1). A high value does not
mean the model is a good predictor for new observations.
Strongest predictor: highest standardized coefficient (largest absolute beta)
Residual sum of squares (SSr): how well the model fits the data, the smaller the better
R2 = proportion of improvement due to the model. Proportion of the variation in the
outcome that can be predicted from the model
F-test: a measure of how much the model has improved the prediction compared to the
level of inaccuracy of the model (F = MSm/MSr). Should be greater than 1 at least! Shows
how well the model can predict.
Sum of squares: how well the model fits the data
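t-test: tests whether a coefficient differs from zero. Its p-value is the probability of a
result at least this extreme if the null hypothesis is true (not the probability that the
null hypothesis itself is true)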
z-scores: standardized measure so we can compare different models
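Worked example for the sums of squares, R2 and F above. A minimal sketch with NumPy, assuming a single predictor and made-up data; the variable names (ss_total, ss_resid, ss_model) are illustrative, not SPSS output labels.

```python
import numpy as np

# Hypothetical toy data: one predictor x, one outcome y.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.1, 2.9, 4.2, 4.8, 6.1, 6.9])

# Design matrix with an intercept column, fitted by ordinary least squares.
X = np.column_stack([np.ones_like(x), x])
b, *_ = np.linalg.lstsq(X, y, rcond=None)
y_hat = X @ b

n, k = len(y), X.shape[1] - 1           # k = number of predictors
ss_total = np.sum((y - y.mean()) ** 2)  # SSt: error of the mean-only model
ss_resid = np.sum((y - y_hat) ** 2)     # SSr: error left after fitting
ss_model = ss_total - ss_resid          # SSm: improvement due to the model

r_squared = ss_model / ss_total                       # proportion of improvement
f_ratio = (ss_model / k) / (ss_resid / (n - k - 1))   # MSm / MSr
```

With one predictor, r_squared equals the squared correlation between x and y, which is a quick sanity check.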
Influential cases:
- Cook’s distance: overall influence of one case on the model. >1 is cause for concern!
- Leverage: the influence of the observed value of the outcome variable over the
predicted values (maximum value = (N-1)/N)
- Mahalanobis distances: measure the distance of cases from the mean(s) of the
predictor variable(s)
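The three influence measures above can be computed by hand. A sketch with NumPy on simulated data (the data and all variable names are assumptions for illustration). It uses the standard hat-matrix leverage; statistical packages may report a rescaled version.

```python
import numpy as np

rng = np.random.default_rng(0)          # hypothetical simulated data
n = 30
X_pred = rng.normal(size=(n, 2))        # two predictors
y = 1.0 + X_pred @ np.array([0.5, -0.3]) + rng.normal(scale=0.5, size=n)

X = np.column_stack([np.ones(n), X_pred])
p = X.shape[1]                          # parameters, including intercept

# Hat matrix H = X (X'X)^-1 X'; its diagonal holds the leverage values.
H = X @ np.linalg.inv(X.T @ X) @ X.T
leverage = np.diag(H)

resid = y - H @ y
mse = np.sum(resid ** 2) / (n - p)

# Cook's distance for each case.
cooks_d = (resid ** 2 / (p * mse)) * leverage / (1 - leverage) ** 2

# Squared Mahalanobis distance of each case from the predictor means.
centered = X_pred - X_pred.mean(axis=0)
cov_inv = np.linalg.inv(np.cov(X_pred, rowvar=False))
mahal_sq = np.einsum("ij,jk,ik->i", centered, cov_inv, centered)

flagged = np.where(cooks_d > 1)[0]      # cases exceeding the >1 rule of thumb
```

Leverage and Mahalanobis distance carry the same information about X space: here leverage = 1/n + mahal_sq/(n-1).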
Assumptions:
- Additivity and linearity: the relationship must follow a straight line
- Independent errors: residual terms should be uncorrelated (Durbin-Watson test)
- Homoscedasticity: at each level of the predictor variable, the variance of the
residual terms should be constant (the same error/residual spread at all values)
- Normally distributed errors: with a mean of zero
- No perfect multicollinearity: no perfect linear relationship between two or more
predictors
- Absence of outliers
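For the independent-errors assumption, the Durbin-Watson statistic is easy to compute from the residuals. A minimal sketch; the function name and the two residual series are made up for illustration.

```python
import numpy as np

def durbin_watson(residuals):
    """Durbin-Watson statistic: between 0 and 4, values near 2 suggest uncorrelated errors."""
    e = np.asarray(residuals, dtype=float)
    return np.sum(np.diff(e) ** 2) / np.sum(e ** 2)

# Hypothetical residuals, in the order the cases were collected.
alternating = np.array([1.0, -1.0, 1.0, -1.0, 1.0, -1.0])   # negative autocorrelation
trending = np.array([-3.0, -2.0, -1.0, 1.0, 2.0, 3.0])      # positive autocorrelation

dw_alternating = durbin_watson(alternating)   # above 2
dw_trending = durbin_watson(trending)         # below 2
```

Values well below 2 point to positively correlated adjacent residuals, values well above 2 to negatively correlated ones.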
Adjusted R2: how much variance in Y would be accounted for if the model had been derived
from the population from which the sample was taken
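One common way to compute adjusted R2 is the formula below. It is shown as an assumption, since these notes do not say which adjustment is meant. The numbers are made up.

```python
def adjusted_r_squared(r_squared, n, k):
    """Shrink R^2 for sample size n and number of predictors k."""
    return 1 - (1 - r_squared) * (n - 1) / (n - k - 1)

# Hypothetical values: R^2 = 0.40 from 50 cases and 3 predictors.
adj = adjusted_r_squared(0.40, n=50, k=3)
```

The adjusted value is always a bit lower than R2, and the gap grows with more predictors or fewer cases.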
R square: the explained variance in the model
Data splitting: a method that can be used to check how well the model generalizes
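Data splitting can be sketched as: fit the model on a random half of the cases, then compare R2 on that half with R2 on the other half. Simulated data and all names below are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)           # hypothetical simulated data
n = 200
x = rng.normal(size=n)
y = 2.0 + 1.5 * x + rng.normal(size=n)

# Randomly split the cases into two halves.
idx = rng.permutation(n)
train, holdout = idx[: n // 2], idx[n // 2 :]

def design(values):
    """Design matrix with an intercept column."""
    return np.column_stack([np.ones_like(values), values])

# Fit on the first half only.
b, *_ = np.linalg.lstsq(design(x[train]), y[train], rcond=None)

def r_squared(x_part, y_part):
    """R^2 of the fitted coefficients b on the given cases."""
    resid = y_part - design(x_part) @ b
    return 1 - np.sum(resid ** 2) / np.sum((y_part - y_part.mean()) ** 2)

r2_train = r_squared(x[train], y[train])
r2_holdout = r_squared(x[holdout], y[holdout])   # a large drop suggests poor generalization
```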
Absence of outliers:
- Standardized residual should be between -3.3 and 3.3
- Mahalanobis distance must be lower than 10 + 2 * number of independent variables
(outlier in X space)
- Cook’s distance must be lower than 1 (outlier in XY space)
Multicollinearity:
- In coefficients table last 2 columns
- Tolerance <.2: potential problem. <.1: problem
- VIF (variance inflation factor) = 1/tolerance. >10: problem
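Tolerance and VIF can be reproduced by regressing each predictor on the other predictors. A sketch with NumPy on simulated predictors; the function name and data are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2)            # hypothetical simulated predictors
n = 100
x1 = rng.normal(size=n)
x2 = 0.9 * x1 + 0.1 * rng.normal(size=n)  # nearly a copy of x1
x3 = rng.normal(size=n)                   # unrelated predictor
predictors = np.column_stack([x1, x2, x3])

def tolerance_and_vif(P):
    """Regress each predictor on the others; tolerance = 1 - R^2, VIF = 1 / tolerance."""
    out = []
    for j in range(P.shape[1]):
        target = P[:, j]
        others = np.column_stack([np.ones(len(P)), np.delete(P, j, axis=1)])
        b, *_ = np.linalg.lstsq(others, target, rcond=None)
        resid = target - others @ b
        tol = np.sum(resid ** 2) / np.sum((target - target.mean()) ** 2)
        out.append((tol, 1 / tol))
    return out

results = tolerance_and_vif(predictors)   # list of (tolerance, VIF) per predictor
```

Here x1 and x2 should come out far beyond the VIF > 10 threshold, while x3 stays close to 1.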
Normally distributed → check histogram
Homoscedasticity → check scatter plot
Regression coefficient for an interaction:
Y = b0 + b1*X1 + b2*X2 + b3*(X1*X2)
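The interaction model above can be fitted by adding the product X1*X2 as an extra column. A sketch with NumPy; the data are simulated with known coefficients (an assumption for illustration) so the estimates can be checked.

```python
import numpy as np

rng = np.random.default_rng(3)            # hypothetical simulated data
n = 500
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
# Outcome built with known coefficients b0=1, b1=2, b2=-1, b3=0.5 plus noise.
y = 1 + 2 * x1 - 1 * x2 + 0.5 * x1 * x2 + rng.normal(scale=0.1, size=n)

# The interaction term is simply the product of the two predictors.
X = np.column_stack([np.ones(n), x1, x2, x1 * x2])
b0, b1, b2, b3 = np.linalg.lstsq(X, y, rcond=None)[0]
```

b3 tells how much the effect of X1 on Y changes for each one-unit increase in X2.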
Bonferroni method: alpha / number of tests (comparisons)
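Small arithmetic example of the Bonferroni correction. The function name and p-values are made up.

```python
def bonferroni_alpha(alpha, n_tests):
    """Per-test significance level that keeps the familywise error rate at or below alpha."""
    return alpha / n_tests

# Hypothetical example: three pairwise comparisons at an overall alpha of .05.
per_test_alpha = bonferroni_alpha(0.05, 3)
p_values = [0.010, 0.030, 0.200]          # made-up p-values
significant = [p < per_test_alpha for p in p_values]
```

Note that p = .030 would count as significant at .05 but not at the corrected level of about .0167.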
Grand variance: total variance across all scores, regardless of group
Degrees of freedom (df), within groups: N-k (total sample size - number of groups)
F-test: only shows that the means differ overall, not which groups differ
Levene’s test must not be significant! It tests homogeneity of variance
Post-hoc: compare all group means with each other
ANOVA: used to examine linear models
- Interval/ratio level
ANCOVA: a covariate that influences the relationship is included in the model
Partial eta squared (partial η²): effect size. The proportion of variance that a variable
explains once variance explained by the other variables is set aside. Percentage of variation
that is explained by condition. Tells us about the strength of an effect, not its direction.
Explained variance → between variance
Residual variance → within variance
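A one-way ANOVA worked by hand ties the pieces above together: between (explained) and within (residual) sums of squares, df, F and the effect size. A sketch with NumPy on made-up scores for three groups.

```python
import numpy as np

# Hypothetical scores for three groups.
groups = [
    np.array([4.0, 5.0, 6.0]),
    np.array([6.0, 7.0, 8.0]),
    np.array([9.0, 10.0, 11.0]),
]
all_scores = np.concatenate(groups)
N, k = len(all_scores), len(groups)
grand_mean = all_scores.mean()

ss_total = np.sum((all_scores - grand_mean) ** 2)
ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)  # explained
ss_within = sum(np.sum((g - g.mean()) ** 2) for g in groups)             # residual

df_between, df_within = k - 1, N - k
f_ratio = (ss_between / df_between) / (ss_within / df_within)

# With a single factor, partial eta squared equals eta squared.
partial_eta_sq = ss_between / (ss_between + ss_within)
```

For these scores SS between = 38 and SS within = 6, so F = (38/2)/(6/6) = 19 and the effect size is 38/44, about .86.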
Follow-up testing:
- Post-hoc
- Planned comparisons through specific contrast-tests (Simple in SPSS)