Simple (linear) regression:
General:
Design: Between-subjects
Independent variables: 1 Quantitative (or Categorical)
Dependent variables: 1 Quantitative
Null hypothesis:
((H0 : β0 = 0)) (The null hypothesis for the intercept is generally not used.)
H0 : β1 = 0 (no effect of X) (This null hypothesis concerns the slope.)
Regression line:
^Y = b0 + b1X
Conditions for the optimally fitting line:
In a scatterplot you may draw an optimally fitting line through the cloud of points, but this only
works under the following conditions; so when drawing a regression line (= line of best fit), be wary of:
1. There should be no outliers.
Outliers can make a relationship seem either weaker or stronger than it actually is.
2. There should be no subgroups.
When there are subgroups, a third (unassessed) variable might be giving rise to confounding.
In case of subgroups, analyse them separately, or, when identified, use multiple regression.
>>Analysing the subgroups separately means that you study the simple effect of the
independent variable on the dependent variable within each separate group.
3. The relationship between X and Y should not be curvilinear. (You must be able to draw a proper
straight line through the scatterplot, not a bent curve.)
If the conditions are met:
-The trendline describes what the relationship looks like.
The statistics that define this line are the regression coefficients (b0 and b1).
-How closely the points follow the line tells you the strength of the relationship.
Expressed through Pearson’s correlation coefficient.
Regression:
Formula for regression line:
^Y = b0 + b1X (= Predicted Y)
Same as the formula used during Statistics I; ^Y = a + bX
This formula describes the course of the regression line (what it looks like), not the strength of the relationship.
Allows you to predict a value for a certain participant. (E.g. If ^Y = 1 + 2X, and your participant
drank 3 beers (X = 3) you predict that ^Y = 1+ 2*3 = 7 on whatever it is that you are measuring.)
>>You can then compare this predicted result to the actual result (E.g. Perhaps the participant
actually scored “9” rather than “7”) to determine your residuals.
-b0 and b1 are regression coefficients.
-b0 is the intercept: the value where the line (fictitiously) crosses the y-axis; it determines the
baseline height of the regression line.
At X = 0, ^Y = b0.
-b1 is the slope: How steep the line runs.
If b1 = 0, then ^Y = b0 for every X, and r = 0, as there would be no linear relationship.
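The beer example above can be sketched in Python; the values b0 = 1 and b1 = 2 are the hypothetical coefficients from that example:

```python
# Prediction with a simple regression line: Y_hat = b0 + b1 * X.
# b0 = 1 (intercept) and b1 = 2 (slope) are the hypothetical values
# from the beer example above.
def predict(x, b0=1.0, b1=2.0):
    return b0 + b1 * x

y_hat = predict(3)   # participant drank 3 beers (X = 3)
print(y_hat)         # 1 + 2*3 = 7.0
print(predict(0))    # at X = 0 the prediction equals the intercept: 1.0
```

Comparing the predicted 7 with an observed score of 9 gives a residual of 9 − 7 = 2.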
Residuals:
Yres = Y - ^Y (residual = actual Y minus predicted Y).
-Residuals are prediction errors; they show how far the actual Y-values deviate from the regression line.
These residuals show that other factors besides the independent variable influenced the
dependent variable.
The regression line is drawn in such a way that the sum of the squared residuals is as small as
possible = least squares criterion.
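A minimal sketch of the least-squares fit in Python, on made-up data; b1 = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)² and b0 = ȳ − b1·x̄ are the standard least-squares estimates:

```python
# Least-squares estimates for simple regression, on made-up data.
def fit_line(xs, ys):
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    # slope: b1 = sum((x - mean_x)(y - mean_y)) / sum((x - mean_x)^2)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    b1 = sxy / sxx
    b0 = my - b1 * mx          # intercept
    return b0, b1

xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 8.1, 9.8]
b0, b1 = fit_line(xs, ys)
residuals = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]
# Least squares criterion: no other line yields a smaller sum of squared residuals.
```

A side effect of the least-squares fit is that the residuals always sum to (essentially) zero.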
Notes on regression coefficients & SPSS output:
-The regression coefficients (b0, b1) are estimates based on your sample; they estimate the
corresponding regression coefficients of the population (β0, β1).
To draw actual conclusions about the population, you must perform a statistical test (for which
you check the coefficients table in SPSS).
-In the coefficients table:
Under “Unstandardized Coefficients; B” you see the two regression coefficients as observed in the
sample (b0, b1).
”Unstandardized Coefficients; Std. Error” displays the standard errors of b0 and b1, indicating
how far a b will deviate from its β on average if we keep drawing samples indefinitely.
>>The values displayed here in SPSS are thus estimates of the actual standard errors.
”Standardized Coefficients” displays the regression coefficients after X and Y were standardised to
z-scores.
”t” (t = b / sb) shows the results of t-tests (and “Sig.” the p-values) that assess the null hypotheses
H0: β0 = 0 and H0: β1 = 0.
>>The null hypothesis concerning the intercept generally does not matter: if it is true, the
regression line passes through the origin (0,0); otherwise, the line crosses the y-axis higher or
lower. Even if that null hypothesis were violated, in most studies this is not something you
particularly care about. (E.g. if you assessed how many ice creams someone eats per month as a
function of age, someone of 0 years old eats 0 ice creams, so the regression line would indeed
include (0,0); in that case b0 = 0 and ^Y = b1X.)
>>This corresponds to the Intercept term in two-way ANOVA.
>>If the second null hypothesis is true, the population regression line is horizontal (slope of
zero); the independent variable then has no effect on the dependent variable.
The confidence intervals indicate between which values the population coefficients plausibly fall.
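As a hand-computed sketch (not SPSS itself), the t-value for the slope follows from t = b1 / SE(b1), where SE(b1) is estimated from the residual variance with n − 2 degrees of freedom; the data below are made up:

```python
import math

# Sketch of the slope t-test reported in the SPSS coefficients table:
# t = b1 / SE(b1), tested against H0: beta1 = 0. Data are made up.
def slope_t(xs, ys):
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    b1 = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    b0 = my - b1 * mx
    # residual standard error, with n - 2 degrees of freedom
    sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
    s = math.sqrt(sse / (n - 2))
    se_b1 = s / math.sqrt(sxx)    # estimated standard error of b1
    return b1, se_b1, b1 / se_b1

xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 8.1, 9.8]
b1, se, t = slope_t(xs, ys)
```

A large |t| (compared against a t-distribution with n − 2 df) leads to rejecting H0: β1 = 0.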
Correlation:
r = linear correlation coefficient.
It indicates how closely individual participants cluster around the regression line.
It is a standardised quantity (computed from the z-scores of X and Y).
>>Always falls between [-1 ; 1].
-r expresses the predictive capacity of the optimal line through the point cloud.
The further r is from 0, the more accurate our predictions become.
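On the same made-up data as above, Pearson's r can be computed directly (in simple regression, the standardized slope coefficient equals this r):

```python
import math

# Pearson's correlation coefficient r on made-up data: how tightly the
# points cluster around the regression line. Always falls in [-1, 1].
def pearson_r(xs, ys):
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

r = pearson_r([1, 2, 3, 4, 5], [2.1, 3.9, 6.2, 8.1, 9.8])
# r close to 1: the points lie almost exactly on a rising straight line.
```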