Bivariate regression
Bivariate regression is a technique that analyses the linear relationship between
two variables: a dependent variable (y) and an independent variable (x). It uses a
line to summarise the relationship, predicting the value of the dependent variable
from the independent variable.
Where is bivariate regression used
The purpose is to predict one quantitative variable (y) from another
quantitative variable (x).
Example: predict salary (y) from years of education (x)
The variable (x) represents the predictor (independent variable), while the
variable (y) is the outcome (dependent variable)
The relationship between X and Y must be linear, and Y must be
quantitative, not categorical or dichotomous.
Regression is used to add direction to relationships, showing how one variable
affects the other, while correlation only tells how strongly two things are related to each other.
There are two types of regression:
Raw-score regression → uses raw, untransformed data to build a
predictive model. It’s also called the unstandardised regression equation,
which predicts a raw score for the dependent variable based on a raw score
for the independent variable.
This uses the original units, such as dollars, kilos etc.
The formula for a raw-score regression is Y = a + bX, where Y is the
predicted dependent variable, X is the independent variable, a is
the y-intercept, and b is the slope.
Standardised regression → a method that uses standardised variables (z-
scores, giving a unit-free comparison) instead of raw data, allowing
comparison of the relative importance of different independent variables
that have different units or scales.
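As a sketch of both versions, using made-up salary/education numbers (hypothetical data, not from the source), the raw-score slope and intercept can be computed by hand, and after converting both variables to z-scores the standardised slope equals the correlation r:

```python
# Sketch with hypothetical data: predict salary (y, in thousands) from
# years of education (x).
xs = [10, 12, 14, 16, 18]
ys = [30, 38, 41, 50, 56]

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n

# Raw-score slope and intercept (least squares)
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
sxx = sum((x - mean_x) ** 2 for x in xs)
b = sxy / sxx              # slope in original units (thousands per year)
a = mean_y - b * mean_x    # intercept: predicted y when x = 0

# Standardised slope: convert both variables to z-scores first
sd_x = (sxx / n) ** 0.5
sd_y = (sum((y - mean_y) ** 2 for y in ys) / n) ** 0.5
zx = [(x - mean_x) / sd_x for x in xs]
zy = [(y - mean_y) / sd_y for y in ys]
beta = sum(u * v for u, v in zip(zx, zy)) / sum(z ** 2 for z in zx)
# In simple (bivariate) regression, beta equals Pearson's r
```

The raw-score coefficients stay in the original units (thousands of dollars per year of education), while beta is unit-free.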
Difference from correlation
Correlation (r) is symmetric → X ↔ Y are interchangeable
Regression is directional → it predicts Y from X only
On a graph → X is the horizontal axis and Y is the vertical axis
New information is provided by regression
Regression gives more information than correlation. It provides the prediction
equation → Ŷ = b0 + b1X
b0 → intercept: Y when X = 0
b1 → slope: change in Y for each 1-unit increase in X
This allows predicting specific Y values
Introducing prediction errors (residuals) → the difference between the actual
observed value Y and the predicted value Ŷ. Residuals measure a model’s
inaccuracy, indicating how far off a model’s forecast is from reality. A residual can
be positive (underestimation) or negative (overestimation)
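A minimal sketch of residuals, using a hypothetical fitted line and made-up observations (none of these numbers come from the notes):

```python
# Hypothetical fitted line: Y-hat = a + b*X (assumed coefficients)
a, b = 2.0, 0.5
xs = [1, 2, 3, 4]
ys = [2.7, 3.1, 3.2, 4.4]  # made-up observed values

y_hat = [a + b * x for x in xs]                 # predicted values
residuals = [y - p for y, p in zip(ys, y_hat)]  # actual minus predicted
# Positive residual → the model underestimated the observation;
# negative residual → the model overestimated it.
```

Here the predictions are 2.5, 3.0, 3.5, 4.0, so the first two residuals are positive (underestimation) and the third is negative (overestimation).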
Regression equations and lines
A regression line is the line of best fit that summarises how Y changes with X,
passing through the ‘centre’ of the data points on a scatterplot.
The purpose is to minimise the overall prediction errors. The method for doing so
is least squares, which finds the best-fitting line by making the sum
of squared errors as small as possible.
The equation for this is: Ŷ = a + bX
Ŷ = predicted value of Y
a = intercept → predicted Y when X = 0
b = slope → expected change in Y for each 1-unit change in X
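The least-squares coefficients have closed forms, b = Σ(x−x̄)(y−ȳ) / Σ(x−x̄)² and a = ȳ − b·x̄, and the "smallest sum of squared errors" claim can be checked directly: nudging the fitted line in any direction only increases the error. A sketch with hypothetical data:

```python
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 6]   # hypothetical observations

n = len(xs)
mx, my = sum(xs) / n, sum(ys) / n

# Least-squares slope and intercept
b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
a = my - b * mx        # intercept: predicted Y when X = 0

def sse(intercept, slope):
    """Sum of squared errors for a candidate line."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))

best = sse(a, b)
# The least-squares line beats nearby perturbed lines
assert best <= sse(a + 0.1, b) and best <= sse(a, b - 0.1)
```

For this data the fit is Ŷ = 1.8 + 0.8X, with a minimum sum of squared errors of 2.4.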
Versions of regression equation
The regression line summarises how Y depends on X; the raw-score version predicts
actual scores, while the standardised version shows the relative strength of association (equal to r
in simple regression)