#classnotes
Subject: MATH307 MATH340 #STAT306 PHYS119
Topic:: Multiple linear regression
Slide 7 - Linear models: vector and matrix formulation
We can write the linear model in matrix form:
Y = Xβ + ϵ
where
The first column of X is a column of 1s; it corresponds to the intercept parameter β₀
X is called the design matrix
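A minimal sketch of how the design matrix is assembled, assuming the covariates are given as plain lists of numbers. The helper name `design_matrix` and the data are made up for illustration:

```python
def design_matrix(covariates):
    """Build X by putting a leading 1 (the intercept column) in front of each row."""
    return [[1.0] + [float(v) for v in row] for row in covariates]

# Made-up data: n = 4 observations, p = 2 covariates
rows = [[2.0, 5.0], [3.0, 1.0], [4.0, 4.0], [6.0, 2.0]]
X = design_matrix(rows)

n = len(X)       # number of observations (rows of X)
k = len(X[0])    # number of columns of X, i.e. p + 1 because of the intercept
```

Here X has n = 4 rows and p + 1 = 3 columns, and every row starts with 1.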
Normal equations in matrix form
Normal equations:
In matrix form:
XᵀXβ̂ = XᵀY
With a single covariate (simple linear regression), XᵀX is not invertible when its determinant is zero: n ∑ xᵢ² − (∑ xᵢ)² = n ∑ (xᵢ − x̄)² = 0
-> x1 = x2 = ⋯ = xn
-> if all the covariate values in our data are the same, then it’s not possible to observe a trend
between the covariate and the response.
-> in general, XᵀX is invertible when the design matrix X, with dimensions n × (p + 1), has rank(X) = p + 1,
where p is the number of covariates; that is, the columns of X are linearly independent, which requires n ≥ p + 1
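The invertibility condition for one covariate can be checked numerically. This is only a sketch on made-up values; the function names `det_xtx` and `n_times_sxx` are hypothetical:

```python
def det_xtx(x):
    """Determinant of X^T X when X has a column of 1s and one covariate column x."""
    n = len(x)
    return n * sum(v * v for v in x) - sum(x) ** 2

def n_times_sxx(x):
    """n * sum((x_i - x_bar)^2), which should agree with det_xtx(x)."""
    n = len(x)
    x_bar = sum(x) / n
    return n * sum((v - x_bar) ** 2 for v in x)

varied = [1.0, 2.0, 4.0, 7.0]      # made-up covariate values that differ
constant = [3.0, 3.0, 3.0, 3.0]    # all covariate values equal

det_varied = det_xtx(varied)       # 84.0, so X^T X is invertible
det_constant = det_xtx(constant)   # 0.0, so X^T X is not invertible
```

Both functions give 84 for the varied values, and the determinant is 0 as soon as all xᵢ are equal.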
MLR model:
in matrix form
Y = Xβ + ϵ (this is a linear model)
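where Y is the n × 1 response vector, X is the n × (p + 1) design matrix, β is the (p + 1) × 1 vector of coefficients, and ϵ is the n × 1 vector of errors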
The residual vector is given by Y − Xb, so RSS(b) = (Y − Xb)ᵀ(Y − Xb)
We get that the minimum is attained at b = β̂, where XᵀXβ̂ = XᵀY
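As a rough illustration, the normal equations can be solved directly for a tiny data set. This sketch uses only the standard library; the helpers `transpose`, `matmul` and `solve` and the data are made up here, and in practice a linear algebra library would be the usual choice:

```python
def transpose(A):
    return [list(col) for col in zip(*A)]

def matmul(A, B):
    Bt = transpose(B)
    return [[sum(a * b for a, b in zip(row, col)) for col in Bt] for row in A]

def solve(A, b):
    """Solve A z = b by Gauss-Jordan elimination with partial pivoting."""
    m = len(A)
    M = [list(A[i]) + [b[i]] for i in range(m)]   # augmented matrix [A | b]
    for c in range(m):
        piv = max(range(c, m), key=lambda r: abs(M[r][c]))
        if abs(M[piv][c]) < 1e-12:
            raise ValueError("X^T X is singular (columns of X are dependent)")
        M[c], M[piv] = M[piv], M[c]
        for r in range(m):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [vr - f * vc for vr, vc in zip(M[r], M[c])]
    return [M[i][m] / M[i][i] for i in range(m)]

# Made-up data: intercept column plus one covariate
X = [[1.0, 1.0], [1.0, 2.0], [1.0, 3.0], [1.0, 4.0]]
Y = [2.0, 3.0, 5.0, 6.0]

Xt = transpose(X)
XtX = matmul(Xt, X)                                        # [[4, 10], [10, 30]]
XtY = [sum(x * y for x, y in zip(col, Y)) for col in Xt]   # [16, 47]
beta_hat = solve(XtX, XtY)                                 # approximately [0.5, 1.4]

fitted = [sum(b * x for b, x in zip(beta_hat, row)) for row in X]
residuals = [y - f for y, f in zip(Y, fitted)]             # Y - X beta_hat
rss = sum(e * e for e in residuals)                        # approximately 0.2
```

At the minimizer the residual vector is orthogonal to every column of X, which is just the normal equations rewritten as Xᵀ(Y − Xβ̂) = 0.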
We have that RSSₚ₌₂ ≥ RSSₚ₌₃, where p is the number of covariates.
Adding a new covariate: the regression model with p+1 covariates includes all the
previous covariates, so it can reproduce the fit from the model with p covariates by simply
setting the coefficient of the new covariate (the (p+1)th) to zero.
However, in minimizing RSS, the model can take advantage of the new covariate, possibly
further reducing the residual.
Thus RSSₚ ≥ RSSₚ₊₁: the fit cannot get worse, and it may improve.
Since R² = 1 − RSS/TSS, and since TSS does not depend on the covariates while RSS cannot
increase when we add a new covariate, R² cannot decrease (and it typically increases).
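The smallest nested pair makes this concrete: the intercept-only model (p = 0) against simple linear regression (p = 1). The sketch below uses the standard closed form RSS = TSS − Sxy²/Sxx for one covariate; the function names and data are made up for illustration:

```python
def rss_intercept_only(y):
    """RSS of the model with no covariates (p = 0); this is the same as TSS."""
    y_bar = sum(y) / len(y)
    return sum((v - y_bar) ** 2 for v in y)

def rss_one_covariate(x, y):
    """RSS of simple linear regression (p = 1), computed as TSS - Sxy^2 / Sxx."""
    n = len(y)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((a - x_bar) ** 2 for a in x)
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    return rss_intercept_only(y) - sxy ** 2 / sxx

# Made-up data
x = [1.0, 2.0, 3.0, 4.0]
y = [2.0, 3.0, 5.0, 6.0]

tss = rss_intercept_only(y)      # 10.0
rss0 = tss                       # p = 0
rss1 = rss_one_covariate(x, y)   # p = 1, approximately 0.2

r2_p0 = 1 - rss0 / tss           # 0.0
r2_p1 = 1 - rss1 / tss           # approximately 0.98
```

Because Sxy²/Sxx is never negative, RSS for p = 1 can never exceed RSS for p = 0, and R² moves in the opposite direction.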
Since σ̂² = RSS/(n − k) with k = p + 1: when a covariate is added, RSS decreases and n − k also
decreases (n − k is now n − p − 1), so both the numerator and the denominator shrink;
therefore σ̂² can increase or decrease.
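A small numeric sketch of why the direction is not determined. The RSS values below are hypothetical numbers chosen for illustration, not results from any data set, and `sigma2_hat` is a made-up helper name:

```python
def sigma2_hat(rss, n, k):
    """Estimated error variance RSS / (n - k), where k = p + 1."""
    if n <= k:
        raise ValueError("need n > k")
    return rss / (n - k)

n = 12
base = sigma2_hat(10.0, n, 2)         # p = 1: 10.0 / 10 = 1.0
small_gain = sigma2_hat(9.5, n, 3)    # p = 2, RSS barely drops: 9.5 / 9 > 1.0
large_gain = sigma2_hat(8.1, n, 3)    # p = 2, RSS drops a lot: 8.1 / 9 < 1.0
```

If the new covariate reduces RSS only slightly, losing one degree of freedom dominates and σ̂² goes up; if the reduction is large, σ̂² goes down.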
To account for the number of predictors in the model, we can use the adjusted R². It adjusts for
the number of predictors in a regression model, penalizing the model when unnecessary
variables are added.
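A sketch of the penalty, assuming the usual textbook definition adjusted R² = 1 − (1 − R²)(n − 1)/(n − p − 1), which these notes do not state explicitly. The R² values are hypothetical and `adjusted_r2` is a made-up helper name:

```python
def adjusted_r2(r2, n, p):
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - p - 1), p = number of covariates."""
    if n - p - 1 <= 0:
        raise ValueError("need n > p + 1")
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)

n = 12
adj_before = adjusted_r2(0.800, n, 2)   # p = 2, approximately 0.756
adj_after = adjusted_r2(0.805, n, 3)    # p = 3, approximately 0.732
```

R² rises from 0.800 to 0.805 with the extra covariate, but the adjusted value falls, which signals that the new variable is not earning its place in the model.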