EECS 127 Final Cheat Sheet

1 Linear Algebra

1.1 Vector Spaces
Subspace: closed under +, scalar ×, contains 0
Affine set: {x0 + v : v ∈ S}, S subspace
Range: R(A) = {Ax : x ∈ Rⁿ}
Nullspace: N(A) = {x : Ax = 0}
Rank: rank(A) = dim(R(A))
Fundamental Thm: N(A) ⊕ R(A⊤) = Rⁿ, so dim(R(A)) + dim(N(A)) = n

1.2 Norms
ℓp: ∥x∥p = (Σ_i |xi|^p)^(1/p)
∥x∥1 = Σ_i |xi| (sparsity promoting)
∥x∥2 = sqrt(Σ_i xi²) = sqrt(x⊤x)
∥x∥∞ = max_i |xi|
Matrix norms:
Frobenius: ∥A∥F = sqrt(Σ_ij aij²) = sqrt(tr(A⊤A)) = sqrt(Σ_i σi²)
Spectral: ∥A∥2 = σmax(A) = σ1
Nuclear: ∥A∥* = Σ_i σi (trace norm)
Inequalities: ∥x∥∞ ≤ ∥x∥2 ≤ ∥x∥1, ∥x∥2 ≤ √n ∥x∥∞, ∥x∥1 ≤ √n ∥x∥2
Triangle: ∥x + y∥ ≤ ∥x∥ + ∥y∥
Cauchy–Schwarz: |x⊤y| ≤ ∥x∥2 ∥y∥2

1.3 Orthogonality
x ⊥ y ⇔ x⊤y = 0
Orthonormal: {q1, …, qk} with qi⊤qj = δij
Orthogonal matrix: Q⊤Q = QQ⊤ = I

1.4 Projections
Onto subspace R(A) (full col rank): Π_R(A) = A(A⊤A)⁻¹A⊤
Properties: Π² = Π, Π⊤ = Π
Projection of y: ŷ = Πy
Orthog complement: Π⊥ = I − Π
Onto affine {x : Ax = b}: Π(y) = y − A⊤(AA⊤)⁻¹(Ay − b)

1.5 Gram–Schmidt
Input: {a1, …, ak}. Output: orthonormal {q1, …, qk}
q1 = a1/∥a1∥
q̃i = ai − Σ_{j<i} (qj⊤ai) qj,  qi = q̃i/∥q̃i∥

1.6 Eigenvalues
Av = λv, v ≠ 0 (eigenvector)
Characteristic poly: det(A − λI) = 0
Properties: tr(A) = Σ_i λi, det(A) = Π_i λi
λ(A⁻¹) = 1/λ(A), λ(A + cI) = λ(A) + c, λ(Aᵏ) = λ(A)ᵏ

1.7 Spectral Theorem
For symmetric A = A⊤: A = UΛU⊤ = Σ_{i=1}^n λi ui ui⊤
U orthogonal, Λ = diag(λ1, …, λn), λi ∈ R
Eigenvectors orthogonal

1.8 PSD/PD Matrices
PSD (A ⪰ 0): any of:
1. All λi ≥ 0
2. x⊤Ax ≥ 0 ∀x
3. A = B⊤B for some B
4. All principal minors ≥ 0 (leading ones alone do not suffice for PSD)
PD (A ≻ 0): strict inequalities (Sylvester: leading principal minors > 0 suffice)
2×2: [a b; b d] ⪰ 0 iff a ≥ 0, d ≥ 0, ad ≥ b²
2×2: [a b; b d] ≻ 0 iff a > 0, ad > b²
Check PSD: for symmetric 2×2, verify det ≥ 0 and tr ≥ 0

1.9 SVD
A = UΣV⊤ = Σ_{i=1}^r σi ui vi⊤
U ∈ R^(m×m): left singular vectors (eigenvecs of AA⊤)
V ∈ R^(n×n): right singular vectors (eigenvecs of A⊤A)
σ1 ≥ σ2 ≥ ⋯ ≥ σr > 0, r = rank(A), σi = sqrt(λi(A⊤A))
Relationships: R(A) = span{u1, …, ur}, N(A) = span{v_{r+1}, …, vn},
R(A⊤) = span{v1, …, vr}, N(A⊤) = span{u_{r+1}, …, um}
Pseudoinverse: A† = VΣ†U⊤, Σ†ii = 1/σi if σi > 0, else 0
Full col rank: A† = (A⊤A)⁻¹A⊤
Full row rank: A† = A⊤(AA⊤)⁻¹

1.10 Low-Rank Approximation
Eckart–Young: best rank-k approximation: Ak = Σ_{i=1}^k σi ui vi⊤ = Uk Σk Vk⊤
Errors: ∥A − Ak∥F² = Σ_{i>k} σi², ∥A − Ak∥2 = σ_{k+1}
Relative: ∥A − Ak∥F² / ∥A∥F² = Σ_{i>k} σi² / Σ_i σi²
Unique iff σk ≠ σ_{k+1}

2 Least Squares

2.1 Problem
min_x ∥Ax − y∥2²
Normal equations: A⊤Ax = A⊤y
Gradient: ∇∥Ax − y∥2² = 2A⊤(Ax − y)

2.2 Cases
Overdetermined (tall A, m > n), full col rank: x̂ = (A⊤A)⁻¹A⊤y (unique)
Underdetermined (fat A, m < n), full row rank: x̂ = A⊤(AA⊤)⁻¹y (min-norm)
General: x̂ = A†y
Solution set: S = A†y + N(A)

2.3 Geometry
ŷ = Ax̂ = Π_R(A) y
Residual: y − ŷ ⊥ R(A)
∥y − Ax̂∥2 = distance from y to R(A)

2.4 Example
A = [1 0; 0 1; 1 1], y = (1, 1, 0)
A⊤A = [2 1; 1 2], A⊤y = (1, 1)
(A⊤A)⁻¹ = (1/3)[2 −1; −1 2]
x̂ = (1/3)(1, 1)

2.5 Ridge Regression
min_x ∥Ax − y∥2² + λ∥x∥2²
x̂ = (A⊤A + λI)⁻¹A⊤y
A⊤A + λI always invertible for λ > 0; shrinks x̂ toward 0
Bias-variance tradeoff

2.6 LASSO
min_x ∥Ax − y∥2² + λ∥x∥1
Promotes sparsity (ℓ1 ball has corners on axes)
QP form: min_{x,t} ∥Ax − y∥2² + λ1⊤t s.t. −t ≤ x ≤ t (i.e., |xi| ≤ ti)

2.7 Condition Number
κ(A) = ∥A∥2 ∥A⁻¹∥2 = σ1/σn
∥Δx∥/∥x∥ ≤ κ(A) ∥Δy∥/∥y∥
κ(A⊤A) = κ(A)²
Large κ = ill-conditioned

3 Convexity

3.1 Convex Sets
Def: ∀x, y ∈ S, ∀θ ∈ [0, 1]: θx + (1 − θ)y ∈ S
Convex combination: Σ_i θi xi with θi ≥ 0, Σ_i θi = 1
Convex hull: smallest convex set containing S

3.2 Examples of Convex Sets
Hyperplane: {x : a⊤x = b}
Halfspace: {x : a⊤x ≤ b}
Ball: {x : ∥x − c∥ ≤ r}
Ellipsoid: {x : (x − c)⊤P⁻¹(x − c) ≤ 1}, P ≻ 0
Polyhedron: {x : Ax ≤ b}
Cone C: x ∈ C ⇒ θx ∈ C ∀θ ≥ 0
PSD cone: S⁺ⁿ = {X : X ⪰ 0}
Norm ball: {x : ∥x∥p ≤ r}, p ≥ 1

3.3 Operations Preserving Convexity
Intersection of convex sets
Affine image: f(S) = {f(x) : x ∈ S}
Affine preimage: f⁻¹(S)
Scaling, translation
Sum: S1 + S2 = {x + y : x ∈ S1, y ∈ S2}

3.4 Sublevel Sets
Sα = {x : f(x) ≤ α}
f convex ⇒ Sα convex ∀α. Converse false!

3.5 Separating Hyperplanes
If S1, S2 convex, disjoint: ∃a ≠ 0, b: a⊤x ≤ b ∀x ∈ S1, a⊤x ≥ b ∀x ∈ S2
If one set is compact and the other closed (both convex, disjoint): strict separation

3.6 Convex Functions – Definitions
0th order: f(θx + (1 − θ)y) ≤ θf(x) + (1 − θ)f(y)
1st order (diff'able): f(y) ≥ f(x) + ∇f(x)⊤(y − x)
(tangent plane is a global underestimator)
2nd order (twice diff'able): ∇²f(x) ⪰ 0 ∀x
Strictly convex: strict inequalities (unique min)
Strongly convex: ∇²f ⪰ μI for μ > 0

3.7 Common Convex Functions
Linear: a⊤x + b
Quadratic: x⊤Px + q⊤x + r (P ⪰ 0)
Norms: ∥x∥p, p ≥ 1
Exponential: e^(ax)
Neg log: −log x on x > 0
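The Gram–Schmidt recursion in §1.5 can be sketched in plain Python (an illustrative sketch with no external libraries; the helper names `dot` and `gram_schmidt` are mine):

```python
# Gram-Schmidt: orthonormalize {a1, ..., ak} as in the cheat sheet.
from math import sqrt

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def gram_schmidt(vectors):
    qs = []
    for a in vectors:
        # q~_i = a_i - sum_{j<i} (q_j^T a_i) q_j
        q = list(a)
        for prev in qs:
            c = dot(prev, a)
            q = [qi - c * pi for qi, pi in zip(q, prev)]
        norm = sqrt(dot(q, q))
        if norm > 1e-12:          # skip (near-)linearly-dependent inputs
            qs.append([qi / norm for qi in q])
    return qs

Q = gram_schmidt([[1.0, 1.0, 0.0], [1.0, 0.0, 1.0]])
# Q[0], Q[1] are orthonormal: unit length, zero inner product
```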
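The 2×2 PSD test from §1.8 is easy to verify numerically (a minimal sketch; the function name is mine):

```python
# 2x2 symmetric PSD test: [[a, b], [b, d]] is PSD iff a >= 0, d >= 0, ad >= b^2.
def is_psd_2x2(a, b, d):
    return a >= 0 and d >= 0 and a * d >= b * b

print(is_psd_2x2(2, 1, 2))   # True: this is A^T A from the worked example
print(is_psd_2x2(1, 2, 1))   # False: ad = 1 < 4 = b^2, so indefinite
```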
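The worked least-squares example in §2.4 can be checked end-to-end via the normal equations (plain Python, no libraries; the 2×2 inverse is hand-coded):

```python
# A = [1 0; 0 1; 1 1], y = (1, 1, 0); solve A^T A x = A^T y.
A = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
y = [1.0, 1.0, 0.0]

AtA = [[sum(A[k][i] * A[k][j] for k in range(3)) for j in range(2)] for i in range(2)]
Aty = [sum(A[k][i] * y[k] for k in range(3)) for i in range(2)]
# AtA = [[2, 1], [1, 2]], Aty = [1, 1]

# 2x2 inverse: [[a, b], [c, d]]^{-1} = (1/(ad - bc)) [[d, -b], [-c, a]]
(a, b), (c, d) = AtA
det = a * d - b * c
x_hat = [(d * Aty[0] - b * Aty[1]) / det, (-c * Aty[0] + a * Aty[1]) / det]
# x_hat = (1/3)(1, 1), matching the sheet
```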
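Ridge regression (§2.5) on the same data shows the shrinkage effect: x̂ = (A⊤A + λI)⁻¹A⊤y moves toward 0 as λ grows, and recovers plain least squares at λ = 0 (a sketch; the `ridge` helper is mine):

```python
# Ridge on the Section 2.4 example: solve (A^T A + lam*I) x = A^T y for a 2x2 system.
A = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
y = [1.0, 1.0, 0.0]
AtA = [[sum(A[k][i] * A[k][j] for k in range(3)) for j in range(2)] for i in range(2)]
Aty = [sum(A[k][i] * y[k] for k in range(3)) for i in range(2)]

def ridge(lam):
    a, b = AtA[0][0] + lam, AtA[0][1]
    c, d = AtA[1][0], AtA[1][1] + lam
    det = a * d - b * c                       # > 0 for lam >= 0 here
    return [(d * Aty[0] - b * Aty[1]) / det,
            (-c * Aty[0] + a * Aty[1]) / det]

print(ridge(0.0))   # (1/3, 1/3): the plain least-squares solution
print(ridge(3.0))   # (1/6, 1/6): shrunk toward 0
```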