1.7 Derivatives in several variables as linear transformations
1.7.1 (Derivative) Let U be an open subset of R and let f : U → R be a function. Then f is
differentiable at a ∈ U with derivative f′(a) if the following limit exists:
$$f'(a) := \lim_{h \to 0} \frac{1}{h}\left(f(a+h) - f(a)\right)$$
1.7.3 (partial derivative) Let U be an open subset of Rn and f : U → R a function. The partial
derivative of f with respect to the ith variable, evaluated at ⃗a, is the following limit (provided it exists):
$$D_i f(\vec a) := \lim_{h \to 0} \frac{1}{h}\left(f(a_1, \dots, a_i + h, \dots, a_n) - f(a_1, \dots, a_i, \dots, a_n)\right)$$
Notation: $D_i f = \dfrac{\partial f}{\partial x_i} = D_{x_i} f = f_{x_i}$
Example: $f(x, y) = x^2 + y^2$. Then $D_1 f(1, 0) = 2$.
Example:
$$f(x, y) = \begin{pmatrix} xy \\ \sin(x+y) \\ x^2 - y^2 \end{pmatrix} \implies D_x f(x, y) = D_1 f(x, y) = \begin{pmatrix} y \\ \cos(x+y) \\ 2x \end{pmatrix}, \quad D_y f(x, y) = D_2 f(x, y) = \begin{pmatrix} x \\ \cos(x+y) \\ -2y \end{pmatrix}$$
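The partial derivatives in the example above can be checked numerically with a centered difference quotient (a sketch; the step size `1e-6` and the test point are arbitrary choices, not from the notes):

```python
import math

def f(x, y):
    # the example map f(x, y) = (xy, sin(x+y), x^2 - y^2)
    return (x * y, math.sin(x + y), x**2 - y**2)

def partial(f, i, x, y, h=1e-6):
    # centered difference approximation of the ith partial derivative at (x, y)
    if i == 1:
        plus, minus = f(x + h, y), f(x - h, y)
    else:
        plus, minus = f(x, y + h), f(x, y - h)
    return tuple((p - m) / (2 * h) for p, m in zip(plus, minus))

x, y = 1.0, 0.5
d1 = partial(f, 1, x, y)   # should approximate (y, cos(x+y), 2x)
d2 = partial(f, 2, x, y)   # should approximate (x, cos(x+y), -2y)
```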
1.7.5 (alternate definition of the derivative) Let U be an open subset of R and f : U → R a
function. Then f is differentiable at a with derivative m if and only if
$$\lim_{h \to 0} \frac{1}{h}\left[(f(a+h) - f(a)) - mh\right] = 0$$
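The alternate definition can be illustrated numerically: for $f(x) = x^2$ and the candidate $m = 2a$, the bracketed remainder divided by $h$ shrinks to zero (a sketch; the function and point are arbitrary choices):

```python
def remainder(f, a, m, h):
    # the quantity whose limit must vanish in the alternate definition
    return (f(a + h) - f(a) - m * h) / h

f = lambda x: x**2
a = 3.0
m = 2 * a          # candidate derivative f'(a) = 2a
rs = [abs(remainder(f, a, m, 10.0**-k)) for k in range(1, 6)]
# for this f the remainder is exactly h, so it shrinks linearly with h
```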
1.7.7 (Jacobian matrix) Let U be an open subset of Rn and let f⃗ : U → Rm be a mapping. The Jacobian matrix of f⃗ at ⃗a ∈ U
is the m × n matrix composed of the partial derivatives of the components of f⃗, evaluated at ⃗a:
$$[\mathbf{J}\vec f(\vec a)] := \begin{pmatrix} D_1 f_1(\vec a) & \cdots & D_n f_1(\vec a) \\ \vdots & \ddots & \vdots \\ D_1 f_m(\vec a) & \cdots & D_n f_m(\vec a) \end{pmatrix}$$
1.7.9 (derivative) Let U ⊆ Rn be an open subset and let f⃗ : U → Rm be a mapping; let ⃗a ∈ U . If
there exists a linear transformation L : Rn → Rm such that
$$\lim_{\vec h \to \vec 0} \frac{1}{|\vec h|}\left[(\vec f(\vec a + \vec h) - \vec f(\vec a)) - L(\vec h)\right] = \vec 0,$$
then f⃗ is differentiable at ⃗a, L is unique, and L is the derivative of f⃗ at ⃗a, denoted [Df⃗(⃗a)].
In the one-variable case: l(h) = f′(a)h is the best linear approximation to f(a + h) − f(a).
Example: Let f(x, y) = (xy, x² − y²). Then
$$\mathbf{J}f(x, y) = \begin{pmatrix} y & x \\ 2x & -2y \end{pmatrix}.$$
Now we check, with ⃗h = (h₁, h₂):
$$\lim_{(h_1,h_2)\to(0,0)} \frac{1}{\sqrt{h_1^2 + h_2^2}}\left[\begin{pmatrix}(x+h_1)(y+h_2)\\(x+h_1)^2-(y+h_2)^2\end{pmatrix} - \begin{pmatrix}xy\\x^2-y^2\end{pmatrix} - \begin{pmatrix}y & x\\2x & -2y\end{pmatrix}\begin{pmatrix}h_1\\h_2\end{pmatrix}\right]$$
$$= \lim_{(h_1,h_2)\to(0,0)} \frac{1}{\sqrt{h_1^2+h_2^2}}\left[\begin{pmatrix}xh_2+yh_1+h_1h_2\\2xh_1+h_1^2-2yh_2-h_2^2\end{pmatrix} - \begin{pmatrix}yh_1+xh_2\\2xh_1-2yh_2\end{pmatrix}\right]$$
$$= \lim_{(h_1,h_2)\to(0,0)} \frac{1}{\sqrt{h_1^2+h_2^2}}\begin{pmatrix}h_1h_2\\h_1^2-h_2^2\end{pmatrix} = \vec 0,$$
since each entry of the remaining vector is quadratic in ⃗h while the denominator is only first order.
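The same cancellation can be seen numerically: the ratio in the definition shrinks as ⃗h → ⃗0 (a sketch; the point ⃗a and the direction of approach are arbitrary choices):

```python
import numpy as np

def f(v):
    x, y = v
    return np.array([x * y, x**2 - y**2])

def J(v):
    # the Jacobian computed in the example
    x, y = v
    return np.array([[y, x], [2 * x, -2 * y]])

a = np.array([1.0, 2.0])
# |f(a+h) - f(a) - J(a)h| / |h| should tend to 0 as h -> 0
ratios = []
for k in range(1, 6):
    h = np.array([10.0**-k, -10.0**-k])   # one arbitrary direction of approach
    r = np.linalg.norm(f(a + h) - f(a) - J(a) @ h) / np.linalg.norm(h)
    ratios.append(r)
```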
1.7.10 If f is differentiable at ⃗a, then all partial derivatives of f at ⃗a exist, and the matrix representing
[Df(⃗a)] is [Jf(⃗a)].
Gradient: the gradient of a function f : Rn → R is
$$\operatorname{grad} f(\vec a) = [\mathbf{D}f(\vec a)]^T = \begin{pmatrix} D_1 f(\vec a) & \dots & D_n f(\vec a) \end{pmatrix}^T$$
1.7.11 Let U be an open subset of Rn and let f : U → Rm be a mapping; let a be a point in U . If f is
differentiable at a, then f is continuous at a.
1.7.13 (Directional derivative) Let U ⊆ Rn be open and let f : U → Rm be a function. The
directional derivative of f at a in the direction ⃗v is
$$\lim_{h \to 0} \frac{f(a + h\vec v) - f(a)}{h}$$
1.7.14 (Computing directional derivatives) If U ⊆ Rn is open and f : U → Rm is differentiable at
a ∈ U , then all directional derivatives of f at a exist, and the directional derivative in the direction ⃗v is
given by the formula
$$\lim_{h \to 0} \frac{f(a + h\vec v) - f(a)}{h} = [\mathbf{D}f(a)]\vec v$$
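The formula can be checked on a concrete scalar function: for $f(x, y) = x^2 + y^2$ we have $[\mathbf{D}f(a)]\vec v = \operatorname{grad} f(a) \cdot \vec v$ (a sketch; the point, direction, and step size are arbitrary choices):

```python
def f(x, y):
    return x**2 + y**2              # scalar example, gradient (2x, 2y)

a = (1.0, 2.0)
v = (3.0, 4.0)                      # note the direction need not be a unit vector

def dir_quotient(h):
    # the difference quotient defining the directional derivative
    return (f(a[0] + h * v[0], a[1] + h * v[1]) - f(*a)) / h

limit_estimate = dir_quotient(1e-7)
formula = 2 * a[0] * v[0] + 2 * a[1] * v[1]   # [Df(a)]v = grad f(a) . v
```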
Ex: Consider S : Mat(n, n) → Mat(n, n) given by S(A) = A². Then DS(A) : H ↦ AH + HA. Do the
computations for a 2 × 2 matrix to corroborate this.
Claim: DS(A)(H) = AH + HA. Here the norm on matrices is
$$|A| = \sqrt{\sum_{i,j} a_{ij}^2}.$$
$$\lim_{H \to 0} \frac{1}{|H|}\left(S(A+H) - S(A) - (AH + HA)\right) = \lim_{H \to 0} \frac{1}{|H|}\left((A+H)(A+H) - A^2 - AH - HA\right)$$
$$= \lim_{H \to 0} \frac{1}{|H|} H^2 = 0 \in \operatorname{Mat}(n, n),$$
since $|H^2| \le |H|^2$, so $|H^2|/|H| \le |H| \to 0$.
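The claim can be corroborated numerically: the remainder S(A+H) − S(A) − (AH + HA) equals H², so its norm divided by |H| shrinks linearly (a sketch; the random matrices are arbitrary test data):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((2, 2))      # arbitrary test matrix
H0 = rng.standard_normal((2, 2))     # arbitrary direction

def S(X):
    return X @ X

ratios = []
for k in range(1, 6):
    H = 10.0**-k * H0
    rem = S(A + H) - S(A) - (A @ H + H @ A)      # algebraically equals H @ H
    ratios.append(np.linalg.norm(rem) / np.linalg.norm(H))
```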
1.7.18 If f is the function f(A) = A⁻¹ defined on the set of invertible matrices, then f is differentiable
and [Df(A)]H = −A⁻¹HA⁻¹.
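This formula can also be checked by finite differences: the directional derivative of A ↦ A⁻¹ in the direction H should match −A⁻¹HA⁻¹ (a sketch; the matrices and step size are arbitrary choices):

```python
import numpy as np

A = np.array([[2.0, 1.0], [0.0, 3.0]])   # an arbitrary invertible matrix
H = np.array([[0.0, 1.0], [1.0, 0.0]])   # an arbitrary direction

inv = np.linalg.inv
t = 1e-6
numeric = (inv(A + t * H) - inv(A)) / t   # difference quotient of A -> A^{-1}
formula = -inv(A) @ H @ inv(A)
err = np.abs(numeric - formula).max()
```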
Ex: S : Mat(n, n) → Mat(n, n), X ↦ X², what is DS(A)?
Identify $\begin{pmatrix} a & b \\ c & d \end{pmatrix}$ with $(a, b, c, d)^T$. Then
$$A^2 = \begin{pmatrix} a^2 + bc & ab + bd \\ ca + dc & cb + d^2 \end{pmatrix},$$
so we can write S : (a, b, c, d)ᵀ ↦ (a² + bc, ab + bd, ca + dc, cb + d²)ᵀ. Claim: DS(A)H = AH + HA in arbitrary dimension.
1.8 Rules for computing derivatives
Some rules to keep in mind:
1. If f is constant, then its derivative is 0
2. If f is linear, then it is differentiable everywhere and its derivative at every point a ∈ U is f itself; that is,
(Df(a))v = f(v)
3. If f1, . . . , fm are differentiable at a, then f = (f1, . . . , fm) : U → Rm is differentiable at a with derivative
$$\mathbf{D}f(a)v = \begin{pmatrix} \mathbf{D}f_1(a)v \\ \vdots \\ \mathbf{D}f_m(a)v \end{pmatrix}$$
4. If f, g are differentiable at a, then D(f + g)(a) = Df (a) + Dg(a)
5. If f : U → R and g : U → Rm are differentiable at a, so is f g, and D(f g)(a)v = f(a)(Dg(a)v) + (Df(a)v)g(a)
6. If f : U → R and g : U → Rm are differentiable at a and f(a) ≠ 0, then
$$\mathbf{D}(g/f)(a)v = \frac{\mathbf{D}g(a)v}{f(a)} - \frac{(\mathbf{D}f(a)v)\,g(a)}{(f(a))^2}$$
7. If f, g : U → Rm are differentiable at a, then D(f · g)(a)v = (Df(a)v) · g(a) + f(a) · (Dg(a)v)
The chain rule: Let U ⊆ Rn , V ⊆ Rm be open, let g : U → V and f : V → Rp be mappings, and let
a ∈ U . If g is differentiable at a and f is differentiable at g(a), then f ◦ g is differentiable at a with
derivative
$$[\mathbf{D}(f \circ g)(a)] = [\mathbf{D}f(g(a))] \circ [\mathbf{D}g(a)], \qquad \mathbf{J}_{f \circ g}(a) = \mathbf{J}f(g(a))\,\mathbf{J}g(a)$$
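The matrix form of the chain rule can be verified numerically: the finite-difference Jacobian of f ∘ g should match the product of the individual Jacobians (a sketch; the maps g, f and the test point are arbitrary choices):

```python
import numpy as np

def g(v):                      # g : R^2 -> R^2
    x, y = v
    return np.array([x * y, x + y])

def f(w):                      # f : R^2 -> R
    u, s = w
    return np.array([u**2 + np.sin(s)])

def jac(F, v, h=1e-6):
    # centered finite-difference Jacobian of F at v
    v = np.asarray(v, dtype=float)
    cols = []
    for i in range(len(v)):
        e = np.zeros_like(v)
        e[i] = h
        cols.append((F(v + e) - F(v - e)) / (2 * h))
    return np.column_stack(cols)

a = np.array([1.0, 2.0])
lhs = jac(lambda v: f(g(v)), a)            # J_{f∘g}(a)
rhs = jac(f, g(a)) @ jac(g, a)             # Jf(g(a)) Jg(a)
err = np.abs(lhs - rhs).max()
```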
1.8.2 differentiability of polynomials and rational functions
1. Any polynomial function Rn → R is differentiable on all of Rn
2. Any rational function is differentiable on the subset of Rn where the denominator does not vanish
1.9 The mean value theorem and criteria for differentiability
Mean value theorem: Let U ⊆ Rn be open, and let f : U → R be differentiable. Suppose the segment [a, b]
joining a to b is contained in U, where [a, b] is the image of t ↦ (1 − t)a + tb for 0 ≤ t ≤ 1. Then there
exists c ∈ [a, b] such that
f (b) − f (a) = (Df (c))(b − a)
1.9.2 If f is a function as above, then
$$|f(b) - f(a)| \le \left(\sup_{c \in [a,b]} \left|[\mathbf{D}f(c)]\right|\right) |b - a|$$
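The inequality can be illustrated on a concrete scalar function, approximating the supremum by sampling the segment; for a scalar function, |[Df(c)]| is the length of the gradient (a sketch; the function, endpoints, and sampling are arbitrary choices):

```python
import math

def f(x, y):
    return math.sin(x) + y**2

def grad_norm(x, y):
    # |[Df(c)]| for scalar f is the Euclidean length of the gradient
    return math.hypot(math.cos(x), 2 * y)

a, b = (0.0, 0.0), (1.0, 2.0)
seg = [((1 - t) * a[0] + t * b[0], (1 - t) * a[1] + t * b[1])
       for t in [k / 100 for k in range(101)]]
sup_df = max(grad_norm(x, y) for x, y in seg)   # sampled approximation of the sup
lhs = abs(f(*b) - f(*a))
rhs = sup_df * math.hypot(b[0] - a[0], b[1] - a[1])
```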
1.9.6 A function is continuously differentiable on U ⊆ Rn if all its partial derivatives exist and are
continuous on U . Such a function is known as a C 1 function.
1.9.7 A C p function on U ⊆ Rn is a function that is p times continuously differentiable: all of its partial
derivatives up to order p exist and are continuous on U .
2.6 Abstract vector spaces
2.6.1 A vector space V is a set of vectors that is closed under addition of vectors and under multiplication
by scalars. The addition (+) and scalar multiplication (·) operations satisfy the following rules:
1. ∃0 ∈ V such that for any v ∈ V , 0 + v = v
2. For any v ∈ V , ∃ − v ∈ V such that v + (−v) = 0 ∈ V
3. For all v, w ∈ V , v + w = w + v ∈ V
4. For all v1 , v2 , v3 ∈ V , v1 + (v2 + v3 ) = (v1 + v2 ) + v3
5. For all v ∈ V , 1v = v
6. For all scalars a, b and all v ∈ V , a(bv) = (ab)v
7. For all scalars a, b and all v ∈ V , (a + b)v = av + bv.
8. For all scalars a and all v, w ∈ V , a(v + w) = av + aw.
Subspace criteria: a subset S of a vector space V is a subspace if and only if:
1. 0 ∈ S
2. u, v ∈ S =⇒ u + v ∈ S
3. u ∈ S, a ∈ F =⇒ au ∈ S.
Examples of vector spaces:
Rn , Mat(m, n), C(0, 1) are vector spaces under the usual operations.
The first quadrant of R2 is not a vector space under the usual operations.
V = (0, ∞) with the operations x + y = xy and a · x = xa is a vector space.
2.6.4 If V and W are vector spaces, a linear transformation T : V → W is a mapping satisfying for all
scalars α, β ∈ R and all v1 , v2 ∈ V ,
T (αv1 + βv2 ) = αT (v1 ) + βT (v2 )
Example of a linear transformation: Let C(0, 1) denote the space of continuous real-valued functions on
[0, 1]. Let g : [0, 1] × [0, 1] → R be continuous, and define Tg : C(0, 1) → C(0, 1) by
$$(T_g(f))(x) = \int_0^1 g(x, y) f(y)\, dy.$$
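The integral operator can be approximated by a Riemann sum, which makes it a finite matrix acting on samples of f (a sketch; the kernel g(x, y) = xy, the function f = 1, and the grid size are arbitrary choices used only to make the computation checkable):

```python
# Discretize Tg by the midpoint rule: (Tg f)(x) ~ sum_j g(x, y_j) f(y_j) / n
n = 200
xs = [(k + 0.5) / n for k in range(n)]       # midpoints of [0, 1]

g = lambda x, y: x * y                        # an arbitrary continuous kernel
f = lambda y: 1.0                             # the constant function 1

Tgf = [sum(g(x, y) * f(y) for y in xs) / n for x in xs]
# For g(x, y) = xy and f = 1 the exact answer is int_0^1 xy dy = x/2.
err = max(abs(v - x / 2) for v, x in zip(Tgf, xs))
```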
2.6.8 Let V be a vector space and let {v} := v1 , . . . , vm be a finite ordered collection of vectors in V . A
linear combination of the vectors v1 , . . . , vm is a vector v of the form
$$v = \sum_{i=1}^m a_i v_i, \qquad a_1, \dots, a_m \in \mathbb{R}$$
2.6.9 The vectors v1 , . . . , vm span V if and only if every vector in V is a linear combination of v1 , . . . , vm .
2.6.10 The vectors v1, . . . , vm are linearly independent if and only if any of the following three equivalent
conditions is met:
1. $$\sum_{i=1}^m a_i v_i = \sum_{i=1}^m b_i v_i \implies a_1 = b_1,\ a_2 = b_2,\ \dots,\ a_m = b_m$$
2. $$\sum_{i=1}^m a_i v_i = 0 \implies a_1 = a_2 = \cdots = a_m = 0$$
3. None of the vi is a linear combination of the others.
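In Rⁿ, condition 2 can be tested mechanically: vectors placed as columns of a matrix are linearly independent exactly when the matrix has full column rank (a sketch; the example vectors are arbitrary choices):

```python
import numpy as np

# Columns are the vectors v_1, ..., v_m in R^n; independence <=> full column rank.
V_indep = np.array([[1.0, 0.0],
                    [0.0, 1.0],
                    [1.0, 1.0]])             # two independent vectors in R^3
V_dep = np.array([[1.0, 2.0],
                  [2.0, 4.0],
                  [3.0, 6.0]])               # second column = 2 * first column

indep = np.linalg.matrix_rank(V_indep) == V_indep.shape[1]
dep = np.linalg.matrix_rank(V_dep) == V_dep.shape[1]
```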
2.6.11 An ordered set of vectors v1 , . . . , vm ∈ V is a basis of V if and only if it is linearly independent
and spans V .
A basis does not need to be finite. Example: C(0, 1) does not have a finite basis.
2.6.12 Let {v} = v1, . . . , vn be a finite, ordered collection of n vectors in a vector space V . The concrete-to-abstract function Φ{v} is the linear transformation Φ{v} : Rn → V such that
$$\Phi_{\{v\}}(\vec a) = \Phi_{\{v\}}\begin{pmatrix} a_1 \\ \vdots \\ a_n \end{pmatrix} := a_1 v_1 + \cdots + a_n v_n$$
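A small concrete instance: take V to be polynomials of degree < 3 with the basis {1, x, x²}; then Φ sends a coefficient vector in R³ to the corresponding polynomial (a sketch; the basis and test values are arbitrary choices):

```python
# Concrete-to-abstract map for V = polynomials of degree < 3, basis {1, x, x^2}.
# Phi sends (a1, a2, a3) in R^3 to the polynomial a1 + a2*x + a3*x^2.
def phi(coeffs):
    a1, a2, a3 = coeffs
    return lambda x: a1 + a2 * x + a3 * x**2

p = phi((1.0, 0.0, 2.0))       # the polynomial 1 + 2x^2
val = p(3.0)                   # 1 + 2*9 = 19
```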
2.6.15 Let {v} = v1, . . . , vn be vectors in a vector space V , and let Φ{v} : Rn → V be the associated
concrete-to-abstract transformation. Then: 1. The set {v} is linearly independent if and only if Φ{v} is one-to-one.