Homework 1
1. Fun with vector calculus. This question has two parts.
a. If x is a d-dimensional vector variable, write down the gradient of the function
$f(x) = \|x\|_2^2$.
b. Suppose we have n data points $x_1, \ldots, x_n$ that are real d-dimensional vectors. Analytically derive a
constant vector $\mu$ for which
$$\sum_{i=1}^{n} \|x_i - \mu\|_2^2$$
is minimized.
Solution
For part a, write out the explicit form of the squared norm, $f(x) = \sum_{j=1}^{d} (x^{(j)})^2$, and take
derivatives with respect to each of the d coordinates. The answer should be:
$$\nabla f(x) = 2x, \quad \text{or,} \quad \frac{\partial f}{\partial x^{(j)}} = 2x^{(j)}.$$
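As a quick numerical sanity check of part a, the sketch below compares a central finite-difference estimate of the gradient against 2x at a random point. The helper names `f` and `numerical_gradient`, the step size, and the test dimension are illustrative assumptions, not part of the assignment.

```python
import random

def f(x):
    """Squared Euclidean norm f(x) = ||x||_2^2 for a list of floats."""
    return sum(v * v for v in x)

def numerical_gradient(func, x, eps=1e-6):
    """Central finite-difference estimate of the gradient of func at x."""
    grad = []
    for j in range(len(x)):
        x_plus = list(x)
        x_minus = list(x)
        x_plus[j] += eps
        x_minus[j] -= eps
        grad.append((func(x_plus) - func(x_minus)) / (2 * eps))
    return grad

random.seed(0)  # fixed seed so the check is repeatable
x = [random.gauss(0.0, 1.0) for _ in range(5)]  # assumed test point, d = 5
grad_est = numerical_gradient(f, x)

# Largest coordinate-wise gap between the estimate and the claimed answer 2x.
max_abs_error = max(abs(g - 2 * v) for g, v in zip(grad_est, x))
print(max_abs_error)
```

The printed gap should be tiny (floating-point rounding only), since the central difference is exact for a quadratic function.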
For part b, take the derivative of the given expression with respect to $\mu$ and set it to zero. You can
either decouple it into coordinates or take the vector derivative (which resembles the expression from part
a, except that there is a $\mu$). Up to an overall sign, this gives:
$$\sum_{i=1}^{n} 2(x_i - \mu) = 0.$$
Solving for µ, the final answer should be:
$$\mu = \frac{\sum_{i=1}^{n} x_i}{n}.$$
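The result of part b can also be checked numerically. The sketch below, under the assumption of synthetic Gaussian data (the sizes, seed, and helper names `objective` and `mean_vector` are hypothetical), computes the sample mean and confirms that random perturbations of it never decrease the sum of squared distances.

```python
import random

def objective(points, mu):
    """Sum over data points of the squared Euclidean distance to mu."""
    return sum(sum((p - m) ** 2 for p, m in zip(x, mu)) for x in points)

def mean_vector(points):
    """Coordinate-wise average of the data points."""
    n = len(points)
    d = len(points[0])
    return [sum(x[j] for x in points) / n for j in range(d)]

random.seed(1)  # fixed seed so the check is repeatable
n, d = 20, 3    # assumed sizes for the synthetic data
points = [[random.gauss(0.0, 1.0) for _ in range(d)] for _ in range(n)]

mu_star = mean_vector(points)
best = objective(points, mu_star)

# Stationarity: the sum of (x_i - mu) should vanish in every coordinate.
residual = [sum(x[j] - mu_star[j] for x in points) for j in range(d)]

# Perturbing the mean in random directions should never lower the objective.
perturbed = []
for _ in range(100):
    mu = [m + random.gauss(0.0, 0.5) for m in mu_star]
    perturbed.append(objective(points, mu))
never_better = all(val >= best for val in perturbed)
print(best, min(perturbed), never_better)
```

This is only a spot check, not a proof; the analytical argument above is what establishes optimality.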
2. Linear regression with non-standard losses. In class we derived an analytical expression for
the optimal linear regression model using the least squares loss for a finite dataset. If X is the
matrix of training data points (stacked row-wise) and y is the vector of labels, then:
a. Using matrix/vector notation, write down a loss function that measures the training error
in terms of the $\ell_1$-norm.
b. Can you write down the optimal linear model in closed form? If not, why not?
c. If the answer to b is no, can you think of an alternative algorithm to optimize the loss
function? Comment on its pros and cons.
Solution
a. $L(w) = \|Xw - y\|_1$.