Exercise 0 - Matrix Calculus

Problem 1 (25%) Definitions

Gradient
The gradient points in the direction of the greatest rate of increase of the function, and its magnitude gives the rate of increase in that direction.

For a function $f : \mathbb{R}^{n} \to \mathbb{R}$, the gradient is the column vector of the partial derivatives of $f$ with respect to each of its variables:

$$
\nabla f(x) = \left[ \frac{\partial f}{\partial x} \right]^{T} = \begin{bmatrix} \frac{\partial f}{\partial x_{1}} \\ \vdots \\ \frac{\partial f}{\partial x_{n}} \end{bmatrix}
$$

Jacobian
The Jacobian generalizes the gradient to a continuously differentiable vector-valued function $f : \mathbb{R}^{n} \to \mathbb{R}^{m}$:

$$
\frac{\partial f}{\partial x} = \begin{bmatrix} \frac{\partial f_{1}}{\partial x_{1}} & \dots & \frac{\partial f_{1}}{\partial x_{n}} \\ \vdots & \ddots & \vdots \\ \frac{\partial f_{m}}{\partial x_{1}} & \dots & \frac{\partial f_{m}}{\partial x_{n}} \end{bmatrix}
$$

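c) The derivative $\frac{\partial f}{\partial x}$ is a $1 \times n$ row vector:

$$
\frac{\partial f}{\partial x} = \begin{bmatrix} \frac{\partial f}{\partial x_{1}} & \dots & \frac{\partial f}{\partial x_{n}} \end{bmatrix}
$$

The gradient is its transpose, $\nabla f(x) = \left[ \frac{\partial f}{\partial x} \right]^{T}$, which is an $n \times 1$ column vector.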

d) The resulting Jacobian will be $m \times n$:

$$
\frac{\partial f}{\partial x} = \begin{bmatrix} \frac{\partial f_{1}}{\partial x_{1}} & \dots & \frac{\partial f_{1}}{\partial x_{n}} \\ \vdots & \ddots & \vdots \\ \frac{\partial f_{m}}{\partial x_{1}} & \dots & \frac{\partial f_{m}}{\partial x_{n}} \end{bmatrix}
$$
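A numerical check can make these shapes concrete. The sketch below approximates the Jacobian by central differences in plain Python. The helper `numerical_jacobian` and the example functions `f` and `g` are illustrative assumptions and are not part of the exercise.

```python
def numerical_jacobian(f, x, h=1e-6):
    """Approximate the m x n Jacobian of f: R^n -> R^m by central differences."""
    m, n = len(f(x)), len(x)
    jac = [[0.0] * n for _ in range(m)]
    for j in range(n):
        x_plus, x_minus = list(x), list(x)
        x_plus[j] += h
        x_minus[j] -= h
        f_plus, f_minus = f(x_plus), f(x_minus)
        for i in range(m):
            # Row i belongs to the output f_i, column j to the input x_j.
            jac[i][j] = (f_plus[i] - f_minus[i]) / (2 * h)
    return jac


# Hypothetical example with n = 2 inputs and m = 3 outputs.
def f(x):
    return [x[0] * x[1], x[0] + x[1], x[1] ** 2]


# Hypothetical scalar function (m = 1), wrapped in a list so the same helper applies.
def g(x):
    return [x[0] ** 2 + 3 * x[0] * x[1]]


J = numerical_jacobian(f, [1.0, 2.0])
dg = numerical_jacobian(g, [1.0, 2.0])

print(len(J), len(J[0]))    # 3 2: the Jacobian is m x n
print(len(dg), len(dg[0]))  # 1 2: the scalar derivative is 1 x n
```

Transposing the single row `dg[0]` into a column gives the $n \times 1$ gradient.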

Problem 2 (25%) Linear

Let $f (x) = A x$ , where

$$
A = \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix}, \quad x = \begin{bmatrix} x_{1} \\ x_{2} \end{bmatrix}
$$

$$
\begin{aligned}
\frac{\partial f(x)}{\partial x} &= \frac{\partial}{\partial x} A x \\
&= \frac{\partial}{\partial x} \begin{bmatrix} a_{11} x_{1} + a_{12} x_{2} \\ a_{21} x_{1} + a_{22} x_{2} \end{bmatrix} \\
&= \frac{\partial}{\partial x} \begin{bmatrix} f_{1} \\ f_{2} \end{bmatrix} \\
&= \begin{bmatrix} \frac{\partial f_{1}}{\partial x_{1}} & \frac{\partial f_{1}}{\partial x_{2}} \\ \frac{\partial f_{2}}{\partial x_{1}} & \frac{\partial f_{2}}{\partial x_{2}} \end{bmatrix} \\
&= \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix} \\
&= A
\end{aligned}
$$

This is the Jacobian of $f$ with respect to $x$, since $f(x)$ is a vector with two elements. In general,

$$
\frac{\partial A x}{\partial x} = A
$$

when $x$ is $n \times 1$ and $A$ is $m \times n$.
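The same identity can be checked numerically for a non-square matrix. This is a minimal sketch in plain Python. The entries of `A`, the point `x` and the step size `h` are made-up values chosen only for illustration.

```python
def matvec(A, x):
    """Multiply an m x n matrix (list of rows) by a length-n vector."""
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in A]


# Hypothetical numbers: A is 3 x 2, so f(x) = A x maps R^2 to R^3.
A = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
x = [0.5, -1.5]
h = 1e-6

m, n = len(A), len(x)
jac = [[0.0] * n for _ in range(m)]
for j in range(n):
    x_plus, x_minus = list(x), list(x)
    x_plus[j] += h
    x_minus[j] -= h
    f_plus, f_minus = matvec(A, x_plus), matvec(A, x_minus)
    for i in range(m):
        # Central difference for d f_i / d x_j.
        jac[i][j] = (f_plus[i] - f_minus[i]) / (2 * h)

# Largest entrywise deviation between the numerical Jacobian and A.
max_error = max(abs(jac[i][j] - A[i][j]) for i in range(m) for j in range(n))
print(max_error)  # small, limited only by floating-point rounding
```

Because $f$ is linear, the result does not depend on the point `x` at which the Jacobian is evaluated.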

Problem 3 (25%) Nonlinear

Let $f (x, y) = x^{T} G y$ , where

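$$
x = \begin{bmatrix} x_{1} \\ x_{2} \end{bmatrix}, \quad G = \begin{bmatrix} g_{11} & g_{12} & g_{13} \\ g_{21} & g_{22} & g_{23} \end{bmatrix}, \quad y = \begin{bmatrix} y_{1} \\ y_{2} \\ y_{3} \end{bmatrix}
$$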

a) Gradient with respect to a variable

$f(x, y)$ is a scalar, since its dimensions are $(1 \times 2)(2 \times 3)(3 \times 1) = 1 \times 1$.

$\frac{\partial f(x, y)}{\partial x}$ has dimensions $m \times n = 1 \times 2$, while $\nabla_{x} f(x, y)$ has dimensions $2 \times 1$. The two therefore differ only by a transpose.

b) Method 1

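$$
\begin{aligned}
\nabla_{x} f(x, y) &= G y \\
&= \begin{bmatrix} g_{11} & g_{12} & g_{13} \\ g_{21} & g_{22} & g_{23} \end{bmatrix} \begin{bmatrix} y_{1} \\ y_{2} \\ y_{3} \end{bmatrix} \\
&= \begin{bmatrix} g_{11} y_{1} + g_{12} y_{2} + g_{13} y_{3} \\ g_{21} y_{1} + g_{22} y_{2} + g_{23} y_{3} \end{bmatrix}
\end{aligned}
$$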

Method 2

$$
\begin{aligned}
\nabla_{x} f(x, y) &= \nabla_{x} (x^{T} G y) \\
&= \nabla_{x} (x_{1} y_{1} g_{11} + x_{2} y_{1} g_{21} + x_{1} y_{2} g_{12} + x_{2} y_{2} g_{22} + x_{1} y_{3} g_{13} + x_{2} y_{3} g_{23}) \\
&= \begin{bmatrix} \frac{\partial f}{\partial x_{1}} \\ \frac{\partial f}{\partial x_{2}} \end{bmatrix} \\
&= \begin{bmatrix} g_{11} y_{1} + g_{12} y_{2} + g_{13} y_{3} \\ g_{21} y_{1} + g_{22} y_{2} + g_{23} y_{3} \end{bmatrix}
\end{aligned}
$$

c) We have $f (x, y) = x_{1} y_{1} g_{11} + x_{2} y_{1} g_{21} + x_{1} y_{2} g_{12} + x_{2} y_{2} g_{22} + x_{1} y_{3} g_{13} + x_{2} y_{3} g_{23}$ , and calculate

$$
\begin{aligned}
\nabla_{y} f(x, y) &= \begin{bmatrix} \frac{\partial f}{\partial y_{1}} \\ \frac{\partial f}{\partial y_{2}} \\ \frac{\partial f}{\partial y_{3}} \end{bmatrix} \\
&= \begin{bmatrix} x_{1} g_{11} + x_{2} g_{21} \\ x_{1} g_{12} + x_{2} g_{22} \\ x_{1} g_{13} + x_{2} g_{23} \end{bmatrix} \\
&= G^{T} x
\end{aligned}
$$
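Both gradients can be sanity-checked against central differences. The sketch below uses plain Python with made-up values for `G`, `x` and `y`, and the helper `partial` is a hypothetical name introduced only for this check. It compares the numerical partial derivatives with $G y$ for $\nabla_{x} f$ and with $G^{T} x$, the matrix form of the component expression in c), for $\nabla_{y} f$.

```python
G = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]  # hypothetical 2 x 3 matrix
x = [0.5, -1.0]
y = [2.0, 0.0, -3.0]
h = 1e-6  # finite-difference step


def f(x, y):
    """Evaluate the bilinear form x^T G y."""
    return sum(x[i] * G[i][j] * y[j] for i in range(2) for j in range(3))


def partial(func, v, k):
    """Central-difference partial derivative of func with respect to v[k]."""
    v_plus, v_minus = list(v), list(v)
    v_plus[k] += h
    v_minus[k] -= h
    return (func(v_plus) - func(v_minus)) / (2 * h)


# Numerical gradients: vary one argument while holding the other fixed.
grad_x_numeric = [partial(lambda v: f(v, y), x, k) for k in range(2)]
grad_y_numeric = [partial(lambda v: f(x, v), y, k) for k in range(3)]

# Closed-form gradients from b) and c).
grad_x_formula = [sum(G[i][j] * y[j] for j in range(3)) for i in range(2)]  # G y
grad_y_formula = [sum(G[i][j] * x[i] for i in range(2)) for j in range(3)]  # G^T x

print(grad_x_numeric, grad_x_formula)
print(grad_y_numeric, grad_y_formula)
```

The numerical and closed-form values agree up to floating-point rounding, and the two gradients have lengths 2 and 3, matching the dimensions of $x$ and $y$.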

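d) Here $x$ appears twice in $f$, as in $f(x) = x^{T} G x$. Using a variant of the product rule, we differentiate with respect to the "first $x$" while holding the second fixed, then with respect to the "second $x$" while holding the first fixed, just as we did for $x$ and $y$ in b) and c), and add the two results:

$$
\nabla f(x) = G x + G^{T} x
$$

Since $G$ is symmetric, $G = G^{T}$, we get

$$
\nabla f(x) = 2 G x
$$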
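The identity $\nabla f(x) = G x + G^{T} x$ used in d) can also be verified in index notation. The sketch below assumes $f(x) = x^{T} G x$ with a square $n \times n$ matrix $G$, which is the form that result implies. The index $k$ is introduced only for this derivation.

$$
\begin{aligned}
f(x) &= \sum_{i=1}^{n} \sum_{j=1}^{n} x_{i} g_{ij} x_{j} \\
\frac{\partial f}{\partial x_{k}} &= \sum_{j=1}^{n} g_{kj} x_{j} + \sum_{i=1}^{n} x_{i} g_{ik} \\
&= (G x)_{k} + (G^{T} x)_{k}
\end{aligned}
$$

The first sum comes from the terms with $i = k$ and the second from the terms with $j = k$. Stacking the components for $k = 1, \dots, n$ gives the vector form.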

Problem 4 (25%) Common case

Given the scalar function

$$
L(x, \lambda, \mu) = x^{T} G x + \lambda^{T} (C x - d) + \mu^{T} (E x - h)
$$

Find

a) $\nabla_{x} L(x, \lambda, \mu)$

$$
\nabla_{x} L(x, \lambda, \mu) = (G x + G^{T} x) + C^{T} \lambda + E^{T} \mu
$$

b) $\nabla_{\lambda} L(x, \lambda, \mu)$

$$
\nabla_{\lambda} L(x, \lambda, \mu) = C x - d
$$

c) $\nabla_{\mu} L(x, \lambda, \mu)$

$$
\nabla_{\mu} L(x, \lambda, \mu) = E x - h
$$
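As a final check, the three gradients can be compared with central differences on a small instance. This is a sketch in plain Python under stated assumptions: all numerical values, the helper names (`matvec`, `transpose`, `dot`, `partial`) and the name `h_vec` (used for $h$ so that it does not clash with the step size) are made up for illustration. `G` is deliberately not symmetric, so the term $G x + G^{T} x$ is exercised in full.

```python
def matvec(M, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m_ij * v_j for m_ij, v_j in zip(row, v)) for row in M]


def transpose(M):
    return [list(col) for col in zip(*M)]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


# Hypothetical data: 2 variables, 1 row in C, 2 rows in E.
G = [[2.0, 1.0], [0.0, 3.0]]
C = [[1.0, -1.0]]
d = [0.5]
E = [[1.0, 0.0], [2.0, 1.0]]
h_vec = [1.0, -2.0]
x, lam, mu = [0.5, -1.0], [2.0], [-1.0, 0.5]


def lagrangian(x, lam, mu):
    """L = x^T G x + lam^T (C x - d) + mu^T (E x - h)."""
    cx_d = [c - di for c, di in zip(matvec(C, x), d)]
    ex_h = [e - hi for e, hi in zip(matvec(E, x), h_vec)]
    return dot(x, matvec(G, x)) + dot(lam, cx_d) + dot(mu, ex_h)


def partial(func, v, k, step=1e-6):
    """Central-difference partial derivative of func with respect to v[k]."""
    v_plus, v_minus = list(v), list(v)
    v_plus[k] += step
    v_minus[k] -= step
    return (func(v_plus) - func(v_minus)) / (2 * step)


# Numerical gradients with respect to x, lambda and mu.
grad_x_numeric = [partial(lambda v: lagrangian(v, lam, mu), x, k) for k in range(2)]
grad_lam_numeric = [partial(lambda v: lagrangian(x, v, mu), lam, k) for k in range(1)]
grad_mu_numeric = [partial(lambda v: lagrangian(x, lam, v), mu, k) for k in range(2)]

# Closed-form gradients from a), b) and c).
terms = [matvec(G, x), matvec(transpose(G), x),
         matvec(transpose(C), lam), matvec(transpose(E), mu)]
grad_x_formula = [sum(t[k] for t in terms) for k in range(2)]
grad_lam_formula = [c - di for c, di in zip(matvec(C, x), d)]       # C x - d
grad_mu_formula = [e - hi for e, hi in zip(matvec(E, x), h_vec)]    # E x - h

print(grad_x_numeric, grad_x_formula)
print(grad_lam_numeric, grad_lam_formula)
print(grad_mu_numeric, grad_mu_formula)
```

Each numerical gradient matches its closed-form counterpart up to floating-point rounding.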
