
using Symbolics, LinearAlgebra, PlutoUI, ForwardDiff, Zygote
67.1 s
TableOfContents(title="Differentiating the Determinant and the Inverse", indent=true, depth=4, aside=true)
41.8 μs

Math: The gradient of the determinant

md"""
# Math: The gradient of the determinant
"""
68.0 ms

Theorem: $\nabla(\det A) = \text{cofactor}(A) = (\det A)A^{-T} = \text{adj}(A^T)$
$d(\det A) = \text{tr}(\det(A)A^{-1}\,dA) = \text{tr}(\text{adj}(A)\,dA) = \text{tr}(\text{cofactor}(A)^T dA)$
Note: the adjugate is defined as $(\det A)A^{-1}$; it is the transpose of the cofactor matrix.

md"""
Theorem:
``\nabla(\det A) = \text{cofactor}(A) = (\det A)A^{-T} = \text{adj}(A^T)`` $(br)
``d(\det A) = \text{tr}(\det(A)A^{-1}\,dA) = \text{tr}(\text{adj}(A)\,dA) = \text{tr}(\text{cofactor}(A)^T dA)`` $(br)
Note: the adjugate is defined as ``(\det A)A^{-1}``; it is the transpose of the cofactor matrix.
"""
350 ms
adj (generic function with 1 method)
adj(A) = det(A) * inv(A) # adjugate function
286 μs
cofactor (generic function with 1 method)
cofactor(A) = adj(A)'
11.9 ms

Formulas at a glance:
$A^{-1}$ = adj($A$)/det($A$) = cofactor($A$)$^T$/det($A$)
adj($A$) = det($A$)$A^{-1}$ = cofactor($A$)$^T$
cofactor($A$) = det($A$)$A^{-T}$ = adj($A$)$^{T}$

md"""
**Formulas at a glance:** $(br)
``A^{-1}`` = adj(``A``)/det(``A``) = cofactor(``A``)``^T``/det(``A``) $(br)
adj(``A``) = det(``A``)``A^{-1}`` = cofactor(``A``)``^T`` $(br)
cofactor(``A``) = det(``A``)``A^{-T}`` = adj(``A``)``^{T}``
"""
4.2 ms
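A quick numeric spot-check of the last identity, with `adj` and `cofactor` re-declared exactly as in the cells above so the snippet is self-contained:

```julia
using LinearAlgebra

adj(A) = det(A) * inv(A)   # adjugate, as defined above
cofactor(A) = adj(A)'      # cofactor is the transpose of the adjugate

B = rand(3, 3)
cofactor(B) ≈ det(B) * inv(B)'   # cofactor(A) = det(A)·A⁻ᵀ
inv(B) ≈ adj(B) / det(B)         # A⁻¹ = adj(A)/det(A)
```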

The adjugate is the transpose of the cofactor matrix. You may remember that the cofactor matrix has as its $(i,j)$ entry $(-1)^{i+j}$ times the determinant of the submatrix obtained by deleting row $i$ and column $j$.

md"""
The adjugate is the transpose of the cofactor matrix.
You may remember that the cofactor matrix has as its ``(i,j)`` entry ``(-1)^{i+j}`` times the determinant of the submatrix obtained by deleting row ``i`` and column ``j``.
"""
209 μs
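As a sanity check, here is a minimal sketch that builds the cofactor matrix directly from this minor-based definition (the helper `cofactor_from_minors` is hypothetical, not part of this notebook) and agrees with `adj(A)'`:

```julia
using LinearAlgebra

# Build the cofactor matrix straight from the definition: the (i,j) entry is
# (-1)^(i+j) times the minor, i.e. the determinant of A with row i and
# column j deleted. (Hypothetical helper, not defined in the notebook.)
function cofactor_from_minors(A)
    n = size(A, 1)
    [(-1)^(i + j) * det(A[1:n .!= i, 1:n .!= j]) for i in 1:n, j in 1:n]
end

B = [1.0 2.0; 3.0 4.0]
cofactor_from_minors(B)   # [4 -3; -2 1], equal to det(B) * inv(B)'
```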
@variables a b c d
22.1 ms
M

$$\begin{equation} \left[ \begin{array}{cc} a & c \\ b & d \\ \end{array} \right] \end{equation}$$

M = [ a c;b d]
18.3 μs

$$\begin{equation} \left[ \begin{array}{cc} d & - b \\ - c & a \\ \end{array} \right] \end{equation}$$

simplify.(cofactor(M))
10.6 s

$$\begin{equation} \left[ \begin{array}{cc} d & - c \\ - b & a \\ \end{array} \right] \end{equation}$$

simplify.(adj( M))
44.3 ms

$$\begin{equation} \left[ \begin{array}{cc} \frac{d}{a d - b c} & \frac{ - c}{a d - b c} \\ \frac{ - b}{a d - b c} & \frac{a}{a d - b c} \\ \end{array} \right] \end{equation}$$

simplify.(inv(M))
369 ms

Numerical Demonstration

md"""
## Numerical Demonstration
"""
3.7 ms
9×9 Matrix{Float64}:
 9.84997e-6  3.27962e-6  3.68515e-6  …  1.31594e-6  8.59009e-6  5.59433e-6
 6.58166e-6  9.49231e-6  7.67381e-6     8.88176e-6  8.25997e-7  7.52597e-6
 6.70657e-6  2.578e-6    8.35539e-6     1.98348e-6  8.75045e-6  6.93709e-6
 2.44452e-6  3.84697e-6  9.332e-6       7.35059e-6  1.4179e-6   2.84029e-6
 1.90982e-6  3.21393e-6  5.36157e-6     3.90574e-6  2.79935e-6  6.62397e-6
 5.20741e-6  9.23777e-7  7.61258e-6  …  8.47971e-6  5.19634e-6  1.73769e-6
 3.78462e-6  5.52636e-7  9.17756e-6     8.67793e-6  5.85933e-6  8.39804e-6
 9.97994e-6  7.38427e-6  8.06906e-6     5.13478e-6  9.34815e-6  2.17865e-6
 7.46987e-6  6.87991e-6  6.53469e-6     3.4723e-6   4.03916e-6  9.32723e-6
begin
	n = 9
	A = rand(n, n); dA = rand(n, n) * 1e-5
end
56.4 μs
det(A+dA) - det(A), tr( adj(A) * dA )
147 μs

Forward-Mode Autodiff (ForwardDiff)

md"""
## Forward-Mode Autodiff (ForwardDiff)
"""
145 μs
9×9 Matrix{Float64}:
 -0.0254462     0.0750306    0.0256232   …   0.0128037     0.00625252   -0.0146315
  0.0392959    -0.0659637    0.00465413     -0.0378384    -0.00419778    0.0321942
  0.0107266    -0.0722348   -0.0137141      -0.0293705     0.000304564   0.0236944
  0.00138368   -0.015074    -0.0132784       0.0103984     0.0082165    -0.00174828
  0.0157866    -0.00764511   0.0334501      -0.000560729  -0.0247453    -0.00310108
  0.0181448    -0.0610458   -0.00610382  …  -0.0353692     0.0179075     0.0156312
 -0.0554795     0.156881    -0.0144352       0.0659222     0.00133675   -0.0340218
  0.0235991    -0.0512598    0.0203016      -0.0102284    -0.0143283     0.0191255
  0.000760144   0.00688254  -0.0554133       0.022568      0.01269      -0.0365086
ForwardDiff.gradient(A->det(A), A)
983 ms
9×9 adjoint(::Matrix{Float64}) with eltype Float64:
 -0.0254462     0.0750306    0.0256232   …   0.0128037     0.00625252   -0.0146315
  0.0392959    -0.0659637    0.00465413     -0.0378384    -0.00419778    0.0321942
  0.0107266    -0.0722348   -0.0137141      -0.0293705     0.000304564   0.0236944
  0.00138368   -0.015074    -0.0132784       0.0103984     0.0082165    -0.00174828
  0.0157866    -0.00764511   0.0334501      -0.000560729  -0.0247453    -0.00310108
  0.0181448    -0.0610458   -0.00610382  …  -0.0353692     0.0179075     0.0156312
 -0.0554795     0.156881    -0.0144352       0.0659222     0.00133675   -0.0340218
  0.0235991    -0.0512598    0.0203016      -0.0102284    -0.0143283     0.0191255
  0.000760144   0.00688254  -0.0554133       0.022568      0.01269      -0.0365086
cofactor(A)
39.2 μs

Reverse-Mode Autodiff (Zygote)

md"""
## Reverse-Mode Autodiff (Zygote)
"""
143 μs

Zygote does reverse-mode automatic differentiation.

md"""
Zygote does reverse-mode automatic differentiation.
"""
225 μs
Zygote.gradient(A->det(A), A )
7.2 s

Symbolic Demonstration

md"""
## Symbolic Demonstration
"""
143 μs

$$\begin{equation} \left[ \begin{array}{cc} d & - b \\ - c & a \\ \end{array} \right] \end{equation}$$

reshape(Symbolics.gradient(det(M),vec(M)), 2, 2)
616 ms

$$\begin{equation} \left[ \begin{array}{cc} d & - b \\ - c & a \\ \end{array} \right] \end{equation}$$

simplify.(adj(M'))
7.4 ms

Direct Proof

A direct proof, where you just differentiate the scalar with respect to every input, can be obtained as a simple consequence of the cofactor expansion, a.k.a. the Laplace expansion of the determinant along the $i$th row.

$\det(A) = A_{i1} C_{i1} + A_{i2}C_{i2} + \ldots + A_{in}C_{in}$

Recall that the determinant is an affine function of any single entry, with slope equal to the corresponding cofactor.

We then readily obtain $\frac{\partial \det(A)}{\partial A_{ij}}=C_{ij}$ from which we conclude $\nabla (\det A ) = C$.


Example:

md"""
# Direct Proof
A direct proof, where you just differentiate the scalar with respect to every input,
can be obtained as a simple consequence of the cofactor expansion, a.k.a. the [Laplace expansion](https://en.wikipedia.org/wiki/Laplace_expansion) of the determinant along the ``i``th row.

``\det(A) = A_{i1} C_{i1} + A_{i2}C_{i2} + \ldots + A_{in}C_{in}``

Recall that the determinant is an affine function of any single entry, with slope equal to the corresponding cofactor.

We then readily obtain
``\frac{\partial \det(A)}{\partial A_{ij}}=C_{ij}`` from which we conclude
``\nabla (\det A ) = C``.



$(br) **Example:**
"""
37.1 ms

$$\begin{equation} \left[ \begin{array}{ccc} 4 & 1 & 5 \\ a & 4 & 5 \\ 6 & 7 & 8 \\ \end{array} \right] \end{equation}$$

begin
	i, j = 2, 1
	Z = [Num(4) 1 5; 3 4 5; 6 7 8]
	Z[i, j] = a
	Z
end
7.5 ms

$$\begin{equation} 18 + 5 \left( -24 + 7 a \right) - 8 a \end{equation}$$

det(Z)
54.6 ms

$$\begin{equation} 27.0 \end{equation}$$

simplify( cofactor(Z)[i,j] )
2.2 s

Fancy Proof

Figure out the linearization near the identity $I$

$\det(I+dA)- 1 = \text{tr}(dA)$ (think of the $n!$ terms in the determinant and drop the higher-order terms)

md"""
# Fancy Proof
Figure out the linearization near the identity ``I`` $(br)

``\det(I+dA)- 1 = \text{tr}(dA)`` (think of the ``n!`` terms in the determinant and drop the higher-order terms) $(br)
"""
329 μs
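This linearization is easy to verify numerically with a small random perturbation (a sketch; the agreement is limited only by the dropped second-order terms):

```julia
using LinearAlgebra

# det(I + dA) - 1 ≈ tr(dA): in the n! expansion, the product of the diagonal
# entries contributes 1 + Σᵢ dAᵢᵢ; every other term is at least second order.
dA = 1e-8 * rand(5, 5)
det(I + dA) - 1, tr(dA)   # agree to roughly eight significant digits
```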

$\det(A+ A(A^{-1}dA)) - \det(A) = \det(A)\det(I+A^{-1}dA) -\det(A) =$ $\det(A) \text{tr}( A^{-1}dA) = \text{tr}(\det(A) A^{-1}dA)$

md"""
``\det(A+ A(A^{-1}dA)) - \det(A) = \det(A)\det(I+A^{-1}dA) -\det(A) = ``
``\det(A) \text{tr}( A^{-1}dA) =
\text{tr}(\det(A) A^{-1}dA)``
"""
168 μs

Application to derivative of the characteristic polynomial evaluated at $x$

$p(x) = \det(xI-A)$, a scalar function of $x$

Recall that $p(x)$ can be written in terms of the eigenvalues $\lambda_i$:

md"""
## Application to derivative of the characteristic polynomial evaluated at ``x``

``p(x) = \det(xI-A)``, a scalar function of ``x``

Recall that ``p(x)`` can be written in terms of the eigenvalues ``\lambda_i``:


"""
309 μs
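For instance, the factorization ``p(x) = \prod_i (x - \lambda_i)`` can be checked numerically (a sketch; `S` and `x0` are local names introduced here, and `S` is symmetric so the eigenvalues are real):

```julia
using LinearAlgebra

# p(x) = det(xI - S) = ∏ᵢ (x - λᵢ): the characteristic polynomial factors
# over the eigenvalues of S.
S = Symmetric(rand(4, 4))
x0 = 10.0
det(x0*I - S), prod(x0 .- eigvals(S))   # the two values agree
```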

Direct derivative (freshman calculus):

$\frac{d}{dx} \prod_i (x-\lambda_i) \ = \ \sum_i \prod_{j\ne i} (x-\lambda _j) = \prod_i (x-\lambda_i) \left\{ \sum_i (x-\lambda_i)^{-1} \right\}$ (as long as $x \ne \lambda_i$)


Perfectly good simple proof, but if you want to show off...

md"""
### Direct derivative (freshman calculus):

``\frac{d}{dx} \prod_i (x-\lambda_i) \ = \ \sum_i \prod_{j\ne i} (x-\lambda _j) =
\prod_i (x-\lambda_i) \left\{ \sum_i (x-\lambda_i)^{-1} \right\}``
(as long as ``x \ne \lambda_i``)

$(br)
Perfectly good simple proof, but if you want to show off...
"""
318 μs
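A central-difference check of this identity (a sketch; `λ`, `p`, `t0`, and `h` are local names introduced here):

```julia
using LinearAlgebra

# Check d/dx ∏ᵢ(x - λᵢ) = ∏ᵢ(x - λᵢ) · Σᵢ 1/(x - λᵢ) by central differences.
λ = eigvals(Symmetric(rand(4, 4)))
p(t) = prod(t .- λ)
t0, h = 10.0, 1e-6
(p(t0 + h) - p(t0 - h)) / 2h, p(t0) * sum(1 ./ (t0 .- λ))   # agree closely
```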

With our new technology:

$d(\det (xI-A))= \det(xI-A) \text{tr}( (xI-A)^{-1} d(xI-A) )$
$\hspace{1.2in}= \det(xI-A)\,\text{tr}((xI-A)^{-1})\,dx$

Note: $d(xI-A)= I\,dx$ when $A$ is constant, and
$\text{tr}(A\,dx) = \text{tr}(A)\,dx$ since $dx$ is a scalar.

md"""
### With our new technology:

``d(\det (xI-A))= \det(xI-A) \text{tr}( (xI-A)^{-1} d(xI-A) ) ``$(br)
`` \hspace{1.2in}= \det(xI-A)\,\text{tr}((xI-A)^{-1})\,dx``

Note: ``d(xI-A)= I\,dx`` when ``A`` is constant, and $(br) ``\text{tr}(A\,dx) = \text{tr}(A)\,dx`` since ``dx`` is a scalar.
"""
284 μs

Check:

md"""
### Check:
"""
118 μs
begin
	x = rand()
	ForwardDiff.derivative(x -> det(x*I - A), x),  # Automatic Differentiation
	det(x*I - A) * tr(inv(x*I - A))                # Analytic Answer
end
214 ms

Application: d(log(det(A)))

$d(\log(\det(A))) = \frac{d(\det(A))}{\det A} = \det(A^{-1}) d(\det(A)) = \text{tr}( A^{-1} dA)$

The logarithmic derivative shows up a lot in applied mathematics. For example, the key term in Newton's method, $f(x)/f'(x)$, may be written $1/ \left\{ \log f(x) \right\}'$.

md"""
## Application: d(log(det(A)))

``d(\log(\det(A))) = \frac{d(\det(A))}{\det A} = \det(A^{-1}) d(\det(A)) =
\text{tr}( A^{-1} dA) `` $(br)

The logarithmic derivative shows up a lot in applied mathematics. For example, the key term in Newton's method, ``f(x)/f'(x)``, may be written ``1/ \left\{ \log f(x) \right\}'``.
"""
311 μs
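A finite-difference check of this formula (a sketch; `A2` and `dA2` are local names, and the shift `4I` just keeps the random matrix well-conditioned with positive determinant):

```julia
using LinearAlgebra

# d(log det A) = tr(A⁻¹ dA): compare a finite difference of log∘det
# against the trace formula.
A2  = rand(4, 4) + 4I
dA2 = 1e-6 * rand(4, 4)
log(det(A2 + dA2)) - log(det(A2)), tr(inv(A2) * dA2)   # agree to ~6 digits
```

LinearAlgebra also exports `logdet`, which evaluates ``\log(\det A)`` more stably than composing `log` with `det`.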

Math: The Jacobian of the Inverse

$A^{-1}A= I \rightarrow d(A^{-1}A)=0=d(A^{-1})A + A^{-1}dA$ from the product rule
$\rightarrow d(A^{-1}) = -A^{-1} (dA) A^{-1}$, i.e. $\text{vec}(d(A^{-1})) = -(A^{-T} \otimes A^{-1})\,\text{vec}(dA)$

md"""
# Math: The Jacobian of the Inverse
``A^{-1}A= I \rightarrow d(A^{-1}A)=0=d(A^{-1})A + A^{-1}dA`` from the product rule $(br)
``\rightarrow d(A^{-1}) = -A^{-1} (dA) A^{-1}``, i.e. ``\text{vec}(d(A^{-1})) = -(A^{-T} \otimes A^{-1})\,\text{vec}(dA)``
"""
260 μs
16×16 Matrix{Float64}:
 -3.88325   -3.02397     5.69787    1.98243  …   1.39286    -2.62447   -0.913117
  1.94632    5.55259    -7.09678   -3.76388     -2.55755     3.26882    1.73367
 -2.595     -5.94282     3.09885    4.87364      2.7373     -1.42735   -2.24483
  1.78865   -0.190301    0.690036  -2.211        0.0876539  -0.317834   1.0184
 -3.02397   -2.35484     4.43706    1.54376     -0.148192    0.279228   0.0971501
  1.51564    4.32392    -5.52642   -2.93102  …   0.272108   -0.347783  -0.184452
 -2.02078   -4.6278      2.41315    3.79522     -0.291232    0.151862   0.238836
  ⋮                                          ⋱                          ⋮
  3.80763    8.71986    -4.54693   -7.15107  …   1.05601    -0.550652  -0.866024
 -2.62447    0.279228   -1.01248    3.24419      0.0338157  -0.122616   0.392885
  1.98243    1.54376    -2.9088    -1.01204     -1.72176     3.24419    1.12873
 -0.993608  -2.83463     3.62295    1.92149      3.16147    -4.04068   -2.14304
  1.32476    3.03385    -1.58199   -2.48803     -3.38366     1.76439    2.7749
 -0.913117   0.0971501  -0.352268   1.12873  …  -0.108352    0.392885  -1.25888
begin
	C = rand(4, 4)
	ForwardDiff.jacobian(A -> inv(A), C)
end
804 ms
16×16 Matrix{Float64}:
 -3.88325   -3.02397     5.69787    1.98243  …   1.39286    -2.62447   -0.913117
  1.94632    5.55259    -7.09678   -3.76388     -2.55755     3.26882    1.73367
 -2.595     -5.94282     3.09885    4.87364      2.7373     -1.42735   -2.24483
  1.78865   -0.190301    0.690036  -2.211        0.0876539  -0.317834   1.0184
 -3.02397   -2.35484     4.43706    1.54376     -0.148192    0.279228   0.0971501
  1.51564    4.32392    -5.52642   -2.93102  …   0.272108   -0.347783  -0.184452
 -2.02078   -4.6278      2.41315    3.79522     -0.291232    0.151862   0.238836
  ⋮                                          ⋱                          ⋮
  3.80763    8.71986    -4.54693   -7.15107  …   1.05601    -0.550652  -0.866024
 -2.62447    0.279228   -1.01248    3.24419      0.0338157  -0.122616   0.392885
  1.98243    1.54376    -2.9088    -1.01204     -1.72176     3.24419    1.12873
 -0.993608  -2.83463     3.62295    1.92149      3.16147    -4.04068   -2.14304
  1.32476    3.03385    -1.58199   -2.48803     -3.38366     1.76439    2.7749
 -0.913117   0.0971501  -0.352268   1.12873  …  -0.108352    0.392885  -1.25888
-kron(inv(C'),inv(C))
27.8 μs