
using Symbolics, LinearAlgebra, PlutoUI, ForwardDiff, Zygote
67.1 s
TableOfContents(title="Differentiating the Determinant and the Inverse", indent=true, depth=4, aside=true)
41.8 μs

Math: The gradient of the determinant

md"""
# Math: The gradient of the determinant
"""
68.0 ms

Theorem: $\nabla(\det A) = \text{cofactor}(A) = (\det A)A^{-T} = \text{adj}(A^T)$
$d(\det A) = \text{tr}(\det(A)A^{-1}\,dA) = \text{tr}(\text{adj}(A)\,dA) = \text{tr}(\text{cofactor}(A)^T dA)$
Note: the adjugate is defined as $(\det A)A^{-1}$; it is the transpose of the cofactor matrix.

md"""
Theorem:
``\nabla(\det A) = \text{cofactor}(A) = (\det A)A^{-T} = \text{adj}(A^T)`` $(br)
``d(\det A) = \text{tr}(\det(A)A^{-1}\,dA) = \text{tr}(\text{adj}(A)\,dA) = \text{tr}(\text{cofactor}(A)^T dA)`` $(br)
Note: the adjugate is defined as ``(\det A)A^{-1}``; it is the transpose of the cofactor matrix.
"""
350 ms
adj (generic function with 1 method)
adj(A) = det(A) * inv(A) # adjugate function
286 μs
cofactor (generic function with 1 method)
cofactor(A) = adj(A)'
11.9 ms

Formulas at a glance:
$A^{-1}$ = adj($A$)/det($A$) = cofactor($A$)$^T$/det($A$)
adj($A$) = det($A$)$A^{-1}$ = cofactor($A$)$^T$
cofactor($A$) = det($A$)$A^{-T}$ = adj($A$)$^{T}$

md"""
**Formulas at a glance:** $(br)
``A^{-1}`` = adj(``A``)/det(``A``) = cofactor(``A``)``^T``/det(``A``) $(br)
adj(``A``) = det(``A``)``A^{-1}`` = cofactor(``A``)``^T`` $(br)
cofactor(``A``) = det(``A``)``A^{-T}`` = adj(``A``)``^{T}``
"""
4.2 ms
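A quick numeric spot-check of the last identity, with `adj` and `cofactor` re-declared exactly as in the cells above so the snippet is self-contained:

```julia
using LinearAlgebra

adj(A) = det(A) * inv(A)   # adjugate, as defined above
cofactor(A) = adj(A)'      # cofactor is the transpose of the adjugate

B = rand(3, 3)
cofactor(B) ≈ det(B) * inv(B)'   # cofactor(A) = det(A)·A⁻ᵀ
inv(B) ≈ adj(B) / det(B)         # A⁻¹ = adj(A)/det(A)
```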

The adjugate is the transpose of the cofactor matrix. You may remember that the cofactor matrix has as its $(i,j)$ entry $(-1)^{i+j}$ times the determinant of the submatrix obtained by deleting row $i$ and column $j$.

md"""
The adjugate is the transpose of the cofactor matrix.
You may remember that the cofactor matrix has as its ``(i,j)`` entry ``(-1)^{i+j}`` times the determinant of the submatrix obtained by deleting row ``i`` and column ``j``.
"""
209 μs
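As a sanity check, here is a minimal sketch that builds the cofactor matrix directly from this minor-based definition (the helper `cofactor_from_minors` is hypothetical, not part of this notebook) and agrees with `adj(A)'`:

```julia
using LinearAlgebra

# Build the cofactor matrix straight from the definition: the (i,j) entry is
# (-1)^(i+j) times the minor, i.e. the determinant of A with row i and
# column j deleted. (Hypothetical helper, not defined in the notebook.)
function cofactor_from_minors(A)
    n = size(A, 1)
    [(-1)^(i + j) * det(A[1:n .!= i, 1:n .!= j]) for i in 1:n, j in 1:n]
end

B = [1.0 2.0; 3.0 4.0]
cofactor_from_minors(B)   # [4 -3; -2 1], equal to det(B) * inv(B)'
```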
@variables a b c d
22.1 ms
M

$$\begin{equation} \left[ \begin{array}{cc} a & c \\ b & d \\ \end{array} \right] \end{equation}$$

M = [ a c;b d]
18.3 μs

$$\begin{equation} \left[ \begin{array}{cc} d & - b \\ - c & a \\ \end{array} \right] \end{equation}$$

simplify.(cofactor(M))
10.6 s

$$\begin{equation} \left[ \begin{array}{cc} d & - c \\ - b & a \\ \end{array} \right] \end{equation}$$

simplify.(adj( M))
44.3 ms

$$\begin{equation} \left[ \begin{array}{cc} \frac{d}{a d - b c} & \frac{ - c}{a d - b c} \\ \frac{ - b}{a d - b c} & \frac{a}{a d - b c} \\ \end{array} \right] \end{equation}$$

simplify.(inv(M))
369 ms

Numerical Demonstration

md"""
## Numerical Demonstration
"""
3.7 ms
9×9 Matrix{Float64}:
 9.84997e-6  3.27962e-6  3.68515e-6  …  1.31594e-6  8.59009e-6  5.59433e-6
 6.58166e-6  9.49231e-6  7.67381e-6     8.88176e-6  8.25997e-7  7.52597e-6
 6.70657e-6  2.578e-6    8.35539e-6     1.98348e-6  8.75045e-6  6.93709e-6
 2.44452e-6  3.84697e-6  9.332e-6       7.35059e-6  1.4179e-6   2.84029e-6
 1.90982e-6  3.21393e-6  5.36157e-6     3.90574e-6  2.79935e-6  6.62397e-6
 5.20741e-6  9.23777e-7  7.61258e-6  …  8.47971e-6  5.19634e-6  1.73769e-6
 3.78462e-6  5.52636e-7  9.17756e-6     8.67793e-6  5.85933e-6  8.39804e-6
 9.97994e-6  7.38427e-6  8.06906e-6     5.13478e-6  9.34815e-6  2.17865e-6
 7.46987e-6  6.87991e-6  6.53469e-6     3.4723e-6   4.03916e-6  9.32723e-6
begin
	n = 9
	A = rand(n, n); dA = rand(n, n) * 1e-5
end
56.4 μs
det(A+dA) - det(A), tr( adj(A) * dA )
147 μs

Forward-Mode Autodiff (ForwardDiff)

md"""
## Forward-Mode Autodiff (ForwardDiff)
"""
145 μs
9×9 Matrix{Float64}:
 -0.0254462     0.0750306    0.0256232   …   0.0128037     0.00625252   -0.0146315
  0.0392959    -0.0659637    0.00465413     -0.0378384    -0.00419778    0.0321942
  0.0107266    -0.0722348   -0.0137141      -0.0293705     0.000304564   0.0236944
  0.00138368   -0.015074    -0.0132784       0.0103984     0.0082165    -0.00174828
  0.0157866    -0.00764511   0.0334501      -0.000560729  -0.0247453    -0.00310108
  0.0181448    -0.0610458   -0.00610382  …  -0.0353692     0.0179075     0.0156312
 -0.0554795     0.156881    -0.0144352       0.0659222     0.00133675   -0.0340218
  0.0235991    -0.0512598    0.0203016      -0.0102284    -0.0143283     0.0191255
  0.000760144   0.00688254  -0.0554133       0.022568      0.01269      -0.0365086
ForwardDiff.gradient(A->det(A), A)
983 ms
9×9 adjoint(::Matrix{Float64}) with eltype Float64:
 -0.0254462     0.0750306    0.0256232   …   0.0128037     0.00625252   -0.0146315
  0.0392959    -0.0659637    0.00465413     -0.0378384    -0.00419778    0.0321942
  0.0107266    -0.0722348   -0.0137141      -0.0293705     0.000304564   0.0236944
  0.00138368   -0.015074    -0.0132784       0.0103984     0.0082165    -0.00174828
  0.0157866    -0.00764511   0.0334501      -0.000560729  -0.0247453    -0.00310108
  0.0181448    -0.0610458   -0.00610382  …  -0.0353692     0.0179075     0.0156312
 -0.0554795     0.156881    -0.0144352       0.0659222     0.00133675   -0.0340218
  0.0235991    -0.0512598    0.0203016      -0.0102284    -0.0143283     0.0191255
  0.000760144   0.00688254  -0.0554133       0.022568      0.01269      -0.0365086
cofactor(A)
39.2 μs

Reverse-Mode Autodiff (Zygote)

md"""
## Reverse-Mode Autodiff (Zygote)
"""
143 μs

Zygote does reverse-mode automatic differentiation.

md"""
Zygote does reverse-mode automatic differentiation.
"""
225 μs
Zygote.gradient(A->det(A), A )
7.2 s

Symbolic Demonstration

md"""
## Symbolic Demonstration
"""
143 μs

$$\begin{equation} \left[ \begin{array}{cc} d & - b \\ - c & a \\ \end{array} \right] \end{equation}$$

reshape(Symbolics.gradient(det(M),vec(M)), 2, 2)
616 ms

$$\begin{equation} \left[ \begin{array}{cc} d & - b \\ - c & a \\ \end{array} \right] \end{equation}$$

simplify.(adj(M'))
7.4 ms

Direct Proof

A direct proof, where you just differentiate the scalar with respect to every input, can be obtained as a simple consequence of the cofactor expansion, a.k.a. the Laplace expansion of the determinant along the $i$th row.

$\det(A) = A_{i1} C_{i1} + A_{i2}C_{i2} + \ldots + A_{in}C_{in}$

Recall that the determinant is an affine function of any single entry, with slope equal to the corresponding cofactor.

We then readily obtain $\frac{\partial \det(A)}{\partial A_{ij}}=C_{ij}$ from which we conclude $\nabla (\det A ) = C$.


Example:

md"""
# Direct Proof
A direct proof, where you just differentiate the scalar with respect to every input,
can be obtained as a simple consequence of the cofactor expansion, a.k.a. the [Laplace expansion](https://en.wikipedia.org/wiki/Laplace_expansion) of the determinant along the ``i``th row.

``\det(A) = A_{i1} C_{i1} + A_{i2}C_{i2} + \ldots + A_{in}C_{in}``

Recall that the determinant is an affine function of any single entry, with slope equal to the corresponding cofactor.

We then readily obtain
``\frac{\partial \det(A)}{\partial A_{ij}}=C_{ij}`` from which we conclude
``\nabla (\det A ) = C``.



$(br) **Example:**
"""
37.1 ms

$$\begin{equation} \left[ \begin{array}{ccc} 4 & 1 & 5 \\ a & 4 & 5 \\ 6 & 7 & 8 \\ \end{array} \right] \end{equation}$$

begin
	i, j = 2, 1
	Z = [Num(4) 1 5; 3 4 5; 6 7 8]
	Z[i, j] = a
	Z
end
7.5 ms

$$\begin{equation} 18 + 5 \left( -24 + 7 a \right) - 8 a \end{equation}$$

det(Z)
54.6 ms

$$\begin{equation} 27.0 \end{equation}$$

simplify( cofactor(Z)[i,j] )
2.2 s

Fancy Proof

Figure out the linearization near the identity $I$

$\det(I+dA)- 1 = \text{tr}(dA)$ (think of the $n!$ terms in the determinant and drop the higher-order terms)

md"""
# Fancy Proof
Figure out the linearization near the identity ``I`` $(br)

``\det(I+dA)- 1 = \text{tr}(dA)`` (think of the ``n!`` terms in the determinant and drop the higher-order terms) $(br)
"""
329 μs
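This linearization is easy to verify numerically with a small random perturbation (a sketch; the agreement is limited only by the dropped second-order terms):

```julia
using LinearAlgebra

# det(I + dA) - 1 ≈ tr(dA): in the n! expansion, the product of the diagonal
# entries contributes 1 + Σᵢ dAᵢᵢ; every other term is at least second order.
dA = 1e-8 * rand(5, 5)
det(I + dA) - 1, tr(dA)   # agree to roughly eight significant digits
```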

$\det(A+ A(A^{-1}dA)) - \det(A) = \det(A)\det(I+A^{-1}dA) -\det(A) =$ $\det(A) \text{tr}( A^{-1}dA) = \text{tr}(\det(A) A^{-1}dA)$

md"""
``\det(A+ A(A^{-1}dA)) - \det(A) = \det(A)\det(I+A^{-1}dA) -\det(A) = ``
``\det(A) \text{tr}( A^{-1}dA) =
\text{tr}(\det(A) A^{-1}dA)``
"""
168 μs

Application to derivative of the characteristic polynomial evaluated at $x$

$p(x) = \det(xI-A)$, a scalar function of $x$

Recall that $p(x)$ can be written in terms of the eigenvalues $\lambda_i$:

md"""
## Application to derivative of the characteristic polynomial evaluated at ``x``

``p(x) = \det(xI-A)``, a scalar function of ``x``

Recall that ``p(x)`` can be written in terms of the eigenvalues ``\lambda_i``:


"""
309 μs
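For instance, the factorization ``p(x) = \prod_i (x - \lambda_i)`` can be checked numerically (a sketch; `S` and `x0` are local names introduced here, and `S` is symmetric so the eigenvalues are real):

```julia
using LinearAlgebra

# p(x) = det(xI - S) = ∏ᵢ (x - λᵢ): the characteristic polynomial factors
# over the eigenvalues of S.
S = Symmetric(rand(4, 4))
x0 = 10.0
det(x0*I - S), prod(x0 .- eigvals(S))   # the two values agree
```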

Direct derivative (freshman calculus):

$\frac{d}{dx} \prod_i (x-\lambda_i) \ = \ \sum_i \prod_{j\ne i} (x-\lambda _j) = \prod_i (x-\lambda_i) \left\{ \sum_i (x-\lambda_i)^{-1} \right\}$ (as long as $x \ne \lambda_i$)


Perfectly good simple proof, but if you want to show off...

md"""
### Direct derivative (freshman calculus):

``\frac{d}{dx} \prod_i (x-\lambda_i) \ = \ \sum_i \prod_{j\ne i} (x-\lambda _j) =
\prod_i (x-\lambda_i) \left\{ \sum_i (x-\lambda_i)^{-1} \right\}``
(as long as ``x \ne \lambda_i``)

$(br)
Perfectly good simple proof, but if you want to show off...
"""
318 μs
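A central-difference check of this identity (a sketch; `λ`, `p`, `t0`, and `h` are local names introduced here):

```julia
using LinearAlgebra

# Check d/dx ∏ᵢ(x - λᵢ) = ∏ᵢ(x - λᵢ) · Σᵢ 1/(x - λᵢ) by central differences.
λ = eigvals(Symmetric(rand(4, 4)))
p(t) = prod(t .- λ)
t0, h = 10.0, 1e-6
(p(t0 + h) - p(t0 - h)) / 2h, p(t0) * sum(1 ./ (t0 .- λ))   # agree closely
```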

With our new technology:

$d(\det (xI-A))= \det(xI-A) \text{tr}( (xI-A)^{-1} d(xI-A) )$
$\hspace{1.2in}= \det(xI-A)\,\text{tr}((xI-A)^{-1})\,dx$

Note: $d(xI-A)= I\,dx$ when $A$ is constant, and
$\text{tr}(A\,dx) = \text{tr}(A)\,dx$ since $dx$ is a scalar.

md"""
### With our new technology:

``d(\det (xI-A))= \det(xI-A) \text{tr}( (xI-A)^{-1} d(xI-A) ) ``$(br)
`` \hspace{1.2in}= \det(xI-A)\,\text{tr}((xI-A)^{-1})\,dx``

Note: ``d(xI-A)= I\,dx`` when ``A`` is constant, and $(br) ``\text{tr}(A\,dx) = \text{tr}(A)\,dx`` since ``dx`` is a scalar.
"""
284 μs

Check:

md"""
### Check:
"""
118 μs
begin
	x = rand()
	ForwardDiff.derivative(x -> det(x*I - A), x),  # Automatic Differentiation
	det(x*I - A) * tr(inv(x*I - A))                # Analytic Answer
end
214 ms

Application: d(log(det(A)))

$d(\log(\det(A))) = \frac{d(\det(A))}{\det A} = \det(A^{-1}) d(\det(A)) = \text{tr}( A^{-1} dA)$

The logarithmic derivative shows up a lot in applied mathematics. For example, the key term in Newton's method, $f(x)/f'(x)$, may be written $1/ \left\{ \log f(x) \right\}'$.

md"""
## Application: d(log(det(A)))

``d(\log(\det(A))) = \frac{d(\det(A))}{\det A} = \det(A^{-1}) d(\det(A)) =
\text{tr}( A^{-1} dA) `` $(br)

The logarithmic derivative shows up a lot in applied mathematics. For example, the key term in Newton's method, ``f(x)/f'(x)``, may be written ``1/ \left\{ \log f(x) \right\}'``.
"""
311 μs
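A finite-difference check of this formula (a sketch; `A2` and `dA2` are local names, and the shift `4I` just keeps the random matrix well-conditioned with positive determinant):

```julia
using LinearAlgebra

# d(log det A) = tr(A⁻¹ dA): compare a finite difference of log∘det
# against the trace formula.
A2  = rand(4, 4) + 4I
dA2 = 1e-6 * rand(4, 4)
log(det(A2 + dA2)) - log(det(A2)), tr(inv(A2) * dA2)   # agree to ~6 digits
```

LinearAlgebra also exports `logdet`, which evaluates ``\log(\det A)`` more stably than composing `log` with `det`.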

Math: The Jacobian of the Inverse

$A^{-1}A= I \rightarrow d(A^{-1}A)=0=d(A^{-1})A + A^{-1}dA$ from the product rule
$\rightarrow d(A^{-1}) = -A^{-1} (dA) A^{-1}$, i.e. $\text{vec}(d(A^{-1})) = -(A^{-T} \otimes A^{-1})\,\text{vec}(dA)$

md"""
# Math: The Jacobian of the Inverse
``A^{-1}A= I \rightarrow d(A^{-1}A)=0=d(A^{-1})A + A^{-1}dA`` from the product rule $(br)
``\rightarrow d(A^{-1}) = -A^{-1} (dA) A^{-1}``, i.e. ``\text{vec}(d(A^{-1})) = -(A^{-T} \otimes A^{-1})\,\text{vec}(dA)``
"""
260 μs
16×16 Matrix{Float64}:
 -3.88325   -3.02397     5.69787    1.98243  …   1.39286    -2.62447   -0.913117
  1.94632    5.55259    -7.09678   -3.76388     -2.55755     3.26882    1.73367
 -2.595     -5.94282     3.09885    4.87364      2.7373     -1.42735   -2.24483
  1.78865   -0.190301    0.690036  -2.211        0.0876539  -0.317834   1.0184
 -3.02397   -2.35484     4.43706    1.54376     -0.148192    0.279228   0.0971501
  1.51564    4.32392    -5.52642   -2.93102  …   0.272108   -0.347783  -0.184452
 -2.02078   -4.6278      2.41315    3.79522     -0.291232    0.151862   0.238836
  ⋮                                          ⋱                          ⋮
  3.80763    8.71986    -4.54693   -7.15107  …   1.05601    -0.550652  -0.866024
 -2.62447    0.279228   -1.01248    3.24419      0.0338157  -0.122616   0.392885
  1.98243    1.54376    -2.9088    -1.01204     -1.72176     3.24419    1.12873
 -0.993608  -2.83463     3.62295    1.92149      3.16147    -4.04068   -2.14304
  1.32476    3.03385    -1.58199   -2.48803     -3.38366     1.76439    2.7749
 -0.913117   0.0971501  -0.352268   1.12873  …  -0.108352    0.392885  -1.25888
begin
	C = rand(4, 4)
	ForwardDiff.jacobian(A -> inv(A), C)
end
804 ms
16×16 Matrix{Float64}:
 -3.88325   -3.02397     5.69787    1.98243  …   1.39286    -2.62447   -0.913117
  1.94632    5.55259    -7.09678   -3.76388     -2.55755     3.26882    1.73367
 -2.595     -5.94282     3.09885    4.87364      2.7373     -1.42735   -2.24483
  1.78865   -0.190301    0.690036  -2.211        0.0876539  -0.317834   1.0184
 -3.02397   -2.35484     4.43706    1.54376     -0.148192    0.279228   0.0971501
  1.51564    4.32392    -5.52642   -2.93102  …   0.272108   -0.347783  -0.184452
 -2.02078   -4.6278      2.41315    3.79522     -0.291232    0.151862   0.238836
  ⋮                                          ⋱                          ⋮
  3.80763    8.71986    -4.54693   -7.15107  …   1.05601    -0.550652  -0.866024
 -2.62447    0.279228   -1.01248    3.24419      0.0338157  -0.122616   0.392885
  1.98243    1.54376    -2.9088    -1.01204     -1.72176     3.24419    1.12873
 -0.993608  -2.83463     3.62295    1.92149      3.16147    -4.04068   -2.14304
  1.32476    3.03385    -1.58199   -2.48803     -3.38366     1.76439    2.7749
 -0.913117   0.0971501  -0.352268   1.12873  …  -0.108352    0.392885  -1.25888
-kron(inv(C'),inv(C))
27.8 μs