
Warmup: Differentiating on the unit sphere in n dimensions

Geometrically, we all know that velocity vectors (equivalently tangents) on the sphere are orthogonal to radii. Our differentials say this algebraically:

A numerical check of $x^T dx$ for a point $x$ on the sphere and a tangent $dx$ gives

-9.676404091203052e-13

which is zero up to numerical error.

Since $x^T x = 1$, we have that $2x^T dx = d(1) = 0$, which says that at the point $x$ on the sphere (a radius, if you will), $dx$, the linearization of the constraint of moving along the sphere, satisfies $dx \perp x$ (dot product is 0).

This is our first example where we have seen the infinitesimal perturbation dx being constrained.

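One way to check this numerically (a minimal sketch; the curve, dimension, and step size below are arbitrary choices, not from the original notebook): take any curve $x(t)$ that stays on the unit sphere, difference it to get a tangent $dx$, and confirm that $x^T dx \approx 0$.

```julia
using LinearAlgebra

n  = 7
x₀ = normalize(randn(n))            # a point on the unit sphere
v  = randn(n)                       # an arbitrary perturbation direction
x(t) = normalize(x₀ + t*v)          # a curve that stays on the sphere, x(0) = x₀
h  = 1e-6
dx = (x(h) - x(-h)) / (2h)          # central-difference tangent at t = 0
dot(x₀, dx)                         # ≈ 0, up to the finite-difference error
```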

Special case: a circle

Let us consider simply the circle in the plane.

$$x = (\cos\theta, \sin\theta)$$
$$x^T dx = (\cos\theta, \sin\theta) \cdot (-\sin\theta, \cos\theta)\, d\theta = 0$$

We can think of $x$ as "extrinsic" coordinates, in that it is a vector in $\mathbb{R}^2$. On the other hand, $\theta$ is an "intrinsic" coordinate: every point on the circle is specified by one $\theta$.

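A one-line check of this computation (a small sketch; the angle is an arbitrary choice):

```julia
using LinearAlgebra

θ  = 2π * rand()
x  = [cos(θ), sin(θ)]               # extrinsic coordinates of a point on the circle
dx = [-sin(θ), cos(θ)]              # d/dθ of (cos θ, sin θ)
dot(x, dx)                          # 0 (up to roundoff)
```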

Suppose $A$ is symmetric. We then know that if we allow general $dx$ then
$d\left(\tfrac{1}{2}x^T A x\right) = (Ax)^T dx$, and we would conclude that $Ax$ is the gradient.
Now we wish to restrict to the sphere.

On the sphere

You may remember that $I - xx^T$ is a projection matrix (meaning that it is equal to its square and it is symmetric). Geometrically, the matrix removes components in the $x$ direction.
In particular, if $x^T dx = 0$, then $(I - xx^T)dx = dx$.

It follows that if $x^T dx = 0$ then $x^T A\, dx = x^T A (I - xx^T) dx = \left((I - xx^T)Ax\right)^T dx$, so that $(I - xx^T)Ax$ is the gradient of $\tfrac{1}{2}x^T A x$ on the sphere.

What did we just do?

To get the gradient we needed two things:

  • A linearization of the function that is correct on tangents and

  • A direction that is tangent (satisfies the linearized constraint)

Gradient of a general scalar function on the sphere:

$$df = g(x)^T dx = \left((I - xx^T)\, g(x)\right)^T dx$$

Project the unconstrained gradient to the sphere to get the constrained gradient. It is the direction of maximal increase on the sphere.

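As a sanity check, the projected gradient should reproduce the derivative of $f(x) = \tfrac12 x^T A x$ along any tangent direction on the sphere. A minimal sketch (random symmetric $A$, point, and tangent, all arbitrary choices):

```julia
using LinearAlgebra

n  = 8
B  = randn(n, n);  A = (B + B') / 2          # a random symmetric matrix
x  = normalize(randn(n))                     # a point on the sphere
v  = randn(n);     dx = v - (x'v) * x        # project v so that xᵀdx = 0 (a tangent)
g  = (I - x*x') * (A*x)                      # claimed gradient of ½ xᵀAx on the sphere

f(y) = 0.5 * y' * A * y
h  = 1e-6
# derivative of f along the sphere in direction dx vs. gᵀdx — the two should agree
(f(normalize(x + h*dx)) - f(normalize(x - h*dx))) / (2h), g'dx
```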

Differentiating n×n orthogonal matrices (the orthogonal group)

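One way to see the structure numerically (a sketch, not the notebook's own cell): take a smooth curve of orthogonal matrices $Q(t)$, approximate $dQ$ by finite differences, and look at $Q^T dQ$. The curve below, built from matrix exponentials of anti-symmetric matrices, is just one convenient choice.

```julia
using LinearAlgebra

n  = 5
B1, B2 = randn(n, n), randn(n, n)
K(t) = (B1 - B1') + t * (B2 - B2')     # a curve of anti-symmetric matrices
Q(t) = exp(K(t))                       # exp of anti-symmetric is orthogonal
h  = 1e-6
Q₀ = Q(0.0)
dQ = (Q(h) - Q(-h)) / (2h)             # finite-difference tangent at t = 0
Q₀' * dQ                               # anti-symmetric, up to the differencing error
```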
5×5 Matrix{Float64}:
 -2.33231e-6  -0.0615623   -0.138078    -0.316197    -0.586192
  0.0615663   -3.55588e-6  -0.471466     0.414144     0.559983
  0.138083     0.471459    -1.92691e-5   1.89733     -0.11238
  0.316198    -0.414156    -1.89733     -2.10233e-5   0.577319
  0.58619     -0.55998      0.112393    -0.577321    -5.01563e-6

Do you see the structure?

$Q^T dQ$ is anti-symmetric (sometimes called skew-symmetric).

(If $M = -M^T$, we say that $M$ is anti-symmetric. Note that all anti-symmetric matrices have zeros on the diagonal.)

Proof: The constraint of being orthogonal is $Q^T Q = I$, so differentiating, $Q^T dQ + dQ^T Q = 0$, which is the same as saying $(Q^T dQ) + (Q^T dQ)^T = 0$; but this is the equation for being anti-symmetric.


What is the dimension of the "surface" of orthogonal matrices in the $n^2$-dimensional space of n by n matrices?

For example, when $n=2$ we have rotations (and reflections). Rotations have the form
$$Q = \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix}$$

When $n=2$ we have one parameter.

When $n=3$, airplane pilots know about "roll, pitch, and yaw", and these are three parameters.

For general $n$ the answer is $n(n-1)/2$.

A few ways to see that:

  • $n^2$ free parameters; orthogonality $Q^T Q = I$ imposes $n(n+1)/2$ constraints, leaving $n(n-1)/2$ free parameters.

  • When we do $QR$, the $R$ "eats" up $n(n+1)/2$ parameters, leaving $n(n-1)/2$ for $Q$.

  • Think about the symmetric eigenvalue problem $S = Q\Lambda Q^T$: $S$ has $n(n+1)/2$ parameters and $\Lambda$ has $n$, leaving $n(n-1)/2$ for $Q$.

  • Think about the singular value decomposition $A = U\Sigma V^T$: $A$ has $n^2$ parameters and $\Sigma$ has $n$, leaving $n(n-1)$ to be split evenly between the orthogonal matrices $U$ and $V$.

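The first counting argument can also be checked numerically: at $Q = I$ the constraint $Q^T Q = I$ linearizes to $dQ + dQ^T = 0$, and the dimension of its solution space (the anti-symmetric matrices) is $n(n-1)/2$. A small sketch of that null-space count:

```julia
using LinearAlgebra

n = 4
# build the linear map dQ ↦ dQ + dQᵀ (the constraint linearized at Q = I) acting on vec(dQ)
C = zeros(n^2, n^2)
for k in 1:n^2
    dQ = zeros(n, n); dQ[k] = 1.0
    C[:, k] = vec(dQ + dQ')
end
n^2 - rank(C)       # nullity = dimension of the tangent space = n(n-1)/2 (6 when n = 4)
```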

Differentiating the Symmetric Eigendecomposition


$S = Q\Lambda Q^T$ is the eigendecomposition of a symmetric matrix $S$, with $\Lambda$ diagonal containing the eigenvalues and $Q$ orthogonal with the eigenvectors as its columns.

$$dS = dQ\,\Lambda\,Q^T + Q\,d\Lambda\,Q^T + Q\,\Lambda\,dQ^T,$$ which may be written $$Q^T dS\, Q = Q^T dQ\,\Lambda - \Lambda\, Q^T dQ + d\Lambda.$$


Exercise: Check that the left and right sides of the above are both symmetric.

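A quick numerical sketch of the exercise: the left side $Q^T dS\, Q$ is symmetric because $dS$ is, and on the right side, with $A$ standing in for the anti-symmetric $Q^T dQ$ and $\Lambda$, $d\Lambda$ diagonal, $A\Lambda - \Lambda A + d\Lambda$ comes out symmetric as well (all matrices below are random stand-ins):

```julia
using LinearAlgebra

n  = 5
B  = randn(n, n)
A  = B - B'                     # stands in for the anti-symmetric QᵀdQ
Λ  = Diagonal(randn(n))         # eigenvalues
dΛ = Diagonal(randn(n))         # a diagonal perturbation of the eigenvalues
R  = A*Λ - Λ*A + dΛ             # right-hand side of the identity above
issymmetric(R)                  # true: (AΛ - ΛA)ᵀ = ΛᵀAᵀ - AᵀΛᵀ = -ΛA + AΛ
```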
10×5 Matrix{Float64}:
 -1.12888e-6  -4.89005e-6   3.80776e-5  -1.75771e-5  -9.48582e-6
 -4.89005e-6   6.05826e-6  -4.70447e-6   2.28823e-6  -2.71523e-5
  3.80776e-5  -4.70447e-6   1.86158e-5  -2.90847e-5  -1.08122e-5
 -1.75771e-5   2.28823e-6  -2.90847e-5   1.56049e-5  -8.56316e-6
 -9.48582e-6  -2.71523e-5  -1.08122e-5  -8.56316e-6   1.41283e-6
 -1.12918e-6  -4.89002e-6   3.80764e-5   1.75793e-5  -9.48591e-6
 -4.89003e-6   6.05814e-6  -4.70462e-6  -2.28813e-6  -2.71523e-5
  3.80774e-5  -4.70459e-6   1.86143e-5   2.90851e-5  -1.0812e-5
 -1.75769e-5   2.2882e-6   -2.9085e-5    1.56065e-5  -8.56307e-6
 -9.48584e-6  -2.71523e-5  -1.08128e-5   8.56275e-6   1.41319e-6

It may be easier to look at the diagonal entries on their own:

$(Q^T dS\, Q)_{ii} = q_i^T\, dS\, q_i$, where $q_i$ is the $i$th eigenvector.
Hence $q_i^T\, dS\, q_i = d\lambda_i$.

Sometimes we think of a curve of matrices $S(t)$ depending on a parameter such as time. If we ask for $\frac{d\lambda_i}{dt}$, we have that it equals $q_i^T \frac{dS(t)}{dt} q_i$.

How do we get the gradient $\nabla\lambda_i$ of one eigenvalue $\lambda_i$?

$\operatorname{trace}\!\left((q_i q_i^T)^T dS\right) = d\lambda_i$; thus we instantly see that $\nabla\lambda_i = q_i q_i^T$.

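A minimal numerical sketch of $\frac{d\lambda_i}{dt} = q_i^T \frac{dS}{dt}\, q_i$ along a line $S(t) = S_0 + tE$ (random $S_0$ and $E$, central differences for the left-hand side):

```julia
using LinearAlgebra

n  = 6
S₀ = Symmetric(randn(n, n))
E  = Symmetric(randn(n, n))
λ, Q = eigen(S₀)                                              # eigenvalues come back sorted
h  = 1e-6
dλ_fd      = (eigvals(S₀ + h*E) - eigvals(S₀ - h*E)) / (2h)   # finite-difference dλ/dt
dλ_formula = [Q[:, i]' * E * Q[:, i] for i in 1:n]            # qᵢᵀ (dS/dt) qᵢ
maximum(abs.(dλ_fd - dλ_formula))                             # small (finite-difference error)
```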

What about the eigenvectors? Those come from the off-diagonal elements:
$\left(Q^T \frac{dS}{dt} Q\right)_{ij} = \left(Q^T \frac{dQ}{dt}\right)_{ij} (\lambda_j - \lambda_i)$, if $i \ne j$, so we can form the elements of $Q^T \frac{dQ}{dt}$ (remember the diagonal is 0), and left-multiply by $Q$ to obtain $\frac{dQ}{dt}$.

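And a sketch of recovering $\frac{dQ}{dt}$ from the off-diagonal formula along the same kind of line $S(t) = S_0 + tE$; the computed eigenvector signs are aligned to those of $Q$ before differencing, since eigenvector signs are otherwise arbitrary (the check assumes well-separated eigenvalues):

```julia
using LinearAlgebra

n  = 5
S₀ = Symmetric(randn(n, n))
E  = Symmetric(randn(n, n))
λ, Q = eigen(S₀)
M  = Q' * E * Q                                            # Qᵀ (dS/dt) Q
A  = [i == j ? 0.0 : M[i, j] / (λ[j] - λ[i]) for i in 1:n, j in 1:n]   # Qᵀ dQ/dt
dQ_formula = Q * A
# finite-difference check, with eigenvector signs aligned to those of Q
align(Qt) = Qt * Diagonal(sign.(diag(Q' * Qt)))
h  = 1e-5
dQ_fd = (align(eigen(S₀ + h*E).vectors) - align(eigen(S₀ - h*E).vectors)) / (2h)
maximum(abs.(dQ_fd - dQ_formula))                          # small (finite-difference error)
```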

It is interesting to get the second derivative of eigenvalues when moving along a line in symmetric matrix space. For simplicity we'll start at a diagonal matrix $\Lambda$.

Let $S(t) = \Lambda + tE$.

Differentiating $\frac{d\Lambda}{dt} = \operatorname{diag}\!\left(Q^T \frac{dS}{dt} Q\right)$, we get $\frac{d^2\Lambda}{dt^2} = \operatorname{diag}\!\left(Q^T \frac{d^2 S}{dt^2} Q\right) + 2\,\operatorname{diag}\!\left(Q^T \frac{dS}{dt} \frac{dQ}{dt}\right)$.


Evaluating at $Q = I$ and recognizing that the first term is 0 since we are on a line, we have

$$\frac{d^2\Lambda}{dt^2} = 2\,\operatorname{diag}\!\left(E\,\frac{dQ}{dt}\right)$$

or

$$\frac{d^2\lambda_i}{dt^2} = 2\sum_{k \ne i} E_{ik}^2 / (\lambda_i - \lambda_k).$$


We can write this as a Taylor series.

$$\lambda_i(\epsilon) = \lambda_i + \epsilon E_{ii} + \epsilon^2 \sum_{k \ne i} E_{ik}^2 / (\lambda_i - \lambda_k) + \cdots$$

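A numerical sketch of this expansion: starting from a diagonal $\Lambda$ with (generically) well-separated entries and a small $\epsilon$, the second-order series should match the exact eigenvalues to roughly $O(\epsilon^3)$ (the sizes below are arbitrary choices):

```julia
using LinearAlgebra

n = 6
Λ = Diagonal(10 .* randn(n))                 # diagonal starting matrix
E = Symmetric(randn(n, n))                   # direction of the perturbation
ϵ = 1e-3
λ_exact  = eigvals(Symmetric(Matrix(Λ) + ϵ * Matrix(E)))
λ_series = [Λ[i, i] + ϵ*E[i, i] +
            ϵ^2 * sum(E[i, k]^2 / (Λ[i, i] - Λ[k, k]) for k in 1:n if k != i)
            for i in 1:n]
maximum(abs.(sort(λ_series) - λ_exact))      # roughly O(ϵ³)
```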