
Warmup: Differentiating on the unit sphere in n dimensions

Geometrically, we all know that velocity vectors (equivalently tangents) on the sphere are orthogonal to radii. Our differentials say this algebraically:

A numerical check of $x^T dx$ for a point $x$ on the sphere and a tangent $dx$ gives

-9.676404091203052e-13

which is zero up to numerical error.

Since $x^T x = 1$, we have that $2x^T dx = d(1) = 0$, which says that at the point $x$ on the sphere (a radius, if you will), $dx$, the linearization of the constraint of moving along the sphere, satisfies $dx \perp x$ (dot product is 0).

This is our first example where we have seen the infinitesimal perturbation dx being constrained.

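One way to check this numerically (a minimal sketch; the curve, dimension, and step size below are arbitrary choices, not from the original notebook): take any curve $x(t)$ that stays on the unit sphere, difference it to get a tangent $dx$, and confirm that $x^T dx \approx 0$.

```julia
using LinearAlgebra

n  = 7
x₀ = normalize(randn(n))            # a point on the unit sphere
v  = randn(n)                       # an arbitrary perturbation direction
x(t) = normalize(x₀ + t*v)          # a curve that stays on the sphere, x(0) = x₀
h  = 1e-6
dx = (x(h) - x(-h)) / (2h)          # central-difference tangent at t = 0
dot(x₀, dx)                         # ≈ 0, up to the finite-difference error
```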

Special case: a circle

Let us consider simply the circle in the plane.

$$x = (\cos\theta, \sin\theta)$$
$$x^T dx = (\cos\theta, \sin\theta) \cdot (-\sin\theta, \cos\theta)\, d\theta = 0$$

We can think of $x$ as "extrinsic" coordinates, in that it is a vector in $\mathbb{R}^2$. On the other hand, $\theta$ is an "intrinsic" coordinate: every point on the circle is specified by one $\theta$.

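A one-line check of this computation (a small sketch; the angle is an arbitrary choice):

```julia
using LinearAlgebra

θ  = 2π * rand()
x  = [cos(θ), sin(θ)]               # extrinsic coordinates of a point on the circle
dx = [-sin(θ), cos(θ)]              # d/dθ of (cos θ, sin θ)
dot(x, dx)                          # 0 (up to roundoff)
```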

Suppose $A$ is symmetric. We then know that if we allow general $dx$ then
$d\left(\tfrac{1}{2}x^T A x\right) = (Ax)^T dx$, and we would conclude that $Ax$ is the gradient.
Now we wish to restrict to the sphere.

On the sphere

You may remember that $I - xx^T$ is a projection matrix (meaning that it is equal to its square and it is symmetric). Geometrically, the matrix removes components in the $x$ direction.
In particular, if $x^T dx = 0$, then $(I - xx^T)dx = dx$.

It follows that if $x^T dx = 0$ then $x^T A\, dx = x^T A (I - xx^T) dx = \left((I - xx^T)Ax\right)^T dx$, so that $(I - xx^T)Ax$ is the gradient of $\tfrac{1}{2}x^T A x$ on the sphere.

What did we just do?

To get the gradient we needed two things:

  • A linearization of the function that is correct on tangents and

  • A direction that is tangent (satisfies the linearized constraint)

Gradient of a general scalar function on the sphere:

$$df = g(x)^T dx = \left((I - xx^T)\, g(x)\right)^T dx$$

Project the unconstrained gradient to the sphere to get the constrained gradient. It is the direction of maximal increase on the sphere.

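As a sanity check, the projected gradient should reproduce the derivative of $f(x) = \tfrac12 x^T A x$ along any tangent direction on the sphere. A minimal sketch (random symmetric $A$, point, and tangent, all arbitrary choices):

```julia
using LinearAlgebra

n  = 8
B  = randn(n, n);  A = (B + B') / 2          # a random symmetric matrix
x  = normalize(randn(n))                     # a point on the sphere
v  = randn(n);     dx = v - (x'v) * x        # project v so that xᵀdx = 0 (a tangent)
g  = (I - x*x') * (A*x)                      # claimed gradient of ½ xᵀAx on the sphere

f(y) = 0.5 * y' * A * y
h  = 1e-6
# derivative of f along the sphere in direction dx vs. gᵀdx — the two should agree
(f(normalize(x + h*dx)) - f(normalize(x - h*dx))) / (2h), g'dx
```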

Differentiating n×n orthogonal matrices (the orthogonal group)

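One way to see the structure numerically (a sketch, not the notebook's own cell): take a smooth curve of orthogonal matrices $Q(t)$, approximate $dQ$ by finite differences, and look at $Q^T dQ$. The curve below, built from matrix exponentials of anti-symmetric matrices, is just one convenient choice.

```julia
using LinearAlgebra

n  = 5
B1, B2 = randn(n, n), randn(n, n)
K(t) = (B1 - B1') + t * (B2 - B2')     # a curve of anti-symmetric matrices
Q(t) = exp(K(t))                       # exp of anti-symmetric is orthogonal
h  = 1e-6
Q₀ = Q(0.0)
dQ = (Q(h) - Q(-h)) / (2h)             # finite-difference tangent at t = 0
Q₀' * dQ                               # anti-symmetric, up to the differencing error
```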
5×5 Matrix{Float64}:
 -2.33231e-6  -0.0615623   -0.138078    -0.316197    -0.586192
  0.0615663   -3.55588e-6  -0.471466     0.414144     0.559983
  0.138083     0.471459    -1.92691e-5   1.89733     -0.11238
  0.316198    -0.414156    -1.89733     -2.10233e-5   0.577319
  0.58619     -0.55998      0.112393    -0.577321    -5.01563e-6

Do you see the structure?

$Q^T dQ$ is anti-symmetric (sometimes called skew-symmetric).

(If $M = -M^T$, we say that $M$ is anti-symmetric. Note that all anti-symmetric matrices have zeros on the diagonal.)

Proof: The constraint of being orthogonal is $Q^T Q = I$, so differentiating, $Q^T dQ + dQ^T Q = 0$, which is the same as saying $(Q^T dQ) + (Q^T dQ)^T = 0$; but this is the equation for being anti-symmetric.


What is the dimension of the "surface" of orthogonal matrices in the $n^2$-dimensional space of n by n matrices?

For example, when $n=2$ we have rotations (and reflections). Rotations have the form
$$Q = \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix}$$

When $n=2$ we have one parameter.

When $n=3$, airplane pilots know about "roll, pitch, and yaw", and these are three parameters.

For general $n$ the answer is $n(n-1)/2$.

A few ways to see that:

  • $n^2$ free parameters; orthogonality $Q^T Q = I$ imposes $n(n+1)/2$ constraints, leaving $n(n-1)/2$ free parameters.

  • When we do $QR$, the $R$ "eats" up $n(n+1)/2$ parameters, leaving $n(n-1)/2$ for $Q$.

  • Think about the symmetric eigenvalue problem $S = Q\Lambda Q^T$: $S$ has $n(n+1)/2$ parameters and $\Lambda$ has $n$, leaving $n(n-1)/2$ for $Q$.

  • Think about the singular value decomposition $A = U\Sigma V^T$: $A$ has $n^2$ parameters and $\Sigma$ has $n$, leaving $n(n-1)$ to be split evenly between the orthogonal matrices $U$ and $V$.

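The first counting argument can also be checked numerically: at $Q = I$ the constraint $Q^T Q = I$ linearizes to $dQ + dQ^T = 0$, and the dimension of its solution space (the anti-symmetric matrices) is $n(n-1)/2$. A small sketch of that null-space count:

```julia
using LinearAlgebra

n = 4
# build the linear map dQ ↦ dQ + dQᵀ (the constraint linearized at Q = I) acting on vec(dQ)
C = zeros(n^2, n^2)
for k in 1:n^2
    dQ = zeros(n, n); dQ[k] = 1.0
    C[:, k] = vec(dQ + dQ')
end
n^2 - rank(C)       # nullity = dimension of the tangent space = n(n-1)/2 (6 when n = 4)
```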

Differentiating the Symmetric Eigendecomposition


$S = Q\Lambda Q^T$ is the eigendecomposition of a symmetric matrix $S$, with $\Lambda$ diagonal containing the eigenvalues and $Q$ orthogonal with the eigenvectors as its columns.

$$dS = dQ\,\Lambda\,Q^T + Q\,d\Lambda\,Q^T + Q\,\Lambda\,dQ^T,$$ which may be written $$Q^T dS\, Q = Q^T dQ\,\Lambda - \Lambda\, Q^T dQ + d\Lambda.$$


Exercise: Check that the left and right sides of the above are both symmetric.

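A quick numerical sketch of the exercise: the left side $Q^T dS\, Q$ is symmetric because $dS$ is, and on the right side, with $A$ standing in for the anti-symmetric $Q^T dQ$ and $\Lambda$, $d\Lambda$ diagonal, $A\Lambda - \Lambda A + d\Lambda$ comes out symmetric as well (all matrices below are random stand-ins):

```julia
using LinearAlgebra

n  = 5
B  = randn(n, n)
A  = B - B'                     # stands in for the anti-symmetric QᵀdQ
Λ  = Diagonal(randn(n))         # eigenvalues
dΛ = Diagonal(randn(n))         # a diagonal perturbation of the eigenvalues
R  = A*Λ - Λ*A + dΛ             # right-hand side of the identity above
issymmetric(R)                  # true: (AΛ - ΛA)ᵀ = ΛᵀAᵀ - AᵀΛᵀ = -ΛA + AΛ
```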
10×5 Matrix{Float64}:
 -1.12888e-6  -4.89005e-6   3.80776e-5  -1.75771e-5  -9.48582e-6
 -4.89005e-6   6.05826e-6  -4.70447e-6   2.28823e-6  -2.71523e-5
  3.80776e-5  -4.70447e-6   1.86158e-5  -2.90847e-5  -1.08122e-5
 -1.75771e-5   2.28823e-6  -2.90847e-5   1.56049e-5  -8.56316e-6
 -9.48582e-6  -2.71523e-5  -1.08122e-5  -8.56316e-6   1.41283e-6
 -1.12918e-6  -4.89002e-6   3.80764e-5   1.75793e-5  -9.48591e-6
 -4.89003e-6   6.05814e-6  -4.70462e-6  -2.28813e-6  -2.71523e-5
  3.80774e-5  -4.70459e-6   1.86143e-5   2.90851e-5  -1.0812e-5
 -1.75769e-5   2.2882e-6   -2.9085e-5    1.56065e-5  -8.56307e-6
 -9.48584e-6  -2.71523e-5  -1.08128e-5   8.56275e-6   1.41319e-6

It may be easier to look at the diagonal entries on their own:

$(Q^T dS\, Q)_{ii} = q_i^T\, dS\, q_i$, where $q_i$ is the $i$th eigenvector.
Hence $q_i^T\, dS\, q_i = d\lambda_i$.

Sometimes we think of a curve of matrices $S(t)$ depending on a parameter such as time. If we ask for $\frac{d\lambda_i}{dt}$, we have that it equals $q_i^T \frac{dS(t)}{dt} q_i$.

How do we get the gradient $\nabla\lambda_i$ of one eigenvalue $\lambda_i$?

$\operatorname{trace}\!\left((q_i q_i^T)^T dS\right) = d\lambda_i$; thus we instantly see that $\nabla\lambda_i = q_i q_i^T$.

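A minimal numerical sketch of $\frac{d\lambda_i}{dt} = q_i^T \frac{dS}{dt}\, q_i$ along a line $S(t) = S_0 + tE$ (random $S_0$ and $E$, central differences for the left-hand side):

```julia
using LinearAlgebra

n  = 6
S₀ = Symmetric(randn(n, n))
E  = Symmetric(randn(n, n))
λ, Q = eigen(S₀)                                              # eigenvalues come back sorted
h  = 1e-6
dλ_fd      = (eigvals(S₀ + h*E) - eigvals(S₀ - h*E)) / (2h)   # finite-difference dλ/dt
dλ_formula = [Q[:, i]' * E * Q[:, i] for i in 1:n]            # qᵢᵀ (dS/dt) qᵢ
maximum(abs.(dλ_fd - dλ_formula))                             # small (finite-difference error)
```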

What about the eigenvectors? Those come from the off-diagonal elements:
$\left(Q^T \frac{dS}{dt} Q\right)_{ij} = \left(Q^T \frac{dQ}{dt}\right)_{ij} (\lambda_j - \lambda_i)$, if $i \ne j$, so we can form the elements of $Q^T \frac{dQ}{dt}$ (remember the diagonal is 0), and left-multiply by $Q$ to obtain $\frac{dQ}{dt}$.

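And a sketch of recovering $\frac{dQ}{dt}$ from the off-diagonal formula along the same kind of line $S(t) = S_0 + tE$; the computed eigenvector signs are aligned to those of $Q$ before differencing, since eigenvector signs are otherwise arbitrary (the check assumes well-separated eigenvalues):

```julia
using LinearAlgebra

n  = 5
S₀ = Symmetric(randn(n, n))
E  = Symmetric(randn(n, n))
λ, Q = eigen(S₀)
M  = Q' * E * Q                                            # Qᵀ (dS/dt) Q
A  = [i == j ? 0.0 : M[i, j] / (λ[j] - λ[i]) for i in 1:n, j in 1:n]   # Qᵀ dQ/dt
dQ_formula = Q * A
# finite-difference check, with eigenvector signs aligned to those of Q
align(Qt) = Qt * Diagonal(sign.(diag(Q' * Qt)))
h  = 1e-5
dQ_fd = (align(eigen(S₀ + h*E).vectors) - align(eigen(S₀ - h*E).vectors)) / (2h)
maximum(abs.(dQ_fd - dQ_formula))                          # small (finite-difference error)
```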

It is interesting to get the second derivative of eigenvalues when moving along a line in symmetric matrix space. For simplicity we'll start at a diagonal matrix $\Lambda$.

Let $S(t) = \Lambda + tE$.

Differentiating $\frac{d\Lambda}{dt} = \operatorname{diag}\!\left(Q^T \frac{dS}{dt} Q\right)$, we get $\frac{d^2\Lambda}{dt^2} = \operatorname{diag}\!\left(Q^T \frac{d^2 S}{dt^2} Q\right) + 2\,\operatorname{diag}\!\left(Q^T \frac{dS}{dt} \frac{dQ}{dt}\right)$.


Evaluating at $Q = I$ and recognizing that the first term is 0 since we are on a line, we have

$$\frac{d^2\Lambda}{dt^2} = 2\,\operatorname{diag}\!\left(E\,\frac{dQ}{dt}\right)$$

or

$$\frac{d^2\lambda_i}{dt^2} = 2\sum_{k \ne i} E_{ik}^2 / (\lambda_i - \lambda_k).$$


We can write this as a Taylor series.

$$\lambda_i(\epsilon) = \lambda_i + \epsilon E_{ii} + \epsilon^2 \sum_{k \ne i} E_{ik}^2 / (\lambda_i - \lambda_k) + \cdots$$

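A numerical sketch of this expansion: starting from a diagonal $\Lambda$ with (generically) well-separated entries and a small $\epsilon$, the second-order series should match the exact eigenvalues to roughly $O(\epsilon^3)$ (the sizes below are arbitrary choices):

```julia
using LinearAlgebra

n = 6
Λ = Diagonal(10 .* randn(n))                 # diagonal starting matrix
E = Symmetric(randn(n, n))                   # direction of the perturbation
ϵ = 1e-3
λ_exact  = eigvals(Symmetric(Matrix(Λ) + ϵ * Matrix(E)))
λ_series = [Λ[i, i] + ϵ*E[i, i] +
            ϵ^2 * sum(E[i, k]^2 / (Λ[i, i] - Λ[k, k]) for k in 1:n if k != i)
            for i in 1:n]
maximum(abs.(sort(λ_series) - λ_exact))      # roughly O(ϵ³)
```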