Warmup: Differentiating on the unit sphere in n dimensions
Geometrically, we all know that velocity vectors (equivalently tangents) on the sphere are orthogonal to radii. Our differentials say this algebraically: xᵀdx = 0.
-9.676404091203052e-13
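Since xᵀx = 1, differentiating gives dxᵀx + xᵀdx = 2xᵀdx = 0.
This is our first example where we have seen the infinitesimal perturbation dx being constrained: it cannot point in an arbitrary direction but must be tangent to the sphere.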
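A quick numerical check of this orthogonality can be sketched as follows. This is a minimal sketch, not necessarily the computation that produced the number above: the dimension, the step size 1e-6, and the way the nearby point is generated are all illustrative assumptions. Because both points lie exactly on the sphere, xᵀdx equals -‖dx‖²/2, which is second order in the step.

```julia
using LinearAlgebra

n = 5                                  # dimension (illustrative)
x = normalize(randn(n))                # random point on the unit sphere
y = normalize(x + 1e-6 * randn(n))     # nearby point, pulled back onto the sphere
dx = y - x                             # small displacement along the sphere

println(dot(x, dx))                    # tiny: second order in the step size
println(-norm(dx)^2 / 2)               # the same quantity, from ‖x‖ = ‖y‖ = 1
```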
Special case: a circle
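Let us consider simply the circle in the plane, x = (cos θ, sin θ). Then dx = (-sin θ, cos θ) dθ, and indeed xᵀdx = 0.
We can think of x as "extrinsic" coordinates (a vector in the plane) and of θ as an "intrinsic" coordinate: every point on the circle is specified by a single θ.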
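For the circle the claim can be checked directly. The sketch below assumes the usual parametrization x(θ) = (cos θ, sin θ); the names `x`, `xdot`, the angle, and the step `h` are illustrative choices.

```julia
using LinearAlgebra

x(θ) = [cos(θ), sin(θ)]                # extrinsic coordinates of a point on the circle
xdot(θ) = [-sin(θ), cos(θ)]            # dx/dθ, the tangent direction

θ = 0.7                                # arbitrary angle (illustrative)
h = 1e-6
fd = (x(θ + h) - x(θ - h)) / (2h)      # centered finite difference for dx/dθ

println(dot(x(θ), xdot(θ)))            # tangent is orthogonal to the radius
println(norm(fd - xdot(θ)))            # finite difference agrees with the formula
```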
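Suppose f(x) = xᵀAx with A symmetric. Without constraints, df = 2xᵀA dx = (2Ax)ᵀdx, so the gradient is 2Ax.
Now we wish to restrict to the sphere.
On the sphere, dx is no longer arbitrary: it must satisfy xᵀdx = 0.
You may remember that P = I - xxᵀ is the orthogonal projection onto the vectors orthogonal to x (recall xᵀx = 1).
In particular if xᵀdx = 0, then P dx = dx.
It follows that if xᵀdx = 0, then df = (2Ax)ᵀdx = (2Ax)ᵀP dx = (P 2Ax)ᵀdx, so the gradient on the sphere is (I - xxᵀ) 2Ax.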
What did we just do?
To get the gradient we needed two things:
A linearization of the function that is correct on tangents, and
A direction that is tangent (satisfies the linearized constraint).
Gradient of a general scalar function on the sphere:
Project the unconstrained gradient onto the tangent space of the sphere to get the constrained gradient. It is the direction of maximal increase on the sphere.
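The projection recipe can be tried numerically. The sketch below assumes the illustrative function f(x) = xᵀAx with A symmetric, whose unconstrained gradient is 2Ax; the matrix, the point, and the step size are random or arbitrary choices. It checks that the projected gradient is tangent and that it predicts the first-order change of f along a tangent step.

```julia
using LinearAlgebra

n = 5
B = randn(n, n)
A = B + B'                       # symmetric matrix (illustrative choice)
f(x) = x' * A * x

x = normalize(randn(n))          # point on the unit sphere
g = 2A * x                       # unconstrained gradient of xᵀAx
gs = g - x * (x' * g)            # (I - xxᵀ) g: the part of g tangent to the sphere

v = randn(n)
v -= x * (x' * v)                # random tangent direction at x
ε = 1e-6
y = normalize(x + ε * v)         # small step, pulled back onto the sphere

println(dot(x, gs))                              # ≈ 0: gs is tangent
println(f(y) - f(x), "  vs  ", ε * dot(gs, v))   # agree to first order in ε
```

Projecting with `g - x * (x' * g)` avoids forming the n×n matrix I - xxᵀ explicitly.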
Differentiating nxn orthogonal matrices (the orthogonal group)
5×5 Matrix{Float64}:
-2.33231e-6 -0.0615623 -0.138078 -0.316197 -0.586192
0.0615663 -3.55588e-6 -0.471466 0.414144 0.559983
0.138083 0.471459 -1.92691e-5 1.89733 -0.11238
0.316198 -0.414156 -1.89733 -2.10233e-5 0.577319
0.58619 -0.55998 0.112393 -0.577321 -5.01563e-6
Do you see the structure?
Q^TdQ is anti-symmetric (sometimes called skew-symmetric).
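(If M is anti-symmetric, then Mᵀ = -M. In particular the diagonal elements of M are 0.)
Proof: The constraint of being orthogonal is QᵀQ = I. Differentiating gives dQᵀQ + QᵀdQ = 0, that is, (QᵀdQ)ᵀ = -QᵀdQ.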
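The anti-symmetry of QᵀdQ can be observed numerically along the following lines. This is a sketch under assumptions: the orthogonal matrices come from QR factorizations, and the signs of the second factor are fixed so that it stays close to Q. It is not claimed to be the computation behind the matrix printed above.

```julia
using LinearAlgebra

n = 5
Q = Matrix(qr(randn(n, n)).Q)                    # a random orthogonal matrix

# A nearby orthogonal matrix: QR of a small perturbation of Q, with column
# signs chosen so that R has a positive diagonal (this keeps Qp close to Q).
F = qr(Q + 1e-6 * randn(n, n))
Qp = Matrix(F.Q) * Diagonal(sign.(diag(F.R)))

dQ = Qp - Q
M = Q' * dQ
println(norm(M + M') / norm(M))                  # small: M is anti-symmetric to first order
```

The symmetric part M + Mᵀ equals -dQᵀdQ exactly, so it is second order in the perturbation, while M itself is first order.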
What is the dimension of the "surface" of orthogonal matrices in the n^2-dimensional space of n by n matrices?
For example when n=2 we have rotations (and reflections). Rotations have the form Q = [cos θ  -sin θ; sin θ  cos θ].
When n=2 we have one parameter.
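For n=2 everything can be written down explicitly. The sketch below assumes the standard rotation matrix with rows (cos θ, -sin θ) and (sin θ, cos θ); the helper name `rot`, the angle, and the step are illustrative. It checks orthogonality and that Qᵀ(dQ/dθ) is the anti-symmetric matrix [0 -1; 1 0].

```julia
using LinearAlgebra

rot(θ) = [cos(θ) -sin(θ); sin(θ) cos(θ)]     # rotation by the angle θ

θ = 0.3                                      # arbitrary angle (illustrative)
h = 1e-6
Q = rot(θ)
dQ = (rot(θ + h) - rot(θ - h)) / (2h)        # dQ/dθ by centered differences

println(Q' * Q)       # ≈ identity
println(Q' * dQ)      # ≈ [0 -1; 1 0], independent of θ
```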
When n=3, airplane pilots know about "roll, pitch, and yaw" and these are three parameters.
For general n the answer is n(n-1)/2.
A few ways to see that:
There are n^2 free parameters, and orthogonality (QᵀQ = I, a symmetric equation) imposes n(n+1)/2 constraints, leaving n(n-1)/2.
When we do QR, the R "eats" up n(n+1)/2 parameters leaving n(n-1)/2 for Q.
Think about the symmetric eigenvalue problem: S = QΛQᵀ.
S has n(n+1)/2 and Λ has n, leaving n(n-1)/2 for Q.
Think about the singular value decomposition. A = UΣVᵀ
A has n^2, and Σ has n, leaving n(n-1) to be split evenly for the orthogonal matrices U and V.
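The four counting arguments above are simple arithmetic, and they can be checked against each other for a few values of n. The function names below are hypothetical and used only for this illustration.

```julia
params_total(n) = n^2                  # entries of a general n×n matrix
constraints(n)  = n * (n + 1) ÷ 2      # independent equations in the symmetric QᵀQ = I
orth_dim(n)     = n * (n - 1) ÷ 2      # claimed dimension of the orthogonal matrices

for n in 2:5
    @assert params_total(n) - constraints(n) == orth_dim(n)   # constraint count, also QR
    @assert n * (n + 1) ÷ 2 - n == orth_dim(n)                # symmetric eigenproblem
    @assert (n^2 - n) ÷ 2 == orth_dim(n)                      # SVD, split between U and V
    println("n = $n: dimension ", orth_dim(n))
end
```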
Differentiating the Symmetric Eigendecomposition
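Let S be symmetric with S = QΛQᵀ, where Λ is diagonal (the eigenvalues) and Q is orthogonal (the eigenvectors as columns). Differentiating gives dS = dQ Λ Qᵀ + Q dΛ Qᵀ + Q Λ dQᵀ. Multiplying by Qᵀ on the left and Q on the right, and using dQᵀQ = -QᵀdQ, this becomes QᵀdS Q = QᵀdQ Λ - Λ QᵀdQ + dΛ.
Exercise: Check that the left and right side of the above are both symmetric.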
10×5 Matrix{Float64}:
-1.12888e-6 -4.89005e-6 3.80776e-5 -1.75771e-5 -9.48582e-6
-4.89005e-6 6.05826e-6 -4.70447e-6 2.28823e-6 -2.71523e-5
3.80776e-5 -4.70447e-6 1.86158e-5 -2.90847e-5 -1.08122e-5
-1.75771e-5 2.28823e-6 -2.90847e-5 1.56049e-5 -8.56316e-6
-9.48582e-6 -2.71523e-5 -1.08122e-5 -8.56316e-6 1.41283e-6
-1.12918e-6 -4.89002e-6 3.80764e-5 1.75793e-5 -9.48591e-6
-4.89003e-6 6.05814e-6 -4.70462e-6 -2.28813e-6 -2.71523e-5
3.80774e-5 -4.70459e-6 1.86143e-5 2.90851e-5 -1.0812e-5
-1.75769e-5 2.2882e-6 -2.9085e-5 1.56065e-5 -8.56307e-6
-9.48584e-6 -2.71523e-5 -1.08128e-5 8.56275e-6 1.41319e-6
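Maybe easier if one looks at the diagonal entries on their own: q_iᵀ dS q_i = dλ_i, where q_i is the i-th column of Q. (The anti-symmetric QᵀdQ contributes nothing to the diagonal.)
Hence for each eigenvalue, dλ_i = q_iᵀ dS q_i.
Sometimes we think of a curve of matrices S(t) depending on a parameter t such as time. Then dλ_i/dt = q_iᵀ (dS/dt) q_i.
How do we get the gradient ∇λ_i of one of the eigenvalues? Write
dλ_i = q_iᵀ dS q_i = trace((q_i q_iᵀ)ᵀ dS), which gives ∇λ_i = q_i q_iᵀ.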
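The first-order formula dλ_i = q_iᵀ dS q_i can be compared with the actual change in the eigenvalues. The sketch below makes illustrative assumptions: S is built with eigenvalues 1, …, n so that they are well separated, and dS is a random symmetric perturbation of size about 1e-6.

```julia
using LinearAlgebra

n = 5
Q0 = Matrix(qr(randn(n, n)).Q)
S = Symmetric(Q0 * Diagonal(collect(1.0:n)) * Q0')   # eigenvalues 1, …, n (illustrative)
C = randn(n, n)
dS = 1e-6 * Symmetric(C + C')                        # small symmetric perturbation

λ, Q = eigen(S)
λp = eigvals(Symmetric(S + dS))                      # eigenvalues after the perturbation

dλ_pred = [Q[:, i]' * dS * Q[:, i] for i in 1:n]     # q_iᵀ dS q_i
println(norm((λp - λ) - dλ_pred))                    # second order in dS
```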
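What about the eigenvectors? Those come from the off-diagonal elements: (QᵀdS Q)_ij = (QᵀdQ)_ij (λ_j - λ_i) for i ≠ j. Dividing elementwise by λ_j - λ_i (with zeros on the diagonal) gives QᵀdQ, and left multiplying by Q gives dQ. This requires distinct eigenvalues.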
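The off-diagonal relation (QᵀdS Q)_ij = (QᵀdQ)_ij (λ_j - λ_i) for i ≠ j can be turned into a prediction for dQ and compared with a recomputed eigendecomposition. This is a sketch under assumptions: the eigenvalues are chosen distinct (1, …, n), and because eigenvectors are only determined up to sign, the perturbed eigenvectors are sign-aligned with the original ones before comparing.

```julia
using LinearAlgebra

n = 5
Q0 = Matrix(qr(randn(n, n)).Q)
S = Symmetric(Q0 * Diagonal(collect(1.0:n)) * Q0')   # distinct eigenvalues 1, …, n
C = randn(n, n)
dS = 1e-6 * Symmetric(C + C')

λ, Q = eigen(S)
M = Q' * dS * Q
# Elementwise division off the diagonal; the diagonal of QᵀdQ is zero.
W = [i == j ? 0.0 : M[i, j] / (λ[j] - λ[i]) for i in 1:n, j in 1:n]
dQ_pred = Q * W                                      # predicted dQ

Fp = eigen(Symmetric(S + dS))
signs = sign.(diag(Fp.vectors' * Q))                 # resolve the ± ambiguity per column
Qp = Fp.vectors * Diagonal(signs)
println(norm((Qp - Q) - dQ_pred) / norm(dQ_pred))    # small relative error
```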
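It is interesting to get the second derivative of eigenvalues when moving along a line in symmetric matrix space. For simplicity we'll start at a diagonal matrix Λ with distinct diagonal entries.
Let S(t) = Λ + tE, where E is symmetric.
Differentiating dΛ/dt = diag(QᵀEQ) gives d²Λ/dt² = diag((dQ/dt)ᵀEQ + QᵀE(dQ/dt)) = 2 diag(QᵀE(dQ/dt)).
Evaluating at t = 0, where Q = I and (dQ/dt)_ij = E_ij/(λ_j - λ_i) for i ≠ j, gives d²Λ/dt² = 2 diag(E(dQ/dt)),
or d²λ_i/dt² = 2 Σ_{k≠i} E_ik^2/(λ_i - λ_k).
We can write this as a Taylor series: λ_i(ε) = λ_i + ε E_ii + ε^2 Σ_{k≠i} E_ik^2/(λ_i - λ_k) + O(ε^3).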
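The second-order expansion λ_i(ε) = λ_i + ε E_ii + ε² Σ_{k≠i} E_ik²/(λ_i - λ_k) can be compared with directly computed eigenvalues of Λ + εE. The sketch below assumes a diagonal starting matrix with entries 1, …, n (so the eigenvalues are distinct and stay in order), a random symmetric E, and ε = 1e-4; all of these are illustrative choices.

```julia
using LinearAlgebra

n = 5
λ = collect(1.0:n)                     # distinct diagonal entries (illustrative)
C = randn(n, n)
E = (C + C') / 2                       # symmetric direction
ε = 1e-4

# Second-order coefficients: Σ_{k≠i} E_ik² / (λ_i - λ_k)
second = [sum(E[i, k]^2 / (λ[i] - λ[k]) for k in 1:n if k != i) for i in 1:n]

taylor = λ + ε * diag(E) + ε^2 * second
exact = eigvals(Symmetric(Diagonal(λ) + ε * E))

println(norm(exact - (λ + ε * diag(E))))   # first-order error, of size ε²
println(norm(exact - taylor))              # second-order error, of size ε³
```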