In my recent work I made a connection between the theory of self-dual quaternion random matrices and Riemann-Hilbert problems. As part of the background for this research, I needed to revisit the theory of self-dual quaternion random matrices, in particular the question of how to make sense of the eigenvalues of such matrices. This is not entirely straightforward, given that quaternions do not commute. In this post I hope to give an accessible explanation.

First, let us recall some basic facts about quaternions. The algebra of quaternions \(\mathbb{H}\) is the real span of four linearly independent elements \(1, e_1, e_2, e_3\) subject to the relations

\[e_1^2 = e_2^2 = e_3^2 = -1\] \[e_1 e_2 = e_3 \quad \text{ etc. by cyclic permutations}\] \[e_i e_j = -e_j e_i \quad \text{ for } i \neq j\]

It is convenient to identify these with the \(2 \times 2\) complex matrices

\[\begin{aligned}1 \simeq \mathbb{I} = \left( \begin{matrix} 1 & 0 \\ 0 & 1 \end{matrix} \right), & & e_1 \simeq \left( \begin{matrix} i & 0 \\ 0 & -i \end{matrix} \right) \\ e_2 \simeq \left( \begin{matrix} 0 & 1 \\ -1 & 0 \end{matrix} \right), & & e_3 \simeq \left( \begin{matrix} 0 & i \\ i & 0 \end{matrix} \right)\end{aligned}\]
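
For readers who like to see such identities checked concretely, here is a minimal numerical sketch in Python/NumPy verifying that the four matrices above satisfy the quaternion relations (the variable names are my own, chosen for illustration).

```python
import numpy as np

I2 = np.eye(2, dtype=complex)
e1 = np.array([[1j, 0], [0, -1j]])
e2 = np.array([[0, 1], [-1, 0]], dtype=complex)
e3 = np.array([[0, 1j], [1j, 0]])

# e_i^2 = -1
for e in (e1, e2, e3):
    assert np.allclose(e @ e, -I2)

# cyclic products: e1 e2 = e3, e2 e3 = e1, e3 e1 = e2
assert np.allclose(e1 @ e2, e3)
assert np.allclose(e2 @ e3, e1)
assert np.allclose(e3 @ e1, e2)

# anticommutation: e_i e_j = -e_j e_i for i != j
for a, b in [(e1, e2), (e2, e3), (e3, e1)]:
    assert np.allclose(a @ b, -(b @ a))
```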

In what follows it will be useful to work with the complexified quaternions \(\mathbb{H}_{\mathbb{C}}\), in which a general element \(Q \in \mathbb{H}_{\mathbb{C}}\) takes the form

\[Q = q_0 \mathbb{I} + q_1 e_1 + q_2 e_2 + q_3 e_3 \quad (q_i \in \mathbb{C})\]

Definition: The dual of a quaternion \(Q= q_0 \mathbb{I} + q_1 e_1 + q_2 e_2 + q_3 e_3 \in \mathbb{H}_\mathbb{C}\) is

\[Q^\mathsf{D} = q_0 \mathbb{I} - q_1 e_1 - q_2 e_2 - q_3 e_3\]

Note that \(Q \mapsto Q^\mathsf{D}\) is a \(\mathbb{C}\)-linear (and not conjugate linear) operation.

Lemma: Using our \(2 \times 2\) matrix representation of a quaternion \(Q \in \mathbb{H}_{\mathbb{C}}\), we may write the dual as

\[Q^\mathsf{D} = -e_2 Q^\mathsf{T} e_2\]

Proof: Straightforward calculation. \(\square\)
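
As a quick sanity check, here is that straightforward calculation done numerically for a random complexified quaternion, reusing the basis matrices \(\mathbb{I}, e_1, e_2, e_3\) from the sketch above (the random seed is arbitrary).

```python
import numpy as np

rng = np.random.default_rng(0)
q = rng.normal(size=4) + 1j * rng.normal(size=4)      # random complex coefficients
Q      = q[0] * I2 + q[1] * e1 + q[2] * e2 + q[3] * e3
Q_dual = q[0] * I2 - q[1] * e1 - q[2] * e2 - q[3] * e3

# the lemma: Q^D = -e2 Q^T e2
assert np.allclose(Q_dual, -e2 @ Q.T @ e2)
```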

Definition: The adjoint of a quaternion \(Q= q_0 \mathbb{I} + q_1 e_1 + q_2 e_2 + q_3 e_3 \in \mathbb{H}_\mathbb{C}\) is

\[Q^\dagger = \overline{q_0} \mathbb{I} - \overline{q_1} e_1 - \overline{q_2} e_2 - \overline{q_3} e_3\]

Note that \(Q \mapsto Q^\dagger\) is a conjugate-linear operation and, in our matrix representation, it is exactly the conjugate transpose of the matrix \(Q\).

Corollary: A quaternion \(Q\) is real (has real coefficients) if and only if \(Q^\dagger = Q^\mathsf{D}\), i.e. \(Q^\dagger = -e_2 Q^\mathsf{T} e_2\). Equivalently, a \(2 \times 2\) matrix \(Q\) is in the real span of \(\mathbb{I}, e_1, e_2, e_3\) if and only if \(Q^\dagger = -e_2 Q^\mathsf{T} e_2\). \(\triangle\)
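
The corollary can be illustrated the same way: with real coefficients the adjoint and the dual agree, while for generic complex coefficients they do not. A small sketch, again reusing the basis matrices from above.

```python
import numpy as np

rng = np.random.default_rng(1)

# real coefficients: Q^dagger agrees with Q^D = -e2 Q^T e2
q = rng.normal(size=4)
Q = q[0] * I2 + q[1] * e1 + q[2] * e2 + q[3] * e3
assert np.allclose(Q.conj().T, -e2 @ Q.T @ e2)

# generic complex coefficients: the two operations differ
q = rng.normal(size=4) + 1j * rng.normal(size=4)
Q = q[0] * I2 + q[1] * e1 + q[2] * e2 + q[3] * e3
assert not np.allclose(Q.conj().T, -e2 @ Q.T @ e2)
```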

We can now see the advantage of introducing \(\mathbb{H}_\mathbb{C}\) even though we are really only interested in \(\mathbb{H}\). Given an \(n \times n\) quaternion matrix \(\mathcal{M}\) with real quaternion entries, we identify it with a \(2n \times 2n\) complex matrix \(M\). The self-duality condition \(\mathcal{M}_{ij} = \mathcal{M}_{ji}^\mathsf{D}\), together with the reality of the entries, becomes the requirement that

\[M = M^\mathsf{D} = M^\dagger\]

where \(M^\mathsf{D} = - J M^\mathsf{T}J\) for \(J = \underbrace{e_2 \oplus \dots \oplus e_2}_{n \text{ times}}\).

Remark: Define the non-degenerate skew-symmetric bilinear form \(\Omega : \mathbb{C}^{2n} \times \mathbb{C}^{2n} \to \mathbb{C}\) by

\[\Omega(x,y) = x^\mathsf{T} J y\]

Then \(M = M^\mathsf{D}\) is equivalent to \(\Omega(Mx,y) = \Omega(x,My)\) for all \(x,y \in \mathbb{C}^{2n}\). \(\triangle\)
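
Here is a sketch putting the last two paragraphs together: it builds a random self-dual quaternion matrix directly in its \(2n \times 2n\) complex representation, checks \(M = M^\mathsf{D} = M^\dagger\), and verifies the identity of the remark. The helper `dual` and the projections used to build \(M\) are my own choices for illustration.

```python
import numpy as np

n = 3
e2 = np.array([[0, 1], [-1, 0]], dtype=complex)
J = np.kron(np.eye(n), e2)                    # J = e2 ⊕ ... ⊕ e2 (n copies)

def dual(A):
    """Quaternion dual in the 2n x 2n representation: A^D = -J A^T J."""
    return -J @ A.T @ J

rng = np.random.default_rng(2)
A = rng.normal(size=(2 * n, 2 * n)) + 1j * rng.normal(size=(2 * n, 2 * n))
B = (A + dual(A)) / 2                         # project onto self-dual matrices
M = (B + B.conj().T) / 2                      # then onto Hermitian matrices

assert np.allclose(M, dual(M))                # M = M^D
assert np.allclose(M, M.conj().T)             # M = M^dagger

# the remark: M = M^D is equivalent to Omega(Mx, y) = Omega(x, My)
Omega = lambda x, y: x.T @ J @ y
x, y = rng.normal(size=2 * n), rng.normal(size=2 * n)
assert np.isclose(Omega(M @ x, y), Omega(x, M @ y))
```

The second projection is harmless because the set of self-dual matrices is closed under \(\dagger\); the resulting \(M\) therefore represents a self-dual matrix with real quaternion entries, and it is reused in the sketches below.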

Definition: The (non-compact) symplectic group \(\mathrm{Sp}(n)\) is the group of \(2n \times 2n\) complex matrices \(U\) for which \(\Omega(Ux,Uy) = \Omega(x,y)\) for all \(x,y \in \mathbb{C}^{2n}\). The (compact) symplectic group is \(\mathrm{USp}(n)=\mathrm{Sp}(n) \cap \mathrm{U}(2n)\). \(\triangle\)

It is easily seen that for \(U \in \mathrm{USp}(n)\), \(U^{-1}= U^\mathsf{D} = U^\dagger\), so that \(U\) may be thought of as an \(n \times n\) matrix with real quaternion entries whose dual is its inverse. Note that \(\mathrm{USp}(n)\) is exactly the group which, acting by conjugation, preserves (real) quaternion self-duality.
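
For \(n = 1\) this can be made very concrete: the \(2 \times 2\) matrix of a unit real quaternion is unitary and its dual is its inverse, i.e. it lies in \(\mathrm{USp}(1)\). A small check, reusing the basis matrices from the first sketch.

```python
import numpy as np

rng = np.random.default_rng(3)
u = rng.normal(size=4)
u /= np.linalg.norm(u)                          # unit real quaternion
U = u[0] * I2 + u[1] * e1 + u[2] * e2 + u[3] * e3

assert np.allclose(U.conj().T @ U, np.eye(2))   # unitary: U^dagger = U^{-1}
assert np.allclose(-e2 @ U.T @ e2, U.conj().T)  # real quaternion: U^D = U^dagger
```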

Proposition (Kramers’ degeneracy): Let \(M = M^\mathsf{D}\) be a \(2n \times 2n\) matrix. Then the characteristic polynomial of \(M\) is an exact square. In particular, every eigenvalue of \(M\) has even multiplicity; generically, \(M\) has \(n\) distinct eigenvalues, each of multiplicity \(2\).

Proof: Because \(M = M^\mathsf{D}\) we have \((JM)^\mathsf{T} = - JM\) and, since also \(J^\mathsf{T} = -J\), the matrix \(\zeta J - JM\) is skew-symmetric for every \(\zeta \in \mathbb{C}\). Hence

\[\det(\zeta \mathbb{I}- M) = \det(\zeta J- JM) = \left( \mathrm{pf}(\zeta J- JM)\right)^2\]

where \(\mathrm{pf}\) denotes the Pfaffian; here we have also used that \(\det J = 1\). \(\square\)
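
Numerically, the degeneracy is easy to observe for the self-dual Hermitian matrix \(M\) constructed in the sketch above: its real eigenvalues come in equal pairs.

```python
import numpy as np

evals = np.linalg.eigvalsh(M)                  # real eigenvalues, sorted ascending
assert np.allclose(evals[0::2], evals[1::2])   # they come in equal pairs
```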

Remark: Many works, for example the textbooks of M. L. Mehta (Random Matrices) and P. Forrester (Log-Gases and Random Matrices), prefer to work with a so-called “quaternion determinant.” Given a self-dual \(n \times n\) quaternion matrix \(\mathcal{M}\) with \(2n \times 2n\) representative \(M\), we define the quaternion determinant

\[\mathrm{Qdet}(\mathcal{M}) = \mathrm{pf}(JM)\]

Surprisingly, there is a theorem due to Dyson (see Theorem 5.1.2 of Mehta’s textbook) showing that \(\mathrm{Qdet}\) admits a Laplace-type formula in terms of a sum over permutations (ibid., Equation 5.1.5). All of this presumes that the matrix \(M\) is self-dual; as far as I understand, \(\mathrm{Qdet}\) is not defined for non-self-dual matrices. \(\triangle\)
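
Continuing with \(J\) and \(M\) from the earlier sketch, one can check by brute force that \(\mathrm{Qdet}(\mathcal{M})^2 = \mathrm{pf}(JM)^2 = \det(M)\), which is the proposition at \(\zeta = 0\). The recursive `pfaffian` below is my own naive implementation (expansion along the first row, exponentially slow, for illustration only).

```python
import numpy as np

def pfaffian(A):
    """Pfaffian of an even-dimensional skew-symmetric matrix (naive recursion)."""
    m = A.shape[0]
    if m == 0:
        return 1.0
    total = 0.0
    for j in range(1, m):
        rest = [k for k in range(m) if k not in (0, j)]
        total += (-1) ** (j + 1) * A[0, j] * pfaffian(A[np.ix_(rest, rest)])
    return total

qdet = pfaffian(J @ M)                         # Qdet of the self-dual matrix M
assert np.isclose(qdet ** 2, np.linalg.det(M))
```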

Finally, we must give meaning to the notion of diagonalising self-dual quaternion matrices. Let \(\mathcal{M}\) be an \(n \times n\) self-dual quaternion matrix and let \(M = M^\dagger = M^\mathsf{D}\) be its \(2n \times 2n\) representative. We aim to show that \(M\) may be diagonalised by an element of \(\mathrm{USp}(n)\). For simplicity of exposition, let us assume that \(M\) has exactly \(n\) distinct eigenvalues \(\lambda_1, \dots, \lambda_n\) (real, since \(M\) is Hermitian), each of multiplicity \(2\). Let \(v_k \in \mathbb{C}^{2n}\) be an eigenvector, \(\| v_k \| = 1\), with eigenvalue \(\lambda_k\):

\[M v_k = \lambda_k v_k\]

Since \(M = M^\dagger = M^\mathsf{D}\) we have \(MJ = J\overline{M}\), so \(w_k := -J \overline{v_k}\) is also an eigenvector with eigenvalue \(\lambda_k\) (using that \(\lambda_k\) is real). Moreover, \(w_k\) and \(v_k\) are orthonormal, hence linearly independent: \(\| w_k \| = \| v_k \| = 1\) and \(\langle w_k , v_k \rangle = w_k^\dagger v_k = v_k^\mathsf{T} J v_k = 0\), since \(J\) is skew-symmetric. Now define the matrix

\[U = \left( \begin{matrix} \vert & \vert &\dots &\vert & \vert \\ v_1 & w_1 &\dots & v_n & w_n \\ \vert & \vert &\dots &\vert & \vert \end{matrix} \right)\]

From the construction it is clear that

\[U^{-1} M U = \mathrm{diag}(\lambda_1, \lambda_1, \dots, \lambda_n, \lambda_n)\]

so \(U\) diagonalises \(M\). Furthermore, we claim that \(U \in \mathrm{USp}(n)\). This can be seen as follows. Firstly, the columns of \(U\) are orthonormal with respect to the standard Hermitian inner product on \(\mathbb{C}^{2n}\) (eigenvectors of the Hermitian matrix \(M\) for distinct eigenvalues are orthogonal, and within each degenerate pair we showed \(\langle w_k, v_k \rangle = 0\)), so \(U\) is unitary (\(U^{-1} = U^\dagger\)). Secondly, checking column by column one finds \(J\overline{U} = UJ\), i.e. \(JUJ = -\overline{U} = -U^{\mathsf{-T}}\), and hence \(U^\mathsf{D} = -JU^\mathsf{T}J = U^\dagger = U^{-1}\). This completes the proof that \(U \in \mathrm{USp}(n)\).
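
To close, here is a numerical version of this construction, continuing with \(J\), \(M\), and \(n\) from the earlier sketch: take one unit eigenvector \(v_k\) per degenerate pair, form its partner \(w_k = -J\overline{v_k}\), assemble \(U\), and check that it is unitary, satisfies \(U^\mathsf{D} = U^{-1}\), and diagonalises \(M\).

```python
import numpy as np

evals, vecs = np.linalg.eigh(M)                # eigenvalues in degenerate pairs
cols = []
for k in range(0, 2 * n, 2):
    v = vecs[:, k]                             # one unit eigenvector per pair
    w = -J @ v.conj()                          # its partner, same eigenvalue
    cols += [v, w]
U = np.column_stack(cols)

assert np.allclose(U.conj().T @ U, np.eye(2 * n))   # U is unitary
assert np.allclose(-J @ U.T @ J, U.conj().T)        # U^D = U^dagger = U^{-1}
D = U.conj().T @ M @ U
assert np.allclose(D, np.diag(np.repeat(evals[0::2], 2)))
```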