Morpho: Use a 2-Block Partial Least-Squares regression for large amount of coordinates

Given two sets of variables, a 2-Block Partial Least-Squares regression (2B-PLS) tries to find a projection into a latent space where the covariance between both sets is maximized. This is done by decomposing the block of the joint covariance matrix containing the covariation between both sets of variables, using a singular value decomposition (SVD). Given \(n\) observations, of a set of variables \(A\) and \(B\), we can write the combined data matrix, without loss of generality:

\[AB = \begin{array}{cccccc} a_{11} & \cdots & a_{1p} & b_{11} & \cdots & b_{1k} \\ \vdots & \ddots & & & & \vdots \\ \vdots & & \ddots & & & \vdots\\ \vdots & & & \ddots & & \vdots\\ a_{n1} & \cdots & a_{np} & b_{n1} & \cdots & b_{nk} \end{array}\]

With the covariance matrix \(\Sigma\) being a \(k*p \times k*p\) matrix, we will need to decompose the upper right \((k*p - k) \times (k*p-p)\) submatrix (or the lower left transposed version thereof). Explicitly calculating this matrix and its subsequent decomposition is extremely inefficient and not viable when dealing with large amounts of coordinates, very common in shape modelling. For allowing this in Morpho, I had to come up with a more elegant solution to this problem.

Solution

Be \(A\) and \(B\) mean centered variable matrices containing each \(n\) observations, the problem can then be posed as looking for the decomposition

\[\frac{A^tB}{n-1} = UDV^t\]

Singular value decompositions for each subset is cheap and we can write \(A=U_1D_1V_1^t\) and \(B=U_2D_2V_2^t\) and

\[A^tB = V_1D_1U_1^tU_2D_2V_2^t\]

With \(U_1\) being a \(k \times n\) and \(U_2\) a \(p \times n\) matrix, the decomposition of

\(D_1U_1^tU_2D_2=U_3D_3V_3^t\) is computationally cheap, only facing an \(n \times n\) matrix (there are only \(n-1\) singular values that are \(> 0\)). The diagonal elements of \(D_3\), divided by \(n-1\), are then the singular values and the basis vectors \(U\) and \(V\) can be computed as \(U = V_1U_3\) and \(V = V_3^tV_2^t\). As all are orthonormal matrices, the product is orthonormal as well. That way, a 2B-PLS (using Morpho::pls2B, with Morpho installed freshly from github) can now easily be applied to assess shape covariation for large amounts of coordinates.