torchsparsegradutils.sparse_generic_lstsq

torchsparsegradutils.sparse_generic_lstsq(A: Tensor, B: Tensor, lstsq: Callable[[Tensor, Tensor], Tensor] | None = None, transpose_lstsq: Callable[[Tensor, Tensor], Tensor] | None = None) → Tensor[source]

Sparse linear least squares with sparse-aware gradients.

Solves the overdetermined problem \(\min_x \|\mathbf{A}x - \mathbf{B}\|_2^2\) where \(\mathbf{A} \in \mathbb{R}^{m\times n}\) is sparse and tall (\(m>n\)) and \(\mathbf{B} \in \mathbb{R}^{m\times p}\) is dense. Backprop preserves the sparsity pattern by returning sparse gradients for \(\mathbf{A}\) at its nonzero entries only.

We assume \(\mathbf{A}\) has full column rank so that \(\mathbf{A}^{+}\mathbf{A}=\mathbf{I}\) (with \(\,\cdot^{+}\) the Moore–Penrose pseudoinverse). Let \(\mathbf{x} \in \mathbb{R}^{n\times p}\) denote the solution and let the upstream gradient be \(\frac{\partial \mathcal{L}}{\partial \mathbf{x}} \in \mathbb{R}^{n\times p}\) for some scalar objective \(\mathcal{L}\). Using Golub & Pereyra (1973) [1f], the gradients are:

Gradient with respect to B (dense):

\[\frac{\partial \mathcal{L}}{\partial \mathbf{B}} \;=\; (\mathbf{A}^{\top})^{+} \, \frac{\partial \mathcal{L}}{\partial \mathbf{x}} \;\equiv\; \mathbf{G}_B.\]

Gradient with respect to A (sparse): The dense form is

\[\frac{\partial \mathcal{L}}{\partial \mathbf{A}} \;=\; -\, \mathbf{G}_B\, \mathbf{x}^{\top} \; -\; (\mathbf{A}\,\mathbf{x} - \mathbf{B})\; \big(\mathbf{A}^{+}\, \mathbf{G}_B\big)^{\top},\]

and we evaluate only the entries corresponding to nonzeros of \(\mathbf{A}\) to keep the gradient sparse. Equivalently, for a nonzero entry \(\mathbf{A}_{ij}\) with residuals \(\mathbf{r}=\mathbf{A}\,\mathbf{x}-\mathbf{B}\) and \(\mathbf{H}=\mathbf{A}^{+}\,\mathbf{G}_B\), the contribution is

\[\bigg[\frac{\partial \mathcal{L}}{\partial \mathbf{A}}\bigg]_{ij} \;=\; -\, (\mathbf{G}_B)_{i,:}\,\cdot\, \mathbf{x}_{j,:} \; -\; \mathbf{r}_{i,:}\,\cdot\, \mathbf{H}_{j,:},\]

where dots denote row-wise inner products over the \(p\) right-hand sides.

Parameters:

A (torch.Tensor) – Sparse COO/CSR tensor of shape (m, n) with m>n and full column rank.
B (torch.Tensor) – Dense RHS of shape (m,) or (m, k) with B.shape[0] == A.shape[0].
lstsq (callable, optional) – Solver lstsq(A,B)->X ((n,) or (n,k)). Default: LSMR (torchsparsegradutils.utils.lsmr()).
transpose_lstsq (callable, optional) – Solver for transpose system in backward ((A^T) Y = G). Default: LSMR on A^T.

Returns:

Solution X minimizing \(\|AX - B\|_2^2\) with shape (n,) or (n,k).

Return type:

torch.Tensor

Raises:

TypeError – If A is not sparse COO/CSR.
ValueError – If dimension mismatch or if backward encounters non-tall A.
RuntimeError – If a provided solver fails or returns unexpected shape.

See also

SparseGenericLstsq: Autograd implementation.

References

[1f]

Gene H. Golub and Victor Pereyra. The Differentiation of Pseudo-Inverses and Nonlinear Least Squares Problems Whose Variables Separate. SIAM Journal on Numerical Analysis, 10(2):413-432, 1973.

Examples

>>> # Simple sparse least squares example:
>>> import torch
>>> from torchsparsegradutils import sparse_generic_lstsq
>>> indices = torch.tensor([[0, 1, 2, 3, 4, 1, 2, 3],
...                         [0, 0, 0, 0, 1, 1, 1, 2]])
>>> values = torch.tensor([1.0, 2.0, 1.0, 3.0, 1.0, 2.0, 1.0, 1.0])
>>> A = torch.sparse_coo_tensor(indices, values, (5, 3)).coalesce()
>>> B = torch.randn(5)
>>> x = sparse_generic_lstsq(A, B)
>>> x.shape
torch.Size([3])

>>> # Multiple RHS:
>>> Bm = torch.randn(5, 4)
>>> Xm = sparse_generic_lstsq(A, Bm)
>>> Xm.shape
torch.Size([3, 4])

>>> # Custom solver:
>>> from torchsparsegradutils.utils import lsmr
>>> def my_lstsq(A_, B_):
...     return lsmr(A_, B_, atol=1e-10, btol=1e-10)[0]
>>> _ = sparse_generic_lstsq(A, B, lstsq=my_lstsq)

>>> # Gradients:
>>> A.requires_grad_(True)  
tensor(...)
>>> B.requires_grad_(True)  
tensor(...)
>>> x = sparse_generic_lstsq(A, B)
>>> loss = x.sum()  # Simple loss to preserve sparsity
>>> loss.backward()
>>> A.grad.is_sparse
True