torchsparsegradutils.sparse_generic_lstsq
- torchsparsegradutils.sparse_generic_lstsq(A: Tensor, B: Tensor, lstsq: Callable[[Tensor, Tensor], Tensor] | None = None, transpose_lstsq: Callable[[Tensor, Tensor], Tensor] | None = None) Tensor[source]
Sparse linear least squares with sparse-aware gradients.
Solves the overdetermined problem \(\min_x \|\mathbf{A}x - \mathbf{B}\|_2^2\) where \(\mathbf{A} \in \mathbb{R}^{m\times n}\) is sparse and tall (\(m>n\)) and \(\mathbf{B} \in \mathbb{R}^{m\times p}\) is dense. Backprop preserves the sparsity pattern by returning sparse gradients for \(\mathbf{A}\) at its nonzero entries only.
We assume \(\mathbf{A}\) has full column rank so that \(\mathbf{A}^{+}\mathbf{A}=\mathbf{I}\) (with \(\,\cdot^{+}\) the Moore–Penrose pseudoinverse). Let \(\mathbf{x} \in \mathbb{R}^{n\times p}\) denote the solution and let the upstream gradient be \(\frac{\partial \mathcal{L}}{\partial \mathbf{x}} \in \mathbb{R}^{n\times p}\) for some scalar objective \(\mathcal{L}\). Using Golub & Pereyra (1973) [1f], the gradients are:
Gradient with respect to B (dense):
\[\frac{\partial \mathcal{L}}{\partial \mathbf{B}} \;=\; (\mathbf{A}^{\top})^{+} \, \frac{\partial \mathcal{L}}{\partial \mathbf{x}} \;\equiv\; \mathbf{G}_B.\]Gradient with respect to A (sparse): The dense form is
\[\frac{\partial \mathcal{L}}{\partial \mathbf{A}} \;=\; -\, \mathbf{G}_B\, \mathbf{x}^{\top} \; -\; (\mathbf{A}\,\mathbf{x} - \mathbf{B})\; \big(\mathbf{A}^{+}\, \mathbf{G}_B\big)^{\top},\]and we evaluate only the entries corresponding to nonzeros of \(\mathbf{A}\) to keep the gradient sparse. Equivalently, for a nonzero entry \(\mathbf{A}_{ij}\) with residuals \(\mathbf{r}=\mathbf{A}\,\mathbf{x}-\mathbf{B}\) and \(\mathbf{H}=\mathbf{A}^{+}\,\mathbf{G}_B\), the contribution is
\[\bigg[\frac{\partial \mathcal{L}}{\partial \mathbf{A}}\bigg]_{ij} \;=\; -\, (\mathbf{G}_B)_{i,:}\,\cdot\, \mathbf{x}_{j,:} \; -\; \mathbf{r}_{i,:}\,\cdot\, \mathbf{H}_{j,:},\]where dots denote row-wise inner products over the \(p\) right-hand sides.
- Parameters:
A (torch.Tensor) – Sparse COO/CSR tensor of shape
(m, n)withm>nand full column rank.B (torch.Tensor) – Dense RHS of shape
(m,)or(m, k)withB.shape[0] == A.shape[0].lstsq (callable, optional) – Solver
lstsq(A,B)->X((n,)or(n,k)). Default: LSMR (torchsparsegradutils.utils.lsmr()).transpose_lstsq (callable, optional) – Solver for transpose system in backward (
(A^T) Y = G). Default: LSMR onA^T.
- Returns:
Solution
Xminimizing \(\|AX - B\|_2^2\) with shape(n,)or(n,k).- Return type:
- Raises:
TypeError – If
Ais not sparse COO/CSR.ValueError – If dimension mismatch or if backward encounters non-tall
A.RuntimeError – If a provided solver fails or returns unexpected shape.
See also
SparseGenericLstsqAutograd implementation.
References
[1f]Gene H. Golub and Victor Pereyra. The Differentiation of Pseudo-Inverses and Nonlinear Least Squares Problems Whose Variables Separate. SIAM Journal on Numerical Analysis, 10(2):413-432, 1973.
Examples
>>> # Simple sparse least squares example: >>> import torch >>> from torchsparsegradutils import sparse_generic_lstsq >>> indices = torch.tensor([[0, 1, 2, 3, 4, 1, 2, 3], ... [0, 0, 0, 0, 1, 1, 1, 2]]) >>> values = torch.tensor([1.0, 2.0, 1.0, 3.0, 1.0, 2.0, 1.0, 1.0]) >>> A = torch.sparse_coo_tensor(indices, values, (5, 3)).coalesce() >>> B = torch.randn(5) >>> x = sparse_generic_lstsq(A, B) >>> x.shape torch.Size([3])
>>> # Multiple RHS: >>> Bm = torch.randn(5, 4) >>> Xm = sparse_generic_lstsq(A, Bm) >>> Xm.shape torch.Size([3, 4])
>>> # Custom solver: >>> from torchsparsegradutils.utils import lsmr >>> def my_lstsq(A_, B_): ... return lsmr(A_, B_, atol=1e-10, btol=1e-10)[0] >>> _ = sparse_generic_lstsq(A, B, lstsq=my_lstsq)
>>> # Gradients: >>> A.requires_grad_(True) tensor(...) >>> B.requires_grad_(True) tensor(...) >>> x = sparse_generic_lstsq(A, B) >>> loss = x.sum() # Simple loss to preserve sparsity >>> loss.backward() >>> A.grad.is_sparse True