---
title: "NNLS Solver Implementation Details"
author: "Zaoqu Liu"
date: "`r Sys.Date()`"
output: 
  rmarkdown::html_vignette:
    toc: true
    toc_depth: 3
vignette: >
  %\VignetteIndexEntry{NNLS Solver Implementation Details}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.width = 7,
  fig.height = 5,
  fig.align = "center"
)
```

## Overview

CellProgramMapper implements two algorithms for solving the Non-Negative Least Squares (NNLS) problem:

1. **Coordinate Descent** (default, faster)
2. **Active Set Method** (Lawson-Hanson algorithm)

Both solvers are implemented in C++ using RcppArmadillo for optimal performance.

## The NNLS Problem

Given matrix $\mathbf{A} \in \mathbb{R}^{m \times n}$ and vector $\mathbf{b} \in \mathbb{R}^m$, NNLS solves:

$$\min_{\mathbf{x} \geq 0} \|\mathbf{A}\mathbf{x} - \mathbf{b}\|_2^2$$

## Active Set Method (Lawson-Hanson)

### Algorithm Description

The active set method partitions variables into two sets:

- **P (Positive set)**: Variables that can be positive
- **Z (Zero set)**: Variables constrained to zero

```
Algorithm: Lawson-Hanson NNLS
Input: A (m x n matrix), b (m-vector)
Output: x (n-vector) with x >= 0

1. Initialize: x = 0, P = empty, Z = {1,...,n}
2. Compute gradient: w = A'(b - Ax)
3. While Z is not empty and max(w[Z]) > 0:
   a. Move index of max(w[Z]) from Z to P
   b. Inner loop:
      - Solve unconstrained LS on P: z[P] = argmin ||A[:,P]z - b||^2
      - While any z[P] <= 0:
        * Compute step alpha to boundary
        * Update x = x + alpha(z - x)
        * Move zero indices from P to Z
   c. Update gradient: w = A'(b - Ax)
4. Return x
```

### Convergence Properties

- **Finite termination**: Guaranteed to terminate in at most $\binom{n+m}{n}$ iterations
- **Exact solution**: Returns the globally optimal solution
- **Numerical stability**: Uses QR decomposition for least squares subproblems

### Implementation

```cpp
// Simplified C++ implementation (actual code in src/nnls_solver.cpp)
arma::vec nnls_single(const arma::mat& A, const arma::vec& b, 
                      int max_iter, double tol) {
    int n = A.n_cols;
    arma::vec x = arma::zeros(n);
    arma::vec w = A.t() * b;  // Initial gradient
    
    arma::uvec P_set, Z_set = arma::regspace<arma::uvec>(0, n-1);
    
    while (Z_set.n_elem > 0 && iter < max_iter) {
        // Find index with max positive gradient in Z
        arma::uword t = Z_set(w.elem(Z_set).index_max());
        if (w(t) <= tol) break;
        
        // Move t from Z to P
        P_set = join_cols(P_set, {t});
        Z_set.shed_row(find_idx);
        
        // Inner loop: solve unconstrained subproblem
        while (true) {
            arma::vec z_P = solve(A.cols(P_set), b);
            if (all(z_P > 0)) {
                x.elem(P_set) = z_P;
                break;
            }
            // Handle negative values...
        }
        w = A.t() * (b - A * x);
    }
    return x;
}
```

## Coordinate Descent Method

### Algorithm Description

Coordinate descent optimizes one variable at a time while keeping others fixed.

For NNLS, we work with the normal equations:
$$\mathbf{Q}\mathbf{x} = \mathbf{c}$$

where $\mathbf{Q} = \mathbf{A}^\top\mathbf{A}$ and $\mathbf{c} = \mathbf{A}^\top\mathbf{b}$.

```
Algorithm: Coordinate Descent NNLS
Input: Q = A'A, c = A'b
Output: x (n-vector) with x >= 0

1. Initialize: x = 0, g = -c (gradient at x=0)
2. Repeat until convergence:
   For j = 1 to n:
     a. Compute update: delta = -g[j] / Q[j,j]
     b. Project: x_new = max(0, x[j] + delta)
     c. Update gradient: g = g + (x_new - x[j]) * Q[:,j]
     d. x[j] = x_new
3. Return x
```

### Advantages

- **Simplicity**: Easy to implement and understand
- **Efficiency**: Each iteration is $O(n)$ for gradient update
- **Precomputation**: $\mathbf{Q} = \mathbf{A}^\top\mathbf{A}$ computed once for batch processing

### Implementation

```cpp
// Simplified C++ implementation
arma::vec nnls_coord_descent(const arma::mat& AtA, const arma::vec& Atb,
                             int max_iter, double tol) {
    int n = AtA.n_cols;
    arma::vec x = arma::zeros(n);
    arma::vec grad = -Atb;  // Gradient at x=0
    
    for (int iter = 0; iter < max_iter; iter++) {
        double max_change = 0;
        
        for (int j = 0; j < n; j++) {
            if (AtA(j,j) < 1e-12) continue;  // Skip zero diagonal
            
            double old_xj = x(j);
            double new_xj = std::max(0.0, old_xj - grad(j) / AtA(j,j));
            
            if (new_xj != old_xj) {
                double delta = new_xj - old_xj;
                x(j) = new_xj;
                grad += delta * AtA.col(j);  // Efficient gradient update
                max_change = std::max(max_change, std::abs(delta));
            }
        }
        
        if (max_change < tol) break;
    }
    return x;
}
```

## Comparison

```{r comparison-table, echo=FALSE, results='asis'}
df <- data.frame(
  Aspect = c("Time per iteration", "Iterations to converge", 
             "Memory", "Convergence guarantee", "Best for"),
  ActiveSet = c("O(k^3)", "Usually fewer", "O(p*k)", 
                   "Finite", "Small k"),
  CoordinateDescent = c("O(k^2)", "Usually more", "O(k^2)", 
                           "Linear", "Large k")
)
knitr::kable(df, col.names = c("Aspect", "Active Set", "Coordinate Descent"))
```

## Benchmark

```{r benchmark, eval=TRUE, fig.cap="Solver performance comparison"}
library(CellProgramMapper)

# Benchmark function using CellProgramMapper
benchmark_solver <- function(n_cells, n_genes, n_programs, method) {
  set.seed(42)
  X <- matrix(abs(rnorm(n_cells * n_genes)), nrow = n_cells)
  colnames(X) <- paste0("Gene", 1:n_genes)
  H <- matrix(abs(rnorm(n_programs * n_genes)), nrow = n_programs)
  colnames(H) <- paste0("Gene", 1:n_genes)
  rownames(H) <- paste0("GEP", 1:n_programs)
  
  start_time <- Sys.time()
  result <- CellProgramMapper(
    query = X,
    reference = H,
    method = method,
    verbose = FALSE
  )
  end_time <- Sys.time()
  
  as.numeric(difftime(end_time, start_time, units = "secs"))
}

# Run benchmarks
n_cells_vec <- c(100, 500, 1000)
n_genes <- 200
n_programs <- 10

results <- data.frame()
for (n_cells in n_cells_vec) {
  for (method in c("cd", "active_set")) {
    time_taken <- benchmark_solver(n_cells, n_genes, n_programs, method)
    results <- rbind(results, data.frame(
      n_cells = n_cells,
      method = method,
      time = time_taken
    ))
  }
}

# Plot
par(mar = c(4, 4, 2, 1))
cd_times <- results$time[results$method == "cd"]
as_times <- results$time[results$method == "active_set"]

barplot(rbind(cd_times, as_times), beside = TRUE,
        names.arg = n_cells_vec,
        col = c("#1976d2", "#388e3c"),
        xlab = "Number of cells",
        ylab = "Time (seconds)",
        main = "NNLS Solver Performance")
legend("topleft", legend = c("Coordinate Descent", "Active Set"),
       fill = c("#1976d2", "#388e3c"), bty = "n")
```

## Choosing a Solver

Use **Coordinate Descent** (default) when:

- Processing large numbers of cells (>1000)
- Number of programs is moderate (5-50)
- Speed is the priority

Use **Active Set** when:

- High numerical precision is required
- Number of programs is small (<10)
- Guaranteed finite convergence is important

```r
# Use coordinate descent (default)
result <- CellProgramMapper(query, reference, method = "cd")

# Use active set method
result <- CellProgramMapper(query, reference, method = "active_set")
```

## Numerical Considerations

### Tolerance Parameter

The `tol` parameter controls convergence:

- Smaller values: more iterations, higher precision
- Larger values: fewer iterations, faster but less precise
- Default: `1e-8` (good balance)

### Maximum Iterations

The `max_iter` parameter prevents infinite loops:

- Default: `1000` (sufficient for most problems)
- Increase for very large programs or ill-conditioned problems

### Division by Zero Protection

Both solvers include protection against division by zero:

```cpp
if (AtA(j,j) < 1e-12) continue;  // Skip near-zero diagonal
```

## Session Info

```{r session}
sessionInfo()
```