---
title: "MAGIC Algorithm: Mathematical Foundations"
author: "Zaoqu Liu"
date: "`r Sys.Date()`"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{MAGIC Algorithm: Mathematical Foundations}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.width = 7,
  fig.height = 5,
  fig.align = "center",
  message = FALSE,
  warning = FALSE
)
```

## Introduction

This vignette provides a comprehensive mathematical description of the MAGIC algorithm. Understanding these foundations helps users make informed decisions about parameter selection and interpret results correctly.

## The Dropout Problem in scRNA-seq

Single-cell RNA sequencing captures transcriptomic profiles at single-cell resolution, but suffers from technical limitations:

1. **Low capture efficiency**: Only 10-20% of mRNA molecules are captured
2. **Amplification bias**: PCR amplification introduces noise
3. **Dropout events**: Expressed genes appear as zeros due to sampling

These issues result in sparse, noisy count matrices where biological signal is obscured.

## MAGIC: A Diffusion-Based Solution

MAGIC leverages the **manifold hypothesis**: high-dimensional single-cell data lies on a lower-dimensional manifold reflecting biological states. By diffusing information along this manifold, MAGIC recovers missing transcripts while preserving biological structure.

## Algorithm Steps

### Step 1: Dimensionality Reduction (Optional)

For computational efficiency, MAGIC first reduces dimensionality using PCA:

$$X_{pca} = X \cdot V_k$$

where $V_k$ contains the top $k$ principal components (default: 100).

### Step 2: k-Nearest Neighbor Graph

For each cell $i$, identify its $k$ nearest neighbors based on Euclidean distance in PCA space:

$$\mathcal{N}_k(i) = \{j : d(x_i, x_j) \leq d_k(x_i)\}$$

where $d_k(x_i)$ is the distance to the $k$-th nearest neighbor.

### Step 3: α-Decaying Kernel

The affinity between cells is computed using an α-decaying kernel with **adaptive bandwidth**:

$$K(x_i, x_j) = \exp\left(-\left(\frac{\|x_i - x_j\|_2}{\varepsilon_i}\right)^\alpha\right)$$

where:

- $\varepsilon_i = d_k(x_i)$ is the adaptive bandwidth (distance to $k$-th neighbor)
- $\alpha$ controls kernel sharpness (default: 1)

**Why adaptive bandwidth?**

- Accounts for varying local density
- Cells in dense regions have smaller bandwidth
- Cells in sparse regions have larger bandwidth
- Preserves rare cell populations

```{r fig.width=8, fig.height=4}
# Visualize α-decaying kernel
x <- seq(0, 3, length.out = 100)

par(mfrow = c(1, 2))

# Effect of distance
plot(x, exp(-x), type = "l", lwd = 2, col = "#e74c3c",
     xlab = "Normalized Distance (d/ε)", ylab = "Affinity K(x,y)",
     main = "α-Decaying Kernel (α = 1)")
abline(h = 0.5, lty = 2, col = "gray")
abline(v = log(2), lty = 2, col = "gray")

# Effect of α
plot(x, exp(-x^0.5), type = "l", lwd = 2, col = "#3498db",
     xlab = "Normalized Distance (d/ε)", ylab = "Affinity K(x,y)",
     main = "Effect of α Parameter", ylim = c(0, 1))
lines(x, exp(-x^1), lwd = 2, col = "#e74c3c")
lines(x, exp(-x^2), lwd = 2, col = "#2ecc71")
legend("topright", legend = c("α = 0.5", "α = 1", "α = 2"),
       col = c("#3498db", "#e74c3c", "#2ecc71"), lwd = 2)
```

### Step 4: Markov Normalization

The kernel matrix is row-normalized to create a **Markov transition matrix**:

$$P = D^{-1}K$$

where $D$ is the diagonal degree matrix with $D_{ii} = \sum_j K_{ij}$.

**Properties of P:**

- Row-stochastic: $\sum_j P_{ij} = 1$
- Represents transition probabilities in a random walk
- Eigenvalues in $[-1, 1]$

### Step 5: Diffusion (Powering)

The diffusion operator is powered $t$ times:

$$X_{imputed} = P^t X$$

Each power of $P$ propagates information one step along the graph:

- $t = 1$: Local averaging with immediate neighbors
- $t = 2$: Information from 2-hop neighbors
- $t = t$: Information from $t$-hop neighborhood

```{r fig.width=8, fig.height=4}
# Simulate diffusion effect
set.seed(42)
n <- 100
x <- c(rnorm(50, -2, 0.5), rnorm(50, 2, 0.5))
y <- c(rnorm(50, 0, 0.5), rnorm(50, 0, 0.5))

# Add noise
x_noisy <- x + rnorm(n, 0, 0.3)
y_noisy <- y + rnorm(n, 0, 0.3)

par(mfrow = c(1, 2))
plot(x_noisy, y_noisy, pch = 16, col = adjustcolor("#3498db", 0.6),
     main = "Before Diffusion (Noisy)", xlab = "Gene 1", ylab = "Gene 2")

# Simple smoothing simulation
x_smooth <- 0.7 * x + 0.3 * x_noisy
y_smooth <- 0.7 * y + 0.3 * y_noisy
plot(x_smooth, y_smooth, pch = 16, col = adjustcolor("#e74c3c", 0.6),
     main = "After Diffusion (Denoised)", xlab = "Gene 1", ylab = "Gene 2")
```

## Automatic t Selection

When `t = "auto"`, MAGIC uses **Procrustes analysis** to find optimal diffusion time.

### Procrustes Disparity

Given two matrices $X$ and $Y$, Procrustes analysis finds the optimal rotation $R$ that minimizes:

$$d_{Procrustes}(X, Y) = \|X - YR\|_F^2$$

after centering and normalizing both matrices.

### Convergence Criterion

MAGIC iteratively applies diffusion and monitors:

$$\Delta_t = d_{Procrustes}(X_t, X_{t+1})$$

The algorithm stops when $\Delta_t < \theta$ (default: 0.001), indicating convergence.

```{r}
library(MAGICR)
data(magic_testdata)

# Run with automatic t selection
result <- magic(magic_testdata, t = "auto", t_max = 15, verbose = TRUE)

cat("\nOptimal t selected:", result$params$t, "\n")
```

## Solver Options

### Exact Solver

Computes imputation in full gene space:

$$X_{imputed} = P^t X$$

**Advantages:**
- Preserves all gene-gene relationships
- No approximation error

**Disadvantages:**
- Memory intensive for large gene sets
- Slower computation

### Approximate Solver

Operates in PCA space and projects back:

$$X_{imputed} = P^t X_{pca} \cdot V_k^T$$

**Advantages:**
- Much faster for large datasets
- Lower memory footprint

**Disadvantages:**
- Some information loss from PCA projection

## Spectral Interpretation

The diffusion process has an elegant spectral interpretation. If $P$ has eigendecomposition:

$$P = \sum_i \lambda_i v_i v_i^T$$

Then:

$$P^t = \sum_i \lambda_i^t v_i v_i^T$$

Since $|\lambda_i| \leq 1$:

- Small eigenvalues (high-frequency noise) decay rapidly
- Large eigenvalues (low-frequency signal) persist
- Diffusion acts as a **low-pass filter**

## Practical Recommendations

### Parameter Selection Guidelines

| Scenario | Recommended Parameters |
|----------|----------------------|
| Standard scRNA-seq | `t = 3-5`, `knn = 5`, `decay = 1` |
| Highly sparse data | `t = 5-10`, `knn = 10` |
| Rare populations | `t = 2-3`, `decay = 2` |
| Large datasets | `solver = "approximate"`, `npca = 50` |

### When to Use MAGIC

✅ **Good use cases:**
- Visualizing gene-gene relationships
- Trajectory analysis preprocessing
- Recovering dropout in sparse data

❌ **Avoid when:**
- Performing differential expression (use raw counts)
- Data is already dense
- Preserving exact count values is critical

## References

1. van Dijk, D., et al. (2018). Recovering Gene Interactions from Single-Cell Data Using Data Diffusion. *Cell*, 174(3), 716-729.e27.

2. Coifman, R.R., & Lafon, S. (2006). Diffusion maps. *Applied and Computational Harmonic Analysis*, 21(1), 5-30.

3. Moon, K.R., et al. (2019). Visualizing structure and transitions in high-dimensional biological data. *Nature Biotechnology*, 37(12), 1482-1492.

## Session Info

```{r}
sessionInfo()
```