SpaTalk employs a multi-step computational framework to infer spatially resolved cell-cell communications from spatial transcriptomics data. This vignette provides detailed explanations of the underlying algorithms. ## Algorithmic Pipeline
┌─────────────────────────────────────────────────────────────────┐
│ SpaTalk Pipeline │
├─────────────────────────────────────────────────────────────────┤
│ 1. Cell-type Deconvolution (for spot-based ST) │
│ └── Non-negative Linear Model (NNLM) │
│ │
│ 2. Spatial Reconstruction │
│ └── Monte Carlo sampling + k-NN spatial mapping │
│ │
│ 3. LR-Pathway Filtering │
│ └── Knowledge graph: LR pairs → Downstream TFs │
│ │
│ 4. CCI Inference │
│ └── Co-expression analysis + Permutation test │
│ │
│ 5. Pathway Scoring │
│ └── Random walk on gene-gene interaction graph │
└─────────────────────────────────────────────────────────────────┘
For spot-based ST data where each spot contains multiple cells, SpaTalk decomposes the observed expression profile into cell-type-specific contributions using Non-negative Matrix Factorization (NMF).
Given: - \(X \in \mathbb{R}^{G \times S}\): ST expression matrix (genes × spots) - \(R \in \mathbb{R}^{G \times C}\): Reference expression profile (genes × cell types)
The deconvolution problem is formulated as:
\[X \approx R \cdot W\]
where \(W \in \mathbb{R}^{C \times S}\) represents cell-type proportions per spot, subject to: - \(W_{ij} \geq 0\) (non-negativity) - \(\sum_i W_{ij} = 1\) (proportions sum to 1)
SpaTalk uses the NNLM (Non-Negative Linear Model) package with the following optimization:
\[\min_{W \geq 0} \|X - RW\|_F^2 + \lambda \|W\|_1\]
The L1 regularization term promotes sparsity in cell-type assignments.
After deconvolution, SpaTalk reconstructs single-cell resolution expression by sampling cells from the reference data according to:
Cell proportion sampling: For each spot \(s\) with predicted proportions \(W_s\), sample \(N_s\) cells where \(N_s\) is the expected number of cells per spot.
Spatial coordinate assignment: Assign coordinates to sampled cells using a weighted distance model:
\[P(x_i | s) \propto \exp\left(-\frac{d(x_i, c_s)^2}{2\sigma^2}\right)\]
where \(c_s\) is the spot centroid and \(\sigma\) controls the spatial spread.
For each reconstructed cell, find the k-nearest neighbors in the original ST data to refine expression estimates:
\[\hat{E}_i = \frac{1}{|N_k(i)|} \sum_{j \in N_k(i)} E_j \cdot w_{ij}\]
where \(w_{ij}\) is the distance-based weight.
SpaTalk integrates curated biological knowledge from multiple databases:
| Database | Content | Size |
|---|---|---|
| CellTalkDB | Ligand-receptor pairs | 3,398 (Human), 2,033 (Mouse) |
| KEGG | Signaling pathways | ~300 pathways |
| Reactome | Pathway interactions | ~2,000 pathways |
| AnimalTFDB | Transcription factors | ~1,600 TFs |
For a ligand-receptor pair (L, R) between sender cell type A and receiver cell type B:
\[\text{CoExp}(L_A, R_B) = \frac{|\{(a,b) : E_L^a > 0 \land E_R^b > 0 \land d(a,b) < \delta\}|}{|\{(a,b) : d(a,b) < \delta\}|}\]
where \(\delta\) is the spatial distance threshold.
Statistical significance is assessed via permutation:
\[p = \frac{|\{\pi : \text{CoExp}^\pi \geq \text{CoExp}^{obs}\}| + 1}{n + 1}\]
To score downstream transcription factors activated by a receptor, SpaTalk performs random walks on the gene-gene interaction (GGI) graph:
Algorithm: Random Walk TF Scoring
Input: GGI graph G, receptor R, TF set T, n_walks, max_hop
Output: TF scores
Initialize: score[tf] = 0 for all tf in T
for i = 1 to n_walks:
current = R
for hop = 1 to max_hop:
neighbors = G.neighbors(current)
if neighbors is empty: break
next = random_choice(neighbors)
if next in T:
score[next] += 1
current = next
Return: score / n_walks
SpaTalk uses optimized C++ code for performance-critical computations:
// Simplified random walk implementation
NumericVector cpp_random_walk(
CharacterVector ggi_src,
CharacterVector ggi_dest,
String receptor_name,
CharacterVector tf_names,
int n_walks = 10000,
int max_hop = 10
) {
// Build adjacency list
// Perform random walks
// Count TF visits
// Return normalized scores
}| Step | Time Complexity | Space Complexity |
|---|---|---|
| Deconvolution | \(O(G \cdot S \cdot C)\) | \(O(G \cdot S)\) |
| Spatial reconstruction | \(O(S \cdot N \cdot k)\) | \(O(N)\) |
| Co-expression | \(O(L \cdot N^2)\) | \(O(N^2)\) |
| Permutation test | \(O(P \cdot L \cdot N^2)\) | \(O(N^2)\) |
| Random walk | \(O(W \cdot H)\) | \(O(|E|)\) |
Where: G = genes, S = spots, C = cell types, N = cells, L = LR pairs, P = permutations, W = walks, H = max hops, E = edges.
Shao, X., et al. (2022). Knowledge-graph-based cell-cell communication inference for spatially resolved transcriptomic data with SpaTalk. Nature Communications, 13, 4429.
Lin, X., & Boutros, P. C. (2020). Optimization and expansion of non-negative matrix factorization. BMC Bioinformatics, 21, 7.
Kanehisa, M., & Goto, S. (2000). KEGG: Kyoto Encyclopedia of Genes and Genomes. Nucleic Acids Research, 28(1), 27-30.
sessionInfo()
#> R version 4.6.1 (2026-06-24)
#> Platform: x86_64-pc-linux-gnu
#> Running under: Ubuntu 26.04 LTS
#>
#> Matrix products: default
#> BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
#> LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.32.so; LAPACK version 3.12.0
#>
#> locale:
#> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
#> [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
#> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
#> [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
#> [9] LC_ADDRESS=C LC_TELEPHONE=C
#> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
#>
#> time zone: Etc/UTC
#> tzcode source: system (glibc)
#>
#> attached base packages:
#> [1] stats graphics grDevices utils datasets methods base
#>
#> other attached packages:
#> [1] rmarkdown_2.31
#>
#> loaded via a namespace (and not attached):
#> [1] digest_0.6.39 R6_2.6.1 fastmap_1.2.0 xfun_0.59
#> [5] maketools_1.3.2 cachem_1.1.0 knitr_1.51 htmltools_0.5.9
#> [9] buildtools_1.0.0 lifecycle_1.0.5 cli_3.6.6 sass_0.4.10
#> [13] jquerylib_0.1.4 compiler_4.6.1 sys_3.4.3 tools_4.6.1
#> [17] evaluate_1.0.5 bslib_0.11.0 yaml_2.3.12 otel_0.2.0
#> [21] jsonlite_2.0.0 rlang_1.2.0