NOVA implements a comprehensive computational framework for inferring cell-to-cell communication networks based on ligand-receptor co-expression patterns. This document details the mathematical foundations and algorithmic implementations underlying NOVA’s analysis pipeline.
Intercellular communication is a fundamental biological process where cells exchange information through signaling molecules. In the context of transcriptomic analysis, we focus on ligand-receptor (L-R) interactions, where:
The strength of a potential communication event can be estimated from gene expression data by examining the co-expression of ligand-receptor pairs between cell populations.
For a given gene \(g\) in cluster \(c\), NOVA computes:
\[ \text{pct}_{g,c} = \frac{|\{i : x_{g,i} > 0, i \in c\}|}{|c|} \]
where \(x_{g,i}\) is the expression of gene \(g\) in cell \(i\), and \(|c|\) is the number of cells in cluster \(c\).
\[ \bar{x}_{g,c} = \frac{1}{|c|} \sum_{i \in c} x_{g,i} \]
The specificity score quantifies how preferentially a gene is expressed in a given cluster relative to all clusters:
\[ S_{g,c} = \frac{\bar{x}_{g,c}}{\sum_{c' \in C} \bar{x}_{g,c'}} \]
where \(C\) is the set of all clusters. This metric ranges from 0 to 1, with higher values indicating more cluster-specific expression.
Properties: - \(\sum_{c \in C} S_{g,c} = 1\) for each gene - \(S_{g,c} = 0\) if gene \(g\) is not expressed in cluster \(c\) - \(S_{g,c} = 1\) if gene \(g\) is only expressed in cluster \(c\)
For a ligand-receptor pair \((L, R)\) between sending cluster \(s\) and target cluster \(t\):
\[ W_{\text{expr}}(L,R,s,t) = \bar{x}_{L,s} \times \bar{x}_{R,t} \]
\[ W_{\text{spec}}(L,R,s,t) = S_{L,s} \times S_{R,t} \]
NOVA applies the following filters to identify biologically meaningful interactions:
Detection threshold: Both ligand and receptor must be detected in a minimum percentage of cells: \[ \text{pct}_{L,s} > \theta_{\text{pct}} \quad \text{and} \quad \text{pct}_{R,t} > \theta_{\text{pct}} \]
Expression threshold: The edge weight must exceed a minimum value: \[ W_{\text{expr}}(L,R,s,t) > \theta_{\text{expr}} \]
# Pseudocode for cluster statistics
compute_cluster_stats <- function(expr, clusters) {
for each cluster c:
cells_in_c <- get_cells(clusters, c)
for each gene g:
n_expressing <- sum(expr[g, cells_in_c] > 0)
pct[g,c] <- n_expressing / length(cells_in_c)
mean[g,c] <- mean(expr[g, cells_in_c])
# Compute specificity (row normalization)
for each gene g:
row_sum <- sum(mean[g, ])
if row_sum > 0:
specificity[g, ] <- mean[g, ] / row_sum
return list(pct, mean, specificity)
}# Pseudocode for edge computation
compute_edges <- function(lr_pairs, lig_stats, rec_stats, clusters) {
edges <- list()
for each pair (L, R) in lr_pairs:
for each sending cluster s:
for each target cluster t:
# Check detection threshold
if lig_stats$pct[L,s] > min_pct AND
rec_stats$pct[R,t] > min_pct:
# Compute weights
weight_expr <- lig_stats$mean[L,s] * rec_stats$mean[R,t]
weight_spec <- lig_stats$spec[L,s] * rec_stats$spec[R,t]
if weight_expr > 0:
edges.add(L, R, s, t, weight_expr, weight_spec)
return edges
}For comparing communication between two conditions (reference vs. target):
\[ \text{log}_2\text{FC} = \log_2\left(\frac{W_{\text{target}} + \epsilon}{W_{\text{reference}} + \epsilon}\right) \]
where \(\epsilon\) is a small constant (default: 0.001) to avoid division by zero.
\[ \Delta_{\text{expr}} = W_{\text{expr,target}} - W_{\text{expr,reference}} \]
\[ \Delta_{\text{spec}} = W_{\text{spec,target}} - W_{\text{spec,reference}} \]
NOVA utilizes the connectomeDB2020 database, which provides:
| Database | Description | Pairs |
|---|---|---|
| lrc2p | Literature-curated, high-confidence pairs | 2,293 |
| lrc2a | Extended set including predictions | ~15,000 |
library(NOVA)
# Load database
lr_db <- GetLRDatabase("lrc2p")
str(lr_db)
#> Classes 'data.table' and 'data.frame': 2293 obs. of 2 variables:
#> $ ligand : chr "A2M" "AANAT" "AANAT" "ACE" ...
#> $ receptor: chr "LRP1" "MTNR1A" "MTNR1B" "BDKRB2" ...
#> - attr(*, ".internal.selfref")=<pointer: 0x55d386e54b60>
head(lr_db)
#> ligand receptor
#> <char> <char>
#> 1: A2M LRP1
#> 2: AANAT MTNR1A
#> 3: AANAT MTNR1B
#> 4: ACE BDKRB2
#> 5: ADAM10 EPHA3
#> 6: ADAM11 ITGA4NOVA supports cross-species analysis through NCBI HomoloGene:
# Get supported species
species_list <- supported_species()
print(species_list)
#> human mouse chimpanzee
#> "9606" "10090" "9598"
#> dog monkey cattle
#> "9615" "9544" "9913"
#> rat chicken frog
#> "10116" "9031" "8364"
#> zebrafish fruitfly mosquito
#> "7955" "7227" "7165"
#> nematode thalecress rice
#> "6239" "3702" "4530"
#> riceblastfungus bakeryeast neurosporacrassa
#> "318829" "4932" "5141"
#> fissionyeast eremotheciumgossypii kluyveromyceslactis
#> "4896" "33169" "28985"
# Convert mouse genes to human
# mouse_genes <- c("Cd4", "Cd8a", "Ptprc")
# human_orthologs <- ConvertGeneSymbols(mouse_genes, from = "mouse", to = "human")NOVA supports 21 species including:
NOVA employs several optimization strategies:
data.table
for efficient data manipulationMatrix
package for memory efficiencyfuture package| Operation | Time Complexity | Space Complexity |
|---|---|---|
| Cluster stats | O(G × N) | O(G × C) |
| Edge computation | O(P × C²) | O(E) |
| Filtering | O(E) | O(E) |
Where: - G = number of genes - N = number of cells - C = number of clusters - P = number of LR pairs - E = number of edges
Hou, R., Denisenko, E., Ong, H.T. et al. Predicting cell-to-cell communication networks using NATMI. Nat Commun 11, 5011 (2020).
Ramilowski, J.A., et al. A draft network of ligand-receptor-mediated multicellular signalling in human. Nat Commun 6, 7866 (2015).