Connectome is built on the premise that cell-cell communication can be inferred from the co-expression patterns of ligand-receptor pairs across distinct cell populations. This document describes the mathematical framework underlying the analysis.
Connectome utilizes the FANTOM5 (Functional Annotation of the Mammalian Genome 5) ligand-receptor database, which provides curated pairs of interacting molecules:
\[\mathcal{P} = \{(L_k, R_k, M_k)\}_{k=1}^{N}\]
Where:

- \(L_k\): Ligand gene symbol
- \(R_k\): Receptor gene symbol
- \(M_k\): Signaling mode/family classification
- \(N\): Total number of pairs (~2,557 for human)
Pairs are classified by evidence strength:
| Level | Description |
|---|---|
| Literature supported | Experimentally validated interactions |
| Putative | Computationally predicted interactions |
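For each cell population \(i\), with cell set \(C_i\), and gene \(g\), where \(E_{c,g}\) is the normalized expression of gene \(g\) in cell \(c\):
Mean normalized expression: \[\bar{E}_{i,g} = \frac{1}{|C_i|} \sum_{c \in C_i} E_{c,g}\]
Scaled expression (z-score), where \(\mu_g\) and \(\sigma_g\) are the mean and standard deviation of gene \(g\): \[Z_{i,g} = \frac{\bar{E}_{i,g} - \mu_g}{\sigma_g}\]
Percent expression (the fraction of cells in \(C_i\) that express \(g\)): \[P_{i,g} = \frac{|\{c \in C_i : E_{c,g} > 0\}|}{|C_i|}\]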
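A minimal base-R sketch of the three per-population summaries, assuming `expr` is a genes × cells matrix of normalized expression and `clusters` gives each cell's population. The function name `population_stats` is hypothetical, and taking \(\mu_g\) and \(\sigma_g\) across the population means (rather than across individual cells) is an assumption made only for this illustration.

```r
# Per-population summaries from a genes x cells matrix (illustrative sketch).
# Assumption: mu_g and sigma_g are computed across population means of gene g.
population_stats <- function(expr, clusters) {
  clusters <- factor(clusters)
  # Mean normalized expression: genes x populations
  mean_expr <- sapply(levels(clusters), function(cl) {
    rowMeans(expr[, clusters == cl, drop = FALSE])
  })
  # Fraction of cells in each population with non-zero expression
  pct_expr <- sapply(levels(clusters), function(cl) {
    rowMeans(expr[, clusters == cl, drop = FALSE] > 0)
  })
  # Z-score each gene (row) across populations
  z <- t(scale(t(mean_expr)))
  list(mean = mean_expr, z = z, pct = pct_expr)
}

expr <- matrix(c(0, 1, 2, 3,
                 4, 0, 0, 0), nrow = 2, byrow = TRUE,
               dimnames = list(c("g1", "g2"), NULL))  # toy values
res <- population_stats(expr, c("A", "A", "B", "B"))
res$mean
```

In this sketch a gene with identical means in every population has \(\sigma_g = 0\) and receives `NaN` z-scores.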
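Given source population \(i\), target population \(j\), and ligand-receptor pair \((L, R)\), where \(E_{i,L}\) denotes a population-level expression value of ligand \(L\) in population \(i\) (for example \(\bar{E}_{i,L}\)):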
Product (Default): \[w_{ij}^{LR} = E_{i,L} \times E_{j,R}\]
Sum: \[w_{ij}^{LR} = E_{i,L} + E_{j,R}\]
Mean: \[w_{ij}^{LR} = \frac{E_{i,L} + E_{j,R}}{2}\]
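The product formulation mirrors mass-action binding, in which complex formation scales with the product of ligand and receptor abundance; it also sets the edge weight to zero whenever either partner is not expressed.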
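The three weighting schemes can be sketched in base R as an outer product over populations. `edge_weights` is a hypothetical helper, and `pop_expr` is assumed to be a genes × populations matrix of population-level expression (for example \(\bar{E}_{i,g}\)); the gene names `L1` and `R1` are placeholders.

```r
# Edge weights for one ligand-receptor pair across all source (rows) x
# target (columns) populations. Illustrative sketch, not package code.
edge_weights <- function(pop_expr, ligand, receptor,
                         method = c("product", "sum", "mean")) {
  method <- match.arg(method)
  lig <- pop_expr[ligand, ]    # E_{i,L} for every source population i
  rec <- pop_expr[receptor, ]  # E_{j,R} for every target population j
  switch(method,
    product = outer(lig, rec, "*"),
    sum     = outer(lig, rec, "+"),
    mean    = outer(lig, rec, "+") / 2
  )
}

pop_expr <- matrix(c(2, 0,
                     1, 3), nrow = 2, byrow = TRUE,
                   dimnames = list(c("L1", "R1"), c("A", "B")))  # toy values
edge_weights(pop_expr, "L1", "R1")          # product
edge_weights(pop_expr, "L1", "R1", "sum")   # sum
```

Population `B` does not express `L1`, yet with `method = "sum"` the edge B → A still gets a non-zero weight because `A` expresses `R1`; the product assigns that edge zero.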
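For each gene \(g\) in cluster \(i\), a Wilcoxon rank-sum test compares expression in the cells of cluster \(i\) (\(C_i\)) with expression in all other cells (\(C_{\backslash i}\), the background). Under a location-shift assumption, the null hypothesis is: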
\[H_0: \text{median}(E_{C_i,g}) = \text{median}(E_{C_{\backslash i},g})\]
The test statistic: \[W = \sum_{c \in C_i} R_c\]
Where \(R_c\) is the rank of cell \(c\) in the combined sample.
P-values are adjusted with the Bonferroni correction: \[p_{adj} = \min(p \times m, 1)\]
Where \(m\) is the number of tests performed.
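The rank-sum statistic and the Bonferroni step can be checked against base R's `stats` functions on toy values. This is an illustrative sketch, not Connectome's internal code; `rank_sum_W` is a hypothetical name. Note that `wilcox.test()` reports the Mann-Whitney form of the statistic, \(W - n_1(n_1+1)/2\) with \(n_1 = |C_i|\).

```r
# One-vs-rest rank-sum statistic W for a single gene and cluster.
rank_sum_W <- function(x, in_cluster) {
  r <- rank(x)        # ranks in the combined sample; ties get average ranks
  sum(r[in_cluster])  # W = sum of ranks of the cells in C_i
}

x <- c(5, 3, 8, 1, 0, 2)                          # toy expression values
in_cluster <- c(TRUE, TRUE, TRUE, FALSE, FALSE, FALSE)
W <- rank_sum_W(x, in_cluster)                    # 5 + 4 + 6 = 15

# stats::wilcox.test() reports W - n1 * (n1 + 1) / 2 (Mann-Whitney U)
n1 <- sum(in_cluster)
u <- unname(wilcox.test(x[in_cluster], x[!in_cluster])$statistic)
stopifnot(u == W - n1 * (n1 + 1) / 2)

# Bonferroni: p_adj = min(p * m, 1), with m = length(p_raw)
p_raw <- c(0.01, 0.20, 0.40)
p_adj <- p.adjust(p_raw, method = "bonferroni")   # 0.03, 0.60, 1.00
```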
The DOR quantifies gene specificity for a cell cluster using a 2×2 contingency table:
| | Expressing | Non-expressing |
|---|---|---|
| In cluster | TP | FN |
| Out of cluster | FP | TN |
\[\text{DOR} = \frac{TP \times TN}{FP \times FN}\]
A zero count in any cell of the table makes the DOR zero or undefined. To avoid this, we apply the Haldane-Anscombe correction, adding a pseudocount \(\epsilon = 0.5\) to every cell:
\[\text{DOR}_{corrected} = \frac{(TP + \epsilon)(TN + \epsilon)}{(FP + \epsilon)(FN + \epsilon)}\]
Taking logarithms makes the measure symmetric around zero, so enrichment and depletion of equal strength differ only in sign: \[\log(\text{DOR}) = \log(TP + \epsilon) + \log(TN + \epsilon) - \log(FP + \epsilon) - \log(FN + \epsilon)\]
Interpretation:

- \(\log(\text{DOR}) > 0\): Gene is enriched in cluster
- \(\log(\text{DOR}) < 0\): Gene is depleted in cluster
- \(\log(\text{DOR}) = 0\): No association
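A minimal sketch of the corrected log DOR for one gene and one cluster, assuming `expressing` is a logical vector (cell has non-zero expression) and `in_cluster` marks membership in \(C_i\); `log_dor` is a hypothetical name. The example shows the symmetry: reversing the pattern flips only the sign, and without the pseudocount both cases would divide by zero.

```r
# Log diagnostic odds ratio with a Haldane-Anscombe pseudocount (natural log).
log_dor <- function(expressing, in_cluster, eps = 0.5) {
  tp <- sum(expressing & in_cluster)    # expressing, in cluster
  fn <- sum(!expressing & in_cluster)   # non-expressing, in cluster
  fp <- sum(expressing & !in_cluster)   # expressing, out of cluster
  tn <- sum(!expressing & !in_cluster)  # non-expressing, out of cluster
  log(tp + eps) + log(tn + eps) - log(fp + eps) - log(fn + eps)
}

in_cluster <- c(TRUE, TRUE, TRUE, FALSE, FALSE, FALSE)
log_dor(in_cluster, in_cluster)   # perfect marker: 2 * log(7)
log_dor(!in_cluster, in_cluster)  # perfect anti-marker: -2 * log(7)
```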
The connectome is represented as a directed weighted graph: \[G = (V, E, w)\]
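Where:

- \(V\): Cell populations (nodes)
- \(E\): Signaling edges, each directed from the population expressing the ligand (source) to the population expressing the receptor (target) for one ligand-receptor pair
- \(w\): Edge weights \(w_{ij}^{LR}\)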
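One way to hold \(G = (V, E, w)\) in memory is a plain edge list with one row per (source, target, ligand, receptor) combination, which also makes the \(k^2 \times p\) edge count explicit. This base-R sketch assumes `pop_expr` is a genes × populations matrix and `pairs` is a data frame with `ligand` and `receptor` columns; those column names, the helper `build_edges`, and the gene names `L1`/`R1` are hypothetical, and weights use the product scheme.

```r
# Directed, weighted edge list: one row per (source, target, ligand, receptor).
# Weights use the product scheme w_ij^LR = E_{i,L} * E_{j,R}.
build_edges <- function(pop_expr, pairs) {
  pops <- colnames(pop_expr)
  edges <- lapply(seq_len(nrow(pairs)), function(idx) {
    lig <- as.character(pairs$ligand[idx])
    rec <- as.character(pairs$receptor[idx])
    # All ordered (source, target) combinations, self-loops included
    grid <- expand.grid(source = pops, target = pops, stringsAsFactors = FALSE)
    grid$ligand <- lig
    grid$receptor <- rec
    grid$weight <- unname(pop_expr[lig, grid$source] * pop_expr[rec, grid$target])
    grid
  })
  do.call(rbind, edges)
}

pop_expr <- matrix(c(2, 0,
                     1, 3), nrow = 2, byrow = TRUE,
                   dimnames = list(c("L1", "R1"), c("A", "B")))  # toy values
edges <- build_edges(pop_expr, data.frame(ligand = "L1", receptor = "R1"))
edges[edges$weight > 0, ]  # keep only non-zero edges
```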
For two conditions (reference and test):
\[\text{LFC}_{ij}^{L} = \log_2\left(\frac{E_{i,L}^{test}}{E_{i,L}^{ref}}\right)\]
\[\text{LFC}_{ij}^{R} = \log_2\left(\frac{E_{j,R}^{test}}{E_{j,R}^{ref}}\right)\]
The overall perturbation score combines ligand and receptor changes:
\[S_{ij}^{LR} = |\text{LFC}_{ij}^{L}| \times |\text{LFC}_{ij}^{R}|\]
Because the score is a product, it is large only when both the ligand and the receptor change between conditions, and it is zero if either is unchanged.
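The score can be computed directly from the four expression values of one edge. `perturbation_score` is a hypothetical helper; its optional `pseudo` argument is an assumption not present in the formulas above, added because \(\log_2\) is undefined when either condition has zero expression. It defaults to 0, which reproduces the formulas exactly.

```r
# Perturbation score |LFC_L| * |LFC_R| for one edge (vectorized over edges).
# pseudo: assumed optional pseudocount to keep the log finite at zero expression.
perturbation_score <- function(lig_ref, lig_test, rec_ref, rec_test, pseudo = 0) {
  lfc_l <- log2((lig_test + pseudo) / (lig_ref + pseudo))  # ligand LFC in source i
  lfc_r <- log2((rec_test + pseudo) / (rec_ref + pseudo))  # receptor LFC in target j
  abs(lfc_l) * abs(lfc_r)
}

perturbation_score(1, 4, 2, 1)  # |2| * |-1| = 2
perturbation_score(1, 4, 3, 3)  # receptor unchanged: 0
```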
| Operation | Complexity |
|---|---|
| Expression averaging | \(O(n \times g)\) |
| Edge construction | \(O(k^2 \times p)\) |
| P-value calculation | \(O(k \times g \times n)\) |
| Centrality analysis | \(O(k^2 \times p)\) |
Where:

- \(n\): Number of cells
- \(g\): Number of genes
- \(k\): Number of cell populations
- \(p\): Number of L-R pairs
Connectome uses:

- data.table for efficient data manipulation
- Sparse matrix operations via the Matrix package
- Pre-allocated vectors to avoid memory fragmentation
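Expression averaging can be written as a single sparse matrix product. Multiplying the genes × cells matrix by a cells × populations 0/1 indicator matrix sums each population's cells, and right-multiplying by a diagonal matrix of \(1/|C_i|\) turns those sums into means. With a sparse expression matrix, the work can scale with the number of non-zero entries rather than with \(n \times g\). This sketch uses the Matrix package's `sparseMatrix()` and `Diagonal()`. The helper name and its arguments are hypothetical, and it is not presented as Connectome's internal code.

```r
library(Matrix)

# Per-population means as Ebar = E %*% A %*% diag(1 / |C_i|), where A is a
# cells x populations 0/1 indicator matrix.
cluster_means_sparse <- function(expr, clusters) {
  clusters <- factor(clusters)
  indicator <- sparseMatrix(
    i = seq_along(clusters),        # one row per cell
    j = as.integer(clusters),       # column = the cell's population
    x = 1,
    dims = c(length(clusters), nlevels(clusters))
  )
  sizes <- colSums(indicator)       # |C_i| for each population
  means <- expr %*% indicator %*% Diagonal(x = 1 / sizes)
  colnames(means) <- levels(clusters)
  means
}

expr <- Matrix(c(0, 1, 2, 3,
                 4, 0, 0, 0), nrow = 2, byrow = TRUE, sparse = TRUE)  # toy values
cluster_means_sparse(expr, c("A", "A", "B", "B"))
```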
FANTOM5 Database: Ramilowski, J.A. et al. A draft network of ligand–receptor-mediated multicellular signalling in human. Nat Commun 6, 7866 (2015).
Kleinberg Algorithm: Kleinberg, J.M. Authoritative sources in a hyperlinked environment. Journal of the ACM 46(5), 604–632 (1999).
Haldane-Anscombe Correction: Agresti, A. Categorical Data Analysis. Wiley, 3rd edition (2013).
```r
sessionInfo()
#> R version 4.6.0 (2026-04-24)
#> Platform: x86_64-pc-linux-gnu
#> Running under: Ubuntu 24.04.4 LTS
#>
#> Matrix products: default
#> BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
#> LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.26.so; LAPACK version 3.12.0
#>
#> locale:
#> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
#> [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
#> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
#> [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
#> [9] LC_ADDRESS=C LC_TELEPHONE=C
#> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
#>
#> time zone: Etc/UTC
#> tzcode source: system (glibc)
#>
#> attached base packages:
#> [1] stats graphics grDevices utils datasets methods base
#>
#> other attached packages:
#> [1] rmarkdown_2.31
#>
#> loaded via a namespace (and not attached):
#> [1] digest_0.6.39 R6_2.6.1 fastmap_1.2.0 xfun_0.59
#> [5] maketools_1.3.2 cachem_1.1.0 knitr_1.51 htmltools_0.5.9
#> [9] buildtools_1.0.0 lifecycle_1.0.5 cli_3.6.6 sass_0.4.10
#> [13] jquerylib_0.1.4 compiler_4.6.0 sys_3.4.3 tools_4.6.0
#> [17] evaluate_1.0.5 bslib_0.11.0 yaml_2.3.12 otel_0.2.0
#> [21] jsonlite_2.0.0 rlang_1.2.0
```