Algorithm Principles and Mathematical Framework

Theoretical Foundation

Connectome is built on the premise that cell-cell communication can be inferred from the co-expression patterns of ligand-receptor pairs across distinct cell populations. This document describes the mathematical framework underlying the analysis.

1. Ligand-Receptor Database

FANTOM5 Database

Connectome utilizes the FANTOM5 (Functional Annotation of the Mammalian Genome 5) ligand-receptor database, which provides curated pairs of interacting molecules:

\[\mathcal{P} = \{(L_k, R_k, M_k)\}_{k=1}^{N}\]

Where:
- \(L_k\): Ligand gene symbol
- \(R_k\): Receptor gene symbol
- \(M_k\): Signaling mode/family classification
- \(N\): Total number of pairs (~2,557 for human)

Evidence Levels

Pairs are classified by evidence strength:

| Level | Description |
|---|---|
| Literature supported | Experimentally validated interactions |
| Putative | Computationally predicted interactions |

2. Edge Weight Computation

Expression Metrics

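For each cell population \(i\) and gene \(g\), let \(C_i\) be the set of cells in population \(i\) and \(E_{c,g}\) the normalized expression of gene \(g\) in cell \(c\):

Normalized Expression: \[\bar{E}_{i,g} = \frac{1}{|C_i|} \sum_{c \in C_i} E_{c,g}\]

Scaled Expression (Z-score): \[Z_{i,g} = \frac{\bar{E}_{i,g} - \mu_g}{\sigma_g}\]

Here \(\mu_g\) and \(\sigma_g\) are the mean and standard deviation of gene \(g\)'s expression over the whole dataset.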

Percent Expression: \[P_{i,g} = \frac{|\{c \in C_i : E_{c,g} > 0\}|}{|C_i|}\]
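The three metrics can be sketched in base R on a toy matrix. This is an illustration rather than the package's implementation: the matrix values and the names `expr`, `cluster`, `mean_expr`, `z_expr`, and `pct_expr` are made up for the example, and we assume that \(\mu_g\) and \(\sigma_g\) are computed across all cells.

```r
# Toy expression matrix: rows are cells, columns are genes (illustrative values).
expr <- matrix(
  c(0, 2, 4,
    2, 0, 4,
    1, 1, 0,
    3, 5, 0),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("cell", 1:4), c("geneA", "geneB", "geneC"))
)
cluster <- factor(c("A", "A", "B", "B"))   # population label of each cell
n_per_cluster <- as.vector(table(cluster)) # |C_i|, in the same order as rowsum()

# Mean expression per population (one row per population, one column per gene).
mean_expr <- rowsum(expr, cluster) / n_per_cluster

# Z-score of the population means.
# Assumption: mu_g and sigma_g are taken over all cells in the dataset.
mu     <- colMeans(expr)
sigma  <- apply(expr, 2, sd)
z_expr <- sweep(sweep(mean_expr, 2, mu, "-"), 2, sigma, "/")

# Fraction of cells in each population with non-zero expression.
pct_expr <- rowsum((expr > 0) * 1, cluster) / n_per_cluster
```

In this toy data `geneC` is expressed in every cell of population A and in no cell of population B, so its percent expression is 1 and 0 respectively.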

Edge Weight Functions

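Given source population \(i\), target population \(j\), and ligand-receptor pair \((L, R)\), let \(E_{i,L}\) and \(E_{j,R}\) denote the chosen population-level expression metric (for example \(\bar{E}\) or \(Z\)) for the ligand in the source and the receptor in the target: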

Product (Default): \[w_{ij}^{LR} = E_{i,L} \times E_{j,R}\]

Sum: \[w_{ij}^{LR} = E_{i,L} + E_{j,R}\]

Mean: \[w_{ij}^{LR} = \frac{E_{i,L} + E_{j,R}}{2}\]

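The product formulation is large only when both the ligand and the receptor are expressed and is zero when either is absent, mirroring mass-action binding, in which the rate of complex formation is proportional to the product of ligand and receptor concentrations.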
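As a minimal sketch of the three weight functions, the helper below computes the weight for every ordered pair of populations at once. `edge_weight()` is a hypothetical function written for this illustration, and the population names, the gene names `LIG` and `REC`, and the expression values are placeholders.

```r
# Hypothetical population-level expression (rows: populations, columns: genes).
pop_expr <- matrix(
  c(2.0, 0.0,
    0.5, 3.0),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("Fibroblast", "Endothelial"), c("LIG", "REC"))
)

# edge_weight() is a hypothetical helper, not part of the package API.
# It returns a matrix with sources in rows and targets in columns.
edge_weight <- function(pop_expr, ligand, receptor,
                        method = c("product", "sum", "mean")) {
  method <- match.arg(method)
  lig <- pop_expr[, ligand]    # E_{i,L} for every source population i
  rec <- pop_expr[, receptor]  # E_{j,R} for every target population j
  switch(method,
    product = outer(lig, rec, "*"),
    sum     = outer(lig, rec, "+"),
    mean    = outer(lig, rec, "+") / 2
  )
}

w <- edge_weight(pop_expr, "LIG", "REC")  # default: product
```

Here `w["Fibroblast", "Endothelial"]` is 2.0 × 3.0, while `w["Fibroblast", "Fibroblast"]` is zero because the receptor is absent from the target. The sum and mean variants would still assign that edge a positive weight, which is one reason to prefer the product.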

3. Statistical Testing

Wilcoxon Rank-Sum Test

For each gene \(g\) and cluster \(i\), we test whether expression in the cluster's cells \(C_i\) differs from expression in all remaining cells \(C_{\backslash i}\) (the background):

\[H_0: \text{median}(E_{C_i,g}) = \text{median}(E_{C_{\backslash i},g})\]

The test statistic: \[W = \sum_{c \in C_i} R_c\]

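Where \(R_c\) is the rank of cell \(c\)'s expression of gene \(g\) in the combined sample of cluster and background cells, with tied values assigned their average rank.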

Multiple Testing Correction

P-values are adjusted using the Bonferroni correction: \[p_{adj} = \min(p \times m, 1)\]

Where \(m\) is the number of tests performed.
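The test and the correction can be reproduced with `wilcox.test()` and `p.adjust()` from the base `stats` package. The sketch below uses simulated data; the gene names `up`, `flat`, and `down` and all values are invented for the example. Note that the statistic reported by `wilcox.test()` is the rank sum \(W\) minus \(n_1(n_1+1)/2\), where \(n_1 = |C_i|\), so the two differ by a constant.

```r
set.seed(1)

# Simulated expression for 3 genes in 60 cells (illustrative data only).
n_cells    <- 60
in_cluster <- rep(c(TRUE, FALSE), times = c(20, 40))
expr <- cbind(
  up   = rnorm(n_cells, mean = ifelse(in_cluster, 3, 0)),
  flat = rnorm(n_cells),
  down = rnorm(n_cells, mean = ifelse(in_cluster, -3, 0))
)

# One two-sided rank-sum test per gene: cluster cells versus background cells.
p_raw <- apply(expr, 2, function(x) {
  wilcox.test(x[in_cluster], x[!in_cluster])$p.value
})

# Bonferroni correction: p_adj = min(p * m, 1), with m the number of tests.
p_adj <- p.adjust(p_raw, method = "bonferroni")

# Relation between the rank sum W and the statistic reported by R.
x         <- expr[, "up"]
n1        <- sum(in_cluster)
w_ranksum <- sum(rank(x)[in_cluster])
u_stat    <- wilcox.test(x[in_cluster], x[!in_cluster])$statistic
```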

4. Diagnostic Odds Ratio (DOR)

The DOR quantifies gene specificity for a cell cluster using a 2×2 contingency table:

| | Expressing | Non-expressing |
|---|---|---|
| In cluster | TP | FN |
| Out of cluster | FP | TN |

Standard DOR

\[\text{DOR} = \frac{TP \times TN}{FP \times FN}\]

Haldane-Anscombe Correction

To handle zero counts in any entry of the contingency table, which would make the ratio zero or undefined, we apply the Haldane-Anscombe correction with pseudocount \(\epsilon = 0.5\):

\[\text{DOR}_{corrected} = \frac{(TP + \epsilon)(TN + \epsilon)}{(FP + \epsilon)(FN + \epsilon)}\]

Log-transformed for symmetry: \[\log(\text{DOR}) = \log(TP + \epsilon) + \log(TN + \epsilon) - \log(FP + \epsilon) - \log(FN + \epsilon)\]

Interpretation:
- \(\log(\text{DOR}) > 0\): Gene is enriched in cluster
- \(\log(\text{DOR}) < 0\): Gene is depleted in cluster
- \(\log(\text{DOR}) = 0\): No association
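A short sketch of the corrected log-DOR follows. `log_dor()` is a hypothetical helper written for this illustration; it takes one logical vector marking which cells express the gene and another marking cluster membership.

```r
# log_dor() is a hypothetical helper illustrating the corrected formula.
log_dor <- function(expressing, in_cluster, eps = 0.5) {
  tp <- sum(expressing & in_cluster)    # expressing, in cluster
  fn <- sum(!expressing & in_cluster)   # non-expressing, in cluster
  fp <- sum(expressing & !in_cluster)   # expressing, out of cluster
  tn <- sum(!expressing & !in_cluster)  # non-expressing, out of cluster
  log(tp + eps) + log(tn + eps) - log(fp + eps) - log(fn + eps)
}

in_cluster <- rep(c(TRUE, FALSE), times = c(10, 30))

# A perfect marker: expressed in every cluster cell and in no background cell.
# FP = FN = 0, so the uncorrected DOR would divide by zero.
perfect_marker <- in_cluster
ld_marker <- log_dor(perfect_marker, in_cluster)

# A gene expressed in half of the cells both inside and outside the cluster.
uniform_gene <- rep(c(TRUE, FALSE), times = 20)
ld_uniform <- log_dor(uniform_gene, in_cluster)
```

With the pseudocount, the perfect marker gets a finite positive score, \(\log(10.5 \times 30.5 / 0.25)\), and the uniformly expressed gene scores zero, matching the interpretation above.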

5. Network Centrality Analysis

Graph Construction

The connectome is represented as a directed weighted graph: \[G = (V, E, w)\]

Where:
- \(V\): Cell populations (nodes)
- \(E\): Signaling edges
- \(w\): Edge weights

Kleinberg’s Hub and Authority Scores

Authority score (receiving importance): \[a_i = \sum_{j \rightarrow i} w_{ji} \cdot h_j\]

Hub score (sending importance): \[h_i = \sum_{i \rightarrow j} w_{ij} \cdot a_j\]

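These are computed by power iteration on the weighted adjacency matrix \(A\), where \(A_{ij} = w_{ij}\) (and 0 when there is no edge from \(i\) to \(j\)), until the scores converge:

Initialize: h = a = 1/√|V|
Repeat until convergence:
    a' = A^T · h
    h' = A · a'
    Normalize a' and h' to unit length
    a = a', h = h'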
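The iteration can be written in a few lines of base R. This is a sketch under stated assumptions, not the package's code: `hits_scores()` is a hypothetical helper, the weights are non-negative with at least one edge, scores are normalized to unit Euclidean length, and the hub update uses the freshly updated authority vector as in Kleinberg's formulation. The 3-population matrix is made up.

```r
# hits_scores() is a hypothetical helper; A[i, j] holds the weight w_ij of the
# edge from population i to population j (0 when there is no edge).
hits_scores <- function(A, tol = 1e-10, max_iter = 1000) {
  n <- nrow(A)
  h <- a <- rep(1 / sqrt(n), n)
  for (iter in seq_len(max_iter)) {
    a_new <- as.vector(crossprod(A, h))  # a' = A^T h
    h_new <- as.vector(A %*% a_new)      # h' = A a'
    a_new <- a_new / sqrt(sum(a_new^2))  # normalize to unit Euclidean length
    h_new <- h_new / sqrt(sum(h_new^2))
    converged <- max(abs(a_new - a), abs(h_new - h)) < tol
    a <- a_new
    h <- h_new
    if (converged) break
  }
  list(hub = setNames(h, rownames(A)), authority = setNames(a, colnames(A)))
}

# Toy connectome: rows are senders, columns are receivers.
pops <- c("P1", "P2", "P3")
A <- matrix(
  c(0, 4, 1,
    0, 0, 2,
    1, 0, 0),
  nrow = 3, byrow = TRUE, dimnames = list(pops, pops)
)
scores <- hits_scores(A)
```

At convergence the authority vector is the leading eigenvector of \(A^T A\) and the hub vector is the leading eigenvector of \(A A^T\). In the toy graph the strongest edge runs from P1 to P2, so P1 ranks highest as a hub and P2 as an authority.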

6. Differential Connectivity Analysis

Fold Change Computation

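For two conditions (reference and test), the log2 fold change of the ligand is computed in the source population \(i\) and that of the receptor in the target population \(j\):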

\[\text{LFC}_{ij}^{L} = \log_2\left(\frac{E_{i,L}^{test}}{E_{i,L}^{ref}}\right)\]

\[\text{LFC}_{ij}^{R} = \log_2\left(\frac{E_{j,R}^{test}}{E_{j,R}^{ref}}\right)\]

Perturbation Score

The overall perturbation score combines ligand and receptor changes:

\[S_{ij}^{LR} = |\text{LFC}_{ij}^{L}| \times |\text{LFC}_{ij}^{R}|\]

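Because the score is a product of absolute values, it is large only when both the ligand and the receptor change between conditions, regardless of direction, and it is zero when either one is unchanged.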
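The fold changes and the perturbation score can be sketched as follows. The matrices `ref` and `test`, the population names, and the gene names `LIG` and `REC` are hypothetical. All toy values are strictly positive, because the log ratio is undefined when expression is zero in either condition; how zeros are handled is not covered by this sketch.

```r
# Hypothetical population-level expression in two conditions
# (rows: populations, columns: genes).
pops  <- c("P1", "P2")
genes <- c("LIG", "REC")
ref  <- matrix(c(1, 4,
                 2, 1),
               nrow = 2, byrow = TRUE, dimnames = list(pops, genes))
test <- matrix(c(4, 4,
                 2, 0.25),
               nrow = 2, byrow = TRUE, dimnames = list(pops, genes))

# Log2 fold change for every population and gene.
lfc <- log2(test / ref)

# S_ij = |LFC of the ligand in source i| * |LFC of the receptor in target j|
# (rows: source populations, columns: target populations).
score <- outer(abs(lfc[, "LIG"]), abs(lfc[, "REC"]))
```

Only the edge from P1 to P2 receives a non-zero score: the ligand rises four-fold in P1 and the receptor falls four-fold in P2, so \(S = 2 \times 2 = 4\). Every other edge involves an unchanged ligand or receptor.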

7. Implementation Details

Computational Complexity

| Operation | Complexity |
|---|---|
| Expression averaging | \(O(n \times g)\) |
| Edge construction | \(O(k^2 \times p)\) |
| P-value calculation | \(O(k \times g \times n)\) |
| Centrality analysis | \(O(k^2 \times p)\) |

Where:
- \(n\): Number of cells
- \(g\): Number of genes
- \(k\): Number of cell populations
- \(p\): Number of L-R pairs

Memory Efficiency

Connectome uses:
- data.table for efficient data manipulation
- Sparse matrix operations via Matrix package
- Pre-allocated vectors to avoid memory fragmentation

References

  1. FANTOM5 Database: Ramilowski, J.A. et al. A draft network of ligand–receptor-mediated multicellular signalling in human. Nat Commun 6, 7866 (2015).

  2. Kleinberg Algorithm: Kleinberg, J.M. Authoritative sources in a hyperlinked environment. Journal of the ACM 46(5), 604–632 (1999).

  3. Haldane-Anscombe Correction: Agresti, A. Categorical Data Analysis. Wiley, 3rd edition (2013).
