This vignette describes the statistical methods and algorithms implemented in iTALK for intercellular communication analysis.
iTALK includes a curated database of 2,649 ligand-receptor pairs categorized into four functional groups:
The database integrates information from:
# Database columns
cat("Database columns:", paste(colnames(db), collapse = ", "), "\n\n")
#> Database columns: Pair.Name, Ligand.ApprovedSymbol, Ligand.Name, Receptor.ApprovedSymbol, Receptor.Name, Classification
# Example entries
head(db[, c("Ligand.ApprovedSymbol", "Receptor.ApprovedSymbol", "Classification")])
#> Ligand.ApprovedSymbol Receptor.ApprovedSymbol Classification
#> 1 A2M LRP1 other
#> 2 AANAT MTNR1A other
#> 3 AANAT MTNR1B other
#> 4 ACE AGTR2 other
#> 5 ACE BDKRB2 other
#> 6 ADAM10 AXL otheriTALK supports seven statistical methods for identifying differentially expressed genes:
| Method | Model | Best For | Reference |
|---|---|---|---|
| Wilcox | Wilcoxon rank-sum | General purpose | - |
| DESeq2 | Negative binomial | Bulk RNA-seq | Love et al., 2014 |
| edgeR | Negative binomial + EB | Bulk RNA-seq | McCarthy et al., 2012 |
| MAST | Two-part hurdle | scRNA-seq | Finak et al., 2015 |
| monocle | Census-based | Trajectory analysis | Qiu et al., 2017 |
| SCDE | Bayesian | scRNA-seq | Kharchenko et al., 2014 |
| DEsingle | Zero-inflated NB | Dropout modeling | Miao et al., 2018 |
The default method uses the Wilcoxon rank-sum test (Mann-Whitney U test):
\[W = \sum_{i=1}^{n_1} R_i - \frac{n_1(n_1+1)}{2}\]
Where \(R_i\) is the rank of observation \(i\) in the combined sample.
Log Fold Change Calculation:
For raw counts: \[\log_2 FC = \log_2 \frac{\bar{x}_1 + \epsilon}{\bar{x}_2 + \epsilon}\]
Where \(\epsilon = 10^{-9}\) prevents division by zero.
For log-transformed data: \[\log_2 FC = \bar{x}_1 - \bar{x}_2\]
DESeq2 uses a negative binomial generalized linear model:
\[K_{ij} \sim NB(\mu_{ij}, \alpha_i)\]
\[\mu_{ij} = s_j q_{ij}\]
\[\log_2 q_{ij} = \sum_r x_{jr} \beta_{ir}\]
Where: - \(K_{ij}\) = read count for gene \(i\) in sample \(j\) - \(s_j\) = size factor for sample \(j\) - \(\alpha_i\) = dispersion for gene \(i\) - \(\beta_{ir}\) = log2 fold change
MAST uses a two-part hurdle model for scRNA-seq data:
Part 1 (Discrete): Logistic regression for detection \[\log \frac{P(Z_g = 1)}{P(Z_g = 0)} = X\beta_D\]
Part 2 (Continuous): Linear model for expression given detection \[E[Y_g | Z_g = 1] = X\beta_C\]
The combined test statistic follows a \(\chi^2\) distribution with 2 degrees of freedom.
┌─────────────────┐
│ Gene List(s) │
│ (expr/DEG) │
└────────┬────────┘
│
▼
┌─────────────────┐
│ Species Check │──► Auto-conversion if needed
└────────┬────────┘
│
▼
┌─────────────────┐
│ Database Match │
│ Ligand ∩ Recep │
└────────┬────────┘
│
▼
┌─────────────────┐
│ Annotate Pairs │
│ + Cell Types │
└────────┬────────┘
│
▼
┌─────────────────┐
│ Filter & Output │
└─────────────────┘
For “mean count” datatype:
# Pseudocode
ligand_genes <- intersect(input_genes, database$Ligand)
receptor_genes <- intersect(input_genes, database$Receptor)
pairs <- database[database$Ligand %in% ligand_genes &
database$Receptor %in% receptor_genes, ]For “DEG” datatype, additional filtering removes pairs where both ligand and receptor are not differentially expressed (logFC = 0.0001 placeholder).
All p-values are adjusted using the Benjamini-Hochberg procedure:
\[q_i = \min\left(1, \frac{p_i \cdot m}{i}\right)\]
Where: - \(p_i\) = i-th smallest p-value - \(m\) = total number of tests - \(q_i\) = adjusted p-value (FDR)
# Demonstration of BH correction
set.seed(42)
p_values <- c(runif(50, 0, 0.01), runif(50, 0.01, 1)) # Mix of significant and non-significant
q_values <- p.adjust(p_values, method = "BH")
# Visualize
par(mfrow = c(1, 2))
hist(p_values, breaks = 20, main = "Raw P-values", xlab = "P-value", col = "steelblue")
hist(q_values, breaks = 20, main = "BH-adjusted Q-values", xlab = "Q-value", col = "coral")For mean count analysis, communication strength is proportional to:
\[S_{LR} = \bar{E}_L \times \bar{E}_R\]
Where \(\bar{E}_L\) and \(\bar{E}_R\) are mean expression levels of ligand and receptor.
For DEG analysis:
\[S_{LR} = |FC_L| \times |FC_R| \times \text{sign}(FC_L \times FC_R)\]
This captures both magnitude and directionality of changes.
Cell-cell communication networks are constructed using:
library(igraph)
# Example network construction
edges <- data.frame(
from = c("T_cell", "T_cell", "Macrophage", "Dendritic"),
to = c("Macrophage", "B_cell", "T_cell", "T_cell"),
weight = c(15, 8, 12, 6)
)
g <- graph_from_data_frame(edges, directed = TRUE)
E(g)$width <- E(g)$weight / 3
plot(g,
vertex.size = 30,
vertex.color = "lightblue",
edge.arrow.size = 0.5,
main = "Cell Communication Network")| Operation | Time Complexity | Space Complexity |
|---|---|---|
| rawParse | O(n × g) | O(c × g) |
| FindLR | O(g × d) | O(p) |
| DEG (Wilcox) | O(n × g) | O(g) |
| LRPlot | O(p) | O(p) |
| NetView | O(c²) | O(c²) |
Where: - n = number of cells - g = number of genes - c = number of cell types - d = database size - p = number of pairs
Love MI, Huber W, Anders S. Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biology. 2014;15:550.
Finak G, et al. MAST: a flexible statistical framework for assessing transcriptional changes and characterizing heterogeneity in single-cell RNA sequencing data. Genome Biology. 2015;16:278.
Wang Y, et al. iTALK: an R Package to Characterize and Illustrate Intercellular Communication. bioRxiv. 2019;507871.
sessionInfo()
#> R version 4.6.0 (2026-04-24)
#> Platform: x86_64-pc-linux-gnu
#> Running under: Ubuntu 24.04.4 LTS
#>
#> Matrix products: default
#> BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
#> LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.26.so; LAPACK version 3.12.0
#>
#> locale:
#> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
#> [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
#> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
#> [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
#> [9] LC_ADDRESS=C LC_TELEPHONE=C
#> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
#>
#> time zone: Etc/UTC
#> tzcode source: system (glibc)
#>
#> attached base packages:
#> [1] stats graphics grDevices utils datasets methods base
#>
#> other attached packages:
#> [1] igraph_2.3.2 iTALK_0.1.1 rmarkdown_2.31
#>
#> loaded via a namespace (and not attached):
#> [1] sass_0.4.10 generics_0.1.4 tidyr_1.3.2
#> [4] shape_1.4.6.1 stringi_1.8.7 hms_1.1.4
#> [7] digest_0.6.39 magrittr_2.0.5 evaluate_1.0.5
#> [10] grid_4.6.0 RColorBrewer_1.1-3 circlize_0.4.18
#> [13] fastmap_1.2.0 jsonlite_2.0.0 progress_1.2.3
#> [16] GlobalOptions_0.1.4 purrr_1.2.2 scales_1.4.0
#> [19] pbapply_1.7-4 randomcoloR_1.1.0.1 jquerylib_0.1.4
#> [22] cli_3.6.6 crayon_1.5.3 rlang_1.2.0
#> [25] cachem_1.1.0 yaml_2.3.12 otel_0.2.0
#> [28] Rtsne_0.17 parallel_4.6.0 tools_4.6.0
#> [31] dplyr_1.2.1 colorspace_2.1-2 ggplot2_4.0.3
#> [34] curl_7.1.0 buildtools_1.0.0 vctrs_0.7.3
#> [37] R6_2.6.1 lifecycle_1.0.5 stringr_1.6.0
#> [40] V8_8.2.0 cluster_2.1.8.2 pkgconfig_2.0.3
#> [43] pillar_1.11.1 bslib_0.11.0 gtable_0.3.6
#> [46] glue_1.8.1 Rcpp_1.1.1-1.1 xfun_0.59
#> [49] tibble_3.3.1 tidyselect_1.2.1 sys_3.4.3
#> [52] knitr_1.51 farver_2.1.2 htmltools_0.5.9
#> [55] maketools_1.3.2 compiler_4.6.0 prettyunits_1.2.0
#> [58] S7_0.2.2