scMetaLink models metabolite-mediated cell communication as a two-step process:
Figure 1: scMetaLink Workflow. Schematic overview of the analysis pipeline for inferring metabolite-mediated cell-cell communication.
scMetaLink uses MetalinksDB, a comprehensive database of metabolite-protein interactions.
library(scMetaLink)
# Load and explore the database
db <- scMetaLink:::.load_metalinksdb()
cat("=== MetalinksDB Statistics ===\n")
#> === MetalinksDB Statistics ===
cat("Metabolites:", nrow(db$metabolites), "\n")
#> Metabolites: 1128
cat("Proteins:", nrow(db$proteins), "\n")
#> Proteins: 4374
cat("Interactions:", nrow(db$edges), "\n")
#> Interactions: 41894
cat("Pathways:", length(unique(db$pathway$pathway)), "\n")
#> Pathways: 36806# Two main interaction types
cat("=== Interaction Types ===\n")
#> === Interaction Types ===
print(table(db$edges$type))
#>
#> lr pd
#> 11869 30025
cat("\n- lr (ligand-receptor): Direct metabolite-receptor binding\n")
#>
#> - lr (ligand-receptor): Direct metabolite-receptor binding
cat("- pd (produce-degrade): Enzymatic production/degradation\n")
#> - pd (produce-degrade): Enzymatic production/degradationFor pd type interactions, MOR indicates the
direction:
pd_edges <- db$edges[db$edges$type == "pd", ]
cat("=== Mode of Regulation for pd interactions ===\n")
#> === Mode of Regulation for pd interactions ===
print(table(pd_edges$mor))
#>
#> -1 1
#> 13144 16881
cat("\n- mor = 1: Enzyme produces/secretes the metabolite\n")
#>
#> - mor = 1: Enzyme produces/secretes the metabolite
cat("- mor = -1: Enzyme degrades/consumes the metabolite\n")
#> - mor = -1: Enzyme degrades/consumes the metabolite
cat("- mor = 0: Enzyme binds without direction change\n")
#> - mor = 0: Enzyme binds without direction changecat("=== Protein Classifications ===\n")
#> === Protein Classifications ===
print(table(db$proteins$protein_type, useNA = "ifany"))
#>
#> catalytic_receptor enzyme gpcr lgic
#> 179 918 306 71
#> nhr other_ic other_protein transporter
#> 43 39 28 393
#> vgic <NA>
#> 73 2324
cat("\n")
cat("Receptors: gpcr, lgic, nhr, vgic, catalytic_receptor, other_ic\n")
#> Receptors: gpcr, lgic, nhr, vgic, catalytic_receptor, other_ic
cat("Enzymes: enzyme (or NA in protein_type)\n")
#> Enzymes: enzyme (or NA in protein_type)
cat("Transporters: transporter\n")
#> Transporters: transporterFigure 2: Computational Algorithm. Schematic illustration of the mathematical framework for computing metabolite production, sensing, and communication scores.
The production potential for metabolite \(m\) in cell type \(c\) is:
\[\text{MPP}_{m,c} = \frac{1}{|E_m|} \sum_{e \in E_m} \text{Score}(e, c) - \alpha \cdot \frac{1}{|D_m|} \sum_{d \in D_m} \text{Score}(d, c)\]
Where: - \(E_m\): Set of enzymes that produce metabolite \(m\) - \(D_m\): Set of enzymes that degrade metabolite \(m\) - \(\alpha = 0.5\): Degradation weight factor - \(\text{Score}(g, c)\): Gene expression score for gene \(g\) in cell type \(c\)
Three methods are available:
| Method | Formula | Use Case |
|---|---|---|
mean |
\(\bar{x}_g\) | Simple average expression |
proportion |
\(P(x_g > 0)\) | Proportion of expressing cells |
combined |
\(\bar{x}_g \times P(x_g > 0)\) | Balanced (recommended) |
| Method | Formula | Value |
|---|---|---|
| mean | mean(expr) | 0.610 |
| proportion | mean(expr > 0) | 0.300 |
| combined | mean x proportion | 0.183 |
For single-cell data with high dropout, trimean provides a robust alternative:
\[\text{Trimean} = \frac{Q_1 + 2Q_2 + Q_3}{4}\]
# Example: Trimean vs Arithmetic Mean
expr_with_outlier <- c(0, 0, 0, 1, 2, 3, 100) # Outlier = 100
cat("Data:", paste(expr_with_outlier, collapse = ", "), "\n")
#> Data: 0, 0, 0, 1, 2, 3, 100
cat("Arithmetic mean:", round(mean(expr_with_outlier), 2), "\n")
#> Arithmetic mean: 15.14
cat("Median:", median(expr_with_outlier), "\n")
#> Median: 1
# Trimean calculation
q <- quantile(expr_with_outlier, c(0.25, 0.5, 0.75))
trimean <- (q[1] + 2 * q[2] + q[3]) / 4
cat("Trimean:", round(trimean, 2), "\n")
#> Trimean: 1.12
cat("\nTrimean is robust to the outlier!\n")
#>
#> Trimean is robust to the outlier!The sensing capability for metabolite \(m\) in cell type \(c\) is:
\[\text{MSC}_{m,c} = \frac{1}{|R_m|} \sum_{r \in R_m} w_r \cdot \text{Score}(r, c)\]
Where: - \(R_m\): Set of receptors/transporters for metabolite \(m\) - \(w_r\): Affinity weight based on interaction confidence score
Receptor-metabolite binding affinities from MetalinksDB are used as weights:
To model receptor saturation kinetics, an optional Hill transformation can be applied:
\[P = \frac{E^n}{K_h^n + E^n}\]
Where: - \(E\): Expression level (normalized) - \(n\): Hill coefficient (cooperativity) - \(K_h\): Half-maximal threshold
# Hill function visualization
x <- seq(0, 1, 0.01)
hill <- function(x, n, Kh) x^n / (Kh^n + x^n)
plot(x, x,
type = "l", lty = 2, col = "gray",
xlab = "Expression (normalized)", ylab = "Response",
main = "Hill Function Transformation"
)
lines(x, hill(x, n = 1, Kh = 0.5), col = "blue", lwd = 2)
lines(x, hill(x, n = 2, Kh = 0.5), col = "red", lwd = 2)
lines(x, hill(x, n = 0.5, Kh = 0.5), col = "green", lwd = 2)
legend("bottomright",
legend = c("Linear", "n=1 (Michaelis-Menten)", "n=2 (Cooperative)", "n=0.5"),
col = c("gray", "blue", "red", "green"),
lty = c(2, 1, 1, 1), lwd = 2
)
abline(h = 0.5, v = 0.5, lty = 3, col = "gray")Figure 1: Hill Function Transformation. Different Hill coefficients (n) produce different response curves. n=1 gives standard Michaelis-Menten kinetics. n>1 indicates positive cooperativity (steeper curve). The half-maximal response occurs at x=Kh.
The communication score from sender \(s\) to receiver \(r\) via metabolite \(m\):
\[\text{Comm}_{s \to r}^m = f(\text{MPP}_{m,s}, \text{MSC}_{m,r})\]
Three aggregation methods are available:
| Method | Formula | Properties |
|---|---|---|
geometric |
\(\sqrt{\text{MPP} \times \text{MSC}}\) | Balanced, penalizes imbalance |
product |
\(\text{MPP} \times \text{MSC}\) | Emphasizes strong bilateral signals |
harmonic |
\(\frac{2 \times \text{MPP} \times \text{MSC}}{\text{MPP} + \text{MSC}}\) | Strongly penalizes imbalance |
Figure 2: Communication Score Methods. Heat maps showing how different methods combine production (x-axis) and sensing (y-axis) scores. Geometric mean (left) provides balanced weighting. Product (center) emphasizes strong bilateral signals. Harmonic mean (right) penalizes imbalanced communication.
When population.size = TRUE, communication is weighted
by cell type abundance:
\[\text{Comm}_{s \to r}^{\text{weighted}} = \text{Comm}_{s \to r} \times \sqrt{\frac{n_s}{N} \times \frac{n_r}{N}}\]
Where \(n_s, n_r\) are cell counts and \(N\) is total cells.
Rationale: Larger cell populations contribute more to tissue-level signaling.
To assess significance, cell type labels are randomly shuffled:
\[p = \frac{1 + \sum_{b=1}^{B} \mathbb{1}[\text{Comm}_b \geq \text{Comm}_{\text{obs}}]}{B + 1}\]
Three strategies for handling multiple comparisons:
| Method | Description |
|---|---|
metabolite_stratified |
BH correction within each metabolite (recommended) |
BH |
Global Benjamini-Hochberg |
bonferroni |
Global Bonferroni (very conservative) |
Why metabolite_stratified?
| Aspect | Ligand-Receptor (e.g., CellChat) | Metabolite-Mediated (scMetaLink) |
|---|---|---|
| Signal type | Proteins | Small molecules |
| Diffusion range | Short (juxtacrine/paracrine) | Variable (paracrine/endocrine) |
| Database | CellChatDB, CellPhoneDB | MetalinksDB |
| Key genes | Ligands, receptors | Enzymes, transporters, receptors |
| Signal complexity | Direct binding | Synthesis -> Secretion -> Sensing |
Schafer, S., et al. (2023). MetalinksDB: a knowledgebase of metabolite-centric signaling. Nature Communications.
Xiao, Z., et al. (2022). Metabolite-mediated intercellular communication in tumor microenvironment. Frontiers in Cell and Developmental Biology.