Algorithm and Mathematical Background

Biological Motivation

Stem cells are characterized by their ability to differentiate into multiple cell types. At the molecular level, this pluripotency is reflected in the gene expression patterns:

Pluripotent cells: Express genes broadly, with signaling flowing through many pathways
Differentiated cells: Express genes in a more focused pattern, with signaling concentrated in specific pathways

The signaling entropy quantifies this “randomness” of information flow through the protein-protein interaction (PPI) network.

Mathematical Framework

Signaling Entropy Rate (SR)

Given a gene expression profile \(\mathbf{x} = (x_1, ..., x_n)\) and an adjacency matrix \(\mathbf{A}\) of the PPI network, the SR is computed as follows:

Step 1: Transition Probabilities

For each gene \(j\), compute the probability of signaling to neighbor \(k\):

\[P_{jk} = \frac{A_{jk} \cdot x_k}{\sum_l A_{jl} \cdot x_l}\]

This represents the probability that a signal at gene \(j\) will transition to gene \(k\), weighted by the expression level of \(k\).

Step 2: Stationary Distribution

The stationary distribution \(\pi_j\) represents the long-term probability of finding a signal at gene \(j\):

\[\pi_j = \frac{x_j \cdot (\mathbf{A}\mathbf{x})_j}{Z}\]

where \(Z = \sum_j x_j \cdot (\mathbf{A}\mathbf{x})_j\) is the normalization constant.

Step 3: Local Entropy

The local entropy at gene \(j\) measures the uncertainty in signaling transitions:

\[S_j = -\sum_k P_{jk} \log P_{jk}\]

Step 4: Global Entropy Rate

The signaling entropy rate is the weighted average of local entropies:

\[SR = \frac{\sum_j \pi_j S_j}{SR_{max}}\]

where \(SR_{max} = \log(\lambda_1)\) and \(\lambda_1\) is the largest eigenvalue of \(\mathbf{A}\).

Visual Demonstration

library(SCENT)
library(ggplot2)
library(Matrix)

data(net13Jun12.m)

Network Structure

# Network statistics
n_genes <- nrow(net13Jun12.m)
n_edges <- sum(net13Jun12.m) / 2
degrees <- rowSums(net13Jun12.m)

cat("Network Statistics:\n")
#> Network Statistics:
cat("  Genes:", n_genes, "\n")
#>   Genes: 8434
cat("  Interactions:", n_edges, "\n")
#>   Interactions: 303600
cat("  Mean degree:", round(mean(degrees), 2), "\n")
#>   Mean degree: 71.99
cat("  Max degree:", max(degrees), "\n")
#>   Max degree: 1030

# Degree distribution
df_degree <- data.frame(degree = degrees)

ggplot(df_degree, aes(x = degree)) +
  geom_histogram(bins = 50, fill = "#3498db", alpha = 0.7, color = "white") +
  scale_x_log10() +
  labs(
    title = "PPI Network Degree Distribution",
    subtitle = "Scale-free network property",
    x = "Degree (log scale)",
    y = "Count"
  ) +
  theme_minimal() +
  theme(plot.title = element_text(face = "bold"))

Entropy Computation Example

set.seed(123)

# Create two contrasting expression patterns
n_genes_sim <- 5500
n_cells <- 20

# Pattern 1: Uniform expression (high entropy)
exp_uniform <- matrix(rep(5, n_genes_sim * n_cells), nrow = n_genes_sim)
rownames(exp_uniform) <- head(rownames(net13Jun12.m), n_genes_sim)

# Pattern 2: Focused expression (low entropy)
exp_focused <- matrix(1, nrow = n_genes_sim, ncol = n_cells)
exp_focused[1:500, ] <- 50  # High expression in subset
rownames(exp_focused) <- head(rownames(net13Jun12.m), n_genes_sim)

# Compute SR
integ_uniform <- DoIntegPPI(exp_uniform, net13Jun12.m)
integ_focused <- DoIntegPPI(exp_focused, net13Jun12.m)

sr_uniform <- CompSRana(integ_uniform)
sr_focused <- CompSRana(integ_focused)

cat("Uniform expression SR:", round(mean(sr_uniform$SR), 4), "\n")
#> Uniform expression SR: 0.9121
cat("Focused expression SR:", round(mean(sr_focused$SR), 4), "\n")
#> Focused expression SR: 0.8868

df_patterns <- data.frame(
  Pattern = c(rep("Uniform\n(Pluripotent-like)", n_cells), 
              rep("Focused\n(Differentiated-like)", n_cells)),
  SR = c(sr_uniform$SR, sr_focused$SR)
)

ggplot(df_patterns, aes(x = Pattern, y = SR, fill = Pattern)) +
  geom_boxplot(alpha = 0.7, outlier.shape = NA) +
  geom_jitter(width = 0.2, alpha = 0.5, size = 2) +
  scale_fill_manual(values = c("#e74c3c", "#3498db")) +
  labs(
    title = "SR Reflects Expression Pattern Entropy",
    subtitle = "Higher SR = More pluripotent-like state",
    x = "",
    y = "Signaling Entropy Rate"
  ) +
  theme_minimal() +
  theme(
    plot.title = element_text(face = "bold"),
    legend.position = "none"
  )

CCAT: Fast Approximation

CCAT (Correlation of Connectome And Transcriptome) is based on the observation that:

Pluripotent cells express hub genes (high-degree nodes) at higher levels

The CCAT score is simply the Pearson correlation between gene expression and network degree:

\[CCAT = cor(\mathbf{x}, \mathbf{k})\]

where \(\mathbf{k} = (k_1, ..., k_n)\) are the node degrees.

# Demonstrate CCAT
ccat_uniform <- CompCCAT(exp_uniform, net13Jun12.m)
ccat_focused <- CompCCAT(exp_focused, net13Jun12.m)

cat("Uniform expression CCAT:", round(mean(ccat_uniform), 4), "\n")
#> Uniform expression CCAT: NA
cat("Focused expression CCAT:", round(mean(ccat_focused), 4), "\n")
#> Focused expression CCAT: 0.4218

Why SR and CCAT Correlate

The mathematical connection:

High SR ← Uniform signaling flow ← Broad gene expression
High hub expression ← Broad expression pattern ← High CCAT

Therefore, SR and CCAT capture the same biological phenomenon from different angles.

set.seed(42)
exp_test <- matrix(rpois(5500 * 100, 5), nrow = 5500)
rownames(exp_test) <- head(rownames(net13Jun12.m), 5500)

integ_test <- DoIntegPPI(exp_test, net13Jun12.m)
sr_test <- CompSRana(integ_test)
ccat_test <- CompCCAT(exp_test, net13Jun12.m)

r <- cor(sr_test$SR, ccat_test)
cat("SR-CCAT correlation in random data: r =", round(r, 3), "\n")
#> SR-CCAT correlation in random data: r = 0.856
cat("(Original paper reports r ~ 0.78)\n")
#> (Original paper reports r ~ 0.78)

References

Teschendorff AE, Enver T. Single-cell entropy for accurate estimation of differentiation potency from a cell’s transcriptome. Nature Communications. 2017;8:15599. doi:10.1038/ncomms15599

Session Info

sessionInfo()
#> R version 4.6.0 (2026-04-24)
#> Platform: x86_64-pc-linux-gnu
#> Running under: Ubuntu 24.04.4 LTS
#> 
#> Matrix products: default
#> BLAS:   /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3 
#> LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.26.so;  LAPACK version 3.12.0
#> 
#> locale:
#>  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
#>  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
#>  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
#>  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
#>  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
#> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
#> 
#> time zone: Etc/UTC
#> tzcode source: system (glibc)
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> other attached packages:
#> [1] Matrix_1.7-5   ggplot2_4.0.3  SCENT_2.0.0    rmarkdown_2.31
#> 
#> loaded via a namespace (and not attached):
#>  [1] gtable_0.3.6       jsonlite_2.0.0     dplyr_1.2.1        compiler_4.6.0    
#>  [5] tidyselect_1.2.1   Rcpp_1.1.1-1.1     jquerylib_0.1.4    scales_1.4.0      
#>  [9] yaml_2.3.12        fastmap_1.2.0      lattice_0.22-9     R6_2.6.1          
#> [13] labeling_0.4.3     generics_0.1.4     igraph_2.3.2       knitr_1.51        
#> [17] tibble_3.3.1       maketools_1.3.2    bslib_0.11.0       pillar_1.11.1     
#> [21] RColorBrewer_1.1-3 rlang_1.2.0        cachem_1.1.0       xfun_0.59         
#> [25] sass_0.4.10        sys_3.4.3          S7_0.2.2           otel_0.2.0        
#> [29] cli_3.6.6          withr_3.0.3        magrittr_2.0.5     digest_0.6.39     
#> [33] grid_4.6.0         lifecycle_1.0.5    vctrs_0.7.3        evaluate_1.0.5    
#> [37] glue_1.8.1         farver_2.1.2       buildtools_1.0.0   tools_4.6.0       
#> [41] pkgconfig_2.0.3    htmltools_0.5.9