--- title: "Algorithm and Mathematical Background" author: "Zaoqu Liu" date: "`r Sys.Date()`" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{Algorithm and Mathematical Background} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r setup, include = FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>", fig.width = 7, fig.height = 5, message = FALSE, warning = FALSE ) ``` ## Biological Motivation Stem cells are characterized by their ability to differentiate into multiple cell types. At the molecular level, this **pluripotency** is reflected in the gene expression patterns: - **Pluripotent cells**: Express genes broadly, with signaling flowing through many pathways - **Differentiated cells**: Express genes in a more focused pattern, with signaling concentrated in specific pathways The **signaling entropy** quantifies this "randomness" of information flow through the protein-protein interaction (PPI) network. ## Mathematical Framework ### Signaling Entropy Rate (SR) Given a gene expression profile $\mathbf{x} = (x_1, ..., x_n)$ and an adjacency matrix $\mathbf{A}$ of the PPI network, the SR is computed as follows: #### Step 1: Transition Probabilities For each gene $j$, compute the probability of signaling to neighbor $k$: $$P_{jk} = \frac{A_{jk} \cdot x_k}{\sum_l A_{jl} \cdot x_l}$$ This represents the probability that a signal at gene $j$ will transition to gene $k$, weighted by the expression level of $k$. #### Step 2: Stationary Distribution The stationary distribution $\pi_j$ represents the long-term probability of finding a signal at gene $j$: $$\pi_j = \frac{x_j \cdot (\mathbf{A}\mathbf{x})_j}{Z}$$ where $Z = \sum_j x_j \cdot (\mathbf{A}\mathbf{x})_j$ is the normalization constant. #### Step 3: Local Entropy The local entropy at gene $j$ measures the uncertainty in signaling transitions: $$S_j = -\sum_k P_{jk} \log P_{jk}$$ #### Step 4: Global Entropy Rate The signaling entropy rate is the weighted average of local entropies: $$SR = \frac{\sum_j \pi_j S_j}{SR_{max}}$$ where $SR_{max} = \log(\lambda_1)$ and $\lambda_1$ is the largest eigenvalue of $\mathbf{A}$. ## Visual Demonstration ```{r load} library(SCENT) library(ggplot2) library(Matrix) data(net13Jun12.m) ``` ### Network Structure ```{r network-stats} # Network statistics n_genes <- nrow(net13Jun12.m) n_edges <- sum(net13Jun12.m) / 2 degrees <- rowSums(net13Jun12.m) cat("Network Statistics:\n") cat(" Genes:", n_genes, "\n") cat(" Interactions:", n_edges, "\n") cat(" Mean degree:", round(mean(degrees), 2), "\n") cat(" Max degree:", max(degrees), "\n") ``` ```{r degree-dist, fig.height=4} # Degree distribution df_degree <- data.frame(degree = degrees) ggplot(df_degree, aes(x = degree)) + geom_histogram(bins = 50, fill = "#3498db", alpha = 0.7, color = "white") + scale_x_log10() + labs( title = "PPI Network Degree Distribution", subtitle = "Scale-free network property", x = "Degree (log scale)", y = "Count" ) + theme_minimal() + theme(plot.title = element_text(face = "bold")) ``` ### Entropy Computation Example ```{r entropy-example} set.seed(123) # Create two contrasting expression patterns n_genes_sim <- 5500 n_cells <- 20 # Pattern 1: Uniform expression (high entropy) exp_uniform <- matrix(rep(5, n_genes_sim * n_cells), nrow = n_genes_sim) rownames(exp_uniform) <- head(rownames(net13Jun12.m), n_genes_sim) # Pattern 2: Focused expression (low entropy) exp_focused <- matrix(1, nrow = n_genes_sim, ncol = n_cells) exp_focused[1:500, ] <- 50 # High expression in subset rownames(exp_focused) <- head(rownames(net13Jun12.m), n_genes_sim) # Compute SR integ_uniform <- DoIntegPPI(exp_uniform, net13Jun12.m) integ_focused <- DoIntegPPI(exp_focused, net13Jun12.m) sr_uniform <- CompSRana(integ_uniform) sr_focused <- CompSRana(integ_focused) cat("Uniform expression SR:", round(mean(sr_uniform$SR), 4), "\n") cat("Focused expression SR:", round(mean(sr_focused$SR), 4), "\n") ``` ```{r entropy-viz, fig.height=4} df_patterns <- data.frame( Pattern = c(rep("Uniform\n(Pluripotent-like)", n_cells), rep("Focused\n(Differentiated-like)", n_cells)), SR = c(sr_uniform$SR, sr_focused$SR) ) ggplot(df_patterns, aes(x = Pattern, y = SR, fill = Pattern)) + geom_boxplot(alpha = 0.7, outlier.shape = NA) + geom_jitter(width = 0.2, alpha = 0.5, size = 2) + scale_fill_manual(values = c("#e74c3c", "#3498db")) + labs( title = "SR Reflects Expression Pattern Entropy", subtitle = "Higher SR = More pluripotent-like state", x = "", y = "Signaling Entropy Rate" ) + theme_minimal() + theme( plot.title = element_text(face = "bold"), legend.position = "none" ) ``` ## CCAT: Fast Approximation **CCAT** (Correlation of Connectome And Transcriptome) is based on the observation that: > Pluripotent cells express hub genes (high-degree nodes) at higher levels The CCAT score is simply the Pearson correlation between gene expression and network degree: $$CCAT = cor(\mathbf{x}, \mathbf{k})$$ where $\mathbf{k} = (k_1, ..., k_n)$ are the node degrees. ```{r ccat-demo} # Demonstrate CCAT ccat_uniform <- CompCCAT(exp_uniform, net13Jun12.m) ccat_focused <- CompCCAT(exp_focused, net13Jun12.m) cat("Uniform expression CCAT:", round(mean(ccat_uniform), 4), "\n") cat("Focused expression CCAT:", round(mean(ccat_focused), 4), "\n") ``` ## Why SR and CCAT Correlate The mathematical connection: 1. **High SR** ← Uniform signaling flow ← Broad gene expression 2. **High hub expression** ← Broad expression pattern ← **High CCAT** Therefore, SR and CCAT capture the same biological phenomenon from different angles. ```{r correlation-demo} set.seed(42) exp_test <- matrix(rpois(5500 * 100, 5), nrow = 5500) rownames(exp_test) <- head(rownames(net13Jun12.m), 5500) integ_test <- DoIntegPPI(exp_test, net13Jun12.m) sr_test <- CompSRana(integ_test) ccat_test <- CompCCAT(exp_test, net13Jun12.m) r <- cor(sr_test$SR, ccat_test) cat("SR-CCAT correlation in random data: r =", round(r, 3), "\n") cat("(Original paper reports r ~ 0.78)\n") ``` ## References 1. Teschendorff AE, Enver T. **Single-cell entropy for accurate estimation of differentiation potency from a cell's transcriptome.** *Nature Communications.* 2017;8:15599. doi:10.1038/ncomms15599 ## Session Info ```{r session} sessionInfo() ```