--- title: "Algorithm Details: Mathematical Foundation of scaper" author: "Zaoqu Liu" date: "`r Sys.Date()`" output: rmarkdown::html_vignette: toc: true toc_depth: 3 vignette: > %\VignetteIndexEntry{Algorithm Details} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r setup, include=FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>", warning = FALSE, message = FALSE ) ``` ## Introduction This vignette explains the mathematical algorithms underlying **scaper**. The package implements the Variance-Adjusted Mahalanobis (VAM) method for cytokine activity scoring with bidirectional gene set integration. ## Mathematical Formulation ### Variance-Adjusted Mahalanobis (VAM) Distance For a gene set $G$ with genes $g_1, g_2, \ldots, g_k$ and weights $w_1, w_2, \ldots, w_k$, the VAM score for cell $c$ is: $$\text{VAM}_c(G) = \frac{(\mathbf{w}^T \mathbf{x}_c - \mu_w)^2}{\sigma_w^2}$$ where: - $\mathbf{x}_c$: Expression vector for cell $c$ - $\mathbf{w}$: Weight vector for gene set $G$ - $\mu_w$: Expected weighted sum under null - $\sigma_w^2$: Variance accounting for correlation structure ### CDF Transformation The VAM distance follows a chi-square distribution: $$\text{CDF}_c(G) = P(\chi^2_1 \leq \text{VAM}_c(G))$$ ### Bidirectional Scoring (CytoSig) For cytokines with both positive and negative target genes: $$\boxed{\text{Activity} = \frac{\text{CDF}^+ + (1 - \text{CDF}^-)}{2}}$$ This ensures correct biological interpretation: - High positive gene expression + Low negative gene expression → High activity - Low positive gene expression + High negative gene expression → Low activity ## Gene Set Information ### CytoSig Database ```{r cytosig_info} library(scaper) # Get CytoSig gene set for IL6 file_path <- system.file("extdata", "IL6_output.csv", package = "scaper") il6_genes <- genesetCytoSig(cytokine.eval = "IL6", file.name = file_path) # Analyze weight distribution positive <- sum(il6_genes$weight > 0) negative <- sum(il6_genes$weight < 0) cat("IL6 Gene Set Summary:\n") cat("- Total genes:", nrow(il6_genes), "\n") cat("- Positive weights (up-regulated):", positive, "\n") cat("- Negative weights (down-regulated):", negative, "\n") ``` ### Weight Distribution ```{r weight_dist, fig.width=7, fig.height=4} par(mfrow = c(1, 2)) hist(il6_genes$weight[il6_genes$weight > 0], breaks = 20, col = "red", main = "Positive Weights", xlab = "Log2 Fold Change") hist(il6_genes$weight[il6_genes$weight < 0], breaks = 20, col = "blue", main = "Negative Weights", xlab = "Log2 Fold Change") ``` ## Reactome Database ```{r reactome_info} # Get Reactome gene set for IL6 file_path <- system.file("extdata", "IL6_Interleukin6_signaling.xml", package = "scaper") il6_reactome <- genesetReactome(cytokine.eval = "IL6", file.name = file_path) cat("IL6 Reactome Pathway:\n") cat("- Total genes:", nrow(il6_reactome), "\n") head(il6_reactome, 10) ``` ## Score Interpretation | Score Range | Interpretation | |-------------|----------------| | > 0.7 | Strong pathway activation | | 0.3 - 0.7 | Moderate/basal activity | | < 0.3 | Minimal activity | ## References 1. **Frost (2020)**. Variance-adjusted Mahalanobis (VAM). *Nucleic Acids Research*, 48(16), e94. 2. **Jiang et al. (2021)**. CytoSig database. *Nature Methods*, 18(10), 1181-1186. 3. **Gillespie et al. (2022)**. Reactome pathway database. *Nucleic Acids Research*, 50(D1), D687-D692. ## Session Information ```{r session} sessionInfo() ``` --- **Author**: Zaoqu Liu **GitHub**: [Zaoqu-Liu/scaper](https://github.com/Zaoqu-Liu/scaper)