Algorithm Details: Mathematical Foundation of scaper

Introduction

This vignette explains the mathematical algorithms underlying scaper. The package implements the Variance-Adjusted Mahalanobis (VAM) method for cytokine activity scoring with bidirectional gene set integration.

Mathematical Formulation

Variance-Adjusted Mahalanobis (VAM) Distance

For a gene set \(G\) with genes \(g_1, g_2, \ldots, g_k\) and weights \(w_1, w_2, \ldots, w_k\), the VAM score for cell \(c\) is:

\[\text{VAM}_c(G) = \frac{(\mathbf{w}^T \mathbf{x}_c - \mu_w)^2}{\sigma_w^2}\]

where:

  • \(\mathbf{x}_c\): Expression vector for cell \(c\)
  • \(\mathbf{w}\): Weight vector for gene set \(G\)
  • \(\mu_w\): Expected weighted sum under null
  • \(\sigma_w^2\): Variance accounting for correlation structure

CDF Transformation

The VAM distance follows a chi-square distribution:

\[\text{CDF}_c(G) = P(\chi^2_1 \leq \text{VAM}_c(G))\]

Bidirectional Scoring (CytoSig)

For cytokines with both positive and negative target genes:

\[\boxed{\text{Activity} = \frac{\text{CDF}^+ + (1 - \text{CDF}^-)}{2}}\]

This ensures correct biological interpretation: - High positive gene expression + Low negative gene expression → High activity - Low positive gene expression + High negative gene expression → Low activity

Gene Set Information

CytoSig Database

library(scaper)

# Get CytoSig gene set for IL6
file_path <- system.file("extdata", "IL6_output.csv", package = "scaper")
il6_genes <- genesetCytoSig(cytokine.eval = "IL6", file.name = file_path)

# Analyze weight distribution
positive <- sum(il6_genes$weight > 0)
negative <- sum(il6_genes$weight < 0)

cat("IL6 Gene Set Summary:\n")
#> IL6 Gene Set Summary:
cat("- Total genes:", nrow(il6_genes), "\n")
#> - Total genes: 1684
cat("- Positive weights (up-regulated):", positive, "\n")
#> - Positive weights (up-regulated): 947
cat("- Negative weights (down-regulated):", negative, "\n")
#> - Negative weights (down-regulated): 737

Weight Distribution

par(mfrow = c(1, 2))
hist(il6_genes$weight[il6_genes$weight > 0], breaks = 20, col = "red",
     main = "Positive Weights", xlab = "Log2 Fold Change")
hist(il6_genes$weight[il6_genes$weight < 0], breaks = 20, col = "blue", 
     main = "Negative Weights", xlab = "Log2 Fold Change")

Reactome Database

# Get Reactome gene set for IL6
file_path <- system.file("extdata", "IL6_Interleukin6_signaling.xml", package = "scaper")
il6_reactome <- genesetReactome(cytokine.eval = "IL6", file.name = file_path)

cat("IL6 Reactome Pathway:\n")
#> IL6 Reactome Pathway:
cat("- Total genes:", nrow(il6_reactome), "\n")
#> - Total genes: 29
head(il6_reactome, 10)
#>     gene cytokine
#> 1   JAK1      IL6
#> 2  IL6ST      IL6
#> 3    IL6      IL6
#> 4   IL6R      IL6
#> 5   JAK2      IL6
#> 6   TYK2      IL6
#> 7    JAK      IL6
#> 8  STAT1      IL6
#> 9  STAT3      IL6
#> 10  STAT      IL6

Score Interpretation

Score Range Interpretation
> 0.7 Strong pathway activation
0.3 - 0.7 Moderate/basal activity
< 0.3 Minimal activity

References

  1. Frost (2020). Variance-adjusted Mahalanobis (VAM). Nucleic Acids Research, 48(16), e94.
  2. Jiang et al. (2021). CytoSig database. Nature Methods, 18(10), 1181-1186.
  3. Gillespie et al. (2022). Reactome pathway database. Nucleic Acids Research, 50(D1), D687-D692.

Session Information

sessionInfo()
#> R version 4.6.0 (2026-04-24)
#> Platform: x86_64-pc-linux-gnu
#> Running under: Ubuntu 24.04.4 LTS
#> 
#> Matrix products: default
#> BLAS:   /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3 
#> LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.26.so;  LAPACK version 3.12.0
#> 
#> locale:
#>  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
#>  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
#>  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
#>  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
#>  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
#> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
#> 
#> time zone: Etc/UTC
#> tzcode source: system (glibc)
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> other attached packages:
#> [1] scaper_0.2.1   rmarkdown_2.31
#> 
#> loaded via a namespace (and not attached):
#>   [1] deldir_2.0-4           pbapply_1.7-4          gridExtra_2.3         
#>   [4] rlang_1.2.0            magrittr_2.0.5         RcppAnnoy_0.0.23      
#>   [7] otel_0.2.0             spatstat.geom_3.8-1    matrixStats_1.5.0     
#>  [10] ggridges_0.5.7         compiler_4.6.0         png_0.1-9             
#>  [13] vctrs_0.7.3            reshape2_1.4.5         stringr_1.6.0         
#>  [16] pkgconfig_2.0.3        fastmap_1.2.0          promises_1.5.0        
#>  [19] purrr_1.2.2            xfun_0.57              cachem_1.1.0          
#>  [22] jsonlite_2.0.0         goftest_1.2-3          later_1.4.8           
#>  [25] spatstat.utils_3.2-3   irlba_2.3.7            parallel_4.6.0        
#>  [28] cluster_2.1.8.2        R6_2.6.1               ica_1.0-3             
#>  [31] spatstat.data_3.1-9    bslib_0.11.0           stringi_1.8.7         
#>  [34] RColorBrewer_1.1-3     reticulate_1.46.0      spatstat.univar_3.2-0 
#>  [37] parallelly_1.47.0      lmtest_0.9-40          jquerylib_0.1.4       
#>  [40] scattermore_1.2        Rcpp_1.1.1-1.1         knitr_1.51            
#>  [43] tensor_1.5.1           future.apply_1.20.2    zoo_1.8-15            
#>  [46] sctransform_0.4.3      httpuv_1.6.17          Matrix_1.7-5          
#>  [49] splines_4.6.0          igraph_2.3.1           tidyselect_1.2.1      
#>  [52] abind_1.4-8            yaml_2.3.12            spatstat.random_3.5-0 
#>  [55] codetools_0.2-20       miniUI_0.1.2           spatstat.explore_3.8-1
#>  [58] listenv_0.10.1         lattice_0.22-9         tibble_3.3.1          
#>  [61] plyr_1.8.9             shiny_1.13.0           S7_0.2.2              
#>  [64] ROCR_1.0-12            evaluate_1.0.5         Rtsne_0.17            
#>  [67] future_1.70.0          fastDummies_1.7.6      survival_3.8-6        
#>  [70] polyclip_1.10-7        xml2_1.5.2             fitdistrplus_1.2-6    
#>  [73] pillar_1.11.1          Seurat_5.5.0           KernSmooth_2.23-26    
#>  [76] plotly_4.12.0          generics_0.1.4         RcppHNSW_0.6.0        
#>  [79] sp_2.2-1               ggplot2_4.0.3          scales_1.4.0          
#>  [82] globals_0.19.1         xtable_1.8-8           glue_1.8.1            
#>  [85] lazyeval_0.2.3         maketools_1.3.2        tools_4.6.0           
#>  [88] sys_3.4.3              data.table_1.18.4      RSpectra_0.16-2       
#>  [91] RANN_2.6.2             buildtools_1.0.0       dotCall64_1.2         
#>  [94] cowplot_1.2.0          grid_4.6.0             tidyr_1.3.2           
#>  [97] nlme_3.1-169           patchwork_1.3.2        cli_3.6.6             
#> [100] spatstat.sparse_3.2-0  spam_2.11-3            viridisLite_0.4.3     
#> [103] dplyr_1.2.1            uwot_0.2.4             gtable_0.3.6          
#> [106] sass_0.4.10            digest_0.6.39          progressr_0.19.0      
#> [109] ggrepel_0.9.8          htmlwidgets_1.6.4      SeuratObject_5.4.0    
#> [112] farver_2.1.2           htmltools_0.5.9        lifecycle_1.0.5       
#> [115] httr_1.4.8             mime_0.13              MASS_7.3-65

Author: Zaoqu Liu
GitHub: Zaoqu-Liu/scaper