tradeSeq implements a flexible statistical framework based on Generalized Additive Models (GAMs) for trajectory-based differential expression analysis. This vignette provides a detailed explanation of the statistical methodology underlying the package.
For each gene \(g\), tradeSeq fits a negative binomial generalized additive model of the form:
\[\log(\mu_{gc}) = \eta_{gc} = \sum_{l=1}^{L} \mathbf{1}_{cl} \cdot s_l(t_{cl}) + \mathbf{U}_c \boldsymbol{\beta} + O_c\]
Where:
The smooth functions \(s_l(t)\) are modeled using cubic regression splines with a shared set of knots across lineages. This approach:
The number and placement of knots determine the flexibility of the fitted curves. tradeSeq provides:
evaluateK() function: Helps select
optimal number of knots using AICtradeSeq implements several statistical tests based on Wald statistics computed from the fitted GAM coefficients.
Tests whether gene expression changes significantly along pseudotime within each lineage.
Null Hypothesis: Expression is constant along pseudotime \[H_0: s_l(t) = c \text{ for all } t\]
Test Statistic: Wald statistic comparing consecutive knot coefficients
Tests whether genes are differentially expressed at the endpoints of different lineages.
Null Hypothesis: Expression at endpoints is equal across lineages \[H_0: s_1(t_{max,1}) = s_2(t_{max,2})\]
Tests whether genes have different expression patterns (shapes) between lineages.
Null Hypothesis: Expression patterns are identical across lineages \[H_0: s_1(t) = s_2(t) \text{ for all } t\]
Tests for differential expression in a specific region of the trajectory (e.g., near the branching point).
All tests in tradeSeq are based on the Wald test framework:
\[W = (\mathbf{L}\hat{\boldsymbol{\beta}})^T [\mathbf{L}\hat{\mathbf{V}}\mathbf{L}^T]^{-1} (\mathbf{L}\hat{\boldsymbol{\beta}})\]
Where:
Under \(H_0\), \(W \sim \chi^2_r\) where \(r\) is the rank of \(\mathbf{L}\).
tradeSeq supports testing against a log2 fold change threshold:
\[H_0: |\mathbf{L}\boldsymbol{\beta}| \leq \log(2^{\text{l2fc}})\]
This is useful when biological significance requires not just statistical significance but also a meaningful effect size.
When analyzing data with multiple experimental conditions, tradeSeq fits condition-specific smoothers for each lineage:
\[\log(\mu_{gc}) = \sum_{l=1}^{L} \sum_{k=1}^{K} \mathbf{1}_{clk} \cdot s_{lk}(t_{cl}) + O_c\]
Where \(K\) is the number of conditions. This allows testing for:
tradeSeq supports parallel computation through the BiocParallel framework:
For large datasets, consider:
Van den Berge K, Roux de Bézieux H, Street K, Saelens W, Cannoodt R, Saeys Y, Dudoit S, Clement L. (2020). Trajectory-based differential expression analysis for single-cell sequencing data. Nature Communications, 11:1201. DOI: 10.1038/s41467-020-14766-3
## R version 4.6.0 (2026-04-24)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 24.04.4 LTS
##
## Matrix products: default
## BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
## LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.26.so; LAPACK version 3.12.0
##
## Random number generation:
## RNG: Mersenne-Twister
## Normal: Inversion
## Sample: Rounding
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## time zone: Etc/UTC
## tzcode source: system (glibc)
##
## attached base packages:
## [1] stats4 stats graphics grDevices utils datasets methods
## [8] base
##
## other attached packages:
## [1] ggplot2_4.0.3 mgcv_1.9-4
## [3] nlme_3.1-169 slingshot_2.21.0
## [5] TrajectoryUtils_1.21.0 princurve_2.1.6
## [7] SingleCellExperiment_1.35.1 SummarizedExperiment_1.43.0
## [9] Biobase_2.73.1 GenomicRanges_1.65.0
## [11] Seqinfo_1.3.0 IRanges_2.47.1
## [13] S4Vectors_0.51.2 BiocGenerics_0.59.3
## [15] generics_0.1.4 MatrixGenerics_1.25.0
## [17] matrixStats_1.5.0 RColorBrewer_1.1-3
## [19] tradeSeq_1.13.13 knitr_1.51
## [21] rmarkdown_2.31
##
## loaded via a namespace (and not attached):
## [1] gtable_0.3.6 xfun_0.57 bslib_0.11.0
## [4] lattice_0.22-9 vctrs_0.7.3 tools_4.6.0
## [7] parallel_4.6.0 tibble_3.3.1 pkgconfig_2.0.3
## [10] Matrix_1.7-5 S7_0.2.2 lifecycle_1.0.5
## [13] compiler_4.6.0 farver_2.1.2 statmod_1.5.2
## [16] codetools_0.2-20 htmltools_0.5.9 sys_3.4.3
## [19] buildtools_1.0.0 sass_0.4.10 yaml_2.3.12
## [22] pillar_1.11.1 jquerylib_0.1.4 BiocParallel_1.47.0
## [25] limma_3.69.1 DelayedArray_0.39.2 cachem_1.1.0
## [28] viridis_0.6.5 abind_1.4-8 locfit_1.5-9.12
## [31] tidyselect_1.2.1 digest_0.6.39 dplyr_1.2.1
## [34] labeling_0.4.3 maketools_1.3.2 splines_4.6.0
## [37] fastmap_1.2.0 grid_4.6.0 cli_3.6.6
## [40] SparseArray_1.13.2 magrittr_2.0.5 S4Arrays_1.13.0
## [43] withr_3.0.2 edgeR_4.11.1 scales_1.4.0
## [46] XVector_0.53.0 igraph_2.3.1 gridExtra_2.3
## [49] pbapply_1.7-4 evaluate_1.0.5 viridisLite_0.4.3
## [52] rlang_1.2.0 Rcpp_1.1.1-1.1 glue_1.8.1
## [55] jsonlite_2.0.0 R6_2.6.1