CellOracleR implements a mathematical framework for predicting cell state transitions following transcription factor (TF) perturbations. This vignette describes the core algorithms underlying the package.
For each target gene \(g\), we model its expression as a linear function of its regulators:
\[ X_g = \sum_{r \in R_g} \beta_{r,g} \cdot X_r + \epsilon_g \]
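where \(X_g\) is the expression of target gene \(g\), \(R_g\) is the set of candidate regulators of \(g\), \(\beta_{r,g}\) is the regulatory coefficient of regulator \(r\) on \(g\), and \(\epsilon_g\) is the residual.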
We use Ridge regression (L2 regularization) to estimate \(\beta\) coefficients:
\[ \hat{\beta} = \arg\min_\beta \left\{ \|X_g - X_R \beta\|_2^2 + \alpha \|\beta\|_2^2 \right\} \]
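With \(X_R\) the cells × regulators expression matrix and \(\alpha \ge 0\) the regularization strength, the closed-form solution is: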
\[ \hat{\beta} = (X_R^T X_R + \alpha I)^{-1} X_R^T X_g \]
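The closed-form estimate can be sketched in a few lines of base R. This is a minimal illustration, not the package's implementation: the function name `ridge_closed_form`, its arguments, and the simulated data are all assumptions made for the example, and the model has no intercept, matching the equation above.

```r
# Hypothetical helper, not part of the CellOracleR API.
# X_R: cells x regulators matrix; x_g: target expression (one value per cell).
ridge_closed_form <- function(X_R, x_g, alpha = 1) {
  p <- ncol(X_R)
  # Solve (X_R' X_R + alpha * I) beta = X_R' x_g instead of forming the inverse.
  beta <- solve(crossprod(X_R) + alpha * diag(p), crossprod(X_R, x_g))
  drop(beta)
}

# Simulated data (assumed): three regulators, the third has no true effect.
set.seed(1)
X_R <- matrix(rnorm(200 * 3), nrow = 200)
x_g <- drop(X_R %*% c(0.8, -0.5, 0)) + rnorm(200, sd = 0.1)
beta_hat <- ridge_closed_form(X_R, x_g, alpha = 1)
```

Using `solve(A, b)` rather than `solve(A) %*% b` avoids computing the explicit inverse, which is usually the more stable choice. Larger values of `alpha` shrink the coefficient vector toward zero.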
Why Ridge Regression?
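To improve robustness, we employ bagging: the model is fit \(B\) times on resampled sets of cells, and the resulting coefficients are aggregated by their element-wise median: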
\[ \hat{\beta}_{final} = \text{median}(\hat{\beta}_1, \hat{\beta}_2, \ldots, \hat{\beta}_B) \]
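A rough sketch of the bagging step follows. The resampling scheme is an assumption: here each of the \(B\) fits uses a bootstrap sample containing a fraction `frac` of the cells, which may differ from what the package does. The name `bagged_ridge` and its parameters are hypothetical.

```r
# Hypothetical sketch: bagged Ridge coefficients with median aggregation.
bagged_ridge <- function(X_R, x_g, alpha = 1, B = 50, frac = 0.8) {
  n <- nrow(X_R)
  p <- ncol(X_R)
  fits <- vapply(seq_len(B), function(b) {
    # Assumed scheme: sample cells with replacement for each bag.
    idx <- sample.int(n, size = ceiling(frac * n), replace = TRUE)
    Xb <- X_R[idx, , drop = FALSE]
    drop(solve(crossprod(Xb) + alpha * diag(p), crossprod(Xb, x_g[idx])))
  }, numeric(p))
  fits <- matrix(fits, nrow = p)  # p x B, also when p == 1
  # Element-wise median across the B fitted coefficient vectors.
  apply(fits, 1, median)
}

set.seed(1)
X_R <- matrix(rnorm(200 * 3), nrow = 200)
x_g <- drop(X_R %*% c(0.8, -0.5, 0)) + rnorm(200, sd = 0.1)
beta_final <- bagged_ridge(X_R, x_g, alpha = 1, B = 50)
```

The median is taken separately for each coefficient, so a few unstable fits do not dominate the final estimate.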
Advantages:
Given a perturbation condition (e.g., TF knockout), we simulate the downstream effects:
\[ \Delta X^{(0)} = X_{perturbed} - X_{original} \]
The signal propagates through the GRN iteratively:
\[ \Delta X^{(t+1)} = \Delta X^{(t)} \cdot W \]
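where \(W\) is the coefficient matrix (genes × genes) with entries \(W_{r,g} = \beta_{r,g}\), and \(\Delta X\) is a cells × genes matrix, so each cell's shift vector is multiplied on the right by \(W\).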
Key constraint: Perturbed gene values are maintained at each step:
\[ \Delta X^{(t+1)}_p = \Delta X^{(0)}_p \quad \text{for all perturbed genes } p \]
Gene expression cannot be negative:
\[ X_{simulated} = \max(0, X_{original} + \Delta X) \]
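The propagation loop, the clamping of perturbed genes, and the non-negativity constraint can be put together as follows. This is a sketch under stated assumptions: `propagate_perturbation` and its arguments are hypothetical names, `X` is taken to be a cells × genes matrix with gene names as column names, and the toy network (A activates B, B represses C) is invented for illustration.

```r
# Hypothetical sketch of the propagation steps; names are illustrative only.
# X: cells x genes matrix with gene names as colnames
# W: genes x genes matrix, W[r, g] = beta_{r,g}, same gene order as X
# perturb: named numeric vector, gene name -> forced expression value
propagate_perturbation <- function(X, W, perturb, n_steps = 3) {
  p_idx <- match(names(perturb), colnames(X))
  stopifnot(!anyNA(p_idx))
  X_pert <- X
  X_pert[, p_idx] <- rep(perturb, each = nrow(X))
  delta0 <- X_pert - X                    # initial shift
  delta <- delta0
  for (t in seq_len(n_steps)) {
    delta <- delta %*% W                  # one propagation step through the GRN
    delta[, p_idx] <- delta0[, p_idx]     # keep perturbed genes fixed
  }
  pmax(X + delta, 0)                      # expression cannot be negative
}

# Toy network (assumed): A activates B, B represses C.
genes <- c("A", "B", "C")
W <- matrix(0, 3, 3, dimnames = list(genes, genes))
W["A", "B"] <- 0.5
W["B", "C"] <- -1
X <- matrix(c(2, 4, 0.5, 3, 3, 0.5), nrow = 2, dimnames = list(NULL, genes))
sim <- propagate_perturbation(X, W, perturb = c(A = 0), n_steps = 3)
```

In this toy case, knocking out A lowers B, which in turn raises C. B in the first cell would become negative and is clipped to zero.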
Signal propagation through GRN
We estimate the probability of cell \(i\) transitioning to cell \(j\) based on:
\[ \rho_{ij} = \text{cor}(\Delta X_i, X_j - X_i) \]
This measures how well the predicted expression change aligns with the direction toward cell \(j\).
Correlations are converted to probabilities using an exponential kernel:
\[ P_{ij} = \frac{\exp(\rho_{ij} / \sigma)}{\sum_{k \in N_i} \exp(\rho_{ik} / \sigma)} \]
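where \(\sigma\) is a scale parameter that controls how sharply the probabilities concentrate on the best-aligned neighbors, and \(N_i\) is the set of neighbors of cell \(i\).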
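For a single cell, the correlation and kernel steps might look like the sketch below. The function name `transition_probs`, the default `sigma = 0.05`, and the toy data are assumptions for illustration. The neighbor set is passed in explicitly because the vignette does not specify how it is built.

```r
# Hypothetical sketch: transition probabilities from cell i to its neighbours.
# X: cells x genes matrix; delta_i: simulated shift for cell i;
# nbrs: integer indices of the neighbour set N_i.
transition_probs <- function(X, delta_i, i, nbrs, sigma = 0.05) {
  # Correlation between the simulated shift and each displacement X_j - X_i.
  rho <- vapply(nbrs, function(j) cor(delta_i, X[j, ] - X[i, ]), numeric(1))
  # Subtracting max(rho) leaves the ratio unchanged and avoids overflow.
  w <- exp((rho - max(rho)) / sigma)
  w / sum(w)
}

# Toy data (assumed): the shift of cell 1 points toward cell 2.
X <- rbind(c(1, 1, 1, 1), c(2, 3, 1, 0), c(0, 0, 1, 2))
delta_1 <- 0.5 * (X[2, ] - X[1, ])
P_1 <- transition_probs(X, delta_1, i = 1, nbrs = c(2, 3))
```

Note that `cor()` returns `NA` if either vector is constant, so a real implementation would need to handle cells with a zero shift.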
The expected movement of cell \(i\) in embedding space is:
\[ \Delta E_i = \sum_{j \in N_i} P_{ij} \cdot \hat{d}_{ij} \]
where \(\hat{d}_{ij}\) is the unit vector from cell \(i\) to cell \(j\) in embedding space.
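The weighted sum of unit vectors can be sketched as follows, assuming a cells × 2 embedding matrix `E` and a probability vector `P_i` aligned with the neighbor indices. The function name `embedding_shift` and the toy coordinates are hypothetical.

```r
# Hypothetical sketch: expected embedding displacement for cell i.
# E: cells x dims embedding; nbrs: neighbour indices; P_i: probabilities over nbrs.
embedding_shift <- function(E, i, nbrs, P_i) {
  d <- sweep(E[nbrs, , drop = FALSE], 2, E[i, ])  # vectors from cell i to each neighbour
  d_hat <- d / sqrt(rowSums(d^2))                 # row-wise unit vectors
  colSums(P_i * d_hat)                            # probability-weighted sum
}

# Toy embedding (assumed): neighbours lie along the x and y axes.
E <- rbind(c(0, 0), c(2, 0), c(0, 3))
shift <- embedding_shift(E, i = 1, nbrs = c(2, 3), P_i = c(0.75, 0.25))
```

Because the displacement vectors are normalized, the result depends only on the direction of each neighbor, not on its distance. A neighbor that coincides with cell \(i\) would produce a division by zero and needs separate handling.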
Cell fate is modeled as a discrete-time Markov chain:
\[ P(S_{t+1} = j | S_t = i) = P_{ij} \]
For each starting cell, we simulate \(T\) steps:
Algorithm: Markov Walk
Input: transition_prob P, start_cell s, n_steps T
Output: trajectory [s_0, s_1, ..., s_T]
s_0 ← s
for t = 1 to T:
    sample s_t from Categorical(P[s_{t-1}, :])
return [s_0, s_1, ..., s_T]
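The pseudocode above translates almost directly into base R. The function name `markov_walk` and the toy transition matrix are assumptions for illustration.

```r
# Hypothetical sketch of the Markov walk; P is a row-stochastic matrix.
markov_walk <- function(P, start_cell, n_steps) {
  stopifnot(nrow(P) == ncol(P), all(abs(rowSums(P) - 1) < 1e-8))
  traj <- integer(n_steps + 1)
  traj[1] <- start_cell
  for (t in seq_len(n_steps)) {
    # Draw the next state from the row of the current state.
    traj[t + 1] <- sample.int(nrow(P), size = 1, prob = P[traj[t], ])
  }
  traj
}

# Toy chain (assumed): state 3 is absorbing, and states can only move forward.
P <- rbind(c(0.5, 0.5, 0),
           c(0,   0.5, 0.5),
           c(0,   0,   1))
set.seed(42)
traj <- markov_walk(P, start_cell = 1, n_steps = 20)
```

The returned vector has length `n_steps + 1` because it includes the starting cell. Repeating the walk many times from the same start gives an empirical distribution over end states.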
For absorbing Markov chains, we can compute the probability of reaching each terminal state:
\[ F = (I - Q)^{-1} R \]
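where \(Q\) is the transient-to-transient transition matrix and \(R\) is the transient-to-absorbing matrix. Entry \(F_{ik}\) is the probability that a walk starting in transient state \(i\) is eventually absorbed in terminal state \(k\).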
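As a small worked example, the absorption probabilities can be computed by solving a linear system. The function name `absorption_probs` is hypothetical, and the four-state chain is a textbook random walk chosen because its answer is easy to check by hand.

```r
# Hypothetical sketch: absorption probabilities of an absorbing Markov chain.
# P: row-stochastic transition matrix; absorbing: indices of terminal states.
absorption_probs <- function(P, absorbing) {
  transient <- setdiff(seq_len(nrow(P)), absorbing)
  Q <- P[transient, transient, drop = FALSE]   # transient -> transient
  R <- P[transient, absorbing, drop = FALSE]   # transient -> absorbing
  # Solve (I - Q) F = R rather than inverting (I - Q) explicitly.
  absorb <- solve(diag(length(transient)) - Q, R)
  dimnames(absorb) <- list(as.character(transient), as.character(absorbing))
  absorb
}

# Symmetric random walk on four states; states 1 and 4 are absorbing.
P <- rbind(c(1,   0,   0,   0),
           c(0.5, 0,   0.5, 0),
           c(0,   0.5, 0,   0.5),
           c(0,   0,   0,   1))
absorb <- absorption_probs(P, absorbing = c(1, 4))
```

Here a walk starting in state 2 ends in state 1 with probability 2/3 and in state 4 with probability 1/3. Each row of the result sums to one.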
Degree centrality: \[ C_D(v) = \text{deg}(v) = |N(v)| \]
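Betweenness centrality: \[ C_B(v) = \sum_{s \neq v \neq t} \frac{\sigma_{st}(v)}{\sigma_{st}} \]
where \(\sigma_{st}\) is the number of shortest paths from \(s\) to \(t\) and \(\sigma_{st}(v)\) is the number of those paths that pass through \(v\) (not to be confused with the kernel scale \(\sigma\) above).
Eigenvector centrality: \[ C_E(v) = \frac{1}{\lambda} \sum_{u \in N(v)} C_E(u) \]
where \(\lambda\) is the largest eigenvalue of the adjacency matrix.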
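Degree and eigenvector centrality can be illustrated with base R on a small adjacency matrix. This sketch makes a simplifying assumption that the graph is undirected and connected, so the adjacency matrix is symmetric. A real GRN is directed and would need a different treatment. The function names are hypothetical, and betweenness is omitted because it requires a shortest-path search.

```r
# Hypothetical sketch for an undirected graph with symmetric adjacency matrix A.
degree_centrality <- function(A) rowSums(A != 0)

eigenvector_centrality <- function(A) {
  e <- eigen(A, symmetric = TRUE)
  # The eigenvector of the largest eigenvalue; its sign is arbitrary,
  # so take absolute values and scale the maximum to 1.
  v <- abs(e$vectors[, which.max(e$values)])
  v / max(v)
}

# Toy star graph (assumed): node 1 is connected to nodes 2, 3 and 4.
A <- matrix(0, 4, 4)
A[1, 2:4] <- 1
A[2:4, 1] <- 1
deg <- degree_centrality(A)
ec <- eigenvector_centrality(A)
```

For the star graph the hub has degree 3 and the highest eigenvector centrality, while the three leaves share the same lower score.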
Degree distribution entropy:
\[ H = -\sum_{k} p(k) \log_2 p(k) \]
Higher entropy indicates more heterogeneous connectivity.
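The entropy of a degree sequence is straightforward to compute. The function name `degree_entropy` is hypothetical, and the two degree sequences are toy inputs.

```r
# Hypothetical sketch: Shannon entropy (in bits) of the degree distribution.
degree_entropy <- function(deg) {
  # p(k): fraction of nodes with degree k; only observed degrees appear.
  p <- table(deg) / length(deg)
  -sum(p * log2(p))
}

H_star <- degree_entropy(c(3, 1, 1, 1))   # one hub, three leaves
H_ring <- degree_entropy(c(2, 2, 2, 2))   # every node has degree 2
```

A graph in which every node has the same degree has zero entropy, while the star graph, with two distinct degree values, has a positive entropy of about 0.81 bits.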
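CellOracleR combines Ridge regression with bagging for GRN inference, iterative signal propagation for perturbation simulation, correlation-based transition probabilities, Markov chain simulation of cell fate, and graph-theoretic network metrics.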
These mathematical foundations enable accurate prediction of cellular responses to genetic perturbations.
Kamimoto, K., et al. (2023). Dissecting cell identity via network inference and in silico gene perturbation. Nature.
Hoerl, A.E. & Kennard, R.W. (1970). Ridge Regression: Biased Estimation for Nonorthogonal Problems. Technometrics.
Breiman, L. (1996). Bagging Predictors. Machine Learning.
sessionInfo()
#> R version 4.6.0 (2026-04-24)
#> Platform: x86_64-pc-linux-gnu
#> Running under: Ubuntu 24.04.4 LTS
#>
#> Matrix products: default
#> BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
#> LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.26.so; LAPACK version 3.12.0
#>
#> locale:
#> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
#> [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
#> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
#> [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
#> [9] LC_ADDRESS=C LC_TELEPHONE=C
#> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
#>
#> time zone: Etc/UTC
#> tzcode source: system (glibc)
#>
#> attached base packages:
#> [1] stats graphics grDevices utils datasets methods base
#>
#> other attached packages:
#> [1] ggplot2_4.0.3 rmarkdown_2.31
#>
#> loaded via a namespace (and not attached):
#> [1] vctrs_0.7.3 cli_3.6.6 knitr_1.51 rlang_1.2.0
#> [5] xfun_0.57 otel_0.2.0 generics_0.1.4 S7_0.2.2
#> [9] jsonlite_2.0.0 labeling_0.4.3 glue_1.8.1 buildtools_1.0.0
#> [13] htmltools_0.5.9 maketools_1.3.2 sys_3.4.3 sass_0.4.10
#> [17] scales_1.4.0 grid_4.6.0 tibble_3.3.1 evaluate_1.0.5
#> [21] jquerylib_0.1.4 fastmap_1.2.0 yaml_2.3.12 lifecycle_1.0.5
#> [25] compiler_4.6.0 dplyr_1.2.1 RColorBrewer_1.1-3 pkgconfig_2.0.3
#> [29] farver_2.1.2 digest_0.6.39 R6_2.6.1 tidyselect_1.2.1
#> [33] pillar_1.11.1 magrittr_2.0.5 bslib_0.11.0 withr_3.0.2
#> [37] tools_4.6.0 gtable_0.3.6 cachem_1.1.0