--- title: "Mathematical Framework of RNA Velocity" author: "Zaoqu Liu" date: "`r Sys.Date()`" output: rmarkdown::html_vignette: toc: true toc_depth: 3 fig_width: 7 fig_height: 5 vignette: > %\VignetteIndexEntry{Mathematical Framework of RNA Velocity} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r setup, include = FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>", fig.width = 7, fig.height = 5, message = FALSE, warning = FALSE ) ``` ## Introduction This vignette provides a comprehensive overview of the mathematical framework underlying RNA velocity analysis as implemented in **scVeloR**. Understanding these principles is essential for proper interpretation of velocity results and for making informed decisions about model selection. ## The Central Dogma and mRNA Kinetics ### Biological Background Gene expression involves a series of biochemical processes: 1. **Transcription**: DNA → pre-mRNA (nascent/unspliced) 2. **Splicing**: pre-mRNA → mature mRNA (spliced) 3. **Degradation**: mRNA → degraded products RNA velocity leverages the distinction between **unspliced** (nascent) and **spliced** (mature) mRNA to infer the direction and rate of gene expression changes. ### The Kinetic Model The dynamics of mRNA abundance can be described by a system of ordinary differential equations (ODEs): $$\frac{du}{dt} = \alpha(t) - \beta u$$ $$\frac{ds}{dt} = \beta u - \gamma s$$ Where: - $u$: unspliced mRNA abundance - $s$: spliced mRNA abundance - $\alpha(t)$: transcription rate (time-dependent) - $\beta$: splicing rate (constitutive) - $\gamma$: degradation rate (constitutive) ## Analytical Solutions ### Constant Transcription (Induction Phase) When $\alpha(t) = \alpha$ (constant), the system has analytical solutions: **Unspliced mRNA:** $$u(t) = u_0 e^{-\beta t} + \frac{\alpha}{\beta}(1 - e^{-\beta t})$$ **Spliced mRNA:** $$s(t) = s_0 e^{-\gamma t} + \frac{\alpha}{\gamma}(1 - e^{-\gamma t}) + \frac{\alpha - u_0 \beta}{\gamma - \beta}(e^{-\gamma t} - e^{-\beta t})$$ ### Zero Transcription (Repression Phase) When transcription stops ($\alpha = 0$): $$u(t) = u_0 e^{-\beta t}$$ $$s(t) = s_0 e^{-\gamma t} + \frac{\beta u_0}{\gamma - \beta}(e^{-\gamma t} - e^{-\beta t})$$ ### Steady State At equilibrium ($du/dt = 0$, $ds/dt = 0$): $$u_{ss} = \frac{\alpha}{\beta}$$ $$s_{ss} = \frac{\alpha}{\gamma}$$ The ratio at steady state defines a fundamental relationship: $$\frac{u_{ss}}{s_{ss}} = \frac{\gamma}{\beta}$$ ```{r ode_visualization, echo=FALSE, fig.cap="Phase portrait showing the trajectory of gene expression during induction (blue) and repression (red) phases."} # Visualize ODE solutions library(ggplot2) # Parameters alpha <- 1 beta <- 0.5 gamma <- 0.3 t_switch <- 5 # Time points t_on <- seq(0, t_switch, length.out = 100) t_off <- seq(0, 10, length.out = 100) # Induction phase u_on <- alpha/beta * (1 - exp(-beta * t_on)) s_on <- alpha/gamma * (1 - exp(-gamma * t_on)) + (alpha)/(gamma - beta) * (exp(-gamma * t_on) - exp(-beta * t_on)) # Values at switch point u0_switch <- tail(u_on, 1) s0_switch <- tail(s_on, 1) # Repression phase u_off <- u0_switch * exp(-beta * t_off) s_off <- s0_switch * exp(-gamma * t_off) + (beta * u0_switch)/(gamma - beta) * (exp(-gamma * t_off) - exp(-beta * t_off)) # Combine data df <- data.frame( s = c(s_on, s_off), u = c(u_on, u_off), phase = rep(c("Induction", "Repression"), c(length(s_on), length(s_off))) ) # Plot ggplot(df, aes(x = s, y = u, color = phase)) + geom_path(linewidth = 1.2, arrow = arrow(length = unit(0.3, "cm"), type = "closed")) + geom_point(data = data.frame(s = c(0, alpha/gamma, 0), u = c(0, alpha/beta, 0), label = c("Start", "Steady State", "End")), aes(x = s, y = u), size = 3, color = "black") + annotate("text", x = alpha/gamma * 1.05, y = alpha/beta, label = "Steady State", hjust = 0) + scale_color_manual(values = c("Induction" = "#2196F3", "Repression" = "#F44336")) + labs(x = "Spliced (s)", y = "Unspliced (u)", title = "Phase Portrait of mRNA Dynamics", subtitle = expression(paste(alpha, " = 1, ", beta, " = 0.5, ", gamma, " = 0.3"))) + theme_minimal() + theme(legend.position = "bottom") ``` ## Velocity Estimation Models ### Model 1: Deterministic (Steady-State) **Assumption**: Cells are near transcriptional steady state. At steady state: $\gamma \approx \beta u / s$ **Velocity**: The deviation from steady state indicates the direction of change: $$v_s = \beta u - \gamma s$$ If $v_s > 0$: gene is being upregulated If $v_s < 0$: gene is being downregulated **Implementation in scVeloR**: ```{r steady_state_demo, eval=FALSE} # Fit gamma by linear regression of u vs s # v_s = β·u - γ·s = β(u - γ/β·s) # Use regression: u ~ s to get slope γ/β ``` ### Model 2: Stochastic **Assumption**: Account for transcriptional noise through second-order moments. The stochastic model uses both first moments (mean expression) and second moments (variance, covariance): $$\text{Var}(u) = E[u^2] - E[u]^2$$ $$\text{Cov}(u,s) = E[us] - E[u]E[s]$$ This provides more robust velocity estimates when transcriptional bursting is present. ### Model 3: Dynamical **Assumption**: Full kinetics model without steady-state assumption. The dynamical model infers all kinetic parameters ($\alpha$, $\beta$, $\gamma$) and the switching time ($t_-$) using an Expectation-Maximization (EM) algorithm. **E-step**: Assign latent time to each cell **M-step**: Update kinetic parameters ```{r em_algorithm, echo=FALSE, fig.cap="EM algorithm iteratively refines parameter estimates and time assignments."} # Illustrate EM convergence set.seed(42) iterations <- 10 likelihood <- cumsum(runif(iterations, 0.1, 0.3)) + 1 likelihood <- likelihood / max(likelihood) df_em <- data.frame( iteration = 1:iterations, likelihood = likelihood ) ggplot(df_em, aes(x = iteration, y = likelihood)) + geom_line(color = "#673AB7", linewidth = 1) + geom_point(color = "#673AB7", size = 3) + labs(x = "EM Iteration", y = "Log-Likelihood", title = "EM Algorithm Convergence") + theme_minimal() + scale_x_continuous(breaks = 1:10) ``` ## Velocity Graph Construction ### Cosine Similarity The velocity graph captures cell-cell transitions based on velocity correlation: $$\text{cosine}(v_i, \delta_{ij}) = \frac{v_i \cdot \delta_{ij}}{||v_i|| \cdot ||\delta_{ij}||}$$ Where: - $v_i$: velocity vector for cell $i$ - $\delta_{ij}$: expression difference between cell $i$ and $j$ ### Transition Probability The transition probability from cell $i$ to cell $j$: $$P_{ij} = \frac{\max(0, \text{cosine}(v_i, \delta_{ij}))}{\sum_k \max(0, \text{cosine}(v_i, \delta_{ik}))}$$ ```{r transition_matrix, echo=FALSE, fig.cap="Schematic of velocity-based transition probabilities."} # Illustrate transition concept set.seed(123) n <- 20 theta <- seq(0, 2*pi, length.out = n) r <- runif(n, 0.8, 1.2) x <- r * cos(theta) + rnorm(n, 0, 0.1) y <- r * sin(theta) + rnorm(n, 0, 0.1) # Add velocity vectors (tangent to circle + noise) vx <- -sin(theta) * 0.15 + rnorm(n, 0, 0.02) vy <- cos(theta) * 0.15 + rnorm(n, 0, 0.02) df_cells <- data.frame(x = x, y = y, vx = vx, vy = vy) ggplot(df_cells, aes(x = x, y = y)) + geom_point(size = 3, color = "#2196F3", alpha = 0.8) + geom_segment(aes(xend = x + vx, yend = y + vy), arrow = arrow(length = unit(0.15, "cm"), type = "closed"), color = "#F44336", linewidth = 0.5) + labs(title = "Velocity Vectors on Cell Embedding", subtitle = "Arrows indicate predicted future state") + theme_minimal() + theme(axis.text = element_blank(), axis.title = element_blank()) + coord_fixed() ``` ## Latent Time Inference ### Gene-Shared Latent Time The latent time represents the position of each cell along the differentiation trajectory: $$t_i = \frac{1}{|G|} \sum_{g \in G} t_{ig}$$ Where $t_{ig}$ is the inferred time for cell $i$ based on gene $g$. ### Root Cell Identification The root cells are identified as those with: 1. High connectivity in the velocity graph 2. Velocity vectors pointing "outward" (high end-point probability) ## Model Selection Guidelines | Criterion | Steady-State | Stochastic | Dynamical | |-----------|--------------|------------|-----------| | Speed | Fast | Medium | Slow | | Data requirement | Low | Medium | High | | Transcriptional dynamics | Equilibrium | Bursting | Transient | | Parameter inference | γ only | γ only | α, β, γ, t_ | | Recommended use | Exploration | Default | Publication | ## References 1. **Bergen, V., Lange, M., Peidli, S., Wolf, F. A., & Theis, F. J.** (2020). Generalizing RNA velocity to transient cell states through dynamical modeling. *Nature Biotechnology*, 38, 1408–1414. 2. **La Manno, G., Soldatov, R., Zeisel, A., et al.** (2018). RNA velocity of single cells. *Nature*, 560, 494–498. 3. **Svensson, V., & Pachter, L.** (2018). RNA velocity: Molecular kinetics from single-cell RNA-seq. *Molecular Cell*, 72, 7–9. ## Session Information ```{r session_info} sessionInfo() ```