--- title: "Algorithm Theory and Mathematical Framework" author: "Zaoqu Liu" date: "`r Sys.Date()`" output: rmarkdown::html_vignette: toc: true toc_depth: 3 vignette: > %\VignetteIndexEntry{Algorithm Theory and Mathematical Framework} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r setup, include=FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>", fig.width = 7, fig.height = 5, warning = FALSE, message = FALSE ) ``` ## Introduction METAFLUX employs constraint-based metabolic modeling to infer intracellular metabolic fluxes from gene expression data. This vignette provides a comprehensive overview of the mathematical framework underlying the algorithm. ## Theoretical Background ### Genome-Scale Metabolic Models A genome-scale metabolic model (GEM) represents the complete set of metabolic reactions in an organism. Human-GEM, used by METAFLUX, contains: - **8,378 metabolites** - **13,082 reactions** - **3,625 metabolic genes** ### Stoichiometric Matrix The stoichiometric matrix $\mathbf{S}$ encodes the structure of the metabolic network: $$\mathbf{S} \in \mathbb{R}^{m \times n}$$ where: - $m$ = number of metabolites (8,378) - $n$ = number of reactions (13,082) - $S_{ij}$ = stoichiometric coefficient of metabolite $i$ in reaction $j$ ## Flux Balance Analysis (FBA) ### Steady-State Assumption At metabolic steady state, the rate of production equals the rate of consumption for each metabolite: $$\mathbf{S} \cdot \mathbf{v} = 0$$ where $\mathbf{v}$ is the flux vector. ### Optimization Problem METAFLUX solves a quadratic programming problem: $$\min_{\mathbf{v}} \quad \mathbf{v}^T \mathbf{P} \mathbf{v} + \mathbf{q}^T \mathbf{v}$$ subject to: $$\mathbf{S} \cdot \mathbf{v} = 0$$ $$\mathbf{lb} \leq \mathbf{v} \leq \mathbf{ub}$$ where: - $\mathbf{P}$ = identity matrix (regularization) - $\mathbf{q}$ = objective coefficients (biomass maximization) - $\mathbf{lb}, \mathbf{ub}$ = flux bounds from MRAS ### Biomass Objective The objective function maximizes biomass production: $$\mathbf{q} = \begin{cases} -10000 & \text{if reaction is biomass} \\ 0 & \text{otherwise} \end{cases}$$ ## Metabolic Reaction Activity Scores (MRAS) ### Gene-Protein-Reaction (GPR) Rules GPR rules define the relationship between genes, proteins, and reactions: 1. **Isoenzymes (OR relationship)**: Multiple genes can catalyze the same reaction 2. **Enzyme complexes (AND relationship)**: Multiple genes required for one reaction ### Scoring Algorithm #### Isoenzyme Score (OR) For reactions with multiple isoenzymes: $$\text{Score}_{\text{iso}} = \sum_{i=1}^{n} \frac{e_i}{w_i}$$ where: - $e_i$ = expression level of gene $i$ - $w_i$ = gene occurrence weight (frequency in GPR rules) #### Complex Score (AND) For enzyme complexes: $$\text{Score}_{\text{complex}} = \min_{i=1}^{n} \frac{e_i}{w_i}$$ The rate-limiting subunit determines the complex activity. #### Mixed Relationships For complex GPR rules with both AND and OR: $$\text{Score}_{\text{mixed}} = f(\text{hierarchical combination})$$ ### Normalization Scores are normalized per sample: $$\text{MRAS}_{ij} = \frac{\text{Score}_{ij}}{\max_j(\text{Score}_{ij})}$$ ## Flux Bounds Construction ### Reversible Reactions For reversible reactions ($\text{rev} = 1$): $$lb_i = -\text{MRAS}_i, \quad ub_i = \text{MRAS}_i$$ ### Irreversible Reactions For irreversible reactions ($\text{rev} = 0$): $$lb_i = 0, \quad ub_i = \text{MRAS}_i$$ ### Exchange Reactions Exchange reactions represent nutrient uptake/secretion: $$lb_i = \begin{cases} -1 & \text{if nutrient in medium} \\ 0 & \text{otherwise} \end{cases}$$ ## Community Modeling (Single-Cell) ### Community Stoichiometric Matrix For $k$ cell types, the community matrix is: $$\mathbf{S}_{\text{community}} = \begin{bmatrix} \mathbf{A}_1 & \mathbf{A}_2 & \cdots & \mathbf{A}_k & \mathbf{E} \\ \mathbf{S}_1 & \mathbf{0} & \cdots & \mathbf{0} & \mathbf{0} \\ \mathbf{0} & \mathbf{S}_2 & \cdots & \mathbf{0} & \mathbf{0} \\ \vdots & \vdots & \ddots & \vdots & \vdots \\ \mathbf{0} & \mathbf{0} & \cdots & \mathbf{S}_k & \mathbf{0} \end{bmatrix}$$ where: - $\mathbf{A}_i$ = exchange matrix for cell type $i$ - $\mathbf{S}_i$ = intracellular stoichiometry - $\mathbf{E}$ = external medium exchange ### Weighted Objective Cell type fractions weight the objective: $$\mathbf{q}_{\text{community}} = \sum_{i=1}^{k} f_i \cdot \mathbf{q}_i$$ where $f_i$ is the fraction of cell type $i$. ## Solver: OSQP METAFLUX uses the OSQP (Operator Splitting Quadratic Program) solver: ### Algorithm Settings ```{r osqp-settings, eval=FALSE} settings <- osqp::osqpSettings( max_iter = 1000000L, # Maximum iterations eps_abs = 1e-04, # Absolute tolerance eps_rel = 1e-04, # Relative tolerance adaptive_rho_interval = 50, verbose = FALSE ) ``` ### Convergence The solver converges when: $$\|\mathbf{S} \cdot \mathbf{v}\|_{\infty} < \epsilon$$ Typical precision: $\epsilon \approx 10^{-5}$ ## Computational Complexity | Operation | Complexity | |-----------|------------| | MRAS calculation | $O(n \cdot g)$ | | FBA (per sample) | $O(m \cdot n^2)$ | | Community model | $O(k^2 \cdot m \cdot n^2)$ | where: - $n$ = number of reactions - $m$ = number of metabolites - $g$ = number of genes - $k$ = number of cell types ## References 1. Huang Y, et al. (2023). Characterizing cancer metabolism from bulk and single-cell RNA-seq data using METAFlux. *Nature Communications*, 14, 4883. 2. Robinson JL, et al. (2020). An atlas of human metabolism. *Science Signaling*, 13, eaaz1482. 3. Orth JD, et al. (2010). What is flux balance analysis? *Nature Biotechnology*, 28, 245-248. 4. Stellato B, et al. (2020). OSQP: An operator splitting solver for quadratic programs. *Mathematical Programming Computation*, 12, 637-672. ## Author **Zaoqu Liu** - Package maintainer and optimization implementation - GitHub: [Zaoqu-Liu](https://github.com/Zaoqu-Liu) - ORCID: [0000-0002-0452-742X](https://orcid.org/0000-0002-0452-742X)