--- title: "Getting Started with CellODE" author: "Zaoqu Liu" date: "`r Sys.Date()`" output: rmarkdown::html_vignette: toc: true toc_depth: 3 vignette: > %\VignetteIndexEntry{Getting Started with CellODE} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r setup, include = FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>", fig.align = "center", message = FALSE, warning = FALSE, eval = FALSE ) ``` ## Introduction **CellODE** is an R package for inferring cellular dynamics from single-cell RNA sequencing data using deep generative models. The package combines: - **Variational Autoencoders (VAE)** for dimensionality reduction - **Neural Ordinary Differential Equations (Neural ODE)** for continuous dynamics modeling Key capabilities include: - Pseudotime estimation without specifying root cells - Vector field inference representing cellular velocity - Latent space learning capturing cellular states - Prediction on query datasets ## Installation ### From R-universe (Recommended) ```{r install-universe} install.packages("CellODE", repos = "https://zaoqu-liu.r-universe.dev") ``` ### From GitHub ```{r install-github} # Install dependencies install.packages(c("torch", "R6", "Matrix", "ggplot2", "Rcpp", "RcppArmadillo")) # Install torch backend (run once) torch::install_torch() # Install CellODE remotes::install_github("Zaoqu-Liu/CellODE") ``` ## Basic Workflow ### 1. Load Libraries ```{r load} library(CellODE) library(Seurat) library(ggplot2) ``` ### 2. Prepare Data CellODE works with Seurat objects. Your data should have: - Raw UMI counts in `counts` slot (for `nb`/`zinb` modes) - Or log-normalized data in `data` slot (for `mse` mode) ```{r prepare} # Load your Seurat object seurat_obj <- readRDS("your_data.rds") # Standard preprocessing (if not already done) seurat_obj <- NormalizeData(seurat_obj) seurat_obj <- FindVariableFeatures(seurat_obj, nfeatures = 2000) seurat_obj <- ScaleData(seurat_obj) seurat_obj <- RunPCA(seurat_obj) seurat_obj <- RunUMAP(seurat_obj, dims = 1:30) ``` ### 3. Create and Train Model ```{r train} # Create trainer trainer <- Trainer$new( seurat_obj = seurat_obj, n_latent = 5, # Latent space dimensions n_ode_hidden = 25, # ODE network hidden units n_vae_hidden = 128, # VAE hidden units loss_mode = "nb", # Negative binomial for UMI counts nepoch = 100, # Training epochs batch_size = 1024 # Batch size ) # Train the model trainer$train() # Plot training history plot_training_history(trainer) ``` ### 4. Extract Pseudotime ```{r pseudotime} # Get pseudotime pseudotime <- trainer$get_time() # Add to Seurat object seurat_obj$pseudotime <- pseudotime # Visualize plot_pseudotime(seurat_obj, time_key = "pseudotime", embedding_key = "umap") ``` ### 5. Get Latent Space ```{r latent} # Extract latent representations latent <- trainer$get_latentsp(alpha_z = 0.5, alpha_predz = 0.5) # Store in Seurat object seurat_obj@misc$X_zs <- latent$mix_zs ``` ### 6. Compute Vector Field ```{r vector-field} # Get vector field vf <- trainer$get_vector_field(pseudotime, latent$mix_zs) seurat_obj@misc$X_VF <- vf # Visualize plot_vector_field(seurat_obj, zs_key = "X_zs", vf_key = "X_VF") ``` ### 7. Save Model ```{r save} # Save for later use trainer$save_model("my_cellode_model") # Load model # trainer <- load_model("my_cellode_model", seurat_obj) ``` ## Quick Example with Synthetic Data Here's a minimal working example: ```{r synthetic} library(CellODE) library(torch) # Create synthetic data set.seed(42) n_cells <- 200 n_genes <- 50 # Simulate expression with pseudo-trajectory true_time <- runif(n_cells) X <- matrix(0, n_cells, n_genes) for (i in 1:n_cells) { X[i, 1:25] <- rpois(25, lambda = 20 * (1 - true_time[i]) + 2) X[i, 26:50] <- rpois(25, lambda = 20 * true_time[i] + 2) } # Create model directly (without Seurat) model <- TNODE( n_int = n_genes, n_latent = 5L, n_ode_hidden = 25L, n_vae_hidden = 64L, loss_mode = "nb" ) # Prepare tensors X_log <- log1p(X) X_tensor <- torch::torch_tensor(X_log, dtype = torch::torch_float()) y_tensor <- X_tensor$sum(dim = 2) # Train optimizer <- torch::optim_adam(model$parameters, lr = 0.001) for (epoch in 1:50) { optimizer$zero_grad() result <- model(X_tensor, y_tensor) result$loss$backward() optimizer$step() if (epoch %% 10 == 0) { cat(sprintf("Epoch %d: loss = %.4f\n", epoch, result$loss$item())) } } # Get pseudotime model$eval() torch::with_no_grad({ enc_out <- model$encoder(X_tensor) pseudotime <- as.numeric(enc_out$t$squeeze(-1)$cpu()) }) # Check correlation with true time cat(sprintf("Correlation: %.4f\n", abs(cor(pseudotime, true_time)))) ``` ## Parameter Guidelines ### Loss Mode Selection | Data Type | Recommended Mode | |-----------|------------------| | UMI counts | `"nb"` (default) | | Sparse UMI counts | `"zinb"` | | Log-normalized | `"mse"` | ### Latent Dimensions | Trajectory Complexity | Recommended `n_latent` | |-----------------------|------------------------| | Simple linear | 3-5 | | Single branch | 5-10 | | Multiple branches | 10-20 | ## Next Steps - **Algorithm Theory**: Learn about the mathematical foundations in the [Algorithm Theory](algorithm-theory.html) vignette - **Vector Field Analysis**: Deep dive into velocity analysis in the [Vector Field Analysis](vector-field-analysis.html) vignette - **Advanced Usage**: Explore hyperparameter tuning in the [Advanced Usage](advanced-usage.html) vignette ## Session Info ```{r session} sessionInfo() ```