This guide provides strategies for optimizing scVeloR performance when working with large single-cell datasets. We cover memory management, parallel computing, and algorithmic optimizations.
scVeloR uses a hybrid architecture for optimal performance:
┌─────────────────────────────────────────────────────────────┐
│ scVeloR Performance Stack │
├─────────────────────────────────────────────────────────────┤
│ R Interface Layer │
│ └── Vectorized R operations (Matrix package) │
│ └── Parallel processing (future/parallel) │
├─────────────────────────────────────────────────────────────┤
│ C++ Core (Rcpp/RcppArmadillo) │
│ └── Cosine similarity computation │
│ └── EM algorithm core │
│ └── KNN computations │
├─────────────────────────────────────────────────────────────┤
│ Sparse Matrix Support │
│ └── Memory-efficient storage │
│ └── Optimized linear algebra │
└─────────────────────────────────────────────────────────────┘
scVeloR automatically uses sparse matrices when beneficial:
library(Matrix)
# Check sparsity of your data
# nnzero() counts stored non-zeros without building a dense logical matrix
counts <- seurat_obj@assays$RNA@counts
sparsity <- 1 - nnzero(counts) / prod(dim(counts))
message(sprintf("Data sparsity: %.1f%%", sparsity * 100))
# Force sparse representation
seurat_obj@assays$RNA@counts <- as(seurat_obj@assays$RNA@counts, "dgCMatrix")

For very large datasets, process in chunks:
# Split cells into chunks
n_cells <- ncol(seurat_obj)
chunk_size <- 10000
n_chunks <- ceiling(n_cells / chunk_size)
# Process each chunk
results <- list()
for (i in seq_len(n_chunks)) {
start_idx <- (i - 1) * chunk_size + 1
end_idx <- min(i * chunk_size, n_cells)
chunk_obj <- seurat_obj[, start_idx:end_idx]
results[[i]] <- process_velocity_chunk(chunk_obj)
gc() # Clean up after each chunk
}
# Merge results
final_results <- merge_velocity_results(results)

Recommended parallel setup by dataset size:

| Scenario | Recommended Setup |
|---|---|
| < 10K cells | Sequential (overhead > benefit) |
| 10K - 50K cells | 4 workers |
| 50K - 100K cells | 8 workers |
| > 100K cells | Max available - 1 |
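
The table above can be turned into a small lookup, sketched here under a few assumptions: `recommend_workers()` is a hypothetical helper (not part of scVeloR), the 50K boundary is assigned to the 4-worker tier, and the result is always capped so that one core stays free.

```r
# Hypothetical helper (not part of scVeloR): map dataset size to a worker
# count following the table above, capped by the cores actually available.
recommend_workers <- function(n_cells, available_cores = parallel::detectCores()) {
  # detectCores() can return NA on some platforms; fall back to one core
  if (is.na(available_cores)) available_cores <- 1L
  max_usable <- max(1L, as.integer(available_cores) - 1L)  # leave one core free
  wanted <- if (n_cells < 10000) {
    1L           # sequential: parallel overhead outweighs the benefit
  } else if (n_cells <= 50000) {
    4L
  } else if (n_cells <= 100000) {
    8L
  } else {
    max_usable   # "Max available - 1"
  }
  min(wanted, max_usable)
}

recommend_workers(5000, available_cores = 16)    # 1
recommend_workers(30000, available_cores = 16)   # 4
recommend_workers(80000, available_cores = 16)   # 8
recommend_workers(250000, available_cores = 16)  # 15
recommend_workers(80000, available_cores = 4)    # 3 (capped by the machine)
```

The returned value could then be passed as `workers` to `plan(multisession, ...)`.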
Reducing the number of genes (for example via `n_top_genes` in `velocity()`) substantially speeds up computation, particularly the EM step.
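
As a rough illustration of gene reduction, the sketch below keeps the highest-variance genes of a sparse count matrix without converting it to dense form. `select_top_variable_genes()` is a hypothetical helper using plain variance ranking; it is a generic stand-in and may differ from the selection scVeloR applies internally.

```r
library(Matrix)

# Hypothetical helper: keep the n_top genes (rows) with the highest variance.
select_top_variable_genes <- function(counts, n_top = 1000) {
  n_cells <- ncol(counts)
  gene_mean <- Matrix::rowMeans(counts)
  # Squaring keeps zeros at zero, so the matrix stays sparse
  gene_var <- (Matrix::rowSums(counts^2) - n_cells * gene_mean^2) / (n_cells - 1)
  keep <- order(gene_var, decreasing = TRUE)[seq_len(min(n_top, nrow(counts)))]
  counts[sort(keep), , drop = FALSE]   # preserve the original gene order
}

# Toy data: 200 genes x 50 cells, about 90% zeros
set.seed(1)
toy <- rsparsematrix(200, 50, density = 0.1,
                     rand.x = function(n) rpois(n, 5) + 1)
rownames(toy) <- paste0("gene", seq_len(200))

reduced <- select_top_variable_genes(toy, n_top = 20)
dim(reduced)  # 20 50
```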
For large datasets, use approximate nearest neighbor algorithms:
# Exact KNN (default, slower for large data)
seurat_obj <- compute_neighbors(seurat_obj,
n_neighbors = 30,
method = "exact")
# Approximate KNN with Annoy (faster)
seurat_obj <- compute_neighbors(seurat_obj,
n_neighbors = 30,
method = "annoy",
n_trees = 50)
# Approximate KNN with HNSW (fastest for very large data)
seurat_obj <- compute_neighbors(seurat_obj,
n_neighbors = 30,
method = "hnsw",
M = 16, ef = 200)

Figure: Computational time for different methods and dataset sizes.
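
To see why exact search scales poorly, here is a minimal brute-force cosine KNN in base R. `exact_cosine_knn()` is a hypothetical helper written for illustration, not scVeloR's implementation; it assumes a dense cells x features matrix with no all-zero rows.

```r
# Hypothetical brute-force cosine KNN; `x` is a cells x features matrix
exact_cosine_knn <- function(x, k) {
  # Scale each row to unit length so the cross-product gives cosine similarity
  norms <- sqrt(rowSums(x^2))
  x_unit <- x / norms
  sim <- tcrossprod(x_unit)   # n x n similarity matrix: the quadratic cost
  diag(sim) <- -Inf           # exclude each cell from its own neighbor list
  # For every cell, the indices of its k most similar cells
  idx <- apply(sim, 1, function(s) order(s, decreasing = TRUE)[seq_len(k)])
  matrix(idx, ncol = k, byrow = TRUE)
}

set.seed(1)
pcs <- matrix(rnorm(100 * 10), nrow = 100)   # 100 cells, 10 components
nn <- exact_cosine_knn(pcs, k = 5)
dim(nn)  # 100 5

# Size of the similarity matrix alone at 100K cells, in GB (about 74.5)
100000^2 * 8 / 1024^3
```

The full similarity matrix is what makes this approach impractical for large datasets; approximate methods such as Annoy and HNSW avoid building it.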
# Fewer iterations for faster (less accurate) results
seurat_obj <- recover_dynamics(seurat_obj, max_iter = 5)
# More iterations for better accuracy
seurat_obj <- recover_dynamics(seurat_obj, max_iter = 20)
# Early stopping based on convergence
seurat_obj <- recover_dynamics(seurat_obj,
max_iter = 20,
tol = 1e-4) # Stop if change < tol

Recommended RAM by dataset size:

| Dataset Size | Recommended RAM |
|---|---|
| < 10K cells | 8 GB |
| 10K - 50K cells | 16 GB |
| 50K - 100K cells | 32 GB |
| > 100K cells | 64+ GB |
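
For a back-of-the-envelope check against these figures, the sketch below estimates the storage of a single genes x cells matrix in dense and `dgCMatrix` form. `estimate_matrix_gb()` is a hypothetical helper and the input numbers are illustrative; a real analysis holds several such matrices (for example spliced and unspliced counts plus derived layers), so the total is a multiple of this.

```r
# Hypothetical helper: rough storage for one genes x cells matrix, in GB
estimate_matrix_gb <- function(n_genes, n_cells, sparsity) {
  n_entries <- as.numeric(n_genes) * n_cells
  n_nonzero <- n_entries * (1 - sparsity)
  dense_bytes <- 8 * n_entries   # one 8-byte double per entry
  # dgCMatrix: 8-byte value + 4-byte row index per non-zero,
  # plus one 4-byte column pointer per column (and one extra)
  sparse_bytes <- 12 * n_nonzero + 4 * (n_cells + 1)
  c(dense = dense_bytes, sparse = sparse_bytes) / 1024^3
}

# 20,000 genes x 100,000 cells at 90% sparsity (illustrative numbers)
round(estimate_matrix_gb(20000, 100000, sparsity = 0.90), 1)
#>  dense sparse
#>   14.9    2.2
```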

Approximate CPU cores and runtime by analysis type:

| Analysis Type | CPU Cores | Time Estimate (50K cells) |
|---|---|---|
| Steady-state | 1 | ~2 min |
| Stochastic | 4 | ~10 min |
| Dynamical | 8 | ~45 min |
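
These estimates depend on hardware, so it can help to time each step on your own machine. A minimal sketch using base R's `system.time()` follows; `time_step()` is a hypothetical wrapper, not a scVeloR function.

```r
# Hypothetical wrapper: run an expression, report elapsed time, return its value
time_step <- function(label, expr) {
  # `expr` is evaluated lazily, so the work happens inside system.time()
  elapsed <- system.time(result <- expr)[["elapsed"]]
  message(sprintf("%-20s %.2f s", label, elapsed))
  invisible(result)
}

# Toy usage with a stand-in computation
total <- time_step("column sums", colSums(matrix(runif(1e6), ncol = 100)))
length(total)  # 100
```

In a real workflow the second argument would be a call such as the `velocity()` or `velocity_graph()` steps.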
library(scVeloR)
library(future)
# 1. Configure parallel backend
n_cores <- max(1, min(8, availableCores() - 1)) # never fewer than 1 worker
plan(multisession, workers = n_cores)
# 2. Use sparse matrices
seurat_obj@assays$RNA@counts <- as(
seurat_obj@assays$RNA@counts, "dgCMatrix"
)
# 3. Preprocessing with filtering
seurat_obj <- prepare_velocity(
seurat_obj,
min_counts = 30, # Stricter filtering
min_cells = 50,
n_neighbors = 30
)
# 4. Use approximate KNN
seurat_obj <- compute_neighbors(
seurat_obj,
method = "hnsw",
n_neighbors = 30
)
# 5. Velocity with fewer genes
seurat_obj <- velocity(
seurat_obj,
mode = "dynamical",
n_top_genes = 1000, # Reduced from 2000
max_iter = 8, # Reduced from 10
n_cores = n_cores
)
# 6. Build velocity graph
seurat_obj <- velocity_graph(
seurat_obj,
n_neighbors = 30,
n_cores = n_cores
)
# 7. Reset backend
plan(sequential)
gc()

# Subsample for visualization
set.seed(42)
sample_idx <- sample(ncol(seurat_obj), min(5000, ncol(seurat_obj)))
# Create subsampled plot
p <- plot_velocity(seurat_obj[, sample_idx],
embedding = "umap",
n_arrows = 500)
# Save to file instead of displaying
ggsave("velocity_plot.pdf", p, width = 8, height = 6)

Common issues and solutions:

| Issue | Cause | Solution |
|---|---|---|
| Out of memory | Dense matrices | Use sparse matrices |
| Slow KNN | Large dataset | Use HNSW |
| Long EM runtime | Too many genes | Reduce n_top_genes |
| Worker errors | Memory per worker | Reduce workers or increase RAM |
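
For the worker-memory case, note that `multisession` workers are separate R processes, each holding its own copy of the data sent to it. The sketch below caps the worker count by a RAM budget; `cap_workers_by_memory()` and the `reserve_gb` default are hypothetical, and the per-worker footprint has to be measured or estimated for your data.

```r
# Hypothetical helper: limit workers so their combined footprint fits in RAM
cap_workers_by_memory <- function(requested, total_ram_gb, per_worker_gb,
                                  reserve_gb = 2) {
  usable <- total_ram_gb - reserve_gb   # keep some RAM for the main session and OS
  fit <- floor(usable / per_worker_gb)  # how many workers fit in what is left
  max(1L, min(as.integer(requested), as.integer(fit)))
}

cap_workers_by_memory(8, total_ram_gb = 32, per_worker_gb = 6)   # 5
cap_workers_by_memory(8, total_ram_gb = 64, per_worker_gb = 6)   # 8
cap_workers_by_memory(4, total_ram_gb = 8,  per_worker_gb = 10)  # 1
```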
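Key optimization strategies: store counts as sparse matrices, process very large datasets in chunks, parallelize with the future package using an appropriate number of workers, reduce the number of genes, switch to approximate KNN, and limit EM iterations.

sessionInfo()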
#> R version 4.6.0 (2026-04-24)
#> Platform: x86_64-pc-linux-gnu
#> Running under: Ubuntu 24.04.4 LTS
#>
#> Matrix products: default
#> BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
#> LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.26.so; LAPACK version 3.12.0
#>
#> locale:
#> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
#> [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
#> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
#> [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
#> [9] LC_ADDRESS=C LC_TELEPHONE=C
#> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
#>
#> time zone: Etc/UTC
#> tzcode source: system (glibc)
#>
#> attached base packages:
#> [1] stats graphics grDevices utils datasets methods base
#>
#> other attached packages:
#> [1] tidyr_1.3.2 gridExtra_2.3 ggplot2_4.0.3 rmarkdown_2.31
#>
#> loaded via a namespace (and not attached):
#> [1] gtable_0.3.6 jsonlite_2.0.0 dplyr_1.2.1 compiler_4.6.0
#> [5] tidyselect_1.2.1 jquerylib_0.1.4 scales_1.4.0 yaml_2.3.12
#> [9] fastmap_1.2.0 R6_2.6.1 labeling_0.4.3 generics_0.1.4
#> [13] knitr_1.51 tibble_3.3.1 maketools_1.3.2 bslib_0.11.0
#> [17] pillar_1.11.1 RColorBrewer_1.1-3 rlang_1.2.0 cachem_1.1.0
#> [21] xfun_0.57 sass_0.4.10 sys_3.4.3 S7_0.2.2
#> [25] otel_0.2.0 viridisLite_0.4.3 cli_3.6.6 withr_3.0.2
#> [29] magrittr_2.0.5 digest_0.6.39 grid_4.6.0 lifecycle_1.0.5
#> [33] vctrs_0.7.3 evaluate_1.0.5 glue_1.8.1 farver_2.1.2
#> [37] buildtools_1.0.0 purrr_1.2.2 tools_4.6.0 pkgconfig_2.0.3
#> [41] htmltools_0.5.9