---
title: "Quick Start Guide"
author: "Zaoqu Liu"
date: "`r Sys.Date()`"
output:
  rmarkdown::html_vignette:
    toc: true
    toc_depth: 2
vignette: >
  %\VignetteIndexEntry{Quick Start Guide}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.width = 7,
  fig.height = 5,
  fig.align = "center"
)
```

## Introduction

**CellProgramMapper** maps single-cell RNA sequencing data to reference gene expression programs (GEPs) using non-negative matrix factorization. This guide demonstrates the essential workflow in 5 minutes.

## Installation

```{r install, eval = FALSE}
# From R-universe (recommended)
install.packages("CellProgramMapper",
                 repos = "https://zaoqu-liu.r-universe.dev")

# Or from GitHub
devtools::install_github("Zaoqu-Liu/CellProgramMapper")
```

## Quick Example

```{r quick-example, eval = FALSE}
library(CellProgramMapper)

# Map a Seurat object to T-cell reference
result <- CellProgramMapper(
  query = seurat_obj,
  reference = "TCAT.V1"
)

# View results
print(result)

# Get usage matrix
usage <- get_usage(result, normalized = TRUE)

# Add to Seurat object
seurat_obj <- add_results_to_seurat(seurat_obj, result)
```

## Available References

```{r refs, eval = TRUE}
library(CellProgramMapper)

refs <- available_references()
print(refs[, c("Name", "Cell_Type", "Species")])
```

## Input Formats

CellProgramMapper accepts multiple input types:

```{r inputs, eval = FALSE}
# 1. Seurat object (V4 or V5)
result <- CellProgramMapper(query = seurat_obj, reference = "TCAT.V1")

# 2. Matrix (cells × genes)
result <- CellProgramMapper(query = counts_matrix, reference = "TCAT.V1")

# 3. File path (h5ad, mtx)
result <- CellProgramMapper(query = "data.h5ad", reference = "TCAT.V1")
```

## Working with Results

### Access Usage Matrix

```{r usage, eval = FALSE}
# Normalized (rows sum to 1)
usage_norm <- get_usage(result, normalized = TRUE)

# Raw
usage_raw <- get_usage(result, normalized = FALSE)
```

### Access Scores

```{r scores, eval = FALSE}
# Get computed scores
scores <- get_scores(result)
```

### Save Results

```{r save, eval = FALSE}
save_results(result, output_dir = "./output", prefix = "my_analysis")
```

## Demonstration with Simulated Data

```{r demo, eval = TRUE, fig.cap = "Simulated GEP usage visualization"}
set.seed(42)

# Simulate reference (5 programs × 100 genes)
H <- matrix(runif(5 * 100, 0, 1), nrow = 5)
colnames(H) <- paste0("Gene", 1:100)
rownames(H) <- paste0("GEP", 1:5)

# Simulate query (50 cells × 100 genes)
W_true <- matrix(runif(50 * 5, 0, 1), nrow = 50)
X <- W_true %*% H + matrix(rnorm(50 * 100, 0, 0.1), nrow = 50)
X[X < 0] <- 0
colnames(X) <- paste0("Gene", 1:100)
rownames(X) <- paste0("Cell", 1:50)

# Run CellProgramMapper
result <- CellProgramMapper(
  query = X,
  reference = H,
  verbose = FALSE
)

# Visualize
usage <- get_usage(result, normalized = TRUE)
usage_mat <- as.matrix(usage)

par(mfrow = c(1, 2), mar = c(4, 4, 2, 1))

# Heatmap
image(t(usage_mat),
      col = colorRampPalette(c("white", "#08306b"))(100),
      xlab = "Programs", ylab = "Cells",
      main = "Usage Matrix", axes = FALSE)
axis(1, at = seq(0, 1, length.out = 5), labels = colnames(usage_mat))

# Bar plot for first cell
barplot(as.numeric(usage[1, ]),
        col = "#1976d2",
        names.arg = colnames(usage),
        main = "Cell1 Usage",
        xlab = "GEP", ylab = "Usage")
```

## Performance Tips

```{r perf, eval = FALSE}
# For large datasets, use parallel processing
result <- CellProgramMapper(
  query = seurat_obj,
  reference = "TCAT.V1",
  n_workers = 4
)

# Data is automatically batched for memory efficiency
```

## Next Steps

- [Mathematical Framework](algorithm.html) - Understand the algorithm
- [NNLS Solver Details](nnls-solver.html) - Implementation details
- [Visualization Guide](visualization.html) - Create publication figures
- [Custom References](custom-reference.html) - Build your own references

## Session Info

```{r session}
sessionInfo()
```