---
title: "Introduction to TorchDecon"
author: "Zaoqu Liu"
date: "`r Sys.Date()`"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Introduction to TorchDecon}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  eval = FALSE
)
```

## Overview

TorchDecon is an R package for **deep learning-based cell type deconvolution** of bulk RNA-seq data. It estimates the proportions of cell types in bulk tissue samples by training deep neural networks on simulated bulk samples generated from a single-cell RNA-seq reference.

### Key Features

- **Native R implementation**: Built on the torch package (LibTorch C++ backend); no Python required
- **Seurat integration**: Works directly with Seurat objects (v4 and v5)
- **GPU acceleration**: Automatic CUDA support for faster training
- **Ensemble model**: Combines three neural networks with different architectures for robust predictions
- **Cross-platform**: Runs on Windows, macOS, and Linux

## Installation

```{r install}
# Install TorchDecon from GitHub
devtools::install_github("Zaoqu-Liu/TorchDecon")

# Install the torch backend (LibTorch); required before first use
torch::install_torch()

# For GPU support (requires CUDA)
# torch::install_torch(type = "cuda")
```

## Quick Start

The simplest way to use TorchDecon is the `RunTorchDecon()` function, which runs the complete workflow in a single call:

```{r quickstart}
library(TorchDecon)
library(Seurat)

# Load the single-cell reference (a Seurat object with cell type annotations)
seurat_obj <- readRDS("scRNA_reference.rds")

# Load the bulk RNA-seq data to deconvolve
bulk_data <- read.table("bulk_expression.txt", header = TRUE, row.names = 1)

# Run the complete workflow: simulation, preprocessing, training, prediction
result <- RunTorchDecon(
  seurat_object = seurat_obj,
  bulk_data = bulk_data,
  celltype_col = "cell_type",
  n_samples = 2000,
  num_steps = 5000,
  seed = 42
)

# View the predicted cell type fractions
head(result$predictions)
```

## Step-by-Step Workflow

For more control, you can run each step separately.

### Step 1: Simulate Bulk Data

Generate
artificial bulk RNA-seq samples from your scRNA-seq reference:

```{r simulate}
simulation <- SimulateBulk(
  object = seurat_obj,
  n_samples = 2000,
  cells_per_sample = 100,
  celltype_col = "cell_type",
  sparse_fraction = 0.5,  # 50% of samples lack one or more cell types
  seed = 42
)
print(simulation)
```

### Step 2: Process Training Data

Preprocess the simulated data and restrict it to the genes shared with your bulk data:

```{r process}
processed <- ProcessTrainingData(
  simulation = simulation,
  prediction_data = bulk_data,
  var_cutoff = 0.1,
  scaling = "log_min_max"
)
print(processed)
```

### Step 3: Create and Train Model

Create and train the deep neural network ensemble:

```{r train}
# Create the ensemble of three networks (M256, M512, M1024)
ensemble <- CreateTorchDeconEnsemble(
  n_features = processed$n_genes,
  n_classes = length(processed$celltypes)
)

# Train all networks in the ensemble
ensemble <- TrainModel(
  model = ensemble,
  data = processed,
  num_steps = 5000,
  batch_size = 128,
  learning_rate = 0.0001
)
```

### Step 4: Predict Cell Fractions

Use the trained model to predict cell type fractions:

```{r predict}
predictions <- PredictFractions(ensemble, bulk_data)
head(predictions)
```

### Step 5: Save and Load Model

Save your trained model for future use:

```{r save-load}
# Save the trained ensemble
SaveModel(ensemble, "my_trained_model")

# Load it again in a later session
loaded_model <- LoadModel("my_trained_model")

# Predict on another bulk dataset in the same format as `bulk_data`
new_predictions <- PredictFractions(loaded_model, new_bulk_data)
```

## Model Architecture

TorchDecon uses an ensemble of three deep neural networks:

| Model | Layer sizes (hidden → output) | Dropout (per hidden layer) |
|-------|-------------------------------|----------------------------|
| M256 | 256 → 128 → 64 → 32 → n_classes | None |
| M512 | 512 → 256 → 128 → 64 → n_classes | 0, 0.3, 0.2, 0.1 |
| M1024 | 1024 → 512 → 256 → 128 → n_classes | 0, 0.6, 0.3, 0.1 |

Each network uses:

- ReLU activation functions
- A softmax output layer
- The Adam optimizer (β1 = 0.9, β2 = 0.999)
- Mean squared error (MSE) loss

The final prediction is the average of all three
models' predicted fractions.

## Evaluation

If ground-truth cell type fractions are available for your samples, you can evaluate prediction accuracy:

```{r evaluate}
# `true_fractions` holds the known cell type fractions for the same samples
metrics <- EvaluatePredictions(predictions, true_fractions)
print(metrics$rmse)
print(metrics$correlation)
print(metrics$per_celltype)
```

## Tips for Best Results

1. **Reference data quality**: Use high-quality scRNA-seq data with accurate cell type annotations.
2. **Sample size**: More simulated samples (2,000 to 5,000) generally improve performance.
3. **Training steps**: 5,000 to 10,000 steps are usually sufficient.
4. **Gene filtering**: The default variance cutoff (0.1) works well in most cases.
5. **GPU acceleration**: If a CUDA-capable GPU is available, training can be substantially faster.

## Citation

If you use TorchDecon in your research, please cite:

```
Liu Z (2026). TorchDecon: Deep Learning-Based Cell Type Deconvolution Using torch. R package. https://github.com/Zaoqu-Liu/TorchDecon
```

This package implements the Scaden algorithm described in:

```
Menden K, et al. (2020). Deep learning-based cell composition analysis from tissue expression profiles. Science Advances, 6(30), eaba2619.
```

## Session Info

```{r session}
sessionInfo()
```