---
title: "Introduction to TorchDecon"
author: "Zaoqu Liu"
date: "`r Sys.Date()`"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Introduction to TorchDecon}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  eval = FALSE
)
```

## Overview

TorchDecon is an R package for **deep learning-based cell type deconvolution** of bulk RNA-seq data. It estimates the proportions of different cell types in bulk tissue samples by training deep neural networks on simulated bulk samples generated from single-cell RNA-seq reference data.

### Key Features

- **Native R implementation**: Built on the torch package (LibTorch C++ backend), so no Python installation is required
- **Seurat integration**: Works seamlessly with Seurat objects (v4 and v5)
- **GPU acceleration**: Automatic CUDA support for faster training
- **Ensemble model**: Uses three neural networks with different architectures for robust predictions
- **Cross-platform**: Works on Windows, macOS, and Linux

## Installation

```{r install}
# Install from GitHub
devtools::install_github("Zaoqu-Liu/TorchDecon")

# Install the torch backend (required)
torch::install_torch()

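# For GPU support (requires a CUDA-capable GPU and a compatible CUDA toolkit).
# The accepted arguments differ between torch versions; check ?torch::install_torch
# torch::install_torch(type = "cuda")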
```

## Quick Start

The simplest way to use TorchDecon is with the `RunTorchDecon()` function:

```{r quickstart}
library(TorchDecon)
library(Seurat)

# Load your single-cell reference data
seurat_obj <- readRDS("scRNA_reference.rds")

# Load bulk RNA-seq data for deconvolution
bulk_data <- read.table("bulk_expression.txt", header = TRUE, row.names = 1)

# Run the complete workflow
result <- RunTorchDecon(
  seurat_object = seurat_obj,
  bulk_data = bulk_data,
  celltype_col = "cell_type",
  n_samples = 2000,
  num_steps = 5000,
  seed = 42
)

# View predictions
head(result$predictions)
```

## Step-by-Step Workflow

For more control, you can run each step separately:

### Step 1: Simulate Bulk Data

Generate artificial bulk RNA-seq samples from your scRNA-seq reference:

```{r simulate}
simulation <- SimulateBulk(
  object = seurat_obj,
  n_samples = 2000,
  cells_per_sample = 100,
  celltype_col = "cell_type",
  sparse_fraction = 0.5,  # 50% of samples will have missing cell types
  seed = 42
)

print(simulation)
```
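
To make the idea behind the simulation concrete, here is a minimal base R sketch of how one pseudo-bulk sample could be built: draw random cell type fractions, sample that many cells of each type, and sum their expression. This illustrates the general approach under simple assumptions and is not the code that `SimulateBulk()` runs. The toy count matrix and the helper `simulate_one_sample()` are hypothetical.

```{r sketch-simulate}
# Conceptual sketch only: toy data and a hypothetical helper,
# not TorchDecon's internal implementation.
set.seed(42)

# Toy single-cell count matrix: 50 genes x 300 cells, three cell types
n_genes <- 50
cell_types <- rep(c("A", "B", "C"), each = 100)
counts <- matrix(rpois(n_genes * length(cell_types), lambda = 5),
                 nrow = n_genes,
                 dimnames = list(paste0("gene", seq_len(n_genes)), NULL))

simulate_one_sample <- function(counts, cell_types, cells_per_sample = 100,
                                sparse = FALSE) {
  types <- unique(cell_types)
  # In a sparse sample, keep only a random subset of the cell types
  if (sparse && length(types) > 1) {
    n_keep <- sample(seq_len(length(types) - 1), 1)
    kept <- sample(types, n_keep)
  } else {
    kept <- types
  }
  # Draw random fractions for the kept types; absent types stay at zero
  raw <- runif(length(kept))
  fractions <- setNames(numeric(length(types)), types)
  fractions[kept] <- raw / sum(raw)
  # Convert fractions to cell numbers, then sample that many cells per type
  n_cells <- round(fractions * cells_per_sample)
  picked <- unlist(lapply(types, function(ct) {
    pool <- which(cell_types == ct)
    pool[sample.int(length(pool), n_cells[[ct]], replace = TRUE)]
  }))
  # The pseudo-bulk profile is the summed expression of the sampled cells;
  # the training label is the realised cell type composition
  list(expression = rowSums(counts[, picked, drop = FALSE]),
       fractions = n_cells / sum(n_cells))
}

sim <- simulate_one_sample(counts, cell_types, sparse = TRUE)
round(sim$fractions, 2)
```

Repeating this many times yields a training set of expression profiles paired with known fractions, which is what the networks learn from.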

### Step 2: Process Training Data

Preprocess the simulated data and restrict it to the genes it shares with the bulk data you want to deconvolve:

```{r process}
processed <- ProcessTrainingData(
  simulation = simulation,
  prediction_data = bulk_data,
  var_cutoff = 0.1,
  scaling = "log_min_max"
)

print(processed)
```
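
The sketch below shows what these preprocessing steps could look like in base R: intersecting genes, filtering by variance, and applying a log transform followed by per-sample min-max scaling. It assumes that `"log_min_max"` means `log2(x + 1)` followed by rescaling each sample to [0, 1], and that the variance filter is applied to the unscaled training data. Both are assumptions about the method, and the helper `log_min_max()` and the toy matrices are hypothetical.

```{r sketch-scaling}
# Conceptual sketch; the helper name and toy data are hypothetical.
set.seed(1)
train <- matrix(rpois(40, lambda = 20), nrow = 8,
                dimnames = list(paste0("gene", 1:8), paste0("sim", 1:5)))
bulk <- matrix(rpois(18, lambda = 20), nrow = 6,
               dimnames = list(paste0("gene", 3:8), paste0("bulk", 1:3)))

log_min_max <- function(x) {
  x <- log2(x + 1)
  # Rescale each sample (column) to the [0, 1] range
  apply(x, 2, function(v) {
    rng <- range(v)
    if (rng[1] == rng[2]) return(v * 0)  # constant sample: avoid 0 / 0
    (v - rng[1]) / (rng[2] - rng[1])
  })
}

# 1. Keep genes present in both matrices
common <- intersect(rownames(train), rownames(bulk))
# 2. Drop genes whose variance across training samples is below the cutoff
gene_var <- apply(train[common, , drop = FALSE], 1, var)
keep <- common[gene_var >= 0.1]
# 3. Scale training and prediction data in the same way
train_scaled <- log_min_max(train[keep, , drop = FALSE])
bulk_scaled <- log_min_max(bulk[keep, , drop = FALSE])
range(train_scaled)
```

Scaling both matrices with the same procedure and the same gene set matters because the network sees only the scaled values and has no other way to relate training and prediction data.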

### Step 3: Create and Train Model

Create and train the deep neural network ensemble:

```{r train}
# Create ensemble (3 models: m256, m512, m1024)
ensemble <- CreateTorchDeconEnsemble(
  n_features = processed$n_genes,
  n_classes = length(processed$celltypes)
)

# Train the model
ensemble <- TrainModel(
  model = ensemble,
  data = processed,
  num_steps = 5000,
  batch_size = 128,
  learning_rate = 0.0001
)
```

### Step 4: Predict Cell Fractions

Use the trained model to predict cell type fractions:

```{r predict}
predictions <- PredictFractions(ensemble, bulk_data)
head(predictions)
```

### Step 5: Save and Load Model

Save your trained model for future use:

```{r save-load}
# Save model
SaveModel(ensemble, "my_trained_model")

# Load model later
loaded_model <- LoadModel("my_trained_model")

# Use the loaded model for prediction
# (new_bulk_data is a placeholder for another bulk expression matrix)
new_predictions <- PredictFractions(loaded_model, new_bulk_data)
```

## Model Architecture

TorchDecon uses an ensemble of three deep neural networks:

| Model | Layer sizes (hidden layers → output) | Dropout rate per hidden layer |
|-------|-------------|---------|
| M256  | 256 → 128 → 64 → 32 → n_classes | None |
| M512  | 512 → 256 → 128 → 64 → n_classes | 0, 0.3, 0.2, 0.1 |
| M1024 | 1024 → 512 → 256 → 128 → n_classes | 0, 0.6, 0.3, 0.1 |

Each network uses:

- ReLU activation functions
- Softmax output layer
- Adam optimizer (β1 = 0.9, β2 = 0.999)
- Mean Squared Error (MSE) loss

The final prediction is the element-wise mean of the cell type fractions predicted by the three models.
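
As a rough illustration of the architecture and the averaging step, the following base R sketch runs a forward pass through three networks with the layer sizes from the table and averages their softmax outputs. The weights are random and untrained, the helper functions are hypothetical, and the package itself builds its models with torch rather than with plain matrices.

```{r sketch-ensemble}
# Conceptual sketch in base R with random, untrained weights.
set.seed(123)

relu <- function(x) pmax(x, 0)
softmax_rows <- function(z) {
  z <- z - apply(z, 1, max)  # subtract the row maximum for numerical stability
  exp(z) / rowSums(exp(z))
}

# Build a list of weight matrices and bias vectors for the given layer sizes
init_network <- function(n_features, hidden, n_classes) {
  sizes <- c(n_features, hidden, n_classes)
  lapply(seq_len(length(sizes) - 1), function(i) {
    list(W = matrix(rnorm(sizes[i] * sizes[i + 1], sd = 0.05),
                    nrow = sizes[i], ncol = sizes[i + 1]),
         b = numeric(sizes[i + 1]))
  })
}

# Forward pass: ReLU after every hidden layer, softmax on the output layer.
# Dropout is only active during training, so it is omitted here.
forward <- function(network, x) {
  n_layers <- length(network)
  for (i in seq_len(n_layers)) {
    layer <- network[[i]]
    x <- sweep(x %*% layer$W, 2, layer$b, "+")
    if (i < n_layers) x <- relu(x)
  }
  softmax_rows(x)
}

n_features <- 200
n_classes <- 4
architectures <- list(
  m256  = c(256, 128, 64, 32),
  m512  = c(512, 256, 128, 64),
  m1024 = c(1024, 512, 256, 128)
)
networks <- lapply(architectures, init_network,
                   n_features = n_features, n_classes = n_classes)

# Three samples (rows) with values in [0, 1], as after min-max scaling
x <- matrix(runif(3 * n_features), nrow = 3)

per_model <- lapply(networks, forward, x = x)
ensemble_pred <- Reduce(`+`, per_model) / length(per_model)
rowSums(ensemble_pred)
```

Because each softmax output is a valid set of fractions (non-negative, summing to one), their mean is one as well, so the ensemble prediction needs no renormalisation.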

## Evaluation

If you have ground truth data, you can evaluate prediction accuracy:

```{r evaluate}
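# true_fractions is a placeholder for your known cell type fractions,
# arranged like the predictions
metrics <- EvaluatePredictions(predictions, true_fractions)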
print(metrics$rmse)
print(metrics$correlation)
print(metrics$per_celltype)
```
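
For reference, metrics of this kind can be computed by hand in a few lines of base R. The sketch below assumes samples in rows and cell types in columns, and uses the overall RMSE, the overall Pearson correlation, and the same two measures per cell type. The exact definitions used by `EvaluatePredictions()` may differ, and the helper `evaluate_fractions()` and the toy matrices are hypothetical.

```{r sketch-metrics}
# Conceptual sketch; toy data and helper name are hypothetical.
pred <- matrix(c(0.50, 0.30, 0.20,
                 0.10, 0.60, 0.30,
                 0.25, 0.25, 0.50,
                 0.40, 0.40, 0.20),
               ncol = 3, byrow = TRUE,
               dimnames = list(paste0("sample", 1:4), c("A", "B", "C")))
truth <- matrix(c(0.55, 0.25, 0.20,
                  0.15, 0.55, 0.30,
                  0.20, 0.30, 0.50,
                  0.35, 0.45, 0.20),
                ncol = 3, byrow = TRUE,
                dimnames = dimnames(pred))

evaluate_fractions <- function(pred, truth) {
  stopifnot(identical(dim(pred), dim(truth)))
  err <- pred - truth
  list(
    # Overall error and agreement across all samples and cell types
    rmse = sqrt(mean(err^2)),
    correlation = cor(as.vector(pred), as.vector(truth)),
    # The same measures computed separately for each cell type (column)
    per_celltype = data.frame(
      celltype = colnames(pred),
      rmse = sqrt(colMeans(err^2)),
      correlation = sapply(seq_len(ncol(pred)),
                           function(j) cor(pred[, j], truth[, j]))
    )
  )
}

toy_metrics <- evaluate_fractions(pred, truth)
toy_metrics$per_celltype
```

Per cell type values are often more informative than the overall numbers, since rare cell types can be predicted poorly while the overall RMSE still looks small.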

## Tips for Best Results

1. **Reference data quality**: Use high-quality scRNA-seq data with accurate cell type annotations
2. **Sample size**: More simulated samples (2000-5000) generally improve performance
3. **Training steps**: 5000-10000 steps are usually sufficient
4. **Gene filtering**: The default variance cutoff (`var_cutoff = 0.1`) works well for most cases
5. **GPU acceleration**: If available, GPU can speed up training significantly
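
To check whether GPU acceleration is an option on your machine, you can ask torch directly. This sketch uses `torch::torch_is_installed()` and `torch::cuda_is_available()` from the torch package. How TorchDecon itself selects a device is not shown here, so treat the `device` variable as illustrative only.

```{r sketch-device}
# Check for a usable CUDA device; falls back to "cpu" if torch,
# its backend, or CUDA is missing.
has_cuda <- requireNamespace("torch", quietly = TRUE) &&
  torch::torch_is_installed() &&
  torch::cuda_is_available()
device <- if (has_cuda) "cuda" else "cpu"
message("torch would run on: ", device)
```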

## Citation

If you use TorchDecon in your research, please cite:

```
Liu Z (2026). TorchDecon: Deep Learning-Based Cell Type Deconvolution Using torch.
R package. https://github.com/Zaoqu-Liu/TorchDecon
```

This package implements the algorithm described in:

```
Menden K, et al. (2020). Deep learning-based cell composition analysis from
tissue expression profiles. Science Advances, 6(30), eaba2619.
```

## Session Info

```{r session}
sessionInfo()
```