| Title: | Deep Learning-Based Cell Type Deconvolution Using torch |
|---|---|
| Description: | A deep neural network ensemble approach for cell type deconvolution of bulk RNA-seq data. TorchDecon uses simulated bulk samples generated from single-cell RNA-seq data to train deep neural networks that predict cell type fractions. This package is an R-native implementation based on the Scaden algorithm, built on the torch framework (LibTorch C++ backend) for GPU acceleration and cross-platform compatibility. Seamlessly integrates with Seurat objects (v4 and v5 compatible). |
| Authors: | Zaoqu Liu [aut, cre] (ORCID: <https://orcid.org/0000-0002-0452-742X>) |
| Maintainer: | Zaoqu Liu <[email protected]> |
| License: | MIT + file LICENSE |
| Version: | 1.0.0 |
| Built: | 2026-05-26 06:24:29 UTC |
| Source: | https://github.com/Zaoqu-Liu/TorchDecon |
Apply the same scaling transformation to new prediction data.
ApplyScaling(X, scaling_method = "log_min_max")
X |
Numeric matrix (samples x genes) of new data to scale. |
scaling_method |
Character. Scaling method to use. |
This function applies the same scaling approach used during training to new data. For min-max scaling, each sample is scaled independently based on its own min/max.
Scaled matrix.
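As a rough illustration of "log_min_max" scaling, the sketch below applies log2(x + 1) and then rescales each sample (row) to [0, 1] using its own minimum and maximum, as described above. The helper name `log_min_max_sketch` is hypothetical and not part of the package, and the handling of constant rows is an assumption.

```r
# Hypothetical helper, not part of TorchDecon: log2(x + 1) followed by
# row-wise (per-sample) min-max scaling to the [0, 1] range.
log_min_max_sketch <- function(X) {
  X <- log2(as.matrix(X) + 1)
  row_min <- apply(X, 1, min)
  row_max <- apply(X, 1, max)
  row_range <- row_max - row_min
  # Assumption: constant rows are mapped to 0 instead of dividing by zero.
  row_range[row_range == 0] <- 1
  sweep(sweep(X, 1, row_min, "-"), 1, row_range, "/")
}

# Two samples (rows) by three genes (columns); values are made up.
X <- matrix(c(0, 3, 15,
              1, 1, 1), nrow = 2, byrow = TRUE)
scaled <- log_min_max_sketch(X)
```

Because each row is scaled on its own, the result for one sample does not depend on which other samples are in the matrix.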
Calculate fraction of predictions within threshold of truth.
CalculateAccuracy(predictions, truth, threshold = 0.05)
predictions |
Numeric vector or matrix of predictions. |
truth |
Numeric vector or matrix of true values. |
threshold |
Numeric. Threshold for accuracy calculation. Default is 0.05. |
Accuracy (fraction between 0 and 1).
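The idea behind this metric can be sketched in a few lines of base R. The function `accuracy_sketch` is a hypothetical re-implementation for illustration; whether the package uses `<=` or `<` at the threshold is an assumption.

```r
# Hypothetical re-implementation for illustration only.
accuracy_sketch <- function(predictions, truth, threshold = 0.05) {
  abs_err <- abs(as.matrix(predictions) - as.matrix(truth))
  # Assumption: a prediction counts as accurate when |error| <= threshold.
  mean(abs_err <= threshold)
}

# Made-up fractions; absolute errors are about 0.02, 0.08 and 0.10.
pred  <- c(0.10, 0.32, 0.58)
truth <- c(0.12, 0.40, 0.48)
acc_default <- accuracy_sketch(pred, truth)
acc_loose   <- accuracy_sketch(pred, truth, threshold = 0.2)
```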
Calculate correlation between predictions and truth.
CalculateCorrelation(predictions, truth, method = "pearson")
predictions |
Numeric vector or matrix of predictions. |
truth |
Numeric vector or matrix of true values. |
method |
Character. Correlation method ("pearson" or "spearman"). Default is "pearson". |
Correlation coefficient.
Calculate Mean Absolute Error between predictions and truth.
CalculateMAE(predictions, truth)
predictions |
Numeric vector or matrix of predictions. |
truth |
Numeric vector or matrix of true values. |
MAE value.
Calculate Mean Relative Error between predictions and truth.
CalculateMRE(predictions, truth, epsilon = 1e-06)
predictions |
Numeric vector or matrix of predictions. |
truth |
Numeric vector or matrix of true values. |
epsilon |
Numeric. Small value to avoid division by zero. Default is 1e-6. |
MRE value.
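To show why `epsilon` matters, here is a minimal sketch of a mean relative error. It assumes that `epsilon` is added to the absolute true value in the denominator, which the manual does not state; `mre_sketch` is a hypothetical name.

```r
# Hypothetical re-implementation for illustration only.
mre_sketch <- function(predictions, truth, epsilon = 1e-06) {
  predictions <- as.matrix(predictions)
  truth <- as.matrix(truth)
  # Assumption: epsilon is added to the denominator to avoid division by zero.
  mean(abs(predictions - truth) / (abs(truth) + epsilon))
}

# Made-up fractions; the last true value is exactly zero.
truth <- c(0.5, 0.25, 0)
pred  <- c(0.4, 0.30, 0.01)
mre_all     <- mre_sketch(pred, truth)
mre_nonzero <- mre_sketch(pred[1:2], truth[1:2])
```

Under this assumption a true fraction of zero produces a very large relative error, so the metric is most informative for cell types that are actually present in the sample.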
Calculate Root Mean Squared Error between predictions and truth.
CalculateRMSE(predictions, truth)
predictions |
Numeric vector or matrix of predictions. |
truth |
Numeric vector or matrix of true values. |
RMSE value.
Create a TorchDecon model object with specified architecture.
CreateTorchDecon(
  n_features,
  n_classes,
  architecture = c("m256", "m512", "m1024", "custom"),
  hidden_units = NULL,
  dropout_rates = NULL,
  device = "auto",
  seed = NULL
)
n_features |
Integer. Number of input features (genes). |
n_classes |
Integer. Number of output classes (cell types). |
architecture |
Character. One of "m256", "m512", "m1024", or "custom". Default is "m256". |
hidden_units |
Integer vector. Custom hidden layer sizes (only used if architecture = "custom"). Default is NULL. |
dropout_rates |
Numeric vector. Custom dropout rates (only used if architecture = "custom"). Default is NULL. |
device |
Character. Device to use ("cpu", "cuda", or "auto"). Default is "auto". |
seed |
Integer. Random seed for reproducibility. Default is NULL. |
Pre-defined architectures:
m256: Hidden units 256-128-64-32, no dropout
m512: Hidden units 512-256-128-64, dropout 0/0.3/0.2/0.1
m1024: Hidden units 1024-512-256-128, dropout 0/0.6/0.3/0.1
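For a sense of scale, the sketch below tabulates the three pre-defined layouts and counts trainable parameters. It assumes plain fully connected layers with bias terms and an output layer of size n_classes, which this manual does not spell out; `architectures` and `count_parameters` are illustrative names, not package objects.

```r
# Layer sizes and dropout rates as listed above.
architectures <- list(
  m256  = list(hidden = c(256, 128, 64, 32),    dropout = c(0, 0, 0, 0)),
  m512  = list(hidden = c(512, 256, 128, 64),   dropout = c(0, 0.3, 0.2, 0.1)),
  m1024 = list(hidden = c(1024, 512, 256, 128), dropout = c(0, 0.6, 0.3, 0.1))
)

# Assumption: dense layers with bias and an output layer of size n_classes;
# the real network may differ.
count_parameters <- function(n_features, n_classes, hidden) {
  sizes <- c(n_features, hidden, n_classes)
  n_in  <- head(sizes, -1)
  n_out <- tail(sizes, -1)
  sum((n_in + 1) * n_out)   # weights plus one bias per output unit
}

n_params <- sapply(architectures, function(a) {
  count_parameters(n_features = 5000, n_classes = 10, hidden = a$hidden)
})
```

With a few thousand input genes, the first layer dominates the parameter count under these assumptions, so the number of signature genes has a large effect on model size.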
A TorchDeconModel object containing the neural network and metadata.
## Not run:
# Create a model with m256 architecture
model <- CreateTorchDecon(n_features = 5000, n_classes = 10, architecture = "m256")

# Create a custom architecture
model <- CreateTorchDecon(
  n_features = 5000,
  n_classes = 10,
  architecture = "custom",
  hidden_units = c(512, 256, 128, 64),
  dropout_rates = c(0.1, 0.2, 0.1, 0.1)
)
## End(Not run)
Create an ensemble of three TorchDecon models with different architectures.
CreateTorchDeconEnsemble(n_features, n_classes, device = "auto", seed = NULL)
n_features |
Integer. Number of input features (genes). |
n_classes |
Integer. Number of output classes (cell types). |
device |
Character. Device to use. Default is "auto". |
seed |
Integer. Random seed. Default is NULL. |
A TorchDeconEnsemble object containing three models.
## Not run:
ensemble <- CreateTorchDeconEnsemble(n_features = 5000, n_classes = 10)
## End(Not run)
Comprehensive evaluation of cell type deconvolution predictions against ground truth fractions. Calculates multiple performance metrics including RMSE, MAE, MRE, Pearson correlation, and accuracy at different thresholds.
EvaluateDeconvolution(
  predictions,
  truth,
  by_celltype = TRUE,
  accuracy_thresholds = c(0.01, 0.05, 0.1)
)
predictions |
Data frame or matrix of predicted cell type fractions (samples x cell types). |
truth |
Data frame or matrix of true cell type fractions (samples x cell types). |
by_celltype |
Logical. Calculate metrics per cell type. Default is TRUE. |
accuracy_thresholds |
Numeric vector. Thresholds for accuracy calculation. Default is c(0.01, 0.05, 0.1). |
Metrics calculated:
RMSE: Root Mean Squared Error
MAE: Mean Absolute Error
MRE: Mean Relative Error (relative to true values)
Pearson: Pearson correlation coefficient
Spearman: Spearman rank correlation
Accuracy: Fraction of predictions within threshold of truth
A list containing:
Data frame with overall metrics (RMSE, MAE, MRE, correlation)
Data frame with per-celltype metrics (if by_celltype = TRUE)
Data frame with accuracy at different thresholds
Numeric vector of per-sample correlations
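The per-cell-type and per-sample views can be illustrated with base R alone. This is a sketch on made-up numbers, not the package's code; the column names of `by_celltype` below are illustrative and may not match the names in the returned object.

```r
# Toy data: 4 samples (rows) by 3 cell types (columns); values are made up.
truth <- matrix(c(0.5, 0.3, 0.2,
                  0.1, 0.6, 0.3,
                  0.4, 0.4, 0.2,
                  0.2, 0.1, 0.7), ncol = 3, byrow = TRUE,
                dimnames = list(paste0("s", 1:4), c("T", "B", "NK")))
pred <- truth + matrix(c( 0.02, -0.01, -0.01,
                         -0.03,  0.02,  0.01,
                          0.01,  0.01, -0.02,
                          0.00, -0.02,  0.02), ncol = 3, byrow = TRUE)

# One row of metrics per cell type (column-wise summaries)
by_celltype <- data.frame(
  celltype = colnames(truth),
  rmse     = sqrt(colMeans((pred - truth)^2)),
  mae      = colMeans(abs(pred - truth)),
  pearson  = sapply(seq_len(ncol(truth)), function(j) cor(pred[, j], truth[, j]))
)

# One correlation per sample (row-wise, across cell types)
per_sample_cor <- sapply(seq_len(nrow(truth)),
                         function(i) cor(pred[i, ], truth[i, ]))
```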
## Not run:
# Evaluate predictions
eval_results <- EvaluateDeconvolution(predictions, true_fractions)

# View overall metrics
print(eval_results$overall)

# View per-celltype metrics
print(eval_results$by_celltype)
## End(Not run)
Evaluate predicted cell fractions against known ground truth.
EvaluatePredictions(predictions, truth)
predictions |
Data frame of predicted fractions (samples x cell types). |
truth |
Data frame of true fractions (same format as predictions). |
A list containing:
Root mean squared error overall
Mean absolute error overall
Pearson correlation overall
Metrics per cell type
Metrics per sample
## Not run:
# Evaluate on held-out test data
metrics <- EvaluatePredictions(predictions, true_fractions)
print(metrics$rmse)
## End(Not run)
Export a TorchDeconSimulation object to tab-separated files.
ExportSimulation(simulation, output_dir = ".", prefix = "simulation")
simulation |
A TorchDeconSimulation object. |
output_dir |
Character. Output directory. Default is current directory. |
prefix |
Character. Prefix for output files. Default is "simulation". |
Invisibly returns the output paths.
Extract cell type annotations from a Seurat object.
ExtractCellTypes(object, celltype_col = NULL)
object |
A Seurat object. |
celltype_col |
Character. Name of the metadata column containing cell types. If NULL, uses active identity (Idents). |
A character vector of cell type labels.
## Not run:
# Use active identity
celltypes <- ExtractCellTypes(seurat_obj)

# Use specific metadata column
celltypes <- ExtractCellTypes(seurat_obj, celltype_col = "cell_type")
## End(Not run)
Extract the count matrix from a Seurat object, compatible with both v4 and v5. Uses v4 methods by default, falling back to v5 if necessary.
ExtractSeuratData(
  object,
  assay = NULL,
  slot = c("counts", "data", "scale.data"),
  layer = NULL
)
object |
A Seurat object. |
assay |
Character. Name of the assay to use. Default is NULL (uses default assay). |
slot |
Character. Slot to extract. One of "counts", "data", or "scale.data". Default is "counts". |
layer |
Character. For Seurat v5, the layer name. Default is NULL. |
This function prioritizes Seurat v4 compatibility. For v4 objects, it uses GetAssayData() with the slot argument. For v5 objects, it attempts to use the same method first, then falls back to layer-based access if needed.
A matrix (sparse or dense) containing the expression data.
## Not run:
# Extract counts from default assay
counts <- ExtractSeuratData(seurat_obj)

# Extract normalized data from RNA assay
data <- ExtractSeuratData(seurat_obj, assay = "RNA", slot = "data")
## End(Not run)
Generate random example data for testing TorchDecon functionality. Creates a simple Seurat object and bulk expression data.
GenerateExampleData(
  n_cells = 500L,
  n_genes = 200L,
  n_celltypes = 5L,
  n_bulk_samples = 20L,
  seed = 42L
)
n_cells |
Integer. Number of cells in scRNA-seq data. Default is 500. |
n_genes |
Integer. Number of genes. Default is 200. |
n_celltypes |
Integer. Number of cell types. Default is 5. |
n_bulk_samples |
Integer. Number of bulk samples. Default is 20. |
seed |
Integer. Random seed. Default is 42. |
A list containing:
A Seurat object with random scRNA-seq data
Matrix of bulk expression data (genes x samples)
## Not run:
example_data <- GenerateExampleData()
seurat_obj <- example_data$seurat
bulk_data <- example_data$bulk_data
## End(Not run)
Extract training history from a trained model.
GetTrainingHistory(model)
model |
A trained TorchDeconModel or TorchDeconEnsemble object. |
A data frame with training loss (and validation loss if applicable).
Load a previously saved TorchDecon model or ensemble from disk.
LoadModel(path, device = "auto")
path |
Character. Directory path where the model was saved. |
device |
Character. Device to load the model onto ("cpu", "cuda", or "auto"). Default is "auto". |
A TorchDeconModel or TorchDeconEnsemble object.
## Not run:
model <- LoadModel("my_model")
predictions <- PredictFractions(model, new_data)
## End(Not run)
Merge multiple TorchDeconSimulation objects into one.
MergeSimulations(...)
... |
TorchDeconSimulation objects to merge, or a list of them. |
The merged object will contain:
Combined bulk counts (horizontally concatenated)
Combined cell fractions (vertically concatenated)
Union of all genes (with NA handling)
Union of all cell types (with NA/0 for missing types)
A merged TorchDeconSimulation object.
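The gene-union step can be pictured with two small count matrices. The sketch below is an assumption about one reasonable way to align them, filling genes that a simulation lacks with 0; the manual only says missing values are handled, and `expand_to_genes` is a hypothetical helper.

```r
# Hypothetical helper: expand a genes x samples matrix to a given gene set,
# filling genes that are absent with `fill` (0 here, which is an assumption).
expand_to_genes <- function(mat, genes, fill = 0) {
  out <- matrix(fill, nrow = length(genes), ncol = ncol(mat),
                dimnames = list(genes, colnames(mat)))
  out[rownames(mat), ] <- mat
  out
}

# Two toy simulations with partly overlapping genes (values are made up).
counts1 <- matrix(1:4, nrow = 2, dimnames = list(c("A", "B"), c("s1", "s2")))
counts2 <- matrix(5:8, nrow = 2, dimnames = list(c("B", "C"), c("s3", "s4")))

# Union of genes, then column-wise (horizontal) concatenation of samples
all_genes <- union(rownames(counts1), rownames(counts2))
merged_counts <- cbind(expand_to_genes(counts1, all_genes),
                       expand_to_genes(counts2, all_genes))
```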
## Not run:
sim1 <- SimulateBulk(seurat1, n_samples = 500)
sim2 <- SimulateBulk(seurat2, n_samples = 500)
merged <- MergeSimulations(sim1, sim2)
## End(Not run)
Visualize evaluation metrics for deconvolution results.
PlotEvaluation(
  evaluation,
  type = c("bar", "correlation", "scatter", "heatmap"),
  predictions = NULL,
  truth = NULL
)
evaluation |
A TorchDeconEvaluation object from EvaluateDeconvolution(). |
type |
Character. Type of plot: "correlation", "scatter", "heatmap", or "bar". Default is "bar". |
predictions |
Data frame of predictions (required for "scatter" and "heatmap"). |
truth |
Data frame of true values (required for "scatter" and "heatmap"). |
A ggplot2 object.
Visualize training loss over steps for TorchDecon models.
PlotTrainingHistory(model, log_scale = FALSE, smooth = TRUE, smooth_span = 0.1)
model |
A trained TorchDeconModel or TorchDeconEnsemble object, or a data frame from GetTrainingHistory(). |
log_scale |
Logical. Use log scale for y-axis. Default is FALSE. |
smooth |
Logical. Add smoothed line. Default is TRUE. |
smooth_span |
Numeric. Span for LOESS smoothing. Default is 0.1. |
This function requires ggplot2 for plotting. If ggplot2 is not available, it will return the training history data frame.
A ggplot2 object (if ggplot2 is available), otherwise NULL.
## Not run:
# Plot training history
PlotTrainingHistory(trained_model)

# With log scale
PlotTrainingHistory(trained_model, log_scale = TRUE)
## End(Not run)
Use a trained TorchDecon model or ensemble to predict cell type fractions from bulk RNA-seq data.
PredictFractions(
  model,
  data,
  scaling = "log_min_max",
  return_all = FALSE,
  verbose = TRUE
)
model |
A trained TorchDeconModel or TorchDeconEnsemble object. |
data |
Matrix, data frame, or file path to bulk RNA-seq data (genes x samples). Can also be a TorchDeconProcessed object. |
scaling |
Character. Scaling method to apply. Default is "log_min_max". Set to NULL to skip scaling (if data is already processed). |
return_all |
Logical. For ensemble, return predictions from all models in addition to the average. Default is FALSE. |
verbose |
Logical. Print progress messages. Default is TRUE. |
For ensemble models, predictions are averaged across all three sub-models (m256, m512, m1024) to produce the final prediction.
The input data must contain the same genes used during training. Missing genes will cause an error.
A data frame of predicted cell type fractions with samples as rows and cell types as columns. If return_all = TRUE for ensemble, returns a list with 'average' and individual model predictions.
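The ensemble averaging described above amounts to an element-wise mean over the sub-model outputs. A minimal sketch with made-up prediction matrices (not real model output):

```r
# Made-up predictions from three sub-models for 2 samples x 2 cell types.
preds <- list(
  m256  = matrix(c(0.60, 0.40, 0.20, 0.80), nrow = 2, byrow = TRUE),
  m512  = matrix(c(0.70, 0.30, 0.30, 0.70), nrow = 2, byrow = TRUE),
  m1024 = matrix(c(0.50, 0.50, 0.10, 0.90), nrow = 2, byrow = TRUE)
)

# Element-wise mean across the sub-models
average <- Reduce(`+`, preds) / length(preds)
```

If every sub-model returns fractions that sum to 1 per sample, the average does too, so no renormalisation is needed in this sketch.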
## Not run:
# Predict with a trained ensemble
predictions <- PredictFractions(trained_ensemble, bulk_data)

# Get individual model predictions
all_predictions <- PredictFractions(trained_ensemble, bulk_data, return_all = TRUE)
## End(Not run)
Print summary of a TorchDeconEnsemble object.
## S3 method for class 'TorchDeconEnsemble'
print(x, ...)
x |
A TorchDeconEnsemble object. |
... |
Additional arguments (ignored). |
Print summary of evaluation results.
## S3 method for class 'TorchDeconEvaluation'
print(x, ...)
x |
A TorchDeconEvaluation object. |
... |
Additional arguments (ignored). |
Print summary of a TorchDeconModel object.
## S3 method for class 'TorchDeconModel'
print(x, ...)
x |
A TorchDeconModel object. |
... |
Additional arguments (ignored). |
Print summary of processed data.
## S3 method for class 'TorchDeconProcessed'
print(x, ...)
x |
A TorchDeconProcessed object. |
... |
Additional arguments (ignored). |
Print summary of simulation results.
## S3 method for class 'TorchDeconSimulation'
print(x, ...)
x |
A TorchDeconSimulation object. |
... |
Additional arguments (ignored). |
Process bulk RNA-seq data for prediction using an existing TorchDecon model.
ProcessPredictionData(data, genes, scaling = "log_min_max", verbose = TRUE)
data |
Matrix or data frame of bulk expression data (genes x samples). |
genes |
Character vector of genes to use (signature genes from training). |
scaling |
Character. Scaling method matching training. Default is "log_min_max". |
verbose |
Logical. Print progress. Default is TRUE. |
Processed matrix (samples x genes) ready for prediction.
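Before scaling, the bulk matrix has to be restricted to the training genes, put in the same order, and transposed from genes x samples to samples x genes. The sketch below shows one way this alignment might look; `align_to_signature` is a hypothetical helper and the error message is illustrative.

```r
# Hypothetical helper: subset a genes x samples matrix to the training genes
# (in training order) and transpose it to samples x genes.
align_to_signature <- function(data, genes) {
  missing <- setdiff(genes, rownames(data))
  if (length(missing) > 0) {
    stop("Missing genes: ", paste(missing, collapse = ", "))
  }
  t(as.matrix(data)[genes, , drop = FALSE])
}

# Toy bulk data: 3 genes x 2 samples (values are made up).
bulk <- matrix(1:6, nrow = 3,
               dimnames = list(c("G1", "G2", "G3"), c("s1", "s2")))
aligned <- align_to_signature(bulk, genes = c("G3", "G1"))
```

Keeping the column order identical to the training genes matters because the network's input units are positional.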
Preprocess simulated bulk data for model training. This includes log transformation, scaling, and gene filtering based on variance and intersection with prediction data.
ProcessTrainingData(
  simulation,
  prediction_data = NULL,
  var_cutoff = 0.1,
  scaling = c("log_min_max", "log_zscore", "none"),
  verbose = TRUE
)
simulation |
A TorchDeconSimulation object from |
prediction_data |
Matrix or data frame of bulk expression data for prediction (genes in rows, samples in columns). Used to find common genes. |
var_cutoff |
Numeric. Filter out genes with variance below this threshold. Default is 0.1. |
scaling |
Character. Scaling method to use. One of "log_min_max" (default), "log_zscore", or "none". |
verbose |
Logical. Print progress messages. Default is TRUE. |
The preprocessing pipeline:
Find common genes between training and prediction data
Filter genes by variance threshold
Apply log2(x + 1) transformation
Apply sample-wise min-max scaling (or z-score)
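The first two steps of this pipeline (gene intersection and variance filtering) can be sketched in base R as follows. The data are made up, and computing the variance on the raw counts rather than on transformed values is an assumption based on the order of the steps above.

```r
# Toy simulated training counts (genes x samples); values are made up.
train <- rbind(G1 = c(10, 20, 30, 40),
               G2 = c(7, 7, 7, 7),      # constant gene, zero variance
               G3 = c(3, 4, 3, 4),
               G4 = c(1, 9, 2, 8))
# Toy bulk data to predict on; it lacks G3 and has an extra gene G5.
bulk <- rbind(G1 = c(12, 18), G2 = c(6, 8), G4 = c(3, 5), G5 = c(1, 1))

# Step 1: genes shared by training and prediction data
common <- intersect(rownames(train), rownames(bulk))

# Step 2: keep genes whose variance across samples reaches the cutoff.
# Assumption: variance is computed on the raw simulated counts.
var_cutoff <- 0.1
gene_var <- apply(train[common, , drop = FALSE], 1, var)
keep <- common[gene_var >= var_cutoff]
```

The retained genes are the signature genes that new data must also provide at prediction time.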
A list containing:
Processed expression matrix (samples x genes), ready for training
Cell type fractions matrix (samples x cell types)
Character vector of genes used (signature genes)
Character vector of cell type names
Scaling method used
Parameters for scaling (for applying to new data)
## Not run:
# Basic processing
processed <- ProcessTrainingData(simulation, prediction_data = bulk_data)

# Custom variance cutoff
processed <- ProcessTrainingData(
  simulation,
  prediction_data = bulk_data,
  var_cutoff = 0.05,
  scaling = "log_min_max"
)
## End(Not run)
Load a pre-trained model and make predictions on new bulk data.
QuickPredict(model_path, bulk_data, output_file = NULL, verbose = TRUE)
model_path |
Character. Path to saved model directory. |
bulk_data |
Matrix or file path to bulk RNA-seq data. |
output_file |
Character. Path to save predictions. Default is NULL. |
verbose |
Logical. Print progress. Default is TRUE. |
Data frame of predicted cell fractions.
## Not run:
predictions <- QuickPredict(
  model_path = "trained_model",
  bulk_data = "bulk_expression.txt",
  output_file = "predictions.txt"
)
## End(Not run)
A convenience function that runs the complete TorchDecon workflow: simulate bulk data, process, train ensemble, and optionally predict.
RunTorchDecon(
  seurat_object,
  bulk_data = NULL,
  celltype_col = NULL,
  assay = NULL,
  n_samples = 1000L,
  cells_per_sample = 100L,
  sparse_fraction = 0.5,
  unknown_celltypes = NULL,
  num_steps = 1000L,
  batch_size = 128L,
  learning_rate = 1e-04,
  validation_split = 0,
  early_stopping = FALSE,
  patience = 100L,
  var_cutoff = 0.1,
  scaling = "log_min_max",
  model_type = c("ensemble", "single"),
  architecture = c("m512", "m256", "m1024"),
  device = "auto",
  save_model = NULL,
  seed = 42L,
  verbose = TRUE,
  n_cores = 1L
)
seurat_object |
A Seurat object with cell type annotations. |
bulk_data |
Matrix of bulk RNA-seq data for prediction (genes x samples). If NULL, only training is performed. |
celltype_col |
Character. Metadata column with cell type labels. |
assay |
Character. Assay to use from Seurat object. Default is NULL (default assay). |
n_samples |
Integer. Number of bulk samples to simulate. Default is 1000. |
cells_per_sample |
Integer. Cells per simulated sample. Default is 100. |
sparse_fraction |
Numeric. Fraction of sparse samples (0-1). Default is 0.5. |
unknown_celltypes |
Character vector. Cell types to merge into "Unknown". Default is NULL. |
num_steps |
Integer. Training steps per model. Default is 1000 (matches Python). |
batch_size |
Integer. Training batch size. Default is 128. |
learning_rate |
Numeric. Learning rate. Default is 0.0001. |
validation_split |
Numeric. Fraction for validation (0-1). Default is 0. |
early_stopping |
Logical. Enable early stopping. Default is FALSE. |
patience |
Integer. Early stopping patience. Default is 100. |
var_cutoff |
Numeric. Variance cutoff for gene filtering. Default is 0.1. |
scaling |
Character. Scaling method: "log_min_max", "log_zscore", or "none". Default is "log_min_max". |
model_type |
Character. "ensemble" or "single". Default is "ensemble". |
architecture |
Character. Architecture for single model: "m256", "m512", "m1024". Default is "m512". |
device |
Character. "auto", "cpu", or "cuda". Default is "auto". |
save_model |
Character. Path to save trained model. Default is NULL (don't save). |
seed |
Integer. Random seed. Default is 42. |
verbose |
Logical. Print progress. Default is TRUE. |
n_cores |
Integer. Cores for parallel simulation. Default is 1. |
A list containing:
The trained TorchDeconModel or TorchDeconEnsemble
Predicted cell fractions (if bulk_data provided)
The simulation object
The processed training data
## Not run:
# Complete workflow with ensemble (default)
result <- RunTorchDecon(
  seurat_object = my_seurat,
  bulk_data = bulk_expression,
  celltype_col = "cell_type",
  n_samples = 2000,
  num_steps = 1000
)

# Single model with early stopping
result <- RunTorchDecon(
  seurat_object = my_seurat,
  bulk_data = bulk_expression,
  model_type = "single",
  architecture = "m1024",
  validation_split = 0.1,
  early_stopping = TRUE
)

# Get predictions
predictions <- result$predictions
## End(Not run)
Save a trained TorchDecon model or ensemble to disk.
SaveModel(model, path, overwrite = FALSE)
model |
A TorchDeconModel or TorchDeconEnsemble object. |
path |
Character. Directory path to save the model. |
overwrite |
Logical. Overwrite existing files. Default is FALSE. |
The function saves:
Network weights (.pt files)
Model metadata (genes, cell types, architecture)
Training history (if available)
Invisibly returns the save path.
## Not run:
SaveModel(trained_model, "my_model")
## End(Not run)
Generate simulated bulk RNA-seq samples from single-cell RNA-seq data stored in a Seurat object. This function creates artificial bulk samples by randomly sampling and aggregating single cells with known cell type proportions.
SimulateBulk(
  object,
  n_samples = 1000L,
  cells_per_sample = 100L,
  celltype_col = NULL,
  assay = NULL,
  unknown_celltypes = NULL,
  sparse_fraction = 0.5,
  min_celltypes = 1L,
  seed = NULL,
  verbose = TRUE,
  n_cores = 1L
)
object |
A Seurat object containing single-cell RNA-seq data. |
n_samples |
Integer. Number of bulk samples to simulate. Default is 1000. |
cells_per_sample |
Integer. Number of cells to aggregate per sample. Default is 100. |
celltype_col |
Character. Name of metadata column containing cell type labels. If NULL, uses active identity (Idents). Default is NULL. |
assay |
Character. Name of assay to use. Default is NULL (uses default assay). |
unknown_celltypes |
Character vector. Cell types to merge into "Unknown" category. Default is NULL (no merging). |
sparse_fraction |
Numeric. Fraction of samples that should be "sparse" (missing some cell types). Value between 0 and 1. Default is 0.5. |
min_celltypes |
Integer. Minimum number of cell types in sparse samples. Default is 1. |
seed |
Integer. Random seed for reproducibility. Default is NULL. |
verbose |
Logical. Print progress messages. Default is TRUE. |
n_cores |
Integer. Number of cores for parallel processing. Default is 1. |
The simulation process:
Generate random cell type fractions that sum to 1
Sample cells according to these fractions
Sum expression values across sampled cells
Create both "normal" (all cell types) and "sparse" (subset of cell types) samples
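The steps above can be sketched for a single simulated sample in base R. This is an illustration under stated assumptions, not the package's implementation: fractions are drawn as normalised uniform values, cells are sampled with replacement, and `simulate_one` is a hypothetical helper working on a plain matrix instead of a Seurat object.

```r
set.seed(42)
# Toy single-cell data: 6 genes x 30 cells, three cell types (values made up).
counts <- matrix(rpois(6 * 30, lambda = 5), nrow = 6,
                 dimnames = list(paste0("G", 1:6), paste0("cell", 1:30)))
celltypes <- rep(c("T", "B", "NK"), each = 10)

simulate_one <- function(counts, celltypes, cells_per_sample = 100, sparse = FALSE) {
  all_types <- unique(celltypes)
  types <- all_types
  if (sparse) {
    # Sparse sample: keep a random proper subset with at least one cell type
    k <- sample.int(max(length(all_types) - 1, 1), 1)
    types <- sample(all_types, k)
  }
  # Random fractions that sum to 1 (assumption: normalised uniform draws)
  fractions <- runif(length(types))
  fractions <- fractions / sum(fractions)
  n_cells <- round(fractions * cells_per_sample)
  # Sample cells of each type (with replacement) and sum their expression
  idx <- unlist(lapply(seq_along(types), function(i) {
    pool <- which(celltypes == types[i])
    pool[sample.int(length(pool), n_cells[i], replace = TRUE)]
  }))
  bulk <- rowSums(counts[, idx, drop = FALSE])
  # Report the fractions actually realised after rounding
  realised <- setNames(numeric(length(all_types)), all_types)
  realised[types] <- n_cells / sum(n_cells)
  list(bulk = bulk, fractions = realised)
}

normal_sample <- simulate_one(counts, celltypes)
sparse_sample <- simulate_one(counts, celltypes, sparse = TRUE)
```

Repeating this for many samples, with a share of them sparse, yields a bulk matrix together with the known fractions that serve as training labels.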
A list containing:
Matrix of simulated bulk expression (genes x samples)
Data frame of true cell type fractions (samples x cell types)
Character vector of cell type names
Character vector of gene names
List of simulation parameters
## Not run:
# Basic simulation
sim_data <- SimulateBulk(seurat_obj, n_samples = 1000)

# Custom simulation with specific parameters
sim_data <- SimulateBulk(
  seurat_obj,
  n_samples = 2000,
  cells_per_sample = 200,
  celltype_col = "cell_annotation",
  sparse_fraction = 0.3,
  seed = 42
)
## End(Not run)
Convert a TorchDeconSimulation object to a data frame for export.
SimulationToDataFrame(simulation, what = c("both", "counts", "fractions"))
simulation |
A TorchDeconSimulation object. |
what |
Character. What to export: "counts", "fractions", or "both". Default is "both". |
A data frame or list of data frames.
Return summary statistics of evaluation.
## S3 method for class 'TorchDeconEvaluation'
summary(object, ...)
object |
A TorchDeconEvaluation object. |
... |
Additional arguments (ignored). |
Data frame with summary statistics.
Print detailed summary of a TorchDeconModel object.
## S3 method for class 'TorchDeconModel'
summary(object, ...)
object |
A TorchDeconModel object. |
... |
Additional arguments (ignored). |
Train a TorchDecon model or ensemble on processed training data.
TrainModel(
  model,
  data,
  batch_size = 128L,
  learning_rate = 1e-04,
  num_steps = 1000L,
  validation_split = 0,
  early_stopping = FALSE,
  patience = 500L,
  checkpoint_dir = NULL,
  verbose = TRUE,
  seed = NULL
)
model |
A TorchDeconModel or TorchDeconEnsemble object. |
data |
A TorchDeconProcessed object from |
batch_size |
Integer. Batch size for training. Default is 128. |
learning_rate |
Numeric. Learning rate for Adam optimizer. Default is 0.0001. |
num_steps |
Integer. Number of training steps. Default is 1000. |
validation_split |
Numeric. Fraction of data to use for validation (0-1). Default is 0 (no validation). |
early_stopping |
Logical. Enable early stopping based on validation loss. Default is FALSE. |
patience |
Integer. Number of steps without improvement before stopping. Default is 500. |
checkpoint_dir |
Character. Directory to save model checkpoints. Default is NULL. |
verbose |
Logical. Print training progress. Default is TRUE. |
seed |
Integer. Random seed. Default is NULL. |
The training process uses:
Adam optimizer with configurable learning rate
Mean Squared Error (MSE) loss function
Mini-batch gradient descent
Optional validation and early stopping
For ensemble models, each sub-model is trained sequentially.
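To make the combination of Adam, MSE loss and mini-batches concrete, the sketch below trains a tiny linear model in base R. It is not the package's training loop (which operates on torch networks); the toy data, the learning rate of 0.01 and all variable names are assumptions chosen so the example runs quickly.

```r
set.seed(1)
# Toy regression problem: 200 samples, 3 features, known weights.
X <- matrix(rnorm(200 * 3), ncol = 3)
w_true <- c(0.5, -0.2, 0.3)
y <- as.vector(X %*% w_true)

w <- rep(0, 3)                    # parameters to learn
m <- rep(0, 3)                    # Adam first-moment estimate
v <- rep(0, 3)                    # Adam second-moment estimate
lr <- 0.01; beta1 <- 0.9; beta2 <- 0.999; eps <- 1e-8
batch_size <- 32
num_steps <- 2000
loss_history <- numeric(num_steps)

for (step in seq_len(num_steps)) {
  idx <- sample.int(nrow(X), batch_size)             # draw a mini-batch
  err <- as.vector(X[idx, ] %*% w) - y[idx]
  loss_history[step] <- mean(err^2)                  # MSE on the batch
  grad <- 2 * as.vector(t(X[idx, ]) %*% err) / batch_size
  m <- beta1 * m + (1 - beta1) * grad
  v <- beta2 * v + (1 - beta2) * grad^2
  m_hat <- m / (1 - beta1^step)                      # bias correction
  v_hat <- v / (1 - beta2^step)
  w <- w - lr * m_hat / (sqrt(v_hat) + eps)          # Adam update
}
```

Recording the batch loss at every step, as `loss_history` does here, is the kind of per-step trace that a training-history plot would display.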
The trained model object (modified in place and returned).
## Not run:
# Train a single model
model <- CreateTorchDecon(n_features = 5000, n_classes = 10)
model <- TrainModel(model, processed_data, num_steps = 5000)

# Train an ensemble
ensemble <- CreateTorchDeconEnsemble(n_features = 5000, n_classes = 10)
ensemble <- TrainModel(ensemble, processed_data, num_steps = 5000)
## End(Not run)