Main CytoSPACER Analysis Functions
Description
Core functions for running the CytoSPACER analysis pipeline.
| Title: | Mapping Single-Cell Transcriptomes to Spatial Transcriptomics Data |
|---|---|
| Description: | CytoSPACER is an R implementation of CytoSPACE, a computational strategy for assigning single-cell transcriptomes to in situ spatial transcriptomics (ST) data. The method solves single cell/spot assignment by minimizing a correlation-based cost function through a linear programming-based optimization routine using the Jonker-Volgenant algorithm. This package provides high-performance implementations using Rcpp and supports parallel processing across all platforms. |
| Authors: | Zaoqu Liu [aut, cre] (ORCID: <https://orcid.org/0000-0002-5623-6869>), CytoSPACE Team [ctb] (Original Python implementation) |
| Maintainer: | Zaoqu Liu <[email protected]> |
| License: | MIT + file LICENSE |
| Version: | 1.0.0 |
| Built: | 2026-06-01 09:10:17 UTC |
| Source: | https://github.com/Zaoqu-Liu/CytoSPACER |
Add CytoSPACER assignment results as metadata to a Seurat spatial object.
add_cytospace_to_seurat(seurat_obj, results)add_cytospace_to_seurat(seurat_obj, results)
seurat_obj |
Seurat spatial object. |
results |
CytoSPACER results from |
Updated Seurat object with new metadata columns.
Add small random noise to cost matrix for tie-breaking.
add_noise_to_cost(cost_matrix, scale = 1e-10, seed = NULL)add_noise_to_cost(cost_matrix, scale = 1e-10, seed = NULL)
cost_matrix |
Cost matrix. |
scale |
Scale of random noise. Default is 1e-10. |
seed |
Random seed. Default is NULL. |
Cost matrix with added noise.
Functions for sampling single cells based on cell type composition.
Compute the total cost of an assignment.
compute_assignment_cost(cost_matrix, assignment)compute_assignment_cost(cost_matrix, assignment)
cost_matrix |
The cost matrix. |
assignment |
Integer vector of column assignments. |
The total cost (sum of assigned costs).
## Not run: cost <- matrix(c(1, 2, 3, 2, 4, 6, 3, 6, 9), nrow = 3) assignment <- solve_lap(cost) total <- compute_assignment_cost(cost, assignment) ## End(Not run)## Not run: cost <- matrix(c(1, 2, 3, 2, 4, 6, 3, 6, 9), nrow = 3) assignment <- solve_lap(cost) total <- compute_assignment_cost(cost, assignment) ## End(Not run)
Compute the cost matrix between single cells and spatial spots based on expression similarity. The cost matrix is used as input to the linear assignment problem solver.
compute_cost_matrix( sc_data, st_data, method = c("pearson", "spearman", "euclidean"), use_cpp = TRUE )compute_cost_matrix( sc_data, st_data, method = c("pearson", "spearman", "euclidean"), use_cpp = TRUE )
sc_data |
Normalized single-cell expression matrix (genes x cells). |
st_data |
Normalized spatial expression matrix (genes x spots). |
method |
Distance/similarity metric:
|
use_cpp |
Logical. If TRUE, use C++ implementation. Default is TRUE. |
For correlation-based methods, the cost is computed as where
is the correlation coefficient. This transforms the similarity
measure into a cost for minimization. This matches the Python CytoSPACE
implementation exactly.
A cost matrix of dimensions (n_cells x n_spots).
## Not run: cost <- compute_cost_matrix(sc_norm, st_norm, method = "pearson") ## End(Not run)## Not run: cost <- compute_cost_matrix(sc_norm, st_norm, method = "pearson") ## End(Not run)
Functions for computing cost matrices for the linear assignment problem.
Functions for reading and writing CytoSPACER input/output files.
Downsample expression matrix to a target number of transcripts per cell. This helps ensure that cell assignments are not biased by overall expression levels.
downsample_expression(data, target_count = 1500, seed = NULL)downsample_expression(data, target_count = 1500, seed = NULL)
data |
Gene expression count matrix (genes x cells). |
target_count |
Target number of transcripts per cell. Default is 1500. |
seed |
Random seed for reproducibility. Default is NULL. |
For each cell, if the total count exceeds target_count, transcripts
are randomly sampled without replacement. Cells with fewer transcripts are
left unchanged.
Downsampled matrix with the same dimensions as input.
## Not run: downsampled <- downsample_expression(counts, target_count = 1500) ## End(Not run)## Not run: downsampled <- downsample_expression(counts, target_count = 1500) ## End(Not run)
Estimate the number of cells in each ST spot based on total RNA content and an expected mean cell count.
estimate_cells_per_spot(st_data, mean_cells = 5, min_cells = 1)estimate_cells_per_spot(st_data, mean_cells = 5, min_cells = 1)
st_data |
Spatial expression matrix (genes x spots). |
mean_cells |
Expected mean number of cells per spot. Default is 5 (appropriate for 10x Visium). |
min_cells |
Minimum cells per spot. Default is 1. |
The estimation uses a linear model relating total normalized expression to cell counts, anchored at the minimum and mean values.
An integer vector of estimated cell counts per spot.
## Not run: cell_counts <- estimate_cells_per_spot(st_expr, mean_cells = 5) ## End(Not run)## Not run: cell_counts <- estimate_cells_per_spot(st_expr, mean_cells = 5) ## End(Not run)
Estimate cell type fractions from spatial transcriptomics data using reference-based deconvolution.
estimate_fractions( sc_data, cell_types, st_data, method = "seurat", max_cells = 10000 ) estimate_cell_type_fractions( sc_data, cell_types, st_data, method = "seurat", max_cells = 10000 )estimate_fractions( sc_data, cell_types, st_data, method = "seurat", max_cells = 10000 ) estimate_cell_type_fractions( sc_data, cell_types, st_data, method = "seurat", max_cells = 10000 )
sc_data |
Single-cell expression matrix (genes x cells). |
cell_types |
Named vector of cell types for each cell. |
st_data |
Spatial expression matrix (genes x spots). |
method |
Deconvolution method. Currently "seurat" is supported. |
max_cells |
Maximum cells to use for computational efficiency. Default is 10000. |
This function requires the Seurat package (v4 or v5). It uses SCTransform for normalization and FindTransferAnchors/TransferData for deconvolution.
A data.frame with cell type fractions per spot.
## Not run: fractions <- estimate_fractions(sc_expr, cell_labels, st_expr) ## End(Not run)## Not run: fractions <- estimate_fractions(sc_expr, cell_labels, st_expr) ## End(Not run)
Extract expression data and metadata from a Seurat object.
extract_seurat_data( seurat_obj, assay = "RNA", cell_type_col = NULL, output_dir = NULL, prefix = "", sparse = FALSE )extract_seurat_data( seurat_obj, assay = "RNA", cell_type_col = NULL, output_dir = NULL, prefix = "", sparse = FALSE )
seurat_obj |
A Seurat object. |
assay |
Assay to extract. Default is "RNA". |
cell_type_col |
Column containing cell type labels. |
output_dir |
Directory to save output files. If NULL, returns data. |
prefix |
File name prefix for output files. |
sparse |
Save expression as sparse matrix. Default is FALSE. |
If output_dir is NULL, returns a list. Otherwise saves files.
## Not run: # Extract to files extract_seurat_data(seurat_obj, "RNA", "celltype", "input/") # Extract to R objects data <- extract_seurat_data(seurat_obj, "RNA", "celltype") ## End(Not run)## Not run: # Extract to files extract_seurat_data(seurat_obj, "RNA", "celltype", "input/") # Extract to R objects data <- extract_seurat_data(seurat_obj, "RNA", "celltype") ## End(Not run)
Extract spatial expression data and coordinates from a Seurat spatial object.
extract_spatial_data( seurat_obj, assay = "Spatial", image_name = NULL, output_dir = NULL, prefix = "", sparse = FALSE )extract_spatial_data( seurat_obj, assay = "Spatial", image_name = NULL, output_dir = NULL, prefix = "", sparse = FALSE )
seurat_obj |
A Seurat object with spatial data. |
assay |
Assay to extract. Default is "Spatial". |
image_name |
Name of the image. Default is first image. |
output_dir |
Directory to save output files. |
prefix |
File name prefix. |
sparse |
Save as sparse matrix. Default is FALSE. |
If output_dir is NULL, returns a list. Otherwise saves files.
R interface to linear assignment problem (LAP) solvers.
Core functions for running the CytoSPACER analysis pipeline.
Functions for normalizing gene expression data.
Normalize gene expression data using TPM normalization followed by log2(x+1) transformation.
normalize_expression(data, scale_factor = 1e+06, use_cpp = TRUE)normalize_expression(data, scale_factor = 1e+06, use_cpp = TRUE)
data |
Gene expression matrix (genes x cells/spots). Can be a matrix, data.frame, or sparse Matrix. |
scale_factor |
Scaling factor for normalization. Default is 1e6 (TPM). |
use_cpp |
Logical. If TRUE, use C++ implementation. Default is TRUE. |
The normalization performs:
TPM normalization:
Log transformation:
This is the standard normalization used in CytoSPACE for computing cell-to-spot similarities.
Normalized matrix with the same dimensions as input.
## Not run: normalized <- normalize_expression(counts_matrix) ## End(Not run)## Not run: normalized <- normalize_expression(counts_matrix) ## End(Not run)
Plot cell type composition as bar charts.
plot_composition( results, type = c("global", "per_spot"), top_spots = 20, colors = NULL )plot_composition( results, type = c("global", "per_spot"), top_spots = 20, colors = NULL )
results |
Results from |
type |
Type of plot:
|
top_spots |
Number of top spots for per_spot plot. Default is 20. |
colors |
Named vector of colors. |
A ggplot2 object.
Generate visualizations of CytoSPACER cell assignment results.
plot_cytospace( results, type = c("cell_types", "by_spot", "density"), coordinates = NULL, point_size = 1, alpha = 0.8, jitter = 0, max_cells = 50000, ncol = 3, colors = NULL, title = NULL )plot_cytospace( results, type = c("cell_types", "by_spot", "density"), coordinates = NULL, point_size = 1, alpha = 0.8, jitter = 0, max_cells = 50000, ncol = 3, colors = NULL, title = NULL )
results |
Results from |
type |
Type of plot to generate:
|
coordinates |
Optional coordinates data.frame. |
point_size |
Size of points. Default is 1. |
alpha |
Point transparency. Default is 0.8. |
jitter |
Amount of jitter to add. Default is 0. |
max_cells |
Maximum cells to plot. Default is 50000. |
ncol |
Number of columns for faceted plots. Default is 3. |
colors |
Named vector of colors per cell type. |
title |
Plot title. |
A ggplot2 object.
## Not run: results <- run_cytospace(...) plot_cytospace(results, type = "cell_types") plot_cytospace(results, type = "cell_types", jitter = 0.3) ## End(Not run)## Not run: results <- run_cytospace(...) plot_cytospace(results, type = "cell_types") plot_cytospace(results, type = "cell_types", jitter = 0.3) ## End(Not run)
Read various input file formats supported by CytoSPACER.
read_cytospace_input(file_path, as_sparse = FALSE)read_cytospace_input(file_path, as_sparse = FALSE)
file_path |
Path to the input file. Supported formats:
|
as_sparse |
Logical. If TRUE, return sparse matrix. Default is FALSE. |
For sparse matrix files (.mtx), the function expects companion files:
genes.tsv or features.tsv - Gene names
barcodes.tsv or cells.tsv - Cell/spot barcodes
A data.frame or dgCMatrix (if sparse) with genes as rows and cells/spots as columns.
## Not run: expr <- read_cytospace_input("scRNA_data.csv") expr_sparse <- read_cytospace_input("matrix.mtx", as_sparse = TRUE) ## End(Not run)## Not run: expr <- read_cytospace_input("scRNA_data.csv") expr_sparse <- read_cytospace_input("matrix.mtx", as_sparse = TRUE) ## End(Not run)
Read 10x Visium spatial transcriptomics data.
read_visium_data(path, use_seurat = FALSE)read_visium_data(path, use_seurat = FALSE)
path |
Path to SpaceRanger output directory or .tar.gz archive |
use_seurat |
Logical. If TRUE and Seurat is available, use Load10X_Spatial. Default is FALSE. |
A list containing:
Gene expression matrix (genes x spots)
Spot coordinates data.frame
## Not run: visium_data <- read_visium_data("path/to/spaceranger/output") ## End(Not run)## Not run: visium_data <- read_visium_data("path/to/spaceranger/output") ## End(Not run)
Main function for assigning single cells to spatial transcriptomics spots using the CytoSPACE algorithm. This is a complete R implementation of the Python CytoSPACE package.
run_cytospace( sc_data, cell_types, st_data, coordinates, cell_type_fractions = NULL, cells_per_spot = NULL, mean_cells_per_spot = 5, single_cell = FALSE, st_cell_types = NULL, distance_metric = c("pearson", "spearman", "euclidean"), sampling_method = c("duplicates", "synthetic"), downsample = TRUE, downsample_target = 1500, sampling_sub_spots = FALSE, number_of_selected_spots = 10000, number_of_selected_sub_spots = 10000, n_workers = 1, seed = 1, verbose = TRUE )run_cytospace( sc_data, cell_types, st_data, coordinates, cell_type_fractions = NULL, cells_per_spot = NULL, mean_cells_per_spot = 5, single_cell = FALSE, st_cell_types = NULL, distance_metric = c("pearson", "spearman", "euclidean"), sampling_method = c("duplicates", "synthetic"), downsample = TRUE, downsample_target = 1500, sampling_sub_spots = FALSE, number_of_selected_spots = 10000, number_of_selected_sub_spots = 10000, n_workers = 1, seed = 1, verbose = TRUE )
sc_data |
Single-cell RNA-seq expression matrix (genes x cells). Can be a matrix, data.frame, or sparse Matrix. |
cell_types |
Named character vector of cell type labels. Names should
match column names of |
st_data |
Spatial transcriptomics expression matrix (genes x spots). |
coordinates |
Data.frame with spot coordinates. Must have columns
'row' and 'col' (or 'X' and 'Y'), with rownames matching |
cell_type_fractions |
Optional data.frame of cell type fractions per spot. If NULL, will be estimated using Seurat. |
cells_per_spot |
Optional integer vector of cell counts per spot. If NULL, will be estimated from RNA content. |
mean_cells_per_spot |
Expected mean cells per spot for estimation. Default is 5 (appropriate for 10x Visium). |
single_cell |
Logical. If TRUE, treat ST data as single-cell spatial (one cell per spot). Default is FALSE. |
st_cell_types |
Optional named vector of cell types for single-cell ST. Required when single_cell=TRUE for optimal assignment. |
distance_metric |
Distance metric for cost matrix. One of "pearson" (default), "spearman", or "euclidean". |
sampling_method |
Method for handling cell count mismatches. One of "duplicates" (default) or "synthetic" (called "place_holders" in Python). |
downsample |
Logical. If TRUE, downsample scRNA-seq to equal counts. Default is TRUE. |
downsample_target |
Target count for downsampling. Default is 1500. |
sampling_sub_spots |
Logical. If TRUE, use sub-spot sampling for large datasets. Default is FALSE. |
number_of_selected_spots |
Chunk size for single_cell mode. Default 10000. |
number_of_selected_sub_spots |
Chunk size for sampling_sub_spots mode. Default is 10000. |
n_workers |
Number of parallel workers. Default is 1. |
seed |
Random seed for reproducibility. Default is 1. |
verbose |
Logical. If TRUE, print progress messages. Default is TRUE. |
The CytoSPACE algorithm assigns single cells to spatial spots by:
Estimating cell type fractions and cell counts per spot
Sampling single cells to match the ST cell composition
Computing a cost matrix based on expression similarity
Solving the linear assignment problem (LAP) using Jonker-Volgenant
For large datasets, use sampling_sub_spots=TRUE to partition the
problem into manageable chunks processed in parallel.
A list containing:
Data.frame with cell-to-spot assignments
Expression matrix for assigned cells
Cell type counts per spot (bulk ST only)
Cell type fractions per spot (bulk ST only)
List of input parameters
Character vector of log messages
Total execution time in seconds
Vahid, M.R., et al. (2023). High-resolution alignment of single-cell and spatial transcriptomes with CytoSPACE. Nature Biotechnology 41, 1543-1548.
## Not run: # Standard bulk ST analysis results <- run_cytospace( sc_data = sc_expr, cell_types = cell_labels, st_data = st_expr, coordinates = coords, mean_cells_per_spot = 5 ) # Single-cell ST with known cell types results <- run_cytospace( sc_data = sc_expr, cell_types = cell_labels, st_data = st_expr, coordinates = coords, single_cell = TRUE, st_cell_types = st_labels ) # Large dataset with sub-spot sampling results <- run_cytospace( sc_data = sc_expr, cell_types = cell_labels, st_data = st_expr, coordinates = coords, sampling_sub_spots = TRUE, number_of_selected_sub_spots = 5000, n_workers = 4 ) ## End(Not run)## Not run: # Standard bulk ST analysis results <- run_cytospace( sc_data = sc_expr, cell_types = cell_labels, st_data = st_expr, coordinates = coords, mean_cells_per_spot = 5 ) # Single-cell ST with known cell types results <- run_cytospace( sc_data = sc_expr, cell_types = cell_labels, st_data = st_expr, coordinates = coords, single_cell = TRUE, st_cell_types = st_labels ) # Large dataset with sub-spot sampling results <- run_cytospace( sc_data = sc_expr, cell_types = cell_labels, st_data = st_expr, coordinates = coords, sampling_sub_spots = TRUE, number_of_selected_sub_spots = 5000, n_workers = 4 ) ## End(Not run)
Run CytoSPACER analysis directly from Seurat objects.
run_cytospace_seurat( sc_seurat, st_seurat, cell_type_col = NULL, sc_assay = "RNA", st_assay = "Spatial", image_name = NULL, ... )run_cytospace_seurat( sc_seurat, st_seurat, cell_type_col = NULL, sc_assay = "RNA", st_assay = "Spatial", image_name = NULL, ... )
sc_seurat |
Seurat object containing scRNA-seq data. |
st_seurat |
Seurat object containing spatial transcriptomics data. |
cell_type_col |
Column name in sc_seurat metadata containing cell type labels. Default is NULL (uses Idents). |
sc_assay |
Assay to use from sc_seurat. Default is "RNA". |
st_assay |
Assay to use from st_seurat. Default is "Spatial". |
image_name |
Name of the image in st_seurat. Default is the first image. |
... |
Additional arguments passed to |
This function automatically extracts expression matrices, cell types, and coordinates from Seurat objects. Compatible with Seurat v4 and v5.
A list containing CytoSPACER results.
## Not run: sc <- readRDS("scRNA_seurat.rds") st <- readRDS("visium_seurat.rds") results <- run_cytospace_seurat(sc, st, cell_type_col = "celltype") ## End(Not run)## Not run: sc <- readRDS("scRNA_seurat.rds") st <- readRDS("visium_seurat.rds") results <- run_cytospace_seurat(sc, st, cell_type_col = "celltype") ## End(Not run)
Sample single cells from the scRNA-seq dataset to match the estimated cell type composition in the ST data.
sample_cells( sc_data, cell_types, target_counts, method = c("duplicates", "synthetic"), seed = NULL ) sample_cells_by_type( sc_data, cell_types, target_counts, method = c("duplicates", "synthetic"), seed = NULL )sample_cells( sc_data, cell_types, target_counts, method = c("duplicates", "synthetic"), seed = NULL ) sample_cells_by_type( sc_data, cell_types, target_counts, method = c("duplicates", "synthetic"), seed = NULL )
sc_data |
Single-cell expression matrix (genes x cells). |
cell_types |
Named character vector of cell types for each cell. |
target_counts |
Named integer vector of target cell counts per type. |
method |
Sampling method:
|
seed |
Random seed for reproducibility. |
A list containing:
Sampled expression matrix
Vector of original cell IDs
Vector of cell types for sampled cells
Logical vector indicating synthetic cells
## Not run: # Define target counts per cell type targets <- c(B_cell = 100, T_cell = 200, Macrophage = 50) # Sample cells sampled <- sample_cells(sc_expr, cell_labels, targets) ## End(Not run)## Not run: # Define target counts per cell type targets <- c(B_cell = 100, T_cell = 200, Macrophage = 50) # Sample cells sampled <- sample_cells(sc_expr, cell_labels, targets) ## End(Not run)
Save CytoSPACER visualization to files.
save_cytospace_plot( plot, output_dir, prefix = "", formats = c("png", "pdf"), width = 10, height = 8, dpi = 300 )save_cytospace_plot( plot, output_dir, prefix = "", formats = c("png", "pdf"), width = 10, height = 8, dpi = 300 )
plot |
A ggplot2 object. |
output_dir |
Output directory. |
prefix |
File name prefix. |
formats |
File formats to save. Default is c("png", "pdf"). |
width |
Plot width in inches. Default is 10. |
height |
Plot height in inches. Default is 8. |
dpi |
Resolution for raster formats. Default is 300. |
Invisibly returns file paths.
Functions for integrating CytoSPACER with Seurat objects.
Solve the linear assignment problem using the Jonker-Volgenant algorithm. This is the core optimization step in CytoSPACE.
solve_lap(cost_matrix, maximize = FALSE)solve_lap(cost_matrix, maximize = FALSE)
cost_matrix |
A numeric matrix of costs. For rectangular matrices, the number of rows must be less than or equal to columns. |
maximize |
Logical. If TRUE, maximize instead of minimize. Default is FALSE. |
The Jonker-Volgenant algorithm is an efficient O(n^3) algorithm for solving the linear assignment problem. It is particularly well-suited for dense cost matrices.
For rectangular matrices where rows < columns, the algorithm pads the matrix with dummy rows and returns only the assignments for the original rows.
An integer vector of column assignments. The i-th element indicates which column is assigned to row i (1-indexed).
Jonker, R., & Volgenant, A. (1987). A shortest augmenting path algorithm for dense and sparse linear assignment problems. Computing, 38(4), 325-340.
## Not run: # Create a simple cost matrix cost <- matrix(c(1, 2, 3, 2, 4, 6, 3, 6, 9), nrow = 3, byrow = TRUE) # Solve the assignment problem assignment <- solve_lap(cost) ## End(Not run)## Not run: # Create a simple cost matrix cost <- matrix(c(1, 2, 3, 2, 4, 6, 3, 6, 9), nrow = 3, byrow = TRUE) # Solve the assignment problem assignment <- solve_lap(cost) ## End(Not run)
Solve multiple assignment subproblems in parallel.
solve_lap_parallel(cost_matrices, n_workers = NULL, progress = TRUE)solve_lap_parallel(cost_matrices, n_workers = NULL, progress = TRUE)
cost_matrices |
A list of cost matrices. |
n_workers |
Number of parallel workers. Default is auto-detected. |
progress |
Logical. If TRUE, show progress. Default is TRUE. |
Uses the future framework for cross-platform parallel processing.
A list of assignment vectors.
## Not run: cost_list <- list(cost1, cost2, cost3) assignments <- solve_lap_parallel(cost_list, n_workers = 4) ## End(Not run)## Not run: cost_list <- list(cost1, cost2, cost3) assignments <- solve_lap_parallel(cost_list, n_workers = 4) ## End(Not run)
Write CytoSPACER results to output files.
write_cytospace_results( results, output_dir, prefix = "", save_expression = TRUE, sparse_output = FALSE )write_cytospace_results( results, output_dir, prefix = "", save_expression = TRUE, sparse_output = FALSE )
results |
A list returned by |
output_dir |
Directory to save output files |
prefix |
Prefix for output file names |
save_expression |
Logical. If TRUE, save expression matrix. Default TRUE. |
sparse_output |
Logical. If TRUE, save expression in sparse MTX format. |
Invisibly returns the output directory path.
## Not run: results <- run_cytospace(...) write_cytospace_results(results, "output/", prefix = "analysis1_") ## End(Not run)## Not run: results <- run_cytospace(...) write_cytospace_results(results, "output/", prefix = "analysis1_") ## End(Not run)