| Title: | Single-Cell Phenotype-Associated Subpopulation Identifier |
|---|---|
| Description: | Identifies phenotype-associated cell subpopulations from single-cell RNA-seq data by integrating bulk RNA-seq data with phenotype information. This package uses network-regularized sparse regression to quantify the strength of association between each cell and a phenotype (e.g., disease stage, tumor metastasis, treatment response, survival outcomes). Compatible with both Seurat v4 (4.0.0+) and Seurat v5 (5.0.0+). The method supports Gaussian (continuous), binomial (binary), and Cox (survival) regression families. Full cross-platform compatibility (Windows, macOS, Linux). |
| Authors: | Aimin Xie [aut] (Original author), Zaoqu Liu [aut, cre] (ORCID: <https://orcid.org/0000-0002-0452-742X>, Maintainer, bug fixes and optimization) |
| Maintainer: | Zaoqu Liu <[email protected]> |
| License: | GPL-3 |
| Version: | 1.0.4 |
| Built: | 2026-05-23 08:35:15 UTC |
| Source: | https://github.com/Zaoqu-Liu/scPAS |

Imputes missing values in single-cell RNA-seq data using the selected method.

imputation(obj, assay = "RNA", method = c("KNN", "ALRA"))

| Argument | Description |
|---|---|
| obj | A Seurat object. |
| assay | The assay to impute. The default is 'RNA'. |
| method | The imputation method, either 'KNN' or 'ALRA'. The first value in the usage, 'KNN', is the default. |

A Seurat object after imputation.

Imputes missing values in single-cell RNA-seq data using ALRA (adaptively thresholded low-rank approximation).

imputation_ALRA(obj, assay = "RNA")

| Argument | Description |
|---|---|
| obj | A Seurat object. |
| assay | The assay to impute. The default is 'RNA'. |

A Seurat object after imputation.

Imputes missing values in single-cell RNA-seq data using the average expression value of nearest-neighbor cells.

imputation_KNN(obj, assay = "RNA", LogNormalized = TRUE)

| Argument | Description |
|---|---|
| obj | A Seurat object. |
| assay | The assay to impute. The default is 'RNA'. |
| LogNormalized | Whether the data is log-normalized. The default is TRUE. |

A Seurat object after imputation.
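
The nearest-neighbor averaging idea can be sketched in base R. This is a minimal illustration, not the package's implementation: the helper `knn_impute_sketch`, the use of Euclidean distance, the choice of `k`, and the decision to fill only zero entries are all assumptions made for the example.

```r
# Hypothetical helper, not part of scPAS: replaces zero entries with the
# mean expression of each cell's k nearest neighbor cells.
knn_impute_sketch <- function(expr, k = 3) {
  # expr: genes x cells matrix of log-normalized values
  d <- as.matrix(dist(t(expr)))        # cell-cell Euclidean distances
  diag(d) <- Inf                       # a cell is not its own neighbor
  imputed <- expr
  for (j in seq_len(ncol(expr))) {
    nn <- order(d[j, ])[seq_len(k)]    # indices of the k closest cells
    nn_mean <- rowMeans(expr[, nn, drop = FALSE])
    zero <- expr[, j] == 0
    imputed[zero, j] <- nn_mean[zero]  # fill only the zero entries
  }
  imputed
}

# Toy data: 10 genes x 6 cells (simulated, for illustration only).
set.seed(1)
expr <- matrix(rpois(60, lambda = 1), nrow = 10,
               dimnames = list(paste0("gene", 1:10), paste0("cell", 1:6)))
expr <- log1p(expr)
imputed <- knn_impute_sketch(expr, k = 3)
```

Observed non-zero values are left untouched in this sketch, so only dropout-like zeros change.
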

This function provides a simplified version of the Seurat analysis pipeline for single-cell RNA-seq data. The pipeline contains the following steps:

1. Create a Seurat object from raw data.
2. Normalize the count data present in a given assay.
3. Identify the variable features.
4. Scale and center features in the dataset.
5. Run a PCA dimensionality reduction.
6. Construct a Shared Nearest Neighbor (SNN) graph for the dataset.
7. Identify clusters of cells with an SNN modularity-optimization-based clustering algorithm.
8. Run t-distributed Stochastic Neighbor Embedding (t-SNE) dimensionality reduction on selected features.
9. Run the Uniform Manifold Approximation and Projection (UMAP) dimensionality reduction technique.

run_Seurat( counts, project = "Single_Cell", min.cells = 400, min.features = 200, meta.data = NULL, normalization.method = "LogNormalize", scale.factor = 10000, selection.method = "vst", resolution = 0.6, dims_Neighbors = 1:10, dims_TSNE = 1:10, dims_UMAP = 1:10, verbose = TRUE )

| Argument | Description |
|---|---|
| counts | A |
| project | Project name for the |
| min.cells | Include features detected in at least this many cells. Will subset the counts matrix as well. To reintroduce excluded features, create a new object with a lower cutoff. |
| min.features | Include cells where at least this many features are detected. |
| meta.data | Metadata of the single-cell data. |
| normalization.method | Method for normalization. |
| scale.factor | Sets the scale factor for cell-level normalization. |
| selection.method | How to choose top variable features. Choose one of: |
| resolution | Value of the resolution parameter. Use a value above (below) 1.0 to obtain a larger (smaller) number of communities. |
| dims_Neighbors | Dimensions of reduction to use as input. |
| dims_TSNE | Which dimensions to use as input features for t-SNE. |
| dims_UMAP | Which dimensions to use as input features for UMAP. |
| verbose | Print output. |

A Seurat object containing the cell-cell similarity network and the t-SNE and UMAP representations.
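
The pipeline steps listed for `run_Seurat` correspond to standard Seurat calls. The sketch below chains them in a hypothetical wrapper, `seurat_pipeline_sketch`. Its name and reduced argument list are assumptions for illustration, and the internals of `run_Seurat` may differ.

```r
# Hypothetical wrapper; run_Seurat's own internals may differ.
seurat_pipeline_sketch <- function(counts, project = "Single_Cell",
                                   min.cells = 400, min.features = 200,
                                   resolution = 0.6, dims = 1:10) {
  if (!requireNamespace("Seurat", quietly = TRUE)) {
    stop("This sketch needs the Seurat package.")
  }
  # 1. Create the object, filtering rare features and low-quality cells.
  obj <- Seurat::CreateSeuratObject(counts = counts, project = project,
                                    min.cells = min.cells,
                                    min.features = min.features)
  # 2-4. Normalize, select variable features, scale and center.
  obj <- Seurat::NormalizeData(obj, normalization.method = "LogNormalize",
                               scale.factor = 10000)
  obj <- Seurat::FindVariableFeatures(obj, selection.method = "vst")
  obj <- Seurat::ScaleData(obj)
  # 5. PCA on the variable features.
  obj <- Seurat::RunPCA(obj, features = Seurat::VariableFeatures(obj))
  # 6-7. SNN graph and modularity-based clustering.
  obj <- Seurat::FindNeighbors(obj, dims = dims)
  obj <- Seurat::FindClusters(obj, resolution = resolution)
  # 8-9. Low-dimensional embeddings.
  obj <- Seurat::RunTSNE(obj, dims = dims)
  obj <- Seurat::RunUMAP(obj, dims = dims)
  obj
}
```

Calling the wrapper on a raw genes x cells count matrix should return a Seurat object with clusters and both embeddings, provided Seurat is installed and the data pass the `min.cells` and `min.features` filters.
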

scPAS: a tool for identifying phenotype-associated cell subpopulations from single-cell sequencing data by integrating bulk data.

scPAS( bulk_dataset, sc_dataset, phenotype, assay = "RNA", tag = NULL, nfeature = NULL, do_imputation = TRUE, imputation_method = c("KNN", "ALRA"), alpha = NULL, network_class = c("SC", "bulk"), independent = TRUE, family = c("gaussian", "binomial", "cox"), permutation_times = 2000, FDR.threshold = 0.05, n_cores = 1 )

| Argument | Description |
|---|---|
| bulk_dataset | Matrix. Bulk expression matrix of the related disease. Each row represents a gene and each column represents a sample. The input expression values are continuous, such as microarray fluorescent units on a logarithmic scale, RNA-seq log-CPMs, log-RPKMs, or log-TPMs. |
| sc_dataset | Matrix or Seurat object. Single-cell RNA-seq expression matrix of the related disease. Each row represents a gene and each column represents a cell. A Seurat object that contains the preprocessed data and the constructed network is preferred. Otherwise, the raw count expression matrix is processed with Seurat's default parameters and a cell-cell similarity network is constructed from it. See run_Seurat for details. |
| phenotype | Phenotype annotation of each bulk sample. It can be a continuous dependent variable, a binary group indicator vector, or clinical survival data: |
| assay | Name of the assay to get. |
| tag | Names for each phenotypic group. Used for logistic regression only. |
| nfeature | Numeric. The number of features to select as top variable features in sc_dataset. Top variable features are intersected with the features of bulk_dataset. The default is NULL, in which case all features are used. |
| do_imputation | Logical. Whether to perform imputation on the single-cell data (default: TRUE). |
| imputation_method | Character. Name of the imputation method: "KNN" or "ALRA". |
| alpha | Numeric. Parameter used to balance the effect of the l1 norm and the network-based penalties. It can be a single number or a vector of values to search over. If |
| network_class | The source of the feature-feature similarity network. By default this is set to |
| independent | Logical. Whether the background distribution of risk scores is constructed independently for each cell. |
| family | Character. Response type for the regression model. It depends on the type of the given phenotype and can be |
| permutation_times | Integer. Number of permutation iterations for statistical significance testing (default: 2000). Higher values increase accuracy but also computation time. Recommended: 1000-5000. For faster testing, use 500-1000. |
| FDR.threshold | Numeric. FDR threshold for identifying phenotype-associated cells. The default is 0.05. |
| n_cores | Integer. Number of CPU cores to use for the parallel permutation test (default: 1, sequential processing). Setting n_cores > 1 enables parallel computing, which can significantly speed up the analysis (2-4x faster with 4 cores). Requires the 'future' and 'future.apply' packages. |
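
To make the role of `alpha` concrete, a common form of network-regularized sparse regression is written below. This is a sketch under assumptions: the symbols $\ell$ (the loss implied by `family`), $\lambda$ (overall penalty strength), and $L$ (a Laplacian derived from the feature-feature network) are introduced here for illustration, and the exact objective used by scPAS may differ.

```latex
\hat{\beta} = \arg\min_{\beta}\; \ell(\beta; X, y)
  + \lambda \left[ \alpha \lVert \beta \rVert_1
  + \frac{1 - \alpha}{2}\, \beta^{\top} L \beta \right]
```

In this form, `alpha` close to 1 emphasizes the sparsity-inducing l1 term, while `alpha` close to 0 emphasizes smoothness of the coefficients over the network.
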

This function returns a Seurat object with the following components added:

| Component | Description |
|---|---|
| scPAS_para | A list containing the final model parameters, added to misc. |
| PAS result | A data frame containing risk scores (scPAS_RS), normalized risk scores (scPAS_NRS), p-values (scPAS_Pvalue), adjusted p-values (scPAS_FDR), and cell classification labels (scPAS), added to the metadata. |
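
The relationship between the result columns can be illustrated with a small base-R simulation. This is a sketch, not the scPAS algorithm: the simulated `expr` and `beta`, the permutation scheme (shuffling coefficients), the z-score normalization, the two-sided normal p-value, and the label strings are all assumptions. Only the column names and the Benjamini-Hochberg style FDR threshold follow the documentation above.

```r
# Hypothetical inputs: 'expr' is a scaled genes x cells matrix and 'beta'
# a sparse coefficient vector; neither is taken from scPAS itself.
set.seed(42)
n_genes <- 50
n_cells <- 20
expr <- matrix(rnorm(n_genes * n_cells), nrow = n_genes)
beta <- c(rnorm(5), rep(0, n_genes - 5))

# Raw risk score: weighted sum of expression for each cell.
risk_score <- as.vector(crossprod(expr, beta))

# Background scores from randomly permuted coefficients.
# Result is an n_cells x permutation_times matrix.
permutation_times <- 500
background <- replicate(permutation_times,
                        as.vector(crossprod(expr, sample(beta))))

# Normalize each cell against its own background (assumed z-score).
nrs <- (risk_score - rowMeans(background)) / apply(background, 1, sd)
p_value <- 2 * pnorm(-abs(nrs))
fdr <- p.adjust(p_value, method = "BH")

# Classify cells at the FDR threshold; label strings are illustrative.
FDR.threshold <- 0.05
label <- ifelse(fdr >= FDR.threshold, "not significant",
                ifelse(nrs > 0, "positive", "negative"))
result <- data.frame(scPAS_RS = risk_score, scPAS_NRS = nrs,
                     scPAS_Pvalue = p_value, scPAS_FDR = fdr, scPAS = label)
```

Building the background separately for each cell mirrors the idea behind `independent = TRUE`. Pooling all permuted scores into one shared background would be the alternative.
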

scPAS.prediction: a function that uses the scPAS model to make predictions on independent data.

scPAS.prediction( model, test.data, assay = "RNA", FDR.threshold = 0.05, do_imputation = FALSE, imputation_method = "KNN", independent = TRUE, permutation_times = 2000, n_cores = 1 )

| Argument | Description |
|---|---|
| model | Seurat object. A Seurat object containing the scPAS model (from running scPAS()). |
| test.data | Matrix or Seurat object. Single-cell RNA-seq expression matrix of the related disease. Each row represents a gene and each column represents a cell. A Seurat object that contains the preprocessed data and the constructed network is preferred. |
| assay | Name of the assay to get. |
| FDR.threshold | Numeric. FDR threshold for identifying phenotype-associated cells. The default is 0.05. |
| do_imputation | Logical. Whether to perform imputation on the test data (default: FALSE). |
| imputation_method | Character. Imputation method: "KNN" or "ALRA". |
| independent | Logical. Whether to compute the background distribution independently for each cell. |
| permutation_times | Integer. Number of permutations for significance testing (default: 2000). |
| n_cores | Integer. Number of CPU cores for parallel processing (default: 1). |

A Seurat object or data frame containing the prediction results.
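
Applying a fitted model to independent data requires matching the model's genes to the rows of the test matrix. The sketch below shows one way this could look. `coef_model` and `test_expr` are hypothetical stand-ins with made-up gene names, and the way scPAS.prediction handles missing genes is not documented here, so restricting to shared genes is an assumption.

```r
# Hypothetical stand-ins: 'coef_model' mimics fitted gene coefficients and
# 'test_expr' a genes x cells test matrix; names are illustrative only.
coef_model <- c(GeneA = 0.8, GeneB = -0.5, GeneC = 0.3)
test_expr <- matrix(c(1, 0, 2,
                      0, 1, 1,
                      3, 1, 0),
                    nrow = 3, byrow = TRUE,
                    dimnames = list(c("GeneB", "GeneC", "GeneD"),
                                    c("cell1", "cell2", "cell3")))

# Keep only genes present in both the model and the test data.
shared <- intersect(names(coef_model), rownames(test_expr))

# Per-cell score: coefficient-weighted sum over the shared genes.
scores <- drop(crossprod(test_expr[shared, , drop = FALSE],
                         coef_model[shared]))
```

Here GeneA and GeneD are dropped because each appears on only one side, so the scores depend on GeneB and GeneC alone.
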

Computes the correlation matrix of a sparse matrix.

sparse.cor(x)

| Argument | Description |
|---|---|
| x | Matrix. Normalized single-cell expression profile extracted from a Seurat object. |

A correlation matrix.
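
A correlation matrix can be computed from a sparse matrix without first centering it (which would destroy sparsity) by working from cross-products. The function below, `sparse_cor_sketch`, is a hypothetical re-implementation for illustration. It correlates the columns of `x` and assumes no column is constant. The package's `sparse.cor` may be written differently.

```r
library(Matrix)

# Hypothetical re-implementation for illustration; scPAS's sparse.cor
# may be written differently.
sparse_cor_sketch <- function(x) {
  n <- nrow(x)
  mu <- Matrix::colMeans(x)
  # Covariance from cross-products: (X'X - n * mu mu') / (n - 1)
  cov_mat <- (as.matrix(Matrix::crossprod(x)) - n * tcrossprod(mu)) / (n - 1)
  sd_vec <- sqrt(diag(cov_mat))
  # Scale covariances by the outer product of standard deviations.
  cov_mat / tcrossprod(sd_vec)
}

# Simulated sparse input: 40 observations x 5 variables.
set.seed(7)
x <- Matrix::rsparsematrix(nrow = 40, ncol = 5, density = 0.3)
r <- sparse_cor_sketch(x)
```

The result should agree with `cor(as.matrix(x))` up to floating-point error, while the only dense object created is the small variables x variables matrix.
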