| Title: | Multi-Objective Gene Selection Using Evolutionary Algorithms |
|---|---|
| Description: | Automatic gene selection for bulk RNA-seq deconvolution using multi-objective optimization. Implements the NSGA-II algorithm to simultaneously minimize correlation and maximize distance between cell type expression profiles, yielding Pareto-optimal gene subsets. Supports Seurat objects (V4 and V5), SingleCellExperiment, and standard matrix inputs. Includes built-in deconvolution methods and parallel computing support. |
| Authors: | Zaoqu Liu [aut, cre] |
| Maintainer: | Zaoqu Liu <[email protected]> |
| License: | MIT + file LICENSE |
| Version: | 1.0.0 |
| Built: | 2026-05-25 07:31:01 UTC |
| Source: | https://github.com/Zaoqu-Liu/darwin |
Computes the condition number of the data matrix. Lower values indicate better numerical stability for deconvolution.
compute_condition(data, use_cpp = getOption("darwin.use_cpp", TRUE))
| data | Numeric matrix (cell types x genes). |
|---|---|
| use_cpp | Use C++ implementation for speed. Default: TRUE. |
Condition number (ratio of largest to smallest singular value).
data <- matrix(rnorm(50), nrow = 5, ncol = 10)
cond <- compute_condition(data)
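The definition above (ratio of largest to smallest singular value) can be reproduced in base R as a cross-check. This is a minimal sketch under that definition, not the package's implementation; the helper name `condition_number_sketch` is hypothetical.

```r
# Base-R sketch (hypothetical helper, not part of the package):
# condition number as the ratio of the largest to the smallest singular value.
condition_number_sketch <- function(data) {
  s <- svd(data, nu = 0, nv = 0)$d  # singular values in decreasing order
  s[1] / s[length(s)]
}

set.seed(1)
data <- matrix(rnorm(50), nrow = 5, ncol = 10)
cond <- condition_number_sketch(data)
```

For a full-rank matrix this agrees with base R's `kappa(data, exact = TRUE)`.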
Computes the sum of absolute pairwise Pearson correlations between cell type expression profiles. Lower values indicate more distinct profiles.
compute_correlation(data, use_cpp = getOption("darwin.use_cpp", TRUE))
| data | Numeric matrix (cell types x genes). |
|---|---|
| use_cpp | Use C++ implementation for speed. Default: TRUE. |
Sum of absolute pairwise Pearson correlations between rows.
data <- matrix(rnorm(50), nrow = 5, ncol = 10)
corr <- compute_correlation(data)
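To make the quantity concrete, the following base-R sketch computes a sum of absolute pairwise Pearson correlations between rows. It assumes each unordered pair of rows is counted once; whether the package counts pairs this way is not stated here, and the helper name `abs_cor_sum_sketch` is hypothetical.

```r
# Base-R sketch (hypothetical helper, not part of the package).
abs_cor_sum_sketch <- function(data) {
  r <- cor(t(data))          # cor() works on columns, so transpose to compare rows
  sum(abs(r[upper.tri(r)]))  # assumption: each unordered pair is counted once
}

# Rows 1 and 2 are perfectly correlated; row 3 is perfectly anti-correlated with both.
m <- rbind(c(1, 2, 3), c(2, 4, 6), c(3, 2, 1))
total <- abs_cor_sum_sketch(m)
```

With three rows there are three pairs, each with absolute correlation 1, so `total` is 3.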
Computes the sum of pairwise Euclidean distances between cell type expression profiles. Higher values indicate more distinct profiles.
compute_distance(data, use_cpp = getOption("darwin.use_cpp", TRUE))
| data | Numeric matrix (cell types x genes). |
|---|---|
| use_cpp | Use C++ implementation for speed. Default: TRUE. |
Sum of pairwise Euclidean distances between rows.
data <- matrix(rnorm(50), nrow = 5, ncol = 10)
dist_val <- compute_distance(data)
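A base-R equivalent of this quantity can serve as a sanity check. The sketch below assumes each unordered pair of rows contributes once, which is what `stats::dist()` returns; the helper name `dist_sum_sketch` is hypothetical and this is not the package's implementation.

```r
# Base-R sketch (hypothetical helper, not part of the package).
dist_sum_sketch <- function(data) {
  # dist() returns Euclidean distances between rows, one value per unordered pair
  sum(dist(data, method = "euclidean"))
}

# Rows 1 and 3 coincide; row 2 lies at distance 5 from both.
m <- rbind(c(0, 0), c(3, 4), c(0, 0))
total <- dist_sum_sketch(m)
```

Here the three pairwise distances are 5, 0, and 5, so `total` is 10.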
Computes the crowding distance for each individual, measuring solution density in objective space.
crowding_distance(fitness_values, ranks = NULL)
| fitness_values | Matrix of fitness values (individuals x objectives). |
|---|---|
| ranks | Vector of Pareto ranks (optional). |
Crowding distance measures how close an individual is to its neighbors in objective space. Higher values indicate more isolated solutions, which are preferred for maintaining diversity. Boundary solutions receive infinite crowding distance.
Numeric vector of crowding distances.
fitness <- matrix(runif(20), nrow = 10, ncol = 2)
crowding <- crowding_distance(fitness)
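The crowding distance described above can be sketched in base R following the usual NSGA-II formulation (Deb et al., 2002). This sketch treats all individuals as a single front (it ignores `ranks`) and normalizes each objective by its range; both choices are assumptions rather than documented package behavior, and the helper name `crowding_distance_sketch` is hypothetical.

```r
# Base-R sketch (hypothetical helper, not part of the package).
crowding_distance_sketch <- function(fitness_values) {
  n <- nrow(fitness_values)
  crowd <- numeric(n)
  for (j in seq_len(ncol(fitness_values))) {
    ord <- order(fitness_values[, j])  # individuals sorted by objective j
    f <- fitness_values[ord, j]
    span <- f[n] - f[1]
    crowd[ord[c(1, n)]] <- Inf         # boundary solutions get infinite distance
    if (n > 2 && span > 0) {
      inner <- 2:(n - 1)
      # gap between each interior point's two neighbours, scaled by the range
      crowd[ord[inner]] <- crowd[ord[inner]] + (f[inner + 1] - f[inner - 1]) / span
    }
  }
  crowd
}

fitness <- cbind(c(0, 1, 2, 4), c(4, 2, 1, 0))
crowding <- crowding_distance_sketch(fitness)
```

In this example the first and last individuals are boundary points in both objectives, so they receive `Inf`; the two interior points each accumulate 0.5 + 0.75 = 1.25.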
Creates a Darwin object for multi-objective gene selection optimization. This is the recommended way to create Darwin objects.
darwin(
  data,
  celltype_key = "celltype",
  assay = NULL,
  layer = "data",
  genes_key = NULL,
  use_highly_variable = FALSE
)
| data | Input data. Can be a matrix (cell types x genes), data.frame, Seurat object, or SingleCellExperiment object. |
|---|---|
| celltype_key | For Seurat/SCE objects, the metadata column containing cell type labels. Default: "celltype". |
| assay | For Seurat objects, which assay to use. Default: NULL, which uses the object's default assay. |
| layer | For Seurat V5, which layer to use. Default: "data". |
| genes_key | Column in feature metadata for gene pre-selection. Default: NULL. |
| use_highly_variable | Use highly variable genes only. Default: FALSE. |
A Darwin-class R6 object.
See Darwin-class for the R6 class documentation.
# Create example data
set.seed(42)
data <- matrix(rnorm(500), nrow = 5, ncol = 100)
rownames(data) <- paste0("CellType", 1:5)
colnames(data) <- paste0("Gene", 1:100)

# Initialize darwin
dw <- darwin(data)

# Run optimization
dw$optimize(ngen = 5, verbose = FALSE, parallel = FALSE)

# Select genes
dw$select()
genes <- dw$get_genes()
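For intuition about what the optimizer trades off, the sketch below scores one candidate gene subset on the two objectives named in the package description: correlation (to be minimized) and distance (to be maximized) between cell type profiles. The logical mask encoding, the once-per-pair counting, and the helper name `score_subset_sketch` are assumptions for illustration only; the package's internal representation may differ.

```r
# Base-R sketch (hypothetical helper, not part of the package).
score_subset_sketch <- function(data, mask) {
  sub <- data[, mask, drop = FALSE]  # keep only the selected genes (columns)
  r <- cor(t(sub))                   # correlations between cell type rows
  c(
    correlation = sum(abs(r[upper.tri(r)])),  # objective to minimize
    distance = sum(dist(sub))                 # objective to maximize
  )
}

set.seed(42)
data <- matrix(rnorm(500), nrow = 5, ncol = 100)
mask <- rep(c(TRUE, FALSE), length.out = ncol(data))  # every other gene
scores <- score_subset_sketch(data, mask)
```

Two subsets can then be compared objective by objective: one dominates the other only if it is no worse on both objectives and strictly better on at least one.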
R6 class implementing multi-objective gene selection using the NSGA-II algorithm.
Use the darwin constructor function to create instances.
No fields are exposed directly. Use the methods below to access data.
initialize(data, ...): Create a new Darwin object. See darwin.
optimize(ngen, mode, ...): Run the NSGA-II optimization algorithm.
plot(): Plot the Pareto front.
select(weights, index, close_to): Select a solution from the Pareto front.
get_genes(): Get names of selected genes.
get_selection(): Get logical vector of gene selection.
get_pareto(): Get all Pareto-optimal solutions.
get_fitness(): Get fitness values for Pareto front.
deconvolve(bulk, method): Perform bulk RNA-seq deconvolution.
save(path): Save object to file.
print(): Print object summary.
See darwin for the constructor function.
# Create example data
set.seed(42)
data <- matrix(rnorm(500), nrow = 5, ncol = 100)
rownames(data) <- paste0("CellType", 1:5)
colnames(data) <- paste0("Gene", 1:100)

# Create and use Darwin object
dw <- darwin(data)  # Using constructor function
dw$optimize(ngen = 5, verbose = FALSE, parallel = FALSE)
dw$select()
genes <- dw$get_genes()
Performs NSGA-II selection based on non-dominated sorting and crowding distance.
nsga2_select(fitness_values, n, weights)
| fitness_values | Matrix of fitness values (individuals x objectives). |
|---|---|
| n | Number of individuals to select. |
| weights | Vector of weights (-1 for minimization, 1 for maximization). |
Selection proceeds by:
1. Non-dominated sorting to assign Pareto ranks
2. Computing crowding distance within each rank
3. Selecting individuals by rank (ascending), breaking ties by crowding distance (descending)
Integer vector of selected indices.
Deb, K., Pratap, A., Agarwal, S., & Meyarivan, T. (2002). A fast and elitist multiobjective genetic algorithm: NSGA-II. IEEE Transactions on Evolutionary Computation, 6(2), 182-197.
fitness <- matrix(runif(20), nrow = 10, ncol = 2)
selected <- nsga2_select(fitness, n = 5, weights = c(-1, 1))
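The three selection steps listed in this topic can be sketched in base R as follows. This is an illustrative, quadratic-time version written from the description and from Deb et al. (2002), not the package's code; all function names ending in `_sketch` are hypothetical, and the range-normalized crowding distance is an assumption.

```r
# Base-R sketch (hypothetical helpers, not part of the package).

# Step 1: non-dominated sorting. Weights of -1/1 flip minimized objectives
# so that larger is better in every column.
pareto_ranks_sketch <- function(fitness_values, weights) {
  g <- sweep(fitness_values, 2, weights, `*`)
  ranks <- integer(nrow(g))
  remaining <- seq_len(nrow(g))
  current <- 1L
  while (length(remaining) > 0) {
    dominated <- vapply(remaining, function(i) {
      any(vapply(remaining, function(k) {
        all(g[k, ] >= g[i, ]) && any(g[k, ] > g[i, ])
      }, logical(1)))
    }, logical(1))
    ranks[remaining[!dominated]] <- current  # current front
    remaining <- remaining[dominated]
    current <- current + 1L
  }
  ranks
}

# Step 2: crowding distance within one front.
front_crowding_sketch <- function(f) {
  n <- nrow(f)
  crowd <- numeric(n)
  for (j in seq_len(ncol(f))) {
    ord <- order(f[, j])
    v <- f[ord, j]
    span <- v[n] - v[1]
    crowd[ord[c(1, n)]] <- Inf  # boundary solutions
    if (n > 2 && span > 0) {
      inner <- 2:(n - 1)
      crowd[ord[inner]] <- crowd[ord[inner]] + (v[inner + 1] - v[inner - 1]) / span
    }
  }
  crowd
}

# Step 3: sort by rank (ascending), then crowding distance (descending).
nsga2_select_sketch <- function(fitness_values, n, weights) {
  ranks <- pareto_ranks_sketch(fitness_values, weights)
  crowd <- numeric(nrow(fitness_values))
  for (r in unique(ranks)) {
    idx <- which(ranks == r)
    crowd[idx] <- front_crowding_sketch(fitness_values[idx, , drop = FALSE])
  }
  order(ranks, -crowd)[seq_len(n)]
}

# Minimize column 1, maximize column 2.
fitness <- rbind(c(1, 5), c(2, 6), c(3, 7), c(2, 4), c(4, 1))
ranks <- pareto_ranks_sketch(fitness, weights = c(-1, 1))
selected <- nsga2_select_sketch(fitness, n = 4, weights = c(-1, 1))
```

In this example the first three individuals form the first front, the fourth is dominated by the first, and the fifth is dominated by all others. Within the first front, the two boundary points are preferred over the interior one.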