This vignette demonstrates a complete single-sample CNA analysis workflow using SCEVAN. We analyze a glioblastoma sample (MGH106) from the publicly available dataset GSE131928.
The `pipelineCNA()` function performs the complete analysis:
| Parameter | Value | Rationale |
|---|---|---|
| `sample` | `"MGH106"` | Prefix for output files |
| `par_cores` | `4` | Parallel processing |
| `SUBCLONES` | `TRUE` | Detect clonal subpopulations |
| `beta_vega` | `0.5` | Standard segmentation granularity |
| `ClonalCN` | `TRUE` | Generate clonal CN profile |
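As a minimal sketch, the settings in the table map onto a call like the one below. It assumes that SCEVAN is installed and that a raw genes-by-cells count matrix named `count_mtx` is already in the workspace. The `pipeline_args` list is only an illustrative way to keep the settings in one place, and the call is skipped when either prerequisite is missing.

```r
# Settings taken from the parameter table.
pipeline_args <- list(
  sample    = "MGH106",  # prefix for output files
  par_cores = 4,         # parallel workers
  SUBCLONES = TRUE,      # detect clonal subpopulations
  beta_vega = 0.5,       # segmentation granularity
  ClonalCN  = TRUE       # generate clonal CN profile
)

# Assumption: count_mtx (genes x cells, raw counts) exists in the workspace.
if (requireNamespace("SCEVAN", quietly = TRUE) && exists("count_mtx")) {
  results <- do.call(SCEVAN::pipelineCNA, c(list(count_mtx), pipeline_args))
  head(results)
}
```

Keeping the arguments in a named list makes it easy to record exactly which settings produced a given set of output files.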
The pipeline generates several output files:
| File Pattern | Description |
|---|---|
| `*heatmap.png` | CNA heatmap with classifications |
| `*_CNAmtx.RData` | CNA matrix for downstream analysis |
| `*_CN.seg` | Segmentation file |
| `*OncoHeat.png` | OncoPrint-style visualization |
| `*CloneTree.png` | Phylogenetic tree |
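One way to confirm which of these files a run actually produced is to match each pattern against the output directory. The sketch below is illustrative only: `find_outputs()` is a hypothetical helper, not part of SCEVAN, and the demo directory contains empty placeholder files rather than real pipeline output. Point it at your own output directory in practice.

```r
# File patterns from the table above.
patterns <- c("*heatmap.png", "*_CNAmtx.RData", "*_CN.seg",
              "*OncoHeat.png", "*CloneTree.png")

# Hypothetical helper: list the files in output_dir that match each glob pattern.
find_outputs <- function(output_dir, patterns) {
  found <- lapply(patterns, function(p) {
    list.files(output_dir, pattern = utils::glob2rx(p))
  })
  names(found) <- patterns
  found
}

# Demo on a temporary directory holding two empty placeholder files.
demo_dir <- file.path(tempdir(), "scevan_demo_output")
dir.create(demo_dir, showWarnings = FALSE)
invisible(file.create(file.path(demo_dir, c("MGH106heatmap.png", "MGH106_CNAmtx.RData"))))

found <- find_outputs(demo_dir, patterns)
missing_patterns <- names(found)[lengths(found) == 0]
print(missing_patterns)
```

Patterns that come back empty can indicate that the corresponding step was disabled or did not run, for example when subclone detection is turned off.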
SCEVAN generates several visualization files in the output directory:
- `*heatmap.png`: The main heatmap shows copy number profiles across all cells.
- `*heatmap_subclones.png`: When subclones are detected, this heatmap colors cells by their subclone assignment.
- `*CloneTree.png`: The phylogenetic tree shows evolutionary relationships between subclones.
- `*consensus.png`: Compact visualization showing the consensus copy number profile for each subclone.
- `*OncoHeat.png`: OncoPrint-style plot.
```r
library(ggplot2)

# Define a genomic region of interest (e.g., chromosome 7)
# and average the relative CN values of its genes for each cell
chr7_cn <- colMeans(
  CNA_mtx_relat[count_mtx_annot$seqnames == 7, , drop = FALSE]
)

# Create data frame for plotting
cn_df <- data.frame(
  cell = names(chr7_cn),
  chr7_cn = chr7_cn,
  class = results[names(chr7_cn), "class"]
)

# Visualize
ggplot(cn_df, aes(x = class, y = chr7_cn, fill = class)) +
  geom_boxplot() +
  theme_minimal() +
  labs(
    title = "Chromosome 7 Copy Number by Cell Type",
    x = "Cell Classification",
    y = "Mean CN Ratio"
  ) +
  scale_fill_manual(values = c("tumor" = "#E74C3C", "normal" = "#3498DB"))
```

```r
# Identify genes in amplified regions
chr7_genes <- rownames(count_mtx_annot)[count_mtx_annot$seqnames == 7]
# Keep only genes present in the count matrix to avoid subscript errors
chr7_genes <- intersect(chr7_genes, rownames(count_mtx))

# Compare expression between tumor subclones
if ("subclone" %in% colnames(results)) {
  tumor_cells <- rownames(results)[results$class == "tumor"]

  # Get subclone assignments
  subclone_1 <- rownames(results)[results$subclone == 1 & !is.na(results$subclone)]
  subclone_2 <- rownames(results)[results$subclone == 2 & !is.na(results$subclone)]

  # Calculate fold changes for chr7 genes
  if (length(subclone_1) > 0 && length(subclone_2) > 0) {
    expr_s1 <- rowMeans(count_mtx[chr7_genes, subclone_1, drop = FALSE])
    expr_s2 <- rowMeans(count_mtx[chr7_genes, subclone_2, drop = FALSE])
    fc <- log2((expr_s1 + 1) / (expr_s2 + 1))

    # chr7 genes with the highest log2 fold change (subclone 1 vs. subclone 2)
    head(sort(fc, decreasing = TRUE), 10)
  }
}
```

If you have prior knowledge of normal cells:
For noisy data, increase beta_vega:
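A minimal sketch of both adjustments follows. It assumes the installed SCEVAN version accepts a `norm_cell` argument (a character vector of barcodes for known normal cells) in `pipelineCNA()`; check `?pipelineCNA` before relying on it. The helper `valid_normals()`, the object `known_normal_cells`, the toy matrix, and the value `beta_vega = 1` are all hypothetical and used for illustration only.

```r
# Hypothetical helper: keep only candidate barcodes that are columns of the matrix.
valid_normals <- function(candidates, mtx) {
  intersect(candidates, colnames(mtx))
}

# Toy matrix standing in for a real genes x cells count matrix.
toy_counts <- matrix(
  0, nrow = 2, ncol = 3,
  dimnames = list(c("geneA", "geneB"), c("cell_1", "cell_2", "cell_3"))
)
toy_normals <- valid_normals(c("cell_1", "cell_9"), toy_counts)
print(toy_normals)

# Runs only when SCEVAN, count_mtx, and known_normal_cells are all available.
if (requireNamespace("SCEVAN", quietly = TRUE) &&
    exists("count_mtx") && exists("known_normal_cells")) {
  # Assumption: norm_cell takes the barcodes of cells already known to be normal.
  results_known <- SCEVAN::pipelineCNA(
    count_mtx,
    sample = "MGH106_known_normals",
    norm_cell = valid_normals(known_normal_cells, count_mtx)
  )

  # Higher beta_vega is expected to give coarser segmentation; 1 is an arbitrary example.
  results_coarse <- SCEVAN::pipelineCNA(
    count_mtx,
    sample = "MGH106_coarse",
    beta_vega = 1
  )
}
```

Filtering the candidate barcodes first avoids passing cells that were removed from, or never present in, the count matrix.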
```r
# Define custom normal cell signatures
custom_signatures <- list(
  Astrocytes = c("GFAP", "AQP4", "SLC1A2", "SLC1A3"),
  Oligodendrocytes = c("MBP", "MOG", "PLP1", "OLIG2"),
  Microglia = c("CX3CR1", "P2RY12", "TMEM119", "AIF1")
)
results <- pipelineCNA(
  count_mtx,
  sample = "MGH106_custom_sig",
  AdditionalGeneSets = custom_signatures,
  SCEVANsignatures = TRUE # Also use built-in signatures
)
```

```r
# 1. Initial run with defaults
results_default <- pipelineCNA(count_mtx, sample = "sample_v1")

# 2. Review outputs and classifications
table(results_default$class)

# 3. Adjust parameters if needed
results_tuned <- pipelineCNA(
  count_mtx,
  sample = "sample_v2",
  beta_vega = 0.7 # Adjusted based on review
)

# 4. Final analysis
results_final <- pipelineCNA(
  count_mtx,
  sample = "sample_final",
  SUBCLONES = TRUE,
  plotTree = TRUE
)
```

| Issue | Solution |
|---|---|
| No normal cells detected | Add custom signatures or provide known normals |
| Too many/few segments | Adjust beta_vega parameter |
| Memory errors | Reduce par_cores or subsample data |
| Missing genes | Verify gene name format (Symbol vs Ensembl) |
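For the "Missing genes" and "No normal cells detected" rows, a quick check of the identifier format and of signature coverage can help before rerunning the pipeline. The sketch below uses two hypothetical helpers, `looks_like_ensembl()` and `signature_coverage()`, and a toy gene vector in place of `rownames(count_mtx)`. The pattern assumes human Ensembl gene IDs (the `ENSG` prefix followed by 11 digits and an optional version suffix).

```r
# Hypothetical helper: TRUE when most identifiers look like human Ensembl gene IDs.
looks_like_ensembl <- function(ids) {
  mean(grepl("^ENSG[0-9]{11}(\\.[0-9]+)?$", ids)) > 0.5
}

# Hypothetical helper: fraction of each signature's genes found among gene_ids.
signature_coverage <- function(signatures, gene_ids) {
  vapply(signatures, function(genes) mean(genes %in% gene_ids), numeric(1))
}

# Toy identifiers standing in for rownames(count_mtx).
toy_genes <- c("GFAP", "AQP4", "MBP", "EGFR")
toy_signatures <- list(
  Astrocytes = c("GFAP", "AQP4", "SLC1A2", "SLC1A3"),
  Microglia  = c("CX3CR1", "P2RY12", "TMEM119", "AIF1")
)

is_ensembl <- looks_like_ensembl(toy_genes)
coverage <- signature_coverage(toy_signatures, toy_genes)
print(is_ensembl)
print(coverage)
```

Low coverage for every signature, combined with Ensembl-style row names, suggests that the gene identifiers should be converted to symbols before running the pipeline.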
```r
sessionInfo()
#> R version 4.6.0 (2026-04-24)
#> Platform: x86_64-pc-linux-gnu
#> Running under: Ubuntu 24.04.4 LTS
#>
#> Matrix products: default
#> BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
#> LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.26.so; LAPACK version 3.12.0
#>
#> locale:
#> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
#> [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
#> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
#> [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
#> [9] LC_ADDRESS=C LC_TELEPHONE=C
#> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
#>
#> time zone: Etc/UTC
#> tzcode source: system (glibc)
#>
#> attached base packages:
#> [1] stats graphics grDevices utils datasets methods base
#>
#> other attached packages:
#> [1] rmarkdown_2.31
#>
#> loaded via a namespace (and not attached):
#> [1] digest_0.6.39 R6_2.6.1 fastmap_1.2.0 xfun_0.57
#> [5] maketools_1.3.2 cachem_1.1.0 knitr_1.51 htmltools_0.5.9
#> [9] buildtools_1.0.0 lifecycle_1.0.5 cli_3.6.6 sass_0.4.10
#> [13] jquerylib_0.1.4 compiler_4.6.0 sys_3.4.3 tools_4.6.0
#> [17] evaluate_1.0.5 bslib_0.11.0 yaml_2.3.12 otel_0.2.0
#> [21] jsonlite_2.0.0 rlang_1.2.0
```