Single-Sample CNA Analysis

Introduction

This vignette demonstrates a complete single-sample CNA analysis workflow using SCEVAN. We analyze a glioblastoma sample (MGH106) from the publicly available dataset GSE131928.

Setup

Load Required Packages

library(SCEVAN)
library(ggplot2)
library(dplyr)

Download Example Data

# Load pre-processed MGH106 glioblastoma data
load(url("https://www.dropbox.com/s/b9udpvhnc2ez9pc/MGH106_data.RData?raw=1"))

# Examine data structure
cat("Dimensions:", dim(count_mtx), "\n")
cat("Genes:", nrow(count_mtx), "\n")
cat("Cells:", ncol(count_mtx), "\n")

# Preview
count_mtx[1:5, 1:5]

Running the Pipeline

Basic Analysis

The pipelineCNA() function performs the complete analysis:

results <- pipelineCNA(
  count_mtx,
  sample = "MGH106",
  par_cores = 4,           # Adjust based on your system
  organism = "human",
  SUBCLONES = TRUE,        # Enable subclone detection
  beta_vega = 0.5,         # Segmentation parameter
  ClonalCN = TRUE,         # Infer clonal profile
  plotTree = TRUE          # Generate phylogenetic tree
)

Understanding Parameters

Parameter Value Rationale
sample “MGH106” Prefix for output files
par_cores 4 Parallel processing
SUBCLONES TRUE Detect clonal subpopulations
beta_vega 0.5 Standard segmentation granularity
ClonalCN TRUE Generate clonal CN profile

Examining Results

Cell Classification Summary

# View classification results
head(results)

# Summary statistics
table(results$class)

# Subclone distribution (if detected)
if("subclone" %in% colnames(results)) {
  table(results$subclone, useNA = "ifany")
}

Output Files

The pipeline generates several output files:

# List all output files
list.files("./output", pattern = "MGH106")
File Pattern Description
*heatmap.png CNA heatmap with classifications
*_CNAmtx.RData CNA matrix for downstream analysis
*_CN.seg Segmentation file
*OncoHeat.png OncoPrint-style visualization
*CloneTree.png Phylogenetic tree

Output Visualizations

SCEVAN generates several visualization files in the output directory:

1. Classification Heatmap (*heatmap.png)

The main heatmap shows copy number profiles across all cells:

  • Rows: Genes ordered by genomic position
  • Columns: Cells clustered by CNA profile
  • Color: Blue = deletion, Red = amplification
  • Top bar: Cell classification (tumor/normal)

2. Subclone Heatmap (*heatmap_subclones.png)

When subclones are detected, this heatmap colors cells by their subclone assignment.

3. Clonal Tree (*CloneTree.png)

The phylogenetic tree shows evolutionary relationships between subclones.

4. Consensus Plot (*consensus.png)

Compact visualization showing the consensus copy number profile for each subclone.

5. OncoPrint Visualization (*OncoHeat.png)

OncoPrint-style plot highlighting:

  • Subclone-specific alterations: Present in only one subclone
  • Shared alterations: Present in multiple (not all) subclones
  • Clonal alterations: Present in all subclones

Downstream Analysis

Load CNA Matrix

# Load the CNA matrix for custom analyses
load("./output/MGH106_CNAmtx.RData")
load("./output/MGH106_count_mtx_annot.RData")

dim(CNA_mtx_relat)

Extract Region-Specific CN Values

# Define a genomic region of interest (e.g., chromosome 7)
chr7_cn <- apply(
  CNA_mtx_relat[count_mtx_annot$seqnames == 7, ], 
  2, 
  mean
)

# Create data frame for plotting
cn_df <- data.frame(
  cell = names(chr7_cn),
  chr7_cn = chr7_cn,
  class = results[names(chr7_cn), "class"]
)

# Visualize
ggplot(cn_df, aes(x = class, y = chr7_cn, fill = class)) +
  geom_boxplot() +
  theme_minimal() +
  labs(
    title = "Chromosome 7 Copy Number by Cell Type",
    x = "Cell Classification",
    y = "Mean CN Ratio"
  ) +
  scale_fill_manual(values = c("tumor" = "#E74C3C", "normal" = "#3498DB"))

Differential Expression in CNA Regions

# Identify genes in amplified regions
chr7_genes <- rownames(count_mtx_annot)[count_mtx_annot$seqnames == 7]

# Compare expression between tumor subclones
if("subclone" %in% colnames(results)) {
  tumor_cells <- rownames(results)[results$class == "tumor"]
  
  # Get subclone assignments
  subclone_1 <- rownames(results)[results$subclone == 1 & !is.na(results$subclone)]
  subclone_2 <- rownames(results)[results$subclone == 2 & !is.na(results$subclone)]
  
  # Calculate fold changes for chr7 genes
  if(length(subclone_1) > 0 & length(subclone_2) > 0) {
    expr_s1 <- rowMeans(count_mtx[chr7_genes, subclone_1])
    expr_s2 <- rowMeans(count_mtx[chr7_genes, subclone_2])
    
    fc <- log2((expr_s1 + 1) / (expr_s2 + 1))
    
    # Top differentially expressed genes
    head(sort(fc, decreasing = TRUE), 10)
  }
}

Advanced Options

Using Known Normal Cells

If you have prior knowledge of normal cells:

# Specify known normal cell barcodes
known_normals <- c("cell_1", "cell_2", "cell_3")

results <- pipelineCNA(
  count_mtx,
  sample = "MGH106_custom",
  norm_cell = known_normals,
  FIXED_NORMAL_CELLS = FALSE  # Still allow automatic detection
)

Adjusting Segmentation

For noisy data, increase beta_vega:

# Coarser segmentation for noisy data
results_coarse <- pipelineCNA(
  count_mtx,
  sample = "MGH106_coarse",
  beta_vega = 1.0  # More conservative segmentation
)

Adding Custom Gene Sets

# Define custom normal cell signatures
custom_signatures <- list(
  Astrocytes = c("GFAP", "AQP4", "SLC1A2", "SLC1A3"),
  Oligodendrocytes = c("MBP", "MOG", "PLP1", "OLIG2"),
  Microglia = c("CX3CR1", "P2RY12", "TMEM119", "AIF1")
)

results <- pipelineCNA(
  count_mtx,
  sample = "MGH106_custom_sig",
  AdditionalGeneSets = custom_signatures,
  SCEVANsignatures = TRUE  # Also use built-in signatures
)

Best Practices

Quality Control Checklist

  1. Check cell coverage: Ensure sufficient genes per cell
  2. Verify normal detection: Review confident normal cells
  3. Validate subclones: Check modularity scores
  4. Examine segmentation: Adjust beta if needed

Troubleshooting

Common Issues

Issue Solution
No normal cells detected Add custom signatures or provide known normals
Too many/few segments Adjust beta_vega parameter
Memory errors Reduce par_cores or subsample data
Missing genes Verify gene name format (Symbol vs Ensembl)

Session Info

sessionInfo()
#> R version 4.6.0 (2026-04-24)
#> Platform: x86_64-pc-linux-gnu
#> Running under: Ubuntu 24.04.4 LTS
#> 
#> Matrix products: default
#> BLAS:   /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3 
#> LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.26.so;  LAPACK version 3.12.0
#> 
#> locale:
#>  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
#>  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
#>  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
#>  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
#>  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
#> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
#> 
#> time zone: Etc/UTC
#> tzcode source: system (glibc)
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> other attached packages:
#> [1] rmarkdown_2.31
#> 
#> loaded via a namespace (and not attached):
#>  [1] digest_0.6.39    R6_2.6.1         fastmap_1.2.0    xfun_0.57       
#>  [5] maketools_1.3.2  cachem_1.1.0     knitr_1.51       htmltools_0.5.9 
#>  [9] buildtools_1.0.0 lifecycle_1.0.5  cli_3.6.6        sass_0.4.10     
#> [13] jquerylib_0.1.4  compiler_4.6.0   sys_3.4.3        tools_4.6.0     
#> [17] evaluate_1.0.5   bslib_0.11.0     yaml_2.3.12      otel_0.2.0      
#> [21] jsonlite_2.0.0   rlang_1.2.0