---
title: "Multi-Sample Comparison Analysis"
author: "Zaoqu Liu"
date: "`r Sys.Date()`"
output:
  rmarkdown::html_vignette:
    toc: true
    toc_depth: 3
    fig_caption: true
vignette: >
  %\VignetteIndexEntry{Multi-Sample Comparison Analysis}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(
  echo = TRUE,
  message = FALSE,
  warning = FALSE,
  collapse = TRUE,
  comment = "#>",
  fig.width = 10,
  fig.height = 6,
  out.width = "100%",
  eval = FALSE
)
```

## Introduction

SCEVAN enables comparative analysis of copy number alterations across multiple samples. This is particularly useful for:

- **Longitudinal studies**: Comparing primary tumor vs metastasis
- **Treatment response**: Comparing pre- and post-treatment samples
- **Patient cohorts**: Identifying shared vs patient-specific alterations

## Setup

```{r load-packages}
library(SCEVAN)
library(ggplot2)
library(dplyr)
```

## Preparing Multi-Sample Data

### Data Structure

Multi-sample analysis requires a **named list** of count matrices:

```{r data-structure}
# Example structure
listCountMtx <- list(
  "Sample1" = count_mtx_1,
  "Sample2" = count_mtx_2,
  "Sample3" = count_mtx_3
)
```

### Download Example Data

```{r load-example}
# Load glioblastoma multi-sample data
load(url("https://www.dropbox.com/s/esqvnltucdqajg1/listCountMtx.RData?raw=1"))

# Examine structure
names(listCountMtx)

# Sample sizes
sapply(listCountMtx, ncol)
```

## Running Multi-Sample Analysis

### Basic Comparison

```{r run-multi}
multiSampleComparisonClonalCN(
  listCountMtx,
  analysisName = "GBM_comparison",
  organism = "human",
  par_cores = 4
)
```

### With Known Normal Cells

```{r with-normals}
# Optionally provide known normal cells per sample
listNormCells <- list(
  "MGH102" = c("cell_a", "cell_b"),
  "MGH104" = c("cell_c", "cell_d"),
  "MGH105" = NULL,  # Auto-detect
  "MGH106" = NULL
)

multiSampleComparisonClonalCN(
  listCountMtx,
  listNormCells = listNormCells,
  analysisName = "GBM_with_normals",
  par_cores = 4
)
```

## Output Interpretation

### Generated Files

```{r list-files}
list.files("./output", pattern = "GBM_comparison")
```

| File | Description |
|------|-------------|
| `*_allOncoHeat.png` | Combined OncoPrint across samples |
| `*_comparison.png` | Side-by-side CN profiles |
| `*_CloneTree.png` | Cross-sample phylogeny |

### Comparative Heatmap

The multi-sample comparison generates combined visualization files:

- `*_allOncoHeat.png` - Combined OncoPrint showing shared and sample-specific alterations
- `*_CloneTree.png` - Cross-sample phylogenetic tree

## Visualization Functions

### Plot All Clonal Profiles

```{r plot-clonal}
# Generate combined clonal CN plot
plotAllClonalCN(
  sampleNames = names(listCountMtx),
  pathOutput = "./output"
)
```

### Plot Subclonal Profiles

```{r plot-subclonal}
# Generate combined subclonal CN plot
plotAllSubclonalCN(
  sampleNames = names(listCountMtx),
  pathOutput = "./output"
)
```

## Advanced Analysis

### Identifying Shared Alterations

```{r shared-alterations}
# Load individual results (NULL for samples without a segment file)
results_list <- lapply(names(listCountMtx), function(s) {
  seg_file <- paste0("./output/", s, "_Clonal_CN.seg")
  if (file.exists(seg_file)) {
    read.table(seg_file, header = TRUE, sep = "\t")
  }
})
names(results_list) <- names(listCountMtx)

# Collect the altered regions of every sample as a first step
# towards finding alterations present in all samples
find_shared_alterations <- function(results_list) {
  # Get altered regions per sample
  altered_regions <- lapply(results_list, function(df) {
    if (!is.null(df)) {
      df[df$CN != 2, c("Chr", "Pos", "End", "CN")]
    }
  })

  # Overlap detection is omitted in this simplified example, so the
  # per-sample altered regions are returned as they are.
  # In practice, use GenomicRanges for proper overlap detection.
  altered_regions
}

shared <- find_shared_alterations(results_list)
```

### Custom Comparison Plots

```{r custom-plots}
# Create custom comparison visualization
create_cn_comparison <- function(sample_names, output_path) {
  cn_data <- lapply(sample_names, function(s) {
    # load() is expected to restore the object `CNA_mtx_relat`
    load(paste0(output_path, "/", s, "_CNAmtx.RData"))
    data.frame(
      sample = s,
      mean_cn = colMeans(CNA_mtx_relat)
    )
  })

  cn_df <- do.call(rbind, cn_data)

  ggplot(cn_df, aes(x = sample, y = mean_cn, fill = sample)) +
    geom_boxplot() +
    theme_minimal() +
    labs(
      title = "Global CNA Burden Comparison",
      x = "Sample",
      y = "Mean CNA Score"
    ) +
    theme(legend.position = "none")
}

# Generate plot
create_cn_comparison(names(listCountMtx), "./output")
```

## Case Study: Head & Neck Cancer

### Primary vs Lymph Node Metastasis

```{r hnscc-example}
# Load HNSCC data
load(url("https://www.dropbox.com/s/6zns12amobs39g8/HNSCC26_data.RData?raw=1"))

# Examine samples
names(listCountMtx)  # Should show "Primary" and "LN"

# Run comparison
multiSampleComparisonClonalCN(
  listCountMtx,
  analysisName = "HNSCC26_comparison",
  organism = "human",
  par_cores = 4,
  plotTree = TRUE
)
```

### Interpreting Primary vs Metastasis

Key questions to address:

1. **Clonal evolution**: Which alterations are clonal (present in both samples)?
2. **Metastasis-specific**: Which alterations are new in the lymph node?
3. **Lost alterations**: Which alterations are present in the primary tumor but not in the metastasis?

## Statistical Considerations

### Sample Size Requirements

| Analysis Type | Minimum Cells/Sample | Recommended Cells/Sample |
|---------------|---------------------|--------------------------|
| Cell classification | 100 | 500+ |
| Subclone detection | 200 | 1000+ |
| Cross-sample comparison | 300 | 1000+ |

### Batch Effect Considerations

When comparing samples:

- Process all samples with the same protocol if possible
- Consider batch correction methods
- Verify that normal cell detection is consistent across samples

## Best Practices

### Workflow Checklist

1. [ ] Verify consistent gene annotation across samples
2. [ ] Check cell quality metrics per sample
3. [ ] Run individual analyses first
4. [ ] Review normal cell detection
5. [ ] Run multi-sample comparison
6. [ ] Validate shared alterations

### Common Pitfalls

| Issue | Cause | Solution |
|-------|-------|----------|
| No shared alterations | Different tumor types | Verify sample identity |
| All alterations shared | Contamination | Check for cross-sample mixing |
| Inconsistent segmentation | Different cell counts | Normalize comparison |

## Session Info

```{r session-info, eval=TRUE}
sessionInfo()
```
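
## Appendix: Intersecting Altered Segments

The shared-alterations example in the Advanced Analysis section stops at collecting altered regions per sample and recommends GenomicRanges for the overlap step. As a rough illustration of what that step involves, the following is a minimal base-R sketch, not SCEVAN functionality. The helpers `to_altered()` and `intersect_altered()` and the toy segment tables are hypothetical. The sketch assumes segment tables with the columns `Chr`, `Pos`, `End` and `CN` used in that example, and treats `CN == 2` as copy-neutral.

```{r shared-overlap-sketch}
# Hypothetical helper: keep altered segments and label them as gain or loss.
# Assumes columns Chr, Pos, End, CN and that CN == 2 means copy-neutral.
to_altered <- function(df) {
  df <- df[df$CN != 2, c("Chr", "Pos", "End", "CN")]
  data.frame(
    Chr = df$Chr,
    Pos = df$Pos,
    End = df$End,
    type = ifelse(df$CN > 2, "gain", "loss")
  )
}

# Hypothetical helper: intersect two tables produced by to_altered().
# Segments are paired when they lie on the same chromosome and share the
# same direction; only the overlapping part of each pair is kept.
intersect_altered <- function(a, b) {
  pairs <- merge(a, b, by = c("Chr", "type"), suffixes = c(".a", ".b"))
  hits <- pairs[pairs$Pos.a <= pairs$End.b & pairs$Pos.b <= pairs$End.a, ]
  data.frame(
    Chr = hits$Chr,
    Pos = pmax(hits$Pos.a, hits$Pos.b),
    End = pmin(hits$End.a, hits$End.b),
    type = hits$type
  )
}

# Toy segment tables, invented for illustration only
toy <- list(
  Primary = data.frame(
    Chr = c(1, 7, 10),
    Pos = c(1e6, 5e6, 2e6),
    End = c(9e6, 9e6, 8e6),
    CN  = c(2, 3, 1)
  ),
  LN = data.frame(
    Chr = c(7, 10, 13),
    Pos = c(6e6, 1e6, 3e6),
    End = c(12e6, 4e6, 7e6),
    CN  = c(3, 1, 1)
  )
)

# Fold the pairwise intersection over all samples
shared_toy <- Reduce(intersect_altered, lapply(toy, to_altered))
shared_toy
```

With real output, the same fold could be applied to the non-`NULL` elements of `results_list`. The pairwise `merge()` grows quadratically with the number of segments per chromosome, which is acceptable for segment-level tables but is one reason a dedicated interval library is preferable for larger inputs.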