--- title: "Single-Sample CNA Analysis" author: "Zaoqu Liu" date: "`r Sys.Date()`" output: rmarkdown::html_vignette: toc: true toc_depth: 3 vignette: > %\VignetteIndexEntry{Single-Sample CNA Analysis} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r setup, include=FALSE} knitr::opts_chunk$set( echo = TRUE, message = FALSE, warning = FALSE, collapse = TRUE, comment = "#>", fig.width = 10, fig.height = 6, out.width = "100%", eval = FALSE ) ``` ## Introduction This vignette demonstrates a complete single-sample CNA analysis workflow using SCEVAN. We analyze a glioblastoma sample (MGH106) from the publicly available dataset GSE131928. ## Setup ### Load Required Packages ```{r load-packages} library(SCEVAN) library(ggplot2) library(dplyr) ``` ### Download Example Data ```{r load-data} # Load pre-processed MGH106 glioblastoma data load(url("https://www.dropbox.com/s/b9udpvhnc2ez9pc/MGH106_data.RData?raw=1")) # Examine data structure cat("Dimensions:", dim(count_mtx), "\n") cat("Genes:", nrow(count_mtx), "\n") cat("Cells:", ncol(count_mtx), "\n") # Preview count_mtx[1:5, 1:5] ``` ## Running the Pipeline ### Basic Analysis The `pipelineCNA()` function performs the complete analysis: ```{r run-basic} results <- pipelineCNA( count_mtx, sample = "MGH106", par_cores = 4, # Adjust based on your system organism = "human", SUBCLONES = TRUE, # Enable subclone detection beta_vega = 0.5, # Segmentation parameter ClonalCN = TRUE, # Infer clonal profile plotTree = TRUE # Generate phylogenetic tree ) ``` ### Understanding Parameters | Parameter | Value | Rationale | |-----------|-------|-----------| | `sample` | "MGH106" | Prefix for output files | | `par_cores` | 4 | Parallel processing | | `SUBCLONES` | TRUE | Detect clonal subpopulations | | `beta_vega` | 0.5 | Standard segmentation granularity | | `ClonalCN` | TRUE | Generate clonal CN profile | ## Examining Results ### Cell Classification Summary ```{r examine-results} # View classification results head(results) # Summary statistics table(results$class) # Subclone distribution (if detected) if("subclone" %in% colnames(results)) { table(results$subclone, useNA = "ifany") } ``` ### Output Files The pipeline generates several output files: ```{r list-outputs} # List all output files list.files("./output", pattern = "MGH106") ``` | File Pattern | Description | |--------------|-------------| | `*heatmap.png` | CNA heatmap with classifications | | `*_CNAmtx.RData` | CNA matrix for downstream analysis | | `*_CN.seg` | Segmentation file | | `*OncoHeat.png` | OncoPrint-style visualization | | `*CloneTree.png` | Phylogenetic tree | ## Output Visualizations SCEVAN generates several visualization files in the output directory: ### 1. Classification Heatmap (`*heatmap.png`) The main heatmap shows copy number profiles across all cells: - **Rows**: Genes ordered by genomic position - **Columns**: Cells clustered by CNA profile - **Color**: Blue = deletion, Red = amplification - **Top bar**: Cell classification (tumor/normal) ### 2. Subclone Heatmap (`*heatmap_subclones.png`) When subclones are detected, this heatmap colors cells by their subclone assignment. ### 3. Clonal Tree (`*CloneTree.png`) The phylogenetic tree shows evolutionary relationships between subclones. ### 4. Consensus Plot (`*consensus.png`) Compact visualization showing the consensus copy number profile for each subclone. ### 5. OncoPrint Visualization (`*OncoHeat.png`) OncoPrint-style plot highlighting: - **Subclone-specific alterations**: Present in only one subclone - **Shared alterations**: Present in multiple (not all) subclones - **Clonal alterations**: Present in all subclones ## Downstream Analysis ### Load CNA Matrix ```{r load-cna} # Load the CNA matrix for custom analyses load("./output/MGH106_CNAmtx.RData") load("./output/MGH106_count_mtx_annot.RData") dim(CNA_mtx_relat) ``` ### Extract Region-Specific CN Values ```{r extract-region} # Define a genomic region of interest (e.g., chromosome 7) chr7_cn <- apply( CNA_mtx_relat[count_mtx_annot$seqnames == 7, ], 2, mean ) # Create data frame for plotting cn_df <- data.frame( cell = names(chr7_cn), chr7_cn = chr7_cn, class = results[names(chr7_cn), "class"] ) # Visualize ggplot(cn_df, aes(x = class, y = chr7_cn, fill = class)) + geom_boxplot() + theme_minimal() + labs( title = "Chromosome 7 Copy Number by Cell Type", x = "Cell Classification", y = "Mean CN Ratio" ) + scale_fill_manual(values = c("tumor" = "#E74C3C", "normal" = "#3498DB")) ``` ### Differential Expression in CNA Regions ```{r de-analysis} # Identify genes in amplified regions chr7_genes <- rownames(count_mtx_annot)[count_mtx_annot$seqnames == 7] # Compare expression between tumor subclones if("subclone" %in% colnames(results)) { tumor_cells <- rownames(results)[results$class == "tumor"] # Get subclone assignments subclone_1 <- rownames(results)[results$subclone == 1 & !is.na(results$subclone)] subclone_2 <- rownames(results)[results$subclone == 2 & !is.na(results$subclone)] # Calculate fold changes for chr7 genes if(length(subclone_1) > 0 & length(subclone_2) > 0) { expr_s1 <- rowMeans(count_mtx[chr7_genes, subclone_1]) expr_s2 <- rowMeans(count_mtx[chr7_genes, subclone_2]) fc <- log2((expr_s1 + 1) / (expr_s2 + 1)) # Top differentially expressed genes head(sort(fc, decreasing = TRUE), 10) } } ``` ## Advanced Options ### Using Known Normal Cells If you have prior knowledge of normal cells: ```{r known-normals} # Specify known normal cell barcodes known_normals <- c("cell_1", "cell_2", "cell_3") results <- pipelineCNA( count_mtx, sample = "MGH106_custom", norm_cell = known_normals, FIXED_NORMAL_CELLS = FALSE # Still allow automatic detection ) ``` ### Adjusting Segmentation For noisy data, increase `beta_vega`: ```{r adjust-beta} # Coarser segmentation for noisy data results_coarse <- pipelineCNA( count_mtx, sample = "MGH106_coarse", beta_vega = 1.0 # More conservative segmentation ) ``` ### Adding Custom Gene Sets ```{r custom-genesets} # Define custom normal cell signatures custom_signatures <- list( Astrocytes = c("GFAP", "AQP4", "SLC1A2", "SLC1A3"), Oligodendrocytes = c("MBP", "MOG", "PLP1", "OLIG2"), Microglia = c("CX3CR1", "P2RY12", "TMEM119", "AIF1") ) results <- pipelineCNA( count_mtx, sample = "MGH106_custom_sig", AdditionalGeneSets = custom_signatures, SCEVANsignatures = TRUE # Also use built-in signatures ) ``` ## Best Practices ### Quality Control Checklist 1. **Check cell coverage**: Ensure sufficient genes per cell 2. **Verify normal detection**: Review confident normal cells 3. **Validate subclones**: Check modularity scores 4. **Examine segmentation**: Adjust beta if needed ### Recommended Workflow ```{r workflow, eval=FALSE} # 1. Initial run with defaults results_default <- pipelineCNA(count_mtx, sample = "sample_v1") # 2. Review outputs and classifications table(results_default$class) # 3. Adjust parameters if needed results_tuned <- pipelineCNA( count_mtx, sample = "sample_v2", beta_vega = 0.7 # Adjusted based on review ) # 4. Final analysis results_final <- pipelineCNA( count_mtx, sample = "sample_final", SUBCLONES = TRUE, plotTree = TRUE ) ``` ## Troubleshooting ### Common Issues | Issue | Solution | |-------|----------| | No normal cells detected | Add custom signatures or provide known normals | | Too many/few segments | Adjust `beta_vega` parameter | | Memory errors | Reduce `par_cores` or subsample data | | Missing genes | Verify gene name format (Symbol vs Ensembl) | ## Session Info ```{r session-info, eval=TRUE} sessionInfo() ```