---
title: "Multi-Sample Comparison Analysis"
author: "Zaoqu Liu"
date: "`r Sys.Date()`"
output: 
  rmarkdown::html_vignette:
    toc: true
    toc_depth: 3
    fig_caption: true
vignette: >
  %\VignetteIndexEntry{Multi-Sample Comparison Analysis}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(
  echo = TRUE,
  message = FALSE,
  warning = FALSE,
  collapse = TRUE,
  comment = "#>",
  fig.width = 10,
  fig.height = 6,
  out.width = "100%",
  eval = FALSE
)
```

## Introduction

SCEVAN enables comparative analysis of copy number alterations across multiple samples. This is particularly useful for:

- **Longitudinal studies**: comparing a primary tumor with its metastasis
- **Treatment response**: comparing pre- and post-treatment samples
- **Patient cohorts**: identifying shared versus patient-specific alterations

## Setup

```{r load-packages}
library(SCEVAN)
library(ggplot2)
library(dplyr)
```

## Preparing Multi-Sample Data

### Data Structure

Multi-sample analysis requires a **named list** of count matrices:

```{r data-structure}
# Example structure
listCountMtx <- list(
  "Sample1" = count_mtx_1,
  "Sample2" = count_mtx_2,
  "Sample3" = count_mtx_3
)
```
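
Before a long run, it can help to confirm that the list is well formed. The chunk below is a minimal sketch, not part of SCEVAN: `check_count_list()` and the toy matrices are hypothetical, and the checks assume that genes are stored as row names and cell barcodes as column names.

```{r validate-input}
# Hypothetical helper (not a SCEVAN function): basic sanity checks on the input list
check_count_list <- function(count_list) {
  sample_names <- names(count_list)
  if (!is.list(count_list) || is.null(sample_names) ||
      any(!nzchar(sample_names)) || anyDuplicated(sample_names) > 0) {
    stop("Input must be a list with unique, non-empty sample names.")
  }
  for (s in sample_names) {
    mtx <- count_list[[s]]
    if (is.null(rownames(mtx)) || is.null(colnames(mtx))) {
      stop("Sample '", s, "' needs gene row names and cell column names.")
    }
  }
  # One row per sample with its dimensions
  data.frame(
    sample = sample_names,
    n_genes = vapply(count_list, nrow, integer(1)),
    n_cells = vapply(count_list, ncol, integer(1)),
    row.names = NULL
  )
}

# Toy matrices standing in for real count data
set.seed(1)
toy_counts <- list(
  Sample1 = matrix(rpois(12, 5), nrow = 3,
                   dimnames = list(paste0("gene", 1:3), paste0("s1_cell", 1:4))),
  Sample2 = matrix(rpois(6, 5), nrow = 3,
                   dimnames = list(paste0("gene", 1:3), paste0("s2_cell", 1:2)))
)

input_summary <- check_count_list(toy_counts)
input_summary
```

With real data, the same call would be `check_count_list(listCountMtx)`.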

### Download Example Data

```{r load-example}
# Load glioblastoma multi-sample data
load(url("https://www.dropbox.com/s/esqvnltucdqajg1/listCountMtx.RData?raw=1"))

# Sample names in the list
names(listCountMtx)

# Number of cells (columns) per sample
sapply(listCountMtx, ncol)
```

## Running Multi-Sample Analysis

### Basic Comparison

```{r run-multi}
multiSampleComparisonClonalCN(
  listCountMtx,
  analysisName = "GBM_comparison",
  organism = "human",
  par_cores = 4
)
```

### With Known Normal Cells

```{r with-normals}
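# Optionally provide known normal cells per sample.
# Names are assumed to match names(listCountMtx), in the same order.
listNormCells <- list(
  "MGH102" = c("cell_a", "cell_b"),
  "MGH104" = c("cell_c", "cell_d"),
  "MGH105" = NULL,  # No known normal cells: auto-detect
  "MGH106" = NULL   # No known normal cells: auto-detect
)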

multiSampleComparisonClonalCN(
  listCountMtx,
  listNormCells = listNormCells,
  analysisName = "GBM_with_normals",
  par_cores = 4
)
```
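
A mistyped barcode in the normal-cell list is easy to miss. The following sketch reports barcodes that are listed as normal but absent from the corresponding count matrix. `find_missing_norm_cells()` and the toy objects are hypothetical and only illustrate the check; they are not part of SCEVAN.

```{r check-normals}
# Hypothetical helper (not a SCEVAN function): barcodes listed as normal
# but not found among the columns of the matching count matrix
find_missing_norm_cells <- function(count_list, norm_list) {
  stopifnot(identical(names(count_list), names(norm_list)))
  missing <- lapply(names(count_list), function(s) {
    setdiff(norm_list[[s]], colnames(count_list[[s]]))
  })
  names(missing) <- names(count_list)
  missing
}

# Toy data: two samples, one with a deliberately wrong barcode
toy_counts <- list(
  MGH102 = matrix(0, nrow = 2, ncol = 3,
                  dimnames = list(c("g1", "g2"), c("cell_a", "cell_b", "cell_x"))),
  MGH105 = matrix(0, nrow = 2, ncol = 2,
                  dimnames = list(c("g1", "g2"), c("cell_e", "cell_f")))
)
toy_norm <- list(
  MGH102 = c("cell_a", "cell_z"),  # "cell_z" is not in the matrix
  MGH105 = NULL                    # no known normal cells
)

missing_cells <- find_missing_norm_cells(toy_counts, toy_norm)
lengths(missing_cells)  # number of unmatched barcodes per sample
```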

## Output Interpretation

### Generated Files

```{r list-files}
list.files("./output", pattern = "GBM_comparison")
```

| File | Description |
|------|-------------|
| `*_allOncoHeat.png` | Combined OncoPrint across samples |
| `*_comparison.png` | Side-by-side CN profiles |
| `*_CloneTree.png` | Cross-sample phylogeny |

### Comparative Heatmap

Two of the generated files combine all samples in a single figure:

- `*_allOncoHeat.png` - Combined OncoPrint showing shared and sample-specific alterations
- `*_CloneTree.png` - Cross-sample phylogenetic tree

## Visualization Functions

### Plot All Clonal Profiles

```{r plot-clonal}
# Generate combined clonal CN plot
plotAllClonalCN(
  sampleNames = names(listCountMtx),
  pathOutput = "./output"
)
```

### Plot Subclonal Profiles

```{r plot-subclonal}
# Generate combined subclonal CN plot
plotAllSubclonalCN(
  sampleNames = names(listCountMtx),
  pathOutput = "./output"
)
```

## Advanced Analysis

### Identifying Shared Alterations

```{r shared-alterations}
# Load the clonal segmentation file of each sample (NULL if the file is missing)
results_list <- lapply(names(listCountMtx), function(s) {
  seg_file <- file.path("./output", paste0(s, "_Clonal_CN.seg"))
  if (file.exists(seg_file)) {
    read.table(seg_file, header = TRUE, sep = "\t")
  } else {
    NULL
  }
})
names(results_list) <- names(listCountMtx)

# First step towards shared alterations: keep the altered segments
# (CN != 2) of each sample
get_altered_regions <- function(results_list) {
  lapply(results_list, function(df) {
    if (!is.null(df)) {
      df[df$CN != 2, c("Chr", "Pos", "End", "CN")]
    }
  })
}

# Per-sample altered segments. Intersecting them across samples is not done
# here; in practice, use GenomicRanges for proper overlap detection
altered_by_sample <- get_altered_regions(results_list)
```
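
For two samples, the overlap step can also be sketched in base R. The example below assumes segment tables with the columns `Chr`, `Pos`, `End`, and `CN` used above and treats `CN == 2` as neutral. `overlap_segments()` and the toy segments are hypothetical; this is a simple pairwise interval intersection, not a replacement for GenomicRanges.

```{r overlap-sketch}
# Hypothetical helper: altered segments that overlap between two samples
# on the same chromosome and in the same direction (gain or loss)
overlap_segments <- function(seg_a, seg_b) {
  seg_a <- seg_a[seg_a$CN != 2, ]
  seg_b <- seg_b[seg_b$CN != 2, ]

  # All pairs of altered segments located on the same chromosome
  pairs <- merge(seg_a, seg_b, by = "Chr", suffixes = c("_a", "_b"))

  same_direction <- sign(pairs$CN_a - 2) == sign(pairs$CN_b - 2)
  overlaps <- pairs$Pos_a <= pairs$End_b & pairs$Pos_b <= pairs$End_a
  shared <- pairs[same_direction & overlaps, ]

  # Report the intersected interval of each overlapping pair
  data.frame(
    Chr = shared$Chr,
    Pos = pmax(shared$Pos_a, shared$Pos_b),
    End = pmin(shared$End_a, shared$End_b),
    type = ifelse(shared$CN_a > 2, "gain", "loss")
  )
}

# Toy segments (coordinates are made up for illustration)
seg_a <- data.frame(Chr = c(7, 10, 1), Pos = c(1, 1, 100),
                    End = c(1000, 500, 200), CN = c(3, 1, 2))
seg_b <- data.frame(Chr = c(7, 10, 9), Pos = c(400, 600, 1),
                    End = c(1500, 900, 300), CN = c(4, 1, 1))

shared_segments <- overlap_segments(seg_a, seg_b)
shared_segments
```

In this toy example only the chromosome 7 gain overlaps; the two chromosome 10 losses do not intersect and are therefore not reported as shared.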

### Custom Comparison Plots

```{r custom-plots}
# Create custom comparison visualization
create_cn_comparison <- function(sample_names, output_path) {

  cn_data <- lapply(sample_names, function(s) {
    # load() restores CNA_mtx_relat into this function's local environment
    load(file.path(output_path, paste0(s, "_CNAmtx.RData")))
    data.frame(
      sample = s,
      # Mean absolute signal per cell, so that gains and losses do not cancel
      mean_cn = colMeans(abs(CNA_mtx_relat))
    )
  })

  cn_df <- do.call(rbind, cn_data)

  ggplot(cn_df, aes(x = sample, y = mean_cn, fill = sample)) +
    geom_boxplot() +
    theme_minimal() +
    labs(
      title = "Global CNA Burden Comparison",
      x = "Sample",
      y = "Mean Absolute CNA Score per Cell"
    ) +
    theme(legend.position = "none")
}

# Generate plot
create_cn_comparison(names(listCountMtx), "./output")
```

## Case Study: Head & Neck Cancer

### Primary vs Lymph Node Metastasis

```{r hnscc-example}
# Load HNSCC data
load(url("https://www.dropbox.com/s/6zns12amobs39g8/HNSCC26_data.RData?raw=1"))

# Examine samples
names(listCountMtx)  # Should show "Primary" and "LN"

# Run comparison
multiSampleComparisonClonalCN(
  listCountMtx,
  analysisName = "HNSCC26_comparison",
  organism = "human",
  par_cores = 4,
  plotTree = TRUE
)
```

### Interpreting Primary vs Metastasis

Key questions to address:

1. **Clonal evolution**: Which alterations are shared by both samples (truncal events)?
2. **Metastasis-specific**: Which alterations appear only in the lymph node?
3. **Lost alterations**: Which alterations are present in the primary tumor but absent from the metastasis?
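
Once each sample has been reduced to a set of event labels, the three questions above map onto set operations. The sketch below uses hypothetical labels of the form `<chromosome arm>_<gain|loss>`; both the labels and the variable names are illustrative and are not taken from the HNSCC data.

```{r classify-events}
# Hypothetical event labels per sample
primary_events <- c("7p_gain", "9p_loss", "3q_gain")
ln_events <- c("7p_gain", "3q_gain", "11q_gain")

event_classes <- list(
  shared = intersect(primary_events, ln_events),            # question 1
  metastasis_specific = setdiff(ln_events, primary_events), # question 2
  lost_in_metastasis = setdiff(primary_events, ln_events)   # question 3
)
event_classes
```

Exact label matching is a simplification: two samples can carry the same event with slightly different breakpoints, which is why interval overlap is usually the safer comparison.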

## Statistical Considerations

### Sample Size Requirements

| Analysis Type | Minimum Cells per Sample | Recommended Cells per Sample |
|---------------|--------------------------|------------------------------|
| Cell classification | 100 | 500+ |
| Subclone detection | 200 | 1000+ |
| Cross-sample comparison | 300 | 1000+ |
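
The minimum values in the table can be checked against the number of cells in each sample. This is a minimal sketch with hypothetical cell counts; with real data, `cells_per_sample` could be obtained from `sapply(listCountMtx, ncol)`.

```{r check-cell-counts}
# Minimum cells per sample, copied from the table above
min_cells <- c(classification = 100, subclones = 200, comparison = 300)

# Hypothetical cell counts per sample
cells_per_sample <- c(SampleA = 1200, SampleB = 250, SampleC = 80)

# TRUE where a sample reaches the minimum for an analysis type
adequacy <- outer(cells_per_sample, min_cells, ">=")
adequacy
```

Rows are samples and columns are analysis types, so a row containing `FALSE` points to a sample that is too small for at least one analysis.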

### Batch Effect Considerations

When comparing samples:

- Process all samples with the same protocol where possible
- Consider batch correction methods when protocols differ
- Verify that normal cells are detected consistently across samples
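
One simple way to look at the consistency of normal cell detection is to compare the fraction of cells called normal in each sample. The sketch below works on a hypothetical table of per-cell calls; the `sample` and `class` columns and the `"normal"` and `"tumor"` labels are assumptions for illustration and should be adapted to the actual classification output.

```{r normal-fraction}
# Hypothetical per-cell calls; column names and labels are assumptions
calls <- data.frame(
  sample = rep(c("SampleA", "SampleB"), times = c(5, 4)),
  class = c("normal", "tumor", "tumor", "tumor", "normal",
            "tumor", "tumor", "tumor", "tumor")
)

# Fraction of cells labelled normal in each sample
normal_fraction <- tapply(calls$class == "normal", calls$sample, mean)
normal_fraction
```

A sample with a fraction far from the others, or with no normal cells at all as in `SampleB` here, is worth inspecting before its profile is compared with the rest.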

## Best Practices

### Workflow Checklist

1. [ ] Verify consistent gene annotation across samples
2. [ ] Check cell quality metrics per sample
3. [ ] Run individual analyses first
4. [ ] Review normal cell detection
5. [ ] Run multi-sample comparison
6. [ ] Validate shared alterations
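
The first item of the checklist can be approximated by comparing gene identifiers across samples. The sketch below assumes that genes are stored as row names of each count matrix; `shared_gene_report()` and the toy matrices are hypothetical and not part of SCEVAN.

```{r gene-consistency}
# Hypothetical helper: share of each sample's genes found in every sample
shared_gene_report <- function(count_list) {
  gene_sets <- lapply(count_list, rownames)
  common <- Reduce(intersect, gene_sets)
  data.frame(
    sample = names(count_list),
    n_genes = lengths(gene_sets),
    frac_shared = length(common) / lengths(gene_sets),
    row.names = NULL
  )
}

# Toy matrices with partially different gene sets
toy_counts <- list(
  Sample1 = matrix(0, nrow = 4, ncol = 2,
                   dimnames = list(c("EGFR", "PTEN", "TP53", "MYC"), c("c1", "c2"))),
  Sample2 = matrix(0, nrow = 3, ncol = 2,
                   dimnames = list(c("EGFR", "PTEN", "TP53"), c("c3", "c4")))
)

gene_report <- shared_gene_report(toy_counts)
gene_report
```

A low `frac_shared` value suggests that the samples use different gene identifiers or annotation versions.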

### Common Pitfalls

| Issue | Cause | Solution |
|-------|-------|----------|
| No shared alterations | Different tumor types | Verify sample identity |
| All alterations shared | Contamination | Check for cross-sample mixing |
| Inconsistent segmentation | Very different cell counts between samples | Compare samples with similar cell numbers, for example after downsampling |

## Session Info

```{r session-info, eval=TRUE}
sessionInfo()
```