---
title: "Visualization Guide"
author: "Zaoqu Liu"
date: "`r Sys.Date()`"
output:
  rmarkdown::html_vignette:
    toc: true
    toc_depth: 3
vignette: >
  %\VignetteIndexEntry{Visualization Guide}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.width = 7,
  fig.height = 5,
  warning = FALSE,
  message = FALSE,
  eval = FALSE
)
```

## Introduction

MOFSR provides visualization functions for multi-omics clustering analysis. Every plotting function uses ggplot2 when it is installed and falls back to base R graphics otherwise.

## Setup

```{r load-data}
library(MOFSR)
set.seed(42)

# Generate simulated data with 3 subtypes
n_samples <- 60
true_clusters <- rep(1:3, each = 20)

# Simulate one omics layer; returns a features x samples matrix
generate_omics <- function(n, p, clusters) {
  n_clusters <- length(unique(clusters))
  # One random center per cluster
  centers <- matrix(rnorm(n_clusters * p, sd = 2), n_clusters, p)
  # Each sample is its cluster center plus Gaussian noise
  data <- t(sapply(clusters, function(k) {
    centers[k, ] + rnorm(p, sd = 1)
  }))
  colnames(data) <- paste0("Feature_", seq_len(p))
  rownames(data) <- paste0("Sample_", seq_len(n))
  return(t(data))
}

data_list <- list(
  mRNA = generate_omics(n_samples, 500, true_clusters),
  miRNA = generate_omics(n_samples, 200, true_clusters),
  methylation = generate_omics(n_samples, 100, true_clusters)
)

# Run clustering
result_snf <- run_snf(data_list, n_clusters = 3)
```

---

## UMAP Visualization

MOFSR includes an internal UMAP implementation that requires no external dependencies.

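
This guide passes the list of omics matrices straight to `compute_umap()` and does not describe how the layers are combined internally. As a rough illustration of one common way to prepare several feature-by-sample matrices for a joint embedding, the sketch below z-scores each feature and stacks the layers into a single samples-by-features matrix. `stack_omics_sketch()` and the toy data are hypothetical and not part of MOFSR; the package may combine layers differently.

```{r stack-omics-sketch}
# Hypothetical helper (not a MOFSR function): stack omics layers for a joint embedding
stack_omics_sketch <- function(omics_list) {
  # Every layer is features x samples, so the sample IDs (columns) must match
  sample_ids <- colnames(omics_list[[1]])
  same_ids <- vapply(omics_list,
                     function(m) identical(colnames(m), sample_ids),
                     logical(1))
  stopifnot(all(same_ids))

  # Z-score each feature (row) so that no layer dominates through its scale
  scaled <- lapply(omics_list, function(m) t(scale(t(m))))

  # Stack features across layers, then transpose to samples x features
  t(do.call(rbind, scaled))
}

# Toy input: two layers measured on the same five samples
set.seed(1)
samples <- paste0("s", 1:5)
toy <- list(
  a = matrix(rnorm(20), 4, 5, dimnames = list(paste0("f", 1:4), samples)),
  b = matrix(rnorm(15), 3, 5, dimnames = list(paste0("g", 1:3), samples))
)
stacked <- stack_omics_sketch(toy)
dim(stacked)  # samples x total features
```

Features with zero variance would become `NaN` under this scaling, so a real pipeline would need to filter them first.
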
### Basic UMAP Plot

```{r umap-basic, fig.cap="UMAP visualization colored by cluster assignments"}
# Compute UMAP coordinates
umap_coords <- compute_umap(data_list, n_neighbors = 15, n_epochs = 100, seed = 42)

# Basic plot
plot_umap(umap_coords, result_snf, title = "Multi-Omics UMAP")
```

### Customized UMAP

```{r umap-custom, fig.cap="Customized UMAP with custom colors"}
# Custom color palette
my_colors <- c("#E64B35", "#4DBBD5", "#00A087")

plot_umap(umap_coords, result_snf,
          title = "SNF Clustering Results",
          point_size = 3,
          colors = my_colors,
          show_legend = TRUE)
```

### UMAP Parameters

| Parameter | Default | Description |
|:----------|:--------|:------------|
| `n_neighbors` | 15 | Number of neighbors for local structure |
| `min_dist` | 0.1 | Minimum distance between points |
| `n_epochs` | 200 | Number of optimization iterations |
| `seed` | NULL | Random seed for reproducibility |

```{r umap-params, fig.cap="Effect of different UMAP parameters"}
# Tight clustering (small min_dist)
umap_tight <- compute_umap(data_list, min_dist = 0.01, n_epochs = 100, seed = 42)
plot_umap(umap_tight, result_snf, title = "min_dist = 0.01 (Tight)")
```

---

## Consensus Matrix Heatmap

Consensus matrices show clustering stability across bootstrap resamples.

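
To make the idea concrete, here is a minimal base R sketch of how a consensus matrix can be built: cluster many resampled subsets and record, for each pair of samples, the fraction of runs in which both were drawn and landed in the same cluster. It assumes k-means and 80% subsampling without replacement purely for illustration; `consensus_cluster()` may resample and cluster differently, and the toy data and variable names are hypothetical.

```{r consensus-sketch}
set.seed(1)

# Toy data: 30 samples x 5 features, two well-separated groups (illustrative only)
x <- rbind(matrix(rnorm(75, mean = 0), 15, 5),
           matrix(rnorm(75, mean = 4), 15, 5))
n <- nrow(x)
k <- 2
reps <- 50

co_clustered <- matrix(0, n, n)  # times a pair landed in the same cluster
co_sampled   <- matrix(0, n, n)  # times a pair was drawn in the same resample

for (r in seq_len(reps)) {
  # Subsample 80% of the samples without replacement
  idx <- sample(n, size = floor(0.8 * n))
  cl <- kmeans(x[idx, ], centers = k, nstart = 5)$cluster

  # 1 where two drawn samples share a cluster label, 0 otherwise
  same <- outer(cl, cl, "==") * 1
  co_clustered[idx, idx] <- co_clustered[idx, idx] + same
  co_sampled[idx, idx]   <- co_sampled[idx, idx] + 1
}

# Consensus index: co-clustering frequency among runs where both were drawn
consensus <- co_clustered / pmax(co_sampled, 1)
```

Entries close to 0 or 1 mean a pair is consistently separated or consistently grouped; intermediate values mark unstable pairs.
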
### Generate Consensus Matrix

```{r consensus-matrix}
# Run consensus clustering
cc_result <- consensus_cluster(data_list$mRNA, maxK = 5, reps = 50, seed = 42)
```

### Basic Heatmap

```{r consensus-heatmap, fig.cap="Consensus matrix heatmap for K=3"}
plot_consensus_heatmap(cc_result[[3]]$consensusMatrix,
                       title = "Consensus Matrix (K=3)")
```

### Ordered by Clusters

```{r consensus-ordered, fig.cap="Consensus matrix ordered by cluster assignments"}
plot_consensus_heatmap(cc_result[[3]]$consensusMatrix,
                       clusters = cc_result[[3]]$consensusClass,
                       title = "Consensus Matrix (Ordered)")
```

### Custom Colors

```{r consensus-colors, fig.cap="Consensus matrix with custom color palette"}
# Purple-green palette
custom_colors <- grDevices::colorRampPalette(c("white", "#7570B3", "#1B9E77"))(100)

plot_consensus_heatmap(cc_result[[3]]$consensusMatrix,
                       colors = custom_colors,
                       title = "Custom Color Palette")
```

---

## Cluster Quality Metrics

### PAC (Proportion of Ambiguous Clustering)

Lower PAC indicates more stable clustering.

```{r pac-calc}
# Calculate PAC for each K
pac_values <- calc_pac(cc_result)
print(pac_values)
```

```{r pac-plot, fig.cap="PAC scores across different K values"}
plot_cluster_quality(pac_values, title = "PAC Scores (Lower is Better)")
```

### Combined Metrics

```{r chi-calc}
# Calculate CHI (Calinski-Harabasz Index)
# Higher CHI indicates better separation
chi_values <- sapply(2:5, function(k) {
  CalCHI(t(data_list$mRNA), cc_result[[k]]$consensusClass)
})
names(chi_values) <- 2:5
print(chi_values)
```

---

## Silhouette Analysis

Silhouette width measures how similar samples are to their own cluster compared to other clusters.

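
For a single sample, the silhouette width is `(b - a) / max(a, b)`, where `a` is the mean distance to the other members of its own cluster and `b` is the smallest mean distance to the members of any other cluster. The base R sketch below computes this directly from a distance matrix. `silhouette_sketch()` and the toy data are hypothetical and only illustrate the definition; `plot_silhouette()` may compute the values differently.

```{r silhouette-sketch}
# Hypothetical helper (not a MOFSR function): silhouette width per sample
silhouette_sketch <- function(clusters, d) {
  d <- as.matrix(d)
  labels <- unique(clusters)
  stopifnot(length(labels) >= 2, nrow(d) == length(clusters))

  vapply(seq_along(clusters), function(i) {
    own <- clusters == clusters[i]
    # Convention: a sample alone in its cluster gets width 0
    if (sum(own) == 1) return(0)

    # a: mean distance to the other members of the same cluster (d[i, i] is 0)
    a <- sum(d[i, own]) / (sum(own) - 1)

    # b: smallest mean distance to the members of any other cluster
    b <- min(vapply(setdiff(labels, clusters[i]),
                    function(l) mean(d[i, clusters == l]),
                    numeric(1)))

    (b - a) / max(a, b)
  }, numeric(1))
}

# Toy data: two well-separated groups of 10 samples each
set.seed(1)
toy <- rbind(matrix(rnorm(20, mean = 0), 10, 2),
             matrix(rnorm(20, mean = 10), 10, 2))
toy_clusters <- rep(1:2, each = 10)

sil <- silhouette_sketch(toy_clusters, dist(toy))
mean(sil)  # average silhouette width across all samples
```

Values lie between -1 and 1; a negative width means the sample is, on average, closer to another cluster than to its own.
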
```{r silhouette, fig.cap="Silhouette plot showing cluster quality"}
# Compute distance matrix
dist_mat <- dist(t(data_list$mRNA))

# Plot silhouette
plot_silhouette(result_snf$Cluster, dist_mat, title = "Silhouette Analysis")
```

### Interpretation

- **Silhouette > 0.7**: Strong cluster structure
- **Silhouette 0.5-0.7**: Reasonable structure
- **Silhouette 0.25-0.5**: Weak structure
- **Silhouette < 0.25**: No substantial structure

---

## Algorithm Comparison

Compare clustering results across multiple algorithms.

```{r algo-compare}
# Run multiple algorithms
algorithms <- c("SNF", "RGCCA", "CPCA")
results <- lapply(algorithms, function(alg) {
  run_integration(data_list, algorithm = alg, n_clusters = 3)
})
names(results) <- algorithms
```

```{r algo-heatmap, fig.cap="Algorithm agreement heatmap (ARI)"}
plot_algorithm_comparison(results, title = "Algorithm Agreement")
```

### ARI Matrix

```{r ari-matrix}
ari_matrix <- compare_clusterings(results)
print(round(ari_matrix, 3))
```

---

## Survival Analysis (Optional)

If you have survival data, MOFSR can generate Kaplan-Meier curves.

```{r survival-demo, eval=FALSE}
# Example with simulated survival data
time <- rexp(60, rate = 0.1)
event <- sample(0:1, 60, replace = TRUE, prob = c(0.3, 0.7))

# Plot survival curves
plot_survival(time, event, result_snf,
              title = "Survival by Cluster",
              conf_int = TRUE)
```

---

## Base R vs ggplot2

MOFSR automatically uses ggplot2 when available, falling back to base R graphics otherwise.

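
A fallback of this kind is usually built around `requireNamespace()`. The sketch below shows one way such a dual-backend plotting function could be structured; `scatter_sketch()` is a hypothetical helper written for this guide, not a MOFSR function, and the package's own dispatch logic may differ.

```{r backend-sketch}
# Hypothetical helper (not a MOFSR function): scatter plot with a backend fallback
scatter_sketch <- function(x, y, group, title = "") {
  group <- factor(group)

  if (requireNamespace("ggplot2", quietly = TRUE)) {
    # ggplot2 backend: build and return a plot object
    df <- data.frame(x = x, y = y, group = group)
    ggplot2::ggplot(df, ggplot2::aes(x = x, y = y, colour = group)) +
      ggplot2::geom_point() +
      ggplot2::ggtitle(title)
  } else {
    # Base R backend: draw on the current device as a side effect
    plot(x, y, col = as.integer(group), pch = 19, main = title)
    invisible(NULL)
  }
}

p <- scatter_sketch(1:6, c(2, 1, 3, 6, 5, 7), rep(1:2, each = 3),
                    title = "Backend sketch")
```

In this sketch the two branches behave differently in one respect: the ggplot2 branch returns an object that can be stored or passed to `ggsave()`, while the base R branch draws immediately and returns nothing, so it has to be wrapped in a graphics device such as `png()` to be saved.
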
### Check ggplot2 Availability

```{r check-ggplot}
has_ggplot <- requireNamespace("ggplot2", quietly = TRUE)
cat("ggplot2 available:", has_ggplot, "\n")
```

### Consistent API

All plot functions have the same API regardless of the backend:

```{r plot-api, eval=FALSE}
# These work identically with or without ggplot2
plot_umap(umap_coords, clusters)
plot_consensus_heatmap(matrix)
plot_silhouette(clusters, dist_matrix)
plot_cluster_quality(pac_values)
```

---

## Saving Plots

### With ggplot2

```{r save-ggplot, eval=FALSE}
library(ggplot2)

# Create plot
p <- plot_umap(umap_coords, result_snf, title = "My Plot")

# Save to file
ggsave("umap_plot.png", p, width = 8, height = 6, dpi = 300)
ggsave("umap_plot.pdf", p, width = 8, height = 6)
```

### With Base R

```{r save-base, eval=FALSE}
# PNG
png("umap_plot.png", width = 800, height = 600, res = 150)
plot_umap(umap_coords, result_snf)
dev.off()

# PDF
pdf("umap_plot.pdf", width = 8, height = 6)
plot_umap(umap_coords, result_snf)
dev.off()
```

---

## Color Palettes

### Default Palette

```{r default-colors}
default_colors <- c("#E41A1C", "#377EB8", "#4DAF4A", "#984EA3", "#FF7F00",
                    "#FFFF33", "#A65628", "#F781BF", "#999999", "#66C2A5")

cat("Default 10-color palette:\n")
print(default_colors)
```

### Nature-Style Palettes

```{r nature-colors}
# Nature Publishing Group colors
npg_colors <- c("#E64B35", "#4DBBD5", "#00A087", "#3C5488", "#F39B7F",
                "#8491B4", "#91D1C2", "#DC0000", "#7E6148", "#B09C85")

# Lancet colors
lancet_colors <- c("#00468B", "#ED0000", "#42B540", "#0099B4", "#925E9F",
                   "#FDAF91", "#AD002A", "#ADB6B6", "#1B1919")
```

---

## Summary

MOFSR provides a complete visualization toolkit for multi-omics analysis:

| Function | Purpose |
|:---------|:--------|
| `plot_umap()` | Dimensionality reduction visualization |
| `plot_consensus_heatmap()` | Clustering stability |
| `plot_cluster_quality()` | Optimal K selection |
| `plot_silhouette()` | Cluster validation |
| `plot_algorithm_comparison()` | Method comparison |
| `plot_survival()` | Clinical outcome analysis |

## Session Info

```{r session}
sessionInfo()
```