---
title: "Network Analysis and Visualization with OmnipathR"
author:
  - name: Zaoqu Liu
    email: liuzaoqu@163.com
    affiliation: Department of Interventional Radiology, The First Affiliated Hospital of Zhengzhou University
  - name: Dénes Türei
    email: turei.denes@gmail.com
  - name: Julio Saez-Rodriguez
    affiliation: Institute for Computational Biomedicine, Heidelberg University
package: OmnipathR
output:
  bookdown::html_document2:
    base_format: rmarkdown::html_vignette
    toc: true
    toc_depth: 3
    number_sections: true
    fig_caption: true
    fig_width: 8
    fig_height: 6
pkgdown:
  as_is: true
vignette: |
  %\VignetteIndexEntry{Network Analysis and Visualization with OmnipathR}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(
  echo = TRUE,
  message = FALSE,
  warning = FALSE,
  fig.align = "center",
  collapse = TRUE,
  comment = "#>"
)
```

# Introduction

Biological networks are fundamental representations of molecular interactions that underlie cellular processes. **OmnipathR** provides comprehensive tools for constructing, analyzing, and visualizing molecular interaction networks from the OmniPath database and other integrated resources.
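The core idea is to turn a table of pairwise interactions into a graph. The molecules become vertices and the interactions become edges. This can be previewed offline with **igraph** alone. The sketch below uses a small hypothetical edge list (`toy_edges`, with illustrative gene pairs rather than OmniPath records) and igraph's `graph_from_data_frame()`. It is a simplified stand-in for the OmniPath-backed workflow, not a description of how OmnipathR builds its graphs internally.

```{r toy-graph}
library(igraph)

# Hypothetical toy edge list: each row is a directed interaction
# (source -> target). Illustrative only, not OmniPath data.
toy_edges <- data.frame(
  source = c("EGFR", "EGFR", "GRB2", "SOS1", "HRAS", "EGFR"),
  target = c("GRB2", "PIK3CA", "SOS1", "HRAS", "RAF1", "HRAS")
)

# The first two columns are read as edge endpoints; vertex names
# are taken from the unique gene symbols in those columns
toy_graph <- graph_from_data_frame(toy_edges, directed = TRUE)

# Degree with mode = "all" counts incoming plus outgoing edges
toy_degree <- degree(toy_graph, mode = "all")
print(sort(toy_degree, decreasing = TRUE))
```

EGFR and HRAS each touch three edges, so they come out as the hubs of this toy graph. The same degree computation is applied to the real OmniPath network later in this vignette.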
This vignette demonstrates network analysis workflows including:

- Network construction from multiple OmniPath resources
- Topological analysis and centrality measures
- Path finding, subnetwork extraction, and functional annotation
- Network visualization with igraph and ggplot2

# Theoretical Background

## Graph Theory in Biology

Biological networks can be represented as graphs $G = (V, E)$, where:

- **Vertices ($V$)** represent biological entities (proteins, genes, metabolites)
- **Edges ($E$)** represent interactions or relationships between entities

Key network metrics include:

| Metric | Formula | Biological Interpretation |
|--------|---------|---------------------------|
| Degree | $k_i = \sum_j A_{ij}$ | Number of interaction partners |
| Betweenness | $B_i = \sum_{s \neq i \neq t} \frac{\sigma_{st}(i)}{\sigma_{st}}$ | Share of shortest paths routed through a node (potential bottleneck) |
| Clustering coefficient | $C_i = \frac{2e_i}{k_i(k_i-1)}$ | Local density: how interconnected a node's neighbors are |

These symbols are defined as follows:

- $A$ is the adjacency matrix ($A_{ij} = 1$ if $i$ and $j$ interact, 0 otherwise).
- $\sigma_{st}$ is the number of shortest paths from $s$ to $t$.
- $\sigma_{st}(i)$ is the number of those paths that pass through $i$.
- $e_i$ is the number of edges among the $k_i$ neighbors of node $i$.

## Signaling Network Properties

Molecular interaction networks, including those assembled from OmniPath, typically show:

1. **Scale-free topology**: the degree distribution approximately follows a power law, $P(k) \sim k^{-\gamma}$. A few hubs have many partners while most nodes have few.
2. **Small-world property**: short average path lengths combined with high clustering
3. **Modularity**: densely connected modules that often correspond to biological pathways

# Setup and Data Loading

The analyses below use **OmnipathR** to query OmniPath [1, 2] and **igraph** [3] for graph computations. **dplyr** and **ggplot2** handle data manipulation and plotting.

```{r load-packages}
library(OmnipathR)
library(igraph)
library(dplyr)
library(ggplot2)
```

## Retrieve Interaction Data

```{r get-interactions}
# Retrieve human interactions from two curated signaling resources
interactions <- omnipath(
  resources = c("SIGNOR", "SignaLink3"),
  organism = 9606  # NCBI taxonomy ID for human
)

# Keep directed interactions that carry a sign (stimulation or inhibition)
interactions_filtered <- interactions %>%
  filter(
    is_directed == 1,
    is_stimulation == 1 | is_inhibition == 1
  )

cat("Total interactions:", nrow(interactions), "\n")
cat("Filtered interactions:", nrow(interactions_filtered), "\n")
```

# Network Construction

## Building the Interaction Graph

```{r build-graph}
# Convert the interaction table to an igraph object
network <- interaction_graph(interactions_filtered)

# Network summary
cat("Nodes:", vcount(network), "\n")
cat("Edges:", ecount(network), "\n")
cat("Density:", round(edge_density(network), 4), "\n")
```

## Extract Giant Component

Path-based measures such as betweenness and path enumeration only make sense within a connected part of the network. We therefore focus on the largest connected component (the giant component):

```{r giant-component}
# Extract the giant component
gc <- giant_component(network)

cat("Giant component nodes:", vcount(gc), "\n")
cat("Giant component edges:", ecount(gc), "\n")
cat("Proportion of original nodes:", round(vcount(gc) / vcount(network) * 100, 1), "%\n")
```

# Topological Analysis

## Centrality Measures

```{r centrality}
# Total degree (in + out) and normalized betweenness for every node
V(gc)$degree <- degree(gc, mode = "all")
V(gc)$betweenness <- betweenness(gc, normalized = TRUE)

# Collect centrality values in a data frame, ranked by degree
centrality_df <- data.frame(
  gene = V(gc)$name,
  degree = V(gc)$degree,
  betweenness = V(gc)$betweenness
) %>%
  arrange(desc(degree))

# Top 15 hub genes
head(centrality_df, 15)
```

## Degree Distribution Analysis

```{r degree-dist, fig.cap="Degree distribution of the OmniPath signaling network on a log-log scale. An approximately linear trend is consistent with a power-law (scale-free) degree distribution; the dashed line is a least-squares fit."}
# Empirical degree distribution P(k): fraction of nodes with degree k
degree_dist <- data.frame(degree = degree(gc)) %>%
  count(degree) %>%
  mutate(
    p_k = n / sum(n),
    log_degree = log10(degree),
    log_freq = log10(p_k)
  )

# Plot the distribution on log-log axes with a linear fit
ggplot(degree_dist, aes(x = log_degree, y = log_freq)) +
  geom_point(color = "#007B7F", size = 2, alpha = 0.7) +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE,
              color = "#E74C3C", linetype = "dashed") +
  labs(
    title = "Degree Distribution (Log-Log Scale)",
    subtitle = "A power law appears as a straight line",
    x = expression(log[10](k)),
    y = expression(log[10](P(k)))
  ) +
  theme_minimal() +
  theme(
    plot.title = element_text(face = "bold", size = 14),
    plot.subtitle = element_text(color = "gray50")
  )
```

## Hub Gene Visualization

```{r hub-viz, fig.cap="Top 20 hub genes ranked by degree centrality in the OmniPath network."}
# Top 20 hub genes by degree
top_hubs <- centrality_df %>%
  head(20)

ggplot(top_hubs, aes(x = reorder(gene, degree), y = degree)) +
  geom_col(fill = "#007B7F", alpha = 0.8) +
  geom_text(aes(label = degree), hjust = -0.2, size = 3) +
  coord_flip() +
  labs(
    title = "Top 20 Hub Genes",
    subtitle = "Ranked by degree centrality",
    x = "Gene",
    y = "Degree"
  ) +
  theme_minimal() +
  theme(
    plot.title = element_text(face = "bold"),
    axis.text.y = element_text(size = 9)
  ) +
  expand_limits(y = max(top_hubs$degree) * 1.1)
```

# Path and Subnetwork Analysis

## Path Finding

`find_all_paths()` enumerates every path between the start and end nodes up to the length limit set by `maxlen`, not only the shortest ones.

```{r find-paths}
# Enumerate paths from EGFR to AKT1, identifying nodes by their "name" attribute
paths <- find_all_paths(
  graph = gc,
  start = "EGFR",
  end = "AKT1",
  attr = "name",
  maxlen = 3
)

cat("Found", length(paths), "paths from EGFR to AKT1\n")

# Display up to five example paths
if (length(paths) > 0) {
  cat("\nExample paths:\n")
  for (i in seq_len(min(5, length(paths)))) {
    cat("Path", i, ":", paste(paths[[i]], collapse = " -> "), "\n")
  }
}
```

## Subnetwork Extraction

```{r subnetwork, fig.cap="Subnetwork around key cancer-associated signaling proteins (red) and their first-order neighbors (blue)."}
# Key signaling proteins, restricted to those present in the giant component
key_genes <- c("TP53", "EGFR", "AKT1", "MTOR", "MAPK1", "SRC")
key_genes <- key_genes[key_genes %in% V(gc)$name]

# Each key gene plus its first-order neighbors, regardless of edge direction
subnet_nodes <- unique(unlist(lapply(key_genes, function(g) {
  c(g, names(neighbors(gc, g, mode = "all")))
})))

if (length(subnet_nodes) > 0) {
  subnet <- induced_subgraph(gc, subnet_nodes)

  cat("Subnetwork nodes:", vcount(subnet), "\n")
  cat("Subnetwork edges:", ecount(subnet), "\n")

  # Highlight and enlarge the key genes
  V(subnet)$color <- ifelse(V(subnet)$name %in% key_genes, "#E74C3C", "#3498DB")
  V(subnet)$size <- ifelse(V(subnet)$name %in% key_genes, 12, 6)

  # Fix the seed so the force-directed layout is reproducible
  set.seed(42)
  plot(
    subnet,
    vertex.label = ifelse(V(subnet)$name %in% key_genes, V(subnet)$name, ""),
    vertex.label.cex = 0.8,
    vertex.label.color = "black",
    edge.arrow.size = 0.3,
    edge.color = "gray70",
    layout = layout_with_fr(subnet),
    main = "Cancer Signaling Subnetwork"
  )
}
```

# Integration with Annotations

## Functional Annotation of Hub Genes

```{r annotations}
# Top 10 hub genes by degree
hub_genes <- centrality_df %>%
  head(10) %>%
  pull(gene)

# Retrieve subcellular localization annotations for the hubs
hub_annotations <- annotations(
  proteins = hub_genes,
  resources = "HPA_subcellular"
)

if (nrow(hub_annotations) > 0) {
  # Collapse all values per gene and annotation label into one string
  annotation_summary <- hub_annotations %>%
    group_by(genesymbol, label) %>%
    summarise(value = paste(unique(value), collapse = "; "), .groups = "drop") %>%
    head(20)

  print(annotation_summary)
}
```

# Session Information

```{r session-info}
sessionInfo()
```

# References

1. Türei D, Korcsmáros T, Saez-Rodriguez J. OmniPath: guidelines and gateway for literature-curated signaling pathway resources. *Nature Methods* 2016;13:966-967.
2. Türei D, et al. Integrated intra- and intercellular signaling knowledge for multicellular omics analysis. *Molecular Systems Biology* 2021;17:e9923.
3. Csardi G, Nepusz T. The igraph software package for complex network research. *InterJournal Complex Systems* 2006;1695.