---
title: "Network Analysis and Visualization with OmnipathR"
author:
- name: Zaoqu Liu
  email: liuzaoqu@163.com
  affiliation: Department of Interventional Radiology, The First Affiliated Hospital of Zhengzhou University
- name: Dénes Türei
  email: turei.denes@gmail.com
- name: Julio Saez-Rodriguez
  affiliation: Institute for Computational Biomedicine, Heidelberg University
package: OmnipathR
output:
  bookdown::html_document2:
    base_format: rmarkdown::html_vignette
    toc: true
    toc_depth: 3
    number_sections: true
    fig_caption: true
    fig_width: 8
    fig_height: 6
pkgdown:
  as_is: true
vignette: |
  %\VignetteIndexEntry{Network Analysis and Visualization with OmnipathR}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(
    echo = TRUE,
    message = FALSE,
    warning = FALSE,
    fig.align = "center",
    collapse = TRUE,
    comment = "#>"
)
```

# Introduction

Biological networks are fundamental representations of molecular interactions that underlie cellular processes. **OmnipathR** provides comprehensive tools for constructing, analyzing, and visualizing molecular interaction networks from the OmniPath database and other integrated resources.

This vignette demonstrates network analysis workflows including:

- Network construction from multiple data sources
- Topological analysis and centrality measures
- Path finding and subnetwork extraction
- Functional annotation of hub genes
- Network visualization

# Theoretical Background

## Graph Theory in Biology

Biological networks can be represented as graphs $G = (V, E)$ where:

- **Vertices (V)**: Represent biological entities (proteins, genes, metabolites)
- **Edges (E)**: Represent interactions or relationships between entities

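Key network metrics include the following, where $A$ is the adjacency matrix, $\sigma_{st}$ is the number of shortest paths from $s$ to $t$, $\sigma_{st}(i)$ is the number of those paths that pass through node $i$, and $e_i$ is the number of edges among the neighbors of node $i$: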

| Metric | Formula | Biological Interpretation |
|--------|---------|---------------------------|
| Degree | $k_i = \sum_j A_{ij}$ | Number of interaction partners |
| Betweenness | $B_i = \sum_{s \neq i \neq t} \frac{\sigma_{st}(i)}{\sigma_{st}}$ | Information flow through a node |
| Clustering | $C_i = \frac{2e_i}{k_i(k_i-1)}$ | Local network density |
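
As a minimal sketch of how these formulas map onto igraph calls, the chunk below evaluates all three metrics on a four-node toy graph (a triangle A, B, C with a pendant node D attached to C). The graph and the `toy_*` names are illustrative assumptions and are not taken from OmniPath.

```{r toy-metrics}
library(igraph)

# Hypothetical toy graph: triangle A-B-C plus pendant node D attached to C
toy_g <- graph_from_literal(A - B, B - C, A - C, C - D)

# Degree k_i as the row sum of the adjacency matrix, checked against degree()
toy_adj <- as_adjacency_matrix(toy_g, sparse = FALSE)
toy_k <- rowSums(toy_adj)
stopifnot(all(toy_k == degree(toy_g)))

# Unnormalized betweenness: shortest paths that pass through each node
toy_b <- betweenness(toy_g, normalized = FALSE)

# Local clustering coefficient 2 * e_i / (k_i * (k_i - 1));
# it is undefined (NaN) for nodes with fewer than two neighbors, such as D
toy_c <- transitivity(toy_g, type = "local")

data.frame(
    node = V(toy_g)$name,
    degree = toy_k,
    betweenness = toy_b,
    clustering = toy_c
)
```

In this toy graph, node C has the highest degree and is the only node with non-zero betweenness, because every shortest path to D passes through it.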

## Signaling Network Properties

Large signaling networks such as those built from OmniPath are often described by three properties:

1. **Scale-free topology**: The degree distribution approximately follows a power law $P(k) \sim k^{-\gamma}$
2. **Small-world property**: Short average path length combined with high clustering
3. **Modularity**: Densely connected modules that often correspond to biological pathways
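
The chunk below is a rough sketch of how the path length, clustering, and modularity properties could be quantified with igraph. It uses a simulated preferential-attachment graph as an assumed stand-in for a real signaling network and compares it with a random graph of the same size. All object names are hypothetical, and the numbers it prints say nothing about OmniPath itself.

```{r toy-small-world}
library(igraph)

set.seed(1)  # make the simulated graphs reproducible

# Assumed stand-in for a signaling network: preferential-attachment graph
toy_pa <- sample_pa(500, m = 2, directed = FALSE)

# Random graph with the same number of nodes and edges, used as a baseline
toy_rand <- sample_gnm(vcount(toy_pa), ecount(toy_pa))

# Small-world indicators: average shortest path length and global clustering
toy_props <- data.frame(
    graph = c("preferential attachment", "random baseline"),
    mean_path_length = c(mean_distance(toy_pa), mean_distance(toy_rand)),
    global_clustering = c(
        transitivity(toy_pa, type = "global"),
        transitivity(toy_rand, type = "global")
    )
)
toy_props

# Modularity of a Louvain partition (Louvain requires an undirected graph)
toy_modularity <- modularity(cluster_louvain(toy_pa))
toy_modularity
```

The same calls can be applied to a real network object, provided it is first converted to an undirected graph where the method requires one.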

# Setup and Data Loading

```{r load-packages}
library(OmnipathR)
library(igraph)
library(dplyr)
library(ggplot2)
```

## Retrieve Interaction Data

```{r get-interactions}
# Retrieve interactions from two literature-curated resources
interactions <- omnipath(
    resources = c("SIGNOR", "SignaLink3"),
    organism = 9606  # Human
)

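# Keep directed interactions that also carry a consensus sign
# (stimulation or inhibition)
interactions_filtered <- interactions %>%
    filter(
        is_directed == 1,
        !is.na(consensus_direction),
        consensus_stimulation == 1 | consensus_inhibition == 1
    )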

cat("Total interactions:", nrow(interactions), "\n")
cat("Filtered interactions:", nrow(interactions_filtered), "\n")
```

# Network Construction

## Building the Interaction Graph

```{r build-graph}
# Convert to igraph object
network <- interaction_graph(interactions_filtered)

# Network summary
cat("Nodes:", vcount(network), "\n")
cat("Edges:", ecount(network), "\n")
cat("Density:", round(edge_density(network), 4), "\n")
```
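
To make the density figure concrete, the sketch below builds a small directed graph from a hand-written edge table with `igraph::graph_from_data_frame()` and recomputes the density by hand. The edge table is illustrative toy data whose column names merely mimic an interaction table, and the chunk is not a description of how `interaction_graph()` works internally.

```{r toy-build-graph}
library(igraph)

# Hypothetical edge table (toy data, not retrieved from OmniPath)
toy_edges <- data.frame(
    source_genesymbol = c("EGFR", "EGFR", "GRB2", "SOS1"),
    target_genesymbol = c("GRB2", "SHC1", "SOS1", "KRAS")
)

# The first two columns are interpreted as the source and target of each edge
toy_net <- graph_from_data_frame(toy_edges, directed = TRUE)

# Density of a directed graph without self-loops: E / (V * (V - 1))
toy_density <- ecount(toy_net) / (vcount(toy_net) * (vcount(toy_net) - 1))

cat("Nodes:", vcount(toy_net), "\n")
cat("Edges:", ecount(toy_net), "\n")
cat("Density (manual):", toy_density, "\n")
cat("Density (igraph):", edge_density(toy_net), "\n")
```

With 5 nodes and 4 edges the density is 4 / (5 × 4) = 0.2. Large signaling networks are typically far sparser than this toy example.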

## Extract Giant Component

For meaningful analysis, we typically focus on the largest connected component:

```{r giant-component}
# Extract giant component
gc <- giant_component(network)

cat("Giant component nodes:", vcount(gc), "\n")
cat("Giant component edges:", ecount(gc), "\n")
cat("Proportion of original:", 
    round(vcount(gc)/vcount(network) * 100, 1), "%\n")
```
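
For a directed graph, "connected" can mean weakly connected (ignoring edge direction) or strongly connected (respecting it). The sketch below shows the difference on a toy graph using plain igraph functions. Which definition `giant_component()` applies is not stated here, so treat this as an illustration of the concept and check the package documentation for the actual behavior.

```{r toy-components}
library(igraph)

# Hypothetical directed toy graph: a 3-cycle A -> B -> C -> A,
# a tail C -> D, and a separate pair E -> F
toy_frag <- graph_from_literal(A -+ B, B -+ C, C -+ A, C -+ D, E -+ F)

# Weak components ignore edge direction
toy_weak <- components(toy_frag, mode = "weak")
toy_largest <- which.max(toy_weak$csize)
toy_gc <- induced_subgraph(
    toy_frag,
    V(toy_frag)[toy_weak$membership == toy_largest]
)

# Strong components require a directed path in both directions
toy_strong <- components(toy_frag, mode = "strong")

cat("Largest weak component:", vcount(toy_gc), "nodes,",
    ecount(toy_gc), "edges\n")
cat("Largest strong component:", max(toy_strong$csize), "nodes\n")
```

Here the largest weak component contains A, B, C, and D, while the largest strong component is only the cycle A, B, C.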

# Topological Analysis

## Centrality Measures

```{r centrality}
# Calculate centrality measures
V(gc)$degree <- degree(gc)
V(gc)$betweenness <- betweenness(gc, normalized = TRUE)

# Create centrality data frame
centrality_df <- data.frame(
    gene = V(gc)$name,
    degree = V(gc)$degree,
    betweenness = V(gc)$betweenness
) %>%
    arrange(desc(degree))

# Top hub genes
head(centrality_df, 15)
```
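
On a directed graph, `degree()` counts incoming and outgoing edges together unless a `mode` is given. As a small sketch of why the distinction can matter in signaling networks, the chunk below separates in-degree and out-degree on a hypothetical toy cascade (a receptor R, two kinases, a transcription factor TF, and three target genes). The node names are placeholders, not real gene symbols.

```{r toy-in-out-degree}
library(igraph)

# Hypothetical cascade: R -> K1/K2 -> TF -> G1/G2/G3
toy_sig <- graph_from_literal(
    R -+ K1, R -+ K2,
    K1 -+ TF, K2 -+ TF,
    TF -+ G1, TF -+ G2, TF -+ G3
)

toy_centrality <- data.frame(
    node = V(toy_sig)$name,
    in_degree = degree(toy_sig, mode = "in"),     # number of regulators
    out_degree = degree(toy_sig, mode = "out"),   # number of targets
    total_degree = degree(toy_sig, mode = "all"), # default behavior
    betweenness = betweenness(toy_sig, directed = TRUE)
)

toy_centrality[order(-toy_centrality$total_degree), ]
```

In this toy cascade TF has the highest total degree and the highest betweenness, but its in-degree (2) and out-degree (3) describe different roles: it integrates two upstream signals and distributes them to three targets.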

## Degree Distribution Analysis

```{r degree-dist, fig.cap="Degree distribution of the OmniPath signaling network on a log-log scale. An approximately linear relationship is consistent with, but does not by itself prove, scale-free topology."}
# Degree distribution
degree_dist <- data.frame(
    degree = degree(gc)
) %>%
    count(degree) %>%
    mutate(
        log_degree = log10(degree),
        # Convert counts to relative frequencies so the y axis shows P(k)
        log_freq = log10(n / sum(n))
    )

# Plot degree distribution
ggplot(degree_dist, aes(x = log_degree, y = log_freq)) +
    geom_point(color = "#007B7F", size = 2, alpha = 0.7) +
    geom_smooth(method = "lm", se = FALSE, color = "#E74C3C", linetype = "dashed") +
    labs(
        title = "Degree Distribution (Log-Log Scale)",
        subtitle = "Scale-free networks show an approximately linear relationship",
        x = expression(log[10](k)),
        y = expression(log[10](P(k)))
    ) +
    theme_minimal() +
    theme(
        plot.title = element_text(face = "bold", size = 14),
        plot.subtitle = element_text(color = "gray50")
    )
```
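
A straight line on a log-log plot is only a visual hint. As a complementary sketch, igraph's `fit_power_law()` estimates the exponent by maximum likelihood and reports a Kolmogorov-Smirnov distance between the fitted and observed distributions. The chunk below runs it on a simulated preferential-attachment graph, which is an assumption made so that the example is self-contained. The same call could be applied to the degree vector of a real network.

```{r toy-power-law}
library(igraph)

set.seed(42)  # make the simulated graph reproducible

# Assumed example data: degrees of a preferential-attachment graph
toy_sf <- sample_pa(2000, m = 2, directed = FALSE)
toy_degrees <- degree(toy_sf)

# Maximum-likelihood power-law fit; xmin is estimated when not supplied
toy_fit <- fit_power_law(toy_degrees)

cat("Estimated exponent (alpha):", round(toy_fit$alpha, 2), "\n")
cat("Lower cutoff (xmin):", toy_fit$xmin, "\n")
cat("KS statistic:", round(toy_fit$KS.stat, 3), "\n")
```

A smaller KS statistic indicates a closer fit. Even a good fit does not rule out other heavy-tailed distributions, so the result should be read as descriptive.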

## Hub Gene Visualization

```{r hub-viz, fig.cap="Top 20 hub genes ranked by degree centrality in the OmniPath network."}
# Visualize top hub genes
top_hubs <- centrality_df %>% head(20)

ggplot(top_hubs, aes(x = reorder(gene, degree), y = degree)) +
    geom_col(fill = "#007B7F", alpha = 0.8) +
    geom_text(aes(label = degree), hjust = -0.2, size = 3) +
    coord_flip() +
    labs(
        title = "Top 20 Hub Genes",
        subtitle = "Ranked by degree centrality",
        x = "Gene",
        y = "Degree"
    ) +
    theme_minimal() +
    theme(
        plot.title = element_text(face = "bold"),
        axis.text.y = element_text(size = 9)
    ) +
    expand_limits(y = max(top_hubs$degree) * 1.1)
```

# Pathway Analysis

## Path Finding

```{r shortest-paths}
# Enumerate all paths from EGFR to AKT1 up to the length limit `maxlen`
# (not only the shortest ones)
paths <- find_all_paths(
    graph = gc,
    start = c("EGFR"),
    end = c("AKT1"),
    attr = "name",
    maxlen = 3
)

cat("Found", length(paths), "paths from EGFR to AKT1\n")

# Display example paths
if(length(paths) > 0) {
    cat("\nExample paths:\n")
    for(i in 1:min(5, length(paths))) {
        cat("Path", i, ":", paste(paths[[i]], collapse = " -> "), "\n")
    }
}
```
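
Enumerating all paths up to a length limit is different from finding the shortest path. The sketch below contrasts the two on a toy directed graph using plain igraph functions (`distances()`, `shortest_paths()`, and `all_simple_paths()`). The edges are hand-written for illustration and are not curated interactions.

```{r toy-paths}
library(igraph)

# Hypothetical toy graph with a short and a long route from EGFR to AKT1
toy_route <- graph_from_literal(
    EGFR -+ GRB2, GRB2 -+ SOS1, SOS1 -+ KRAS, KRAS -+ PIK3CA,
    EGFR -+ PIK3CA, PIK3CA -+ AKT1
)

# Length of the shortest directed path, in edges
toy_dist <- distances(toy_route, v = "EGFR", to = "AKT1", mode = "out")[1, 1]

# One shortest path, as a vector of node names
toy_sp <- shortest_paths(toy_route, from = "EGFR", to = "AKT1", mode = "out")
toy_sp_names <- as_ids(toy_sp$vpath[[1]])

# All simple paths, with and without a limit on the number of edges
toy_all <- all_simple_paths(toy_route, from = "EGFR", to = "AKT1", mode = "out")
toy_short <- all_simple_paths(
    toy_route, from = "EGFR", to = "AKT1", mode = "out", cutoff = 3
)

cat("Shortest path:", paste(toy_sp_names, collapse = " -> "), "\n")
cat("Shortest path length:", toy_dist, "\n")
cat("Simple paths (no limit):", length(toy_all), "\n")
cat("Simple paths (at most 3 edges):", length(toy_short), "\n")
```

The length limit matters in practice: the number of simple paths can grow very quickly with path length in a dense network, so an unbounded enumeration is rarely feasible on a full interaction graph.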

## Subnetwork Extraction

```{r subnetwork, fig.cap="Subnetwork around key oncogenic signaling proteins showing first-order neighbors."}
# Extract subnetwork around key nodes
key_genes <- c("TP53", "EGFR", "AKT1", "MTOR", "MAPK1", "SRC")

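# Get first-order neighbors in both directions (upstream and downstream);
# on directed graphs, igraph's neighbors() defaults to mode = "out"
subnet_nodes <- unique(unlist(lapply(key_genes, function(g) {
    if(g %in% V(gc)$name) {
        c(g, names(neighbors(gc, g, mode = "all")))
    }
})))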

# Create subnetwork
if(length(subnet_nodes) > 0) {
    subnet <- induced_subgraph(gc, subnet_nodes[subnet_nodes %in% V(gc)$name])
    
    cat("Subnetwork nodes:", vcount(subnet), "\n")
    cat("Subnetwork edges:", ecount(subnet), "\n")
    
    # Simple visualization
    V(subnet)$color <- ifelse(V(subnet)$name %in% key_genes, "#E74C3C", "#3498DB")
    V(subnet)$size <- ifelse(V(subnet)$name %in% key_genes, 12, 6)
    
    plot(subnet, 
         vertex.label = ifelse(V(subnet)$name %in% key_genes, V(subnet)$name, ""),
         vertex.label.cex = 0.8,
         vertex.label.color = "black",
         edge.arrow.size = 0.3,
         edge.color = "gray70",
         layout = layout_with_fr(subnet),
         main = "Oncogenic Signaling Subnetwork")
}
```
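
When extracting neighborhoods from a directed graph, the direction setting changes the result: igraph's `neighbors()` returns only out-neighbors by default. The sketch below illustrates this on a hypothetical toy graph and shows `make_ego_graph()` as an alternative way to obtain a first-order neighborhood in one call. The node names are placeholders.

```{r toy-ego}
library(igraph)

# Hypothetical toy graph: two upstream nodes, one hub, a downstream chain
toy_hub <- graph_from_literal(U1 -+ HUB, U2 -+ HUB, HUB -+ D1, D1 -+ D2)

# Default mode is "out": only downstream partners are returned
toy_out <- names(neighbors(toy_hub, "HUB"))

# mode = "all" returns both upstream and downstream partners
toy_all_nb <- names(neighbors(toy_hub, "HUB", mode = "all"))

# First-order ego graph around HUB, ignoring edge direction
toy_ego <- make_ego_graph(toy_hub, order = 1, nodes = "HUB", mode = "all")[[1]]

cat("Out-neighbors:", paste(toy_out, collapse = ", "), "\n")
cat("All neighbors:", paste(toy_all_nb, collapse = ", "), "\n")
cat("Ego graph:", vcount(toy_ego), "nodes,", ecount(toy_ego), "edges\n")
```

Raising `order` to 2 would also pull in D2. Around hub nodes in a large network the neighborhood grows quickly, which makes the plot hard to read.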

# Integration with Annotations

## Functional Annotation of Hub Genes

```{r annotations}
# Get top hub genes
hub_genes <- centrality_df %>%
    head(10) %>%
    pull(gene)

# Retrieve functional annotations
hub_annotations <- annotations(
    proteins = hub_genes,
    resources = c("HPA_subcellular")
)

if(nrow(hub_annotations) > 0) {
    # Summarize annotations
    annotation_summary <- hub_annotations %>%
        group_by(genesymbol, label) %>%
        summarise(value = first(value), .groups = "drop") %>%
        head(20)
    
    print(annotation_summary)
}
```

# Session Information

```{r session-info}
sessionInfo()
```

# References

1. Türei D, Korcsmáros T, Saez-Rodriguez J. OmniPath: guidelines and gateway for literature-curated signaling pathway resources. *Nature Methods* 2016;13:966-967.

2. Türei D, et al. Integrated intra- and intercellular signaling knowledge for multicellular omics analysis. *Molecular Systems Biology* 2021;17:e9923.

3. Csardi G, Nepusz T. The igraph software package for complex network research. *InterJournal Complex Systems* 2006;1695.