Biological databases use diverse identifier systems to reference genes, proteins, and other molecular entities. OmnipathR provides a unified framework for translating between these identifier systems, supporting seamless data integration across multiple resources.
OmnipathR supports translation between numerous identifier systems:
| Category | Identifier Types |
|---|---|
| Protein | UniProt AC, UniProt ID, RefSeq |
| Gene | HGNC Symbol, Entrez Gene ID, Ensembl Gene ID |
| Transcript | Ensembl Transcript ID, RefSeq mRNA |
| Peptide | Ensembl Peptide ID |
| External | KEGG, Reactome, PDB, InterPro |
The identifier translation in OmnipathR follows a hub-and-spoke model:
                    ┌─────────────┐
                    │   UniProt   │
                    │    (Hub)    │
                    └──────┬──────┘
           ┌───────────────┼───────────────┐
           │               │               │
    ┌──────▼──────┐ ┌──────▼──────┐ ┌──────▼──────┐
    │   Ensembl   │ │    HGNC     │ │   Entrez    │
    │   Gene ID   │ │   Symbol    │ │   Gene ID   │
    └─────────────┘ └─────────────┘ └─────────────┘
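The hub-and-spoke idea can be sketched in base R with two small lookup tables that share a UniProt column. This is only an illustration of the concept: the table names are hypothetical, the tables are typed in by hand rather than fetched from OmniPath, and the package's internal implementation may differ.

```r
# Toy spoke-to-hub tables (hand-written for illustration, not fetched from OmniPath)
ensembl_to_uniprot <- data.frame(
  ensembl_gene = c("ENSG00000146648", "ENSG00000141510"),
  uniprot      = c("P00533", "P04637"),
  stringsAsFactors = FALSE
)
uniprot_to_symbol <- data.frame(
  uniprot    = c("P00533", "P04637"),
  genesymbol = c("EGFR", "TP53"),
  stringsAsFactors = FALSE
)

# Spoke -> hub -> spoke: join the two tables on the shared UniProt column
via_hub <- merge(ensembl_to_uniprot, uniprot_to_symbol, by = "uniprot")

# Drop the hub column to obtain a direct Ensembl-to-symbol mapping
ensembl_to_symbol <- via_hub[, c("ensembl_gene", "genesymbol")]
print(ensembl_to_symbol)
```

Routing every translation through one hub means each identifier type needs only a single mapping table to and from UniProt, rather than one table per pair of identifier types.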
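Identifier mappings are not always one-to-one: a single source identifier can map to several targets (one-to-many), and several source identifiers can map to the same target (many-to-one).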
OmnipathR provides tools to quantify and handle these ambiguities.
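To make the notion of ambiguity concrete, the following base R sketch counts how many targets each source identifier has, and how many sources each target has, in a toy mapping table. The identifiers (`A1`, `G1`, and so on) and the column names are made up for illustration, and this is not the package's own ambiguity tooling.

```r
# Toy mapping table (hypothetical identifiers, for illustration only)
mapping <- data.frame(
  from = c("A1", "A2", "A2", "A3", "A4"),
  to   = c("G1", "G2", "G3", "G4", "G4"),
  stringsAsFactors = FALSE
)
mapping <- unique(mapping)

# Number of distinct targets per source, and of distinct sources per target
targets_per_source <- table(mapping$from)
sources_per_target <- table(mapping$to)

# Annotate each pair with these counts and derive its mapping class
mapping$n_to   <- as.integer(targets_per_source[mapping$from])
mapping$n_from <- as.integer(sources_per_target[mapping$to])
mapping$class  <- ifelse(
  mapping$n_to == 1 & mapping$n_from == 1, "one-to-one",
  ifelse(
    mapping$n_to > 1 & mapping$n_from == 1, "one-to-many",
    ifelse(mapping$n_to == 1 & mapping$n_from > 1, "many-to-one", "many-to-many")
  )
)

# Summary of how many pairs fall into each class
print(table(mapping$class))
```

A summary like this is a quick check to run before joining translated identifiers back onto measurements, since one-to-many pairs duplicate rows and many-to-one pairs merge them.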
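# Load the packages used in this section
library(OmnipathR)
library(dplyr)
library(tibble)
library(ggplot2)
# Create example data frame
expression_data <- tibble(
  uniprot = c("P00533", "P04637", "P31749", "P42345", "P06400"),
  log2fc = c(2.5, -1.8, 1.2, 0.8, -2.1),
  pvalue = c(0.001, 0.005, 0.02, 0.15, 0.003)
)
# Add gene symbol column: the first column name is the source ID type,
# the second is the target ID type
expression_annotated <- translate_ids(
  expression_data,
  uniprot,
  genesymbol
)
print(expression_annotated)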
#> # A tibble: 5 × 4
#>   uniprot log2fc pvalue genesymbol
#>   <chr>    <dbl>  <dbl> <chr>
#> 1 P00533     2.5  0.001 EGFR
#> 2 P04637    -1.8  0.005 TP53
#> 3 P31749     1.2  0.02  AKT1
#> 4 P42345     0.8  0.15  MTOR
#> 5 P06400    -2.1  0.003 RB1
# Visualize translated data
if (nrow(expression_annotated) > 0 && "genesymbol" %in% colnames(expression_annotated)) {
  plot_data <- expression_annotated %>%
    filter(!is.na(genesymbol))
  if (nrow(plot_data) > 0) {
    ggplot(plot_data, aes(x = reorder(genesymbol, log2fc), y = log2fc, fill = log2fc > 0)) +
      geom_col(alpha = 0.8) +
      geom_hline(yintercept = 0, linetype = "dashed", color = "gray50") +
      scale_fill_manual(values = c("TRUE" = "#E74C3C", "FALSE" = "#3498DB"), guide = "none") +
      coord_flip() +
      labs(
        title = "Differential Expression",
        subtitle = "UniProt IDs translated to gene symbols",
        x = "Gene Symbol",
        y = "Log2 Fold Change"
      ) +
      theme_minimal() +
      theme(plot.title = element_text(face = "bold"))
  }
}
Expression data with translated gene symbols showing log2 fold change values.
# Get interactions
interactions <- omnipath(resources = "SIGNOR")
# Check columns
cat("Interaction columns:", paste(head(colnames(interactions), 10), collapse = ", "), "...\n")
#> Interaction columns: source, target, source_genesymbol, target_genesymbol, is_directed, is_stimulation, is_inhibition, consensus_direction, consensus_stimulation, consensus_inhibition ...
cat("Total interactions:", nrow(interactions), "\n")
#> Total interactions: 65532
# Show sample with gene symbols
interactions %>%
  select(source, source_genesymbol, target, target_genesymbol) %>%
  head(10)
#> # A tibble: 10 × 4
#>    source source_genesymbol target target_genesymbol
#>    <chr>  <chr>             <chr>  <chr>
#>  1 Q13976 PRKG1             Q13507 TRPC3
#>  2 P06241 FYN               Q9Y210 TRPC6
#>  3 Q13976 PRKG1             Q9Y210 TRPC6
#>  4 P12931 SRC               Q9Y210 TRPC6
#>  5 Q13976 PRKG1             Q9HCX4 TRPC7
#>  6 Q00535 CDK5              Q8NER1 TRPV1
#>  7 Q13438 OS9               Q9HBA0 TRPV4
#>  8 P18031 PTPN1             Q9H1D0 TRPV6
#>  9 P63244 RACK1             Q9BX84 TRPM6
#> 10 Q9BX84 TRPM6             Q96QT4 TRPM7
# Get all UniProts from interactions
all_uniprots <- unique(c(interactions$source, interactions$target))
cat("Unique proteins in network:", length(all_uniprots), "\n")
#> Unique proteins in network: 6258
# Translate a sample of the first 100 UniProt IDs
sample_uniprots <- head(all_uniprots, 100)
sample_translated <- translate_ids(
  data.frame(uniprot = sample_uniprots),
  uniprot,
  genesymbol
)
# Calculate success rate
success_rate <- sum(!is.na(sample_translated$genesymbol)) / nrow(sample_translated) * 100
cat("Translation success rate:", round(success_rate, 1), "%\n")
#> Translation success rate: 100 %
# Visualize
mapping_summary <- data.frame(
  Status = c("Translated", "Not Found"),
  Count = c(
    sum(!is.na(sample_translated$genesymbol)),
    sum(is.na(sample_translated$genesymbol))
  )
)
ggplot(mapping_summary, aes(x = Status, y = Count, fill = Status)) +
  geom_col(alpha = 0.8) +
  geom_text(aes(label = Count), vjust = -0.5) +
  scale_fill_manual(values = c("Translated" = "#27AE60", "Not Found" = "#E74C3C")) +
  labs(
    title = "ID Translation Results",
    subtitle = paste("Sample of", length(sample_uniprots), "UniProt IDs"),
    y = "Number of IDs"
  ) +
  theme_minimal() +
  theme(
    plot.title = element_text(face = "bold"),
    legend.position = "none"
  ) +
  expand_limits(y = max(mapping_summary$Count) * 1.1)
Number of translated and untranslated identifiers in a sample of UniProt IDs from the interaction network.
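When some identifiers end up in the "Not Found" group, a common follow-up is to record which ones failed and to fall back to the original identifier for display. The sketch below shows one way to do this in base R on a toy data frame shaped like the translation result; `X00000` is a made-up placeholder for an untranslatable ID, and the `label` column is a hypothetical name.

```r
# Toy translation result; "X00000" is a made-up placeholder ID with no symbol
translated <- data.frame(
  uniprot    = c("P00533", "P04637", "X00000"),
  genesymbol = c("EGFR", "TP53", NA),
  stringsAsFactors = FALSE
)

# Record which identifiers could not be translated
untranslated <- translated$uniprot[is.na(translated$genesymbol)]

# Use the gene symbol where available, otherwise fall back to the UniProt ID
translated$label <- ifelse(
  is.na(translated$genesymbol),
  translated$uniprot,
  translated$genesymbol
)
print(translated)
```

Keeping the untranslated identifiers visible, rather than silently dropping them, makes it easier to spot obsolete or mistyped accessions later.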
sessionInfo()
#> R version 4.6.0 (2026-04-24)
#> Platform: x86_64-pc-linux-gnu
#> Running under: Ubuntu 24.04.4 LTS
#>
#> Matrix products: default
#> BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
#> LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.26.so; LAPACK version 3.12.0
#>
#> locale:
#> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8
#> [4] LC_COLLATE=en_US.UTF-8 LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
#> [7] LC_PAPER=en_US.UTF-8 LC_NAME=C LC_ADDRESS=C
#> [10] LC_TELEPHONE=C LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
#>
#> time zone: Etc/UTC
#> tzcode source: system (glibc)
#>
#> attached base packages:
#> [1] stats graphics grDevices utils datasets methods base
#>
#> other attached packages:
#> [1] tibble_3.3.1 bookdown_0.46 magrittr_2.0.5 ggraph_2.2.2 igraph_2.3.1 ggplot2_4.0.3
#> [7] dplyr_1.2.1 Matrix_1.7-5 OmnipathR_3.19.1 BiocStyle_2.41.0
#>
#> loaded via a namespace (and not attached):
#> [1] tidyselect_1.2.1 viridisLite_0.4.3 farver_2.1.2 blob_1.3.0 viridis_0.6.5
#> [6] R.utils_2.13.0 S7_0.2.2 fastmap_1.2.0 tweenr_2.0.3 XML_3.99-0.23
#> [11] digest_0.6.39 timechange_0.4.0 lifecycle_1.0.5 RSQLite_3.52.0 compiler_4.6.0
#> [16] rlang_1.2.0 sass_0.4.10 progress_1.2.3 tools_4.6.0 utf8_1.2.6
#> [21] yaml_2.3.12 knitr_1.51 labeling_0.4.3 graphlayouts_1.2.3 prettyunits_1.2.0
#> [26] bit_4.6.0 curl_7.1.0 xml2_1.5.2 RColorBrewer_1.1-3 R.matlab_3.7.0
#> [31] withr_3.0.2 purrr_1.2.2 sys_3.4.3 R.oo_1.27.1 polyclip_1.10-7
#> [36] grid_4.6.0 scales_1.4.0 MASS_7.3-65 cli_3.6.6 rmarkdown_2.31
#> [41] crayon_1.5.3 generics_0.1.4 otel_0.2.0 httr_1.4.8 tzdb_0.5.0
#> [46] sessioninfo_1.2.3 readxl_1.5.0 DBI_1.3.0 cachem_1.1.0 ggforce_0.5.0
#> [51] stringr_1.6.0 rvest_1.0.5 parallel_4.6.0 BiocManager_1.30.27 selectr_0.5-1
#> [56] cellranger_1.1.0 vctrs_0.7.3 jsonlite_2.0.0 hms_1.1.4 ggrepel_0.9.8
#> [61] bit64_4.8.2 maketools_1.3.2 tidyr_1.3.2 jquerylib_0.1.4 glue_1.8.1
#> [66] lubridate_1.9.5 stringi_1.8.7 gtable_0.3.6 later_1.4.8 logger_0.4.2
#> [71] pillar_1.11.1 rappdirs_0.3.4 htmltools_0.5.9 R6_2.6.1 httr2_1.2.2
#> [76] tcltk_4.6.0 tidygraph_1.3.1 vroom_1.7.1 evaluate_1.0.5 lattice_0.22-9
#> [81] readr_2.2.0 R.methodsS3_1.8.2 backports_1.5.1 memoise_2.0.1 bslib_0.11.0
#> [86] Rcpp_1.1.1-1.1 zip_2.3.3 gridExtra_2.3 checkmate_2.3.4 xfun_0.57
#> [91] fs_2.1.0 buildtools_1.0.0 pkgconfig_2.0.3
Türei D, et al. Integrated intra- and intercellular signaling knowledge for multicellular omics analysis. Molecular Systems Biology 2021;17:e9923.
The UniProt Consortium. UniProt: the universal protein knowledgebase in 2021. Nucleic Acids Research 2021;49:D480-D489.
Cunningham F, et al. Ensembl 2022. Nucleic Acids Research 2022;50:D988-D995.