The iTALK ligand-receptor database uses human gene symbols (e.g., TGFB1, VEGFA). This creates a challenge when analyzing data from other species like mouse, where gene symbols follow different conventions (e.g., Tgfb1, Vegfa).
This vignette describes iTALK’s automatic cross-species conversion system, which enables seamless analysis of non-human data through ortholog mapping via Ensembl BioMart.
Different species follow distinct gene naming patterns:
| Species | Convention | Examples |
|---|---|---|
| Human | ALL UPPERCASE | TGFB1, VEGFA, CD8A |
| Mouse | Title Case | Tgfb1, Vegfa, Cd8a |
| Rat | Title Case | Tgfb1, Vegfa, Cd8a |
```r
library(iTALK)

# Human genes
human_result <- detect_species(c("TGFB1", "VEGFA", "IL6", "TNF", "CD8A"))
cat("Human detection:\n")
#> Human detection:
cat(" Species:", human_result$species, "\n")
#> Species: Homo_sapiens
cat(" Confidence:", round(human_result$confidence * 100, 1), "%\n")
#> Confidence: 100 %
cat(" Method:", human_result$method, "\n\n")
#> Method: uppercase_pattern

# Mouse genes
mouse_result <- detect_species(c("Tgfb1", "Vegfa", "Il6", "Tnf", "Cd8a"))
cat("Mouse detection:\n")
#> Mouse detection:
cat(" Species:", mouse_result$species, "\n")
#> Species: Mus_musculus
cat(" Confidence:", round(mouse_result$confidence * 100, 1), "%\n")
#> Confidence: 100 %
cat(" Method:", mouse_result$method, "\n\n")
#> Method: titlecase_pattern

# Mixed (ambiguous)
mixed_result <- detect_species(c("TGFB1", "Vegfa", "IL6", "Tnf"))
cat("Mixed detection:\n")
#> Mixed detection:
cat(" Species:", mixed_result$species, "\n")
#> Species: unknown
cat(" Confidence:", round(mixed_result$confidence * 100, 1), "%\n")
#> Confidence: 50 %
```

```
┌─────────────────────────────────────┐
│  Input Gene List                    │
└────────────────┬────────────────────┘
                 │
                 ▼
┌─────────────────────────────────────┐
│  Sample up to 100 unique genes      │
│  Filter: length ≥ 3, contains A-Za-z│
└────────────────┬────────────────────┘
                 │
                 ▼
┌─────────────────────────────────────┐
│  Pattern Matching                   │
│  Human: ^[A-Z0-9]+$                 │
│  Mouse: ^[A-Z][a-z0-9]+[A-Za-z0-9]*$│
└────────────────┬────────────────────┘
                 │
                 ▼
┌─────────────────────────────────────┐
│  Calculate Proportions              │
│  prop_human = n_human / n_total     │
│  prop_mouse = n_mouse / n_total     │
└────────────────┬────────────────────┘
                 │
                 ▼
┌─────────────────────────────────────┐
│  Threshold Check (default: 70%)     │
│  if prop_human ≥ 0.7 → Homo_sapiens │
│  if prop_mouse ≥ 0.7 → Mus_musculus │
│  else → unknown                     │
└─────────────────────────────────────┘
```
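The steps in the flowchart can be sketched in a few lines of base R. This is a hypothetical re-implementation for illustration, not the source of `detect_species()`: the function name `sketch_detect_species()`, its argument names, and the choice to report the larger of the two proportions as the confidence are all assumptions.

```r
# Hypothetical sketch of the detection heuristic; not iTALK's actual code.
sketch_detect_species <- function(genes, threshold = 0.7, max_genes = 100) {
  genes <- unique(as.character(genes))
  # Keep symbols with at least 3 characters that contain a letter
  genes <- genes[nchar(genes) >= 3 & grepl("[A-Za-z]", genes, perl = TRUE)]
  if (length(genes) == 0) {
    return(list(species = "unknown", confidence = 0))
  }
  # Sample at most `max_genes` symbols
  if (length(genes) > max_genes) genes <- sample(genes, max_genes)

  # perl = TRUE keeps the [A-Z] ranges independent of the locale
  is_human <- grepl("^[A-Z0-9]+$", genes, perl = TRUE)
  is_mouse <- grepl("^[A-Z][a-z0-9]+[A-Za-z0-9]*$", genes, perl = TRUE)

  prop_human <- mean(is_human)
  prop_mouse <- mean(is_mouse)

  # The human check runs first, as in the flowchart
  species <- if (prop_human >= threshold) {
    "Homo_sapiens"
  } else if (prop_mouse >= threshold) {
    "Mus_musculus"
  } else {
    "unknown"
  }
  # Assumption: confidence is the larger of the two proportions
  list(species = species, confidence = max(prop_human, prop_mouse))
}

sketch_detect_species(c("TGFB1", "VEGFA", "IL6", "TNF", "CD8A"))$species  # "Homo_sapiens"
sketch_detect_species(c("Tgfb1", "Vegfa", "Il6", "Tnf", "Cd8a"))$species  # "Mus_musculus"
sketch_detect_species(c("TGFB1", "Vegfa", "IL6", "Tnf"))$species          # "unknown"
```

Note that a symbol such as `A1BG` satisfies both patterns (an uppercase letter followed by a digit), so the two proportions need not sum to 1.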
When mouse genes are detected, iTALK queries Ensembl BioMart to retrieve ortholog mappings:
```r
# Manual conversion example
conversion <- convert_species_biomart(
  genes = c("Tgfb1", "Vegfa", "Ctnnb1", "Cd8a", "Ptprc"),
  from_species = "Mus_musculus",
  to_species = "Homo_sapiens",
  ensembl_version = 103, # Fixed version for reproducibility
  cache = TRUE
)

# View mapping results
conversion$mapping
#> from_gene to_gene
#> 1 Tgfb1 TGFB1
#> 2 Vegfa VEGFA
#> 3 Ctnnb1 CTNNB1
#> 4 Cd8a CD8A
#> 5 Ptprc PTPRC

# Statistics
conversion$stats
#> $n_input: 5
#> $n_mapped: 5
#> $mapping_rate: 1.0
```

The query retrieves the `associated_gene_name` attribute for orthologs:

- Dataset: `mmusculus_gene_ensembl`
- Filter: `external_gene_name` (mouse symbols)
- Attribute: `hsapiens_homolog_associated_gene_name`
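For readers who want to see what such a query looks like outside iTALK, the following is a minimal sketch using the Bioconductor package biomaRt directly. The helper name `query_mouse_to_human()` is hypothetical, and whether iTALK issues exactly this call internally is an assumption; the dataset, filter, and attribute names are the ones listed above. Running it requires biomaRt and network access.

```r
# Hypothetical helper, not an iTALK function. Requires biomaRt and a
# working connection to Ensembl when it is actually called.
query_mouse_to_human <- function(genes, ensembl_version = 103) {
  # Connect to a fixed Ensembl release of the mouse gene dataset
  mart <- biomaRt::useEnsembl(
    biomart = "genes",
    dataset = "mmusculus_gene_ensembl",
    version = ensembl_version
  )
  res <- biomaRt::getBM(
    attributes = c("external_gene_name",
                   "hsapiens_homolog_associated_gene_name"),
    filters = "external_gene_name",
    values = unique(genes),
    mart = mart
  )
  mapping <- data.frame(
    from_gene = res$external_gene_name,
    to_gene = res$hsapiens_homolog_associated_gene_name,
    stringsAsFactors = FALSE
  )
  # Drop rows where no human ortholog symbol was returned
  mapping[!is.na(mapping$to_gene) & nzchar(mapping$to_gene), , drop = FALSE]
}

# Example call (not run here because it needs network access):
# query_mouse_to_human(c("Tgfb1", "Vegfa", "Ctnnb1", "Cd8a", "Ptprc"))
```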
To avoid repeated BioMart queries, results are cached locally:
- Cache location: `~/.Rcache/`
- Cache key: `hash(genes)` + species + `ensembl_version`
- Cache format: R.cache RDS files
- First query: ~15 seconds (network dependent)
- Cached query: < 1 second
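The caching idea can be illustrated with a small base-R sketch. It does not use R.cache and is not iTALK's implementation: `make_cache_key()` and `with_cache()` are hypothetical helpers, the key includes both the source and target species, and sorting the gene list before hashing (so that input order does not change the key) is a design assumption.

```r
# Hypothetical helpers illustrating "hash(genes) + species + version" caching.
make_cache_key <- function(genes, from_species, to_species, ensembl_version) {
  # tools::md5sum() hashes files, so write the gene list to a temp file first.
  # Radix sorting is locale-independent, which keeps the key stable.
  tmp <- tempfile()
  on.exit(unlink(tmp))
  writeLines(sort(unique(genes), method = "radix"), tmp)
  gene_hash <- unname(tools::md5sum(tmp))
  paste(gene_hash, from_species, to_species, ensembl_version, sep = "_")
}

with_cache <- function(key, compute,
                       cache_dir = file.path(tempdir(), "ortholog_cache")) {
  dir.create(cache_dir, showWarnings = FALSE, recursive = TRUE)
  path <- file.path(cache_dir, paste0(key, ".rds"))
  if (file.exists(path)) {
    return(readRDS(path))  # cache hit: skip the expensive call
  }
  result <- compute()      # cache miss: run the query once
  saveRDS(result, path)
  result
}

key <- make_cache_key(c("Tgfb1", "Vegfa"), "Mus_musculus", "Homo_sapiens", 103)
n_calls <- 0
slow_query <- function() {
  n_calls <<- n_calls + 1  # stands in for a BioMart round trip
  data.frame(from_gene = c("Tgfb1", "Vegfa"), to_gene = c("TGFB1", "VEGFA"))
}
first <- with_cache(key, slow_query)
second <- with_cache(key, slow_query)  # served from disk; slow_query is not re-run
```

Because the Ensembl version is part of the key, switching releases triggers a fresh query instead of reusing a mapping from another release.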
When `convert_species = TRUE` (default), `FindLR()` automatically handles species conversion:

```r
# Mouse data - automatic conversion
mouse_genes <- rawParse(mouse_data, top_genes = 50)
lr_pairs <- FindLR(
  data_1 = mouse_genes,
  datatype = "mean count",
  comm_type = "cytokine",
  convert_species = TRUE # Default
)
# Console output:
# Detected species: Mus_musculus (95.2%)
# Converting mouse genes to human orthologs...
# Mapping complete: 847/1000 genes mapped (84.7%)
```

| Conversion | Mapping Rate | Notes |
|---|---|---|
| Mouse → Human | 85-95% | Most comprehensive |
| Rat → Human | 80-90% | Good coverage |
| Other mammals | 70-85% | Variable |
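To check where a particular dataset falls in these ranges, the mapping rate can be recomputed from a mapping table. The sketch below is a hypothetical helper, not part of iTALK; its field names mirror `conversion$stats`, but the assumption that the rate is the share of unique input genes with at least one ortholog is mine. The gene list, including the symbol `Gm12345`, is made up for illustration.

```r
# Hypothetical helper: summarise how many input genes received an ortholog.
mapping_stats <- function(input_genes, mapping) {
  input_genes <- unique(input_genes)
  mapped <- intersect(input_genes, mapping$from_gene)
  list(
    n_input = length(input_genes),
    n_mapped = length(mapped),
    mapping_rate = length(mapped) / length(input_genes),
    unmapped = setdiff(input_genes, mapping$from_gene)
  )
}

# Illustrative mapping table in the same shape as conversion$mapping
mapping <- data.frame(
  from_gene = c("Tgfb1", "Vegfa", "Cd8a"),
  to_gene = c("TGFB1", "VEGFA", "CD8A"),
  stringsAsFactors = FALSE
)
stats <- mapping_stats(c("Tgfb1", "Vegfa", "Cd8a", "Gm12345"), mapping)
stats$mapping_rate  # 0.75
stats$unmapped      # "Gm12345"
```

Listing the unmapped symbols is often the quickest way to tell non-standard symbols apart from genes that have no human ortholog.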
Some genes have multiple orthologs. iTALK handles these by:
For faster access from different regions:
| Operation | Genes | Time | Memory |
|---|---|---|---|
| Species detection | 1000 | < 0.1s | < 1 MB |
| BioMart query (first) | 1000 | ~15s | ~10 MB |
| BioMart query (cached) | 1000 | < 1s | < 1 MB |
| Full FindLR with conversion | 1000 | ~20s | ~15 MB |
1. BioMart connection timeout

```r
# Increase retry attempts
conversion <- convert_species_biomart(
  genes = mouse_genes, # character vector of mouse gene symbols
  from_species = "Mus_musculus",
  max_tries = 10
)
```

2. Low mapping rate
   - Check for non-standard gene symbols
   - Verify species detection is correct
   - Some genes may be species-specific

3. Cache issues
Key points about cross-species analysis in iTALK:
```r
sessionInfo()
#> R version 4.6.0 (2026-04-24)
#> Platform: x86_64-pc-linux-gnu
#> Running under: Ubuntu 24.04.4 LTS
#>
#> Matrix products: default
#> BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
#> LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.26.so; LAPACK version 3.12.0
#>
#> locale:
#> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
#> [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
#> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
#> [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
#> [9] LC_ADDRESS=C LC_TELEPHONE=C
#> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
#>
#> time zone: Etc/UTC
#> tzcode source: system (glibc)
#>
#> attached base packages:
#> [1] stats graphics grDevices utils datasets methods base
#>
#> other attached packages:
#> [1] dplyr_1.2.1 igraph_2.3.2 iTALK_0.1.1 rmarkdown_2.31
#>
#> loaded via a namespace (and not attached):
#> [1] sass_0.4.10 generics_0.1.4 tidyr_1.3.2
#> [4] shape_1.4.6.1 stringi_1.8.7 hms_1.1.4
#> [7] digest_0.6.39 magrittr_2.0.5 evaluate_1.0.5
#> [10] grid_4.6.0 RColorBrewer_1.1-3 circlize_0.4.18
#> [13] fastmap_1.2.0 jsonlite_2.0.0 progress_1.2.3
#> [16] GlobalOptions_0.1.4 purrr_1.2.2 scales_1.4.0
#> [19] pbapply_1.7-4 randomcoloR_1.1.0.1 jquerylib_0.1.4
#> [22] cli_3.6.6 crayon_1.5.3 rlang_1.2.0
#> [25] withr_3.0.3 cachem_1.1.0 yaml_2.3.12
#> [28] otel_0.2.0 Rtsne_0.17 parallel_4.6.0
#> [31] tools_4.6.0 colorspace_2.1-2 ggplot2_4.0.3
#> [34] curl_7.1.0 buildtools_1.0.0 vctrs_0.7.3
#> [37] R6_2.6.1 lifecycle_1.0.5 stringr_1.6.0
#> [40] V8_8.2.0 cluster_2.1.8.2 pkgconfig_2.0.3
#> [43] pillar_1.11.1 bslib_0.11.0 gtable_0.3.6
#> [46] glue_1.8.1 Rcpp_1.1.1-1.1 xfun_0.59
#> [49] tibble_3.3.1 tidyselect_1.2.1 sys_3.4.3
#> [52] knitr_1.51 farver_2.1.2 htmltools_0.5.9
#> [55] maketools_1.3.2 compiler_4.6.0 prettyunits_1.2.0
#> [58] S7_0.2.2
```