SPARQL endpoint

Last updated: 2020-04-20

Checks: 7 passed, 0 failed

Knit directory: Bgee/

This reproducible R Markdown analysis was created with workflowr (version 1.6.1). The Checks tab describes the reproducibility checks that were applied when the results were created. The Past versions tab lists the development history.

R Markdown file: up-to-date

Great! Since the R Markdown file has been committed to the Git repository, you know the exact version of the code that produced these results.

Environment: empty

Great job! The global environment was empty. Objects defined in the global environment can affect the analysis in your R Markdown file in unknown ways. For reproducibility it’s best to always run the code in an empty environment.

Seed: set.seed(20200417)

The command set.seed(20200417) was run prior to running the code in the R Markdown file. Setting a seed ensures that any results that rely on randomness, e.g. subsampling or permutations, are reproducible.

Session information: recorded

Great job! Recording the operating system, R version, and package versions is critical for reproducibility.

Cache: none

Nice! There were no cached chunks for this analysis, so you can be confident that you successfully produced the results during this run.

File paths: relative

Great job! Using relative paths to the files within your workflowr project makes it easier to run your code on other machines.

Repository version: 9073f83

Great! You are using Git for version control. Tracking code development and connecting the code version to the results is critical for reproducibility.

The results in this page were generated with repository version 9073f83. See the Past versions tab to see a history of the changes made to the R Markdown and HTML files.

Note that you need to be careful to ensure that all relevant files for the analysis have been committed to Git prior to generating the results (you can use wflow_publish or wflow_git_commit). workflowr only checks the R Markdown file, but you know if there are other scripts or data files that it depends on. Below is the status of the Git repository when the results were generated:


Ignored files:
    Ignored:    .Rhistory
    Ignored:    analysis/downloaddata_cache/
    Ignored:    analysis/extractinfo_cache/
    Ignored:    analysis/processdata_cache/

Untracked files:
    Untracked:  Bos_taurus_Bgee_14_1/
    Untracked:  Drosophila_melanogaster_Bgee_14_1/
    Untracked:  README.html
    Untracked:  release.tsv
    Untracked:  species_Bgee_14_1.tsv

Unstaged changes:
    Modified:   README.md
    Modified:   analysis/_site.yml
    Deleted:    analysis/about.Rmd
    Modified:   analysis/index.Rmd
    Deleted:    analysis/license.Rmd

Note that any generated files, e.g. HTML, png, CSS, etc., are not included in this status report because it is ok for generated content to have uncommitted changes.

These are the previous versions of the repository in which changes were made to the R Markdown (analysis/sparql.Rmd) and HTML (docs/sparql.html) files. If you’ve configured a remote Git repository (see ?wflow_git_remote), click on the hyperlinks in the table below to view the files as they were in that past version.

File	Version	Author	Date	Message
Rmd	9073f83	SFonsecaCosta	2020-04-20	add analysis

In this section we introduce the Bgee SPARQL endpoint. You can query it from R to retrieve information stored in the Bgee database.

Load the packages

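library(SPARQL)      # SPARQL(): sends a query to an endpoint and returns the results in R
library(stringr)     # string manipulation helpers
library(data.table)  # provides the %like% operator used below to filter gene IDs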

SPARQL endpoint

The Bgee SPARQL endpoint is accessible through a stable URL. Here we use the endpoint for Bgee release 14.1:

sparqlEndPoint <- "https://bgee.org/sparql14_1"
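A SPARQL endpoint is an HTTP service: under the W3C SPARQL 1.1 Protocol, a query can be sent as an HTTP GET request carrying the URL-encoded query in a `query` parameter. The sketch below only builds such a request URL with base R's `utils::URLencode()`; the helper name `sparql_get_url()` is hypothetical, nothing is sent over the network, and this is not necessarily how the `SPARQL` package issues its own requests.

```r
# Hypothetical helper: build a SPARQL 1.1 Protocol GET URL for an endpoint.
# It only constructs the URL; no request is sent.
sparql_get_url <- function(endpoint, query) {
  # reserved = TRUE also percent-encodes '?', '*', '{', '}', '#', ':' and so on;
  # repeated = TRUE encodes '%' even if the query already contains "%xx" sequences
  encoded <- utils::URLencode(query, reserved = TRUE, repeated = TRUE)
  paste0(endpoint, "?query=", encoded)
}

sparqlEndPoint <- "https://bgee.org/sparql14_1"
url <- sparql_get_url(sparqlEndPoint, "SELECT * { ?s ?p ?o } LIMIT 1")
print(url)
```

Newlines in multi-line queries, such as the ones in this analysis, are encoded as `%0A`, so they can be passed to the helper unchanged.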

Retrieve species

With the Bgee SPARQL endpoint you can retrieve information about each species by querying the UniProt taxon resources (`up:Taxon`) of rank species.

species_taxon <- "PREFIX up: <http://purl.uniprot.org/core/>
SELECT * {
    ?species a up:Taxon .
    ?species up:scientificName ?name .
    ?species up:rank up:Species .
}"

species_taxonTable <- unique(SPARQL(url=sparqlEndPoint, species_taxon)$results)

paste0("Number of species present in the Bgee database: ", nrow(species_taxonTable))

[1] "Number of species present in the Bgee database: 29"

For downstream analyses, we recommend cleaning the `species` column of the table so that it keeps only the numeric taxon ID from each UniProt taxonomy URI.

# Keep only the numeric taxon ID from URIs such as <http://purl.uniprot.org/taxonomy/10090>
species_taxonTable$species <- sub('<http://purl.uniprot.org/taxonomy/(\\d+).*', '\\1', species_taxonTable$species)
head(species_taxonTable)

   species                   name
1    10090           Mus musculus
9    10116      Rattus norvegicus
17   10141        Cavia porcellus
25   13616  Monodelphis domestica
33   28377    Anolis carolinensis
41    6239 Caenorhabditis elegans
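The same `sub()` cleaning step can be wrapped in a small helper and paired with its inverse, which rebuilds a bracketed UniProt taxonomy IRI from a taxon ID, the form hard-coded for cattle (`9913`) in the gene expression query further down. Both helper names are hypothetical, and the mock table only reuses two rows from the output above so that the sketch runs offline.

```r
# Hypothetical helpers converting between UniProt taxonomy IRIs and taxon IDs
taxon_id_from_uri <- function(uri) {
  # Keep only the digits after .../taxonomy/, dropping the angle brackets
  sub("^<?http://purl\\.uniprot\\.org/taxonomy/(\\d+)>?$", "\\1", uri)
}

taxon_uri_from_id <- function(id) {
  # Rebuild the bracketed IRI form used inside SPARQL queries
  paste0("<http://purl.uniprot.org/taxonomy/", id, ">")
}

# Mock rows copied from the output above (offline example)
species_mock <- data.frame(
  species = c("<http://purl.uniprot.org/taxonomy/10090>",
              "<http://purl.uniprot.org/taxonomy/10116>"),
  name = c("Mus musculus", "Rattus norvegicus"),
  stringsAsFactors = FALSE
)
species_mock$species <- taxon_id_from_uri(species_mock$species)
print(species_mock)

# IRI for Bos taurus, ready to paste into a query
taxon_uri_from_id(9913)
```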

To show how to query data about particular species, genes or anatomical entities, the rest of this section uses information from the TopAnat analysis, so *Bos taurus* is the target species.

Retrieve anatomical entities

Anatomical entities from a particular species and developmental stage

You can retrieve all anatomical entities annotated for a target species at a target developmental stage. Here we use *Bos taurus* (cattle) as the species and ‘prime adult stage’ as the developmental stage.

anatEnt_devStage <- "PREFIX up: <http://purl.uniprot.org/core/>
PREFIX genex: <http://purl.org/genex#>
PREFIX obo: <http://purl.obolibrary.org/obo/>
SELECT DISTINCT ?anatName FROM <https://bgee.org/rdf_v14_1> {
    ?cond genex:hasAnatomicalEntity ?anatEntity .
    ?anatEntity rdfs:label ?anatName .
    ?cond genex:hasDevelopmentalStage ?stage .
    ?stage rdfs:label ?stageName .
    ?cond obo:RO_0002162 ?taxon .
    ?taxon up:commonName ?commonName .
    FILTER ( LCASE(?commonName) = LCASE('cattle')).
    FILTER ( CONTAINS(?stageName, 'prime adult stage'))
}"

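anatEnt_devStageTable <- SPARQL(url=sparqlEndPoint, anatEnt_devStage)
# This query selects a single variable (?anatName). In that case the results come back
# as one row with one column per solution (see the t() output at the end of this
# section), so length() counts the solutions
print(paste0("Number of anatomical entities found: ", length(anatEnt_devStageTable$results)))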

[1] "Number of anatomical entities found: 319"
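The query above hard-codes the species common name and the developmental stage. A minimal sketch of a hypothetical helper, `anat_stage_query()`, that fills both values into the same query text with `sprintf()`. It assumes the values contain no single quotes, because they are pasted unescaped into SPARQL string literals. It also declares the `rdfs` prefix explicitly, whereas the original query relies on the endpoint already knowing it.

```r
# Hypothetical helper: build the anatomical-entity query for any species
# common name and developmental stage (values must not contain single quotes)
anat_stage_query <- function(common_name, stage_name,
                             graph = "https://bgee.org/rdf_v14_1") {
  stopifnot(!grepl("'", common_name), !grepl("'", stage_name))
  sprintf("PREFIX up: <http://purl.uniprot.org/core/>
PREFIX genex: <http://purl.org/genex#>
PREFIX obo: <http://purl.obolibrary.org/obo/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT DISTINCT ?anatName FROM <%s> {
    ?cond genex:hasAnatomicalEntity ?anatEntity .
    ?anatEntity rdfs:label ?anatName .
    ?cond genex:hasDevelopmentalStage ?stage .
    ?stage rdfs:label ?stageName .
    ?cond obo:RO_0002162 ?taxon .
    ?taxon up:commonName ?commonName .
    FILTER ( LCASE(?commonName) = LCASE('%s') ).
    FILTER ( CONTAINS(?stageName, '%s') )
}", graph, common_name, stage_name)
}

q <- anat_stage_query("cattle", "prime adult stage")
cat(q)
# With network access, the query would be sent as before:
# SPARQL(url = sparqlEndPoint, query = q)
```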

Anatomical entities where a gene is expressed

Using one of the statistically significant genes from the TopAnat analysis, you can retrieve all anatomical entities in which Bgee reports the gene as expressed. To do so, specify the target gene and the target species in the query.

anatEnt_gene_species <- "PREFIX orth: <http://purl.org/net/orth#>
PREFIX up: <http://purl.uniprot.org/core/>
PREFIX genex: <http://purl.org/genex#>
PREFIX obo: <http://purl.obolibrary.org/obo/>
PREFIX lscr: <http://purl.org/lscr#>
PREFIX dct: <http://purl.org/dc/terms/>
SELECT DISTINCT ?anatEntity ?anatName  FROM <https://bgee.org/rdf_v14_1> {
values ?ensembl_gene { <http://rdf.ebi.ac.uk/resource/ensembl/ENSBTAG00000005333> }  
   
    ?seq a orth:Gene .
    ?seq lscr:xrefEnsemblGene  ?ensembl_gene.
    ?seq rdfs:label ?geneName .
    ?seq genex:isExpressedIn ?cond .
    ?cond genex:hasAnatomicalEntity ?anatEntity .
    ?anatEntity rdfs:label ?anatName .
    ?cond obo:RO_0002162 <http://purl.uniprot.org/taxonomy/9913> . 
}"

anatEnt_gene_speciesTable <- SPARQL(url=sparqlEndPoint, anatEnt_gene_species)
print(paste0("Number of anatomical entities: ", length(anatEnt_gene_speciesTable$results$anatEntity)))

[1] "Number of anatomical entities: 13"

print(unique(anatEnt_gene_speciesTable$results$anatEntity))

 [1] "<http://purl.obolibrary.org/obo/UBERON_0000082>"
 [2] "<http://purl.obolibrary.org/obo/UBERON_0000451>"
 [3] "<http://purl.obolibrary.org/obo/UBERON_0000948>"
 [4] "<http://purl.obolibrary.org/obo/UBERON_0000955>"
 [5] "<http://purl.obolibrary.org/obo/UBERON_0001111>"
 [6] "<http://purl.obolibrary.org/obo/UBERON_0001134>"
 [7] "<http://purl.obolibrary.org/obo/UBERON_0001155>"
 [8] "<http://purl.obolibrary.org/obo/UBERON_0002048>"
 [9] "<http://purl.obolibrary.org/obo/UBERON_0002106>"
[10] "<http://purl.obolibrary.org/obo/UBERON_0001295>"
[11] "<http://purl.obolibrary.org/obo/UBERON_0001401>"
[12] "<http://purl.obolibrary.org/obo/UBERON_0002000>"
[13] "<http://purl.obolibrary.org/obo/UBERON_0034908>"
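The anatomical entities are returned as full UBERON IRIs in angle brackets. If compact identifiers such as `UBERON:0000948` are more convenient, for example to match IDs stored in other tables, a small helper can rewrite them. The helper name is hypothetical, and the two IRIs below are copied from the output above.

```r
# Hypothetical helper: turn "<http://purl.obolibrary.org/obo/UBERON_0000948>"
# into the compact form "UBERON:0000948"
obo_iri_to_curie <- function(iri) {
  sub("^<?http://purl\\.obolibrary\\.org/obo/([A-Za-z]+)_([0-9]+)>?$", "\\1:\\2", iri)
}

uberon_iris <- c("<http://purl.obolibrary.org/obo/UBERON_0000948>",
                 "<http://purl.obolibrary.org/obo/UBERON_0002048>")
obo_iri_to_curie(uberon_iris)

# On the real results:
# obo_iri_to_curie(unique(anatEnt_gene_speciesTable$results$anatEntity))
```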

Target genes

Genes can be targeted by their description, and the results can then be restricted to one species.

Target the genes that have muscle in their description

Retrieve the genes whose description contains the term ‘muscle’, then check whether the gene “ENSBTAG00000014614” is among the cattle genes returned. Note that this gene was statistically significant in the TopAnat analysis.

genes_muscles <- "PREFIX up: <http://purl.uniprot.org/core/>
PREFIX orth: <http://purl.org/net/orth#>
PREFIX dcterms: <http://purl.org/dc/terms/>
SELECT ?geneName ?geneId FROM <https://bgee.org/rdf_v14_1> {
    ?gene a orth:Gene .
    ?gene rdfs:label ?geneName .
    ?gene dcterms:identifier ?geneId .
    ?gene dcterms:description ?desc .
    FILTER CONTAINS ( ?desc, 'muscle' )
}"

genes_musclesTable <- SPARQL(url=sparqlEndPoint, genes_muscles)

## keep only Bos taurus genes (Ensembl gene IDs containing "ENSBTAG")
genes_musclesTable$results[genes_musclesTable$results$geneId %like% "ENSBTAG", ]

    geneName             geneId
28      PYGM ENSBTAG00000001032
29      MUSK ENSBTAG00000002744
30     MYH7B ENSBTAG00000003512
31     ACTC1 ENSBTAG00000005714
32      MYH2 ENSBTAG00000007090
33      MYH8 ENSBTAG00000009702
34      MYH7 ENSBTAG00000009703
35    ANKRD1 ENSBTAG00000011734
36    CAPZA3 ENSBTAG00000013207
37       CKM ENSBTAG00000013921
38     MBNL3 ENSBTAG00000014088
39     PERM1 ENSBTAG00000014540
40      SMPX ENSBTAG00000015204
41     MYLPF ENSBTAG00000021218
42      MURC ENSBTAG00000021992
43           ENSBTAG00000030186
44      MYH4 ENSBTAG00000037794
45     ACTA1 ENSBTAG00000046332
453     PFKM ENSBTAG00000000286
454   ATP2A2 ENSBTAG00000001398
455     MRAS ENSBTAG00000001497
456      PKM ENSBTAG00000001601
457   ATP5A1 ENSBTAG00000002507
458    MYH14 ENSBTAG00000002580
459   CAPZA2 ENSBTAG00000004072
460    CAPZB ENSBTAG00000004554
461    MBNL1 ENSBTAG00000004564
462     ENO3 ENSBTAG00000005534
463    ACYP2 ENSBTAG00000006852
464      GEM ENSBTAG00000007596
465    PHKG1 ENSBTAG00000008195
466          ENSBTAG00000009713
467     CNN1 ENSBTAG00000011207
468    PAMR1 ENSBTAG00000012630
469   ANKRD2 ENSBTAG00000012720
470   CAPZA1 ENSBTAG00000014295
471     MYLK ENSBTAG00000014567
472    ACTA2 ENSBTAG00000014614
473   COX7A1 ENSBTAG00000014878
474     CFL2 ENSBTAG00000015053
475    ACTG2 ENSBTAG00000015441
476    PHKA1 ENSBTAG00000015848
477    FABP3 ENSBTAG00000016819
478    MBNL2 ENSBTAG00000018313
479    MYH10 ENSBTAG00000021151
480    LMOD1 ENSBTAG00000021576
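Instead of scanning the printed table by eye, membership can be tested directly. The sketch below uses base R (`startsWith()` and `%in%`) rather than `data.table`'s `%like%`, on a small mock of the results so it runs offline. The first two rows are copied from the output above; the third ID is a made-up placeholder for a non-cattle gene. With the real data, replace `genes_mock` with `genes_musclesTable$results`.

```r
# Mock of the query results (offline example); the last ID is a placeholder
genes_mock <- data.frame(
  geneName = c("ACTA2", "MYLK", "ACTA2"),
  geneId = c("ENSBTAG00000014614", "ENSBTAG00000014567", "ENSG00000000000"),
  stringsAsFactors = FALSE
)

# Keep only cattle genes (Ensembl IDs starting with "ENSBTAG")
cattle_genes <- genes_mock[startsWith(genes_mock$geneId, "ENSBTAG"), ]

# Was the TopAnat gene retrieved among the "muscle" genes?
target <- "ENSBTAG00000014614"
found <- target %in% cattle_genes$geneId
print(found)
print(cattle_genes[cattle_genes$geneId == target, ])
```

`startsWith()` anchors the match at the beginning of the ID, whereas `%like%` matches the pattern anywhere in the string; for Ensembl gene IDs like these both select the same rows.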

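Target species in which a gene is present

Check whether the gene ACTA2 (gene ID “ENSBTAG00000014614” in cattle) is also present in other species. The query matches genes by their label, ignoring case, so it returns every species that has a gene named ACTA2.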

gene_present_species <- "PREFIX up: <http://purl.uniprot.org/core/>
PREFIX orth: <http://purl.org/net/orth#>
PREFIX obo: <http://purl.obolibrary.org/obo/>
SELECT ?name FROM <https://bgee.org/rdf_v14_1> {
    ?gene a orth:Gene .
    ?gene rdfs:label ?geneName .
    ?gene orth:organism ?organism . #orth v2
    ?organism obo:RO_0002162 ?taxon . #label: in taxon .
    ?taxon up:scientificName ?name .
    FILTER ( UCASE(?geneName) = UCASE('ACTA2') )
}"

gene_present_speciesTable <- SPARQL(url=sparqlEndPoint, gene_present_species)
print(paste0("Number of species detected: ",length(gene_present_speciesTable$results)))

[1] "Number of species detected: 18"

t(gene_present_speciesTable$results)

        [,1]                      
name    "Danio rerio"             
name.1  "Homo sapiens"            
name.2  "Mus musculus"            
name.3  "Rattus norvegicus"       
name.4  "Sus scrofa"              
name.5  "Xenopus tropicalis"      
name.6  "Anolis carolinensis"     
name.7  "Bos taurus"              
name.8  "Canis lupus familiaris"  
name.9  "Cavia porcellus"         
name.10 "Equus caballus"          
name.11 "Erinaceus europaeus"     
name.12 "Felis catus"             
name.13 "Ornithorhynchus anatinus"
name.14 "Oryctolagus cuniculus"   
name.15 "Gorilla gorilla"         
name.16 "Macaca mulatta"          
name.17 "Monodelphis domestica"
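In the output above, the single selected variable `?name` comes back as one row with columns `name`, `name.1`, `name.2` and so on, which is also why `length()` on the results counts species. A hedged sketch of a hypothetical helper that reshapes such a one-row result into an ordinary one-column data frame; it assumes exactly the wide shape seen here, and the mock reuses the first three species from the output.

```r
# Hypothetical helper: reshape a one-row, many-column result
# (columns name, name.1, name.2, ...) into a one-column data frame
wide_to_long <- function(results, var) {
  stopifnot(is.data.frame(results), nrow(results) <= 1)
  if (nrow(results) == 0 || ncol(results) == 0) {
    values <- character(0)
  } else {
    # as.character() also handles columns stored as factors
    values <- unname(vapply(results, function(col) as.character(col[1]), character(1)))
  }
  out <- data.frame(values, stringsAsFactors = FALSE)
  names(out) <- var
  out
}

# Mock of the first three columns shown above (offline example)
wide_mock <- data.frame(name = "Danio rerio", name.1 = "Homo sapiens",
                        name.2 = "Mus musculus", stringsAsFactors = FALSE)
species_long <- wide_to_long(wide_mock, "name")
print(species_long)
nrow(species_long)  # number of species, like length() on the wide result
```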

sessionInfo()

R version 3.6.0 (2019-04-26)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS High Sierra 10.13.4

Matrix products: default
BLAS:   /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] data.table_1.12.8 stringr_1.4.0     SPARQL_1.16       RCurl_1.98-1.1   
[5] XML_3.99-0.3      workflowr_1.6.1  

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.4.6    knitr_1.28      whisker_0.4     magrittr_1.5   
 [5] R6_2.4.1        rlang_0.4.5     tools_3.6.0     xfun_0.13      
 [9] git2r_0.26.1    htmltools_0.4.0 yaml_2.2.1      digest_0.6.25  
[13] rprojroot_1.3-2 later_1.0.0     promises_1.1.0  fs_1.4.1       
[17] bitops_1.0-6    glue_1.4.0      evaluate_0.14   rmarkdown_2.1  
[21] stringi_1.4.6   compiler_3.6.0  backports_1.1.6 httpuv_1.5.2

Sara Fonseca Costa

April 20, 2020
