Last updated: 2025-08-25
Knit directory: genomics_ancest_disease_dispar/
This reproducible R Markdown analysis was created with workflowr (version 1.7.1). The Checks tab describes the reproducibility checks that were applied when the results were created. The Past versions tab lists the development history.
Great! Since the R Markdown file has been committed to the Git repository, you know the exact version of the code that produced these results.
Great job! The global environment was empty. Objects defined in the global environment can affect the analysis in your R Markdown file in unknown ways. For reproducibility it’s best to always run the code in an empty environment.
The command set.seed(20220216) was run prior to running the code in the R Markdown file. Setting a seed ensures that any results that rely on randomness, e.g. subsampling or permutations, are reproducible.
Great job! Recording the operating system, R version, and package versions is critical for reproducibility.
Nice! There were no cached chunks for this analysis, so you can be confident that you successfully produced the results during this run.
Great job! Using relative paths to the files within your workflowr project makes it easier to run your code on other machines.
Great! You are using Git for version control. Tracking code development and connecting the code version to the results is critical for reproducibility.
The results in this page were generated with repository version b710a4d. See the Past versions tab to see a history of the changes made to the R Markdown and HTML files.
Note that you need to be careful to ensure that all relevant files for the analysis have been committed to Git prior to generating the results (you can use wflow_publish or wflow_git_commit). workflowr only checks the R Markdown file, but you know if there are other scripts or data files that it depends on. Below is the status of the Git repository when the results were generated:
Ignored files:
Ignored: .Rproj.user/
Ignored: data/gwas_catalog/
Ignored: output/gwas_study_info_cohort_corrected.csv
Ignored: output/gwas_study_info_trait_corrected.csv
Ignored: output/gwas_study_info_trait_ontology_info.csv
Ignored: output/trait_ontology/
Untracked files:
Untracked: .DS_Store
Untracked: data/.DS_Store
Untracked: renv/
Unstaged changes:
Modified: .Rprofile
Modified: analysis/collapse_traits.Rmd
Modified: analysis/disease_inves_by_ancest.Rmd
Modified: analysis/index.Rmd
Modified: analysis/replication_ancestry_bias.Rmd
Note that any generated files, e.g. HTML, png, CSS, etc., are not included in this status report because it is ok for generated content to have uncommitted changes.
These are the previous versions of the repository in which changes were made to the R Markdown (analysis/trait_ontology_collapse.Rmd) and HTML (docs/trait_ontology_collapse.html) files. If you’ve configured a remote Git repository (see ?wflow_git_remote), click on the hyperlinks in the table below to view the files as they were in that past version.
File | Version | Author | Date | Message |
---|---|---|---|---|
Rmd | b710a4d | IJbeasley | 2025-08-25 | Update trait ontology investigation to group disease terms |
html | 243218e | IJbeasley | 2025-08-24 | Build site. |
Rmd | 6ceba65 | IJbeasley | 2025-08-24 | workflowr::wflow_publish("analysis/trait_ontology_collapse.Rmd") |
knitr::opts_chunk$set(echo = TRUE,
message = FALSE,
warning = FALSE
)
library(data.table)
library(dplyr)
library(ggplot2)
library(stringr)
library(httr)
library(jsonlite)
url <-
"http://www.ebi.ac.uk/ols4/api/ontologies/mondo/terms/http%253A%252F%252Fpurl.obolibrary.org%252Fobo%252FMONDO_0700096/descendants"
mondo_descendants <- c()
repeat {
res <- GET(url)
stop_for_status(res)
data <- fromJSON(content(res, as = "text", encoding = "UTF-8"))
mondo_descendants <- c(mondo_descendants, data$`_embedded`$terms$label)
# check if there is a next page
if (!is.null(data$`_links`$`next`$href)) {
url <- data$`_links`$`next`$href
} else {
break
}
}
length(mondo_descendants)
mondo_descendants[1:5]
writeLines(mondo_descendants, here::here("output/trait_ontology/mondo_0700096_descendants.txt"))
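Every ontology in this analysis is fetched with the same paginated loop. Only the ontology name and the term IRI change between fetches.

Below is a minimal sketch of a reusable helper. It assumes the OLS4 conventions the loops already rely on:

- the term IRI is URL-encoded twice in the path, which is why the hard-coded URLs contain `%253A`;
- the next page is exposed at `_links$next$href`.

`ols_term_url()` and `fetch_ols_descendants()` are hypothetical names, not part of any package.

```r
# Hypothetical helper: build the OLS4 "descendants" endpoint for a term IRI.
# The IRI is percent-encoded twice, matching the hand-written URLs above.
ols_term_url <- function(ontology, iri,
                         base = "http://www.ebi.ac.uk/ols4/api/ontologies") {
  once <- utils::URLencode(iri, reserved = TRUE)
  # repeated = TRUE is required: by default URLencode() returns a string that
  # already contains %xx escapes unchanged, which would skip the second pass
  twice <- utils::URLencode(once, reserved = TRUE, repeated = TRUE)
  paste0(base, "/", ontology, "/terms/", twice, "/descendants")
}

# Hypothetical helper: follow `_links$next$href` page by page and return
# every descendant label (same logic as the repeat loops in this analysis).
fetch_ols_descendants <- function(ontology, iri) {
  url <- ols_term_url(ontology, iri)
  labels <- character(0)
  repeat {
    res <- httr::GET(url)
    httr::stop_for_status(res)
    page <- jsonlite::fromJSON(httr::content(res, as = "text", encoding = "UTF-8"))
    labels <- c(labels, page$`_embedded`$terms$label)
    next_url <- page$`_links`$`next`$href
    if (is.null(next_url)) break  # last page reached
    url <- next_url
  }
  labels
}
```

For example, `fetch_ols_descendants("efo", "http://www.ebi.ac.uk/efo/EFO_0000408")` should target the same endpoint as the EFO loop that follows. Building the URL from the plain IRI avoids hand-writing double-encoded strings.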
url <-
"http://www.ebi.ac.uk/ols4/api/ontologies/efo/terms/http%253A%252F%252Fwww.ebi.ac.uk%252Fefo%252FEFO_0000408/descendants"
efo_descendants <- c()
repeat {
res <- GET(url)
stop_for_status(res)
data <- fromJSON(content(res, as = "text", encoding = "UTF-8"))
efo_descendants <- c(efo_descendants, data$`_embedded`$terms$label)
# check if there is a next page
if (!is.null(data$`_links`$`next`$href)) {
url <- data$`_links`$`next`$href
} else {
break
}
}
length(efo_descendants)
efo_descendants[1:5]
writeLines(efo_descendants, here::here("output/trait_ontology/efo_0000408_descendants.txt"))
url <- "http://www.ebi.ac.uk/ols4/api/ontologies/ncit/terms/http%253A%252F%252Fpurl.obolibrary.org%252Fobo%252FNCIT_C2991/descendants"
ncit_descendants <- c()
repeat {
res <- GET(url)
stop_for_status(res)
data <- fromJSON(content(res, as = "text", encoding = "UTF-8"))
ncit_descendants <- c(ncit_descendants, data$`_embedded`$terms$label)
# check if there is a next page
if (!is.null(data$`_links`$`next`$href)) {
url <- data$`_links`$`next`$href
} else {
break
}
}
length(ncit_descendants)
ncit_descendants[1:5]
writeLines(ncit_descendants, here::here("output/trait_ontology/ncit_C2991_descendants.txt"))
url <- "http://www.ebi.ac.uk/ols4/api/ontologies/ordo/terms/http%253A%252F%252Fwww.orpha.net%252FORDO%252FOrphanet_557493/descendants"
orphanet_descendants <- c()
repeat {
res <- GET(url)
stop_for_status(res)
data <- fromJSON(content(res, as = "text", encoding = "UTF-8"))
orphanet_descendants <- c(orphanet_descendants, data$`_embedded`$terms$label)
# check if there is a next page
if (!is.null(data$`_links`$`next`$href)) {
url <- data$`_links`$`next`$href
} else {
break
}
}
length(orphanet_descendants)
orphanet_descendants[1:5]
writeLines(orphanet_descendants, here::here("output/trait_ontology/orphanet_557493_descendants.txt"))
url <- "http://www.ebi.ac.uk/ols4/api/ontologies/efo/terms/http%253A%252F%252Fpurl.obolibrary.org%252Fobo%252FOBI_1110122/descendants"
path_descendants <- c()
repeat {
res <- GET(url)
stop_for_status(res)
data <- fromJSON(content(res, as = "text", encoding = "UTF-8"))
path_descendants <- c(path_descendants, data$`_embedded`$terms$label)
# check if there is a next page
if (!is.null(data$`_links`$`next`$href)) {
url <- data$`_links`$`next`$href
} else {
break
}
}
length(path_descendants)
path_descendants[1:5]
writeLines(path_descendants, here::here("output/trait_ontology/obi_1110122_descendants.txt"))
url <- "http://www.ebi.ac.uk/ols4/api/ontologies/efo/terms/http%253A%252F%252Fpurl.obolibrary.org%252Fobo%252FOGMS_0000063/descendants"
disease_course_descendants <- c()
repeat {
res <- GET(url)
stop_for_status(res)
data <- fromJSON(content(res, as = "text", encoding = "UTF-8"))
disease_course_descendants <- c(disease_course_descendants, data$`_embedded`$terms$label)
# check if there is a next page
if (!is.null(data$`_links`$`next`$href)) {
url <- data$`_links`$`next`$href
} else {
break
}
}
length(disease_course_descendants)
disease_course_descendants[1:5]
writeLines(disease_course_descendants, here::here("output/trait_ontology/ogms_0000063_descendants.txt"))
url <- "http://www.ebi.ac.uk/ols4/api/ontologies/efo/terms/http%253A%252F%252Fpurl.obolibrary.org%252Fobo%252FHP_0000118/descendants"
hp_descendants <- c()
repeat {
res <- GET(url)
stop_for_status(res)
data <- fromJSON(content(res, as = "text", encoding = "UTF-8"))
hp_descendants <- c(hp_descendants, data$`_embedded`$terms$label)
# check if there is a next page
if (!is.null(data$`_links`$`next`$href)) {
url <- data$`_links`$`next`$href
} else {
break
}
}
length(hp_descendants)
[1] 2625
hp_descendants[1:5]
[1] "hypoxia"
[2] "necrosis"
[3] "Abnormality of limbs"
[4] "Abnormality of the musculoskeletal system"
[5] "Vocal cord dysfunction"
writeLines(hp_descendants, here::here("output/trait_ontology/hp_0000118_descendants.txt"))
# Define the API endpoint
url <- "http://www.ebi.ac.uk/ols4/api/ontologies/efo/terms/http%253A%252F%252Fwww.ebi.ac.uk%252Fefo%252FEFO_0001444/descendants"
measurement_descendants <- c()
repeat {
res <- GET(url)
stop_for_status(res)
data <- fromJSON(content(res, as = "text", encoding = "UTF-8"))
measurement_descendants <- c(measurement_descendants, data$`_embedded`$terms$label)
# check if there is a next page
if (!is.null(data$`_links`$`next`$href)) {
url <- data$`_links`$`next`$href
} else {
break
}
}
length(measurement_descendants)
measurement_descendants[1:5]
writeLines(measurement_descendants, here::here("output/trait_ontology/efo_0001444_descendants.txt"))
url <- "http://www.ebi.ac.uk/ols4/api/ontologies/efo/terms/http%253A%252F%252Fwww.ebi.ac.uk%252Fefo%252FEFO_0004574/descendants"
total_choles_descendants <- c()
repeat {
res <- GET(url)
stop_for_status(res)
data <- fromJSON(content(res, as = "text", encoding = "UTF-8"))
total_choles_descendants <- c(total_choles_descendants, data$`_embedded`$terms$label)
# check if there is a next page
if (!is.null(data$`_links`$`next`$href)) {
url <- data$`_links`$`next`$href
} else {
break
}
}
writeLines(total_choles_descendants, here::here("output/trait_ontology/efo_0004574_descendants.txt"))
length(total_choles_descendants)
total_choles_descendants[1:5]
Includes:
url <- "http://www.ebi.ac.uk/ols4/api/ontologies/go/terms/http%253A%252F%252Fpurl.obolibrary.org%252Fobo%252FGO_0050896/descendants"
go_response_descendants <- c()
repeat {
res <- GET(url)
stop_for_status(res)
data <- fromJSON(content(res, as = "text", encoding = "UTF-8"))
go_response_descendants <- c(go_response_descendants, data$`_embedded`$terms$label)
# check if there is a next page
if (!is.null(data$`_links`$`next`$href)) {
url <- data$`_links`$`next`$href
} else {
break
}
}
length(go_response_descendants)
go_response_descendants[1:5]
writeLines(go_response_descendants, here::here("output/trait_ontology/go_0050896_descendants.txt"))
url <- "http://www.ebi.ac.uk/ols4/api/ontologies/efo/terms/http%253A%252F%252Fpurl.obolibrary.org%252Fobo%252FGO_0050896/descendants"
efo_response_descendants <- c()
repeat {
res <- GET(url)
stop_for_status(res)
data <- fromJSON(content(res, as = "text", encoding = "UTF-8"))
efo_response_descendants <- c(efo_response_descendants, data$`_embedded`$terms$label)
# check if there is a next page
if (!is.null(data$`_links`$`next`$href)) {
url <- data$`_links`$`next`$href
} else {
break
}
}
length(efo_response_descendants)
efo_response_descendants[1:5]
writeLines(efo_response_descendants, here::here("output/trait_ontology/efo_go_0050896_descendants.txt"))
url <- "http://www.ebi.ac.uk/ols4/api/ontologies/efo/terms/http%253A%252F%252Fwww.ebi.ac.uk%252Fefo%252FEFO_0000651/descendants"
phenotype_descendants <- c()
repeat {
res <- GET(url)
stop_for_status(res)
data <- fromJSON(content(res, as = "text", encoding = "UTF-8"))
phenotype_descendants <- c(phenotype_descendants, data$`_embedded`$terms$label)
# check if there is a next page
if (!is.null(data$`_links`$`next`$href)) {
url <- data$`_links`$`next`$href
} else {
break
}
}
length(phenotype_descendants)
phenotype_descendants[1:5]
writeLines(phenotype_descendants, here::here("output/trait_ontology/efo_0000651_descendants.txt"))
url <- "http://www.ebi.ac.uk/ols4/api/ontologies/efo/terms/http%253A%252F%252Fwww.ebi.ac.uk%252Fefo%252FEFO_0004323/descendants"
mental_descendants <- c()
repeat {
res <- GET(url)
stop_for_status(res)
data <- fromJSON(content(res, as = "text", encoding = "UTF-8"))
mental_descendants <- c(mental_descendants, data$`_embedded`$terms$label)
# check if there is a next page
if (!is.null(data$`_links`$`next`$href)) {
url <- data$`_links`$`next`$href
} else {
break
}
}
length(mental_descendants)
mental_descendants[1:5]
mental_descendants = c("mental process",
"cognitive function measurement",
mental_descendants)
writeLines(mental_descendants, here::here("output/trait_ontology/efo_0004323_descendants.txt"))
url <- "http://www.ebi.ac.uk/ols4/api/ontologies/efo/terms/http%253A%252F%252Fpurl.obolibrary.org%252Fobo%252FGO_0007610/descendants"
behavior_descendants <- c()
repeat {
res <- GET(url)
stop_for_status(res)
data <- fromJSON(content(res, as = "text", encoding = "UTF-8"))
behavior_descendants <- c(behavior_descendants, data$`_embedded`$terms$label)
# check if there is a next page
if (!is.null(data$`_links`$`next`$href)) {
url <- data$`_links`$`next`$href
} else {
break
}
}
length(behavior_descendants)
behavior_descendants[1:5]
behavior_descendants = c("behavior", behavior_descendants)
writeLines(behavior_descendants, here::here("output/trait_ontology/go_0007610_descendants.txt"))
url <- "http://www.ebi.ac.uk/ols4/api/ontologies/efo/terms/http%253A%252F%252Fwww.ebi.ac.uk%252Fefo%252FEFO_0000546/descendants"
injury_descendants <- c()
repeat {
res <- GET(url)
stop_for_status(res)
data <- fromJSON(content(res, as = "text", encoding = "UTF-8"))
injury_descendants <- c(injury_descendants, data$`_embedded`$terms$label)
# check if there is a next page
if (!is.null(data$`_links`$`next`$href)) {
url <- data$`_links`$`next`$href
} else {
break
}
}
length(injury_descendants)
injury_descendants[1:5]
injury_descendants = c("injury", injury_descendants)
writeLines(injury_descendants, here::here("output/trait_ontology/efo_0000546_descendants.txt"))
enviro_factors <- c(
"income",
"socioeconomic status",
"encounter with health service related to socioeconomic and psychosocial circumstances",
"townsend deprivation index",
"household income",
"social deprivation",
"economic and social preference",
"physical activity",
"exercise",
"family relationship"
)
gwas_study_info <- fread(here::here("output/gwas_study_info_cohort_corrected.csv"))
all_gwas_terms = unique(gwas_study_info$MAPPED_TRAIT)
all_gwas_terms = stringr::str_trim(tolower(all_gwas_terms))
efo_descendants <- readLines(here::here("output/trait_ontology/efo_0000408_descendants.txt"))
mondo_descendants <- readLines(here::here("output/trait_ontology/mondo_0700096_descendants.txt"))
ncit_descendants <- readLines(here::here("output/trait_ontology/ncit_C2991_descendants.txt"))
orphanet_descendants <- readLines(here::here("output/trait_ontology/orphanet_557493_descendants.txt"))
disease_terms = c(mondo_descendants,
efo_descendants,
ncit_descendants,
orphanet_descendants) |>
unique()
disease_terms = stringr::str_trim(tolower(disease_terms))
print("Number of terms related to disease or disorder")
[1] "Number of terms related to disease or disorder"
length(disease_terms)
[1] 53251
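Each term list is read, lower-cased, trimmed and deduplicated in the same way.

Below is a minimal sketch of a helper for that pattern. It assumes one label per line, as written by `writeLines()` above. `read_term_set()` is hypothetical, and it uses base `trimws()` as a close stand-in for `stringr::str_trim()`.

```r
# Hypothetical helper: read one label per line from one or more files,
# lower-case and trim each label, then drop blanks and duplicates.
read_term_set <- function(paths) {
  terms <- unlist(lapply(paths, readLines, warn = FALSE), use.names = FALSE)
  terms <- trimws(tolower(terms))  # trims ASCII whitespace; str_trim() also handles Unicode spaces
  unique(terms[nzchar(terms)])
}
```

Usage would look like this:

`read_term_set(file.path(here::here("output/trait_ontology"), c("mondo_0700096_descendants.txt", "efo_0000408_descendants.txt")))`

The code above calls `unique()` before `tolower()`, so labels that differ only in case survive there. Deduplicating after lower-casing collapses them as well.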
not_simple_disease_terms = all_gwas_terms[!all_gwas_terms %in% disease_terms]
# sometimes there's multiple terms - check if any disease term is in these gwas terms
multiple_terms = grep(",", not_simple_disease_terms, value = T)
mask <- Reduce(`|`, lapply(disease_terms, function(x) grepl(x, multiple_terms)))
additional_disease_gwas <- multiple_terms[mask]
disease_gwas = c(all_gwas_terms[all_gwas_terms %in% disease_terms],
additional_disease_gwas)
not_disease_terms = not_simple_disease_terms[!not_simple_disease_terms %in% additional_disease_gwas]
print("Number of GWAS traits under disease or disorder terms")
[1] "Number of GWAS traits under disease or disorder terms"
length(all_gwas_terms) - length(not_disease_terms)
[1] 3377
print("Percentage of GWAS traits under disease or disorder terms")
[1] "Percentage of GWAS traits under disease or disorder terms"
round(100 * (length(all_gwas_terms) - length(not_disease_terms)) / length(all_gwas_terms),
digits = 1)
[1] 14.8
print("Percentage of GWAS traits not under disease or disorder terms")
[1] "Percentage of GWAS traits not under disease or disorder terms"
round(100 * length(not_disease_terms) / length(all_gwas_terms),
digits = 1)
[1] 85.2
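The `grepl()` mask above interprets each ontology label as a regular expression. A label containing metacharacters such as parentheses or `+` is therefore not matched literally.

Below is a minimal sketch of the same substring test with `fixed = TRUE`. `substring_mask()` and the toy labels are hypothetical.

```r
# Hypothetical helper: TRUE where any term occurs as a literal substring of
# the trait string (the Reduce/grepl idea above, but with fixed = TRUE).
substring_mask <- function(traits, terms) {
  Reduce(`|`,
         lapply(terms, function(term) grepl(term, traits, fixed = TRUE)),
         rep(FALSE, length(traits)))  # start value also covers an empty `terms`
}

# Toy, made-up labels for illustration only
toy_traits <- c("body height, hepatitis b (chronic)", "body height, body weight")
toy_terms  <- c("hepatitis b (chronic)", "asthma")

grepl(toy_terms[1], toy_traits)        # FALSE FALSE: "(chronic)" is read as a regex group
substring_mask(toy_traits, toy_terms)  # TRUE FALSE
```

Literal matching still finds a term inside a longer word, for example "tic disorder" inside "autistic disorder". The regex version does the same, so this only changes how special characters are handled.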
find_disease_terms = function(MAPPED_TRAIT) {
# find all disease terms that appear in the trait
split_mapped_traits <- stringr::str_split(MAPPED_TRAIT, ", ") |>
unlist()
mapped_disease_terms = disease_terms[disease_terms %in% split_mapped_traits]
mapped_disease_terms = unique(mapped_disease_terms)
return(paste0(mapped_disease_terms, collapse = ", ")) # combine multiple matches
}
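gwas_study_info <-
gwas_study_info |>
dplyr::rowwise() |>
dplyr::mutate(
# compare on the same lower-cased, trimmed form used to build disease_gwas
disease_terms =
ifelse(stringr::str_trim(tolower(MAPPED_TRAIT)) %in% disease_gwas,
find_disease_terms(stringr::str_trim(tolower(MAPPED_TRAIT))),
NA)
)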
measurement <- readLines(here::here("output/trait_ontology/efo_0001444_descendants.txt"))
total_choles <- readLines(here::here("output/trait_ontology/efo_0004574_descendants.txt"))
measurement <- c(total_choles,
measurement)
measurement <- unique(measurement)
measurement = stringr::str_trim(tolower(measurement))
# Find terms where all comma-split pieces are in measurement
measurement_gwas <- not_disease_terms[
sapply(strsplit(not_disease_terms, ", "), function(parts) {
parts <- trimws(parts) # remove extra spaces
all(parts %in% measurement)
})
]
additional_measurement <- not_disease_terms[not_disease_terms %in% measurement]
measurement_gwas = c(measurement_gwas, additional_measurement) |> unique()
print("Number of GWAS traits under measurement terms")
[1] "Number of GWAS traits under measurement terms"
length(measurement_gwas)
[1] 18226
print("Percentage of GWAS traits under measurement terms")
[1] "Percentage of GWAS traits under measurement terms"
round(100 * length(measurement_gwas) / length(all_gwas_terms),
digits = 1)
[1] 79.7
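The two steps treat multi-trait strings differently:

- The measurement step counts a string only when every comma-separated part is a measurement term.
- The disease step accepts a string when any disease term appears in it.

Below is a minimal sketch of the part-wise rule with both options. It uses `vapply()`, so an empty input returns `logical(0)` rather than the `list()` that `sapply()` gives, which cannot be used as an index. `parts_in_vocab()` and the toy vocabulary are hypothetical.

```r
# Hypothetical helper: for each comma-separated trait string, TRUE when every
# part (rule = "all") or at least one part (rule = "any") is in `vocab`.
parts_in_vocab <- function(traits, vocab, rule = c("all", "any")) {
  rule <- match.arg(rule)
  test <- if (rule == "all") all else any
  vapply(strsplit(traits, ", ", fixed = TRUE),
         function(parts) {
           # guard: all() of an empty vector is TRUE, so an empty string
           # would otherwise be counted as a match
           length(parts) > 0 && test(trimws(parts) %in% vocab)
         },
         logical(1))
}

# Toy, made-up vocabulary and traits
toy_vocab  <- c("hdl cholesterol measurement", "ldl cholesterol measurement", "body height")
toy_traits <- c("hdl cholesterol measurement, ldl cholesterol measurement",
                "body height, type 2 diabetes mellitus")

parts_in_vocab(toy_traits, toy_vocab)                # TRUE FALSE
parts_in_vocab(toy_traits, toy_vocab, rule = "any")  # TRUE TRUE
```

`rule = "any"` is an exact part-wise match. It is stricter than the substring mask used for the disease terms, because it will not find a term embedded inside a longer label.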
not_accounted_for = not_disease_terms[!not_disease_terms %in% measurement_gwas]
print("Percentage of GWAS traits not accounted for by disease, disorder or measurement terms")
[1] "Percentage of GWAS traits not accounted for by disease, disorder or measurement terms"
round(100 * length(not_accounted_for) / length(all_gwas_terms),
digits = 1)
[1] 5.6
print("Number of GWAS traits not accounted for by disease, disorder or measurement terms")
[1] "Number of GWAS traits not accounted for by disease, disorder or measurement terms"
length(not_accounted_for)
[1] 1274
go_response = readLines(here::here("output/trait_ontology/go_0050896_descendants.txt"))
efo_response <- readLines(here::here("output/trait_ontology/efo_go_0050896_descendants.txt"))
response <- c(go_response,
efo_response,
"response to stimulus")
response <- unique(response)
response = stringr::str_trim(tolower(response))
response_gwas = not_accounted_for[not_accounted_for %in% response]
print("Percentage of GWAS traits under response terms")
[1] "Percentage of GWAS traits under response terms"
round(100 * length(response_gwas) / length(all_gwas_terms),
digits = 1)
[1] 0.6
not_accounted_for = not_accounted_for[!not_accounted_for %in% response_gwas]
print("Percentage of GWAS traits not accounted for by disease, measurement or response terms")
[1] "Percentage of GWAS traits not accounted for by disease, measurement or response terms"
round(100 * length(not_accounted_for) / length(all_gwas_terms),
digits = 1)
[1] 5
print("Number of GWAS traits not accounted for by disease, measurement or response terms")
[1] "Number of GWAS traits not accounted for by disease, measurement or response terms"
length(not_accounted_for)
[1] 1141
pheno_abnorm <- readLines(here::here("output/trait_ontology/hp_0000118_descendants.txt"))
pheno_abnorm = stringr::str_trim(tolower(pheno_abnorm))
pheno_abnorm_gwas = not_accounted_for[not_accounted_for %in% pheno_abnorm]
print("Percentage of GWAS traits under phenotype abnormality terms")
[1] "Percentage of GWAS traits under phenotype abnormality terms"
round(100 * length(pheno_abnorm_gwas) / length(all_gwas_terms),
digits = 1)
[1] 1.5
not_accounted_for = not_accounted_for[!not_accounted_for %in% pheno_abnorm_gwas]
print("Percentage of GWAS traits not accounted for so far")
[1] "Percentage of GWAS traits not accounted for so far"
round(100 * length(not_accounted_for) / length(all_gwas_terms),
digits = 1)
[1] 3.5
print("Number of GWAS traits not accounted for so far")
[1] "Number of GWAS traits not accounted for so far"
length(not_accounted_for)
[1] 804
mental <- readLines(here::here("output/trait_ontology/efo_0004323_descendants.txt"))
mental = stringr::str_trim(tolower(mental))
mental_gwas = not_accounted_for[not_accounted_for %in% mental]
print("Percentage of GWAS traits under mental process terms")
[1] "Percentage of GWAS traits under mental process terms"
round(100 * length(mental_gwas) / length(all_gwas_terms),
digits = 1)
[1] 0.1
not_accounted_for = not_accounted_for[!not_accounted_for %in% mental_gwas]
print("Percentage of GWAS traits not accounted for thus far")
[1] "Percentage of GWAS traits not accounted for thus far"
round(100 * length(not_accounted_for) / length(all_gwas_terms),
digits = 1)
[1] 3.4
print("Number of GWAS traits not accounted for thus far")
[1] "Number of GWAS traits not accounted for thus far"
length(not_accounted_for)
[1] 786
behavior <- readLines(here::here("output/trait_ontology/go_0007610_descendants.txt"))
behavior = stringr::str_trim(tolower(behavior))
behavior_gwas = not_accounted_for[not_accounted_for %in% behavior]
print("Percentage of GWAS traits under behavior terms")
[1] "Percentage of GWAS traits under behavior terms"
round(100 * length(behavior_gwas) / length(all_gwas_terms),
digits = 1)
[1] 0.1
not_accounted_for = not_accounted_for[!not_accounted_for %in% behavior_gwas]
print("Percentage of GWAS traits not accounted for so far")
[1] "Percentage of GWAS traits not accounted for so far"
round(100 * length(not_accounted_for) / length(all_gwas_terms),
digits = 1)
[1] 3.3
print("Number of GWAS traits not accounted for so far")
[1] "Number of GWAS traits not accounted for so far"
length(not_accounted_for)
[1] 763
injury <- readLines(here::here("output/trait_ontology/efo_0000546_descendants.txt"))
injury = stringr::str_trim(tolower(injury))
injury_gwas = not_accounted_for[not_accounted_for %in% injury]
print("Percentage of GWAS traits under injury terms")
[1] "Percentage of GWAS traits under injury terms"
round(100 * length(injury_gwas) / length(all_gwas_terms),
digits = 1)
[1] 0.1
not_accounted_for = not_accounted_for[!not_accounted_for %in% injury_gwas]
print("Percentage of GWAS traits not accounted for so far")
[1] "Percentage of GWAS traits not accounted for so far"
round(100 * length(not_accounted_for) / length(all_gwas_terms),
digits = 1)
[1] 3.3
print("Number of GWAS traits not accounted for so far")
[1] "Number of GWAS traits not accounted for so far"
length(not_accounted_for)
[1] 744
phenotype <- readLines(here::here("output/trait_ontology/efo_0000651_descendants.txt"))
phenotype = stringr::str_trim(tolower(phenotype))
phenotype_gwas = not_accounted_for[not_accounted_for %in% phenotype]
print("Percentage of GWAS traits under phenotype terms")
[1] "Percentage of GWAS traits under phenotype terms"
round(100 * length(phenotype_gwas) / length(all_gwas_terms),
digits = 1)
[1] 0.2
not_accounted_for = not_accounted_for[!not_accounted_for %in% phenotype_gwas]
print("Percentage of GWAS traits not accounted for so far")
[1] "Percentage of GWAS traits not accounted for so far"
round(100 * length(not_accounted_for) / length(all_gwas_terms),
digits = 1)
[1] 3
print("Number of GWAS traits not accounted for so far")
[1] "Number of GWAS traits not accounted for so far"
length(not_accounted_for)
[1] 695
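From the response terms onwards, each step follows one pattern: match the still-unassigned traits against the next vocabulary, then remove the matches.

Below is a minimal sketch of that first-match-wins cascade as a single function. `assign_first_category()` is a hypothetical name, and the vocabularies are toy examples. The order of the list sets the priority.

```r
# Hypothetical helper: label each trait with the name of the first
# vocabulary (in list order) that contains it; unmatched traits get `other`.
assign_first_category <- function(traits, vocabularies, other = "Other") {
  category <- rep(NA_character_, length(traits))
  for (name in names(vocabularies)) {
    hit <- is.na(category) & traits %in% vocabularies[[name]]
    category[hit] <- name
  }
  category[is.na(category)] <- other
  category
}

# Toy, made-up traits and vocabularies
toy_traits <- c("asthma", "body height", "response to drug", "smoking behavior", "income")
toy_vocabs <- list(
  Disease     = c("asthma"),
  Measurement = c("body height", "asthma"),  # "asthma" is already taken by Disease
  Response    = c("response to drug"),
  Behavior    = c("smoking behavior")
)

category <- assign_first_category(toy_traits, toy_vocabs)
category  # "Disease" "Measurement" "Response" "Behavior" "Other"
round(100 * prop.table(table(category)), 1)
```

Because each trait is counted once, in the first list that contains it, the percentages depend on the order of the categories.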
gwas_study_info =
gwas_study_info |>
dplyr::mutate(MAPPED_TRAIT_CATEGORY = {
# compare on the same lower-cased, trimmed form used to build the term sets
trait <- stringr::str_trim(tolower(MAPPED_TRAIT))
dplyr::case_when(trait %in% disease_gwas ~ "Disease/Disorder",
trait %in% pheno_abnorm_gwas ~ "Phenotypic Abornmality",
trait %in% measurement_gwas ~ "Measurement",
trait %in% response_gwas ~ "Response",
TRUE ~ "Other"
)
}
)
gwas_background <- unique(gwas_study_info$MAPPED_BACKGROUND_TRAIT)
gwas_background = stringr::str_trim(tolower(gwas_background))
multiple_terms = grep(",", gwas_background, value = T)
mask <- Reduce(`|`, lapply(disease_terms, function(x) grepl(x, multiple_terms)))
additional_disease_gwas <- multiple_terms[mask]
disease_gwas = c(gwas_background[gwas_background %in% disease_terms],
additional_disease_gwas)
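gwas_study_info <-
gwas_study_info |>
rowwise() |>
dplyr::mutate(
# compare on the same lower-cased, trimmed form used to build disease_gwas
background_disease_terms =
ifelse(stringr::str_trim(tolower(MAPPED_BACKGROUND_TRAIT)) %in% disease_gwas,
find_disease_terms(stringr::str_trim(tolower(MAPPED_BACKGROUND_TRAIT))),
NA)
)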
not_accounted_background = gwas_background[!gwas_background %in% disease_gwas]
# background traits whose comma-separated parts are all measurement terms
measurement_gwas <- not_accounted_background[
vapply(strsplit(not_accounted_background, ", "), function(parts) {
parts <- trimws(parts) # remove extra spaces
all(parts %in% measurement)
}, logical(1))
]
additional_measurement <- not_accounted_background[not_accounted_background %in% measurement]
measurement_gwas = c(measurement_gwas, additional_measurement) |> unique()
not_accounted_background = not_accounted_background[!not_accounted_background %in% measurement_gwas]
response_gwas = not_accounted_background[not_accounted_background %in% response]
gwas_study_info =
gwas_study_info |>
dplyr::mutate(BACKGROUND_TRAIT_CATEGORY = {
# compare on the same lower-cased, trimmed form used to build the term sets
trait <- stringr::str_trim(tolower(MAPPED_BACKGROUND_TRAIT))
dplyr::case_when(trait %in% disease_gwas ~ "Disease/Disorder",
trait %in% measurement_gwas ~ "Measurement",
trait %in% response_gwas ~ "Response",
TRUE ~ "Other"
)
})
gwas_study_info |>
group_by(MAPPED_TRAIT_CATEGORY, BACKGROUND_TRAIT_CATEGORY) |>
summarise(n_studies = n()) |>
arrange(desc(n_studies))
# A tibble: 18 × 3
# Groups: MAPPED_TRAIT_CATEGORY [5]
MAPPED_TRAIT_CATEGORY BACKGROUND_TRAIT_CATEGORY n_studies
<chr> <chr> <int>
1 Measurement Measurement 84977
2 Other Measurement 27173
3 Disease/Disorder Measurement 16585
4 Measurement Disease/Disorder 7252
5 Other Disease/Disorder 5109
6 Phenotypic Abornmality Measurement 436
7 Disease/Disorder Disease/Disorder 365
8 Response Measurement 274
9 Measurement Other 256
10 Response Disease/Disorder 155
11 Other Other 108
12 Disease/Disorder Other 97
13 Phenotypic Abornmality Disease/Disorder 37
14 Response Other 13
15 Phenotypic Abornmality Other 11
16 Measurement Response 5
17 Disease/Disorder Response 1
18 Other Response 1
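`summarise()` drops only the last grouping variable, which is why the tibble above is still grouped by `MAPPED_TRAIT_CATEGORY`.

Below is a minimal sketch of turning such counts into within-category percentages, shown on a toy data frame. The column names match the code above; the rows are made up.

```r
library(dplyr)

# Toy, made-up studies; only the two category columns are needed
toy <- data.frame(
  MAPPED_TRAIT_CATEGORY     = c("Measurement", "Measurement", "Measurement", "Disease/Disorder"),
  BACKGROUND_TRAIT_CATEGORY = c("Measurement", "Measurement", "Disease/Disorder", "Measurement")
)

shares <- toy |>
  count(MAPPED_TRAIT_CATEGORY, BACKGROUND_TRAIT_CATEGORY, name = "n_studies") |>
  group_by(MAPPED_TRAIT_CATEGORY) |>  # denominator: each mapped-trait category
  mutate(pct_within_mapped = round(100 * n_studies / sum(n_studies), 1)) |>
  ungroup() |>
  arrange(desc(n_studies))
shares
```

`count()` returns an ungrouped table, so the explicit `group_by()` is what makes each mapped-trait category the denominator. The final `ungroup()` avoids carrying the grouping into later steps.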
# a study counts as a disease study if either its mapped or its background trait is a disease
gwas_study_info =
gwas_study_info |>
dplyr::ungroup() |>
dplyr::mutate(DISEASE_STUDY = MAPPED_TRAIT_CATEGORY == "Disease/Disorder" | BACKGROUND_TRAIT_CATEGORY == "Disease/Disorder")
combined_disease_terms = function(MAPPED_TRAIT_1, MAPPED_TRAIT_2){
MAPPED_TRAIT_1 = stringr::str_split(MAPPED_TRAIT_1, ", ") |> unlist()
MAPPED_TRAIT_2 = stringr::str_split(MAPPED_TRAIT_2, ", ") |> unlist()
all_mapped_disease_terms =
c(MAPPED_TRAIT_1, MAPPED_TRAIT_2) |>
unique()
combined_mapped_disease_terms = paste0(all_mapped_disease_terms,
collapse = ", ")
return(combined_mapped_disease_terms)
}
gwas_study_info <-
gwas_study_info |>
dplyr::rowwise() |>
dplyr::mutate(all_disease_terms =
case_when(is.na(background_disease_terms) & is.na(disease_terms) ~ NA,
is.na(background_disease_terms) & !is.na(disease_terms) ~ disease_terms,
!is.na(background_disease_terms) & is.na(disease_terms) ~ background_disease_terms,
!is.na(background_disease_terms) & !is.na(disease_terms) ~
combined_disease_terms(background_disease_terms,
disease_terms))
) |>
dplyr::ungroup()
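The row-wise `case_when()` above merges the background and mapped disease terms.

Below is a minimal vectorised sketch with `mapply()`. It assumes both columns hold comma-separated strings or `NA`; `combine_term_strings()` is a hypothetical name.

The sketch also treats an empty string as "no terms". This matters because `find_disease_terms()` returns `""` for a trait that was flagged by substring matching but has no comma-separated part that is an exact disease label.

```r
# Hypothetical helper: merge two comma-separated term strings element-wise,
# keeping the first occurrence of each term; NA when no term is left.
combine_term_strings <- function(a, b) {
  mapply(function(x, y) {
    parts <- c(strsplit(x, ", ", fixed = TRUE)[[1]],
               strsplit(y, ", ", fixed = TRUE)[[1]])
    parts <- unique(parts[!is.na(parts) & nzchar(parts)])  # drop NA and ""
    if (length(parts) == 0) NA_character_ else paste(parts, collapse = ", ")
  }, a, b, USE.NAMES = FALSE)
}

combine_term_strings(c("asthma", NA, NA, "gout"),
                     c("asthma, eczema", "psoriasis", NA, ""))
# "asthma, eczema" "psoriasis" NA "gout"
```

To keep the same order as `combined_disease_terms(background_disease_terms, disease_terms)`, pass the background column first:

`combine_term_strings(gwas_study_info$background_disease_terms, gwas_study_info$disease_terms)`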
data.table::fwrite(gwas_study_info,
here::here("output/gwas_study_info_trait_ontology_info.csv"),
sep = ",")
sessionInfo()
R version 4.3.1 (2023-06-16)
Platform: aarch64-apple-darwin20 (64-bit)
Running under: macOS 15.6.1
Matrix products: default
BLAS: /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib/libRlapack.dylib; LAPACK version 3.11.0
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
time zone: America/Los_Angeles
tzcode source: internal
attached base packages:
[1] stats graphics grDevices datasets utils methods base
other attached packages:
[1] jsonlite_2.0.0 httr_1.4.7 stringr_1.5.1 ggplot2_3.5.2
[5] dplyr_1.1.4 data.table_1.17.8 workflowr_1.7.1
loaded via a namespace (and not attached):
[1] gtable_0.3.6 compiler_4.3.1 renv_1.0.3 promises_1.3.3
[5] tidyselect_1.2.1 Rcpp_1.1.0 git2r_0.36.2 callr_3.7.6
[9] later_1.4.2 jquerylib_0.1.4 scales_1.4.0 yaml_2.3.10
[13] fastmap_1.2.0 here_1.0.1 R6_2.6.1 generics_0.1.4
[17] curl_6.4.0 knitr_1.50 tibble_3.3.0 rprojroot_2.1.0
[21] RColorBrewer_1.1-3 bslib_0.9.0 pillar_1.11.0 rlang_1.1.6
[25] utf8_1.2.6 cachem_1.1.0 stringi_1.8.7 httpuv_1.6.16
[29] xfun_0.52 getPass_0.2-4 fs_1.6.6 sass_0.4.10
[33] cli_3.6.5 withr_3.0.2 magrittr_2.0.3 ps_1.9.1
[37] grid_4.3.1 digest_0.6.37 processx_3.8.6 rstudioapi_0.17.1
[41] lifecycle_1.0.4 vctrs_0.6.5 evaluate_1.0.4 glue_1.8.0
[45] farver_2.1.2 whisker_0.4.1 rmarkdown_2.29 tools_4.3.1
[49] pkgconfig_2.0.3 htmltools_0.5.8.1