Last updated: 2025-09-22
Checks: 7 passed, 0 failed
Knit directory: genomics_ancest_disease_dispar/
This reproducible R Markdown analysis was created with workflowr (version 1.7.1). The Checks tab describes the reproducibility checks that were applied when the results were created. The Past versions tab lists the development history.
Great! Since the R Markdown file has been committed to the Git repository, you know the exact version of the code that produced these results.
Great job! The global environment was empty. Objects defined in the global environment can affect the analysis in your R Markdown file in unknown ways. For reproducibility it’s best to always run the code in an empty environment.
The command set.seed(20220216) was run prior to running the code in the R Markdown file. Setting a seed ensures that any results that rely on randomness, e.g. subsampling or permutations, are reproducible.
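As a minimal illustration of the seed check, the sketch below draws the same random sample twice after resetting the seed. The seed value is the one reported above; the sampled range and the variable names are arbitrary choices made for this example only.

```r
# Draw a random subsample, reset the seed, and draw again.
set.seed(20220216)
first_draw <- sample(1:100, 5)

set.seed(20220216)
second_draw <- sample(1:100, 5)

# With the same seed, both draws contain the same values in the same order.
identical(first_draw, second_draw)
```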
Great job! Recording the operating system, R version, and package versions is critical for reproducibility.
Nice! There were no cached chunks for this analysis, so you can be confident that you successfully produced the results during this run.
Great job! Using relative paths to the files within your workflowr project makes it easier to run your code on other machines.
Great! You are using Git for version control. Tracking code development and connecting the code version to the results is critical for reproducibility.
The results in this page were generated with repository version 3305f6a. See the Past versions tab to see a history of the changes made to the R Markdown and HTML files.
Note that you need to be careful to ensure that all relevant files for the analysis have been committed to Git prior to generating the results (you can use wflow_publish or wflow_git_commit). workflowr only checks the R Markdown file, but you know if there are other scripts or data files that it depends on. Below is the status of the Git repository when the results were generated:
Ignored files:
Ignored: .DS_Store
Ignored: .Rproj.user/
Ignored: data/.DS_Store
Ignored: data/gbd/.DS_Store
Ignored: data/gbd/IHME-GBD_2021_DATA-d8cf695e-1.csv
Ignored: data/gbd/ihme_gbd_2019_global_disease_burden_rate_all_ages.csv
Ignored: data/gbd/ihme_gbd_2019_global_paf_rate_percent_all_ages.csv
Ignored: data/gbd/ihme_gbd_2021_global_disease_burden_rate_all_ages.csv
Ignored: data/gbd/ihme_gbd_2021_global_paf_rate_percent_all_ages.csv
Ignored: data/gwas_catalog/
Ignored: data/who/
Ignored: output/gwas_cat/
Ignored: output/gwas_study_info_cohort_corrected.csv
Ignored: output/gwas_study_info_trait_corrected.csv
Ignored: output/gwas_study_info_trait_ontology_info.csv
Ignored: output/gwas_study_info_trait_ontology_info_l1.csv
Ignored: output/gwas_study_info_trait_ontology_info_l2.csv
Ignored: output/trait_ontology/
Ignored: renv/
Unstaged changes:
Modified: analysis/index.Rmd
Modified: code/get_term_descendants.R
Note that any generated files, e.g. HTML, png, CSS, etc., are not included in this status report because it is ok for generated content to have uncommitted changes.
These are the previous versions of the repository in which changes were made to the R Markdown (analysis/level_2_disease_group.Rmd) and HTML (docs/level_2_disease_group.html) files. If you’ve configured a remote Git repository (see ?wflow_git_remote), click on the hyperlinks in the table below to view the files as they were in that past version.
File | Version | Author | Date | Message |
---|---|---|---|---|
Rmd | 3305f6a | IJbeasley | 2025-09-22 | …maybe fixing typos |
html | 200442f | IJbeasley | 2025-09-17 | Build site. |
Rmd | 614204e | IJbeasley | 2025-09-17 | More fixing up of disease grouping |
html | 7b87d93 | IJbeasley | 2025-09-17 | Build site. |
Rmd | da5c4b4 | IJbeasley | 2025-09-17 | More correction to cardiovascular disease terms |
html | 08b0db3 | IJbeasley | 2025-09-17 | Build site. |
Rmd | bb8ae95 | IJbeasley | 2025-09-17 | Better grouping of cardiovascular disease |
html | 7cf2803 | IJbeasley | 2025-09-17 | Build site. |
Rmd | 39262a4 | IJbeasley | 2025-09-17 | More typo fixing |
html | c0cf9bd | IJbeasley | 2025-09-16 | Build site. |
Rmd | 3519a0b | IJbeasley | 2025-09-16 | Collapsing traits to gbd |
html | f1b18b0 | IJbeasley | 2025-09-16 | Build site. |
Rmd | afe44b4 | IJbeasley | 2025-09-16 | Collapsing traits to gbd |
html | c204ac4 | IJbeasley | 2025-09-16 | Build site. |
Rmd | 7fa03f5 | IJbeasley | 2025-09-16 | More cancer typos |
html | 8f1639b | IJbeasley | 2025-09-16 | Build site. |
Rmd | 345ad9b | IJbeasley | 2025-09-16 | More cancer typos |
html | a15dd40 | IJbeasley | 2025-09-16 | Build site. |
Rmd | 16ead66 | IJbeasley | 2025-09-16 | Correcting some cancer grouping |
html | 6018e42 | IJbeasley | 2025-09-16 | Build site. |
Rmd | 02a0b9d | IJbeasley | 2025-09-16 | Improving cancer grouping |
html | 6f66696 | IJbeasley | 2025-09-16 | Build site. |
Rmd | 66cff1c | IJbeasley | 2025-09-16 | Even more disease term grouping |
html | 21b6c02 | IJbeasley | 2025-09-15 | Build site. |
html | 5ec3111 | IJbeasley | 2025-09-15 | Build site. |
html | 30d773e | IJbeasley | 2025-09-15 | Build site. |
html | 8d64a38 | IJbeasley | 2025-09-15 | Build site. |
Rmd | b3088d8 | IJbeasley | 2025-09-15 | workflowr::wflow_publish("analysis/level_2_disease_group.Rmd") |
html | b89d661 | IJbeasley | 2025-09-10 | Build site. |
Rmd | c0fcab7 | IJbeasley | 2025-09-10 | workflowr::wflow_publish("analysis/level_2_disease_group.Rmd") |
html | ead4d8e | IJbeasley | 2025-09-10 | Build site. |
Rmd | 3964f77 | IJbeasley | 2025-09-10 | workflowr::wflow_publish("analysis/level_2_disease_group.Rmd") |
html | 8fb639d | IJbeasley | 2025-09-10 | Build site. |
Rmd | edeb6f5 | IJbeasley | 2025-09-10 | workflowr::wflow_publish("analysis/level_2_disease_group.Rmd") |
html | fe91704 | IJbeasley | 2025-09-09 | Build site. |
Rmd | 9c64867 | IJbeasley | 2025-09-09 | Minor fixing of disease trait categorisation |
html | fa509c0 | IJbeasley | 2025-09-08 | Build site. |
Rmd | c9602c7 | IJbeasley | 2025-09-08 | More grouping to match GBD |
library(dplyr)
library(data.table)
library(ggplot2)
library(stringr)
source(here::here("code/get_term_descendants.R"))
gwas_study_info <- fread(here::here("output/gwas_cat/gwas_study_info_group_l1_v2.csv"))
gwas_study_info = gwas_study_info |>
mutate(l2_all_disease_terms = l1_all_disease_terms)
gwas_study_info |>
filter(grepl("lip and oral cavity cancer", l2_all_disease_terms)) |>
select(all_disease_terms, l2_all_disease_terms) |>
distinct() |>
head()
all_disease_terms
<char>
1: oral cavity cancer
2: mouth neoplasm
3: tongue cancer
4: major salivary gland cancer
5: human papilloma virus infection, oral cavity cancer
6: tongue neoplasm
l2_all_disease_terms
<char>
1: lip and oral cavity cancer
2: lip and oral cavity cancer
3: lip and oral cavity cancer
4: lip and oral cavity cancer
5: human papilloma virus infection, lip and oral cavity cancer
6: lip and oral cavity cancer
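The same filter, select, distinct, head check is repeated for every disease group in this report. A minimal sketch of wrapping it in a helper is shown below. The function name show_term_mapping and the toy data frame are hypothetical and not part of the original analysis. The sketch uses base R, and it matches the label as a fixed string rather than as a regular expression, which is an assumption that differs from the grepl() calls in the report.

```r
# Hypothetical helper (not in the original analysis): list the distinct pairs of
# original and grouped terms whose grouped label contains `term`.
show_term_mapping <- function(study_info, term, n = 6) {
  study_info <- as.data.frame(study_info)  # also accepts a data.table
  hits <- grepl(term, study_info$l2_all_disease_terms, fixed = TRUE)
  pairs <- unique(study_info[hits, c("all_disease_terms", "l2_all_disease_terms")])
  head(pairs, n)
}

# Toy data standing in for gwas_study_info (values copied from the output above).
toy_study_info <- data.frame(
  all_disease_terms = c("oral cavity cancer", "mouth neoplasm",
                        "mouth neoplasm", "gastric carcinoma"),
  l2_all_disease_terms = c("lip and oral cavity cancer", "lip and oral cavity cancer",
                           "lip and oral cavity cancer", "stomach cancer")
)

show_term_mapping(toy_study_info, "lip and oral cavity cancer")
```

With a helper like this, each per-group check would reduce to a single call with the group label.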
gwas_study_info =
gwas_study_info |>
mutate(l2_all_disease_terms =
stringr::str_replace_all(l2_all_disease_terms ,
pattern = "nasopharyngeal cancer",
"nasopharynx cancer"
)
)
gwas_study_info |>
filter(grepl("nasopharynx cancer", l2_all_disease_terms)) |>
select(all_disease_terms, l2_all_disease_terms) |>
distinct()
all_disease_terms l2_all_disease_terms
<char> <char>
1: nasopharyngeal neoplasm nasopharynx cancer
gwas_study_info |>
filter(grepl("other pharynx cancer", l2_all_disease_terms)) |>
select(all_disease_terms, l2_all_disease_terms) |>
distinct() |>
head()
all_disease_terms
<char>
1: oropharynx cancer
2: laryngeal squamous cell carcinoma, hypopharynx cancer
3: human papilloma virus infection, oropharynx cancer
4: tonsil cancer
5: hypopharyngeal carcinoma
6: pharynx cancer, laryngeal carcinoma
l2_all_disease_terms
<char>
1: other pharynx cancer
2: larynx cancer, other pharynx cancer
3: human papilloma virus infection, other pharynx cancer
4: other pharynx cancer
5: other pharynx cancer
6: larynx cancer, other pharynx cancer
gwas_study_info |>
filter(grepl("esophageal cancer", l2_all_disease_terms)) |>
select(all_disease_terms, l2_all_disease_terms) |>
distinct() |>
head()
all_disease_terms
<char>
1: esophageal adenocarcinoma, barretts esophagus
2: esophageal adenocarcinoma
3: esophageal adenocarcinoma, gastroesophageal reflux disease
4: esophageal squamous cell carcinoma
5: esophageal carcinoma, gastric carcinoma
6: squamous cell carcinoma, esophageal carcinoma
l2_all_disease_terms
<char>
1: barretts esophagus, esophageal cancer
2: esophageal cancer
3: esophageal cancer, gastroesophageal reflux disease
4: esophageal cancer
5: esophageal cancer, stomach cancer
6: esophageal cancer
gwas_study_info |>
filter(grepl("stomach cancer", l2_all_disease_terms)) |>
select(all_disease_terms, l2_all_disease_terms) |>
distinct() |>
head()
all_disease_terms
<char>
1: gastric carcinoma
2: esophageal carcinoma, gastric carcinoma
3: gastric cardia carcinoma
4: gastric adenocarcinoma
5: lung carcinoma, squamous cell carcinoma, gastric carcinoma
6: gastric cancer
l2_all_disease_terms
<char>
1: stomach cancer
2: esophageal cancer, stomach cancer
3: stomach cancer
4: stomach cancer
5: lung cancer, stomach cancer
6: stomach cancer
gwas_study_info = gwas_study_info |>
mutate(l2_all_disease_terms =
stringr::str_replace_all(l2_all_disease_terms ,
pattern = "colorectal cancer",
"colon and rectum cancer"
)
)
gwas_study_info |>
filter(grepl("colon and rectum cancer", l2_all_disease_terms)) |>
select(all_disease_terms, l2_all_disease_terms) |>
distinct() |>
head()
all_disease_terms
<char>
1: colorectal cancer
2: sclerosing cholangitis, colorectal cancer
3: colorectal cancer, colorectal adenoma
4: metastatic colorectal cancer
5: lung carcinoma, estrogen-receptor negative breast cancer, ovarian endometrioid carcinoma, colorectal cancer, prostate carcinoma, ovarian serous carcinoma, breast carcinoma, ovarian carcinoma, lung adenocarcinoma, squamous cell lung carcinoma, cancer
6: rectum cancer
l2_all_disease_terms
<char>
1: colon and rectum cancer
2: colon and rectum cancer, sclerosing cholangitis
3: benign neoplasm, colon and rectum cancer
4: colon and rectum cancer
5: breast cancer, cancer, colon and rectum cancer, lung cancer, ovarian cancer, prostate cancer
6: colon and rectum cancer
gwas_study_info |>
filter(grepl("liver cancer", l2_all_disease_terms)) |>
select(all_disease_terms, l2_all_disease_terms) |>
distinct() |>
head()
all_disease_terms
<char>
1: hepatitis b virus infection, hepatocellular carcinoma
2: sclerosing cholangitis, cholangiocarcinoma
3: cholangiocarcinoma, sclerosing cholangitis
4: sclerosing cholangitis, hepatocellular carcinoma
5: hepatitis c virus infection, hepatocellular carcinoma
6: hepatocellular carcinoma, non-alcoholic steatohepatitis
l2_all_disease_terms
<char>
1: hepatitis b infection, liver cancer
2: liver cancer, sclerosing cholangitis
3: liver cancer, sclerosing cholangitis
4: liver cancer, sclerosing cholangitis
5: hepatitis c infection, liver cancer
6: liver cancer, non-alcoholic fatty liver disease
gwas_study_info =
gwas_study_info |>
mutate(l2_all_disease_terms =
case_when(
l2_all_disease_terms == "cancer of gallbladder and extrahepatic biliary tract" ~ "gallbladder and biliary tract cancer",
TRUE ~ l2_all_disease_terms
)
)
gwas_study_info |>
filter(grepl("gallbladder and biliary tract cancer", l2_all_disease_terms)) |>
select(all_disease_terms, l2_all_disease_terms) |>
distinct()
all_disease_terms
<char>
1: sclerosing cholangitis, gallbladder neoplasm
2: gallbladder neoplasm
l2_all_disease_terms
<char>
1: gallbladder and biliary tract cancer, sclerosing cholangitis
2: gallbladder and biliary tract cancer
gwas_study_info |>
filter(grepl("pancreatic cancer", l2_all_disease_terms)) |>
select(all_disease_terms, l2_all_disease_terms) |>
distinct() |>
head()
all_disease_terms
<char>
1: pancreatic carcinoma
2: pancreatic ductal adenocarcinoma
3: pancreatic carcinoma, neutropenia
4: breast carcinoma, estrogen-receptor positive breast cancer, metastatic prostate cancer, pancreatic carcinoma, hypertension
5: breast carcinoma, estrogen-receptor positive breast cancer, metastatic prostate cancer, pancreatic carcinoma, proteinuria
6: breast carcinoma, estrogen-receptor positive breast cancer, metastatic prostate cancer, pancreatic carcinoma, hypertension, proteinuria
l2_all_disease_terms
<char>
1: pancreatic cancer
2: pancreatic cancer
3: neutropenia, pancreatic cancer
4: breast cancer, hypertension, pancreatic cancer, prostate cancer
5: breast cancer, pancreatic cancer, prostate cancer, proteinuria
6: breast cancer, hypertension, pancreatic cancer, prostate cancer, proteinuria
gwas_study_info |>
filter(grepl("larynx cancer", l2_all_disease_terms)) |>
select(all_disease_terms, l2_all_disease_terms) |>
distinct() |>
head()
all_disease_terms
<char>
1: laryngeal squamous cell carcinoma
2: laryngeal squamous cell carcinoma, hypopharynx cancer
3: laryngeal carcinoma
4: laryngeal neoplasm
5: glottis neoplasm
6: pharynx cancer, laryngeal carcinoma
l2_all_disease_terms
<char>
1: larynx cancer
2: larynx cancer, other pharynx cancer
3: larynx cancer
4: larynx cancer
5: larynx cancer
6: larynx cancer, other pharynx cancer
resp_cancer_terms = c("lung cancer",
"bronchus cancer",
"tracheal cancer",
"respiratory system cancer"
)
gwas_study_info =
gwas_study_info |>
mutate(l2_all_disease_terms =
ifelse(l2_all_disease_terms != "tracheal bronchus and lung cancer",
stringr::str_replace_all(l2_all_disease_terms,
# anchor every term on both sides: word boundary before, comma or end of string after
pattern = paste0("\\b", resp_cancer_terms, "(?=,|$)", collapse = "|"),
"tracheal bronchus and lung cancer"
),
l2_all_disease_terms
)
)
gwas_study_info |>
filter(grepl("tracheal bronchus and lung cancer", l2_all_disease_terms)) |>
select(all_disease_terms, l2_all_disease_terms) |>
distinct() |>
head()
all_disease_terms
<char>
1: non-small cell lung carcinoma
2: lung adenocarcinoma
3: squamous cell lung carcinoma
4: lung carcinoma, family history of lung cancer
5: lung adenocarcinoma, family history of lung cancer
6: squamous cell lung carcinoma, family history of lung cancer
l2_all_disease_terms
<char>
1: tracheal bronchus and lung cancer
2: tracheal bronchus and lung cancer
3: tracheal bronchus and lung cancer
4: tracheal bronchus and lung cancer
5: tracheal bronchus and lung cancer
6: tracheal bronchus and lung cancer
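The ifelse() guard in the chunk above only skips rows whose whole value is already "tracheal bronchus and lung cancer". Because "lung cancer" is also a substring of that label, a multi-term row that already contains the label would be rewritten a second time if the chunk were re-run. The sketch below shows one possible way to make the replacement safe to repeat. It is an assumption, not the approach used in the original analysis: it uses base R gsub() with perl = TRUE and a negative lookbehind, and handles only the "lung cancer" term for brevity.

```r
target <- "tracheal bronchus and lung cancer"

# Match "lung cancer" as a whole list entry ending (comma or end of string),
# but not when it is already part of the target label.
pattern <- "(?<!tracheal bronchus and )\\blung cancer(?=,|$)"

terms <- c("lung cancer",
           "breast cancer, lung cancer",
           "tracheal bronchus and lung cancer",
           "lung cancer, stomach cancer")

once  <- gsub(pattern, target, terms, perl = TRUE)
twice <- gsub(pattern, target, once, perl = TRUE)

once
identical(once, twice)  # a second pass changes nothing
```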
gwas_study_info =
gwas_study_info |>
mutate(l2_all_disease_terms =
stringr::str_replace_all(l2_all_disease_terms,
pattern = "malignant melanoma of skin",
"malignant skin melanoma"
)
)
gwas_study_info |>
filter(grepl("malignant skin melanoma", l2_all_disease_terms)) |>
select(all_disease_terms, l2_all_disease_terms) |>
distinct() |>
head()
all_disease_terms
<char>
1: cutaneous melanoma
2: melanoma
3: neuroblastoma, cutaneous melanoma
4: skin cancer
5: skin carcinoma
6: melanoma, immune system toxicity
l2_all_disease_terms
<char>
1: malignant skin melanoma
2: malignant skin melanoma
3: malignant skin melanoma, neuroblastoma
4: malignant skin melanoma, non-melanoma skin cancer
5: malignant skin melanoma, non-melanoma skin cancer
6: immune system toxicity, malignant skin melanoma
gwas_study_info |>
filter(grepl("non-melanoma skin cancer", l2_all_disease_terms)) |>
select(all_disease_terms, l2_all_disease_terms) |>
distinct()
all_disease_terms
<char>
1: squamous cell carcinoma, basal cell carcinoma
2: keratinocyte carcinoma
3: basal cell carcinoma
4: non-melanoma skin carcinoma
5: skin cancer
6: skin neoplasm
7: skin carcinoma in situ
8: skin carcinoma
l2_all_disease_terms
<char>
1: non-melanoma skin cancer
2: non-melanoma skin cancer
3: non-melanoma skin cancer
4: non-melanoma skin cancer
5: malignant skin melanoma, non-melanoma skin cancer
6: non-melanoma skin cancer
7: non-melanoma skin cancer
8: malignant skin melanoma, non-melanoma skin cancer
gwas_study_info =
gwas_study_info |>
mutate(l2_all_disease_terms =
stringr::str_replace_all(l2_all_disease_terms ,
pattern = "soft tissue sarcoma",
"soft tissue and other extraosseous sarcomas"
)
)
gwas_study_info |>
filter(grepl("soft tissue and other extraosseous sarcomas", l2_all_disease_terms)) |>
select(all_disease_terms, l2_all_disease_terms) |>
distinct() |>
head()
all_disease_terms l2_all_disease_terms
<char> <char>
1: sarcoma, fibrosarcoma sarcoma, soft tissue and other extraosseous sarcomas
2: kaposis sarcoma soft tissue and other extraosseous sarcomas
gwas_study_info =
gwas_study_info |>
mutate(l2_all_disease_terms =
stringr::str_replace_all(l2_all_disease_terms ,
pattern = "bone cancer|osteosarcoma",
"malignant neoplasm of bone and articular cartilage"
)
)
gwas_study_info |>
filter(grepl("malignant neoplasm of bone and articular cartilage", l2_all_disease_terms)) |>
select(all_disease_terms, l2_all_disease_terms) |>
distinct() |>
head()
all_disease_terms
<char>
1: osteosarcoma
2: acute myeloid leukemia
3: myeloid leukemia
4: malignant bone neoplasm
5: acute myeloid leukemia, myelodysplastic syndrome
6: bone neoplasm
l2_all_disease_terms
<char>
1: malignant neoplasm of bone and articular cartilage
2: malignant neoplasm of bone and articular cartilage
3: malignant neoplasm of bone and articular cartilage
4: malignant neoplasm of bone and articular cartilage
5: malignant neoplasm of bone and articular cartilage, myelodysplastic syndrome
6: malignant neoplasm of bone and articular cartilage
gwas_study_info |>
filter(grepl("breast cancer", l2_all_disease_terms)) |>
select(all_disease_terms, l2_all_disease_terms) |>
distinct() |>
head()
all_disease_terms
<char>
1: estrogen-receptor negative breast cancer
2: breast carcinoma
3: estrogen-receptor positive breast cancer
4: breast carcinoma,
5: estrogen-receptor positive breast cancer, breast carcinoma
6: estrogen-receptor negative breast cancer, breast carcinoma
l2_all_disease_terms
<char>
1: breast cancer
2: breast cancer
3: breast cancer
4: breast cancer
5: breast cancer
6: breast cancer
gwas_study_info |>
filter(grepl("cervical cancer", l2_all_disease_terms)) |>
select(all_disease_terms, l2_all_disease_terms) |>
distinct() |>
head()
all_disease_terms
<char>
1: cervical carcinoma
2: cervical cancer
3: dysplasia of cervix, cervical cancer
4: dysplasia, cervical cancer
5: uterine cervix carcinoma in situ
6: cervical carcinoma, human papilloma virus infection
l2_all_disease_terms
<char>
1: cervical cancer
2: cervical cancer
3: cervical cancer, dysplasia of cervix
4: cervical cancer, dysplasia
5: cervical cancer
6: cervical cancer, human papilloma virus infection
# Open question: is endometrial cancer a subset of uterine cancer in GBD?
# It is in the ontology: http://purl.obolibrary.org/obo/MONDO_0002715
gwas_study_info =
gwas_study_info |>
mutate(l2_all_disease_terms =
stringr::str_replace_all(l2_all_disease_terms ,
pattern = "endometrial cancer",
"uterine cancer"
)
)
gwas_study_info |>
filter(grepl("uterine cancer", l2_all_disease_terms)) |>
select(all_disease_terms, l2_all_disease_terms) |>
distinct() |>
head()
all_disease_terms l2_all_disease_terms
<char> <char>
1: endometrial endometrioid carcinoma uterine cancer
2: endometrial carcinoma uterine cancer
3: endometrial neoplasm uterine cancer
4: uterine carcinoma uterine cancer
5: endometrial cancer, covid-19 uterine cancer
6: uterine corpus cancer uterine cancer
gwas_study_info |>
filter(grepl("ovarian cancer", l2_all_disease_terms)) |>
select(all_disease_terms, l2_all_disease_terms) |>
distinct() |>
head()
all_disease_terms
<char>
1: ovarian carcinoma
2: malignant epithelial tumor of ovary
3: prostate carcinoma, breast carcinoma, ovarian carcinoma
4: lung carcinoma, estrogen-receptor negative breast cancer, ovarian endometrioid carcinoma, colorectal cancer, prostate carcinoma, ovarian serous carcinoma, breast carcinoma, ovarian carcinoma, lung adenocarcinoma, squamous cell lung carcinoma, cancer
5: ovarian mucinous adenocarcinoma
6: ovarian serous carcinoma
l2_all_disease_terms
<char>
1: ovarian cancer
2: ovarian cancer
3: breast cancer, ovarian cancer, prostate cancer
4: breast cancer, cancer, colon and rectum cancer, tracheal bronchus and lung cancer, ovarian cancer, prostate cancer
5: ovarian cancer
6: ovarian cancer
gwas_study_info |>
filter(grepl("prostate cancer", l2_all_disease_terms)) |>
select(all_disease_terms, l2_all_disease_terms) |>
distinct() |>
head()
all_disease_terms
<char>
1: prostate carcinoma
2: cancer aggressiveness measurement, prostate carcinoma
3: prostate carcinoma, breast carcinoma, ovarian carcinoma
4: metastatic prostate cancer, peripheral neuropathy
5: metastatic prostate cancer
6: prostate carcinoma, erectile dysfunction
l2_all_disease_terms
<char>
1: prostate cancer
2: measurement, prostate cancer
3: breast cancer, ovarian cancer, prostate cancer
4: peripheral neuropathy, prostate cancer
5: prostate cancer
6: erectile dysfunction, prostate cancer
gwas_study_info |>
filter(grepl("testicular cancer", l2_all_disease_terms)) |>
select(all_disease_terms, l2_all_disease_terms) |>
distinct()
all_disease_terms
<char>
1: testicular carcinoma
2: testicular carcinoma, cardiovascular disease
3: testicular neoplasm
l2_all_disease_terms
<char>
1: testicular cancer
2: cardiovascular disease, testicular cancer
3: testicular cancer
gwas_study_info |>
filter(grepl("kidney cancer", l2_all_disease_terms)) |>
select(all_disease_terms, l2_all_disease_terms) |>
distinct() |>
head()
all_disease_terms l2_all_disease_terms
<char> <char>
1: renal cell carcinoma kidney cancer
2: nephroblastoma kidney cancer
3: kidney cancer kidney cancer
4: renal carcinoma kidney cancer
5: papillary renal cell carcinoma kidney cancer
6: kidney neoplasm kidney cancer
gwas_study_info =
gwas_study_info |>
mutate(l2_all_disease_terms =
stringr::str_replace_all(l2_all_disease_terms ,
pattern = "urinary bladder cancer",
"bladder cancer"
)
)
gwas_study_info |>
filter(grepl("bladder cancer", l2_all_disease_terms)) |>
select(all_disease_terms, l2_all_disease_terms) |>
distinct()
all_disease_terms
<char>
1: urinary bladder carcinoma
2: urinary bladder carcinoma, disease progression measurement
3: urinary bladder cancer,
4: urinary bladder cancer
l2_all_disease_terms
<char>
1: bladder cancer
2: bladder cancer
3: bladder cancer
4: bladder cancer
gwas_study_info =
gwas_study_info |>
mutate(l2_all_disease_terms =
stringr::str_replace_all(l2_all_disease_terms ,
pattern = "\\bcentral nervous system cancer\\b",
"brain and central nervous system cancer"
)
)
gwas_study_info |>
filter(grepl("brain and central nervous system cancer", l2_all_disease_terms)) |>
select(all_disease_terms, l2_all_disease_terms) |>
distinct() |>
head()
all_disease_terms
<char>
1: glioblastoma multiforme
2: central nervous system cancer
3: glioma
4: central nervous system cancer, glioma
5: central nervous system cancer, glioblastoma multiforme
6: brain neoplasm
l2_all_disease_terms
<char>
1: brain and central nervous system cancer
2: brain and central nervous system cancer
3: brain and central nervous system cancer
4: brain and central nervous system cancer
5: brain and central nervous system cancer
6: brain and central nervous system cancer
gwas_study_info =
gwas_study_info |>
mutate(l2_all_disease_terms =
stringr::str_replace_all(l2_all_disease_terms ,
pattern = "\\bocular melanoma\\b|\\bocular cancer\\b",
"eye cancer"
)
)
gwas_study_info |>
filter(grepl("eye cancer", l2_all_disease_terms)) |>
select(all_disease_terms, l2_all_disease_terms) |>
distinct() |>
head()
all_disease_terms l2_all_disease_terms
<char> <char>
1: uveal melanoma eye cancer
2: choroidal melanoma eye cancer
3: epithelioid cell uveal melanoma eye cancer
4: uveal melanoma, epithelioid cell uveal melanoma eye cancer
5: uveal melanoma disease severity eye cancer
6: ocular cancer eye cancer
gwas_study_info =
gwas_study_info |>
mutate(l2_all_disease_terms =
stringr::str_replace_all(l2_all_disease_terms ,
pattern = "\\bneuroblastoma\\b|\\bperipheral nervous system cancer\\b",
"neuroblastoma and other peripheral nervous cell tumors"
)
)
gwas_study_info |>
filter(grepl("neuroblastoma and other peripheral nervous cell tumors", l2_all_disease_terms)) |>
select(all_disease_terms, l2_all_disease_terms) |>
distinct() |>
head()
all_disease_terms
<char>
1: neuroblastoma
2: neuroblastoma, cutaneous melanoma
l2_all_disease_terms
<char>
1: neuroblastoma and other peripheral nervous cell tumors
2: malignant skin melanoma, neuroblastoma and other peripheral nervous cell tumors
gwas_study_info |>
filter(grepl("thyroid cancer", l2_all_disease_terms)) |>
select(all_disease_terms, l2_all_disease_terms) |>
distinct() |>
head()
all_disease_terms l2_all_disease_terms
<char> <char>
1: differentiated thyroid carcinoma thyroid cancer
2: papillary thyroid carcinoma thyroid cancer
3: follicular thyroid carcinoma thyroid cancer
4: thyroid carcinoma thyroid cancer
5: thyroid cancer thyroid cancer
gwas_study_info |>
filter(grepl("mesothelioma", l2_all_disease_terms)) |>
select(all_disease_terms, l2_all_disease_terms) |>
distinct() |>
head()
all_disease_terms l2_all_disease_terms
<char> <char>
1: malignant pleural mesothelioma mesothelioma
2: mesothelioma mesothelioma
3: pleural mesothelioma mesothelioma
gwas_study_info =
gwas_study_info |>
mutate(l2_all_disease_terms =
stringr::str_replace_all(l2_all_disease_terms ,
pattern = "hodgkins lymphoma",
"hodgkin lymphoma"
)
)
gwas_study_info |>
filter(grepl("hodgkin lymphoma", l2_all_disease_terms)) |>
select(all_disease_terms, l2_all_disease_terms) |>
distinct() |>
head()
all_disease_terms
<char>
1: diffuse large b-cell lymphoma, multiple sclerosis
2: follicular lymphoma, multiple sclerosis
3: marginal zone b-cell lymphoma, multiple sclerosis
4: diffuse large b-cell lymphoma, rheumatoid arthritis
5: rheumatoid arthritis, follicular lymphoma
6: rheumatoid arthritis, marginal zone b-cell lymphoma
l2_all_disease_terms
<char>
1: multiple sclerosis, non-hodgkin lymphoma
2: multiple sclerosis, non-hodgkin lymphoma
3: multiple sclerosis, non-hodgkin lymphoma
4: non-hodgkin lymphoma, rheumatoid arthritis
5: non-hodgkin lymphoma, rheumatoid arthritis
6: non-hodgkin lymphoma, rheumatoid arthritis
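Every row returned by the check above is a non-Hodgkin lymphoma study, because the plain substring "hodgkin lymphoma" also occurs inside "non-hodgkin lymphoma". A minimal sketch of a stricter check is given below. It is an assumption, not the author's method: it uses base R grepl() with perl = TRUE and a negative lookbehind, and the example labels are written in the style of the output shown in this report.

```r
labels <- c("hodgkin lymphoma",
            "non-hodgkin lymphoma",
            "hodgkin lymphoma, multiple myeloma, non-hodgkin lymphoma")

# Plain substring match: also hits "non-hodgkin lymphoma".
grepl("hodgkin lymphoma", labels)
# TRUE TRUE TRUE

# Negative lookbehind: only hits labels that contain "hodgkin lymphoma"
# without a "non-" prefix directly in front of it.
grepl("(?<!non-)hodgkin lymphoma", labels, perl = TRUE)
# TRUE FALSE TRUE
```

The same caveat applies to the replacement step: rewriting "hodgkins lymphoma" as "hodgkin lymphoma" would also rewrite that substring inside "non-hodgkins lymphoma".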
gwas_study_info =
gwas_study_info |>
mutate(l2_all_disease_terms =
stringr::str_replace_all(l2_all_disease_terms ,
pattern = "non-hodgkins lymphoma",
"non-hodgkin lymphoma"
)
)
gwas_study_info |>
filter(grepl("non-hodgkin lymphoma", l2_all_disease_terms)) |>
select(all_disease_terms, l2_all_disease_terms) |>
distinct() |>
head()
all_disease_terms
<char>
1: diffuse large b-cell lymphoma, multiple sclerosis
2: follicular lymphoma, multiple sclerosis
3: marginal zone b-cell lymphoma, multiple sclerosis
4: diffuse large b-cell lymphoma, rheumatoid arthritis
5: rheumatoid arthritis, follicular lymphoma
6: rheumatoid arthritis, marginal zone b-cell lymphoma
l2_all_disease_terms
<char>
1: multiple sclerosis, non-hodgkin lymphoma
2: multiple sclerosis, non-hodgkin lymphoma
3: multiple sclerosis, non-hodgkin lymphoma
4: non-hodgkin lymphoma, rheumatoid arthritis
5: non-hodgkin lymphoma, rheumatoid arthritis
6: non-hodgkin lymphoma, rheumatoid arthritis
gwas_study_info |>
filter(grepl("multiple myeloma", l2_all_disease_terms)) |>
select(all_disease_terms, l2_all_disease_terms) |>
distinct() |>
head()
all_disease_terms
<char>
1: multiple myeloma
2: multiple myeloma, peripheral neuropathy
3: multiple myeloma, chemotherapy-induced oral mucositis
4: multiple myeloma, monoclonal gammopathy
5: hodgkins lymphoma, multiple myeloma, chronic lymphocytic leukemia
6: hodgkins lymphoma, multiple myeloma, non-hodgkins lymphoma
l2_all_disease_terms
<char>
1: multiple myeloma
2: multiple myeloma, peripheral neuropathy
3: chemotherapy-induced oral mucositis, multiple myeloma
4: monoclonal gammopathy, multiple myeloma
5: hodgkin lymphoma, leukemia, multiple myeloma
6: hodgkin lymphoma, multiple myeloma, non-hodgkin lymphoma
gwas_study_info |>
filter(grepl("leukemia", l2_all_disease_terms)) |>
select(all_disease_terms, l2_all_disease_terms) |>
distinct() |>
head()
all_disease_terms
<char>
1: acute lymphoblastic leukemia
2: multiple sclerosis, chronic lymphocytic leukemia
3: rheumatoid arthritis, chronic lymphocytic leukemia
4: systemic lupus erythematosus, chronic lymphocytic leukemia
5: acute lymphoblastic leukemia, asparaginase-induced acute pancreatitis
6: chronic lymphocytic leukemia
l2_all_disease_terms
<char>
1: leukemia
2: leukemia, multiple sclerosis
3: leukemia, rheumatoid arthritis
4: leukemia, systemic lupus erythematosus
5: leukemia, pancreatitis
6: leukemia
gwas_study_info =
gwas_study_info |>
mutate(l2_all_disease_terms =
case_when(
l2_all_disease_terms == "cancer" ~ "other malignant neoplasms",
TRUE ~ l2_all_disease_terms
)
)
gwas_study_info =
gwas_study_info |>
mutate(l2_all_disease_terms =
ifelse(PUBMED_ID == 27790247,
stringr::str_replace_all(l2_all_disease_terms,
pattern = ", cancer,",
", other malignant neoplasms,"
),
l2_all_disease_terms
)
)
### Dealing with terms that measure factors caused by cancer
gwas_study_info |>
filter(grepl("^cancer,", l2_all_disease_terms)) |>
pull(l2_all_disease_terms) |>
unique()
[1] "cancer, chronic obstructive pulmonary disease"
[2] "cancer, cardiotoxicity"
[3] "cancer, hand-foot syndrome"
[4] "cancer, peripheral neuropathy"
[5] "cancer, immune system toxicity"
[6] "cancer, hypothyroidism"
[7] "cancer, radiation-induced disorder"
[8] "cancer, osteonecrosis"
gwas_study_info =
gwas_study_info |>
mutate(l2_all_disease_terms =
ifelse(grepl("^cancer,", l2_all_disease_terms),
stringr::str_replace_all(l2_all_disease_terms,
pattern = "^cancer,",
"other malignant neoplasms,"
),
l2_all_disease_terms
)
)
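The three steps above relabel the bare term "cancer" by position: as the whole value, between two commas for one study, and at the start of a list. A sketch of an alternative that treats the value as a comma-separated list and replaces only complete entries is shown below. The helper name replace_exact_term is hypothetical and not part of the original analysis. Unlike the original chunks, it replaces the entry wherever it appears in the list and for every study, so whether that broader behaviour is wanted here is an assumption.

```r
# Hypothetical helper (not in the original analysis): replace `from` only where
# it is a complete entry in a ", "-separated list of terms.
replace_exact_term <- function(x, from, to) {
  vapply(strsplit(x, ", ", fixed = TRUE), function(terms) {
    terms[terms == from] <- to
    paste(terms, collapse = ", ")
  }, character(1))
}

terms <- c("cancer",
           "cancer, cardiotoxicity",
           "breast cancer, cancer, ovarian cancer",
           "breast cancer")

replace_exact_term(terms, "cancer", "other malignant neoplasms")
```

Entries such as "breast cancer" are left alone because the comparison is against the whole entry, not a substring.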
other_malignant_terms <- c(
"retroperitoneal cancer",
"peritoneal cancer",
"ewing sarcoma",
"digestive system cancer",
"intestinal cancer",
"small intestine cancer",
"female reproductive organ cancer",
"male reproductive organ cancer",
"vulvar cancer",
"testicular germ cell tumor",
"urogenital cancer",
"squamous cell cancer",
"head and neck cancer",
"malignant tumor of floor of mouth",
"nasal cavity cancer", # unsure whether this belongs in another group
"malignant lymphoid tumor",
"neuroendocrine tumor",
"lymphatic system cancer",
"childhood cancer" # may need to be sorted further
)
gwas_study_info =
gwas_study_info |>
mutate(l2_all_disease_terms =
stringr::str_replace_all(l2_all_disease_terms ,
# anchor every term on both sides: word boundary before, comma or end of string after
pattern = paste0("\\b", other_malignant_terms, "(?=,|$)", collapse = "|"),
"other malignant neoplasms"
)
)
gwas_study_info |>
filter(grepl("other malignant neoplasms", l2_all_disease_terms)) |>
select(all_disease_terms, l2_all_disease_terms) |>
distinct() |>
head()
all_disease_terms
<char>
1: squamous cell carcinoma
2: cancer
3: childhood cancer, cardiomyopathy
4: neuroendocrine neoplasm
5: small intestine neuroendocrine tumor
6: pancreatic neuroendocrine tumor
l2_all_disease_terms
<char>
1: other malignant neoplasms
2: other malignant neoplasms
3: cardiomyopathy, other malignant neoplasms
4: other malignant neoplasms
5: other malignant neoplasms
6: other malignant neoplasms
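When many terms are collapsed into one alternation, each term needs its own anchors so that it matches only as a complete list entry. The sketch below shows one way to build such a pattern. The helper name build_term_pattern is hypothetical and not part of the original analysis, and the sketch uses base R gsub() with perl = TRUE instead of stringr so that it runs without extra packages.

```r
# Hypothetical helper (not in the original analysis): give every term a leading
# word boundary and a trailing "comma or end of string" lookahead.
build_term_pattern <- function(terms) {
  paste0("\\b", terms, "(?=,|$)", collapse = "|")
}

pattern <- build_term_pattern(c("peritoneal cancer", "vulvar cancer"))
pattern

examples <- c("vulvar cancer, asthma",    # complete entry: replaced
              "retroperitoneal cancer",   # no word boundary before "peritoneal": kept
              "peritoneal cancer risk")   # not followed by a comma or the end: kept

gsub(pattern, "other malignant neoplasms", examples, perl = TRUE)
```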
gwas_study_info =
gwas_study_info |>
mutate(l2_all_disease_terms =
case_when(
l2_all_disease_terms == "benign neoplasm" ~ "other neoplasms",
TRUE ~ l2_all_disease_terms
)
)
unknown_sig_terms <- c("intracranial germ cell tumor",
"bladder tumor")
gwas_study_info =
gwas_study_info |>
mutate(l2_all_disease_terms =
stringr::str_replace_all(l2_all_disease_terms ,
# anchor every term on both sides: word boundary before, comma or end of string after
pattern = paste0("\\b", unknown_sig_terms, "(?=,|$)", collapse = "|"),
"other neoplasms"
)
)
gwas_study_info |>
filter(grepl("other neoplasms", l2_all_disease_terms)) |>
select(all_disease_terms, l2_all_disease_terms) |>
distinct() |>
head()
all_disease_terms l2_all_disease_terms
<char> <char>
1: benign prostatic hyperplasia other neoplasms
2: colorectal adenoma other neoplasms
3: colorectal cancer, endometrial neoplasm other neoplasms
4: upper aerodigestive tract neoplasm other neoplasms
5: meningioma other neoplasms
6: pituitary gland adenoma other neoplasms
gwas_study_info |>
filter(grepl("rheumatic heart disease", l2_all_disease_terms)) |>
select(all_disease_terms, l2_all_disease_terms) |>
distinct() |>
head()
all_disease_terms l2_all_disease_terms
<char> <char>
1: rheumatic heart disease rheumatic heart disease
gwas_study_info |>
filter(grepl("ischemic heart disease", l2_all_disease_terms)) |>
select(all_disease_terms, l2_all_disease_terms) |>
distinct() |>
head()
all_disease_terms
<char>
1: non-obstructive coronary artery disease
2: migraine disorder, coronary artery disease
3: coronary artery disease
4: type 2 diabetes mellitus, obesity, coronary artery disease
5: alzheimer disease, type 2 diabetes mellitus, coronary artery disease
6: myocardial infarction
l2_all_disease_terms
<char>
1: non-obstructive ischemic heart disease
2: ischemic heart disease, migraine
3: ischemic heart disease
4: ischemic heart disease, obesity, type 2 diabetes mellitus
5: alzheimers disease, ischemic heart disease, type 2 diabetes mellitus
6: ischemic heart disease
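Ischemic heart disease is treated here as synonymous with coronary artery disease (https://www.ncbi.nlm.nih.gov/books/NBK209964/), so any remaining "coronary artery disease" labels are renamed.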
gwas_study_info = gwas_study_info |>
mutate(l2_all_disease_terms =
stringr::str_replace_all(l2_all_disease_terms,
pattern = "coronary artery disease",
"ischemic heart disease"
)
)
gwas_study_info |>
filter(grepl("stroke", l2_all_disease_terms)) |>
select(all_disease_terms, l2_all_disease_terms) |>
distinct() |>
head()
all_disease_terms
<char>
1: non-lobar intracerebral hemorrhage
2: lobar intracerebral hemorrhage
3: small vessel stroke
4: alzheimer disease, small vessel stroke
5: ischemic stroke
6: ischemic stroke, cardiac embolism
l2_all_disease_terms
<char>
1: non-hemorrhagic stroke
2: hemorrhagic stroke
3: small vessel stroke
4: alzheimers disease, small vessel stroke
5: ischemic stroke
6: cardiac embolism, ischemic stroke
gwas_study_info |>
filter(grepl("hypertensive heart disease", l2_all_disease_terms)) |>
select(all_disease_terms, l2_all_disease_terms) |>
distinct() |>
head()
all_disease_terms
<char>
1: hypertensive heart disease, kidney disease
2: hypertensive heart disease
l2_all_disease_terms
<char>
1: hypertensive heart disease, kidney disease
2: hypertensive heart disease
gwas_study_info |>
filter(grepl("heart valve disease", l2_all_disease_terms)) |>
select(all_disease_terms, l2_all_disease_terms) |>
distinct() |>
head()
all_disease_terms
<char>
1: aortic stenosis, aortic valve calcification
2: aortic stenosis
3: heart valve disease
4: heart valve disease, heart murmur
l2_all_disease_terms
<char>
1: aortic valve calcification, heart valve disease
2: heart valve disease
3: heart valve disease
4: heart murmur, heart valve disease
gwas_study_info = gwas_study_info |>
mutate(l2_all_disease_terms =
stringr::str_replace_all(l2_all_disease_terms ,
pattern = "heart valve disease",
"non-rheumatic valvular heart disease"
)
)
gwas_study_info |>
filter(grepl("non-rheumatic valvular heart disease", l2_all_disease_terms)) |>
select(all_disease_terms, l2_all_disease_terms) |>
distinct() |>
head()
all_disease_terms
<char>
1: aortic stenosis, aortic valve calcification
2: aortic stenosis
3: heart valve disease
4: heart valve disease, heart murmur
l2_all_disease_terms
<char>
1: aortic valve calcification, non-rheumatic valvular heart disease
2: non-rheumatic valvular heart disease
3: non-rheumatic valvular heart disease
4: heart murmur, non-rheumatic valvular heart disease
gwas_study_info =
gwas_study_info |>
mutate(l2_all_disease_terms =
stringr::str_replace_all(l2_all_disease_terms ,
pattern = "\\bcardiomyopathy\\b|\\bmyocarditis\\b",
"cardiomyopathy and myocarditis"
)
)
gwas_study_info |>
filter(grepl("cardiomyopathy and myocarditis", l2_all_disease_terms)) |>
select(all_disease_terms, l2_all_disease_terms) |>
distinct() |>
head()
all_disease_terms
<char>
1: idiopathic dilated cardiomyopathy
2: childhood cancer, cardiomyopathy
3: hypertrophic cardiomyopathy
4: dilated cardiomyopathy
5: chagas cardiomyopathy
6: peripartum cardiomyopathy
l2_all_disease_terms
<char>
1: idiopathic dilated cardiomyopathy and myocarditis
2: cardiomyopathy and myocarditis, other malignant neoplasms
3: hypertrophic cardiomyopathy and myocarditis
4: dilated cardiomyopathy and myocarditis
5: chagas cardiomyopathy and myocarditis
6: peripartum cardiomyopathy and myocarditis
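In the output above, qualifiers such as `idiopathic dilated` survive because the pattern rewrites only the matched word inside a list item. If the intent is to fold the whole item into the group (an assumption about intent, not something the analysis states), one option is to split each string on `", "` and replace complete items. The helper name `collapse_items` is hypothetical, and base R is used so the sketch has no package dependencies.

```r
# Replace every comma-separated item that contains one of `keywords`
# with the single label `group`. Hypothetical helper, not analysis code.
collapse_items <- function(x, keywords, group) {
  keyword_regex <- paste0("\\b(", paste(keywords, collapse = "|"), ")\\b")
  vapply(strsplit(x, ", ", fixed = TRUE), function(items) {
    hit <- grepl(keyword_regex, items, perl = TRUE)
    items[hit] <- group
    paste(items, collapse = ", ")
  }, character(1))
}

x <- c("idiopathic dilated cardiomyopathy",
       "cardiomyopathy, other malignant neoplasms")
collapse_items(x,
               keywords = c("cardiomyopathy", "myocarditis"),
               group = "cardiomyopathy and myocarditis")
# [1] "cardiomyopathy and myocarditis"
# [2] "cardiomyopathy and myocarditis, other malignant neoplasms"
```

The sketch does not handle `NA` inputs and would also fold items that merely mention a keyword, so the keyword list needs the same care as the original patterns.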
gwas_study_info |>
filter(grepl("pulmonary arterial hypertension", l2_all_disease_terms)) |>
select(all_disease_terms, l2_all_disease_terms) |>
distinct() |>
head()
all_disease_terms l2_all_disease_terms
<char> <char>
1: pulmonary arterial hypertension pulmonary arterial hypertension
afib_terms <- c("atrial fibrillation",
"atrial flutter",
"post-operative atrial fibrillation")
gwas_study_info =
gwas_study_info |>
mutate(l2_all_disease_terms =
stringr::str_replace_all(l2_all_disease_terms ,
pattern = paste0(afib_terms, collapse = "(?=,|$)|\\b"),
"atrial fibrillation and flutter"
)
)
gwas_study_info |>
filter(grepl("atrial fibrillation and flutter", l2_all_disease_terms)) |>
select(all_disease_terms, l2_all_disease_terms) |>
distinct() |>
head()
all_disease_terms
<char>
1: atrial fibrillation
2: heart failure, diabetes mellitus, stroke, atrial fibrillation, coronary artery disease, cancer
3: stroke, atrial fibrillation, coronary artery disease, heart failure, diabetes mellitus, cancer
4: atrial flutter
5: post-operative atrial fibrillation
l2_all_disease_terms
<char>
1: atrial fibrillation and flutter
2: atrial fibrillation and flutter, other malignant neoplasms, diabetes mellitus, heart failure, ischemic heart disease, stroke
3: atrial fibrillation and flutter, other malignant neoplasms, diabetes mellitus, heart failure, ischemic heart disease, stroke
4: atrial fibrillation and flutter
5: atrial fibrillation and flutter, post-operative
url <- "http://www.ebi.ac.uk/ols4/api/ontologies/cvdo/terms/http%253A%252F%252Fpurl.obolibrary.org%252Fobo%252FDOID_3627/descendants"
aortic_aneurysm_terms <- get_descendants(url)
[1] "Number of terms collected:"
[1] 6
[1] "\n Some example terms"
[1] "ruptured thoracoabdominal aortic aneurysm"
[2] "ruptured abdominal aortic aneurysm"
[3] "ruptured thoracic aortic aneurysm"
[4] "abdominal aortic aneurysm"
[5] "ruptured aortic aneurysm"
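`get_descendants()` is defined elsewhere in the analysis and is not shown in this section. The sketch below is only a guess at the kind of logic involved: it follows pagination links and collects term labels. The response layout (`_embedded$terms`, `label`, `_links$next$href`), the lower-casing step, and the function name are all assumptions. The fetch step is injectable so the loop can be tried offline; the default would need the jsonlite package and network access.

```r
# Hypothetical paginated label collector for an OLS "descendants" endpoint.
# ASSUMED response layout: terms under `_embedded$terms`, each with a `label`,
# and the next page URL under `_links$next$href`.
collect_descendant_labels <- function(url,
                                      fetch_json = function(u) {
                                        jsonlite::fromJSON(u, simplifyVector = FALSE)
                                      }) {
  labels <- character(0)
  while (!is.null(url)) {
    page <- fetch_json(url)
    page_labels <- vapply(page[["_embedded"]][["terms"]],
                          function(term) term[["label"]], character(1))
    labels <- c(labels, tolower(page_labels))    # lower-casing is an assumption
    url <- page[["_links"]][["next"]][["href"]]  # NULL on the last page
  }
  unique(labels)
}

# Offline usage with two made-up pages (illustration only, not real API output)
fake_pages <- list(
  page1 = list(`_embedded` = list(terms = list(list(label = "Abdominal aortic aneurysm"))),
               `_links` = list(`next` = list(href = "page2"))),
  page2 = list(`_embedded` = list(terms = list(list(label = "ruptured aortic aneurysm"))),
               `_links` = list())
)
collect_descendant_labels("page1", fetch_json = function(u) fake_pages[[u]])
# [1] "abdominal aortic aneurysm" "ruptured aortic aneurysm"
```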
gwas_study_info = gwas_study_info |>
mutate(l2_all_disease_terms =
stringr::str_replace_all(l2_all_disease_terms ,
pattern = paste0(aortic_aneurysm_terms, collapse = "(?=,|$)|\\b"),
"aortic aneurysm"
)
)
gwas_study_info |>
filter(grepl("aortic aneurysm", l2_all_disease_terms)) |>
select(all_disease_terms, l2_all_disease_terms) |>
distinct() |>
head()
all_disease_terms
<char>
1: thoracic aortic aneurysm, abdominal aortic aneurysm, brain aneurysm
2: abdominal aortic aneurysm
3: thoracic aortic aneurysm
4: aortic aneurysm
5: marfan syndrome, thoracic aortic aneurysm
l2_all_disease_terms
<char>
1: aortic aneurysm, brain aneurysm, aortic aneurysm
2: aortic aneurysm
3: aortic aneurysm
4: aortic aneurysm
5: marfan syndrome, aortic aneurysm
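Row 1 of the output above contains `aortic aneurysm` twice, because two different subtypes in the same string were mapped to the same label. A minimal sketch of removing such repeats is given here; the helper name `dedupe_items` is hypothetical and is not part of the analysis.

```r
# Drop repeated items from comma-separated term strings, keeping first
# occurrences in their original order. Hypothetical helper.
dedupe_items <- function(x) {
  vapply(strsplit(x, ", ", fixed = TRUE), function(items) {
    paste(unique(items), collapse = ", ")
  }, character(1))
}

dedupe_items("aortic aneurysm, brain aneurysm, aortic aneurysm")
# [1] "aortic aneurysm, brain aneurysm"
```

Applying something like this after each grouping step, or once at the end, would keep per-study term counts from being inflated by repeats.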
gwas_study_info |>
filter(grepl("lower extremity peripheral arterial disease", l2_all_disease_terms)) |>
select(all_disease_terms, l2_all_disease_terms) |>
distinct() |>
head()
all_disease_terms
<char>
1: peripheral arterial disease
2: type 2 diabetes mellitus, peripheral arterial disease
3: diabetes mellitus, peripheral arterial disease
l2_all_disease_terms
<char>
1: lower extremity peripheral arterial disease
2: lower extremity peripheral arterial disease, type 2 diabetes mellitus
3: diabetes mellitus, lower extremity peripheral arterial disease
gwas_study_info |>
filter(grepl("endocarditis", l2_all_disease_terms)) |>
select(all_disease_terms, l2_all_disease_terms) |>
distinct() |>
head()
all_disease_terms
<char>
1: staphylococcus aureus infection, bacterial endocarditis
2: endocarditis
l2_all_disease_terms
<char>
1: endocarditis, staphylococcus aureus infection
2: endocarditis
other_cardiovascular_terms <- c("tachycardia",
"other cardiac arrhythmias",
"heart block",
"carotid artery disease",
"hypertension",
"pericarditis",
"coronary artery calcification",
"arterial occlusion",
"congenital heart disease",
"other vascular disorders",
"congestive heart failure",
"heart failure",
"thrombotic disease",
"arterial embolism",
"cardiac embolism",
"venous embolism",
"venous thrombosis",
"pulmonary embolism",
"arterial thrombosis",
"thromboembolism",
"vascular insufficiency",
"brain infarction",
"heart murmur"
)
other_cardiovascular_terms <- str_length_sort(other_cardiovascular_terms)
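# Note: "(?=, |^)" is a lookahead, not a lookbehind, so in practice a term only matches at the start of the string.
other_cardiovascular_terms <- paste0("(?=, |^)", other_cardiovascular_terms)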
other_cardiovascular_terms <- paste0(other_cardiovascular_terms, "(?=,|$)")
gwas_study_info =
gwas_study_info |>
mutate(l2_all_disease_terms =
stringr::str_replace_all(l2_all_disease_terms ,
pattern = paste0(other_cardiovascular_terms, collapse = "|"),
"other cardiovascular and circulatory diseases"
)
)
gwas_study_info =
gwas_study_info |>
mutate(l2_all_disease_terms =
stringr::str_replace_all(l2_all_disease_terms ,
pattern = "(?=, |^)other vascular disorders(?=,|$)",
"other cardiovascular and circulatory diseases"
)
)
gwas_study_info |>
filter(grepl("other cardiovascular and circulatory diseases", l2_all_disease_terms)) |>
select(all_disease_terms, l2_all_disease_terms) |>
distinct() |>
head()
all_disease_terms
<char>
1: hypertension
2: ischemic stroke, cardiac embolism
3: ischemic stroke, small artery occlusion
4: congenital heart disease
5: type 2 diabetes mellitus, coronary artery calcification
6: heart failure
l2_all_disease_terms
<char>
1: other cardiovascular and circulatory diseases
2: other cardiovascular and circulatory diseases, ischemic stroke
3: other cardiovascular and circulatory diseases, ischemic stroke
4: other cardiovascular and circulatory diseases
5: other cardiovascular and circulatory diseases, type 2 diabetes mellitus
6: other cardiovascular and circulatory diseases
gwas_study_info |>
filter(grepl("chronic obstructive pulmonary disease", l2_all_disease_terms)) |>
select(all_disease_terms, l2_all_disease_terms) |>
distinct() |>
head()
all_disease_terms
<char>
1: chronic obstructive pulmonary disease
2: chronic obstructive pulmonary disease, chronic bronchitis
3: digestive system carcinoma, chronic obstructive pulmonary disease
4: cancer, chronic obstructive pulmonary disease
5: lung carcinoma, chronic obstructive pulmonary disease
6: asthma, chronic obstructive pulmonary disease
l2_all_disease_terms
<char>
1: chronic obstructive pulmonary disease
2: chronic bronchitis, chronic obstructive pulmonary disease
3: chronic obstructive pulmonary disease, other malignant neoplasms
4: other malignant neoplasms, chronic obstructive pulmonary disease
5: chronic obstructive pulmonary disease, tracheal bronchus and lung cancer
6: asthma, chronic obstructive pulmonary disease
gwas_study_info |>
filter(grepl("pneumoconiosis", l2_all_disease_terms)) |>
select(all_disease_terms, l2_all_disease_terms) |>
distinct() |>
head()
all_disease_terms l2_all_disease_terms
<char> <char>
1: pneumoconiosis pneumoconiosis
gwas_study_info |>
filter(grepl("asthma", l2_all_disease_terms)) |>
select(all_disease_terms, l2_all_disease_terms) |>
distinct() |>
head()
all_disease_terms
<char>
1: asthma
2: asthma, allergic disease
3: childhood onset asthma
4: childhood onset asthma, respiratory symptom change measurement
5: age of onset of asthma
6: aspirin-induced asthma
l2_all_disease_terms
<char>
1: asthma
2: allergic disease, asthma
3: asthma
4: asthma
5: asthma
6: asthma
interstitial_lung_disease_terms <- c("pulmonary sarcoidosis",
"interstitial lung disease"
)
gwas_study_info =
gwas_study_info |>
mutate(l2_all_disease_terms =
stringr::str_replace_all(l2_all_disease_terms ,
pattern = paste0(interstitial_lung_disease_terms,
collapse = "(?=,|$)|\\b"),
"interstitial lung disease and pulmonary sarcoidosis"
)
)
gwas_study_info |>
filter(grepl("interstitial lung disease and pulmonary sarcoidosis", l2_all_disease_terms)) |>
select(all_disease_terms, l2_all_disease_terms) |>
distinct() |>
head()
all_disease_terms
<char>
1: rheumatoid arthritis, interstitial lung disease
2: interstitial lung disease
3: systemic scleroderma, interstitial lung disease
l2_all_disease_terms
<char>
1: interstitial lung disease and pulmonary sarcoidosis, rheumatoid arthritis
2: interstitial lung disease and pulmonary sarcoidosis
3: interstitial lung disease and pulmonary sarcoidosis, systemic scleroderma
gwas_study_info |>
filter(grepl("other chronic respiratory diseases", l2_all_disease_terms)) |>
select(all_disease_terms, l2_all_disease_terms) |>
distinct() |>
head()
Empty data.table (0 rows and 2 cols): all_disease_terms,l2_all_disease_terms
url <- "http://www.ebi.ac.uk/ols4/api/ontologies/snomed/terms/http%253A%252F%252Fsnomed.info%252Fid%252F328383001/descendants"
chronic_liver_disease_terms <- get_descendants(url)
[1] "Number of terms collected:"
[1] 114
[1] "\n Some example terms"
[1] "hepatic ascites co-occurrent with chronic active hepatitis due to toxic liver disease"
[2] "cirrhosis of liver co-occurrent and due to primary sclerosing cholangitis (disorder)"
[3] "chronic hepatitis c co-occurrent with human immunodeficiency virus infection"
[4] "primary biliary cirrhosis co-occurrent with systemic scleroderma (disorder)"
[5] "pulmonary fibrosis, hepatic hyperplasia, bone marrow hypoplasia syndrome"
chronic_liver_disease_terms <- c("primary biliary cirrhosis",
"alcoholic liver cirrhosis",
"chronic hepatitis b virus infection",
"acute-on-chronic liver failure",
"non-alcoholic fatty liver disease",
"cirrhosis of liver",
"chronic hepatitis",
"liver disease",
chronic_liver_disease_terms)
chronic_liver_disease_terms <- str_length_sort(chronic_liver_disease_terms)
chronic_liver_disease_terms <- paste0("(?=, |^)", chronic_liver_disease_terms)
chronic_liver_disease_terms <- paste0(chronic_liver_disease_terms, "(?=,|$)")
pattern = paste0(chronic_liver_disease_terms, collapse = "|")
gwas_study_info = gwas_study_info |>
mutate(l2_all_disease_terms =
stringr::str_replace_all(l2_all_disease_terms,
pattern = pattern,
"cirrhosis and other chronic liver diseases"
)
)
gwas_study_info = gwas_study_info |>
mutate(l2_all_disease_terms =
stringr::str_replace_all(l2_all_disease_terms,
pattern = "(?=, |^)liver disease(?=,|$)",
"cirrhosis and other chronic liver diseases"
)
)
gwas_study_info |>
filter(grepl("cirrhosis and other chronic liver diseases", l2_all_disease_terms)) |>
select(all_disease_terms, l2_all_disease_terms) |>
distinct() |>
head()
all_disease_terms
<char>
1: alcoholic liver cirrhosis, alcoholic pancreatitis
2: primary biliary cirrhosis
3: non-alcoholic fatty liver disease severity measurement
4: non-alcoholic steatohepatitis
5: non-alcoholic fatty liver disease
6: alcoholic liver cirrhosis
l2_all_disease_terms
<char>
1: cirrhosis and other chronic liver diseases, alcoholic pancreatitis
2: cirrhosis and other chronic liver diseases
3: cirrhosis and other chronic liver diseases
4: cirrhosis and other chronic liver diseases
5: cirrhosis and other chronic liver diseases
6: cirrhosis and other chronic liver diseases
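`str_length_sort()` is defined elsewhere in the analysis. The sketch below assumes it orders terms from longest to shortest, which is consistent with the example terms printed by `get_descendants()`, and shows why that order matters in a regex alternation. The function name `str_length_sort_sketch` and the placeholder label `GROUP` are illustrative only, and base R `gsub()` with `perl = TRUE` is used instead of stringr to keep the example dependency-free.

```r
# Hypothetical re-implementation; the real str_length_sort() may differ.
str_length_sort_sketch <- function(terms) {
  terms[order(nchar(terms), decreasing = TRUE)]
}

terms <- c("chronic hepatitis", "chronic hepatitis b virus infection")
x <- "chronic hepatitis b virus infection"

# Shorter term listed first: the alternation takes the first branch that matches
short_first <- gsub(paste(terms, collapse = "|"), "GROUP", x, perl = TRUE)
short_first
# [1] "GROUP b virus infection"

# Longest term first: the whole label is consumed
long_first <- gsub(paste(str_length_sort_sketch(terms), collapse = "|"),
                   "GROUP", x, perl = TRUE)
long_first
# [1] "GROUP"
```

With the trailing `(?=,|$)` used in the analysis, the shorter branch would usually fail its lookahead and the engine would fall through to the longer term, so the sort acts mainly as a safeguard.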
upper_dig_terms <- c("peptic ulcer diseases",
"duodenitis",
"gastritis",
"gastroesophageal reflux disease")
gwas_study_info = gwas_study_info |>
mutate(l2_all_disease_terms =
stringr::str_replace_all(l2_all_disease_terms,
pattern = paste0(upper_dig_terms, collapse = "(?=,|$)|\\b"),
"upper digestive system diseases"
)
)
gwas_study_info |>
filter(grepl("upper digestive system diseases", l2_all_disease_terms)) |>
select(all_disease_terms, l2_all_disease_terms) |>
distinct() |>
head()
all_disease_terms
<char>
1: gastroesophageal reflux disease
2: esophageal adenocarcinoma, gastroesophageal reflux disease
3: barretts esophagus, gastroesophageal reflux disease
4: gastritis
5: atrophic gastritis
6: gastroesophageal reflux disease, major depressive disorder
l2_all_disease_terms
<char>
1: upper digestive system diseases
2: esophageal cancer, upper digestive system diseases
3: barretts esophagus, upper digestive system diseases
4: upper digestive system diseases
5: atrophic upper digestive system diseases
6: upper digestive system diseases, major depressive disorder
gwas_study_info |>
filter(grepl("appendicitis", l2_all_disease_terms)) |>
select(all_disease_terms, l2_all_disease_terms) |>
distinct() |>
head()
all_disease_terms l2_all_disease_terms
<char> <char>
1: appendicitis appendicitis
gwas_study_info = gwas_study_info |>
mutate(l2_all_disease_terms =
stringr::str_replace_all(l2_all_disease_terms,
pattern = "\\bparalytic ileus\\b|\\bintestinal obstruction\\b",
"paralytic ileus and intestinal obstruction"
)
)
gwas_study_info |>
filter(grepl("paralytic ileus and intestinal obstruction", l2_all_disease_terms)) |>
select(all_disease_terms, l2_all_disease_terms) |>
distinct() |>
head()
all_disease_terms l2_all_disease_terms
<char> <char>
1: intestinal obstruction paralytic ileus and intestinal obstruction
2: paralytic ileus paralytic ileus and intestinal obstruction
3: intestinal impaction paralytic ileus and intestinal obstruction
hernia_terms <- c("inguinal hernia",
"femoral hernia",
"abdominal hernia",
"hernia of the abdominal wall")
gwas_study_info = gwas_study_info |>
mutate(l2_all_disease_terms =
stringr::str_replace_all(l2_all_disease_terms,
pattern = paste0(hernia_terms, collapse = "(?=,|$)|\\b"),
"inguinal femoral and abdominal hernia"
)
)
gwas_study_info |>
filter(grepl("inguinal femoral and abdominal hernia", l2_all_disease_terms)) |>
select(all_disease_terms, l2_all_disease_terms) |>
distinct() |>
head()
all_disease_terms l2_all_disease_terms
<char> <char>
1: inguinal hernia inguinal femoral and abdominal hernia
2: hernia of the abdominal wall inguinal femoral and abdominal hernia
3: femoral hernia inguinal femoral and abdominal hernia
ibd_terms <- c("crohns disease",
"ulcerative colitis",
"inflammatory bowel disease")
gwas_study_info = gwas_study_info |>
mutate(l2_all_disease_terms =
stringr::str_replace_all(l2_all_disease_terms,
pattern = paste0(ibd_terms, collapse = "(?=,|$)|\\b"),
"inflammatory bowel disease"
)
)
gwas_study_info |>
filter(grepl("vascular intestinal disorders", l2_all_disease_terms)) |>
select(all_disease_terms, l2_all_disease_terms) |>
distinct() |>
head()
Empty data.table (0 rows and 2 cols): all_disease_terms,l2_all_disease_terms
gal_bile_terms = c("gallbladder disease",
"bile duct disorder",
"biliary tract disease")
gwas_study_info = gwas_study_info |>
mutate(l2_all_disease_terms =
stringr::str_replace_all(l2_all_disease_terms,
pattern = paste0(gal_bile_terms, collapse = "(?=,|$)|\\b"),
"gallbladder and biliary diseases"
)
)
gwas_study_info |>
filter(grepl("gallbladder and biliary diseases", l2_all_disease_terms)) |>
select(all_disease_terms, l2_all_disease_terms) |>
distinct() |>
head()
all_disease_terms
<char>
1: bile duct disorder
2: gallbladder disease
3: biliary tract disease
4: non-neoplastic bile duct disorder
5: biliary tract disease, pancreas disease, liver disease
l2_all_disease_terms
<char>
1: gallbladder and biliary diseases
2: gallbladder and biliary diseases
3: gallbladder and biliary diseases
4: non-neoplastic gallbladder and biliary diseases
5: gallbladder and biliary diseases, liver disease, pancreas disease
gwas_study_info |>
filter(grepl("pancreatitis", l2_all_disease_terms)) |>
select(all_disease_terms, l2_all_disease_terms) |>
distinct() |>
head()
all_disease_terms
<char>
1: alcoholic liver cirrhosis, alcoholic pancreatitis, alcohol dependence
2: alcoholic liver cirrhosis, alcoholic pancreatitis
3: inflammatory bowel disease, pancreatitis
4: acute lymphoblastic leukemia, asparaginase-induced acute pancreatitis
5: autoimmune pancreatitis type 1, salivary gland lesion, lachrymal gland lesion
6: alcoholic pancreatitis
l2_all_disease_terms
<char>
1: alcohol dependence, alcoholic liver cirrhosis, alcoholic pancreatitis
2: cirrhosis and other chronic liver diseases, alcoholic pancreatitis
3: inflammatory bowel disease, pancreatitis
4: leukemia, pancreatitis
5: autoimmune pancreatitis, lachrymal gland lesion, salivary gland lesion
6: alcoholic pancreatitis
gwas_study_info |>
filter(grepl("other digestive diseases", l2_all_disease_terms)) |>
select(all_disease_terms, l2_all_disease_terms) |>
distinct() |>
head()
Empty data.table (0 rows and 2 cols): all_disease_terms,l2_all_disease_terms
dementia <- c("alzheimers disease biomarker measurement",
"alzheimers disease neuropathologic change",
"aids dementia",
"dementia",
"frontotemporal dementia",
"lewy body dementia",
"vascular dementia",
"alzheimers disease"
)
gwas_study_info =
gwas_study_info |>
mutate(l2_all_disease_terms =
stringr::str_replace_all(l2_all_disease_terms ,
pattern = paste0(dementia, collapse = "(?=,|$)|\\b"),
"alzheimer's disease and other dementias"
)
)
gwas_study_info |>
filter(grepl("alzheimer's disease and other dementias", l2_all_disease_terms)) |>
select(all_disease_terms, l2_all_disease_terms) |>
distinct() |>
head()
all_disease_terms
<char>
1: alzheimer disease
2: age of onset of alzheimer disease
3: hypertension, alzheimer disease
4: family history of alzheimers disease
5: alzheimer disease, family history of alzheimers disease
6: lewy body dementia
l2_all_disease_terms
<char>
1: alzheimer's disease and other dementias
2: alzheimer's disease and other dementias
3: alzheimer's disease and other dementias, hypertension
4: alzheimer's disease and other dementias
5: alzheimer's disease and other dementias
6: alzheimer's disease and other dementias
gwas_study_info =
gwas_study_info |>
mutate(l2_all_disease_terms =
stringr::str_replace_all(l2_all_disease_terms ,
pattern = "parkinsons disease",
"parkinson's disease"
)
)
gwas_study_info |>
filter(grepl("parkinson's disease", l2_all_disease_terms)) |>
select(all_disease_terms, l2_all_disease_terms) |>
distinct() |>
head()
all_disease_terms
<char>
1: parkinson disease
2: age of onset of parkinson disease
3: frontotemporal dementia, parkinson disease
4: lewy body attribute, parkinson disease
5: schizophrenia, parkinson disease
6: dementia, parkinson disease, disease progression measurement
l2_all_disease_terms
<char>
1: parkinson's disease
2: parkinson's disease
3: alzheimer's disease and other dementias, parkinson's disease
4: alzheimer's disease and other dementias, parkinson's disease
5: parkinson's disease, schizophrenia
6: alzheimer's disease and other dementias, parkinson's disease
gwas_study_info =
gwas_study_info |>
mutate(l2_all_disease_terms =
stringr::str_replace_all(l2_all_disease_terms ,
pattern = "idiopathic generalized epilepsy",
"idiopathic epilepsy"
)
)
gwas_study_info |>
filter(grepl("idiopathic epilepsy", l2_all_disease_terms)) |>
select(all_disease_terms, l2_all_disease_terms) |>
distinct() |>
head()
all_disease_terms l2_all_disease_terms
<char> <char>
1: juvenile absence epilepsy idiopathic epilepsy
gwas_study_info |>
filter(grepl("multiple sclerosis", l2_all_disease_terms)) |>
select(all_disease_terms, l2_all_disease_terms) |>
distinct() |>
head()
all_disease_terms
<char>
1: multiple sclerosis
2: type 2 diabetes mellitus, multiple sclerosis
3: multiple sclerosis, chronic lymphocytic leukemia
4: diffuse large b-cell lymphoma, multiple sclerosis
5: follicular lymphoma, multiple sclerosis
6: marginal zone b-cell lymphoma, multiple sclerosis
l2_all_disease_terms
<char>
1: multiple sclerosis
2: multiple sclerosis, type 2 diabetes mellitus
3: leukemia, multiple sclerosis
4: multiple sclerosis, non-hodgkin lymphoma
5: multiple sclerosis, non-hodgkin lymphoma
6: multiple sclerosis, non-hodgkin lymphoma
gwas_study_info |>
filter(grepl("motor neuron disease", l2_all_disease_terms)) |>
select(all_disease_terms, l2_all_disease_terms) |>
distinct() |>
head()
all_disease_terms l2_all_disease_terms
<char> <char>
1: motor neuron disease motor neuron disease
gwas_study_info = gwas_study_info |>
mutate(l2_all_disease_terms =
stringr::str_replace_all(l2_all_disease_terms,
pattern = "\\bheadache disorder\\b|\\bcluster headache\\b|\\bmigraine\\b",
"headache disorders"
)
)
gwas_study_info |>
filter(grepl("headache disorders", l2_all_disease_terms)) |>
select(all_disease_terms, l2_all_disease_terms) |>
distinct() |>
head()
all_disease_terms
<char>
1: migraine disorder, coronary artery disease
2: migraine without aura
3: migraine disorder
4: migraine with aura
5: cluster headache
6: bipolar disorder, migraine disorder
l2_all_disease_terms
<char>
1: ischemic heart disease, headache disorders
2: headache disorders
3: headache disorders
4: headache disorders
5: headache disorders
6: bipolar disorder, headache disorders
gwas_study_info |>
filter(grepl("other neurological disorders", l2_all_disease_terms)) |>
select(all_disease_terms, l2_all_disease_terms) |>
distinct() |>
head()
Empty data.table (0 rows and 2 cols): all_disease_terms,l2_all_disease_terms
gwas_study_info |>
filter(grepl("schizophrenia", l2_all_disease_terms)) |>
select(all_disease_terms, l2_all_disease_terms) |>
distinct() |>
head()
all_disease_terms
<char>
1: schizophrenia
2: major depressive disorder, schizophrenia, bipolar disorder
3: schizophrenia, bipolar disorder
4: schizophrenia, major depressive disorder
5: schizophrenia, type 2 diabetes mellitus
6: treatment refractory schizophrenia, drug-induced agranulocytosis
l2_all_disease_terms
<char>
1: schizophrenia
2: bipolar disorder, major depressive disorder, schizophrenia
3: bipolar disorder, schizophrenia
4: major depressive disorder, schizophrenia
5: schizophrenia, type 2 diabetes mellitus
6: drug-induced agranulocytosis, schizophrenia
depressive_terms <- c("depressive disorder",
"depressive symptom",
"depressive episode",
"major depressive disorders",
"major depressive disorder",
"major depressive episode"
)
depressive_terms <- str_length_sort(depressive_terms)
depressive_terms <- paste0("(?=, |^)", depressive_terms)
depressive_terms <- paste0(depressive_terms, "(?=,|$)")
gwas_study_info = gwas_study_info |>
mutate(l2_all_disease_terms =
stringr::str_replace_all(l2_all_disease_terms ,
pattern = paste0(depressive_terms, collapse = "|"),
"depressive disorders"
)
) |>
mutate(l2_all_disease_terms =
stringr::str_replace_all(l2_all_disease_terms ,
pattern = "(?=, |^)depressive disorder(?=,|$)",
"depressive disorders"
)
) |>
mutate(l2_all_disease_terms =
stringr::str_replace_all(l2_all_disease_terms ,
pattern = "(?=, |^)major depressive episode(?=,|$)",
"depressive disorders"
)
)
gwas_study_info |>
filter(grepl("depressive disorders", l2_all_disease_terms)) |>
select(all_disease_terms, l2_all_disease_terms) |>
distinct() |>
head()
all_disease_terms
<char>
1: major depressive disorder
2: schizophrenia, major depressive disorder
3: major depressive disorder, mood disorder
4: mood disorder, major depressive disorder
5: major depressive disorder, major depressive episode, psychotic symptom measurement
6: psychotic symptom measurement, major depressive episode
l2_all_disease_terms
<char>
1: depressive disorders
2: depressive disorders, schizophrenia
3: depressive disorders, mood disorder
4: depressive disorders, mood disorder
5: depressive disorders, major depressive episode, psychotic
6: depressive disorders, psychotic
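Row 5 above still contains `major depressive episode` even though that term is in `depressive_terms`. A likely reason is that the `(?=, |^)` prefix is a lookahead: it looks at the text that follows the current position, so the term can only match at the very start of the string. The sketch below contrasts it with a lookbehind-based prefix. The `behind` pattern is an assumed alternative, not the analysis code, and base R `gsub()` with `perl = TRUE` stands in for `stringr::str_replace_all()`.

```r
x <- "depressive disorders, major depressive episode, psychotic"
term <- "major depressive episode"

# As in the analysis: a lookahead placed in front of the term
ahead <- paste0("(?=, |^)", term, "(?=,|$)")
# Assumed alternative: start of string, or preceded by ", "
behind <- paste0("(?:^|(?<=, ))", term, "(?=,|$)")

res_ahead <- gsub(ahead, "depressive disorders", x, perl = TRUE)
res_ahead
# [1] "depressive disorders, major depressive episode, psychotic"

res_behind <- gsub(behind, "depressive disorders", x, perl = TRUE)
res_behind
# [1] "depressive disorders, depressive disorders, psychotic"
```

Switching to the lookbehind form would change the results reported on this page, and the repeated label it produces would then need to be de-duplicated.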
url <- "http://www.ebi.ac.uk/ols4/api/ontologies/mesh/terms/http%253A%252F%252Fid.nlm.nih.gov%252Fmesh%252FD001008/descendants"
anxiety_terms <- get_descendants(url)
[1] "Number of terms collected:"
[1] 15
[1] "\n Some example terms"
[1] "obsessive-compulsive disorder" "generalized anxiety disorder"
[3] "neurocirculatory asthenia" "excoriation disorder"
[5] "anxiety, separation"
anxiety_terms <- c(anxiety_terms,
"obsessive-compulsive symptom measurement",
"obsessive-compulsive disorder"
)
anxiety_terms <- str_length_sort(anxiety_terms)
anxiety_terms <- paste0("(?=, |^)", anxiety_terms)
anxiety_terms <- paste0(anxiety_terms, "(?=,|$)")
gwas_study_info =
gwas_study_info |>
mutate(l2_all_disease_terms =
stringr::str_replace_all(l2_all_disease_terms ,
pattern = paste0(anxiety_terms, collapse = "|"),
"anxiety disorders"
)
) |>
mutate(l2_all_disease_terms =
stringr::str_replace_all(l2_all_disease_terms ,
pattern = "(?=, |^)anxiety disorder(?=,|$)|(?=, |^)anxiety measurement(?=,|$)",
"anxiety disorders"
)
) |>
mutate(l2_all_disease_terms =
stringr::str_replace_all(l2_all_disease_terms ,
pattern = "(?=, |^)anxiety(?=,|$)",
"anxiety disorders"
)
)
gwas_study_info |>
filter(grepl("anxiety disorders", l2_all_disease_terms)) |>
select(all_disease_terms, l2_all_disease_terms) |>
distinct() |>
head()
all_disease_terms l2_all_disease_terms
<char> <char>
1: generalized anxiety disorder anxiety disorders
2: anxiety disorder measurement anxiety disorders
3: obsessive-compulsive disorder anxiety disorders
4: anxiety disorder anxiety disorders
5: anxiety anxiety disorders
6: panic disorder anxiety disorders
gwas_study_info = gwas_study_info |>
mutate(l2_all_disease_terms =
stringr::str_replace_all(l2_all_disease_terms ,
pattern = "bulimia nervosa|anorexia nervosa|binge eating|eating disorder",
"eating disorders"
)
) |>
mutate(l2_all_disease_terms =
stringr::str_replace_all(l2_all_disease_terms ,
pattern = "anorexia",
"eating disorders"
)
)
gwas_study_info |>
filter(grepl("eating disorders", l2_all_disease_terms)) |>
select(all_disease_terms, l2_all_disease_terms) |>
distinct() |>
head()
all_disease_terms l2_all_disease_terms
<char> <char>
1: bipolar disorder, binge eating eating disorders, bipolar disorder
2: binge eating, bipolar disorder eating disorders, bipolar disorder
3: anorexia nervosa eating disorders
4: eating disorder eating disorders
5: bulimia nervosa eating disorders
6: bipolar disorder, eating disorder bipolar disorder, eating disorders
gwas_study_info = gwas_study_info |>
mutate(l2_all_disease_terms =
stringr::str_replace_all(l2_all_disease_terms,
pattern = "autism",
"autism spectrum disorders"
)
)
gwas_study_info |>
filter(grepl("autism spectrum disorders", l2_all_disease_terms)) |>
select(all_disease_terms, l2_all_disease_terms) |>
distinct() |>
head()
all_disease_terms
<char>
1: autism spectrum disorder symptom
2: asperger syndrome
3: autism spectrum disorder, schizophrenia
4: autism spectrum disorder
5: obsessive-compulsive disorder, autism spectrum disorder
6: autism
l2_all_disease_terms
<char>
1: autism spectrum disorders
2: autism spectrum disorders
3: autism spectrum disorders, schizophrenia
4: autism spectrum disorders
5: autism spectrum disorders, obsessive-compulsive disorder
6: autism spectrum disorders
gwas_study_info = gwas_study_info |>
mutate(l2_all_disease_terms =
stringr::str_replace_all(l2_all_disease_terms ,
pattern = "adhd",
"attention-deficit/hyperactivity disorder"
)
)
gwas_study_info |>
filter(grepl("attention-deficit/hyperactivity disorder", l2_all_disease_terms)) |>
select(all_disease_terms, l2_all_disease_terms) |>
distinct() |>
head()
all_disease_terms
<char>
1: obsessive-compulsive disorder, attention deficit hyperactivity disorder
2: attention deficit hyperactivity disorder
3: attention deficit hyperactivity disorder, oppositional defiant disorder measurement
4: attention deficit hyperactivity disorder, conduct disorder
5: attention deficit hyperactivity disorder, bipolar disorder, autism spectrum disorder, schizophrenia, major depressive disorder
6: attention deficit hyperactivity disorder, bipolar disorder
l2_all_disease_terms
<char>
1: attention-deficit/hyperactivity disorder, obsessive-compulsive disorder
2: attention-deficit/hyperactivity disorder
3: attention-deficit/hyperactivity disorder, oppositional defiant disorder
4: attention-deficit/hyperactivity disorder, conduct disorder
5: attention-deficit/hyperactivity disorder, autism spectrum disorders, bipolar disorder, major depressive disorder, schizophrenia
6: attention-deficit/hyperactivity disorder, bipolar disorder
gwas_study_info |>
filter(grepl("conduct disorder", l2_all_disease_terms)) |>
select(all_disease_terms, l2_all_disease_terms) |>
distinct() |>
head()
all_disease_terms
<char>
1: conduct disorder
2: attention deficit hyperactivity disorder, conduct disorder
3: attention deficit hyperactivity disorder, conduct disorder, oppositional defiant disorder
l2_all_disease_terms
<char>
1: conduct disorder
2: attention-deficit/hyperactivity disorder, conduct disorder
3: attention-deficit/hyperactivity disorder, conduct disorder, oppositional defiant disorder
terms <- c("developmental disability")
gwas_study_info = gwas_study_info |>
mutate(l2_all_disease_terms =
stringr::str_replace_all(l2_all_disease_terms ,
pattern = paste0(terms, collapse = "(?=,|$)|\\b"),
"idiopathic developmental intellectual disability"
)
)
gwas_study_info |>
filter(grepl("idiopathic developmental intellectual disability", l2_all_disease_terms)) |>
select(all_disease_terms, l2_all_disease_terms) |>
distinct() |>
head()
all_disease_terms l2_all_disease_terms
<char> <char>
1: developmental disability idiopathic developmental intellectual disability
url <- "http://www.ebi.ac.uk/ols4/api/ontologies/mondo/terms/http%253A%252F%252Fpurl.obolibrary.org%252Fobo%252FMONDO_0002028/descendants"
personality_disorders <- get_descendants(url)
[1] "Number of terms collected:"
[1] 10
[1] "\n Some example terms"
[1] "obsessive-compulsive personality disorder"
[2] "narcissistic personality disorder"
[3] "schizotypal personality disorder"
[4] "histrionic personality disorder"
[5] "antisocial personality disorder"
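The OLS URLs passed to get_descendants() embed the term IRI percent-encoded twice (":" becomes "%253A", "/" becomes "%252F"). A minimal sketch of how such a URL could be assembled, assuming a hypothetical helper ols_descendants_url() that is not part of the original analysis:

```r
# Hypothetical helper (not from the original analysis): build the OLS4
# "descendants" endpoint for a term IRI by percent-encoding the IRI twice.
ols_descendants_url <- function(ontology, term_iri) {
  # First pass: encode reserved characters such as ":" and "/".
  once <- utils::URLencode(term_iri, reserved = TRUE)
  # Second pass: repeated = TRUE forces "%" itself to be encoded as "%25".
  twice <- utils::URLencode(once, reserved = TRUE, repeated = TRUE)
  paste0("http://www.ebi.ac.uk/ols4/api/ontologies/", ontology,
         "/terms/", twice, "/descendants")
}

url <- ols_descendants_url("mondo",
                           "http://purl.obolibrary.org/obo/MONDO_0002028")
print(url)
```

The resulting string is the same as the hard-coded MONDO_0002028 URL used above, so the helper only saves typing; it makes no request itself.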
personality_disorders <- paste0("(?=, |^)", personality_disorders)
personality_disorders <- paste0(personality_disorders, "(?=,|$)")
gwas_study_info = gwas_study_info |>
mutate(l2_all_disease_terms =
stringr::str_replace_all(l2_all_disease_terms,
pattern = paste0(personality_disorders, collapse = "|"),
"personality disorders"
)
)
url <- "http://www.ebi.ac.uk/ols4/api/ontologies/efo/terms/http%253A%252F%252Fwww.ebi.ac.uk%252Fefo%252FEFO_0004247/descendants"
mood_disorders <- get_descendants(url)
[1] "Number of terms collected:"
[1] 10
[1] "\n Some example terms"
[1] "mixed anxiety and depressive disorder"
[2] "treatment resistant depression"
[3] "major depressive disorder"
[4] "postpartum depression"
[5] "depressive disorder"
mood_disorders <- str_length_sort(mood_disorders)
mood_disorders <- paste0("(?=, |^)", mood_disorders)
mood_disorders <- paste0(mood_disorders, "(?=,|$)")
gwas_study_info = gwas_study_info |>
mutate(l2_all_disease_terms =
stringr::str_replace_all(l2_all_disease_terms,
pattern = paste0(mood_disorders, collapse = "|"),
"mood disorder"
)
)
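The terms are sorted with str_length_sort() before being joined with "|". That helper is defined elsewhere in the analysis; the sketch below assumes it orders terms longest first and uses a hypothetical stand-in, sort_longest_first(), to show why the order matters in a regex alternation without boundary guards:

```r
# Hypothetical stand-in for str_length_sort(); assumed to sort longest first.
sort_longest_first <- function(x) x[order(nchar(x), decreasing = TRUE)]

terms <- c("sleep disorder", "sleep disorders")
x <- "sleep disorders"

# Shorter term first: the regex engine takes the first alternative that
# matches, so the trailing "s" is left behind.
res_unsorted <- stringr::str_replace_all(
  x, paste0(terms, collapse = "|"), "other mental disorders")

# Longest term first: the full term is consumed.
res_sorted <- stringr::str_replace_all(
  x, paste0(sort_longest_first(terms), collapse = "|"), "other mental disorders")

print(res_unsorted)
print(res_sorted)
```

With the `(?=,|$)` lookahead appended to every term, as in the chunks above, the shorter alternative would be rejected anyway; sorting is a second safeguard.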
url <- "http://www.ebi.ac.uk/ols4/api/ontologies/doid/terms/http%253A%252F%252Fpurl.obolibrary.org%252Fobo%252FDOID_535/descendants"
sleep_disorders <- get_descendants(url)
[1] "Number of terms collected:"
[1] 16
[1] "\n Some example terms"
[1] "periodic limb movement disorder" "advanced sleep phase syndrome 3"
[3] "advanced sleep phase syndrome 2" "advanced sleep phase syndrome 1"
[5] "advanced sleep phase syndrome 4"
url <- "http://www.ebi.ac.uk/ols4/api/ontologies/efo/terms/http%253A%252F%252Fwww.ebi.ac.uk%252Fefo%252FEFO_0008568/descendants"
other_sleep_disorders <- get_descendants(url)
[1] "Number of terms collected:"
[1] 26
[1] "\n Some example terms"
[1] "autosomal dominant cerebellar ataxia, deafness and narcolepsy"
[2] "hereditary sensory neuropathy-deafness-dementia syndrome"
[3] "rapid eye movement sleep disorder"
[4] "substance-induced sleep disorder"
[5] "drug induced central sleep apnea"
sleep_disorders <- c(sleep_disorders,
other_sleep_disorders)
sleep_disorders <- str_length_sort(sleep_disorders)
gwas_study_info = gwas_study_info |>
mutate(l2_all_disease_terms =
stringr::str_replace_all(l2_all_disease_terms,
pattern = paste0(sleep_disorders, collapse = "(?=,|$)|\\b"),
"sleep disorders"
)
)
other_mental_disorders <- c("manic or hypomanic episode",
"mental or behavioural disorder",
"mental disorder",
"post-traumatic stress disorder",
"stress-related disorder",
"acute stress reaction",
"occupation-related stress disorder",
"psychotic symptom",
"psychosis",
"psychiatric disorder",
"personality disorders",
"personality disorder",
"mood disorder",
"sleep disorders",
"sleep disorder",
"mixed anxiety disorders and depressive disorders",
"emotional symptom",
"dissociative disorder",
"hallucinations",
"somatoform disorder",
"schizoaffective disorder"
)
other_mental_disorders <- str_length_sort(other_mental_disorders)
other_mental_disorders <- paste0("(?=, |^)", other_mental_disorders)
other_mental_disorders <- paste0(other_mental_disorders, "(?=,|$)")
gwas_study_info =
gwas_study_info |>
mutate(l2_all_disease_terms =
stringr::str_replace_all(l2_all_disease_terms ,
pattern = paste0(other_mental_disorders, collapse = "|"),
"other mental disorders"
)
)
gwas_study_info |>
filter(grepl("other mental disorders", l2_all_disease_terms)) |>
select(all_disease_terms, l2_all_disease_terms) |>
distinct() |>
head()
all_disease_terms
<char>
1: bipolar disorder
2: major depressive disorder, schizophrenia, bipolar disorder
3: major depressive disorder, bipolar disorder
4: schizophrenia, bipolar disorder
5: bipolar disorder, major depressive disorder
6: insomnia
l2_all_disease_terms
<char>
1: other mental disorders
2: other mental disorders, major depressive disorder, schizophrenia
3: other mental disorders, major depressive disorder
4: other mental disorders, schizophrenia
5: other mental disorders, major depressive disorder
6: other mental disorders
gwas_study_info =
gwas_study_info |>
mutate(l2_all_disease_terms =
stringr::str_replace_all(l2_all_disease_terms ,
pattern = "alcohol-related disorders|alcohol and nicotine codependence",
"alcohol use disorders"
)
)
gwas_study_info =
gwas_study_info |>
mutate(l2_all_disease_terms =
stringr::str_replace_all(l2_all_disease_terms ,
pattern = "alcohol use disorder",
"alcohol use disorders"
)
)
gwas_study_info |>
filter(grepl("alcohol use disorders", l2_all_disease_terms)) |>
select(all_disease_terms, l2_all_disease_terms) |>
distinct() |>
head()
all_disease_terms
<char>
1: alcohol and nicotine codependence
2: alcohol use disorder measurement
3: alcohol-related disorders
4: addictive alcohol use
5: alcohol-related disorders, hepatocellular carcinoma
6: alcohol use disorder measurement, alcohol dependence
l2_all_disease_terms
<char>
1: alcohol use disorderss
2: alcohol use disorders
3: alcohol use disorderss
4: alcohol use disorderss
5: alcohol use disorderss, liver cancer
6: alcohol dependence, alcohol use disorders
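The "alcohol use disorderss" values above arise because the pattern "alcohol use disorder" also matches inside the already-plural "alcohol use disorders". A minimal sketch of one way to avoid this, assuming a negative lookahead is acceptable here (this is an illustration, not the code that produced the output above):

```r
x <- c("alcohol use disorder", "alcohol use disorders")

# Pattern as used above: the plural form gains a second "s".
naive <- stringr::str_replace_all(
  x, "alcohol use disorder", "alcohol use disorders")

# Guarded pattern: only match when the term is not already followed by "s".
guarded <- stringr::str_replace_all(
  x, "alcohol use disorder(?!s)", "alcohol use disorders")

print(naive)
print(guarded)
```

The analysis instead repairs the doubled "s" in a later clean-up step that replaces "disorderss" with "disorders".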
gwas_study_info =
gwas_study_info |>
mutate(l2_all_disease_terms =
stringr::str_replace_all(l2_all_disease_terms ,
pattern = "opioid dependence|opioid use disorder",
"opioid use disorders"
)
)
gwas_study_info |>
filter(grepl("opioid use disorders", l2_all_disease_terms)) |>
select(all_disease_terms, l2_all_disease_terms) |>
distinct() |>
head()
all_disease_terms l2_all_disease_terms
<char> <char>
1: opioid dependence opioid use disorders
2: opioid use disorder opioid use disorders
3: opioid use disorder, opioid use disorders
gwas_study_info = gwas_study_info |>
mutate(l2_all_disease_terms =
stringr::str_replace_all(l2_all_disease_terms ,
pattern = "cocaine-related disorders",
"cocaine use disorders"
)
)
gwas_study_info |>
filter(grepl("cocaine use disorders", l2_all_disease_terms)) |>
select(all_disease_terms, l2_all_disease_terms) |>
distinct() |>
head()
all_disease_terms l2_all_disease_terms
<char> <char>
1: cocaine dependence cocaine use disorders
2: cocaine use disorder cocaine use disorders
gwas_study_info =
gwas_study_info |>
mutate(l2_all_disease_terms =
stringr::str_replace_all(l2_all_disease_terms ,
pattern = "methamphetamine",
"amphetamine"
)
)
gwas_study_info |>
filter(grepl("amphetamine", l2_all_disease_terms)) |>
select(all_disease_terms, l2_all_disease_terms) |>
distinct() |>
head()
all_disease_terms
<char>
1: methamphetamine dependence
2: methamphetamine-induced psychosis
3: alcohol dependence, heroin dependence, methamphetamine dependence
l2_all_disease_terms
<char>
1: amphetamine use disorders
2: amphetamine use disorders
3: alcohol dependence, heroin dependence, amphetamine use disorders
gwas_study_info =
gwas_study_info |>
mutate(l2_all_disease_terms =
stringr::str_replace_all(l2_all_disease_terms ,
pattern = "cannabis dependence",
"cannabis use disorders"
)
)
gwas_study_info |>
filter(grepl("cannabis use disorders", l2_all_disease_terms)) |>
select(all_disease_terms, l2_all_disease_terms) |>
distinct() |>
head()
all_disease_terms
<char>
1: cannabis dependence
2: cannabis dependence measurement
3: cannabis dependence, schizophrenia, substance abuse
4: cannabis dependence, substance abuse
5: hiv infection, cannabis dependence measurement
l2_all_disease_terms
<char>
1: cannabis use disorders
2: cannabis use disorders
3: cannabis use disorders, schizophrenia, substance abuse
4: cannabis use disorders, substance abuse
5: cannabis use disorders
other_drug_use_terms <- c("heroin dependence",
"drug dependence",
"nicotine dependence",
"substance abuse",
"drug misuse",
"alcohol use disorders delirium"
)
other_drug_use_terms <- str_length_sort(other_drug_use_terms)
other_drug_use_terms <- paste0("(?=, |^)", other_drug_use_terms)
other_drug_use_terms <- paste0(other_drug_use_terms, "(?=,|$)")
gwas_study_info =
gwas_study_info |>
mutate(l2_all_disease_terms =
stringr::str_replace_all(l2_all_disease_terms ,
pattern = paste0(other_drug_use_terms, collapse = "|"),
"other drug use disorders"
)
)
gwas_study_info |>
filter(grepl("other drug use disorders", l2_all_disease_terms)) |>
select(all_disease_terms, l2_all_disease_terms) |>
distinct() |>
head()
all_disease_terms
<char>
1: heroin dependence
2: drug dependence
3: substance abuse
4: drug misuse, self-injurious behavior
5: drug misuse, major depressive disorder
l2_all_disease_terms
<char>
1: other drug use disorders
2: other drug use disorders
3: other drug use disorders
4: other drug use disorders, self-injurious behavior
5: other drug use disorders, major depressive disorder
gwas_study_info =
gwas_study_info |>
mutate(l2_all_disease_terms =
stringr::str_replace_all(l2_all_disease_terms ,
pattern = "type 1 diabetes mellitus",
"diabetes mellitus type 1"
)
)
gwas_study_info |>
filter(grepl("diabetes mellitus type 1", l2_all_disease_terms)) |>
select(all_disease_terms, l2_all_disease_terms) |>
distinct() |>
head()
all_disease_terms
<char>
1: type 1 diabetes mellitus
2: autoimmune thyroid disease, systemic lupus erythematosus, type 1 diabetes mellitus, ankylosing spondylitis, psoriasis, common variable immunodeficiency, celiac disease, ulcerative colitis, crohns disease, autoimmune disease, juvenile idiopathic arthritis
3: rheumatoid arthritis, type 1 diabetes mellitus, psoriasis, celiac disease, ulcerative colitis, crohns disease, progressive supranuclear palsy
4: rheumatoid arthritis, amyotrophic lateral sclerosis, type 1 diabetes mellitus, psoriasis, celiac disease, ulcerative colitis, crohns disease
5: rheumatoid arthritis, type 1 diabetes mellitus, corticobasal degeneration disorder, psoriasis, celiac disease, ulcerative colitis, crohns disease
6: type 1 diabetes mellitus, type 2 diabetes mellitus, neuropathy, diabetic foot
l2_all_disease_terms
<char>
1: diabetes mellitus type 1
2: ankylosing spondylitis, autoimmune disease, autoimmune thyroid disease, celiac disease, common variable immunodeficiency, inflammatory bowel disease, juvenile idiopathic arthritis, psoriasis, systemic lupus erythematosus, diabetes mellitus type 1, inflammatory bowel disease
3: celiac disease, inflammatory bowel disease, progressive supranuclear palsy, psoriasis, rheumatoid arthritis, diabetes mellitus type 1, inflammatory bowel disease
4: amyotrophic lateral sclerosis, celiac disease, inflammatory bowel disease, psoriasis, rheumatoid arthritis, diabetes mellitus type 1, inflammatory bowel disease
5: celiac disease, corticobasal degeneration disorder, inflammatory bowel disease, psoriasis, rheumatoid arthritis, diabetes mellitus type 1, inflammatory bowel disease
6: diabetic foot, neuropathy, diabetes mellitus type 1, type 2 diabetes mellitus
gwas_study_info =
gwas_study_info |>
mutate(l2_all_disease_terms =
stringr::str_replace_all(l2_all_disease_terms ,
pattern = "type 2 diabetes mellitus",
"diabetes mellitus type 2"
)
)
gwas_study_info |>
filter(grepl("diabetes mellitus type 2", l2_all_disease_terms)) |>
select(all_disease_terms, l2_all_disease_terms) |>
distinct() |>
head()
all_disease_terms
<char>
1: type 2 diabetes mellitus
2: type 2 diabetes mellitus, multiple sclerosis
3: type 2 diabetes mellitus, diabetic maculopathy
4: type 2 diabetes mellitus, obesity, coronary artery disease
5: schizophrenia, type 2 diabetes mellitus
6: alzheimer disease, type 2 diabetes mellitus, coronary artery disease
l2_all_disease_terms
<char>
1: diabetes mellitus type 2
2: multiple sclerosis, diabetes mellitus type 2
3: diabetic maculopathy, diabetes mellitus type 2
4: ischemic heart disease, obesity, diabetes mellitus type 2
5: schizophrenia, diabetes mellitus type 2
6: alzheimer's disease and other dementias, ischemic heart disease, diabetes mellitus type 2
gwas_study_info |>
filter(grepl("chronic kidney disease", l2_all_disease_terms)) |>
select(all_disease_terms, l2_all_disease_terms) |>
distinct() |>
head()
all_disease_terms
<char>
1: chronic kidney disease
2: chronic kidney disease, proteinuria
3: diabetic nephropathy, type 2 diabetes mellitus
4: diabetic nephropathy
5: type 1 diabetes mellitus, chronic kidney disease, diabetic nephropathy
6: type 1 diabetes mellitus, diabetic nephropathy
l2_all_disease_terms
<char>
1: chronic kidney disease
2: chronic kidney disease, proteinuria
3: chronic kidney disease, diabetes mellitus type 2
4: chronic kidney disease
5: chronic kidney disease, diabetes mellitus type 1
6: chronic kidney disease, diabetes mellitus type 1
glomerulonephritis_terms <- c("(?=, |^)chronic glomerulonephritis",
"(?=, |^)membranous glomerulonephritis",
"(?=, |^)proliferative glomerulonephritis")
# Assign the result back to gwas_study_info; without the assignment the
# modified table is only printed and the replacement is discarded.
gwas_study_info = gwas_study_info |>
mutate(l2_all_disease_terms =
stringr::str_replace_all(l2_all_disease_terms ,
pattern = paste0(glomerulonephritis_terms, collapse = "(?=,|$)|\\b"),
"glomerulonephritis"
)
)
gwas_study_info |>
filter(grepl("acute glomerulonephritis", l2_all_disease_terms)) |>
select(all_disease_terms, l2_all_disease_terms) |>
distinct() |>
head()
Empty data.table (0 rows and 2 cols): all_disease_terms,l2_all_disease_terms
gwas_study_info |>
filter(grepl("dermatitis", l2_all_disease_terms)) |>
select(all_disease_terms, l2_all_disease_terms) |>
distinct() |>
head()
all_disease_terms
<char>
1: seborrheic dermatitis
2: contact dermatitis due to nickel
3: eczematoid dermatitis
4: recalcitrant atopic dermatitis
5: allergic rhinitis, eczematoid dermatitis
6: asthma, allergic rhinitis, eczematoid dermatitis
l2_all_disease_terms
<char>
1: seborrheic dermatitis
2: contact dermatitis
3: eczematoid dermatitis
4: recalcitrant atopic dermatitis
5: allergic rhinitis, eczematoid dermatitis
6: allergic rhinitis, asthma, eczematoid dermatitis
gwas_study_info |>
filter(grepl("psoriasis", l2_all_disease_terms)) |>
select(all_disease_terms, l2_all_disease_terms) |>
distinct() |>
head()
all_disease_terms
<char>
1: psoriasis
2: autoimmune thyroid disease, systemic lupus erythematosus, type 1 diabetes mellitus, ankylosing spondylitis, psoriasis, common variable immunodeficiency, celiac disease, ulcerative colitis, crohns disease, autoimmune disease, juvenile idiopathic arthritis
3: cutaneous psoriasis measurement, psoriasis
4: psoriasis vulgaris
5: cutaneous psoriasis measurement, psoriatic arthritis, psoriasis
6: rheumatoid arthritis, frontotemporal dementia, psoriasis, celiac disease, ulcerative colitis, crohns disease
l2_all_disease_terms
<char>
1: psoriasis
2: ankylosing spondylitis, autoimmune disease, autoimmune thyroid disease, celiac disease, common variable immunodeficiency, inflammatory bowel disease, juvenile idiopathic arthritis, psoriasis, systemic lupus erythematosus, diabetes mellitus type 1, inflammatory bowel disease
3: cutaneous psoriasis measurement, psoriasis
4: psoriasis vulgaris
5: cutaneous psoriasis measurement, psoriasis, psoriatic arthritis
6: celiac disease, inflammatory bowel disease, alzheimer's disease and other dementias, psoriasis, rheumatoid arthritis, inflammatory bowel disease
bacterial_skin_disease_terms <- c("staphylococcal skin infections",
"skin and soft tissue staphylococcus aureus infection",
"cellulitis"
)
bacterial_skin_disease_terms <- str_length_sort(bacterial_skin_disease_terms)
bacterial_skin_disease_terms <- paste0("(?=, |^)", bacterial_skin_disease_terms)
bacterial_skin_disease_terms <- paste0(bacterial_skin_disease_terms, "(?=,|$)")
gwas_study_info = gwas_study_info |>
mutate(l2_all_disease_terms =
stringr::str_replace_all(l2_all_disease_terms ,
pattern = paste0(bacterial_skin_disease_terms, collapse = "|"),
"bacterial skin diseases"
)
)
gwas_study_info |>
filter(grepl("bacterial skin diseases", l2_all_disease_terms)) |>
select(all_disease_terms, l2_all_disease_terms) |>
distinct() |>
head()
all_disease_terms
<char>
1: skin and soft tissue staphylococcus aureus infection
2: staphylococcal skin infections
3: cellulitis
4: cellulitis, lymphangitis
l2_all_disease_terms
<char>
1: bacterial skin diseases
2: bacterial skin diseases
3: bacterial skin diseases
4: bacterial skin diseases, lymphangitis
gwas_study_info |>
filter(grepl("scabies", l2_all_disease_terms)) |>
select(all_disease_terms, l2_all_disease_terms) |>
distinct() |>
head()
Empty data.table (0 rows and 2 cols): all_disease_terms,l2_all_disease_terms
fungal_skin_disease_terms <- c("tinea",
"dermatomycosis")
fungal_skin_disease_terms <- str_length_sort(fungal_skin_disease_terms)
fungal_skin_disease_terms <- paste0("(?=, |^)", fungal_skin_disease_terms)
fungal_skin_disease_terms <- paste0(fungal_skin_disease_terms, "(?=,|$)")
gwas_study_info = gwas_study_info |>
mutate(l2_all_disease_terms =
stringr::str_replace_all(l2_all_disease_terms ,
pattern = paste0(fungal_skin_disease_terms, collapse = "|"),
"fungal skin diseases"
)
)
gwas_study_info |>
filter(grepl("fungal skin diseases", l2_all_disease_terms)) |>
select(all_disease_terms, l2_all_disease_terms) |>
distinct() |>
head()
all_disease_terms l2_all_disease_terms
<char> <char>
1: dermatomycosis, dermatophytosis fungal skin diseases, tinea
2: dermatophytosis fungal skin diseases
3: dermatomycosis fungal skin diseases
4: tinea fungal skin diseases
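Row 1 above still contains "tinea" after the replacement. A likely reason is that the prefix `(?=, |^)` is a lookahead: it checks the text at the start of the term rather than the text before it, so it can only succeed at the very start of the string. The sketch below reproduces that row and compares it with a lookbehind prefix, `(?<=, |^)`, which is an assumption about the intended behaviour rather than the original code:

```r
terms <- c("dermatomycosis", "tinea")
x <- "dermatomycosis, tinea"

# Prefix as used above (lookahead): only a term at the start of the string
# can match.
lookahead <- paste0("(?=, |^)", terms, "(?=,|$)", collapse = "|")

# Assumed alternative (lookbehind): a term may follow ", " or start the string.
lookbehind <- paste0("(?<=, |^)", terms, "(?=,|$)", collapse = "|")

res_lookahead <- stringr::str_replace_all(x, lookahead, "fungal skin diseases")
res_lookbehind <- stringr::str_replace_all(x, lookbehind, "fungal skin diseases")

print(res_lookahead)
print(res_lookbehind)
```

With the lookbehind variant both terms collapse to the same label, so a later de-duplication of comma-separated terms would be needed.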
gwas_study_info = gwas_study_info |>
mutate(l2_all_disease_terms =
stringr::str_replace_all(l2_all_disease_terms ,
pattern = "acne",
"acne vulgaris"
)
)
gwas_study_info |>
filter(grepl("acne vulgaris", l2_all_disease_terms)) |>
select(all_disease_terms, l2_all_disease_terms) |>
distinct() |>
head()
all_disease_terms l2_all_disease_terms
<char> <char>
1: acne acne vulgaris
gwas_study_info |>
filter(grepl("pruritus", l2_all_disease_terms)) |>
select(all_disease_terms, l2_all_disease_terms) |>
distinct() |>
head()
all_disease_terms
<char>
1: non-alcoholic steatohepatitis, pruritus
2: pruritus
l2_all_disease_terms
<char>
1: cirrhosis and other chronic liver diseases, pruritus
2: pruritus
gwas_study_info |>
filter(grepl("urticaria", l2_all_disease_terms)) |>
select(all_disease_terms, l2_all_disease_terms) |>
distinct() |>
head()
all_disease_terms l2_all_disease_terms
<char> <char>
1: urticaria, angioedema angioedema, urticaria
2: urticaria urticaria
gwas_study_info |>
filter(grepl("decubitus ulcer", l2_all_disease_terms)) |>
select(all_disease_terms, l2_all_disease_terms) |>
distinct() |>
head()
all_disease_terms l2_all_disease_terms
<char> <char>
1: decubitus ulcer decubitus ulcer
other_skin_disease_terms <- c("sebaceous gland disease",
"rosacea",
"erythematosquamous dermatosis",
"dry skin",
"skin tags",
"dermatochalasis",
"epidermal thickening",
"epidermal inclusion cyst",
"cutaneous lupus erythematosus",
"androgenetic alopecia",
"chemotherapy-induced alopecia",
"cutaneous leishmaniasis")
other_skin_disease_terms <- str_length_sort(other_skin_disease_terms)
other_skin_disease_terms <- paste0("(?=, |^)", other_skin_disease_terms)
other_skin_disease_terms <- paste0(other_skin_disease_terms, "(?=,|$)")
gwas_study_info = gwas_study_info |>
mutate(l2_all_disease_terms =
stringr::str_replace_all(l2_all_disease_terms ,
pattern = paste0(other_skin_disease_terms, collapse = "|"),
"other skin and subcutaneous diseases"
)
)
gwas_study_info |>
filter(grepl("other skin and subcutaneous diseases", l2_all_disease_terms)) |>
select(all_disease_terms, l2_all_disease_terms) |>
distinct() |>
head()
all_disease_terms l2_all_disease_terms
<char> <char>
1: androgenetic alopecia other skin and subcutaneous diseases
2: chemotherapy-induced alopecia other skin and subcutaneous diseases
3: rosacea severity measurement other skin and subcutaneous diseases
4: cutaneous lupus erythematosus other skin and subcutaneous diseases
5: rosacea other skin and subcutaneous diseases
6: dry skin other skin and subcutaneous diseases
vision_loss_terms <- c("blindness",
"color vision disorder",
"vision disorder",
"visuospatial impairment",
"pathological blindness and vision loss",
"visual impairment",
"myopia",
"refractive error",
"hyperopia",
"astigmatism",
"corneal astigmatism",
"presbyopia",
"anisometropia",
"esotropia",
"non-accomodative esotropia",
"accommodative esotropia",
"abnormality of refraction",
"abnormality of vision",
"age-related macular degeneration",
"degeneration of macula and posterior pole",
"age-related cataract",
"cataract")
gwas_study_info = gwas_study_info |>
mutate(l2_all_disease_terms =
stringr::str_replace_all(l2_all_disease_terms ,
pattern = paste0(vision_loss_terms, collapse = "(?=,|$)|\\b"),
"blindness and vision loss"
)
)
gwas_study_info |>
filter(grepl("blindness and vision loss", l2_all_disease_terms)) |>
select(all_disease_terms, l2_all_disease_terms) |>
distinct() |>
head()
all_disease_terms
<char>
1: corneal astigmatism
2: wet macular degeneration
3: refractive error
4: age-related macular degeneration
5: atrophic macular degeneration, age-related macular degeneration, wet macular degeneration
6: age-related macular degeneration, disease progression measurement
l2_all_disease_terms
<char>
1: blindness and vision loss
2: blindness and vision loss
3: blindness and vision loss
4: blindness and vision loss
5: blindness and vision loss, atrophic macular degeneration
6: blindness and vision loss
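Because `collapse` is pasted only between terms, the pattern built above is asymmetric: the first term has no leading `\\b` and the last term has no trailing `(?=,|$)`. A small sketch with two of the terms shows the effect; the symmetric variant is an assumed alternative, not the original code, and "myopia measurement" is a made-up input:

```r
terms <- c("cataract", "myopia")

# Built the same way as the chunk above: separator pasted between terms only.
asymmetric <- paste0(terms, collapse = "(?=,|$)|\\b")

# Assumed alternative: wrap every term with both guards, then join with "|".
symmetric <- paste0("\\b", terms, "(?=,|$)", collapse = "|")

x <- "myopia measurement"
res_asymmetric <- stringr::str_replace_all(x, asymmetric, "blindness and vision loss")
res_symmetric <- stringr::str_replace_all(x, symmetric, "blindness and vision loss")

print(asymmetric)
print(res_asymmetric)
print(res_symmetric)
```

Whether partial matches on the last term are wanted depends on how measurement terms were filtered earlier in the analysis.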
other_sense_terms <- c("abnormality of the sense of smell",
"disturbances of sensation of smell and taste")
other_sense_terms <- str_length_sort(other_sense_terms)
other_sense_terms <- paste0("(?=, |^)", other_sense_terms)
other_sense_terms <- paste0(other_sense_terms, "(?=,|$)")
gwas_study_info =
gwas_study_info |>
mutate(l2_all_disease_terms =
stringr::str_replace_all(l2_all_disease_terms ,
pattern = paste0(other_sense_terms, collapse = "|"),
"other sense organ diseases"
)
)
gwas_study_info |>
filter(grepl("other sense organ diseases", l2_all_disease_terms)) |>
select(all_disease_terms, l2_all_disease_terms) |>
distinct() |>
head()
all_disease_terms
<char>
1: disturbances of sensation of smell and taste
2: covid-19 symptoms measurement, abnormality of the sense of smell, ageusia
l2_all_disease_terms
<char>
1: other sense organ diseases
2: other sense organ diseases, ageusia
other_neuro <- c("mild neurocognitive disorder",
"hiv-associated neurocognitive disorder")
urinary_diseases_terms <- c("urinary incontinence",
"stress urinary incontinence",
"urgency urinary incontinence",
"urinary system disease",
"urinary tract obstruction",
"bladder neck obstruction",
"bladder calculus",
"urethral disease",
"uterine disorder",
"urethral syndrome",
"uterine inflammatory disease",
"lower urinary tract calculus",
"enuresis"
)
urinary_diseases_terms <- str_length_sort(urinary_diseases_terms)
urin_male_infert <- c(urinary_diseases_terms, "male infertility")
urin_male_infert <- paste0("(?=, |^)", urin_male_infert)
urin_male_infert <- paste0(urin_male_infert, "(?=,|$)")
gwas_study_info = gwas_study_info |>
mutate(l2_all_disease_terms =
stringr::str_replace_all(
l2_all_disease_terms,
paste0(urin_male_infert, collapse = "|"),
"urinary diseases and male infertility"
)
)
diseases <- stringr::str_split(pattern = ", ",
gwas_study_info$l2_all_disease_terms) |>
unlist() |>
stringr::str_trim()
pregnancy_terms <- grep("pregnancy", diseases, value = TRUE)
gyno_terms <- c("endometriosis",
"female reproductive system disease",
"female genital tract fistula",
"placenta disease",
"female infertility",
"polycystic ovary syndrome",
"ovarian dysfunction",
"ovarian gynecological diseases",
"ovarian cyst",
"ovarian disease",
"primary ovarian insufficiency",
"premature ovarian insufficiency",
"vaginal inflammation",
"vaginal disorder",
"postmenopausal atrophic vaginitis",
"uterine prolapse",
"atrophic vaginitis",
"vaginitis",
"vulvovaginitis",
"abnormal delivery",
pregnancy_terms)
gyno_terms <- str_length_sort(gyno_terms)
gyno_terms <- paste0("(?=, |^)", gyno_terms)
gyno_terms <- paste0(gyno_terms, "(?=,|$)")
gwas_study_info = gwas_study_info |>
mutate(l2_all_disease_terms =
stringr::str_replace_all(l2_all_disease_terms,
pattern = paste0(gyno_terms, collapse = "|"),
"gynecological diseases"
)
)
hemoglobinopathies_terms <- c("sickle cell disease and related diseases",
"thalassemia",
"inherited hemoglobinopathy",
"hemoglobin e disease"
)
hemopath_hemo_anemias <- c(hemoglobinopathies_terms, "hemolytic anemia")
gwas_study_info = gwas_study_info |>
mutate(l2_all_disease_terms =
stringr::str_replace_all(
l2_all_disease_terms,
paste0("(?=, |^)", hemopath_hemo_anemias, collapse = "|"),
"hemoglobinopathies and hemolytic anemias"
)
)
gwas_study_info |>
filter(grepl("hemoglobinopathies and hemolytic anemias", l2_all_disease_terms)) |>
select(all_disease_terms, l2_all_disease_terms) |>
distinct() |>
head()
all_disease_terms
<char>
1: sickle cell disease and related diseases
2: sickle cell anemia
3: hemoglobin e disease
4: sickle cell anemia, thromboembolism
5: inherited hemoglobinopathy
6: hemolytic anemia
l2_all_disease_terms
<char>
1: hemoglobinopathies and hemolytic anemias
2: hemoglobinopathies and hemolytic anemias
3: hemoglobinopathies and hemolytic anemias
4: hemoglobinopathies and hemolytic anemias, thromboembolism
5: hemoglobinopathies and hemolytic anemias
6: hemoglobinopathies and hemolytic anemias
# Fix the doubled suffix first, then the remaining "disorderss" typos; in the
# reverse order the more specific pattern could never match
gwas_study_info =
gwas_study_info |>
mutate(l2_all_disease_terms =
stringr::str_replace_all(l2_all_disease_terms ,
pattern = "anxiety disorders disorderss",
"anxiety disorders"
)
)
gwas_study_info =
gwas_study_info |>
mutate(l2_all_disease_terms =
stringr::str_replace_all(l2_all_disease_terms ,
pattern = "disorderss",
"disorders"
)
)
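Because the longer pattern contains the shorter one, the order of these two clean-up replacements matters. A minimal sketch on made-up strings (the vector `x` is hypothetical), using the named-vector form of `str_replace_all()`, which applies its patterns in the order given:

```r
library(stringr)

# Hypothetical strings containing the two typos
x <- c("anxiety disorders disorderss", "eating disorderss")

# Named vector: names are patterns, values are replacements, applied in order,
# so the more specific pattern is listed first
fixes <- c("anxiety disorders disorderss" = "anxiety disorders",
           "disorderss" = "disorders")

cleaned <- str_replace_all(x, fixes)
print(cleaned)
```

Listing `"disorderss"` first would instead leave the first string as "anxiety disorders disorders".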
gbd_data <- data.table::fread(here::here("data/gbd/ihme_gbd_2019_global_disease_burden_rate_all_ages.csv"))
gbd_data$cause <- stringr::str_remove_all(gbd_data$cause, ",")
diseases <- stringr::str_split(pattern = ", ",
gwas_study_info$l2_all_disease_terms[gwas_study_info$l2_all_disease_terms != ""]) |>
unlist() |>
stringr::str_trim()
gbd_data$cause[!tolower(gbd_data$cause) %in% unique(diseases)] |> sort() |> unique()
[1] "Acute glomerulonephritis"
[2] "Congenital birth defects"
[3] "Drug use disorders"
[4] "Endocrine metabolic blood and immune disorders"
[5] "Oral disorders"
[6] "Other chronic respiratory diseases"
[7] "Other digestive diseases"
[8] "Other musculoskeletal disorders"
[9] "Other neurological disorders"
[10] "Scabies"
[11] "Sudden infant death syndrome"
[12] "Total burden related to Non-alcoholic fatty liver disease (NAFLD)"
[13] "Vascular intestinal disorders"
[14] "Viral skin diseases"
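The same check for GBD causes without a GWAS counterpart can be phrased as a set difference. A small sketch on made-up vectors; `gbd_causes` and `gwas_terms` are hypothetical stand-ins for `gbd_data$cause` and `diseases`:

```r
# Hypothetical stand-ins for gbd_data$cause and the split GWAS disease terms
gbd_causes <- c("Scabies", "Asthma", "Oral disorders")
gwas_terms <- c("asthma", "cleft lip", "asthma")

# Lowercase the GBD side, then keep causes absent from the GWAS terms
unmatched <- sort(setdiff(tolower(gbd_causes), unique(gwas_terms)))
print(unmatched)
```

Unlike the subsetting form used above, this returns the lowercased names rather than the original capitalisation.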
gbd_data =
gbd_data |>
mutate(cause = tolower(cause))
gwas_disease_traits = data.frame(cause = diseases)
# gwas_study_info |>
# filter(DISEASE_STUDY == T) |>
# select(all_disease_terms, l1_all_disease_terms, cause = l2_all_disease_terms) |>
# distinct()
left_join(gwas_disease_traits,
gbd_data) |>
head()
Joining with `by = join_by(cause)`
Warning in left_join(gwas_disease_traits, gbd_data): Detected an unexpected many-to-many relationship between `x` and `y`.
ℹ Row 3 of `x` matches multiple rows in `y`.
ℹ Row 19 of `y` matches multiple rows in `x`.
ℹ If a many-to-many relationship is expected, set `relationship =
"many-to-many"` to silence this warning.
cause
1 idiopathic dilated cardiomyopathy and myocarditis
2 cleft lip
3 tracheal bronchus and lung cancer
4 tracheal bronchus and lung cancer
5 tracheal bronchus and lung cancer
6 tracheal bronchus and lung cancer
measure location sex age metric year
1 <NA> <NA> <NA> <NA> <NA> NA
2 <NA> <NA> <NA> <NA> <NA> NA
3 DALYs (Disability-Adjusted Life Years) Global Both All ages Rate 2019
4 Prevalence Global Both All ages Rate 2019
5 Incidence Global Both All ages Rate 2019
6 DALYs (Disability-Adjusted Life Years) Global Both All ages Rate 2019
val upper lower
1 NA NA NA
2 NA NA NA
3 580.36100 627.79984 532.74652
4 40.27440 43.51721 37.12978
5 28.16826 30.49575 25.77712
6 580.36100 627.79984 532.74652
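The many-to-many warning above arises because `cause` repeats on both sides: `diseases` holds one entry per study term, and the GBD table holds several measures per cause. One possible way to avoid it, sketched here on made-up data frames (`toy_gwas`, `toy_gbd`, and all values are hypothetical), is to deduplicate the GWAS side and declare the expected relationship explicitly:

```r
library(dplyr)

# Hypothetical GWAS terms with a repeated cause
toy_gwas <- data.frame(cause = c("asthma", "asthma", "cleft lip"))

# Hypothetical GBD rows: several measures for one cause
toy_gbd <- data.frame(cause   = c("asthma", "asthma"),
                      measure = c("Prevalence", "Incidence"),
                      val     = c(1.5, 0.5))

# One row per cause on the left, possibly many measures on the right
joined <- toy_gwas |>
  distinct(cause) |>
  left_join(toy_gbd, by = "cause", relationship = "one-to-many")
print(joined)
```

Causes with no GBD match, such as "cleft lip" here, are kept with `NA` in the GBD columns.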
gwas_study_info |> select(cause = l2_all_disease_terms) |>
distinct() |>
left_join(gbd_data) |>
head()
Joining with `by = join_by(cause)`
cause
<char>
1:
2: idiopathic dilated cardiomyopathy and myocarditis
3: cleft lip
4: tracheal bronchus and lung cancer
5: tracheal bronchus and lung cancer
6: tracheal bronchus and lung cancer
measure location sex age metric year
<char> <char> <char> <char> <char> <int>
1: <NA> <NA> <NA> <NA> <NA> NA
2: <NA> <NA> <NA> <NA> <NA> NA
3: <NA> <NA> <NA> <NA> <NA> NA
4: DALYs (Disability-Adjusted Life Years) Global Both All ages Rate 2019
5: Prevalence Global Both All ages Rate 2019
6: Incidence Global Both All ages Rate 2019
val upper lower
<num> <num> <num>
1: NA NA NA
2: NA NA NA
3: NA NA NA
4: 580.36100 627.79984 532.74652
5: 40.27440 43.51721 37.12978
6: 28.16826 30.49575 25.77712
diseases <- stringr::str_split(pattern = ", ",
gwas_study_info$l2_all_disease_terms[gwas_study_info$l2_all_disease_terms != ""]) |>
unlist() |>
stringr::str_trim()
length(unique(diseases))
[1] 1683
# make frequency table
freq <- table(as.factor(diseases))
# sort in decreasing order
freq_sorted <- sort(freq, decreasing = TRUE)
# show top N, e.g. top 10
head(freq_sorted, 10)
chronic kidney disease
10873
hypertension
6808
diabetes mellitus type 2
922
other mental disorders
649
other cardiovascular and circulatory diseases
568
other neoplasms
557
ischemic heart disease
525
alzheimer's disease and other dementias
509
depressive disorders
431
gynecological diseases
394
# Write the table without reassigning, so gwas_study_info is not overwritten
data.table::fwrite(gwas_study_info,
here::here("output/gwas_cat/gwas_study_info_trait_group_l2.csv"))
sessionInfo()
R version 4.3.1 (2023-06-16)
Platform: aarch64-apple-darwin20 (64-bit)
Running under: macOS 15.6.1
Matrix products: default
BLAS: /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib/libRlapack.dylib; LAPACK version 3.11.0
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
time zone: America/Los_Angeles
tzcode source: internal
attached base packages:
[1] stats graphics grDevices datasets utils methods base
other attached packages:
[1] jsonlite_2.0.0 httr_1.4.7 stringr_1.5.1 ggplot2_3.5.2
[5] data.table_1.17.8 dplyr_1.1.4 workflowr_1.7.1
loaded via a namespace (and not attached):
[1] gtable_0.3.6 compiler_4.3.1 renv_1.0.3 promises_1.3.3
[5] tidyselect_1.2.1 Rcpp_1.1.0 git2r_0.36.2 callr_3.7.6
[9] later_1.4.2 jquerylib_0.1.4 scales_1.4.0 yaml_2.3.10
[13] fastmap_1.2.0 here_1.0.1 R6_2.6.1 generics_0.1.4
[17] curl_6.4.0 knitr_1.50 tibble_3.3.0 rprojroot_2.1.0
[21] RColorBrewer_1.1-3 bslib_0.9.0 pillar_1.11.0 rlang_1.1.6
[25] cachem_1.1.0 stringi_1.8.7 httpuv_1.6.16 xfun_0.52
[29] getPass_0.2-4 fs_1.6.6 sass_0.4.10 cli_3.6.5
[33] withr_3.0.2 magrittr_2.0.3 ps_1.9.1 grid_4.3.1
[37] digest_0.6.37 processx_3.8.6 rstudioapi_0.17.1 lifecycle_1.0.4
[41] vctrs_0.6.5 evaluate_1.0.4 glue_1.8.0 farver_2.1.2
[45] whisker_0.4.1 rmarkdown_2.29 tools_4.3.1 pkgconfig_2.0.3
[49] htmltools_0.5.8.1