Last updated: 2025-09-16
Knit directory: genomics_ancest_disease_dispar/
This reproducible R Markdown analysis was created with workflowr (version 1.7.1). The Checks tab describes the reproducibility checks that were applied when the results were created. The Past versions tab lists the development history.
Great! Since the R Markdown file has been committed to the Git repository, you know the exact version of the code that produced these results.
Great job! The global environment was empty. Objects defined in the global environment can affect the analysis in your R Markdown file in unknown ways. For reproducibility it’s best to always run the code in an empty environment.
The command set.seed(20220216) was run prior to running the code in the R Markdown file. Setting a seed ensures that any results that rely on randomness, e.g. subsampling or permutations, are reproducible.
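As a minimal illustration of the seed check, the base R sketch below draws the same random sample twice after resetting the seed. The variable names and the sampled range are assumptions made for the example and do not come from the analysis.

```r
# Resetting the seed before each draw yields the same "random" sample.
set.seed(20220216)
first_draw <- sample(1:100, 5)

set.seed(20220216)
second_draw <- sample(1:100, 5)

# TRUE: both draws started from the same seed
identical(first_draw, second_draw)
```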
Great job! Recording the operating system, R version, and package versions is critical for reproducibility.
Nice! There were no cached chunks for this analysis, so you can be confident that you successfully produced the results during this run.
Great job! Using relative paths to the files within your workflowr project makes it easier to run your code on other machines.
Great! You are using Git for version control. Tracking code development and connecting the code version to the results is critical for reproducibility.
The results in this page were generated with repository version 345ad9b. See the Past versions tab to see a history of the changes made to the R Markdown and HTML files.
Note that you need to be careful to ensure that all relevant files for the analysis have been committed to Git prior to generating the results (you can use wflow_publish or wflow_git_commit). workflowr only checks the R Markdown file, but you know if there are other scripts or data files that it depends on. Below is the status of the Git repository when the results were generated:
Ignored files:
Ignored: .DS_Store
Ignored: .Rproj.user/
Ignored: data/.DS_Store
Ignored: data/gbd/.DS_Store
Ignored: data/gbd/ihme_gbd_2019_global_disease_burden_rate_all_ages.csv
Ignored: data/gbd/ihme_gbd_2019_global_paf_rate_percent_all_ages.csv
Ignored: data/gbd/ihme_gbd_2021_global_disease_burden_rate_all_ages.csv
Ignored: data/gbd/ihme_gbd_2021_global_paf_rate_percent_all_ages.csv
Ignored: data/gwas_catalog/
Ignored: data/who/
Ignored: output/gwas_cat/
Ignored: output/gwas_study_info_cohort_corrected.csv
Ignored: output/gwas_study_info_trait_corrected.csv
Ignored: output/gwas_study_info_trait_ontology_info.csv
Ignored: output/gwas_study_info_trait_ontology_info_l1.csv
Ignored: output/gwas_study_info_trait_ontology_info_l2.csv
Ignored: output/trait_ontology/
Ignored: renv/
Unstaged changes:
Modified: code/get_term_descendants.R
Note that any generated files, e.g. HTML, png, CSS, etc., are not included in this status report because it is ok for generated content to have uncommitted changes.
These are the previous versions of the repository in which changes were made to the R Markdown (analysis/level_2_disease_group.Rmd) and HTML (docs/level_2_disease_group.html) files. If you’ve configured a remote Git repository (see ?wflow_git_remote), click on the hyperlinks in the table below to view the files as they were in that past version.
File | Version | Author | Date | Message |
---|---|---|---|---|
Rmd | 345ad9b | IJbeasley | 2025-09-16 | More cancer typos |
html | a15dd40 | IJbeasley | 2025-09-16 | Build site. |
Rmd | 16ead66 | IJbeasley | 2025-09-16 | Correcting some cancer grouping |
html | 6018e42 | IJbeasley | 2025-09-16 | Build site. |
Rmd | 02a0b9d | IJbeasley | 2025-09-16 | Improving cancer grouping |
html | 6f66696 | IJbeasley | 2025-09-16 | Build site. |
Rmd | 66cff1c | IJbeasley | 2025-09-16 | Even more disease term grouping |
html | 21b6c02 | IJbeasley | 2025-09-15 | Build site. |
html | 5ec3111 | IJbeasley | 2025-09-15 | Build site. |
html | 30d773e | IJbeasley | 2025-09-15 | Build site. |
html | 8d64a38 | IJbeasley | 2025-09-15 | Build site. |
Rmd | b3088d8 | IJbeasley | 2025-09-15 | workflowr::wflow_publish("analysis/level_2_disease_group.Rmd") |
html | b89d661 | IJbeasley | 2025-09-10 | Build site. |
Rmd | c0fcab7 | IJbeasley | 2025-09-10 | workflowr::wflow_publish("analysis/level_2_disease_group.Rmd") |
html | ead4d8e | IJbeasley | 2025-09-10 | Build site. |
Rmd | 3964f77 | IJbeasley | 2025-09-10 | workflowr::wflow_publish("analysis/level_2_disease_group.Rmd") |
html | 8fb639d | IJbeasley | 2025-09-10 | Build site. |
Rmd | edeb6f5 | IJbeasley | 2025-09-10 | workflowr::wflow_publish("analysis/level_2_disease_group.Rmd") |
html | fe91704 | IJbeasley | 2025-09-09 | Build site. |
Rmd | 9c64867 | IJbeasley | 2025-09-09 | Minor fixing of disease trait categorisation |
html | fa509c0 | IJbeasley | 2025-09-08 | Build site. |
Rmd | c9602c7 | IJbeasley | 2025-09-08 | More grouping to match GBD |
library(dplyr)
library(data.table)
library(ggplot2)
library(stringr)
source(here::here("code/get_term_descendants.R"))
gwas_study_info <- fread(here::here("output/gwas_cat/gwas_study_info_group_l1_v2.csv"))
gwas_study_info = gwas_study_info |>
  mutate(l2_all_disease_terms = l1_all_disease_terms)

gwas_study_info |>
  filter(grepl("lip and oral cavity cancer", l2_all_disease_terms)) |>
  select(all_disease_terms, l2_all_disease_terms) |>
  distinct()
all_disease_terms
<char>
1: oral cavity cancer
2: mouth neoplasm
3: tongue cancer
4: major salivary gland cancer
5: human papilloma virus infection, oral cavity cancer
6: tongue neoplasm
7: major salivary gland carcinoma
8: lip cancer
9: oral squamous cell carcinoma
l2_all_disease_terms
<char>
1: lip and oral cavity cancer
2: lip and oral cavity cancer
3: lip and oral cavity cancer
4: lip and oral cavity cancer
5: human papilloma virus infection, lip and oral cavity cancer
6: lip and oral cavity cancer
7: lip and oral cavity cancer
8: lip and oral cavity cancer
9: lip and oral cavity cancer
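The same filter, select, distinct check is repeated for every disease group on this page. A small helper could wrap it. The sketch below is a base R version written for a plain data.frame; the name `show_term_mapping` and the toy data are hypothetical and not part of the original analysis. Like `grepl()` in the chunks on this page, it matches substrings, so a term that occurs inside a longer term is also returned.

```r
# Hypothetical helper: show_term_mapping() is not defined in the original analysis.
# It lists the distinct (original terms, level 2 terms) pairs containing `term`.
show_term_mapping <- function(study_info, term) {
  keep <- grepl(term, study_info$l2_all_disease_terms, fixed = TRUE)
  unique(study_info[keep, c("all_disease_terms", "l2_all_disease_terms"), drop = FALSE])
}

# Toy data standing in for gwas_study_info (terms copied from the output above)
toy <- data.frame(
  all_disease_terms    = c("lip cancer", "tongue cancer", "lip cancer", "gastric carcinoma"),
  l2_all_disease_terms = c(rep("lip and oral cavity cancer", 3), "stomach cancer")
)

mapping <- show_term_mapping(toy, "lip and oral cavity cancer")
mapping
```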
gwas_study_info =
  gwas_study_info |>
  mutate(l2_all_disease_terms =
           stringr::str_replace_all(l2_all_disease_terms,
                                    pattern = "nasopharyngeal cancer",
                                    "nasopharynx cancer"
           )
  )

gwas_study_info |>
  filter(grepl("nasopharynx cancer", l2_all_disease_terms)) |>
  select(all_disease_terms, l2_all_disease_terms) |>
  distinct()
all_disease_terms l2_all_disease_terms
<char> <char>
1: nasopharyngeal neoplasm nasopharynx cancer
gwas_study_info |>
  filter(grepl("other pharynx cancer", l2_all_disease_terms)) |>
  select(all_disease_terms, l2_all_disease_terms) |>
  distinct()
all_disease_terms
<char>
1: oropharynx cancer
2: laryngeal squamous cell carcinoma, hypopharynx cancer
3: human papilloma virus infection, oropharynx cancer
4: tonsil cancer
5: hypopharyngeal carcinoma
6: pharynx cancer, laryngeal carcinoma
l2_all_disease_terms
<char>
1: other pharynx cancer
2: larynx cancer, other pharynx cancer
3: human papilloma virus infection, other pharynx cancer
4: other pharynx cancer
5: other pharynx cancer
6: larynx cancer, other pharynx cancer
gwas_study_info |>
  filter(grepl("esophageal cancer", l2_all_disease_terms)) |>
  select(all_disease_terms, l2_all_disease_terms) |>
  distinct()
all_disease_terms
<char>
1: esophageal adenocarcinoma, barretts esophagus
2: esophageal adenocarcinoma
3: esophageal adenocarcinoma, gastroesophageal reflux disease
4: esophageal squamous cell carcinoma
5: esophageal carcinoma, gastric carcinoma
6: squamous cell carcinoma, esophageal carcinoma
7: esophageal carcinoma
8: esophageal adenocarcinoma, digestive system disease, barretts esophagus
9: neoplasm of esophagus
10: esophageal cancer
l2_all_disease_terms
<char>
1: barretts esophagus, esophageal cancer
2: esophageal cancer
3: esophageal cancer, gastroesophageal reflux disease
4: esophageal cancer
5: esophageal cancer, stomach cancer
6: esophageal cancer
7: esophageal cancer
8: barretts esophagus, digestive system disease, esophageal cancer
9: esophageal cancer
10: esophageal cancer
gwas_study_info |>
  filter(grepl("stomach cancer", l2_all_disease_terms)) |>
  select(all_disease_terms, l2_all_disease_terms) |>
  distinct()
all_disease_terms
<char>
1: gastric carcinoma
2: esophageal carcinoma, gastric carcinoma
3: gastric cardia carcinoma
4: gastric adenocarcinoma
5: lung carcinoma, squamous cell carcinoma, gastric carcinoma
6: gastric cancer
7: stomach neoplasm
8: gastric intestinal type adenocarcinoma
9: diffuse gastric adenocarcinoma
10: cardia cancer
l2_all_disease_terms
<char>
1: stomach cancer
2: esophageal cancer, stomach cancer
3: stomach cancer
4: stomach cancer
5: lung cancer, stomach cancer
6: stomach cancer
7: stomach cancer
8: stomach cancer
9: stomach cancer
10: stomach cancer
gwas_study_info = gwas_study_info |>
  mutate(l2_all_disease_terms =
           stringr::str_replace_all(l2_all_disease_terms,
                                    pattern = "colorectal cancer",
                                    "colon and rectum cancer"
           )
  )

gwas_study_info |>
  filter(grepl("colon and rectum cancer", l2_all_disease_terms)) |>
  select(all_disease_terms, l2_all_disease_terms) |>
  distinct()
all_disease_terms
<char>
1: colorectal cancer
2: sclerosing cholangitis, colorectal cancer
3: colorectal cancer, colorectal adenoma
4: metastatic colorectal cancer
5: lung carcinoma, estrogen-receptor negative breast cancer, ovarian endometrioid carcinoma, colorectal cancer, prostate carcinoma, ovarian serous carcinoma, breast carcinoma, ovarian carcinoma, lung adenocarcinoma, squamous cell lung carcinoma, cancer
6: rectum cancer
7: colonic neoplasm
8: colorectal adenocarcinoma
9: colon carcinoma
10: colorectal cancer, colorectal mucinous adenocarcinoma
11: colorectal carcinoma
12: colorectal cancer, peripheral neuropathy
13: colorectal cancer, stomatitis
14: colorectal cancer, neutropenia
15: colorectal cancer, hand-foot syndrome
16: colorectal cancer, exanthem
17: colorectal cancer, sleepiness
18: anal carcinoma
19: cecum cancer
20: sigmoid neoplasm
21: rectum cancer, colonic neoplasm
22: colorectal cancer, inflammatory bowel disease
23: colon carcinoma, sensory peripheral neuropathy
24: colon carcinoma, drug allergy
25: colorectal cancer, lung cancer
26: colorectal cancer, squamous cell lung carcinoma
27: colorectal cancer, skin disease
28: skin disease, colon carcinoma
29: age of onset of colorectal cancer
30: cecal neoplasm
31: colorectal cancer, breast carcinoma
32: metastatic colorectal cancer, disease progression measurement
33: polyp of large intestine, colorectal cancer
all_disease_terms
l2_all_disease_terms
<char>
1: colon and rectum cancer
2: colon and rectum cancer, sclerosing cholangitis
3: benign neoplasm, colon and rectum cancer
4: colon and rectum cancer
5: breast cancer, colon and rectum cancer, lung cancer, ovarian cancer, prostate cancer
6: colon and rectum cancer
7: colon and rectum cancer
8: colon and rectum cancer
9: colon and rectum cancer
10: colon and rectum cancer
11: colon and rectum cancer
12: colon and rectum cancer, peripheral neuropathy
13: colon and rectum cancer, stomatitis
14: colon and rectum cancer, neutropenia
15: colon and rectum cancer, hand-foot syndrome
16: colon and rectum cancer, exanthem
17: colon and rectum cancer, sleepiness
18: colon and rectum cancer
19: colon and rectum cancer
20: colon and rectum cancer
21: colon and rectum cancer
22: colon and rectum cancer, inflammatory bowel disease
23: colon and rectum cancer, sensory peripheral neuropathy
24: colon and rectum cancer, drug allergy
25: colon and rectum cancer, lung cancer
26: colon and rectum cancer, lung cancer
27: colon and rectum cancer, skin disease
28: colon and rectum cancer, skin disease
29: colon and rectum cancer
30: colon and rectum cancer
31: breast cancer, colon and rectum cancer
32: colon and rectum cancer
33: benign neoplasm, colon and rectum cancer
l2_all_disease_terms
gwas_study_info |>
  filter(grepl("liver cancer", l2_all_disease_terms)) |>
  select(all_disease_terms, l2_all_disease_terms) |>
  distinct()
all_disease_terms
<char>
1: hepatitis b virus infection, hepatocellular carcinoma
2: sclerosing cholangitis, cholangiocarcinoma
3: cholangiocarcinoma, sclerosing cholangitis
4: sclerosing cholangitis, hepatocellular carcinoma
5: hepatitis c virus infection, hepatocellular carcinoma
6: hepatocellular carcinoma, non-alcoholic steatohepatitis
7: hepatocellular carcinoma
8: biliary tract cancer
9: liver neoplasm
10: intrahepatic bile duct cancer, liver cancer
11: liver cancer
12: liver cancer, bile duct cancer
13: bile duct cancer
14: extrahepatic bile duct carcinoma
15: intrahepatic cholangiocarcinoma
16: alcohol-related disorders, hepatocellular carcinoma
17: hepatitis virus-related hepatocellular carcinoma
18: alcoholic liver cirrhosis, hepatocellular carcinoma
19: cholangiocarcinoma
l2_all_disease_terms
<char>
1: hepatitis b infection, liver cancer
2: liver cancer, sclerosing cholangitis
3: liver cancer, sclerosing cholangitis
4: liver cancer, sclerosing cholangitis
5: hepatitis c infection, liver cancer
6: liver cancer, non-alcoholic fatty liver disease
7: liver cancer
8: liver cancer
9: liver cancer
10: liver cancer
11: liver cancer
12: liver cancer
13: liver cancer
14: liver cancer
15: liver cancer
16: alcohol-related disorders, liver cancer
17: liver cancer
18: alcoholic liver disease, liver cancer
19: liver cancer
gwas_study_info =
  gwas_study_info |>
  mutate(l2_all_disease_terms =
           case_when(
             l2_all_disease_terms == "cancer of gallbladder and extrahepatic biliary tract" ~ "gallbladder and biliary tract cancer",
             TRUE ~ l2_all_disease_terms
           )
  )

gwas_study_info |>
  filter(grepl("gallbladder and biliary tract cancer", l2_all_disease_terms)) |>
  select(all_disease_terms, l2_all_disease_terms) |>
  distinct()
all_disease_terms
<char>
1: sclerosing cholangitis, gallbladder neoplasm
2: gallbladder neoplasm
3: carcinoma of gallbladder and extrahepatic biliary tract
l2_all_disease_terms
<char>
1: gallbladder and biliary tract cancer, sclerosing cholangitis
2: gallbladder and biliary tract cancer
3: gallbladder and biliary tract cancer
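The gallbladder step recodes with an exact match (`==`), whereas most other steps on this page use substring replacement. The difference matters when a study carries several comma-separated terms. The base R sketch below contrasts the two; the second input string is a hypothetical multi-term value made up for illustration, and `ifelse()` and `gsub()` stand in for `case_when()` and `str_replace_all()` so the example runs without extra packages.

```r
old <- "cancer of gallbladder and extrahepatic biliary tract"
new <- "gallbladder and biliary tract cancer"

terms <- c(
  old,
  # Hypothetical multi-term value, for illustration only
  paste0(old, ", sclerosing cholangitis")
)

# Exact match: only strings equal to `old` as a whole are recoded
exact <- ifelse(terms == old, new, terms)

# Substring replacement: `old` is recoded wherever it occurs
substring_based <- gsub(old, new, terms, fixed = TRUE)

exact
substring_based
```

With an exact match the multi-term string is left untouched, so whether that is the desired behaviour depends on whether such strings occur in the data.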
gwas_study_info |>
  filter(grepl("pancreatic cancer", l2_all_disease_terms)) |>
  select(all_disease_terms, l2_all_disease_terms) |>
  distinct()
all_disease_terms
<char>
1: pancreatic carcinoma
2: pancreatic ductal adenocarcinoma
3: pancreatic carcinoma, neutropenia
4: breast carcinoma, estrogen-receptor positive breast cancer, metastatic prostate cancer, pancreatic carcinoma, hypertension
5: breast carcinoma, estrogen-receptor positive breast cancer, metastatic prostate cancer, pancreatic carcinoma, proteinuria
6: breast carcinoma, estrogen-receptor positive breast cancer, metastatic prostate cancer, pancreatic carcinoma, hypertension, proteinuria
l2_all_disease_terms
<char>
1: pancreatic cancer
2: pancreatic cancer
3: neutropenia, pancreatic cancer
4: breast cancer, hypertension, pancreatic cancer, prostate cancer
5: breast cancer, pancreatic cancer, prostate cancer, proteinuria
6: breast cancer, hypertension, pancreatic cancer, prostate cancer, proteinuria
gwas_study_info |>
  filter(grepl("larynx cancer", l2_all_disease_terms)) |>
  select(all_disease_terms, l2_all_disease_terms) |>
  distinct()
all_disease_terms
<char>
1: laryngeal squamous cell carcinoma
2: laryngeal squamous cell carcinoma, hypopharynx cancer
3: laryngeal carcinoma
4: laryngeal neoplasm
5: glottis neoplasm
6: pharynx cancer, laryngeal carcinoma
l2_all_disease_terms
<char>
1: larynx cancer
2: larynx cancer, other pharynx cancer
3: larynx cancer
4: larynx cancer
5: larynx cancer
6: larynx cancer, other pharynx cancer
resp_cancer_terms = c("lung cancer",
                      "bronchus cancer",
                      "tracheal cancer",
                      "respiratory system cancer"
)

# Note: with this collapse string, the first term gets no leading \b and
# the last term gets no (?=,|$) lookahead.
gwas_study_info =
  gwas_study_info |>
  mutate(l2_all_disease_terms =
           ifelse(l2_all_disease_terms != "tracheal bronchus and lung cancer",
                  stringr::str_replace_all(l2_all_disease_terms,
                                           pattern = paste0(resp_cancer_terms, collapse = "(?=,|$)|\\b"),
                                           "tracheal bronchus and lung cancer"
                  ),
                  l2_all_disease_terms
           )
  )

gwas_study_info |>
  filter(grepl("tracheal bronchus and lung cancer", l2_all_disease_terms)) |>
  select(all_disease_terms, l2_all_disease_terms) |>
  distinct()
all_disease_terms
<char>
1: non-small cell lung carcinoma
2: lung adenocarcinoma
3: squamous cell lung carcinoma
4: lung carcinoma, family history of lung cancer
5: lung adenocarcinoma, family history of lung cancer
6: squamous cell lung carcinoma, family history of lung cancer
7: lung carcinoma
8: small cell lung carcinoma
9: lung carcinoma, schizophrenia
10: lung carcinoma, squamous cell carcinoma, lung adenocarcinoma
11: lung carcinoma, estrogen-receptor negative breast cancer, ovarian endometrioid carcinoma, colorectal cancer, prostate carcinoma, ovarian serous carcinoma, breast carcinoma, ovarian carcinoma, lung adenocarcinoma, squamous cell lung carcinoma, cancer
12: non-small cell lung carcinoma, drug-induced liver injury
13: lung carcinoma, chronic obstructive pulmonary disease
14: lung carcinoma, squamous cell carcinoma, gastric carcinoma
15: lung cancer
16: respiratory system cancer
17: bronchus cancer
18: lung cancer, bronchus cancer
19: family history of lung cancer
20: breast cancer, lung cancer
21: head and neck carcinoma, lung cancer
22: small cell carcinoma
23: colorectal cancer, lung cancer
24: lung cancer, gastroesophageal reflux disease
25: peptic ulcer disease, lung cancer
26: colorectal cancer, squamous cell lung carcinoma
27: peptic ulcer disease, squamous cell lung carcinoma
28: non-small cell lung carcinoma, disease progression measurement
29: lung neoplasm
30: cancer
31: lung cancer, radiation-induced disorder
all_disease_terms
l2_all_disease_terms
<char>
1: tracheal bronchus and lung cancer
2: tracheal bronchus and lung cancer
3: tracheal bronchus and lung cancer
4: tracheal bronchus and lung cancer
5: tracheal bronchus and lung cancer
6: tracheal bronchus and lung cancer
7: tracheal bronchus and lung cancer
8: tracheal bronchus and lung cancer
9: tracheal bronchus and lung cancer, schizophrenia
10: tracheal bronchus and lung cancer
11: breast cancer, colon and rectum cancer, tracheal bronchus and lung cancer, ovarian cancer, prostate cancer
12: drug-induced liver injury, tracheal bronchus and lung cancer
13: chronic obstructive pulmonary disease, tracheal bronchus and lung cancer
14: tracheal bronchus and lung cancer, stomach cancer
15: tracheal bronchus and lung cancer
16: tracheal bronchus and lung cancer
17: tracheal bronchus and lung cancer
18: tracheal bronchus and lung cancer, tracheal bronchus and lung cancer
19: tracheal bronchus and lung cancer
20: breast cancer, tracheal bronchus and lung cancer
21: head and neck cancer, tracheal bronchus and lung cancer
22: tracheal bronchus and lung cancer
23: colon and rectum cancer, tracheal bronchus and lung cancer
24: gastroesophageal reflux disease, tracheal bronchus and lung cancer
25: tracheal bronchus and lung cancer, peptic ulcer disease
26: colon and rectum cancer, tracheal bronchus and lung cancer
27: tracheal bronchus and lung cancer, peptic ulcer disease
28: tracheal bronchus and lung cancer
29: tracheal bronchus and lung cancer
30: tracheal bronchus and lung cancer
31: tracheal bronchus and lung cancer, radiation-induced disorder
l2_all_disease_terms
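Row 18 above ends up with the grouped term twice ("tracheal bronchus and lung cancer, tracheal bronchus and lung cancer"), and the recoded term is not always in alphabetical position. One possible alternative, sketched below in base R, is to recode term by term: split each string on commas, replace whole terms, then de-duplicate and sort. The helper name `recode_terms` is hypothetical, and this is a swapped-in technique for comparison, not the method used in the analysis.

```r
resp_cancer_terms <- c("lung cancer", "bronchus cancer",
                       "tracheal cancer", "respiratory system cancer")

# Hypothetical helper: recode whole comma-separated terms, then de-duplicate and sort.
recode_terms <- function(term_strings, old_terms, new_term) {
  vapply(
    strsplit(term_strings, ",\\s*"),
    function(terms) {
      # Keep missing values as missing
      if (length(terms) == 1 && is.na(terms)) return(NA_character_)
      terms[terms %in% old_terms] <- new_term
      paste(sort(unique(terms)), collapse = ", ")
    },
    character(1)
  )
}

# Toy input; the values mimic the level 1 strings seen on this page
x <- c("lung cancer, bronchus cancer",
       "breast cancer, lung cancer",
       "tracheal bronchus and lung cancer")

recoded <- recode_terms(x, resp_cancer_terms, "tracheal bronchus and lung cancer")
recoded
```

Because whole terms are compared, a string that already contains "tracheal bronchus and lung cancer" is left unchanged, so no special case is needed for it.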
gwas_study_info =
  gwas_study_info |>
  mutate(l2_all_disease_terms =
           stringr::str_replace_all(l2_all_disease_terms,
                                    pattern = "malignant melanoma of skin",
                                    "malignant skin melanoma"
           )
  )

gwas_study_info |>
  filter(grepl("malignant skin melanoma", l2_all_disease_terms)) |>
  select(all_disease_terms, l2_all_disease_terms) |>
  distinct()
all_disease_terms
<char>
1: cutaneous melanoma
2: melanoma
3: neuroblastoma, cutaneous melanoma
4: melanoma, immune system toxicity
5: non-melanoma skin carcinoma
l2_all_disease_terms
<char>
1: malignant skin melanoma
2: malignant skin melanoma
3: malignant skin melanoma, neuroblastoma
4: immune system toxicity, malignant skin melanoma
5: non-malignant skin melanoma skin cancer
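Row 5 above ("non-malignant skin melanoma skin cancer") looks like the result of a melanoma term being replaced inside "non-melanoma skin cancer"; where that replacement happened is not shown in this section. The base R sketch below reproduces that kind of substring collision on toy strings and shows one way to avoid it by anchoring the pattern to whole comma-separated terms. The input values and patterns are assumptions chosen for illustration.

```r
terms <- c("melanoma", "non-melanoma skin cancer", "melanoma, neuroblastoma")

# Naive substring replacement also rewrites the inside of "non-melanoma skin cancer"
naive <- gsub("melanoma", "malignant skin melanoma", terms, fixed = TRUE)

# Anchored replacement: the term must start at the beginning of the string or
# after ", ", and must be followed by ", " or the end of the string
anchored <- gsub("(^|, )melanoma(?=, |$)", "\\1malignant skin melanoma",
                 terms, perl = TRUE)

naive
anchored
```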
gwas_study_info |>
  filter(grepl("non-melanoma skin cancer", l2_all_disease_terms)) |>
  select(all_disease_terms, l2_all_disease_terms) |>
  distinct()
all_disease_terms l2_all_disease_terms
<char> <char>
1: squamous cell carcinoma, basal cell carcinoma non-melanoma skin cancer
2: keratinocyte carcinoma non-melanoma skin cancer
3: basal cell carcinoma non-melanoma skin cancer
4: non-melanoma skin carcinoma non-melanoma skin cancer
5: skin neoplasm non-melanoma skin cancer
6: skin carcinoma in situ non-melanoma skin cancer
gwas_study_info =
  gwas_study_info |>
  mutate(l2_all_disease_terms =
           stringr::str_replace_all(l2_all_disease_terms,
                                    pattern = "soft tissue sarcoma",
                                    "soft tissue and other extraosseous sarcomas"
           )
  )

gwas_study_info |>
  filter(grepl("soft tissue and other extraosseous sarcomas", l2_all_disease_terms)) |>
  select(all_disease_terms, l2_all_disease_terms) |>
  distinct()
all_disease_terms l2_all_disease_terms
<char> <char>
1: sarcoma, fibrosarcoma sarcoma, soft tissue and other extraosseous sarcomas
2: kaposis sarcoma soft tissue and other extraosseous sarcomas
gwas_study_info =
  gwas_study_info |>
  mutate(l2_all_disease_terms =
           stringr::str_replace_all(l2_all_disease_terms,
                                    pattern = "bone cancer|osteosarcoma",
                                    "malignant neoplasm of bone and articular cartilage"
           )
  )

gwas_study_info |>
  filter(grepl("malignant neoplasm of bone and articular cartilage", l2_all_disease_terms)) |>
  select(all_disease_terms, l2_all_disease_terms) |>
  distinct()
all_disease_terms
<char>
1: osteosarcoma
2: acute myeloid leukemia
3: myeloid leukemia
4: malignant bone neoplasm
5: acute myeloid leukemia, myelodysplastic syndrome
6: bone neoplasm
7: myelofibrosis
8: acute lymphoblastic leukemia, acute myeloid leukemia, myelodysplastic syndrome
l2_all_disease_terms
<char>
1: malignant neoplasm of bone and articular cartilage
2: malignant neoplasm of bone and articular cartilage
3: malignant neoplasm of bone and articular cartilage
4: malignant neoplasm of bone and articular cartilage
5: malignant neoplasm of bone and articular cartilage, myelodysplastic syndrome
6: malignant neoplasm of bone and articular cartilage
7: malignant neoplasm of bone and articular cartilage
8: malignant neoplasm of bone and articular cartilage, leukemia, myelodysplastic syndrome
gwas_study_info |>
  filter(grepl("breast cancer", l2_all_disease_terms)) |>
  select(all_disease_terms, l2_all_disease_terms) |>
  distinct()
all_disease_terms
<char>
1: estrogen-receptor negative breast cancer
2: breast carcinoma
3: estrogen-receptor positive breast cancer
4: breast carcinoma,
5: estrogen-receptor positive breast cancer, breast carcinoma
6: estrogen-receptor negative breast cancer, breast carcinoma
7: breast carcinoma, peripheral neuropathy
8: prostate carcinoma, breast carcinoma, ovarian carcinoma
9: male breast carcinoma
10: invasive lobular carcinoma
11: lung carcinoma, estrogen-receptor negative breast cancer, ovarian endometrioid carcinoma, colorectal cancer, prostate carcinoma, ovarian serous carcinoma, breast carcinoma, ovarian carcinoma, lung adenocarcinoma, squamous cell lung carcinoma, cancer
12: tp53 positive breast carcinoma
13: breast carcinoma, congestive heart failure
14: triple-negative breast cancer
15: breast carcinoma, chemotherapy-induced hypertension
16: estrogen-receptor negative breast cancer, estrogen-receptor positive breast cancer, breast carcinoma
17: childhood cancer, breast carcinoma
18: breast cancer
19: luminal a breast carcinoma
20: luminal b breast carcinoma
21: basal-like breast carcinoma
22: her2 positive breast carcinoma
23: breast carcinoma, chemotherapy-induced alopecia
24: progesterone-receptor negative breast cancer
25: her2 negative breast carcinoma
26: breast carcinoma, amenorrhea
27: breast carcinoma, post operative nausea and vomiting
28: breast cancer, covid-19
29: estrogen-receptor positive breast cancer, breast carcinoma, her2 negative breast carcinoma, progesterone-receptor positive breast cancer
30: her2 positive breast carcinoma, estrogen-receptor positive breast cancer, breast carcinoma, progesterone-receptor positive breast cancer
31: progesterone-receptor negative breast cancer, estrogen-receptor negative breast cancer, her2 positive breast carcinoma, breast carcinoma
32: breast carcinoma, triple-negative breast cancer
33: her2 positive breast carcinoma, musculoskeletal system disease
34: lobular breast carcinoma in situ
35: breast carcinoma in situ
36: breast neoplasm
37: breast cancer, ovarian carcinoma
38: breast cancer, lung cancer
39: estrogen-receptor negative breast cancer, estrogen-receptor positive breast cancer
40: estrogen-receptor positive breast cancer, triple-negative breast cancer
41: her2 positive breast carcinoma, triple-negative breast cancer
42: breast carcinoma, cardiotoxicity
43: breast carcinoma, uterine leiomyoma
44: estrogen-receptor positive breast cancer, uterine leiomyoma
45: estrogen-receptor negative breast cancer, uterine leiomyoma
46: ductal breast carcinoma in situ and lobular carcinoma in situ
47: luminal b breast carcinoma, luminal a breast carcinoma
48: her2 positive breast carcinoma, luminal a breast carcinoma
49: triple-negative breast cancer, luminal a breast carcinoma
50: luminal b breast carcinoma, triple-negative breast cancer
51: luminal b breast carcinoma, her2 negative breast carcinoma, triple-negative breast cancer
52: luminal b breast carcinoma, her2 positive breast carcinoma
53: breast cancer, radiation-induced disorder
54: colorectal cancer, breast carcinoma
55: breast carcinoma, dermatological toxicity
56: breast carcinoma, edema
57: breast carcinoma, telangiectasia of the skin
58: breast carcinoma, lymphedema
59: luminal b breast carcinoma, her2 negative breast carcinoma
60: breast carcinoma, estrogen-receptor positive breast cancer, metastatic prostate cancer, pancreatic carcinoma, hypertension
61: breast carcinoma, estrogen-receptor positive breast cancer, metastatic prostate cancer, pancreatic carcinoma, proteinuria
62: breast carcinoma, estrogen-receptor positive breast cancer, metastatic prostate cancer, pancreatic carcinoma, hypertension, proteinuria
63: brcax breast cancer
64: schizophrenia, breast carcinoma
65: schizophrenia, estrogen-receptor positive breast cancer
66: estrogen-receptor negative breast cancer, schizophrenia
67: breast cancer, neutropenia, leukopenia
all_disease_terms
l2_all_disease_terms
<char>
1: breast cancer
2: breast cancer
3: breast cancer
4: breast cancer
5: breast cancer
6: breast cancer
7: breast cancer, peripheral neuropathy
8: breast cancer, ovarian cancer, prostate cancer
9: breast cancer
10: breast cancer
11: breast cancer, colon and rectum cancer, tracheal bronchus and lung cancer, ovarian cancer, prostate cancer
12: breast cancer
13: breast cancer, congestive heart failure
14: breast cancer
15: breast cancer, hypertension
16: breast cancer
17: breast cancer, childhood cancer
18: breast cancer
19: breast cancer
20: breast cancer
21: breast cancer
22: breast cancer
23: breast cancer, chemotherapy-induced alopecia
24: breast cancer
25: breast cancer
26: amenorrhea, breast cancer
27: breast cancer, post operative nausea and vomiting
28: breast cancer, covid-19
29: breast cancer
30: breast cancer
31: breast cancer
32: breast cancer
33: breast cancer, musculoskeletal system disease
34: breast cancer
35: breast cancer
36: breast cancer
37: breast cancer, ovarian cancer
38: breast cancer, tracheal bronchus and lung cancer
39: breast cancer
40: breast cancer
41: breast cancer
42: breast cancer, cardiotoxicity
43: benign neoplasm, breast cancer
44: benign neoplasm, breast cancer
45: benign neoplasm, breast cancer
46: breast cancer
47: breast cancer
48: breast cancer
49: breast cancer
50: breast cancer
51: breast cancer
52: breast cancer
53: breast cancer, radiation-induced disorder
54: breast cancer, colon and rectum cancer
55: breast cancer, dermatological toxicity
56: breast cancer, edema
57: breast cancer, telangiectasia of the skin
58: breast cancer, lymphedema
59: breast cancer
60: breast cancer, hypertension, pancreatic cancer, prostate cancer
61: breast cancer, pancreatic cancer, prostate cancer, proteinuria
62: breast cancer, hypertension, pancreatic cancer, prostate cancer, proteinuria
63: breast cancer
64: breast cancer, schizophrenia
65: breast cancer, schizophrenia
66: breast cancer, schizophrenia
67: breast cancer, leukopenia, neutropenia
l2_all_disease_terms
gwas_study_info |>
  filter(grepl("cervical cancer", l2_all_disease_terms)) |>
  select(all_disease_terms, l2_all_disease_terms) |>
  distinct()
all_disease_terms
<char>
1: cervical carcinoma
2: cervical cancer
3: dysplasia of cervix, cervical cancer
4: dysplasia, cervical cancer
5: uterine cervix carcinoma in situ
6: cervical carcinoma, human papilloma virus infection
7: cervical intraepithelial neoplasia grade 2/3
8: cervical carcinoma, cervical intraepithelial neoplasia grade 2/3
l2_all_disease_terms
<char>
1: cervical cancer
2: cervical cancer
3: cervical cancer, dysplasia of cervix
4: cervical cancer, dysplasia
5: cervical cancer
6: cervical cancer, human papilloma virus infection
7: cervical cancer
8: cervical cancer
# Open question: is endometrial cancer a subset of uterine cancer in GBD?
# It is in the ontology: http://purl.obolibrary.org/obo/MONDO_0002715
gwas_study_info =
  gwas_study_info |>
  mutate(l2_all_disease_terms =
           stringr::str_replace_all(l2_all_disease_terms,
                                    pattern = "endometrial cancer",
                                    "uterine cancer"
           )
  )

gwas_study_info |>
  filter(grepl("uterine cancer", l2_all_disease_terms)) |>
  select(all_disease_terms, l2_all_disease_terms) |>
  distinct()
all_disease_terms l2_all_disease_terms
<char> <char>
1: endometrial endometrioid carcinoma uterine cancer
2: endometrial carcinoma uterine cancer
3: endometrial neoplasm uterine cancer
4: uterine carcinoma uterine cancer
5: endometrial cancer, covid-19 covid-19, uterine cancer
6: uterine corpus cancer uterine cancer
7: ovarian endometrioid adenocarcinoma uterine cancer
8: uterine cancer uterine cancer
9: uterine adnexa cancer, ovarian cancer ovarian cancer, uterine cancer
10: endometrial cancer uterine cancer
11: endometrial carcinoma, endometriosis uterine cancer, endometriosis
gwas_study_info |>
  filter(grepl("ovarian cancer", l2_all_disease_terms)) |>
  select(all_disease_terms, l2_all_disease_terms) |>
  distinct()
all_disease_terms
<char>
1: ovarian carcinoma
2: malignant epithelial tumor of ovary
3: prostate carcinoma, breast carcinoma, ovarian carcinoma
4: lung carcinoma, estrogen-receptor negative breast cancer, ovarian endometrioid carcinoma, colorectal cancer, prostate carcinoma, ovarian serous carcinoma, breast carcinoma, ovarian carcinoma, lung adenocarcinoma, squamous cell lung carcinoma, cancer
5: ovarian mucinous adenocarcinoma
6: ovarian serous carcinoma
7: ovarian clear cell adenocarcinoma
8: ovarian endometrioid carcinoma
9: ovarian carcinoma, covid-19
10: high grade ovarian serous adenocarcinoma
11: ovarian serous adenocarcinoma
12: ovarian clear cell cancer
13: uterine adnexa cancer, ovarian cancer
14: ovarian cancer
15: ovarian neoplasm
16: breast cancer, ovarian carcinoma
17: ovarian carcinoma, cancer aggressiveness measurement
18: ovarian serous carcinoma, cancer aggressiveness measurement
l2_all_disease_terms
<char>
1: ovarian cancer
2: ovarian cancer
3: breast cancer, ovarian cancer, prostate cancer
4: breast cancer, colon and rectum cancer, tracheal bronchus and lung cancer, ovarian cancer, prostate cancer
5: ovarian cancer
6: ovarian cancer
7: ovarian cancer
8: ovarian cancer
9: covid-19, ovarian cancer
10: ovarian cancer
11: ovarian cancer
12: ovarian cancer
13: ovarian cancer, uterine cancer
14: ovarian cancer
15: ovarian cancer
16: breast cancer, ovarian cancer
17: ovarian cancer
18: ovarian cancer
gwas_study_info |>
  filter(grepl("prostate cancer", l2_all_disease_terms)) |>
  select(all_disease_terms, l2_all_disease_terms) |>
  distinct()
all_disease_terms
<char>
1: prostate carcinoma
2: cancer aggressiveness measurement, prostate carcinoma
3: prostate carcinoma, breast carcinoma, ovarian carcinoma
4: metastatic prostate cancer, peripheral neuropathy
5: metastatic prostate cancer
6: prostate carcinoma, erectile dysfunction
7: lung carcinoma, estrogen-receptor negative breast cancer, ovarian endometrioid carcinoma, colorectal cancer, prostate carcinoma, ovarian serous carcinoma, breast carcinoma, ovarian carcinoma, lung adenocarcinoma, squamous cell lung carcinoma, cancer
8: prostate carcinoma, adverse effect
9: prostate cancer
10: grade iii prostatic intraepithelial neoplasia
11: prostate carcinoma, type 2 diabetes mellitus
12: prostate cancer, disease progression measurement
13: prostate cancer, radiation-induced disorder
14: breast carcinoma, estrogen-receptor positive breast cancer, metastatic prostate cancer, pancreatic carcinoma, hypertension
15: breast carcinoma, estrogen-receptor positive breast cancer, metastatic prostate cancer, pancreatic carcinoma, proteinuria
16: breast carcinoma, estrogen-receptor positive breast cancer, metastatic prostate cancer, pancreatic carcinoma, hypertension, proteinuria
17: metastatic prostate cancer, disease progression measurement
l2_all_disease_terms
<char>
1: prostate cancer
2: prostate cancer
3: breast cancer, ovarian cancer, prostate cancer
4: peripheral neuropathy, prostate cancer
5: prostate cancer
6: erectile dysfunction, prostate cancer
7: breast cancer, colon and rectum cancer, tracheal bronchus and lung cancer, ovarian cancer, prostate cancer
8: complication, prostate cancer
9: prostate cancer
10: prostate cancer
11: prostate cancer, type 2 diabetes mellitus
12: prostate cancer
13: prostate cancer, radiation-induced disorder
14: breast cancer, hypertension, pancreatic cancer, prostate cancer
15: breast cancer, pancreatic cancer, prostate cancer, proteinuria
16: breast cancer, hypertension, pancreatic cancer, prostate cancer, proteinuria
17: prostate cancer
gwas_study_info |>
filter(grepl("testicular cancer", l2_all_disease_terms)) |>
select(all_disease_terms, l2_all_disease_terms) |>
distinct()
all_disease_terms
<char>
1: testicular carcinoma
2: testicular carcinoma, cardiovascular disease
3: testicular neoplasm
l2_all_disease_terms
<char>
1: testicular cancer
2: cardiovascular disease, testicular cancer
3: testicular cancer
gwas_study_info |>
filter(grepl("kidney cancer", l2_all_disease_terms)) |>
select(all_disease_terms, l2_all_disease_terms) |>
distinct()
                all_disease_terms l2_all_disease_terms
                           <char>               <char>
1:           renal cell carcinoma        kidney cancer
2:                 nephroblastoma        kidney cancer
3:                  kidney cancer        kidney cancer
4:     clear cell renal carcinoma        kidney cancer
5:                renal carcinoma        kidney cancer
6: papillary renal cell carcinoma        kidney cancer
7:                kidney neoplasm        kidney cancer
gwas_study_info =
gwas_study_info |>
mutate(l2_all_disease_terms =
stringr::str_replace_all(l2_all_disease_terms ,
pattern = "urinary bladder cancer",
"bladder cancer"
)
)
gwas_study_info |>
filter(grepl("bladder cancer", l2_all_disease_terms)) |>
select(all_disease_terms, l2_all_disease_terms) |>
distinct()
all_disease_terms
<char>
1: urinary bladder carcinoma
2: urinary bladder carcinoma, disease progression measurement
3: urinary bladder cancer,
4: urinary bladder cancer
l2_all_disease_terms
<char>
1: bladder cancer
2: bladder cancer
3: bladder cancer
4: bladder cancer
gwas_study_info =
gwas_study_info |>
mutate(l2_all_disease_terms =
stringr::str_replace_all(l2_all_disease_terms ,
pattern = "\\bcentral nervous system cancer\\b",
"brain and central nervous system cancer"
)
)
gwas_study_info |>
filter(grepl("brain and central nervous system cancer", l2_all_disease_terms)) |>
select(all_disease_terms, l2_all_disease_terms) |>
distinct()
all_disease_terms
<char>
1: glioblastoma multiforme
2: central nervous system cancer
3: glioma
4: central nervous system cancer, glioma
5: central nervous system cancer, glioblastoma multiforme
6: brain neoplasm
7: astrocytoma
8: oligodendroglioma
9: nervous system cancer, brain cancer
10: brain cancer
11: malignant glioma
12: central nervous system non-hodgkin lymphoma
l2_all_disease_terms
<char>
1: brain and central nervous system cancer
2: brain and central nervous system cancer
3: brain and central nervous system cancer
4: brain and central nervous system cancer
5: brain and central nervous system cancer
6: brain and central nervous system cancer
7: brain and central nervous system cancer
8: brain and central nervous system cancer
9: brain and central nervous system cancer
10: brain and central nervous system cancer
11: brain and central nervous system cancer
12: brain and central nervous system cancer
gwas_study_info =
gwas_study_info |>
mutate(l2_all_disease_terms =
stringr::str_replace_all(l2_all_disease_terms ,
pattern = "\\bocular melanoma\\b|\\bocular cancer\\b",
"eye cancer"
)
)
gwas_study_info |>
filter(grepl("eye cancer", l2_all_disease_terms)) |>
select(all_disease_terms, l2_all_disease_terms) |>
distinct()
                                 all_disease_terms l2_all_disease_terms
                                            <char>               <char>
1:                                  uveal melanoma           eye cancer
2:                              choroidal melanoma           eye cancer
3:                 epithelioid cell uveal melanoma           eye cancer
4: uveal melanoma, epithelioid cell uveal melanoma           eye cancer
5:                 uveal melanoma disease severity           eye cancer
6:                                   ocular cancer           eye cancer
7:                                    eye neoplasm           eye cancer
gwas_study_info =
gwas_study_info |>
mutate(l2_all_disease_terms =
stringr::str_replace_all(l2_all_disease_terms ,
pattern = "\\bneuroblastoma\\b|\\bperipheral nervous system cancer\\b",
"neuroblastoma and other peripheral nervous system cancers"
)
)
gwas_study_info |>
filter(grepl("neuroblastoma and other peripheral nervous system cancers", l2_all_disease_terms)) |>
select(all_disease_terms, l2_all_disease_terms) |>
distinct()
all_disease_terms
<char>
1: neuroblastoma
2: neuroblastoma, cutaneous melanoma
l2_all_disease_terms
<char>
1: neuroblastoma and other peripheral nervous system cancers
2: malignant skin melanoma, neuroblastoma and other peripheral nervous system cancers
gwas_study_info |>
filter(grepl("thyroid cancer", l2_all_disease_terms)) |>
select(all_disease_terms, l2_all_disease_terms) |>
distinct()
                  all_disease_terms l2_all_disease_terms
                             <char>               <char>
1: differentiated thyroid carcinoma       thyroid cancer
2:      papillary thyroid carcinoma       thyroid cancer
3:     follicular thyroid carcinoma       thyroid cancer
4:                thyroid carcinoma       thyroid cancer
5:                   thyroid cancer       thyroid cancer
gwas_study_info |>
filter(grepl("mesothelioma", l2_all_disease_terms)) |>
select(all_disease_terms, l2_all_disease_terms) |>
distinct()
                all_disease_terms l2_all_disease_terms
                           <char>               <char>
1: malignant pleural mesothelioma         mesothelioma
2:                   mesothelioma         mesothelioma
3:           pleural mesothelioma         mesothelioma
gwas_study_info =
gwas_study_info |>
mutate(l2_all_disease_terms =
stringr::str_replace_all(l2_all_disease_terms ,
pattern = "hodgkins lymphoma",
"hodgkin lymphoma"
)
)
gwas_study_info |>
filter(grepl("hodgkin lymphoma", l2_all_disease_terms)) |>
select(all_disease_terms, l2_all_disease_terms) |>
distinct()
all_disease_terms
<char>
1: diffuse large b-cell lymphoma, multiple sclerosis
2: follicular lymphoma, multiple sclerosis
3: marginal zone b-cell lymphoma, multiple sclerosis
4: diffuse large b-cell lymphoma, rheumatoid arthritis
5: rheumatoid arthritis, follicular lymphoma
6: rheumatoid arthritis, marginal zone b-cell lymphoma
7: diffuse large b-cell lymphoma, systemic lupus erythematosus
8: systemic lupus erythematosus, follicular lymphoma
9: marginal zone b-cell lymphoma, systemic lupus erythematosus
10: nodular sclerosis hodgkin lymphoma
11: hodgkins lymphoma
12: diffuse large b-cell lymphoma
13: neoplasm of mature b-cells
14: marginal zone b-cell lymphoma
15: hodgkins lymphoma, multiple myeloma, chronic lymphocytic leukemia
16: diffuse large b-cell lymphoma, marginal zone b-cell lymphoma, follicular lymphoma, chronic lymphocytic leukemia
17: hodgkins lymphoma, multiple myeloma, non-hodgkins lymphoma
18: acute lymphoblastic leukemia, lymphoblastic lymphoma, venous thromboembolism
19: non-hodgkins lymphoma
20: follicular lymphoma
21: reticulum cell sarcoma
22: extranodal nasal nk/t cell lymphoma
23: hiv infection, non-hodgkins lymphoma
all_disease_terms
l2_all_disease_terms
<char>
1: multiple sclerosis, non-hodgkin lymphoma
2: multiple sclerosis, non-hodgkin lymphoma
3: multiple sclerosis, non-hodgkin lymphoma
4: non-hodgkin lymphoma, rheumatoid arthritis
5: non-hodgkin lymphoma, rheumatoid arthritis
6: non-hodgkin lymphoma, rheumatoid arthritis
7: non-hodgkin lymphoma, systemic lupus erythematosus
8: non-hodgkin lymphoma, systemic lupus erythematosus
9: non-hodgkin lymphoma, systemic lupus erythematosus
10: hodgkin lymphoma
11: hodgkin lymphoma
12: non-hodgkin lymphoma
13: non-hodgkin lymphoma
14: non-hodgkin lymphoma
15: hodgkin lymphoma, leukemia, multiple myeloma
16: leukemia, non-hodgkin lymphoma
17: hodgkin lymphoma, multiple myeloma, non-hodgkin lymphoma
18: leukemia, non-hodgkin lymphoma, venous thromboembolism
19: non-hodgkin lymphoma
20: non-hodgkin lymphoma
21: non-hodgkin lymphoma
22: non-hodgkin lymphoma
23: hiv infection, non-hodgkin lymphoma
l2_all_disease_terms
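The filter on "hodgkin lymphoma" above also returns rows whose only lymphoma term is "non-hodgkin lymphoma", because grepl() does substring matching. A minimal sketch of one way to separate the two, on invented toy strings and using a PCRE negative lookbehind in base R (an illustration, not the analysis's own method):

```r
# Toy values modelled on the l2_all_disease_terms strings shown above.
toy <- c("hodgkin lymphoma",
         "non-hodgkin lymphoma",
         "hodgkin lymphoma, multiple myeloma, non-hodgkin lymphoma",
         "hiv infection, non-hodgkin lymphoma")

# Substring match: every row is a hit, including the non-hodgkin-only rows.
substring_hits <- grepl("hodgkin lymphoma", toy)

# Negative lookbehind: only match "hodgkin lymphoma" when it is not
# immediately preceded by "non-".
exact_hits <- grepl("(?<!non-)hodgkin lymphoma", toy, perl = TRUE)

print(data.frame(term = toy, substring_hits, exact_hits))
```

The same lookbehind would work in the `filter(grepl(...))` call if `perl = TRUE` is passed.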
gwas_study_info =
gwas_study_info |>
mutate(l2_all_disease_terms =
stringr::str_replace_all(l2_all_disease_terms ,
pattern = "non-hodgkins lymphoma",
"non-hodgkin lymphoma"
)
)
gwas_study_info |>
filter(grepl("non-hodgkin lymphoma", l2_all_disease_terms)) |>
select(all_disease_terms, l2_all_disease_terms) |>
distinct()
all_disease_terms
<char>
1: diffuse large b-cell lymphoma, multiple sclerosis
2: follicular lymphoma, multiple sclerosis
3: marginal zone b-cell lymphoma, multiple sclerosis
4: diffuse large b-cell lymphoma, rheumatoid arthritis
5: rheumatoid arthritis, follicular lymphoma
6: rheumatoid arthritis, marginal zone b-cell lymphoma
7: diffuse large b-cell lymphoma, systemic lupus erythematosus
8: systemic lupus erythematosus, follicular lymphoma
9: marginal zone b-cell lymphoma, systemic lupus erythematosus
10: diffuse large b-cell lymphoma
11: neoplasm of mature b-cells
12: marginal zone b-cell lymphoma
13: diffuse large b-cell lymphoma, marginal zone b-cell lymphoma, follicular lymphoma, chronic lymphocytic leukemia
14: hodgkins lymphoma, multiple myeloma, non-hodgkins lymphoma
15: acute lymphoblastic leukemia, lymphoblastic lymphoma, venous thromboembolism
16: non-hodgkins lymphoma
17: follicular lymphoma
18: reticulum cell sarcoma
19: extranodal nasal nk/t cell lymphoma
20: hiv infection, non-hodgkins lymphoma
all_disease_terms
l2_all_disease_terms
<char>
1: multiple sclerosis, non-hodgkin lymphoma
2: multiple sclerosis, non-hodgkin lymphoma
3: multiple sclerosis, non-hodgkin lymphoma
4: non-hodgkin lymphoma, rheumatoid arthritis
5: non-hodgkin lymphoma, rheumatoid arthritis
6: non-hodgkin lymphoma, rheumatoid arthritis
7: non-hodgkin lymphoma, systemic lupus erythematosus
8: non-hodgkin lymphoma, systemic lupus erythematosus
9: non-hodgkin lymphoma, systemic lupus erythematosus
10: non-hodgkin lymphoma
11: non-hodgkin lymphoma
12: non-hodgkin lymphoma
13: leukemia, non-hodgkin lymphoma
14: hodgkin lymphoma, multiple myeloma, non-hodgkin lymphoma
15: leukemia, non-hodgkin lymphoma, venous thromboembolism
16: non-hodgkin lymphoma
17: non-hodgkin lymphoma
18: non-hodgkin lymphoma
19: non-hodgkin lymphoma
20: hiv infection, non-hodgkin lymphoma
l2_all_disease_terms
gwas_study_info |>
filter(grepl("multiple myeloma", l2_all_disease_terms)) |>
select(all_disease_terms, l2_all_disease_terms) |>
distinct()
all_disease_terms
<char>
1: multiple myeloma
2: multiple myeloma, peripheral neuropathy
3: multiple myeloma, chemotherapy-induced oral mucositis
4: multiple myeloma, monoclonal gammopathy
5: hodgkins lymphoma, multiple myeloma, chronic lymphocytic leukemia
6: hodgkins lymphoma, multiple myeloma, non-hodgkins lymphoma
7: multiple myeloma, clostridium difficile infection
l2_all_disease_terms
<char>
1: multiple myeloma
2: multiple myeloma, peripheral neuropathy
3: chemotherapy-induced oral mucositis, multiple myeloma
4: monoclonal gammopathy, multiple myeloma
5: hodgkin lymphoma, leukemia, multiple myeloma
6: hodgkin lymphoma, multiple myeloma, non-hodgkin lymphoma
7: clostridium difficile infection, multiple myeloma
gwas_study_info |>
filter(grepl("leukemia", l2_all_disease_terms)) |>
select(all_disease_terms, l2_all_disease_terms) |>
distinct()
all_disease_terms
<char>
1: acute lymphoblastic leukemia
2: multiple sclerosis, chronic lymphocytic leukemia
3: rheumatoid arthritis, chronic lymphocytic leukemia
4: systemic lupus erythematosus, chronic lymphocytic leukemia
5: acute lymphoblastic leukemia, asparaginase-induced acute pancreatitis
6: chronic lymphocytic leukemia
7: b-cell acute lymphoblastic leukemia
8: chronic myelogenous leukemia
9: hodgkins lymphoma, multiple myeloma, chronic lymphocytic leukemia
10: childhood acute lymphoblastic leukemia
11: acute lymphoblastic leukemia, peripheral neuropathy
12: diffuse large b-cell lymphoma, marginal zone b-cell lymphoma, follicular lymphoma, chronic lymphocytic leukemia
13: acute lymphoblastic leukemia, lymphoblastic lymphoma, venous thromboembolism
14: leukemia
15: b-cell acute lymphoblastic leukemia, adult onset asthma
16: b-cell acute lymphoblastic leukemia, childhood onset asthma
17: b-cell acute lymphoblastic leukemia, graves disease
18: b-cell acute lymphoblastic leukemia, hashimotos thyroiditis
19: b-cell acute lymphoblastic leukemia, hypothyroidism
20: b-cell acute lymphoblastic leukemia, primary biliary cirrhosis
21: b-cell acute lymphoblastic leukemia, sclerosing cholangitis
22: b-cell acute lymphoblastic leukemia, inflammatory bowel disease
23: b-cell acute lymphoblastic leukemia, crohns disease
24: b-cell acute lymphoblastic leukemia, ulcerative colitis
25: b-cell acute lymphoblastic leukemia, rheumatoid arthritis
26: b-cell acute lymphoblastic leukemia, multiple sclerosis
27: b-cell acute lymphoblastic leukemia, systemic scleroderma
28: b-cell acute lymphoblastic leukemia, systemic lupus erythematosus
29: b-cell acute lymphoblastic leukemia, type 1 diabetes mellitus
30: b-cell acute lymphoblastic leukemia, vitiligo
31: lymphoid leukemia
32: acute lymphoblastic leukemia, hyperbilirubinemia
33: b-cell acute lymphoblastic leukemia with t(1;19)(q23;p13.3); e2a-pbx1 (tcf3-pbx1)
34: childhood acute lymphoblastic leukemia, b-cell acute lymphoblastic leukemia with t(1;19)(q23;p13.3); e2a-pbx1 (tcf3-pbx1)
35: monocytic leukemia
36: childhood t acute lymphoblastic leukemia
37: acute lymphoblastic leukemia, neurotoxicity
38: acute lymphoblastic leukemia, acute myeloid leukemia, myelodysplastic syndrome
39: myeloid neoplasm
all_disease_terms
l2_all_disease_terms
<char>
1: leukemia
2: leukemia, multiple sclerosis
3: leukemia, rheumatoid arthritis
4: leukemia, systemic lupus erythematosus
5: leukemia, pancreatitis
6: leukemia
7: leukemia
8: leukemia
9: hodgkin lymphoma, leukemia, multiple myeloma
10: leukemia
11: leukemia, peripheral neuropathy
12: leukemia, non-hodgkin lymphoma
13: leukemia, non-hodgkin lymphoma, venous thromboembolism
14: leukemia
15: asthma, leukemia
16: asthma, leukemia
17: graves disease, leukemia
18: hashimotos thyroiditis, leukemia
19: hypothyroidism, leukemia
20: leukemia, primary biliary cirrhosis
21: leukemia, sclerosing cholangitis
22: inflammatory bowel disease, leukemia
23: crohns disease, leukemia
24: leukemia, ulcerative colitis
25: leukemia, rheumatoid arthritis
26: leukemia, multiple sclerosis
27: leukemia, systemic scleroderma
28: leukemia, systemic lupus erythematosus
29: leukemia, type 1 diabetes mellitus
30: leukemia, vitiligo
31: leukemia
32: hyperbilirubinemia, leukemia
33: leukemia
34: leukemia
35: leukemia
36: leukemia
37: leukemia, neurotoxicity
38: malignant neoplasm of bone and articular cartilage, leukemia, myelodysplastic syndrome
39: leukemia
l2_all_disease_terms
gwas_study_info =
gwas_study_info |>
mutate(l2_all_disease_terms =
case_when(
l2_all_disease_terms == "cancer" ~ "other malignant neoplasms",
TRUE ~ l2_all_disease_terms
)
)
gwas_study_info =
gwas_study_info |>
mutate(l2_all_disease_terms =
ifelse(PUBMED_ID == 27790247,
stringr::str_replace_all(l2_all_disease_terms,
pattern = ", cancer,",
", other malignant neoplasms,"
),
l2_all_disease_terms
)
)
### handle entries where generic "cancer" is listed alongside another condition (e.g. a treatment side effect)
gwas_study_info |>
filter(grepl("^cancer,", l2_all_disease_terms)) |>
pull(l2_all_disease_terms) |>
unique()
[1] "cancer, chronic obstructive pulmonary disease"
[2] "cancer, cardiotoxicity"
[3] "cancer, hand-foot syndrome"
[4] "cancer, peripheral neuropathy"
[5] "cancer, immune system toxicity"
[6] "cancer, hypothyroidism"
[7] "cancer, radiation-induced disorder"
[8] "cancer, osteonecrosis"
gwas_study_info =
gwas_study_info |>
mutate(l2_all_disease_terms =
ifelse(grepl("^cancer,", l2_all_disease_terms),
stringr::str_replace_all(l2_all_disease_terms,
pattern = "^cancer,",
"other malignant neoplasms,"
),
l2_all_disease_terms
)
)
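The three steps above recode a bare "cancer" term depending on where it sits in the comma-separated string. As a sketch only, the same idea can be written as a single pattern that treats "cancer" as a whole term wherever it appears. Note the assumption: unlike the step restricted to PUBMED_ID 27790247, this version is applied to every row, and it uses base R gsub(perl = TRUE) on invented toy strings rather than stringr.

```r
toy <- c("cancer",
         "cancer, cardiotoxicity",
         "atrial fibrillation, cancer, stroke",
         "stroke, cancer",
         "breast cancer, ovarian cancer")

# "cancer" counts as a whole term only when it starts the string or follows
# ", ", and is followed by a comma or the end of the string.
whole_cancer <- "(?:^|(?<=, ))cancer(?=,|$)"

recoded <- gsub(whole_cancer, "other malignant neoplasms", toy, perl = TRUE)
print(recoded)
```

Specific terms such as "breast cancer" are left untouched because the word before "cancer" is not a term separator.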
other_malignant_terms <- c(
"retroperitoneal cancer",
"peritoneal cancer",
"ewing sarcoma",
"digestive system cancer",
"intestinal cancer",
"small intestine cancer",
"female reproductive organ cancer",
"male reproductive organ cancer",
"vulvar cancer",
"testicular germ cell tumor",
"urogenital cancer",
"squamous cell cancer",
"head and neck cancer",
"malignant tumor of floor of mouth",
"nasal cavity cancer", # unsure whether this belongs in a different category
"malignant lymphoid tumor",
"neuroendocrine tumor",
"lymphatic system cancer",
"childhood cancer" # could be sorted further
)
gwas_study_info =
gwas_study_info |>
mutate(l2_all_disease_terms =
stringr::str_replace_all(l2_all_disease_terms ,
pattern = paste0(other_malignant_terms, collapse = "(?=,|$)|\\b"),
"other malignant neoplasms"
)
)
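The pattern above is built by pasting the terms together with the separator `(?=,|$)|\\b`. Because paste0() only puts the separator between terms, the first term gets no leading `\\b` and the last term gets no trailing `(?=,|$)`. The sketch below prints the expanded pattern for a toy subset and compares it with a variant that anchors every term. `build_term_pattern` is a hypothetical helper, not part of the original analysis, the second toy string is invented, and base R gsub(perl = TRUE) stands in for stringr::str_replace_all.

```r
# Toy subset of the term vector; the real one is longer.
terms <- c("retroperitoneal cancer", "peritoneal cancer", "ewing sarcoma")

# Pattern as built in the chunk above.
pattern_as_built <- paste0(terms, collapse = "(?=,|$)|\\b")
print(pattern_as_built)

# Hypothetical helper: wrap the whole alternation so that every term is
# anchored on both sides.
build_term_pattern <- function(terms) {
  paste0("\\b(?:", paste(terms, collapse = "|"), ")(?=,|$)")
}
pattern_anchored <- build_term_pattern(terms)
print(pattern_anchored)

# Invented inputs; "ewing sarcoma risk" is there only to expose the difference.
x <- c("ewing sarcoma", "ewing sarcoma risk, peritoneal cancer")
res_built <- gsub(pattern_as_built, "other malignant neoplasms", x, perl = TRUE)
res_anchored <- gsub(pattern_anchored, "other malignant neoplasms", x, perl = TRUE)
print(res_built)
print(res_anchored)
```

On term lists where the first and last entries never occur as part of a longer term, both patterns give the same result.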
gwas_study_info |>
filter(grepl("other malignant neoplasms", l2_all_disease_terms)) |>
select(all_disease_terms, l2_all_disease_terms) |>
distinct()
all_disease_terms
<char>
1: squamous cell carcinoma
2: cancer
3: childhood cancer, cardiomyopathy
4: neuroendocrine neoplasm
5: small intestine neuroendocrine tumor
6: pancreatic neuroendocrine tumor
7: pulmonary neuroendocrine tumor
8: childhood cancer, cardiotoxicity
9: childhood cancer, obesity
10: testicular germ cell tumor
11: cutaneous squamous cell carcinoma
12: head and neck malignant neoplasia
13: ewing sarcoma
14: carcinoid tumor
15: head and neck squamous cell carcinoma, pain
16: digestive system carcinoma, chronic obstructive pulmonary disease
17: cancer, chronic obstructive pulmonary disease
18: digestive system carcinoma
19: heart failure, diabetes mellitus, stroke, atrial fibrillation, coronary artery disease, cancer
20: stroke, atrial fibrillation, coronary artery disease, heart failure, diabetes mellitus, cancer
21: head and neck squamous cell carcinoma
22: childhood cancer, breast carcinoma
23: cancer, cardiotoxicity
24: childhood cancer
25: childhood cancer, bone fracture
26: cancer, hand-foot syndrome
27: head and neck malignant neoplasia, osteoradionecrosis
28: head and neck carcinoma
29: digestive system cancer
30: reproductive system cancer
31: male reproductive organ cancer
32: childhood cancer, type 2 diabetes mellitus
33: lymphatic system cancer
34: malignant tumor of floor of mouth
35: nasal cavity cancer
36: small intestine cancer
37: retroperitoneal cancer, peritoneum cancer
38: cancer
39: female reproductive organ cancer
40: small intestine carcinoma
41: cancer, peripheral neuropathy
42: childhood cancer, hearing loss, ototoxicity
43: childhood cancer, hearing loss
44: intestinal cancer
45: vulvar neoplasm
46: vulvar carcinoma
47: in situ carcinoma
48: carcinoma
49: lymphoid neoplasm
50: head and neck carcinoma, lung cancer
51: cancer, immune system toxicity
52: cancer, hypothyroidism
53: urogenital neoplasm
54: head and neck malignant neoplasia, radiation-induced disorder
55: cancer, radiation-induced disorder
56: head and neck malignant neoplasia, neuropathic pain
57: cancer, osteonecrosis
58: head and neck carcinoma, mucositis
59: head and neck carcinoma, fibrosis
60: head and neck carcinoma, mucositis, dysphagia
61: head and neck carcinoma, fibrosis, dysphagia, xerostomia
62: head and neck carcinoma, fibrosis, mucositis, dysphagia, xerostomia
all_disease_terms
l2_all_disease_terms
<char>
1: other malignant neoplasms
2: other malignant neoplasms
3: cardiomyopathy, other malignant neoplasms
4: other malignant neoplasms
5: other malignant neoplasms
6: other malignant neoplasms
7: other malignant neoplasms
8: cardiotoxicity, other malignant neoplasms
9: other malignant neoplasms, obesity
10: other malignant neoplasms
11: other malignant neoplasms
12: other malignant neoplasms
13: other malignant neoplasms
14: other malignant neoplasms
15: other malignant neoplasms, pain
16: chronic obstructive pulmonary disease, other malignant neoplasms
17: other malignant neoplasms, chronic obstructive pulmonary disease
18: other malignant neoplasms
19: atrial fibrillation, other malignant neoplasms, coronary artery disease, diabetes mellitus, heart failure, stroke
20: atrial fibrillation, other malignant neoplasms, coronary artery disease, diabetes mellitus, heart failure, stroke
21: other malignant neoplasms
22: breast cancer, other malignant neoplasms
23: other malignant neoplasms, cardiotoxicity
24: other malignant neoplasms
25: bone fracture, other malignant neoplasms
26: other malignant neoplasms, hand-foot syndrome
27: other malignant neoplasms, osteoradionecrosis
28: other malignant neoplasms
29: other malignant neoplasms
30: other malignant neoplasms
31: other malignant neoplasms
32: other malignant neoplasms, type 2 diabetes mellitus
33: other malignant neoplasms
34: other malignant neoplasms
35: other malignant neoplasms
36: other malignant neoplasms
37: other malignant neoplasms, other malignant neoplasms
38: other malignant neoplasms, other malignant neoplasms
39: other malignant neoplasms
40: other malignant neoplasms
41: other malignant neoplasms, peripheral neuropathy
42: other malignant neoplasms, hearing loss, ototoxicity
43: other malignant neoplasms, hearing loss
44: other malignant neoplasms
45: other malignant neoplasms
46: other malignant neoplasms
47: other malignant neoplasms
48: other malignant neoplasms
49: other malignant neoplasms
50: other malignant neoplasms, tracheal bronchus and lung cancer
51: other malignant neoplasms, immune system toxicity
52: other malignant neoplasms, hypothyroidism
53: other malignant neoplasms
54: other malignant neoplasms, radiation-induced disorder
55: other malignant neoplasms, radiation-induced disorder
56: other malignant neoplasms, neuropathic pain
57: other malignant neoplasms, osteonecrosis
58: other malignant neoplasms, mucositis
59: fibrosis, other malignant neoplasms
60: dysphagia, other malignant neoplasms, mucositis
61: dysphagia, fibrosis, other malignant neoplasms, xerostomia
62: dysphagia, fibrosis, other malignant neoplasms, mucositis, xerostomia
l2_all_disease_terms
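Rows 37 and 38 above end up with the label "other malignant neoplasms" twice, because two different source terms were mapped to the same category. A minimal sketch of how repeated labels could be collapsed, assuming terms are always separated by ", " (`dedupe_terms` is a hypothetical helper, not from the original analysis):

```r
# Hypothetical helper: drop repeated terms within each comma-separated string,
# keeping the first occurrence and the original order.
dedupe_terms <- function(x) {
  vapply(strsplit(x, ", ", fixed = TRUE),
         function(terms) paste(unique(terms), collapse = ", "),
         character(1))
}

toy <- c("other malignant neoplasms, other malignant neoplasms",
         "breast cancer, other malignant neoplasms")
deduped <- dedupe_terms(toy)
print(deduped)
```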
gwas_study_info =
gwas_study_info |>
mutate(l2_all_disease_terms =
case_when(
l2_all_disease_terms == "benign neoplasm" ~ "other neoplasms",
TRUE ~ l2_all_disease_terms
)
)
unknown_sig_terms <- c("intracranial germ cell tumor",
"bladder tumor")
gwas_study_info =
gwas_study_info |>
mutate(l2_all_disease_terms =
stringr::str_replace_all(l2_all_disease_terms ,
pattern = paste0(unknown_sig_terms, collapse = "(?=,|$)|\\b"),
"other neoplasms"
)
)
gwas_study_info |>
filter(grepl("other neoplasms", l2_all_disease_terms)) |>
select(all_disease_terms, l2_all_disease_terms) |>
distinct() |>
head()
all_disease_terms l2_all_disease_terms
<char> <char>
1: benign prostatic hyperplasia other neoplasms
2: colorectal adenoma other neoplasms
3: colorectal cancer, endometrial neoplasm other neoplasms
4: upper aerodigestive tract neoplasm other neoplasms
5: meningioma other neoplasms
6: pituitary gland adenoma other neoplasms
gwas_study_info |>
filter(grepl("rheumatic heart disease", l2_all_disease_terms)) |>
select(all_disease_terms, l2_all_disease_terms) |>
distinct()
all_disease_terms l2_all_disease_terms
<char> <char>
1: rheumatic heart disease rheumatic heart disease
gwas_study_info |>
filter(grepl("ischemic heart disease", l2_all_disease_terms)) |>
select(all_disease_terms, l2_all_disease_terms) |>
distinct()
Empty data.table (0 rows and 2 cols): all_disease_terms,l2_all_disease_terms
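The empty result above shows that a target category can have no matching studies at all. Rather than checking each category with its own filter, the counts could be tallied in one pass. The sketch below does this on an invented toy data frame with base R only; the column name follows the analysis, everything else is an assumption made for illustration.

```r
# Toy stand-in for gwas_study_info (values invented for illustration).
toy_info <- data.frame(
  l2_all_disease_terms = c("stroke",
                           "coronary artery disease, stroke",
                           "rheumatic heart disease",
                           "hypertension, stroke"),
  stringsAsFactors = FALSE
)

# Category labels to check, taken from the filters used in this section.
targets <- c("rheumatic heart disease", "ischemic heart disease", "stroke")

# Split each string into its terms, then count rows containing each target
# as an exact term (not as a substring).
term_lists <- strsplit(toy_info$l2_all_disease_terms, ", ", fixed = TRUE)
n_matches <- vapply(
  targets,
  function(target) {
    sum(vapply(term_lists, function(terms) target %in% terms, logical(1)))
  },
  integer(1)
)
print(n_matches)

# Categories that currently have no studies assigned.
unmatched <- names(n_matches)[n_matches == 0]
print(unmatched)
```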
gwas_study_info |>
filter(grepl("stroke", l2_all_disease_terms)) |>
select(all_disease_terms, l2_all_disease_terms) |>
distinct()
all_disease_terms
<char>
1: intracerebral hemorrhage
2: non-lobar intracerebral hemorrhage
3: lobar intracerebral hemorrhage
4: small vessel stroke
5: alzheimer disease, small vessel stroke
6: ischemic stroke
7: ischemic stroke, cardiac embolism
8: ischemic stroke, large artery stroke
9: ischemic stroke, small artery occlusion
10: ischemic stroke, venous thromboembolism, stroke, deep vein thrombosis, pulmonary embolism, abnormal thrombosis
11: stroke
12: large artery stroke
13: stroke, coronary artery disease
14: large artery stroke, coronary artery disease
15: cardioembolic stroke
16: heart failure, diabetes mellitus, stroke, atrial fibrillation, coronary artery disease, cancer
17: stroke, atrial fibrillation, coronary artery disease, heart failure, diabetes mellitus, cancer
18: ischemic stroke, stroke outcome severity measurement
19: hemorrhagic stroke
20: ischemic stroke, type 2 diabetes mellitus
21: stroke, gallstones
22: intracranial hemorrhage
23: subdural hemorrhage
24: hypertension, stroke
25: ischemic stroke, parenchymal hematoma
26: age of onset of stroke disorder
27: stroke, occlusion precerebral artery
28: stroke, major depressive disorder
29: post-operative stroke
30: small vessel stroke, stroke outcome severity measurement
31: stroke, chronic obstructive pulmonary disease
32: barretts esophagus, cardioembolic stroke, ankylosing spondylitis, psoriasis, pneumococcal bacteremia, parkinson disease, multiple sclerosis, bacteriemia, small vessel stroke, schizophrenia, psychosis, large artery stroke, visceral leishmaniasis, ulcerative colitis, glaucoma
33: hypertension, ischemic stroke, coronary artery disease
34: hypertension, ischemic stroke
35: diabetes mellitus, ischemic stroke, coronary artery disease
36: diabetes mellitus, ischemic stroke
37: rare dyslipidemia, ischemic stroke, coronary artery disease
38: rare dyslipidemia, ischemic stroke
39: stroke, type 2 diabetes mellitus, coronary artery disease
40: ischemic stroke, covid-19
41: stroke outcome severity measurement
42: type 1 diabetes mellitus, stroke
43: stroke, dementia
all_disease_terms
l2_all_disease_terms
<char>
1: stroke
2: stroke
3: stroke
4: stroke
5: alzheimers disease, stroke
6: stroke
7: cardiac embolism, stroke
8: stroke
9: small artery occlusion, stroke
10: abnormal thrombosis, deep vein thrombosis, pulmonary embolism, stroke, venous thromboembolism
11: stroke
12: stroke
13: coronary artery disease, stroke
14: coronary artery disease, stroke
15: stroke
16: atrial fibrillation, other malignant neoplasms, coronary artery disease, diabetes mellitus, heart failure, stroke
17: atrial fibrillation, other malignant neoplasms, coronary artery disease, diabetes mellitus, heart failure, stroke
18: stroke
19: stroke
20: stroke, type 2 diabetes mellitus
21: gallstones, stroke
22: stroke
23: stroke
24: hypertension, stroke
25: parenchymal hematoma, stroke
26: stroke
27: occlusion precerebral artery, stroke
28: major depressive disorder, stroke
29: post-operative, stroke
30: stroke
31: chronic obstructive pulmonary disease, stroke
32: ankylosing spondylitis, bacteriemia, barretts esophagus, glaucoma, multiple sclerosis, parkinsons disease, pneumococcal bacteremia, psoriasis, psychosis, schizophrenia, stroke, ulcerative colitis, visceral leishmaniasis
33: coronary artery disease, hypertension, stroke
34: hypertension, stroke
35: coronary artery disease, diabetes mellitus, stroke
36: diabetes mellitus, stroke
37: coronary artery disease, rare dyslipidemia, stroke
38: rare dyslipidemia, stroke
39: coronary artery disease, stroke, type 2 diabetes mellitus
40: covid-19, stroke
41: stroke
42: stroke, type 1 diabetes mellitus
43: dementia, stroke
l2_all_disease_terms
gwas_study_info |>
filter(grepl("hypertensive heart disease", l2_all_disease_terms)) |>
select(all_disease_terms, l2_all_disease_terms) |>
distinct()
all_disease_terms
<char>
1: hypertensive heart disease, kidney disease
2: hypertensive heart disease
l2_all_disease_terms
<char>
1: hypertensive heart disease, kidney disease
2: hypertensive heart disease
gwas_study_info |>
filter(grepl("non-rheumatic heart disease", l2_all_disease_terms)) |>
select(all_disease_terms, l2_all_disease_terms) |>
distinct()
Empty data.table (0 rows and 2 cols): all_disease_terms,l2_all_disease_terms
gwas_study_info =
gwas_study_info |>
mutate(l2_all_disease_terms =
stringr::str_replace_all(l2_all_disease_terms ,
pattern = "\\bcardiomyopathy\\b|\\bmyocarditis\\b",
"cardiomyopathy and myocarditis"
)
)
gwas_study_info |>
filter(grepl("cardiomyopathy and myocarditis", l2_all_disease_terms)) |>
select(all_disease_terms, l2_all_disease_terms) |>
distinct()
all_disease_terms
<char>
1: idiopathic dilated cardiomyopathy
2: childhood cancer, cardiomyopathy
3: hypertrophic cardiomyopathy
4: dilated cardiomyopathy
5: chagas cardiomyopathy
6: peripartum cardiomyopathy
7: takotsubo cardiomyopathy
8: nonischemic cardiomyopathy
9: schizophrenia, myocarditis
10: cardiomyopathy
11: myocarditis
12: chagas disease, chagas cardiomyopathy
13: rheumatoid arthritis, ischemic cardiomyopathy
l2_all_disease_terms
<char>
1: idiopathic cardiomyopathy and myocarditis
2: cardiomyopathy and myocarditis, other malignant neoplasms
3: cardiomyopathy and myocarditis
4: cardiomyopathy and myocarditis
5: cardiomyopathy and myocarditis
6: cardiomyopathy and myocarditis
7: cardiomyopathy and myocarditis
8: cardiomyopathy and myocarditis
9: cardiomyopathy and myocarditis, schizophrenia
10: cardiomyopathy and myocarditis
11: cardiomyopathy and myocarditis
12: cardiomyopathy and myocarditis, chagas disease
13: cardiomyopathy and myocarditis, rheumatoid arthritis
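Row 1 above reads "idiopathic cardiomyopathy and myocarditis": only the word "cardiomyopathy" was replaced, so the qualifier in front of it survived. One possible way to avoid this is to replace the whole comma-separated term whenever it contains a match. The sketch below assumes terms are separated by ", " and that the value before recoding was "idiopathic cardiomyopathy" (inferred from the output, not shown in the source). `recode_whole_terms` is a hypothetical helper.

```r
# Hypothetical helper: replace every comma-separated term that contains a
# match for term_regex with the category label.
recode_whole_terms <- function(x, term_regex, label) {
  vapply(strsplit(x, ", ", fixed = TRUE), function(terms) {
    terms[grepl(term_regex, terms, perl = TRUE)] <- label
    paste(terms, collapse = ", ")
  }, character(1))
}

toy <- c("idiopathic cardiomyopathy",
         "cardiomyopathy, schizophrenia",
         "stroke")
recoded <- recode_whole_terms(
  toy,
  term_regex = "\\b(?:cardiomyopathy|myocarditis)\\b",
  label = "cardiomyopathy and myocarditis"
)
print(recoded)
```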
afib_terms <- c("atrial fibrillation",
"atrial flutter",
"post-operative atrial fibrillation")
gwas_study_info =
gwas_study_info |>
mutate(l2_all_disease_terms =
stringr::str_replace_all(l2_all_disease_terms ,
pattern = paste0(afib_terms, collapse = "(?=,|$)|\\b"),
"atrial fibrillation and flutter"
)
)
url <- "http://www.ebi.ac.uk/ols4/api/ontologies/cvdo/terms/http%253A%252F%252Fpurl.obolibrary.org%252Fobo%252FDOID_3627/descendants"
aortic_aneurysm_terms <- get_descendants(url)
[1] "Number of terms collected:"
[1] 6
[1] "\n Some example terms"
[1] "ruptured thoracoabdominal aortic aneurysm"
[2] "ruptured abdominal aortic aneurysm"
[3] "ruptured thoracic aortic aneurysm"
[4] "abdominal aortic aneurysm"
[5] "ruptured aortic aneurysm"
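The URL passed to get_descendants() contains the term IRI percent-encoded twice (for example `%253A` for a colon). As a sketch, such a URL can be built from the plain IRI with base R's utils::URLencode(). `ols_descendants_url` is a hypothetical helper introduced here for illustration; the base URL and IRI are the ones that appear in the chunk above.

```r
# Hypothetical helper: build the descendants URL for one ontology term.
ols_descendants_url <- function(ontology, iri,
                                base = "http://www.ebi.ac.uk/ols4/api/ontologies") {
  # First pass encodes reserved characters such as ":" and "/".
  once <- utils::URLencode(iri, reserved = TRUE)
  # repeated = TRUE forces a second pass on the already-encoded string,
  # turning each "%" into "%25".
  twice <- utils::URLencode(once, reserved = TRUE, repeated = TRUE)
  paste0(base, "/", ontology, "/terms/", twice, "/descendants")
}

url_check <- ols_descendants_url("cvdo", "http://purl.obolibrary.org/obo/DOID_3627")
print(url_check)
```

Without `repeated = TRUE`, URLencode() returns an already-encoded string unchanged, so the second pass would silently do nothing.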
gwas_study_info = gwas_study_info |>
mutate(l2_all_disease_terms =
stringr::str_replace_all(l2_all_disease_terms ,
pattern = paste0(aortic_aneurysm_terms, collapse = "(?=,|$)|\\b"),
"aortic aneurysm"
)
)
gwas_study_info |>
filter(grepl("endocarditis", l2_all_disease_terms)) |>
select(all_disease_terms, l2_all_disease_terms) |>
distinct()
all_disease_terms
<char>
1: staphylococcus aureus infection, bacterial endocarditis
2: endocarditis
l2_all_disease_terms
<char>
1: endocarditis, staphylococcus aureus infection
2: endocarditis
gwas_study_info = gwas_study_info |>
mutate(l2_all_disease_terms =
stringr::str_replace_all(l2_all_disease_terms ,
# skip "acne" that is already part of "acne vulgaris"
pattern = "\\bacne\\b(?! vulgaris)",
"acne vulgaris"
)
)
gwas_study_info = gwas_study_info |>
mutate(l2_all_disease_terms =
stringr::str_replace_all(l2_all_disease_terms ,
pattern = "adhd",
"attention-deficit/hyperactivity disorder"
)
)
gwas_study_info =
gwas_study_info |>
mutate(l2_all_disease_terms =
stringr::str_replace_all(l2_all_disease_terms ,
pattern = "alcohol-related disorders|alcohol and nicotine codependence",
"alcohol use disorders"
)
)
gwas_study_info =
gwas_study_info |>
mutate(l2_all_disease_terms =
stringr::str_replace_all(l2_all_disease_terms ,
# \\b stops a match inside the existing label "alcohol use disorders"
pattern = "\\balcohol use disorder\\b",
"alcohol use disorders"
)
)
dementia <- c("alzheimers disease biomarker measurement",
"alzheimers disease neuropathologic change",
"aids dementia",
"dementia",
"frontotemporal dementia",
"lewy body dementia",
"vascular dementia",
"alzheimers disease"
)
gwas_study_info =
gwas_study_info |>
mutate(l2_all_disease_terms =
stringr::str_replace_all(l2_all_disease_terms ,
pattern = paste0(dementia, collapse = "(?=,|$)|\\b"),
"alzheimer's disease and other dementias"
)
)
url <- "http://www.ebi.ac.uk/ols4/api/ontologies/mesh/terms/http%253A%252F%252Fid.nlm.nih.gov%252Fmesh%252FD001008/descendants"
anxiety_terms <- get_descendants(url)
[1] "Number of terms collected:"
[1] 15
[1] "\n Some example terms"
[1] "obsessive-compulsive disorder" "generalized anxiety disorder"
[3] "neurocirculatory asthenia" "excoriation disorder"
[5] "anxiety, separation"
anxiety_terms <- c(anxiety_terms, "obsessive-compulsive symptom measurement")
gwas_study_info =
gwas_study_info |>
mutate(l2_all_disease_terms =
stringr::str_replace_all(l2_all_disease_terms ,
pattern = paste0("\\b", anxiety_terms, "(?=,|$)", collapse = "|"),
"anxiety disorders"
)
) |>
mutate(l2_all_disease_terms =
stringr::str_replace_all(l2_all_disease_terms ,
pattern = "anxiety disorder(?!s)|anxiety measurement",
"anxiety disorders"
)
) |>
mutate(l2_all_disease_terms =
stringr::str_replace_all(l2_all_disease_terms ,
pattern = "anxiety(?! disorders)",
"anxiety disorders"
)
)
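When a replacement string contains the pattern it replaces, as with pluralizing `anxiety disorder`, chaining or rerunning the step can append a second suffix unless the pattern excludes text that is already normalized. A minimal sketch of a single-pass alternative follows; the regex and the example strings are illustrative assumptions, not the pattern used in the original analysis.

```r
# Invented example values covering the variants handled above
x <- c("anxiety disorder", "anxiety disorders", "anxiety measurement", "anxiety")

# One pass: "anxiety", optionally followed by " disorder(s)" or " measurement",
# and only when it ends a comma-separated entry
pattern <- "\\banxiety( disorders?| measurement)?(?=,|$)"

out <- stringr::str_replace_all(x, pattern, "anxiety disorders")

# Applying the same replacement again changes nothing
out_again <- stringr::str_replace_all(out, pattern, "anxiety disorders")
out
```

Because the optional group consumes any existing suffix, the step is idempotent and needs no follow-up cleanup.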
gwas_study_info =
gwas_study_info |>
mutate(l2_all_disease_terms =
stringr::str_replace_all(l2_all_disease_terms ,
pattern = "methamphetamine",
"amphetamine"
)
)
gwas_study_info = gwas_study_info |>
mutate(l2_all_disease_terms =
stringr::str_replace_all(l2_all_disease_terms,
pattern = "autism( spectrum disorders?)?",
"autism spectrum disorders"
)
)
vision_loss_terms <- c("blindness",
"color vision disorder",
"vision disorder",
"myopia",
"refractive error",
"hyperopia",
"astigmatism",
"corneal astigmatism",
"presbyopia",
"anisometropia",
"esotropia",
"non-accomodative esotropia",
"accommodative esotropia")
gwas_study_info = gwas_study_info |>
mutate(l2_all_disease_terms =
stringr::str_replace_all(l2_all_disease_terms ,
pattern = paste0("\\b", vision_loss_terms, "(?=,|$)", collapse = "|"),
"blindness and vision loss"
)
)
gwas_study_info =
gwas_study_info |>
mutate(l2_all_disease_terms =
stringr::str_replace_all(l2_all_disease_terms ,
pattern = "cannabis dependence",
"cannabis use disorders"
)
)
url <- "http://www.ebi.ac.uk/ols4/api/ontologies/snomed/terms/http%253A%252F%252Fsnomed.info%252Fid%252F328383001/descendants"
chronic_liver_disease_terms <- get_descendants(url)
[1] "Number of terms collected:"
[1] 114
[1] "\n Some example terms"
[1] "hepatic ascites co-occurrent with chronic active hepatitis due to toxic liver disease"
[2] "cirrhosis of liver co-occurrent and due to primary sclerosing cholangitis (disorder)"
[3] "chronic hepatitis c co-occurrent with human immunodeficiency virus infection"
[4] "primary biliary cirrhosis co-occurrent with systemic scleroderma (disorder)"
[5] "pulmonary fibrosis, hepatic hyperplasia, bone marrow hypoplasia syndrome"
chronic_liver_disease_terms <- c("primary biliary cirrhosis",
"alcoholic liver cirrhosis",
"chronic hepatitis b virus infection",
"acute-on-chronic liver failure",
chronic_liver_disease_terms)
gwas_study_info = gwas_study_info |>
mutate(l2_all_disease_terms =
stringr::str_replace_all(l2_all_disease_terms,
pattern = paste0("\\b", chronic_liver_disease_terms, "(?=,|$)", collapse = "|"),
"cirrhosis and other chronic liver diseases"
)
)
gwas_study_info = gwas_study_info |>
mutate(l2_all_disease_terms =
stringr::str_replace_all(l2_all_disease_terms,
pattern = "liver disease",
"cirrhosis and other chronic liver diseases"
)
)
gwas_study_info = gwas_study_info |>
mutate(l2_all_disease_terms =
stringr::str_replace_all(l2_all_disease_terms ,
pattern = "cocaine-related disorders",
"cocaine use disorders"
)
)
gwas_study_info = gwas_study_info |>
mutate(l2_all_disease_terms =
stringr::str_replace_all(l2_all_disease_terms ,
pattern = "depressive symptom measurement|major depressive disorder",
"depressive disorders"
)
) |>
mutate(l2_all_disease_terms =
stringr::str_replace_all(l2_all_disease_terms ,
pattern = "depressive disorder",
"depressive disorders"
)
)
gal_bile_terms = c("gallbladder disease",
"bile duct disorder",
"biliary tract disease")
gwas_study_info = gwas_study_info |>
mutate(l2_all_disease_terms =
stringr::str_replace_all(l2_all_disease_terms,
pattern = paste0("\\b", gal_bile_terms, "(?=,|$)", collapse = "|"),
"gallbladder and biliary diseases"
)
)
diseases <- stringr::str_split(pattern = ", ",
gwas_study_info$l2_all_disease_terms) |>
unlist() |>
stringr::str_trim()
pregnancy_terms <- unique(grep("pregnancy", diseases, value = TRUE))
gyno_terms <- c("endometriosis","placenta disease", pregnancy_terms)
gwas_study_info = gwas_study_info |>
mutate(l2_all_disease_terms =
stringr::str_replace_all(l2_all_disease_terms,
pattern = paste0("\\b", gyno_terms, "(?=,|$)", collapse = "|"),
"gynecological diseases"
)
)
gwas_study_info = gwas_study_info |>
mutate(l2_all_disease_terms =
stringr::str_replace_all(l2_all_disease_terms ,
pattern = "bulimia nervosa|anorexia nervosa|binge eating|eating disorder",
"eating disorders"
)
) |>
mutate(l2_all_disease_terms =
stringr::str_replace_all(l2_all_disease_terms ,
pattern = "anorexia",
"eating disorders"
)
)
gwas_study_info = gwas_study_info |>
mutate(l2_all_disease_terms =
stringr::str_replace_all(l2_all_disease_terms,
pattern = "headache disorder|cluster headache|migraine",
"headache disorders"
)
)
Ischemic heart disease == coronary artery disease (https://www.ncbi.nlm.nih.gov/books/NBK209964/)
gwas_study_info = gwas_study_info |>
mutate(l2_all_disease_terms =
stringr::str_replace_all(l2_all_disease_terms,
pattern = "coronary artery disease",
"ischemic heart disease"
)
)
gwas_study_info =
gwas_study_info |>
mutate(l2_all_disease_terms =
stringr::str_replace_all(l2_all_disease_terms ,
pattern = "opioid dependence|opioid use disorder",
"opioid use disorders"
)
)
gwas_study_info =
gwas_study_info |>
mutate(l2_all_disease_terms =
stringr::str_replace_all(l2_all_disease_terms ,
pattern = "parkinsons disease",
"parkinson's disease"
)
)
url <- "http://www.ebi.ac.uk/ols4/api/ontologies/mondo/terms/http%253A%252F%252Fpurl.obolibrary.org%252Fobo%252FMONDO_0002028/descendants"
personality_disorders <- get_descendants(url)
[1] "Number of terms collected:"
[1] 10
[1] "\n Some example terms"
[1] "obsessive-compulsive personality disorder"
[2] "narcissistic personality disorder"
[3] "schizotypal personality disorder"
[4] "histrionic personality disorder"
[5] "antisocial personality disorder"
gwas_study_info = gwas_study_info |>
mutate(l2_all_disease_terms =
stringr::str_replace_all(l2_all_disease_terms,
pattern = paste0("\\b", personality_disorders, "(?=,|$)", collapse = "|"),
"personality disorders"
)
)
gwas_study_info =
gwas_study_info |>
mutate(l2_all_disease_terms =
stringr::str_replace_all(l2_all_disease_terms ,
pattern = "heroin dependence|drug dependence|nicotine dependence|substance abuse|drug misuse|alcohol use disorders delirium",
"other drug use disorders"
)
)
url <- "http://www.ebi.ac.uk/ols4/api/ontologies/doid/terms/http%253A%252F%252Fpurl.obolibrary.org%252Fobo%252FDOID_535/descendants"
sleep_disorders <- get_descendants(url)
[1] "Number of terms collected:"
[1] 16
[1] "\n Some example terms"
[1] "periodic limb movement disorder" "advanced sleep phase syndrome 3"
[3] "advanced sleep phase syndrome 2" "advanced sleep phase syndrome 1"
[5] "advanced sleep phase syndrome 4"
url <- "http://www.ebi.ac.uk/ols4/api/ontologies/efo/terms/http%253A%252F%252Fwww.ebi.ac.uk%252Fefo%252FEFO_0008568/descendants"
other_sleep_disorders <- get_descendants(url)
[1] "Number of terms collected:"
[1] 26
[1] "\n Some example terms"
[1] "autosomal dominant cerebellar ataxia, deafness and narcolepsy"
[2] "hereditary sensory neuropathy-deafness-dementia syndrome"
[3] "rapid eye movement sleep disorder"
[4] "substance-induced sleep disorder"
[5] "drug induced central sleep apnea"
sleep_disorders <- c(sleep_disorders,
other_sleep_disorders)
gwas_study_info = gwas_study_info |>
mutate(l2_all_disease_terms =
stringr::str_replace_all(l2_all_disease_terms,
pattern = paste0("\\b", sleep_disorders, "(?=,|$)", collapse = "|"),
"sleep disorders"
)
)
and remove Alzheimer's disease, Parkinson's disease, and dementia
other_mental_disorders <- c("schizophrenia",
"manic or hypomanic episode",
"mental or behavioural disorder",
"mental disorder"
)
other_neuro <- c("mild neurocognitive disorder",
"hiv-associated neurocognitive disorder")
disturbances of sensation of smell and taste
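# Collapse the doubled anxiety label first; once the generic fix below has run,
# "anxiety disorders disorderss" no longer exists and this pattern cannot match
gwas_study_info =
gwas_study_info |>
mutate(l2_all_disease_terms =
stringr::str_replace_all(l2_all_disease_terms ,
pattern = "anxiety disorders disorderss",
"anxiety disorders"
)
)
# Then repair any remaining doubled plural suffix
gwas_study_info =
gwas_study_info |>
mutate(l2_all_disease_terms =
stringr::str_replace_all(l2_all_disease_terms ,
pattern = "disorderss",
"disorders"
)
)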
gbd_data <- data.table::fread(here::here("data/gbd/ihme_gbd_2019_global_disease_burden_rate_all_ages.csv"))
gbd_data$cause <- stringr::str_remove_all(gbd_data$cause, ",")
diseases <- stringr::str_split(pattern = ", ",
gwas_study_info$l2_all_disease_terms[gwas_study_info$l2_all_disease_terms != ""]) |>
unlist() |>
stringr::str_trim()
gbd_data$cause[!tolower(gbd_data$cause) %in% unique(diseases)] |> sort()
[1] "Acute glomerulonephritis"
[2] "Acute glomerulonephritis"
[3] "Acute glomerulonephritis"
[4] "Bacterial skin diseases"
[5] "Bacterial skin diseases"
[6] "Bacterial skin diseases"
[7] "Chronic kidney disease"
[8] "Chronic kidney disease"
[9] "Chronic kidney disease"
[10] "Congenital birth defects"
[11] "Congenital birth defects"
[12] "Congenital birth defects"
[13] "Diabetes mellitus type 1"
[14] "Diabetes mellitus type 1"
[15] "Diabetes mellitus type 1"
[16] "Diabetes mellitus type 2"
[17] "Diabetes mellitus type 2"
[18] "Diabetes mellitus type 2"
[19] "Drug use disorders"
[20] "Drug use disorders"
[21] "Drug use disorders"
[22] "Endocrine metabolic blood and immune disorders"
[23] "Endocrine metabolic blood and immune disorders"
[24] "Endocrine metabolic blood and immune disorders"
[25] "Fungal skin diseases"
[26] "Fungal skin diseases"
[27] "Fungal skin diseases"
[28] "Hemoglobinopathies and hemolytic anemias"
[29] "Hemoglobinopathies and hemolytic anemias"
[30] "Hemoglobinopathies and hemolytic anemias"
[31] "Idiopathic developmental intellectual disability"
[32] "Idiopathic developmental intellectual disability"
[33] "Idiopathic epilepsy"
[34] "Idiopathic epilepsy"
[35] "Idiopathic epilepsy"
[36] "Inguinal femoral and abdominal hernia"
[37] "Inguinal femoral and abdominal hernia"
[38] "Inguinal femoral and abdominal hernia"
[39] "Interstitial lung disease and pulmonary sarcoidosis"
[40] "Interstitial lung disease and pulmonary sarcoidosis"
[41] "Interstitial lung disease and pulmonary sarcoidosis"
[42] "Lower extremity peripheral arterial disease"
[43] "Lower extremity peripheral arterial disease"
[44] "Lower extremity peripheral arterial disease"
[45] "Neuroblastoma and other peripheral nervous cell tumors"
[46] "Neuroblastoma and other peripheral nervous cell tumors"
[47] "Neuroblastoma and other peripheral nervous cell tumors"
[48] "Non-rheumatic valvular heart disease"
[49] "Non-rheumatic valvular heart disease"
[50] "Non-rheumatic valvular heart disease"
[51] "Oral disorders"
[52] "Oral disorders"
[53] "Oral disorders"
[54] "Other cardiovascular and circulatory diseases"
[55] "Other cardiovascular and circulatory diseases"
[56] "Other chronic respiratory diseases"
[57] "Other digestive diseases"
[58] "Other mental disorders"
[59] "Other mental disorders"
[60] "Other mental disorders"
[61] "Other musculoskeletal disorders"
[62] "Other musculoskeletal disorders"
[63] "Other neurological disorders"
[64] "Other neurological disorders"
[65] "Other neurological disorders"
[66] "Other sense organ diseases"
[67] "Other sense organ diseases"
[68] "Other sense organ diseases"
[69] "Other skin and subcutaneous diseases"
[70] "Other skin and subcutaneous diseases"
[71] "Other skin and subcutaneous diseases"
[72] "Paralytic ileus and intestinal obstruction"
[73] "Paralytic ileus and intestinal obstruction"
[74] "Paralytic ileus and intestinal obstruction"
[75] "Pulmonary Arterial Hypertension"
[76] "Pulmonary Arterial Hypertension"
[77] "Pulmonary Arterial Hypertension"
[78] "Scabies"
[79] "Scabies"
[80] "Scabies"
[81] "Sudden infant death syndrome"
[82] "Total burden related to Non-alcoholic fatty liver disease (NAFLD)"
[83] "Total burden related to Non-alcoholic fatty liver disease (NAFLD)"
[84] "Total burden related to Non-alcoholic fatty liver disease (NAFLD)"
[85] "Upper digestive system diseases"
[86] "Upper digestive system diseases"
[87] "Upper digestive system diseases"
[88] "Urinary diseases and male infertility"
[89] "Urinary diseases and male infertility"
[90] "Urinary diseases and male infertility"
[91] "Vascular intestinal disorders"
[92] "Vascular intestinal disorders"
[93] "Vascular intestinal disorders"
[94] "Viral skin diseases"
[95] "Viral skin diseases"
[96] "Viral skin diseases"
gbd_data =
gbd_data |>
mutate(cause = tolower(cause))
gwas_disease_traits = data.frame(cause = diseases)
# gwas_study_info |>
# filter(DISEASE_STUDY == T) |>
# select(all_disease_terms, l1_all_disease_terms, cause = l2_all_disease_terms) |>
# distinct()
left_join(gwas_disease_traits,
gbd_data) |>
head()
Joining with `by = join_by(cause)`
Warning in left_join(gwas_disease_traits, gbd_data): Detected an unexpected many-to-many relationship between `x` and `y`.
ℹ Row 3 of `x` matches multiple rows in `y`.
ℹ Row 19 of `y` matches multiple rows in `x`.
ℹ If a many-to-many relationship is expected, set `relationship =
"many-to-many"` to silence this warning.
cause
1 idiopathic cardiomyopathy and myocarditis
2 cleft lip
3 tracheal bronchus and lung cancer
4 tracheal bronchus and lung cancer
5 tracheal bronchus and lung cancer
6 tracheal bronchus and lung cancer
measure location sex age metric year
1 <NA> <NA> <NA> <NA> <NA> NA
2 <NA> <NA> <NA> <NA> <NA> NA
3 DALYs (Disability-Adjusted Life Years) Global Both All ages Rate 2019
4 Prevalence Global Both All ages Rate 2019
5 Incidence Global Both All ages Rate 2019
6 DALYs (Disability-Adjusted Life Years) Global Both All ages Rate 2019
val upper lower
1 NA NA NA
2 NA NA NA
3 580.36100 627.79984 532.74652
4 40.27440 43.51721 37.12978
5 28.16826 30.49575 25.77712
6 580.36100 627.79984 532.74652
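The many-to-many warning above most likely arises because `diseases` repeats a term once per study while `gbd_data` holds several measures per cause. A minimal sketch of one way to get a single row per cause: deduplicate the GWAS terms and keep one measure before joining. The toy tables and values below are invented for illustration, and choosing `"Prevalence"` is an assumption rather than the analysis's choice.

```r
# Toy stand-ins for the real tables (values are made up)
gwas_terms <- data.frame(cause = c("asthma", "asthma", "cleft lip"))
gbd_toy <- data.frame(
  cause   = c("asthma", "asthma"),
  measure = c("Prevalence", "Incidence"),
  val     = c(1, 2)
)

joined <- gwas_terms |>
  dplyr::distinct(cause) |>                      # one row per GWAS term
  dplyr::left_join(
    dplyr::filter(gbd_toy, measure == "Prevalence"),  # one row per GBD cause
    by = "cause"
  )
joined
```

Stating `by = "cause"` explicitly also removes the `Joining with` message, and terms without a GBD match (here `cleft lip`) stay in the result with `NA` values.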
gwas_study_info |> select(cause = l2_all_disease_terms) |>
distinct() |>
left_join(gbd_data) |>
head()
Joining with `by = join_by(cause)`
cause
<char>
1:
2: idiopathic cardiomyopathy and myocarditis
3: cleft lip
4: tracheal bronchus and lung cancer
5: tracheal bronchus and lung cancer
6: tracheal bronchus and lung cancer
measure location sex age metric year
<char> <char> <char> <char> <char> <int>
1: <NA> <NA> <NA> <NA> <NA> NA
2: <NA> <NA> <NA> <NA> <NA> NA
3: <NA> <NA> <NA> <NA> <NA> NA
4: DALYs (Disability-Adjusted Life Years) Global Both All ages Rate 2019
5: Prevalence Global Both All ages Rate 2019
6: Incidence Global Both All ages Rate 2019
val upper lower
<num> <num> <num>
1: NA NA NA
2: NA NA NA
3: NA NA NA
4: 580.36100 627.79984 532.74652
5: 40.27440 43.51721 37.12978
6: 28.16826 30.49575 25.77712
diseases <- stringr::str_split(pattern = ", ",
gwas_study_info$l2_all_disease_terms[gwas_study_info$l2_all_disease_terms != ""]) |>
unlist() |>
stringr::str_trim()
length(unique(diseases))
[1] 1596
# make frequency table
freq <- table(as.factor(diseases))
# sort in decreasing order
freq_sorted <- sort(freq, decreasing = TRUE)
# show top N, e.g. top 10
head(freq_sorted, 10)
kidney disease hypertension
10915 7096
type 2 diabetes mellitus other neoplasms
922 537
depressive disorders alzheimer's disease and other dementias
513 509
ischemic heart disease breast cancer
501 379
schizophrenia asthma
368 348
data.table::fwrite(gwas_study_info,
here::here("output/gwas_cat/gwas_study_info_trait_group_l2.csv"))
sessionInfo()
R version 4.3.1 (2023-06-16)
Platform: aarch64-apple-darwin20 (64-bit)
Running under: macOS 15.6.1
Matrix products: default
BLAS: /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib/libRlapack.dylib; LAPACK version 3.11.0
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
time zone: America/Los_Angeles
tzcode source: internal
attached base packages:
[1] stats graphics grDevices datasets utils methods base
other attached packages:
[1] jsonlite_2.0.0 httr_1.4.7 stringr_1.5.1 ggplot2_3.5.2
[5] data.table_1.17.8 dplyr_1.1.4 workflowr_1.7.1
loaded via a namespace (and not attached):
[1] gtable_0.3.6 compiler_4.3.1 renv_1.0.3 promises_1.3.3
[5] tidyselect_1.2.1 Rcpp_1.1.0 git2r_0.36.2 callr_3.7.6
[9] later_1.4.2 jquerylib_0.1.4 scales_1.4.0 yaml_2.3.10
[13] fastmap_1.2.0 here_1.0.1 R6_2.6.1 generics_0.1.4
[17] curl_6.4.0 knitr_1.50 tibble_3.3.0 rprojroot_2.1.0
[21] RColorBrewer_1.1-3 bslib_0.9.0 pillar_1.11.0 rlang_1.1.6
[25] cachem_1.1.0 stringi_1.8.7 httpuv_1.6.16 xfun_0.52
[29] getPass_0.2-4 fs_1.6.6 sass_0.4.10 cli_3.6.5
[33] withr_3.0.2 magrittr_2.0.3 ps_1.9.1 grid_4.3.1
[37] digest_0.6.37 processx_3.8.6 rstudioapi_0.17.1 lifecycle_1.0.4
[41] vctrs_0.6.5 evaluate_1.0.4 glue_1.8.0 farver_2.1.2
[45] whisker_0.4.1 rmarkdown_2.29 tools_4.3.1 pkgconfig_2.0.3
[49] htmltools_0.5.8.1