Last updated: 2025-09-16
Checks: 7 passed, 0 failed
Knit directory: genomics_ancest_disease_dispar/
This reproducible R Markdown analysis was created with workflowr (version 1.7.1). The Checks tab describes the reproducibility checks that were applied when the results were created. The Past versions tab lists the development history.
Great! Since the R Markdown file has been committed to the Git repository, you know the exact version of the code that produced these results.
Great job! The global environment was empty. Objects defined in the global environment can affect the analysis in your R Markdown file in unknown ways. For reproducibility it’s best to always run the code in an empty environment.
The command set.seed(20220216) was run prior to running the code in the R Markdown file. Setting a seed ensures that any results that rely on randomness, e.g. subsampling or permutations, are reproducible.
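As a small illustration of the point about seeds, the following base R sketch (not part of the original analysis; the sample size and range are arbitrary) draws the same random subsample twice by resetting the seed before each draw.

```r
# Illustrative only: the seed value is the one reported above; the
# range 1:100 and the sample size 5 are arbitrary choices.
set.seed(20220216)
first_draw <- sample(1:100, 5)

set.seed(20220216)
second_draw <- sample(1:100, 5)

# Resetting the seed before each draw reproduces the draw exactly.
identical(first_draw, second_draw)
```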
Great job! Recording the operating system, R version, and package versions is critical for reproducibility.
Nice! There were no cached chunks for this analysis, so you can be confident that you successfully produced the results during this run.
Great job! Using relative paths to the files within your workflowr project makes it easier to run your code on other machines.
Great! You are using Git for version control. Tracking code development and connecting the code version to the results is critical for reproducibility.
The results in this page were generated with repository version 16ead66. See the Past versions tab to see a history of the changes made to the R Markdown and HTML files.
Note that you need to be careful to ensure that all relevant files for the analysis have been committed to Git prior to generating the results (you can use wflow_publish or wflow_git_commit). workflowr only checks the R Markdown file, but you know if there are other scripts or data files that it depends on. Below is the status of the Git repository when the results were generated:
Ignored files:
Ignored: .DS_Store
Ignored: .Rproj.user/
Ignored: data/.DS_Store
Ignored: data/gbd/.DS_Store
Ignored: data/gbd/ihme_gbd_2019_global_disease_burden_rate_all_ages.csv
Ignored: data/gbd/ihme_gbd_2019_global_paf_rate_percent_all_ages.csv
Ignored: data/gbd/ihme_gbd_2021_global_disease_burden_rate_all_ages.csv
Ignored: data/gbd/ihme_gbd_2021_global_paf_rate_percent_all_ages.csv
Ignored: data/gwas_catalog/
Ignored: data/who/
Ignored: output/gwas_cat/
Ignored: output/gwas_study_info_cohort_corrected.csv
Ignored: output/gwas_study_info_trait_corrected.csv
Ignored: output/gwas_study_info_trait_ontology_info.csv
Ignored: output/gwas_study_info_trait_ontology_info_l1.csv
Ignored: output/gwas_study_info_trait_ontology_info_l2.csv
Ignored: output/trait_ontology/
Ignored: renv/
Unstaged changes:
Modified: code/get_term_descendants.R
Note that any generated files, e.g. HTML, png, CSS, etc., are not included in this status report because it is ok for generated content to have uncommitted changes.
These are the previous versions of the repository in which changes were made to the R Markdown (analysis/level_2_disease_group.Rmd) and HTML (docs/level_2_disease_group.html) files. If you’ve configured a remote Git repository (see ?wflow_git_remote), click on the hyperlinks in the table below to view the files as they were in that past version.
File | Version | Author | Date | Message |
---|---|---|---|---|
Rmd | 16ead66 | IJbeasley | 2025-09-16 | Correcting some cancer grouping |
html | 6018e42 | IJbeasley | 2025-09-16 | Build site. |
Rmd | 02a0b9d | IJbeasley | 2025-09-16 | Improving cancer grouping |
html | 6f66696 | IJbeasley | 2025-09-16 | Build site. |
Rmd | 66cff1c | IJbeasley | 2025-09-16 | Even more disease term grouping |
html | 21b6c02 | IJbeasley | 2025-09-15 | Build site. |
html | 5ec3111 | IJbeasley | 2025-09-15 | Build site. |
html | 30d773e | IJbeasley | 2025-09-15 | Build site. |
html | 8d64a38 | IJbeasley | 2025-09-15 | Build site. |
Rmd | b3088d8 | IJbeasley | 2025-09-15 | workflowr::wflow_publish("analysis/level_2_disease_group.Rmd") |
html | b89d661 | IJbeasley | 2025-09-10 | Build site. |
Rmd | c0fcab7 | IJbeasley | 2025-09-10 | workflowr::wflow_publish("analysis/level_2_disease_group.Rmd") |
html | ead4d8e | IJbeasley | 2025-09-10 | Build site. |
Rmd | 3964f77 | IJbeasley | 2025-09-10 | workflowr::wflow_publish("analysis/level_2_disease_group.Rmd") |
html | 8fb639d | IJbeasley | 2025-09-10 | Build site. |
Rmd | edeb6f5 | IJbeasley | 2025-09-10 | workflowr::wflow_publish("analysis/level_2_disease_group.Rmd") |
html | fe91704 | IJbeasley | 2025-09-09 | Build site. |
Rmd | 9c64867 | IJbeasley | 2025-09-09 | Minor fixing of disease trait categorisation |
html | fa509c0 | IJbeasley | 2025-09-08 | Build site. |
Rmd | c9602c7 | IJbeasley | 2025-09-08 | More grouping to match GBD |
library(dplyr)
library(data.table)
library(ggplot2)
library(stringr)
source(here::here("code/get_term_descendants.R"))
gwas_study_info <- fread(here::here("output/gwas_cat/gwas_study_info_group_l1_v2.csv"))
# Initialise the level-2 terms as a copy of the level-1 terms
gwas_study_info = gwas_study_info |>
  mutate(l2_all_disease_terms = l1_all_disease_terms)
gwas_study_info |>
  filter(grepl("lip and oral cavity cancer", l2_all_disease_terms)) |>
  select(all_disease_terms, l2_all_disease_terms) |>
  distinct()
all_disease_terms
<char>
1: oral cavity cancer
2: mouth neoplasm
3: tongue cancer
4: major salivary gland cancer
5: human papilloma virus infection, oral cavity cancer
6: tongue neoplasm
7: major salivary gland carcinoma
8: lip cancer
9: oral squamous cell carcinoma
l2_all_disease_terms
<char>
1: lip and oral cavity cancer
2: lip and oral cavity cancer
3: lip and oral cavity cancer
4: lip and oral cavity cancer
5: human papilloma virus infection, lip and oral cavity cancer
6: lip and oral cavity cancer
7: lip and oral cavity cancer
8: lip and oral cavity cancer
9: lip and oral cavity cancer
gwas_study_info =
  gwas_study_info |>
  mutate(l2_all_disease_terms =
           stringr::str_replace_all(l2_all_disease_terms,
                                    pattern = "nasopharyngeal cancer",
                                    "nasopharynx cancer"
           )
  )
gwas_study_info |>
  filter(grepl("nasopharynx cancer", l2_all_disease_terms)) |>
  select(all_disease_terms, l2_all_disease_terms) |>
  distinct()
   all_disease_terms        l2_all_disease_terms
   <char>                   <char>
1: nasopharyngeal neoplasm  nasopharynx cancer
gwas_study_info |>
  filter(grepl("other pharynx cancer", l2_all_disease_terms)) |>
  select(all_disease_terms, l2_all_disease_terms) |>
  distinct()
all_disease_terms
<char>
1: oropharynx cancer
2: laryngeal squamous cell carcinoma, hypopharynx cancer
3: human papilloma virus infection, oropharynx cancer
4: tonsil cancer
5: hypopharyngeal carcinoma
6: pharynx cancer, laryngeal carcinoma
l2_all_disease_terms
<char>
1: other pharynx cancer
2: larynx cancer, other pharynx cancer
3: human papilloma virus infection, other pharynx cancer
4: other pharynx cancer
5: other pharynx cancer
6: larynx cancer, other pharynx cancer
gwas_study_info |>
  filter(grepl("esophageal cancer", l2_all_disease_terms)) |>
  select(all_disease_terms, l2_all_disease_terms) |>
  distinct()
all_disease_terms
<char>
1: esophageal adenocarcinoma, barretts esophagus
2: esophageal adenocarcinoma
3: esophageal adenocarcinoma, gastroesophageal reflux disease
4: esophageal squamous cell carcinoma
5: esophageal carcinoma, gastric carcinoma
6: squamous cell carcinoma, esophageal carcinoma
7: esophageal carcinoma
8: esophageal adenocarcinoma, digestive system disease, barretts esophagus
9: neoplasm of esophagus
10: esophageal cancer
l2_all_disease_terms
<char>
1: barretts esophagus, esophageal cancer
2: esophageal cancer
3: esophageal cancer, gastroesophageal reflux disease
4: esophageal cancer
5: esophageal cancer, stomach cancer
6: esophageal cancer
7: esophageal cancer
8: barretts esophagus, digestive system disease, esophageal cancer
9: esophageal cancer
10: esophageal cancer
gwas_study_info |>
  filter(grepl("stomach cancer", l2_all_disease_terms)) |>
  select(all_disease_terms, l2_all_disease_terms) |>
  distinct()
all_disease_terms
<char>
1: gastric carcinoma
2: esophageal carcinoma, gastric carcinoma
3: gastric cardia carcinoma
4: gastric adenocarcinoma
5: lung carcinoma, squamous cell carcinoma, gastric carcinoma
6: gastric cancer
7: stomach neoplasm
8: gastric intestinal type adenocarcinoma
9: diffuse gastric adenocarcinoma
10: cardia cancer
l2_all_disease_terms
<char>
1: stomach cancer
2: esophageal cancer, stomach cancer
3: stomach cancer
4: stomach cancer
5: lung cancer, stomach cancer
6: stomach cancer
7: stomach cancer
8: stomach cancer
9: stomach cancer
10: stomach cancer
gwas_study_info = gwas_study_info |>
  mutate(l2_all_disease_terms =
           stringr::str_replace_all(l2_all_disease_terms,
                                    pattern = "colorectal cancer",
                                    "colon and rectum cancer"
           )
  )
gwas_study_info |>
  filter(grepl("colon and rectum cancer", l2_all_disease_terms)) |>
  select(all_disease_terms, l2_all_disease_terms) |>
  distinct()
all_disease_terms
<char>
1: colorectal cancer
2: sclerosing cholangitis, colorectal cancer
3: colorectal cancer, colorectal adenoma
4: metastatic colorectal cancer
5: lung carcinoma, estrogen-receptor negative breast cancer, ovarian endometrioid carcinoma, colorectal cancer, prostate carcinoma, ovarian serous carcinoma, breast carcinoma, ovarian carcinoma, lung adenocarcinoma, squamous cell lung carcinoma, cancer
6: rectum cancer
7: colonic neoplasm
8: colorectal adenocarcinoma
9: colon carcinoma
10: colorectal cancer, colorectal mucinous adenocarcinoma
11: colorectal carcinoma
12: colorectal cancer, peripheral neuropathy
13: colorectal cancer, stomatitis
14: colorectal cancer, neutropenia
15: colorectal cancer, hand-foot syndrome
16: colorectal cancer, exanthem
17: colorectal cancer, sleepiness
18: anal carcinoma
19: cecum cancer
20: sigmoid neoplasm
21: rectum cancer, colonic neoplasm
22: colorectal cancer, inflammatory bowel disease
23: colon carcinoma, sensory peripheral neuropathy
24: colon carcinoma, drug allergy
25: colorectal cancer, lung cancer
26: colorectal cancer, squamous cell lung carcinoma
27: colorectal cancer, skin disease
28: skin disease, colon carcinoma
29: age of onset of colorectal cancer
30: cecal neoplasm
31: colorectal cancer, breast carcinoma
32: metastatic colorectal cancer, disease progression measurement
33: polyp of large intestine, colorectal cancer
all_disease_terms
l2_all_disease_terms
<char>
1: colon and rectum cancer
2: colon and rectum cancer, sclerosing cholangitis
3: benign neoplasm, colon and rectum cancer
4: colon and rectum cancer
5: breast cancer, cancer, colon and rectum cancer, lung cancer, ovarian cancer, prostate cancer
6: colon and rectum cancer
7: colon and rectum cancer
8: colon and rectum cancer
9: colon and rectum cancer
10: colon and rectum cancer
11: colon and rectum cancer
12: colon and rectum cancer, peripheral neuropathy
13: colon and rectum cancer, stomatitis
14: colon and rectum cancer, neutropenia
15: colon and rectum cancer, hand-foot syndrome
16: colon and rectum cancer, exanthem
17: colon and rectum cancer, sleepiness
18: colon and rectum cancer
19: colon and rectum cancer
20: colon and rectum cancer
21: colon and rectum cancer
22: colon and rectum cancer, inflammatory bowel disease
23: colon and rectum cancer, sensory peripheral neuropathy
24: colon and rectum cancer, drug allergy
25: colon and rectum cancer, lung cancer
26: colon and rectum cancer, lung cancer
27: colon and rectum cancer, skin disease
28: colon and rectum cancer, skin disease
29: colon and rectum cancer
30: colon and rectum cancer
31: breast cancer, colon and rectum cancer
32: colon and rectum cancer
33: benign neoplasm, colon and rectum cancer
l2_all_disease_terms
gwas_study_info |>
  filter(grepl("liver cancer", l2_all_disease_terms)) |>
  select(all_disease_terms, l2_all_disease_terms) |>
  distinct()
all_disease_terms
<char>
1: hepatitis b virus infection, hepatocellular carcinoma
2: sclerosing cholangitis, cholangiocarcinoma
3: cholangiocarcinoma, sclerosing cholangitis
4: sclerosing cholangitis, hepatocellular carcinoma
5: hepatitis c virus infection, hepatocellular carcinoma
6: hepatocellular carcinoma, non-alcoholic steatohepatitis
7: hepatocellular carcinoma
8: biliary tract cancer
9: liver neoplasm
10: intrahepatic bile duct cancer, liver cancer
11: liver cancer
12: liver cancer, bile duct cancer
13: bile duct cancer
14: extrahepatic bile duct carcinoma
15: intrahepatic cholangiocarcinoma
16: alcohol-related disorders, hepatocellular carcinoma
17: hepatitis virus-related hepatocellular carcinoma
18: alcoholic liver cirrhosis, hepatocellular carcinoma
19: cholangiocarcinoma
l2_all_disease_terms
<char>
1: hepatitis b infection, liver cancer
2: liver cancer, sclerosing cholangitis
3: liver cancer, sclerosing cholangitis
4: liver cancer, sclerosing cholangitis
5: hepatitis c infection, liver cancer
6: liver cancer, non-alcoholic fatty liver disease
7: liver cancer
8: liver cancer
9: liver cancer
10: liver cancer
11: liver cancer
12: liver cancer
13: liver cancer
14: liver cancer
15: liver cancer
16: alcohol-related disorders, liver cancer
17: liver cancer
18: alcoholic liver disease, liver cancer
19: liver cancer
gwas_study_info =
  gwas_study_info |>
  mutate(l2_all_disease_terms =
           case_when(
             l2_all_disease_terms == "cancer of gallbladder and extrahepatic biliary tract" ~ "gallbladder and biliary tract cancer",
             TRUE ~ l2_all_disease_terms
           )
  )
gwas_study_info |>
  filter(grepl("gallbladder and biliary tract cancer", l2_all_disease_terms)) |>
  select(all_disease_terms, l2_all_disease_terms) |>
  distinct()
all_disease_terms
<char>
1: sclerosing cholangitis, gallbladder neoplasm
2: gallbladder neoplasm
3: carcinoma of gallbladder and extrahepatic biliary tract
l2_all_disease_terms
<char>
1: gallbladder and biliary tract cancer, sclerosing cholangitis
2: gallbladder and biliary tract cancer
3: gallbladder and biliary tract cancer
gwas_study_info |>
  filter(grepl("pancreatic cancer", l2_all_disease_terms)) |>
  select(all_disease_terms, l2_all_disease_terms) |>
  distinct()
all_disease_terms
<char>
1: pancreatic carcinoma
2: pancreatic ductal adenocarcinoma
3: pancreatic carcinoma, neutropenia
4: breast carcinoma, estrogen-receptor positive breast cancer, metastatic prostate cancer, pancreatic carcinoma, hypertension
5: breast carcinoma, estrogen-receptor positive breast cancer, metastatic prostate cancer, pancreatic carcinoma, proteinuria
6: breast carcinoma, estrogen-receptor positive breast cancer, metastatic prostate cancer, pancreatic carcinoma, hypertension, proteinuria
l2_all_disease_terms
<char>
1: pancreatic cancer
2: pancreatic cancer
3: neutropenia, pancreatic cancer
4: breast cancer, hypertension, pancreatic cancer, prostate cancer
5: breast cancer, pancreatic cancer, prostate cancer, proteinuria
6: breast cancer, hypertension, pancreatic cancer, prostate cancer, proteinuria
gwas_study_info |>
  filter(grepl("larynx cancer", l2_all_disease_terms)) |>
  select(all_disease_terms, l2_all_disease_terms) |>
  distinct()
all_disease_terms
<char>
1: laryngeal squamous cell carcinoma
2: laryngeal squamous cell carcinoma, hypopharynx cancer
3: laryngeal carcinoma
4: laryngeal neoplasm
5: glottis neoplasm
6: pharynx cancer, laryngeal carcinoma
l2_all_disease_terms
<char>
1: larynx cancer
2: larynx cancer, other pharynx cancer
3: larynx cancer
4: larynx cancer
5: larynx cancer
6: larynx cancer, other pharynx cancer
resp_cancer_terms = c("lung cancer",
                      "bronchus cancer",
                      "tracheal cancer",
                      "respiratory system cancer"
                      )
# Match each term only where it starts at a word boundary and ends at a
# comma or the end of the string. Wrapping every term, rather than using the
# boundary regex as the collapse separator, applies both conditions to the
# first and last terms as well.
gwas_study_info =
  gwas_study_info |>
  mutate(l2_all_disease_terms =
           ifelse(l2_all_disease_terms != "tracheal bronchus and lung cancer",
                  stringr::str_replace_all(l2_all_disease_terms,
                                           pattern = paste0("\\b", resp_cancer_terms, "(?=,|$)", collapse = "|"),
                                           "tracheal bronchus and lung cancer"
                  ),
                  l2_all_disease_terms
           )
  )
gwas_study_info |>
  filter(grepl("tracheal bronchus and lung cancer", l2_all_disease_terms)) |>
  select(all_disease_terms, l2_all_disease_terms) |>
  distinct()
all_disease_terms
<char>
1: non-small cell lung carcinoma
2: lung adenocarcinoma
3: squamous cell lung carcinoma
4: lung carcinoma, family history of lung cancer
5: lung adenocarcinoma, family history of lung cancer
6: squamous cell lung carcinoma, family history of lung cancer
7: lung carcinoma
8: small cell lung carcinoma
9: lung carcinoma, schizophrenia
10: lung carcinoma, squamous cell carcinoma, lung adenocarcinoma
11: lung carcinoma, estrogen-receptor negative breast cancer, ovarian endometrioid carcinoma, colorectal cancer, prostate carcinoma, ovarian serous carcinoma, breast carcinoma, ovarian carcinoma, lung adenocarcinoma, squamous cell lung carcinoma, cancer
12: non-small cell lung carcinoma, drug-induced liver injury
13: lung carcinoma, chronic obstructive pulmonary disease
14: lung carcinoma, squamous cell carcinoma, gastric carcinoma
15: lung cancer
16: respiratory system cancer
17: bronchus cancer
18: lung cancer, bronchus cancer
19: family history of lung cancer
20: breast cancer, lung cancer
21: head and neck carcinoma, lung cancer
22: small cell carcinoma
23: colorectal cancer, lung cancer
24: lung cancer, gastroesophageal reflux disease
25: peptic ulcer disease, lung cancer
26: colorectal cancer, squamous cell lung carcinoma
27: peptic ulcer disease, squamous cell lung carcinoma
28: non-small cell lung carcinoma, disease progression measurement
29: lung neoplasm
30: cancer
31: lung cancer, radiation-induced disorder
all_disease_terms
l2_all_disease_terms
<char>
1: tracheal bronchus and lung cancer
2: tracheal bronchus and lung cancer
3: tracheal bronchus and lung cancer
4: tracheal bronchus and lung cancer
5: tracheal bronchus and lung cancer
6: tracheal bronchus and lung cancer
7: tracheal bronchus and lung cancer
8: tracheal bronchus and lung cancer
9: tracheal bronchus and lung cancer, schizophrenia
10: tracheal bronchus and lung cancer
11: breast cancer, cancer, colon and rectum cancer, tracheal bronchus and lung cancer, ovarian cancer, prostate cancer
12: drug-induced liver injury, tracheal bronchus and lung cancer
13: chronic obstructive pulmonary disease, tracheal bronchus and lung cancer
14: tracheal bronchus and lung cancer, stomach cancer
15: tracheal bronchus and lung cancer
16: tracheal bronchus and lung cancer
17: tracheal bronchus and lung cancer
18: tracheal bronchus and lung cancer, tracheal bronchus and lung cancer
19: tracheal bronchus and lung cancer
20: breast cancer, tracheal bronchus and lung cancer
21: head and neck cancer, tracheal bronchus and lung cancer
22: tracheal bronchus and lung cancer
23: colon and rectum cancer, tracheal bronchus and lung cancer
24: gastroesophageal reflux disease, tracheal bronchus and lung cancer
25: tracheal bronchus and lung cancer, peptic ulcer disease
26: colon and rectum cancer, tracheal bronchus and lung cancer
27: tracheal bronchus and lung cancer, peptic ulcer disease
28: tracheal bronchus and lung cancer
29: tracheal bronchus and lung cancer
30: tracheal bronchus and lung cancer
31: tracheal bronchus and lung cancer, radiation-induced disorder
l2_all_disease_terms
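Row 18 of the output above lists "tracheal bronchus and lung cancer" twice, because two different source terms in the same study were mapped to the same group. One way to collapse such repeats is sketched below in base R. The helper name dedupe_terms is hypothetical and not part of the original analysis, and sorting the terms alphabetically is an assumption about the desired ordering.

```r
# Hypothetical helper (not in the original analysis): split a
# comma-separated term string, drop repeated terms, sort them
# alphabetically, and join them back together.
dedupe_terms <- function(x) {
  vapply(
    strsplit(x, ",\\s*"),
    function(terms) paste(sort(unique(trimws(terms))), collapse = ", "),
    character(1)
  )
}

# A repeated group collapses to a single entry; other strings keep
# each distinct term once.
deduped <- dedupe_terms(c(
  "tracheal bronchus and lung cancer, tracheal bronchus and lung cancer",
  "breast cancer, tracheal bronchus and lung cancer"
))
print(deduped)
```

The function is vectorised over its input, so it could be applied to a whole column inside mutate(). Missing values would need separate handling, since this sketch assumes every element is a non-missing string.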
gwas_study_info =
  gwas_study_info |>
  mutate(l2_all_disease_terms =
           stringr::str_replace_all(l2_all_disease_terms,
                                    pattern = "malignant melanoma of skin",
                                    "malignant skin melanoma"
           )
  )
gwas_study_info |>
  filter(grepl("malignant skin melanoma", l2_all_disease_terms)) |>
  select(all_disease_terms, l2_all_disease_terms) |>
  distinct()
all_disease_terms
<char>
1: cutaneous melanoma
2: melanoma
3: neuroblastoma, cutaneous melanoma
4: melanoma, immune system toxicity
5: non-melanoma skin carcinoma
l2_all_disease_terms
<char>
1: malignant skin melanoma
2: malignant skin melanoma
3: malignant skin melanoma, neuroblastoma
4: immune system toxicity, malignant skin melanoma
5: non-malignant skin melanoma skin cancer
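Row 5 above reads "non-malignant skin melanoma skin cancer", which looks like the word "melanoma" having been replaced inside "non-melanoma skin cancer" at some point in the pipeline. Where that happens is not visible in this section, so this is an assumption. The base R sketch below shows one way to rename a term only when it is a complete comma-separated entry. The helper name replace_whole_term is hypothetical and not part of the original analysis.

```r
# Hypothetical helper (not in the original analysis): rename a term
# only when it is a complete comma-separated entry, never when it is
# a fragment of a longer term.
replace_whole_term <- function(x, from, to) {
  vapply(
    strsplit(x, ",\\s*"),
    function(terms) {
      terms <- trimws(terms)
      terms[terms == from] <- to
      paste(terms, collapse = ", ")
    },
    character(1)
  )
}

# "melanoma" is renamed as a standalone entry, but the longer term
# "non-melanoma skin cancer" is left untouched.
renamed <- replace_whole_term(
  c("melanoma", "non-melanoma skin cancer", "melanoma, neuroblastoma"),
  from = "melanoma",
  to = "malignant skin melanoma"
)
print(renamed)
```

Compared with a plain substring replacement, this trades regular-expression flexibility for exact term matching, which is usually the safer choice when one term name is contained in another.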
gwas_study_info |>
  filter(grepl("non-melanoma skin cancer", l2_all_disease_terms)) |>
  select(all_disease_terms, l2_all_disease_terms) |>
  distinct()
   all_disease_terms                              l2_all_disease_terms
   <char>                                         <char>
1: squamous cell carcinoma, basal cell carcinoma  non-melanoma skin cancer
2: keratinocyte carcinoma                         non-melanoma skin cancer
3: basal cell carcinoma                           non-melanoma skin cancer
4: non-melanoma skin carcinoma                    non-melanoma skin cancer
5: skin neoplasm                                  non-melanoma skin cancer
6: skin carcinoma in situ                         non-melanoma skin cancer
gwas_study_info =
  gwas_study_info |>
  mutate(l2_all_disease_terms =
           stringr::str_replace_all(l2_all_disease_terms,
                                    pattern = "soft tissue sarcoma",
                                    "soft tissue and other extraosseous sarcomas"
           )
  )
gwas_study_info |>
  filter(grepl("soft tissue and other extraosseous sarcomas", l2_all_disease_terms)) |>
  select(all_disease_terms, l2_all_disease_terms) |>
  distinct()
   all_disease_terms      l2_all_disease_terms
   <char>                 <char>
1: sarcoma, fibrosarcoma  sarcoma, soft tissue and other extraosseous sarcomas
2: kaposis sarcoma        soft tissue and other extraosseous sarcomas
gwas_study_info =
  gwas_study_info |>
  mutate(l2_all_disease_terms =
           stringr::str_replace_all(l2_all_disease_terms,
                                    pattern = "bone cancer|osteosarcoma",
                                    "malignant neoplasm of bone and articular cartilage"
           )
  )
gwas_study_info |>
  filter(grepl("malignant neoplasm of bone and articular cartilage", l2_all_disease_terms)) |>
  select(all_disease_terms, l2_all_disease_terms) |>
  distinct()
all_disease_terms
<char>
1: osteosarcoma
2: acute myeloid leukemia
3: myeloid leukemia
4: malignant bone neoplasm
5: acute myeloid leukemia, myelodysplastic syndrome
6: bone neoplasm
7: myelofibrosis
8: acute lymphoblastic leukemia, acute myeloid leukemia, myelodysplastic syndrome
l2_all_disease_terms
<char>
1: malignant neoplasm of bone and articular cartilage
2: malignant neoplasm of bone and articular cartilage
3: malignant neoplasm of bone and articular cartilage
4: malignant neoplasm of bone and articular cartilage
5: malignant neoplasm of bone and articular cartilage, myelodysplastic syndrome
6: malignant neoplasm of bone and articular cartilage
7: malignant neoplasm of bone and articular cartilage
8: malignant neoplasm of bone and articular cartilage, leukemia, myelodysplastic syndrome
gwas_study_info |>
  filter(grepl("breast cancer", l2_all_disease_terms)) |>
  select(all_disease_terms, l2_all_disease_terms) |>
  distinct()
all_disease_terms
<char>
1: estrogen-receptor negative breast cancer
2: breast carcinoma
3: estrogen-receptor positive breast cancer
4: breast carcinoma,
5: estrogen-receptor positive breast cancer, breast carcinoma
6: estrogen-receptor negative breast cancer, breast carcinoma
7: breast carcinoma, peripheral neuropathy
8: prostate carcinoma, breast carcinoma, ovarian carcinoma
9: male breast carcinoma
10: invasive lobular carcinoma
11: lung carcinoma, estrogen-receptor negative breast cancer, ovarian endometrioid carcinoma, colorectal cancer, prostate carcinoma, ovarian serous carcinoma, breast carcinoma, ovarian carcinoma, lung adenocarcinoma, squamous cell lung carcinoma, cancer
12: tp53 positive breast carcinoma
13: breast carcinoma, congestive heart failure
14: triple-negative breast cancer
15: breast carcinoma, chemotherapy-induced hypertension
16: estrogen-receptor negative breast cancer, estrogen-receptor positive breast cancer, breast carcinoma
17: childhood cancer, breast carcinoma
18: breast cancer
19: luminal a breast carcinoma
20: luminal b breast carcinoma
21: basal-like breast carcinoma
22: her2 positive breast carcinoma
23: breast carcinoma, chemotherapy-induced alopecia
24: progesterone-receptor negative breast cancer
25: her2 negative breast carcinoma
26: breast carcinoma, amenorrhea
27: breast carcinoma, post operative nausea and vomiting
28: breast cancer, covid-19
29: estrogen-receptor positive breast cancer, breast carcinoma, her2 negative breast carcinoma, progesterone-receptor positive breast cancer
30: her2 positive breast carcinoma, estrogen-receptor positive breast cancer, breast carcinoma, progesterone-receptor positive breast cancer
31: progesterone-receptor negative breast cancer, estrogen-receptor negative breast cancer, her2 positive breast carcinoma, breast carcinoma
32: breast carcinoma, triple-negative breast cancer
33: her2 positive breast carcinoma, musculoskeletal system disease
34: lobular breast carcinoma in situ
35: breast carcinoma in situ
36: breast neoplasm
37: breast cancer, ovarian carcinoma
38: breast cancer, lung cancer
39: estrogen-receptor negative breast cancer, estrogen-receptor positive breast cancer
40: estrogen-receptor positive breast cancer, triple-negative breast cancer
41: her2 positive breast carcinoma, triple-negative breast cancer
42: breast carcinoma, cardiotoxicity
43: breast carcinoma, uterine leiomyoma
44: estrogen-receptor positive breast cancer, uterine leiomyoma
45: estrogen-receptor negative breast cancer, uterine leiomyoma
46: ductal breast carcinoma in situ and lobular carcinoma in situ
47: luminal b breast carcinoma, luminal a breast carcinoma
48: her2 positive breast carcinoma, luminal a breast carcinoma
49: triple-negative breast cancer, luminal a breast carcinoma
50: luminal b breast carcinoma, triple-negative breast cancer
51: luminal b breast carcinoma, her2 negative breast carcinoma, triple-negative breast cancer
52: luminal b breast carcinoma, her2 positive breast carcinoma
53: breast cancer, radiation-induced disorder
54: colorectal cancer, breast carcinoma
55: breast carcinoma, dermatological toxicity
56: breast carcinoma, edema
57: breast carcinoma, telangiectasia of the skin
58: breast carcinoma, lymphedema
59: luminal b breast carcinoma, her2 negative breast carcinoma
60: breast carcinoma, estrogen-receptor positive breast cancer, metastatic prostate cancer, pancreatic carcinoma, hypertension
61: breast carcinoma, estrogen-receptor positive breast cancer, metastatic prostate cancer, pancreatic carcinoma, proteinuria
62: breast carcinoma, estrogen-receptor positive breast cancer, metastatic prostate cancer, pancreatic carcinoma, hypertension, proteinuria
63: brcax breast cancer
64: schizophrenia, breast carcinoma
65: schizophrenia, estrogen-receptor positive breast cancer
66: estrogen-receptor negative breast cancer, schizophrenia
67: breast cancer, neutropenia, leukopenia
all_disease_terms
l2_all_disease_terms
<char>
1: breast cancer
2: breast cancer
3: breast cancer
4: breast cancer
5: breast cancer
6: breast cancer
7: breast cancer, peripheral neuropathy
8: breast cancer, ovarian cancer, prostate cancer
9: breast cancer
10: breast cancer
11: breast cancer, cancer, colon and rectum cancer, tracheal bronchus and lung cancer, ovarian cancer, prostate cancer
12: breast cancer
13: breast cancer, congestive heart failure
14: breast cancer
15: breast cancer, hypertension
16: breast cancer
17: breast cancer, childhood cancer
18: breast cancer
19: breast cancer
20: breast cancer
21: breast cancer
22: breast cancer
23: breast cancer, chemotherapy-induced alopecia
24: breast cancer
25: breast cancer
26: amenorrhea, breast cancer
27: breast cancer, post operative nausea and vomiting
28: breast cancer, covid-19
29: breast cancer
30: breast cancer
31: breast cancer
32: breast cancer
33: breast cancer, musculoskeletal system disease
34: breast cancer
35: breast cancer
36: breast cancer
37: breast cancer, ovarian cancer
38: breast cancer, tracheal bronchus and lung cancer
39: breast cancer
40: breast cancer
41: breast cancer
42: breast cancer, cardiotoxicity
43: benign neoplasm, breast cancer
44: benign neoplasm, breast cancer
45: benign neoplasm, breast cancer
46: breast cancer
47: breast cancer
48: breast cancer
49: breast cancer
50: breast cancer
51: breast cancer
52: breast cancer
53: breast cancer, radiation-induced disorder
54: breast cancer, colon and rectum cancer
55: breast cancer, dermatological toxicity
56: breast cancer, edema
57: breast cancer, telangiectasia of the skin
58: breast cancer, lymphedema
59: breast cancer
60: breast cancer, hypertension, pancreatic cancer, prostate cancer
61: breast cancer, pancreatic cancer, prostate cancer, proteinuria
62: breast cancer, hypertension, pancreatic cancer, prostate cancer, proteinuria
63: breast cancer
64: breast cancer, schizophrenia
65: breast cancer, schizophrenia
66: breast cancer, schizophrenia
67: breast cancer, leukopenia, neutropenia
l2_all_disease_terms
gwas_study_info |>
  filter(grepl("cervical cancer", l2_all_disease_terms)) |>
  select(all_disease_terms, l2_all_disease_terms) |>
  distinct()
all_disease_terms
<char>
1: cervical carcinoma
2: cervical cancer
3: dysplasia of cervix, cervical cancer
4: dysplasia, cervical cancer
5: uterine cervix carcinoma in situ
6: cervical carcinoma, human papilloma virus infection
7: cervical intraepithelial neoplasia grade 2/3
8: cervical carcinoma, cervical intraepithelial neoplasia grade 2/3
l2_all_disease_terms
<char>
1: cervical cancer
2: cervical cancer
3: cervical cancer, dysplasia of cervix
4: cervical cancer, dysplasia
5: cervical cancer
6: cervical cancer, human papilloma virus infection
7: cervical cancer
8: cervical cancer
# Open question: is endometrial cancer a subset of uterine cancer in GBD?
# It is in the ontology: http://purl.obolibrary.org/obo/MONDO_0002715
gwas_study_info =
  gwas_study_info |>
  mutate(l2_all_disease_terms =
           stringr::str_replace_all(l2_all_disease_terms,
                                    pattern = "endometrial cancer",
                                    "uterine cancer"
           )
  )
gwas_study_info |>
  filter(grepl("uterine cancer", l2_all_disease_terms)) |>
  select(all_disease_terms, l2_all_disease_terms) |>
  distinct()
    all_disease_terms                      l2_all_disease_terms
    <char>                                 <char>
 1: endometrial endometrioid carcinoma     uterine cancer
 2: endometrial carcinoma                  uterine cancer
 3: endometrial neoplasm                   uterine cancer
 4: uterine carcinoma                      uterine cancer
 5: endometrial cancer, covid-19           covid-19, uterine cancer
 6: uterine corpus cancer                  uterine cancer
 7: ovarian endometrioid adenocarcinoma    uterine cancer
 8: uterine cancer                         uterine cancer
 9: uterine adnexa cancer, ovarian cancer  ovarian cancer, uterine cancer
10: endometrial cancer                     uterine cancer
11: endometrial carcinoma, endometriosis   uterine cancer, endometriosis
gwas_study_info |>
  filter(grepl("ovarian cancer", l2_all_disease_terms)) |>
  select(all_disease_terms, l2_all_disease_terms) |>
  distinct()
all_disease_terms
<char>
1: ovarian carcinoma
2: malignant epithelial tumor of ovary
3: prostate carcinoma, breast carcinoma, ovarian carcinoma
4: lung carcinoma, estrogen-receptor negative breast cancer, ovarian endometrioid carcinoma, colorectal cancer, prostate carcinoma, ovarian serous carcinoma, breast carcinoma, ovarian carcinoma, lung adenocarcinoma, squamous cell lung carcinoma, cancer
5: ovarian mucinous adenocarcinoma
6: ovarian serous carcinoma
7: ovarian clear cell adenocarcinoma
8: ovarian endometrioid carcinoma
9: ovarian carcinoma, covid-19
10: high grade ovarian serous adenocarcinoma
11: ovarian serous adenocarcinoma
12: ovarian clear cell cancer
13: uterine adnexa cancer, ovarian cancer
14: ovarian cancer
15: ovarian neoplasm
16: breast cancer, ovarian carcinoma
17: ovarian carcinoma, cancer aggressiveness measurement
18: ovarian serous carcinoma, cancer aggressiveness measurement
l2_all_disease_terms
<char>
1: ovarian cancer
2: ovarian cancer
3: breast cancer, ovarian cancer, prostate cancer
4: breast cancer, cancer, colon and rectum cancer, tracheal bronchus and lung cancer, ovarian cancer, prostate cancer
5: ovarian cancer
6: ovarian cancer
7: ovarian cancer
8: ovarian cancer
9: covid-19, ovarian cancer
10: ovarian cancer
11: ovarian cancer
12: ovarian cancer
13: ovarian cancer, uterine cancer
14: ovarian cancer
15: ovarian cancer
16: breast cancer, ovarian cancer
17: ovarian cancer
18: ovarian cancer
gwas_study_info |>
  filter(grepl("prostate cancer", l2_all_disease_terms)) |>
  select(all_disease_terms, l2_all_disease_terms) |>
  distinct()
all_disease_terms
<char>
1: prostate carcinoma
2: cancer aggressiveness measurement, prostate carcinoma
3: prostate carcinoma, breast carcinoma, ovarian carcinoma
4: metastatic prostate cancer, peripheral neuropathy
5: metastatic prostate cancer
6: prostate carcinoma, erectile dysfunction
7: lung carcinoma, estrogen-receptor negative breast cancer, ovarian endometrioid carcinoma, colorectal cancer, prostate carcinoma, ovarian serous carcinoma, breast carcinoma, ovarian carcinoma, lung adenocarcinoma, squamous cell lung carcinoma, cancer
8: prostate carcinoma, adverse effect
9: prostate cancer
10: grade iii prostatic intraepithelial neoplasia
11: prostate carcinoma, type 2 diabetes mellitus
12: prostate cancer, disease progression measurement
13: prostate cancer, radiation-induced disorder
14: breast carcinoma, estrogen-receptor positive breast cancer, metastatic prostate cancer, pancreatic carcinoma, hypertension
15: breast carcinoma, estrogen-receptor positive breast cancer, metastatic prostate cancer, pancreatic carcinoma, proteinuria
16: breast carcinoma, estrogen-receptor positive breast cancer, metastatic prostate cancer, pancreatic carcinoma, hypertension, proteinuria
17: metastatic prostate cancer, disease progression measurement
l2_all_disease_terms
<char>
1: prostate cancer
2: prostate cancer
3: breast cancer, ovarian cancer, prostate cancer
4: peripheral neuropathy, prostate cancer
5: prostate cancer
6: erectile dysfunction, prostate cancer
7: breast cancer, cancer, colon and rectum cancer, tracheal bronchus and lung cancer, ovarian cancer, prostate cancer
8: complication, prostate cancer
9: prostate cancer
10: prostate cancer
11: prostate cancer, type 2 diabetes mellitus
12: prostate cancer
13: prostate cancer, radiation-induced disorder
14: breast cancer, hypertension, pancreatic cancer, prostate cancer
15: breast cancer, pancreatic cancer, prostate cancer, proteinuria
16: breast cancer, hypertension, pancreatic cancer, prostate cancer, proteinuria
17: prostate cancer
gwas_study_info |>
filter(grepl("testicular cancer", l2_all_disease_terms)) |>
select(all_disease_terms, l2_all_disease_terms) |>
distinct()
all_disease_terms
<char>
1: testicular carcinoma
2: testicular carcinoma, cardiovascular disease
3: testicular neoplasm
l2_all_disease_terms
<char>
1: testicular cancer
2: cardiovascular disease, testicular cancer
3: testicular cancer
gwas_study_info |>
filter(grepl("kidney cancer", l2_all_disease_terms)) |>
select(all_disease_terms, l2_all_disease_terms) |>
distinct()
   all_disease_terms               l2_all_disease_terms
   <char>                          <char>
1: renal cell carcinoma            kidney cancer
2: nephroblastoma                  kidney cancer
3: kidney cancer                   kidney cancer
4: clear cell renal carcinoma      kidney cancer
5: renal carcinoma                 kidney cancer
6: papillary renal cell carcinoma  kidney cancer
7: kidney neoplasm                 kidney cancer
gwas_study_info =
gwas_study_info |>
mutate(l2_all_disease_terms =
stringr::str_replace_all(l2_all_disease_terms ,
pattern = "urinary bladder cancer",
"bladder cancer"
)
)
gwas_study_info |>
filter(grepl("bladder cancer", l2_all_disease_terms)) |>
select(all_disease_terms, l2_all_disease_terms) |>
distinct()
all_disease_terms
<char>
1: urinary bladder carcinoma
2: urinary bladder carcinoma, disease progression measurement
3: urinary bladder cancer,
4: urinary bladder cancer
l2_all_disease_terms
<char>
1: bladder cancer
2: bladder cancer
3: bladder cancer
4: bladder cancer
gwas_study_info =
gwas_study_info |>
mutate(l2_all_disease_terms =
stringr::str_replace_all(l2_all_disease_terms ,
pattern = "\\bcentral nervous system cancer\\b",
"brain and central nervous system cancer"
)
)
gwas_study_info |>
filter(grepl("brain and central nervous system cancer", l2_all_disease_terms)) |>
select(all_disease_terms, l2_all_disease_terms) |>
distinct()
all_disease_terms
<char>
1: glioblastoma multiforme
2: central nervous system cancer
3: glioma
4: central nervous system cancer, glioma
5: central nervous system cancer, glioblastoma multiforme
6: brain neoplasm
7: astrocytoma
8: oligodendroglioma
9: nervous system cancer, brain cancer
10: brain cancer
11: malignant glioma
12: central nervous system non-hodgkin lymphoma
l2_all_disease_terms
<char>
1: brain and central nervous system cancer
2: brain and central nervous system cancer
3: brain and central nervous system cancer
4: brain and central nervous system cancer
5: brain and central nervous system cancer
6: brain and central nervous system cancer
7: brain and central nervous system cancer
8: brain and central nervous system cancer
9: brain and central nervous system cancer
10: brain and central nervous system cancer
11: brain and central nervous system cancer
12: brain and central nervous system cancer
gwas_study_info =
gwas_study_info |>
mutate(l2_all_disease_terms =
stringr::str_replace_all(l2_all_disease_terms ,
pattern = "\\bocular melanoma\\b|\\bocular cancer\\b",
"eye cancer"
)
)
gwas_study_info |>
filter(grepl("eye cancer", l2_all_disease_terms)) |>
select(all_disease_terms, l2_all_disease_terms) |>
distinct()
   all_disease_terms                                l2_all_disease_terms
   <char>                                           <char>
1: uveal melanoma                                   eye cancer
2: choroidal melanoma                               eye cancer
3: epithelioid cell uveal melanoma                  eye cancer
4: uveal melanoma, epithelioid cell uveal melanoma  eye cancer
5: uveal melanoma disease severity                  eye cancer
6: ocular cancer                                    eye cancer
7: eye neoplasm                                     eye cancer
gwas_study_info =
gwas_study_info |>
mutate(l2_all_disease_terms =
stringr::str_replace_all(l2_all_disease_terms ,
pattern = "\\bneuroblastoma\\b|\\bperipheral nervous system cancer\\b",
"neuroblastoma and other peripheral nervous system cancers"
)
)
gwas_study_info |>
filter(grepl("neuroblastoma and other peripheral nervous system cancers", l2_all_disease_terms)) |>
select(all_disease_terms, l2_all_disease_terms) |>
distinct()
all_disease_terms
<char>
1: neuroblastoma
2: neuroblastoma, cutaneous melanoma
l2_all_disease_terms
<char>
1: neuroblastoma and other peripheral nervous system cancers
2: malignant skin melanoma, neuroblastoma and other peripheral nervous system cancers
gwas_study_info |>
filter(grepl("thyroid cancer", l2_all_disease_terms)) |>
select(all_disease_terms, l2_all_disease_terms) |>
distinct()
   all_disease_terms                 l2_all_disease_terms
   <char>                            <char>
1: differentiated thyroid carcinoma  thyroid cancer
2: papillary thyroid carcinoma       thyroid cancer
3: follicular thyroid carcinoma      thyroid cancer
4: thyroid carcinoma                 thyroid cancer
5: thyroid cancer                    thyroid cancer
gwas_study_info |>
filter(grepl("mesothelioma", l2_all_disease_terms)) |>
select(all_disease_terms, l2_all_disease_terms) |>
distinct()
   all_disease_terms               l2_all_disease_terms
   <char>                          <char>
1: malignant pleural mesothelioma  mesothelioma
2: mesothelioma                    mesothelioma
3: pleural mesothelioma            mesothelioma
gwas_study_info =
gwas_study_info |>
mutate(l2_all_disease_terms =
stringr::str_replace_all(l2_all_disease_terms ,
pattern = "hodgkins lymphoma",
"hodgkin lymphoma"
)
)
gwas_study_info |>
filter(grepl("hodgkin lymphoma", l2_all_disease_terms)) |>
select(all_disease_terms, l2_all_disease_terms) |>
distinct()
all_disease_terms
<char>
1: diffuse large b-cell lymphoma, multiple sclerosis
2: follicular lymphoma, multiple sclerosis
3: marginal zone b-cell lymphoma, multiple sclerosis
4: diffuse large b-cell lymphoma, rheumatoid arthritis
5: rheumatoid arthritis, follicular lymphoma
6: rheumatoid arthritis, marginal zone b-cell lymphoma
7: diffuse large b-cell lymphoma, systemic lupus erythematosus
8: systemic lupus erythematosus, follicular lymphoma
9: marginal zone b-cell lymphoma, systemic lupus erythematosus
10: nodular sclerosis hodgkin lymphoma
11: hodgkins lymphoma
12: diffuse large b-cell lymphoma
13: neoplasm of mature b-cells
14: marginal zone b-cell lymphoma
15: hodgkins lymphoma, multiple myeloma, chronic lymphocytic leukemia
16: diffuse large b-cell lymphoma, marginal zone b-cell lymphoma, follicular lymphoma, chronic lymphocytic leukemia
17: hodgkins lymphoma, multiple myeloma, non-hodgkins lymphoma
18: acute lymphoblastic leukemia, lymphoblastic lymphoma, venous thromboembolism
19: non-hodgkins lymphoma
20: follicular lymphoma
21: reticulum cell sarcoma
22: extranodal nasal nk/t cell lymphoma
23: hiv infection, non-hodgkins lymphoma
all_disease_terms
l2_all_disease_terms
<char>
1: multiple sclerosis, non-hodgkin lymphoma
2: multiple sclerosis, non-hodgkin lymphoma
3: multiple sclerosis, non-hodgkin lymphoma
4: non-hodgkin lymphoma, rheumatoid arthritis
5: non-hodgkin lymphoma, rheumatoid arthritis
6: non-hodgkin lymphoma, rheumatoid arthritis
7: non-hodgkin lymphoma, systemic lupus erythematosus
8: non-hodgkin lymphoma, systemic lupus erythematosus
9: non-hodgkin lymphoma, systemic lupus erythematosus
10: hodgkin lymphoma
11: hodgkin lymphoma
12: non-hodgkin lymphoma
13: non-hodgkin lymphoma
14: non-hodgkin lymphoma
15: hodgkin lymphoma, leukemia, multiple myeloma
16: leukemia, non-hodgkin lymphoma
17: hodgkin lymphoma, multiple myeloma, non-hodgkin lymphoma
18: leukemia, non-hodgkin lymphoma, venous thromboembolism
19: non-hodgkin lymphoma
20: non-hodgkin lymphoma
21: non-hodgkin lymphoma
22: non-hodgkin lymphoma
23: hiv infection, non-hodgkin lymphoma
l2_all_disease_terms
gwas_study_info =
gwas_study_info |>
mutate(l2_all_disease_terms =
stringr::str_replace_all(l2_all_disease_terms ,
pattern = "non-hodgkins lymphoma",
"non-hodgkin lymphoma"
)
)
gwas_study_info |>
filter(grepl("non-hodgkin lymphoma", l2_all_disease_terms)) |>
select(all_disease_terms, l2_all_disease_terms) |>
distinct()
all_disease_terms
<char>
1: diffuse large b-cell lymphoma, multiple sclerosis
2: follicular lymphoma, multiple sclerosis
3: marginal zone b-cell lymphoma, multiple sclerosis
4: diffuse large b-cell lymphoma, rheumatoid arthritis
5: rheumatoid arthritis, follicular lymphoma
6: rheumatoid arthritis, marginal zone b-cell lymphoma
7: diffuse large b-cell lymphoma, systemic lupus erythematosus
8: systemic lupus erythematosus, follicular lymphoma
9: marginal zone b-cell lymphoma, systemic lupus erythematosus
10: diffuse large b-cell lymphoma
11: neoplasm of mature b-cells
12: marginal zone b-cell lymphoma
13: diffuse large b-cell lymphoma, marginal zone b-cell lymphoma, follicular lymphoma, chronic lymphocytic leukemia
14: hodgkins lymphoma, multiple myeloma, non-hodgkins lymphoma
15: acute lymphoblastic leukemia, lymphoblastic lymphoma, venous thromboembolism
16: non-hodgkins lymphoma
17: follicular lymphoma
18: reticulum cell sarcoma
19: extranodal nasal nk/t cell lymphoma
20: hiv infection, non-hodgkins lymphoma
all_disease_terms
l2_all_disease_terms
<char>
1: multiple sclerosis, non-hodgkin lymphoma
2: multiple sclerosis, non-hodgkin lymphoma
3: multiple sclerosis, non-hodgkin lymphoma
4: non-hodgkin lymphoma, rheumatoid arthritis
5: non-hodgkin lymphoma, rheumatoid arthritis
6: non-hodgkin lymphoma, rheumatoid arthritis
7: non-hodgkin lymphoma, systemic lupus erythematosus
8: non-hodgkin lymphoma, systemic lupus erythematosus
9: non-hodgkin lymphoma, systemic lupus erythematosus
10: non-hodgkin lymphoma
11: non-hodgkin lymphoma
12: non-hodgkin lymphoma
13: leukemia, non-hodgkin lymphoma
14: hodgkin lymphoma, multiple myeloma, non-hodgkin lymphoma
15: leukemia, non-hodgkin lymphoma, venous thromboembolism
16: non-hodgkin lymphoma
17: non-hodgkin lymphoma
18: non-hodgkin lymphoma
19: non-hodgkin lymphoma
20: hiv infection, non-hodgkin lymphoma
l2_all_disease_terms
gwas_study_info |>
filter(grepl("multiple myeloma", l2_all_disease_terms)) |>
select(all_disease_terms, l2_all_disease_terms) |>
distinct()
all_disease_terms
<char>
1: multiple myeloma
2: multiple myeloma, peripheral neuropathy
3: multiple myeloma, chemotherapy-induced oral mucositis
4: multiple myeloma, monoclonal gammopathy
5: hodgkins lymphoma, multiple myeloma, chronic lymphocytic leukemia
6: hodgkins lymphoma, multiple myeloma, non-hodgkins lymphoma
7: multiple myeloma, clostridium difficile infection
l2_all_disease_terms
<char>
1: multiple myeloma
2: multiple myeloma, peripheral neuropathy
3: chemotherapy-induced oral mucositis, multiple myeloma
4: monoclonal gammopathy, multiple myeloma
5: hodgkin lymphoma, leukemia, multiple myeloma
6: hodgkin lymphoma, multiple myeloma, non-hodgkin lymphoma
7: clostridium difficile infection, multiple myeloma
gwas_study_info |>
filter(grepl("leukemia", l2_all_disease_terms)) |>
select(all_disease_terms, l2_all_disease_terms) |>
distinct()
all_disease_terms
<char>
1: acute lymphoblastic leukemia
2: multiple sclerosis, chronic lymphocytic leukemia
3: rheumatoid arthritis, chronic lymphocytic leukemia
4: systemic lupus erythematosus, chronic lymphocytic leukemia
5: acute lymphoblastic leukemia, asparaginase-induced acute pancreatitis
6: chronic lymphocytic leukemia
7: b-cell acute lymphoblastic leukemia
8: chronic myelogenous leukemia
9: hodgkins lymphoma, multiple myeloma, chronic lymphocytic leukemia
10: childhood acute lymphoblastic leukemia
11: acute lymphoblastic leukemia, peripheral neuropathy
12: diffuse large b-cell lymphoma, marginal zone b-cell lymphoma, follicular lymphoma, chronic lymphocytic leukemia
13: acute lymphoblastic leukemia, lymphoblastic lymphoma, venous thromboembolism
14: leukemia
15: b-cell acute lymphoblastic leukemia, adult onset asthma
16: b-cell acute lymphoblastic leukemia, childhood onset asthma
17: b-cell acute lymphoblastic leukemia, graves disease
18: b-cell acute lymphoblastic leukemia, hashimotos thyroiditis
19: b-cell acute lymphoblastic leukemia, hypothyroidism
20: b-cell acute lymphoblastic leukemia, primary biliary cirrhosis
21: b-cell acute lymphoblastic leukemia, sclerosing cholangitis
22: b-cell acute lymphoblastic leukemia, inflammatory bowel disease
23: b-cell acute lymphoblastic leukemia, crohns disease
24: b-cell acute lymphoblastic leukemia, ulcerative colitis
25: b-cell acute lymphoblastic leukemia, rheumatoid arthritis
26: b-cell acute lymphoblastic leukemia, multiple sclerosis
27: b-cell acute lymphoblastic leukemia, systemic scleroderma
28: b-cell acute lymphoblastic leukemia, systemic lupus erythematosus
29: b-cell acute lymphoblastic leukemia, type 1 diabetes mellitus
30: b-cell acute lymphoblastic leukemia, vitiligo
31: lymphoid leukemia
32: acute lymphoblastic leukemia, hyperbilirubinemia
33: b-cell acute lymphoblastic leukemia with t(1;19)(q23;p13.3); e2a-pbx1 (tcf3-pbx1)
34: childhood acute lymphoblastic leukemia, b-cell acute lymphoblastic leukemia with t(1;19)(q23;p13.3); e2a-pbx1 (tcf3-pbx1)
35: monocytic leukemia
36: childhood t acute lymphoblastic leukemia
37: acute lymphoblastic leukemia, neurotoxicity
38: acute lymphoblastic leukemia, acute myeloid leukemia, myelodysplastic syndrome
39: myeloid neoplasm
all_disease_terms
l2_all_disease_terms
<char>
1: leukemia
2: leukemia, multiple sclerosis
3: leukemia, rheumatoid arthritis
4: leukemia, systemic lupus erythematosus
5: leukemia, pancreatitis
6: leukemia
7: leukemia
8: leukemia
9: hodgkin lymphoma, leukemia, multiple myeloma
10: leukemia
11: leukemia, peripheral neuropathy
12: leukemia, non-hodgkin lymphoma
13: leukemia, non-hodgkin lymphoma, venous thromboembolism
14: leukemia
15: asthma, leukemia
16: asthma, leukemia
17: graves disease, leukemia
18: hashimotos thyroiditis, leukemia
19: hypothyroidism, leukemia
20: leukemia, primary biliary cirrhosis
21: leukemia, sclerosing cholangitis
22: inflammatory bowel disease, leukemia
23: crohns disease, leukemia
24: leukemia, ulcerative colitis
25: leukemia, rheumatoid arthritis
26: leukemia, multiple sclerosis
27: leukemia, systemic scleroderma
28: leukemia, systemic lupus erythematosus
29: leukemia, type 1 diabetes mellitus
30: leukemia, vitiligo
31: leukemia
32: hyperbilirubinemia, leukemia
33: leukemia
34: leukemia
35: leukemia
36: leukemia
37: leukemia, neurotoxicity
38: malignant neoplasm of bone and articular cartilage, leukemia, myelodysplastic syndrome
39: leukemia
l2_all_disease_terms
gwas_study_info =
gwas_study_info |>
mutate(l2_all_disease_terms =
case_when(
l2_all_disease_terms == "cancer" ~ "other malignant neoplasms",
TRUE ~ l2_all_disease_terms
)
)
gwas_study_info =
gwas_study_info |>
mutate(l2_all_disease_terms =
ifelse(PUBMED_ID == 27790247,
stringr::str_replace_all(l2_all_disease_terms,
pattern = ", cancer,",
# keep the ", " delimiters so neighbouring terms are not fused together
", other malignant neoplasms,"
),
l2_all_disease_terms
)
)
other_malignant_terms <- c(
"retroperitoneal cancer",
"peritoneal cancer",
"ewing sarcoma",
"digestive system cancer",
"intestinal cancer",
"small intestine cancer",
"female reproductive organ cancer",
"male reproductive organ cancer",
"vulvar cancer",
"testicular germ cell tumor",
"urogenital cancer",
"squamous cell cancer",
"head and neck cancer",
"malignant tumor of floor of mouth",
"nasal cavity cancer", # unsure whether this belongs in a different category
"malignant lymphoid tumor",
"neuroendocrine tumor",
"lymphatic system cancer",
"childhood cancer" # could perhaps be sorted further
)
gwas_study_info =
gwas_study_info |>
mutate(l2_all_disease_terms =
stringr::str_replace_all(l2_all_disease_terms ,
pattern = paste0(other_malignant_terms, collapse = "(?=,|$)|\\b"),
"other malignant neoplasms"
)
)
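The chunk above builds its pattern with `paste0(terms, collapse = "(?=,|$)|\\b")`. Because `collapse` only inserts text between elements, the first term gets no leading `\\b` and the last term gets no trailing lookahead. The following is a minimal sketch of one alternative, not the method used in this analysis. `build_term_pattern` is a hypothetical helper, the input vector is invented for illustration, and matching only complete entries of the comma-separated list is an assumption that is stricter than the original `\\b` behaviour. `str_escape()` requires stringr 1.5.0 or later.

```r
library(stringr)

# Hypothetical helper (not part of the original analysis): build one regex that
# matches any of `terms`, but only as a complete entry of a ", "-separated list.
build_term_pattern <- function(terms) {
  escaped <- str_escape(unique(terms))  # escape regex metacharacters such as "(" or "."
  paste0(
    "(?:^|(?<=, ))",                    # entry starts the string or follows ", "
    "(?:", paste0(escaped, collapse = "|"), ")",
    "(?=,|$)"                           # entry ends at a comma or at the end of the string
  )
}

# Toy input, invented for illustration only
x <- c("dementia, stroke", "frontotemporal dementia", "stroke, dementia")
out <- str_replace_all(
  x,
  build_term_pattern(c("dementia")),
  "alzheimer's disease and other dementias"
)
print(out)
```

With this pattern every alternative is bounded the same way on both sides, so the order of the terms in the vector no longer matters, and a short term such as "dementia" is not rewritten inside a longer entry that is absent from the list.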
gwas_study_info |>
filter(grepl("other malignant neoplasms", l2_all_disease_terms)) |>
select(all_disease_terms, l2_all_disease_terms) |>
distinct()
all_disease_terms
<char>
1: squamous cell carcinoma
2: cancer
3: childhood cancer, cardiomyopathy
4: neuroendocrine neoplasm
5: small intestine neuroendocrine tumor
6: pancreatic neuroendocrine tumor
7: pulmonary neuroendocrine tumor
8: childhood cancer, cardiotoxicity
9: childhood cancer, obesity
10: testicular germ cell tumor
11: cutaneous squamous cell carcinoma
12: head and neck malignant neoplasia
13: ewing sarcoma
14: carcinoid tumor
15: head and neck squamous cell carcinoma, pain
16: digestive system carcinoma, chronic obstructive pulmonary disease
17: digestive system carcinoma
18: heart failure, diabetes mellitus, stroke, atrial fibrillation, coronary artery disease, cancer
19: stroke, atrial fibrillation, coronary artery disease, heart failure, diabetes mellitus, cancer
20: head and neck squamous cell carcinoma
21: childhood cancer, breast carcinoma
22: childhood cancer
23: childhood cancer, bone fracture
24: head and neck malignant neoplasia, osteoradionecrosis
25: head and neck carcinoma
26: digestive system cancer
27: reproductive system cancer
28: male reproductive organ cancer
29: childhood cancer, type 2 diabetes mellitus
30: lymphatic system cancer
31: malignant tumor of floor of mouth
32: nasal cavity cancer
33: small intestine cancer
34: retroperitoneal cancer, peritoneum cancer
35: cancer
36: female reproductive organ cancer
37: small intestine carcinoma
38: childhood cancer, hearing loss, ototoxicity
39: childhood cancer, hearing loss
40: intestinal cancer
41: vulvar neoplasm
42: vulvar carcinoma
43: in situ carcinoma
44: carcinoma
45: lymphoid neoplasm
46: head and neck carcinoma, lung cancer
47: urogenital neoplasm
48: head and neck malignant neoplasia, radiation-induced disorder
49: head and neck malignant neoplasia, neuropathic pain
50: head and neck carcinoma, mucositis
51: head and neck carcinoma, fibrosis
52: head and neck carcinoma, mucositis, dysphagia
53: head and neck carcinoma, fibrosis, dysphagia, xerostomia
54: head and neck carcinoma, fibrosis, mucositis, dysphagia, xerostomia
all_disease_terms
l2_all_disease_terms
<char>
1: other malignant neoplasms
2: other malignant neoplasms
3: cardiomyopathy, other malignant neoplasms
4: other malignant neoplasms
5: other malignant neoplasms
6: other malignant neoplasms
7: other malignant neoplasms
8: cardiotoxicity, other malignant neoplasms
9: other malignant neoplasms, obesity
10: other malignant neoplasms
11: other malignant neoplasms
12: other malignant neoplasms
13: other malignant neoplasms
14: other malignant neoplasms
15: other malignant neoplasms, pain
16: chronic obstructive pulmonary disease, other malignant neoplasms
17: other malignant neoplasms
18: atrial fibrillation, other malignant neoplasms, coronary artery disease, diabetes mellitus, heart failure, stroke
19: atrial fibrillation, other malignant neoplasms, coronary artery disease, diabetes mellitus, heart failure, stroke
20: other malignant neoplasms
21: breast cancer, other malignant neoplasms
22: other malignant neoplasms
23: bone fracture, other malignant neoplasms
24: other malignant neoplasms, osteoradionecrosis
25: other malignant neoplasms
26: other malignant neoplasms
27: other malignant neoplasms
28: other malignant neoplasms
29: other malignant neoplasms, type 2 diabetes mellitus
30: other malignant neoplasms
31: other malignant neoplasms
32: other malignant neoplasms
33: other malignant neoplasms
34: other malignant neoplasms, other malignant neoplasms
35: other malignant neoplasms, other malignant neoplasms
36: other malignant neoplasms
37: other malignant neoplasms
38: other malignant neoplasms, hearing loss, ototoxicity
39: other malignant neoplasms, hearing loss
40: other malignant neoplasms
41: other malignant neoplasms
42: other malignant neoplasms
43: other malignant neoplasms
44: other malignant neoplasms
45: other malignant neoplasms
46: other malignant neoplasms, tracheal bronchus and lung cancer
47: other malignant neoplasms
48: other malignant neoplasms, radiation-induced disorder
49: other malignant neoplasms, neuropathic pain
50: other malignant neoplasms, mucositis
51: fibrosis, other malignant neoplasms
52: dysphagia, other malignant neoplasms, mucositis
53: dysphagia, fibrosis, other malignant neoplasms, xerostomia
54: dysphagia, fibrosis, other malignant neoplasms, mucositis, xerostomia
l2_all_disease_terms
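Rows 34 and 35 of the output above read "other malignant neoplasms, other malignant neoplasms": when two source terms collapse into the same category, the category is listed twice. The following is a minimal sketch of removing such repeats within each comma-separated string. `dedupe_terms` is a hypothetical helper that is not part of the original analysis, and the input vector is invented for illustration.

```r
# Hypothetical helper (not part of the original analysis): drop repeated
# entries inside each ", "-separated string, keeping first-seen order.
dedupe_terms <- function(x) {
  vapply(strsplit(x, ",\\s*"), function(parts) {
    # strsplit() returns a single NA for a missing input; pass it through
    if (length(parts) == 1 && is.na(parts)) return(NA_character_)
    paste(unique(trimws(parts)), collapse = ", ")
  }, character(1))
}

# Toy input, invented for illustration only
x <- c("other malignant neoplasms, other malignant neoplasms",
       "breast cancer, other malignant neoplasms",
       NA)
deduped <- dedupe_terms(x)
print(deduped)
```

Applied inside `mutate(l2_all_disease_terms = dedupe_terms(l2_all_disease_terms))`, such a step would leave strings without repeats unchanged.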
gwas_study_info =
gwas_study_info |>
mutate(l2_all_disease_terms =
case_when(
l2_all_disease_terms == "benign neoplasm" ~ "other neoplasms",
TRUE ~ l2_all_disease_terms
)
)
unknown_sig_terms <- c("intracranial germ cell tumor",
"bladder tumor")
gwas_study_info =
gwas_study_info |>
mutate(l2_all_disease_terms =
stringr::str_replace_all(l2_all_disease_terms ,
pattern = paste0(unknown_sig_terms, collapse = "(?=,|$)|\\b"),
"other neoplasms"
)
)
gwas_study_info |>
filter(grepl("other neoplasms", l2_all_disease_terms)) |>
select(all_disease_terms, l2_all_disease_terms) |>
distinct()
all_disease_terms l2_all_disease_terms
<char> <char>
1: benign prostatic hyperplasia other neoplasms
2: colorectal adenoma other neoplasms
3: colorectal cancer, endometrial neoplasm other neoplasms
4: upper aerodigestive tract neoplasm other neoplasms
5: meningioma other neoplasms
6: pituitary gland adenoma other neoplasms
7: nasal cavity polyp other neoplasms
8: nasopharyngeal neoplasm, hearing loss other neoplasms
9: metachronous colorectal adenoma other neoplasms
10: testicular neoplasm, hearing loss other neoplasms
11: hematopoietic and lymphoid system neoplasm other neoplasms
12: benign lipomatous neoplasm other neoplasms
13: benign neoplasm other neoplasms
14: uterine neoplasm other neoplasms
15: benign neoplasm of eye other neoplasms
16: respiratory system neoplasm other neoplasms
17: secondary malignant neoplasm other neoplasms
18: uterine leiomyoma other neoplasms
19: benign colon neoplasm other neoplasms
20: benign digestive system neoplasm other neoplasms
21: benign chondrogenic neoplasm other neoplasms
22: skin lipoma other neoplasms
23: benign soft tissue neoplasm other neoplasms
24: benign neoplasm of skin other neoplasms
25: uterine benign neoplasm other neoplasms
26: benign ovarian neoplasm other neoplasms
27: female reproductive organ cancer other neoplasms
28: benign urinary system neoplasm other neoplasms
29: nervous system benign neoplasm other neoplasms
30: benign neoplasm of spinal cord other neoplasms
31: benign thyroid gland neoplasm other neoplasms
32: benign endocrine neoplasm other neoplasms
33: benign neoplasm of adrenal gland other neoplasms
34: benign neoplasm of parathyroid gland other neoplasms
35: benign neoplasm of pituitary gland other neoplasms
36: lymphangioma, hemangioma other neoplasms
37: malignant renal pelvis neoplasm other neoplasms
38: malignant urinary system neoplasm other neoplasms
39: rectosigmoid junction neoplasm other neoplasms
40: digestive system neoplasm other neoplasms
41: connective tissue neoplasm, bone neoplasm other neoplasms
42: connective tissue neoplasm other neoplasms
43: polyp other neoplasms
44: stomach polyp other neoplasms
45: polyp of large intestine other neoplasms
46: female genital tract polyp other neoplasms
47: uterine polyp other neoplasms
48: cervical polyp other neoplasms
49: breast benign neoplasm other neoplasms
50: pancreatic intraductal papillary-mucinous neoplasm other neoplasms
51: malignant laryngeal neoplasm other neoplasms
52: bone neoplasm other neoplasms
53: melanocytic nevus other neoplasms
54: urogenital neoplasm other neoplasms
55: intraductal breast neoplasm other neoplasms
56: connective and soft tissue neoplasm other neoplasms
57: uterine cervix neoplasm other neoplasms
58: malignant colon neoplasm other neoplasms
59: meningeal neoplasm other neoplasms
60: benign brain neoplasm other neoplasms
61: metastatic malignant neoplasm other neoplasms
62: lobular capilliary hemangioma other neoplasms
63: lymph node neoplasm other neoplasms
64: aldosterone-producing adenoma other neoplasms
65: schwannoma other neoplasms
66: polyp of colon other neoplasms
67: hemangioma of subcutaneous tissue other neoplasms
68: adenomatous colon polyp other neoplasms
69: anal polyp other neoplasms
70: bladder tumor other neoplasms
71: urinary system neoplasm other neoplasms
72: benign neoplasm of stomach other neoplasms
73: polyp of gallbladder other neoplasms
74: gallbladder neoplasm other neoplasms
75: pancreatic neoplasm other neoplasms
76: hepatic hemangioma other neoplasms
77: intracranial germ cell tumor other neoplasms
78: vestibular schwannoma other neoplasms
79: hemangioma other neoplasms
80: polyp of rectum other neoplasms
all_disease_terms l2_all_disease_terms
gwas_study_info |>
filter(grepl("rheumatic heart disease", l2_all_disease_terms)) |>
select(all_disease_terms, l2_all_disease_terms) |>
distinct()
   all_disease_terms        l2_all_disease_terms
   <char>                   <char>
1: rheumatic heart disease  rheumatic heart disease
gwas_study_info =
gwas_study_info |>
mutate(l2_all_disease_terms =
stringr::str_replace_all(l2_all_disease_terms ,
pattern = "\\bcardiomyopathy\\b|\\bmyocarditis\\b",
"cardiomyopathy and myocarditis"
)
)
gwas_study_info |>
filter(grepl("cardiomyopathy and myocarditis", l2_all_disease_terms)) |>
select(all_disease_terms, l2_all_disease_terms) |>
distinct()
all_disease_terms
<char>
1: idiopathic dilated cardiomyopathy
2: childhood cancer, cardiomyopathy
3: hypertrophic cardiomyopathy
4: dilated cardiomyopathy
5: chagas cardiomyopathy
6: peripartum cardiomyopathy
7: takotsubo cardiomyopathy
8: nonischemic cardiomyopathy
9: schizophrenia, myocarditis
10: cardiomyopathy
11: myocarditis
12: chagas disease, chagas cardiomyopathy
13: rheumatoid arthritis, ischemic cardiomyopathy
l2_all_disease_terms
<char>
1: idiopathic cardiomyopathy and myocarditis
2: cardiomyopathy and myocarditis, other malignant neoplasms
3: cardiomyopathy and myocarditis
4: cardiomyopathy and myocarditis
5: cardiomyopathy and myocarditis
6: cardiomyopathy and myocarditis
7: cardiomyopathy and myocarditis
8: cardiomyopathy and myocarditis
9: cardiomyopathy and myocarditis, schizophrenia
10: cardiomyopathy and myocarditis
11: cardiomyopathy and myocarditis
12: cardiomyopathy and myocarditis, chagas disease
13: cardiomyopathy and myocarditis, rheumatoid arthritis
afib_terms <- c("atrial fibrillation",
"atrial flutter",
"post-operative atrial fibrillation")
gwas_study_info =
gwas_study_info |>
mutate(l2_all_disease_terms =
stringr::str_replace_all(l2_all_disease_terms ,
pattern = paste0(afib_terms, collapse = "(?=,|$)|\\b"),
"atrial fibrillation and flutter"
)
)
url <- "http://www.ebi.ac.uk/ols4/api/ontologies/cvdo/terms/http%253A%252F%252Fpurl.obolibrary.org%252Fobo%252FDOID_3627/descendants"
aortic_aneurysm_terms <- get_descendants(url)
[1] "Number of terms collected:"
[1] 6
[1] "\n Some example terms"
[1] "ruptured thoracoabdominal aortic aneurysm"
[2] "ruptured abdominal aortic aneurysm"
[3] "ruptured thoracic aortic aneurysm"
[4] "abdominal aortic aneurysm"
[5] "ruptured aortic aneurysm"
gwas_study_info = gwas_study_info |>
mutate(l2_all_disease_terms =
stringr::str_replace_all(l2_all_disease_terms ,
pattern = paste0(aortic_aneurysm_terms, collapse = "(?=,|$)|\\b"),
"aortic aneurysm"
)
)
gwas_study_info |>
filter(grepl("endocarditis", l2_all_disease_terms)) |>
select(all_disease_terms, l2_all_disease_terms) |>
distinct()
all_disease_terms
<char>
1: staphylococcus aureus infection, bacterial endocarditis
2: endocarditis
l2_all_disease_terms
<char>
1: endocarditis, staphylococcus aureus infection
2: endocarditis
gwas_study_info = gwas_study_info |>
mutate(l2_all_disease_terms =
stringr::str_replace_all(l2_all_disease_terms ,
pattern = "\\bacne\\b(?! vulgaris)",
"acne vulgaris"
)
)
gwas_study_info = gwas_study_info |>
mutate(l2_all_disease_terms =
stringr::str_replace_all(l2_all_disease_terms ,
pattern = "adhd",
"attention-deficit/hyperactivity disorder"
)
)
gwas_study_info =
gwas_study_info |>
mutate(l2_all_disease_terms =
stringr::str_replace_all(l2_all_disease_terms ,
pattern = "alcohol-related disorders|alcohol and nicotine codependence",
"alcohol use disorders"
)
)
gwas_study_info =
gwas_study_info |>
mutate(l2_all_disease_terms =
stringr::str_replace_all(l2_all_disease_terms ,
pattern = "\\balcohol use disorder\\b",
"alcohol use disorders"
)
)
dementia <- c("alzheimers disease biomarker measurement",
"alzheimers disease neuropathologic change",
"aids dementia",
"dementia",
"frontotemporal dementia",
"lewy body dementia",
"vascular dementia",
"alzheimers disease"
)
gwas_study_info =
gwas_study_info |>
mutate(l2_all_disease_terms =
stringr::str_replace_all(l2_all_disease_terms ,
pattern = paste0(dementia, collapse = "(?=,|$)|\\b"),
"alzheimer's disease and other dementias"
)
)
url <- "http://www.ebi.ac.uk/ols4/api/ontologies/mesh/terms/http%253A%252F%252Fid.nlm.nih.gov%252Fmesh%252FD001008/descendants"
anxiety_terms <- get_descendants(url)
[1] "Number of terms collected:"
[1] 15
[1] "\n Some example terms"
[1] "obsessive-compulsive disorder" "generalized anxiety disorder"
[3] "neurocirculatory asthenia" "excoriation disorder"
[5] "anxiety, separation"
anxiety_terms <- c(anxiety_terms, "obsessive-compulsive symptom measurement")
gwas_study_info =
gwas_study_info |>
mutate(l2_all_disease_terms =
stringr::str_replace_all(l2_all_disease_terms ,
pattern = paste0(anxiety_terms, collapse = "(?=,|$)|\\b"),
"anxiety disorders"
)
) |>
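mutate(l2_all_disease_terms =
stringr::str_replace_all(l2_all_disease_terms ,
# word boundaries stop this from re-matching "anxiety disorders"
pattern = "\\banxiety disorder\\b|\\banxiety measurement\\b",
"anxiety disorders"
)
) |>
mutate(l2_all_disease_terms =
stringr::str_replace_all(l2_all_disease_terms ,
# skip occurrences that were already normalised
pattern = "\\banxiety\\b(?! disorders)",
"anxiety disorders"
)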
)
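Chained replacements on the same column are order-sensitive, because a later pattern can match text written by an earlier one. An unanchored `"anxiety disorder"` pattern, for example, also matches inside `"anxiety disorders"`. The following is a minimal sketch of a single-pass alternative with an idempotence check. `normalise_anxiety` is a hypothetical function that is not part of the original analysis, and the input vector is invented for illustration.

```r
library(stringr)

# Hypothetical single-pass normaliser (an assumption, not the original code):
# "anxiety", "anxiety disorder(s)" and "anxiety measurement" all map to the
# same label, and the trailing \\b prevents partial matches.
normalise_anxiety <- function(x) {
  str_replace_all(
    x,
    "\\banxiety(?: disorders?| measurement)?\\b",
    "anxiety disorders"
  )
}

# Toy input, invented for illustration only
x <- c("anxiety", "anxiety disorder", "anxiety disorders",
       "anxiety measurement, asthma")

once  <- normalise_anxiety(x)
twice <- normalise_anxiety(once)

# A safe normaliser gives the same result when applied a second time
stopifnot(identical(once, twice))
print(once)
```

Running a normaliser twice and comparing the results is a cheap way to catch patterns that re-match their own replacement text.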
gwas_study_info =
gwas_study_info |>
mutate(l2_all_disease_terms =
stringr::str_replace_all(l2_all_disease_terms ,
pattern = "methamphetamine",
"amphetamine"
)
)
gwas_study_info = gwas_study_info |>
mutate(l2_all_disease_terms =
stringr::str_replace_all(l2_all_disease_terms,
pattern = "\\bautism(?: spectrum disorders?)?\\b",
"autism spectrum disorders"
)
)
vision_loss_terms <- c("blindness",
"color vision disorder",
"vision disorder",
"myopia",
"refractive error",
"hyperopia",
"astigmatism",
"corneal astigmatism",
"presbyopia",
"anisometropia",
"esotropia",
"non-accomodative esotropia",
"accommodative esotropia")
gwas_study_info = gwas_study_info |>
mutate(l2_all_disease_terms =
stringr::str_replace_all(l2_all_disease_terms ,
pattern = paste0(vision_loss_terms, collapse = "(?=,|$)|\\b"),
"blindness and vision loss"
)
)
gwas_study_info =
gwas_study_info |>
mutate(l2_all_disease_terms =
stringr::str_replace_all(l2_all_disease_terms ,
pattern = "cannabis dependence",
"cannabis use disorders"
)
)
url <- "http://www.ebi.ac.uk/ols4/api/ontologies/snomed/terms/http%253A%252F%252Fsnomed.info%252Fid%252F328383001/descendants"
chronic_liver_disease_terms <- get_descendants(url)
[1] "Number of terms collected:"
[1] 114
[1] "\n Some example terms"
[1] "hepatic ascites co-occurrent with chronic active hepatitis due to toxic liver disease"
[2] "cirrhosis of liver co-occurrent and due to primary sclerosing cholangitis (disorder)"
[3] "chronic hepatitis c co-occurrent with human immunodeficiency virus infection"
[4] "primary biliary cirrhosis co-occurrent with systemic scleroderma (disorder)"
[5] "pulmonary fibrosis, hepatic hyperplasia, bone marrow hypoplasia syndrome"
chronic_liver_disease_terms <- c("primary biliary cirrhosis",
"alcoholic liver cirrhosis",
"chronic hepatitis b virus infection", # lower case, to match the lower-cased term strings
"acute-on-chronic liver failure",
chronic_liver_disease_terms)
gwas_study_info = gwas_study_info |>
mutate(l2_all_disease_terms =
stringr::str_replace_all(l2_all_disease_terms,
pattern = paste0("\\b", chronic_liver_disease_terms, "(?=,|$)", collapse = "|"),
"cirrhosis and other chronic liver diseases"
)
)
gwas_study_info = gwas_study_info |>
mutate(l2_all_disease_terms =
stringr::str_replace_all(l2_all_disease_terms,
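# the lookahead stops this from re-matching inside "cirrhosis and other chronic liver diseases"
pattern = "\\bliver disease(?=,|$)",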
"cirrhosis and other chronic liver diseases"
)
)
gwas_study_info = gwas_study_info |>
mutate(l2_all_disease_terms =
stringr::str_replace_all(l2_all_disease_terms ,
pattern = "cocaine-related disorders",
"cocaine use disorders"
)
)
gwas_study_info = gwas_study_info |>
mutate(l2_all_disease_terms =
stringr::str_replace_all(l2_all_disease_terms ,
pattern = "depressive symptom measurement|major depressive disorder",
"depressive disorders"
)
) |>
mutate(l2_all_disease_terms =
stringr::str_replace_all(l2_all_disease_terms ,
pattern = "depressive disorder",
"depressive disorders"
)
)
gal_bile_terms = c("gallbladder disease",
"bile duct disorder",
"biliary tract disease")
gwas_study_info = gwas_study_info |>
mutate(l2_all_disease_terms =
stringr::str_replace_all(l2_all_disease_terms,
pattern = paste0("\\b", gal_bile_terms, "(?=,|$)", collapse = "|"),
"gallbladder and biliary diseases"
)
)
diseases <- stringr::str_split(pattern = ", ",
gwas_study_info$l2_all_disease_terms) |>
unlist() |>
stringr::str_trim()
pregnancy_terms <- grep("pregnancy", diseases, value = TRUE)
gyno_terms <- c("endometriosis","placenta disease", pregnancy_terms)
gwas_study_info = gwas_study_info |>
mutate(l2_all_disease_terms =
stringr::str_replace_all(l2_all_disease_terms,
pattern = paste0("\\b", gyno_terms, "(?=,|$)", collapse = "|"),
"gynecological diseases"
)
)
gwas_study_info = gwas_study_info |>
mutate(l2_all_disease_terms =
stringr::str_replace_all(l2_all_disease_terms ,
pattern = "bulimia nervosa|anorexia nervosa|binge eating|eating disorder",
"eating disorders"
)
) |>
mutate(l2_all_disease_terms =
stringr::str_replace_all(l2_all_disease_terms ,
pattern = "anorexia",
"eating disorders"
)
)
gwas_study_info = gwas_study_info |>
mutate(l2_all_disease_terms =
stringr::str_replace_all(l2_all_disease_terms,
pattern = "headache disorder|cluster headache|migraine",
"headache disorders"
)
)
Ischemic heart disease is treated as equivalent to coronary artery disease (https://www.ncbi.nlm.nih.gov/books/NBK209964/).
gwas_study_info = gwas_study_info |>
mutate(l2_all_disease_terms =
stringr::str_replace_all(l2_all_disease_terms,
pattern = "coronary artery disease",
"ischemic heart disease"
)
)
gwas_study_info =
gwas_study_info |>
mutate(l2_all_disease_terms =
stringr::str_replace_all(l2_all_disease_terms ,
pattern = "opioid dependence|opioid use disorder",
"opioid use disorders"
)
)
gwas_study_info =
gwas_study_info |>
mutate(l2_all_disease_terms =
stringr::str_replace_all(l2_all_disease_terms ,
pattern = "parkinsons disease",
"parkinson's disease"
)
)
url <- "http://www.ebi.ac.uk/ols4/api/ontologies/mondo/terms/http%253A%252F%252Fpurl.obolibrary.org%252Fobo%252FMONDO_0002028/descendants"
personality_disorders <- get_descendants(url)
[1] "Number of terms collected:"
[1] 10
[1] "\n Some example terms"
[1] "obsessive-compulsive personality disorder"
[2] "narcissistic personality disorder"
[3] "schizotypal personality disorder"
[4] "histrionic personality disorder"
[5] "antisocial personality disorder"
gwas_study_info = gwas_study_info |>
mutate(l2_all_disease_terms =
stringr::str_replace_all(l2_all_disease_terms,
pattern = paste0("\\b", personality_disorders, "(?=,|$)", collapse = "|"),
"personality disorders"
)
)
gwas_study_info =
gwas_study_info |>
mutate(l2_all_disease_terms =
stringr::str_replace_all(l2_all_disease_terms ,
pattern = "heroin dependence|drug dependence|nicotine dependence|substance abuse|drug misuse|alcohol use disorders delirium",
"other drug use disorders"
)
)
url <- "http://www.ebi.ac.uk/ols4/api/ontologies/doid/terms/http%253A%252F%252Fpurl.obolibrary.org%252Fobo%252FDOID_535/descendants"
sleep_disorders <- get_descendants(url)
[1] "Number of terms collected:"
[1] 16
[1] "\n Some example terms"
[1] "periodic limb movement disorder" "advanced sleep phase syndrome 3"
[3] "advanced sleep phase syndrome 2" "advanced sleep phase syndrome 1"
[5] "advanced sleep phase syndrome 4"
url <- "http://www.ebi.ac.uk/ols4/api/ontologies/efo/terms/http%253A%252F%252Fwww.ebi.ac.uk%252Fefo%252FEFO_0008568/descendants"
other_sleep_disorders <- get_descendants(url)
[1] "Number of terms collected:"
[1] 26
[1] "\n Some example terms"
[1] "autosomal dominant cerebellar ataxia, deafness and narcolepsy"
[2] "hereditary sensory neuropathy-deafness-dementia syndrome"
[3] "rapid eye movement sleep disorder"
[4] "substance-induced sleep disorder"
[5] "drug induced central sleep apnea"
sleep_disorders <- c(sleep_disorders,
other_sleep_disorders)
gwas_study_info = gwas_study_info |>
mutate(l2_all_disease_terms =
stringr::str_replace_all(l2_all_disease_terms,
pattern = paste0("\\b", sleep_disorders, "(?=,|$)", collapse = "|"),
"sleep disorders"
)
)
and remove Alzheimer's disease, Parkinson's disease, and dementia
other_mental_disorders <- c("schizophrenia",
"manic or hypomanic episode",
"mental or behavioural disorder",
"mental disorder"
)
other_neuro <- c("mild neurocognitive disorder",
"hiv-associated neurocognitive disorder")
disturbances of sensation of smell and taste
gwas_study_info =
gwas_study_info |>
mutate(l2_all_disease_terms =
stringr::str_replace_all(l2_all_disease_terms ,
pattern = "disorderss",
"disorders"
)
)
gwas_study_info =
gwas_study_info |>
mutate(l2_all_disease_terms =
stringr::str_replace_all(l2_all_disease_terms ,
# "disorderss" was already collapsed by the previous step, so match the doubled suffix instead
pattern = "anxiety disorders disorders",
"anxiety disorders"
)
)
gbd_data <- data.table::fread(here::here("data/gbd/ihme_gbd_2019_global_disease_burden_rate_all_ages.csv"))
gbd_data$cause <- stringr::str_remove_all(gbd_data$cause, ",")
diseases <- stringr::str_split(pattern = ", ",
gwas_study_info$l2_all_disease_terms[gwas_study_info$l2_all_disease_terms != ""]) |>
unlist() |>
stringr::str_trim()
gbd_data$cause[!tolower(gbd_data$cause) %in% unique(diseases)] |> sort()
[1] "Acute glomerulonephritis"
[2] "Acute glomerulonephritis"
[3] "Acute glomerulonephritis"
[4] "Bacterial skin diseases"
[5] "Bacterial skin diseases"
[6] "Bacterial skin diseases"
[7] "Chronic kidney disease"
[8] "Chronic kidney disease"
[9] "Chronic kidney disease"
[10] "Congenital birth defects"
[11] "Congenital birth defects"
[12] "Congenital birth defects"
[13] "Diabetes mellitus type 1"
[14] "Diabetes mellitus type 1"
[15] "Diabetes mellitus type 1"
[16] "Diabetes mellitus type 2"
[17] "Diabetes mellitus type 2"
[18] "Diabetes mellitus type 2"
[19] "Drug use disorders"
[20] "Drug use disorders"
[21] "Drug use disorders"
[22] "Endocrine metabolic blood and immune disorders"
[23] "Endocrine metabolic blood and immune disorders"
[24] "Endocrine metabolic blood and immune disorders"
[25] "Fungal skin diseases"
[26] "Fungal skin diseases"
[27] "Fungal skin diseases"
[28] "Hemoglobinopathies and hemolytic anemias"
[29] "Hemoglobinopathies and hemolytic anemias"
[30] "Hemoglobinopathies and hemolytic anemias"
[31] "Idiopathic developmental intellectual disability"
[32] "Idiopathic developmental intellectual disability"
[33] "Idiopathic epilepsy"
[34] "Idiopathic epilepsy"
[35] "Idiopathic epilepsy"
[36] "Inguinal femoral and abdominal hernia"
[37] "Inguinal femoral and abdominal hernia"
[38] "Inguinal femoral and abdominal hernia"
[39] "Interstitial lung disease and pulmonary sarcoidosis"
[40] "Interstitial lung disease and pulmonary sarcoidosis"
[41] "Interstitial lung disease and pulmonary sarcoidosis"
[42] "Lower extremity peripheral arterial disease"
[43] "Lower extremity peripheral arterial disease"
[44] "Lower extremity peripheral arterial disease"
[45] "Neuroblastoma and other peripheral nervous cell tumors"
[46] "Neuroblastoma and other peripheral nervous cell tumors"
[47] "Neuroblastoma and other peripheral nervous cell tumors"
[48] "Non-rheumatic valvular heart disease"
[49] "Non-rheumatic valvular heart disease"
[50] "Non-rheumatic valvular heart disease"
[51] "Oral disorders"
[52] "Oral disorders"
[53] "Oral disorders"
[54] "Other cardiovascular and circulatory diseases"
[55] "Other cardiovascular and circulatory diseases"
[56] "Other chronic respiratory diseases"
[57] "Other digestive diseases"
[58] "Other mental disorders"
[59] "Other mental disorders"
[60] "Other mental disorders"
[61] "Other musculoskeletal disorders"
[62] "Other musculoskeletal disorders"
[63] "Other neurological disorders"
[64] "Other neurological disorders"
[65] "Other neurological disorders"
[66] "Other sense organ diseases"
[67] "Other sense organ diseases"
[68] "Other sense organ diseases"
[69] "Other skin and subcutaneous diseases"
[70] "Other skin and subcutaneous diseases"
[71] "Other skin and subcutaneous diseases"
[72] "Paralytic ileus and intestinal obstruction"
[73] "Paralytic ileus and intestinal obstruction"
[74] "Paralytic ileus and intestinal obstruction"
[75] "Pulmonary Arterial Hypertension"
[76] "Pulmonary Arterial Hypertension"
[77] "Pulmonary Arterial Hypertension"
[78] "Scabies"
[79] "Scabies"
[80] "Scabies"
[81] "Sudden infant death syndrome"
[82] "Total burden related to Non-alcoholic fatty liver disease (NAFLD)"
[83] "Total burden related to Non-alcoholic fatty liver disease (NAFLD)"
[84] "Total burden related to Non-alcoholic fatty liver disease (NAFLD)"
[85] "Upper digestive system diseases"
[86] "Upper digestive system diseases"
[87] "Upper digestive system diseases"
[88] "Urinary diseases and male infertility"
[89] "Urinary diseases and male infertility"
[90] "Urinary diseases and male infertility"
[91] "Vascular intestinal disorders"
[92] "Vascular intestinal disorders"
[93] "Vascular intestinal disorders"
[94] "Viral skin diseases"
[95] "Viral skin diseases"
[96] "Viral skin diseases"
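Each unmatched cause above appears up to three times because `gbd_data` holds one row per measure. A compact way to list each unmatched cause once is sketched here in base R; `unmatched_causes` is a hypothetical helper, and lower-casing both sides is an assumption that mirrors the `tolower()` call in the check above.

```r
# Hypothetical helper, not part of the original analysis: GBD causes with no
# GWAS disease term of the same name, each reported once.
unmatched_causes <- function(gbd_causes, gwas_terms) {
  sort(setdiff(unique(tolower(gbd_causes)), unique(tolower(gwas_terms))))
}

unmatched_causes(c("Scabies", "Scabies", "Asthma"), c("asthma", "cleft lip"))
```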
gbd_data =
gbd_data |>
mutate(cause = tolower(cause))
gwas_disease_traits = data.frame(cause = diseases)
# gwas_study_info |>
# filter(DISEASE_STUDY == T) |>
# select(all_disease_terms, l1_all_disease_terms, cause = l2_all_disease_terms) |>
# distinct()
left_join(gwas_disease_traits,
gbd_data) |>
head()
Joining with `by = join_by(cause)`
Warning in left_join(gwas_disease_traits, gbd_data): Detected an unexpected many-to-many relationship between `x` and `y`.
ℹ Row 3 of `x` matches multiple rows in `y`.
ℹ Row 19 of `y` matches multiple rows in `x`.
ℹ If a many-to-many relationship is expected, set `relationship =
"many-to-many"` to silence this warning.
cause
1 idiopathic cardiomyopathy and myocarditis
2 cleft lip
3 tracheal bronchus and lung cancer
4 tracheal bronchus and lung cancer
5 tracheal bronchus and lung cancer
6 tracheal bronchus and lung cancer
measure location sex age metric year
1 <NA> <NA> <NA> <NA> <NA> NA
2 <NA> <NA> <NA> <NA> <NA> NA
3 DALYs (Disability-Adjusted Life Years) Global Both All ages Rate 2019
4 Prevalence Global Both All ages Rate 2019
5 Incidence Global Both All ages Rate 2019
6 DALYs (Disability-Adjusted Life Years) Global Both All ages Rate 2019
val upper lower
1 NA NA NA
2 NA NA NA
3 580.36100 627.79984 532.74652
4 40.27440 43.51721 37.12978
5 28.16826 30.49575 25.77712
6 580.36100 627.79984 532.74652
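The many-to-many warning arises because `diseases` repeats a term once per study while `gbd_data` has several measures per cause. A minimal sketch of one way to avoid the duplication is shown below with toy data in base R; the values are copied from the output above, and de-duplicating the GWAS side before joining is an assumption about the intended result.

```r
# Toy stand-ins for the objects in the analysis (values taken from the output above).
gwas_disease_traits <- data.frame(
  cause = c("cleft lip",
            "tracheal bronchus and lung cancer",
            "tracheal bronchus and lung cancer")
)
gbd_data <- data.frame(
  cause   = rep("tracheal bronchus and lung cancer", 3),
  measure = c("DALYs (Disability-Adjusted Life Years)", "Prevalence", "Incidence"),
  val     = c(580.36100, 40.27440, 28.16826)
)

# Keep one row per distinct GWAS term so each GBD row is matched at most once.
gwas_unique <- unique(gwas_disease_traits)

# Left join on an explicit key; unmatched terms keep NA in the GBD columns.
joined <- merge(gwas_unique, gbd_data, by = "cause", all.x = TRUE)
joined
```

With dplyr 1.1.0 or later, the same intent can be declared by calling `distinct()` first and passing `by = "cause"` and `relationship = "one-to-many"` to `left_join()`.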
gwas_study_info |> select(cause = l2_all_disease_terms) |>
distinct() |>
left_join(gbd_data) |>
head()
Joining with `by = join_by(cause)`
cause
<char>
1:
2: idiopathic cardiomyopathy and myocarditis
3: cleft lip
4: tracheal bronchus and lung cancer
5: tracheal bronchus and lung cancer
6: tracheal bronchus and lung cancer
measure location sex age metric year
<char> <char> <char> <char> <char> <int>
1: <NA> <NA> <NA> <NA> <NA> NA
2: <NA> <NA> <NA> <NA> <NA> NA
3: <NA> <NA> <NA> <NA> <NA> NA
4: DALYs (Disability-Adjusted Life Years) Global Both All ages Rate 2019
5: Prevalence Global Both All ages Rate 2019
6: Incidence Global Both All ages Rate 2019
val upper lower
<num> <num> <num>
1: NA NA NA
2: NA NA NA
3: NA NA NA
4: 580.36100 627.79984 532.74652
5: 40.27440 43.51721 37.12978
6: 28.16826 30.49575 25.77712
diseases <- stringr::str_split(pattern = ", ",
gwas_study_info$l2_all_disease_terms[gwas_study_info$l2_all_disease_terms != ""]) |>
unlist() |>
stringr::str_trim()
length(unique(diseases))
[1] 1598
# make frequency table
freq <- table(as.factor(diseases))
# sort in decreasing order
freq_sorted <- sort(freq, decreasing = TRUE)
# show top N, e.g. top 10
head(freq_sorted, 10)
kidney disease hypertension
10915 7096
type 2 diabetes mellitus other neoplasms
922 537
depressive disorders alzheimer's disease and other dementias
513 509
ischemic heart disease breast cancer
499 379
schizophrenia asthma
368 348
# fwrite() returns NULL invisibly, so its result is not assigned back to gwas_study_info
data.table::fwrite(gwas_study_info,
here::here("output/gwas_cat/gwas_study_info_trait_group_l2.csv"))
sessionInfo()
R version 4.3.1 (2023-06-16)
Platform: aarch64-apple-darwin20 (64-bit)
Running under: macOS 15.6.1
Matrix products: default
BLAS: /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib/libRlapack.dylib; LAPACK version 3.11.0
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
time zone: America/Los_Angeles
tzcode source: internal
attached base packages:
[1] stats graphics grDevices datasets utils methods base
other attached packages:
[1] jsonlite_2.0.0 httr_1.4.7 stringr_1.5.1 ggplot2_3.5.2
[5] data.table_1.17.8 dplyr_1.1.4 workflowr_1.7.1
loaded via a namespace (and not attached):
[1] gtable_0.3.6 compiler_4.3.1 renv_1.0.3 promises_1.3.3
[5] tidyselect_1.2.1 Rcpp_1.1.0 git2r_0.36.2 callr_3.7.6
[9] later_1.4.2 jquerylib_0.1.4 scales_1.4.0 yaml_2.3.10
[13] fastmap_1.2.0 here_1.0.1 R6_2.6.1 generics_0.1.4
[17] curl_6.4.0 knitr_1.50 tibble_3.3.0 rprojroot_2.1.0
[21] RColorBrewer_1.1-3 bslib_0.9.0 pillar_1.11.0 rlang_1.1.6
[25] cachem_1.1.0 stringi_1.8.7 httpuv_1.6.16 xfun_0.52
[29] getPass_0.2-4 fs_1.6.6 sass_0.4.10 cli_3.6.5
[33] withr_3.0.2 magrittr_2.0.3 ps_1.9.1 grid_4.3.1
[37] digest_0.6.37 processx_3.8.6 rstudioapi_0.17.1 lifecycle_1.0.4
[41] vctrs_0.6.5 evaluate_1.0.4 glue_1.8.0 farver_2.1.2
[45] whisker_0.4.1 rmarkdown_2.29 tools_4.3.1 pkgconfig_2.0.3
[49] htmltools_0.5.8.1