Last updated: 2025-09-16

Checks: 7 passed, 0 failed

Knit directory: genomics_ancest_disease_dispar/

This reproducible R Markdown analysis was created with workflowr (version 1.7.1). The Checks tab describes the reproducibility checks that were applied when the results were created. The Past versions tab lists the development history.


Great! Since the R Markdown file has been committed to the Git repository, you know the exact version of the code that produced these results.

Great job! The global environment was empty. Objects defined in the global environment can affect the analysis in your R Markdown file in unknown ways. For reproducibility it’s best to always run the code in an empty environment.

The command set.seed(20220216) was run prior to running the code in the R Markdown file. Setting a seed ensures that any results that rely on randomness, e.g. subsampling or permutations, are reproducible.

Great job! Recording the operating system, R version, and package versions is critical for reproducibility.

Nice! There were no cached chunks for this analysis, so you can be confident that you successfully produced the results during this run.

Great job! Using relative paths to the files within your workflowr project makes it easier to run your code on other machines.

Great! You are using Git for version control. Tracking code development and connecting the code version to the results is critical for reproducibility.

The results in this page were generated with repository version 7fa03f5. See the Past versions tab to see a history of the changes made to the R Markdown and HTML files.

Note that you need to be careful to ensure that all relevant files for the analysis have been committed to Git prior to generating the results (you can use wflow_publish or wflow_git_commit). workflowr only checks the R Markdown file, but you know if there are other scripts or data files that it depends on. Below is the status of the Git repository when the results were generated:


Ignored files:
    Ignored:    .DS_Store
    Ignored:    .Rproj.user/
    Ignored:    data/.DS_Store
    Ignored:    data/gbd/.DS_Store
    Ignored:    data/gbd/ihme_gbd_2019_global_disease_burden_rate_all_ages.csv
    Ignored:    data/gbd/ihme_gbd_2019_global_paf_rate_percent_all_ages.csv
    Ignored:    data/gbd/ihme_gbd_2021_global_disease_burden_rate_all_ages.csv
    Ignored:    data/gbd/ihme_gbd_2021_global_paf_rate_percent_all_ages.csv
    Ignored:    data/gwas_catalog/
    Ignored:    data/who/
    Ignored:    output/gwas_cat/
    Ignored:    output/gwas_study_info_cohort_corrected.csv
    Ignored:    output/gwas_study_info_trait_corrected.csv
    Ignored:    output/gwas_study_info_trait_ontology_info.csv
    Ignored:    output/gwas_study_info_trait_ontology_info_l1.csv
    Ignored:    output/gwas_study_info_trait_ontology_info_l2.csv
    Ignored:    output/trait_ontology/
    Ignored:    renv/

Unstaged changes:
    Modified:   code/get_term_descendants.R

Note that any generated files, e.g. HTML, png, CSS, etc., are not included in this status report because it is ok for generated content to have uncommitted changes.


These are the previous versions of the repository in which changes were made to the R Markdown (analysis/level_2_disease_group.Rmd) and HTML (docs/level_2_disease_group.html) files. If you’ve configured a remote Git repository (see ?wflow_git_remote), click on the hyperlinks in the table below to view the files as they were in that past version.

File Version Author Date Message
Rmd 7fa03f5 IJbeasley 2025-09-16 More cancer typos
html 8f1639b IJbeasley 2025-09-16 Build site.
Rmd 345ad9b IJbeasley 2025-09-16 More cancer typos
html a15dd40 IJbeasley 2025-09-16 Build site.
Rmd 16ead66 IJbeasley 2025-09-16 Correcting some cancer grouping
html 6018e42 IJbeasley 2025-09-16 Build site.
Rmd 02a0b9d IJbeasley 2025-09-16 Improving cancer grouping
html 6f66696 IJbeasley 2025-09-16 Build site.
Rmd 66cff1c IJbeasley 2025-09-16 Even more disease term grouping
html 21b6c02 IJbeasley 2025-09-15 Build site.
html 5ec3111 IJbeasley 2025-09-15 Build site.
html 30d773e IJbeasley 2025-09-15 Build site.
html 8d64a38 IJbeasley 2025-09-15 Build site.
Rmd b3088d8 IJbeasley 2025-09-15 workflowr::wflow_publish("analysis/level_2_disease_group.Rmd")
html b89d661 IJbeasley 2025-09-10 Build site.
Rmd c0fcab7 IJbeasley 2025-09-10 workflowr::wflow_publish("analysis/level_2_disease_group.Rmd")
html ead4d8e IJbeasley 2025-09-10 Build site.
Rmd 3964f77 IJbeasley 2025-09-10 workflowr::wflow_publish("analysis/level_2_disease_group.Rmd")
html 8fb639d IJbeasley 2025-09-10 Build site.
Rmd edeb6f5 IJbeasley 2025-09-10 workflowr::wflow_publish("analysis/level_2_disease_group.Rmd")
html fe91704 IJbeasley 2025-09-09 Build site.
Rmd 9c64867 IJbeasley 2025-09-09 Minor fixing of disease trait categorisation
html fa509c0 IJbeasley 2025-09-08 Build site.
Rmd c9602c7 IJbeasley 2025-09-08 More grouping to match GBD

1 Set up

library(dplyr)
library(data.table)
library(ggplot2)
library(stringr)

1.1 Ontology helpers for getting disease subtypes

source(here::here("code/get_term_descendants.R"))
gwas_study_info <- fread(here::here("output/gwas_cat/gwas_study_info_group_l1_v2.csv"))

2 Objectives:

  • Further group disease terms into level 2 categories that match the GBD (Global Burden of Disease) categories more closely.

2.1 Grouping - level 2 set up

gwas_study_info = gwas_study_info |>
  mutate(l2_all_disease_terms = l1_all_disease_terms)
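Each l2_all_disease_terms value is a comma-separated list of terms (see the outputs below). One way to get an overview of the level 2 labels is to split those lists and tabulate the individual terms. Below is a minimal base-R sketch on a few made-up values. On the real data, the same split-and-tabulate step would take gwas_study_info$l2_all_disease_terms as input.

```r
# Made-up values standing in for gwas_study_info$l2_all_disease_terms
l2_terms <- c("stomach cancer",
              "esophageal cancer, stomach cancer",
              "lung cancer, stomach cancer",
              "esophageal cancer")

# Split every list on ", " and count how often each individual term occurs
term_counts <- table(unlist(strsplit(l2_terms, ", ", fixed = TRUE)))

# Most frequent terms first
term_counts <- sort(term_counts, decreasing = TRUE)
print(term_counts)
```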

3 Neoplasms

3.1 [x] Lip and oral cavity cancer

gwas_study_info |> 
  filter(grepl("lip and oral cavity cancer", l2_all_disease_terms)) |>
  select(all_disease_terms, l2_all_disease_terms) |>
  distinct() |>
  head()
                                     all_disease_terms
                                                <char>
1:                                  oral cavity cancer
2:                                      mouth neoplasm
3:                                       tongue cancer
4:                         major salivary gland cancer
5: human papilloma virus infection, oral cavity cancer
6:                                     tongue neoplasm
                                          l2_all_disease_terms
                                                        <char>
1:                                  lip and oral cavity cancer
2:                                  lip and oral cavity cancer
3:                                  lip and oral cavity cancer
4:                                  lip and oral cavity cancer
5: human papilloma virus infection, lip and oral cavity cancer
6:                                  lip and oral cavity cancer

3.2 Nasopharynx cancer

gwas_study_info = gwas_study_info |>
  mutate(l2_all_disease_terms =
           stringr::str_replace_all(l2_all_disease_terms,
                                    pattern = "nasopharyngeal cancer",
                                    replacement = "nasopharynx cancer"))

gwas_study_info |>
  filter(grepl("nasopharynx cancer", l2_all_disease_terms)) |>
  select(all_disease_terms, l2_all_disease_terms) |>
  distinct()
         all_disease_terms l2_all_disease_terms
                    <char>               <char>
1: nasopharyngeal neoplasm   nasopharynx cancer
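The relabelling above uses substring replacement. An alternative is to relabel whole terms only, using a named lookup vector (old label to new label). The minimal base-R sketch below assumes terms are separated by ", ", as in the outputs on this page. relabel_terms() is a hypothetical helper, not part of the project's code. A term that merely contains an old label, such as "metastatic colorectal cancer", is left alone. That is a safeguard or a gap depending on intent.

```r
# Hypothetical helper: relabel whole terms inside ", "-separated lists using a
# named lookup vector (names = old labels, values = new labels)
relabel_terms <- function(x, lookup) {
  vapply(strsplit(x, ", ", fixed = TRUE), function(terms) {
    hit <- terms %in% names(lookup)
    terms[hit] <- lookup[terms[hit]]   # swap only exact term matches
    paste(terms, collapse = ", ")
  }, character(1))
}

lookup <- c("nasopharyngeal cancer" = "nasopharynx cancer",
            "colorectal cancer"     = "colon and rectum cancer")

out <- relabel_terms(c("nasopharyngeal cancer",
                       "colorectal cancer, sclerosing cholangitis",
                       "metastatic colorectal cancer"),
                     lookup)
out
```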

3.3 [x] Other pharynx cancer

gwas_study_info |> 
  filter(grepl("other pharynx cancer", l2_all_disease_terms)) |>
  select(all_disease_terms, l2_all_disease_terms) |>
  distinct() |>
  head()
                                       all_disease_terms
                                                  <char>
1:                                     oropharynx cancer
2: laryngeal squamous cell carcinoma, hypopharynx cancer
3:    human papilloma virus infection, oropharynx cancer
4:                                         tonsil cancer
5:                              hypopharyngeal carcinoma
6:                   pharynx cancer, laryngeal carcinoma
                                    l2_all_disease_terms
                                                  <char>
1:                                  other pharynx cancer
2:                   larynx cancer, other pharynx cancer
3: human papilloma virus infection, other pharynx cancer
4:                                  other pharynx cancer
5:                                  other pharynx cancer
6:                   larynx cancer, other pharynx cancer
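grepl() matches substrings, so a shorter label can also pick up longer labels that contain it. For example, "pharynx cancer" is contained in both "nasopharynx cancer" and "other pharynx cancer". Below is a minimal base-R sketch of a whole-term filter, assuming ", " as the term separator. has_term() is a hypothetical helper. Its logical result could be passed to filter() in place of grepl().

```r
# Hypothetical helper: TRUE where `term` is one of the ", "-separated terms
has_term <- function(x, term) {
  vapply(strsplit(x, ", ", fixed = TRUE),
         function(terms) term %in% terms,
         logical(1))
}

l2 <- c("nasopharynx cancer",
        "larynx cancer, other pharynx cancer",
        "other pharynx cancer")

grepl("pharynx cancer", l2)            # substring match: TRUE TRUE TRUE
has_term(l2, "pharynx cancer")         # whole-term match: FALSE FALSE FALSE
has_term(l2, "other pharynx cancer")   # whole-term match: FALSE TRUE TRUE
```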

3.4 Esophageal cancer

gwas_study_info |>
  filter(grepl("esophageal cancer", l2_all_disease_terms)) |>
  select(all_disease_terms, l2_all_disease_terms) |>
  distinct() |>
  head()
                                            all_disease_terms
                                                       <char>
1:              esophageal adenocarcinoma, barretts esophagus
2:                                  esophageal adenocarcinoma
3: esophageal adenocarcinoma, gastroesophageal reflux disease
4:                         esophageal squamous cell carcinoma
5:                    esophageal carcinoma, gastric carcinoma
6:              squamous cell carcinoma, esophageal carcinoma
                                 l2_all_disease_terms
                                               <char>
1:              barretts esophagus, esophageal cancer
2:                                  esophageal cancer
3: esophageal cancer, gastroesophageal reflux disease
4:                                  esophageal cancer
5:                  esophageal cancer, stomach cancer
6:                                  esophageal cancer

3.5 Stomach cancer

gwas_study_info |>
  filter(grepl("stomach cancer", l2_all_disease_terms)) |>
  select(all_disease_terms, l2_all_disease_terms) |>
  distinct() |>
  head()
                                            all_disease_terms
                                                       <char>
1:                                          gastric carcinoma
2:                    esophageal carcinoma, gastric carcinoma
3:                                   gastric cardia carcinoma
4:                                     gastric adenocarcinoma
5: lung carcinoma, squamous cell carcinoma, gastric carcinoma
6:                                             gastric cancer
                l2_all_disease_terms
                              <char>
1:                    stomach cancer
2: esophageal cancer, stomach cancer
3:                    stomach cancer
4:                    stomach cancer
5:       lung cancer, stomach cancer
6:                    stomach cancer

3.6 Colon and rectum cancer

gwas_study_info = gwas_study_info |>
  mutate(l2_all_disease_terms =
           stringr::str_replace_all(l2_all_disease_terms,
                                    pattern = "colorectal cancer",
                                    replacement = "colon and rectum cancer"))

gwas_study_info |>
  filter(grepl("colon and rectum cancer", l2_all_disease_terms)) |>
  select(all_disease_terms, l2_all_disease_terms) |>
  distinct() |>
  head()
                                                                                                                                                                                                                                           all_disease_terms
                                                                                                                                                                                                                                                      <char>
1:                                                                                                                                                                                                                                         colorectal cancer
2:                                                                                                                                                                                                                 sclerosing cholangitis, colorectal cancer
3:                                                                                                                                                                                                                     colorectal cancer, colorectal adenoma
4:                                                                                                                                                                                                                              metastatic colorectal cancer
5: lung carcinoma, estrogen-receptor negative breast cancer, ovarian endometrioid carcinoma, colorectal cancer, prostate carcinoma, ovarian serous carcinoma, breast carcinoma, ovarian carcinoma, lung adenocarcinoma, squamous cell lung carcinoma, cancer
6:                                                                                                                                                                                                                                             rectum cancer
                                                                   l2_all_disease_terms
                                                                                 <char>
1:                                                              colon and rectum cancer
2:                                      colon and rectum cancer, sclerosing cholangitis
3:                                             benign neoplasm, colon and rectum cancer
4:                                                              colon and rectum cancer
5: breast cancer, colon and rectum cancer, lung cancer, ovarian cancer, prostate cancer
6:                                                              colon and rectum cancer

3.7 Liver cancer

gwas_study_info |>
  filter(grepl("liver cancer", l2_all_disease_terms)) |>
  select(all_disease_terms, l2_all_disease_terms) |>
  distinct() |>
  head()
                                         all_disease_terms
                                                    <char>
1:   hepatitis b virus infection, hepatocellular carcinoma
2:              sclerosing cholangitis, cholangiocarcinoma
3:              cholangiocarcinoma, sclerosing cholangitis
4:        sclerosing cholangitis, hepatocellular carcinoma
5:   hepatitis c virus infection, hepatocellular carcinoma
6: hepatocellular carcinoma, non-alcoholic steatohepatitis
                              l2_all_disease_terms
                                            <char>
1:             hepatitis b infection, liver cancer
2:            liver cancer, sclerosing cholangitis
3:            liver cancer, sclerosing cholangitis
4:            liver cancer, sclerosing cholangitis
5:             hepatitis c infection, liver cancer
6: liver cancer, non-alcoholic fatty liver disease

3.8 Gallbladder and biliary tract cancer

gwas_study_info = gwas_study_info |>
  mutate(l2_all_disease_terms =
           case_when(
             l2_all_disease_terms == "cancer of gallbladder and extrahepatic biliary tract" ~
               "gallbladder and biliary tract cancer",
             TRUE ~ l2_all_disease_terms
           ))

gwas_study_info |>
  filter(grepl("gallbladder and biliary tract cancer", l2_all_disease_terms)) |>
  select(all_disease_terms, l2_all_disease_terms) |>
  distinct()
                                         all_disease_terms
                                                    <char>
1:            sclerosing cholangitis, gallbladder neoplasm
2:                                    gallbladder neoplasm
3: carcinoma of gallbladder and extrahepatic biliary tract
                                           l2_all_disease_terms
                                                         <char>
1: gallbladder and biliary tract cancer, sclerosing cholangitis
2:                         gallbladder and biliary tract cancer
3:                         gallbladder and biliary tract cancer

3.9 Pancreatic cancer

gwas_study_info |>
  filter(grepl("pancreatic cancer", l2_all_disease_terms)) |>
  select(all_disease_terms, l2_all_disease_terms) |>
  distinct() |>
  head()
                                                                                                                         all_disease_terms
                                                                                                                                    <char>
1:                                                                                                                    pancreatic carcinoma
2:                                                                                                        pancreatic ductal adenocarcinoma
3:                                                                                                       pancreatic carcinoma, neutropenia
4:              breast carcinoma, estrogen-receptor positive breast cancer, metastatic prostate cancer, pancreatic carcinoma, hypertension
5:               breast carcinoma, estrogen-receptor positive breast cancer, metastatic prostate cancer, pancreatic carcinoma, proteinuria
6: breast carcinoma, estrogen-receptor positive breast cancer, metastatic prostate cancer, pancreatic carcinoma, hypertension, proteinuria
                                                           l2_all_disease_terms
                                                                         <char>
1:                                                            pancreatic cancer
2:                                                            pancreatic cancer
3:                                               neutropenia, pancreatic cancer
4:              breast cancer, hypertension, pancreatic cancer, prostate cancer
5:               breast cancer, pancreatic cancer, prostate cancer, proteinuria
6: breast cancer, hypertension, pancreatic cancer, prostate cancer, proteinuria

3.10 Larynx cancer

gwas_study_info |>
  filter(grepl("larynx cancer", l2_all_disease_terms)) |>
  select(all_disease_terms, l2_all_disease_terms) |>
  distinct() |>
  head()
                                       all_disease_terms
                                                  <char>
1:                     laryngeal squamous cell carcinoma
2: laryngeal squamous cell carcinoma, hypopharynx cancer
3:                                   laryngeal carcinoma
4:                                    laryngeal neoplasm
5:                                      glottis neoplasm
6:                   pharynx cancer, laryngeal carcinoma
                  l2_all_disease_terms
                                <char>
1:                       larynx cancer
2: larynx cancer, other pharynx cancer
3:                       larynx cancer
4:                       larynx cancer
5:                       larynx cancer
6: larynx cancer, other pharynx cancer

3.11 Tracheal, bronchus, and lung cancer

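resp_cancer_terms = c("lung cancer",
                      "bronchus cancer",
                      "tracheal cancer",
                      "respiratory system cancer")

# Match any of the terms as a whole word (\\b) followed by a comma or the end of
# the string; grouping the alternation applies both anchors to every term
resp_cancer_pattern = paste0("\\b(?:", paste0(resp_cancer_terms, collapse = "|"), ")(?=,|$)")

gwas_study_info = gwas_study_info |>
  mutate(l2_all_disease_terms =
           # rows that already equal the combined label are left unchanged
           ifelse(l2_all_disease_terms != "tracheal bronchus and lung cancer",
                  stringr::str_replace_all(l2_all_disease_terms,
                                           pattern = resp_cancer_pattern,
                                           replacement = "tracheal bronchus and lung cancer"),
                  l2_all_disease_terms))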
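Because paste0(x, collapse = sep) places sep only between elements, a pattern built as paste0(resp_cancer_terms, collapse = "(?=,|$)|\\b") has two gaps. The first term gets no leading \\b, and the last term gets no trailing (?=,|$). The minimal base-R sketch below compares that construction with one that wraps the whole alternation in a non-capturing group, so every term gets both anchors. The input "respiratory system cancer risk" is a made-up string chosen only to expose the difference. perl = TRUE is needed for the lookahead in base R.

```r
resp_cancer_terms <- c("lung cancer", "bronchus cancer",
                       "tracheal cancer", "respiratory system cancer")

# Separator pasted between terms only: the last term has no trailing lookahead
pattern_collapse <- paste0(resp_cancer_terms, collapse = "(?=,|$)|\\b")

# Whole alternation grouped: every term gets \\b before and (?=,|$) after
pattern_grouped <- paste0("\\b(?:", paste(resp_cancer_terms, collapse = "|"), ")(?=,|$)")

x <- c("respiratory system cancer", "respiratory system cancer risk")
out_collapse <- gsub(pattern_collapse, "tracheal bronchus and lung cancer", x, perl = TRUE)
out_grouped  <- gsub(pattern_grouped,  "tracheal bronchus and lung cancer", x, perl = TRUE)

out_collapse
out_grouped
```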

gwas_study_info |>
  filter(grepl("tracheal bronchus and lung cancer", l2_all_disease_terms)) |>
  select(all_disease_terms, l2_all_disease_terms) |>
  distinct() |>
  head()
                                             all_disease_terms
                                                        <char>
1:                               non-small cell lung carcinoma
2:                                         lung adenocarcinoma
3:                                squamous cell lung carcinoma
4:               lung carcinoma, family history of lung cancer
5:          lung adenocarcinoma, family history of lung cancer
6: squamous cell lung carcinoma, family history of lung cancer
                l2_all_disease_terms
                              <char>
1: tracheal bronchus and lung cancer
2: tracheal bronchus and lung cancer
3: tracheal bronchus and lung cancer
4: tracheal bronchus and lung cancer
5: tracheal bronchus and lung cancer
6: tracheal bronchus and lung cancer
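The level 2 lists on this page are generally in alphabetical order, e.g. "colon and rectum cancer, sclerosing cholangitis". Replacing a term in place keeps its position, though, so a new label starting with a different letter can end up out of order. The ovarian cancer output shows "breast cancer, colon and rectum cancer, tracheal bronchus and lung cancer, ovarian cancer, prostate cancer". Below is a minimal base-R sketch of re-sorting the terms, assuming ", " as the separator. sort_terms() is a hypothetical helper.

```r
# Hypothetical helper: re-sort the terms inside each ", "-separated list
sort_terms <- function(x) {
  vapply(strsplit(x, ", ", fixed = TRUE),
         function(terms) paste(sort(terms), collapse = ", "),
         character(1))
}

x <- "breast cancer, colon and rectum cancer, tracheal bronchus and lung cancer, ovarian cancer, prostate cancer"
sorted <- sort_terms(x)
sorted
```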

3.12 Malignant skin melanoma

gwas_study_info = gwas_study_info |>
  mutate(l2_all_disease_terms =
           stringr::str_replace_all(l2_all_disease_terms,
                                    pattern = "malignant melanoma of skin",
                                    replacement = "malignant skin melanoma"))

gwas_study_info |>
  filter(grepl("malignant skin melanoma", l2_all_disease_terms)) |>
  select(all_disease_terms, l2_all_disease_terms) |>
  distinct() |>
  head()
                   all_disease_terms
                              <char>
1:                cutaneous melanoma
2:                          melanoma
3: neuroblastoma, cutaneous melanoma
4:  melanoma, immune system toxicity
5:       non-melanoma skin carcinoma
                              l2_all_disease_terms
                                            <char>
1:                         malignant skin melanoma
2:                         malignant skin melanoma
3:          malignant skin melanoma, neuroblastoma
4: immune system toxicity, malignant skin melanoma
5:         non-malignant skin melanoma skin cancer
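The last row above shows "non-melanoma skin carcinoma" ending up as "non-malignant skin melanoma skin cancer". This section only rewrites "malignant melanoma of skin", so the mangling presumably comes from an earlier substring replacement of "melanoma". That is an assumption, since that step is not shown on this page. The minimal base-R sketch below illustrates the mechanism: \\b does not protect against the hyphen in "non-melanoma". Anchoring on the list separators instead matches "melanoma" only as a whole entry.

```r
x <- c("melanoma",
       "non-melanoma skin cancer",
       "melanoma, neuroblastoma")

# \\b also matches at the hyphen in "non-melanoma", so this relabels
# inside "non-melanoma skin cancer" too
with_word_boundary <- gsub("\\bmelanoma\\b", "malignant skin melanoma", x, perl = TRUE)

# Requiring the start of the string or ", " before the term, and "," or the
# end of the string after it, matches "melanoma" only as a whole list entry
with_separators <- gsub("(^|, )melanoma(?=,|$)", "\\1malignant skin melanoma", x, perl = TRUE)

with_word_boundary
with_separators
```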

3.13 Non-melanoma skin cancer

gwas_study_info |>
  filter(grepl("non-melanoma skin cancer", l2_all_disease_terms)) |>
  select(all_disease_terms, l2_all_disease_terms) |>
  distinct()
                               all_disease_terms     l2_all_disease_terms
                                          <char>                   <char>
1: squamous cell carcinoma, basal cell carcinoma non-melanoma skin cancer
2:                        keratinocyte carcinoma non-melanoma skin cancer
3:                          basal cell carcinoma non-melanoma skin cancer
4:                   non-melanoma skin carcinoma non-melanoma skin cancer
5:                                 skin neoplasm non-melanoma skin cancer
6:                        skin carcinoma in situ non-melanoma skin cancer

3.14 Soft tissue and other extraosseous sarcomas

gwas_study_info = gwas_study_info |>
  mutate(l2_all_disease_terms =
           stringr::str_replace_all(l2_all_disease_terms,
                                    pattern = "soft tissue sarcoma",
                                    replacement = "soft tissue and other extraosseous sarcomas"))

gwas_study_info |>
  filter(grepl("soft tissue and other extraosseous sarcomas", l2_all_disease_terms)) |>
  select(all_disease_terms, l2_all_disease_terms) |>
  distinct() |>
  head()
       all_disease_terms                                 l2_all_disease_terms
                  <char>                                               <char>
1: sarcoma, fibrosarcoma sarcoma, soft tissue and other extraosseous sarcomas
2:       kaposis sarcoma          soft tissue and other extraosseous sarcomas

3.15 Malignant neoplasm of bone and articular cartilage

gwas_study_info = gwas_study_info |>
  mutate(l2_all_disease_terms =
           stringr::str_replace_all(l2_all_disease_terms,
                                    pattern = "bone cancer|osteosarcoma",
                                    replacement = "malignant neoplasm of bone and articular cartilage"))

gwas_study_info |>
  filter(grepl("malignant neoplasm of bone and articular cartilage", l2_all_disease_terms)) |>
  select(all_disease_terms, l2_all_disease_terms) |>
  distinct() |>
  head()
                                  all_disease_terms
                                             <char>
1:                                     osteosarcoma
2:                           acute myeloid leukemia
3:                                 myeloid leukemia
4:                          malignant bone neoplasm
5: acute myeloid leukemia, myelodysplastic syndrome
6:                                    bone neoplasm
                                                           l2_all_disease_terms
                                                                         <char>
1:                           malignant neoplasm of bone and articular cartilage
2:                           malignant neoplasm of bone and articular cartilage
3:                           malignant neoplasm of bone and articular cartilage
4:                           malignant neoplasm of bone and articular cartilage
5: malignant neoplasm of bone and articular cartilage, myelodysplastic syndrome
6:                           malignant neoplasm of bone and articular cartilage
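The output above places acute myeloid leukemia and myeloid leukemia under the bone label. This presumably reflects how these terms were mapped at level 1, which is not shown on this page. GBD reports leukemia as a separate neoplasm cause, so a check along these lines may help surface such rows. Below is a minimal base-R sketch on toy rows copied from the output. Matching the substring "leuk" is an assumption about how leukemia terms are spelled in the data.

```r
df <- data.frame(
  all_disease_terms = c("osteosarcoma",
                        "acute myeloid leukemia",
                        "acute myeloid leukemia, myelodysplastic syndrome"),
  l2_all_disease_terms = c("malignant neoplasm of bone and articular cartilage",
                           "malignant neoplasm of bone and articular cartilage",
                           "malignant neoplasm of bone and articular cartilage, myelodysplastic syndrome"),
  stringsAsFactors = FALSE
)

bone_label <- "malignant neoplasm of bone and articular cartilage"

# Flag rows whose original terms mention leukemia but which carry the bone label
flag <- grepl("leuk", df$all_disease_terms, fixed = TRUE) &
  grepl(bone_label, df$l2_all_disease_terms, fixed = TRUE)

flagged <- df$all_disease_terms[flag]
flagged
```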

3.16 Breast cancer

gwas_study_info |>
  filter(grepl("breast cancer", l2_all_disease_terms)) |>
  select(all_disease_terms, l2_all_disease_terms) |>
  distinct() |>
  head()
                                            all_disease_terms
                                                       <char>
1:                   estrogen-receptor negative breast cancer
2:                                           breast carcinoma
3:                   estrogen-receptor positive breast cancer
4:                                          breast carcinoma,
5: estrogen-receptor positive breast cancer, breast carcinoma
6: estrogen-receptor negative breast cancer, breast carcinoma
   l2_all_disease_terms
                 <char>
1:        breast cancer
2:        breast cancer
3:        breast cancer
4:        breast cancer
5:        breast cancer
6:        breast cancer

3.17 Cervical cancer

gwas_study_info |>
  filter(grepl("cervical cancer", l2_all_disease_terms)) |>
  select(all_disease_terms, l2_all_disease_terms) |>
  distinct() |>
  head()
                                     all_disease_terms
                                                <char>
1:                                  cervical carcinoma
2:                                     cervical cancer
3:                dysplasia of cervix, cervical cancer
4:                          dysplasia, cervical cancer
5:                    uterine cervix carcinoma in situ
6: cervical carcinoma, human papilloma virus infection
                               l2_all_disease_terms
                                             <char>
1:                                  cervical cancer
2:                                  cervical cancer
3:             cervical cancer, dysplasia of cervix
4:                       cervical cancer, dysplasia
5:                                  cervical cancer
6: cervical cancer, human papilloma virus infection

3.18 Uterine cancer

# Open question: is endometrial cancer a subset of uterine cancer in GBD?
# It is in the ontology: http://purl.obolibrary.org/obo/MONDO_0002715
gwas_study_info = gwas_study_info |>
  mutate(l2_all_disease_terms =
           stringr::str_replace_all(l2_all_disease_terms,
                                    pattern = "endometrial cancer",
                                    replacement = "uterine cancer"))

gwas_study_info |>
  filter(grepl("uterine cancer", l2_all_disease_terms)) |>
  select(all_disease_terms, l2_all_disease_terms) |>
  distinct() |>
  head()
                    all_disease_terms     l2_all_disease_terms
                               <char>                   <char>
1: endometrial endometrioid carcinoma           uterine cancer
2:              endometrial carcinoma           uterine cancer
3:               endometrial neoplasm           uterine cancer
4:                  uterine carcinoma           uterine cancer
5:       endometrial cancer, covid-19 covid-19, uterine cancer
6:              uterine corpus cancer           uterine cancer
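Merging two labels into one can leave a list with the same term twice, e.g. a study that reported both "endometrial cancer" and "uterine cancer". Whether such rows exist in gwas_study_info is not shown above. Below is a minimal base-R sketch of dropping the repeats, assuming ", " as the separator. dedupe_terms() is a hypothetical helper, and the first toy input is made up.

```r
# Hypothetical helper: keep the first occurrence of each term in every list
dedupe_terms <- function(x) {
  vapply(strsplit(x, ", ", fixed = TRUE),
         function(terms) paste(unique(terms), collapse = ", "),
         character(1))
}

relabelled <- gsub("endometrial cancer", "uterine cancer",
                   c("endometrial cancer, uterine cancer", "covid-19, endometrial cancer"),
                   fixed = TRUE)
relabelled                       # "uterine cancer, uterine cancer" "covid-19, uterine cancer"
deduped <- dedupe_terms(relabelled)
deduped                          # "uterine cancer" "covid-19, uterine cancer"
```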

3.19 Ovarian cancer

gwas_study_info |>
  filter(grepl("ovarian cancer", l2_all_disease_terms)) |>
  select(all_disease_terms, l2_all_disease_terms) |>
  distinct()
                                                                                                                                                                                                                                            all_disease_terms
                                                                                                                                                                                                                                                       <char>
 1:                                                                                                                                                                                                                                         ovarian carcinoma
 2:                                                                                                                                                                                                                       malignant epithelial tumor of ovary
 3:                                                                                                                                                                                                   prostate carcinoma, breast carcinoma, ovarian carcinoma
 4: lung carcinoma, estrogen-receptor negative breast cancer, ovarian endometrioid carcinoma, colorectal cancer, prostate carcinoma, ovarian serous carcinoma, breast carcinoma, ovarian carcinoma, lung adenocarcinoma, squamous cell lung carcinoma, cancer
 5:                                                                                                                                                                                                                           ovarian mucinous adenocarcinoma
 6:                                                                                                                                                                                                                                  ovarian serous carcinoma
 7:                                                                                                                                                                                                                         ovarian clear cell adenocarcinoma
 8:                                                                                                                                                                                                                            ovarian endometrioid carcinoma
 9:                                                                                                                                                                                                                               ovarian carcinoma, covid-19
10:                                                                                                                                                                                                                  high grade ovarian serous adenocarcinoma
11:                                                                                                                                                                                                                             ovarian serous adenocarcinoma
12:                                                                                                                                                                                                                                 ovarian clear cell cancer
13:                                                                                                                                                                                                                     uterine adnexa cancer, ovarian cancer
14:                                                                                                                                                                                                                                            ovarian cancer
15:                                                                                                                                                                                                                                          ovarian neoplasm
16:                                                                                                                                                                                                                          breast cancer, ovarian carcinoma
17:                                                                                                                                                                                                      ovarian carcinoma, cancer aggressiveness measurement
18:                                                                                                                                                                                               ovarian serous carcinoma, cancer aggressiveness measurement
                                                                                          l2_all_disease_terms
                                                                                                        <char>
 1:                                                                                             ovarian cancer
 2:                                                                                             ovarian cancer
 3:                                                             breast cancer, ovarian cancer, prostate cancer
 4: breast cancer, colon and rectum cancer, tracheal bronchus and lung cancer, ovarian cancer, prostate cancer
 5:                                                                                             ovarian cancer
 6:                                                                                             ovarian cancer
 7:                                                                                             ovarian cancer
 8:                                                                                             ovarian cancer
 9:                                                                                   covid-19, ovarian cancer
10:                                                                                             ovarian cancer
11:                                                                                             ovarian cancer
12:                                                                                             ovarian cancer
13:                                                                             ovarian cancer, uterine cancer
14:                                                                                             ovarian cancer
15:                                                                                             ovarian cancer
16:                                                                              breast cancer, ovarian cancer
17:                                                                                             ovarian cancer
18:                                                                                             ovarian cancer

3.20 Prostate cancer

gwas_study_info |>
  filter(grepl("prostate cancer", l2_all_disease_terms)) |>
  select(all_disease_terms, l2_all_disease_terms) |>
  distinct() |>
  head()
                                         all_disease_terms
                                                    <char>
1:                                      prostate carcinoma
2:   cancer aggressiveness measurement, prostate carcinoma
3: prostate carcinoma, breast carcinoma, ovarian carcinoma
4:       metastatic prostate cancer, peripheral neuropathy
5:                              metastatic prostate cancer
6:                prostate carcinoma, erectile dysfunction
                             l2_all_disease_terms
                                           <char>
1:                                prostate cancer
2:                                prostate cancer
3: breast cancer, ovarian cancer, prostate cancer
4:         peripheral neuropathy, prostate cancer
5:                                prostate cancer
6:          erectile dysfunction, prostate cancer
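Each section in this part repeats the same filter/select/distinct/head pipeline to inspect how raw terms map to l2 labels. A small wrapper could cut that repetition. The sketch below is a hypothetical helper (`show_l2_mapping()` is not part of the original analysis), written in base R for a plain data.frame so it runs without dplyr; the toy rows are made up for illustration and are not GWAS Catalog data.

```r
# Hypothetical helper: distinct (all_disease_terms, l2_all_disease_terms) pairs
# whose l2 label matches `pattern`, returning at most `n` rows.
# Extra arguments (e.g. perl = TRUE) are passed on to grepl().
show_l2_mapping <- function(df, pattern, n = 6L, ...) {
  hits <- df[grepl(pattern, df$l2_all_disease_terms, ...),
             c("all_disease_terms", "l2_all_disease_terms")]
  head(unique(hits), n)
}

# Toy input, made up for illustration
toy <- data.frame(
  all_disease_terms    = c("prostate carcinoma", "metastatic prostate cancer",
                           "prostate carcinoma", "renal carcinoma"),
  l2_all_disease_terms = c("prostate cancer", "prostate cancer",
                           "prostate cancer", "kidney cancer")
)

show_l2_mapping(toy, "prostate cancer")
```

The duplicated "prostate carcinoma" row is collapsed by `unique()`, so two rows are returned.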

3.21 Testicular cancer

gwas_study_info |>
  filter(grepl("testicular cancer", l2_all_disease_terms)) |>
  select(all_disease_terms, l2_all_disease_terms) |>
  distinct()
                              all_disease_terms
                                         <char>
1:                         testicular carcinoma
2: testicular carcinoma, cardiovascular disease
3:                          testicular neoplasm
                        l2_all_disease_terms
                                      <char>
1:                         testicular cancer
2: cardiovascular disease, testicular cancer
3:                         testicular cancer

3.22 Kidney cancer

gwas_study_info |>
  filter(grepl("kidney cancer", l2_all_disease_terms)) |>
  select(all_disease_terms, l2_all_disease_terms) |>
  distinct() |>
  head()
                all_disease_terms l2_all_disease_terms
                           <char>               <char>
1:           renal cell carcinoma        kidney cancer
2:                 nephroblastoma        kidney cancer
3:                  kidney cancer        kidney cancer
4:     clear cell renal carcinoma        kidney cancer
5:                renal carcinoma        kidney cancer
6: papillary renal cell carcinoma        kidney cancer

3.23 Bladder cancer

gwas_study_info =
  gwas_study_info |>
  mutate(l2_all_disease_terms =
           stringr::str_replace_all(l2_all_disease_terms,
                                    pattern = "urinary bladder cancer",
                                    replacement = "bladder cancer"))


gwas_study_info |>
  filter(grepl("bladder cancer", l2_all_disease_terms)) |>
  select(all_disease_terms, l2_all_disease_terms) |>
  distinct()
                                            all_disease_terms
                                                       <char>
1:                                  urinary bladder carcinoma
2: urinary bladder carcinoma, disease progression measurement
3:                                    urinary bladder cancer,
4:                                     urinary bladder cancer
   l2_all_disease_terms
                 <char>
1:       bladder cancer
2:       bladder cancer
3:       bladder cancer
4:       bladder cancer

3.24 Brain and central nervous system cancer

gwas_study_info =
  gwas_study_info |>
  mutate(l2_all_disease_terms =
           stringr::str_replace_all(l2_all_disease_terms,
                                    pattern = "\\bcentral nervous system cancer\\b",
                                    replacement = "brain and central nervous system cancer"))


gwas_study_info |>
  filter(grepl("brain and central nervous system cancer", l2_all_disease_terms)) |>
  select(all_disease_terms, l2_all_disease_terms) |>
  distinct() |>
  head()
                                        all_disease_terms
                                                   <char>
1:                                glioblastoma multiforme
2:                          central nervous system cancer
3:                                                 glioma
4:                  central nervous system cancer, glioma
5: central nervous system cancer, glioblastoma multiforme
6:                                         brain neoplasm
                      l2_all_disease_terms
                                    <char>
1: brain and central nervous system cancer
2: brain and central nervous system cancer
3: brain and central nervous system cancer
4: brain and central nervous system cancer
5: brain and central nervous system cancer
6: brain and central nervous system cancer
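The replacement text "brain and central nervous system cancer" itself contains the pattern "central nervous system cancer", so applying the relabeling a second time (for example when re-running the chunk interactively without reloading the data) would produce "brain and brain and ...". A negative lookbehind makes the step idempotent. The sketch below uses base R `gsub(perl = TRUE)`; `relabel_cns()` and `naive_relabel()` are hypothetical names, and the same pattern should also work in `stringr::str_replace_all()`, since ICU regular expressions support lookbehind.

```r
# Pattern as used above: not idempotent
naive_relabel <- function(x) {
  gsub("\\bcentral nervous system cancer\\b",
       "brain and central nervous system cancer", x, perl = TRUE)
}
naive_relabel(naive_relabel("central nervous system cancer"))
#> [1] "brain and brain and central nervous system cancer"

# Skip matches already preceded by "brain and " (negative lookbehind)
relabel_cns <- function(x) {
  gsub("(?<!brain and )\\bcentral nervous system cancer\\b",
       "brain and central nervous system cancer", x, perl = TRUE)
}

x <- c("central nervous system cancer", "glioma, central nervous system cancer")
once  <- relabel_cns(x)
twice <- relabel_cns(once)
identical(once, twice)
#> [1] TRUE
```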

3.25 Eye cancer

gwas_study_info =
  gwas_study_info |>
  mutate(l2_all_disease_terms =
           stringr::str_replace_all(l2_all_disease_terms,
                                    # \\b on both alternatives, so "ocular cancer" is only matched as a whole word
                                    pattern = "\\bocular melanoma\\b|\\bocular cancer\\b",
                                    replacement = "eye cancer"))

gwas_study_info |>
  filter(grepl("eye cancer", l2_all_disease_terms)) |>
  select(all_disease_terms, l2_all_disease_terms) |>
  distinct() |>
  head()
                                 all_disease_terms l2_all_disease_terms
                                            <char>               <char>
1:                                  uveal melanoma           eye cancer
2:                              choroidal melanoma           eye cancer
3:                 epithelioid cell uveal melanoma           eye cancer
4: uveal melanoma, epithelioid cell uveal melanoma           eye cancer
5:                 uveal melanoma disease severity           eye cancer
6:                                   ocular cancer           eye cancer

3.26 Neuroblastoma and other peripheral nervous system cancers

gwas_study_info =
  gwas_study_info |>
  mutate(l2_all_disease_terms =
           stringr::str_replace_all(l2_all_disease_terms,
                                    pattern = "\\bneuroblastoma\\b|\\bperipheral nervous system cancer\\b",
                                    replacement = "neuroblastoma and other peripheral nervous system cancers"))

gwas_study_info |>
  filter(grepl("neuroblastoma and other peripheral nervous system cancers", l2_all_disease_terms)) |>
  select(all_disease_terms, l2_all_disease_terms) |>
  distinct() |>
  head()
                   all_disease_terms
                              <char>
1:                     neuroblastoma
2: neuroblastoma, cutaneous melanoma
                                                                 l2_all_disease_terms
                                                                               <char>
1:                          neuroblastoma and other peripheral nervous system cancers
2: malignant skin melanoma, neuroblastoma and other peripheral nervous system cancers
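Sections 3.23 to 3.26 each rewrite l2 labels with a separate regular expression. An alternative, sketched here under the assumption that `l2_all_disease_terms` is always a ", "-separated list of whole labels, is to split each string, recode exact entries through one named lookup vector, drop repeats, and rejoin. Because only whole entries are recoded, partial-word matches cannot occur, and the step is idempotent as long as no target label is also a key. `l2_lookup` and `recode_l2()` are hypothetical; the lookup mirrors the replacements in these four sections.

```r
# Hypothetical lookup table: old l2 entry -> harmonised l2 entry
l2_lookup <- c(
  "urinary bladder cancer"           = "bladder cancer",
  "central nervous system cancer"    = "brain and central nervous system cancer",
  "ocular melanoma"                  = "eye cancer",
  "ocular cancer"                    = "eye cancer",
  "neuroblastoma"                    = "neuroblastoma and other peripheral nervous system cancers",
  "peripheral nervous system cancer" = "neuroblastoma and other peripheral nervous system cancers"
)

# Split each label string on ", ", recode exact entries, de-duplicate, rejoin
recode_l2 <- function(x, lookup) {
  vapply(strsplit(x, ", ", fixed = TRUE), function(entries) {
    # Keep missing labels missing instead of turning them into "NA"
    if (length(entries) == 1L && is.na(entries)) return(NA_character_)
    hit <- entries %in% names(lookup)
    entries[hit] <- lookup[entries[hit]]
    paste(unique(entries), collapse = ", ")
  }, character(1))
}

recode_l2(c("neuroblastoma, malignant skin melanoma",
            "ocular melanoma, ocular cancer"), l2_lookup)
```

In the pipeline this could be applied as `mutate(l2_all_disease_terms = recode_l2(l2_all_disease_terms, l2_lookup))`.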

3.27 Thyroid cancer

gwas_study_info |>
  filter(grepl("thyroid cancer", l2_all_disease_terms)) |>
  select(all_disease_terms, l2_all_disease_terms) |>
  distinct() |>
  head()
                  all_disease_terms l2_all_disease_terms
                             <char>               <char>
1: differentiated thyroid carcinoma       thyroid cancer
2:      papillary thyroid carcinoma       thyroid cancer
3:     follicular thyroid carcinoma       thyroid cancer
4:                thyroid carcinoma       thyroid cancer
5:                   thyroid cancer       thyroid cancer

3.28 Mesothelioma

gwas_study_info |>
  filter(grepl("mesothelioma", l2_all_disease_terms)) |>
  select(all_disease_terms, l2_all_disease_terms) |>
  distinct() |>
  head()
                all_disease_terms l2_all_disease_terms
                           <char>               <char>
1: malignant pleural mesothelioma         mesothelioma
2:                   mesothelioma         mesothelioma
3:           pleural mesothelioma         mesothelioma

3.29 Hodgkin lymphoma

gwas_study_info =
  gwas_study_info |>
  mutate(l2_all_disease_terms =
           stringr::str_replace_all(l2_all_disease_terms,
                                    pattern = "hodgkins lymphoma",
                                    replacement = "hodgkin lymphoma"))


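# Note: "hodgkin lymphoma" is also a substring of "non-hodgkin lymphoma",
# so this filter returns non-Hodgkin rows as well (as in the output below).
gwas_study_info |>
  filter(grepl("hodgkin lymphoma", l2_all_disease_terms)) |>
  select(all_disease_terms, l2_all_disease_terms) |>
  distinct() |>
  head()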
                                     all_disease_terms
                                                <char>
1:   diffuse large b-cell lymphoma, multiple sclerosis
2:             follicular lymphoma, multiple sclerosis
3:   marginal zone b-cell lymphoma, multiple sclerosis
4: diffuse large b-cell lymphoma, rheumatoid arthritis
5:           rheumatoid arthritis, follicular lymphoma
6: rheumatoid arthritis, marginal zone b-cell lymphoma
                         l2_all_disease_terms
                                       <char>
1:   multiple sclerosis, non-hodgkin lymphoma
2:   multiple sclerosis, non-hodgkin lymphoma
3:   multiple sclerosis, non-hodgkin lymphoma
4: non-hodgkin lymphoma, rheumatoid arthritis
5: non-hodgkin lymphoma, rheumatoid arthritis
6: non-hodgkin lymphoma, rheumatoid arthritis
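All six rows above are non-Hodgkin lymphoma, because `grepl("hodgkin lymphoma", ...)` also matches inside "non-hodgkin lymphoma". Adding `\\b` would not help, since the hyphen in "non-hodgkin" already creates a word boundary. One option, assuming entries are separated by ", ", is to anchor the term to the start of the string or a preceding ", ". `is_hodgkin()` is a hypothetical helper and the labels are illustrative.

```r
# Match "hodgkin lymphoma" only as a whole ", "-separated entry
is_hodgkin <- function(x) grepl("(^|, )hodgkin lymphoma(,|$)", x)

labels <- c("multiple sclerosis, non-hodgkin lymphoma",
            "hodgkin lymphoma, leukemia, multiple myeloma",
            "hodgkin lymphoma")
is_hodgkin(labels)
#> [1] FALSE  TRUE  TRUE
```

In the pipeline above this would read `filter(is_hodgkin(l2_all_disease_terms))`.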

3.30 Non-Hodgkin lymphoma

gwas_study_info =
  gwas_study_info |>
  mutate(l2_all_disease_terms =
           stringr::str_replace_all(l2_all_disease_terms,
                                    pattern = "non-hodgkins lymphoma",
                                    replacement = "non-hodgkin lymphoma"))

gwas_study_info |>
  filter(grepl("non-hodgkin lymphoma", l2_all_disease_terms)) |>
  select(all_disease_terms, l2_all_disease_terms) |>
  distinct() |>
  head()
                                     all_disease_terms
                                                <char>
1:   diffuse large b-cell lymphoma, multiple sclerosis
2:             follicular lymphoma, multiple sclerosis
3:   marginal zone b-cell lymphoma, multiple sclerosis
4: diffuse large b-cell lymphoma, rheumatoid arthritis
5:           rheumatoid arthritis, follicular lymphoma
6: rheumatoid arthritis, marginal zone b-cell lymphoma
                         l2_all_disease_terms
                                       <char>
1:   multiple sclerosis, non-hodgkin lymphoma
2:   multiple sclerosis, non-hodgkin lymphoma
3:   multiple sclerosis, non-hodgkin lymphoma
4: non-hodgkin lymphoma, rheumatoid arthritis
5: non-hodgkin lymphoma, rheumatoid arthritis
6: non-hodgkin lymphoma, rheumatoid arthritis
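The output here is identical to the Hodgkin section. This is expected: "hodgkins lymphoma" is a substring of "non-hodgkins lymphoma", so the 3.29 replacement (an unanchored pattern) already rewrites the non-Hodgkin label, and the 3.30 replacement appears to have nothing left to change. Keeping it as a safeguard is harmless. The sketch below reproduces the two steps with base R `gsub(fixed = TRUE)` on an illustrative string.

```r
x <- "hodgkins lymphoma, multiple myeloma, non-hodgkins lymphoma"

# Step from 3.29: also rewrites the substring inside "non-hodgkins lymphoma"
step_329 <- gsub("hodgkins lymphoma", "hodgkin lymphoma", x, fixed = TRUE)
step_329
#> [1] "hodgkin lymphoma, multiple myeloma, non-hodgkin lymphoma"

# Step from 3.30: nothing left to replace
step_330 <- gsub("non-hodgkins lymphoma", "non-hodgkin lymphoma", step_329, fixed = TRUE)
identical(step_329, step_330)
#> [1] TRUE
```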

3.31 Multiple myeloma

gwas_study_info |>
  filter(grepl("multiple myeloma", l2_all_disease_terms)) |>
  select(all_disease_terms, l2_all_disease_terms) |>
  distinct() |>
  head()
                                                   all_disease_terms
                                                              <char>
1:                                                  multiple myeloma
2:                           multiple myeloma, peripheral neuropathy
3:             multiple myeloma, chemotherapy-induced oral mucositis
4:                           multiple myeloma, monoclonal gammopathy
5: hodgkins lymphoma, multiple myeloma, chronic lymphocytic leukemia
6:        hodgkins lymphoma, multiple myeloma, non-hodgkins lymphoma
                                       l2_all_disease_terms
                                                     <char>
1:                                         multiple myeloma
2:                  multiple myeloma, peripheral neuropathy
3:    chemotherapy-induced oral mucositis, multiple myeloma
4:                  monoclonal gammopathy, multiple myeloma
5:             hodgkin lymphoma, leukemia, multiple myeloma
6: hodgkin lymphoma, multiple myeloma, non-hodgkin lymphoma

3.32 Leukemia

gwas_study_info |>
  filter(grepl("leukemia", l2_all_disease_terms)) |>
  select(all_disease_terms, l2_all_disease_terms) |>
  distinct() |>
  head()
                                                       all_disease_terms
                                                                  <char>
1:                                          acute lymphoblastic leukemia
2:                      multiple sclerosis, chronic lymphocytic leukemia
3:                    rheumatoid arthritis, chronic lymphocytic leukemia
4:            systemic lupus erythematosus, chronic lymphocytic leukemia
5: acute lymphoblastic leukemia, asparaginase-induced acute pancreatitis
6:                                          chronic lymphocytic leukemia
                     l2_all_disease_terms
                                   <char>
1:                               leukemia
2:           leukemia, multiple sclerosis
3:         leukemia, rheumatoid arthritis
4: leukemia, systemic lupus erythematosus
5:                 leukemia, pancreatitis
6:                               leukemia

3.33 Other malignant neoplasms

gwas_study_info =
  gwas_study_info |>
  mutate(l2_all_disease_terms =
           case_when(
             l2_all_disease_terms == "cancer" ~ "other malignant neoplasms",
             TRUE ~ l2_all_disease_terms
           ))

# Study-specific fix: relabel a mid-list "cancer" entry for PUBMED_ID 27790247
gwas_study_info =
  gwas_study_info |>
  mutate(l2_all_disease_terms =
           ifelse(PUBMED_ID == 27790247,
                  stringr::str_replace_all(l2_all_disease_terms,
                                           pattern = ", cancer,",
                                           replacement = ", other malignant neoplasms,"),
                  l2_all_disease_terms))


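### Labels where "cancer" is paired with another trait, e.g. a treatment- or
### cancer-related outcome such as cardiotoxicity: list those starting with "cancer,"
gwas_study_info |>
  filter(grepl("^cancer,", l2_all_disease_terms)) |>
  pull(l2_all_disease_terms) |>
  unique()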
[1] "cancer, chronic obstructive pulmonary disease"
[2] "cancer, cardiotoxicity"                       
[3] "cancer, hand-foot syndrome"                   
[4] "cancer, peripheral neuropathy"                
[5] "cancer, immune system toxicity"               
[6] "cancer, hypothyroidism"                       
[7] "cancer, radiation-induced disorder"           
[8] "cancer, osteonecrosis"                        
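Section 3.33 relabels a bare "cancer" entry in separate steps: an exact match, a study-specific mid-list fix for PUBMED_ID 27790247, and a leading "cancer," prefix. A single regex that matches "cancer" only as a whole ", "-separated entry could cover every position at once. This is a sketch, not the author's method: it would also relabel "cancer" at the end of a list, or mid-list for studies other than PUBMED_ID 27790247, which may or may not be intended. `relabel_bare_cancer()` is a hypothetical name and the example labels are illustrative.

```r
# Replace "cancer" only when it is a whole entry: alone, first, middle or last.
# "(^|, )" is captured and restored via \\1; "(?=,|$)" is a lookahead (PCRE).
relabel_bare_cancer <- function(x) {
  gsub("(^|, )cancer(?=,|$)", "\\1other malignant neoplasms", x, perl = TRUE)
}

examples <- c("cancer",
              "cancer, cardiotoxicity",
              "atrial fibrillation, cancer, stroke",
              "heart failure, cancer",
              "breast cancer, cancer")   # "breast cancer" must stay untouched
relabel_bare_cancer(examples)
```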
gwas_study_info =
  gwas_study_info |>
  mutate(l2_all_disease_terms =
           ifelse(grepl("^cancer,", l2_all_disease_terms),
                  stringr::str_replace_all(l2_all_disease_terms,
                                           pattern = "^cancer,",
                                           replacement = "other malignant neoplasms,"),
                  l2_all_disease_terms))

other_malignant_terms <- c(
  "retroperitoneal cancer",
  "peritoneal cancer",
  "ewing sarcoma",

  "digestive system cancer",
  "intestinal cancer",
  "small intestine cancer",

  "female reproductive organ cancer",
  "male reproductive organ cancer",
  "vulvar cancer",
  "testicular germ cell tumor",
  "urogenital cancer",

  "squamous cell cancer",

  "head and neck cancer",
  "malignant tumor of floor of mouth",
  "nasal cavity cancer",       # TODO: check whether this belongs in another category

  "malignant lymphoid tumor",
  "neuroendocrine tumor",
  "lymphatic system cancer",

  "childhood cancer"           # TODO: consider splitting into more specific categories
)

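# Match any listed term as a whole entry: the leading \\b and the trailing
# (?=,|$) apply to every term in the alternation, including the first and last.
gwas_study_info =
  gwas_study_info |>
  mutate(l2_all_disease_terms =
           stringr::str_replace_all(l2_all_disease_terms,
                                    pattern = paste0("\\b(?:", paste(other_malignant_terms, collapse = "|"), ")(?=,|$)"),
                                    replacement = "other malignant neoplasms"))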


gwas_study_info |>
  filter(grepl("other malignant neoplasms", l2_all_disease_terms)) |>
  select(all_disease_terms, l2_all_disease_terms) |>
  distinct()
                                                                                 all_disease_terms
                                                                                            <char>
 1:                                                                        squamous cell carcinoma
 2:                                                                                         cancer
 3:                                                               childhood cancer, cardiomyopathy
 4:                                                                        neuroendocrine neoplasm
 5:                                                           small intestine neuroendocrine tumor
 6:                                                                pancreatic neuroendocrine tumor
 7:                                                                 pulmonary neuroendocrine tumor
 8:                                                               childhood cancer, cardiotoxicity
 9:                                                                      childhood cancer, obesity
10:                                                                     testicular germ cell tumor
11:                                                              cutaneous squamous cell carcinoma
12:                                                              head and neck malignant neoplasia
13:                                                                                  ewing sarcoma
14:                                                                                carcinoid tumor
15:                                                    head and neck squamous cell carcinoma, pain
16:                              digestive system carcinoma, chronic obstructive pulmonary disease
17:                                                  cancer, chronic obstructive pulmonary disease
18:                                                                     digestive system carcinoma
19: heart failure, diabetes mellitus, stroke, atrial fibrillation, coronary artery disease, cancer
20: stroke, atrial fibrillation, coronary artery disease, heart failure, diabetes mellitus, cancer
21:                                                          head and neck squamous cell carcinoma
22:                                                             childhood cancer, breast carcinoma
23:                                                                         cancer, cardiotoxicity
24:                                                                               childhood cancer
25:                                                                childhood cancer, bone fracture
26:                                                                     cancer, hand-foot syndrome
27:                                          head and neck malignant neoplasia, osteoradionecrosis
28:                                                                        head and neck carcinoma
29:                                                                        digestive system cancer
30:                                                                     reproductive system cancer
31:                                                                 male reproductive organ cancer
32:                                                     childhood cancer, type 2 diabetes mellitus
33:                                                                        lymphatic system cancer
34:                                                              malignant tumor of floor of mouth
35:                                                                            nasal cavity cancer
36:                                                                         small intestine cancer
37:                                                      retroperitoneal cancer, peritoneum cancer
38:                                                                                         cancer
39:                                                               female reproductive organ cancer
40:                                                                      small intestine carcinoma
41:                                                                  cancer, peripheral neuropathy
42:                                                    childhood cancer, hearing loss, ototoxicity
43:                                                                 childhood cancer, hearing loss
44:                                                                              intestinal cancer
45:                                                                                vulvar neoplasm
46:                                                                               vulvar carcinoma
47:                                                                              in situ carcinoma
48:                                                                                      carcinoma
49:                                                                              lymphoid neoplasm
50:                                                           head and neck carcinoma, lung cancer
51:                                                                 cancer, immune system toxicity
52:                                                                         cancer, hypothyroidism
53:                                                                            urogenital neoplasm
54:                                  head and neck malignant neoplasia, radiation-induced disorder
55:                                                             cancer, radiation-induced disorder
56:                                            head and neck malignant neoplasia, neuropathic pain
57:                                                                          cancer, osteonecrosis
58:                                                             head and neck carcinoma, mucositis
59:                                                              head and neck carcinoma, fibrosis
60:                                                  head and neck carcinoma, mucositis, dysphagia
61:                                       head and neck carcinoma, fibrosis, dysphagia, xerostomia
62:                            head and neck carcinoma, fibrosis, mucositis, dysphagia, xerostomia
                                                                                 all_disease_terms
                                                                                                 l2_all_disease_terms
                                                                                                               <char>
 1:                                                                                         other malignant neoplasms
 2:                                                                                         other malignant neoplasms
 3:                                                                         cardiomyopathy, other malignant neoplasms
 4:                                                                                         other malignant neoplasms
 5:                                                                                         other malignant neoplasms
 6:                                                                                         other malignant neoplasms
 7:                                                                                         other malignant neoplasms
 8:                                                                         cardiotoxicity, other malignant neoplasms
 9:                                                                                other malignant neoplasms, obesity
10:                                                                                         other malignant neoplasms
11:                                                                                         other malignant neoplasms
12:                                                                                         other malignant neoplasms
13:                                                                                         other malignant neoplasms
14:                                                                                         other malignant neoplasms
15:                                                                                   other malignant neoplasms, pain
16:                                                  chronic obstructive pulmonary disease, other malignant neoplasms
17:                                                  other malignant neoplasms, chronic obstructive pulmonary disease
18:                                                                                         other malignant neoplasms
19: atrial fibrillation, other malignant neoplasms, coronary artery disease, diabetes mellitus, heart failure, stroke
20: atrial fibrillation, other malignant neoplasms, coronary artery disease, diabetes mellitus, heart failure, stroke
21:                                                                                         other malignant neoplasms
22:                                                                          breast cancer, other malignant neoplasms
23:                                                                         other malignant neoplasms, cardiotoxicity
24:                                                                                         other malignant neoplasms
25:                                                                          bone fracture, other malignant neoplasms
26:                                                                     other malignant neoplasms, hand-foot syndrome
27:                                                                     other malignant neoplasms, osteoradionecrosis
28:                                                                                         other malignant neoplasms
29:                                                                                         other malignant neoplasms
30:                                                                                         other malignant neoplasms
31:                                                                                         other malignant neoplasms
32:                                                               other malignant neoplasms, type 2 diabetes mellitus
33:                                                                                         other malignant neoplasms
34:                                                                                         other malignant neoplasms
35:                                                                                         other malignant neoplasms
36:                                                                                         other malignant neoplasms
37:                                                              other malignant neoplasms, other malignant neoplasms
38:                                                              other malignant neoplasms, other malignant neoplasms
39:                                                                                         other malignant neoplasms
40:                                                                                         other malignant neoplasms
41:                                                                  other malignant neoplasms, peripheral neuropathy
42:                                                              other malignant neoplasms, hearing loss, ototoxicity
43:                                                                           other malignant neoplasms, hearing loss
44:                                                                                         other malignant neoplasms
45:                                                                                         other malignant neoplasms
46:                                                                                         other malignant neoplasms
47:                                                                                         other malignant neoplasms
48:                                                                                         other malignant neoplasms
49:                                                                                         other malignant neoplasms
50:                                                      other malignant neoplasms, tracheal bronchus and lung cancer
51:                                                                 other malignant neoplasms, immune system toxicity
52:                                                                         other malignant neoplasms, hypothyroidism
53:                                                                                         other malignant neoplasms
54:                                                             other malignant neoplasms, radiation-induced disorder
55:                                                             other malignant neoplasms, radiation-induced disorder
56:                                                                       other malignant neoplasms, neuropathic pain
57:                                                                          other malignant neoplasms, osteonecrosis
58:                                                                              other malignant neoplasms, mucositis
59:                                                                               fibrosis, other malignant neoplasms
60:                                                                   dysphagia, other malignant neoplasms, mucositis
61:                                                        dysphagia, fibrosis, other malignant neoplasms, xerostomia
62:                                             dysphagia, fibrosis, other malignant neoplasms, mucositis, xerostomia
                                                                                                 l2_all_disease_terms

3.34 Other neoplasms

gwas_study_info =
  gwas_study_info |>
  mutate(l2_all_disease_terms = 
         case_when(
         l2_all_disease_terms == "benign neoplasm" ~ "other neoplasms",
         TRUE ~ l2_all_disease_terms
                 )
         )

unknown_sig_terms <- c("intracranial germ cell tumor",
                       "bladder tumor")

gwas_study_info =
gwas_study_info |> 
 mutate(l2_all_disease_terms  = 
          stringr::str_replace_all(l2_all_disease_terms ,
                                  pattern = paste0(unknown_sig_terms, collapse = "(?=,|$)|\\b"),
                                   "other neoplasms"
                          )
 )
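The paste0(..., collapse = "(?=,|$)|\\b") idiom places the separator between terms. As a result, the first term gets no leading \b and the last term gets no trailing (?=,|$).

The sketch below shows the consequence on a made-up entry. It uses base R, with gsub(perl = TRUE) standing in for stringr::str_replace_all(). make_term_pattern() is a hypothetical helper, not part of this analysis: it wraps all terms in one group so that every term gets both anchors.

```r
# Two of the terms mapped above
terms <- c("intracranial germ cell tumor", "bladder tumor")

# The idiom used above: the separator adds a lookahead after every term except
# the last and a word boundary before every term except the first
collapsed <- paste0(terms, collapse = "(?=,|$)|\\b")
cat(collapsed, "\n")

# Hypothetical helper: one non-capturing group, so every term is preceded by a
# word boundary and must be followed by a comma or the end of the string
make_term_pattern <- function(terms) {
  paste0("\\b(?:", paste(terms, collapse = "|"), ")(?=,|$)")
}
anchored <- make_term_pattern(terms)
cat(anchored, "\n")

# Made-up entry in which "bladder tumor" only begins a longer term
x <- "bladder tumor recurrence, asthma"
with_collapsed <- gsub(collapsed, "other neoplasms", x, perl = TRUE)
with_anchored  <- gsub(anchored,  "other neoplasms", x, perl = TRUE)
with_collapsed  # "other neoplasms recurrence, asthma"
with_anchored   # unchanged: "bladder tumor recurrence, asthma"
```

Because the last term in the collapsed pattern has no lookahead, it also matches the start of a longer term. The anchored version only replaces complete comma-separated terms.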

gwas_study_info |>
 filter(grepl("other neoplasms", l2_all_disease_terms)) |>
  select(all_disease_terms, l2_all_disease_terms) |>
  distinct() |>
  head()
                         all_disease_terms l2_all_disease_terms
                                    <char>               <char>
1:            benign prostatic hyperplasia      other neoplasms
2:                      colorectal adenoma      other neoplasms
3: colorectal cancer, endometrial neoplasm      other neoplasms
4:      upper aerodigestive tract neoplasm      other neoplasms
5:                              meningioma      other neoplasms
6:                 pituitary gland adenoma      other neoplasms

4 Cardiovascular diseases

4.1 Rheumatic heart disease

gwas_study_info |>
 filter(grepl("rheumatic heart disease", l2_all_disease_terms)) |>
  select(all_disease_terms, l2_all_disease_terms) |>
  distinct() |>
  head()
         all_disease_terms    l2_all_disease_terms
                    <char>                  <char>
1: rheumatic heart disease rheumatic heart disease

4.2 Ischemic heart disease

gwas_study_info |>
 filter(grepl("ischemic heart disease", l2_all_disease_terms)) |>
  select(all_disease_terms, l2_all_disease_terms) |>
  distinct() |>
  head()
Empty data.table (0 rows and 2 cols): all_disease_terms,l2_all_disease_terms

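Ischemic heart disease is treated here as equivalent to coronary artery disease (https://www.ncbi.nlm.nih.gov/books/NBK209964/), so coronary artery disease terms are renamed: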

gwas_study_info = gwas_study_info |> 
 mutate(l2_all_disease_terms  = 
          stringr::str_replace_all(l2_all_disease_terms,
                                  pattern = "coronary artery disease",
                                   "ischemic heart disease"
                          )  
        )

4.3 Stroke

gwas_study_info |>
 filter(grepl("stroke", l2_all_disease_terms)) |>
  select(all_disease_terms, l2_all_disease_terms) |>
  distinct() |>
  head()
                        all_disease_terms       l2_all_disease_terms
                                   <char>                     <char>
1:               intracerebral hemorrhage                     stroke
2:     non-lobar intracerebral hemorrhage                     stroke
3:         lobar intracerebral hemorrhage                     stroke
4:                    small vessel stroke                     stroke
5: alzheimer disease, small vessel stroke alzheimers disease, stroke
6:                        ischemic stroke                     stroke

4.4 Hypertensive heart disease

gwas_study_info |>
 filter(grepl("hypertensive heart disease", l2_all_disease_terms)) |>
  select(all_disease_terms, l2_all_disease_terms) |>
  distinct() |>
  head()
                            all_disease_terms
                                       <char>
1: hypertensive heart disease, kidney disease
2:                 hypertensive heart disease
                         l2_all_disease_terms
                                       <char>
1: hypertensive heart disease, kidney disease
2:                 hypertensive heart disease

4.5 Non-rheumatic heart disease

gwas_study_info |>
 filter(grepl("non-rheumatic heart disease", l2_all_disease_terms)) |>
  select(all_disease_terms, l2_all_disease_terms) |>
  distinct() |>
  head()
Empty data.table (0 rows and 2 cols): all_disease_terms,l2_all_disease_terms

4.6 Cardiomyopathy and myocarditis

gwas_study_info =
gwas_study_info |> 
 mutate(l2_all_disease_terms  = 
          stringr::str_replace_all(l2_all_disease_terms ,
                                  pattern = "\\bcardiomyopathy\\b|\\bmyocarditis\\b",
                                   "cardiomyopathy and myocarditis"
                          )  
        )

gwas_study_info |>
 filter(grepl("cardiomyopathy and myocarditis", l2_all_disease_terms)) |>
  select(all_disease_terms, l2_all_disease_terms) |>
  distinct() |>
  head()
                   all_disease_terms
                              <char>
1: idiopathic dilated cardiomyopathy
2:  childhood cancer, cardiomyopathy
3:       hypertrophic cardiomyopathy
4:            dilated cardiomyopathy
5:             chagas cardiomyopathy
6:         peripartum cardiomyopathy
                                        l2_all_disease_terms
                                                      <char>
1:                 idiopathic cardiomyopathy and myocarditis
2: cardiomyopathy and myocarditis, other malignant neoplasms
3:                            cardiomyopathy and myocarditis
4:                            cardiomyopathy and myocarditis
5:                            cardiomyopathy and myocarditis
6:                            cardiomyopathy and myocarditis

4.7 Pulmonary arterial hypertension

gwas_study_info |>
 filter(grepl("pulmonary arterial hypertension", l2_all_disease_terms)) |>
  select(all_disease_terms, l2_all_disease_terms) |>
  distinct() |>
  head()
Empty data.table (0 rows and 2 cols): all_disease_terms,l2_all_disease_terms

4.8 Atrial fibrillation & flutter

afib_terms <- c("atrial fibrillation",
                "atrial flutter",
                "post-operative atrial fibrillation")

gwas_study_info = 
gwas_study_info |> 
 mutate(l2_all_disease_terms  = 
          stringr::str_replace_all(l2_all_disease_terms ,
                                  pattern = paste0(afib_terms, collapse = "(?=,|$)|\\b"),
                                   "atrial fibrillation and flutter"
                          )  
        )

gwas_study_info |>
 filter(grepl("atrial fibrillation and flutter", l2_all_disease_terms)) |>
  select(all_disease_terms, l2_all_disease_terms) |>
  distinct() |>
  head()
                                                                                all_disease_terms
                                                                                           <char>
1:                                                                            atrial fibrillation
2: heart failure, diabetes mellitus, stroke, atrial fibrillation, coronary artery disease, cancer
3: stroke, atrial fibrillation, coronary artery disease, heart failure, diabetes mellitus, cancer
4:                                                                                 atrial flutter
5:                                                             post-operative atrial fibrillation
                                                                                                           l2_all_disease_terms
                                                                                                                         <char>
1:                                                                                              atrial fibrillation and flutter
2: atrial fibrillation and flutter, other malignant neoplasms, ischemic heart disease, diabetes mellitus, heart failure, stroke
3: atrial fibrillation and flutter, other malignant neoplasms, ischemic heart disease, diabetes mellitus, heart failure, stroke
4:                                                                                              atrial fibrillation and flutter
5:                                                                              atrial fibrillation and flutter, post-operative

4.9 Aortic aneurysm

url <- "http://www.ebi.ac.uk/ols4/api/ontologies/cvdo/terms/http%253A%252F%252Fpurl.obolibrary.org%252Fobo%252FDOID_3627/descendants"

aortic_aneurysm_terms <- get_descendants(url)
[1] "Number of terms collected:"
[1] 6
[1] "\n Some example terms"
[1] "ruptured thoracoabdominal aortic aneurysm"
[2] "ruptured abdominal aortic aneurysm"       
[3] "ruptured thoracic aortic aneurysm"        
[4] "abdominal aortic aneurysm"                
[5] "ruptured aortic aneurysm"                 
gwas_study_info = gwas_study_info |> 
 mutate(l2_all_disease_terms  = 
          stringr::str_replace_all(l2_all_disease_terms ,
                                  pattern = paste0(aortic_aneurysm_terms, collapse = "(?=,|$)|\\b"),
                                   "aortic aneurysm"
                          )  
        )

gwas_study_info |>
 filter(grepl("aortic aneurysm", l2_all_disease_terms)) |>
  select(all_disease_terms, l2_all_disease_terms) |>
  distinct() |>
  head()
                                                     all_disease_terms
                                                                <char>
1: thoracic aortic aneurysm, abdominal aortic aneurysm, brain aneurysm
2:                                           abdominal aortic aneurysm
3:                                            thoracic aortic aneurysm
4:                                                     aortic aneurysm
5:                           marfan syndrome, thoracic aortic aneurysm
               l2_all_disease_terms
                             <char>
1:  aortic aneurysm, brain aneurysm
2:                  aortic aneurysm
3:                  aortic aneurysm
4:                  aortic aneurysm
5: aortic aneurysm, marfan syndrome

4.10 Lower extremity peripheral arterial disease

4.11 Endocarditis

gwas_study_info |> 
 filter(grepl("endocarditis", l2_all_disease_terms)) |>
  select(all_disease_terms, l2_all_disease_terms) |>
  distinct() |>
  head()
                                         all_disease_terms
                                                    <char>
1: staphylococcus aureus infection, bacterial endocarditis
2:                                            endocarditis
                            l2_all_disease_terms
                                          <char>
1: endocarditis, staphylococcus aureus infection
2:                                  endocarditis

4.12 Other cardiovascular and circulatory diseases

5 Chronic respiratory diseases

5.1 Chronic obstructive pulmonary disease

gwas_study_info |>
 filter(grepl("chronic obstructive pulmonary disease", l2_all_disease_terms)) |>
  select(all_disease_terms, l2_all_disease_terms) |>
  distinct() |>
  head()
                                                   all_disease_terms
                                                              <char>
1:                             chronic obstructive pulmonary disease
2:         chronic obstructive pulmonary disease, chronic bronchitis
3: digestive system carcinoma, chronic obstructive pulmonary disease
4:                     cancer, chronic obstructive pulmonary disease
5:             lung carcinoma, chronic obstructive pulmonary disease
6:                     asthma, chronic obstructive pulmonary disease
                                                       l2_all_disease_terms
                                                                     <char>
1:                                    chronic obstructive pulmonary disease
2:                chronic bronchitis, chronic obstructive pulmonary disease
3:         chronic obstructive pulmonary disease, other malignant neoplasms
4:         other malignant neoplasms, chronic obstructive pulmonary disease
5: chronic obstructive pulmonary disease, tracheal bronchus and lung cancer
6:                            asthma, chronic obstructive pulmonary disease

5.2 Pneumoconiosis

gwas_study_info |>
 filter(grepl("pneumoconiosis", l2_all_disease_terms)) |>
  select(all_disease_terms, l2_all_disease_terms) |>
  distinct() |>
  head()
   all_disease_terms l2_all_disease_terms
              <char>               <char>
1:    pneumoconiosis       pneumoconiosis

5.3 Asthma

gwas_study_info |>
 filter(grepl("asthma", l2_all_disease_terms)) |>
  select(all_disease_terms, l2_all_disease_terms) |>
  distinct() |>
  head()
                                                all_disease_terms
                                                           <char>
1:                                                         asthma
2:                                       asthma, allergic disease
3:                                         childhood onset asthma
4: childhood onset asthma, respiratory symptom change measurement
5:                                         age of onset of asthma
6:                                         aspirin-induced asthma
          l2_all_disease_terms
                        <char>
1:                      asthma
2:    allergic disease, asthma
3:                      asthma
4: asthma, respiratory symptom
5:                      asthma
6:                      asthma

5.4 Interstitial lung disease and pulmonary sarcoidosis

interstitial_lung_disease_terms <- c("pulmonary sarcoidosis",
                                   "interstitial lung disease"
                                   )

gwas_study_info = 
gwas_study_info |> 
 mutate(l2_all_disease_terms  = 
          stringr::str_replace_all(l2_all_disease_terms ,
                                  pattern = paste0(interstitial_lung_disease_terms, 
                                                   collapse = "(?=,|$)|\\b"),
                                   "interstitial lung disease and pulmonary sarcoidosis"
                          )  
        )

gwas_study_info |>
 filter(grepl("interstitial lung disease and pulmonary sarcoidosis", l2_all_disease_terms)) |>
  select(all_disease_terms, l2_all_disease_terms) |>
  distinct() |>
  head()
                                 all_disease_terms
                                            <char>
1: rheumatoid arthritis, interstitial lung disease
2:                       interstitial lung disease
3: systemic scleroderma, interstitial lung disease
                                                        l2_all_disease_terms
                                                                      <char>
1: interstitial lung disease and pulmonary sarcoidosis, rheumatoid arthritis
2:                       interstitial lung disease and pulmonary sarcoidosis
3: interstitial lung disease and pulmonary sarcoidosis, systemic scleroderma

5.5 Other chronic respiratory diseases

gwas_study_info |>
 filter(grepl("other chronic respiratory diseases", l2_all_disease_terms)) |>
  select(all_disease_terms, l2_all_disease_terms) |>
  distinct() |>
  head()
Empty data.table (0 rows and 2 cols): all_disease_terms,l2_all_disease_terms

6 Digestive diseases

6.1 Cirrhosis & other chronic liver diseases

url <- "http://www.ebi.ac.uk/ols4/api/ontologies/snomed/terms/http%253A%252F%252Fsnomed.info%252Fid%252F328383001/descendants"

chronic_liver_disease_terms <- get_descendants(url)
[1] "Number of terms collected:"
[1] 114
[1] "\n Some example terms"
[1] "hepatic ascites co-occurrent with chronic active hepatitis due to toxic liver disease"
[2] "cirrhosis of liver co-occurrent and due to primary sclerosing cholangitis (disorder)" 
[3] "chronic hepatitis c co-occurrent with human immunodeficiency virus infection"         
[4] "primary biliary cirrhosis co-occurrent with systemic scleroderma (disorder)"          
[5] "pulmonary fibrosis, hepatic hyperplasia, bone marrow hypoplasia syndrome"             
chronic_liver_disease_terms <- c("primary biliary cirrhosis",
                                 "alcoholic liver cirrhosis",
                                 "chronic hepatitis b virus infection", # lowercase, like the other trait terms
                                 "acute-on-chronic liver failure",
                                 chronic_liver_disease_terms)
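Labels returned by get_descendants() are pasted into the pattern as regular expressions, so any metacharacters in them change what is matched. One of the example labels printed above ends in "(disorder)". As a regex, that is a capture group that matches "disorder" without the parentheses.

The sketch below assumes the labels should be matched literally. escape_regex() is a hypothetical helper.

```r
# The first label is copied from the example terms printed above
labels <- c("cirrhosis of liver co-occurrent and due to primary sclerosing cholangitis (disorder)",
            "alcoholic liver cirrhosis")

# Hypothetical helper: backslash-escape regex metacharacters so each label
# is matched literally inside a larger regular expression
escape_regex <- function(x) {
  gsub("([.|()\\\\^{}+$*?\\[\\]])", "\\\\\\1", x, perl = TRUE)
}

# Unescaped, "(disorder)" is a capturing group, so the label does not
# match its own text; escaped, it does
unescaped_hits <- grepl(labels[1], labels[1], perl = TRUE)
escaped_hits   <- grepl(escape_regex(labels[1]), labels[1], perl = TRUE)
unescaped_hits  # FALSE
escaped_hits    # TRUE
```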

gwas_study_info = gwas_study_info |> 
 mutate(l2_all_disease_terms  = 
          stringr::str_replace_all(l2_all_disease_terms,
                                  pattern = paste0(chronic_liver_disease_terms, collapse = "(?=,|$)|\\b"),
                                   "cirrhosis and other chronic liver diseases"
                          )  
        )

gwas_study_info = gwas_study_info |> 
 mutate(l2_all_disease_terms  = 
          stringr::str_replace_all(l2_all_disease_terms,
                                  pattern = "liver disease",
                                   "cirrhosis and other chronic liver diseases"
                          )  
        )

6.2 Gallbladder and biliary diseases

gal_bile_terms = c("gallbladder disease",
                   "bile duct disorder",
                   "biliary tract disease")


gwas_study_info = gwas_study_info |> 
 mutate(l2_all_disease_terms  = 
          stringr::str_replace_all(l2_all_disease_terms,
                                  pattern = paste0(gal_bile_terms, collapse = "(?=,|$)|\\b"),
                                   "gallbladder and biliary diseases"
                          )  
        )

gwas_study_info |>
 filter(grepl("gallbladder and biliary diseases", l2_all_disease_terms)) |>
  select(all_disease_terms, l2_all_disease_terms) |>
  distinct() |>
  head()
                                        all_disease_terms
                                                   <char>
1:                                     bile duct disorder
2:                                    gallbladder disease
3:                                  biliary tract disease
4:                      non-neoplastic bile duct disorder
5: biliary tract disease, pancreas disease, liver disease
                                                                             l2_all_disease_terms
                                                                                           <char>
1:                                                               gallbladder and biliary diseases
2:                                                               gallbladder and biliary diseases
3:                                                               gallbladder and biliary diseases
4:                                                non-neoplastic gallbladder and biliary diseases
5: gallbladder and biliary diseases, cirrhosis and other chronic liver diseases, pancreas disease
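Because str_replace_all() rewrites substrings, a longer term that contains a listed term keeps its qualifier. Row 4 above shows this: "non-neoplastic bile duct disorder" becomes "non-neoplastic gallbladder and biliary diseases". An entry can also end up listing the same category twice, as in "other malignant neoplasms, other malignant neoplasms" earlier.

The sketch below shows an alternative in base R. It is an assumption, not the approach used in this analysis: split each entry on ", ", map whole terms through a named lookup vector, drop duplicates, and rejoin. lookup and map_terms() are hypothetical names.

```r
# Hypothetical lookup table: names are original terms, values are GBD-style labels
lookup <- c("bile duct disorder"    = "gallbladder and biliary diseases",
            "gallbladder disease"   = "gallbladder and biliary diseases",
            "biliary tract disease" = "gallbladder and biliary diseases")

# Map each comma-separated term through the lookup as a whole string;
# unmatched terms are kept as they are, and duplicates within an entry dropped
map_terms <- function(entries, lookup) {
  vapply(strsplit(entries, ", ", fixed = TRUE), function(terms) {
    mapped <- ifelse(terms %in% names(lookup), lookup[terms], terms)
    paste(unique(mapped), collapse = ", ")
  }, character(1))
}

entries <- c("bile duct disorder",
             "non-neoplastic bile duct disorder",
             "biliary tract disease, gallbladder disease")
map_terms(entries, lookup)
```

With this approach, whether a qualified term such as "non-neoplastic bile duct disorder" belongs in the category becomes an explicit entry in the lookup table. It is no longer a side effect of substring matching.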

6.3 Appendicitis

gwas_study_info |>
 filter(grepl("appendicitis", l2_all_disease_terms)) |>
  select(all_disease_terms, l2_all_disease_terms) |>
  distinct() |>
  head()
   all_disease_terms l2_all_disease_terms
              <char>               <char>
1:      appendicitis         appendicitis

6.4 Paralytic ileus and intestinal obstruction

gwas_study_info |>
 filter(grepl("paralytic ileus and intestinal obstruction", l2_all_disease_terms)) |>
  select(all_disease_terms, l2_all_disease_terms) |>
  distinct() |>
  head()
Empty data.table (0 rows and 2 cols): all_disease_terms,l2_all_disease_terms

7 Rename disease terms to match GBD categories more closely

7.1 Acne -> acne vulgaris

gwas_study_info = gwas_study_info |> 
 mutate(l2_all_disease_terms  = 
          stringr::str_replace_all(l2_all_disease_terms ,
                                  pattern = "acne",
                                   "acne vulgaris"
                          )  
        )
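The replacement "acne vulgaris" still contains the pattern "acne". Any entry that already reads "acne vulgaris" (an assumption here) would therefore become "acne vulgaris vulgaris". The same re-matching produces the "disorderss" strings handled in section 13.1.

The sketch below avoids this by renaming a term only when it is a complete comma-separated term. rename_whole_term() is a hypothetical helper, and gsub(perl = TRUE) stands in for str_replace_all().

```r
# Hypothetical helper: rename `from` only when it is a complete
# comma-separated term, so the output of one rename is not re-matched
rename_whole_term <- function(x, from, to) {
  gsub(paste0("(^|, )", from, "(?=,|$)"), paste0("\\1", to), x, perl = TRUE)
}

x <- c("acne", "acne, rosacea", "acne vulgaris")

substring_version <- gsub("acne", "acne vulgaris", x)  # third entry gets doubled
once  <- rename_whole_term(x, "acne", "acne vulgaris")
twice <- rename_whole_term(once, "acne", "acne vulgaris")  # same as `once`
```

Applying the whole-term rename a second time changes nothing, so repeated or reordered renaming steps cannot stack suffixes.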

7.2 ADHD

gwas_study_info = gwas_study_info |> 
 mutate(l2_all_disease_terms  = 
          stringr::str_replace_all(l2_all_disease_terms ,
                                  pattern = "adhd",
                                   "attention-deficit/hyperactivity disorder"
                          )  
        )

7.4 Alcohol use disorders

gwas_study_info = 
gwas_study_info |> 
 mutate(l2_all_disease_terms  = 
          stringr::str_replace_all(l2_all_disease_terms ,
                                  pattern = "alcohol-related disorders|alcohol and nicotine codependence",
                                   "alcohol use disorders"
                          )  
        )

gwas_study_info = 
gwas_study_info |> 
 mutate(l2_all_disease_terms  = 
          stringr::str_replace_all(l2_all_disease_terms ,
                                  pattern = "alcohol use disorder",
                                   "alcohol use disorders"
                          )  
        )

7.5 Alzheimer’s disease and other dementias

dementia <- c("alzheimers disease biomarker measurement",
              "alzheimers disease neuropathologic change",
              "aids dementia",
              "dementia",
              "frontotemporal dementia",
              "lewy body dementia",
              "vascular dementia",
              "alzheimers disease"
)

gwas_study_info = 
gwas_study_info |> 
 mutate(l2_all_disease_terms  = 
          stringr::str_replace_all(l2_all_disease_terms ,
                                  pattern = paste0(dementia, collapse = "(?=,|$)|\\b"),
                                   "alzheimer's disease and other dementias"
                          )  
        )

7.6 Anxiety disorders

url <- "http://www.ebi.ac.uk/ols4/api/ontologies/mesh/terms/http%253A%252F%252Fid.nlm.nih.gov%252Fmesh%252FD001008/descendants"

anxiety_terms <- get_descendants(url)
[1] "Number of terms collected:"
[1] 15
[1] "\n Some example terms"
[1] "obsessive-compulsive disorder" "generalized anxiety disorder" 
[3] "neurocirculatory asthenia"     "excoriation disorder"         
[5] "anxiety, separation"          
anxiety_terms <- c(anxiety_terms, "obsessive-compulsive symptom measurement")

gwas_study_info = 
gwas_study_info |> 
 mutate(l2_all_disease_terms  = 
          stringr::str_replace_all(l2_all_disease_terms ,
                                  pattern = paste0(anxiety_terms, collapse = "(?=,|$)|\\b"),
                                   "anxiety disorders"
                          )  
        ) |>
   mutate(l2_all_disease_terms  = 
          stringr::str_replace_all(l2_all_disease_terms ,
                                  pattern = "anxiety disorder|anxiety measurement",
                                   "anxiety disorders"
                          )  
        ) |>
     mutate(l2_all_disease_terms  = 
          stringr::str_replace_all(l2_all_disease_terms ,
                                  pattern = "anxiety",
                                   "anxiety disorders"
                          )  
        )
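Each mutate() above works on the output of the previous one. The replacement text "anxiety disorders" still contains the patterns of steps 2 and 3, so those steps match it again. This is where the doubled suffixes cleaned up in section 13.1 come from.

The base-R trace below runs the three steps on made-up inputs, with step 1 reduced to a single label from anxiety_terms.

```r
# Made-up inputs; "generalized anxiety disorder" is one of the MeSH labels printed above
x <- c("anxiety", "anxiety disorder", "generalized anxiety disorder")

# Step 1, simplified to a single label from anxiety_terms
x <- gsub("generalized anxiety disorder", "anxiety disorders", x)
# Step 2: also matches the "anxiety disorder" inside "anxiety disorders"
x <- gsub("anxiety disorder|anxiety measurement", "anxiety disorders", x)
# Step 3: matches every remaining "anxiety", including already-renamed ones
x <- gsub("anxiety", "anxiety disorders", x)
x
```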

7.7 Amphetamine use disorders

gwas_study_info = 
gwas_study_info |> 
 mutate(l2_all_disease_terms  = 
          stringr::str_replace_all(l2_all_disease_terms ,
                                  pattern = "methamphetamine",
                                   "amphetamine"
                          )  
        )

7.8 Autism spectrum disorders

gwas_study_info = gwas_study_info |> 
 mutate(l2_all_disease_terms  = 
          stringr::str_replace_all(l2_all_disease_terms,
                                  pattern = "autism",
                                   "autism spectrum disorders"
                          )  
        )

7.9 Blindness and vision loss

vision_loss_terms <- c("blindness",
                       "color vision disorder",
                       "vision disorder",
                       "myopia",
                       "refractive error",
                       "hyperopia",
                       "astigmatism",
                       "corneal astigmatism",
                       "presbyopia",
                       "anisometropia",
                       "esotropia",
                       "non-accomodative esotropia",
                       "accommodative esotropia")

gwas_study_info = gwas_study_info |> 
 mutate(l2_all_disease_terms  = 
          stringr::str_replace_all(l2_all_disease_terms ,
                                  pattern = paste0(vision_loss_terms, collapse = "(?=,|$)|\\b"),
                                   "blindness and vision loss"
                          )  
        )

7.10 Cannabis use disorders

gwas_study_info = 
gwas_study_info |> 
 mutate(l2_all_disease_terms  = 
          stringr::str_replace_all(l2_all_disease_terms ,
                                  pattern = "cannabis dependence",
                                   "cannabis use disorders"
                          )  
        )

7.11 Cocaine use disorders

gwas_study_info = gwas_study_info |> 
 mutate(l2_all_disease_terms  = 
          stringr::str_replace_all(l2_all_disease_terms ,
                                  pattern = "cocaine-related disorders",
                                   "cocaine use disorders"
                          )  
        )

7.12 Depressive disorders

gwas_study_info = gwas_study_info |> 
 mutate(l2_all_disease_terms  = 
          stringr::str_replace_all(l2_all_disease_terms ,
                                  pattern = "depressive symptom measurement|major depressive disorder",
                                   "depressive disorders"
                          )  
        ) |>
   mutate(l2_all_disease_terms  = 
          stringr::str_replace_all(l2_all_disease_terms ,
                                  pattern = "depressive disorder",
                                   "depressive disorders"
                          )  
        )

7.13 Gynecological diseases

diseases <- stringr::str_split(pattern = ", ", 
                               gwas_study_info$l2_all_disease_terms)  |> 
            unlist() |>
            stringr::str_trim()


pregnancy_terms <- grep("pregnancy", diseases, value = TRUE)

gyno_terms <- c("endometriosis","placenta disease", pregnancy_terms)


gwas_study_info = gwas_study_info |> 
 mutate(l2_all_disease_terms  = 
          stringr::str_replace_all(l2_all_disease_terms,
                                  pattern = paste0(gyno_terms, collapse = "(?=,|$)|\\b"),
                                   "gynecological diseases"
                          )  
        )

7.14 Eating disorders

gwas_study_info = gwas_study_info |> 
 mutate(l2_all_disease_terms  = 
          stringr::str_replace_all(l2_all_disease_terms ,
                                  pattern = "bulimia nervosa|anorexia nervosa|binge eating|eating disorder",
                                  "eating disorders"
                          )  
        ) |>
   mutate(l2_all_disease_terms  = 
          stringr::str_replace_all(l2_all_disease_terms ,
                                  pattern = "anorexia",
                                  "eating disorders"
                          )  
        )
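stringr uses ICU regular expressions, which, like PCRE, take the first alternative that matches at a given position rather than the longest one. Consider a hypothetical entry "binge eating disorder": the first step above matches only "binge eating" and leaves " disorder" behind.

The sketch below reproduces this with gsub(perl = TRUE) in place of str_replace_all(). Listing the longer term first is one possible remedy; it is assumed here, not taken from the analysis.

```r
x <- "binge eating disorder"  # hypothetical input

# Pattern from the first step above: at position 0 "binge eating" is the
# first alternative that matches, so " disorder" is left over
as_written <- gsub("bulimia nervosa|anorexia nervosa|binge eating|eating disorder",
                   "eating disorders", x, perl = TRUE)

# Listing the longer term first lets it win at the same starting position
longest_first <- gsub("bulimia nervosa|anorexia nervosa|binge eating disorder|binge eating|eating disorder",
                      "eating disorders", x, perl = TRUE)

as_written     # "eating disorders disorder"
longest_first  # "eating disorders"
```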

7.15 Headache disorders

gwas_study_info = gwas_study_info |> 
 mutate(l2_all_disease_terms  = 
          stringr::str_replace_all(l2_all_disease_terms,
                                  pattern = "headache disorder|cluster headache|migraine",
                                   "headache disorders"
                          )  
        )

7.16 Opioid use disorders

gwas_study_info = 
gwas_study_info |> 
 mutate(l2_all_disease_terms  = 
          stringr::str_replace_all(l2_all_disease_terms ,
                                  pattern = "opioid dependence|opioid use disorder",
                                   "opioid use disorders"
                          )  
        )

7.17 Parkinson's disease

gwas_study_info = 
gwas_study_info |> 
 mutate(l2_all_disease_terms  = 
          stringr::str_replace_all(l2_all_disease_terms ,
                                  pattern = "parkinsons disease",
                                   "parkinson's disease"
                          )  
        )

7.18 Personality disorders

url <- "http://www.ebi.ac.uk/ols4/api/ontologies/mondo/terms/http%253A%252F%252Fpurl.obolibrary.org%252Fobo%252FMONDO_0002028/descendants"

personality_disorders <- get_descendants(url)
[1] "Number of terms collected:"
[1] 10
[1] "\n Some example terms"
[1] "obsessive-compulsive personality disorder"
[2] "narcissistic personality disorder"        
[3] "schizotypal personality disorder"         
[4] "histrionic personality disorder"          
[5] "antisocial personality disorder"          
gwas_study_info = gwas_study_info |> 
 mutate(l2_all_disease_terms  = 
          stringr::str_replace_all(l2_all_disease_terms,
                                  pattern = paste0(personality_disorders, collapse = "(?=,|$)|\\b"),
                                   "personality disorders"
                          )  
        )

7.19 Other drug use disorders

gwas_study_info = 
gwas_study_info |> 
 mutate(l2_all_disease_terms  = 
          stringr::str_replace_all(l2_all_disease_terms ,
                                  pattern = "heroin dependence|drug dependence|nicotine dependence|substance abuse|drug misuse|alcohol use disorders delirium",
                                   "other drug use disorders"
                          )  
        )

7.20 Sleep disorders

url <- "http://www.ebi.ac.uk/ols4/api/ontologies/doid/terms/http%253A%252F%252Fpurl.obolibrary.org%252Fobo%252FDOID_535/descendants"

sleep_disorders <- get_descendants(url)
[1] "Number of terms collected:"
[1] 16
[1] "\n Some example terms"
[1] "periodic limb movement disorder" "advanced sleep phase syndrome 3"
[3] "advanced sleep phase syndrome 2" "advanced sleep phase syndrome 1"
[5] "advanced sleep phase syndrome 4"
url <- "http://www.ebi.ac.uk/ols4/api/ontologies/efo/terms/http%253A%252F%252Fwww.ebi.ac.uk%252Fefo%252FEFO_0008568/descendants"

other_sleep_disorders <- get_descendants(url)
[1] "Number of terms collected:"
[1] 26
[1] "\n Some example terms"
[1] "autosomal dominant cerebellar ataxia, deafness and narcolepsy"
[2] "hereditary sensory neuropathy-deafness-dementia syndrome"     
[3] "rapid eye movement sleep disorder"                            
[4] "substance-induced sleep disorder"                             
[5] "drug induced central sleep apnea"                             
sleep_disorders <- c(sleep_disorders,
                    other_sleep_disorders)

gwas_study_info = gwas_study_info |> 
 mutate(l2_all_disease_terms  = 
          stringr::str_replace_all(l2_all_disease_terms,
                                  pattern = paste0(sleep_disorders, collapse = "(?=,|$)|\\b"),
                                   "sleep disorders"
                          )  
        )

9 Malignant skin melanoma

Also remove Alzheimer's disease, Parkinson's disease, and dementia terms.

10 Other mental disorders

other_mental_disorders <- c("schizophrenia",
                            "manic or hypomanic episode",
                            "mental or behavioural disorder",
                            "mental disorder"
)

11 Other neurological disorders

other_neuro <- c("mild neurocognitive disorder",
                 "hiv-associated neurocognitive disorder")

12 Other sensory

disturbances of sensation of smell and taste

13 Check compatibility with GBD data

13.1 Fix doubled suffixes from chained replacements

# Fix the specific duplicated label first: once "disorderss" has been
# rewritten to "disorders", this longer pattern can no longer match.
# Then collapse any remaining doubled plural "disorderss".
gwas_study_info <-
  gwas_study_info |>
  mutate(l2_all_disease_terms = l2_all_disease_terms |>
           stringr::str_replace_all(stringr::fixed("anxiety disorders disorderss"),
                                    "anxiety disorders") |>
           stringr::str_replace_all(stringr::fixed("disorderss"), "disorders"))
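The generic "disorderss" fix also rewrites part of "anxiety disorders disorderss", so the order of the two replacements matters. Here is a toy check with base `gsub(fixed = TRUE)` on a made-up string:

```r
x <- "anxiety disorders disorderss, asthma"

# Generic fix first: the specific pattern no longer exists afterwards,
# so the duplicated word survives
generic_first <- gsub("anxiety disorders disorderss", "anxiety disorders",
                      gsub("disorderss", "disorders", x, fixed = TRUE),
                      fixed = TRUE)

# Specific fix first, then the generic one
specific_first <- gsub("disorderss", "disorders",
                       gsub("anxiety disorders disorderss", "anxiety disorders",
                            x, fixed = TRUE),
                       fixed = TRUE)

generic_first   # "anxiety disorders disorders, asthma"
specific_first  # "anxiety disorders, asthma"
```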
gbd_data <- data.table::fread(here::here("data/gbd/ihme_gbd_2019_global_disease_burden_rate_all_ages.csv"))

gbd_data$cause <- stringr::str_remove_all(gbd_data$cause, ",")

diseases <- stringr::str_split(pattern = ", ", 
                               gwas_study_info$l2_all_disease_terms[gwas_study_info$l2_all_disease_terms != ""])  |> 
            unlist() |>
            stringr::str_trim()
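Stripping commas from the GBD cause names keeps them compatible with the harmonized GWAS terms, which are split on ", ". The toy base-R check below uses the GBD cause "Tracheal, bronchus, and lung cancer". Its comma-free, lower-case form appears in the join output later in this section. The second entry, asthma, is only filler for the example.

```r
gbd_name <- "Tracheal, bronchus, and lung cancer"

# With the commas kept, one cause falls apart into three fragments
with_commas <- paste(c(tolower(gbd_name), "asthma"), collapse = ", ")
pieces_with <- strsplit(with_commas, ", ", fixed = TRUE)[[1]]
pieces_with      # "tracheal" "bronchus" "and lung cancer" "asthma"

# With the commas removed (as done for gbd_data$cause), the cause stays whole
no_commas <- tolower(gsub(",", "", gbd_name, fixed = TRUE))
pieces_without <- strsplit(paste(c(no_commas, "asthma"), collapse = ", "),
                           ", ", fixed = TRUE)[[1]]
pieces_without   # "tracheal bronchus and lung cancer" "asthma"
```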

gbd_data$cause[!tolower(gbd_data$cause) %in% unique(diseases)] |> sort() |> unique()
 [1] "Acute glomerulonephritis"                                         
 [2] "Bacterial skin diseases"                                          
 [3] "Chronic kidney disease"                                           
 [4] "Congenital birth defects"                                         
 [5] "Diabetes mellitus type 1"                                         
 [6] "Diabetes mellitus type 2"                                         
 [7] "Drug use disorders"                                               
 [8] "Endocrine metabolic blood and immune disorders"                   
 [9] "Fungal skin diseases"                                             
[10] "Hemoglobinopathies and hemolytic anemias"                         
[11] "Idiopathic developmental intellectual disability"                 
[12] "Idiopathic epilepsy"                                              
[13] "Inguinal femoral and abdominal hernia"                            
[14] "Lower extremity peripheral arterial disease"                      
[15] "Neuroblastoma and other peripheral nervous cell tumors"           
[16] "Non-rheumatic valvular heart disease"                             
[17] "Oral disorders"                                                   
[18] "Other cardiovascular and circulatory diseases"                    
[19] "Other chronic respiratory diseases"                               
[20] "Other digestive diseases"                                         
[21] "Other mental disorders"                                           
[22] "Other musculoskeletal disorders"                                  
[23] "Other neurological disorders"                                     
[24] "Other sense organ diseases"                                       
[25] "Other skin and subcutaneous diseases"                             
[26] "Paralytic ileus and intestinal obstruction"                       
[27] "Pulmonary Arterial Hypertension"                                  
[28] "Scabies"                                                          
[29] "Sudden infant death syndrome"                                     
[30] "Total burden related to Non-alcoholic fatty liver disease (NAFLD)"
[31] "Upper digestive system diseases"                                  
[32] "Urinary diseases and male infertility"                            
[33] "Vascular intestinal disorders"                                    
[34] "Viral skin diseases"                                              
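The list above covers only one direction: GBD causes with no matching GWAS term. The minimal base-R sketch below reports both directions with `setdiff()`. `compare_vocab()` is a hypothetical helper and the toy vectors are made up. They do mirror one pair visible in this section: the GBD cause "Diabetes mellitus type 2" against the GWAS term "type 2 diabetes mellitus" from the frequency table.

```r
# Hypothetical helper: compare two vocabularies case-insensitively
compare_vocab <- function(gbd_causes, gwas_terms) {
  gbd  <- unique(tolower(gbd_causes))
  gwas <- unique(tolower(gwas_terms))
  list(gbd_only  = sort(setdiff(gbd, gwas)),   # GBD causes with no GWAS term
       gwas_only = sort(setdiff(gwas, gbd)))   # GWAS terms with no GBD cause
}

res <- compare_vocab(
  gbd_causes = c("Asthma", "Diabetes mellitus type 2", "Scabies"),
  gwas_terms = c("asthma", "type 2 diabetes mellitus", "asthma")
)
res$gbd_only   # "diabetes mellitus type 2" "scabies"
res$gwas_only  # "type 2 diabetes mellitus"
```

Near-misses that show up on both sides, such as word-order differences, are the easiest to spot this way.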
gbd_data =
  gbd_data |>
  mutate(cause = tolower(cause))

gwas_disease_traits = data.frame(cause = diseases)
  # gwas_study_info |>
  # filter(DISEASE_STUDY == T) |>
  # select(all_disease_terms, l1_all_disease_terms, cause = l2_all_disease_terms) |>
  # distinct()

left_join(gwas_disease_traits, 
          gbd_data) |>
  head()
Joining with `by = join_by(cause)`
Warning in left_join(gwas_disease_traits, gbd_data): Detected an unexpected many-to-many relationship between `x` and `y`.
ℹ Row 3 of `x` matches multiple rows in `y`.
ℹ Row 19 of `y` matches multiple rows in `x`.
ℹ If a many-to-many relationship is expected, set `relationship =
  "many-to-many"` to silence this warning.
                                      cause
1 idiopathic cardiomyopathy and myocarditis
2                                 cleft lip
3         tracheal bronchus and lung cancer
4         tracheal bronchus and lung cancer
5         tracheal bronchus and lung cancer
6         tracheal bronchus and lung cancer
                                 measure location  sex      age metric year
1                                   <NA>     <NA> <NA>     <NA>   <NA>   NA
2                                   <NA>     <NA> <NA>     <NA>   <NA>   NA
3 DALYs (Disability-Adjusted Life Years)   Global Both All ages   Rate 2019
4                             Prevalence   Global Both All ages   Rate 2019
5                              Incidence   Global Both All ages   Rate 2019
6 DALYs (Disability-Adjusted Life Years)   Global Both All ages   Rate 2019
        val     upper     lower
1        NA        NA        NA
2        NA        NA        NA
3 580.36100 627.79984 532.74652
4  40.27440  43.51721  37.12978
5  28.16826  30.49575  25.77712
6 580.36100 627.79984 532.74652
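The many-to-many warning comes from duplicates on both sides. `gwas_disease_traits` has one row per occurrence of a term. `gbd_data` has one row per measure (DALYs, prevalence, incidence) for each cause, as rows 3 to 5 above show. The sketch below runs on toy data with made-up values. It first collapses the GWAS side to one row per term with `dplyr::count()`, then states the join key and the expected relationship explicitly. The `relationship` argument is available in dplyr's join functions from version 1.1.0.

```r
library(dplyr)

# Toy stand-ins for gwas_disease_traits and gbd_data (values are made up)
gwas_terms <- tibble(cause = c("asthma", "asthma", "cleft lip"))
gbd <- tibble(cause   = c("asthma", "asthma"),
              measure = c("Prevalence", "Incidence"),
              val     = c(1.5, 0.2))

joined <- gwas_terms |>
  count(cause, name = "n_terms") |>          # one row per distinct term
  left_join(gbd, by = "cause",
            relationship = "one-to-many")    # each GBD row matches <= 1 term

joined
```

If the GWAS side ever had a repeated key again, dplyr would raise an error here instead of a warning.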
gwas_study_info |> select(cause = l2_all_disease_terms) |>
  distinct() |>
  left_join(gbd_data) |>
  head()
Joining with `by = join_by(cause)`
                                       cause
                                      <char>
1:                                          
2: idiopathic cardiomyopathy and myocarditis
3:                                 cleft lip
4:         tracheal bronchus and lung cancer
5:         tracheal bronchus and lung cancer
6:         tracheal bronchus and lung cancer
                                  measure location    sex      age metric  year
                                   <char>   <char> <char>   <char> <char> <int>
1:                                   <NA>     <NA>   <NA>     <NA>   <NA>    NA
2:                                   <NA>     <NA>   <NA>     <NA>   <NA>    NA
3:                                   <NA>     <NA>   <NA>     <NA>   <NA>    NA
4: DALYs (Disability-Adjusted Life Years)   Global   Both All ages   Rate  2019
5:                             Prevalence   Global   Both All ages   Rate  2019
6:                              Incidence   Global   Both All ages   Rate  2019
         val     upper     lower
       <num>     <num>     <num>
1:        NA        NA        NA
2:        NA        NA        NA
3:        NA        NA        NA
4: 580.36100 627.79984 532.74652
5:  40.27440  43.51721  37.12978
6:  28.16826  30.49575  25.77712
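Joining the distinct `l2_all_disease_terms` strings directly can only match rows that hold a single term. The GBD cause names had their commas removed, so no cause can equal a string such as "asthma, cleft lip". The base-R sketch below expands each comma-separated string into one row per term while keeping the other columns. `explode_terms()` and the toy data frame are hypothetical.

```r
# Hypothetical helper: one row per comma-separated term in column `col`
explode_terms <- function(df, col, sep = ", ") {
  parts <- strsplit(df[[col]], sep, fixed = TRUE)
  # Repeat each row once per term; an empty string splits into zero terms,
  # so such rows are dropped
  out <- df[rep(seq_len(nrow(df)), lengths(parts)), , drop = FALSE]
  out[[col]] <- trimws(unlist(parts))
  rownames(out) <- NULL
  out
}

studies <- data.frame(study = c("s1", "s2", "s3"),
                      cause = c("asthma, cleft lip", "", "asthma"),
                      stringsAsFactors = FALSE)
explode_terms(studies, "cause")
# study s1 -> "asthma", s1 -> "cleft lip", s3 -> "asthma"
```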
diseases <- stringr::str_split(pattern = ", ", 
                               gwas_study_info$l2_all_disease_terms[gwas_study_info$l2_all_disease_terms != ""])  |> 
            unlist() |>
            stringr::str_trim()

length(unique(diseases))
[1] 1596
# make frequency table
freq <- table(as.factor(diseases))

# sort in decreasing order
freq_sorted <- sort(freq, decreasing = TRUE)

# show top N, e.g. top 10
head(freq_sorted, 10)

                         kidney disease                            hypertension 
                                  10915                                    7096 
               type 2 diabetes mellitus                         other neoplasms 
                                    922                                     537 
                   depressive disorders alzheimer's disease and other dementias 
                                    513                                     509 
                 ischemic heart disease                           breast cancer 
                                    501                                     379 
                          schizophrenia                                  asthma 
                                    368                                     348 
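`table()` on the pooled `diseases` vector counts every occurrence. A harmonized string that lists the same term twice therefore contributes two counts. That can happen once several raw traits have been collapsed into one GBD label. If the quantity of interest is the number of studies per term, one option is to deduplicate within each string before tabulating. Here is a base-R sketch on made-up strings:

```r
l2_terms <- c("kidney disease, kidney disease, hypertension",
              "kidney disease",
              "hypertension")

per_entry <- lapply(strsplit(l2_terms, ", ", fixed = TRUE), trimws)

# Occurrence counts (what table(diseases) measures)
occurrences <- table(unlist(per_entry))
# Study counts: each term counted at most once per string
per_study <- table(unlist(lapply(per_entry, unique)))

as.integer(occurrences["kidney disease"])  # 3
as.integer(per_study["kidney disease"])    # 2
```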

13.1.1 Save the updated gwas_study_info with harmonized disease terms

# Write to disk only; fwrite() returns NULL, so its result is not assigned
data.table::fwrite(gwas_study_info,
                   here::here("output/gwas_cat/gwas_study_info_trait_group_l2.csv"))
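`data.table::fwrite()` is called for its side effect of writing the file and returns `NULL` invisibly. Its result should not be assigned back to the table being saved. The small round-trip sketch below writes a toy table to a temporary file, not the project's output path, and reads it back. The quoted field containing a comma survives the trip.

```r
library(data.table)

toy <- data.table(study = c("s1", "s2"),
                  l2_all_disease_terms = c("asthma, cleft lip", "hypertension"))
path <- tempfile(fileext = ".csv")

ret  <- fwrite(toy, path)   # writes the file; the return value is NULL
back <- fread(path)         # read it back to confirm the contents survived

is.null(ret)                                                         # TRUE
identical(back$l2_all_disease_terms, toy$l2_all_disease_terms)       # TRUE
```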

sessionInfo()
R version 4.3.1 (2023-06-16)
Platform: aarch64-apple-darwin20 (64-bit)
Running under: macOS 15.6.1

Matrix products: default
BLAS:   /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib/libRblas.0.dylib 
LAPACK: /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib/libRlapack.dylib;  LAPACK version 3.11.0

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

time zone: America/Los_Angeles
tzcode source: internal

attached base packages:
[1] stats     graphics  grDevices datasets  utils     methods   base     

other attached packages:
[1] jsonlite_2.0.0    httr_1.4.7        stringr_1.5.1     ggplot2_3.5.2    
[5] data.table_1.17.8 dplyr_1.1.4       workflowr_1.7.1  

loaded via a namespace (and not attached):
 [1] gtable_0.3.6       compiler_4.3.1     renv_1.0.3         promises_1.3.3    
 [5] tidyselect_1.2.1   Rcpp_1.1.0         git2r_0.36.2       callr_3.7.6       
 [9] later_1.4.2        jquerylib_0.1.4    scales_1.4.0       yaml_2.3.10       
[13] fastmap_1.2.0      here_1.0.1         R6_2.6.1           generics_0.1.4    
[17] curl_6.4.0         knitr_1.50         tibble_3.3.0       rprojroot_2.1.0   
[21] RColorBrewer_1.1-3 bslib_0.9.0        pillar_1.11.0      rlang_1.1.6       
[25] cachem_1.1.0       stringi_1.8.7      httpuv_1.6.16      xfun_0.52         
[29] getPass_0.2-4      fs_1.6.6           sass_0.4.10        cli_3.6.5         
[33] withr_3.0.2        magrittr_2.0.3     ps_1.9.1           grid_4.3.1        
[37] digest_0.6.37      processx_3.8.6     rstudioapi_0.17.1  lifecycle_1.0.4   
[41] vctrs_0.6.5        evaluate_1.0.4     glue_1.8.0         farver_2.1.2      
[45] whisker_0.4.1      rmarkdown_2.29     tools_4.3.1        pkgconfig_2.0.3   
[49] htmltools_0.5.8.1