Last updated: 2025-09-23

Checks: 7 passed, 0 failed

Knit directory: genomics_ancest_disease_dispar/

This reproducible R Markdown analysis was created with workflowr (version 1.7.1). The Checks tab describes the reproducibility checks that were applied when the results were created. The Past versions tab lists the development history.


Great! Since the R Markdown file has been committed to the Git repository, you know the exact version of the code that produced these results.

Great job! The global environment was empty. Objects defined in the global environment can affect the analysis in your R Markdown file in unknown ways. For reproducibility it’s best to always run the code in an empty environment.

The command set.seed(20220216) was run prior to running the code in the R Markdown file. Setting a seed ensures that any results that rely on randomness, e.g. subsampling or permutations, are reproducible.

Great job! Recording the operating system, R version, and package versions is critical for reproducibility.

Nice! There were no cached chunks for this analysis, so you can be confident that you successfully produced the results during this run.

Great job! Using relative paths to the files within your workflowr project makes it easier to run your code on other machines.

Great! You are using Git for version control. Tracking code development and connecting the code version to the results is critical for reproducibility.

The results in this page were generated with repository version dd87c61. See the Past versions tab to see a history of the changes made to the R Markdown and HTML files.

Note that you need to be careful to ensure that all relevant files for the analysis have been committed to Git prior to generating the results (you can use wflow_publish or wflow_git_commit). workflowr only checks the R Markdown file, but you know if there are other scripts or data files that it depends on. Below is the status of the Git repository when the results were generated:


Ignored files:
    Ignored:    .DS_Store
    Ignored:    .Rproj.user/
    Ignored:    data/.DS_Store
    Ignored:    data/gbd/.DS_Store
    Ignored:    data/gbd/IHME-GBD_2021_DATA-d8cf695e-1.csv
    Ignored:    data/gbd/ihme_gbd_2019_global_disease_burden_rate_all_ages.csv
    Ignored:    data/gbd/ihme_gbd_2019_global_paf_rate_percent_all_ages.csv
    Ignored:    data/gbd/ihme_gbd_2021_global_disease_burden_rate_all_ages.csv
    Ignored:    data/gbd/ihme_gbd_2021_global_paf_rate_percent_all_ages.csv
    Ignored:    data/gwas_catalog/
    Ignored:    data/icd/
    Ignored:    data/who/
    Ignored:    output/gwas_cat/
    Ignored:    output/gwas_study_info_cohort_corrected.csv
    Ignored:    output/gwas_study_info_trait_corrected.csv
    Ignored:    output/gwas_study_info_trait_ontology_info.csv
    Ignored:    output/gwas_study_info_trait_ontology_info_l1.csv
    Ignored:    output/gwas_study_info_trait_ontology_info_l2.csv
    Ignored:    output/trait_ontology/
    Ignored:    renv/

Unstaged changes:
    Modified:   code/get_term_descendants.R

Note that any generated files, e.g. HTML, png, CSS, etc., are not included in this status report because it is ok for generated content to have uncommitted changes.


These are the previous versions of the repository in which changes were made to the R Markdown (analysis/level_2_disease_group.Rmd) and HTML (docs/level_2_disease_group.html) files. If you’ve configured a remote Git repository (see ?wflow_git_remote), click on the hyperlinks in the table below to view the files as they were in that past version.

File Version Author Date Message
Rmd dd87c61 IJbeasley 2025-09-23 Using icd codes to help grouping
html 904bb1d IJbeasley 2025-09-22 Build site.
Rmd bbcc167 IJbeasley 2025-09-22 Even more typo etc.
html 75debe4 IJbeasley 2025-09-22 Build site.
Rmd b3e3287 IJbeasley 2025-09-22 …maybe fixing typos
html b8ee7f0 IJbeasley 2025-09-22 Build site.
Rmd 3305f6a IJbeasley 2025-09-22 …maybe fixing typos
html 200442f IJbeasley 2025-09-17 Build site.
Rmd 614204e IJbeasley 2025-09-17 More fixing up of disease grouping
html 7b87d93 IJbeasley 2025-09-17 Build site.
Rmd da5c4b4 IJbeasley 2025-09-17 More correction to cardiovascular disease terms
html 08b0db3 IJbeasley 2025-09-17 Build site.
Rmd bb8ae95 IJbeasley 2025-09-17 Better grouping of cardiovascular disease
html 7cf2803 IJbeasley 2025-09-17 Build site.
Rmd 39262a4 IJbeasley 2025-09-17 More typo fixing
html c0cf9bd IJbeasley 2025-09-16 Build site.
Rmd 3519a0b IJbeasley 2025-09-16 Collapsing traits to gbd
html f1b18b0 IJbeasley 2025-09-16 Build site.
Rmd afe44b4 IJbeasley 2025-09-16 Collapsing traits to gbd
html c204ac4 IJbeasley 2025-09-16 Build site.
Rmd 7fa03f5 IJbeasley 2025-09-16 More cancer typos
html 8f1639b IJbeasley 2025-09-16 Build site.
Rmd 345ad9b IJbeasley 2025-09-16 More cancer typos
html a15dd40 IJbeasley 2025-09-16 Build site.
Rmd 16ead66 IJbeasley 2025-09-16 Correcting some cancer grouping
html 6018e42 IJbeasley 2025-09-16 Build site.
Rmd 02a0b9d IJbeasley 2025-09-16 Improving cancer grouping
html 6f66696 IJbeasley 2025-09-16 Build site.
Rmd 66cff1c IJbeasley 2025-09-16 Even more disease term grouping
html 21b6c02 IJbeasley 2025-09-15 Build site.
html 5ec3111 IJbeasley 2025-09-15 Build site.
html 30d773e IJbeasley 2025-09-15 Build site.
html 8d64a38 IJbeasley 2025-09-15 Build site.
Rmd b3088d8 IJbeasley 2025-09-15 workflowr::wflow_publish("analysis/level_2_disease_group.Rmd")
html b89d661 IJbeasley 2025-09-10 Build site.
Rmd c0fcab7 IJbeasley 2025-09-10 workflowr::wflow_publish("analysis/level_2_disease_group.Rmd")
html ead4d8e IJbeasley 2025-09-10 Build site.
Rmd 3964f77 IJbeasley 2025-09-10 workflowr::wflow_publish("analysis/level_2_disease_group.Rmd")
html 8fb639d IJbeasley 2025-09-10 Build site.
Rmd edeb6f5 IJbeasley 2025-09-10 workflowr::wflow_publish("analysis/level_2_disease_group.Rmd")
html fe91704 IJbeasley 2025-09-09 Build site.
Rmd 9c64867 IJbeasley 2025-09-09 Minor fixing of disease trait categorisation
html fa509c0 IJbeasley 2025-09-08 Build site.
Rmd c9602c7 IJbeasley 2025-09-08 More grouping to match GBD

1 Set up

library(dplyr)
library(data.table)
library(ggplot2)
library(stringr)

1.1 Ontology help - for getting disease subtypes

source(here::here("code/get_term_descendants.R"))
gwas_study_info <- fread(here::here("output/gwas_cat/gwas_study_info_group_l1_v2.csv"))
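
The helper vec_to_grep_pattern() used throughout this page is defined in code/get_term_descendants.R and is not shown here. The following is a minimal sketch of what such a helper might look like, assuming it turns a vector of terms into a single Perl-compatible pattern that matches a term only when it is followed by a comma or the end of the string (this assumption is inferred from a commented-out pattern elsewhere on this page). The name vec_to_grep_pattern_sketch, its body, and the toy strings are illustrative assumptions, not the project's actual code.

```r
# Hypothetical stand-in for vec_to_grep_pattern(); the real helper in
# code/get_term_descendants.R may differ.
# Assumes terms contain no regex metacharacters (no escaping is done).
vec_to_grep_pattern_sketch <- function(terms) {
  # \\b anchors the start of each term at a word boundary;
  # (?=,|$) requires a comma or the end of the string right after the term.
  paste0("\\b", terms, "(?=,|$)", collapse = "|")
}

pattern <- vec_to_grep_pattern_sketch(c("lung cancer", "bronchus cancer"))

# Toy term lists (made up for illustration)
toy_terms <- c("larynx cancer, lung cancer",
               "lung cancer screening",
               "bronchus cancer, asthma")

# perl = TRUE is needed because the pattern uses a lookahead
matches <- grepl(pattern, toy_terms, perl = TRUE)
```

Because the start of each term is only anchored at a word boundary, this sketch would still match "lung cancer" inside a longer entry such as "family history of lung cancer". Whether the real helper behaves the same way is not known from this page.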

2 Objectives:

  • Further group disease terms (level 2 categories) to match GBD (Global Burden of Disease) categories more closely.

2.1 Grouping - level 2 set up

gwas_study_info = gwas_study_info |>
  mutate(l2_all_disease_terms = l1_all_disease_terms)

3 Maternal and neonatal disorders

3.1 Maternal disorders

3.1.1

4 Neoplasms

4.1 Lip and oral cavity cancer

gwas_study_info |> 
 filter(grepl(vec_to_grep_pattern("lip and oral cavity cancer"), 
              l2_all_disease_terms,
              perl = TRUE)) |>
  select(all_disease_terms, l2_all_disease_terms) |>
  distinct() |>
  head()
                                     all_disease_terms
                                                <char>
1:                                  oral cavity cancer
2:                                      mouth neoplasm
3:                                       tongue cancer
4:                         major salivary gland cancer
5: human papilloma virus infection, oral cavity cancer
6:                                     tongue neoplasm
                                          l2_all_disease_terms
                                                        <char>
1:                                  lip and oral cavity cancer
2:                                  lip and oral cavity cancer
3:                                  lip and oral cavity cancer
4:                                  lip and oral cavity cancer
5: human papilloma virus infection, lip and oral cavity cancer
6:                                  lip and oral cavity cancer

4.2 Nasopharynx cancer

gwas_study_info =
gwas_study_info |> 
  mutate(l2_all_disease_terms  = 
          stringr::str_replace_all(l2_all_disease_terms ,
                                  pattern = vec_to_grep_pattern("nasopharyngeal cancer"),
                                   "nasopharynx cancer"
                          )  
        )
  
gwas_study_info |> 
 filter(grepl(vec_to_grep_pattern("nasopharynx cancer"),
              l2_all_disease_terms,
              perl = TRUE)) |>
  select(all_disease_terms, l2_all_disease_terms) |>
  distinct()
         all_disease_terms l2_all_disease_terms
                    <char>               <char>
1: nasopharyngeal neoplasm   nasopharynx cancer

4.3 Other pharynx cancer

gwas_study_info |> 
 filter(grepl(vec_to_grep_pattern("other pharynx cancer"),
              l2_all_disease_terms,
              perl = TRUE)) |>
  select(all_disease_terms, l2_all_disease_terms) |>
  distinct() |>
  head()
                                       all_disease_terms
                                                  <char>
1:                                     oropharynx cancer
2: laryngeal squamous cell carcinoma, hypopharynx cancer
3:    human papilloma virus infection, oropharynx cancer
4:                                         tonsil cancer
5:                              hypopharyngeal carcinoma
6:                   pharynx cancer, laryngeal carcinoma
                                    l2_all_disease_terms
                                                  <char>
1:                                  other pharynx cancer
2:                   larynx cancer, other pharynx cancer
3: human papilloma virus infection, other pharynx cancer
4:                                  other pharynx cancer
5:                                  other pharynx cancer
6:                   larynx cancer, other pharynx cancer

4.4 Esophageal cancer

gwas_study_info |>
 filter(grepl(vec_to_grep_pattern("esophageal cancer"), 
              l2_all_disease_terms,
              perl = TRUE)) |>
  select(all_disease_terms, l2_all_disease_terms) |>
  distinct() |>
  head()
                                            all_disease_terms
                                                       <char>
1:              esophageal adenocarcinoma, barretts esophagus
2:                                  esophageal adenocarcinoma
3: esophageal adenocarcinoma, gastroesophageal reflux disease
4:                         esophageal squamous cell carcinoma
5:                    esophageal carcinoma, gastric carcinoma
6:              squamous cell carcinoma, esophageal carcinoma
                                 l2_all_disease_terms
                                               <char>
1:              barretts esophagus, esophageal cancer
2:                                  esophageal cancer
3: esophageal cancer, gastroesophageal reflux disease
4:                                  esophageal cancer
5:                  esophageal cancer, stomach cancer
6:                                  esophageal cancer

4.5 Stomach cancer

gwas_study_info |>
 filter(grepl(vec_to_grep_pattern("stomach cancer"), 
              l2_all_disease_terms,
              perl = TRUE)) |>
  select(all_disease_terms, l2_all_disease_terms) |>
  distinct() |>
  head()
                                            all_disease_terms
                                                       <char>
1:                                          gastric carcinoma
2:                    esophageal carcinoma, gastric carcinoma
3:                                   gastric cardia carcinoma
4:                                     gastric adenocarcinoma
5: lung carcinoma, squamous cell carcinoma, gastric carcinoma
6:                                             gastric cancer
                l2_all_disease_terms
                              <char>
1:                    stomach cancer
2: esophageal cancer, stomach cancer
3:                    stomach cancer
4:                    stomach cancer
5:       lung cancer, stomach cancer
6:                    stomach cancer

4.6 Colon and rectum cancer

gwas_study_info = gwas_study_info |> 
 mutate(l2_all_disease_terms  = 
          stringr::str_replace_all(l2_all_disease_terms ,
                                  pattern = vec_to_grep_pattern("colorectal cancer"),
                                   "colon and rectum cancer"
                          )  
        )

gwas_study_info |>
 filter(grepl(vec_to_grep_pattern("colon and rectum cancer"), 
              l2_all_disease_terms,
              perl = TRUE)) |>
  select(all_disease_terms, l2_all_disease_terms) |>
  distinct() |> 
  head()
                                                                                                                                                                                                                                           all_disease_terms
                                                                                                                                                                                                                                                      <char>
1:                                                                                                                                                                                                                                         colorectal cancer
2:                                                                                                                                                                                                                 sclerosing cholangitis, colorectal cancer
3:                                                                                                                                                                                                                     colorectal cancer, colorectal adenoma
4:                                                                                                                                                                                                                              metastatic colorectal cancer
5: lung carcinoma, estrogen-receptor negative breast cancer, ovarian endometrioid carcinoma, colorectal cancer, prostate carcinoma, ovarian serous carcinoma, breast carcinoma, ovarian carcinoma, lung adenocarcinoma, squamous cell lung carcinoma, cancer
6:                                                                                                                                                                                                                                             rectum cancer
                                                                           l2_all_disease_terms
                                                                                         <char>
1:                                                                      colon and rectum cancer
2:                                              colon and rectum cancer, sclerosing cholangitis
3:                                                     benign neoplasm, colon and rectum cancer
4:                                                                      colon and rectum cancer
5: breast cancer, cancer, colon and rectum cancer, lung cancer, ovarian cancer, prostate cancer
6:                                                                      colon and rectum cancer

4.7 Liver cancer

gwas_study_info |>
 filter(grepl(vec_to_grep_pattern("liver cancer"), 
              l2_all_disease_terms,
              perl = TRUE)) |>
  select(all_disease_terms, l2_all_disease_terms) |>
  distinct() |>
  head()
                                         all_disease_terms
                                                    <char>
1:   hepatitis b virus infection, hepatocellular carcinoma
2:              sclerosing cholangitis, cholangiocarcinoma
3:              cholangiocarcinoma, sclerosing cholangitis
4:        sclerosing cholangitis, hepatocellular carcinoma
5:   hepatitis c virus infection, hepatocellular carcinoma
6: hepatocellular carcinoma, non-alcoholic steatohepatitis
                              l2_all_disease_terms
                                            <char>
1:             hepatitis b infection, liver cancer
2:            liver cancer, sclerosing cholangitis
3:            liver cancer, sclerosing cholangitis
4:            liver cancer, sclerosing cholangitis
5:             hepatitis c infection, liver cancer
6: liver cancer, non-alcoholic fatty liver disease

4.8 Gallbladder and biliary tract cancer

gwas_study_info = 
gwas_study_info |>
 mutate(l2_all_disease_terms  = 
        case_when(
          l2_all_disease_terms == "cancer of gallbladder and extrahepatic biliary tract" ~ "gallbladder and biliary tract cancer",
          TRUE ~ l2_all_disease_terms
                 )
        )

gwas_study_info |>
 filter(grepl(vec_to_grep_pattern("gallbladder and biliary tract cancer"), 
              l2_all_disease_terms,
              perl = TRUE)) |>
  select(all_disease_terms, l2_all_disease_terms) |>
  distinct()
                                         all_disease_terms
                                                    <char>
1:            sclerosing cholangitis, gallbladder neoplasm
2:                                    gallbladder neoplasm
3: carcinoma of gallbladder and extrahepatic biliary tract
                                           l2_all_disease_terms
                                                         <char>
1: gallbladder and biliary tract cancer, sclerosing cholangitis
2:                         gallbladder and biliary tract cancer
3:                         gallbladder and biliary tract cancer

4.9 Pancreatic cancer

gwas_study_info |>
 filter(grepl(vec_to_grep_pattern("pancreatic cancer"), 
              l2_all_disease_terms,
              perl = TRUE)) |>
  select(all_disease_terms, l2_all_disease_terms) |>
  distinct() |>
  head()
                                                                                                                         all_disease_terms
                                                                                                                                    <char>
1:                                                                                                                    pancreatic carcinoma
2:                                                                                                        pancreatic ductal adenocarcinoma
3:                                                                                                       pancreatic carcinoma, neutropenia
4:              breast carcinoma, estrogen-receptor positive breast cancer, metastatic prostate cancer, pancreatic carcinoma, hypertension
5:               breast carcinoma, estrogen-receptor positive breast cancer, metastatic prostate cancer, pancreatic carcinoma, proteinuria
6: breast carcinoma, estrogen-receptor positive breast cancer, metastatic prostate cancer, pancreatic carcinoma, hypertension, proteinuria
                                                           l2_all_disease_terms
                                                                         <char>
1:                                                            pancreatic cancer
2:                                                            pancreatic cancer
3:                                               neutropenia, pancreatic cancer
4:              breast cancer, hypertension, pancreatic cancer, prostate cancer
5:               breast cancer, pancreatic cancer, prostate cancer, proteinuria
6: breast cancer, hypertension, pancreatic cancer, prostate cancer, proteinuria

4.10 Larynx cancer

gwas_study_info |>
 filter(grepl(vec_to_grep_pattern("larynx cancer"), 
              l2_all_disease_terms,
              perl = TRUE)) |>
  select(all_disease_terms, l2_all_disease_terms) |>
  distinct() |>
  head()
                                       all_disease_terms
                                                  <char>
1:                     laryngeal squamous cell carcinoma
2: laryngeal squamous cell carcinoma, hypopharynx cancer
3:                                   laryngeal carcinoma
4:                                    laryngeal neoplasm
5:                                      glottis neoplasm
6:                   pharynx cancer, laryngeal carcinoma
                  l2_all_disease_terms
                                <char>
1:                       larynx cancer
2: larynx cancer, other pharynx cancer
3:                       larynx cancer
4:                       larynx cancer
5:                       larynx cancer
6: larynx cancer, other pharynx cancer

4.11 Tracheal, bronchus, and lung cancer

resp_cancer_terms = c("lung cancer",
                      "bronchus cancer",
                      "tracheal cancer",
                      "respiratory system cancer"
                        )

gwas_study_info = 
gwas_study_info |> 
 mutate(l2_all_disease_terms  =
          ifelse(l2_all_disease_terms != "tracheal bronchus and lung cancer",
          stringr::str_replace_all(l2_all_disease_terms,
                                   pattern = vec_to_grep_pattern(resp_cancer_terms),
                                  #pattern = paste0(resp_cancer_terms, collapse = "(?=,|$)|\\b"),
                                   "tracheal bronchus and lung cancer"
                          ),
          l2_all_disease_terms
          )
        )

gwas_study_info |>
 filter(grepl(vec_to_grep_pattern("tracheal bronchus and lung cancer"), 
              l2_all_disease_terms,
              perl = TRUE)) |>
  select(all_disease_terms, l2_all_disease_terms) |>
  distinct() |>
  head()
                                             all_disease_terms
                                                        <char>
1:                               non-small cell lung carcinoma
2:                                         lung adenocarcinoma
3:                                squamous cell lung carcinoma
4:               lung carcinoma, family history of lung cancer
5:          lung adenocarcinoma, family history of lung cancer
6: squamous cell lung carcinoma, family history of lung cancer
                l2_all_disease_terms
                              <char>
1: tracheal bronchus and lung cancer
2: tracheal bronchus and lung cancer
3: tracheal bronchus and lung cancer
4: tracheal bronchus and lung cancer
5: tracheal bronchus and lung cancer
6: tracheal bronchus and lung cancer
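
Replacing terms inside a comma-separated list can leave duplicates (when several source terms in one row map to the same target) and can break the alphabetical ordering of the list. A minimal sketch of a clean-up step follows, assuming terms are separated by a comma and optional whitespace. The helper name normalise_term_list and the toy input are hypothetical and not part of the project code.

```r
# Hypothetical helper: split each comma-separated term string,
# drop duplicate terms, sort them, and paste them back together.
normalise_term_list <- function(x) {
  vapply(
    strsplit(x, ",\\s*", perl = TRUE),
    function(terms) {
      # Keep missing values as NA rather than turning them into ""
      if (all(is.na(terms))) return(NA_character_)
      # method = "radix" sorts in the C locale, so results do not
      # depend on the machine's locale settings
      paste(sort(unique(terms), method = "radix"), collapse = ", ")
    },
    character(1),
    USE.NAMES = FALSE
  )
}

# Toy input (made up for illustration)
toy <- c("tracheal bronchus and lung cancer, breast cancer, tracheal bronchus and lung cancer",
         NA)

cleaned <- normalise_term_list(toy)
```

Such a step could be applied to l2_all_disease_terms after each replacement, or once at the end of the grouping. Which is preferable depends on whether later patterns rely on the intermediate ordering.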

4.12 Malignant skin melanoma

gwas_study_info =
gwas_study_info |> 
 mutate(l2_all_disease_terms  = 
          stringr::str_replace_all(l2_all_disease_terms,
                                  pattern = vec_to_grep_pattern("malignant melanoma of skin"),
                                   "malignant skin melanoma"
                          )  
        )

gwas_study_info |>
 filter(grepl(vec_to_grep_pattern("malignant skin melanoma"), 
              l2_all_disease_terms,
              perl = TRUE)) |>
  select(all_disease_terms, l2_all_disease_terms) |>
  distinct() |>
  head()
                   all_disease_terms
                              <char>
1:                cutaneous melanoma
2:                          melanoma
3: neuroblastoma, cutaneous melanoma
4:                       skin cancer
5:                    skin carcinoma
6:  melanoma, immune system toxicity
                                l2_all_disease_terms
                                              <char>
1:                           malignant skin melanoma
2:                           malignant skin melanoma
3:            malignant skin melanoma, neuroblastoma
4: malignant skin melanoma, non-melanoma skin cancer
5: malignant skin melanoma, non-melanoma skin cancer
6:   immune system toxicity, malignant skin melanoma

4.13 Non-melanoma skin cancer

gwas_study_info |>
 filter(grepl(vec_to_grep_pattern("non-melanoma skin cancer"), 
              l2_all_disease_terms,
              perl = TRUE)) |>
  select(all_disease_terms, l2_all_disease_terms) |>
  distinct()
                               all_disease_terms
                                          <char>
1: squamous cell carcinoma, basal cell carcinoma
2:                        keratinocyte carcinoma
3:                          basal cell carcinoma
4:             cutaneous squamous cell carcinoma
5:                   non-melanoma skin carcinoma
6:                                   skin cancer
7:                                 skin neoplasm
8:                        skin carcinoma in situ
9:                                skin carcinoma
                                l2_all_disease_terms
                                              <char>
1:                          non-melanoma skin cancer
2:                          non-melanoma skin cancer
3:                          non-melanoma skin cancer
4:                          non-melanoma skin cancer
5:                          non-melanoma skin cancer
6: malignant skin melanoma, non-melanoma skin cancer
7:                          non-melanoma skin cancer
8:                          non-melanoma skin cancer
9: malignant skin melanoma, non-melanoma skin cancer

4.14 Soft tissue and other extraosseous sarcomas

gwas_study_info = 
gwas_study_info |> 
 mutate(l2_all_disease_terms  = 
          stringr::str_replace_all(l2_all_disease_terms ,
                                  pattern = vec_to_grep_pattern("soft tissue sarcoma"),
                                   "soft tissue and other extraosseous sarcomas"
                          )  
        )

gwas_study_info |>
 filter(grepl(vec_to_grep_pattern("soft tissue and other extraosseous sarcomas"), 
              l2_all_disease_terms,
              perl = TRUE)) |>
  select(all_disease_terms, l2_all_disease_terms) |>
  distinct() |>
  head()
       all_disease_terms                                 l2_all_disease_terms
                  <char>                                               <char>
1: sarcoma, fibrosarcoma sarcoma, soft tissue and other extraosseous sarcomas
2:       kaposis sarcoma          soft tissue and other extraosseous sarcomas

4.15 Malignant neoplasm of bone and articular cartilage

gwas_study_info =
gwas_study_info |> 
 mutate(l2_all_disease_terms  = 
          stringr::str_replace_all(l2_all_disease_terms ,
                                  pattern = vec_to_grep_pattern(c("bone cancer","osteosarcoma")),
                                   "malignant neoplasm of bone and articular cartilage"
                          )  
        )

gwas_study_info |>
 filter(grepl(vec_to_grep_pattern("malignant neoplasm of bone and articular cartilage"), 
              l2_all_disease_terms,
              perl = TRUE)) |>
  select(all_disease_terms, l2_all_disease_terms) |>
  distinct() |>
  head()
                                  all_disease_terms
                                             <char>
1:                                     osteosarcoma
2:                           acute myeloid leukemia
3:                                 myeloid leukemia
4:                          malignant bone neoplasm
5: acute myeloid leukemia, myelodysplastic syndrome
6:                                    bone neoplasm
                                                           l2_all_disease_terms
                                                                         <char>
1:                           malignant neoplasm of bone and articular cartilage
2:                           malignant neoplasm of bone and articular cartilage
3:                           malignant neoplasm of bone and articular cartilage
4:                           malignant neoplasm of bone and articular cartilage
5: malignant neoplasm of bone and articular cartilage, myelodysplastic syndrome
6:                           malignant neoplasm of bone and articular cartilage

4.16 Breast cancer

gwas_study_info |>
 filter(grepl(vec_to_grep_pattern("breast cancer"), 
              l2_all_disease_terms,
              perl = TRUE
              )) |>
  select(all_disease_terms, l2_all_disease_terms) |>
  distinct() |>
  head()
                                            all_disease_terms
                                                       <char>
1:                   estrogen-receptor negative breast cancer
2:                                           breast carcinoma
3:                   estrogen-receptor positive breast cancer
4:                                          breast carcinoma,
5: estrogen-receptor positive breast cancer, breast carcinoma
6: estrogen-receptor negative breast cancer, breast carcinoma
   l2_all_disease_terms
                 <char>
1:        breast cancer
2:        breast cancer
3:        breast cancer
4:        breast cancer
5:        breast cancer
6:        breast cancer

4.17 Cervical cancer

gwas_study_info |>
 filter(grepl(vec_to_grep_pattern("cervical cancer"), 
              l2_all_disease_terms,
              perl = TRUE)) |>
  select(all_disease_terms, l2_all_disease_terms) |>
  distinct() |>
  head()
                                     all_disease_terms
                                                <char>
1:                                  cervical carcinoma
2:                                     cervical cancer
3:                dysplasia of cervix, cervical cancer
4:                          dysplasia, cervical cancer
5:                    uterine cervix carcinoma in situ
6: cervical carcinoma, human papilloma virus infection
                               l2_all_disease_terms
                                             <char>
1:                                  cervical cancer
2:                                  cervical cancer
3:             cervical cancer, dysplasia of cervix
4:                       cervical cancer, dysplasia
5:                                  cervical cancer
6: cervical cancer, human papilloma virus infection

4.18 Uterine cancer

# Open question: does GBD treat endometrial cancer as a subset of uterine cancer?
# It is a subset in the ontology: http://purl.obolibrary.org/obo/MONDO_0002715
gwas_study_info =
gwas_study_info |> 
 mutate(l2_all_disease_terms  = 
          stringr::str_replace_all(l2_all_disease_terms ,
                                  pattern = vec_to_grep_pattern("endometrial cancer"),
                                   "uterine cancer"
                          )  
        )

gwas_study_info |>
 filter(grepl(vec_to_grep_pattern("uterine cancer"), 
              l2_all_disease_terms,
              perl = TRUE)) |>
  select(all_disease_terms, l2_all_disease_terms) |>
  distinct() |>
  head()
                    all_disease_terms l2_all_disease_terms
                               <char>               <char>
1: endometrial endometrioid carcinoma       uterine cancer
2:              endometrial carcinoma       uterine cancer
3:               endometrial neoplasm       uterine cancer
4:                  uterine carcinoma       uterine cancer
5:       endometrial cancer, covid-19       uterine cancer
6:              uterine corpus cancer       uterine cancer

4.19 Ovarian cancer

gwas_study_info |>
 filter(grepl(vec_to_grep_pattern("ovarian cancer"), 
              l2_all_disease_terms,
              perl = TRUE)) |>
  select(all_disease_terms, l2_all_disease_terms) |>
  distinct() |>
  head()
                                                                                                                                                                                                                                           all_disease_terms
                                                                                                                                                                                                                                                      <char>
1:                                                                                                                                                                                                                                         ovarian carcinoma
2:                                                                                                                                                                                                                       malignant epithelial tumor of ovary
3:                                                                                                                                                                                                   prostate carcinoma, breast carcinoma, ovarian carcinoma
4: lung carcinoma, estrogen-receptor negative breast cancer, ovarian endometrioid carcinoma, colorectal cancer, prostate carcinoma, ovarian serous carcinoma, breast carcinoma, ovarian carcinoma, lung adenocarcinoma, squamous cell lung carcinoma, cancer
5:                                                                                                                                                                                                                           ovarian mucinous adenocarcinoma
6:                                                                                                                                                                                                                                  ovarian serous carcinoma
                                                                                                 l2_all_disease_terms
                                                                                                               <char>
1:                                                                                                     ovarian cancer
2:                                                                                                     ovarian cancer
3:                                                                     breast cancer, ovarian cancer, prostate cancer
4: breast cancer, cancer, colon and rectum cancer, tracheal bronchus and lung cancer, ovarian cancer, prostate cancer
5:                                                                                                     ovarian cancer
6:                                                                                                     ovarian cancer

4.20 Prostate cancer

gwas_study_info |>
 filter(grepl(vec_to_grep_pattern("prostate cancer"), 
              l2_all_disease_terms,
              perl = T)) |>
  select(all_disease_terms, l2_all_disease_terms) |>
  distinct() |>
  head()
                                         all_disease_terms
                                                    <char>
1:                                      prostate carcinoma
2:   cancer aggressiveness measurement, prostate carcinoma
3: prostate carcinoma, breast carcinoma, ovarian carcinoma
4:       metastatic prostate cancer, peripheral neuropathy
5:                              metastatic prostate cancer
6:                prostate carcinoma, erectile dysfunction
                             l2_all_disease_terms
                                           <char>
1:                                prostate cancer
2:                                prostate cancer
3: breast cancer, ovarian cancer, prostate cancer
4:         peripheral neuropathy, prostate cancer
5:                                prostate cancer
6:          erectile dysfunction, prostate cancer

4.21 Testicular cancer

gwas_study_info |>
 filter(grepl(vec_to_grep_pattern("testicular cancer"), 
              l2_all_disease_terms,
              perl = T)) |>
  select(all_disease_terms, l2_all_disease_terms) |>
  distinct()
                              all_disease_terms
                                         <char>
1:                         testicular carcinoma
2: testicular carcinoma, cardiovascular disease
3:                          testicular neoplasm
                        l2_all_disease_terms
                                      <char>
1:                         testicular cancer
2: cardiovascular disease, testicular cancer
3:                         testicular cancer

4.22 Kidney cancer

gwas_study_info |>
 filter(grepl(vec_to_grep_pattern("kidney cancer"), 
              l2_all_disease_terms,
              perl = T)) |>
  select(all_disease_terms, l2_all_disease_terms) |>
  distinct() |>
  head() 
                all_disease_terms l2_all_disease_terms
                           <char>               <char>
1:           renal cell carcinoma        kidney cancer
2:                 nephroblastoma        kidney cancer
3:                  kidney cancer        kidney cancer
4:     clear cell renal carcinoma        kidney cancer
5:                renal carcinoma        kidney cancer
6: papillary renal cell carcinoma        kidney cancer

4.23 Bladder cancer

gwas_study_info = 
  gwas_study_info |>
 mutate(l2_all_disease_terms  = 
          stringr::str_replace_all(l2_all_disease_terms ,
                                  pattern = vec_to_grep_pattern("urinary bladder cancer"),
                                   "bladder cancer"
                          )  
        )


gwas_study_info |>
 filter(grepl(vec_to_grep_pattern("bladder cancer"), 
              l2_all_disease_terms,
              perl = T)) |>
  select(all_disease_terms, l2_all_disease_terms) |>
  distinct() 
                                            all_disease_terms
                                                       <char>
1:                                  urinary bladder carcinoma
2: urinary bladder carcinoma, disease progression measurement
3:                                    urinary bladder cancer,
4:                                     urinary bladder cancer
   l2_all_disease_terms
                 <char>
1:       bladder cancer
2:       bladder cancer
3:       bladder cancer
4:       bladder cancer
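The replace-then-inspect steps in these subsections rely on `vec_to_grep_pattern()`, which is defined elsewhere in the project and is not shown on this page. As a rough sketch only, assuming the helper builds a Perl-compatible alternation that matches whole comma-separated terms, a stand-in could look like the following. The name `vec_to_grep_pattern_sketch`, the `", "` delimiter and the toy terms are assumptions for illustration, not the project's actual implementation.

```r
# Hypothetical stand-in for the project's vec_to_grep_pattern() helper.
# Assumption: terms in l2_all_disease_terms are separated by ", ".
vec_to_grep_pattern_sketch <- function(terms) {
  # \Q...\E quotes each term so regex metacharacters are matched literally
  quoted <- paste0("\\Q", terms, "\\E")
  # Match only at the start of the string or directly after ", ",
  # and only when followed by a comma or the end of the string
  paste0("(?:^|(?<=, ))(?:", paste(quoted, collapse = "|"), ")(?=,|$)")
}

# Toy values, not taken from the real gwas_study_info table
l2_terms <- c("urinary bladder cancer",
              "asthma, urinary bladder cancer",
              "gallbladder cancer")

pattern <- vec_to_grep_pattern_sketch("urinary bladder cancer")
replaced <- gsub(pattern, "bladder cancer", l2_terms, perl = TRUE)
print(replaced)

# "gallbladder cancer" is not picked up by the whole-term pattern
is_bladder <- grepl(vec_to_grep_pattern_sketch("bladder cancer"),
                    replaced, perl = TRUE)
print(is_bladder)
```

Anchoring on the list delimiter rather than on a plain substring is what keeps a short term from matching inside a longer, unrelated one.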

4.24 Brain and central nervous system cancer

gwas_study_info = 
  gwas_study_info |>
 mutate(l2_all_disease_terms  = 
          stringr::str_replace_all(l2_all_disease_terms ,
                                  pattern = vec_to_grep_pattern("central nervous system cancer"),
                                   "brain and central nervous system cancer"
                          )  
        )


gwas_study_info |>
 filter(grepl(vec_to_grep_pattern("brain and central nervous system cancer"), 
              l2_all_disease_terms,
              perl = T)) |>
  select(all_disease_terms, l2_all_disease_terms) |>
  distinct() |>
  head()
                                        all_disease_terms
                                                   <char>
1:                                glioblastoma multiforme
2:                          central nervous system cancer
3:                                                 glioma
4:                  central nervous system cancer, glioma
5: central nervous system cancer, glioblastoma multiforme
6:                                         brain neoplasm
                      l2_all_disease_terms
                                    <char>
1: brain and central nervous system cancer
2: brain and central nervous system cancer
3: brain and central nervous system cancer
4: brain and central nervous system cancer
5: brain and central nervous system cancer
6: brain and central nervous system cancer

4.25 Eye cancer

gwas_study_info = 
gwas_study_info |> 
 mutate(l2_all_disease_terms  = 
          stringr::str_replace_all(l2_all_disease_terms ,
                                  pattern = 
                                    vec_to_grep_pattern(c("ocular melanoma",
                                                       "ocular cancer")
                                                       ),
                                   "eye cancer"
                          )  
        )

gwas_study_info |>
 filter(grepl(vec_to_grep_pattern("eye cancer"), 
              l2_all_disease_terms,
              perl = T)) |>
  select(all_disease_terms, l2_all_disease_terms) |>
  distinct() |> 
  head()
                                 all_disease_terms l2_all_disease_terms
                                            <char>               <char>
1:                                  uveal melanoma           eye cancer
2:                              choroidal melanoma           eye cancer
3:                 epithelioid cell uveal melanoma           eye cancer
4: uveal melanoma, epithelioid cell uveal melanoma           eye cancer
5:                 uveal melanoma disease severity           eye cancer
6:                                   ocular cancer           eye cancer

4.26 Neuroblastoma and other peripheral nervous cell tumors

gwas_study_info =
gwas_study_info |> 
 mutate(l2_all_disease_terms  = 
          stringr::str_replace_all(l2_all_disease_terms,
                                   pattern = vec_to_grep_pattern(
                                     c("neuroblastoma",
                                       "peripheral nervous system cancer")
                                   ),
                                   "neuroblastoma and other peripheral nervous cell tumors"
                          )  
        )

gwas_study_info |>
 filter(grepl(vec_to_grep_pattern("neuroblastoma and other peripheral nervous cell tumors"),
              l2_all_disease_terms,
              perl = T)) |>
  select(all_disease_terms, l2_all_disease_terms) |>
  distinct() |>
  head()
                   all_disease_terms
                              <char>
1:                     neuroblastoma
2: neuroblastoma, cutaneous melanoma
                                                              l2_all_disease_terms
                                                                            <char>
1:                          neuroblastoma and other peripheral nervous cell tumors
2: malignant skin melanoma, neuroblastoma and other peripheral nervous cell tumors

4.27 Thyroid cancer

gwas_study_info |>
 filter(grepl(vec_to_grep_pattern("thyroid cancer"), 
              l2_all_disease_terms,
              perl = T)) |>
  select(all_disease_terms, l2_all_disease_terms) |>
  distinct() |>
  head()
                  all_disease_terms l2_all_disease_terms
                             <char>               <char>
1: differentiated thyroid carcinoma       thyroid cancer
2:      papillary thyroid carcinoma       thyroid cancer
3:     follicular thyroid carcinoma       thyroid cancer
4:                thyroid carcinoma       thyroid cancer
5:                   thyroid cancer       thyroid cancer

4.28 Mesothelioma

gwas_study_info |>
 filter(grepl(vec_to_grep_pattern("mesothelioma"), 
              l2_all_disease_terms,
              perl = T)) |>
  select(all_disease_terms, l2_all_disease_terms) |>
  distinct() |>
  head()
                all_disease_terms l2_all_disease_terms
                           <char>               <char>
1: malignant pleural mesothelioma         mesothelioma
2:                   mesothelioma         mesothelioma
3:           pleural mesothelioma         mesothelioma

4.29 Hodgkin lymphoma

gwas_study_info = 
gwas_study_info |> 
 mutate(l2_all_disease_terms  = 
          stringr::str_replace_all(l2_all_disease_terms ,
                                  pattern = vec_to_grep_pattern("hodgkins lymphoma"),
                                   "hodgkin lymphoma"
                          )  
        )


gwas_study_info |> 
 filter(grepl(vec_to_grep_pattern("hodgkin lymphoma"), 
              l2_all_disease_terms,
              perl = T)) |>
  select(all_disease_terms, l2_all_disease_terms) |>
  distinct() |>
  head()
                                                   all_disease_terms
                                                              <char>
1:                                nodular sclerosis hodgkin lymphoma
2:                                                 hodgkins lymphoma
3:                                                          lymphoma
4: hodgkins lymphoma, multiple myeloma, chronic lymphocytic leukemia
5:        hodgkins lymphoma, multiple myeloma, non-hodgkins lymphoma
6:                                                  lymphoma, asthma
                                        l2_all_disease_terms
                                                      <char>
1:                                          hodgkin lymphoma
2:                                          hodgkin lymphoma
3:                    hodgkin lymphoma, non-hodgkin lymphoma
4:              hodgkin lymphoma, leukemia, multiple myeloma
5: hodgkin lymphoma, multiple myeloma, non-hodgkins lymphoma
6:            asthma, hodgkin lymphoma, non-hodgkin lymphoma
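Row 5 above still reads "non-hodgkins lymphoma" after the replacement, which suggests the pattern only matches whole comma-separated terms. The sketch below illustrates why that matters; it uses base R `gsub()` with hand-written patterns, which are assumptions and may differ from what `vec_to_grep_pattern()` actually produces.

```r
terms <- "hodgkins lymphoma, multiple myeloma, non-hodgkins lymphoma"

# A bare word boundary also matches inside "non-hodgkins lymphoma",
# because "-" is not a word character
word_boundary <- gsub("\\bhodgkins lymphoma(?=,|$)", "hodgkin lymphoma",
                      terms, perl = TRUE)

# Anchoring on the list delimiter leaves the longer term untouched
delimiter_anchored <- gsub("(?:^|(?<=, ))hodgkins lymphoma(?=,|$)",
                           "hodgkin lymphoma", terms, perl = TRUE)

print(word_boundary)
print(delimiter_anchored)
```

With the delimiter-anchored form, "non-hodgkins lymphoma" is left for its own dedicated replacement step.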

4.30 Non-hodgkin lymphoma

gwas_study_info = 
gwas_study_info |> 
 mutate(l2_all_disease_terms  = 
          stringr::str_replace_all(l2_all_disease_terms ,
                                  pattern = vec_to_grep_pattern("non-hodgkins lymphoma"),
                                   "non-hodgkin lymphoma"
                          )  
        )

gwas_study_info |>
 filter(grepl(vec_to_grep_pattern("non-hodgkin lymphoma"), 
              l2_all_disease_terms,
              perl = T)) |>
  select(all_disease_terms, l2_all_disease_terms) |>
  distinct() |>
  head()
                                     all_disease_terms
                                                <char>
1:   diffuse large b-cell lymphoma, multiple sclerosis
2:             follicular lymphoma, multiple sclerosis
3:   marginal zone b-cell lymphoma, multiple sclerosis
4: diffuse large b-cell lymphoma, rheumatoid arthritis
5:           rheumatoid arthritis, follicular lymphoma
6: rheumatoid arthritis, marginal zone b-cell lymphoma
                         l2_all_disease_terms
                                       <char>
1:   multiple sclerosis, non-hodgkin lymphoma
2:   multiple sclerosis, non-hodgkin lymphoma
3:   multiple sclerosis, non-hodgkin lymphoma
4: non-hodgkin lymphoma, rheumatoid arthritis
5: non-hodgkin lymphoma, rheumatoid arthritis
6: non-hodgkin lymphoma, rheumatoid arthritis

4.31 Multiple myeloma

gwas_study_info |>
 filter(grepl(vec_to_grep_pattern("multiple myeloma"), 
              l2_all_disease_terms,
              perl = T)) |>
  select(all_disease_terms, l2_all_disease_terms) |>
  distinct() |>
  head()
                                                   all_disease_terms
                                                              <char>
1:                                                  multiple myeloma
2:                           multiple myeloma, peripheral neuropathy
3:             multiple myeloma, chemotherapy-induced oral mucositis
4:                           multiple myeloma, monoclonal gammopathy
5: hodgkins lymphoma, multiple myeloma, chronic lymphocytic leukemia
6:        hodgkins lymphoma, multiple myeloma, non-hodgkins lymphoma
                                       l2_all_disease_terms
                                                     <char>
1:                                         multiple myeloma
2:                  multiple myeloma, peripheral neuropathy
3:    chemotherapy-induced oral mucositis, multiple myeloma
4:                  monoclonal gammopathy, multiple myeloma
5:             hodgkin lymphoma, leukemia, multiple myeloma
6: hodgkin lymphoma, multiple myeloma, non-hodgkin lymphoma

4.32 Leukemia

gwas_study_info |>
 filter(grepl(vec_to_grep_pattern("leukemia"), 
              l2_all_disease_terms,
              perl = T)) |>
  select(all_disease_terms, l2_all_disease_terms) |>
  distinct() |>
  head()
                                                       all_disease_terms
                                                                  <char>
1:                                          acute lymphoblastic leukemia
2:                      multiple sclerosis, chronic lymphocytic leukemia
3:                    rheumatoid arthritis, chronic lymphocytic leukemia
4:            systemic lupus erythematosus, chronic lymphocytic leukemia
5: acute lymphoblastic leukemia, asparaginase-induced acute pancreatitis
6:                                          chronic lymphocytic leukemia
                     l2_all_disease_terms
                                   <char>
1:                               leukemia
2:           leukemia, multiple sclerosis
3:         leukemia, rheumatoid arthritis
4: leukemia, systemic lupus erythematosus
5:                 leukemia, pancreatitis
6:                               leukemia

4.33 Other malignant neoplasms

gwas_study_info =
  gwas_study_info |>
  mutate(l2_all_disease_terms = 
         case_when(
         l2_all_disease_terms == "cancer" ~ "other malignant neoplasms",
         TRUE ~ l2_all_disease_terms
                 )
         )

gwas_study_info =
  gwas_study_info |>
  mutate(l2_all_disease_terms = 
         ifelse(PUBMED_ID == 27790247,
                stringr::str_replace_all(l2_all_disease_terms,
                                        pattern = ", cancer,",
                                        ", other malignant neoplasms,"
                                        ),
                l2_all_disease_terms
                
         )
  )


### handle terms that pair "cancer" with a cancer-related condition (listed below)
gwas_study_info |> 
  filter(grepl("^cancer,", l2_all_disease_terms)) |> 
  pull(l2_all_disease_terms) |> 
  unique()
[1] "cancer, chronic obstructive pulmonary disease"
[2] "cancer, cardiotoxicity"                       
[3] "cancer, hand-foot syndrome"                   
[4] "cancer, peripheral neuropathy"                
[5] "cancer, immune system toxicity"               
[6] "cancer, hypothyroidism"                       
[7] "cancer, radiation-induced disorder"           
[8] "cancer, osteonecrosis"                        

gwas_study_info =
  gwas_study_info |>
  mutate(l2_all_disease_terms = 
         ifelse(grepl("^cancer,", l2_all_disease_terms),
                stringr::str_replace_all(l2_all_disease_terms,
                                        pattern = "^cancer,",
                                        "other malignant neoplasms,"
                                        ),
                l2_all_disease_terms
                
         )
  )
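As a side note on the step above: a regex replacement leaves non-matching strings unchanged, so the `ifelse(grepl(...))` guard is not strictly needed for an anchored pattern such as `"^cancer,"`. A minimal sketch with toy values (hypothetical, not drawn from the real table) shows that the two forms agree:

```r
# Toy values for illustration only
l2_terms <- c("cancer, cardiotoxicity",
              "cancer, hypothyroidism",
              "breast cancer, asthma",
              "leukemia")

# Guarded form, mirroring the structure used in the analysis
with_ifelse <- ifelse(grepl("^cancer,", l2_terms),
                      sub("^cancer,", "other malignant neoplasms,", l2_terms),
                      l2_terms)

# Unguarded form: strings without a leading "cancer," pass through as-is
without_ifelse <- sub("^cancer,", "other malignant neoplasms,", l2_terms)

same_result <- identical(with_ifelse, without_ifelse)
print(same_result)
```

The guard is harmless, so keeping it is a matter of taste. It is still required where the condition is not part of the pattern, as in the `PUBMED_ID` case earlier in this subsection.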

other_malignant_terms <- c(
                           "retroperitoneal cancer",
                           "peritoneal cancer",
                           "ewing sarcoma",
                           
                           "digestive system cancer",
                           "intestinal cancer",
                           "small intestine cancer",
                           
                           "female reproductive organ cancer",
                           "male reproductive organ cancer",
                           "vulvar cancer",                           
                           "testicular germ cell tumor",
                           "urogenital cancer",
                           
                           "squamous cell cancer",

                           "head and neck cancer",
                           "malignant tumor of floor of mouth", 
                           "nasal cavity cancer", # unsure whether this belongs in a different category
                           
                           "malignant lymphoid tumor",
                           "neuroendocrine tumor",
                           "lymphatic system cancer",
                           
                           "childhood cancer" # could perhaps be sorted further

                           )

gwas_study_info = 
gwas_study_info |> 
 mutate(l2_all_disease_terms  = 
          stringr::str_replace_all(l2_all_disease_terms,
                                   pattern = vec_to_grep_pattern(other_malignant_terms),
                                  # pattern = paste0(other_malignant_terms, collapse = "(?=,|$)|\\b"),
                                  "other malignant neoplasms"
                          )  
        )


gwas_study_info |>
 filter(grepl(vec_to_grep_pattern("other malignant neoplasms"), 
              l2_all_disease_terms,
              perl = T)) |>
  select(all_disease_terms, l2_all_disease_terms) |>
  distinct() |> 
  head()
                      all_disease_terms
                                 <char>
1:              squamous cell carcinoma
2:                               cancer
3:     childhood cancer, cardiomyopathy
4:              neuroendocrine neoplasm
5: small intestine neuroendocrine tumor
6:      pancreatic neuroendocrine tumor
                        l2_all_disease_terms
                                      <char>
1:                 other malignant neoplasms
2:                 other malignant neoplasms
3: cardiomyopathy, other malignant neoplasms
4:                 other malignant neoplasms
5:                 other malignant neoplasms
6:                 other malignant neoplasms

4.34 Other neoplasms

gwas_study_info =
  gwas_study_info |>
  mutate(l2_all_disease_terms = 
         case_when(
         l2_all_disease_terms == "benign neoplasm" ~ "other neoplasms",
         TRUE ~ l2_all_disease_terms
                 )
         )

unknown_sig_terms <- c("intracranial germ cell tumor",
                       "bladder tumor")

gwas_study_info =
gwas_study_info |> 
 mutate(l2_all_disease_terms  = 
          stringr::str_replace_all(l2_all_disease_terms ,
                                  pattern = vec_to_grep_pattern(unknown_sig_terms),
                                   "other neoplasms"
                          )
 )

gwas_study_info |>
 filter(grepl(vec_to_grep_pattern("other neoplasms"), 
              l2_all_disease_terms,
              perl = T)) |>
  select(all_disease_terms, l2_all_disease_terms) |>
  distinct() |>
  head()
                         all_disease_terms l2_all_disease_terms
                                    <char>               <char>
1:            benign prostatic hyperplasia      other neoplasms
2:                      colorectal adenoma      other neoplasms
3: colorectal cancer, endometrial neoplasm      other neoplasms
4:      upper aerodigestive tract neoplasm      other neoplasms
5:                              meningioma      other neoplasms
6:   nasopharyngeal neoplasm, hearing loss      other neoplasms

5 Cardiovascular diseases

5.1 Rheumatic heart disease

gwas_study_info |>
 filter(grepl(vec_to_grep_pattern("rheumatic heart disease"), 
              l2_all_disease_terms,
              perl = T)) |>
  select(all_disease_terms, l2_all_disease_terms) |>
  distinct() |>
  head()
         all_disease_terms    l2_all_disease_terms
                    <char>                  <char>
1: rheumatic heart disease rheumatic heart disease

5.2 Ischemic heart disease

gwas_study_info |>
 filter(grepl(vec_to_grep_pattern("ischemic heart disease"), 
              l2_all_disease_terms,
              perl = T)) |>
  select(all_disease_terms, l2_all_disease_terms) |>
  distinct() |>
  head()
                                                      all_disease_terms
                                                                 <char>
1:                              non-obstructive coronary artery disease
2:                 mucocutaneous lymph node syndrome, coronary aneurysm
3:                           migraine disorder, coronary artery disease
4:                                              coronary artery disease
5:           type 2 diabetes mellitus, obesity, coronary artery disease
6: alzheimer disease, type 2 diabetes mellitus, coronary artery disease
                                                   l2_all_disease_terms
                                                                 <char>
1:                                               ischemic heart disease
2:            ischemic heart disease, mucocutaneous lymph node syndrome
3:                                     ischemic heart disease, migraine
4:                                               ischemic heart disease
5:            ischemic heart disease, obesity, type 2 diabetes mellitus
6: alzheimers disease, ischemic heart disease, type 2 diabetes mellitus

Ischemic heart disease is treated here as equivalent to coronary artery disease (https://www.ncbi.nlm.nih.gov/books/NBK209964/).

gwas_study_info = gwas_study_info |> 
 mutate(l2_all_disease_terms  = 
          stringr::str_replace_all(l2_all_disease_terms,
                                  pattern = vec_to_grep_pattern("coronary artery disease"),
                                   "ischemic heart disease"
                          )  
        )
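One way to confirm that a replacement like this leaves no stragglers is to re-run the same pattern afterwards and assert that nothing matches. The sketch below uses base R, toy values and a hand-written whole-term pattern; all three are assumptions for illustration rather than the project's own code.

```r
# Toy values for illustration only
l2_terms <- c("coronary artery disease",
              "coronary artery disease, migraine",
              "stroke")

# Hand-written whole-term pattern (assumed ", " delimiter)
pattern <- "(?:^|(?<=, ))coronary artery disease(?=,|$)"

l2_terms <- gsub(pattern, "ischemic heart disease", l2_terms, perl = TRUE)

# Fails loudly if any whole-term match survived the replacement
stopifnot(!any(grepl(pattern, l2_terms, perl = TRUE)))
print(l2_terms)
```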

5.3 Stroke

gwas_study_info |>
 filter(grepl(vec_to_grep_pattern("stroke"),
              l2_all_disease_terms,
              perl = T)) |>
  select(all_disease_terms, l2_all_disease_terms) |>
  distinct() |>
  head()
                        all_disease_terms       l2_all_disease_terms
                                   <char>                     <char>
1:               intracerebral hemorrhage                     stroke
2:     non-lobar intracerebral hemorrhage                     stroke
3:         lobar intracerebral hemorrhage                     stroke
4:                    small vessel stroke                     stroke
5: alzheimer disease, small vessel stroke alzheimers disease, stroke
6:                        ischemic stroke                     stroke

5.4 Hypertensive heart disease

gwas_study_info |>
 filter(grepl(vec_to_grep_pattern("hypertensive heart disease"), 
              l2_all_disease_terms,
              perl = T)) |>
  select(all_disease_terms, l2_all_disease_terms) |>
  distinct() |>
  head()
                            all_disease_terms
                                       <char>
1: hypertensive heart disease, kidney disease
2:                 hypertensive heart disease
                         l2_all_disease_terms
                                       <char>
1: hypertensive heart disease, kidney disease
2:                 hypertensive heart disease

5.5 Non-rheumatic valvular heart disease

gwas_study_info |>
 filter(grepl(vec_to_grep_pattern("heart valve disease"), 
              l2_all_disease_terms,
              perl = T)) |>
  select(all_disease_terms, l2_all_disease_terms) |>
  distinct() |>
  head()
                             all_disease_terms l2_all_disease_terms
                                        <char>               <char>
1: aortic stenosis, aortic valve calcification  heart valve disease
2:                             aortic stenosis  heart valve disease
3:                  aortic valve calcification  heart valve disease
4:                       mitral valve prolapse  heart valve disease
5:                        aortic valve disease  heart valve disease
6:                         heart valve disease  heart valve disease

gwas_study_info = gwas_study_info |>
 mutate(l2_all_disease_terms  = 
          stringr::str_replace_all(l2_all_disease_terms ,
                                  pattern = vec_to_grep_pattern("heart valve disease"),
                                   "non-rheumatic valvular heart disease"
                          )  
        )



gwas_study_info |>
 filter(grepl(vec_to_grep_pattern("non-rheumatic valvular heart disease"), 
              l2_all_disease_terms,
              perl = T)) |>
  select(all_disease_terms, l2_all_disease_terms) |>
  distinct() |>
  head()
                             all_disease_terms
                                        <char>
1: aortic stenosis, aortic valve calcification
2:                             aortic stenosis
3:                  aortic valve calcification
4:                       mitral valve prolapse
5:                        aortic valve disease
6:                         heart valve disease
                   l2_all_disease_terms
                                 <char>
1: non-rheumatic valvular heart disease
2: non-rheumatic valvular heart disease
3: non-rheumatic valvular heart disease
4: non-rheumatic valvular heart disease
5: non-rheumatic valvular heart disease
6: non-rheumatic valvular heart disease

5.6 Cardiomyopathy and myocarditis

gwas_study_info =
gwas_study_info |> 
 mutate(l2_all_disease_terms  = 
          stringr::str_replace_all(l2_all_disease_terms ,
                                  pattern = vec_to_grep_pattern(c("cardiomyopathy",
                                                                  "myocarditis")),
                                   "cardiomyopathy and myocarditis"
                          )  
        )

gwas_study_info |>
 filter(grepl(vec_to_grep_pattern("cardiomyopathy and myocarditis"), 
              l2_all_disease_terms,
              perl = T)) |>
  select(all_disease_terms, l2_all_disease_terms) |>
  distinct() |>
  head()
                  all_disease_terms
                             <char>
1: childhood cancer, cardiomyopathy
2:      hypertrophic cardiomyopathy
3:           dilated cardiomyopathy
4:            chagas cardiomyopathy
5:        peripartum cardiomyopathy
6:         takotsubo cardiomyopathy
                                        l2_all_disease_terms
                                                      <char>
1: cardiomyopathy and myocarditis, other malignant neoplasms
2:                            cardiomyopathy and myocarditis
3:                            cardiomyopathy and myocarditis
4:                            cardiomyopathy and myocarditis
5:                            cardiomyopathy and myocarditis
6:                            cardiomyopathy and myocarditis

5.7 Pulmonary arterial hypertension

gwas_study_info |>
 filter(grepl(vec_to_grep_pattern("pulmonary arterial hypertension"), 
              l2_all_disease_terms,
              perl = T)) |>
  select(all_disease_terms, l2_all_disease_terms) |>
  distinct() |>
  head()
                 all_disease_terms            l2_all_disease_terms
                            <char>                          <char>
1: pulmonary arterial hypertension pulmonary arterial hypertension

5.8 Atrial fibrillation & flutter

afib_terms <- c("atrial fibrillation",
                "atrial flutter",
                "post-operative atrial fibrillation")

gwas_study_info = 
gwas_study_info |> 
 mutate(l2_all_disease_terms  = 
          stringr::str_replace_all(l2_all_disease_terms ,
                                  pattern = vec_to_grep_pattern(afib_terms),
                                   "atrial fibrillation and flutter"
                          )  
        )

gwas_study_info |>
 filter(grepl(vec_to_grep_pattern("atrial fibrillation and flutter"), 
              l2_all_disease_terms,
              perl = T)) |>
  select(all_disease_terms, l2_all_disease_terms) |>
  distinct() |>
  head()
                                                                                all_disease_terms
                                                                                           <char>
1:                                                                            atrial fibrillation
2: heart failure, diabetes mellitus, stroke, atrial fibrillation, coronary artery disease, cancer
3: stroke, atrial fibrillation, coronary artery disease, heart failure, diabetes mellitus, cancer
4:                                                                                 atrial flutter
5:                                                             post-operative atrial fibrillation
                                                                                                           l2_all_disease_terms
                                                                                                                         <char>
1:                                                                                              atrial fibrillation and flutter
2: atrial fibrillation and flutter, other malignant neoplasms, diabetes mellitus, heart failure, ischemic heart disease, stroke
3: atrial fibrillation and flutter, other malignant neoplasms, diabetes mellitus, heart failure, ischemic heart disease, stroke
4:                                                                                              atrial fibrillation and flutter
5:                                                                              atrial fibrillation and flutter, post-operative

5.9 Aortic aneurysm

url <- "http://www.ebi.ac.uk/ols4/api/ontologies/cvdo/terms/http%253A%252F%252Fpurl.obolibrary.org%252Fobo%252FDOID_3627/descendants"

aortic_aneurysm_terms <- get_descendants(url)
[1] "Number of terms collected:"
[1] 6
[1] "\n Some example terms"
[1] "ruptured thoracoabdominal aortic aneurysm"
[2] "ruptured abdominal aortic aneurysm"       
[3] "ruptured thoracic aortic aneurysm"        
[4] "abdominal aortic aneurysm"                
[5] "ruptured aortic aneurysm"                 
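
The `url` above carries the term IRI with two rounds of percent-encoding (`%253A` is `%3A` encoded a second time). As a minimal sketch of how such a URL could be assembled in base R, assuming the same endpoint layout as the URLs used in this analysis: `build_descendants_url()` is a hypothetical helper and is not part of the original code, and `get_descendants()` itself is defined elsewhere.

```r
# Hypothetical helper: build an OLS4 "descendants" URL from an ontology id and a term IRI.
build_descendants_url <- function(ontology, iri) {
  # First pass encodes reserved characters (":" -> "%3A", "/" -> "%2F").
  once <- utils::URLencode(iri, reserved = TRUE)
  # Second pass re-encodes the "%" signs; repeated = TRUE is needed because
  # URLencode() otherwise returns already-encoded input unchanged.
  twice <- utils::URLencode(once, reserved = TRUE, repeated = TRUE)
  paste0("http://www.ebi.ac.uk/ols4/api/ontologies/", ontology,
         "/terms/", twice, "/descendants")
}

url_sketch <- build_descendants_url("cvdo", "http://purl.obolibrary.org/obo/DOID_3627")
```

Under these assumptions, `url_sketch` should equal the `url` string used for aortic aneurysm above.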
gwas_study_info = gwas_study_info |> 
 mutate(l2_all_disease_terms  = 
          stringr::str_replace_all(l2_all_disease_terms ,
                                  pattern = vec_to_grep_pattern(aortic_aneurysm_terms),
                                   "aortic aneurysm"
                          )  
        )

gwas_study_info |>
 filter(grepl(vec_to_grep_pattern("aortic aneurysm"), 
              l2_all_disease_terms,
              perl = T
              )) |>
  select(all_disease_terms, l2_all_disease_terms) |>
  distinct() |>
  head()
Empty data.table (0 rows and 2 cols): all_disease_terms,l2_all_disease_terms
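
`vec_to_grep_pattern()` is defined earlier in the analysis and is not shown in this section. As a rough illustration of how a vector of terms could be turned into a whole-term pattern for a comma-separated column, here is a hypothetical stand-in, `vec_to_grep_pattern_sketch()`; the real helper may differ. It reuses the `(?<=^|, )` and `(?=,|$)` delimiters that appear in the hand-written patterns elsewhere in this analysis.

```r
# Hypothetical stand-in for vec_to_grep_pattern(); the real helper may differ.
vec_to_grep_pattern_sketch <- function(terms) {
  # Escape regex metacharacters so labels such as "x (disorder)" match literally.
  escaped <- gsub("([.|()\\^{}+$*?\\[\\]\\\\])", "\\\\\\1", terms, perl = TRUE)
  # Match a term only when it spans a whole comma-separated entry.
  paste0("(?<=^|, )(?:", paste(escaped, collapse = "|"), ")(?=,|$)")
}

pattern <- vec_to_grep_pattern_sketch(c("heart failure", "enteritis"))
x <- c("stroke, heart failure", "congestive heart failure", "gastroenteritis")

# Only the first element contains one of the terms as a complete entry.
matched <- grepl(pattern, x, perl = TRUE)
replaced <- gsub(pattern, "REPLACED", x, perl = TRUE)
```

The delimiters are what keep "heart failure" from matching inside "congestive heart failure", and "enteritis" from matching inside "gastroenteritis"; a plain substring `grepl()` would match both.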

5.10 Lower extremity peripheral arterial disease

lower_extremity_peripheral_arterial_disease_terms <- c("raynaud disease")

gwas_study_info |>
 filter(grepl(vec_to_grep_pattern("lower extremity peripheral arterial disease"), 
              l2_all_disease_terms,
              perl = T)) |>
  select(all_disease_terms, l2_all_disease_terms) |>
  distinct() |>
  head()
                                       all_disease_terms
                                                  <char>
1:                           peripheral arterial disease
2: type 2 diabetes mellitus, peripheral arterial disease
3:        diabetes mellitus, peripheral arterial disease
                                                    l2_all_disease_terms
                                                                  <char>
1:                           lower extremity peripheral arterial disease
2: lower extremity peripheral arterial disease, type 2 diabetes mellitus
3:        diabetes mellitus, lower extremity peripheral arterial disease

5.11 Endocarditis

gwas_study_info |> 
 filter(grepl(vec_to_grep_pattern("endocarditis"), 
              l2_all_disease_terms,
              perl = T)) |>
  select(all_disease_terms, l2_all_disease_terms) |>
  distinct() |>
  head()
                                         all_disease_terms
                                                    <char>
1: staphylococcus aureus infection, bacterial endocarditis
2:                                            endocarditis
                            l2_all_disease_terms
                                          <char>
1: endocarditis, staphylococcus aureus infection
2:                                  endocarditis

5.12 Other cardiovascular and circulatory diseases

other_cardiovascular_terms <- c("tachycardia",
                                "other cardiac arrhythmias",
                                "heart block",
                                "carotid artery disease",
                                "hypertension",
                                "pericarditis",
                                "coronary artery calcification",
                                "arterial occlusion",
                                "other vascular disorders",
                                "congestive heart failure",
                                "heart failure",
                                "thrombotic diseas", # truncated term: the whole term "thrombotic disease" is left unchanged in the output below
                                "arterial embolism",
                                "cardiac embolism",
                                "venous embolism",
                                "venous thrombosis",
                                "pulmonary embolism",                                
                                "arterial thrombosis",
                                "thromboembolism",
                                "vascular insufficiency",
                                "brain infarction",
                                "heart murmur"
                                 )

gwas_study_info =
gwas_study_info |> 
 mutate(l2_all_disease_terms  = 
          stringr::str_replace_all(l2_all_disease_terms ,
                                  pattern = vec_to_grep_pattern(other_cardiovascular_terms),
                                   "other cardiovascular and circulatory diseases"
                          )
 )

gwas_study_info =
gwas_study_info |> 
 mutate(l2_all_disease_terms  = 
          stringr::str_replace_all(l2_all_disease_terms ,
                                  pattern = "(?<=^|, ) other vascular disorders(?=,|$)",
                                   "other cardiovascular and circulatory diseases"
                          )
 )

gwas_study_info |>
 filter(grepl(vec_to_grep_pattern("other cardiovascular and circulatory diseases"), 
              l2_all_disease_terms,
              perl = T)) |>
  select(all_disease_terms, l2_all_disease_terms) |>
  distinct() |>
  head()
                                                                                                all_disease_terms
                                                                                                           <char>
1:                                                                                                   hypertension
2:                                                                                hypertension, alzheimer disease
3:                                                                              ischemic stroke, cardiac embolism
4:                                                                        ischemic stroke, small artery occlusion
5: ischemic stroke, venous thromboembolism, stroke, deep vein thrombosis, pulmonary embolism, abnormal thrombosis
6:                                                        type 2 diabetes mellitus, coronary artery calcification
                                                                                                      l2_all_disease_terms
                                                                                                                    <char>
1:                                                                           other cardiovascular and circulatory diseases
2:                                                       alzheimers disease, other cardiovascular and circulatory diseases
3:                                                                   other cardiovascular and circulatory diseases, stroke
4:                                                                   other cardiovascular and circulatory diseases, stroke
5: deep vein thrombosis, other cardiovascular and circulatory diseases, stroke, thrombotic disease, venous thromboembolism
6:                                                 other cardiovascular and circulatory diseases, type 2 diabetes mellitus

6 Chronic respiratory diseases

6.1 Chronic obstructive pulmonary disease

gwas_study_info |>
 filter(grepl(vec_to_grep_pattern("chronic obstructive pulmonary disease"), 
              l2_all_disease_terms,
              perl = T)) |>
  select(all_disease_terms, l2_all_disease_terms) |>
  distinct() |>
  head()
                                                   all_disease_terms
                                                              <char>
1:                             chronic obstructive pulmonary disease
2:         chronic obstructive pulmonary disease, chronic bronchitis
3: digestive system carcinoma, chronic obstructive pulmonary disease
4:                     cancer, chronic obstructive pulmonary disease
5:             lung carcinoma, chronic obstructive pulmonary disease
6:                     asthma, chronic obstructive pulmonary disease
                                                       l2_all_disease_terms
                                                                     <char>
1:                                    chronic obstructive pulmonary disease
2:                chronic bronchitis, chronic obstructive pulmonary disease
3:         chronic obstructive pulmonary disease, other malignant neoplasms
4:         other malignant neoplasms, chronic obstructive pulmonary disease
5: chronic obstructive pulmonary disease, tracheal bronchus and lung cancer
6:                            asthma, chronic obstructive pulmonary disease

6.2 Pneumoconiosis

gwas_study_info |>
 filter(grepl(vec_to_grep_pattern("pneumoconiosis"), 
              l2_all_disease_terms,
              perl = T)) |>
  select(all_disease_terms, l2_all_disease_terms) |>
  distinct() |>
  head()
   all_disease_terms l2_all_disease_terms
              <char>               <char>
1:    pneumoconiosis       pneumoconiosis

6.3 Asthma

gwas_study_info |>
 filter(grepl(vec_to_grep_pattern("asthma"), 
              l2_all_disease_terms,
              perl = T)) |>
  select(all_disease_terms, l2_all_disease_terms) |>
  distinct() |>
  head()
                                                all_disease_terms
                                                           <char>
1:                                                         asthma
2:                                       asthma, allergic disease
3:                                         childhood onset asthma
4: childhood onset asthma, respiratory symptom change measurement
5:                                         age of onset of asthma
6:                                         aspirin-induced asthma
       l2_all_disease_terms
                     <char>
1:                   asthma
2: allergic disease, asthma
3:                   asthma
4:                   asthma
5:                   asthma
6:                   asthma

6.4 Interstitial lung disease and pulmonary sarcoidosis

interstitial_lung_disease_terms <- c("pulmonary sarcoidosis",
                                   "interstitial lung disease"
                                   )

gwas_study_info = 
gwas_study_info |> 
 mutate(l2_all_disease_terms  = 
          stringr::str_replace_all(l2_all_disease_terms ,
                                  pattern = vec_to_grep_pattern(interstitial_lung_disease_terms),
                                   "interstitial lung disease and pulmonary sarcoidosis"
                          )  
        )

gwas_study_info |>
 filter(grepl(vec_to_grep_pattern("interstitial lung disease and pulmonary sarcoidosis"), 
              l2_all_disease_terms,
              perl = T)) |>
  select(all_disease_terms, l2_all_disease_terms) |>
  distinct() |>
  head()
                                 all_disease_terms
                                            <char>
1: rheumatoid arthritis, interstitial lung disease
2:                       interstitial lung disease
3: systemic scleroderma, interstitial lung disease
                                                        l2_all_disease_terms
                                                                      <char>
1: interstitial lung disease and pulmonary sarcoidosis, rheumatoid arthritis
2:                       interstitial lung disease and pulmonary sarcoidosis
3: interstitial lung disease and pulmonary sarcoidosis, systemic scleroderma

6.5 Other chronic respiratory diseases

gwas_study_info |>
 filter(grepl(vec_to_grep_pattern("other chronic respiratory diseases"), 
              l2_all_disease_terms,
              perl = T)) |>
  select(all_disease_terms, l2_all_disease_terms) |>
  distinct() |>
  head()
Empty data.table (0 rows and 2 cols): all_disease_terms,l2_all_disease_terms
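
Several category checks in this analysis return an empty table, as "other chronic respiratory diseases" does here. One compact way to see which category labels are present at all is to split the comma-separated column and tabulate the pieces. The sketch below uses a small made-up vector in place of `gwas_study_info$l2_all_disease_terms`; `count_label_hits()` is a hypothetical helper and not part of the original analysis.

```r
# Hypothetical helper: count how often each expected label occurs as a whole term.
count_label_hits <- function(terms_col, labels) {
  # One element per (row, term) pair; missing rows are skipped.
  all_terms <- unlist(strsplit(terms_col[!is.na(terms_col)], ", ", fixed = TRUE))
  # Fixing the factor levels keeps labels with zero hits in the result.
  counts <- table(factor(all_terms, levels = labels))
  stats::setNames(as.integer(counts), names(counts))
}

# Toy input standing in for the real column.
toy <- c("asthma", "allergic disease, asthma", "pneumoconiosis", NA)
hits <- count_label_hits(
  toy,
  c("asthma", "pneumoconiosis", "other chronic respiratory diseases")
)
```

Labels with a count of zero are the ones whose `filter(grepl(...))` check would come back empty. Note that this counts term occurrences, so a row that lists the same label twice contributes two.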

7 Digestive diseases

7.1 Cirrhosis & other chronic liver diseases

url <- "http://www.ebi.ac.uk/ols4/api/ontologies/snomed/terms/http%253A%252F%252Fsnomed.info%252Fid%252F328383001/descendants"

chronic_liver_disease_terms <- get_descendants(url)
[1] "Number of terms collected:"
[1] 114
[1] "\n Some example terms"
[1] "hepatic ascites co-occurrent with chronic active hepatitis due to toxic liver disease"
[2] "cirrhosis of liver co-occurrent and due to primary sclerosing cholangitis (disorder)" 
[3] "chronic hepatitis c co-occurrent with human immunodeficiency virus infection"         
[4] "primary biliary cirrhosis co-occurrent with systemic scleroderma (disorder)"          
[5] "pulmonary fibrosis, hepatic hyperplasia, bone marrow hypoplasia syndrome"             
chronic_liver_disease_terms <- c("primary biliary cirrhosis",
                                 "alcoholic liver cirrhosis",
                                 "chronic hepatitis b virus infection",
                                 "acute-on-chronic liver failure",
                                 "non-alcoholic fatty liver disease",
                                 "cirrhosis of liver",
                                 "chronic hepatitis",
                                 "liver disease",
                                 chronic_liver_disease_terms)

gwas_study_info = gwas_study_info |> 
 mutate(l2_all_disease_terms  = 
          stringr::str_replace_all(l2_all_disease_terms,
                                  pattern = vec_to_grep_pattern(chronic_liver_disease_terms),
                                   "cirrhosis and other chronic liver diseases"
                          )  
        )

gwas_study_info = gwas_study_info |> 
 mutate(l2_all_disease_terms  = 
          stringr::str_replace_all(l2_all_disease_terms,
                                  pattern = "(?<=^|, ) liver disease(?=,|$)",
                                   "cirrhosis and other chronic liver diseases"
                          )  
        )


gwas_study_info |>
 filter(grepl(vec_to_grep_pattern("cirrhosis and other chronic liver diseases"), 
              l2_all_disease_terms,
              perl = T)) |>
  select(all_disease_terms, l2_all_disease_terms) |>
  distinct() |>
  head()
                                        all_disease_terms
                                                   <char>
1:                              primary biliary cirrhosis
2:                       alagille syndrome, liver disease
3:              inflammatory bowel disease, liver disease
4: non-alcoholic fatty liver disease severity measurement
5:    non-alcoholic fatty liver disease, hepatic fibrosis
6:                          non-alcoholic steatohepatitis
                                                     l2_all_disease_terms
                                                                   <char>
1:                             cirrhosis and other chronic liver diseases
2:          alagille syndrome, cirrhosis and other chronic liver diseases
3: inflammatory bowel disease, cirrhosis and other chronic liver diseases
4:                             cirrhosis and other chronic liver diseases
5:           hepatic fibrosis, cirrhosis and other chronic liver diseases
6:                             cirrhosis and other chronic liver diseases

7.2 Upper digestive system diseases

upper_dig_terms <- c("peptic ulcer diseases",
                     "duodenitis",
                     "gastritis",
                     "gastroesophageal reflux disease")

gwas_study_info = gwas_study_info |>
 mutate(l2_all_disease_terms  = 
          stringr::str_replace_all(l2_all_disease_terms,
                                  pattern = vec_to_grep_pattern(upper_dig_terms),
                                   "upper digestive system diseases"
                          )  
        )

gwas_study_info |>
 filter(grepl(vec_to_grep_pattern("upper digestive system diseases"), 
              l2_all_disease_terms,
              perl = T)) |>
  select(all_disease_terms, l2_all_disease_terms) |>
  distinct() |>
  head()
                                                 all_disease_terms
                                                            <char>
1:                                 gastroesophageal reflux disease
2:      esophageal adenocarcinoma, gastroesophageal reflux disease
3:             barretts esophagus, gastroesophageal reflux disease
4:                                                       gastritis
5:      gastroesophageal reflux disease, major depressive disorder
6: post-traumatic stress disorder, gastroesophageal reflux disease
                                              l2_all_disease_terms
                                                            <char>
1:                                 upper digestive system diseases
2:              esophageal cancer, upper digestive system diseases
3:             barretts esophagus, upper digestive system diseases
4:                                 upper digestive system diseases
5:      upper digestive system diseases, major depressive disorder
6: upper digestive system diseases, post-traumatic stress disorder
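
Each category here is collapsed with the same `mutate()` plus `str_replace_all()` step. An alternative that avoids regular expressions altogether is to split each string into its terms, recode the terms through a named lookup vector, and join them again. The following is a sketch of that alternative, not the method used in this analysis; `recode_terms()` and the toy input are hypothetical.

```r
# Hypothetical alternative: split, recode whole terms via a lookup, rejoin.
recode_terms <- function(x, terms, label) {
  # Named lookup: every source term maps to the category label.
  lookup <- stats::setNames(rep(label, length(terms)), terms)
  vapply(strsplit(x, ", ", fixed = TRUE), function(parts) {
    # Preserve missing values instead of turning them into the string "NA".
    if (length(parts) == 1L && is.na(parts)) return(NA_character_)
    hit <- parts %in% names(lookup)
    parts[hit] <- unname(lookup[parts[hit]])
    paste(parts, collapse = ", ")
  }, character(1))
}

upper_dig_terms <- c("peptic ulcer diseases", "duodenitis", "gastritis",
                     "gastroesophageal reflux disease")

recoded <- recode_terms(
  c("gastritis", "barretts esophagus, gastroesophageal reflux disease", NA),
  upper_dig_terms,
  "upper digestive system diseases"
)
```

Because terms are compared as whole strings, no escaping or lookaround is needed, at the cost of one `strsplit()` per category pass.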

7.3 Appendicitis

gwas_study_info |>
 filter(grepl(vec_to_grep_pattern("appendicitis"), 
              l2_all_disease_terms,
              perl = T)) |>
  select(all_disease_terms, l2_all_disease_terms) |>
  distinct() |>
  head()
   all_disease_terms l2_all_disease_terms
              <char>               <char>
1:      appendicitis         appendicitis

7.4 Paralytic ileus and intestinal obstruction

gwas_study_info = gwas_study_info |>
 mutate(l2_all_disease_terms  = 
          stringr::str_replace_all(l2_all_disease_terms,
                                  pattern = vec_to_grep_pattern(c("paralytic ileus",
                                                             "intestinal obstruction")
                                                             ),
                                   "paralytic ileus and intestinal obstruction"
                          )  
        )

gwas_study_info |>
 filter(grepl(vec_to_grep_pattern("paralytic ileus and intestinal obstruction"), 
              l2_all_disease_terms,
              perl = T)) |>
  select(all_disease_terms, l2_all_disease_terms) |>
  distinct() |>
  head()
                                            all_disease_terms
                                                       <char>
1:                  cystic fibrosis associated meconium ileus
2:                                     intestinal obstruction
3:                                            paralytic ileus
4:                                       intestinal impaction
5: cystic fibrosis, cystic fibrosis associated meconium ileus
6:                                                      ileus
                                          l2_all_disease_terms
                                                        <char>
1:                  paralytic ileus and intestinal obstruction
2:                  paralytic ileus and intestinal obstruction
3:                  paralytic ileus and intestinal obstruction
4:                  paralytic ileus and intestinal obstruction
5: cystic fibrosis, paralytic ileus and intestinal obstruction
6:                  paralytic ileus and intestinal obstruction

7.5 Inguinal femoral and abdominal hernia

hernia_terms <- c("inguinal hernia",
                  "femoral hernia",
                  "abdominal hernia",
                  "hernia of the abdominal wall")

gwas_study_info = gwas_study_info |>
 mutate(l2_all_disease_terms  = 
          stringr::str_replace_all(l2_all_disease_terms,
                                  pattern = vec_to_grep_pattern(hernia_terms),
                                   "inguinal femoral and abdominal hernia"
                          )  
        )

gwas_study_info |>
 filter(grepl(vec_to_grep_pattern("inguinal femoral and abdominal hernia"), 
              l2_all_disease_terms,
              perl = T)) |>
  select(all_disease_terms, l2_all_disease_terms) |>
  distinct() |>
  head()
              all_disease_terms                  l2_all_disease_terms
                         <char>                                <char>
1:              inguinal hernia inguinal femoral and abdominal hernia
2: hernia of the abdominal wall inguinal femoral and abdominal hernia
3:               femoral hernia inguinal femoral and abdominal hernia
4:             umbilical hernia inguinal femoral and abdominal hernia
5:               ventral hernia inguinal femoral and abdominal hernia
6:            incisional hernia inguinal femoral and abdominal hernia

7.6 Inflammatory bowel disease

ibd_terms <- c("crohns disease",
               "ulcerative colitis",
               "inflammatory bowel disease",
               "enteritis",
               "gastroenteritis")

gwas_study_info = gwas_study_info |>
 mutate(l2_all_disease_terms  = 
          stringr::str_replace_all(l2_all_disease_terms,
                                  pattern = vec_to_grep_pattern(ibd_terms),
                                   "inflammatory bowel disease"
                          )  
        )

gwas_study_info |>
  filter(grepl(vec_to_grep_pattern("inflammatory bowel disease"), 
              l2_all_disease_terms,
              perl = T)) |>
  select(all_disease_terms, l2_all_disease_terms) |>
  distinct() |>
  head()
                          all_disease_terms
                                     <char>
1:                           crohns disease
2:               inflammatory bowel disease
3:                       ulcerative colitis
4:   inflammatory bowel disease, leukopenia
5:     inflammatory bowel disease, alopecia
6: inflammatory bowel disease, pancreatitis
                       l2_all_disease_terms
                                     <char>
1:               inflammatory bowel disease
2:               inflammatory bowel disease
3:               inflammatory bowel disease
4:   inflammatory bowel disease, leukopenia
5:     alopecia, inflammatory bowel disease
6: inflammatory bowel disease, pancreatitis

7.7 Vascular intestinal disorders

gwas_study_info |>
 filter(grepl(vec_to_grep_pattern("vascular intestinal disorders"), 
              l2_all_disease_terms,
              perl = T)) |>
  select(all_disease_terms, l2_all_disease_terms) |>
  distinct() |>
  head()
Empty data.table (0 rows and 2 cols): all_disease_terms,l2_all_disease_terms

7.8 Gallbladder and biliary diseases

gal_bile_terms = c("gallbladder disease",
                   "bile duct disorder",
                   "biliary tract disease",
                   "cholelithiasis",
                   "cholecystitis")


gwas_study_info = gwas_study_info |> 
 mutate(l2_all_disease_terms  = 
          stringr::str_replace_all(l2_all_disease_terms,
                                  pattern = vec_to_grep_pattern(gal_bile_terms),
                                   "gallbladder and biliary diseases"
                          )  
        )

gwas_study_info |>
 filter(grepl("gallbladder and biliary diseases", 
              l2_all_disease_terms,
              perl = T)) |>
  select(all_disease_terms, l2_all_disease_terms) |>
  distinct() |>
  head()
                    all_disease_terms
                               <char>
1: sickle cell anemia, cholelithiasis
2:                     cholelithiasis
3:                 bile duct disorder
4:                gallbladder disease
5:              biliary tract disease
6:      cholelithiasis, cholecystitis
                                                         l2_all_disease_terms
                                                                       <char>
1: gallbladder and biliary diseases, sickle cell disease and related diseases
2:                                           gallbladder and biliary diseases
3:                                           gallbladder and biliary diseases
4:                                           gallbladder and biliary diseases
5:                                           gallbladder and biliary diseases
6:         gallbladder and biliary diseases, gallbladder and biliary diseases
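
Row 6 above shows a side effect of collapsing several source terms into one label: "cholelithiasis, cholecystitis" becomes the same label twice. If such duplicates are unwanted downstream, they could be dropped row by row. The helper below, `dedupe_terms()`, is a hypothetical sketch and is not part of the original analysis.

```r
# Hypothetical helper: drop repeated terms within each comma-separated string.
dedupe_terms <- function(x) {
  vapply(strsplit(x, ",", fixed = TRUE), function(parts) {
    # Preserve missing values instead of turning them into the string "NA".
    if (length(parts) == 1L && is.na(parts)) return(NA_character_)
    # trimws() tolerates stray spaces around the separators;
    # unique() keeps the first occurrence and the original order.
    paste(unique(trimws(parts)), collapse = ", ")
  }, character(1))
}

deduped <- dedupe_terms(c(
  "gallbladder and biliary diseases, gallbladder and biliary diseases",
  "gallbladder and biliary diseases, sickle cell disease and related diseases"
))
```

In the pipeline this could be applied once after all categories are collapsed, for example with `mutate(l2_all_disease_terms = dedupe_terms(l2_all_disease_terms))`.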

7.9 Pancreatitis

gwas_study_info |>
 filter(grepl("pancreatitis", 
              l2_all_disease_terms)) |>
  select(all_disease_terms, l2_all_disease_terms) |>
  distinct() |>
  head()
                                                               all_disease_terms
                                                                          <char>
1:         alcoholic liver cirrhosis, alcoholic pancreatitis, alcohol dependence
2:                             alcoholic liver cirrhosis, alcoholic pancreatitis
3:                                      inflammatory bowel disease, pancreatitis
4:         acute lymphoblastic leukemia, asparaginase-induced acute pancreatitis
5: autoimmune pancreatitis type 1, salivary gland lesion, lachrymal gland lesion
6:                                                        alcoholic pancreatitis
                                               l2_all_disease_terms
                                                             <char>
1: alcohol-related disorders, alcoholic liver disease, pancreatitis
2:                            alcoholic liver disease, pancreatitis
3:                         inflammatory bowel disease, pancreatitis
4:                                           leukemia, pancreatitis
5:      lachrymal gland lesion, pancreatitis, salivary gland lesion
6:                                                     pancreatitis

7.10 Other digestive diseases

# I84-I84.9, 
# K20-K20.9, 
# K22-K22.6, 
# K22.8-K24, 
# K31-K31.8, 
# K38-K38.2, 
# K57-K62, 
# K62.2-K62.6, 
# K62.8-K62.9, 
# K64-K64.9, 
# K66.8, K67, 
# K68, 
# K77, 
# K90-K90.9, 
# K92.8, 
# K93.8

# 579 - celiac disease

other_digestive_terms <- c("esophagitis",
                           "eosinophilic esophagitis",
                           "esophageal ulcer",
                           # "barretts esophagus",
                           "diverticulitis",
                           "celiac disease",
                           "irritable bowel syndrome",
                           "anal fissure",
                           "anal fistula",
                           #? "anal polyp"
                           "rectal prolapse",
                           "rectal abscess"
                           )

gwas_study_info = gwas_study_info |>
 mutate(l2_all_disease_terms  = 
          stringr::str_replace_all(l2_all_disease_terms,
                                  pattern = vec_to_grep_pattern(other_digestive_terms),
                                   "other digestive diseases"
                          )
 )


gwas_study_info |>
 filter(grepl("other digestive diseases", 
              l2_all_disease_terms)) |>
  select(all_disease_terms, l2_all_disease_terms) |>
  distinct() |>
  head()
                                                                                                                                                                                                                                                all_disease_terms
                                                                                                                                                                                                                                                           <char>
1: autoimmune thyroid disease, systemic lupus erythematosus, type 1 diabetes mellitus, ankylosing spondylitis, psoriasis, common variable immunodeficiency, celiac disease, ulcerative colitis, crohns disease, autoimmune disease, juvenile idiopathic arthritis
2:                                                                                                                                                                                                                                                 celiac disease
3:                                                                                                                                                   rheumatoid arthritis, frontotemporal dementia, psoriasis, celiac disease, ulcerative colitis, crohns disease
4:                                                                                                                  rheumatoid arthritis, type 1 diabetes mellitus, psoriasis, celiac disease, ulcerative colitis, crohns disease, progressive supranuclear palsy
5:                                                                                                                   rheumatoid arthritis, amyotrophic lateral sclerosis, type 1 diabetes mellitus, psoriasis, celiac disease, ulcerative colitis, crohns disease
6:                                                                                                              rheumatoid arthritis, type 1 diabetes mellitus, corticobasal degeneration disorder, psoriasis, celiac disease, ulcerative colitis, crohns disease
                                                                                                                                                                                                                                                                           l2_all_disease_terms
                                                                                                                                                                                                                                                                                         <char>
1: ankylosing spondylitis, autoimmune disease, autoimmune thyroid disease, other digestive diseases, common variable immunodeficiency, inflammatory bowel disease, juvenile idiopathic arthritis, psoriasis, systemic lupus erythematosus, type 1 diabetes mellitus, inflammatory bowel disease
2:                                                                                                                                                                                                                                                                     other digestive diseases
3:                                                                                                                                                   other digestive diseases, inflammatory bowel disease, frontotemporal dementia, psoriasis, rheumatoid arthritis, inflammatory bowel disease
4:                                                                                                                  other digestive diseases, inflammatory bowel disease, progressive supranuclear palsy, psoriasis, rheumatoid arthritis, type 1 diabetes mellitus, inflammatory bowel disease
5:                                                                                                                   amyotrophic lateral sclerosis, other digestive diseases, inflammatory bowel disease, psoriasis, rheumatoid arthritis, type 1 diabetes mellitus, inflammatory bowel disease
6:                                                                                                              other digestive diseases, corticobasal degeneration disorder, inflammatory bowel disease, psoriasis, rheumatoid arthritis, type 1 diabetes mellitus, inflammatory bowel disease

8 Neurological disorders

8.1 Alzheimer’s disease and other dementias

dementia <- c("alzheimers disease biomarker measurement",
              "alzheimers disease neuropathologic change",
              "aids dementia",
              "dementia",
              "frontotemporal dementia",
              "lewy body dementia",
              "vascular dementia",
              "alzheimers disease",
              "neurodegenerative disease"
)

gwas_study_info = 
gwas_study_info |> 
 mutate(l2_all_disease_terms  = 
          stringr::str_replace_all(l2_all_disease_terms ,
                                  pattern = vec_to_grep_pattern(dementia),
                                   "alzheimer's disease and other dementias"
                          )  
        )

gwas_study_info |>
 filter(grepl(vec_to_grep_pattern("alzheimer's disease and other dementias"), 
              l2_all_disease_terms,
              perl = T)) |>
  select(all_disease_terms, l2_all_disease_terms) |>
  distinct() |>
  head()
                                         all_disease_terms
                                                    <char>
1:                                       alzheimer disease
2:                       age of onset of alzheimer disease
3:                         hypertension, alzheimer disease
4:                    family history of alzheimers disease
5: alzheimer disease, family history of alzheimers disease
6:                                      lewy body dementia
                                                                     l2_all_disease_terms
                                                                                   <char>
1:                                                alzheimer's disease and other dementias
2:                                                alzheimer's disease and other dementias
3: alzheimer's disease and other dementias, other cardiovascular and circulatory diseases
4:                                                alzheimer's disease and other dementias
5:                                                alzheimer's disease and other dementias
6:                                                alzheimer's disease and other dementias
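
The helper `vec_to_grep_pattern()` is defined elsewhere in this project and its body is not shown here. As a rough sketch only, assuming it builds an alternation that matches whole entries in a `, `-separated list (the same lookaround style as the hand-written patterns used in this analysis), a stand-in might look like the following. The name `vec_to_grep_pattern_sketch` and the demo vectors are hypothetical.

```r
# Hypothetical stand-in for vec_to_grep_pattern(); the project's real helper may differ.
vec_to_grep_pattern_sketch <- function(terms) {
  # Escape regex metacharacters so each term is matched literally
  escaped <- gsub("([.|()\\^{}+$*?]|\\[|\\])", "\\\\\\1", terms)
  # Match a term only when it is a complete entry in a ", "-separated list
  paste0("(?<=^|, )(?:", paste(escaped, collapse = "|"), ")(?=,|$)")
}

dementia_demo <- c("dementia", "vascular dementia")
pattern <- vec_to_grep_pattern_sketch(dementia_demo)

# Illustrative strings, not rows from gwas_study_info
x <- c("hypertension, dementia",
       "vascular dementia, schizophrenia",
       "dementia measurement")

# perl = TRUE is needed for the lookbehind / lookahead assertions
relabelled <- gsub(pattern, "alzheimer's disease and other dementias", x, perl = TRUE)
print(relabelled)
```

In this sketch the third string is left unchanged, because "dementia" is only part of the entry "dementia measurement" rather than a complete entry.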

8.2 Parkinson’s disease

gwas_study_info = 
gwas_study_info |> 
 mutate(l2_all_disease_terms  = 
          stringr::str_replace_all(l2_all_disease_terms ,
                                  pattern = "parkinsons disease",
                                   "parkinson's disease"
                          )  
        )

gwas_study_info |>
 filter(grepl("parkinson's disease", l2_all_disease_terms)) |>
  select(all_disease_terms, l2_all_disease_terms) |>
  distinct() |>
  head()
                                              all_disease_terms
                                                         <char>
1:                                            parkinson disease
2:                            age of onset of parkinson disease
3:                   frontotemporal dementia, parkinson disease
4:                       lewy body attribute, parkinson disease
5:                             schizophrenia, parkinson disease
6: dementia, parkinson disease, disease progression measurement
                                           l2_all_disease_terms
                                                         <char>
1:                                          parkinson's disease
2:                                          parkinson's disease
3: alzheimer's disease and other dementias, parkinson's disease
4: alzheimer's disease and other dementias, parkinson's disease
5:                           parkinson's disease, schizophrenia
6: alzheimer's disease and other dementias, parkinson's disease

8.3 Idiopathic epilepsy

gwas_study_info =
gwas_study_info |> 
 mutate(l2_all_disease_terms  = 
          stringr::str_replace_all(l2_all_disease_terms ,
                                  pattern = "idiopathic generalized epilepsy",
                                   "idiopathic epilepsy"
                          )  
        )

gwas_study_info |>
 filter(grepl("idiopathic epilepsy", l2_all_disease_terms)) |>
  select(all_disease_terms, l2_all_disease_terms) |>
  distinct() |>
  head()
             all_disease_terms l2_all_disease_terms
                        <char>               <char>
1:  childhood absence epilepsy  idiopathic epilepsy
2:   juvenile absence epilepsy  idiopathic epilepsy
3: juvenile myoclonic epilepsy  idiopathic epilepsy

8.4 Multiple sclerosis

gwas_study_info |>
 filter(grepl("multiple sclerosis", l2_all_disease_terms)) |>
  select(all_disease_terms, l2_all_disease_terms) |>
  distinct() |>
  head()
                                   all_disease_terms
                                              <char>
1:                                multiple sclerosis
2:      type 2 diabetes mellitus, multiple sclerosis
3:  multiple sclerosis, chronic lymphocytic leukemia
4: diffuse large b-cell lymphoma, multiple sclerosis
5:           follicular lymphoma, multiple sclerosis
6: marginal zone b-cell lymphoma, multiple sclerosis
                           l2_all_disease_terms
                                         <char>
1:                           multiple sclerosis
2: multiple sclerosis, type 2 diabetes mellitus
3:                 leukemia, multiple sclerosis
4:     multiple sclerosis, non-hodgkin lymphoma
5:     multiple sclerosis, non-hodgkin lymphoma
6:     multiple sclerosis, non-hodgkin lymphoma

8.5 Motor neuron disease

gwas_study_info |>
 filter(grepl("motor neuron disease", l2_all_disease_terms)) |>
  select(all_disease_terms, l2_all_disease_terms) |>
  distinct() |>
  head()
      all_disease_terms l2_all_disease_terms
                 <char>               <char>
1: motor neuron disease motor neuron disease

8.6 Headache disorders

headache_terms <- c("headache disorder",
                    "cluster headache",
                    "migraine"
                    )



gwas_study_info = gwas_study_info |> 
 mutate(l2_all_disease_terms  = 
          stringr::str_replace_all(l2_all_disease_terms,
                                  #pattern = "\\bheadache disorder\\b|cluster headache\\b|migraine\\b",
                                  pattern = vec_to_grep_pattern(headache_terms),
                                   "headache disorders"
                          )  
        )

gwas_study_info |>
 filter(grepl("(?<=^|, )headache disorders", 
              l2_all_disease_terms, 
              perl = TRUE)) |>
  select(all_disease_terms, l2_all_disease_terms) |>
  distinct() |>
  head()
                            all_disease_terms
                                       <char>
1: migraine disorder, coronary artery disease
2:                      migraine without aura
3:                          migraine disorder
4:                         migraine with aura
5:                           cluster headache
6:        bipolar disorder, migraine disorder
                         l2_all_disease_terms
                                       <char>
1: ischemic heart disease, headache disorders
2:                         headache disorders
3:                         headache disorders
4:                         headache disorders
5:                         headache disorders
6:       bipolar disorder, headache disorders

8.7 Other neurological disorders

other_neuro_terms <- c("huntington disease",
                       "hereditary ataxia",
                       "torsion dystonia",
                       "x-linked dystonia-parkinsonism",
                       "isolated dystonia",
                       "limb dystonia",
                       "myoclonus",
                       "restless legs syndrome",
                        "chronic inflammatory demyelinating polyneuropathy",
                       "demyelinating disease of central nervous system",
                       "myasthenia gravis",
                       "complex regional pain syndrome")

gwas_study_info = gwas_study_info |>
 mutate(l2_all_disease_terms  = 
          stringr::str_replace_all(l2_all_disease_terms,
                                  pattern = vec_to_grep_pattern(other_neuro_terms),
                                   "other neurological disorders"
                          )  
        )

gwas_study_info |>
 filter(grepl(vec_to_grep_pattern("other neurological disorders"), 
              l2_all_disease_terms, 
              perl = TRUE)) |>
  select(all_disease_terms, l2_all_disease_terms) |>
  distinct() |>
  head()
                                     all_disease_terms
                                                <char>
1:                              restless legs syndrome
2: huntington disease, disease progression measurement
3:                        late-onset myasthenia gravis
4:                      complex regional pain syndrome
5:                  age of onset of huntington disease
6:                                   myasthenia gravis
           l2_all_disease_terms
                         <char>
1: other neurological disorders
2: other neurological disorders
3: other neurological disorders
4: other neurological disorders
5: other neurological disorders
6: other neurological disorders

9 Mental disorders

9.1 Schizophrenia

gwas_study_info |>
 filter(grepl(vec_to_grep_pattern("schizophrenia"), 
              l2_all_disease_terms, 
              perl = TRUE)) |>
  select(all_disease_terms, l2_all_disease_terms) |>
  distinct() |>
  head()
                                                  all_disease_terms
                                                             <char>
1:                                                    schizophrenia
2:       major depressive disorder, schizophrenia, bipolar disorder
3:                                  schizophrenia, bipolar disorder
4:                         schizophrenia, major depressive disorder
5:                          schizophrenia, type 2 diabetes mellitus
6: treatment refractory schizophrenia, drug-induced agranulocytosis
                                         l2_all_disease_terms
                                                       <char>
1:                                              schizophrenia
2: bipolar disorder, major depressive disorder, schizophrenia
3:                            bipolar disorder, schizophrenia
4:                   major depressive disorder, schizophrenia
5:                    schizophrenia, type 2 diabetes mellitus
6:                drug-induced agranulocytosis, schizophrenia

9.2 Depressive disorders

depressive_terms <- c("depressive disorder",
                      "depressive symptom",
                      "depressive episode",
                      "major depressive disorders",
                      "major depressive disorder",
                      "major depressive episode",
                      "depressive"
)


gwas_study_info = gwas_study_info |> 
 mutate(l2_all_disease_terms  = 
          stringr::str_replace_all(l2_all_disease_terms ,
                                  pattern = vec_to_grep_pattern(depressive_terms),
                                   "depressive disorders"
                          )  
        ) |>
   mutate(l2_all_disease_terms  = 
          stringr::str_replace_all(l2_all_disease_terms ,
                                  pattern = vec_to_grep_pattern("depressive disorder"),
                                   "depressive disorders"
                          )  
        ) 
  
  
gwas_study_info |>
 filter(grepl(vec_to_grep_pattern("depressive disorders"), 
              l2_all_disease_terms,
              perl = TRUE)) |>
  select(all_disease_terms, l2_all_disease_terms) |>
  distinct() |>
  head()
                                            all_disease_terms
                                                       <char>
1:                                  major depressive disorder
2: major depressive disorder, schizophrenia, bipolar disorder
3:                major depressive disorder, bipolar disorder
4:                bipolar disorder, major depressive disorder
5:                   schizophrenia, major depressive disorder
6:  depressive symptom measurement, major depressive disorder
                                    l2_all_disease_terms
                                                  <char>
1:                                  depressive disorders
2: bipolar disorder, depressive disorders, schizophrenia
3:                bipolar disorder, depressive disorders
4:                bipolar disorder, depressive disorders
5:                   depressive disorders, schizophrenia
6:            depressive disorders, depressive disorders
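
Row 6 above carries the same label twice ("depressive disorders, depressive disorders") because two different source terms were mapped to one label. If such repeats are unwanted, they could be collapsed after relabelling. A minimal base R sketch, assuming terms are always separated by `, `; `dedupe_terms` is a hypothetical helper, not part of this project:

```r
# Hypothetical helper: drop repeated entries within each ", "-separated string
dedupe_terms <- function(x, sep = ", ") {
  vapply(
    strsplit(x, sep, fixed = TRUE),
    function(parts) {
      # Keep missing values as NA rather than the string "NA"
      if (length(parts) == 1 && is.na(parts)) return(NA_character_)
      # unique() keeps the first occurrence, so the original order is preserved
      paste(unique(parts), collapse = sep)
    },
    character(1)
  )
}

# Illustrative input modelled on the output above
deduped <- dedupe_terms(c("depressive disorders, depressive disorders",
                          "bipolar disorder, depressive disorders",
                          NA))
print(deduped)
```

Under the same assumption, the helper could be applied inside a `mutate()` call on `l2_all_disease_terms` once all replacements for a section have run.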

9.3 Anxiety disorders

url <- "http://www.ebi.ac.uk/ols4/api/ontologies/mesh/terms/http%253A%252F%252Fid.nlm.nih.gov%252Fmesh%252FD001008/descendants"

anxiety_terms <- get_descendants(url)
[1] "Number of terms collected:"
[1] 15
[1] "\n Some example terms"
[1] "obsessive-compulsive disorder" "generalized anxiety disorder" 
[3] "neurocirculatory asthenia"     "excoriation disorder"         
[5] "anxiety, separation"          

anxiety_terms <- c(anxiety_terms, 
                   "obsessive-compulsive symptom measurement",
                   "obsessive-compulsive disorder",
                   "obsessive-compulsive"
                   )


gwas_study_info = 
gwas_study_info |> 
 mutate(l2_all_disease_terms  = 
          stringr::str_replace_all(l2_all_disease_terms ,
                                  pattern = vec_to_grep_pattern(anxiety_terms),
                                   "anxiety disorders"
                          )  
        ) |>
   mutate(l2_all_disease_terms  = 
          stringr::str_replace_all(l2_all_disease_terms ,
                                  pattern = "(?<=^|, )anxiety disorder(?=,|$)|(?<=^|, )anxiety measurement(?=,|$)",
                                   "anxiety disorders"
                          )  
        ) |>
     mutate(l2_all_disease_terms  = 
          stringr::str_replace_all(l2_all_disease_terms ,
                                  pattern = "(?<=^|, )anxiety(?=,|$)",
                                   "anxiety disorders"
                          )  
        )

gwas_study_info |>
 filter(grepl(vec_to_grep_pattern("anxiety disorders"), 
              l2_all_disease_terms,
              perl = TRUE)) |>
  select(all_disease_terms, l2_all_disease_terms) |>
  distinct() |>
  head()
                                                         all_disease_terms
                                                                    <char>
1:                                            generalized anxiety disorder
2: obsessive-compulsive disorder, attention deficit hyperactivity disorder
3:                                           obsessive-compulsive disorder
4:                                obsessive-compulsive symptom measurement
5:                 obsessive-compulsive disorder, autism spectrum disorder
6:                                                          panic disorder
        l2_all_disease_terms
                      <char>
1:         anxiety disorders
2:   adhd, anxiety disorders
3:         anxiety disorders
4:         anxiety disorders
5: autism, anxiety disorders
6:         anxiety disorders

9.4 Eating disorders

gwas_study_info = gwas_study_info |> 
 mutate(l2_all_disease_terms  = 
          stringr::str_replace_all(l2_all_disease_terms ,
                                  pattern = 
                                    vec_to_grep_pattern(
                                      c("bulimia nervosa",
                                        "anorexia nervosa",
                                        "binge eating",
                                        "eating disorder"
                                      )
                                    ),
                                  "eating disorders"
                          )  
        ) |>
   mutate(l2_all_disease_terms  = 
          stringr::str_replace_all(l2_all_disease_terms ,
                                  pattern = "anorexia",
                                  "eating disorders"
                          )  
        )


gwas_study_info |>
 filter(grepl(vec_to_grep_pattern("eating disorders"), 
              l2_all_disease_terms,
              perl = TRUE)) |>
  select(all_disease_terms, l2_all_disease_terms) |>
  distinct() |>
  head()
                   all_disease_terms               l2_all_disease_terms
                              <char>                             <char>
1:    bipolar disorder, binge eating eating disorders, bipolar disorder
2:    binge eating, bipolar disorder eating disorders, bipolar disorder
3:                  anorexia nervosa                   eating disorders
4:                   eating disorder                   eating disorders
5:                   bulimia nervosa                   eating disorders
6: bipolar disorder, eating disorder bipolar disorder, eating disorders

9.5 Autism spectrum disorders

gwas_study_info = gwas_study_info |> 
 mutate(l2_all_disease_terms  = 
          stringr::str_replace_all(l2_all_disease_terms,
                                  pattern = vec_to_grep_pattern("autism"),
                                   "autism spectrum disorders"
                          )  
        )

gwas_study_info |>
 filter(grepl(vec_to_grep_pattern("autism spectrum disorders"), 
              l2_all_disease_terms,
              perl = TRUE)) |>
  select(all_disease_terms, l2_all_disease_terms) |>
  distinct() |>
  head()
                                         all_disease_terms
                                                    <char>
1:                        autism spectrum disorder symptom
2:                                       asperger syndrome
3:                 autism spectrum disorder, schizophrenia
4:                                autism spectrum disorder
5: obsessive-compulsive disorder, autism spectrum disorder
6:                                                  autism
                           l2_all_disease_terms
                                         <char>
1:                    autism spectrum disorders
2:                    autism spectrum disorders
3:     autism spectrum disorders, schizophrenia
4:                    autism spectrum disorders
5: autism spectrum disorders, anxiety disorders
6:                    autism spectrum disorders

9.6 ADHD

gwas_study_info = gwas_study_info |> 
 mutate(l2_all_disease_terms  = 
          stringr::str_replace_all(l2_all_disease_terms ,
                                  pattern = vec_to_grep_pattern("adhd"),
                                   "attention-deficit/hyperactivity disorder"
                          )  
        )

gwas_study_info |>
 filter(grepl(vec_to_grep_pattern("attention-deficit/hyperactivity disorder"), 
              l2_all_disease_terms,
              perl = TRUE)) |>
  select(all_disease_terms, l2_all_disease_terms) |>
  distinct() |>
  head()
                                                                                                                all_disease_terms
                                                                                                                           <char>
1:                                                        obsessive-compulsive disorder, attention deficit hyperactivity disorder
2:                                                                                       attention deficit hyperactivity disorder
3:                                            attention deficit hyperactivity disorder, oppositional defiant disorder measurement
4:                                                                     attention deficit hyperactivity disorder, conduct disorder
5: attention deficit hyperactivity disorder, bipolar disorder, autism spectrum disorder, schizophrenia, major depressive disorder
6:                                                                     attention deficit hyperactivity disorder, bipolar disorder
                                                                                                         l2_all_disease_terms
                                                                                                                       <char>
1:                                                                attention-deficit/hyperactivity disorder, anxiety disorders
2:                                                                                   attention-deficit/hyperactivity disorder
3:                                                    attention-deficit/hyperactivity disorder, oppositional defiant disorder
4:                                                                 attention-deficit/hyperactivity disorder, conduct disorder
5: attention-deficit/hyperactivity disorder, autism spectrum disorders, bipolar disorder, depressive disorders, schizophrenia
6:                                                                 attention-deficit/hyperactivity disorder, bipolar disorder

9.7 Conduct disorder

gwas_study_info |>
 filter(grepl(vec_to_grep_pattern("conduct disorder"), 
              l2_all_disease_terms,
              perl = TRUE)) |>
  select(all_disease_terms, l2_all_disease_terms) |>
  distinct() |>
  head()
                                                                           all_disease_terms
                                                                                      <char>
1:                                                                          conduct disorder
2:                                attention deficit hyperactivity disorder, conduct disorder
3: attention deficit hyperactivity disorder, conduct disorder, oppositional defiant disorder
                                                                        l2_all_disease_terms
                                                                                      <char>
1:                                                                          conduct disorder
2:                                attention-deficit/hyperactivity disorder, conduct disorder
3: attention-deficit/hyperactivity disorder, conduct disorder, oppositional defiant disorder

9.8 Idiopathic developmental intellectual disability

terms <- c("developmental disability")

gwas_study_info = gwas_study_info |> 
 mutate(l2_all_disease_terms  = 
          stringr::str_replace_all(l2_all_disease_terms ,
                                  pattern = vec_to_grep_pattern(terms),
                                   "idiopathic developmental intellectual disability"
                          )  
        )

gwas_study_info |>
 filter(grepl(vec_to_grep_pattern("idiopathic developmental intellectual disability"), 
              l2_all_disease_terms,
              perl = TRUE)) |>
  select(all_disease_terms, l2_all_disease_terms) |>
  distinct() |>
  head()
          all_disease_terms                             l2_all_disease_terms
                     <char>                                           <char>
1: developmental disability idiopathic developmental intellectual disability

9.9 Other mental disorders

9.9.1 Personality disorders

url <- "http://www.ebi.ac.uk/ols4/api/ontologies/mondo/terms/http%253A%252F%252Fpurl.obolibrary.org%252Fobo%252FMONDO_0002028/descendants"

personality_disorders <- get_descendants(url)
[1] "Number of terms collected:"
[1] 10
[1] "\n Some example terms"
[1] "obsessive-compulsive personality disorder"
[2] "narcissistic personality disorder"        
[3] "schizotypal personality disorder"         
[4] "histrionic personality disorder"          
[5] "antisocial personality disorder"          

gwas_study_info = gwas_study_info |> 
 mutate(l2_all_disease_terms  = 
          stringr::str_replace_all(l2_all_disease_terms,
                                  pattern = vec_to_grep_pattern(personality_disorders),
                                   "personality disorders"
                          )  
        )

9.9.2 Mood disorders

url <- "http://www.ebi.ac.uk/ols4/api/ontologies/efo/terms/http%253A%252F%252Fwww.ebi.ac.uk%252Fefo%252FEFO_0004247/descendants"

mood_disorders <- get_descendants(url)
[1] "Number of terms collected:"
[1] 10
[1] "\n Some example terms"
[1] "mixed anxiety and depressive disorder"
[2] "treatment resistant depression"       
[3] "major depressive disorder"            
[4] "postpartum depression"                
[5] "depressive disorder"                  

gwas_study_info = gwas_study_info |> 
 mutate(l2_all_disease_terms  = 
          stringr::str_replace_all(l2_all_disease_terms,
                                  pattern = vec_to_grep_pattern(mood_disorders),
                                   "mood disorder"
                          )  
        )

9.9.3 Sleep disorders

url <- "http://www.ebi.ac.uk/ols4/api/ontologies/doid/terms/http%253A%252F%252Fpurl.obolibrary.org%252Fobo%252FDOID_535/descendants"

sleep_disorders <- get_descendants(url)
[1] "Number of terms collected:"
[1] 16
[1] "\n Some example terms"
[1] "periodic limb movement disorder" "advanced sleep phase syndrome 3"
[3] "advanced sleep phase syndrome 2" "advanced sleep phase syndrome 1"
[5] "advanced sleep phase syndrome 4"

url <- "http://www.ebi.ac.uk/ols4/api/ontologies/efo/terms/http%253A%252F%252Fwww.ebi.ac.uk%252Fefo%252FEFO_0008568/descendants"

other_sleep_disorders <- get_descendants(url)
[1] "Number of terms collected:"
[1] 26
[1] "\n Some example terms"
[1] "autosomal dominant cerebellar ataxia, deafness and narcolepsy"
[2] "hereditary sensory neuropathy-deafness-dementia syndrome"     
[3] "rapid eye movement sleep disorder"                            
[4] "substance-induced sleep disorder"                             
[5] "drug induced central sleep apnea"                             

sleep_disorders <- c(sleep_disorders,
                    other_sleep_disorders)

sleep_disorders <- str_length_sort(sleep_disorders)

gwas_study_info = gwas_study_info |> 
 mutate(l2_all_disease_terms  = 
          stringr::str_replace_all(l2_all_disease_terms,
                                  pattern = vec_to_grep_pattern(sleep_disorders),
                                   "sleep disorders"
                          )  
        )
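
`str_length_sort()` is another project helper whose definition is not shown here. Presumably it orders terms by length so that longer terms are tried before shorter ones they contain. That matters because PCRE- and ICU-style alternation takes the first alternative that matches, not the longest. The sketch below illustrates the idea with a hypothetical `str_length_sort_sketch()` and base R `gsub(perl = TRUE)`; it is an assumption about the helper's intent, not its actual code, and the two terms are illustrative.

```r
# Hypothetical stand-in: order terms from longest to shortest
str_length_sort_sketch <- function(x) {
  x[order(nchar(x), decreasing = TRUE)]
}

# Illustrative terms where the shorter one is a prefix of the longer one
terms <- c("advanced sleep phase syndrome",
           "advanced sleep phase syndrome 1")

unsorted_pattern <- paste(terms, collapse = "|")
sorted_pattern   <- paste(str_length_sort_sketch(terms), collapse = "|")

x <- "advanced sleep phase syndrome 1"

# Shorter alternative is tried first and wins, leaving a stray " 1"
res_unsorted <- gsub(unsorted_pattern, "sleep disorders", x, perl = TRUE)
# Longer alternative is tried first, so the whole term is replaced
res_sorted   <- gsub(sorted_pattern, "sleep disorders", x, perl = TRUE)

print(res_unsorted)
print(res_sorted)
```

If the pattern is additionally anchored on the list separators at both ends, the ordering matters less, but sorting remains a cheap safeguard.
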
other_mental_disorders <- c("manic or hypomanic episode",
                            "mental or behavioural disorder",
                            "mental disorder",
                            "post-traumatic stress disorder",
                            "stress-related disorder",
                            "acute stress reaction",
                            "occupation-related stress disorder",
                            "psychotic symptom",
                            "psychosis",
                            "psychiatric disorder",
                             "personality disorders",
                            "personality disorder",
                            "mood disorder",
                            "sleep disorders",
                            "sleep disorder",
                            "mixed anxiety disorders and depressive disorders",
                            "emotional symptom",
                            "dissociative disorder",
                            "hallucinations",
                            "somatoform disorder",
                            "schizoaffective disorder",
                            "phobic disorder",
                            "psychotic"
                            
)


gwas_study_info =
gwas_study_info |> 
 mutate(l2_all_disease_terms  = 
          stringr::str_replace_all(l2_all_disease_terms ,
                                  pattern = vec_to_grep_pattern(other_mental_disorders),
                                   "other mental disorders"
                          )  
        )

gwas_study_info |>
 filter(grepl(vec_to_grep_pattern("other mental disorders"), 
              l2_all_disease_terms,
              perl = TRUE)) |>
  select(all_disease_terms, l2_all_disease_terms) |>
  distinct() |>
  head()
                                            all_disease_terms
                                                       <char>
1:                                           bipolar disorder
2: major depressive disorder, schizophrenia, bipolar disorder
3:                major depressive disorder, bipolar disorder
4:                            schizophrenia, bipolar disorder
5:                bipolar disorder, major depressive disorder
6:                             bipolar disorder, binge eating
                                          l2_all_disease_terms
                                                        <char>
1:                                      other mental disorders
2: other mental disorders, depressive disorders, schizophrenia
3:                other mental disorders, depressive disorders
4:                       other mental disorders, schizophrenia
5:                other mental disorders, depressive disorders
6:                    eating disorders, other mental disorders
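
One quick sanity check on the relabelling is to count how often each level-2 label occurs. The base R sketch below does this on a toy vector whose strings are copied from the output above; the counts it produces are illustrative, not results for the full data set. `count_terms` is a hypothetical helper.

```r
# Hypothetical helper: tabulate individual terms across ", "-separated strings
count_terms <- function(x, sep = ", ") {
  # Drop missing values, split every string into its terms, then count them
  all_terms <- unlist(strsplit(x[!is.na(x)], sep, fixed = TRUE))
  sort(table(all_terms), decreasing = TRUE)
}

# Toy input only; not the real l2_all_disease_terms column
toy <- c("other mental disorders",
         "other mental disorders, depressive disorders, schizophrenia",
         "other mental disorders, schizophrenia")

term_counts <- count_terms(toy)
print(term_counts)
```

Unexpected labels near the bottom of such a table are usually the quickest way to spot source terms that no pattern has caught yet.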

10 Substance use disorders

10.1 Alcohol use disorders

gwas_study_info = 
gwas_study_info |> 
 mutate(l2_all_disease_terms  = 
          stringr::str_replace_all(l2_all_disease_terms,
                                  pattern = 
                                    vec_to_grep_pattern(
                                      c("alcohol-related disorders",
                                        "alcohol and nicotine codependence",
                                        "alcohol use disorder"
                                      )),
                                   "alcohol use disorders"
                          )  
        )


gwas_study_info |>
 filter(grepl(vec_to_grep_pattern("alcohol use disorders"), 
              l2_all_disease_terms,
              perl = TRUE)) |>
  select(all_disease_terms, l2_all_disease_terms) |>
  distinct() |>
  head()
                                                       all_disease_terms
                                                                  <char>
1:                                                    alcohol dependence
2: alcoholic liver cirrhosis, alcoholic pancreatitis, alcohol dependence
3:                                        alcohol dependence measurement
4:                                     alcohol and nicotine codependence
5:                                                    alcohol withdrawal
6:                                    age of onset of alcohol dependence
                                           l2_all_disease_terms
                                                         <char>
1:                                        alcohol use disorders
2: alcohol use disorders, alcoholic liver disease, pancreatitis
3:                                        alcohol use disorders
4:                                        alcohol use disorders
5:                                        alcohol use disorders
6:                                        alcohol use disorders

10.2 Opioid use disorders

gwas_study_info = 
gwas_study_info |> 
 mutate(l2_all_disease_terms  = 
          stringr::str_replace_all(l2_all_disease_terms ,
                                  pattern = vec_to_grep_pattern(
                                    c("opioid-related disorders",
                                      "opioid dependence",
                                      "opioid use disorder"
                                      )
                                    ),
                                   "opioid use disorders"
                          )  
        )

gwas_study_info |>
 filter(grepl(vec_to_grep_pattern("opioid use disorders"), 
              l2_all_disease_terms,
              perl = TRUE)) |>
  select(all_disease_terms, l2_all_disease_terms) |>
  distinct() |>
  head()
      all_disease_terms l2_all_disease_terms
                 <char>               <char>
1:    opioid dependence opioid use disorders
2:  opioid use disorder opioid use disorders
3: opioid use disorder, opioid use disorders

10.3 Cocaine use disorders

gwas_study_info = gwas_study_info |> 
 mutate(l2_all_disease_terms  = 
          stringr::str_replace_all(l2_all_disease_terms ,
                                  pattern = vec_to_grep_pattern("cocaine-related disorders"),
                                   "cocaine use disorders"
                          )  
        )

gwas_study_info |>
 filter(grepl(vec_to_grep_pattern("cocaine use disorders"), 
              l2_all_disease_terms,
              perl = TRUE)) |>
  select(all_disease_terms, l2_all_disease_terms) |>
  distinct() |>
  head()
      all_disease_terms  l2_all_disease_terms
                 <char>                <char>
1:   cocaine dependence cocaine use disorders
2: cocaine use disorder cocaine use disorders

10.4 Amphetamine use disorders

gwas_study_info = 
gwas_study_info |> 
 mutate(l2_all_disease_terms  = 
          stringr::str_replace_all(l2_all_disease_terms ,
                                  pattern = "methamphetamine",
                                   "amphetamine"
                          )  
        )

gwas_study_info |>
 filter(grepl("amphetamine", 
              l2_all_disease_terms)) |>
  select(all_disease_terms, l2_all_disease_terms) |>
  distinct() |>
  head()
                                                   all_disease_terms
                                                              <char>
1:                                        methamphetamine dependence
2:                                 methamphetamine-induced psychosis
3: alcohol dependence, heroin dependence, methamphetamine dependence
                                                  l2_all_disease_terms
                                                                <char>
1:                                           amphetamine use disorders
2:                                           amphetamine use disorders
3: alcohol use disorders, heroin dependence, amphetamine use disorders
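
The pattern in this step is a plain string rather than a whole-term pattern, so the replacement acts on substrings: every occurrence of "methamphetamine" becomes "amphetamine", and labels that already read "amphetamine" are unchanged. The sketch below illustrates this on a toy vector. It uses base R's gsub with fixed = TRUE in place of stringr::str_replace_all so that it runs without extra packages, and the example values are made up for illustration.

```r
# Toy stand-in for the l2_all_disease_terms column (hypothetical values)
toy_terms <- c("methamphetamine use disorders",
               "amphetamine use disorders",
               "alcohol use disorders, methamphetamine use disorders")

# Literal substring replacement: "methamphetamine" -> "amphetamine"
collapsed <- gsub("methamphetamine", "amphetamine", toy_terms, fixed = TRUE)
print(collapsed)

# "amphetamine" is a substring of "methamphetamine", so a plain
# grepl("amphetamine", ...) filter matches both spellings
matches_before <- grepl("amphetamine", toy_terms, fixed = TRUE)
print(matches_before)
```

One consequence of substring matching is that the filter above would return the same rows before and after the replacement, so it confirms the final labels rather than proving that the replacement ran.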

10.5 Cannabis use disorders

gwas_study_info = 
gwas_study_info |> 
 mutate(l2_all_disease_terms  = 
          stringr::str_replace_all(l2_all_disease_terms ,
                                  pattern = vec_to_grep_pattern("cannabis dependence"),
                                   "cannabis use disorders"
                          )  
        )

gwas_study_info |>
 filter(grepl(vec_to_grep_pattern("cannabis use disorders"), 
              l2_all_disease_terms,
              perl = TRUE)) |>
  select(all_disease_terms, l2_all_disease_terms) |>
  distinct() |>
  head()
                                     all_disease_terms
                                                <char>
1:                                 cannabis dependence
2:                     cannabis dependence measurement
3: cannabis dependence, schizophrenia, substance abuse
4:                cannabis dependence, substance abuse
5:      hiv infection, cannabis dependence measurement
                                     l2_all_disease_terms
                                                   <char>
1:                                 cannabis use disorders
2:                                 cannabis use disorders
3: cannabis use disorders, schizophrenia, substance abuse
4:                cannabis use disorders, substance abuse
5:                                 cannabis use disorders

10.6 Other drug use disorders

other_drug_use_terms <- c("heroin dependence",
                          "drug dependence",
                          "nicotine dependence",
                          "substance abuse",
                          "drug misuse",
                          "nicotine-related disorders"
                          )

gwas_study_info = 
gwas_study_info |> 
 mutate(l2_all_disease_terms  = 
          stringr::str_replace_all(l2_all_disease_terms ,
                                  pattern = vec_to_grep_pattern(other_drug_use_terms),
                                   "other drug use disorders"
                          )  
        )

gwas_study_info |>
 filter(grepl(vec_to_grep_pattern("other drug use disorders"), 
              l2_all_disease_terms,
              perl = TRUE)) |>
  select(all_disease_terms, l2_all_disease_terms) |>
  distinct() |>
  head()
                   all_disease_terms     l2_all_disease_terms
                              <char>                   <char>
1: nicotine dependence symptom count other drug use disorders
2:               nicotine dependence other drug use disorders
3: nicotine withdrawal symptom count other drug use disorders
4:   nicotine withdrawal measurement other drug use disorders
5:                 heroin dependence other drug use disorders
6:                   drug dependence other drug use disorders
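
When a vector of source terms is collapsed into one label, a term that never matches anything in the data is silently ignored. A quick way to spot such terms (often the result of a misspelling) is to count matches per term before running the replacement. The sketch below does this with base R literal matching on a toy vector; it does not use vec_to_grep_pattern, and the variable names and values are hypothetical.

```r
# Toy stand-in for the l2_all_disease_terms column (hypothetical values)
toy_l2_terms <- c("heroin dependence",
                  "nicotine dependence, schizophrenia",
                  "drug dependence")

# Terms to look for; the last one is deliberately misspelled
candidate_terms <- c("heroin dependence",
                     "drug dependence",
                     "nictone dependence")

# Count how many entries contain each term as a literal substring
match_counts <- vapply(candidate_terms,
                       function(term) sum(grepl(term, toy_l2_terms, fixed = TRUE)),
                       integer(1))

# Terms with zero matches are worth a second look
unmatched_terms <- names(match_counts)[match_counts == 0]
print(unmatched_terms)
```

A zero count is not always an error, since a term may simply be absent from the current catalog release, but it is a cheap check to run.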

11 Diabetes and kidney diseases

11.1 Diabetes mellitus type 1

gwas_study_info =
gwas_study_info |> 
 mutate(l2_all_disease_terms  = 
          stringr::str_replace_all(l2_all_disease_terms ,
                                  pattern = vec_to_grep_pattern("type 1 diabetes mellitus"),
                                   "diabetes mellitus type 1"
                          )  
        )

gwas_study_info |>
 filter(grepl(vec_to_grep_pattern("diabetes mellitus type 1"), 
              l2_all_disease_terms,
              perl = TRUE)) |>
  select(all_disease_terms, l2_all_disease_terms) |>
  distinct() |>
  head()
                                                                                                                                                                                                                                                all_disease_terms
                                                                                                                                                                                                                                                           <char>
1:                                                                                                                                                                                                                                       type 1 diabetes mellitus
2: autoimmune thyroid disease, systemic lupus erythematosus, type 1 diabetes mellitus, ankylosing spondylitis, psoriasis, common variable immunodeficiency, celiac disease, ulcerative colitis, crohns disease, autoimmune disease, juvenile idiopathic arthritis
3:                                                                                                                  rheumatoid arthritis, type 1 diabetes mellitus, psoriasis, celiac disease, ulcerative colitis, crohns disease, progressive supranuclear palsy
4:                                                                                                                   rheumatoid arthritis, amyotrophic lateral sclerosis, type 1 diabetes mellitus, psoriasis, celiac disease, ulcerative colitis, crohns disease
5:                                                                                                              rheumatoid arthritis, type 1 diabetes mellitus, corticobasal degeneration disorder, psoriasis, celiac disease, ulcerative colitis, crohns disease
6:                                                                                                                                                                                  type 1 diabetes mellitus, type 2 diabetes mellitus, neuropathy, diabetic foot
                                                                                                                                                                                                                                                                           l2_all_disease_terms
                                                                                                                                                                                                                                                                                         <char>
1:                                                                                                                                                                                                                                                                     diabetes mellitus type 1
2: ankylosing spondylitis, autoimmune disease, autoimmune thyroid disease, other digestive diseases, common variable immunodeficiency, inflammatory bowel disease, juvenile idiopathic arthritis, psoriasis, systemic lupus erythematosus, diabetes mellitus type 1, inflammatory bowel disease
3:                                                                                                                  other digestive diseases, inflammatory bowel disease, progressive supranuclear palsy, psoriasis, rheumatoid arthritis, diabetes mellitus type 1, inflammatory bowel disease
4:                                                                                                                   amyotrophic lateral sclerosis, other digestive diseases, inflammatory bowel disease, psoriasis, rheumatoid arthritis, diabetes mellitus type 1, inflammatory bowel disease
5:                                                                                                              other digestive diseases, corticobasal degeneration disorder, inflammatory bowel disease, psoriasis, rheumatoid arthritis, diabetes mellitus type 1, inflammatory bowel disease
6:                                                                                                                                                                                                                diabetic foot, neuropathy, diabetes mellitus type 1, type 2 diabetes mellitus
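
Several rows in the output above list "inflammatory bowel disease" twice within a single l2_all_disease_terms entry, which can happen when more than one source term collapses onto the same label. A minimal sketch of one way to remove such within-entry duplicates follows, assuming terms are separated by ", ". The helper name dedupe_terms is hypothetical and not part of the original analysis, and missing values are not handled.

```r
# Hypothetical helper: drop repeated terms inside each comma-separated entry,
# keeping the first occurrence and the original order
dedupe_terms <- function(x, sep = ", ") {
  vapply(strsplit(x, sep, fixed = TRUE),
         function(parts) paste(unique(parts), collapse = sep),
         character(1))
}

# Toy entries (hypothetical values)
toy_entries <- c(
  "psoriasis, inflammatory bowel disease, diabetes mellitus type 1, inflammatory bowel disease",
  "diabetes mellitus type 1"
)

deduped <- dedupe_terms(toy_entries)
print(deduped)
```

Whether duplicates should be removed at this stage or later depends on how the column is used downstream; counting studies per term, for instance, would be affected by repeated labels.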

11.2 Diabetes mellitus type 2

gwas_study_info =
gwas_study_info |> 
 mutate(l2_all_disease_terms  = 
          stringr::str_replace_all(l2_all_disease_terms ,
                                  pattern = vec_to_grep_pattern("type 2 diabetes mellitus"),
                                   "diabetes mellitus type 2"
                          )  
        )

gwas_study_info |>
 filter(grepl("diabetes mellitus type 2", 
              l2_all_disease_terms)) |>
  select(all_disease_terms, l2_all_disease_terms) |>
  distinct() |>
  head()
                                                      all_disease_terms
                                                                 <char>
1:                                             type 2 diabetes mellitus
2:                         type 2 diabetes mellitus, multiple sclerosis
3:                       type 2 diabetes mellitus, diabetic maculopathy
4:           type 2 diabetes mellitus, obesity, coronary artery disease
5:                              schizophrenia, type 2 diabetes mellitus
6: alzheimer disease, type 2 diabetes mellitus, coronary artery disease
                                                                        l2_all_disease_terms
                                                                                      <char>
1:                                                                  diabetes mellitus type 2
2:                                              multiple sclerosis, diabetes mellitus type 2
3:                                            diabetic eye disease, diabetes mellitus type 2
4:                                 ischemic heart disease, obesity, diabetes mellitus type 2
5:                                                   schizophrenia, diabetes mellitus type 2
6: alzheimer's disease and other dementias, ischemic heart disease, diabetes mellitus type 2

11.3 Chronic kidney disease

gwas_study_info |>
 filter(grepl(vec_to_grep_pattern("chronic kidney disease"), 
              l2_all_disease_terms,
              perl = TRUE)) |>
  select(all_disease_terms, l2_all_disease_terms) |>
  distinct() |>
  head()
                                                        all_disease_terms
                                                                   <char>
1:                                                 chronic kidney disease
2:                                    chronic kidney disease, proteinuria
3:                         diabetic nephropathy, type 2 diabetes mellitus
4:                                                   diabetic nephropathy
5: type 1 diabetes mellitus, chronic kidney disease, diabetic nephropathy
6:                         type 1 diabetes mellitus, diabetic nephropathy
                               l2_all_disease_terms
                                             <char>
1:                           chronic kidney disease
2:              chronic kidney disease, proteinuria
3: chronic kidney disease, diabetes mellitus type 2
4:                           chronic kidney disease
5: chronic kidney disease, diabetes mellitus type 1
6: chronic kidney disease, diabetes mellitus type 1

11.4 Acute glomerulonephritis

glomerulonephritis_terms <- c("chronic glomerulonephritis",
                              "membranous glomerulonephritis",
                              "proliferative glomerulonephritis")

# Assign the result back so the replacement persists
# (without the assignment the full table is printed and the change is lost)
gwas_study_info = 
gwas_study_info |> 
 mutate(l2_all_disease_terms  = 
          stringr::str_replace_all(l2_all_disease_terms ,
                                  pattern = vec_to_grep_pattern(glomerulonephritis_terms),
                                   "glomerulonephritis"
                          )  
        )

gwas_study_info |>
 filter(grepl(vec_to_grep_pattern("acute glomerulonephritis"), 
              l2_all_disease_terms,
              perl = TRUE)) |>
  select(all_disease_terms, l2_all_disease_terms) |>
  distinct() |>
  head()
Empty data.table (0 rows and 2 cols): all_disease_terms,l2_all_disease_terms

12 Skin and subcutaneous diseases

12.1 Dermatitis

gwas_study_info |>
 filter(grepl(vec_to_grep_pattern("dermatitis"), 
              l2_all_disease_terms,
              perl = TRUE)) |>
  select(all_disease_terms, l2_all_disease_terms) |>
  distinct() |>
  head()
                  all_disease_terms l2_all_disease_terms
                             <char>               <char>
1:              pemphigus foliaceus           dermatitis
2:                        pemphigus           dermatitis
3:               pemphigus vulgaris           dermatitis
4:            seborrheic dermatitis           dermatitis
5:                    atopic eczema           dermatitis
6: contact dermatitis due to nickel           dermatitis

12.2 Psoriasis

gwas_study_info |>
 filter(grepl(vec_to_grep_pattern("psoriasis"),
              l2_all_disease_terms,
              perl = TRUE)) |>
  select(all_disease_terms, l2_all_disease_terms) |>
  distinct() |>
  head()
                                                                                                                                                                                                                                                all_disease_terms
                                                                                                                                                                                                                                                           <char>
1:                                                                                                                                                                                                                                                      psoriasis
2: autoimmune thyroid disease, systemic lupus erythematosus, type 1 diabetes mellitus, ankylosing spondylitis, psoriasis, common variable immunodeficiency, celiac disease, ulcerative colitis, crohns disease, autoimmune disease, juvenile idiopathic arthritis
3:                                                                                                                                                                                                                     cutaneous psoriasis measurement, psoriasis
4:                                                                                                                                                                                                                                             psoriasis vulgaris
5:                                                                                                                                                                                                                                            psoriatic arthritis
6:                                                                                                                                                                                                cutaneous psoriasis measurement, psoriatic arthritis, psoriasis
                                                                                                                                                                                                                                                                           l2_all_disease_terms
                                                                                                                                                                                                                                                                                         <char>
1:                                                                                                                                                                                                                                                                                    psoriasis
2: ankylosing spondylitis, autoimmune disease, autoimmune thyroid disease, other digestive diseases, common variable immunodeficiency, inflammatory bowel disease, juvenile idiopathic arthritis, psoriasis, systemic lupus erythematosus, diabetes mellitus type 1, inflammatory bowel disease
3:                                                                                                                                                                                                                                                                                    psoriasis
4:                                                                                                                                                                                                                                                                                    psoriasis
5:                                                                                                                                                                                                                                                                                    psoriasis
6:                                                                                                                                                                                                                                                                                    psoriasis

12.3 Bacterial skin diseases

bacterial_skin_disease_terms <- c("staphylococcal skin infections",
                                  "skin and soft tissue staphylococcus aureus infection",
                                  "cellulitis"
                                  # "skin infection"
                                  )

gwas_study_info = gwas_study_info |> 
 mutate(l2_all_disease_terms  = 
          stringr::str_replace_all(l2_all_disease_terms ,
                                  pattern = vec_to_grep_pattern(bacterial_skin_disease_terms),
                                   "bacterial skin diseases"
                          )  
        )

gwas_study_info |>
 filter(grepl(vec_to_grep_pattern("bacterial skin diseases"), 
              l2_all_disease_terms,
              perl = T)) |>
  select(all_disease_terms, l2_all_disease_terms) |>
  distinct() |>
  head()
                                      all_disease_terms
                                                 <char>
1: skin and soft tissue staphylococcus aureus infection
2:                       staphylococcal skin infections
3:                                  cellulitis, abscess
4:                                           cellulitis
5:                             cellulitis, lymphangitis
                    l2_all_disease_terms
                                  <char>
1:               bacterial skin diseases
2:               bacterial skin diseases
3:      abscess, bacterial skin diseases
4:               bacterial skin diseases
5: bacterial skin diseases, lymphangitis

12.4 Scabies

gwas_study_info |>
 filter(grepl(vec_to_grep_pattern("scabies"), 
              l2_all_disease_terms,
              perl = T)) |>
  select(all_disease_terms, l2_all_disease_terms) |>
  distinct() |>
  head()
Empty data.table (0 rows and 2 cols): all_disease_terms,l2_all_disease_terms

12.5 Fungal skin diseases

fungal_skin_disease_terms <- c("tinea",
                               "dermatomycosis")

gwas_study_info = gwas_study_info |> 
 mutate(l2_all_disease_terms  = 
          stringr::str_replace_all(l2_all_disease_terms ,
                                  pattern = vec_to_grep_pattern(fungal_skin_disease_terms),
                                   "fungal skin diseases"
                          )  
        )

gwas_study_info |>
 filter(grepl("fungal skin diseases", l2_all_disease_terms)) |>
  select(all_disease_terms, l2_all_disease_terms) |>
  distinct() |>
  head()
                 all_disease_terms                       l2_all_disease_terms
                            <char>                                     <char>
1: dermatomycosis, dermatophytosis fungal skin diseases, fungal skin diseases
2:                 dermatophytosis                       fungal skin diseases
3:                   tinea unguium                       fungal skin diseases
4:                     tinea pedis                       fungal skin diseases
5:                  dermatomycosis                       fungal skin diseases
6:                           tinea                       fungal skin diseases
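
Row 1 above reads "fungal skin diseases, fungal skin diseases" because both original terms were mapped to the same group. A minimal sketch of one way to collapse such repeats, using base R only. `dedupe_terms()` is a hypothetical helper and is not part of this analysis; the ", " separator is an assumption based on the printed output.

```r
# Hypothetical helper: collapse repeated labels inside a comma-separated string.
dedupe_terms <- function(x, sep = ", ") {
  vapply(
    strsplit(x, sep, fixed = TRUE),
    function(terms) {
      # Keep missing values missing instead of turning them into "NA".
      if (all(is.na(terms))) return(NA_character_)
      # unique() keeps the first occurrence, so the original order is preserved.
      paste(unique(trimws(terms)), collapse = sep)
    },
    character(1)
  )
}

deduped <- dedupe_terms(c("fungal skin diseases, fungal skin diseases",
                          "angioedema, urticaria"))
print(deduped)
```

The first string collapses to a single "fungal skin diseases"; the second is returned unchanged.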

12.6 Viral skin diseases

gwas_study_info |>
 filter(grepl(vec_to_grep_pattern("viral skin diseases"), 
              l2_all_disease_terms,
              perl = T)) |>
  select(all_disease_terms, l2_all_disease_terms) |>
  distinct() |>
  head()
Empty data.table (0 rows and 2 cols): all_disease_terms,l2_all_disease_terms

12.7 Acne vulgaris

gwas_study_info = gwas_study_info |> 
 mutate(l2_all_disease_terms  = 
          stringr::str_replace_all(l2_all_disease_terms ,
                                  pattern = vec_to_grep_pattern("acne"),
                                   "acne vulgaris"
                          )  
        )

gwas_study_info |>
 filter(grepl(vec_to_grep_pattern("acne vulgaris"), 
              l2_all_disease_terms,
              perl = T)) |>
  select(all_disease_terms, l2_all_disease_terms) |>
  distinct() |>
  head()
   all_disease_terms l2_all_disease_terms
              <char>               <char>
1:              acne        acne vulgaris

12.8 Pruritus

gwas_study_info |>
 filter(grepl(vec_to_grep_pattern("pruritus"), 
              l2_all_disease_terms,
              perl = T)) |>
  select(all_disease_terms, l2_all_disease_terms) |>
  distinct() |>
  head()
                         all_disease_terms
                                    <char>
1: non-alcoholic steatohepatitis, pruritus
2:                                pruritus
                                   l2_all_disease_terms
                                                 <char>
1: cirrhosis and other chronic liver diseases, pruritus
2:                                             pruritus

12.9 Urticaria

gwas_study_info |>
 filter(grepl(vec_to_grep_pattern("urticaria"), 
              l2_all_disease_terms,
              perl = T)) |>
  select(all_disease_terms, l2_all_disease_terms) |>
  distinct() |>
  head()
       all_disease_terms  l2_all_disease_terms
                  <char>                <char>
1: urticaria, angioedema angioedema, urticaria
2:             urticaria             urticaria

12.10 Decubitus ulcer

gwas_study_info |>
 filter(grepl(vec_to_grep_pattern("decubitus ulcer"), 
              l2_all_disease_terms,
              perl = T)) |>
  select(all_disease_terms, l2_all_disease_terms) |>
  distinct() |>
  head()
   all_disease_terms l2_all_disease_terms
              <char>               <char>
1:   decubitus ulcer      decubitus ulcer

12.11 Other skin and subcutaneous diseases

other_skin_disease_terms <- c("sebaceous gland disease",
                              "rosacea",
                              "erythematosquamous dermatosis",
                              "dry skin",
                              "skin tags",
                              "dermatochalasis",
                              "epidermal thickening",
                              "epidermal inclusion cyst",
                              "cutaneous lupus erythematosus",
                              "androgenetic alopecia",
                              "chemotherapy-induced alopecia",
                              "cutaneous leishmaniasis")

gwas_study_info = gwas_study_info |> 
 mutate(l2_all_disease_terms  = 
          stringr::str_replace_all(l2_all_disease_terms ,
                                  pattern = vec_to_grep_pattern(other_skin_disease_terms),
                                   "other skin and subcutaneous diseases"
                          )
 )
                                   

gwas_study_info |>
 filter(grepl(vec_to_grep_pattern("other skin and subcutaneous diseases"), 
              l2_all_disease_terms,
              perl = T)) |>
  select(all_disease_terms, l2_all_disease_terms) |>
  distinct() |>
  head()
                                 all_disease_terms
                                            <char>
1:                           androgenetic alopecia
2:                   chemotherapy-induced alopecia
3:                    rosacea severity measurement
4:                   cutaneous lupus erythematosus
5:                                         rosacea
6: breast carcinoma, chemotherapy-induced alopecia
                                  l2_all_disease_terms
                                                <char>
1:                other skin and subcutaneous diseases
2:                other skin and subcutaneous diseases
3:                other skin and subcutaneous diseases
4:                other skin and subcutaneous diseases
5:                other skin and subcutaneous diseases
6: breast cancer, other skin and subcutaneous diseases

13 Sense organ diseases

13.1 Blindness and vision loss

vision_loss_terms <- c("blindness",
                       "color vision disorder",
                       "vision disorder",
                       "visuospatial impairment",
                       "pathological blindness and vision loss",
                       "visual impairment",
                       "myopia",
                       "refractive error",
                       "hyperopia",
                       "astigmatism",
                       "corneal astigmatism",
                       "presbyopia",
                       "anisometropia",
                       "esotropia",
                       "non-accomodative esotropia",
                       "accommodative esotropia",
                       "abnormality of refraction",
                       "abnormality of vision",
                       "age-related macular degeneration",
                       "degeneration of macula and posterior pole",
                       "age-related cataract",
                       "cataract")

gwas_study_info = gwas_study_info |> 
 mutate(l2_all_disease_terms  = 
          stringr::str_replace_all(l2_all_disease_terms ,
                                  pattern = vec_to_grep_pattern(vision_loss_terms),
                                   "blindness and vision loss"
                          )  
        )

gwas_study_info |>
 filter(grepl(vec_to_grep_pattern("blindness and vision loss"), 
              l2_all_disease_terms,
              perl = T)) |>
  select(all_disease_terms, l2_all_disease_terms) |>
  distinct() |>
  head()
                                                                           all_disease_terms
                                                                                      <char>
1:                                                                       corneal astigmatism
2:                                                                  wet macular degeneration
3:                                                                          refractive error
4:                                                          age-related macular degeneration
5: atrophic macular degeneration, age-related macular degeneration, wet macular degeneration
6:                                                             atrophic macular degeneration
        l2_all_disease_terms
                      <char>
1: blindness and vision loss
2: blindness and vision loss
3: blindness and vision loss
4: blindness and vision loss
5: blindness and vision loss
6: blindness and vision loss

13.3 Other sense organ diseases

other_sense_terms <- c("abnormality of the sense of smell",
                       "disturbances of sensation of smell and taste")

gwas_study_info = gwas_study_info |> 
 mutate(l2_all_disease_terms  = 
          stringr::str_replace_all(l2_all_disease_terms ,
                                  pattern = vec_to_grep_pattern(other_sense_terms),
                                   "other sense organ diseases"
                          )  
        )

gwas_study_info |>
 filter(grepl(vec_to_grep_pattern("other sense organ diseases"), 
              l2_all_disease_terms,
              perl = T)) |>
  select(all_disease_terms, l2_all_disease_terms) |>
  distinct() |>
  head()
Empty data.table (0 rows and 2 cols): all_disease_terms,l2_all_disease_terms

15 Other neurological disorders

other_neuro <- c("mild neurocognitive disorder",
                 "hiv-associated neurocognitive disorder")

16 Other non-communicable diseases

16.1 Congenital birth defects

16.1.1 Neural tube defects

16.1.2 Congenital heart defects

16.1.3 Orofacial clefts

gwas_study_info = gwas_study_info |>
  mutate(l2_all_disease_terms = 
           stringr::str_replace_all(
             l2_all_disease_terms,
             vec_to_grep_pattern("orofacial cleft"),
             "orofacial clefts"
           )
  )

gwas_study_info |>
 filter(grepl(vec_to_grep_pattern("orofacial clefts"), 
              l2_all_disease_terms,
              perl = T)) |>
  select(all_disease_terms, l2_all_disease_terms) |>
  distinct() |>
  head()
            all_disease_terms l2_all_disease_terms
                       <char>               <char>
1:                  cleft lip     orofacial clefts
2:               cleft palate     orofacial clefts
3:            orofacial cleft     orofacial clefts
4: orofacial cleft, cleft lip     orofacial clefts
5:    cleft palate, cleft lip     orofacial clefts

16.1.4 Down syndrome

gwas_study_info |>
 filter(grepl(vec_to_grep_pattern("down syndrome"), 
              l2_all_disease_terms,
              perl = T)) |>
  select(all_disease_terms, l2_all_disease_terms) |>
  distinct() |>
  head()
                              all_disease_terms
                                         <char>
1: down syndrome, atrioventricular canal defect
2:                                down syndrome
                                         l2_all_disease_terms
                                                       <char>
1: congenital anomaly of cardiovascular system, down syndrome
2:                                              down syndrome

16.1.5 Klinefelter syndrome

gwas_study_info |>
 filter(grepl(vec_to_grep_pattern("klinefelter syndrome"), 
              l2_all_disease_terms,
              perl = T)) |>
  select(all_disease_terms, l2_all_disease_terms) |>
  distinct() |>
  head()
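Empty data.table (0 rows and 2 cols): all_disease_terms,l2_all_disease_terms

# Load the CDC list of valid ICD-10 codes (FY2025)
icd10 <- readxl::read_xlsx(here::here("data/icd/cdc_valid_icd10_Sep_23_2025.xlsx"))

# Lower-case the column names and replace spaces with underscores
icd10 <- icd10 |>
  rename_with(~ tolower(gsub(" ", "_", .x)))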

16.1.6 Other chromosomal abnormalities

# Q87-Q87.8, Q91-Q93.9, Q95-Q95.9, Q97-Q97.9, Q99-Q99.8

other_chrom_abn_icd10 <-
  c(paste0("Q", seq(87, 87.8, by = 0.1) * 10),
    paste0("Q", seq(91, 93.9, by = 0.1) * 10),
    paste0("Q", seq(95, 95.9, by = 0.1) * 10),
    paste0("Q", seq(97, 97.9, by = 0.1) * 10),
    paste0("Q", seq(99, 99.8, by = 0.1) * 10)
    )

other_chrom_abn_terms <-
  icd10 |>
  filter(grepl(paste0(other_chrom_abn_icd10, collapse = "|"), code)) |>
  pull(`short_description_(valid_icd-10_fy2025)`) |>
  unique()
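
The ICD-10 ranges in this section are expanded with `seq(..., by = 0.1) * 10`, which relies on decimal steps being converted cleanly to text. A minimal sketch of an alternative that works on integer tenths instead. `icd10_range()` is a hypothetical helper, not a function from this analysis, and it assumes the undotted four-character code style used above (e.g. "Q870" for Q87.0).

```r
# Hypothetical helper: expand an ICD-10 range such as Q87-Q87.8 into
# undotted four-character codes ("Q870", ..., "Q878").
icd10_range <- function(letter, from, to) {
  # Work in whole tenths so no floating-point steps are involved.
  tenths <- seq(as.integer(round(from * 10)), as.integer(round(to * 10)))
  # %03d pads with zeros, so 0.3 in block "A" becomes "A003".
  sprintf("%s%03d", letter, tenths)
}

q87 <- icd10_range("Q", 87, 87.8)
print(q87)
```

The call returns the nine codes "Q870" through "Q878". Several ranges can be combined with `c()` in the same way as the `paste0()` calls above.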

other_chrom_abn <- c("marfan syndrome",
                     "fragile x syndrome",
                     "22q11.2 deletion syndrome",
                     "chromosomal disorder")

url <- "http://www.ebi.ac.uk/ols4/api/ontologies/mondo/terms/http%253A%252F%252Fpurl.obolibrary.org%252Fobo%252FMONDO_0019040/descendants"

chromosomal_disorders <- get_descendants(url)
[1] "Number of terms collected:"
[1] 587
[1] "\n Some example terms"
[1] "neurodevelopmental disorder-craniofacial dysmorphism-cardiac defect-hip dysplasia syndrome due to 9q21 microdeletion"
[2] "wilms tumor, aniridia, genitourinary anomalies, intellectual disability, and obesity syndrome"                       
[3] "severe neonatal hypotonia-seizures-encephalopathy syndrome due to 5q31.3 microdeletion"                              
[4] "severe motor and intellectual disabilities-sensorineural deafness-dystonia syndrome"                                 
[5] "syndrome caused by partial chromosomal duplication of the short arm of chromosome 9"                                 
other_chrom_abn <- c(other_chrom_abn, chromosomal_disorders)

gwas_study_info = gwas_study_info |>
  mutate(l1_all_disease_terms  = 
         stringr::str_replace_all(l1_all_disease_terms,
                          vec_to_grep_pattern(c(other_chrom_abn_terms, other_chrom_abn)),
                          "other chromosomal abnormalities"
         )
  )

16.1.7 Congenital heart abnormalities

# Q20-Q28.9
congen_heart_icd10 <-
  paste0("Q", seq(20, 28.9, by = 0.1) * 10) 


congen_terms <-
  icd10 |>
  filter(grepl(paste0(congen_heart_icd10, collapse = "|"), code)) |>
  pull(`short_description_(valid_icd-10_fy2025)`) |>
  unique()

gwas_study_info = gwas_study_info |>
  mutate(l1_all_disease_terms  = 
         stringr::str_replace_all(l1_all_disease_terms,
                          "congenital anomaly of cardiovascular system",
                          "congenital heart abnormalities"
         )
  )

16.1.8 Congenital musculoskeletal and limb anomalies

# Q65-Q79, Q79.6-Q79.9
congen_musculo_icd10 <-
  c(paste0("Q", seq(65, 79, by = 0.1) * 10),
    paste0("Q", seq(79.6, 79.9, by = 0.1) * 10)
    )

congen_musculo_terms <-
  icd10 |>
  filter(grepl(paste0(congen_musculo_icd10, collapse = "|"), code)) |>
  pull(`short_description_(valid_icd-10_fy2025)`) |>
  unique()


congen_muscle <- c("osteochondrodysplasia",
                   "congenital deformities of limbs",
                   "lower limb asymmetry",
                   "familial clubfoot with or without associated lower limb anomalies",
                   "abnormality of limbs",
                   "abnormal foot morphology",
                   congen_musculo_terms
                   )


gwas_study_info = gwas_study_info |>
  mutate(l1_all_disease_terms  = 
         stringr::str_replace_all(l1_all_disease_terms,
                          vec_to_grep_pattern(c(congen_musculo_terms, congen_muscle)),
                          "congenital musculoskeletal and limb anomalies"
         )
  )

gwas_study_info |>
 filter(grepl(vec_to_grep_pattern("congenital musculoskeletal and limb anomalies"), 
              l1_all_disease_terms,
              perl = T)) |>
  select(all_disease_terms, l1_all_disease_terms) |>
  distinct() |>
  head()
                                                   all_disease_terms
                                                              <char>
1:     charcot-marie-tooth disease type 1a, abnormal foot morphology
2: familial clubfoot with or without associated lower limb anomalies
3:                                   congenital deformities of limbs
4:                                          abnormal foot morphology
5:                                              abnormality of limbs
6:                                              lower limb asymmetry
                                                             l1_all_disease_terms
                                                                           <char>
1: congenital musculoskeletal and limb anomalies, other chromosomal abnormalities
2:                                  congenital musculoskeletal and limb anomalies
3:                                  congenital musculoskeletal and limb anomalies
4:                                  congenital musculoskeletal and limb anomalies
5:                                  congenital musculoskeletal and limb anomalies
6:                                  congenital musculoskeletal and limb anomalies

16.1.9 Urogenital congenital anomalies

# P96.0, Q50-Q60.6, Q63-Q64.9

urogenital_icd10 <-
  c("P960",
    paste0("Q", seq(50, 60.6, by = 0.1) * 10),
    paste0("Q", seq(63, 64.9, by = 0.1) * 10)
    )

urogenital_terms <-
  icd10 |>
  filter(grepl(paste0(urogenital_icd10, collapse = "|"), code)) |>
  pull(`short_description_(valid_icd-10_fy2025)`) |>
  unique()


urogenital_terms <- 
  c("abnormal morphology of female internal genitalia",
    "abnormality of the genital system",
    "functional abnormality of the bladder",
    "congenital anomaly of kidney and urinary tract",
    "bladder exstrophy", # ICD9:753.5
    tolower(urogenital_terms))

gwas_study_info = gwas_study_info |>
  mutate(l1_all_disease_terms  = 
         stringr::str_replace_all(l1_all_disease_terms,
                          vec_to_grep_pattern(urogenital_terms),
                          "urogenital congenital anomalies"
         )
  )


gwas_study_info |>
 filter(grepl(vec_to_grep_pattern("urogenital congenital anomalies"), 
              l1_all_disease_terms,
              perl = T)) |>
  select(all_disease_terms, l1_all_disease_terms) |>
  distinct() |>
  head()
                                     all_disease_terms
                                                <char>
1:                                   bladder exstrophy
2: behcets syndrome, abnormality of the genital system
3:                   abnormality of the genital system
4:      congenital anomaly of kidney and urinary tract
5:               functional abnormality of the bladder
6:    abnormal morphology of female internal genitalia
                                l1_all_disease_terms
                                              <char>
1:                   urogenital congenital anomalies
2: urogenital congenital anomalies, behcets syndrome
3:                   urogenital congenital anomalies
4:                   urogenital congenital anomalies
5:                   urogenital congenital anomalies
6:                   urogenital congenital anomalies

16.1.10 Digestive congenital anomalies

# Q38-Q45.9, Q79.0-Q79.5

digestive_icd10 <-
  c(paste0("Q", seq(38, 45.9, by = 0.1) * 10),
    paste0("Q", seq(79, 79.5, by = 0.1) * 10)
    )

digestive_terms <-
  icd10 |>
  filter(grepl(paste0(digestive_icd10, collapse = "|"), code)) |>
  pull(`short_description_(valid_icd-10_fy2025)`) |>
  unique()

digestive_terms <- tolower(digestive_terms)
digestive_terms <- stringr::str_remove_all(digestive_terms, ", unspecified$")
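
The two cleaning steps above can be illustrated on toy input. This is a base R sketch, not the code used in the analysis: `clean_icd10_description()` is a hypothetical name, and the example strings are made up in the style of ICD-10 short descriptions.

```r
# Hypothetical helper: lower-case ICD-10 descriptions and drop a trailing
# ", unspecified" so they look more like the trait terms.
clean_icd10_description <- function(x) {
  # The "$" anchor means only a suffix at the very end of the string is removed.
  sub(", unspecified$", "", tolower(x))
}

# Made-up strings, for illustration only.
cleaned <- clean_icd10_description(c("Example malformation, unspecified",
                                     "Example malformation, unspecified site"))
print(cleaned)
```

The first string becomes "example malformation"; the second keeps its ending because ", unspecified" is not at the end of the string.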

gwas_study_info = gwas_study_info |>
  mutate(l1_all_disease_terms  = 
         stringr::str_replace_all(l1_all_disease_terms,
                          vec_to_grep_pattern(tolower(digestive_terms)),
                          "digestive congenital anomalies"
         )
  )


gwas_study_info |>
 filter(grepl(vec_to_grep_pattern("digestive congenital anomalies"), 
              l1_all_disease_terms,
              perl = T)) |>
  select(all_disease_terms, l1_all_disease_terms) |>
  distinct() |>
  head()
                  all_disease_terms
                             <char>
1: alagille syndrome, liver disease
                            l1_all_disease_terms
                                          <char>
1: digestive congenital anomalies, liver disease

16.2 Urinary diseases and male infertility

16.2.1 Urinary tract infections and interstitial nephritis

# N10-N12.9, N13.6, N15, N15.1-N16.8, N30-N30.3, N30.8-N30.9, N34-N34.3, N39.0-N39.2

uti_icd10 <-
  c(paste0("N", seq(10, 12.9, by = 0.1) * 10),
    "N136",
    # ranges are given as decimals and multiplied by 10, as in the other sections
    paste0("N", seq(15.1, 16.8, by = 0.1) * 10),
    paste0("N", seq(30, 30.3, by = 0.1) * 10),
    paste0("N", seq(30.8, 30.9, by = 0.1) * 10),
    paste0("N", seq(34, 34.3, by = 0.1) * 10),
    paste0("N", seq(39.0, 39.2, by = 0.1) * 10)
    )


uti_terms <-
  icd10 |>
  filter(grepl(paste0(uti_icd10, collapse = "|"), code)) |>
  pull(`short_description_(valid_icd-10_fy2025)`) |>
  unique()
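
Because the ICD-10 ranges on this page are typed by hand, a small shape check can catch a range entered on the wrong scale (for example 308 instead of 30.8, which expands to five-character strings such as "N3080"). The sketch below is an assumption about what such a check could look like; `check_icd10_codes()` is a hypothetical helper and not part of this analysis.

```r
# Hypothetical check: every expanded code should be one capital letter
# followed by exactly three digits (undotted style, e.g. "N308").
check_icd10_codes <- function(codes) {
  bad <- codes[!grepl("^[A-Z][0-9]{3}$", codes)]
  if (length(bad) > 0) {
    stop("Malformed ICD-10 codes, e.g.: ", paste(head(bad, 3), collapse = ", "))
  }
  invisible(codes)
}

# Well-formed codes pass silently.
check_icd10_codes(c("N308", "N309"))

# Five-character strings are rejected.
rejected <- tryCatch(
  {
    check_icd10_codes(c("N3080", "N3081"))
    FALSE
  },
  error = function(e) TRUE
)
print(rejected)
```

Calling the check right after each `*_icd10` vector is built would stop the run before an empty or wrong term list is pulled from `icd10`.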

uti_terms <-
  c("acute cystitis",
    "chronic cystitis",
    "chronic interstitial cystitis",
    "urinary tract infection",
    "interstitial nephritis",
    "pyelonephritis",
    "urethritis",
    tolower(uti_terms) # lower-case the ICD-10 descriptions, as in the other sections
    )

gwas_study_info = gwas_study_info |>
  mutate(l2_all_disease_terms = 
           stringr::str_replace_all(
             l2_all_disease_terms,
             vec_to_grep_pattern(uti_terms),
             "urinary tract infections and interstitial nephritis"
           )
  )

16.2.2 Urolithiasis

# N20-N23.0

urolithiasis_icd10 <-
  paste0("N", seq(20, 23.0, by = 0.1) * 10)

urolithiasis_terms <-
  icd10 |>
  filter(grepl(paste0(urolithiasis_icd10, collapse = "|"), code)) |>
  pull(`short_description_(valid_icd-10_fy2025)`) |>
  unique()

urolithiasis_terms <-
  c("lower urinary tract calculus",
    "bladder calculus",
    "calcium oxalate nephrolithiasis",
    "nephrolithiasis",
    "renal colic", #icd-9 788.0
    urolithiasis_terms
    )
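
This section builds `urolithiasis_terms`, but no replacement step is shown for it. A minimal sketch of how such terms could be applied, assuming the group label is "urolithiasis" (taken from the heading). It swaps the regex replacement used elsewhere on this page for exact matching on split terms, so it is not the author's method. `recode_terms()` and the toy input are hypothetical.

```r
# Hypothetical helper: replace any listed term inside a comma-separated
# string with a single group label, matching whole terms only.
recode_terms <- function(x, terms, label, sep = ", ") {
  vapply(strsplit(x, sep, fixed = TRUE), function(parts) {
    parts[parts %in% terms] <- label
    # unique() drops the repeat when two terms map to the same label.
    paste(unique(parts), collapse = sep)
  }, character(1))
}

# Toy input for illustration; not rows from gwas_study_info.
toy <- c("nephrolithiasis",
         "bladder calculus, gout",
         "renal colic, nephrolithiasis")

recoded <- recode_terms(
  toy,
  terms = c("nephrolithiasis", "bladder calculus", "renal colic"),
  label = "urolithiasis"
)
print(recoded)
```

The result is "urolithiasis", "urolithiasis, gout" and "urolithiasis". Exact matching avoids partial hits inside longer terms, at the cost of not matching variants that a regex pattern might catch.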

16.2.3 Benign prostatic hyperplasia

16.2.4 Male infertility

gwas_study_info |>
  filter(grepl(vec_to_grep_pattern("male infertility"), 
              l2_all_disease_terms,
              perl = T)) |>
  select(all_disease_terms, l2_all_disease_terms) |>
  distinct()
               all_disease_terms l2_all_disease_terms
                          <char>               <char>
1:              male infertility     male infertility
2:                   azoospermia     male infertility
3: male infertility, azoospermia     male infertility

16.2.5 Other urinary diseases

# N25-N28.1, N29-N29.8, N31-N32.0, N32.3-N32.4, N36-N36.9, N39, N41-N41.9, N44-N44.0, N45-N45.9, N49-N49.9

urine_icd10 <- 
  c(paste0("N", seq(25, 28.1, by = 0.1) * 10),
    paste0("N", seq(29, 29.8, by = 0.1) * 10),
    paste0("N", seq(31, 32.0, by = 0.1) * 10),
    paste0("N", seq(32.3, 32.4, by = 0.1) * 10),
    paste0("N", seq(36, 36.9, by = 0.1) * 10),
    "N39",
    paste0("N", seq(41, 41.9, by = 0.1) * 10),
    paste0("N", seq(44, 44.0, by = 0.1) * 10),
    paste0("N", seq(45, 45.9, by = 0.1) * 10),
    paste0("N", seq(49, 49.9, by = 0.1) * 10)
    )

urine_terms <-
  icd10 |>
  filter(grepl(paste0(urine_icd10, collapse = "|"), code)) |>
  pull(`short_description_(valid_icd-10_fy2025)`) |>
  unique() |>
  tolower()

urine_terms <- stringr::str_remove_all(urine_terms, ", unspecified$")
urine_terms <- stringr::str_remove_all(urine_terms, ", site not specified$")


urinary_diseases_terms <- c("urinary incontinence",
                            "stress urinary incontinence",
                            "urgency urinary incontinence",
                            "urinary system disease",
                            "bladder neck obstruction",
                            "urethral disease",
                            "urethral syndrome",
                            "uterine inflammatory disease",
                            "enuresis",
                            "neurogenic bladder", # icd9 596.54
                            "bladder diverticulum", # icd9 596.3
                            urine_terms
                            )

urinary_diseases_terms <- str_length_sort(urinary_diseases_terms)
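
`str_length_sort()` is defined elsewhere in this analysis. Presumably it orders terms by length so that longer terms are tried before shorter ones; that reading is an assumption. The sketch below shows why the order can matter when the terms are joined into one alternation: with Perl-style matching, the first alternative that matches at a position wins. `sort_longest_first()` is a hypothetical stand-in, and the two terms are borrowed from the endometriosis section purely as an example.

```r
# Hypothetical stand-in for str_length_sort(); assumes longest-first order.
sort_longest_first <- function(x) x[order(nchar(x), decreasing = TRUE)]

terms <- c("endometriosis", "endometriosis of pelvic peritoneum")
x <- "endometriosis of pelvic peritoneum"

# Unsorted: the shorter alternative matches first and leaves a remainder.
unsorted <- gsub(paste(terms, collapse = "|"), "GROUP", x, perl = TRUE)

# Sorted longest first: the full term is consumed.
sorted <- gsub(paste(sort_longest_first(terms), collapse = "|"), "GROUP", x,
               perl = TRUE)

print(c(unsorted = unsorted, sorted = sorted))
```

The unsorted pattern gives "GROUP of pelvic peritoneum", while the sorted pattern gives "GROUP".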

gwas_study_info = gwas_study_info |>
  mutate(l2_all_disease_terms = 
           stringr::str_replace_all(
             l2_all_disease_terms,
             vec_to_grep_pattern(urinary_diseases_terms),
             "other urinary diseases"
           )
  )

16.3 Gynecological diseases

16.3.1 Uterine fibroids

# D25-D26, D28.2
icd10_uterine_fibroids <-
  c(paste0("D", seq(25, 26, by = 0.1) * 10),
    "D282"
    )

icd10_uterine_fibroids_terms <-
  icd10 |>
  filter(grepl(paste0(icd10_uterine_fibroids, collapse = "|"), code)) |>
  pull(`short_description_(valid_icd-10_fy2025)`) |>
  unique()

uterine_fibroid_terms <- 
  c("uterine leiomyoma",
    "uterine fibroid")



gwas_study_info = gwas_study_info |> 
 mutate(l2_all_disease_terms  = 
          stringr::str_replace_all(l2_all_disease_terms,
                                  pattern = vec_to_grep_pattern(uterine_fibroid_terms),
                                   "uterine fibroids"
                          )  
        )

gwas_study_info |>
 filter(grepl(vec_to_grep_pattern("uterine fibroids"), 
              l2_all_disease_terms,
              perl = T)) |>
  select(all_disease_terms, l2_all_disease_terms) |>
  distinct() |>
  head()
              all_disease_terms          l2_all_disease_terms
                         <char>                        <char>
1:              uterine fibroid              uterine fibroids
2: uterine fibroid, menorrhagia menorrhagia, uterine fibroids

16.3.2 Polycystic ovary syndrome

gwas_study_info |>
 filter(grepl(vec_to_grep_pattern("polycystic ovary syndrome"), 
              l2_all_disease_terms,
              perl = T)) |>
  select(all_disease_terms, l2_all_disease_terms) |>
  distinct() |>
  head()
                                     all_disease_terms
                                                <char>
1:                           polycystic ovary syndrome
2: polycystic ovary syndrome, type 2 diabetes mellitus
                                  l2_all_disease_terms
                                                <char>
1:                           polycystic ovary syndrome
2: polycystic ovary syndrome, diabetes mellitus type 2

16.3.3 Female infertility

gwas_study_info |>
  filter(grepl(
         vec_to_grep_pattern("female infertility"),
         l2_all_disease_terms,
         perl = T)) |>
  select(all_disease_terms, l2_all_disease_terms) |>
  distinct() 
                                                                all_disease_terms
                                                                           <char>
1:                                                             female infertility
2:                                              female infertility, endometriosis
3: female infertility, tubal factor infertility, chlamydophila infectious disease
4:                                                       tubal factor infertility
                l2_all_disease_terms
                              <char>
1:                female infertility
2: endometriosis, female infertility
3:                female infertility
4:                female infertility

16.3.4 Endometriosis

endo_terms <- c("ovarian endometriosis",
                "endometriosis of pelvic peritoneum"
                )

gwas_study_info = gwas_study_info |> 
 mutate(l2_all_disease_terms  = 
          stringr::str_replace_all(l2_all_disease_terms,
                                  pattern = vec_to_grep_pattern(endo_terms),
                                   "endometriosis"
                          )  
        )

gwas_study_info |>
 filter(grepl(vec_to_grep_pattern("endometriosis"), 
              l2_all_disease_terms,
              perl = T)) |>
  select(all_disease_terms, l2_all_disease_terms) |>
  distinct() |>
  head()
                          all_disease_terms                l2_all_disease_terms
                                     <char>                              <char>
1:                            endometriosis                       endometriosis
2:        female infertility, endometriosis   endometriosis, female infertility
3:         migraine disorder, endometriosis   endometriosis, headache disorders
4: endometriosis, major depressive disorder endometriosis, depressive disorders
5:                    ovarian endometriosis                       endometriosis
6:       endometriosis of pelvic peritoneum                       endometriosis

16.3.5 Genital prolapse

fem_genital_prolapse_terms <- c("uterine prolapse",
                                "cystocele",
                                "rectocele")

gwas_study_info = gwas_study_info |> 
 mutate(l2_all_disease_terms  = 
          stringr::str_replace_all(l2_all_disease_terms,
                                  pattern = vec_to_grep_pattern(fem_genital_prolapse_terms),
                                   "genital prolapse"
                          )
 )
        
        
gwas_study_info |>
 filter(grepl(vec_to_grep_pattern("genital prolapse"), 
              l2_all_disease_terms,
              perl = T)) |>
  select(all_disease_terms, l2_all_disease_terms) |>
  distinct() |>
  head()
   all_disease_terms l2_all_disease_terms
              <char>               <char>
1:  uterine prolapse     genital prolapse
2:         rectocele     genital prolapse

16.3.6 Premenstrual syndrome

16.3.7 Other gynecological diseases

diseases <- stringr::str_split(pattern = ", ", 
                               gwas_study_info$l2_all_disease_terms)  |> 
            unlist() |>
            stringr::str_trim()

# N72-N72.0, N75-N77.8, N83-N83.9

other_gyno_terms <- c("cervicitis", #N72
                      "bartholin gland disease", # N75
                      "bartholin duct cyst",
                      "vaginitis", # N76
                      "vaginal inflammation",
                      "postmenopausal atrophic vaginitis",                      
                      "atrophic vaginitis",
                      "vulvovaginitis",
                      "ulceration of vulva", # N77
                      "ovarian cyst", # N83
                      "follicular cyst"
                      )

# pregnancy_terms <- grep("pregnancy", diseases, value = T)

# gyno_terms <- c(
#                 "female reproductive system disease",
#                 "female genital tract fistula",
#                 "placenta disease", 
#                 "ovarian gynecological diseases",
#                 "vaginal disorder",
# 
# 
#                 "abnormal delivery",
#                 pregnancy_terms)


gwas_study_info = gwas_study_info |> 
 mutate(l2_all_disease_terms  = 
          stringr::str_replace_all(l2_all_disease_terms,
                                  pattern = vec_to_grep_pattern(other_gyno_terms),
                                   "other gynecological diseases"
                          )  
        )

16.4 Hemoglobinopathies and hemolytic anemias

hemoglobinopathies_terms <- c("sickle cell disease and related diseases",
                              "thalassemia",
                              "inherited hemoglobinopathy",
                              "hemoglobin e disease"
)

hemopath_hemo_anemias <- c(hemoglobinopathies_terms, "hemolytic anemia")

gwas_study_info = gwas_study_info |>
  mutate(l2_all_disease_terms = 
           stringr::str_replace_all(
             l2_all_disease_terms,
             vec_to_grep_pattern(hemopath_hemo_anemias),
             "hemoglobinopathies and hemolytic anemias"
           )
  )

gwas_study_info |>
 filter(grepl(vec_to_grep_pattern("hemoglobinopathies and hemolytic anemias"), 
              l2_all_disease_terms,
              perl = TRUE)) |>
  select(all_disease_terms, l2_all_disease_terms) |>
  distinct() |>
  head()
                          all_disease_terms
                                     <char>
1: sickle cell disease and related diseases
2:                       sickle cell anemia
3:       sickle cell anemia, cholelithiasis
4:                     hemoglobin e disease
5:      sickle cell anemia, thromboembolism
6:                familial hemolytic anemia
                                                                      l2_all_disease_terms
                                                                                    <char>
1:                                                hemoglobinopathies and hemolytic anemias
2:                                                hemoglobinopathies and hemolytic anemias
3:              gallbladder and biliary diseases, hemoglobinopathies and hemolytic anemias
4:                                                hemoglobinopathies and hemolytic anemias
5: hemoglobinopathies and hemolytic anemias, other cardiovascular and circulatory diseases
6:                                                hemoglobinopathies and hemolytic anemias

16.4.1 Endocrine, metabolic, blood, and immune disorders

endo_terms <- c("ovarian dysfunction",
                "ovarian disease",
                "menstrual disorder",
            #    "premenstrual tension" #icd9 625.4, icd10 N94.3,
                "adrenocortical insufficiency", # ICD9:255.4, ICD10:E27.1 / ICD10:E27.4
                "cushing syndrome", # ICD9:255.0, ICD10:E24
                "hyperaldosteronism", # ICD10CM:E26, ICD9:255.1
                "delayed puberty", # ICD10:E30.0, ICD9:259.0
                "central precocious puberty",  #?ICD10: E30.1
            "endocrine system disease" # ????
                )

gwas_study_info = gwas_study_info |>
 mutate(l2_all_disease_terms  = 
          stringr::str_replace_all(l2_all_disease_terms,
                                  pattern = vec_to_grep_pattern(endo_terms),
                                   "other endocrine disorders"
                          )  
        )

thyroid_terms <- c("congenital hypothyroidism due to developmental anomaly", #ICD-10:E03.1
                   "hypothyroidism",
                   "nontoxic goiter",#ICD-10: E04
                   "thyrotoxicosis", #ICD-10: E05
                   "toxic nodular goiter", # ICD9:242.3
                   "thyroiditis", #ICD-10: E06, ICD9:245
                   "hashimotos thyroiditis", #ICD9:245.2
                   "graves disease", #ICD9:242.0,
                   "autoimmune thyroid disease" #ICD10:E06.3,
                   )

gwas_study_info = gwas_study_info |>
 mutate(l2_all_disease_terms  = 
          stringr::str_replace_all(l2_all_disease_terms,
                                  pattern = vec_to_grep_pattern(thyroid_terms),
                                   "thyroid disorders"
                          )  
        )


other_blood_disorders <- c("neutropenia",
                           "cryoglobulinemia")

gwas_study_info = gwas_study_info |>
 mutate(l2_all_disease_terms  = 
          stringr::str_replace_all(l2_all_disease_terms,
                                  pattern = vec_to_grep_pattern(other_blood_disorders),
                                   "other blood disorders"
                          )  
        )

metabolism_disorders <- c("hypoglycemia",
                          "hypoparathyroidism",
                          "parathyroid disease", # ICD9: 252
                          "primary hyperparathyroidism", # ICD10 E21.0
                          "hyperparathyroidism", # ICD10 E21
                          # "secondary hyperparathyroidism of renal origin" not incl, as otherwise specified in ICD10CM:N25.81
                          "obesity",
                          "metabolic syndrome",
                          "metabolic syndrome x",
                          "inborn errors of metabolism",
                          "metabolic disease",
                          "mineral metabolism disease", # ICD9:275.8, ICD9:275.9, ICD10:E83
                          "acidosis", # ICD9:276.2
                          "disorder of acid-base balance", # ? ICD10-cm E87.8
                          "bilirubin metabolism disease"
                          )

gwas_study_info = gwas_study_info |>
 mutate(l2_all_disease_terms  = 
          stringr::str_replace_all(l2_all_disease_terms,
                                  pattern = vec_to_grep_pattern(metabolism_disorders),
                                   "other metabolic disorders"
                          )  
        )

16.5 Oral disorders

oral_disorders_terms <- c("dental caries",
                           "tooth disease",
                          "toothache",
                           "periodontal disease"
)


gwas_study_info = gwas_study_info |>
  mutate(l2_all_disease_terms = 
           stringr::str_replace_all(
             l2_all_disease_terms,
             vec_to_grep_pattern(oral_disorders_terms),
             "oral disorders"
           )
  )


gwas_study_info |> 
 filter(grepl(vec_to_grep_pattern("oral disorders"), 
              l2_all_disease_terms,
              perl = TRUE)) |>
  select(all_disease_terms, l2_all_disease_terms) |>
  distinct() |>
  head()
                       all_disease_terms l2_all_disease_terms
                                  <char>               <char>
1:                         dental caries       oral disorders
2:              aggressive periodontitis       oral disorders
3: pit and fissure surface dental caries       oral disorders
4:          smooth surface dental caries       oral disorders
5:                         periodontitis       oral disorders
6:                             toothache       oral disorders

17 Check compatibility with gbd data

17.1 Fix doubled "disorders" suffixes

# Fix the specific duplicated phrase first: once the general "disorderss"
# fix has run, this longer pattern can no longer match.
gwas_study_info = 
gwas_study_info |> 
 mutate(l2_all_disease_terms  = 
          stringr::str_replace_all(l2_all_disease_terms ,
                                  pattern = "anxiety disorders disorderss",
                                   "anxiety disorders"
                          )  
        )


gwas_study_info = 
gwas_study_info |> 
 mutate(l2_all_disease_terms  = 
          stringr::str_replace_all(l2_all_disease_terms ,
                                  pattern = "disorderss",
                                   "disorders"
                          )  
        )
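
The two clean-up replacements interact, so their order matters. The sketch below applies them in both orders to a made-up string (not taken from the data) to show the difference.

```r
library(stringr)

# Made-up input containing both problems.
toy_l2 <- "anxiety disorders disorderss, oral disorders"

# General fix first: the specific pattern then has nothing left to match.
general_first <- toy_l2 |>
  str_replace_all("disorderss", "disorders") |>
  str_replace_all("anxiety disorders disorderss", "anxiety disorders")

# Specific fix first: both replacements take effect.
specific_first <- toy_l2 |>
  str_replace_all("anxiety disorders disorderss", "anxiety disorders") |>
  str_replace_all("disorderss", "disorders")

print(general_first)
print(specific_first)
```

With the general fix first, the duplicated "disorders" survives in the result; with the specific fix first, it is removed.
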
gbd_data <- data.table::fread(here::here("data/gbd/ihme_gbd_2019_global_disease_burden_rate_all_ages.csv"))

gbd_data$cause <- stringr::str_remove_all(gbd_data$cause, ",")

diseases <- stringr::str_split(pattern = ", ", 
                               gwas_study_info$l2_all_disease_terms[gwas_study_info$l2_all_disease_terms != ""])  |> 
            unlist() |>
            stringr::str_trim()

gbd_data$cause[!tolower(gbd_data$cause) %in% unique(diseases)] |> sort() |> unique()
 [1] "Acute glomerulonephritis"                                         
 [2] "Aortic aneurysm"                                                  
 [3] "Bipolar disorder"                                                 
 [4] "Congenital birth defects"                                         
 [5] "Drug use disorders"                                               
 [6] "Endocrine metabolic blood and immune disorders"                   
 [7] "Gynecological diseases"                                           
 [8] "Other chronic respiratory diseases"                               
 [9] "Other musculoskeletal disorders"                                  
[10] "Other sense organ diseases"                                       
[11] "Scabies"                                                          
[12] "Sudden infant death syndrome"                                     
[13] "Total burden related to Non-alcoholic fatty liver disease (NAFLD)"
[14] "Urinary diseases and male infertility"                            
[15] "Vascular intestinal disorders"                                    
[16] "Viral skin diseases"                                              
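
The check above lists GBD causes with no matching GWAS term. A minimal sketch of the same idea on made-up values is given below; it also runs the comparison in the other direction, which can help spot near-misses such as singular versus plural forms. The vectors `toy_gbd_causes` and `toy_gwas_terms` are illustrative and not drawn from the data.

```r
# Made-up values for illustration only.
toy_gbd_causes <- c("Oral disorders", "Scabies", "Bipolar disorder")
toy_gwas_terms <- c("oral disorders", "bipolar disorders", "thyroid disorders")

# GBD causes that no GWAS term maps to.
in_gbd_only <- setdiff(tolower(toy_gbd_causes), toy_gwas_terms)

# GWAS terms that do not correspond to any GBD cause.
in_gwas_only <- setdiff(toy_gwas_terms, tolower(toy_gbd_causes))

print(in_gbd_only)
print(in_gwas_only)
```
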
gbd_data =
  gbd_data |>
  mutate(cause = tolower(cause))

gwas_disease_traits = data.frame(cause = diseases)
  # gwas_study_info |>
  # filter(DISEASE_STUDY == T) |>
  # select(all_disease_terms, l1_all_disease_terms, cause = l2_all_disease_terms) |>
  # distinct()

left_join(gwas_disease_traits, 
          gbd_data) |>
  head()
Joining with `by = join_by(cause)`
Warning in left_join(gwas_disease_traits, gbd_data): Detected an unexpected many-to-many relationship between `x` and `y`.
ℹ Row 3 of `x` matches multiple rows in `y`.
ℹ Row 19 of `y` matches multiple rows in `x`.
ℹ If a many-to-many relationship is expected, set `relationship =
  "many-to-many"` to silence this warning.
                              cause                                measure
1 idiopathic dilated cardiomyopathy                                   <NA>
2                  orofacial clefts                                   <NA>
3 tracheal bronchus and lung cancer DALYs (Disability-Adjusted Life Years)
4 tracheal bronchus and lung cancer                             Prevalence
5 tracheal bronchus and lung cancer                              Incidence
6 tracheal bronchus and lung cancer DALYs (Disability-Adjusted Life Years)
  location  sex      age metric year       val     upper     lower
1     <NA> <NA>     <NA>   <NA>   NA        NA        NA        NA
2     <NA> <NA>     <NA>   <NA>   NA        NA        NA        NA
3   Global Both All ages   Rate 2019 580.36100 627.79984 532.74652
4   Global Both All ages   Rate 2019  40.27440  43.51721  37.12978
5   Global Both All ages   Rate 2019  28.16826  30.49575  25.77712
6   Global Both All ages   Rate 2019 580.36100 627.79984 532.74652
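
The many-to-many warning above arises because `diseases` contains each term once per occurrence, while `gbd_data` holds several rows (one per measure) for each cause. The sketch below, on made-up toy tables, shows one way to make the join one-to-many: deduplicate the left-hand side first and state the expected relationship explicitly. This is an illustration under those assumptions, not necessarily how the analysis should be structured.

```r
library(dplyr)

# Made-up tables: repeated causes on the left, several measures on the right.
toy_gwas <- data.frame(cause = c("oral disorders", "oral disorders", "scabies"))
toy_gbd <- data.frame(
  cause   = c("oral disorders", "oral disorders"),
  measure = c("Prevalence", "Incidence")
)

# One row per cause on the left makes the join one-to-many, and
# `relationship` asks dplyr to error if that expectation is violated.
toy_joined <- toy_gwas |>
  distinct(cause) |>
  left_join(toy_gbd, by = "cause", relationship = "one-to-many")

print(toy_joined)
```

Causes without a GBD match, such as "scabies" in the toy data, are kept with `NA` in the GBD columns.
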
gwas_study_info |> select(cause = l2_all_disease_terms) |>
  distinct() |>
  left_join(gbd_data) |>
  head()
Joining with `by = join_by(cause)`
                               cause                                measure
                              <char>                                 <char>
1:                                                                     <NA>
2: idiopathic dilated cardiomyopathy                                   <NA>
3:                  orofacial clefts                                   <NA>
4: tracheal bronchus and lung cancer DALYs (Disability-Adjusted Life Years)
5: tracheal bronchus and lung cancer                             Prevalence
6: tracheal bronchus and lung cancer                              Incidence
   location    sex      age metric  year       val     upper     lower
     <char> <char>   <char> <char> <int>     <num>     <num>     <num>
1:     <NA>   <NA>     <NA>   <NA>    NA        NA        NA        NA
2:     <NA>   <NA>     <NA>   <NA>    NA        NA        NA        NA
3:     <NA>   <NA>     <NA>   <NA>    NA        NA        NA        NA
4:   Global   Both All ages   Rate  2019 580.36100 627.79984 532.74652
5:   Global   Both All ages   Rate  2019  40.27440  43.51721  37.12978
6:   Global   Both All ages   Rate  2019  28.16826  30.49575  25.77712
diseases <- stringr::str_split(pattern = ", ", 
                               gwas_study_info$l2_all_disease_terms[gwas_study_info$l2_all_disease_terms != ""])  |> 
            unlist() |>
            stringr::str_trim()

length(unique(diseases))
[1] 1338
# make frequency table
freq <- table(as.factor(diseases))

# sort in decreasing order
freq_sorted <- sort(freq, decreasing = TRUE)

# show top N, e.g. top 10
head(freq_sorted, 10)

                       chronic kidney disease 
                                        10889 
other cardiovascular and circulatory diseases 
                                         7585 
                     diabetes mellitus type 2 
                                          922 
                       other mental disorders 
                                          828 
                         depressive disorders 
                                          726 
                       ischemic heart disease 
                                          579 
      alzheimer's disease and other dementias 
                                          514 
                              other neoplasms 
                                          509 
                    blindness and vision loss 
                                          415 
                    other metabolic disorders 
                                          381 
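
The counting steps above can be sketched on a small made-up vector: drop empty entries, split each comma-separated string into terms, trim whitespace, and tabulate. The values in `toy_l2` are illustrative only.

```r
library(stringr)

# Made-up comma-separated term strings, including one empty entry.
toy_l2 <- c("oral disorders, thyroid disorders", "oral disorders", "")

toy_terms <- toy_l2[toy_l2 != ""] |>
  str_split(", ") |>
  unlist() |>
  str_trim()

# Each term is counted once per row in which it appears.
toy_freq <- sort(table(toy_terms), decreasing = TRUE)
print(toy_freq)
```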

17.1.1 Save the updated gwas_study_info with harmonized disease terms

# fwrite() is called for its side effect; assigning its result back
# would overwrite gwas_study_info.
data.table::fwrite(gwas_study_info,
                   here::here("output/gwas_cat/gwas_study_info_trait_group_l2.csv"))
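
A quick way to confirm that a saved table survives the round trip is to read it back and compare. The sketch below does this on a made-up table written to a temporary file, so it does not touch the project's output directory; `toy_info` and `toy_path` are illustrative names.

```r
library(data.table)

# Made-up table; one term string contains a comma, which fwrite() quotes.
toy_info <- data.table(
  study = c("s1", "s2"),
  l2_all_disease_terms = c("oral disorders", "oral disorders, thyroid disorders")
)

toy_path <- tempfile(fileext = ".csv")
fwrite(toy_info, toy_path)

# Read the file back and check that the term column is unchanged.
toy_back <- fread(toy_path)
round_trip_ok <- identical(toy_info$l2_all_disease_terms,
                           toy_back$l2_all_disease_terms)
print(round_trip_ok)
```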

sessionInfo()
R version 4.3.1 (2023-06-16)
Platform: aarch64-apple-darwin20 (64-bit)
Running under: macOS 15.6.1

Matrix products: default
BLAS:   /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib/libRblas.0.dylib 
LAPACK: /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib/libRlapack.dylib;  LAPACK version 3.11.0

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

time zone: America/Los_Angeles
tzcode source: internal

attached base packages:
[1] stats     graphics  grDevices datasets  utils     methods   base     

other attached packages:
[1] jsonlite_2.0.0    httr_1.4.7        stringr_1.5.1     ggplot2_3.5.2    
[5] data.table_1.17.8 dplyr_1.1.4       workflowr_1.7.1  

loaded via a namespace (and not attached):
 [1] gtable_0.3.6       compiler_4.3.1     renv_1.0.3         promises_1.3.3    
 [5] tidyselect_1.2.1   Rcpp_1.1.0         git2r_0.36.2       callr_3.7.6       
 [9] later_1.4.2        jquerylib_0.1.4    scales_1.4.0       readxl_1.4.5      
[13] yaml_2.3.10        fastmap_1.2.0      here_1.0.1         R6_2.6.1          
[17] generics_0.1.4     curl_6.4.0         knitr_1.50         tibble_3.3.0      
[21] rprojroot_2.1.0    RColorBrewer_1.1-3 bslib_0.9.0        pillar_1.11.0     
[25] rlang_1.1.6        cachem_1.1.0       stringi_1.8.7      httpuv_1.6.16     
[29] xfun_0.52          getPass_0.2-4      fs_1.6.6           sass_0.4.10       
[33] cli_3.6.5          withr_3.0.2        magrittr_2.0.3     ps_1.9.1          
[37] grid_4.3.1         digest_0.6.37      processx_3.8.6     rstudioapi_0.17.1 
[41] lifecycle_1.0.4    vctrs_0.6.5        evaluate_1.0.4     glue_1.8.0        
[45] cellranger_1.1.0   farver_2.1.2       whisker_0.4.1      rmarkdown_2.29    
[49] tools_4.3.1        pkgconfig_2.0.3    htmltools_0.5.8.1