Last updated: 2025-12-29

Checks: 7 passed, 0 failed

Knit directory: genomics_ancest_disease_dispar/

This reproducible R Markdown analysis was created with workflowr (version 1.7.1). The Checks tab describes the reproducibility checks that were applied when the results were created. The Past versions tab lists the development history.


Great! Since the R Markdown file has been committed to the Git repository, you know the exact version of the code that produced these results.

Great job! The global environment was empty. Objects defined in the global environment can affect the analysis in your R Markdown file in unknown ways. For reproducibility it’s best to always run the code in an empty environment.

The command set.seed(20220216) was run prior to running the code in the R Markdown file. Setting a seed ensures that any results that rely on randomness, e.g. subsampling or permutations, are reproducible.

Great job! Recording the operating system, R version, and package versions is critical for reproducibility.

Nice! There were no cached chunks for this analysis, so you can be confident that you successfully produced the results during this run.

Great job! Using relative paths to the files within your workflowr project makes it easier to run your code on other machines.

Great! You are using Git for version control. Tracking code development and connecting the code version to the results is critical for reproducibility.

The results in this page were generated with repository version caf759a. See the Past versions tab to see a history of the changes made to the R Markdown and HTML files.

Note that you need to be careful to ensure that all relevant files for the analysis have been committed to Git prior to generating the results (you can use wflow_publish or wflow_git_commit). workflowr only checks the R Markdown file, but you know if there are other scripts or data files that it depends on. Below is the status of the Git repository when the results were generated:


Ignored files:
    Ignored:    .DS_Store
    Ignored:    .Rproj.user/
    Ignored:    .venv/
    Ignored:    analysis/.DS_Store
    Ignored:    ancestry_dispar_env/
    Ignored:    data/.DS_Store
    Ignored:    data/cdc/
    Ignored:    data/cohort/
    Ignored:    data/gbd/.DS_Store
    Ignored:    data/gbd/IHME-GBD_2021_DATA-d8cf695e-1.csv
    Ignored:    data/gbd/IHME-GBD_2023_DATA-73cc01fd-1.csv
    Ignored:    data/gbd/ihme_gbd_2019_global_disease_burden_rate_all_ages.csv
    Ignored:    data/gbd/ihme_gbd_2019_global_paf_rate_percent_all_ages.csv
    Ignored:    data/gbd/ihme_gbd_2021_global_disease_burden_rate_all_ages.csv
    Ignored:    data/gbd/ihme_gbd_2021_global_paf_rate_percent_all_ages.csv
    Ignored:    data/gwas_catalog/
    Ignored:    data/icd/.DS_Store
    Ignored:    data/icd/2025AA/
    Ignored:    data/icd/IHME_GBD_2019_COD_CAUSE_ICD_CODE_MAP_Y2020M10D15.XLSX
    Ignored:    data/icd/IHME_GBD_2019_NONFATAL_CAUSE_ICD_CODE_MAP_Y2020M10D15.XLSX
    Ignored:    data/icd/IHME_GBD_2021_COD_CAUSE_ICD_CODE_MAP_Y2024M05D16.XLSX
    Ignored:    data/icd/IHME_GBD_2021_NONFATAL_CAUSE_ICD_CODE_MAP_Y2024M05D16.XLSX
    Ignored:    data/icd/UK_Biobank_master_file.tsv
    Ignored:    data/icd/cdc_valid_icd10_Sep_23_2025.xlsx
    Ignored:    data/icd/cdc_valid_icd9_Sep_23_2025.xlsx
    Ignored:    data/icd/hp_umls_mapping.csv
    Ignored:    data/icd/lancet_conditions_icd10.xlsx
    Ignored:    data/icd/manual_disease_icd10_mappings.xlsx
    Ignored:    data/icd/mondo_umls_mapping.csv
    Ignored:    data/icd/phecode_international_version_unrolled.csv
    Ignored:    data/icd/phecode_to_icd10_manual_mapping.xlsx
    Ignored:    data/icd/semiautomatic_ICD-pheno.txt
    Ignored:    data/icd/semiautomatic_ICD-pheno_UKB_subset.txt
    Ignored:    data/icd/umls-2025AA-mrconso.zip
    Ignored:    figures/
    Ignored:    human_dictionary/
    Ignored:    igsr_populations.tsv
    Ignored:    output/.DS_Store
    Ignored:    output/abstracts/
    Ignored:    output/doccano/
    Ignored:    output/fulltexts/
    Ignored:    output/gwas_cat/
    Ignored:    output/gwas_cohorts/
    Ignored:    output/icd_map/
    Ignored:    output/trait_ontology/
    Ignored:    pubmedbert-cohort-ner-model/
    Ignored:    pubmedbert-cohort-ner/
    Ignored:    r-spacyr/
    Ignored:    renv/
    Ignored:    venv/
    Ignored:    visualization.Rdata

Unstaged changes:
    Modified:   .gitignore
    Modified:   analysis/disease_inves_by_ancest.Rmd
    Modified:   analysis/get_full_text.Rmd
    Modified:   analysis/gwas_to_gbd.Rmd
    Modified:   analysis/index.Rmd
    Modified:   analysis/missing_cohort_info.Rmd
    Modified:   analysis/replication_ancestry_bias.Rmd
    Modified:   analysis/text_for_cohort_labels.Rmd

Note that any generated files, e.g. HTML, png, CSS, etc., are not included in this status report because it is ok for generated content to have uncommitted changes.


These are the previous versions of the repository in which changes were made to the R Markdown (analysis/level_2_disease_group.Rmd) and HTML (docs/level_2_disease_group.html) files. If you’ve configured a remote Git repository (see ?wflow_git_remote), click on the hyperlinks in the table below to view the files as they were in that past version.

File Version Author Date Message
Rmd caf759a IJbeasley 2025-12-29 Archiving old GWAS trait conversion
html c5ee886 IJbeasley 2025-09-24 Build site.
Rmd fde6f07 IJbeasley 2025-09-24 Even more grouping
html fd33f6d IJbeasley 2025-09-24 Build site.
Rmd fee6e4e IJbeasley 2025-09-24 Yay better way to group with icd codes
html 9ff91c0 IJbeasley 2025-09-24 Build site.
Rmd a61120f IJbeasley 2025-09-24 More fixing diseases
html 8f83a4a IJbeasley 2025-09-23 Build site.
Rmd 54d06f8 IJbeasley 2025-09-23 More icd codes
html bed5e82 IJbeasley 2025-09-23 Build site.
Rmd fbee909 IJbeasley 2025-09-23 Using icd codes to help grouping
html 9046c0d IJbeasley 2025-09-23 Build site.
Rmd dd87c61 IJbeasley 2025-09-23 Using icd codes to help grouping
html 904bb1d IJbeasley 2025-09-22 Build site.
Rmd bbcc167 IJbeasley 2025-09-22 Even more typo etc.
html 75debe4 IJbeasley 2025-09-22 Build site.
Rmd b3e3287 IJbeasley 2025-09-22 …maybe fixing typos
html b8ee7f0 IJbeasley 2025-09-22 Build site.
Rmd 3305f6a IJbeasley 2025-09-22 …maybe fixing typos
html 200442f IJbeasley 2025-09-17 Build site.
Rmd 614204e IJbeasley 2025-09-17 More fixing up of disease grouping
html 7b87d93 IJbeasley 2025-09-17 Build site.
Rmd da5c4b4 IJbeasley 2025-09-17 More correction to cardiovascular disease terms
html 08b0db3 IJbeasley 2025-09-17 Build site.
Rmd bb8ae95 IJbeasley 2025-09-17 Better grouping of cardiovascular disease
html 7cf2803 IJbeasley 2025-09-17 Build site.
Rmd 39262a4 IJbeasley 2025-09-17 More typo fixing
html c0cf9bd IJbeasley 2025-09-16 Build site.
Rmd 3519a0b IJbeasley 2025-09-16 Collapsing traits to gbd
html f1b18b0 IJbeasley 2025-09-16 Build site.
Rmd afe44b4 IJbeasley 2025-09-16 Collapsing traits to gbd
html c204ac4 IJbeasley 2025-09-16 Build site.
Rmd 7fa03f5 IJbeasley 2025-09-16 More cancer typos
html 8f1639b IJbeasley 2025-09-16 Build site.
Rmd 345ad9b IJbeasley 2025-09-16 More cancer typos
html a15dd40 IJbeasley 2025-09-16 Build site.
Rmd 16ead66 IJbeasley 2025-09-16 Correcting some cancer grouping
html 6018e42 IJbeasley 2025-09-16 Build site.
Rmd 02a0b9d IJbeasley 2025-09-16 Improving cancer grouping
html 6f66696 IJbeasley 2025-09-16 Build site.
Rmd 66cff1c IJbeasley 2025-09-16 Even more disease term grouping
html 21b6c02 IJbeasley 2025-09-15 Build site.
html 5ec3111 IJbeasley 2025-09-15 Build site.
html 30d773e IJbeasley 2025-09-15 Build site.
html 8d64a38 IJbeasley 2025-09-15 Build site.
Rmd b3088d8 IJbeasley 2025-09-15 workflowr::wflow_publish("analysis/level_2_disease_group.Rmd")
html b89d661 IJbeasley 2025-09-10 Build site.
Rmd c0fcab7 IJbeasley 2025-09-10 workflowr::wflow_publish("analysis/level_2_disease_group.Rmd")
html ead4d8e IJbeasley 2025-09-10 Build site.
Rmd 3964f77 IJbeasley 2025-09-10 workflowr::wflow_publish("analysis/level_2_disease_group.Rmd")
html 8fb639d IJbeasley 2025-09-10 Build site.
Rmd edeb6f5 IJbeasley 2025-09-10 workflowr::wflow_publish("analysis/level_2_disease_group.Rmd")
html fe91704 IJbeasley 2025-09-09 Build site.
Rmd 9c64867 IJbeasley 2025-09-09 Minor fixing of disease trait categorisation
html fa509c0 IJbeasley 2025-09-08 Build site.
Rmd c9602c7 IJbeasley 2025-09-08 More grouping to match GBD

1 Set up

library(dplyr)
library(data.table)
library(ggplot2)
library(stringr)

1.1 Ontology help - for getting disease subtypes

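# helper functions for querying ontologies (e.g. term descendants)
source(here::here("code/get_term_descendants.R"))
# GWAS Catalog study info with level 1 disease groupings
gwas_study_info <- fread(here::here("output/gwas_cat/gwas_study_info_group_l1_v2.csv"))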
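The ontology look-ups query the OLS4 API, whose term endpoints expect the term IRI to be URL-encoded twice (note the `%253A` and `%252F` sequences in the descendants URL used for aortic aneurysm). A minimal base-R sketch of building such a URL follows; `ols4_descendants_url()` is a hypothetical helper, not necessarily what `code/get_term_descendants.R` does.

```r
# Hypothetical helper (not taken from code/get_term_descendants.R):
# build an OLS4 "descendants" endpoint URL for an ontology term IRI.
ols4_descendants_url <- function(ontology, iri) {
  # First pass: percent-encode reserved characters such as ":" and "/"
  once <- utils::URLencode(iri, reserved = TRUE)
  # Second pass: encode the "%" signs again; repeated = TRUE is needed because
  # URLencode() otherwise returns already-encoded input unchanged
  twice <- utils::URLencode(once, reserved = TRUE, repeated = TRUE)
  paste0("http://www.ebi.ac.uk/ols4/api/ontologies/", ontology,
         "/terms/", twice, "/descendants")
}

aortic_aneurysm_url <- ols4_descendants_url(
  "cvdo", "http://purl.obolibrary.org/obo/DOID_3627"
)
aortic_aneurysm_url
```

The result should be identical to the hand-written `url` in the aortic aneurysm section, which makes it easy to swap in other term IRIs.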

2 Objectives:

  • Further group disease terms into level 2 categories that match GBD (Global Burden of Disease) categories more closely.

2.1 Grouping - level 2 set up

gwas_study_info = gwas_study_info |>
  mutate(l2_all_disease_terms = l1_all_disease_terms)

3 Maternal and neonatal disorders

3.1 Maternal disorders

maternal_disorder_terms <- c("galactorrhea", # other direct maternal disorders
                             "eclampsia", # maternal hypertensive disorders
                             "hyperemesis gravidarum", # other direct maternal disorders
                             "vomiting of pregnancy", # other direct maternal disorders
                             "intrahepatic cholestasis of pregnancy", # other direct maternal disorders (? indirect maternal deaths)
                             "loss of pregnancy", # maternal abortion and miscarriage
                             "ectopic pregnancy", # ectopic pregnancy
                             "post term pregnancy", # other direct maternal disorders
                             "pregnancy disorder",
                             "obstructed labor", # maternal obstructed labor and uterine rupture
                             "early pregnancy hemorrhage", # maternal hemorrhage
                             "preterm premature rupture of the membranes", # other direct maternal disorders
                             "stillbirth", # maternal abortion and miscarriage
                             "miscarriage", # maternal abortion and miscarriage
                             "abnormal delivery",
                             "abruptio placentae", # maternal hemorrhage
                             "failed induction",
                             "gestational diabetes", # indirect maternal deaths
                             "chorioamnionitis"
                             )



gwas_study_info = gwas_study_info |>
  mutate(l2_all_disease_terms = stringr::str_replace_all(
    l2_all_disease_terms,
    pattern = vec_to_grep_pattern(maternal_disorder_terms),
    replacement = "maternal disorders"
  ))
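`vec_to_grep_pattern()` is used throughout but its definition is not shown on this page. Judging from the commented-out pattern in the lung cancer section (`paste0(resp_cancer_terms, collapse = "(?=,|$)|\\b")`), it appears to match each term after a word boundary and before a comma or the end of the string. The sketch here is a hypothetical stand-in with that behaviour (`term_pattern_sketch()` and `collapse_terms_sketch()` are illustrative names, not the project's functions), applied to toy comma-separated trait strings; it assumes the terms contain no regex metacharacters.

```r
# Hypothetical stand-in for vec_to_grep_pattern(): a term must start at a word
# boundary and be followed by a comma or the end of the string.
term_pattern_sketch <- function(terms) {
  paste0("\\b(?:", paste(terms, collapse = "|"), ")(?=,|$)")
}

# Replace every listed term in a vector of comma-separated trait strings.
collapse_terms_sketch <- function(x, terms, label) {
  gsub(term_pattern_sketch(terms), label, x, perl = TRUE)
}

traits <- c("stillbirth, type 2 diabetes mellitus",
            "gestational diabetes",
            "asthma")
collapsed <- collapse_terms_sketch(traits,
                                   c("stillbirth", "gestational diabetes"),
                                   "maternal disorders")
collapsed
# "maternal disorders, type 2 diabetes mellitus" "maternal disorders" "asthma"
```

With stringr, `stringr::str_replace_all(x, term_pattern_sketch(terms), label)` should behave the same way, since ICU regular expressions also support `\\b` and lookahead.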

3.1.1 Neonatal disorders

neonatal_disorders <- c("neonatal jaundice", # Hemolytic disease and other neonatal jaundice
                        "neonatal sepsis", # Neonatal sepsis and other neonatal infections
                        "neonatal abstinence syndrome", # Other neonatal disorders
                        "perinatal disease", # Neonatal sepsis and other neonatal infections
                        "asphyxia neonatorum" # Neonatal encephalopathy due to birth asphyxia and trauma
)


gwas_study_info = gwas_study_info |>
  mutate(l2_all_disease_terms = stringr::str_replace_all(
    l2_all_disease_terms,
    pattern = vec_to_grep_pattern(neonatal_disorders),
    replacement = "neonatal disorders"
  ))

4 Neoplasms

4.1 Lip and oral cavity cancer

gwas_study_info |> 
 filter(grepl(vec_to_grep_pattern("lip and oral cavity cancer"), 
              l2_all_disease_terms,
              perl = T)) |>
  select(all_disease_terms, l2_all_disease_terms) |>
  distinct() |>
  head()
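Each subsection repeats the same filter / select / distinct / head() inspection. A small helper could remove that repetition; `show_group()` is a hypothetical base-R sketch that uses a word-boundary-plus-lookahead pattern (an assumption about how `vec_to_grep_pattern()` behaves), shown here on a toy data frame rather than the real study table.

```r
# Hypothetical helper: distinct (all_disease_terms, l2_all_disease_terms)
# pairs whose level 2 string contains `label` as a comma-delimited term.
show_group <- function(df, label, n = 6) {
  pattern <- paste0("\\b(?:", label, ")(?=,|$)")
  hits <- df[grepl(pattern, df$l2_all_disease_terms, perl = TRUE),
             c("all_disease_terms", "l2_all_disease_terms")]
  head(unique(hits), n)
}

# Toy data, for illustration only
toy <- data.frame(
  all_disease_terms    = c("oral cavity cancer", "oral cavity cancer", "asthma"),
  l2_all_disease_terms = c("lip and oral cavity cancer",
                           "lip and oral cavity cancer", "asthma")
)
shown <- show_group(toy, "lip and oral cavity cancer")
shown
```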

4.2 Nasopharynx cancer

gwas_study_info =
gwas_study_info |> 
  mutate(l2_all_disease_terms  = 
          stringr::str_replace_all(l2_all_disease_terms ,
                                  pattern = vec_to_grep_pattern("nasopharyngeal cancer"),
                                   "nasopharynx cancer"
                          )  
        )
  
gwas_study_info |> 
 filter(grepl(vec_to_grep_pattern("nasopharynx cancer"),
              l2_all_disease_terms,
              perl = T)) |>
  select(all_disease_terms, l2_all_disease_terms) |>
  distinct()

4.3 Other pharynx cancer

gwas_study_info |> 
 filter(grepl(vec_to_grep_pattern("other pharynx cancer"),
              l2_all_disease_terms,
              perl = T)) |>
  select(all_disease_terms, l2_all_disease_terms) |>
  distinct() |>
  head()

4.4 Esophageal cancer

gwas_study_info |>
 filter(grepl(vec_to_grep_pattern("esophageal cancer"), 
              l2_all_disease_terms,
              perl = T)) |>
  select(all_disease_terms, l2_all_disease_terms) |>
  distinct() |>
  head()

4.5 Stomach cancer

gwas_study_info |>
 filter(grepl(vec_to_grep_pattern("stomach cancer"), 
              l2_all_disease_terms,
              perl = T)) |>
  select(all_disease_terms, l2_all_disease_terms) |>
  distinct() |>
  head()

4.6 Colon and rectum cancer

gwas_study_info = gwas_study_info |> 
 mutate(l2_all_disease_terms  = 
          stringr::str_replace_all(l2_all_disease_terms ,
                                  pattern = vec_to_grep_pattern("colorectal cancer"),
                                   "colon and rectum cancer"
                          )  
        )

gwas_study_info |>
 filter(grepl(vec_to_grep_pattern("colon and rectum cancer"), 
              l2_all_disease_terms,
              perl = T)) |>
  select(all_disease_terms, l2_all_disease_terms) |>
  distinct() |> 
  head()

4.7 Liver cancer

gwas_study_info |>
 filter(grepl(vec_to_grep_pattern("liver cancer"), 
              l2_all_disease_terms,
              perl = T)) |>
  select(all_disease_terms, l2_all_disease_terms) |>
  distinct() |>
  head()

4.8 Gallbladder and biliary tract cancer

gwas_study_info = 
gwas_study_info |>
 mutate(l2_all_disease_terms  = 
        case_when(
          l2_all_disease_terms == "cancer of gallbladder and extrahepatic biliary tract" ~ "gallbladder and biliary tract cancer",
          TRUE ~ l2_all_disease_terms
                 )
        )

gwas_study_info |>
 filter(grepl(vec_to_grep_pattern("gallbladder and biliary tract cancer"), 
              l2_all_disease_terms,
              perl = T)) |>
  select(all_disease_terms, l2_all_disease_terms) |>
  distinct()

4.9 Pancreatic cancer

gwas_study_info |>
 filter(grepl(vec_to_grep_pattern("pancreatic cancer"), 
              l2_all_disease_terms,
              perl = T)) |>
  select(all_disease_terms, l2_all_disease_terms) |>
  distinct() |>
  head()

4.10 Larynx cancer

gwas_study_info |>
 filter(grepl(vec_to_grep_pattern("larynx cancer"), 
              l2_all_disease_terms,
              perl = T)) |>
  select(all_disease_terms, l2_all_disease_terms) |>
  distinct() |>
  head()

4.11 Tracheal, bronchus, and lung cancer

resp_cancer_terms = c("lung cancer",
                      "bronchus cancer",
                      "tracheal cancer",
                      "respiratory system cancer"
                        )

gwas_study_info = 
gwas_study_info |> 
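 mutate(l2_all_disease_terms  =
          # skip rows whose label is already exactly "tracheal bronchus and lung cancer",
          # so "lung cancer" is not matched again at the end of the collapsed label
          ifelse(l2_all_disease_terms != "tracheal bronchus and lung cancer",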
          stringr::str_replace_all(l2_all_disease_terms,
                                   pattern = vec_to_grep_pattern(resp_cancer_terms),
                                  #pattern = paste0(resp_cancer_terms, collapse = "(?=,|$)|\\b"),
                                   "tracheal bronchus and lung cancer"
                          ),
          l2_all_disease_terms
          )
        )

gwas_study_info |>
 filter(grepl(vec_to_grep_pattern("tracheal bronchus and lung cancer"), 
              l2_all_disease_terms,
              perl = T)) |>
  select(all_disease_terms, l2_all_disease_terms) |>
  distinct() |>
  head()
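The `ifelse()` guard above only skips rows whose whole string equals `"tracheal bronchus and lung cancer"`. Assuming `vec_to_grep_pattern()` builds a word-boundary-plus-lookahead pattern (as the commented-out `paste0()` line suggests), a multi-term string that already contains the collapsed label would still have its `"lung cancer"` tail rewritten. A fixed-length negative lookbehind is one way to protect the label in any position; this is a hedged alternative on toy strings, not the method used above.

```r
x <- c("lung cancer, asthma",
       "tracheal bronchus and lung cancer, asthma")
label <- "tracheal bronchus and lung cancer"

# Boundary + lookahead only: re-matches "lung cancer" inside the collapsed label
naive <- gsub("\\b(?:lung cancer|bronchus cancer)(?=,|$)", label, x, perl = TRUE)

# Negative lookbehind: skip "lung cancer" when preceded by "tracheal bronchus and "
guarded <- gsub("(?<!tracheal bronchus and )\\b(?:lung cancer|bronchus cancer)(?=,|$)",
                label, x, perl = TRUE)
naive    # second element becomes "tracheal bronchus and tracheal bronchus and lung cancer, asthma"
guarded  # both elements are "tracheal bronchus and lung cancer, asthma"
```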

4.12 Malignant skin melanoma

gwas_study_info =
gwas_study_info |> 
 mutate(l2_all_disease_terms  = 
          stringr::str_replace_all(l2_all_disease_terms,
                                  pattern = vec_to_grep_pattern("malignant melanoma of skin"),
                                   "malignant skin melanoma"
                          )  
        )

gwas_study_info |>
 filter(grepl(vec_to_grep_pattern("malignant skin melanoma"), 
              l2_all_disease_terms,
              perl = T)) |>
  select(all_disease_terms, l2_all_disease_terms) |>
  distinct() |>
  head()

4.13 Non-melanoma skin cancer

gwas_study_info |>
 filter(grepl(vec_to_grep_pattern("non-melanoma skin cancer"), 
              l2_all_disease_terms,
              perl = T)) |>
  select(all_disease_terms, l2_all_disease_terms) |>
  distinct()

4.14 Soft tissue and other extraosseous sarcomas

gwas_study_info = 
gwas_study_info |> 
 mutate(l2_all_disease_terms  = 
          stringr::str_replace_all(l2_all_disease_terms ,
                                  pattern = vec_to_grep_pattern("soft tissue sarcoma"),
                                   "soft tissue and other extraosseous sarcomas"
                          )  
        )

gwas_study_info |>
 filter(grepl(vec_to_grep_pattern("soft tissue and other extraosseous sarcomas"), 
              l2_all_disease_terms,
              perl = T)) |>
  select(all_disease_terms, l2_all_disease_terms) |>
  distinct() |>
  head()

4.15 Malignant neoplasm of bone and articular cartilage

gwas_study_info =
gwas_study_info |> 
 mutate(l2_all_disease_terms  = 
          stringr::str_replace_all(l2_all_disease_terms ,
                                  pattern = vec_to_grep_pattern(c("bone cancer","osteosarcoma")),
                                   "malignant neoplasm of bone and articular cartilage"
                          )  
        )

gwas_study_info |>
 filter(grepl(vec_to_grep_pattern("malignant neoplasm of bone and articular cartilage"), 
              l2_all_disease_terms,
              perl = T)) |>
  select(all_disease_terms, l2_all_disease_terms) |>
  distinct() |>
  head()

4.16 Breast cancer

gwas_study_info |>
 filter(grepl(vec_to_grep_pattern("breast cancer"), 
              l2_all_disease_terms,
              perl = T
              )) |>
  select(all_disease_terms, l2_all_disease_terms) |>
  distinct() |>
  head()

4.17 Cervical cancer

gwas_study_info |>
 filter(grepl(vec_to_grep_pattern("cervical cancer"), 
              l2_all_disease_terms,
              perl = T)) |>
  select(all_disease_terms, l2_all_disease_terms) |>
  distinct() |>
  head()

4.18 Uterine cancer

# ? does GBD count endometrial cancer as a subset of uterine cancer?
# it is according to the ontology: http://purl.obolibrary.org/obo/MONDO_0002715
gwas_study_info =
gwas_study_info |> 
 mutate(l2_all_disease_terms  = 
          stringr::str_replace_all(l2_all_disease_terms ,
                                  pattern = vec_to_grep_pattern("endometrial cancer"),
                                   "uterine cancer"
                          )  
        )

gwas_study_info |>
 filter(grepl(vec_to_grep_pattern("uterine cancer"), 
              l2_all_disease_terms,
              perl = T)) |>
  select(all_disease_terms, l2_all_disease_terms) |>
  distinct() |>
  head()

4.19 Ovarian cancer

gwas_study_info |>
 filter(grepl(vec_to_grep_pattern("ovarian cancer"), 
              l2_all_disease_terms,
              perl = T)) |>
  select(all_disease_terms, l2_all_disease_terms) |>
  distinct() |>
  head()

4.20 Prostate cancer

gwas_study_info |>
 filter(grepl(vec_to_grep_pattern("prostate cancer"), 
              l2_all_disease_terms,
              perl = T)) |>
  select(all_disease_terms, l2_all_disease_terms) |>
  distinct() |>
  head()

4.21 Testicular cancer

gwas_study_info |>
 filter(grepl(vec_to_grep_pattern("testicular cancer"), 
              l2_all_disease_terms,
              perl = T)) |>
  select(all_disease_terms, l2_all_disease_terms) |>
  distinct()

4.22 Kidney cancer

gwas_study_info |>
 filter(grepl(vec_to_grep_pattern("kidney cancer"), 
              l2_all_disease_terms,
              perl = T)) |>
  select(all_disease_terms, l2_all_disease_terms) |>
  distinct() |>
  head() 

4.23 Bladder cancer

gwas_study_info = 
  gwas_study_info |>
 mutate(l2_all_disease_terms  = 
          stringr::str_replace_all(l2_all_disease_terms ,
                                  pattern = vec_to_grep_pattern("urinary bladder cancer"),
                                   "bladder cancer"
                          )  
        )


gwas_study_info |>
 filter(grepl(vec_to_grep_pattern("bladder cancer"), 
              l2_all_disease_terms,
              perl = T)) |>
  select(all_disease_terms, l2_all_disease_terms) |>
  distinct() 

4.24 Brain and central nervous system cancer

gwas_study_info = 
  gwas_study_info |>
 mutate(l2_all_disease_terms  = 
          stringr::str_replace_all(l2_all_disease_terms ,
                                  pattern = vec_to_grep_pattern("central nervous system cancer"),
                                   "brain and central nervous system cancer"
                          )  
        )


gwas_study_info |>
 filter(grepl(vec_to_grep_pattern("brain and central nervous system cancer"), 
              l2_all_disease_terms,
              perl = T)) |>
  select(all_disease_terms, l2_all_disease_terms) |>
  distinct() |>
  head()

4.25 Eye cancer

gwas_study_info = 
gwas_study_info |> 
 mutate(l2_all_disease_terms  = 
          stringr::str_replace_all(l2_all_disease_terms ,
                                  pattern = 
                                    vec_to_grep_pattern(c("ocular melanoma",
                                                       "ocular cancer")
                                                       ),
                                   "eye cancer"
                          )  
        )

gwas_study_info |>
 filter(grepl(vec_to_grep_pattern("eye cancer"), 
              l2_all_disease_terms,
              perl = T)) |>
  select(all_disease_terms, l2_all_disease_terms) |>
  distinct() |> 
  head()

4.26 Neuroblastoma and other peripheral nervous cell tumors

gwas_study_info =
gwas_study_info |> 
 mutate(l2_all_disease_terms  = 
          stringr::str_replace_all(l2_all_disease_terms,
                                   pattern = vec_to_grep_pattern(
                                     c("neuroblastoma",
                                       "peripheral nervous system cancer")
                                   ),
                                   "neuroblastoma and other peripheral nervous cell tumors"
                          )  
        )

gwas_study_info |>
 filter(grepl(vec_to_grep_pattern("neuroblastoma and other peripheral nervous cell tumors"),
              l2_all_disease_terms,
              perl = T)) |>
  select(all_disease_terms, l2_all_disease_terms) |>
  distinct() |>
  head()

4.27 Thyroid cancer

gwas_study_info |>
 filter(grepl(vec_to_grep_pattern("thyroid cancer"), 
              l2_all_disease_terms,
              perl = T)) |>
  select(all_disease_terms, l2_all_disease_terms) |>
  distinct() |>
  head()

4.28 Mesothelioma

gwas_study_info |>
 filter(grepl(vec_to_grep_pattern("mesothelioma"), 
              l2_all_disease_terms,
              perl = T)) |>
  select(all_disease_terms, l2_all_disease_terms) |>
  distinct() |>
  head()

4.29 Hodgkin lymphoma

gwas_study_info = 
gwas_study_info |> 
 mutate(l2_all_disease_terms  = 
          stringr::str_replace_all(l2_all_disease_terms ,
                                  pattern = vec_to_grep_pattern("hodgkins lymphoma"),
                                   "hodgkin lymphoma"
                          )  
        )


gwas_study_info |> 
 filter(grepl(vec_to_grep_pattern("hodgkin lymphoma"), 
              l2_all_disease_terms,
              perl = T)) |>
  select(all_disease_terms, l2_all_disease_terms) |>
  distinct() |>
  head()
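Because `-` is a non-word character, a word-boundary pattern for `"hodgkin lymphoma"` also matches inside `"non-hodgkin lymphoma"`, so the inspection above may list non-Hodgkin rows as well. Assuming `vec_to_grep_pattern()` works this way, a `(?<!non-)` lookbehind is one way to separate the two groups; a sketch on toy strings:

```r
x <- c("hodgkin lymphoma", "non-hodgkin lymphoma, asthma")

loose  <- grepl("\\bhodgkin lymphoma(?=,|$)", x, perl = TRUE)
strict <- grepl("(?<!non-)\\bhodgkin lymphoma(?=,|$)", x, perl = TRUE)

loose   # TRUE TRUE: the hyphen creates a word boundary
strict  # TRUE FALSE
```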

4.30 Non-Hodgkin lymphoma

gwas_study_info = 
gwas_study_info |> 
 mutate(l2_all_disease_terms  = 
          stringr::str_replace_all(l2_all_disease_terms ,
                                  pattern = vec_to_grep_pattern("non-hodgkins lymphoma"),
                                   "non-hodgkin lymphoma"
                          )  
        )

gwas_study_info |>
 filter(grepl(vec_to_grep_pattern("non-hodgkin lymphoma"), 
              l2_all_disease_terms,
              perl = T)) |>
  select(all_disease_terms, l2_all_disease_terms) |>
  distinct() |>
  head()

4.31 Multiple myeloma

gwas_study_info |>
 filter(grepl(vec_to_grep_pattern("multiple myeloma"), 
              l2_all_disease_terms,
              perl = T)) |>
  select(all_disease_terms, l2_all_disease_terms) |>
  distinct() |>
  head()

4.32 Leukemia

gwas_study_info |>
 filter(grepl(vec_to_grep_pattern("leukemia"), 
              l2_all_disease_terms,
              perl = T)) |>
  select(all_disease_terms, l2_all_disease_terms) |>
  distinct() |>
  head()

4.33 Other malignant neoplasms

gwas_study_info =
  gwas_study_info |>
  mutate(l2_all_disease_terms = 
         case_when(
         l2_all_disease_terms == "cancer" ~ "other malignant neoplasms",
         TRUE ~ l2_all_disease_terms
                 )
         )

gwas_study_info =
  gwas_study_info |>
  mutate(l2_all_disease_terms = 
         ifelse(PUBMED_ID == 27790247,
                stringr::str_replace_all(l2_all_disease_terms,
                                        pattern = ", cancer,",
                                        ", other malignant neoplasms,"
                                        ),
                l2_all_disease_terms
                
         )
  )


# handle terms for measurements of cancer-related factors, which start with "cancer,"
gwas_study_info |> 
  filter(grepl("^cancer,", l2_all_disease_terms)) |> 
  pull(l2_all_disease_terms) |> 
  unique()
 

gwas_study_info =
  gwas_study_info |>
  mutate(l2_all_disease_terms = 
         ifelse(grepl("^cancer,", l2_all_disease_terms),
                stringr::str_replace_all(l2_all_disease_terms,
                                        pattern = "^cancer,",
                                        "other malignant neoplasms,"
                                        ),
                l2_all_disease_terms
                
         )
  )
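The three rules above (exact `"cancer"`, a leading `"cancer,"`, and `", cancer,"` for one PubMed ID) recode the generic term without touching the `cancer` at the end of site-specific labels such as `"lung cancer"`. Assuming entries are separated by `", "`, a single pattern that requires `cancer` to be a complete entry would also catch a trailing `", cancer"`, if any occur. A sketch on toy strings:

```r
x <- c("cancer",
       "cancer, body mass index",
       "asthma, cancer, obesity",
       "lung cancer, cancer")

# (?<=^|, ) : the term starts the string or follows ", "
# (?=,|$)   : the term ends at a comma or the end of the string
whole_entry <- "(?<=^|, )cancer(?=,|$)"
recoded <- gsub(whole_entry, "other malignant neoplasms", x, perl = TRUE)
recoded
```

Here `"lung cancer"` is left alone because its `cancer` is preceded by a space, not by `", "` or the start of the string.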

other_malignant_terms <- c(
                           "retroperitoneal cancer",
                           "peritoneal cancer",
                           "ewing sarcoma",
                           
                           "digestive system cancer",
                           "intestinal cancer",
                           "small intestine cancer",
                           
                           "female reproductive organ cancer",
                           "male reproductive organ cancer",
                           "vulvar cancer",                           
                           "testicular germ cell tumor",
                           "urogenital cancer",
                           
                           "squamous cell cancer",

                           "head and neck cancer",
                           "malignant tumor of floor of mouth", 
                           "nasal cavity cancer", # ? unsure whether this belongs in another category
                           
                           "malignant lymphoid tumor",
                           "neuroendocrine tumor",
                           "lymphatic system cancer",
                           
                           "childhood cancer" # ? may need further sorting

                           )

gwas_study_info = 
gwas_study_info |> 
 mutate(l2_all_disease_terms  = 
          stringr::str_replace_all(l2_all_disease_terms,
                                   pattern = vec_to_grep_pattern(other_malignant_terms),
                                  # pattern = paste0(other_malignant_terms, collapse = "(?=,|$)|\\b"),
                                  "other malignant neoplasms"
                          )  
        )


gwas_study_info |>
 filter(grepl(vec_to_grep_pattern("other malignant neoplasms"), 
              l2_all_disease_terms,
              perl = T)) |>
  select(all_disease_terms, l2_all_disease_terms) |>
  distinct() |> 
  head()

4.34 Other neoplasms

other_neoplasm_terms <- c("clonal hematopoiesis",
                          "neoplasm",
                          "benign neoplasm")

gwas_study_info = 
  gwas_study_info |>
 mutate(l2_all_disease_terms  = 
          stringr::str_replace_all(l2_all_disease_terms ,
                                  pattern = vec_to_grep_pattern(other_neoplasm_terms),
                                   "other neoplasms"
                          )  
        )

gwas_study_info =
  gwas_study_info |>
  mutate(l2_all_disease_terms = 
         case_when(
         l2_all_disease_terms == "benign neoplasm" ~ "other neoplasms",
         TRUE ~ l2_all_disease_terms
                 )
         )

unknown_sig_terms <- c("intracranial germ cell tumor",
                       "bladder tumor",
                       "dysplasia of cervix")

gwas_study_info =
gwas_study_info |> 
 mutate(l2_all_disease_terms  = 
          stringr::str_replace_all(l2_all_disease_terms ,
                                  pattern = vec_to_grep_pattern(unknown_sig_terms),
                                   "other neoplasms"
                          )
 )

gwas_study_info |>
 filter(grepl(vec_to_grep_pattern("other neoplasms"), 
              l2_all_disease_terms,
              perl = T)) |>
  select(all_disease_terms, l2_all_disease_terms) |>
  distinct() |>
  head()
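`head()` shows only the first six distinct rows per group. To see every label that ends up alongside a group, the comma-separated level 2 strings can be split into individual entries and counted. A toy sketch, assuming entries are separated by commas:

```r
l2 <- c("other neoplasms, asthma",
        "breast cancer, other neoplasms",
        "other neoplasms")

# One element per entry, with surrounding whitespace removed
entries <- trimws(unlist(strsplit(l2, ",", fixed = TRUE)))
entry_counts <- sort(table(entries), decreasing = TRUE)
entry_counts
```

On the real data, `l2` would be `gwas_study_info$l2_all_disease_terms` (optionally filtered to one group first).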

5 Cardiovascular diseases

5.1 Rheumatic heart disease

gwas_study_info |>
 filter(grepl(vec_to_grep_pattern("rheumatic heart disease"), 
              l2_all_disease_terms,
              perl = T)) |>
  select(all_disease_terms, l2_all_disease_terms) |>
  distinct() |>
  head()

5.2 Ischemic heart disease

# add chronic ischemic heart disease
gwas_study_info = 
  gwas_study_info |>
 mutate(l2_all_disease_terms  = 
          stringr::str_replace_all(l2_all_disease_terms ,
                                  pattern = vec_to_grep_pattern("chronic ischemic heart disease"),
                                   "ischemic heart disease"
                          )  
        )

# unsure, to double-check: is aortic atherosclerosis best grouped with ischemic heart disease?
gwas_study_info = 
  gwas_study_info |>
 mutate(l2_all_disease_terms  = 
          stringr::str_replace_all(l2_all_disease_terms ,
                                  pattern = vec_to_grep_pattern("aortic atherosclerosis"),
                                   "ischemic heart disease"
                          )  
        )

gwas_study_info |>
 filter(grepl(vec_to_grep_pattern("ischemic heart disease"), 
              l2_all_disease_terms,
              perl = T)) |>
  select(all_disease_terms, l2_all_disease_terms) |>
  distinct() |>
  head()

5.3 Stroke

gwas_study_info |>
 filter(grepl(vec_to_grep_pattern("stroke"),
              l2_all_disease_terms,
              perl = T)) |>
  select(all_disease_terms, l2_all_disease_terms) |>
  distinct() |>
  head()

5.4 Hypertensive heart disease

gwas_study_info |>
 filter(grepl(vec_to_grep_pattern("hypertensive heart disease"), 
              l2_all_disease_terms,
              perl = T)) |>
  select(all_disease_terms, l2_all_disease_terms) |>
  distinct() |>
  head()

5.5 Non-rheumatic valvular heart disease

gwas_study_info |>
 filter(grepl(vec_to_grep_pattern("heart valve disease"), 
              l2_all_disease_terms,
              perl = T)) |>
  select(all_disease_terms, l2_all_disease_terms) |>
  distinct() |>
  head()

gwas_study_info = gwas_study_info |>
 mutate(l2_all_disease_terms  = 
          stringr::str_replace_all(l2_all_disease_terms ,
                                  pattern = vec_to_grep_pattern("heart valve disease"),
                                   "non-rheumatic valvular heart disease"
                          )  
        )



gwas_study_info |>
 filter(grepl(vec_to_grep_pattern("non-rheumatic valvular heart disease"), 
              l2_all_disease_terms,
              perl = T)) |>
  select(all_disease_terms, l2_all_disease_terms) |>
  distinct() |>
  head()

5.6 Cardiomyopathy and myocarditis

gwas_study_info =
gwas_study_info |> 
 mutate(l2_all_disease_terms  = 
          stringr::str_replace_all(l2_all_disease_terms ,
                                  pattern = vec_to_grep_pattern(c("cardiomyopathy",
                                                                  "myocarditis")),
                                   "cardiomyopathy and myocarditis"
                          )  
        )

gwas_study_info |>
 filter(grepl(vec_to_grep_pattern("cardiomyopathy and myocarditis"), 
              l2_all_disease_terms,
              perl = T)) |>
  select(all_disease_terms, l2_all_disease_terms) |>
  distinct() |>
  head()

5.7 Pulmonary arterial hypertension

gwas_study_info |>
 filter(grepl(vec_to_grep_pattern("pulmonary arterial hypertension"), 
              l2_all_disease_terms,
              perl = T)) |>
  select(all_disease_terms, l2_all_disease_terms) |>
  distinct() |>
  head()

5.8 Atrial fibrillation & flutter

afib_terms <- c("atrial fibrillation",
                "atrial flutter",
                "post-operative atrial fibrillation")

gwas_study_info = 
gwas_study_info |> 
 mutate(l2_all_disease_terms  = 
          stringr::str_replace_all(l2_all_disease_terms ,
                                  pattern = vec_to_grep_pattern(afib_terms),
                                   "atrial fibrillation and flutter"
                          )  
        )

gwas_study_info |>
 filter(grepl(vec_to_grep_pattern("atrial fibrillation and flutter"), 
              l2_all_disease_terms,
              perl = T)) |>
  select(all_disease_terms, l2_all_disease_terms) |>
  distinct() |>
  head()

5.9 Aortic aneurysm

url <- "http://www.ebi.ac.uk/ols4/api/ontologies/cvdo/terms/http%253A%252F%252Fpurl.obolibrary.org%252Fobo%252FDOID_3627/descendants"

aortic_aneurysm_terms <- get_descendants(url)
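The OLS4 URLs in this section embed the term IRI URL-encoded twice: `%253A` is an encoded `%3A`, which is itself an encoded `:`. Below is a small, hypothetical helper that rebuilds such a URL from an ontology ID and an IRI using only base R's `utils::URLencode()`. The second call needs `repeated = TRUE`, because otherwise `URLencode()` returns input that already contains `%XX` escapes unchanged.

```r
# Hypothetical helper: rebuild an OLS4 "descendants" URL like the ones used in
# this section. It only builds the string; fetching and parsing the response
# is left to get_descendants(), which is defined elsewhere.
ols4_descendants_url <- function(ontology, iri,
                                 base = "http://www.ebi.ac.uk/ols4/api/ontologies") {
  once  <- utils::URLencode(iri, reserved = TRUE)    # ":" -> %3A, "/" -> %2F
  twice <- utils::URLencode(once, reserved = TRUE,   # "%" -> %25
                            repeated = TRUE)
  paste0(base, "/", ontology, "/terms/", twice, "/descendants")
}

ols4_descendants_url("cvdo", "http://purl.obolibrary.org/obo/DOID_3627")
```

Building the URL from the IRI makes it easier to check which ontology term each `get_descendants()` call points at.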

gwas_study_info = gwas_study_info |> 
 mutate(l2_all_disease_terms  = 
          stringr::str_replace_all(l2_all_disease_terms ,
                                  pattern = vec_to_grep_pattern(aortic_aneurysm_terms),
                                   "aortic aneurysm"
                          )  
        )

gwas_study_info |>
 filter(grepl(vec_to_grep_pattern("aortic aneurysm"), 
              l2_all_disease_terms,
              perl = T
              )) |>
  select(all_disease_terms, l2_all_disease_terms) |>
  distinct() |>
  head()

5.10 Lower extremity peripheral arterial disease

lower_extremity_peripheral_arterial_disease_terms <- c("raynaud disease"
                                                        )

gwas_study_info =
  gwas_study_info |>
 mutate(l2_all_disease_terms  = 
          stringr::str_replace_all(l2_all_disease_terms ,
                                  pattern = vec_to_grep_pattern(lower_extremity_peripheral_arterial_disease_terms),
                                   "lower extremity peripheral arterial disease"
                          )  
        )

gwas_study_info |>
 filter(grepl(vec_to_grep_pattern("lower extremity peripheral arterial disease"), 
              l2_all_disease_terms,
              perl = T)) |>
  select(all_disease_terms, l2_all_disease_terms) |>
  distinct() |>
  head()

5.11 Endocarditis

gwas_study_info |> 
 filter(grepl(vec_to_grep_pattern("endocarditis"), 
              l2_all_disease_terms,
              perl = T)) |>
  select(all_disease_terms, l2_all_disease_terms) |>
  distinct() |>
  head()

5.12 Other cardiovascular and circulatory diseases

other_cardiovascular_terms <- c("tachycardia",
                                "other cardiac arrhythmias",
                                "heart block",
                                "carotid artery disease",
                                "hypertension",
                                "pericarditis",
                                "phlebitis", # ICD9 - 451
                                "coronary artery calcification",
                                "arterial occlusion",
                                "other vascular disorders",
                                "congestive heart failure",
                                "heart failure",
                                "thrombotic disease",
                                "arterial embolism",
                                "cardiac embolism",
                                "venous embolism",
                                "venous thrombosis",
                                "pulmonary embolism",
                                "arterial thrombosis",
                                "thromboembolism",
                                "vascular insufficiency",
                                "brain infarction",
                                "heart murmur",
                                "deep vein thrombosis"
                                )

gwas_study_info =
gwas_study_info |> 
 mutate(l2_all_disease_terms  = 
          stringr::str_replace_all(l2_all_disease_terms ,
                                  pattern = vec_to_grep_pattern(other_cardiovascular_terms),
                                   "other cardiovascular and circulatory diseases"
                          )
 )
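Many input terms map to one label, so a study annotated with, for example, both hypertension and heart failure ends up with "other cardiovascular and circulatory diseases" twice in `l2_all_disease_terms`. This section does not show whether later steps collapse such repeats. If that is needed, a hypothetical helper like the one below would do it, assuming entries are separated by ", ".

```r
# Hypothetical helper (not part of the original analysis): collapse repeated
# labels inside each comma-separated string, keeping first-occurrence order.
dedupe_terms <- function(x, sep = ", ") {
  vapply(strsplit(x, sep, fixed = TRUE),
         function(parts) {
           if (anyNA(parts)) NA_character_               # keep missing values missing
           else paste(unique(parts), collapse = sep)
         },
         character(1))
}

x <- paste("other cardiovascular and circulatory diseases",
           "other cardiovascular and circulatory diseases",
           "stroke", sep = ", ")
dedupe_terms(x)
# returns "other cardiovascular and circulatory diseases, stroke"
```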

gwas_study_info =
gwas_study_info |> 
 mutate(l2_all_disease_terms  = 
          stringr::str_replace_all(l2_all_disease_terms ,
                                  pattern = "(?<=^|, ) other vascular disorders(?=,|$)",
                                   "other cardiovascular and circulatory diseases"
                          )
 )
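The explicit pattern above has a space after the lookbehind. It therefore only matches "other vascular disorders" when the entry carries an extra leading space, either two spaces after a comma or a space at the very start of the string. The base-R check below confirms that behaviour. It also shows one alternative: a hypothetical whitespace clean-up that lets a single whole-term pattern cover every spelling. Whether stray whitespace is really the author's reason for this rule is an assumption.

```r
pat <- "(?<=^|, ) other vascular disorders(?=,|$)"
x <- c("stroke,  other vascular disorders",  # two spaces after the comma
       "stroke, other vascular disorders",   # one space
       " other vascular disorders")          # leading space
grepl(pat, x, perl = TRUE)
# returns TRUE FALSE TRUE

# Alternative (assumption about intent): normalise whitespace first, then a
# single whole-term pattern matches all three spellings.
cleaned <- gsub(" {2,}", " ", trimws(x))
grepl("(?<=^|, )other vascular disorders(?=,|$)", cleaned, perl = TRUE)
# returns TRUE TRUE TRUE
```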

gwas_study_info |>
 filter(grepl(vec_to_grep_pattern("other cardiovascular and circulatory diseases"), 
              l2_all_disease_terms,
              perl = T)) |>
  select(all_disease_terms, l2_all_disease_terms) |>
  distinct() |>
  head()

6 Chronic respiratory diseases

6.1 Chronic obstructive pulmonary disease

gwas_study_info |>
 filter(grepl(vec_to_grep_pattern("chronic obstructive pulmonary disease"), 
              l2_all_disease_terms,
              perl = T)) |>
  select(all_disease_terms, l2_all_disease_terms) |>
  distinct() |>
  head()

6.2 Pneumoconiosis

gwas_study_info |>
 filter(grepl(vec_to_grep_pattern("pneumoconiosis"), 
              l2_all_disease_terms,
              perl = T)) |>
  select(all_disease_terms, l2_all_disease_terms) |>
  distinct() |>
  head()

6.3 Asthma

gwas_study_info |>
 filter(grepl(vec_to_grep_pattern("asthma"), 
              l2_all_disease_terms,
              perl = T)) |>
  select(all_disease_terms, l2_all_disease_terms) |>
  distinct() |>
  head()

6.4 Interstitial lung disease and pulmonary sarcoidosis

interstitial_lung_disease_terms <- c("pulmonary sarcoidosis",
                                   "interstitial lung disease",
                                   "löfgren syndrome",
                                   "sarcoidosis"
                                   )

gwas_study_info = 
gwas_study_info |> 
 mutate(l2_all_disease_terms  = 
          stringr::str_replace_all(l2_all_disease_terms ,
                                  pattern = vec_to_grep_pattern(interstitial_lung_disease_terms),
                                   "interstitial lung disease and pulmonary sarcoidosis"
                          )  
        )

gwas_study_info |>
 filter(grepl(vec_to_grep_pattern("interstitial lung disease and pulmonary sarcoidosis"), 
              l2_all_disease_terms,
              perl = T)) |>
  select(all_disease_terms, l2_all_disease_terms) |>
  distinct() |>
  head()

6.5 Other chronic respiratory diseases

gwas_study_info |>
 filter(grepl(vec_to_grep_pattern("other chronic respiratory diseases"), 
              l2_all_disease_terms,
              perl = T)) |>
  select(all_disease_terms, l2_all_disease_terms) |>
  distinct() |>
  head()

7 Digestive diseases

7.1 Cirrhosis & other chronic liver diseases

url <- "http://www.ebi.ac.uk/ols4/api/ontologies/snomed/terms/http%253A%252F%252Fsnomed.info%252Fid%252F328383001/descendants"

chronic_liver_disease_terms <- get_descendants(url)

chronic_liver_disease_terms <- c("primary biliary cirrhosis",
                                 "alcoholic liver cirrhosis",
                                 "chronic hepatitis b virus infection",
                                 "acute-on-chronic liver failure",
                                 "non-alcoholic fatty liver disease",
                                 "cirrhosis of liver",
                                 "chronic hepatitis",
                                 "liver disease",
                                 "alcoholic liver disease",
                                 "hepatitis c induced liver cirrhosis",
                                 chronic_liver_disease_terms)

gwas_study_info = gwas_study_info |> 
 mutate(l2_all_disease_terms  = 
          stringr::str_replace_all(l2_all_disease_terms,
                                  pattern = vec_to_grep_pattern(chronic_liver_disease_terms),
                                   "cirrhosis and other chronic liver diseases"
                          )  
        )

gwas_study_info = gwas_study_info |> 
 mutate(l2_all_disease_terms  = 
          stringr::str_replace_all(l2_all_disease_terms,
                                  pattern = "(?<=^|, ) liver disease(?=,|$)",
                                   "cirrhosis and other chronic liver diseases"
                          )  
        )
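The liver term vector combines a hand-written list with labels returned by `get_descendants()`, so it can contain repeats. Regex matching in `str_replace_all()` is also case-sensitive. The sketch below shows one possible normalisation step. It assumes that `l2_all_disease_terms` is stored in lower case: the hand-written terms in this file suggest so, but this section does not state it. `normalise_terms()` is a hypothetical name.

```r
# Hypothetical helper. Assumption: the disease strings being matched are lower case.
normalise_terms <- function(terms) {
  terms <- tolower(trimws(terms))
  unique(terms[!is.na(terms) & nzchar(terms)])   # drop NA and empty labels
}

normalise_terms(c("Primary biliary cirrhosis", "primary biliary cirrhosis",
                  " liver disease", NA, ""))
# returns "primary biliary cirrhosis" "liver disease"
```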


gwas_study_info |>
 filter(grepl(vec_to_grep_pattern("cirrhosis and other chronic liver diseases"), 
              l2_all_disease_terms,
              perl = T)) |>
  select(all_disease_terms, l2_all_disease_terms) |>
  distinct() |>
  head()

7.2 Upper digestive system diseases

upper_dig_terms <- c("peptic ulcer diseases",
                     "peptic ulcer",
                     "duodenitis",
                     "gastritis",
                     "atrophic gastritis",
                     "gastroesophageal reflux disease")

gwas_study_info = gwas_study_info |>
 mutate(l2_all_disease_terms  = 
          stringr::str_replace_all(l2_all_disease_terms,
                                  pattern = vec_to_grep_pattern(upper_dig_terms),
                                   "upper digestive system diseases"
                          )  
        )

gwas_study_info |>
 filter(grepl(vec_to_grep_pattern("upper digestive system diseases"), 
              l2_all_disease_terms,
              perl = T)) |>
  select(all_disease_terms, l2_all_disease_terms) |>
  distinct() |>
  head()

7.3 Appendicitis

gwas_study_info |>
 filter(grepl(vec_to_grep_pattern("appendicitis"), 
              l2_all_disease_terms,
              perl = T)) |>
  select(all_disease_terms, l2_all_disease_terms) |>
  distinct() |>
  head()

7.4 Paralytic ileus and intestinal obstruction

gwas_study_info = gwas_study_info |>
 mutate(l2_all_disease_terms  = 
          stringr::str_replace_all(l2_all_disease_terms,
                                  pattern = vec_to_grep_pattern(c("paralytic ileus",
                                                             "intestinal obstruction")
                                                             ),
                                   "paralytic ileus and intestinal obstruction"
                          )  
        )

gwas_study_info |>
 filter(grepl(vec_to_grep_pattern("paralytic ileus and intestinal obstruction"), 
              l2_all_disease_terms,
              perl = T)) |>
  select(all_disease_terms, l2_all_disease_terms) |>
  distinct() |>
  head()

7.5 Inguinal, femoral, and abdominal hernia

hernia_terms <- c("inguinal hernia",
                  "femoral hernia",
                  "abdominal hernia",
                  "hernia of the abdominal wall",
                  "hiatus hernia")

gwas_study_info = gwas_study_info |>
 mutate(l2_all_disease_terms  = 
          stringr::str_replace_all(l2_all_disease_terms,
                                  pattern = vec_to_grep_pattern(hernia_terms),
                                   "inguinal femoral and abdominal hernia"
                          )  
        )

gwas_study_info |>
 filter(grepl(vec_to_grep_pattern("inguinal femoral and abdominal hernia"), 
              l2_all_disease_terms,
              perl = T)) |>
  select(all_disease_terms, l2_all_disease_terms) |>
  distinct() |>
  head()

7.6 Inflammatory bowel disease

ibd_terms <- c("crohns disease",
               "ulcerative colitis",
               "inflammatory bowel disease",
               "enteritis",
               "gastroenteritis")

gwas_study_info = gwas_study_info |>
 mutate(l2_all_disease_terms  = 
          stringr::str_replace_all(l2_all_disease_terms,
                                  pattern = vec_to_grep_pattern(ibd_terms),
                                   "inflammatory bowel disease"
                          )  
        )

gwas_study_info |>
  filter(grepl(vec_to_grep_pattern("inflammatory bowel disease"), 
              l2_all_disease_terms,
              perl = T)) |>
  select(all_disease_terms, l2_all_disease_terms) |>
  distinct() |>
  head()

7.7 Vascular intestinal disorders

gwas_study_info |>
 filter(grepl(vec_to_grep_pattern("vascular intestinal disorders"), 
              l2_all_disease_terms,
              perl = T)) |>
  select(all_disease_terms, l2_all_disease_terms) |>
  distinct() |>
  head()

7.8 Gallbladder and biliary diseases

gal_bile_terms = c("gallbladder disease",
                   "bile duct disorder",
                   "biliary tract disease",
                   "cholelithiasis",
                   "cholecystitis",
                   "sclerosing cholangitis",
                   "gallstones")


gwas_study_info = gwas_study_info |> 
 mutate(l2_all_disease_terms  = 
          stringr::str_replace_all(l2_all_disease_terms,
                                  pattern = vec_to_grep_pattern(gal_bile_terms),
                                   "gallbladder and biliary diseases"
                          )  
        )

gwas_study_info |>
 filter(grepl("gallbladder and biliary diseases", 
              l2_all_disease_terms,
              perl = T)) |>
  select(all_disease_terms, l2_all_disease_terms) |>
  distinct() |>
  head()

7.9 Pancreatitis

gwas_study_info |>
 filter(grepl("pancreatitis", 
              l2_all_disease_terms)) |>
  select(all_disease_terms, l2_all_disease_terms) |>
  distinct() |>
  head()

7.10 Other digestive diseases

# I84-I84.9, 
# K20-K20.9, 
# K22-K22.6, 
# K22.8-K24, 
# K31-K31.8, 
# K38-K38.2, 
# K57-K62, 
# K62.2-K62.6, 
# K62.8-K62.9, 
# K64-K64.9, 
# K66.8, K67, 
# K68, 
# K77, 
# K90-K90.9, 
# K92.8, 
# K93.8

# 579 - celiac disease

other_digestive_terms <- c("esophagitis",
                           "eosinophilic esophagitis",
                           "esophageal ulcer",
                           # "barretts esophagus",
                           "diverticulitis",
                           "celiac disease",
                           "irritable bowel syndrome",
                           "anal fissure",
                           "anal fistula",
                           #? "anal polyp"
                           "rectal prolapse",
                           "rectal abscess",
                           "hemorrhoid",
                           "peritonitis"
                           )

gwas_study_info = gwas_study_info |>
 mutate(l2_all_disease_terms  = 
          stringr::str_replace_all(l2_all_disease_terms,
                                  pattern = vec_to_grep_pattern(other_digestive_terms),
                                   "other digestive diseases"
                          )
 )


gwas_study_info |>
 filter(grepl("other digestive diseases", 
              l2_all_disease_terms)) |>
  select(all_disease_terms, l2_all_disease_terms) |>
  distinct() |>
  head()

8 Neurological disorders

8.1 Alzheimer’s disease and other dementias

dementia <- c("alzheimers disease biomarker measurement",
              "alzheimers disease neuropathologic change",
              "aids dementia",
              "dementia",
              "frontotemporal dementia",
              "lewy body dementia",
              "vascular dementia",
              "alzheimers disease",
              "neurodegenerative disease"
)

gwas_study_info = 
gwas_study_info |> 
 mutate(l2_all_disease_terms  = 
          stringr::str_replace_all(l2_all_disease_terms ,
                                  pattern = vec_to_grep_pattern(dementia),
                                   "alzheimer's disease and other dementias"
                          )  
        )

gwas_study_info |>
 filter(grepl(vec_to_grep_pattern("alzheimer's disease and other dementias"), 
              l2_all_disease_terms,
              perl = T)) |>
  select(all_disease_terms, l2_all_disease_terms) |>
  distinct() |>
  head()

8.2 Parkinson’s disease

gwas_study_info = 
gwas_study_info |> 
 mutate(l2_all_disease_terms  = 
          stringr::str_replace_all(l2_all_disease_terms ,
                                  pattern = "parkinsons disease",
                                   "parkinson's disease"
                          )  
        )

gwas_study_info |>
 filter(grepl("parkinson's disease", l2_all_disease_terms)) |>
  select(all_disease_terms, l2_all_disease_terms) |>
  distinct() |>
  head()

8.3 Idiopathic epilepsy

# Map idiopathic epilepsy subtypes to one label in a single pass
gwas_study_info =
gwas_study_info |> 
 mutate(l2_all_disease_terms  = 
          stringr::str_replace_all(l2_all_disease_terms ,
                                  pattern = "idiopathic generalized epilepsy|rolandic epilepsy|partial epilepsy",
                                   "idiopathic epilepsy"
                          )  
        )

gwas_study_info |>
 filter(grepl("idiopathic epilepsy", l2_all_disease_terms)) |>
  select(all_disease_terms, l2_all_disease_terms) |>
  distinct() |>
  head()
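Applying the three epilepsy patterns one after another gives the same result as a single alternation. This holds because no replacement label contains any of the other patterns, and the patterns cannot overlap one another. A quick base-R check on made-up strings:

```r
x <- c("rolandic epilepsy, migraine",
       "partial epilepsy",
       "idiopathic generalized epilepsy")   # made-up examples

patterns <- c("idiopathic generalized epilepsy",
              "rolandic epilepsy",
              "partial epilepsy")

# Sequential replacement, one pattern at a time
sequential <- x
for (p in patterns) {
  sequential <- gsub(p, "idiopathic epilepsy", sequential)
}

# Single pass with an alternation of the same patterns
combined <- gsub(paste(patterns, collapse = "|"), "idiopathic epilepsy", x)

identical(sequential, combined)
# returns TRUE
```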

8.4 Multiple sclerosis

gwas_study_info |>
 filter(grepl("multiple sclerosis", l2_all_disease_terms)) |>
  select(all_disease_terms, l2_all_disease_terms) |>
  distinct() |>
  head()

8.5 Motor neuron disease

motor_neuron_disease_terms <- c("anterior horn cell disease",
                                "anterior horn disorder",
                                "amyotrophic lateral sclerosis",
                              "motor neuron disease"
)

gwas_study_info = gwas_study_info |>
 mutate(l2_all_disease_terms  = 
          stringr::str_replace_all(l2_all_disease_terms,
                                  pattern = vec_to_grep_pattern(motor_neuron_disease_terms),
                                   "motor neuron disease"
                          )  
        )

gwas_study_info |>
 filter(grepl(
      vec_to_grep_pattern("motor neuron disease"), 
      l2_all_disease_terms,
      perl = T)) |>
  select(all_disease_terms, l2_all_disease_terms) |>
  distinct() |>
  head()

8.6 Headache disorders

headache_terms <- c("headache disorder",
                    "cluster headache",
                    "migraine",
                    "headache"
                    )



gwas_study_info = gwas_study_info |> 
 mutate(l2_all_disease_terms  = 
          stringr::str_replace_all(l2_all_disease_terms,
                                  #pattern = "\\bheadache disorder\\b|cluster headache\\b|migraine\\b",
                                  pattern = vec_to_grep_pattern(headache_terms),
                                   "headache disorders"
                          )  
        )

gwas_study_info |>
 filter(grepl("(?<=^|, )headache disorders", 
              l2_all_disease_terms, 
              perl = T)) |>
  select(all_disease_terms, l2_all_disease_terms) |>
  distinct() |>
  head()

8.7 Other neurological disorders

other_neuro_terms <- c("huntington disease",
                       "hereditary ataxia",
                       "torsion dystonia",
                       "x-linked dystonia-parkinsonism",
                       "isolated dystonia",
                       "limb dystonia",
                       "cervical dystonia",
                       "myoclonus",
                       "restless legs syndrome",
                        "chronic inflammatory demyelinating polyneuropathy",
                       "demyelinating disease of central nervous system",
                       "myasthenia gravis",
                       "complex regional pain syndrome",
                       "acute transverse myelitis",
                       "machado-joseph disease",
                       "degenerative disease of the spinal cord",
                       "cerebral palsy",
                       "peripheral neuropathy",
                       "essential tremor",
                       "facial nerve disease",
                       "sporadic amyotrophic lateral sclerosis"
                       )

gwas_study_info = gwas_study_info |>
 mutate(l2_all_disease_terms  = 
          stringr::str_replace_all(l2_all_disease_terms,
                                  pattern = vec_to_grep_pattern(other_neuro_terms),
                                   "other neurological disorders"
                          )  
        )

gwas_study_info |>
 filter(grepl(vec_to_grep_pattern("other neurological disorders"), 
              l2_all_disease_terms, 
              perl = T)) |>
  select(all_disease_terms, l2_all_disease_terms) |>
  distinct() |>
  head()

9 Mental disorders

9.1 Schizophrenia

gwas_study_info |>
 filter(grepl(vec_to_grep_pattern("schizophrenia"), 
              l2_all_disease_terms, 
              perl = T)) |>
  select(all_disease_terms, l2_all_disease_terms) |>
  distinct() |>
  head()

9.2 Depressive disorders

depressive_terms <- c("depressive disorder",
                      "depressive symptom",
                      "depressive episode",
                      "major depressive disorders",
                      "major depressive disorder",
                      "major depressive episode",
                      "depressive"
)


gwas_study_info = gwas_study_info |> 
 mutate(l2_all_disease_terms  = 
          stringr::str_replace_all(l2_all_disease_terms ,
                                  pattern = vec_to_grep_pattern(depressive_terms),
                                   "depressive disorders"
                          )  
        ) |>
   mutate(l2_all_disease_terms  = 
          stringr::str_replace_all(l2_all_disease_terms ,
                                  pattern = vec_to_grep_pattern("depressive disorder"),
                                   "depressive disorders"
                          )  
        ) 
  
  
gwas_study_info |>
 filter(grepl(vec_to_grep_pattern("depressive disorders"), 
              l2_all_disease_terms,
              perl = T)) |>
  select(all_disease_terms, l2_all_disease_terms) |>
  distinct() |>
  head()

9.3 Anxiety disorders

url <- "http://www.ebi.ac.uk/ols4/api/ontologies/mesh/terms/http%253A%252F%252Fid.nlm.nih.gov%252Fmesh%252FD001008/descendants"

anxiety_terms <- get_descendants(url)

anxiety_terms <- c(anxiety_terms, 
                   "obsessive-compulsive symptom measurement",
                   "obsessive-compulsive disorder",
                   "obsessive-compulsive",
                   "anxiety"
                   )


gwas_study_info = 
gwas_study_info |> 
 mutate(l2_all_disease_terms  = 
          stringr::str_replace_all(l2_all_disease_terms ,
                                  pattern = vec_to_grep_pattern(anxiety_terms),
                                   "anxiety disorders"
                          )  
        ) |>
   mutate(l2_all_disease_terms  = 
          stringr::str_replace_all(l2_all_disease_terms ,
                                  pattern = "(?<=^|, )anxiety disorder(?=,|$)|(?<=^|, ) anxiety measurement(?=,|$)",
                                   "anxiety disorders"
                          )  
        ) |>
     mutate(l2_all_disease_terms  = 
          stringr::str_replace_all(l2_all_disease_terms ,
                                  pattern = "(?<=^|, )anxiety(?=,|$)",
                                   "anxiety disorders"
                          )  
        )

gwas_study_info |>
 filter(grepl(vec_to_grep_pattern("anxiety disorders"), 
              l2_all_disease_terms,
              perl = T)) |>
  select(all_disease_terms, l2_all_disease_terms) |>
  distinct() |>
  head()

9.4 Eating disorders

gwas_study_info = gwas_study_info |> 
 mutate(l2_all_disease_terms  = 
          stringr::str_replace_all(l2_all_disease_terms ,
                                  pattern = 
                                    vec_to_grep_pattern(
                                      c("bulimia nervosa",
                                        "anorexia nervosa",
                                        "binge eating",
                                        "eating disorder"
                                      )
                                    ),
                                  "eating disorders"
                          )  
        ) |>
   mutate(l2_all_disease_terms  = 
          stringr::str_replace_all(l2_all_disease_terms ,
                                  pattern = "anorexia",
                                  "eating disorders"
                          )  
        )
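The second eating-disorder rule uses a bare `"anorexia"` pattern instead of `vec_to_grep_pattern()`. As a result, it also rewrites the word when it appears inside a longer entry. The base-R sketch below shows the difference on a hypothetical entry that is not taken from the data. It contrasts the bare pattern with a whole-term version of the same rule.

```r
entries <- c("cancer-related anorexia",   # hypothetical entry
             "anorexia, body mass index")

# Bare substring pattern: rewrites both entries
gsub("anorexia", "eating disorders", entries)
# returns "cancer-related eating disorders" "eating disorders, body mass index"

# Whole-term pattern: only rewrites "anorexia" when it is a complete entry
gsub("(?<=^|, )anorexia(?=,|$)", "eating disorders", entries, perl = TRUE)
# returns "cancer-related anorexia" "eating disorders, body mass index"
```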


gwas_study_info |>
 filter(grepl(vec_to_grep_pattern("eating disorders"), 
              l2_all_disease_terms,
              perl = T)) |>
  select(all_disease_terms, l2_all_disease_terms) |>
  distinct() |>
  head()

9.5 Autism spectrum disorders

gwas_study_info = gwas_study_info |> 
 mutate(l2_all_disease_terms  = 
          stringr::str_replace_all(l2_all_disease_terms,
                                  pattern = vec_to_grep_pattern("autism"),
                                   "autism spectrum disorders"
                          )  
        )

gwas_study_info |>
 filter(grepl(vec_to_grep_pattern("autism spectrum disorders"), 
              l2_all_disease_terms,
              perl = T)) |>
  select(all_disease_terms, l2_all_disease_terms) |>
  distinct() |>
  head()

9.6 ADHD

gwas_study_info = gwas_study_info |> 
 mutate(l2_all_disease_terms  = 
          stringr::str_replace_all(l2_all_disease_terms ,
                                  pattern = vec_to_grep_pattern("adhd"),
                                   "attention-deficit/hyperactivity disorder"
                          )  
        )

gwas_study_info |>
 filter(grepl(vec_to_grep_pattern("attention-deficit/hyperactivity disorder"), 
              l2_all_disease_terms,
              perl = T)) |>
  select(all_disease_terms, l2_all_disease_terms) |>
  distinct() |>
  head()

9.7 Conduct disorder

gwas_study_info |>
 filter(grepl(vec_to_grep_pattern("conduct disorder"), 
              l2_all_disease_terms,
              perl = T)) |>
  select(all_disease_terms, l2_all_disease_terms) |>
  distinct() |>
  head()

9.8 Idiopathic developmental intellectual disability

terms <- c("developmental disability",
           "dyslexia")

gwas_study_info = gwas_study_info |> 
 mutate(l2_all_disease_terms  = 
          stringr::str_replace_all(l2_all_disease_terms ,
                                  pattern = vec_to_grep_pattern(terms),
                                   "idiopathic developmental intellectual disability"
                          )  
        )

gwas_study_info |>
 filter(grepl(vec_to_grep_pattern("idiopathic developmental intellectual disability"), 
              l2_all_disease_terms,
              perl = T)) |>
  select(all_disease_terms, l2_all_disease_terms) |>
  distinct() |>
  head()

9.9 Other mental disorders

9.9.1 Personality disorders

url <- "http://www.ebi.ac.uk/ols4/api/ontologies/mondo/terms/http%253A%252F%252Fpurl.obolibrary.org%252Fobo%252FMONDO_0002028/descendants"

personality_disorders <- get_descendants(url)

gwas_study_info = gwas_study_info |> 
 mutate(l2_all_disease_terms  = 
          stringr::str_replace_all(l2_all_disease_terms,
                                  pattern = vec_to_grep_pattern(personality_disorders),
                                   "personality disorders"
                          )  
        )

9.9.2 Mood disorders

url <- "http://www.ebi.ac.uk/ols4/api/ontologies/efo/terms/http%253A%252F%252Fwww.ebi.ac.uk%252Fefo%252FEFO_0004247/descendants"

mood_disorders <- get_descendants(url)

gwas_study_info = gwas_study_info |> 
 mutate(l2_all_disease_terms  = 
          stringr::str_replace_all(l2_all_disease_terms,
                                  pattern = vec_to_grep_pattern(mood_disorders),
                                   "mood disorder"
                          )  
        )

9.9.3 Sleep disorders

url <- "http://www.ebi.ac.uk/ols4/api/ontologies/doid/terms/http%253A%252F%252Fpurl.obolibrary.org%252Fobo%252FDOID_535/descendants"

sleep_disorders <- get_descendants(url)

url <- "http://www.ebi.ac.uk/ols4/api/ontologies/efo/terms/http%253A%252F%252Fwww.ebi.ac.uk%252Fefo%252FEFO_0008568/descendants"

other_sleep_disorders <- get_descendants(url)

sleep_disorders <- c(sleep_disorders,
                    other_sleep_disorders)

sleep_disorders <- str_length_sort(sleep_disorders)

gwas_study_info = gwas_study_info |> 
 mutate(l2_all_disease_terms  = 
          stringr::str_replace_all(l2_all_disease_terms,
                                  pattern = vec_to_grep_pattern(sleep_disorders),
                                   "sleep disorders"
                          )  
        )
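`str_length_sort()` is defined elsewhere. Its name suggests that it orders terms by length, presumably longest first. Ordering matters for regex alternations because ICU (used by stringr) and PCRE take the first alternative that matches, so a shorter term listed first can pre-empt a longer term that starts with it. With an end anchor such as `(?=,|$)`, the engine backtracks into the longer alternative anyway, so ordering mainly matters for unanchored patterns. The base-R illustration below uses a hypothetical `sort_longest_first()`.

```r
# Hypothetical helper; the real str_length_sort() may differ.
sort_longest_first <- function(terms) {
  terms[order(nchar(terms), decreasing = TRUE)]
}

terms <- c("sleep disorder", "sleep disorders")
x <- "sleep disorders"

# PCRE tries alternatives left to right, so the shorter term wins here
gsub(paste(terms, collapse = "|"), "SLEEP", x, perl = TRUE)
# returns "SLEEPs"

gsub(paste(sort_longest_first(terms), collapse = "|"), "SLEEP", x, perl = TRUE)
# returns "SLEEP"
```

Base R's default (POSIX/TRE) engine prefers the longest match instead, which is why the example sets `perl = TRUE` to mimic the leftmost-first behaviour.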

other_mental_disorders <- c("manic or hypomanic episode",
                            "mental or behavioural disorder",
                            "mental disorder",
                            "post-traumatic stress disorder",
                            "stress-related disorder",
                            "acute stress reaction",
                            "occupation-related stress disorder",
                            "psychotic symptom",
                            "psychosis",
                            "psychiatric disorder",
                            "personality disorders",
                            "personality disorder",
                            "mood disorder",
                            "sleep disorders",
                            "sleep disorder",
                            "mixed anxiety disorders and depressive disorders",
                            "emotional symptom",
                            "dissociative disorder",
                            "hallucinations",
                            "somatoform disorder",
                            "schizoaffective disorder",
                            "phobic disorder",
                            "psychotic"
)


gwas_study_info =
gwas_study_info |> 
 mutate(l2_all_disease_terms  = 
          stringr::str_replace_all(l2_all_disease_terms ,
                                  pattern = vec_to_grep_pattern(other_mental_disorders),
                                   "other mental disorders"
                          )  
        )

gwas_study_info |>
 filter(grepl(vec_to_grep_pattern("other mental disorders"), 
              l2_all_disease_terms,
              perl = T)) |>
  select(all_disease_terms, l2_all_disease_terms) |>
  distinct() |>
  head()
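The `filter() |> head()` checks show only a few matching rows per label. One way to see how many studies end up in each level-2 label is to split the comma-separated strings and tabulate them. The base-R sketch below runs on toy input rather than the real data and counts each label at most once per study.

```r
# Toy input standing in for gwas_study_info$l2_all_disease_terms
l2 <- c("other mental disorders, sleep disorders, sleep disorders",
        "sleep disorders",
        NA,
        "schizophrenia")

# One character vector of unique labels per study (missing rows dropped)
per_study <- lapply(strsplit(l2[!is.na(l2)], ", ", fixed = TRUE), unique)
label_counts <- sort(table(unlist(per_study)), decreasing = TRUE)
label_counts
# counts: sleep disorders = 2, other mental disorders = 1, schizophrenia = 1
```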

10 Substance use disorders

10.1 Alcohol use disorders

gwas_study_info = 
gwas_study_info |> 
 mutate(l2_all_disease_terms  = 
          stringr::str_replace_all(l2_all_disease_terms,
                                  pattern = 
                                    vec_to_grep_pattern(
                                      c("alcohol-related disorders",
                                        "alcohol and nicotine codependence",
                                        "alcohol use disorder"
                                      )),
                                   "alcohol use disorders"
                          )  
        )


gwas_study_info |>
 filter(grepl(vec_to_grep_pattern("alcohol use disorders"), 
              l2_all_disease_terms,
              perl = T)) |>
  select(all_disease_terms, l2_all_disease_terms) |>
  distinct() |>
  head()

10.2 Opioid use disorders

gwas_study_info = 
gwas_study_info |> 
 mutate(l2_all_disease_terms  = 
          stringr::str_replace_all(l2_all_disease_terms ,
                                  pattern = vec_to_grep_pattern(
                                    c("opioid-related disorders",
                                      "opioid dependence",
                                      "opioid use disorder"
                                      )
                                    ),
                                   "opioid use disorders"
                          )  
        )

gwas_study_info |>
 filter(grepl(vec_to_grep_pattern("opioid use disorders"), 
              l2_all_disease_terms,
              perl = T)) |>
  select(all_disease_terms, l2_all_disease_terms) |>
  distinct() |>
  head()

10.3 Cocaine use disorders

gwas_study_info = gwas_study_info |> 
 mutate(l2_all_disease_terms  = 
          stringr::str_replace_all(l2_all_disease_terms ,
                                  pattern = vec_to_grep_pattern("cocaine-related disorders"),
                                   "cocaine use disorders"
                          )  
        )

gwas_study_info |>
 filter(grepl(vec_to_grep_pattern("cocaine use disorders"), 
              l2_all_disease_terms,
              perl = T)) |>
  select(all_disease_terms, l2_all_disease_terms) |>
  distinct() |>
  head()

10.4 Amphetamine use disorders

gwas_study_info = 
gwas_study_info |> 
 mutate(l2_all_disease_terms  = 
          stringr::str_replace_all(l2_all_disease_terms ,
                                  pattern = "methamphetamine",
                                   "amphetamine"
                          )  
        )

gwas_study_info |>
 filter(grepl("amphetamine", 
              l2_all_disease_terms)) |>
  select(all_disease_terms, l2_all_disease_terms) |>
  distinct() |>
  head()

10.5 Cannabis use disorders

gwas_study_info = 
gwas_study_info |> 
 mutate(l2_all_disease_terms  = 
          stringr::str_replace_all(l2_all_disease_terms ,
                                  pattern = vec_to_grep_pattern("cannabis dependence"),
                                   "cannabis use disorders"
                          )  
        )

gwas_study_info |>
 filter(grepl(vec_to_grep_pattern("cannabis use disorders"), 
              l2_all_disease_terms,
              perl = T)) |>
  select(all_disease_terms, l2_all_disease_terms) |>
  distinct() |>
  head()

10.6 Other drug use disorders

other_drug_use_terms <- c("heroin dependence",
                          "drug dependence",
                          "nicotine dependence",
                          "substance abuse",
                          "drug misuse",
                          "nicotine-related disorders"
                          )

gwas_study_info = 
gwas_study_info |> 
 mutate(l2_all_disease_terms  = 
          stringr::str_replace_all(l2_all_disease_terms ,
                                  pattern = vec_to_grep_pattern(other_drug_use_terms),
                                   "other drug use disorders"
                          )  
        )

gwas_study_info |>
 filter(grepl(vec_to_grep_pattern("other drug use disorders"), 
              l2_all_disease_terms,
              perl = T)) |>
  select(all_disease_terms, l2_all_disease_terms) |>
  distinct() |>
  head()
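Several source terms in one study can collapse into the same category, for example `drug dependence` and `nicotine-related disorders`. When that happens, the recoded string can contain the same label twice. Here is a minimal base-R sketch for removing such repeats. It assumes the ", " separator, and `dedupe_terms()` is a hypothetical helper that is not part of the original analysis.

```r
# Hypothetical helper (not in the original analysis): drop repeated labels
# within each ", "-separated disease string, keeping first occurrences.
dedupe_terms <- function(x, sep = ", ") {
  out <- vapply(strsplit(x, sep, fixed = TRUE),
                function(terms) paste(unique(trimws(terms)), collapse = sep),
                character(1))
  out[is.na(x)] <- NA_character_  # keep missing values missing
  out
}

dedupe_terms(c("other drug use disorders, other drug use disorders",
               "alcohol use disorders, other drug use disorders",
               NA))
```

Inside the pipeline this could be applied as `mutate(l2_all_disease_terms = dedupe_terms(l2_all_disease_terms))`.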

11 Diabetes and kidney diseases

11.1 Diabetes mellitus type 1

gwas_study_info =
gwas_study_info |> 
 mutate(l2_all_disease_terms  = 
          stringr::str_replace_all(l2_all_disease_terms ,
                                  pattern = vec_to_grep_pattern("type 1 diabetes mellitus"),
                                   "diabetes mellitus type 1"
                          )  
        )

gwas_study_info |>
 filter(grepl(vec_to_grep_pattern("diabetes mellitus type 1"), 
              l2_all_disease_terms,
              perl = T)) |>
  select(all_disease_terms, l2_all_disease_terms) |>
  distinct() |>
  head()

11.2 Diabetes mellitus type 2

gwas_study_info =
gwas_study_info |> 
 mutate(l2_all_disease_terms  = 
          stringr::str_replace_all(l2_all_disease_terms ,
                                  pattern = vec_to_grep_pattern("type 2 diabetes mellitus"),
                                   "diabetes mellitus type 2"
                          )  
        )

gwas_study_info |>
 filter(grepl("diabetes mellitus type 2", 
              l2_all_disease_terms)) |>
  select(all_disease_terms, l2_all_disease_terms) |>
  distinct() |>
  head()

11.3 Chronic kidney disease

chronic_kidney_disease <- c("cystic kidney disease")

gwas_study_info = gwas_study_info |> 
 mutate(l2_all_disease_terms  = 
          stringr::str_replace_all(l2_all_disease_terms ,
                                  pattern = vec_to_grep_pattern(chronic_kidney_disease),
                                   "chronic kidney disease"
                          )  
        )

gwas_study_info |>
 filter(grepl(vec_to_grep_pattern("chronic kidney disease"), 
              l2_all_disease_terms,
              perl = T)) |>
  select(all_disease_terms, l2_all_disease_terms) |>
  distinct() |>
  head()

11.4 Acute glomerulonephritis

glomerulonephritis_terms <- c("chronic glomerulonephritis",
                              "membranous glomerulonephritis",
                              "proliferative glomerulonephritis")

gwas_study_info = gwas_study_info |>
  mutate(l2_all_disease_terms  = 
          stringr::str_replace_all(l2_all_disease_terms ,
                                  pattern = vec_to_grep_pattern(glomerulonephritis_terms),
                                   "glomerulonephritis"
                          )  
        )

gwas_study_info |>
 filter(grepl(vec_to_grep_pattern("acute glomerulonephritis"), 
              l2_all_disease_terms,
              perl = T)) |>
  select(all_disease_terms, l2_all_disease_terms) |>
  distinct() |>
  head()

12 Skin and subcutaneous diseases

12.1 Dermatitis

gwas_study_info |>
 filter(grepl(vec_to_grep_pattern("dermatitis"), 
              l2_all_disease_terms,
              perl = T)) |>
  select(all_disease_terms, l2_all_disease_terms) |>
  distinct() |>
  head()

12.2 Psoriasis

gwas_study_info |>
 filter(grepl(vec_to_grep_pattern("psoriasis"),
              l2_all_disease_terms,
              perl = T)) |>
  select(all_disease_terms, l2_all_disease_terms) |>
  distinct() |>
  head()

12.3 Bacterial skin diseases

bacterial_skin_disease_terms <- c("staphylococcal skin infections",
                                  "skin and soft tissue staphylococcus aureus infection",
                                  "cellulitis"
                                  # "skin infection"
                                  )

gwas_study_info = gwas_study_info |> 
 mutate(l2_all_disease_terms  = 
          stringr::str_replace_all(l2_all_disease_terms ,
                                  pattern = vec_to_grep_pattern(bacterial_skin_disease_terms),
                                   "bacterial skin diseases"
                          )  
        )

gwas_study_info |>
 filter(grepl(vec_to_grep_pattern("bacterial skin diseases"), 
              l2_all_disease_terms,
              perl = T)) |>
  select(all_disease_terms, l2_all_disease_terms) |>
  distinct() |>
  head()

12.4 Scabies

gwas_study_info |>
 filter(grepl(vec_to_grep_pattern("scabies"), 
              l2_all_disease_terms,
              perl = T)) |>
  select(all_disease_terms, l2_all_disease_terms) |>
  distinct() |>
  head()

12.5 Fungal skin diseases

fungal_skin_disease_terms <- c("tinea",
                               "dermatomycosis")

gwas_study_info = gwas_study_info |> 
 mutate(l2_all_disease_terms  = 
          stringr::str_replace_all(l2_all_disease_terms ,
                                  pattern = vec_to_grep_pattern(fungal_skin_disease_terms),
                                   "fungal skin diseases"
                          )  
        )

gwas_study_info |>
 filter(grepl("fungal skin diseases", l2_all_disease_terms)) |>
  select(all_disease_terms, l2_all_disease_terms) |>
  distinct() |>
  head()

12.6 Viral skin diseases

gwas_study_info |>
 filter(grepl(vec_to_grep_pattern("viral skin diseases"), 
              l2_all_disease_terms,
              perl = T)) |>
  select(all_disease_terms, l2_all_disease_terms) |>
  distinct() |>
  head()

12.7 Acne vulgaris

acne_terms <- c("sapho syndrome")

gwas_study_info = 
  gwas_study_info |>
 mutate(l2_all_disease_terms  = 
          stringr::str_replace_all(l2_all_disease_terms ,
                                  pattern = vec_to_grep_pattern(acne_terms),
                                   "acne vulgaris"
                          )  
        )

gwas_study_info = gwas_study_info |> 
 mutate(l2_all_disease_terms  = 
          stringr::str_replace_all(l2_all_disease_terms ,
                                  pattern = vec_to_grep_pattern("acne"),
                                   "acne vulgaris"
                          )  
        )

gwas_study_info |>
 filter(grepl(vec_to_grep_pattern("acne vulgaris"), 
              l2_all_disease_terms,
              perl = T)) |>
  select(all_disease_terms, l2_all_disease_terms) |>
  distinct() |>
  head()

12.8 Pruritus

# Also map prurigo to pruritus
gwas_study_info = gwas_study_info |> 
 mutate(l2_all_disease_terms  = 
          stringr::str_replace_all(l2_all_disease_terms ,
                                  pattern = vec_to_grep_pattern("prurigo"),
                                   "pruritus"
                          )  
        )


gwas_study_info |>
 filter(grepl(vec_to_grep_pattern("pruritus"), 
              l2_all_disease_terms,
              perl = T)) |>
  select(all_disease_terms, l2_all_disease_terms) |>
  distinct() |>
  head()

12.9 Urticaria

gwas_study_info |>
 filter(grepl(vec_to_grep_pattern("urticaria"), 
              l2_all_disease_terms,
              perl = T)) |>
  select(all_disease_terms, l2_all_disease_terms) |>
  distinct() |>
  head()

12.10 Decubitus ulcer

gwas_study_info |>
 filter(grepl(vec_to_grep_pattern("decubitus ulcer"), 
              l2_all_disease_terms,
              perl = T)) |>
  select(all_disease_terms, l2_all_disease_terms) |>
  distinct() |>
  head()

12.11 Other skin and subcutaneous diseases

other_skin_disease_terms <- c("sebaceous gland disease",
                              "rosacea",
                              "erythematosquamous dermatosis",
                              "dry skin",
                              "skin tags",
                              "dermatochalasis",
                              "epidermal thickening",
                              "epidermal inclusion cyst",
                              "cutaneous lupus erythematosus",
                              "androgenetic alopecia",
                              "chemotherapy-induced alopecia",
                              "cutaneous leishmaniasis",
                              "acanthosis nigricans",
                              "stevens-johnson syndrome",
                              "keloid")

gwas_study_info = gwas_study_info |> 
 mutate(l2_all_disease_terms  = 
          stringr::str_replace_all(l2_all_disease_terms ,
                                  pattern = vec_to_grep_pattern(other_skin_disease_terms),
                                   "other skin and subcutaneous diseases"
                          )  
        )

gwas_study_info |>
 filter(grepl(vec_to_grep_pattern("other skin and subcutaneous diseases"), 
              l2_all_disease_terms,
              perl = T)) |>
  select(all_disease_terms, l2_all_disease_terms) |>
  distinct() |>
  head()
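A mapping term that is misspelled, or that never occurs in the data, is silently ignored by `str_replace_all()`. Listing the terms that never appear among the observed disease labels can catch these cases. The sketch below uses a hypothetical `unused_terms()` helper, with toy strings standing in for `gwas_study_info$l2_all_disease_terms`.

```r
# Hypothetical QC helper: which mapping terms never appear as a complete
# term in the observed ", "-separated disease strings?
unused_terms <- function(terms, observed, sep = ", ") {
  seen <- unique(trimws(unlist(strsplit(observed, sep, fixed = TRUE))))
  setdiff(terms, seen)
}

# Toy data, not taken from the GWAS Catalog
observed <- c("rosacea, keloid", "dry skin")
unused_terms(c("rosacea", "keloid", "skin tags"), observed)
```

Run the check before the corresponding recoding step, for example `unused_terms(other_skin_disease_terms, gwas_study_info$l2_all_disease_terms)`. Once the replacement has run, the matched terms are no longer present in the strings.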

13 Sense organ diseases

13.1 Blindness and vision loss

vision_loss_terms <- c("blindness",
                       "color vision disorder",
                       "vision disorder",
                       "visuospatial impairment",
                       "pathological blindness and vision loss",
                       "visual impairment",
                       "myopia",
                       "refractive error",
                       "retinopathy",
                       "hyperopia",
                       "astigmatism",
                       "corneal astigmatism",
                       "presbyopia",
                       "anisometropia",
                       "esotropia",
                       "non-accomodative esotropia",
                       "accommodative esotropia",
                       "abnormality of refraction",
                       "abnormality of vision",
                       "age-related macular degeneration",
                       "degeneration of macula and posterior pole",
                       "age-related cataract",
                       "retinal degeneration",
                       "retinal drusen",
                       "cataract")

gwas_study_info = gwas_study_info |> 
 mutate(l2_all_disease_terms  = 
          stringr::str_replace_all(l2_all_disease_terms ,
                                  pattern = vec_to_grep_pattern(vision_loss_terms),
                                   "blindness and vision loss"
                          )  
        )

gwas_study_info |>
 filter(grepl(vec_to_grep_pattern("blindness and vision loss"), 
              l2_all_disease_terms,
              perl = T)) |>
  select(all_disease_terms, l2_all_disease_terms) |>
  distinct() |>
  head()

13.3 Other sense organ diseases

other_sense_terms <- c("abnormality of the sense of smell",
                       "disturbances of sensation of smell and taste",
                       "tinnitus",
                       "disturbance of skin sensation",
                       "disturbances to senses",
                       "vogt-koyanagi-harada disease",
                       "pathological myopia",
                       "lacrimal apparatus disease",
                       "keratoconus"
                       )

gwas_study_info =
gwas_study_info |> 
 mutate(l2_all_disease_terms  = 
          stringr::str_replace_all(l2_all_disease_terms ,
                                  pattern = vec_to_grep_pattern(other_sense_terms),
                                   "other sense organ diseases"
                          )  
        )

gwas_study_info |>
 filter(grepl(vec_to_grep_pattern("other sense organ diseases"), 
              l2_all_disease_terms,
              perl = T)) |>
  select(all_disease_terms, l2_all_disease_terms) |>
  distinct() |>
  head()

14 Musculoskeletal disorders

14.1 Rheumatoid arthritis

# add juvenile idiopathic arthritis to rheumatoid arthritis
gwas_study_info = gwas_study_info |> 
 mutate(l2_all_disease_terms  = 
          stringr::str_replace_all(l2_all_disease_terms ,
                                  pattern = vec_to_grep_pattern("juvenile idiopathic arthritis"),
                                   "rheumatoid arthritis"
                          )  
        )

# rheumatoid factor-negative juvenile idiopathic arthritis
gwas_study_info = gwas_study_info |> 
 mutate(l2_all_disease_terms  = 
          stringr::str_replace_all(l2_all_disease_terms ,
                                  pattern = vec_to_grep_pattern("rheumatoid factor-negative juvenile idiopathic arthritis"),
                                   "rheumatoid arthritis"
                          )  
        )

gwas_study_info |>
 filter(grepl(vec_to_grep_pattern("rheumatoid arthritis"), 
              l2_all_disease_terms,
              perl = T)) |>
  select(all_disease_terms, l2_all_disease_terms) |>
  distinct() |>
  head()

14.2 Osteoarthritis

gwas_study_info |>
 filter(grepl(vec_to_grep_pattern("osteoarthritis"), 
              l2_all_disease_terms,
              perl = T)) |>
  select(all_disease_terms, l2_all_disease_terms) |>
  distinct() |>
  head()

14.3 Low back pain

gwas_study_info |>
 filter(grepl(vec_to_grep_pattern("low back pain"), 
              l2_all_disease_terms,
              perl = T)) |>
  select(all_disease_terms, l2_all_disease_terms) |>
  distinct() |>
  head()

14.4 Neck pain

gwas_study_info |>
 filter(grepl(vec_to_grep_pattern("neck pain"), 
              l2_all_disease_terms,
              perl = T)) |>
  select(all_disease_terms, l2_all_disease_terms) |>
  distinct() |>
  head()

14.5 Gout

gwas_study_info |>
 filter(grepl(vec_to_grep_pattern("gout"), 
              l2_all_disease_terms,
              perl = T)) |>
  select(all_disease_terms, l2_all_disease_terms) |>
  distinct() |>
  head()

14.6 Other musculoskeletal disorders

other_musculo_terms <- c("infectious arthritis",
                         "scoliosis",
                         "fasciitis",
                         "plica syndrome",
                         "panniculitis",
                         "sciatica",
                         "polymyalgia rheumatica",
                         "acquired musculoskeletal deformity",
                         "pyogenic arthritis",
                         "reactive arthritis",
                         "acroiliac arthritis",
                         "arthritis",
                         "systemic lupus erythematosus",
                         "musculoskeletal system disease",
                         "spondylosis",
                         "osteonecrosis",
                         "myalgia"
                         )


gwas_study_info = gwas_study_info |>
 mutate(l2_all_disease_terms  = 
          stringr::str_replace_all(l2_all_disease_terms ,
                                  pattern = vec_to_grep_pattern(other_musculo_terms),
                                   "other musculoskeletal disorders"
                          )  
        )


gwas_study_info |>
 filter(grepl(vec_to_grep_pattern("other musculoskeletal disorders"), 
              l2_all_disease_terms,
              perl = T)) |>
  select(all_disease_terms, l2_all_disease_terms) |>
  distinct() |>
  head()

15 Other non-communicable diseases

15.1 Congenital birth defects

15.1.1 Neural tube defects

15.1.2 Congenital heart defects

15.1.3 Orofacial clefts

gwas_study_info = gwas_study_info |>
  mutate(l2_all_disease_terms = 
           stringr::str_replace_all(
             l2_all_disease_terms,
             vec_to_grep_pattern("orofacial cleft"),
             "orofacial clefts"
           )
  )

gwas_study_info |>
 filter(grepl(vec_to_grep_pattern("orofacial clefts"), 
              l2_all_disease_terms,
              perl = T)) |>
  select(all_disease_terms, l2_all_disease_terms) |>
  distinct() |>
  head()

15.1.4 Down syndrome

gwas_study_info |>
 filter(grepl(vec_to_grep_pattern("down syndrome"), 
              l2_all_disease_terms,
              perl = T)) |>
  select(all_disease_terms, l2_all_disease_terms) |>
  distinct() |>
  head()

15.1.5 Klinefelter syndrome

gwas_study_info |>
 filter(grepl(vec_to_grep_pattern("klinefelter syndrome"), 
              l2_all_disease_terms,
              perl = T)) |>
  select(all_disease_terms, l2_all_disease_terms) |>
  distinct() |>
  head()
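
# Load the CDC list of valid ICD-10 codes (FY2025) and standardize the column
# names: lowercase, with spaces replaced by underscores
icd10 <- readxl::read_xlsx(here::here("data/icd/cdc_valid_icd10_Sep_23_2025.xlsx"))

icd10 <- icd10 |>
  rename_with(~ tolower(gsub(" ", "_", .x)))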

15.1.6 Other chromosomal abnormalities

# Q87-Q87.8, Q91-Q93.9, Q95-Q95.9, Q97-Q97.9, Q99-Q99.8

other_chrom_abn_icd10 <-
  c(paste0("Q", seq(87, 87.8, by = 0.1) * 10),
    paste0("Q", seq(91, 93.9, by = 0.1) * 10),
    paste0("Q", seq(95, 95.9, by = 0.1) * 10),
    paste0("Q", seq(97, 97.9, by = 0.1) * 10),
    paste0("Q", seq(99, 99.8, by = 0.1) * 10)
    )

other_chrom_abn_terms <-
  icd10 |>
  filter(grepl(paste0(other_chrom_abn_icd10, collapse = "|"), code)) |>
  pull(`short_description_(valid_icd-10_fy2025)`) |>
  unique() |>
  tolower()

other_chrom_abn <- c("marfan syndrome",
                     "fragile x syndrome",
                     "22q11.2 deletion syndrome",
                     "chromosomal disorder")

url <- "http://www.ebi.ac.uk/ols4/api/ontologies/mondo/terms/http%253A%252F%252Fpurl.obolibrary.org%252Fobo%252FMONDO_0019040/descendants"

chromosomal_disorders <- get_descendants(url)

other_chrom_abn <- c(other_chrom_abn, 
                     chromosomal_disorders)

gwas_study_info = gwas_study_info |>
  mutate(l2_all_disease_terms  = 
         stringr::str_replace_all(l2_all_disease_terms,
                          vec_to_grep_pattern(c(other_chrom_abn_terms, other_chrom_abn)),
                          "other chromosomal abnormalities"
         )
  )
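The ICD-10 ranges in this section are expanded with `seq(..., by = 0.1) * 10`. That approach relies on floating-point arithmetic, and it is easy to pass an endpoint in the wrong units. One alternative is to work in integer tenths. `icd10_prefixes()` below is a hypothetical helper. It assumes the CDC `code` column stores codes without the dot (e.g. `Q870`), which is the same assumption the existing patterns make.

```r
# Hypothetical helper: expand an ICD-10 range given in ICD units
# (e.g. Q87 to Q87.8) into dot-free 4-character prefixes (Q870 ... Q878).
icd10_prefixes <- function(letter, from, to) {
  # Round to integer tenths so no floating-point residue reaches the strings
  tenths <- seq(as.integer(round(from * 10)), as.integer(round(to * 10)))
  sprintf("%s%03d", letter, tenths)
}

icd10_prefixes("Q", 87, 87.8)
icd10_prefixes("N", 15.1, 16.8)
```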

15.1.7 Congenital heart abnormalities

# Q20-Q28.9
congen_heart_icd10 <-
  paste0("Q", seq(20, 28.9, by = 0.1) * 10) 


congen_terms <-
  icd10 |>
  filter(grepl(paste0(congen_heart_icd10, collapse = "|"), code)) |>
  pull(`short_description_(valid_icd-10_fy2025)`) |>
  unique()

gwas_study_info = gwas_study_info |>
  mutate(l2_all_disease_terms  = 
         stringr::str_replace_all(l2_all_disease_terms,
                          "congenital anomaly of cardiovascular system",
                          "congenital heart abnormalities"
         )
  )

15.1.8 Congenital musculoskeletal and limb anomalies

# Q65-Q79, Q79.6-Q79.9
congen_musculo_icd10 <-
  c(paste0("Q", seq(65, 79, by = 0.1) * 10),
    paste0("Q", seq(79.6, 79.9, by = 0.1) * 10)
    )

congen_musculo_terms <-
  icd10 |>
  filter(grepl(paste0(congen_musculo_icd10, collapse = "|"), code)) |>
  pull(`short_description_(valid_icd-10_fy2025)`) |>
  unique() |>
  tolower()


congen_muscle <- c("osteochondrodysplasia",
                   "congenital deformities of limbs",
                   "lower limb asymmetry",
                   "familial clubfoot with or without associated lower limb anomalies",
                   "abnormality of limbs",
                   "abnormal foot morphology",
                   congen_musculo_terms
                   )


gwas_study_info = gwas_study_info |>
  mutate(l2_all_disease_terms  = 
         stringr::str_replace_all(l2_all_disease_terms,
                          vec_to_grep_pattern(congen_muscle),
                          "congenital musculoskeletal and limb anomalies"
         )
  )

gwas_study_info |>
 filter(grepl(vec_to_grep_pattern("congenital musculoskeletal and limb anomalies"), 
              l2_all_disease_terms,
              perl = T)) |>
  select(all_disease_terms, l2_all_disease_terms) |>
  distinct() |>
  head()

15.1.9 Urogenital congenital anomalies

# P96.0, Q50-Q60.6, Q63-Q64.9

urogenital_icd10 <-
  c("P960",
    paste0("Q", seq(50, 60.6, by = 0.1) * 10),
    paste0("Q", seq(63, 64.9, by = 0.1) * 10)
    )

urogenital_terms <-
  icd10 |>
  filter(grepl(paste0(urogenital_icd10, collapse = "|"), code)) |>
  pull(`short_description_(valid_icd-10_fy2025)`) |>
  unique()


urogenital_terms <- 
  c("abnormal morphology of female internal genitalia",
    "abnormality of the genital system",
    "functional abnormality of the bladder",
    "congenital anomaly of kidney and urinary tract",
    "bladder exstrophy", # ICD9:753.5
    tolower(urogenital_terms))

gwas_study_info = gwas_study_info |>
  mutate(l2_all_disease_terms  = 
         stringr::str_replace_all(l2_all_disease_terms,
                          vec_to_grep_pattern(urogenital_terms),
                          "urogenital congenital anomalies"
         )
  )


gwas_study_info |>
 filter(grepl(vec_to_grep_pattern("urogenital congenital anomalies"), 
              l2_all_disease_terms,
              perl = T)) |>
  select(all_disease_terms, l2_all_disease_terms) |>
  distinct() |>
  head()

15.1.10 Digestive congenital anomalies

# Q38-Q45.9, Q79.0-Q79.5

digestive_icd10 <-
  c(paste0("Q", seq(38, 45.9, by = 0.1) * 10),
    paste0("Q", seq(79, 79.5, by = 0.1) * 10)
    )

digestive_terms <-
  icd10 |>
  filter(grepl(paste0(digestive_icd10, collapse = "|"), code)) |>
  pull(`short_description_(valid_icd-10_fy2025)`) |>
  unique()

digestive_terms <- tolower(digestive_terms)
digestive_terms <- stringr::str_remove_all(digestive_terms, ", unspecified$")

digestive_terms <- c(
  "hirschsprung disease",
  digestive_terms
)

gwas_study_info = gwas_study_info |>
  mutate(l2_all_disease_terms  = 
         stringr::str_replace_all(l2_all_disease_terms,
                          vec_to_grep_pattern(tolower(digestive_terms)),
                          "digestive congenital anomalies"
         )
  )




gwas_study_info |>
 filter(grepl(vec_to_grep_pattern("digestive congenital anomalies"), 
              l2_all_disease_terms,
              perl = T)) |>
  select(all_disease_terms, l2_all_disease_terms) |>
  distinct() |>
  head()
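ICD-10 short descriptions are sentence-cased and often end in qualifiers such as ", unspecified" or ", site not specified". The GWAS terms, by contrast, are lowercase. The digestive and urinary sections handle this with `tolower()` and `str_remove_all()`. The hypothetical base-R helper below could apply the same normalization wherever ICD descriptions are pulled. The example strings are illustrative only.

```r
# Hypothetical helper: lowercase ICD-10 short descriptions and drop trailing
# ", unspecified" / ", site not specified" qualifiers.
normalize_icd_desc <- function(x) {
  x <- tolower(x)
  sub(",[[:space:]]*(unspecified|site not specified)$", "", x)
}

# Illustrative descriptions
normalize_icd_desc(c("Hirschsprung's disease",
                     "Urinary tract infection, site not specified",
                     "Congenital malformation of intestine, unspecified"))
```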

15.1.11 Other congenital birth defects

other_congen <- c("abnormality of the respiratory system")

15.2 Urinary diseases and male infertility

15.2.1 Urinary tract infections and interstitial nephritis

# N10-N12.9, N13.6, N15, N15.1-N16.8, N30-N30.3, N30.8-N30.9, N34-N34.3, N39.0-N39.2

# Each seq() endpoint is written in ICD units (e.g. N15.1 -> 15.1) and then
# scaled by 10 to give dot-free code prefixes (N151)
uti_icd10 <-
  c(paste0("N", seq(10, 12.9, by = 0.1) * 10),
    "N136",
    paste0("N", seq(15.1, 16.8, by = 0.1) * 10),
    paste0("N", seq(30, 30.3, by = 0.1) * 10),
    paste0("N", seq(30.8, 30.9, by = 0.1) * 10),
    paste0("N", seq(34, 34.3, by = 0.1) * 10),
    paste0("N", seq(39.0, 39.2, by = 0.1) * 10)
    )


uti_terms <-
  icd10 |>
  filter(grepl(paste0(uti_icd10, collapse = "|"), code)) |>
  pull(`short_description_(valid_icd-10_fy2025)`) |>
  unique() |>
  tolower()

uti_terms <-
  c("acute cystitis",
    "chronic cystitis",
    "chronic interstitial cystitis",
    "urinary tract infection",
    "interstitial nephritis",
    "pyelonephritis",
    "urethritis",
    "cystitis",
    uti_terms
    )

gwas_study_info = gwas_study_info |>
  mutate(l2_all_disease_terms = 
           stringr::str_replace_all(
             l2_all_disease_terms,
             vec_to_grep_pattern(uti_terms),
             "urinary tract infections and interstitial nephritis"
           )
  )
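Range typos can produce prefixes that never match anything. Passing `151` instead of `15.1` to `seq()`, for instance, yields prefixes like `N1510`. A quick way to catch this is to list the generated prefixes that match no code at all. The sketch below uses a hypothetical `unmatched_prefixes()` helper and toy codes. With the real data the call would be `unmatched_prefixes(uti_icd10, icd10$code)`. Some prefixes inside a range may legitimately have no valid FY2025 code, so each hit needs a manual look.

```r
# Hypothetical QC helper: return generated prefixes that match no code.
unmatched_prefixes <- function(prefixes, codes) {
  hits <- vapply(prefixes, function(p) any(startsWith(codes, p)), logical(1))
  prefixes[!hits]
}

# Toy codes, not read from the CDC file
codes <- c("N151", "N159", "N160", "N390", "N3941")
unmatched_prefixes(c("N151", "N1510", "N390"), codes)
```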

15.2.2 Urolithiasis

# N20-N23.0

urolithiasis_icd10 <-
  paste0("N", seq(20, 23.0, by = 0.1) * 10)

urolithiasis_terms <-
  icd10 |>
  filter(grepl(paste0(urolithiasis_icd10, collapse = "|"), code)) |>
  pull(`short_description_(valid_icd-10_fy2025)`) |>
  unique()

urolithiasis_terms <- tolower(urolithiasis_terms)
urolithiasis_terms <- stringr::str_remove_all(urolithiasis_terms, ", unspecified$")

urolithiasis_terms <-
  c("lower urinary tract calculus",
    "urinary calculus",
    "bladder calculus",
    "calcium oxalate nephrolithiasis",
    "calcium phosphate nephrolithiasis",
    "uric acid nephrolithiasis",
    "nephrolithiasis",
    "renal colic", #icd-9 788.0
    "ureterolithiasis",
    urolithiasis_terms
    )

gwas_study_info = gwas_study_info |>
  mutate(l2_all_disease_terms = 
           stringr::str_replace_all(
             l2_all_disease_terms,
             vec_to_grep_pattern(urolithiasis_terms),
             "urolithiasis"
           )
  )

15.2.3 Benign prostatic hyperplasia

15.2.4 Male infertility

gwas_study_info |>
  filter(grepl(vec_to_grep_pattern("male infertility"), 
              l2_all_disease_terms,
              perl = T)) |>
  select(all_disease_terms, l2_all_disease_terms) |>
  distinct()

15.2.5 Other urinary diseases

# N25-N28.1, N29-N29.8, N31-N32.0, N32.3-N32.4, N36-N36.9, N39, N41-N41.9, N44-N44.0, N45-N45.9, N49-N49.9

urine_icd10 <- 
  c(paste0("N", seq(25, 28.1, by = 0.1) * 10),
    paste0("N", seq(29, 29.8, by = 0.1) * 10),
    paste0("N", seq(31, 32.0, by = 0.1) * 10),
    paste0("N", seq(32.3, 32.4, by = 0.1) * 10),
    paste0("N", seq(36, 36.9, by = 0.1) * 10),
    "N39",
    paste0("N", seq(41, 41.9, by = 0.1) * 10),
    paste0("N", seq(44, 44.0, by = 0.1) * 10),
    paste0("N", seq(45, 45.9, by = 0.1) * 10),
    paste0("N", seq(49, 49.9, by = 0.1) * 10)
    )

urine_terms <-
  icd10 |>
  filter(grepl(paste0(urine_icd10, collapse = "|"), code)) |>
  pull(`short_description_(valid_icd-10_fy2025)`) |>
  unique() |>
  tolower()

urine_terms <- stringr::str_remove_all(urine_terms, ", unspecified$")
urine_terms <- stringr::str_remove_all(urine_terms, ", site not specified$")


urinary_diseases_terms <- c("urinary incontinence",
                            "stress urinary incontinence",
                            "urgency urinary incontinence",
                            "urinary system disease",
                            "bladder neck obstruction",
                            "urinary tract obstruction",
                            "urethral disease",
                            "urethral syndrome",
                            "uterine inflammatory disease",
                            "enuresis",
                            "neurogenic bladder", # icd9 596.54
                            "bladder diverticulum", # icd9 596.3
                            "priapism",
                            "hydronephrosis",
                            urine_terms
                            )

urinary_diseases_terms <- str_length_sort(urinary_diseases_terms)

gwas_study_info = gwas_study_info |>
  mutate(l2_all_disease_terms = 
           stringr::str_replace_all(
             l2_all_disease_terms,
             vec_to_grep_pattern(urinary_diseases_terms),
             "other urinary diseases"
           )
  )
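`str_length_sort()` is defined elsewhere in the project. Presumably it orders terms longest first. That ordering matters when a pattern is not anchored to whole terms. In backtracking engines such as PCRE (and ICU, which stringr uses), an alternation takes the first listed alternative that matches at a given position. The sketch below is hypothetical and uses illustrative ICD-style labels. `perl = TRUE` is set because base R's default engine uses leftmost-longest matching, where order would not matter.

```r
# Hypothetical sketch of str_length_sort(): order strings longest first.
str_length_sort_sketch <- function(x) x[order(nchar(x), decreasing = TRUE)]

terms <- c("calculus of kidney", "calculus of kidney with calculus of ureter")
x <- "calculus of kidney with calculus of ureter"

# Shortest alternative listed first: only the leading part is replaced
gsub(paste(terms, collapse = "|"), "urolithiasis", x, perl = TRUE)

# Longest alternative listed first: the whole term is replaced
gsub(paste(str_length_sort_sketch(terms), collapse = "|"), "urolithiasis", x, perl = TRUE)
```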

15.3 Gynecological diseases

15.3.1 Uterine fibroids

# D25-D26, D28.2
icd10_uterine_fibroids <-
  c(paste0("D", seq(25, 26, by = 0.1) * 10),
    "D282"
    )

icd10_uterine_fibroids_terms <-
  icd10 |>
  filter(grepl(paste0(icd10_uterine_fibroids, collapse = "|"), code)) |>
  pull(`short_description_(valid_icd-10_fy2025)`) |>
  unique()

uterine_fibroid_terms <- 
  c("uterine leiomyoma",
    "uterine fibroid")



gwas_study_info = gwas_study_info |> 
 mutate(l2_all_disease_terms  = 
          stringr::str_replace_all(l2_all_disease_terms,
                                  pattern = vec_to_grep_pattern(uterine_fibroid_terms),
                                   "uterine fibroids"
                          )  
        )

gwas_study_info |>
 filter(grepl(vec_to_grep_pattern("uterine fibroids"), 
              l2_all_disease_terms,
              perl = T)) |>
  select(all_disease_terms, l2_all_disease_terms) |>
  distinct() |>
  head()

15.3.2 Polycystic ovary syndrome

gwas_study_info |>
 filter(grepl(vec_to_grep_pattern("polycystic ovary syndrome"), 
              l2_all_disease_terms,
              perl = T)) |>
  select(all_disease_terms, l2_all_disease_terms) |>
  distinct() |>
  head()

15.3.3 Female infertility

gwas_study_info |>
  filter(grepl(
         vec_to_grep_pattern("female infertility"),
         l2_all_disease_terms,
         perl = T)) |>
  select(all_disease_terms, l2_all_disease_terms) |>
  distinct() 

15.3.4 Endometriosis

endo_terms <- c("ovarian endometriosis",
                "endometriosis of pelvic peritoneum"
                )

gwas_study_info = gwas_study_info |> 
 mutate(l2_all_disease_terms  = 
          stringr::str_replace_all(l2_all_disease_terms,
                                  pattern = vec_to_grep_pattern(endo_terms),
                                   "endometriosis"
                          )  
        )

gwas_study_info |>
 filter(grepl(vec_to_grep_pattern("endometriosis"), 
              l2_all_disease_terms,
              perl = T)) |>
  select(all_disease_terms, l2_all_disease_terms) |>
  distinct() |>
  head()

15.3.5 Genital prolapse

fem_genital_prolapse_terms <- c("uterine prolapse",
                                "cystocele",
                                "rectocele",
                                "enterocele"
                                )

gwas_study_info = gwas_study_info |> 
 mutate(l2_all_disease_terms  = 
          stringr::str_replace_all(l2_all_disease_terms,
                                  pattern = vec_to_grep_pattern(fem_genital_prolapse_terms),
                                   "genital prolapse"
                          )  
        )

gwas_study_info |>
 filter(grepl(vec_to_grep_pattern("genital prolapse"), 
              l2_all_disease_terms,
              perl = T)) |>
  select(all_disease_terms, l2_all_disease_terms) |>
  distinct() |>
  head()

15.3.6 Premenstrual syndrome

15.3.7 Other gynecological diseases

diseases <- stringr::str_split(pattern = ", ", 
                               gwas_study_info$l2_all_disease_terms)  |> 
            unlist() |>
            stringr::str_trim()

# N72-N72.0, N75-N77.8, N83-N83.9

other_gyno_terms <- c("cervicitis", #N72
                      "bartholin gland disease", # N75
                      "bartholin duct cyst",
                      "vaginitis", # N76
                      "vaginal inflammation",
                      "postmenopausal atrophic vaginitis",                      
                      "atrophic vaginitis",
                      "vulvovaginitis",
                      "ulceration of vulva", # N77
                      "ovarian cyst", # N83
                      "follicular cyst",
                      "noninflammatory disorder of ovary fallopian tube and broad ligament",
                      "cervical disorder",
                      "dysmenorrhea",
                      "dyspareunia"
                      )

# pregnancy_terms <- grep("pregnancy", diseases, value = T)

# gyno_terms <- c(
#                 "female reproductive system disease",
#                 "female genital tract fistula",
#                 "placenta disease", 
#                 "ovarian gynecological diseases",
#                 "vaginal disorder",
# 
# 
#                 "abnormal delivery",
#                 pregnancy_terms)


gwas_study_info = gwas_study_info |> 
 mutate(l2_all_disease_terms  = 
          stringr::str_replace_all(l2_all_disease_terms,
                                  pattern = vec_to_grep_pattern(other_gyno_terms),
                                   "other gynecological diseases"
                          )  
        )

15.4 Hemoglobinopathies and hemolytic anemias

hemoglobinopathies_terms <- c("sickle cell disease and related diseases",
                              "thalassemia",
                              "inherited hemoglobinopathy",
                              "hemoglobin e disease"
)

hemopath_hemo_anemias <- c(hemoglobinopathies_terms, "hemolytic anemia")

gwas_study_info = gwas_study_info |>
  mutate(l2_all_disease_terms = 
           stringr::str_replace_all(
             l2_all_disease_terms,
             vec_to_grep_pattern(hemopath_hemo_anemias),
             "hemoglobinopathies and hemolytic anemias"
           )
  )

gwas_study_info |>
 filter(grepl(vec_to_grep_pattern("hemoglobinopathies and hemolytic anemias"), 
              l2_all_disease_terms,
              perl = T)) |>
  select(all_disease_terms, l2_all_disease_terms) |>
  distinct() |>
  head()

15.4.1 Endocrine, metabolic, blood, and immune disorders

endo_terms <- c("ovarian dysfunction",
                "ovarian disease",
                "menstrual disorder",
                # "premenstrual tension", # ICD9: 625.4, ICD10: N94.3
                "adrenocortical insufficiency", # ICD9: 255.4, ICD10: E27.1 / E27.4
                "cushing syndrome", # ICD9: 255.0, ICD10: E24
                "hyperaldosteronism", # ICD10CM: E26, ICD9: 255.1
                "delayed puberty", # ICD10: E30.0, ICD9: 259.0
                "central precocious puberty", # ? ICD10: E30.1
                "endocrine system disease", # ????
                "primary aldosteronism"
                )

gwas_study_info = gwas_study_info |>
 mutate(l2_all_disease_terms  = 
          stringr::str_replace_all(l2_all_disease_terms,
                                  pattern = vec_to_grep_pattern(endo_terms),
                                   "other endocrine disorders"
                          )  
        )

thyroid_terms <- c("congenital hypothyroidism due to developmental anomaly", # ICD-10: E03.1
                   "hypothyroidism",
                   "hyperthyroidism",
                   "myxedema",
                   "nontoxic goiter", # ICD-10: E04
                   "thyrotoxicosis", # ICD-10: E05
                   "toxic nodular goiter", # ICD9: 242.3
                   "goiter",
                   "thyroiditis", # ICD-10: E06, ICD9: 245
                   "hashimotos thyroiditis", # ICD9: 245.2
                   "graves disease", # ICD9: 242.0
                   "autoimmune thyroid disease" # ICD10: E06.3
                   )

gwas_study_info = gwas_study_info |>
 mutate(l2_all_disease_terms  = 
          stringr::str_replace_all(l2_all_disease_terms,
                                  pattern = vec_to_grep_pattern(thyroid_terms),
                                   "thyroid disorders"
                          )  
        )


other_blood_disorders <- c("neutropenia",
                           "cryoglobulinemia",
                           "acquired coagulation factor deficiency", #ICD10: D68.4
                           "von willebrand disease", # ICD9 286.4
                           "hemophilia a", #  Congenital factor VIII disorder - ICD9 286.0
                           "qualitative platelet defect",
                           "blood coagulation disease",
                           "hematologic disease",
                           "thrombophilia"
                           )

gwas_study_info = gwas_study_info |>
 mutate(l2_all_disease_terms  = 
          stringr::str_replace_all(l2_all_disease_terms,
                                  pattern = vec_to_grep_pattern(other_blood_disorders),
                                   "other blood disorders"
                          )  
        )

metabolism_disorders <- c("hypoglycemia",
                          "hypoparathyroidism",
                          "parathyroid disease", # ICD9: 252
                          "primary hyperparathyroidism", # ICD10: E21.0
                          "hyperparathyroidism", # ICD10: E21
                          # "secondary hyperparathyroidism of renal origin" is not included, as ICD10CM codes it separately (N25.81)
                          "obesity",
                          "metabolic syndrome x", # listed before its prefix "metabolic syndrome" so the longer term can match first
                          "metabolic syndrome",
                          "inborn errors of metabolism",
                          "metabolic disease",
                          "mineral metabolism disease", # ICD9: 275.8, ICD9: 275.9, ICD10: E83
                          "acidosis", # ICD9: 276.2
                          "disorder of acid-base balance", # ? ICD10-CM: E87.8
                          "bilirubin metabolism disease",
                          "cystic fibrosis", # ICD9: 277.0, ICD10: E84
                          "hyperlipidemia",
                          "hypovolemia",
                          "rare dyslipidemia"
                          )

gwas_study_info = gwas_study_info |>
 mutate(l2_all_disease_terms  = 
          stringr::str_replace_all(l2_all_disease_terms,
                                  pattern = vec_to_grep_pattern(metabolism_disorders),
                                   "other metabolic disorders"
                          )  
        )
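In `metabolism_disorders`, "metabolic syndrome" is a prefix of "metabolic syndrome x". In a regex alternation, the alternatives are tried from left to right at each position, so if the shorter term comes first it wins and the trailing " x" stays in the output. A minimal base R illustration, assuming a word-boundary pattern like the one `vec_to_grep_pattern()` may build (its actual definition is not shown here), and showing that sorting terms longest-first avoids the problem:

```r
terms <- c("metabolic syndrome", "metabolic syndrome x")
x <- "metabolic syndrome x, obesity"

# Given order: the shorter alternative matches first and " x" is left behind
in_given_order <- gsub(paste0("\\b(", paste(terms, collapse = "|"), ")\\b"),
                       "other metabolic disorders", x, perl = TRUE)

# Longest-first: the more specific term is tried before its prefix
terms_by_length <- terms[order(nchar(terms), decreasing = TRUE)]
longest_first <- gsub(paste0("\\b(", paste(terms_by_length, collapse = "|"), ")\\b"),
                      "other metabolic disorders", x, perl = TRUE)

print(in_given_order)
print(longest_first)
```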

15.4.2 Other hemoglobinopathies and hemolytic anemias

other_hemo <- c("aplastic anemia",
                "pure red cell aplasia",
                "severe malarial anemia",
                "anemia")

gwas_study_info = gwas_study_info |>
 mutate(l2_all_disease_terms  = 
          stringr::str_replace_all(l2_all_disease_terms,
                                  pattern = vec_to_grep_pattern(other_hemo),
                                   "other hemoglobinopathies and hemolytic anemias"
                          )  
        )

15.5 Oral disorders

oral_disorders_terms <- c("dental caries",
                          "tooth disease",
                          "toothache",
                          "periodontal disease",
                          "tooth agenesis"
)


gwas_study_info = gwas_study_info |>
  mutate(l2_all_disease_terms = 
           stringr::str_replace_all(
             l2_all_disease_terms,
             vec_to_grep_pattern(oral_disorders_terms),
             "oral disorders"
           )
  )


gwas_study_info |> 
 filter(grepl(vec_to_grep_pattern("oral disorders"), 
              l2_all_disease_terms,
              perl = T)) |>
  select(all_disease_terms, l2_all_disease_terms) |>
  distinct() |>
  head()
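Each term group in this section repeats the same `mutate()` / `str_replace_all()` block. One possible alternative, sketched here under the assumption that the groups are kept in a named list, applies every replacement in a single loop. `term_map` and `harmonize_terms()` are hypothetical names with toy contents, not part of this project, and the sketch uses base R `gsub()` with word boundaries and longest-first ordering rather than `vec_to_grep_pattern()`.

```r
# Hypothetical mapping: harmonized group name -> raw terms (toy subset)
term_map <- list(
  "oral disorders"    = c("dental caries", "periodontal disease"),
  "thyroid disorders" = c("graves disease", "hypothyroidism")
)

harmonize_terms <- function(x, term_map) {
  for (group in names(term_map)) {
    terms <- term_map[[group]]
    # Longest terms first so a prefix cannot shadow a longer term
    terms <- terms[order(nchar(terms), decreasing = TRUE)]
    pattern <- paste0("\\b(", paste(terms, collapse = "|"), ")\\b")
    x <- gsub(pattern, group, x, perl = TRUE)
  }
  x
}

harmonized <- harmonize_terms(c("dental caries, graves disease", "hypothyroidism", "asthma"),
                              term_map)
print(harmonized)
```

Keeping the mapping as data makes it easier to review which raw terms feed each group, but the groups are still applied in list order, so a later group can rewrite text produced by an earlier one.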

16 Check compatibility with GBD data

16.1 Weird fix

# Specific fix first, then the generic one
gwas_study_info = gwas_study_info |> 
  mutate(l2_all_disease_terms = 
           stringr::str_replace_all(l2_all_disease_terms,
                                    pattern = "anxiety disorders disorderss",
                                    "anxiety disorders"
           )
  )

gwas_study_info = gwas_study_info |> 
  mutate(l2_all_disease_terms = 
           stringr::str_replace_all(l2_all_disease_terms,
                                    pattern = "disorderss",
                                    "disorders"
           )
  )
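The order of these two replacements matters. If the generic "disorderss" to "disorders" fix runs first, "anxiety disorders disorderss" becomes "anxiety disorders disorders" and the specific pattern no longer matches, so the duplicated word survives. A small base R check on toy strings, using `gsub(fixed = TRUE)` as a stand-in for `stringr::str_replace_all()`:

```r
x <- c("anxiety disorders disorderss", "mental disorderss")

# Generic fix first: the specific pattern has nothing left to match
generic_first <- gsub("anxiety disorders disorderss", "anxiety disorders",
                      gsub("disorderss", "disorders", x, fixed = TRUE),
                      fixed = TRUE)

# Specific fix first, then the generic one
specific_first <- gsub("disorderss", "disorders",
                       gsub("anxiety disorders disorderss", "anxiety disorders", x, fixed = TRUE),
                       fixed = TRUE)

print(generic_first)
print(specific_first)
```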
gbd_data <- data.table::fread(here::here("data/gbd/ihme_gbd_2019_global_disease_burden_rate_all_ages.csv"))

gbd_data$cause <- stringr::str_remove_all(gbd_data$cause, ",")

diseases <- stringr::str_split(pattern = ", ", 
                               gwas_study_info$l2_all_disease_terms[gwas_study_info$l2_all_disease_terms != ""])  |> 
            unlist() |>
            stringr::str_trim()

gbd_data$cause[!tolower(gbd_data$cause) %in% unique(diseases)] |> sort() |> unique()
gbd_data =
  gbd_data |>
  mutate(cause = tolower(cause))

gwas_disease_traits = data.frame(cause = diseases)
  # gwas_study_info |>
  # filter(DISEASE_STUDY == T) |>
  # select(all_disease_terms, l2_all_disease_terms, cause = l2_all_disease_terms) |>
  # distinct()

left_join(gwas_disease_traits, 
          gbd_data,
          by = "cause") |>
  head()

gwas_study_info |> select(cause = l2_all_disease_terms) |>
  distinct() |>
  left_join(gbd_data, by = "cause") |>
  head()
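The left joins above leave `NA` in the GBD columns for terms without a match, and the `gbd_data$cause[...]` line earlier lists GBD causes absent from the GWAS terms. A sketch of checking both directions at once with `setdiff()`, after lower-casing and trimming both sides. The vectors are made-up toy values standing in for `diseases` and `gbd_data$cause`.

```r
# Toy stand-ins for `diseases` and `gbd_data$cause`
gwas_terms <- c("Oral disorders ", "thyroid disorders", "asthma", "oral disorders")
gbd_causes <- c("Oral disorders", "Asthma", "Stroke")

# Normalize case and whitespace, then deduplicate
normalize_terms <- function(x) unique(tolower(trimws(x)))

# GWAS terms with no GBD cause, and GBD causes with no GWAS term
unmatched_gwas <- setdiff(normalize_terms(gwas_terms), normalize_terms(gbd_causes))
unmatched_gbd  <- setdiff(normalize_terms(gbd_causes), normalize_terms(gwas_terms))

print(unmatched_gwas)
print(unmatched_gbd)
```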
diseases <- stringr::str_split(pattern = ", ", 
                               gwas_study_info$l2_all_disease_terms[gwas_study_info$l2_all_disease_terms != ""])  |> 
            unlist() |>
            stringr::str_trim()

length(unique(diseases))

# make frequency table
freq <- table(as.factor(diseases))

# sort in decreasing order
freq_sorted <- sort(freq, decreasing = TRUE)

# show top N, e.g. top 10
head(freq_sorted, 10)
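`table(as.factor(diseases))` counts every occurrence of a term. After harmonization, one row can list the same group twice; for example, a study annotated with both "uterine leiomyoma" and "uterine fibroid" may end up as "uterine fibroids, uterine fibroids", and would then be counted twice. A minimal sketch on toy strings, assuming each row of `gwas_study_info` is one study, that contrasts counting occurrences with counting each term at most once per row:

```r
# Toy l2 strings; the second mimics two raw terms collapsed into one group
l2_terms <- c("asthma, oral disorders",
              "uterine fibroids, uterine fibroids",
              "",
              "asthma, thyroid disorders")

# Split each non-empty row into its terms, trimming stray whitespace
per_row <- lapply(strsplit(l2_terms[l2_terms != ""], ", ", fixed = TRUE), trimws)

# Every occurrence counts, as with table(as.factor(diseases))
occurrences <- sort(table(unlist(per_row)), decreasing = TRUE)

# Each term counts at most once per row
rows_with_term <- sort(table(unlist(lapply(per_row, unique))), decreasing = TRUE)

print(occurrences)
print(rows_with_term)
```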

16.1.1 Save the updated gwas_study_info with harmonized disease terms

# fwrite() writes the file and returns NULL, so its result is not assigned back to gwas_study_info
data.table::fwrite(gwas_study_info,
                   here::here("output/gwas_cat/gwas_study_info_trait_group_l2.csv"))

sessionInfo()