Last updated: 2025-12-29

Checks: 7 passed, 0 failed

Knit directory: genomics_ancest_disease_dispar/

This reproducible R Markdown analysis was created with workflowr (version 1.7.1). The Checks tab describes the reproducibility checks that were applied when the results were created. The Past versions tab lists the development history.


Great! Since the R Markdown file has been committed to the Git repository, you know the exact version of the code that produced these results.

Great job! The global environment was empty. Objects defined in the global environment can affect the analysis in your R Markdown file in unknown ways. For reproducibility it’s best to always run the code in an empty environment.

The command set.seed(20220216) was run prior to running the code in the R Markdown file. Setting a seed ensures that any results that rely on randomness, e.g. subsampling or permutations, are reproducible.

Great job! Recording the operating system, R version, and package versions is critical for reproducibility.

Nice! There were no cached chunks for this analysis, so you can be confident that you successfully produced the results during this run.

Great job! Using relative paths to the files within your workflowr project makes it easier to run your code on other machines.

Great! You are using Git for version control. Tracking code development and connecting the code version to the results is critical for reproducibility.

The results in this page were generated with repository version 2b73f21. See the Past versions tab to see a history of the changes made to the R Markdown and HTML files.

Note that you need to be careful to ensure that all relevant files for the analysis have been committed to Git prior to generating the results (you can use wflow_publish or wflow_git_commit). workflowr only checks the R Markdown file, but you know if there are other scripts or data files that it depends on. Below is the status of the Git repository when the results were generated:


Ignored files:
    Ignored:    .DS_Store
    Ignored:    .Rproj.user/
    Ignored:    .venv/
    Ignored:    analysis/.DS_Store
    Ignored:    ancestry_dispar_env/
    Ignored:    data/.DS_Store
    Ignored:    data/cdc/
    Ignored:    data/cohort/
    Ignored:    data/gbd/.DS_Store
    Ignored:    data/gbd/IHME-GBD_2021_DATA-d8cf695e-1.csv
    Ignored:    data/gbd/IHME-GBD_2023_DATA-73cc01fd-1.csv
    Ignored:    data/gbd/ihme_gbd_2019_global_disease_burden_rate_all_ages.csv
    Ignored:    data/gbd/ihme_gbd_2019_global_paf_rate_percent_all_ages.csv
    Ignored:    data/gbd/ihme_gbd_2021_global_disease_burden_rate_all_ages.csv
    Ignored:    data/gbd/ihme_gbd_2021_global_paf_rate_percent_all_ages.csv
    Ignored:    data/gwas_catalog/
    Ignored:    data/icd/.DS_Store
    Ignored:    data/icd/2025AA/
    Ignored:    data/icd/IHME_GBD_2019_COD_CAUSE_ICD_CODE_MAP_Y2020M10D15.XLSX
    Ignored:    data/icd/IHME_GBD_2019_NONFATAL_CAUSE_ICD_CODE_MAP_Y2020M10D15.XLSX
    Ignored:    data/icd/IHME_GBD_2021_COD_CAUSE_ICD_CODE_MAP_Y2024M05D16.XLSX
    Ignored:    data/icd/IHME_GBD_2021_NONFATAL_CAUSE_ICD_CODE_MAP_Y2024M05D16.XLSX
    Ignored:    data/icd/UK_Biobank_master_file.tsv
    Ignored:    data/icd/cdc_valid_icd10_Sep_23_2025.xlsx
    Ignored:    data/icd/cdc_valid_icd9_Sep_23_2025.xlsx
    Ignored:    data/icd/hp_umls_mapping.csv
    Ignored:    data/icd/lancet_conditions_icd10.xlsx
    Ignored:    data/icd/manual_disease_icd10_mappings.xlsx
    Ignored:    data/icd/mondo_umls_mapping.csv
    Ignored:    data/icd/phecode_international_version_unrolled.csv
    Ignored:    data/icd/phecode_to_icd10_manual_mapping.xlsx
    Ignored:    data/icd/semiautomatic_ICD-pheno.txt
    Ignored:    data/icd/semiautomatic_ICD-pheno_UKB_subset.txt
    Ignored:    data/icd/umls-2025AA-mrconso.zip
    Ignored:    figures/
    Ignored:    human_dictionary/
    Ignored:    igsr_populations.tsv
    Ignored:    output/.DS_Store
    Ignored:    output/abstracts/
    Ignored:    output/doccano/
    Ignored:    output/fulltexts/
    Ignored:    output/gwas_cat/
    Ignored:    output/gwas_cohorts/
    Ignored:    output/icd_map/
    Ignored:    output/trait_ontology/
    Ignored:    pubmedbert-cohort-ner-model/
    Ignored:    pubmedbert-cohort-ner/
    Ignored:    r-spacyr/
    Ignored:    renv/
    Ignored:    venv/
    Ignored:    visualization.Rdata

Unstaged changes:
    Modified:   .gitignore
    Modified:   analysis/disease_inves_by_ancest.Rmd
    Modified:   analysis/get_full_text.Rmd
    Modified:   analysis/gwas_to_gbd.Rmd
    Modified:   analysis/index.Rmd
    Modified:   analysis/level_2_disease_group.Rmd
    Modified:   analysis/missing_cohort_info.Rmd
    Modified:   analysis/replication_ancestry_bias.Rmd
    Modified:   analysis/text_for_cohort_labels.Rmd

Note that any generated files, e.g. HTML, png, CSS, etc., are not included in this status report because it is ok for generated content to have uncommitted changes.


These are the previous versions of the repository in which changes were made to the R Markdown (analysis/level_1_disease_group_non_cancer.Rmd) and HTML (docs/level_1_disease_group_non_cancer.html) files. If you’ve configured a remote Git repository (see ?wflow_git_remote), click on the hyperlinks in the table below to view the files as they were in that past version.

File Version Author Date Message
Rmd 2b73f21 IJbeasley 2025-12-29 Archiving old GWAS trait conversion
html abd5cd8 IJbeasley 2025-09-24 Build site.
Rmd 43ca958 IJbeasley 2025-09-24 More fixing diseases
html 9fa0815 IJbeasley 2025-09-23 Build site.
Rmd b1256b2 IJbeasley 2025-09-23 More icd codes
html 3828be8 IJbeasley 2025-09-23 Build site.
Rmd 935db36 IJbeasley 2025-09-23 Using icd codes to help grouping
html 4bdd100 IJbeasley 2025-09-22 Build site.
Rmd 94d5bd5 IJbeasley 2025-09-22 More typo + structure fixing …
html 8b518cc IJbeasley 2025-09-22 Build site.
Rmd 61c94d3 IJbeasley 2025-09-22 …maybe fixing typos
html 23665d1 IJbeasley 2025-09-22 Build site.
Rmd 914b2e4 IJbeasley 2025-09-22 Excluding infectious disease
html f1d115d IJbeasley 2025-09-17 Build site.
Rmd 44b6e86 IJbeasley 2025-09-17 More fixing up of disease grouping
html b244021 IJbeasley 2025-09-17 Build site.
Rmd 61ecfb6 IJbeasley 2025-09-17 More correction to cardiovascular disease terms
html 778adb7 IJbeasley 2025-09-17 Build site.
Rmd 4f49935 IJbeasley 2025-09-17 Better grouping of cardiovascular disease
html cb730ee IJbeasley 2025-09-16 Build site.
Rmd b01d9aa IJbeasley 2025-09-16 Improving cancer grouping
html f15743e IJbeasley 2025-09-16 Build site.
Rmd 18a4e85 IJbeasley 2025-09-16 More disease grouping
html 922c9c3 IJbeasley 2025-09-16 Build site.
Rmd c601713 IJbeasley 2025-09-16 Even more disease term grouping
html add5ecc IJbeasley 2025-09-15 Build site.
Rmd 465a689 IJbeasley 2025-09-15 More disease term grouping
html 7504c04 IJbeasley 2025-09-15 Build site.
Rmd f01005f IJbeasley 2025-09-15 workflowr::wflow_publish("analysis/level_1_disease_group_non_cancer.Rmd")
html 9aa118e IJbeasley 2025-09-15 Build site.
Rmd ffbf74a IJbeasley 2025-09-15 Further grouping of disease terms
html b19b361 IJbeasley 2025-09-15 Build site.
Rmd 9cc22ba IJbeasley 2025-09-15 Dealing with duplicate disease terms
html 1679f9d IJbeasley 2025-09-10 Build site.
html 2250e22 IJbeasley 2025-09-10 Build site.
Rmd e3de56c IJbeasley 2025-09-10 Update cardiac disease grouping
html e713c34 IJbeasley 2025-09-10 Build site.
Rmd 934b11f IJbeasley 2025-09-10 workflowr::wflow_publish("analysis/level_1_disease_group_non_cancer.Rmd")

1 Set up

library(dplyr)
library(data.table)
library(stringr)

1.1 Ontology help - for getting disease subtypes

source(here::here("code/get_term_descendants.R"))
gwas_study_info <- fread(here::here("output/gwas_cat/gwas_study_info_disease_trait_filtered_v2.csv"))
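The helper file sourced above is not shown on this page, so the behaviour of `vec_to_grep_pattern()` can only be inferred from how it is called. As a rough sketch, assuming the pattern builder puts a word boundary before each term and a look-ahead for a comma or end of string after it (as the commented-out `paste0()` calls further down suggest), a hypothetical `sketch_grep_pattern()` might look like the following. It uses base R only and is not the project's actual implementation.

```r
# Hypothetical stand-in for vec_to_grep_pattern(); the real helper lives in
# code/get_term_descendants.R and may differ.
sketch_grep_pattern <- function(terms) {
  # \\b anchors the start of a term; (?=,|$) requires the term to end
  # at a comma or at the end of the string. Terms are used as regex
  # fragments, so metacharacters such as "(" would need escaping.
  paste0("\\b", terms, "(?=,|$)", collapse = "|")
}

# Toy value shaped like one entry of collected_all_disease_terms
toy_terms <- "contact dermatitis due to nickel, asthma"

# perl = TRUE is needed for the look-ahead in base R
collapsed <- gsub(sketch_grep_pattern("contact dermatitis due to nickel"),
                  "contact dermatitis", toy_terms, perl = TRUE)

# A term followed by more words is left alone, because the look-ahead fails
untouched <- gsub(sketch_grep_pattern("asthma"),
                  "respiratory disease", "asthma exacerbation", perl = TRUE)
```

Under these assumptions a term is only replaced when it closes an entry in the comma-separated list, which keeps longer terms that merely begin with the same words intact.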

2 Initial summary

2.1 Objectives of this analysis:

Level 1 disease grouping:
- collapse disease causes, e.g. contact dermatitis due to nickel to contact dermatitis
- collapse disease subtypes, e.g. bipolar I and bipolar II to bipolar disorder
- collapse disease onset times, e.g. early-onset alzheimers disease and late-onset alzheimers disease to alzheimers disease
- collapse disease stages
- collapse disease complications, e.g. device complication, trauma complication to complication

2.2 Grouping - level 1 set up

gwas_study_info = gwas_study_info |>
  mutate(l1_all_disease_terms = collected_all_disease_terms)

2.3 Initial summary - number of unique study terms

2.3.1 When studies with multiple terms are split into separate terms

diseases <- stringr::str_split(pattern = ", ", 
 gwas_study_info$collected_all_disease_terms[gwas_study_info$collected_all_disease_terms != ""])  |> 
            unlist() |>
            stringr::str_trim()

length(unique(diseases))
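To make the counting step concrete, here is a minimal sketch on a made-up vector standing in for `collected_all_disease_terms`. The values are illustrative only and the code uses base R equivalents of the stringr calls above.

```r
# Toy stand-in for gwas_study_info$collected_all_disease_terms
toy_column <- c("asthma, eczema", "", "asthma", "type 2 diabetes, asthma")

# Drop empty entries, split each study's terms on ", ", flatten, and trim
toy_diseases <- trimws(
  unlist(strsplit(toy_column[toy_column != ""], ", ", fixed = TRUE))
)

# Number of distinct terms once multi-term studies are split apart
n_unique <- length(unique(toy_diseases))
```

Here the four studies contribute five term mentions but only three distinct terms.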

3 Disease complications

3.1 Nervous system complications caused by herpes zoster

gwas_study_info = gwas_study_info |>
  mutate(l1_all_disease_terms  = 
         stringr::str_replace_all(l1_all_disease_terms,
                          pattern = vec_to_grep_pattern("herpes zoster, nervous system disease"),
                          "nervous system disease"
                          )
         )

3.2 herpes zoster, psoriasis

gwas_study_info = gwas_study_info |>
  mutate(l1_all_disease_terms  = 
         stringr::str_replace_all(l1_all_disease_terms,
                          pattern = vec_to_grep_pattern("herpes zoster, psoriasis"),
                          "psoriasis"
         )
         )

3.3 herpes zoster, rheumatoid arthritis

gwas_study_info = gwas_study_info |>
  mutate(l1_all_disease_terms  = 
         stringr::str_replace_all(l1_all_disease_terms,
                          pattern = vec_to_grep_pattern("herpes zoster, rheumatoid arthritis"),
                          "rheumatoid arthritis"
                          
         )
         )

4 Disturbances to senses

disturb_senses_terms <- c("disturbances of sensation of smell and taste",
                          "abnormality of the sense of smell",
                          "ageusia")

gwas_study_info = gwas_study_info |>
  mutate(l1_all_disease_terms  = 
         stringr::str_replace_all(l1_all_disease_terms,
                          pattern = vec_to_grep_pattern(disturb_senses_terms),
                          "disturbances to senses"
         )
         )

5 Disease stage grouping

5.1 Chronic kidney disease (CKD)

gwas_study_info = gwas_study_info |>
  mutate(l1_all_disease_terms = 
         stringr::str_replace_all(l1_all_disease_terms,
                          vec_to_grep_pattern("chronic kidney disease stage 5"),
                          "chronic kidney disease"
         ))


gwas_study_info = gwas_study_info |>
  mutate(l1_all_disease_terms = 
         stringr::str_replace_all(l1_all_disease_terms,
                          vec_to_grep_pattern("stage 5 chronic kidney disease"),
                          "chronic kidney disease"
         ))
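The two calls above map spelling variants of the same stage onto one target, a shape that recurs throughout this page. A small wrapper could make that intent explicit. The sketch below is hypothetical: `collapse_terms()` and its internal pattern are illustrative names written in base R, standing in for the `mutate()` plus `str_replace_all()` plus `vec_to_grep_pattern()` combination rather than reproducing it.

```r
# Hypothetical helper: replace any of `variants` with `target` when the
# variant ends at a comma or at the end of the string.
collapse_terms <- function(x, variants, target) {
  pattern <- paste0("\\b", variants, "(?=,|$)", collapse = "|")
  gsub(pattern, target, x, perl = TRUE)
}

# Toy values, not taken from the GWAS Catalog
toy <- c("chronic kidney disease stage 5, anemia",
         "stage 5 chronic kidney disease")

ckd_collapsed <- collapse_terms(
  toy,
  variants = c("chronic kidney disease stage 5",
               "stage 5 chronic kidney disease"),
  target   = "chronic kidney disease"
)
```

Passing both variants in one vector would fold the two replacement steps into a single pass, assuming the real pattern helper accepts vectors the same way it does elsewhere on this page.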

6 Disease cause grouping

7 Disease complication

disease_complication_terms = c("device complication",
                             "trauma complication",
                             "adverse effect",
                             "aseptic loosening"
                             )

pattern <- vec_to_grep_pattern(disease_complication_terms)

gwas_study_info = gwas_study_info |>
 mutate(l1_all_disease_terms  = 
          stringr::str_replace_all(l1_all_disease_terms,
                                  pattern = pattern,
                                   "complication"
                          )  
        )

8 Cardiovascular diseases

8.1 Hypertension

8.1.1 Pulmonary arterial hypertension

url <- "http://www.ebi.ac.uk/ols4/api/ontologies/efo/terms/http%253A%252F%252Fwww.ebi.ac.uk%252Fefo%252FEFO_0001361/descendants"

pulmonary_arterial_hypertension_terms <- get_descendants(url)

pattern <- vec_to_grep_pattern(pulmonary_arterial_hypertension_terms)

gwas_study_info = gwas_study_info |>
  mutate(l1_all_disease_terms  = 
         stringr::str_replace_all(l1_all_disease_terms,
                          pattern = pattern,
                          "pulmonary arterial hypertension"
         ))
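The descendants URLs on this page all share one shape: an ontology short name plus a term IRI that has been percent-encoded twice. As a sketch of how such a URL could be assembled rather than pasted by hand, the hypothetical `build_descendants_url()` below uses base R's `utils::URLencode()`. It only builds the string; what `get_descendants()` does with it is defined in the sourced helper file and not assumed here.

```r
# Hypothetical helper: build an OLS4 "descendants" URL from an ontology
# short name and a term IRI, matching the URL shape used on this page.
build_descendants_url <- function(ontology, iri) {
  # First pass encodes reserved characters such as ":" and "/"
  once <- utils::URLencode(iri, reserved = TRUE)
  # repeated = TRUE forces a second pass, turning each "%" into "%25"
  twice <- utils::URLencode(once, reserved = TRUE, repeated = TRUE)
  paste0("http://www.ebi.ac.uk/ols4/api/ontologies/", ontology,
         "/terms/", twice, "/descendants")
}

pah_url <- build_descendants_url("efo", "http://www.ebi.ac.uk/efo/EFO_0001361")
```

With these inputs the result is the same string as the hard-coded `url` in the pulmonary arterial hypertension chunk above.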

8.1.2 Atrial fibrillation & flutter

8.1.3 Atrial fibrillation

gwas_study_info = gwas_study_info |>
  mutate(l1_all_disease_terms  = 
         stringr::str_replace_all(l1_all_disease_terms,
                          vec_to_grep_pattern("post-operative atrial fibrillation"),
                          "atrial fibrillation, post-operative"
         ))



url <- "http://www.ebi.ac.uk/ols4/api/ontologies/snomed/terms/http%253A%252F%252Fsnomed.info%252Fid%252F49436004/descendants"

atrial_fibrillation_terms <- get_descendants(url)

gwas_study_info = gwas_study_info |>
  mutate(l1_all_disease_terms  = 
         stringr::str_replace_all(l1_all_disease_terms,
                          pattern = vec_to_grep_pattern(atrial_fibrillation_terms),
                          "atrial fibrillation"
         ))

8.1.3.1 Atrial flutter

url <- "http://www.ebi.ac.uk/ols4/api/ontologies/snomed/terms/http%253A%252F%252Fsnomed.info%252Fid%252F5370000/descendants"

atrial_flutter_terms <- get_descendants(url)

gwas_study_info = gwas_study_info |>
  mutate(l1_all_disease_terms  = 
         stringr::str_replace_all(l1_all_disease_terms,
                          pattern = vec_to_grep_pattern(atrial_flutter_terms),
                          "atrial flutter"
         ))

8.2 Heart disease

8.2.1 Hypertensive heart disease

url <- "http://www.ebi.ac.uk/ols4/api/ontologies/snomed/terms/http%253A%252F%252Fsnomed.info%252Fid%252F64715009/descendants"

hypertensive_heart_disease_terms <- get_descendants(url)

gwas_study_info = gwas_study_info |>
  mutate(l1_all_disease_terms  = 
         stringr::str_replace_all(l1_all_disease_terms,
                          pattern = vec_to_grep_pattern(hypertensive_heart_disease_terms),
                          "hypertensive heart disease"
         ))

8.2.2 Ischemic heart disease

url <- "http://www.ebi.ac.uk/ols4/api/ontologies/cvdo/terms/http%253A%252F%252Fpurl.obolibrary.org%252Fobo%252FDOID_3394/descendants"

ischemic_heart_disease_terms <- get_descendants(url)

url <- "http://www.ebi.ac.uk/ols4/api/ontologies/snomed/terms/http%253A%252F%252Fsnomed.info%252Fid%252F414545008/descendants"

ischemic_heart_disease_terms_snomed <- get_descendants(url)

ischemic_heart_disease_terms = c(ischemic_heart_disease_terms,
                                 ischemic_heart_disease_terms_snomed
                                 ) |>
                                str_length_sort()

gwas_study_info = gwas_study_info |>
  mutate(l1_all_disease_terms  = 
         stringr::str_replace_all(l1_all_disease_terms,
                          pattern = vec_to_grep_pattern(ischemic_heart_disease_terms),
                          "ischemic heart disease"
         ))
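The combined vector is passed through `str_length_sort()`, which is defined in the sourced helper file and not shown here. Its name suggests ordering terms by length. One reason that can matter: in Perl-style and ICU regular expressions, alternation takes the first alternative that matches at a given position, so a short term listed before a longer term sharing its prefix can win. The sketch below illustrates this with a hypothetical `sort_longest_first()` and a plain alternation (no look-ahead) on toy terms.

```r
# Hypothetical stand-in for str_length_sort(): longest terms first
sort_longest_first <- function(terms) {
  terms[order(nchar(terms), decreasing = TRUE)]
}

# Toy terms chosen so that one is a prefix of the other
terms <- c("angina", "angina pectoris")
toy   <- "angina pectoris"

# Unsorted: "angina" is tried first and matches, leaving " pectoris" behind
unsorted <- gsub(paste(terms, collapse = "|"),
                 "ischemic heart disease", toy, perl = TRUE)

# Sorted: the longer term is tried first and consumes the whole string
sorted <- gsub(paste(sort_longest_first(terms), collapse = "|"),
               "ischemic heart disease", toy, perl = TRUE)
```

If the real pattern also requires a comma or end of string after each term, ordering matters less, but sorting longest-first remains a cheap safeguard.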

8.2.2.1 Coronary artery disease

Coronary artery disease should likely be folded in as well: ICD-10 code I25.1 appears to correspond to coronary artery disease, and GBD includes it under ischemic heart disease.

url <- "http://www.ebi.ac.uk/ols4/api/ontologies/efo/terms/http%253A%252F%252Fwww.ebi.ac.uk%252Fefo%252FEFO_0001645/descendants"

coronary_artery_disease_terms <- get_descendants(url)

gwas_study_info = gwas_study_info |>
  mutate(l1_all_disease_terms  = 
         stringr::str_replace_all(l1_all_disease_terms,
                          pattern = vec_to_grep_pattern(coronary_artery_disease_terms),
                          #pattern = paste0(coronary_artery_disease_terms, collapse = "(?=,|$) |\\b"),
                          "coronary artery disease"
         ))

gwas_study_info = gwas_study_info |>
  mutate(l1_all_disease_terms  = 
         stringr::str_replace_all(l1_all_disease_terms,
                          pattern = vec_to_grep_pattern("coronary artery disease"),
                          # "coronary artery disease",
                          "ischemic heart disease"
         ))

8.2.2.2 Myocardial infarction

url <- "http://www.ebi.ac.uk/ols4/api/ontologies/efo/terms/http%253A%252F%252Fwww.ebi.ac.uk%252Fefo%252FEFO_0000612/descendants"

myocardial_infarction_terms <- get_descendants(url)

myocardial_infarction_terms <- c("subsequent st elevation \\(stemi\\) and non-st elevation \\(nstemi\\) myocardial infarction",
                              myocardial_infarction_terms
                              )

gwas_study_info = gwas_study_info |>
  mutate(l1_all_disease_terms  = 
         stringr::str_replace_all(l1_all_disease_terms,
                                  pattern = vec_to_grep_pattern(myocardial_infarction_terms), 
                          #pattern = paste0(myocardial_infarction_terms, collapse = "(?=,|$) |\\b"),
                                  "myocardial infarction"
                          )
         )


gwas_study_info = gwas_study_info |>
  mutate(l1_all_disease_terms  = 
         stringr::str_replace_all(l1_all_disease_terms,
                          pattern = vec_to_grep_pattern("post-operative myocardial infarction"),
                          "myocardial infarction, post-operative"
         ))

gwas_study_info = gwas_study_info |>
  mutate(l1_all_disease_terms  = 
         stringr::str_replace_all(l1_all_disease_terms,
                          pattern = vec_to_grep_pattern("myocardial infarction"),
                          "ischemic heart disease"
         ))

8.2.2.3 Fixing

to_fix <- c("non-st elevation ischemic heart disease",
            "post-operative ischemic heart disease",
            "st elevation ischemic heart disease",
            "subsequent st elevation \\(stemi\\) and non-st elevation \\(nstemi\\) ischemic heart disease")

gwas_study_info = gwas_study_info |>
  mutate(l1_all_disease_terms  = 
         stringr::str_replace_all(l1_all_disease_terms,
                          pattern = vec_to_grep_pattern(to_fix),
                          #paste0(to_fix, collapse = "(?=,|$) |\\b"),
                          "ischemic heart disease"
         ))
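One plausible reading of why this clean-up is needed, assuming the pattern anchors only on a word boundary before the term and a comma or end of string after it: a qualifier sitting in front of the matched term survives the replacement. The sketch below replays that on a toy string with base R; the pattern shape is an assumption, not the project's confirmed helper output.

```r
# Toy entry with a qualifier in front of the grouped term
toy <- "st elevation myocardial infarction, asthma"

# Step 1: group "myocardial infarction" into the broader category.
# The leading "st elevation " is not part of the match, so it stays.
step1 <- gsub("\\bmyocardial infarction(?=,|$)",
              "ischemic heart disease", toy, perl = TRUE)

# Step 2: the fixing pass collapses the leftover qualified form
step2 <- gsub("\\bst elevation ischemic heart disease(?=,|$)",
              "ischemic heart disease", step1, perl = TRUE)
```

After step 1 the entry reads like one of the `to_fix` strings above, and step 2 reduces it to the plain group label.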

8.2.3 (Non-rheumatic) Heart valvular disease

url <- "http://www.ebi.ac.uk/ols4/api/ontologies/efo/terms/http%253A%252F%252Fwww.ebi.ac.uk%252Fefo%252FEFO_0009551/descendants"

heart_valve_disease_terms <- get_descendants(url)

# drop rheumatic terms so the group stays non-rheumatic
rheumatic_terms <- c("rheumatic pulmonary valve disease",
                     "rheumatic disease of mitral valve")

heart_valve_disease_terms <- 
  heart_valve_disease_terms[!heart_valve_disease_terms %in% rheumatic_terms]

gwas_study_info = gwas_study_info |> 
          mutate(l1_all_disease_terms  = 
          stringr::str_replace_all(l1_all_disease_terms,
                                   pattern = vec_to_grep_pattern(heart_valve_disease_terms),
                                  # pattern = paste0(heart_valve_disease_terms, 
                                  #                  collapse = "(?=,|$) |\\b"),
                                   "heart valve disease"
                                  )  
          )
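The exclusion step relies on a hand-written list of rheumatic terms. As a small illustration of the options, the sketch below compares the `%in%` filter used above with `setdiff()` and with a pattern-based filter that would also catch rheumatic labels missing from the list. The descendant labels are made up for the example and are not actual API output.

```r
# Toy descendant labels (illustrative, not fetched from OLS)
toy_descendants <- c("aortic valve stenosis",
                     "rheumatic disease of mitral valve",
                     "mitral valve prolapse")

rheumatic_terms <- c("rheumatic pulmonary valve disease",
                     "rheumatic disease of mitral valve")

# Same approach as above: keep everything not in the exclusion list
kept_in <- toy_descendants[!toy_descendants %in% rheumatic_terms]

# setdiff() gives the same result here, but also drops duplicates
kept_setdiff <- setdiff(toy_descendants, rheumatic_terms)

# Pattern-based alternative: drop any label starting with "rheumatic"
kept_pattern <- toy_descendants[!grepl("^rheumatic\\b", toy_descendants)]
```

The pattern-based variant trades explicitness for coverage, so it is worth checking which labels it removes before relying on it.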

8.3 Other cardiovascular diseases (GBD)

8.3.1 Aortic aneurysm

aortic_aneurysm_terms <- c("thoracic aortic aneurysm",
                          "abdominal aortic aneurysm"
                          )

gwas_study_info = gwas_study_info |>
  mutate(l1_all_disease_terms  = 
         stringr::str_replace_all(l1_all_disease_terms,
                                  pattern = vec_to_grep_pattern(aortic_aneurysm_terms),
                          # pattern = paste0(aortic_aneurysm_terms, collapse = "(?=,|$) |\\b"),
                          "aortic aneurysm"
         ))

8.3.2 Endocarditis

gwas_study_info = gwas_study_info |>
  mutate(l1_all_disease_terms  = 
         stringr::str_replace_all(l1_all_disease_terms,
                          vec_to_grep_pattern("bacterial endocarditis"),
                          "endocarditis"
         ))

8.3.3 Cardiomyopathy

url <- "http://www.ebi.ac.uk/ols4/api/ontologies/hp/terms/http%253A%252F%252Fpurl.obolibrary.org%252Fobo%252FHP_0001638/descendants"

cardiomyopathy_terms <- get_descendants(url)

cardiomyopathy_terms <- c("chagas cardiomyopathy",
                          "ischemic cardiomyopathy",
                          "idiopathic cardiomyopathy",
                          "nonischemic cardiomyopathy",
                          "peripartum cardiomyopathy", #? maybe include in pregnancy 
                          cardiomyopathy_terms
                          )

gwas_study_info = gwas_study_info |>
  mutate(l1_all_disease_terms  = 
         stringr::str_replace_all(l1_all_disease_terms,
                                  pattern = vec_to_grep_pattern(cardiomyopathy_terms),
                          # pattern = paste0(cardiomyopathy_terms, collapse = "(?=,|$) |\\b"),
                          "cardiomyopathy"
         ))

gwas_study_info =
gwas_study_info |> 
 mutate(l1_all_disease_terms  = 
          stringr::str_replace_all(l1_all_disease_terms ,
                                  pattern = vec_to_grep_pattern("\\bidiopathic cardiomyopathy\\b"),
                                   "cardiomyopathy"
                          )  
        )

8.3.4 Stroke

See the EFO entry for stroke: https://www.ebi.ac.uk/ols4/ontologies/efo/classes/http%253A%252F%252Fwww.ebi.ac.uk%252Fefo%252FEFO_0000712?lang=en

hemorrhage_terms <- c("intracerebral hemorrhage",
                      "non-lobar intracerebral hemorrhage",
                      "lobar intracerebral hemorrhage",
                      "parenchymal hematoma")

gwas_study_info = gwas_study_info |>
  mutate(l1_all_disease_terms  = 
         stringr::str_replace_all(l1_all_disease_terms,
                                  pattern = vec_to_grep_pattern(hemorrhage_terms),
                          # pattern = paste0(hemorrhage_terms, collapse = "(?=,|$) |\\b"),
                          "hemorrhagic stroke")
  )

stroke_terms <- c("stroke outcome",
                  "large artery stroke",
                  "small vessel stroke",
                  "ischemic stroke",
                  "stroke disorder",
                  "cardioembolic stroke",
                  "hemorrhagic stroke",
                  "intracranial hemorrhage",
                  "subdural hemorrhage"
                  )

gwas_study_info = gwas_study_info |>
  mutate(l1_all_disease_terms  = 
         stringr::str_replace_all(l1_all_disease_terms,
                          pattern = vec_to_grep_pattern(stroke_terms),
                          # paste0(stroke_terms, collapse = "(?=,|$) |\\b"),
                          "stroke")
  )


gwas_study_info = gwas_study_info |>
  mutate(l1_all_disease_terms  = 
         stringr::str_replace_all(l1_all_disease_terms,
                          vec_to_grep_pattern("post-operative stroke"),
                          "post-operative, stroke")
  )
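Because the stroke grouping runs in several passes (hemorrhage terms first, then stroke subtypes, then the post-operative form), a quick check for subtypes that slipped through can be useful. The hypothetical `leftover_terms()` below is a base R sketch of such a check on a toy column; it is not part of the original analysis.

```r
# Hypothetical check: which of `terms` still appear as whole entries
# in a comma-separated term column?
leftover_terms <- function(column, terms) {
  entries <- trimws(unlist(strsplit(column, ",", fixed = TRUE)))
  intersect(terms, entries)
}

# Toy column in which one subtype was not collapsed
toy_column <- c("stroke, asthma", "ischemic stroke", "post-operative, stroke")

leftover <- leftover_terms(toy_column,
                           c("ischemic stroke", "hemorrhagic stroke"))
```

An empty result would indicate that every listed subtype has been folded into its group; here one subtype remains.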

8.3.5 Lower extremity peripheral artery disease

# studies in gwas catalog don't specify lower or upper extremity
# but most peripheral artery disease is in lower extremities

gwas_study_info =
 gwas_study_info |>
  mutate(l1_all_disease_terms  = 
         stringr::str_replace_all(l1_all_disease_terms,
                          vec_to_grep_pattern("\\bperipheral artery disease|\\bperipheral arterial disease"),
                          "lower extremity peripheral arterial disease"
         ))

8.4 Other (non-GBD)

8.4.1 Carotid artery disease

carotid_artery_disease_terms <- c("carotid artery thrombosis",
                             "carotid atherosclerosis"
                             )

gwas_study_info = gwas_study_info |>
  mutate(l1_all_disease_terms  = 
         stringr::str_replace_all(l1_all_disease_terms,
                                  pattern = vec_to_grep_pattern(carotid_artery_disease_terms),
                          # pattern = paste0(carotid_artery_disease_terms, collapse = "(?=,|$) |\\b"),
                          "carotid artery disease"
         ))

8.4.2 Other hypertension

url <-  "http://www.ebi.ac.uk/ols4/api/ontologies/efo/terms/http%253A%252F%252Fwww.ebi.ac.uk%252Fefo%252FEFO_0000537/descendants"

hypertension_terms <- get_descendants(url)

hypertension_terms = c("primary hypertension",
                       "chronic venous hypertension",
                       "portal hypertension",
                      hypertension_terms
                      )

hypertension_terms = hypertension_terms[!hypertension_terms %in% 
                                        c(pulmonary_arterial_hypertension_terms,
                                          hypertensive_heart_disease_terms,
                                          "pulmonary arterial hypertension",
                                          "hypertensive heart disease"
                                          )
                                        ]

gwas_study_info = gwas_study_info |>
  mutate(l1_all_disease_terms  = 
         stringr::str_replace_all(l1_all_disease_terms,
                                  pattern = vec_to_grep_pattern(hypertension_terms),
                          # pattern = paste0(hypertension_terms, collapse = "(?=,|$) |\\b"),
                          "hypertension"
         ))

8.4.3 Heart block

8.4.4 Bundle branch block

bundle_branch_block_terms <- c("left bundle branch block",
                             "right bundle branch block",
                             "complete right bundle branch block"
                             )

gwas_study_info = gwas_study_info |>
  mutate(l1_all_disease_terms  = 
         stringr::str_replace_all(l1_all_disease_terms,
                          pattern = vec_to_grep_pattern(bundle_branch_block_terms),
                          "bundle branch block"
         )
  )

heart_block_terms <- c("first degree atrioventricular block",
                      "second degree atrioventricular block",
                      "third-degree atrioventricular block",
                      "atrioventricular block",
                      "bundle branch block",
                      "complete right heart block",
                      "left heart block"
                      )

gwas_study_info = gwas_study_info |>
  mutate(l1_all_disease_terms  = 
         stringr::str_replace_all(l1_all_disease_terms,
                          pattern = vec_to_grep_pattern(heart_block_terms),
                          # pattern = paste0(heart_block_terms, collapse = "(?=,|$) |\\b"),
                          "heart block"
         ))

8.4.5 Tachycardia

tacycardia_terms <- c("paroxysmal tachycardia",
                     "ventricular tachycardia",
                     "supraventricular tachycardia",
                     "paroxysmal supraventricular tachycardia",
                     "paroxysmal ventricular tachycardia"
                     )


gwas_study_info = gwas_study_info |>
  mutate(l1_all_disease_terms  = 
         stringr::str_replace_all(l1_all_disease_terms,
                                  pattern = vec_to_grep_pattern(tacycardia_terms),
                          # pattern = paste0(tacycardia_terms, collapse = "(?=,|$) |\\b"),
                          "tachycardia"
         )
  )

8.4.6 Cardiac arrest

gwas_study_info = gwas_study_info |>
  mutate(l1_all_disease_terms  = 
         stringr::str_replace_all(l1_all_disease_terms,
                         vec_to_grep_pattern("sudden cardiac arrest"),
                          "cardiac arrest"
         )
  )

8.4.7 Other cardiac arrhythmias

other_cardiac_arrhythmia_terms <- c("ventricular fibrillation",
                                   "ventricular arrhythmia",
                                   "cardiac arrest",
                                   "bradycardia",
                                   "cardiac arrhythmia"
                                   )


gwas_study_info = gwas_study_info |>
  mutate(l1_all_disease_terms  = 
         stringr::str_replace_all(l1_all_disease_terms,
                                  pattern = vec_to_grep_pattern(other_cardiac_arrhythmia_terms),
                          # pattern = paste0(other_cardiac_arrhythmia_terms, collapse = "(?=,|$) |\\b"),
                          "other cardiac arrhythmias"
         )
  )

8.4.8 Abnormal cardiovascular system morphology

url <- "http://www.ebi.ac.uk/ols4/api/ontologies/mp/terms/http%253A%252F%252Fpurl.obolibrary.org%252Fobo%252FMP_0002127/descendants"

mp_terms <- get_descendants(url)

gwas_study_info = gwas_study_info |>
  mutate(l1_all_disease_terms  = 
         stringr::str_replace_all(l1_all_disease_terms,
                                  pattern = vec_to_grep_pattern(mp_terms),
                          "abnormal cardiovascular system morphology"
         )
  )

8.4.9 Thrombosis

url <- "http://www.ebi.ac.uk/ols4/api/ontologies/mondo/terms/http%253A%252F%252Fpurl.obolibrary.org%252Fobo%252FMONDO_0000831/descendants"

thrombosis_terms <- get_descendants(url)

thrombosis_terms <- c("abnormal thrombosis",
                      thrombosis_terms)

gwas_study_info = gwas_study_info |>
  mutate(l1_all_disease_terms  = 
         stringr::str_replace_all(l1_all_disease_terms,
                                  pattern = vec_to_grep_pattern(thrombosis_terms),
                          # pattern = paste0(thrombosis_terms, collapse = "(?=,|$) |\\b"),
                          "thrombotic disease"
         )
  )

8.4.10 Arterial occlusion

arterial_occlusion_terms <- c("cerebral artery occlusion",
                              "occlusion precerebral artery",
                              "small artery occlusion"
                              )

gwas_study_info = gwas_study_info |>
  mutate(l1_all_disease_terms  = 
         stringr::str_replace_all(l1_all_disease_terms,
                                  pattern = vec_to_grep_pattern(arterial_occlusion_terms),
                          # pattern = paste0(arterial_occlusion_terms, collapse = "(?=,|$) |\\b"),
                          "arterial occlusion"
         )
  )

8.4.11 Other vascular disorders / diseases

other_vascular_disorder_terms <- c("skin vascular disease",
                                   "peripheral vascular disease",
                                   "vascular disease",
                                   "vasculitis",
                                   "arteritis")

gwas_study_info = gwas_study_info |>
  mutate(l1_all_disease_terms  = 
         stringr::str_replace_all(l1_all_disease_terms,
                          pattern = vec_to_grep_pattern(other_vascular_disorder_terms),
                          "other vascular disorders"
         )
  )

8.5 Congestive heart failure

congestive_heart_failure_terms <- c("systolic heart failure",
                                 "diastolic heart failure",
                                 "cor pulmonale",
                                 "congenital heart disease",
                                 "chronic pulmonary heart disease"
                                 )


gwas_study_info = gwas_study_info |>
  mutate(l1_all_disease_terms  = 
         stringr::str_replace_all(l1_all_disease_terms,
                                  pattern = vec_to_grep_pattern(congestive_heart_failure_terms),
                          # pattern = paste0(congestive_heart_failure_terms, collapse = "(?=,|$) |\\b"),
                          "congestive heart failure"
         ))

8.6 Brain infarction

brain_infarction_terms <- c("cerebral infarction",
                          "mri defined brain infarct",
                          "small cerebral infarction"
                          )

gwas_study_info = gwas_study_info |>
  mutate(l1_all_disease_terms  = 
         stringr::str_replace_all(l1_all_disease_terms,
                                  pattern = vec_to_grep_pattern(brain_infarction_terms),
                          # pattern = paste0(brain_infarction_terms, collapse = "(?=,|$) |\\b"),
                          "brain infarction"
         ))

8.7 Vascular insufficiency

vascular_insufficiency_terms <- c("vascular insufficiency disorder",
                                 "chronic venous insufficiency",
                                  "chronic intestinal vascular insufficiency"
                                 )

gwas_study_info = gwas_study_info |>
  mutate(l1_all_disease_terms  = 
         stringr::str_replace_all(l1_all_disease_terms,
                                  pattern = vec_to_grep_pattern(vascular_insufficiency_terms),
                          # pattern = paste0(vascular_insufficiency_terms, collapse = "(?=,|$) |\\b"),
                          "vascular insufficiency"
         )
  )

9 Disease subtype grouping (non-cancer)

9.1 Abnormal total eosinophil count

url <- "http://www.ebi.ac.uk/ols4/api/ontologies/hp/terms/http%253A%252F%252Fpurl.obolibrary.org%252Fobo%252FHP_0020064/descendants"

abnormal_total_eosinophil_count_terms <- get_descendants(url)

gwas_study_info = gwas_study_info |>
  mutate(l1_all_disease_terms  = 
         stringr::str_replace_all(l1_all_disease_terms,
                                  pattern = vec_to_grep_pattern(abnormal_total_eosinophil_count_terms),
                          # pattern = paste0(abnormal_total_eosinophil_count_terms, collapse = "(?=,|$) |\\b"),
                          "abnormal total eosinophil count"
         ))
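
get_descendants() is defined elsewhere in the analysis, so its behaviour is not visible here. The sketch below shows one way term labels could be pulled out of an already-parsed OLS response. It assumes the response is a nested list with the terms under `_embedded` and `terms`, each carrying a `label` field, and that labels are lower-cased to match the strings in l1_all_disease_terms. The function name extract_term_labels() and the mock response are hypothetical. OLS returns results in pages, so a complete helper would also need to request the remaining pages, which is not shown.

```r
# Hypothetical sketch; not the project's get_descendants().
# `parsed` is assumed to be a nested list, e.g. a JSON response parsed
# without simplification into data frames.
extract_term_labels <- function(parsed) {
  terms <- parsed[["_embedded"]][["terms"]]
  labels <- vapply(terms, function(term) term[["label"]], character(1))
  # Lower-case and de-duplicate (an assumption about the desired output)
  unique(tolower(labels))
}

# Made-up response with placeholder labels, for illustration only
mock_response <- list(
  `_embedded` = list(
    terms = list(
      list(label = "Example Term A"),
      list(label = "Example Term B"),
      list(label = "Example Term A")
    )
  )
)

extract_term_labels(mock_response)
# "example term a" "example term b"
```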

9.2 Abortion

pregnancy_loss_terms <- c("habitual abortion",
                          "incomplete abortion",
                          "spontaneous abortion",
                          "abortion",
                          "spontaneous loss of pregnancy",
                          "incomplete loss of pregnancy"
                          )

gwas_study_info = gwas_study_info |>
  mutate(l1_all_disease_terms = 
         stringr::str_replace_all(l1_all_disease_terms,
                                  pattern = vec_to_grep_pattern(pregnancy_loss_terms),
                          # paste0(pregnancy_loss_terms, collapse = "(?=,|$) |\\b"),
                          "loss of pregnancy"
         ))

# exact synonyms (to avoid partial matches), see:
# https://www.ebi.ac.uk/ols4/ontologies/efo/classes/http%253A%252F%252Fwww.ebi.ac.uk%252Fefo%252FEFO_1001491?lang=en

9.3 Acute pancreatitis

gwas_study_info = gwas_study_info |>
  mutate(l1_all_disease_terms  = 
         stringr::str_replace_all(l1_all_disease_terms,
                          vec_to_grep_pattern("asparaginase-induced acute pancreatitis"),
                          "acute pancreatitis"
         ))

9.5 Alcohol-use disorders

9.6 Alcoholic liver disease

alcoholic_liver_disease_terms <- c("alcoholic liver cirrhosis",
                                   "alcoholic fatty liver disease",
                                   "alcoholic hepatitis"
                                   )


gwas_study_info = gwas_study_info |>
      mutate(l1_all_disease_terms = 
           stringr::str_replace_all(l1_all_disease_terms,
                            pattern = vec_to_grep_pattern(alcoholic_liver_disease_terms),
                            "alcoholic liver disease"
           ))

9.7 Alzheimer’s disease

gwas_study_info = gwas_study_info |>
  mutate(l1_all_disease_terms  = 
         stringr::str_replace_all(l1_all_disease_terms,
                                  pattern = vec_to_grep_pattern(
                                    c("early-onset alzheimers disease",
                                  "late-onset alzheimers disease")),
                          "alzheimers disease"
         ))

9.8 Allergic contact dermatitis

gwas_study_info = gwas_study_info |>
  mutate(l1_all_disease_terms  = 
         stringr::str_replace_all(l1_all_disease_terms,
                          vec_to_grep_pattern("allergic contact dermatitis of eyelid"),
                          "allergic contact dermatitis"
         ))

9.9 Allergic rhinitis

gwas_study_info = gwas_study_info |>
  mutate(l1_all_disease_terms  = 
         stringr::str_replace_all(l1_all_disease_terms,
                          vec_to_grep_pattern("seasonal allergic rhinitis"),
                          "allergic rhinitis"
         ))

9.10 Aplastic anemia

url <- "http://www.ebi.ac.uk/ols4/api/ontologies/mondo/terms/http%253A%252F%252Fpurl.obolibrary.org%252Fobo%252FMONDO_0015909/descendants"

aplastic_anemia_terms <- get_descendants(url)

aplastic_anemia_terms <- c("severe aplastic anemia",
                           aplastic_anemia_terms
                          )

gwas_study_info = gwas_study_info |>
  mutate(l1_all_disease_terms  = 
         stringr::str_replace_all(l1_all_disease_terms,
                          pattern = vec_to_grep_pattern(aplastic_anemia_terms),
                          "aplastic anemia"
         ))

9.11 Autoimmune pancreatitis

gwas_study_info = gwas_study_info |>
      mutate(l1_all_disease_terms  = 
           stringr::str_replace_all(l1_all_disease_terms,
                            vec_to_grep_pattern("autoimmune pancreatitis type 1"),
                            "autoimmune pancreatitis"
           )) 

9.12 Cataract

url <- "http://www.ebi.ac.uk/ols4/api/ontologies/mondo/terms/http%253A%252F%252Fpurl.obolibrary.org%252Fobo%252FMONDO_0005129/descendants"

cataract_terms <- get_descendants(url)

gwas_study_info = gwas_study_info |>
  mutate(l1_all_disease_terms  = 
         stringr::str_replace_all(l1_all_disease_terms,
                          pattern = vec_to_grep_pattern(cataract_terms),
                          "cataract"
         ))

9.13 Charcot-Marie-Tooth disease type 1A

gwas_study_info = gwas_study_info |>
  mutate(l1_all_disease_terms  = 
         stringr::str_replace_all(l1_all_disease_terms,
                          vec_to_grep_pattern("charcot-marie-tooth disease type 1a, decreased fine motor function"),
                          "charcot-marie-tooth disease type 1a"
         ))


gwas_study_info = gwas_study_info |>
  mutate(l1_all_disease_terms  = 
         stringr::str_replace_all(l1_all_disease_terms,
                          vec_to_grep_pattern("charcot-marie-tooth disease type 1a, decreased walking ability"),
                          "charcot-marie-tooth disease type 1a"
         ))

gwas_study_info = gwas_study_info |>
  mutate(l1_all_disease_terms  = 
         stringr::str_replace_all(l1_all_disease_terms,
                          vec_to_grep_pattern("charcot-marie-tooth disease type 1a, gait imbalance"),
                          "charcot-marie-tooth disease type 1a, gait disturbance"
         ))

9.14 Chronic pain

url <- "http://www.ebi.ac.uk/ols4/api/ontologies/efo/terms/http%253A%252F%252Fpurl.obolibrary.org%252Fobo%252FHP_0012532/descendants"

chronic_pain_terms <- get_descendants(url)

gwas_study_info = gwas_study_info |>
  mutate(l1_all_disease_terms  = 
         stringr::str_replace_all(l1_all_disease_terms,
                          pattern = vec_to_grep_pattern(chronic_pain_terms),
                          "chronic pain"
         ))

9.15 Cholecystitis

gwas_study_info = gwas_study_info |>
  mutate(l1_all_disease_terms  = 
         stringr::str_replace_all(l1_all_disease_terms,
                          vec_to_grep_pattern("acute cholecystitis"),
                          "cholecystitis"
         ))

9.16 Communication disorder

url <- "http://www.ebi.ac.uk/ols4/api/ontologies/mondo/terms/http%253A%252F%252Fpurl.obolibrary.org%252Fobo%252FMONDO_0002182/descendants"

communication_disorder_terms <- get_descendants(url)

gwas_study_info = gwas_study_info |>
  mutate(l1_all_disease_terms  = 
         stringr::str_replace_all(l1_all_disease_terms,
                          pattern = vec_to_grep_pattern(communication_disorder_terms),
                          "communication disorder"
         ))

9.17 Delirium

gwas_study_info = gwas_study_info |>
  mutate(l1_all_disease_terms  = 
         stringr::str_replace_all(l1_all_disease_terms,
                          vec_to_grep_pattern("post-operative delirium"),
                          "delirium, post-operative"
         ))

9.18 Dermatitis

url <- "http://www.ebi.ac.uk/ols4/api/ontologies/doid/terms/http%253A%252F%252Fpurl.obolibrary.org%252Fobo%252FDOID_2723/descendants"

dermatitis_terms <- get_descendants(url)

gwas_study_info = gwas_study_info |>
  mutate(l1_all_disease_terms  = 
         stringr::str_replace_all(l1_all_disease_terms,
                          pattern = vec_to_grep_pattern(dermatitis_terms),
                          "dermatitis"
         ))

# see: http://www.ebi.ac.uk/efo/EFO_0000274
eczema_terms <- c("atopic eczema",
                 "hand eczema",
                 "eczematoid dermatitis", # see: http://purl.obolibrary.org/obo/HP_0000964
                 "recalcitrant dermatitis" # assuming same as recalcitrant atopic dermatitis http://www.ebi.ac.uk/efo/EFO_1000651
                 )

gwas_study_info = gwas_study_info |>
  mutate(l1_all_disease_terms  = 
         stringr::str_replace_all(l1_all_disease_terms,
                          pattern = vec_to_grep_pattern(eczema_terms),
                          "dermatitis"
         ))

9.19 Diabetic eye disease

diabetic_eye_terms <- c("diabetic maculopathy",
                        "diabetic macular edema",
                        "diabetic retinopathy",
                        "proliferative diabetic retinopathy",
                        "non-proliferative diabetic retinopathy",
                        "diabetes mellitus type 2 associated cataract")


gwas_study_info = 
  gwas_study_info |> 
  mutate(l1_all_disease_terms  = 
           stringr::str_replace_all(l1_all_disease_terms,
                                    pattern = vec_to_grep_pattern(diabetic_eye_terms),
                                    "diabetic eye disease"
           )  
  )

9.20 Diabetic neuropathy

gwas_study_info = gwas_study_info |>
  mutate(l1_all_disease_terms  = 
         stringr::str_replace_all(l1_all_disease_terms,
                          vec_to_grep_pattern("diabetic polyneuropathy"),
                          "diabetic neuropathy"
         ))

9.21 Deficiency anemia

gwas_study_info = gwas_study_info |>
  mutate(l1_all_disease_terms  = 
         stringr::str_replace_all(l1_all_disease_terms,
                          vec_to_grep_pattern("iron deficiency anemia"),
                          "deficiency anemia"
         ))

9.22 Encephalopathy

gwas_study_info = gwas_study_info |>
  mutate(l1_all_disease_terms  = 
         stringr::str_replace_all(l1_all_disease_terms,
                          vec_to_grep_pattern("delayed encephalopathy after acute poisoning"),
                          "encephalopathy, poisoning"
         ))


gwas_study_info = gwas_study_info |>
  mutate(l1_all_disease_terms  = 
         stringr::str_replace_all(l1_all_disease_terms,
                          vec_to_grep_pattern("encephalopathy acute infection-induced"),
                          "encephalopathy"
         ))

9.23 Gout

# from: http://www.ebi.ac.uk/efo/EFO_0004274
gwas_study_info = gwas_study_info |>
  mutate(l1_all_disease_terms  = 
         stringr::str_replace_all(l1_all_disease_terms,
                          vec_to_grep_pattern("renal overload-type gout"),
                          "gout"
         ))

9.24 Hemolytic anemia

url <- "http://www.ebi.ac.uk/ols4/api/ontologies/mondo/terms/http%253A%252F%252Fpurl.obolibrary.org%252Fobo%252FMONDO_0003664/descendants"


hemolytic_anemia_terms <- get_descendants(url)

gwas_study_info = gwas_study_info |>
  mutate(l1_all_disease_terms  = 
         stringr::str_replace_all(l1_all_disease_terms,
                          pattern = vec_to_grep_pattern(hemolytic_anemia_terms),
                          "hemolytic anemia"
         ))

9.25 Hereditary progressive muscular dystrophy

url <- "http://www.ebi.ac.uk/ols4/api/ontologies/snomed/terms/http%253A%252F%252Fsnomed.info%252Fid%252F193225000/descendants"

hereditary_progressive_muscular_dystrophy_terms <- get_descendants(url)

gwas_study_info = gwas_study_info |>
  mutate(l1_all_disease_terms  = 
         stringr::str_replace_all(l1_all_disease_terms,
                          pattern = vec_to_grep_pattern(hereditary_progressive_muscular_dystrophy_terms),
                          "hereditary progressive muscular dystrophy"
         ))

9.26 Hernia of the abdominal wall

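# NOTE: unlike the other sections, this URL requests the /children endpoint
# (direct children only) rather than /descendants.
url <- "http://www.ebi.ac.uk/ols4/api/ontologies/hp/terms/http%253A%252F%252Fpurl.obolibrary.org%252Fobo%252FHP_0004299/children"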

hernia_abdominal_wall_terms <- get_descendants(url)

gwas_study_info = gwas_study_info |>
  mutate(l1_all_disease_terms  = 
         stringr::str_replace_all(l1_all_disease_terms,
                          pattern = vec_to_grep_pattern(hernia_abdominal_wall_terms),
                          "hernia of the abdominal wall"
         ))

9.27 Hyperaldosteronism

gwas_study_info = gwas_study_info |>
  mutate(l1_all_disease_terms  = 
         stringr::str_replace_all(l1_all_disease_terms,
                          vec_to_grep_pattern("primary hyperaldosteronism"),
                          "hyperaldosteronism"
         ))

9.28 Hyperlipidemia

hyperlipidemia_terms <- c("hypercholesterolemia",
                         "familial hypercholesterolemia",
                         "familial hyperlipidemia"
                         )

gwas_study_info = gwas_study_info |>
  mutate(l1_all_disease_terms  = 
         stringr::str_replace_all(l1_all_disease_terms,
                          pattern = vec_to_grep_pattern(hyperlipidemia_terms),
                          "hyperlipidemia"
         ))

9.29 Inborn errors of metabolism

inborn_error_metab <- c("inborn disorder of amino acid metabolism", # ICD9:270, ICD9:270.9
                        "inborn disorder of amino acid transport", # ICD9:270.0
                        "inborn disorder of porphyrin metabolism", # ? ebi suggests ... ICD10CM:E80.4 - doesn't seem right match, but close - E80 range likely captures it
                        "inborn carbohydrate metabolic disorder", # ICD9:271.8
                        "familial lipoprotein lipase deficiency", # ICD9:272.3
                        "lactose intolerance", # ICD10: E73
                        "hereditary hemochromatosis", # ICD10CM:E83.110, ICD9: 275.01
                        "alpha 1-antitrypsin deficiency", # ICD9:273.4
                        "urea cycle disorder", # ICD9CM:270.6
                        "gaucher disease", # ICD10CM:E75.22, ICD9:272.7
                        "plasma protein metabolism disease", # ? ICD9:273.8
                        "disorder of metabolite absorption and transport"
                        )

gwas_study_info = gwas_study_info |>
  mutate(l1_all_disease_terms  = 
         stringr::str_replace_all(l1_all_disease_terms,
                          pattern = vec_to_grep_pattern(inborn_error_metab),
                          "inborn errors of metabolism"
         ))

9.30 Intestinal obstruction

url <- "http://www.ebi.ac.uk/ols4/api/ontologies/mondo/terms/http%253A%252F%252Fpurl.obolibrary.org%252Fobo%252FMONDO_0004565/descendants"

intestinal_obstruction_terms <- get_descendants(url)

# also add: http://purl.obolibrary.org/obo/DOID_8437

intestinal_obstruction_terms = c(intestinal_obstruction_terms,
                                 "intestinal impaction"
                                 )

gwas_study_info = gwas_study_info |>
  mutate(l1_all_disease_terms  = 
         stringr::str_replace_all(l1_all_disease_terms,
                          pattern = vec_to_grep_pattern(intestinal_obstruction_terms),
                          "intestinal obstruction"
         ))

9.31 Chronic kidney disease

url <- "http://www.ebi.ac.uk/ols4/api/ontologies/efo/terms/http%253A%252F%252Fwww.ebi.ac.uk%252Fefo%252FEFO_0003884/descendants"


kidney_disease_terms <- get_descendants(url)

gwas_study_info = gwas_study_info |>
  mutate(l1_all_disease_terms  = 
         stringr::str_replace_all(l1_all_disease_terms,
                          pattern = vec_to_grep_pattern(kidney_disease_terms),
                          "chronic kidney disease"
         ))

9.32 Penile disorder

penile_disorder <- c("balanitis",
                     "balanoposthitis",
                     "priapism",
                     "phimosis",
                     "erectile dysfunction",
                     "leukoplakia of penis")


gwas_study_info = gwas_study_info |>
  mutate(l1_all_disease_terms  = 
         stringr::str_replace_all(l1_all_disease_terms,
                          pattern = vec_to_grep_pattern(penile_disorder),
                          "penile disorder"
         ))

9.33 Photosensitivity disease

gwas_study_info = gwas_study_info |>
  mutate(l1_all_disease_terms  = 
        stringr::str_replace_all(l1_all_disease_terms,
                         vec_to_grep_pattern(
                           c("phototoxic dermatitis",
                           "skin sensitivity to sun"
                           )),
                         "photosensitivity disease")
        )

9.34 Phobic disorder

url <- "http://www.ebi.ac.uk/ols4/api/ontologies/efo/terms/http%253A%252F%252Fwww.ebi.ac.uk%252Fefo%252FEFO_1001908/descendants"

phobic_disorder_terms <- get_descendants(url)

gwas_study_info = gwas_study_info |>
  mutate(l1_all_disease_terms  = 
         stringr::str_replace_all(l1_all_disease_terms,
                          pattern = vec_to_grep_pattern(phobic_disorder_terms),
                          "phobic disorder"
         ))

9.35 Mineral metabolism disease

url <- "http://www.ebi.ac.uk/ols4/api/ontologies/efo/terms/http%253A%252F%252Fwww.ebi.ac.uk%252Fefo%252FEFO_0009556/descendants"

mineral_metabolism_disorder_terms <- get_descendants(url)

gwas_study_info = gwas_study_info |>
  mutate(l1_all_disease_terms  = 
         stringr::str_replace_all(l1_all_disease_terms,
                          pattern = vec_to_grep_pattern(mineral_metabolism_disorder_terms),
                          "mineral metabolism disease"
         ))

9.36 Non-alcoholic fatty liver disease (NAFLD)

gwas_study_info = gwas_study_info |>
  mutate(l1_all_disease_terms  = 
         stringr::str_replace_all(l1_all_disease_terms,
                                  vec_to_grep_pattern(
                                    c("non-alcoholic steatohepatitis",
                                      "non-alcoholic liver disease"
                                     )
                                   ),
                          "non-alcoholic fatty liver disease"
         ))

9.37 Orofacial cleft

orofacial_cleft_terms <- c("cleft lip",
                            "cleft palate"
                            )


gwas_study_info = gwas_study_info |>
  mutate(l1_all_disease_terms  = 
         stringr::str_replace_all(l1_all_disease_terms,
                          pattern = vec_to_grep_pattern(orofacial_cleft_terms),
                          "orofacial cleft"
         ))

9.38 Osteoarthritis

osteoarthritis_terms <- c("knee osteoarthritis",
                         "hip osteoarthritis",
                         "hand osteoarthritis",
                         "spine osteoarthritis",
                         "toe osteoarthritis"
                         )

gwas_study_info = gwas_study_info |>
  mutate(l1_all_disease_terms  = 
         stringr::str_replace_all(l1_all_disease_terms,
                          pattern = vec_to_grep_pattern(osteoarthritis_terms),
                          "osteoarthritis"
         )
  )

9.39 Ovarian dysfunction

url <- "http://www.ebi.ac.uk/ols4/api/ontologies/efo/terms/http%253A%252F%252Fwww.ebi.ac.uk%252Fefo%252FEFO_0009003/descendants"

ovarian_dysfunction_terms <- get_descendants(url)

ovarian_dysfunction_terms <- c(ovarian_dysfunction_terms,
                              "premature ovarian failure",
                              "premature ovarian insufficiency"
                              )

gwas_study_info = gwas_study_info |>
  mutate(l1_all_disease_terms  = 
         stringr::str_replace_all(l1_all_disease_terms,
                          pattern = vec_to_grep_pattern(ovarian_dysfunction_terms),
                          "ovarian dysfunction"
         ))

9.40 Pancreatitis

url <- "http://www.ebi.ac.uk/ols4/api/ontologies/efo/terms/http%253A%252F%252Fwww.ebi.ac.uk%252Fefo%252FEFO_0000278/descendants"

pancreatitis_terms <- get_descendants(url)

gwas_study_info = gwas_study_info |>
  mutate(l1_all_disease_terms  = 
         stringr::str_replace_all(l1_all_disease_terms,
                          pattern = vec_to_grep_pattern(pancreatitis_terms),
                          "pancreatitis"
         ))

9.41 Periodontal disease

url <- "http://www.ebi.ac.uk/ols4/api/ontologies/doid/terms/http%253A%252F%252Fpurl.obolibrary.org%252Fobo%252FDOID_3388/descendants"

periodontal_terms <- get_descendants(url)

gwas_study_info = gwas_study_info |>
  mutate(l1_all_disease_terms  = 
         stringr::str_replace_all(l1_all_disease_terms,
                          pattern = vec_to_grep_pattern(periodontal_terms),
                          "periodontal disease"
         ))

9.42 Pituitary gland disease

url <- "http://www.ebi.ac.uk/ols4/api/ontologies/efo/terms/http%253A%252F%252Fwww.ebi.ac.uk%252Fefo%252FEFO_0009607/descendants"

pituitary_gland_disease_terms <- get_descendants(url)

gwas_study_info = gwas_study_info |>
  mutate(l1_all_disease_terms  = 
         stringr::str_replace_all(l1_all_disease_terms,
                          pattern = vec_to_grep_pattern(pituitary_gland_disease_terms),
                          "pituitary gland disease"
         ))

9.43 Polycythemia

url <- "http://www.ebi.ac.uk/ols4/api/ontologies/efo/terms/http%253A%252F%252Fwww.ebi.ac.uk%252Fefo%252FEFO_0005804/descendants"

polycythemia_terms <- get_descendants(url)

gwas_study_info = gwas_study_info |>
  mutate(l1_all_disease_terms  = 
         stringr::str_replace_all(l1_all_disease_terms,
                          pattern = vec_to_grep_pattern(polycythemia_terms),
                          "polycythemia"
         ))

9.44 Pulmonary embolism

gwas_study_info = gwas_study_info |>
  mutate(l1_all_disease_terms  = 
         stringr::str_replace_all(l1_all_disease_terms,
                          vec_to_grep_pattern("pulmonary embolism, pulmonary infarction"),
                          "pulmonary embolism"
         ))

9.45 Psychosis

gwas_study_info = gwas_study_info |>
  mutate(l1_all_disease_terms  = 
         stringr::str_replace_all(l1_all_disease_terms,
                          vec_to_grep_pattern("psychosis predisposition"),
                          "psychosis")
         )

9.46 Retinopathy

retinopathy_terms <- c("chronic central serous retinopathy",
                       "central serous retinopathy"
                       )

gwas_study_info = gwas_study_info |>
  mutate(l1_all_disease_terms = 
           stringr::str_replace_all(l1_all_disease_terms,
                          pattern = vec_to_grep_pattern(retinopathy_terms),
                          "retinopathy")
  )

9.47 Sciatica

gwas_study_info = gwas_study_info |>
  mutate(l1_all_disease_terms  = 
         stringr::str_remove_all(l1_all_disease_terms,
                          "^ldh-related sciatica, |, ldh-related sciatica$"
         )
  )
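
The pattern above is anchored to the start and end of the string, so it only removes "ldh-related sciatica" when it is the first or last of several terms. The toy example below illustrates this. It uses base R gsub() in place of stringr::str_remove_all() to stay dependency-free, and the wrapper name remove_ldh_sciatica() and the inputs are made up.

```r
# Toy illustration of the anchored removal pattern (base R stand-in)
remove_ldh_sciatica <- function(x) {
  gsub("^ldh-related sciatica, |, ldh-related sciatica$", "", x)
}

toy <- c(
  "ldh-related sciatica, sciatica",             # leading entry: removed
  "sciatica, ldh-related sciatica",             # trailing entry: removed
  "back pain, ldh-related sciatica, sciatica",  # middle entry: unchanged
  "ldh-related sciatica"                        # only entry: unchanged
)

remove_ldh_sciatica(toy)
```

If the term can also occur in the middle of a list or as the only term, those rows keep it under this pattern.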

9.48 Sickle cell disease

gwas_study_info = gwas_study_info |>
  mutate(l1_all_disease_terms  = 
         stringr::str_replace_all(l1_all_disease_terms,
                          vec_to_grep_pattern("sickle cell anemia"),
                          "sickle cell disease and related diseases")
         )

9.49 Staphylococcus aureus infection

gwas_study_info = gwas_study_info |>
  mutate(l1_all_disease_terms  = 
         stringr::str_replace_all(l1_all_disease_terms,
                          vec_to_grep_pattern("methicillin-resistant staphylococcus aureus infection"),
                          "staphylococcus aureus infection"
         ))

9.50 Systemic scleroderma

url <- "http://www.ebi.ac.uk/ols4/api/ontologies/efo/terms/http%253A%252F%252Fwww.ebi.ac.uk%252Fefo%252FEFO_0000717/descendants"

scleroderma_terms <- get_descendants(url)

gwas_study_info = gwas_study_info |>
  mutate(l1_all_disease_terms  = 
         stringr::str_replace_all(l1_all_disease_terms,
                          pattern = vec_to_grep_pattern(scleroderma_terms),
                          "systemic scleroderma"
         ))

9.51 Tenosynovitis

gwas_study_info = gwas_study_info |>
  mutate(l1_all_disease_terms  = 
        stringr::str_replace_all(l1_all_disease_terms,
                         vec_to_grep_pattern("stenosing tenosynovitis"),
                         "tenosynovitis")
        )

9.52 Thrombocytopenia

thrombocytopenia_terms <- c("thrombocytopenia 4",
                           "acquired thrombocytopenia",
                           "primary thrombocytopenia"
                           )

gwas_study_info = gwas_study_info |>
  mutate(l1_all_disease_terms  = 
         stringr::str_replace_all(l1_all_disease_terms,
                          pattern = vec_to_grep_pattern(thrombocytopenia_terms),
                          "thrombocytopenia"
         ))

9.53 Tinea

tinea_terms <- c("tinea pedis",
                 "tinea unguium",
                 "dermatophytosis"
                 )


gwas_study_info = gwas_study_info |>
  mutate(l1_all_disease_terms  = 
         stringr::str_replace_all(l1_all_disease_terms,
                          pattern = vec_to_grep_pattern(tinea_terms),
                          "tinea"
         ))

9.54 Vitamin B deficiency

gwas_study_info = gwas_study_info |>
  mutate(l1_all_disease_terms  = 
         stringr::str_replace_all(l1_all_disease_terms,
                          vec_to_grep_pattern("vitamin b12 deficiency"),
                          "vitamin b deficiency"
         ))

9.55 Visual impairment

url <- "http://www.ebi.ac.uk/ols4/api/ontologies/efo/terms/http%253A%252F%252Fpurl.obolibrary.org%252Fobo%252FHP_0000505/descendants"

visual_impairment_terms <- get_descendants(url)

gwas_study_info = gwas_study_info |>
  mutate(l1_all_disease_terms  = 
         stringr::str_replace_all(l1_all_disease_terms,
                          pattern = vec_to_grep_pattern(visual_impairment_terms),
                          "visual impairment"
         ))

10 Further grouping of terms

10.1 Abnormal refraction

url <- "http://www.ebi.ac.uk/ols4/api/ontologies/upheno/terms/http%253A%252F%252Fpurl.obolibrary.org%252Fobo%252FHP_0000539/descendants"

abnormal_refraction_terms <- get_descendants(url)

gwas_study_info = gwas_study_info |>
  mutate(l1_all_disease_terms  = 
         stringr::str_replace_all(l1_all_disease_terms,
                          pattern = vec_to_grep_pattern(abnormal_refraction_terms),
                          "abnormality of refraction"
         ))

11 Final summary - number of unique study terms

11.1 Deal with duplicate terms created during grouping

gwas_study_info = 
 gwas_study_info |>
  rowwise() |>
  mutate(l1_all_disease_terms = paste0(sort(unique(unlist(strsplit(l1_all_disease_terms, ", ")))),
                                      collapse = ", ")
         ) |>
  ungroup()
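
To make the effect of this step concrete, here is a small base R sketch of the same split, de-duplicate, sort, and re-join logic applied to made-up strings. The helper name dedupe_terms() is hypothetical. It is vectorised with vapply() rather than rowwise(), which is a design choice of the sketch and not the approach used above.

```r
# Split each string on ", ", drop duplicates, sort, and re-join
dedupe_terms <- function(x) {
  vapply(
    strsplit(x, ", "),
    function(parts) paste0(sort(unique(parts)), collapse = ", "),
    character(1)
  )
}

toy <- c("dermatitis, asthma, dermatitis", "gout")
dedupe_terms(toy)
# "asthma, dermatitis" "gout"
```

Note that sorting also reorders the terms alphabetically, so the original term order within a study is not preserved.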

11.2 Deal with hanging commas and spaces

gwas_study_info = gwas_study_info |>
  mutate(l1_all_disease_terms = stringr::str_remove_all(l1_all_disease_terms, "^,|,$")
         ) |>
  mutate(l1_all_disease_terms = stringr::str_trim(l1_all_disease_terms)
         ) 
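
As a quick check of what this clean-up does, the sketch below applies the same pattern to made-up strings. It uses base R gsub() and trimws() as stand-ins for stringr::str_remove_all() and stringr::str_trim(), and the helper name clean_term_string() is hypothetical.

```r
# Remove a comma at the very start or very end, then trim whitespace
clean_term_string <- function(x) {
  trimws(gsub("^,|,$", "", x))
}

toy <- c(", asthma, gout", "asthma, gout,", "asthma, gout")
clean_term_string(toy)
# "asthma, gout" "asthma, gout" "asthma, gout"
```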

11.3 Final summary - number of unique study-term pairs

# count distinct publications (PUBMED_ID) per combination of disease terms
n_studies_trait = gwas_study_info |>
  dplyr::filter(DISEASE_STUDY == TRUE) |>
  dplyr::select(l1_all_disease_terms, PUBMED_ID) |>
  dplyr::distinct() |>
  dplyr::group_by(l1_all_disease_terms) |>
  dplyr::summarise(n_studies = dplyr::n()) |>
  dplyr::arrange(dplyr::desc(n_studies))


head(n_studies_trait)

dim(n_studies_trait)

11.3.1 When studies with multiple terms are split into separate terms

diseases <- stringr::str_split(pattern = ", ", 
                               gwas_study_info$l1_all_disease_terms[gwas_study_info$l1_all_disease_terms != ""])  |> 
            unlist() |>
            stringr::str_trim()


test <- data.frame(trait = unique(diseases))

length(unique(diseases))

# make frequency table
freq <- table(as.factor(diseases))

# sort in decreasing order
freq_sorted <- sort(freq, decreasing = TRUE)

# show top N, e.g. top 10
head(freq_sorted, 10)
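
The same split-and-count logic can be checked on a small made-up vector. The sketch below uses base R only, and the toy strings are not taken from the GWAS Catalog data.

```r
# Toy term strings; the empty string mimics a study with no disease terms
toy_terms <- c("asthma, gout", "gout", "", "asthma, dermatitis, gout")

# Drop empty strings, split on ", ", and flatten into one vector of terms
toy_diseases <- trimws(unlist(strsplit(toy_terms[toy_terms != ""], ", ")))

# Count each term and sort from most to least frequent
toy_freq_sorted <- sort(table(toy_diseases), decreasing = TRUE)
toy_freq_sorted
```

Here "gout" is counted three times, "asthma" twice, and "dermatitis" once, since every occurrence of a term across studies contributes to its count.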

11.3.2 Save the updated gwas_study_info with harmonized disease terms

fwrite(gwas_study_info,
        here::here("output/gwas_cat/gwas_study_info_group_l1.csv")
         )

sessionInfo()