Last updated: 2025-09-16

Checks: 7 passed, 0 failed

Knit directory: genomics_ancest_disease_dispar/

This reproducible R Markdown analysis was created with workflowr (version 1.7.1). The Checks tab describes the reproducibility checks that were applied when the results were created. The Past versions tab lists the development history.


Great! Since the R Markdown file has been committed to the Git repository, you know the exact version of the code that produced these results.

Great job! The global environment was empty. Objects defined in the global environment can affect the analysis in your R Markdown file in unknown ways. For reproducibility it’s best to always run the code in an empty environment.

The command set.seed(20220216) was run prior to running the code in the R Markdown file. Setting a seed ensures that any results that rely on randomness, e.g. subsampling or permutations, are reproducible.

Great job! Recording the operating system, R version, and package versions is critical for reproducibility.

Nice! There were no cached chunks for this analysis, so you can be confident that you successfully produced the results during this run.

Great job! Using relative paths to the files within your workflowr project makes it easier to run your code on other machines.

Great! You are using Git for version control. Tracking code development and connecting the code version to the results is critical for reproducibility.

The results in this page were generated with repository version b01d9aa. See the Past versions tab to see a history of the changes made to the R Markdown and HTML files.

Note that you need to be careful to ensure that all relevant files for the analysis have been committed to Git prior to generating the results (you can use wflow_publish or wflow_git_commit). workflowr only checks the R Markdown file, but you know if there are other scripts or data files that it depends on. Below is the status of the Git repository when the results were generated:


Ignored files:
    Ignored:    .DS_Store
    Ignored:    .Rproj.user/
    Ignored:    data/.DS_Store
    Ignored:    data/gbd/.DS_Store
    Ignored:    data/gbd/ihme_gbd_2019_global_disease_burden_rate_all_ages.csv
    Ignored:    data/gbd/ihme_gbd_2019_global_paf_rate_percent_all_ages.csv
    Ignored:    data/gbd/ihme_gbd_2021_global_disease_burden_rate_all_ages.csv
    Ignored:    data/gbd/ihme_gbd_2021_global_paf_rate_percent_all_ages.csv
    Ignored:    data/gwas_catalog/
    Ignored:    data/who/
    Ignored:    output/gwas_cat/
    Ignored:    output/gwas_study_info_cohort_corrected.csv
    Ignored:    output/gwas_study_info_trait_corrected.csv
    Ignored:    output/gwas_study_info_trait_ontology_info.csv
    Ignored:    output/gwas_study_info_trait_ontology_info_l1.csv
    Ignored:    output/gwas_study_info_trait_ontology_info_l2.csv
    Ignored:    output/trait_ontology/
    Ignored:    renv/

Unstaged changes:
    Modified:   analysis/level_1_disease_group_cancer.Rmd
    Modified:   analysis/level_2_disease_group.Rmd
    Modified:   code/get_term_descendants.R

Note that any generated files, e.g. HTML, png, CSS, etc., are not included in this status report because it is ok for generated content to have uncommitted changes.


These are the previous versions of the repository in which changes were made to the R Markdown (analysis/level_1_disease_group_non_cancer.Rmd) and HTML (docs/level_1_disease_group_non_cancer.html) files. If you’ve configured a remote Git repository (see ?wflow_git_remote), click on the hyperlinks in the table below to view the files as they were in that past version.

File  Version  Author     Date        Message
Rmd   b01d9aa  IJbeasley  2025-09-16  Improving cancer grouping
html  f15743e  IJbeasley  2025-09-16  Build site.
Rmd   18a4e85  IJbeasley  2025-09-16  More disease grouping
html  922c9c3  IJbeasley  2025-09-16  Build site.
Rmd   c601713  IJbeasley  2025-09-16  Even more disease term grouping
html  add5ecc  IJbeasley  2025-09-15  Build site.
Rmd   465a689  IJbeasley  2025-09-15  More disease term grouping
html  7504c04  IJbeasley  2025-09-15  Build site.
Rmd   f01005f  IJbeasley  2025-09-15  workflowr::wflow_publish("analysis/level_1_disease_group_non_cancer.Rmd")
html  9aa118e  IJbeasley  2025-09-15  Build site.
Rmd   ffbf74a  IJbeasley  2025-09-15  Further grouping of disease terms
html  b19b361  IJbeasley  2025-09-15  Build site.
Rmd   9cc22ba  IJbeasley  2025-09-15  Dealing with duplicate disease terms
html  1679f9d  IJbeasley  2025-09-10  Build site.
html  2250e22  IJbeasley  2025-09-10  Build site.
Rmd   e3de56c  IJbeasley  2025-09-10  Update cardiac disease grouping
html  e713c34  IJbeasley  2025-09-10  Build site.
Rmd   934b11f  IJbeasley  2025-09-10  workflowr::wflow_publish("analysis/level_1_disease_group_non_cancer.Rmd")

1 Set up

library(dplyr)
library(data.table)
library(stringr)

1.1 Ontology help - for getting disease subtypes

source(here::here("code/get_term_descendants.R"))
gwas_study_info <- fread(here::here("output/gwas_cat/gwas_study_info_disease_trait_simplified.csv"))

2 Initial summary

2.1 Objectives of this analysis:

Level 1 disease grouping:

- collapse disease causes, e.g. contact dermatitis due to nickel to contact dermatitis
- collapse disease subtypes, e.g. bipolar I and bipolar II to bipolar disorder
- collapse disease onset times, e.g. early-onset alzheimers disease and late-onset alzheimers disease to alzheimers disease
- collapse disease stages
- collapse disease complications, e.g. device complication, trauma complication to complication
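
As a minimal sketch of what "collapsing" means in practice, the toy vector below (illustrative strings, not rows from gwas_study_info) is rewritten with stringr::str_replace_all, the same function used throughout this page. The variable names terms and collapsed are assumptions made for this example only.

```r
library(stringr)

# Toy examples only; these strings are illustrative, not taken from the dataset
terms <- c("contact dermatitis due to nickel",
           "early-onset alzheimers disease, asthma",
           "bipolar i disorder")

# Collapse a cause, an onset time, and a subtype to their parent term
collapsed <- terms |>
  str_replace_all("contact dermatitis due to nickel", "contact dermatitis") |>
  str_replace_all("early-onset alzheimers disease|late-onset alzheimers disease",
                  "alzheimers disease") |>
  str_replace_all("bipolar ii disorder|bipolar i disorder", "bipolar disorder")

print(collapsed)
```

Other terms in the same comma-separated string (here, asthma) are left untouched.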

2.2 Grouping - level 1 set up

gwas_study_info = gwas_study_info |>
  mutate(l1_all_disease_terms = collected_all_disease_terms)

2.3 Initial summary - number of unique study terms

n_studies_trait = gwas_study_info |>
  dplyr::filter(DISEASE_STUDY == TRUE) |>
  dplyr::select(collected_all_disease_terms, PUBMED_ID) |>
  dplyr::distinct() |>
  dplyr::group_by(collected_all_disease_terms) |>
  dplyr::summarise(n_studies = dplyr::n()) |>
  dplyr::arrange(desc(n_studies))

head(n_studies_trait)
# A tibble: 6 × 2
  collected_all_disease_terms n_studies
  <chr>                           <int>
1 type 2 diabetes mellitus          145
2 alzheimers disease                116
3 breast cancer                     112
4 asthma                            110
5 major depressive disorder         108
6 schizophrenia                     103
dim(n_studies_trait)
[1] 3043    2
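
The distinct() step above matters because a single publication can contribute several rows for the same term. The sketch below uses a made-up table (the column names follow the ones above, the values are invented) to show that studies are counted per PUBMED_ID rather than per row.

```r
library(dplyr)

# Toy table: PUBMED_ID 1 appears twice for asthma and should be counted once
toy <- data.frame(
  collected_all_disease_terms = c("asthma", "asthma", "asthma", "gout"),
  PUBMED_ID = c(1, 1, 2, 3),
  DISEASE_STUDY = TRUE
)

counts <- toy |>
  filter(DISEASE_STUDY) |>
  select(collected_all_disease_terms, PUBMED_ID) |>
  distinct() |>
  group_by(collected_all_disease_terms) |>
  summarise(n_studies = n()) |>
  arrange(desc(n_studies))

print(counts)
```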

2.3.1 When separating studies with multiple terms

diseases <- stringr::str_split(pattern = ", ", 
 gwas_study_info$collected_all_disease_terms[gwas_study_info$collected_all_disease_terms != ""])  |> 
            unlist() |>
            stringr::str_trim()

length(unique(diseases))
[1] 2194
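
A small illustration of the split-and-count step, using a hypothetical stand-in vector (collected) in place of gwas_study_info$collected_all_disease_terms:

```r
library(stringr)

# Hypothetical stand-in for the collected_all_disease_terms column
collected <- c("asthma, atopic eczema", "asthma", "",
               "type 2 diabetes mellitus, asthma")

# Drop empty entries, split on ", ", flatten, and trim stray whitespace
split_terms <- str_split(collected[collected != ""], pattern = ", ") |>
  unlist() |>
  str_trim()

n_unique <- length(unique(split_terms))
print(n_unique)
```

One caveat of this approach is that a term whose own name contains ", " would be split into several pieces.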

3 Disturbances to senses

disturb_senses_terms <- c("disturbances of sensation of smell and taste",
                          "abnormality of the sense of smell",
                          "ageusia")

4 Disease stage grouping

4.1 Chronic kidney disease (CKD)

gwas_study_info = gwas_study_info |>
  mutate(l1_all_disease_terms = 
         stringr::str_replace_all(l1_all_disease_terms,
                          "chronic kidney disease stage 5|chronic kidney disease stage 4",
                          "chronic kidney disease"
         ))


gwas_study_info = gwas_study_info |>
  mutate(l1_all_disease_terms = 
         stringr::str_replace_all(l1_all_disease_terms,
                          "stage 5 chronic kidney disease",
                          "chronic kidney disease"
         ))

5 Disease cause grouping

5.1 Contact dermatitis

gwas_study_info = gwas_study_info |>
  mutate(l1_all_disease_terms = 
         stringr::str_replace_all(l1_all_disease_terms,
                          "contact dermatitis due to nickel",
                          "contact dermatitis"
         ))

6 Disease complication

disease_complication_terms = c("device complication",
                             "trauma complication",
                             "adverse effect",
                             "aseptic loosening"
                             )

gwas_study_info = gwas_study_info |>
 mutate(l1_all_disease_terms  = 
          stringr::str_replace_all(l1_all_disease_terms,
                                  pattern = paste0("\\b", disease_complication_terms, "(?=,|$)", collapse = "|"),
                                   "complication"
                          )  
        )
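
The term vectors on this page are turned into a single regular expression before replacement. The sketch below shows one way to build such a pattern so that every term gets a leading word boundary and a lookahead for a comma or the end of the string. The helper name build_term_pattern is hypothetical and is not part of this repository.

```r
library(stringr)

# Hypothetical helper (not part of this repository): wrap every term in a
# leading word boundary and a lookahead for "," or end of string, then join with "|"
build_term_pattern <- function(terms) {
  paste0("\\b", terms, "(?=,|$)", collapse = "|")
}

complication_pattern <- build_term_pattern(c("device complication", "adverse effect"))
print(complication_pattern)

# Toy input: the last element is left alone because the term is only a prefix of it
x <- c("device complication, asthma", "adverse effect", "adverse effect of drug")
result <- str_replace_all(x, complication_pattern, "complication")
print(result)
```

Anchoring each alternative this way is meant to replace whole entries of the comma-separated list only, not fragments of longer terms.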

7 Disease subtype grouping (non-cancer)

7.1 Abnormal total eosinophil count

url <- "http://www.ebi.ac.uk/ols4/api/ontologies/hp/terms/http%253A%252F%252Fpurl.obolibrary.org%252Fobo%252FHP_0020064/descendants"

abnormal_total_eosinophil_count_terms <- get_descendants(url)
[1] "Number of terms collected:"
[1] 4
[1] "\n Some example terms"
[1] "severely increased total eosinophil count"
[2] "decreased total eosinophil count"         
[3] "increased total eosinophil count"         
[4] "episodic eosinophilia"                    
[5] NA                                         
gwas_study_info = gwas_study_info |>
  mutate(l1_all_disease_terms  = 
         stringr::str_replace_all(l1_all_disease_terms,
                          pattern = paste0("\\b", abnormal_total_eosinophil_count_terms, "(?=,|$)", collapse = "|"),
                          "abnormal total eosinophil count"
         ))
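
The ontology URLs used on this page contain sequences such as %253A and %252F, which is what results from URL-encoding the term IRI twice. As a sketch, assuming the term endpoints expect that double encoding, the URL above can be rebuilt from the plain IRI with base R. The variable names here are illustrative.

```r
# Term IRI for the HPO class used above
iri <- "http://purl.obolibrary.org/obo/HP_0020064"

# Encode once, then encode the result again;
# repeated = TRUE forces the "%" characters from the first pass to be re-encoded
once  <- utils::URLencode(iri, reserved = TRUE)
twice <- utils::URLencode(once, reserved = TRUE, repeated = TRUE)

descendants_url <- paste0("http://www.ebi.ac.uk/ols4/api/ontologies/hp/terms/",
                          twice, "/descendants")
print(descendants_url)
```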

7.2 Abortion

pregnancy_loss_terms <- c("habitual abortion",
                          "incomplete abortion",
                          "spontaneous abortion",
                          "abortion",
                          "spontaneous loss of pregnancy",
                          "incomplete loss of pregnancy"
                          )

gwas_study_info = gwas_study_info |>
  mutate(l1_all_disease_terms = 
         stringr::str_replace_all(l1_all_disease_terms,
                          paste0("\\b", pregnancy_loss_terms, "(?=,|$)", collapse = "|"),
                          "loss of pregnancy"
         ))

# exact synonyms (to avoid partial matches)
# https://www.ebi.ac.uk/ols4/ontologies/efo/classes/http%253A%252F%252Fwww.ebi.ac.uk%252Fefo%252FEFO_1001491?lang=en

7.3 Acute pancreatitis

gwas_study_info = gwas_study_info |>
  mutate(l1_all_disease_terms  = 
         stringr::str_replace_all(l1_all_disease_terms,
                          "asparaginase-induced acute pancreatitis",
                          "acute pancreatitis"
         ))

7.5 Alcohol-use disorders

alcohol_use_disorder_terms <- c("alcohol dependence",
                             "alcohol withdrawal",
                             "alcohol withdrawal delirium",
                             "alcohol abuse",
                             "alcohol-related disorders delirium",
                             "alcohol use disorder",
                             "addictive alcohol use"
                             )

gwas_study_info = gwas_study_info |>
      mutate(l1_all_disease_terms = 
           stringr::str_replace_all(l1_all_disease_terms,
                            paste0("\\b", alcohol_use_disorder_terms, "(?=,|$)", collapse = "|"),
                            "alcohol-related disorders"
           ))

7.6 Alcoholic liver disease

alcoholic_liver_disease_terms <- c("alcoholic liver cirrhosis",
                                   "alcoholic fatty liver disease",
                                   "alcoholic hepatitis"
                                   )


gwas_study_info = gwas_study_info |>
      mutate(l1_all_disease_terms = 
           stringr::str_replace_all(l1_all_disease_terms,
                             paste0("\\b", alcoholic_liver_disease_terms, "(?=,|$)", collapse = "|"),
                            "alcoholic liver disease"
           ))

7.7 Alzheimer’s disease

gwas_study_info = gwas_study_info |>
  mutate(l1_all_disease_terms  = 
         stringr::str_replace_all(l1_all_disease_terms,
                          "early-onset alzheimers disease|late-onset alzheimers disease",
                          "alzheimers disease"
         ))

7.8 Allergic contact dermatitis

gwas_study_info = gwas_study_info |>
  mutate(l1_all_disease_terms  = 
         stringr::str_replace_all(l1_all_disease_terms,
                          "allergic contact dermatitis of eyelid",
                          "allergic contact dermatitis"
         ))

7.9 Allergic rhinitis

gwas_study_info = gwas_study_info |>
  mutate(l1_all_disease_terms  = 
         stringr::str_replace_all(l1_all_disease_terms,
                          "seasonal allergic rhinitis",
                          "allergic rhinitis"
         ))

7.10 Aortic aneurysm

aortic_aneurysm_terms <- c("thoracic aortic aneurysm",
                          "abdominal aortic aneurysm"
                          )

gwas_study_info = gwas_study_info |>
  mutate(l1_all_disease_terms  = 
         stringr::str_replace_all(l1_all_disease_terms,
                          pattern = paste0("\\b", aortic_aneurysm_terms, "(?=,|$)", collapse = "|"),
                          "aortic aneurysm"
         ))

7.11 Aplastic anemia

url <- "http://www.ebi.ac.uk/ols4/api/ontologies/mondo/terms/http%253A%252F%252Fpurl.obolibrary.org%252Fobo%252FMONDO_0015909/descendants"

aplastic_anemia_terms <- get_descendants(url)
[1] "Number of terms collected:"
[1] 62
[1] "\n Some example terms"
[1] "diamond-blackfan anemia 15 with mandibulofacial dysostosis"
[2] "diamond-blackfan anemia 14 with mandibulofacial dysostosis"
[3] "cellular phase chronic idiopathic myelofibrosis"           
[4] "autosomal dominant aplasia and myelodysplasia"             
[5] "pancytopenia-developmental delay syndrome"                 
aplastic_anemia_terms <- c("severe aplastic anemia",
                           aplastic_anemia_terms
                          )

gwas_study_info = gwas_study_info |>
  mutate(l1_all_disease_terms  = 
         stringr::str_replace_all(l1_all_disease_terms,
                          pattern = paste0("\\b", aplastic_anemia_terms, "(?=,|$)", collapse = "|"),
                          "aplastic anemia"
         ))

7.12 Astigmatism

gwas_study_info = gwas_study_info |>
  mutate(l1_all_disease_terms  = 
         stringr::str_replace_all(l1_all_disease_terms,
                          "corneal astigmatism",
                          "astigmatism"
         ))

7.13 Asthma

gwas_study_info = gwas_study_info |>
      mutate(l1_all_disease_terms  = 
           stringr::str_replace_all(l1_all_disease_terms,
                            "atopic asthma|chronic obstructive asthma",
                            "asthma"
           )) |>
    mutate(l1_all_disease_terms  = 
           stringr::str_replace_all(l1_all_disease_terms,
                            "childhood onset asthma|adult onset asthma|aspirin-induced asthma",
                            "asthma"
           ))

7.14 Atrial fibrillation

gwas_study_info = gwas_study_info |>
  mutate(l1_all_disease_terms  = 
         stringr::str_replace_all(l1_all_disease_terms,
                          "post-operative atrial fibrillation",
                          "atrial fibrillation, post-operative"
         ))
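
Unlike most replacements on this page, this one rewrites a single term as a comma-separated pair. If the column is later split on ", " (as in section 2.3.1), the result is two separate terms. A toy illustration, with an invented input string:

```r
library(stringr)

# Toy input, not a row from the dataset
x <- "post-operative atrial fibrillation, stroke"

y <- str_replace_all(x,
                     "post-operative atrial fibrillation",
                     "atrial fibrillation, post-operative")

# Splitting afterwards yields the disease and the qualifier as separate entries
pieces <- str_split(y, ", ")[[1]]
print(pieces)
```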

7.15 Autoimmune pancreatitis

gwas_study_info = gwas_study_info |>
      mutate(l1_all_disease_terms  = 
           stringr::str_replace_all(l1_all_disease_terms,
                            "autoimmune pancreatitis type 1",
                            "autoimmune pancreatitis"
           )) 

7.16 Bipolar disorder

gwas_study_info = gwas_study_info |> 
    mutate(l1_all_disease_terms  = 
         stringr::str_replace_all(l1_all_disease_terms,
                          "bipolar ii disorder|bipolar i disorder",
                          "bipolar disorder"
         ))

7.17 Bone fracture

url <- "http://www.ebi.ac.uk/ols4/api/ontologies/efo/terms/http%253A%252F%252Fwww.ebi.ac.uk%252Fefo%252FEFO_0003931/descendants"

bone_fracture_terms <- get_descendants(url)
[1] "Number of terms collected:"
[1] 29
[1] "\n Some example terms"
[1] "atypical femoral fracture" "periprosthetic fractures" 
[3] "upper extremity fracture"  "lower extremity fracture" 
[5] "multiple bone fractures"  
gwas_study_info = gwas_study_info |>
  mutate(l1_all_disease_terms  = 
         stringr::str_replace_all(l1_all_disease_terms,
                          pattern = paste0("\\b", bone_fracture_terms, "(?=,|$)", collapse = "|"),
                          "bone fracture"
         ))

7.18 Bundle branch block

bundle_branch_block_terms <- c("left bundle branch block",
                             "right bundle branch block",
                             "complete right bundle branch block"
                             )

gwas_study_info = gwas_study_info |>
  mutate(l1_all_disease_terms  = 
         stringr::str_replace_all(l1_all_disease_terms,
                          pattern = paste0("\\b", bundle_branch_block_terms, "(?=,|$)", collapse = "|"),
                          "bundle branch block"
         )
  )

7.19 Cardiomyopathy

url <- "http://www.ebi.ac.uk/ols4/api/ontologies/hp/terms/http%253A%252F%252Fpurl.obolibrary.org%252Fobo%252FHP_0001638/descendants"

cardiomyopathy_terms <- get_descendants(url)
[1] "Number of terms collected:"
[1] 14
[1] "\n Some example terms"
[1] "right ventricular noncompaction cardiomyopathy"
[2] "left ventricular noncompaction cardiomyopathy" 
[3] "biventricular noncompaction cardiomyopathy"    
[4] "concentric hypertrophic cardiomyopathy"        
[5] "apical hypertrophic cardiomyopathy"            
cardiomyopathy_terms <- c("chagas cardiomyopathy",
                          "ischemic cardiomyopathy",
                          "idiopathic cardiomyopathy",
                          "nonischemic cardiomyopathy",
                          "peripartum cardiomyopathy", #? maybe include in pregnancy 
                          cardiomyopathy_terms
                          )

gwas_study_info = gwas_study_info |>
  mutate(l1_all_disease_terms  = 
         stringr::str_replace_all(l1_all_disease_terms,
                          pattern = paste0("\\b", cardiomyopathy_terms, "(?=,|$)", collapse = "|"),
                          "cardiomyopathy"
         ))

7.20 Carotid artery disease

carotid_artery_disease_terms <- c("carotid artery thrombosis",
                             "carotid atherosclerosis"
                             )

gwas_study_info = gwas_study_info |>
  mutate(l1_all_disease_terms  = 
         stringr::str_replace_all(l1_all_disease_terms,
                          pattern = paste0("\\b", carotid_artery_disease_terms, "(?=,|$)", collapse = "|"),
                          "carotid artery disease"
         ))

7.21 Cataract

url <- "http://www.ebi.ac.uk/ols4/api/ontologies/mondo/terms/http%253A%252F%252Fpurl.obolibrary.org%252Fobo%252FMONDO_0005129/descendants"

cataract_terms <- get_descendants(url)
[1] "Number of terms collected:"
[1] 80
[1] "\n Some example terms"
[1] "autosomal recessive nonsyndromic congenital nuclear cataract"
[2] "diabetes mellitus type 2 associated cataract"                
[3] "early-onset posterior subcapsular cataract"                  
[4] "autosomal dominant non-nuclear cataract"                     
[5] "kozlowski rafinski klicharska syndrome"                      
gwas_study_info = gwas_study_info |>
  mutate(l1_all_disease_terms  = 
         stringr::str_replace_all(l1_all_disease_terms,
                          pattern = paste0("\\b", cataract_terms, "(?=,|$)", collapse = "|"),
                          "cataract"
         ))

7.22 Celiac disease

gwas_study_info = 
gwas_study_info |> 
 mutate(l1_all_disease_terms  = 
          stringr::str_replace_all(l1_all_disease_terms,
                                  pattern = "refractory celiac disease",
                                   "celiac disease"
                          )  
        )

7.23 Charcot-marie-tooth disease type 1a

gwas_study_info = gwas_study_info |>
  mutate(l1_all_disease_terms  = 
         stringr::str_replace_all(l1_all_disease_terms,
                          "charcot-marie-tooth disease type 1a, decreased fine motor function",
                          "charcot-marie-tooth disease type 1a"
         ))


gwas_study_info = gwas_study_info |>
  mutate(l1_all_disease_terms  = 
         stringr::str_replace_all(l1_all_disease_terms,
                          "charcot-marie-tooth disease type 1a, decreased walking ability",
                          "charcot-marie-tooth disease type 1a"
         ))

gwas_study_info = gwas_study_info |>
  mutate(l1_all_disease_terms  = 
         stringr::str_replace_all(l1_all_disease_terms,
                          "charcot-marie-tooth disease type 1a, gait imbalance",
                          "charcot-marie-tooth disease type 1a, gait disturbance"
         ))

7.24 Chronic pain

url <- "http://www.ebi.ac.uk/ols4/api/ontologies/efo/terms/http%253A%252F%252Fpurl.obolibrary.org%252Fobo%252FHP_0012532/descendants"

chronic_pain_terms <- get_descendants(url)
[1] "Number of terms collected:"
[1] 9
[1] "\n Some example terms"
[1] "chronic musculoskeletal pain" "female chronic pelvic pain"  
[3] "chronic widespread pain"      "multisite chronic pain"      
[5] "chronic shoulder pain"       
gwas_study_info = gwas_study_info |>
  mutate(l1_all_disease_terms  = 
         stringr::str_replace_all(l1_all_disease_terms,
                          pattern = paste0("\\b", chronic_pain_terms, "(?=,|$)", collapse = "|"),
                          "chronic pain"
         ))

7.25 Chronic rhinosinusitis

gwas_study_info = gwas_study_info |>
  mutate(l1_all_disease_terms  = 
         stringr::str_replace_all(l1_all_disease_terms,
                          "chronic rhinosinusitis with nasal polyps",
                          "chronic rhinosinusitis"
         ))

7.26 Cholecystitis

gwas_study_info = gwas_study_info |>
  mutate(l1_all_disease_terms  = 
         stringr::str_replace_all(l1_all_disease_terms,
                          "acute cholecystitis",
                          "cholecystitis"
         ))

7.27 Coronary artery disease

url <- "http://www.ebi.ac.uk/ols4/api/ontologies/efo/terms/http%253A%252F%252Fwww.ebi.ac.uk%252Fefo%252FEFO_0001645/descendants"

coronary_artery_disease_terms <- get_descendants(url)
[1] "Number of terms collected:"
[1] 16
[1] "\n Some example terms"
[1] "coronary artery disease, autosomal dominant 2"
[2] "non-obstructive coronary artery disease"      
[3] "spontaneous coronary artery dissection"       
[4] "postoperative ventricular dysfunction"        
[5] "intermediate coronary syndrome"               
gwas_study_info = gwas_study_info |>
  mutate(l1_all_disease_terms  = 
         stringr::str_replace_all(l1_all_disease_terms,
                          pattern = paste0("\\b", coronary_artery_disease_terms, "(?=,|$)", collapse = "|"),
                          "coronary artery disease"
         ))

7.28 Communication disorder

url <- "http://www.ebi.ac.uk/ols4/api/ontologies/mondo/terms/http%253A%252F%252Fpurl.obolibrary.org%252Fobo%252FMONDO_0002182/descendants"

communication_disorder_terms <- get_descendants(url)
[1] "Number of terms collected:"
[1] 20
[1] "\n Some example terms"
[1] "mixed receptive-expressive language disorder"
[2] "stuttering, familial persistent, 3"          
[3] "stuttering, familial persistent, 4"          
[4] "stuttering, familial persistent, 2"          
[5] "stuttering, familial persistent, 1"          
gwas_study_info = gwas_study_info |>
  mutate(l1_all_disease_terms  = 
         stringr::str_replace_all(l1_all_disease_terms,
                          pattern = paste0("\\b", communication_disorder_terms, "(?=,|$)", collapse = "|"),
                          "communication disorder"
         ))

7.29 Congestive heart failure

congestive_heart_failure_terms <- c("systolic heart failure",
                                 "diastolic heart failure",
                                 "cor pulmonale",
                                 "congenital heart disease",
                                 "chronic pulmonary heart disease"
                                 )


gwas_study_info = gwas_study_info |>
  mutate(l1_all_disease_terms  = 
         stringr::str_replace_all(l1_all_disease_terms,
                          pattern = paste0("\\b", congestive_heart_failure_terms, "(?=,|$)", collapse = "|"),
                          "congestive heart failure"
         ))

7.30 Creutzfeldt jacob disease

gwas_study_info = gwas_study_info |>
  mutate(l1_all_disease_terms  = 
         stringr::str_replace_all(l1_all_disease_terms,
                          "sporadic creutzfeld jacob disease",
                          "creutzfeldt jacob disease"
         ))

7.31 Delirium

gwas_study_info = gwas_study_info |>
  mutate(l1_all_disease_terms  = 
         stringr::str_replace_all(l1_all_disease_terms,
                          "post-operative delirium",
                          "delirium, post-operative"
         ))

7.32 Dental caries

dental_caries_terms <- c("pit and fissure surface dental caries",
                         "smooth surface dental caries",
                         "primary dental caries",
                         "enamel caries",
                         "permanent dental caries"
                         )

gwas_study_info = gwas_study_info |>
  mutate(l1_all_disease_terms  = 
         stringr::str_replace_all(l1_all_disease_terms,
                          pattern = paste0("\\b", dental_caries_terms, "(?=,|$)", collapse = "|"),
                          "dental caries"
         )
)

7.33 Dermatitis

url <- "http://www.ebi.ac.uk/ols4/api/ontologies/doid/terms/http%253A%252F%252Fpurl.obolibrary.org%252Fobo%252FDOID_2723/descendants"

dermatitis_terms <- get_descendants(url)
[1] "Number of terms collected:"
[1] 93
[1] "\n Some example terms"
[1] "epidermolysis bullosa with congenital localized absence of skin and deformity of nails"
[2] "diphenylmethane-4,4'-diisocyanate allergic contact dermatitis"                         
[3] "1-chloro-2,4-dinitrobenzene allergic contact dermatitis"                               
[4] "epidermolysis bullosa simplex with mottled pigmentation"                               
[5] "junctional epidermolysis bullosa with pyloric atresia"                                 
gwas_study_info = gwas_study_info |>
  mutate(l1_all_disease_terms  = 
         stringr::str_replace_all(l1_all_disease_terms,
                          pattern = paste0("\\b", dermatitis_terms, "(?=,|$)", collapse = "|"),
                          "dermatitis"
         ))

# see: http://www.ebi.ac.uk/efo/EFO_0000274
eczema_terms <- c("atopic eczema",
                 "hand eczema",
                 "eczematoid dermatitis", # see: http://purl.obolibrary.org/obo/HP_0000964
                 "recalcitrant dermatitis" # assuming same as recalcitrant atopic dermatitis http://www.ebi.ac.uk/efo/EFO_1000651
                 )

gwas_study_info = gwas_study_info |>
  mutate(l1_all_disease_terms  = 
         stringr::str_replace_all(l1_all_disease_terms,
                          pattern = paste0("\\b", eczema_terms, "(?=,|$)", collapse = "|"),
                          "dermatitis"
         ))

7.34 Drug allergy

gwas_study_info = gwas_study_info |>
  mutate(l1_all_disease_terms  = 
         stringr::str_replace_all(l1_all_disease_terms,
                          "beta-lactam allergy",
                          "drug allergy"
         ))

7.35 Diabetic eye disease

diabetic_eye_terms <- c("diabetic maculopathy",
                        "diabetic macular edema",
                        "diabetic retinopathy",
                        "proliferative diabetic retinopathy",
                        "non-proliferative diabetic retinopathy",
                        "diabetes mellitus type 2 associated cataract")


gwas_study_info = 
  gwas_study_info |> 
  mutate(l1_all_disease_terms  = 
           stringr::str_replace_all(l1_all_disease_terms ,
                                    pattern = paste0("\\b", diabetic_eye_terms, "(?=,|$)", collapse = "|"),
                                    "diabetic eye disease"
           )  
  )

7.36 Diabetic neuropathy

gwas_study_info = gwas_study_info |>
  mutate(l1_all_disease_terms  = 
         stringr::str_replace_all(l1_all_disease_terms,
                          "diabetic polyneuropathy",
                          "diabetic neuropathy"
         ))

7.37 Deficiency anemia

gwas_study_info = gwas_study_info |>
  mutate(l1_all_disease_terms  = 
         stringr::str_replace_all(l1_all_disease_terms,
                          "iron deficiency anemia",
                          "deficiency anemia"
         ))

7.38 Encephalopathy

gwas_study_info = gwas_study_info |>
  mutate(l1_all_disease_terms  = 
         stringr::str_replace_all(l1_all_disease_terms,
                          "delayed encephalopathy after acute poisoning",
                          "encephalopathy, poisoning"
         ))


gwas_study_info = gwas_study_info |>
  mutate(l1_all_disease_terms  = 
         stringr::str_replace_all(l1_all_disease_terms,
                          "encephalopathy acute infection-induced",
                          "encephalopathy"
         ))

7.39 Endocarditis

gwas_study_info = gwas_study_info |>
  mutate(l1_all_disease_terms  = 
         stringr::str_replace_all(l1_all_disease_terms,
                          "bacterial endocarditis",
                          "endocarditis"
         ))

7.40 Food allergy

gwas_study_info = gwas_study_info |>
  mutate(l1_all_disease_terms  = 
         stringr::str_replace_all(l1_all_disease_terms,
                          "peanut allergy|milk allergy|egg allergy|wheat allergic reaction",
                          "food allergy"
         ))

7.41 Glaucoma

url <- "http://www.ebi.ac.uk/ols4/api/ontologies/efo/terms/http%253A%252F%252Fpurl.obolibrary.org%252Fobo%252FMONDO_0005041/descendants"

glaucoma_terms <- get_descendants(url)
[1] "Number of terms collected:"
[1] 19
[1] "\n Some example terms"
[1] "cyp1b1-related glaucoma with or without anterior segment dysgenesis"
[2] "glaucoma secondary to spherophakia/ectopia lentis and megalocornea" 
[3] "hereditary glaucoma, primary closed-angle"                          
[4] "primary angle closure glaucoma"                                     
[5] "secondary dysgenetic glaucoma"                                      
gwas_study_info = gwas_study_info |>
  mutate(l1_all_disease_terms  = 
         stringr::str_replace_all(l1_all_disease_terms,
                          pattern = paste0("\\b", glaucoma_terms, "(?=,|$)", collapse = "|"),
                          "glaucoma"
         ))

7.42 Gout

# from: http://www.ebi.ac.uk/efo/EFO_0004274
gwas_study_info = gwas_study_info |>
  mutate(l1_all_disease_terms  = 
         stringr::str_replace_all(l1_all_disease_terms,
                          "renal overload-type gout",
                          "gout"
         ))

7.43 Graft vs host disease

graft_vs_host_terms <- c("chronic graft versus host disease",
                         "chronic graft vs. host disease",
                         "acute graft versus host disease",
                         "acute graft vs. host disease"
                         )

gwas_study_info = gwas_study_info |>
  mutate(l1_all_disease_terms  = 
         stringr::str_replace_all(l1_all_disease_terms,
                          paste0("\\b", graft_vs_host_terms, "(?=,|$)", collapse = "|"),
                          "graft versus host disease"
         ))

7.44 Heart block

heart_block_terms <- c("first degree atrioventricular block",
                      "second degree atrioventricular block",
                      "third-degree atrioventricular block",
                      "bundle branch block"
                      )

gwas_study_info = gwas_study_info |>
  mutate(l1_all_disease_terms  = 
         stringr::str_replace_all(l1_all_disease_terms,
                          pattern = paste0("\\b", heart_block_terms, "(?=,|$)", collapse = "|"),
                          "heart block"
         ))
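
Because the replacements on this page run in sequence, the order of sections matters: section 7.18 first maps the bundle branch block subtypes to "bundle branch block", and this step then maps that term to "heart block". The toy sketch below (invented input, patterns written out by hand) traces one value through both steps.

```r
library(stringr)

# Toy input, not rows from the dataset
x <- c("left bundle branch block", "third-degree atrioventricular block")

# Step 1: subtype to "bundle branch block" (as in section 7.18)
step1 <- str_replace_all(x,
                         "\\bleft bundle branch block(?=,|$)",
                         "bundle branch block")

# Step 2: "bundle branch block" and atrioventricular block to "heart block"
step2 <- str_replace_all(step1,
                         "\\bbundle branch block(?=,|$)|\\bthird-degree atrioventricular block(?=,|$)",
                         "heart block")

print(step1)
print(step2)
```

Running the two steps in the opposite order would leave "bundle branch block" in place for the first element.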

7.45 Hemolytic anemia

url <- "http://www.ebi.ac.uk/ols4/api/ontologies/mondo/terms/http%253A%252F%252Fpurl.obolibrary.org%252Fobo%252FMONDO_0003664/descendants"


hemolytic_anemia_terms <- get_descendants(url)
[1] "Number of terms collected:"
[1] 76
[1] "\n Some example terms"
[1] "dehydrated hereditary stomatocytosis with or without pseudohyperkalemia and/or perinatal edema"
[2] "anemia, nonspherocytic hemolytic, associated with abnormality of red cell membrane"            
[3] "anemia, nonspherocytic hemolytic, possibly due to defect in porphyrin metabolism"              
[4] "x-linked dyserythropoetic anemia with abnormal platelets and neutropenia"                      
[5] "hemolytic anemia due to erythrocyte adenosine deaminase overproduction"                        
gwas_study_info = gwas_study_info |>
  mutate(l1_all_disease_terms  = 
         stringr::str_replace_all(l1_all_disease_terms,
                          pattern = paste0(hemolytic_anemia_terms, collapse = "(?=,|$)|\\b"),
                          "hemolytic anemia"
         ))

7.46 Hernia of the abdominal wall

url <- "http://www.ebi.ac.uk/ols4/api/ontologies/hp/terms/http%253A%252F%252Fpurl.obolibrary.org%252Fobo%252FHP_0004299/children"

hernia_abdominal_wall_terms <- get_descendants(url)
[1] "Number of terms collected:"
[1] 6
[1] "\n Some example terms"
[1] "incisional hernia" "umbilical hernia"  "inguinal hernia"  
[4] "femoral hernia"    "ventral hernia"   
gwas_study_info = gwas_study_info |>
  mutate(l1_all_disease_terms  = 
         stringr::str_replace_all(l1_all_disease_terms,
                          pattern = paste0(hernia_abdominal_wall_terms, collapse = "(?=,|$)|\\b"),
                          "hernia of the abdominal wall"
         ))

7.47 Herpes

gwas_study_info = 
gwas_study_info |> 
 mutate(l1_all_disease_terms  = 
          stringr::str_replace_all(l1_all_disease_terms,
                                  pattern = "herpes simplex infection",
                                   "herpesviridae infectious disease"
                          )  
        )

7.48 Hypertension

url <-  "http://www.ebi.ac.uk/ols4/api/ontologies/efo/terms/http%253A%252F%252Fwww.ebi.ac.uk%252Fefo%252FEFO_0000537/descendants"

hypertension_terms <- get_descendants(url)
[1] "Number of terms collected:"
[1] 46
[1] "\n Some example terms"
[1] "pulmonary veno-occlusive disease and/or pulmonary capillary haemangiomatosis"
[2] "pulmonary arterial hypertension associated with connective tissue disease"   
[3] "pulmonary arterial hypertension associated with chronic hemolytic anemia"    
[4] "pulmonary arterial hypertension associated with congenital heart disease"    
[5] "hyperuricemia-pulmonary hypertension-renal failure-alkalosis syndrome"       
hypertension_terms = c("primary pulmonary hypertension",
                       "primary hypertension",
                      hypertension_terms
                      )

gwas_study_info = gwas_study_info |>
  mutate(l1_all_disease_terms  = 
         stringr::str_replace_all(l1_all_disease_terms,
                          pattern = paste0(hypertension_terms, collapse = "(?=,|$)|\\b"),
                          "hypertension"
         ))

7.49 HIV

gwas_study_info = gwas_study_info |>
    mutate(l1_all_disease_terms = 
         stringr::str_replace_all(l1_all_disease_terms,
                          "hiv-1 infection",
                          "hiv infection"
         ))

7.50 Hyperlipidemia

hyperlipidemia_terms <- c("hypercholesterolemia",
                         "familial hypercholesterolemia",
                         "familial hyperlipidemia"
                         )

gwas_study_info = gwas_study_info |>
  mutate(l1_all_disease_terms  = 
         stringr::str_replace_all(l1_all_disease_terms,
                          pattern = paste0(hyperlipidemia_terms, collapse = "(?=,|$)|\\b"),
                          "hyperlipidemia"
         ))

7.51 Idiopathic generalized epilepsy

url <- "http://www.ebi.ac.uk/ols4/api/ontologies/snomed/terms/http%253A%252F%252Fsnomed.info%252Fid%252F36803009/descendants"

idiopathic_generalized_epilepsy_terms <- get_descendants(url)
[1] "Number of terms collected:"
[1] 4
[1] "\n Some example terms"
[1] "epilepsy with generalized tonic-clonic seizures alone (disorder)"
[2] "juvenile myoclonic epilepsy"                                     
[3] "childhood absence epilepsy"                                      
[4] "juvenile absence epilepsy"                                       
[5] NA                                                                
idiopathic_generalized_epilepsy_terms = stringr::str_remove_all(
                                        idiopathic_generalized_epilepsy_terms, 
                                        " \\(disorder\\)$"
                                        )


gwas_study_info = gwas_study_info |>
  mutate(l1_all_disease_terms  = 
         stringr::str_replace_all(l1_all_disease_terms,
                          pattern = paste0(idiopathic_generalized_epilepsy_terms, collapse = "(?=,|$)|\\b"),
                          "idiopathic generalized epilepsy"
         ))

7.52 Inborn errors of metabolism

inborn_error_metab <- c("inborn disorder of amino acid metabolism",
                        "inborn disorder of amino acid transport",
                        "inborn disorder of porphyrin metabolism",
                        "inborn carbohydrate metabolic disorder",
                        "familial lipoprotein lipase deficiency",
                        "lactose intolerance",
                        "hereditary hemochromatosis",
                        "alpha 1-antitrypsin deficiency",
                        "urea cycle disorder",
                        "gaucher disease",
                        "plasma protein metabolism disease",
                        "disorder of metabolite absorption and transport")

gwas_study_info = gwas_study_info |>
  mutate(l1_all_disease_terms  = 
         stringr::str_replace_all(l1_all_disease_terms,
                          pattern = paste0(inborn_error_metab, collapse = "(?=,|$)|\\b"),
                          "inborn errors of metabolism"
         ))

7.53 Infectious meningitis

gwas_study_info = gwas_study_info |>
  mutate(l1_all_disease_terms  = 
         stringr::str_replace_all(l1_all_disease_terms,
                          "bacterial meningitis|pneumococcal meningitis|viral meningitis",
                          "infectious meningitis"
         ))

7.54 Infertility (female & male)

gwas_study_info = gwas_study_info |>
  mutate(l1_all_disease_terms  = 
         stringr::str_replace_all(l1_all_disease_terms,
                          "tubal factor infertility",
                          "female infertility"
                          )) |>
  mutate(l1_all_disease_terms  = 
                  stringr::str_replace_all(l1_all_disease_terms,
                          "azoospermia",
                          "male infertility"
                          )  )

7.55 Influenza A (H1N1) (subset of influenza)

gwas_study_info = gwas_study_info |>
    mutate(l1_all_disease_terms  = 
         stringr::str_replace_all(l1_all_disease_terms,
                          "influenza a \\(h1n1\\)",
                          "influenza"
         ))

7.56 Intestinal obstruction

url <- "http://www.ebi.ac.uk/ols4/api/ontologies/mondo/terms/http%253A%252F%252Fpurl.obolibrary.org%252Fobo%252FMONDO_0004565/descendants"

intestinal_obstruction_terms <- get_descendants(url)
[1] "Number of terms collected:"
[1] 21
[1] "\n Some example terms"
[1] "intestinal obstruction in the newborn due to guanylate cyclase 2c deficiency"
[2] "intestinal pseudoobstruction, neuronal, chronic idiopathic, x-linked"        
[3] "visceral neuropathy, familial, 1, autosomal recessive"                       
[4] "visceral neuropathy, familial, 3, autosomal dominant"                        
[5] "cystic fibrosis associated meconium ileus"                                   
# also add: http://purl.obolibrary.org/obo/DOID_8437

intestinal_obstruction_terms = c(intestinal_obstruction_terms,
                                 "intestinal impaction"
                                 )

gwas_study_info = gwas_study_info |>
  mutate(l1_all_disease_terms  = 
         stringr::str_replace_all(l1_all_disease_terms,
                          pattern = paste0(intestinal_obstruction_terms, collapse = "(?=,|$)|\\b"),
                          "intestinal obstruction"
         ))

7.57 Ischemic heart disease

url <- "http://www.ebi.ac.uk/ols4/api/ontologies/cvdo/terms/http%253A%252F%252Fpurl.obolibrary.org%252Fobo%252FDOID_3394/descendants"

ischemic_heart_disease_terms <- get_descendants(url)
[1] "Number of terms collected:"
[1] 4
[1] "\n Some example terms"
[1] "atherosclerotic ischemic cardiomyopathy"
[2] "ischemic cardiomyopathy"                
[3] "acute coronary syndrome"                
[4] "coronary heart disease"                 
[5] NA                                       
gwas_study_info = gwas_study_info |>
  mutate(l1_all_disease_terms  = 
         stringr::str_replace_all(l1_all_disease_terms,
                          pattern = paste0(ischemic_heart_disease_terms, collapse = "(?=,|$)|\\b"),
                          "ischemic heart disease"
         ))

7.58 Juvenile idiopathic arthritis

jia_terms <- c("systemic juvenile idiopathic arthritis",
                "oligoarticular juvenile idiopathic arthritis",
                "polyarticular juvenile idiopathic arthritis"
                )

gwas_study_info = gwas_study_info |>
  mutate(l1_all_disease_terms  = 
        stringr::str_replace_all(l1_all_disease_terms,
                         paste0(jia_terms, collapse = "(?=,|$)|\\b"),
                         "juvenile idiopathic arthritis")
        )

7.59 Kidney disease

url <- "http://www.ebi.ac.uk/ols4/api/ontologies/doid/terms/http%253A%252F%252Fpurl.obolibrary.org%252Fobo%252FDOID_1074/descendants"

kidney_disease_terms <- get_descendants(url)
[1] "Number of terms collected:"
[1] 7
[1] "\n Some example terms"
[1] "acute kidney tubular necrosis" "end stage renal disease"      
[3] "chronic kidney disease"        "acute kidney failure"         
[5] "hepatorenal syndrome"         
gwas_study_info = gwas_study_info |>
  mutate(l1_all_disease_terms  = 
         stringr::str_replace_all(l1_all_disease_terms,
                          pattern = paste0(kidney_disease_terms, collapse = "(?=,|$)|\\b"),
                          "kidney disease"
         ))

7.60 Laryngitis

laryngitis_terms <- c("acute laryngitis",
                      "chronic laryngitis"
                      )

gwas_study_info = gwas_study_info |>
  mutate(l1_all_disease_terms  = 
         stringr::str_replace_all(l1_all_disease_terms,
                          pattern = paste0(laryngitis_terms, collapse = "(?=,|$)|\\b"),
                          "laryngitis"
         ))

7.61 Lewy body dementia

gwas_study_info = gwas_study_info |>
  mutate(l1_all_disease_terms  = 
         stringr::str_replace_all(l1_all_disease_terms,
                          "lewy body attribute",
                          "lewy body dementia"
         ))

7.62 Malaria

gwas_study_info = gwas_study_info |>
  mutate(l1_all_disease_terms  = 
         stringr::str_replace_all(l1_all_disease_terms,
                          "plasmodium falciparum malaria|plasmodium vivax malaria",
                          "malaria"
         ))

7.63 Methamphetamine use disorders

gwas_study_info = gwas_study_info |>
  mutate(l1_all_disease_terms  = 
         stringr::str_replace_all(l1_all_disease_terms,
                          "methamphetamine dependence|methamphetamine-induced psychosis",
                          "methamphetamine use disorders"
         ))

7.64 Migraine

gwas_study_info = gwas_study_info |>
  mutate(l1_all_disease_terms  = 
         stringr::str_replace_all(l1_all_disease_terms,
                          "migraine disorder|migraine with aura|migraine without aura",
                          "migraine"
         ))

7.65 Multiple sclerosis

gwas_study_info = gwas_study_info |>
  mutate(l1_all_disease_terms  = 
         stringr::str_replace_all(l1_all_disease_terms,
                          "relapsing-remitting multiple sclerosis",
                          "multiple sclerosis"
         ))

7.66 Myocardial infarction

url <- "http://www.ebi.ac.uk/ols4/api/ontologies/efo/terms/http%253A%252F%252Fwww.ebi.ac.uk%252Fefo%252FEFO_0000612/descendants"

myocardial_infarction_terms <- get_descendants(url)
[1] "Number of terms collected:"
[1] 7
[1] "\n Some example terms"
[1] "subsequent st elevation (stemi) and non-st elevation (nstemi) myocardial infarction"
[2] "acute anterolateral myocardial infarction"                                          
[3] "non-st elevation myocardial infarction"                                             
[4] "anterolateral myocardial infarction"                                                
[5] "st elevation myocardial infarction"                                                 
myocardial_infarction_terms <- c("subsequent st elevation \\(stemi\\) and non-st elevation \\(nstemi\\) myocardial infarction",
                              myocardial_infarction_terms
                              )

gwas_study_info = gwas_study_info |>
  mutate(l1_all_disease_terms  = 
         stringr::str_replace_all(l1_all_disease_terms,
                          pattern = paste0(myocardial_infarction_terms, collapse = "(?=,|$)|\\b"),
                          "myocardial infarction"
         ))


gwas_study_info = gwas_study_info |>
  mutate(l1_all_disease_terms  = 
         stringr::str_replace_all(l1_all_disease_terms,
                          "post-operative myocardial infarction",
                          "myocardial infarction, post-operative"
         ))

7.67 Photosensitivity disease

gwas_study_info = gwas_study_info |>
  mutate(l1_all_disease_terms  = 
        stringr::str_replace_all(l1_all_disease_terms,
                         "phototoxic dermatitis|skin sensitivity to sun",
                         "photosensitivity disease")
        )

7.68 Phobic disorder

url <- "http://www.ebi.ac.uk/ols4/api/ontologies/efo/terms/http%253A%252F%252Fwww.ebi.ac.uk%252Fefo%252FEFO_1001908/descendants"

phobic_disorder_terms <- get_descendants(url)
[1] "Number of terms collected:"
[1] 11
[1] "\n Some example terms"
[1] "panic disorder with agoraphobia" "blood-injection-injury phobia"  
[3] "social anxiety disorder"         "specific phobia"                
[5] "flying phobia"                  
gwas_study_info = gwas_study_info |>
  mutate(l1_all_disease_terms  = 
         stringr::str_replace_all(l1_all_disease_terms,
                          pattern = paste0(phobic_disorder_terms, collapse = "(?=,|$)|\\b"),
                          "phobic disorder"
         ))

7.69 Hereditary hemochromatosis

gwas_study_info = gwas_study_info |>
  mutate(l1_all_disease_terms  = 
         stringr::str_replace_all(l1_all_disease_terms,
                          "hereditary hemochromatosis type 1",
                          "hereditary hemochromatosis"
         ))

7.70 Myasthenia gravis

gwas_study_info = gwas_study_info |>
  mutate(l1_all_disease_terms  = 
         stringr::str_replace_all(l1_all_disease_terms,
                          "late-onset myasthenia gravis",
                          "myasthenia gravis"
         ))

7.71 Neuromyelitis optica

gwas_study_info = gwas_study_info |>
  mutate(l1_all_disease_terms  = 
         stringr::str_replace_all(l1_all_disease_terms,
                          "aqp4-igg-negative neuromyelitis optica|aqp4-igg-positive neuromyelitis optica",
                          "neuromyelitis optica"
         ))

7.72 Neurofibromatosis

gwas_study_info = gwas_study_info |>
  mutate(l1_all_disease_terms  = 
         stringr::str_replace_all(l1_all_disease_terms,
                          "neurofibromatosis type 1|neurofibromatosis type 2",
                          "neurofibromatosis"
         ))

7.73 Non-alcoholic fatty liver disease (NAFLD)

gwas_study_info = gwas_study_info |>
  mutate(l1_all_disease_terms  = 
         stringr::str_replace_all(l1_all_disease_terms,
                          "non-alcoholic steatohepatitis|non-alcoholic liver disease",
                          "non-alcoholic fatty liver disease"
         ))

7.74 Obesity

gwas_study_info = 
gwas_study_info |> 
 mutate(l1_all_disease_terms  = 
          stringr::str_replace_all(l1_all_disease_terms,
                                  pattern = "morbid obesity|metabolically healthy obesity",
                                   "obesity"
                          )  
        )

7.75 Obsessive-compulsive disorder

gwas_study_info = gwas_study_info |>
  mutate(l1_all_disease_terms  = 
         stringr::str_replace_all(l1_all_disease_terms,
                          "obsessive-compulsive trait",
                          "obsessive-compulsive disorder"
         ))

7.76 Osteoarthritis

osteoarthritis_terms <- c("knee osteoarthritis",
                         "hip osteoarthritis",
                         "hand osteoarthritis",
                         "spine osteoarthritis",
                         "toe osteoarthritis"
                         )

gwas_study_info = gwas_study_info |>
  mutate(l1_all_disease_terms  = 
         stringr::str_replace_all(l1_all_disease_terms,
                          pattern = paste0(osteoarthritis_terms, collapse = "(?=,|$)|\\b"),
                          "osteoarthritis"
         )
  )

7.77 Osteonecrosis

gwas_study_info = gwas_study_info |>
  mutate(l1_all_disease_terms  = 
         stringr::str_replace_all(l1_all_disease_terms,
                          "idiopathic osteonecrosis of the femoral head",
                          "osteonecrosis"
         ))

7.78 Pancreatitis

url <- "http://www.ebi.ac.uk/ols4/api/ontologies/efo/terms/http%253A%252F%252Fwww.ebi.ac.uk%252Fefo%252FEFO_0000278/descendants"

pancreatitis_terms <- get_descendants(url)
[1] "Number of terms collected:"
[1] 9
[1] "\n Some example terms"
[1] "thiopurine immunosuppressant-induced pancreatitis"
[2] "asparaginase-induced acute pancreatitis"          
[3] "hereditary chronic pancreatitis"                  
[4] "autoimmune pancreatitis type 1"                   
[5] "non-alcoholic pancreatitis"                       
gwas_study_info = gwas_study_info |>
  mutate(l1_all_disease_terms  = 
         stringr::str_replace_all(l1_all_disease_terms,
                          pattern = paste0(pancreatitis_terms, collapse = "(?=,|$)|\\b"),
                          "pancreatitis"
         ))

7.79 Periodontal disease

url <- "http://www.ebi.ac.uk/ols4/api/ontologies/doid/terms/http%253A%252F%252Fpurl.obolibrary.org%252Fobo%252FDOID_3388/descendants"

periodontal_terms <- get_descendants(url)
[1] "Number of terms collected:"
[1] 19
[1] "\n Some example terms"
[1] "suppurative periapical periodontitis"
[2] "necrotizing ulcerative gingivitis"   
[3] "chronic apical periodontitis"        
[4] "acute apical periodontitis"          
[5] "periapical periodontitis"            
gwas_study_info = gwas_study_info |>
  mutate(l1_all_disease_terms  = 
         stringr::str_replace_all(l1_all_disease_terms,
                          pattern = paste0(periodontal_terms, collapse = "(?=,|$)|\\b"),
                          "periodontal disease"
         ))

7.80 Poisoning

gwas_study_info = gwas_study_info |>
  mutate(l1_all_disease_terms  = 
         stringr::str_replace_all(l1_all_disease_terms,
                          "carbon monoxide poisoning",
                          "poisoning"
         ))

7.81 Polycythemia

url <- "http://www.ebi.ac.uk/ols4/api/ontologies/efo/terms/http%253A%252F%252Fwww.ebi.ac.uk%252Fefo%252FEFO_0005804/descendants"

polycythemia_terms <- get_descendants(url)
[1] "Number of terms collected:"
[1] 16
[1] "\n Some example terms"
[1] "autosomal recessive secondary polycythemia not associated with vhl gene"
[2] "primary familial polycythemia due to epo receptor mutation"             
[3] "autosomal dominant secondary polycythemia"                              
[4] "congenital secondary polycythemia"                                      
[5] "acquired secondary polycythemia"                                        
gwas_study_info = gwas_study_info |>
  mutate(l1_all_disease_terms  = 
         stringr::str_replace_all(l1_all_disease_terms,
                          pattern = paste0(polycythemia_terms, collapse = "(?=,|$)|\\b"),
                          "polycythemia"
         ))

7.82 Pulmonary embolism

gwas_study_info = gwas_study_info |>
  mutate(l1_all_disease_terms  = 
         stringr::str_replace_all(l1_all_disease_terms,
                          "pulmonary embolism, pulmonary infarction",
                          "pulmonary embolism"
         ))

7.83 Pulmonary fibrosis

gwas_study_info = gwas_study_info |>
  mutate(l1_all_disease_terms  = 
         stringr::str_replace_all(l1_all_disease_terms,
                          "idiopathic pulmonary fibrosis",
                          "pulmonary fibrosis"
         ))

7.84 Psoriasis

psoriasis_terms <- c("psoriasis vulgaris",
                      "psoriasis area and severity index",
                      "cutaneous psoriasis",
                      "psoriatic arthritis",
                     "parapsoriasis")


gwas_study_info = gwas_study_info |>
  mutate(l1_all_disease_terms  = 
         stringr::str_replace_all(l1_all_disease_terms,
                          pattern = paste0(psoriasis_terms, collapse = "(?=,|$)|\\b"),
                          "psoriasis")
  )

7.85 Psychosis

gwas_study_info = gwas_study_info |>
  mutate(l1_all_disease_terms  = 
         stringr::str_replace_all(l1_all_disease_terms,
                          "psychosis predisposition",
                          "psychosis")
         )

7.86 Retinopathy

retinopathy_terms <- c("chronic central serous retinopathy",
                       "central serous retinopathy"
                       )

gwas_study_info = gwas_study_info |>
  mutate(l1_all_disease_terms = 
           stringr::str_replace_all(l1_all_disease_terms,
                          pattern = paste0(retinopathy_terms, collapse = "(?=,|$)|\\b"),
                          "retinopathy")
  )

7.87 Rheumatoid arthritis

ra_terms <- c("acpa-positive rheumatoid arthritis",
               "acpa-negative rheumatoid arthritis",
              "adult-onset stills disease")

gwas_study_info = 
gwas_study_info |> 
 mutate(l1_all_disease_terms  = 
          stringr::str_replace_all(l1_all_disease_terms,
                                  pattern = paste0(ra_terms, collapse = "(?=,|$)|\\b"),
                                   "rheumatoid arthritis"
                          )  
        )

7.88 Schizoaffective disorder

gwas_study_info = gwas_study_info |>
  mutate(l1_all_disease_terms  = 
         stringr::str_replace_all(l1_all_disease_terms,
                          "schizoaffective disorder-bipolar type",
                          "schizoaffective disorder")
         )

7.89 Schizophrenia

gwas_study_info = gwas_study_info |>
  mutate(l1_all_disease_terms  = 
         stringr::str_replace_all(l1_all_disease_terms,
                          "treatment refractory schizophrenia",
                          "schizophrenia")
         )

7.90 Sciatica

gwas_study_info = gwas_study_info |>
  mutate(l1_all_disease_terms  = 
         stringr::str_remove_all(l1_all_disease_terms,
                          "^ldh-related sciatica, |, ldh-related sciatica$"
         )
  )
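
As a small worked example of the anchored removal above, the sketch below applies the same pattern to three toy strings. The inputs, including the "back pain" label, are assumptions for illustration only.

```r
library(stringr)

# Toy inputs (assumption): term at the start, at the end, and on its own
toy_terms <- c("ldh-related sciatica, back pain",
               "back pain, ldh-related sciatica",
               "ldh-related sciatica")

removed <- str_remove_all(toy_terms,
                          "^ldh-related sciatica, |, ldh-related sciatica$")
removed
```

The term is dropped only when it sits first or last in a comma-separated list. A string consisting of the term alone is left unchanged, because both alternatives require an adjacent ", ".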

7.91 Scoliosis

gwas_study_info = gwas_study_info |>
  mutate(l1_all_disease_terms  = 
         stringr::str_replace_all(l1_all_disease_terms,
                          "adolescent idiopathic scoliosis",
                          "scoliosis")
         )

7.92 Sickle cell disease

gwas_study_info = gwas_study_info |>
  mutate(l1_all_disease_terms  = 
         stringr::str_replace_all(l1_all_disease_terms,
                          "sickle cell anemia",
                          "sickle cell disease and related diseases")
         )

7.93 Sleep apnea

sleep_apnea_terms <- c("sleep apnea measurement during non-rem sleep",
                      "sleep apnea measurement during rem sleep",
                      "obstructive sleep apnea")

gwas_study_info = gwas_study_info |>
  mutate(l1_all_disease_terms  = 
         stringr::str_replace_all(l1_all_disease_terms,
                          pattern = paste0(sleep_apnea_terms, collapse = "(?=,|$)|\\b"),
                          "sleep apnea")
  )

7.94 Staphylococcus aureus infection

gwas_study_info = gwas_study_info |>
  mutate(l1_all_disease_terms  = 
         stringr::str_replace_all(l1_all_disease_terms,
                          "methicillin-resistant staphylococcus aureus infection",
                          "staphylococcus aureus infection"
         ))

7.95 Stroke

https://www.ebi.ac.uk/ols4/ontologies/efo/classes/http%253A%252F%252Fwww.ebi.ac.uk%252Fefo%252FEFO_0000712?lang=en

hemorrhage_terms <- c("intracerebral hemorrhage",
                      "non-lobar intracerebral hemorrhage",
                      "lobar intracerebral hemorrhage")

gwas_study_info = gwas_study_info |>
  mutate(l1_all_disease_terms  = 
         stringr::str_replace_all(l1_all_disease_terms,
                          pattern = paste0(hemorrhage_terms, collapse = "(?=,|$)|\\b"),
                          "hemorrhagic stroke")
  )

stroke_terms <- c("stroke outcome",
                  "large artery stroke",
                  "small vessel stroke",
                  "ischemic stroke",
                  "stroke disorder",
                  "cardioembolic stroke",
                  "hemorrhagic stroke",
                  "intracranial hemorrhage",
                  "subdural hemorrhage"
                  )

gwas_study_info = gwas_study_info |>
  mutate(l1_all_disease_terms  = 
         stringr::str_replace_all(l1_all_disease_terms,
                          paste0(stroke_terms, collapse = "(?=,|$)|\\b"),
                          "stroke")
  )


gwas_study_info = gwas_study_info |>
  mutate(l1_all_disease_terms  = 
         stringr::str_replace_all(l1_all_disease_terms,
                          "post-operative stroke",
                          "post-operative, stroke")
  )
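
The three replacements above run in sequence, so hemorrhage subtypes first become "hemorrhagic stroke" and are then folded into "stroke". The sketch below traces that sequence on two toy strings, which are assumptions and not rows from `gwas_study_info`.

```r
library(stringr)

hemorrhage_terms <- c("intracerebral hemorrhage",
                      "non-lobar intracerebral hemorrhage",
                      "lobar intracerebral hemorrhage")

stroke_terms <- c("stroke outcome", "large artery stroke", "small vessel stroke",
                  "ischemic stroke", "stroke disorder", "cardioembolic stroke",
                  "hemorrhagic stroke", "intracranial hemorrhage",
                  "subdural hemorrhage")

# Toy inputs (assumption, for illustration only)
toy_terms <- c("lobar intracerebral hemorrhage, ischemic stroke",
               "post-operative stroke")

# Step 1: hemorrhage subtypes -> "hemorrhagic stroke"
step1 <- str_replace_all(toy_terms,
                         paste0(hemorrhage_terms, collapse = "(?=,|$)|\\b"),
                         "hemorrhagic stroke")

# Step 2: stroke subtypes (including "hemorrhagic stroke") -> "stroke"
step2 <- str_replace_all(step1,
                         paste0(stroke_terms, collapse = "(?=,|$)|\\b"),
                         "stroke")

# Step 3: split "post-operative stroke" into two comma-separated terms
step3 <- str_replace_all(step2, "post-operative stroke", "post-operative, stroke")
step3
```

The first string ends up as "stroke, stroke"; the repeated label is expected at this point and is collapsed by the de-duplication step in section 9.1.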

7.96 Tachycardia

tachycardia_terms <- c("paroxysmal tachycardia",
                       "ventricular tachycardia",
                       "supraventricular tachycardia",
                       "paroxysmal supraventricular tachycardia",
                       "paroxysmal ventricular tachycardia"
                       )


gwas_study_info = gwas_study_info |>
  mutate(l1_all_disease_terms  = 
         stringr::str_replace_all(l1_all_disease_terms,
                          pattern = paste0(tachycardia_terms, collapse = "(?=,|$)|\\b"),
                          "tachycardia"
         )
  )

7.97 Systemic scleroderma

url <- "http://www.ebi.ac.uk/ols4/api/ontologies/efo/terms/http%253A%252F%252Fwww.ebi.ac.uk%252Fefo%252FEFO_0000717/descendants"

scleroderma_terms <- get_descendants(url)
[1] "Number of terms collected:"
[1] 5
[1] "\n Some example terms"
[1] "anti-topoisomerase-i-antibody-positive systemic scleroderma"
[2] "anti-centromere-antibody-positive systemic scleroderma"     
[3] "limited cutaneous systemic sclerosis"                       
[4] "limited scleroderma"                                        
[5] "diffuse scleroderma"                                        
gwas_study_info = gwas_study_info |>
  mutate(l1_all_disease_terms  = 
         stringr::str_replace_all(l1_all_disease_terms,
                          pattern = paste0(scleroderma_terms, collapse = "(?=,|$)|\\b"),
                          "systemic scleroderma"
         ))

7.98 Tenosynovitis

gwas_study_info = gwas_study_info |>
  mutate(l1_all_disease_terms  = 
        stringr::str_replace_all(l1_all_disease_terms,
                         "stenosing tenosynovitis",
                         "tenosynovitis")
        )

7.99 Tuberculosis

tb_terms <- c("mycobacterium tuberculosis infection",
              "\\bpulmonary tuberculosis",
              "extrapulmonary tuberculosis",
              "meningeal tuberculosis")

gwas_study_info = gwas_study_info |>
  mutate(l1_all_disease_terms  = 
         stringr::str_replace_all(l1_all_disease_terms,
                          pattern = paste0(tb_terms, collapse = "(?=,|$)|\\b"),
                          "tuberculosis"
         ))

7.100 Type 1 diabetes

gwas_study_info = gwas_study_info |>
  mutate(l1_all_disease_terms  = 
         stringr::str_replace_all(l1_all_disease_terms,
                          "latent autoimmune diabetes in adults",
                          "type 1 diabetes mellitus"
         ))

7.101 Thrombocytopenia

thrombocytopenia_terms <- c("thrombocytopenia 4",
                           "acquired thrombocytopenia",
                           "primary thrombocytopenia"
                           )
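
No replacement step follows this vector in the section as shown. If the intent is to group these terms in the same way as the other sections, a minimal sketch might look like the following. The target label "thrombocytopenia" and the toy data frame are assumptions, not taken from the original analysis.

```r
library(dplyr)
library(stringr)

thrombocytopenia_terms <- c("thrombocytopenia 4",
                            "acquired thrombocytopenia",
                            "primary thrombocytopenia")

# Toy stand-in for gwas_study_info (assumption, for illustration only)
toy_study_info <- data.frame(
  l1_all_disease_terms = c("primary thrombocytopenia, asthma",
                           "acquired thrombocytopenia")
)

toy_study_info <- toy_study_info |>
  dplyr::mutate(l1_all_disease_terms =
         stringr::str_replace_all(l1_all_disease_terms,
                          pattern = paste0(thrombocytopenia_terms, collapse = "(?=,|$)|\\b"),
                          "thrombocytopenia"  # assumed target label
         ))

toy_study_info$l1_all_disease_terms
```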

7.102 Tinea

tinea_terms <- c("tinea pedis",
                 "tinea unguium",
                 "dermatophytosis"
                 )


gwas_study_info = gwas_study_info |>
  mutate(l1_all_disease_terms  = 
         stringr::str_replace_all(l1_all_disease_terms,
                          pattern = paste0(tinea_terms, collapse = "(?=,|$)|\\b"),
                          "tinea"
         ))

7.103 Uveitis

uveitis_terms <- c("anterior uveitis"
                   )

gwas_study_info = gwas_study_info |>
  mutate(l1_all_disease_terms  = 
         stringr::str_replace_all(l1_all_disease_terms,
                          pattern = paste0(uveitis_terms, collapse = "(?=,|$)|\\b"),
                          "uveitis"
         ))

7.104 Vitamin B deficiency

gwas_study_info = gwas_study_info |>
  mutate(l1_all_disease_terms  = 
         stringr::str_replace_all(l1_all_disease_terms,
                          "vitamin b12 deficiency",
                          "vitamin b deficiency"
         ))

7.105 Visual impairment

url <- "http://www.ebi.ac.uk/ols4/api/ontologies/efo/terms/http%253A%252F%252Fpurl.obolibrary.org%252Fobo%252FHP_0000505/descendants"

visual_impairment_terms <- get_descendants(url)
[1] "Number of terms collected:"
[1] 12
[1] "\n Some example terms"
[1] "constriction of peripheral visual field"
[2] "severely reduced visual acuity"         
[3] "slow decrease in visual acuity"         
[4] "peripheral visual field loss"           
[5] "cerebral visual impairment"             
gwas_study_info = gwas_study_info |>
  mutate(l1_all_disease_terms  = 
         stringr::str_replace_all(l1_all_disease_terms,
                          pattern = paste0(visual_impairment_terms, collapse = "(?=,|$)|\\b"),
                          "visual impairment"
         ))

8 Further grouping of terms

8.1 Abnormal refraction

url <- "http://www.ebi.ac.uk/ols4/api/ontologies/upheno/terms/http%253A%252F%252Fpurl.obolibrary.org%252Fobo%252FHP_0000539/descendants"

abnormal_refraction_terms <- get_descendants(url)
[1] "Number of terms collected:"
[1] 22
[1] "\n Some example terms"
[1] "against the rule astigmatism" "with the rule astigmatism"   
[3] "moderate hypermetropia"       "lenticular astigmatism"      
[5] "irregular astigmatism"       
gwas_study_info = gwas_study_info |>
  mutate(l1_all_disease_terms  = 
         stringr::str_replace_all(l1_all_disease_terms,
                          pattern = paste0(abnormal_refraction_terms, collapse = "(?=,|$)|\\b"),
                          "abnormality of refraction"
         ))

9 Final summary - number of unique study terms

9.1 Deal with duplicate terms created during grouping

gwas_study_info = 
 gwas_study_info |>
  rowwise() |>
  mutate(l1_all_disease_terms = paste0(sort(unique(unlist(strsplit(l1_all_disease_terms, ", ")))),
                                      collapse = ", ")
         ) |>
  ungroup()
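
To make the split / unique / sort / paste logic concrete, the sketch below applies it to two toy strings. The helper `dedupe_terms()` is hypothetical: it wraps the same logic in `vapply()` so it can run on a whole character vector without `rowwise()`.

```r
# Hypothetical helper (not part of the original analysis): split each string
# on ", ", drop repeated terms, sort alphabetically, and paste back together.
dedupe_terms <- function(x) {
  vapply(strsplit(x, ", ", fixed = TRUE),
         function(parts) paste0(sort(unique(parts)), collapse = ", "),
         character(1))
}

# Toy inputs (assumption): duplicates as they might arise after grouping
toy_terms <- c("hypertension, asthma, hypertension", "stroke, stroke")

deduped <- dedupe_terms(toy_terms)
deduped
```

Sorting the terms also means that two studies with the same set of terms in a different order end up with identical strings.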

9.2 Deal with hanging commas and spaces

gwas_study_info = gwas_study_info |>
  mutate(l1_all_disease_terms = stringr::str_remove_all(l1_all_disease_terms, "^,|,$")
         ) |>
  mutate(l1_all_disease_terms = stringr::str_trim(l1_all_disease_terms)
         ) 
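
A small sketch of why hanging commas can appear, assuming a term list has picked up an empty entry at some earlier step (the input value is hypothetical). After splitting and sorting, the empty entry sorts first and leaves a leading `, `, which the comma removal and trimming then clean up. Base R `gsub` and `trimws` stand in for the stringr calls.

```r
# Hypothetical term list containing an empty entry
toy_value <- "glaucoma, , asthma"

# Split, de-duplicate, sort and re-join, as in the duplicate-handling step
parts <- strsplit(toy_value, ", ", fixed = TRUE)[[1]]
rejoined <- paste(sort(unique(parts)), collapse = ", ")
print(rejoined)

# Remove a leading or trailing comma, then trim surrounding whitespace
cleaned <- trimws(gsub("^,|,$", "", rejoined))
print(cleaned)
```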

9.3 Final summary - number of unique study-term pairs

# count distinct publications (PUBMED_ID) for each combined term string
n_studies_trait = gwas_study_info |>
  dplyr::filter(DISEASE_STUDY == TRUE) |>
  dplyr::select(l1_all_disease_terms, PUBMED_ID) |>
  dplyr::distinct() |>
  dplyr::group_by(l1_all_disease_terms) |>
  dplyr::summarise(n_studies = dplyr::n()) |>
  dplyr::arrange(dplyr::desc(n_studies))


head(n_studies_trait)
# A tibble: 6 × 2
  l1_all_disease_terms      n_studies
  <chr>                         <int>
1 type 2 diabetes mellitus        145
2 asthma                          134
3 alzheimers disease              124
4 breast cancer                   112
5 major depressive disorder       108
6 schizophrenia                   108
dim(n_studies_trait)
[1] 2705    2

9.3.1 When studies with multiple terms are split into individual terms

# split each non-empty term list into individual terms
diseases <- gwas_study_info$l1_all_disease_terms[gwas_study_info$l1_all_disease_terms != ""] |>
            stringr::str_split(pattern = ", ") |>
            unlist() |>
            stringr::str_trim()


test <- data.frame(trait = unique(diseases))

length(unique(diseases))
[1] 1888
# make frequency table
freq <- table(as.factor(diseases))

# sort in decreasing order
freq_sorted <- sort(freq, decreasing = TRUE)

# show top N, e.g. top 10
head(freq_sorted, 10)

           kidney disease              hypertension  type 2 diabetes mellitus 
                    10915                      7096                       922 
  coronary artery disease major depressive disorder        alzheimers disease 
                      501                       471                       422 
            schizophrenia                    asthma                  covid-19 
                      368                       348                       305 
                   stroke 
                      303 
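
Note that this frequency table counts rows of `gwas_study_info`, whereas the summary in 9.3 counts distinct `PUBMED_ID` values, so the two sets of numbers are not directly comparable. The toy sketch below contrasts the two ways of counting; the data frame and its values are hypothetical, and base R is used so the snippet runs without extra packages.

```r
# Hypothetical study table: publication 1 contributes two rows
toy_studies <- data.frame(
  PUBMED_ID = c(1, 1, 2, 3),
  l1_all_disease_terms = c("asthma, stroke", "asthma", "asthma", "")
)

# Drop rows without terms, then expand to one row per (publication, term)
toy_studies <- toy_studies[toy_studies$l1_all_disease_terms != "", ]
term_list <- strsplit(toy_studies$l1_all_disease_terms, ", ", fixed = TRUE)
toy_long <- data.frame(
  PUBMED_ID = rep(toy_studies$PUBMED_ID, lengths(term_list)),
  term = trimws(unlist(term_list))
)

# Rows per term: what table() over the split terms counts
row_counts <- table(toy_long$term)

# Distinct publications per term
pub_counts <- table(unique(toy_long)$term)

print(row_counts)
print(pub_counts)
```

In this toy table, asthma appears in three rows but only two publications.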

9.3.2 Save the updated gwas_study_info with harmonized disease terms

fwrite(gwas_study_info,
        here::here("output/gwas_cat/gwas_study_info_group_l1.csv")
         )

sessionInfo()
R version 4.3.1 (2023-06-16)
Platform: aarch64-apple-darwin20 (64-bit)
Running under: macOS 15.6.1

Matrix products: default
BLAS:   /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib/libRblas.0.dylib 
LAPACK: /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib/libRlapack.dylib;  LAPACK version 3.11.0

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

time zone: America/Los_Angeles
tzcode source: internal

attached base packages:
[1] stats     graphics  grDevices datasets  utils     methods   base     

other attached packages:
[1] jsonlite_2.0.0    httr_1.4.7        stringr_1.5.1     data.table_1.17.8
[5] dplyr_1.1.4       workflowr_1.7.1  

loaded via a namespace (and not attached):
 [1] compiler_4.3.1    renv_1.0.3        promises_1.3.3    tidyselect_1.2.1 
 [5] Rcpp_1.1.0        git2r_0.36.2      callr_3.7.6       later_1.4.2      
 [9] jquerylib_0.1.4   yaml_2.3.10       fastmap_1.2.0     here_1.0.1       
[13] R6_2.6.1          generics_0.1.4    curl_6.4.0        knitr_1.50       
[17] tibble_3.3.0      rprojroot_2.1.0   bslib_0.9.0       pillar_1.11.0    
[21] rlang_1.1.6       utf8_1.2.6        cachem_1.1.0      stringi_1.8.7    
[25] httpuv_1.6.16     xfun_0.52         getPass_0.2-4     fs_1.6.6         
[29] sass_0.4.10       cli_3.6.5         withr_3.0.2       magrittr_2.0.3   
[33] ps_1.9.1          digest_0.6.37     processx_3.8.6    rstudioapi_0.17.1
[37] lifecycle_1.0.4   vctrs_0.6.5       evaluate_1.0.4    glue_1.8.0       
[41] whisker_0.4.1     rmarkdown_2.29    tools_4.3.1       pkgconfig_2.0.3  
[45] htmltools_0.5.8.1