Last updated: 2026-01-04

Checks: 7 passed, 0 failed

Knit directory: genomics_ancest_disease_dispar/

This reproducible R Markdown analysis was created with workflowr (version 1.7.1). The Checks tab describes the reproducibility checks that were applied when the results were created. The Past versions tab lists the development history.


Great! Since the R Markdown file has been committed to the Git repository, you know the exact version of the code that produced these results.

Great job! The global environment was empty. Objects defined in the global environment can affect the analysis in your R Markdown file in unknown ways. For reproducibility it’s best to always run the code in an empty environment.

The command set.seed(20220216) was run prior to running the code in the R Markdown file. Setting a seed ensures that any results that rely on randomness, e.g. subsampling or permutations, are reproducible.

Great job! Recording the operating system, R version, and package versions is critical for reproducibility.

Nice! There were no cached chunks for this analysis, so you can be confident that you successfully produced the results during this run.

Great job! Using relative paths to the files within your workflowr project makes it easier to run your code on other machines.

Great! You are using Git for version control. Tracking code development and connecting the code version to the results is critical for reproducibility.

The results in this page were generated with repository version 3e01f04. See the Past versions tab to see a history of the changes made to the R Markdown and HTML files.

Note that you need to be careful to ensure that all relevant files for the analysis have been committed to Git prior to generating the results (you can use wflow_publish or wflow_git_commit). workflowr only checks the R Markdown file, but you know if there are other scripts or data files that it depends on. Below is the status of the Git repository when the results were generated:


Ignored files:
    Ignored:    .DS_Store
    Ignored:    .Rproj.user/
    Ignored:    .venv/
    Ignored:    analysis/.DS_Store
    Ignored:    ancestry_dispar_env/
    Ignored:    data/.DS_Store
    Ignored:    data/RCDCFundingSummary_01042026.xlsx
    Ignored:    data/cdc/
    Ignored:    data/cohort/
    Ignored:    data/gbd/.DS_Store
    Ignored:    data/gbd/IHME-GBD_2021_DATA-d8cf695e-1.csv
    Ignored:    data/gbd/IHME-GBD_2023_DATA-73cc01fd-1.csv
    Ignored:    data/gbd/ihme_gbd_2019_global_disease_burden_rate_all_ages.csv
    Ignored:    data/gbd/ihme_gbd_2019_global_paf_rate_percent_all_ages.csv
    Ignored:    data/gbd/ihme_gbd_2021_global_disease_burden_rate_all_ages.csv
    Ignored:    data/gbd/ihme_gbd_2021_global_paf_rate_percent_all_ages.csv
    Ignored:    data/gwas_catalog/
    Ignored:    data/icd/.DS_Store
    Ignored:    data/icd/2025AA/
    Ignored:    data/icd/IHME_GBD_2019_COD_CAUSE_ICD_CODE_MAP_Y2020M10D15.XLSX
    Ignored:    data/icd/IHME_GBD_2019_NONFATAL_CAUSE_ICD_CODE_MAP_Y2020M10D15.XLSX
    Ignored:    data/icd/IHME_GBD_2021_COD_CAUSE_ICD_CODE_MAP_Y2024M05D16.XLSX
    Ignored:    data/icd/IHME_GBD_2021_NONFATAL_CAUSE_ICD_CODE_MAP_Y2024M05D16.XLSX
    Ignored:    data/icd/UK_Biobank_master_file.tsv
    Ignored:    data/icd/cdc_valid_icd10_Sep_23_2025.xlsx
    Ignored:    data/icd/cdc_valid_icd9_Sep_23_2025.xlsx
    Ignored:    data/icd/hp_umls_mapping.csv
    Ignored:    data/icd/lancet_conditions_icd10.xlsx
    Ignored:    data/icd/manual_disease_icd10_mappings.xlsx
    Ignored:    data/icd/mondo_umls_mapping.csv
    Ignored:    data/icd/phecode_international_version_unrolled.csv
    Ignored:    data/icd/phecode_to_icd10_manual_mapping.xlsx
    Ignored:    data/icd/semiautomatic_ICD-pheno.txt
    Ignored:    data/icd/semiautomatic_ICD-pheno_UKB_subset.txt
    Ignored:    data/icd/umls-2025AA-mrconso.zip
    Ignored:    data/icd/~$lancet_conditions_icd10.xlsx
    Ignored:    data/icd/~$phecode_to_icd10_manual_mapping.xlsx
    Ignored:    data/~$RCDCFundingSummary_01042026.xlsx
    Ignored:    figures/
    Ignored:    human_dictionary/
    Ignored:    igsr_populations.tsv
    Ignored:    output/.DS_Store
    Ignored:    output/abstracts/
    Ignored:    output/doccano/
    Ignored:    output/fulltexts/
    Ignored:    output/gwas_cat/
    Ignored:    output/gwas_cohorts/
    Ignored:    output/icd_map/
    Ignored:    output/trait_ontology/
    Ignored:    pubmedbert-cohort-ner-model/
    Ignored:    pubmedbert-cohort-ner/
    Ignored:    r-spacyr/
    Ignored:    renv/
    Ignored:    venv/
    Ignored:    visualization.Rdata

Unstaged changes:
    Modified:   analysis/disease_inves_by_ancest.Rmd
    Modified:   analysis/get_full_text.Rmd
    Modified:   analysis/map_trait_to_icd10.Rmd
    Modified:   analysis/missing_cohort_info.Rmd
    Modified:   analysis/replication_ancestry_bias.Rmd
    Modified:   analysis/text_for_cohort_labels.Rmd
    Modified:   code/test_extract_cdc_meta.R

Note that any generated files, e.g. HTML, png, CSS, etc., are not included in this status report because it is ok for generated content to have uncommitted changes.


These are the previous versions of the repository in which changes were made to the R Markdown (analysis/group_non_cancer_diseases.Rmd) and HTML (docs/group_non_cancer_diseases.html) files. If you’ve configured a remote Git repository (see ?wflow_git_remote), click on the hyperlinks in the table below to view the files as they were in that past version.

File Version Author Date Message
html 554f8b0 IJbeasley 2026-01-03 Build site.
html 7817b8d IJbeasley 2025-12-29 Build site.
html b147392 IJbeasley 2025-12-29 Build site.
html da3fa14 IJbeasley 2025-12-29 Build site.
html 45dbf81 IJbeasley 2025-12-29 Build site.
Rmd 30b7321 IJbeasley 2025-12-29 Update grouping of non-cancer diseases
html 13f8a41 IJbeasley 2025-10-08 Build site.
html f5f3ace IJbeasley 2025-10-08 Build site.
html d17abe1 IJbeasley 2025-10-03 Build site.
Rmd 05f8bff IJbeasley 2025-10-03 Update / improve non-cancer disease grouping
html f2704e8 IJbeasley 2025-09-30 Build site.
Rmd fb5244b IJbeasley 2025-09-30 workflowr::wflow_publish("analysis/group_non_cancer_diseases.Rmd")
html 2e0797e IJbeasley 2025-09-29 Build site.
Rmd 9174f83 IJbeasley 2025-09-29 workflowr::wflow_publish("analysis/group_non_cancer_diseases.Rmd")
html 77a0ef3 IJbeasley 2025-09-29 Build site.
Rmd c76c87d IJbeasley 2025-09-29 workflowr::wflow_publish("analysis/group_non_cancer_diseases.Rmd")
html 1649f35 IJbeasley 2025-09-29 Build site.
Rmd 9264169 IJbeasley 2025-09-29 workflowr::wflow_publish("analysis/group_non_cancer_diseases.Rmd")
html 088e39a IJbeasley 2025-09-29 Build site.
Rmd abcec61 IJbeasley 2025-09-29 workflowr::wflow_publish("analysis/group_non_cancer_diseases.Rmd")
html f145113 IJbeasley 2025-09-29 Build site.
Rmd 08a1dec IJbeasley 2025-09-29 workflowr::wflow_publish("analysis/group_non_cancer_diseases.Rmd")
html 03e566e IJbeasley 2025-09-29 Build site.
Rmd 7a9fbf7 IJbeasley 2025-09-29 workflowr::wflow_publish("analysis/group_non_cancer_diseases.Rmd")
html d4610b1 IJbeasley 2025-09-28 Build site.
Rmd 44dec62 IJbeasley 2025-09-28 workflowr::wflow_publish("analysis/group_non_cancer_diseases.Rmd")
html 67e5464 IJbeasley 2025-09-28 Build site.
Rmd 99426f4 IJbeasley 2025-09-28 workflowr::wflow_publish("analysis/group_non_cancer_diseases.Rmd")
html b4be7a2 IJbeasley 2025-09-28 Build site.
Rmd 25fdd1f IJbeasley 2025-09-28 workflowr::wflow_publish("analysis/group_non_cancer_diseases.Rmd")
html d775543 IJbeasley 2025-09-28 Build site.
Rmd e777ee8 IJbeasley 2025-09-28 workflowr::wflow_publish("analysis/group_non_cancer_diseases.Rmd")
html 3640ce0 IJbeasley 2025-09-28 Build site.
Rmd d8f04de IJbeasley 2025-09-28 Using ICD / PheCode mapping

Set up

library(dplyr)
library(stringr)
library(data.table)

Ontology help - for getting disease subtypes

source(here::here("code/get_term_descendants.R"))

Load Data

# gwas_study_info <- fread(here::here("output/gwas_cat/gwas_study_info_group.csv"))
gwas_study_info <- data.table::fread(here::here("output/gwas_cat/gwas_study_cancer_group.csv"))

Basic corrections

Post-operative

gwas_study_info = gwas_study_info |> 
    mutate(collected_all_disease_terms  = 
         stringr::str_remove_all(collected_all_disease_terms,
                         pattern = "^post-operative |^postoperative"
         ))
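To see what this pattern does, here is a minimal sketch on made-up strings. It uses base R `sub()` in place of `stringr::str_remove_all()` (an assumption made only to keep the example dependency-free; both alternatives are anchored with `^`, so at most one match per string is possible). Only the first alternative includes the trailing space, so the second one leaves a leading space behind.

```r
# Made-up example terms, not taken from the GWAS Catalog data
terms <- c("post-operative pain", "postoperative nausea", "pain, postoperative")

# Same pattern as in the chunk above
cleaned <- sub("^post-operative |^postoperative", "", terms)

# Trimming afterwards removes the leftover leading space
trimmed <- trimws(cleaned)

print(cleaned)
print(trimmed)
```

If the leftover space matters downstream, trimming as above or adding a trailing space to the second alternative would be two possible ways to handle it.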

Typo: spine psteoarthritis

gwas_study_info =
  gwas_study_info |>
  mutate(collected_all_disease_terms  = 
         stringr::str_replace_all(collected_all_disease_terms,
                          vec_to_grep_pattern(c(
                            "spine psteoarthritis"
                            )
                            ),
                          "spine osteoarthritis"
         )
  )
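`vec_to_grep_pattern()` is not defined on this page; it presumably comes from the project's own code. The sketch below is a hypothetical stand-in, not the project's implementation: `whole_term_pattern()` is a name invented here, and it assumes terms are separated by ", " and that the pattern is used with Perl-style regular expressions. It shows one way a helper could match a term only when it is a complete entry in the list.

```r
# Hypothetical helper (not the project's vec_to_grep_pattern):
# match any of `terms` only as a complete ", "-separated entry
whole_term_pattern <- function(terms) {
  quoted <- paste0("\\Q", terms, "\\E")  # \Q...\E quotes regex metacharacters in PCRE
  paste0("(?:^|(?<=, ))(?:", paste(quoted, collapse = "|"), ")(?=,|$)")
}

# Made-up example strings
x <- c("fibrosis, head and neck cancer",
       "cystic fibrosis",
       "pulmonary fibrosis, fibrosis")

hits <- grepl(whole_term_pattern("fibrosis"), x, perl = TRUE)
replaced <- gsub(whole_term_pattern("fibrosis"), "lung cancer", x, perl = TRUE)

print(hits)
print(replaced)
```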

Specific study corrections

Pubmed id: 37271218 (Study of complications in Behcet’s disease)

Taking definitions etc. from: https://pmc-ncbi-nlm-nih-gov.ucsf.idm.oclc.org/articles/PMC8584626/

gwas_study_info = gwas_study_info |>
  mutate(collected_all_disease_terms  = 
         ifelse(PUBMED_ID == "37271218",
                stringr::str_replace_all(collected_all_disease_terms,
                          pattern = "abnormality of the eye",
                          replacement = "uveitis, retinal vasculitis, chorioretinitis, papillitis"
         ),
         collected_all_disease_terms
         )
  )

gwas_study_info = gwas_study_info |>
  mutate(collected_all_disease_terms  = 
         ifelse(PUBMED_ID == "37271218",
                stringr::str_replace_all(collected_all_disease_terms,
                          pattern = "abnormality of the genital system",
                          replacement = "genital aphthosis"
         ),
         collected_all_disease_terms
         )
  )

gwas_study_info = gwas_study_info |>
  mutate(collected_all_disease_terms  = 
         ifelse(PUBMED_ID == "37271218",
                stringr::str_replace_all(collected_all_disease_terms,
                          pattern = "abnormal aortic morphology",
                          replacement = "arterial thrombosis, aortic aneurysm, aneurysm of pulmonary artery"
         ),
         collected_all_disease_terms
         )
  )

gwas_study_info = gwas_study_info |>
  mutate(collected_all_disease_terms  = 
         ifelse(PUBMED_ID == "37271218",
                stringr::str_replace_all(collected_all_disease_terms,
                          pattern = "abnormal venous morphology",
                          replacement = "venous thromboembolism, superficial thrombophlebitis"
         ),
         collected_all_disease_terms
         )
  )


gwas_study_info = gwas_study_info |>
  mutate(collected_all_disease_terms  = 
         ifelse(PUBMED_ID == "37271218",
                stringr::str_replace_all(collected_all_disease_terms,
                          pattern = "abnormality of the vasculature",
                          replacement = "arterial thrombosis, aortic aneurysm, aneurysm of pulmonary artery, venous thromboembolism, superficial thrombophlebitis"
         ),
         collected_all_disease_terms
         )
  )
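The five chunks above all apply the same operation to one study. As a rough sketch of a more compact form, the replacements can be held in a named vector and applied in a loop. The data frame `studies` and the ID 11111111 are made up for illustration, and base R `gsub(fixed = TRUE)` stands in for `stringr::str_replace_all()` (which also accepts a named vector of replacements).

```r
# Toy stand-in for the study table (values are illustrative only)
studies <- data.frame(
  PUBMED_ID = c(37271218L, 37271218L, 11111111L),
  collected_all_disease_terms = c(
    "abnormality of the eye",
    "abnormal venous morphology, abnormality of the genital system",
    "abnormality of the eye"
  ),
  stringsAsFactors = FALSE
)

# old term = new term(s); a subset of the replacements used above
replacements <- c(
  "abnormality of the eye" = "uveitis, retinal vasculitis, chorioretinitis, papillitis",
  "abnormality of the genital system" = "genital aphthosis",
  "abnormal venous morphology" = "venous thromboembolism, superficial thrombophlebitis"
)

# Only rows from the target study are modified
is_target <- studies$PUBMED_ID == 37271218L
for (old in names(replacements)) {
  studies$collected_all_disease_terms[is_target] <- gsub(
    old, replacements[[old]],
    studies$collected_all_disease_terms[is_target],
    fixed = TRUE
  )
}

print(studies$collected_all_disease_terms)
```

The replacements are applied in the order they are listed, which matters if one replacement text contains another pattern.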

Pubmed id: 21878436 (Coronary restenosis <- add inclusion criteria)

# for pubmed id 21878436 
# measuring Coronary restenosis, which is an outcome of Percutaneous coronary intervention (PCI)
# included / background traits for the relevant study GENetic DEterminants of Restenosis project (GENDER) 
# are found:
# https://academic.oup.com/eurheartj/article/25/13/1163/465324?login=true#89258510

# stable angina
# silent ischaemia -> silent myocardial ischaemia
# non-ST-elevation acute coronary syndromes (i.e. excluding acute ST-elevation myocardial infarction)
# this includes:
# unstable angina,
# non-ST-elevation myocardial infarction (NSTEMI), coded within acute subendocardial myocardial infarction
# acute ischemic heart disease, unspecified

gwas_study_info = gwas_study_info |>
  mutate(collected_all_disease_terms  = 
         ifelse(PUBMED_ID == "21878436",
                stringr::str_replace_all(collected_all_disease_terms,
                          pattern = "coronary restenosis",
                          replacement = "coronary restenosis, stable angina, silent myocardial ischaemia, unstable angina, acute subendocardial myocardial infarction, acute ischemic heart disease"
         ),
         collected_all_disease_terms
         )
  )

How many unique traits are there now?

diseases <- stringr::str_split(pattern = ", ",
                               gwas_study_info$collected_all_disease_terms[gwas_study_info$collected_all_disease_terms != ""])  |>
  unlist() |>
  stringr::str_trim()

diseases <- unique(diseases)

print(length(diseases))
[1] 2166
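The counting step above splits each comma-separated string into individual terms, trims whitespace, and de-duplicates. A minimal sketch of the same steps on made-up strings, using base R equivalents (`strsplit()`, `trimws()`) rather than stringr:

```r
# Made-up term strings; the empty string mimics a study with no disease terms
term_strings <- c("asthma, eczema", "", "eczema", "asthma,  rhinitis ")

# Drop empty entries, split on ", ", flatten, and trim stray whitespace
non_empty <- term_strings[term_strings != ""]
terms <- trimws(unlist(strsplit(non_empty, ", ", fixed = TRUE)))

unique_terms <- unique(terms)
print(length(unique_terms))
```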

Filtering some traits

Abnormal blistering of the skin -> blister

gwas_study_info =
  gwas_study_info |>
  mutate(collected_all_disease_terms  = 
         ifelse(`DISEASE/TRAIT` == "Blister (PheCode 911)",
         stringr::str_replace_all(collected_all_disease_terms,
                          vec_to_grep_pattern("abnormal blistering of the skin"),
                          "blister"
         ),
         collected_all_disease_terms
         )
  )

Abnormality of the digestive system

gwas_study_info |> 
  filter(grepl(vec_to_grep_pattern("abnormality of the digestive system"), 
               collected_all_disease_terms, 
               perl = T)) |>
  pull(`DISEASE/TRAIT`) |>
  unique()
[1] "Other abdominal problem"
gwas_study_info =
  gwas_study_info |>
  mutate(collected_all_disease_terms =
           str_remove_all(collected_all_disease_terms,
                          pattern = vec_to_grep_pattern("abnormality of the digestive system")))

Abnormality of skin pigmentation

gwas_study_info |> 
  filter(grepl("abnormality of skin pigmentation", collected_all_disease_terms)) |>
  pull(`DISEASE/TRAIT`) |>
  unique()
 [1] "Skin colour saturation"                                                          
 [2] "Perceived skin darkness"                                                         
 [3] "Skin pigmentation"                                                               
 [4] "Non-melanoma skin cancer"                                                        
 [5] "Skin luminance"                                                                  
 [6] "Skin red/green component"                                                        
 [7] "Skin yellow/blue component"                                                      
 [8] "25-hydroxyvitamin D levels x skin colour (very fair vs. fair) interaction"       
 [9] "25-hydroxyvitamin D levels x skin colour (very fair vs. light olive) interaction"
[10] "Skin colour - Brown (UKB data field 1717)"                                       
[11] "Skin colour - Dark olive (UKB data field 1717)"                                  
[12] "Skin colour - Fair (UKB data field 1717)"                                        
[13] "Skin colour - Light olive (UKB data field 1717)"                                 
[14] "Skin colour - Very fair (UKB data field 1717)"                                   
[15] "Skin colour - Dark olive (UKB data field 1717) (Gene-based burden)"              
[16] "Skin colour - Brown (UKB data field 1717) (Gene-based burden)"                   
[17] "Skin colour - Fair (UKB data field 1717) (Gene-based burden)"                    
[18] "Skin colour - Light olive (UKB data field 1717) (Gene-based burden)"             
[19] "Skin colour - Very fair (UKB data field 1717) (Gene-based burden)"               
[20] "Skin pigmentation traits"                                                        
[21] "Tatto pigmentation"                                                              
# as can be seen, most of these traits are not true skin pigmentation abnormalities,
# just measures of skin pigmentation in general

# if `DISEASE/TRAIT` contains "UKB data field 1717" (which is just a skin colour measurement field)
# then remove "abnormality of skin pigmentation" from collected_all_disease_terms
gwas_study_info = gwas_study_info |>
  mutate(collected_all_disease_terms  = 
         ifelse(grepl("UKB data field 1717", 
                     `DISEASE/TRAIT`,
                     ignore.case = TRUE),
                stringr::str_remove_all(collected_all_disease_terms,
                          pattern = "abnormality of skin pigmentation"
         ),
         collected_all_disease_terms
         )
  )

# also remove if `DISEASE/TRAIT` contains "Tattoo pigmentation"
gwas_study_info = gwas_study_info |>
  mutate(collected_all_disease_terms  = 
         ifelse(grepl("tattoo pigmentation|tatto pigmentation", 
                     `DISEASE/TRAIT`,
                     ignore.case = TRUE),
                stringr::str_remove_all(collected_all_disease_terms,
                          pattern = "abnormality of skin pigmentation"
         ),
         collected_all_disease_terms
         )
  )

# also remove if `DISEASE/TRAIT` is any of the following:
# - Perceived skin darkness 
# - Skin red/green component
# - Skin yellow/blue component
# - Skin pigmentation traits
# - Skin pigmentation 
# - Skin luminance

remove_terms <- c("Perceived skin darkness",
                  "Skin red/green component",
                  "Skin yellow/blue component",
                  "Skin pigmentation traits",
                  "Skin pigmentation",
                  "Skin luminance",
                  "Skin colour saturation"
                  )

gwas_study_info = gwas_study_info |>
  mutate(collected_all_disease_terms  = 
         ifelse(`DISEASE/TRAIT` %in% remove_terms,
                stringr::str_remove_all(collected_all_disease_terms,
                          pattern = "abnormality of skin pigmentation"
         ),
         collected_all_disease_terms
         )
  )

# this leaves PUBMED_ID 23548203, which according to supplementary table 1
# is `Number of non-melanoma skin cancer (5+)`
# so change to "non-melanoma skin cancer"

gwas_study_info = gwas_study_info |>
  mutate(collected_all_disease_terms  = 
         ifelse(PUBMED_ID == "23548203",
                stringr::str_replace_all(collected_all_disease_terms,
                          pattern = "abnormality of skin pigmentation",
                          replacement = "non-melanoma skin cancer"
         ),
         collected_all_disease_terms
         )
  )
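Removing a term from the middle or end of a comma-separated string leaves the separator behind. This section does not show whether those leftovers are cleaned up later in the analysis, so the following is only a sketch of one possible tidy-up step. `tidy_terms()` is a hypothetical helper introduced here, and the input strings are made up.

```r
# Hypothetical helper: re-join only the non-empty, trimmed terms
tidy_terms <- function(x) {
  vapply(strsplit(x, ",", fixed = TRUE), function(parts) {
    parts <- trimws(parts)
    paste(parts[parts != ""], collapse = ", ")
  }, character(1), USE.NAMES = FALSE)
}

# Made-up term strings
raw <- c("abnormality of skin pigmentation, melanoma",
         "vitiligo, abnormality of skin pigmentation",
         "abnormality of skin pigmentation")

# Plain removal, as in the chunks above (base R stand-in for str_remove_all)
removed <- gsub("abnormality of skin pigmentation", "", raw, fixed = TRUE)
tidy <- tidy_terms(removed)

print(removed)
print(tidy)
```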

Abnormal thrombosis

# interesting things going on with abnormal thrombosis
gwas_study_info |> 
  filter(grepl(vec_to_grep_pattern("abnormal thrombosis"), 
               collected_all_disease_terms, perl = T)) |> 
  select(`DISEASE/TRAIT`, collected_all_disease_terms, PUBMED_ID)
                                                                                                                                   DISEASE/TRAIT
                                                                                                                                          <char>
1:                                                                                                                                    Thrombosis
2:     Blood clot, DVT, bronchitis, emphysema, asthma, rhinitis, eczema, allergy diagnosed by doctor: PHESANT recoding (UKB data field 6152_100)
3: Blood clot, DVT, bronchitis, emphysema, asthma, rhinitis, eczema, allergy diagnosed by doctor: Blood clot in the lung (UKB data field 6152_7)
4:                                                                              Blood clot in the lung (UKB data field 6152) (Gene-based burden)
5:                                                                                                  Blood clot in the lung (UKB data field 6152)
   collected_all_disease_terms PUBMED_ID
                        <char>     <int>
1:         abnormal thrombosis  26908601
2:         abnormal thrombosis  34737426
3:         abnormal thrombosis  34737426
4:         abnormal thrombosis  34662886
5:         abnormal thrombosis  34662886
# if DISEASE/TRAIT contains Blood clot in the lung \\(UKB data field 6152\\) 
# replace abnormal thrombosis in collected_all_disease_terms with pulmonary embolism
gwas_study_info =
  gwas_study_info |>
  mutate(collected_all_disease_terms  = 
         ifelse(grepl("Blood clot in the lung \\(UKB data field 6152\\)", 
                     `DISEASE/TRAIT`,
                     ignore.case = TRUE),
                stringr::str_replace_all(collected_all_disease_terms,
                          vec_to_grep_pattern("abnormal thrombosis"),
                          "pulmonary embolism"
         ),
         collected_all_disease_terms
         )
  )


gwas_study_info =
  gwas_study_info |>
  mutate(collected_all_disease_terms  = 
         ifelse(grepl("Blood clot in the lung \\(UKB data field 6152_7\\)", 
                     `DISEASE/TRAIT`,
                     ignore.case = TRUE),
                stringr::str_replace_all(collected_all_disease_terms,
                          vec_to_grep_pattern("abnormal thrombosis"),
                          "pulmonary embolism"
         ),
         collected_all_disease_terms
         )
  )
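The two chunks above need separate patterns because the escaped closing parenthesis makes each pattern specific to one field label. A small sketch using three of the trait strings printed above illustrates this, along with `fixed = TRUE` as an alternative to escaping (note that fixed matching is case-sensitive, unlike the `ignore.case = TRUE` calls above):

```r
# Trait labels copied from the printed output above
traits <- c(
  "Blood clot in the lung (UKB data field 6152)",
  "Blood clot in the lung (UKB data field 6152) (Gene-based burden)",
  "Blood clot, DVT, bronchitis, emphysema, asthma, rhinitis, eczema, allergy diagnosed by doctor: Blood clot in the lung (UKB data field 6152_7)"
)

pattern_6152   <- "Blood clot in the lung \\(UKB data field 6152\\)"
pattern_6152_7 <- "Blood clot in the lung \\(UKB data field 6152_7\\)"

hits_6152   <- grepl(pattern_6152, traits, ignore.case = TRUE)
hits_6152_7 <- grepl(pattern_6152_7, traits, ignore.case = TRUE)

# Literal matching avoids the need to escape the parentheses
hits_fixed <- grepl("Blood clot in the lung (UKB data field 6152)", traits, fixed = TRUE)

print(hits_6152)
print(hits_6152_7)
print(hits_fixed)
```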

## PHESANT recoding (UKB data field 6152_100)
gwas_study_info =
  gwas_study_info |>
  mutate(collected_all_disease_terms  = 
         ifelse(grepl("PHESANT recoding \\(UKB data field 6152_100\\)", 
                     `DISEASE/TRAIT`,
                     ignore.case = TRUE),
                stringr::str_replace_all(collected_all_disease_terms,
                          vec_to_grep_pattern("abnormal thrombosis, allergic disease, respiratory system disease"),
                          "pulmonary embolism, deep vein thrombosis, chronic bronchitis, emphysema, asthma, rhinitis, eczema"
         ),
         collected_all_disease_terms
         )
  )


# for pubmed id: 26908601
# remove abnormal thrombosis
# because this is included in the other provided terms 

gwas_study_info =
  gwas_study_info |>
  mutate(collected_all_disease_terms  = 
         ifelse(PUBMED_ID == "26908601",
                stringr::str_remove_all(collected_all_disease_terms,
                          vec_to_grep_pattern("abnormal thrombosis")
         ),
         collected_all_disease_terms
         )
  )

Acute coronary syndrome

# for pubmed id 28753643
# acute coronary syndrome (ACS) 
# comes from the SOLID-TIMI 52 cohort: experienced acute coronary syndrome (ACS; unstable angina, non–ST-elevation or ST-elevation MI)

gwas_study_info = gwas_study_info |>
  mutate(collected_all_disease_terms  = 
         ifelse(PUBMED_ID == "28753643",
                stringr::str_replace_all(collected_all_disease_terms,
                          pattern = "acute coronary syndrome",
                          replacement = "unstable angina, acute subendocardial myocardial infarction, acute transmural myocardial infarction of unspecified site"
         ),
         collected_all_disease_terms
         )
  )

# define similarly for the other ACS studies listed in the vector below
# (28753643 is repeated here, but was already replaced above, so this is a no-op for it)
gwas_study_info = gwas_study_info |>
  mutate(collected_all_disease_terms  = 
         ifelse(PUBMED_ID %in% c("28753643","24952865", "28753645", "25583994", "25935875", "34158603"),
                stringr::str_replace_all(collected_all_disease_terms,
                          pattern = "acute coronary syndrome",
                          replacement = "unstable angina, acute subendocardial myocardial infarction, acute transmural myocardial infarction of unspecified site"
         ),
         collected_all_disease_terms
         )
  )
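The printed table in the thrombosis section shows `PUBMED_ID` as an integer column, while the filters compare it with quoted strings. R coerces the integer side to character for both `==` and `%in%`, so these comparisons behave as intended. A minimal sketch (12345678 is a made-up ID):

```r
pubmed_id <- c(28753643L, 24952865L, 12345678L)  # 12345678 is made up
target_ids <- c("28753643", "24952865")

# Integer vs. character: the integers are converted to strings before comparing
eq_hit <- pubmed_id == "28753643"
in_hit <- pubmed_id %in% target_ids

# Equivalent comparison done on the integer side
in_hit_int <- pubmed_id %in% as.integer(target_ids)

print(eq_hit)
print(in_hit)
```

This relies on the column being integer. If the IDs were ever read as doubles, the string conversion could use scientific notation for some values, so comparing against integers (or converting the column explicitly) may be the safer choice.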

# for pubmed id 35356681, acute coronary syndrome (ACS) - is actually non-ST elevation acute coronary syndrome
gwas_study_info |> 
  filter(grepl(vec_to_grep_pattern("acute coronary syndrome"), 
               collected_all_disease_terms, 
               perl = T)) |> 
  filter(PUBMED_ID == "35356681") |> 
  pull(`DISEASE/TRAIT`) |>
  unique()
[1] "Bisoprolol clearance in non-ST elevation acute coronary syndrome"
gwas_study_info = gwas_study_info |>
  mutate(collected_all_disease_terms  = 
         ifelse(PUBMED_ID == "35356681",
                stringr::str_replace_all(collected_all_disease_terms,
                          pattern = "acute coronary syndrome",
                          replacement = "acute subendocardial myocardial infarction"
         ),
         collected_all_disease_terms
         )
  )

Attached earlobe

gwas_study_info |> 
  filter(grepl("attached earlobe", collected_all_disease_terms)) |>
  pull(`DISEASE/TRAIT`) |>
  unique()
[1] "Common traits (Other)" "Earlobe attachment"   
# remove attached earlobe
gwas_study_info = gwas_study_info |>
  mutate(collected_all_disease_terms  = 
         stringr::str_remove_all(collected_all_disease_terms,
                          pattern = "attached earlobe"
         ))

Chronic disease

gwas_study_info |> 
  filter(grepl(vec_to_grep_pattern("chronic disease"), 
               collected_all_disease_terms, 
               perl = T)) |>
  pull(`DISEASE/TRAIT`) |>
  unique()
[1] "Long standing illness disability or infirmity (UKB data field 2188)"                    
[2] "Long standing illness disability or infirmity (UKB data field 2188) (Gene-based burden)"
gwas_study_info =
  gwas_study_info |>
  mutate(collected_all_disease_terms =
           str_remove_all(collected_all_disease_terms,
                          pattern = vec_to_grep_pattern("chronic disease")))

Facial wrinkling

gwas_study_info |> 
  filter(grepl("facial wrinkling", collected_all_disease_terms)) |>
  pull(`DISEASE/TRAIT`) |>
  unique()
[1] "Facial wrinkles"                                
[2] "Facial wrinkles (forehead)"                     
[3] "Facial wrinkles (frown lines)"                  
[4] "Facial wrinkles (crow's feet)"                  
[5] "Facial wrinkles (principal components analysis)"
[6] "Facial wrinkles (under eye)"                    
# remove facial wrinkling
gwas_study_info = gwas_study_info |>
  mutate(collected_all_disease_terms  = 
         stringr::str_remove_all(collected_all_disease_terms,
                          pattern = "facial wrinkling"
         ))

Fibrosis

gwas_study_info |> 
  filter(grepl(vec_to_grep_pattern("fibrosis"), 
               collected_all_disease_terms, 
               perl = T)) |>
  pull(`DISEASE/TRAIT`) |>
  unique()
[1] "Fibrosis"                                                                                                                                                           
[2] "Fibrosis (moderate to severe) in head and neck cancer treated with radiotherapy"                                                                                    
[3] "Fibrosis (severe) in head and neck cancer treated with radiotherapy"                                                                                                
[4] "Fibrosis or atrophy (moderate to severe) in head and neck cancer treated with radiotherapy"                                                                         
[5] "Fibrosis or atrophy (severe) in head and neck cancer treated with radiotherapy"                                                                                     
[6] "Standardised Total Average Toxicity (late dysphagia, xerostomia, fibrosis or atrophy) in head and neck cancer treated with radiotherapy"                            
[7] "Standardised Total Average Toxicity (acute dysphagia, mucositis, late dysphagia, xerostomia, fibrosis or atrophy) in head and neck cancer treated with radiotherapy"
gwas_study_info =
  gwas_study_info |>
  mutate(collected_all_disease_terms =
           str_replace_all(collected_all_disease_terms,
                           pattern = vec_to_grep_pattern("fibrosis, head and neck cancer"),
                          replacement = "head and neck cancer"
           )
  )

gwas_study_info =
  gwas_study_info |>
  mutate(collected_all_disease_terms =
           ifelse(PUBMED_ID == "38601445",
           str_replace_all(collected_all_disease_terms,
                           pattern = vec_to_grep_pattern("fibrosis"),
                          replacement = "lung cancer"
           ),
           collected_all_disease_terms
           )
  )

Formal thought disorder

gwas_study_info |> 
  filter(grepl(vec_to_grep_pattern("formal thought disorder"), 
               collected_all_disease_terms, 
               perl = T)) |>
  pull(`DISEASE/TRAIT`) |>
  unique()
[1] "Formal thought disorder in schizophrenia"                                                                           
[2] "Positive formal thought disorder in major depressive disorder, bipolar disorder or schizophrenia spectrum disorders"
# as formal thought disorder is within other psychiatric traits
# remove
gwas_study_info =
  gwas_study_info |>
  mutate(collected_all_disease_terms =
           str_remove_all(collected_all_disease_terms,
                          pattern = vec_to_grep_pattern("formal thought disorder")))

Gender identity disorder

gwas_study_info = 
  gwas_study_info |>
  mutate(collected_all_disease_terms  = 
         stringr::str_replace_all(collected_all_disease_terms,
                          vec_to_grep_pattern("childhood gender nonconformity"),
                          "gender identity disorder"
         ))

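# remove the longer term first: "gender identity disorder" is a substring of
# "sexual and gender identity disorders", so removing it first would leave
# "sexual and s" behind and the longer pattern would never match
gwas_study_info =
  gwas_study_info |>
    mutate(collected_all_disease_terms  = 
         stringr::str_remove_all(collected_all_disease_terms,
                          pattern = "sexual and gender identity disorders"
         ))

gwas_study_info =
    gwas_study_info |>
    mutate(collected_all_disease_terms  = 
         stringr::str_remove_all(collected_all_disease_terms,
                          pattern = "gender identity disorder"
         ))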
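Because "gender identity disorder" is a substring of "sexual and gender identity disorders", the order in which the two terms are removed affects the result. A minimal sketch on a made-up string, with base R `gsub(fixed = TRUE)` standing in for `str_remove_all()`:

```r
# Made-up term string
x <- "sexual and gender identity disorders, anxiety disorder"

# Shorter term first: the longer term is broken up and never matches afterwards
short_first <- gsub("gender identity disorder", "", x, fixed = TRUE)
short_first <- gsub("sexual and gender identity disorders", "", short_first, fixed = TRUE)

# Longer term first: both removals behave as expected
long_first <- gsub("sexual and gender identity disorders", "", x, fixed = TRUE)
long_first <- gsub("gender identity disorder", "", long_first, fixed = TRUE)

print(short_first)
print(long_first)
```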

Handedness

gwas_study_info |> 
  filter(grepl("handedness", collected_all_disease_terms)) |>
  pull(`DISEASE/TRAIT`) |>
  unique()
 [1] "Handedness (non-right-handed vs right-handed)"                          
 [2] "Handedness (Left-handed vs. non-left-handed)"                           
 [3] "Handedness (left-handed vs. right-handed)"                              
 [4] "Relative hand skill in reading disability"                              
 [5] "Relative hand skill"                                                    
 [6] "Handedness in dyslexia"                                                 
 [7] "Handedness"                                                             
 [8] "Handedness (chirality/laterality): Right-handed (UKB data field 1707_1)"
 [9] "Handedness (Right-handed vs. non-right-handed)"                         
[10] "Ambidextrousness"                                                       
[11] "Ambidextrousness (ambidextrous vs right-handed)"                        
[12] "Left-handedness"                                                        
[13] "Handedness (confirmatory factor analysis Factor 36)"                    
# remove handedness
gwas_study_info = gwas_study_info |>
  mutate(collected_all_disease_terms  = 
         stringr::str_remove_all(collected_all_disease_terms,
                          pattern = "handedness"
         ))
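An unanchored pattern such as "handedness" also matches inside longer terms. Whether any such longer terms occur in this data is not shown here, so the strings below are made up. The sketch contrasts substring removal with whole-term removal; `remove_whole_term()` is a hypothetical helper that assumes terms are separated by ", ".

```r
# Hypothetical helper: drop `term` only when it is a complete entry
remove_whole_term <- function(x, term) {
  vapply(strsplit(x, ", ", fixed = TRUE), function(parts) {
    paste(parts[parts != term], collapse = ", ")
  }, character(1), USE.NAMES = FALSE)
}

# Made-up term strings
x <- c("handedness, dyslexia", "left-handedness", "handedness")

substring_removed  <- gsub("handedness", "", x, fixed = TRUE)
whole_term_removed <- remove_whole_term(x, "handedness")

print(substring_removed)
print(whole_term_removed)
```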

# also remove functional laterality if "Handedness" appears in DISEASE/TRAIT
# (e.g. "Handedness (chirality/laterality): Right-handed")
gwas_study_info = gwas_study_info |>
  mutate(collected_all_disease_terms  = 
         ifelse(grepl("Handedness", 
                     `DISEASE/TRAIT`,
                     ignore.case = TRUE),
                stringr::str_remove_all(collected_all_disease_terms,
                          pattern = "functional laterality"
         ),
         collected_all_disease_terms
         )
  )

# also remove functional laterality if Usual side of head for mobile phone use is in DISEASE/TRAIT
gwas_study_info = gwas_study_info |>
  mutate(collected_all_disease_terms  = 
         ifelse(grepl("Usual side of head for mobile phone use", 
                     `DISEASE/TRAIT`,
                     ignore.case = TRUE),
                stringr::str_remove_all(collected_all_disease_terms,
                          pattern = "functional laterality"
         ),
         collected_all_disease_terms
         )
  )

# also remove functional laterality if PUBMED_ID=20585627
gwas_study_info = gwas_study_info |>
  mutate(collected_all_disease_terms  = 
         ifelse(PUBMED_ID == "20585627",
                stringr::str_remove_all(collected_all_disease_terms,
                          pattern = "functional laterality"
         ),
         collected_all_disease_terms
         )
  ) 
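
Removing a term from a field that holds several terms can leave stray separators behind. A minimal sketch of a tidy-up step, assuming the terms in `collected_all_disease_terms` are separated by ", " (an assumption based on the replacement strings used in this analysis). The helper name `tidy_term_separators` is hypothetical and is not part of this analysis.

```r
# Hypothetical helper: collapse separators left behind after a term is removed.
# Assumes terms in the field are separated by ", ".
tidy_term_separators <- function(x) {
  x <- gsub("(,\\s*)+", ", ", x, perl = TRUE)      # collapse runs of commas into one separator
  x <- gsub("^,\\s*|,\\s*$", "", x, perl = TRUE)   # drop a leading or trailing separator
  trimws(x)
}

# Toy strings, made up for illustration
terms <- c("functional laterality, dyslexia",
           "dyslexia, functional laterality, stroke",
           "functional laterality")

removed <- gsub("functional laterality", "", terms, fixed = TRUE)
cleaned <- tidy_term_separators(removed)
cleaned
```

If a later step of the analysis already normalises separators, this step is unnecessary.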

Hyperplasia

gwas_study_info |> 
  filter(grepl(vec_to_grep_pattern("hyperplasia"), 
               collected_all_disease_terms, 
               perl = TRUE)) |>
  pull(`DISEASE/TRAIT`) |>
  unique()
[1] "Bilateral adrenal hyperplasia"
gwas_study_info =
  gwas_study_info |>
  mutate(collected_all_disease_terms =
           ifelse(grepl("Bilateral adrenal hyperplasia",
                        `DISEASE/TRAIT`,
                       ignore.case = TRUE),
                  str_replace_all(collected_all_disease_terms,
                                  pattern = vec_to_grep_pattern("hyperplasia"),
                                  replacement = "bilateral adrenal hyperplasia"
                  ),
                  collected_all_disease_terms
           )
)

Hypersensitivity reaction disease

gwas_study_info |> 
  filter(grepl(vec_to_grep_pattern("hypersensitivity reaction disease"), 
               collected_all_disease_terms, 
               perl = TRUE)) |>
  pull(`DISEASE/TRAIT`) |>
  unique()
[1] "Nevirapine-induced hypersensitivity in HIV"
gwas_study_info =
  gwas_study_info |>
  mutate(collected_all_disease_terms =
           str_remove_all(collected_all_disease_terms,
                          pattern = vec_to_grep_pattern("hypersensitivity reaction disease")))

Inflammatory skin disease

gwas_study_info |> 
  filter(grepl(vec_to_grep_pattern("inflammatory skin disease"), 
               collected_all_disease_terms, 
               perl = TRUE)) |>
  pull(`DISEASE/TRAIT`) |>
  unique()
[1] "Inflammatory skin disease"
gwas_study_info =
  gwas_study_info |>
  mutate(collected_all_disease_terms =
           str_replace_all(collected_all_disease_terms,
                           pattern = vec_to_grep_pattern("inflammatory skin disease"),
                          replacement = "atopic dermatitis, psoriasis"
           )
  )

Sneezing in response to bright light (autosomal dominant compelling helio-ophthalmic outburst syndrome)

gwas_study_info |> 
  filter(grepl("autosomal dominant compelling helio-ophthalmic outburst syndrome", collected_all_disease_terms)) |>
  pull(`DISEASE/TRAIT`) |>
  unique()
[1] "Photic sneeze reflex"
# remove autosomal dominant compelling helio-ophthalmic outburst syndrome
gwas_study_info = gwas_study_info |>
  mutate(collected_all_disease_terms  = 
         stringr::str_remove_all(collected_all_disease_terms,
                          pattern = "autosomal dominant compelling helio-ophthalmic outburst syndrome"
         ))

Skin sensitivity to sun

gwas_study_info |> 
  filter(grepl("skin sensitivity to sun", collected_all_disease_terms)) |>
  pull(`DISEASE/TRAIT`) |>
  unique()
 [1] "Skin sensitivity to sun"                                                                           
 [2] "Ease of sunburn"                                                                                   
 [3] "Ease of skin tanning"                                                                              
 [4] "Ease of skin tanning - Get mildly or occasionally tanned (UKB data field 1727)"                    
 [5] "Ease of skin tanning - Get moderately tanned (UKB data field 1727)"                                
 [6] "Ease of skin tanning - Get very tanned (UKB data field 1727)"                                      
 [7] "Ease of skin tanning - Never tan only burn (UKB data field 1727)"                                  
 [8] "Ease of skin tanning - Get mildly or occasionally tanned (UKB data field 1727) (Gene-based burden)"
 [9] "Ease of skin tanning - Get moderately tanned (UKB data field 1727) (Gene-based burden)"            
[10] "Ease of skin tanning - Never tan only burn (UKB data field 1727) (Gene-based burden)"              
[11] "Ease of skin tanning - Get very tanned (UKB data field 1727) (Gene-based burden)"                  
# remove skin sensitivity to sun
gwas_study_info = gwas_study_info |>
  mutate(collected_all_disease_terms  = 
         stringr::str_remove_all(collected_all_disease_terms,
                          pattern = vec_to_grep_pattern("skin sensitivity to sun")
         ))

# also remove suntan
gwas_study_info |> 
  filter(grepl("suntan", collected_all_disease_terms)) |>
  pull(`DISEASE/TRAIT`) |>
  unique()
[1] "Tanning"          "Low tan response"
gwas_study_info = gwas_study_info |>
  mutate(collected_all_disease_terms  = 
         stringr::str_remove_all(collected_all_disease_terms,
                          pattern = vec_to_grep_pattern("suntan"
                          )
         ))

Group some non-cancer diseases together

Abnormal brain morphology

# if DISEASE/TRAIT contains Unidentified bright object on brain MRI, 
# then replace abnormal brain morphology with Other abnormal findings on diagnostic imaging of central nervous system 
gwas_study_info = gwas_study_info |>
  mutate(collected_all_disease_terms  = 
         ifelse(grepl("Unidentified bright object on brain MRI", 
                     `DISEASE/TRAIT`,
                     ignore.case = TRUE),
                stringr::str_replace_all(collected_all_disease_terms,
                          vec_to_grep_pattern("abnormal brain morphology"),
                          "other abnormal findings on diagnostic imaging of central nervous system"
         ),
         collected_all_disease_terms
         )
  )

Abnormal circulating lipid concentration -> disorders of lipid metabolism

gwas_study_info =
  gwas_study_info |>
  mutate(collected_all_disease_terms  = 
         ifelse(grepl("Disorders of lipid metabolism", 
                     `DISEASE/TRAIT`,
                     ignore.case = TRUE),
                stringr::str_replace_all(collected_all_disease_terms,
                          vec_to_grep_pattern("abnormal circulating lipid concentration"),
                          "disorders of lipid metabolism"
         ),
         collected_all_disease_terms
         )
  )

Abnormal ecg

# if abnormal ecg in disease/trait, then change abnormal ekg to abnormal ecg
gwas_study_info = gwas_study_info |>
  mutate(collected_all_disease_terms  = 
         ifelse(grepl("abnormal ecg", 
                     `DISEASE/TRAIT`,
                     ignore.case = TRUE),
                stringr::str_replace_all(collected_all_disease_terms,
                          vec_to_grep_pattern("abnormal ekg"),
                          "abnormal ecg"
         ),
         collected_all_disease_terms
         )
  )

Abnormality of gait

gwas_study_info = 
  gwas_study_info |>
  mutate(collected_all_disease_terms  = 
         stringr::str_replace_all(collected_all_disease_terms,
                          vec_to_grep_pattern(c(
                            "gait imbalance",
                            "decreased walking ability",
                            "postural instability")
                            ),
                          "abnormality of gait"
         ))

Abnormality of the nervous system, abnormality of the musculature, abnormality of the skeletal system

gwas_study_info = 
  gwas_study_info |>
  mutate(collected_all_disease_terms  = 
         stringr::str_replace_all(collected_all_disease_terms,
                          vec_to_grep_pattern(c(
                            "abnormality of the musculature, abnormality of the nervous system, abnormality of the skeletal system")
                            ),
                          "other symptoms and signs involving the nervous and musculoskeletal systems"
         ))

Other demyelinating diseases of central nervous system

# if DISEASE/TRAIT contains PheCode 341,
# replace abnormality of the nervous system with other demyelinating diseases of central nervous system

gwas_study_info = 
  gwas_study_info |>
  mutate(collected_all_disease_terms  = 
         ifelse(grepl("PheCode 341", 
                     `DISEASE/TRAIT`,
                     ignore.case = TRUE),
                stringr::str_replace_all(collected_all_disease_terms,
                          vec_to_grep_pattern("abnormality of the nervous system"),
                          "other demyelinating diseases of central nervous system"
         ),
         collected_all_disease_terms
         )
  )


# if DISEASE/TRAIT contains PheCode 345,
# replace abnormality of the nervous system with epilepsy
gwas_study_info = 
  gwas_study_info |>
  mutate(collected_all_disease_terms  = 
         ifelse(grepl("PheCode 345", 
                     `DISEASE/TRAIT`,
                     ignore.case = TRUE),
                stringr::str_replace_all(collected_all_disease_terms,
                          vec_to_grep_pattern("abnormality of the nervous system"),
                          "epilepsy"
         ),
         collected_all_disease_terms
         )
  )


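# if DISEASE/TRAIT contains Neurological involvement in Behcet's disease,
# remove abnormality of the nervous system
gwas_study_info = 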
  gwas_study_info |>
  mutate(collected_all_disease_terms  = 
         ifelse(grepl("Neurological involvement in Behcet's disease", 
                     `DISEASE/TRAIT`,
                     ignore.case = TRUE),
                stringr::str_remove_all(collected_all_disease_terms,
                          vec_to_grep_pattern("abnormality of the nervous system")
         ),
         collected_all_disease_terms
         )
  )
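
The three chunks above share one shape: when `DISEASE/TRAIT` matches a label, the generic term is swapped for a more specific one. A minimal sketch of the same idea driven by a rule table, in base R. The toy rows, the `rules` table and the `apply_trait_rules` helper are all hypothetical, and the sketch uses plain substring replacement rather than `vec_to_grep_pattern`.

```r
# Toy rows, made up for illustration only (not taken from the GWAS catalog).
toy <- data.frame(
  trait = c("Toy trait (PheCode 341)", "Toy trait (PheCode 345)", "Toy unrelated trait"),
  terms = rep("abnormality of the nervous system", 3),
  stringsAsFactors = FALSE
)

# Hypothetical rule table: when `trait_pattern` is found in the trait label,
# the generic term is replaced by `replacement`.
rules <- data.frame(
  trait_pattern = c("PheCode 341", "PheCode 345"),
  replacement   = c("other demyelinating diseases of central nervous system",
                    "epilepsy"),
  stringsAsFactors = FALSE
)

apply_trait_rules <- function(terms, trait, rules, generic) {
  for (i in seq_len(nrow(rules))) {
    hit <- grepl(rules$trait_pattern[i], trait, ignore.case = TRUE)
    # fixed = TRUE does plain substring replacement; the analysis itself
    # builds its pattern with vec_to_grep_pattern() instead.
    terms[hit] <- gsub(generic, rules$replacement[i], terms[hit], fixed = TRUE)
  }
  terms
}

toy$terms <- apply_trait_rules(toy$terms, toy$trait, rules,
                               generic = "abnormality of the nervous system")
toy$terms
```

Rows whose trait matches no rule keep the generic term unchanged.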

Abdominal abscess

gwas_study_info |>
  filter(grepl(vec_to_grep_pattern("abdominal abscess"), 
               collected_all_disease_terms, perl = TRUE))
         DISEASE/TRAIT PUBMED_ID  YEAR
                <char>     <int> <int>
1: Abdominal infection  35173190  2022
                                                                                                                                                          STUDY
                                                                                                                                                         <char>
1: A genome-wide association study in a large community-based cohort identifies multiple loci associated with susceptibility to bacterial and viral infections.
   STUDY_ACCESSION      MAPPED_TRAIT                     MAPPED_TRAIT_URI
            <char>            <char>                               <char>
1:    GCST90103953 abdominal abscess http://www.ebi.ac.uk/efo/EFO_1001753
   MAPPED_BACKGROUND_TRAIT MAPPED_BACKGROUND_TRAIT_URI     disease_terms
                    <char>                      <char>            <char>
1:                                                     abdominal abscess
   MAPPED_TRAIT_CATEGORY background_disease_terms BACKGROUND_TRAIT_CATEGORY
                  <char>                   <char>                    <char>
1:      Disease/Disorder                                              Other
   DISEASE_STUDY all_disease_terms collected_all_disease_terms
          <lgcl>            <char>                      <char>
1:          TRUE abdominal abscess           abdominal abscess
# only one study (PUBMED_ID 35173190) has abdominal abscess
# it defines the trait unusually, as 'Abdominal infections':
# D73.3, K35-37, K57, K61, K63.0, K65, K75.0, K81, K83.0

gwas_study_info = 
  gwas_study_info |>
  mutate(collected_all_disease_terms  = 
         ifelse(PUBMED_ID == "35173190",
                stringr::str_replace_all(collected_all_disease_terms,
                          vec_to_grep_pattern("abdominal abscess"),
                          "abdominal infections code"
         ),
         collected_all_disease_terms
         )
  )

Abnormal mammogram

# if DISEASE/TRAIT contains Abnormal findings on mammogram or breast exam,
# then change abnormality of the breast to abnormal mammogram

gwas_study_info = 
  gwas_study_info |>
  mutate(collected_all_disease_terms  = 
         ifelse(grepl("Abnormal findings on mammogram or breast exam", 
                     `DISEASE/TRAIT`,
                     ignore.case = TRUE),
                stringr::str_replace_all(collected_all_disease_terms,
                          vec_to_grep_pattern("abnormality of the breast"),
                          "abnormal mammogram"
         ),
         collected_all_disease_terms
         )
  )

Achalasia of cardia

gwas_study_info = 
  gwas_study_info |>
  mutate(collected_all_disease_terms  = 
         stringr::str_replace_all(collected_all_disease_terms,
                          vec_to_grep_pattern("achalasia"),
                          "achalasia of cardia"
         ))

Acute pancreatitis

gwas_study_info = gwas_study_info |>
  mutate(collected_all_disease_terms  = 
         stringr::str_replace_all(collected_all_disease_terms,
                          vec_to_grep_pattern("asparaginase-induced acute pancreatitis"),
                          "acute pancreatitis"
         ))

Amebiasis

# If `DISEASE/TRAIT` contains Diarrhoea-associated Entamoeba histolytica infection
# then replace amebiasis with amoeboma of intestine 

gwas_study_info =
  gwas_study_info |>
  mutate(collected_all_disease_terms  = 
         ifelse(grepl("Diarrhoea-associated Entamoeba histolytica infection", 
                     `DISEASE/TRAIT`,
                     ignore.case = TRUE),
                stringr::str_replace_all(collected_all_disease_terms,
                          vec_to_grep_pattern("amebiasis"),
                          "amoeboma of intestine"
         ),
         collected_all_disease_terms
         )
  )

Altitude sickness

gwas_study_info = gwas_study_info |>
  mutate(collected_all_disease_terms  = 
         stringr::str_replace_all(collected_all_disease_terms,
                          vec_to_grep_pattern("chronic mountain sickness"),
                          "altitude sickness"
         ))

Alopecia

alopecia_terms <- c("frontal fibrosing alopecia" 
                   )


gwas_study_info = gwas_study_info |>
  mutate(collected_all_disease_terms  = 
         stringr::str_replace_all(collected_all_disease_terms,
                          pattern = vec_to_grep_pattern(alopecia_terms),
                          "alopecia"
         )
  )
         
         
drug_induced_alopecia_terms <- c(
                               "chemotherapy-induced alopecia"
                               )


gwas_study_info = gwas_study_info |>
  mutate(collected_all_disease_terms  = 
         stringr::str_replace_all(collected_all_disease_terms,
                          pattern = vec_to_grep_pattern(drug_induced_alopecia_terms),
                          "drug-induced androgenic alopecia"
         )
  )

Alcohol and nicotine codependence

# alcohol and nicotine codependence -> alcohol dependence, nicotine dependence

gwas_study_info = gwas_study_info |>
  mutate(collected_all_disease_terms  = 
         stringr::str_replace_all(collected_all_disease_terms,
                          vec_to_grep_pattern("alcohol and nicotine codependence"),
                          "alcohol dependence, nicotine dependence"
         )) 

gwas_study_info = gwas_study_info |>
  mutate(collected_all_disease_terms  = 
         stringr::str_replace_all(collected_all_disease_terms,
                          vec_to_grep_pattern("alcohol dependence"),
                          "alcohol-related disorders"
         ))

gwas_study_info = gwas_study_info |>
  mutate(collected_all_disease_terms  = 
         stringr::str_replace_all(collected_all_disease_terms,
                          vec_to_grep_pattern("nicotine dependence"),
                          "tobacco use disorder"
         ))
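
The order of the three replacements above matters: the combined term is split first, and only then are the two component terms renamed. A minimal sketch in base R with plain substring replacement (the analysis itself uses `vec_to_grep_pattern`, which is not reproduced here):

```r
x <- "alcohol and nicotine codependence"

# Order used above: split first, then rename each component.
split_first <- gsub("alcohol and nicotine codependence",
                    "alcohol dependence, nicotine dependence", x, fixed = TRUE)
split_first <- gsub("alcohol dependence", "alcohol-related disorders",
                    split_first, fixed = TRUE)
split_first <- gsub("nicotine dependence", "tobacco use disorder",
                    split_first, fixed = TRUE)
split_first
# "alcohol-related disorders, tobacco use disorder"

# Reversed order: the renames find nothing to match, so the components
# produced by the split keep their old names.
rename_first <- gsub("alcohol dependence", "alcohol-related disorders", x, fixed = TRUE)
rename_first <- gsub("nicotine dependence", "tobacco use disorder",
                     rename_first, fixed = TRUE)
rename_first <- gsub("alcohol and nicotine codependence",
                     "alcohol dependence, nicotine dependence",
                     rename_first, fixed = TRUE)
rename_first
# "alcohol dependence, nicotine dependence"
```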

Amyloidosis

gwas_study_info = gwas_study_info |>
  mutate(collected_all_disease_terms  = 
         stringr::str_replace_all(collected_all_disease_terms,
                          vec_to_grep_pattern("al amyloidosis"),
                          "amyloidosis"
         ))

Anxiety disorder

gwas_study_info = gwas_study_info |>
  mutate(collected_all_disease_terms  = 
         stringr::str_replace_all(collected_all_disease_terms,
                          vec_to_grep_pattern("anxiety"),
                          "anxiety disorder"
         )) 
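
Renaming "anxiety" to "anxiety disorder" is only safe if the match covers a whole term, because the target already contains the source. `vec_to_grep_pattern` is defined elsewhere in this analysis and its pattern is not shown in this section. The sketch below illustrates the failure mode with a different technique, split-and-compare, assuming terms are separated by ", ". The helper `replace_whole_term` is hypothetical.

```r
# Hypothetical helper: replace a term only when it is a complete entry
# in a ", "-separated field.
replace_whole_term <- function(x, from, to, sep = ", ") {
  vapply(strsplit(x, sep, fixed = TRUE), function(terms) {
    if (anyNA(terms)) return(NA_character_)   # keep missing values missing
    terms[terms %in% from] <- to
    paste(unique(terms), collapse = sep)
  }, character(1))
}

x <- c("anxiety", "anxiety disorder", "anxiety, depression")

# Plain substring replacement also rewrites the already-correct term
substring_result <- gsub("anxiety", "anxiety disorder", x, fixed = TRUE)
substring_result
# "anxiety disorder" "anxiety disorder disorder" "anxiety disorder, depression"

# Whole-term replacement leaves "anxiety disorder" untouched
whole_term_result <- replace_whole_term(x, "anxiety", "anxiety disorder")
whole_term_result
# "anxiety disorder" "anxiety disorder" "anxiety disorder, depression"
```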

Aplastic anemia

gwas_study_info = gwas_study_info |>
  mutate(collected_all_disease_terms  = 
         stringr::str_replace_all(collected_all_disease_terms,
                          pattern = vec_to_grep_pattern("severe aplastic anemia"),
                          "aplastic anemia"
         ))

Androgenic alopecia

gwas_study_info = 
  gwas_study_info |>
  mutate(collected_all_disease_terms  = 
         stringr::str_replace_all(collected_all_disease_terms,
                          pattern = vec_to_grep_pattern("androgenetic alopecia"),
                          "androgenic alopecia"
         ))  

Angioedema -> Angioneurotic oedema

gwas_study_info = gwas_study_info |>
  mutate(collected_all_disease_terms  = 
         stringr::str_replace_all(collected_all_disease_terms,
                          vec_to_grep_pattern("angioedema"),
                          "angioneurotic oedema"
         ))

Astigmatism

gwas_study_info = gwas_study_info |>
  mutate(collected_all_disease_terms  = 
         stringr::str_replace_all(collected_all_disease_terms,
                          vec_to_grep_pattern("corneal astigmatism"),
                          "astigmatism"
         ))

Asthma

gwas_study_info = gwas_study_info |>
      mutate(collected_all_disease_terms  = 
           stringr::str_replace_all(collected_all_disease_terms,
                            vec_to_grep_pattern(c("atopic asthma", "chronic obstructive asthma")),
                            "asthma"
           )) |>
    mutate(collected_all_disease_terms  = 
           stringr::str_replace_all(collected_all_disease_terms,
                            vec_to_grep_pattern(c("childhood onset asthma",
                                                    "adult onset asthma",
                                                    "aspirin-induced asthma"
                                                  )
                            ),
                            "asthma"
           ))

Atrial fibrillation and flutter

gwas_study_info = gwas_study_info |>
  mutate(collected_all_disease_terms  = 
         stringr::str_replace_all(collected_all_disease_terms,
                          vec_to_grep_pattern("atrial flutter"),
                          "atrial fibrillation and flutter"
         ))

Atopic dermatitis

gwas_study_info = gwas_study_info |>
  mutate(collected_all_disease_terms  = 
         stringr::str_replace_all(collected_all_disease_terms,
                          vec_to_grep_pattern("recalcitrant atopic dermatitis"),
                          "atopic dermatitis"
         ))

Background retinopathy and retinal vascular changes

gwas_study_info = gwas_study_info |>
  mutate(collected_all_disease_terms  = 
         stringr::str_replace_all(collected_all_disease_terms,
                          vec_to_grep_pattern(c("macular telangiectasia type 2"
                                                )
                                              ),
                          "background retinopathy and retinal vascular changes"
         ))

Bacterial infection

gwas_study_info = gwas_study_info |>
  mutate(collected_all_disease_terms  = 
         stringr::str_replace_all(collected_all_disease_terms,
                          vec_to_grep_pattern(c(
                             "bacterial infection"
                             )
                             ),
                          "bacterial infection nos"
         ))

Benign mammary dysplasia

# if `DISEASE/TRAIT` contains Benign mammary dysplasia, then change abnormality of the breast to benign mammary dysplasia
gwas_study_info = 
  gwas_study_info |>
  mutate(collected_all_disease_terms  = 
         ifelse(grepl("Benign mammary dysplasia", 
                     `DISEASE/TRAIT`,
                     ignore.case = TRUE),
                stringr::str_replace_all(collected_all_disease_terms,
                          vec_to_grep_pattern("abnormality of the breast"),
                          "benign mammary dysplasia"
         ),
         collected_all_disease_terms
         )
  )

Bipolar disorder

gwas_study_info = gwas_study_info |> 
    mutate(collected_all_disease_terms  = 
         stringr::str_replace_all(collected_all_disease_terms,
                          vec_to_grep_pattern(
                            c("bipolar ii disorder",
                              "bipolar i disorder"
                              )
                            ),
                          "bipolar disorder"
         ))

Bronchopneumonia and lung abscess

gwas_study_info = gwas_study_info |>
  mutate(collected_all_disease_terms  = 
         stringr::str_replace_all(collected_all_disease_terms,
                          vec_to_grep_pattern(c(
                             "bronchopneumonia, lung abscess"
                             )
                             ),
                          "bronchopneumonia and lung abscess"
         ))

Blindness and low vision

gwas_study_info = gwas_study_info |>
  mutate(collected_all_disease_terms  = 
         stringr::str_replace_all(collected_all_disease_terms,
                          vec_to_grep_pattern(c(
                             "blindness",
                             "progressive visual loss",
                             "visual loss"
                             )
                             ),
                          "blindness and low vision"
         ))

# Disorders of optic nerve and visual pathways
gwas_study_info = gwas_study_info |>
  mutate(collected_all_disease_terms  = 
         stringr::str_replace_all(collected_all_disease_terms,
                          vec_to_grep_pattern(c(
                             "visual pathway disorder",
                             "optic nerve disorder"
                             )
                             ),
                          "disorders of optic nerve and visual pathways"
         ))

# visuospatial impairment
gwas_study_info = gwas_study_info |>
  mutate(collected_all_disease_terms  = 
         stringr::str_replace_all(collected_all_disease_terms,
                          vec_to_grep_pattern(c(
                             "visuospatial impairment"
                             )
                             ),
                          "other and unspecified symptoms and signs involving cognitive functions and awareness"
         ))

Cafe-au-lait spot

gwas_study_info = gwas_study_info |>
  mutate(collected_all_disease_terms  = 
         stringr::str_replace_all(collected_all_disease_terms,
                          vec_to_grep_pattern("cafe-au-lait spot"),
                          "café au lait spots"
         ))

Cancer of eye

gwas_study_info = gwas_study_info |>
  mutate(collected_all_disease_terms  = 
         stringr::str_replace_all(collected_all_disease_terms,
                          vec_to_grep_pattern(c(
                             "ocular melanoma")
                             ),
                          "cancer of eye"
         ))

Candidiasis of vulva and vagina

gwas_study_info = gwas_study_info |>
  mutate(collected_all_disease_terms  = 
         stringr::str_replace_all(collected_all_disease_terms,
                          vec_to_grep_pattern(c(
                             "vaginal yeast infection")
                             ),
                          "candidiasis of vulva and vagina"
         ))

Carbuncle and furuncle

gwas_study_info = gwas_study_info |>
  mutate(collected_all_disease_terms  = 
         stringr::str_replace_all(collected_all_disease_terms,
                          vec_to_grep_pattern(c(
                             "carbuncle", "furuncle")
                             ),
                          "carbuncle and furuncle"
         ))

Cardiac arrhythmia

arrhythmia_terms <-
c("ventricular arrhythmia",
  "torsades de pointes"
  )

gwas_study_info = gwas_study_info |>
  mutate(collected_all_disease_terms  = 
         stringr::str_replace_all(collected_all_disease_terms,
                          pattern = vec_to_grep_pattern(arrhythmia_terms),
                          "cardiac arrhythmia"
         ))

Cardiomyopathy

cardiomyopathy_terms <- c("nonischemic cardiomyopathy")

gwas_study_info =
gwas_study_info |> 
 mutate(collected_all_disease_terms  = 
          stringr::str_replace_all(collected_all_disease_terms,
                                  pattern = vec_to_grep_pattern(cardiomyopathy_terms),
                                   "cardiomyopathy"
                          )  
        )

Celiac disease

gwas_study_info = 
gwas_study_info |> 
 mutate(collected_all_disease_terms  = 
          stringr::str_replace_all(collected_all_disease_terms,
                                  pattern = vec_to_grep_pattern("refractory celiac disease"),
                                   "celiac disease"
                          )  
        )

Central and peripheral vertigo

gwas_study_info = gwas_study_info |>
  mutate(collected_all_disease_terms  = 
         stringr::str_replace_all(collected_all_disease_terms,
                          vec_to_grep_pattern("central nervous system origin vertigo"),
                          "central origin vertigo"
         ))


gwas_study_info = gwas_study_info |>
  mutate(collected_all_disease_terms  = 
         stringr::str_replace_all(collected_all_disease_terms,
                          vec_to_grep_pattern("peripheral vertigo"),
                          "peripheral or central vertigo"
         ))

Cerebral atherosclerosis

# if DISEASE/TRAIT contains Brain vascular atherosclerosis, then change vascular brain injury to cerebral atherosclerosis

gwas_study_info = gwas_study_info |>
  mutate(collected_all_disease_terms  = 
         ifelse(grepl("Brain vascular atherosclerosis", 
                     `DISEASE/TRAIT`,
                     ignore.case = TRUE),
                stringr::str_replace_all(collected_all_disease_terms,
                          vec_to_grep_pattern("vascular brain injury"),
                          "cerebral atherosclerosis"
         ),
         collected_all_disease_terms
         )
  )

Cerebrovascular disease

gwas_study_info = gwas_study_info |>
  mutate(collected_all_disease_terms  = 
        ifelse(grepl("Vascular brain injury", 
                     `DISEASE/TRAIT`,
                     ignore.case = TRUE),
         stringr::str_replace_all(collected_all_disease_terms,
                          vec_to_grep_pattern(c(
                             "vascular brain injury"
                             )
                             ),
                          "cerebrovascular disease"
         ),
         collected_all_disease_terms)
  )

Congenital anomalies of intestine

# anorectal malformation to congenital anomalies of intestine
gwas_study_info =
  gwas_study_info |>
  mutate(collected_all_disease_terms  = 
                stringr::str_replace_all(collected_all_disease_terms,
                          vec_to_grep_pattern("anorectal malformation"),
                          "congenital anomalies of intestine"
         )
  )

Cerebral infarction

gwas_study_info = gwas_study_info |>
  mutate(collected_all_disease_terms  = 
          ifelse(grepl("Ischemic stroke \\(cardioembolic\\)", 
                     `DISEASE/TRAIT`,
                     ignore.case = TRUE),
         stringr::str_replace_all(collected_all_disease_terms,
                          vec_to_grep_pattern(c(
                             "cardiac embolism"
                             )),
                          "cardioembolic stroke"
         ),
         collected_all_disease_terms
         )
  )

gwas_study_info = gwas_study_info |>
  mutate(collected_all_disease_terms  = 
         stringr::str_replace_all(collected_all_disease_terms,
                          vec_to_grep_pattern(c(
                             "mri defined brain infarct",
                             "cardioembolic stroke")
                             ),
                          "cerebral infarction"
         ))

Cervical cancer, dysplasia -> cervical cancer, cervical dysplasia

gwas_study_info = gwas_study_info |>
  mutate(collected_all_disease_terms  = 
         stringr::str_replace_all(collected_all_disease_terms,
                          vec_to_grep_pattern("cervical cancer, dysplasia"),
                          "cervical cancer, cervical dysplasia"
         ))

Chromosomal anomalies

gwas_study_info = gwas_study_info |>
  mutate(collected_all_disease_terms  = 
         stringr::str_replace_all(collected_all_disease_terms,
                          vec_to_grep_pattern(c(
                            "22q11.2 deletion syndrome",
                            "abnormality of chromosome segregation",
                            "fragile x syndrome")
                            ),
                          "chromosomal anomalies"
         ))

Cholangitis

gwas_study_info = gwas_study_info |>
  mutate(collected_all_disease_terms  = 
         stringr::str_replace_all(collected_all_disease_terms,
                          vec_to_grep_pattern("sclerosing cholangitis"),
                          "cholangitis"
         ))

Chronic kidney disease

gwas_study_info = gwas_study_info |>
  mutate(collected_all_disease_terms = 
         stringr::str_replace_all(collected_all_disease_terms,
                          vec_to_grep_pattern("stage 5 chronic kidney disease"),
                          "chronic kidney disease"
         ))

Chronic non-alcoholic pancreatitis

# if `DISEASE/TRAIT` contains Non-alcoholic chronic pancreatitis,
# then change non-alcoholic pancreatitis to chronic pancreatitis

gwas_study_info = gwas_study_info |>
  mutate(collected_all_disease_terms  = 
         ifelse(grepl("Non-alcoholic chronic pancreatitis", 
                     `DISEASE/TRAIT`,
                     ignore.case = TRUE),
                stringr::str_replace_all(collected_all_disease_terms,
                          vec_to_grep_pattern("non-alcoholic pancreatitis"),
                          "chronic pancreatitis"
         ),
         collected_all_disease_terms
         )
  )

Chronic pain

gwas_study_info = gwas_study_info |>
  mutate(collected_all_disease_terms  = 
         stringr::str_replace_all(collected_all_disease_terms,
                          vec_to_grep_pattern(c(
                            "chronic widespread pain",
                            "multisite chronic pain",
                            "chronic musculoskeletal pain",
                            "chronic pain syndrome")
                            ),
                          "chronic pain"
         ))
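
When several specific terms collapse into one group, a study that carried two of them ends up with the group name twice. A minimal sketch of a many-to-one grouping that also removes such duplicates, assuming terms are separated by ", ". The `groupings` list and the `group_terms` helper are hypothetical, and the sketch uses exact matching after splitting rather than `vec_to_grep_pattern`.

```r
# Group name -> member terms (two of the groupings used in this section)
groupings <- list(
  "chronic pain" = c("chronic widespread pain", "multisite chronic pain",
                     "chronic musculoskeletal pain", "chronic pain syndrome"),
  "abnormality of gait" = c("gait imbalance", "decreased walking ability",
                            "postural instability")
)

# Hypothetical helper: map each member term to its group, then deduplicate.
group_terms <- function(x, groupings, sep = ", ") {
  # lookup: names are member terms, values are group names
  lookup <- setNames(rep(names(groupings), lengths(groupings)),
                     unlist(groupings, use.names = FALSE))
  vapply(strsplit(x, sep, fixed = TRUE), function(terms) {
    if (anyNA(terms)) return(NA_character_)   # keep missing values missing
    hit <- terms %in% names(lookup)
    terms[hit] <- lookup[terms[hit]]
    paste(unique(terms), collapse = sep)
  }, character(1))
}

# Toy strings, made up for illustration
x <- c("chronic widespread pain, multisite chronic pain",
       "gait imbalance, asthma")
grouped <- group_terms(x, groupings)
grouped
# "chronic pain" "abnormality of gait, asthma"
```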

Chronic pharyngitis, nasopharyngitis

gwas_study_info = gwas_study_info |>
  mutate(collected_all_disease_terms  = 
         stringr::str_replace_all(collected_all_disease_terms,
                          vec_to_grep_pattern(c("chronic pharyngitis, nasopharyngitis")
                                              ),
                          "chronic pharyngitis, chronic nasopharyngitis"
         ))

Chronic rhinosinusitis

gwas_study_info = gwas_study_info |>
  mutate(collected_all_disease_terms  = 
         stringr::str_replace_all(collected_all_disease_terms,
                          vec_to_grep_pattern("chronic rhinosinusitis with nasal polyps"),
                          "chronic rhinosinusitis"
         ))

Chronic sinusitis

gwas_study_info = gwas_study_info |>
  mutate(collected_all_disease_terms  = 
         stringr::str_replace_all(collected_all_disease_terms,
                          vec_to_grep_pattern("chronic sinus infection"),
                          "chronic sinusitis"
         ))

Crohn’s Disease

gwas_study_info = gwas_study_info |>
  mutate(collected_all_disease_terms  = 
         stringr::str_replace_all(collected_all_disease_terms,
                          vec_to_grep_pattern("perianal crohns disease"),
                          "crohns disease"
         ))

Cluster headache syndrome

gwas_study_info = gwas_study_info |>
  mutate(collected_all_disease_terms  = 
         stringr::str_replace_all(collected_all_disease_terms,
                          vec_to_grep_pattern("cluster headache"),
                          "cluster headache syndrome"
         ))

Circumscribed brain atrophy

brain_atrophy <- c("frontotemporal dementia",
                   "grn-related frontotemporal lobar degeneration with tdp43 inclusions",
                   "pick disease")

gwas_study_info = gwas_study_info |>
  mutate(collected_all_disease_terms  = 
         stringr::str_replace_all(collected_all_disease_terms,
                          vec_to_grep_pattern(brain_atrophy),
                          "circumscribed brain atrophy"
         ))

Contact dermatitis

pattern <- vec_to_grep_pattern("contact dermatitis due to nickel")


gwas_study_info = gwas_study_info |>
  mutate(collected_all_disease_terms = 
         stringr::str_replace_all(collected_all_disease_terms,
                          pattern = pattern,
                          "contact dermatitis"
         ))

Congenital heart disease/s

congenital_heart_disease_terms <- c(
  "heart septal defect",
  "atrial heart septal defect",

  "congenital left-sided heart lesions",
  "congenital right-sided heart lesions",

  # equivalent to "Congenital malformation of great arteries, unspecified"
  "congenital anomaly of the great arteries",

  # malformations of the cardiac septum
  "abnormal cardiac septum morphology",
  "atrioventricular canal defect",

  "ventricular outflow obstruction"
)
  

url <- "http://www.ebi.ac.uk/ols4/api/ontologies/efo/terms/http%253A%252F%252Fwww.ebi.ac.uk%252Fefo%252FEFO_0005207/descendants"

congenital_heart_disease_terms <- c(congenital_heart_disease_terms,
                                  get_descendants(url)
                                  ) |>
  unique()
[1] "Number of terms collected:"
[1] 106
[1] "\n Some example terms"
[1] "double outlet right ventricle with subaortic or doubly committed ventricular septal defect with pulmonary stenosis"
[2] "gata6-related congenital heart disease with or without pancreatic agenesis or neonatal diabetes"                   
[3] "double outlet right ventricle with non-committed subpulmonary ventricular septal defect"                           
[4] "congenitally uncorrected transposition of the great arteries with cardiac malformation"                            
[5] "congenitally uncorrected transposition of the great arteries with coarctation"                                     
gwas_study_info = gwas_study_info |>
  mutate(collected_all_disease_terms  = 
         stringr::str_replace_all(collected_all_disease_terms,
                          pattern = vec_to_grep_pattern(congenital_heart_disease_terms),
                          "congenital heart disease"
         )
  )

# if DISEASE/TRAIT contains Congenital heart disease 
# then replace abnormal cardiovascular system morphology with congenital heart disease
gwas_study_info = 
  gwas_study_info |>
  mutate(collected_all_disease_terms  = 
         ifelse(grepl("Congenital heart disease", 
                     `DISEASE/TRAIT`,
                     ignore.case = TRUE),
                stringr::str_replace_all(collected_all_disease_terms,
                          vec_to_grep_pattern("abnormal cardiovascular system morphology"),
                          "congenital heart disease"
         ),
         collected_all_disease_terms
         )
  )
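
The conditional replacement above can be tried on a toy table. This is a minimal sketch, assuming a two-row tibble invented for illustration and a plain literal pattern in place of `vec_to_grep_pattern()`, whose definition is not shown in this section.

```r
library(dplyr)
library(stringr)

# Toy data (invented for illustration): same collected term, different DISEASE/TRAIT
toy <- tibble(
  `DISEASE/TRAIT` = c("Congenital heart disease", "Aortic root size"),
  collected_all_disease_terms = c("abnormal cardiovascular system morphology",
                                  "abnormal cardiovascular system morphology")
)

# Replace the generic term only in rows whose DISEASE/TRAIT mentions the condition
toy <- toy |>
  mutate(collected_all_disease_terms =
           ifelse(grepl("congenital heart disease", `DISEASE/TRAIT`, ignore.case = TRUE),
                  str_replace_all(collected_all_disease_terms,
                                  "abnormal cardiovascular system morphology",
                                  "congenital heart disease"),
                  collected_all_disease_terms))

toy$collected_all_disease_terms
# row 1 becomes "congenital heart disease"; row 2 is left unchanged
```

Only the first row is rewritten, because the second row's `DISEASE/TRAIT` does not mention congenital heart disease.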


url <- "http://www.ebi.ac.uk/ols4/api/ontologies/mondo/terms/http%253A%252F%252Fpurl.obolibrary.org%252Fobo%252FMONDO_0024239/descendants"

more_congenital_heart_disease_terms <- get_descendants(url)
[1] "Number of terms collected:"
[1] 230
[1] "\n Some example terms"
[1] "double outlet right ventricle with subaortic or doubly committed ventricular septal defect with pulmonary stenosis"
[2] "double outlet right ventricle with atrioventricular septal defect, pulmonary stenosis, heterotaxy"                 
[3] "gata6-related congenital heart disease with or without pancreatic agenesis or neonatal diabetes"                   
[4] "double outlet right ventricle with subaortic or doubly committed ventricular septal defect"                        
[5] "pulmonary valve agenesis-ventricular septal defect-persistent ductus arteriosus syndrome"                          
congenital_heart_disease_terms <- c(congenital_heart_disease_terms,
                                  more_congenital_heart_disease_terms
                                  ) |>
  unique() |>
  str_length_sort()


gwas_study_info = gwas_study_info |>
  mutate(collected_all_disease_terms  = 
         stringr::str_replace_all(collected_all_disease_terms,
                          pattern = vec_to_grep_pattern(congenital_heart_disease_terms),
                          "congenital anomaly of cardiovascular system"
         )
  )
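
The descendant URLs used in this section embed the term IRI percent-encoded twice (`:` becomes `%253A`, `/` becomes `%252F`). A minimal sketch of building such a URL from a plain IRI follows; `ols_descendants_url()` is a hypothetical helper introduced here for illustration and is not part of the analysis code.

```r
# Hypothetical helper: build an OLS4 "descendants" URL from an ontology id and a term IRI
ols_descendants_url <- function(iri, ontology) {
  # Encode reserved characters; repeated = TRUE lets the second pass re-encode the "%" signs
  encode <- function(x) utils::URLencode(x, reserved = TRUE, repeated = TRUE)
  paste0("http://www.ebi.ac.uk/ols4/api/ontologies/", ontology,
         "/terms/", encode(encode(iri)), "/descendants")
}

url <- ols_descendants_url("http://www.ebi.ac.uk/efo/EFO_0005207", "efo")
url
```

For this IRI the result is the same string as the hard-coded EFO_0005207 URL above.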

Cardiac congenital anomalies

cardiac_congenital_anomalies_terms <- c(
  "bicuspid aortic valve"
)

gwas_study_info = gwas_study_info |>
  mutate(collected_all_disease_terms  = 
         stringr::str_replace_all(collected_all_disease_terms,
                          pattern = vec_to_grep_pattern(cardiac_congenital_anomalies_terms),
                          "cardiac congenital anomalies"
         )
  )

Congenital hypertrophic pyloric stenosis

gwas_study_info = gwas_study_info |>
  mutate(collected_all_disease_terms  = 
         stringr::str_replace_all(collected_all_disease_terms,
                          vec_to_grep_pattern("infantile hypertrophic pyloric stenosis"),
                          "congenital hypertrophic pyloric stenosis"
         ))

Congenital hypothyroidism

gwas_study_info = gwas_study_info |>
  mutate(collected_all_disease_terms  = 
         stringr::str_replace_all(collected_all_disease_terms,
                          vec_to_grep_pattern("congenital hypothyroidism due to developmental anomaly"),
                          "congenital hypothyroidism"
         ))

Congestive heart failure (CHF) NOS

gwas_study_info = gwas_study_info |>
  mutate(collected_all_disease_terms  = 
         stringr::str_replace_all(collected_all_disease_terms,
                          vec_to_grep_pattern(c(
                            "diastolic heart failure",
                            "systolic heart failure")
                            ),
                          "congestive heart failure \\(chf\\) nos"
         ))

Coronary artery aneurysm and dissection

gwas_study_info = gwas_study_info |>
  mutate(collected_all_disease_terms  = 
         stringr::str_replace_all(collected_all_disease_terms,
                          vec_to_grep_pattern(c(
                            "coronary aneurysm",
                            "coronary artery dissection")
                            ),
                          "coronary artery aneurysm and dissection"
         ))

# Aortic aneurysm

gwas_study_info = gwas_study_info |>
  mutate(collected_all_disease_terms  = 
         stringr::str_replace_all(collected_all_disease_terms,
                          vec_to_grep_pattern(c(
                            "thoracic aortic aneurysm")
                            ),
                          "aortic aneurysm"
         ))

Creutzfeldt-Jakob disease

gwas_study_info = gwas_study_info |>
  mutate(collected_all_disease_terms  = 
         stringr::str_replace_all(collected_all_disease_terms,
                          vec_to_grep_pattern(
                          c("sporadic creutzfeld jacob disease",
                            "creutzfeldt jacob disease",
                            "creutzfeldt-jacob disease")
                          ),
                          "creutzfeldt-jakob disease"
         ))

Cryoglobulinemia

gwas_study_info = gwas_study_info |>
  mutate(collected_all_disease_terms  = 
         stringr::str_replace_all(collected_all_disease_terms,
                          vec_to_grep_pattern("cryoglobulinemia"),
                          "cryoglobulinaemia"
         ))

Cystic fibrosis with intestinal manifestations

gwas_study_info = gwas_study_info |>
  mutate(collected_all_disease_terms  = 
         stringr::str_replace_all(collected_all_disease_terms,
                          vec_to_grep_pattern("cystic fibrosis associated meconium ileus"),
                          "cystic fibrosis with intestinal manifestations"
         ))

Degeneration of macula and posterior pole

macular <- c("macular degeneration",
             "atrophic macular degeneration",
             "retinal drusen"
             )

gwas_study_info = gwas_study_info |>
  mutate(collected_all_disease_terms  = 
         stringr::str_replace_all(collected_all_disease_terms,
                          vec_to_grep_pattern(macular),
                          "degeneration of macula and posterior pole"
         ))

Dermatomyositis

gwas_study_info = gwas_study_info |>
  mutate(collected_all_disease_terms  = 
         stringr::str_replace_all(collected_all_disease_terms,
                          vec_to_grep_pattern("clinically amyopathic dermatomyositis"),
                          "dermatomyositis"
         ))

Dental caries

dental_caries_terms <- c("pit and fissure surface dental caries",
                         "smooth surface dental caries",
                         "primary dental caries",
                         "enamel caries",
                         "permanent dental caries"
                         )

gwas_study_info = gwas_study_info |>
  mutate(collected_all_disease_terms  = 
         stringr::str_replace_all(collected_all_disease_terms,
                          pattern = vec_to_grep_pattern(dental_caries_terms),
                          "dental caries"
         )
)

Depressive episode

depress_epi <- c("depressive",
                 "depression",
                 "depressive disorder"
                 )

gwas_study_info = gwas_study_info |>
  mutate(collected_all_disease_terms  = 
         stringr::str_replace_all(collected_all_disease_terms,
                          vec_to_grep_pattern(depress_epi),
                          "depressive episode"
         ))

Dilated cardiomyopathy

gwas_study_info = gwas_study_info |>
  mutate(collected_all_disease_terms  = 
         stringr::str_replace_all(collected_all_disease_terms,
                          vec_to_grep_pattern("idiopathic dilated cardiomyopathy"),
                          "dilated cardiomyopathy"
         ))

Disturbances of sensation of smell and taste

terms <- c("ageusia",
           "abnormality of the sense of smell")

gwas_study_info = gwas_study_info |>
  mutate(collected_all_disease_terms  = 
         stringr::str_replace_all(collected_all_disease_terms,
                          pattern = vec_to_grep_pattern(terms),
                          "disturbances of sensation of smell and taste"
         )
  )

Disorders of tooth development

tooth_dev_terms <- c("dental enamel hypoplasia",
                     "tooth agenesis",
                     "molar-incisor hypomineralization"
                     )

gwas_study_info = gwas_study_info |>
  mutate(collected_all_disease_terms  = 
         stringr::str_replace_all(collected_all_disease_terms,
                          pattern = vec_to_grep_pattern(tooth_dev_terms),
                          "disorders of tooth development"
         )
  )

Disorders of purine and pyrimidine metabolism

gwas_study_info = gwas_study_info |>
  mutate(collected_all_disease_terms  = 
         stringr::str_replace_all(collected_all_disease_terms,
                          vec_to_grep_pattern("hyperuricemia"),
                          "disorders of purine and pyrimidine metabolism"
         ))

Disorders of refraction and accommodation

refractive_terms <- c("hyperopia",
                      "refractive error"
                      )

gwas_study_info = gwas_study_info |>
  mutate(collected_all_disease_terms  = 
         stringr::str_replace_all(collected_all_disease_terms,
                          pattern = vec_to_grep_pattern(refractive_terms),
                          "disorders of refraction and accommodation"
         )
  )

Diseases of white blood cells

wbc_terms <- c("leukopenia")

gwas_study_info = gwas_study_info |>
  mutate(collected_all_disease_terms  = 
         stringr::str_replace_all(collected_all_disease_terms,
                          pattern = vec_to_grep_pattern(wbc_terms),
                          "diseases of white blood cells"
         )
  )

Disorders of amino-acid metabolism

gwas_study_info = gwas_study_info |>
  mutate(collected_all_disease_terms  = 
         stringr::str_replace_all(collected_all_disease_terms,
                          vec_to_grep_pattern("amino acid metabolism disease"),
                          "disorders of amino-acid metabolism"
         ))

Drug allergy

gwas_study_info = gwas_study_info |>
  mutate(collected_all_disease_terms  = 
         stringr::str_replace_all(collected_all_disease_terms,
                          vec_to_grep_pattern("beta-lactam allergy"),
                          "drug allergy"
         ))

Eczema

# if DISEASE/TRAIT contains eczema, change eczematoid dermatitis to eczema
gwas_study_info = gwas_study_info |>
  mutate(collected_all_disease_terms  = 
         ifelse(grepl("eczema", 
                     `DISEASE/TRAIT`,
                     ignore.case = TRUE),
                stringr::str_replace_all(collected_all_disease_terms,
                          vec_to_grep_pattern("eczematoid dermatitis"),
                          "eczema"
         ),
         collected_all_disease_terms
         )
  )


gwas_study_info = gwas_study_info |>
  mutate(collected_all_disease_terms  = 
                stringr::str_replace_all(collected_all_disease_terms,
                          vec_to_grep_pattern("hand eczema"),
                          "eczema"
         )
  )

Essential hypertension

gwas_study_info = gwas_study_info |>
  mutate(collected_all_disease_terms  = 
         stringr::str_replace_all(collected_all_disease_terms,
                          vec_to_grep_pattern(
                            c(
                              "early onset hypertension",
                              "treatment-resistant hypertension")),
                          "essential hypertension"
         ))

Exstrophy of urinary bladder

gwas_study_info = gwas_study_info |>
  mutate(collected_all_disease_terms  = 
         stringr::str_replace_all(collected_all_disease_terms,
                          vec_to_grep_pattern("bladder exstrophy"),
                          "exstrophy of urinary bladder"
         ))

Febrile seizures

gwas_study_info = gwas_study_info |>
  mutate(collected_all_disease_terms  = 
         stringr::str_replace_all(collected_all_disease_terms,
                          vec_to_grep_pattern(c(
                                                "mmr-related febrile seizures",
                                                "febrile seizure")
                                              ),
                          "febrile convulsions"
         ))

Food allergy

gwas_study_info = gwas_study_info |>
  mutate(collected_all_disease_terms  = 
         stringr::str_replace_all(collected_all_disease_terms,
                                  vec_to_grep_pattern(
                                    c("peanut allergy",
                                      "milk allergy",
                                      "egg allergy",
                                      "wheat allergic reaction"
                                     )),
                          "food allergy"
         ))
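
Most blocks in this section share one shape: a vector of synonyms is collapsed into a single canonical term. One possible way to express that as data is sketched below. Both `toy_pattern()` (a simplified stand-in for `vec_to_grep_pattern()`, whose real definition is not shown here) and `collapse_terms()` are hypothetical names introduced for illustration.

```r
library(stringr)

# Hypothetical stand-in for vec_to_grep_pattern(): word-bounded alternation of the terms
toy_pattern <- function(terms) {
  paste0("\\b(?:", paste(terms, collapse = "|"), ")\\b")
}

# Canonical term -> synonyms to collapse into it
mapping <- list(
  "food allergy" = c("peanut allergy", "milk allergy", "egg allergy"),
  "drug allergy" = c("beta-lactam allergy")
)

# Apply every mapping entry in turn to a character vector of collected terms
collapse_terms <- function(x, mapping) {
  for (canonical in names(mapping)) {
    x <- str_replace_all(x, toy_pattern(mapping[[canonical]]), canonical)
  }
  x
}

result <- collapse_terms(c("peanut allergy, asthma", "beta-lactam allergy"), mapping)
result
# "food allergy, asthma" "drug allergy"
```

The explicit per-disease blocks in the analysis are easier to annotate; a table like this mainly helps when the mappings need to be reviewed or exported in one place.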

Fuchs endothelial corneal dystrophy

gwas_study_info = gwas_study_info |>
  mutate(collected_all_disease_terms  = 
         stringr::str_replace_all(collected_all_disease_terms,
                          vec_to_grep_pattern("fuchs endothelial dystrophy"),
                          "fuchs endothelial corneal dystrophy"
         ))

Gingival and periodontal diseases

gingival_and_periodontal_terms <- c("periodontal pocket",
                                    "periodontal disorder",
                                    "gingival disease",
                                    "gingival bleeding"
                                )

gwas_study_info = gwas_study_info |>
  mutate(collected_all_disease_terms  = 
         stringr::str_replace_all(collected_all_disease_terms,
                          pattern = vec_to_grep_pattern(gingival_and_periodontal_terms),
                          "gingival and periodontal diseases"
         )
  )

Glaucoma

url <- "http://www.ebi.ac.uk/ols4/api/ontologies/efo/terms/http%253A%252F%252Fpurl.obolibrary.org%252Fobo%252FMONDO_0005041/descendants"

glaucoma_terms <- get_descendants(url)
[1] "Number of terms collected:"
[1] 19
[1] "\n Some example terms"
[1] "cyp1b1-related glaucoma with or without anterior segment dysgenesis"
[2] "glaucoma secondary to spherophakia/ectopia lentis and megalocornea" 
[3] "hereditary glaucoma, primary closed-angle"                          
[4] "primary angle closure glaucoma"                                     
[5] "secondary dysgenetic glaucoma"                                      
gwas_study_info = gwas_study_info |>
  mutate(collected_all_disease_terms  = 
         stringr::str_replace_all(collected_all_disease_terms,
                          pattern = vec_to_grep_pattern(glaucoma_terms),
                          "glaucoma"
         ))
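
`get_descendants()` is defined elsewhere in the analysis, so its internals are not visible here. The sketch below only illustrates how term labels might be pulled out of an OLS-style JSON page, assuming the usual `_embedded` / `terms` / `label` layout. The JSON string is a hand-written mock, not a real API response, and lower-casing plus longest-first ordering are assumptions made to mirror the printed examples above.

```r
library(jsonlite)

# Hand-written mock of one page of an OLS-style response (assumed layout)
mock_response <- '{
  "_embedded": {"terms": [
    {"label": "Secondary dysgenetic glaucoma"},
    {"label": "Primary angle closure glaucoma"}
  ]},
  "page": {"size": 20, "totalElements": 2, "totalPages": 1, "number": 0}
}'

parsed <- fromJSON(mock_response)

# fromJSON() simplifies the array of term objects to a data frame with a `label` column
labels <- tolower(parsed[["_embedded"]][["terms"]][["label"]])

# Longest labels first (assumption, matching the order of the printed examples)
labels <- labels[order(nchar(labels), decreasing = TRUE)]
labels
```

A real implementation would also need to follow the `page` block to collect every page of results.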

Graft vs host disease

graft_vs_host_terms <- c("chronic graft versus host disease",
                         "chronic graft vs. host disease",
                         "acute graft versus host disease",
                         "acute graft vs. host disease"
                         )

gwas_study_info = gwas_study_info |>
  mutate(collected_all_disease_terms  = 
         stringr::str_replace_all(collected_all_disease_terms,
                          vec_to_grep_pattern(graft_vs_host_terms),
                          "graft versus host disease"
         ))

Hearing loss

gwas_study_info  = gwas_study_info |>
  mutate(collected_all_disease_terms  = 
         stringr::str_replace_all(collected_all_disease_terms,
                          vec_to_grep_pattern("age-related hearing impairment"),
                          "presbycusis"
         ))

gwas_study_info  = gwas_study_info |>
  mutate(collected_all_disease_terms  = 
         stringr::str_replace_all(collected_all_disease_terms,
                          vec_to_grep_pattern(c(
                            "deafness",
                            "noise-induced hearing loss")
                            ),
                          "hearing loss"
         ))

Hemiplegia

gwas_study_info = gwas_study_info |>
  mutate(collected_all_disease_terms  = 
         stringr::str_replace_all(collected_all_disease_terms,
                          vec_to_grep_pattern(c("hemiparesis")),
                          "hemiplegia"
         ))

Hepatic steatosis

# if DISEASE/TRAIT contains hepatic steatosis in non-alcoholic fatty liver disease
# replace hepatic steatosis with nonalcoholic steatohepatitis 

gwas_study_info =
  gwas_study_info |>
  mutate(collected_all_disease_terms  = 
         ifelse(grepl("hepatic steatosis in non-alcoholic fatty liver disease", 
                     `DISEASE/TRAIT`,
                     ignore.case = TRUE),
                stringr::str_replace_all(collected_all_disease_terms,
                          vec_to_grep_pattern("hepatic steatosis"),
                          "nonalcoholic steatohepatitis"
         ),
         collected_all_disease_terms
         )
  )


gwas_study_info =
  gwas_study_info |>
  mutate(collected_all_disease_terms  = 
                stringr::str_replace_all(collected_all_disease_terms,
                          vec_to_grep_pattern("non-alcoholic steatohepatitis"),
                          "nonalcoholic steatohepatitis"
  )
  )

Hereditary hemochromatosis

gwas_study_info = gwas_study_info |>
  mutate(collected_all_disease_terms  = 
         stringr::str_replace_all(collected_all_disease_terms,
                          vec_to_grep_pattern("hereditary hemochromatosis type 1"),
                          "hereditary hemochromatosis"
         ))

Hordeolum and other deep inflammation of eyelid

gwas_study_info = gwas_study_info |>
  mutate(collected_all_disease_terms  = 
         stringr::str_replace_all(collected_all_disease_terms,
                          vec_to_grep_pattern("hordeolum"),
                          "hordeolum and other deep inflammation of eyelid"
         ))

Hypercholesterolemia

gwas_study_info = gwas_study_info |>
  mutate(collected_all_disease_terms  = 
         stringr::str_replace_all(collected_all_disease_terms,
                          vec_to_grep_pattern(c(
                            "familial hypercholesterolemia"
                            )),
                          "hypercholesterolemia"
         ))

Hyperlipidemia

gwas_study_info = gwas_study_info |>
  mutate(collected_all_disease_terms  = 
         stringr::str_replace_all(collected_all_disease_terms,
                          vec_to_grep_pattern(c(
                            "familial hyperlipidemia"
                            )),
                          "hyperlipidemia"
         ))

HIV disease resulting in encephalopathy

gwas_study_info = 
  gwas_study_info |>
  mutate(collected_all_disease_terms  = 
         stringr::str_replace_all(collected_all_disease_terms,
                          vec_to_grep_pattern(c(
                                               "aids dementia",
                                               "hiv-associated neurocognitive disorder")
                                              ),
                          "hiv disease resulting in encephalopathy"
         ))

Intracranial artery stenosis

# where DISEASE/TRAIT contains "Intracranial artery stenosis",
# replace "arterial stenosis" with "intracranial artery stenosis"

gwas_study_info = gwas_study_info |>
  mutate(collected_all_disease_terms  = 
         ifelse(grepl("Intracranial artery stenosis", 
                     `DISEASE/TRAIT`,
                     ignore.case = TRUE),
                stringr::str_replace_all(collected_all_disease_terms,
                          pattern = "arterial stenosis",
                          replacement = "intracranial artery stenosis"
         ),
         collected_all_disease_terms
         )
  )

Inherited retinal dystrophy

url <-  "http://www.ebi.ac.uk/ols4/api/ontologies/mondo/terms/http%253A%252F%252Fpurl.obolibrary.org%252Fobo%252FMONDO_0019118/descendants"

inherited_retinal_dystrophy_terms <- get_descendants(url)
[1] "Number of terms collected:"
[1] 349
[1] "\n Some example terms"
[1] "spondyloepiphyseal dysplasia, sensorineural hearing loss, impaired intellectual development, and leber congenital amaurosis"
[2] "x-linked intellectual disability-limb spasticity-retinal dystrophy-diabetes insipidus syndrome"                             
[3] "microcephaly with or without chorioretinopathy, lymphedema, or intellectual disability"                                     
[4] "retinal vasculopathy with cerebral leukoencephalopathy and systemic manifestations"                                         
[5] "retinal dystrophy with inner retinal dysfunction and ganglion cell anomalies"                                               
gwas_study_info = gwas_study_info |>
  mutate(collected_all_disease_terms  = 
         stringr::str_replace_all(collected_all_disease_terms,
                          pattern = vec_to_grep_pattern(inherited_retinal_dystrophy_terms),
                          "hereditary retinal dystrophy"
         ))

Intentional self-harm by unspecified means

gwas_study_info = gwas_study_info |>
  mutate(collected_all_disease_terms  = 
         stringr::str_replace_all(collected_all_disease_terms,
                          vec_to_grep_pattern(c("self-injurious behavior",
                                                "self-injurious ideation")),
                          "intentional self-harm by unspecified means"
         ))

Infertility (male)

gwas_study_info = gwas_study_info |>
  mutate(collected_all_disease_terms  = 
                  stringr::str_replace_all(collected_all_disease_terms,
                          vec_to_grep_pattern(
                            c("azoospermia",
                              "sertoli cell-only syndrome")
                            ),
                          "male infertility"
                          )  )

Induratio penis plastica

gwas_study_info = gwas_study_info |>
  mutate(collected_all_disease_terms  = 
                  stringr::str_replace_all(collected_all_disease_terms,
                          vec_to_grep_pattern(
                            c("peyronie disease")
                            ),
                          "induratio penis plastica"
                          )  )

Intracerebral hemorrhage

gwas_study_info = gwas_study_info |>
  mutate(collected_all_disease_terms  = 
                  stringr::str_replace_all(collected_all_disease_terms,
                          vec_to_grep_pattern(
                            c("non-lobar intracerebral hemorrhage",
                              "lobar intracerebral hemorrhage")
                            ),
                          "intracerebral hemorrhage"
                          )  )

Idiopathic generalized epilepsy

url <- "http://www.ebi.ac.uk/ols4/api/ontologies/snomed/terms/http%253A%252F%252Fsnomed.info%252Fid%252F36803009/descendants"

idiopathic_generalized_epilepsy_terms <- get_descendants(url)
[1] "Number of terms collected:"
[1] 4
[1] "\n Some example terms"
[1] "epilepsy with generalized tonic-clonic seizures alone (disorder)"
[2] "juvenile myoclonic epilepsy"                                     
[3] "childhood absence epilepsy"                                      
[4] "juvenile absence epilepsy"                                       
[5] NA                                                                
idiopathic_generalized_epilepsy_terms = stringr::str_remove_all(
                                        idiopathic_generalized_epilepsy_terms, 
                                        " \\(disorder\\)$"
                                        )

idiopathic_generalized_epilepsy_terms = c("epilepsy with generalized tonic-clonic seizures",
                                          idiopathic_generalized_epilepsy_terms
                                         )


gwas_study_info = gwas_study_info |>
  mutate(collected_all_disease_terms  = 
         stringr::str_replace_all(collected_all_disease_terms,
                          pattern = vec_to_grep_pattern(idiopathic_generalized_epilepsy_terms),
                          "generalized idiopathic epilepsy and epileptic syndromes"
         ))
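
Term order can matter when one term is a prefix of another, as with "epilepsy with generalized tonic-clonic seizures" and its "... alone" variant. Regex alternation tries alternatives left to right, so an unanchored pattern that lists the shorter term first leaves the tail of the longer one behind. The sketch below shows the effect with a plain `|` alternation. Whether `vec_to_grep_pattern()` anchors its terms is not shown in this section, and `sort_longest_first()` is a hypothetical helper, not the analysis's `str_length_sort()`.

```r
library(stringr)

# Hypothetical helper: order terms from longest to shortest
sort_longest_first <- function(x) x[order(nchar(x), decreasing = TRUE)]

terms <- c("epilepsy with generalized tonic-clonic seizures",
           "epilepsy with generalized tonic-clonic seizures alone")
x <- "epilepsy with generalized tonic-clonic seizures alone"
canonical <- "generalized idiopathic epilepsy and epileptic syndromes"

# Shorter alternative first: only the prefix is replaced, " alone" is left over
unsorted <- str_replace_all(x, paste(terms, collapse = "|"), canonical)

# Longest alternative first: the whole term is replaced
sorted <- str_replace_all(x, paste(sort_longest_first(terms), collapse = "|"), canonical)

unsorted
# "generalized idiopathic epilepsy and epileptic syndromes alone"
sorted
# "generalized idiopathic epilepsy and epileptic syndromes"
```

If the pattern builder already anchors each term to term boundaries, the order is harmless; otherwise sorting after any manual additions avoids the leftover suffix.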

Idiopathic thrombocytopenic purpura

gwas_study_info = gwas_study_info |>
  mutate(collected_all_disease_terms  = 
         stringr::str_replace_all(collected_all_disease_terms,
                          vec_to_grep_pattern("autoimmune thrombocytopenic purpura"),
                          "idiopathic thrombocytopenic purpura"
         ))

Juvenile idiopathic arthritis

gwas_study_info = gwas_study_info |>
  mutate(collected_all_disease_terms  = 
         stringr::str_replace_all(collected_all_disease_terms,
                          vec_to_grep_pattern("rheumatoid factor-negative juvenile idiopathic arthritis"),
                          "juvenile idiopathic arthritis"
         ))

Keratoconjunctivitis

gwas_study_info = gwas_study_info |>
  mutate(collected_all_disease_terms  = 
         stringr::str_replace_all(collected_all_disease_terms,
                          vec_to_grep_pattern("keratoconjunctivitis sicca"),
                          "keratoconjunctivitis"
         ))

Learning disorder

gwas_study_info = gwas_study_info |>
  mutate(collected_all_disease_terms  = 
         stringr::str_replace_all(collected_all_disease_terms,
                          vec_to_grep_pattern(c("dyslexia",
                                                "mathematics disorder",
                                                "disorder of written expression"
                                                )
                                              ),
                          "learning disorder"
         ))

Lewy body dementia

gwas_study_info = gwas_study_info |>
  mutate(collected_all_disease_terms  = 
         stringr::str_replace_all(collected_all_disease_terms,
                          vec_to_grep_pattern("lewy body attribute"),
                          "lewy body dementia"
         ))

Migraine

gwas_study_info = gwas_study_info |>
  mutate(collected_all_disease_terms  = 
         stringr::str_replace_all(collected_all_disease_terms,
                          vec_to_grep_pattern(
                            c("migraine with aura",
                              "migraine without aura",
                              "migraine disorder"
                              )
                            ),
                          "migraine"
         ))

Multiple sclerosis

gwas_study_info = gwas_study_info |>
  mutate(collected_all_disease_terms  = 
         stringr::str_replace_all(collected_all_disease_terms,
                          vec_to_grep_pattern("relapsing-remitting multiple sclerosis"),
                          "multiple sclerosis"
         ))

Myasthenia gravis

gwas_study_info = gwas_study_info |>
  mutate(collected_all_disease_terms  = 
         stringr::str_replace_all(collected_all_disease_terms,
                          vec_to_grep_pattern("late-onset myasthenia gravis"),
                          "myasthenia gravis"
         ))

Narcolepsy-cataplexy syndrome -> Narcolepsy and cataplexy

gwas_study_info = gwas_study_info |>
  mutate(collected_all_disease_terms  = 
         stringr::str_replace_all(collected_all_disease_terms,
                          vec_to_grep_pattern("narcolepsy-cataplexy syndrome"),
                          "narcolepsy and cataplexy"
         ))

Nephrolithiasis

nephro_terms <- c("uric acid nephrolithiasis",
                  "calcium phosphate nephrolithiasis",
                  "calcium oxalate nephrolithiasis",
                  "struvite nephrolithiasis")


gwas_study_info =
gwas_study_info |> 
 mutate(collected_all_disease_terms  = 
          stringr::str_replace_all(collected_all_disease_terms,
                                  pattern = vec_to_grep_pattern(nephro_terms),
                                   "nephrolithiasis"
                          )  
        )

Nephritis and nephropathy with pathological lesion

gwas_study_info = gwas_study_info |>
  mutate(collected_all_disease_terms  = 
         stringr::str_replace_all(collected_all_disease_terms,
                          vec_to_grep_pattern(c("acute tubulointerstitial nephritis",
                                              "iga glomerulonephritis",
                                              "membranous glomerulonephritis",
                                              "lupus nephritis")
                                              ),
                          "nephritis and nephropathy with pathological lesion"
         ))

Neurofibromatosis

gwas_study_info = gwas_study_info |>
  mutate(collected_all_disease_terms  = 
         stringr::str_replace_all(collected_all_disease_terms,
                          vec_to_grep_pattern(
                            c("neurofibromatosis type 1",
                              "neurofibromatosis type 2")
                          ),
                          "neurofibromatosis"
         ))

Neuromyelitis optica

gwas_study_info = gwas_study_info |>
  mutate(collected_all_disease_terms  = 
         stringr::str_replace_all(collected_all_disease_terms,
                          vec_to_grep_pattern(
                            c("aquaporin-4 antibody positive neuromyelitis optica",
                              "aquaporin-4 antibody negative neuromyelitis optica",
                              "aqp4-igg-positive neuromyelitis optica",
                              "aqp4-igg-negative neuromyelitis optica"
                            )
                          ),
                          "neuromyelitis optica"
         ))

Noninflammatory disorders of vagina

non_inflam_terms <- c("abnormal vaginal discharge itching",
                      "abnormal vaginal discharge smell",
                      "vaginal discharge")

gwas_study_info = 
  gwas_study_info |>
   mutate(collected_all_disease_terms  = 
          stringr::str_replace_all(collected_all_disease_terms,
                                  pattern = 
                                    vec_to_grep_pattern(
                                      non_inflam_terms
                                      ),
                                   "noninflammatory disorders of vagina"
                          )  
        )

Obesity

gwas_study_info = 
gwas_study_info |> 
 mutate(collected_all_disease_terms  = 
          stringr::str_replace_all(collected_all_disease_terms,
                                  pattern = 
                                    vec_to_grep_pattern(
                                      c("morbid obesity",
                                         "metabolically healthy obesity"
                                        )
                                      ),
                                   "obesity"
                          )  
        )
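
The recodes in this section all follow the same shape: a set of source terms collapses to one target label. As an alternative way to express that, the sketch below keeps the mapping in a single named list and applies it in one pass. The names `recode_map`, `lookup` and `recode_terms()` are hypothetical, the toy data are invented, and the helper assumes entries are separated by ", " and contain no commas themselves.

```r
# Illustrative mapping: names are target labels, values are source terms.
recode_map <- list(
  "neurofibromatosis" = c("neurofibromatosis type 1",
                          "neurofibromatosis type 2"),
  "obesity"           = c("morbid obesity",
                          "metabolically healthy obesity")
)

# Flatten to a named lookup: source term -> target label.
lookup <- setNames(rep(names(recode_map), lengths(recode_map)),
                   unlist(recode_map, use.names = FALSE))

# Hypothetical helper: recode whole entries of a ", "-separated string.
recode_terms <- function(x, lookup) {
  vapply(strsplit(x, ", ", fixed = TRUE), function(parts) {
    hit <- parts %in% names(lookup)
    parts[hit] <- lookup[parts[hit]]
    paste(parts, collapse = ", ")
  }, character(1), USE.NAMES = FALSE)
}

# Invented toy records.
toy_terms <- c("neurofibromatosis type 1, morbid obesity", "asthma")
recoded_terms <- recode_terms(toy_terms, lookup)
print(recoded_terms)
```

A table like this is easier to audit than many separate chunks, but it does not cover the conditional recodes that depend on `DISEASE/TRAIT` or `PUBMED_ID`, and it leaves missing values unhandled.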

Obsessive-compulsive disorder

gwas_study_info = gwas_study_info |>
  mutate(collected_all_disease_terms  = 
         stringr::str_replace_all(collected_all_disease_terms,
                          vec_to_grep_pattern(c("obsessive-compulsive trait",
                                                "obsessive-compulsive")
                                              ),
                          "obsessive-compulsive disorder"
         ))

Occlusion of cerebral arteries

# if DISEASE/TRAIT contains Brain vascular stenosis, then change vascular brain injury to occlusion of cerebral arteries

gwas_study_info = gwas_study_info |>
  mutate(collected_all_disease_terms  = 
         ifelse(grepl("Brain vascular stenosis", 
                     `DISEASE/TRAIT`,
                     ignore.case = TRUE),
                stringr::str_replace_all(collected_all_disease_terms,
                          vec_to_grep_pattern("vascular brain injury"),
                          "occlusion of cerebral arteries"
         ),
         collected_all_disease_terms
         )
  )
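
Several recodes in this section apply only to rows whose free-text trait matches a condition. A minimal sketch of that behaviour on invented toy data is shown below, using base R logical indexing instead of `ifelse()`. The data frame `studies` and its column names are assumptions for illustration only.

```r
# Invented toy data; `trait` plays the role of DISEASE/TRAIT and `terms`
# the role of collected_all_disease_terms.
studies <- data.frame(
  trait = c("Brain vascular stenosis", "White matter lesions"),
  terms = c("vascular brain injury", "vascular brain injury"),
  stringsAsFactors = FALSE
)

# Rows whose free-text trait matches the condition (case-insensitive).
hit <- grepl("brain vascular stenosis", studies$trait, ignore.case = TRUE)

# Recode only those rows; all other rows keep their original terms.
studies$terms[hit] <- gsub("vascular brain injury",
                           "occlusion of cerebral arteries",
                           studies$terms[hit], fixed = TRUE)
print(studies$terms)
```

Only the first toy row is recoded; the second keeps "vascular brain injury" because its trait text does not match.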

Opioid dependence

gwas_study_info = gwas_study_info |>
  mutate(collected_all_disease_terms  = 
         stringr::str_replace_all(collected_all_disease_terms,
                          vec_to_grep_pattern(c("heroin dependence",
                                              "opioid use disorder")),
                          "opioid dependence"
         ))

Osteonecrosis

gwas_study_info = gwas_study_info |>
  mutate(collected_all_disease_terms  = 
         stringr::str_replace_all(collected_all_disease_terms,
                          vec_to_grep_pattern("idiopathic osteonecrosis of the femoral head"),
                          "osteonecrosis"
         ))

gwas_study_info = gwas_study_info |>
  mutate(collected_all_disease_terms  = 
         stringr::str_replace_all(collected_all_disease_terms,
                          vec_to_grep_pattern("osteoradionecrosis"),
                          "osteonecrosis"
         ))

Other epilepsy

gwas_study_info =
  gwas_study_info |>
  mutate(collected_all_disease_terms  = 
         stringr::str_replace_all(collected_all_disease_terms,
                          vec_to_grep_pattern(
                            c("mesial temporal lobe epilepsy with hippocampal sclerosis",
                              "rolandic epilepsy")
                            ),
                          "epilepsy"
         )) 

Other and unspecified cirrhosis of liver

gwas_study_info = gwas_study_info |>
  mutate(collected_all_disease_terms  = 
         stringr::str_replace_all(collected_all_disease_terms,
                          vec_to_grep_pattern(c("hepatitis c induced liver cirrhosis",
                                                "biliary liver cirrhosis")
                                              ),
                          "other and unspecified cirrhosis of liver"
         ))

Other cardiac conduction disorders

gwas_study_info = gwas_study_info |>
  mutate(collected_all_disease_terms  = 
         stringr::str_replace_all(collected_all_disease_terms,
                          vec_to_grep_pattern(c("familial long qt syndrome")
                                              ),
                          "other cardiac conduction disorders"
         ))

Other cerebral degenerations

gwas_study_info = gwas_study_info |>
  mutate(collected_all_disease_terms  = 
         stringr::str_replace_all(collected_all_disease_terms,
                          vec_to_grep_pattern(c("brain atrophy",
                                                "hippocampal atrophy"
                                                )
                                              ),
                          "other cerebral degenerations"
         ))

Other chronic nonalcoholic liver disease

gwas_study_info = gwas_study_info |>
  mutate(collected_all_disease_terms  = 
         stringr::str_replace_all(collected_all_disease_terms,
                          vec_to_grep_pattern(c("non-alcoholic fatty liver disease"
                                                )),
                          "other chronic nonalcoholic liver disease"
         ))

Other disorders of bone and cartilage

other_bone_cartilage_terms <- c("tietze syndrome"
                               )

gwas_study_info = gwas_study_info |>
  mutate(collected_all_disease_terms  = 
         stringr::str_replace_all(collected_all_disease_terms,
                          pattern = vec_to_grep_pattern(other_bone_cartilage_terms),
                          "other disorders of bone and cartilage"
         )
  )

Other disorders of eyelids

gwas_study_info = gwas_study_info |>
  mutate(collected_all_disease_terms  = 
         stringr::str_replace_all(collected_all_disease_terms,
                          vec_to_grep_pattern(c("dermatochalasis",
                                                "filarial elephantiasis"
                                                )
                                              ),
                          "other disorders of eyelids"
         ))

Other disorders of iris and ciliary body

gwas_study_info = gwas_study_info |>
  mutate(collected_all_disease_terms  = 
         stringr::str_replace_all(collected_all_disease_terms,
                          vec_to_grep_pattern(c("iris disorder"
                                                )
                                              ),
                          "other disorders of iris and ciliary body"
         ))

Other eating disorders

gwas_study_info = gwas_study_info |>
  mutate(collected_all_disease_terms  = 
         stringr::str_replace_all(collected_all_disease_terms,
                          vec_to_grep_pattern(c("binge eating")
                                              ),
                          "other eating disorders"
         ))

Other haemoglobinopathies

gwas_study_info = gwas_study_info |>
  mutate(collected_all_disease_terms  = 
         stringr::str_replace_all(collected_all_disease_terms,
                          vec_to_grep_pattern("hemoglobin e disease"),
                          "other haemoglobinopathies"
         ))

Other paralytic syndromes

gwas_study_info = gwas_study_info |>
  mutate(collected_all_disease_terms  = 
         stringr::str_replace_all(collected_all_disease_terms,
                          vec_to_grep_pattern(
                            c("paraplegia",
                              "quadriplegia")
                            ),
                          "other paralytic syndromes"
         ))

Other specified degenerative diseases of nervous system

gwas_study_info = gwas_study_info |>
  mutate(collected_all_disease_terms  = 
         stringr::str_replace_all(collected_all_disease_terms,
                          vec_to_grep_pattern(c("corticobasal degeneration disorder"
                                                )
                                              ),
                          "other specified degenerative diseases of nervous system"
         ))

# progressive supranuclear palsy -> Dementia with cerebral degenerations

gwas_study_info = gwas_study_info |>
  mutate(collected_all_disease_terms  = 
         stringr::str_replace_all(collected_all_disease_terms,
                          vec_to_grep_pattern("progressive supranuclear palsy"),
                          "dementia with cerebral degenerations"
         ))

Other specified inflammatory liver diseases

gwas_study_info = gwas_study_info |>
  mutate(collected_all_disease_terms  = 
         stringr::str_replace_all(collected_all_disease_terms,
                          vec_to_grep_pattern(c("primary biliary cholangitis",
                                                "primary sclerosing cholangitis"
                                                )
                                              ),
                          "other specified inflammatory liver diseases"
         ))

Other specified retinal disorders

gwas_study_info = gwas_study_info |>
  mutate(collected_all_disease_terms  = 
         stringr::str_replace_all(collected_all_disease_terms,
                          vec_to_grep_pattern(c("retinal edema"
                                              )),
                          "other specified retinal disorders"
         ))

Parkinsons disease

gwas_study_info = gwas_study_info |>
  mutate(collected_all_disease_terms  = 
         stringr::str_replace_all(collected_all_disease_terms,
                          vec_to_grep_pattern("young adult-onset parkinsonism"),
                          "parkinsons disease"
         ))

Perinatal jaundice

gwas_study_info = gwas_study_info |>
  mutate(collected_all_disease_terms  = 
         stringr::str_replace_all(collected_all_disease_terms,
                          vec_to_grep_pattern("neonatal jaundice"),
                          "perinatal jaundice"
         ))

Periodontitis

gwas_study_info = gwas_study_info |>
  mutate(collected_all_disease_terms  = 
         stringr::str_replace_all(collected_all_disease_terms,
                          vec_to_grep_pattern("aggressive periodontitis"),
                          "periodontitis"
         ))

Phlebitis and thrombophlebitis

gwas_study_info = gwas_study_info |>
  mutate(collected_all_disease_terms  = 
         stringr::str_replace_all(collected_all_disease_terms,
                          vec_to_grep_pattern(c("thrombophlebitis")),
                          "phlebitis and thrombophlebitis"
         ))

Phobias

# social anxiety disorder -> social phobias
gwas_study_info = gwas_study_info |>
  mutate(collected_all_disease_terms  = 
         stringr::str_replace_all(collected_all_disease_terms,
                          vec_to_grep_pattern("social anxiety disorder"),
                          "social phobias"
         ))

# specific phobia -> Specific \\(isolated\\) phobias

gwas_study_info = gwas_study_info |>
  mutate(collected_all_disease_terms  = 
         stringr::str_replace_all(collected_all_disease_terms,
                          vec_to_grep_pattern("specific phobia"),
                          "specific \\(isolated\\) phobias"
         ))

Pigmentary iris degeneration

# where Pigmentary iris degeneration in DISEASE/TRAIT & pubmed id 39024449
# replace abnormality of the eye with pigmentary iris degeneration

gwas_study_info = gwas_study_info |>
  mutate(collected_all_disease_terms  = 
         ifelse(grepl("Pigmentary iris degeneration", 
                     `DISEASE/TRAIT`,
                     ignore.case = TRUE) & PUBMED_ID == "39024449",
                stringr::str_replace_all(collected_all_disease_terms,
                          pattern = "abnormality of the eye",
                          replacement = "pigmentary iris degeneration"
         ),
         collected_all_disease_terms
         )
  )

Precocious puberty

gwas_study_info = gwas_study_info |>
  mutate(collected_all_disease_terms  = 
         stringr::str_replace_all(collected_all_disease_terms,
                          vec_to_grep_pattern("central precocious puberty"),
                          "precocious puberty"
         ))

Premature menopause and other ovarian failure

gwas_study_info = gwas_study_info |>
  mutate(collected_all_disease_terms  = 
         stringr::str_replace_all(collected_all_disease_terms,
                          vec_to_grep_pattern("premature menopause"),
                          "premature menopause and other ovarian failure"
         ))

Primary hyperaldosteronism

gwas_study_info = gwas_study_info |>
  mutate(collected_all_disease_terms  = 
         stringr::str_replace_all(collected_all_disease_terms,
                          vec_to_grep_pattern("primary aldosteronism"),
                          "primary hyperaldosteronism"
         ))

Primary ovarian failure

gwas_study_info = gwas_study_info |>
  mutate(collected_all_disease_terms  = 
         stringr::str_replace_all(collected_all_disease_terms,
                          vec_to_grep_pattern("primary ovarian insufficiency"),
                          "primary ovarian failure"
         ))

Primary thrombophilia

# if DISEASE/TRAIT contains PheCode 286.8
# then replace thrombophilia with primary thrombophilia
gwas_study_info =
  gwas_study_info |>
  mutate(collected_all_disease_terms  = 
         ifelse(grepl("PheCode 286.8", 
                     `DISEASE/TRAIT`,
                     ignore.case = TRUE),
                stringr::str_replace_all(collected_all_disease_terms,
                          vec_to_grep_pattern("thrombophilia"),
                          "primary thrombophilia"
         ),
         collected_all_disease_terms
         )
  )

Proteinuria

gwas_study_info = gwas_study_info |>
  mutate(collected_all_disease_terms  = 
         stringr::str_replace_all(collected_all_disease_terms,
                          vec_to_grep_pattern(c("albuminuria",
                                                "moderate albuminuria")
                                              ),
                          "proteinuria"
         ))

Prurigo

gwas_study_info = gwas_study_info |>
  mutate(collected_all_disease_terms  = 
         stringr::str_replace_all(collected_all_disease_terms,
                          vec_to_grep_pattern("prurigo"),
                          "other prurigo"
         ))

Psoriasis

psoriasis_terms <- c("cutaneous psoriasis")


gwas_study_info = gwas_study_info |>
  mutate(collected_all_disease_terms  = 
         stringr::str_replace_all(collected_all_disease_terms,
                          pattern = vec_to_grep_pattern(psoriasis_terms),
                          "psoriasis")
  )

Psychosis (psychotic, psychotic symptoms)

psychosis_terms <- c("psychotic",
                     "psychotic symptoms"
                     )

gwas_study_info = gwas_study_info |>
  mutate(collected_all_disease_terms  = 
         stringr::str_replace_all(collected_all_disease_terms,
                          pattern = vec_to_grep_pattern(psychosis_terms),
                          "psychosis")
  )

Pulmonary fibrosis

gwas_study_info = gwas_study_info |>
  mutate(collected_all_disease_terms  = 
         stringr::str_replace_all(collected_all_disease_terms,
                          vec_to_grep_pattern("idiopathic pulmonary fibrosis"),
                          "pulmonary fibrosis"
         ))

Rash, Petechiae

# for pubmed id: 37469131
# and DISEASE/TRAIT contains 'rush' (a typo of rash)
# replace purpura with rash, petechiae

gwas_study_info = gwas_study_info |>
  mutate(collected_all_disease_terms  = 
         ifelse(PUBMED_ID == 37469131 & grepl("rush", `DISEASE/TRAIT`, ignore.case = TRUE),
                stringr::str_replace_all(collected_all_disease_terms,
                          vec_to_grep_pattern("purpura"),
                          "rash, petechiae"
         ),
         collected_all_disease_terms
         )
  )

Rash and other nonspecific skin eruption

rash_terms <- c(
                "maculopapular eruption"
                )

gwas_study_info = gwas_study_info |>
  mutate(collected_all_disease_terms  = 
         stringr::str_replace_all(collected_all_disease_terms,
                          pattern = vec_to_grep_pattern(rash_terms),
                          "rash and other nonspecific skin eruption"
         )
  )

Retinoschisis and retinal cysts

gwas_study_info = gwas_study_info |>
  mutate(collected_all_disease_terms  = 
         stringr::str_replace_all(collected_all_disease_terms,
                          vec_to_grep_pattern("retinoschisis"),
                          "retinoschisis and retinal cysts"
         ))

Rheumatoid arthritis

ra_terms <- c("acpa-positive rheumatoid arthritis",
               "acpa-negative rheumatoid arthritis",
              "adult-onset stills disease")

gwas_study_info = 
gwas_study_info |> 
 mutate(collected_all_disease_terms  = 
          stringr::str_replace_all(collected_all_disease_terms,
                                  pattern = vec_to_grep_pattern(ra_terms),
                                   "rheumatoid arthritis"
                          )  
        )

Rhinitis

rhinitis_terms <- c("non-allergic rhinitis")

gwas_study_info = gwas_study_info |>
  mutate(collected_all_disease_terms  = 
         stringr::str_replace_all(collected_all_disease_terms,
                          pattern = vec_to_grep_pattern(rhinitis_terms),
                          "rhinitis"
         )
  )

Sciatica

gwas_study_info = gwas_study_info |>
  mutate(collected_all_disease_terms  = 
         stringr::str_replace_all(collected_all_disease_terms,
                          vec_to_grep_pattern(c("ldh-related sciatica")
                                              ),
                          "sciatica"
         )) 

Schizoaffective disorder

gwas_study_info = gwas_study_info |>
  mutate(collected_all_disease_terms  = 
         stringr::str_replace_all(collected_all_disease_terms,
                          vec_to_grep_pattern("schizoaffective disorder-bipolar type"),
                          "schizoaffective disorder")
         )

Schizophrenia

gwas_study_info = gwas_study_info |>
  mutate(collected_all_disease_terms  = 
         stringr::str_replace_all(collected_all_disease_terms,
                          vec_to_grep_pattern("treatment refractory schizophrenia"),
                          "schizophrenia")
         )

Scoliosis

gwas_study_info = gwas_study_info |>
  mutate(collected_all_disease_terms  = 
         stringr::str_replace_all(collected_all_disease_terms,
                          vec_to_grep_pattern("adolescent idiopathic scoliosis"),
                          "scoliosis")
         )

Separation of retinal layers

gwas_study_info = gwas_study_info |>
  mutate(collected_all_disease_terms  = 
         stringr::str_replace_all(collected_all_disease_terms,
                          vec_to_grep_pattern(c("central serous retinopathy",
                                              "chronic central serous retinopathy")
                                              ),
                          "separation of retinal layers"
         ))

Sleep apnea

sleep_apnea_terms <- c("sleep apnea during non-rem sleep",
                      "sleep apnea during rem sleep",
                      "obstructive sleep apnea")

gwas_study_info = gwas_study_info |>
  mutate(collected_all_disease_terms  = 
         stringr::str_replace_all(collected_all_disease_terms,
                          pattern = vec_to_grep_pattern(sleep_apnea_terms),
                          "sleep apnea")
  )

Sleep disorders

sleep_disorder_terms <- c("sleepiness",
                          "somnambulism",
                          "rem sleep behavior disorder",
                          "periodic limb movement disorder",
                          "bruxism"
                         )


gwas_study_info = gwas_study_info |>
  mutate(collected_all_disease_terms  = 
         stringr::str_replace_all(collected_all_disease_terms,
                          pattern = vec_to_grep_pattern(sleep_disorder_terms),
                          "sleep disorders"
         )
  )

Speech and language disorder

gwas_study_info = gwas_study_info |>
  mutate(collected_all_disease_terms  = 
         stringr::str_replace_all(collected_all_disease_terms,
                          vec_to_grep_pattern(c("specific language impairment",
                                                "language impairment",
                                                "social communication impairment"
                                                )
                                              ),
                          "speech and language disorder"
         ))

# if DISEASE/TRAIT contains Developmental stuttering
# then replace stuttering with speech and language disorder
gwas_study_info = gwas_study_info |>
  mutate(collected_all_disease_terms  = 
         ifelse(grepl("Developmental stuttering", 
                     `DISEASE/TRAIT`,
                     ignore.case = TRUE),
                stringr::str_replace_all(collected_all_disease_terms,
                          vec_to_grep_pattern("stuttering"),
                          "speech and language disorder"
         ),
         collected_all_disease_terms
         )
  )

Staphylococcus infections

gwas_study_info = gwas_study_info |>
  mutate(collected_all_disease_terms  = 
         stringr::str_replace_all(collected_all_disease_terms,
                          vec_to_grep_pattern(
                            c("staphylococcus aureus infection",
                              "skin and soft tissue staphylococcus aureus infection",
                              "methicillin-resistant staphylococcus aureus infection")),
                          "staphylococcus infections"
         ))

Strabismus

url <-  "http://www.ebi.ac.uk/ols4/api/ontologies/doid/terms/http%253A%252F%252Fpurl.obolibrary.org%252Fobo%252FDOID_540/descendants"

strabismus_terms <- get_descendants(url)
[1] "Number of terms collected:"
[1] 26
[1] "\n Some example terms"
[1] "abnormal retinal correspondence" "brown's tendon sheath syndrome" 
[3] "internuclear ophthalmoplegia"    "duane retraction syndrome 3"    
[5] "duane retraction syndrome 2"    
strabismus_terms = c("non-accomodative esotropia",
                     strabismus_terms)

gwas_study_info = gwas_study_info |>
  mutate(collected_all_disease_terms  = 
         stringr::str_replace_all(collected_all_disease_terms,
                          pattern = vec_to_grep_pattern(strabismus_terms),
                          "strabismus"
         ))

Stroke

other_nonspec_stroke <- c("large artery stroke",
                         "small vessel stroke",
                         "stroke outcome",
                         "stroke disorder"
                         )

gwas_study_info = gwas_study_info |>
  mutate(collected_all_disease_terms  = 
         stringr::str_replace_all(collected_all_disease_terms,
                          vec_to_grep_pattern(other_nonspec_stroke),
                          "stroke"
         ))

Stuttering, tics -> Tics and stuttering

gwas_study_info = gwas_study_info |>
  mutate(collected_all_disease_terms  = 
         ifelse(grepl("PheCode 313.2", 
                     `DISEASE/TRAIT`,
                     ignore.case = TRUE),
                stringr::str_replace_all(collected_all_disease_terms,
                          vec_to_grep_pattern(c("stuttering, tics")),
                          "tics and stuttering"
                          ),
         collected_all_disease_terms
)
)


gwas_study_info = gwas_study_info |>
  mutate(collected_all_disease_terms  = 
         ifelse(grepl("PheCode 333.3", 
                     `DISEASE/TRAIT`,
                     ignore.case = TRUE),
                stringr::str_replace_all(collected_all_disease_terms,
                          vec_to_grep_pattern(c("stuttering, tics")),
                          "tics and choreas"
         ),
         collected_all_disease_terms
))

Systemic sclerosis

terms <- c("diffuse scleroderma",
           "limited scleroderma",
           "anti-centromere-antibody-positive systemic scleroderma",
           "anti-topoisomerase-i-antibody-positive systemic scleroderma")

gwas_study_info = gwas_study_info |>
  mutate(collected_all_disease_terms  = 
         stringr::str_replace_all(collected_all_disease_terms,
                          vec_to_grep_pattern(terms),
                          "systemic sclerosis"
         ))

Tinea

# if DISEASE/TRAIT contains Ringworm, replace tinea with tinea corporis
gwas_study_info = gwas_study_info |>
  mutate(collected_all_disease_terms  = 
         ifelse(grepl("Ringworm", 
                     `DISEASE/TRAIT`,
                     ignore.case = TRUE),
                stringr::str_replace_all(collected_all_disease_terms,
                          vec_to_grep_pattern("tinea"),
                          "tinea corporis"
         ),
         collected_all_disease_terms
         )
  )

Thyrotoxicosis with or without goiter

gwas_study_info = gwas_study_info |>
  mutate(collected_all_disease_terms  = 
         stringr::str_replace_all(collected_all_disease_terms,
                          vec_to_grep_pattern("hyperthyroidism"),
                          "thyrotoxicosis with or without goiter"
         ))

Treatment resistant (etc.)

gwas_study_info = gwas_study_info |>
  mutate(collected_all_disease_terms  = 
         stringr::str_remove_all(collected_all_disease_terms,
                       "^treatment[- ]resistant "
         ))
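
Because the pattern above starts with `^`, it can only strip the prefix from the first entry of a comma-separated string. The sketch below illustrates this on invented toy data with base R `gsub()`, using a compact form of the pattern, `^treatment[- ]resistant `. The second call is a hypothetical variant shown only for contrast, not something the analysis does.

```r
# Invented toy records.
toy_terms <- c("treatment-resistant hypertension",
               "asthma, treatment resistant epilepsy")

# "^" anchors at the start of the whole string, so only a leading
# entry can lose its prefix.
start_only <- gsub("^treatment[- ]resistant ", "", toy_terms)

# Variant for illustration: also strip the prefix after each ", ".
every_entry <- gsub("(^|, )treatment[- ]resistant ", "\\1", toy_terms)

print(start_only)
print(every_entry)
```

With the start-anchored pattern the second toy record is unchanged, while the variant reduces it to "asthma, epilepsy".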

Treatment resistant depression

# if DISEASE/TRAIT contains Treatment resistant depression
# then replace depression with treatment resistant depression
gwas_study_info =
  gwas_study_info |>
  mutate(collected_all_disease_terms  = 
         ifelse(grepl("Treatment resistant depression", 
                     `DISEASE/TRAIT`,
                     ignore.case = TRUE),
                stringr::str_replace_all(collected_all_disease_terms,
                          vec_to_grep_pattern("depression"),
                          "treatment resistant depression"
         ),
         collected_all_disease_terms
         )
  )

gwas_study_info =
  gwas_study_info |>
  mutate(collected_all_disease_terms  = 
         ifelse(grepl("treatment resistant depression", 
                     MAPPED_BACKGROUND_TRAIT,
                     ignore.case = TRUE),
                stringr::str_replace_all(collected_all_disease_terms,
                          vec_to_grep_pattern("depression"),
                          "treatment resistant depression"
         ),
         collected_all_disease_terms
         )
  )

Type 1 diabetes

gwas_study_info = gwas_study_info |>
  mutate(collected_all_disease_terms  = 
         stringr::str_replace_all(collected_all_disease_terms,
                          vec_to_grep_pattern("latent autoimmune diabetes in adults"),
                          "type 1 diabetes mellitus"
         ))

Diabetic foot, neuropathy, type 1 diabetes mellitus, type 2 diabetes mellitus

icd_terms <- c("type 1 diabetes with foot ulcer",
               "type 2 diabetes with foot ulcer",
               "type 1 diabetes with diabetic neuropathy",
               "type 2 diabetes with diabetic neuropathy")

icd_replacement <- paste0(icd_terms, collapse = ", ")

gwas_study_info = gwas_study_info |>
  mutate(collected_all_disease_terms  = 
         stringr::str_replace_all(collected_all_disease_terms,
                          vec_to_grep_pattern("diabetic foot, neuropathy, type 1 diabetes mellitus, type 2 diabetes mellitus"),
                          icd_replacement
         ))
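
The composite label above contains commas, so it has to be recoded as one string before the term list is split on ", ". The sketch below, on an invented toy record and with base R functions, shows what the expansion does to the entries that a later split would produce. `split_terms()` is a hypothetical helper.

```r
composite <- "diabetic foot, neuropathy, type 1 diabetes mellitus, type 2 diabetes mellitus"

icd_terms <- c("type 1 diabetes with foot ulcer",
               "type 2 diabetes with foot ulcer",
               "type 1 diabetes with diabetic neuropathy",
               "type 2 diabetes with diabetic neuropathy")
icd_replacement <- paste(icd_terms, collapse = ", ")

# Invented toy record holding the composite label next to another term.
toy_terms <- paste("asthma", composite, sep = ", ")
expanded <- sub(composite, icd_replacement, toy_terms, fixed = TRUE)

# Hypothetical helper: split one record into its entries.
split_terms <- function(x) strsplit(x, ", ", fixed = TRUE)[[1]]

before <- split_terms(toy_terms)
after <- split_terms(expanded)
print(before)
print(after)
```

Before the recode, splitting yields generic entries such as "neuropathy"; afterwards it yields the four specific diabetes labels.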

Type 2 diabetes with ophthalmic manifestations

type_2_eye_terms <- c("diabetes mellitus type 2 associated cataract",
                      "diabetic maculopathy, type 2 diabetes mellitus",
                      "diabetic macular edema, type 2 diabetes mellitus",
                      "proliferative diabetic retinopathy, type 2 diabetes mellitus",
                      "macrovascular complications of diabetes, type 2 diabetes mellitus"
                     )

gwas_study_info = gwas_study_info |>
  mutate(collected_all_disease_terms  = 
         stringr::str_replace_all(collected_all_disease_terms,
                          vec_to_grep_pattern(type_2_eye_terms),
                          "type 2 diabetes with ophthalmic manifestations"
         ))

# for 30487263, all discovery samples are type 2 diabetes - so 
# replace proliferative diabetic retinopathy with type 2 diabetes with ophthalmic manifestations
gwas_study_info = gwas_study_info |>
  mutate(collected_all_disease_terms  = 
         ifelse(PUBMED_ID == 30487263,
                stringr::str_replace_all(collected_all_disease_terms,
                          vec_to_grep_pattern("proliferative diabetic retinopathy"),
                          "type 2 diabetes with ophthalmic manifestations"
         ),
         collected_all_disease_terms
         )
  )

# for pubmed id: 31482010
# all samples are type 2 diabetes
# so replace diabetic retinopathy with type 2 diabetes with ophthalmic manifestations
gwas_study_info = gwas_study_info |>
  mutate(collected_all_disease_terms  = 
         ifelse(PUBMED_ID == 31482010,
                stringr::str_replace_all(collected_all_disease_terms,
                          vec_to_grep_pattern("diabetic retinopathy"),
                          "type 2 diabetes with ophthalmic manifestations"
         ),
         collected_all_disease_terms
         )
)

# neuropathic pain, type 2 diabetes mellitus
# change to type 2 diabetes with diabetic neuropathy
gwas_study_info = gwas_study_info |>
  mutate(collected_all_disease_terms  = 
                stringr::str_replace_all(collected_all_disease_terms,
                          vec_to_grep_pattern("neuropathic pain, type 2 diabetes mellitus"),
                          "type 2 diabetes with diabetic neuropathy"
         )
  )

Unspecified condition associated with female genital organs and menstrual cycle

gwas_study_info = 
  gwas_study_info |>
  mutate(collected_all_disease_terms = 
    ifelse(grepl("Menstruation", 
                `DISEASE/TRAIT`,
                 ignore.case = TRUE),
           str_replace_all(collected_all_disease_terms,
                           pattern = vec_to_grep_pattern("decreased attention"),
                           "unspecified condition associated with female genital organs and menstrual cycle"),
           collected_all_disease_terms
    )
    )

Uveitis

uveitis_terms <- c("anterior uveitis",
                   "iritis",
                   "vogt-koyanagi-harada disease",
                   "birdshot chorioretinopathy",
                   "multifocal choroiditis"
                   )

gwas_study_info = gwas_study_info |>
  mutate(collected_all_disease_terms  = 
         stringr::str_replace_all(collected_all_disease_terms,
                          pattern = vec_to_grep_pattern(uveitis_terms),
                          "uveitis"
         ))

Vasculitis

gwas_study_info =
  gwas_study_info |>
  mutate(collected_all_disease_terms  = 
                stringr::str_replace_all(collected_all_disease_terms,
                          vec_to_grep_pattern("anti-neutrophil antibody associated vasculitis"),
                          "vasculitis"
         )
  )

Save

How many unique traits are there now?

diseases <- stringr::str_split(pattern = ", ",
                               gwas_study_info$collected_all_disease_terms[gwas_study_info$collected_all_disease_terms != ""])  |>
  unlist() |>
  stringr::str_trim()

diseases <- unique(diseases)

print(length(diseases))
[1] 1963

fwrite(
  gwas_study_info,
  here::here("output/gwas_cat/gwas_study_all_group.csv")
  )
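
A quick check that the retired terms no longer appear among the unique traits can catch a recoding step that silently matched nothing. The sketch below runs on a toy vector standing in for gwas_study_info$collected_all_disease_terms; the names toy_terms, retired_terms, and toy_diseases are hypothetical and the values are illustrative only.

```r
# Toy stand-in for gwas_study_info$collected_all_disease_terms (illustrative values)
toy_terms <- c("uveitis, glaucoma", "vasculitis", "", "uveitis")

# Terms that the recoding steps are meant to retire
retired_terms <- c("anterior uveitis",
                   "iritis",
                   "anti-neutrophil antibody associated vasculitis")

# Split into individual terms, mirroring the unique-trait count
toy_diseases <- unique(trimws(unlist(
  strsplit(toy_terms[toy_terms != ""], ", ", fixed = TRUE)
)))

# Any retired term still present indicates a recoding step that did not apply
leftover <- intersect(toy_diseases, retired_terms)
stopifnot(length(leftover) == 0)
```

With the real data, the same intersect() call on diseases would list any terms that were supposed to be merged but are still present.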

sessionInfo()
R version 4.3.1 (2023-06-16)
Platform: aarch64-apple-darwin20 (64-bit)
Running under: macOS 15.6.1

Matrix products: default
BLAS:   /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib/libRblas.0.dylib 
LAPACK: /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib/libRlapack.dylib;  LAPACK version 3.11.0

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

time zone: America/Los_Angeles
tzcode source: internal

attached base packages:
[1] stats     graphics  grDevices datasets  utils     methods   base     

other attached packages:
[1] jsonlite_2.0.0    httr_1.4.7        data.table_1.17.8 stringr_1.6.0    
[5] dplyr_1.1.4       workflowr_1.7.1  

loaded via a namespace (and not attached):
 [1] compiler_4.3.1      BiocManager_1.30.26 renv_1.0.3         
 [4] promises_1.3.3      tidyselect_1.2.1    Rcpp_1.1.0         
 [7] git2r_0.36.2        callr_3.7.6         later_1.4.4        
[10] jquerylib_0.1.4     yaml_2.3.10         fastmap_1.2.0      
[13] here_1.0.1          R6_2.6.1            generics_0.1.4     
[16] curl_7.0.0          knitr_1.50          tibble_3.3.0       
[19] rprojroot_2.1.0     bslib_0.9.0         pillar_1.11.1      
[22] rlang_1.1.6         cachem_1.1.0        stringi_1.8.7      
[25] httpuv_1.6.16       xfun_0.55           getPass_0.2-4      
[28] fs_1.6.6            sass_0.4.10         cli_3.6.5          
[31] withr_3.0.2         magrittr_2.0.4      ps_1.9.1           
[34] digest_0.6.37       processx_3.8.6      rstudioapi_0.17.1  
[37] lifecycle_1.0.4     vctrs_0.6.5         evaluate_1.0.5     
[40] glue_1.8.0          whisker_0.4.1       rmarkdown_2.30     
[43] tools_4.3.1         pkgconfig_2.0.3     htmltools_0.5.8.1