Last updated: 2025-09-16
Checks: 7 passed, 0 warnings
Knit directory: genomics_ancest_disease_dispar/
This reproducible R Markdown analysis was created with workflowr (version 1.7.1). The Checks tab describes the reproducibility checks that were applied when the results were created. The Past versions tab lists the development history.
Great! Since the R Markdown file has been committed to the Git repository, you know the exact version of the code that produced these results.
Great job! The global environment was empty. Objects defined in the global environment can affect the analysis in your R Markdown file in unknown ways. For reproducibility it’s best to always run the code in an empty environment.
The command set.seed(20220216) was run prior to running the code in the R Markdown file. Setting a seed ensures that any results that rely on randomness, e.g. subsampling or permutations, are reproducible.
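A quick base-R illustration of what the seed guarantees: re-seeding with the same value reproduces the same pseudo-random draw (drawing 5 values from 1:100 is an arbitrary choice for the example).

```r
# Re-seeding with the same value reproduces the same pseudo-random draws
set.seed(20220216)
first_draw <- sample(100, 5)

set.seed(20220216)
second_draw <- sample(100, 5)

identical(first_draw, second_draw)  # TRUE
```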
Great job! Recording the operating system, R version, and package versions is critical for reproducibility.
Nice! There were no cached chunks for this analysis, so you can be confident that you successfully produced the results during this run.
Great job! Using relative paths to the files within your workflowr project makes it easier to run your code on other machines.
Great! You are using Git for version control. Tracking code development and connecting the code version to the results is critical for reproducibility.
The results in this page were generated with repository version c601713. See the Past versions tab to see a history of the changes made to the R Markdown and HTML files.
Note that you need to be careful to ensure that all relevant files for the analysis have been committed to Git prior to generating the results (you can use wflow_publish or wflow_git_commit). workflowr only checks the R Markdown file, but you know if there are other scripts or data files that it depends on. Below is the status of the Git repository when the results were generated:
Ignored files:
Ignored: .DS_Store
Ignored: .Rproj.user/
Ignored: data/.DS_Store
Ignored: data/gbd/.DS_Store
Ignored: data/gbd/ihme_gbd_2019_global_disease_burden_rate_all_ages.csv
Ignored: data/gbd/ihme_gbd_2019_global_paf_rate_percent_all_ages.csv
Ignored: data/gbd/ihme_gbd_2021_global_disease_burden_rate_all_ages.csv
Ignored: data/gbd/ihme_gbd_2021_global_paf_rate_percent_all_ages.csv
Ignored: data/gwas_catalog/
Ignored: data/who/
Ignored: output/gwas_cat/
Ignored: output/gwas_study_info_cohort_corrected.csv
Ignored: output/gwas_study_info_trait_corrected.csv
Ignored: output/gwas_study_info_trait_ontology_info.csv
Ignored: output/gwas_study_info_trait_ontology_info_l1.csv
Ignored: output/gwas_study_info_trait_ontology_info_l2.csv
Ignored: output/trait_ontology/
Ignored: renv/
Unstaged changes:
Modified: analysis/level_1_disease_group_cancer.Rmd
Modified: analysis/level_2_disease_group.Rmd
Modified: code/get_term_descendants.R
Note that any generated files, e.g. HTML, png, CSS, etc., are not included in this status report because it is ok for generated content to have uncommitted changes.
These are the previous versions of the repository in which changes were made to the R Markdown (analysis/level_1_disease_group_non_cancer.Rmd) and HTML (docs/level_1_disease_group_non_cancer.html) files. If you’ve configured a remote Git repository (see ?wflow_git_remote), click on the hyperlinks in the table below to view the files as they were in that past version.
File | Version | Author | Date | Message |
---|---|---|---|---|
Rmd | c601713 | IJbeasley | 2025-09-16 | Even more disease term grouping |
html | add5ecc | IJbeasley | 2025-09-15 | Build site. |
Rmd | 465a689 | IJbeasley | 2025-09-15 | More disease term grouping |
html | 7504c04 | IJbeasley | 2025-09-15 | Build site. |
Rmd | f01005f | IJbeasley | 2025-09-15 | workflowr::wflow_publish("analysis/level_1_disease_group_non_cancer.Rmd") |
html | 9aa118e | IJbeasley | 2025-09-15 | Build site. |
Rmd | ffbf74a | IJbeasley | 2025-09-15 | Further grouping of disease terms |
html | b19b361 | IJbeasley | 2025-09-15 | Build site. |
Rmd | 9cc22ba | IJbeasley | 2025-09-15 | Dealing with duplicate disease terms |
html | 1679f9d | IJbeasley | 2025-09-10 | Build site. |
html | 2250e22 | IJbeasley | 2025-09-10 | Build site. |
Rmd | e3de56c | IJbeasley | 2025-09-10 | Update cardiac disease grouping |
html | e713c34 | IJbeasley | 2025-09-10 | Build site. |
Rmd | 934b11f | IJbeasley | 2025-09-10 | workflowr::wflow_publish("analysis/level_1_disease_group_non_cancer.Rmd") |
library(dplyr)
library(data.table)
library(stringr)
source(here::here("code/get_term_descendants.R"))
gwas_study_info <- fread(here::here("output/gwas_cat/gwas_study_info_disease_trait_simplified.csv"))
Level 1 disease grouping:

- collapse disease causes, e.g. contact dermatitis due to nickel to contact dermatitis
- collapse disease subtypes, e.g. bipolar I and bipolar II to bipolar disorder
- collapse disease onset times, e.g. early-onset alzheimers disease and late-onset alzheimers disease to alzheimers disease
- collapse disease stages, e.g. chronic kidney disease stage 4 and stage 5 to chronic kidney disease
- collapse disease complications, e.g. device complication and trauma complication to complication
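The code that follows implements these rules as a sequence of regular-expression replacements. For comparison, here is a minimal sketch of the same idea as an exact-match lookup on each comma-separated term list. `l1_map` and `collapse_terms_exact()` are hypothetical names, not part of this project, and the map holds only the examples listed above.

```r
# Hypothetical lookup table: exact original term -> level 1 term
l1_map <- c(
  "contact dermatitis due to nickel" = "contact dermatitis",
  "bipolar i disorder"               = "bipolar disorder",
  "bipolar ii disorder"              = "bipolar disorder",
  "early-onset alzheimers disease"   = "alzheimers disease",
  "late-onset alzheimers disease"    = "alzheimers disease",
  "chronic kidney disease stage 4"   = "chronic kidney disease",
  "chronic kidney disease stage 5"   = "chronic kidney disease"
)

# Split each entry on ", ", replace exact matches, drop duplicates, re-join
collapse_terms_exact <- function(x, map) {
  vapply(strsplit(x, ", ", fixed = TRUE), function(terms) {
    hit <- terms %in% names(map)
    terms[hit] <- map[terms[hit]]
    paste(unique(terms), collapse = ", ")
  }, character(1))
}

collapsed <- collapse_terms_exact(
  c("bipolar i disorder, bipolar ii disorder", "asthma, late-onset alzheimers disease"),
  l1_map
)
collapsed  # "bipolar disorder" "asthma, alzheimers disease"
```

Unlike the regex replacements, an exact lookup never rewrites part of a longer term and removes duplicates created by collapsing. It assumes no individual term itself contains ", ", which does not hold for some ontology labels (e.g. "coronary artery disease, autosomal dominant 2").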
gwas_study_info = gwas_study_info |>
mutate(l1_all_disease_terms = collected_all_disease_terms)
n_studies_trait = gwas_study_info |>
dplyr::filter(DISEASE_STUDY == TRUE) |>
dplyr::select(collected_all_disease_terms, PUBMED_ID) |>
dplyr::distinct() |>
dplyr::group_by(collected_all_disease_terms) |>
dplyr::summarise(n_studies = dplyr::n()) |>
dplyr::arrange(desc(n_studies))
head(n_studies_trait)
# A tibble: 6 × 2
collected_all_disease_terms n_studies
<chr> <int>
1 type 2 diabetes mellitus 145
2 alzheimers disease 116
3 breast cancer 112
4 asthma 110
5 major depressive disorder 108
6 schizophrenia 103
dim(n_studies_trait)
[1] 3065 2
diseases <- stringr::str_split(pattern = ", ",
gwas_study_info$collected_all_disease_terms[gwas_study_info$collected_all_disease_terms != ""]) |>
unlist() |>
stringr::str_trim()
length(unique(diseases))
[1] 2194
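`n_studies_trait` groups by the combined term string (3065 rows), whereas `diseases` splits those strings into individual terms (2194 unique). A base-R sketch on a made-up three-study table shows why the two counts differ, and one way to count studies per individual term:

```r
# Hypothetical toy table: one row per study, terms comma-separated
studies <- data.frame(
  PUBMED_ID = c(101, 102, 103),
  collected_all_disease_terms = c("asthma",
                                  "asthma, allergic rhinitis",
                                  "allergic rhinitis, asthma"),
  stringsAsFactors = FALSE
)

# Distinct combined strings (what n_studies_trait groups by)
n_combined <- length(unique(studies$collected_all_disease_terms))  # 3

# Distinct individual terms (what `diseases` counts)
split_terms <- strsplit(studies$collected_all_disease_terms, ", ", fixed = TRUE)
n_individual <- length(unique(trimws(unlist(split_terms))))  # 2

# Studies per individual term: one row per unique (study, term) pair, then count
long <- unique(data.frame(
  PUBMED_ID = rep(studies$PUBMED_ID, lengths(split_terms)),
  term      = trimws(unlist(split_terms)),
  stringsAsFactors = FALSE
))
n_per_term <- sort(table(long$term), decreasing = TRUE)
n_per_term
```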
disturb_senses_terms <- c("disturbances of sensation of smell and taste",
"abnormality of the sense of smell",
"ageusia")
gwas_study_info = gwas_study_info |>
mutate(l1_all_disease_terms =
stringr::str_replace_all(l1_all_disease_terms,
"chronic kidney disease stage 5|chronic kidney disease stage 4",
"chronic kidney disease"
))
gwas_study_info = gwas_study_info |>
mutate(l1_all_disease_terms =
stringr::str_replace_all(l1_all_disease_terms,
"stage 5 chronic kidney disease",
"chronic kidney disease"
))
gwas_study_info = gwas_study_info |>
mutate(l1_all_disease_terms =
stringr::str_replace_all(l1_all_disease_terms,
"contact dermatitis due to nickel",
"contact dermatitis"
))
disease_complication_terms = c("device complication",
"trauma complication",
"adverse effect",
"aseptic loosening"
)
gwas_study_info = gwas_study_info |>
mutate(l1_all_disease_terms =
stringr::str_replace_all(l1_all_disease_terms,
pattern = paste0(disease_complication_terms, collapse = "(?=,|$)|\\b"),
"complication"
)
)
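`paste0(terms, collapse = "(?=,|$)|\\b")` places `(?=,|$)|\b` only *between* terms, so the first term has no leading `\b` and the last term has no trailing `(?=,|$)`. For `disease_complication_terms` this means "aseptic loosening" is replaced even when it is just the start of a longer term. A minimal sketch of a builder that anchors every term, assuming the intent is to match whole comma-separated entries. `build_term_pattern()` is a hypothetical helper, and base `gsub(perl = TRUE)` stands in for `stringr::str_replace_all()` (both support this lookahead):

```r
disease_complication_terms <- c("device complication", "trauma complication",
                                "adverse effect", "aseptic loosening")

# Pattern as built in the chunk above: separators sit only between terms
old_pattern <- paste0(disease_complication_terms, collapse = "(?=,|$)|\\b")

# Hypothetical helper: one group, so every term gets both the \b and the lookahead
build_term_pattern <- function(terms) {
  terms <- terms[!is.na(terms) & terms != ""]
  paste0("\\b(?:", paste(terms, collapse = "|"), ")(?=,|$)")
}
new_pattern <- build_term_pattern(disease_complication_terms)

x <- c("trauma complication, asthma", "aseptic loosening of hip implant")  # made-up inputs
old_result <- gsub(old_pattern, "complication", x, perl = TRUE)
new_result <- gsub(new_pattern, "complication", x, perl = TRUE)
old_result  # "complication, asthma" "complication of hip implant"
new_result  # "complication, asthma" "aseptic loosening of hip implant"
```

Because the lookahead applies to the whole group, a term that is a prefix of another (e.g. "alcohol withdrawal" and "alcohol withdrawal delirium") backtracks to the longer alternative. Terms are still interpreted as regular expressions, not literal text.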
url <- "http://www.ebi.ac.uk/ols4/api/ontologies/hp/terms/http%253A%252F%252Fpurl.obolibrary.org%252Fobo%252FHP_0020064/descendants"
abnormal_total_eosinophil_count_terms <- get_descendants(url)
[1] "Number of terms collected:"
[1] 4
[1] "\n Some example terms"
[1] "severely increased total eosinophil count"
[2] "decreased total eosinophil count"
[3] "increased total eosinophil count"
[4] "episodic eosinophilia"
[5] NA
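Each `url` passed to `get_descendants()` embeds the term IRI percent-encoded twice (`:` becomes `%3A`, then `%253A`), the double URL-encoding OLS expects for term IRIs in the path. A minimal base-R sketch for building these URLs instead of writing them by hand. `build_ols_url()` is a hypothetical helper; it only constructs the string and does not call the API.

```r
# Hypothetical helper: OLS4 API URL for the descendants (or children) of a term
build_ols_url <- function(ontology, iri, relation = c("descendants", "children")) {
  relation <- match.arg(relation)
  once  <- URLencode(iri, reserved = TRUE)       # ":" -> "%3A", "/" -> "%2F"
  twice <- gsub("%", "%25", once, fixed = TRUE)  # "%3A" -> "%253A"
  sprintf("http://www.ebi.ac.uk/ols4/api/ontologies/%s/terms/%s/%s",
          ontology, twice, relation)
}

eosinophil_url <- build_ols_url("hp", "http://purl.obolibrary.org/obo/HP_0020064")
eosinophil_url
```

The `relation` argument also covers the `/children` URL used for hernia of the abdominal wall further down.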
gwas_study_info = gwas_study_info |>
mutate(l1_all_disease_terms =
stringr::str_replace_all(l1_all_disease_terms,
pattern = paste0(abnormal_total_eosinophil_count_terms, collapse = "(?=,|$)|\\b"),
"abnormal total eosinophil count"
))
pregnancy_loss_terms <- c("habitual abortion",
"incomplete abortion",
"spontaneous abortion",
"abortion",
"spontaneous loss of pregnancy",
"incomplete loss of pregnancy"
)
gwas_study_info = gwas_study_info |>
mutate(l1_all_disease_terms =
stringr::str_replace_all(l1_all_disease_terms,
paste0(pregnancy_loss_terms, collapse = "(?=,|$)|\\b"),
"loss of pregnancy"
))
# exact synonyms (to avoid partial matches):
# https://www.ebi.ac.uk/ols4/ontologies/efo/classes/http%253A%252F%252Fwww.ebi.ac.uk%252Fefo%252FEFO_1001491?lang=en
gwas_study_info = gwas_study_info |>
mutate(l1_all_disease_terms =
stringr::str_replace_all(l1_all_disease_terms,
"asparaginase-induced acute pancreatitis",
"acute pancreatitis"
))
url <- "http://www.ebi.ac.uk/ols4/api/ontologies/mondo/terms/http%253A%252F%252Fpurl.obolibrary.org%252Fobo%252FMONDO_0004907/descendants"
alopecia_terms <- get_descendants(url)
gwas_study_info = gwas_study_info |>
mutate(l1_all_disease_terms =
stringr::str_replace_all(l1_all_disease_terms,
pattern = paste0(alopecia_terms, collapse = "(?=,|$)|\\b"),
"alopecia"
))
alcohol_use_disorder_terms <- c("alcohol dependence",
"alcohol withdrawal",
"alcohol withdrawal delirium",
"alcohol abuse",
"alcohol-related disorders delirium",
"alcohol use disorder",
"addictive alcohol use"
)
gwas_study_info = gwas_study_info |>
mutate(l1_all_disease_terms =
stringr::str_replace_all(l1_all_disease_terms,
paste0(alcohol_use_disorder_terms, collapse = "(?=,|$)|\\b"),
"alcohol-related disorders"
))
alcoholic_liver_disease_terms <- c("alcoholic liver cirrhosis",
"alcoholic fatty liver disease",
"alcoholic hepatitis"
)
gwas_study_info = gwas_study_info |>
mutate(l1_all_disease_terms =
stringr::str_replace_all(l1_all_disease_terms,
paste0(alcoholic_liver_disease_terms, collapse = "(?=,|$)|\\b"),
"alcoholic liver disease"
))
gwas_study_info = gwas_study_info |>
mutate(l1_all_disease_terms =
stringr::str_replace_all(l1_all_disease_terms,
"early-onset alzheimers disease|late-onset alzheimers disease",
"alzheimers disease"
))
gwas_study_info = gwas_study_info |>
mutate(l1_all_disease_terms =
stringr::str_replace_all(l1_all_disease_terms,
"allergic contact dermatitis of eyelid",
"allergic contact dermatitis"
))
gwas_study_info = gwas_study_info |>
mutate(l1_all_disease_terms =
stringr::str_replace_all(l1_all_disease_terms,
"seasonal allergic rhinitis",
"allergic rhinitis"
))
aortic_aneurysm_terms <- c("thoracic aortic aneurysm",
"abdominal aortic aneurysm"
)
gwas_study_info = gwas_study_info |>
mutate(l1_all_disease_terms =
stringr::str_replace_all(l1_all_disease_terms,
pattern = paste0(aortic_aneurysm_terms, collapse = "(?=,|$)|\\b"),
"aortic aneurysm"
))
url <- "http://www.ebi.ac.uk/ols4/api/ontologies/mondo/terms/http%253A%252F%252Fpurl.obolibrary.org%252Fobo%252FMONDO_0015909/descendants"
aplastic_anemia_terms <- get_descendants(url)
[1] "Number of terms collected:"
[1] 62
[1] "\n Some example terms"
[1] "diamond-blackfan anemia 15 with mandibulofacial dysostosis"
[2] "diamond-blackfan anemia 14 with mandibulofacial dysostosis"
[3] "cellular phase chronic idiopathic myelofibrosis"
[4] "autosomal dominant aplasia and myelodysplasia"
[5] "pancytopenia-developmental delay syndrome"
gwas_study_info = gwas_study_info |>
mutate(l1_all_disease_terms =
stringr::str_replace_all(l1_all_disease_terms,
pattern = paste0(aplastic_anemia_terms, collapse = "(?=,|$)|\\b"),
"aplastic anemia"
))
gwas_study_info = gwas_study_info |>
mutate(l1_all_disease_terms =
stringr::str_replace_all(l1_all_disease_terms,
"corneal astigmatism",
"astigmatism"
))
gwas_study_info = gwas_study_info |>
mutate(l1_all_disease_terms =
stringr::str_replace_all(l1_all_disease_terms,
"atopic asthma|chronic obstructive asthma",
"asthma"
)) |>
mutate(l1_all_disease_terms =
stringr::str_replace_all(l1_all_disease_terms,
"childhood onset asthma|adult onset asthma|aspirin-induced asthma",
"asthma"
))
gwas_study_info = gwas_study_info |>
mutate(l1_all_disease_terms =
stringr::str_replace_all(l1_all_disease_terms,
"autoimmune pancreatitis type 1",
"autoimmune pancreatitis"
))
gwas_study_info = gwas_study_info |>
mutate(l1_all_disease_terms =
stringr::str_replace_all(l1_all_disease_terms,
"bipolar ii disorder|bipolar i disorder",
"bipolar disorder"
))
url <- "http://www.ebi.ac.uk/ols4/api/ontologies/efo/terms/http%253A%252F%252Fwww.ebi.ac.uk%252Fefo%252FEFO_0003931/descendants"
bone_fracture_terms <- get_descendants(url)
[1] "Number of terms collected:"
[1] 29
[1] "\n Some example terms"
[1] "atypical femoral fracture" "periprosthetic fractures"
[3] "upper extremity fracture" "lower extremity fracture"
[5] "multiple bone fractures"
gwas_study_info = gwas_study_info |>
mutate(l1_all_disease_terms =
stringr::str_replace_all(l1_all_disease_terms,
pattern = paste0(bone_fracture_terms, collapse = "(?=,|$)|\\b"),
"bone fracture"
))
bundle_branch_block_terms <- c("left bundle branch block",
"right bundle branch block",
"complete right bundle branch block"
)
gwas_study_info = gwas_study_info |>
mutate(l1_all_disease_terms =
stringr::str_replace_all(l1_all_disease_terms,
pattern = paste0(bundle_branch_block_terms, collapse = "(?=,|$)|\\b"),
"bundle branch block"
)
)
url <- "http://www.ebi.ac.uk/ols4/api/ontologies/hp/terms/http%253A%252F%252Fpurl.obolibrary.org%252Fobo%252FHP_0001638/descendants"
cardiomyopathy_terms <- get_descendants(url)
[1] "Number of terms collected:"
[1] 14
[1] "\n Some example terms"
[1] "right ventricular noncompaction cardiomyopathy"
[2] "left ventricular noncompaction cardiomyopathy"
[3] "biventricular noncompaction cardiomyopathy"
[4] "concentric hypertrophic cardiomyopathy"
[5] "apical hypertrophic cardiomyopathy"
gwas_study_info = gwas_study_info |>
mutate(l1_all_disease_terms =
stringr::str_replace_all(l1_all_disease_terms,
pattern = paste0(cardiomyopathy_terms, collapse = "(?=,|$)|\\b"),
"cardiomyopathy"
))
carotid_artery_disease_terms <- c("carotid artery thrombosis",
"carotid atherosclerosis"
)
gwas_study_info = gwas_study_info |>
mutate(l1_all_disease_terms =
stringr::str_replace_all(l1_all_disease_terms,
pattern = paste0(carotid_artery_disease_terms, collapse = "(?=,|$)|\\b"),
"carotid artery disease"
))
url <- "http://www.ebi.ac.uk/ols4/api/ontologies/mondo/terms/http%253A%252F%252Fpurl.obolibrary.org%252Fobo%252FMONDO_0005129/descendants"
cataract_terms <- get_descendants(url)
[1] "Number of terms collected:"
[1] 80
[1] "\n Some example terms"
[1] "autosomal recessive nonsyndromic congenital nuclear cataract"
[2] "diabetes mellitus type 2 associated cataract"
[3] "early-onset posterior subcapsular cataract"
[4] "autosomal dominant non-nuclear cataract"
[5] "kozlowski rafinski klicharska syndrome"
gwas_study_info = gwas_study_info |>
mutate(l1_all_disease_terms =
stringr::str_replace_all(l1_all_disease_terms,
pattern = paste0(cataract_terms, collapse = "(?=,|$)|\\b"),
"cataract"
))
gwas_study_info =
gwas_study_info |>
mutate(l1_all_disease_terms =
stringr::str_replace_all(l1_all_disease_terms,
pattern = "refractory celiac disease",
"celiac disease"
)
)
gwas_study_info = gwas_study_info |>
mutate(l1_all_disease_terms =
stringr::str_replace_all(l1_all_disease_terms,
"charcot-marie-tooth disease type 1a, decreased fine motor function",
"charcot-marie-tooth disease type 1a"
))
gwas_study_info = gwas_study_info |>
mutate(l1_all_disease_terms =
stringr::str_replace_all(l1_all_disease_terms,
"charcot-marie-tooth disease type 1a, decreased walking ability",
"charcot-marie-tooth disease type 1a"
))
gwas_study_info = gwas_study_info |>
mutate(l1_all_disease_terms =
stringr::str_replace_all(l1_all_disease_terms,
"charcot-marie-tooth disease type 1a, gait imbalance",
"charcot-marie-tooth disease type 1a, gait disturbance"
))
url <- "http://www.ebi.ac.uk/ols4/api/ontologies/efo/terms/http%253A%252F%252Fpurl.obolibrary.org%252Fobo%252FHP_0012532/descendants"
chronic_pain_terms <- get_descendants(url)
[1] "Number of terms collected:"
[1] 9
[1] "\n Some example terms"
[1] "chronic musculoskeletal pain" "female chronic pelvic pain"
[3] "chronic widespread pain" "multisite chronic pain"
[5] "chronic shoulder pain"
gwas_study_info = gwas_study_info |>
mutate(l1_all_disease_terms =
stringr::str_replace_all(l1_all_disease_terms,
pattern = paste0(chronic_pain_terms, collapse = "(?=,|$)|\\b"),
"chronic pain"
))
gwas_study_info = gwas_study_info |>
mutate(l1_all_disease_terms =
stringr::str_replace_all(l1_all_disease_terms,
"thiopurine immunosuppressant-induced pancreatitis|non-alcoholic pancreatitis",
"chronic pancreatitis"
))
gwas_study_info = gwas_study_info |>
mutate(l1_all_disease_terms =
stringr::str_replace_all(l1_all_disease_terms,
"chronic rhinosinusitis with nasal polyps",
"chronic rhinosinusitis"
))
gwas_study_info = gwas_study_info |>
mutate(l1_all_disease_terms =
stringr::str_replace_all(l1_all_disease_terms,
"acute cholecystitis",
"cholecystitis"
))
url <- "http://www.ebi.ac.uk/ols4/api/ontologies/efo/terms/http%253A%252F%252Fwww.ebi.ac.uk%252Fefo%252FEFO_0001645/descendants"
coronary_artery_disease_terms <- get_descendants(url)
[1] "Number of terms collected:"
[1] 16
[1] "\n Some example terms"
[1] "coronary artery disease, autosomal dominant 2"
[2] "non-obstructive coronary artery disease"
[3] "spontaneous coronary artery dissection"
[4] "postoperative ventricular dysfunction"
[5] "intermediate coronary syndrome"
gwas_study_info = gwas_study_info |>
mutate(l1_all_disease_terms =
stringr::str_replace_all(l1_all_disease_terms,
pattern = paste0(coronary_artery_disease_terms, collapse = "(?=,|$)|\\b"),
"coronary artery disease"
))
url <- "http://www.ebi.ac.uk/ols4/api/ontologies/mondo/terms/http%253A%252F%252Fpurl.obolibrary.org%252Fobo%252FMONDO_0002182/descendants"
communication_disorder_terms <- get_descendants(url)
[1] "Number of terms collected:"
[1] 20
[1] "\n Some example terms"
[1] "mixed receptive-expressive language disorder"
[2] "stuttering, familial persistent, 3"
[3] "stuttering, familial persistent, 4"
[4] "stuttering, familial persistent, 2"
[5] "stuttering, familial persistent, 1"
gwas_study_info = gwas_study_info |>
mutate(l1_all_disease_terms =
stringr::str_replace_all(l1_all_disease_terms,
pattern = paste0(communication_disorder_terms, collapse = "(?=,|$)|\\b"),
"communication disorder"
))
congestive_heart_failure_terms <- c("systolic heart failure",
"diastolic heart failure",
"cor pulmonale",
"congenital heart disease",
"chronic pulmonary heart disease"
)
gwas_study_info = gwas_study_info |>
mutate(l1_all_disease_terms =
stringr::str_replace_all(l1_all_disease_terms,
pattern = paste0(congestive_heart_failure_terms, collapse = "(?=,|$)|\\b"),
"congestive heart failure"
))
gwas_study_info = gwas_study_info |>
mutate(l1_all_disease_terms =
stringr::str_replace_all(l1_all_disease_terms,
"sporadic creutzfeld jacob disease",
"creutzfeldt jacob disease"
))
dental_caries_terms <- c("pit and fissure surface dental caries",
"smooth surface dental caries",
"primary dental caries",
"enamel caries",
"permanent dental caries"
)
gwas_study_info = gwas_study_info |>
mutate(l1_all_disease_terms =
stringr::str_replace_all(l1_all_disease_terms,
pattern = paste0(dental_caries_terms, collapse = "(?=,|$)|\\b"),
"dental caries"
)
)
url <- "http://www.ebi.ac.uk/ols4/api/ontologies/doid/terms/http%253A%252F%252Fpurl.obolibrary.org%252Fobo%252FDOID_2723/descendants"
dermatitis_terms <- get_descendants(url)
[1] "Number of terms collected:"
[1] 93
[1] "\n Some example terms"
[1] "epidermolysis bullosa with congenital localized absence of skin and deformity of nails"
[2] "diphenylmethane-4,4'-diisocyanate allergic contact dermatitis"
[3] "1-chloro-2,4-dinitrobenzene allergic contact dermatitis"
[4] "epidermolysis bullosa simplex with mottled pigmentation"
[5] "junctional epidermolysis bullosa with pyloric atresia"
gwas_study_info = gwas_study_info |>
mutate(l1_all_disease_terms =
stringr::str_replace_all(l1_all_disease_terms,
pattern = paste0(dermatitis_terms, collapse = "(?=,|$)|\\b"),
"dermatitis"
))
# see: http://www.ebi.ac.uk/efo/EFO_0000274
eczema_terms <- c("atopic eczema",
"hand eczema",
"eczematoid dermatitis", # see: http://purl.obolibrary.org/obo/HP_0000964
"recalcitrant dermatitis" # assuming same as recalcitrant atopic dermatitis http://www.ebi.ac.uk/efo/EFO_1000651
)
gwas_study_info = gwas_study_info |>
mutate(l1_all_disease_terms =
stringr::str_replace_all(l1_all_disease_terms,
pattern = paste0(eczema_terms, collapse = "(?=,|$)|\\b"),
"dermatitis"
))
gwas_study_info = gwas_study_info |>
mutate(l1_all_disease_terms =
stringr::str_replace_all(l1_all_disease_terms,
"beta-lactam allergy",
"drug allergy"
))
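diabetic_eye_terms <- c("diabetic maculopathy",
"diabetic macular edema",
"diabetic retinopathy",
"proliferative diabetic retinopathy",
"non-proliferative diabetic retinopathy",
# note: this term is also one of the cataract descendants listed above, so the
# cataract step may already have rewritten it to "cataract" before this point
"diabetes mellitus type 2 associated cataract")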
gwas_study_info =
gwas_study_info |>
mutate(l1_all_disease_terms =
stringr::str_replace_all(l1_all_disease_terms ,
pattern = paste0(diabetic_eye_terms, collapse = "(?=,|$)|\\b"),
"diabetic eye disease"
)
)
gwas_study_info = gwas_study_info |>
mutate(l1_all_disease_terms =
stringr::str_replace_all(l1_all_disease_terms,
"diabetic polyneuropathy",
"diabetic neuropathy"
))
gwas_study_info = gwas_study_info |>
mutate(l1_all_disease_terms =
stringr::str_replace_all(l1_all_disease_terms,
"iron deficiency anemia",
"deficiency anemia"
))
gwas_study_info = gwas_study_info |>
mutate(l1_all_disease_terms =
stringr::str_replace_all(l1_all_disease_terms,
"bacterial endocarditis",
"endocarditis"
))
gwas_study_info = gwas_study_info |>
mutate(l1_all_disease_terms =
stringr::str_replace_all(l1_all_disease_terms,
"peanut allergy|milk allergy|egg allergy|wheat allergic reaction",
"food allergy"
))
url <- "http://www.ebi.ac.uk/ols4/api/ontologies/efo/terms/http%253A%252F%252Fpurl.obolibrary.org%252Fobo%252FMONDO_0005041/descendants"
glaucoma_terms <- get_descendants(url)
[1] "Number of terms collected:"
[1] 19
[1] "\n Some example terms"
[1] "cyp1b1-related glaucoma with or without anterior segment dysgenesis"
[2] "glaucoma secondary to spherophakia/ectopia lentis and megalocornea"
[3] "hereditary glaucoma, primary closed-angle"
[4] "primary angle closure glaucoma"
[5] "secondary dysgenetic glaucoma"
gwas_study_info = gwas_study_info |>
mutate(l1_all_disease_terms =
stringr::str_replace_all(l1_all_disease_terms,
pattern = paste0(glaucoma_terms, collapse = "(?=,|$)|\\b"),
"glaucoma"
))
# from: http://www.ebi.ac.uk/efo/EFO_0004274
gwas_study_info = gwas_study_info |>
mutate(l1_all_disease_terms =
stringr::str_replace_all(l1_all_disease_terms,
"renal overload-type gout",
"gout"
))
heart_block_terms <- c("first degree atrioventricular block",
"second degree atrioventricular block",
"third-degree atrioventricular block",
"bundle branch block"
)
gwas_study_info = gwas_study_info |>
mutate(l1_all_disease_terms =
stringr::str_replace_all(l1_all_disease_terms,
pattern = paste0(heart_block_terms, collapse = "(?=,|$)|\\b"),
"heart block"
))
url <- "http://www.ebi.ac.uk/ols4/api/ontologies/mondo/terms/http%253A%252F%252Fpurl.obolibrary.org%252Fobo%252FMONDO_0003664/descendants"
hemolytic_anemia_terms <- get_descendants(url)
[1] "Number of terms collected:"
[1] 76
[1] "\n Some example terms"
[1] "dehydrated hereditary stomatocytosis with or without pseudohyperkalemia and/or perinatal edema"
[2] "anemia, nonspherocytic hemolytic, associated with abnormality of red cell membrane"
[3] "anemia, nonspherocytic hemolytic, possibly due to defect in porphyrin metabolism"
[4] "x-linked dyserythropoetic anemia with abnormal platelets and neutropenia"
[5] "hemolytic anemia due to erythrocyte adenosine deaminase overproduction"
gwas_study_info = gwas_study_info |>
mutate(l1_all_disease_terms =
stringr::str_replace_all(l1_all_disease_terms,
pattern = paste0(hemolytic_anemia_terms, collapse = "(?=,|$)|\\b"),
"hemolytic anemia"
))
url <- "http://www.ebi.ac.uk/ols4/api/ontologies/hp/terms/http%253A%252F%252Fpurl.obolibrary.org%252Fobo%252FHP_0004299/children"
hernia_abdominal_wall_terms <- get_descendants(url)
[1] "Number of terms collected:"
[1] 6
[1] "\n Some example terms"
[1] "incisional hernia" "umbilical hernia" "inguinal hernia"
[4] "femoral hernia" "ventral hernia"
gwas_study_info = gwas_study_info |>
mutate(l1_all_disease_terms =
stringr::str_replace_all(l1_all_disease_terms,
pattern = paste0(hernia_abdominal_wall_terms, collapse = "(?=,|$)|\\b"),
"hernia of the abdominal wall"
))
gwas_study_info =
gwas_study_info |>
mutate(l1_all_disease_terms =
stringr::str_replace_all(l1_all_disease_terms,
pattern = "herpes simplex infection",
"herpesviridae infectious disease"
)
)
url <- "http://www.ebi.ac.uk/ols4/api/ontologies/efo/terms/http%253A%252F%252Fwww.ebi.ac.uk%252Fefo%252FEFO_0000537/descendants"
hypertension_terms <- get_descendants(url)
[1] "Number of terms collected:"
[1] 46
[1] "\n Some example terms"
[1] "pulmonary veno-occlusive disease and/or pulmonary capillary haemangiomatosis"
[2] "pulmonary arterial hypertension associated with connective tissue disease"
[3] "pulmonary arterial hypertension associated with chronic hemolytic anemia"
[4] "pulmonary arterial hypertension associated with congenital heart disease"
[5] "hyperuricemia-pulmonary hypertension-renal failure-alkalosis syndrome"
hypertension_terms = c("primary hypertension",
hypertension_terms
)
gwas_study_info = gwas_study_info |>
mutate(l1_all_disease_terms =
stringr::str_replace_all(l1_all_disease_terms,
pattern = paste0(hypertension_terms, collapse = "(?=,|$)|\\b"),
"hypertension"
))
gwas_study_info = gwas_study_info |>
mutate(l1_all_disease_terms =
stringr::str_replace_all(l1_all_disease_terms,
"hiv-1 infection",
"hiv infection"
))
hyperlipidemia_terms <- c("hypercholesterolemia",
"familial hypercholesterolemia",
"familial hyperlipidemia"
)
gwas_study_info = gwas_study_info |>
mutate(l1_all_disease_terms =
stringr::str_replace_all(l1_all_disease_terms,
pattern = paste0(hyperlipidemia_terms, collapse = "(?=,|$)|\\b"),
"hyperlipidemia"
))
url <- "http://www.ebi.ac.uk/ols4/api/ontologies/snomed/terms/http%253A%252F%252Fsnomed.info%252Fid%252F36803009/descendants"
idiopathic_generalized_epilepsy_terms <- get_descendants(url)
[1] "Number of terms collected:"
[1] 4
[1] "\n Some example terms"
[1] "epilepsy with generalized tonic-clonic seizures alone (disorder)"
[2] "juvenile myoclonic epilepsy"
[3] "childhood absence epilepsy"
[4] "juvenile absence epilepsy"
[5] NA
idiopathic_generalized_epilepsy_terms = stringr::str_remove_all(
idiopathic_generalized_epilepsy_terms,
" \\(disorder\\)$"
)
gwas_study_info = gwas_study_info |>
mutate(l1_all_disease_terms =
stringr::str_replace_all(l1_all_disease_terms,
pattern = paste0(idiopathic_generalized_epilepsy_terms, collapse = "(?=,|$)|\\b"),
"idiopathic generalized epilepsy"
))
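SNOMED CT labels can carry a trailing semantic tag such as "(disorder)", which is why the chunk above strips it. If other tags turn up, a slightly more general sketch could strip only a fixed set of tags, leaving meaningful parentheses such as "influenza a (h1n1)" intact. `strip_semantic_tag()` and its default tag list are assumptions, not project code.

```r
# Hypothetical helper: remove a trailing SNOMED CT semantic tag from a fixed list
strip_semantic_tag <- function(x, tags = c("disorder", "finding", "morphologic abnormality")) {
  pattern <- paste0(" \\((", paste(tags, collapse = "|"), ")\\)$")
  sub(pattern, "", x)
}

labels <- c("epilepsy with generalized tonic-clonic seizures alone (disorder)",
            "juvenile myoclonic epilepsy",
            "influenza a (h1n1)")
stripped <- strip_semantic_tag(labels)
stripped
```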
inborn_error_metab <- c("inborn disorder of amino acid metabolism",
"inborn disorder of amino acid transport",
"inborn disorder of porphyrin metabolism",
"inborn carbohydrate metabolic disorder",
"familial lipoprotein lipase deficiency",
"lactose intolerance",
"hereditary hemochromatosis",
"alpha 1-antitrypsin deficiency",
"urea cycle disorder",
"gaucher disease",
"plasma protein metabolism disease",
"disorder of metabolite absorption and transport")
gwas_study_info = gwas_study_info |>
mutate(l1_all_disease_terms =
stringr::str_replace_all(l1_all_disease_terms,
pattern = paste0(inborn_error_metab, collapse = "(?=,|$)|\\b"),
"inborn errors of metabolism"
))
gwas_study_info = gwas_study_info |>
mutate(l1_all_disease_terms =
stringr::str_replace_all(l1_all_disease_terms,
"bacterial meningitis|pneumococcal meningitis|viral meningitis",
"infectious meningitis"
))
gwas_study_info = gwas_study_info |>
mutate(l1_all_disease_terms =
stringr::str_replace_all(l1_all_disease_terms,
"tubal factor infertility",
"female infertility"
)) |>
mutate(l1_all_disease_terms =
stringr::str_replace_all(l1_all_disease_terms,
"azoospermia",
"male infertility"
) )
gwas_study_info = gwas_study_info |>
mutate(l1_all_disease_terms =
stringr::str_replace_all(l1_all_disease_terms,
"influenza a \\(h1n1\\)",
"influenza"
))
url <- "http://www.ebi.ac.uk/ols4/api/ontologies/mondo/terms/http%253A%252F%252Fpurl.obolibrary.org%252Fobo%252FMONDO_0004565/descendants"
intestinal_obstruction_terms <- get_descendants(url)
[1] "Number of terms collected:"
[1] 21
[1] "\n Some example terms"
[1] "intestinal obstruction in the newborn due to guanylate cyclase 2c deficiency"
[2] "intestinal pseudoobstruction, neuronal, chronic idiopathic, x-linked"
[3] "visceral neuropathy, familial, 1, autosomal recessive"
[4] "visceral neuropathy, familial, 3, autosomal dominant"
[5] "cystic fibrosis associated meconium ileus"
# also add: http://purl.obolibrary.org/obo/DOID_8437
intestinal_obstruction_terms = c(intestinal_obstruction_terms,
"intestinal impaction"
)
gwas_study_info = gwas_study_info |>
mutate(l1_all_disease_terms =
stringr::str_replace_all(l1_all_disease_terms,
pattern = paste0(intestinal_obstruction_terms, collapse = "(?=,|$)|\\b"),
"intestinal obstruction"
))
gwas_study_info = gwas_study_info |>
mutate(l1_all_disease_terms =
stringr::str_replace_all(l1_all_disease_terms,
# keep the pattern on one line: a line break inside the string literal
# becomes part of the regex, so the last alternative would never match
"systemic juvenile idiopathic arthritis|oligoarticular juvenile idiopathic arthritis|polyarticular juvenile idiopathic arthritis",
"juvenile idiopathic arthritis")
)
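Why a line break inside a pattern string matters: R keeps the newline and any indentation as part of the string, so the alternative after the break only matches text that starts with those characters. A small base-R check; `broken` reproduces a pattern split across lines using an explicit `\n`, and the amount of indentation is an assumption.

```r
target <- "polyarticular juvenile idiopathic arthritis"

# Pattern split across lines: newline + indentation end up in the last alternative
broken <- paste0("systemic juvenile idiopathic arthritis|",
                 "oligoarticular juvenile idiopathic arthritis|\n    ",
                 "polyarticular juvenile idiopathic arthritis")

# Same alternatives joined with "|" and no stray whitespace
one_line <- paste(c("systemic juvenile idiopathic arthritis",
                    "oligoarticular juvenile idiopathic arthritis",
                    "polyarticular juvenile idiopathic arthritis"),
                  collapse = "|")

grepl(broken, target)    # FALSE
grepl(one_line, target)  # TRUE
```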
gwas_study_info = gwas_study_info |>
mutate(l1_all_disease_terms =
stringr::str_replace_all(l1_all_disease_terms,
"lewy body attribute",
"lewy body dementia"
))
gwas_study_info = gwas_study_info |>
mutate(l1_all_disease_terms =
stringr::str_replace_all(l1_all_disease_terms,
"plasmodium falciparum malaria|plasmodium vivax malaria",
"malaria"
))
gwas_study_info = gwas_study_info |>
mutate(l1_all_disease_terms =
stringr::str_replace_all(l1_all_disease_terms,
"methamphetamine dependence|methamphetamine-induced psychosis",
"methamphetamine use disorders"
))
gwas_study_info = gwas_study_info |>
mutate(l1_all_disease_terms =
stringr::str_replace_all(l1_all_disease_terms,
"migraine disorder|migraine with aura|migraine without aura",
"migraine"
))
gwas_study_info = gwas_study_info |>
mutate(l1_all_disease_terms =
stringr::str_replace_all(l1_all_disease_terms,
"relapsing-remitting multiple sclerosis",
"multiple sclerosis"
))
url <- "http://www.ebi.ac.uk/ols4/api/ontologies/efo/terms/http%253A%252F%252Fwww.ebi.ac.uk%252Fefo%252FEFO_0000612/descendants"
myocardial_infarction_terms <- get_descendants(url)
[1] "Number of terms collected:"
[1] 7
[1] "\n Some example terms"
[1] "subsequent st elevation (stemi) and non-st elevation (nstemi) myocardial infarction"
[2] "acute anterolateral myocardial infarction"
[3] "non-st elevation myocardial infarction"
[4] "anterolateral myocardial infarction"
[5] "st elevation myocardial infarction"
gwas_study_info = gwas_study_info |>
mutate(l1_all_disease_terms =
stringr::str_replace_all(l1_all_disease_terms,
pattern = paste0(myocardial_infarction_terms, collapse = "(?=,|$)|\\b"),
"myocardial infarction"
))
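Ontology labels are pasted into the pattern as regular expressions, so metacharacters are not matched literally. The first myocardial infarction label above contains parentheses, which the regex engine reads as groups, so that label cannot match its own text. A sketch of quoting each label with `\Q...\E`, which both PCRE (`gsub(perl = TRUE)`) and ICU (used by stringr) support; `quote_terms()` is a hypothetical helper.

```r
label <- "subsequent st elevation (stemi) and non-st elevation (nstemi) myocardial infarction"

# As a raw regex, "(stemi)" is a group matching "stemi" without the parentheses
grepl(label, label, perl = TRUE)  # FALSE

# Hypothetical helper: quote each term so every character is matched literally
quote_terms <- function(terms) paste0("\\Q", terms, "\\E")

grepl(quote_terms(label), label, perl = TRUE)  # TRUE
```

Quoting breaks only if a label itself contains the sequence `\E`, which ontology labels are unlikely to include.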
gwas_study_info = gwas_study_info |>
mutate(l1_all_disease_terms =
stringr::str_replace_all(l1_all_disease_terms,
"phototoxic dermatitis|skin sensitivity to sun",
"photosensitivity disease")
)
url <- "http://www.ebi.ac.uk/ols4/api/ontologies/efo/terms/http%253A%252F%252Fwww.ebi.ac.uk%252Fefo%252FEFO_1001908/descendants"
phobic_disorder_terms <- get_descendants(url)
[1] "Number of terms collected:"
[1] 11
[1] "\n Some example terms"
[1] "panic disorder with agoraphobia" "blood-injection-injury phobia"
[3] "social anxiety disorder" "specific phobia"
[5] "flying phobia"
gwas_study_info = gwas_study_info |>
mutate(l1_all_disease_terms =
stringr::str_replace_all(l1_all_disease_terms,
pattern = paste0(phobic_disorder_terms, collapse = "(?=,|$)|\\b"),
"phobic disorder"
))
gwas_study_info = gwas_study_info |>
mutate(l1_all_disease_terms =
stringr::str_replace_all(l1_all_disease_terms,
"hereditary hemochromatosis type 1",
"hereditary hemochromatosis"
))
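Because each step rewrites the output of the previous one, step order matters. "hereditary hemochromatosis" is listed in `inborn_error_metab`, but the step above only produces it after the inborn-errors step has run, so within the steps shown here "hereditary hemochromatosis type 1" ends up as "hereditary hemochromatosis" rather than "inborn errors of metabolism". A base-R sketch with two simplified rules; `apply_rules()` and the shortened patterns are illustrative, not the project's code.

```r
# Apply replacement rules in sequence, as the chained mutate() calls do
apply_rules <- function(x, rules) {
  for (rule in rules) x <- gsub(rule$pattern, rule$replacement, x, perl = TRUE)
  x
}

inborn_rule <- list(pattern = "\\b(?:gaucher disease|hereditary hemochromatosis)(?=,|$)",
                    replacement = "inborn errors of metabolism")
hh_type1_rule <- list(pattern = "hereditary hemochromatosis type 1",
                      replacement = "hereditary hemochromatosis")

x <- "hereditary hemochromatosis type 1"
current_order <- apply_rules(x, list(inborn_rule, hh_type1_rule))
swapped_order <- apply_rules(x, list(hh_type1_rule, inborn_rule))
current_order  # "hereditary hemochromatosis"
swapped_order  # "inborn errors of metabolism"
```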
gwas_study_info = gwas_study_info |>
mutate(l1_all_disease_terms =
stringr::str_replace_all(l1_all_disease_terms,
"late-onset myasthenia gravis",
"myasthenia gravis"
))
gwas_study_info = gwas_study_info |>
mutate(l1_all_disease_terms =
stringr::str_replace_all(l1_all_disease_terms,
"aqp4-igg-negative neuromyelitis optica|aqp4-igg-positive neuromyelitis optica",
"neuromyelitis optica"
))
gwas_study_info = gwas_study_info |>
mutate(l1_all_disease_terms =
stringr::str_replace_all(l1_all_disease_terms,
"neurofibromatosis type 1|neurofibromatosis type 2",
"neurofibromatosis"
))
gwas_study_info = gwas_study_info |>
mutate(l1_all_disease_terms =
stringr::str_replace_all(l1_all_disease_terms,
"non-alcoholic steatohepatitis|non-alcoholic liver disease",
"non-alcoholic fatty liver disease"
))
gwas_study_info =
gwas_study_info |>
mutate(l1_all_disease_terms =
stringr::str_replace_all(l1_all_disease_terms,
pattern = "morbid obesity|metabolically healthy obesity",
"obesity"
)
)
gwas_study_info = gwas_study_info |>
mutate(l1_all_disease_terms =
stringr::str_replace_all(l1_all_disease_terms,
"obsessive-compulsive trait",
"obsessive-compulsive disorder"
))
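Each one-to-one replacement above is a separate `mutate()` step, and the steps run in order. You can also keep the same sequence as a single lookup table. `stringr::str_replace_all()` accepts a named vector `c(pattern = replacement)` for exactly this purpose. The base R sketch below makes the in-order application explicit with `Reduce()`. The names `l1_map` and `apply_l1_map()` are hypothetical, and the map contains only a few of the replacements shown above.

```r
# Hypothetical lookup table: regex pattern -> collapsed L1 label.
# Pairs are applied in order, each one to the output of the previous one.
l1_map <- c(
  "hereditary hemochromatosis type 1" = "hereditary hemochromatosis",
  "late-onset myasthenia gravis" = "myasthenia gravis",
  "neurofibromatosis type 1|neurofibromatosis type 2" = "neurofibromatosis",
  "morbid obesity|metabolically healthy obesity" = "obesity"
)

# Apply every pattern in turn, threading the result through Reduce().
apply_l1_map <- function(x, map) {
  Reduce(function(acc, pat) gsub(pat, map[[pat]], acc, perl = TRUE),
         names(map), init = x)
}

x <- c("morbid obesity, neurofibromatosis type 2",
       "late-onset myasthenia gravis")
apply_l1_map(x, l1_map)
```

A single mapping vector is easier to review. Order still matters whenever one replacement's output could match a later pattern.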
osteoarthritis_terms <- c("knee osteoarthritis",
"hip osteoarthritis",
"hand osteoarthritis",
"spine osteoarthritis",
"toe osteoarthritis"
)
gwas_study_info = gwas_study_info |>
mutate(l1_all_disease_terms =
stringr::str_replace_all(l1_all_disease_terms,
pattern = paste0(osteoarthritis_terms, collapse = "(?=,|$)|\\b"),
"osteoarthritis"
)
)
gwas_study_info = gwas_study_info |>
mutate(l1_all_disease_terms =
stringr::str_replace_all(l1_all_disease_terms,
"idiopathic osteonecrosis of the femoral head",
"osteonecrosis"
))
url <- "http://www.ebi.ac.uk/ols4/api/ontologies/efo/terms/http%253A%252F%252Fwww.ebi.ac.uk%252Fefo%252FEFO_0000278/descendants"
pancreatitis_terms <- get_descendants(url)
[1] "Number of terms collected:"
[1] 9
[1] "\n Some example terms"
[1] "thiopurine immunosuppressant-induced pancreatitis"
[2] "asparaginase-induced acute pancreatitis"
[3] "hereditary chronic pancreatitis"
[4] "autoimmune pancreatitis type 1"
[5] "non-alcoholic pancreatitis"
gwas_study_info = gwas_study_info |>
mutate(l1_all_disease_terms =
stringr::str_replace_all(l1_all_disease_terms,
pattern = paste0(pancreatitis_terms, collapse = "(?=,|$)|\\b"),
"pancreatitis"
))
pericardium_disorder_terms <- c("pericarditis",
"pericardial effusion"
)
url <- "http://www.ebi.ac.uk/ols4/api/ontologies/doid/terms/http%253A%252F%252Fpurl.obolibrary.org%252Fobo%252FDOID_3388/descendants"
periodontal_terms <- get_descendants(url)
[1] "Number of terms collected:"
[1] 19
[1] "\n Some example terms"
[1] "suppurative periapical periodontitis"
[2] "necrotizing ulcerative gingivitis"
[3] "chronic apical periodontitis"
[4] "acute apical periodontitis"
[5] "periapical periodontitis"
gwas_study_info = gwas_study_info |>
mutate(l1_all_disease_terms =
stringr::str_replace_all(l1_all_disease_terms,
pattern = paste0(periodontal_terms, collapse = "(?=,|$)|\\b"),
"periodontal disease"
))
gwas_study_info = gwas_study_info |>
mutate(l1_all_disease_terms =
stringr::str_replace_all(l1_all_disease_terms,
"carbon monoxide poisoning",
"poisoning"
))
url <- "http://www.ebi.ac.uk/ols4/api/ontologies/efo/terms/http%253A%252F%252Fwww.ebi.ac.uk%252Fefo%252FEFO_0005804/descendants"
polycythemia_terms <- get_descendants(url)
[1] "Number of terms collected:"
[1] 16
[1] "\n Some example terms"
[1] "autosomal recessive secondary polycythemia not associated with vhl gene"
[2] "primary familial polycythemia due to epo receptor mutation"
[3] "autosomal dominant secondary polycythemia"
[4] "congenital secondary polycythemia"
[5] "acquired secondary polycythemia"
gwas_study_info = gwas_study_info |>
mutate(l1_all_disease_terms =
stringr::str_replace_all(l1_all_disease_terms,
pattern = paste0(polycythemia_terms, collapse = "(?=,|$)|\\b"),
"polycythemia"
))
gwas_study_info = gwas_study_info |>
mutate(l1_all_disease_terms =
stringr::str_replace_all(l1_all_disease_terms,
"pulmonary embolism, pulmonary infarction",
"pulmonary embolism"
))
gwas_study_info = gwas_study_info |>
mutate(l1_all_disease_terms =
stringr::str_replace_all(l1_all_disease_terms,
"idiopathic pulmonary fibrosis",
"pulmonary fibrosis"
))
psoriasis_terms <- c("psoriasis vulgaris",
"psoriasis area and severity index",
"cutaneous psoriasis",
"psoriatic arthritis",
"parapsoriasis")
gwas_study_info = gwas_study_info |>
mutate(l1_all_disease_terms =
stringr::str_replace_all(l1_all_disease_terms,
pattern = paste0(psoriasis_terms, collapse = "(?=,|$)|\\b"),
"psoriasis")
)
gwas_study_info = gwas_study_info |>
mutate(l1_all_disease_terms =
stringr::str_replace_all(l1_all_disease_terms,
"psychosis predisposition",
"psychosis")
)
ra_terms <- c("acpa-positive rheumatoid arthritis",
"acpa-negative rheumatoid arthritis",
"adult-onset stills disease")
gwas_study_info =
gwas_study_info |>
mutate(l1_all_disease_terms =
stringr::str_replace_all(l1_all_disease_terms,
pattern = paste0(ra_terms, collapse = "(?=,|$)|\\b"),
"rheumatoid arthritis"
)
)
gwas_study_info = gwas_study_info |>
mutate(l1_all_disease_terms =
stringr::str_replace_all(l1_all_disease_terms,
"schizoaffective disorder-bipolar type",
"schizoaffective disorder")
)
gwas_study_info = gwas_study_info |>
mutate(l1_all_disease_terms =
stringr::str_replace_all(l1_all_disease_terms,
"treatment refractory schizophrenia",
"schizophrenia")
)
gwas_study_info = gwas_study_info |>
mutate(l1_all_disease_terms =
stringr::str_remove_all(l1_all_disease_terms,
"^ldh-related sciatica, |, ldh-related sciatica$"
)
)
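The `str_remove_all()` call above drops `ldh-related sciatica` only when it is the first or last of several comma-separated terms. Because of the `^` and `$` anchors, a study whose sole term is `ldh-related sciatica` is left unchanged. An occurrence in the middle of a list is also left unchanged. Here is a base R sketch with toy inputs:

```r
pattern <- "^ldh-related sciatica, |, ldh-related sciatica$"

# Toy labels (illustrative only).
x <- c("ldh-related sciatica, other disease",          # first of several
       "other disease, ldh-related sciatica",          # last of several
       "ldh-related sciatica",                         # sole term: kept
       "disease a, ldh-related sciatica, disease b")   # middle: kept

out <- gsub(pattern, "", x)
out
```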
gwas_study_info = gwas_study_info |>
mutate(l1_all_disease_terms =
stringr::str_replace_all(l1_all_disease_terms,
"adolescent idiopathic scoliosis",
"scoliosis")
)
gwas_study_info = gwas_study_info |>
mutate(l1_all_disease_terms =
stringr::str_replace_all(l1_all_disease_terms,
"sickle cell anemia",
"sickle cell disease and related diseases")
)
sleep_apnea_terms <- c("sleep apnea measurement during non-rem sleep",
"sleep apnea measurement during rem sleep",
"obstructive sleep apnea")
gwas_study_info = gwas_study_info |>
mutate(l1_all_disease_terms =
stringr::str_replace_all(l1_all_disease_terms,
pattern = paste0(sleep_apnea_terms, collapse = "(?=,|$)|\\b"),
"sleep apnea")
)
gwas_study_info = gwas_study_info |>
mutate(l1_all_disease_terms =
stringr::str_replace_all(l1_all_disease_terms,
"methicillin-resistant staphylococcus aureus infection",
"staphylococcus aureus infection"
))
hemorrhage_terms <- c("intracerebral hemorrhage",
"non-lobar intracerebral hemorrhage",
"lobar intracerebral hemorrhage")
gwas_study_info = gwas_study_info |>
mutate(l1_all_disease_terms =
stringr::str_replace_all(l1_all_disease_terms,
pattern = paste0(hemorrhage_terms, collapse = "(?=,|$)|\\b"),
"hemorrhagic stroke")
)
stroke_terms <- c("stroke outcome",
"large artery stroke",
"small vessel stroke",
"ischemic stroke",
"stroke disorder",
"cardioembolic stroke",
"hemorrhagic stroke",
"intracranial hemorrhage",
"subdural hemorrhage"
)
gwas_study_info = gwas_study_info |>
mutate(l1_all_disease_terms =
stringr::str_replace_all(l1_all_disease_terms,
paste0(stroke_terms, collapse = "(?=,|$)|\\b"),
"stroke")
)
gwas_study_info = gwas_study_info |>
mutate(l1_all_disease_terms =
stringr::str_replace_all(l1_all_disease_terms,
"post-operative stroke",
"post-operative, stroke")
)
tachycardia_terms <- c("paroxysmal tachycardia",
"ventricular tachycardia",
"supraventricular tachycardia",
"paroxysmal supraventricular tachycardia",
"paroxysmal ventricular tachycardia"
)
gwas_study_info = gwas_study_info |>
mutate(l1_all_disease_terms =
stringr::str_replace_all(l1_all_disease_terms,
pattern = paste0(tachycardia_terms, collapse = "(?=,|$)|\\b"),
"tachycardia"
)
)
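Several term lists above overlap. For example, `ventricular tachycardia` is a suffix of `paroxysmal ventricular tachycardia`. With the collapsed alternation, the regex engine tries every alternative at a given start position before it moves right. As a result, a longer listed term still wins when it starts earlier in the string. A term that is *not* listed, however, can leave a prefix behind.

The sketch below uses the same term list and pattern construction, with base R `gsub(perl = TRUE)`, toy inputs and the label `tachycardia`.

```r
tachycardia_terms <- c("paroxysmal tachycardia",
                       "ventricular tachycardia",
                       "supraventricular tachycardia",
                       "paroxysmal supraventricular tachycardia",
                       "paroxysmal ventricular tachycardia")
pattern <- paste0(tachycardia_terms, collapse = "(?=,|$)|\\b")

# Toy labels (illustrative only).
x <- c("paroxysmal ventricular tachycardia",     # longer listed term matches at position 0
       "supraventricular tachycardia, asthma",   # matched as a whole item
       "non-sustained ventricular tachycardia")  # unlisted: only the suffix matches

out <- gsub(pattern, "tachycardia", x, perl = TRUE)
out
```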
url <- "http://www.ebi.ac.uk/ols4/api/ontologies/efo/terms/http%253A%252F%252Fwww.ebi.ac.uk%252Fefo%252FEFO_0000717/descendants"
scleroderma_terms <- get_descendants(url)
[1] "Number of terms collected:"
[1] 5
[1] "\n Some example terms"
[1] "anti-topoisomerase-i-antibody-positive systemic scleroderma"
[2] "anti-centromere-antibody-positive systemic scleroderma"
[3] "limited cutaneous systemic sclerosis"
[4] "limited scleroderma"
[5] "diffuse scleroderma"
gwas_study_info = gwas_study_info |>
mutate(l1_all_disease_terms =
stringr::str_replace_all(l1_all_disease_terms,
pattern = paste0(scleroderma_terms, collapse = "(?=,|$)|\\b"),
"systemic scleroderma"
))
gwas_study_info = gwas_study_info |>
mutate(l1_all_disease_terms =
stringr::str_replace_all(l1_all_disease_terms,
"stenosing tenosynovitis",
"tenosynovitis")
)
tb_terms <- c("mycobacterium tuberculosis infection",
"\\bpulmonary tuberculosis",
"extrapulmonary tuberculosis",
"meningeal tuberculosis")
gwas_study_info = gwas_study_info |>
mutate(l1_all_disease_terms =
stringr::str_replace_all(l1_all_disease_terms,
pattern = paste0(tb_terms, collapse = "(?=,|$)|\\b"),
"tuberculosis"
))
gwas_study_info = gwas_study_info |>
mutate(l1_all_disease_terms =
stringr::str_replace_all(l1_all_disease_terms,
"latent autoimmune diabetes in adults",
"type 1 diabetes mellitus"
))
thrombocytopenia_terms <- c("thrombocytopenia 4",
"acquired thrombocytopenia",
"primary thrombocytopenia"
)
tinea_terms <- c("tinea pedis",
"tinea unguium",
"dermatophytosis"
)
gwas_study_info = gwas_study_info |>
mutate(l1_all_disease_terms =
stringr::str_replace_all(l1_all_disease_terms,
pattern = paste0(tinea_terms, collapse = "(?=,|$)|\\b"),
"tinea"
))
gwas_study_info = gwas_study_info |>
mutate(l1_all_disease_terms =
stringr::str_replace_all(l1_all_disease_terms,
"vitamin b12 deficiency",
"vitamin b deficiency"
))
url <- "http://www.ebi.ac.uk/ols4/api/ontologies/efo/terms/http%253A%252F%252Fpurl.obolibrary.org%252Fobo%252FHP_0000505/descendants"
visual_impairment_terms <- get_descendants(url)
[1] "Number of terms collected:"
[1] 12
[1] "\n Some example terms"
[1] "constriction of peripheral visual field"
[2] "severely reduced visual acuity"
[3] "slow decrease in visual acuity"
[4] "peripheral visual field loss"
[5] "cerebral visual impairment"
gwas_study_info = gwas_study_info |>
mutate(l1_all_disease_terms =
stringr::str_replace_all(l1_all_disease_terms,
pattern = paste0(visual_impairment_terms, collapse = "(?=,|$)|\\b"),
"visual impairment"
))
url <- "http://www.ebi.ac.uk/ols4/api/ontologies/upheno/terms/http%253A%252F%252Fpurl.obolibrary.org%252Fobo%252FHP_0000539/descendants"
abnormal_refraction_terms <- get_descendants(url)
[1] "Number of terms collected:"
[1] 22
[1] "\n Some example terms"
[1] "against the rule astigmatism" "with the rule astigmatism"
[3] "moderate hypermetropia" "lenticular astigmatism"
[5] "irregular astigmatism"
gwas_study_info = gwas_study_info |>
mutate(l1_all_disease_terms =
stringr::str_replace_all(l1_all_disease_terms,
pattern = paste0(abnormal_refraction_terms, collapse = "(?=,|$)|\\b"),
"abnormality of refraction"
))
gwas_study_info =
gwas_study_info |>
rowwise() |>
mutate(l1_all_disease_terms = paste0(sort(unique(unlist(strsplit(l1_all_disease_terms, ", ")))),
collapse = ", ")
) |>
ungroup()
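The `rowwise()` step above does four things to each label string: it splits the string on `", "`, drops duplicate terms, sorts the terms and pastes them back together. As a result, a label such as `post-operative, stroke` becomes two separate terms. Repeated collapses, such as two stroke subtypes that were both mapped to `stroke`, are kept only once.

The vectorised base R sketch below performs the same transformation without per-row grouping. The helper name `dedupe_sort_terms()` is hypothetical.

```r
# Hypothetical vectorised helper: split each comma-separated label,
# drop duplicate terms, sort, and paste back together.
dedupe_sort_terms <- function(x) {
  vapply(strsplit(x, ", ", fixed = TRUE),
         function(terms) paste0(sort(unique(terms)), collapse = ", "),
         character(1), USE.NAMES = FALSE)
}

# Toy labels (illustrative only).
x <- c("post-operative, stroke, stroke",
       "tachycardia, asthma, tachycardia",
       "")
dedupe_sort_terms(x)
```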
# publications (distinct PubMed IDs) per combined L1 label, disease studies only
n_studies_trait = gwas_study_info |>
dplyr::filter(DISEASE_STUDY == TRUE) |>
dplyr::select(l1_all_disease_terms, PUBMED_ID) |>
dplyr::distinct() |>
dplyr::group_by(l1_all_disease_terms) |>
dplyr::summarise(n_studies = dplyr::n()) |>
dplyr::arrange(dplyr::desc(n_studies))
head(n_studies_trait)
# A tibble: 6 × 2
l1_all_disease_terms n_studies
<chr> <int>
1 type 2 diabetes mellitus 145
2 asthma 134
3 alzheimers disease 124
4 breast cancer 112
5 major depressive disorder 108
6 schizophrenia 108
dim(n_studies_trait)
[1] 2720 2
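`n_studies_trait` counts publications, not GWAS Catalog rows. After filtering to disease studies, `distinct()` on the label and `PUBMED_ID` keeps one row per (label, publication) pair. A paper that reports several studies with the same label therefore contributes only once. Each label is also the full comma-separated combination, not an individual disease. Here is a base R sketch of the same counting on toy data:

```r
# Toy input (illustrative values only).
studies <- data.frame(
  l1_all_disease_terms = c("asthma", "asthma", "asthma", "stroke"),
  PUBMED_ID = c(101, 101, 202, 303),
  DISEASE_STUDY = c(TRUE, TRUE, TRUE, FALSE)
)
# Keep disease studies, then one row per (label, publication) pair.
pairs <- unique(studies[studies$DISEASE_STUDY,
                        c("l1_all_disease_terms", "PUBMED_ID")])
# Number of distinct publications per label, largest first.
n_studies <- sort(table(pairs$l1_all_disease_terms), decreasing = TRUE)
n_studies
```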
diseases <- stringr::str_split(
gwas_study_info$l1_all_disease_terms[gwas_study_info$l1_all_disease_terms != ""],
pattern = ", ") |>
unlist() |>
stringr::str_trim()
test <- data.frame(trait = unique(diseases))
length(unique(diseases))
[1] 1907
# count how often each individual disease term appears across all rows
freq <- table(as.factor(diseases))
# sort in decreasing order
freq_sorted <- sort(freq, decreasing = TRUE)
# show the 10 most frequent terms
head(freq_sorted, 10)
chronic kidney disease hypertension type 2 diabetes mellitus
10835 7089 922
coronary artery disease major depressive disorder alzheimers disease
501 471 422
schizophrenia asthma covid-19
368 348 305
stroke
303
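The frequency table above is computed differently from `n_studies_trait`. `diseases` splits every non-empty label in `gwas_study_info` into individual terms. It applies no `DISEASE_STUDY` filter and no de-duplication by `PUBMED_ID`, so each count is the number of rows that mention a term. The two sets of numbers are therefore not directly comparable. Here is a toy base R sketch that contrasts per-row and per-publication counts:

```r
# Toy input (illustrative values only); "" marks a row with no L1 label.
rows <- data.frame(
  l1_all_disease_terms = c("chronic kidney disease, hypertension",
                           "chronic kidney disease",
                           "chronic kidney disease",
                           ""),
  PUBMED_ID = c(10, 20, 20, 30)
)
keep <- rows$l1_all_disease_terms != ""
split_terms <- strsplit(rows$l1_all_disease_terms[keep], ", ", fixed = TRUE)
# One row per (term, publication) occurrence.
long <- data.frame(term = unlist(split_terms),
                   PUBMED_ID = rep(rows$PUBMED_ID[keep], lengths(split_terms)))
# As in the frequency table above: every row counts.
per_row <- sort(table(long$term), decreasing = TRUE)
# Alternative: each publication counts once per term.
per_pub <- sort(table(unique(long)$term), decreasing = TRUE)
per_row
per_pub
```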
fwrite(gwas_study_info,
here::here("output/gwas_cat/gwas_study_info_group_l1.csv")
)
sessionInfo()
R version 4.3.1 (2023-06-16)
Platform: aarch64-apple-darwin20 (64-bit)
Running under: macOS 15.6.1
Matrix products: default
BLAS: /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib/libRlapack.dylib; LAPACK version 3.11.0
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
time zone: America/Los_Angeles
tzcode source: internal
attached base packages:
[1] stats graphics grDevices datasets utils methods base
other attached packages:
[1] jsonlite_2.0.0 httr_1.4.7 stringr_1.5.1 data.table_1.17.8
[5] dplyr_1.1.4 workflowr_1.7.1
loaded via a namespace (and not attached):
[1] compiler_4.3.1 renv_1.0.3 promises_1.3.3 tidyselect_1.2.1
[5] Rcpp_1.1.0 git2r_0.36.2 callr_3.7.6 later_1.4.2
[9] jquerylib_0.1.4 yaml_2.3.10 fastmap_1.2.0 here_1.0.1
[13] R6_2.6.1 generics_0.1.4 curl_6.4.0 knitr_1.50
[17] tibble_3.3.0 rprojroot_2.1.0 bslib_0.9.0 pillar_1.11.0
[21] rlang_1.1.6 utf8_1.2.6 cachem_1.1.0 stringi_1.8.7
[25] httpuv_1.6.16 xfun_0.52 getPass_0.2-4 fs_1.6.6
[29] sass_0.4.10 cli_3.6.5 withr_3.0.2 magrittr_2.0.3
[33] ps_1.9.1 digest_0.6.37 processx_3.8.6 rstudioapi_0.17.1
[37] lifecycle_1.0.4 vctrs_0.6.5 evaluate_1.0.4 glue_1.8.0
[41] whisker_0.4.1 rmarkdown_2.29 tools_4.3.1 pkgconfig_2.0.3
[45] htmltools_0.5.8.1