Last updated: 2025-12-29
Checks: 7 passed, 0 failed
Knit directory: genomics_ancest_disease_dispar/
This reproducible R Markdown analysis was created with workflowr (version 1.7.1). The Checks tab describes the reproducibility checks that were applied when the results were created. The Past versions tab lists the development history.
Great! Since the R Markdown file has been committed to the Git repository, you know the exact version of the code that produced these results.
Great job! The global environment was empty. Objects defined in the global environment can affect the analysis in your R Markdown file in unknown ways. For reproducibility it’s best to always run the code in an empty environment.
The command set.seed(20220216) was run prior to running
the code in the R Markdown file. Setting a seed ensures that any results
that rely on randomness, e.g. subsampling or permutations, are
reproducible.
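As a small illustration of why the seed matters, the sketch below resets the same seed before two random draws and confirms that they match. The draws themselves (`sample(100, 5)`) are illustrative and are not part of this analysis.

```r
# Resetting the seed reproduces the same pseudo-random draw.
set.seed(20220216)
first_draw <- sample(100, 5)

set.seed(20220216)
second_draw <- sample(100, 5)

identical(first_draw, second_draw)  # TRUE
```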
Great job! Recording the operating system, R version, and package versions is critical for reproducibility.
Nice! There were no cached chunks for this analysis, so you can be confident that you successfully produced the results during this run.
Great job! Using relative paths to the files within your workflowr project makes it easier to run your code on other machines.
Great! You are using Git for version control. Tracking code development and connecting the code version to the results is critical for reproducibility.
The results in this page were generated with repository version caf759a. See the Past versions tab to see a history of the changes made to the R Markdown and HTML files.
Note that you need to be careful to ensure that all relevant files for
the analysis have been committed to Git prior to generating the results
(you can use wflow_publish or
wflow_git_commit). workflowr only checks the R Markdown
file, but you know if there are other scripts or data files that it
depends on. Below is the status of the Git repository when the results
were generated:
Ignored files:
Ignored: .DS_Store
Ignored: .Rproj.user/
Ignored: .venv/
Ignored: analysis/.DS_Store
Ignored: ancestry_dispar_env/
Ignored: data/.DS_Store
Ignored: data/cdc/
Ignored: data/cohort/
Ignored: data/gbd/.DS_Store
Ignored: data/gbd/IHME-GBD_2021_DATA-d8cf695e-1.csv
Ignored: data/gbd/IHME-GBD_2023_DATA-73cc01fd-1.csv
Ignored: data/gbd/ihme_gbd_2019_global_disease_burden_rate_all_ages.csv
Ignored: data/gbd/ihme_gbd_2019_global_paf_rate_percent_all_ages.csv
Ignored: data/gbd/ihme_gbd_2021_global_disease_burden_rate_all_ages.csv
Ignored: data/gbd/ihme_gbd_2021_global_paf_rate_percent_all_ages.csv
Ignored: data/gwas_catalog/
Ignored: data/icd/.DS_Store
Ignored: data/icd/2025AA/
Ignored: data/icd/IHME_GBD_2019_COD_CAUSE_ICD_CODE_MAP_Y2020M10D15.XLSX
Ignored: data/icd/IHME_GBD_2019_NONFATAL_CAUSE_ICD_CODE_MAP_Y2020M10D15.XLSX
Ignored: data/icd/IHME_GBD_2021_COD_CAUSE_ICD_CODE_MAP_Y2024M05D16.XLSX
Ignored: data/icd/IHME_GBD_2021_NONFATAL_CAUSE_ICD_CODE_MAP_Y2024M05D16.XLSX
Ignored: data/icd/UK_Biobank_master_file.tsv
Ignored: data/icd/cdc_valid_icd10_Sep_23_2025.xlsx
Ignored: data/icd/cdc_valid_icd9_Sep_23_2025.xlsx
Ignored: data/icd/hp_umls_mapping.csv
Ignored: data/icd/lancet_conditions_icd10.xlsx
Ignored: data/icd/manual_disease_icd10_mappings.xlsx
Ignored: data/icd/mondo_umls_mapping.csv
Ignored: data/icd/phecode_international_version_unrolled.csv
Ignored: data/icd/phecode_to_icd10_manual_mapping.xlsx
Ignored: data/icd/semiautomatic_ICD-pheno.txt
Ignored: data/icd/semiautomatic_ICD-pheno_UKB_subset.txt
Ignored: data/icd/umls-2025AA-mrconso.zip
Ignored: figures/
Ignored: human_dictionary/
Ignored: igsr_populations.tsv
Ignored: output/.DS_Store
Ignored: output/abstracts/
Ignored: output/doccano/
Ignored: output/fulltexts/
Ignored: output/gwas_cat/
Ignored: output/gwas_cohorts/
Ignored: output/icd_map/
Ignored: output/trait_ontology/
Ignored: pubmedbert-cohort-ner-model/
Ignored: pubmedbert-cohort-ner/
Ignored: r-spacyr/
Ignored: renv/
Ignored: venv/
Ignored: visualization.Rdata
Unstaged changes:
Modified: .gitignore
Modified: analysis/disease_inves_by_ancest.Rmd
Modified: analysis/get_full_text.Rmd
Modified: analysis/gwas_to_gbd.Rmd
Modified: analysis/index.Rmd
Modified: analysis/missing_cohort_info.Rmd
Modified: analysis/replication_ancestry_bias.Rmd
Modified: analysis/text_for_cohort_labels.Rmd
Note that any generated files, e.g. HTML, png, CSS, etc., are not included in this status report because it is ok for generated content to have uncommitted changes.
These are the previous versions of the repository in which changes were
made to the R Markdown (analysis/level_2_disease_group.Rmd)
and HTML (docs/level_2_disease_group.html) files. If you’ve
configured a remote Git repository (see ?wflow_git_remote),
click on the hyperlinks in the table below to view the files as they
were in that past version.
| File | Version | Author | Date | Message |
|---|---|---|---|---|
| Rmd | caf759a | IJbeasley | 2025-12-29 | Archiving old GWAS trait conversion |
| html | c5ee886 | IJbeasley | 2025-09-24 | Build site. |
| Rmd | fde6f07 | IJbeasley | 2025-09-24 | Even more grouping |
| html | fd33f6d | IJbeasley | 2025-09-24 | Build site. |
| Rmd | fee6e4e | IJbeasley | 2025-09-24 | Yay better way to group with icd codes |
| html | 9ff91c0 | IJbeasley | 2025-09-24 | Build site. |
| Rmd | a61120f | IJbeasley | 2025-09-24 | More fixing diseases |
| html | 8f83a4a | IJbeasley | 2025-09-23 | Build site. |
| Rmd | 54d06f8 | IJbeasley | 2025-09-23 | More icd codes |
| html | bed5e82 | IJbeasley | 2025-09-23 | Build site. |
| Rmd | fbee909 | IJbeasley | 2025-09-23 | Using icd codes to help grouping |
| html | 9046c0d | IJbeasley | 2025-09-23 | Build site. |
| Rmd | dd87c61 | IJbeasley | 2025-09-23 | Using icd codes to help grouping |
| html | 904bb1d | IJbeasley | 2025-09-22 | Build site. |
| Rmd | bbcc167 | IJbeasley | 2025-09-22 | Even more typo etc. |
| html | 75debe4 | IJbeasley | 2025-09-22 | Build site. |
| Rmd | b3e3287 | IJbeasley | 2025-09-22 | …maybe fixing typos |
| html | b8ee7f0 | IJbeasley | 2025-09-22 | Build site. |
| Rmd | 3305f6a | IJbeasley | 2025-09-22 | …maybe fixing typos |
| html | 200442f | IJbeasley | 2025-09-17 | Build site. |
| Rmd | 614204e | IJbeasley | 2025-09-17 | More fixing up of disease grouping |
| html | 7b87d93 | IJbeasley | 2025-09-17 | Build site. |
| Rmd | da5c4b4 | IJbeasley | 2025-09-17 | More correction to cardiovascular disease terms |
| html | 08b0db3 | IJbeasley | 2025-09-17 | Build site. |
| Rmd | bb8ae95 | IJbeasley | 2025-09-17 | Better grouping of cardiovascular disease |
| html | 7cf2803 | IJbeasley | 2025-09-17 | Build site. |
| Rmd | 39262a4 | IJbeasley | 2025-09-17 | More typo fixing |
| html | c0cf9bd | IJbeasley | 2025-09-16 | Build site. |
| Rmd | 3519a0b | IJbeasley | 2025-09-16 | Collapsing traits to gbd |
| html | f1b18b0 | IJbeasley | 2025-09-16 | Build site. |
| Rmd | afe44b4 | IJbeasley | 2025-09-16 | Collapsing traits to gbd |
| html | c204ac4 | IJbeasley | 2025-09-16 | Build site. |
| Rmd | 7fa03f5 | IJbeasley | 2025-09-16 | More cancer typos |
| html | 8f1639b | IJbeasley | 2025-09-16 | Build site. |
| Rmd | 345ad9b | IJbeasley | 2025-09-16 | More cancer typos |
| html | a15dd40 | IJbeasley | 2025-09-16 | Build site. |
| Rmd | 16ead66 | IJbeasley | 2025-09-16 | Correcting some cancer grouping |
| html | 6018e42 | IJbeasley | 2025-09-16 | Build site. |
| Rmd | 02a0b9d | IJbeasley | 2025-09-16 | Improving cancer grouping |
| html | 6f66696 | IJbeasley | 2025-09-16 | Build site. |
| Rmd | 66cff1c | IJbeasley | 2025-09-16 | Even more disease term grouping |
| html | 21b6c02 | IJbeasley | 2025-09-15 | Build site. |
| html | 5ec3111 | IJbeasley | 2025-09-15 | Build site. |
| html | 30d773e | IJbeasley | 2025-09-15 | Build site. |
| html | 8d64a38 | IJbeasley | 2025-09-15 | Build site. |
| Rmd | b3088d8 | IJbeasley | 2025-09-15 | workflowr::wflow_publish("analysis/level_2_disease_group.Rmd") |
| html | b89d661 | IJbeasley | 2025-09-10 | Build site. |
| Rmd | c0fcab7 | IJbeasley | 2025-09-10 | workflowr::wflow_publish("analysis/level_2_disease_group.Rmd") |
| html | ead4d8e | IJbeasley | 2025-09-10 | Build site. |
| Rmd | 3964f77 | IJbeasley | 2025-09-10 | workflowr::wflow_publish("analysis/level_2_disease_group.Rmd") |
| html | 8fb639d | IJbeasley | 2025-09-10 | Build site. |
| Rmd | edeb6f5 | IJbeasley | 2025-09-10 | workflowr::wflow_publish("analysis/level_2_disease_group.Rmd") |
| html | fe91704 | IJbeasley | 2025-09-09 | Build site. |
| Rmd | 9c64867 | IJbeasley | 2025-09-09 | Minor fixing of disease trait categorisation |
| html | fa509c0 | IJbeasley | 2025-09-08 | Build site. |
| Rmd | c9602c7 | IJbeasley | 2025-09-08 | More grouping to match GBD |
library(dplyr)
library(data.table)
library(ggplot2)
library(stringr)
source(here::here("code/get_term_descendants.R"))
gwas_study_info <- fread(here::here("output/gwas_cat/gwas_study_info_group_l1_v2.csv"))
gwas_study_info = gwas_study_info |>
mutate(l2_all_disease_terms = l1_all_disease_terms)
maternal_disorder_terms <- c("galactorrhea", #other direct maternal disorders
"eclampsia", # maternal hypertensive disorders
"hyperemesis gravidarum", #other direct maternal disorders
"vomiting of pregnancy", #other direct maternal disorders
"intrahepatic cholestasis of pregnancy", #other direct maternal disorders (? indirect maternal deaths)
"loss of pregnancy", # maternal abortion and miscarriage
"ectopic pregnancy", # ectopic pregnancy
"post term pregnancy", # other direct maternal disorders
"pregnancy disorder",
"obstructed labor", # Maternal obstructed labor and uterine rupture
"early pregnancy hemorrhage", # maternal hemorrhage
"preterm premature rupture of the membranes", # Other direct maternal disorders
"stillbirth", # maternal abortion and miscarriage
"miscarriage", # maternal abortion and miscarriage
"abnormal delivery",
"abruptio placentae", # maternal hemorrhage
"failed induction",
"gestational diabetes", # indirect maternal deaths
"chorioamnionitis"
)
gwas_study_info =
gwas_study_info |>
mutate(l2_all_disease_terms =
stringr::str_replace_all(l2_all_disease_terms ,
pattern = vec_to_grep_pattern(maternal_disorder_terms),
"maternal disorders"
)
)
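`vec_to_grep_pattern()` is loaded from `code/get_term_descendants.R`, and its definition is not shown on this page. The following is a minimal sketch of what such a helper could look like, assuming it builds one regular expression that matches any of the supplied terms only when the term is a whole entry of a `", "`-separated list. The name `vec_to_grep_pattern_sketch()` and the exact regex are illustrative assumptions, not the project's implementation.

```r
# Hypothetical stand-in for vec_to_grep_pattern(); the real helper may differ.
vec_to_grep_pattern_sketch <- function(terms) {
  # \Q...\E quotes each term so characters such as "-" or "." are literal
  quoted <- paste0("\\Q", terms, "\\E", collapse = "|")
  # Match only at the start of the string or right after ", ",
  # and only when followed by "," or the end of the string
  paste0("(?:^|(?<=, ))(?:", quoted, ")(?=,|$)")
}

terms <- c("eclampsia", "miscarriage")
x <- "preeclampsia, eclampsia, miscarriage"

# "preeclampsia" is left alone because "eclampsia" is not a whole entry there
gsub(vec_to_grep_pattern_sketch(terms), "maternal disorders", x, perl = TRUE)
```

Anchoring on the list delimiters rather than on word boundaries is what keeps a short term from rewriting part of a longer one.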
neonatal_disorders <- c("neonatal jaundice", # Hemolytic disease and other neonatal jaundice
"neonatal sepsis", # Neonatal sepsis and other neonatal infections
"neonatal abstinence syndrome", # Other neonatal disorders
"perinatal disease", # Neonatal sepsis and other neonatal infections
"asphyxia neonatorum" # Neonatal encephalopathy due to birth asphyxia and trauma
)
gwas_study_info =
gwas_study_info |>
mutate(l2_all_disease_terms =
stringr::str_replace_all(l2_all_disease_terms ,
pattern = vec_to_grep_pattern(neonatal_disorders),
"neonatal disorders"
)
)
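The maternal and neonatal blocks above follow the same recipe: define a vector of source terms, then replace each one with a single GBD cause name. One possible way to express that recipe once is a named list that maps each target name to its source terms. The sketch below is an assumption about how this could be organised, written in base R with a hypothetical `collapse_terms()` function and toy input; it is not the code used to produce these results.

```r
# Hypothetical mapping: names are target groups, values are source terms
term_groups <- list(
  "maternal disorders" = c("eclampsia", "miscarriage"),
  "neonatal disorders" = c("neonatal jaundice", "neonatal sepsis")
)

collapse_terms <- function(x, groups) {
  for (target in names(groups)) {
    # Whole-entry match within a ", "-separated list (assumed delimiter)
    pattern <- paste0("(?:^|(?<=, ))(?:",
                      paste0("\\Q", groups[[target]], "\\E", collapse = "|"),
                      ")(?=,|$)")
    x <- gsub(pattern, target, x, perl = TRUE)
  }
  x
}

toy_terms <- c("eclampsia, asthma", "neonatal sepsis", "stroke")
collapse_terms(toy_terms, term_groups)
```

Keeping the mapping in one data structure would also make it easier to review which source terms feed each GBD cause.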
gwas_study_info |>
filter(grepl(vec_to_grep_pattern("lip and oral cavity cancer"),
l2_all_disease_terms,
perl = T)) |>
select(all_disease_terms, l2_all_disease_terms) |>
distinct() |>
head()
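Checks like the one above recur for most cause names on this page. A small wrapper could reduce the repetition. The sketch below uses base R, a hypothetical `show_term_mapping()` function, and a toy data frame, and it assumes terms are separated by `", "`.

```r
show_term_mapping <- function(df, term, n = 6) {
  df <- as.data.frame(df)  # also accepts a data.table
  # Whole-entry match for `term` in the comma-separated column
  pattern <- paste0("(?:^|(?<=, ))\\Q", term, "\\E(?=,|$)")
  hits <- grepl(pattern, df$l2_all_disease_terms, perl = TRUE)
  head(unique(df[hits, c("all_disease_terms", "l2_all_disease_terms")]), n)
}

# Toy data standing in for gwas_study_info
toy <- data.frame(
  all_disease_terms    = c("oral cavity cancer", "asthma", "oral cavity cancer"),
  l2_all_disease_terms = c("lip and oral cavity cancer", "asthma",
                           "lip and oral cavity cancer")
)
show_term_mapping(toy, "lip and oral cavity cancer")
```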
gwas_study_info =
gwas_study_info |>
mutate(l2_all_disease_terms =
stringr::str_replace_all(l2_all_disease_terms ,
pattern = vec_to_grep_pattern("nasopharyngeal cancer"),
"nasopharynx cancer"
)
)
gwas_study_info |>
filter(grepl(vec_to_grep_pattern("nasopharynx cancer"),
l2_all_disease_terms,
perl = T)) |>
select(all_disease_terms, l2_all_disease_terms) |>
distinct()
gwas_study_info |>
filter(grepl(vec_to_grep_pattern("other pharynx cancer"),
l2_all_disease_terms,
perl = T)) |>
select(all_disease_terms, l2_all_disease_terms) |>
distinct() |>
head()
gwas_study_info |>
filter(grepl(vec_to_grep_pattern("esophageal cancer"),
l2_all_disease_terms,
perl = T)) |>
select(all_disease_terms, l2_all_disease_terms) |>
distinct() |>
head()
gwas_study_info |>
filter(grepl(vec_to_grep_pattern("stomach cancer"),
l2_all_disease_terms,
perl = T)) |>
select(all_disease_terms, l2_all_disease_terms) |>
distinct() |>
head()
gwas_study_info = gwas_study_info |>
mutate(l2_all_disease_terms =
stringr::str_replace_all(l2_all_disease_terms ,
pattern = vec_to_grep_pattern("colorectal cancer"),
"colon and rectum cancer"
)
)
gwas_study_info |>
filter(grepl(vec_to_grep_pattern("colon and rectum cancer"),
l2_all_disease_terms,
perl = T)) |>
select(all_disease_terms, l2_all_disease_terms) |>
distinct() |>
head()
gwas_study_info |>
filter(grepl(vec_to_grep_pattern("liver cancer"),
l2_all_disease_terms,
perl = T)) |>
select(all_disease_terms, l2_all_disease_terms) |>
distinct() |>
head()
gwas_study_info =
gwas_study_info |>
mutate(l2_all_disease_terms =
case_when(
l2_all_disease_terms == "cancer of gallbladder and extrahepatic biliary tract" ~ "gallbladder and biliary tract cancer",
TRUE ~ l2_all_disease_terms
)
)
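Note that the `case_when()` comparison above recodes only rows whose entire string equals the source term; a row that lists the term alongside others is left unchanged. The toy example below illustrates the difference, using base R and an assumed `", "` delimiter. Whether multi-term rows containing this label occur in the data is not shown on this page.

```r
old <- "cancer of gallbladder and extrahepatic biliary tract"
new <- "gallbladder and biliary tract cancer"
toy_terms <- c(old, paste0("liver cancer, ", old))

# Exact equality: only the single-term row is recoded
exact <- ifelse(toy_terms == old, new, toy_terms)

# Whole-entry regex: the term is recoded wherever it is a list entry
pattern <- paste0("(?:^|(?<=, ))\\Q", old, "\\E(?=,|$)")
by_entry <- gsub(pattern, new, toy_terms, perl = TRUE)

exact
by_entry
```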
gwas_study_info |>
filter(grepl(vec_to_grep_pattern("gallbladder and biliary tract cancer"),
l2_all_disease_terms,
perl = T)) |>
select(all_disease_terms, l2_all_disease_terms) |>
distinct()
gwas_study_info |>
filter(grepl(vec_to_grep_pattern("pancreatic cancer"),
l2_all_disease_terms,
perl = T)) |>
select(all_disease_terms, l2_all_disease_terms) |>
distinct() |>
head()
gwas_study_info |>
filter(grepl(vec_to_grep_pattern("larynx cancer"),
l2_all_disease_terms,
perl = T)) |>
select(all_disease_terms, l2_all_disease_terms) |>
distinct() |>
head()
resp_cancer_terms = c("lung cancer",
"bronchus cancer",
"tracheal cancer",
"respiratory system cancer"
)
gwas_study_info =
gwas_study_info |>
mutate(l2_all_disease_terms =
ifelse(l2_all_disease_terms != "tracheal bronchus and lung cancer",
stringr::str_replace_all(l2_all_disease_terms,
pattern = vec_to_grep_pattern(resp_cancer_terms),
#pattern = paste0(resp_cancer_terms, collapse = "(?=,|$)|\\b"),
"tracheal bronchus and lung cancer"
),
l2_all_disease_terms
)
)
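Collapsing several source terms into one group can leave the same group name more than once in a row, for example when a study lists both `lung cancer` and `bronchus cancer`. If duplicates matter downstream, they could be removed as sketched below. This is an illustrative base R helper (`dedupe_terms()` is a hypothetical name) that assumes a `", "` delimiter; whether the original analysis deduplicates elsewhere is not shown on this page.

```r
dedupe_terms <- function(x, sep = ", ") {
  vapply(
    strsplit(x, sep, fixed = TRUE),
    # Keep the first occurrence of each entry, preserving order
    function(entries) paste(unique(entries), collapse = sep),
    character(1)
  )
}

collapsed <- c(
  "tracheal bronchus and lung cancer, tracheal bronchus and lung cancer",
  "asthma, tracheal bronchus and lung cancer"
)
dedupe_terms(collapsed)
```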
gwas_study_info |>
filter(grepl(vec_to_grep_pattern("tracheal bronchus and lung cancer"),
l2_all_disease_terms,
perl = T)) |>
select(all_disease_terms, l2_all_disease_terms) |>
distinct() |>
head()
gwas_study_info =
gwas_study_info |>
mutate(l2_all_disease_terms =
stringr::str_replace_all(l2_all_disease_terms,
pattern = vec_to_grep_pattern("malignant melanoma of skin"),
"malignant skin melanoma"
)
)
gwas_study_info |>
filter(grepl(vec_to_grep_pattern("malignant skin melanoma"),
l2_all_disease_terms,
perl = T)) |>
select(all_disease_terms, l2_all_disease_terms) |>
distinct() |>
head()
gwas_study_info |>
filter(grepl(vec_to_grep_pattern("non-melanoma skin cancer"),
l2_all_disease_terms,
perl = T)) |>
select(all_disease_terms, l2_all_disease_terms) |>
distinct()
gwas_study_info =
gwas_study_info |>
mutate(l2_all_disease_terms =
stringr::str_replace_all(l2_all_disease_terms ,
pattern = vec_to_grep_pattern("soft tissue sarcoma"),
"soft tissue and other extraosseous sarcomas"
)
)
gwas_study_info |>
filter(grepl(vec_to_grep_pattern("soft tissue and other extraosseous sarcomas"),
l2_all_disease_terms,
perl = T)) |>
select(all_disease_terms, l2_all_disease_terms) |>
distinct() |>
head()
gwas_study_info =
gwas_study_info |>
mutate(l2_all_disease_terms =
stringr::str_replace_all(l2_all_disease_terms ,
pattern = vec_to_grep_pattern(c("bone cancer","osteosarcoma")),
"malignant neoplasm of bone and articular cartilage"
)
)
gwas_study_info |>
filter(grepl(vec_to_grep_pattern("malignant neoplasm of bone and articular cartilage"),
l2_all_disease_terms,
perl = T)) |>
select(all_disease_terms, l2_all_disease_terms) |>
distinct() |>
head()
gwas_study_info |>
filter(grepl(vec_to_grep_pattern("breast cancer"),
l2_all_disease_terms,
perl = T
)) |>
select(all_disease_terms, l2_all_disease_terms) |>
distinct() |>
head()
gwas_study_info |>
filter(grepl(vec_to_grep_pattern("cervical cancer"),
l2_all_disease_terms,
perl = T)) |>
select(all_disease_terms, l2_all_disease_terms) |>
distinct() |>
head()
# Open question: does GBD treat endometrial cancer as a subset of uterine cancer?
# It is a subset in the ontology: http://purl.obolibrary.org/obo/MONDO_0002715
gwas_study_info =
gwas_study_info |>
mutate(l2_all_disease_terms =
stringr::str_replace_all(l2_all_disease_terms ,
pattern = vec_to_grep_pattern("endometrial cancer"),
"uterine cancer"
)
)
gwas_study_info |>
filter(grepl(vec_to_grep_pattern("uterine cancer"),
l2_all_disease_terms,
perl = T)) |>
select(all_disease_terms, l2_all_disease_terms) |>
distinct() |>
head()
gwas_study_info |>
filter(grepl(vec_to_grep_pattern("ovarian cancer"),
l2_all_disease_terms,
perl = T)) |>
select(all_disease_terms, l2_all_disease_terms) |>
distinct() |>
head()
gwas_study_info |>
filter(grepl(vec_to_grep_pattern("prostate cancer"),
l2_all_disease_terms,
perl = T)) |>
select(all_disease_terms, l2_all_disease_terms) |>
distinct() |>
head()
gwas_study_info |>
filter(grepl(vec_to_grep_pattern("testicular cancer"),
l2_all_disease_terms,
perl = T)) |>
select(all_disease_terms, l2_all_disease_terms) |>
distinct()
gwas_study_info |>
filter(grepl(vec_to_grep_pattern("kidney cancer"),
l2_all_disease_terms,
perl = T)) |>
select(all_disease_terms, l2_all_disease_terms) |>
distinct() |>
head()
gwas_study_info =
gwas_study_info |>
mutate(l2_all_disease_terms =
stringr::str_replace_all(l2_all_disease_terms ,
pattern = vec_to_grep_pattern("urinary bladder cancer"),
"bladder cancer"
)
)
gwas_study_info |>
filter(grepl(vec_to_grep_pattern("bladder cancer"),
l2_all_disease_terms,
perl = T)) |>
select(all_disease_terms, l2_all_disease_terms) |>
distinct()
gwas_study_info =
gwas_study_info |>
mutate(l2_all_disease_terms =
stringr::str_replace_all(l2_all_disease_terms ,
pattern = vec_to_grep_pattern("central nervous system cancer"),
"brain and central nervous system cancer"
)
)
gwas_study_info |>
filter(grepl(vec_to_grep_pattern("brain and central nervous system cancer"),
l2_all_disease_terms,
perl = T)) |>
select(all_disease_terms, l2_all_disease_terms) |>
distinct() |>
head()
gwas_study_info =
gwas_study_info |>
mutate(l2_all_disease_terms =
stringr::str_replace_all(l2_all_disease_terms ,
pattern =
vec_to_grep_pattern(c("ocular melanoma",
"ocular cancer")
),
"eye cancer"
)
)
gwas_study_info |>
filter(grepl(vec_to_grep_pattern("eye cancer"),
l2_all_disease_terms,
perl = T)) |>
select(all_disease_terms, l2_all_disease_terms) |>
distinct() |>
head()
gwas_study_info =
gwas_study_info |>
mutate(l2_all_disease_terms =
stringr::str_replace_all(l2_all_disease_terms,
pattern = vec_to_grep_pattern(
c("neuroblastoma",
"peripheral nervous system cancer")
),
"neuroblastoma and other peripheral nervous cell tumors"
)
)
gwas_study_info |>
filter(grepl(vec_to_grep_pattern("neuroblastoma and other peripheral nervous cell tumors"),
l2_all_disease_terms,
perl = T)) |>
select(all_disease_terms, l2_all_disease_terms) |>
distinct() |>
head()
gwas_study_info |>
filter(grepl(vec_to_grep_pattern("thyroid cancer"),
l2_all_disease_terms,
perl = T)) |>
select(all_disease_terms, l2_all_disease_terms) |>
distinct() |>
head()
gwas_study_info |>
filter(grepl(vec_to_grep_pattern("mesothelioma"),
l2_all_disease_terms,
perl = T)) |>
select(all_disease_terms, l2_all_disease_terms) |>
distinct() |>
head()
gwas_study_info =
gwas_study_info |>
mutate(l2_all_disease_terms =
stringr::str_replace_all(l2_all_disease_terms ,
pattern = vec_to_grep_pattern("hodgkins lymphoma"),
"hodgkin lymphoma"
)
)
gwas_study_info |>
filter(grepl(vec_to_grep_pattern("hodgkin lymphoma"),
l2_all_disease_terms,
perl = T)) |>
select(all_disease_terms, l2_all_disease_terms) |>
distinct() |>
head()
gwas_study_info =
gwas_study_info |>
mutate(l2_all_disease_terms =
stringr::str_replace_all(l2_all_disease_terms ,
pattern = vec_to_grep_pattern("non-hodgkins lymphoma"),
"non-hodgkin lymphoma"
)
)
gwas_study_info |>
filter(grepl(vec_to_grep_pattern("non-hodgkin lymphoma"),
l2_all_disease_terms,
perl = T)) |>
select(all_disease_terms, l2_all_disease_terms) |>
distinct() |>
head()
gwas_study_info |>
filter(grepl(vec_to_grep_pattern("multiple myeloma"),
l2_all_disease_terms,
perl = T)) |>
select(all_disease_terms, l2_all_disease_terms) |>
distinct() |>
head()
gwas_study_info |>
filter(grepl(vec_to_grep_pattern("leukemia"),
l2_all_disease_terms,
perl = T)) |>
select(all_disease_terms, l2_all_disease_terms) |>
distinct() |>
head()
gwas_study_info =
gwas_study_info |>
mutate(l2_all_disease_terms =
case_when(
l2_all_disease_terms == "cancer" ~ "other malignant neoplasms",
TRUE ~ l2_all_disease_terms
)
)
gwas_study_info =
gwas_study_info |>
mutate(l2_all_disease_terms =
ifelse(PUBMED_ID == 27790247,
stringr::str_replace_all(l2_all_disease_terms,
pattern = ", cancer,",
", other malignant neoplasms,"
),
l2_all_disease_terms
)
)
### Dealing with terms that measure a cancer-caused factor (strings starting with "cancer,")
gwas_study_info |>
filter(grepl("^cancer,", l2_all_disease_terms)) |>
pull(l2_all_disease_terms) |>
unique()
gwas_study_info =
gwas_study_info |>
mutate(l2_all_disease_terms =
ifelse(grepl("^cancer,", l2_all_disease_terms),
stringr::str_replace_all(l2_all_disease_terms,
pattern = "^cancer,",
"other malignant neoplasms,"
),
l2_all_disease_terms
)
)
other_malignant_terms <- c(
"retroperitoneal cancer",
"peritoneal cancer",
"ewing sarcoma",
"digestive system cancer",
"intestinal cancer",
"small intestine cancer",
"female reproductive organ cancer",
"male reproductive organ cancer",
"vulvar cancer",
"testicular germ cell tumor",
"urogenital cancer",
"squamous cell cancer",
"head and neck cancer",
"malignant tumor of floor of mouth",
"nasal cavity cancer", # ? not sure whether this belongs in another group
"malignant lymphoid tumor",
"neuroendocrine tumor",
"lymphatic system cancer",
"childhood cancer" # ? maybe sort further
)
gwas_study_info =
gwas_study_info |>
mutate(l2_all_disease_terms =
stringr::str_replace_all(l2_all_disease_terms,
pattern = vec_to_grep_pattern(other_malignant_terms),
# pattern = paste0(other_malignant_terms, collapse = "(?=,|$)|\\b"),
"other malignant neoplasms"
)
)
gwas_study_info |>
filter(grepl(vec_to_grep_pattern("other malignant neoplasms"),
l2_all_disease_terms,
perl = T)) |>
select(all_disease_terms, l2_all_disease_terms) |>
distinct() |>
head()
other_neoplasm_terms <- c("clonal hematopoiesis",
"neoplasm",
"benign neoplasm")
gwas_study_info =
gwas_study_info |>
mutate(l2_all_disease_terms =
stringr::str_replace_all(l2_all_disease_terms ,
pattern = vec_to_grep_pattern(other_neoplasm_terms),
"other neoplasms"
)
)
gwas_study_info =
gwas_study_info |>
mutate(l2_all_disease_terms =
case_when(
l2_all_disease_terms == "benign neoplasm" ~ "other neoplasms",
TRUE ~ l2_all_disease_terms
)
)
unknown_sig_terms <- c("intracranial germ cell tumor",
"bladder tumor",
"dysplasia of cervix")
gwas_study_info =
gwas_study_info |>
mutate(l2_all_disease_terms =
stringr::str_replace_all(l2_all_disease_terms ,
pattern = vec_to_grep_pattern(unknown_sig_terms),
"other neoplasms"
)
)
gwas_study_info |>
filter(grepl(vec_to_grep_pattern("other neoplasms"),
l2_all_disease_terms,
perl = T)) |>
select(all_disease_terms, l2_all_disease_terms) |>
distinct() |>
head()
gwas_study_info |>
filter(grepl(vec_to_grep_pattern("rheumatic heart disease"),
l2_all_disease_terms,
perl = T)) |>
select(all_disease_terms, l2_all_disease_terms) |>
distinct() |>
head()
# add chronic ischemic heart disease
gwas_study_info =
gwas_study_info |>
mutate(l2_all_disease_terms =
stringr::str_replace_all(l2_all_disease_terms ,
pattern = vec_to_grep_pattern("chronic ischemic heart disease"),
"ischemic heart disease"
)
)
# unsure --- to double check: aortic atherosclerosis
gwas_study_info =
gwas_study_info |>
mutate(l2_all_disease_terms =
stringr::str_replace_all(l2_all_disease_terms ,
pattern = vec_to_grep_pattern("aortic atherosclerosis"),
"ischemic heart disease"
)
)
gwas_study_info |>
filter(grepl(vec_to_grep_pattern("ischemic heart disease"),
l2_all_disease_terms,
perl = T)) |>
select(all_disease_terms, l2_all_disease_terms) |>
distinct() |>
head()
gwas_study_info |>
filter(grepl(vec_to_grep_pattern("stroke"),
l2_all_disease_terms,
perl = T)) |>
select(all_disease_terms, l2_all_disease_terms) |>
distinct() |>
head()
gwas_study_info |>
filter(grepl(vec_to_grep_pattern("hypertensive heart disease"),
l2_all_disease_terms,
perl = T)) |>
select(all_disease_terms, l2_all_disease_terms) |>
distinct() |>
head()
gwas_study_info |>
filter(grepl(vec_to_grep_pattern("heart valve disease"),
l2_all_disease_terms,
perl = T)) |>
select(all_disease_terms, l2_all_disease_terms) |>
distinct() |>
head()
gwas_study_info = gwas_study_info |>
mutate(l2_all_disease_terms =
stringr::str_replace_all(l2_all_disease_terms ,
pattern = vec_to_grep_pattern("heart valve disease"),
"non-rheumatic valvular heart disease"
)
)
gwas_study_info |>
filter(grepl(vec_to_grep_pattern("non-rheumatic valvular heart disease"),
l2_all_disease_terms,
perl = T)) |>
select(all_disease_terms, l2_all_disease_terms) |>
distinct() |>
head()
gwas_study_info =
gwas_study_info |>
mutate(l2_all_disease_terms =
stringr::str_replace_all(l2_all_disease_terms ,
pattern = vec_to_grep_pattern(c("cardiomyopathy",
"myocarditis")),
"cardiomyopathy and myocarditis"
)
)
gwas_study_info |>
filter(grepl(vec_to_grep_pattern("cardiomyopathy and myocarditis"),
l2_all_disease_terms,
perl = T)) |>
select(all_disease_terms, l2_all_disease_terms) |>
distinct() |>
head()
gwas_study_info |>
filter(grepl(vec_to_grep_pattern("pulmonary arterial hypertension"),
l2_all_disease_terms,
perl = T)) |>
select(all_disease_terms, l2_all_disease_terms) |>
distinct() |>
head()
afib_terms <- c("atrial fibrillation",
"atrial flutter",
"post-operative atrial fibrillation")
gwas_study_info =
gwas_study_info |>
mutate(l2_all_disease_terms =
stringr::str_replace_all(l2_all_disease_terms ,
pattern = vec_to_grep_pattern(afib_terms),
"atrial fibrillation and flutter"
)
)
gwas_study_info |>
filter(grepl(vec_to_grep_pattern("atrial fibrillation and flutter"),
l2_all_disease_terms,
perl = T)) |>
select(all_disease_terms, l2_all_disease_terms) |>
distinct() |>
head()
url <- "http://www.ebi.ac.uk/ols4/api/ontologies/cvdo/terms/http%253A%252F%252Fpurl.obolibrary.org%252Fobo%252FDOID_3627/descendants"
aortic_aneurysm_terms <- get_descendants(url)
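The descendants URL above embeds the term IRI URL-encoded twice, which is why `%253A` and `%252F` appear in place of `:` and `/`. `get_descendants()` comes from `code/get_term_descendants.R` and is not shown here. The sketch below covers two pieces such a helper might need: building the double-encoded URL, and pulling term labels out of an already parsed response. The function names are hypothetical, and the `_embedded$terms[[i]]$label` layout is an assumption about the OLS response that should be checked against the API documentation.

```r
# Build a descendants URL with the term IRI URL-encoded twice
ols_descendants_url <- function(ontology, iri,
                                base = "http://www.ebi.ac.uk/ols4/api/ontologies") {
  once  <- utils::URLencode(iri, reserved = TRUE)
  # repeated = TRUE forces the "%" signs from the first pass to be encoded too
  twice <- utils::URLencode(once, reserved = TRUE, repeated = TRUE)
  paste0(base, "/", ontology, "/terms/", twice, "/descendants")
}

# Extract lower-cased labels from a parsed response (assumed layout)
extract_term_labels <- function(parsed) {
  terms <- parsed[["_embedded"]][["terms"]]
  tolower(vapply(terms, function(term) term[["label"]], character(1)))
}

ols_descendants_url("cvdo", "http://purl.obolibrary.org/obo/DOID_3627")

# Mock response standing in for the parsed JSON of one result page
mock <- list(`_embedded` = list(terms = list(
  list(label = "Abdominal aortic aneurysm"),
  list(label = "Thoracic aortic aneurysm")
)))
extract_term_labels(mock)
```

A real response may be split across several pages, so a complete helper would likely need to request and combine each page as well.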
gwas_study_info = gwas_study_info |>
mutate(l2_all_disease_terms =
stringr::str_replace_all(l2_all_disease_terms ,
pattern = vec_to_grep_pattern(aortic_aneurysm_terms),
"aortic aneurysm"
)
)
gwas_study_info |>
filter(grepl(vec_to_grep_pattern("aortic aneurysm"),
l2_all_disease_terms,
perl = T
)) |>
select(all_disease_terms, l2_all_disease_terms) |>
distinct() |>
head()
lower_extremity_peripheral_arterial_disease_terms <- c("raynaud disease"
)
gwas_study_info =
gwas_study_info |>
mutate(l2_all_disease_terms =
stringr::str_replace_all(l2_all_disease_terms ,
pattern = vec_to_grep_pattern(lower_extremity_peripheral_arterial_disease_terms),
"lower extremity peripheral arterial disease"
)
)
gwas_study_info |>
filter(grepl(vec_to_grep_pattern("lower extremity peripheral arterial disease"),
l2_all_disease_terms,
perl = T)) |>
select(all_disease_terms, l2_all_disease_terms) |>
distinct() |>
head()
gwas_study_info |>
filter(grepl(vec_to_grep_pattern("endocarditis"),
l2_all_disease_terms,
perl = T)) |>
select(all_disease_terms, l2_all_disease_terms) |>
distinct() |>
head()
other_cardiovascular_terms <- c("tachycardia",
"other cardiac arrhythmias",
"heart block",
"carotid artery disease",
"hypertension",
"pericarditis",
"phlebitis", # ICD9 - 451
"coronary artery calcification",
"arterial occlusion",
"other vascular disorders",
"congestive heart failure",
"heart failure",
"thrombotic disease",
"arterial embolism",
"cardiac embolism",
"venous embolism",
"venous thrombosis",
"pulmonary embolism",
"arterial thrombosis",
"thromboembolism",
"vascular insufficiency",
"brain infarction",
"heart murmur",
"deep vein thrombosis"
)
gwas_study_info =
gwas_study_info |>
mutate(l2_all_disease_terms =
stringr::str_replace_all(l2_all_disease_terms ,
pattern = vec_to_grep_pattern(other_cardiovascular_terms),
"other cardiovascular and circulatory diseases"
)
)
gwas_study_info =
gwas_study_info |>
mutate(l2_all_disease_terms =
stringr::str_replace_all(l2_all_disease_terms ,
pattern = "(?<=^|, ) other vascular disorders(?=,|$)",
"other cardiovascular and circulatory diseases"
)
)
gwas_study_info |>
filter(grepl(vec_to_grep_pattern("other cardiovascular and circulatory diseases"),
l2_all_disease_terms,
perl = T)) |>
select(all_disease_terms, l2_all_disease_terms) |>
distinct() |>
head()
gwas_study_info |>
filter(grepl(vec_to_grep_pattern("chronic obstructive pulmonary disease"),
l2_all_disease_terms,
perl = T)) |>
select(all_disease_terms, l2_all_disease_terms) |>
distinct() |>
head()
gwas_study_info |>
filter(grepl(vec_to_grep_pattern("pneumoconiosis"),
l2_all_disease_terms,
perl = T)) |>
select(all_disease_terms, l2_all_disease_terms) |>
distinct() |>
head()
gwas_study_info |>
filter(grepl(vec_to_grep_pattern("asthma"),
l2_all_disease_terms,
perl = T)) |>
select(all_disease_terms, l2_all_disease_terms) |>
distinct() |>
head()
interstitial_lung_disease_terms <- c("pulmonary sarcoidosis",
"interstitial lung disease",
"löfgren syndrome",
"sarcoidosis"
)
gwas_study_info =
gwas_study_info |>
mutate(l2_all_disease_terms =
stringr::str_replace_all(l2_all_disease_terms ,
pattern = vec_to_grep_pattern(interstitial_lung_disease_terms),
"interstitial lung disease and pulmonary sarcoidosis"
)
)
gwas_study_info |>
filter(grepl(vec_to_grep_pattern("interstitial lung disease and pulmonary sarcoidosis"),
l2_all_disease_terms,
perl = T)) |>
select(all_disease_terms, l2_all_disease_terms) |>
distinct() |>
head()
gwas_study_info |>
filter(grepl(vec_to_grep_pattern("other chronic respiratory diseases"),
l2_all_disease_terms,
perl = T)) |>
select(all_disease_terms, l2_all_disease_terms) |>
distinct() |>
head()
url <- "http://www.ebi.ac.uk/ols4/api/ontologies/snomed/terms/http%253A%252F%252Fsnomed.info%252Fid%252F328383001/descendants"
chronic_liver_disease_terms <- get_descendants(url)
chronic_liver_disease_terms <- c("primary biliary cirrhosis",
"alcoholic liver cirrhosis",
"chronic hepatitis b virus infection",
"acute-on-chronic liver failure",
"non-alcoholic fatty liver disease",
"cirrhosis of liver",
"chronic hepatitis",
"liver disease",
"alcoholic liver disease",
"hepatitis c induced liver cirrhosis",
chronic_liver_disease_terms)
gwas_study_info = gwas_study_info |>
mutate(l2_all_disease_terms =
stringr::str_replace_all(l2_all_disease_terms,
pattern = vec_to_grep_pattern(chronic_liver_disease_terms),
"cirrhosis and other chronic liver diseases"
)
)
gwas_study_info = gwas_study_info |>
mutate(l2_all_disease_terms =
stringr::str_replace_all(l2_all_disease_terms,
pattern = "(?<=^|, ) liver disease(?=,|$)",
"cirrhosis and other chronic liver diseases"
)
)
gwas_study_info |>
filter(grepl(vec_to_grep_pattern("cirrhosis and other chronic liver diseases"),
l2_all_disease_terms,
perl = T)) |>
select(all_disease_terms, l2_all_disease_terms) |>
distinct() |>
head()
upper_dig_terms <- c("peptic ulcer diseases",
"peptic ulcer",
"duodenitis",
"gastritis",
"atrophic gastritis",
"gastroesophageal reflux disease")
gwas_study_info = gwas_study_info |>
mutate(l2_all_disease_terms =
stringr::str_replace_all(l2_all_disease_terms,
pattern = vec_to_grep_pattern(upper_dig_terms),
"upper digestive system diseases"
)
)
gwas_study_info |>
filter(grepl(vec_to_grep_pattern("upper digestive system diseases"),
l2_all_disease_terms,
perl = T)) |>
select(all_disease_terms, l2_all_disease_terms) |>
distinct() |>
head()
gwas_study_info |>
filter(grepl(vec_to_grep_pattern("appendicitis"),
l2_all_disease_terms,
perl = T)) |>
select(all_disease_terms, l2_all_disease_terms) |>
distinct() |>
head()
gwas_study_info = gwas_study_info |>
mutate(l2_all_disease_terms =
stringr::str_replace_all(l2_all_disease_terms,
pattern = vec_to_grep_pattern(c("paralytic ileus",
"intestinal obstruction")
),
"paralytic ileus and intestinal obstruction"
)
)
gwas_study_info |>
filter(grepl(vec_to_grep_pattern("paralytic ileus and intestinal obstruction"),
l2_all_disease_terms,
perl = T)) |>
select(all_disease_terms, l2_all_disease_terms) |>
distinct() |>
head()
hernia_terms <- c("inguinal hernia",
"femoral hernia",
"abdominal hernia",
"hernia of the abdominal wall",
"hiatus hernia")
gwas_study_info = gwas_study_info |>
mutate(l2_all_disease_terms =
stringr::str_replace_all(l2_all_disease_terms,
pattern = vec_to_grep_pattern(hernia_terms),
"inguinal femoral and abdominal hernia"
)
)
gwas_study_info |>
filter(grepl(vec_to_grep_pattern("inguinal femoral and abdominal hernia"),
l2_all_disease_terms,
perl = T)) |>
select(all_disease_terms, l2_all_disease_terms) |>
distinct() |>
head()
ibd_terms <- c("crohns disease",
"ulcerative colitis",
"inflammatory bowel disease",
"enteritis",
"gastroenteritis")
gwas_study_info = gwas_study_info |>
mutate(l2_all_disease_terms =
stringr::str_replace_all(l2_all_disease_terms,
pattern = vec_to_grep_pattern(ibd_terms),
"inflammatory bowel disease"
)
)
gwas_study_info |>
filter(grepl(vec_to_grep_pattern("inflammatory bowel disease"),
l2_all_disease_terms,
perl = T)) |>
select(all_disease_terms, l2_all_disease_terms) |>
distinct() |>
head()
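Collapsing several terms into one label can leave the same label repeated within a row, for example when a study lists both Crohn's disease and ulcerative colitis. If duplicates are not removed elsewhere in the analysis, a helper along these lines could tidy them. `dedupe_terms()` is a hypothetical name, and the sketch assumes entries are separated by a comma followed by a space.

```r
# Hypothetical helper: drop repeated entries within each comma-separated string.
dedupe_terms <- function(x) {
  vapply(strsplit(x, ", ", fixed = TRUE), function(parts) {
    # Keep missing values missing instead of turning them into "NA"
    if (length(parts) == 1 && is.na(parts)) return(NA_character_)
    paste(unique(parts), collapse = ", ")
  }, character(1))
}

l2_terms <- c("inflammatory bowel disease, inflammatory bowel disease, psoriasis",
              "asthma",
              NA)
deduped <- dedupe_terms(l2_terms)
print(deduped)
```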
gwas_study_info |>
filter(grepl(vec_to_grep_pattern("vascular intestinal disorders"),
l2_all_disease_terms,
perl = T)) |>
select(all_disease_terms, l2_all_disease_terms) |>
distinct() |>
head()
gal_bile_terms = c("gallbladder disease",
"bile duct disorder",
"biliary tract disease",
"cholelithiasis",
"cholecystitis",
"sclerosing cholangitis",
"gallstones")
gwas_study_info = gwas_study_info |>
mutate(l2_all_disease_terms =
stringr::str_replace_all(l2_all_disease_terms,
pattern = vec_to_grep_pattern(gal_bile_terms),
"gallbladder and biliary diseases"
)
)
gwas_study_info |>
filter(grepl("gallbladder and biliary diseases",
l2_all_disease_terms,
perl = T)) |>
select(all_disease_terms, l2_all_disease_terms) |>
distinct() |>
head()
gwas_study_info |>
filter(grepl("pancreatitis",
l2_all_disease_terms)) |>
select(all_disease_terms, l2_all_disease_terms) |>
distinct() |>
head()
# I84-I84.9,
# K20-K20.9,
# K22-K22.6,
# K22.8-K24,
# K31-K31.8,
# K38-K38.2,
# K57-K62,
# K62.2-K62.6,
# K62.8-K62.9,
# K64-K64.9,
# K66.8, K67,
# K68,
# K77,
# K90-K90.9,
# K92.8,
# K93.8
# 579 - celiac disease
other_digestive_terms <- c("esophagitis",
"eosinophilic esophagitis",
"esophageal ulcer",
# "barretts esophagus",
"diverticulitis",
"celiac disease",
"irritable bowel syndrome",
"anal fissure",
"anal fistula",
#? "anal polyp"
"rectal prolapse",
"rectal abscess",
"hemorrhoid",
"peritonitis"
)
gwas_study_info = gwas_study_info |>
mutate(l2_all_disease_terms =
stringr::str_replace_all(l2_all_disease_terms,
pattern = vec_to_grep_pattern(other_digestive_terms),
"other digestive diseases"
)
)
gwas_study_info |>
filter(grepl("other digestive diseases",
l2_all_disease_terms)) |>
select(all_disease_terms, l2_all_disease_terms) |>
distinct() |>
head()
dementia <- c("alzheimers disease biomarker measurement",
"alzheimers disease neuropathologic change",
"aids dementia",
"dementia",
"frontotemporal dementia",
"lewy body dementia",
"vascular dementia",
"alzheimers disease",
"neurodegenerative disease"
)
gwas_study_info =
gwas_study_info |>
mutate(l2_all_disease_terms =
stringr::str_replace_all(l2_all_disease_terms ,
pattern = vec_to_grep_pattern(dementia),
"alzheimer's disease and other dementias"
)
)
gwas_study_info |>
filter(grepl(vec_to_grep_pattern("alzheimer's disease and other dementias"),
l2_all_disease_terms,
perl = T)) |>
select(all_disease_terms, l2_all_disease_terms) |>
distinct() |>
head()
gwas_study_info =
gwas_study_info |>
mutate(l2_all_disease_terms =
stringr::str_replace_all(l2_all_disease_terms ,
pattern = "parkinsons disease",
"parkinson's disease"
)
)
gwas_study_info |>
filter(grepl("parkinson's disease", l2_all_disease_terms)) |>
select(all_disease_terms, l2_all_disease_terms) |>
distinct() |>
head()
gwas_study_info =
gwas_study_info |>
mutate(l2_all_disease_terms =
stringr::str_replace_all(l2_all_disease_terms ,
pattern = "idiopathic generalized epilepsy",
"idiopathic epilepsy"
)
)
gwas_study_info =
gwas_study_info |>
mutate(l2_all_disease_terms =
stringr::str_replace_all(l2_all_disease_terms ,
pattern = "rolandic epilepsy",
"idiopathic epilepsy"
)
)
gwas_study_info =
gwas_study_info |>
mutate(l2_all_disease_terms =
stringr::str_replace_all(l2_all_disease_terms ,
pattern = "partial epilepsy",
"idiopathic epilepsy"
)
)
gwas_study_info |>
filter(grepl("idiopathic epilepsy", l2_all_disease_terms)) |>
select(all_disease_terms, l2_all_disease_terms) |>
distinct() |>
head()
gwas_study_info |>
filter(grepl("multiple sclerosis", l2_all_disease_terms)) |>
select(all_disease_terms, l2_all_disease_terms) |>
distinct() |>
head()
motor_neuron_disease_terms <- c("anterior horn cell disease",
"anterior horn disorder",
"amyotrophic lateral sclerosis",
"motor neuron disease"
)
gwas_study_info = gwas_study_info |>
mutate(l2_all_disease_terms =
stringr::str_replace_all(l2_all_disease_terms,
pattern = vec_to_grep_pattern(motor_neuron_disease_terms),
"motor neuron disease"
)
)
gwas_study_info |>
filter(grepl(
vec_to_grep_pattern("motor neuron disease"),
l2_all_disease_terms,
perl = T)) |>
select(all_disease_terms, l2_all_disease_terms) |>
distinct() |>
head()
headache_terms <- c("headache disorder",
"cluster headache",
"migraine",
"headache"
)
gwas_study_info = gwas_study_info |>
mutate(l2_all_disease_terms =
stringr::str_replace_all(l2_all_disease_terms,
#pattern = "\\bheadache disorder\\b|cluster headache\\b|migraine\\b",
pattern = vec_to_grep_pattern(headache_terms),
"headache disorders"
)
)
gwas_study_info |>
filter(grepl("(?<=^|, )headache disorders",
l2_all_disease_terms,
perl = T)) |>
select(all_disease_terms, l2_all_disease_terms) |>
distinct() |>
head()
other_neuro_terms <- c("huntington disease",
"hereditary ataxia",
"torsion dystonia",
"x-linked dystonia-parkinsonism",
"isolated dystonia",
"limb dystonia",
"cervical dystonia",
"myoclonus",
"restless legs syndrome",
"chronic inflammatory demyelinating polyneuropathy",
"demyelinating disease of central nervous system",
"myasthenia gravis",
"complex regional pain syndrome",
"acute transverse myelitis",
"machado-joseph disease",
"degenerative disease of the spinal cord",
"cerebral palsy",
"peripheral neuropathy",
"essential tremor",
"facial nerve disease",
"sporadic amyotrophic lateral sclerosis"
)
gwas_study_info = gwas_study_info |>
mutate(l2_all_disease_terms =
stringr::str_replace_all(l2_all_disease_terms,
pattern = vec_to_grep_pattern(other_neuro_terms),
"other neurological disorders"
)
)
gwas_study_info |>
filter(grepl(vec_to_grep_pattern("other neurological disorders"),
l2_all_disease_terms,
perl = T)) |>
select(all_disease_terms, l2_all_disease_terms) |>
distinct() |>
head()
gwas_study_info |>
filter(grepl(vec_to_grep_pattern("schizophrenia"),
l2_all_disease_terms,
perl = T)) |>
select(all_disease_terms, l2_all_disease_terms) |>
distinct() |>
head()
depressive_terms <- c("depressive disorder",
"depressive symptom",
"depressive episode",
"major depressive disorders",
"major depressive disorder",
"major depressive episode",
"depressive"
)
gwas_study_info = gwas_study_info |>
mutate(l2_all_disease_terms =
stringr::str_replace_all(l2_all_disease_terms ,
pattern = vec_to_grep_pattern(depressive_terms),
"depressive disorders"
)
) |>
mutate(l2_all_disease_terms =
stringr::str_replace_all(l2_all_disease_terms ,
pattern = vec_to_grep_pattern("depressive disorder"),
"depressive disorders"
)
)
gwas_study_info |>
filter(grepl(vec_to_grep_pattern("depressive disorders"),
l2_all_disease_terms,
perl = T)) |>
select(all_disease_terms, l2_all_disease_terms) |>
distinct() |>
head()
url <- "http://www.ebi.ac.uk/ols4/api/ontologies/mesh/terms/http%253A%252F%252Fid.nlm.nih.gov%252Fmesh%252FD001008/descendants"
anxiety_terms <- get_descendants(url)
anxiety_terms <- c(anxiety_terms,
"obsessive-compulsive symptom measurement",
"obsessive-compulsive disorder",
"obsessive-compulsive",
"anxiety"
)
gwas_study_info =
gwas_study_info |>
mutate(l2_all_disease_terms =
stringr::str_replace_all(l2_all_disease_terms ,
pattern = vec_to_grep_pattern(anxiety_terms),
"anxiety disorders"
)
) |>
mutate(l2_all_disease_terms =
stringr::str_replace_all(l2_all_disease_terms ,
pattern = "(?<=^|, )anxiety disorder(?=,|$)|(?<=^|, )anxiety measurement(?=,|$)",
"anxiety disorders"
)
) |>
mutate(l2_all_disease_terms =
stringr::str_replace_all(l2_all_disease_terms ,
pattern = "(?<=^|, )anxiety(?=,|$)",
"anxiety disorders"
)
)
gwas_study_info |>
filter(grepl(vec_to_grep_pattern("anxiety disorders"),
l2_all_disease_terms,
perl = T)) |>
select(all_disease_terms, l2_all_disease_terms) |>
distinct() |>
head()
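`get_descendants()` is defined outside this section. For orientation, the sketch below shows how descendant labels could be collected from an OLS `descendants` endpoint. It assumes the JSON response keeps terms under `_embedded$terms` with a `label` field and reports `page$totalPages`, and that `size` and `page` query parameters control paging. `labels_from_ols_page()` and `fetch_ols_descendants()` are hypothetical names, and lower-casing the labels is a guess at what the matching step needs.

```r
# Hypothetical: pull lower-cased term labels out of one parsed OLS response page.
labels_from_ols_page <- function(page) {
  terms <- page[["_embedded"]][["terms"]]
  tolower(vapply(terms, function(term) term[["label"]], character(1)))
}

# Hypothetical: walk every page of a descendants endpoint.
# Needs the jsonlite package and network access; not called in this sketch.
fetch_ols_descendants <- function(url, page_size = 200) {
  labels <- character(0)
  page_number <- 0
  repeat {
    page_url <- paste0(url, "?size=", page_size, "&page=", page_number)
    page <- jsonlite::fromJSON(page_url, simplifyVector = FALSE)
    labels <- c(labels, labels_from_ols_page(page))
    page_number <- page_number + 1
    total_pages <- page[["page"]][["totalPages"]]
    if (is.null(total_pages) || page_number >= total_pages) break
  }
  unique(labels)
}

# Offline illustration with a hand-written page in the assumed layout
mock_page <- list(
  `_embedded` = list(terms = list(list(label = "Panic Disorder"),
                                  list(label = "Agoraphobia"))),
  page = list(totalPages = 1)
)
mock_labels <- labels_from_ols_page(mock_page)
print(mock_labels)
```

Keeping the parsing step separate from the download makes it possible to check the label handling without contacting the OLS server.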
gwas_study_info = gwas_study_info |>
mutate(l2_all_disease_terms =
stringr::str_replace_all(l2_all_disease_terms ,
pattern =
vec_to_grep_pattern(
c("bulimia nervosa",
"anorexia nervosa",
"binge eating",
"eating disorder"
)
),
"eating disorders"
)
) |>
mutate(l2_all_disease_terms =
stringr::str_replace_all(l2_all_disease_terms ,
pattern = "anorexia",
"eating disorders"
)
)
gwas_study_info |>
filter(grepl(vec_to_grep_pattern("eating disorders"),
l2_all_disease_terms,
perl = T)) |>
select(all_disease_terms, l2_all_disease_terms) |>
distinct() |>
head()
gwas_study_info = gwas_study_info |>
mutate(l2_all_disease_terms =
stringr::str_replace_all(l2_all_disease_terms,
pattern = vec_to_grep_pattern("autism"),
"autism spectrum disorders"
)
)
gwas_study_info |>
filter(grepl(vec_to_grep_pattern("autism spectrum disorders"),
l2_all_disease_terms,
perl = T)) |>
select(all_disease_terms, l2_all_disease_terms) |>
distinct() |>
head()
gwas_study_info = gwas_study_info |>
mutate(l2_all_disease_terms =
stringr::str_replace_all(l2_all_disease_terms ,
pattern = vec_to_grep_pattern("adhd"),
"attention-deficit/hyperactivity disorder"
)
)
gwas_study_info |>
filter(grepl(vec_to_grep_pattern("attention-deficit/hyperactivity disorder"),
l2_all_disease_terms,
perl = T)) |>
select(all_disease_terms, l2_all_disease_terms) |>
distinct() |>
head()
gwas_study_info |>
filter(grepl(vec_to_grep_pattern("conduct disorder"),
l2_all_disease_terms,
perl = T)) |>
select(all_disease_terms, l2_all_disease_terms) |>
distinct() |>
head()
terms <- c("developmental disability",
"dyslexia")
gwas_study_info = gwas_study_info |>
mutate(l2_all_disease_terms =
stringr::str_replace_all(l2_all_disease_terms ,
pattern = vec_to_grep_pattern(terms),
"idiopathic developmental intellectual disability"
)
)
gwas_study_info |>
filter(grepl(vec_to_grep_pattern("idiopathic developmental intellectual disability"),
l2_all_disease_terms,
perl = T)) |>
select(all_disease_terms, l2_all_disease_terms) |>
distinct() |>
head()
url <- "http://www.ebi.ac.uk/ols4/api/ontologies/mondo/terms/http%253A%252F%252Fpurl.obolibrary.org%252Fobo%252FMONDO_0002028/descendants"
personality_disorders <- get_descendants(url)
gwas_study_info = gwas_study_info |>
mutate(l2_all_disease_terms =
stringr::str_replace_all(l2_all_disease_terms,
pattern = vec_to_grep_pattern(personality_disorders),
"personality disorders"
)
)
url <- "http://www.ebi.ac.uk/ols4/api/ontologies/efo/terms/http%253A%252F%252Fwww.ebi.ac.uk%252Fefo%252FEFO_0004247/descendants"
mood_disorders <- get_descendants(url)
gwas_study_info = gwas_study_info |>
mutate(l2_all_disease_terms =
stringr::str_replace_all(l2_all_disease_terms,
pattern = vec_to_grep_pattern(mood_disorders),
"mood disorder"
)
)
url <- "http://www.ebi.ac.uk/ols4/api/ontologies/doid/terms/http%253A%252F%252Fpurl.obolibrary.org%252Fobo%252FDOID_535/descendants"
sleep_disorders <- get_descendants(url)
url <- "http://www.ebi.ac.uk/ols4/api/ontologies/efo/terms/http%253A%252F%252Fwww.ebi.ac.uk%252Fefo%252FEFO_0008568/descendants"
other_sleep_disorders <- get_descendants(url)
sleep_disorders <- c(sleep_disorders,
other_sleep_disorders)
sleep_disorders <- str_length_sort(sleep_disorders)
gwas_study_info = gwas_study_info |>
mutate(l2_all_disease_terms =
stringr::str_replace_all(l2_all_disease_terms,
pattern = vec_to_grep_pattern(sleep_disorders),
"sleep disorders"
)
)
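`str_length_sort()` is called above before the pattern is built, but its definition is not shown in this section. One plausible motivation is that alternation in PCRE and ICU regular expressions is ordered: the first alternative that matches wins, so a short term listed before a longer one that starts the same way can leave a stray suffix behind. The sketch below illustrates this with a hypothetical `sort_longest_first()`; the two terms are made up for the example.

```r
# Hypothetical equivalent of sorting terms longest-first before building a pattern.
sort_longest_first <- function(terms) {
  terms[order(nchar(terms), decreasing = TRUE)]
}

terms <- c("insomnia", "insomnia disorder")
x <- "insomnia disorder"

# Unsorted: "insomnia" matches first and " disorder" is left behind
unsorted <- gsub(paste(terms, collapse = "|"), "sleep disorders", x, perl = TRUE)

# Sorted: the longer term is tried first and the whole entry is replaced
sorted <- gsub(paste(sort_longest_first(terms), collapse = "|"),
               "sleep disorders", x, perl = TRUE)

print(unsorted)
print(sorted)
```

If the pattern builder already anchors each term to the end of an entry, the ordering matters less, but sorting remains a cheap safeguard.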
other_mental_disorders <- c("manic or hypomanic episode",
"mental or behavioural disorder",
"mental disorder",
"post-traumatic stress disorder",
"stress-related disorder",
"acute stress reaction",
"occupation-related stress disorder",
"psychotic symptom",
"psychosis",
"psychiatric disorder",
"personality disorders",
"personality disorder",
"mood disorder",
"sleep disorders",
"sleep disorder",
"mixed anxiety disorders and depressive disorders",
"emotional symptom",
"dissociative disorder",
"hallucinations",
"somatoform disorder",
"schizoaffective disorder",
"phobic disorder",
"psychotic"
)
gwas_study_info =
gwas_study_info |>
mutate(l2_all_disease_terms =
stringr::str_replace_all(l2_all_disease_terms ,
pattern = vec_to_grep_pattern(other_mental_disorders),
"other mental disorders"
)
)
gwas_study_info |>
filter(grepl(vec_to_grep_pattern("other mental disorders"),
l2_all_disease_terms,
perl = T)) |>
select(all_disease_terms, l2_all_disease_terms) |>
distinct() |>
head()
gwas_study_info =
gwas_study_info |>
mutate(l2_all_disease_terms =
stringr::str_replace_all(l2_all_disease_terms,
pattern =
vec_to_grep_pattern(
c("alcohol-related disorders",
"alcohol and nicotine codependence",
"alcohol use disorder"
)),
"alcohol use disorders"
)
)
gwas_study_info |>
filter(grepl(vec_to_grep_pattern("alcohol use disorders"),
l2_all_disease_terms,
perl = T)) |>
select(all_disease_terms, l2_all_disease_terms) |>
distinct() |>
head()
gwas_study_info =
gwas_study_info |>
mutate(l2_all_disease_terms =
stringr::str_replace_all(l2_all_disease_terms ,
pattern = vec_to_grep_pattern(
c("opioid-related disorders",
"opioid dependence",
"opioid use disorder"
)
),
"opioid use disorders"
)
)
gwas_study_info |>
filter(grepl(vec_to_grep_pattern("opioid use disorders"),
l2_all_disease_terms,
perl = T)) |>
select(all_disease_terms, l2_all_disease_terms) |>
distinct() |>
head()
gwas_study_info = gwas_study_info |>
mutate(l2_all_disease_terms =
stringr::str_replace_all(l2_all_disease_terms ,
pattern = vec_to_grep_pattern("cocaine-related disorders"),
"cocaine use disorders"
)
)
gwas_study_info |>
filter(grepl(vec_to_grep_pattern("cocaine use disorders"),
l2_all_disease_terms,
perl = T)) |>
select(all_disease_terms, l2_all_disease_terms) |>
distinct() |>
head()
gwas_study_info =
gwas_study_info |>
mutate(l2_all_disease_terms =
stringr::str_replace_all(l2_all_disease_terms ,
pattern = "methamphetamine",
"amphetamine"
)
)
gwas_study_info |>
filter(grepl("amphetamine",
l2_all_disease_terms)) |>
select(all_disease_terms, l2_all_disease_terms) |>
distinct() |>
head()
gwas_study_info =
gwas_study_info |>
mutate(l2_all_disease_terms =
stringr::str_replace_all(l2_all_disease_terms ,
pattern = vec_to_grep_pattern("cannabis dependence"),
"cannabis use disorders"
)
)
gwas_study_info |>
filter(grepl(vec_to_grep_pattern("cannabis use disorders"),
l2_all_disease_terms,
perl = T)) |>
select(all_disease_terms, l2_all_disease_terms) |>
distinct() |>
head()
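The substance-use steps above all follow the same shape: a set of source terms is rewritten to one label. As an illustrative alternative, not the approach used in this analysis, the mapping could be held in a named list and applied in a loop. The sketch uses base R and a hypothetical `whole_entry_pattern()` in place of `vec_to_grep_pattern()`, whose definition is not shown here.

```r
# Hypothetical pattern builder: match whole entries of a ", "-separated list.
whole_entry_pattern <- function(terms) {
  quoted <- paste0("\\Q", terms, "\\E")
  paste0("(?<=^|, )(?:", paste(quoted, collapse = "|"), ")(?=,|$)")
}

# Label -> source terms, copied from the steps above
term_groups <- list(
  "alcohol use disorders" = c("alcohol-related disorders",
                              "alcohol and nicotine codependence",
                              "alcohol use disorder"),
  "opioid use disorders" = c("opioid-related disorders",
                             "opioid dependence",
                             "opioid use disorder"),
  "cocaine use disorders" = "cocaine-related disorders",
  "cannabis use disorders" = "cannabis dependence"
)

# Apply each group in turn to a character vector of term lists
apply_term_groups <- function(x, groups) {
  for (label in names(groups)) {
    x <- gsub(whole_entry_pattern(groups[[label]]), label, x, perl = TRUE)
  }
  x
}

example_terms <- c("alcohol use disorder, opioid dependence",
                   "cannabis dependence",
                   "alcohol drinking")
mapped <- apply_term_groups(example_terms, term_groups)
print(mapped)
```

Holding the mapping as data keeps each label and its source terms in one place, which makes typos in either easier to spot.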
other_drug_use_terms <- c("heroin dependence",
"drug dependence",
"nicotine dependence",
"substance abuse",
"drug misuse",
"nicotine-related disorders"
)
gwas_study_info =
gwas_study_info |>
mutate(l2_all_disease_terms =
stringr::str_replace_all(l2_all_disease_terms ,
pattern = vec_to_grep_pattern(other_drug_use_terms),
"other drug use disorders"
)
)
gwas_study_info |>
filter(grepl(vec_to_grep_pattern("other drug use disorders"),
l2_all_disease_terms,
perl = T)) |>
select(all_disease_terms, l2_all_disease_terms) |>
distinct() |>
head()
gwas_study_info =
gwas_study_info |>
mutate(l2_all_disease_terms =
stringr::str_replace_all(l2_all_disease_terms ,
pattern = vec_to_grep_pattern("type 1 diabetes mellitus"),
"diabetes mellitus type 1"
)
)
gwas_study_info |>
filter(grepl(vec_to_grep_pattern("diabetes mellitus type 1"),
l2_all_disease_terms,
perl = T)) |>
select(all_disease_terms, l2_all_disease_terms) |>
distinct() |>
head()
gwas_study_info =
gwas_study_info |>
mutate(l2_all_disease_terms =
stringr::str_replace_all(l2_all_disease_terms ,
pattern = vec_to_grep_pattern("type 2 diabetes mellitus"),
"diabetes mellitus type 2"
)
)
gwas_study_info |>
filter(grepl("diabetes mellitus type 2",
l2_all_disease_terms)) |>
select(all_disease_terms, l2_all_disease_terms) |>
distinct() |>
head()
chronic_kidney_disease <- c("cystic kidney disease")
gwas_study_info = gwas_study_info |>
mutate(l2_all_disease_terms =
stringr::str_replace_all(l2_all_disease_terms ,
pattern = vec_to_grep_pattern(chronic_kidney_disease),
"chronic kidney disease"
)
)
gwas_study_info |>
filter(grepl(vec_to_grep_pattern("chronic kidney disease"),
l2_all_disease_terms,
perl = T)) |>
select(all_disease_terms, l2_all_disease_terms) |>
distinct() |>
head()
glomerulonephritis_terms <- c("chronic glomerulonephritis",
"membranous glomerulonephritis",
"proliferative glomerulonephritis")
gwas_study_info = gwas_study_info |>
mutate(l2_all_disease_terms =
stringr::str_replace_all(l2_all_disease_terms ,
pattern = vec_to_grep_pattern(glomerulonephritis_terms),
"glomerulonephritis"
)
)
gwas_study_info |>
filter(grepl(vec_to_grep_pattern("acute glomerulonephritis"),
l2_all_disease_terms,
perl = T)) |>
select(all_disease_terms, l2_all_disease_terms) |>
distinct() |>
head()
gwas_study_info |>
filter(grepl(vec_to_grep_pattern("dermatitis"),
l2_all_disease_terms,
perl = T)) |>
select(all_disease_terms, l2_all_disease_terms) |>
distinct() |>
head()
gwas_study_info |>
filter(grepl(vec_to_grep_pattern("psoriasis"),
l2_all_disease_terms,
perl = T)) |>
select(all_disease_terms, l2_all_disease_terms) |>
distinct() |>
head()
bacterial_skin_disease_terms <- c("staphylococcal skin infections",
"skin and soft tissue staphylococcus aureus infection",
"cellulitis"
# "skin infection"
)
gwas_study_info = gwas_study_info |>
mutate(l2_all_disease_terms =
stringr::str_replace_all(l2_all_disease_terms ,
pattern = vec_to_grep_pattern(bacterial_skin_disease_terms),
"bacterial skin diseases"
)
)
gwas_study_info |>
filter(grepl(vec_to_grep_pattern("bacterial skin diseases"),
l2_all_disease_terms,
perl = T)) |>
select(all_disease_terms, l2_all_disease_terms) |>
distinct() |>
head()
gwas_study_info |>
filter(grepl(vec_to_grep_pattern("scabies"),
l2_all_disease_terms,
perl = T)) |>
select(all_disease_terms, l2_all_disease_terms) |>
distinct() |>
head()
fungal_skin_disease_terms <- c("tinea",
"dermatomycosis")
gwas_study_info = gwas_study_info |>
mutate(l2_all_disease_terms =
stringr::str_replace_all(l2_all_disease_terms ,
pattern = vec_to_grep_pattern(fungal_skin_disease_terms),
"fungal skin diseases"
)
)
gwas_study_info |>
filter(grepl("fungal skin diseases", l2_all_disease_terms)) |>
select(all_disease_terms, l2_all_disease_terms) |>
distinct() |>
head()
acne_terms <- c("sapho syndrome")
gwas_study_info =
gwas_study_info |>
mutate(l2_all_disease_terms =
stringr::str_replace_all(l2_all_disease_terms ,
pattern = vec_to_grep_pattern(acne_terms),
"acne vulgaris"
)
)
gwas_study_info = gwas_study_info |>
mutate(l2_all_disease_terms =
stringr::str_replace_all(l2_all_disease_terms ,
pattern = vec_to_grep_pattern("acne"),
"acne vulgaris"
)
)
gwas_study_info |>
filter(grepl(vec_to_grep_pattern("acne vulgaris"),
l2_all_disease_terms,
perl = T)) |>
select(all_disease_terms, l2_all_disease_terms) |>
distinct() |>
head()
# also add prurigo to pruritus
gwas_study_info = gwas_study_info |>
mutate(l2_all_disease_terms =
stringr::str_replace_all(l2_all_disease_terms ,
pattern = vec_to_grep_pattern("prurigo"),
"pruritus"
)
)
gwas_study_info |>
filter(grepl(vec_to_grep_pattern("pruritus"),
l2_all_disease_terms,
perl = T)) |>
select(all_disease_terms, l2_all_disease_terms) |>
distinct() |>
head()
gwas_study_info |>
filter(grepl(vec_to_grep_pattern("urticaria"),
l2_all_disease_terms,
perl = T)) |>
select(all_disease_terms, l2_all_disease_terms) |>
distinct() |>
head()
gwas_study_info |>
filter(grepl(vec_to_grep_pattern("decubitus ulcer"),
l2_all_disease_terms,
perl = T)) |>
select(all_disease_terms, l2_all_disease_terms) |>
distinct() |>
head()
other_skin_disease_terms <- c("sebaceous gland disease",
"rosacea",
"erythematosquamous dermatosis",
"dry skin",
"skin tags",
"dermatochalasis",
"epidermal thickening",
"epidermal inclusion cyst",
"cutaneous lupus erythematosus",
"androgenetic alopecia",
"chemotherapy-induced alopecia",
"cutaneous leishmaniasis",
"acanthosis nigricans",
"stevens-johnson syndrome",
"keloid")
gwas_study_info = gwas_study_info |>
mutate(l2_all_disease_terms =
stringr::str_replace_all(l2_all_disease_terms ,
pattern = vec_to_grep_pattern(other_skin_disease_terms),
"other skin and subcutaneous diseases"
)
)
gwas_study_info |>
filter(grepl(vec_to_grep_pattern("other skin and subcutaneous diseases"),
l2_all_disease_terms,
perl = T)) |>
select(all_disease_terms, l2_all_disease_terms) |>
distinct() |>
head()
vision_loss_terms <- c("blindness",
"color vision disorder",
"vision disorder",
"visuospatial impairment",
"pathological blindness and vision loss",
"visual impairment",
"myopia",
"refractive error",
"retinopathy",
"hyperopia",
"astigmatism",
"corneal astigmatism",
"presbyopia",
"anisometropia",
"esotropia",
"non-accomodative esotropia",
"accommodative esotropia",
"abnormality of refraction",
"abnormality of vision",
"age-related macular degeneration",
"degeneration of macula and posterior pole",
"age-related cataract",
"retinal degeneration",
"retinal drusen",
"cataract")
gwas_study_info = gwas_study_info |>
mutate(l2_all_disease_terms =
stringr::str_replace_all(l2_all_disease_terms ,
pattern = vec_to_grep_pattern(vision_loss_terms),
"blindness and vision loss"
)
)
gwas_study_info |>
filter(grepl(vec_to_grep_pattern("blindness and vision loss"),
l2_all_disease_terms,
perl = T)) |>
select(all_disease_terms, l2_all_disease_terms) |>
distinct() |>
head()
other_sense_terms <- c("abnormality of the sense of smell",
"disturbances of sensation of smell and taste",
"tinnitus",
"disturbance of skin sensation",
"disturbances to senses",
"vogt-koyanagi-harada disease",
"pathological myopia",
"lacrimal apparatus disease",
"keratoconus"
)
gwas_study_info =
gwas_study_info |>
mutate(l2_all_disease_terms =
stringr::str_replace_all(l2_all_disease_terms ,
pattern = vec_to_grep_pattern(other_sense_terms),
"other sense organ diseases"
)
)
gwas_study_info |>
filter(grepl(vec_to_grep_pattern("other sense organ diseases"),
l2_all_disease_terms,
perl = T)) |>
select(all_disease_terms, l2_all_disease_terms) |>
distinct() |>
head()
# add juvenile idiopathic arthritis to rheumatoid arthritis
gwas_study_info = gwas_study_info |>
mutate(l2_all_disease_terms =
stringr::str_replace_all(l2_all_disease_terms ,
pattern = vec_to_grep_pattern("juvenile idiopathic arthritis"),
"rheumatoid arthritis"
)
)
# rheumatoid factor-negative juvenile idiopathic arthritis
gwas_study_info = gwas_study_info |>
mutate(l2_all_disease_terms =
stringr::str_replace_all(l2_all_disease_terms ,
pattern = vec_to_grep_pattern("rheumatoid factor-negative juvenile idiopathic arthritis"),
"rheumatoid arthritis"
)
)
gwas_study_info |>
filter(grepl(vec_to_grep_pattern("rheumatoid arthritis"),
l2_all_disease_terms,
perl = T)) |>
select(all_disease_terms, l2_all_disease_terms) |>
distinct() |>
head()
gwas_study_info |>
filter(grepl(vec_to_grep_pattern("osteoarthritis"),
l2_all_disease_terms,
perl = T)) |>
select(all_disease_terms, l2_all_disease_terms) |>
distinct() |>
head()
gwas_study_info |>
filter(grepl(vec_to_grep_pattern("low back pain"),
l2_all_disease_terms,
perl = T)) |>
select(all_disease_terms, l2_all_disease_terms) |>
distinct() |>
head()
gwas_study_info |>
filter(grepl(vec_to_grep_pattern("neck pain"),
l2_all_disease_terms,
perl = T)) |>
select(all_disease_terms, l2_all_disease_terms) |>
distinct() |>
head()
gwas_study_info |>
filter(grepl(vec_to_grep_pattern("gout"),
l2_all_disease_terms,
perl = T)) |>
select(all_disease_terms, l2_all_disease_terms) |>
distinct() |>
head()
other_musculo_terms <- c("infectious arthritis",
"scoliosis",
"fasciitis",
"plica syndrome",
"panniculitis",
"sciatica",
"polymyalgia rheumatica",
"acquired musculoskeletal deformity",
"pyogenic arthritis",
"reactive arthritis",
"acroiliac arthritis",
"arthritis",
"systemic lupus erythematosus",
"musculoskeletal system disease",
"spondylosis",
"osteonecrosis",
"myalgia"
)
gwas_study_info = gwas_study_info |>
mutate(l2_all_disease_terms =
stringr::str_replace_all(l2_all_disease_terms ,
pattern = vec_to_grep_pattern(other_musculo_terms),
"other musculoskeletal disorders"
)
)
gwas_study_info |>
filter(grepl(vec_to_grep_pattern("other musculoskeletal disorders"),
l2_all_disease_terms,
perl = T)) |>
select(all_disease_terms, l2_all_disease_terms) |>
distinct() |>
head()
gwas_study_info = gwas_study_info |>
mutate(l2_all_disease_terms =
stringr::str_replace_all(
l2_all_disease_terms,
vec_to_grep_pattern("orofacial cleft"),
"orofacial clefts"
)
)
gwas_study_info |>
filter(grepl(vec_to_grep_pattern("orofacial clefts"),
l2_all_disease_terms,
perl = T)) |>
select(all_disease_terms, l2_all_disease_terms) |>
distinct() |>
head()
gwas_study_info |>
filter(grepl(vec_to_grep_pattern("down syndrome"),
l2_all_disease_terms,
perl = T)) |>
select(all_disease_terms, l2_all_disease_terms) |>
distinct() |>
head()
gwas_study_info |>
filter(grepl(vec_to_grep_pattern("klinefelter syndrome"),
l2_all_disease_terms,
perl = T)) |>
select(all_disease_terms, l2_all_disease_terms) |>
distinct() |>
head()
icd10 <- readxl::read_xlsx(here::here("data/icd/cdc_valid_icd10_Sep_23_2025.xlsx"))
icd10 <- icd10 |>
rename_with(~ tolower(gsub(" ", "_", .x)))
# Q87-Q87.8, Q91-Q93.9, Q95-Q95.9, Q97-Q97.9, Q99-Q99.8
other_chrom_abn_icd10 <-
c(paste0("Q", seq(87, 87.8, by = 0.1) * 10),
paste0("Q", seq(91, 93.9, by = 0.1) * 10),
paste0("Q", seq(95, 95.9, by = 0.1) * 10),
paste0("Q", seq(97, 97.9, by = 0.1) * 10),
paste0("Q", seq(99, 99.8, by = 0.1) * 10)
)
other_chrom_abn_terms <-
icd10 |>
filter(grepl(paste0(other_chrom_abn_icd10, collapse = "|"), code)) |>
pull(`short_description_(valid_icd-10_fy2025)`) |>
unique() |>
tolower() # disease terms are lower case, so match case before searching
other_chrom_abn <- c("marfan syndrome",
"fragile x syndrome",
"22q11.2 deletion syndrome",
"chromosomal disorder")
url <- "http://www.ebi.ac.uk/ols4/api/ontologies/mondo/terms/http%253A%252F%252Fpurl.obolibrary.org%252Fobo%252FMONDO_0019040/descendants"
chromosomal_disorders <- get_descendants(url)
other_chrom_abn <- c(other_chrom_abn,
chromosomal_disorders)
gwas_study_info = gwas_study_info |>
mutate(l2_all_disease_terms =
stringr::str_replace_all(l2_all_disease_terms,
vec_to_grep_pattern(c(other_chrom_abn_terms, other_chrom_abn)),
"other chromosomal abnormalities"
)
)
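The ICD-10 code lists in this section are built with `seq(..., by = 0.1) * 10`, which depends on floating-point values printing as whole numbers, and the resulting pattern is matched anywhere in `code`. A more defensive variant, sketched below under the assumption that codes are stored without the dot (as the literal codes `"P960"` and `"D282"` used elsewhere in this analysis suggest), builds the range from integers and anchors it at the start of the code. `icd10_range_pattern()` is a hypothetical helper.

```r
# Hypothetical helper: anchored regex for a run of dot-free ICD-10 codes.
icd10_range_pattern <- function(letter, from, to) {
  codes <- paste0(letter, seq.int(from, to))
  paste0("^(", paste(codes, collapse = "|"), ")")
}

# Q87-Q87.8 written as integer codes 870..878
pattern <- icd10_range_pattern("Q", 870L, 878L)
codes <- c("Q870", "Q8781", "Q879", "Q909", "AQ870")
hits <- grepl(pattern, codes)
print(hits)
```

Longer codes such as `"Q8781"` still match because only the start of the code is anchored, so sub-codes of each listed category are kept.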
# Q20-Q28.9
congen_heart_icd10 <-
paste0("Q", seq(20, 28.9, by = 0.1) * 10)
congen_terms <-
icd10 |>
filter(grepl(paste0(congen_heart_icd10, collapse = "|"), code)) |>
pull(`short_description_(valid_icd-10_fy2025)`) |>
unique()
gwas_study_info = gwas_study_info |>
mutate(l2_all_disease_terms =
stringr::str_replace_all(l2_all_disease_terms,
"congenital anomaly of cardiovascular system",
"congenital heart abnormalities"
)
)
# Q65-Q79, Q79.6-Q79.9
congen_musculo_icd10 <-
c(paste0("Q", seq(65, 79, by = 0.1) * 10),
paste0("Q", seq(79.6, 79.9, by = 0.1) * 10)
)
congen_musculo_terms <-
icd10 |>
filter(grepl(paste0(congen_musculo_icd10, collapse = "|"), code)) |>
pull(`short_description_(valid_icd-10_fy2025)`) |>
unique() |>
tolower() # disease terms are lower case, so match case before searching
congen_muscle <- c("osteochondrodysplasia",
"congenital deformities of limbs",
"lower limb asymmetry",
"familial clubfoot with or without associated lower limb anomalies",
"abnormality of limbs",
"abnormal foot morphology",
congen_musculo_terms
)
gwas_study_info = gwas_study_info |>
mutate(l2_all_disease_terms =
stringr::str_replace_all(l2_all_disease_terms,
vec_to_grep_pattern(c(congen_musculo_terms, congen_muscle)),
"congenital musculoskeletal and limb anomalies"
)
)
gwas_study_info |>
filter(grepl(vec_to_grep_pattern("congenital musculoskeletal and limb anomalies"),
l2_all_disease_terms,
perl = T)) |>
select(all_disease_terms, l2_all_disease_terms) |>
distinct() |>
head()
# P96.0, Q50-Q60.6, Q63-Q64.9
urogenital_icd10 <-
c("P960",
paste0("Q", seq(50, 60.6, by = 0.1) * 10),
paste0("Q", seq(63, 64.9, by = 0.1) * 10)
)
urogenital_terms <-
icd10 |>
filter(grepl(paste0(urogenital_icd10, collapse = "|"), code)) |>
pull(`short_description_(valid_icd-10_fy2025)`) |>
unique()
urogenital_terms <-
c("abnormal morphology of female internal genitalia",
"abnormality of the genital system",
"functional abnormality of the bladder",
"congenital anomaly of kidney and urinary tract",
"bladder exstrophy", # ICD9:753.5
tolower(urogenital_terms))
gwas_study_info = gwas_study_info |>
mutate(l2_all_disease_terms =
stringr::str_replace_all(l2_all_disease_terms,
vec_to_grep_pattern(urogenital_terms),
"urogenital congenital anomalies"
)
)
gwas_study_info |>
filter(grepl(vec_to_grep_pattern("urogenital congenital anomalies"),
l2_all_disease_terms,
perl = T)) |>
select(all_disease_terms, l2_all_disease_terms) |>
distinct() |>
head()
# Q38-Q45.9, Q79.0-Q79.5
digestive_icd10 <-
c(paste0("Q", seq(38, 45.9, by = 0.1) * 10),
paste0("Q", seq(79, 79.5, by = 0.1) * 10)
)
digestive_terms <-
icd10 |>
filter(grepl(paste0(digestive_icd10, collapse = "|"), code)) |>
pull(`short_description_(valid_icd-10_fy2025)`) |>
unique()
digestive_terms <- tolower(digestive_terms)
digestive_terms <- stringr::str_remove_all(digestive_terms, ", unspecified$")
digestive_terms <- c(
"hirschsprung disease",
digestive_terms
)
gwas_study_info = gwas_study_info |>
mutate(l2_all_disease_terms =
stringr::str_replace_all(l2_all_disease_terms,
vec_to_grep_pattern(tolower(digestive_terms)),
"digestive congenital anomalies"
)
)
gwas_study_info |>
filter(grepl(vec_to_grep_pattern("digestive congenital anomalies"),
l2_all_disease_terms,
perl = T)) |>
select(all_disease_terms, l2_all_disease_terms) |>
distinct() |>
head()
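The ICD-10 short descriptions are lower-cased and stripped of a trailing `, unspecified` before they are used as search terms. The same clean-up is repeated for several code ranges in this section, so it could be factored into one function. The sketch below uses base R; `normalise_icd_description()` is a hypothetical name, and the example descriptions are written for illustration rather than taken from the CDC file.

```r
# Hypothetical helper: normalise ICD-10 short descriptions for term matching.
normalise_icd_description <- function(description) {
  description <- tolower(description)
  # Drop trailing qualifiers that trait names presumably do not carry
  description <- sub(", unspecified$", "", description)
  description <- sub(", site not specified$", "", description)
  unique(description)
}

raw <- c("Calculus of kidney",
         "Urinary calculus, unspecified",
         "Urinary calculus")
cleaned <- normalise_icd_description(raw)
print(cleaned)
```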
other_congen <- c("abnormality of the respiratory system")
# N10-N12.9, N13.6, N15, N15.1-N16.8, N30-N30.3, N30.8-N30.9, N34-N34.3, N39.0-N39.2
uti_icd10 <-
c(paste0("N", seq(10, 12.9, by = 0.1) * 10),
"N136",
paste0("N", 151:168), # N15.1-N16.8
paste0("N", seq(30, 30.3, by = 0.1) * 10),
paste0("N", 308:309), # N30.8-N30.9
paste0("N", seq(34, 34.3, by = 0.1) * 10),
paste0("N", 390:392) # N39.0-N39.2
)
uti_terms <-
icd10 |>
filter(grepl(paste0(uti_icd10, collapse = "|"), code)) |>
pull(`short_description_(valid_icd-10_fy2025)`) |>
unique() |>
tolower() # disease terms are lower case, so match case before searching
uti_terms <-
c("acute cystitis",
"chronic cystitis",
"chronic interstitial cystitis",
"urinary tract infection",
"interstitial nephritis",
"pyelonephritis",
"urethritis",
"cystitis",
uti_terms
)
gwas_study_info = gwas_study_info |>
mutate(l2_all_disease_terms =
stringr::str_replace_all(
l2_all_disease_terms,
vec_to_grep_pattern(uti_terms),
"urinary tract infections and interstitial nephritis"
)
)
# N20-N23.0
urolithiasis_icd10 <-
paste0("N", seq(20, 23.0, by = 0.1) * 10)
urolithiasis_terms <-
icd10 |>
filter(grepl(paste0(urolithiasis_icd10, collapse = "|"), code)) |>
pull(`short_description_(valid_icd-10_fy2025)`) |>
unique()
urolithiasis_terms <- tolower(urolithiasis_terms)
urolithiasis_terms <- stringr::str_remove_all(urolithiasis_terms, ", unspecified$")
urolithiasis_terms <-
c("lower urinary tract calculus",
"urinary calculus",
"bladder calculus",
"calcium oxalate nephrolithiasis",
"calcium phosphate nephrolithiasis",
"uric acid nephrolithiasis",
"nephrolithiasis",
"renal colic", #icd-9 788.0
"ureterolithiasis",
urolithiasis_terms
)
gwas_study_info = gwas_study_info |>
mutate(l2_all_disease_terms =
stringr::str_replace_all(
l2_all_disease_terms,
vec_to_grep_pattern(urolithiasis_terms),
"urolithiasis"
)
)
gwas_study_info |>
filter(grepl(vec_to_grep_pattern("male infertility"),
l2_all_disease_terms,
perl = T)) |>
select(all_disease_terms, l2_all_disease_terms) |>
distinct()
# N25-N28.1, N29-N29.8, N31-N32.0, N32.3-N32.4, N36-N36.9, N39, N41-N41.9, N44-N44.0, N45-N45.9, N49-N49.9
urine_icd10 <-
c(paste0("N", seq(25, 28.1, by = 0.1) * 10),
paste0("N", seq(29, 29.8, by = 0.1) * 10),
paste0("N", seq(31, 32.0, by = 0.1) * 10),
paste0("N", seq(32.3, 32.4, by = 0.1) * 10),
paste0("N", seq(36, 36.9, by = 0.1) * 10),
"N39",
paste0("N", seq(41, 41.9, by = 0.1) * 10),
paste0("N", seq(44, 44.0, by = 0.1) * 10),
paste0("N", seq(45, 45.9, by = 0.1) * 10),
paste0("N", seq(49, 49.9, by = 0.1) * 10)
)
urine_terms <-
icd10 |>
filter(grepl(paste0(urine_icd10, collapse = "|"), code)) |>
pull(`short_description_(valid_icd-10_fy2025)`) |>
unique() |>
tolower()
urine_terms <- stringr::str_remove_all(urine_terms, ", unspecified$")
urine_terms <- stringr::str_remove_all(urine_terms, ", site not specified$")
urinary_diseases_terms <- c("urinary incontinence",
"stress urinary incontinence",
"urgency urinary incontinence",
"urinary system disease",
"bladder neck obstruction",
"urinary tract obstruction",
"urethral disease",
"urethral syndrome",
"uterine inflammatory disease",
"enuresis",
"neurogenic bladder", # icd9 596.54
"bladder diverticulum", # icd9 596.3
"priapism",
"hydronephrosis",
urine_terms
)
urinary_diseases_terms <- str_length_sort(urinary_diseases_terms)
gwas_study_info = gwas_study_info |>
mutate(l2_all_disease_terms =
stringr::str_replace_all(
l2_all_disease_terms,
vec_to_grep_pattern(urinary_diseases_terms),
"other urinary diseases"
)
)
# D25-D26, D28.2
icd10_uterine_fibroids <-
c(paste0("D", seq(25, 26, by = 0.1) * 10),
"D282"
)
icd10_uterine_fibroids_terms <-
icd10 |>
filter(grepl(paste0(icd10_uterine_fibroids, collapse = "|"), code)) |>
pull(`short_description_(valid_icd-10_fy2025)`) |>
unique()
uterine_fibroid_terms <-
c("uterine leiomyoma",
"uterine fibroid")
gwas_study_info = gwas_study_info |>
mutate(l2_all_disease_terms =
stringr::str_replace_all(l2_all_disease_terms,
pattern = vec_to_grep_pattern(uterine_fibroid_terms),
"uterine fibroids"
)
)
gwas_study_info |>
filter(grepl(vec_to_grep_pattern("uterine fibroids"),
l2_all_disease_terms,
perl = TRUE)) |>
select(all_disease_terms, l2_all_disease_terms) |>
distinct() |>
head()
gwas_study_info |>
filter(grepl(vec_to_grep_pattern("polycystic ovary syndrome"),
l2_all_disease_terms,
perl = TRUE)) |>
select(all_disease_terms, l2_all_disease_terms) |>
distinct() |>
head()
gwas_study_info |>
filter(grepl(
vec_to_grep_pattern("female infertility"),
l2_all_disease_terms,
perl = TRUE)) |>
select(all_disease_terms, l2_all_disease_terms) |>
distinct()
endo_terms <- c("ovarian endometriosis",
"endometriosis of pelvic peritoneum"
)
gwas_study_info = gwas_study_info |>
mutate(l2_all_disease_terms =
stringr::str_replace_all(l2_all_disease_terms,
pattern = vec_to_grep_pattern(endo_terms),
"endometriosis"
)
)
gwas_study_info |>
filter(grepl(vec_to_grep_pattern("endometriosis"),
l2_all_disease_terms,
perl = TRUE)) |>
select(all_disease_terms, l2_all_disease_terms) |>
distinct() |>
head()
fem_genital_prolapse_terms <- c("uterine prolapse",
"cystocele",
"rectocele",
"enterocele"
)
gwas_study_info = gwas_study_info |>
mutate(l2_all_disease_terms =
stringr::str_replace_all(l2_all_disease_terms,
pattern = vec_to_grep_pattern(fem_genital_prolapse_terms),
"genital prolapse"
)
)
gwas_study_info |>
filter(grepl(vec_to_grep_pattern("genital prolapse"),
l2_all_disease_terms,
perl = TRUE)) |>
select(all_disease_terms, l2_all_disease_terms) |>
distinct() |>
head()
diseases <- stringr::str_split(pattern = ", ",
gwas_study_info$l2_all_disease_terms) |>
unlist() |>
stringr::str_trim()
# N72-N72.0, N75-N77.8, N83-N83.9
other_gyno_terms <- c("cervicitis", #N72
"bartholin gland disease", # N75
"bartholin duct cyst",
"vaginitis", # N76
"vaginal inflammation",
"postmenopausal atrophic vaginitis",
"atrophic vaginitis",
"vulvovaginitis",
"ulceration of vulva", # N77
"ovarian cyst", # N83
"follicular cyst",
"noninflammatory disorder of ovary fallopian tube and broad ligament",
"cervical disorder",
"dysmenorrhea",
"dyspareunia"
)
# pregnancy_terms <- grep("pregnancy", diseases, value = T)
# gyno_terms <- c(
# "female reproductive system disease",
# "female genital tract fistula",
# "placenta disease",
# "ovarian gynecological diseases",
# "vaginal disorder",
#
#
# "abnormal delivery",
# pregnancy_terms)
gwas_study_info = gwas_study_info |>
mutate(l2_all_disease_terms =
stringr::str_replace_all(l2_all_disease_terms,
pattern = vec_to_grep_pattern(other_gyno_terms),
"other gynecological diseases"
)
)
hemoglobinopathies_terms <- c("sickle cell disease and related diseases",
"thalassemia",
"inherited hemoglobinopathy",
"hemoglobin e disease"
)
hemopath_hemo_anemias <- c(hemoglobinopathies_terms, "hemolytic anemia")
gwas_study_info = gwas_study_info |>
mutate(l2_all_disease_terms =
stringr::str_replace_all(
l2_all_disease_terms,
vec_to_grep_pattern(hemopath_hemo_anemias),
"hemoglobinopathies and hemolytic anemias"
)
)
gwas_study_info |>
filter(grepl(vec_to_grep_pattern("hemoglobinopathies and hemolytic anemias"),
l2_all_disease_terms,
perl = TRUE)) |>
select(all_disease_terms, l2_all_disease_terms) |>
distinct() |>
head()
endo_terms <- c("ovarian dysfunction",
"ovarian disease",
"menstrual disorder",
# "premenstral tension" #icd9 625.4, icd10 N94.3,
"adrenocortical insufficiency", # ICD9:255.4, ICD10:E27.1 / ICD10:E27.4
"cushing syndrome", # ICD9:255.0, ICD10:E24
"hyperaldosteronism", # ICD10CM:E26, ICD9:255.1
"delayed puberty", # ICD10:E30.0, ICD9:259.0
"central precocious puberty", #?ICD10: E30.1
"endocrine system disease", # ????
"primary aldosteronism"
)
gwas_study_info = gwas_study_info |>
mutate(l2_all_disease_terms =
stringr::str_replace_all(l2_all_disease_terms,
pattern = vec_to_grep_pattern(endo_terms),
"other endocrine disorders"
)
)
thyroid_terms <- c("congenital hypothyroidism due to developmental anomaly", #ICD-10:E03.1
"hypothyroidism",
"hyperthyroidism",
"myxedema",
"nontoxic goiter",#ICD-10: E04
"thyrotoxicosis", #ICD-10: E05
"toxic nodular goiter", # ICD9:242.3
"goiter",
"thyroiditis", #ICD-10: E06, ICD9:245
"hashimotos thyroiditis", #ICD9:245.2
"graves disease", #ICD9:242.0,
"autoimmune thyroid disease" #ICD10:E06.3,
)
gwas_study_info = gwas_study_info |>
mutate(l2_all_disease_terms =
stringr::str_replace_all(l2_all_disease_terms,
pattern = vec_to_grep_pattern(thyroid_terms),
"thyroid disorders"
)
)
other_blood_disorders <- c("neutropenia",
"cryoglobulinemia",
"acquired coagulation factor deficiency", #ICD10: D68.4
"von willebrand disease", # ICD9 286.4
"hemophilia a", # Congenital factor VIII disorder - ICD9 286.0
"qualitative platelet defect",
"blood coagulation disease",
"hematologic disease",
"thrombophilia"
)
gwas_study_info = gwas_study_info |>
mutate(l2_all_disease_terms =
stringr::str_replace_all(l2_all_disease_terms,
pattern = vec_to_grep_pattern(other_blood_disorders),
"other blood disorders"
)
)
metabolism_disorders <- c("hypoglycemia",
"hypoparathyroidism",
"parathyroid disease", # ICD9: 252
"primary hyperparathyroidism", # ICD10 E21.0
"hyperparathyroidism", # ICD10 E21
# "secondary hyperparathyroidism of renal origin" not incl, as otherwise specified in ICD10CM:N25.81
"obesity",
"metabolic syndrome",
"metabolic syndrome x",
"inborn errors of metabolism",
"metabolic disease",
"mineral metabolism disease", # ICD9:275.8, ICD9:275.9, ICD10:E83
"acidosis", # ICD9:276.2
"disorder of acid-base balance", # ? ICD10-cm E87.8
"bilirubin metabolism disease",
"cystic fibrosis", # ICD9:277.0, ICD10:E84
"hyperlipidemia",
"hypovolemia",
"rare dyslipidemia"
)
gwas_study_info = gwas_study_info |>
mutate(l2_all_disease_terms =
stringr::str_replace_all(l2_all_disease_terms,
pattern = vec_to_grep_pattern(metabolism_disorders),
"other metabolic disorders"
)
)
other_hemo <- c("aplastic anemia",
"pure red cell aplasia",
"severe malarial anemia",
"anemia")
gwas_study_info = gwas_study_info |>
mutate(l2_all_disease_terms =
stringr::str_replace_all(l2_all_disease_terms,
pattern = vec_to_grep_pattern(other_hemo),
"other hemoglobinopathies and hemolytic anemias"
)
)
oral_disorders_terms <- c("dental caries",
"tooth disease",
"toothache",
"periodontal disease",
"tooth agenesis"
)
gwas_study_info = gwas_study_info |>
mutate(l2_all_disease_terms =
stringr::str_replace_all(
l2_all_disease_terms,
vec_to_grep_pattern(oral_disorders_terms),
"oral disorders"
)
)
gwas_study_info |>
filter(grepl(vec_to_grep_pattern("oral disorders"),
l2_all_disease_terms,
perl = TRUE)) |>
select(all_disease_terms, l2_all_disease_terms) |>
distinct() |>
head()
# Collapse the doubled suffix in anxiety terms first; once "disorderss" is
# corrected below, this pattern would no longer match.
gwas_study_info =
gwas_study_info |>
mutate(l2_all_disease_terms =
stringr::str_replace_all(l2_all_disease_terms ,
pattern = "anxiety disorders disorderss",
"anxiety disorders"
)
)
# Fix any remaining doubled plural
gwas_study_info =
gwas_study_info |>
mutate(l2_all_disease_terms =
stringr::str_replace_all(l2_all_disease_terms ,
pattern = "disorderss",
"disorders"
)
)
gbd_data <- data.table::fread(here::here("data/gbd/ihme_gbd_2019_global_disease_burden_rate_all_ages.csv"))
gbd_data$cause <- stringr::str_remove_all(gbd_data$cause, ",")
diseases <- stringr::str_split(pattern = ", ",
gwas_study_info$l2_all_disease_terms[gwas_study_info$l2_all_disease_terms != ""]) |>
unlist() |>
stringr::str_trim()
gbd_data$cause[!tolower(gbd_data$cause) %in% unique(diseases)] |> sort() |> unique()
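The line above lists GBD causes that have no matching GWAS disease term. The same check can be sketched with `setdiff()` on toy vectors, in both directions. The cause names and terms below are made up for illustration rather than taken from the GBD file, and this version reports lower-cased names instead of the original spelling.

```r
# Toy vectors, made up for illustration
gbd_causes_toy <- c("Asthma", "Urolithiasis", "Oral disorders", "Asthma")
gwas_terms_toy <- c("asthma", "oral disorders", "endometriosis")

# GBD causes with no GWAS term (setdiff() also removes duplicates)
unmatched_causes <- sort(setdiff(tolower(gbd_causes_toy), gwas_terms_toy))
print(unmatched_causes)

# GWAS terms with no GBD cause, i.e. terms still needing a mapping
unmatched_terms <- sort(setdiff(gwas_terms_toy, tolower(gbd_causes_toy)))
print(unmatched_terms)
```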
gbd_data =
gbd_data |>
mutate(cause = tolower(cause))
gwas_disease_traits = data.frame(cause = diseases)
# gwas_study_info |>
# filter(DISEASE_STUDY == T) |>
# select(all_disease_terms, l2_all_disease_terms, cause = l2_all_disease_terms) |>
# distinct()
left_join(gwas_disease_traits,
gbd_data,
by = "cause") |>
head()
gwas_study_info |> select(cause = l2_all_disease_terms) |>
distinct() |>
left_join(gbd_data, by = "cause") |>
head()
diseases <- stringr::str_split(pattern = ", ",
gwas_study_info$l2_all_disease_terms[gwas_study_info$l2_all_disease_terms != ""]) |>
unlist() |>
stringr::str_trim()
length(unique(diseases))
# make frequency table
freq <- table(as.factor(diseases))
# sort in decreasing order
freq_sorted <- sort(freq, decreasing = TRUE)
# show top N, e.g. top 10
head(freq_sorted, 10)
# fwrite() returns NULL invisibly, so its result is not assigned back to gwas_study_info
data.table::fwrite(gwas_study_info,
here::here("output/gwas_cat/gwas_study_info_trait_group_l2.csv"))
sessionInfo()