Last updated: 2025-09-10
Checks: 7 passed, 0 failed
Knit directory: genomics_ancest_disease_dispar/
This reproducible R Markdown analysis was created with workflowr (version 1.7.1). The Checks tab describes the reproducibility checks that were applied when the results were created. The Past versions tab lists the development history.
Great! Since the R Markdown file has been committed to the Git repository, you know the exact version of the code that produced these results.
Great job! The global environment was empty. Objects defined in the global environment can affect the analysis in your R Markdown file in unknown ways. For reproducibility, it’s best to always run the code in an empty environment.
The command set.seed(20220216) was run prior to running the code in the R Markdown file. Setting a seed ensures that any results that rely on randomness, e.g. subsampling or permutations, are reproducible.
Great job! Recording the operating system, R version, and package versions is critical for reproducibility.
Nice! There were no cached chunks for this analysis, so you can be confident that you successfully produced the results during this run.
Great job! Using relative paths to the files within your workflowr project makes it easier to run your code on other machines.
Great! You are using Git for version control. Tracking code development and connecting the code version to the results is critical for reproducibility.
The results in this page were generated with repository version 1679f9d. See the Past versions tab to see a history of the changes made to the R Markdown and HTML files.
Note that you need to be careful to ensure that all relevant files for the analysis have been committed to Git prior to generating the results (you can use wflow_publish or wflow_git_commit). workflowr only checks the R Markdown file, but you know if there are other scripts or data files that it depends on. Below is the status of the Git repository when the results were generated:
Ignored files:
Ignored: .DS_Store
Ignored: .Rproj.user/
Ignored: data/.DS_Store
Ignored: data/gwas_catalog/
Ignored: output/gwas_cat/
Ignored: output/gwas_study_info_cohort_corrected.csv
Ignored: output/gwas_study_info_trait_corrected.csv
Ignored: output/gwas_study_info_trait_ontology_info.csv
Ignored: output/gwas_study_info_trait_ontology_info_l1.csv
Ignored: output/gwas_study_info_trait_ontology_info_l2.csv
Ignored: output/trait_ontology/
Ignored: renv/
Untracked files:
Untracked: code/get_term_descendants.R
Untracked: data/gbd/
Untracked: data/who/
Unstaged changes:
Modified: analysis/disease_inves_by_ancest.Rmd
Modified: analysis/index.Rmd
Deleted: analysis/level_1_disease_group.Rmd
Modified: analysis/level_2_disease_group.Rmd
Deleted: analysis/non_ontology_trait_collapse.Rmd
Deleted: analysis/trait_ontology_collapse.Rmd
Note that any generated files, e.g. HTML, png, CSS, etc., are not included in this status report because it is ok for generated content to have uncommitted changes.
These are the previous versions of the repository in which changes were made to the R Markdown (analysis/level_1_disease_group_cancer.Rmd) and HTML (docs/level_1_disease_group_cancer.html) files. If you’ve configured a remote Git repository (see ?wflow_git_remote), click on the hyperlinks in the table below to view the files as they were in that past version.
File | Version | Author | Date | Message |
---|---|---|---|---|
html | 9ca183a | IJbeasley | 2025-09-10 | Build site. |
Rmd | 50ef69d | IJbeasley | 2025-09-10 | Update cancer grouping |
library(dplyr)
library(data.table)
library(stringr)
source(here::here("code/get_term_descendants.R"))
gwas_study_info <- fread(here::here("output/gwas_cat/gwas_study_info_group_l1.csv"))
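The helper get_descendants() is sourced from code/get_term_descendants.R, which this page does not show. The following is a minimal sketch of what such a helper might look like, not the author's implementation. It assumes the OLS4 API returns paged JSON with labels under `_embedded$terms$label` and a `_links$next$href` link on every page except the last, and that labels are lowercased (the printed examples on this page are all lowercase). The names get_descendants_sketch and fetch_ols_page are hypothetical.

```r
# Hypothetical stand-in for get_descendants(); the real helper may differ.
# `fetch_page` takes a URL and returns list(labels = <chr>, next_url = <chr or NULL>).
get_descendants_sketch <- function(url, fetch_page = fetch_ols_page) {
  labels <- character(0)
  while (!is.null(url)) {
    page   <- fetch_page(url)
    labels <- c(labels, tolower(page$labels))
    url    <- page$next_url  # NULL once the last page has been read
  }
  labels <- unique(labels)
  print("Number of terms collected:")
  print(length(labels))
  labels
}

# Assumed response layout (HAL-style JSON); check against the live API.
fetch_ols_page <- function(url) {
  resp <- httr::GET(url)
  httr::stop_for_status(resp)
  body <- jsonlite::fromJSON(httr::content(resp, as = "text", encoding = "UTF-8"))
  list(labels   = body[["_embedded"]][["terms"]][["label"]],
       next_url = body[["_links"]][["next"]][["href"]])
}
```

Passing the page fetcher as an argument keeps the paging loop testable without network access.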
# Count distinct publications (PUBMED_ID) per grouped disease-term string
n_studies_trait = gwas_study_info |>
  dplyr::filter(DISEASE_STUDY == TRUE) |>
  dplyr::select(l1_all_disease_terms, PUBMED_ID) |>
  dplyr::distinct() |>
  dplyr::group_by(l1_all_disease_terms) |>
  dplyr::summarise(n_studies = dplyr::n()) |>
  dplyr::arrange(desc(n_studies))
head(n_studies_trait)
# A tibble: 6 × 2
  l1_all_disease_terms      n_studies
  <chr>                         <int>
1 type 2 diabetes mellitus        145
2 asthma                          131
3 alzheimers disease              124
4 breast cancer                   112
5 major depressive disorder       108
6 schizophrenia                   103
dim(n_studies_trait)
[1] 2901 2
diseases <- stringr::str_split(pattern = ", ",
gwas_study_info$l1_all_disease_terms[gwas_study_info$l1_all_disease_terms != ""]) |>
unlist() |>
stringr::str_trim()
length(unique(diseases))
[1] 2029
test <- data.frame(trait = unique(diseases))
url <- "http://www.ebi.ac.uk/ols4/api/ontologies/mondo/terms/http%253A%252F%252Fpurl.obolibrary.org%252Fobo%252FMONDO_0002129/descendants"
bone_cancer_terms <- get_descendants(url)
[1] "Number of terms collected:"
[1] 232
[1] "\n Some example terms"
[1] "cancer affecting bone of limb skeleton"
[2] "bone marrow cancer"
[3] "bone sarcoma"
[4] "primary bone lymphoma"
[5] "adult extraskeletal osteosarcoma"
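The descendants URLs on this page contain the term IRI percent-encoded twice (`:` appears as `%253A`, `/` as `%252F`). As an illustration only, the sketch below builds such a URL from an ontology ID and an IRI with base R's utils::URLencode. The function name ols_descendants_url is hypothetical and not part of the original analysis.

```r
# Hypothetical helper: build an OLS4 "descendants" URL from a term IRI.
# The IRI is percent-encoded twice, matching the URLs used on this page.
ols_descendants_url <- function(ontology, iri) {
  once  <- utils::URLencode(iri,  reserved = TRUE, repeated = TRUE)
  # repeated = TRUE forces a second pass, turning "%" into "%25"
  twice <- utils::URLencode(once, reserved = TRUE, repeated = TRUE)
  paste0("http://www.ebi.ac.uk/ols4/api/ontologies/", ontology,
         "/terms/", twice, "/descendants")
}

url <- ols_descendants_url("mondo",
                           "http://purl.obolibrary.org/obo/MONDO_0002129")
```

Generating the URL from the IRI avoids hand-editing long encoded strings when switching between MONDO, DOID, and EFO terms.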
bone_cancer_terms = stringr::str_replace_all(bone_cancer_terms,
"\\bcarcinoma",
"cancer")
bone_cancer_terms = c("malignant bone neoplasm",
bone_cancer_terms)
gwas_study_info = gwas_study_info |>
mutate(l1_all_disease_terms =
stringr::str_replace_all(l1_all_disease_terms,
pattern = paste0(bone_cancer_terms, collapse = "|"),
"bone cancer"
)
)
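The replacement above joins the raw ontology labels with "|". Two things can go wrong with that approach: regex metacharacters in a label (such as parentheses) are interpreted as regex syntax, and a shorter label listed before a longer one that contains it wins the alternation, leaving a fragment behind. The following is a sketch of one way to guard against both, using base R only. The helper names build_term_pattern and collapse_terms and the example labels are hypothetical. Unlike the original code, it matches whole comma-separated terms only, which is a deliberate change in behaviour.

```r
# Hypothetical helpers, not part of the original analysis.
build_term_pattern <- function(terms) {
  terms <- unique(terms[nzchar(terms)])
  # Longest labels first, so a longer label is tried before any shorter
  # label it contains
  terms <- terms[order(nchar(terms), decreasing = TRUE)]
  # Escape regex metacharacters such as "(", ")", "." and "+"
  escaped <- gsub("([.|()\\^{}+$*?]|\\[|\\])", "\\\\\\1", terms)
  # Match only whole terms in a ", "-separated list
  paste0("(?:^|(?<=, ))(?:", paste(escaped, collapse = "|"), ")(?=,|$)")
}

collapse_terms <- function(x, terms, label) {
  gsub(build_term_pattern(terms), label, x, perl = TRUE)
}

# Made-up labels for illustration
x <- "asthma, bone sarcoma of limb, bone sarcoma"
collapse_terms(x, c("bone sarcoma", "bone sarcoma of limb"), "bone cancer")
# returns "asthma, bone cancer, bone cancer"
```

With the plain "bone sarcoma|bone sarcoma of limb" pattern, the same input would become "asthma, bone cancer of limb, bone cancer".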
url <- "http://www.ebi.ac.uk/ols4/api/ontologies/doid/terms/http%253A%252F%252Fpurl.obolibrary.org%252Fobo%252FDOID_4007/descendants"
# maybe use urinary bladder cancer instead
bladder_cancer_terms <- get_descendants(url)
[1] "Number of terms collected:"
[1] 33
[1] "\n Some example terms"
[1] "superficial urinary bladder cancer"
[2] "jewett-marshall bladder cancer"
[3] "urinary bladder small cell neuroendocrine carcinoma"
[4] "bladder urothelial carcinoma"
[5] "bladder squamous cell carcinoma"
bladder_cancer_terms = stringr::str_replace_all(bladder_cancer_terms,
"\\bcarcinoma",
"cancer")
gwas_study_info = gwas_study_info |>
mutate(l1_all_disease_terms =
stringr::str_replace_all(l1_all_disease_terms ,
pattern = paste0(bladder_cancer_terms, collapse = "|"),
"bladder cancer"
)
)
breast_cancer_terms <- grep("breast cancer", unique(diseases), value = TRUE)
gwas_study_info =
gwas_study_info |>
mutate(l1_all_disease_terms =
stringr::str_replace_all(l1_all_disease_terms ,
pattern = paste0(breast_cancer_terms, collapse = "|"),
"breast cancer"
)
) |>
mutate(l1_all_disease_terms =
stringr::str_replace_all(l1_all_disease_terms ,
pattern = "breast cancer in situ",
"breast cancer"
)
) |>
mutate(l1_all_disease_terms =
stringr::str_replace_all(l1_all_disease_terms,
pattern = "invasive lobular cancer",
"breast cancer"
)
)
gwas_study_info = gwas_study_info |>
mutate(l1_all_disease_terms =
str_replace_all(l1_all_disease_terms,
"benign neoplasm of (.*?)(?=,|$)|benign neoplasm of (.*?) (.*?)(?=,|$)",
"benign neoplasm")
)
gwas_study_info = gwas_study_info |>
mutate(l1_all_disease_terms =
str_replace_all(l1_all_disease_terms,
"benign (.*?) neoplasm(?=,|$)|benign (.*?) (.*?) neoplasm(?=,|$)",
"benign neoplasm")
)
gwas_study_info = gwas_study_info |>
mutate(l1_all_disease_terms =
str_replace_all(l1_all_disease_terms,
"(.*?) benign neoplasm(?=,|$)|(.*?) (.*?) neoplasm(?=,|$)",
"benign neoplasm")
)
gwas_study_info = gwas_study_info |>
mutate(l1_all_disease_terms =
str_replace_all(l1_all_disease_terms,
"polyp of (.*?)(?=,|$)|polyp of (.*?) (.*?)(?=,|$)",
"benign neoplasm")
)
gwas_study_info = gwas_study_info |>
mutate(l1_all_disease_terms =
str_replace_all(l1_all_disease_terms,
"(?=,|^)(.*?) polyp(?=,|$)|(?=,|^)(.*?) (.*?) polyp (?=,|$)",
"benign neoplasm")
)
# https://my.clevelandclinic.org/health/diseases/21477-adenomas - benign
other_benign_neoplasms = c("adenomatous colon polyp",
"colorectal adenoma",
"pituitary gland adenoma",
"aldosterone-producing adenoma",
"female genital tract polyp",
"polyp"
)
gwas_study_info = gwas_study_info |>
mutate(l1_all_disease_terms =
str_replace_all(l1_all_disease_terms,
paste0(other_benign_neoplasms, collapse = "|"),
"benign neoplasm")
)
url <- "http://www.ebi.ac.uk/ols4/api/ontologies/efo/terms/http%253A%252F%252Fwww.ebi.ac.uk%252Fefo%252FEFO_0000326/descendants"
cns_cancer_terms <- get_descendants(url)
[1] "Number of terms collected:"
[1] 173
[1] "\n Some example terms"
[1] "malignant jugulotympanic paraganglioma"
[2] "malignant adrenal gland pheochromocytoma"
[3] "central nervous system lymphoma"
[4] "central nervous system embryonal neoplasm"
[5] "oligoastrocytoma"
cns_cancer_terms = stringr::str_replace_all(cns_cancer_terms,
"\\bcarcinoma",
"cancer")
pattern = paste0(cns_cancer_terms, collapse = "|")
gwas_study_info = gwas_study_info |>
mutate(l1_all_disease_terms =
stringr::str_replace_all(l1_all_disease_terms,
pattern = pattern,
"central nervous system cancer"
)
)
url <- "http://www.ebi.ac.uk/ols4/api/ontologies/doid/terms/http%253A%252F%252Fpurl.obolibrary.org%252Fobo%252FDOID_4362/descendants"
cervical_cancer_terms <- get_descendants(url)
[1] "Number of terms collected:"
[1] 37
[1] "\n Some example terms"
[1] "cervix endometrial stromal tumor" "cervix melanoma"
[3] "cervical alveolar soft part sarcoma" "epithelioid trophoblastic tumor"
[5] "cervical adenosquamous carcinoma"
cervical_cancer_terms = stringr::str_replace_all(cervical_cancer_terms,
"\\bcarcinoma",
"cancer")
cervical_cancer_terms = c("cervical intraepithelial neoplasia grade 2/3",
"uterine cervical cancer in situ",
cervical_cancer_terms)
gwas_study_info = gwas_study_info |>
mutate(l1_all_disease_terms =
stringr::str_replace_all(l1_all_disease_terms,
pattern = "uterine cervical cancer in situ",
"cervical cancer"
)
)
gwas_study_info = gwas_study_info |>
mutate(l1_all_disease_terms =
stringr::str_replace_all(l1_all_disease_terms,
pattern = paste0(cervical_cancer_terms, collapse = "|"),
"cervical cancer"
)
)
gwas_study_info = gwas_study_info |>
mutate(l1_all_disease_terms =
stringr::str_replace_all(l1_all_disease_terms,
pattern = "uterine cervical cancer in situ",
"cervical cancer"
)
)
url <- "http://www.ebi.ac.uk/ols4/api/ontologies/mondo/terms/http%253A%252F%252Fpurl.obolibrary.org%252Fobo%252FMONDO_0005575/descendants"
colorectal_cancer_terms <- get_descendants(url)
[1] "Number of terms collected:"
[1] 106
[1] "\n Some example terms"
[1] "colorectal lymphoma" "colorectal carcinoma"
[3] "familial colorectal cancer" "malignant colon neoplasm"
[5] "rectal cancer"
colorectal_cancer_terms = stringr::str_replace_all(colorectal_cancer_terms,
"\\bcarcinoma",
"cancer")
colorectal_cancer_terms= c("metastatic colorectal cancer",
"rectum cancer",
colorectal_cancer_terms)
gwas_study_info =
gwas_study_info |>
mutate(l1_all_disease_terms =
stringr::str_replace_all(l1_all_disease_terms,
pattern = paste0(colorectal_cancer_terms, collapse = "|"),
"colorectal cancer"
)
) |>
mutate(l1_all_disease_terms =
stringr::str_replace_all(l1_all_disease_terms,
pattern = "colorectal mucinous adenocarcinoma",
"colorectal cancer"
)
) |>
mutate(l1_all_disease_terms =
stringr::str_replace_all(l1_all_disease_terms,
pattern = "metachronous colorectal adenoma",
"colorectal cancer"
)
)
url <- "http://www.ebi.ac.uk/ols4/api/ontologies/mondo/terms/http%253A%252F%252Fpurl.obolibrary.org%252Fobo%252FMONDO_0011962/descendants"
endometrial_cancer_terms <- get_descendants(url)
[1] "Number of terms collected:"
[1] 23
[1] "\n Some example terms"
[1] "endometrioid stromal sarcoma"
[2] "endometrial carcinoma"
[3] "endometrioid stromal sarcoma of the cervix"
[4] "uterine corpus endometrial stromal sarcoma"
[5] "endometrioid stromal sarcoma of the vagina"
# also: http://www.ebi.ac.uk/efo/EFO_1001514: endometrial endometrioid carcinoma
endometrial_cancer_terms = c("endometrial endometrioid carcinoma",
endometrial_cancer_terms)
endometrial_cancer_terms = stringr::str_replace_all(endometrial_cancer_terms,
"\\bcarcinoma",
"cancer")
gwas_study_info = gwas_study_info |>
mutate(l1_all_disease_terms =
stringr::str_replace_all(l1_all_disease_terms ,
pattern = paste0(endometrial_cancer_terms, collapse = "|"),
"endometrial cancer"
)
)
esophageal_cancer_terms <- c("esophageal adenocarcinoma",
"esophageal squamous cell cancer")
gwas_study_info = gwas_study_info |>
mutate(l1_all_disease_terms =
stringr::str_replace_all(l1_all_disease_terms ,
pattern = paste0(esophageal_cancer_terms, collapse = "|"),
"esophageal cancer"
)
)
url <- "http://www.ebi.ac.uk/ols4/api/ontologies/mondo/terms/http%253A%252F%252Fpurl.obolibrary.org%252Fobo%252FMONDO_0002236/descendants"
ocular_cancer_terms <- get_descendants(url)
[1] "Number of terms collected:"
[1] 75
[1] "\n Some example terms"
[1] "metastatic malignant neoplasm in the eye"
[2] "eyelid cancer"
[3] "ocular melanoma"
[4] "eye lymphoma"
[5] "cornea cancer"
ocular_cancer_terms = stringr::str_replace_all(ocular_cancer_terms,
"\\bcarcinoma",
"cancer")
gwas_study_info = gwas_study_info |>
mutate(l1_all_disease_terms =
stringr::str_replace_all(l1_all_disease_terms ,
pattern = paste0(ocular_cancer_terms, collapse = "|"),
"ocular cancer"
)
)
gwas_study_info = gwas_study_info |>
mutate(l1_all_disease_terms =
stringr::str_replace_all(l1_all_disease_terms,
"cancer of gallbladder and extrahepatic biliary tract",
"gallbladder and bilary tract cancer"
)
)
gwas_study_info = gwas_study_info |>
mutate(l1_all_disease_terms =
stringr::str_replace_all(l1_all_disease_terms ,
"nodular sclerosis hodgkin lymphoma",
"hodgkins lymphoma"
))
gwas_study_info = gwas_study_info |>
mutate(l1_all_disease_terms =
stringr::str_replace_all(l1_all_disease_terms ,
"head and neck squamous cell cancer",
"head and neck cancer, squamous cell cancer"
))
# Collected here but not used in any replacement on this page
intestinal_cancer_terms <- c("small intestine cancer",
                             "small bowel cancer")
url <- "http://www.ebi.ac.uk/ols4/api/ontologies/doid/terms/http%253A%252F%252Fpurl.obolibrary.org%252Fobo%252FDOID_263/descendants"
kidney_cancer_terms <- get_descendants(url)
[1] "Number of terms collected:"
[1] 51
[1] "\n Some example terms"
[1] "malignant cystic nephroma" "kidney liposarcoma"
[3] "renal pelvis carcinoma" "congenital mesoblastic nephroma"
[5] "renal carcinoma"
kidney_cancer_terms = c("renal cell carcinoma",
"clear cell renal carcinoma",
"clear cell renal cell carcinoma",
kidney_cancer_terms)
kidney_cancer_terms = stringr::str_replace_all(kidney_cancer_terms,
"\\bcarcinoma",
"cancer")
kidney_cancer_terms = stringr::str_replace_all(kidney_cancer_terms,
"renal cancer",
"kidney cancer")
gwas_study_info =
gwas_study_info |>
mutate(l1_all_disease_terms =
stringr::str_replace_all(l1_all_disease_terms ,
pattern = paste0(kidney_cancer_terms, collapse = "|"),
"kidney cancer"
)
)
gwas_study_info =
gwas_study_info |>
mutate(l1_all_disease_terms =
stringr::str_replace_all(l1_all_disease_terms ,
pattern = "laryngeal squamous cell cancer|laryngeal cancer",
"larynx cancer"
)
)
url <- "http://www.ebi.ac.uk/ols4/api/ontologies/efo/terms/http%253A%252F%252Fwww.ebi.ac.uk%252Fefo%252FEFO_0000565/descendants"
leukemia_terms <- get_descendants(url)
[1] "Number of terms collected:"
[1] 133
[1] "\n Some example terms"
[1] "chronic eosinophilic leukemia, not otherwise specified"
[2] "acute leukemia"
[3] "mast-cell leukemia"
[4] "lymphoid leukemia"
[5] "myeloid leukemia"
leukemia_terms = stringr::str_replace_all(leukemia_terms,
"\\bcarcinoma",
"cancer")
gwas_study_info =
gwas_study_info |>
mutate(l1_all_disease_terms =
stringr::str_replace_all(l1_all_disease_terms ,
pattern = paste0(leukemia_terms, collapse = "|"),
"leukemia"
)
)
url <- "http://www.ebi.ac.uk/ols4/api/ontologies/efo/terms/http%253A%252F%252Fwww.ebi.ac.uk%252Fefo%252FEFO_0005570/descendants"
lip_oral_cavity_cancer_terms <- get_descendants(url)
[1] "Number of terms collected:"
[1] 51
[1] "\n Some example terms"
[1] "squamous odontogenic tumor" "lip cancer"
[3] "gum cancer" "oral cavity carcinoma"
[5] "vestibule of mouth cancer"
lip_oral_cavity_cancer_terms = stringr::str_replace_all(lip_oral_cavity_cancer_terms,
"\\bcarcinoma",
"cancer")
gwas_study_info = gwas_study_info |>
mutate(l1_all_disease_terms =
stringr::str_replace_all(l1_all_disease_terms ,
pattern = paste0(lip_oral_cavity_cancer_terms, collapse = "|"),
"lip and oral cavity cancer"
)
)
url <- "http://www.ebi.ac.uk/ols4/api/ontologies/mondo/terms/http%253A%252F%252Fpurl.obolibrary.org%252Fobo%252FMONDO_0002691/descendants"
liver_cancer_terms <- get_descendants(url)
[1] "Number of terms collected:"
[1] 76
[1] "\n Some example terms"
[1] "carcinoma of liver and intrahepatic biliary tract"
[2] "calcifying nested epithelial stromal tumor of the liver"
[3] "liver lymphoma"
[4] "biliary tract cancer"
[5] "liver sarcoma"
liver_cancer_terms = stringr::str_replace_all(liver_cancer_terms,
"\\bcarcinoma",
"cancer")
liver_cancer_terms = c("hepatitis virus-related liver cancer",
liver_cancer_terms)
gwas_study_info =
gwas_study_info |>
mutate(l1_all_disease_terms =
stringr::str_replace_all(l1_all_disease_terms ,
pattern = paste0(liver_cancer_terms, collapse = "|"),
"liver cancer"
)
)
gwas_study_info =
gwas_study_info |>
mutate(l1_all_disease_terms =
stringr::str_replace_all(l1_all_disease_terms ,
pattern = "hepatitis virus-related liver cancer",
"liver cancer"
)
)
url <- "http://www.ebi.ac.uk/ols4/api/ontologies/mondo/terms/http%253A%252F%252Fpurl.obolibrary.org%252Fobo%252FMONDO_0008903/descendants"
lung_cancer_terms <- get_descendants(url)
[1] "Number of terms collected:"
[1] 63
[1] "\n Some example terms"
[1] "graham-boyle-troxell syndrome" "malignant superior sulcus neoplasm"
[3] "lung carcinoma" "lung hilum cancer"
[5] "lung lymphoma"
lung_cancer_terms = stringr::str_replace_all(lung_cancer_terms,
"\\bcarcinoma",
"cancer")
gwas_study_info =
gwas_study_info |>
mutate(l1_all_disease_terms =
stringr::str_replace_all(l1_all_disease_terms ,
pattern = paste0(lung_cancer_terms, collapse = "|"),
"lung cancer"
)
)
url <- "http://www.ebi.ac.uk/ols4/api/ontologies/mondo/terms/http%253A%252F%252Fpurl.obolibrary.org%252Fobo%252FMONDO_0008170/descendants"
ovarian_cancer_terms <- get_descendants(url)
[1] "Number of terms collected:"
[1] 77
[1] "\n Some example terms"
[1] "malignant non-epithelial tumor of ovary"
[2] "malignant epithelial tumor of ovary"
[3] "familial ovarian cancer"
[4] "ovarian endometrioid adenocarcinofibroma"
[5] "ovarian neuroendocrine neoplasm"
ovarian_cancer_terms = stringr::str_replace_all(ovarian_cancer_terms,
"\\bcarcinoma",
"cancer")
ovarian_cancer_terms = c("high grade serous ovarian cancer",
"high grade ovarian cancers", # plural first, so the trailing "s" is consumed by the match
"high grade ovarian cancer",
"ovarian endometrioid cancer", # http://www.ebi.ac.uk/efo/EFO_1001515 - ovarian endometrioid carcinoma
"ovarian serous cancer", # http://www.ebi.ac.uk/efo/EFO_1001516 - ovarian serous carcinoma
ovarian_cancer_terms
)
gwas_study_info = gwas_study_info |>
mutate(l1_all_disease_terms =
stringr::str_replace_all(l1_all_disease_terms,
pattern = paste0(ovarian_cancer_terms, collapse = "|"),
"ovarian cancer"
)
)
gwas_study_info = gwas_study_info |>
mutate(l1_all_disease_terms =
stringr::str_replace_all(l1_all_disease_terms,
pattern = "high grade ovarian cancer",
"ovarian cancer"
)
)
url <- "http://www.ebi.ac.uk/ols4/api/ontologies/mondo/terms/http%253A%252F%252Fpurl.obolibrary.org%252Fobo%252FMONDO_0009831/descendants"
pancreatic_cancer_terms <- get_descendants(url)
[1] "Number of terms collected:"
[1] 37
[1] "\n Some example terms"
[1] "pancreatic endocrine carcinoma"
[2] "pancreas sarcoma"
[3] "malignant exocrine pancreas neoplasm"
[4] "pancreas lymphoma"
[5] "pancreatic small cell neuroendocrine carcinoma"
pancreatic_cancer_terms = stringr::str_replace_all(pancreatic_cancer_terms,
"\\bcarcinoma",
"cancer")
gwas_study_info = gwas_study_info |>
mutate(l1_all_disease_terms =
stringr::str_replace_all(l1_all_disease_terms ,
pattern = paste0(pancreatic_cancer_terms, collapse = "|"),
"pancreatic cancer"
)
)
url <- "http://www.ebi.ac.uk/ols4/api/ontologies/doid/terms/http%253A%252F%252Fpurl.obolibrary.org%252Fobo%252FDOID_10283/descendants"
prostate_cancer_terms <- get_descendants(url)
[1] "Number of terms collected:"
[1] 21
[1] "\n Some example terms"
[1] "prostate small cell carcinoma" "adenosquamous prostate carcinoma"
[3] "prostate sarcoma" "prostate neuroendocrine neoplasm"
[5] "prostate lymphoma"
prostate_cancer_terms = stringr::str_replace_all(prostate_cancer_terms,
"\\bcarcinoma",
"cancer")
prostate_cancer_terms = c("grade iii prostatic intraepithelial neoplasia",
"metastatic prostate cancer",
prostate_cancer_terms)
gwas_study_info = gwas_study_info |>
mutate(l1_all_disease_terms =
stringr::str_replace_all(l1_all_disease_terms,
pattern = paste0(prostate_cancer_terms, collapse = "|"),
"prostate cancer"
)
)
url <- "http://www.ebi.ac.uk/ols4/api/ontologies/efo/terms/http%253A%252F%252Fwww.ebi.ac.uk%252Fefo%252FEFO_0005952/descendants"
nhl_terms <- get_descendants(url)
[1] "Number of terms collected:"
[1] 93
[1] "\n Some example terms"
[1] "b-cell non-hodgkins lymphoma" "lymphoma, aids-related"
[3] "sezary's disease" "acute lymphoblastic leukemia"
[5] "gastric non-hodgkin lymphoma"
nhl_terms = stringr::str_replace_all(nhl_terms,
"\\bcarcinoma",
"cancer")
nhl_terms = c("central nervous system non-hodgkin lymphoma",
"lymphoblastic lymphoma",
"extranodal nasal nk/t cell lymphoma", # https://www.ebi.ac.uk/ols4/ontologies/ordo/classes/http%253A%252F%252Fwww.orpha.net%252FORDO%252FOrphanet_86879
"follicular lymphoma", # http://purl.obolibrary.org/obo/DOID_0050873
"marginal zone b-cell lymphoma",
"diffuse large b-cell lymphoma",
nhl_terms)
gwas_study_info =
gwas_study_info |>
mutate(l1_all_disease_terms =
stringr::str_replace_all(l1_all_disease_terms ,
pattern = paste0(nhl_terms, collapse = "|"),
"non-hodgkins lymphoma"
)
)
url <- "http://www.ebi.ac.uk/ols4/api/ontologies/efo/terms/http%253A%252F%252Fwww.ebi.ac.uk%252Fefo%252FEFO_0009260/descendants"
non_melanoma_skin_cancer_terms <- get_descendants(url)
[1] "Number of terms collected:"
[1] 11
[1] "\n Some example terms"
[1] "keratinocyte carcinoma"
[2] "basal cell carcinoma"
[3] "skin basal cell carcinoma"
[4] "skin basosquamous cell carcinoma"
[5] "salivary gland basal cell adenocarcinoma"
non_melanoma_skin_cancer_terms = stringr::str_replace_all(non_melanoma_skin_cancer_terms,
"\\bcarcinoma",
"cancer")
gwas_study_info = gwas_study_info |>
mutate(l1_all_disease_terms =
stringr::str_replace_all(l1_all_disease_terms ,
pattern = paste0(non_melanoma_skin_cancer_terms, collapse = "|"),
"non-melanoma skin cancer"
)
)
url <- "http://www.ebi.ac.uk/ols4/api/ontologies/doid/terms/http%253A%252F%252Fpurl.obolibrary.org%252Fobo%252FDOID_0060119/descendants"
other_pharynx_cancer_terms <- get_descendants(url)
[1] "Number of terms collected:"
[1] 16
[1] "\n Some example terms"
[1] "nasopharynx carcinoma" "oropharynx cancer"
[3] "hypopharynx cancer" "pharynx squamous cell carcinoma"
[5] "tonsillar fossa cancer"
other_pharynx_cancer_terms = stringr::str_replace_all(other_pharynx_cancer_terms,
"\\bcarcinoma",
"cancer")
other_pharynx_cancer_terms = c("hypopharyngeal cancer",
other_pharynx_cancer_terms)
gwas_study_info = gwas_study_info |>
mutate(l1_all_disease_terms =
stringr::str_replace_all(l1_all_disease_terms ,
pattern = paste0(other_pharynx_cancer_terms, collapse = "|"),
"other pharynx cancer"
)
)
url <- "http://www.ebi.ac.uk/ols4/api/ontologies/doid/terms/http%253A%252F%252Fpurl.obolibrary.org%252Fobo%252FDOID_10534/descendants"
stomach_cancer_terms <- get_descendants(url)
[1] "Number of terms collected:"
[1] 32
[1] "\n Some example terms"
[1] "gastric liposarcoma" "gastric gastrinoma"
[3] "gastric teratoma" "stomach carcinoma"
[5] "malignant gastric germ cell tumor"
stomach_cancer_terms = stringr::str_replace_all(stomach_cancer_terms,
"\\bcarcinoma",
"cancer")
stomach_cancer_terms = c(
"diffuse stomach cancer",
"gastric cancer",
stomach_cancer_terms
)
gwas_study_info = gwas_study_info |>
mutate(l1_all_disease_terms =
stringr::str_replace_all(l1_all_disease_terms ,
pattern = paste0(stomach_cancer_terms, collapse = "|"),
"stomach cancer"
)
) |>
mutate(l1_all_disease_terms =
stringr::str_replace_all(l1_all_disease_terms ,
pattern = "diffuse stomach cancer",
"stomach cancer"
)
)
gwas_study_info = gwas_study_info |>
mutate(l1_all_disease_terms =
stringr::str_replace_all(l1_all_disease_terms ,
pattern = "cutaneous squamous cell cancer",
"squamous cell cancer"
)
)
url <- "http://www.ebi.ac.uk/ols4/api/ontologies/doid/terms/http%253A%252F%252Fpurl.obolibrary.org%252Fobo%252FDOID_1781/descendants"
thyroid_cancer_terms <- get_descendants(url)
[1] "Number of terms collected:"
[1] 24
[1] "\n Some example terms"
[1] "thyroid sarcoma"
[2] "thyroid gland carcinoma"
[3] "thyroid lymphoma"
[4] "thyroid angiosarcoma"
[5] "thyroid gland mucoepidermoid carcinoma"
thyroid_cancer_terms = stringr::str_replace_all(thyroid_cancer_terms,
"\\bcarcinoma",
"cancer")
thyroid_cancer_terms = c("differentiated thyroid cancer",
thyroid_cancer_terms)
gwas_study_info = gwas_study_info |>
mutate(l1_all_disease_terms =
stringr::str_replace_all(l1_all_disease_terms ,
pattern = paste0(thyroid_cancer_terms, collapse = "|"),
"thyroid cancer"
)
)
uterine_cancer_terms <- c("uterine corpus cancer",
"uterine adnexa cancer")
gwas_study_info = gwas_study_info |>
mutate(l1_all_disease_terms =
stringr::str_replace_all(l1_all_disease_terms ,
pattern = paste0(uterine_cancer_terms, collapse = "|"),
"uterine cancer"
)
)
gwas_study_info = gwas_study_info |>
mutate(l1_all_disease_terms =
stringr::str_replace_all(l1_all_disease_terms,
"uveal melanoma|uveal melanoma disease severity|epithelioid cell uveal melanoma|choroidal melanoma",
"ocular melanoma"
))
gwas_study_info = gwas_study_info |>
mutate(l1_all_disease_terms =
stringr::str_replace_all(l1_all_disease_terms,
"ocular melanoma disease severity",
"ocular melanoma"
))
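After the replacements above, a study whose original annotation listed several subtypes of the same cancer may carry the same collapsed label more than once (for example "breast cancer, breast cancer"). Because the study counts further down group on the whole string, such rows would fall into a separate group from "breast cancer". The following is a sketch of a clean-up step, assuming terms are separated by ", " as elsewhere on this page. The helper name dedupe_terms is hypothetical and not part of the original analysis.

```r
# Hypothetical helper: drop repeated terms within each comma-separated string.
dedupe_terms <- function(x) {
  out <- vapply(strsplit(x, ",\\s*"), function(tokens) {
    # Keep the first occurrence of each term, preserving order
    paste(unique(trimws(tokens)), collapse = ", ")
  }, character(1))
  out[is.na(x)] <- NA_character_  # keep missing values missing
  out
}

dedupe_terms(c("breast cancer, breast cancer, asthma", "", "asthma"))
# returns "breast cancer, asthma" "" "asthma"
```

It could be applied to `l1_all_disease_terms` inside a `mutate()` call before recounting studies per term.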
# Recount distinct publications per term string after collapsing cancer subtypes
n_studies_trait = gwas_study_info |>
  dplyr::filter(DISEASE_STUDY == TRUE) |>
  dplyr::select(l1_all_disease_terms, PUBMED_ID) |>
  dplyr::distinct() |>
  dplyr::group_by(l1_all_disease_terms) |>
  dplyr::summarise(n_studies = dplyr::n()) |>
  dplyr::arrange(desc(n_studies))
head(n_studies_trait)
# A tibble: 6 × 2
  l1_all_disease_terms      n_studies
  <chr>                         <int>
1 type 2 diabetes mellitus        145
2 asthma                          131
3 alzheimers disease              124
4 breast cancer                   122
5 major depressive disorder       108
6 colorectal cancer               104
dim(n_studies_trait)
[1] 2722 2
diseases <- stringr::str_split(pattern = ", ",
gwas_study_info$l1_all_disease_terms[gwas_study_info$l1_all_disease_terms != ""]) |>
unlist() |>
stringr::str_trim()
test <- data.frame(trait = unique(diseases))
length(unique(diseases))
[1] 1875
# make frequency table
freq <- table(as.factor(diseases))
# sort in decreasing order
freq_sorted <- sort(freq, decreasing = TRUE)
# show top N, e.g. top 10
head(freq_sorted, 10)
   chronic kidney disease              hypertension  type 2 diabetes mellitus
                    10835                      7093                       922
  coronary artery disease major depressive disorder           benign neoplasm
                      514                       471                       430
       alzheimers disease             breast cancer                    asthma
                      422                       404                       357
            schizophrenia
                      356
fwrite(gwas_study_info,
here::here("output/gwas_cat/gwas_study_info_group_l1_v2.csv")
)
sessionInfo()
R version 4.3.1 (2023-06-16)
Platform: aarch64-apple-darwin20 (64-bit)
Running under: macOS 15.6.1
Matrix products: default
BLAS: /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib/libRlapack.dylib; LAPACK version 3.11.0
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
time zone: America/Los_Angeles
tzcode source: internal
attached base packages:
[1] stats graphics grDevices datasets utils methods base
other attached packages:
[1] jsonlite_2.0.0 httr_1.4.7 stringr_1.5.1 data.table_1.17.8
[5] dplyr_1.1.4 workflowr_1.7.1
loaded via a namespace (and not attached):
[1] compiler_4.3.1 renv_1.0.3 promises_1.3.3 tidyselect_1.2.1
[5] Rcpp_1.1.0 git2r_0.36.2 callr_3.7.6 later_1.4.2
[9] jquerylib_0.1.4 yaml_2.3.10 fastmap_1.2.0 here_1.0.1
[13] R6_2.6.1 generics_0.1.4 curl_6.4.0 knitr_1.50
[17] tibble_3.3.0 rprojroot_2.1.0 bslib_0.9.0 pillar_1.11.0
[21] rlang_1.1.6 utf8_1.2.6 cachem_1.1.0 stringi_1.8.7
[25] httpuv_1.6.16 xfun_0.52 getPass_0.2-4 fs_1.6.6
[29] sass_0.4.10 cli_3.6.5 withr_3.0.2 magrittr_2.0.3
[33] ps_1.9.1 digest_0.6.37 processx_3.8.6 rstudioapi_0.17.1
[37] lifecycle_1.0.4 vctrs_0.6.5 evaluate_1.0.4 glue_1.8.0
[41] whisker_0.4.1 rmarkdown_2.29 tools_4.3.1 pkgconfig_2.0.3
[45] htmltools_0.5.8.1