Last updated: 2025-09-22
Checks: 7 passed, 0 failed
Knit directory: genomics_ancest_disease_dispar/
This reproducible R Markdown analysis was created with workflowr (version 1.7.1). The Checks tab describes the reproducibility checks that were applied when the results were created. The Past versions tab lists the development history.
Great! Since the R Markdown file has been committed to the Git repository, you know the exact version of the code that produced these results.
Great job! The global environment was empty. Objects defined in the global environment can affect the analysis in your R Markdown file in unknown ways. For reproducibility it’s best to always run the code in an empty environment.
The command set.seed(20220216) was run prior to running the code in the R Markdown file. Setting a seed ensures that any results that rely on randomness, e.g. subsampling or permutations, are reproducible.
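As a small illustration of the point about seeds, the sketch below draws the same permutation twice after resetting the seed. Only the seed value comes from this page; the variable names are illustrative.

```r
# Resetting the seed before each draw reproduces the same random permutation.
set.seed(20220216)
first_draw <- sample(10)

set.seed(20220216)
second_draw <- sample(10)

# Without resetting the seed, the next draw continues the random stream
# (it will usually differ from the first, though that is not guaranteed).
third_draw <- sample(10)

identical(first_draw, second_draw)  # TRUE
```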
Great job! Recording the operating system, R version, and package versions is critical for reproducibility.
Nice! There were no cached chunks for this analysis, so you can be confident that you successfully produced the results during this run.
Great job! Using relative paths to the files within your workflowr project makes it easier to run your code on other machines.
Great! You are using Git for version control. Tracking code development and connecting the code version to the results is critical for reproducibility.
The results in this page were generated with repository version 62b9234. See the Past versions tab to see a history of the changes made to the R Markdown and HTML files.
Note that you need to be careful to ensure that all relevant files for the analysis have been committed to Git prior to generating the results (you can use wflow_publish or wflow_git_commit). workflowr only checks the R Markdown file, but you know if there are other scripts or data files that it depends on. Below is the status of the Git repository when the results were generated:
Ignored files:
Ignored: .DS_Store
Ignored: .Rproj.user/
Ignored: data/.DS_Store
Ignored: data/gbd/.DS_Store
Ignored: data/gbd/IHME-GBD_2021_DATA-d8cf695e-1.csv
Ignored: data/gbd/ihme_gbd_2019_global_disease_burden_rate_all_ages.csv
Ignored: data/gbd/ihme_gbd_2019_global_paf_rate_percent_all_ages.csv
Ignored: data/gbd/ihme_gbd_2021_global_disease_burden_rate_all_ages.csv
Ignored: data/gbd/ihme_gbd_2021_global_paf_rate_percent_all_ages.csv
Ignored: data/gwas_catalog/
Ignored: data/who/
Ignored: output/gwas_cat/
Ignored: output/gwas_study_info_cohort_corrected.csv
Ignored: output/gwas_study_info_trait_corrected.csv
Ignored: output/gwas_study_info_trait_ontology_info.csv
Ignored: output/gwas_study_info_trait_ontology_info_l1.csv
Ignored: output/gwas_study_info_trait_ontology_info_l2.csv
Ignored: output/trait_ontology/
Ignored: renv/
Unstaged changes:
Modified: analysis/level_1_disease_group_non_cancer.Rmd
Modified: analysis/level_2_disease_group.Rmd
Modified: code/get_term_descendants.R
Note that any generated files, e.g. HTML, png, CSS, etc., are not included in this status report because it is ok for generated content to have uncommitted changes.
These are the previous versions of the repository in which changes were made to the R Markdown (analysis/exclude_infectious_diseases.Rmd) and HTML (docs/exclude_infectious_diseases.html) files. If you’ve configured a remote Git repository (see ?wflow_git_remote), click on the hyperlinks in the table below to view the files as they were in that past version.
File | Version | Author | Date | Message |
---|---|---|---|---|
Rmd | 62b9234 | IJbeasley | 2025-09-22 | Excluding infectious disease |
library(dplyr)
library(data.table)
library(stringr)
source(here::here("code/get_term_descendants.R"))
gwas_study_info <- fread(here::here("output/gwas_cat/gwas_study_info_disease_trait_simplified.csv"))
n_studies_trait = gwas_study_info |>
  dplyr::filter(DISEASE_STUDY == TRUE) |>
  dplyr::select(collected_all_disease_terms, PUBMED_ID) |>
  dplyr::distinct() |>
  dplyr::group_by(collected_all_disease_terms) |>
  dplyr::summarise(n_studies = dplyr::n()) |>
  dplyr::arrange(desc(n_studies))
head(n_studies_trait)
# A tibble: 6 × 2
collected_all_disease_terms n_studies
<chr> <int>
1 type 2 diabetes mellitus 145
2 alzheimers disease 116
3 breast cancer 112
4 asthma 110
5 major depressive disorder 108
6 schizophrenia 103
dim(n_studies_trait)
[1] 3050 2
diseases <- stringr::str_split(pattern = ", ",
gwas_study_info$collected_all_disease_terms[gwas_study_info$collected_all_disease_terms != ""]) |>
unlist() |>
stringr::str_trim()
length(unique(diseases))
[1] 2211
gwas_study_info = gwas_study_info |>
mutate(collected_all_disease_terms =
stringr::str_replace_all(collected_all_disease_terms,
"hiv-1 infection",
"hiv infection"
))
gwas_study_info = gwas_study_info |>
mutate(collected_all_disease_terms =
stringr::str_remove_all(collected_all_disease_terms,
pattern = "(^|, )hiv infection(,|$)"
)
)
gwas_study_info = gwas_study_info |>
mutate(collected_all_disease_terms =
stringr::str_remove_all(collected_all_disease_terms,
pattern = "^hiv infection$"
)
)
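The removal pattern used throughout this page, (^|, )term(,|$), also consumes the separators around the matched term. The sketch below shows how that behaves on a few toy strings. It assumes base R gsub() behaves like stringr::str_remove_all() for this pattern, and remove_term() is a hypothetical helper that splits, filters, and rejoins instead; neither the toy strings nor the helper come from the original analysis.

```r
pattern <- "(^|, )hiv infection(,|$)"

toy <- c("hiv infection",
         "asthma, hiv infection",
         "hiv infection, asthma",
         "asthma, hiv infection, psoriasis")

# Regex removal: the leading ", " and the trailing "," are removed together
# with the term, so a term in the middle leaves its neighbours joined by a space.
regex_result <- gsub(pattern, "", toy)
regex_result
# "" "asthma" " asthma" "asthma psoriasis"

# Hypothetical alternative: split on ", ", drop the exact term, rejoin.
remove_term <- function(x, term) {
  vapply(strsplit(x, ", ", fixed = TRUE), function(parts) {
    paste(parts[parts != term], collapse = ", ")
  }, character(1), USE.NAMES = FALSE)
}

split_result <- remove_term(toy, "hiv infection")
split_result
# "" "asthma" "asthma" "asthma, psoriasis"
```

A merged string such as "asthma psoriasis" would not be separated again by a later split on ", ", so it may be worth checking whether any multi-term rows in the real data have a removed term in the middle.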
gwas_study_info |>
filter(grepl("syphi", collected_all_disease_terms, ignore.case = T)) |>
select(`DISEASE/TRAIT`, collected_all_disease_terms)
Empty data.table (0 rows and 2 cols): DISEASE/TRAIT,collected_all_disease_terms
gwas_study_info = gwas_study_info |>
mutate(collected_all_disease_terms =
stringr::str_remove_all(collected_all_disease_terms,
pattern = "(^|, )chlamydophila infectious disease(,|$)"
)
)
gwas_study_info |>
filter(grepl("gonorrhea", collected_all_disease_terms, ignore.case = T)) |>
select(`DISEASE/TRAIT`, collected_all_disease_terms)
Empty data.table (0 rows and 2 cols): DISEASE/TRAIT,collected_all_disease_terms
gwas_study_info |>
filter(grepl("trichomoniasis", collected_all_disease_terms, ignore.case = T)) |>
select(`DISEASE/TRAIT`, collected_all_disease_terms)
Empty data.table (0 rows and 2 cols): DISEASE/TRAIT,collected_all_disease_terms
gwas_study_info |>
filter(grepl("herpes", collected_all_disease_terms, ignore.case = T)) |>
select(`DISEASE/TRAIT`, collected_all_disease_terms)
DISEASE/TRAIT
<char>
1: Shingles
2: Source of report of B02 (zoster [herpes zoster]) (UKB data field 130179)
3: ICD10 B02: Zoster [herpes zoster]
4: Herpes zoster (PheCode 53)
5: Herpes zoster with nervous system complications (PheCode 53.1)
6: Herpes simplex (PheCode 54)
7: Herpes zoster
8: Herpes simplex (PheCode 54)
9: Herpes zoster with nervous system complications (PheCode 53.1)
10: Herpes zoster with nervous system complications (PheCode 53.1)
11: Herpes zoster with nervous system complications (PheCode 53.1)
12: Herpes zoster (PheCode 53)
13: Herpes zoster (PheCode 53)
14: Herpes zoster (PheCode 53)
15: Herpes zoster (PheCode 53)
16: Herpes zoster with nervous system complications (PheCode 53.1)
17: Herpes simplex (PheCode 54)
18: Herpes simplex (PheCode 54)
19: Herpes simplex (PheCode 54)
20: Shingles
21: ICD10 B00: Herpesviral [herpes simplex] infections
22: ICD10 B02: Zoster [herpes zoster]
23: Response to tofacitinib treatment in rheumatoid arthritis (herpes zoster)(time to event)
24: Response to tofacitinib treatment in psoriasis (herpes zoster)(time to event)
25: Response to tofacitinib treatment in rheumatoid arthritis (herpes zoster)
26: Response to tofacitinib treatment in psoriasis (herpes zoster)
27: Response to tofacitinib treatment (herpes zoster)(time to event)
28: Response to tofacitinib treatment (herpes zoster)
29: ICD10 B00: Herpesviral [herpes simplex] infections (Gene-based burden)
30: ICD10 B02: Zoster [herpes zoster] (Gene-based burden)
31: Zoster infection
32: Herpes infection
33: Zoster infection
34: Herpes infection
DISEASE/TRAIT
collected_all_disease_terms
<char>
1: herpes zoster
2: herpes zoster
3: herpes zoster
4: herpes zoster
5: complication, herpes zoster, nervous system disease
6: herpes simplex infection
7: herpes zoster
8: herpes simplex infection
9: complication, herpes zoster, nervous system disease
10: complication, herpes zoster, nervous system disease
11: complication, herpes zoster, nervous system disease
12: herpes zoster
13: herpes zoster
14: herpes zoster
15: herpes zoster
16: complication, herpes zoster, nervous system disease
17: herpes simplex infection
18: herpes simplex infection
19: herpes simplex infection
20: herpes zoster
21: herpesviridae infectious disease
22: herpes zoster
23: herpes zoster, rheumatoid arthritis
24: herpes zoster, psoriasis
25: herpes zoster, rheumatoid arthritis
26: herpes zoster, psoriasis
27: herpes zoster, psoriasis, rheumatoid arthritis
28: herpes zoster, psoriasis, rheumatoid arthritis
29: herpesviridae infectious disease
30: herpes zoster
31: herpes zoster
32: herpes simplex infection
33: herpes zoster
34: herpes simplex infection
collected_all_disease_terms
url <- "http://www.ebi.ac.uk/ols4/api/ontologies/mesh/terms/http%253A%252F%252Fid.nlm.nih.gov%252Fmesh%252FD012749/descendants"
sti_terms <- get_descendants(url)
[1] "Number of terms collected:"
[1] 36
[1] "\n Some example terms"
[1] "sexually transmitted diseases, bacterial"
[2] "aids arteritis, central nervous system"
[3] "hiv-associated lipodystrophy syndrome"
[4] "aids-related opportunistic infections"
[5] "sexually transmitted diseases, viral"
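The term IRI inside the URL above is percent-encoded twice, which appears to be the form the OLS API expects for term IRIs. A minimal sketch of how such a URL could be assembled from the plain MeSH IRI follows; the variable names are illustrative, and the sketch only builds the string without contacting the API.

```r
# Plain IRI for the MeSH term used above.
iri <- "http://id.nlm.nih.gov/mesh/D012749"

# First pass: encode reserved characters such as ":" and "/".
encoded_once <- utils::URLencode(iri, reserved = TRUE)

# Second pass: repeated = TRUE forces the "%" signs to be encoded again.
encoded_twice <- utils::URLencode(encoded_once, reserved = TRUE, repeated = TRUE)

descendants_url <- paste0(
  "http://www.ebi.ac.uk/ols4/api/ontologies/mesh/terms/",
  encoded_twice,
  "/descendants"
)
descendants_url
```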
for (term in sti_terms) {
  if (sum(grepl(term, gwas_study_info$collected_all_disease_terms)) > 0) {
    print(paste0("Term removed:", term))
    gwas_study_info = gwas_study_info |>
      mutate(collected_all_disease_terms =
               stringr::str_remove_all(collected_all_disease_terms,
                                       pattern = paste0("(^|, )", term, "(,|$)")
               )
      )
  }
}
[1] "Term removed:warts"
tb_terms <- c("mycobacterium tuberculosis infection",
"pulmonary tuberculosis",
"extrapulmonary tuberculosis",
"meningeal tuberculosis")
tb_terms <- paste0("(^|, )", tb_terms, "(,|$)")
gwas_study_info = gwas_study_info |>
mutate(collected_all_disease_terms =
stringr::str_replace_all(collected_all_disease_terms,
pattern = paste0(tb_terms, collapse = "|"),
"tuberculosis"
))
gwas_study_info = gwas_study_info |>
mutate(collected_all_disease_terms =
stringr::str_remove_all(collected_all_disease_terms,
pattern = "(^|, )tuberculosis(,|$)"
))
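Because the replacement string "tuberculosis" overwrites the separators captured by the pattern, the two-step approach above can leave fused strings when a tuberculosis term sits next to another term. The sketch below reproduces both steps on toy strings, assuming base R gsub() behaves like the stringr calls for these patterns. drop_terms() is a hypothetical split-and-filter helper, and the toy strings are not taken from the data.

```r
tb_terms <- c("mycobacterium tuberculosis infection",
              "pulmonary tuberculosis",
              "extrapulmonary tuberculosis",
              "meningeal tuberculosis")
tb_pattern <- paste0("(^|, )", tb_terms, "(,|$)", collapse = "|")

toy <- c("pulmonary tuberculosis",
         "asthma, pulmonary tuberculosis",
         "meningeal tuberculosis, asthma")

# Step 1: collapse specific terms to "tuberculosis" (separators are lost).
step1 <- gsub(tb_pattern, "tuberculosis", toy)
# "tuberculosis" "asthmatuberculosis" "tuberculosis asthma"

# Step 2: remove the generic term; the fused strings no longer match.
step2 <- gsub("(^|, )tuberculosis(,|$)", "", step1)
# "" "asthmatuberculosis" "tuberculosis asthma"

# Hypothetical alternative: split on ", ", drop every listed term, rejoin.
drop_terms <- function(x, terms) {
  vapply(strsplit(x, ", ", fixed = TRUE), function(parts) {
    paste(parts[!parts %in% terms], collapse = ", ")
  }, character(1), USE.NAMES = FALSE)
}

split_result <- drop_terms(toy, c(tb_terms, "tuberculosis"))
# "" "asthma" "asthma"
```

Whether any rows in the real data combine a tuberculosis term with other terms is not shown on this page, so this may or may not affect the results.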
# acute lower respiratory infections
url <- "http://www.ebi.ac.uk/ols4/api/ontologies/snomed/terms/http%253A%252F%252Fsnomed.info%252Fid%252F195742007/descendants"
lri_terms <- get_descendants(url)
[1] "Number of terms collected:"
[1] 35
[1] "\n Some example terms"
[1] "chronic obstructive pulmonary disease with acute lower respiratory infection"
[2] "acute lower respiratory infection caused by respiratory syncytial virus"
[3] "acute infective exacerbation of chronic obstructive pulmonary disease"
[4] "inflammation of bronchiole caused by human metapneumovirus"
[5] "acute bronchiolitis caused by respiratory syncytial virus"
for (term in lri_terms) {
  if (sum(grepl(term, gwas_study_info$collected_all_disease_terms)) > 0) {
    print(paste0("Term removed:", term))
    gwas_study_info = gwas_study_info |>
      mutate(collected_all_disease_terms =
               stringr::str_remove_all(collected_all_disease_terms,
                                       pattern = paste0("(^|, )", term, "(,|$)")
               )
      )
  }
}
# acute upper respiratory infections
url <- "http://www.ebi.ac.uk/ols4/api/ontologies/snomed/terms/http%253A%252F%252Fsnomed.info%252Fid%252F54398005/descendants"
uri_terms <- get_descendants(url)
[1] "Number of terms collected:"
[1] 30
[1] "\n Some example terms"
[1] "acute upper respiratory infection caused by respiratory syncytial virus"
[2] "acute infective pharyngitis caused by streptococcus (disorder)"
[3] "acute nasopharyngitis caused by respiratory syncytial virus"
[4] "acute laryngitis caused by haemophilus influenzae"
[5] "fusobacterial necrotizing tonsillitis (disorder)"
for (term in uri_terms) {
  if (sum(grepl(term, gwas_study_info$collected_all_disease_terms)) > 0) {
    print(paste0("Term removed:", term))
    gwas_study_info = gwas_study_info |>
      mutate(collected_all_disease_terms =
               stringr::str_remove_all(collected_all_disease_terms,
                                       pattern = paste0("(^|, )", term, "(,|$)")
               )
      )
  }
}
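Some ontology labels, such as the "(disorder)" examples printed above, contain regular-expression metacharacters. When a label like that is passed to grepl() as a pattern, the parentheses are read as a group rather than as literal characters, so the label does not match itself. A minimal sketch of two ways around this follows; escape_regex() is a hypothetical helper, not part of the original code.

```r
# Hypothetical helper: backslash-escape regex metacharacters in a label.
escape_regex <- function(x) {
  gsub("([.\\\\|()\\[\\]{}^$*+?])", "\\\\\\1", x, perl = TRUE)
}

term <- "acute infective pharyngitis caused by streptococcus (disorder)"
terms_column <- c("asthma", term)

# Unescaped: "(disorder)" is treated as a group, so the literal label is missed.
raw_hits <- grepl(term, terms_column)
# FALSE FALSE

# Escaped pattern, or a fixed (non-regex) match, finds the label.
escaped_hits <- grepl(escape_regex(term), terms_column)
fixed_hits <- grepl(term, terms_column, fixed = TRUE)
# FALSE TRUE (both)
```

The escaped label can also be placed inside a larger pattern, for example paste0("(^|, )", escape_regex(term), "(,|$)").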
url <- "http://www.ebi.ac.uk/ols4/api/ontologies/doid/terms/http%253A%252F%252Fpurl.obolibrary.org%252Fobo%252FDOID_10754/descendants"
otitis_media_terms <- get_descendants(url)
[1] "Number of terms collected:"
[1] 20
[1] "\n Some example terms"
[1] "chronic tubotympanic suppurative otitis media"
[2] "acute allergic sanguinous otitis media"
[3] "acute allergic mucoid otitis media"
[4] "acute allergic serous otitis media"
[5] "middle ear cholesterol granuloma"
gwas_study_info = gwas_study_info |>
mutate(collected_all_disease_terms =
stringr::str_replace_all(collected_all_disease_terms,
pattern = paste0("(^|, )", otitis_media_terms, "(,|$)", collapse = "|"),
"otitis media"
))
gwas_study_info = gwas_study_info |>
mutate(collected_all_disease_terms =
stringr::str_remove_all(collected_all_disease_terms,
pattern = "(^|, )otitis media(,|$)"
)
)
gwas_study_info = gwas_study_info |>
mutate(collected_all_disease_terms =
stringr::str_remove_all(collected_all_disease_terms,
pattern = "(^|, )covid-19(,|$)"
)
)
gwas_study_info = gwas_study_info |>
mutate(collected_all_disease_terms =
stringr::str_remove_all(collected_all_disease_terms,
pattern = "(^|, )astroviridae infectious disease, infantile diarrhea(,|$)"
)
)
# if diarrhea is caused by IBS, add ibs to collected_all_disease_terms
gwas_study_info = gwas_study_info |>
mutate(collected_all_disease_terms =
ifelse(grepl("IBS", `DISEASE/TRAIT`, ignore.case = T) &
grepl("diarrhea", collected_all_disease_terms, ignore.case = T),
paste0(collected_all_disease_terms, ", irritable bowel syndrome"),
collected_all_disease_terms
)
)
gwas_study_info = gwas_study_info |>
mutate(collected_all_disease_terms =
stringr::str_remove_all(collected_all_disease_terms,
pattern = "(^|, )typhoid fever(,|$)"
)
)
gwas_study_info = gwas_study_info |>
mutate(collected_all_disease_terms =
stringr::str_remove_all(collected_all_disease_terms,
pattern = "(^|, )non-typhoidal salmonella bacteremia(,|$)"
)
)
# remove bacterial and viral gastroenteritis
gwas_study_info = gwas_study_info |>
mutate(collected_all_disease_terms =
ifelse(grepl("gastroenteritis", collected_all_disease_terms, ignore.case = T) &
grepl("bacterial gastroenteritis|viral gastroenteritis",
collected_all_disease_terms,
ignore.case = T),
stringr::str_remove_all(collected_all_disease_terms,
pattern = "(^|, )gastroenteritis(,|$)"),
collected_all_disease_terms
)
)
# remove viral and bacterial enteritis
gwas_study_info = gwas_study_info |>
mutate(collected_all_disease_terms =
ifelse(grepl("enteritis", collected_all_disease_terms, ignore.case = T) &
grepl("bacterial enteritis|viral enteritis",
collected_all_disease_terms,
ignore.case = T),
stringr::str_remove_all(collected_all_disease_terms,
pattern = "(^|, )enteritis(,|$)"),
collected_all_disease_terms
)
)
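To make the effect of the two conditional removals above concrete, the sketch below applies the gastroenteritis step to toy strings, assuming base R gsub() behaves like stringr::str_remove_all() for this pattern. On these inputs only the standalone generic term is dropped, and only in rows that also list a bacterial or viral form. Whether that is the intended outcome is not stated on this page.

```r
toy <- c("gastroenteritis",
         "bacterial gastroenteritis",
         "bacterial gastroenteritis, gastroenteritis")

# Rows that mention a bacterial or viral form.
has_specific <- grepl("bacterial gastroenteritis|viral gastroenteritis",
                      toy, ignore.case = TRUE)

# Same pattern as above: it matches "gastroenteritis" only as a whole term.
result <- ifelse(has_specific,
                 gsub("(^|, )gastroenteritis(,|$)", "", toy),
                 toy)
result
# "gastroenteritis" "bacterial gastroenteritis" "bacterial gastroenteritis"
```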
gwas_study_info = gwas_study_info |>
mutate(collected_all_disease_terms =
stringr::str_remove_all(collected_all_disease_terms,
pattern = "(^|, )malaria(,|$)"
)
)
gwas_study_info = gwas_study_info |>
mutate(collected_all_disease_terms =
stringr::str_remove_all(collected_all_disease_terms,
pattern = "(^|, )chagas disease(,|$)"
)
)
gwas_study_info = gwas_study_info |>
mutate(collected_all_disease_terms =
stringr::str_remove_all(collected_all_disease_terms,
pattern = "(^|, )visceral leishmaniasis(,|$)"
)
)
gwas_study_info = gwas_study_info |>
mutate(collected_all_disease_terms =
stringr::str_remove_all(collected_all_disease_terms,
pattern = "(^|, )cutaneous leishmaniasis(,|$)"
)
)
gwas_study_info |>
filter(grepl("trypanosomiasis", collected_all_disease_terms, ignore.case = T)) |>
select(`DISEASE/TRAIT`, collected_all_disease_terms)
Empty data.table (0 rows and 2 cols): DISEASE/TRAIT,collected_all_disease_terms
gwas_study_info |>
filter(grepl("schistosomiasis", collected_all_disease_terms, ignore.case = T)) |>
select(`DISEASE/TRAIT`, collected_all_disease_terms)
Empty data.table (0 rows and 2 cols): DISEASE/TRAIT,collected_all_disease_terms
gwas_study_info |>
filter(grepl("cysticercosis", collected_all_disease_terms, ignore.case = T)) |>
select(`DISEASE/TRAIT`, collected_all_disease_terms)
Empty data.table (0 rows and 2 cols): DISEASE/TRAIT,collected_all_disease_terms
gwas_study_info |>
filter(grepl("echinococcosis", collected_all_disease_terms, ignore.case = T)) |>
select(`DISEASE/TRAIT`, collected_all_disease_terms)
Empty data.table (0 rows and 2 cols): DISEASE/TRAIT,collected_all_disease_terms
gwas_study_info |>
filter(grepl("filariasis", collected_all_disease_terms, ignore.case = T)) |>
select(`DISEASE/TRAIT`, collected_all_disease_terms)
Empty data.table (0 rows and 2 cols): DISEASE/TRAIT,collected_all_disease_terms
gwas_study_info |>
filter(grepl("onchocerciasis", collected_all_disease_terms, ignore.case = T)) |>
select(`DISEASE/TRAIT`, collected_all_disease_terms)
Empty data.table (0 rows and 2 cols): DISEASE/TRAIT,collected_all_disease_terms
gwas_study_info |>
filter(grepl("trachoma", collected_all_disease_terms, ignore.case = T)) |>
select(`DISEASE/TRAIT`, collected_all_disease_terms)
Empty data.table (0 rows and 2 cols): DISEASE/TRAIT,collected_all_disease_terms
gwas_study_info |>
filter(grepl("yellow fever", collected_all_disease_terms, ignore.case = T)) |>
select(`DISEASE/TRAIT`, collected_all_disease_terms)
Empty data.table (0 rows and 2 cols): DISEASE/TRAIT,collected_all_disease_terms
gwas_study_info |>
filter(grepl("rabies", collected_all_disease_terms, ignore.case = T)) |>
select(`DISEASE/TRAIT`, collected_all_disease_terms)
Empty data.table (0 rows and 2 cols): DISEASE/TRAIT,collected_all_disease_terms
# ascariasis
gwas_study_info |>
filter(grepl("ascariasis", collected_all_disease_terms, ignore.case = T)) |>
select(`DISEASE/TRAIT`, collected_all_disease_terms)
Empty data.table (0 rows and 2 cols): DISEASE/TRAIT,collected_all_disease_terms
# trichuriasis
gwas_study_info |>
filter(grepl("trichuriasis", collected_all_disease_terms, ignore.case = T)) |>
select(`DISEASE/TRAIT`, collected_all_disease_terms)
Empty data.table (0 rows and 2 cols): DISEASE/TRAIT,collected_all_disease_terms
# hookworm
gwas_study_info |>
filter(grepl("hookworm", collected_all_disease_terms, ignore.case = T)) |>
select(`DISEASE/TRAIT`, collected_all_disease_terms)
Empty data.table (0 rows and 2 cols): DISEASE/TRAIT,collected_all_disease_terms
gwas_study_info |>
filter(grepl("trema", collected_all_disease_terms, ignore.case = T)
) |>
select(`DISEASE/TRAIT`, collected_all_disease_terms)
Empty data.table (0 rows and 2 cols): DISEASE/TRAIT,collected_all_disease_terms
gwas_study_info |>
filter(grepl("leprosy", collected_all_disease_terms, ignore.case = T)) |>
select(`DISEASE/TRAIT`, collected_all_disease_terms) |>
distinct()
DISEASE/TRAIT collected_all_disease_terms
<char> <char>
1: Leprosy leprosy
2: Crohn's disease or Leprosy crohns disease, leprosy
3: Crohn's disease or Leprosy (opposite effect) crohns disease, leprosy
gwas_study_info = gwas_study_info |>
mutate(collected_all_disease_terms =
stringr::str_remove_all(collected_all_disease_terms,
pattern = "(^|, )leprosy(,|$)"
)
)
gwas_study_info |>
filter(grepl("ebola", collected_all_disease_terms, ignore.case = T)) |>
select(`DISEASE/TRAIT`, collected_all_disease_terms)
Empty data.table (0 rows and 2 cols): DISEASE/TRAIT,collected_all_disease_terms
gwas_study_info |>
filter(grepl("zika", collected_all_disease_terms, ignore.case = T)) |>
select(`DISEASE/TRAIT`, collected_all_disease_terms)
DISEASE/TRAIT collected_all_disease_terms
<char> <char>
1: Congenital Zika syndrome zika virus congenital syndrome
gwas_study_info |>
filter(grepl("worm", collected_all_disease_terms, ignore.case = T)) |>
select(`DISEASE/TRAIT`, collected_all_disease_terms)
Empty data.table (0 rows and 2 cols): DISEASE/TRAIT,collected_all_disease_terms
gwas_study_info = gwas_study_info |>
mutate(collected_all_disease_terms =
stringr::str_remove_all(collected_all_disease_terms,
pattern = "(^|, )infectious meningitis(,|$)"
)
)
gwas_study_info = gwas_study_info |>
mutate(collected_all_disease_terms =
stringr::str_remove_all(collected_all_disease_terms,
pattern = "(^|, )viral encephalitis(,|$)"
)
)
# ? also remove if DISEASE/TRAIT == Encephalitis (PheCode 323)
gwas_study_info = gwas_study_info |>
mutate(collected_all_disease_terms =
ifelse(`DISEASE/TRAIT` == "Encephalitis (PheCode 323)",
stringr::str_remove_all(collected_all_disease_terms,
pattern = "(^|, )encephalitis(,|$)"),
collected_all_disease_terms
)
)
gwas_study_info = gwas_study_info |>
mutate(collected_all_disease_terms =
stringr::str_remove_all(collected_all_disease_terms,
pattern = "(^|, )diphtheria(,|$)"
)
)
gwas_study_info |>
filter(grepl("pertussis", collected_all_disease_terms, ignore.case = T)) |>
select(`DISEASE/TRAIT`, collected_all_disease_terms)
Empty data.table (0 rows and 2 cols): DISEASE/TRAIT,collected_all_disease_terms
gwas_study_info |>
filter(grepl("tetanus", collected_all_disease_terms, ignore.case = T)) |>
select(`DISEASE/TRAIT`, collected_all_disease_terms)
Empty data.table (0 rows and 2 cols): DISEASE/TRAIT,collected_all_disease_terms
gwas_study_info = gwas_study_info |>
mutate(collected_all_disease_terms =
stringr::str_remove_all(collected_all_disease_terms,
pattern = "(^|, )measles(,|$)"
)
)
gwas_study_info = gwas_study_info |>
mutate(collected_all_disease_terms =
ifelse(`DISEASE/TRAIT` == "Herpes zoster (PheCode 53)" |
`DISEASE/TRAIT` == "Herpes zoster" |
`DISEASE/TRAIT` == "Zoster infection" |
grepl("ICD10 B02: Zoster \\[herpes zoster\\]", `DISEASE/TRAIT`),
stringr::str_remove_all(collected_all_disease_terms,
pattern = "(^|, )herpes zoster(,|$)"),
collected_all_disease_terms
)
)
gwas_study_info = gwas_study_info |>
mutate(collected_all_disease_terms =
stringr::str_replace_all(collected_all_disease_terms,
pattern = "(^|, )complication, herpes zoster, nervous system disease(,|$)",
"herpes zoster, nervous system disease"
)
)
# if acute hepatitis in DISEASE/TRAIT remove from collected_all_disease_terms
gwas_study_info |>
filter(grepl("acute hepat",
`DISEASE/TRAIT`,
ignore.case = T)) |>
pull(`DISEASE/TRAIT`) |>
unique()
[1] "Acute hepatitis A infection"
gwas_study_info = gwas_study_info |>
mutate(collected_all_disease_terms =
ifelse(`DISEASE/TRAIT` == "Acute hepatitis A infection",
stringr::str_remove_all(collected_all_disease_terms,
pattern = "(^|, )hepatitis a infection(,|$)"),
collected_all_disease_terms
)
)
gwas_study_info |>
filter(grepl("epstein", collected_all_disease_terms, ignore.case = T)) |>
pull(collected_all_disease_terms) |> unique()
[1] "epstein-barr virus infection"
gwas_study_info = gwas_study_info |>
mutate(collected_all_disease_terms =
stringr::str_remove_all(collected_all_disease_terms,
pattern = "(^|, )infectious mononucleosis(,|$)"
)
)
gwas_study_info = gwas_study_info |>
mutate(collected_all_disease_terms =
stringr::str_remove_all(collected_all_disease_terms,
pattern = "(^|, )epstein-barr virus infection(,|$)"
)
)
gwas_study_info = gwas_study_info |>
mutate(collected_all_disease_terms =
stringr::str_remove_all(collected_all_disease_terms,
pattern = "(^|, )protozoa infectious disease(,|$)"
)
)
gwas_study_info =
gwas_study_info |>
mutate(collected_all_disease_terms =
ifelse(grepl("herpes simplex", `DISEASE/TRAIT`, ignore.case = T) &
grepl("herpesviridae infectious disease", collected_all_disease_terms, ignore.case = T),
stringr::str_replace_all(collected_all_disease_terms,
pattern = "herpesviridae infectious disease",
"herpes simplex infection"),
collected_all_disease_terms
)
)
gwas_study_info |>
filter(grepl("herpes", collected_all_disease_terms, ignore.case = T)) |>
pull(`DISEASE/TRAIT`) |>
unique()
[1] "Shingles"
[2] "Source of report of B02 (zoster [herpes zoster]) (UKB data field 130179)"
[3] "Herpes zoster with nervous system complications (PheCode 53.1)"
[4] "Herpes simplex (PheCode 54)"
[5] "ICD10 B00: Herpesviral [herpes simplex] infections"
[6] "Response to tofacitinib treatment in rheumatoid arthritis (herpes zoster)(time to event)"
[7] "Response to tofacitinib treatment in psoriasis (herpes zoster)(time to event)"
[8] "Response to tofacitinib treatment in rheumatoid arthritis (herpes zoster)"
[9] "Response to tofacitinib treatment in psoriasis (herpes zoster)"
[10] "Response to tofacitinib treatment (herpes zoster)(time to event)"
[11] "Response to tofacitinib treatment (herpes zoster)"
[12] "ICD10 B00: Herpesviral [herpes simplex] infections (Gene-based burden)"
[13] "Herpes infection"
# parasitic infection
# split each row's terms on ", ", drop duplicates, sort, and rejoin
gwas_study_info =
  gwas_study_info |>
  rowwise() |>
  mutate(collected_all_disease_terms = paste0(sort(unique(unlist(strsplit(collected_all_disease_terms, ", ")))),
                                              collapse = ", ")
  ) |>
  ungroup()
gwas_study_info = gwas_study_info |>
mutate(collected_all_disease_terms = stringr::str_remove_all(collected_all_disease_terms, "^,|,$")
) |>
mutate(collected_all_disease_terms = stringr::str_trim(collected_all_disease_terms)
)
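The rowwise() pipeline above splits, de-duplicates, sorts, and rejoins the terms in each row, and the following step strips stray commas. A vectorised base R sketch of the same idea is shown below. normalise_terms() is a hypothetical helper, not part of the original code. Unlike the pipeline above, it splits on the bare comma, trims whitespace, and drops empty pieces, so it would not give identical results on strings with stray commas or spaces.

```r
# Hypothetical helper: normalise a vector of comma-separated term lists.
normalise_terms <- function(x) {
  vapply(strsplit(x, ",", fixed = TRUE), function(parts) {
    parts <- trimws(parts)           # remove spaces around each term
    parts <- parts[nzchar(parts)]    # drop empty pieces left by removals
    paste(sort(unique(parts)), collapse = ", ")
  }, character(1), USE.NAMES = FALSE)
}

normalise_terms(c("b, a, b", "", " asthma", ",asthma,"))
# "a, b" "" "asthma" "asthma"
```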
n_studies_trait = gwas_study_info |>
  dplyr::filter(DISEASE_STUDY == TRUE) |>
  dplyr::select(collected_all_disease_terms, PUBMED_ID) |>
  dplyr::distinct() |>
  dplyr::group_by(collected_all_disease_terms) |>
  dplyr::summarise(n_studies = dplyr::n()) |>
  dplyr::arrange(desc(n_studies))
head(n_studies_trait)
# A tibble: 6 × 2
collected_all_disease_terms n_studies
<chr> <int>
1 "type 2 diabetes mellitus" 146
2 "" 130
3 "alzheimers disease" 116
4 "breast cancer" 113
5 "asthma" 111
6 "major depressive disorder" 109
dim(n_studies_trait)
[1] 2993 2
diseases <- stringr::str_split(pattern = ", ",
gwas_study_info$collected_all_disease_terms[gwas_study_info$collected_all_disease_terms != ""]) |>
unlist() |>
stringr::str_trim()
length(unique(diseases))
[1] 2184
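After filtering, the head() output above includes an empty-string group (130 studies), which presumably corresponds to studies whose terms were all removed. That group is also counted in the 2993 rows reported by dim(). If those studies should be left out of the per-trait counts, one option is to drop empty strings before counting. A base R sketch on made-up data follows; the data frame and its values are illustrative only.

```r
# Toy stand-in for the study table (values are invented).
studies <- data.frame(
  collected_all_disease_terms = c("asthma", "asthma", "asthma", "psoriasis", "", ""),
  PUBMED_ID = c(101, 101, 102, 103, 104, 105),
  stringsAsFactors = FALSE
)

# Drop rows with no remaining terms, then keep distinct term/publication pairs.
kept <- unique(studies[studies$collected_all_disease_terms != "", ])

# Count publications per term and sort by decreasing count.
counts <- as.data.frame(table(kept$collected_all_disease_terms),
                        stringsAsFactors = FALSE)
names(counts) <- c("collected_all_disease_terms", "n_studies")
counts <- counts[order(-counts$n_studies), ]
counts
```

In the dplyr pipeline used on this page, the equivalent step would be an extra dplyr::filter(collected_all_disease_terms != "") before dplyr::distinct().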
fwrite(gwas_study_info,
here::here("output/gwas_cat/gwas_study_info_disease_trait_filtered.csv"))
sessionInfo()
R version 4.3.1 (2023-06-16)
Platform: aarch64-apple-darwin20 (64-bit)
Running under: macOS 15.6.1
Matrix products: default
BLAS: /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib/libRlapack.dylib; LAPACK version 3.11.0
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
time zone: America/Los_Angeles
tzcode source: internal
attached base packages:
[1] stats graphics grDevices datasets utils methods base
other attached packages:
[1] jsonlite_2.0.0 httr_1.4.7 stringr_1.5.1 data.table_1.17.8
[5] dplyr_1.1.4 workflowr_1.7.1
loaded via a namespace (and not attached):
[1] compiler_4.3.1 renv_1.0.3 promises_1.3.3 tidyselect_1.2.1
[5] Rcpp_1.1.0 git2r_0.36.2 callr_3.7.6 later_1.4.2
[9] jquerylib_0.1.4 yaml_2.3.10 fastmap_1.2.0 here_1.0.1
[13] R6_2.6.1 generics_0.1.4 curl_6.4.0 knitr_1.50
[17] tibble_3.3.0 rprojroot_2.1.0 bslib_0.9.0 pillar_1.11.0
[21] rlang_1.1.6 utf8_1.2.6 cachem_1.1.0 stringi_1.8.7
[25] httpuv_1.6.16 xfun_0.52 getPass_0.2-4 fs_1.6.6
[29] sass_0.4.10 cli_3.6.5 withr_3.0.2 magrittr_2.0.3
[33] ps_1.9.1 digest_0.6.37 processx_3.8.6 rstudioapi_0.17.1
[37] lifecycle_1.0.4 vctrs_0.6.5 evaluate_1.0.4 glue_1.8.0
[41] whisker_0.4.1 rmarkdown_2.29 tools_4.3.1 pkgconfig_2.0.3
[45] htmltools_0.5.8.1