Last updated: 2025-09-23
Checks: 7 passed, 0 failed
Knit directory: genomics_ancest_disease_dispar/
This reproducible R Markdown analysis was created with workflowr (version 1.7.1). The Checks tab describes the reproducibility checks that were applied when the results were created. The Past versions tab lists the development history.
Great! Since the R Markdown file has been committed to the Git repository, you know the exact version of the code that produced these results.
Great job! The global environment was empty. Objects defined in the global environment can affect the analysis in your R Markdown file in unknown ways. For reproducibility it’s best to always run the code in an empty environment.
The command set.seed(20220216) was run prior to running the code in the R Markdown file. Setting a seed ensures that any results that rely on randomness, e.g. subsampling or permutations, are reproducible.
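The effect of fixing the seed can be checked directly: two random draws made after resetting the same seed are identical. A minimal base-R illustration (the draw size and range are arbitrary choices for this sketch):

```r
# Reset the seed before each draw; the two draws should match exactly
set.seed(20220216)
first_draw <- sample(1:100, 5)

set.seed(20220216)
second_draw <- sample(1:100, 5)

identical(first_draw, second_draw)
# → TRUE
```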
Great job! Recording the operating system, R version, and package versions is critical for reproducibility.
Nice! There were no cached chunks for this analysis, so you can be confident that you successfully produced the results during this run.
Great job! Using relative paths to the files within your workflowr project makes it easier to run your code on other machines.
Great! You are using Git for version control. Tracking code development and connecting the code version to the results is critical for reproducibility.
The results in this page were generated with repository version a78628a. See the Past versions tab to see a history of the changes made to the R Markdown and HTML files.
Note that you need to be careful to ensure that all relevant files for the analysis have been committed to Git prior to generating the results (you can use wflow_publish or wflow_git_commit). workflowr only checks the R Markdown file, but you know if there are other scripts or data files that it depends on. Below is the status of the Git repository when the results were generated:
Ignored files:
Ignored: .DS_Store
Ignored: .Rproj.user/
Ignored: data/.DS_Store
Ignored: data/gbd/.DS_Store
Ignored: data/gbd/IHME-GBD_2021_DATA-d8cf695e-1.csv
Ignored: data/gbd/ihme_gbd_2019_global_disease_burden_rate_all_ages.csv
Ignored: data/gbd/ihme_gbd_2019_global_paf_rate_percent_all_ages.csv
Ignored: data/gbd/ihme_gbd_2021_global_disease_burden_rate_all_ages.csv
Ignored: data/gbd/ihme_gbd_2021_global_paf_rate_percent_all_ages.csv
Ignored: data/gwas_catalog/
Ignored: data/icd/
Ignored: data/who/
Ignored: output/gwas_cat/
Ignored: output/gwas_study_info_cohort_corrected.csv
Ignored: output/gwas_study_info_trait_corrected.csv
Ignored: output/gwas_study_info_trait_ontology_info.csv
Ignored: output/gwas_study_info_trait_ontology_info_l1.csv
Ignored: output/gwas_study_info_trait_ontology_info_l2.csv
Ignored: output/trait_ontology/
Ignored: renv/
Unstaged changes:
Modified: analysis/level_1_disease_group_cancer.Rmd
Modified: analysis/level_1_disease_group_non_cancer.Rmd
Modified: analysis/level_2_disease_group.Rmd
Modified: code/get_term_descendants.R
Note that any generated files, e.g. HTML, png, CSS, etc., are not included in this status report because it is ok for generated content to have uncommitted changes.
These are the previous versions of the repository in which changes were made to the R Markdown (analysis/exclude_infectious_diseases.Rmd) and HTML (docs/exclude_infectious_diseases.html) files. If you’ve configured a remote Git repository (see ?wflow_git_remote), click on the hyperlinks in the table below to view the files as they were in that past version.
File | Version | Author | Date | Message |
---|---|---|---|---|
Rmd | a78628a | IJbeasley | 2025-09-23 | More icd codes for infectious disease |
html | bdcc9eb | IJbeasley | 2025-09-22 | Build site. |
Rmd | f897b13 | IJbeasley | 2025-09-22 | More typo + structure fixing … |
html | 9e1ed75 | IJbeasley | 2025-09-22 | Build site. |
Rmd | 351c6a9 | IJbeasley | 2025-09-22 | More typo + structure fixing … |
html | 56a3b23 | IJbeasley | 2025-09-22 | Build site. |
Rmd | 8a0be87 | IJbeasley | 2025-09-22 | More typo correcting |
html | b284521 | IJbeasley | 2025-09-22 | Build site. |
Rmd | ec5636b | IJbeasley | 2025-09-22 | …maybe fixing typos |
html | 73ed9df | IJbeasley | 2025-09-22 | Build site. |
Rmd | abee648 | IJbeasley | 2025-09-22 | …maybe fixing typos |
html | a241ae1 | IJbeasley | 2025-09-22 | Build site. |
Rmd | 62b9234 | IJbeasley | 2025-09-22 | Excluding infectious disease |
library(dplyr)
library(data.table)
library(stringr)
source(here::here("code/get_term_descendants.R"))
gwas_study_info <- fread(here::here("output/gwas_cat/gwas_study_info_disease_trait_simplified.csv"))
diseases <- stringr::str_split(pattern = ", ",
gwas_study_info$collected_all_disease_terms[gwas_study_info$collected_all_disease_terms != ""]) |>
unlist() |>
stringr::str_trim()
length(unique(diseases))
[1] 2187
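This count is over individual terms, not over the comma-joined term strings: a study tagged "asthma, type 2 diabetes mellitus" contributes both terms. A base-R sketch of the same split-and-count logic on made-up toy values (toy_terms is illustrative only):

```r
# Toy stand-in for gwas_study_info$collected_all_disease_terms
toy_terms <- c("asthma, type 2 diabetes mellitus", "", "asthma", "breast cancer, asthma")

# Drop empty strings, split each row on ", ", flatten, and trim whitespace
toy_diseases <- trimws(unlist(strsplit(toy_terms[toy_terms != ""], ", ", fixed = TRUE)))

length(unique(toy_diseases))
# → 3
```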
gwas_study_info = gwas_study_info |>
mutate(collected_all_disease_terms =
stringr::str_replace_all(collected_all_disease_terms,
vec_to_grep_pattern("hiv-1 infection"),
"hiv infection"
))
gwas_study_info = gwas_study_info |>
mutate(collected_all_disease_terms =
stringr::str_remove_all(collected_all_disease_terms,
pattern = vec_to_grep_pattern("hiv infection")
)
)
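The HIV step normalizes "hiv-1 infection" to "hiv infection" before removing "hiv infection". If vec_to_grep_pattern() matches whole comma-separated terms (its definition in code/get_term_descendants.R is not shown here, so this is an assumption), removal alone would leave "hiv-1 infection" behind. The sketch below uses two hypothetical helpers, replace_terms() and remove_terms(), that do exact whole-term matching by splitting and re-joining; they are an alternative illustration, not the project's implementation.

```r
# Hypothetical helpers (illustration only; not part of the original project).
# Both split each term string on ", ", act on exact whole terms, and re-join.
replace_terms <- function(term_strings, from, to) {
  vapply(strsplit(term_strings, ", ", fixed = TRUE), function(x) {
    x[x %in% from] <- to          # exact, whole-term match only
    paste(x, collapse = ", ")
  }, character(1))
}

remove_terms <- function(term_strings, terms_to_remove) {
  vapply(strsplit(term_strings, ", ", fixed = TRUE), function(x) {
    paste(x[!x %in% terms_to_remove], collapse = ", ")
  }, character(1))
}

toy <- c("hiv infection, asthma", "hiv-1 infection", "asthma")

# Removing "hiv infection" alone misses the "hiv-1 infection" row
remove_terms(toy, "hiv infection")
# → "asthma" "hiv-1 infection" "asthma"

# Normalizing first, then removing, catches both
remove_terms(replace_terms(toy, "hiv-1 infection", "hiv infection"), "hiv infection")
# → "asthma" "" "asthma"
```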
gwas_study_info |>
filter(grepl("syphi", collected_all_disease_terms, ignore.case = T)) |>
select(`DISEASE/TRAIT`, collected_all_disease_terms)
Empty data.table (0 rows and 2 cols): DISEASE/TRAIT,collected_all_disease_terms
gwas_study_info = gwas_study_info |>
mutate(collected_all_disease_terms =
stringr::str_remove_all(collected_all_disease_terms,
pattern = vec_to_grep_pattern("chlamydophila infectious disease")
)
)
gwas_study_info |>
filter(grepl("gonorrhea", collected_all_disease_terms, ignore.case = T)) |>
select(`DISEASE/TRAIT`, collected_all_disease_terms)
Empty data.table (0 rows and 2 cols): DISEASE/TRAIT,collected_all_disease_terms
gwas_study_info |>
filter(grepl("trichomoniasis", collected_all_disease_terms, ignore.case = T)) |>
select(`DISEASE/TRAIT`, collected_all_disease_terms)
Empty data.table (0 rows and 2 cols): DISEASE/TRAIT,collected_all_disease_terms
gwas_study_info |>
filter(grepl("herpes", collected_all_disease_terms, ignore.case = T)) |>
select(`DISEASE/TRAIT`, collected_all_disease_terms)
Row | DISEASE/TRAIT | collected_all_disease_terms |
---|---|---|
1 | Shingles | herpes zoster |
2 | Source of report of B02 (zoster [herpes zoster]) (UKB data field 130179) | herpes zoster |
3 | ICD10 B02: Zoster [herpes zoster] | herpes zoster |
4 | Herpes zoster (PheCode 53) | herpes zoster |
5 | Herpes zoster with nervous system complications (PheCode 53.1) | herpes zoster, nervous system disease, complication |
6 | Herpes simplex (PheCode 54) | herpes simplex infection |
7 | Herpes zoster | herpes zoster |
8 | Herpes simplex (PheCode 54) | herpes simplex infection |
9 | Herpes zoster with nervous system complications (PheCode 53.1) | herpes zoster, nervous system disease, complication |
10 | Herpes zoster with nervous system complications (PheCode 53.1) | herpes zoster, nervous system disease, complication |
11 | Herpes zoster with nervous system complications (PheCode 53.1) | herpes zoster, nervous system disease, complication |
12 | Herpes zoster (PheCode 53) | herpes zoster |
13 | Herpes zoster (PheCode 53) | herpes zoster |
14 | Herpes zoster (PheCode 53) | herpes zoster |
15 | Herpes zoster (PheCode 53) | herpes zoster |
16 | Herpes zoster with nervous system complications (PheCode 53.1) | herpes zoster, nervous system disease, complication |
17 | Herpes simplex (PheCode 54) | herpes simplex infection |
18 | Herpes simplex (PheCode 54) | herpes simplex infection |
19 | Herpes simplex (PheCode 54) | herpes simplex infection |
20 | Shingles | herpes zoster |
21 | ICD10 B00: Herpesviral [herpes simplex] infections | herpesviridae infectious disease |
22 | ICD10 B02: Zoster [herpes zoster] | herpes zoster |
23 | Response to tofacitinib treatment in rheumatoid arthritis (herpes zoster)(time to event) | rheumatoid arthritis, herpes zoster |
24 | Response to tofacitinib treatment in psoriasis (herpes zoster)(time to event) | psoriasis, herpes zoster |
25 | Response to tofacitinib treatment in rheumatoid arthritis (herpes zoster) | rheumatoid arthritis, herpes zoster |
26 | Response to tofacitinib treatment in psoriasis (herpes zoster) | psoriasis, herpes zoster |
27 | Response to tofacitinib treatment (herpes zoster)(time to event) | psoriasis, rheumatoid arthritis, herpes zoster |
28 | Response to tofacitinib treatment (herpes zoster) | psoriasis, rheumatoid arthritis, herpes zoster |
29 | ICD10 B00: Herpesviral [herpes simplex] infections (Gene-based burden) | herpesviridae infectious disease |
30 | ICD10 B02: Zoster [herpes zoster] (Gene-based burden) | herpes zoster |
31 | Zoster infection | herpes zoster |
32 | Herpes infection | herpes simplex infection |
33 | Zoster infection | herpes zoster |
34 | Herpes infection | herpes simplex infection |
url <- "http://www.ebi.ac.uk/ols4/api/ontologies/mesh/terms/http%253A%252F%252Fid.nlm.nih.gov%252Fmesh%252FD012749/descendants"
sti_terms <- get_descendants(url)
[1] "Number of terms collected:"
[1] 36
[1] "\n Some example terms"
[1] "sexually transmitted diseases, bacterial"
[2] "aids arteritis, central nervous system"
[3] "hiv-associated lipodystrophy syndrome"
[4] "aids-related opportunistic infections"
[5] "sexually transmitted diseases, viral"
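The OLS4 URLs used with get_descendants() embed the term IRI URL-encoded twice (":" becomes "%253A", "/" becomes "%252F"), which is how OLS expects an IRI in the path. A sketch of building such a URL from an ontology ID and a plain IRI; ols_descendants_url() is a hypothetical helper, not the project's get_descendants():

```r
# Hypothetical helper (not from the original project): builds an OLS4
# "descendants" endpoint URL from an ontology ID and a term IRI.
ols_descendants_url <- function(ontology, iri) {
  # First pass: percent-encode reserved characters (":" -> "%3A", "/" -> "%2F")
  once <- utils::URLencode(iri, reserved = TRUE)
  # Second pass: encode the "%" signs themselves ("%3A" -> "%253A")
  twice <- gsub("%", "%25", once, fixed = TRUE)
  paste0("http://www.ebi.ac.uk/ols4/api/ontologies/", ontology,
         "/terms/", twice, "/descendants")
}

# Reproduces the MeSH (D012749) URL used above
ols_descendants_url("mesh", "http://id.nlm.nih.gov/mesh/D012749")
```

The same helper reproduces the SNOMED and DOID URLs used later, e.g. ols_descendants_url("doid", "http://purl.obolibrary.org/obo/DOID_10754").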
for(term in sti_terms){
if(sum(grepl(term, gwas_study_info$collected_all_disease_terms)) > 0) {
print(paste0("Term removed:", term))
gwas_study_info = gwas_study_info |>
mutate(collected_all_disease_terms =
stringr::str_remove_all(collected_all_disease_terms,
pattern = vec_to_grep_pattern(term)
)
)
}
}
[1] "Term removed:warts"
tb_terms <- c("mycobacterium tuberculosis infection",
"pulmonary tuberculosis",
"extrapulmonary tuberculosis",
"meningeal tuberculosis")
gwas_study_info = gwas_study_info |>
mutate(collected_all_disease_terms =
stringr::str_replace_all(collected_all_disease_terms,
pattern = vec_to_grep_pattern(tb_terms),
"tuberculosis"
))
# acute lower respiratory infections
url <- "http://www.ebi.ac.uk/ols4/api/ontologies/snomed/terms/http%253A%252F%252Fsnomed.info%252Fid%252F195742007/descendants"
# in the case of acute / chronic bronchitis, we will remove only acute
gwas_study_info =
gwas_study_info |>
mutate(collected_all_disease_terms =
ifelse(grepl("acute bronchitis", `DISEASE/TRAIT`, ignore.case = T) &
grepl(vec_to_grep_pattern("bronchitis"),
collected_all_disease_terms,
ignore.case = T,
perl = T),
stringr::str_remove_all(collected_all_disease_terms,
vec_to_grep_pattern("bronchitis")),
collected_all_disease_terms
))
# if bronchitis is specified as chronic in DISEASE/TRAIT, replace bronchitis with chronic bronchitis
# in collected_all_disease_terms
gwas_study_info =
gwas_study_info |>
mutate(collected_all_disease_terms =
ifelse(grepl("chronic bronchitis", `DISEASE/TRAIT`, ignore.case = T) &
grepl(vec_to_grep_pattern("bronchitis"),
collected_all_disease_terms,
ignore.case = T,
perl = T),
stringr::str_replace_all(collected_all_disease_terms,
vec_to_grep_pattern("bronchitis"),
"chronic bronchitis"),
collected_all_disease_terms
))
# similarly, for bronchiolitis, remove if acute
gwas_study_info =
gwas_study_info |>
mutate(collected_all_disease_terms =
ifelse(grepl("acute bronchiolitis", `DISEASE/TRAIT`, ignore.case = T) &
grepl(vec_to_grep_pattern("bronchiolitis"),
collected_all_disease_terms,
ignore.case = T,
perl = T),
stringr::str_remove_all(collected_all_disease_terms,
vec_to_grep_pattern("bronchiolitis")),
collected_all_disease_terms
))
lri_terms <- get_descendants(url)
[1] "Number of terms collected:"
[1] 35
[1] "\n Some example terms"
[1] "chronic obstructive pulmonary disease with acute lower respiratory infection"
[2] "acute lower respiratory infection caused by respiratory syncytial virus"
[3] "acute infective exacerbation of chronic obstructive pulmonary disease"
[4] "inflammation of bronchiole caused by human metapneumovirus"
[5] "acute bronchiolitis caused by respiratory syncytial virus"
lri_terms <- c("acute bronchitis",
"respiratory syncytial virus infection",
lri_terms
)
for(term in lri_terms){
if(sum(grepl(term, gwas_study_info$collected_all_disease_terms)) > 0) {
print(paste0("Term removed:", term))
gwas_study_info = gwas_study_info |>
mutate(collected_all_disease_terms =
stringr::str_remove_all(collected_all_disease_terms,
pattern = vec_to_grep_pattern(term)
)
)
}
}
[1] "Term removed:acute bronchitis"
[1] "Term removed:respiratory syncytial virus infection"
# remove laryngitis, if acute
gwas_study_info =
gwas_study_info |>
mutate(collected_all_disease_terms =
ifelse(grepl("acute laryngitis", `DISEASE/TRAIT`, ignore.case = T) &
grepl(vec_to_grep_pattern("laryngitis"),
collected_all_disease_terms,
ignore.case = T,
perl = T),
stringr::str_remove_all(collected_all_disease_terms,
vec_to_grep_pattern("laryngitis")),
collected_all_disease_terms
))
# if laryngitis is specified as chronic in DISEASE/TRAIT, replace laryngitis with chronic laryngitis
gwas_study_info =
gwas_study_info |>
mutate(collected_all_disease_terms =
ifelse(grepl("chronic laryngitis", `DISEASE/TRAIT`, ignore.case = T) &
grepl(vec_to_grep_pattern("laryngitis"),
collected_all_disease_terms,
ignore.case = T,
perl = T),
stringr::str_replace_all(collected_all_disease_terms,
vec_to_grep_pattern("laryngitis"),
"chronic laryngitis"),
collected_all_disease_terms
))
# for tonsillitis, remove if acute (ICD10 J03 is acute tonsillitis)
gwas_study_info =
gwas_study_info |>
mutate(collected_all_disease_terms =
ifelse(grepl("acute tonsillitis|ICD10 J03", `DISEASE/TRAIT`, ignore.case = T) &
grepl(vec_to_grep_pattern("tonsillitis"),
collected_all_disease_terms,
ignore.case = T,
perl = T),
stringr::str_remove_all(collected_all_disease_terms,
vec_to_grep_pattern("tonsillitis")),
collected_all_disease_terms
))
# for tonsillitis, if chronic in DISEASE/TRAIT, replace tonsillitis with chronic tonsillitis
gwas_study_info =
gwas_study_info |>
mutate(collected_all_disease_terms =
ifelse(grepl("chronic tonsillitis", `DISEASE/TRAIT`, ignore.case = T) &
grepl(vec_to_grep_pattern("tonsillitis"),
collected_all_disease_terms,
ignore.case = T,
perl = T),
stringr::str_replace_all(collected_all_disease_terms,
vec_to_grep_pattern("tonsillitis"),
"chronic tonsillitis"),
collected_all_disease_terms
))
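The same acute/chronic rule is applied above to bronchitis, laryngitis and tonsillitis (bronchiolitis only has the acute branch): drop the bare condition when the reported trait says "acute <condition>", and relabel it "chronic <condition>" when the trait says "chronic <condition>". A sketch of that rule as one reusable function; resolve_acute_chronic() is hypothetical and matches whole split terms rather than using vec_to_grep_pattern():

```r
# Hypothetical helper (not from the original project).
resolve_acute_chronic <- function(trait, term_strings, condition) {
  acute   <- grepl(paste("acute", condition), trait, ignore.case = TRUE)
  chronic <- grepl(paste("chronic", condition), trait, ignore.case = TRUE)
  split_terms <- strsplit(term_strings, ", ", fixed = TRUE)
  vapply(seq_along(split_terms), function(i) {
    x <- split_terms[[i]]
    if (acute[i]) {
      x <- x[x != condition]                      # acute: drop the term
    } else if (chronic[i]) {
      x[x == condition] <- paste("chronic", condition)  # chronic: relabel
    }
    paste(x, collapse = ", ")
  }, character(1))
}

traits <- c("Acute bronchitis", "Chronic bronchitis", "Bronchitis")
terms  <- c("bronchitis", "bronchitis, asthma", "bronchitis")
resolve_acute_chronic(traits, terms, "bronchitis")
# → "" "chronic bronchitis, asthma" "bronchitis"
```

As in the original code, the acute check takes precedence when both would match. The extra "ICD10 J03" trigger used for tonsillitis is not covered by this sketch.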
# acute upper respiratory infections
url <- "http://www.ebi.ac.uk/ols4/api/ontologies/snomed/terms/http%253A%252F%252Fsnomed.info%252Fid%252F54398005/descendants"
uri_terms <- get_descendants(url)
[1] "Number of terms collected:"
[1] 30
[1] "\n Some example terms"
[1] "acute upper respiratory infection caused by respiratory syncytial virus"
[2] "acute infective pharyngitis caused by streptococcus (disorder)"
[3] "acute nasopharyngitis caused by respiratory syncytial virus"
[4] "acute laryngitis caused by haemophilus influenzae"
[5] "fusobacterial necrotizing tonsillitis (disorder)"
uri_terms <- c("strep throat",
"streptococcal pharyngitis",
"acute pharyngitis",
"pharyngitis",
"acute tonsillitis",
uri_terms
)
for(term in uri_terms){
if(sum(grepl(term, gwas_study_info$collected_all_disease_terms)) > 0) {
print(paste0("Term removed:", term))
gwas_study_info = gwas_study_info |>
mutate(collected_all_disease_terms =
stringr::str_remove_all(collected_all_disease_terms,
pattern = vec_to_grep_pattern(term)
)
)
}
}
[1] "Term removed:strep throat"
[1] "Term removed:streptococcal pharyngitis"
[1] "Term removed:acute pharyngitis"
[1] "Term removed:pharyngitis"
[1] "Term removed:acute tonsillitis"
url <- "http://www.ebi.ac.uk/ols4/api/ontologies/doid/terms/http%253A%252F%252Fpurl.obolibrary.org%252Fobo%252FDOID_10754/descendants"
otitis_media_terms <- get_descendants(url)
[1] "Number of terms collected:"
[1] 20
[1] "\n Some example terms"
[1] "chronic tubotympanic suppurative otitis media"
[2] "acute allergic sanguinous otitis media"
[3] "acute allergic mucoid otitis media"
[4] "acute allergic serous otitis media"
[5] "middle ear cholesterol granuloma"
gwas_study_info = gwas_study_info |>
mutate(collected_all_disease_terms =
stringr::str_replace_all(collected_all_disease_terms,
pattern = vec_to_grep_pattern(otitis_media_terms),
"otitis media"
))
gwas_study_info = gwas_study_info |>
mutate(collected_all_disease_terms =
stringr::str_remove_all(collected_all_disease_terms,
pattern = vec_to_grep_pattern("otitis media")
)
)
gwas_study_info = gwas_study_info |>
mutate(collected_all_disease_terms =
stringr::str_remove_all(collected_all_disease_terms,
pattern = vec_to_grep_pattern("covid-19")
)
)
gwas_study_info = gwas_study_info |>
mutate(collected_all_disease_terms =
stringr::str_replace_all(collected_all_disease_terms,
vec_to_grep_pattern("influenza a \\(h1n1\\)"),
"influenza"
))
gwas_study_info = gwas_study_info |>
mutate(collected_all_disease_terms =
stringr::str_remove_all(collected_all_disease_terms,
pattern = vec_to_grep_pattern("influenza")
)
)
gwas_study_info = gwas_study_info |>
mutate(collected_all_disease_terms =
stringr::str_remove_all(collected_all_disease_terms,
pattern = vec_to_grep_pattern("astroviridae infectious disease, infantile diarrhea")
)
)
# if DISEASE/TRAIT attributes diarrhea to IBS, add irritable bowel syndrome to collected_all_disease_terms
gwas_study_info = gwas_study_info |>
mutate(collected_all_disease_terms =
ifelse(grepl("IBS", `DISEASE/TRAIT`, ignore.case = T) &
grepl("diarrhea", collected_all_disease_terms, ignore.case = T),
paste0(collected_all_disease_terms, ", irritable bowel syndrome"),
collected_all_disease_terms
)
)
gwas_study_info = gwas_study_info |>
mutate(collected_all_disease_terms =
stringr::str_remove_all(collected_all_disease_terms,
pattern = vec_to_grep_pattern("typhoid fever")
)
)
gwas_study_info = gwas_study_info |>
mutate(collected_all_disease_terms =
stringr::str_remove_all(collected_all_disease_terms,
pattern = vec_to_grep_pattern("non-typhoidal salmonella bacteremia")
)
)
# remove bacterial and viral gastroenteritis
gwas_study_info = gwas_study_info |>
mutate(collected_all_disease_terms =
ifelse(grepl("gastroenteritis", collected_all_disease_terms, ignore.case = T) &
grepl("bacterial gastroenteritis|viral gastroenteritis",
collected_all_disease_terms,
ignore.case = T),
stringr::str_remove_all(collected_all_disease_terms,
pattern = vec_to_grep_pattern("gastroenteritis")
),
collected_all_disease_terms
)
)
# remove viral and bacterial enteritis
gwas_study_info = gwas_study_info |>
mutate(collected_all_disease_terms =
ifelse(grepl("enteritis", collected_all_disease_terms, ignore.case = T) &
grepl("bacterial enteritis|viral enteritis",
collected_all_disease_terms,
ignore.case = T),
stringr::str_remove_all(collected_all_disease_terms,
pattern = vec_to_grep_pattern("enteritis")
),
collected_all_disease_terms
)
)
gwas_study_info = gwas_study_info |>
mutate(collected_all_disease_terms =
stringr::str_remove_all(collected_all_disease_terms,
pattern = vec_to_grep_pattern("intestinal infectious disease")
)
)
gwas_study_info = gwas_study_info |>
mutate(collected_all_disease_terms =
stringr::str_replace_all(collected_all_disease_terms,
vec_to_grep_pattern(
c("plasmodium falciparum malaria",
"plasmodium vivax malaria"
)),
"malaria"
))
gwas_study_info = gwas_study_info |>
mutate(collected_all_disease_terms =
stringr::str_remove_all(collected_all_disease_terms,
pattern = vec_to_grep_pattern("malaria")
)
)
gwas_study_info = gwas_study_info |>
mutate(collected_all_disease_terms =
stringr::str_remove_all(collected_all_disease_terms,
pattern = vec_to_grep_pattern("chagas disease")
)
)
gwas_study_info = gwas_study_info |>
mutate(collected_all_disease_terms =
stringr::str_remove_all(collected_all_disease_terms,
pattern = vec_to_grep_pattern("visceral leishmaniasis")
)
)
gwas_study_info = gwas_study_info |>
mutate(collected_all_disease_terms =
stringr::str_remove_all(collected_all_disease_terms,
pattern = vec_to_grep_pattern("cutaneous leishmaniasis")
)
)
gwas_study_info |>
filter(grepl("trypanosomiasis", collected_all_disease_terms, ignore.case = T)) |>
select(`DISEASE/TRAIT`, collected_all_disease_terms)
Empty data.table (0 rows and 2 cols): DISEASE/TRAIT,collected_all_disease_terms
gwas_study_info |>
filter(grepl("schistosomiasis", collected_all_disease_terms, ignore.case = T)) |>
select(`DISEASE/TRAIT`, collected_all_disease_terms)
Empty data.table (0 rows and 2 cols): DISEASE/TRAIT,collected_all_disease_terms
gwas_study_info |>
filter(grepl("cysticercosis", collected_all_disease_terms, ignore.case = T)) |>
select(`DISEASE/TRAIT`, collected_all_disease_terms)
Empty data.table (0 rows and 2 cols): DISEASE/TRAIT,collected_all_disease_terms
gwas_study_info |>
filter(grepl("echinococcosis", collected_all_disease_terms, ignore.case = T)) |>
select(`DISEASE/TRAIT`, collected_all_disease_terms)
Empty data.table (0 rows and 2 cols): DISEASE/TRAIT,collected_all_disease_terms
gwas_study_info |>
filter(grepl("filariasis", collected_all_disease_terms, ignore.case = T)) |>
select(`DISEASE/TRAIT`, collected_all_disease_terms)
Empty data.table (0 rows and 2 cols): DISEASE/TRAIT,collected_all_disease_terms
gwas_study_info |>
filter(grepl("onchocerciasis", collected_all_disease_terms, ignore.case = T)) |>
select(`DISEASE/TRAIT`, collected_all_disease_terms)
Empty data.table (0 rows and 2 cols): DISEASE/TRAIT,collected_all_disease_terms
gwas_study_info |>
filter(grepl("trachoma", collected_all_disease_terms, ignore.case = T)) |>
select(`DISEASE/TRAIT`, collected_all_disease_terms)
Empty data.table (0 rows and 2 cols): DISEASE/TRAIT,collected_all_disease_terms
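Each of the checks above filters for one keyword at a time. The same screen can be run for a whole list of keywords in one pass, reporting how many rows match each. A base-R sketch on made-up toy values (toy_terms and keywords are illustrative only):

```r
# Toy stand-in for gwas_study_info$collected_all_disease_terms
toy_terms <- c("herpes zoster", "dengue", "leprosy, crohns disease", "asthma")
keywords  <- c("trypanosomiasis", "schistosomiasis", "dengue", "leprosy")

# Count rows whose term string contains each keyword
# (case-insensitive substring match, like the grepl() filters above)
hits <- vapply(keywords,
               function(k) sum(grepl(k, toy_terms, ignore.case = TRUE)),
               integer(1))
hits
# named integer vector: trypanosomiasis 0, schistosomiasis 0, dengue 1, leprosy 1

names(hits)[hits > 0]
# → "dengue" "leprosy"
```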
gwas_study_info =
gwas_study_info |>
mutate(collected_all_disease_terms =
stringr::str_replace_all(collected_all_disease_terms,
pattern = vec_to_grep_pattern("dengue hemorrhagic fever"),
"dengue"
))
gwas_study_info |>
filter(grepl("dengue", collected_all_disease_terms, ignore.case = T)) |>
select(`DISEASE/TRAIT`, collected_all_disease_terms)
DISEASE/TRAIT collected_all_disease_terms
<char> <char>
1: Dengue shock syndrome dengue
gwas_study_info |>
filter(grepl("yellow fever", collected_all_disease_terms, ignore.case = T)) |>
select(`DISEASE/TRAIT`, collected_all_disease_terms)
Empty data.table (0 rows and 2 cols): DISEASE/TRAIT,collected_all_disease_terms
gwas_study_info |>
filter(grepl("rabies", collected_all_disease_terms, ignore.case = T)) |>
select(`DISEASE/TRAIT`, collected_all_disease_terms)
Empty data.table (0 rows and 2 cols): DISEASE/TRAIT,collected_all_disease_terms
# ascariasis
gwas_study_info |>
filter(grepl("ascariasis", collected_all_disease_terms, ignore.case = T)) |>
select(`DISEASE/TRAIT`, collected_all_disease_terms)
Empty data.table (0 rows and 2 cols): DISEASE/TRAIT,collected_all_disease_terms
# trichuriasis
gwas_study_info |>
filter(grepl("trichuriasis", collected_all_disease_terms, ignore.case = T)) |>
select(`DISEASE/TRAIT`, collected_all_disease_terms)
Empty data.table (0 rows and 2 cols): DISEASE/TRAIT,collected_all_disease_terms
# hookworm
gwas_study_info |>
filter(grepl("hookworm", collected_all_disease_terms, ignore.case = T)) |>
select(`DISEASE/TRAIT`, collected_all_disease_terms)
Empty data.table (0 rows and 2 cols): DISEASE/TRAIT,collected_all_disease_terms
gwas_study_info |>
filter(grepl("trema", collected_all_disease_terms, ignore.case = T)
) |>
select(`DISEASE/TRAIT`, collected_all_disease_terms)
Empty data.table (0 rows and 2 cols): DISEASE/TRAIT,collected_all_disease_terms
gwas_study_info |>
filter(grepl("leprosy", collected_all_disease_terms, ignore.case = T)) |>
select(`DISEASE/TRAIT`, collected_all_disease_terms) |>
distinct()
DISEASE/TRAIT collected_all_disease_terms
<char> <char>
1: Leprosy leprosy
2: Crohn's disease or Leprosy leprosy, crohns disease
3: Crohn's disease or Leprosy (opposite effect) leprosy, crohns disease
gwas_study_info = gwas_study_info |>
mutate(collected_all_disease_terms =
stringr::str_remove_all(collected_all_disease_terms,
pattern = vec_to_grep_pattern("leprosy")
)
)
gwas_study_info |>
filter(grepl("ebola", collected_all_disease_terms, ignore.case = T)) |>
select(`DISEASE/TRAIT`, collected_all_disease_terms)
Empty data.table (0 rows and 2 cols): DISEASE/TRAIT,collected_all_disease_terms
gwas_study_info |>
filter(grepl("zika", collected_all_disease_terms, ignore.case = T)) |>
select(`DISEASE/TRAIT`, collected_all_disease_terms)
DISEASE/TRAIT collected_all_disease_terms
<char> <char>
1: Congenital Zika syndrome zika virus congenital syndrome
gwas_study_info |>
filter(grepl("worm", collected_all_disease_terms, ignore.case = T)) |>
select(`DISEASE/TRAIT`, collected_all_disease_terms)
Empty data.table (0 rows and 2 cols): DISEASE/TRAIT,collected_all_disease_terms
meningitis_terms <- c(
"meningococcal infection",
"infectious meningitis",
"bacterial meningitis",
"viral meningitis",
"pneumococcal meningitis",
"meningitis"
)
gwas_study_info = gwas_study_info |>
mutate(collected_all_disease_terms =
stringr::str_remove_all(collected_all_disease_terms,
pattern = vec_to_grep_pattern(meningitis_terms)
)
)
gwas_study_info = gwas_study_info |>
mutate(collected_all_disease_terms =
stringr::str_remove_all(collected_all_disease_terms,
pattern = vec_to_grep_pattern("viral encephalitis")
)
)
# tentatively, also remove encephalitis when DISEASE/TRAIT is exactly "Encephalitis (PheCode 323)"
gwas_study_info = gwas_study_info |>
mutate(collected_all_disease_terms =
ifelse(`DISEASE/TRAIT` == "Encephalitis (PheCode 323)",
stringr::str_remove_all(collected_all_disease_terms,
pattern = vec_to_grep_pattern("encephalitis")
),
collected_all_disease_terms
)
)
gwas_study_info = gwas_study_info |>
mutate(collected_all_disease_terms =
stringr::str_remove_all(collected_all_disease_terms,
pattern = vec_to_grep_pattern("diphtheria")
)
)
gwas_study_info |>
filter(grepl("pertussis", collected_all_disease_terms, ignore.case = T)) |>
select(`DISEASE/TRAIT`, collected_all_disease_terms)
Empty data.table (0 rows and 2 cols): DISEASE/TRAIT,collected_all_disease_terms
gwas_study_info |>
filter(grepl("tetanus", collected_all_disease_terms, ignore.case = T)) |>
select(`DISEASE/TRAIT`, collected_all_disease_terms)
Empty data.table (0 rows and 2 cols): DISEASE/TRAIT,collected_all_disease_terms
gwas_study_info = gwas_study_info |>
mutate(collected_all_disease_terms =
stringr::str_remove_all(collected_all_disease_terms,
pattern = vec_to_grep_pattern("measles")
)
)
gwas_study_info = gwas_study_info |>
mutate(collected_all_disease_terms =
stringr::str_remove_all(collected_all_disease_terms,
pattern = vec_to_grep_pattern("chickenpox")
)
)
gwas_study_info = gwas_study_info |>
mutate(collected_all_disease_terms =
ifelse(`DISEASE/TRAIT` == "Herpes zoster (PheCode 53)" |
`DISEASE/TRAIT` == "Herpes zoster" |
`DISEASE/TRAIT` == "Zoster infection" |
grepl("ICD10 B02: Zoster \\[herpes zoster\\]", `DISEASE/TRAIT`),
stringr::str_remove_all(collected_all_disease_terms,
pattern = vec_to_grep_pattern("herpes zoster")
),
collected_all_disease_terms
)
)
gwas_study_info = gwas_study_info |>
mutate(collected_all_disease_terms =
stringr::str_replace_all(collected_all_disease_terms,
pattern = vec_to_grep_pattern("complication, herpes zoster, nervous system disease"),
"herpes zoster, nervous system disease"
)
)
# if acute hepatitis in DISEASE/TRAIT remove from collected_all_disease_terms
gwas_study_info |>
filter(grepl("acute hepat",
`DISEASE/TRAIT`,
ignore.case = T)) |>
pull(`DISEASE/TRAIT`) |>
unique()
[1] "Acute hepatitis A infection"
gwas_study_info = gwas_study_info |>
mutate(collected_all_disease_terms =
ifelse(`DISEASE/TRAIT` == "Acute hepatitis A infection",
stringr::str_remove_all(collected_all_disease_terms,
pattern = vec_to_grep_pattern("hepatitis a infection")
),
collected_all_disease_terms
)
)
gwas_study_info = gwas_study_info |>
mutate(collected_all_disease_terms =
stringr::str_remove_all(collected_all_disease_terms,
pattern = vec_to_grep_pattern("bacterial pneumonia")
)
)
gwas_study_info = gwas_study_info |>
mutate(collected_all_disease_terms =
stringr::str_remove_all(collected_all_disease_terms,
pattern = vec_to_grep_pattern("infectious mononucleosis")
)
)
gwas_study_info = gwas_study_info |>
mutate(collected_all_disease_terms =
stringr::str_remove_all(collected_all_disease_terms,
pattern = vec_to_grep_pattern("epstein-barr virus infection")
)
)
gwas_study_info = gwas_study_info |>
mutate(collected_all_disease_terms =
stringr::str_remove_all(collected_all_disease_terms,
pattern = vec_to_grep_pattern("protozoa infectious disease")
)
)
gwas_study_info =
gwas_study_info |>
mutate(collected_all_disease_terms =
ifelse(grepl("herpes simplex", `DISEASE/TRAIT`, ignore.case = T) &
grepl("herpesviridae infectious disease", collected_all_disease_terms, ignore.case = T),
stringr::str_replace_all(collected_all_disease_terms,
pattern = vec_to_grep_pattern("herpesviridae infectious disease"),
"herpes simplex infection"),
collected_all_disease_terms
)
)
# cold sores
gwas_study_info = gwas_study_info |>
mutate(collected_all_disease_terms =
stringr::str_replace_all(collected_all_disease_terms,
pattern = vec_to_grep_pattern("cold sores"),
"herpes simplex infection"
)
)
gwas_study_info |>
filter(grepl("herpes", collected_all_disease_terms, ignore.case = T)) |>
pull(`DISEASE/TRAIT`) |>
unique()
[1] "Cold sores"
[2] "Shingles"
[3] "Source of report of B02 (zoster [herpes zoster]) (UKB data field 130179)"
[4] "Herpes zoster with nervous system complications (PheCode 53.1)"
[5] "Herpes simplex (PheCode 54)"
[6] "ICD10 B00: Herpesviral [herpes simplex] infections"
[7] "Response to tofacitinib treatment in rheumatoid arthritis (herpes zoster)(time to event)"
[8] "Response to tofacitinib treatment in psoriasis (herpes zoster)(time to event)"
[9] "Response to tofacitinib treatment in rheumatoid arthritis (herpes zoster)"
[10] "Response to tofacitinib treatment in psoriasis (herpes zoster)"
[11] "Response to tofacitinib treatment (herpes zoster)(time to event)"
[12] "Response to tofacitinib treatment (herpes zoster)"
[13] "ICD10 B00: Herpesviral [herpes simplex] infections (Gene-based burden)"
[14] "Herpes infection"
gwas_study_info = gwas_study_info |>
mutate(collected_all_disease_terms =
stringr::str_remove_all(collected_all_disease_terms,
pattern = vec_to_grep_pattern("mumps")
)
)
gwas_study_info = gwas_study_info |>
mutate(collected_all_disease_terms =
stringr::str_remove_all(collected_all_disease_terms,
pattern = vec_to_grep_pattern("parasitic infection")
)
)
gwas_study_info = gwas_study_info |>
mutate(collected_all_disease_terms =
stringr::str_remove_all(collected_all_disease_terms,
pattern = vec_to_grep_pattern("helicobacter pylori infectious disease")
)
)
gwas_study_info = gwas_study_info |>
mutate(collected_all_disease_terms =
stringr::str_remove_all(collected_all_disease_terms,
pattern = vec_to_grep_pattern("common cold")
)
)
gwas_study_info = gwas_study_info |>
mutate(collected_all_disease_terms =
stringr::str_remove_all(collected_all_disease_terms,
pattern = vec_to_grep_pattern("tropical spastic paraparesis")
)
)
# deduplicate and alphabetically sort the remaining terms in each row
gwas_study_info =
gwas_study_info |>
rowwise() |>
mutate(collected_all_disease_terms = paste0(sort(unique(unlist(strsplit(collected_all_disease_terms, ", ")))),
collapse = ", ")
) |>
ungroup()
gwas_study_info = gwas_study_info |>
mutate(collected_all_disease_terms = stringr::str_remove_all(collected_all_disease_terms, "^,|,$")
) |>
mutate(collected_all_disease_terms = stringr::str_trim(collected_all_disease_terms)
)
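After many removals a term string can contain empty slots (e.g. ", asthma, "), which is why the leading/trailing commas are stripped above. A vectorized base-R sketch that drops empty entries before deduplicating and sorting, so no separate trimming step is needed; tidy_term_strings() is a hypothetical helper, not the project's code:

```r
# Hypothetical helper (not from the original project).
tidy_term_strings <- function(term_strings) {
  vapply(strsplit(term_strings, ", ", fixed = TRUE), function(x) {
    x <- trimws(x)        # tolerate stray spaces around separators
    x <- x[x != ""]       # drop empty slots left behind by removals
    paste(sort(unique(x)), collapse = ", ")
  }, character(1))
}

tidy_term_strings(c("b, a, a", ", asthma, ", ""))
# → "a, b" "asthma" ""
```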
n_studies_trait = gwas_study_info |>
dplyr::filter(DISEASE_STUDY == T) |>
dplyr::select(collected_all_disease_terms, PUBMED_ID) |>
dplyr::distinct() |>
dplyr::group_by(collected_all_disease_terms) |>
dplyr::summarise(n_studies = dplyr::n()) |>
dplyr::arrange(desc(n_studies))
head(n_studies_trait)
# A tibble: 6 × 2
collected_all_disease_terms n_studies
<chr> <int>
1 "type 2 diabetes mellitus" 146
2 "" 120
3 "alzheimers disease" 116
4 "breast cancer" 113
5 "asthma" 111
6 "major depressive disorder" 109
dim(n_studies_trait)
[1] 2940 2
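n_studies_trait counts distinct PUBMED_IDs per full comma-joined term string among disease studies, so its 2940 rows are term combinations (the "" row collects disease studies whose term string is empty after filtering), whereas the count below is over individual terms. A base-R sketch of the same per-string count on made-up toy values (toy_studies is illustrative only):

```r
# Toy stand-in for gwas_study_info (illustrative values only)
toy_studies <- data.frame(
  collected_all_disease_terms = c("asthma", "asthma", "asthma", "", "breast cancer"),
  PUBMED_ID     = c(101, 101, 102, 103, 104),
  DISEASE_STUDY = c(TRUE, TRUE, TRUE, TRUE, FALSE)
)

# Keep disease studies only, as in filter(DISEASE_STUDY == T)
disease_rows <- toy_studies[toy_studies$DISEASE_STUDY, ]

# Distinct PUBMED_IDs per full term string
# (same quantity as distinct() + group_by() + n() in the dplyr pipeline)
n_studies <- tapply(disease_rows$PUBMED_ID,
                    disease_rows$collected_all_disease_terms,
                    function(ids) length(unique(ids)))
n_studies
# "" has 1 study, "asthma" has 2; "breast cancer" is excluded (not a disease study)
```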
diseases <- stringr::str_split(pattern = ", ",
gwas_study_info$collected_all_disease_terms[gwas_study_info$collected_all_disease_terms != ""]) |>
unlist() |>
stringr::str_trim()
length(unique(diseases))
[1] 2127
fwrite(gwas_study_info,
here::here("output/gwas_cat/gwas_study_info_disease_trait_filtered.csv"))
sessionInfo()
R version 4.3.1 (2023-06-16)
Platform: aarch64-apple-darwin20 (64-bit)
Running under: macOS 15.6.1
Matrix products: default
BLAS: /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib/libRlapack.dylib; LAPACK version 3.11.0
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
time zone: America/Los_Angeles
tzcode source: internal
attached base packages:
[1] stats graphics grDevices datasets utils methods base
other attached packages:
[1] jsonlite_2.0.0 httr_1.4.7 stringr_1.5.1 data.table_1.17.8
[5] dplyr_1.1.4 workflowr_1.7.1
loaded via a namespace (and not attached):
[1] compiler_4.3.1 renv_1.0.3 promises_1.3.3 tidyselect_1.2.1
[5] Rcpp_1.1.0 git2r_0.36.2 callr_3.7.6 later_1.4.2
[9] jquerylib_0.1.4 yaml_2.3.10 fastmap_1.2.0 here_1.0.1
[13] R6_2.6.1 generics_0.1.4 curl_6.4.0 knitr_1.50
[17] tibble_3.3.0 rprojroot_2.1.0 bslib_0.9.0 pillar_1.11.0
[21] rlang_1.1.6 utf8_1.2.6 cachem_1.1.0 stringi_1.8.7
[25] httpuv_1.6.16 xfun_0.52 getPass_0.2-4 fs_1.6.6
[29] sass_0.4.10 cli_3.6.5 withr_3.0.2 magrittr_2.0.3
[33] ps_1.9.1 digest_0.6.37 processx_3.8.6 rstudioapi_0.17.1
[37] lifecycle_1.0.4 vctrs_0.6.5 evaluate_1.0.4 glue_1.8.0
[41] whisker_0.4.1 rmarkdown_2.29 tools_4.3.1 pkgconfig_2.0.3
[45] htmltools_0.5.8.1