Last updated: 2025-09-22

Checks: 7 passed, 0 failed

Knit directory: genomics_ancest_disease_dispar/

This reproducible R Markdown analysis was created with workflowr (version 1.7.1). The Checks tab describes the reproducibility checks that were applied when the results were created. The Past versions tab lists the development history.


Great! Since the R Markdown file has been committed to the Git repository, you know the exact version of the code that produced these results.

Great job! The global environment was empty. Objects defined in the global environment can affect the analysis in your R Markdown file in unknown ways. For reproducibility it’s best to always run the code in an empty environment.

The command set.seed(20220216) was run prior to running the code in the R Markdown file. Setting a seed ensures that any results that rely on randomness, e.g. subsampling or permutations, are reproducible.

Great job! Recording the operating system, R version, and package versions is critical for reproducibility.

Nice! There were no cached chunks for this analysis, so you can be confident that you successfully produced the results during this run.

Great job! Using relative paths to the files within your workflowr project makes it easier to run your code on other machines.

Great! You are using Git for version control. Tracking code development and connecting the code version to the results is critical for reproducibility.

The results in this page were generated with repository version f897b13. See the Past versions tab to see a history of the changes made to the R Markdown and HTML files.

Note that you need to be careful to ensure that all relevant files for the analysis have been committed to Git prior to generating the results (you can use wflow_publish or wflow_git_commit). workflowr only checks the R Markdown file, but you know if there are other scripts or data files that it depends on. Below is the status of the Git repository when the results were generated:


Ignored files:
    Ignored:    .DS_Store
    Ignored:    .Rproj.user/
    Ignored:    data/.DS_Store
    Ignored:    data/gbd/.DS_Store
    Ignored:    data/gbd/IHME-GBD_2021_DATA-d8cf695e-1.csv
    Ignored:    data/gbd/ihme_gbd_2019_global_disease_burden_rate_all_ages.csv
    Ignored:    data/gbd/ihme_gbd_2019_global_paf_rate_percent_all_ages.csv
    Ignored:    data/gbd/ihme_gbd_2021_global_disease_burden_rate_all_ages.csv
    Ignored:    data/gbd/ihme_gbd_2021_global_paf_rate_percent_all_ages.csv
    Ignored:    data/gwas_catalog/
    Ignored:    data/who/
    Ignored:    output/gwas_cat/
    Ignored:    output/gwas_study_info_cohort_corrected.csv
    Ignored:    output/gwas_study_info_trait_corrected.csv
    Ignored:    output/gwas_study_info_trait_ontology_info.csv
    Ignored:    output/gwas_study_info_trait_ontology_info_l1.csv
    Ignored:    output/gwas_study_info_trait_ontology_info_l2.csv
    Ignored:    output/trait_ontology/
    Ignored:    renv/

Unstaged changes:
    Modified:   analysis/level_1_disease_group_cancer.Rmd
    Modified:   analysis/level_1_disease_group_non_cancer.Rmd
    Modified:   analysis/level_2_disease_group.Rmd
    Modified:   code/get_term_descendants.R

Note that any generated files, e.g. HTML, png, CSS, etc., are not included in this status report because it is ok for generated content to have uncommitted changes.


These are the previous versions of the repository in which changes were made to the R Markdown (analysis/exclude_infectious_diseases.Rmd) and HTML (docs/exclude_infectious_diseases.html) files. If you’ve configured a remote Git repository (see ?wflow_git_remote), click on the hyperlinks in the table below to view the files as they were in that past version.

File Version Author Date Message
Rmd f897b13 IJbeasley 2025-09-22 More typo + structure fixing …
html 9e1ed75 IJbeasley 2025-09-22 Build site.
Rmd 351c6a9 IJbeasley 2025-09-22 More typo + structure fixing …
html 56a3b23 IJbeasley 2025-09-22 Build site.
Rmd 8a0be87 IJbeasley 2025-09-22 More typo correcting
html b284521 IJbeasley 2025-09-22 Build site.
Rmd ec5636b IJbeasley 2025-09-22 …maybe fixing typos
html 73ed9df IJbeasley 2025-09-22 Build site.
Rmd abee648 IJbeasley 2025-09-22 …maybe fixing typos
html a241ae1 IJbeasley 2025-09-22 Build site.
Rmd 62b9234 IJbeasley 2025-09-22 Excluding infectious disease

1 Setup

library(dplyr)
library(data.table)
library(stringr)

1.1 Ontology helpers for retrieving disease subtypes

source(here::here("code/get_term_descendants.R"))
gwas_study_info <- fread(here::here("output/gwas_cat/gwas_study_info_disease_trait_simplified.csv"))

1.2 Initial summary: number of unique disease terms

1.2.1 Counting each term separately when a study has multiple terms

diseases <- gwas_study_info$collected_all_disease_terms[gwas_study_info$collected_all_disease_terms != ""] |>
  stringr::str_split(pattern = ", ") |>
  unlist() |>
  stringr::str_trim()

length(unique(diseases))
[1] 2187
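The count above treats every comma-separated term as a separate disease. A minimal base-R sketch of the same split-and-count step on a hypothetical toy vector (not the GWAS Catalog data), which also tabulates how often each term occurs:

```r
# Hypothetical toy input standing in for gwas_study_info$collected_all_disease_terms
collected <- c("asthma, copd", "asthma", "", "herpes zoster, psoriasis")

# Drop empty strings, split on the ", " separator and trim stray whitespace
terms <- trimws(unlist(strsplit(collected[collected != ""], ", ", fixed = TRUE)))

n_unique <- length(unique(terms))                     # number of distinct terms
term_counts <- sort(table(terms), decreasing = TRUE)  # occurrences of each term
print(n_unique)
print(term_counts)
```

The frequency table is a quick way to see which terms dominate before deciding what to collapse or remove.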

2 HIV/AIDS and sexually transmitted infections

2.1 HIV/AIDS

2.1.1 HIV subgrouping

gwas_study_info = gwas_study_info |>
    mutate(collected_all_disease_terms = 
         stringr::str_replace_all(collected_all_disease_terms,
                          vec_to_grep_pattern("hiv-1 infection"),
                          "hiv infection"
         ))
gwas_study_info = gwas_study_info |>
  mutate(collected_all_disease_terms  = 
         stringr::str_remove_all(collected_all_disease_terms,
                          pattern = vec_to_grep_pattern("hiv infection")
         )
         )
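Whether removing a term also removes its neighbouring ", " separator depends on how vec_to_grep_pattern (defined in code/get_term_descendants.R, not shown here) builds its pattern. If it does not, removals can leave fragments such as ", tuberculosis". A hedged sketch of a clean-up step, using a hypothetical helper tidy_separators() and plain fixed-string removal in place of the project's helper:

```r
# Hypothetical helper: normalise ", " separators left behind after removing terms
tidy_separators <- function(x) {
  x <- gsub("\\s*,(\\s*,)*\\s*", ", ", x)  # collapse runs of commas into a single ", "
  x <- gsub("^(, )+|(, )+$", "", x)        # drop leading and trailing separators
  trimws(x)
}

# Toy data; plain fixed-string removal stands in for vec_to_grep_pattern()
x <- c("hiv infection, tuberculosis", "asthma, hiv infection",
       "hiv infection", "copd, hiv infection, asthma")
removed <- gsub("hiv infection", "", x, fixed = TRUE)
cleaned <- tidy_separators(removed)
print(cleaned)
```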

2.2 Sexually transmitted infections (other than HIV)

2.2.1 Syphilis

gwas_study_info |>
  filter(grepl("syphi", collected_all_disease_terms, ignore.case = T)) |>
  select(`DISEASE/TRAIT`, collected_all_disease_terms)
Empty data.table (0 rows and 2 cols): DISEASE/TRAIT,collected_all_disease_terms

2.2.2 Chlamydial infections

gwas_study_info = gwas_study_info |>
  mutate(collected_all_disease_terms  = 
         stringr::str_remove_all(collected_all_disease_terms,
                          pattern = vec_to_grep_pattern("chlamydophila infectious disease")
         )
         )

2.2.3 Gonorrhea

gwas_study_info |>
  filter(grepl("gonorrhea", collected_all_disease_terms, ignore.case = T)) |>
  select(`DISEASE/TRAIT`, collected_all_disease_terms)
Empty data.table (0 rows and 2 cols): DISEASE/TRAIT,collected_all_disease_terms

2.2.4 Trichomoniasis

gwas_study_info |>
  filter(grepl("trichomoniasis", collected_all_disease_terms, ignore.case = T)) |>
  select(`DISEASE/TRAIT`, collected_all_disease_terms)
Empty data.table (0 rows and 2 cols): DISEASE/TRAIT,collected_all_disease_terms

2.2.5 Genital herpes

gwas_study_info |>
  filter(grepl("herpes", collected_all_disease_terms, ignore.case = T)) |>
  select(`DISEASE/TRAIT`, collected_all_disease_terms)
                                                                               DISEASE/TRAIT
                                                                                      <char>
 1:                                                                                 Shingles
 2:                 Source of report of B02 (zoster [herpes zoster]) (UKB data field 130179)
 3:                                                        ICD10 B02: Zoster [herpes zoster]
 4:                                                               Herpes zoster (PheCode 53)
 5:                           Herpes zoster with nervous system complications (PheCode 53.1)
 6:                                                              Herpes simplex (PheCode 54)
 7:                                                                            Herpes zoster
 8:                                                              Herpes simplex (PheCode 54)
 9:                           Herpes zoster with nervous system complications (PheCode 53.1)
10:                           Herpes zoster with nervous system complications (PheCode 53.1)
11:                           Herpes zoster with nervous system complications (PheCode 53.1)
12:                                                               Herpes zoster (PheCode 53)
13:                                                               Herpes zoster (PheCode 53)
14:                                                               Herpes zoster (PheCode 53)
15:                                                               Herpes zoster (PheCode 53)
16:                           Herpes zoster with nervous system complications (PheCode 53.1)
17:                                                              Herpes simplex (PheCode 54)
18:                                                              Herpes simplex (PheCode 54)
19:                                                              Herpes simplex (PheCode 54)
20:                                                                                 Shingles
21:                                       ICD10 B00: Herpesviral [herpes simplex] infections
22:                                                        ICD10 B02: Zoster [herpes zoster]
23: Response to tofacitinib treatment in rheumatoid arthritis (herpes zoster)(time to event)
24:            Response to tofacitinib treatment in psoriasis (herpes zoster)(time to event)
25:                Response to tofacitinib treatment in rheumatoid arthritis (herpes zoster)
26:                           Response to tofacitinib treatment in psoriasis (herpes zoster)
27:                         Response to tofacitinib treatment (herpes zoster)(time to event)
28:                                        Response to tofacitinib treatment (herpes zoster)
29:                   ICD10 B00: Herpesviral [herpes simplex] infections (Gene-based burden)
30:                                    ICD10 B02: Zoster [herpes zoster] (Gene-based burden)
31:                                                                         Zoster infection
32:                                                                         Herpes infection
33:                                                                         Zoster infection
34:                                                                         Herpes infection
                                                                               DISEASE/TRAIT
                            collected_all_disease_terms
                                                 <char>
 1:                                       herpes zoster
 2:                                       herpes zoster
 3:                                       herpes zoster
 4:                                       herpes zoster
 5: herpes zoster, nervous system disease, complication
 6:                            herpes simplex infection
 7:                                       herpes zoster
 8:                            herpes simplex infection
 9: herpes zoster, nervous system disease, complication
10: herpes zoster, nervous system disease, complication
11: herpes zoster, nervous system disease, complication
12:                                       herpes zoster
13:                                       herpes zoster
14:                                       herpes zoster
15:                                       herpes zoster
16: herpes zoster, nervous system disease, complication
17:                            herpes simplex infection
18:                            herpes simplex infection
19:                            herpes simplex infection
20:                                       herpes zoster
21:                    herpesviridae infectious disease
22:                                       herpes zoster
23:                 rheumatoid arthritis, herpes zoster
24:                            psoriasis, herpes zoster
25:                 rheumatoid arthritis, herpes zoster
26:                            psoriasis, herpes zoster
27:      psoriasis, rheumatoid arthritis, herpes zoster
28:      psoriasis, rheumatoid arthritis, herpes zoster
29:                    herpesviridae infectious disease
30:                                       herpes zoster
31:                                       herpes zoster
32:                            herpes simplex infection
33:                                       herpes zoster
34:                            herpes simplex infection
                            collected_all_disease_terms
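In the listing above, the zoster terms appear in the order "herpes zoster, nervous system disease, complication". An exact-string pattern for such a combination matches only one ordering. A hedged sketch of an order-independent check that compares the terms as sets; same_term_set() and the toy data are hypothetical:

```r
# Hypothetical helper: TRUE where a comma-separated string holds exactly the
# target terms, in any order
same_term_set <- function(x, target) {
  target_set <- sort(trimws(strsplit(target, ",", fixed = TRUE)[[1]]))
  vapply(strsplit(x, ",", fixed = TRUE), function(terms) {
    identical(sort(trimws(terms)), target_set)
  }, logical(1))
}

x <- c("herpes zoster, nervous system disease, complication",
       "herpes zoster",
       "complication, herpes zoster, nervous system disease")
is_match <- same_term_set(x, "complication, herpes zoster, nervous system disease")

# Rewrite matching rows to the combination without "complication"
x[is_match] <- "herpes zoster, nervous system disease"
print(x)
```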

2.2.6 Other sexually transmitted infections

url <- "http://www.ebi.ac.uk/ols4/api/ontologies/mesh/terms/http%253A%252F%252Fid.nlm.nih.gov%252Fmesh%252FD012749/descendants"

sti_terms <- get_descendants(url)
[1] "Number of terms collected:"
[1] 36
[1] "\n Some example terms"
[1] "sexually transmitted diseases, bacterial"
[2] "aids arteritis, central nervous system"  
[3] "hiv-associated lipodystrophy syndrome"   
[4] "aids-related opportunistic infections"   
[5] "sexually transmitted diseases, viral"    
for(term in sti_terms){
  
  # any() is TRUE if the term appears in at least one study;
  # fixed = TRUE matches the term as literal text, not as a regular expression
  if(any(grepl(term, gwas_study_info$collected_all_disease_terms, fixed = TRUE))) {
    
  print(paste0("Term removed:", term))
    
  gwas_study_info = gwas_study_info |>
    mutate(collected_all_disease_terms  = 
           stringr::str_remove_all(collected_all_disease_terms,
                            pattern = vec_to_grep_pattern(term)
           )
    )
  
  }
}
[1] "Term removed:warts"
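The loop above decides whether a descendant term is present with a substring grepl(), so "warts" would also be detected inside a longer term such as "genital warts". A hedged alternative sketch that matches whole comma-separated terms literally (no regular expressions) and reports what was removed; the helper remove_listed_terms() and the toy data are hypothetical. One caveat: MeSH labels such as "sexually transmitted diseases, viral" themselves contain ", ", so a split-based match can never find them as single terms.

```r
# Hypothetical helper: drop whole terms listed in `drop` from ", "-separated strings
remove_listed_terms <- function(x, drop) {
  split_terms <- strsplit(x, ", ", fixed = TRUE)
  # terms from `drop` that occur at least once, in the order given in `drop`
  found <- intersect(drop, unlist(split_terms))
  cleaned <- vapply(split_terms,
                    function(t) paste(t[!t %in% drop], collapse = ", "),
                    character(1))
  list(terms = cleaned, removed = found)
}

# Toy data, not the GWAS Catalog
x <- c("warts, asthma", "genital warts", "asthma")
res <- remove_listed_terms(x, drop = c("warts", "syphilis"))
print(res$removed)
print(res$terms)
```

Whether "genital warts" should survive is a curation decision; the whole-term version simply makes that choice explicit.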

3 Respiratory infections and tuberculosis

3.1 Tuberculosis

tb_terms <- c("mycobacterium tuberculosis infection",
              "pulmonary tuberculosis",
              "extrapulmonary tuberculosis",
              "meningeal tuberculosis")


gwas_study_info = gwas_study_info |>
  mutate(collected_all_disease_terms  = 
         stringr::str_replace_all(collected_all_disease_terms,
                          pattern = vec_to_grep_pattern(tb_terms),
                          "tuberculosis"
         ))
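Collapsing several subtype labels into one parent term can leave duplicates: a study annotated with both "pulmonary tuberculosis" and "extrapulmonary tuberculosis" could read "tuberculosis, tuberculosis" after the replacement above, unless vec_to_grep_pattern or a later step de-duplicates (not visible here). A minimal sketch that maps subtypes to the parent at the term level and keeps each term once; collapse_to_parent() is a hypothetical helper and the inputs are toy values:

```r
# Hypothetical helper: replace any subtype term with its parent, then de-duplicate
collapse_to_parent <- function(x, subtypes, parent) {
  vapply(strsplit(x, ", ", fixed = TRUE), function(terms) {
    terms[terms %in% subtypes] <- parent
    paste(unique(terms), collapse = ", ")
  }, character(1))
}

tb_subtypes <- c("mycobacterium tuberculosis infection", "pulmonary tuberculosis",
                 "extrapulmonary tuberculosis", "meningeal tuberculosis")
x <- c("pulmonary tuberculosis, extrapulmonary tuberculosis",
       "meningeal tuberculosis, hiv infection",
       "asthma")
collapsed <- collapse_to_parent(x, tb_subtypes, "tuberculosis")
print(collapsed)
```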

3.2 Lower respiratory infections

# acute lower respiratory infections
url <- "http://www.ebi.ac.uk/ols4/api/ontologies/snomed/terms/http%253A%252F%252Fsnomed.info%252Fid%252F195742007/descendants"

lri_terms <- get_descendants(url)
[1] "Number of terms collected:"
[1] 35
[1] "\n Some example terms"
[1] "chronic obstructive pulmonary disease with acute lower respiratory infection"
[2] "acute lower respiratory infection caused by respiratory syncytial virus"     
[3] "acute infective exacerbation of chronic obstructive pulmonary disease"       
[4] "inflammation of bronchiole caused by human metapneumovirus"                  
[5] "acute bronchiolitis caused by respiratory syncytial virus"                   
for(term in lri_terms){
  
  if(any(grepl(term, gwas_study_info$collected_all_disease_terms, fixed = TRUE))) {
    
  print(paste0("Term removed:", term))
    
  gwas_study_info = gwas_study_info |>
    mutate(collected_all_disease_terms  = 
           stringr::str_remove_all(collected_all_disease_terms,
                            pattern = vec_to_grep_pattern(term)
           )
    )
  
  }
}

3.3 Upper respiratory infections

# acute upper respiratory infections
url <- "http://www.ebi.ac.uk/ols4/api/ontologies/snomed/terms/http%253A%252F%252Fsnomed.info%252Fid%252F54398005/descendants"

uri_terms <- get_descendants(url)
[1] "Number of terms collected:"
[1] 30
[1] "\n Some example terms"
[1] "acute upper respiratory infection caused by respiratory syncytial virus"
[2] "acute infective pharyngitis caused by streptococcus (disorder)"         
[3] "acute nasopharyngitis caused by respiratory syncytial virus"            
[4] "acute laryngitis caused by haemophilus influenzae"                      
[5] "fusobacterial necrotizing tonsillitis (disorder)"                       
for(term in uri_terms){
  
  if(any(grepl(term, gwas_study_info$collected_all_disease_terms, fixed = TRUE))) {
    
  print(paste0("Term removed:", term))
    
  gwas_study_info = gwas_study_info |>
    mutate(collected_all_disease_terms  = 
           stringr::str_remove_all(collected_all_disease_terms,
                            pattern = vec_to_grep_pattern(term)
           )
    )
  
  }
}

3.4 Otitis media

url <- "http://www.ebi.ac.uk/ols4/api/ontologies/doid/terms/http%253A%252F%252Fpurl.obolibrary.org%252Fobo%252FDOID_10754/descendants"

otitis_media_terms <- get_descendants(url)
[1] "Number of terms collected:"
[1] 20
[1] "\n Some example terms"
[1] "chronic tubotympanic suppurative otitis media"
[2] "acute allergic sanguinous otitis media"       
[3] "acute allergic mucoid otitis media"           
[4] "acute allergic serous otitis media"           
[5] "middle ear cholesterol granuloma"             
gwas_study_info = gwas_study_info |>
  mutate(collected_all_disease_terms  = 
         stringr::str_replace_all(collected_all_disease_terms,
                          pattern = vec_to_grep_pattern(otitis_media_terms),
                          "otitis media"
         ))


gwas_study_info = gwas_study_info |>
  mutate(collected_all_disease_terms  = 
         stringr::str_remove_all(collected_all_disease_terms,
                          pattern = vec_to_grep_pattern("otitis media")
         )
         )
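The OLS4 URLs in this section (MeSH, SNOMED CT and DOID terms) embed the term IRI URL-encoded twice: ":" appears as "%253A" and "/" as "%252F". A small sketch that rebuilds such URLs from an ontology ID and an IRI with base R's utils::URLencode(); ols_descendants_url() is a hypothetical helper, and the only check here is that it reproduces the URLs used above:

```r
# Hypothetical helper: build an OLS4 "descendants" endpoint URL for a term IRI
ols_descendants_url <- function(ontology, iri) {
  once  <- utils::URLencode(iri, reserved = TRUE)                    # ":" -> %3A, "/" -> %2F
  twice <- utils::URLencode(once, reserved = TRUE, repeated = TRUE)  # "%" -> %25
  paste0("http://www.ebi.ac.uk/ols4/api/ontologies/", ontology,
         "/terms/", twice, "/descendants")
}

lri_url <- ols_descendants_url("snomed", "http://snomed.info/id/195742007")
otitis_url <- ols_descendants_url("doid", "http://purl.obolibrary.org/obo/DOID_10754")
print(lri_url)
print(otitis_url)
```

repeated = TRUE is needed on the second call because URLencode() otherwise returns input that already contains percent-escapes unchanged.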

3.5 COVID-19

gwas_study_info = gwas_study_info |>
  mutate(collected_all_disease_terms  = 
         stringr::str_remove_all(collected_all_disease_terms,
                          pattern = vec_to_grep_pattern("covid-19")
         )
         )

3.6 Other

3.6.1 Influenza A (H1N1) (subset of influenza)

# Collapse influenza A (H1N1) into the parent term influenza
gwas_study_info = gwas_study_info |>
    mutate(collected_all_disease_terms  = 
         stringr::str_replace_all(collected_all_disease_terms,
                          vec_to_grep_pattern("influenza a \\(h1n1\\)"),
                          "influenza"
         ))


gwas_study_info = gwas_study_info |>
  mutate(collected_all_disease_terms  = 
         stringr::str_remove_all(collected_all_disease_terms,
                          pattern = vec_to_grep_pattern("influenza")
         )
         )
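The influenza pattern above escapes the parentheses by hand ("influenza a \\(h1n1\\)"); unescaped, "(h1n1)" is a regex group and the pattern would also match "influenza a h1n1". The same issue applies to ontology labels such as "fusobacterial necrotizing tonsillitis (disorder)". A hedged sketch of a generic escaping helper (the name escape_regex() is hypothetical; passing fixed = TRUE to grepl() is the simpler option when no other regex features are needed):

```r
# Hypothetical helper: backslash-escape regex metacharacters so a term matches literally
escape_regex <- function(x) {
  gsub("([.|()\\^{}+$*?]|\\[|\\])", "\\\\\\1", x)
}

term <- "influenza a (h1n1)"
pattern <- escape_regex(term)
print(pattern)

unescaped_hit <- grepl(term, "influenza a h1n1")      # parentheses act as a group
escaped_hit   <- grepl(pattern, "influenza a h1n1")   # requires literal parentheses
literal_hit   <- grepl(pattern, "influenza a (h1n1)")
print(c(unescaped_hit, escaped_hit, literal_hit))
```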

4 Enteric infections

4.1 Diarrheal diseases

gwas_study_info = gwas_study_info |>
  mutate(collected_all_disease_terms  = 
         stringr::str_remove_all(collected_all_disease_terms,
                          pattern = vec_to_grep_pattern("astroviridae infectious disease, infantile diarrhea")
         )
         )

# If the trait attributes diarrhea to IBS, add irritable bowel syndrome to collected_all_disease_terms
# (\\b word boundaries stop "IBS" from matching inside longer words such as "ribs")

gwas_study_info = gwas_study_info |>
  mutate(collected_all_disease_terms  = 
        ifelse(grepl("\\bIBS\\b", `DISEASE/TRAIT`, ignore.case = TRUE) &
               grepl("diarrhea", collected_all_disease_terms, ignore.case = TRUE),
               paste0(collected_all_disease_terms, ", irritable bowel syndrome"),
               collected_all_disease_terms
               )
         )

4.2 Typhoid and paratyphoid fevers

4.2.1 Typhoid fever

gwas_study_info = gwas_study_info |>
  mutate(collected_all_disease_terms  = 
         stringr::str_remove_all(collected_all_disease_terms,
                          pattern = vec_to_grep_pattern("typhoid fever")
         )
         )

4.3 Invasive non-typhoidal Salmonella

gwas_study_info = gwas_study_info |>
  mutate(collected_all_disease_terms  = 
         stringr::str_remove_all(collected_all_disease_terms,
                          pattern = vec_to_grep_pattern("non-typhoidal salmonella bacteremia")
         )
         )

4.4 Other intestinal infectious diseases

4.4.1 Gastroenteritis

# remove bacterial and viral gastroenteritis

gwas_study_info = gwas_study_info |>
  mutate(collected_all_disease_terms  = 
         ifelse(grepl("gastroenteritis", collected_all_disease_terms, ignore.case = T) &
                grepl("bacterial gastroenteritis|viral gastroenteritis", 
                       collected_all_disease_terms, 
                       ignore.case = T),
          stringr::str_remove_all(collected_all_disease_terms,
          pattern = vec_to_grep_pattern("gastroenteritis")
          ),
          collected_all_disease_terms
         )
         )

4.4.2 Enteritis

# remove viral and bacterial enteritis
gwas_study_info = gwas_study_info |>
  mutate(collected_all_disease_terms  = 
         ifelse(grepl("enteritis", collected_all_disease_terms, ignore.case = T) &
                grepl("bacterial enteritis|viral enteritis", 
                       collected_all_disease_terms, 
                       ignore.case = T),
          stringr::str_remove_all(collected_all_disease_terms,
          pattern = vec_to_grep_pattern("enteritis")
          ),
          collected_all_disease_terms
         )
         )
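The two blocks above drop a gastroenteritis or enteritis label only in rows that also mention a bacterial or viral form; exactly which text is removed depends on vec_to_grep_pattern. If the goal is the one stated in the comments (remove the bacterial and viral forms), a term-level alternative is a single anchored regex over the split terms, which leaves non-infectious labels such as "regional enteritis" alone. drop_infectious_enteritis() and the toy data are hypothetical:

```r
# Hypothetical helper: drop whole terms that are bacterial or viral (gastro)enteritis
drop_infectious_enteritis <- function(x) {
  vapply(strsplit(x, ", ", fixed = TRUE), function(terms) {
    # anchored at both ends, so only complete terms match
    keep <- !grepl("^(bacterial|viral) (gastro)?enteritis$", terms)
    paste(terms[keep], collapse = ", ")
  }, character(1))
}

x <- c("viral gastroenteritis, asthma", "bacterial enteritis",
       "regional enteritis", "gastroenteritis")
cleaned <- drop_infectious_enteritis(x)
print(cleaned)
```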

4.4.3 Intestinal infectious disease

gwas_study_info = gwas_study_info |>
  mutate(collected_all_disease_terms  = 
         stringr::str_remove_all(collected_all_disease_terms,
                          pattern = vec_to_grep_pattern("intestinal infectious disease")
         )
         )

5 Neglected tropical diseases and malaria

5.1 Malaria

5.1.1 Plasmodium falciparum and vivax malaria

gwas_study_info = gwas_study_info |>
  mutate(collected_all_disease_terms  = 
         stringr::str_replace_all(collected_all_disease_terms,
                          vec_to_grep_pattern(
                            c("plasmodium falciparum malaria",
                              "plasmodium vivax malaria"
                              )),
                          "malaria"
         ))
gwas_study_info = gwas_study_info |>
  mutate(collected_all_disease_terms  = 
         stringr::str_remove_all(collected_all_disease_terms,
                          pattern = vec_to_grep_pattern("malaria")
         )
         )

5.2 Chagas disease

gwas_study_info = gwas_study_info |>
  mutate(collected_all_disease_terms  = 
         stringr::str_remove_all(collected_all_disease_terms,
                          pattern = vec_to_grep_pattern("chagas disease")
         )
         )

5.3 Leishmaniasis

gwas_study_info = gwas_study_info |>
  mutate(collected_all_disease_terms  = 
         stringr::str_remove_all(collected_all_disease_terms,
                          pattern = vec_to_grep_pattern("visceral leishmaniasis")
         )
         )


gwas_study_info = gwas_study_info |>
  mutate(collected_all_disease_terms  = 
         stringr::str_remove_all(collected_all_disease_terms,
                          pattern = vec_to_grep_pattern("cutaneous leishmaniasis")
         )
         )

5.4 African trypanosomiasis

gwas_study_info |>
  filter(grepl("trypanosomiasis", collected_all_disease_terms, ignore.case = T)) |>
  select(`DISEASE/TRAIT`, collected_all_disease_terms)
Empty data.table (0 rows and 2 cols): DISEASE/TRAIT,collected_all_disease_terms

5.5 Schistosomiasis

gwas_study_info |>
  filter(grepl("schistosomiasis", collected_all_disease_terms, ignore.case = T)) |>
  select(`DISEASE/TRAIT`, collected_all_disease_terms)
Empty data.table (0 rows and 2 cols): DISEASE/TRAIT,collected_all_disease_terms

5.6 Cysticercosis

gwas_study_info |>
  filter(grepl("cysticercosis", collected_all_disease_terms, ignore.case = T)) |>
  select(`DISEASE/TRAIT`, collected_all_disease_terms)
Empty data.table (0 rows and 2 cols): DISEASE/TRAIT,collected_all_disease_terms

5.7 Cystic echinococcosis

gwas_study_info |>
  filter(grepl("echinococcosis", collected_all_disease_terms, ignore.case = T)) |>
  select(`DISEASE/TRAIT`, collected_all_disease_terms)
Empty data.table (0 rows and 2 cols): DISEASE/TRAIT,collected_all_disease_terms

5.8 Lymphatic filariasis

gwas_study_info |>
  filter(grepl("filariasis", collected_all_disease_terms, ignore.case = T)) |>
  select(`DISEASE/TRAIT`, collected_all_disease_terms)
Empty data.table (0 rows and 2 cols): DISEASE/TRAIT,collected_all_disease_terms

5.9 Onchocerciasis

gwas_study_info |>
  filter(grepl("onchocerciasis", collected_all_disease_terms, ignore.case = T)) |>
  select(`DISEASE/TRAIT`, collected_all_disease_terms)
Empty data.table (0 rows and 2 cols): DISEASE/TRAIT,collected_all_disease_terms

5.10 Trachoma

gwas_study_info |>
  filter(grepl("trachoma", collected_all_disease_terms, ignore.case = T)) |>
  select(`DISEASE/TRAIT`, collected_all_disease_terms)
Empty data.table (0 rows and 2 cols): DISEASE/TRAIT,collected_all_disease_terms

5.11 Dengue

gwas_study_info |>
  filter(grepl("dengue", collected_all_disease_terms, ignore.case = T)) |>
  select(`DISEASE/TRAIT`, collected_all_disease_terms)
           DISEASE/TRAIT collected_all_disease_terms
                  <char>                      <char>
1: Dengue shock syndrome    dengue hemorrhagic fever

5.12 Yellow fever

gwas_study_info |>
  filter(grepl("yellow fever", collected_all_disease_terms, ignore.case = T)) |>
  select(`DISEASE/TRAIT`, collected_all_disease_terms)
Empty data.table (0 rows and 2 cols): DISEASE/TRAIT,collected_all_disease_terms

5.13 Rabies

gwas_study_info |>
  filter(grepl("rabies", collected_all_disease_terms, ignore.case = T)) |>
  select(`DISEASE/TRAIT`, collected_all_disease_terms)
Empty data.table (0 rows and 2 cols): DISEASE/TRAIT,collected_all_disease_terms

5.14 Intestinal nematode infections

# ascariasis
gwas_study_info |>
  filter(grepl("ascariasis", collected_all_disease_terms, ignore.case = T)) |>
  select(`DISEASE/TRAIT`, collected_all_disease_terms)
Empty data.table (0 rows and 2 cols): DISEASE/TRAIT,collected_all_disease_terms
# trichuriasis
gwas_study_info |>
  filter(grepl("trichuriasis", collected_all_disease_terms, ignore.case = T)) |>
  select(`DISEASE/TRAIT`, collected_all_disease_terms)
Empty data.table (0 rows and 2 cols): DISEASE/TRAIT,collected_all_disease_terms
# hookworm
gwas_study_info |>
  filter(grepl("hookworm", collected_all_disease_terms, ignore.case = T)) |>
  select(`DISEASE/TRAIT`, collected_all_disease_terms)
Empty data.table (0 rows and 2 cols): DISEASE/TRAIT,collected_all_disease_terms
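Each check in this section is a separate filter() call on one keyword. A hedged sketch of running many keyword checks in one pass and keeping only those with hits; the keyword list mirrors some of the headings in this section, but the study terms are a made-up toy vector:

```r
# Toy stand-in for gwas_study_info$collected_all_disease_terms
study_terms <- c("leprosy", "leprosy, crohns disease", "dengue hemorrhagic fever", "asthma")
keywords <- c("ascariasis", "trichuriasis", "hookworm", "leprosy", "dengue")

# Count rows containing each keyword (literal match, case-insensitive via tolower)
hits <- vapply(keywords,
               function(k) sum(grepl(k, tolower(study_terms), fixed = TRUE)),
               integer(1))
print(hits[hits > 0])
```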

5.15 Foodborne trematode infections

gwas_study_info |>
  filter(grepl("trema", collected_all_disease_terms, ignore.case = T)) |>
  select(`DISEASE/TRAIT`, collected_all_disease_terms)
Empty data.table (0 rows and 2 cols): DISEASE/TRAIT,collected_all_disease_terms

5.16 Leprosy

gwas_study_info |>
  filter(grepl("leprosy", collected_all_disease_terms, ignore.case = T)) |>
  select(`DISEASE/TRAIT`, collected_all_disease_terms) |>
  distinct()
                                  DISEASE/TRAIT collected_all_disease_terms
                                         <char>                      <char>
1:                                      Leprosy                     leprosy
2:                   Crohn's disease or Leprosy     leprosy, crohns disease
3: Crohn's disease or Leprosy (opposite effect)     leprosy, crohns disease
gwas_study_info = gwas_study_info |>
  mutate(collected_all_disease_terms  = 
         stringr::str_remove_all(collected_all_disease_terms,
                          pattern = vec_to_grep_pattern("leprosy")
         )
         )

5.17 Ebola

gwas_study_info |>
  filter(grepl("ebola", collected_all_disease_terms, ignore.case = T)) |>
  select(`DISEASE/TRAIT`, collected_all_disease_terms)
Empty data.table (0 rows and 2 cols): DISEASE/TRAIT,collected_all_disease_terms

5.18 Zika virus

gwas_study_info |>
  filter(grepl("zika", collected_all_disease_terms, ignore.case = T)) |>
  select(`DISEASE/TRAIT`, collected_all_disease_terms)
              DISEASE/TRAIT    collected_all_disease_terms
                     <char>                         <char>
1: Congenital Zika syndrome zika virus congenital syndrome

5.19 Guinea worm disease

gwas_study_info |>
  filter(grepl("worm", collected_all_disease_terms, ignore.case = T)) |>
  select(`DISEASE/TRAIT`, collected_all_disease_terms)
Empty data.table (0 rows and 2 cols): DISEASE/TRAIT,collected_all_disease_terms

5.20 Other neglected tropical diseases

6 Other infectious diseases

6.1 Meningitis

gwas_study_info = gwas_study_info |>
  mutate(collected_all_disease_terms  = 
         stringr::str_remove_all(collected_all_disease_terms,
                          pattern = vec_to_grep_pattern("infectious meningitis")
         )
         )

6.2 Encephalitis

gwas_study_info = gwas_study_info |>
  mutate(collected_all_disease_terms  = 
         stringr::str_remove_all(collected_all_disease_terms,
                          pattern = vec_to_grep_pattern("viral encephalitis")
         )
         )


# Tentatively, also remove "encephalitis" when DISEASE/TRAIT is "Encephalitis (PheCode 323)"

gwas_study_info = gwas_study_info |>
  mutate(collected_all_disease_terms  = 
         ifelse(`DISEASE/TRAIT` == "Encephalitis (PheCode 323)",
         stringr::str_remove_all(collected_all_disease_terms,
                          pattern = vec_to_grep_pattern("encephalitis")
                          ),
         collected_all_disease_terms
         )
  )

6.3 Diphtheria

gwas_study_info = gwas_study_info |>
  mutate(collected_all_disease_terms  = 
         stringr::str_remove_all(collected_all_disease_terms,
                          pattern = vec_to_grep_pattern("diphtheria")
         )
         )

6.4 Pertussis

gwas_study_info |>
  filter(grepl("pertussis", collected_all_disease_terms, ignore.case = T)) |>
  select(`DISEASE/TRAIT`, collected_all_disease_terms)
Empty data.table (0 rows and 2 cols): DISEASE/TRAIT,collected_all_disease_terms

6.5 Tetanus

gwas_study_info |>
  filter(grepl("tetanus", collected_all_disease_terms, ignore.case = T)) |>
  select(`DISEASE/TRAIT`, collected_all_disease_terms) 
Empty data.table (0 rows and 2 cols): DISEASE/TRAIT,collected_all_disease_terms

6.6 Measles

gwas_study_info = gwas_study_info |>
  mutate(collected_all_disease_terms  = 
         stringr::str_remove_all(collected_all_disease_terms,
                          pattern = vec_to_grep_pattern("measles")
         )
         )

6.7 Varicella and herpes zoster

# Remove herpes zoster for traits that are herpes zoster itself
# (%in% returns FALSE rather than NA when DISEASE/TRAIT is missing)
gwas_study_info = gwas_study_info |>
  mutate(collected_all_disease_terms  = 
         ifelse(`DISEASE/TRAIT` %in% c("Herpes zoster (PheCode 53)",
                                       "Herpes zoster",
                                       "Zoster infection") |
                grepl("ICD10 B02: Zoster \\[herpes zoster\\]", `DISEASE/TRAIT`),
         stringr::str_remove_all(collected_all_disease_terms,
                          pattern = vec_to_grep_pattern("herpes zoster")
                          ),
         collected_all_disease_terms
         )
  )


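# Drop "complication" from the herpes zoster + nervous system disease combination;
# both orders are listed because section 2.2.5 shows "herpes zoster, nervous system disease, complication"
gwas_study_info = gwas_study_info |>
  mutate(collected_all_disease_terms  = 
         stringr::str_replace_all(collected_all_disease_terms,
                          pattern = vec_to_grep_pattern(
                            c("complication, herpes zoster, nervous system disease",
                              "herpes zoster, nervous system disease, complication")),
                          "herpes zoster, nervous system disease"
         )
         )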

6.8 Acute hepatitis

# List traits that mention acute hepatitis; the matching hepatitis term is removed in the next step
gwas_study_info |> 
  filter(grepl("acute hepat", 
                 `DISEASE/TRAIT`, 
                  ignore.case = T)) |> 
  pull(`DISEASE/TRAIT`) |>
  unique()
[1] "Acute hepatitis A infection"
gwas_study_info = gwas_study_info |>
  mutate(collected_all_disease_terms  = 
         ifelse(`DISEASE/TRAIT` == "Acute hepatitis A infection",
         stringr::str_remove_all(collected_all_disease_terms,
                          pattern = vec_to_grep_pattern("hepatitis a infection")
                          ),
         collected_all_disease_terms
         )
  )

6.9 Other unspecified infectious diseases

6.9.1 Bacterial pneumonia

gwas_study_info = gwas_study_info |>
  mutate(collected_all_disease_terms  = 
         stringr::str_remove_all(collected_all_disease_terms,
                          pattern = vec_to_grep_pattern("bacterial pneumonia")
         )
         )

6.9.2 Infectious mononucleosis

gwas_study_info = gwas_study_info |>
  mutate(collected_all_disease_terms  = 
         stringr::str_remove_all(collected_all_disease_terms,
                          pattern = vec_to_grep_pattern("infectious mononucleosis")
         )
         )

gwas_study_info = gwas_study_info |>
  mutate(collected_all_disease_terms  = 
         stringr::str_remove_all(collected_all_disease_terms,
                          pattern = vec_to_grep_pattern("epstein-barr virus infection")
         )
         )

6.9.3 Protozoa infectious disease

gwas_study_info = gwas_study_info |>
  mutate(collected_all_disease_terms  = 
         stringr::str_remove_all(collected_all_disease_terms,
                          pattern = vec_to_grep_pattern("protozoa infectious disease")
         )
         )

6.9.4 Herpes

# For herpes simplex traits, replace the generic "herpesviridae infectious disease"
# term with the more specific "herpes simplex infection"
gwas_study_info = gwas_study_info |>
  mutate(collected_all_disease_terms =
           ifelse(grepl("herpes simplex", `DISEASE/TRAIT`, ignore.case = TRUE) &
                    grepl("herpesviridae infectious disease", collected_all_disease_terms, ignore.case = TRUE),
                  stringr::str_replace_all(collected_all_disease_terms,
                                           pattern = vec_to_grep_pattern("herpesviridae infectious disease"),
                                           "herpes simplex infection"),
                  collected_all_disease_terms)
  )


# Cold sores are caused by herpes simplex virus, so map the term to "herpes simplex infection"
gwas_study_info = gwas_study_info |>
  mutate(collected_all_disease_terms  = 
         stringr::str_replace_all(collected_all_disease_terms,
                          pattern = vec_to_grep_pattern("cold sores"),
                          "herpes simplex infection"
         )
         )
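The two herpes steps above combine a conditional replacement (only for herpes simplex traits) with an unconditional one ("cold sores"). The sketch below, on hypothetical toy rows and with a hypothetical `replace_term()` helper that matches whole split terms, also shows a side effect: a row that already listed "herpes simplex infection" gets the term twice, which is the kind of duplicate handled in section 7.1.

```r
# Hypothetical helper: replace one whole term in ", "-separated strings
replace_term <- function(x, from, to, sep = ", ") {
  vapply(strsplit(x, sep, fixed = TRUE), function(parts) {
    if (length(parts) == 1 && is.na(parts)) return(NA_character_)
    parts[parts == from] <- to
    paste(parts, collapse = sep)
  }, character(1), USE.NAMES = FALSE)
}

# Toy rows (hypothetical)
df <- data.frame(
  trait = c("Herpes simplex (toy)", "Shingles (toy)", "Cold sores (toy)"),
  terms = c("herpesviridae infectious disease",
            "herpesviridae infectious disease, herpes zoster",
            "cold sores, herpes simplex infection"),
  stringsAsFactors = FALSE
)

# Step 1: only herpes simplex traits have the generic term replaced
is_hsv <- grepl("herpes simplex", df$trait, ignore.case = TRUE)
df$terms[is_hsv] <- replace_term(df$terms[is_hsv],
                                 "herpesviridae infectious disease",
                                 "herpes simplex infection")

# Step 2: cold sores are mapped for every row
df$terms <- replace_term(df$terms, "cold sores", "herpes simplex infection")
df$terms
```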


# Check which traits now carry a herpes-related term
gwas_study_info |>
  filter(grepl("herpes", collected_all_disease_terms, ignore.case = TRUE)) |>
  pull(`DISEASE/TRAIT`) |>
  unique()
 [1] "Cold sores"                                                                              
 [2] "Shingles"                                                                                
 [3] "Source of report of B02 (zoster [herpes zoster]) (UKB data field 130179)"                
 [4] "Herpes zoster with nervous system complications (PheCode 53.1)"                          
 [5] "Herpes simplex (PheCode 54)"                                                             
 [6] "ICD10 B00: Herpesviral [herpes simplex] infections"                                      
 [7] "Response to tofacitinib treatment in rheumatoid arthritis (herpes zoster)(time to event)"
 [8] "Response to tofacitinib treatment in psoriasis (herpes zoster)(time to event)"           
 [9] "Response to tofacitinib treatment in rheumatoid arthritis (herpes zoster)"               
[10] "Response to tofacitinib treatment in psoriasis (herpes zoster)"                          
[11] "Response to tofacitinib treatment (herpes zoster)(time to event)"                        
[12] "Response to tofacitinib treatment (herpes zoster)"                                       
[13] "ICD10 B00: Herpesviral [herpes simplex] infections (Gene-based burden)"                  
[14] "Herpes infection"                                                                        

6.9.5 Mumps

gwas_study_info = gwas_study_info |>
  mutate(collected_all_disease_terms  = 
         stringr::str_remove_all(collected_all_disease_terms,
                          pattern = vec_to_grep_pattern("mumps")
         )
         )

6.9.6 Parasitic infection

gwas_study_info = gwas_study_info |>
  mutate(collected_all_disease_terms  = 
         stringr::str_remove_all(collected_all_disease_terms,
                          pattern = vec_to_grep_pattern("parasitic infection")
         )
         )

6.9.7 Helicobacter pylori infection

gwas_study_info = gwas_study_info |>
  mutate(collected_all_disease_terms  = 
         stringr::str_remove_all(collected_all_disease_terms,
                          pattern = vec_to_grep_pattern("helicobacter pylori infectious disease")
         )
         )

6.9.8 Common cold

gwas_study_info = gwas_study_info |>
  mutate(collected_all_disease_terms  = 
         stringr::str_remove_all(collected_all_disease_terms,
                          pattern = vec_to_grep_pattern("common cold")
         )
         )
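Sections 6.9.1 to 6.9.8 repeat the same remove-or-replace step once per term. A hedged alternative, sketched below, keeps those rules in one lookup table and applies them in a single pass; `NA` in the `replacement` column means "remove". The table layout and the `apply_curation()` helper are hypothetical, and row-conditional rules such as the acute hepatitis and herpesviridae edits would still need separate handling.

```r
# Hypothetical lookup table built from the unconditional rules in 6.9
curation <- data.frame(
  term = c("bacterial pneumonia", "infectious mononucleosis",
           "epstein-barr virus infection", "protozoa infectious disease",
           "mumps", "parasitic infection",
           "helicobacter pylori infectious disease", "common cold",
           "cold sores"),
  replacement = c(rep(NA, 8), "herpes simplex infection"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: apply every matching rule to each whole term
apply_curation <- function(x, rules, sep = ", ") {
  vapply(strsplit(x, sep, fixed = TRUE), function(parts) {
    if (length(parts) == 1 && is.na(parts)) return(NA_character_)
    idx <- match(parts, rules$term)              # NA where no rule applies
    hit <- !is.na(idx)
    parts[hit] <- rules$replacement[idx[hit]]    # NA marks a removal
    paste(parts[!is.na(parts)], collapse = sep)
  }, character(1), USE.NAMES = FALSE)
}

x <- c("asthma, common cold", "cold sores",
       "mumps, helicobacter pylori infectious disease")
apply_curation(x, curation)
```

Keeping the rules as data makes the curation easier to review and to export alongside the output, at the cost of no longer mirroring the `vec_to_grep_pattern()` matching used in the analysis.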

7 Update and save output

7.1 Deal with duplicate terms created during grouping

# Split each term string, drop duplicate terms, sort alphabetically and re-join
gwas_study_info = gwas_study_info |>
  rowwise() |>
  mutate(collected_all_disease_terms =
           paste0(sort(unique(unlist(strsplit(collected_all_disease_terms, ", ")))),
                  collapse = ", ")
         ) |>
  ungroup()
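The `rowwise()` step above can also be written as a vectorized base-R helper; the sketch below uses a hypothetical `dedup_terms()` on toy strings. It shows why section 7.2 is needed: an empty term left between two commas by an earlier removal sorts first and produces a leading comma. As in the original expression, `sort()` drops `NA`, so a missing entry would come out as an empty string.

```r
# Hypothetical helper: sort and de-duplicate terms within each string
dedup_terms <- function(x, sep = ", ") {
  vapply(strsplit(x, sep, fixed = TRUE), function(parts) {
    paste(sort(unique(parts)), collapse = sep)
  }, character(1), USE.NAMES = FALSE)
}

x <- c("herpes simplex infection, herpes simplex infection",
       "eczema, asthma, eczema",
       "asthma, , eczema")      # empty term left behind by a removal
dedup_terms(x)
```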

7.2 Deal with hanging commas and spaces

# Remove a comma left at the start or end of each string, then trim whitespace
gwas_study_info = gwas_study_info |>
  mutate(collected_all_disease_terms = stringr::str_remove_all(collected_all_disease_terms, "^,|,$")) |>
  mutate(collected_all_disease_terms = stringr::str_trim(collected_all_disease_terms))
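A small sketch of the same cleanup on hypothetical strings, using base `gsub()` and `trimws()` in place of `stringr::str_remove_all()` and `stringr::str_trim()` (both trim functions strip whitespace from both ends by default). The comma is removed before trimming, matching the order used above.

```r
x <- c(", asthma, eczema", "asthma,", " asthma ", "asthma, eczema")

# Drop one leading or trailing comma, then strip surrounding whitespace
cleaned <- trimws(gsub("^,|,$", "", x))
cleaned
```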

7.3 Final summary: number of studies per unique disease-term combination

n_studies_trait = gwas_study_info |>
  dplyr::filter(DISEASE_STUDY == TRUE) |>
  dplyr::select(collected_all_disease_terms, PUBMED_ID) |>
  dplyr::distinct() |>
  dplyr::group_by(collected_all_disease_terms) |>
  dplyr::summarise(n_studies = dplyr::n()) |>
  dplyr::arrange(desc(n_studies))

head(n_studies_trait)
# A tibble: 6 × 2
  collected_all_disease_terms n_studies
  <chr>                           <int>
1 "type 2 diabetes mellitus"        146
2 "alzheimers disease"              116
3 ""                                115
4 "breast cancer"                   113
5 "asthma"                          111
6 "major depressive disorder"       109
dim(n_studies_trait)
[1] 2954    2
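The summary above counts each (term combination, PUBMED_ID) pair once, so a publication with several GWAS Catalog rows for the same terms contributes a single study; the empty-string group (115 studies in the output) holds disease studies left with no terms. A base-R sketch of the same count on hypothetical toy rows:

```r
# Toy rows (hypothetical)
studies <- data.frame(
  PUBMED_ID = c(1, 1, 2, 3, 3, 4),
  DISEASE_STUDY = c(TRUE, TRUE, TRUE, TRUE, FALSE, TRUE),
  collected_all_disease_terms = c("asthma", "asthma", "asthma",
                                  "breast cancer", "height", ""),
  stringsAsFactors = FALSE
)

# Keep disease studies, then one row per (term combination, publication)
d <- unique(studies[studies$DISEASE_STUDY,
                    c("collected_all_disease_terms", "PUBMED_ID")])

# Count publications per term combination, largest first
counts <- sort(table(d$collected_all_disease_terms), decreasing = TRUE)
counts
```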

7.3.1 Number of unique terms when multi-term studies are split into individual terms

diseases <- stringr::str_split(
  gwas_study_info$collected_all_disease_terms[gwas_study_info$collected_all_disease_terms != ""],
  pattern = ", "
) |>
  unlist() |>
  stringr::str_trim()

length(unique(diseases))
[1] 2140
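Section 7.3 treats each term combination as one category (2954 rows), whereas here every study string is split so that a study tagged with two terms contributes to both, giving 2140 individual terms. A toy sketch of the split-and-count step in base R, with `strsplit()` and `trimws()` standing in for the `stringr` calls and hypothetical input strings:

```r
terms <- c("asthma, eczema", "asthma", "",
           "breast cancer, type 2 diabetes mellitus")

# Drop empty strings, split on ", ", flatten and trim
diseases <- trimws(unlist(strsplit(terms[terms != ""], ", ", fixed = TRUE)))
length(unique(diseases))
```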

7.4 Save output

fwrite(gwas_study_info,
       here::here("output/gwas_cat/gwas_study_info_disease_trait_filtered.csv"))

sessionInfo()
R version 4.3.1 (2023-06-16)
Platform: aarch64-apple-darwin20 (64-bit)
Running under: macOS 15.6.1

Matrix products: default
BLAS:   /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib/libRblas.0.dylib 
LAPACK: /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib/libRlapack.dylib;  LAPACK version 3.11.0

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

time zone: America/Los_Angeles
tzcode source: internal

attached base packages:
[1] stats     graphics  grDevices datasets  utils     methods   base     

other attached packages:
[1] jsonlite_2.0.0    httr_1.4.7        stringr_1.5.1     data.table_1.17.8
[5] dplyr_1.1.4       workflowr_1.7.1  

loaded via a namespace (and not attached):
 [1] compiler_4.3.1    renv_1.0.3        promises_1.3.3    tidyselect_1.2.1 
 [5] Rcpp_1.1.0        git2r_0.36.2      callr_3.7.6       later_1.4.2      
 [9] jquerylib_0.1.4   yaml_2.3.10       fastmap_1.2.0     here_1.0.1       
[13] R6_2.6.1          generics_0.1.4    curl_6.4.0        knitr_1.50       
[17] tibble_3.3.0      rprojroot_2.1.0   bslib_0.9.0       pillar_1.11.0    
[21] rlang_1.1.6       utf8_1.2.6        cachem_1.1.0      stringi_1.8.7    
[25] httpuv_1.6.16     xfun_0.52         getPass_0.2-4     fs_1.6.6         
[29] sass_0.4.10       cli_3.6.5         withr_3.0.2       magrittr_2.0.3   
[33] ps_1.9.1          digest_0.6.37     processx_3.8.6    rstudioapi_0.17.1
[37] lifecycle_1.0.4   vctrs_0.6.5       evaluate_1.0.4    glue_1.8.0       
[41] whisker_0.4.1     rmarkdown_2.29    tools_4.3.1       pkgconfig_2.0.3  
[45] htmltools_0.5.8.1