Last updated: 2025-09-28

Checks: 7 passed, 0 failed

Knit directory: genomics_ancest_disease_dispar/

This reproducible R Markdown analysis was created with workflowr (version 1.7.1). The Checks tab describes the reproducibility checks that were applied when the results were created. The Past versions tab lists the development history.

R Markdown file: up-to-date

Great! Since the R Markdown file has been committed to the Git repository, you know the exact version of the code that produced these results.

Environment: empty

Great job! The global environment was empty. Objects defined in the global environment can affect the analysis in your R Markdown file in unknown ways. For reproducibility it’s best to always run the code in an empty environment.

Seed: set.seed(20220216)

The command set.seed(20220216) was run prior to running the code in the R Markdown file. Setting a seed ensures that any results that rely on randomness, e.g. subsampling or permutations, are reproducible.

Session information: recorded

Great job! Recording the operating system, R version, and package versions is critical for reproducibility.

Cache: none

Nice! There were no cached chunks for this analysis, so you can be confident that you successfully produced the results during this run.

File paths: relative

Great job! Using relative paths to the files within your workflowr project makes it easier to run your code on other machines.

Repository version: 97d340d

Great! You are using Git for version control. Tracking code development and connecting the code version to the results is critical for reproducibility.

The results in this page were generated with repository version 97d340d. See the Past versions tab to see a history of the changes made to the R Markdown and HTML files.

Note that you need to be careful to ensure that all relevant files for the analysis have been committed to Git prior to generating the results (you can use wflow_publish or wflow_git_commit). workflowr only checks the R Markdown file, but you know if there are other scripts or data files that it depends on. Below is the status of the Git repository when the results were generated:


Ignored files:
    Ignored:    .DS_Store
    Ignored:    .Rproj.user/
    Ignored:    data/.DS_Store
    Ignored:    data/gbd/.DS_Store
    Ignored:    data/gbd/IHME-GBD_2021_DATA-d8cf695e-1.csv
    Ignored:    data/gbd/ihme_gbd_2019_global_disease_burden_rate_all_ages.csv
    Ignored:    data/gbd/ihme_gbd_2019_global_paf_rate_percent_all_ages.csv
    Ignored:    data/gbd/ihme_gbd_2021_global_disease_burden_rate_all_ages.csv
    Ignored:    data/gbd/ihme_gbd_2021_global_paf_rate_percent_all_ages.csv
    Ignored:    data/gwas_catalog/
    Ignored:    data/icd/.DS_Store
    Ignored:    data/icd/IHME_GBD_2019_COD_CAUSE_ICD_CODE_MAP_Y2020M10D15.XLSX
    Ignored:    data/icd/IHME_GBD_2019_NONFATAL_CAUSE_ICD_CODE_MAP_Y2020M10D15.XLSX
    Ignored:    data/icd/IHME_GBD_2021_COD_CAUSE_ICD_CODE_MAP_Y2024M05D16.XLSX
    Ignored:    data/icd/IHME_GBD_2021_NONFATAL_CAUSE_ICD_CODE_MAP_Y2024M05D16.XLSX
    Ignored:    data/icd/UK_Biobank_master_file.tsv
    Ignored:    data/icd/cdc_valid_icd10_Sep_23_2025.xlsx
    Ignored:    data/icd/cdc_valid_icd9_Sep_23_2025.xlsx
    Ignored:    data/icd/phecode_international_version_unrolled.csv
    Ignored:    data/icd/semiautomatic_ICD-pheno.txt
    Ignored:    data/icd/~$IHME_GBD_2021_COD_CAUSE_ICD_CODE_MAP_Y2024M05D16.XLSX
    Ignored:    data/icd/~$IHME_GBD_2021_NONFATAL_CAUSE_ICD_CODE_MAP_Y2024M05D16.XLSX
    Ignored:    data/who/
    Ignored:    diseases.txt
    Ignored:    not_found_diseases.txt
    Ignored:    orig_phecode_map.csv
    Ignored:    original_phecodes_pheinfo.csv
    Ignored:    output/gwas_cat/
    Ignored:    output/gwas_study_info_cohort_corrected.csv
    Ignored:    output/gwas_study_info_trait_corrected.csv
    Ignored:    output/gwas_study_info_trait_ontology_info.csv
    Ignored:    output/gwas_study_info_trait_ontology_info_l1.csv
    Ignored:    output/gwas_study_info_trait_ontology_info_l2.csv
    Ignored:    output/trait_ontology/
    Ignored:    renv/
    Ignored:    sup_table.xlsx
    Ignored:    zooma.tsv
    Ignored:    zooma_res.tsv

Untracked files:
    Untracked:  analysis/garbage_icd_codes.Rmd
    Untracked:  disease_mapping.R

Unstaged changes:
    Modified:   analysis/disease_inves_by_ancest.Rmd
    Modified:   analysis/exclude_infectious_diseases.Rmd
    Modified:   analysis/gbd_data_plots.Rmd
    Modified:   analysis/index.Rmd
    Modified:   analysis/level_1_disease_group_non_cancer.Rmd
    Modified:   analysis/level_2_disease_group.Rmd
    Modified:   analysis/trait_ontology_categorization.Rmd
    Modified:   data/icd/README.md

Note that any generated files, e.g. HTML, png, CSS, etc., are not included in this status report because it is ok for generated content to have uncommitted changes.

These are the previous versions of the repository in which changes were made to the R Markdown (analysis/map_trait_to_icd10.Rmd) and HTML (docs/map_trait_to_icd10.html) files. If you’ve configured a remote Git repository (see ?wflow_git_remote), click on the hyperlinks in the table below to view the files as they were in that past version.

File	Version	Author	Date	Message
Rmd	97d340d	IJbeasley	2025-09-28	workflowr::wflow_publish("analysis/map_trait_to_icd10.Rmd")

Set up

library(dplyr)
library(stringr)
library(data.table)

Ontology help - for getting disease subtypes

source(here::here("code/get_term_descendants.R"))

Load Data

gwas_study_info <- fread(here::here("output/gwas_cat/gwas_study_info_group.csv"))

Map GWAS traits to ICD 10

Create a mapping table - just for studies with a single disease term

# keep disease studies whose collected term is a single, non-empty disease
disease_mapping <- gwas_study_info |>
  filter(DISEASE_STUDY == TRUE) |>
  filter(!grepl(",", collected_all_disease_terms)) |>
  filter(collected_all_disease_terms != " ") |>
  filter(collected_all_disease_terms != "") |>
  select(`DISEASE/TRAIT`, collected_all_disease_terms) |>
  distinct()

# all distinct disease terms across studies (multi-term entries split on ", ")
diseases <- stringr::str_split(
  gwas_study_info$collected_all_disease_terms[gwas_study_info$collected_all_disease_terms != ""],
  pattern = ", "
) |>
  unlist() |>
  stringr::str_trim()

diseases <- unique(diseases)

print(length(diseases))

[1] 1697
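A small base-R sketch of the two steps above, using a hypothetical vector `terms_raw` in place of `collected_all_disease_terms`. It shows why a term that only appears in multi-term studies (here "eczema") is counted in `diseases` but has no row in the single-term mapping table, so it starts out unmapped:

```r
# Hypothetical values of collected_all_disease_terms for four studies
terms_raw <- c("asthma", "asthma, eczema", "", "type 2 diabetes")

# Single-term, non-empty entries only (what the mapping table keeps)
single_term <- terms_raw[!grepl(",", terms_raw) & trimws(terms_raw) != ""]

# All distinct terms, splitting multi-term entries (what `diseases` holds)
all_terms <- unique(trimws(unlist(strsplit(terms_raw[terms_raw != ""], ", "))))

print(single_term)
print(all_terms)
```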

Get ICD10 codes from PheCodes

Get Phecodes for diseases

# take the text after "PheCode " up to the next ")" in DISEASE/TRAIT;
# traits without a PheCode get NA
disease_mapping <- disease_mapping |>
  mutate(phecode = as.numeric(str_extract(`DISEASE/TRAIT`, "(?<=PheCode )[^)]+")))
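The look-behind pattern can be checked on its own. This base-R sketch uses hypothetical DISEASE/TRAIT strings; the `(PheCode <code>)` layout is an assumption inferred from the regular expression, not taken from the data. `regexpr()` with `perl = TRUE` supports the look-behind, and strings without a match are left as `NA`:

```r
# Hypothetical DISEASE/TRAIT strings (assumed "(PheCode <code>)" layout)
traits <- c("Type 2 diabetes (PheCode 250.2)", "Asthma", "Gout (PheCode 274.1)")

# Perl look-behind: everything after "PheCode " up to the closing bracket
m <- regexpr("(?<=PheCode )[^)]+", traits, perl = TRUE)

# regmatches() returns only the matches, so write them back into an NA vector
phecode <- rep(NA_character_, length(traits))
phecode[m != -1] <- regmatches(traits, m)
phecode <- as.numeric(phecode)

print(phecode)
```

Converting to numeric means keys such as "250.20" and "250.2" become the same value, which only matters if the PheCode file distinguishes them.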

Convert Phecodes to ICD10

# phecode to ICD10 mapping from https://wei-lab.app.vumc.org/phecode-data/phecode_international_version

phecodes <- fread(here::here("data/icd/phecode_international_version_unrolled.csv"))

phecode_icd_map =
  phecodes |>
  select(icd10_code = ICD10,
         phecode = PheCode)

# if more than one ICD10 code per phecode, collapse into a single row
phecode_icd_map =
  phecode_icd_map |>
  group_by(phecode) |>
  summarise(icd10_code = str_flatten(unique(icd10_code), collapse = ", ", na.rm = TRUE),
            .groups = "drop")

# each trait row gets at most one (collapsed) ICD10 string
disease_mapping =
  left_join(disease_mapping,
            phecode_icd_map,
            by = "phecode",
            relationship = "many-to-one",
            na_matches = "never")
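A toy version of the collapse-then-join step in base R, with made-up phecodes and codes. `aggregate()` plays the role of `group_by()` + `summarise()`, and `merge(all.x = TRUE)` the role of `left_join()`; a trait without a phecode keeps an `NA` code:

```r
# Toy phecode -> ICD10 rows (hypothetical); one phecode can map to many codes
phecodes_toy <- data.frame(
  phecode    = c(250.2, 250.2, 250.2, 274.1),
  icd10_code = c("E11", "E11.9", "E11", "M10")
)

# Collapse to one row per phecode, de-duplicating codes within each phecode
collapsed <- aggregate(icd10_code ~ phecode, data = phecodes_toy,
                       FUN = function(x) paste(unique(x), collapse = ", "))

# Left join: every trait row is kept; unmatched phecodes get NA
traits_toy <- data.frame(trait   = c("T2D", "Gout", "Asthma"),
                         phecode = c(250.2, 274.1, NA))
joined <- merge(traits_toy, collapsed, by = "phecode", all.x = TRUE, sort = FALSE)

print(collapsed)
print(joined)
```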

Get ICD10 codes from the author-provided DISEASE/TRAIT column

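# where DISEASE/TRAIT names an ICD10 code ("ICD10 <code>:"), use it instead of the
# PheCode-derived code; if "ICD10" appears but the pattern does not match,
# str_extract() returns NA and the PheCode-derived code is dropped
disease_mapping =
  disease_mapping |>
  mutate(icd10_code = ifelse(grepl("ICD10", `DISEASE/TRAIT`),
                             str_extract(`DISEASE/TRAIT`, "(?<=ICD10 )[^:]+(?=:)"),
                             icd10_code))

# one row per disease term, combining the codes from all of its studies
disease_mapping =
  disease_mapping |>
  group_by(collected_all_disease_terms) |>
  summarise(icd10_code = str_flatten(unique(icd10_code),
                                     collapse = ", ",
                                     na.rm = TRUE),
            .groups = "drop")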
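The `ifelse()` above replaces the PheCode-derived code whenever "ICD10" appears in DISEASE/TRAIT, even when the extraction pattern finds nothing. This base-R sketch on hypothetical strings shows that case (second element) and one possible alternative that falls back to the existing code when extraction returns `NA` (in dplyr, `coalesce()` does the same). The fallback is a suggestion, not what the analysis does:

```r
# Hypothetical DISEASE/TRAIT strings; the "ICD10 <code>: <label>" layout is assumed
traits <- c("Diverticular disease (ICD10 K57: diverticular disease of intestine)",
            "Asthma (ICD10 J45)",   # mentions ICD10 but has no colon
            "Gout")
# Hypothetical codes already found through PheCodes
phecode_icd <- c("K57.3", "J45.9", "M10.9")

# Text after "ICD10 " and before ":" (Perl look-behind and look-ahead)
m <- regexpr("(?<=ICD10 )[^:]+(?=:)", traits, perl = TRUE)
extracted <- rep(NA_character_, length(traits))
extracted[m != -1] <- regmatches(traits, m)

# Behaviour of the analysis: any "ICD10" mention overrides the PheCode code
with_ifelse <- ifelse(grepl("ICD10", traits), extracted, phecode_icd)

# Alternative: only override when a code was actually extracted
with_fallback <- ifelse(is.na(extracted), phecode_icd, extracted)

print(with_ifelse)
print(with_fallback)
```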

How many diseases are not mapped yet?

disease_mapping =
  disease_mapping |>
  filter(icd10_code != "")


not_found_diseases <- diseases[!diseases %in% disease_mapping$collected_all_disease_terms] 
not_found_diseases <- not_found_diseases[not_found_diseases != ""]

print(length(not_found_diseases))

[1] 507
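The same "how many are not mapped yet" block is repeated after every mapping pass. One way to avoid the repetition is a small helper; `unmapped_terms()` is a hypothetical name, and unlike the analysis it does not drop rows from `disease_mapping`, it only reports which terms still lack a non-empty code:

```r
# Hypothetical helper: terms in `diseases` with no non-empty ICD10 code in `mapping`
unmapped_terms <- function(diseases, mapping) {
  has_code <- !is.na(mapping$icd10_code) & mapping$icd10_code != ""
  mapped <- mapping$collected_all_disease_terms[has_code]
  out <- setdiff(diseases, mapped)   # also de-duplicates
  out[out != ""]
}

# Toy data (hypothetical)
mapping_toy <- data.frame(
  collected_all_disease_terms = c("asthma", "gout", "eczema"),
  icd10_code                  = c("J45", "", NA)
)
diseases_toy <- c("asthma", "gout", "eczema", "psoriasis", "")

remaining <- unmapped_terms(diseases_toy, mapping_toy)
print(length(remaining))
print(remaining)
```

In the analysis this would be called as `print(length(unmapped_terms(diseases, disease_mapping)))`.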

Match disease terms to phenotypes

# one row per PheCode phenotype label, with all of its ICD10 codes
phenotype_icd_map =
  phecodes |>
  group_by(Phenotype) |>
  summarise(icd10_code = str_flatten(unique(ICD10), collapse = ", ", na.rm = TRUE),
            .groups = "drop")

# exact (case-insensitive) matches between phenotype labels and unmapped terms
matched_phenotypes =
  phenotype_icd_map |>
  filter(tolower(Phenotype) %in% not_found_diseases) |>
  mutate(collected_all_disease_terms = tolower(Phenotype)) |>
  select(collected_all_disease_terms, icd10_code)

disease_mapping =
  bind_rows(disease_mapping, matched_phenotypes) |>
  distinct()
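Matching here is exact after lowercasing, so any spelling difference blocks a match. The GWAS terms in this analysis drop apostrophes ("alzheimers", "behcets"), so a phenotype label written with an apostrophe would not match. A base-R sketch with hypothetical labels; the apostrophe-stripping `normalise()` is an optional extra step, not part of the analysis:

```r
# Hypothetical phenotype labels and unmapped GWAS terms
phenotype <- c("Asthma", "Gout", "Alzheimer's disease")
not_found <- c("asthma", "alzheimers disease")

# As in the analysis: exact match after lowercasing
matched <- phenotype[tolower(phenotype) %in% not_found]

# Optional: also strip apostrophes before comparing
normalise <- function(x) gsub("'", "", tolower(x), fixed = TRUE)
matched_norm <- phenotype[normalise(phenotype) %in% normalise(not_found)]

print(matched)
print(matched_norm)
```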

How many diseases are not mapped yet?

disease_mapping =
  disease_mapping |>
  filter(icd10_code != "")


not_found_diseases <- diseases[!diseases %in% disease_mapping$collected_all_disease_terms] 
not_found_diseases <- not_found_diseases[not_found_diseases != ""]

print(length(not_found_diseases))

[1] 455

Match disease terms to ICD descriptions

# lowercase, UTF-8 ICD descriptions (computed once, reused below)
icd_desc_lower <- tolower(iconv(phecodes$ICD_DESCRIPTION, to = "UTF-8"))

to_add =
  phecodes |>
  mutate(collected_all_disease_terms = icd_desc_lower) |>
  filter(collected_all_disease_terms %in% not_found_diseases) |>
  select(collected_all_disease_terms,
         icd10_code = ICD10)

# spot check: no ICD description matches "androgenetic alopecia" exactly
phecodes |>
  mutate(collected_all_disease_terms = icd_desc_lower) |>
  filter(collected_all_disease_terms == "androgenetic alopecia") |>
  select(collected_all_disease_terms,
         icd10_code = ICD10)

Empty data.table (0 rows and 2 cols): collected_all_disease_terms,icd10_code

disease_mapping =
  bind_rows(disease_mapping, to_add) |>
  distinct()

How many diseases are not mapped yet?

disease_mapping =
  disease_mapping |>
  filter(icd10_code != "")


not_found_diseases <- diseases[!diseases %in% disease_mapping$collected_all_disease_terms] 
not_found_diseases <- not_found_diseases[not_found_diseases != ""]

print(length(not_found_diseases))

[1] 367

Manually add some ICD10 codes

collected_all_disease_terms = c("alcoholic liver cirrhosis",
                                "alcoholic pancreatitis",
                                "ischemic cardiomyopathy",
                                "systemic juvenile idiopathic arthritis",
                                "juvenile idiopathic arthritis",
                                "oligoarticular juvenile idiopathic arthritis",
                                "sapho syndrome",
                                "synovial plica syndrome",
                                "urgency urinary incontinence",
                                "abdominal distention",
                                "early-onset alzheimers disease",
                                "late-onset alzheimers disease",
                                "renal overload-type gout",
                                "vomiting of pregnancy",
                                "kleine-levin syndrome",
                                "autoimmune pancreatitis type 1",
                                "allergic contact dermatitis of eyelid",
                                "guillain-barre syndrome",
                                "idiopathic pulmonary fibrosis",
                                "behcets syndrome",
                                "kashin-beck disease",
                                "chronic thromboembolic pulmonary hypertension",
                                "pulmonary hypertension",
                                "pulmonary arterial hypertension",
                                "pulmonary coin lesion",
                                "pulmonary infarction",
                                "neuromyelitis optica",
                                "buruli ulcer disease",
                                "churg-strauss syndrome",
                                "graft versus host disease",
                                "takayasu arteritis",
                                "enuresis",
                                "cannabis dependence",
                                "orofacial cleft",
                                "eczema",
                                "drug dependence",
                                "cocaine-related disorders",
                                "pharynx cancer",
                                "pseudotumor cerebri",
                                "altitude sickness",
                                "high altitude pulmonary edema",
                                "intrahepatic cholestasis of pregnancy",
                                "brain injury",
                                "radiation-induced brain injury",
                                "abdominal infections code",
                                "secondary hyperparathyroidism of renal origin",
                                "gastroparesis",
                                "neuroblastoma",
                                "peripartum cardiomyopathy",
                                "retroperitoneal cancer",
                                "asphyxia neonatorum",
                                "postherpetic neuralgia",
                                "manic or hypomanic episode",
                                "allergic conjunctivitis",
                                "thiazide-induced hyponatremia",
                                "alpha 1-antitrypsin deficiency",
                                "autoimmune thyroid disease",
                                "hashimotos thyroiditis",
                                "charcot-marie-tooth disease type 1a",
                                "amyotrophic lateral sclerosis",
                                "fuchs endothelial corneal dystrophy",
                                "duchenne muscular dystrophy",
                                "familial apolipoprotein b hypobetalipoproteinemia",
                                "gastric metaplasia",
                                "inborn carbohydrate metabolic disorder",
                                "petaloid toenail",
                                "thyrotoxic periodic paralysis",
                                "schizoaffective disorder",
                                "rhegmatogenous retinal detachment",
                                "restless legs syndrome",
                                "preterm premature rupture of the membranes",
                                "porphyrin metabolism disease",
                                "peritoneal cancer",
                                "methamphetamine use disorders",
                                "familial sick sinus syndrome",
                                "drug misuse",
                                "abnormal ecg",
                                "adenoiditis",
                                "bacterial endocarditis",
                                "biliary atresia",
                                "bronchopulmonary dysplasia",
                                "cervical ectropion",
                                "chronic primary adrenal insufficiency",
                                "ciliopathy",
                                "collagenous colitis",
                                "colonic diverticula",
                                "craniofacial microsomia",
                                "cryptorchidism",
                                "plantar fasciitis",
                                "plantar fibromatosis",
                                "lewy body dementia",
                                "x-linked dystonia-parkinsonism",
                                "hippocampal sclerosis of aging",
                                "testicular dysgenesis syndrome",
                                "internet addiction disorder",
                                "food addiction",
                                "malignant lymphoid tumor",
                                "compartment syndrome",
                                "elevated lactate dehydrogenase",
                                "loss of consciousness",
                                "nephrosclerosis",
                                "periprosthetic osteolysis",
                                "polypoidal choroidal vasculopathy",
                                "pulmonary alveolar proteinosis",
                                "chorioamnionitis",
                                "hoarding disorder",
                                "unilateral renal agenesis",
                                "muscle spasm",
                                "oral ulcer",
                                "ileocolitis",
                                "microscopic colitis",
                                "lymphocytic colitis",
                                "drug-induced dyskinesia",
                                "plasma protein metabolism disease",
                                "oral lichen planus",
                                "epididymitis",
                                "orchitis",
                                "ectropion",
                                "entropion",
                                "cervical dystonia",
                                "clonal hematopoiesis",
                                "diffuse idiopathic skeletal hyperostosis",
                                "endocervicitis",
                                "eosinophilic esophagitis",
                                "focal segmental glomerulosclerosis",
                                "hypercalcemia",
                                "hypertriglyceridemia",
                                "hypocalcemia",
                                "lymphangioleiomyomatosis",
                                "mononucleosis",
                                "necrotizing enterocolitis",
                                "occupation-related stress disorder",
                                "ototoxicity",
                                "plantar warts",
                                "podoconiosis",
                                "posterior cortical atrophy",
                                "pigment dispersion syndrome",
                                "takotsubo cardiomyopathy",
                                "testicular germ cell tumor",
                                "normal pressure hydrocephalus",
                                "anti-nmda receptor encephalitis"
                                )

# codes are positional: the i-th code belongs to the i-th term above
icd10_code = c("K70.3",
               "K85.2, K85.20, K85.21, K85.22",
               "I25.5",
               "M08.20",
               "M08.9",
               "M08.4",
               "M86.3",
               "M67.8",
               "N39.4",
               "R14",
               "F00.0, G30.0",
               "F00.1, G30.1",
               "M10.3",
               "O21, O21.9",
               "G47.1",
               "K86.1",
               "H01.1",
               "G61.0",
               "J84.1",
               "M35.2",
               "M12.1",
               "I27.8",
               "I27.9",
               "I27.9",
               "R91",
               "I26.9",
               "G36.0",
               "A31.1",
               "M30.1",
               "D89.8",
               "M31.4",
               "R32",
               "F12.2",
               "Q36, Q36.0, Q36.9, Q35, Q35.1, Q35.3, Q35.5, Q35.7, Q35.9",
               "L30.9",
               "F19.2",
               "F14.1",
               "C14.0",
               "G93.2",
               "T70.2",
               "T70.2",
               "O26.6",
               "S06.9",
               "S06.9",
               "D73.3, K35-37, K57, K61, K63.0, K65, K75.0, K81, K83.0",
               "N25.8",
               "K31.8",
               "C74.9",
               "O90.3",
               "C48.0",
               "P24.8, P24.9",
               "B02.2",
               "F30.9",
               "H10.1",
               "E87.1",
               "E88.0",
               "E06.3",
               "E06.3",
               "G60.0",
               "G12.2",
               "H18.5",
               "G71.0",
               "E78.6",
               "K31",
               "E74.9",
               "L60",
               "G72.3",
               "F25",
               "H33.0",
               "G25.8",
               "O42",
               "E80.2",
               "C48.2",
               "F15.1, F15.2",
               "I49.5",
               "F19.1",
               "R94.3",
               "J35",
               "I33.0",
               "Q44.2",
               "P27.1",
               "N86",
               "E27.1",
               "Q34.8",
               "K52.8",
               "K57.3",
               "Q67.4",
               "Q53",
               "M72.2",
               "M72.2",
               "G31.8",
               "G24.1",
               "G93.8",
               "E29",
               "F63",
               "F50.8, F50.9",
               "C96.9",
               "T79",
               "R74",
               "R40",
               "I12",
               "T84",
               "H35",
               "J84",
               "O41.1",
               "F42.3",
               "Q60.0",
               "M62.8",
               "K12",
               "K50.0",
               "K52.8",
               "K52.8",
               "G25.8",
               "E88",
               "L43",
               "N45",
               "N45",
               "H02.1",
               "H02.0",
               "G24",
               "D47",
               "M48.1",
               "N72",
               "K20",
               "N04.1",
               "E83.5",
               "E78.1",
               "E83.5",
               "J84.8",
               "B27",
               "P77",
               "F43",
               "H91.0",
               "B07",
               "I89.0",
               "G31.1",
               "H21.2",
               "I51.8",
               "C62",
               "G91.2",
               "A85"
               )


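# the two vectors are matched by position, so they must be the same length
stopifnot(length(collected_all_disease_terms) == length(icd10_code))
to_add = data.frame(collected_all_disease_terms, icd10_code)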

disease_mapping =
  bind_rows(disease_mapping, to_add) |>
  distinct()
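Keeping terms and codes in two separate vectors means a single missing or extra entry silently shifts every later pair. A sketch of an alternative layout in base R, where each term sits next to its code, plus a cheap format check that flags a letter O typed for a zero, such as "MO8.4". The three pairs are copied from the list above; `manual_pairs`, `to_add_manual` and `looks_like_icd10()` are hypothetical names, and `tibble::tribble()` would give the same side-by-side layout:

```r
# Term/code pairs written side by side (copied from the manual list)
manual_pairs <- matrix(c(
  "alcoholic liver cirrhosis", "K70.3",
  "ischemic cardiomyopathy",   "I25.5",
  "enuresis",                  "R32"
), ncol = 2, byrow = TRUE)

to_add_manual <- data.frame(collected_all_disease_terms = manual_pairs[, 1],
                            icd10_code                  = manual_pairs[, 2])

# ICD-10 codes start with a letter followed by two digits
looks_like_icd10 <- function(code) grepl("^[A-Z][0-9]{2}", code)
stopifnot(all(looks_like_icd10(to_add_manual$icd10_code)))

print(to_add_manual)
print(looks_like_icd10("MO8.4"))  # FALSE: letter O instead of zero
```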

How many diseases are not mapped yet?

disease_mapping =
  disease_mapping |>
  filter(icd10_code != "")


not_found_diseases <- diseases[!diseases %in% disease_mapping$collected_all_disease_terms] 
not_found_diseases <- not_found_diseases[not_found_diseases != ""]

print(length(not_found_diseases))

[1] 229

Mapping by finding fuzzy matches

top_match_string <- function(string1, string2, method) {

  # pairwise string distances: one row per string1, one column per string2
  dist_matrix <- stringdist::stringdistmatrix(string1, string2, method = method)
  rownames(dist_matrix) <- string1
  colnames(dist_matrix) <- string2

  # optionally keep only rows whose closest match has a distance below 5
  # dist_matrix <- dist_matrix[apply(dist_matrix, 1, min) < 5, , drop = FALSE]

  # for each row, take the first column with the minimum distance
  best_col <- apply(dist_matrix, 1, which.min)

  data.frame(original_string1  = rownames(dist_matrix),
             top_match_string2 = colnames(dist_matrix)[best_col],
             distance          = dist_matrix[cbind(seq_len(nrow(dist_matrix)), best_col)])
}


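# candidates for manual review; these are not added to disease_mapping
# best match among PheCode phenotype labels
fuzzy_matches_phenotype <- top_match_string(not_found_diseases,
                                            unique(tolower(phecodes$Phenotype)),
                                            method = "lv")

# best match among ICD10 descriptions
fuzzy_matches_icd_desc <- top_match_string(not_found_diseases,
                                           unique(tolower(iconv(phecodes$ICD_DESCRIPTION, to = "UTF-8"))),
                                           method = "lv")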
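The same first-minimum rule can be tried without extra packages. This sketch swaps in base R's `utils::adist()` (a generalized Levenshtein distance, unit costs by default) for `stringdist::stringdistmatrix(method = "lv")`, on hypothetical inputs. The second row shows why fuzzy matches need manual review: by raw edit distance, "eczema" is closer to "asthma" than to "atopic eczema":

```r
# Toy inputs (hypothetical)
not_found  <- c("alzheimers disease", "eczema")
candidates <- c("alzheimer's disease", "atopic eczema", "asthma")

d <- utils::adist(not_found, candidates)   # rows: not_found, cols: candidates
best <- apply(d, 1, which.min)             # first column with the smallest distance

fuzzy <- data.frame(original   = not_found,
                    best_match = candidates[best],
                    distance   = d[cbind(seq_along(not_found), best)])
print(fuzzy)
```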

Map remaining diseases using UK Biobank mapping of EFO to ICD10

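# write the unmapped terms out for ontology annotation; zooma_res.tsv (apparently
# EBI ZOOMA output for these terms) is generated outside this document
writeLines(not_found_diseases, con = here::here("not_found_diseases.txt"))

data.table::fread(here::here("zooma_res.tsv"), skip = 6) -> zooma_res

# replace spaces in column names with underscores
zooma_res =
  zooma_res |>
  rename_with(~str_replace_all(., " ", "_"))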

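# disease terms with more than one ontology hit
multiple_mapping <-
  zooma_res |>
  group_by(PROPERTY_VALUE) |>
  summarise(n = n()) |>
  filter(n > 1)

# multiply-mapped terms, kept for manual inspection
zooma_res_to_check =
  zooma_res |>
  filter(PROPERTY_VALUE %in% multiple_mapping$PROPERTY_VALUE)

# keep singly-mapped terms as-is; for multiply-mapped terms keep only hits whose
# ontology label matches the term exactly (multiply-mapped terms with no exact
# match are dropped). The condition is vectorised, so rowwise() is not needed.
zooma_res =
  zooma_res |>
  filter(
    (PROPERTY_VALUE == tolower(`ONTOLOGY_TERM_LABEL(S)`) &
       PROPERTY_VALUE %in% multiple_mapping$PROPERTY_VALUE) |
      !PROPERTY_VALUE %in% multiple_mapping$PROPERTY_VALUE
  )

# if several hits remain for a term, pick one at random (reproducible given the seed)
zooma_res =
  zooma_res |>
  group_by(PROPERTY_VALUE) |>
  slice_sample(n = 1) |>
  ungroup()

zooma_res =
  zooma_res |>
  select(uri = `ONTOLOGY_TERM(S)`,
         collected_all_disease_terms = PROPERTY_VALUE)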
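The filtering rule can be restated in base R on a hypothetical ZOOMA-like table (columns shortened to `uri` and `label`; all values made up). Single hits are kept; for terms with several hits only exact, case-insensitive label matches survive. A multiply-mapped term with no exact match (here "asthma") is dropped, and a term with two exact matches (here "ciliopathy") still needs a tie-break, which the analysis makes with `slice_sample()`:

```r
zooma_toy <- data.frame(
  PROPERTY_VALUE = c("gout", "eczema", "eczema", "ciliopathy", "ciliopathy",
                     "asthma", "asthma"),
  uri   = paste0("uri_", 1:7),
  label = c("gout", "Eczema", "atopic eczema", "ciliopathy", "Ciliopathy",
            "allergic asthma", "asthma attack")
)

# Terms with more than one hit
counts <- table(zooma_toy$PROPERTY_VALUE)
multi  <- names(counts)[counts > 1]

# (exact & multi) | !multi simplifies to exact | !multi
exact <- zooma_toy$PROPERTY_VALUE == tolower(zooma_toy$label)
kept  <- zooma_toy[exact | !(zooma_toy$PROPERTY_VALUE %in% multi), ]

print(kept)
print(table(kept$PROPERTY_VALUE))
```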

uk_efo_icd <- data.table::fread(here::here("data/icd/UK_Biobank_master_file.tsv"))

uk_efo_icd = 
uk_efo_icd |>
  tidyr::separate_longer_delim(MAPPED_TERM_URI, delim = ", ") 

uk_efo_icd =
  uk_efo_icd |>
  rename_with(~str_replace_all(., " ", "_")) |>
  rename_with(~str_replace_all(., "/", "_"))

# keep rows whose code starts with a capital letter, i.e. ICD-10 codes
uk_efo_icd =
  uk_efo_icd |>
  filter(grepl("^[A-Z]", ICD10_CODE_SELF_REPORTED_TRAIT_FIELD_CODE))

# restrict to the ontology terms found for the unmapped diseases
uk_efo_icd =
  uk_efo_icd |>
  filter(MAPPED_TERM_URI %in% zooma_res$uri) |>
  select(uri = MAPPED_TERM_URI,
         icd10_code = ICD10_CODE_SELF_REPORTED_TRAIT_FIELD_CODE)

# one row per ontology term, with all of its ICD10 codes
uk_efo_icd =
  uk_efo_icd |>
  group_by(uri) |>
  summarise(icd10_code = str_flatten(icd10_code, collapse = ", ", na.rm = TRUE),
            .groups = "drop")
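A base-R sketch of the UK Biobank reshaping on hypothetical rows: split comma-separated `MAPPED_TERM_URI` values into one row per URI (what `tidyr::separate_longer_delim()` does), keep codes that start with a capital letter (presumably dropping numeric self-reported field codes), then collapse the codes per URI. The column `code` stands in for `ICD10_CODE_SELF_REPORTED_TRAIT_FIELD_CODE`:

```r
uk_toy <- data.frame(
  MAPPED_TERM_URI = c("uri_1, uri_2", "uri_2", "uri_3"),
  code            = c("K57", "1234", "M10")
)

# One row per URI, repeating each code once per URI it was mapped to
uri_list <- strsplit(uk_toy$MAPPED_TERM_URI, ", ", fixed = TRUE)
long <- data.frame(uri  = unlist(uri_list),
                   code = rep(uk_toy$code, lengths(uri_list)))

# Keep ICD-10 style codes only
long <- long[grepl("^[A-Z]", long$code), ]

# Collapse to one comma-separated string per URI
collapsed <- vapply(split(long$code, long$uri), paste, character(1), collapse = ", ")
print(collapsed)
```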

to_add =
  left_join(zooma_res,
            uk_efo_icd,
            by = "uri",
            na_matches = "never") |>
  select(collected_all_disease_terms, icd10_code) |>
  filter(icd10_code != "") |>
  distinct()

disease_mapping =
  bind_rows(disease_mapping, to_add) |>
  distinct()

sup_table <- readxl::read_xlsx(here::here("sup_table.xlsx"), sheet = 1)

# strip ontology URI prefixes so IDs are comparable with zooma_res$uri
sup_table <- sup_table |>
  mutate(Mapped_trait_URI = str_remove_all(
    Mapped_trait_URI,
    "http://www.ebi.ac.uk/efo/|http://purl.obolibrary.org/obo/|http://www.orpha.net/ORDO/"
  ))

# supplementary-table traits that share an ontology term with the ZOOMA results
sup_table |>
  filter(Mapped_trait_URI %in% zooma_res$uri)
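Stripping the prefixes turns full ontology URIs into short IDs, presumably the form used in `zooma_res$uri`. A base-R sketch with illustrative URIs (the IDs are placeholders, not checked against any ontology); `gsub()` with the same alternation pattern removes whichever prefix is present:

```r
uris <- c("http://www.ebi.ac.uk/efo/EFO_0000001",
          "http://purl.obolibrary.org/obo/MONDO_0000001",
          "http://www.orpha.net/ORDO/Orphanet_1")

# Alternation of the three prefixes; unescaped "." matches any character,
# which is slightly looser than a literal match but harmless for these URIs
prefixes <- "http://www.ebi.ac.uk/efo/|http://purl.obolibrary.org/obo/|http://www.orpha.net/ORDO/"
short_ids <- gsub(prefixes, "", uris)
print(short_ids)
```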

sessionInfo()

R version 4.3.1 (2023-06-16)
Platform: aarch64-apple-darwin20 (64-bit)
Running under: macOS 15.6.1

Matrix products: default
BLAS:   /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib/libRblas.0.dylib 
LAPACK: /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib/libRlapack.dylib;  LAPACK version 3.11.0

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

time zone: America/Los_Angeles
tzcode source: internal

attached base packages:
[1] stats     graphics  grDevices datasets  utils     methods   base     

other attached packages:
[1] jsonlite_2.0.0    httr_1.4.7        data.table_1.17.8 stringr_1.5.1    
[5] dplyr_1.1.4       workflowr_1.7.1  

loaded via a namespace (and not attached):
 [1] compiler_4.3.1    renv_1.0.3        promises_1.3.3    tidyselect_1.2.1 
 [5] Rcpp_1.1.0        git2r_0.36.2      callr_3.7.6       later_1.4.2      
 [9] jquerylib_0.1.4   yaml_2.3.10       fastmap_1.2.0     here_1.0.1       
[13] R6_2.6.1          generics_0.1.4    knitr_1.50        tibble_3.3.0     
[17] rprojroot_2.1.0   bslib_0.9.0       pillar_1.11.0     rlang_1.1.6      
[21] cachem_1.1.0      stringi_1.8.7     httpuv_1.6.16     xfun_0.52        
[25] getPass_0.2-4     fs_1.6.6          sass_0.4.10       cli_3.6.5        
[29] withr_3.0.2       magrittr_2.0.3    ps_1.9.1          digest_0.6.37    
[33] processx_3.8.6    rstudioapi_0.17.1 lifecycle_1.0.4   vctrs_0.6.5      
[37] evaluate_1.0.4    glue_1.8.0        whisker_0.4.1     rmarkdown_2.29   
[41] tools_4.3.1       pkgconfig_2.0.3   htmltools_0.5.8.1

Mapping GWAS traits to ICD 10

Isobel Beasley

2025-09-26
