Last updated: 2025-10-08

Checks: 7 passed, 0 failed

Knit directory: genomics_ancest_disease_dispar/

This reproducible R Markdown analysis was created with workflowr (version 1.7.1). The Checks tab describes the reproducibility checks that were applied when the results were created. The Past versions tab lists the development history.


Great! Since the R Markdown file has been committed to the Git repository, you know the exact version of the code that produced these results.

Great job! The global environment was empty. Objects defined in the global environment can affect the analysis in your R Markdown file in unknown ways. For reproducibility it’s best to always run the code in an empty environment.

The command set.seed(20220216) was run prior to running the code in the R Markdown file. Setting a seed ensures that any results that rely on randomness, e.g. subsampling or permutations, are reproducible.

Great job! Recording the operating system, R version, and package versions is critical for reproducibility.

Nice! There were no cached chunks for this analysis, so you can be confident that you successfully produced the results during this run.

Great job! Using relative paths to the files within your workflowr project makes it easier to run your code on other machines.

Great! You are using Git for version control. Tracking code development and connecting the code version to the results is critical for reproducibility.

The results in this page were generated with repository version a8f1628. See the Past versions tab to see a history of the changes made to the R Markdown and HTML files.

Note that you need to be careful to ensure that all relevant files for the analysis have been committed to Git prior to generating the results (you can use wflow_publish or wflow_git_commit). workflowr only checks the R Markdown file, but you know if there are other scripts or data files that it depends on. Below is the status of the Git repository when the results were generated:


Ignored files:
    Ignored:    .DS_Store
    Ignored:    .Rproj.user/
    Ignored:    analysis/.DS_Store
    Ignored:    data/.DS_Store
    Ignored:    data/gbd/.DS_Store
    Ignored:    data/gbd/IHME-GBD_2021_DATA-d8cf695e-1.csv
    Ignored:    data/gbd/ihme_gbd_2019_global_disease_burden_rate_all_ages.csv
    Ignored:    data/gbd/ihme_gbd_2019_global_paf_rate_percent_all_ages.csv
    Ignored:    data/gbd/ihme_gbd_2021_global_disease_burden_rate_all_ages.csv
    Ignored:    data/gbd/ihme_gbd_2021_global_paf_rate_percent_all_ages.csv
    Ignored:    data/gwas_catalog/
    Ignored:    data/icd/.DS_Store
    Ignored:    data/icd/IHME_GBD_2019_COD_CAUSE_ICD_CODE_MAP_Y2020M10D15.XLSX
    Ignored:    data/icd/IHME_GBD_2019_NONFATAL_CAUSE_ICD_CODE_MAP_Y2020M10D15.XLSX
    Ignored:    data/icd/IHME_GBD_2021_COD_CAUSE_ICD_CODE_MAP_Y2024M05D16.XLSX
    Ignored:    data/icd/IHME_GBD_2021_NONFATAL_CAUSE_ICD_CODE_MAP_Y2024M05D16.XLSX
    Ignored:    data/icd/UK_Biobank_master_file.tsv
    Ignored:    data/icd/cdc_valid_icd10_Sep_23_2025.xlsx
    Ignored:    data/icd/cdc_valid_icd9_Sep_23_2025.xlsx
    Ignored:    data/icd/manual_disease_icd10_mappings.xlsx
    Ignored:    data/icd/phecode_international_version_unrolled.csv
    Ignored:    data/icd/semiautomatic_ICD-pheno.txt
    Ignored:    data/icd/~$IHME_GBD_2019_COD_CAUSE_ICD_CODE_MAP_Y2020M10D15.XLSX
    Ignored:    data/icd/~$IHME_GBD_2019_NONFATAL_CAUSE_ICD_CODE_MAP_Y2020M10D15.XLSX
    Ignored:    data/who/
    Ignored:    output/.DS_Store
    Ignored:    output/gwas_cat/
    Ignored:    output/gwas_study_info_cohort_corrected.csv
    Ignored:    output/gwas_study_info_trait_corrected.csv
    Ignored:    output/gwas_study_info_trait_ontology_info.csv
    Ignored:    output/gwas_study_info_trait_ontology_info_l1.csv
    Ignored:    output/gwas_study_info_trait_ontology_info_l2.csv
    Ignored:    output/icd_map/
    Ignored:    output/trait_ontology/
    Ignored:    renv/
    Ignored:    sup_table.xlsx

Untracked files:
    Untracked:  analysis/gwas_to_gbd.Rmd

Unstaged changes:
    Modified:   analysis/disease_inves_by_ancest.Rmd
    Modified:   analysis/gbd_data_plots.Rmd
    Modified:   analysis/index.Rmd
    Modified:   analysis/level_1_disease_group_non_cancer.Rmd
    Modified:   analysis/level_2_disease_group.Rmd
    Modified:   analysis/trait_ontology_categorization.Rmd
    Modified:   data/icd/README.md

Note that any generated files, e.g. HTML, png, CSS, etc., are not included in this status report because it is ok for generated content to have uncommitted changes.


These are the previous versions of the repository in which changes were made to the R Markdown (analysis/map_trait_to_icd10.Rmd) and HTML (docs/map_trait_to_icd10.html) files. If you’ve configured a remote Git repository (see ?wflow_git_remote), click on the hyperlinks in the table below to view the files as they were in that past version.

File  Version  Author     Date        Message
Rmd   a8f1628  IJbeasley  2025-10-08  Include study accession in icd 10 map
html  50ebebc  IJbeasley  2025-10-08  Build site.
Rmd   9bbe0dd  IJbeasley  2025-10-08  Updating icd 10 mapping
html  ec027a3  IJbeasley  2025-10-08  Build site.
Rmd   cb8a570  IJbeasley  2025-10-08  Updating disease icd code mapping
html  41d6fe5  IJbeasley  2025-09-28  Build site.
Rmd   97d340d  IJbeasley  2025-09-28  workflowr::wflow_publish("analysis/map_trait_to_icd10.Rmd")

Set up

library(dplyr)
library(stringr)
library(data.table)

Ontology help - for getting disease subtypes

source(here::here("code/get_term_descendants.R"))

Load Data

gwas_study_info <- fread(here::here("output/gwas_cat/gwas_study_info_group_v2.csv"))


gwas_study_info = gwas_study_info |>
    mutate(collected_all_disease_terms = stringi::stri_trans_general(collected_all_disease_terms, "Latin-ASCII")
           )
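
To illustrate the transliteration step above, here is a minimal sketch on two made-up terms (the example strings are assumptions, not values from the data). `stri_trans_general()` with the `"Latin-ASCII"` transform replaces accented Latin letters with their closest ASCII equivalents, which makes later exact string matching less fragile.

```r
library(stringi)

# Hypothetical disease terms containing non-ASCII letters, written with
# Unicode escapes so the snippet does not depend on file encoding.
terms <- c("sj\u00f6gren syndrome", "beh\u00e7et disease")

# Transliterate accented letters to plain ASCII.
ascii_terms <- stri_trans_general(terms, "Latin-ASCII")
print(ascii_terms)
```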

Automatic mapping of GWAS traits to ICD 10

Create a mapping table

disease_mapping <- gwas_study_info |>
  filter(DISEASE_STUDY == T) |>
  tidyr::separate_longer_delim(cols = collected_all_disease_terms, 
                               delim = ", ") |>
  select(`DISEASE/TRAIT`, 
         collected_all_disease_terms, 
         PUBMED_ID,
         STUDY_ACCESSION) |>
  distinct()


disease_mapping =
  disease_mapping |>
  filter(collected_all_disease_terms != "")

print("Number of unique disease trait & study pairs")
[1] "Number of unique disease trait & study pairs"
nrow(disease_mapping)
[1] 45184
diseases <- stringr::str_split(pattern = ", ",
                               gwas_study_info$collected_all_disease_terms[gwas_study_info$collected_all_disease_terms != ""])  |>
  unlist() |>
  stringr::str_trim()

diseases <- unique(diseases)

print("Number of unique disease terms")
[1] "Number of unique disease terms"
print(length(diseases))
[1] 1755
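
The reshaping used to build the mapping table can be sketched on toy data. The two rows below are invented for illustration; only the column names come from the analysis. Each comma-separated term becomes its own row, and empty terms are dropped.

```r
library(dplyr)
library(tidyr)

# Toy input: one study with two disease terms, one study with none.
toy_studies <- tibble(
  `DISEASE/TRAIT` = c("Asthma and eczema", "Height"),
  collected_all_disease_terms = c("asthma, eczema", "")
)

# One row per (trait, term) pair, without empty terms or duplicates.
toy_long <- toy_studies |>
  separate_longer_delim(cols = collected_all_disease_terms, delim = ", ") |>
  filter(collected_all_disease_terms != "") |>
  distinct()

print(toy_long)
```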

Get ICD10 codes from PheCodes

Get PheCodes for diseases

disease_mapping <- disease_mapping |>
  mutate(
    phecode = str_extract(`DISEASE/TRAIT`, "(?<=PheCode )[^)]+")
  ) |>
  mutate(phecode = as.numeric(phecode))
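
As a quick check of the pattern above, the sketch below applies the same lookbehind regex to three labels. The first two are copied from the output shown later on this page; the third is a made-up label without a PheCode, included to show that unmatched labels give `NA`.

```r
library(stringr)

traits <- c("Polycythemia vera (PheCode 200.1)",
            "Hodgkin's disease (PheCode 201)",
            "Type 2 diabetes")  # hypothetical label with no PheCode

# Take everything after "PheCode " up to (not including) the closing bracket.
phecode <- as.numeric(str_extract(traits, "(?<=PheCode )[^)]+"))
print(phecode)
```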

Convert PheCodes to ICD10

# phecode to ICD10 mapping from https://wei-lab.app.vumc.org/phecode-data/phecode_international_version

phecodes <- fread(here::here("data/icd/phecode_international_version_unrolled.csv"))

phecode_icd_map =
  phecodes |>
  select(icd10_code = ICD10, 
         phecode = PheCode
         )

# if more than one ICD10 code per phecode, collapse into a single row
phecode_icd_map =
  phecode_icd_map |>
  group_by(phecode) |>
  summarise(icd10_code = 
              str_flatten(unique(icd10_code), collapse = ", ", na.rm = TRUE), 
            .groups = "drop")

# phecode icd 10 map
phecode_icd_map = 
  phecode_icd_map |>
  mutate(icd10_code_origin = "Study PheCode Mapping")

disease_mapping =
left_join(disease_mapping,
          phecode_icd_map,
          by = "phecode",
          relationship = "many-to-one",
          na_matches = "never")

# disease_mapping =
#   disease_mapping |>
#   mutate(icd10_code_origin = "Study PheCode")

disease_mapping |>
  filter(!is.na(icd10_code_origin)) |>
  nrow()
[1] 6677
disease_mapping |>
  filter(!is.na(icd10_code_origin)) |>
  head()
                                               DISEASE/TRAIT
1                          Neurofibromatosis (PheCode 199.4)
2                   Myeloproliferative disease (PheCode 200)
3                          Polycythemia vera (PheCode 200.1)
4                            Hodgkin's disease (PheCode 201)
5 Cancer of other lymphoid, histiocytic tissue (PheCode 202)
6                      Non-Hodgkins lymphoma (PheCode 202.2)
  collected_all_disease_terms PUBMED_ID STUDY_ACCESSION phecode
1           neurofibromatosis  30104761    GCST90435639   199.4
2 myeloproliferative disorder  30104761    GCST90435640   200.0
3           polycythemia vera  30104761    GCST90435641   200.1
4           hodgkins lymphoma  30104761    GCST90435642   201.0
5     lymphatic system cancer  30104761    GCST90435643   202.0
6       non-hodgkins lymphoma  30104761    GCST90435644   202.2
                                                                                                                                                                                           icd10_code
1                                                                                                                                                                                               Q85.0
2                                                                                 C88.7, C94.4, C94.5, D46, D46.0, D46.1, D46.2, D46.4, D46.7, D46.9, D47.0, D47.1, D47.3, D47.7, D47.9, D57.8, D75.2
3                                                                                                                                                                                                 D45
4                                                                                                                                                       C81, C81.0, C81.1, C81.2, C81.3, C81.7, C81.9
5                                                                                                                                                                   C96.0, C96.1, C96.2, C96.3, Z85.7
6 B21.1, C82.0, C82.1, C82.2, C82.7, C83, C83.0, C83.1, C83.2, C83.4, C83.5, C83.6, C83.7, C83.8, C83.9, C84, C84.0, C84.1, C84.2, C84.3, C84.4, C84.5, C85, C85.1, C85.7, C85.9, C96.7, C96.9, L41.2
      icd10_code_origin
1 Study PheCode Mapping
2 Study PheCode Mapping
3 Study PheCode Mapping
4 Study PheCode Mapping
5 Study PheCode Mapping
6 Study PheCode Mapping
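
The collapse-then-join logic of this section can be sketched on a small invented lookup table. The toy values are assumptions chosen for illustration. Collapsing first guarantees one row per PheCode, which is what lets the join declare `relationship = "many-to-one"`, and `na_matches = "never"` keeps traits without a PheCode from matching anything.

```r
library(dplyr)
library(stringr)

# Toy PheCode lookup with repeated and multiple ICD-10 codes per PheCode.
toy_phecodes <- tibble(
  PheCode = c(200.1, 201, 201, 201),
  ICD10   = c("D45", "C81", "C81.0", "C81")
)

# One row per PheCode, unique codes flattened into a single string.
toy_map <- toy_phecodes |>
  group_by(PheCode) |>
  summarise(icd10_code = str_flatten(unique(ICD10), collapse = ", ", na.rm = TRUE),
            .groups = "drop") |>
  rename(phecode = PheCode)

# Toy traits: two with a PheCode, one without.
toy_traits <- tibble(trait = c("a", "b", "c"), phecode = c(201, 200.1, NA))

joined <- left_join(toy_traits, toy_map,
                    by = "phecode",
                    relationship = "many-to-one",
                    na_matches = "never")
print(joined)
```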

Missing ICD10 codes for PheCodes

disease_mapping |>
    filter(is.na(icd10_code) & !is.na(phecode)) |> 
    nrow()
[1] 894
phecode = c("38.30",
            "79.90",
            "110.00",
            "530.13",
            "562.20",
            "580.10",
            "174.00",
            "174.20",
            "189.10",
            "218.00",
            "228.10",
            "224.00",
            "250.15",
            "250.25",
            "250.40",
            "285.21",
            "292.11",
            "362.26",
            "362.23",
            "362.27",
            "362.50",
            "452.20",
            "528.10",
            "535.90",
            "724.10",
            "724.22",
            "740.00",
            "743.10",
            "743.12",
            "743.13",
            "172.21",
            "274.00",
            "286.00",
            "282.00",
            "280.00",
            "272.00",
            "276.00",
            "338.00",
            "350.00",
            "401.00",
            "411.00",
            "414.20",
            "415.1",
            "427.21",
            "429.00",
            "585.00",
            "735.22",
            "724.00",
            "722.00",
            "716.10",
            "709.00",
            "706.00",
            "592.00",
            "571.00",
            "580.00",
            "170.00",
            "172.00",
            "264.00",
            "291.00",
            "555.00",
            "562.00",
            "578.00",
            "783.1",
            "536.80",
            "427.40",
            "443.00",
            "41.80",
            "41.90",
            "244.00",
            "250.00",
            "427.40",
            "526.40",
            "977.00",
            "840.20",
            "823.00",
            "751.00",
            "743.00",
            "270.38",
            "504.10",
            "253.40",
            "279.20",
            "426.22",
            "537.1",
            "707.20",
            "736.10",
            "789.10",
            "290.13",
            "327.70",
            "433.60",
            "695.00",
            "602.30",
            "375.10",
            "560.00",
            "586.10",
            "593.20",
            "620.10",
            "475.90",
            "799.00")

icd10_code = c("A49.9",
               "B34.9",
               "B35, B36",
               "K22.7",
               "K57",
               "N00, N01, N02, N03, N04, N05, N06, N07",
               "Z85.3",
               "C50",
               "C64, C65",
               "D25, D26",
               "D18.0",
               "E00, E00.0, E00.1, E00.2, E00.9, E01.8, E02, E03.0, E03.1, E03.2, E03.3, E03.8, E03.9, E89.0",
               "E10.5",
               "E11.5",
               "R73.0",
               "D63",
               "R47.0",
               "H35.3",
               "H35.3",
               "H35.3",
               "H35.5",
               "I80",
               "K12.30",
               "K29.7, K29.8, K29.9",
               "M43.2",
               "M21.5",
               "M13.9, M15.0, M15.1, M15.2, M15.3, M15.4, M16, M16.0, M16.1, M16.3, M16.6, M16.7, M16.9, M17.1, M17.4, M17.5, M18.0, M18.1, M18.5, M18.9, M19.0, M19.2",
               "M81",
               "M81.8",
               "M81.8",
               "C44",
               "M10, M10.0, M10.1, M10.2, M10.4, M10.9, M11.0, M11.1, M11.2, M11.8, M11.9, M67.9",
               "D65, D66, D67, D68, D68, D68.0, D68.1, D68.2, D68.3, D68.4, D68.8, D68.9, O72.3, O99.1",
               "D55, D55.0, D55.1, D55.2, D55.3, D55.8, D55.9, D56, D56.0, D56.1, D56.2, D56.3, D56.4, D56.8, D56.9, D57, D57.0, D57.1, D57.2, D57.3, D57.8, D58, D58.0, D58.1, D58.2, D58.8, D58.9, M90.4",
               "D50, D50.0, D50.1, D50.8, D50.9",
               "E78.0, E78.1, E78.2, E78.3, E78.4, E78.5, E78.9",
               "E86, E87.0, E87.1, E87.2, E87.3, E87.4, E87.5, E87.6, E87.7, E87.8, R63.1",
               "R52.0, R52.2, R52.9",
               "R25, R25.0, R25.1, R25.2, R25.3, R25.8, R26, R26.0, R26.1, R26.8, R27, R27.0, R27.8, R29.0, R29.2, R43, R43.0, R43.1, R43.2",
               "I10, I11, I11.0, I11.9, I12, I12.0, I12.9, I13, I13.0, I13.1, I13.2, I13.9, I15, I15.0, I15.1, I15.2, I15.8, I15.9, I67.4",
               "I20, I20.0, I20.1, I20.8, I20.9, I21, I21.0, I21.1, I21.2, I21.3, I21.4, I21.9, I22, I22.0, I22.1, I22.8, I22.9, I23, I23.0, I23.1, I23.2, I23.3, I23.6, I23.8, I24, I24.0, I24.1, I24.8, I24.9, I25, I25.1, I25.2, I25.3, I25.4, I25.5, I25.6, I25.8, I25.9, I34.1, I51.0, I51.3, Z95.1, Z95.5",
               "I25.10",
               "I26, I26.0",
               "I48",
               "I51.8",
               "N17, N17.0, N17.1, N17.2, N17.8, N17.9, N18, N18.0, N18.9, N19, Y60.2, Y61.2, Y62.0, Y84.1, Z49.1, Z49.2, Z99.2",
               "M21.5",
               "M40.2, M43.2, M43.8, M48.8, M49.8, M50.0, M99.6",
               "G55.1, M46.4, M50, M50.0, M50.0, M50.1, M50.2, M50.3, M50.8, M50.9, M51.3, M51.4, M96.1",
               "M13.0",
               "L94.3, M33, M33.0, M33.1, M33.2, M33.9, M34, M34.0, M34.1, M34.2, M34.8, M34.9, M35.0, M35.1, M35.5, M35.8, M35.9, M36.0, M36.8, M65.3, N16.4",
               "K09.8, L70, L70.0, L70.1, L70.2, L70.3, L70.4, L70.5, L70.8, L70.9, L72, L72.0, L72.1, L72.2, L72.8, L72.9, L73.0, L85.3",
               "N30, N30.0, N30.1, N30.2, N30.3, N30.8, N30.9, N34, N34.0, N34.2, N34.3, N35.1, N37",
                "K70.4, K72, K72.1, K72.9, K74.0, K74.1, K74.2, K74.3, K74.4, K74.5, K74.6, K75.0, K75.1, K76.0, K76.6, K76.7",
               "B52.0, N00.0, N00.1, N00.2, N00.3, N00.4, N00.5, N00.6, N00.7, N01, N01.0, N01.1, N01.2, N01.3, N01.4, N01.5, N01.6, N01.7, N01.9, N02.0, N02.1, N02.2, N02.3, N02.4, N02.5, N02.6, N02.7, N03, N03.0, N03.1, N03.2, N03.3, N03.4, N03.5, N03.6, N03.7, N03.9, N04, N04.0, N04.1, N04.2, N04.3, N04.4, N04.5, N04.6, N04.7, N04.8, N04.9, N05, N05.0, N05.1, N05.2, N05.3, N05.4, N05.5, N05.6, N05.7, N05.9, N06.0, N06.1, N06.2, N06.3, N06.4, N06.5, N06.6, N06.7, N07.0, N07.1, N07.2, N07.3, N07.4, N07.5, N07.6, N07.7, N08, N08.1, N08.2, N08.3, N08.4, N08.5, N08.8, N14, N14.0, N14.1, N14.2, N14.3, N14.4, N15.0, N15.8, N16.1, N16.2, N16.3, N16.4, N16.5",
               "C40, C40.0, C40.1, C40.2, C40.3, C40.8, C40.9, C41, C41.0, C41.1, C41.2, C41.3, C41.4, C41.9, C47, C47.0, C47.1, C47.2, C47.3, C47.4, C47.5, C47.6, C47.8, C47.9, C49, C49.0, C49.1, C49.2, C49.3, C49.4, C49.5, C49.6, C49.8, C49.9",
"C43, C43.0, C43.1, C43.2, C43.3, C43.4, C43.5, C43.6, C43.7, C43.8, C43.9, C44.0, C44.1, C44.2, C44.3, C44.4, C44.5, C44.6, C44.7, C44.8, C44.9, D03, D03.0, D03.1, D03.2, D03.3, D03.4, D03.5, D03.6, D03.7, D03.8, D03.9, D04, D04.0, D04.1, D04.2, D04.3, D04.4, D04.5, D04.6, D04.7, D04.8, D04.9",
"R62.0, R62.8, R62.9",
"F06, F06.1, F07.0, F07.1, F07.2, F07.8, F07.9, F23, F23.0, F23.1, F23.8, F23.9, G47.1, R40.0, R40.1",
"K50, K50.0, K50.1, K50.8, K50.9, K51, K51.0, K51.1, K51.2, K51.3, K51.4, K51.5, K51.8, K51.9",
"K57, K57.0, K57.1, K57.2, K57.3, K57.4, K57.5, K57.8, K57.9",
"K62.5, K92.0, K92.1, K92.2",
"R50.8",
"K30",
"I46, I46.0, I46.9, I49.0",
"E10.5, E11.5, E14.5, I73, I73.0, I73.8, I73.9, I79.1, I79.2, I79.8",
"B96.8",
"U82, U83, U84",
"E00, E00.0, E00.1, E00.2, E00.9, E01.8, E02, E03.0, E03.1, E03.2, E03.3, E03.8, E03.9, E89.0",
"E10, E10.0, E10.1, E10.2, E10.3, E10.3, E10.3, E10.4, E10.4, E10.6, E10.7, E10.8, E10.9, E11, E11.0, E11.1, E11.2, E11.3, E11.4, E11.6, E11.7, E11.8, E11.9, E12.3, E13, E13.1, E13.3, E13.4, E13.5, E13.6, E13.7, E13.8, E13.9, E14.9, G59.0, G63.2, H36.0, R73.0, R73.9, R81, R82.4, Z96.4",
"I46, I46.0, I46.9, I49.0",
"K07.6",
"Z88.9",
"S43.4",
"S82.1, S82.3, S82.8",
 "Q50.0, Q50.1, Q50.2, Q50.3, Q50.4, Q50.5, Q50.6, Q51, Q51.0, Q51.1, Q51.2, Q51.3, Q51.4, Q51.5, Q51.6, Q51.7, Q51.8, Q51.9, Q52.0, Q52.1, Q52.2, Q52.3, Q52.4, Q52.5, Q52.6, Q52.7, Q52.8, Q52.9, Q53, Q53.0, Q53.1, Q53.2, Q53.9, Q54, Q54.0, Q54.1, Q54.2, Q54.3, Q54.4, Q54.8, Q54.9, Q55, Q55.0, Q55.1, Q55.2, Q55.3, Q55.4, Q55.5, Q55.6, Q55.8, Q55.9, Q56, Q56.0, Q56.1, Q56.2, Q56.3, Q56.4, Q60, Q60.0, Q60.1, Q60.2, Q60.3, Q60.4, Q60.5, Q60.6, Q61, Q61.0, Q61.1, Q61.2, Q61.3, Q61.4, Q61.5, Q61.8, Q61.9, Q62, Q62.0, Q62.1, Q62.3, Q62.4, Q62.5, Q62.6, Q62.7, Q62.8, Q63, Q63.0, Q63.1, Q63.2, Q63.3, Q63.8, Q63.9, Q64.0, Q64.1, Q64.2, Q64.3, Q64.4, Q64.5, Q64.6, Q64.7, Q64.8, Q64.9",
"M48.4, M48.5, M80.5, M80.8, M81.6, M81.9, M84.4, M85.9, M89.9",
"E88.0",
"J84.1, J84.2",
"E23.6",
"D89.8",
"I44.1",
"K31.8, K31.9",
"L97",
"M21.0, M21.1, M21.9",
"R11.10",
"F03",
"G47.6, G25.8",
"G43.6",
"L49",
"N42.3",
"H04.1",
"K56.6",
"N28.8",
"R31.2",
"N87",
"R09.8",
"R53")

icd10_code_origin = rep("Study PheCode (Manual Mapping)", 
                        length(phecode))

to_add = data.frame(phecode, 
                    icd10_code, 
                    icd10_code_origin)

to_add = to_add |> distinct()

to_add = 
  to_add |>
  mutate(phecode = as.numeric(phecode))

disease_mapping =
  rows_patch(disease_mapping,
             to_add,
             unmatched = "ignore")
Matching, by = "phecode"
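
`rows_patch()` only fills values that are currently `NA`, so codes assigned in earlier steps are never overwritten. A minimal sketch with invented rows shows this, along with the effect of `unmatched = "ignore"` for keys that exist only in the patch table.

```r
library(dplyr)

# Toy table: one missing code, one existing code, one key absent from the patch.
base <- tibble(phecode = c(174, 250, 999),
               icd10_code = c(NA, "E10", NA))

# Toy patch: key 123 does not occur in `base` and is ignored.
patch <- tibble(phecode = c(174, 250, 123),
                icd10_code = c("Z85.3", "XXX", "A00"))

patched <- rows_patch(base, patch, by = "phecode", unmatched = "ignore")
print(patched)
```

Only the row with PheCode 174 changes; the existing "E10" is kept and PheCode 999 stays unmapped.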

Get ICD10 codes from author-provided DISEASE/TRAIT column

disease_mapping =
  disease_mapping |>
  mutate(icd10_code = ifelse(grepl("ICD10", `DISEASE/TRAIT`),
                             str_extract(`DISEASE/TRAIT`, "(?<=ICD10 )[^:]+(?=:)"),
                             icd10_code),
         icd10_code_origin = ifelse(grepl("ICD10", `DISEASE/TRAIT`),
                                    "Study Provided",
                                    icd10_code_origin)
)


disease_mapping |>
  filter(icd10_code_origin == "Study Provided") |>
  nrow()
[1] 3803
# disease_mapping =
# disease_mapping |>
#   group_by(collected_all_disease_terms) |>
#   summarise(icd10_code = str_flatten(unique(icd10_code), 
#                                      collapse = ", ", 
#                                      na.rm = T), 
#             .groups = "drop")
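
A small sketch of the extraction above. The label format `"ICD10 <code>: <description>"` is an assumption inferred from the regex, and the example labels are made up. The pattern captures the text between `"ICD10 "` and the first colon, and labels without `"ICD10"` keep whatever code they already had.

```r
library(stringr)

# Hypothetical trait labels and previously assigned codes.
labels   <- c("ICD10 E11: Type 2 diabetes", "Gout (PheCode 274.1)")
existing <- c(NA, "M10")

icd10_code <- ifelse(grepl("ICD10", labels),
                     str_extract(labels, "(?<=ICD10 )[^:]+(?=:)"),
                     existing)
print(icd10_code)
```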

How many diseases are not mapped yet?

disease_mapping_matched =
  disease_mapping |>
  filter(!is.na(icd10_code) & icd10_code != "")

not_found_diseases <- diseases[!diseases %in% disease_mapping_matched$collected_all_disease_terms] 
not_found_diseases <- not_found_diseases[not_found_diseases != ""]

print("Number of diseases not mapped to a single ICD10 code yet:")
[1] "Number of diseases not mapped to a single ICD10 code yet:"
print(length(not_found_diseases))
[1] 487
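
A filter meant to keep only rows with a usable code has to require both conditions (not missing and not empty). The sketch below, on invented rows, contrasts combining the two tests with `&` and with `|`; with `|`, an empty string still passes because it is not `NA`.

```r
library(dplyr)

toy <- tibble(term = c("gout", "asthma", "eczema"),
              icd10_code = c("M10", "", NA))

# `|` keeps the empty-string row; the NA row is dropped because NA | FALSE is NA.
with_or <- toy |> filter(icd10_code != "" | !is.na(icd10_code))

# `&` keeps only rows that have a non-missing, non-empty code.
with_and <- toy |> filter(!is.na(icd10_code) & icd10_code != "")

print(with_or$term)
print(with_and$term)
```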

Get ICD10 codes by matching terms

Match disease terms to phenotypes (corresponding to PheCodes)

phenotype_icd_map =
  phecodes |>
group_by(Phenotype) |>
  summarise(icd10_code = 
            str_flatten(ICD10, collapse = ", ", na.rm = T), 
            .groups = "drop")

matched_phenotypes =
phenotype_icd_map #|>
#filter(tolower(Phenotype) %in% not_found_diseases)

matched_phenotypes =
matched_phenotypes |>
  mutate(collected_all_disease_terms = tolower(Phenotype)) |>
  select(collected_all_disease_terms, icd10_code) |>
  mutate(icd10_code_origin = "Phecode Phenotype Match")

# match by collected_all_disease_terms
disease_mapping = 
disease_mapping |>
rows_patch(matched_phenotypes,
           unmatched = "ignore")

# match by DISEASE/TRAIT
matched_phenotypes =
  matched_phenotypes |>
  rename(`DISEASE/TRAIT` = collected_all_disease_terms)

disease_mapping =
disease_mapping |>
rows_patch(matched_phenotypes,
           unmatched = "ignore")

disease_mapping |>
  filter(icd10_code_origin == "Phecode Phenotype Match") |>
  nrow()
[1] 5553

How many diseases are not mapped yet?

# disease_mapping =
#   disease_mapping |>
#   filter(icd10_code != "")

matched <- c(disease_mapping_matched$collected_all_disease_terms,
             matched_phenotypes$collected_all_disease_terms)
Warning: Unknown or uninitialised column: `collected_all_disease_terms`.
not_found_diseases <- diseases[!diseases %in% 
                               matched
                               ] 

not_found_diseases <- not_found_diseases[not_found_diseases != ""]

print(length(not_found_diseases))
[1] 487
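
One likely reason the count is unchanged: `matched_phenotypes` was renamed to use `DISEASE/TRAIT` just before this step, so it no longer has a `collected_all_disease_terms` column, and `disease_mapping_matched` has not been recomputed since the earlier count. The toy sketch below (invented values) shows the mechanism: `$` on a missing tibble column returns `NULL` with a warning, and `c()` silently drops `NULL`.

```r
library(tibble)

# Toy table that only has the renamed column.
tbl <- tibble(`DISEASE/TRAIT` = c("asthma", "gout"))

# Accessing the old column name yields NULL (the warning is suppressed here).
missing_col <- suppressWarnings(tbl$collected_all_disease_terms)

# NULL contributes nothing to the combined vector.
matched <- c("eczema", missing_col)
print(matched)
```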

Match disease terms to ICD 10 descriptions

matched_icd10_desc  = 
phecodes |>
    #filter(tolower(iconv(ICD_DESCRIPTION, to = "UTF-8")) %in% not_found_diseases) |>
    mutate(collected_all_disease_terms = tolower(iconv(ICD_DESCRIPTION, to = "UTF-8"))) |>
    select(collected_all_disease_terms,    
           icd10_code = ICD10) 


matched_icd10_desc =
  matched_icd10_desc |>
  group_by(collected_all_disease_terms) |>
  summarise(icd10_code = str_flatten(unique(icd10_code), 
                                     collapse = ", ", 
                                     na.rm = T), 
            .groups = "drop")

# match by collected_all_disease_terms
matched_icd10_desc = 
  matched_icd10_desc |>
    mutate(icd10_code_origin = "ICD Description Match")

disease_mapping = 
disease_mapping |>
rows_patch(matched_icd10_desc,
           unmatched = "ignore")

# match by DISEASE/TRAIT
matched_icd10_desc =
  matched_icd10_desc |>
  rename(`DISEASE/TRAIT` = collected_all_disease_terms)

disease_mapping =
  rows_patch(disease_mapping,
             matched_icd10_desc,
             unmatched = "ignore")

disease_mapping |>
  filter(icd10_code_origin == "ICD Description Match") |>
  nrow()
[1] 13176

How many diseases are not mapped yet?

# disease_mapping =
#   disease_mapping |>
#   filter(icd10_code != "")

matched <- c(disease_mapping_matched$collected_all_disease_terms,
             matched_phenotypes$collected_all_disease_terms,
             to_add$collected_all_disease_terms)
Warning: Unknown or uninitialised column: `collected_all_disease_terms`.
not_found_diseases <- diseases[!diseases %in%  matched] 
not_found_diseases <- not_found_diseases[not_found_diseases != ""]

print(length(not_found_diseases))
[1] 487
disease_mapping |>
  filter(is.na(icd10_code_origin)) |>
  nrow()
[1] 15565

Manual mapping of GWAS traits to ICD 10

manual_icd10_map <-
  readxl::read_xlsx(here::here("data/icd/manual_disease_icd10_mappings.xlsx"))

manual_icd10_map =
  manual_icd10_map |>
  select(collected_all_disease_terms = mapped_trait, 
         icd10_code) |>
  mutate(icd10_code_origin = "Manual Mapping (collected_all_disease_terms)")

# disease_mapping =
#   bind_rows(disease_mapping, to_add) |>
#   distinct()
disease_mapping =
  rows_patch(disease_mapping,
             manual_icd10_map,
             unmatched = "ignore")
Matching, by = "collected_all_disease_terms"
disease_mapping |>
  filter(icd10_code_origin == "Manual Mapping (collected_all_disease_terms)") |>
  nrow()
[1] 1519

Additional manual mapping

disease_mapping =
  disease_mapping |>
  mutate(icd10_code = ifelse(collected_all_disease_terms == "toxicity" & 
                            grepl("Anthracycline-induced cardiotoxicity|Trastuzumab-induced cardiotoxicity", `DISEASE/TRAIT`, ignore.case = T),
                            "I42.7, T45.1",
                             icd10_code)) |>
mutate(icd10_code_origin = 
         ifelse(collected_all_disease_terms == "toxicity" & 
                  grepl("Anthracycline-induced cardiotoxicity|Trastuzumab-induced cardiotoxicity", `DISEASE/TRAIT`, ignore.case = T),
                "Manual Mapping (from DISEASE/TRAIT)",
                icd10_code_origin)
         )

disease_mapping =
  disease_mapping |>
  mutate(icd10_code = ifelse(collected_all_disease_terms == "toxicity" & 
                            grepl("Induration at the injection site after COVID-19", `DISEASE/TRAIT`, ignore.case = T),
                            "R23.4",
                             icd10_code)) |>
  mutate(collected_all_disease_terms = 
           ifelse(collected_all_disease_terms == "toxicity" & 
                    grepl("Induration at the injection site after COVID-19", `DISEASE/TRAIT`, ignore.case = T),
                  "induration of skin",
                  collected_all_disease_terms)) |>
  mutate(icd10_code_origin =
           ifelse(collected_all_disease_terms == "induration of skin",
                  "Manual Mapping (from DISEASE/TRAIT)",
                  icd10_code_origin)
         )


# N64.5
disease_mapping =
  disease_mapping |>
  mutate(icd10_code = ifelse(collected_all_disease_terms == "toxicity" & 
                            grepl("Induration \\(>= grade 2\\) in breast cancer treated with radiotherapy", 
                                  `DISEASE/TRAIT`, ignore.case = T),
                            "N64.5",
                             icd10_code)) |>
  mutate(collected_all_disease_terms = 
           ifelse(collected_all_disease_terms == "toxicity" & 
                    grepl("Induration \\(>= grade 2\\) in breast cancer treated with radiotherapy", 
                          `DISEASE/TRAIT`, ignore.case = T),
                  "induration of breast",
                  collected_all_disease_terms)) |>
  mutate(icd10_code_origin =
           ifelse(collected_all_disease_terms == "induration of breast",
                  "Manual Mapping (from DISEASE/TRAIT)",
                  icd10_code_origin)
         )


# T45.1
disease_mapping =
  disease_mapping |>
  mutate(icd10_code = ifelse(collected_all_disease_terms == "toxicity" & 
                            grepl("Nivolumab-induced immune-related adverse events in cancer|Response to immune checkpoint inhibitors in melanoma", 
                                  `DISEASE/TRAIT`, ignore.case = T),
                            "T45.1",
                             icd10_code)) |>
  mutate(icd10_code_origin =
           ifelse(collected_all_disease_terms == "toxicity" & 
                    grepl("Nivolumab-induced immune-related adverse events in cancer|Response to immune checkpoint inhibitors in melanoma", 
                          `DISEASE/TRAIT`, ignore.case = T),
                  "Manual Mapping (from DISEASE/TRAIT)",
                  icd10_code_origin)
         )


disease_mapping =
  disease_mapping |>
  mutate(icd10_code = ifelse(collected_all_disease_terms == "toxicity" & 
                            grepl("Methotrexate-related central neurotoxicity in children treated for acute lymphoblastic leukemia", 
                                  `DISEASE/TRAIT`, ignore.case = T),
                            "G92, T45.1",
                             icd10_code)) |>
  mutate(icd10_code_origin =
           ifelse(collected_all_disease_terms == "toxicity" & 
                    grepl("Methotrexate-related central neurotoxicity in children treated for acute lymphoblastic leukemia", 
                          `DISEASE/TRAIT`, ignore.case = T),
                  "Manual Mapping (from DISEASE/TRAIT)",
                  icd10_code_origin)
         )
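
The blocks above repeat one pattern: for rows whose term is "toxicity" and whose trait label matches a phrase, set the code and record the origin. As a sketch only, that pattern could be wrapped in a helper. `override_icd10()` is a hypothetical function that is not part of the analysis, and the toy rows are invented.

```r
library(tibble)

# Hypothetical helper: overwrite code and origin where the label matches.
override_icd10 <- function(df, pattern, code) {
  hit <- df$collected_all_disease_terms == "toxicity" &
    grepl(pattern, df$`DISEASE/TRAIT`, ignore.case = TRUE)
  hit <- !is.na(hit) & hit  # treat missing terms as non-matches
  df$icd10_code[hit] <- code
  df$icd10_code_origin[hit] <- "Manual Mapping (from DISEASE/TRAIT)"
  df
}

toy <- tibble(
  `DISEASE/TRAIT` = c("Trastuzumab-induced cardiotoxicity in breast cancer",
                      "Hepatotoxicity"),
  collected_all_disease_terms = c("toxicity", "toxicity"),
  icd10_code = NA_character_,
  icd10_code_origin = NA_character_
)

toy <- override_icd10(toy, "Trastuzumab-induced cardiotoxicity", "I42.7, T45.1")
print(toy)
```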

How many diseases are not mapped yet?

# disease_mapping =
#   disease_mapping |>
#   filter(icd10_code != "")

matched <- c(disease_mapping_matched$collected_all_disease_terms,
             matched_phenotypes$collected_all_disease_terms,
             to_add$collected_all_disease_terms,
             manual_icd10_map$collected_all_disease_terms)
Warning: Unknown or uninitialised column: `collected_all_disease_terms`.
not_found_diseases <- diseases[!diseases %in% matched] 
not_found_diseases <- not_found_diseases[not_found_diseases != ""]

print(length(not_found_diseases))
[1] 149

Inferring ICD10 codes from similar study ICD10 codes

# similar studies:
study_icd_map =
disease_mapping |>
filter(!is.na(icd10_code)) |>
filter(icd10_code_origin == "Study Provided" | icd10_code_origin == "Study PheCode Mapping") 

study_icd_map =
  study_icd_map |>
  select(collected_all_disease_terms, icd10_code) |>
  distinct()

study_icd_map =
  study_icd_map |>
  group_by(collected_all_disease_terms) |>
  summarise(icd10_code = str_flatten(unique(sort(icd10_code)), 
                                     collapse = ", ", 
                                     na.rm = T), 
            .groups = "drop")

study_icd_map = 
  study_icd_map |>
  mutate(icd10_code_origin = "Inferred from similar studies")

disease_mapping = 
  rows_patch(disease_mapping,
             study_icd_map,
             unmatched = "ignore")
Matching, by = "collected_all_disease_terms"
disease_mapping |>
  filter(is.na(icd10_code)) |>
  nrow()
[1] 331
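
The inference step can be illustrated on three invented studies of the same term. Codes reported by two of the studies are pooled per term and then used to fill the study that has no code. Column names `study`, `term` and `origin` are simplified stand-ins for the real columns.

```r
library(dplyr)
library(stringr)

toy <- tibble(
  study      = c("s1", "s2", "s3"),
  term       = c("gout", "gout", "gout"),
  icd10_code = c("M10", "M10.9", NA),
  origin     = c("Study Provided", "Study Provided", NA)
)

# Pool the study-reported codes for each term.
term_map <- toy |>
  filter(!is.na(icd10_code), origin == "Study Provided") |>
  group_by(term) |>
  summarise(icd10_code = str_flatten(unique(sort(icd10_code)), collapse = ", "),
            .groups = "drop") |>
  mutate(origin = "Inferred from similar studies")

# Fill only the rows that are still missing a code.
patched <- rows_patch(toy, term_map, by = "term", unmatched = "ignore")
print(patched)
```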

How was each ICD-10 code inferred?

disease_mapping  |> 
group_by(icd10_code_origin) |> 
summarise(n = n()) |> 
arrange(desc(n))
# A tibble: 9 × 2
  icd10_code_origin                                n
  <chr>                                        <int>
1 Inferred from similar studies                13705
2 ICD Description Match                        13176
3 Study PheCode Mapping                         6677
4 Phecode Phenotype Match                       5553
5 Study Provided                                3803
6 Manual Mapping (collected_all_disease_terms)  1519
7 Study PheCode (Manual Mapping)                 410
8 <NA>                                           331
9 Manual Mapping (from DISEASE/TRAIT)             10

Saving disease mapping

Add description of ICD10 codes to mapping table

Prepare / fix disease mappings

Fixing multiple ICD10 codes missing commas

disease_mapping = 
  disease_mapping |> 
  mutate(icd10_code = str_replace_all(
    icd10_code,
    pattern = "F40 F41 F42",
    replacement = "F40, F41, F42"
    )
  )

disease_mapping = 
  disease_mapping |> 
  mutate(icd10_code = str_replace_all(
    icd10_code,
    pattern = "G47.8. G47.9",
    replacement = "G47.8, G47.9"
    )
  )
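
Note that `str_replace_all()` treats its pattern as a regular expression by default, so each `.` matches any character. A minimal sketch with one real-looking and one deliberately odd string (both invented) shows the difference when the pattern is wrapped in `fixed()`, which matches the text literally.

```r
library(stringr)

codes <- c("G47.8. G47.9", "G47x8x G47x9")  # second string is a made-up near miss

# Regex: the dots act as wildcards, so both strings are rewritten.
as_regex <- str_replace_all(codes, "G47.8. G47.9", "G47.8, G47.9")

# Literal match: only the exact text is rewritten.
as_fixed <- str_replace_all(codes, fixed("G47.8. G47.9"), "G47.8, G47.9")

print(as_regex)
print(as_fixed)
```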

Converting ICD-10-CM codes to WHO ICD-10

disease_mapping =
  disease_mapping |>
  mutate(icd10_code = str_split(icd10_code, ",\\s*")) |>
  tidyr::unnest(icd10_code) 
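
The split-and-unnest step above turns each comma-separated code string into one row per code, so that later replacements act on single codes. A toy sketch with invented rows:

```r
library(dplyr)
library(stringr)
library(tidyr)

toy <- tibble(term = c("gout", "anemia"),
              icd10_code = c("M10, M10.9", "D50"))

# Split on commas (with optional spaces), then expand the list column to rows.
long <- toy |>
  mutate(icd10_code = str_split(icd10_code, ",\\s*")) |>
  unnest(icd10_code)

print(long)
```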


disease_mapping = 
  disease_mapping |> 
    mutate(icd10_code = str_replace_all(
    icd10_code,
    pattern = "B37.49",
    replacement = "B37.4"
    )
    ) 

disease_mapping = 
  disease_mapping |> 
    mutate(icd10_code = str_replace_all(
    icd10_code,
    pattern = "E05.90",
    replacement = "E05.9"
    )
    ) 

disease_mapping = 
  disease_mapping |> 
  mutate(icd10_code = str_replace_all(
    icd10_code,
    pattern = "E11.319|E11.31|E11.329.9|E11.329.|E11.32|E11.3.",
    replacement = "E11.3"
    )
  )

disease_mapping = 
  disease_mapping |> 
  mutate(icd10_code = str_replace_all(
    icd10_code,
    pattern = "E11.3.",
    replacement = "E11.3"
    )
  )

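# ICD-10-CM E13.62x (other specified diabetes with skin complications)
# truncates to WHO ICD-10 E13.6
disease_mapping = 
  disease_mapping |> 
  mutate(icd10_code = str_replace_all(
    icd10_code,
    pattern = "E13.621.|E13.62",
    replacement = "E13.6"
    )
  )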

disease_mapping = 
  disease_mapping |> 
    mutate(icd10_code = str_replace_all(
    icd10_code,
    pattern = "E78.00",
    replacement = "E78.0"
    )
    ) 

disease_mapping = 
  disease_mapping |> 
    mutate(icd10_code = str_replace_all(
    icd10_code,
    pattern = "F10.10",
    replacement = "F10.1"
    )
    ) 

disease_mapping = 
  disease_mapping |> 
    mutate(icd10_code = str_replace_all(
    icd10_code,
    pattern = "F17.201.|F17.20",
    replacement = "F17.2"
    )
    ) 

disease_mapping = 
  disease_mapping |> 
    mutate(icd10_code = str_replace_all(
    icd10_code,
    pattern = "F430",
    replacement = "F43.0"
    )
    ) 

disease_mapping = 
  disease_mapping |> 
    mutate(icd10_code = str_replace_all(
    icd10_code,
    pattern = "G47.00",
    replacement = "G47.0"
    )
    ) 

disease_mapping = 
  disease_mapping |> 
    mutate(icd10_code = str_replace_all(
    icd10_code,
    pattern = "H01.00|H01.09.",
    replacement = "H01.0"
    )
    ) 

disease_mapping = 
  disease_mapping |> 
    mutate(icd10_code = str_replace_all(
    icd10_code,
    pattern = "H01.09.",
    replacement = "H01.0"
    )
    ) 

disease_mapping = 
  disease_mapping |> 
    mutate(icd10_code = str_replace_all(
    icd10_code,
    pattern = "H029",
    replacement = "H02.9"
    )
    ) 

disease_mapping = 
  disease_mapping |> 
    mutate(icd10_code = str_replace_all(
    icd10_code,
    pattern = "H60.90",
    replacement = "H60.9"
    )
    ) 

disease_mapping = 
  disease_mapping |> 
    mutate(icd10_code = str_replace_all(
    icd10_code,
    pattern = "H91.8X9.",
    replacement = "H91.8"
    )
    ) 

disease_mapping = 
  disease_mapping |> 
    mutate(icd10_code = str_replace_all(
    icd10_code,
    pattern = "H91.90",
    replacement = "H91.9"
    )
    ) 

# ICD-10-CM H93.29x (other abnormal auditory perceptions)
# truncates to WHO ICD-10 H93.2
disease_mapping = 
  disease_mapping |> 
    mutate(icd10_code = str_replace_all(
    icd10_code,
    pattern = "H93.299.|H93.29",
    replacement = "H93.2"
    )
    ) 

disease_mapping = 
  disease_mapping |> 
    mutate(icd10_code = str_replace_all(
    icd10_code,
    pattern = "I70.20",
    replacement = "I70.2"
    )
    ) 

disease_mapping = 
  disease_mapping |> 
    mutate(icd10_code = str_replace_all(
    icd10_code,
    pattern = "I82.409.|I82.40|I82.4",
    replacement = "I82"
    )
    ) 

disease_mapping = 
  disease_mapping |> 
    mutate(icd10_code = str_replace_all(
    icd10_code,
    pattern = "J30.9",
    replacement = "J30.2"
    )
    ) 

disease_mapping = 
  disease_mapping |> 
  mutate(icd10_code = str_replace_all(
    icd10_code,
    pattern = "J45.998.|J45.909.|J45.901.|J45.99|J45.90",
    replacement = "J45.9"
    )
  )

disease_mapping = 
  disease_mapping |> 
    mutate(icd10_code = str_replace_all(
    icd10_code,
    pattern = "J98.457.6",
    replacement = "J98.4"
    )
    ) 

disease_mapping = 
  disease_mapping |> 
    mutate(icd10_code = str_replace_all(
    icd10_code,
    pattern = "K05.30-31",
    replacement = "K05.3"
    )
    ) 

disease_mapping = 
  disease_mapping |> 
    mutate(icd10_code = str_replace_all(
    icd10_code,
    pattern = "K29.70",
    replacement = "K29.7"
    )
    ) 

disease_mapping = 
  disease_mapping |> 
    mutate(icd10_code = str_replace_all(
    icd10_code,
    pattern = "K59.00",
    replacement = "K59.0"
    )
    ) 

disease_mapping = 
  disease_mapping |> 
    mutate(icd10_code = str_replace_all(
    icd10_code,
    pattern = "L08.89",
    replacement = "L08.8"
    )
    ) 

disease_mapping = 
  disease_mapping |> 
    mutate(icd10_code = str_replace_all(
    icd10_code,
    pattern = "M06.99|M06.90",
    replacement = "M06.9"
    )
    ) 

disease_mapping = 
  disease_mapping |> 
    mutate(icd10_code = str_replace_all(
    icd10_code,
    pattern = "M10.99",
    replacement = "M10.9"
    )
    ) 

disease_mapping = 
  disease_mapping |> 
  mutate(icd10_code = str_replace_all(
    icd10_code,
    pattern = "M13.90|M13.94|M13.96|M13.97|M13.99",
    replacement = "M13.9"
    )
  )

disease_mapping = 
  disease_mapping |> 
  mutate(icd10_code = str_replace_all(
    icd10_code,
    pattern = "M19.07",
    replacement = "M19.0"
    )
  )

disease_mapping = 
  disease_mapping |> 
  mutate(icd10_code = str_replace_all(
    icd10_code,
    pattern = "M19.90|M19.91|M19.94|M19.97|M19.99",
    replacement = "M19.9"
    )
  )



disease_mapping = 
  disease_mapping |> 
  mutate(icd10_code = str_replace_all(
    icd10_code,
    pattern = "M25.50|M25.51|M25.55|M25.569.|M25.56|M25.571.|M25.57",
    replacement = "M25.5"
    )
  )

disease_mapping = 
  disease_mapping |> 
    mutate(icd10_code = str_replace_all(
    icd10_code,
    pattern = "M25.76|M25.77",
    replacement = "M25.7"
    )
    ) 

disease_mapping = 
  disease_mapping |> 
    mutate(icd10_code = str_replace_all(
    icd10_code,
    pattern = "M43.16",
    replacement = "M43.1"
    )
    ) 

disease_mapping = 
  disease_mapping |> 
  mutate(icd10_code = str_replace_all(
    icd10_code,
    pattern = "M47.80|M47.82|M47.86",
    replacement = "M47.8"
    )
  )

disease_mapping = 
  disease_mapping |> 
  mutate(icd10_code = str_replace_all(
    icd10_code,
    pattern = "M47.92|M47.96",
    replacement = "M47.9"
    )
  )

disease_mapping = 
  disease_mapping |> 
    mutate(icd10_code = str_replace_all(
    icd10_code,
    pattern = "M48.02|M48.06",
    replacement = "M48.0"
    )
    ) 

disease_mapping = 
  disease_mapping |> 
    mutate(icd10_code = str_replace_all(
    icd10_code,
    pattern = "M54.22",
    replacement = "M54.2"
    )
    ) 

disease_mapping = 
  disease_mapping |> 
    mutate(icd10_code = str_replace_all(
    icd10_code,
    pattern = "M54.30|M54.39",
    replacement = "M54.3"
    )
    ) 

disease_mapping = 
  disease_mapping |> 
    mutate(icd10_code = str_replace_all(
    icd10_code,
    pattern = "M54.56|M54.57|M54.59",
    replacement = "M54.5"
    )
    ) 

disease_mapping = 
  disease_mapping |> 
    mutate(icd10_code = str_replace_all(
    icd10_code,
    pattern = "M54.99",
    replacement = "M54.9"
    )
    ) 

disease_mapping = 
  disease_mapping |> 
    mutate(icd10_code = str_replace_all(
    icd10_code,
    pattern = "M65.34",
    replacement = "M65.3"
    )
    ) 

disease_mapping = 
  disease_mapping |> 
    mutate(icd10_code = str_replace_all(
    icd10_code,
    pattern = "M65.96",
    replacement = "M65.9"
    )
    ) 

disease_mapping = 
  disease_mapping |> 
    mutate(icd10_code = str_replace_all(
    icd10_code,
    pattern = "M72.04",
    replacement = "M72.0"
    )
    ) 

disease_mapping = 
  disease_mapping |> 
    mutate(icd10_code = str_replace_all(
    icd10_code,
    pattern = "M79.09",
    replacement = "M79.0"
    )
    ) 

disease_mapping = 
  disease_mapping |> 
    mutate(icd10_code = str_replace_all(
    icd10_code,
    pattern = "M79.65|M79.66|M79.67",
    replacement = "M79.6"
    )
    ) 

disease_mapping = 
  disease_mapping |> 
    mutate(icd10_code = str_replace_all(
    icd10_code,
    pattern = "M79.79",
    replacement = "M79.7"
    )
    ) 

disease_mapping = 
  disease_mapping |> 
    mutate(icd10_code = str_replace_all(
    icd10_code,
    pattern = "M79.86",
    replacement = "M79.8"
    )
    ) 

disease_mapping = 
  disease_mapping |> 
    mutate(icd10_code = str_replace_all(
    icd10_code,
    pattern = "M81.99",
    replacement = "M81.9"
    )
    ) 

disease_mapping = 
  disease_mapping |> 
    mutate(icd10_code = str_replace_all(
    icd10_code,
    pattern = "N390",
    replacement = "N39.0"
    )
    ) 

disease_mapping = 
  disease_mapping |> 
    mutate(icd10_code = str_replace_all(
    icd10_code,
    pattern = "N50.89",
    replacement = "N50.8"
    )
    ) 

disease_mapping = 
  disease_mapping |> 
    mutate(icd10_code = str_replace_all(
    icd10_code,
    pattern = "N52",
    replacement = "F52"
    )
    ) 

disease_mapping = 
  disease_mapping |> 
    mutate(icd10_code = str_replace_all(
    icd10_code,
    pattern = "N52.9",
    replacement = "F52.9"
    )
    ) 

disease_mapping = 
  disease_mapping |> 
    mutate(icd10_code = str_replace_all(
    icd10_code,
    pattern = "N814",
    replacement = "N81.4"
    )
    ) 

disease_mapping = 
  disease_mapping |> 
    mutate(icd10_code = str_replace_all(
    icd10_code,
    pattern = "P29.12",
    replacement = "P29.1"
    )
    ) 

disease_mapping = 
  disease_mapping |> 
    mutate(icd10_code = str_replace_all(
    icd10_code,
    pattern = "R06.00|R06.09",
    replacement = "R06.0"
    )
    ) 

disease_mapping = 
  disease_mapping |> 
    mutate(icd10_code = str_replace_all(
    icd10_code,
    pattern = "R06.83",
    replacement = "R06.8"
    )
    ) 

disease_mapping = 
  disease_mapping |> 
    mutate(icd10_code = str_replace_all(
    icd10_code,
    pattern = "R07.89",
    replacement = "R07.8"
    )
    ) 

disease_mapping = 
  disease_mapping |> 
    mutate(icd10_code = str_replace_all(
    icd10_code,
    pattern = "R07.8|R07.9",
    replacement = "R07"
    )
    ) 

disease_mapping = 
  disease_mapping |> 
    mutate(icd10_code = str_replace_all(
    icd10_code,
    pattern = "R09.89",
    replacement = "R09.8"
    )
    ) 


disease_mapping = 
  disease_mapping |> 
    mutate(icd10_code = str_replace_all(
    icd10_code,
    pattern = "R10.30",
    replacement = "R10.3"
    )
    ) 

disease_mapping = 
  disease_mapping |> 
    mutate(icd10_code = str_replace_all(
    icd10_code,
    pattern = "R10.8|R10.9",
    replacement = "R10"
    )
    ) 

disease_mapping = 
  disease_mapping |> 
    mutate(icd10_code = str_replace_all(
    icd10_code,
    pattern = "R11.0",
    replacement = "R11"
    )
    ) 

disease_mapping = 
  disease_mapping |> 
    mutate(icd10_code = str_replace_all(
    icd10_code,
    pattern = "R13.10",
    replacement = "R13"
    )
    ) 

disease_mapping = 
  disease_mapping |> 
    mutate(icd10_code = str_replace_all(
    icd10_code,
    pattern = "R13.1",
    replacement = "R13"
    )
    ) 

disease_mapping = 
  disease_mapping |> 
    mutate(icd10_code = str_replace_all(
    icd10_code,
    pattern = "R14.0",
    replacement = "R14"
    )
    ) 

disease_mapping = 
  disease_mapping |> 
    mutate(icd10_code = str_replace_all(
    icd10_code,
    pattern = "R19.7",
    replacement = "R19"
    )
    ) 

disease_mapping = 
  disease_mapping |> 
    mutate(icd10_code = str_replace_all(
    icd10_code,
    pattern = "R29.898.|R29.89",
    replacement = "R29.8"
    )
    ) 

disease_mapping = 
  disease_mapping |> 
    mutate(icd10_code = str_replace_all(
    icd10_code,
    pattern = "R31.29",
    replacement = "R31.2"
    )
    ) 

disease_mapping = 
  disease_mapping |> 
    mutate(icd10_code = str_replace_all(
    icd10_code,
    pattern = "R31.2|R31.9",
    replacement = "R31"
    )
    ) 


disease_mapping = 
  disease_mapping |> 
    mutate(icd10_code = str_replace_all(
    icd10_code,
    pattern = "R53.82|R53.83|R5382",
    replacement = "R53.8"
    )
    ) 

disease_mapping = 
  disease_mapping |> 
    mutate(icd10_code = str_replace_all(
    icd10_code,
    pattern = "R53.8",
    replacement = "R53"
    )
    ) 

disease_mapping = 
  disease_mapping |> 
    mutate(icd10_code = str_replace_all(
    icd10_code,
    pattern = "R56.9",
    replacement = "R56"
    )
    ) 

disease_mapping = 
  disease_mapping |> 
  mutate(icd10_code = str_replace_all(
    icd10_code,
    pattern = "R73.09|R73.02",
    replacement = "R73.0"
    )
    ) 

disease_mapping = 
  disease_mapping |> 
    mutate(icd10_code = str_replace_all(
    icd10_code,
    pattern = "R80.9",
    replacement = "R80"
    )
    ) 
 

disease_mapping = 
  disease_mapping |> 
    mutate(icd10_code = str_replace_all(
    icd10_code,
    pattern = "R82.998.|R82.99",
    replacement = "R82.9"
    )
    ) 

disease_mapping = 
  disease_mapping |> 
    mutate(icd10_code = str_replace_all(
    icd10_code,
    pattern = "R871",
    replacement = "R87.1"
    )
    ) 

disease_mapping = 
  disease_mapping |> 
    mutate(icd10_code = str_replace_all(
    icd10_code,
    pattern = "R87.6",
    replacement = "R87"
    )
    ) 


disease_mapping = 
  disease_mapping |> 
    mutate(icd10_code = str_replace_all(
    icd10_code,
    pattern = "R87.61",
    replacement = "R87.6"
    )
    ) 

disease_mapping = 
  disease_mapping |> 
    mutate(icd10_code = str_replace_all(
    icd10_code,
    pattern = "T50.905.",
    replacement = "T50.9"
    )
    ) 

disease_mapping = 
  disease_mapping |> 
    mutate(icd10_code = str_replace_all(
    icd10_code,
    pattern = "T78.40",
    replacement = "T78.4"
    )
    ) 

disease_mapping = 
  disease_mapping |> 
    mutate(icd10_code = str_replace_all(
    icd10_code,
    pattern = "T780",
    replacement = "T78.0"
    )
    ) 

disease_mapping = 
  disease_mapping |> 
    mutate(icd10_code = str_replace_all(
    icd10_code,
    pattern = "T81.149.88",
    replacement = "T81.1"
    )
    ) 

disease_mapping = 
  disease_mapping |> 
    mutate(icd10_code = str_replace_all(
    icd10_code,
    pattern = "T81.815.013.",
    replacement = "T81.8"
    )
    ) 

disease_mapping = 
  disease_mapping |> 
    mutate(icd10_code = str_replace_all(
    icd10_code,
    pattern = "T84.84",
    replacement = "T84.8"
    )
    ) 

disease_mapping = 
  disease_mapping |> 
    mutate(icd10_code = str_replace_all(
    icd10_code,
    pattern = "T887",
    replacement = "T88.7"
    )
    ) 

disease_mapping = 
  disease_mapping |> 
    mutate(icd10_code = str_replace_all(
    icd10_code,
    pattern = "Z86.79",
    replacement = "Z86.7"
    )
    ) 

disease_mapping = 
  disease_mapping |> 
    mutate(icd10_code = str_replace_all(
    icd10_code,
    pattern = "Z87.09",
    replacement = "Z87.0"
    )
    ) 

disease_mapping = 
  disease_mapping |> 
    mutate(icd10_code = str_replace_all(
    icd10_code,
    pattern = "Z87.39",
    replacement = "Z87.3"
    )
    ) 


disease_mapping = 
  disease_mapping |> 
    mutate(icd10_code = str_replace_all(
    icd10_code,
    pattern = "Z87.42",
    replacement = "Z87.4"
    )
    ) 

disease_mapping = 
  disease_mapping |> 
    mutate(icd10_code = str_replace_all(
    icd10_code,
    pattern = "Z87.891.",
    replacement = "Z87.8"
    )
    ) 

disease_mapping = 
  disease_mapping |> 
    mutate(icd10_code = str_replace_all(
    icd10_code,
    pattern = "W44.9",
    replacement = "W44"
    )
    ) 

disease_mapping = 
  disease_mapping |> 
    mutate(icd10_code = str_replace_all(
    icd10_code,
    pattern = "U80",
    replacement = "U82"
    )
    ) 

disease_mapping =
  disease_mapping |>
  distinct()
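The replacements above are unanchored regular expressions applied one after another, so the result depends on their order. For example, `R87.6` is replaced before `R87.61`, which can leave `R871` behind (compare the `R871` clean-up after the description join further down). The sketch below reproduces that effect and shows one possible alternative: exact-match recoding through a named lookup vector. The names `cm_to_who` and `recode_exact` are hypothetical, and only three of the mappings above are copied in.

```r
library(stringr)

# Sequential, unanchored replacement in the same order as the chunks above.
x <- "R87.61"
x <- str_replace_all(x, "R87.6", "R87")     # matches the start of "R87.61"
x <- str_replace_all(x, "R87.61", "R87.6")  # nothing left to match
x

# Alternative sketch: recode whole codes through a lookup vector
# (hypothetical helper, not part of the original analysis).
cm_to_who <- c("R87.61" = "R87.6", "R87.6" = "R87", "B37.49" = "B37.4")

recode_exact <- function(codes, lookup) {
  hit <- codes %in% names(lookup)          # NA codes are never a hit
  codes[hit] <- unname(lookup[codes[hit]])
  codes
}

recoded <- recode_exact(c("R87.61", "R87.6", "B37.49", "J45.9"), cm_to_who)
recoded
```

Exact matching is order-independent and cannot rewrite part of a longer code, but it does not cover the patterns above that rely on a trailing wildcard (such as `J45.909.`); those would need one entry per full code.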

Add ICD10 descriptions

Other ICD10 descriptions to add

other_icd10_desc = 
  data.frame(
    icd10_code = c("A09.9", 
                  "A41",
                  "B95",
                  "B96",
                  "B97",
                  "B98",
                  "C44",
                  "C79",
                  "C80.0",
                  "C80.9",
                  "C90",
                  "D07",
                  "D35",
                  "D37",
                  "D47",
                  "D48",
                  "E03",
                  "E04",
                  "E21",
                  "E23",
                  "E27",
                  "E61",
                  "E80",
                  "E87",
                  "E89",
                  "F05",
                  "F17",
                  "F41",
                  "F43.0",
                  "F53",
                  "G31",
                  "G45",
                  "G46",
                  "G62",
                  "G93",
                  "G99",
                  "H47",
                  "H50",
                  "H57",
                  "H83",
                  "H91",
                  "H92",
                  "H93",
                  "I08",
                  "I27",
                  "I27.2",
                  "I44",
                  "I45",
                  "I48.0",
                  "I49",
                  "I51",
                  "I62",
                  "I71",
                  "J15",
                  "J16",
                  "J34",
                  "J38",
                  "J69",
                  "J84",
                  "J95",
                  "J96",
                  "K05",
                  "K07",
                  "K08",
                  "K09",
                  "K10",
                  "K12",
                  "K13",
                  "K22",
                  "K22.7",
                  "K52",
                  "K56",
                  "K64",
                  "K64.0",
                  "K64.9",
                  "K74",
                  "K75",
                  "K85.9",
                  "K86",
                  "K91",
                  "K91.8",
                  "K92",
                  "L02",
                  "L13",
                  "L30",
                  "L65",
                  "L73",
                  "L85",
                  "L89.1",
                  "L98",
                  "M05",
                  "M06",
                  "M07",
                  "M11",
                  "M13",
                  "M18",
                  "M19",
                  "M24",
                  "M25",
                  "M31",
                  "M35",
                  "M43",
                  "M48",
                  "M53",
                  "M62",
                  "M66",
                  "M67",
                  "M71",
                  "M77",
                  "M79",
                  "M79.7",
                  "M80",
                  "M81",
                  "M85",
                  "M89",
                  "M96",
                  "N02",
                  "N18.3",
                  "N18.4",
                  "N28",
                  "N39",
                  "N48",
                  "N73",
                  "N76",
                  "N88",
                  "N89",
                  "N91",
                  "N92",
                  "N93",
                  "N94",
                  "N99",
                  "O14",
                  "O04",
                  "O26",
                  "O32",
                  "O34",
                  "O36",
                  "O68",
                  "O75",
                  "O99",
                  "R03",
                  "R07",
                  "R09",
                  "R10",
                  "R19",
                  "R29",
                  "R29.6",
                  "R39",
                  "R40",
                  "R47",
                  "R57",
                  "T95.8",
                  "W44",
                  "Y95",
                  "Z86.3",
                  "Z87.3",
                  "Z87.4",
                  "Z87.7",
                  "Z87.8",
                  "T85.8",
                  "Z91.0",
                  "Z88.9",
                  "Z88.8",
                  "Z88",
                  "Z92.6",
                  "N90",
                  "U82"
                  ),
    icd10_description = c("Gastroenteritis and colitis of unspecified origin",
                         "Other sepsis",
                         "Streptococcus and staphylococcus as the cause of diseases classified to other chapters",
                         "Other specified bacterial agents as the cause of diseases classified to other chapters",
                         "Viral agents as the cause of diseases classified to other chapters",
                         "Other specified infectious agents as the cause of diseases classified to other chapters",
                         "Other malignant neoplasms of skin",
                         "Secondary malignant neoplasm of other and unspecified sites",
                         "Malignant neoplasm, primary site unknown, so stated",
                         "Malignant neoplasm, primary site unspecified",
                         "Multiple myeloma and malignant plasma cell neoplasms",
                         "Carcinoma in situ of other and unspecified genital organs",
                         "Benign neoplasm of other and unspecified endocrine glands",
                         "Neoplasm of uncertain or unknown behaviour of oral cavity and digestive organs",
                         "Other neoplasms of uncertain or unknown behaviour of lymphoid, haematopoietic and related tissue",
                         "Neoplasm of uncertain or unknown behaviour of other and unspecified sites",
                         "Other hypothyroidism",
                         "Other nontoxic goitre",
                         "Hyperparathyroidism and other disorders of parathyroid gland",
                         "Hypofunction and other disorders of pituitary gland",
                         "Other disorders of adrenal gland",
                         "Deficiency of other nutrient elements",
                         "Disorders of porphyrin and bilirubin metabolism",
                         "Other disorders of fluid, electrolyte and acid-base balance",
                         "Postprocedural endocrine and metabolic disorders, not elsewhere classified",
                         "Delirium, not induced by alcohol and other psychoactive substances",
                         "Mental and behavioural disorders due to use of tobacco",
                         "Other anxiety disorders",
                         "Acute stress reaction",
                         "Mental and behavioural disorders associated with the puerperium, not elsewhere classified",
                         "Other degenerative diseases of nervous system, not elsewhere classified",
                         "Transient cerebral ischaemic attacks and related syndromes",
                         "Vascular syndromes of brain in cerebrovascular diseases",
                         "Other polyneuropathies",
                         "Other disorders of brain",
                         "Other disorders of nervous system in diseases classified elsewhere",
                         "Other disorders of optic [2nd] nerve and visual pathways",
                         "Other strabismus",
                         "Other disorders of eye and adnexa",
                         "Other diseases of inner ear",
                         "Other hearing loss",
                         "Otalgia and effusion of ear",
                         "Other disorders of ear, not elsewhere classified",
                         "Multiple valve diseases",
                         "Other pulmonary heart diseases",
                         "Other secondary pulmonary hypertension",
                         "Atrioventricular and left bundle-branch block",
                         "Other conduction disorders",
                         "Paroxysmal atrial fibrillation",
                         "Other cardiac arrhythmias",
                         "Complications and ill-defined descriptions of heart disease",
                         "Other nontraumatic intracranial haemorrhage",
                         "Aortic aneurysm and dissection",
                         "Bacterial pneumonia, not elsewhere classified",
                         "Pneumonia due to other infectious organisms, not elsewhere classified",
                         "Other disorders of nose and nasal sinuses",
                         "Diseases of vocal cords and larynx, not elsewhere classified",
                         "Pneumonitis due to solids and liquids",
                         "Other interstitial pulmonary diseases",
                         "Postprocedural respiratory disorders, not elsewhere classified",
                         "Respiratory failure, not elsewhere classified",
                         "Gingivitis and periodontal diseases",
                         "Dentofacial anomalies [including malocclusion]",
                         "Other disorders of teeth and supporting structures",
                         "Cysts of oral region, not elsewhere classified",
                         "Other diseases of jaws",
                         "Stomatitis and related lesions",
                         "Other diseases of lip and oral mucosa",
                         "Other diseases of oesophagus",
                         "Barrett oesophagus",
                         "Other noninfective gastroenteritis and colitis",
                         "Paralytic ileus and intestinal obstruction without hernia",
                         "Haemorrhoids and perianal venous thrombosis",
                         "First degree haemorrhoids",
                         "Haemorrhoids, unspecified",
                         "Fibrosis and cirrhosis of liver",
                         "Other inflammatory liver diseases",
                         "Acute pancreatitis, unspecified",
                         "Other diseases of pancreas",
                         "Postprocedural disorders of digestive system, not elsewhere classified",
                         "Other postprocedural disorders of digestive system, not elsewhere classified",
                         "Other diseases of digestive system",
                         "Cutaneous abscess, furuncle and carbuncle",
                         "Other bullous disorders",
                         "Other dermatitis",
                         "Other nonscarring hair loss",
                         "Other follicular disorders",
                         "Other epidermal thickening",
                         "Stage II decubitus ulcer",
                         "Other disorders of skin and subcutaneous tissue, not elsewhere classified",
                         "Seropositive rheumatoid arthritis",
                         "Other rheumatoid arthritis",
                         "Psoriatic and enteropathic arthropathies",
                         "Other crystal arthropathies",
                         "Other arthritis",
                         "Arthrosis of first carpometacarpal joint",
                         "Other arthrosis",
                         "Other specific joint derangements",
                         "Other joint disorders, not elsewhere classified",
                         "Other necrotizing vasculopathies",
                         "Other systemic involvement of connective tissue",
                         "Other deforming dorsopathies",
                         "Other spondylopathies",
                         "Other dorsopathies, not elsewhere classified",
                         "Other disorders of muscle",
                         "Spontaneous rupture of synovium and tendon",
                         "Other disorders of synovium and tendon",
                         "Other bursopathies",
                         "Other enthesopathies",
                         "Other soft tissue disorders, not elsewhere classified",
                         "Fibromyalgia",
                         "Osteoporosis with pathological fracture",
                         "Osteoporosis without pathological fracture",
                         "Other disorders of bone density and structure",
                         "Other disorders of bone",
                         "Postprocedural musculoskeletal disorders, not elsewhere classified",
                         "Recurrent and persistent haematuria",
                         "Chronic kidney disease, stage 3",
                         "Chronic kidney disease, stage 4",
                         "Other disorders of kidney and ureter, not elsewhere classified",
                         "Other disorders of urinary system",
                         "Other disorders of penis",
                         "Other female pelvic inflammatory diseases",
                         "Other inflammation of vagina and vulva",
                         "Other noninflammatory disorders of cervix uteri",
                         "Other noninflammatory disorders of vagina",
                         "Absent, scanty and rare menstruation",
                         "Excessive, frequent and irregular menstruation",
                         "Other abnormal uterine and vaginal bleeding",
                         "Pain and other conditions associated with female genital organs and menstrual cycle",
                         "Postprocedural disorders of genitourinary system, not elsewhere classified",
                         "Pre-eclampsia",
                         "Medical abortion",
                         "Maternal care for other conditions predominantly related to pregnancy",
                         "Maternal care for known or suspected malpresentation of fetus",
                         "Maternal care for known or suspected abnormality of pelvic organs",
                         "Maternal care for other known or suspected fetal problems",
                         "Labour and delivery complicated by fetal stress [distress]",
                         "Other complications of labour and delivery, not elsewhere classified",
                         "Other maternal diseases classifiable elsewhere but complicating pregnancy, childbirth and the puerperium",
                         "Abnormal blood-pressure reading, without diagnosis",
                         "Pain in throat and chest",
                         "Other symptoms and signs involving the circulatory and respiratory systems",
                         "Abdominal and pelvic pain",
                         "Other symptoms and signs involving the digestive system and abdomen",
                         "Other symptoms and signs involving the nervous and musculoskeletal systems",
                         "Tendency to fall, not elsewhere classified",
                         "Other symptoms and signs involving the urinary system",
                         "Somnolence, stupor and coma",
                         "Speech disturbances, not elsewhere classified",
                         "Shock, not elsewhere classified",
                         "Sequelae of other specified burn, corrosion and frostbite",
                         "Foreign body entering into or through eye or natural orifice",
                         "Nosocomial condition",
                         "Personal history of endocrine, nutritional and metabolic diseases",
                         "Personal history of diseases of the musculoskeletal system and connective tissue",
                         "Personal history of diseases of the genitourinary system",
                         "Personal history of congenital malformations, deformations and chromosomal abnormalities",
                         "Personal history of other specified conditions",
                         "Other complications of internal prosthetic devices, implants and grafts, not elsewhere classified",
                         "Personal history of allergy, other than to drugs and biological substances",
                         "Personal history of allergy to unspecified drugs, medicaments and biological substances",
                         "Personal history of allergy to other drugs, medicaments and biological substances",
                         "Personal history of allergy to drugs, medicaments and biological substances",
                         "Personal history of chemotherapy for neoplastic disease",
                         "Other noninflammatory disorders of vulva and perineum",
                         "Resistance to betalactam antibiotics")
  )
manual_icd10_map <-
  readxl::read_xlsx(here::here("data/icd/manual_disease_icd10_mappings.xlsx"))

icd10_descriptions =
  phecodes |>
  select(icd10_code = ICD10, 
         icd10_description = ICD_DESCRIPTION
         ) |>
  distinct()

# Expand multiple ICD codes into rows
to_add_expanded <- manual_icd10_map |>
  mutate(icd10_code = str_split(icd10_code, ",\\s*")) |>
  tidyr::unnest(icd10_code)

icd10_descriptions = 
  bind_rows(
    icd10_descriptions,
    to_add_expanded |>
      select(icd10_code, 
             icd10_description = icd10_desc),
    other_icd10_desc
  ) 

icd10_descriptions = icd10_descriptions |> distinct()

icd10_descriptions = 
  icd10_descriptions |>
  group_by(icd10_code) |>
  summarise(icd10_description = 
            str_flatten(unique(icd10_description), collapse = "; ", na.rm = TRUE), 
            .groups = "drop"
            )
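
The expand-then-collapse steps above can be illustrated on toy data. This is a minimal sketch, assuming a hypothetical `toy_map` with made-up codes and descriptions; it does not use the project's data.

```r
library(dplyr)
library(tidyr)
library(stringr)

# Hypothetical toy input: the first row holds two comma-separated codes
toy_map <- tibble(
  icd10_code = c("AAA, BBB", "AAA"),
  icd10_desc = c("first description", "second description")
)

# One row per code, mirroring to_add_expanded
toy_expanded <- toy_map |>
  mutate(icd10_code = str_split(icd10_code, ",\\s*")) |>
  unnest(icd10_code)

# One row per code, with its distinct descriptions joined by "; "
toy_descriptions <- toy_expanded |>
  group_by(icd10_code) |>
  summarise(
    icd10_description = str_flatten(unique(icd10_desc), collapse = "; ", na.rm = TRUE),
    .groups = "drop"
  )
```

Because each code ends up on exactly one row, a later join declared with `relationship = "many-to-one"` cannot duplicate rows of the left-hand table.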

disease_mapping =
  left_join(disease_mapping,
            icd10_descriptions,
            by = "icd10_code",
            relationship = "many-to-one",
            na_matches = "never"
            ) 


disease_mapping = 
  disease_mapping |> 
    mutate(icd10_code = str_replace_all(
    icd10_code,
    pattern = "R871",
    replacement = "R87"
    )
    ) 


disease_mapping = 
  disease_mapping |> 
    mutate(icd10_description = 
    ifelse(icd10_code == "R87",
           "Abnormal findings in specimens from female genital organs",
           icd10_description)
    ) 
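
The `R871` recoding above uses an unanchored pattern, so it would also rewrite any longer code that contains `R871`. Whether such codes occur in `disease_mapping` is not shown here; the sketch below uses an invented code, `R8710`, only to illustrate the difference between the unanchored pattern and an anchored alternative.

```r
library(stringr)

# Hypothetical codes: "R8710" is invented purely to show the partial-match case
toy_codes <- c("R871", "R8710", "F05")

# Unanchored pattern, as used above: the longer code is rewritten too
unanchored <- str_replace_all(toy_codes, pattern = "R871", replacement = "R87")

# Anchored pattern: only an exact "R871" is rewritten
anchored <- str_replace_all(toy_codes, pattern = "^R871$", replacement = "R87")
```

If the data can contain longer codes that start with `R871`, the anchored form is the safer choice.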

Missing descriptions for ICD10 codes

# check if any ICD10 codes are missing descriptions
disease_mapping |> 
  filter(is.na(icd10_description)) |>
  head()
# A tibble: 6 × 8
  `DISEASE/TRAIT`       collected_all_diseas…¹ PUBMED_ID STUDY_ACCESSION phecode
  <chr>                 <chr>                      <int> <chr>             <dbl>
1 Fractures (vertebral) bone fracture           29170203 GCST005097           NA
2 Fractures (vertebral) bone fracture           29170203 GCST005097           NA
3 Fractures (vertebral) bone fracture           29170203 GCST005097           NA
4 Fractures (vertebral) bone fracture           29170203 GCST005097           NA
5 Fractures (vertebral) bone fracture           29170203 GCST005097           NA
6 Fractures (vertebral) bone fracture           29170203 GCST005097           NA
# ℹ abbreviated name: ¹​collected_all_disease_terms
# ℹ 3 more variables: icd10_code <chr>, icd10_code_origin <chr>,
#   icd10_description <chr>
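
`head()` returns only the first six rows, which here all belong to the same trait. A minimal sketch of summarising the gaps per disease term instead, using a hypothetical `toy_mapping` with made-up codes in place of `disease_mapping`:

```r
library(dplyr)

# Hypothetical stand-in for disease_mapping; codes are made up
toy_mapping <- tibble(
  collected_all_disease_terms = c("bone fracture", "bone fracture", "delirium", "asthma"),
  icd10_code = c("X01", "X02", "X03", "X04"),
  icd10_description = c(NA, NA, NA, "some description")
)

# Number of rows without a description, per disease term, largest first
missing_summary <- toy_mapping |>
  filter(is.na(icd10_description)) |>
  count(collected_all_disease_terms, name = "n_missing", sort = TRUE)
```

The same pipeline applied to `disease_mapping` would show every disease term that still lacks a description, not just the first one.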

Correcting mistakes in MAPPED_TRAIT for ICD10-coded traits

gwas_study_info |> 
  filter(grepl("ICD10 F05", 
               `DISEASE/TRAIT`)) |> 
  select(`DISEASE/TRAIT`, MAPPED_TRAIT, collected_all_disease_terms, PUBMED_ID)
                                                                  DISEASE/TRAIT
                                                                         <char>
1: ICD10 F05: Delirium due to known physiological condition (Gene-based burden)
2:                       ICD10 F05.9: Delirium, unspecified (Gene-based burden)
3:                                           ICD10 F05.9: Delirium, unspecified
4:                     ICD10 F05: Delirium due to known physiological condition
                  MAPPED_TRAIT collected_all_disease_terms PUBMED_ID
                        <char>                      <char>     <int>
1: alcohol withdrawal delirium   alcohol-related disorders  34662886
2:                    delirium                    delirium  34662886
3:                    delirium                    delirium  34662886
4: alcohol withdrawal delirium   alcohol-related disorders  34662886
disease_mapping |> 
  filter(icd10_code == "F05")
# A tibble: 2 × 8
  `DISEASE/TRAIT`       collected_all_diseas…¹ PUBMED_ID STUDY_ACCESSION phecode
  <chr>                 <chr>                      <int> <chr>             <dbl>
1 ICD10 F05: Delirium … alcohol-related disor…  34662886 GCST90083772         NA
2 ICD10 F05: Delirium … alcohol-related disor…  34662886 GCST90079786         NA
# ℹ abbreviated name: ¹​collected_all_disease_terms
# ℹ 3 more variables: icd10_code <chr>, icd10_code_origin <chr>,
#   icd10_description <chr>
# ICD10 F05 covers delirium not induced by alcohol or other psychoactive substances,
# yet the mapped trait for these studies is "alcohol withdrawal delirium".
# Replace the collected disease term with "delirium".
disease_mapping = 
disease_mapping |>
  mutate(collected_all_disease_terms = 
         ifelse(icd10_code == "F05",
                "delirium",
                collected_all_disease_terms)
         )

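# Apply the same correction in gwas_study_info; non-F05 rows keep their values
gwas_study_info =
  gwas_study_info |>
  mutate(MAPPED_TRAIT = 
         ifelse(grepl("ICD10 F05", `DISEASE/TRAIT`),
                str_replace_all(MAPPED_TRAIT,
                                pattern = "alcohol withdrawal delirium",
                                replacement = "delirium"),
                MAPPED_TRAIT),
         collected_all_disease_terms = 
         ifelse(grepl("ICD10 F05", `DISEASE/TRAIT`),
                str_replace_all(collected_all_disease_terms,
                                pattern = "alcohol-related disorders",
                                replacement = "delirium"),
                collected_all_disease_terms)
         )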
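
A conditional recode like this can be verified by comparing the column before and after. The sketch below uses a hypothetical `toy_info` table and assumes the intent is that only rows whose `DISEASE/TRAIT` mentions ICD10 F05 should change; the second row is invented.

```r
library(dplyr)
library(stringr)

# Hypothetical stand-in for gwas_study_info; the second row is invented
toy_info <- tibble(
  `DISEASE/TRAIT` = c("ICD10 F05: Delirium due to known physiological condition",
                      "ICD10 F10: made-up non-F05 trait"),
  MAPPED_TRAIT = c("alcohol withdrawal delirium", "alcohol dependence")
)

# Rows the recode is meant to touch
is_f05 <- grepl("ICD10 F05", toy_info$`DISEASE/TRAIT`)

# The else branch returns the column itself, so other rows are left as they were
toy_fixed <- toy_info |>
  mutate(MAPPED_TRAIT = ifelse(is_f05,
                               str_replace_all(MAPPED_TRAIT,
                                               pattern = "alcohol withdrawal delirium",
                                               replacement = "delirium"),
                               MAPPED_TRAIT))

# Rows whose value actually changed; none should fall outside is_f05
changed <- toy_fixed$MAPPED_TRAIT != toy_info$MAPPED_TRAIT
stopifnot(!any(changed & !is_f05))
```

The final `stopifnot()` fails if the recode alters any row outside the F05 set, for example when the else branch returns a different column.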

Save disease map

disease_mapping = 
  disease_mapping |> 
  arrange(collected_all_disease_terms, icd10_code)

fwrite(disease_mapping,
       here::here("output/icd_map/gwas_disease_to_icd10_mapping.csv")
       )

sessionInfo()
R version 4.3.1 (2023-06-16)
Platform: aarch64-apple-darwin20 (64-bit)
Running under: macOS 15.6.1

Matrix products: default
BLAS:   /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib/libRblas.0.dylib 
LAPACK: /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib/libRlapack.dylib;  LAPACK version 3.11.0

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

time zone: America/Los_Angeles
tzcode source: internal

attached base packages:
[1] stats     graphics  grDevices datasets  utils     methods   base     

other attached packages:
[1] jsonlite_2.0.0    httr_1.4.7        data.table_1.17.8 stringr_1.5.1    
[5] dplyr_1.1.4       workflowr_1.7.1  

loaded via a namespace (and not attached):
 [1] compiler_4.3.1    renv_1.0.3        promises_1.3.3    tidyselect_1.2.1 
 [5] Rcpp_1.1.0        git2r_0.36.2      tidyr_1.3.1       callr_3.7.6      
 [9] later_1.4.2       jquerylib_0.1.4   readxl_1.4.5      yaml_2.3.10      
[13] fastmap_1.2.0     here_1.0.1        R6_2.6.1          generics_0.1.4   
[17] knitr_1.50        tibble_3.3.0      rprojroot_2.1.0   bslib_0.9.0      
[21] pillar_1.11.0     rlang_1.1.6       utf8_1.2.6        cachem_1.1.0     
[25] stringi_1.8.7     httpuv_1.6.16     xfun_0.52         getPass_0.2-4    
[29] fs_1.6.6          sass_0.4.10       cli_3.6.5         withr_3.0.2      
[33] magrittr_2.0.3    ps_1.9.1          digest_0.6.37     processx_3.8.6   
[37] rstudioapi_0.17.1 lifecycle_1.0.4   vctrs_0.6.5       evaluate_1.0.4   
[41] glue_1.8.0        cellranger_1.1.0  whisker_0.4.1     purrr_1.1.0      
[45] rmarkdown_2.29    tools_4.3.1       pkgconfig_2.0.3   htmltools_0.5.8.1