Last updated: 2025-10-08
Checks: 7 passed, 0 failed
Knit directory: genomics_ancest_disease_dispar/
This reproducible R Markdown analysis was created with workflowr (version 1.7.1). The Checks tab describes the reproducibility checks that were applied when the results were created. The Past versions tab lists the development history.
Great! Since the R Markdown file has been committed to the Git repository, you know the exact version of the code that produced these results.
Great job! The global environment was empty. Objects defined in the global environment can affect the analysis in your R Markdown file in unknown ways. For reproducibility it’s best to always run the code in an empty environment.
The command set.seed(20220216) was run prior to running the code in the R Markdown file. Setting a seed ensures that any results that rely on randomness, e.g. subsampling or permutations, are reproducible.
Great job! Recording the operating system, R version, and package versions is critical for reproducibility.
Nice! There were no cached chunks for this analysis, so you can be confident that you successfully produced the results during this run.
Great job! Using relative paths to the files within your workflowr project makes it easier to run your code on other machines.
Great! You are using Git for version control. Tracking code development and connecting the code version to the results is critical for reproducibility.
The results in this page were generated with repository version a8f1628. See the Past versions tab to see a history of the changes made to the R Markdown and HTML files.
Note that you need to be careful to ensure that all relevant files for the analysis have been committed to Git prior to generating the results (you can use wflow_publish or wflow_git_commit). workflowr only checks the R Markdown file, but you know if there are other scripts or data files that it depends on. Below is the status of the Git repository when the results were generated:
Ignored files:
Ignored: .DS_Store
Ignored: .Rproj.user/
Ignored: analysis/.DS_Store
Ignored: data/.DS_Store
Ignored: data/gbd/.DS_Store
Ignored: data/gbd/IHME-GBD_2021_DATA-d8cf695e-1.csv
Ignored: data/gbd/ihme_gbd_2019_global_disease_burden_rate_all_ages.csv
Ignored: data/gbd/ihme_gbd_2019_global_paf_rate_percent_all_ages.csv
Ignored: data/gbd/ihme_gbd_2021_global_disease_burden_rate_all_ages.csv
Ignored: data/gbd/ihme_gbd_2021_global_paf_rate_percent_all_ages.csv
Ignored: data/gwas_catalog/
Ignored: data/icd/.DS_Store
Ignored: data/icd/IHME_GBD_2019_COD_CAUSE_ICD_CODE_MAP_Y2020M10D15.XLSX
Ignored: data/icd/IHME_GBD_2019_NONFATAL_CAUSE_ICD_CODE_MAP_Y2020M10D15.XLSX
Ignored: data/icd/IHME_GBD_2021_COD_CAUSE_ICD_CODE_MAP_Y2024M05D16.XLSX
Ignored: data/icd/IHME_GBD_2021_NONFATAL_CAUSE_ICD_CODE_MAP_Y2024M05D16.XLSX
Ignored: data/icd/UK_Biobank_master_file.tsv
Ignored: data/icd/cdc_valid_icd10_Sep_23_2025.xlsx
Ignored: data/icd/cdc_valid_icd9_Sep_23_2025.xlsx
Ignored: data/icd/manual_disease_icd10_mappings.xlsx
Ignored: data/icd/phecode_international_version_unrolled.csv
Ignored: data/icd/semiautomatic_ICD-pheno.txt
Ignored: data/icd/~$IHME_GBD_2019_COD_CAUSE_ICD_CODE_MAP_Y2020M10D15.XLSX
Ignored: data/icd/~$IHME_GBD_2019_NONFATAL_CAUSE_ICD_CODE_MAP_Y2020M10D15.XLSX
Ignored: data/who/
Ignored: output/.DS_Store
Ignored: output/gwas_cat/
Ignored: output/gwas_study_info_cohort_corrected.csv
Ignored: output/gwas_study_info_trait_corrected.csv
Ignored: output/gwas_study_info_trait_ontology_info.csv
Ignored: output/gwas_study_info_trait_ontology_info_l1.csv
Ignored: output/gwas_study_info_trait_ontology_info_l2.csv
Ignored: output/icd_map/
Ignored: output/trait_ontology/
Ignored: renv/
Ignored: sup_table.xlsx
Untracked files:
Untracked: analysis/gwas_to_gbd.Rmd
Unstaged changes:
Modified: analysis/disease_inves_by_ancest.Rmd
Modified: analysis/gbd_data_plots.Rmd
Modified: analysis/index.Rmd
Modified: analysis/level_1_disease_group_non_cancer.Rmd
Modified: analysis/level_2_disease_group.Rmd
Modified: analysis/trait_ontology_categorization.Rmd
Modified: data/icd/README.md
Note that any generated files, e.g. HTML, png, CSS, etc., are not included in this status report because it is ok for generated content to have uncommitted changes.
These are the previous versions of the repository in which changes were made to the R Markdown (analysis/map_trait_to_icd10.Rmd) and HTML (docs/map_trait_to_icd10.html) files. If you’ve configured a remote Git repository (see ?wflow_git_remote), click on the hyperlinks in the table below to view the files as they were in that past version.
| File | Version | Author | Date | Message |
|---|---|---|---|---|
| Rmd | a8f1628 | IJbeasley | 2025-10-08 | Include study accession in icd 10 map |
| html | 50ebebc | IJbeasley | 2025-10-08 | Build site. |
| Rmd | 9bbe0dd | IJbeasley | 2025-10-08 | Updating icd 10 mapping |
| html | ec027a3 | IJbeasley | 2025-10-08 | Build site. |
| Rmd | cb8a570 | IJbeasley | 2025-10-08 | Updating disease icd code mapping |
| html | 41d6fe5 | IJbeasley | 2025-09-28 | Build site. |
| Rmd | 97d340d | IJbeasley | 2025-09-28 | workflowr::wflow_publish("analysis/map_trait_to_icd10.Rmd") |
library(dplyr)
library(stringr)
library(data.table)
source(here::here("code/get_term_descendants.R"))
gwas_study_info <- fread(here::here("output/gwas_cat/gwas_study_info_group_v2.csv"))
gwas_study_info = gwas_study_info |>
mutate(collected_all_disease_terms = stringi::stri_trans_general(collected_all_disease_terms, "Latin-ASCII")
)
disease_mapping <- gwas_study_info |>
filter(DISEASE_STUDY == T) |>
tidyr::separate_longer_delim(cols = collected_all_disease_terms,
delim = ", ") |>
select(`DISEASE/TRAIT`,
collected_all_disease_terms,
PUBMED_ID,
STUDY_ACCESSION) |>
distinct()
disease_mapping =
disease_mapping |>
filter(collected_all_disease_terms != "")
print("Number of unique disease trait & study pairs")
[1] "Number of unique disease trait & study pairs"
nrow(disease_mapping)
[1] 45184
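The `tidyr::separate_longer_delim()` call above turns each comma-delimited `collected_all_disease_terms` value into one row per term. A minimal base-R sketch of the same reshaping, assuming terms are always separated by a comma plus one space; the accessions and terms in the toy table are made up for illustration:

```r
# Toy input: one row per study, terms packed into one ", "-delimited string
# (hypothetical accessions and terms, for illustration only)
studies <- data.frame(
  STUDY_ACCESSION = c("GCST_A", "GCST_B"),
  collected_all_disease_terms = c("asthma, eczema", "gout"),
  stringsAsFactors = FALSE
)

# Split each string into its component terms
term_list <- strsplit(studies$collected_all_disease_terms, ", ", fixed = TRUE)

# Repeat each study's accession once per term, then drop any empty terms
long <- data.frame(
  STUDY_ACCESSION = rep(studies$STUDY_ACCESSION, lengths(term_list)),
  collected_all_disease_terms = unlist(term_list),
  stringsAsFactors = FALSE
)
long <- long[long$collected_all_disease_terms != "", ]
long
```

The final filter plays the same role as the `filter(collected_all_disease_terms != "")` step in the pipeline.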
diseases <- stringr::str_split(pattern = ", ",
gwas_study_info$collected_all_disease_terms[gwas_study_info$collected_all_disease_terms != ""]) |>
unlist() |>
stringr::str_trim()
diseases <- unique(diseases)
print("Number of unique disease terms")
[1] "Number of unique disease terms"
print(length(diseases))
[1] 1755
disease_mapping <- disease_mapping |>
mutate(
phecode = str_extract(`DISEASE/TRAIT`, "(?<=PheCode )[^)]+")
) |>
mutate(phecode = as.numeric(phecode))
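The look-behind pattern `(?<=PheCode )[^)]+` captures everything after `PheCode ` up to the closing parenthesis. A base-R sketch of the same extraction on made-up trait labels; base R's default regex engine has no look-behind, so this assumes `perl = TRUE`:

```r
# Made-up trait labels in the style of the GWAS Catalog DISEASE/TRAIT column
traits <- c("Neurofibromatosis (PheCode 199.4)",
            "Polycythemia vera (PheCode 200.1)",
            "Type 2 diabetes")

# Perl-compatible regex is needed for the look-behind
m <- regexpr("(?<=PheCode )[^)]+", traits, perl = TRUE)

# regmatches() returns only the successful matches, so fill a full-length vector
phecode <- rep(NA_character_, length(traits))
phecode[m != -1] <- regmatches(traits, m)

as.numeric(phecode)
```

Converting to numeric makes `"38.30"` and `"38.3"` the same key, which is what lets the zero-padded phecodes in the manual table further down join against the extracted values.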
# phecode to ICD10 mapping from https://wei-lab.app.vumc.org/phecode-data/phecode_international_version
phecodes <- fread(here::here("data/icd/phecode_international_version_unrolled.csv"))
phecode_icd_map =
phecodes |>
select(icd10_code = ICD10,
phecode = PheCode
)
# if more than one ICD10 code per phecode, collapse into a single row
phecode_icd_map =
phecode_icd_map |>
group_by(phecode) |>
summarise(icd10_code =
str_flatten(unique(icd10_code), collapse = ", ", na.rm = T),
.groups = "drop")
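Collapsing the long phecode map to one row per phecode is what allows the `relationship = "many-to-one"` join below to succeed. A base-R sketch of the same collapse with `aggregate()`, on a hypothetical four-row map that includes a repeated code:

```r
# Hypothetical long-format map: one row per (phecode, ICD-10 code) pair,
# with "C81" repeated to show the de-duplication
long_map <- data.frame(
  phecode = c(201, 201, 201, 199.4),
  icd10_code = c("C81", "C81.0", "C81", "Q85.0"),
  stringsAsFactors = FALSE
)

# One row per phecode; unique codes joined with ", ".
# Note: aggregate()'s default na.action = na.omit drops rows with a missing code
# before grouping, so a phecode with only missing codes disappears entirely.
collapsed <- aggregate(
  icd10_code ~ phecode,
  data = long_map,
  FUN = function(codes) paste(unique(codes), collapse = ", ")
)
collapsed
```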
# phecode icd 10 map
phecode_icd_map =
phecode_icd_map |>
mutate(icd10_code_origin = "Study PheCode Mapping")
disease_mapping =
left_join(disease_mapping,
phecode_icd_map,
by = "phecode",
relationship = "many-to-one",
na_matches = "never")
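`relationship = "many-to-one"` makes `left_join()` stop with an error if any row of `disease_mapping` matches more than one row of `phecode_icd_map`, and `na_matches = "never"` keeps rows with a missing `phecode` from matching a missing key in the lookup. A base-R sketch of the same guard plus a single-key left join, on hypothetical toy tables:

```r
# Hypothetical per-study rows (one without a PheCode) and a collapsed lookup table
trait_rows <- data.frame(STUDY_ACCESSION = c("S1", "S2", "S3"),
                         phecode = c(201, 199.4, NA),
                         stringsAsFactors = FALSE)
lookup <- data.frame(phecode = c(199.4, 201, NA),
                     icd10_code = c("Q85.0", "C81, C81.0", "Z99"),
                     stringsAsFactors = FALSE)

# Many-to-one guard: each non-missing key may appear at most once in the lookup;
# otherwise match() would silently take the first of several rows
keys <- lookup$phecode[!is.na(lookup$phecode)]
stopifnot(!anyDuplicated(keys))

# Left join on one key, keeping row order; incomparables = NA stops NA keys
# from matching each other, mirroring na_matches = "never"
idx <- match(trait_rows$phecode, lookup$phecode, incomparables = NA)
trait_rows$icd10_code <- lookup$icd10_code[idx]
trait_rows
```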
# disease_mapping =
# disease_mapping |>
# mutate(icd10_code_origin = "Study PheCode")
disease_mapping |>
filter(!is.na(icd10_code_origin)) |>
nrow()
[1] 6677
disease_mapping |>
filter(!is.na(icd10_code_origin)) |>
head()
DISEASE/TRAIT
1 Neurofibromatosis (PheCode 199.4)
2 Myeloproliferative disease (PheCode 200)
3 Polycythemia vera (PheCode 200.1)
4 Hodgkin's disease (PheCode 201)
5 Cancer of other lymphoid, histiocytic tissue (PheCode 202)
6 Non-Hodgkins lymphoma (PheCode 202.2)
collected_all_disease_terms PUBMED_ID STUDY_ACCESSION phecode
1 neurofibromatosis 30104761 GCST90435639 199.4
2 myeloproliferative disorder 30104761 GCST90435640 200.0
3 polycythemia vera 30104761 GCST90435641 200.1
4 hodgkins lymphoma 30104761 GCST90435642 201.0
5 lymphatic system cancer 30104761 GCST90435643 202.0
6 non-hodgkins lymphoma 30104761 GCST90435644 202.2
icd10_code
1 Q85.0
2 C88.7, C94.4, C94.5, D46, D46.0, D46.1, D46.2, D46.4, D46.7, D46.9, D47.0, D47.1, D47.3, D47.7, D47.9, D57.8, D75.2
3 D45
4 C81, C81.0, C81.1, C81.2, C81.3, C81.7, C81.9
5 C96.0, C96.1, C96.2, C96.3, Z85.7
6 B21.1, C82.0, C82.1, C82.2, C82.7, C83, C83.0, C83.1, C83.2, C83.4, C83.5, C83.6, C83.7, C83.8, C83.9, C84, C84.0, C84.1, C84.2, C84.3, C84.4, C84.5, C85, C85.1, C85.7, C85.9, C96.7, C96.9, L41.2
icd10_code_origin
1 Study PheCode Mapping
2 Study PheCode Mapping
3 Study PheCode Mapping
4 Study PheCode Mapping
5 Study PheCode Mapping
6 Study PheCode Mapping
disease_mapping |>
filter(is.na(icd10_code) & !is.na(phecode)) |>
nrow()
[1] 894
phecode = c("38.30",
"79.90",
"110.00",
"530.13",
"562.20",
"580.10",
"174.00",
"174.20",
"189.10",
"218.00",
"228.10",
"224.00",
"250.15",
"250.25",
"250.40",
"285.21",
"292.11",
"362.26",
"362.23",
"362.27",
"362.50",
"452.20",
"528.10",
"535.90",
"724.10",
"724.22",
"740.00",
"743.10",
"743.12",
"743.13",
"172.21",
"274.00",
"286.00",
"282.00",
"280.00",
"272.00",
"276.00",
"338.00",
"350.00",
"401.00",
"411.00",
"414.20",
"415.1",
"427.21",
"429.00",
"585.00",
"735.22",
"724.00",
"722.00",
"716.10",
"709.00",
"706.00",
"592.00",
"571.00",
"580.00",
"170.00",
"172.00",
"264.00",
"291.00",
"555.00",
"562.00",
"578.00",
"783.1",
"536.80",
"427.40",
"443.00",
"41.80",
"41.90",
"244.00",
"250.00",
"427.40",
"526.40",
"977.00",
"840.20",
"823.00",
"751.00",
"743.00",
"270.38",
"504.10",
"253.40",
"279.20",
"426.22",
"537.1",
"707.20",
"736.10",
"789.10",
"290.13",
"327.70",
"433.60",
"695.00",
"602.30",
"375.10",
"560.00",
"586.10",
"593.20",
"620.10",
"475.90",
"799.00")
icd10_code = c("A49.9",
"B34.9",
"B35, B36",
"K22.7",
"K57",
"N00, N01, N02, N03, N04, N05, N06, N07",
"Z85.3",
"C50",
"C64, C65",
"D25, D26",
"D18.0",
"E00, E00.0, E00.1, E00.2, E00.9, E01.8, E02, E03.0, E03.1, E03.2, E03.3, E03.8, E03.9, E89.0",
"E10.5",
"E11.5",
"R73.0",
"D63",
"R47.0",
"H35.3",
"H35.3",
"H35.3",
"H35.5",
"I80",
"K12.30",
"K29.7, K29.8, K29.9",
"M43.2",
"M21.5",
"M13.9, M15.0, M15.1, M15.2, M15.3, M15.4, M16, M16.0, M16.1, M16.3, M16.6, M16.7, M16.9, M17.1, M17.4, M17.5, M18.0, M18.1, M18.5, M18.9, M19.0, M19.2",
"M81",
"M81.8",
"M81.8",
"C44",
"M10, M10.0, M10.1, M10.2, M10.4, M10.9, M11.0, M11.1, M11.2, M11.8, M11.9, M67.9",
"D65, D66, D67, D68, D68, D68.0, D68.1, D68.2, D68.3, D68.4, D68.8, D68.9, O72.3, O99.1",
"D55, D55.0, D55.1, D55.2, D55.3, D55.8, D55.9, D56, D56.0, D56.1, D56.2, D56.3, D56.4, D56.8, D56.9, D57, D57.0, D57.1, D57.2, D57.3, D57.8, D58, D58.0, D58.1, D58.2, D58.8, D58.9, M90.4",
"D50, D50.0, D50.1, D50.8, D50.9",
"E78.0, E78.1, E78.2, E78.3, E78.4, E78.5, E78.9",
"E86, E87.0, E87.1, E87.2, E87.3, E87.4, E87.5, E87.6, E87.7, E87.8, R63.1",
"R52.0, R52.2, R52.9",
"R25, R25.0, R25.1, R25.2, R25.3, R25.8, R26, R26.0, R26.1, R26.8, R27, R27.0, R27.8, R29.0, R29.2, R43, R43.0, R43.1, R43.2",
"I10, I11, I11.0, I11.9, I12, I12.0, I12.9, I13, I13.0, I13.1, I13.2, I13.9, I15, I15.0, I15.1, I15.2, I15.8, I15.9, I67.4",
"I20, I20.0, I20.1, I20.8, I20.9, I21, I21.0, I21.1, I21.2, I21.3, I21.4, I21.9, I22, I22.0, I22.1, I22.8, I22.9, I23, I23.0, I23.1, I23.2, I23.3, I23.6, I23.8, I24, I24.0, I24.1, I24.8, I24.9, I25, I25.1, I25.2, I25.3, I25.4, I25.5, I25.6, I25.8, I25.9, I34.1, I51.0, I51.3, Z95.1, Z95.5",
"I25.10",
"I26, I26.0",
"I48",
"I51.8",
"N17, N17.0, N17.1, N17.2, N17.8, N17.9, N18, N18.0, N18.9, N19, Y60.2, Y61.2, Y62.0, Y84.1, Z49.1, Z49.2, Z99.2",
"M21.5",
"M40.2, M43.2, M43.8, M48.8, M49.8, M50.0, M99.6",
"G55.1, M46.4, M50, M50.0, M50.0, M50.1, M50.2, M50.3, M50.8, M50.9, M51.3, M51.4, M96.1",
"M13.0",
"L94.3, M33, M33.0, M33.1, M33.2, M33.9, M34, M34.0, M34.1, M34.2, M34.8, M34.9, M35.0, M35.1, M35.5, M35.8, M35.9, M36.0, M36.8, M65.3, N16.4",
"K09.8, L70, L70.0, L70.1, L70.2, L70.3, L70.4, L70.5, L70.8, L70.9, L72, L72.0, L72.1, L72.2, L72.8, L72.9, L73.0, L85.3",
"N30, N30.0, N30.1, N30.2, N30.3, N30.8, N30.9, N34, N34.0, N34.2, N34.3, N35.1, N37",
"K70.4, K72, K72.1, K72.9, K74.0, K74.1, K74.2, K74.3, K74.4, K74.5, K74.6, K75.0, K75.1, K76.0, K76.6, K76.7",
"B52.0, N00.0, N00.1, N00.2, N00.3, N00.4, N00.5, N00.6, N00.7, N01, N01.0, N01.1, N01.2, N01.3, N01.4, N01.5, N01.6, N01.7, N01.9, N02.0, N02.1, N02.2, N02.3, N02.4, N02.5, N02.6, N02.7, N03, N03.0, N03.1, N03.2, N03.3, N03.4, N03.5, N03.6, N03.7, N03.9, N04, N04.0, N04.1, N04.2, N04.3, N04.4, N04.5, N04.6, N04.7, N04.8, N04.9, N05, N05.0, N05.1, N05.2, N05.3, N05.4, N05.5, N05.6, N05.7, N05.9, N06.0, N06.1, N06.2, N06.3, N06.4, N06.5, N06.6, N06.7, N07.0, N07.1, N07.2, N07.3, N07.4, N07.5, N07.6, N07.7, N08, N08.1, N08.2, N08.3, N08.4, N08.5, N08.8, N14, N14.0, N14.1, N14.2, N14.3, N14.4, N15.0, N15.8, N16.1, N16.2, N16.3, N16.4, N16.5",
"C40, C40.0, C40.1, C40.2, C40.3, C40.8, C40.9, C41, C41.0, C41.1, C41.2, C41.3, C41.4, C41.9, C47, C47.0, C47.1, C47.2, C47.3, C47.4, C47.5, C47.6, C47.8, C47.9, C49, C49.0, C49.1, C49.2, C49.3, C49.4, C49.5, C49.6, C49.8, C49.9",
"C43, C43.0, C43.1, C43.2, C43.3, C43.4, C43.5, C43.6, C43.7, C43.8, C43.9, C44.0, C44.1, C44.2, C44.3, C44.4, C44.5, C44.6, C44.7, C44.8, C44.9, D03, D03.0, D03.1, D03.2, D03.3, D03.4, D03.5, D03.6, D03.7, D03.8, D03.9, D04, D04.0, D04.1, D04.2, D04.3, D04.4, D04.5, D04.6, D04.7, D04.8, D04.9",
"R62.0, R62.8, R62.9",
"F06, F06.1, F07.0, F07.1, F07.2, F07.8, F07.9, F23, F23.0, F23.1, F23.8, F23.9, G47.1, R40.0, R40.1",
"K50, K50.0, K50.1, K50.8, K50.9, K51, K51.0, K51.1, K51.2, K51.3, K51.4, K51.5, K51.8, K51.9",
"K57, K57.0, K57.1, K57.2, K57.3, K57.4, K57.5, K57.8, K57.9",
"K62.5, K92.0, K92.1, K92.2",
"R50.8",
"K30",
"I46, I46.0, I46.9, I49.0",
"E10.5, E11.5, E14.5, I73, I73.0, I73.8, I73.9, I79.1, I79.2, I79.8",
"B96.8",
"U82, U83, U84",
"E00, E00.0, E00.1, E00.2, E00.9, E01.8, E02, E03.0, E03.1, E03.2, E03.3, E03.8, E03.9, E89.0",
"E10, E10.0, E10.1, E10.2, E10.3, E10.3, E10.3, E10.4, E10.4, E10.6, E10.7, E10.8, E10.9, E11, E11.0, E11.1, E11.2, E11.3, E11.4, E11.6, E11.7, E11.8, E11.9, E12.3, E13, E13.1, E13.3, E13.4, E13.5, E13.6, E13.7, E13.8, E13.9, E14.9, G59.0, G63.2, H36.0, R73.0, R73.9, R81, R82.4, Z96.4",
"I46, I46.0, I46.9, I49.0",
"K07.6",
"Z88.9",
"S43.4",
"S82.1, S82.3, S82.8",
"Q50.0, Q50.1, Q50.2, Q50.3, Q50.4, Q50.5, Q50.6, Q51, Q51.0, Q51.1, Q51.2, Q51.3, Q51.4, Q51.5, Q51.6, Q51.7, Q51.8, Q51.9, Q52.0, Q52.1, Q52.2, Q52.3, Q52.4, Q52.5, Q52.6, Q52.7, Q52.8, Q52.9, Q53, Q53.0, Q53.1, Q53.2, Q53.9, Q54, Q54.0, Q54.1, Q54.2, Q54.3, Q54.4, Q54.8, Q54.9, Q55, Q55.0, Q55.1, Q55.2, Q55.3, Q55.4, Q55.5, Q55.6, Q55.8, Q55.9, Q56, Q56.0, Q56.1, Q56.2, Q56.3, Q56.4, Q60, Q60.0, Q60.1, Q60.2, Q60.3, Q60.4, Q60.5, Q60.6, Q61, Q61.0, Q61.1, Q61.2, Q61.3, Q61.4, Q61.5, Q61.8, Q61.9, Q62, Q62.0, Q62.1, Q62.3, Q62.4, Q62.5, Q62.6, Q62.7, Q62.8, Q63, Q63.0, Q63.1, Q63.2, Q63.3, Q63.8, Q63.9, Q64.0, Q64.1, Q64.2, Q64.3, Q64.4, Q64.5, Q64.6, Q64.7, Q64.8, Q64.9",
"M48.4, M48.5, M80.5, M80.8, M81.6, M81.9, M84.4, M85.9, M89.9",
"E88.0",
"J84.1, J84.2",
"E23.6",
"D89.8",
"I44.1",
"K31.8, K31.9",
"L97",
"M21.0, M21.1, M21.9",
"R11.10",
"F03",
"G47.6, G25.8",
"G43.6",
"L49",
"N42.3",
"H04.1",
"K56.6",
"N28.8",
"R31.2",
"N87",
"R09.8",
"R53")
icd10_code_origin = rep("Study PheCode (Manual Mapping)",
length(phecode))
to_add = data.frame(phecode,
icd10_code,
icd10_code_origin)
to_add = to_add |> distinct()
to_add =
to_add |>
mutate(phecode = as.numeric(phecode))
disease_mapping =
rows_patch(disease_mapping,
to_add,
unmatched = "ignore")
Matching, by = "phecode"
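`dplyr::rows_patch()` updates rows of `x` whose key appears in `y`, but only overwrites cells that are `NA`; `unmatched = "ignore"` skips keys of `y` that are absent from `x`, and when `by` is omitted the first column of `y` is used as the key (hence the `Matching, by = "phecode"` message). The helper `patch_by_key()` below is a hypothetical base-R sketch of that fill-only-missing behaviour for a single key column, not dplyr's implementation; the codes in the toy tables are placeholders:

```r
# Hypothetical helper mirroring rows_patch()'s fill-only-NA behaviour for one
# key column; like rows_patch(), it requires the keys in `y` to be unique
patch_by_key <- function(x, y, key) {
  stopifnot(!anyDuplicated(y[[key]]))
  idx <- match(x[[key]], y[[key]])            # matching row of y (NA if none)
  for (col in setdiff(names(y), key)) {
    fill <- is.na(x[[col]]) & !is.na(idx)     # missing in x AND key found in y
    x[[col]][fill] <- y[[col]][idx[fill]]
  }
  x                                           # keys present only in y are ignored
}

x <- data.frame(phecode = c(38.3, 38.3, 200, 999),
                icd10_code = c(NA, NA, "D45", NA),
                icd10_code_origin = c(NA, NA, "Study PheCode Mapping", NA),
                stringsAsFactors = FALSE)
y <- data.frame(phecode = c(38.3, 200, 1),
                icd10_code = c("A49.9", "D75.1", "Z00.0"),
                icd10_code_origin = "Study PheCode (Manual Mapping)",
                stringsAsFactors = FALSE)

patched <- patch_by_key(x, y, "phecode")
patched
```

Because existing values are never overwritten, the order of the patching steps sets their priority: codes filled in by earlier steps are kept even when a later source (phenotype-name, ICD-description or manual matches) also has a code for the same row.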
phenotype_icd_map =
phecodes |>
group_by(Phenotype) |>
summarise(icd10_code =
str_flatten(ICD10, collapse = ", ", na.rm = T),
.groups = "drop")
matched_phenotypes =
phenotype_icd_map #|>
#filter(tolower(Phenotype) %in% not_found_diseases)
matched_phenotypes =
matched_phenotypes |>
mutate(collected_all_disease_terms = tolower(Phenotype)) |>
select(collected_all_disease_terms, icd10_code) |>
mutate(icd10_code_origin = "Phecode Phenotype Match")
# match by collected_all_disease_terms
disease_mapping =
disease_mapping |>
rows_patch(matched_phenotypes,
unmatched = "ignore")
# match by DISEASE/TRAIT
matched_phenotypes =
matched_phenotypes |>
rename(`DISEASE/TRAIT` = collected_all_disease_terms)
disease_mapping =
disease_mapping |>
rows_patch(matched_phenotypes,
unmatched = "ignore")
disease_mapping |>
filter(icd10_code_origin == "Phecode Phenotype Match") |>
nrow()
[1] 5553
# disease_mapping =
# disease_mapping |>
# filter(icd10_code != "")
# terms that already carry at least one ICD-10 code
matched <- disease_mapping |>
filter(!is.na(icd10_code), icd10_code != "") |>
pull(collected_all_disease_terms) |>
unique()
not_found_diseases <- diseases[!diseases %in%
matched
]
not_found_diseases <- not_found_diseases[not_found_diseases != ""]
print(length(not_found_diseases))
[1] 487
matched_icd10_desc =
phecodes |>
#filter(tolower(iconv(ICD_DESCRIPTION, to = "UTF-8")) %in% not_found_diseases) |>
mutate(collected_all_disease_terms = tolower(iconv(ICD_DESCRIPTION, to = "UTF-8"))) |>
select(collected_all_disease_terms,
icd10_code = ICD10)
matched_icd10_desc =
matched_icd10_desc |>
group_by(collected_all_disease_terms) |>
summarise(icd10_code = str_flatten(unique(icd10_code),
collapse = ", ",
na.rm = T),
.groups = "drop")
# match by collected_all_disease_terms
matched_icd10_desc =
matched_icd10_desc |>
mutate(icd10_code_origin = "ICD Description Match")
disease_mapping =
disease_mapping |>
rows_patch(matched_icd10_desc,
unmatched = "ignore")
# match by DISEASE/TRAIT
matched_icd10_desc =
matched_icd10_desc |>
rename(`DISEASE/TRAIT` = collected_all_disease_terms)
disease_mapping =
rows_patch(disease_mapping,
matched_icd10_desc,
unmatched = "ignore")
disease_mapping |>
filter(icd10_code_origin == "ICD Description Match") |>
nrow()
[1] 13176
# disease_mapping =
# disease_mapping |>
# filter(icd10_code != "")
# terms that already carry at least one ICD-10 code
matched <- disease_mapping |>
filter(!is.na(icd10_code), icd10_code != "") |>
pull(collected_all_disease_terms) |>
unique()
not_found_diseases <- diseases[!diseases %in% matched]
not_found_diseases <- not_found_diseases[not_found_diseases != ""]
print(length(not_found_diseases))
[1] 487
disease_mapping |>
filter(is.na(icd10_code_origin)) |>
nrow()
[1] 15565
manual_icd10_map <-
readxl::read_xlsx(here::here("data/icd/manual_disease_icd10_mappings.xlsx"))
manual_icd10_map =
manual_icd10_map |>
select(collected_all_disease_terms = mapped_trait,
icd10_code) |>
mutate(icd10_code_origin = "Manual Mapping (collected_all_disease_terms)")
# disease_mapping =
# bind_rows(disease_mapping, to_add) |>
# distinct()
disease_mapping =
rows_patch(disease_mapping,
manual_icd10_map,
unmatched = "ignore")
Matching, by = "collected_all_disease_terms"
disease_mapping |>
filter(icd10_code_origin == "Manual Mapping (collected_all_disease_terms)") |>
nrow()
[1] 1519
disease_mapping =
disease_mapping |>
mutate(icd10_code = ifelse(collected_all_disease_terms == "toxicity" &
grepl("Anthracycline-induced cardiotoxicity|Trastuzumab-induced cardiotoxicity", `DISEASE/TRAIT`, ignore.case = T),
"I42.7, T45.1",
icd10_code)) |>
mutate(icd10_code_origin =
ifelse(collected_all_disease_terms == "toxicity" &
grepl("Anthracycline-induced cardiotoxicity|Trastuzumab-induced cardiotoxicity", `DISEASE/TRAIT`, ignore.case = T),
"Manual Mapping (from DISEASE/TRAIT)",
icd10_code_origin)
)
disease_mapping =
disease_mapping |>
mutate(icd10_code = ifelse(collected_all_disease_terms == "toxicity" &
grepl("Induration at the injection site after COVID-19", `DISEASE/TRAIT`, ignore.case = T),
"R23.4",
icd10_code)) |>
mutate(collected_all_disease_terms =
ifelse(collected_all_disease_terms == "toxicity" &
grepl("Induration at the injection site after COVID-19", `DISEASE/TRAIT`, ignore.case = T),
"induration of skin",
collected_all_disease_terms)) |>
mutate(icd10_code_origin =
ifelse(collected_all_disease_terms == "induration of skin",
"Manual Mapping (from DISEASE/TRAIT)",
icd10_code_origin)
)
# N64.5
disease_mapping =
disease_mapping |>
mutate(icd10_code = ifelse(collected_all_disease_terms == "toxicity" &
grepl("Induration \\(>= grade 2\\) in breast cancer treated with radiotherapy",
`DISEASE/TRAIT`, ignore.case = T),
"N64.5",
icd10_code)) |>
mutate(collected_all_disease_terms =
ifelse(collected_all_disease_terms == "toxicity" &
grepl("Induration \\(>= grade 2\\) in breast cancer treated with radiotherapy",
`DISEASE/TRAIT`, ignore.case = T),
"induration of breast",
collected_all_disease_terms)) |>
mutate(icd10_code_origin =
ifelse(collected_all_disease_terms == "induration of breast",
"Manual Mapping (from DISEASE/TRAIT)",
icd10_code_origin)
)
# T45.1
disease_mapping =
disease_mapping |>
mutate(icd10_code = ifelse(collected_all_disease_terms == "toxicity" &
grepl("Nivolumab-induced immune-related adverse events in cancer|Response to immune checkpoint inhibitors in melanoma",
`DISEASE/TRAIT`, ignore.case = T),
"T45.1",
icd10_code)) |>
mutate(icd10_code_origin =
ifelse(collected_all_disease_terms == "toxicity" &
grepl("Nivolumab-induced immune-related adverse events in cancer|Response to immune checkpoint inhibitors in melanoma",
`DISEASE/TRAIT`, ignore.case = T),
"Manual Mapping (from DISEASE/TRAIT)",
icd10_code_origin)
)
disease_mapping =
disease_mapping |>
mutate(icd10_code = ifelse(collected_all_disease_terms == "toxicity" &
grepl("Methotrexate-related central neurotoxicity in children treated for acute lymphoblastic leukemia",
`DISEASE/TRAIT`, ignore.case = T),
"G92, T45.1",
icd10_code)) |>
mutate(icd10_code_origin =
ifelse(collected_all_disease_terms == "toxicity" &
grepl("Methotrexate-related central neurotoxicity in children treated for acute lymphoblastic leukemia",
`DISEASE/TRAIT`, ignore.case = T),
"Manual Mapping (from DISEASE/TRAIT)",
icd10_code_origin)
)
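The four `toxicity` overrides above share one shape: a `DISEASE/TRAIT` pattern, the code(s) to assign, and sometimes a more specific term. A table-driven sketch of the same idea; `overrides`, `apply_overrides()` and the toy trait labels are hypothetical, and only two of the rules are reproduced:

```r
# Hypothetical rule table: a DISEASE/TRAIT pattern, the code(s) to assign, and an
# optional replacement term (NA keeps the original term)
overrides <- data.frame(
  trait_pattern = c("Anthracycline-induced cardiotoxicity|Trastuzumab-induced cardiotoxicity",
                    "Induration at the injection site after COVID-19"),
  icd10_code = c("I42.7, T45.1", "R23.4"),
  new_term = c(NA, "induration of skin"),
  stringsAsFactors = FALSE
)

apply_overrides <- function(df, overrides, term = "toxicity") {
  for (i in seq_len(nrow(overrides))) {
    hit <- df$collected_all_disease_terms == term &
      grepl(overrides$trait_pattern[i], df[["DISEASE/TRAIT"]], ignore.case = TRUE)
    hit[is.na(hit)] <- FALSE
    df$icd10_code[hit] <- overrides$icd10_code[i]
    # origin is set from the same mask, not from the (possibly renamed) term
    df$icd10_code_origin[hit] <- "Manual Mapping (from DISEASE/TRAIT)"
    if (!is.na(overrides$new_term[i])) {
      df$collected_all_disease_terms[hit] <- overrides$new_term[i]
    }
  }
  df
}

# Made-up rows for illustration
toy <- data.frame(
  `DISEASE/TRAIT` = c("Anthracycline-induced cardiotoxicity in breast cancer",
                      "Induration at the injection site after COVID-19 vaccination",
                      "Some other toxicity"),
  collected_all_disease_terms = "toxicity",
  icd10_code = NA_character_,
  icd10_code_origin = NA_character_,
  check.names = FALSE,
  stringsAsFactors = FALSE
)
out <- apply_overrides(toy, overrides)
out
```

Tying `icd10_code_origin` to the same `hit` mask, rather than testing for the rewritten term afterwards, means rows that already carried a term such as `induration of skin` are not relabelled as manual mappings.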
# disease_mapping =
# disease_mapping |>
# filter(icd10_code != "")
# terms that already carry at least one ICD-10 code
matched <- disease_mapping |>
filter(!is.na(icd10_code), icd10_code != "") |>
pull(collected_all_disease_terms) |>
unique()
not_found_diseases <- diseases[!diseases %in% matched]
not_found_diseases <- not_found_diseases[not_found_diseases != ""]
print(length(not_found_diseases))
[1] 149
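# similar studies: fill still-unmapped rows using codes from other rows with the same term whose codes were study-provided or came from the study's PheCode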
study_icd_map =
disease_mapping |>
filter(!is.na(icd10_code)) |>
filter(icd10_code_origin == "Study Provided" | icd10_code_origin == "Study PheCode Mapping")
study_icd_map =
study_icd_map |>
select(collected_all_disease_terms, icd10_code) |>
distinct()
study_icd_map =
study_icd_map |>
group_by(collected_all_disease_terms) |>
summarise(icd10_code = str_flatten(unique(sort(icd10_code)),
collapse = ", ",
na.rm = T),
.groups = "drop")
study_icd_map =
study_icd_map |>
mutate(icd10_code_origin = "Inferred from similar studies")
disease_mapping =
rows_patch(disease_mapping,
study_icd_map,
unmatched = "ignore")
Matching, by = "collected_all_disease_terms"
disease_mapping |>
filter(is.na(icd10_code)) |>
nrow()
[1] 331
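The similar-studies step borrows codes for a term from rows whose codes were supplied by a study (`Study Provided`) or derived from its PheCode (`Study PheCode Mapping`), then fills rows of the same term that are still missing a code. A base-R sketch on a toy table; splitting each code string before de-duplicating is a choice made in this sketch (not in the code above) so that `"M10"` and `"M10, M10.9"` merge into a single list:

```r
# Toy mapping table (hypothetical rows)
mapping <- data.frame(
  collected_all_disease_terms = c("gout", "gout", "gout", "asthma"),
  icd10_code = c("M10, M10.9", "M10", NA, NA),
  icd10_code_origin = c("Study Provided", "Study PheCode Mapping", NA, NA),
  stringsAsFactors = FALSE
)

# Rows whose codes came from the study itself (directly or via its PheCode)
trusted <- mapping[!is.na(mapping$icd10_code) &
                     mapping$icd10_code_origin %in% c("Study Provided", "Study PheCode Mapping"), ]

# One sorted, de-duplicated code list per term (split first so single codes dedupe)
by_term <- split(trusted$icd10_code, trusted$collected_all_disease_terms)
inferred <- vapply(by_term,
                   function(codes) paste(sort(unique(unlist(strsplit(codes, ",\\s*")))),
                                         collapse = ", "),
                   character(1))

# Fill only rows still missing a code whose term has an inferred list
fill <- is.na(mapping$icd10_code) & mapping$collected_all_disease_terms %in% names(inferred)
mapping$icd10_code[fill] <- unname(inferred[mapping$collected_all_disease_terms[fill]])
mapping$icd10_code_origin[fill] <- "Inferred from similar studies"
mapping
```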
disease_mapping |>
group_by(icd10_code_origin) |>
summarise(n = n()) |>
arrange(desc(n))
# A tibble: 9 × 2
icd10_code_origin n
<chr> <int>
1 Inferred from similar studies 13705
2 ICD Description Match 13176
3 Study PheCode Mapping 6677
4 Phecode Phenotype Match 5553
5 Study Provided 3803
6 Manual Mapping (collected_all_disease_terms) 1519
7 Study PheCode (Manual Mapping) 410
8 <NA> 331
9 Manual Mapping (from DISEASE/TRAIT) 10
disease_mapping =
disease_mapping |>
mutate(icd10_code = str_replace_all(
icd10_code,
pattern = "F40 F41 F42",
replacement = "F40, F41, F42"
)
)
disease_mapping =
disease_mapping |>
mutate(icd10_code = str_replace_all(
icd10_code,
pattern = "G47.8. G47.9",
replacement = "G47.8, G47.9"
)
)
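The two replacements above repair code lists in which codes were separated by a space or by `. ` instead of `, `. A more general sketch, assuming every ICD-10 code starts with an upper-case letter followed by a digit; `fix_code_separators()` is a hypothetical helper and, like the look-behind earlier, needs `perl = TRUE`:

```r
# Hypothetical helper; assumes each code starts with [A-Z] followed by a digit
fix_code_separators <- function(x) {
  # "<digit>. <Letter><digit>"  ->  "<digit>, <Letter><digit>"
  x <- gsub("(?<=\\d)\\.\\s+(?=[A-Z]\\d)", ", ", x, perl = TRUE)
  # "<digit> <Letter><digit>"   ->  "<digit>, <Letter><digit>"
  gsub("(?<=\\d)\\s+(?=[A-Z]\\d)", ", ", x, perl = TRUE)
}

fix_code_separators(c("F40 F41 F42", "G47.8. G47.9", "K31.8. K31.9", "E10.5, E11.5"))
```

Lists that are already comma-separated pass through unchanged, because the whitespace there follows a comma rather than a digit.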
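disease_mapping =
disease_mapping |>
mutate(icd10_code = str_replace_all(
icd10_code,
pattern = "K31.8. K31.9",
replacement = "K31.8, K31.9"
)
)
disease_mapping =
disease_mapping |>
mutate(icd10_code = str_split(icd10_code, ",\\s*")) |>
tidyr::unnest(icd10_code)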
disease_mapping =
disease_mapping |>
mutate(icd10_code = str_replace_all(
icd10_code,
pattern = "B37.49",
replacement = "B37.4"
)
)
disease_mapping =
disease_mapping |>
mutate(icd10_code = str_replace_all(
icd10_code,
pattern = "E05.90",
replacement = "E05.9"
)
)
disease_mapping =
disease_mapping |>
mutate(icd10_code = str_replace_all(
icd10_code,
pattern = "E11.319|E11.31|E11.329.9|E11.329.|E11.32|E11.3.",
replacement = "E11.3"
)
)
disease_mapping =
disease_mapping |>
mutate(icd10_code = str_replace_all(
icd10_code,
pattern = "E11.3.",
replacement = "E11.3"
)
)
disease_mapping =
disease_mapping |>
mutate(icd10_code = str_replace_all(
icd10_code,
pattern = "E13.621.|E13.62",
replacement = "E13.6"
)
)
disease_mapping =
disease_mapping |>
mutate(icd10_code = str_replace_all(
icd10_code,
pattern = "E78.00",
replacement = "E78.0"
)
)
disease_mapping =
disease_mapping |>
mutate(icd10_code = str_replace_all(
icd10_code,
pattern = "F10.10",
replacement = "F10.1"
)
)
disease_mapping =
disease_mapping |>
mutate(icd10_code = str_replace_all(
icd10_code,
pattern = "F17.201.|F17.20",
replacement = "F17.2"
)
)
disease_mapping =
disease_mapping |>
mutate(icd10_code = str_replace_all(
icd10_code,
pattern = "F430",
replacement = "F43.0"
)
)
disease_mapping =
disease_mapping |>
mutate(icd10_code = str_replace_all(
icd10_code,
pattern = "G47.00",
replacement = "G47.0"
)
)
disease_mapping =
disease_mapping |>
mutate(icd10_code = str_replace_all(
icd10_code,
pattern = "H01.00|H01.09.",
replacement = "H01.0"
)
)
disease_mapping =
disease_mapping |>
mutate(icd10_code = str_replace_all(
icd10_code,
pattern = "H01.09.",
replacement = "H01.0"
)
)
disease_mapping =
disease_mapping |>
mutate(icd10_code = str_replace_all(
icd10_code,
pattern = "H029",
replacement = "H02.9"
)
)
disease_mapping =
disease_mapping |>
mutate(icd10_code = str_replace_all(
icd10_code,
pattern = "H60.90",
replacement = "H60.9"
)
)
disease_mapping =
disease_mapping |>
mutate(icd10_code = str_replace_all(
icd10_code,
pattern = "H91.8X9.",
replacement = "H91.8"
)
)
disease_mapping =
disease_mapping |>
mutate(icd10_code = str_replace_all(
icd10_code,
pattern = "H91.90",
replacement = "H91.9"
)
)
disease_mapping =
disease_mapping |>
mutate(icd10_code = str_replace_all(
icd10_code,
pattern = "H93.299.|H93.29",
replacement = "H93.2"
)
)
disease_mapping =
disease_mapping |>
mutate(icd10_code = str_replace_all(
icd10_code,
pattern = "I70.20",
replacement = "I70.2"
)
)
disease_mapping =
disease_mapping |>
mutate(icd10_code = str_replace_all(
icd10_code,
pattern = "I82.409.|I82.40|I82.4",
replacement = "I82"
)
)
disease_mapping =
disease_mapping |>
mutate(icd10_code = str_replace_all(
icd10_code,
pattern = "J30.9",
replacement = "J30.2"
)
)
disease_mapping =
disease_mapping |>
mutate(icd10_code = str_replace_all(
icd10_code,
pattern = "J45.998.|J45.909.|J45.901.|J45.99|J45.90",
replacement = "J45.9"
)
)
disease_mapping =
disease_mapping |>
mutate(icd10_code = str_replace_all(
icd10_code,
pattern = "J98.457.6",
replacement = "J98.4"
)
)
disease_mapping =
disease_mapping |>
mutate(icd10_code = str_replace_all(
icd10_code,
pattern = "K05.30-31",
replacement = "K05.3"
)
)
disease_mapping =
disease_mapping |>
mutate(icd10_code = str_replace_all(
icd10_code,
pattern = "K29.70",
replacement = "K29.7"
)
)
disease_mapping =
disease_mapping |>
mutate(icd10_code = str_replace_all(
icd10_code,
pattern = "K59.00",
replacement = "K59.0"
)
)
disease_mapping =
disease_mapping |>
mutate(icd10_code = str_replace_all(
icd10_code,
pattern = "L08.89",
replacement = "L08.8"
)
)
disease_mapping =
disease_mapping |>
mutate(icd10_code = str_replace_all(
icd10_code,
pattern = "M06.99|M06.90",
replacement = "M06.9"
)
)
disease_mapping =
disease_mapping |>
mutate(icd10_code = str_replace_all(
icd10_code,
pattern = "M10.99",
replacement = "M10.9"
)
)
disease_mapping =
disease_mapping |>
mutate(icd10_code = str_replace_all(
icd10_code,
pattern = "M13.90|M13.94|M13.96|M13.97|M13.99",
replacement = "M13.9"
)
)
disease_mapping =
disease_mapping |>
mutate(icd10_code = str_replace_all(
icd10_code,
pattern = "M19.07",
replacement = "M19.0"
)
)
disease_mapping =
disease_mapping |>
mutate(icd10_code = str_replace_all(
icd10_code,
pattern = "M19.90|M19.91|M19.94|M19.97|M19.99",
replacement = "M19.9"
)
)
disease_mapping =
disease_mapping |>
mutate(icd10_code = str_replace_all(
icd10_code,
pattern = "M25.50|M25.51|M25.55|M25.569.|M25.56|M25.571.|M25.57",
replacement = "M25.5"
)
)
disease_mapping =
disease_mapping |>
mutate(icd10_code = str_replace_all(
icd10_code,
pattern = "M25.76|M25.77",
replacement = "M25.7"
)
)
disease_mapping =
disease_mapping |>
mutate(icd10_code = str_replace_all(
icd10_code,
pattern = "M43.16",
replacement = "M43.1"
)
)
disease_mapping =
disease_mapping |>
mutate(icd10_code = str_replace_all(
icd10_code,
pattern = "M47.80|M47.82|M47.86",
replacement = "M47.8"
)
)
disease_mapping =
disease_mapping |>
mutate(icd10_code = str_replace_all(
icd10_code,
pattern = "M47.92|M47.96",
replacement = "M47.9"
)
)
disease_mapping =
disease_mapping |>
mutate(icd10_code = str_replace_all(
icd10_code,
pattern = "M48.02|M48.06",
replacement = "M48.0"
)
)
disease_mapping =
disease_mapping |>
mutate(icd10_code = str_replace_all(
icd10_code,
pattern = "M54.22",
replacement = "M54.2"
)
)
disease_mapping =
disease_mapping |>
mutate(icd10_code = str_replace_all(
icd10_code,
pattern = "M54.30|M54.39",
replacement = "M54.3"
)
)
disease_mapping =
disease_mapping |>
mutate(icd10_code = str_replace_all(
icd10_code,
pattern = "M54.56|M54.57|M54.59",
replacement = "M54.5"
)
)
disease_mapping =
disease_mapping |>
mutate(icd10_code = str_replace_all(
icd10_code,
pattern = "M54.99",
replacement = "M54.9"
)
)
disease_mapping =
disease_mapping |>
mutate(icd10_code = str_replace_all(
icd10_code,
pattern = "M65.34",
replacement = "M65.3"
)
)
disease_mapping =
disease_mapping |>
mutate(icd10_code = str_replace_all(
icd10_code,
pattern = "M65.96",
replacement = "M65.9"
)
)
disease_mapping =
disease_mapping |>
mutate(icd10_code = str_replace_all(
icd10_code,
pattern = "M72.04",
replacement = "M72.0"
)
)
disease_mapping =
disease_mapping |>
mutate(icd10_code = str_replace_all(
icd10_code,
pattern = "M79.09",
replacement = "M79.0"
)
)
disease_mapping =
disease_mapping |>
mutate(icd10_code = str_replace_all(
icd10_code,
pattern = "M79.65|M79.66|M79.67",
replacement = "M79.6"
)
)
disease_mapping =
disease_mapping |>
mutate(icd10_code = str_replace_all(
icd10_code,
pattern = "M79.79",
replacement = "M79.7"
)
)
disease_mapping =
disease_mapping |>
mutate(icd10_code = str_replace_all(
icd10_code,
pattern = "M79.86",
replacement = "M79.8"
)
)
disease_mapping =
disease_mapping |>
mutate(icd10_code = str_replace_all(
icd10_code,
pattern = "M81.99",
replacement = "M81.9"
)
)
disease_mapping =
disease_mapping |>
mutate(icd10_code = str_replace_all(
icd10_code,
pattern = "N390",
replacement = "N39.0"
)
)
disease_mapping =
disease_mapping |>
mutate(icd10_code = str_replace_all(
icd10_code,
pattern = "N50.89",
replacement = "N50.8"
)
)
disease_mapping =
disease_mapping |>
mutate(icd10_code = str_replace_all(
icd10_code,
pattern = "N52",
replacement = "F52"
)
)
disease_mapping =
disease_mapping |>
mutate(icd10_code = str_replace_all(
icd10_code,
pattern = "N52.9",
replacement = "F52.9"
)
)
disease_mapping =
disease_mapping |>
mutate(icd10_code = str_replace_all(
icd10_code,
pattern = "N814",
replacement = "N81.4"
)
)
disease_mapping =
disease_mapping |>
mutate(icd10_code = str_replace_all(
icd10_code,
pattern = "P29.12",
replacement = "P29.1"
)
)
disease_mapping =
disease_mapping |>
mutate(icd10_code = str_replace_all(
icd10_code,
pattern = "R06.00|R06.09",
replacement = "R06.0"
)
)
disease_mapping =
disease_mapping |>
mutate(icd10_code = str_replace_all(
icd10_code,
pattern = "R06.83",
replacement = "R06.8"
)
)
disease_mapping =
disease_mapping |>
mutate(icd10_code = str_replace_all(
icd10_code,
pattern = "R07.89",
replacement = "R07.8"
)
)
disease_mapping =
disease_mapping |>
mutate(icd10_code = str_replace_all(
icd10_code,
pattern = "R07.8|R07.9",
replacement = "R07"
)
)
disease_mapping =
disease_mapping |>
mutate(icd10_code = str_replace_all(
icd10_code,
pattern = "R09.89",
replacement = "R09.8"
)
)
disease_mapping =
disease_mapping |>
mutate(icd10_code = str_replace_all(
icd10_code,
pattern = "R10.30",
replacement = "R10.3"
)
)
disease_mapping =
disease_mapping |>
mutate(icd10_code = str_replace_all(
icd10_code,
pattern = "R10.8|R10.9",
replacement = "R10"
)
)
disease_mapping =
disease_mapping |>
mutate(icd10_code = str_replace_all(
icd10_code,
pattern = "R11.0",
replacement = "R11"
)
)
disease_mapping =
disease_mapping |>
mutate(icd10_code = str_replace_all(
icd10_code,
pattern = "R13.10",
replacement = "R13"
)
)
disease_mapping =
disease_mapping |>
mutate(icd10_code = str_replace_all(
icd10_code,
pattern = "R13.1",
replacement = "R13"
)
)
disease_mapping =
disease_mapping |>
mutate(icd10_code = str_replace_all(
icd10_code,
pattern = "R14.0",
replacement = "R14"
)
)
disease_mapping =
disease_mapping |>
mutate(icd10_code = str_replace_all(
icd10_code,
pattern = "R19.7",
replacement = "R19"
)
)
disease_mapping =
disease_mapping |>
mutate(icd10_code = str_replace_all(
icd10_code,
pattern = "R29.898.|R29.89",
replacement = "R29.8"
)
)
disease_mapping =
disease_mapping |>
mutate(icd10_code = str_replace_all(
icd10_code,
pattern = "R31.29",
replacement = "R31.2"
)
)
disease_mapping =
disease_mapping |>
mutate(icd10_code = str_replace_all(
icd10_code,
pattern = "R31.2|R31.9",
replacement = "R31"
)
)
disease_mapping =
disease_mapping |>
mutate(icd10_code = str_replace_all(
icd10_code,
pattern = "R53.82|R53.83|R5382",
replacement = "R53.8"
)
)
disease_mapping =
disease_mapping |>
mutate(icd10_code = str_replace_all(
icd10_code,
pattern = "R53.8",
replacement = "R53"
)
)
disease_mapping =
disease_mapping |>
mutate(icd10_code = str_replace_all(
icd10_code,
pattern = "R56.9",
replacement = "R56"
)
)
disease_mapping =
disease_mapping |>
mutate(icd10_code = str_replace_all(
icd10_code,
pattern = "R73.09|R73.02",
replacement = "R73.0"
)
)
disease_mapping =
disease_mapping |>
mutate(icd10_code = str_replace_all(
icd10_code,
pattern = "R80.9",
replacement = "R80"
)
)
disease_mapping =
disease_mapping |>
mutate(icd10_code = str_replace_all(
icd10_code,
pattern = "R82.998.|R82.99",
replacement = "R82.9"
)
)
disease_mapping =
disease_mapping |>
mutate(icd10_code = str_replace_all(
icd10_code,
pattern = "R871",
replacement = "R87.1"
)
)
disease_mapping =
disease_mapping |>
mutate(icd10_code = str_replace_all(
icd10_code,
pattern = "R87.6",
replacement = "R87"
)
)
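# effectively a no-op: the R87.6 rule above already rewrote R87.61 to R871,
# which is remapped to R87 after the description join below
disease_mapping =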
disease_mapping |>
mutate(icd10_code = str_replace_all(
icd10_code,
pattern = "R87.61",
replacement = "R87.6"
)
)
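Substring matching also interacts with rule order. Because the `R87.6` rule runs before the `R87.61` rule, it matches the first five characters of `R87.61` and leaves the trailing digit behind, producing `R871`. The patterns are regular expressions as well, so an unescaped `.` matches any character. The sketch below uses toy codes to contrast the chained rules with an anchored, escaped pattern that only matches the complete code; it illustrates the behaviour and is not a proposed replacement for the chain:

```r
library(stringr)

codes <- c("R87.6", "R87.61")

# Order used in the chain: broad rule first, then the narrower one
chained <- str_replace_all(codes, pattern = "R87.6", replacement = "R87")
chained <- str_replace_all(chained, pattern = "R87.61", replacement = "R87.6")

# Anchored and escaped: "^R87\\.61$" only matches the whole code R87.61
anchored <- str_replace_all(codes, pattern = "^R87\\.61$", replacement = "R87.6")

# Unescaped "." also matches a digit, so a code such as "R8716" is caught too
dot_matches_digit <- str_detect("R8716", "R87.6")

print(chained)            # "R87"   "R871"
print(anchored)           # "R87.6" "R87.6"
print(dot_matches_digit)  # TRUE
```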
disease_mapping =
disease_mapping |>
mutate(icd10_code = str_replace_all(
icd10_code,
pattern = "T50.905.",
replacement = "T50.9"
)
)
disease_mapping =
disease_mapping |>
mutate(icd10_code = str_replace_all(
icd10_code,
pattern = "T78.40",
replacement = "T78.4"
)
)
disease_mapping =
disease_mapping |>
mutate(icd10_code = str_replace_all(
icd10_code,
pattern = "T780",
replacement = "T78.0"
)
)
disease_mapping =
disease_mapping |>
mutate(icd10_code = str_replace_all(
icd10_code,
pattern = "T81.149.88",
replacement = "T81.1"
)
)
disease_mapping =
disease_mapping |>
mutate(icd10_code = str_replace_all(
icd10_code,
pattern = "T81.815.013.",
replacement = "T81.8"
)
)
disease_mapping =
disease_mapping |>
mutate(icd10_code = str_replace_all(
icd10_code,
pattern = "T84.84",
replacement = "T84.8"
)
)
disease_mapping =
disease_mapping |>
mutate(icd10_code = str_replace_all(
icd10_code,
pattern = "T887",
replacement = "T88.7"
)
)
disease_mapping =
disease_mapping |>
mutate(icd10_code = str_replace_all(
icd10_code,
pattern = "Z86.79",
replacement = "Z86.7"
)
)
disease_mapping =
disease_mapping |>
mutate(icd10_code = str_replace_all(
icd10_code,
pattern = "Z87.09",
replacement = "Z87.0"
)
)
disease_mapping =
disease_mapping |>
mutate(icd10_code = str_replace_all(
icd10_code,
pattern = "Z87.39",
replacement = "Z87.3"
)
)
disease_mapping =
disease_mapping |>
mutate(icd10_code = str_replace_all(
icd10_code,
pattern = "Z87.42",
replacement = "Z87.4"
)
)
disease_mapping =
disease_mapping |>
mutate(icd10_code = str_replace_all(
icd10_code,
pattern = "Z87.891.",
replacement = "Z87.8"
)
)
disease_mapping =
disease_mapping |>
mutate(icd10_code = str_replace_all(
icd10_code,
pattern = "W44.9",
replacement = "W44"
)
)
disease_mapping =
disease_mapping |>
mutate(icd10_code = str_replace_all(
icd10_code,
pattern = "U80",
replacement = "U82"
)
)
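Each rule above repeats the same `mutate(str_replace_all(...))` boilerplate. One way to keep the order-dependent behaviour while making the rules easier to audit is to store them as rows of a table and apply them in order. This is a minimal sketch: `recode_rules` and `apply_recode_rules()` are hypothetical names, and the three rules are copied from the chain above for illustration only.

```r
library(dplyr)
library(stringr)

# Hypothetical rule table; rows are applied top to bottom, like the chain above
recode_rules <- data.frame(
  pattern     = c("R07.89", "R07.8|R07.9", "N390"),
  replacement = c("R07.8",  "R07",         "N39.0")
)

# Apply each regex rewrite in turn, so later rules see the earlier results
apply_recode_rules <- function(codes, rules) {
  for (i in seq_len(nrow(rules))) {
    codes <- str_replace_all(codes, rules$pattern[i], rules$replacement[i])
  }
  codes
}

toy_mapping <- data.frame(icd10_code = c("R07.89", "R07.9", "N390", "K52"))

recoded <- toy_mapping |>
  mutate(icd10_code = apply_recode_rules(icd10_code, recode_rules))

print(recoded$icd10_code)  # "R07"   "R07"   "N39.0" "K52"
```

Because the rules still run in sequence, their order matters exactly as it does in the chain above. Anchoring each pattern (for example `^N390$`) would switch to exact-match recoding, which changes the result for any input that currently matches only as a substring.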
disease_mapping =
disease_mapping |>
distinct()
other_icd10_desc =
data.frame(
icd10_code = c("A09.9",
"A41",
"B95",
"B96",
"B97",
"B98",
"C44",
"C79",
"C80.0",
"C80.9",
"C90",
"D07",
"D35",
"D37",
"D47",
"D48",
"E03",
"E04",
"E21",
"E23",
"E27",
"E61",
"E80",
"E87",
"E89",
"F05",
"F17",
"F41",
"F43.0",
"F53",
"G31",
"G45",
"G46",
"G62",
"G93",
"G99",
"H47",
"H50",
"H57",
"H83",
"H91",
"H92",
"H93",
"I08",
"I27",
"I27.2",
"I44",
"I45",
"I48.0",
"I49",
"I51",
"I62",
"I71",
"J15",
"J16",
"J34",
"J38",
"J69",
"J84",
"J95",
"J96",
"K05",
"K07",
"K08",
"K09",
"K10",
"K12",
"K13",
"K22",
"K22.7",
"K52",
"K56",
"K64",
"K64.0",
"K64.9",
"K74",
"K75",
"K85.9",
"K86",
"K91",
"K91.8",
"K92",
"L02",
"L13",
"L30",
"L65",
"L73",
"L85",
"L89.1",
"L98",
"M05",
"M06",
"M07",
"M11",
"M13",
"M18",
"M19",
"M24",
"M25",
"M31",
"M35",
"M43",
"M48",
"M53",
"M62",
"M66",
"M67",
"M71",
"M77",
"M79",
"M79.7",
"M80",
"M81",
"M85",
"M89",
"M96",
"N02",
"N18.3",
"N18.4",
"N28",
"N39",
"N48",
"N73",
"N76",
"N88",
"N89",
"N91",
"N92",
"N93",
"N94",
"N99",
"O14",
"O04",
"O26",
"O32",
"O34",
"O36",
"O68",
"O75",
"O99",
"R03",
"R07",
"R09",
"R10",
"R19",
"R29",
"R29.6",
"R39",
"R40",
"R47",
"R57",
"T95.8",
"W44",
"Y95",
"Z86.3",
"Z87.3",
"Z87.4",
"Z87.7",
"Z87.8",
"T85.8",
"Z91.0",
"Z88.9",
"Z88.8",
"Z88",
"Z92.6",
"N90",
"U82"
),
icd10_description = c("Gastroenteritis and colitis of unspecified origin",
"Other sepsis",
"Streptococcus and staphylococcus as the cause of diseases classified to other chapters",
"Other specified bacterial agents as the cause of diseases classified to other chapters",
"Viral agents as the cause of diseases classified to other chapters",
"Other specified infectious agents as the cause of diseases classified to other chapters",
"Other malignant neoplasms of skin",
"Secondary malignant neoplasm of other and unspecified sites",
"Malignant neoplasm, primary site unknown, so stated",
"Malignant neoplasm, primary site unspecified",
"Multiple myeloma and malignant plasma cell neoplasms",
"Carcinoma in situ of other and unspecified genital organs",
"Benign neoplasm of other and unspecified endocrine glands",
"Neoplasm of uncertain or unknown behaviour of oral cavity and digestive organs",
"Other neoplasms of uncertain or unknown behaviour of lymphoid, haematopoietic and related tissue",
"Neoplasm of uncertain or unknown behaviour of other and unspecified sites",
"Other hypothyroidism",
"Other nontoxic goitre",
"Hyperparathyroidism and other disorders of parathyroid gland",
"Hypofunction and other disorders of pituitary gland",
"Other disorders of adrenal gland",
"Deficiency of other nutrient elements",
"Disorders of porphyrin and bilirubin metabolism",
"Other disorders of fluid, electrolyte and acid-base balance",
"Postprocedural endocrine and metabolic disorders, not elsewhere classified",
"Delirium, not induced by alcohol and other psychoactive substances",
"Mental and behavioural disorders due to use of tobacco",
"Other anxiety disorders",
"Acute stress reaction",
"Mental and behavioural disorders associated with the puerperium, not elsewhere classified",
"Other degenerative diseases of nervous system, not elsewhere classified",
"Transient cerebral ischaemic attacks and related syndromes",
"Vascular syndromes of brain in cerebrovascular diseases",
"Other polyneuropathies",
"Other disorders of brain",
"Other disorders of nervous system in diseases classified elsewhere",
"Other disorders of optic [2nd] nerve and visual pathways",
"Other strabismus",
"Other disorders of eye and adnexa",
"Other diseases of inner ear",
"Other hearing loss",
"Otalgia and effusion of ear",
"Other disorders of ear, not elsewhere classified",
"Multiple valve diseases",
"Other pulmonary heart diseases",
"Other secondary pulmonary hypertension",
"Atrioventricular and left bundle-branch block",
"Other conduction disorders",
"Paroxysmal atrial fibrillation",
"Other cardiac arrhythmias",
"Complications and ill-defined descriptions of heart disease",
"Other nontraumatic intracranial haemorrhage",
"Aortic aneurysm and dissection",
"Bacterial pneumonia, not elsewhere classified",
"Pneumonia due to other infectious organisms, not elsewhere classified",
"Other disorders of nose and nasal sinuses",
"Diseases of vocal cords and larynx, not elsewhere classified",
"Pneumonitis due to solids and liquids",
"Other interstitial pulmonary diseases",
"Postprocedural respiratory disorders, not elsewhere classified",
"Respiratory failure, not elsewhere classified",
"Gingivitis and periodontal diseases",
"Dentofacial anomalies [including malocclusion]",
"Other disorders of teeth and supporting structures",
"Cysts of oral region, not elsewhere classified",
"Other diseases of jaws",
"Stomatitis and related lesions",
"Other diseases of lip and oral mucosa",
"Other diseases of oesophagus",
"Barrett oesophagus",
"Other noninfective gastroenteritis and colitis",
"Paralytic ileus and intestinal obstruction without hernia",
"Haemorrhoids and perianal venous thrombosis",
"First degree haemorrhoids",
"Haemorrhoids, unspecified",
"Fibrosis and cirrhosis of liver",
"Other inflammatory liver diseases",
"Acute pancreatitis, unspecified",
"Other diseases of pancreas",
"Postprocedural disorders of digestive system, not elsewhere classified",
"Other postprocedural disorders of digestive system, not elsewhere classified",
"Other diseases of digestive system",
"Cutaneous abscess, furuncle and carbuncle",
"Other bullous disorders",
"Other dermatitis",
"Other nonscarring hair loss",
"Other follicular disorders",
"Other epidermal thickening",
"Stage II decubitus ulcer",
"Other disorders of skin and subcutaneous tissue, not elsewhere classified",
"Seropositive rheumatoid arthritis",
"Other rheumatoid arthritis",
"Psoriatic and enteropathic arthropathies",
"Other crystal arthropathies",
"Other arthritis",
"Arthrosis of first carpometacarpal joint",
"Other arthrosis",
"Other specific joint derangements",
"Other joint disorders, not elsewhere classified",
"Other necrotizing vasculopathies",
"Other systemic involvement of connective tissue",
"Other deforming dorsopathies",
"Other spondylopathies",
"Other dorsopathies, not elsewhere classified",
"Other disorders of muscle",
"Spontaneous rupture of synovium and tendon",
"Other disorders of synovium and tendon",
"Other bursopathies",
"Other enthesopathies",
"Other soft tissue disorders, not elsewhere classified",
"Fibromyalgia",
"Osteoporosis with pathological fracture",
"Osteoporosis without pathological fracture",
"Other disorders of bone density and structure",
"Other disorders of bone",
"Postprocedural musculoskeletal disorders, not elsewhere classified",
"Recurrent and persistent haematuria",
"Chronic kidney disease, stage 3",
"Chronic kidney disease, stage 4",
"Other disorders of kidney and ureter, not elsewhere classified",
"Other disorders of urinary system",
"Other disorders of penis",
"Other female pelvic inflammatory diseases",
"Other inflammation of vagina and vulva",
"Other noninflammatory disorders of cervix uteri",
"Other noninflammatory disorders of vagina",
"Absent, scanty and rare menstruation",
"Excessive, frequent and irregular menstruation",
"Other abnormal uterine and vaginal bleeding",
"Pain and other conditions associated with female genital organs and menstrual cycle",
"Postprocedural disorders of genitourinary system, not elsewhere classified",
"Pre-eclampsia",
"Medical abortion",
"Maternal care for other conditions predominantly related to pregnancy",
"Maternal care for known or suspected malpresentation of fetus",
"Maternal care for known or suspected abnormality of pelvic organs",
"Maternal care for other known or suspected fetal problems",
"Labour and delivery complicated by fetal stress [distress]",
"Other complications of labour and delivery, not elsewhere classified",
"Other maternal diseases classifiable elsewhere but complicating pregnancy, childbirth and the puerperium",
"Abnormal blood-pressure reading, without diagnosis",
"Pain in throat and chest",
"Other symptoms and signs involving the circulatory and respiratory systems",
"Abdominal and pelvic pain",
"Other symptoms and signs involving the digestive system and abdomen",
"Other symptoms and signs involving the nervous and musculoskeletal systems",
"Tendency to fall, not elsewhere classified",
"Other symptoms and signs involving the urinary system",
"Somnolence, stupor and coma",
"Speech disturbances, not elsewhere classified",
"Shock, not elsewhere classified",
"Sequelae of other specified burns, corrosions and frostbite",
"Foreign body entering into or through eye or natural orifice",
"Nosocomial condition",
"Personal history of endocrine, nutritional and metabolic diseases",
"Personal history of diseases of the musculoskeletal system and connective tissue",
"Personal history of diseases of the genitourinary system",
"Personal history of congenital malformations, deformations and chromosomal abnormalities",
"Personal history of other specified conditions",
"Other complications of internal prosthetic devices, implants and grafts, not elsewhere classified",
"Personal history of allergy, other than to drugs and biological substances",
"Personal history of allergy to unspecified drugs, medicaments and biological substances",
"Personal history of allergy to other drugs, medicaments and biological substances",
"Personal history of allergy to drugs, medicaments and biological substances",
"Personal history of chemotherapy for neoplastic disease",
"Other noninflammatory disorders of vulva and perineum",
"Resistance to betalactam antibiotics")
)
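Because `other_icd10_desc` is built from two parallel vectors, a misaligned or mistyped entry is easy to miss. A minimal validation sketch, assuming the intended format is a capital letter, two digits and an optional single decimal digit (which fits every code in this table). `check_icd10_desc()` is a hypothetical helper and the toy rows are illustrative:

```r
# Hypothetical helper: return the rows of a code/description table that need review
check_icd10_desc <- function(df) {
  # Codes not shaped like "K64" or "K64.0"
  bad_format  <- !grepl("^[A-Z][0-9]{2}(\\.[0-9])?$", df$icd10_code)
  # Descriptions with an embedded line break
  has_newline <- grepl("\n", df$icd10_description, fixed = TRUE)
  # Every occurrence of a code that appears more than once
  dup_code    <- duplicated(df$icd10_code) |
    duplicated(df$icd10_code, fromLast = TRUE)
  df[bad_format | has_newline | dup_code, , drop = FALSE]
}

toy_desc <- data.frame(
  icd10_code = c("Z88.8", "T85.8", "T85.8", "R871", "K64"),
  icd10_description = c(
    "Personal history of allergy to other drugs, medicaments and biological substances\n",
    "Other complications of internal prosthetic devices, implants and grafts, not elsewhere classified",
    "Other complications of internal prosthetic devices, implants and grafts, not elsewhere classified",
    "Abnormal findings in specimens from female genital organs",
    "Haemorrhoids and perianal venous thrombosis"
  )
)

flagged <- check_icd10_desc(toy_desc)
print(flagged$icd10_code)  # "Z88.8" "T85.8" "T85.8" "R871"
```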
manual_icd10_map <-
readxl::read_xlsx(here::here("data/icd/manual_disease_icd10_mappings.xlsx"))
icd10_descriptions =
phecodes |>
select(icd10_code = ICD10,
icd10_description = ICD_DESCRIPTION
) |>
distinct()
# Expand multiple ICD codes into rows
to_add_expanded <- manual_icd10_map |>
mutate(icd10_code = str_split(icd10_code, ",\\s*")) |>
tidyr::unnest(icd10_code)
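`str_split()` returns a list column holding one character vector per row, and `tidyr::unnest()` then gives each element its own row while repeating the other columns. A toy sketch; the trait names and codes are illustrative, not taken from `manual_disease_icd10_mappings.xlsx`:

```r
library(dplyr)
library(stringr)

toy_manual_map <- tibble::tibble(
  trait      = c("hypertension", "type 2 diabetes"),
  icd10_code = c("I10, I15", "E11")
)

expanded <- toy_manual_map |>
  # "I10, I15" becomes c("I10", "I15"); ",\\s*" also absorbs spaces after commas
  mutate(icd10_code = str_split(icd10_code, ",\\s*")) |>
  # one row per code, with the trait repeated
  tidyr::unnest(icd10_code)

print(expanded)
```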
icd10_descriptions =
bind_rows(
icd10_descriptions,
to_add_expanded |>
select(icd10_code,
icd10_description = icd10_desc),
other_icd10_desc
)
icd10_descriptions = icd10_descriptions |> distinct()
icd10_descriptions =
icd10_descriptions |>
group_by(icd10_code) |>
summarise(icd10_description =
str_flatten(unique(icd10_description), collapse = "; ", na.rm = TRUE),
.groups = "drop"
)
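After `bind_rows()`, one code can carry several descriptions from different sources, plus `NA`s. The `summarise()` step keeps the unique non-missing descriptions joined by `"; "`, leaving one row per code for the later join. A toy sketch of that step; the second K64 description is an illustrative variant, not taken from the phecode table:

```r
library(dplyr)
library(stringr)

toy_desc <- tibble::tibble(
  icd10_code = c("K64", "K64", "K64", "K64", "R07"),
  icd10_description = c("Haemorrhoids and perianal venous thrombosis",
                        "Hemorrhoids",
                        "Haemorrhoids and perianal venous thrombosis",
                        NA,
                        "Pain in throat and chest")
)

collapsed <- toy_desc |>
  group_by(icd10_code) |>
  # unique() drops the repeated text; na.rm = TRUE drops the NA before joining
  summarise(icd10_description =
              str_flatten(unique(icd10_description), collapse = "; ", na.rm = TRUE),
            .groups = "drop")

print(collapsed)
```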
disease_mapping =
left_join(disease_mapping,
icd10_descriptions,
by = "icd10_code",
relationship = "many-to-one",
na_matches = "never"
)
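`relationship = "many-to-one"` makes the join stop with an error if a code ever matches more than one description row, instead of silently duplicating GWAS rows; collapsing descriptions to one row per code beforehand is what keeps this join valid. `na_matches = "never"` additionally prevents `NA` codes from matching each other. A toy sketch, assuming dplyr 1.1.0 or later (the version that added the `relationship` argument):

```r
library(dplyr)

gwas_rows <- data.frame(icd10_code = c("K64", "R07", NA))

# Toy description table with a duplicated key, as an uncollapsed bind_rows() could give
dup_desc <- data.frame(icd10_code = c("K64", "K64", NA),
                       icd10_description = c("source A", "source B", "missing"))

# K64 matches two rows, so the declared many-to-one relationship is violated
join_result <- tryCatch(
  left_join(gwas_rows, dup_desc, by = "icd10_code",
            relationship = "many-to-one", na_matches = "never"),
  error = function(e) e
)
print(inherits(join_result, "error"))  # TRUE

# With one row per key the join succeeds, and the NA code stays unmatched
unique_desc <- dup_desc[c(1, 3), ]
joined <- left_join(gwas_rows, unique_desc, by = "icd10_code",
                    relationship = "many-to-one", na_matches = "never")
print(joined)
```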
disease_mapping =
disease_mapping |>
mutate(icd10_code = str_replace_all(
icd10_code,
pattern = "R871",
replacement = "R87"
)
)
disease_mapping =
disease_mapping |>
mutate(icd10_description =
ifelse(icd10_code == "R87",
"Abnormal findings in specimens from female genital organs",
icd10_description)
)
# check if any ICD10 codes are missing descriptions
disease_mapping |>
filter(is.na(icd10_description)) |>
head()
# A tibble: 6 × 8
`DISEASE/TRAIT` collected_all_diseas…¹ PUBMED_ID STUDY_ACCESSION phecode
<chr> <chr> <int> <chr> <dbl>
1 Fractures (vertebral) bone fracture 29170203 GCST005097 NA
2 Fractures (vertebral) bone fracture 29170203 GCST005097 NA
3 Fractures (vertebral) bone fracture 29170203 GCST005097 NA
4 Fractures (vertebral) bone fracture 29170203 GCST005097 NA
5 Fractures (vertebral) bone fracture 29170203 GCST005097 NA
6 Fractures (vertebral) bone fracture 29170203 GCST005097 NA
# ℹ abbreviated name: ¹collected_all_disease_terms
# ℹ 3 more variables: icd10_code <chr>, icd10_code_origin <chr>,
# icd10_description <chr>
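The printout above truncates columns, so the `icd10_code` values that still lack a description are not visible. A small sketch that lists each such code with the number of affected rows; the toy input and its codes are illustrative, not the actual missing codes:

```r
library(dplyr)

toy_mapping <- tibble::tibble(
  collected_all_disease_terms = c("bone fracture", "bone fracture", "delirium"),
  icd10_code        = c("S22.0", "S32.0", "F05"),
  icd10_description = c(NA, NA,
                        "Delirium, not induced by alcohol and other psychoactive substances")
)

# One row per undescribed code, with how many mapping rows use it
missing_desc <- toy_mapping |>
  filter(is.na(icd10_description)) |>
  count(icd10_code, name = "n_rows")

print(missing_desc)
```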
gwas_study_info |>
filter(grepl("ICD10 F05",
`DISEASE/TRAIT`)) |>
select(`DISEASE/TRAIT`, MAPPED_TRAIT, collected_all_disease_terms, PUBMED_ID)
DISEASE/TRAIT
<char>
1: ICD10 F05: Delirium due to known physiological condition (Gene-based burden)
2: ICD10 F05.9: Delirium, unspecified (Gene-based burden)
3: ICD10 F05.9: Delirium, unspecified
4: ICD10 F05: Delirium due to known physiological condition
MAPPED_TRAIT collected_all_disease_terms PUBMED_ID
<char> <char> <int>
1: alcohol withdrawal delirium alcohol-related disorders 34662886
2: delirium delirium 34662886
3: delirium delirium 34662886
4: alcohol withdrawal delirium alcohol-related disorders 34662886
disease_mapping |>
filter(icd10_code == "F05")
# A tibble: 2 × 8
`DISEASE/TRAIT` collected_all_diseas…¹ PUBMED_ID STUDY_ACCESSION phecode
<chr> <chr> <int> <chr> <dbl>
1 ICD10 F05: Delirium … alcohol-related disor… 34662886 GCST90083772 NA
2 ICD10 F05: Delirium … alcohol-related disor… 34662886 GCST90079786 NA
# ℹ abbreviated name: ¹collected_all_disease_terms
# ℹ 3 more variables: icd10_code <chr>, icd10_code_origin <chr>,
# icd10_description <chr>
# ICD-10 F05 is delirium NOT induced by alcohol or other psychoactive substances,
# yet these studies are mapped to alcohol withdrawal delirium (collected term:
# alcohol-related disorders); relabel them as delirium
disease_mapping =
disease_mapping |>
mutate(collected_all_disease_terms =
ifelse(icd10_code == "F05",
"delirium",
collected_all_disease_terms)
)
gwas_study_info =
gwas_study_info |>
mutate(MAPPED_TRAIT =
ifelse(grepl("ICD10 F05", `DISEASE/TRAIT`),
str_replace_all(collected_all_disease_terms,
pattern = "alcohol-related disorders",
replacement = "delirium"),
MAPPED_TRAIT)
)
disease_mapping =
disease_mapping |>
arrange(collected_all_disease_terms, icd10_code)
fwrite(disease_mapping,
here::here("output/icd_map/gwas_disease_to_icd10_mapping.csv")
)
sessionInfo()
R version 4.3.1 (2023-06-16)
Platform: aarch64-apple-darwin20 (64-bit)
Running under: macOS 15.6.1
Matrix products: default
BLAS: /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib/libRlapack.dylib; LAPACK version 3.11.0
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
time zone: America/Los_Angeles
tzcode source: internal
attached base packages:
[1] stats graphics grDevices datasets utils methods base
other attached packages:
[1] jsonlite_2.0.0 httr_1.4.7 data.table_1.17.8 stringr_1.5.1
[5] dplyr_1.1.4 workflowr_1.7.1
loaded via a namespace (and not attached):
[1] compiler_4.3.1 renv_1.0.3 promises_1.3.3 tidyselect_1.2.1
[5] Rcpp_1.1.0 git2r_0.36.2 tidyr_1.3.1 callr_3.7.6
[9] later_1.4.2 jquerylib_0.1.4 readxl_1.4.5 yaml_2.3.10
[13] fastmap_1.2.0 here_1.0.1 R6_2.6.1 generics_0.1.4
[17] knitr_1.50 tibble_3.3.0 rprojroot_2.1.0 bslib_0.9.0
[21] pillar_1.11.0 rlang_1.1.6 utf8_1.2.6 cachem_1.1.0
[25] stringi_1.8.7 httpuv_1.6.16 xfun_0.52 getPass_0.2-4
[29] fs_1.6.6 sass_0.4.10 cli_3.6.5 withr_3.0.2
[33] magrittr_2.0.3 ps_1.9.1 digest_0.6.37 processx_3.8.6
[37] rstudioapi_0.17.1 lifecycle_1.0.4 vctrs_0.6.5 evaluate_1.0.4
[41] glue_1.8.0 cellranger_1.1.0 whisker_0.4.1 purrr_1.1.0
[45] rmarkdown_2.29 tools_4.3.1 pkgconfig_2.0.3 htmltools_0.5.8.1