Last updated: 2026-01-12
Knit directory:
genomics_ancest_disease_dispar/
This reproducible R Markdown analysis was created with workflowr (version 1.7.1). The Checks tab describes the reproducibility checks that were applied when the results were created. The Past versions tab lists the development history.
Great! Since the R Markdown file has been committed to the Git repository, you know the exact version of the code that produced these results.
Great job! The global environment was empty. Objects defined in the global environment can affect the analysis in your R Markdown file in unknown ways. For reproducibility it’s best to always run the code in an empty environment.
The command set.seed(20220216) was run prior to running
the code in the R Markdown file. Setting a seed ensures that any results
that rely on randomness, e.g. subsampling or permutations, are
reproducible.
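As a small illustration of what the seed guarantees, the sketch below draws the same random sample twice after resetting the seed. The seed value is the one reported above; the sampled vector is purely illustrative and is not part of the analysis.

```r
# Resetting the seed before each draw makes the two draws identical
set.seed(20220216)
first_draw <- sample(1:100, 5)

set.seed(20220216)
second_draw <- sample(1:100, 5)

identical(first_draw, second_draw)
```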
Great job! Recording the operating system, R version, and package versions is critical for reproducibility.
Nice! There were no cached chunks for this analysis, so you can be confident that you successfully produced the results during this run.
Great job! Using relative paths to the files within your workflowr project makes it easier to run your code on other machines.
Great! You are using Git for version control. Tracking code development and connecting the code version to the results is critical for reproducibility.
The results in this page were generated with repository version d031c86. See the Past versions tab to see a history of the changes made to the R Markdown and HTML files.
Note that you need to be careful to ensure that all relevant files for
the analysis have been committed to Git prior to generating the results
(you can use wflow_publish or
wflow_git_commit). workflowr only checks the R Markdown
file, but you know if there are other scripts or data files that it
depends on. Below is the status of the Git repository when the results
were generated:
Ignored files:
Ignored: .DS_Store
Ignored: .Rproj.user/
Ignored: .venv/
Ignored: analysis/.DS_Store
Ignored: ancestry_dispar_env/
Ignored: data/.DS_Store
Ignored: data/RCDCFundingSummary_01042026.xlsx
Ignored: data/cdc/
Ignored: data/cohort/
Ignored: data/epmc/
Ignored: data/europe_pmc/
Ignored: data/gbd/.DS_Store
Ignored: data/gbd/IHME-GBD_2021_DATA-d8cf695e-1.csv
Ignored: data/gbd/IHME-GBD_2023_DATA-73cc01fd-1.csv
Ignored: data/gbd/ihme_gbd_2019_global_disease_burden_rate_all_ages.csv
Ignored: data/gbd/ihme_gbd_2019_global_paf_rate_percent_all_ages.csv
Ignored: data/gbd/ihme_gbd_2021_global_disease_burden_rate_all_ages.csv
Ignored: data/gbd/ihme_gbd_2021_global_paf_rate_percent_all_ages.csv
Ignored: data/gwas_catalog/
Ignored: data/icd/.DS_Store
Ignored: data/icd/2025AA/
Ignored: data/icd/IHME_GBD_2019_COD_CAUSE_ICD_CODE_MAP_Y2020M10D15.XLSX
Ignored: data/icd/IHME_GBD_2019_NONFATAL_CAUSE_ICD_CODE_MAP_Y2020M10D15.XLSX
Ignored: data/icd/IHME_GBD_2021_COD_CAUSE_ICD_CODE_MAP_Y2024M05D16.XLSX
Ignored: data/icd/IHME_GBD_2021_NONFATAL_CAUSE_ICD_CODE_MAP_Y2024M05D16.XLSX
Ignored: data/icd/UK_Biobank_master_file.tsv
Ignored: data/icd/cdc_valid_icd10_Sep_23_2025.xlsx
Ignored: data/icd/cdc_valid_icd9_Sep_23_2025.xlsx
Ignored: data/icd/hp_umls_mapping.csv
Ignored: data/icd/lancet_conditions_icd10.xlsx
Ignored: data/icd/manual_disease_icd10_mappings.xlsx
Ignored: data/icd/mondo_umls_mapping.csv
Ignored: data/icd/phecode_international_version_unrolled.csv
Ignored: data/icd/phecode_to_icd10_manual_mapping.xlsx
Ignored: data/icd/semiautomatic_ICD-pheno.txt
Ignored: data/icd/semiautomatic_ICD-pheno_UKB_subset.txt
Ignored: data/icd/umls-2025AA-mrconso.zip
Ignored: figures/
Ignored: output/.DS_Store
Ignored: output/abstracts/
Ignored: output/doccano/
Ignored: output/fulltexts/
Ignored: output/gwas_cat/
Ignored: output/gwas_cohorts/
Ignored: output/icd_map/
Ignored: output/trait_ontology/
Ignored: pubmedbert-cohort-ner-model/
Ignored: pubmedbert-cohort-ner/
Ignored: r-spacyr/
Ignored: renv/
Ignored: venv/
Unstaged changes:
Modified: analysis/disease_inves_by_ancest.Rmd
Modified: analysis/gwas_to_gbd.Rmd
Modified: analysis/index.Rmd
Modified: analysis/missing_cohort_info.Rmd
Modified: analysis/replication_ancestry_bias.Rmd
Modified: analysis/text_for_cohort_labels.Rmd
Note that any generated files, e.g. HTML, png, CSS, etc., are not included in this status report because it is ok for generated content to have uncommitted changes.
These are the previous versions of the repository in which changes were
made to the R Markdown (analysis/map_trait_to_icd10.Rmd)
and HTML (docs/map_trait_to_icd10.html) files. If you’ve
configured a remote Git repository (see ?wflow_git_remote),
click on the hyperlinks in the table below to view the files as they
were in that past version.
| File | Version | Author | Date | Message |
|---|---|---|---|---|
| Rmd | d031c86 | IJbeasley | 2026-01-12 | Update initial trait categorisation |
| html | 7da0bed | IJbeasley | 2026-01-05 | Build site. |
| Rmd | 78cdca8 | IJbeasley | 2026-01-05 | Update filtering of GWAS traits |
| html | bbd6e4c | IJbeasley | 2026-01-04 | Build site. |
| Rmd | ddce199 | IJbeasley | 2026-01-04 | Removing non-specific disease terms |
| html | 9d4bb84 | IJbeasley | 2026-01-03 | Build site. |
| Rmd | 9bccb24 | IJbeasley | 2026-01-03 | Update fixing of trait mapping |
| html | b9a3549 | IJbeasley | 2025-12-29 | Build site. |
| Rmd | 57a4dec | IJbeasley | 2025-12-29 | Fixing commas |
| html | de3349f | IJbeasley | 2025-12-29 | Build site. |
| Rmd | ba69411 | IJbeasley | 2025-12-29 | Fix some UMLS ICD10 Codes |
| html | 6bf9d47 | IJbeasley | 2025-12-29 | Build site. |
| Rmd | f597620 | IJbeasley | 2025-12-29 | Update mapping to ICD-10 codes (to keep year) |
| html | 1f555b6 | IJbeasley | 2025-12-29 | Build site. |
| Rmd | b4527b8 | IJbeasley | 2025-12-29 | Update mapping to ICD-10 codes |
| html | 757b4b4 | IJbeasley | 2025-10-09 | Build site. |
| Rmd | 6019c96 | IJbeasley | 2025-10-09 | Even more correcting of icd 10 codes |
| html | 0feea16 | IJbeasley | 2025-10-08 | Build site. |
| Rmd | a8f1628 | IJbeasley | 2025-10-08 | Include study accession in icd 10 map |
| html | 50ebebc | IJbeasley | 2025-10-08 | Build site. |
| Rmd | 9bbe0dd | IJbeasley | 2025-10-08 | Updating icd 10 mapping |
| html | ec027a3 | IJbeasley | 2025-10-08 | Build site. |
| Rmd | cb8a570 | IJbeasley | 2025-10-08 | Updating disease icd code mapping |
| html | 41d6fe5 | IJbeasley | 2025-09-28 | Build site. |
| Rmd | 97d340d | IJbeasley | 2025-09-28 | workflowr::wflow_publish("analysis/map_trait_to_icd10.Rmd") |
---
title: "Mapping GWAS traits to ICD 10"
author: "Isobel Beasley"
date: "2025-09-26"
output: html_document
---
library(dplyr)
library(stringr)
library(data.table)
source(here::here("code/get_term_descendants.R"))
# gwas_study_info <- fread(here::here("output/gwas_cat/gwas_study_info_group_v2.csv"))
gwas_study_info <- data.table::fread(here::here("output/gwas_cat/gwas_study_all_group.csv"))
gwas_study_info = gwas_study_info |>
mutate(collected_all_disease_terms = stringi::stri_trans_general(collected_all_disease_terms, "Latin-ASCII")
)
# study accessions GCST003209 and GCST003208:
# replace benign neoplasm with colorectal cancer, endometrial cancer
gwas_study_info =
gwas_study_info |>
mutate(collected_all_disease_terms = ifelse(STUDY_ACCESSION == "GCST003209" |
STUDY_ACCESSION == "GCST003208" ,
"colorectal cancer, endometrial cancer",
collected_all_disease_terms))
# for study accession = GCST90133383
# replace benign neoplasm with testicular cancer, hearing loss
disease_mapping <- gwas_study_info |>
filter(DISEASE_STUDY == T) |>
tidyr::separate_longer_delim(cols = collected_all_disease_terms,
delim = ", ") |>
select(`DISEASE/TRAIT`,
collected_all_disease_terms,
PUBMED_ID,
YEAR,
STUDY_ACCESSION) |>
distinct()
disease_mapping =
disease_mapping |>
filter(collected_all_disease_terms != "")
print("Number of unique disease trait & study pairs")
[1] "Number of unique disease trait & study pairs"
nrow(disease_mapping)
[1] 45920
diseases <- stringr::str_split(pattern = ", ",
gwas_study_info$collected_all_disease_terms[gwas_study_info$collected_all_disease_terms != ""]) |>
unlist() |>
stringr::str_trim()
diseases <- unique(diseases)
print("Number of unique disease terms")
[1] "Number of unique disease terms"
print(length(diseases))
[1] 1959
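The splitting step above turns one comma-separated string per study into one row per disease term. A minimal sketch on made-up data (the accessions `GCST_A` and `GCST_B` are hypothetical placeholders, not real studies):

```r
library(dplyr)

# Toy stand-in for gwas_study_info with two hypothetical studies
toy <- tibble::tibble(
  STUDY_ACCESSION = c("GCST_A", "GCST_B"),
  collected_all_disease_terms = c("colorectal cancer, endometrial cancer",
                                  "asthma")
)

# One row per (study, disease term) pair
long <- toy |>
  tidyr::separate_longer_delim(cols = collected_all_disease_terms,
                               delim = ", ") |>
  distinct()

long
```

Note that the delimiter is the two-character string `", "`, so a term that itself contains a comma followed by a space would also be split.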
DISEASE/TRAIT terms
DISEASE/TRAIT Column to better match terms
# removing genotype effect (e.g. (fetal genotype effect))
disease_mapping =
disease_mapping |>
mutate(`DISEASE/TRAIT` =
str_remove_all(`DISEASE/TRAIT`,
"\\s*\\([^)]*genotype effect\\)")
) |>
mutate(`DISEASE/TRAIT` =
str_remove_all(`DISEASE/TRAIT`,
"\\s*\\([^)]* effect\\)")
)
# remove '(adjusted for APOE e4 dosage)'
disease_mapping =
disease_mapping |>
mutate(`DISEASE/TRAIT` = str_remove_all(`DISEASE/TRAIT`,
"\\(adjusted for APOE e4 dosage\\)"))
disease_mapping =
disease_mapping |>
mutate(`DISEASE/TRAIT` = str_remove_all(`DISEASE/TRAIT`,
"\\(adjusted for sex\\)|\\(adjusted for age\\)|\\(adjusted for age, sex\\)"))
# remove '(maternal):' & '(paternal):'
disease_mapping =
disease_mapping |>
mutate(`DISEASE/TRAIT` = str_remove_all(`DISEASE/TRAIT`, "\\s*\\(maternal\\):")) |>
mutate(`DISEASE/TRAIT` = str_remove_all(`DISEASE/TRAIT`, "\\s*\\(paternal\\):"))
# remove 'Biological Grandparent '
disease_mapping =
disease_mapping |>
mutate(`DISEASE/TRAIT` = str_remove_all(`DISEASE/TRAIT`, "Biological Grandparent "))
# remove 'Biological Father: ', 'Biological Sibling: ', 'Biological Mother: '
disease_mapping =
disease_mapping |>
mutate(`DISEASE/TRAIT` = str_remove_all(`DISEASE/TRAIT`, "Biological Father: ")) |>
mutate(`DISEASE/TRAIT` = str_remove_all(`DISEASE/TRAIT`, "Biological Sibling: ")) |>
mutate(`DISEASE/TRAIT` = str_remove_all(`DISEASE/TRAIT`, "Biological Mother: "))
# remove 'Illnesses of father - ', 'Illnesses of mother - ', 'Illnesses of siblings - '
disease_mapping =
disease_mapping |>
mutate(`DISEASE/TRAIT` = str_remove_all(`DISEASE/TRAIT`, "Illnesses of father - ")) |>
mutate(`DISEASE/TRAIT` = str_remove_all(`DISEASE/TRAIT`, "Illnesses of mother - ")) |>
mutate(`DISEASE/TRAIT` = str_remove_all(`DISEASE/TRAIT`, "Illnesses of siblings - "))
# remove (apnea hypopnea index),
# (average respiratory event duration),
# (micro-arousal index)
# (percentage of N3 sleep time during total sleep time)
# (percentage of N3 sleep time during sleep period time)
# (average oxyhemoglobin desaturation per event)
# (average oxyhemoglobin saturation across sleep episode)
# (percentage sleep with oxyhemoglobin saturation less than 90%)
# (wake time during sleep period time)
# (minimum oxyhemoglobin saturation across sleep episode)
# (oxygen desaturation index)
# (average oxygen saturation during sleep)
# (apnea hypopnea index, change over time)
disease_mapping =
disease_mapping |>
mutate(`DISEASE/TRAIT` = str_remove_all(`DISEASE/TRAIT`,
"\\s*\\(apnea hypopnea index\\)")) |>
mutate(`DISEASE/TRAIT` = str_remove_all(`DISEASE/TRAIT`,
"\\s*\\(average respiratory event duration\\)")) |>
mutate(`DISEASE/TRAIT` = str_remove_all(`DISEASE/TRAIT`,
"\\s*\\(micro-arousal index\\)")) |>
mutate(`DISEASE/TRAIT` = str_remove_all(`DISEASE/TRAIT`,
"\\s*\\(percentage of N3 sleep time during total sleep time\\)")) |>
mutate(`DISEASE/TRAIT` = str_remove_all(`DISEASE/TRAIT`,
"\\s*\\(percentage of N3 sleep time during sleep period time\\)")) |>
mutate(`DISEASE/TRAIT` = str_remove_all(`DISEASE/TRAIT`,
"\\s*\\(average oxyhemoglobin desaturation per event\\)")) |>
mutate(`DISEASE/TRAIT` = str_remove_all(`DISEASE/TRAIT`,
"\\s*\\(average oxyhemoglobin saturation across sleep episode\\)")) |>
mutate(`DISEASE/TRAIT` = str_remove_all(`DISEASE/TRAIT`,
"\\s*\\(percentage sleep with oxyhemoglobin saturation less than 90%\\)")) |>
mutate(`DISEASE/TRAIT` = str_remove_all(`DISEASE/TRAIT`,
"\\s*\\(wake time during sleep period time\\)")) |>
mutate(`DISEASE/TRAIT` = str_remove_all(`DISEASE/TRAIT`,
"\\s*\\(minimum oxyhemoglobin saturation across sleep episode\\)")) |>
mutate(`DISEASE/TRAIT` = str_remove_all(`DISEASE/TRAIT`,
"\\s*\\(oxygen desaturation index\\)")) |>
mutate(`DISEASE/TRAIT` = str_remove_all(`DISEASE/TRAIT`,
"\\s*\\(average oxygen saturation during sleep\\)")) |>
mutate(`DISEASE/TRAIT` = str_remove_all(`DISEASE/TRAIT`,
"\\s*\\(apnea hypopnea index, change over time\\)"))
disease_mapping =
disease_mapping |>
mutate(`DISEASE/TRAIT` = str_remove_all(`DISEASE/TRAIT`,
" during REM sleep$| during non-REM sleep$"))
# ends in levels in coronary artery disease, make it just coronary artery disease
disease_mapping =
disease_mapping |>
mutate(`DISEASE/TRAIT` =
ifelse(str_detect(`DISEASE/TRAIT`,
"(?i)\\s*levels in coronary artery disease$"),
"coronary artery disease",
`DISEASE/TRAIT`))
# ends in levels in chronic kidney disease, make it just chronic kidney disease
disease_mapping =
disease_mapping |>
mutate(`DISEASE/TRAIT` =
ifelse(str_detect(`DISEASE/TRAIT`,
"(?i)\\s*levels in chronic kidney disease$"),
"chronic kidney disease",
`DISEASE/TRAIT`))
# ends in levels in type 2 diabetes, make it just type 2 diabetes
disease_mapping =
disease_mapping |>
mutate(`DISEASE/TRAIT` =
ifelse(str_detect(`DISEASE/TRAIT`,
"(?i)\\s*levels in type 2 diabetes$"),
"type 2 diabetes",
`DISEASE/TRAIT`))
# ends in levels in prediabetes, make it just prediabetes
disease_mapping =
disease_mapping |>
mutate(`DISEASE/TRAIT` =
ifelse(str_detect(`DISEASE/TRAIT`,
"(?i)\\s*levels in prediabetes$"),
"prediabetes",
`DISEASE/TRAIT`))
# for traits like 'Takes medication for X', keep only X (the first word after the prefix)
# the detect and extract patterns are both case-insensitive so they agree on what matches
disease_mapping =
disease_mapping |>
mutate(`DISEASE/TRAIT` =
ifelse(str_detect(`DISEASE/TRAIT`,
"(?i)Takes medication for \\w+"),
str_extract(`DISEASE/TRAIT`,
"(?i)(?<=Takes medication for )\\w+"),
`DISEASE/TRAIT`))
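To make the behaviour of the lookbehind extraction concrete, here is a small sketch on made-up trait strings (the example strings are assumptions, not values taken from the GWAS Catalog). It uses the exact-case lookbehind pattern and shows that `\\w+` keeps only the first word after the prefix:

```r
library(stringr)

traits <- c("Takes medication for Diabetes",
            "Takes medication for High blood pressure",
            "Asthma")

# Keep the first word after the prefix; leave other traits untouched
cleaned <- ifelse(
  str_detect(traits, "Takes medication for \\w+"),
  str_extract(traits, "(?<=Takes medication for )\\w+"),
  traits
)

cleaned
```

Because only one word is captured, a multi-word condition such as "High blood pressure" is reduced to "High", which may not match any ICD description later on.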
# remove BMI adjustments
# '(BMI adjusted)', or '(adjusted for BMI)'
disease_mapping =
disease_mapping |>
mutate(`DISEASE/TRAIT` = str_remove_all(`DISEASE/TRAIT`,
"\\s*\\(BMI adjusted\\)|\\s*\\(adjusted for BMI\\)"))
# remove 'trait'
disease_mapping =
disease_mapping |>
mutate(`DISEASE/TRAIT` = str_remove_all(`DISEASE/TRAIT`,
"\\s*trait$"))
# remove (slight), (severe), (generalised), (localised)
disease_mapping =
disease_mapping |>
mutate(`DISEASE/TRAIT` = str_remove_all(`DISEASE/TRAIT`,
"\\s*\\(slight\\)|\\s*\\(severe\\)|\\s*\\(generalised\\)|\\s*\\(localised\\)"))
disease_mapping =
disease_mapping |>
mutate(`DISEASE/TRAIT` = str_squish(`DISEASE/TRAIT`))
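The cleaning steps above all follow the same shape: remove a pattern, then squish whitespace. One possible way to express that shape is a single helper that loops over a vector of patterns. The sketch below is illustrative only: `clean_trait()` is a hypothetical name, it includes just a subset of the patterns used above, and the input strings are made up.

```r
library(stringr)

# Hypothetical helper: apply a list of removal patterns, then tidy whitespace
clean_trait <- function(x) {
  remove_patterns <- c(
    "\\s*\\([^)]* effect\\)",                             # e.g. (fetal genotype effect)
    "\\s*\\(BMI adjusted\\)|\\s*\\(adjusted for BMI\\)",  # BMI adjustments
    "\\s*\\(maternal\\):",
    "\\s*\\(paternal\\):",
    "Biological Father: ",
    "Biological Mother: ",
    "Biological Sibling: "
  )
  for (pattern in remove_patterns) {
    x <- str_remove_all(x, pattern)
  }
  str_squish(x)
}

cleaned_traits <- clean_trait(c(
  "Asthma (fetal genotype effect)",
  "Biological Father: Heart disease (adjusted for BMI)",
  "Type 2 diabetes"
))
cleaned_traits
```

Keeping the patterns in one vector makes it easier to see which qualifiers are stripped and in what order.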
matched_icd10_desc =
phecodes |>
mutate(collected_all_disease_terms = tolower(iconv(ICD_DESCRIPTION,
to = "UTF-8"))
) |>
select(collected_all_disease_terms,
icd10_code = ICD10)
# filter missing strings
matched_icd10_desc =
matched_icd10_desc |>
filter(!is.na(collected_all_disease_terms) &
collected_all_disease_terms != "") |>
filter(!is.na(icd10_code) &
icd10_code != "")
matched_icd10_desc =
matched_icd10_desc |>
group_by(collected_all_disease_terms) |>
summarise(icd10_code = str_flatten(unique(icd10_code),
collapse = ", ",
na.rm = T),
.groups = "drop")
matched_icd10_desc =
matched_icd10_desc |>
mutate(icd10_code_origin = "ICD Description Match (DISEASE/TRAIT)")
# match by DISEASE/TRAIT
matched_icd10_desc =
matched_icd10_desc |>
rename(`DISEASE/TRAIT` = collected_all_disease_terms)
disease_mapping =
disease_mapping |>
rows_patch(matched_icd10_desc,
unmatched = "ignore")
print("Number of ICD10 codes obtained")
[1] "Number of ICD10 codes obtained"
disease_mapping |>
filter(!is.na(icd10_code)) |>
group_by(icd10_code_origin) |>
summarise(n = n(),
percent = n()/n_studies *100)
# A tibble: 4 × 3
icd10_code_origin n percent
<chr> <int> <dbl>
1 ICD Description Match (DISEASE/TRAIT) 3945 8.59
2 Study PheCode (Manual Mapping) 875 1.91
3 Study PheCode Mapping 6766 14.7
4 Study Provided 3940 8.58
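The matching above relies on `dplyr::rows_patch()`, which fills only values that are currently `NA` and, with `unmatched = "ignore"`, silently skips lookup keys that do not occur in the table being patched. A minimal sketch on toy data (the `term` column and the "Lookup" origin label are assumptions made for illustration):

```r
library(dplyr)

# Table to patch: asthma already has a code, the other two do not
mapping <- tibble(
  term = c("asthma", "gout", "eczema"),
  icd10_code = c("J45", NA, NA),
  icd10_code_origin = c("Study Provided", NA, NA)
)

# Lookup table: keys must be unique; "psoriasis" is absent from `mapping`
lookup <- tibble(
  term = c("asthma", "gout", "psoriasis"),
  icd10_code = c("J45.9", "M10", "L40"),
  icd10_code_origin = "Lookup"
)

patched <- rows_patch(mapping, lookup, by = "term", unmatched = "ignore")
patched
```

Here the existing asthma code is kept, gout is filled from the lookup, eczema stays `NA`, and psoriasis is ignored. This is why the order of the patching steps determines which origin label a row ends up with.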
phenotype_icd_map =
phecodes |>
group_by(Phenotype) |>
summarise(icd10_code =
str_flatten(ICD10,
collapse = ", ",
na.rm = T),
.groups = "drop")
matched_phenotypes =
phenotype_icd_map #|>
#filter(tolower(Phenotype) %in% not_found_diseases)
matched_phenotypes =
matched_phenotypes |>
mutate(collected_all_disease_terms = tolower(Phenotype)) |>
select(collected_all_disease_terms, icd10_code) |>
mutate(icd10_code_origin = "Phecode Phenotype Match (DISEASE/TRAIT)")
# match by DISEASE/TRAIT
matched_phenotypes =
matched_phenotypes |>
rename(`DISEASE/TRAIT` = collected_all_disease_terms)
disease_mapping =
disease_mapping |>
rows_patch(matched_phenotypes,
unmatched = "ignore")
disease_mapping |>
filter(icd10_code_origin == "Phecode Phenotype Match (DISEASE/TRAIT)") |>
nrow()
[1] 572
disease_mapping |>
filter(!is.na(icd10_code)) |>
group_by(icd10_code_origin) |>
summarise(n = n(),
percent = n()/n_studies *100)
# A tibble: 5 × 3
icd10_code_origin n percent
<chr> <int> <dbl>
1 ICD Description Match (DISEASE/TRAIT) 3945 8.59
2 Phecode Phenotype Match (DISEASE/TRAIT) 572 1.25
3 Study PheCode (Manual Mapping) 875 1.91
4 Study PheCode Mapping 6766 14.7
5 Study Provided 3940 8.58
matched =
disease_mapping |>
filter(icd10_code != "") |>
pull(collected_all_disease_terms)
not_found_diseases <- diseases[!diseases %in%
matched
]
not_found_diseases <- not_found_diseases[not_found_diseases != ""]
print(length(not_found_diseases))
[1] 623
unmapped_disease_terms <-
disease_mapping |>
filter(is.na(icd10_code)) |>
pull(`DISEASE/TRAIT`) |>
unique() |>
tolower()
# get UMLS CUIs with ICD10 mappings
umls_data <-
data.table::fread(here::here("data/icd/2025AA/META/MRCONSO.RRF"),
sep = "|",
header = FALSE,
quote = "",
fill = TRUE,
na.strings = c("", "NA")
)
colnames(umls_data)[1:18] <- c(
"CUI","LAT","TS","LUI","STT","SUI","ISPREF",
"AUI","SAUI","SCUI","SDUI","SAB","TTY","CODE",
"STR","SRL","SUPPRESS","CVF"
)
umls_cuis_icd10 <-
umls_data |>
filter(SAB %in% c("ICD10", "ICD10CM")) |>
select(CODE, CUI) |>
group_by(CUI) |>
summarise(CODE = str_flatten(unique(CODE),
collapse = ", ",
na.rm = T),
.groups = "drop"
) |>
rename(icd10_code = CODE)
umls_icd10 <-
umls_data |>
filter(CUI %in% umls_cuis_icd10$CUI)
umls_icd10 =
umls_icd10 |>
left_join(umls_cuis_icd10,
by = "CUI")
# overlap with umls terms
disease_trait_umls <-
umls_icd10 |>
filter(tolower(STR) %in% unmapped_disease_terms) |>
mutate(`DISEASE/TRAIT` = tolower(STR)) |>
# mutate(CODE = ifelse(SAB %in% c("ICD10", "ICD10CM"),
# CODE,
# NA)) |>
select(`DISEASE/TRAIT`,
icd10_code) |>
group_by(`DISEASE/TRAIT`) |>
summarise(icd10_code = str_flatten(unique(icd10_code),
collapse = ", ",
na.rm = T),
.groups = "drop"
) |>
distinct() |>
mutate(icd10_code_origin = "UMLS term match")
disease_mapping <-
disease_mapping |>
mutate(`DISEASE/TRAIT` = tolower(`DISEASE/TRAIT`)) |>
rows_patch(disease_trait_umls,
by = "DISEASE/TRAIT")
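The UMLS step works in two stages: first collapse the ICD-10 codes attached to each concept (CUI), then let every synonym string of that concept inherit those codes. The sketch below reproduces that idea on a tiny made-up table. The CUIs, the non-ICD source name `OTHER`, and its codes are placeholders, not real UMLS content.

```r
library(dplyr)
library(stringr)

# Toy stand-in for MRCONSO: one concept with ICD-10 rows, one without
umls_toy <- tibble(
  CUI  = c("CUI_1", "CUI_1", "CUI_1", "CUI_2"),
  SAB  = c("ICD10", "ICD10CM", "OTHER", "OTHER"),
  CODE = c("J45", "J45", "X1", "X2"),
  STR  = c("Asthma", "Asthma", "Bronchial asthma", "Gout")
)

# Stage 1: ICD-10 codes per concept
cui_to_icd10 <- umls_toy |>
  filter(SAB %in% c("ICD10", "ICD10CM")) |>
  group_by(CUI) |>
  summarise(icd10_code = str_flatten(unique(CODE), collapse = ", "),
            .groups = "drop")

# Stage 2: every synonym of a concept inherits the concept's codes
term_to_icd10 <- umls_toy |>
  inner_join(cui_to_icd10, by = "CUI") |>
  mutate(term = tolower(STR)) |>
  distinct(term, icd10_code)

term_to_icd10
```

In this toy case "bronchial asthma" picks up J45 even though the string itself never appears under an ICD-10 source, while "gout" is dropped because its concept has no ICD-10 row.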
# remove pubmed id: 27197191
# in this one, disease/trait = "cancer", but specific cancers tested
# are listed in collected_all_disease_terms - so will only
# match collected_all_disease_terms later (not disease/trait terms)
disease_mapping =
disease_mapping |>
mutate(
icd10_code = ifelse(PUBMED_ID == 27197191,
NA,
icd10_code
)
) |>
mutate(
icd10_code_origin = ifelse(PUBMED_ID == 27197191,
NA,
icd10_code_origin
)
)
# DISEASE/TRAIT contains "behaviour of cancer tumour: uncertain whether benign or malignant"
# map manually to D37-D48 (neoplasms of uncertain or unknown behaviour)
disease_mapping =
disease_mapping |>
mutate(
icd10_code = ifelse(
str_detect(tolower(`DISEASE/TRAIT`),
"behaviour of cancer tumour: uncertain whether benign or malignant"),
paste0("D", 37:48, collapse = ", "),
icd10_code
)
) |>
mutate(
icd10_code_origin = ifelse(
str_detect(tolower(`DISEASE/TRAIT`),
"behaviour of cancer tumour: uncertain whether benign or malignant"),
"Manual Mapping (DISEASE/TRAIT)",
icd10_code_origin
)
)
collected_all_disease_terms terms
matched_icd10_desc =
matched_icd10_desc |>
rename(collected_all_disease_terms = `DISEASE/TRAIT`)
matched_icd10_desc =
matched_icd10_desc |>
mutate(icd10_code_origin = "ICD Description Match (collected_all_disease_terms)")
disease_mapping =
rows_patch(disease_mapping,
matched_icd10_desc,
unmatched = "ignore")
Matching, by = "collected_all_disease_terms"
print("Number of ICD10 codes obtained")
[1] "Number of ICD10 codes obtained"
disease_mapping |>
filter(!is.na(icd10_code)) |>
group_by(icd10_code_origin) |>
summarise(n = n(),
percent = n()/n_studies *100)
# A tibble: 8 × 3
icd10_code_origin n percent
<chr> <int> <dbl>
1 ICD Description Match (DISEASE/TRAIT) 3945 8.59
2 ICD Description Match (collected_all_disease_terms) 9579 20.9
3 Manual Mapping (DISEASE/TRAIT) 1 0.00218
4 Phecode Phenotype Match (DISEASE/TRAIT) 572 1.25
5 Study PheCode (Manual Mapping) 875 1.91
6 Study PheCode Mapping 6766 14.7
7 Study Provided 3940 8.58
8 UMLS term match 4817 10.5
matched =
disease_mapping |>
filter(icd10_code != "") |>
pull(collected_all_disease_terms)
not_found_diseases <- diseases[!diseases %in%
matched
]
not_found_diseases <- not_found_diseases[not_found_diseases != ""]
print(length(not_found_diseases))
[1] 431
matched_phenotypes =
matched_phenotypes |>
rename(collected_all_disease_terms = `DISEASE/TRAIT`)
matched_phenotypes =
matched_phenotypes |>
mutate(icd10_code_origin = "Phecode Phenotype Match (collected_all_disease_terms)")
# match by collected_all_disease_terms
disease_mapping =
disease_mapping |>
rows_patch(matched_phenotypes,
unmatched = "ignore")
Matching, by = "collected_all_disease_terms"
print("Number of ICD10 codes obtained")
[1] "Number of ICD10 codes obtained"
disease_mapping |>
filter(!is.na(icd10_code)) |>
group_by(icd10_code_origin) |>
summarise(n = n(),
percent = n()/n_studies *100)
# A tibble: 9 × 3
icd10_code_origin n percent
<chr> <int> <dbl>
1 ICD Description Match (DISEASE/TRAIT) 3945 8.59
2 ICD Description Match (collected_all_disease_terms) 9579 20.9
3 Manual Mapping (DISEASE/TRAIT) 1 0.00218
4 Phecode Phenotype Match (DISEASE/TRAIT) 572 1.25
5 Phecode Phenotype Match (collected_all_disease_terms) 2196 4.78
6 Study PheCode (Manual Mapping) 875 1.91
7 Study PheCode Mapping 6766 14.7
8 Study Provided 3940 8.58
9 UMLS term match 4817 10.5
matched =
disease_mapping |>
filter(icd10_code != "") |>
pull(collected_all_disease_terms)
not_found_diseases <- diseases[!diseases %in%
matched
]
not_found_diseases <- not_found_diseases[not_found_diseases != ""]
print(length(not_found_diseases))
[1] 418
unmapped_terms <-
disease_mapping |>
filter(is.na(icd10_code)) |>
pull(collected_all_disease_terms) |>
unique()
# overlap with umls terms
collected_trait_umls <-
umls_icd10 |>
filter(tolower(STR) %in% unmapped_terms) |>
mutate(collected_all_disease_terms = tolower(STR)) |>
# mutate(CODE = ifelse(SAB %in% c("ICD10", "ICD10CM"),
# CODE,
# NA)) |>
select(collected_all_disease_terms,
icd10_code) |>
group_by(collected_all_disease_terms) |>
summarise(icd10_code = str_flatten(unique(icd10_code),
collapse = ", ",
na.rm = T),
.groups = "drop"
) |>
distinct() |>
mutate(icd10_code_origin = "UMLS term match (collected_all_disease_terms)")
disease_mapping <-
disease_mapping |>
rows_patch(collected_trait_umls,
by = "collected_all_disease_terms")
disease_mapping =
disease_mapping |>
mutate(
icd10_code = ifelse(
collected_all_disease_terms == "influenza a (h1n1)",
"J10",
icd10_code
)
) |>
mutate(
icd10_code_origin = ifelse(
collected_all_disease_terms == "influenza a (h1n1)",
"Manual Mapping (collected_all_disease_terms)",
icd10_code_origin
)
)
disease_mapping =
disease_mapping |>
mutate(
icd10_code = ifelse(
collected_all_disease_terms == "suicide",
paste0("X", 60:84, collapse = ", "),
icd10_code
)
) |>
mutate(
icd10_code_origin = ifelse(
collected_all_disease_terms == "suicide",
"Manual Mapping (collected_all_disease_terms)",
icd10_code_origin
)
)
disease_mapping =
disease_mapping |>
mutate(
icd10_code = ifelse(
collected_all_disease_terms == "osteoarthritis of spine",
"M47",
icd10_code
)
) |>
mutate(
icd10_code_origin = ifelse(
collected_all_disease_terms == "osteoarthritis of spine",
"Manual Mapping (collected_all_disease_terms)",
icd10_code_origin
)
)
disease_mapping =
disease_mapping |>
mutate(
icd10_code = ifelse(
collected_all_disease_terms == "osteoarthritis of hand",
"M19.0",
icd10_code
)
) |>
mutate(
icd10_code_origin = ifelse(
collected_all_disease_terms == "osteoarthritis of hand",
"Manual Mapping (collected_all_disease_terms)",
icd10_code_origin
)
)
manual_icd10_map <-
readxl::read_xlsx(here::here("data/icd/manual_disease_icd10_mappings.xlsx"))
manual_icd10_map =
manual_icd10_map |>
select(collected_all_disease_terms = mapped_trait,
icd10_code) |>
mutate(collected_all_disease_terms = stringr::str_squish(tolower(collected_all_disease_terms))) |>
mutate(icd10_code_origin = "Manual Mapping (collected_all_disease_terms)")
# disease_mapping =
# bind_rows(disease_mapping, to_add) |>
# distinct()
disease_mapping =
rows_patch(disease_mapping,
manual_icd10_map,
unmatched = "ignore")
Matching, by = "collected_all_disease_terms"
disease_mapping |>
filter(icd10_code_origin == "Manual Mapping (collected_all_disease_terms)") |>
nrow()
[1] 1264
# repeat for `DISEASE/TRAIT`
manual_icd10_map =
manual_icd10_map |>
select(`DISEASE/TRAIT` = collected_all_disease_terms,
icd10_code) |>
mutate(icd10_code_origin = "Manual Mapping (DISEASE/TRAIT)")
disease_mapping =
rows_patch(disease_mapping |> mutate(`DISEASE/TRAIT` = tolower(`DISEASE/TRAIT`)),
manual_icd10_map,
unmatched = "ignore")
Matching, by = "DISEASE/TRAIT"
disease_mapping =
disease_mapping |>
mutate(icd10_code = stringr::str_remove_all(string = icd10_code,
pattern = ", $"))
# disease_mapping =
# disease_mapping |>
# filter(icd10_code != "")
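# NOTE: manual_icd10_map now holds `DISEASE/TRAIT` instead of collected_all_disease_terms,
# so at least one `$` lookup below returns NULL (hence the warnings) and this
# count is not directly comparable with the earlier ones
matched <- c(disease_mapping_matched$collected_all_disease_terms,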
matched_phenotypes$collected_all_disease_terms,
to_add$collected_all_disease_terms,
manual_icd10_map$collected_all_disease_terms)
Warning: Unknown or uninitialised column: `collected_all_disease_terms`.
Unknown or uninitialised column: `collected_all_disease_terms`.
not_found_diseases <- diseases[!diseases %in% matched]
not_found_diseases <- not_found_diseases[not_found_diseases != ""]
print(length(not_found_diseases))
[1] 588
# similar studies:
study_icd_map =
disease_mapping |>
filter(!is.na(icd10_code)) |>
filter(icd10_code_origin == "Study Provided" | icd10_code_origin == "Study PheCode Mapping")
study_icd_map =
study_icd_map |>
select(collected_all_disease_terms, icd10_code) |>
distinct()
# remove Z95.1 and Z95.5 from coronary artery disease
study_icd_map =
study_icd_map |>
mutate(icd10_code = ifelse(collected_all_disease_terms == "coronary artery disease",
str_replace_all(icd10_code, "Z95\\.1|Z95\\.5", ""),
icd10_code))
# assume ovarian cancer terms more specific than
# Malignant neoplasm of ovary and other uterine adnexa (PheCode 184.1)
# study_icd_map =
# study_icd_map |>
# filter(!( collected_all_disease_terms == "ovarian cancer" &&phecode == "184.10")
# )
study_icd_map =
study_icd_map |>
mutate(collected_all_disease_terms = str_trim(collected_all_disease_terms)) |>
mutate(collected_all_disease_terms = str_remove_all(collected_all_disease_terms, "^, "))
study_icd_map =
study_icd_map |>
filter(icd10_code != "" & !is.na(icd10_code))
study_icd_map =
study_icd_map |>
tidyr::separate_longer_delim(icd10_code, delim = ", ") |>
tidyr::separate_longer_delim(icd10_code, delim = ",")
study_icd_map =
study_icd_map |>
group_by(collected_all_disease_terms) |>
summarise(icd10_code = str_flatten(unique(sort(icd10_code)),
collapse = ", ",
na.rm = T),
.groups = "drop")
study_icd_map =
study_icd_map |>
mutate(icd10_code_origin = "Inferred from similar studies")
disease_mapping =
rows_patch(disease_mapping,
study_icd_map,
unmatched = "ignore")
Matching, by = "collected_all_disease_terms"
disease_mapping |>
filter(is.na(icd10_code)) |>
nrow()
[1] 132
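The "inferred from similar studies" step pools the codes that different studies reported for the same term, then splits, de-duplicates and re-joins them. A minimal sketch of that normalisation on toy data (the `term` column and the inconsistent delimiters are made up for illustration):

```r
library(dplyr)
library(stringr)

# Two studies report overlapping codes for asthma with different delimiters
toy_codes <- tibble(
  term = c("asthma", "asthma", "gout"),
  icd10_code = c("J45, J46", "J46,J45.9", "M10")
)

collapsed <- toy_codes |>
  tidyr::separate_longer_delim(icd10_code, delim = ",") |>
  mutate(icd10_code = str_trim(icd10_code)) |>
  filter(icd10_code != "") |>          # drop empty pieces left by stray commas
  group_by(term) |>
  summarise(icd10_code = str_flatten(sort(unique(icd10_code)),
                                     collapse = ", "),
            .groups = "drop")

collapsed
```

Splitting on a bare comma and trimming afterwards handles both `", "` and `","` in one pass, and filtering out empty strings avoids leading commas in the flattened result.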
disease_mapping |>
group_by(icd10_code_origin) |>
summarise(n = n()) |>
arrange(desc(n))
# A tibble: 14 × 2
icd10_code_origin n
<chr> <int>
1 UMLS term match (collected_all_disease_terms) 11007
2 ICD Description Match (collected_all_disease_terms) 9579
3 Study PheCode Mapping 6762
4 UMLS term match 4798
5 ICD Description Match (DISEASE/TRAIT) 3945
6 Study Provided 3940
7 Phecode Phenotype Match (collected_all_disease_terms) 2196
8 Manual Mapping (collected_all_disease_terms) 1264
9 Study PheCode (Manual Mapping) 875
10 Inferred from similar studies 826
11 Phecode Phenotype Match (DISEASE/TRAIT) 572
12 <NA> 132
13 Manual Mapping (DISEASE/TRAIT) 21
14 Manual Mapping (collected_all_disease_terms + DISEASE/TRAIT) 3
disease_mapping =
disease_mapping |>
mutate(icd10_code = sub("^([^-]+)-\\1$", "\\1", icd10_code))
disease_mapping =
disease_mapping |>
mutate(icd10_code = str_replace_all(
icd10_code,
pattern = "C50-C50.9",
replacement = "C50"
)
)
disease_mapping =
disease_mapping |>
mutate(icd10_code = str_replace_all(
icd10_code,
pattern = "F30-F39.9",
replacement = "F30"
)
)
disease_mapping =
disease_mapping |>
mutate(icd10_code = str_replace_all(
icd10_code,
pattern = "F99-F99.9",
replacement = "F99"
)
)
disease_mapping =
disease_mapping |>
mutate(icd10_code = str_replace_all(
icd10_code,
pattern = "F40 F41 F42",
replacement = "F40, F41, F42"
)
)
disease_mapping =
disease_mapping |>
mutate(icd10_code = str_replace_all(
icd10_code,
pattern = "G47.8. G47.9",
replacement = "G47.8, G47.9"
)
)
# ICD-10 cm conversion to ICD-10 WHO
# from here: https://seer.cancer.gov/tools/conversion/2026/ICD10CM-to-ICD10.FY2026.pdf
disease_mapping =
disease_mapping |>
mutate(icd10_code = str_replace_all(
icd10_code,
pattern = "D3A.8",
replacement = "D36.7"
)
)
disease_mapping =
disease_mapping |>
mutate(icd10_code = str_replace_all(
icd10_code,
pattern = "F32.A",
replacement = "F32"
)
)
disease_mapping =
disease_mapping |>
mutate(icd10_code = str_replace_all(
icd10_code,
pattern = "G20.A1",
replacement = "G20"
)
)
disease_mapping =
disease_mapping |>
mutate(icd10_code = str_replace_all(
icd10_code,
pattern = "G20.C",
replacement = "G20"
)
)
disease_mapping =
disease_mapping |>
mutate(icd10_code = str_replace_all(
icd10_code,
pattern = "G40.A",
replacement = "G40"
)
)
# resistant hypertension to hypertension codes
disease_mapping =
disease_mapping |>
mutate(icd10_code = str_replace_all(
icd10_code,
pattern = "I1A.0",
replacement = "I10, I11, I12, I13, I15"
)
)
disease_mapping =
disease_mapping |>
mutate(icd10_code = str_replace_all(
icd10_code,
pattern = "J09.X",
replacement = "J09"
)
)
disease_mapping =
disease_mapping |>
mutate(icd10_code = str_replace_all(
icd10_code,
pattern = "S06.0X",
replacement = "S06.0"
)
)
disease_mapping =
disease_mapping |>
mutate(icd10_code = str_replace_all(
icd10_code,
pattern = "S06.0XA",
replacement = "S06.0"
)
)
disease_mapping =
disease_mapping |>
mutate(icd10_code = str_replace_all(
icd10_code,
pattern = "S06.0A",
replacement = "S06.0"
)
)
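The conversions above pass strings such as "F32.A" to `str_replace_all()` as regular expressions, where `.` matches any character. One possible way to make the matching literal is `stringr::fixed()` combined with a small lookup vector. This is a sketch only: the `conversions` vector covers a subset of the codes above, and the input strings are made up.

```r
library(stringr)

# Named vector: ICD-10-CM code -> replacement (subset of the conversions above)
conversions <- c("D3A.8" = "D36.7",
                 "F32.A" = "F32",
                 "G40.A" = "G40")

codes <- c("F32.A, I10", "D3A.8", "F32XA")

# As a regex, "." also matches the "X" in the made-up string "F32XA"
regex_result <- str_replace_all(codes, "F32.A", "F32")

# With fixed(), only the literal text is replaced
fixed_result <- codes
for (pattern in names(conversions)) {
  fixed_result <- str_replace_all(fixed_result,
                                  fixed(pattern),
                                  conversions[[pattern]])
}

regex_result
fixed_result
```

Literal matching does not change the outcome when the data contain only the expected codes, but it removes the risk of an unrelated code being rewritten by accident.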
disease_mapping =
disease_mapping |>
mutate(icd10_code = str_replace_all(
icd10_code,
pattern = "A15-A19.9|A15-A19",
replacement = paste0("A", 15:19, collapse = ", ")
)
)
disease_mapping =
disease_mapping |>
mutate(icd10_code = str_replace_all(
icd10_code,
pattern = "A50-A64.9|A50-A64",
replacement = paste0("A", 50:64, collapse = ", ")
)
)
disease_mapping =
disease_mapping |>
mutate(icd10_code = str_replace_all(
icd10_code,
pattern = "B20-B24.9",
replacement = paste0("B", 20:24, collapse = ", ")
)
)
disease_mapping =
disease_mapping |>
mutate(icd10_code = str_replace_all(
icd10_code,
pattern = "C00-C97.9",
replacement = paste0(
paste0("C0", 0:9, collapse = ", "),
", ",
paste0("C", 10:97, collapse = ", "),
collapse = ","
)
)
)
disease_mapping =
disease_mapping |>
mutate(icd10_code = str_replace_all(
icd10_code,
pattern = "C00-D48.9",
replacement = paste0(
paste0("C0", 0:9, collapse = ", "),
", ",
paste0("C", 10:97, collapse = ", "),
", ",
paste0("D0", 0:9, collapse = ", "),
", ",
paste0("D", 10:48, collapse = ", "),
collapse = ",")
)
)
disease_mapping =
disease_mapping |>
mutate(icd10_code = str_replace_all(
icd10_code,
pattern = "C60-C63.9|C60-C63",
replacement = paste0("C", 60:63, collapse = ", ")
)
)
disease_mapping =
disease_mapping |>
mutate(icd10_code = str_replace_all(
icd10_code,
pattern = "D10-D36.9",
replacement = paste0("D", 10:36, collapse = ", ")
)
)
disease_mapping =
disease_mapping |>
mutate(icd10_code = str_replace_all(
icd10_code,
pattern = "D80-D89",
replacement = paste0("D", 80:89, collapse = ", ")
)
)
disease_mapping =
disease_mapping |>
mutate(icd10_code = str_replace_all(
icd10_code,
pattern = "E00-E07.9|E00-E07",
replacement = paste0("E0", 0:7, collapse = ", ")
)
)
disease_mapping =
disease_mapping |>
mutate(icd10_code = str_replace_all(
icd10_code,
pattern = "E08-E13",
replacement = "E08, E09, E10, E11, E12, E13"
)
)
disease_mapping =
disease_mapping |>
mutate(icd10_code = str_replace_all(
icd10_code,
pattern = "E10-E14.9",
replacement = "E10, E11, E12, E13, E14"
)
)
disease_mapping =
disease_mapping |>
mutate(icd10_code = str_replace_all(
icd10_code,
pattern = "E27.1-E27.4",
replacement = "E27.1, E27.2, E27.3, E27.4"
)
)
disease_mapping =
disease_mapping |>
mutate(icd10_code = str_replace_all(
icd10_code,
pattern = "E70-E90.9",
replacement = paste0("E", 70:90, collapse = ", ")
)
)
disease_mapping =
disease_mapping |>
mutate(icd10_code = str_replace_all(
icd10_code,
pattern = "F30-F39",
replacement = paste0("F", 30:39, collapse = ", ")
)
)
disease_mapping =
disease_mapping |>
mutate(icd10_code = str_replace_all(
icd10_code,
pattern = "G00-G99.9|G00-G99",
replacement = paste0(
paste0("G0", 0:9, collapse = ", "),
", ",
paste0("G", 10:99, collapse = ", "),
collapse = ","
)
)
)
disease_mapping =
disease_mapping |>
mutate(icd10_code = str_replace_all(
icd10_code,
pattern = "H40-H42.9|H40-H42",
replacement = paste0("H", 40:42, collapse = ", ")
)
)
disease_mapping =
disease_mapping |>
mutate(icd10_code = str_replace_all(
icd10_code,
pattern = "I00-I02.9|I00-I02",
replacement = paste0("I0", 0:2, collapse = ", ")
)
)
disease_mapping =
disease_mapping |>
mutate(icd10_code = str_replace_all(
icd10_code,
pattern = "I10-I15.9",
replacement = paste0("I", 10:15, collapse = ", ")
)
)
disease_mapping =
disease_mapping |>
mutate(icd10_code = str_replace_all(
icd10_code,
pattern = "I20-I25.9|I20-I25",
replacement = paste0("I", 20:25, collapse = ", ")
)
)
disease_mapping =
disease_mapping |>
mutate(icd10_code = str_replace_all(
icd10_code,
pattern = "I60-I69.9|I60-I69",
replacement = paste0("I", 60:69, collapse = ", ")
)
)
disease_mapping =
disease_mapping |>
mutate(icd10_code = str_replace_all(
icd10_code,
pattern = "J00-J99.9|J00-J99",
replacement = paste0(paste0("J0", 0:9, collapse = ", "),
", ",
paste0("J", 10:99, collapse = ", "),
collapse = ","
)
)
)
disease_mapping =
disease_mapping |>
mutate(icd10_code = str_replace_all(
icd10_code,
pattern = "K00-K93.9|K00-K93",
replacement = paste0(paste0("K0", 0:9, collapse = ", "),
", ",
paste0("K", 10:93, collapse = ", "),
collapse = ","
)
)
)
disease_mapping =
disease_mapping |>
mutate(icd10_code = str_replace_all(
icd10_code,
pattern = "K40-K46.9|K40-K46",
replacement = paste0("K", 40:46, collapse = ", ")
)
)
disease_mapping =
disease_mapping |>
mutate(icd10_code = str_replace_all(
icd10_code,
pattern = "K70-K77.9|K70-K77",
replacement = paste0("K", 70:77, collapse = ", ")
)
)
disease_mapping =
disease_mapping |>
mutate(icd10_code = str_replace_all(
icd10_code,
pattern = "L00-L08.9|L00-L08",
replacement = paste0("L0", 0:8, collapse = ", ")
)
)
disease_mapping =
disease_mapping |>
mutate(icd10_code = str_replace_all(
icd10_code,
pattern = "L00-L99.9|L00-L99",
replacement = paste0(paste0("L0", 0:9, collapse = ", "),
", ",
paste0("L", 10:99, collapse = ", "),
collapse = ","
)
)
)
disease_mapping =
disease_mapping |>
mutate(icd10_code = str_replace_all(
icd10_code,
pattern = "M00-M25.9",
replacement = paste0(paste0("M0", 0:9, collapse = ", "),
", ",
paste0("M", 10:25, collapse = ", "),
collapse = ","
)
)
)
disease_mapping =
disease_mapping |>
mutate(icd10_code = str_replace_all(
icd10_code,
pattern = "M15-M19",
replacement = paste0("M", 15:19, collapse = ", ")
)
)
disease_mapping =
disease_mapping |>
mutate(icd10_code = str_replace_all(
icd10_code,
pattern = "M30-M36",
replacement = paste0("M", 30:36, collapse = ", ")
)
)
disease_mapping =
disease_mapping |>
mutate(icd10_code = str_replace_all(
icd10_code,
pattern = "M45-M49.9|M45-M49",
replacement = paste0("M", 45:49, collapse = ", ")
)
)
disease_mapping =
disease_mapping |>
mutate(icd10_code = str_replace_all(
icd10_code,
pattern = "M60-M79.9",
replacement = paste0("M", 60:79, collapse = ", ")
)
)
disease_mapping =
disease_mapping |>
mutate(icd10_code = str_replace_all(
icd10_code,
pattern = "N10-N16",
replacement = paste0("N", 10:16, collapse = ", ")
)
)
disease_mapping =
disease_mapping |>
mutate(icd10_code = str_replace_all(
icd10_code,
pattern = "N17-N19.9",
replacement = paste0("N", 17:19, collapse = ", ")
)
)
disease_mapping =
disease_mapping |>
mutate(icd10_code = str_replace_all(
icd10_code,
pattern = "N18.3-N18.9",
replacement = paste0("N18.", 3:9, collapse = ", ")
)
)
disease_mapping =
disease_mapping |>
mutate(icd10_code = str_replace_all(
icd10_code,
pattern = "N20-N23.9|N20-N23",
replacement = paste0("N", 20:23, collapse = ", ")
)
)
disease_mapping =
disease_mapping |>
mutate(icd10_code = str_replace_all(
icd10_code,
pattern = "N60-N64.9",
replacement = paste0("N", 60:64, collapse = ", ")
)
)
disease_mapping =
disease_mapping |>
mutate(icd10_code = str_replace_all(
icd10_code,
pattern = "N60-N65",
replacement = paste0("N", 60:65, collapse = ", ")
)
)
disease_mapping =
disease_mapping |>
mutate(icd10_code = str_replace_all(
icd10_code,
pattern = "O00-O08.9|O00-O08",
replacement = paste0("O0", 0:8, collapse = ", ")
)
)
disease_mapping =
disease_mapping |>
mutate(icd10_code = str_replace_all(
icd10_code,
pattern = "X71-X83",
replacement = paste0("X", 71:83, collapse = ", ")
)
)
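The range expansions above all share one shape: a chapter letter, a start category, and an end category. As a minimal sketch, assuming both endpoints use the same letter and two-digit categories, a small helper could build the replacement strings instead of writing each one by hand. `expand_icd10_range()` is a hypothetical name and is not part of the original analysis.

```r
# Hypothetical helper, not part of the original analysis.
# Expands a same-letter range of three-character ICD-10 categories,
# e.g. "A15-A19" -> "A15, A16, A17, A18, A19".
expand_icd10_range <- function(range_str) {
  m <- regmatches(
    range_str,
    regexec("^([A-Z])([0-9]{2})-([A-Z])([0-9]{2})$", range_str)
  )[[1]]
  # m is empty when the pattern does not match; m[2] and m[4] are the letters
  if (length(m) == 0 || m[2] != m[4]) {
    stop("Not a simple same-letter range: ", range_str)
  }
  numbers <- seq(as.integer(m[3]), as.integer(m[5]))
  paste(sprintf("%s%02d", m[2], numbers), collapse = ", ")
}

expand_icd10_range("A15-A19")
expand_icd10_range("E00-E07")
```

The returned string could be passed as `replacement` to `str_replace_all()`. Variants with a trailing `.9` (such as `A15-A19.9`) and ranges that cross letters (such as `C00-D48.9`) would still need to be handled separately.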
disease_mapping =
disease_mapping |>
mutate(icd10_code = str_split(icd10_code, ",\\s*")) |>
tidyr::unnest(icd10_code)
disease_mapping =
disease_mapping |>
mutate(icd10_code = str_replace_all(
icd10_code,
pattern = "B37.49",
replacement = "B37.4"
)
)
disease_mapping =
disease_mapping |>
mutate(icd10_code = str_replace_all(
icd10_code,
pattern = "E05.90",
replacement = "E05.9"
)
)
disease_mapping =
disease_mapping |>
mutate(icd10_code = str_replace_all(
icd10_code,
pattern = "E11.319|E11.31|E11.329.9|E11.329.|E11.32|E11.3.",
replacement = "E11.3"
)
)
disease_mapping =
disease_mapping |>
mutate(icd10_code = str_replace_all(
icd10_code,
pattern = "E11.3.",
replacement = "E11.3"
)
)
disease_mapping =
disease_mapping |>
mutate(icd10_code = str_replace_all(
icd10_code,
pattern = "E13.621.|E13.62",
replacement = "E13.6"
)
)
disease_mapping =
disease_mapping |>
mutate(icd10_code = str_replace_all(
icd10_code,
pattern = "E78.00",
replacement = "E78.0"
)
)
disease_mapping =
disease_mapping |>
mutate(icd10_code = str_replace_all(
icd10_code,
pattern = "F10.10",
replacement = "F10.1"
)
)
disease_mapping =
disease_mapping |>
mutate(icd10_code = str_replace_all(
icd10_code,
pattern = "F17.201.|F17.20",
replacement = "F17.2"
)
)
disease_mapping =
disease_mapping |>
mutate(icd10_code = str_replace_all(
icd10_code,
pattern = "F430",
replacement = "F43.0"
)
)
disease_mapping =
disease_mapping |>
mutate(icd10_code = str_replace_all(
icd10_code,
pattern = "G47.00",
replacement = "G47.0"
)
)
disease_mapping =
disease_mapping |>
mutate(icd10_code = str_replace_all(
icd10_code,
pattern = "H01.00|H01.09.",
replacement = "H01.0"
)
)
disease_mapping =
disease_mapping |>
mutate(icd10_code = str_replace_all(
icd10_code,
pattern = "H01.09.",
replacement = "H01.0"
)
)
disease_mapping =
disease_mapping |>
mutate(icd10_code = str_replace_all(
icd10_code,
pattern = "H029",
replacement = "H02.9"
)
)
disease_mapping =
disease_mapping |>
mutate(icd10_code = str_replace_all(
icd10_code,
pattern = "H60.90",
replacement = "H60.9"
)
)
disease_mapping =
disease_mapping |>
mutate(icd10_code = str_replace_all(
icd10_code,
pattern = "H91.8X9.",
replacement = "H91.8"
)
)
disease_mapping =
disease_mapping |>
mutate(icd10_code = str_replace_all(
icd10_code,
pattern = "H91.90",
replacement = "H91.9"
)
)
disease_mapping =
disease_mapping |>
mutate(icd10_code = str_replace_all(
icd10_code,
pattern = "H93.299.|H93.29",
replacement = "H93.2"
)
)
disease_mapping =
disease_mapping |>
mutate(icd10_code = str_replace_all(
icd10_code,
pattern = "I70.20",
replacement = "I70.2"
)
)
disease_mapping =
disease_mapping |>
mutate(icd10_code = str_replace_all(
icd10_code,
pattern = "I82.409.|I82.40|I82.4",
replacement = "I82"
)
)
disease_mapping =
disease_mapping |>
mutate(icd10_code = str_replace_all(
icd10_code,
pattern = "J30.9",
replacement = "J30.2"
)
)
disease_mapping =
disease_mapping |>
mutate(icd10_code = str_replace_all(
icd10_code,
pattern = "J45.998.|J45.909.|J45.901.|J45.99|J45.90",
replacement = "J45.9"
)
)
disease_mapping =
disease_mapping |>
mutate(icd10_code = str_replace_all(
icd10_code,
pattern = "J98.457.6",
replacement = "J98.4"
)
)
disease_mapping =
disease_mapping |>
mutate(icd10_code = str_replace_all(
icd10_code,
pattern = "K05.30-31",
replacement = "K05.3"
)
)
disease_mapping =
disease_mapping |>
mutate(icd10_code = str_replace_all(
icd10_code,
pattern = "K29.70",
replacement = "K29.7"
)
)
disease_mapping =
disease_mapping |>
mutate(icd10_code = str_replace_all(
icd10_code,
pattern = "K59.00",
replacement = "K59.0"
)
)
disease_mapping =
disease_mapping |>
mutate(icd10_code = str_replace_all(
icd10_code,
pattern = "L08.89",
replacement = "L08.8"
)
)
disease_mapping =
disease_mapping |>
mutate(icd10_code = str_replace_all(
icd10_code,
pattern = "M06.99|M06.90",
replacement = "M06.9"
)
)
disease_mapping =
disease_mapping |>
mutate(icd10_code = str_replace_all(
icd10_code,
pattern = "M10.99",
replacement = "M10.9"
)
)
disease_mapping =
disease_mapping |>
mutate(icd10_code = str_replace_all(
icd10_code,
pattern = "M13.90|M13.94|M13.96|M13.97|M13.99",
replacement = "M13.9"
)
)
disease_mapping =
disease_mapping |>
mutate(icd10_code = str_replace_all(
icd10_code,
pattern = "M19.07",
replacement = "M19.0"
)
)
disease_mapping =
disease_mapping |>
mutate(icd10_code = str_replace_all(
icd10_code,
pattern = "M19.90|M19.91|M19.94|M19.97|M19.99",
replacement = "M19.9"
)
)
disease_mapping =
disease_mapping |>
mutate(icd10_code = str_replace_all(
icd10_code,
pattern = "M25.50|M25.51|M25.55|M25.569.|M25.56|M25.571.|M25.57",
replacement = "M25.5"
)
)
disease_mapping =
disease_mapping |>
mutate(icd10_code = str_replace_all(
icd10_code,
pattern = "M25.76|M25.77",
replacement = "M25.7"
)
)
disease_mapping =
disease_mapping |>
mutate(icd10_code = str_replace_all(
icd10_code,
pattern = "M43.16",
replacement = "M43.1"
)
)
disease_mapping =
disease_mapping |>
mutate(icd10_code = str_replace_all(
icd10_code,
pattern = "M47.80|M47.82|M47.86",
replacement = "M47.8"
)
)
disease_mapping =
disease_mapping |>
mutate(icd10_code = str_replace_all(
icd10_code,
pattern = "M47.92|M47.96",
replacement = "M47.9"
)
)
disease_mapping =
disease_mapping |>
mutate(icd10_code = str_replace_all(
icd10_code,
pattern = "M48.02|M48.06",
replacement = "M48.0"
)
)
disease_mapping =
disease_mapping |>
mutate(icd10_code = str_replace_all(
icd10_code,
pattern = "M54.22",
replacement = "M54.2"
)
)
disease_mapping =
disease_mapping |>
mutate(icd10_code = str_replace_all(
icd10_code,
pattern = "M54.30|M54.39",
replacement = "M54.3"
)
)
disease_mapping =
disease_mapping |>
mutate(icd10_code = str_replace_all(
icd10_code,
pattern = "M54.56|M54.57|M54.59",
replacement = "M54.5"
)
)
disease_mapping =
disease_mapping |>
mutate(icd10_code = str_replace_all(
icd10_code,
pattern = "M54.99",
replacement = "M54.9"
)
)
disease_mapping =
disease_mapping |>
mutate(icd10_code = str_replace_all(
icd10_code,
pattern = "M65.34",
replacement = "M65.3"
)
)
disease_mapping =
disease_mapping |>
mutate(icd10_code = str_replace_all(
icd10_code,
pattern = "M65.96",
replacement = "M65.9"
)
)
disease_mapping =
disease_mapping |>
mutate(icd10_code = str_replace_all(
icd10_code,
pattern = "M72.04",
replacement = "M72.0"
)
)
disease_mapping =
disease_mapping |>
mutate(icd10_code = str_replace_all(
icd10_code,
pattern = "M79.09",
replacement = "M79.0"
)
)
disease_mapping =
disease_mapping |>
mutate(icd10_code = str_replace_all(
icd10_code,
pattern = "M79.65|M79.66|M79.67",
replacement = "M79.6"
)
)
disease_mapping =
disease_mapping |>
mutate(icd10_code = str_replace_all(
icd10_code,
pattern = "M79.79",
replacement = "M79.7"
)
)
disease_mapping =
disease_mapping |>
mutate(icd10_code = str_replace_all(
icd10_code,
pattern = "M79.86",
replacement = "M79.8"
)
)
disease_mapping =
disease_mapping |>
mutate(icd10_code = str_replace_all(
icd10_code,
pattern = "M81.99",
replacement = "M81.9"
)
)
disease_mapping =
disease_mapping |>
mutate(icd10_code = str_replace_all(
icd10_code,
pattern = "N390",
replacement = "N39.0"
)
)
disease_mapping =
disease_mapping |>
mutate(icd10_code = str_replace_all(
icd10_code,
pattern = "N50.89",
replacement = "N50.8"
)
)
disease_mapping =
disease_mapping |>
mutate(icd10_code = str_replace_all(
icd10_code,
pattern = "N52",
replacement = "F52"
)
)
disease_mapping =
disease_mapping |>
mutate(icd10_code = str_replace_all(
icd10_code,
pattern = "N52.9",
replacement = "F52.9"
)
)
disease_mapping =
disease_mapping |>
mutate(icd10_code = str_replace_all(
icd10_code,
pattern = "N814",
replacement = "N81.4"
)
)
disease_mapping =
disease_mapping |>
mutate(icd10_code = str_replace_all(
icd10_code,
pattern = "P29.12",
replacement = "P29.1"
)
)
disease_mapping =
disease_mapping |>
mutate(icd10_code = str_replace_all(
icd10_code,
pattern = "R06.00|R06.09",
replacement = "R06.0"
)
)
disease_mapping =
disease_mapping |>
mutate(icd10_code = str_replace_all(
icd10_code,
pattern = "R06.83",
replacement = "R06.8"
)
)
disease_mapping =
disease_mapping |>
mutate(icd10_code = str_replace_all(
icd10_code,
pattern = "R07.89",
replacement = "R07.8"
)
)
disease_mapping =
disease_mapping |>
mutate(icd10_code = str_replace_all(
icd10_code,
pattern = "R07.8|R07.9",
replacement = "R07"
)
)
disease_mapping =
disease_mapping |>
mutate(icd10_code = str_replace_all(
icd10_code,
pattern = "R09.89",
replacement = "R09.8"
)
)
disease_mapping =
disease_mapping |>
mutate(icd10_code = str_replace_all(
icd10_code,
pattern = "R10.30",
replacement = "R10.3"
)
)
disease_mapping =
disease_mapping |>
mutate(icd10_code = str_replace_all(
icd10_code,
pattern = "R10.8|R10.9",
replacement = "R10"
)
)
disease_mapping =
disease_mapping |>
mutate(icd10_code = str_replace_all(
icd10_code,
pattern = "R11.0",
replacement = "R11"
)
)
disease_mapping =
disease_mapping |>
mutate(icd10_code = str_replace_all(
icd10_code,
pattern = "R13.10",
replacement = "R13"
)
)
disease_mapping =
disease_mapping |>
mutate(icd10_code = str_replace_all(
icd10_code,
pattern = "R13.1",
replacement = "R13"
)
)
disease_mapping =
disease_mapping |>
mutate(icd10_code = str_replace_all(
icd10_code,
pattern = "R14.0",
replacement = "R14"
)
)
disease_mapping =
disease_mapping |>
mutate(icd10_code = str_replace_all(
icd10_code,
pattern = "R19.7",
replacement = "R19"
)
)
disease_mapping =
disease_mapping |>
mutate(icd10_code = str_replace_all(
icd10_code,
pattern = "R29.898.|R29.89",
replacement = "R29.8"
)
)
disease_mapping =
disease_mapping |>
mutate(icd10_code = str_replace_all(
icd10_code,
pattern = "R31.29",
replacement = "R31.2"
)
)
disease_mapping =
disease_mapping |>
mutate(icd10_code = str_replace_all(
icd10_code,
pattern = "R31.2|R31.9",
replacement = "R31"
)
)
disease_mapping =
disease_mapping |>
mutate(icd10_code = str_replace_all(
icd10_code,
pattern = "R53.82|R53.83|R5382",
replacement = "R53.8"
)
)
disease_mapping =
disease_mapping |>
mutate(icd10_code = str_replace_all(
icd10_code,
pattern = "R53.8",
replacement = "R53"
)
)
disease_mapping =
disease_mapping |>
mutate(icd10_code = str_replace_all(
icd10_code,
pattern = "R56.9",
replacement = "R56"
)
)
disease_mapping =
disease_mapping |>
mutate(icd10_code = str_replace_all(
icd10_code,
pattern = "R73.09|R73.02",
replacement = "R73.0"
)
)
disease_mapping =
disease_mapping |>
mutate(icd10_code = str_replace_all(
icd10_code,
pattern = "R80.9",
replacement = "R80"
)
)
disease_mapping =
disease_mapping |>
mutate(icd10_code = str_replace_all(
icd10_code,
pattern = "R82.998.|R82.99",
replacement = "R82.9"
)
)
disease_mapping =
disease_mapping |>
mutate(icd10_code = str_replace_all(
icd10_code,
pattern = "R871",
replacement = "R87.1"
)
)
# Truncate the longer code first; otherwise "R87.6" -> "R87" would turn
# "R87.61" into "R871" before this pattern can match.
disease_mapping =
disease_mapping |>
mutate(icd10_code = str_replace_all(
icd10_code,
pattern = "R87.61",
replacement = "R87.6"
)
)
disease_mapping =
disease_mapping |>
mutate(icd10_code = str_replace_all(
icd10_code,
pattern = "R87.6",
replacement = "R87"
)
)
disease_mapping =
disease_mapping |>
mutate(icd10_code = str_replace_all(
icd10_code,
pattern = "T50.905.",
replacement = "T50.9"
)
)
disease_mapping =
disease_mapping |>
mutate(icd10_code = str_replace_all(
icd10_code,
pattern = "T78.40",
replacement = "T78.4"
)
)
disease_mapping =
disease_mapping |>
mutate(icd10_code = str_replace_all(
icd10_code,
pattern = "T780",
replacement = "T78.0"
)
)
disease_mapping =
disease_mapping |>
mutate(icd10_code = str_replace_all(
icd10_code,
pattern = "T81.149.88",
replacement = "T81.1"
)
)
disease_mapping =
disease_mapping |>
mutate(icd10_code = str_replace_all(
icd10_code,
pattern = "T81.815.013.",
replacement = "T81.8"
)
)
disease_mapping =
disease_mapping |>
mutate(icd10_code = str_replace_all(
icd10_code,
pattern = "T84.84",
replacement = "T84.8"
)
)
disease_mapping =
disease_mapping |>
mutate(icd10_code = str_replace_all(
icd10_code,
pattern = "T887",
replacement = "T88.7"
)
)
disease_mapping =
disease_mapping |>
mutate(icd10_code = str_replace_all(
icd10_code,
pattern = "U80",
replacement = "U82"
)
)
disease_mapping =
disease_mapping |>
mutate(icd10_code = str_replace_all(
icd10_code,
pattern = "W44.9",
replacement = "W44"
)
)
disease_mapping =
disease_mapping |>
mutate(icd10_code = str_replace_all(
icd10_code,
pattern = "Z86.79",
replacement = "Z86.7"
)
)
disease_mapping =
disease_mapping |>
mutate(icd10_code = str_replace_all(
icd10_code,
pattern = "Z87.09",
replacement = "Z87.0"
)
)
disease_mapping =
disease_mapping |>
mutate(icd10_code = str_replace_all(
icd10_code,
pattern = "Z87.39",
replacement = "Z87.3"
)
)
disease_mapping =
disease_mapping |>
mutate(icd10_code = str_replace_all(
icd10_code,
pattern = "Z87.42",
replacement = "Z87.4"
)
)
disease_mapping =
disease_mapping |>
mutate(icd10_code = str_replace_all(
icd10_code,
pattern = "Z87.828.",
replacement = "Z87.8"
)
)
disease_mapping =
disease_mapping |>
mutate(icd10_code = str_replace_all(
icd10_code,
pattern = "Z87.891.",
replacement = "Z87.8"
)
)
disease_mapping =
disease_mapping |>
distinct()
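In the replacements above, `pattern` is interpreted as a regular expression, so an unescaped `.` matches any character, and overlapping patterns depend on the order in which they run. The sketch below uses base R's `gsub()` on made-up inputs (not rows from `disease_mapping`) to illustrate both effects.

```r
# Illustrative inputs only; these are not rows from disease_mapping.
codes <- c("R87.61", "R87.6", "R8756")

# As a regex, "." matches any character, so "R87.6" also matches "R8756",
# and "R87.61" is turned into "R871".
regex_result <- gsub("R87.6", "R87", codes)

# With fixed = TRUE the pattern is treated as a literal string,
# so "R8756" is left alone.
fixed_result <- gsub("R87.6", "R87", codes, fixed = TRUE)

# Order matters: handle the longer code first, then its prefix.
step1 <- gsub("R87.61", "R87.6", codes, fixed = TRUE)
step2 <- gsub("R87.6", "R87", step1, fixed = TRUE)

regex_result
fixed_result
step2
```

In stringr the literal-matching equivalent is to wrap the pattern in `fixed()`, or to escape the dot as `\\.` when the rest of the pattern still needs regex features such as `|`.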
other_icd10_desc =
data.frame(
icd10_code = c("A09.9",
"A41",
"B95",
"B96",
"B97",
"B98",
"C44",
"C79",
"C80.0",
"C80.9",
"C90",
"D07",
"D35",
"D37",
"D47",
"D48",
"E03",
"E04",
"E21",
"E23",
"E27",
"E61",
"E80",
"E87",
"E89",
"F05",
"F17",
"F41",
"F43.0",
"F53",
"G31",
"G45",
"G46",
"G62",
"G93",
"G99",
"H47",
"H50",
"H57",
"H83",
"H91",
"H92",
"H93",
"I08",
"I27",
"I27.2",
"I44",
"I45",
"I48.0",
"I49",
"I51",
"I62",
"I71",
"J15",
"J16",
"J34",
"J38",
"J69",
"J84",
"J95",
"J96",
"K05",
"K07",
"K08",
"K09",
"K10",
"K12",
"K13",
"K22",
"K22.7",
"K52",
"K56",
"K64",
"K64.0",
"K64.9",
"K74",
"K75",
"K85.9",
"K86",
"K91",
"K91.8",
"K92",
"L02",
"L13",
"L30",
"L65",
"L73",
"L85",
"L89.1",
"L98",
"M05",
"M06",
"M07",
"M11",
"M13",
"M18",
"M19",
"M24",
"M25",
"M31",
"M35",
"M43",
"M48",
"M53",
"M62",
"M66",
"M67",
"M71",
"M77",
"M79",
"M79.7",
"M80",
"M81",
"M85",
"M89",
"M96",
"N02",
"N18.3",
"N18.4",
"N28",
"N39",
"N48",
"N73",
"N76",
"N88",
"N89",
"N91",
"N92",
"N93",
"N94",
"N99",
"O14",
"O04",
"O26",
"O32",
"O34",
"O36",
"O68",
"O75",
"O99",
"R03",
"R07",
"R09",
"R10",
"R19",
"R29",
"R29.6",
"R39",
"R40",
"R47",
"R57",
"T95.8",
"W44",
"Y95",
"Z86.3",
"Z87.3",
"Z87.4",
"Z87.7",
"Z87.8",
"T85.8",
"Z91.0",
"Z88.9",
"Z88.8",
"Z88",
"Z92.6",
"N90",
"U82"
),
icd10_description = c("Gastroenteritis and colitis of unspecified origin",
"Other sepsis",
"Streptococcus and staphylococcus as the cause of diseases classified to other chapters",
"Other specified bacterial agents as the cause of diseases classified to other chapters",
"Viral agents as the cause of diseases classified to other chapters",
"Other specified infectious agents as the cause of diseases classified to other chapters",
"Other malignant neoplasms of skin",
"Secondary malignant neoplasm of other and unspecified sites",
"Malignant neoplasm, primary site unknown, so stated",
"Malignant neoplasm, primary site unspecified",
"Multiple myeloma and malignant plasma cell neoplasms",
"Carcinoma in situ of other and unspecified genital organs",
"Benign neoplasm of other and unspecified endocrine glands",
"Neoplasm of uncertain or unknown behaviour of oral cavity and digestive organs",
"Other neoplasms of uncertain or unknown behaviour of lymphoid, haematopoietic and related tissue",
"Neoplasm of uncertain or unknown behaviour of other and unspecified sites",
"Other hypothyroidism",
"Other nontoxic goitre",
"Hyperparathyroidism and other disorders of parathyroid gland",
"Hypofunction and other disorders of pituitary gland",
"Other disorders of adrenal gland",
"Deficiency of other nutrient elements",
"Disorders of porphyrin and bilirubin metabolism",
"Other disorders of fluid, electrolyte and acid-base balance",
"Postprocedural endocrine and metabolic disorders, not elsewhere classified",
"Delirium, not induced by alcohol and other psychoactive substances",
"Mental and behavioural disorders due to use of tobacco",
"Other anxiety disorders",
"Acute stress reaction",
"Mental and behavioural disorders associated with the puerperium, not elsewhere classified",
"Other degenerative diseases of nervous system, not elsewhere classified",
"Transient cerebral ischaemic attacks and related syndromes",
"Vascular syndromes of brain in cerebrovascular diseases",
"Other polyneuropathies",
"Other disorders of brain",
"Other disorders of nervous system in diseases classified elsewhere",
"Other disorders of optic [2nd] nerve and visual pathways",
"Other strabismus",
"Other disorders of eye and adnexa",
"Other diseases of inner ear",
"Other hearing loss",
"Otalgia and effusion of ear",
"Other disorders of ear, not elsewhere classified",
"Multiple valve diseases",
"Other pulmonary heart diseases",
"Other secondary pulmonary hypertension",
"Atrioventricular and left bundle-branch block",
"Other conduction disorders",
"Paroxysmal atrial fibrillation",
"Other cardiac arrhythmias",
"Complications and ill-defined descriptions of heart disease",
"Other nontraumatic intracranial haemorrhage",
"Aortic aneurysm and dissection",
"Bacterial pneumonia, not elsewhere classified",
"Pneumonia due to other infectious organisms, not elsewhere classified",
"Other disorders of nose and nasal sinuses",
"Diseases of vocal cords and larynx, not elsewhere classified",
"Pneumonitis due to solids and liquids",
"Other interstitial pulmonary diseases",
"Postprocedural respiratory disorders, not elsewhere classified",
"Respiratory failure, not elsewhere classified",
"Gingivitis and periodontal diseases",
"Dentofacial anomalies [including malocclusion]",
"Other disorders of teeth and supporting structures",
"Cysts of oral region, not elsewhere classified",
"Other diseases of jaws",
"Stomatitis and related lesions",
"Other diseases of lip and oral mucosa",
"Other diseases of oesophagus",
"Barrett oesophagus",
"Other noninfective gastroenteritis and colitis",
"Paralytic ileus and intestinal obstruction without hernia",
"Haemorrhoids and perianal venous thrombosis",
"First degree haemorrhoids",
"Haemorrhoids, unspecified",
"Fibrosis and cirrhosis of liver",
"Other inflammatory liver diseases",
"Acute pancreatitis, unspecified",
"Other diseases of pancreas",
"Postprocedural disorders of digestive system, not elsewhere classified",
"Other postprocedural disorders of digestive system, not elsewhere classified",
"Other diseases of digestive system",
"Cutaneous abscess, furuncle and carbuncle",
"Other bullous disorders",
"Other dermatitis",
"Other nonscarring hair loss",
"Other follicular disorders",
"Other epidermal thickening",
"Stage II decubitus ulcer",
"Other disorders of skin and subcutaneous tissue, not elsewhere classified",
"Seropositive rheumatoid arthritis",
"Other rheumatoid arthritis",
"Psoriatic and enteropathic arthropathies",
"Other crystal arthropathies",
"Other arthritis",
"Arthrosis of first carpometacarpal joint",
"Other arthrosis",
"Other specific joint derangements",
"Other joint disorders, not elsewhere classified",
"Other necrotizing vasculopathies",
"Other systemic involvement of connective tissue",
"Other deforming dorsopathies",
"Other spondylopathies",
"Other dorsopathies, not elsewhere classified",
"Other disorders of muscle",
"Spontaneous rupture of synovium and tendon",
"Other disorders of synovium and tendon",
"Other bursopathies",
"Other enthesopathies",
"Other soft tissue disorders, not elsewhere classified",
"Fibromyalgia",
"Osteoporosis with pathological fracture",
"Osteoporosis without pathological fracture",
"Other disorders of bone density and structure",
"Other disorders of bone",
"Postprocedural musculoskeletal disorders, not elsewhere classified",
"Recurrent and persistent haematuria",
"Chronic kidney disease, stage 3",
"Chronic kidney disease, stage 4",
"Other disorders of kidney and ureter, not elsewhere classified",
"Other disorders of urinary system",
"Other disorders of penis",
"Other female pelvic inflammatory diseases",
"Other inflammation of vagina and vulva",
"Other noninflammatory disorders of cervix uteri",
"Other noninflammatory disorders of vagina",
"Absent, scanty and rare menstruation",
"Excessive, frequent and irregular menstruation",
"Other abnormal uterine and vaginal bleeding",
"Pain and other conditions associated with female genital organs and menstrual cycle",
"Postprocedural disorders of genitourinary system, not elsewhere classified",
"Pre-eclampsia",
"Medical abortion",
"Maternal care for other conditions predominantly related to pregnancy",
"Maternal care for known or suspected malpresentation of fetus",
"Maternal care for known or suspected abnormality of pelvic organs",
"Maternal care for other known or suspected fetal problems",
"Labour and delivery complicated by fetal stress [distress]",
"Other complications of labour and delivery, not elsewhere classified",
"Other maternal diseases classifiable elsewhere but complicating pregnancy, childbirth and the puerperium",
"Abnormal blood-pressure reading, without diagnosis",
"Pain in throat and chest",
"Other symptoms and signs involving the circulatory and respiratory systems",
"Abdominal and pelvic pain",
"Other symptoms and signs involving the digestive system and abdomen",
"Other symptoms and signs involving the nervous and musculoskeletal systems",
"Tendency to fall, not elsewhere classified",
"Other symptoms and signs involving the urinary system",
"Somnolence, stupor and coma",
"Speech disturbances, not elsewhere classified",
"Shock, not elsewhere classified",
"Other complications of internal prosthetic devices, implants and grafts, not elsewhere classified",
"Foreign body entering into or through eye or natural orifice",
"Nosocomial condition",
"Personal history of endocrine, nutritional and metabolic diseases",
"Personal history of diseases of the musculoskeletal system and connective tissue",
"Personal history of diseases of the genitourinary system",
"Personal history of congenital malformations, deformations and chromosomal abnormalities",
"Personal history of other specified conditions",
"Other complications of internal prosthetic devices, implants and grafts, not elsewhere classified",
"Personal history of allergy, other than to drugs and biological substances",
"Personal history of allergy to unspecified drugs, medicaments and biological substances",
"Personal history of allergy to other drugs, medicaments and biological substances",
"Personal history of allergy to drugs, medicaments and biological substances",
"Personal history of chemotherapy for neoplastic disease",
"Other noninflammatory disorders of vulva and perineum",
"Resistance to betalactam antibiotics")
)
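Because the codes and descriptions above are typed as two long parallel vectors, a few cheap checks can catch slips such as a repeated code or a description with a stray line break. The following is a hedged sketch on a toy table (`toy_desc` is a made-up stand-in so the example runs on its own); the same checks could be pointed at `other_icd10_desc`.

```r
# Toy lookup table standing in for other_icd10_desc; three rows copied
# from the table above.
toy_desc <- data.frame(
  icd10_code = c("A41", "F05", "U82"),
  icd10_description = c(
    "Other sepsis",
    "Delirium, not induced by alcohol and other psychoactive substances",
    "Resistance to betalactam antibiotics"
  )
)

# Each code should appear once, and no description should be empty or
# contain an embedded newline.
no_duplicate_codes <- anyDuplicated(toy_desc$icd10_code) == 0
no_empty_descriptions <- all(nzchar(trimws(toy_desc$icd10_description)))
no_embedded_newlines <- !any(grepl("\n", toy_desc$icd10_description, fixed = TRUE))

stopifnot(no_duplicate_codes, no_empty_descriptions, no_embedded_newlines)
```

These checks do not verify that a description belongs to the code next to it; that still requires comparing against an ICD-10 reference.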
manual_icd10_map <-
readxl::read_xlsx(here::here("data/icd/manual_disease_icd10_mappings.xlsx"))
icd10_descriptions =
phecodes |>
select(icd10_code = ICD10,
icd10_description = ICD_DESCRIPTION
) |>
distinct()
# Expand multiple ICD codes into rows
to_add_expanded <- manual_icd10_map |>
mutate(icd10_code = str_split(icd10_code, ",\\s*")) |>
tidyr::unnest(icd10_code)
icd10_descriptions =
bind_rows(
icd10_descriptions,
to_add_expanded |>
select(icd10_code,
icd10_description = icd10_desc),
other_icd10_desc
)
icd10_descriptions = icd10_descriptions |> distinct()
icd10_descriptions =
icd10_descriptions |>
group_by(icd10_code) |>
summarise(icd10_description =
str_flatten(unique(icd10_description), collapse = "; ", na.rm = TRUE),
.groups = "drop"
)
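The `summarise()` step above leaves one row per ICD-10 code, joining differing descriptions with "; ", which is what allows the later join to be declared many-to-one. A minimal base R sketch of the same idea on made-up data (the descriptions are placeholders, not real ICD-10 text):

```r
# Made-up example: one code has two different wordings, one has a single wording.
toy <- data.frame(
  icd10_code = c("N90", "N90", "R87"),
  icd10_description = c("first wording", "second wording", "only wording")
)

# Collapse to one description string per code, dropping NA and duplicates.
collapsed <- tapply(
  toy$icd10_description,
  toy$icd10_code,
  function(x) paste(unique(x[!is.na(x)]), collapse = "; ")
)

collapsed[["N90"]]
collapsed[["R87"]]
```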
disease_mapping =
left_join(disease_mapping,
icd10_descriptions,
by = "icd10_code",
relationship = "many-to-one",
na_matches = "never"
)
disease_mapping =
disease_mapping |>
mutate(icd10_code = str_replace_all(
icd10_code,
pattern = "R871",
replacement = "R87"
)
)
disease_mapping =
disease_mapping |>
mutate(icd10_description =
ifelse(icd10_code == "R87",
"Abnormal findings in specimens from female genital organs",
icd10_description)
)
# check if any ICD10 codes are missing descriptions
disease_mapping |>
filter(is.na(icd10_description)) |>
head()
# A tibble: 6 × 9
`DISEASE/TRAIT` collected_all_diseas…¹ PUBMED_ID YEAR STUDY_ACCESSION
<chr> <chr> <int> <int> <chr>
1 smoking interaction in… lung cancer 29059373 2017 GCST005910
2 smoking interaction in… lung cancer 29059373 2017 GCST005909
3 smoking interaction in… lung cancer 29059373 2017 GCST005911
4 breast cancer (estroge… breast cancer 29058716 2017 GCST005076
5 breast cancer breast cancer 29058716 2017 GCST005077
6 breast cancer in brca1… breast cancer 29058716 2017 GCST005075
# ℹ abbreviated name: ¹collected_all_disease_terms
# ℹ 4 more variables: icd10_code <chr>, icd10_code_origin <chr>, phecode <dbl>,
# icd10_description <chr>
gwas_study_info |>
filter(grepl("ICD10 F05",
`DISEASE/TRAIT`)) |>
select(`DISEASE/TRAIT`, MAPPED_TRAIT, collected_all_disease_terms, PUBMED_ID)
DISEASE/TRAIT
<char>
1: ICD10 F05: Delirium due to known physiological condition (Gene-based burden)
2: ICD10 F05.9: Delirium, unspecified (Gene-based burden)
3: ICD10 F05.9: Delirium, unspecified
4: ICD10 F05: Delirium due to known physiological condition
MAPPED_TRAIT collected_all_disease_terms PUBMED_ID
<char> <char> <int>
1: alcohol withdrawal delirium alcohol-related disorders 34662886
2: delirium delirium 34662886
3: delirium delirium 34662886
4: alcohol withdrawal delirium alcohol-related disorders 34662886
disease_mapping |>
filter(icd10_code == "F05")
# A tibble: 3 × 9
`DISEASE/TRAIT` collected_all_diseas…¹ PUBMED_ID YEAR STUDY_ACCESSION
<chr> <chr> <int> <int> <chr>
1 behavioral disturbance… atypical behavior 25897833 2015 GCST002863
2 icd10 f05: delirium du… alcohol-related disor… 34662886 2021 GCST90083772
3 icd10 f05: delirium du… alcohol-related disor… 34662886 2021 GCST90079786
# ℹ abbreviated name: ¹collected_all_disease_terms
# ℹ 4 more variables: icd10_code <chr>, icd10_code_origin <chr>, phecode <dbl>,
# icd10_description <chr>
# F05 is delirium not induced by alcohol or other psychoactive substances,
# yet these studies are labelled as alcohol withdrawal delirium
# (disease term: alcohol-related disorders); set the disease term to delirium
disease_mapping =
disease_mapping |>
mutate(collected_all_disease_terms =
ifelse(icd10_code == "F05",
"delirium",
collected_all_disease_terms)
)
# Update the disease term (not MAPPED_TRAIT) for the F05 studies, leaving
# all other rows unchanged
gwas_study_info =
gwas_study_info |>
mutate(collected_all_disease_terms =
ifelse(grepl("ICD10 F05", `DISEASE/TRAIT`),
str_replace_all(collected_all_disease_terms,
pattern = "alcohol-related disorders",
replacement = "delirium"),
collected_all_disease_terms)
)
disease_mapping =
disease_mapping |>
mutate(icd10_code = sub("^([^-]+)-\\1$", "\\1", icd10_code))
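The `sub()` call above collapses degenerate ranges whose two endpoints are the same string, and leaves everything else untouched. A short demonstration on made-up inputs (these values are illustrative, not taken from `disease_mapping`):

```r
# Made-up inputs to show what the pattern does.
x <- c("N18.3-N18.3", "A15-A19", "J09", "U82-U82")

# "^([^-]+)-\\1$" matches "<code>-<same code>" and keeps a single copy.
collapsed_ranges <- sub("^([^-]+)-\\1$", "\\1", x)
collapsed_ranges
```

Only the first and last elements change; a true range such as `A15-A19` does not match because the text after the hyphen differs from the captured group.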
disease_mapping =
disease_mapping |>
arrange(YEAR,
PUBMED_ID,
STUDY_ACCESSION,
collected_all_disease_terms,
icd10_code)
fwrite(disease_mapping,
here::here("output/icd_map/gwas_disease_to_icd10_mapping.csv")
)
sessionInfo()
R version 4.3.1 (2023-06-16)
Platform: aarch64-apple-darwin20 (64-bit)
Running under: macOS 15.7.3
Matrix products: default
BLAS: /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib/libRlapack.dylib; LAPACK version 3.11.0
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
time zone: America/Los_Angeles
tzcode source: internal
attached base packages:
[1] stats graphics grDevices datasets utils methods base
other attached packages:
[1] jsonlite_2.0.0 httr_1.4.7 data.table_1.17.8 stringr_1.6.0
[5] dplyr_1.1.4 workflowr_1.7.1
loaded via a namespace (and not attached):
[1] bit_4.6.0 compiler_4.3.1 BiocManager_1.30.26
[4] renv_1.0.3 promises_1.3.3 tidyselect_1.2.1
[7] Rcpp_1.1.0 git2r_0.36.2 tidyr_1.3.1
[10] callr_3.7.6 later_1.4.4 jquerylib_0.1.4
[13] readxl_1.4.5 yaml_2.3.10 fastmap_1.2.0
[16] here_1.0.1 R6_2.6.1 generics_0.1.4
[19] knitr_1.50 tibble_3.3.0 rprojroot_2.1.0
[22] bslib_0.9.0 pillar_1.11.1 rlang_1.1.6
[25] utf8_1.2.6 cachem_1.1.0 stringi_1.8.7
[28] httpuv_1.6.16 xfun_0.55 getPass_0.2-4
[31] fs_1.6.6 sass_0.4.10 bit64_4.6.0-1
[34] cli_3.6.5 withr_3.0.2 magrittr_2.0.4
[37] ps_1.9.1 digest_0.6.37 processx_3.8.6
[40] rstudioapi_0.17.1 lifecycle_1.0.4 vctrs_0.6.5
[43] evaluate_1.0.5 glue_1.8.0 cellranger_1.1.0
[46] whisker_0.4.1 purrr_1.1.0 rmarkdown_2.30
[49] tools_4.3.1 pkgconfig_2.0.3 htmltools_0.5.8.1