Last updated: 2025-10-09

Checks: 7 passed, 0 failed

Knit directory: genomics_ancest_disease_dispar/

This reproducible R Markdown analysis was created with workflowr (version 1.7.1). The Checks tab describes the reproducibility checks that were applied when the results were created. The Past versions tab lists the development history.

R Markdown file: up-to-date

Great! Since the R Markdown file has been committed to the Git repository, you know the exact version of the code that produced these results.

Environment: empty

Great job! The global environment was empty. Objects defined in the global environment can affect the analysis in your R Markdown file in unknown ways. For reproducibility it’s best to always run the code in an empty environment.

Seed: set.seed(20220216)

The command set.seed(20220216) was run prior to running the code in the R Markdown file. Setting a seed ensures that any results that rely on randomness, e.g. subsampling or permutations, are reproducible.

Session information: recorded

Great job! Recording the operating system, R version, and package versions is critical for reproducibility.

Cache: none

Nice! There were no cached chunks for this analysis, so you can be confident that you successfully produced the results during this run.

File paths: relative

Great job! Using relative paths to the files within your workflowr project makes it easier to run your code on other machines.

Repository version: 8e94754

Great! You are using Git for version control. Tracking code development and connecting the code version to the results is critical for reproducibility.

The results in this page were generated with repository version 8e94754. See the Past versions tab to see a history of the changes made to the R Markdown and HTML files.

Note that you need to be careful to ensure that all relevant files for the analysis have been committed to Git prior to generating the results (you can use wflow_publish or wflow_git_commit). workflowr only checks the R Markdown file, but you know if there are other scripts or data files that it depends on. Below is the status of the Git repository when the results were generated:


Ignored files:
    Ignored:    .DS_Store
    Ignored:    .Rproj.user/
    Ignored:    analysis/.DS_Store
    Ignored:    data/.DS_Store
    Ignored:    data/gbd/.DS_Store
    Ignored:    data/gbd/IHME-GBD_2021_DATA-d8cf695e-1.csv
    Ignored:    data/gbd/ihme_gbd_2019_global_disease_burden_rate_all_ages.csv
    Ignored:    data/gbd/ihme_gbd_2019_global_paf_rate_percent_all_ages.csv
    Ignored:    data/gbd/ihme_gbd_2021_global_disease_burden_rate_all_ages.csv
    Ignored:    data/gbd/ihme_gbd_2021_global_paf_rate_percent_all_ages.csv
    Ignored:    data/gwas_catalog/
    Ignored:    data/icd/.DS_Store
    Ignored:    data/icd/IHME_GBD_2019_COD_CAUSE_ICD_CODE_MAP_Y2020M10D15.XLSX
    Ignored:    data/icd/IHME_GBD_2019_NONFATAL_CAUSE_ICD_CODE_MAP_Y2020M10D15.XLSX
    Ignored:    data/icd/IHME_GBD_2021_COD_CAUSE_ICD_CODE_MAP_Y2024M05D16.XLSX
    Ignored:    data/icd/IHME_GBD_2021_NONFATAL_CAUSE_ICD_CODE_MAP_Y2024M05D16.XLSX
    Ignored:    data/icd/UK_Biobank_master_file.tsv
    Ignored:    data/icd/cdc_valid_icd10_Sep_23_2025.xlsx
    Ignored:    data/icd/cdc_valid_icd9_Sep_23_2025.xlsx
    Ignored:    data/icd/manual_disease_icd10_mappings.xlsx
    Ignored:    data/icd/phecode_international_version_unrolled.csv
    Ignored:    data/icd/semiautomatic_ICD-pheno.txt
    Ignored:    data/icd/~$IHME_GBD_2019_COD_CAUSE_ICD_CODE_MAP_Y2020M10D15.XLSX
    Ignored:    data/icd/~$IHME_GBD_2019_NONFATAL_CAUSE_ICD_CODE_MAP_Y2020M10D15.XLSX
    Ignored:    data/who/
    Ignored:    output/.DS_Store
    Ignored:    output/gwas_cat/
    Ignored:    output/gwas_study_info_cohort_corrected.csv
    Ignored:    output/gwas_study_info_trait_corrected.csv
    Ignored:    output/gwas_study_info_trait_ontology_info.csv
    Ignored:    output/gwas_study_info_trait_ontology_info_l1.csv
    Ignored:    output/gwas_study_info_trait_ontology_info_l2.csv
    Ignored:    output/icd_map/
    Ignored:    output/trait_ontology/
    Ignored:    renv/
    Ignored:    sup_table.xlsx

Unstaged changes:
    Modified:   analysis/disease_inves_by_ancest.Rmd
    Modified:   analysis/gbd_data_plots.Rmd
    Modified:   analysis/index.Rmd
    Modified:   analysis/level_1_disease_group_non_cancer.Rmd
    Modified:   analysis/level_2_disease_group.Rmd
    Modified:   analysis/trait_ontology_categorization.Rmd
    Modified:   data/icd/README.md

Note that any generated files, e.g. HTML, png, CSS, etc., are not included in this status report because it is ok for generated content to have uncommitted changes.

These are the previous versions of the repository in which changes were made to the R Markdown (analysis/gwas_to_gbd.Rmd) and HTML (docs/gwas_to_gbd.html) files. If you’ve configured a remote Git repository (see ?wflow_git_remote), click on the hyperlinks in the table below to view the files as they were in that past version.

File	Version	Author	Date	Message
Rmd	8e94754	IJbeasley	2025-10-09	Integrating gwas traits with ICD10

Set up

library(dplyr)


Attaching package: 'dplyr'

The following objects are masked from 'package:stats':

    filter, lag

The following objects are masked from 'package:base':

    intersect, setdiff, setequal, union

library(stringr)
library(purrr)
library(tidyr)
library(data.table)


Attaching package: 'data.table'

The following object is masked from 'package:purrr':

    transpose

The following objects are masked from 'package:dplyr':

    between, first, last

Load GBD 2019 data

gbd_2019 <- readxl::read_xlsx(here::here("data/icd/IHME_GBD_2019_NONFATAL_CAUSE_ICD_CODE_MAP_Y2020M10D15.XLSX"), 
                               skip = 1)

gbd_2019 = 
  gbd_2019 |>
  rename_with(~str_replace_all(., " ", "_")) |>
  rename_with(~ str_replace_all(., "\\\\", "_")) |>
  rename_with(~ tolower(.))

gbd_2019 = 
gbd_2019 |>
  filter(`...1` != 0)
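
As a small illustration of the renaming step above, the following base R sketch applies the same three transformations (spaces to underscores, backslashes to underscores, lower case) to two column names. The helper `clean_names()` and the raw input names are hypothetical and are not part of the original analysis. Note that a forward slash is left untouched, which is why a name such as `icd10_used_in_hospital/claims_analyses` still needs backticks later on.

```r
# Hypothetical helper mirroring the three rename_with() calls above
clean_names <- function(x) {
  x <- gsub(" ", "_", x, fixed = TRUE)   # spaces -> underscores
  x <- gsub("\\", "_", x, fixed = TRUE)  # literal backslashes -> underscores
  tolower(x)                             # lower case
}

# Assumed raw column names, for illustration only
raw_names <- c("Cause Name", "ICD10 Used in Hospital/Claims Analyses")
cleaned_names <- clean_names(raw_names)
cleaned_names
```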

Load GBD 2021 data

gbd_2021 <- readxl::read_xlsx(here::here("data/icd/IHME_GBD_2021_NONFATAL_CAUSE_ICD_CODE_MAP_Y2024M05D16.XLSX"), 
                               skip = 1)

gbd_2021 = 
  gbd_2021 |>
  rename_with(~str_replace_all(., " ", "_")) |>
  rename_with(~ str_replace_all(., "\\\\", "_")) |>
  rename_with(~ tolower(.))


cause_level = 
  gbd_2021 |>
  select(cause_hierarchy_level, cause = cause_name)

Add cause level to 2019 data (using 2021)

gbd_2019 = 
left_join(
  gbd_2019,
  cause_level,
  by = "cause")


# check if any cause_hierarchy_level is NA
# meaning not filled in by GBD 2021 
gbd_2019 |>
  filter(is.na(cause_hierarchy_level))

# Pertussis == Whooping cough
# this is a level 3 cause

gbd_2019 = 
  gbd_2019 |>
  mutate(cause_hierarchy_level = 
         ifelse(cause == "Whooping cough",
                3,
                cause_hierarchy_level)
         )

# Other direct maternal disorders appears (GBD 2021) to be equiv to other maternal disorders (GBD 2019)
# cause_hierarchy_level = 4
gbd_2019 = 
  gbd_2019 |>
  mutate(cause_hierarchy_level = 
         ifelse(cause == "Other maternal disorders",
                4,
                cause_hierarchy_level)
         )

# Urinary tract infections and interstitial nephritis (GBD 2021) appears to be equiv to Urinary tract infections (GBD 2019)
# cause_hierarchy_level = 4
gbd_2019 = 
  gbd_2019 |>
  mutate(cause_hierarchy_level = 
         ifelse(cause == "Urinary tract infections",
                4,
                cause_hierarchy_level)
         )

# Peripheral artery disease
# likely level = 3
gbd_2019 = 
  gbd_2019 |>
  mutate(cause_hierarchy_level = 
         ifelse(cause == "Peripheral artery disease",
                3,
                cause_hierarchy_level)
         )

# Police conflict and executions (GBD 2021) equiv to Executions and police conflict (GBD 2019)
# level = 3
gbd_2019 = 
  gbd_2019 |>
  mutate(cause_hierarchy_level = 
         ifelse(cause == "Executions and police conflict",
                3,
                cause_hierarchy_level)
         )


# Edentulism and severe tooth loss
# level = 3
gbd_2019 = 
  gbd_2019 |>
  mutate(cause_hierarchy_level = 
         ifelse(cause == "Edentulism and severe tooth loss",
                3,
                cause_hierarchy_level)
         )

# check that all cause levels filled in
gbd_2019 |>
  filter(is.na(cause_hierarchy_level))
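
The six single-cause patches above follow one pattern, so they could also be written as a single lookup. The base R sketch below is one possible way to do that, not the code used for this page; `manual_levels`, `fill_cause_levels()` and the toy data frame are hypothetical names. Unlike the original `ifelse()` calls, it only fills levels that are still `NA` and leaves existing values untouched.

```r
# Manually assigned levels, copied from the comments in the chunk above
manual_levels <- c(
  "Whooping cough"                   = 3,
  "Other maternal disorders"         = 4,
  "Urinary tract infections"         = 4,
  "Peripheral artery disease"        = 3,
  "Executions and police conflict"   = 3,
  "Edentulism and severe tooth loss" = 3
)

# Hypothetical helper: fill missing levels from a named lookup vector
fill_cause_levels <- function(df, lookup) {
  patch <- unname(lookup[df$cause])  # NA for causes not in the lookup
  df$cause_hierarchy_level <- ifelse(
    is.na(df$cause_hierarchy_level),
    patch,
    df$cause_hierarchy_level
  )
  df
}

# Toy data standing in for gbd_2019
toy <- data.frame(
  cause = c("Whooping cough", "Asthma", "Peripheral artery disease"),
  cause_hierarchy_level = c(NA, 3, NA),
  stringsAsFactors = FALSE
)
toy_filled <- fill_cause_levels(toy, manual_levels)
toy_filled
```

Keeping the manual assignments in one named vector makes it easier to review them against the GBD 2021 hierarchy in a single place.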

Prepare GBD 2021 data for merging with GWAS traits

Tidy ICD codes

Make long format, separate multiple ICD code ranges into rows

gbd_2021 = 
  gbd_2021 |>
  rename(icd10_code = icd10,
         icd10_code_used_in_hosp = `icd10_used_in_hospital/claims_analyses`) |>
  select(-1)

# expand multiple ICD codes into rows
gbd_2021 = 
  gbd_2021 |>
  mutate(icd10_code = str_split(icd10_code, ",\\s*")) |>
  tidyr::unnest(icd10_code) 

# expand multiple ICD codes into rows
# gbd_2021 = 
#   gbd_2021 |>
#   mutate(icd10_code_used_in_hosp = str_split(icd10_code_used_in_hosp, ",\\s*")) |>
#   tidyr::unnest(icd10_code_used_in_hosp) 

gbd_2021 = 
gbd_2021 |>
  tidyr::pivot_longer(c(#"icd10_code_used_in_hosp", 
                 "icd10_code"),
               names_to = "icd10_code_type", 
               values_to = "icd10_code")

gbd_2021 =
  gbd_2021 |>
  select(icd10_code,
         cause_name,
         cause_hierarchy_level) |>
  distinct()

gbd_2021 =
  gbd_2021 |>
  tidyr::separate_wider_delim(col = "icd10_code",
                        delim = "-", 
                       names = c("start_icd10_code", "end_icd10_code"),
                       too_few = "align_end"
                       ) |>
  mutate(start_icd10_code = 
         ifelse(is.na(start_icd10_code),
                end_icd10_code,
                start_icd10_code)
         ) 


head(gbd_2021)

# A tibble: 6 × 4
  start_icd10_code end_icd10_code cause_name               cause_hierarchy_level
  <chr>            <chr>          <chr>                                    <dbl>
1 A50              A60.9          HIV/AIDS and sexually t…                     2
2 A63              A64.0          HIV/AIDS and sexually t…                     2
3 B20              B23.8          HIV/AIDS and sexually t…                     2
4 B24              B24.0          HIV/AIDS and sexually t…                     2
5 B63              B63            HIV/AIDS and sexually t…                     2
6 B97.81           B97.81         HIV/AIDS and sexually t…                     2
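
To make the reshaping above concrete, here is a minimal base R sketch of the same idea applied to one cell: split a comma-separated string into codes or ranges, then split each range at the hyphen, reusing a single code as both start and end. The function `parse_icd10_ranges()` is hypothetical and only approximates the `str_split()`, `unnest()` and `separate_wider_delim()` steps.

```r
# Hypothetical helper: one comma-separated ICD-10 cell -> start/end columns
parse_icd10_ranges <- function(x) {
  parts  <- trimws(strsplit(x, ",", fixed = TRUE)[[1]])
  bounds <- strsplit(parts, "-", fixed = TRUE)
  data.frame(
    # first element is the start; the last is the end (same element for single codes)
    start_icd10_code = vapply(bounds, function(b) b[1], character(1)),
    end_icd10_code   = vapply(bounds, function(b) b[length(b)], character(1)),
    stringsAsFactors = FALSE
  )
}

parsed <- parse_icd10_ranges("A50-A60.9, B63, B97.81")
parsed
```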

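Where the start and end ICD-10 codes of a range are identical and have no decimal part, append .9 to the end code so that the range also covers its subcodes when filtering (for example, A09 to A09 becomes A09 to A09.9)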

gbd_2021 |>
  filter(start_icd10_code == end_icd10_code &
         !grepl(".", start_icd10_code, fixed = TRUE)
         ) |>
  arrange(start_icd10_code) |>
  head()

# A tibble: 6 × 4
  start_icd10_code end_icd10_code cause_name               cause_hierarchy_level
  <chr>            <chr>          <chr>                                    <dbl>
1 A09              A09            Enteric infections                           2
2 A09              A09            Diarrheal diseases                           3
3 A70              A70            Respiratory infections …                     2
4 A70              A70            Lower respiratory infec…                     3
5 A76              A76            Other infectious diseas…                     2
6 A76              A76            Other unspecified infec…                     3

gbd_2021 = 
gbd_2021 |>
  mutate(end_icd10_code =
         ifelse(
         start_icd10_code == end_icd10_code &
         !grepl(".", start_icd10_code, fixed = TRUE),
         paste0(end_icd10_code, ".9"),
         end_icd10_code
         ) 
  )
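
The effect of the `.9` padding is easiest to see on the numeric parts of the codes. In the sketch below, `in_range()` is a hypothetical helper that performs the same comparison later used to match GWAS ICD-10 codes to GBD ranges. The subcode A09.1 (numeric part 9.1) is missed by the unpadded range A09 to A09 and is covered once the end code becomes A09.9.

```r
# Hypothetical helper: is a numeric code part inside [start, end]?
in_range <- function(code_num, start_num, end_num) {
  code_num >= start_num & code_num <= end_num
}

# Range A09 to A09: numeric bounds are 9 and 9, so A09.1 is not matched
unpadded_match <- in_range(9.1, 9, 9)

# Range A09 to A09.9: numeric bounds are 9 and 9.9, so A09.1 is matched
padded_match <- in_range(9.1, 9, 9.9)

c(unpadded = unpadded_match, padded = padded_match)
```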

Make sure start_icd10_code and end_icd10_code start with the same letter

gbd_2021 |> 
  filter(end_icd10_code == "N95.9")

# A tibble: 3 × 4
  start_icd10_code end_icd10_code cause_name               cause_hierarchy_level
  <chr>            <chr>          <chr>                                    <dbl>
1 N88              N95.9          Other non-communicable …                     2
2 N88              N95.9          Gynecological diseases                       3
3 N95.1            N95.9          Other gynecological dis…                     4

# check start_icd10_code and end_icd10_code start with the same letter
# if not, need to fix these rows
  gbd_2021 |>
  filter(str_extract(start_icd10_code, "^[A-Z]") != 
         str_extract(end_icd10_code, "^[A-Z]")
         )

# A tibble: 6 × 4
  start_icd10_code end_icd10_code cause_name               cause_hierarchy_level
  <chr>            <chr>          <chr>                                    <dbl>
1 O98.8            P22.9          Maternal and neonatal d…                     2
2 C8               D24.9          Neoplasms                                    2
3 W49              X58.9          Unintentional injuries                       2
4 T74.2            U03            Self-harm and interpers…                     2
5 X66              Y08.9          Self-harm and interpers…                     2
6 X85              Y08.9          Interpersonal violence                       3

gbd_2021 =
  gbd_2021 |>
  mutate(start_icd10_code = 
        ifelse(start_icd10_code == "W99" & 
               end_icd10_code == "X06.9",
               "X00.0",
               start_icd10_code)
         )

gbd_2021 =
  gbd_2021 |>
  add_row(
    start_icd10_code = "W99",
    end_icd10_code = "W99",
    cause_name = "Unintentional injuries",
    cause_hierarchy_level = 2
  )

gbd_2021 =
  gbd_2021 |>
  mutate(end_icd10_code = 
        ifelse(start_icd10_code == "O98.8" & 
               end_icd10_code == "P05.9",
               "O99.9",
               end_icd10_code)
         )


gbd_2021 =
  gbd_2021 |>
  add_row(
    start_icd10_code = "P00",
    end_icd10_code = "P05.9",
    cause_name = "Maternal and neonatal disorders",
    cause_hierarchy_level = 2
  )

gbd_2021 =
  gbd_2021 |>
  mutate(end_icd10_code = 
        ifelse(start_icd10_code == "T74.2" & 
               end_icd10_code == "U03",
               "T98.3",
               end_icd10_code)
         )

gbd_2021 =
  gbd_2021 |>
  add_row(
    start_icd10_code = "U00",
    end_icd10_code = "U03",
    cause_name = "Self-harm and interpersonal violence",
    cause_hierarchy_level = 2
  )

gbd_2021 =
  gbd_2021 |>
  mutate(end_icd10_code = 
        ifelse(start_icd10_code == "X66" & 
               end_icd10_code == "Y08.9",
               "X99",
               end_icd10_code)
         )

gbd_2021 =
  gbd_2021 |>
  add_row(
    start_icd10_code = "Y00",
    end_icd10_code = "Y08.9",
    cause_name = "Self-harm and interpersonal violence",
    cause_hierarchy_level = 2
  )

gbd_2021 =
  gbd_2021 |>
  mutate(end_icd10_code = 
        ifelse(start_icd10_code == "X85" & 
               end_icd10_code == "Y08.9",
               "X99",
               end_icd10_code)
         )

# Interpersonal violence is a level 3 cause (see the X85 row above)
gbd_2021 =
  gbd_2021 |>
  add_row(
    start_icd10_code = "Y00",
    end_icd10_code = "Y08.9",
    cause_name = "Interpersonal violence",
    cause_hierarchy_level = 3
  )

  gbd_2021 |>
  filter(str_extract(start_icd10_code, "^[A-Z]") != 
         str_extract(end_icd10_code, "^[A-Z]")
         )

# A tibble: 3 × 4
  start_icd10_code end_icd10_code cause_name               cause_hierarchy_level
  <chr>            <chr>          <chr>                                    <dbl>
1 O98.8            P22.9          Maternal and neonatal d…                     2
2 C8               D24.9          Neoplasms                                    2
3 W49              X58.9          Unintentional injuries                       2
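
Three ranges still cross a letter boundary after the manual fixes. One generic way to handle such rows is to split them at each letter, as in the base R sketch below. The function `split_cross_letter_range()` is hypothetical, and it assumes that every letter block runs from 00 to 99.9. That is a simplification: the hand-written fixes above use more specific end points such as T98.3, so the output would need checking against the ICD-10 chapter structure before use.

```r
# Hypothetical helper: split one start/end range at ICD-10 letter boundaries
split_cross_letter_range <- function(start, end) {
  start_letter <- substr(start, 1, 1)
  end_letter   <- substr(end, 1, 1)
  first <- match(start_letter, LETTERS)
  last  <- match(end_letter, LETTERS)
  stopifnot(!is.na(first), !is.na(last), first <= last)

  spanned <- LETTERS[first:last]
  n <- length(spanned)

  # Assumption: each letter block spans <letter>00 to <letter>99.9
  starts <- paste0(spanned, "00")
  ends   <- paste0(spanned, "99.9")
  starts[1] <- start  # keep the original start of the range
  ends[n]   <- end    # keep the original end of the range

  data.frame(start_icd10_code = starts,
             end_icd10_code   = ends,
             stringsAsFactors = FALSE)
}

split_rows <- split_cross_letter_range("W49", "X58.9")
split_rows
```

A range that already sits within one letter is returned unchanged as a single row, so the helper could be applied to every row before the numeric parts are extracted.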

    gbd_2021 |>
    filter(
         is.na(end_icd10_code)
         | is.na(start_icd10_code)) |>
    head()

# A tibble: 6 × 4
  start_icd10_code end_icd10_code cause_name               cause_hierarchy_level
  <chr>            <chr>          <chr>                                    <dbl>
1 <NA>             <NA>           HIV/AIDS - Multidrug-re…                     4
2 <NA>             <NA>           HIV/AIDS - Extensively …                     4
3 <NA>             <NA>           Latent tuberculosis inf…                     4
4 <NA>             <NA>           Extensively drug-resist…                     4
5 <NA>             <NA>           COVID-19                                     3
6 <NA>             <NA>           Liver cancer due to hep…                     4

  # Some causes have no ICD-10 code range (both start and end are NA);
  # remove these rows, at least for now
  
  gbd_2021 =
    gbd_2021 |>
    filter(
      !is.na(end_icd10_code)
      & !is.na(start_icd10_code))

Separate numeric and letter parts of ICD codes (to help with filtering ranges)

gbd_2021 =
  gbd_2021 |>
  mutate(icd10_code_letter = str_extract(start_icd10_code, "^[A-Z]")) 

gbd_2021 =
  gbd_2021 |>
  mutate(start_icd10_code_num = as.numeric(str_remove(start_icd10_code, "^[A-Z]")),
         end_icd10_code_num = as.numeric(str_remove(end_icd10_code, "^[A-Z]"))
         )

Warning: There were 2 warnings in `mutate()`.
The first warning was:
ℹ In argument: `start_icd10_code_num = as.numeric(str_remove(start_icd10_code,
  "^[A-Z]"))`.
Caused by warning:
! NAs introduced by coercion
ℹ Run `dplyr::last_dplyr_warnings()` to see the 1 remaining warning.
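
The coercion warnings mean that some codes do not have a purely numeric remainder after the leading letter is removed, and those rows get `NA` bounds. A quick way to list the offending codes is sketched below in base R. The vector `codes` is made-up example input (in particular, "Z3A" is an invented malformed code), not data from this analysis.

```r
# Made-up example codes; "Z3A" is deliberately malformed
codes <- c("A09", "B97.81", "N95.9", "Z3A")

# Strip the leading letter and try to convert the rest to a number
numeric_part <- suppressWarnings(as.numeric(sub("^[A-Z]", "", codes)))

# Codes whose remainder could not be converted
bad_codes <- codes[is.na(numeric_part)]
bad_codes
```

Running the same check on `start_icd10_code` and `end_icd10_code` would show which GBD rows are silently excluded from the range matching.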

gbd_2021 |> 
  filter(icd10_code_letter == "N" & 
           start_icd10_code_num <= 93.0 & 
           end_icd10_code_num >= 93.0)

# A tibble: 3 × 7
  start_icd10_code end_icd10_code cause_name               cause_hierarchy_level
  <chr>            <chr>          <chr>                                    <dbl>
1 N88              N95.9          Other non-communicable …                     2
2 N88              N95.9          Gynecological diseases                       3
3 N92              N93.9          Premenstrual syndrome                        4
# ℹ 3 more variables: icd10_code_letter <chr>, start_icd10_code_num <dbl>,
#   end_icd10_code_num <dbl>

gbd_2021 |> 
  filter(end_icd10_code == "N95.9")

# A tibble: 3 × 7
  start_icd10_code end_icd10_code cause_name               cause_hierarchy_level
  <chr>            <chr>          <chr>                                    <dbl>
1 N88              N95.9          Other non-communicable …                     2
2 N88              N95.9          Gynecological diseases                       3
3 N95.1            N95.9          Other gynecological dis…                     4
# ℹ 3 more variables: icd10_code_letter <chr>, start_icd10_code_num <dbl>,
#   end_icd10_code_num <dbl>

gbd_2021 |>
  filter(start_icd10_code_num <= 93.0 &
         end_icd10_code_num >= 93.0 &
         icd10_code_letter == "N"
         )

# A tibble: 3 × 7
  start_icd10_code end_icd10_code cause_name               cause_hierarchy_level
  <chr>            <chr>          <chr>                                    <dbl>
1 N88              N95.9          Other non-communicable …                     2
2 N88              N95.9          Gynecological diseases                       3
3 N92              N93.9          Premenstrual syndrome                        4
# ℹ 3 more variables: icd10_code_letter <chr>, start_icd10_code_num <dbl>,
#   end_icd10_code_num <dbl>

Merge GWAS traits with GBD causes

Load Disease Mapping

disease_mapping <- data.table::fread(
here::here("output/icd_map/gwas_disease_to_icd10_mapping.csv")
)

disease_mapping =
  disease_mapping |>
    mutate(icd10_code_letter = str_extract(icd10_code, "^[A-Z]")) |>
    mutate(icd10_code_num = as.numeric(str_remove(icd10_code, "^[A-Z]")))

Warning: There was 1 warning in `mutate()`.
ℹ In argument: `icd10_code_num = as.numeric(str_remove(icd10_code, "^[A-Z]"))`.
Caused by warning:
! NAs introduced by coercion

disease_mapping = 
  disease_mapping |>
  mutate(icd10_code_num = as.numeric(icd10_code_num))

Join disease mapping with GBD 2021 data

disease_mapping_with_cause <- disease_mapping |>
  rowwise() |>
  mutate(
    cause = 
      list(
      gbd_2021$cause_name[
      which(icd10_code_num >= gbd_2021$start_icd10_code_num &
            icd10_code_num <= gbd_2021$end_icd10_code_num & 
            icd10_code_letter == gbd_2021$icd10_code_letter)
      ]),
    cause_hierarchy_level  = 
       list(
            gbd_2021$cause_hierarchy_level[
      which(icd10_code_num >= gbd_2021$start_icd10_code_num &
            icd10_code_num <= gbd_2021$end_icd10_code_num & 
            icd10_code_letter == gbd_2021$icd10_code_letter)
      ])
  ) |>
  ungroup()


disease_mapping_with_cause = 
  disease_mapping_with_cause |>
  tidyr::unnest(c(cause, cause_hierarchy_level),
                keep_empty = TRUE
                )
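
The row-wise lookup above is a range join: each mapped ICD-10 code is matched to every GBD range with the same letter whose numeric bounds contain it. The base R sketch below shows the same logic on toy data, first joining on the letter and then filtering on the bounds. It is an alternative formulation for illustration, not the code that produced the results. The three ranges are taken from the output printed earlier on this page, while the codes in `mapping` are hypothetical.

```r
# Hypothetical GWAS-side codes
mapping <- data.frame(icd10_code = c("N93.0", "A09.1", "Z99"),
                      stringsAsFactors = FALSE)
mapping$icd10_code_letter <- substr(mapping$icd10_code, 1, 1)
mapping$icd10_code_num    <- as.numeric(substring(mapping$icd10_code, 2))

# GBD ranges as printed above (N88-N95.9, N92-N93.9, A09-A09.9)
ranges <- data.frame(
  icd10_code_letter     = c("N", "N", "A"),
  start_icd10_code_num  = c(88, 92, 9),
  end_icd10_code_num    = c(95.9, 93.9, 9.9),
  cause_name            = c("Gynecological diseases",
                            "Premenstrual syndrome",
                            "Diarrheal diseases"),
  cause_hierarchy_level = c(3, 4, 3),
  stringsAsFactors = FALSE
)

# Step 1: pair each code with every range that shares its letter
candidates <- merge(mapping, ranges, by = "icd10_code_letter")

# Step 2: keep pairs where the numeric part lies within the range
matched <- candidates[
  candidates$icd10_code_num >= candidates$start_icd10_code_num &
    candidates$icd10_code_num <= candidates$end_icd10_code_num,
  c("icd10_code", "cause_name", "cause_hierarchy_level")
]
matched
```

Codes without any matching range (here "Z99") drop out at the join step, which corresponds to the rows that the original code keeps with `keep_empty = TRUE` and removes afterwards with `filter(!is.na(cause))`.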

Checking hierarchy levels

disease_mapping_with_cause = 
  disease_mapping_with_cause |>
  filter(!is.na(cause))

# filtering na to deal with pivoting 
disease_mapping_with_cause_grouped = 
disease_mapping_with_cause |>
select(-icd10_code_num) |>
tidyr::pivot_wider(
            id_cols = c("collected_all_disease_terms", "icd10_code", "icd10_description"),
            names_from = cause_hierarchy_level, 
            names_glue = "l{.name}_{.value}",
            values_from = cause,
            values_fn = ~paste0(unique(.x), collapse = ", ")
            )

# check which rows have no l2 cause but do have an l3 cause;
# for these, the l3 cause can be used to fill in the l2 cause
disease_mapping_with_cause_grouped |> 
  rowwise() |> 
  filter(is.na(l2_cause) & !is.na(l3_cause)) |>
  head()

# A tibble: 6 × 6
# Rowwise: 
  collected_all_diseas…¹ icd10_code icd10_description l2_cause l3_cause l4_cause
  <chr>                  <chr>      <chr>             <chr>    <chr>    <chr>   
1 abnormal delivery      P20        Intrauterine hyp… <NA>     Neonata… Neonata…
2 abnormal delivery      P20.9      Intrauterine hyp… <NA>     Neonata… Neonata…
3 abnormal delivery      P21        Birth asphyxia    <NA>     Neonata… Neonata…
4 abnormal delivery      P21.0      Severe birth asp… <NA>     Neonata… Neonata…
5 abnormal delivery      P21.1      Mild and moderat… <NA>     Neonata… Neonata…
6 abnormal delivery      P21.9      Birth asphyxia, … <NA>     Neonata… Neonata…
# ℹ abbreviated name: ¹collected_all_disease_terms

neoplasms = c("Other neoplasms",
              "Hodgkin lymphoma",
              "Leukemia",
              "Non-Hodgkin lymphoma",
              "Multiple myeloma")

disease_mapping_with_cause_grouped = 
  disease_mapping_with_cause_grouped |>
  rowwise() |>
  mutate(l2_cause = 
           list(ifelse(is.na(l2_cause),
                  case_when(map_lgl(l3_cause, ~ .x %in% neoplasms) ~ "Neoplasms",
                            map_lgl(l3_cause, ~ .x %in% "Neonatal disorders") ~ "Maternal and neonatal disorders",
                            map_lgl(l3_cause, ~ .x %in% "Other cardiovascular and circulatory diseases") ~ "Cardiovascular diseases"
                            ),
                  l2_cause
           )
           )) |>
  ungroup()
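
The fill-in rule above maps a level 3 cause to its level 2 parent when the level 2 cause is missing. A vectorised way to express that rule is a named lookup vector, sketched below in base R. `l3_to_l2` and `fill_l2()` are hypothetical names; the pairs themselves are the ones used in the `case_when()` call above. The sketch assumes each `l3_cause` value is a single cause name, so comma-joined values produced by the pivot step would not match and would stay `NA`.

```r
# Level 3 -> level 2 parents, as used in the case_when() above
l3_to_l2 <- c(
  "Other neoplasms"      = "Neoplasms",
  "Hodgkin lymphoma"     = "Neoplasms",
  "Leukemia"             = "Neoplasms",
  "Non-Hodgkin lymphoma" = "Neoplasms",
  "Multiple myeloma"     = "Neoplasms",
  "Neonatal disorders"   = "Maternal and neonatal disorders",
  "Other cardiovascular and circulatory diseases" = "Cardiovascular diseases"
)

# Hypothetical helper: fill missing l2 causes from the l3 cause
fill_l2 <- function(l2, l3) {
  inferred <- unname(l3_to_l2[l3])  # NA when the l3 cause is not in the lookup
  ifelse(is.na(l2), inferred, l2)
}

filled_l2 <- fill_l2(
  l2 = c(NA, "Neoplasms", NA, NA),
  l3 = c("Neonatal disorders", "Leukemia", "Leukemia", "Asthma")
)
filled_l2
```

Because the result is a plain character vector rather than a list column, it would not need the later `unnest_longer()` call for `l2_cause`.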
          

# to_map_manually <- 
# disease_mapping_with_cause_grouped |> 
#   rowwise() |> 
#   filter(is.null(l2_cause)) |> 
#   pull(collected_all_disease_terms) |> 
#   unique()
# 
# 
# update_unmapped_cause = function(df, 
#                                  unmapped_term, 
#                                  l2_cause_rep, 
#                                  l3_cause_rep){
#   
#   df = 
#   df |>
#     mutate(l3_cause = 
#            ifelse(collected_all_disease_terms == unmapped_term,
#                   l3_cause_rep,
#                   l3_cause
#            )) |>
#     mutate(l2_cause = 
#            ifelse(collected_all_disease_terms == unmapped_term,
#                   l2_cause_rep,
#                   l2_cause
#            ))    
#            
#     return(df)
#            
# }
# 
# disease_mapping_with_cause_grouped = 
# update_unmapped_cause(disease_mapping_with_cause_grouped, 
#                       "bone cancer",
#                       "Neoplasms",
#                       "Malignant neoplasms of bone and articular cartilage")

Save + combine with GWAS data

disease_mapping_orig <- data.table::fread(
here::here("output/icd_map/gwas_disease_to_icd10_mapping.csv")
)

disease_mapping_with_cause_grouped =
  disease_mapping_with_cause_grouped |>
  tidyr::unnest_longer(c("l2_cause", "l3_cause", "l4_cause")) 

disease_mapping_final = 
  left_join(
    disease_mapping_orig,
    disease_mapping_with_cause_grouped,
    by = c("collected_all_disease_terms", 
           "icd10_code", 
           "icd10_description")
  )

gwas_study_info <- fread(here::here("output/gwas_cat/gwas_study_info_group_v2.csv"))

gwas_study_info =
  gwas_study_info |>
  select(`DISEASE/TRAIT`, PUBMED_ID, STUDY_ACCESSION, COHORT)

disease_mapping_final =
  left_join(
    disease_mapping_final,
    gwas_study_info
  )

Joining with `by = join_by(`DISEASE/TRAIT`, PUBMED_ID, STUDY_ACCESSION)`

data.table::fwrite(
disease_mapping_final,
here::here("output/icd_map/gwas_disease_to_icd10_mapping_gbd_cat.csv")
)

gbd_data <- data.table::fread(here::here("data/gbd/ihme_gbd_2019_global_disease_burden_rate_all_ages.csv"))

gbd_data = gbd_data |> 
           dplyr::filter(metric == "Rate") |>
           dplyr::filter(measure == "DALYs (Disability-Adjusted Life Years)") |> 
           dplyr::filter(year == 2019)

sessionInfo()

R version 4.3.1 (2023-06-16)
Platform: aarch64-apple-darwin20 (64-bit)
Running under: macOS 15.6.1

Matrix products: default
BLAS:   /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib/libRblas.0.dylib 
LAPACK: /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib/libRlapack.dylib;  LAPACK version 3.11.0

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

time zone: America/Los_Angeles
tzcode source: internal

attached base packages:
[1] stats     graphics  grDevices datasets  utils     methods   base     

other attached packages:
[1] data.table_1.17.8 tidyr_1.3.1       purrr_1.1.0       stringr_1.5.1    
[5] dplyr_1.1.4       workflowr_1.7.1  

loaded via a namespace (and not attached):
 [1] jsonlite_2.0.0    compiler_4.3.1    renv_1.0.3        promises_1.3.3   
 [5] tidyselect_1.2.1  Rcpp_1.1.0        git2r_0.36.2      callr_3.7.6      
 [9] later_1.4.2       jquerylib_0.1.4   readxl_1.4.5      yaml_2.3.10      
[13] fastmap_1.2.0     here_1.0.1        R6_2.6.1          generics_0.1.4   
[17] knitr_1.50        tibble_3.3.0      rprojroot_2.1.0   bslib_0.9.0      
[21] pillar_1.11.0     rlang_1.1.6       utf8_1.2.6        cachem_1.1.0     
[25] stringi_1.8.7     httpuv_1.6.16     xfun_0.52         getPass_0.2-4    
[29] fs_1.6.6          sass_0.4.10       cli_3.6.5         withr_3.0.2      
[33] magrittr_2.0.3    ps_1.9.1          digest_0.6.37     processx_3.8.6   
[37] rstudioapi_0.17.1 lifecycle_1.0.4   vctrs_0.6.5       evaluate_1.0.4   
[41] glue_1.8.0        cellranger_1.1.0  whisker_0.4.1     rmarkdown_2.29   
[45] httr_1.4.7        tools_4.3.1       pkgconfig_2.0.3   htmltools_0.5.8.1

Matching GWAS traits to GBD causes / diseases

Isobel Beasley

2025-10-07