Last updated: 2025-12-28
Checks: 7 passed, 0 failed
Knit directory: genomics_ancest_disease_dispar/
This reproducible R Markdown analysis was created with workflowr (version 1.7.1). The Checks tab describes the reproducibility checks that were applied when the results were created. The Past versions tab lists the development history.
Great! Since the R Markdown file has been committed to the Git repository, you know the exact version of the code that produced these results.
Great job! The global environment was empty. Objects defined in the global environment can affect the analysis in your R Markdown file in unknown ways. For reproducibility it’s best to always run the code in an empty environment.
The command set.seed(20220216) was run prior to running
the code in the R Markdown file. Setting a seed ensures that any results
that rely on randomness, e.g. subsampling or permutations, are
reproducible.
Great job! Recording the operating system, R version, and package versions is critical for reproducibility.
Nice! There were no cached chunks for this analysis, so you can be confident that you successfully produced the results during this run.
Great job! Using relative paths to the files within your workflowr project makes it easier to run your code on other machines.
Great! You are using Git for version control. Tracking code development and connecting the code version to the results is critical for reproducibility.
The results in this page were generated with repository version 4edd22a. See the Past versions tab to see a history of the changes made to the R Markdown and HTML files.
Note that you need to be careful to ensure that all relevant files for
the analysis have been committed to Git prior to generating the results
(you can use wflow_publish or
wflow_git_commit). workflowr only checks the R Markdown
file, but you know if there are other scripts or data files that it
depends on. Below is the status of the Git repository when the results
were generated:
Ignored files:
Ignored: .DS_Store
Ignored: .Rproj.user/
Ignored: .venv/
Ignored: analysis/.DS_Store
Ignored: ancestry_dispar_env/
Ignored: data/.DS_Store
Ignored: data/cohort/
Ignored: data/gbd/.DS_Store
Ignored: data/gbd/IHME-GBD_2021_DATA-d8cf695e-1.csv
Ignored: data/gbd/IHME-GBD_2023_DATA-73cc01fd-1.csv
Ignored: data/gbd/ihme_gbd_2019_global_disease_burden_rate_all_ages.csv
Ignored: data/gbd/ihme_gbd_2019_global_paf_rate_percent_all_ages.csv
Ignored: data/gbd/ihme_gbd_2021_global_disease_burden_rate_all_ages.csv
Ignored: data/gbd/ihme_gbd_2021_global_paf_rate_percent_all_ages.csv
Ignored: data/gwas_catalog/
Ignored: data/icd/.DS_Store
Ignored: data/icd/IHME_GBD_2019_COD_CAUSE_ICD_CODE_MAP_Y2020M10D15.XLSX
Ignored: data/icd/IHME_GBD_2019_NONFATAL_CAUSE_ICD_CODE_MAP_Y2020M10D15.XLSX
Ignored: data/icd/IHME_GBD_2021_COD_CAUSE_ICD_CODE_MAP_Y2024M05D16.XLSX
Ignored: data/icd/IHME_GBD_2021_NONFATAL_CAUSE_ICD_CODE_MAP_Y2024M05D16.XLSX
Ignored: data/icd/UK_Biobank_master_file.tsv
Ignored: data/icd/cdc_valid_icd10_Sep_23_2025.xlsx
Ignored: data/icd/cdc_valid_icd9_Sep_23_2025.xlsx
Ignored: data/icd/hp_umls_mapping.csv
Ignored: data/icd/lancet_conditions_icd10.xlsx
Ignored: data/icd/manual_disease_icd10_mappings.xlsx
Ignored: data/icd/mondo_umls_mapping.csv
Ignored: data/icd/phecode_international_version_unrolled.csv
Ignored: data/icd/phecode_to_icd10_manual_mapping.xlsx
Ignored: data/icd/semiautomatic_ICD-pheno.txt
Ignored: data/icd/semiautomatic_ICD-pheno_UKB_subset.txt
Ignored: human_dictionary/
Ignored: igsr_populations.tsv
Ignored: output/.DS_Store
Ignored: output/abstracts/
Ignored: output/doccano/
Ignored: output/fulltexts/
Ignored: output/gwas_cat/
Ignored: output/gwas_cohorts/
Ignored: output/icd_map/
Ignored: output/trait_ontology/
Ignored: pubmedbert-cohort-ner-model/
Ignored: pubmedbert-cohort-ner/
Ignored: r-spacyr/
Ignored: renv/
Ignored: venv/
Untracked files:
Untracked: code/extract_cdc_meta.R
Untracked: code/figure_4a.R
Untracked: code/poster_figures.R
Untracked: code/umls_ontology.R
Untracked: data/cdc/
Untracked: data/icd/2025AA/
Untracked: data/icd/umls-2025AA-mrconso.zip
Untracked: figures/
Untracked: visualization.Rdata
Unstaged changes:
Modified: analysis/disease_inves_by_ancest.Rmd
Modified: analysis/get_full_text.Rmd
Modified: analysis/group_cancer_diseases.Rmd
Modified: analysis/gwas_to_gbd.Rmd
Modified: analysis/index.Rmd
Modified: analysis/level_1_disease_group_non_cancer.Rmd
Modified: analysis/level_2_disease_group.Rmd
Modified: analysis/manual_trait_map_icd10.Rmd
Modified: analysis/map_trait_to_icd10.Rmd
Modified: analysis/missing_cohort_info.Rmd
Modified: analysis/replication_ancestry_bias.Rmd
Modified: analysis/text_for_cohort_labels.Rmd
Modified: analysis/trait_ontology_categorization.Rmd
Modified: code/custom_plotting.R
Note that any generated files, e.g. HTML, png, CSS, etc., are not included in this status report because it is ok for generated content to have uncommitted changes.
These are the previous versions of the repository in which changes were
made to the R Markdown
(analysis/correcting_cohort_names.Rmd) and HTML
(docs/correcting_cohort_names.html) files. If you’ve
configured a remote Git repository (see ?wflow_git_remote),
click on the hyperlinks in the table below to view the files as they
were in that past version.
| File | Version | Author | Date | Message |
|---|---|---|---|---|
| Rmd | 4edd22a | IJbeasley | 2025-12-28 | Adding more cohorts to cohort dictionary |
| html | bae17f0 | IJbeasley | 2025-12-28 | Build site. |
| Rmd | 9a3bb9b | IJbeasley | 2025-12-28 | Update fixing missing cohort information |
| html | aab4928 | IJbeasley | 2025-10-28 | Build site. |
| Rmd | d088104 | IJbeasley | 2025-10-28 | More cohort name correcting |
| html | 1b07ce8 | IJbeasley | 2025-10-28 | Build site. |
| Rmd | cd3b8d8 | IJbeasley | 2025-10-28 | More cohort name correcting |
| html | 0b43415 | IJbeasley | 2025-10-15 | Build site. |
| Rmd | fcd0501 | IJbeasley | 2025-10-15 | workflowr::wflow_publish("analysis/correcting_cohort_names.Rmd") |
| html | b33ca74 | IJbeasley | 2025-08-21 | Build site. |
| Rmd | ac13d70 | IJbeasley | 2025-08-21 | Updating correcting cohort labels |
| html | 6c592b7 | IJbeasley | 2025-08-20 | Build site. |
| Rmd | 1969e6b | IJbeasley | 2025-08-20 | More corrections / harmonisation of cohort names in gwas catalog |
knitr::opts_chunk$set(echo = TRUE, message = FALSE, warning = FALSE)
library(data.table)
library(dplyr)
library(ggplot2)
library(stringr)
# Load GWAS Catalog studies
gwas_study_info <- fread(here::here("data/gwas_catalog/gwas-catalog-v1.0.3.1-studies-r2025-07-21.tsv"),
sep = "\t", quote = "")
# Standardize column names (remove spaces)
gwas_study_info <- gwas_study_info |>
rename_all(~gsub(" ", "_", .x))
gwas_study_info <- gwas_study_info |>
mutate(COHORT = str_replace_all(COHORT, " \\| ", "|")) |>
mutate(COHORT = str_replace_all(COHORT, "\\| ", "|")) |>
mutate(COHORT = str_replace_all(COHORT, " \\|", "|"))
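The three `str_replace_all()` calls above remove single spaces on either side of the `|` separator. A single regular expression that allows any amount of whitespace around `|` covers the same cases in one pass. This is a sketch; `normalise_separators()` is a hypothetical helper name, not part of the original analysis.

```r
library(stringr)

# Collapse any whitespace around "|" so "A | B", "A |B" and "A|  B" all become "A|B"
normalise_separators <- function(x) {
  str_replace_all(x, "\\s*\\|\\s*", "|")
}

normalise_separators(c("UKBB | FinnGen", "EB |UKBB", "HUNT|  MVP", NA))
```

Missing values pass through unchanged, so the helper can be used inside `mutate()` on the full `COHORT` column.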
# some use commas instead of | to designate multiple cohorts
gwas_study_info <- gwas_study_info |>
mutate(COHORT = str_replace_all(COHORT, ", ", "|"))
# Standardize the "multiple" designation
gwas_study_info <- gwas_study_info |>
mutate(COHORT = str_replace_all(COHORT,
"(\\(Multiple cohorts\\))|(\\(multiple\\))|Multiple",
"multiple"))
# Number of studies whose COHORT is listed as 'multiple'
gwas_study_info |>
filter(grepl("multiple", COHORT)) |>
group_by(COHORT) |>
summarise(n = n())
# A tibble: 1 × 2
COHORT n
<chr> <int>
1 multiple 320
# remove cohorts listed as multiple
gwas_study_info =
gwas_study_info |>
mutate(COHORT = stringr::str_remove_all(pattern = "multiple",
string = COHORT)
)
# Standardize the "other" designation
gwas_study_info <- gwas_study_info |>
mutate(COHORT = str_replace_all(COHORT, "Other", "other")) |>
mutate(COHORT = str_replace_all(COHORT, "OTHER", "other")) |>
mutate(COHORT = str_replace_all(COHORT, "others", "other"))
# Most common COHORT values that include 'other'
gwas_study_info |>
filter(grepl("other", COHORT)) |>
group_by(COHORT) |>
summarise(n = n()) |>
arrange(desc(n)) |>
head()
# A tibble: 6 × 2
COHORT n
<chr> <int>
1 other 8529
2 Lifelines|other 2863
3 ALSPAC|other 1048
4 UPENN|other 63
5 ADNI|ALS|BDC|BIG|BrainSCALE|Generation_R|IMAGEN|NCNG|NESDA|NeuroIMAGE|N… 56
6 other|CADD|DTR|FTC|MyCode|GS:SFHS|GENOA|HUNT|MOBA|NTR|ORCADES|QIMR|STR|… 48
gwas_study_info |>
filter(grepl("other", COHORT)) |>
nrow()
[1] 13209
gwas_study_info =
gwas_study_info |>
mutate(COHORT = stringr::str_remove_all(pattern = "(^other$)|(^other\\|)|(\\|other$)",
string = COHORT)
) |>
mutate(COHORT = stringr::str_replace_all(pattern = "\\|other\\|",
string = COHORT,
replacement = "|")
)
# Remove any remaining "other" (note: this also removes "other" inside longer cohort names)
gwas_study_info =
gwas_study_info |>
mutate(COHORT = stringr::str_remove_all(pattern = "other",
string = COHORT)
)
# not reported:
gwas_study_info |>
filter(grepl("NR", COHORT)) |>
group_by(COHORT) |>
summarise(n = n()) |>
arrange(desc(n)) |>
head()
# A tibble: 6 × 2
COHORT n
<chr> <int>
1 Knight_ADRC|ADNI|Barcelona-1|GR@ACE|DIAN|NR|Stanford_ADRC|PPMI 3608
2 Knight_ADRC|ADNI|Barcelona-1|GR@ACE|DIAN|NR 1725
3 NR 1533
4 Knight_ADRC|ADNI|Barcelona-1|GR@ACE|DIAN|NR|Stanford_ADRC 963
5 Knight_ADRC|ADNI|Barcelona-1|GR@ACE|DIAN|NR|PPMI 559
6 FINRISK 476
gwas_study_info |>
filter(grepl("NR", COHORT)) |>
nrow()
[1] 9261
gwas_study_info =
gwas_study_info |>
dplyr::mutate(COHORT = stringr::str_remove_all(pattern ="(^NR$)|(^NR\\|)|(\\|NR$)",
string = COHORT)) |>
mutate(COHORT = stringr::str_replace_all(pattern = "(\\|NR\\|)",
string = COHORT,
replacement = "|")
)
# Remove trailing "|" separators left behind by the removals above
gwas_study_info =
gwas_study_info |>
mutate(COHORT = str_remove_all(pattern = "\\|$",
string = COHORT))
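Removing placeholders such as "multiple", "other" and "NR" with regexes on the whole string needs separate patterns for the start, middle and end positions, and can still leave empty entries: for example, `"multiple|UKBB"` becomes `"|UKBB"` after `str_remove_all("multiple")`, and only trailing separators are stripped above. One alternative is to work on individual cohort tokens: split on `|`, drop placeholder and empty tokens, and rejoin. This is a minimal sketch; `drop_placeholders()` is a hypothetical helper, and the default placeholder set is an assumption based on the values cleaned in this section. Because it compares whole tokens, it does not touch "other" inside a longer cohort name.

```r
library(stringr)

# Drop placeholder tokens and empty entries from "|"-separated cohort strings
drop_placeholders <- function(cohort, placeholders = c("multiple", "other", "NR")) {
  vapply(cohort, function(s) {
    if (is.na(s)) return(NA_character_)
    # split on a literal "|" and trim whitespace around each token
    tokens <- str_trim(str_split(s, fixed("|"))[[1]])
    keep <- tokens != "" & !(tokens %in% placeholders)
    paste(tokens[keep], collapse = "|")
  }, character(1), USE.NAMES = FALSE)
}

drop_placeholders(c("multiple|UKBB", "A|other|B", "NR", "Lifelines|other|"))
```

A study whose only cohort was a placeholder ends up with an empty string, matching what the regex approach produces for `"NR"` on its own.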
all_cohorts = gwas_study_info$COHORT
all_cohorts = unlist(strsplit(all_cohorts, "\\|"))
unique(all_cohorts) |> length()
[1] 1174
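The same three lines that split `COHORT` and count distinct names are repeated after each round of corrections. A small helper could keep these checks consistent. This is a sketch; `count_unique_cohorts()` is a hypothetical name. With the default `drop_empty = FALSE` it reproduces the inline `strsplit()`/`unique()` check; setting `drop_empty = TRUE` also discards empty strings and `NA`, so its counts can be slightly lower than those printed in this document.

```r
# Count distinct cohort names in a "|"-separated character vector
count_unique_cohorts <- function(cohort, drop_empty = FALSE) {
  tokens <- unlist(strsplit(cohort, "\\|"))
  if (drop_empty) tokens <- tokens[!is.na(tokens) & tokens != ""]
  length(unique(tokens))
}

# "|EB" splits into "" and "EB", so the empty string is counted by default
count_unique_cohorts(c("UKBB|FinnGen", "UKBB", "|EB"))
count_unique_cohorts(c("UKBB|FinnGen", "UKBB", "|EB"), drop_empty = TRUE)
```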
# Correct for discrepancies within the same paper
gwas_study_info <- gwas_study_info |>
mutate(COHORT = str_replace_all(COHORT, "AWI-Gen", "AWI-GEN")) |> # PUBMED_ID: 40229280
mutate(COHORT = str_replace_all(COHORT, "AddHealth", "Add Health")) |> # PUBMED_ID: 37494057
mutate(COHORT = str_replace_all(COHORT, fixed("EB|FinnGen|UKBB"), "EB|FinnGen|UKB")) |> # PUBMED_ID: 39067062
mutate(COHORT = str_replace_all(COHORT, "Estonian Biobank", "EB")) |> # PUBMED_ID: 39500877
mutate(COHORT = str_replace_all(COHORT, "AWIGEN", "AWI-GEN")) # PUBMED_ID: 40229280
# Make TwinsUK consistent
gwas_study_info <- gwas_study_info |>
mutate(COHORT = str_replace_all(COHORT, "TWINS-UK|TWINSUK", "TwinsUK"))
# Make EPIC-Norfolk consistent
gwas_study_info <- gwas_study_info |>
mutate(COHORT = str_replace_all(COHORT, "EPIC-Norfolk cohort", "EPIC-Norfolk"))
# Make eMERGE consistent
gwas_study_info <- gwas_study_info |>
mutate(COHORT = str_replace_all(COHORT, "EMERGE", "eMERGE"))
# Make twingene consistent
gwas_study_info <- gwas_study_info |>
mutate(COHORT = str_replace_all(COHORT, "TWINGENE", "TwinGene"))
# Make QSkin consistent
gwas_study_info <- gwas_study_info |>
mutate(COHORT = str_replace_all(COHORT, "QSkin|Qskin", "QSKIN"))
# Make 23andme consistent
gwas_study_info <- gwas_study_info |>
mutate(COHORT = str_replace_all(COHORT, "23ANDME", "23andMe"))
# Make PopGen consistent
gwas_study_info <- gwas_study_info |>
mutate(COHORT = str_replace_all(COHORT, "PopGen", "POPGEN"))
# Make decode consistent
gwas_study_info <- gwas_study_info |>
mutate(COHORT = str_replace_all(COHORT, "DECODE|deCode|DeCODE", "deCODE"))
# Make FinnGen consistent
gwas_study_info <- gwas_study_info |>
mutate(COHORT = str_replace_all(COHORT, "Finngen|FINNGEN", "FinnGen"))
gwas_study_info <- gwas_study_info |>
mutate(COHORT = str_replace_all(COHORT, "genomicc", "GenOMICC")) |>
mutate(COHORT = str_replace_all(COHORT, "IPSYCH", "iPSYCH")) |>
mutate(COHORT = str_replace_all(COHORT, "SIMES", "SiMES")) |>
mutate(COHORT = str_replace_all(COHORT, "HELIX", "Helix"))
gwas_study_info = gwas_study_info |>
mutate(COHORT = str_replace_all(COHORT, "FINLAND", "Finland"))
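Each consistency fix above is a regex substitution applied to the whole `COHORT` string, so a pattern such as `"EMERGE"` or `"HELIX"` would also rewrite any longer cohort name that happens to contain it. An alternative, sketched here under the assumption that a variant should only be replaced when it is the entire cohort token, is to keep the mappings in one lookup table and apply it token by token. `cohort_map` and `harmonise_tokens()` are hypothetical names; the map entries are taken from the replacements above.

```r
library(stringr)

# Lookup table: variant spelling -> canonical spelling (entries from the fixes above)
cohort_map <- c(
  "TWINS-UK" = "TwinsUK", "TWINSUK" = "TwinsUK",
  "EMERGE" = "eMERGE",
  "Finngen" = "FinnGen", "FINNGEN" = "FinnGen",
  "DECODE" = "deCODE", "deCode" = "deCODE", "DeCODE" = "deCODE",
  "HELIX" = "Helix"
)

# Replace a token only when it matches a map key exactly; other tokens are left as-is
harmonise_tokens <- function(cohort, map) {
  vapply(cohort, function(s) {
    if (is.na(s)) return(NA_character_)
    tokens <- str_split(s, fixed("|"))[[1]]
    hit <- tokens %in% names(map)
    tokens[hit] <- map[tokens[hit]]
    paste(tokens, collapse = "|")
  }, character(1), USE.NAMES = FALSE)
}

# "HELIX-Study" is not an exact key, so it is left unchanged
harmonise_tokens(c("EMERGE|FINNGEN", "HELIX-Study|UKBB"), cohort_map)
```

Keeping all mappings in one named vector also makes it easy to review or export the full list of corrections.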
all_cohorts = gwas_study_info$COHORT
all_cohorts = unlist(strsplit(all_cohorts, "\\|"))
unique(all_cohorts) |> length()
[1] 1150
# CARDIoGRAMplusC4D combines the CARDIoGRAM and C4D cohorts
# (see: https://cardiogramplusc4d.org/data-downloads/),
# so for coding purposes we split it into CARDIoGRAM|C4D
gwas_study_info = gwas_study_info |>
mutate(COHORT = str_replace_all(COHORT, "CARDIoGRAMplusC4D", "CARDIoGRAM|C4D"))
all_cohorts[grep("ukb", tolower(all_cohorts))] |> unique()
[1] "UKB" "UKBB" "UKBB White British"
[4] "UKBS" "UKB-PPP"
gwas_study_info |>
filter(grepl("UKBS", COHORT)) |>
select(PUBMED_ID, COHORT) |>
distinct()
PUBMED_ID
<int>
1: 37653029
2: 34127860
COHORT
<char>
1: GenEPA|CHOP|EPICURE|HBCS|KORA|ILM|PoBI|POPGEN|TSS|UKBS
2: BC58|BDA|NIHR Cambridge BioResource|GRID|UKBS|BRI|CLEAR|EDIC|GoKinD|NYCP|NIMH|SEARCH|TrialNet|T1DGC|UAB|UC|UCSF|IDDMGEN|T1DGEN|MCW|GRID-NI|Young Hearts-NI|Steno Diabetes Center|HSG|HapMap
# For PUBMED_ID 37653029,
# UKBS appears to refer to UK Biobank
# For PUBMED_ID 34127860,
# UKBS is the UK Blood Service (UKBS)
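These notes show that the same abbreviation (`UKBS`) can mean different cohorts in different papers, so a global find-and-replace cannot resolve it. If one wanted to act on them, one option is a replacement conditioned on `PUBMED_ID`. The sketch below is hypothetical: it runs on a toy table whose `COHORT` strings are shortened versions of the two studies above, and it assumes the tentative reading that `UKBS` in 37653029 means UK Biobank.

```r
library(dplyr)
library(stringr)

# Toy rows mirroring the two studies above (COHORT strings shortened)
toy <- tibble(
  PUBMED_ID = c(37653029L, 34127860L),
  COHORT = c("KORA|PoBI|UKBS", "BC58|UKBS|GRID")
)

# Recode UKBS -> UKBB only for the study where UKBS appears to mean UK Biobank;
# \\b keeps the match to the whole UKBS token
toy <- toy |>
  mutate(COHORT = if_else(PUBMED_ID == 37653029L,
                          str_replace_all(COHORT, "\\bUKBS\\b", "UKBB"),
                          COHORT))
toy$COHORT
```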
gwas_study_info |>
filter(grepl("UKB-PPP", COHORT)) |>
select(PUBMED_ID, COHORT) |>
distinct()
PUBMED_ID COHORT
<int> <char>
1: 37794183 UKB-PPP
# PUBMED_ID 37794183: UKB-PPP (UK Biobank Pharma Proteomics Project) is UK Biobank protein data
gwas_study_info <- gwas_study_info |>
mutate(COHORT = ifelse(COHORT == "UKB-PPP", "UKB", COHORT)) |>
mutate(COHORT = str_replace_all(COHORT, "UKBB White British", "UKB")) |>
mutate(COHORT = gsub("\\bUKB\\b", "UKBB", COHORT))
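The order of the steps above matters because `\\b` treats a hyphen as a word boundary: the same pattern leaves `"UKBB"` alone but turns `"UKB-PPP"` into `"UKBB-PPP"`. Recoding `UKB-PPP` first by exact match avoids this, but only when `UKB-PPP` is the whole `COHORT` value; inside a multi-cohort string it would still become `UKBB-PPP`. A quick check with the same base R call used above:

```r
# How the word-boundary pattern treats neighbouring characters
x <- c("UKB", "UKBB", "UKB|FinnGen", "UKB-PPP")
gsub("\\bUKB\\b", "UKBB", x)
```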
all_cohorts = gwas_study_info$COHORT
all_cohorts = unlist(strsplit(all_cohorts, "\\|"))
unique(all_cohorts) |> length()
[1] 1147
# NIHR Cambridge BioResource and NIHR BIORESOURCE appear to be the same cohort
# https://www.cambridgebioresource.group.cam.ac.uk/
gwas_study_info |>
filter(grepl("NIHR Cambridge BioResource", COHORT)) |>
select(PUBMED_ID, COHORT)
PUBMED_ID
<int>
1: 34127860
2: 34127860
COHORT
<char>
1: BC58|BDA|NIHR Cambridge BioResource|GRID|UKBS|BRI|CLEAR|EDIC|GoKinD|NYCP|NIMH|SEARCH|TrialNet|T1DGC|UAB|UC|UCSF|IDDMGEN|T1DGEN|MCW|GRID-NI|Young Hearts-NI|Steno Diabetes Center|HSG|HapMap
2: BC58|BDA|NIHR Cambridge BioResource|GRID|UKBS|BRI|CLEAR|EDIC|GoKinD|NYCP|NIMH|SEARCH|TrialNet|T1DGC|UAB|UC|UCSF|IDDMGEN|T1DGEN|MCW|GRID-NI|Young Hearts-NI|Steno Diabetes Center|HSG|HapMap
gwas_study_info |>
filter(grepl("NIHR BIORESOURCE", COHORT)) |>
select(PUBMED_ID, COHORT) |>
distinct()
PUBMED_ID
<int>
1: 39891803
2: 40205036
COHORT
<char>
1: UKBB|CHARGE|ALSPAC|NIHR BIORESOURCE
2: arcOGEN|ARGO|UKHLS|China Kadoorie Biobank|deCODE|CHB|DBDS|eMERGE|EB|FinnGen|MyCode|GS:SFHS|HRS|HKDDDPC|HUNT|Bunkyo|HerediGene|RIKEN|Shimane-CoHRE|JOCO|LifeLines|NEO|NHS|MGBB|QIMR|RS|SHIP|SIMPLER|ToMMo|TwinsUK|UKBB|BioMe|G&H|NIHR BIORESOURCE|MVP|OAI
gwas_study_info |>
filter(grepl(tolower("BIORESOURCE"), tolower(COHORT))) |>
select(PUBMED_ID, COHORT) |>
distinct()
PUBMED_ID
<int>
1: 39891803
2: 40205036
3: 34127860
COHORT
<char>
1: UKBB|CHARGE|ALSPAC|NIHR BIORESOURCE
2: arcOGEN|ARGO|UKHLS|China Kadoorie Biobank|deCODE|CHB|DBDS|eMERGE|EB|FinnGen|MyCode|GS:SFHS|HRS|HKDDDPC|HUNT|Bunkyo|HerediGene|RIKEN|Shimane-CoHRE|JOCO|LifeLines|NEO|NHS|MGBB|QIMR|RS|SHIP|SIMPLER|ToMMo|TwinsUK|UKBB|BioMe|G&H|NIHR BIORESOURCE|MVP|OAI
3: BC58|BDA|NIHR Cambridge BioResource|GRID|UKBS|BRI|CLEAR|EDIC|GoKinD|NYCP|NIMH|SEARCH|TrialNet|T1DGC|UAB|UC|UCSF|IDDMGEN|T1DGEN|MCW|GRID-NI|Young Hearts-NI|Steno Diabetes Center|HSG|HapMap
gwas_study_info = gwas_study_info |>
mutate(COHORT = str_replace_all(COHORT, "NIHR Cambridge BioResource|NIHR BIORESOURCE" , "NIHR BioResource"))
# "Leivin Biobank" appears to be a typo for Living Biobank
# see PUBMED_ID 34059833; https://pmc.ncbi.nlm.nih.gov/articles/PMC7610958/#SD1
gwas_study_info |> filter(grepl("Leivin Biobank", COHORT))
DATE_ADDED_TO_CATALOG PUBMED_ID FIRST_AUTHOR DATE JOURNAL
<IDat> <int> <char> <IDat> <char>
1: 2021-06-10 34059833 Chen J 2021-05-31 Nat Genet
LINK
<char>
1: www.ncbi.nlm.nih.gov/pubmed/34059833
STUDY DISEASE/TRAIT
<char> <char>
1: The trans-ancestral genomic architecture of glycemic traits. Fasting glucose
INITIAL_SAMPLE_SIZE REPLICATION_SAMPLE_SIZE
<char> <char>
1: 35,619 East Asian ancestry individuals <NA>
PLATFORM_[SNPS_PASSING_QC] ASSOCIATION_COUNT
<char> <int>
1: Affymetrix, Illumina [15438438] (imputed) 15
MAPPED_TRAIT MAPPED_TRAIT_URI STUDY_ACCESSION
<char> <char> <char>
1: glucose measurement http://www.ebi.ac.uk/efo/EFO_0004468 GCST90002231
GENOTYPING_TECHNOLOGY
<char>
1: Genome-wide genotyping array, Targeted genotyping array [Genome-wide genotyping array|Metabochip]
SUBMISSION_DATE STATISTICAL_MODEL BACKGROUND_TRAIT MAPPED_BACKGROUND_TRAIT
<lgcl> <lgcl> <lgcl> <char>
1: NA NA NA
MAPPED_BACKGROUND_TRAIT_URI
<char>
1:
COHORT
<char>
1: AASC|BES|CAGE-GWAS1|CAGE|CLHNS|CHNS|KARE|Leivin Biobank|MESA|Nagahama Study|NHAPC|SCES|SiMES|SP2|TAICHI|CRC|SBCS|SMHS
FULL_SUMMARY_STATISTICS
<char>
1: yes
SUMMARY_STATS_LOCATION
<char>
1: http://ftp.ebi.ac.uk/pub/databases/gwas/summary_statistics/GCST90002001-GCST90003000/GCST90002231
GXE
<char>
1: no
gwas_study_info = gwas_study_info |>
mutate(COHORT = str_replace_all(COHORT, "Leivin Biobank", "Living Biobank"))
gwas_study_info |>
filter(grepl("Ghana", COHORT)) |>
select(PUBMED_ID,COHORT) |>
distinct()
PUBMED_ID COHORT
<int> <char>
1: 39358599 MADCaP|Ghana_Prostate|PRACTICAL
2: 36872133 AAPC|ELLIPSE|Ghana|eMERGE|BioVU|BioMe|MVP|ProHealth
# Both papers refer to the same cohort:
# PUBMED_ID: 36872133 https://pmc.ncbi.nlm.nih.gov/articles/PMC10424812/#S9
# PUBMED_ID: 39358599 https://www.nature.com/articles/s41588-024-01931-3#Sec12
gwas_study_info = gwas_study_info |>
mutate(COHORT = str_replace_all(COHORT, "Ghana_Prostate", "Ghana"))
# From the supplementary information: https://pmc.ncbi.nlm.nih.gov/articles/instance/7611832/bin/EMS136340-supplement-Supplementary_Information.pdf
# for PUBMED_ID 34349265,
# SARDINIA appears to be the same cohort as SardiNIA, so the two should be merged
gwas_study_info |>
filter(grepl("SARDINIA", COHORT)) |>
select(PUBMED_ID, COHORT) |>
distinct()
PUBMED_ID
<int>
1: 34349265
COHORT
<char>
1: ALSPAC|ARIC|CHS|CILENTO|COLAUS|EGCUT|EPIC-Norfolk|FHS|INGI-FVG|GS:SFHS|HealthABC|HRS|INCHIANTI|InterAct|KORA|LifeLines|NEO|NHS|NTR|ORCADES|QIMR|RS|SARDINIA|SHIP|SHIP-TREND|TwinGene|TwinsUK|INGI-Val_Borbera|WGHS|WHI|BCAC|UKBB
gwas_study_info |>
filter(grepl("SardiNIA", COHORT)) |>
select(PUBMED_ID, COHORT) |>
distinct()
PUBMED_ID
<int>
1: 36477530
2: 36477530
3: 36477530
4: 36477530
5: 36477530
6: 36477530
7: 36477530
8: 36477530
9: 36477530
10: 36477530
11: 36376304
12: 36050321
13: 36050321
14: 34718232
15: 32929287
COHORT
<char>
1: 23andMe|ALSPAC|ARIC|CADD|deCODE|EGCUT|eMERGE|Harvard|HRS|HUNT|MCTFR|MESA|METSIM|NTR|SardiNIA|UKBB|NINDS|FINRISK|AMISH|GeneSTAR|GOLDN|CHS|HVH|JHS|WGHS|WHI|GFG
2: 23andMe|ALSPAC|ARIC|CADD|COGEND|COPDGene|deCODE|EGCUT|Harvard|HRS|HUNT|METSIM|NTR|QIMR|SardiNIA|UKBB|FINRISK|AMISH|CFS|ECLIPSE|GeneSTAR|GOLDN|WHI
3: 23andMe|ALSPAC|ARIC|CADD|COGEND|COPDGene|deCODE|EGCUT|Harvard|HRS|HUNT|MCTFR|MESA|METSIM|NTR|PAGE|QIMR|SardiNIA|UKBB|FINRISK|AMISH|CFS|ECLIPSE|GeneSTAR|GOLDN|CHS|HCHS|SOL|WHI
4: 23andMe|ALSPAC|ARIC|CADD|COGEND|COPDGene|deCODE|EGCUT|eMERGE|Harvard|HUNT|MCTFR|METSIM|NTR|SardiNIA|UKBB|NINDS|FINRISK|AMISH|CFS|ECLIPSE|GeneSTAR|GOLDN|CHS|HCHS|SOL|HVH|WGHS|WHI
5: 23andMe|ALSPAC|ARIC|CADD|COGEND|deCODE|EGCUT|GERA|Harvard|HRS|HUNT|MCTFR|MESA|METSIM|NTR|QIMR|SardiNIA|UKBB|FINRISK|WHI
6: 23andMe|ALSPAC|ARIC|BLTS|CADD|deCODE|EGCUT|eMERGE|GFG|Harvard|HRS|HUNT|MCTFR|MESA|METSIM|NTR|SardiNIA|UKBB|WHI|FINRISK|NINDS|BBJ|CKB|AMISH|CFS|CHS|GENSalt|GOLDN|HCHS|SOL|HVH|HyperGEN|JHS|GeneSTAR|GENOA|SARP|WGHS
7: 23andMe|ALSPAC|ARIC|BLTS|CADD|COGEND|COPDGene|deCODE|EGCUT|GFG|Harvard|HRS|HUNT|MESA|METSIM|NTR|OZALC|SardiNIA|UKBB|WHI|FINRISK|BBJ|CKB|AMISH|CFS|ECLIPSE|GENSalt|GOLDN|HyperGEN|JHS|GeneSTAR|GENOA
8: 23andMe|ALSPAC|ARIC|BLTS|CADD|COGEND|COPDGene|deCODE|EGCUT|GFG|Harvard|HRS|HUNT|MCTFR|MESA|METSIM|NTR|OZALC|SardiNIA|UKBB|WHI|FINRISK|PAGE|BBJ|CKB|AMISH|CFS|CHS|ECLIPSE|GENSalt|GOLDN|HCHS|SOL|HyperGEN|JHS|GeneSTAR|GENOA
9: 23andMe|ALSPAC|ARIC|BLTS|CADD|COGEND|COPDGene|deCODE|EGCUT|eMERGE|GFG|Harvard|HUNT|MCTFR|MESA|METSIM|NTR|SardiNIA|UKBB|WHI|FINRISK|NINDS|BBJ|CKB|AMISH|CFS|CHS|ECLIPSE|GENSalt|GOLDN|HCHS|SOL|HVH|HyperGEN|JHS|GeneSTAR|GENOA|WGHS
10: 23andMe|ALSPAC|ARIC|CADD|COGEND|deCODE|EGCUT|GERA|GFG|Harvard|HRS|HUNT|MCTFR|MESA|METSIM|NTR|OZALC|SardiNIA|UKBB|WHI|FINRISK|BBJ|CKB
11: 23andMe|ALSPAC|ARIC|BLS|CADD|COGEND|COPDGene|deCODE|EGCUT|FHS|FTC|GERA|GFG|Harvard|HRS|HUNT|MCTFR|MESA|METSIM|NESCOG|FTC|NAG-FIN|NTR|QIMR|SardiNIA|UKBB|WHI
12: ARIC|BioMe|BRIGHT|CHRIS|CHS|ERF|FINCAVAS|GAPP|HCHS|SOL|HealthABC|INGI-Carlantino|INGI-FVG|Inter99|JHS|KORA|LifeLines|MESA|NEO|OOA|ORCADES|PIVUS|PREVEND|PROSPER|RS|SardiNIA|SHIP|TwinsUK|UKBB|VIKING|WHI|YFS
13: ARIC|BioMe|BRIGHT|CHRIS|CHS|ERF|FINCAVAS|GAPP|HealthABC|INGI-Carlantino|INGI-FVG|Inter99|KORA|LifeLines|MESA|NEO|OOA|ORCADES|PIVUS|PREVEND|PROSPER|RS|SardiNIA|SHIP|TwinsUK|UKBB|VIKING|WHI|YFS
14: SardiNIA
15: SardiNIA
gwas_study_info = gwas_study_info |>
mutate(COHORT = str_replace_all(COHORT, "SARDINIA", "SardiNIA"))
gwas_study_info |>
filter(grepl("Sardinia", COHORT)) |>
select(PUBMED_ID, COHORT) |>
distinct()
PUBMED_ID
<int>
1: 33830302
COHORT
<char>
1: GRID|British 1958 birth cohort|National blood service|WTCCC - Bipolar disease cases|Oxford Regional Prospective Study of Childhood Diabetes (ORPS)|Sardinia case-control
# Not sure whether "Sardinia case-control" is the same cohort as SardiNIA ...
# see the second supplementary table in https://pmc.ncbi.nlm.nih.gov/articles/PMC8099827/#_ad93_
# Sardinia
For PUBMED_ID 32949544, the COHORT field seems to list ancestry groups rather than cohorts (e.g. UKBB is used in this study).
See the cohort information here: https://pmc.ncbi.nlm.nih.gov/articles/instance/8220892/bin/NIHMS1709432-supplement-Supp_Materials.pdf
gwas_study_info |>
dplyr::filter(PUBMED_ID == 32949544)
DATE_ADDED_TO_CATALOG PUBMED_ID FIRST_AUTHOR DATE JOURNAL
<IDat> <int> <char> <IDat> <char>
1: 2020-10-01 32949544 Jones E 2020-09-16 Lancet Neurol
LINK
<char>
1: www.ncbi.nlm.nih.gov/pubmed/32949544
STUDY
<char>
1: Identification of novel risk loci and causal insights for sporadic Creutzfeldt-Jakob disease: a genome-wide association study.
DISEASE/TRAIT
<char>
1: Creutzfeldt-Jakob disease (sporadic)
INITIAL_SAMPLE_SIZE
<char>
1: 4,110 European ancestry cases, 13,569 European ancestry controls
REPLICATION_SAMPLE_SIZE
<char>
1: 1,098 European ancestry cases, 498 ,016 European ancestry controls
PLATFORM_[SNPS_PASSING_QC] ASSOCIATION_COUNT
<char> <int>
1: Affymetrix, Illumina [6314492] (imputed) 4
MAPPED_TRAIT MAPPED_TRAIT_URI
<char> <char>
1: sporadic Creutzfeld Jacob disease http://www.ebi.ac.uk/efo/EFO_1000656
STUDY_ACCESSION GENOTYPING_TECHNOLOGY SUBMISSION_DATE
<char> <char> <lgcl>
1: GCST90001389 Genome-wide genotyping array NA
STATISTICAL_MODEL BACKGROUND_TRAIT MAPPED_BACKGROUND_TRAIT
<lgcl> <lgcl> <char>
1: NA NA
MAPPED_BACKGROUND_TRAIT_URI
<char>
1:
COHORT
<char>
1: Dutch controls|French controls|German controls|Italian controls|Spanish controls|UK controls|US controls|UK sCJD cases|US sCJD cases|German sCJD cases| sCJD cases
FULL_SUMMARY_STATISTICS
<char>
1: yes
SUMMARY_STATS_LOCATION
<char>
1: http://ftp.ebi.ac.uk/pub/databases/gwas/summary_statistics/GCST90001001-GCST90002000/GCST90001389
GXE
<char>
1: no
gwas_study_info =
rows_update(gwas_study_info ,tibble(PUBMED_ID = 32949544, COHORT = "multiple"), unmatched = "ignore")
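`rows_update()` matches rows on the key columns (by default the first column of `y`, here `PUBMED_ID`) and overwrites the remaining columns of every matching row in `x`; only the keys in `y` need to be unique. `unmatched = "ignore"` skips keys in `y` that are absent from `x` instead of raising an error. A toy check of that behaviour, with `by` given explicitly (the data are made up for illustration):

```r
library(dplyr)

x <- tibble(PUBMED_ID = c(1L, 1L, 2L), COHORT = c("A|B", "", "C"))
y <- tibble(PUBMED_ID = c(1L, 99L), COHORT = "multiple")

# Both rows of x with PUBMED_ID 1 are updated; key 99 is not in x and is ignored
updated <- rows_update(x, y, by = "PUBMED_ID", unmatched = "ignore")
updated$COHORT
```

This is why a single `tibble()` row per study is enough to overwrite a study that appears on several rows of the catalog.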
gwas_study_info |>
filter(grepl("Multiethnic samples from the UK", COHORT)) |>
select(PUBMED_ID, COHORT) |>
distinct()
PUBMED_ID COHORT
<int> <char>
1: 33649486 Multiethnic samples from the UK
gwas_study_info |>
filter(PUBMED_ID == 33649486)
DATE_ADDED_TO_CATALOG PUBMED_ID FIRST_AUTHOR DATE JOURNAL
<IDat> <int> <char> <IDat> <char>
1: 2021-03-22 33649486 Hardcastle AJ 2021-03-01 Commun Biol
LINK
<char>
1: www.ncbi.nlm.nih.gov/pubmed/33649486
STUDY
<char>
1: A multi-ethnic genome-wide association study implicates collagen matrix integrity and cell differentiation pathways in keratoconus.
DISEASE/TRAIT
<char>
1: Keratoconus
INITIAL_SAMPLE_SIZE
<char>
1: 2,116 European ancestry cases, 24,626 European ancestry controls
REPLICATION_SAMPLE_SIZE
<char>
1: 1, 389 European ancestry cases, 79,727 European ancestry controls, 759 South Asian ancestry cases, 8,009 South Asian ancestry controls, 405 African ancestry cases, 4,185 African ancestry controls
PLATFORM_[SNPS_PASSING_QC] ASSOCIATION_COUNT MAPPED_TRAIT
<char> <int> <char>
1: Affymetrix [7701190] (imputed) 36 keratoconus
MAPPED_TRAIT_URI STUDY_ACCESSION
<char> <char>
1: http://purl.obolibrary.org/obo/MONDO_0015486 GCST90013442
GENOTYPING_TECHNOLOGY SUBMISSION_DATE STATISTICAL_MODEL
<char> <lgcl> <lgcl>
1: Genome-wide genotyping array NA NA
BACKGROUND_TRAIT MAPPED_BACKGROUND_TRAIT MAPPED_BACKGROUND_TRAIT_URI
<lgcl> <char> <char>
1: NA
COHORT FULL_SUMMARY_STATISTICS
<char> <char>
1: Multiethnic samples from the UK yes
SUMMARY_STATS_LOCATION
<char>
1: http://ftp.ebi.ac.uk/pub/databases/gwas/summary_statistics/GCST90013001-GCST90014000/GCST90013442
GXE
<char>
1: no
# In this study, the discovery-stage
# controls come from UKBB, while
# cases were recruited from various places across the UK, so we record the cohort as UKBB|other
gwas_study_info =
rows_update(gwas_study_info ,tibble(PUBMED_ID = 33649486, COHORT = "UKBB|other"),
unmatched = "ignore")
# "GAINT" appears to be a typo for GIANT
# see PUBMED_ID: 36376304 (https://pmc.ncbi.nlm.nih.gov/articles/PMC9663411/)
gwas_study_info |>
filter(grepl("GAINT", COHORT)) |>
select(PUBMED_ID, COHORT) |>
distinct()
PUBMED_ID COHORT
<int> <char>
1: 36376304 UKBB|GAINT
gwas_study_info = gwas_study_info |>
mutate(COHORT = str_replace_all(COHORT, "GAINT", "GIANT"))
unique(all_cohorts)[grepl("1982", unique(all_cohorts))]
[1] "1982 PELOTAS"
[2] "1982 Pelotas (Brazil) Birth Cohort Study"
gwas_study_info |>
filter(grepl("1982", COHORT)) |>
select(PUBMED_ID, COHORT) |>
distinct()
PUBMED_ID
<int>
1: 40537477
2: 39885687
3: 35399580
4: 35399580
COHORT
<char>
1: ARIC|CARDIA|CHS|GENOA|HABC|HANDLS|JHS|MESA|WHI|SP2|BAEPENDI|1982 PELOTAS|AGES|ERF|FHS|HyperGEN|NEO|RS|WHI-GARNET|GeneSTAR|HRS|SMHS|SWHS|CoLaus|KORA|LBC|Lifelines|NESDA|SHIP-Trend|TRAILS|YFS|SOL
2: ZOE2.0|SLS|BioVU|MyCode|VFA|SOLYouth|1982 PELOTAS|CCHC|EGG|MOBA
3: BioMe|Baependi|CANDELA|NC-BCFR|SFBCS|FIND|HCHS|SOL|Los Angeles Latino Eye Study|MEC|MESA|Mexico City 1|Mexico City 2|MHS|1982 Pelotas (Brazil) Birth Cohort Study|SAFS|STARR COUNTY|T2D SIGMA Studies|WHI
4: BioMe|Baependi|CANDELA|NC-BCFR|SFBCS|FIND|HCHS|SOL|Los Angeles Latino Eye Study|MEC|MESA|Mexico City 1|Mexico City 2|MHS|1982 Pelotas (Brazil) Birth Cohort Study|SAFS|STARR COUNTY|T2D SIGMA Studies|WHI|AAAGC|GIANT
# Confirmed for PUBMED_ID 39885687 (https://pmc.ncbi.nlm.nih.gov/articles/PMC11875162/):
# 1982 PELOTAS refers to the 1982 Pelotas (Brazil) Birth Cohort Study
# Confirmed for PUBMED_ID 40537477 (https://pmc.ncbi.nlm.nih.gov/articles/PMC12179276/#MOESM2):
# 1982 PELOTAS refers to the 1982 Pelotas (Brazil) Birth Cohort Study
# fixed() matches the parentheses literally; as a regex, "(Brazil)" is a capture group and the name would not match
gwas_study_info = gwas_study_info |>
mutate(COHORT = str_replace_all(COHORT, fixed("1982 Pelotas (Brazil) Birth Cohort Study"), "1982 PELOTAS"))
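Because `str_replace_all()` interprets its pattern as a regular expression, the parentheses in `"(Brazil)"` form a capture group rather than matching literal parentheses, so the plain pattern looks for `"1982 Pelotas Brazil Birth Cohort Study"` and never matches the actual cohort name. Wrapping the pattern in `fixed()` (or escaping the parentheses) makes the match literal:

```r
library(stringr)

name <- "1982 Pelotas (Brazil) Birth Cohort Study"

# Regex pattern: "(Brazil)" is a group, so the literal parentheses are not matched
str_replace_all(name, "1982 Pelotas (Brazil) Birth Cohort Study", "1982 PELOTAS")

# fixed(): the pattern is matched character for character
str_replace_all(name, fixed("1982 Pelotas (Brazil) Birth Cohort Study"), "1982 PELOTAS")
```

The same applies to the `"Qatar Genome Program (QGP)"` replacement further down; escaping with `\\(` and `\\)`, as done later for the Pelotas name, is an equivalent fix.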
gwas_study_info |>
filter(grepl("\\bPELOTAS\\b", COHORT)) |>
filter(!grepl("1982", COHORT)) |>
select(PUBMED_ID, COHORT) |>
distinct()
PUBMED_ID COHORT
<int> <char>
1: 34059833 BioMe|IRAS|MESA|PELOTAS|HCHS|SOL
# Confirmed for PUBMED_ID 34059833:
# PELOTAS refers to the 1982 PELOTAS study
gwas_study_info = gwas_study_info |>
mutate(COHORT = str_replace_all(COHORT, "(?<!1982 )PELOTAS", "1982 PELOTAS"))
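The negative lookbehind `(?<!1982 )` allows a match only when `PELOTAS` is not already preceded by `1982 `, so the replacement never produces `1982 1982 PELOTAS` and is safe to re-run. A quick check on strings taken from the outputs above:

```r
library(stringr)

x <- c("BioMe|IRAS|MESA|PELOTAS|HCHS|SOL", "SP2|1982 PELOTAS|AGES")
recoded <- str_replace_all(x, "(?<!1982 )PELOTAS", "1982 PELOTAS")
recoded

# Applying the same replacement again leaves the result unchanged
str_replace_all(recoded, "(?<!1982 )PELOTAS", "1982 PELOTAS")
```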
# I notice that some studies list the cohort as
# GR@CE and others as GR@ACE - perhaps these are the same?
# Checking GR@CE first, as fewer studies list it ...
gwas_study_info |>
filter(grepl("GR@CE", COHORT)) |>
select(PUBMED_ID, COHORT) |>
distinct()
PUBMED_ID
<int>
1: 35379992
2: 39046104
3: 39046104
COHORT
<char>
1: EADB|GR@CE|EADI|GERAD|PERADES|DemGene|Bonn|RS|CCHS|UKBB
2: 3C|AGES|ARIC|ASPREE|CHS|FVG|FHS|GR@CE|Apulia|HKOS|HUNT|MEMENTO|MYHAT|ROSMAP|RS|ADGC|UKBB|SALSA
3: 3C|AGES|ARIC|ASPREE|CHS|FVG|FHS|GR@CE|Apulia|HKOS|HUNT|MEMENTO|MYHAT|ROSMAP|RS|ADGC|UKBB
# 35379992 - GR@CE appears to be a typo for GR@ACE (https://pmc.ncbi.nlm.nih.gov/articles/PMC9005347/#Sec8)
# 39046104 - GR@CE also appears to be a typo for GR@ACE
# https://pmc.ncbi.nlm.nih.gov/articles/PMC11497727/#alz14115-sec-0080
gwas_study_info = gwas_study_info |>
mutate(COHORT = str_replace_all(COHORT, "GR@CE", "GR@ACE"))
gwas_study_info |>
filter(grepl("tohoku", tolower(COHORT))) |>
select(PUBMED_ID, COHORT) |>
distinct()
PUBMED_ID COHORT
<int> <char>
1: 40226751 Tohoku Medical Megabank
2: 34782693 Tohoku Medical Megabank Project
gwas_study_info = gwas_study_info |>
mutate(COHORT = str_replace_all(COHORT, "Tohoku Medical Megabank Project", "Tohoku Medical Megabank"))
gwas_study_info |>
filter(grepl("steno", tolower(COHORT))) |>
select(PUBMED_ID, COHORT, `DISEASE/TRAIT`) |>
distinct()
PUBMED_ID
<int>
1: 34127860
2: 35627254
COHORT
<char>
1: BC58|BDA|NIHR BioResource|GRID|UKBS|BRI|CLEAR|EDIC|GoKinD|NYCP|NIMH|SEARCH|TrialNet|T1DGC|UAB|UC|UCSF|IDDMGEN|T1DGEN|MCW|GRID-NI|Young Hearts-NI|Steno Diabetes Center|HSG|HapMap
2: Steno
DISEASE/TRAIT
<char>
1: Type 1 diabetes
2: Neuropeptide Y autoantibody levels in type 1 diabetes
all_cohorts[grep("steno", tolower(all_cohorts))] |> unique()
[1] "Steno Diabetes Center" "Steno"
gwas_study_info = gwas_study_info |>
mutate(COHORT = str_replace_all(COHORT, "Steno Diabetes Center", "Steno"))
gwas_study_info |>
filter(grepl("nagahama", tolower(COHORT))) |>
select(PUBMED_ID, COHORT) |>
distinct()
PUBMED_ID
<int>
1: 35551307
2: 34059833
3: 34059833
4: 34887591
5: 40181193
6: 38277453
COHORT
<char>
1: AASC|BBJ|BES|CAGE|CHNS|CKB|CLHNS|DC|SP2|HKDR|KARE|MESA|Nagahama Study|SBCS|SWHS|SCES|SCHS|SiMES|TAICHI|TWT2D
2: AASC|BES|CAGE-GWAS1|CAGE|CLHNS|CHNS|KARE|Living Biobank|MESA|Nagahama Study|NHAPC|SCES|SiMES|SP2|TAICHI|CRC|SBCS|SMHS
3: CAGE-GWAS1|CAGE|CHNS|KARE|LivingBiobank|MESA|NagahamaStudy|NHAPC|SCES|SiMES|SP2|TAICHI|CRC|TWSC
4: BAS|BBJ|BES|CAGE|CAS|CHNS|CKB|SDCS|JPDSC|KARE|Living-biobank|MESA|Nagahama Study|NHAPC|SBCS|SCES|SCHS|SiMES|SINDI|SP2|SWHS|TUDR|TWT2D
5: AGES|ALSPAC|ARIC|BHS_b|CARDIA|CCHC|CFS|CHS|COLAUS|DIACORE|DRS_EXTRA|EPIC-Norfolk|EB|FHS|Fenland|GAPP|GENSALT|HANDLS|HCS|IRASFS|JHS|KOGES|LBC|LifeLines|LLFS|MESA|MVP|Nagahama_Study|NEO|NESDA|SHIP|SOL|SWAN|TwinsUK|UKBB|WHI|YFS
6: HERPACC|J-MICC|JPHC|ToMMo|Nagahama|BBJ
gwas_study_info = gwas_study_info |>
mutate(COHORT = str_replace_all(COHORT, "Nagahama_Study|NagahamaStudy", "Nagahama Study")) |>
# TODO: confirm that "Nagahama" and "Nagahama Study" refer to the same cohort
mutate(COHORT = str_replace_all(COHORT, "Nagahama Study", "Nagahama"))
gwas_study_info |>
filter(grepl("WTCCC - Bipolar disease cases", COHORT)) |>
select(1:5)
DATE_ADDED_TO_CATALOG PUBMED_ID FIRST_AUTHOR DATE JOURNAL
<IDat> <int> <char> <IDat> <char>
1: 2021-04-23 33830302 Inshaw JRJ 2021-04-08 Diabetologia
gwas_study_info |>
filter(PUBMED_ID == 33830302) |>
select(PUBMED_ID, COHORT)
PUBMED_ID
<int>
1: 33830302
2: 33830302
COHORT
<char>
1: GRID|British 1958 birth cohort|National blood service|WTCCC - Bipolar disease cases|Oxford Regional Prospective Study of Childhood Diabetes (ORPS)|Sardinia case-control
2:
gwas_study_info = gwas_study_info |>
mutate(COHORT = str_replace_all(COHORT, "WTCCC - Bipolar disease cases", "WTCCC"))
gwas_study_info |>
filter(grepl("QGP", COHORT)) |>
select(PUBMED_ID, COHORT) |>
distinct()
PUBMED_ID COHORT
<int> <char>
1: 33623009 Qatar Genome Program (QGP)
2: 36168886 QGP
# Checked 36168886: QGP is the Qatar Genome Program,
# so merge the two spellings (fixed() matches the parentheses literally)
gwas_study_info = gwas_study_info |>
mutate(COHORT = str_replace_all(COHORT, fixed("Qatar Genome Program (QGP)"), "QGP"))
all_cohorts = gwas_study_info$COHORT
all_cohorts = unlist(strsplit(all_cohorts, "\\|"))
unique(all_cohorts) |> length()
[1] 1124
gwas_study_info =
gwas_study_info |>
mutate(COHORT = str_replace_all(COHORT,
'1982 Pelotas \\(Brazil\\) Birth Cohort Study',
'1982 PELOTAS'))
gwas_study_info = gwas_study_info |>
mutate(COHORT = str_replace_all(COHORT,
"ABCD study",
"ABCD"))
# "canSCAD" vs "CanSCAD cases and MGI controls"
gwas_study_info |>
filter(grepl("CanSCAD cases and MGI controls", COHORT)) |>
select(PUBMED_ID, COHORT) |>
distinct()
PUBMED_ID COHORT
<int> <char>
1: 32887874 CanSCAD cases and MGI controls
gwas_study_info = gwas_study_info |>
mutate(COHORT = str_replace_all(COHORT, "CanSCAD cases and MGI controls", "canSCAD|MGI"))
all_cohorts = gwas_study_info$COHORT
all_cohorts = unlist(strsplit(all_cohorts, "\\|"))
unique_cohort_names = unique(all_cohorts)
# Convert to lowercase and check duplicates
dup_groups <- tapply(unique_cohort_names, tolower(unique_cohort_names), I)
# Keep only groups with >1 element (i.e., capitalization differences)
dup_groups[lengths(dup_groups) > 1]
$airwave
[1] "Airwave" "AIRWAVE"
$allofus
[1] "AllofUs" "AllOfUs"
$baependi
[1] "BAEPENDI" "Baependi"
$biome
[1] "BioMe" "BioME" "BIOME"
$biovu
[1] "BioVU" "BioVu" "BIOVU"
$cilento
[1] "CILENTO" "Cilento"
$colaus
[1] "CoLaus" "COLAUS"
$`croatia-korcula`
[1] "CROATIA-KORCULA" "CROATIA-Korcula"
$famhs
[1] "FamHS" "FAMHS"
$fenland
[1] "Fenland" "FENLAND"
$gel
[1] "GEL" "GeL"
$genestar
[1] "GeneSTAR" "GENESTAR" "GeneStar"
$gensalt
[1] "GENSalt" "GENSALT" "GenSalt"
$godarts
[1] "GoDARTS" "GODARTS"
$hypergen
[1] "HyperGEN" "HyperGen" "HYPERGEN"
$inchianti
[1] "InCHIANTI" "INCHIANTI"
$inter99
[1] "Inter99" "INTER99"
$koges
[1] "KoGES" "KOGES"
$`life-heart`
[1] "LIFE-HEART" "LIFE-Heart"
$lifelines
[1] "LifeLines" "Lifelines"
$`mayo-vdb`
[1] "MAYO-VDB" "Mayo-VDB"
$moba
[1] "MOBA" "MoBa"
$nugene
[1] "Nugene" "NUGENE"
$orcades
[1] "ORCADES" "Orcades"
$panscan
[1] "PANSCAN" "PanScan"
$raine
[1] "RAINE" "Raine"
$`ship-trend`
[1] "SHIP-TREND" "SHIP-Trend"
$sign
[1] "SiGN" "SIGN"
$viva
[1] "Viva" "VIVA"
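A note on the `tapply(..., I)` idiom used for these checks, shown on toy vectors: `tapply()` simplifies its result when every group has exactly one member, so the same code returns a list when duplicates exist but a named character array when they do not (which is why the re-checks after cleaning print `named character(0)`). `split()`, swapped in here as an alternative, always returns a list.

```r
# Toy vectors, for illustration only
with_dups    <- c("BioMe", "BIOME", "UKBB")
without_dups <- c("BioMe", "UKBB")

# At least one group has two members, so tapply() returns a list
g1 <- tapply(with_dups, tolower(with_dups), I)
# Every group has one member, so tapply() simplifies to a named character
# array; selecting groups with >1 member then gives an empty result
g2 <- tapply(without_dups, tolower(without_dups), I)

is.list(g1)          # TRUE
is.list(g2)          # FALSE
g2[lengths(g2) > 1]  # empty

# split() returns a list whether or not any group has duplicates
split(without_dups, tolower(without_dups))
```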
# Normalize by removing spaces and underscores
normalized <- gsub("[ _]", "", sort(unique_cohort_names))
# Group by normalized value
dup_groups <- tapply(sort(unique_cohort_names), normalized, I)
# Keep only groups with >1 element (i.e. variants)
dup_groups[lengths(dup_groups) > 1]
$DRSEXTRA
[1] "DRS_EXTRA" "DRSEXTRA"
$GALAII
[1] "GALA II" "GALA_II"
$Health2000
[1] "Health 2000" "Health2000"
$HealthABC
[1] "Health ABC" "HealthABC"
$`INGI-ValBorbera`
[1] "INGI-Val Borbera" "INGI-Val_Borbera"
$LivingBiobank
[1] "Living Biobank" "LivingBiobank"
gwas_study_info = gwas_study_info |>
mutate(COHORT = str_replace_all(COHORT, "AIRWAVE", "Airwave"))
gwas_study_info = gwas_study_info |>
mutate(COHORT = str_replace_all(COHORT, "AllOfUs", "AllofUs"))
gwas_study_info = gwas_study_info |>
mutate(COHORT = str_replace_all(COHORT, toupper("Baependi"), "Baependi"))
gwas_study_info = gwas_study_info |>
mutate(COHORT = str_replace_all(COHORT, "BIOME|BioME", "BioMe"))
gwas_study_info = gwas_study_info |>
mutate(COHORT = str_replace_all(COHORT, "BIOVU|BioVu", "BioVU"))
gwas_study_info =
gwas_study_info |>
mutate(COHORT = str_replace_all(COHORT, "British 1958 birth cohort", "B58C"))
gwas_study_info =
gwas_study_info |>
mutate(COHORT = str_replace_all(COHORT, "BC58", "B58C"))
# CKB is the acronym for the China Kadoorie Biobank (see PubMed ID 36777997: https://pmc.ncbi.nlm.nih.gov/articles/PMC9903787/#tbl1)
gwas_study_info |>
filter(grepl("\\bCKB\\b", COHORT)) |>
select(PUBMED_ID, COHORT) |>
distinct() |>
tail()
PUBMED_ID COHORT
<int> <char>
1: 34586374 CKB|23andMe|WHI|UKBB
2: 34586374 CKB
3: 34586374 CKB|UKBB
4: 34586374 CKB|WHI
5: 33766948 CKB
6: 36777997 BBJ|BioMe|BioVU|CCPM|CKB|EB|FinnGen|G&H|HUNT|MGBB|MGI|UCLA|UKBB
gwas_study_info = gwas_study_info |>
mutate(COHORT = str_replace_all(COHORT, "China Kadoorie Biobank", "CKB"))
gwas_study_info = gwas_study_info |>
mutate(COHORT = str_replace_all(COHORT, toupper("Cilento"), "Cilento"))
gwas_study_info = gwas_study_info |>
mutate(COHORT = str_replace_all(COHORT, "COLAUS", "CoLaus"))
gwas_study_info = gwas_study_info |>
mutate(COHORT = str_replace_all(COHORT, "CROATIA-KORCULA", "CROATIA-Korcula"))
gwas_study_info = gwas_study_info |>
mutate(COHORT = str_replace_all(COHORT, "DRSEXTRA", "DRS_EXTRA"))
gwas_study_info = gwas_study_info |>
mutate(COHORT = str_replace_all(COHORT, toupper("FamHS"), "FamHS"))
gwas_study_info = gwas_study_info |>
mutate(COHORT = str_replace_all(COHORT, "FENLAND", "Fenland"))
gwas_study_info = gwas_study_info |>
mutate(COHORT = str_replace_all(COHORT, "GALA_II", "GALA II"))
gwas_study_info |>
filter(grepl("GeL", COHORT)) |>
select(PUBMED_ID, COHORT) |>
distinct()
PUBMED_ID COHORT
<int> <char>
1: 36124557 GeL
# Only one study (36124557) uses "GeL"; based on
# https://pmc.ncbi.nlm.nih.gov/articles/PMC9512401/#s4
# this appears to be a typo for Genomics England (GEL)
gwas_study_info = gwas_study_info |>
mutate(COHORT = str_replace_all(COHORT, "GeL", "GEL"))
gwas_study_info = gwas_study_info |>
mutate(COHORT = str_replace_all(COHORT, "GENESTAR|GeneStar", "GeneSTAR"))
gwas_study_info = gwas_study_info |>
mutate(COHORT = str_replace_all(COHORT, "GENSALT|GenSalt", "GENSalt"))
gwas_study_info = gwas_study_info |>
mutate(COHORT = str_replace_all(COHORT, toupper("godarts"), "GoDARTS"))
gwas_study_info = gwas_study_info |>
mutate(COHORT = str_replace_all(COHORT, toupper("InCHIANTI"), "InCHIANTI"))
gwas_study_info = gwas_study_info |>
mutate(COHORT = str_replace_all(COHORT, toupper("Inter99"), "Inter99"))
gwas_study_info = gwas_study_info |>
mutate(COHORT = str_replace_all(COHORT, "Health ABC", "HealthABC"))
gwas_study_info = gwas_study_info |>
mutate(COHORT = str_replace_all(COHORT, "Health 2000", "Health2000"))
gwas_study_info = gwas_study_info |>
mutate(COHORT = str_replace_all(COHORT, "HyperGEN|HYPERGEN", "HyperGen"))
gwas_study_info = gwas_study_info |>
mutate(COHORT = str_replace_all(COHORT, "INGI-Val_Borbera", "INGI-Val Borbera"))
gwas_study_info = gwas_study_info |>
mutate(COHORT = str_replace_all(COHORT, toupper("KoGES"), "KoGES"))
gwas_study_info = gwas_study_info |>
mutate(COHORT = str_replace_all(COHORT, "LIFE-HEART", "LIFE-Heart"))
gwas_study_info = gwas_study_info |>
mutate(COHORT = str_replace_all(COHORT, "Lifelines", "LifeLines"))
# TODO: check whether "LifeLines Deep" should be merged into LifeLines
gwas_study_info = gwas_study_info |>
mutate(COHORT = str_replace_all(COHORT, "Living-biobank|LivingBiobank", "Living Biobank"))
gwas_study_info = gwas_study_info |>
mutate(COHORT = str_replace_all(COHORT, toupper("Mayo-VDB"), "Mayo-VDB"))
This likely refers to the dbGaP study phs000306.v4.p1: https://www.ncbi.nlm.nih.gov/projects/gap/cgi-bin/study.cgi?study_id=phs000306.v4.p1
gwas_study_info =
gwas_study_info |>
mutate(COHORT = str_replace_all(COHORT,
'A Multiethnic Genome-wide Scan of Prostate Cancer',
'MEC')) |>
mutate(COHORT = str_replace_all(COHORT,
'Multiethnic Genome-wide Scan of Prostate Cancer',
'MEC'))
gwas_study_info = gwas_study_info |>
mutate(COHORT = str_replace_all(COHORT, toupper("MoBa"), "MoBa"))
gwas_study_info = gwas_study_info |>
mutate(COHORT = str_replace_all(COHORT, "Nugene", "NUGENE"))
gwas_study_info = gwas_study_info |>
mutate(COHORT = str_replace_all(COHORT, toupper("Orcades"), "Orcades"))
gwas_study_info =
gwas_study_info |>
mutate(COHORT = str_replace_all(COHORT,
'Oxford Regional Prospective Study of Childhood Diabetes \\(ORPS\\)',
'ORPS'))
gwas_study_info =
gwas_study_info |>
mutate(COHORT = str_replace_all(COHORT,
'Qatar Genome Program \\(QGP\\)',
'QGP'))
gwas_study_info = gwas_study_info |>
mutate(COHORT = str_replace_all(COHORT, toupper("PanScan"), "PanScan"))
gwas_study_info = gwas_study_info |>
mutate(COHORT = str_replace_all(COHORT, toupper("Raine"), "Raine"))
all_cohorts[grep("rosmap", tolower(all_cohorts))] |> unique()
[1] "ROSMAP" "ROSMAP 1" "ROSMAP 2"
gwas_study_info |>
filter(grepl("ROSMAP 1|ROSMAP 2", COHORT)) |>
select(PUBMED_ID, COHORT) |>
distinct()
PUBMED_ID
<int>
1: 33510174
COHORT
<char>
1: ARIC|BASE-II|BPROOF|CHS|EPIC-Norfolk|FHS|HRS|InCHIANTI|LASA I|LASA II|Long Life Family Study|MrOS Gothenburg|MrOS Malmo|ROSMAP 1|ROSMAP 2|RS|RSI|RSII|SHIP|TSHA|UKBB|WLS
gwas_study_info = gwas_study_info |>
mutate(COHORT = str_replace_all(COHORT, "ROSMAP 1|ROSMAP 2", "ROSMAP"))
gwas_study_info = gwas_study_info |>
mutate(COHORT = str_replace_all(COHORT, "SHIP-Trend", "SHIP-TREND"))
# "SHIPNATREND" appears in only one study; check what it refers to
gwas_study_info |>
filter(grepl("SHIPNATREND", COHORT)) |>
select(PUBMED_ID, COHORT) |>
distinct()
PUBMED_ID
<int>
1: 32888493
2: 32888493
3: 32888493
4: 32888493
5: 32888493
6: 32888493
7: 32888493
8: 32888493
9: 32888493
10: 32888493
11: 32888493
12: 32888493
13: 32888493
14: 32888493
15: 32888493
16: 32888493
17: 32888493
18: 32888493
19: 32888493
20: 32888493
21: 32888493
22: 32888493
PUBMED_ID
COHORT
<char>
1: Airwave|BioMe|CaPS|Estonia|Estonia|FHS|FINCAVAS|GERA|GERA|GERA|INTERVAL|MESA|MHIphase1|MHIphase2|RS|RS|RSI|SHIPNATREND|UKBB|WHI
2: Airwave|BBJ|BioMe|CaPS|Estonia|Estonia|FHS|FINCAVAS|GERA|GERA|GERA|HANDLS|INTERVAL|JHS|MESA|MHIphase1|MHIphase2|RS|RS|RSI|SHIPNATREND|UKBB|WHI
3: Airwave|BioMe|CaPS|Estonia|Estonia|FHS|FINCAVAS|INTERVAL|MESA|MHIphase1|MHIphase2|RS|RS|RSI|SHIP|SHIPNATREND|UKBB|WHI|YFS
4: Airwave|BBJ|BioMe|CaPS|Estonia|Estonia|FHS|FINCAVAS|HANDLS|INTERVAL|JHS|MESA|MHIphase1|MHIphase2|RS|RS|RSI|SHIP|SHIPNATREND|UKBB|WHI|YFS
5: Airwave|BioMe|CaPS|CHS|Estonia|Estonia|FHS|FINCAVAS|INTERVAL|MESA|MHIphase1|MHIphase2|RS|RS|RSI|SHIP|SHIPNATREND|UKBB|WHI|YFS
6: Airwave|BBJ|BioMe|CaPS|CHS|CHS|Estonia|Estonia|FHS|FINCAVAS|HANDLS|INTERVAL|JHS|MESA|MHIphase1|MHIphase2|RS|RS|RSI|SHIP|SHIPNATREND|UKBB|WHI|YFS
7: Airwave|BioMe|CaPS|Estonia|Estonia|FHS|FINCAVAS|GERA|GERA|GERA|INTERVAL|MESA|MHIphase1|MHIphase2|RS|RS|RSI|SHIP|SHIPNATREND|UKBB|WHI|YFS
8: Airwave|BBJ|BioMe|CaPS|Estonia|Estonia|FHS|FINCAVAS|GERA|GERA|GERA|HANDLS|INTERVAL|JHS|MESA|MHIphase1|MHIphase2|RS|RS|RSI|SHIP|SHIPNATREND|UKBB|WHI|YFS
9: Airwave|BioMe|CaPS|Estonia|Estonia|FHS|FINCAVAS|INTERVAL|MESA|MHIphase1|MHIphase2|SHIPNATREND|UKBB|WHI
10: Airwave|BBJ|BioMe|CaPS|Estonia|Estonia|FHS|FINCAVAS|HANDLS|INTERVAL|JHS|MESA|MHIphase1|MHIphase2|SHIPNATREND|UKBB|WHI
11: Airwave|BioMe|CaPS|CHS|Estonia|Estonia|FHS|FINCAVAS|GERA|GERA|GERA|INTERVAL|MESA|MHIphase1|MHIphase2|RS|RS|RSI|SHIP|SHIPNATREND|UKBB|WHI|YFS
12: Airwave|BBJ|BioMe|CaPS|CHS|Estonia|Estonia|FHS|FINCAVAS|GERA|GERA|GERA|HANDLS|INTERVAL|JHS|MESA|MHIphase1|MHIphase2|RS|RS|RSI|SHIP|SHIPNATREND|UKBB|WHI|YFS
13: Airwave|BioMe|CaPS|CHS|Estonia|Estonia|FHS|FINCAVAS|GERA|GERA|GERA|Health2006|Health2008|Health2010|INTERVAL|MESA|MHIphase1|MHIphase2|RS|RS|RSI|SHIP|SHIPNATREND|UKBB|WHI|YFS
14: Airwave|BBJ|BioMe|CaPS|CHNS|CHS|Estonia|Estonia|FHS|FINCAVAS|GERA|GERA|GERA|HANDLS|Health2006|Health2008|Health2010|INTERVAL|JHS|MESA|MHIphase1|MHIphase2|RS|RS|RSI|SHIP|SHIPNATREND|UKBB|WHI|YFS
15: Airwave|BioMe|CaPS|Estonia|FHS|INTERVAL|MHIphase1|MHIphase2|RS|RS|RSI|SHIP|SHIPNATREND|UKBB|WHI
16: Airwave|BioMe|BioMe|BioMe|CaPS|Estonia|FHS|HANDLS|INTERVAL|JHS|MHIphase1|MHIphase2|RS|RS|RSI|SHIP|SHIPNATREND|UKBB|UKBB|UKBB|UKBB|WHI
17: Airwave|BioMe|CaPS|Estonia|Estonia|FHS|FINCAVAS|GERA|GERA|GERA|INTERVAL|MESA|MHIphase1|MHIphase2|SHIPNATREND|UKBB|WHI
18: Airwave|BBJ|BioMe|BioMe|BioMe|CaPS|Estonia|Estonia|FHS|FINCAVAS|GERA|GERA|GERA|GERA|GERA|HANDLS|INTERVAL|JHS|MESA|MESA|MESA|MHIphase1|MHIphase2|SHIPNATREND|UKBB|UKBB|UKBB|UKBB|WHI
19: Airwave|BBJ|BioMe|BioMe|BioMe|CaPS|CHNS|CHS|CHS|Estonia|Estonia|FHS|FINCAVAS|GERA|GERA|GERA|GERA|GERA|HANDLS|INTERVAL|JHS|MESA|MESA|MESA|MHIphase1|MHIphase2|RS|RS|RSI|SHIP|SHIPNATREND|UKBB|UKBB|UKBB|UKBB|WHI|YFS
20: Airwave|BBJ|BioMe|BioMe|BioMe|CaPS|CHNS|Estonia|Estonia|FHS|FINCAVAS|GERA|GERA|GERA|GERA|GERA|HANDLS|INTERVAL|JHS|MESA|MESA|MESA|MHIphase1|MHIphase2|RS|RS|RSI|SHIP|SHIPNATREND|UKBB|UKBB|UKBB|UKBB|WHI|YFS
21: Airwave|BioMe|CaPS|FHS|GERA|GERA|GERA|INTERVAL|MHIphase1|MHIphase2|RS|RS|RSI|SHIP|SHIPNATREND|UKBB|WHI
22: Airwave|BioMe|BioMe|BioMe|CaPS|FHS|GERA|GERA|GERA|GERA|GERA|HANDLS|INTERVAL|JHS|MHIphase1|MHIphase2|RS|RS|RSI|SHIP|SHIPNATREND|UKBB|UKBB|UKBB|UKBB|WHI
COHORT
# From the supplementary tables, SHIPNATREND appears to be SHIP-TREND:
# https://pmc.ncbi.nlm.nih.gov/articles/PMC7480402/#SD1
gwas_study_info = gwas_study_info |>
mutate(COHORT = str_replace_all(COHORT, "SHIPNATREND", "SHIP-TREND"))
gwas_study_info = gwas_study_info |>
mutate(COHORT = str_replace_all(COHORT, "SIGN", "SiGN"))
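Unanchored patterns such as "SIGN" also match inside any longer label that contains the same letters. A minimal sketch using regex word boundaries (`\\b`) so that only whole tokens are rewritten; "DESIGN-X" is a made-up label used purely to show the difference.

```r
library(stringr)

# Toy values, for illustration only
x <- c("SIGN|UKBB", "DESIGN-X|UKBB")

# Unanchored pattern: also rewrites the substring inside "DESIGN-X"
unanchored <- str_replace_all(x, "SIGN", "SiGN")
# Word boundaries restrict the match to "SIGN" as a whole word
anchored <- str_replace_all(x, "\\bSIGN\\b", "SiGN")

unanchored  # "SiGN|UKBB" "DESiGN-X|UKBB"
anchored    # "SiGN|UKBB" "DESIGN-X|UKBB"
```

`\\b` treats hyphens and spaces as boundaries too, so a label such as "SIGN-2" would still be rewritten; matching on whole pipe-separated tokens avoids that.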
# PubMed IDs of studies in which "Taiwan" refers to the Taiwan Biobank (TWB)
twb_pubmed_ids <- c("34026292",
"36329257",
"36009466",
"35046404",
"34934334",
"34834521",
"34404248",
"36778051",
"34522458"
)
gwas_study_info =
gwas_study_info |>
mutate(COHORT = ifelse(PUBMED_ID %in% twb_pubmed_ids,
str_replace_all(COHORT, "Taiwan", "TWB"),
COHORT
)
)
gwas_study_info =
gwas_study_info |>
mutate(COHORT = str_replace_all(COHORT,
'T2D SIGMA Studies',
'SIGMA T2D'))
gwas_study_info =
gwas_study_info |>
mutate(COHORT = str_replace_all(COHORT,
'Tohoku Medical Megabank',
'TMM'))
gwas_study_info =
gwas_study_info |>
mutate(COHORT = str_replace_all(COHORT, "Understanding Society", "UnderstandingSociety"))
gwas_study_info = gwas_study_info |>
mutate(COHORT = str_replace_all(COHORT, "Viva", "VIVA"))
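The replacements above are applied one regular expression at a time, so each pattern can also hit substrings of longer names. A possible alternative, sketched under the assumption that COHORT always holds pipe-separated labels: keep one lookup table of known spellings and replace only whole tokens. `cohort_synonyms` and `canonicalise_cohorts()` are hypothetical names, and the map lists only a few of the pairs handled above.

```r
# Hypothetical spelling -> canonical label map (a few pairs from above)
cohort_synonyms <- c(
  "AIRWAVE" = "Airwave",
  "BIOME"   = "BioMe",
  "BioME"   = "BioMe",
  "Nugene"  = "NUGENE",
  "Viva"    = "VIVA"
)

# Replace whole pipe-separated tokens only; labels not in the map are kept.
# NA inputs stay NA and empty strings stay empty.
canonicalise_cohorts <- function(cohort_strings, synonyms) {
  out <- vapply(
    strsplit(cohort_strings, "|", fixed = TRUE),
    function(tokens) {
      hit <- tokens %in% names(synonyms)
      tokens[hit] <- synonyms[tokens[hit]]
      paste(tokens, collapse = "|")
    },
    character(1)
  )
  out[is.na(cohort_strings)] <- NA_character_
  out
}

canonicalise_cohorts(c("AIRWAVE|UKBB", "BIOME|BioME", "", NA), cohort_synonyms)
# "Airwave|UKBB" "BioMe|BioMe" "" NA
```

With `dplyr`, this could be applied as `mutate(COHORT = canonicalise_cohorts(COHORT, cohort_synonyms))`, and the map doubles as a record of every spelling that was merged.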
gwas_study_info |>
filter(grepl("Rotterdam", COHORT))
DATE_ADDED_TO_CATALOG PUBMED_ID FIRST_AUTHOR DATE JOURNAL
<IDat> <int> <char> <IDat> <char>
1: 2025-03-11 40050429 Roselli C 2025-03-06 Nat Genet
LINK
<char>
1: www.ncbi.nlm.nih.gov/pubmed/40050429
STUDY
<char>
1: Meta-analysis of genome-wide associations and polygenic risk prediction for atrial fibrillation in more than 180,000 cases.
DISEASE/TRAIT
<char>
1: Atrial fibrillation
INITIAL_SAMPLE_SIZE
<char>
1: 1,782 Admix African and African American cases, 9,356 Admix African and African American controls, 11,350 East Asian ancestry cases, 137,515 East Asian ancestry controls, 166,322 European ancestry cases, 1,313,950 European ancestry controls, 1,774 Hispanic or Latin American cases, 7,665 Hispanic or Latin American controls, 218 South Asian ancestry cases, 413 South Asian ancestry controls
REPLICATION_SAMPLE_SIZE PLATFORM_[SNPS_PASSING_QC]
<char> <char>
1: <NA> Affymetrix, Illumina [29789980] (imputed)
ASSOCIATION_COUNT MAPPED_TRAIT MAPPED_TRAIT_URI
<int> <char> <char>
1: 355 atrial fibrillation http://www.ebi.ac.uk/efo/EFO_0000275
STUDY_ACCESSION GENOTYPING_TECHNOLOGY SUBMISSION_DATE
<char> <char> <lgcl>
1: GCST90559230 Genome-wide genotyping array NA
STATISTICAL_MODEL BACKGROUND_TRAIT MAPPED_BACKGROUND_TRAIT
<lgcl> <lgcl> <char>
1: NA NA
MAPPED_BACKGROUND_TRAIT_URI
<char>
1:
COHORT
<char>
1: AGES|ARIC|BioMe|Broad CVDi|BBJ|CHS|MESA|SiGN|ENGAGE_AF-TIMI_48|SPHFC|CCAF|CHB|MyCode|EGCUT|FHS|GAPP|GS:SFHS|HRS|LURIC|HUNT|MGI|PHB|PIVUS|PREVEND|PROSPER|Rotterdam|SHIP|SiGN|TwinGene|ULSAM|Vanderbilt|WGHS|WTCCC|FinnGen|UKBB
FULL_SUMMARY_STATISTICS SUMMARY_STATS_LOCATION GXE
<char> <char> <char>
1: no <NA> no
# Rotterdam study is typically listed as "RS"
# see e.g. 36568030 https://pmc.ncbi.nlm.nih.gov/articles/PMC9772568/
gwas_study_info |>
filter(grepl("\\bRS\\b", COHORT))
DATE_ADDED_TO_CATALOG PUBMED_ID FIRST_AUTHOR DATE
<IDat> <int> <char> <IDat>
1: 2023-03-21 36662418 Faber BG 2023-01-20
2: 2023-05-12 36918541 Young WJ 2023-03-14
3: 2023-05-12 36918541 Young WJ 2023-03-14
4: 2023-05-12 36918541 Young WJ 2023-03-14
5: 2023-05-12 36918541 Young WJ 2023-03-14
---
332: 2023-01-31 36568030 Young KL 2022-11-25
333: 2023-01-31 36568030 Young KL 2022-11-25
334: 2023-01-31 36568030 Young KL 2022-11-25
335: 2023-01-31 36568030 Young KL 2022-11-25
336: 2023-01-31 36568030 Young KL 2022-11-25
JOURNAL LINK
<char> <char>
1: Arthritis Rheumatol www.ncbi.nlm.nih.gov/pubmed/36662418
2: Nat Commun www.ncbi.nlm.nih.gov/pubmed/36918541
3: Nat Commun www.ncbi.nlm.nih.gov/pubmed/36918541
4: Nat Commun www.ncbi.nlm.nih.gov/pubmed/36918541
5: Nat Commun www.ncbi.nlm.nih.gov/pubmed/36918541
---
332: HGG Adv www.ncbi.nlm.nih.gov/pubmed/36568030
333: HGG Adv www.ncbi.nlm.nih.gov/pubmed/36568030
334: HGG Adv www.ncbi.nlm.nih.gov/pubmed/36568030
335: HGG Adv www.ncbi.nlm.nih.gov/pubmed/36568030
336: HGG Adv www.ncbi.nlm.nih.gov/pubmed/36568030
STUDY
<char>
1: A GWAS meta-analysis of alpha angle suggests cam-type morphology may be a specific feature of hip osteoarthritis in older adults.
2: Genetic architecture of spatial electrical biomarkers for cardiac arrhythmia and relationship with cardiovascular disease.
3: Genetic architecture of spatial electrical biomarkers for cardiac arrhythmia and relationship with cardiovascular disease.
4: Genetic architecture of spatial electrical biomarkers for cardiac arrhythmia and relationship with cardiovascular disease.
5: Genetic architecture of spatial electrical biomarkers for cardiac arrhythmia and relationship with cardiovascular disease.
---
332: Whole-exome sequence analysis of anthropometric traits illustrates challenges in identifying effects of rare genetic variants.
333: Whole-exome sequence analysis of anthropometric traits illustrates challenges in identifying effects of rare genetic variants.
334: Whole-exome sequence analysis of anthropometric traits illustrates challenges in identifying effects of rare genetic variants.
335: Whole-exome sequence analysis of anthropometric traits illustrates challenges in identifying effects of rare genetic variants.
336: Whole-exome sequence analysis of anthropometric traits illustrates challenges in identifying effects of rare genetic variants.
DISEASE/TRAIT
<char>
1: Alpha angle
2: Frontal QRS-T angle
3: Spatial QRS-T angle
4: Spatial QRS-T angle
5: Frontal QRS-T angle
---
332: Waist-hip ratio
333: Waist-hip ratio
334: Waist-hip ratio
335: Waist-hip ratio
336: Waist-hip ratio
INITIAL_SAMPLE_SIZE
<char>
1: 44,214 European ancestry individuals
2: 159,715 European ancestry, African ancestry, Hispanic or Latin American individuals
3: 96,562 European ancestry individuals
4: 118,780 European ancestry, African ancestry, Hispanic or Latin American individuals
5: 134,567 European ancestry individuals
---
332: 15,503 European ancestry individuals
333: 8,678 European ancestry women
334: 6,825 European ancestry men
335: 2,987 African ancestry women, 8,678 European ancestry women
336: 1,307 African ancestry men, 6,825 European ancestry men
REPLICATION_SAMPLE_SIZE
<char>
1: <NA>
2: <NA>
3: <NA>
4: <NA>
5: <NA>
---
332: 1,229 European ancestry individuals
333: 771 European ancestry women
334: 758 European ancestry men
335: 771 European ancestry women, 2,308 African American women
336: 758 European ancestry men, 1,239 African American men
PLATFORM_[SNPS_PASSING_QC] ASSOCIATION_COUNT
<char> <int>
1: Affymetrix, Illumina [9134976] (imputed) 8
2: Affymetrix, Illumina [8299259] (imputed) 11
3: Affymetrix, Illumina [8603009] (imputed) 51
4: Affymetrix, Illumina [9052360] (imputed) 61
5: Affymetrix, Illumina [7954211] (imputed) 9
---
332: NR [67633] 0
333: NR [67633] 0
334: NR [67633] 0
335: NR [67633] 0
336: NR [67633] 0
MAPPED_TRAIT MAPPED_TRAIT_URI
<char> <char>
1: alpha angle measurement http://www.ebi.ac.uk/efo/EFO_0020071
2: QRS-T angle http://www.ebi.ac.uk/efo/EFO_0020097
3: QRS-T angle http://www.ebi.ac.uk/efo/EFO_0020097
4: QRS-T angle http://www.ebi.ac.uk/efo/EFO_0020097
5: QRS-T angle http://www.ebi.ac.uk/efo/EFO_0020097
---
332: waist-hip ratio http://www.ebi.ac.uk/efo/EFO_0004343
333: waist-hip ratio http://www.ebi.ac.uk/efo/EFO_0004343
334: waist-hip ratio http://www.ebi.ac.uk/efo/EFO_0004343
335: waist-hip ratio http://www.ebi.ac.uk/efo/EFO_0004343
336: waist-hip ratio http://www.ebi.ac.uk/efo/EFO_0004343
STUDY_ACCESSION GENOTYPING_TECHNOLOGY SUBMISSION_DATE
<char> <char> <lgcl>
1: GCST90129635 Genome-wide genotyping array NA
2: GCST90246319 Genome-wide genotyping array NA
3: GCST90246320 Genome-wide genotyping array NA
4: GCST90246318 Genome-wide genotyping array NA
5: GCST90246321 Genome-wide genotyping array NA
---
332: GCST90245813 Exome-wide sequencing NA
333: GCST90245814 Exome-wide sequencing NA
334: GCST90245815 Exome-wide sequencing NA
335: GCST90245816 Exome-wide sequencing NA
336: GCST90245817 Exome-wide sequencing NA
STATISTICAL_MODEL BACKGROUND_TRAIT MAPPED_BACKGROUND_TRAIT
<lgcl> <lgcl> <char>
1: NA NA
2: NA NA
3: NA NA
4: NA NA
5: NA NA
---
332: NA NA
333: NA NA
334: NA NA
335: NA NA
336: NA NA
MAPPED_BACKGROUND_TRAIT_URI
<char>
1:
2:
3:
4:
5:
---
332:
333:
334:
335:
336:
COHORT
<char>
1: UKBB|RS
2: ARIC|BRIGHT|CHRIS|CHS|ERF|GS:SFHS|HCHS|SOL|Inter99|JHS|LifeLines|MESA|NEO|Orcades|PREVEND|PROSPER|RS|UKBB|VIKING|WHI
3: ARIC|BRIGHT|CHRIS|CHS|ERF|GS:SFHS|HCHS|SOL|Inter99|JHS|LifeLines|MESA|NEO|Orcades|PREVEND|PROSPER|RS|UKBB|VIKING|WHI
4: ARIC|BRIGHT|CHRIS|CHS|ERF|GS:SFHS|HCHS|SOL|Inter99|JHS|LifeLines|MESA|NEO|Orcades|PREVEND|PROSPER|RS|UKBB|VIKING|WHI
5: ARIC|BRIGHT|CHRIS|CHS|ERF|GS:SFHS|HCHS|SOL|Inter99|JHS|LifeLines|MESA|NEO|Orcades|PREVEND|PROSPER|RS|UKBB|VIKING|WHI
---
332: ARIC|CHS|ERF|FHS|GOLDN|RS
333: ARIC|CHS|ERF|FHS|GOLDN|RS
334: ARIC|CHS|ERF|FHS|GOLDN|RS
335: ARIC|CHS|ERF|FHS|GOLDN|RS
336: ARIC|CHS|ERF|FHS|GOLDN|RS
FULL_SUMMARY_STATISTICS
<char>
1: yes
2: yes
3: yes
4: yes
5: yes
---
332: no
333: no
334: no
335: no
336: no
SUMMARY_STATS_LOCATION
<char>
1: http://ftp.ebi.ac.uk/pub/databases/gwas/summary_statistics/GCST90129001-GCST90130000/GCST90129635
2: http://ftp.ebi.ac.uk/pub/databases/gwas/summary_statistics/GCST90246001-GCST90247000/GCST90246319
3: http://ftp.ebi.ac.uk/pub/databases/gwas/summary_statistics/GCST90246001-GCST90247000/GCST90246320
4: http://ftp.ebi.ac.uk/pub/databases/gwas/summary_statistics/GCST90246001-GCST90247000/GCST90246318
5: http://ftp.ebi.ac.uk/pub/databases/gwas/summary_statistics/GCST90246001-GCST90247000/GCST90246321
---
332: <NA>
333: <NA>
334: <NA>
335: <NA>
336: <NA>
GXE
<char>
1: no
2: no
3: no
4: no
5: no
---
332: no
333: no
334: no
335: no
336: no
gwas_study_info = gwas_study_info |>
mutate(COHORT = str_replace_all(COHORT, "Rotterdam", "RS"))
all_cohorts = gwas_study_info$COHORT
all_cohorts = unlist(strsplit(all_cohorts, "\\|"))
unique(all_cohorts) |> length()
[1] 1070
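# Cohort labels that occur exactly once across all COHORT entries
# (n_studies counts occurrences, not distinct publications)
single_use_cohorts =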
data.frame(cohort = all_cohorts) |>
group_by(cohort) |>
summarise(n_studies = n()) |>
filter(n_studies == 1) |>
pull(cohort)
length(single_use_cohorts)
[1] 209
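`n_studies` above counts how often a label occurs across all COHORT entries, not how many publications use it: one publication can contribute many catalog rows (e.g. PUBMED_ID 32888493 above), and a single row can repeat a label (e.g. "GERA|GERA|GERA"). If publications are the unit of interest, here is a sketch on toy data that counts distinct PubMed IDs per cohort instead; `toy` is a made-up miniature of `gwas_study_info`.

```r
# Toy miniature of gwas_study_info, for illustration only
toy <- data.frame(
  PUBMED_ID = c(1L, 1L, 1L, 2L),
  COHORT    = c("UKBB|FHS", "UKBB", "UKBB|UKBB", "FHS"),
  stringsAsFactors = FALSE
)

# One row per (PubMed ID, cohort token)
tokens <- strsplit(toy$COHORT, "|", fixed = TRUE)
long <- data.frame(
  PUBMED_ID = rep(toy$PUBMED_ID, lengths(tokens)),
  cohort    = unlist(tokens),
  stringsAsFactors = FALSE
)

# Occurrence count (what n_studies measures) vs distinct publications
n_mentions     <- table(long$cohort)
n_publications <- tapply(long$PUBMED_ID, long$cohort,
                         function(ids) length(unique(ids)))
n_mentions      # FHS 2, UKBB 4
n_publications  # FHS 2, UKBB 1
```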
unique_cohort_names = unique(all_cohorts)
# Convert to lowercase and check duplicates
dup_groups <- tapply(unique_cohort_names, tolower(unique_cohort_names), I)
# Keep only groups with >1 element (i.e., capitalization differences)
dup_groups[lengths(dup_groups) > 1]
named character(0)
normalized <- gsub("[ _]", "", sort(unique_cohort_names))
# Group by normalized value
dup_groups <- tapply(sort(unique_cohort_names), normalized, I)
# Keep only groups with >1 element (i.e. variants)
dup_groups[lengths(dup_groups) > 1]
named character(0)
normalized <- gsub("[ _]", "", sort(unique_cohort_names))
# Group by normalized value
dup_groups <- tapply(sort(unique_cohort_names), tolower(normalized), I)
# Keep only groups with >1 element (i.e. variants)
dup_groups[lengths(dup_groups) > 1]
named character(0)
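The three checks above differ only in how the comparison key is built. A small hypothetical helper, `find_spelling_variants()`, that folds case and spaces/underscores into one key and uses `split()` so the result is always a list:

```r
# Hypothetical helper: one comparison key per label (lower case, spaces and
# underscores removed); returns only keys with more than one spelling
find_spelling_variants <- function(labels) {
  labels <- unique(labels)
  key <- tolower(gsub("[ _]", "", labels))
  groups <- split(labels, key)
  groups[lengths(groups) > 1]
}

# Toy labels, for illustration only
variants <- find_spelling_variants(
  c("BioMe", "BIOME", "GALA II", "GALA_II", "Health ABC", "UKBB", "UKBB")
)
variants
# $biome:  "BioMe" "BIOME"
# $galaii: "GALA II" "GALA_II"
```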
library(stringdist)
library(dplyr)
# Identify pairs with small distance (e.g., <=2 edits)
small_dist_pairs <- function(threshold,
dist_matrix_method,
all_cohorts) {
# Create a vector of unique cohort names
cohorts <- unique(all_cohorts)
single_use_cohorts = data.frame(cohort = all_cohorts) |>
group_by(cohort) |>
summarise(n_studies = n()) |>
filter(n_studies == 1) |>
pull(cohort)
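# Compute string distances between each single-use name and every unique
# name using dist_matrix_method (e.g. "lv" = Levenshtein, "lcs" = longest
# common substring distance as defined in stringdist)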
dist_matrix <- stringdistmatrix(single_use_cohorts,
cohorts,
method = dist_matrix_method)
matches <- which(dist_matrix > 0 & dist_matrix <= threshold,
arr.ind = TRUE)
matches <- data.frame(
cohort1 = single_use_cohorts[matches[,1]],
cohort2 = cohorts[matches[,2]],
distance = dist_matrix[matches]
)
matches <- matches[matches$cohort1 != matches$cohort2, ]
matches <- unique(matches)
return(matches)
}
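Because `small_dist_pairs()` compares single-use names against all unique names, a pair in which both names are single-use is reported twice, once in each order. A minimal sketch of dropping these mirrored rows with an order-independent key; the data frame below is made up for illustration.

```r
# Toy output in the shape returned by small_dist_pairs(), for illustration
pairs <- data.frame(
  cohort1  = c("DCHS", "DACHS", "HIS"),
  cohort2  = c("DACHS", "DCHS", "HAS"),
  distance = c(1, 1, 1),
  stringsAsFactors = FALSE
)

# Order-independent key per pair, then keep the first occurrence of each
pair_key <- paste(pmin(pairs$cohort1, pairs$cohort2),
                  pmax(pairs$cohort1, pairs$cohort2),
                  sep = "\t")
unique_pairs <- pairs[!duplicated(pair_key), ]
unique_pairs
```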
small_dist_pairs(threshold = 2,
dist_matrix_method = "lv",
all_cohorts = all_cohorts) |>
arrange(distance) |>
head()
cohort1 cohort2 distance
1 CHIP SHIP 1
2 SpBCS SEBCS 1
3 DCHS DACHS 1
4 HIS HAS 1
5 MACS MCCS 1
6 MHCS MCCS 1
small_dist_pairs(threshold = 2,
dist_matrix_method = "lcs",
all_cohorts = all_cohorts) |>
arrange(distance) |>
head()
cohort1 cohort2 distance
1 DCHS DACHS 1
2 DNHS NHS 1
3 CCHS CHS 1
4 DCHS CHS 1
5 BLS BLTS 1
6 EBB EB 1
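The two methods rank candidates differently. Levenshtein ("lv") counts a substitution as one edit, whereas stringdist's "lcs" distance counts the characters left unpaired after pairing characters in order, so a substitution costs two. This is why CHIP vs SHIP appears at distance 1 in the Levenshtein list but would only reach distance 2 with `method = "lcs"`; a single insertion costs one under both.

```r
library(stringdist)

# Substitution (C -> S): one edit for lv, two unpaired characters for lcs
stringdist("CHIP", "SHIP", method = "lv")    # 1
stringdist("CHIP", "SHIP", method = "lcs")   # 2

# Insertion of one character: distance 1 under both methods
stringdist("DCHS", "DACHS", method = "lv")   # 1
stringdist("DCHS", "DACHS", method = "lcs")  # 1
```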
# Candidate near-duplicates flagged by the distance checks:
# COHRA1 vs COHRA
# COHRA2 vs COHRA
# EGLE vs EAGLE
# GHS-II vs GHS-I
# B-PROOF vs BPROOF
gwas_study_info |>
filter(grepl("\\bB-PROOF\\b", COHORT))
DATE_ADDED_TO_CATALOG PUBMED_ID FIRST_AUTHOR DATE JOURNAL
<IDat> <int> <char> <IDat> <char>
1: 2024-10-07 39103364 Went M 2024-08-05 Nat Commun
LINK
<char>
1: www.ncbi.nlm.nih.gov/pubmed/39103364
STUDY
<char>
1: Deciphering the genetics and mechanisms of predisposition to multiple myeloma.
DISEASE/TRAIT
<char>
1: Multiple myeloma
INITIAL_SAMPLE_SIZE
<char>
1: 10,906 European ancestry cases, 366,221 European ancestry controls
REPLICATION_SAMPLE_SIZE PLATFORM_[SNPS_PASSING_QC] ASSOCIATION_COUNT
<char> <char> <int>
1: <NA> NR [8100000] (imputed) 35
MAPPED_TRAIT MAPPED_TRAIT_URI STUDY_ACCESSION
<char> <char> <char>
1: multiple myeloma http://www.ebi.ac.uk/efo/EFO_0001378 GCST90451657
GENOTYPING_TECHNOLOGY SUBMISSION_DATE STATISTICAL_MODEL
<char> <lgcl> <lgcl>
1: Genome-wide genotyping array NA NA
BACKGROUND_TRAIT MAPPED_BACKGROUND_TRAIT MAPPED_BACKGROUND_TRAIT_URI
<lgcl> <char> <char>
1: NA
COHORT
<char>
1: SNMB|B58C|NBBS|GMMG|HNR|DBDS|MRC|PRACTICAL|BCAC|CGEMS|deCODE|UKBB|B-PROOF
FULL_SUMMARY_STATISTICS SUMMARY_STATS_LOCATION GXE
<char> <char> <char>
1: no <NA> no
gwas_study_info |>
filter(grepl("\\bBPROOF\\b", COHORT))
DATE_ADDED_TO_CATALOG PUBMED_ID FIRST_AUTHOR DATE JOURNAL
<IDat> <int> <char> <IDat> <char>
1: 2021-02-18 33510174 Jones G 2021-01-28 Nat Commun
2: 2021-02-18 33510174 Jones G 2021-01-28 Nat Commun
3: 2021-02-18 33510174 Jones G 2021-01-28 Nat Commun
4: 2021-02-18 33510174 Jones G 2021-01-28 Nat Commun
5: 2021-02-18 33510174 Jones G 2021-01-28 Nat Commun
6: 2021-02-18 33510174 Jones G 2021-01-28 Nat Commun
LINK
<char>
1: www.ncbi.nlm.nih.gov/pubmed/33510174
2: www.ncbi.nlm.nih.gov/pubmed/33510174
3: www.ncbi.nlm.nih.gov/pubmed/33510174
4: www.ncbi.nlm.nih.gov/pubmed/33510174
5: www.ncbi.nlm.nih.gov/pubmed/33510174
6: www.ncbi.nlm.nih.gov/pubmed/33510174
STUDY
<char>
1: Genome-wide meta-analysis of muscle weakness identifies 15 susceptibility loci in older men and women.
2: Genome-wide meta-analysis of muscle weakness identifies 15 susceptibility loci in older men and women.
3: Genome-wide meta-analysis of muscle weakness identifies 15 susceptibility loci in older men and women.
4: Genome-wide meta-analysis of muscle weakness identifies 15 susceptibility loci in older men and women.
5: Genome-wide meta-analysis of muscle weakness identifies 15 susceptibility loci in older men and women.
6: Genome-wide meta-analysis of muscle weakness identifies 15 susceptibility loci in older men and women.
DISEASE/TRAIT
<char>
1: Low hand grip strength (60 years and older) (EWGSOP)
2: Low hand grip strength (60 years and older) (EWGSOP)
3: Low hand grip strength (60 years and older) (EWGSOP)
4: Low hand grip strength (60 years and older) (FNIH)
5: Low hand grip strength (60 years and older) (FNIH)
6: Low hand grip strength (60 years and older) (FNIH)
INITIAL_SAMPLE_SIZE
<char>
1: 48,596 European ancestry cases, 207,927 European ancestry controls
2: 34,589 European ancestry female cases, 100,879 European ancestry female controls
3: 14,007 European ancestry male cases, 107,048 European ancestry male controls
4: 20,335 European ancestry cases, 236,188 European ancestry controls
5: 13,601 European ancestry female cases, 121,867 European ancestry female controls
6: 6,734 European ancestry male cases, 114,321 European ancestry male controls
REPLICATION_SAMPLE_SIZE PLATFORM_[SNPS_PASSING_QC]
<char> <char>
1: <NA> Affymetrix, Illumina [9457422] (imputed)
2: <NA> Affymetrix, Illumina [9449805] (imputed)
3: <NA> Affymetrix, Illumina [9464541] (imputed)
4: <NA> Affymetrix, Illumina [9465622] (imputed)
5: <NA> Affymetrix, Illumina [9431325] (imputed)
6: <NA> Affymetrix, Illumina [9471905] (imputed)
ASSOCIATION_COUNT MAPPED_TRAIT
<int> <char>
1: 15 grip strength measurement
2: 8 grip strength measurement
3: 3 grip strength measurement
4: 5 grip strength measurement
5: 0 grip strength measurement
6: 0 grip strength measurement
MAPPED_TRAIT_URI STUDY_ACCESSION
<char> <char>
1: http://www.ebi.ac.uk/efo/EFO_0006941 GCST90007526
2: http://www.ebi.ac.uk/efo/EFO_0006941 GCST90007527
3: http://www.ebi.ac.uk/efo/EFO_0006941 GCST90007528
4: http://www.ebi.ac.uk/efo/EFO_0006941 GCST90007529
5: http://www.ebi.ac.uk/efo/EFO_0006941 GCST90007530
6: http://www.ebi.ac.uk/efo/EFO_0006941 GCST90007531
GENOTYPING_TECHNOLOGY SUBMISSION_DATE STATISTICAL_MODEL
<char> <lgcl> <lgcl>
1: Genome-wide genotyping array NA NA
2: Genome-wide genotyping array NA NA
3: Genome-wide genotyping array NA NA
4: Genome-wide genotyping array NA NA
5: Genome-wide genotyping array NA NA
6: Genome-wide genotyping array NA NA
BACKGROUND_TRAIT MAPPED_BACKGROUND_TRAIT MAPPED_BACKGROUND_TRAIT_URI
<lgcl> <char> <char>
1: NA
2: NA
3: NA
4: NA
5: NA
6: NA
COHORT
<char>
1: ARIC|BASE-II|BPROOF|CHS|EPIC-Norfolk|FHS|HRS|InCHIANTI|LASA I|LASA II|Long Life Family Study|MrOS Gothenburg|MrOS Malmo|ROSMAP|ROSMAP|RS|RSI|RSII|SHIP|TSHA|UKBB|WLS
2: ARIC|BASE-II|BPROOF|CHS|EPIC-Norfolk|FHS|HRS|InCHIANTI|LASA I|LASA II|Long Life Family Study|MrOS Gothenburg|MrOS Malmo|ROSMAP|ROSMAP|RS|RSI|RSII|SHIP|TSHA|UKBB|WLS
3: ARIC|BASE-II|BPROOF|CHS|EPIC-Norfolk|FHS|HRS|InCHIANTI|LASA I|LASA II|Long Life Family Study|MrOS Gothenburg|MrOS Malmo|ROSMAP|ROSMAP|RS|RSI|RSII|SHIP|TSHA|UKBB|WLS
4: ARIC|BASE-II|BPROOF|CHS|EPIC-Norfolk|FHS|HRS|InCHIANTI|LASA I|LASA II|Long Life Family Study|MrOS Gothenburg|MrOS Malmo|ROSMAP|ROSMAP|RS|RSI|RSII|SHIP|TSHA|UKBB|WLS
5: ARIC|BASE-II|BPROOF|CHS|EPIC-Norfolk|FHS|HRS|InCHIANTI|LASA I|LASA II|Long Life Family Study|MrOS Gothenburg|MrOS Malmo|ROSMAP|ROSMAP|RS|RSI|RSII|SHIP|TSHA|UKBB|WLS
6: ARIC|BASE-II|BPROOF|CHS|EPIC-Norfolk|FHS|HRS|InCHIANTI|LASA I|LASA II|Long Life Family Study|MrOS Gothenburg|MrOS Malmo|ROSMAP|ROSMAP|RS|RSI|RSII|SHIP|TSHA|UKBB|WLS
FULL_SUMMARY_STATISTICS
<char>
1: yes
2: yes
3: yes
4: yes
5: yes
6: yes
SUMMARY_STATS_LOCATION
<char>
1: http://ftp.ebi.ac.uk/pub/databases/gwas/summary_statistics/GCST90007001-GCST90008000/GCST90007526
2: http://ftp.ebi.ac.uk/pub/databases/gwas/summary_statistics/GCST90007001-GCST90008000/GCST90007527
3: http://ftp.ebi.ac.uk/pub/databases/gwas/summary_statistics/GCST90007001-GCST90008000/GCST90007528
4: http://ftp.ebi.ac.uk/pub/databases/gwas/summary_statistics/GCST90007001-GCST90008000/GCST90007529
5: http://ftp.ebi.ac.uk/pub/databases/gwas/summary_statistics/GCST90007001-GCST90008000/GCST90007530
6: http://ftp.ebi.ac.uk/pub/databases/gwas/summary_statistics/GCST90007001-GCST90008000/GCST90007531
GXE
<char>
1: no
2: no
3: no
4: no
5: no
6: no
# WAMHS vs WASHS
# CALGB 40502 vs CALGB 40503
gwas_study_info =
gwas_study_info |>
mutate(COHORT = ifelse(PUBMED_ID == 34604815,
str_replace_all(COHORT, "Mount Sinai", "BioMe"),
COHORT
)
)
gwas_study_info =
gwas_study_info |>
mutate(COHORT = ifelse(PUBMED_ID == 40181193,
str_replace_all(COHORT, "BHS_b", "BHS"),
COHORT
)
)
gwas_study_info =
gwas_study_info |>
mutate(COHORT = str_replace_all(COHORT, "B-PROOF", "BPROOF")
)
# For PUBMED_ID 36551779, replace "other" with "AGORA"
gwas_study_info =
gwas_study_info |>
mutate(COHORT = ifelse(PUBMED_ID == 36551779,
str_replace_all(COHORT, "other", "AGORA"),
COHORT
)
)
# If the first author is the COVID-19 Host Genetics Initiative,
# replace "Estonia" with "EB"
gwas_study_info =
gwas_study_info |>
mutate(COHORT = ifelse(FIRST_AUTHOR == "COVID-19 Host Genetics Initiative",
str_replace_all(COHORT, "Estonia", "EB"),
COHORT
)
)
# Likewise for PUBMED_ID 34791234:
# replace "Estonia" with "EB"
gwas_study_info =
gwas_study_info |>
mutate(COHORT = ifelse(PUBMED_ID == 34791234,
str_replace_all(COHORT, "Estonia", "EB"),
COHORT
)
)
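# PUBMED_ID 32298765 (European NAFLD Registry Metacohort):
# the initial-stage cohorts are
# European NAFLD Registry Metacohort|WTCCC|HYPERGENES|KORA|Understanding Society
# STAGE is not checked below: every row of this study with an empty COHORT
# is assigned the initial-stage list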
gwas_study_info =
gwas_study_info |>
mutate(COHORT = ifelse(COHORT == "" & PUBMED_ID == "32298765",
"European NAFLD Registry Metacohort|WTCCC|HYPERGENES|KORA|Understanding Society",
COHORT
)
)
# PUBMED_ID = 32298765
# if STAGE = "replication"
# then set:
# COHORT = European NAFLD Registry Metacohort
# gwas_study_info =
# gwas_study_info |>
# mutate(COHORT = ifelse(COHORT == "" &
# PUBMED_ID == "32298765" &
# STAGE == "replication",
# "European NAFLD Registry Metacohort",
# COHORT
# )
# )
gwas_study_info =
gwas_study_info |>
mutate(COHORT = str_replace_all(COHORT,
'The ""European NAFLD Registry"" Metacohort',
'European NAFLD Registry Metacohort')) |>
mutate(COHORT = str_replace_all(COHORT,
'The "European NAFLD Registry" Metacohort',
'European NAFLD Registry Metacohort'))
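The two `str_replace_all()` calls above handle the single-quoted and doubled-quoted spellings separately. The sketch below covers both spellings in one pass with a regex quantifier, under the assumption that these are the only two variants. The input vector is illustrative.

```r
# The two spellings seen above, plus an unrelated name that must be left alone
cohort <- c('The ""European NAFLD Registry"" Metacohort',
            'The "European NAFLD Registry" Metacohort',
            'UKB')

# '"{1,2}' matches one or two literal double quotes, covering both forms
normalized <- gsub('The "{1,2}European NAFLD Registry"{1,2} Metacohort',
                   'European NAFLD Registry Metacohort', cohort)
print(normalized)
```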
all_cohorts = gwas_study_info$COHORT
all_cohorts = unlist(strsplit(all_cohorts, "\\|"))
unique(all_cohorts) |> length()
[1] 1069
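The count above is taken on the raw tokens produced by splitting COHORT on `|`. If any COHORT values contain stray spaces around the pipe, or are empty, the same cohort can be counted twice or an empty name can be counted. Whether this happens in the real column has not been checked here. Below is a minimal sketch of a more defensive count on made-up strings.

```r
# Made-up COHORT strings in the pipe-delimited style
cohort_col <- c("UKB|FinnGen", "UKB", "FinnGen| UKB", "")

# fixed = TRUE treats "|" literally (equivalent to the regex "\\|")
parts <- unlist(strsplit(cohort_col, "|", fixed = TRUE))

# Trim stray spaces and drop empty tokens before counting
parts <- trimws(parts)
parts <- parts[parts != ""]
n_unique <- length(unique(parts))
print(n_unique)
```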
cohort_names_df <- readxl::read_xlsx(here::here("data/cohort/cohort_desc.xlsx")) |>
mutate(across(everything(),
~stringr::str_replace_all(.x,
pattern = "\u00A0",
replacement = " ")))
# all cohort names:
all_cohort_names <-
c(cohort_names_df$cohort,
cohort_names_df$full_name,
cohort_names_df$synyoms)
all_cohort_names <- unique(all_cohort_names)
length(all_cohort_names)
[1] 1496
gwas_cat_names <- unique(all_cohorts)
not_found_names <- gwas_cat_names[!(gwas_cat_names %in% all_cohort_names)]
checked_all_cohort_names <- tolower(stringr::str_trim(all_cohort_names))
found_with_edits <- gwas_cat_names[(tolower(gwas_cat_names) %in% checked_all_cohort_names)]
# names that match only after ignoring case (and trimming whitespace in the dictionary):
not_found_names[not_found_names %in% found_with_edits]
[1] "Croatia" "Nagahama"
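The check above lower-cases both sides but trims whitespace only on the dictionary side. Below is a minimal sketch that normalizes both sides with one helper and then separates the names that match only after normalization from the names that never match. The input vectors are toy stand-ins, not the real catalog or data dictionary, and `normalize()` is a hypothetical helper.

```r
# Toy stand-ins for the catalog names and the data-dictionary entries
gwas_cat_names   <- c("Croatia", "Nagahama", "UKB", "WRAP")
all_cohort_names <- c("CROATIA", " nagahama", "UKB")

# Normalize both sides the same way: trim whitespace, then lower-case
normalize <- function(x) tolower(trimws(x))

exact_match      <- gwas_cat_names %in% all_cohort_names
normalized_match <- normalize(gwas_cat_names) %in% normalize(all_cohort_names)

# Names rescued by normalization, and names still missing from the dictionary
only_after_normalizing <- gwas_cat_names[normalized_match & !exact_match]
still_missing          <- gwas_cat_names[!normalized_match]
print(only_after_normalizing)
print(still_missing)
```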
print("Number of cohorts not (yet) included in data-dictionary")
[1] "Number of cohorts not (yet) included in data-dictionary"
length(not_found_names)
[1] 631
print("Most used cohorts that are not included in the data-dictionary")
[1] "Most used cohorts that are not included in the data-dictionary"
data.frame(
cohort_name =
all_cohorts[all_cohorts %in% not_found_names]) |>
group_by(cohort_name) |>
summarise(n = n()) |>
arrange(desc(n)) |>
head()
# A tibble: 6 × 2
cohort_name n
<chr> <int>
1 WRAP 441
2 AMISH 406
3 LBC 142
4 HBCS 138
5 RBC-Omics 131
6 HELIOS 125
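The same ranking of unmatched names can be produced in base R with `table()`. In dplyr, `count(cohort_name, sort = TRUE)` is a shorter equivalent of the `group_by()`/`summarise()`/`arrange()` chain. The vectors below are toy stand-ins for `all_cohorts` and `not_found_names`.

```r
# Toy stand-ins: one entry per study-cohort use, and the unmatched names
all_cohorts     <- c("WRAP", "AMISH", "WRAP", "UKB", "LBC", "WRAP", "AMISH")
not_found_names <- c("WRAP", "AMISH", "LBC")

# Count uses of each unmatched name, most frequent first
counts <- sort(table(all_cohorts[all_cohorts %in% not_found_names]),
               decreasing = TRUE)
print(head(counts))
```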
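# keep the corrected names in a separate object, so gwas_study_info keeps
# PUBMED_ID and the trait/sample-size columns used in the checks below
gwas_cohort_corrected =
gwas_study_info |>
select(STUDY_ACCESSION,
DATE,
COHORT) |>
distinct()
data.table::fwrite(gwas_cohort_corrected,
here::here("output/gwas_cohorts/gwas_cohort_name_corrected.csv"),
sep = ",")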
# in the study below, the unlisted cohort is a combination of two cohorts
gwas_study_info |>
filter(PUBMED_ID == 32605384) |>
select(PUBMED_ID,
COHORT,
STUDY_ACCESSION,
"DISEASE/TRAIT",
"INITIAL_SAMPLE_SIZE",
"REPLICATION_SAMPLE_SIZE")
gwas_study_info |>
filter(PUBMED_ID == 30510241) |>
select(PUBMED_ID,
COHORT,
STUDY_ACCESSION,
"DISEASE/TRAIT",
"INITIAL_SAMPLE_SIZE",
"REPLICATION_SAMPLE_SIZE"
)
# the supplement shows this study is made up of many studies; I believe it includes all the other subsamples
gwas_study_info |>
filter(PUBMED_ID == 33307546) |>
select(PUBMED_ID,
COHORT,
STUDY_ACCESSION,
"DISEASE/TRAIT",
"INITIAL_SAMPLE_SIZE",
"REPLICATION_SAMPLE_SIZE")
# I believe the COVID-19 Host Genetics Initiative (HGI) entry here refers to the Hispanic individuals;
# 23andMe refers to European-ancestry participants from its ‘broad respiratory phenotype’ study
# see the replication section of https://www.nature.com/articles/s41586-020-03065-y#Sec4
gwas_study_info |>
filter(PUBMED_ID == 38184787) |>
select(PUBMED_ID, COHORT, STUDY_ACCESSION,
"DISEASE/TRAIT",
"INITIAL_SAMPLE_SIZE",
"REPLICATION_SAMPLE_SIZE")
# cohorts listed are for:
# Mayo Clinic Bipolar Biobank (STUDY_ACCESSION: GCST90554822)
# MAYO-Clinic RGC Project Generation. (PUBMED_ID: 37949852)
# Mayo Clinic (PUBMED_ID: 40050615)
# Mayo-VDB|
# Stanford_ADRC
# CROATIA
# ? Raine Study == Raine
# Penn, UPenn, etc.
# ? CALGB
# ? SIGNET-REGARDS == SIGNET
# RISC and RISK appear to be different cohorts:
# Relationship Between Insulin Sensitivity and Cardiovascular Disease Risk (RISC)
# Risk Stratification and Identification of Immunogenetic and Microbial Markers of Rapid Disease Progression in Children with Crohn’s Disease (RISK)
# CKB
# COHRA, COHRA1, COHRA2
# UK Blood Service (UKBS)
# DiscovEHR, DISCOVeRY-BMT
# ELSA, ELSA-Brasil
# EPIC, EPIC_CAD, EPIC_Obs, EPIC-Norfolk, EPICURE
# FinnTwin, FinnTwin12
# GOCS, GOCS_Chilean
# GRAAD, GRaD
# Colo2&3
# HELIC, HELIC-MANOLIS, HELIC-Pomak
# ? QTR == QTR_Qindao
# ? is "other|UKB" == "UKB|other"
# ? is UK|NR == UKB|NR
# ? CF_TSS == TSS
gwas_study_info |>
filter(grepl("ORPS", COHORT))
gwas_study_info |>
filter(PUBMED_ID == 39749473) |>
select(COHORT)
# PAGE vs PAGES
# PUBMED_ID 35754128 should be PAGES
# see Supplementary Table 1: https://pmc.ncbi.nlm.nih.gov/articles/PMC9671132/
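Fixing the PAGE/PAGES note with `str_replace_all(COHORT, "PAGE", "PAGES")` would also turn an existing "PAGES" into "PAGESS". Below is a minimal sketch of a token-level alternative, assuming COHORT stays pipe-delimited. `replace_token()` is a hypothetical helper. Restricting it to the rows of PUBMED_ID 35754128 is left out here.

```r
# Replace whole pipe-delimited tokens only, so "PAGES" is not turned into "PAGESS"
replace_token <- function(cohort, from, to) {
  vapply(strsplit(cohort, "|", fixed = TRUE), function(tokens) {
    tokens[tokens == from] <- to
    paste(tokens, collapse = "|")
  }, character(1))
}

cohort <- c("PAGE|UKB", "PAGES", "")
print(replace_token(cohort, "PAGE", "PAGES"))
```

`strsplit()` drops a trailing empty field, so a value such as `Mayo-VDB|` would come back as `Mayo-VDB`.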
# COGEND COGENT
sessionInfo()
R version 4.3.1 (2023-06-16)
Platform: aarch64-apple-darwin20 (64-bit)
Running under: macOS 15.6.1
Matrix products: default
BLAS: /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib/libRlapack.dylib; LAPACK version 3.11.0
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
time zone: America/Los_Angeles
tzcode source: internal
attached base packages:
[1] stats graphics grDevices datasets utils methods base
other attached packages:
[1] stringdist_0.9.15 stringr_1.5.2 ggplot2_3.5.2 dplyr_1.1.4
[5] data.table_1.17.8 workflowr_1.7.1
loaded via a namespace (and not attached):
[1] sass_0.4.10 utf8_1.2.6 generics_0.1.4 renv_1.0.3
[5] stringi_1.8.7 digest_0.6.37 magrittr_2.0.4 evaluate_1.0.5
[9] grid_4.3.1 RColorBrewer_1.1-3 fastmap_1.2.0 cellranger_1.1.0
[13] rprojroot_2.1.0 jsonlite_2.0.0 processx_3.8.6 whisker_0.4.1
[17] ps_1.9.1 promises_1.3.3 httr_1.4.7 scales_1.4.0
[21] jquerylib_0.1.4 cli_3.6.5 rlang_1.1.6 withr_3.0.2
[25] cachem_1.1.0 yaml_2.3.10 tools_4.3.1 parallel_4.3.1
[29] httpuv_1.6.16 here_1.0.1 vctrs_0.6.5 R6_2.6.1
[33] lifecycle_1.0.4 git2r_0.36.2 fs_1.6.6 pkgconfig_2.0.3
[37] callr_3.7.6 pillar_1.11.1 bslib_0.9.0 later_1.4.4
[41] gtable_0.3.6 glue_1.8.0 Rcpp_1.1.0 xfun_0.53
[45] tibble_3.3.0 tidyselect_1.2.1 rstudioapi_0.17.1 knitr_1.50
[49] farver_2.1.2 htmltools_0.5.8.1 rmarkdown_2.30 compiler_4.3.1
[53] getPass_0.2-4 readxl_1.4.5