Last updated: 2025-08-21
Checks: 7 passed, 0 failed
Knit directory: genomics_ancest_disease_dispar/
This reproducible R Markdown analysis was created with workflowr (version 1.7.1). The Checks tab describes the reproducibility checks that were applied when the results were created. The Past versions tab lists the development history.
Great! Since the R Markdown file has been committed to the Git repository, you know the exact version of the code that produced these results.
Great job! The global environment was empty. Objects defined in the global environment can affect the analysis in your R Markdown file in unknown ways. For reproducibility it’s best to always run the code in an empty environment.
The command set.seed(20220216) was run prior to running the code in the R Markdown file. Setting a seed ensures that any results that rely on randomness, e.g. subsampling or permutations, are reproducible.
Great job! Recording the operating system, R version, and package versions is critical for reproducibility.
Nice! There were no cached chunks for this analysis, so you can be confident that you successfully produced the results during this run.
Great job! Using relative paths to the files within your workflowr project makes it easier to run your code on other machines.
Great! You are using Git for version control. Tracking code development and connecting the code version to the results is critical for reproducibility.
The results in this page were generated with repository version ac13d70. See the Past versions tab to see a history of the changes made to the R Markdown and HTML files.
Note that you need to be careful to ensure that all relevant files for the analysis have been committed to Git prior to generating the results (you can use wflow_publish or wflow_git_commit). workflowr only checks the R Markdown file, but you know if there are other scripts or data files that it depends on. Below is the status of the Git repository when the results were generated:
Ignored files:
Ignored: .Rproj.user/
Ignored: data/gwas_catalog/
Ignored: output/gwas_study_info_cohort_corrected.csv
Untracked files:
Untracked: analysis/cohort_dist.Rmd
Untracked: analysis/collapse_traits.Rmd
Untracked: analysis/missing_cohort_info.Rmd
Untracked: data/.DS_Store
Untracked: renv/
Unstaged changes:
Modified: .Rprofile
Modified: analysis/collapse_cohorts.Rmd
Modified: code/collapse_diseases.R
Note that any generated files, e.g. HTML, png, CSS, etc., are not included in this status report because it is ok for generated content to have uncommitted changes.
These are the previous versions of the repository in which changes were made to the R Markdown (analysis/correcting_cohort_names.Rmd) and HTML (docs/correcting_cohort_names.html) files. If you’ve configured a remote Git repository (see ?wflow_git_remote), click on the hyperlinks in the table below to view the files as they were in that past version.
File | Version | Author | Date | Message |
---|---|---|---|---|
Rmd | ac13d70 | IJbeasley | 2025-08-21 | Updating correcting cohort labels |
html | 6c592b7 | IJbeasley | 2025-08-20 | Build site. |
Rmd | 1969e6b | IJbeasley | 2025-08-20 | More corrections / harmonisation of cohort names in gwas catalog |
knitr::opts_chunk$set(echo = TRUE, message = FALSE, warning = FALSE)
library(data.table)
library(dplyr)
library(ggplot2)
library(stringr)
# Load GWAS Catalog studies
gwas_study_info <- fread(here::here("data/gwas_catalog/gwas-catalog-v1.0.3.1-studies-r2025-07-21.tsv"),
sep = "\t", quote = "")
# Standardize column names (replace spaces with underscores)
gwas_study_info <- gwas_study_info |>
rename_with(~gsub(" ", "_", .x))
gwas_study_info <- gwas_study_info |>
mutate(COHORT = str_replace_all(COHORT, " \\| ", "|")) |>
mutate(COHORT = str_replace_all(COHORT, "\\| ", "|")) |>
mutate(COHORT = str_replace_all(COHORT, " \\|", "|"))
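The three chained replacements above strip spaces around the "|" separator. A minimal base R sketch of the same idea as a single regular expression follows; `normalise_separators()` is a hypothetical helper name, and because the pattern also strips longer runs of spaces, results could differ slightly for unusually spaced strings.

```r
# Hypothetical helper: remove spaces on either side of each "|" separator.
# " *" matches zero or more spaces, so longer runs are removed as well.
normalise_separators <- function(x) {
  gsub(" *\\| *", "|", x)
}

normalise_separators(c("UKBB | FinnGen", "EB| HUNT", "MVP |BioMe", "GERA"))
# returns c("UKBB|FinnGen", "EB|HUNT", "MVP|BioMe", "GERA")
```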
all_cohorts = gwas_study_info$COHORT
all_cohorts = unlist(strsplit(all_cohorts, "\\|"))
unique(all_cohorts) |> length()
[1] 1183
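The split-and-count step above is repeated after each batch of corrections. One option is to wrap it in a function. The sketch below uses a hypothetical `count_unique_cohorts()` helper and `fixed = TRUE`, so the "|" needs no escaping.

```r
# Hypothetical helper: count distinct cohort labels across a COHORT column,
# splitting each entry on the literal "|" separator.
count_unique_cohorts <- function(cohort_col) {
  labels <- unlist(strsplit(cohort_col, "|", fixed = TRUE))
  length(unique(labels))
}

count_unique_cohorts(c("UKBB|FinnGen", "UKBB|EB", "HUNT"))
# returns 4
```

With the real data this would be called as `count_unique_cohorts(gwas_study_info$COHORT)`.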
# Correct for discrepancies within same paper
gwas_study_info <- gwas_study_info |>
mutate(COHORT = str_replace_all(COHORT, "AWI-Gen", "AWI-GEN")) |> # PUBMED ID: 40229280
mutate(COHORT = str_replace_all(COHORT, "AddHealth", "Add Health")) |> # PUBMED ID: 37494057
mutate(COHORT = str_replace_all(COHORT, fixed("EB|FinnGen|UKBB"), "EB|FinnGen|UKB")) |> # PUBMED ID: 39067062
mutate(COHORT = str_replace_all(COHORT, "Estonian Biobank", "EB")) |> # PUBMED ID: 39500877
mutate(COHORT = str_replace_all(COHORT, "AWIGEN", "AWI-GEN")) # PUBMED ID: 40229280
# Harmonise the "other" label (Other, OTHER, others -> other)
gwas_study_info <- gwas_study_info |>
mutate(COHORT = str_replace_all(COHORT, "Other", "other")) |>
mutate(COHORT = str_replace_all(COHORT, "OTHER", "other")) |>
mutate(COHORT = str_replace_all(COHORT, "others", "other"))
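Because `str_replace_all()` matches substrings, the chain above would also rewrite text inside longer labels: "OTHERS" would become "otherS", and a label containing "Mothers" would become "Mother". A sketch of a case-insensitive, whole-word alternative in base R follows; "Mothers study" is a made-up label used only for illustration.

```r
# Sketch: whole-word, case-insensitive match for "other" / "others".
# perl = TRUE gives PCRE, where \b marks a word boundary.
harmonise_other <- function(x) {
  gsub("\\bothers?\\b", "other", x, ignore.case = TRUE, perl = TRUE)
}

harmonise_other(c("UKBB|Other", "OTHERS|EB", "Mothers study|others"))
# returns c("UKBB|other", "other|EB", "Mothers study|other")
```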
# Harmonise the "multiple" designation, e.g. "(Multiple cohorts)", "(multiple)" and "Multiple"
gwas_study_info <- gwas_study_info |>
mutate(COHORT = str_replace_all(COHORT,
"(\\(Multiple cohorts\\))|(\\(multiple\\))|Multiple",
"multiple"))
# Some studies separate multiple cohorts with commas instead of |
gwas_study_info <- gwas_study_info |>
mutate(COHORT = str_replace_all(COHORT, ", ", "|"))
# Make TwinsUK consistent
gwas_study_info <- gwas_study_info |>
mutate(COHORT = str_replace_all(COHORT, "TWINS-UK|TWINSUK", "TwinsUK"))
# Make epic norfolk consistent
gwas_study_info <- gwas_study_info |>
mutate(COHORT = str_replace_all(COHORT, "EPIC-Norfolk cohort", "EPIC-Norfolk"))
# Make emerge consistent
gwas_study_info <- gwas_study_info |>
mutate(COHORT = str_replace_all(COHORT, "EMERGE", "eMERGE"))
# Make twingene consistent
gwas_study_info <- gwas_study_info |>
mutate(COHORT = str_replace_all(COHORT, "TWINGENE", "TwinGene"))
# Make QSkin consistent
gwas_study_info <- gwas_study_info |>
mutate(COHORT = str_replace_all(COHORT, "QSkin|Qskin", "QSKIN"))
# Make 23andme consistent
gwas_study_info <- gwas_study_info |>
mutate(COHORT = str_replace_all(COHORT, "23ANDME", "23andMe"))
# Make PopGen consistent
gwas_study_info <- gwas_study_info |>
mutate(COHORT = str_replace_all(COHORT, "PopGen", "POPGEN"))
# Make decode consistent
gwas_study_info <- gwas_study_info |>
mutate(COHORT = str_replace_all(COHORT, "DECODE|deCode|DeCODE", "deCODE"))
# Make FinnGen consistent
gwas_study_info <- gwas_study_info |>
mutate(COHORT = str_replace_all(COHORT, "Finngen|FINNGEN", "FinnGen"))
gwas_study_info <- gwas_study_info |>
mutate(COHORT = str_replace_all(COHORT, "genomicc", "GenOMICC")) |>
mutate(COHORT = str_replace_all(COHORT, "IPSYCH", "iPSYCH")) |>
mutate(COHORT = str_replace_all(COHORT, "SIMES", "SiMES")) |>
mutate(COHORT = str_replace_all(COHORT, "HELIX", "Helix"))
gwas_study_info = gwas_study_info |>
mutate(COHORT = str_replace_all(COHORT, "FINLAND", "Finland"))
all_cohorts = gwas_study_info$COHORT
all_cohorts = unlist(strsplit(all_cohorts, "\\|"))
unique(all_cohorts) |> length()
[1] 1153
# The CARDIoGRAMplusC4D consortium includes both the CARDIoGRAM and C4D cohorts
# (see: https://cardiogramplusc4d.org/data-downloads/),
# so for coding purposes we split it into CARDIoGRAM|C4D
gwas_study_info = gwas_study_info |>
mutate(COHORT = str_replace_all(COHORT, "CARDIoGRAMplusC4D", "CARDIoGRAM|C4D"))
all_cohorts[grep("ukb", tolower(all_cohorts))] |> unique()
[1] "UKB" "UKBB" "UKBB White British"
[4] "UKBS" "UKB-PPP"
gwas_study_info |>
filter(grepl("UKBS", COHORT)) |>
select(PUBMED_ID, COHORT) |>
distinct()
PUBMED_ID
<int>
1: 37653029
2: 34127860
COHORT
<char>
1: other|GenEPA|CHOP|EPICURE|HBCS|KORA|ILM|PoBI|POPGEN|TSS|UKBS
2: BC58|BDA|NIHR Cambridge BioResource|GRID|UKBS|BRI|CLEAR|EDIC|GoKinD|NYCP|NIMH|SEARCH|TrialNet|T1DGC|UAB|UC|UCSF|IDDMGEN|T1DGEN|MCW|GRID-NI|Young Hearts-NI|Steno Diabetes Center|HSG|HapMap
# for PUBMED_ID: 37653029
# UKBS seems to be UK Biobank Bank
# for pubmed id: 34127860
# UKBS is UK Blood Service (UKBS)
gwas_study_info |>
filter(grepl("UKB-PPP", COHORT)) |>
select(PUBMED_ID, COHORT) |>
distinct()
PUBMED_ID COHORT
<int> <char>
1: 37794183 UKB-PPP
# PUBMED ID 37794183 uses UK Biobank proteomics data (UK Biobank Pharma Proteomics Project), so code it as UK Biobank
gwas_study_info <- gwas_study_info |>
mutate(COHORT = ifelse(COHORT == "UKB-PPP", "UKB", COHORT)) |>
mutate(COHORT = str_replace_all(COHORT, "UKBB White British", "UKB")) |>
mutate(COHORT = gsub("\\bUKB\\b", "UKBB", COHORT))
all_cohorts = gwas_study_info$COHORT
all_cohorts = unlist(strsplit(all_cohorts, "\\|"))
unique(all_cohorts) |> length()
[1] 1150
# NIHR Cambridge BioResource and NIHR BIORESOURCE seem to be the same resource
# https://www.cambridgebioresource.group.cam.ac.uk/
gwas_study_info |>
filter(grepl("NIHR Cambridge BioResource", COHORT)) |>
select(PUBMED_ID, COHORT)
PUBMED_ID
<int>
1: 34127860
2: 34127860
COHORT
<char>
1: BC58|BDA|NIHR Cambridge BioResource|GRID|UKBS|BRI|CLEAR|EDIC|GoKinD|NYCP|NIMH|SEARCH|TrialNet|T1DGC|UAB|UC|UCSF|IDDMGEN|T1DGEN|MCW|GRID-NI|Young Hearts-NI|Steno Diabetes Center|HSG|HapMap
2: BC58|BDA|NIHR Cambridge BioResource|GRID|UKBS|BRI|CLEAR|EDIC|GoKinD|NYCP|NIMH|SEARCH|TrialNet|T1DGC|UAB|UC|UCSF|IDDMGEN|T1DGEN|MCW|GRID-NI|Young Hearts-NI|Steno Diabetes Center|HSG|HapMap
gwas_study_info |>
filter(grepl("NIHR BIORESOURCE", COHORT)) |>
select(PUBMED_ID, COHORT) |>
distinct()
PUBMED_ID
<int>
1: 39891803
2: 40205036
COHORT
<char>
1: UKBB|CHARGE|ALSPAC|NIHR BIORESOURCE
2: arcOGEN|ARGO|UKHLS|China Kadoorie Biobank|deCODE|CHB|DBDS|eMERGE|EB|FinnGen|MyCode|GS:SFHS|HRS|HKDDDPC|HUNT|Bunkyo|HerediGene|RIKEN|Shimane-CoHRE|JOCO|LifeLines|NEO|NHS|MGBB|QIMR|RS|SHIP|SIMPLER|ToMMo|TwinsUK|UKBB|BioMe|G&H|NIHR BIORESOURCE|MVP|OAI
gwas_study_info |>
filter(grepl(tolower("BIORESOURCE"), tolower(COHORT))) |>
select(PUBMED_ID, COHORT) |>
distinct()
PUBMED_ID
<int>
1: 39891803
2: 40205036
3: 34127860
COHORT
<char>
1: UKBB|CHARGE|ALSPAC|NIHR BIORESOURCE
2: arcOGEN|ARGO|UKHLS|China Kadoorie Biobank|deCODE|CHB|DBDS|eMERGE|EB|FinnGen|MyCode|GS:SFHS|HRS|HKDDDPC|HUNT|Bunkyo|HerediGene|RIKEN|Shimane-CoHRE|JOCO|LifeLines|NEO|NHS|MGBB|QIMR|RS|SHIP|SIMPLER|ToMMo|TwinsUK|UKBB|BioMe|G&H|NIHR BIORESOURCE|MVP|OAI
3: BC58|BDA|NIHR Cambridge BioResource|GRID|UKBS|BRI|CLEAR|EDIC|GoKinD|NYCP|NIMH|SEARCH|TrialNet|T1DGC|UAB|UC|UCSF|IDDMGEN|T1DGEN|MCW|GRID-NI|Young Hearts-NI|Steno Diabetes Center|HSG|HapMap
gwas_study_info = gwas_study_info |>
mutate(COHORT = str_replace_all(COHORT, "NIHR Cambridge BioResource|NIHR BIORESOURCE" , "NIHR BioResource"))
# "Leivin Biobank" appears to be a typo for "Living Biobank"
# see PUBMED ID 34059833; https://pmc.ncbi.nlm.nih.gov/articles/PMC7610958/#SD1
gwas_study_info |> filter(grepl("Leivin Biobank", COHORT))
DATE_ADDED_TO_CATALOG PUBMED_ID FIRST_AUTHOR DATE JOURNAL
<IDat> <int> <char> <IDat> <char>
1: 2021-06-10 34059833 Chen J 2021-05-31 Nat Genet
LINK
<char>
1: www.ncbi.nlm.nih.gov/pubmed/34059833
STUDY DISEASE/TRAIT
<char> <char>
1: The trans-ancestral genomic architecture of glycemic traits. Fasting glucose
INITIAL_SAMPLE_SIZE REPLICATION_SAMPLE_SIZE
<char> <char>
1: 35,619 East Asian ancestry individuals <NA>
PLATFORM_[SNPS_PASSING_QC] ASSOCIATION_COUNT
<char> <int>
1: Affymetrix, Illumina [15438438] (imputed) 15
MAPPED_TRAIT MAPPED_TRAIT_URI STUDY_ACCESSION
<char> <char> <char>
1: glucose measurement http://www.ebi.ac.uk/efo/EFO_0004468 GCST90002231
GENOTYPING_TECHNOLOGY
<char>
1: Genome-wide genotyping array, Targeted genotyping array [Genome-wide genotyping array|Metabochip]
SUBMISSION_DATE STATISTICAL_MODEL BACKGROUND_TRAIT MAPPED_BACKGROUND_TRAIT
<lgcl> <lgcl> <lgcl> <char>
1: NA NA NA
MAPPED_BACKGROUND_TRAIT_URI
<char>
1:
COHORT
<char>
1: AASC|BES|CAGE-GWAS1|CAGE|CLHNS|CHNS|KARE|Leivin Biobank|MESA|Nagahama Study|NHAPC|SCES|SiMES|SP2|TAICHI|CRC|SBCS|SMHS
FULL_SUMMARY_STATISTICS
<char>
1: yes
SUMMARY_STATS_LOCATION
<char>
1: http://ftp.ebi.ac.uk/pub/databases/gwas/summary_statistics/GCST90002001-GCST90003000/GCST90002231
GXE
<char>
1: no
gwas_study_info = gwas_study_info |>
mutate(COHORT = str_replace_all(COHORT, "Leivin Biobank", "Living Biobank"))
gwas_study_info |>
filter(grepl("Ghana", COHORT)) |>
select(PUBMED_ID,COHORT) |>
distinct()
PUBMED_ID COHORT
<int> <char>
1: 39358599 MADCaP|Ghana_Prostate|PRACTICAL
2: 36872133 AAPC|ELLIPSE|Ghana|other|eMERGE|BioVU|BioMe|MVP|ProHealth
# Looking at the papers, both refer to the same cohort:
# PUBMED_ID: 36872133 https://pmc.ncbi.nlm.nih.gov/articles/PMC10424812/#S9
# PUBMED_ID: 39358599 https://www.nature.com/articles/s41588-024-01931-3#Sec12
gwas_study_info = gwas_study_info |>
mutate(COHORT = str_replace_all(COHORT, "Ghana_Prostate", "Ghana"))
# From the supplementary information for PUBMED ID 34349265
# (https://pmc.ncbi.nlm.nih.gov/articles/instance/7611832/bin/EMS136340-supplement-Supplementary_Information.pdf),
# SARDINIA appears to be the same cohort as SardiNIA, so combine them
gwas_study_info |>
filter(grepl("SARDINIA", COHORT)) |>
select(PUBMED_ID, COHORT) |>
distinct()
PUBMED_ID
<int>
1: 34349265
COHORT
<char>
1: ALSPAC|ARIC|other|CHS|CILENTO|COLAUS|EGCUT|EPIC-Norfolk|FHS|INGI-FVG|GS:SFHS|HealthABC|HRS|INCHIANTI|InterAct|KORA|LifeLines|NEO|NHS|NTR|ORCADES|QIMR|RS|SARDINIA|SHIP|SHIP-TREND|TwinGene|TwinsUK|INGI-Val_Borbera|WGHS|WHI|BCAC|UKBB
gwas_study_info |>
filter(grepl("SardiNIA", COHORT)) |>
select(PUBMED_ID, COHORT) |>
distinct()
PUBMED_ID
<int>
1: 36477530
2: 36477530
3: 36477530
4: 36477530
5: 36477530
6: 36477530
7: 36477530
8: 36477530
9: 36477530
10: 36477530
11: 36376304
12: 36050321
13: 36050321
14: 34718232
15: 32929287
COHORT
<char>
1: 23andMe|ALSPAC|ARIC|CADD|deCODE|EGCUT|eMERGE|Harvard|HRS|HUNT|MCTFR|MESA|METSIM|NTR|SardiNIA|UKBB|NINDS|FINRISK|AMISH|GeneSTAR|GOLDN|CHS|HVH|JHS|WGHS|WHI|GFG|other
2: 23andMe|ALSPAC|ARIC|CADD|COGEND|COPDGene|deCODE|EGCUT|Harvard|HRS|HUNT|METSIM|NTR|QIMR|SardiNIA|UKBB|FINRISK|AMISH|CFS|ECLIPSE|GeneSTAR|GOLDN|WHI|other
3: 23andMe|ALSPAC|ARIC|CADD|COGEND|COPDGene|deCODE|EGCUT|Harvard|HRS|HUNT|MCTFR|MESA|METSIM|NTR|PAGE|QIMR|SardiNIA|UKBB|FINRISK|AMISH|CFS|ECLIPSE|GeneSTAR|GOLDN|CHS|HCHS|SOL|WHI|other
4: 23andMe|ALSPAC|ARIC|CADD|COGEND|COPDGene|deCODE|EGCUT|eMERGE|Harvard|HUNT|MCTFR|METSIM|NTR|SardiNIA|UKBB|NINDS|FINRISK|AMISH|CFS|ECLIPSE|GeneSTAR|GOLDN|CHS|HCHS|SOL|HVH|WGHS|WHI|other
5: 23andMe|ALSPAC|ARIC|CADD|COGEND|deCODE|EGCUT|GERA|Harvard|HRS|HUNT|MCTFR|MESA|METSIM|NTR|QIMR|SardiNIA|UKBB|FINRISK|WHI|other
6: 23andMe|ALSPAC|ARIC|BLTS|CADD|deCODE|EGCUT|eMERGE|GFG|Harvard|HRS|HUNT|MCTFR|MESA|METSIM|NTR|SardiNIA|UKBB|WHI|FINRISK|NINDS|BBJ|CKB|AMISH|CFS|CHS|GENSalt|GOLDN|HCHS|SOL|HVH|HyperGEN|JHS|GeneSTAR|GENOA|SARP|WGHS|other
7: 23andMe|ALSPAC|ARIC|BLTS|CADD|COGEND|COPDGene|deCODE|EGCUT|GFG|Harvard|HRS|HUNT|MESA|METSIM|NTR|OZALC|SardiNIA|UKBB|WHI|FINRISK|BBJ|CKB|AMISH|CFS|ECLIPSE|GENSalt|GOLDN|HyperGEN|JHS|GeneSTAR|GENOA|other
8: 23andMe|ALSPAC|ARIC|BLTS|CADD|COGEND|COPDGene|deCODE|EGCUT|GFG|Harvard|HRS|HUNT|MCTFR|MESA|METSIM|NTR|OZALC|SardiNIA|UKBB|WHI|FINRISK|PAGE|BBJ|CKB|AMISH|CFS|CHS|ECLIPSE|GENSalt|GOLDN|HCHS|SOL|HyperGEN|JHS|GeneSTAR|GENOA|other
9: 23andMe|ALSPAC|ARIC|BLTS|CADD|COGEND|COPDGene|deCODE|EGCUT|eMERGE|GFG|Harvard|HUNT|MCTFR|MESA|METSIM|NTR|SardiNIA|UKBB|WHI|FINRISK|NINDS|BBJ|CKB|AMISH|CFS|CHS|ECLIPSE|GENSalt|GOLDN|HCHS|SOL|HVH|HyperGEN|JHS|GeneSTAR|GENOA|WGHS|other
10: 23andMe|ALSPAC|ARIC|CADD|COGEND|deCODE|EGCUT|GERA|GFG|Harvard|HRS|HUNT|MCTFR|MESA|METSIM|NTR|OZALC|SardiNIA|UKBB|WHI|FINRISK|BBJ|CKB|other
11: 23andMe|ALSPAC|ARIC|BLS|CADD|COGEND|COPDGene|deCODE|EGCUT|FHS|FTC|GERA|GFG|Harvard|HRS|HUNT|MCTFR|MESA|METSIM|NESCOG|FTC|NAG-FIN|NTR|QIMR|SardiNIA|UKBB|WHI
12: ARIC|other|BioMe|BRIGHT|CHRIS|CHS|ERF|FINCAVAS|GAPP|HCHS|SOL|HealthABC|INGI-Carlantino|INGI-FVG|Inter99|JHS|KORA|LifeLines|MESA|NEO|OOA|ORCADES|PIVUS|PREVEND|PROSPER|RS|SardiNIA|SHIP|TwinsUK|UKBB|VIKING|WHI|YFS
13: ARIC|BioMe|BRIGHT|other|CHRIS|CHS|ERF|FINCAVAS|GAPP|HealthABC|INGI-Carlantino|INGI-FVG|Inter99|KORA|LifeLines|MESA|NEO|OOA|ORCADES|PIVUS|PREVEND|PROSPER|RS|SardiNIA|SHIP|TwinsUK|UKBB|VIKING|WHI|YFS
14: SardiNIA
15: SardiNIA
gwas_study_info = gwas_study_info |>
mutate(COHORT = str_replace_all(COHORT, "SARDINIA", "SardiNIA"))
gwas_study_info |>
filter(grepl("Sardinia", COHORT)) |>
select(PUBMED_ID, COHORT) |>
distinct()
PUBMED_ID
<int>
1: 33830302
COHORT
<char>
1: GRID|British 1958 birth cohort|National blood service|WTCCC - Bipolar disease cases|Oxford Regional Prospective Study of Childhood Diabetes (ORPS)|Sardinia case-control
# Not sure whether "Sardinia case-control" corresponds to SardiNIA;
# see the second supplementary table of https://pmc.ncbi.nlm.nih.gov/articles/PMC8099827/#_ad93_
For PUBMED ID 32949544, the COHORT field seems to list ancestry groups rather than cohorts (e.g. UKBB is used in this study, but is not named in the field).
See the cohort information here: https://pmc.ncbi.nlm.nih.gov/articles/instance/8220892/bin/NIHMS1709432-supplement-Supp_Materials.pdf
gwas_study_info |>
dplyr::filter(PUBMED_ID == 32949544)
DATE_ADDED_TO_CATALOG PUBMED_ID FIRST_AUTHOR DATE JOURNAL
<IDat> <int> <char> <IDat> <char>
1: 2020-10-01 32949544 Jones E 2020-09-16 Lancet Neurol
LINK
<char>
1: www.ncbi.nlm.nih.gov/pubmed/32949544
STUDY
<char>
1: Identification of novel risk loci and causal insights for sporadic Creutzfeldt-Jakob disease: a genome-wide association study.
DISEASE/TRAIT
<char>
1: Creutzfeldt-Jakob disease (sporadic)
INITIAL_SAMPLE_SIZE
<char>
1: 4,110 European ancestry cases, 13,569 European ancestry controls
REPLICATION_SAMPLE_SIZE
<char>
1: 1,098 European ancestry cases, 498 ,016 European ancestry controls
PLATFORM_[SNPS_PASSING_QC] ASSOCIATION_COUNT
<char> <int>
1: Affymetrix, Illumina [6314492] (imputed) 4
MAPPED_TRAIT MAPPED_TRAIT_URI
<char> <char>
1: sporadic Creutzfeld Jacob disease http://www.ebi.ac.uk/efo/EFO_1000656
STUDY_ACCESSION GENOTYPING_TECHNOLOGY SUBMISSION_DATE
<char> <char> <lgcl>
1: GCST90001389 Genome-wide genotyping array NA
STATISTICAL_MODEL BACKGROUND_TRAIT MAPPED_BACKGROUND_TRAIT
<lgcl> <lgcl> <char>
1: NA NA
MAPPED_BACKGROUND_TRAIT_URI
<char>
1:
COHORT
<char>
1: Dutch controls|French controls|German controls|Italian controls|Spanish controls|UK controls|US controls|UK sCJD cases|US sCJD cases|German sCJD cases|other sCJD cases
FULL_SUMMARY_STATISTICS
<char>
1: yes
SUMMARY_STATS_LOCATION
<char>
1: http://ftp.ebi.ac.uk/pub/databases/gwas/summary_statistics/GCST90001001-GCST90002000/GCST90001389
GXE
<char>
1: no
gwas_study_info =
rows_update(gwas_study_info, tibble(PUBMED_ID = 32949544, COHORT = "multiple"), unmatched = "ignore")
gwas_study_info |>
filter(grepl("Multiethnic samples from the UK", COHORT)) |>
select(PUBMED_ID, COHORT) |>
distinct()
PUBMED_ID COHORT
<int> <char>
1: 33649486 Multiethnic samples from the UK
gwas_study_info |>
filter(PUBMED_ID == 33649486)
DATE_ADDED_TO_CATALOG PUBMED_ID FIRST_AUTHOR DATE JOURNAL
<IDat> <int> <char> <IDat> <char>
1: 2021-03-22 33649486 Hardcastle AJ 2021-03-01 Commun Biol
LINK
<char>
1: www.ncbi.nlm.nih.gov/pubmed/33649486
STUDY
<char>
1: A multi-ethnic genome-wide association study implicates collagen matrix integrity and cell differentiation pathways in keratoconus.
DISEASE/TRAIT
<char>
1: Keratoconus
INITIAL_SAMPLE_SIZE
<char>
1: 2,116 European ancestry cases, 24,626 European ancestry controls
REPLICATION_SAMPLE_SIZE
<char>
1: 1, 389 European ancestry cases, 79,727 European ancestry controls, 759 South Asian ancestry cases, 8,009 South Asian ancestry controls, 405 African ancestry cases, 4,185 African ancestry controls
PLATFORM_[SNPS_PASSING_QC] ASSOCIATION_COUNT MAPPED_TRAIT
<char> <int> <char>
1: Affymetrix [7701190] (imputed) 36 keratoconus
MAPPED_TRAIT_URI STUDY_ACCESSION
<char> <char>
1: http://purl.obolibrary.org/obo/MONDO_0015486 GCST90013442
GENOTYPING_TECHNOLOGY SUBMISSION_DATE STATISTICAL_MODEL
<char> <lgcl> <lgcl>
1: Genome-wide genotyping array NA NA
BACKGROUND_TRAIT MAPPED_BACKGROUND_TRAIT MAPPED_BACKGROUND_TRAIT_URI
<lgcl> <char> <char>
1: NA
COHORT FULL_SUMMARY_STATISTICS
<char> <char>
1: Multiethnic samples from the UK yes
SUMMARY_STATS_LOCATION
<char>
1: http://ftp.ebi.ac.uk/pub/databases/gwas/summary_statistics/GCST90013001-GCST90014000/GCST90013442
GXE
<char>
1: no
# In this study, the discovery controls come from UKBB, while the cases
# were recruited from various places across the UK, so code the cohorts as UKBB|other
gwas_study_info =
rows_update(gwas_study_info, tibble(PUBMED_ID = 33649486, COHORT = "UKBB|other"),
unmatched = "ignore")
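The two `rows_update()` calls above hard-code one correction each. As more per-study fixes accumulate, it may be easier to keep them in a single table and apply them together. The sketch below assumes overrides are keyed by `PUBMED_ID` (as in the calls above, every row of a matching study gets the new `COHORT`); `apply_overrides()`, `cohort_overrides` and the toy data are hypothetical, and it uses base R only so it runs without extra packages.

```r
# Sketch: apply per-study COHORT overrides from one lookup table.
# Every row whose PUBMED_ID appears in `overrides` gets the override COHORT.
apply_overrides <- function(study_info, overrides) {
  idx <- match(study_info$PUBMED_ID, overrides$PUBMED_ID)
  hit <- !is.na(idx)
  study_info$COHORT[hit] <- overrides$COHORT[idx[hit]]
  study_info
}

# Values copied from the rows_update() calls above
cohort_overrides <- data.frame(
  PUBMED_ID = c(32949544L, 33649486L),
  COHORT = c("multiple", "UKBB|other"),
  stringsAsFactors = FALSE
)

# Toy input standing in for gwas_study_info
toy <- data.frame(
  PUBMED_ID = c(32949544L, 33649486L, 34059833L),
  COHORT = c("Dutch controls|UK controls", "Multiethnic samples from the UK", "BioMe|MESA"),
  stringsAsFactors = FALSE
)

apply_overrides(toy, cohort_overrides)$COHORT
# returns c("multiple", "UKBB|other", "BioMe|MESA")
```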
# GAINT appears to be a typo for GIANT
# see PUBMED_ID: 36376304 (https://pmc.ncbi.nlm.nih.gov/articles/PMC9663411/)
gwas_study_info |>
filter(grepl("GAINT", COHORT)) |>
select(PUBMED_ID, COHORT) |>
distinct()
PUBMED_ID COHORT
<int> <char>
1: 36376304 UKBB|GAINT
gwas_study_info = gwas_study_info |>
mutate(COHORT = str_replace_all(COHORT, "GAINT", "GIANT"))
unique(all_cohorts)[grepl("1982", unique(all_cohorts))]
[1] "1982 PELOTAS"
[2] "1982 Pelotas (Brazil) Birth Cohort Study"
gwas_study_info |>
filter(grepl("1982", COHORT)) |>
select(PUBMED_ID, COHORT) |>
distinct()
PUBMED_ID
<int>
1: 40537477
2: 39885687
3: 35399580
4: 35399580
COHORT
<char>
1: ARIC|CARDIA|CHS|GENOA|HABC|HANDLS|JHS|MESA|WHI|SP2|other|BAEPENDI|1982 PELOTAS|AGES|ERF|FHS|HyperGEN|NEO|RS|WHI-GARNET|GeneSTAR|HRS|SMHS|SWHS|CoLaus|KORA|LBC|Lifelines|NESDA|SHIP-Trend|TRAILS|YFS|SOL
2: ZOE2.0|SLS|BioVU|MyCode|VFA|SOLYouth|1982 PELOTAS|CCHC|EGG|MOBA
3: BioMe|Baependi|CANDELA|NC-BCFR|SFBCS|FIND|HCHS|SOL|Los Angeles Latino Eye Study|MEC|MESA|Mexico City 1|Mexico City 2|MHS|1982 Pelotas (Brazil) Birth Cohort Study|SAFS|STARR COUNTY|T2D SIGMA Studies|WHI
4: BioMe|Baependi|CANDELA|NC-BCFR|SFBCS|FIND|HCHS|SOL|Los Angeles Latino Eye Study|MEC|MESA|Mexico City 1|Mexico City 2|MHS|1982 Pelotas (Brazil) Birth Cohort Study|SAFS|STARR COUNTY|T2D SIGMA Studies|WHI|AAAGC|GIANT
# can confirm, 39885687 (https://pmc.ncbi.nlm.nih.gov/articles/PMC11875162/)
# 1982 PELOTAS refers to 1982 Pelotas (Brazil) Birth Cohort Study
# can confirm: 40537477 (https://pmc.ncbi.nlm.nih.gov/articles/PMC12179276/#MOESM2)
# 1982 PELOTAS refers to 1982 Pelotas (Brazil) Birth Cohort Study
# fixed(): match the parentheses literally rather than as a regex group
gwas_study_info = gwas_study_info |>
mutate(COHORT = str_replace_all(COHORT, fixed("1982 Pelotas (Brazil) Birth Cohort Study"), "1982 PELOTAS"))
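`str_replace_all()` treats its pattern as a regular expression unless it is wrapped in `fixed()`, so the parentheses in "(Brazil)" are read as a capture group and a pattern like this does not match the literal label. The same applies to "Qatar Genome Program (QGP)". A small base R demonstration, where `gsub()` likewise defaults to a regex, is below; the example string is made up.

```r
label <- "BioMe|1982 Pelotas (Brazil) Birth Cohort Study|WHI"
pattern <- "1982 Pelotas (Brazil) Birth Cohort Study"

# Regex (the default): "(Brazil)" is a group that matches "Brazil" without
# parentheses, so nothing in `label` matches and it is returned unchanged
gsub(pattern, "1982 PELOTAS", label)

# Literal match: the parentheses are matched as ordinary characters
gsub(pattern, "1982 PELOTAS", label, fixed = TRUE)
# returns "BioMe|1982 PELOTAS|WHI"
```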
gwas_study_info |>
filter(grepl("\\bPELOTAS\\b", COHORT)) |>
filter(!grepl("1982", COHORT)) |>
select(PUBMED_ID, COHORT) |>
distinct()
PUBMED_ID COHORT
<int> <char>
1: 34059833 BioMe|IRAS|MESA|PELOTAS|HCHS|SOL
# can confirm: 34059833
# PELOTAS refers to the 1982 PELOTAS study
gwas_study_info = gwas_study_info |>
mutate(COHORT = str_replace_all(COHORT, "(?<!1982 )PELOTAS", "1982 PELOTAS"))
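The negative lookbehind `(?<!1982 )` is what makes this step safe to re-run: labels that already read "1982 PELOTAS" are skipped. A base R sketch checking that applying it twice gives the same result as applying it once; base R needs `perl = TRUE` for lookbehind, while stringr's ICU engine supports it directly. `prefix_pelotas()` is a hypothetical name.

```r
# Hypothetical helper: prefix bare "PELOTAS" with "1982 " unless it is already
# preceded by "1982 " (negative lookbehind; needs perl = TRUE in base R).
prefix_pelotas <- function(x) {
  gsub("(?<!1982 )PELOTAS", "1982 PELOTAS", x, perl = TRUE)
}

once <- prefix_pelotas(c("BioMe|IRAS|PELOTAS", "1982 PELOTAS|MESA"))
twice <- prefix_pelotas(once)

once                   # c("BioMe|IRAS|1982 PELOTAS", "1982 PELOTAS|MESA")
identical(once, twice) # TRUE
```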
# I noticed that some studies list the cohort as GR@CE and others as GR@ACE;
# perhaps these are the same? Checking GR@CE first, as fewer studies list it
gwas_study_info |>
filter(grepl("GR@CE", COHORT)) |>
select(PUBMED_ID, COHORT) |>
distinct()
PUBMED_ID
<int>
1: 35379992
2: 39046104
3: 39046104
COHORT
<char>
1: EADB|GR@CE|EADI|GERAD|PERADES|DemGene|Bonn|RS|CCHS|UKBB|other
2: 3C|AGES|ARIC|ASPREE|CHS|FVG|FHS|GR@CE|Apulia|HKOS|HUNT|MEMENTO|MYHAT|ROSMAP|RS|ADGC|UKBB|other|SALSA
3: 3C|AGES|ARIC|ASPREE|CHS|FVG|FHS|GR@CE|Apulia|HKOS|HUNT|MEMENTO|MYHAT|ROSMAP|RS|ADGC|UKBB
# 35379992: GR@CE appears to be a typo for GR@ACE (https://pmc.ncbi.nlm.nih.gov/articles/PMC9005347/#Sec8)
# 39046104: GR@CE also appears to be a typo for GR@ACE
# https://pmc.ncbi.nlm.nih.gov/articles/PMC11497727/#alz14115-sec-0080
gwas_study_info = gwas_study_info |>
mutate(COHORT = str_replace_all(COHORT, "GR@CE", "GR@ACE"))
gwas_study_info |>
filter(grepl("tohoku", tolower(COHORT))) |>
select(PUBMED_ID, COHORT) |>
distinct()
PUBMED_ID COHORT
<int> <char>
1: 40226751 Tohoku Medical Megabank
2: 34782693 Tohoku Medical Megabank Project
gwas_study_info = gwas_study_info |>
mutate(COHORT = str_replace_all(COHORT, "Tohoku Medical Megabank Project", "Tohoku Medical Megabank"))
gwas_study_info |>
filter(grepl("steno", tolower(COHORT))) |>
select(PUBMED_ID, COHORT, `DISEASE/TRAIT`) |>
distinct()
PUBMED_ID
<int>
1: 34127860
2: 35627254
COHORT
<char>
1: BC58|BDA|NIHR BioResource|GRID|UKBS|BRI|CLEAR|EDIC|GoKinD|NYCP|NIMH|SEARCH|TrialNet|T1DGC|UAB|UC|UCSF|IDDMGEN|T1DGEN|MCW|GRID-NI|Young Hearts-NI|Steno Diabetes Center|HSG|HapMap
2: Steno|other
DISEASE/TRAIT
<char>
1: Type 1 diabetes
2: Neuropeptide Y autoantibody levels in type 1 diabetes
all_cohorts[grep("steno", tolower(all_cohorts))] |> unique()
[1] "Steno Diabetes Center" "Steno"
gwas_study_info = gwas_study_info |>
mutate(COHORT = str_replace_all(COHORT, "Steno Diabetes Center", "Steno"))
gwas_study_info |>
filter(grepl("nagahama", tolower(COHORT))) |>
select(PUBMED_ID, COHORT) |>
distinct()
PUBMED_ID
<int>
1: 35551307
2: 34059833
3: 34059833
4: 34887591
5: 40181193
6: 38277453
COHORT
<char>
1: AASC|BBJ|BES|CAGE|CHNS|CKB|CLHNS|DC|SP2|HKDR|KARE|other|MESA|Nagahama Study|SBCS|SWHS|SCES|SCHS|SiMES|TAICHI|TWT2D
2: AASC|BES|CAGE-GWAS1|CAGE|CLHNS|CHNS|KARE|Living Biobank|MESA|Nagahama Study|NHAPC|SCES|SiMES|SP2|TAICHI|CRC|SBCS|SMHS
3: CAGE-GWAS1|CAGE|CHNS|KARE|LivingBiobank|MESA|NagahamaStudy|NHAPC|SCES|SiMES|SP2|TAICHI|CRC|TWSC
4: BAS|BBJ|BES|CAGE|CAS|CHNS|CKB|SDCS|JPDSC|KARE|Living-biobank|MESA|Nagahama Study|NHAPC|SBCS|SCES|SCHS|SiMES|SINDI|SP2|SWHS|TUDR|TWT2D|other
5: AGES|ALSPAC|ARIC|BHS_b|CARDIA|CCHC|CFS|CHS|COLAUS|DIACORE|DRS_EXTRA|EPIC-Norfolk|EB|FHS|Fenland|GAPP|GENSALT|HANDLS|HCS|IRASFS|JHS|KOGES|LBC|LifeLines|LLFS|MESA|MVP|Nagahama_Study|NEO|NESDA|SHIP|SOL|SWAN|TwinsUK|UKBB|WHI|YFS
6: HERPACC|J-MICC|JPHC|ToMMo|Nagahama|BBJ
gwas_study_info = gwas_study_info |>
mutate(COHORT = str_replace_all(COHORT, "Nagahama_Study|NagahamaStudy", "Nagahama Study")) |>
# ? maybe check Nagahama == Nagahama Study
mutate(COHORT = str_replace_all(COHORT, "Nagahama Study", "Nagahama"))
gwas_study_info |>
filter(grepl("WTCCC - Bipolar disease cases", COHORT)) |>
select(1:5)
DATE_ADDED_TO_CATALOG PUBMED_ID FIRST_AUTHOR DATE JOURNAL
<IDat> <int> <char> <IDat> <char>
1: 2021-04-23 33830302 Inshaw JRJ 2021-04-08 Diabetologia
gwas_study_info |>
filter(PUBMED_ID == 33830302) |>
select(PUBMED_ID, COHORT)
PUBMED_ID
<int>
1: 33830302
2: 33830302
COHORT
<char>
1: GRID|British 1958 birth cohort|National blood service|WTCCC - Bipolar disease cases|Oxford Regional Prospective Study of Childhood Diabetes (ORPS)|Sardinia case-control
2:
gwas_study_info = gwas_study_info |>
mutate(COHORT = str_replace_all(COHORT, "WTCCC - Bipolar disease cases", "WTCCC"))
gwas_study_info |>
filter(grepl("QGP", COHORT)) |>
select(PUBMED_ID, COHORT) |>
distinct()
PUBMED_ID COHORT
<int> <char>
1: 33623009 Qatar Genome Program (QGP)
2: 36168886 QGP
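# Checked 36168886: QGP is the Qatar Genome Program,
# so harmonise "Qatar Genome Program (QGP)" to QGP
# fixed(): match the parentheses literally rather than as a regex group
gwas_study_info = gwas_study_info |>
mutate(COHORT = str_replace_all(COHORT, fixed("Qatar Genome Program (QGP)"), "QGP"))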
all_cohorts = gwas_study_info$COHORT
all_cohorts = unlist(strsplit(all_cohorts, "\\|"))
unique(all_cohorts) |> length()
[1] 1125
# "canSCAD" vs "CanSCAD cases and MGI controls": split the latter into its two cohorts
gwas_study_info |>
filter(grepl("CanSCAD cases and MGI controls", COHORT)) |>
select(PUBMED_ID, COHORT) |>
distinct()
PUBMED_ID COHORT
<int> <char>
1: 32887874 CanSCAD cases and MGI controls
gwas_study_info = gwas_study_info |>
mutate(COHORT = str_replace_all(COHORT, "CanSCAD cases and MGI controls", "canSCAD|MGI"))
all_cohorts = gwas_study_info$COHORT
all_cohorts = unlist(strsplit(all_cohorts, "\\|"))
unique_cohort_names = unique(all_cohorts)
# Convert to lowercase and check duplicates
dup_groups <- tapply(unique_cohort_names, tolower(unique_cohort_names), I)
# Keep only groups with >1 element (i.e., capitalization differences)
dup_groups[lengths(dup_groups) > 1]
$airwave
[1] "Airwave" "AIRWAVE"
$allofus
[1] "AllofUs" "AllOfUs"
$baependi
[1] "BAEPENDI" "Baependi"
$biome
[1] "BioMe" "BioME" "BIOME"
$biovu
[1] "BioVU" "BioVu" "BIOVU"
$cilento
[1] "CILENTO" "Cilento"
$colaus
[1] "CoLaus" "COLAUS"
$`croatia-korcula`
[1] "CROATIA-KORCULA" "CROATIA-Korcula"
$famhs
[1] "FamHS" "FAMHS"
$fenland
[1] "Fenland" "FENLAND"
$gel
[1] "GEL" "GeL"
$genestar
[1] "GeneSTAR" "GENESTAR" "GeneStar"
$gensalt
[1] "GENSalt" "GENSALT" "GenSalt"
$godarts
[1] "GoDARTS" "GODARTS"
$hypergen
[1] "HyperGEN" "HyperGen" "HYPERGEN"
$inchianti
[1] "InCHIANTI" "INCHIANTI"
$inter99
[1] "Inter99" "INTER99"
$koges
[1] "KoGES" "KOGES"
$`life-heart`
[1] "LIFE-HEART" "LIFE-Heart"
$lifelines
[1] "LifeLines" "Lifelines"
$`mayo-vdb`
[1] "MAYO-VDB" "Mayo-VDB"
$moba
[1] "MOBA" "MoBa"
$nugene
[1] "Nugene" "NUGENE"
$orcades
[1] "ORCADES" "Orcades"
$panscan
[1] "PANSCAN" "PanScan"
$raine
[1] "RAINE" "Raine"
$`ship-trend`
[1] "SHIP-TREND" "SHIP-Trend"
$sign
[1] "SiGN" "SIGN"
$viva
[1] "Viva" "VIVA"
# Normalize by removing spaces and underscores
normalized <- gsub("[ _]", "", sort(unique_cohort_names))
# Group by normalized value
dup_groups <- tapply(sort(unique_cohort_names), normalized, I)
# Keep only groups with >1 element (i.e. variants)
dup_groups[lengths(dup_groups) > 1]
$DRSEXTRA
[1] "DRS_EXTRA" "DRSEXTRA"
$GALAII
[1] "GALA II" "GALA_II"
$Health2000
[1] "Health 2000" "Health2000"
$HealthABC
[1] "Health ABC" "HealthABC"
$`INGI-ValBorbera`
[1] "INGI-Val Borbera" "INGI-Val_Borbera"
$LivingBiobank
[1] "Living Biobank" "LivingBiobank"
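The two checks above group labels by capitalisation and by spaces/underscores separately, and neither groups "Living-biobank" (listed earlier for PUBMED ID 34887591) with "Living Biobank". A sketch of a single normalisation key that ignores case, spaces, underscores and hyphens; `find_label_variants()` is a hypothetical helper and the input is a toy example.

```r
# Hypothetical helper: group labels that are identical once case, spaces,
# underscores and hyphens are ignored; return only groups with >1 spelling.
find_label_variants <- function(labels) {
  labels <- unique(labels)
  key <- tolower(gsub("[ _-]", "", labels))
  groups <- split(labels, key)
  groups[lengths(groups) > 1]
}

find_label_variants(c("Living Biobank", "LivingBiobank", "Living-biobank",
                      "BioMe", "BIOME", "UKBB"))
# groups "biome" (BioMe, BIOME) and "livingbiobank" (all three Living spellings)
```

With the real data this could be run on `unique_cohort_names`; the key is deliberately aggressive, so each reported group still needs a manual check before merging.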
gwas_study_info = gwas_study_info |>
mutate(COHORT = str_replace_all(COHORT, "AIRWAVE", "Airwave"))
gwas_study_info = gwas_study_info |>
mutate(COHORT = str_replace_all(COHORT, "AllOfUs", "AllofUs"))
gwas_study_info = gwas_study_info |>
mutate(COHORT = str_replace_all(COHORT, toupper("Baependi"), "Baependi"))
gwas_study_info = gwas_study_info |>
mutate(COHORT = str_replace_all(COHORT, "BIOME|BioME", "BioMe"))
gwas_study_info = gwas_study_info |>
mutate(COHORT = str_replace_all(COHORT, "BIOVU|BioVu", "BioVU"))
gwas_study_info = gwas_study_info |>
mutate(COHORT = str_replace_all(COHORT, toupper("Cilento"), "Cilento"))
gwas_study_info = gwas_study_info |>
mutate(COHORT = str_replace_all(COHORT, "COLAUS", "CoLaus"))
gwas_study_info = gwas_study_info |>
mutate(COHORT = str_replace_all(COHORT, "CROATIA-KORCULA", "CROATIA-Korcula"))
gwas_study_info = gwas_study_info |>
mutate(COHORT = str_replace_all(COHORT, "DRSEXTRA", "DRS_EXTRA"))
gwas_study_info = gwas_study_info |>
mutate(COHORT = str_replace_all(COHORT, toupper("FamHS"), "FamHS"))
gwas_study_info = gwas_study_info |>
mutate(COHORT = str_replace_all(COHORT, "FENLAND", "Fenland"))
gwas_study_info = gwas_study_info |>
mutate(COHORT = str_replace_all(COHORT, "GALA_II", "GALA II"))
gwas_study_info |>
filter(grepl("GeL", COHORT)) |>
select(PUBMED_ID, COHORT) |>
distinct()
PUBMED_ID COHORT
<int> <char>
1: 36124557 GeL
# Only one study uses GeL (36124557)- from
# https://pmc.ncbi.nlm.nih.gov/articles/PMC9512401/#s4
# Appears to be typo, for Genomics England (GEL)
gwas_study_info = gwas_study_info |>
mutate(COHORT = str_replace_all(COHORT, "GeL", "GEL"))
gwas_study_info = gwas_study_info |>
mutate(COHORT = str_replace_all(COHORT, "GENESTAR|GeneStar", "GeneSTAR"))
gwas_study_info = gwas_study_info |>
mutate(COHORT = str_replace_all(COHORT, "GENSALT|GenSalt", "GENSalt"))
gwas_study_info = gwas_study_info |>
mutate(COHORT = str_replace_all(COHORT, toupper("godarts"), "GoDARTS"))
gwas_study_info = gwas_study_info |>
mutate(COHORT = str_replace_all(COHORT, toupper("InCHIANTI"), "InCHIANTI"))
gwas_study_info = gwas_study_info |>
mutate(COHORT = str_replace_all(COHORT, toupper("Inter99"), "Inter99"))
gwas_study_info = gwas_study_info |>
mutate(COHORT = str_replace_all(COHORT, "Health ABC", "HealthABC"))
gwas_study_info = gwas_study_info |>
mutate(COHORT = str_replace_all(COHORT, "Health 2000", "Health2000"))
gwas_study_info = gwas_study_info |>
mutate(COHORT = str_replace_all(COHORT, "HyperGEN|HYPERGEN", "HyperGen"))
gwas_study_info = gwas_study_info |>
mutate(COHORT = str_replace_all(COHORT, "INGI-Val_Borbera", "INGI-Val Borbera"))
gwas_study_info = gwas_study_info |>
mutate(COHORT = str_replace_all(COHORT, toupper("KoGES"), "KoGES"))
gwas_study_info = gwas_study_info |>
mutate(COHORT = str_replace_all(COHORT, "LIFE-HEART", "LIFE-Heart"))
gwas_study_info = gwas_study_info |>
mutate(COHORT = str_replace_all(COHORT, "Lifelines", "LifeLines"))
# ? LifeLines Deep
gwas_study_info = gwas_study_info |>
mutate(COHORT = str_replace_all(COHORT, "Living-biobank|LivingBiobank", "Living Biobank"))
gwas_study_info = gwas_study_info |>
mutate(COHORT = str_replace_all(COHORT, toupper("Mayo-VDB"), "Mayo-VDB"))
gwas_study_info = gwas_study_info |>
mutate(COHORT = str_replace_all(COHORT, toupper("MoBa"), "MoBa"))
gwas_study_info = gwas_study_info |>
mutate(COHORT = str_replace_all(COHORT, "Nugene", "NUGENE"))
gwas_study_info = gwas_study_info |>
mutate(COHORT = str_replace_all(COHORT, toupper("Orcades"), "Orcades"))
gwas_study_info = gwas_study_info |>
mutate(COHORT = str_replace_all(COHORT, toupper("PanScan"), "PanScan"))
gwas_study_info = gwas_study_info |>
mutate(COHORT = str_replace_all(COHORT, toupper("Raine"), "Raine"))
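The replacements above run as substring regexes over the whole pipe-delimited `COHORT` string. An alternative, sketched here under the assumption that every variant spelling is a complete cohort token, is to split on `|`, map only exact token matches through a named lookup vector, and paste the tokens back together. This cannot accidentally rewrite a longer name that merely contains a pattern, at the cost of listing every variant explicitly. The helper `normalize_cohorts()` and the small `cohort_lookup` subset are hypothetical and use base R only.

```r
# Hypothetical helper: map whole cohort tokens through a lookup table.
# Assumes COHORT values are "|"-delimited and each variant is a full token.
normalize_cohorts <- function(cohort, lookup) {
  vapply(cohort, function(s) {
    if (is.na(s)) return(NA_character_)
    tokens <- strsplit(s, "|", fixed = TRUE)[[1]]
    hit <- tokens %in% names(lookup)      # tokens that have a canonical form
    tokens[hit] <- lookup[tokens[hit]]    # replace exact matches only
    paste(tokens, collapse = "|")
  }, character(1), USE.NAMES = FALSE)
}

# Illustrative subset of the variants handled above
cohort_lookup <- c(
  "GeL"        = "GEL",
  "GENESTAR"   = "GeneSTAR",
  "GeneStar"   = "GeneSTAR",
  "Health ABC" = "HealthABC",
  "Lifelines"  = "LifeLines",
  "SIGN"       = "SiGN"
)

normalize_cohorts(c("GeL|UKBB", "SIGN|SIGNET-REGARDS", NA), cohort_lookup)
# returns "GEL|UKBB" "SiGN|SIGNET-REGARDS" NA
```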
all_cohorts[grep("rosmap", tolower(all_cohorts))] |> unique()
[1] "ROSMAP" "ROSMAP 1" "ROSMAP 2"
gwas_study_info |>
filter(grepl("ROSMAP 1|ROSMAP 2", COHORT)) |>
select(PUBMED_ID, COHORT) |>
distinct()
PUBMED_ID
<int>
1: 33510174
COHORT
<char>
1: ARIC|BASE-II|BPROOF|CHS|EPIC-Norfolk|FHS|HRS|InCHIANTI|LASA I|LASA II|Long Life Family Study|MrOS Gothenburg|MrOS Malmo|ROSMAP 1|ROSMAP 2|RS|RSI|RSII|SHIP|TSHA|UKBB|WLS|
gwas_study_info = gwas_study_info |>
mutate(COHORT = str_replace_all(COHORT, "ROSMAP 1|ROSMAP 2", "ROSMAP"))
gwas_study_info = gwas_study_info |>
mutate(COHORT = str_replace_all(COHORT, "SHIP-Trend", "SHIP-TREND"))
# ? "SHIPNATREND" - comes from one study
gwas_study_info |>
filter(grepl("SHIPNATREND", COHORT)) |>
select(PUBMED_ID, COHORT) |>
distinct()
PUBMED_ID
<int>
1: 32888493
2: 32888493
3: 32888493
4: 32888493
5: 32888493
6: 32888493
7: 32888493
8: 32888493
9: 32888493
10: 32888493
11: 32888493
12: 32888493
13: 32888493
14: 32888493
15: 32888493
16: 32888493
17: 32888493
18: 32888493
19: 32888493
20: 32888493
21: 32888493
22: 32888493
PUBMED_ID
COHORT
<char>
1: Airwave|BioMe|CaPS|Estonia|Estonia|FHS|FINCAVAS|GERA|GERA|GERA|INTERVAL|MESA|MHIphase1|MHIphase2|RS|RS|RSI|SHIPNATREND|UKBB|WHI
2: Airwave|BBJ|BioMe|CaPS|Estonia|Estonia|FHS|FINCAVAS|GERA|GERA|GERA|HANDLS|INTERVAL|JHS|MESA|MHIphase1|MHIphase2|RS|RS|RSI|SHIPNATREND|UKBB|WHI
3: Airwave|BioMe|CaPS|Estonia|Estonia|FHS|FINCAVAS|INTERVAL|MESA|MHIphase1|MHIphase2|RS|RS|RSI|SHIP|SHIPNATREND|UKBB|WHI|YFS
4: Airwave|BBJ|BioMe|CaPS|Estonia|Estonia|FHS|FINCAVAS|HANDLS|INTERVAL|JHS|MESA|MHIphase1|MHIphase2|RS|RS|RSI|SHIP|SHIPNATREND|UKBB|WHI|YFS
5: Airwave|BioMe|CaPS|CHS|Estonia|Estonia|FHS|FINCAVAS|INTERVAL|MESA|MHIphase1|MHIphase2|RS|RS|RSI|SHIP|SHIPNATREND|UKBB|WHI|YFS
6: Airwave|BBJ|BioMe|CaPS|CHS|CHS|Estonia|Estonia|FHS|FINCAVAS|HANDLS|INTERVAL|JHS|MESA|MHIphase1|MHIphase2|RS|RS|RSI|SHIP|SHIPNATREND|UKBB|WHI|YFS
7: Airwave|BioMe|CaPS|Estonia|Estonia|FHS|FINCAVAS|GERA|GERA|GERA|INTERVAL|MESA|MHIphase1|MHIphase2|RS|RS|RSI|SHIP|SHIPNATREND|UKBB|WHI|YFS
8: Airwave|BBJ|BioMe|CaPS|Estonia|Estonia|FHS|FINCAVAS|GERA|GERA|GERA|HANDLS|INTERVAL|JHS|MESA|MHIphase1|MHIphase2|RS|RS|RSI|SHIP|SHIPNATREND|UKBB|WHI|YFS
9: Airwave|BioMe|CaPS|Estonia|Estonia|FHS|FINCAVAS|INTERVAL|MESA|MHIphase1|MHIphase2|SHIPNATREND|UKBB|WHI
10: Airwave|BBJ|BioMe|CaPS|Estonia|Estonia|FHS|FINCAVAS|HANDLS|INTERVAL|JHS|MESA|MHIphase1|MHIphase2|SHIPNATREND|UKBB|WHI
11: Airwave|BioMe|CaPS|CHS|Estonia|Estonia|FHS|FINCAVAS|GERA|GERA|GERA|INTERVAL|MESA|MHIphase1|MHIphase2|RS|RS|RSI|SHIP|SHIPNATREND|UKBB|WHI|YFS
12: Airwave|BBJ|BioMe|CaPS|CHS|Estonia|Estonia|FHS|FINCAVAS|GERA|GERA|GERA|HANDLS|INTERVAL|JHS|MESA|MHIphase1|MHIphase2|RS|RS|RSI|SHIP|SHIPNATREND|UKBB|WHI|YFS
13: Airwave|BioMe|CaPS|CHS|Estonia|Estonia|FHS|FINCAVAS|GERA|GERA|GERA|Health2006|Health2008|Health2010|INTERVAL|MESA|MHIphase1|MHIphase2|RS|RS|RSI|SHIP|SHIPNATREND|UKBB|WHI|YFS
14: Airwave|BBJ|BioMe|CaPS|CHNS|CHS|Estonia|Estonia|FHS|FINCAVAS|GERA|GERA|GERA|HANDLS|Health2006|Health2008|Health2010|INTERVAL|JHS|MESA|MHIphase1|MHIphase2|RS|RS|RSI|SHIP|SHIPNATREND|UKBB|WHI|YFS
15: Airwave|BioMe|CaPS|Estonia|FHS|INTERVAL|MHIphase1|MHIphase2|RS|RS|RSI|SHIP|SHIPNATREND|UKBB|WHI
16: Airwave|BioMe|BioMe|BioMe|CaPS|Estonia|FHS|HANDLS|INTERVAL|JHS|MHIphase1|MHIphase2|RS|RS|RSI|SHIP|SHIPNATREND|UKBB|UKBB|UKBB|UKBB|WHI
17: Airwave|BioMe|CaPS|Estonia|Estonia|FHS|FINCAVAS|GERA|GERA|GERA|INTERVAL|MESA|MHIphase1|MHIphase2|SHIPNATREND|UKBB|WHI
18: Airwave|BBJ|BioMe|BioMe|BioMe|CaPS|Estonia|Estonia|FHS|FINCAVAS|GERA|GERA|GERA|GERA|GERA|HANDLS|INTERVAL|JHS|MESA|MESA|MESA|MHIphase1|MHIphase2|SHIPNATREND|UKBB|UKBB|UKBB|UKBB|WHI
19: Airwave|BBJ|BioMe|BioMe|BioMe|CaPS|CHNS|CHS|CHS|Estonia|Estonia|FHS|FINCAVAS|GERA|GERA|GERA|GERA|GERA|HANDLS|INTERVAL|JHS|MESA|MESA|MESA|MHIphase1|MHIphase2|RS|RS|RSI|SHIP|SHIPNATREND|UKBB|UKBB|UKBB|UKBB|WHI|YFS
20: Airwave|BBJ|BioMe|BioMe|BioMe|CaPS|CHNS|Estonia|Estonia|FHS|FINCAVAS|GERA|GERA|GERA|GERA|GERA|HANDLS|INTERVAL|JHS|MESA|MESA|MESA|MHIphase1|MHIphase2|RS|RS|RSI|SHIP|SHIPNATREND|UKBB|UKBB|UKBB|UKBB|WHI|YFS
21: Airwave|BioMe|CaPS|FHS|GERA|GERA|GERA|INTERVAL|MHIphase1|MHIphase2|RS|RS|RSI|SHIP|SHIPNATREND|UKBB|WHI
22: Airwave|BioMe|BioMe|BioMe|CaPS|FHS|GERA|GERA|GERA|GERA|GERA|HANDLS|INTERVAL|JHS|MHIphase1|MHIphase2|RS|RS|RSI|SHIP|SHIPNATREND|UKBB|UKBB|UKBB|UKBB|WHI
COHORT
# From the supplementary table, SHIPNATREND appears to be SHIP-TREND:
# https://pmc.ncbi.nlm.nih.gov/articles/PMC7480402/#SD1
gwas_study_info = gwas_study_info |>
mutate(COHORT = str_replace_all(COHORT, "SHIPNATREND", "SHIP-TREND"))
# Word boundaries stop "SIGN" from also matching inside longer names such as "SIGNET"
gwas_study_info = gwas_study_info |>
mutate(COHORT = str_replace_all(COHORT, "\\bSIGN\\b", "SiGN"))
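Short patterns such as "SIGN" match inside longer names, so a plain substring replacement can silently rewrite an unrelated cohort. Wrapping the pattern in `\\b` word boundaries restricts the match to whole words. A minimal base-R illustration, using "SIGNET-REGARDS" (a name noted elsewhere on this page) as the example input:

```r
x <- c("SIGN", "SIGN|UKBB", "SIGNET-REGARDS")

# Plain substring pattern: the SIGNET-based name is rewritten too
gsub("SIGN", "SiGN", x)
# returns "SiGN" "SiGN|UKBB" "SiGNET-REGARDS"

# Word boundaries (\\b): only a whole-word "SIGN" is replaced
gsub("\\bSIGN\\b", "SiGN", x, perl = TRUE)
# returns "SiGN" "SiGN|UKBB" "SIGNET-REGARDS"
```

The same `\\b` syntax works in stringr patterns. Note that `|`, `-` and spaces count as word boundaries, but `_` does not, so a token like "SIGN_X" would not be matched.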
gwas_study_info = gwas_study_info |>
mutate(COHORT = str_replace_all(COHORT, "Viva", "VIVA"))
gwas_study_info |>
filter(grepl("Rotterdam", COHORT))
DATE_ADDED_TO_CATALOG PUBMED_ID FIRST_AUTHOR DATE JOURNAL
<IDat> <int> <char> <IDat> <char>
1: 2025-03-11 40050429 Roselli C 2025-03-06 Nat Genet
LINK
<char>
1: www.ncbi.nlm.nih.gov/pubmed/40050429
STUDY
<char>
1: Meta-analysis of genome-wide associations and polygenic risk prediction for atrial fibrillation in more than 180,000 cases.
DISEASE/TRAIT
<char>
1: Atrial fibrillation
INITIAL_SAMPLE_SIZE
<char>
1: 1,782 Admix African and African American cases, 9,356 Admix African and African American controls, 11,350 East Asian ancestry cases, 137,515 East Asian ancestry controls, 166,322 European ancestry cases, 1,313,950 European ancestry controls, 1,774 Hispanic or Latin American cases, 7,665 Hispanic or Latin American controls, 218 South Asian ancestry cases, 413 South Asian ancestry controls
REPLICATION_SAMPLE_SIZE PLATFORM_[SNPS_PASSING_QC]
<char> <char>
1: <NA> Affymetrix, Illumina [29789980] (imputed)
ASSOCIATION_COUNT MAPPED_TRAIT MAPPED_TRAIT_URI
<int> <char> <char>
1: 355 atrial fibrillation http://www.ebi.ac.uk/efo/EFO_0000275
STUDY_ACCESSION GENOTYPING_TECHNOLOGY SUBMISSION_DATE
<char> <char> <lgcl>
1: GCST90559230 Genome-wide genotyping array NA
STATISTICAL_MODEL BACKGROUND_TRAIT MAPPED_BACKGROUND_TRAIT
<lgcl> <lgcl> <char>
1: NA NA
MAPPED_BACKGROUND_TRAIT_URI
<char>
1:
COHORT
<char>
1: AGES|ARIC|BioMe|Broad CVDi|BBJ|CHS|MESA|SiGN|ENGAGE_AF-TIMI_48|SPHFC|CCAF|CHB|MyCode|EGCUT|FHS|GAPP|GS:SFHS|HRS|LURIC|HUNT|MGI|PHB|PIVUS|PREVEND|PROSPER|Rotterdam|SHIP|SiGN|TwinGene|ULSAM|Vanderbilt|WGHS|WTCCC|FinnGen|UKBB|other
FULL_SUMMARY_STATISTICS SUMMARY_STATS_LOCATION GXE
<char> <char> <char>
1: no <NA> no
# Rotterdam study is typically listed as "RS"
# see e.g. 36568030 https://pmc.ncbi.nlm.nih.gov/articles/PMC9772568/
gwas_study_info |>
filter(grepl("\\bRS\\b", COHORT))
DATE_ADDED_TO_CATALOG PUBMED_ID FIRST_AUTHOR DATE
<IDat> <int> <char> <IDat>
1: 2023-03-21 36662418 Faber BG 2023-01-20
2: 2023-05-12 36918541 Young WJ 2023-03-14
3: 2023-05-12 36918541 Young WJ 2023-03-14
4: 2023-05-12 36918541 Young WJ 2023-03-14
5: 2023-05-12 36918541 Young WJ 2023-03-14
---
332: 2023-01-31 36568030 Young KL 2022-11-25
333: 2023-01-31 36568030 Young KL 2022-11-25
334: 2023-01-31 36568030 Young KL 2022-11-25
335: 2023-01-31 36568030 Young KL 2022-11-25
336: 2023-01-31 36568030 Young KL 2022-11-25
JOURNAL LINK
<char> <char>
1: Arthritis Rheumatol www.ncbi.nlm.nih.gov/pubmed/36662418
2: Nat Commun www.ncbi.nlm.nih.gov/pubmed/36918541
3: Nat Commun www.ncbi.nlm.nih.gov/pubmed/36918541
4: Nat Commun www.ncbi.nlm.nih.gov/pubmed/36918541
5: Nat Commun www.ncbi.nlm.nih.gov/pubmed/36918541
---
332: HGG Adv www.ncbi.nlm.nih.gov/pubmed/36568030
333: HGG Adv www.ncbi.nlm.nih.gov/pubmed/36568030
334: HGG Adv www.ncbi.nlm.nih.gov/pubmed/36568030
335: HGG Adv www.ncbi.nlm.nih.gov/pubmed/36568030
336: HGG Adv www.ncbi.nlm.nih.gov/pubmed/36568030
STUDY
<char>
1: A GWAS meta-analysis of alpha angle suggests cam-type morphology may be a specific feature of hip osteoarthritis in older adults.
2: Genetic architecture of spatial electrical biomarkers for cardiac arrhythmia and relationship with cardiovascular disease.
3: Genetic architecture of spatial electrical biomarkers for cardiac arrhythmia and relationship with cardiovascular disease.
4: Genetic architecture of spatial electrical biomarkers for cardiac arrhythmia and relationship with cardiovascular disease.
5: Genetic architecture of spatial electrical biomarkers for cardiac arrhythmia and relationship with cardiovascular disease.
---
332: Whole-exome sequence analysis of anthropometric traits illustrates challenges in identifying effects of rare genetic variants.
333: Whole-exome sequence analysis of anthropometric traits illustrates challenges in identifying effects of rare genetic variants.
334: Whole-exome sequence analysis of anthropometric traits illustrates challenges in identifying effects of rare genetic variants.
335: Whole-exome sequence analysis of anthropometric traits illustrates challenges in identifying effects of rare genetic variants.
336: Whole-exome sequence analysis of anthropometric traits illustrates challenges in identifying effects of rare genetic variants.
DISEASE/TRAIT
<char>
1: Alpha angle
2: Frontal QRS-T angle
3: Spatial QRS-T angle
4: Spatial QRS-T angle
5: Frontal QRS-T angle
---
332: Waist-hip ratio
333: Waist-hip ratio
334: Waist-hip ratio
335: Waist-hip ratio
336: Waist-hip ratio
INITIAL_SAMPLE_SIZE
<char>
1: 44,214 European ancestry individuals
2: 159,715 European ancestry, African ancestry, Hispanic or Latin American individuals
3: 96,562 European ancestry individuals
4: 118,780 European ancestry, African ancestry, Hispanic or Latin American individuals
5: 134,567 European ancestry individuals
---
332: 15,503 European ancestry individuals
333: 8,678 European ancestry women
334: 6,825 European ancestry men
335: 2,987 African ancestry women, 8,678 European ancestry women
336: 1,307 African ancestry men, 6,825 European ancestry men
REPLICATION_SAMPLE_SIZE
<char>
1: <NA>
2: <NA>
3: <NA>
4: <NA>
5: <NA>
---
332: 1,229 European ancestry individuals
333: 771 European ancestry women
334: 758 European ancestry men
335: 771 European ancestry women, 2,308 African American women
336: 758 European ancestry men, 1,239 African American men
PLATFORM_[SNPS_PASSING_QC] ASSOCIATION_COUNT
<char> <int>
1: Affymetrix, Illumina [9134976] (imputed) 8
2: Affymetrix, Illumina [8299259] (imputed) 11
3: Affymetrix, Illumina [8603009] (imputed) 51
4: Affymetrix, Illumina [9052360] (imputed) 61
5: Affymetrix, Illumina [7954211] (imputed) 9
---
332: NR [67633] 0
333: NR [67633] 0
334: NR [67633] 0
335: NR [67633] 0
336: NR [67633] 0
MAPPED_TRAIT MAPPED_TRAIT_URI
<char> <char>
1: alpha angle measurement http://www.ebi.ac.uk/efo/EFO_0020071
2: QRS-T angle http://www.ebi.ac.uk/efo/EFO_0020097
3: QRS-T angle http://www.ebi.ac.uk/efo/EFO_0020097
4: QRS-T angle http://www.ebi.ac.uk/efo/EFO_0020097
5: QRS-T angle http://www.ebi.ac.uk/efo/EFO_0020097
---
332: waist-hip ratio http://www.ebi.ac.uk/efo/EFO_0004343
333: waist-hip ratio http://www.ebi.ac.uk/efo/EFO_0004343
334: waist-hip ratio http://www.ebi.ac.uk/efo/EFO_0004343
335: waist-hip ratio http://www.ebi.ac.uk/efo/EFO_0004343
336: waist-hip ratio http://www.ebi.ac.uk/efo/EFO_0004343
STUDY_ACCESSION GENOTYPING_TECHNOLOGY SUBMISSION_DATE
<char> <char> <lgcl>
1: GCST90129635 Genome-wide genotyping array NA
2: GCST90246319 Genome-wide genotyping array NA
3: GCST90246320 Genome-wide genotyping array NA
4: GCST90246318 Genome-wide genotyping array NA
5: GCST90246321 Genome-wide genotyping array NA
---
332: GCST90245813 Exome-wide sequencing NA
333: GCST90245814 Exome-wide sequencing NA
334: GCST90245815 Exome-wide sequencing NA
335: GCST90245816 Exome-wide sequencing NA
336: GCST90245817 Exome-wide sequencing NA
STATISTICAL_MODEL BACKGROUND_TRAIT MAPPED_BACKGROUND_TRAIT
<lgcl> <lgcl> <char>
1: NA NA
2: NA NA
3: NA NA
4: NA NA
5: NA NA
---
332: NA NA
333: NA NA
334: NA NA
335: NA NA
336: NA NA
MAPPED_BACKGROUND_TRAIT_URI
<char>
1:
2:
3:
4:
5:
---
332:
333:
334:
335:
336:
COHORT
<char>
1: UKBB|RS
2: ARIC|other|BRIGHT|CHRIS|CHS|ERF|GS:SFHS|HCHS|SOL|Inter99|JHS|LifeLines|MESA|NEO|Orcades|PREVEND|PROSPER|RS|UKBB|VIKING|WHI
3: ARIC|other|BRIGHT|CHRIS|CHS|ERF|GS:SFHS|HCHS|SOL|Inter99|JHS|LifeLines|MESA|NEO|Orcades|PREVEND|PROSPER|RS|UKBB|VIKING|WHI
4: ARIC|other|BRIGHT|CHRIS|CHS|ERF|GS:SFHS|HCHS|SOL|Inter99|JHS|LifeLines|MESA|NEO|Orcades|PREVEND|PROSPER|RS|UKBB|VIKING|WHI
5: ARIC|other|BRIGHT|CHRIS|CHS|ERF|GS:SFHS|HCHS|SOL|Inter99|JHS|LifeLines|MESA|NEO|Orcades|PREVEND|PROSPER|RS|UKBB|VIKING|WHI
---
332: ARIC|CHS|ERF|FHS|GOLDN|RS|other
333: ARIC|CHS|ERF|FHS|GOLDN|RS|other
334: ARIC|CHS|ERF|FHS|GOLDN|RS|other
335: ARIC|CHS|ERF|FHS|GOLDN|RS|other
336: ARIC|CHS|ERF|FHS|GOLDN|RS|other
FULL_SUMMARY_STATISTICS
<char>
1: yes
2: yes
3: yes
4: yes
5: yes
---
332: no
333: no
334: no
335: no
336: no
SUMMARY_STATS_LOCATION
<char>
1: http://ftp.ebi.ac.uk/pub/databases/gwas/summary_statistics/GCST90129001-GCST90130000/GCST90129635
2: http://ftp.ebi.ac.uk/pub/databases/gwas/summary_statistics/GCST90246001-GCST90247000/GCST90246319
3: http://ftp.ebi.ac.uk/pub/databases/gwas/summary_statistics/GCST90246001-GCST90247000/GCST90246320
4: http://ftp.ebi.ac.uk/pub/databases/gwas/summary_statistics/GCST90246001-GCST90247000/GCST90246318
5: http://ftp.ebi.ac.uk/pub/databases/gwas/summary_statistics/GCST90246001-GCST90247000/GCST90246321
---
332: <NA>
333: <NA>
334: <NA>
335: <NA>
336: <NA>
GXE
<char>
1: no
2: no
3: no
4: no
5: no
---
332: no
333: no
334: no
335: no
336: no
gwas_study_info = gwas_study_info |>
mutate(COHORT = str_replace_all(COHORT, "Rotterdam", "RS"))
# CKB is the acronym for the China Kadoorie Biobank (see PMID 36777997: https://pmc.ncbi.nlm.nih.gov/articles/PMC9903787/#tbl1)
gwas_study_info |>
filter(grepl("\\bCKB\\b", COHORT)) |>
select(PUBMED_ID, COHORT) |>
distinct() |>
tail()
PUBMED_ID COHORT
<int> <char>
1: 34586374 CKB|other
2: 34586374 CKB|other|UKBB
3: 34586374 CKB|WHI|other
4: 33766948 CKB
5: 36777997 BBJ|BioMe|BioVU|CCPM|CKB|EB|FinnGen|G&H|HUNT|MGBB|MGI|UCLA|UKBB
6: 36777997 BBJ|BioMe|BioVU|CCPM|CKB|EB|FinnGen|G&H|HUNT|MGBB|MGI|UCLA|UKBB|NR
gwas_study_info = gwas_study_info |>
mutate(COHORT = str_replace_all(COHORT, "China Kadoorie Biobank", "CKB"))
all_cohorts = gwas_study_info$COHORT
all_cohorts = unlist(strsplit(all_cohorts, "\\|"))
unique(all_cohorts) |> length()
[1] 1078
unique_cohort_names = unique(all_cohorts)
# Convert to lowercase and check duplicates
dup_groups <- tapply(unique_cohort_names, tolower(unique_cohort_names), I)
# Keep only groups with >1 element (i.e., capitalization differences)
dup_groups[lengths(dup_groups) > 1]
named character(0)
normalized <- gsub("[ _]", "", sort(unique_cohort_names))
# Group by normalized value
dup_groups <- tapply(sort(unique_cohort_names), normalized, I)
# Keep only groups with >1 element (i.e. variants)
dup_groups[lengths(dup_groups) > 1]
named character(0)
# Group by normalized value, also ignoring case
dup_groups <- tapply(sort(unique_cohort_names), tolower(normalized), I)
# Keep only groups with >1 element (i.e. variants)
dup_groups[lengths(dup_groups) > 1]
named character(0)
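The three checks above differ only in what they normalize away. A small helper can parameterize this. The sketch below is hypothetical: `find_name_variants()` is not part of the original analysis, and its default `strip` pattern also removes hyphens, which the original checks did not. It uses `split()` in place of `tapply(..., I)`.

```r
# Hypothetical helper: group names that collide after normalization.
# `strip` is a regex of characters to ignore (assumed default: space, underscore, hyphen).
find_name_variants <- function(x, strip = "[ _-]", ignore_case = TRUE) {
  x <- sort(unique(x))
  key <- gsub(strip, "", x)
  if (ignore_case) key <- tolower(key)
  groups <- split(x, key)          # one character vector per normalized key
  groups[lengths(groups) > 1]      # keep only keys shared by 2+ spellings
}

find_name_variants(c("HealthABC", "Health ABC", "Health_ABC", "UKBB", "ukbb", "RS"))
```

On the real data this would be called as `find_name_variants(unique_cohort_names)`; setting `ignore_case = FALSE` reproduces the case-sensitive check.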
single_use_cohorts =
data.frame(cohort = all_cohorts) |>
group_by(cohort) |>
summarise(n_studies = n()) |>
filter(n_studies == 1) |>
pull(cohort)
length(single_use_cohorts)
[1] 208
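Note that `n_studies` counts GWAS Catalog rows (one per study accession), so a cohort used by a single publication with several accessions is not flagged as single-use. A minimal sketch of the alternative, counting distinct `PUBMED_ID`s per cohort token, follows; `cohort_pub_counts()` is a hypothetical helper and the toy inputs are illustrative. On the real data it would be called as `cohort_pub_counts(gwas_study_info$PUBMED_ID, gwas_study_info$COHORT)`.

```r
# Hypothetical helper: number of distinct publications (PUBMED_ID) per cohort token
cohort_pub_counts <- function(pubmed_id, cohort) {
  tokens <- strsplit(cohort, "|", fixed = TRUE)
  # One row per (publication, cohort) pair, duplicates removed
  long <- unique(data.frame(
    PUBMED_ID = rep(pubmed_id, lengths(tokens)),
    cohort = unlist(tokens),
    stringsAsFactors = FALSE
  ))
  tab <- table(long$cohort)
  setNames(as.integer(tab), names(tab))
}

# Toy input: publication 1 lists UKBB in two accessions
counts <- cohort_pub_counts(c(1L, 1L, 2L), c("UKBB|RS", "UKBB", "UKBB|GERA"))
counts
names(counts)[counts == 1]   # cohorts used by a single publication
```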
library(stringdist)
library(dplyr)
# Create a vector of unique cohort names
cohorts <- unique(all_cohorts)
# Compute pairwise string distances (Levenshtein distance)
dist_matrix <- stringdistmatrix(single_use_cohorts, cohorts, method = "lv")
# Identify pairs with small distance (e.g., <=2 edits)
threshold <- 2
matches <- which(dist_matrix > 0 & dist_matrix <= threshold, arr.ind = TRUE)
matches <- data.frame(
cohort1 = single_use_cohorts[matches[,1]],
cohort2 = cohorts[matches[,2]],
distance = dist_matrix[matches]
)
matches <- matches[matches$cohort1 != matches$cohort2, ]
matches <- unique(matches)
matches |>
arrange(distance) |>
head()
cohort1 cohort2 distance
1 CHIP SHIP 1
2 SpBCS SEBCS 1
3 NZ NR 1
4 DCHS DACHS 1
5 HIS HAS 1
6 MACS MCCS 1
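The same Levenshtein screen can be run without the stringdist dependency using base R's `utils::adist()`, which computes generalized Levenshtein distances with unit costs by default. The sketch below also adds a relative distance (edits divided by the length of the longer name). This is an addition not in the original analysis, meant to rank very short pairs such as "NZ" vs "NR" (one edit, but half the string) below longer near-matches. The toy vectors stand in for `single_use_cohorts` and `cohorts`.

```r
# Toy stand-ins for single_use_cohorts and cohorts
single <- c("CHIP", "NZ", "DCHS")
all_names <- c("SHIP", "NR", "DACHS", "UKBB")

# Levenshtein distances (insert/delete/substitute each cost 1)
d <- adist(single, all_names)

# Relative distance: edits / length of the longer name in each pair
max_len <- outer(nchar(single), nchar(all_names), pmax)
rel <- d / max_len

# Keep pairs within 1-2 edits, as in the threshold above
idx <- which(d > 0 & d <= 2, arr.ind = TRUE)
candidates <- data.frame(
  cohort1 = single[idx[, 1]],
  cohort2 = all_names[idx[, 2]],
  distance = d[idx],
  rel_distance = round(rel[idx], 2)
)
candidates[order(candidates$rel_distance), ]
```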
data.table::fwrite(gwas_study_info,
here::here("output/gwas_study_info_cohort_corrected.csv"),
sep = ",")
# In the study below, the unlisted cohort is a combination of two cohorts
gwas_study_info |>
filter(PUBMED_ID == 32605384) |>
select(PUBMED_ID, COHORT, STUDY_ACCESSION, "DISEASE/TRAIT", "INITIAL_SAMPLE_SIZE", "REPLICATION_SAMPLE_SIZE")
gwas_study_info |>
filter(PUBMED_ID == 30510241) |>
select(PUBMED_ID, COHORT, STUDY_ACCESSION, "DISEASE/TRAIT", "INITIAL_SAMPLE_SIZE", "REPLICATION_SAMPLE_SIZE")
# The supplement shows this is made up of many studies; I believe "other" includes all the other subsamples
gwas_study_info |>
filter(PUBMED_ID == 33307546) |>
select(PUBMED_ID, COHORT, STUDY_ACCESSION, "DISEASE/TRAIT", "INITIAL_SAMPLE_SIZE", "REPLICATION_SAMPLE_SIZE")
# I believe the COVID-19 Host Genetics Initiative (HGI) entry here refers to the Hispanic individuals,
# and the European ancestry samples come from the ‘broad respiratory phenotype’ study of 23andMe.
# See the replication section of https://www.nature.com/articles/s41586-020-03065-y#Sec4
gwas_study_info |>
filter(PUBMED_ID == 38184787) |>
select(PUBMED_ID, COHORT, STUDY_ACCESSION, "DISEASE/TRAIT", "INITIAL_SAMPLE_SIZE", "REPLICATION_SAMPLE_SIZE")
# Possible cohort-name variants still to check:
# "Raine Study" vs "Raine"?
# "Penn" vs "UPenn", etc.?
# "CALGB"?
# "SIGNET-REGARDS" vs "SIGNET"?
# "RISC" and "RISK" appear to be different cohorts:
#   Relationship Between Insulin Sensitivity and Cardiovascular Disease Risk (RISC)
#   Risk Stratification and Identification of Immunogenetic and Microbial Markers of Rapid Disease Progression in Children with Crohn’s Disease (RISK)
# "CKB"
# "COHRA", "COHRA1", "COHRA2"
# UK Blood Service (UKBS)
# "DiscovEHR" vs "DISCOVeRY-BMT"
# "ELSA" vs "ELSA-Brasil"
# "EPIC", "EPIC_CAD", "EPIC_Obs", "EPIC-Norfolk", "EPICURE"
# "FinnTwin" vs "FinnTwin12"
# "GOCS" vs "GOCS_Chilean"
# "GRAAD" vs "GRaD"
# Colo2&3
# "HELIC", "HELIC-MANOLIS", "HELIC-Pomak"
# "QTR" vs "QTR_Qindao"?
# Is "other|UKB" the same as "UKB|other"?
# Is "UK|NR" the same as "UKB|NR"?
# "CF_TSS" vs "TSS"?
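One open question in these notes is whether "other|UKB" and "UKB|other" should count as the same `COHORT` value. A minimal sketch, assuming token order carries no meaning, is to sort the tokens before comparing. The helper `canonical_cohort()` is hypothetical; it keeps repeated tokens (e.g. "GERA|GERA|GERA"), and because `sort()` is locale-dependent the canonical strings should only be compared within one session.

```r
# Hypothetical helper: order-insensitive form of a "|"-delimited COHORT string
canonical_cohort <- function(x) {
  vapply(x, function(s) {
    if (is.na(s)) return(NA_character_)
    paste(sort(strsplit(s, "|", fixed = TRUE)[[1]]), collapse = "|")
  }, character(1), USE.NAMES = FALSE)
}

canonical_cohort("other|UKB") == canonical_cohort("UKB|other")   # TRUE
canonical_cohort("UK|NR") == canonical_cohort("UKB|NR")          # FALSE: different tokens
```

This only settles ordering; whether "UK" and "UKB" denote the same cohort still needs checking against the source publications.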
gwas_study_info |>
filter(grepl("ORPS", COHORT))
gwas_study_info |>
filter(PUBMED_ID == 39749473) |>
select(COHORT)
# PAGE vs PAGES
# PMID 35754128 should be PAGES
# (see Supplementary Table 1: https://pmc.ncbi.nlm.nih.gov/articles/PMC9671132/)
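The PAGE/PAGES note could be resolved with a correction restricted to a single publication. A minimal sketch, assuming the relabeling should apply only to rows with PMID 35754128 and that the cohort appears as a whole token. `fix_cohort_for_study()` is a hypothetical helper and the toy data frame stands in for `gwas_study_info`; `from` is used as a regex, so names containing special characters would need escaping.

```r
# Hypothetical helper: rename a whole cohort token only within one publication
fix_cohort_for_study <- function(df, pmid, from, to) {
  rows <- which(df$PUBMED_ID == pmid)      # which() drops NA comparisons
  pattern <- paste0("\\b", from, "\\b")    # match the whole token only
  df$COHORT[rows] <- gsub(pattern, to, df$COHORT[rows], perl = TRUE)
  df
}

toy <- data.frame(
  PUBMED_ID = c(35754128L, 35754128L, 1L),
  COHORT = c("PAGE|UKBB", "PAGES", "PAGE"),
  stringsAsFactors = FALSE
)
fixed <- fix_cohort_for_study(toy, 35754128L, "PAGE", "PAGES")
fixed$COHORT   # "PAGES|UKBB" "PAGES" "PAGE"
```

Because "PAGES" does not match `\\bPAGE\\b`, running the fix twice leaves the result unchanged, and rows from other publications are untouched.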
# COGEND vs COGENT?
sessionInfo()
R version 4.3.1 (2023-06-16)
Platform: aarch64-apple-darwin20 (64-bit)
Running under: macOS 15.6
Matrix products: default
BLAS: /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib/libRlapack.dylib; LAPACK version 3.11.0
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
time zone: America/Los_Angeles
tzcode source: internal
attached base packages:
[1] stats graphics grDevices datasets utils methods base
other attached packages:
[1] stringdist_0.9.15 stringr_1.5.1 ggplot2_3.5.2 dplyr_1.1.4
[5] data.table_1.17.8 workflowr_1.7.1
loaded via a namespace (and not attached):
[1] gtable_0.3.6 jsonlite_2.0.0 compiler_4.3.1 renv_1.0.3
[5] promises_1.3.3 tidyselect_1.2.1 Rcpp_1.1.0 git2r_0.36.2
[9] parallel_4.3.1 callr_3.7.6 later_1.4.2 jquerylib_0.1.4
[13] scales_1.4.0 yaml_2.3.10 fastmap_1.2.0 here_1.0.1
[17] R6_2.6.1 generics_0.1.4 knitr_1.50 tibble_3.3.0
[21] rprojroot_2.1.0 RColorBrewer_1.1-3 bslib_0.9.0 pillar_1.11.0
[25] rlang_1.1.6 cachem_1.1.0 stringi_1.8.7 httpuv_1.6.16
[29] xfun_0.52 getPass_0.2-4 fs_1.6.6 sass_0.4.10
[33] cli_3.6.5 withr_3.0.2 magrittr_2.0.3 ps_1.9.1
[37] grid_4.3.1 digest_0.6.37 processx_3.8.6 rstudioapi_0.17.1
[41] lifecycle_1.0.4 vctrs_0.6.5 evaluate_1.0.4 glue_1.8.0
[45] farver_2.1.2 whisker_0.4.1 rmarkdown_2.29 httr_1.4.7
[49] tools_4.3.1 pkgconfig_2.0.3 htmltools_0.5.8.1