Last updated: 2020-02-04
Checks: 5 passed, 2 flagged
Knit directory: PSYMETAB/
This reproducible R Markdown analysis was created with workflowr (version 1.6.0). The Checks tab describes the reproducibility checks that were applied when the results were created. The Past versions tab lists the development history.
The R Markdown file has unstaged changes. To know which version of the R Markdown file created these results, you’ll want to first commit it to the Git repo. If you’re still working on the analysis, you can ignore this warning. When you’re finished, you can run `wflow_publish` to commit the R Markdown file and build the HTML.
Great job! The global environment was empty. Objects defined in the global environment can affect the analysis in your R Markdown file in unknown ways. For reproducibility it’s best to always run the code in an empty environment.
The command `set.seed(20191126)` was run prior to running the code in the R Markdown file. Setting a seed ensures that any results that rely on randomness, e.g. subsampling or permutations, are reproducible.
Great job! Recording the operating system, R version, and package versions is critical for reproducibility.
Nice! There were no cached chunks for this analysis, so you can be confident that you successfully produced the results during this run.
Using absolute paths to the files within your workflowr project makes it difficult for you and others to run your code on a different machine. Change the absolute path(s) below to the suggested relative path(s) to make your code more reproducible.
absolute | relative |
---|---|
/data/sgg2/jenny/projects/PSYMETAB | . |
Great! You are using Git for version control. Tracking code development and connecting the code version to the results is critical for reproducibility. The version displayed above was the version of the Git repository at the time these results were generated.
Note that you need to be careful to ensure that all relevant files for the analysis have been committed to Git prior to generating the results (you can use `wflow_publish` or `wflow_git_commit`). workflowr only checks the R Markdown file, but you know if there are other scripts or data files that it depends on. Below is the status of the Git repository when the results were generated:
Ignored files:
Ignored: .drake/
Ignored: analysis/GRS/
Ignored: analysis/GWAS/
Ignored: analysis/QC/
Ignored: data/processed/
Ignored: data/raw/
Ignored: packrat/lib-R/
Ignored: packrat/lib-ext/
Ignored: packrat/lib/
Untracked files:
Untracked: ._docs
Untracked: .future/
Untracked: GWAS.out
Untracked: PSYMETAB_GWAS.log
Untracked: Rplots.pdf
Untracked: analysis/._data_processing_in_genomestudio.Rmd
Untracked: analysis/._quality_control.Rmd
Untracked: cache_log.csv
Untracked: debug/
Untracked: init_analysis.future.out
Untracked: post_impute_qc.out
Untracked: qc_prep.log
Unstaged changes:
Modified: analysis/quality_control.Rmd
Deleted: packrat/snapshot.lock
Modified: pre_imputation_qc.log
Modified: pre_impute_qc.out
Note that any generated files, e.g. HTML, png, CSS, etc., are not included in this status report because it is ok for generated content to have uncommitted changes.
These are the previous versions of the R Markdown and HTML files. If you’ve configured a remote Git repository (see `?wflow_git_remote`), click on the hyperlinks in the table below to view them.
File | Version | Author | Date | Message |
---|---|---|---|---|
Rmd | 6dcbccf | Sjaarda Jennifer Lynn | 2020-02-03 | update rmd files |
Rmd | e6f7fb5 | Jenny | 2019-12-17 | improve website |
Rmd | 82159d6 | Sjaarda Jennifer Lynn | 2019-12-10 | run meta, prepare for grs |
Rmd | dd4a16b | Jenny | 2019-12-09 | misc project updates re main gwas analysis |
html | 46477dd | Jenny Sjaarda | 2019-12-06 | Build site. |
Rmd | b503ef0 | Sjaarda Jennifer Lynn | 2019-12-06 | add more details to website |
html | b6cb027 | Jenny Sjaarda | 2019-12-06 | Build site. |
Rmd | 487b5f5 | Sjaarda Jennifer Lynn | 2019-12-06 | update website, add qc description |
The following document outlines and summarizes the quality control and processing procedure that was followed to create a clean, imputed dataset.
Step 1 was performed entirely on a CHUV computer.
`code/radomize_IDs.r` was run on a CHUV computer before building the GenomeStudio project.
[Header],,,,,,,,,,,,,
Investigator Name,,,,,,,,,,,,,
Project Name,,,,,,,,,,,,,
Experiment Name,,,,,,,,,,,,,
Date,,,,,,,,,,,,,
[Manifests],,,,,,,,,,,,,
A,GSA_UPPC_20023490X357589_A1,,,,,,,,,,,,
[Data],,,,,,,,,,,,,
`${ID}002`.
`L:\PCN\UBPC\ANALYSES_RECHERCHE\Jenny\PSYMETAB_GWAS\data`.
`Eap0819_1t26_27to29corrected_7b9b_randomizedID.csv`.
`Eap0819_1t26_27to29corrected_7b9.csv`, if needed.
`L:\PCN\UBPC\ANALYSES_RECHERCHE\Jenny\PSYMETAB_GWAS`, named: `GS_project_26092019` (date of creation).
`L:\PCN\UBPC\ANALYSES_RECHERCHE\Jenny\PSYMETAB_GWAS` as project repository.
`L:\PCN\UBPC\ANALYSES_RECHERCHE\Jenny\PSYMETAB_GWAS\data\Eap0819_1t26_27to29corrected_7b9b_randomizedID.csv`, `L:\PCN\UBPC\ANALYSES_RECHERCHE\Jenny\PSYMETAB_GWAS\data`, `L:\PCN\UBPC\ANALYSES_RECHERCHE\Jenny\PSYMETAB_GWAS\data`.
`L:\PCN\UBPC\ANALYSES_RECHERCHE\Jenny\PSYMETAB_GWAS\data\GSPMA24v1_0-A_4349HNR_Samples.egt` and click “Finish”.
`L:\PCN\UBPC\ANALYSES_RECHERCHE\Jenny\PSYMETAB_GWAS\GS_project_26092019`.
`GS_project_26092019.bsc` was opened (requires GenomeStudio) and used for clustering.
`L:\PCN\UBPC\ANALYSES_RECHERCHE\Jenny\PSYMETAB_GWAS`, and named: `GS_project_26092019_cluster`.
`L:\PCN\UBPC\ANALYSES_RECHERCHE\Jenny\PSYMETAB_GWAS`, and named: `PLINK_091019_0920`.
`L:\PCN\UBPC\ANALYSES_RECHERCHE\Jenny\PSYMETAB_GWAS\PLINK_091019_0920`.
013CB
017CB
074CB
095CB
150CRV
192CRV
193CRV
156CSM
181CSM
191CSM
224UAS
234GL
058GP
246GP
089PP
`L:\PCN\UBPC\ANALYSES_RECHERCHE\Jenny\PSYMETAB_GWAS\PSYMETAB_GS2\Plates27to29_0819`.
`Plates27to29_0819_cluster`, and `PLINK_270819_0457`.
`PLINK_030919_0149` were copied to the SGG directory (names of plink files according to parent directory: `DATA`).
je4649@hpc1.chuv.ch
<chuv-password>
22
`/data/sgg2/jenny/projects/PSYMETAB_GWAS/data/raw`.
All subsequent steps were performed on the `sgg` server and run using a `drake` plan.
Results of Steps 3-5 are saved to `analysis/QC`.
Each step corresponds to one folder within `analysis/QC`.
`data/processed/phenotype_data/PSYMETAB_GWAS_sex.txt` (created above).
F M
1298 1469
#FID1 ID1 FID2 ID2 NSNP HETHET IBS0 KINSHIP
1 2071 BEEEDIGO002 224 BEEEDIGO 703110 0.178137 0.00000000000 0.499539
2 1873 CQLIXEZP002 64 CQLIXEZP 703045 0.153413 0.00000142238 0.499504
3 1965 EFWKQOIK002 1433 EFWKQOIK 697403 0.151525 0.00000860335 0.496680
4 1886 HFNWJHCI002 1448 HFNWJHCI 702845 0.153089 0.00000426837 0.499547
5 2075 HROOJNCI002 553 HROOJNCI 702167 0.155970 0.00000284833 0.499257
6 1974 IOAWLZGK002 549 IOAWLZGK 704278 0.153028 0.00000000000 0.499847
7 2314 KLFEBCIE002 1916 KLFEBCIE 700799 0.153949 0.00000570777 0.499007
8 2073 LWCGLSDP002 317 LWCGLSDP 702226 0.150114 0.00000427213 0.499363
9 2379 PBAIFEMQ002 2070 PBAIFEMQ 700642 0.154083 0.00000285452 0.498820
10 2009 PNWDYVRH002 494 PNWDYVRH 703993 0.153736 0.00000284094 0.499806
11 2068 QHNUPGWK002 318 QHNUPGWK 702891 0.154500 0.00000569078 0.499363
12 1928 QZAUHIPY002 559 QZAUHIPY 702896 0.144711 0.00000142269 0.499826
13 2067 SSITXXAY002 283 SSITXXAY 702409 0.152603 0.00000284734 0.499418
14 1947 WKBFDWJF002 566 WKBFDWJF 703642 0.153506 0.00000284235 0.499783
15 1657 XABRILAR002 1385 XABRILAR 698282 0.154672 0.00000000000 0.497315
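In every row above, `ID1` is `ID2` with the `002` suffix and the kinship coefficient is close to 0.5, which is what two genotypings of the same individual would give. A minimal Python sketch of how such pairs could be picked out of a kinship table follows. It assumes whitespace-separated columns named as in the table header and uses a cutoff of 0.354, the lower bound the KING documentation gives for duplicates/MZ twins; the cutoff used in this project is not stated, and the function names are hypothetical.

```python
import io

# Three rows copied from the table above (leading row index dropped).
KINSHIP_TABLE = """\
FID1 ID1 FID2 ID2 NSNP HETHET IBS0 KINSHIP
2071 BEEEDIGO002 224 BEEEDIGO 703110 0.178137 0.00000000000 0.499539
1873 CQLIXEZP002 64 CQLIXEZP 703045 0.153413 0.00000142238 0.499504
1965 EFWKQOIK002 1433 EFWKQOIK 697403 0.151525 0.00000860335 0.496680
"""

DUPLICATE_KINSHIP = 0.354  # assumed cutoff (KING's duplicate/MZ-twin bound)


def find_duplicate_pairs(handle, cutoff=DUPLICATE_KINSHIP):
    """Return (ID1, ID2) for every pair whose kinship exceeds the cutoff."""
    header = handle.readline().split()
    pairs = []
    for line in handle:
        if not line.strip():
            continue  # skip blank lines
        row = dict(zip(header, line.split()))
        if float(row["KINSHIP"]) > cutoff:
            pairs.append((row["ID1"], row["ID2"]))
    return pairs


def is_planned_replicate(id1, id2, suffix="002"):
    """True when one ID is the other plus the replicate suffix."""
    return id1 == id2 + suffix or id2 == id1 + suffix


pairs = find_duplicate_pairs(io.StringIO(KINSHIP_TABLE))
for id1, id2 in pairs:
    print(id1, id2, "planned replicate:", is_planned_replicate(id1, id2))
```

A pair that passes the kinship cutoff but fails `is_planned_replicate` would point to an unexpected duplicate rather than an intended one.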
data/processed/reference_files/rsid_conversion.txt
MAF = 0.
`--geno 0.1`
`--mind 0.1`
`--geno 0.05`
`--mind 0.05`
`--geno 0.01`
`--mind 0.01`
Total removed: 50693 variants (7.91%) and 11 individuals (0.40%).
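The staged thresholds above (0.1, then 0.05, then 0.01, alternating between the variant-level `geno` filter and the sample-level `mind` filter) can be illustrated on toy data. The following is a minimal Python sketch, not PLINK's implementation and not the project's pipeline code: the function names and the toy matrix are hypothetical, missing calls are represented as `None`, and an item is removed when its missing rate exceeds the threshold, as PLINK documents for these two filters.

```python
def missing_rate(calls):
    """Fraction of missing (None) calls; 0.0 for an empty list."""
    if not calls:
        return 0.0
    return sum(call is None for call in calls) / len(calls)


def filter_variants(matrix, max_missing):
    """Drop variant columns whose missing rate exceeds max_missing."""
    if not matrix:
        return matrix
    keep = [
        j for j in range(len(matrix[0]))
        if missing_rate([row[j] for row in matrix]) <= max_missing
    ]
    return [[row[j] for j in keep] for row in matrix]


def filter_samples(matrix, max_missing):
    """Drop sample rows whose missing rate exceeds max_missing."""
    return [row for row in matrix if missing_rate(row) <= max_missing]


def staged_qc(matrix, thresholds=(0.1, 0.05, 0.01)):
    """Apply the variant filter, then the sample filter, at each threshold in turn."""
    for threshold in thresholds:
        matrix = filter_variants(matrix, threshold)
        matrix = filter_samples(matrix, threshold)
    return matrix


# Toy data: 20 samples (rows) x 20 variants (columns), genotype 0 everywhere.
toy = [[0] * 20 for _ in range(20)]
for row in toy:
    row[0] = None      # variant 0 failed in every sample
for j in (1, 2, 3):
    toy[0][j] = None   # sample 0 has three further missing calls

clean = staged_qc(toy)
print(len(clean), "samples x", len(clean[0]), "variants remain")
```

In this toy data the staged procedure removes one variant and one sample. Applying the strictest sample filter first would instead discard every sample because of the single failed variant, which is the usual motivation for tightening the thresholds gradually.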
`--freq`.
`HRC-1000G-check-bim-NoReadKey.pl` (download link).
`FID` and `IID`.
`Run-plink.sh` script from #2.
`vcf` files.
`vcf.gz` files to Michigan Imputation Server as follows:
`Run`, `Genotype Imputation (Minimac4)`.
`HRC r1.1 2016 (GRCh37/hg19)`.
`GRCh37/hg19`.
`off`.
`Eagle v2.4 (phased output)`.
`EUR`.
`Quality Control & Imputation`.
Chr Num imputed variants
1 1 3069931
2 2 3392237
3 3 2821894
4 4 2787581
5 5 2588168
6 6 2460111
7 7 2289305
8 8 2242705
9 9 1686471
10 10 1927503
11 11 1936990
12 12 1848117
13 13 1385433
14 14 1270436
15 15 1139215
16 16 1281297
17 17 1090072
18 18 1104755
19 19 868554
20 20 884983
21 21 531276
22 22 524544
23 all 39131578
info < 0.30
Chr Num imputed variants R2 filtered
1 1 3069931 2223452
2 2 3392237 2454557
3 3 2821894 2059285
4 4 2787581 2043343
5 5 2588168 1887050
6 6 2460111 1813223
7 7 2289305 1659374
8 8 2242705 1632513
9 9 1686471 1218207
10 10 1927503 1411585
11 11 1936990 1421749
12 12 1848117 1352003
13 13 1385433 1019900
14 14 1270436 920528
15 15 1139215 816986
16 16 1281297 894859
17 17 1090072 771058
18 18 1104755 798012
19 19 868554 607422
20 20 884983 631946
21 21 531276 374376
22 22 524544 367153
23 all 39131578 28378581
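The `R2 filtered` column counts the variants left after the imputation-quality filter (`info < 0.30` above, read here as removing variants whose quality score is below 0.30; this reading is an assumption). A minimal Python sketch of such a filter on the INFO column of an imputed VCF follows. The `R2=` key is the one written by Minimac4; the function names and the example INFO strings are made up for illustration, and this is not the command used in the project.

```python
def parse_info(info_field):
    """Split a VCF INFO string into a dict; bare flags map to True."""
    parsed = {}
    for item in info_field.split(";"):
        if not item:
            continue
        key, sep, value = item.partition("=")
        parsed[key] = value if sep else True
    return parsed


def passes_r2(info_field, min_r2=0.3):
    """Keep a variant only if it reports an R2 of at least min_r2."""
    info = parse_info(info_field)
    return "R2" in info and float(info["R2"]) >= min_r2


# Hypothetical INFO strings in the style of Minimac4 output.
example_info = [
    "AF=0.012;MAF=0.012;R2=0.12;IMPUTED",
    "AF=0.250;MAF=0.250;R2=0.97;IMPUTED",
    "AF=0.410;MAF=0.410;R2=0.99;ER2=0.98;TYPED",
]

kept = [info for info in example_info if passes_r2(info)]
print(len(kept), "of", len(example_info), "variants kept")
```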
`bim` file to include rsIDs instead of `chr:bp` convention:
`chr:bp:ref:alt`.
`chr:bp:ref:alt`.
`/data/sgg2/jenny/data/dbSNP/dbSNP_SNP_list_chr${chr}.txt`, which was processed according to description in `jenny/SGG_generic/scripts/public_data.sh`.
`require=info "TYPED"` flag.
Chr Num imputed variants R2 filtered Typed SNPs
1 1 3069931 2223452 44252
2 2 3392237 2454557 45514
3 3 2821894 2059285 37277
4 4 2787581 2043343 34139
5 5 2588168 1887050 32223
6 6 2460111 1813223 38957
7 7 2289305 1659374 30649
8 8 2242705 1632513 28611
9 9 1686471 1218207 23815
10 10 1927503 1411585 27714
11 11 1936990 1421749 27722
12 12 1848117 1352003 26603
13 13 1385433 1019900 19346
14 14 1270436 920528 17920
15 15 1139215 816986 17002
16 16 1281297 894859 18447
17 17 1090072 771058 16646
18 18 1104755 798012 15625
19 19 868554 607422 12997
20 20 884983 631946 13557
21 21 531276 374376 7582
22 22 524544 367153 8104
23 all 39131578 28378581 544702
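As a quick consistency check on the table above, the per-chromosome counts can be summed and compared with the `all` row, and the share of variants surviving each step computed. The counts below are copied from the table; treating the typed SNPs as a subset of the R2-filtered variants is an assumption.

```python
# chromosome: (imputed variants, R2 filtered, typed SNPs), copied from the table.
COUNTS = {
    1: (3069931, 2223452, 44252),
    2: (3392237, 2454557, 45514),
    3: (2821894, 2059285, 37277),
    4: (2787581, 2043343, 34139),
    5: (2588168, 1887050, 32223),
    6: (2460111, 1813223, 38957),
    7: (2289305, 1659374, 30649),
    8: (2242705, 1632513, 28611),
    9: (1686471, 1218207, 23815),
    10: (1927503, 1411585, 27714),
    11: (1936990, 1421749, 27722),
    12: (1848117, 1352003, 26603),
    13: (1385433, 1019900, 19346),
    14: (1270436, 920528, 17920),
    15: (1139215, 816986, 17002),
    16: (1281297, 894859, 18447),
    17: (1090072, 771058, 16646),
    18: (1104755, 798012, 15625),
    19: (868554, 607422, 12997),
    20: (884983, 631946, 13557),
    21: (531276, 374376, 7582),
    22: (524544, 367153, 8104),
}
REPORTED_TOTALS = (39131578, 28378581, 544702)  # the "all" row

# Column-wise sums over the 22 autosomes.
totals = tuple(sum(column) for column in zip(*COUNTS.values()))
imputed, r2_filtered, typed = totals

retained_fraction = r2_filtered / imputed
typed_fraction = typed / r2_filtered

print("sums match the 'all' row:", totals == REPORTED_TOTALS)
print(f"retained after R2 filter: {retained_fraction:.1%}")
print(f"typed among R2-filtered variants: {typed_fraction:.2%}")
```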
`bim`, `bed`, and `fam` files using the `--hard-call-threshold` flag set at `0.1`.
`vcf` as there is no merge function in plink (using the flag `--recode vcf id-paste=iid vcf-dosage=HDS`).
`bcftools concat`.
`plink2 --vcf <output-name> dosage=HDS`.
code was run in 5 steps
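The `--hard-call-threshold` of `0.1` mentioned above controls how imputed dosages are turned into hard genotype calls. A minimal Python sketch of the rule for a biallelic variant, following the description in the PLINK 2 documentation: a dosage within the threshold of the nearest integer genotype becomes that genotype, and anything further away becomes missing. The function name is hypothetical and this is an illustration, not PLINK's code.

```python
def hard_call(dosage, threshold=0.1):
    """Return 0, 1 or 2 for an alternate-allele dosage in [0, 2], or None for missing."""
    if not 0.0 <= dosage <= 2.0:
        raise ValueError("dosage must lie between 0 and 2")
    nearest = round(dosage)  # nearest integer genotype
    if abs(dosage - nearest) <= threshold:
        return nearest
    return None  # too uncertain: treated as a missing call


for dosage in (0.03, 0.96, 1.5, 1.85, 2.0):
    print(dosage, hard_call(dosage))
```

With this rule the hard-called `bed` file can contain missing genotypes even though the imputed dosages themselves are complete.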
sessionInfo()
R version 3.5.3 (2019-03-11)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)
Matrix products: default
BLAS: /data/sgg2/jenny/bin/R-3.5.3/lib64/R/lib/libRblas.so
LAPACK: /data/sgg2/jenny/bin/R-3.5.3/lib64/R/lib/libRlapack.so
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] tidyselect_0.2.5 kableExtra_1.1.0 R.utils_2.9.2
[4] R.oo_1.23.0 R.methodsS3_1.7.1 TwoSampleMR_0.4.25
[7] reader_1.0.6 NCmisc_1.1.6 optparse_1.6.4
[10] readxl_1.3.1 ggthemes_4.2.0 tryCatchLog_1.1.6
[13] futile.logger_1.4.3 DataExplorer_0.8.0 taRifx_1.0.6.1
[16] qqman_0.1.4 MASS_7.3-51.5 bit64_0.9-7
[19] bit_1.1-14 rslurm_0.5.0 rmeta_3.0
[22] devtools_2.2.1 usethis_1.5.1 data.table_1.12.8
[25] clustermq_0.8.8.1 future.batchtools_0.8.1 future_1.15.1
[28] rlang_0.4.2 knitr_1.26 drake_7.9.0.9000
[31] forcats_0.4.0 stringr_1.4.0 dplyr_0.8.3
[34] purrr_0.3.3 readr_1.3.1 tidyr_1.0.0
[37] tibble_2.1.3 ggplot2_3.2.1 tidyverse_1.3.0
[40] pacman_0.5.1 processx_3.4.1 workflowr_1.6.0
loaded via a namespace (and not attached):
[1] backports_1.1.5 plyr_1.8.5 igraph_1.2.4.2
[4] lazyeval_0.2.2 storr_1.2.1 listenv_0.8.0
[7] digest_0.6.23 htmltools_0.4.0 fansi_0.4.1
[10] magrittr_1.5 checkmate_1.9.4 memoise_1.1.0
[13] base64url_1.4 remotes_2.1.0 globals_0.12.5
[16] modelr_0.1.5 prettyunits_1.1.0 colorspace_1.4-1
[19] rvest_0.3.5 rappdirs_0.3.1 haven_2.2.0
[22] xfun_0.11 callr_3.4.0 crayon_1.3.4
[25] jsonlite_1.6 zeallot_0.1.0 brew_1.0-6
[28] glue_1.3.1 gtable_0.3.0 webshot_0.5.2
[31] pkgbuild_1.0.6 scales_1.1.0 futile.options_1.0.1
[34] DBI_1.1.0 Rcpp_1.0.3 viridisLite_0.3.0
[37] progress_1.2.2 txtq_0.2.0 htmlwidgets_1.5.1
[40] httr_1.4.1 getopt_1.20.3 calibrate_1.7.5
[43] ellipsis_0.3.0 pkgconfig_2.0.3 dbplyr_1.4.2
[46] reshape2_1.4.3 later_1.0.0 munsell_0.5.0
[49] cellranger_1.1.0 tools_3.5.3 cli_2.0.1
[52] generics_0.0.2 broom_0.5.3 evaluate_0.14
[55] yaml_2.2.0 fs_1.3.1 packrat_0.5.0
[58] nlme_3.1-143 whisker_0.4 formatR_1.7
[61] proftools_0.99-2 xml2_1.2.2 compiler_3.5.3
[64] rstudioapi_0.10 filelock_1.0.2 testthat_2.3.1
[67] reprex_0.3.0 stringi_1.4.5 highr_0.8
[70] ps_1.3.0 desc_1.2.0 lattice_0.20-38
[73] vctrs_0.2.1 pillar_1.4.3 lifecycle_0.1.0
[76] networkD3_0.4 httpuv_1.5.2 R6_2.4.1
[79] promises_1.1.0 gridExtra_2.3 sessioninfo_1.1.1
[82] codetools_0.2-16 lambda.r_1.2.4 assertthat_0.2.1
[85] pkgload_1.0.2 rprojroot_1.3-2 withr_2.1.2
[88] batchtools_0.9.12 parallel_3.5.3 hms_0.5.3
[91] grid_3.5.3 rmarkdown_1.18 git2r_0.26.1
[94] lubridate_1.7.4