Last updated: 2020-06-23
Knit directory: rss/
This reproducible R Markdown analysis was created with workflowr (version 1.6.2).
The results on this page were generated with repository version c4df22a.
These are the previous versions of the repository in which changes were made to the R Markdown (`rmd/height2014_data.Rmd`) and HTML (`docs/height2014_data.html`) files.
File | Version | Author | Date | Message |
---|---|---|---|---|
Rmd | c4df22a | Xiang Zhu | 2020-06-23 | wflow_publish("rmd/height2014_data.Rmd") |
This page provides information on the preprocessed adult height genome-wide association study (GWAS) summary statistics and estimated linkage disequilibrium (LD) matrices, which were created and analyzed in the following publication.
Zhu, Xiang; Stephens, Matthew. Bayesian large-scale multiple regression with summary statistics from genome-wide association studies. Ann. Appl. Stat. 11 (2017), no. 3, 1561–1592. DOI:10.1214/17-AOAS1046.
This dataset is publicly available at https://doi.org/10.5281/zenodo.1443565, and this DOI can be referenced in a journal's "Data availability" section.
If you find this dataset useful in your research, please cite the publication listed above, Zhu and Stephens (2017).
The folder `mat_files` contains the GWAS summary statistics for each of the 22 autosomes.
>> load mat_files/height2014.chr22.mat;
>> whos
  Name         Size            Bytes  Class     Attributes

  H            15599x758    94592336  double
  Nsnp         15599x1         62396  int32
  betahat      15599x1        124792  double
  chr          15599x1         62396  int32
  cummap       15599x1        124792  double
  pos18        15599x1         62396  int32
  pos19        15599x1         62396  int32
  se           15599x1        124792  double
Most variables above are self-explanatory. The single-SNP GWAS summary statistics `{betahat, se}` are published in Wood et al. (2014). The matrix `H` contains the phased haplotypes of 379 European-ancestry individuals in the 1000 Genomes Project Phase 1 (two haplotypes per individual, hence its 758 columns). The vector `cummap` contains the genetic map of the HapMap Release 24 European-ancestry population. Note that `{betahat, se, H}` must use the SAME way of coding alleles; otherwise RSS results will be severely distorted.
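As a minimal sketch of what a consistent allele coding involves, the toy example below swaps the coded allele of selected SNPs in all three inputs at once. It assumes that `H` is coded as 0/1 with SNPs in rows and haplotypes in columns, and it uses a hypothetical logical vector `flip` to mark the SNPs to swap. Neither is documented by the dataset itself, and all numbers are made up.

```matlab
% Toy data (made up): 3 SNPs, 4 haplotypes.
betahat = [0.02; -0.01; 0.03];   % single-SNP effect estimates
se      = [0.01;  0.01; 0.02];   % standard errors
H       = [0 1 1 0;              % ASSUMPTION: haplotypes coded as 0/1,
           1 1 0 1;              % rows = SNPs, columns = haplotypes
           0 0 1 0];

% Hypothetical indicator of SNPs whose allele coding must be swapped.
flip = logical([0; 1; 0]);

% Swapping the coded allele negates the effect estimate,
betahat(flip) = -betahat(flip);
% leaves the standard error unchanged,
% and exchanges 0 and 1 in the corresponding rows of H.
H(flip, :) = 1 - H(flip, :);

% The z-scores change sign only for the flipped SNPs.
z = betahat ./ se;
```

The point of the sketch is that a swap applied to `betahat` alone, or to `H` alone, would leave the summary statistics and the reference panel out of sync.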
The folder `estimated_ld_sparse` contains the estimated LD matrices for each of the 22 autosomes. The estimation method is detailed in Wen and Stephens (2010), and is implemented by `get_corr.m`.
Files `R.chr*.mat` were generated by setting `cutoff=1e-8` in `get_corr.m`. Files `R.chr*.3.mat` were generated by setting `cutoff=1e-3`. These LD matrices are stored as sparse matrices to save space.
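The effect of the two cutoff values on storage can be illustrated with a toy matrix. The sketch below assumes that the cutoff acts as a hard threshold, setting entries whose absolute value falls below it to zero. This is an assumption made for illustration, not a description of the internals of `get_corr.m`, and the matrix entries are made up.

```matlab
% Toy dense correlation matrix (made up).
R_dense = [1      0.5    2e-4   1e-9;
           0.5    1      0.3    5e-5;
           2e-4   0.3    1      0.2;
           1e-9   5e-5   0.2    1];

% ASSUMPTION: entries below the cutoff in absolute value are set to zero.
R8 = R_dense;
R8(abs(R8) < 1e-8) = 0;          % mimics cutoff=1e-8
R3 = R_dense;
R3(abs(R3) < 1e-3) = 0;          % mimics cutoff=1e-3

% Store the thresholded matrices in sparse format.
R8 = sparse(R8);
R3 = sparse(R3);

% A larger cutoff leaves fewer nonzero entries to store.
nnz8 = nnz(R8);
nnz3 = nnz(R3);
```

Under this assumption, the larger cutoff trades a small amount of LD information for a sparser, smaller matrix.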
>> load estimated_ld_sparse/R.chr22.mat;
>> whos R
  Name      Size                Bytes  Class     Attributes

  R         15599x15599     592534032  double    sparse
Of note, if you run RSS MCMC programs (`rss/src`) with these LD data, please first convert them to full matrices: `R=full(R)`. I recommend using full LD matrices in MCMC because each MCMC iteration involves multiple matrix indexing operations (i.e., sampling SNPs to include in or exclude from the current regression model), and in my experiments full matrix indexing is much faster than sparse matrix indexing (at least in MATLAB).
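Converting to a full matrix costs memory, so it may help to estimate the size first. The sketch below uses a made-up 3-by-3 sparse matrix as a stand-in for the loaded `R`, and computes the full-storage size for chromosome 22 from the dimensions and byte count shown in the `whos` output above, assuming 8 bytes per double entry.

```matlab
% Toy sparse LD matrix (made up) standing in for the loaded R.
R = sparse([1 0.2 0; 0.2 1 0; 0 0 1]);

% Convert to a full matrix before calling the MCMC programs.
if issparse(R)
    R = full(R);
end

% Memory needed to hold chromosome 22 as a full double matrix.
p            = 15599;            % number of SNPs on chromosome 22
full_bytes   = 8 * p^2;          % 8 bytes per double entry
sparse_bytes = 592534032;        % size reported by whos for the sparse R
ratio        = full_bytes / sparse_bytes;
```

For chromosome 22 the full matrix needs 1946630408 bytes (about 1.95 GB), roughly 3.3 times the sparse size.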
However, if you run RSS VB programs (`rss/src_vb`) with these LD data, please do NOT convert them to full matrices. RSS VB programs require input LD matrices to be sparse.
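A small guard can catch an accidentally converted matrix before it reaches the VB programs. The sketch below uses a made-up full matrix as a stand-in for `R`; it is an illustration only and is not part of the RSS code.

```matlab
% Toy full matrix (made up), e.g. left over from an earlier R = full(R).
R = [1 0.2 0; 0.2 1 0; 0 0 1];

% Make sure the LD matrix is sparse before passing it to the VB programs.
if ~issparse(R)
    R = sparse(R);
end
```

The files in `estimated_ld_sparse` are already stored as sparse matrices, so no conversion is needed when they are loaded directly.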