Last updated: 2024-09-16
Knit directory: rss/
This reproducible R Markdown analysis was created with workflowr (version 1.7.1). The Checks tab describes the reproducibility checks that were applied when the results were created. The Past versions tab lists the development history.
Great! Since the R Markdown file has been committed to the Git repository, you know the exact version of the code that produced these results.
Great! You are using Git for version control. Tracking code development and connecting the code version to the results is critical for reproducibility.
The results in this page were generated with repository version 941a146. See the Past versions tab to see a history of the changes made to the R Markdown and HTML files.
Note that you need to be careful to ensure that all relevant files for the analysis have been committed to Git prior to generating the results (you can use wflow_publish or wflow_git_commit). workflowr only checks the R Markdown file, but you know if there are other scripts or data files that it depends on. Below is the status of the Git repository when the results were generated:
Ignored files:
Ignored: .Rhistory
Ignored: .Rproj.user/
Note that any generated files, e.g. HTML, png, CSS, etc., are not included in this status report because it is ok for generated content to have uncommitted changes.
These are the previous versions of the repository in which changes were made to the R Markdown (rmd/height2014_data.Rmd) and HTML (docs/height2014_data.html) files. If you’ve configured a remote Git repository (see ?wflow_git_remote), click on the hyperlinks in the table below to view the files as they were in that past version.
File | Version | Author | Date | Message |
---|---|---|---|---|
html | bab3f58 | Xiang Zhu | 2020-06-24 | Build site. |
html | aa569b2 | Xiang Zhu | 2020-06-23 | Build site. |
Rmd | c4df22a | Xiang Zhu | 2020-06-23 | wflow_publish("rmd/height2014_data.Rmd") |
This page provides information on the preprocessed adult height genome-wide association study (GWAS) summary statistics and estimated linkage disequilibrium (LD) matrices, which were created and analyzed in the following publication.
Zhu, Xiang; Stephens, Matthew. Bayesian large-scale multiple regression with summary statistics from genome-wide association studies. Ann. Appl. Stat. 11 (2017), no. 3, 1561–1592. DOI:10.1214/17-AOAS1046.
This dataset is publicly available at https://doi.org/10.5281/zenodo.1443565, and can be referenced in a journal’s “Data availability” section as
If you find this dataset useful in your research, please cite the publication listed above, Zhu and Stephens (2017).
The folder mat_files contains the GWAS summary statistics, together with the reference-panel haplotypes and genetic map, for each of the 22 autosomes. For example, for chromosome 22:
```
>> load mat_files/height2014.chr22.mat;
>> whos
  Name             Size               Bytes  Class     Attributes

  H            15599x758           94592336  double
  Nsnp         15599x1                62396  int32
  betahat      15599x1               124792  double
  chr          15599x1                62396  int32
  cummap       15599x1               124792  double
  pos18        15599x1                62396  int32
  pos19        15599x1                62396  int32
  se           15599x1               124792  double
```
Most variables above are self-explanatory. The single-SNP GWAS summary statistics {betahat, se} are published in Wood et al. (2014). The matrix H contains the phased haplotypes of 379 European-ancestry individuals in the 1000 Genomes Project Phase 1 (each row is a SNP, matching the length of betahat; each of the 758 = 2 × 379 columns is a haplotype). The vector cummap contains the genetic map positions of these SNPs, based on HapMap Release 24 (European-ancestry population). Note that {betahat, se, H} must use the SAME allele coding at every SNP; otherwise RSS results will be severely distorted.
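As a hedged illustration of what "the same allele coding" means, the MATLAB sketch below uses made-up toy data (not values from this dataset) and a hypothetical logical vector flip that marks SNPs whose GWAS effect allele is the other allele in the panel. Negating betahat for those SNPs, while leaving se unchanged, aligns the summary statistics to H. As a rough heuristic for SNPs in strong LD, the sign of the product of their z-scores should then agree with the sign of their panel correlation. How flip is obtained in practice (e.g. by comparing allele labels) is outside this sketch.

```matlab
% Hypothetical toy data (not from the dataset): 3 SNPs x 6 haplotypes,
% 0/1-coded and stored SNPs-by-haplotypes like H in height2014.chr22.mat.
H = [0 1 1 0 1 0;
     0 1 1 0 1 1;
     1 0 0 1 0 1];          % SNP 3 is the complement of SNP 1 in the panel
betahat = [0.20; 0.15; 0.20];
se      = [0.05; 0.05; 0.05];

% Panel correlation between SNP 1 and SNP 3 (about -1 here).
% Column vectors keep corrcoef's two-argument form unambiguous.
C = corrcoef(H(1, :)', H(3, :)');
r13 = C(1, 2);

% Hypothetical flag: SNP 3's GWAS effect allele is the panel's other allele.
flip = logical([0; 0; 1]);

% Align the summary statistics to the panel coding. The standard error
% does not depend on which allele is counted, so se is left unchanged.
betahat(flip) = -betahat(flip);

z = betahat ./ se;          % single-SNP z-scores

% Heuristic check for SNPs in strong LD: signs should now agree.
consistent = sign(z(1) * z(3)) == sign(r13);
```

Equivalently, one could leave betahat alone and recode the flagged rows of H as 1 - H(flip, :); what matters is that exactly one side is changed.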
The folder estimated_ld_sparse contains the estimated LD matrices for each of the 22 autosomes. The estimation method is detailed in Wen and Stephens (2010) and is implemented in get_corr.m.
Files R.chr*.mat were generated by setting cutoff=1e-8 in get_corr.m, and files R.chr*.3.mat by setting cutoff=1e-3. These LD matrices are stored as sparse matrices to save space.
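The sketch below is a minimal, assumption-laden illustration of a shrinkage LD estimator in the spirit of Wen and Stephens (2010), written to show one plausible role for cutoff: distance-based shrinkage weights that fall below it are set to exactly zero, which is what lets the result be stored sparsely. The form of theta, the meaning of m (taken here as the number of reference individuals, i.e. 2m haplotypes), the cM units of cummap, and the value of Ne are all assumptions for illustration; get_corr.m is the authoritative implementation and may differ.

```matlab
% Toy inputs (hypothetical): 4 SNPs x 6 haplotypes, stored SNPs-by-haplotypes
% like H in height2014.chr22.mat.
H = [0 1 1 0 1 0;
     0 1 1 0 1 1;
     1 0 0 1 0 1;
     1 1 0 0 1 0];
cummap = [0.00; 0.01; 0.02; 5.00];  % genetic map positions, assumed in cM
m = 3;        % assumed: number of reference individuals (2*m = 6 haplotypes)
Ne = 1e4;     % illustrative effective population size (assumption)
cutoff = 1e-3;

X = H';                      % haplotypes-by-SNPs, as cov expects
p = size(X, 2);

% Shrinkage toward the identity (assumed form of theta).
nmsum = sum(1 ./ (1:(2*m - 1)));
theta = (1 / nmsum) / (2*m + 1 / nmsum);

% Sample covariance, with off-diagonal entries damped by genetic distance.
S = cov(X);
dist = abs(bsxfun(@minus, cummap, cummap'));  % pairwise distance in cM
rho = 4 * Ne * dist / 100;                    % cM -> Morgans, scaled by 4*Ne
shrink = exp(-rho / (2*m));
shrink(shrink < cutoff) = 0;                  % assumed role of cutoff
S = shrink .* S;

% Shrunken covariance, then rescaled to a correlation matrix.
SigHat = (1 - theta)^2 * S + 0.5 * theta * (1 - 0.5 * theta) * eye(p);
sd = sqrt(diag(SigHat));
R = sparse(SigHat ./ (sd * sd'));
```

In this toy example SNP 4 lies about 5 cM from the others, so its off-diagonal entries are zeroed and R has 10 stored entries instead of 16. If get_corr.m applies cutoff in this way, a larger cutoff zeroes more long-range pairs, so the R.chr*.3.mat files would be sparser than the R.chr*.mat files.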
```
>> load estimated_ld_sparse/R.chr22.mat;
>> whos R
  Name        Size                  Bytes  Class     Attributes

  R       15599x15599           592534032  double    sparse
```
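To put the whos numbers in perspective, the back-of-the-envelope calculation below compares them with dense storage. It assumes MATLAB's 64-bit compressed sparse column layout (one 8-byte value and one 8-byte row index per allocated slot, plus ncol + 1 eight-byte column pointers). Because MATLAB may allocate more slots than there are nonzeros, the implied count is an upper bound on nnz(R) rather than an exact value.

```matlab
% Storage comparison for the chr22 LD matrix, using the whos output above.
% Assumes MATLAB's 64-bit compressed sparse column (CSC) layout.
n = 15599;                          % number of SNPs on chr22
sparse_bytes = 592534032;           % "Bytes" reported by whos R
full_bytes = n^2 * 8;               % same matrix stored as dense doubles

% Implied number of allocated entries: subtract the column pointers,
% then divide by 16 bytes (8 for the value, 8 for the row index).
nz = (sparse_bytes - 8 * (n + 1)) / 16;
density = nz / n^2;

fprintf('full: %.2f GB, sparse: %.2f GB, density: %.3f\n', ...
        full_bytes / 1e9, sparse_bytes / 1e9, density);
% prints: full: 1.95 GB, sparse: 0.59 GB, density: 0.152
```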
Of note, if you run the RSS MCMC programs (rss/src) with these LD data, please first convert them to full matrices: R=full(R). I recommend full LD matrices for MCMC because each MCMC iteration involves multiple matrix indexing operations (e.g., when SNPs are added to or removed from the current regression model), and in my experiments full-matrix indexing was much faster than sparse-matrix indexing (at least in MATLAB).
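The indexing claim is easy to check on your own hardware. The sketch below is a hypothetical micro-benchmark, not part of RSS: it builds a random sparse symmetric stand-in for an LD matrix with sprandsym, keeps a dense copy made with full, and repeatedly extracts submatrices for random SNP subsets, loosely mimicking the add/remove moves of an MCMC sampler. The sizes are illustrative, and timings depend on the machine, MATLAB version, matrix size, and density, so no particular result is implied.

```matlab
% Hypothetical micro-benchmark: submatrix extraction from the same
% symmetric matrix stored sparse vs. full.
p = 2000;                       % toy number of SNPs
Rs = sprandsym(p, 0.15);        % random sparse symmetric stand-in for LD
Rf = full(Rs);                  % dense copy, as with R = full(R)
niter = 2000;                   % toy number of MCMC-like updates
k = 50;                         % toy size of the current model

tic;
for t = 1:niter
    idx = randperm(p, k);       % SNPs in the current model
    sub = Rs(idx, idx);         %#ok<NASGU> sparse indexing
end
t_sparse = toc;

tic;
for t = 1:niter
    idx = randperm(p, k);
    sub = Rf(idx, idx);         %#ok<NASGU> full indexing
end
t_full = toc;

fprintf('sparse: %.3f s, full: %.3f s\n', t_sparse, t_full);

% Both storage formats return identical values; only speed differs.
idx = randperm(p, k);
same = isequal(full(Rs(idx, idx)), Rf(idx, idx));
```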
However, if you run the RSS VB programs (rss/src_vb) with these LD data, please do NOT convert them to full matrices: the RSS VB programs require sparse input LD matrices.
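Since the MCMC and VB programs expect different storage, a small guard before the call can prevent mix-ups. In this sketch the flag use_vb and the stand-in matrix are hypothetical; neither is an RSS option or function.

```matlab
% Hypothetical guard: choose LD storage according to the RSS program type.
R = speye(4);          % stand-in for a loaded LD matrix (stored sparse)
use_vb = false;        % illustrative flag: true for rss/src_vb, false for rss/src

if use_vb
    % VB programs require sparse LD input.
    if ~issparse(R), R = sparse(R); end
else
    % MCMC programs: full storage for faster indexing.
    if issparse(R), R = full(R); end
end
```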