Last updated: 2020-06-23
The results on this page were generated with repository version 5782e93.
These are the previous versions of the repository in which changes were made to the R Markdown (rmd/input_data.Rmd) and HTML (docs/input_data.html) files:
File | Version | Author | Date | Message |
---|---|---|---|---|
Rmd | 5782e93 | Xiang Zhu | 2020-06-23 | wflow_publish("rmd/input_data.Rmd") |
This section is modified from Box 1 of Winkler (2014).
The following columns are required for any RSS analysis:

- `snp`: identifier of the genetic variant, a character string such as `rs12498742`;
- `chr`: chromosome of the genetic variant, such as `chr1`, …, `chr22`, `chrX`, `chrY`;
- `pos`: physical position of the genetic variant, in base pairs;
- `a1`: allele associated with the trait, a single upper-case character `A`, `C`, `G` or `T`;
- `a2`: the other (non-effect) allele, a single upper-case character `A`, `C`, `G` or `T`;
- `betahat`: estimated effect size of the genetic variant under the single-marker model;
- `se`: estimated standard error of `betahat`.

The following columns are optional, but they can be very helpful for sanity checks:

- `strand`: strand on which the alleles are reported, a single character `-` or `+`;
- `n`: number of individuals analyzed (a.k.a. sample size) for the genetic variant;
- `maf`: minor allele frequency, numeric between 0 and 1;
- `p`: p-value of the genetic variant association, numeric between 0 and 1;
- `info`: other information about the genetic variant.

It is very important to make sure that `[a1, betahat, se]` are perfectly matched. Below is a toy example. Consider two SNPs (`rs1`, `rs2`) and four individuals (`i1`, `i2`, `i3`, `i4`):
If the effect alleles (`a1`) of these two SNPs are `A` and `G` respectively, then the genotype data of `rs1` are `X[, 1]=[1, 0, 1, 2]`, and the genotype data of `rs2` are `X[, 2]=[1, 0, 2, 1]`, where each entry counts the copies of the effect allele carried by one individual. Further, the single-SNP summary statistics of `rs1` and `rs2` are given by:
(betahat[1], se[1]) <- single.SNP.model(y, X[, 1])
(betahat[2], se[2]) <- single.SNP.model(y, X[, 2])
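
To make the matching requirement concrete, here is a minimal Python sketch. It assumes that the single-marker model is a simple linear regression of the phenotype on the effect-allele count (the text leaves `single.SNP.model` undefined), and the phenotype vector `y` is made up for illustration. The genotypes are those of `rs1` from the toy example.

```python
import math

def single_snp_model(y, x):
    """Simple linear regression of y on x with an intercept.

    Returns (betahat, se). This is an assumed stand-in for the
    single.SNP.model() placeholder used in the text.
    """
    n = len(y)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    betahat = sxy / sxx
    intercept = y_mean - betahat * x_mean
    # Residual sum of squares, then the usual OLS standard error of the slope.
    rss = sum((yi - intercept - betahat * xi) ** 2 for xi, yi in zip(x, y))
    se = math.sqrt(rss / (n - 2) / sxx)
    return betahat, se

# Genotypes of rs1 from the toy example: copies of the effect allele A.
x_rs1 = [1, 0, 1, 2]
# Hypothetical phenotypes for individuals i1..i4 (not from the original text).
y = [0.5, -0.3, 0.9, 1.6]

betahat, se = single_snp_model(y, x_rs1)

# Recode rs1 by counting the other allele instead: each genotype becomes 2 - x.
x_flipped = [2 - g for g in x_rs1]
betahat_flipped, se_flipped = single_snp_model(y, x_flipped)

print("a1 = A:           ", betahat, se)
print("a1 = other allele:", betahat_flipped, se_flipped)
```

Under this assumption, recoding the SNP by its other allele flips the sign of `betahat` and leaves `se` unchanged, so a summary statistics file whose `a1` column does not match the allele actually used in the regression reports effects in the wrong direction.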
Finally, when providing the `chr` and `pos` columns, please explicitly specify the assembly release and version of the human genome. For example, if 1000 Genomes Project Phase 3 data are used to estimate LD, please ensure that the `chr` and `pos` columns are based on UCSC hg19/GRCh37.
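
The column requirements above can be checked mechanically before running an analysis. Below is a minimal Python sketch; the function name `check_sumstats_row` and the example records are hypothetical, and the two rules marked as assumptions in the comments go slightly beyond what the column list states.

```python
import re

REQUIRED = ("snp", "chr", "pos", "a1", "a2", "betahat", "se")
CHR_PATTERN = re.compile(r"chr([1-9]|1[0-9]|2[0-2]|X|Y)")
ALLELES = {"A", "C", "G", "T"}

def check_sumstats_row(row):
    """Return a list of problems found in one summary-statistics record."""
    missing = [col for col in REQUIRED if col not in row]
    if missing:
        return ["missing columns: " + ", ".join(missing)]
    problems = []
    if not CHR_PATTERN.fullmatch(str(row["chr"])):
        problems.append("chr must look like chr1, ..., chr22, chrX or chrY")
    if not (isinstance(row["pos"], int) and row["pos"] > 0):
        problems.append("pos must be a positive integer (base pairs)")
    for col in ("a1", "a2"):
        if row[col] not in ALLELES:
            problems.append(col + " must be one of A, C, G, T")
    # Assumption: the effect and non-effect alleles should be different.
    if row["a1"] == row["a2"]:
        problems.append("a1 and a2 must differ")
    # Assumption: a standard error should be strictly positive.
    if not row["se"] > 0:
        problems.append("se must be positive")
    return problems

# Hypothetical record; betahat and se are invented for illustration.
good = {"snp": "rs1", "chr": "chr2", "pos": 52877,
        "a1": "A", "a2": "G", "betahat": 0.12, "se": 0.03}
bad = dict(good, chr="2", a1="a", se=0.0)

print(check_sumstats_row(good))
for problem in check_sumstats_row(bad):
    print(problem)
```

A check like this cannot confirm the genome assembly, which still has to be stated alongside the file.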
All RSS methods to date also require the input of an estimated LD matrix.
The LD estimates are often derived from the phased haplotypes of the 1000 Genomes Project Phase 3 data. Because the 1000 Genomes data are publicly available, estimating LD only requires the list of genetic variants, their physical positions and their effect alleles (i.e. `[snp, chr, pos, a1]` from the summary statistics file).
If internal genotype data are available for estimating the LD matrix, please organize them in the same VCF format as the 1000 Genomes Phase 3 data. Again, make sure that the physical positions and effect alleles of the internal genotype data are consistent with `[chr, pos, a1]` provided in the GWAS summary statistics file.
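
As a rough illustration of why the effect alleles matter for LD, the sketch below computes plain sample correlations between genotype columns in Python. It reuses the two toy SNPs from the summary statistics example. The function name `ld_matrix` is hypothetical, and LD estimators used in practice (for example, those built from a phased reference panel) may differ from the simple correlation shown here.

```python
import math

def ld_matrix(genotypes):
    """Sample correlation matrix between SNPs.

    genotypes: list of rows, one per individual; each row holds the
    effect-allele counts (0, 1 or 2) for every SNP.
    """
    n = len(genotypes)
    p = len(genotypes[0])
    cols = [[row[j] for row in genotypes] for j in range(p)]
    means = [sum(c) / n for c in cols]
    centered = [[v - m for v in c] for c, m in zip(cols, means)]
    norms = [math.sqrt(sum(v * v for v in c)) for c in centered]
    return [
        [
            sum(a * b for a, b in zip(centered[j], centered[k]))
            / (norms[j] * norms[k])
            for k in range(p)
        ]
        for j in range(p)
    ]

# Rows are individuals i1..i4; columns are rs1 and rs2 from the toy example.
X = [[1, 1], [0, 0], [1, 2], [2, 1]]
R = ld_matrix(X)

# Same data, but rs2 coded by its other allele (genotype becomes 2 - x).
X_flipped = [[row[0], 2 - row[1]] for row in X]
R_flipped = ld_matrix(X_flipped)

print(R[0][1], R_flipped[0][1])
```

Recoding one SNP by its other allele changes the sign of its correlation with every other SNP, so an LD matrix built with one allele coding cannot be paired with summary statistics that use another.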
Annotation data are only required if you want to use RSS for enrichment analyses. The most statistician-friendly format of genomic annotation data might look like this:
snp chr pos ann1 ann2 ann3
rs1 chr2 52877 0 0 0
rs2 chr1 50670 0 1 0
rs3 chr14 854 0 1 1
rs4 chr4 99620 1 1 1
rs5 chr16 71537 0 0 0
rs6 chr22 39741 0 0 0
rs7 chr6 89331 1 0 0
where `ann1`, `ann2` and `ann3` are three types of annotations; `1` indicates that the SNP is annotated and `0` otherwise.
Alternatively, a list of annotated SNPs can be saved as a separate file. For example:
> cat ann3.txt
snp chr pos
rs3 chr14 854
rs4 chr4 99620
> cat ann2.txt
snp chr pos
rs2 chr1 50670
rs3 chr14 854
rs4 chr4 99620
> cat ann1.txt
snp chr pos
rs4 chr4 99620
rs7 chr6 89331
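
The two layouts carry the same information, and converting from the 0/1 table to per-annotation lists is straightforward. The following Python sketch does this for the example table above; the function name `split_annotations` is hypothetical, and it assumes a single space between fields.

```python
import csv
import io

ANNOTATION_TABLE = """\
snp chr pos ann1 ann2 ann3
rs1 chr2 52877 0 0 0
rs2 chr1 50670 0 1 0
rs3 chr14 854 0 1 1
rs4 chr4 99620 1 1 1
rs5 chr16 71537 0 0 0
rs6 chr22 39741 0 0 0
rs7 chr6 89331 1 0 0
"""

def split_annotations(text):
    """Map each annotation name to the list of (snp, chr, pos) it covers."""
    reader = csv.DictReader(io.StringIO(text), delimiter=" ")
    # Every column other than the three SNP identifiers is an annotation.
    ann_names = [name for name in reader.fieldnames
                 if name not in ("snp", "chr", "pos")]
    lists = {name: [] for name in ann_names}
    for row in reader:
        for name in ann_names:
            if row[name] == "1":
                lists[name].append((row["snp"], row["chr"], int(row["pos"])))
    return lists

lists = split_annotations(ANNOTATION_TABLE)
for name in sorted(lists):
    print(name, [snp for snp, _, _ in lists[name]])
```

The resulting lists contain the same SNPs as the `ann1.txt`, `ann2.txt` and `ann3.txt` files shown above.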
Sometimes the annotations are based on genes or genomic regions (e.g. biological pathways). For these annotations, it is easier to provide a list of annotated regions:
ensembl_gene_id chromosome_name start_position end_position
ENSG00000000938 1 27938575 27961788
ENSG00000008438 19 46522411 46526323
ENSG00000008516 16 3096682 3110727
ENSG00000066336 11 47376411 47400127
ENSG00000077984 20 24929866 24940564
ENSG00000085265 9 137801431 137809809
For all these annotation files, please make sure that the physical positions (`[snp, chr, pos]` or `[chromosome_name, start_position, end_position]`) are consistent with `[snp, chr, pos]` in the summary statistics file.
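
One practical detail when matching SNPs to regions: the region example above writes chromosomes as `1`, `19`, `16`, while the summary statistics columns use `chr1`, …, `chr22`. The Python sketch below shows one way to reconcile the two and look up which regions contain a SNP. The helper names and the SNP positions are hypothetical, region boundaries are assumed to be inclusive, and both files are assumed to use the same genome assembly.

```python
GENES = [
    # (ensembl_gene_id, chromosome_name, start_position, end_position)
    ("ENSG00000000938", "1", 27938575, 27961788),
    ("ENSG00000008438", "19", 46522411, 46526323),
    ("ENSG00000008516", "16", 3096682, 3110727),
]

def normalize_chr(name):
    """Strip an optional 'chr' prefix so 'chr16' and '16' compare equal."""
    name = str(name)
    return name[3:] if name.startswith("chr") else name

def genes_containing(chrom, pos, genes=GENES):
    """Return IDs of genes whose [start, end] interval contains the SNP."""
    chrom = normalize_chr(chrom)
    return [
        gene_id
        for gene_id, gene_chr, start, end in genes
        if normalize_chr(gene_chr) == chrom and start <= pos <= end
    ]

# Hypothetical SNP positions, chosen only to exercise the lookup.
print(genes_containing("chr16", 3100000))  # inside ENSG00000008516
print(genes_containing("chr16", 3200000))  # outside every listed gene
print(genes_containing("chr1", 3100000))   # same position, different chromosome
```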