Last updated: 2024-09-16
Knit directory: rss/
This reproducible R Markdown analysis was created with workflowr (version 1.7.1). The Checks tab describes the reproducibility checks that were applied when the results were created. The Past versions tab lists the development history.
Great! Since the R Markdown file has been committed to the Git repository, you know the exact version of the code that produced these results.
Great job! The global environment was empty. Objects defined in the global environment can affect the analysis in your R Markdown file in unknown ways. For reproducibility it’s best to always run the code in an empty environment.
The command `set.seed(20200623)` was run prior to running the code in the R Markdown file. Setting a seed ensures that any results that rely on randomness, e.g. subsampling or permutations, are reproducible.
Great job! Recording the operating system, R version, and package versions is critical for reproducibility.
Nice! There were no cached chunks for this analysis, so you can be confident that you successfully produced the results during this run.
Great job! Using relative paths to the files within your workflowr project makes it easier to run your code on other machines.
Great! You are using Git for version control. Tracking code development and connecting the code version to the results is critical for reproducibility.
The results in this page were generated with repository version 941a146. See the Past versions tab to see a history of the changes made to the R Markdown and HTML files.
Note that you need to be careful to ensure that all relevant files for the analysis have been committed to Git prior to generating the results (you can use `wflow_publish` or `wflow_git_commit`). workflowr only checks the R Markdown file, but you know if there are other scripts or data files that it depends on. Below is the status of the Git repository when the results were generated:
Ignored files:
Ignored: .Rhistory
Ignored: .Rproj.user/
Note that any generated files, e.g. HTML, png, CSS, etc., are not included in this status report because it is ok for generated content to have uncommitted changes.
These are the previous versions of the repository in which changes were made to the R Markdown (`rmd/input_data.Rmd`) and HTML (`docs/input_data.html`) files. If you’ve configured a remote Git repository (see `?wflow_git_remote`), click on the hyperlinks in the table below to view the files as they were in that past version.
File | Version | Author | Date | Message |
---|---|---|---|---|
html | 17eb1e0 | Xiang Zhu | 2021-03-05 | Build site. |
Rmd | 06cfb1d | Xiang Zhu | 2021-03-05 | wflow_publish("rmd/input_data.Rmd") |
html | bab3f58 | Xiang Zhu | 2020-06-24 | Build site. |
html | 1daecfb | Xiang Zhu | 2020-06-23 | Build site. |
Rmd | 5782e93 | Xiang Zhu | 2020-06-23 | wflow_publish("rmd/input_data.Rmd") |
All RSS methods to date require the input of GWAS summary statistics and ancestry-matching LD estimates. Some RSS methods further require the input of genomic annotations.
This section is modified from Box 1 of Winkler (2014).
The following columns are required for any RSS analysis:

- `snp`: identifier of genetic variant, character string such as `rs12498742`;
- `chr`: autosome number of genetic variant such as `chr1`, …, `chr22`;
- `pos`: physical position, in base pairs, of genetic variant;
- `a1`: effect allele, a single upper case character `A`, `C`, `G` or `T`;
- `a2`: the other (non-effect) allele, a single upper case character `A`, `C`, `G` or `T`;
- `betahat`: estimated effect size of genetic variant under the single-marker model;
- `se`: estimated standard error of `betahat`.

The following columns are optional, but they can be helpful for sanity checks:
- `strand`: strand on which the alleles are reported, a single character `-` or `+`;
- `n`: number of individuals analyzed (i.e., sample size) for the genetic variant;
- `maf`: minor allele frequency, numeric between 0 and 1;
- `p`: p-value of genetic variant association, numeric between 0 and 1;
- `info`: other information (e.g., imputation quality) about genetic variants.

It is crucial to make sure that `[a1, betahat, se]` are consistently defined. Below is a toy example. Consider two SNPs (`rs1`, `rs2`) and four individuals (`i1`, `i2`, `i3`, `i4`):
If the effect alleles (`a1`) of these two SNPs are `A` and `G` respectively, then the genotype data of `rs1` are `X[, 1]=[1, 0, 1, 2]` and the genotype data of `rs2` are `X[, 2]=[1, 0, 2, 1]`, where each entry counts the copies of the effect allele carried by `i1`, `i2`, `i3` and `i4`.
Further, the single-SNP summary statistics of `rs1` and `rs2` are generated as follows.
(betahat[1], se[1]) <- single.SNP.model(y, X[, 1])
(betahat[2], se[2]) <- single.SNP.model(y, X[, 2])
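The pseudocode above can be made concrete with a minimal R sketch. It assumes that the single-marker model is a simple linear regression of the phenotype on the effect-allele count, with an intercept. The function name `single_snp_model` and the phenotype values in `y` are hypothetical and serve only as an illustration; they are not part of RSS.

```r
# Toy genotypes from the text: effect-allele counts for four individuals.
X <- cbind(rs1 = c(1, 0, 1, 2), rs2 = c(1, 0, 2, 1))

# Hypothetical phenotype values, made up purely for illustration.
y <- c(0.8, -0.3, 1.1, 1.9)

# Assumed single-marker model: y ~ intercept + genotype, fit by least squares.
single_snp_model <- function(y, x) {
  fit <- summary(lm(y ~ x))$coefficients
  c(betahat = fit["x", "Estimate"], se = fit["x", "Std. Error"])
}

# One row per SNP, with columns betahat and se.
stats <- t(apply(X, 2, function(x) single_snp_model(y, x)))

# Counting the other allele of rs1 instead (2 - x) flips the sign of betahat
# and leaves se unchanged, which is why a1 must match the genotype coding.
stats_flip <- single_snp_model(y, 2 - X[, "rs1"])
```

Comparing `stats["rs1", ]` with `stats_flip` shows why `a1`, `betahat` and `se` have to be reported together: the same data yield an effect estimate of opposite sign once the counted allele changes.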
Finally, when providing `chr` and `pos` columns, please confirm the assembly release and version of the human genome on which they are based. For example, if 1000 Genomes Project Phase 3 data are used to generate ancestry-matching LD estimates, then the `chr` and `pos` columns should be based on UCSC hg19/GRCh37.
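The column requirements above lend themselves to a quick automated check before running any analysis. The sketch below is one possible way to do this; `check_sumstats` is a hypothetical helper (not an RSS function), and the toy table contains made-up values.

```r
# Hypothetical helper (not part of RSS): basic consistency checks on a
# summary statistics table with the required columns listed in the text.
check_sumstats <- function(d) {
  required <- c("snp", "chr", "pos", "a1", "a2", "betahat", "se")
  missing_cols <- setdiff(required, names(d))
  if (length(missing_cols) > 0) {
    stop("missing required columns: ", paste(missing_cols, collapse = ", "))
  }
  alleles <- c("A", "C", "G", "T")
  stopifnot(
    all(d$chr %in% paste0("chr", 1:22)),  # autosomes only
    all(d$pos > 0),                       # positions in base pairs
    all(d$a1 %in% alleles),
    all(d$a2 %in% alleles),
    all(d$a1 != d$a2),                    # the two alleles must differ
    all(is.finite(d$betahat)),
    all(d$se > 0)                         # standard errors are positive
  )
  invisible(TRUE)
}

# Toy table with made-up values, for illustration only.
toy <- data.frame(
  snp = c("rs1", "rs2"), chr = c("chr1", "chr2"), pos = c(50670, 52877),
  a1 = c("A", "G"), a2 = c("C", "T"),
  betahat = c(0.12, -0.05), se = c(0.04, 0.03)
)
ok <- check_sumstats(toy)
```

These checks cannot detect a genome build mismatch on their own; that still requires comparing `chr` and `pos` against the reference panel.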
The ancestry-matching LD estimates are often derived from the phased haplotypes of the 1000 Genomes Project Phase 3 data. Because the 1000 Genomes data are publicly available, the LD estimates only require the list of genetic variants, their physical positions and effect alleles (i.e. `[snp, chr, pos, a1]` from the summary statistics file).
The script `import_1000g_vcf.sh` illustrates how to extract phased haplotypes of select genetic variants from 1000 Genomes Phase 3 VCF format data and save them in IMPUTE reference-panel format `*.impute.hap`.
The scripts `get_corr.m` and `get_corr.R` illustrate how to compute LD estimates in MATLAB and R respectively.
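As a rough illustration of what an LD estimate is, the sketch below computes the plain sample correlation between variants from a small matrix of phased haplotypes. This is an assumption made for illustration and may differ from the estimator implemented in `get_corr.R`. The haplotypes are made up, and the layout (one row per variant, one 0/1 column per haplotype) is assumed to mirror the IMPUTE `*.hap` format.

```r
# Toy phased haplotypes (made up): rows are variants, columns are haplotypes,
# entries are 0/1 allele indicators.
H <- matrix(c(0, 1, 1, 0, 1, 0,
              0, 1, 1, 0, 0, 0,
              1, 0, 0, 1, 0, 1),
            nrow = 3, byrow = TRUE,
            dimnames = list(c("rs1", "rs2", "rs3"), NULL))

# Drop monomorphic variants, whose correlation is undefined.
H <- H[apply(H, 1, var) > 0, , drop = FALSE]

# Sample correlation between variants (cor() works on columns, hence t()).
R <- cor(t(H))
```

In this toy matrix `rs3` is coded with the opposite allele of `rs1`, so their correlation is exactly -1. Recoding `rs3` would turn it into +1, which is why the alleles counted in the reference panel must agree with `a1` in the summary statistics.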
If internal genotype data are available for estimating the LD matrix, you can first organize them in the same VCF format as the 1000 Genomes Phase 3 data, and then reuse my scripts above. Again, please make sure that the physical positions and effect alleles of the internal genotype data are consistent with `[chr, pos, a1]` provided in the GWAS summary statistics file.
The most statistician-friendly format of genomic annotation data might look like this:
snp chr pos ann1 ann2 ann3
rs1 chr2 52877 0 0 0
rs2 chr1 50670 0 1 0
rs3 chr14 854 0 1 1
rs4 chr4 99620 1 1 1
rs5 chr16 71537 0 0 0
rs6 chr22 39741 0 0 0
rs7 chr6 89331 1 0 0
where `ann1`, `ann2` and `ann3` are three types of annotations; `1` indicates that the SNP is annotated and `0` that it is not.
Alternatively, a list of annotated SNPs can be saved as a separate file. For example:
> cat ann3.txt
snp chr pos
rs3 chr14 854
rs4 chr4 99620
> cat ann2.txt
snp chr pos
rs2 chr1 50670
rs3 chr14 854
rs4 chr4 99620
> cat ann1.txt
snp chr pos
rs4 chr4 99620
rs7 chr6 89331
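The matrix layout and the per-annotation lists carry the same information. A minimal R sketch of converting the former into the latter follows; the data frame reproduces the toy table above, and the names `ann` and `ann_lists` are illustrative.

```r
# The toy annotation matrix from the text.
ann <- data.frame(
  snp  = paste0("rs", 1:7),
  chr  = c("chr2", "chr1", "chr14", "chr4", "chr16", "chr22", "chr6"),
  pos  = c(52877, 50670, 854, 99620, 71537, 39741, 89331),
  ann1 = c(0, 0, 0, 1, 0, 0, 1),
  ann2 = c(0, 1, 1, 1, 0, 0, 0),
  ann3 = c(0, 0, 1, 1, 0, 0, 0)
)

# For each annotation column, keep the rows flagged with 1.
ann_cols <- grep("^ann", names(ann), value = TRUE)
ann_lists <- lapply(setNames(ann_cols, ann_cols), function(a) {
  ann[ann[[a]] == 1, c("snp", "chr", "pos")]
})

# Each list could then be written to its own file, for example:
# write.table(ann_lists$ann1, "ann1.txt", quote = FALSE, row.names = FALSE)
```

Each element of `ann_lists` holds the `snp`, `chr` and `pos` columns of the SNPs carrying that annotation, matching the contents of `ann1.txt`, `ann2.txt` and `ann3.txt` shown above.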
Sometimes the annotations are based on genes (e.g., biological pathways) or genomic regions (e.g., regulatory elements). For these region-based annotations, it is easier to provide a list of annotated regions as follows:
ensembl_gene_id chromosome_name start_position end_position
ENSG00000000938 1 27938575 27961788
ENSG00000008438 19 46522411 46526323
ENSG00000008516 16 3096682 3110727
ENSG00000066336 11 47376411 47400127
ENSG00000077984 20 24929866 24940564
ENSG00000085265 9 137801431 137809809
As with the LD estimates, please make sure that the physical positions (`[snp, chr, pos]` or `[chromosome_name, start_position, end_position]`) in the annotation file are consistent with `[snp, chr, pos]` in the summary statistics file.
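For region-based annotations, one way to see what consistency involves is to map SNPs onto regions directly. The sketch below uses two of the gene regions listed above together with hypothetical SNPs (`rsA`, `rsB`, `rsC` and their positions are made up). It assumes both files use the same genome build, and it harmonizes the chromosome labels (`chr1` versus `1`) before comparing positions.

```r
# Two of the gene regions listed in the text.
regions <- data.frame(
  ensembl_gene_id = c("ENSG00000000938", "ENSG00000008438"),
  chromosome_name = c("1", "19"),
  start_position  = c(27938575, 46522411),
  end_position    = c(27961788, 46526323)
)

# Hypothetical SNPs, made up for illustration.
snps <- data.frame(
  snp = c("rsA", "rsB", "rsC"),
  chr = c("chr1", "chr19", "chr19"),
  pos = c(27940000, 46000000, 46525000)
)

# Harmonize chromosome labels: strip the "chr" prefix used in the SNP file.
snps$chrom <- sub("^chr", "", snps$chr)

# A SNP is annotated if it falls inside any region on the same chromosome.
snps$in_region <- mapply(function(ch, p) {
  any(regions$chromosome_name == ch &
        regions$start_position <= p & p <= regions$end_position)
}, snps$chrom, snps$pos, USE.NAMES = FALSE)
```

Here `rsA` and `rsC` fall inside a listed gene and `rsB` does not. Without the label harmonization step every SNP would be reported as unannotated, which is one of the silent mismatches the consistency check is meant to catch.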
─ Session info ───────────────────────────────────────────────────────────────
setting value
version R version 4.4.1 (2024-06-14)
os macOS Sonoma 14.6.1
system aarch64, darwin20
ui X11
language (EN)
collate en_US.UTF-8
ctype en_US.UTF-8
tz America/Los_Angeles
date 2024-09-16
pandoc 3.1.11 @ /Applications/RStudio.app/Contents/Resources/app/quarto/bin/tools/aarch64/ (via rmarkdown)
─ Packages ───────────────────────────────────────────────────────────────────
package * version date (UTC) lib source
bslib 0.8.0 2024-07-29 [1] CRAN (R 4.4.0)
cachem 1.1.0 2024-05-16 [1] CRAN (R 4.4.0)
callr 3.7.6 2024-03-25 [1] CRAN (R 4.4.0)
cli 3.6.3 2024-06-21 [1] CRAN (R 4.4.0)
devtools 2.4.5 2022-10-11 [1] CRAN (R 4.4.0)
digest 0.6.37 2024-08-19 [1] CRAN (R 4.4.1)
ellipsis 0.3.2 2021-04-29 [1] CRAN (R 4.4.0)
evaluate 0.24.0 2024-06-10 [1] CRAN (R 4.4.0)
fansi 1.0.6 2023-12-08 [1] CRAN (R 4.4.0)
fastmap 1.2.0 2024-05-15 [1] CRAN (R 4.4.0)
fs 1.6.4 2024-04-25 [1] CRAN (R 4.4.0)
getPass 0.2-4 2023-12-10 [1] CRAN (R 4.4.0)
git2r 0.33.0 2023-11-26 [1] CRAN (R 4.4.0)
glue 1.7.0 2024-01-09 [1] CRAN (R 4.4.0)
htmltools 0.5.8.1 2024-04-04 [1] CRAN (R 4.4.0)
htmlwidgets 1.6.4 2023-12-06 [1] CRAN (R 4.4.0)
httpuv 1.6.15 2024-03-26 [1] CRAN (R 4.4.0)
httr 1.4.7 2023-08-15 [1] CRAN (R 4.4.0)
jquerylib 0.1.4 2021-04-26 [1] CRAN (R 4.4.0)
jsonlite 1.8.8 2023-12-04 [1] CRAN (R 4.4.0)
knitr 1.48 2024-07-07 [1] CRAN (R 4.4.0)
later 1.3.2 2023-12-06 [1] CRAN (R 4.4.0)
lifecycle 1.0.4 2023-11-07 [1] CRAN (R 4.4.0)
magrittr 2.0.3 2022-03-30 [1] CRAN (R 4.4.0)
memoise 2.0.1 2021-11-26 [1] CRAN (R 4.4.0)
mime 0.12 2021-09-28 [1] CRAN (R 4.4.0)
miniUI 0.1.1.1 2018-05-18 [1] CRAN (R 4.4.0)
pillar 1.9.0 2023-03-22 [1] CRAN (R 4.4.0)
pkgbuild 1.4.4 2024-03-17 [1] CRAN (R 4.4.0)
pkgconfig 2.0.3 2019-09-22 [1] CRAN (R 4.4.0)
pkgload 1.4.0 2024-06-28 [1] CRAN (R 4.4.0)
processx 3.8.4 2024-03-16 [1] CRAN (R 4.4.0)
profvis 0.3.8 2023-05-02 [1] CRAN (R 4.4.0)
promises 1.3.0 2024-04-05 [1] CRAN (R 4.4.0)
ps 1.8.0 2024-09-12 [1] CRAN (R 4.4.1)
purrr 1.0.2 2023-08-10 [1] CRAN (R 4.4.0)
R6 2.5.1 2021-08-19 [1] CRAN (R 4.4.0)
Rcpp 1.0.13 2024-07-17 [1] CRAN (R 4.4.0)
remotes 2.5.0 2024-03-17 [1] CRAN (R 4.4.0)
rlang 1.1.4 2024-06-04 [1] CRAN (R 4.4.0)
rmarkdown 2.28 2024-08-17 [1] CRAN (R 4.4.0)
rprojroot 2.0.4 2023-11-05 [1] CRAN (R 4.4.0)
rstudioapi 0.16.0 2024-03-24 [1] CRAN (R 4.4.0)
sass 0.4.9.9000 2024-07-11 [1] Github (rstudio/sass@9228fcf)
sessioninfo 1.2.2 2021-12-06 [1] CRAN (R 4.4.0)
shiny 1.9.1 2024-08-01 [1] CRAN (R 4.4.0)
stringi 1.8.4 2024-05-06 [1] CRAN (R 4.4.0)
stringr 1.5.1 2023-11-14 [1] CRAN (R 4.4.0)
tibble 3.2.1 2023-03-20 [1] CRAN (R 4.4.0)
urlchecker 1.0.1 2021-11-30 [1] CRAN (R 4.4.0)
usethis 3.0.0 2024-07-29 [1] CRAN (R 4.4.0)
utf8 1.2.4 2023-10-22 [1] CRAN (R 4.4.0)
vctrs 0.6.5 2023-12-01 [1] CRAN (R 4.4.0)
whisker 0.4.1 2022-12-05 [1] CRAN (R 4.4.0)
workflowr * 1.7.1 2023-08-23 [1] CRAN (R 4.4.0)
xfun 0.47 2024-08-17 [1] CRAN (R 4.4.0)
xtable 1.8-4 2019-04-21 [1] CRAN (R 4.4.0)
yaml 2.3.10 2024-07-26 [1] CRAN (R 4.4.0)
[1] /Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/library
──────────────────────────────────────────────────────────────────────────────