Last updated: 2022-02-21
Checks: 6 passed, 1 warning
Knit directory: cTWAS_analysis/
This reproducible R Markdown analysis was created with workflowr (version 1.6.2). The Checks tab describes the reproducibility checks that were applied when the results were created. The Past versions tab lists the development history.
Great! Since the R Markdown file has been committed to the Git repository, you know the exact version of the code that produced these results.
Great job! The global environment was empty. Objects defined in the global environment can affect the analysis in your R Markdown file in unknown ways. For reproducibility it’s best to always run the code in an empty environment.
The command set.seed(20211220) was run prior to running the code in the R Markdown file. Setting a seed ensures that any results that rely on randomness, e.g. subsampling or permutations, are reproducible.
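As a minimal sketch of what seeding buys you, the snippet below re-seeds the generator with the same value reported above and draws the same sample twice. The variable names (`draw_a`, `draw_b`, `draw_c`) are illustrative and not part of the analysis.

```r
# Re-seeding with the same value replays the same pseudo-random stream.
set.seed(20211220)
draw_a <- sample(1:100, 5)

set.seed(20211220)
draw_b <- sample(1:100, 5)

# Without re-seeding, the generator simply continues from its current state,
# so this draw is not a replay of the first one.
draw_c <- sample(1:100, 5)

print(identical(draw_a, draw_b))
```

Only code that runs after the seed is set, in the same order, benefits from this guarantee.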
Great job! Recording the operating system, R version, and package versions is critical for reproducibility.
Nice! There were no cached chunks for this analysis, so you can be confident that you successfully produced the results during this run.
Using absolute paths to the files within your workflowr project makes it difficult for you and others to run your code on a different machine. Change the absolute path(s) below to the suggested relative path(s) to make your code more reproducible.
absolute | relative |
---|---|
/project2/xinhe/shengqian/cTWAS/cTWAS_analysis/data/ | data |
/project2/xinhe/shengqian/cTWAS/cTWAS_analysis/code/ctwas_config.R | code/ctwas_config.R |
Great! You are using Git for version control. Tracking code development and connecting the code version to the results is critical for reproducibility.
The results in this page were generated with repository version bbf6737. See the Past versions tab to see a history of the changes made to the R Markdown and HTML files.
Note that you need to be careful to ensure that all relevant files for the analysis have been committed to Git prior to generating the results (you can use wflow_publish or wflow_git_commit). workflowr only checks the R Markdown file, but you know if there are other scripts or data files that it depends on. Below is the status of the Git repository when the results were generated:
Ignored files:
Ignored: .ipynb_checkpoints/
Ignored: analysis/figure/
Untracked files:
Untracked: Rplot.png
Untracked: analysis/Glucose_Adipose_Subcutaneous.Rmd
Untracked: analysis/Glucose_Adipose_Visceral_Omentum.Rmd
Untracked: analysis/Splicing_Test.Rmd
Untracked: code/.ipynb_checkpoints/
Untracked: code/AF_out/
Untracked: code/BMI_S_out/
Untracked: code/BMI_out/
Untracked: code/Glucose_out/
Untracked: code/LDL_S_out/
Untracked: code/T2D_out/
Untracked: code/ctwas_config.R
Untracked: code/mapping.R
Untracked: code/out/
Untracked: code/run_AF_analysis.sbatch
Untracked: code/run_AF_analysis.sh
Untracked: code/run_AF_ctwas_rss_LDR.R
Untracked: code/run_BMI_analysis.sbatch
Untracked: code/run_BMI_analysis.sh
Untracked: code/run_BMI_analysis_S.sbatch
Untracked: code/run_BMI_analysis_S.sh
Untracked: code/run_BMI_ctwas_rss_LDR.R
Untracked: code/run_BMI_ctwas_rss_LDR_S.R
Untracked: code/run_Glucose_analysis.sbatch
Untracked: code/run_Glucose_analysis.sh
Untracked: code/run_Glucose_ctwas_rss_LDR.R
Untracked: code/run_LDL_analysis_S.sbatch
Untracked: code/run_LDL_analysis_S.sh
Untracked: code/run_LDL_ctwas_rss_LDR_S.R
Untracked: code/run_T2D_analysis.sbatch
Untracked: code/run_T2D_analysis.sh
Untracked: code/run_T2D_ctwas_rss_LDR.R
Untracked: data/.ipynb_checkpoints/
Untracked: data/AF/
Untracked: data/BMI/
Untracked: data/BMI_S/
Untracked: data/Glucose/
Untracked: data/LDL_S/
Untracked: data/T2D/
Untracked: data/TEST/
Untracked: data/UKBB/
Untracked: data/UKBB_SNPs_Info.text
Untracked: data/gene_OMIM.txt
Untracked: data/gene_pip_0.8.txt
Untracked: data/mashr_Heart_Atrial_Appendage.db
Untracked: data/mashr_sqtl/
Untracked: data/summary_known_genes_annotations.xlsx
Untracked: data/untitled.txt
Note that any generated files, e.g. HTML, png, CSS, etc., are not included in this status report because it is ok for generated content to have uncommitted changes.
These are the previous versions of the repository in which changes were made to the R Markdown (analysis/BMI_Brain_Amygdala.Rmd) and HTML (docs/BMI_Brain_Amygdala.html) files. If you’ve configured a remote Git repository (see ?wflow_git_remote), click on the hyperlinks in the table below to view the files as they were in that past version.
File | Version | Author | Date | Message |
---|---|---|---|---|
Rmd | bbf6737 | sq-96 | 2022-02-21 | update |
html | 91f38fa | sq-96 | 2022-02-13 | Build site. |
Rmd | eb13ecf | sq-96 | 2022-02-13 | update |
html | e6bc169 | sq-96 | 2022-02-13 | Build site. |
Rmd | 87fee8b | sq-96 | 2022-02-13 | update |
#number of imputed weights
nrow(qclist_all)
[1] 10285
#number of imputed weights by chromosome
table(qclist_all$chr)
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
1009 735 594 399 502 598 475 370 404 396 626 561 223 326 336 453
17 18 19 20 21 22
604 162 819 311 117 265
#number of imputed weights without missing variants
sum(qclist_all$nmiss==0)
[1] 8365
#proportion of imputed weights without missing variants
mean(qclist_all$nmiss==0)
[1] 0.8133
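The four summaries above can be reproduced on any table with one row per imputed weight. The sketch below uses a small made-up data frame (`qclist_toy`, hypothetical values) with the same `chr` and `nmiss` columns, and then re-derives the reported proportion from the two reported counts.

```r
# Toy stand-in for qclist_all; the values are invented for illustration.
qclist_toy <- data.frame(
  chr   = c(1, 1, 2, 22),
  nmiss = c(0, 3, 0, 0)   # number of missing variants per weight
)

n_weights     <- nrow(qclist_toy)              # number of imputed weights
by_chr        <- table(qclist_toy$chr)         # weights per chromosome
n_complete    <- sum(qclist_toy$nmiss == 0)    # weights with no missing variants
prop_complete <- mean(qclist_toy$nmiss == 0)   # same count, as a proportion

# Consistency check on the figures reported above: 8365 of 10285 weights.
reported_prop <- 8365 / 10285
print(round(reported_prop, 4))
```

Taking `mean()` of a logical vector gives the fraction of `TRUE` values, which is why the last two summaries differ only by the denominator.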
Version | Author | Date |
---|---|---|
e6bc169 | sq-96 | 2022-02-13 |
#estimated group prior
estimated_group_prior <- group_prior_rec[,ncol(group_prior_rec)]
names(estimated_group_prior) <- c("gene", "snp")
estimated_group_prior["snp"] <- estimated_group_prior["snp"]*thin #adjust parameter to account for thin argument
print(estimated_group_prior)
gene snp
0.0078519 0.0002945
#estimated group prior variance
estimated_group_prior_var <- group_prior_var_rec[,ncol(group_prior_var_rec)]
names(estimated_group_prior_var) <- c("gene", "snp")
print(estimated_group_prior_var)
gene snp
22.61 17.40
#report sample size
print(sample_size)
[1] 336107
#report group size
group_size <- c(nrow(ctwas_gene_res), n_snps)
print(group_size)
[1] 10285 7535010
#estimated group PVE
estimated_group_pve <- estimated_group_prior_var*estimated_group_prior*group_size/sample_size #check PVE calculation
names(estimated_group_pve) <- c("gene", "snp")
print(estimated_group_pve)
gene snp
0.005432 0.114855
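As a worked check of the group PVE formula, the sketch below plugs in the rounded values printed above (prior, prior variance, group size, sample size). Because the inputs are rounded, the result should agree with the reported PVE only to about three significant digits.

```r
# Values copied from the printed output above (rounded as displayed).
prior     <- c(gene = 0.0078519, snp = 0.0002945)
prior_var <- c(gene = 22.61,     snp = 17.40)
size      <- c(gene = 10285,     snp = 7535010)
n         <- 336107

# PVE per group = prior variance * prior inclusion probability * group size / n
pve <- prior_var * prior * size / n
print(signif(pve, 3))
```

Under this formula the SNP group accounts for most of the estimated PVE, mainly because it contains several hundred times more variables than the gene group.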
#compare sum(PIP*mu2/sample_size) with above PVE calculation
c(sum(ctwas_gene_res$PVE),sum(ctwas_snp_res$PVE))
[1] 0.1859 16.1230
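The per-variable PVE summed above is described in the code comment as PIP*mu2/sample_size. The sketch below checks that relation against four gene rows copied from the tables in this report; the data frame name `rows` is illustrative.

```r
# Rows copied from the gene tables in this report.
rows <- data.frame(
  genename  = c("GSAP", "HEY2", "PPM1M", "TMIE"),
  susie_pip = c(1.0000, 0.7704, 1.0000, 0.9989),
  mu2       = c(31328.56, 33571.89, 203.85, 35.23),
  reported  = c(9.321e-02, 7.696e-02, 6.065e-04, 1.047e-04)
)
sample_size <- 336107

# Per-gene PVE as PIP * mu2 / n
rows$pve <- rows$susie_pip * rows$mu2 / sample_size
print(rows[, c("genename", "reported", "pve")])
```

GSAP and HEY2 alone contribute about 0.17 of the 0.1859 gene total, so the sums above are dominated by a few variables with very large mu2. The SNP total of 16.1230 exceeds 1, which cannot be a valid proportion of variance and is far from the 0.114855 estimated from the group parameters.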
genename region_tag susie_pip mu2 PVE z num_eqtl
7067 PPM1M 3_36 1.0000 203.85 6.065e-04 4.323 2
9534 GSAP 7_49 1.0000 31328.56 9.321e-02 5.260 1
9062 TMIE 3_33 0.9989 35.23 1.047e-04 -6.902 2
3144 CCND2 12_4 0.9639 29.10 8.345e-05 -5.120 1
2362 B3GAT1 11_84 0.9273 25.66 7.080e-05 -4.502 2
7314 CASP7 10_71 0.8414 24.92 6.240e-05 4.584 1
1743 TSC2 16_2 0.8369 30.80 7.669e-05 5.278 1
10858 SLC12A8 3_77 0.8108 22.51 5.431e-05 -4.338 1
129 CELSR3 3_34 0.8058 57.17 1.370e-04 -7.731 1
4947 SUOX 12_35 0.7929 57.59 1.359e-04 -5.807 1
6745 TAL1 1_29 0.7916 49.63 1.169e-04 -6.866 1
7155 ZNF12 7_9 0.7764 25.85 5.972e-05 4.972 2
4469 HEY2 6_84 0.7704 33571.89 7.696e-02 4.930 1
12540 RP11-823E8.3 12_54 0.7460 31.13 6.909e-05 -6.438 1
8259 EFEMP2 11_36 0.7404 97.56 2.149e-04 -7.542 2
3049 PRRC2C 1_84 0.7402 28.99 6.384e-05 -5.173 1
2893 SLC1A4 2_42 0.7264 23.46 5.069e-05 -4.047 1
7589 NCKAP5L 12_31 0.7218 48.20 1.035e-04 -8.217 1
10874 VPS52 6_28 0.7097 125.93 2.659e-04 1.606 1
12020 LINC01977 17_45 0.7064 28.57 6.004e-05 5.230 1
genename region_tag susie_pip mu2 PVE z num_eqtl
9746 SLC38A3 3_35 0.000e+00 67724 0.000e+00 6.726 1
7061 CAMKV 3_35 0.000e+00 53039 0.000e+00 9.848 1
7212 CCDC171 9_13 0.000e+00 50688 0.000e+00 7.997 1
7063 MST1R 3_35 0.000e+00 34978 0.000e+00 -12.602 2
4469 HEY2 6_84 7.704e-01 33572 7.696e-02 4.930 1
8838 DHFR2 3_59 0.000e+00 32025 0.000e+00 5.146 1
9534 GSAP 7_49 1.000e+00 31329 9.321e-02 5.260 1
8841 STX19 3_59 0.000e+00 31018 0.000e+00 -5.060 1
7432 LEO1 15_21 2.077e-07 27409 1.694e-08 4.647 1
5003 LYSMD2 15_21 0.000e+00 26190 0.000e+00 -4.403 1
4997 MFAP1 15_16 1.608e-07 23703 1.134e-08 4.303 1
7058 RNF123 3_35 0.000e+00 23172 0.000e+00 -10.957 1
1259 WDR76 15_16 0.000e+00 21159 0.000e+00 4.859 2
9777 DPYD 1_60 0.000e+00 19618 0.000e+00 -3.213 1
849 MCM6 2_80 0.000e+00 17859 0.000e+00 -3.886 1
4751 TUBGCP4 15_16 0.000e+00 16922 0.000e+00 3.371 1
10116 ENTPD6 20_18 0.000e+00 16405 0.000e+00 -5.561 1
8836 NSUN3 3_59 0.000e+00 15636 0.000e+00 4.755 1
7782 ADAL 15_16 0.000e+00 14788 0.000e+00 -2.861 1
7783 LCMT2 15_16 0.000e+00 14376 0.000e+00 -3.087 2
genename region_tag susie_pip mu2 PVE z num_eqtl
9534 GSAP 7_49 1.0000 31328.56 9.321e-02 5.260 1
4469 HEY2 6_84 0.7704 33571.89 7.696e-02 4.930 1
9966 TTC30B 2_107 0.3199 757.36 7.208e-04 -3.137 1
7067 PPM1M 3_36 1.0000 203.85 6.065e-04 4.323 2
10874 VPS52 6_28 0.7097 125.93 2.659e-04 1.606 1
8259 EFEMP2 11_36 0.7404 97.56 2.149e-04 -7.542 2
7134 SFXN1 5_105 0.0708 1012.70 2.133e-04 -3.398 1
129 CELSR3 3_34 0.8058 57.17 1.370e-04 -7.731 1
4947 SUOX 12_35 0.7929 57.59 1.359e-04 -5.807 1
6745 TAL1 1_29 0.7916 49.63 1.169e-04 -6.866 1
9062 TMIE 3_33 0.9989 35.23 1.047e-04 -6.902 2
7589 NCKAP5L 12_31 0.7218 48.20 1.035e-04 -8.217 1
9164 KCNB2 8_53 0.4664 63.18 8.767e-05 -8.057 2
3144 CCND2 12_4 0.9639 29.10 8.345e-05 -5.120 1
1743 TSC2 16_2 0.8369 30.80 7.669e-05 5.278 1
2362 B3GAT1 11_84 0.9273 25.66 7.080e-05 -4.502 2
12540 RP11-823E8.3 12_54 0.7460 31.13 6.909e-05 -6.438 1
8212 NEGR1 1_46 0.5012 45.67 6.810e-05 -8.928 1
3049 PRRC2C 1_84 0.7402 28.99 6.384e-05 -5.173 1
7314 CASP7 10_71 0.8414 24.92 6.240e-05 4.584 1
genename region_tag susie_pip mu2 PVE z num_eqtl
7063 MST1R 3_35 0.000e+00 34978.09 0.000e+00 -12.602 2
7058 RNF123 3_35 0.000e+00 23171.66 0.000e+00 -10.957 1
5866 TAOK2 16_24 2.301e-02 95.97 6.571e-06 10.738 1
11609 RP11-1348G14.4 16_23 1.472e-01 90.57 3.967e-05 10.603 1
9939 SULT1A1 16_23 9.114e-02 89.15 2.417e-05 10.415 1
10040 SULT1A2 16_23 9.114e-02 89.15 2.417e-05 -10.415 1
7566 ZNF668 16_24 1.131e-01 77.72 2.615e-05 10.000 1
7567 ZNF646 16_24 1.131e-01 77.72 2.615e-05 -10.000 1
5192 SAE1 19_33 4.601e-03 101.25 1.386e-06 9.849 1
7061 CAMKV 3_35 0.000e+00 53039.36 0.000e+00 9.848 1
8211 C1QTNF4 11_29 3.023e-02 96.82 8.707e-06 9.834 2
439 PRSS8 16_24 1.755e-02 72.64 3.792e-06 -9.765 1
7337 RAPSN 11_29 1.110e-02 87.11 2.877e-06 9.614 1
10701 LAT 16_23 1.333e-01 85.22 3.380e-05 -9.553 1
2358 MTCH2 11_29 9.832e-03 84.51 2.472e-06 -9.514 1
11572 CTC-467M3.3 5_52 1.492e-10 355.30 1.577e-13 9.482 1
8212 NEGR1 1_46 5.012e-01 45.67 6.810e-05 -8.928 1
7336 PSMC3 11_29 1.089e-02 74.52 2.413e-06 -8.866 1
1699 MAPK3 16_24 1.422e-02 68.99 2.918e-06 8.826 1
12567 RCC1L 7_48 1.287e-01 83.58 3.199e-05 -8.667 1
#proportion of genes with |z| above the TWAS significance threshold (219 of 10285)
[1] 0.02129
genename region_tag susie_pip mu2 PVE z num_eqtl
7063 MST1R 3_35 0.000e+00 34978.09 0.000e+00 -12.602 2
7058 RNF123 3_35 0.000e+00 23171.66 0.000e+00 -10.957 1
5866 TAOK2 16_24 2.301e-02 95.97 6.571e-06 10.738 1
11609 RP11-1348G14.4 16_23 1.472e-01 90.57 3.967e-05 10.603 1
9939 SULT1A1 16_23 9.114e-02 89.15 2.417e-05 10.415 1
10040 SULT1A2 16_23 9.114e-02 89.15 2.417e-05 -10.415 1
7566 ZNF668 16_24 1.131e-01 77.72 2.615e-05 10.000 1
7567 ZNF646 16_24 1.131e-01 77.72 2.615e-05 -10.000 1
5192 SAE1 19_33 4.601e-03 101.25 1.386e-06 9.849 1
7061 CAMKV 3_35 0.000e+00 53039.36 0.000e+00 9.848 1
8211 C1QTNF4 11_29 3.023e-02 96.82 8.707e-06 9.834 2
439 PRSS8 16_24 1.755e-02 72.64 3.792e-06 -9.765 1
7337 RAPSN 11_29 1.110e-02 87.11 2.877e-06 9.614 1
10701 LAT 16_23 1.333e-01 85.22 3.380e-05 -9.553 1
2358 MTCH2 11_29 9.832e-03 84.51 2.472e-06 -9.514 1
11572 CTC-467M3.3 5_52 1.492e-10 355.30 1.577e-13 9.482 1
8212 NEGR1 1_46 5.012e-01 45.67 6.810e-05 -8.928 1
7336 PSMC3 11_29 1.089e-02 74.52 2.413e-06 -8.866 1
1699 MAPK3 16_24 1.422e-02 68.99 2.918e-06 8.826 1
12567 RCC1L 7_48 1.287e-01 83.58 3.199e-05 -8.667 1
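The gene tables above share the same columns and appear to differ only in sort order (by susie_pip, mu2, PVE, and |z|). A minimal sketch of that ordering, using four rows copied from the report in a data frame named `res_toy` for illustration:

```r
# Four rows copied from the tables above.
res_toy <- data.frame(
  genename  = c("PPM1M", "GSAP", "HEY2", "MST1R"),
  susie_pip = c(1.0000, 1.0000, 0.7704, 0.0000),
  mu2       = c(203.85, 31328.56, 33571.89, 34978.09),
  PVE       = c(6.065e-04, 9.321e-02, 7.696e-02, 0),
  z         = c(4.323, 5.260, 4.930, -12.602)
)

# order() on the negated column sorts in decreasing order.
by_pip <- res_toy[order(-res_toy$susie_pip), ]
by_mu2 <- res_toy[order(-res_toy$mu2), ]
by_pve <- res_toy[order(-res_toy$PVE), ]
by_z   <- res_toy[order(-abs(res_toy$z)), ]   # largest |z| first, sign ignored

print(by_z$genename)
```

MST1R illustrates why the orderings disagree: it has the largest |z| and mu2 of these rows but a PIP of 0, so it ranks last by PVE.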
#number of genes for gene set enrichment
length(genes)
[1] 38
Uploading data to Enrichr... Done.
Querying GO_Biological_Process_2021... Done.
Querying GO_Cellular_Component_2021... Done.
Querying GO_Molecular_Function_2021... Done.
Parsing results... Done.
[1] "GO_Biological_Process_2021"
Row | Term | Overlap | Adjusted.P.value | Genes |
---|---|---|---|---|
1 | vascular associated smooth muscle cell development (GO:0097084) | 2/7 | 0.01559 | EFEMP2;HEY2 |
2 | vascular associated smooth muscle cell differentiation (GO:0035886) | 2/8 | 0.01559 | EFEMP2;HEY2 |
3 | aorta morphogenesis (GO:0035909) | 2/17 | 0.03744 | EFEMP2;HEY2 |
4 | muscle tissue morphogenesis (GO:0060415) | 2/17 | 0.03744 | EFEMP2;HEY2 |
[1] "GO_Cellular_Component_2021"
[1] Term Overlap Adjusted.P.value Genes
<0 rows> (or 0-length row.names)
[1] "GO_Molecular_Function_2021"
[1] Term Overlap Adjusted.P.value Genes
<0 rows> (or 0-length row.names)
Row | Description | FDR | Ratio | BgRatio |
---|---|---|---|---|
43 | Acute Myeloid Leukemia, M1 | 0.008685 | 4/19 | 125/9703 |
160 | Acute Myeloid Leukemia (AML-M2) | 0.008685 | 4/19 | 125/9703 |
37 | Leukemia, Myelocytic, Acute | 0.020284 | 4/19 | 173/9703 |
75 | Sulfite oxidase deficiency | 0.029827 | 1/19 | 1/9703 |
148 | DEAFNESS, AUTOSOMAL RECESSIVE 6 | 0.029827 | 1/19 | 1/9703 |
165 | RETINITIS PIGMENTOSA 42 | 0.029827 | 1/19 | 1/9703 |
167 | Sulfocysteinuria | 0.029827 | 1/19 | 1/9703 |
171 | CUTIS LAXA, AUTOSOMAL RECESSIVE, TYPE IB | 0.029827 | 1/19 | 1/9703 |
176 | NEMALINE MYOPATHY 8 | 0.029827 | 1/19 | 1/9703 |
180 | MEGALENCEPHALY-POLYMICROGYRIA-POLYDACTYLY-HYDROCEPHALUS SYNDROME 3 | 0.029827 | 1/19 | 1/9703 |
Loading the functional categories...
Loading the ID list...
Loading the reference list...
Performing the enrichment analysis...
Warning in oraEnrichment(interestGeneList, referenceGeneList, geneSet, minNum =
minNum, : No significant gene set is identified based on FDR 0.05!
NULL
Warning: ggrepel: 1 unlabeled data points (too many overlaps). Consider
increasing max.overlaps
#number of genes in known annotations
print(length(known_annotations))
[1] 41
#number of genes in known annotations with imputed expression
print(sum(known_annotations %in% ctwas_gene_res$genename))
[1] 22
#significance threshold for TWAS
print(sig_thresh)
[1] 4.571
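The report does not show how `sig_thresh` was computed. One reading consistent with the printed value is a two-sided Bonferroni threshold at alpha = 0.05 over the 10285 imputed genes; the sketch below works under that assumption, and the names `alpha`, `n_genes`, and `sig_thresh_sketch` are illustrative.

```r
# Assumption: two-sided Bonferroni correction at alpha = 0.05 over all genes.
n_genes <- 10285
alpha   <- 0.05

# z-score whose two-sided p-value equals alpha / n_genes
sig_thresh_sketch <- qnorm(1 - (alpha / n_genes) / 2)
print(round(sig_thresh_sketch, 3))

# Share of genes passing the threshold, from the counts reported in this page
# (219 TWAS genes out of 10285 imputed genes).
prop_sig <- 219 / n_genes
print(signif(prop_sig, 4))
```

Dividing by 2 inside `qnorm()` is what makes the test two-sided, so a gene is called significant when |z| exceeds the threshold regardless of sign.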
#number of ctwas genes
length(ctwas_genes)
[1] 9
#number of TWAS genes
length(twas_genes)
[1] 219
#show novel genes (ctwas genes not in TWAS genes)
ctwas_gene_res[ctwas_gene_res$genename %in% novel_genes,report_cols]
genename region_tag susie_pip mu2 PVE z num_eqtl
7067 PPM1M 3_36 1.0000 203.85 6.065e-04 4.323 2
10858 SLC12A8 3_77 0.8108 22.51 5.431e-05 -4.338 1
2362 B3GAT1 11_84 0.9273 25.66 7.080e-05 -4.502 2
#sensitivity / recall
print(sensitivity)
ctwas TWAS
0.00000 0.09756
#specificity
print(specificity)
ctwas TWAS
0.9991 0.9791
#precision / PPV
print(precision)
ctwas TWAS
0.00000 0.01826
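The three metrics above can be re-derived from counts, as sketched below. Two inputs are assumptions rather than values printed in this report: the true-positive counts (0 for ctwas, 4 for TWAS) are inferred from the reported sensitivity and precision, and the number of unrelated genes is taken as the imputed genes minus the known genes with imputed expression, which presumes unique gene names.

```r
# Counts printed in this report.
n_known         <- 41     # genes in known annotations
n_known_imputed <- 22     # of those, genes with imputed expression
n_imputed       <- 10285  # all imputed genes
detected        <- c(ctwas = 9, TWAS = 219)

# Assumptions (inferred, not printed): true positives and the negative set.
true_pos    <- c(ctwas = 0, TWAS = 4)
n_unrelated <- n_imputed - n_known_imputed

false_pos   <- detected - true_pos
sensitivity <- true_pos / n_known                        # recall over all known genes
specificity <- (n_unrelated - false_pos) / n_unrelated   # true negative rate
precision   <- true_pos / detected                       # positive predictive value

print(round(rbind(sensitivity, specificity, precision), 5))
```

Under these assumptions sensitivity is measured against all 41 known genes, although only 22 have imputed expression and could be detected at all, so the reported recall is capped at 22/41.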
sessionInfo()
R version 3.6.1 (2019-07-05)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Scientific Linux 7.4 (Nitrogen)
Matrix products: default
BLAS/LAPACK: /software/openblas-0.2.19-el7-x86_64/lib/libopenblas_haswellp-r0.2.19.so
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] readxl_1.3.1 forcats_0.5.1 stringr_1.4.0 dplyr_1.0.7
[5] purrr_0.3.4 readr_2.1.1 tidyr_1.1.4 tidyverse_1.3.1
[9] tibble_3.1.6 WebGestaltR_0.4.4 disgenet2r_0.99.2 enrichR_3.0
[13] cowplot_1.0.0 ggplot2_3.3.5 workflowr_1.6.2
loaded via a namespace (and not attached):
[1] fs_1.5.2 lubridate_1.8.0 bit64_4.0.5 doParallel_1.0.16
[5] httr_1.4.2 rprojroot_2.0.2 tools_3.6.1 backports_1.4.1
[9] doRNG_1.8.2 utf8_1.2.2 R6_2.5.1 vipor_0.4.5
[13] DBI_1.1.1 colorspace_2.0-2 withr_2.4.3 ggrastr_1.0.1
[17] tidyselect_1.1.1 bit_4.0.4 curl_4.3.2 compiler_3.6.1
[21] git2r_0.26.1 cli_3.1.0 rvest_1.0.2 Cairo_1.5-12.2
[25] xml2_1.3.3 labeling_0.4.2 scales_1.1.1 apcluster_1.4.8
[29] digest_0.6.29 rmarkdown_2.11 svglite_1.2.2 pkgconfig_2.0.3
[33] htmltools_0.5.2 dbplyr_2.1.1 fastmap_1.1.0 highr_0.9
[37] rlang_0.4.12 rstudioapi_0.13 RSQLite_2.2.8 jquerylib_0.1.4
[41] farver_2.1.0 generics_0.1.1 jsonlite_1.7.2 vroom_1.5.7
[45] magrittr_2.0.1 Matrix_1.2-18 ggbeeswarm_0.6.0 Rcpp_1.0.7
[49] munsell_0.5.0 fansi_0.5.0 gdtools_0.1.9 lifecycle_1.0.1
[53] stringi_1.7.6 whisker_0.3-2 yaml_2.2.1 plyr_1.8.6
[57] grid_3.6.1 blob_1.2.2 ggrepel_0.9.1 parallel_3.6.1
[61] promises_1.0.1 crayon_1.4.2 lattice_0.20-38 haven_2.4.3
[65] hms_1.1.1 knitr_1.36 pillar_1.6.4 igraph_1.2.10
[69] rjson_0.2.20 rngtools_1.5.2 reshape2_1.4.4 codetools_0.2-16
[73] reprex_2.0.1 glue_1.5.1 evaluate_0.14 data.table_1.14.2
[77] modelr_0.1.8 vctrs_0.3.8 tzdb_0.2.0 httpuv_1.5.1
[81] foreach_1.5.1 cellranger_1.1.0 gtable_0.3.0 assertthat_0.2.1
[85] cachem_1.0.6 xfun_0.29 broom_0.7.10 later_0.8.0
[89] iterators_1.0.13 beeswarm_0.2.3 memoise_2.0.1 ellipsis_0.3.2