Last updated: 2022-02-21
Checks: 6 passed, 1 warning
Knit directory: cTWAS_analysis/
This reproducible R Markdown analysis was created with workflowr (version 1.6.2). The Checks tab describes the reproducibility checks that were applied when the results were created. The Past versions tab lists the development history.
Great! Since the R Markdown file has been committed to the Git repository, you know the exact version of the code that produced these results.
Great job! The global environment was empty. Objects defined in the global environment can affect the analysis in your R Markdown file in unknown ways. For reproducibility it’s best to always run the code in an empty environment.
The command set.seed(20211220) was run prior to running the code in the R Markdown file. Setting a seed ensures that any results that rely on randomness, e.g. subsampling or permutations, are reproducible.
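As a small illustration of why the seed matters, the sketch below resets the same seed before two random draws and checks that they match. The variable names and the choice of `sample()` are illustrative assumptions; the analysis itself may use randomness differently.

```r
# Two draws made right after setting the same seed are identical.
set.seed(20211220)
first_draw <- sample(1:100, 5)

set.seed(20211220)
second_draw <- sample(1:100, 5)

# TRUE: the random number generator started from the same state both times.
draws_match <- identical(first_draw, second_draw)
print(draws_match)
```

If the seed is not reset, later draws continue from wherever the generator left off, so the order in which chunks run also affects results.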
Great job! Recording the operating system, R version, and package versions is critical for reproducibility.
Nice! There were no cached chunks for this analysis, so you can be confident that you successfully produced the results during this run.
Using absolute paths to the files within your workflowr project makes it difficult for you and others to run your code on a different machine. Change the absolute path(s) below to the suggested relative path(s) to make your code more reproducible.
absolute | relative |
---|---|
/project2/xinhe/shengqian/cTWAS/cTWAS_analysis/data/ | data |
/project2/xinhe/shengqian/cTWAS/cTWAS_analysis/code/ctwas_config.R | code/ctwas_config.R |
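A minimal sketch of the suggested change, assuming the code is run from the project root (the knit directory `cTWAS_analysis/` listed above). The file name `example_input.txt` is hypothetical and only illustrates how path components are joined.

```r
# Build paths relative to the project root instead of hard-coding
# machine-specific /project2/... locations.
data_dir <- "data"
config_file <- file.path("code", "ctwas_config.R")

# Hypothetical file name, used only to show how components are combined.
example_input <- file.path(data_dir, "example_input.txt")

# Guard the source() call so this snippet also runs outside the project.
if (file.exists(config_file)) {
  source(config_file)
}
```

`file.path()` joins components with `/` on every platform, so the same code works on a different machine as long as the project layout is unchanged.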
Great! You are using Git for version control. Tracking code development and connecting the code version to the results is critical for reproducibility.
The results in this page were generated with repository version bbf6737. See the Past versions tab to see a history of the changes made to the R Markdown and HTML files.
Note that you need to be careful to ensure that all relevant files for the analysis have been committed to Git prior to generating the results (you can use wflow_publish or wflow_git_commit). workflowr only checks the R Markdown file, but you know if there are other scripts or data files that it depends on. Below is the status of the Git repository when the results were generated:
Ignored files:
Ignored: .ipynb_checkpoints/
Untracked files:
Untracked: Rplot.png
Untracked: analysis/Glucose_Adipose_Subcutaneous.Rmd
Untracked: analysis/Glucose_Adipose_Visceral_Omentum.Rmd
Untracked: analysis/Splicing_Test.Rmd
Untracked: code/.ipynb_checkpoints/
Untracked: code/AF_out/
Untracked: code/BMI_S_out/
Untracked: code/BMI_out/
Untracked: code/Glucose_out/
Untracked: code/LDL_S_out/
Untracked: code/T2D_out/
Untracked: code/ctwas_config.R
Untracked: code/mapping.R
Untracked: code/out/
Untracked: code/run_AF_analysis.sbatch
Untracked: code/run_AF_analysis.sh
Untracked: code/run_AF_ctwas_rss_LDR.R
Untracked: code/run_BMI_analysis.sbatch
Untracked: code/run_BMI_analysis.sh
Untracked: code/run_BMI_analysis_S.sbatch
Untracked: code/run_BMI_analysis_S.sh
Untracked: code/run_BMI_ctwas_rss_LDR.R
Untracked: code/run_BMI_ctwas_rss_LDR_S.R
Untracked: code/run_Glucose_analysis.sbatch
Untracked: code/run_Glucose_analysis.sh
Untracked: code/run_Glucose_ctwas_rss_LDR.R
Untracked: code/run_LDL_analysis_S.sbatch
Untracked: code/run_LDL_analysis_S.sh
Untracked: code/run_LDL_ctwas_rss_LDR_S.R
Untracked: code/run_T2D_analysis.sbatch
Untracked: code/run_T2D_analysis.sh
Untracked: code/run_T2D_ctwas_rss_LDR.R
Untracked: data/.ipynb_checkpoints/
Untracked: data/AF/
Untracked: data/BMI/
Untracked: data/BMI_S/
Untracked: data/Glucose/
Untracked: data/LDL_S/
Untracked: data/T2D/
Untracked: data/TEST/
Untracked: data/UKBB/
Untracked: data/UKBB_SNPs_Info.text
Untracked: data/gene_OMIM.txt
Untracked: data/gene_pip_0.8.txt
Untracked: data/mashr_Heart_Atrial_Appendage.db
Untracked: data/mashr_sqtl/
Untracked: data/summary_known_genes_annotations.xlsx
Untracked: data/untitled.txt
Note that any generated files, e.g. HTML, png, CSS, etc., are not included in this status report because it is ok for generated content to have uncommitted changes.
These are the previous versions of the repository in which changes were made to the R Markdown (analysis/BMI_Brain_Spinal_cord_cervical_c-1.Rmd) and HTML (docs/BMI_Brain_Spinal_cord_cervical_c-1.html) files. If you’ve configured a remote Git repository (see ?wflow_git_remote), click on the hyperlinks in the table below to view the files as they were in that past version.
File | Version | Author | Date | Message |
---|---|---|---|---|
Rmd | bbf6737 | sq-96 | 2022-02-21 | update |
html | 91f38fa | sq-96 | 2022-02-13 | Build site. |
Rmd | eb13ecf | sq-96 | 2022-02-13 | update |
html | e6bc169 | sq-96 | 2022-02-13 | Build site. |
Rmd | 87fee8b | sq-96 | 2022-02-13 | update |
#number of imputed weights
nrow(qclist_all)
[1] 10532
#number of imputed weights by chromosome
table(qclist_all$chr)
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
1032 741 611 415 500 593 513 394 399 413 617 585 228 354 366 467
17 18 19 20 21 22
630 170 801 314 124 265
#number of imputed weights without missing variants
sum(qclist_all$nmiss==0)
[1] 8584
#proportion of imputed weights without missing variants
mean(qclist_all$nmiss==0)
[1] 0.815
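The counts above can be cross-checked against each other. The sketch below copies the printed per-chromosome counts and confirms that they sum to the total number of weights and that 8584 of them correspond to the reported proportion. The toy data frame at the end uses made-up values and only shows why `mean()` of a logical vector returns a proportion.

```r
# Per-chromosome counts copied from the table(qclist_all$chr) output.
weights_by_chr <- c(
  1032, 741, 611, 415, 500, 593, 513, 394, 399, 413, 617,
  585, 228, 354, 366, 467, 630, 170, 801, 314, 124, 265
)
names(weights_by_chr) <- 1:22

n_weights <- sum(weights_by_chr)   # total number of imputed weights
n_complete <- 8584                 # weights with nmiss == 0, from the output
prop_complete <- n_complete / n_weights

# Toy example (hypothetical values): TRUE counts as 1 and FALSE as 0,
# so the mean of a logical vector is the share of TRUE entries.
toy_qclist <- data.frame(chr = c(1, 1, 2, 3), nmiss = c(0, 2, 0, 0))
toy_prop <- mean(toy_qclist$nmiss == 0)

print(c(total = n_weights, proportion = round(prop_complete, 3)))
```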
Version | Author | Date |
---|---|---|
e6bc169 | sq-96 | 2022-02-13 |
#estimated group prior
estimated_group_prior <- group_prior_rec[,ncol(group_prior_rec)]
names(estimated_group_prior) <- c("gene", "snp")
estimated_group_prior["snp"] <- estimated_group_prior["snp"]*thin #adjust parameter to account for thin argument
print(estimated_group_prior)
gene snp
0.0053427 0.0002973
#estimated group prior variance
estimated_group_prior_var <- group_prior_var_rec[,ncol(group_prior_var_rec)]
names(estimated_group_prior_var) <- c("gene", "snp")
print(estimated_group_prior_var)
gene snp
30.33 17.10
#report sample size
print(sample_size)
[1] 336107
#report group size
group_size <- c(nrow(ctwas_gene_res), n_snps)
print(group_size)
[1] 10532 7535010
#estimated group PVE
estimated_group_pve <- estimated_group_prior_var*estimated_group_prior*group_size/sample_size #check PVE calculation
names(estimated_group_pve) <- c("gene", "snp")
print(estimated_group_pve)
gene snp
0.005079 0.113944
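The group PVE formula above can be rechecked by hand from the printed values. This sketch plugs in the rounded numbers shown in the output, so small differences from the reported PVE are expected and come only from rounding of the displayed inputs.

```r
# Values copied from the printed output above (rounded as displayed).
prior <- c(gene = 0.0053427, snp = 0.0002973)   # estimated group prior
prior_var <- c(gene = 30.33, snp = 17.10)       # estimated group prior variance
group_size <- c(gene = 10532, snp = 7535010)    # number of genes and SNPs
sample_size <- 336107

# Same expression as in the report:
# prior variance * prior inclusion probability * group size / sample size.
pve <- prior_var * prior * group_size / sample_size

reported <- c(gene = 0.005079, snp = 0.113944)
print(rbind(recomputed = pve, reported = reported))
```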
#compare sum(PIP*mu2/sample_size) with above PVE calculation
c(sum(ctwas_gene_res$PVE),sum(ctwas_snp_res$PVE))
[1] 0.4381 15.3065
genename region_tag susie_pip mu2 PVE z num_eqtl
9777 KLHDC8B 3_34 1.0000 2542.58 7.565e-03 -5.052 2
789 SDHA 5_1 1.0000 21042.52 6.261e-02 3.012 1
11293 AC078842.3 7_84 1.0000 19171.30 5.704e-02 -3.208 1
4274 IGHMBP2 11_38 1.0000 31336.96 9.324e-02 -4.379 1
5140 MFAP1 15_16 1.0000 29203.28 8.689e-02 4.303 1
695 MAPK6 15_21 1.0000 28649.03 8.524e-02 -4.646 1
7232 PPM1M 3_36 1.0000 352.45 1.049e-03 4.732 2
1472 ASCC2 22_10 0.7776 8574.76 1.984e-02 -2.816 2
8471 EFEMP2 11_36 0.7758 50.61 1.168e-04 -7.484 2
12868 PANO1 11_1 0.7576 27.28 6.149e-05 4.979 2
3443 ZMIZ2 7_33 0.7505 67.76 1.513e-04 -8.105 1
7328 ZNF12 7_9 0.7474 28.10 6.249e-05 5.065 2
1464 RASD2 22_14 0.7403 24.87 5.479e-05 -4.362 2
12828 RP11-340F14.6 12_74 0.7297 30.02 6.517e-05 -4.742 2
11162 VPS52 6_28 0.7203 127.03 2.722e-04 1.606 1
2760 PDCD10 3_103 0.7017 23.84 4.978e-05 -4.065 1
2845 ITGB6 2_96 0.6533 59.14 1.150e-04 5.515 1
3959 KLK14 19_35 0.6373 28.25 5.356e-05 -4.062 1
1588 NINL 20_19 0.6093 34.74 6.298e-05 -5.532 2
1304 CBX5 12_33 0.6017 25.84 4.626e-05 4.691 1
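The per-gene PVE column appears to follow the expression named in the comparison comment earlier, PIP*mu2/sample_size. The sketch below checks this on four rows copied from the table above. That every row was computed this way is an assumption based on that comment, not something the output states directly.

```r
sample_size <- 336107

# Rows copied from the table above.
genes <- data.frame(
  genename  = c("IGHMBP2", "ASCC2", "EFEMP2", "KLHDC8B"),
  susie_pip = c(1.0000, 0.7776, 0.7758, 1.0000),
  mu2       = c(31336.96, 8574.76, 50.61, 2542.58),
  PVE       = c(9.324e-02, 1.984e-02, 1.168e-04, 7.565e-03)
)

# Per-gene PVE as PIP * mu2 / sample size.
genes$PVE_recomputed <- genes$susie_pip * genes$mu2 / sample_size

# Relative difference from the printed PVE (rounding only).
genes$rel_diff <- abs(genes$PVE_recomputed - genes$PVE) / genes$PVE
print(genes)
```

Because each gene's PVE scales with mu2, a handful of genes with PIP of 1 and mu2 in the tens of thousands dominate the summed gene PVE of 0.4381 printed earlier. That sum is far above the group-level estimate of 0.005079, and the SNP sum of 15.3065 is greater than 1, which is consistent with the "check PVE calculation" note in the code.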
genename region_tag susie_pip mu2 PVE z num_eqtl
10032 SLC38A3 3_35 0 66889 0.00000 6.726 1
7397 CCDC171 9_13 0 42633 0.00000 8.471 2
38 RBM6 3_35 0 40476 0.00000 12.536 1
7227 MST1R 3_35 0 34543 0.00000 -12.635 2
8111 CALML6 1_1 0 32851 0.00000 -5.718 1
4274 IGHMBP2 11_38 1 31337 0.09324 -4.379 1
9078 STX19 3_59 0 30619 0.00000 -5.060 1
5140 MFAP1 15_16 1 29203 0.08689 4.303 1
695 MAPK6 15_21 1 28649 0.08524 -4.646 1
1280 WDR76 15_16 0 25920 0.00000 4.454 1
2418 CPT1A 11_38 0 24846 0.00000 -4.677 1
4970 TMOD3 15_21 0 23045 0.00000 5.412 1
7223 RNF123 3_35 0 22890 0.00000 -10.959 1
789 SDHA 5_1 1 21043 0.06261 3.012 1
4873 TUBGCP4 15_16 0 20509 0.00000 3.366 1
11293 AC078842.3 7_84 1 19171 0.05704 -3.208 1
7966 ADAL 15_16 0 17919 0.00000 -2.861 1
7967 LCMT2 15_16 0 17919 0.00000 -2.861 1
9868 HYAL3 3_35 0 17850 0.00000 6.264 2
859 MCM6 2_80 0 17636 0.00000 -3.886 1
genename region_tag susie_pip mu2 PVE z num_eqtl
4274 IGHMBP2 11_38 1.00000 31336.96 9.324e-02 -4.379 1
5140 MFAP1 15_16 1.00000 29203.28 8.689e-02 4.303 1
695 MAPK6 15_21 1.00000 28649.03 8.524e-02 -4.646 1
789 SDHA 5_1 1.00000 21042.52 6.261e-02 3.012 1
11293 AC078842.3 7_84 1.00000 19171.30 5.704e-02 -3.208 1
1472 ASCC2 22_10 0.77762 8574.76 1.984e-02 -2.816 2
9777 KLHDC8B 3_34 1.00000 2542.58 7.565e-03 -5.052 2
262 CPS1 2_124 0.49202 4722.15 6.913e-03 3.535 1
2872 LANCL1 2_124 0.49202 4722.15 6.913e-03 -3.535 1
7232 PPM1M 3_36 1.00000 352.45 1.049e-03 4.732 2
11162 VPS52 6_28 0.72030 127.03 2.722e-04 1.606 1
8727 ASPHD1 16_24 0.59151 120.67 2.124e-04 -11.849 1
10189 ATP2A1 16_23 0.50558 100.96 1.519e-04 -10.759 1
3443 ZMIZ2 7_33 0.75046 67.76 1.513e-04 -8.105 1
6433 GPR61 1_67 0.56829 81.16 1.372e-04 8.755 1
10756 LY6G5C 6_26 0.43405 106.00 1.369e-04 8.418 1
12619 CTD-2186M15.3 5_22 0.02121 2014.86 1.271e-04 2.934 2
8471 EFEMP2 11_36 0.77583 50.61 1.168e-04 -7.484 2
2845 ITGB6 2_96 0.65332 59.14 1.150e-04 5.515 1
13013 DHRS11 17_22 0.51954 63.46 9.809e-05 -8.128 1
genename region_tag susie_pip mu2 PVE z num_eqtl
7227 MST1R 3_35 0.000e+00 34543.29 0.000e+00 -12.635 2
38 RBM6 3_35 0.000e+00 40475.73 0.000e+00 12.536 1
8727 ASPHD1 16_24 5.915e-01 120.67 2.124e-04 -11.849 1
1048 EFR3B 2_15 1.115e-08 203.25 6.742e-12 11.587 1
8728 KCTD13 16_24 6.773e-02 115.84 2.334e-05 -11.491 1
8068 INO80E 16_24 1.260e-02 103.18 3.868e-06 11.077 1
7223 RNF123 3_35 0.000e+00 22889.67 0.000e+00 -10.959 1
1721 MAPK3 16_24 1.146e-02 103.03 3.512e-06 10.880 1
10189 ATP2A1 16_23 5.056e-01 100.96 1.519e-04 -10.759 1
11438 NPIPB7 16_23 7.273e-02 100.96 2.185e-05 10.510 1
10225 SULT1A1 16_23 2.947e-02 99.07 8.685e-06 10.367 1
10271 C6orf106 6_28 5.814e-05 124.60 2.155e-08 -10.264 1
10322 SULT1A2 16_23 1.683e-02 95.72 4.794e-06 -10.171 2
7747 ZNF668 16_24 1.090e-01 80.22 2.602e-05 10.000 1
5341 SAE1 19_33 1.095e-03 100.69 3.281e-07 9.849 1
8426 C1QTNF4 11_29 6.346e-03 90.12 1.701e-06 9.564 1
11752 LINC00461 5_52 8.339e-11 357.45 8.868e-14 9.418 1
10335 IL27 16_23 1.421e-02 81.12 3.430e-06 -9.140 1
8427 NEGR1 1_46 9.461e-02 76.50 2.153e-05 -8.928 1
7515 PSMC3 11_29 6.783e-03 78.61 1.586e-06 -8.866 1
[1] 0.02136
#number of genes for gene set enrichment
length(genes)
[1] 32
Uploading data to Enrichr... Done.
Querying GO_Biological_Process_2021... Done.
Querying GO_Cellular_Component_2021... Done.
Querying GO_Molecular_Function_2021... Done.
Parsing results... Done.
[1] "GO_Biological_Process_2021"
[1] Term Overlap Adjusted.P.value Genes
<0 rows> (or 0-length row.names)
[1] "GO_Cellular_Component_2021"
Term Overlap Adjusted.P.value Genes
1 microfibril (GO:0001527) 2/11 0.007029 EFEMP2;MFAP1
2 supramolecular fiber (GO:0099512) 2/19 0.010840 EFEMP2;MFAP1
[1] "GO_Molecular_Function_2021"
[1] Term Overlap Adjusted.P.value Genes
<0 rows> (or 0-length row.names)
Description FDR Ratio BgRatio
45 Nodular Sclerosis Classical Hodgkin Lymphoma 0.01404 1/15 1/9703
77 Brody myopathy 0.01404 1/15 1/9703
84 SPINAL MUSCULAR ATROPHY WITH RESPIRATORY DISTRESS 1 0.01404 1/15 1/9703
85 Cerebral Cavernous Malformations 3 0.01404 1/15 1/9703
92 Familial cerebral cavernous malformation 0.01404 1/15 1/9703
95 CARDIOMYOPATHY, DILATED, 1GG 0.01404 1/15 1/9703
97 PARAGANGLIOMAS 5 0.01404 1/15 1/9703
98 CUTIS LAXA, AUTOSOMAL RECESSIVE, TYPE IB 0.01404 1/15 1/9703
100 CHARCOT-MARIE-TOOTH DISEASE, DOMINANT INTERMEDIATE F 0.01404 1/15 1/9703
103 CHARCOT-MARIE-TOOTH DISEASE, AXONAL, TYPE 2S 0.01404 1/15 1/9703
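To give a sense of scale for the Ratio and BgRatio columns, the sketch below computes an unadjusted one-sided hypergeometric p-value for a single row with Ratio 1/15 and BgRatio 1/9703. Treating the enrichment as a hypergeometric test is an assumption made for illustration. The FDR column in the table is a multiple-testing-adjusted value and is not reproduced here.

```r
n_background <- 9703    # denominator of BgRatio: background genes
n_bg_annotated <- 1     # numerator of BgRatio: background genes with the term
n_query <- 15           # denominator of Ratio: query genes tested
n_overlap <- 1          # numerator of Ratio: query genes with the term

# P(overlap >= n_overlap) when drawing n_query genes from the background.
p_raw <- phyper(
  n_overlap - 1,
  n_bg_annotated,
  n_background - n_bg_annotated,
  n_query,
  lower.tail = FALSE
)
print(p_raw)
```

With a single annotated gene in the background, this probability reduces to 15/9703, so every term supported by one gene gets the same raw p-value. That is in line with the identical FDR values in the table.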
Loading the functional categories...
Loading the ID list...
Loading the reference list...
Performing the enrichment analysis...
Warning in oraEnrichment(interestGeneList, referenceGeneList, geneSet, minNum =
minNum, : No significant gene set is identified based on FDR 0.05!
NULL
#number of genes in known annotations
print(length(known_annotations))
[1] 41
#number of genes in known annotations with imputed expression
print(sum(known_annotations %in% ctwas_gene_res$genename))
[1] 22
#significance threshold for TWAS
print(sig_thresh)
[1] 4.576
#number of ctwas genes
length(ctwas_genes)
[1] 7
#number of TWAS genes
length(twas_genes)
[1] 225
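The printed threshold of 4.576 is consistent with a two-sided Bonferroni correction at 0.05 across the 10532 imputed genes. That `sig_thresh` was computed this way is an assumption, since the defining code is not shown. The sketch also recomputes the share of genes that pass the threshold, which matches the 0.02136 printed earlier in the report. The toy z scores are hypothetical.

```r
n_genes <- 10532   # number of genes with imputed expression
alpha <- 0.05

# Two-sided Bonferroni threshold on the z-score scale (assumed definition).
bonferroni_z <- qnorm(1 - alpha / n_genes / 2)

# Share of genes called by TWAS, using the count printed above.
n_twas <- 225
prop_sig <- n_twas / n_genes

# Toy z scores (hypothetical) showing how the cutoff would be applied.
toy_z <- c(-12.6, 3.0, 5.1, -4.4)
toy_is_twas <- abs(toy_z) > bonferroni_z

print(round(c(threshold = bonferroni_z, proportion = prop_sig), 5))
```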
#show novel genes (ctwas genes not in TWAS genes)
ctwas_gene_res[ctwas_gene_res$genename %in% novel_genes,report_cols]
genename region_tag susie_pip mu2 PVE z num_eqtl
789 SDHA 5_1 1 21043 0.06261 3.012 1
11293 AC078842.3 7_84 1 19171 0.05704 -3.208 1
4274 IGHMBP2 11_38 1 31337 0.09324 -4.379 1
5140 MFAP1 15_16 1 29203 0.08689 4.303 1
#sensitivity / recall
print(sensitivity)
ctwas TWAS
0.00000 0.07317
#specificity
print(specificity)
ctwas TWAS
0.9993 0.9789
#precision / PPV
print(precision)
ctwas TWAS
0.00000 0.01333
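The three metrics above can be reproduced from the counts printed in this section. The sketch below is a reconstruction under stated assumptions: the number of known genes recovered by each method (0 for ctwas, 3 for TWAS) and the choice of denominators are inferred from the printed values, not taken from the original code.

```r
n_genes <- 10532           # genes with imputed expression
n_known <- 41              # genes in known annotations
n_known_imputed <- 22      # known genes with imputed expression

detected <- c(ctwas = 7, TWAS = 225)   # genes called by each method
true_pos <- c(ctwas = 0, TWAS = 3)     # inferred, not printed in the report

# Sensitivity over all known genes (assumed denominator: 41).
sensitivity <- true_pos / n_known

# Precision over the genes each method called.
precision <- true_pos / detected

# Specificity over imputed genes outside the known set (assumed: 10532 - 22).
n_negative <- n_genes - n_known_imputed
false_pos <- detected - true_pos
specificity <- (n_negative - false_pos) / n_negative

print(round(rbind(sensitivity, specificity, precision), 5))
```

Under these assumptions, sensitivity is capped at 22/41 because only 22 of the 41 known genes have imputed expression in this tissue.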
sessionInfo()
R version 3.6.1 (2019-07-05)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Scientific Linux 7.4 (Nitrogen)
Matrix products: default
BLAS/LAPACK: /software/openblas-0.2.19-el7-x86_64/lib/libopenblas_haswellp-r0.2.19.so
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] readxl_1.3.1 forcats_0.5.1 stringr_1.4.0 dplyr_1.0.7
[5] purrr_0.3.4 readr_2.1.1 tidyr_1.1.4 tidyverse_1.3.1
[9] tibble_3.1.6 WebGestaltR_0.4.4 disgenet2r_0.99.2 enrichR_3.0
[13] cowplot_1.0.0 ggplot2_3.3.5 workflowr_1.6.2
loaded via a namespace (and not attached):
[1] fs_1.5.2 lubridate_1.8.0 bit64_4.0.5 doParallel_1.0.16
[5] httr_1.4.2 rprojroot_2.0.2 tools_3.6.1 backports_1.4.1
[9] doRNG_1.8.2 utf8_1.2.2 R6_2.5.1 vipor_0.4.5
[13] DBI_1.1.1 colorspace_2.0-2 withr_2.4.3 ggrastr_1.0.1
[17] tidyselect_1.1.1 bit_4.0.4 curl_4.3.2 compiler_3.6.1
[21] git2r_0.26.1 cli_3.1.0 rvest_1.0.2 Cairo_1.5-12.2
[25] xml2_1.3.3 labeling_0.4.2 scales_1.1.1 apcluster_1.4.8
[29] digest_0.6.29 rmarkdown_2.11 svglite_1.2.2 pkgconfig_2.0.3
[33] htmltools_0.5.2 dbplyr_2.1.1 fastmap_1.1.0 highr_0.9
[37] rlang_0.4.12 rstudioapi_0.13 RSQLite_2.2.8 jquerylib_0.1.4
[41] farver_2.1.0 generics_0.1.1 jsonlite_1.7.2 vroom_1.5.7
[45] magrittr_2.0.1 Matrix_1.2-18 ggbeeswarm_0.6.0 Rcpp_1.0.7
[49] munsell_0.5.0 fansi_0.5.0 gdtools_0.1.9 lifecycle_1.0.1
[53] stringi_1.7.6 whisker_0.3-2 yaml_2.2.1 plyr_1.8.6
[57] grid_3.6.1 blob_1.2.2 ggrepel_0.9.1 parallel_3.6.1
[61] promises_1.0.1 crayon_1.4.2 lattice_0.20-38 haven_2.4.3
[65] hms_1.1.1 knitr_1.36 pillar_1.6.4 igraph_1.2.10
[69] rjson_0.2.20 rngtools_1.5.2 reshape2_1.4.4 codetools_0.2-16
[73] reprex_2.0.1 glue_1.5.1 evaluate_0.14 data.table_1.14.2
[77] modelr_0.1.8 vctrs_0.3.8 tzdb_0.2.0 httpuv_1.5.1
[81] foreach_1.5.1 cellranger_1.1.0 gtable_0.3.0 assertthat_0.2.1
[85] cachem_1.0.6 xfun_0.29 broom_0.7.10 later_0.8.0
[89] iterators_1.0.13 beeswarm_0.2.3 memoise_2.0.1 ellipsis_0.3.2