Last updated: 2022-02-21

Checks: 6 1

Knit directory: cTWAS_analysis/

This reproducible R Markdown analysis was created with workflowr (version 1.6.2). The Checks tab describes the reproducibility checks that were applied when the results were created. The Past versions tab lists the development history.


Great! Since the R Markdown file has been committed to the Git repository, you know the exact version of the code that produced these results.

Great job! The global environment was empty. Objects defined in the global environment can affect the analysis in your R Markdown file in unknown ways. For reproduciblity it’s best to always run the code in an empty environment.

The command set.seed(20211220) was run prior to running the code in the R Markdown file. Setting a seed ensures that any results that rely on randomness, e.g. subsampling or permutations, are reproducible.

Great job! Recording the operating system, R version, and package versions is critical for reproducibility.

Nice! There were no cached chunks for this analysis, so you can be confident that you successfully produced the results during this run.

Using absolute paths to the files within your workflowr project makes it difficult for you and others to run your code on a different machine. Change the absolute path(s) below to the suggested relative path(s) to make your code more reproducible.

absolute relative
/project2/xinhe/shengqian/cTWAS/cTWAS_analysis/data/ data
/project2/xinhe/shengqian/cTWAS/cTWAS_analysis/code/ctwas_config.R code/ctwas_config.R

Great! You are using Git for version control. Tracking code development and connecting the code version to the results is critical for reproducibility.

The results in this page were generated with repository version bbf6737. See the Past versions tab to see a history of the changes made to the R Markdown and HTML files.

Note that you need to be careful to ensure that all relevant files for the analysis have been committed to Git prior to generating the results (you can use wflow_publish or wflow_git_commit). workflowr only checks the R Markdown file, but you know if there are other scripts or data files that it depends on. Below is the status of the Git repository when the results were generated:


Ignored files:
    Ignored:    .ipynb_checkpoints/

Untracked files:
    Untracked:  Rplot.png
    Untracked:  analysis/Glucose_Adipose_Subcutaneous.Rmd
    Untracked:  analysis/Glucose_Adipose_Visceral_Omentum.Rmd
    Untracked:  analysis/Splicing_Test.Rmd
    Untracked:  code/.ipynb_checkpoints/
    Untracked:  code/AF_out/
    Untracked:  code/BMI_S_out/
    Untracked:  code/BMI_out/
    Untracked:  code/Glucose_out/
    Untracked:  code/LDL_S_out/
    Untracked:  code/T2D_out/
    Untracked:  code/ctwas_config.R
    Untracked:  code/mapping.R
    Untracked:  code/out/
    Untracked:  code/run_AF_analysis.sbatch
    Untracked:  code/run_AF_analysis.sh
    Untracked:  code/run_AF_ctwas_rss_LDR.R
    Untracked:  code/run_BMI_analysis.sbatch
    Untracked:  code/run_BMI_analysis.sh
    Untracked:  code/run_BMI_analysis_S.sbatch
    Untracked:  code/run_BMI_analysis_S.sh
    Untracked:  code/run_BMI_ctwas_rss_LDR.R
    Untracked:  code/run_BMI_ctwas_rss_LDR_S.R
    Untracked:  code/run_Glucose_analysis.sbatch
    Untracked:  code/run_Glucose_analysis.sh
    Untracked:  code/run_Glucose_ctwas_rss_LDR.R
    Untracked:  code/run_LDL_analysis_S.sbatch
    Untracked:  code/run_LDL_analysis_S.sh
    Untracked:  code/run_LDL_ctwas_rss_LDR_S.R
    Untracked:  code/run_T2D_analysis.sbatch
    Untracked:  code/run_T2D_analysis.sh
    Untracked:  code/run_T2D_ctwas_rss_LDR.R
    Untracked:  data/.ipynb_checkpoints/
    Untracked:  data/AF/
    Untracked:  data/BMI/
    Untracked:  data/BMI_S/
    Untracked:  data/Glucose/
    Untracked:  data/LDL_S/
    Untracked:  data/T2D/
    Untracked:  data/TEST/
    Untracked:  data/UKBB/
    Untracked:  data/UKBB_SNPs_Info.text
    Untracked:  data/gene_OMIM.txt
    Untracked:  data/gene_pip_0.8.txt
    Untracked:  data/mashr_Heart_Atrial_Appendage.db
    Untracked:  data/mashr_sqtl/
    Untracked:  data/summary_known_genes_annotations.xlsx
    Untracked:  data/untitled.txt

Note that any generated files, e.g. HTML, png, CSS, etc., are not included in this status report because it is ok for generated content to have uncommitted changes.


These are the previous versions of the repository in which changes were made to the R Markdown (analysis/BMI_Brain_Putamen_basal_ganglia.Rmd) and HTML (docs/BMI_Brain_Putamen_basal_ganglia.html) files. If you’ve configured a remote Git repository (see ?wflow_git_remote), click on the hyperlinks in the table below to view the files as they were in that past version.

File Version Author Date Message
Rmd bbf6737 sq-96 2022-02-21 update
html 91f38fa sq-96 2022-02-13 Build site.
Rmd eb13ecf sq-96 2022-02-13 update
html e6bc169 sq-96 2022-02-13 Build site.
Rmd 87fee8b sq-96 2022-02-13 update

Weight QC

#number of imputed weights
nrow(qclist_all)
[1] 11258
#number of imputed weights by chromosome
table(qclist_all$chr)

   1    2    3    4    5    6    7    8    9   10   11   12   13   14   15   16 
1123  787  660  445  518  661  554  400  415  452  668  611  225  384  383  517 
  17   18   19   20   21   22 
 686  178  855  348  121  267 
#number of imputed weights without missing variants
sum(qclist_all$nmiss==0)
[1] 8841
#proportion of imputed weights without missing variants
mean(qclist_all$nmiss==0)
[1] 0.7853

Check convergence of parameters

Version Author Date
e6bc169 sq-96 2022-02-13
#estimated group prior
estimated_group_prior <- group_prior_rec[,ncol(group_prior_rec)]
names(estimated_group_prior) <- c("gene", "snp")
estimated_group_prior["snp"] <- estimated_group_prior["snp"]*thin #adjust parameter to account for thin argument
print(estimated_group_prior)
     gene       snp 
0.0070376 0.0002893 
#estimated group prior variance
estimated_group_prior_var <- group_prior_var_rec[,ncol(group_prior_var_rec)]
names(estimated_group_prior_var) <- c("gene", "snp")
print(estimated_group_prior_var)
 gene   snp 
19.50 17.87 
#report sample size
print(sample_size)
[1] 336107
#report group size
group_size <- c(nrow(ctwas_gene_res), n_snps)
print(group_size)
[1]   11258 7535010
#estimated group PVE
estimated_group_pve <- estimated_group_prior_var*estimated_group_prior*group_size/sample_size #check PVE calculation
names(estimated_group_pve) <- c("gene", "snp")
print(estimated_group_pve)
    gene      snp 
0.004596 0.115884 
#compare sum(PIP*mu2/sample_size) with above PVE calculation
c(sum(ctwas_gene_res$PVE),sum(ctwas_snp_res$PVE))
[1]  0.02674 16.29002

Genes with highest PIPs

        genename region_tag susie_pip     mu2       PVE      z num_eqtl
10175    ATP6V0C       16_2    0.9492   26.58 7.507e-05 -4.711        1
13394      NOL12      22_15    0.8865   62.62 1.652e-04 -4.504        2
241         ISL1       5_30    0.7868   24.88 5.824e-05  5.010        1
5250        FGD4      12_22    0.7584   23.51 5.305e-05  4.449        2
8817      EFEMP2      11_36    0.7570   52.91 1.192e-04 -8.201        1
5487     C18orf8      18_12    0.7461   53.49 1.187e-04  7.458        2
9562       ZADH2      18_44    0.7295   22.99 4.991e-05  4.278        1
13411  HIST1H2BE       6_20    0.7016   28.99 6.052e-05 -6.515        1
11599      FADS3      11_34    0.6986   25.36 5.272e-05  4.311        1
8733     RNASEH1        2_2    0.6955   26.12 5.405e-05  4.231        2
10490      SKOR1      15_31    0.6954   54.86 1.135e-04 -9.754        1
9657     TRAPPC5       19_7    0.6882   25.48 5.218e-05  4.065        2
12847  LINC01977      17_45    0.6839   28.29 5.756e-05  5.230        1
5878        ECE2      3_113    0.6831   30.11 6.120e-05 -5.287        1
666       CACNB1      17_23    0.6825   24.80 5.035e-05  3.883        1
9431       ERBB4      2_125    0.6817 6016.69 1.220e-02 -7.023        1
368       PHLPP2      16_38    0.6728   49.57 9.923e-05  4.619        1
12529 AP006621.5       11_1    0.6619   25.40 5.002e-05 -4.506        1
309         VRK2       2_38    0.6542   22.96 4.468e-05  3.879        2
6637      FBXL18        7_7    0.6341   24.60 4.642e-05 -4.562        2

Genes with largest effect sizes

      genename region_tag susie_pip   mu2 PVE       z num_eqtl
10436  SLC38A3       3_35         0 70545   0   6.726        1
7563     CAMKV       3_35         0 55235   0  -9.848        1
7741     PSIP1       9_13         0 54061   0   7.951        1
7742   CCDC171       9_13         0 54049   0   7.979        1
2148    PIK3R2      19_14         0 49133   0  -7.140        1
36        RBM6       3_35         0 42639   0  12.536        1
7565     MST1R       3_35         0 36400   0 -12.626        2
9443     STX19       3_59         0 32288   0  -5.060        1
5360     MFAP1      15_16         0 24650   0   4.303        1
12170     HYPK      15_16         0 24544   0   4.322        1
7560    RNF123       3_35         0 24100   0 -10.959        1
5186     TMOD3      15_21         0 19482   0  -5.412        1
3086     PLCL1      2_117         0 19300   0  -5.642        1
5884     CENPC       4_47         0 19277   0   5.863        2
12210     NAT6       3_35         0 18820   0  -6.264        2
7603    RNF180       5_39         0 18492   0  -3.745        2
7962      LEO1      15_21         0 18380   0   2.536        2
5088   TUBGCP4      15_16         0 17595   0   3.371        1
1042     CCNT2       2_80         0 17196   0   4.382        2
1422     MAST3      19_14         0 16401   0   2.208        1

Genes with highest PVE

            genename region_tag susie_pip     mu2       PVE       z num_eqtl
9431           ERBB4      2_125  0.681733 6016.69 1.220e-02  -7.023        1
13394          NOL12      22_15  0.886527   62.62 1.652e-04  -4.504        2
8817          EFEMP2      11_36  0.757011   52.91 1.192e-04  -8.201        1
6710           GPR61       1_67  0.508658   78.53 1.188e-04   8.755        1
5487         C18orf8      18_12  0.746061   53.49 1.187e-04   7.458        2
10490          SKOR1      15_31  0.695443   54.86 1.135e-04  -9.754        1
5219           G3BP2       4_51  0.304950  123.45 1.120e-04  -2.134        1
368           PHLPP2      16_38  0.672814   49.57 9.923e-05   4.619        1
12412 RP11-1348G14.4      16_23  0.312488  102.15 9.497e-05  10.740        1
13154   CTC-498M16.4       5_52  0.005306 5461.35 8.622e-05   7.706        1
12235   GS1-259H13.2       7_62  0.526231   50.82 7.957e-05  -7.078        1
7903         TRMT61A      14_54  0.615046   41.71 7.633e-05   6.576        2
10175        ATP6V0C       16_2  0.949227   26.58 7.507e-05  -4.711        1
9806           KCNB2       8_53  0.374727   62.19 6.933e-05  -8.041        2
4200         NECTIN2      19_31  0.614744   33.79 6.180e-05   5.114        1
5878            ECE2      3_113  0.683096   30.11 6.120e-05  -5.287        1
13411      HIST1H2BE       6_20  0.701582   28.99 6.052e-05  -6.515        1
241             ISL1       5_30  0.786770   24.88 5.824e-05   5.010        1
12847      LINC01977      17_45  0.683924   28.29 5.756e-05   5.230        1
8100          ZNF646      16_24  0.230618   79.70 5.468e-05 -10.092        1

Genes with largest z scores

            genename region_tag susie_pip      mu2       PVE       z num_eqtl
7565           MST1R       3_35 0.000e+00 36399.62 0.000e+00 -12.626        2
36              RBM6       3_35 0.000e+00 42639.01 0.000e+00  12.536        1
9065          KCTD13      16_24 1.062e-01   110.12 3.478e-05  11.491        1
7560          RNF123       3_35 0.000e+00 24100.49 0.000e+00 -10.959        1
8425          INO80E      16_24 3.312e-02    96.97 9.555e-06  10.849        2
12412 RP11-1348G14.4      16_23 3.125e-01   102.15 9.497e-05  10.740        1
10750        SULT1A2      16_23 9.533e-02   104.71 2.970e-05 -10.557        2
10461           CLN3      16_23 4.595e-02    99.79 1.364e-05  10.453        1
9180           NUPR1      16_23 8.732e-02   109.54 2.846e-05 -10.442        2
8100          ZNF646      16_24 2.306e-01    79.70 5.468e-05 -10.092        1
8099          ZNF668      16_24 7.753e-02    77.16 1.780e-05  10.000        1
8773         C1QTNF4      11_29 2.139e-02    94.05 5.987e-06   9.960        2
7563           CAMKV       3_35 0.000e+00 55235.03 0.000e+00  -9.848        1
454            PRSS8      16_24 1.517e-02    71.97 3.248e-06  -9.765        1
10490          SKOR1      15_31 6.954e-01    54.86 1.135e-04  -9.754        1
11425         NDUFS3      11_29 1.196e-02    84.08 2.993e-06  -9.609        2
11430            LAT      16_23 5.639e-02    95.10 1.596e-05  -9.553        1
2537           MTCH2      11_29 1.005e-02    83.11 2.485e-06  -9.551        1
10677        FAM180B      11_29 9.653e-03    82.29 2.363e-06  -9.477        2
12260      LINC00461       5_52 4.937e-11   348.10 5.113e-14   9.418        1

Comparing z scores and PIPs

[1] 0.0215
            genename region_tag susie_pip      mu2       PVE       z num_eqtl
7565           MST1R       3_35 0.000e+00 36399.62 0.000e+00 -12.626        2
36              RBM6       3_35 0.000e+00 42639.01 0.000e+00  12.536        1
9065          KCTD13      16_24 1.062e-01   110.12 3.478e-05  11.491        1
7560          RNF123       3_35 0.000e+00 24100.49 0.000e+00 -10.959        1
8425          INO80E      16_24 3.312e-02    96.97 9.555e-06  10.849        2
12412 RP11-1348G14.4      16_23 3.125e-01   102.15 9.497e-05  10.740        1
10750        SULT1A2      16_23 9.533e-02   104.71 2.970e-05 -10.557        2
10461           CLN3      16_23 4.595e-02    99.79 1.364e-05  10.453        1
9180           NUPR1      16_23 8.732e-02   109.54 2.846e-05 -10.442        2
8100          ZNF646      16_24 2.306e-01    79.70 5.468e-05 -10.092        1
8099          ZNF668      16_24 7.753e-02    77.16 1.780e-05  10.000        1
8773         C1QTNF4      11_29 2.139e-02    94.05 5.987e-06   9.960        2
7563           CAMKV       3_35 0.000e+00 55235.03 0.000e+00  -9.848        1
454            PRSS8      16_24 1.517e-02    71.97 3.248e-06  -9.765        1
10490          SKOR1      15_31 6.954e-01    54.86 1.135e-04  -9.754        1
11425         NDUFS3      11_29 1.196e-02    84.08 2.993e-06  -9.609        2
11430            LAT      16_23 5.639e-02    95.10 1.596e-05  -9.553        1
2537           MTCH2      11_29 1.005e-02    83.11 2.485e-06  -9.551        1
10677        FAM180B      11_29 9.653e-03    82.29 2.363e-06  -9.477        2
12260      LINC00461       5_52 4.937e-11   348.10 5.113e-14   9.418        1

GO enrichment analysis for genes with PIP>0.5

#number of genes for gene set enrichment
length(genes)
[1] 33
Uploading data to Enrichr... Done.
  Querying GO_Biological_Process_2021... Done.
  Querying GO_Cellular_Component_2021... Done.
  Querying GO_Molecular_Function_2021... Done.
Parsing results... Done.
[1] "GO_Biological_Process_2021"

[1] Term             Overlap          Adjusted.P.value Genes           
<0 rows> (or 0-length row.names)
[1] "GO_Cellular_Component_2021"

[1] Term             Overlap          Adjusted.P.value Genes           
<0 rows> (or 0-length row.names)
[1] "GO_Molecular_Function_2021"

[1] Term             Overlap          Adjusted.P.value Genes           
<0 rows> (or 0-length row.names)

DisGeNET enrichment analysis for genes with PIP>0.5

                                                                                    Description
31                                                                   Neoplasm Recurrence, Local
85                                                         CHARCOT-MARIE-TOOTH DISEASE, TYPE 4H
86                                    Familial encephalopathy with neuroserpin inclusion bodies
93                                                     CUTIS LAXA, AUTOSOMAL RECESSIVE, TYPE IB
96                                                             AMYOTROPHIC LATERAL SCLEROSIS 19
97                                                                        CONE-ROD DYSTROPHY 20
98 PROGRESSIVE EXTERNAL OPHTHALMOPLEGIA WITH MITOCHONDRIAL DNA DELETIONS, AUTOSOMAL RECESSIVE 2
9                                                                             Bladder Exstrophy
20                                                                    Herpes Simplex Infections
52                                                      Cutis Laxa, Autosomal Recessive, Type I
       FDR Ratio BgRatio
31 0.02082  2/14 39/9703
85 0.02082  1/14  1/9703
86 0.02082  1/14  1/9703
93 0.02082  1/14  1/9703
96 0.02082  1/14  1/9703
97 0.02082  1/14  1/9703
98 0.02082  1/14  1/9703
9  0.02241  1/14  2/9703
20 0.02241  1/14  2/9703
52 0.02241  1/14  2/9703

WebGestalt enrichment analysis for genes with PIP>0.5

Loading the functional categories...
Loading the ID list...
Loading the reference list...
Performing the enrichment analysis...
Warning in oraEnrichment(interestGeneList, referenceGeneList, geneSet, minNum =
minNum, : No significant gene set is identified based on FDR 0.05!
NULL

PIP Manhattan Plot

Warning: ggrepel: 7 unlabeled data points (too many overlaps). Consider
increasing max.overlaps

Sensitivity, specificity and precision for silver standard genes

#number of genes in known annotations
print(length(known_annotations))
[1] 41
#number of genes in known annotations with imputed expression
print(sum(known_annotations %in% ctwas_gene_res$genename))
[1] 27
#significance threshold for TWAS
print(sig_thresh)
[1] 4.59
#number of ctwas genes
length(ctwas_genes)
[1] 2
#number of TWAS genes
length(twas_genes)
[1] 242
#show novel genes (ctwas genes with not in TWAS genes)
ctwas_gene_res[ctwas_gene_res$genename %in% novel_genes,report_cols]
      genename region_tag susie_pip   mu2       PVE      z num_eqtl
13394    NOL12      22_15    0.8865 62.62 0.0001652 -4.504        2
#sensitivity / recall
print(sensitivity)
  ctwas    TWAS 
0.00000 0.07317 
#specificity
print(specificity)
 ctwas   TWAS 
0.9998 0.9787 
#precision / PPV
print(precision)
 ctwas   TWAS 
0.0000 0.0124 


sessionInfo()
R version 3.6.1 (2019-07-05)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Scientific Linux 7.4 (Nitrogen)

Matrix products: default
BLAS/LAPACK: /software/openblas-0.2.19-el7-x86_64/lib/libopenblas_haswellp-r0.2.19.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] readxl_1.3.1      forcats_0.5.1     stringr_1.4.0     dplyr_1.0.7      
 [5] purrr_0.3.4       readr_2.1.1       tidyr_1.1.4       tidyverse_1.3.1  
 [9] tibble_3.1.6      WebGestaltR_0.4.4 disgenet2r_0.99.2 enrichR_3.0      
[13] cowplot_1.0.0     ggplot2_3.3.5     workflowr_1.6.2  

loaded via a namespace (and not attached):
 [1] fs_1.5.2          lubridate_1.8.0   bit64_4.0.5       doParallel_1.0.16
 [5] httr_1.4.2        rprojroot_2.0.2   tools_3.6.1       backports_1.4.1  
 [9] doRNG_1.8.2       utf8_1.2.2        R6_2.5.1          vipor_0.4.5      
[13] DBI_1.1.1         colorspace_2.0-2  withr_2.4.3       ggrastr_1.0.1    
[17] tidyselect_1.1.1  bit_4.0.4         curl_4.3.2        compiler_3.6.1   
[21] git2r_0.26.1      cli_3.1.0         rvest_1.0.2       Cairo_1.5-12.2   
[25] xml2_1.3.3        labeling_0.4.2    scales_1.1.1      apcluster_1.4.8  
[29] digest_0.6.29     rmarkdown_2.11    svglite_1.2.2     pkgconfig_2.0.3  
[33] htmltools_0.5.2   dbplyr_2.1.1      fastmap_1.1.0     highr_0.9        
[37] rlang_0.4.12      rstudioapi_0.13   RSQLite_2.2.8     jquerylib_0.1.4  
[41] farver_2.1.0      generics_0.1.1    jsonlite_1.7.2    vroom_1.5.7      
[45] magrittr_2.0.1    Matrix_1.2-18     ggbeeswarm_0.6.0  Rcpp_1.0.7       
[49] munsell_0.5.0     fansi_0.5.0       gdtools_0.1.9     lifecycle_1.0.1  
[53] stringi_1.7.6     whisker_0.3-2     yaml_2.2.1        plyr_1.8.6       
[57] grid_3.6.1        blob_1.2.2        ggrepel_0.9.1     parallel_3.6.1   
[61] promises_1.0.1    crayon_1.4.2      lattice_0.20-38   haven_2.4.3      
[65] hms_1.1.1         knitr_1.36        pillar_1.6.4      igraph_1.2.10    
[69] rjson_0.2.20      rngtools_1.5.2    reshape2_1.4.4    codetools_0.2-16 
[73] reprex_2.0.1      glue_1.5.1        evaluate_0.14     data.table_1.14.2
[77] modelr_0.1.8      vctrs_0.3.8       tzdb_0.2.0        httpuv_1.5.1     
[81] foreach_1.5.1     cellranger_1.1.0  gtable_0.3.0      assertthat_0.2.1 
[85] cachem_1.0.6      xfun_0.29         broom_0.7.10      later_0.8.0      
[89] iterators_1.0.13  beeswarm_0.2.3    memoise_2.0.1     ellipsis_0.3.2