Last updated: 2022-02-21

Checks: 6 1

Knit directory: cTWAS_analysis/

This reproducible R Markdown analysis was created with workflowr (version 1.6.2). The Checks tab describes the reproducibility checks that were applied when the results were created. The Past versions tab lists the development history.


Great! Since the R Markdown file has been committed to the Git repository, you know the exact version of the code that produced these results.

Great job! The global environment was empty. Objects defined in the global environment can affect the analysis in your R Markdown file in unknown ways. For reproduciblity it’s best to always run the code in an empty environment.

The command set.seed(20211220) was run prior to running the code in the R Markdown file. Setting a seed ensures that any results that rely on randomness, e.g. subsampling or permutations, are reproducible.

Great job! Recording the operating system, R version, and package versions is critical for reproducibility.

Nice! There were no cached chunks for this analysis, so you can be confident that you successfully produced the results during this run.

Using absolute paths to the files within your workflowr project makes it difficult for you and others to run your code on a different machine. Change the absolute path(s) below to the suggested relative path(s) to make your code more reproducible.

absolute relative
/project2/xinhe/shengqian/cTWAS/cTWAS_analysis/data/ data
/project2/xinhe/shengqian/cTWAS/cTWAS_analysis/code/ctwas_config.R code/ctwas_config.R

Great! You are using Git for version control. Tracking code development and connecting the code version to the results is critical for reproducibility.

The results in this page were generated with repository version bbf6737. See the Past versions tab to see a history of the changes made to the R Markdown and HTML files.

Note that you need to be careful to ensure that all relevant files for the analysis have been committed to Git prior to generating the results (you can use wflow_publish or wflow_git_commit). workflowr only checks the R Markdown file, but you know if there are other scripts or data files that it depends on. Below is the status of the Git repository when the results were generated:


Ignored files:
    Ignored:    .ipynb_checkpoints/

Untracked files:
    Untracked:  Rplot.png
    Untracked:  analysis/Glucose_Adipose_Subcutaneous.Rmd
    Untracked:  analysis/Glucose_Adipose_Visceral_Omentum.Rmd
    Untracked:  analysis/Splicing_Test.Rmd
    Untracked:  code/.ipynb_checkpoints/
    Untracked:  code/AF_out/
    Untracked:  code/BMI_S_out/
    Untracked:  code/BMI_out/
    Untracked:  code/Glucose_out/
    Untracked:  code/LDL_S_out/
    Untracked:  code/T2D_out/
    Untracked:  code/ctwas_config.R
    Untracked:  code/mapping.R
    Untracked:  code/out/
    Untracked:  code/run_AF_analysis.sbatch
    Untracked:  code/run_AF_analysis.sh
    Untracked:  code/run_AF_ctwas_rss_LDR.R
    Untracked:  code/run_BMI_analysis.sbatch
    Untracked:  code/run_BMI_analysis.sh
    Untracked:  code/run_BMI_analysis_S.sbatch
    Untracked:  code/run_BMI_analysis_S.sh
    Untracked:  code/run_BMI_ctwas_rss_LDR.R
    Untracked:  code/run_BMI_ctwas_rss_LDR_S.R
    Untracked:  code/run_Glucose_analysis.sbatch
    Untracked:  code/run_Glucose_analysis.sh
    Untracked:  code/run_Glucose_ctwas_rss_LDR.R
    Untracked:  code/run_LDL_analysis_S.sbatch
    Untracked:  code/run_LDL_analysis_S.sh
    Untracked:  code/run_LDL_ctwas_rss_LDR_S.R
    Untracked:  code/run_T2D_analysis.sbatch
    Untracked:  code/run_T2D_analysis.sh
    Untracked:  code/run_T2D_ctwas_rss_LDR.R
    Untracked:  data/.ipynb_checkpoints/
    Untracked:  data/AF/
    Untracked:  data/BMI/
    Untracked:  data/BMI_S/
    Untracked:  data/Glucose/
    Untracked:  data/LDL_S/
    Untracked:  data/T2D/
    Untracked:  data/TEST/
    Untracked:  data/UKBB/
    Untracked:  data/UKBB_SNPs_Info.text
    Untracked:  data/gene_OMIM.txt
    Untracked:  data/gene_pip_0.8.txt
    Untracked:  data/mashr_Heart_Atrial_Appendage.db
    Untracked:  data/mashr_sqtl/
    Untracked:  data/summary_known_genes_annotations.xlsx
    Untracked:  data/untitled.txt

Note that any generated files, e.g. HTML, png, CSS, etc., are not included in this status report because it is ok for generated content to have uncommitted changes.


These are the previous versions of the repository in which changes were made to the R Markdown (analysis/BMI_Brain_Cerebellar_Hemisphere.Rmd) and HTML (docs/BMI_Brain_Cerebellar_Hemisphere.html) files. If you’ve configured a remote Git repository (see ?wflow_git_remote), click on the hyperlinks in the table below to view the files as they were in that past version.

File Version Author Date Message
Rmd bbf6737 sq-96 2022-02-21 update
html 91f38fa sq-96 2022-02-13 Build site.
Rmd eb13ecf sq-96 2022-02-13 update
html e6bc169 sq-96 2022-02-13 Build site.
Rmd 87fee8b sq-96 2022-02-13 update

Weight QC

#number of imputed weights
nrow(qclist_all)
[1] 11315
#number of imputed weights by chromosome
table(qclist_all$chr)

   1    2    3    4    5    6    7    8    9   10   11   12   13   14   15   16 
1087  770  652  425  535  625  556  423  440  443  698  615  209  381  372  538 
  17   18   19   20   21   22 
 709  170  904  333  134  296 
#number of imputed weights without missing variants
sum(qclist_all$nmiss==0)
[1] 8732
#proportion of imputed weights without missing variants
mean(qclist_all$nmiss==0)
[1] 0.7717

Check convergence of parameters

Version Author Date
e6bc169 sq-96 2022-02-13
#estimated group prior
estimated_group_prior <- group_prior_rec[,ncol(group_prior_rec)]
names(estimated_group_prior) <- c("gene", "snp")
estimated_group_prior["snp"] <- estimated_group_prior["snp"]*thin #adjust parameter to account for thin argument
print(estimated_group_prior)
     gene       snp 
0.0097195 0.0002858 
#estimated group prior variance
estimated_group_prior_var <- group_prior_var_rec[,ncol(group_prior_var_rec)]
names(estimated_group_prior_var) <- c("gene", "snp")
print(estimated_group_prior_var)
 gene   snp 
17.70 17.92 
#report sample size
print(sample_size)
[1] 336107
#report group size
group_size <- c(nrow(ctwas_gene_res), n_snps)
print(group_size)
[1]   11315 7535010
#estimated group PVE
estimated_group_pve <- estimated_group_prior_var*estimated_group_prior*group_size/sample_size #check PVE calculation
names(estimated_group_pve) <- c("gene", "snp")
print(estimated_group_pve)
    gene      snp 
0.005791 0.114815 
#compare sum(PIP*mu2/sample_size) with above PVE calculation
c(sum(ctwas_gene_res$PVE),sum(ctwas_snp_res$PVE))
[1]  0.06977 17.00375

Genes with highest PIPs

      genename region_tag susie_pip     mu2       PVE       z num_eqtl
11541   NDUFS3      11_29    0.9962 1179.68 3.497e-03 -11.094        2
3395     CCND2       12_4    0.9680   28.45 8.194e-05  -5.094        2
978     PIK3C3      18_23    0.9528   51.82 1.469e-04   6.896        2
7720     ZNF12       7_10    0.9349   27.64 7.689e-05   5.106        2
4962     DCAF7      17_37    0.8835   28.48 7.487e-05   5.437        1
7905     CASP7      10_71    0.8757   24.36 6.346e-05   4.584        1
9464    ZBTB41       1_98    0.8621 1788.23 4.587e-03   4.618        1
8843     LAMB2       3_34    0.8204  138.48 3.380e-04  -7.471        1
518      KCNH2       7_93    0.7943   40.89 9.662e-05   6.352        2
8913    EFEMP2      11_36    0.7872   56.09 1.314e-04  -8.201        1
1242      XRN2      20_15    0.7859   23.48 5.491e-05  -4.449        3
7481  SERPINI1      3_103    0.7809   21.25 4.937e-05  -3.916        2
4684     YWHAQ        2_6    0.7693   25.68 5.878e-05   4.911        1
3471     YIPF4       2_20    0.7572  628.63 1.416e-03   2.868        4
1398      CBX5      12_33    0.7455   25.07 5.560e-05  -4.691        1
8350      TAP1       6_27    0.7394   29.03 6.387e-05   5.285        1
3479      SLF2      10_64    0.7338   30.52 6.662e-05   4.780        2
8202   NCKAP5L      12_31    0.7242   49.53 1.067e-04  -8.217        1
8279     NLRC3       16_3    0.7209   33.49 7.182e-05   5.243        1
4586   CSNK1G2       19_2    0.7170   31.70 6.763e-05  -5.549        2

Genes with largest effect sizes

          genename region_tag susie_pip   mu2       PVE      z num_eqtl
7785       CCDC171       9_13 0.000e+00 53826 0.000e+00  7.951        1
7784         PSIP1       9_13 0.000e+00 45986 0.000e+00  8.364        1
6221         CNNM2      10_66 1.346e-03 36526 1.463e-04 -5.132        1
6469       ARL14EP      11_21 0.000e+00 28862 0.000e+00  6.331        2
5419         MFAP1      15_16 0.000e+00 24545 0.000e+00  4.303        1
12479 RP11-757G1.6      11_38 1.092e-05 24452 7.943e-07  4.319        2
8035          LEO1      15_21 8.436e-04 23922 6.004e-05  4.647        1
13446    LINC02019       3_35 1.410e-07 23283 9.769e-09 -4.344        2
3007          CISH       3_35 0.000e+00 22928 0.000e+00 -4.823        1
11730       CKMT1A      15_16 0.000e+00 21983 0.000e+00  4.130        1
10888       MRPL21      11_38 0.000e+00 21546 0.000e+00  3.982        2
3006         HEMK1       3_35 0.000e+00 19749 0.000e+00 -4.682        1
1065         CCNT2       2_80 1.380e-04 19271 7.913e-06  3.713        1
3139         PLCL1      2_117 0.000e+00 19186 0.000e+00 -5.642        1
5423        LYSMD2      15_21 0.000e+00 18805 0.000e+00 -5.232        1
8114         MAP1A      15_16 0.000e+00 17091 0.000e+00  3.818        2
1452         MAST3      19_14 0.000e+00 16327 0.000e+00 -2.208        1
9538         NSUN3       3_59 0.000e+00 16205 0.000e+00  4.755        1
8409          ADAL      15_16 0.000e+00 15308 0.000e+00 -2.861        1
130       CACNA2D2       3_35 0.000e+00 14672 0.000e+00 -4.014        1

Genes with highest PVE

      genename region_tag susie_pip      mu2       PVE       z num_eqtl
2642    PTPMT1      11_29  0.456199 14326.39 1.945e-02  -3.623        2
9530     ERBB4      2_125  0.703049  5989.36 1.253e-02  -7.023        1
286       CPS1      2_124  0.364179  4800.48 5.201e-03  -3.562        1
9464    ZBTB41       1_98  0.862082  1788.23 4.587e-03   4.618        1
3081    LANCL1      2_124  0.316378  4817.32 4.535e-03  -3.535        1
11541   NDUFS3      11_29  0.996248  1179.68 3.497e-03 -11.094        2
3471     YIPF4       2_20  0.757159   628.63 1.416e-03   2.868        4
8843     LAMB2       3_34  0.820428   138.48 3.380e-04  -7.471        1
11726    VPS52       6_28  0.631264   126.68 2.379e-04   1.603        1
978     PIK3C3      18_23  0.952810    51.82 1.469e-04   6.896        2
6221     CNNM2      10_66  0.001346 36525.78 1.463e-04  -5.132        1
11281     RNF5       6_26  0.246996   181.55 1.334e-04   6.337        2
8913    EFEMP2      11_36  0.787153    56.09 1.314e-04  -8.201        1
1460     STX1B      16_24  0.512675    80.29 1.225e-04 -10.209        1
7606     MFSD8       4_84  0.005705  7091.90 1.204e-04   2.512        1
8202   NCKAP5L      12_31  0.724170    49.53 1.067e-04  -8.217        1
518      KCNH2       7_93  0.794299    40.89 9.662e-05   6.352        2
13683   DHRS11      17_22  0.480896    61.80 8.842e-05  -8.128        1
3395     CCND2       12_4  0.968047    28.45 8.194e-05  -5.094        2
7263      TAL1       1_29  0.563761    47.91 8.036e-05  -6.866        1

Genes with largest z scores

            genename region_tag susie_pip     mu2       PVE       z num_eqtl
41              RBM6       3_35 1.197e-03  934.02 3.327e-06  12.536        1
33              RBM5       3_35 6.485e-04  978.25 1.888e-06  12.473        1
7609           MST1R       3_35 3.362e-10  248.31 2.484e-13 -11.521        3
9166          KCTD13      16_24 1.073e-01  109.75 3.504e-05 -11.491        1
11541         NDUFS3      11_29 9.962e-01 1179.68 3.497e-03 -11.094        2
8510          INO80E      16_24 2.414e-02   98.53 7.075e-06  11.077        1
7604          RNF123       3_35 1.410e-11  847.57 3.555e-14 -10.957        1
12511 RP11-1348G14.4      16_23 2.267e-01   91.80 6.192e-05  10.676        1
10122          APOBR      16_23 1.382e-01   93.79 3.855e-05 -10.540        1
9282           NUPR1      16_23 1.382e-01   93.79 3.855e-05 -10.540        1
12037         NPIPB7      16_23 1.036e-01   90.83 2.801e-05  10.510        1
6310           DOC2A      16_24 3.833e-02   87.50 9.978e-06 -10.320        2
10802       C6orf106       6_28 4.122e-05  118.83 1.457e-08 -10.264        1
1460           STX1B      16_24 5.127e-01   80.29 1.225e-04 -10.209        1
8172          ZNF646      16_24 5.766e-02   75.22 1.290e-05 -10.000        1
8171          ZNF668      16_24 5.766e-02   75.22 1.290e-05  10.000        1
2889        COL4A3BP       5_44 3.736e-02   69.80 7.759e-06   9.828        1
484            PRSS8      16_24 1.768e-02   71.39 3.754e-06   9.765        1
649         UHRF1BP1       6_28 1.067e-07   88.10 2.797e-11  -9.654        2
1937           BCKDK      16_24 1.390e-02   68.03 2.813e-06  -9.638        1

Comparing z scores and PIPs

[1] 0.02307
            genename region_tag susie_pip     mu2       PVE       z num_eqtl
41              RBM6       3_35 1.197e-03  934.02 3.327e-06  12.536        1
33              RBM5       3_35 6.485e-04  978.25 1.888e-06  12.473        1
7609           MST1R       3_35 3.362e-10  248.31 2.484e-13 -11.521        3
9166          KCTD13      16_24 1.073e-01  109.75 3.504e-05 -11.491        1
11541         NDUFS3      11_29 9.962e-01 1179.68 3.497e-03 -11.094        2
8510          INO80E      16_24 2.414e-02   98.53 7.075e-06  11.077        1
7604          RNF123       3_35 1.410e-11  847.57 3.555e-14 -10.957        1
12511 RP11-1348G14.4      16_23 2.267e-01   91.80 6.192e-05  10.676        1
10122          APOBR      16_23 1.382e-01   93.79 3.855e-05 -10.540        1
9282           NUPR1      16_23 1.382e-01   93.79 3.855e-05 -10.540        1
12037         NPIPB7      16_23 1.036e-01   90.83 2.801e-05  10.510        1
6310           DOC2A      16_24 3.833e-02   87.50 9.978e-06 -10.320        2
10802       C6orf106       6_28 4.122e-05  118.83 1.457e-08 -10.264        1
1460           STX1B      16_24 5.127e-01   80.29 1.225e-04 -10.209        1
8172          ZNF646      16_24 5.766e-02   75.22 1.290e-05 -10.000        1
8171          ZNF668      16_24 5.766e-02   75.22 1.290e-05  10.000        1
2889        COL4A3BP       5_44 3.736e-02   69.80 7.759e-06   9.828        1
484            PRSS8      16_24 1.768e-02   71.39 3.754e-06   9.765        1
649         UHRF1BP1       6_28 1.067e-07   88.10 2.797e-11  -9.654        2
1937           BCKDK      16_24 1.390e-02   68.03 2.813e-06  -9.638        1

GO enrichment analysis for genes with PIP>0.5

#number of genes for gene set enrichment
length(genes)
[1] 43
Uploading data to Enrichr... Done.
  Querying GO_Biological_Process_2021... Done.
  Querying GO_Cellular_Component_2021... Done.
  Querying GO_Molecular_Function_2021... Done.
Parsing results... Done.
[1] "GO_Biological_Process_2021"

                                                                                                                       Term
1 positive regulation of protein insertion into mitochondrial membrane involved in apoptotic signaling pathway (GO:1900740)
2          regulation of protein insertion into mitochondrial membrane involved in apoptotic signaling pathway (GO:1900739)
3 positive regulation of mitochondrial outer membrane permeabilization involved in apoptotic signaling pathway (GO:1901030)
4                                positive regulation of establishment of protein localization to mitochondrion (GO:1903749)
  Overlap Adjusted.P.value             Genes
1    3/26         0.004615 YWHAQ;YWHAB;YWHAZ
2    3/26         0.004615 YWHAQ;YWHAB;YWHAZ
3    3/34         0.006997 YWHAQ;YWHAB;YWHAZ
4    3/56         0.023522 YWHAQ;YWHAB;YWHAZ
[1] "GO_Cellular_Component_2021"

[1] Term             Overlap          Adjusted.P.value Genes           
<0 rows> (or 0-length row.names)
[1] "GO_Molecular_Function_2021"

[1] Term             Overlap          Adjusted.P.value Genes           
<0 rows> (or 0-length row.names)

DisGeNET enrichment analysis for genes with PIP>0.5

                                                                                     Description
79                                                        Diffuse mesangial sclerosis (disorder)
117                                                  ERYTHROKERATODERMIA VARIABILIS 3 (disorder)
118                                                                             Pierson syndrome
119                                                          ATAXIA, SENSORY, AUTOSOMAL DOMINANT
128                                    Familial encephalopathy with neuroserpin inclusion bodies
143                             NEPHROTIC SYNDROME, TYPE 5, WITH OR WITHOUT OCULAR ABNORMALITIES
144                                                     CUTIS LAXA, AUTOSOMAL RECESSIVE, TYPE IB
146                                                                 Familial mesangial sclerosis
147 Nephrotic Syndrome, Congenital, With Ocular Abnormalities And Congenital Myasthenic Syndrome
150                                                             AMYOTROPHIC LATERAL SCLEROSIS 19
        FDR Ratio BgRatio
79  0.02608  1/22  1/9703
117 0.02608  1/22  1/9703
118 0.02608  1/22  1/9703
119 0.02608  1/22  1/9703
128 0.02608  1/22  1/9703
143 0.02608  1/22  1/9703
144 0.02608  1/22  1/9703
146 0.02608  1/22  1/9703
147 0.02608  1/22  1/9703
150 0.02608  1/22  1/9703

WebGestalt enrichment analysis for genes with PIP>0.5

Loading the functional categories...
Loading the ID list...
Loading the reference list...
Performing the enrichment analysis...
Warning in oraEnrichment(interestGeneList, referenceGeneList, geneSet, minNum =
minNum, : No significant gene set is identified based on FDR 0.05!
NULL

PIP Manhattan Plot

Warning: ggrepel: 3 unlabeled data points (too many overlaps). Consider
increasing max.overlaps

Sensitivity, specificity and precision for silver standard genes

#number of genes in known annotations
print(length(known_annotations))
[1] 41
#number of genes in known annotations with imputed expression
print(sum(known_annotations %in% ctwas_gene_res$genename))
[1] 25
#significance threshold for TWAS
print(sig_thresh)
[1] 4.591
#number of ctwas genes
length(ctwas_genes)
[1] 8
#number of TWAS genes
length(twas_genes)
[1] 261
#show novel genes (ctwas genes with not in TWAS genes)
ctwas_gene_res[ctwas_gene_res$genename %in% novel_genes,report_cols]
     genename region_tag susie_pip   mu2       PVE     z num_eqtl
7905    CASP7      10_71    0.8757 24.36 6.346e-05 4.584        1
#sensitivity / recall
print(sensitivity)
  ctwas    TWAS 
0.00000 0.07317 
#specificity
print(specificity)
 ctwas   TWAS 
0.9993 0.9771 
#precision / PPV
print(precision)
  ctwas    TWAS 
0.00000 0.01149 


sessionInfo()
R version 3.6.1 (2019-07-05)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Scientific Linux 7.4 (Nitrogen)

Matrix products: default
BLAS/LAPACK: /software/openblas-0.2.19-el7-x86_64/lib/libopenblas_haswellp-r0.2.19.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] readxl_1.3.1      forcats_0.5.1     stringr_1.4.0     dplyr_1.0.7      
 [5] purrr_0.3.4       readr_2.1.1       tidyr_1.1.4       tidyverse_1.3.1  
 [9] tibble_3.1.6      WebGestaltR_0.4.4 disgenet2r_0.99.2 enrichR_3.0      
[13] cowplot_1.0.0     ggplot2_3.3.5     workflowr_1.6.2  

loaded via a namespace (and not attached):
 [1] fs_1.5.2          lubridate_1.8.0   bit64_4.0.5       doParallel_1.0.16
 [5] httr_1.4.2        rprojroot_2.0.2   tools_3.6.1       backports_1.4.1  
 [9] doRNG_1.8.2       utf8_1.2.2        R6_2.5.1          vipor_0.4.5      
[13] DBI_1.1.1         colorspace_2.0-2  withr_2.4.3       ggrastr_1.0.1    
[17] tidyselect_1.1.1  bit_4.0.4         curl_4.3.2        compiler_3.6.1   
[21] git2r_0.26.1      cli_3.1.0         rvest_1.0.2       Cairo_1.5-12.2   
[25] xml2_1.3.3        labeling_0.4.2    scales_1.1.1      apcluster_1.4.8  
[29] digest_0.6.29     rmarkdown_2.11    svglite_1.2.2     pkgconfig_2.0.3  
[33] htmltools_0.5.2   dbplyr_2.1.1      fastmap_1.1.0     highr_0.9        
[37] rlang_0.4.12      rstudioapi_0.13   RSQLite_2.2.8     jquerylib_0.1.4  
[41] farver_2.1.0      generics_0.1.1    jsonlite_1.7.2    vroom_1.5.7      
[45] magrittr_2.0.1    Matrix_1.2-18     ggbeeswarm_0.6.0  Rcpp_1.0.7       
[49] munsell_0.5.0     fansi_0.5.0       gdtools_0.1.9     lifecycle_1.0.1  
[53] stringi_1.7.6     whisker_0.3-2     yaml_2.2.1        plyr_1.8.6       
[57] grid_3.6.1        blob_1.2.2        ggrepel_0.9.1     parallel_3.6.1   
[61] promises_1.0.1    crayon_1.4.2      lattice_0.20-38   haven_2.4.3      
[65] hms_1.1.1         knitr_1.36        pillar_1.6.4      igraph_1.2.10    
[69] rjson_0.2.20      rngtools_1.5.2    reshape2_1.4.4    codetools_0.2-16 
[73] reprex_2.0.1      glue_1.5.1        evaluate_0.14     data.table_1.14.2
[77] modelr_0.1.8      vctrs_0.3.8       tzdb_0.2.0        httpuv_1.5.1     
[81] foreach_1.5.1     cellranger_1.1.0  gtable_0.3.0      assertthat_0.2.1 
[85] cachem_1.0.6      xfun_0.29         broom_0.7.10      later_0.8.0      
[89] iterators_1.0.13  beeswarm_0.2.3    memoise_2.0.1     ellipsis_0.3.2