Last updated: 2022-02-21

Checks: 6 passed, 1 warning

Knit directory: cTWAS_analysis/

This reproducible R Markdown analysis was created with workflowr (version 1.6.2). The Checks tab describes the reproducibility checks that were applied when the results were created. The Past versions tab lists the development history.


Great! Since the R Markdown file has been committed to the Git repository, you know the exact version of the code that produced these results.

Great job! The global environment was empty. Objects defined in the global environment can affect the analysis in your R Markdown file in unknown ways. For reproducibility it’s best to always run the code in an empty environment.

The command set.seed(20211220) was run prior to running the code in the R Markdown file. Setting a seed ensures that any results that rely on randomness, e.g. subsampling or permutations, are reproducible.
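
As a minimal sketch of why the seed matters: resetting the same seed before a random draw reproduces that draw exactly. The draw used here (`sample(100, 5)`) is only an illustration and is not part of the analysis.

```r
# Re-seeding with the same value reproduces a random draw exactly.
set.seed(20211220)
first_draw <- sample(100, 5)

set.seed(20211220)
second_draw <- sample(100, 5)

# TRUE: both draws start from the same generator state
identical(first_draw, second_draw)
```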

Great job! Recording the operating system, R version, and package versions is critical for reproducibility.

Nice! There were no cached chunks for this analysis, so you can be confident that you successfully produced the results during this run.

Using absolute paths to the files within your workflowr project makes it difficult for you and others to run your code on a different machine. Change the absolute path(s) below to the suggested relative path(s) to make your code more reproducible.

absolute                                                             relative
/project2/xinhe/shengqian/cTWAS/cTWAS_analysis/data/                 data
/project2/xinhe/shengqian/cTWAS/cTWAS_analysis/code/ctwas_config.R   code/ctwas_config.R
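
The suggested relative paths are simply the absolute paths with the project root removed. A minimal sketch in base R, assuming the project root is the `cTWAS_analysis` directory; `to_relative()` is a hypothetical helper written for this illustration, not a workflowr function.

```r
# Hypothetical helper: strip a project-root prefix from absolute paths.
to_relative <- function(path, root) {
  root   <- sub("/+$", "", root)          # drop trailing slashes from the root
  prefix <- paste0(root, "/")
  inside <- startsWith(path, prefix)      # only rewrite paths under the root
  rel    <- ifelse(inside, substring(path, nchar(prefix) + 1), path)
  sub("/+$", "", rel)                     # "data/" becomes "data"
}

root <- "/project2/xinhe/shengqian/cTWAS/cTWAS_analysis"
abs_paths <- c(
  paste0(root, "/data/"),
  paste0(root, "/code/ctwas_config.R")
)
rel_paths <- to_relative(abs_paths, root)
print(rel_paths)
```

With the knit directory set to the project root, the relative forms can then be used directly, for example `source("code/ctwas_config.R")`.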

Great! You are using Git for version control. Tracking code development and connecting the code version to the results is critical for reproducibility.

The results in this page were generated with repository version bbf6737. See the Past versions tab to see a history of the changes made to the R Markdown and HTML files.

Note that you need to be careful to ensure that all relevant files for the analysis have been committed to Git prior to generating the results (you can use wflow_publish or wflow_git_commit). workflowr only checks the R Markdown file, but you know if there are other scripts or data files that it depends on. Below is the status of the Git repository when the results were generated:


Ignored files:
    Ignored:    .ipynb_checkpoints/

Untracked files:
    Untracked:  Rplot.png
    Untracked:  analysis/Glucose_Adipose_Subcutaneous.Rmd
    Untracked:  analysis/Glucose_Adipose_Visceral_Omentum.Rmd
    Untracked:  analysis/Splicing_Test.Rmd
    Untracked:  code/.ipynb_checkpoints/
    Untracked:  code/AF_out/
    Untracked:  code/BMI_S_out/
    Untracked:  code/BMI_out/
    Untracked:  code/Glucose_out/
    Untracked:  code/LDL_S_out/
    Untracked:  code/T2D_out/
    Untracked:  code/ctwas_config.R
    Untracked:  code/mapping.R
    Untracked:  code/out/
    Untracked:  code/run_AF_analysis.sbatch
    Untracked:  code/run_AF_analysis.sh
    Untracked:  code/run_AF_ctwas_rss_LDR.R
    Untracked:  code/run_BMI_analysis.sbatch
    Untracked:  code/run_BMI_analysis.sh
    Untracked:  code/run_BMI_analysis_S.sbatch
    Untracked:  code/run_BMI_analysis_S.sh
    Untracked:  code/run_BMI_ctwas_rss_LDR.R
    Untracked:  code/run_BMI_ctwas_rss_LDR_S.R
    Untracked:  code/run_Glucose_analysis.sbatch
    Untracked:  code/run_Glucose_analysis.sh
    Untracked:  code/run_Glucose_ctwas_rss_LDR.R
    Untracked:  code/run_LDL_analysis_S.sbatch
    Untracked:  code/run_LDL_analysis_S.sh
    Untracked:  code/run_LDL_ctwas_rss_LDR_S.R
    Untracked:  code/run_T2D_analysis.sbatch
    Untracked:  code/run_T2D_analysis.sh
    Untracked:  code/run_T2D_ctwas_rss_LDR.R
    Untracked:  data/.ipynb_checkpoints/
    Untracked:  data/AF/
    Untracked:  data/BMI/
    Untracked:  data/BMI_S/
    Untracked:  data/Glucose/
    Untracked:  data/LDL_S/
    Untracked:  data/T2D/
    Untracked:  data/TEST/
    Untracked:  data/UKBB/
    Untracked:  data/UKBB_SNPs_Info.text
    Untracked:  data/gene_OMIM.txt
    Untracked:  data/gene_pip_0.8.txt
    Untracked:  data/mashr_Heart_Atrial_Appendage.db
    Untracked:  data/mashr_sqtl/
    Untracked:  data/summary_known_genes_annotations.xlsx
    Untracked:  data/untitled.txt

Note that any generated files, e.g. HTML, png, CSS, etc., are not included in this status report because it is ok for generated content to have uncommitted changes.


These are the previous versions of the repository in which changes were made to the R Markdown (analysis/BMI_Brain_Spinal_cord_cervical_c-1.Rmd) and HTML (docs/BMI_Brain_Spinal_cord_cervical_c-1.html) files. If you’ve configured a remote Git repository (see ?wflow_git_remote), click on the hyperlinks in the table below to view the files as they were in that past version.

File Version Author Date Message
Rmd bbf6737 sq-96 2022-02-21 update
html 91f38fa sq-96 2022-02-13 Build site.
Rmd eb13ecf sq-96 2022-02-13 update
html e6bc169 sq-96 2022-02-13 Build site.
Rmd 87fee8b sq-96 2022-02-13 update

Weight QC

#number of imputed weights
nrow(qclist_all)
[1] 10532
#number of imputed weights by chromosome
table(qclist_all$chr)

   1    2    3    4    5    6    7    8    9   10   11   12   13   14   15   16 
1032  741  611  415  500  593  513  394  399  413  617  585  228  354  366  467 
  17   18   19   20   21   22 
 630  170  801  314  124  265 
#number of imputed weights without missing variants
sum(qclist_all$nmiss==0)
[1] 8584
#proportion of imputed weights without missing variants
mean(qclist_all$nmiss==0)
[1] 0.815
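
The same QC summaries can be tried on a small stand-in table. In the sketch below, `qclist_toy` and its values are made up for illustration; only the counts in the last lines are copied from the output above, and they confirm that the per-chromosome counts add up to the total and that the reported proportion follows from the two reported counts.

```r
# Toy stand-in for qclist_all: one row per imputed weight (values are made up).
qclist_toy <- data.frame(
  chr   = c(1, 1, 2, 2, 3),
  nmiss = c(0, 2, 0, 0, 1)
)

n_weights     <- nrow(qclist_toy)             # total number of weights
per_chr       <- table(qclist_toy$chr)        # weights per chromosome
n_complete    <- sum(qclist_toy$nmiss == 0)   # weights without missing variants
prop_complete <- mean(qclist_toy$nmiss == 0)  # same, as a proportion

# Consistency checks on the reported numbers.
reported_per_chr <- c(1032, 741, 611, 415, 500, 593, 513, 394, 399, 413, 617,
                      585, 228, 354, 366, 467, 630, 170, 801, 314, 124, 265)
reported_total <- sum(reported_per_chr)       # should match nrow(qclist_all)
reported_prop  <- 8584 / 10532                # should match mean(nmiss == 0)
```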

Check convergence of parameters

Version Author Date
e6bc169 sq-96 2022-02-13
#estimated group prior
estimated_group_prior <- group_prior_rec[,ncol(group_prior_rec)]
names(estimated_group_prior) <- c("gene", "snp")
estimated_group_prior["snp"] <- estimated_group_prior["snp"]*thin #adjust parameter to account for thin argument
print(estimated_group_prior)
     gene       snp 
0.0053427 0.0002973 
#estimated group prior variance
estimated_group_prior_var <- group_prior_var_rec[,ncol(group_prior_var_rec)]
names(estimated_group_prior_var) <- c("gene", "snp")
print(estimated_group_prior_var)
 gene   snp 
30.33 17.10 
#report sample size
print(sample_size)
[1] 336107
#report group size
group_size <- c(nrow(ctwas_gene_res), n_snps)
print(group_size)
[1]   10532 7535010
#estimated group PVE
estimated_group_pve <- estimated_group_prior_var*estimated_group_prior*group_size/sample_size #check PVE calculation
names(estimated_group_pve) <- c("gene", "snp")
print(estimated_group_pve)
    gene      snp 
0.005079 0.113944 
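
As a sanity check on the PVE formula, the printed estimates can be plugged back in by hand. This is a sketch using the rounded values shown above, so the result should agree with the printed PVE only up to rounding.

```r
# Values copied from the printed output above (rounded as printed).
prior       <- c(gene = 0.0053427, snp = 0.0002973)  # estimated group prior
prior_var   <- c(gene = 30.33,     snp = 17.10)      # estimated group prior variance
group_size  <- c(gene = 10532,     snp = 7535010)    # number of genes and SNPs
sample_size <- 336107

# PVE per group = prior variance * prior inclusion probability * group size / n
pve <- prior_var * prior * group_size / sample_size
print(signif(pve, 3))
```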
#compare sum(PIP*mu2/sample_size) with above PVE calculation
c(sum(ctwas_gene_res$PVE),sum(ctwas_snp_res$PVE))
[1]  0.4381 15.3065

Genes with highest PIPs

           genename region_tag susie_pip      mu2       PVE      z num_eqtl
9777        KLHDC8B       3_34    1.0000  2542.58 7.565e-03 -5.052        2
789            SDHA        5_1    1.0000 21042.52 6.261e-02  3.012        1
11293    AC078842.3       7_84    1.0000 19171.30 5.704e-02 -3.208        1
4274        IGHMBP2      11_38    1.0000 31336.96 9.324e-02 -4.379        1
5140          MFAP1      15_16    1.0000 29203.28 8.689e-02  4.303        1
695           MAPK6      15_21    1.0000 28649.03 8.524e-02 -4.646        1
7232          PPM1M       3_36    1.0000   352.45 1.049e-03  4.732        2
1472          ASCC2      22_10    0.7776  8574.76 1.984e-02 -2.816        2
8471         EFEMP2      11_36    0.7758    50.61 1.168e-04 -7.484        2
12868         PANO1       11_1    0.7576    27.28 6.149e-05  4.979        2
3443          ZMIZ2       7_33    0.7505    67.76 1.513e-04 -8.105        1
7328          ZNF12        7_9    0.7474    28.10 6.249e-05  5.065        2
1464          RASD2      22_14    0.7403    24.87 5.479e-05 -4.362        2
12828 RP11-340F14.6      12_74    0.7297    30.02 6.517e-05 -4.742        2
11162         VPS52       6_28    0.7203   127.03 2.722e-04  1.606        1
2760         PDCD10      3_103    0.7017    23.84 4.978e-05 -4.065        1
2845          ITGB6       2_96    0.6533    59.14 1.150e-04  5.515        1
3959          KLK14      19_35    0.6373    28.25 5.356e-05 -4.062        1
1588           NINL      20_19    0.6093    34.74 6.298e-05 -5.532        2
1304           CBX5      12_33    0.6017    25.84 4.626e-05  4.691        1
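
The PVE column in this table is consistent with `susie_pip * mu2 / sample_size`, the per-gene quantity named in the comparison above. A minimal check on three rows copied from the table, with the sample size taken from the printed `sample_size`; small differences are expected because the printed values are rounded.

```r
sample_size <- 336107

# Three rows copied from the table above.
top_genes <- data.frame(
  genename  = c("KLHDC8B", "IGHMBP2", "ASCC2"),
  susie_pip = c(1.0000, 1.0000, 0.7776),
  mu2       = c(2542.58, 31336.96, 8574.76),
  PVE       = c(7.565e-03, 9.324e-02, 1.984e-02)
)

# Recompute PVE and compare with the printed column.
top_genes$PVE_check <- top_genes$susie_pip * top_genes$mu2 / sample_size
top_genes$rel_diff  <- abs(top_genes$PVE_check - top_genes$PVE) / top_genes$PVE
print(top_genes)
```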

Genes with largest effect sizes

        genename region_tag susie_pip   mu2     PVE       z num_eqtl
10032    SLC38A3       3_35         0 66889 0.00000   6.726        1
7397     CCDC171       9_13         0 42633 0.00000   8.471        2
38          RBM6       3_35         0 40476 0.00000  12.536        1
7227       MST1R       3_35         0 34543 0.00000 -12.635        2
8111      CALML6        1_1         0 32851 0.00000  -5.718        1
4274     IGHMBP2      11_38         1 31337 0.09324  -4.379        1
9078       STX19       3_59         0 30619 0.00000  -5.060        1
5140       MFAP1      15_16         1 29203 0.08689   4.303        1
695        MAPK6      15_21         1 28649 0.08524  -4.646        1
1280       WDR76      15_16         0 25920 0.00000   4.454        1
2418       CPT1A      11_38         0 24846 0.00000  -4.677        1
4970       TMOD3      15_21         0 23045 0.00000   5.412        1
7223      RNF123       3_35         0 22890 0.00000 -10.959        1
789         SDHA        5_1         1 21043 0.06261   3.012        1
4873     TUBGCP4      15_16         0 20509 0.00000   3.366        1
11293 AC078842.3       7_84         1 19171 0.05704  -3.208        1
7966        ADAL      15_16         0 17919 0.00000  -2.861        1
7967       LCMT2      15_16         0 17919 0.00000  -2.861        1
9868       HYAL3       3_35         0 17850 0.00000   6.264        2
859         MCM6       2_80         0 17636 0.00000  -3.886        1

Genes with highest PVE

           genename region_tag susie_pip      mu2       PVE       z num_eqtl
4274        IGHMBP2      11_38   1.00000 31336.96 9.324e-02  -4.379        1
5140          MFAP1      15_16   1.00000 29203.28 8.689e-02   4.303        1
695           MAPK6      15_21   1.00000 28649.03 8.524e-02  -4.646        1
789            SDHA        5_1   1.00000 21042.52 6.261e-02   3.012        1
11293    AC078842.3       7_84   1.00000 19171.30 5.704e-02  -3.208        1
1472          ASCC2      22_10   0.77762  8574.76 1.984e-02  -2.816        2
9777        KLHDC8B       3_34   1.00000  2542.58 7.565e-03  -5.052        2
262            CPS1      2_124   0.49202  4722.15 6.913e-03   3.535        1
2872         LANCL1      2_124   0.49202  4722.15 6.913e-03  -3.535        1
7232          PPM1M       3_36   1.00000   352.45 1.049e-03   4.732        2
11162         VPS52       6_28   0.72030   127.03 2.722e-04   1.606        1
8727         ASPHD1      16_24   0.59151   120.67 2.124e-04 -11.849        1
10189        ATP2A1      16_23   0.50558   100.96 1.519e-04 -10.759        1
3443          ZMIZ2       7_33   0.75046    67.76 1.513e-04  -8.105        1
6433          GPR61       1_67   0.56829    81.16 1.372e-04   8.755        1
10756        LY6G5C       6_26   0.43405   106.00 1.369e-04   8.418        1
12619 CTD-2186M15.3       5_22   0.02121  2014.86 1.271e-04   2.934        2
8471         EFEMP2      11_36   0.77583    50.61 1.168e-04  -7.484        2
2845          ITGB6       2_96   0.65332    59.14 1.150e-04   5.515        1
13013        DHRS11      17_22   0.51954    63.46 9.809e-05  -8.128        1

Genes with largest z scores

       genename region_tag susie_pip      mu2       PVE       z num_eqtl
7227      MST1R       3_35 0.000e+00 34543.29 0.000e+00 -12.635        2
38         RBM6       3_35 0.000e+00 40475.73 0.000e+00  12.536        1
8727     ASPHD1      16_24 5.915e-01   120.67 2.124e-04 -11.849        1
1048      EFR3B       2_15 1.115e-08   203.25 6.742e-12  11.587        1
8728     KCTD13      16_24 6.773e-02   115.84 2.334e-05 -11.491        1
8068     INO80E      16_24 1.260e-02   103.18 3.868e-06  11.077        1
7223     RNF123       3_35 0.000e+00 22889.67 0.000e+00 -10.959        1
1721      MAPK3      16_24 1.146e-02   103.03 3.512e-06  10.880        1
10189    ATP2A1      16_23 5.056e-01   100.96 1.519e-04 -10.759        1
11438    NPIPB7      16_23 7.273e-02   100.96 2.185e-05  10.510        1
10225   SULT1A1      16_23 2.947e-02    99.07 8.685e-06  10.367        1
10271  C6orf106       6_28 5.814e-05   124.60 2.155e-08 -10.264        1
10322   SULT1A2      16_23 1.683e-02    95.72 4.794e-06 -10.171        2
7747     ZNF668      16_24 1.090e-01    80.22 2.602e-05  10.000        1
5341       SAE1      19_33 1.095e-03   100.69 3.281e-07   9.849        1
8426    C1QTNF4      11_29 6.346e-03    90.12 1.701e-06   9.564        1
11752 LINC00461       5_52 8.339e-11   357.45 8.868e-14   9.418        1
10335      IL27      16_23 1.421e-02    81.12 3.430e-06  -9.140        1
8427      NEGR1       1_46 9.461e-02    76.50 2.153e-05  -8.928        1
7515      PSMC3      11_29 6.783e-03    78.61 1.586e-06  -8.866        1

Comparing z scores and PIPs

#proportion of genes with |z| above the TWAS significance threshold
[1] 0.02136
       genename region_tag susie_pip      mu2       PVE       z num_eqtl
7227      MST1R       3_35 0.000e+00 34543.29 0.000e+00 -12.635        2
38         RBM6       3_35 0.000e+00 40475.73 0.000e+00  12.536        1
8727     ASPHD1      16_24 5.915e-01   120.67 2.124e-04 -11.849        1
1048      EFR3B       2_15 1.115e-08   203.25 6.742e-12  11.587        1
8728     KCTD13      16_24 6.773e-02   115.84 2.334e-05 -11.491        1
8068     INO80E      16_24 1.260e-02   103.18 3.868e-06  11.077        1
7223     RNF123       3_35 0.000e+00 22889.67 0.000e+00 -10.959        1
1721      MAPK3      16_24 1.146e-02   103.03 3.512e-06  10.880        1
10189    ATP2A1      16_23 5.056e-01   100.96 1.519e-04 -10.759        1
11438    NPIPB7      16_23 7.273e-02   100.96 2.185e-05  10.510        1
10225   SULT1A1      16_23 2.947e-02    99.07 8.685e-06  10.367        1
10271  C6orf106       6_28 5.814e-05   124.60 2.155e-08 -10.264        1
10322   SULT1A2      16_23 1.683e-02    95.72 4.794e-06 -10.171        2
7747     ZNF668      16_24 1.090e-01    80.22 2.602e-05  10.000        1
5341       SAE1      19_33 1.095e-03   100.69 3.281e-07   9.849        1
8426    C1QTNF4      11_29 6.346e-03    90.12 1.701e-06   9.564        1
11752 LINC00461       5_52 8.339e-11   357.45 8.868e-14   9.418        1
10335      IL27      16_23 1.421e-02    81.12 3.430e-06  -9.140        1
8427      NEGR1       1_46 9.461e-02    76.50 2.153e-05  -8.928        1
7515      PSMC3      11_29 6.783e-03    78.61 1.586e-06  -8.866        1
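
The value 0.02136 at the top of this section matches the fraction of genes that pass the TWAS significance threshold reported in the silver standard section (225 TWAS genes out of 10532, threshold 4.576). The sketch below assumes that threshold is a two-sided Bonferroni cutoff at a family-wise level of 0.05; that form is an assumption inferred from the printed value, not something the page states.

```r
n_genes <- 10532   # number of genes with imputed expression
alpha   <- 0.05    # assumed family-wise error rate

# Two-sided Bonferroni threshold on |z| (assumed form of sig_thresh).
z_thresh <- qnorm(alpha / n_genes / 2, lower.tail = FALSE)

# Fraction of genes passing the threshold, from the reported count of TWAS genes.
n_twas   <- 225
prop_sig <- n_twas / n_genes

print(round(c(z_thresh = z_thresh, prop_sig = prop_sig), 5))
```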

GO enrichment analysis for genes with PIP>0.5

#number of genes for gene set enrichment
length(genes)
[1] 32
Uploading data to Enrichr... Done.
  Querying GO_Biological_Process_2021... Done.
  Querying GO_Cellular_Component_2021... Done.
  Querying GO_Molecular_Function_2021... Done.
Parsing results... Done.
[1] "GO_Biological_Process_2021"

[1] Term             Overlap          Adjusted.P.value Genes           
<0 rows> (or 0-length row.names)
[1] "GO_Cellular_Component_2021"

                               Term Overlap Adjusted.P.value        Genes
1          microfibril (GO:0001527)    2/11         0.007029 EFEMP2;MFAP1
2 supramolecular fiber (GO:0099512)    2/19         0.010840 EFEMP2;MFAP1
[1] "GO_Molecular_Function_2021"

[1] Term             Overlap          Adjusted.P.value Genes           
<0 rows> (or 0-length row.names)

DisGeNET enrichment analysis for genes with PIP>0.5

                                             Description     FDR Ratio BgRatio
45          Nodular Sclerosis Classical Hodgkin Lymphoma 0.01404  1/15  1/9703
77                                        Brody myopathy 0.01404  1/15  1/9703
84   SPINAL MUSCULAR ATROPHY WITH RESPIRATORY DISTRESS 1 0.01404  1/15  1/9703
85                    Cerebral Cavernous Malformations 3 0.01404  1/15  1/9703
92              Familial cerebral cavernous malformation 0.01404  1/15  1/9703
95                          CARDIOMYOPATHY, DILATED, 1GG 0.01404  1/15  1/9703
97                                      PARAGANGLIOMAS 5 0.01404  1/15  1/9703
98              CUTIS LAXA, AUTOSOMAL RECESSIVE, TYPE IB 0.01404  1/15  1/9703
100 CHARCOT-MARIE-TOOTH DISEASE, DOMINANT INTERMEDIATE F 0.01404  1/15  1/9703
103         CHARCOT-MARIE-TOOTH DISEASE, AXONAL, TYPE 2S 0.01404  1/15  1/9703
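
To make the Ratio and BgRatio columns concrete, here is a sketch of the raw enrichment p-value for a single row, assuming a one-sided hypergeometric test (the variable names are illustrative). The FDR column additionally reflects a multiple-testing correction over all tested diseases, which this sketch does not attempt to reproduce.

```r
# Ratio   1/15:   1 of the 15 annotated query genes carries the disease label.
# BgRatio 1/9703: 1 of the 9703 background genes carries it.
hits_in_query <- 1
query_size    <- 15
hits_in_bg    <- 1
bg_size       <- 9703

# P(at least hits_in_query labelled genes among query_size draws without replacement)
p_raw <- phyper(hits_in_query - 1,
                m = hits_in_bg,
                n = bg_size - hits_in_bg,
                k = query_size,
                lower.tail = FALSE)
print(p_raw)
```

With a single labelled gene in the background, this probability reduces to `query_size / bg_size`.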

WebGestalt enrichment analysis for genes with PIP>0.5

Loading the functional categories...
Loading the ID list...
Loading the reference list...
Performing the enrichment analysis...
Warning in oraEnrichment(interestGeneList, referenceGeneList, geneSet, minNum =
minNum, : No significant gene set is identified based on FDR 0.05!
NULL

PIP Manhattan Plot

Sensitivity, specificity and precision for silver standard genes

#number of genes in known annotations
print(length(known_annotations))
[1] 41
#number of genes in known annotations with imputed expression
print(sum(known_annotations %in% ctwas_gene_res$genename))
[1] 22
#significance threshold for TWAS
print(sig_thresh)
[1] 4.576
#number of ctwas genes
length(ctwas_genes)
[1] 7
#number of TWAS genes
length(twas_genes)
[1] 225
#show novel genes (ctwas genes not in TWAS genes)
ctwas_gene_res[ctwas_gene_res$genename %in% novel_genes,report_cols]
        genename region_tag susie_pip   mu2     PVE      z num_eqtl
789         SDHA        5_1         1 21043 0.06261  3.012        1
11293 AC078842.3       7_84         1 19171 0.05704 -3.208        1
4274     IGHMBP2      11_38         1 31337 0.09324 -4.379        1
5140       MFAP1      15_16         1 29203 0.08689  4.303        1
#sensitivity / recall
print(sensitivity)
  ctwas    TWAS 
0.00000 0.07317 
#specificity
print(specificity)
 ctwas   TWAS 
0.9993 0.9789 
#precision / PPV
print(precision)
  ctwas    TWAS 
0.00000 0.01333 
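
The three metrics can be reproduced from counts. In the sketch below, the true-positive counts are inferred from the printed sensitivity (0.07317 × 41 ≈ 3) rather than reported directly, and the set of negatives is assumed to be the genes with imputed expression that are not in the known annotations; both are assumptions made for illustration.

```r
n_known         <- 41      # genes in known annotations
n_imputed       <- 10532   # genes with imputed expression
n_known_imputed <- 22      # known genes with imputed expression
n_called <- c(ctwas = 7, TWAS = 225)   # genes called by each method
tp       <- c(ctwas = 0, TWAS = 3)     # inferred true positives (assumption)

sensitivity <- tp / n_known            # recall over all known genes
precision   <- tp / n_called           # share of called genes that are known

n_negative  <- n_imputed - n_known_imputed       # assumed negative set
specificity <- 1 - (n_called - tp) / n_negative  # 1 - false positive rate

print(round(rbind(sensitivity, specificity, precision), 5))
```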


sessionInfo()
R version 3.6.1 (2019-07-05)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Scientific Linux 7.4 (Nitrogen)

Matrix products: default
BLAS/LAPACK: /software/openblas-0.2.19-el7-x86_64/lib/libopenblas_haswellp-r0.2.19.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] readxl_1.3.1      forcats_0.5.1     stringr_1.4.0     dplyr_1.0.7      
 [5] purrr_0.3.4       readr_2.1.1       tidyr_1.1.4       tidyverse_1.3.1  
 [9] tibble_3.1.6      WebGestaltR_0.4.4 disgenet2r_0.99.2 enrichR_3.0      
[13] cowplot_1.0.0     ggplot2_3.3.5     workflowr_1.6.2  

loaded via a namespace (and not attached):
 [1] fs_1.5.2          lubridate_1.8.0   bit64_4.0.5       doParallel_1.0.16
 [5] httr_1.4.2        rprojroot_2.0.2   tools_3.6.1       backports_1.4.1  
 [9] doRNG_1.8.2       utf8_1.2.2        R6_2.5.1          vipor_0.4.5      
[13] DBI_1.1.1         colorspace_2.0-2  withr_2.4.3       ggrastr_1.0.1    
[17] tidyselect_1.1.1  bit_4.0.4         curl_4.3.2        compiler_3.6.1   
[21] git2r_0.26.1      cli_3.1.0         rvest_1.0.2       Cairo_1.5-12.2   
[25] xml2_1.3.3        labeling_0.4.2    scales_1.1.1      apcluster_1.4.8  
[29] digest_0.6.29     rmarkdown_2.11    svglite_1.2.2     pkgconfig_2.0.3  
[33] htmltools_0.5.2   dbplyr_2.1.1      fastmap_1.1.0     highr_0.9        
[37] rlang_0.4.12      rstudioapi_0.13   RSQLite_2.2.8     jquerylib_0.1.4  
[41] farver_2.1.0      generics_0.1.1    jsonlite_1.7.2    vroom_1.5.7      
[45] magrittr_2.0.1    Matrix_1.2-18     ggbeeswarm_0.6.0  Rcpp_1.0.7       
[49] munsell_0.5.0     fansi_0.5.0       gdtools_0.1.9     lifecycle_1.0.1  
[53] stringi_1.7.6     whisker_0.3-2     yaml_2.2.1        plyr_1.8.6       
[57] grid_3.6.1        blob_1.2.2        ggrepel_0.9.1     parallel_3.6.1   
[61] promises_1.0.1    crayon_1.4.2      lattice_0.20-38   haven_2.4.3      
[65] hms_1.1.1         knitr_1.36        pillar_1.6.4      igraph_1.2.10    
[69] rjson_0.2.20      rngtools_1.5.2    reshape2_1.4.4    codetools_0.2-16 
[73] reprex_2.0.1      glue_1.5.1        evaluate_0.14     data.table_1.14.2
[77] modelr_0.1.8      vctrs_0.3.8       tzdb_0.2.0        httpuv_1.5.1     
[81] foreach_1.5.1     cellranger_1.1.0  gtable_0.3.0      assertthat_0.2.1 
[85] cachem_1.0.6      xfun_0.29         broom_0.7.10      later_0.8.0      
[89] iterators_1.0.13  beeswarm_0.2.3    memoise_2.0.1     ellipsis_0.3.2