Last updated: 2022-02-21

Checks: 6 passed, 1 warning

Knit directory: cTWAS_analysis/

This reproducible R Markdown analysis was created with workflowr (version 1.6.2). The Checks tab describes the reproducibility checks that were applied when the results were created. The Past versions tab lists the development history.


Great! Since the R Markdown file has been committed to the Git repository, you know the exact version of the code that produced these results.

Great job! The global environment was empty. Objects defined in the global environment can affect the analysis in your R Markdown file in unknown ways. For reproducibility it’s best to always run the code in an empty environment.

The command set.seed(20211220) was run prior to running the code in the R Markdown file. Setting a seed ensures that any results that rely on randomness, e.g. subsampling or permutations, are reproducible.
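As a minimal illustration of why the seed matters (the `sample()` call below is only an example and is not part of this analysis), re-setting the same seed reproduces the same random draw:

```r
# Re-setting the seed restarts the random number stream,
# so the same call yields the same "random" result.
set.seed(20211220)
first_draw <- sample(1:100, 5)

set.seed(20211220)
second_draw <- sample(1:100, 5)

# TRUE when both draws are element-for-element the same
identical(first_draw, second_draw)
```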

Great job! Recording the operating system, R version, and package versions is critical for reproducibility.

Nice! There were no cached chunks for this analysis, so you can be confident that you successfully produced the results during this run.

Using absolute paths to the files within your workflowr project makes it difficult for you and others to run your code on a different machine. Change the absolute path(s) below to the suggested relative path(s) to make your code more reproducible.

absolute                                                            relative
/project2/xinhe/shengqian/cTWAS/cTWAS_analysis/data/                data
/project2/xinhe/shengqian/cTWAS/cTWAS_analysis/code/ctwas_config.R  code/ctwas_config.R
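One way to apply this suggestion is to build paths relative to the project root instead of hard-coding the absolute prefix. The sketch below uses a hypothetical helper, `to_relative()`, which is not part of workflowr or of this analysis; it only shows how the absolute paths listed above map to the suggested relative ones.

```r
# Hypothetical helper: strip a project root prefix from an absolute path.
to_relative <- function(path, root) {
  root <- sub("/+$", "", root)   # drop any trailing slash on the root
  prefix <- paste0(root, "/")
  inside <- startsWith(path, prefix)
  # Paths outside the project are returned unchanged.
  ifelse(inside, substring(path, nchar(prefix) + 1), path)
}

project_root <- "/project2/xinhe/shengqian/cTWAS/cTWAS_analysis"

config_rel <- to_relative(file.path(project_root, "code/ctwas_config.R"), project_root)
data_rel   <- to_relative(file.path(project_root, "data"), project_root)

print(config_rel)
print(data_rel)
```

With relative paths, a call such as `source("code/ctwas_config.R")` works on any machine as long as the code is run from the project directory.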

Great! You are using Git for version control. Tracking code development and connecting the code version to the results is critical for reproducibility.

The results in this page were generated with repository version bbf6737. See the Past versions tab to see a history of the changes made to the R Markdown and HTML files.

Note that you need to be careful to ensure that all relevant files for the analysis have been committed to Git prior to generating the results (you can use wflow_publish or wflow_git_commit). workflowr only checks the R Markdown file, but you know if there are other scripts or data files that it depends on. Below is the status of the Git repository when the results were generated:


Ignored files:
    Ignored:    .ipynb_checkpoints/
    Ignored:    analysis/figure/

Untracked files:
    Untracked:  Rplot.png
    Untracked:  analysis/Glucose_Adipose_Subcutaneous.Rmd
    Untracked:  analysis/Glucose_Adipose_Visceral_Omentum.Rmd
    Untracked:  analysis/Splicing_Test.Rmd
    Untracked:  code/.ipynb_checkpoints/
    Untracked:  code/AF_out/
    Untracked:  code/BMI_S_out/
    Untracked:  code/BMI_out/
    Untracked:  code/Glucose_out/
    Untracked:  code/LDL_S_out/
    Untracked:  code/T2D_out/
    Untracked:  code/ctwas_config.R
    Untracked:  code/mapping.R
    Untracked:  code/out/
    Untracked:  code/run_AF_analysis.sbatch
    Untracked:  code/run_AF_analysis.sh
    Untracked:  code/run_AF_ctwas_rss_LDR.R
    Untracked:  code/run_BMI_analysis.sbatch
    Untracked:  code/run_BMI_analysis.sh
    Untracked:  code/run_BMI_analysis_S.sbatch
    Untracked:  code/run_BMI_analysis_S.sh
    Untracked:  code/run_BMI_ctwas_rss_LDR.R
    Untracked:  code/run_BMI_ctwas_rss_LDR_S.R
    Untracked:  code/run_Glucose_analysis.sbatch
    Untracked:  code/run_Glucose_analysis.sh
    Untracked:  code/run_Glucose_ctwas_rss_LDR.R
    Untracked:  code/run_LDL_analysis_S.sbatch
    Untracked:  code/run_LDL_analysis_S.sh
    Untracked:  code/run_LDL_ctwas_rss_LDR_S.R
    Untracked:  code/run_T2D_analysis.sbatch
    Untracked:  code/run_T2D_analysis.sh
    Untracked:  code/run_T2D_ctwas_rss_LDR.R
    Untracked:  data/.ipynb_checkpoints/
    Untracked:  data/AF/
    Untracked:  data/BMI/
    Untracked:  data/BMI_S/
    Untracked:  data/Glucose/
    Untracked:  data/LDL_S/
    Untracked:  data/T2D/
    Untracked:  data/TEST/
    Untracked:  data/UKBB/
    Untracked:  data/UKBB_SNPs_Info.text
    Untracked:  data/gene_OMIM.txt
    Untracked:  data/gene_pip_0.8.txt
    Untracked:  data/mashr_Heart_Atrial_Appendage.db
    Untracked:  data/mashr_sqtl/
    Untracked:  data/summary_known_genes_annotations.xlsx
    Untracked:  data/untitled.txt

Note that any generated files, e.g. HTML, png, CSS, etc., are not included in this status report because it is ok for generated content to have uncommitted changes.


These are the previous versions of the repository in which changes were made to the R Markdown (analysis/BMI_Brain_Amygdala.Rmd) and HTML (docs/BMI_Brain_Amygdala.html) files. If you’ve configured a remote Git repository (see ?wflow_git_remote), click on the hyperlinks in the table below to view the files as they were in that past version.

File  Version  Author  Date        Message
Rmd   bbf6737  sq-96   2022-02-21  update
html  91f38fa  sq-96   2022-02-13  Build site.
Rmd   eb13ecf  sq-96   2022-02-13  update
html  e6bc169  sq-96   2022-02-13  Build site.
Rmd   87fee8b  sq-96   2022-02-13  update

Weight QC

#number of imputed weights
nrow(qclist_all)
[1] 10285
#number of imputed weights by chromosome
table(qclist_all$chr)

   1    2    3    4    5    6    7    8    9   10   11   12   13   14   15   16 
1009  735  594  399  502  598  475  370  404  396  626  561  223  326  336  453 
  17   18   19   20   21   22 
 604  162  819  311  117  265 
#number of imputed weights without missing variants
sum(qclist_all$nmiss==0)
[1] 8365
#proportion of imputed weights without missing variants
mean(qclist_all$nmiss==0)
[1] 0.8133

Check convergence of parameters

Past versions of figure:
Version  Author  Date
e6bc169  sq-96   2022-02-13
#estimated group prior
estimated_group_prior <- group_prior_rec[,ncol(group_prior_rec)]
names(estimated_group_prior) <- c("gene", "snp")
estimated_group_prior["snp"] <- estimated_group_prior["snp"]*thin #adjust parameter to account for thin argument
print(estimated_group_prior)
     gene       snp 
0.0078519 0.0002945 
#estimated group prior variance
estimated_group_prior_var <- group_prior_var_rec[,ncol(group_prior_var_rec)]
names(estimated_group_prior_var) <- c("gene", "snp")
print(estimated_group_prior_var)
 gene   snp 
22.61 17.40 
#report sample size
print(sample_size)
[1] 336107
#report group size
group_size <- c(nrow(ctwas_gene_res), n_snps)
print(group_size)
[1]   10285 7535010
#estimated group PVE
estimated_group_pve <- estimated_group_prior_var*estimated_group_prior*group_size/sample_size #check PVE calculation
names(estimated_group_pve) <- c("gene", "snp")
print(estimated_group_pve)
    gene      snp 
0.005432 0.114855 
#compare sum(PIP*mu2/sample_size) with above PVE calculation
c(sum(ctwas_gene_res$PVE),sum(ctwas_snp_res$PVE))
[1]  0.1859 16.1230
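The group PVE formula above can be checked by hand from the printed estimates. The sketch below re-enters the rounded values shown in the output (so the result can differ slightly from the reported PVE in the last digits) and applies the same expression, prior variance × prior × group size / sample size.

```r
# Values copied from the printed output above (rounded as displayed).
prior       <- c(gene = 0.0078519, snp = 0.0002945)  # estimated group prior
prior_var   <- c(gene = 22.61,     snp = 17.40)      # estimated group prior variance
group_size  <- c(gene = 10285,     snp = 7535010)    # number of genes / SNPs
sample_size <- 336107

# Group PVE = prior variance * prior inclusion probability * group size / sample size
group_pve <- prior_var * prior * group_size / sample_size
print(signif(group_pve, 4))
```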

Genes with highest PIPs

          genename region_tag susie_pip      mu2       PVE      z num_eqtl
7067         PPM1M       3_36    1.0000   203.85 6.065e-04  4.323        2
9534          GSAP       7_49    1.0000 31328.56 9.321e-02  5.260        1
9062          TMIE       3_33    0.9989    35.23 1.047e-04 -6.902        2
3144         CCND2       12_4    0.9639    29.10 8.345e-05 -5.120        1
2362        B3GAT1      11_84    0.9273    25.66 7.080e-05 -4.502        2
7314         CASP7      10_71    0.8414    24.92 6.240e-05  4.584        1
1743          TSC2       16_2    0.8369    30.80 7.669e-05  5.278        1
10858      SLC12A8       3_77    0.8108    22.51 5.431e-05 -4.338        1
129         CELSR3       3_34    0.8058    57.17 1.370e-04 -7.731        1
4947          SUOX      12_35    0.7929    57.59 1.359e-04 -5.807        1
6745          TAL1       1_29    0.7916    49.63 1.169e-04 -6.866        1
7155         ZNF12        7_9    0.7764    25.85 5.972e-05  4.972        2
4469          HEY2       6_84    0.7704 33571.89 7.696e-02  4.930        1
12540 RP11-823E8.3      12_54    0.7460    31.13 6.909e-05 -6.438        1
8259        EFEMP2      11_36    0.7404    97.56 2.149e-04 -7.542        2
3049        PRRC2C       1_84    0.7402    28.99 6.384e-05 -5.173        1
2893        SLC1A4       2_42    0.7264    23.46 5.069e-05 -4.047        1
7589       NCKAP5L      12_31    0.7218    48.20 1.035e-04 -8.217        1
10874        VPS52       6_28    0.7097   125.93 2.659e-04  1.606        1
12020    LINC01977      17_45    0.7064    28.57 6.004e-05  5.230        1

Genes with largest effect sizes

      genename region_tag susie_pip   mu2       PVE       z num_eqtl
9746   SLC38A3       3_35 0.000e+00 67724 0.000e+00   6.726        1
7061     CAMKV       3_35 0.000e+00 53039 0.000e+00   9.848        1
7212   CCDC171       9_13 0.000e+00 50688 0.000e+00   7.997        1
7063     MST1R       3_35 0.000e+00 34978 0.000e+00 -12.602        2
4469      HEY2       6_84 7.704e-01 33572 7.696e-02   4.930        1
8838     DHFR2       3_59 0.000e+00 32025 0.000e+00   5.146        1
9534      GSAP       7_49 1.000e+00 31329 9.321e-02   5.260        1
8841     STX19       3_59 0.000e+00 31018 0.000e+00  -5.060        1
7432      LEO1      15_21 2.077e-07 27409 1.694e-08   4.647        1
5003    LYSMD2      15_21 0.000e+00 26190 0.000e+00  -4.403        1
4997     MFAP1      15_16 1.608e-07 23703 1.134e-08   4.303        1
7058    RNF123       3_35 0.000e+00 23172 0.000e+00 -10.957        1
1259     WDR76      15_16 0.000e+00 21159 0.000e+00   4.859        2
9777      DPYD       1_60 0.000e+00 19618 0.000e+00  -3.213        1
849       MCM6       2_80 0.000e+00 17859 0.000e+00  -3.886        1
4751   TUBGCP4      15_16 0.000e+00 16922 0.000e+00   3.371        1
10116   ENTPD6      20_18 0.000e+00 16405 0.000e+00  -5.561        1
8836     NSUN3       3_59 0.000e+00 15636 0.000e+00   4.755        1
7782      ADAL      15_16 0.000e+00 14788 0.000e+00  -2.861        1
7783     LCMT2      15_16 0.000e+00 14376 0.000e+00  -3.087        2

Genes with highest PVE

          genename region_tag susie_pip      mu2       PVE      z num_eqtl
9534          GSAP       7_49    1.0000 31328.56 9.321e-02  5.260        1
4469          HEY2       6_84    0.7704 33571.89 7.696e-02  4.930        1
9966        TTC30B      2_107    0.3199   757.36 7.208e-04 -3.137        1
7067         PPM1M       3_36    1.0000   203.85 6.065e-04  4.323        2
10874        VPS52       6_28    0.7097   125.93 2.659e-04  1.606        1
8259        EFEMP2      11_36    0.7404    97.56 2.149e-04 -7.542        2
7134         SFXN1      5_105    0.0708  1012.70 2.133e-04 -3.398        1
129         CELSR3       3_34    0.8058    57.17 1.370e-04 -7.731        1
4947          SUOX      12_35    0.7929    57.59 1.359e-04 -5.807        1
6745          TAL1       1_29    0.7916    49.63 1.169e-04 -6.866        1
9062          TMIE       3_33    0.9989    35.23 1.047e-04 -6.902        2
7589       NCKAP5L      12_31    0.7218    48.20 1.035e-04 -8.217        1
9164         KCNB2       8_53    0.4664    63.18 8.767e-05 -8.057        2
3144         CCND2       12_4    0.9639    29.10 8.345e-05 -5.120        1
1743          TSC2       16_2    0.8369    30.80 7.669e-05  5.278        1
2362        B3GAT1      11_84    0.9273    25.66 7.080e-05 -4.502        2
12540 RP11-823E8.3      12_54    0.7460    31.13 6.909e-05 -6.438        1
8212         NEGR1       1_46    0.5012    45.67 6.810e-05 -8.928        1
3049        PRRC2C       1_84    0.7402    28.99 6.384e-05 -5.173        1
7314         CASP7      10_71    0.8414    24.92 6.240e-05  4.584        1
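The per-gene PVE column follows the expression quoted earlier on this page, PIP × mu2 / sample size. As a small check, the sketch below recomputes it for three rows of the table using the displayed (rounded) values; the column names `reported_PVE` and `recomputed_PVE` are introduced here for illustration only.

```r
sample_size <- 336107

# Three rows copied from the table above (values as displayed).
top <- data.frame(
  genename     = c("GSAP", "HEY2", "PPM1M"),
  susie_pip    = c(1.0000, 0.7704, 1.0000),
  mu2          = c(31328.56, 33571.89, 203.85),
  reported_PVE = c(9.321e-02, 7.696e-02, 6.065e-04)
)

# Per-gene PVE = PIP * mu2 / sample size
top$recomputed_PVE <- top$susie_pip * top$mu2 / sample_size
print(top)
```

Because PVE scales with mu2, genes such as GSAP and HEY2 with very large mu2 dominate this ranking even when other genes have comparable PIPs.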

Genes with largest z scores

            genename region_tag susie_pip      mu2       PVE       z num_eqtl
7063           MST1R       3_35 0.000e+00 34978.09 0.000e+00 -12.602        2
7058          RNF123       3_35 0.000e+00 23171.66 0.000e+00 -10.957        1
5866           TAOK2      16_24 2.301e-02    95.97 6.571e-06  10.738        1
11609 RP11-1348G14.4      16_23 1.472e-01    90.57 3.967e-05  10.603        1
9939         SULT1A1      16_23 9.114e-02    89.15 2.417e-05  10.415        1
10040        SULT1A2      16_23 9.114e-02    89.15 2.417e-05 -10.415        1
7566          ZNF668      16_24 1.131e-01    77.72 2.615e-05  10.000        1
7567          ZNF646      16_24 1.131e-01    77.72 2.615e-05 -10.000        1
5192            SAE1      19_33 4.601e-03   101.25 1.386e-06   9.849        1
7061           CAMKV       3_35 0.000e+00 53039.36 0.000e+00   9.848        1
8211         C1QTNF4      11_29 3.023e-02    96.82 8.707e-06   9.834        2
439            PRSS8      16_24 1.755e-02    72.64 3.792e-06  -9.765        1
7337           RAPSN      11_29 1.110e-02    87.11 2.877e-06   9.614        1
10701            LAT      16_23 1.333e-01    85.22 3.380e-05  -9.553        1
2358           MTCH2      11_29 9.832e-03    84.51 2.472e-06  -9.514        1
11572    CTC-467M3.3       5_52 1.492e-10   355.30 1.577e-13   9.482        1
8212           NEGR1       1_46 5.012e-01    45.67 6.810e-05  -8.928        1
7336           PSMC3      11_29 1.089e-02    74.52 2.413e-06  -8.866        1
1699           MAPK3      16_24 1.422e-02    68.99 2.918e-06   8.826        1
12567          RCC1L       7_48 1.287e-01    83.58 3.199e-05  -8.667        1

Comparing z scores and PIPs

[1] 0.02129
            genename region_tag susie_pip      mu2       PVE       z num_eqtl
7063           MST1R       3_35 0.000e+00 34978.09 0.000e+00 -12.602        2
7058          RNF123       3_35 0.000e+00 23171.66 0.000e+00 -10.957        1
5866           TAOK2      16_24 2.301e-02    95.97 6.571e-06  10.738        1
11609 RP11-1348G14.4      16_23 1.472e-01    90.57 3.967e-05  10.603        1
9939         SULT1A1      16_23 9.114e-02    89.15 2.417e-05  10.415        1
10040        SULT1A2      16_23 9.114e-02    89.15 2.417e-05 -10.415        1
7566          ZNF668      16_24 1.131e-01    77.72 2.615e-05  10.000        1
7567          ZNF646      16_24 1.131e-01    77.72 2.615e-05 -10.000        1
5192            SAE1      19_33 4.601e-03   101.25 1.386e-06   9.849        1
7061           CAMKV       3_35 0.000e+00 53039.36 0.000e+00   9.848        1
8211         C1QTNF4      11_29 3.023e-02    96.82 8.707e-06   9.834        2
439            PRSS8      16_24 1.755e-02    72.64 3.792e-06  -9.765        1
7337           RAPSN      11_29 1.110e-02    87.11 2.877e-06   9.614        1
10701            LAT      16_23 1.333e-01    85.22 3.380e-05  -9.553        1
2358           MTCH2      11_29 9.832e-03    84.51 2.472e-06  -9.514        1
11572    CTC-467M3.3       5_52 1.492e-10   355.30 1.577e-13   9.482        1
8212           NEGR1       1_46 5.012e-01    45.67 6.810e-05  -8.928        1
7336           PSMC3      11_29 1.089e-02    74.52 2.413e-06  -8.866        1
1699           MAPK3      16_24 1.422e-02    68.99 2.918e-06   8.826        1
12567          RCC1L       7_48 1.287e-01    83.58 3.199e-05  -8.667        1

GO enrichment analysis for genes with PIP>0.5

#number of genes for gene set enrichment
length(genes)
[1] 38
Uploading data to Enrichr... Done.
  Querying GO_Biological_Process_2021... Done.
  Querying GO_Cellular_Component_2021... Done.
  Querying GO_Molecular_Function_2021... Done.
Parsing results... Done.
[1] "GO_Biological_Process_2021"

                                                                 Term Overlap Adjusted.P.value       Genes
1     vascular associated smooth muscle cell development (GO:0097084)     2/7          0.01559 EFEMP2;HEY2
2 vascular associated smooth muscle cell differentiation (GO:0035886)     2/8          0.01559 EFEMP2;HEY2
3                                    aorta morphogenesis (GO:0035909)    2/17          0.03744 EFEMP2;HEY2
4                            muscle tissue morphogenesis (GO:0060415)    2/17          0.03744 EFEMP2;HEY2
[1] "GO_Cellular_Component_2021"

[1] Term             Overlap          Adjusted.P.value Genes           
<0 rows> (or 0-length row.names)
[1] "GO_Molecular_Function_2021"

[1] Term             Overlap          Adjusted.P.value Genes           
<0 rows> (or 0-length row.names)

DisGeNET enrichment analysis for genes with PIP>0.5

                                                           Description      FDR Ratio  BgRatio
43                                          Acute Myeloid Leukemia, M1 0.008685  4/19 125/9703
160                                    Acute Myeloid Leukemia (AML-M2) 0.008685  4/19 125/9703
37                                         Leukemia, Myelocytic, Acute 0.020284  4/19 173/9703
75                                          Sulfite oxidase deficiency 0.029827  1/19   1/9703
148                                    DEAFNESS, AUTOSOMAL RECESSIVE 6 0.029827  1/19   1/9703
165                                            RETINITIS PIGMENTOSA 42 0.029827  1/19   1/9703
167                                                   Sulfocysteinuria 0.029827  1/19   1/9703
171                           CUTIS LAXA, AUTOSOMAL RECESSIVE, TYPE IB 0.029827  1/19   1/9703
176                                                NEMALINE MYOPATHY 8 0.029827  1/19   1/9703
180 MEGALENCEPHALY-POLYMICROGYRIA-POLYDACTYLY-HYDROCEPHALUS SYNDROME 3 0.029827  1/19   1/9703
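To give a feel for how the Ratio and BgRatio columns relate to enrichment, the sketch below runs a one-sided hypergeometric (over-representation) test on the first row. This rests on two assumptions not confirmed by the output: that Ratio is overlapping genes / annotated query genes and BgRatio is disease genes / background genes, and that a hypergeometric test is an adequate stand-in for whatever test disgenet2r applies. The result is an unadjusted p-value, so it is not expected to equal the FDR column.

```r
# First row of the table: Ratio 4/19, BgRatio 125/9703.
k <- 4      # query genes annotated to the disease
n <- 19     # query genes with any annotation
K <- 125    # background genes annotated to the disease
N <- 9703   # background genes with any annotation

# P(X >= k) when drawing n genes from N, of which K are disease genes
p_value <- phyper(k - 1, K, N - K, n, lower.tail = FALSE)

# Fold enrichment: observed proportion over background proportion
fold_enrichment <- (k / n) / (K / N)

print(c(p_value = p_value, fold_enrichment = fold_enrichment))
```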

WebGestalt enrichment analysis for genes with PIP>0.5

Loading the functional categories...
Loading the ID list...
Loading the reference list...
Performing the enrichment analysis...
Warning in oraEnrichment(interestGeneList, referenceGeneList, geneSet, minNum =
minNum, : No significant gene set is identified based on FDR 0.05!
NULL

PIP Manhattan Plot

Warning: ggrepel: 1 unlabeled data points (too many overlaps). Consider
increasing max.overlaps

Sensitivity, specificity and precision for silver standard genes

#number of genes in known annotations
print(length(known_annotations))
[1] 41
#number of genes in known annotations with imputed expression
print(sum(known_annotations %in% ctwas_gene_res$genename))
[1] 22
#significance threshold for TWAS
print(sig_thresh)
[1] 4.571
#number of ctwas genes
length(ctwas_genes)
[1] 9
#number of TWAS genes
length(twas_genes)
[1] 219
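The reported threshold of 4.571 is consistent with a two-sided Bonferroni correction at level 0.05 across the 10285 genes with imputed weights. The sketch below assumes that this is how `sig_thresh` was derived, since the defining code is not shown on this page; `alpha` and `sig_thresh_check` are names introduced here.

```r
n_genes <- 10285   # genes with imputed weights (reported under Weight QC)
alpha   <- 0.05    # assumed family-wise error rate

# Two-sided Bonferroni threshold on the z-score scale
sig_thresh_check <- qnorm(1 - alpha / n_genes / 2)
print(round(sig_thresh_check, 3))

# Share of genes called by TWAS at this threshold (219 genes reported above)
twas_fraction <- 219 / n_genes
print(signif(twas_fraction, 4))
```

The second quantity matches the 0.02129 printed under "Comparing z scores and PIPs", which suggests that value is the proportion of genes whose absolute z score exceeds the threshold.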
#show novel genes (ctwas genes not in TWAS genes)
ctwas_gene_res[ctwas_gene_res$genename %in% novel_genes,report_cols]
      genename region_tag susie_pip    mu2       PVE      z num_eqtl
7067     PPM1M       3_36    1.0000 203.85 6.065e-04  4.323        2
10858  SLC12A8       3_77    0.8108  22.51 5.431e-05 -4.338        1
2362    B3GAT1      11_84    0.9273  25.66 7.080e-05 -4.502        2
#sensitivity / recall
print(sensitivity)
  ctwas    TWAS 
0.00000 0.09756 
#specificity
print(specificity)
 ctwas   TWAS 
0.9991 0.9791 
#precision / PPV
print(precision)
  ctwas    TWAS 
0.00000 0.01826 
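The three metrics above can be expressed with simple set operations. The helper below, `classification_metrics()`, is a hypothetical sketch and not the code used to produce this page. It assumes sensitivity is computed over all known genes (including those without imputed expression), and that specificity is computed over imputed genes that are not in the known set. The TWAS check at the end uses counts inferred from the printed values (4 known genes among the 219 TWAS genes), which is an assumption.

```r
# Hypothetical helper: sensitivity, specificity and precision for a gene call set.
classification_metrics <- function(called, known, universe) {
  tp <- length(intersect(called, known))   # called genes that are known
  fp <- length(setdiff(called, known))     # called genes that are not known
  negatives <- setdiff(universe, known)    # tested genes that are not known
  c(
    sensitivity = tp / length(known),
    specificity = (length(negatives) - fp) / length(negatives),
    precision   = if (length(called) > 0) tp / length(called) else NA_real_
  )
}

# Toy example: 10 tested genes, 3 known genes (one of them, G11, not tested).
universe <- paste0("G", 1:10)
known    <- c("G1", "G2", "G11")
called   <- c("G1", "G3")
toy <- classification_metrics(called, known, universe)
print(toy)

# Check against the TWAS column, using inferred counts (assumption):
# 4 true positives, 219 calls, 41 known genes, 10285 - 22 negatives.
twas_check <- c(
  sensitivity = 4 / 41,
  specificity = 1 - (219 - 4) / (10285 - 22),
  precision   = 4 / 219
)
print(signif(twas_check, 4))
```

Under these assumptions, the ctwas sensitivity and precision of 0 mean that none of the 9 ctwas genes are in the silver standard set, only 22 of whose 41 genes have imputed expression.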


sessionInfo()
R version 3.6.1 (2019-07-05)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Scientific Linux 7.4 (Nitrogen)

Matrix products: default
BLAS/LAPACK: /software/openblas-0.2.19-el7-x86_64/lib/libopenblas_haswellp-r0.2.19.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] readxl_1.3.1      forcats_0.5.1     stringr_1.4.0     dplyr_1.0.7      
 [5] purrr_0.3.4       readr_2.1.1       tidyr_1.1.4       tidyverse_1.3.1  
 [9] tibble_3.1.6      WebGestaltR_0.4.4 disgenet2r_0.99.2 enrichR_3.0      
[13] cowplot_1.0.0     ggplot2_3.3.5     workflowr_1.6.2  

loaded via a namespace (and not attached):
 [1] fs_1.5.2          lubridate_1.8.0   bit64_4.0.5       doParallel_1.0.16
 [5] httr_1.4.2        rprojroot_2.0.2   tools_3.6.1       backports_1.4.1  
 [9] doRNG_1.8.2       utf8_1.2.2        R6_2.5.1          vipor_0.4.5      
[13] DBI_1.1.1         colorspace_2.0-2  withr_2.4.3       ggrastr_1.0.1    
[17] tidyselect_1.1.1  bit_4.0.4         curl_4.3.2        compiler_3.6.1   
[21] git2r_0.26.1      cli_3.1.0         rvest_1.0.2       Cairo_1.5-12.2   
[25] xml2_1.3.3        labeling_0.4.2    scales_1.1.1      apcluster_1.4.8  
[29] digest_0.6.29     rmarkdown_2.11    svglite_1.2.2     pkgconfig_2.0.3  
[33] htmltools_0.5.2   dbplyr_2.1.1      fastmap_1.1.0     highr_0.9        
[37] rlang_0.4.12      rstudioapi_0.13   RSQLite_2.2.8     jquerylib_0.1.4  
[41] farver_2.1.0      generics_0.1.1    jsonlite_1.7.2    vroom_1.5.7      
[45] magrittr_2.0.1    Matrix_1.2-18     ggbeeswarm_0.6.0  Rcpp_1.0.7       
[49] munsell_0.5.0     fansi_0.5.0       gdtools_0.1.9     lifecycle_1.0.1  
[53] stringi_1.7.6     whisker_0.3-2     yaml_2.2.1        plyr_1.8.6       
[57] grid_3.6.1        blob_1.2.2        ggrepel_0.9.1     parallel_3.6.1   
[61] promises_1.0.1    crayon_1.4.2      lattice_0.20-38   haven_2.4.3      
[65] hms_1.1.1         knitr_1.36        pillar_1.6.4      igraph_1.2.10    
[69] rjson_0.2.20      rngtools_1.5.2    reshape2_1.4.4    codetools_0.2-16 
[73] reprex_2.0.1      glue_1.5.1        evaluate_0.14     data.table_1.14.2
[77] modelr_0.1.8      vctrs_0.3.8       tzdb_0.2.0        httpuv_1.5.1     
[81] foreach_1.5.1     cellranger_1.1.0  gtable_0.3.0      assertthat_0.2.1 
[85] cachem_1.0.6      xfun_0.29         broom_0.7.10      later_0.8.0      
[89] iterators_1.0.13  beeswarm_0.2.3    memoise_2.0.1     ellipsis_0.3.2