Last updated: 2022-05-12

Checks: 5 passed, 2 warnings

Knit directory: cTWAS_analysis/

This reproducible R Markdown analysis was created with workflowr (version 1.6.2). The Checks tab describes the reproducibility checks that were applied when the results were created. The Past versions tab lists the development history.


The R Markdown file has unstaged changes. To know which version of the R Markdown file created these results, you’ll want to first commit it to the Git repo. If you’re still working on the analysis, you can ignore this warning. When you’re finished, you can run wflow_publish to commit the R Markdown file and build the HTML.

Great job! The global environment was empty. Objects defined in the global environment can affect the analysis in your R Markdown file in unknown ways. For reproducibility it’s best to always run the code in an empty environment.

The command set.seed(20211220) was run prior to running the code in the R Markdown file. Setting a seed ensures that any results that rely on randomness, e.g. subsampling or permutations, are reproducible.

Great job! Recording the operating system, R version, and package versions is critical for reproducibility.

Nice! There were no cached chunks for this analysis, so you can be confident that you successfully produced the results during this run.

Using absolute paths to the files within your workflowr project makes it difficult for you and others to run your code on a different machine. Change the absolute path(s) below to the suggested relative path(s) to make your code more reproducible.

absolute                                                            relative
/project2/xinhe/shengqian/cTWAS/cTWAS_analysis/data/                data
/project2/xinhe/shengqian/cTWAS/cTWAS_analysis/code/ctwas_config.R  code/ctwas_config.R

Great! You are using Git for version control. Tracking code development and connecting the code version to the results is critical for reproducibility.

The results in this page were generated with repository version 011327d. See the Past versions tab to see a history of the changes made to the R Markdown and HTML files.

Note that you need to be careful to ensure that all relevant files for the analysis have been committed to Git prior to generating the results (you can use wflow_publish or wflow_git_commit). workflowr only checks the R Markdown file, but you know if there are other scripts or data files that it depends on. Below is the status of the Git repository when the results were generated:


Ignored files:
    Ignored:    .Rhistory
    Ignored:    .ipynb_checkpoints/
    Ignored:    data/AF/

Untracked files:
    Untracked:  G_list.RData
    Untracked:  Rplot.png
    Untracked:  SCZ_annotation.xlsx
    Untracked:  analysis/.ipynb_checkpoints/
    Untracked:  code/.ipynb_checkpoints/
    Untracked:  code/AF_out/
    Untracked:  code/Autism_out/
    Untracked:  code/BMI_S_out/
    Untracked:  code/BMI_out/
    Untracked:  code/Glucose_out/
    Untracked:  code/LDL_S_out/
    Untracked:  code/SCZ_2014_EUR_out/
    Untracked:  code/SCZ_2018_S_out/
    Untracked:  code/SCZ_2018_out/
    Untracked:  code/SCZ_2020_Single_out/
    Untracked:  code/SCZ_2020_out/
    Untracked:  code/SCZ_S_out/
    Untracked:  code/SCZ_out/
    Untracked:  code/T2D_out/
    Untracked:  code/ctwas_config.R
    Untracked:  code/mapping.R
    Untracked:  code/out/
    Untracked:  code/process_scz_2018_snps.R
    Untracked:  code/run_AF_analysis.sbatch
    Untracked:  code/run_AF_analysis.sh
    Untracked:  code/run_AF_ctwas_rss_LDR.R
    Untracked:  code/run_Autism_analysis.sbatch
    Untracked:  code/run_Autism_analysis.sh
    Untracked:  code/run_Autism_ctwas_rss_LDR.R
    Untracked:  code/run_BMI_analysis.sbatch
    Untracked:  code/run_BMI_analysis.sh
    Untracked:  code/run_BMI_analysis_S.sbatch
    Untracked:  code/run_BMI_analysis_S.sh
    Untracked:  code/run_BMI_ctwas_rss_LDR.R
    Untracked:  code/run_BMI_ctwas_rss_LDR_S.R
    Untracked:  code/run_Glucose_analysis.sbatch
    Untracked:  code/run_Glucose_analysis.sh
    Untracked:  code/run_Glucose_ctwas_rss_LDR.R
    Untracked:  code/run_LDL_analysis_S.sbatch
    Untracked:  code/run_LDL_analysis_S.sh
    Untracked:  code/run_LDL_ctwas_rss_LDR_S.R
    Untracked:  code/run_SCZ_2014_EUR_analysis.sbatch
    Untracked:  code/run_SCZ_2014_EUR_analysis.sh
    Untracked:  code/run_SCZ_2014_EUR_ctwas_rss_LDR.R
    Untracked:  code/run_SCZ_2018_analysis.sbatch
    Untracked:  code/run_SCZ_2018_analysis.sh
    Untracked:  code/run_SCZ_2018_analysis_S.sbatch
    Untracked:  code/run_SCZ_2018_analysis_S.sh
    Untracked:  code/run_SCZ_2018_ctwas_rss_LDR.R
    Untracked:  code/run_SCZ_2018_ctwas_rss_LDR_S.R
    Untracked:  code/run_SCZ_2020_Single_analysis.sbatch
    Untracked:  code/run_SCZ_2020_Single_analysis.sh
    Untracked:  code/run_SCZ_2020_Single_ctwas_rss_LDR.R
    Untracked:  code/run_SCZ_2020_analysis.sbatch
    Untracked:  code/run_SCZ_2020_analysis.sh
    Untracked:  code/run_SCZ_2020_ctwas_rss_LDR.R
    Untracked:  code/run_SCZ_analysis.sbatch
    Untracked:  code/run_SCZ_analysis.sh
    Untracked:  code/run_SCZ_analysis_S.sbatch
    Untracked:  code/run_SCZ_analysis_S.sh
    Untracked:  code/run_SCZ_ctwas_rss_LDR.R
    Untracked:  code/run_SCZ_ctwas_rss_LDR_S.R
    Untracked:  code/run_T2D_analysis.sbatch
    Untracked:  code/run_T2D_analysis.sh
    Untracked:  code/run_T2D_ctwas_rss_LDR.R
    Untracked:  code/wflow_build.R
    Untracked:  code/wflow_build.sbatch
    Untracked:  data/.ipynb_checkpoints/
    Untracked:  data/BMI/
    Untracked:  data/GO_Terms/
    Untracked:  data/PGC3_SCZ_wave3_public.v2.tsv
    Untracked:  data/SCZ/
    Untracked:  data/SCZ_2014_EUR/
    Untracked:  data/SCZ_2018/
    Untracked:  data/SCZ_2018_S/
    Untracked:  data/SCZ_2020/
    Untracked:  data/SCZ_2020_Single/
    Untracked:  data/SCZ_S/
    Untracked:  data/Supplementary Table 15 - MAGMA.xlsx
    Untracked:  data/Supplementary Table 20 - Prioritised Genes.xlsx
    Untracked:  data/T2D/
    Untracked:  data/UKBB/
    Untracked:  data/UKBB_SNPs_Info.text
    Untracked:  data/gene_OMIM.txt
    Untracked:  data/gene_pip_0.8.txt
    Untracked:  data/mashr_Heart_Atrial_Appendage.db
    Untracked:  data/mashr_sqtl/
    Untracked:  data/scz_2018.RDS
    Untracked:  data/summary_known_genes_annotations.xlsx
    Untracked:  data/untitled.txt
    Untracked:  top_genes_32.txt
    Untracked:  top_genes_37.txt
    Untracked:  top_genes_43.txt
    Untracked:  top_genes_81.txt
    Untracked:  z_snp_pos_SCZ.RData
    Untracked:  z_snp_pos_SCZ_2014_EUR.RData
    Untracked:  z_snp_pos_SCZ_2018.RData
    Untracked:  z_snp_pos_SCZ_2020.RData

Unstaged changes:
    Deleted:    analysis/BMI_S_results.Rmd
    Modified:   analysis/SCZ_2018_Brain_Amygdala_S.Rmd
    Modified:   analysis/SCZ_2018_Brain_Anterior_cingulate_cortex_BA24_S.Rmd
    Modified:   analysis/SCZ_2018_Brain_Caudate_basal_ganglia_S.Rmd
    Modified:   analysis/SCZ_2018_Brain_Cerebellar_Hemisphere_S.Rmd
    Modified:   analysis/SCZ_2018_Brain_Cerebellum_S.Rmd
    Modified:   analysis/SCZ_2018_Brain_Cortex_S.Rmd
    Modified:   analysis/SCZ_2018_Brain_Frontal_Cortex_BA9_S.Rmd
    Modified:   analysis/SCZ_2018_Brain_Hippocampus_S.Rmd
    Modified:   analysis/SCZ_2018_Brain_Hypothalamus_S.Rmd
    Modified:   analysis/SCZ_2018_Brain_Nucleus_accumbens_basal_ganglia_S.Rmd
    Modified:   analysis/SCZ_2018_Brain_Putamen_basal_ganglia_S.Rmd
    Modified:   analysis/SCZ_2018_Brain_Spinal_cord_cervical_c-1_S.Rmd
    Modified:   analysis/SCZ_2018_Brain_Substantia_nigra_S.Rmd

Note that any generated files, e.g. HTML, png, CSS, etc., are not included in this status report because it is ok for generated content to have uncommitted changes.


These are the previous versions of the repository in which changes were made to the R Markdown (analysis/SCZ_2018_Brain_Cerebellum_S.Rmd) and HTML (docs/SCZ_2018_Brain_Cerebellum_S.html) files. If you’ve configured a remote Git repository (see ?wflow_git_remote), click on the hyperlinks in the table below to view the files as they were in that past version.

File Version Author Date Message
html 011327d sq-96 2022-05-12 update
Rmd 6c6abbd sq-96 2022-05-12 update

library(reticulate)
use_python("/scratch/midway2/shengqian/miniconda3/envs/PythonForR/bin/python", required = TRUE)

Weight QC

#number of imputed weights
nrow(qclist_all)
[1] 27353
#number of imputed weights by chromosome
table(qclist_all$chr)

   1    2    3    4    5    6    7    8    9   10   11   12   13   14   15   16 
2535 1830 1661  982 1135 1371 1536  916 1175 1171 1678 1471  543  971  987 1200 
  17   18   19   20   21   22 
1981  337 2002  917   48  906 
#number of imputed weights without missing variants
sum(qclist_all$nmiss==0)
[1] 23734
#proportion of imputed weights without missing variants
mean(qclist_all$nmiss==0)
[1] 0.8677
finish

Attaching package: 'dplyr'
The following objects are masked from 'package:stats':

    filter, lag
The following objects are masked from 'package:base':

    intersect, setdiff, setequal, union

Check convergence of parameters

     gene       snp 
0.0088825 0.0002955 
 gene   snp 
12.43 10.04 
[1] 105318
[1]    8177 6309950
    gene      snp 
0.008573 0.177721 
[1] 0.0304 1.0607
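
The output above is unlabeled. It appears to show, in order, the estimated group priors, the group prior variances, the GWAS sample size, the group sizes (genes and SNPs), and the group PVE. That reading is an inference from the numbers, not something the page states. Under that assumption, the PVE line can be reproduced from the others as prior variance times prior times group size, divided by sample size. A minimal sketch follows; the variable names are hypothetical and the values are copied from the rounded printed output.

```r
# Values copied from the printed output above; the names are illustrative
# and the interpretation of each line is an assumption.
group_prior     <- c(gene = 0.0088825, snp = 0.0002955)  # assumed: prior inclusion probability
group_prior_var <- c(gene = 12.43,     snp = 10.04)      # assumed: prior effect-size variance
sample_size     <- 105318                                # assumed: GWAS sample size
group_size      <- c(gene = 8177,      snp = 6309950)    # assumed: number of genes and SNPs

# Assumed formula for the proportion of variance explained by each group
group_pve <- group_prior_var * group_prior * group_size / sample_size

print(signif(group_pve, 4))
```

Because the inputs are rounded, the result should agree with the printed 0.008573 and 0.177721 only to about three significant figures.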

Genes with highest PIPs

      genename region_tag susie_pip   mu2       PVE      z num_intron num_sqtl
3737      LRP8       1_33    1.2179 33.07 0.0003523 -4.820         10       11
5502    R3HDM2      12_36    1.1443 44.06 0.0004924  6.634         10       12
3555     LAMA5      20_36    1.1434 23.61 0.0002517  4.603         24       38
7678     WDR27      6_111    1.0487 17.72 0.0001014 -2.341         29       37
2759    GIGYF1       7_62    0.9748 26.79 0.0002375 -5.266          5        5
4202    MRPS33       7_87    0.9654 20.31 0.0001744 -4.304          6        6
262       AKT3      1_128    0.9580 35.61 0.0002979  6.350          5        5
7812   ZDHHC20       13_2    0.9572 24.94 0.0002118 -4.784          3        4
4567    NPIPA1      16_15    0.9556 24.97 0.0002096  4.689          3        3
3628 LINC00320       21_6    0.9542 29.24 0.0002419 -5.336          3        3
6117     SF3B1      2_117    0.9478 45.88 0.0003746  7.053          5        5
4791      PAK6      15_14    0.9449 30.33 0.0002506 -5.588          3        3
1654     CRTAP       3_24    0.9010 19.87 0.0001503  3.929          3        3
1128    CCDC57      17_47    0.8904 20.00 0.0001041  3.022         36       46
7314   TSNARE1       8_93    0.8894 34.70 0.0002087  6.287         10       12
1517      COA8      14_54    0.8857 43.21 0.0003125  7.429          6        7
5488   PYROXD2      10_62    0.8732 20.71 0.0001347 -3.755         12       14
4823      PATJ       1_39    0.8686 23.29 0.0001371 -2.798         16       19
324     ANAPC7      12_67    0.8369 37.61 0.0002240  6.385          7        7
4569  NPIPB14P      16_37    0.8337 18.72 0.0001125 -3.795         15       19
603     ATP2B2        3_8    0.8241 26.05 0.0001568  4.229          7        8
666     B3GAT1      11_84    0.8157 23.68 0.0001377  4.324          6        9
4643     NTRK3      15_41    0.8046 24.66 0.0001392  4.457          2        2
1073     CBWD1        9_1    0.8033 20.46 0.0001186  4.060          3        4
2554     FGFR1       8_34    0.8002 37.26 0.0001970 -6.046         10       12

Genes with highest PVE

      genename region_tag susie_pip    mu2       PVE      z num_intron num_sqtl
425       APOM       6_26    0.3686 623.03 0.0008033 11.590          2        2
826     BTN3A1       6_20    0.7393 146.39 0.0006649 13.091          8        8
5502    R3HDM2      12_36    1.1443  44.06 0.0004924  6.634         10       12
6117     SF3B1      2_117    0.9478  45.88 0.0003746  7.053          5        5
3737      LRP8       1_33    1.2179  33.07 0.0003523 -4.820         10       11
1517      COA8      14_54    0.8857  43.21 0.0003125  7.429          6        7
1275     CENPM      22_17    0.7509  57.80 0.0003094 -6.506          1        1
262       AKT3      1_128    0.9580  35.61 0.0002979  6.350          5        5
3555     LAMA5      20_36    1.1434  23.61 0.0002517  4.603         24       38
4791      PAK6      15_14    0.9449  30.33 0.0002506 -5.588          3        3
3628 LINC00320       21_6    0.9542  29.24 0.0002419 -5.336          3        3
2759    GIGYF1       7_62    0.9748  26.79 0.0002375 -5.266          5        5
7645      VWA7       6_26    0.1940 627.25 0.0002242 11.553          1        1
324     ANAPC7      12_67    0.8369  37.61 0.0002240  6.385          7        7
7812   ZDHHC20       13_2    0.9572  24.94 0.0002118 -4.784          3        4
4567    NPIPA1      16_15    0.9556  24.97 0.0002096  4.689          3        3
7314   TSNARE1       8_93    0.8894  34.70 0.0002087  6.287         10       12
2554     FGFR1       8_34    0.8002  37.26 0.0001970 -6.046         10       12
7240    TRANK1       3_27    0.7490  39.04 0.0001917 -6.365          8        8
1421     CLCN3      4_110    0.7913  29.64 0.0001762  5.470          1        2

Comparing z scores and PIPs

[1] 0.02091
     genename region_tag susie_pip    mu2       PVE       z num_intron num_sqtl
826    BTN3A1       6_20 7.393e-01 146.39 6.649e-04  13.091          8        8
4952    PGBD1       6_22 1.007e-01 160.95 7.079e-06  13.087          5        6
425      APOM       6_26 3.686e-01 623.03 8.033e-04  11.590          2        2
7645     VWA7       6_26 1.940e-01 627.25 2.242e-04  11.553          1        1
7578    VARS1       6_26 1.402e-01 623.95 1.165e-04 -11.548          2        2
4216     MSH5       6_26 1.588e-01 627.91 1.503e-04 -11.538          3        3
1834     DDR1       6_25 1.570e-01 105.86 2.456e-05 -11.175          3        3
7579    VARS2       6_25 1.118e-01 104.74 1.206e-05  11.137          2        2
925  C6orf136       6_25 7.591e-02  87.21 4.771e-06 -11.031          2        2
2587    FLOT1       6_25 1.547e-01  87.22 1.952e-05 -10.981          7        7
827    BTN3A2       6_20 1.644e-01  94.96 1.183e-05 -10.694          3        5
2816     GNL1       6_25 2.920e-03  78.25 6.334e-09 -10.645          1        1
7265   TRIM39       6_25 7.839e-03  82.27 4.800e-08 -10.616          1        1
686      BAG6       6_26 2.982e-09 498.08 4.206e-20  10.247          7        8
5293     PPT2       6_26 7.799e-12 464.25 2.681e-25 -10.061         10       12
5362    PRRT1       6_26 2.706e-12 462.51 3.216e-26 -10.018          1        1
2884    GPSM3       6_26 8.360e-14 414.68 2.752e-29  -9.377          1        1
1152   CCHCR1       6_25 4.718e-02  69.57 6.124e-07  -9.358         17       30
7175     TNXB       6_26 1.527e-13 452.13 1.000e-28   9.001          6        7
8165  ZSCAN26       6_22 6.731e-02  53.73 1.605e-06   8.672          6        6
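
The single value printed before the table, 0.02091, matches 171/8177: the number of TWAS genes reported in the silver standard section divided by what appears to be the number of genes in the analysis. Likewise, the TWAS threshold of 4.522 reported there is what a two-sided Bonferroni correction at 0.05 over 8177 genes would give. Both readings are inferences from the printed numbers rather than the page's own code. The sketch below, with hypothetical variable names, shows the arithmetic.

```r
# Counts taken from the printed output; the names and the interpretation
# are assumptions made for illustration.
n_genes <- 8177   # assumed: number of genes in the analysis
alpha   <- 0.05   # assumed: family-wise error rate

# Two-sided Bonferroni threshold on |z| (assumed to be how sig_thresh was set)
sig_thresh <- qnorm(1 - (alpha / n_genes) / 2)

# Proportion of genes passing the threshold, using the reported TWAS gene count
n_twas_genes <- 171
prop_sig     <- n_twas_genes / n_genes

print(round(sig_thresh, 3))
print(signif(prop_sig, 4))
```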

GO enrichment analysis for genes with PIP>0.5

#number of genes for gene set enrichment
length(genes)
[1] 109
Uploading data to Enrichr... Done.
  Querying GO_Biological_Process_2021... Done.
  Querying GO_Cellular_Component_2021... Done.
  Querying GO_Molecular_Function_2021... Done.
Parsing results... Done.
[1] "GO_Biological_Process_2021"

                                                  Term Overlap Adjusted.P.value
1 morphogenesis of a polarized epithelium (GO:0001738)    3/12          0.03045
             Genes
1 AHI1;LAMA5;ACTG1
[1] "GO_Cellular_Component_2021"

[1] Term             Overlap          Adjusted.P.value Genes           
<0 rows> (or 0-length row.names)
[1] "GO_Molecular_Function_2021"

[1] Term             Overlap          Adjusted.P.value Genes           
<0 rows> (or 0-length row.names)

DisGeNET enrichment analysis for genes with PIP>0.5

                           Description     FDR Ratio  BgRatio
62                              Glioma 0.04401  4/61  87/9703
90                             Measles 0.04401  1/61   1/9703
156      Electroencephalogram abnormal 0.04401  1/61   1/9703
160                        Polydactyly 0.04401  4/61 117/9703
196                Short upturned nose 0.04401  1/61   1/9703
199                      mixed gliomas 0.04401  4/61  70/9703
219      Hypoglycemia, leucine-induced 0.04401  1/61   1/9703
278 Interfrontal craniofaciosynostosis 0.04401  1/61   1/9703
279            Osteoglophonic dwarfism 0.04401  1/61   1/9703
291                   Malignant Glioma 0.04401  4/61  70/9703
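
The Ratio and BgRatio columns carry the counts that an over-representation test needs. One common way to turn them into a p-value is an upper-tail hypergeometric test. Whether DisGeNET uses exactly this test is not stated on this page, so the sketch below is illustrative only, and `ora_pvalue` is a hypothetical helper.

```r
# Upper-tail hypergeometric p-value for over-representation.
#   k: query genes annotated to the term   (numerator of Ratio)
#   n: query genes with any annotation     (denominator of Ratio)
#   K: background genes annotated to term  (numerator of BgRatio)
#   N: background genes in total           (denominator of BgRatio)
ora_pvalue <- function(k, n, K, N) {
  # P(X >= k) when drawing n genes from N, of which K are annotated
  phyper(k - 1, K, N - K, n, lower.tail = FALSE)
}

# Counts read from the table above
p_glioma <- ora_pvalue(k = 4, n = 61, K = 87, N = 9703)  # Glioma: 4/61 vs 87/9703
p_single <- ora_pvalue(k = 1, n = 61, K = 1,  N = 9703)  # any 1/61 vs 1/9703 row

print(c(glioma = p_glioma, single_gene_term = p_single))
```

Terms supported by a single query gene against a background of one gene get a small p-value purely because the term is rare, so those rows deserve more caution than the multi-gene ones.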

WebGestalt enrichment analysis for genes with PIP>0.5

Warning: replacing previous import 'lifecycle::last_warnings' by
'rlang::last_warnings' when loading 'hms'
Loading the functional categories...
Loading the ID list...
Loading the reference list...
Performing the enrichment analysis...
Warning in oraEnrichment(interestGeneList, referenceGeneList, geneSet, minNum =
minNum, : No significant gene set is identified based on FDR 0.05!
NULL

PIP Manhattan Plot

Warning: ggrepel: 67 unlabeled data points (too many overlaps). Consider
increasing max.overlaps

Sensitivity, specificity and precision for silver standard genes

#number of genes in known annotations
print(length(known_annotations))
[1] 130
#number of genes in known annotations with imputed expression
print(sum(known_annotations %in% ctwas_gene_res$genename))
[1] 60
#significance threshold for TWAS
print(sig_thresh)
[1] 4.522
#number of ctwas genes
length(ctwas_genes)
[1] 25
#number of TWAS genes
length(twas_genes)
[1] 171
#show novel genes (ctwas genes not in TWAS genes)
ctwas_gene_res[ctwas_gene_res$genename %in% novel_genes,report_cols]
     genename region_tag susie_pip   mu2       PVE      z num_intron num_sqtl
603    ATP2B2        3_8    0.8241 26.05 0.0001568  4.229          7        8
666    B3GAT1      11_84    0.8157 23.68 0.0001377  4.324          6        9
1073    CBWD1        9_1    0.8033 20.46 0.0001186  4.060          3        4
1128   CCDC57      17_47    0.8904 20.00 0.0001041  3.022         36       46
1654    CRTAP       3_24    0.9010 19.87 0.0001503  3.929          3        3
4202   MRPS33       7_87    0.9654 20.31 0.0001744 -4.304          6        6
4569 NPIPB14P      16_37    0.8337 18.72 0.0001125 -3.795         15       19
4643    NTRK3      15_41    0.8046 24.66 0.0001392  4.457          2        2
4823     PATJ       1_39    0.8686 23.29 0.0001371 -2.798         16       19
5488  PYROXD2      10_62    0.8732 20.71 0.0001347 -3.755         12       14
7678    WDR27      6_111    1.0487 17.72 0.0001014 -2.341         29       37
#sensitivity / recall
print(sensitivity)
  ctwas    TWAS 
0.04615 0.18462 
#specificity
print(specificity)
 ctwas   TWAS 
0.9977 0.9819 
#precision / PPV
print(precision)
 ctwas   TWAS 
0.2400 0.1404 
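
The three metrics above can be reproduced from the printed counts. The sketch below is a reconstruction, not the page's own code. The true-positive counts (6 and 24) are not printed and are inferred from precision times the number of called genes. The denominators are also assumptions: all 130 silver standard genes for sensitivity, and the 8177 analyzed genes minus the 60 silver standard genes with imputed expression for specificity.

```r
# Counts from the printed output; names are illustrative.
n_genes_total   <- 8177                      # assumed: genes in the analysis
n_known         <- 130                       # silver standard genes
n_known_imputed <- 60                        # silver standard genes with imputed expression
n_called        <- c(ctwas = 25, TWAS = 171) # genes called by each method

# Inferred, not printed: 0.2400 * 25 = 6 and 0.1404 * 171 = 24 (rounded)
n_true_pos <- c(ctwas = 6, TWAS = 24)

# Sensitivity: share of all silver standard genes that were called
sensitivity <- n_true_pos / n_known

# Precision: share of called genes that are silver standard genes
precision <- n_true_pos / n_called

# Specificity: share of non-silver-standard analyzed genes that were not called
n_negative  <- n_genes_total - n_known_imputed
specificity <- 1 - (n_called - n_true_pos) / n_negative

print(round(rbind(sensitivity, specificity, precision), 4))
```

Under these assumptions the sensitivity denominator includes the 70 silver standard genes without imputed expression, which no method could detect, so the reported sensitivity is a lower bound on performance among testable genes.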

sessionInfo()
R version 4.1.0 (2021-05-18)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Scientific Linux 7.4 (Nitrogen)

Matrix products: default
BLAS/LAPACK: /software/openblas-0.3.13-el7-x86_64/lib/libopenblas_haswellp-r0.3.13.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] readxl_1.4.0      forcats_0.5.1     stringr_1.4.0     purrr_0.3.4      
 [5] readr_1.4.0       tidyr_1.1.3       tidyverse_1.3.1   tibble_3.1.7     
 [9] WebGestaltR_0.4.4 disgenet2r_0.99.2 enrichR_3.0       cowplot_1.1.1    
[13] ggplot2_3.3.5     dplyr_1.0.7       reticulate_1.20   workflowr_1.6.2  

loaded via a namespace (and not attached):
 [1] fs_1.5.0          lubridate_1.7.10  doParallel_1.0.16 httr_1.4.2       
 [5] rprojroot_2.0.2   tools_4.1.0       backports_1.2.1   doRNG_1.8.2      
 [9] bslib_0.2.5.1     utf8_1.2.1        R6_2.5.0          vipor_0.4.5      
[13] DBI_1.1.1         colorspace_2.0-2  withr_2.4.2       ggrastr_1.0.1    
[17] tidyselect_1.1.1  curl_4.3.2        compiler_4.1.0    git2r_0.28.0     
[21] rvest_1.0.0       cli_3.0.0         Cairo_1.5-15      xml2_1.3.2       
[25] labeling_0.4.2    sass_0.4.0        scales_1.1.1      systemfonts_1.0.4
[29] apcluster_1.4.9   digest_0.6.27     rmarkdown_2.9     svglite_2.0.0    
[33] pkgconfig_2.0.3   htmltools_0.5.1.1 dbplyr_2.1.1      highr_0.9        
[37] rlang_1.0.2       rstudioapi_0.13   jquerylib_0.1.4   farver_2.1.0     
[41] generics_0.1.0    jsonlite_1.7.2    magrittr_2.0.1    Matrix_1.3-3     
[45] ggbeeswarm_0.6.0  Rcpp_1.0.7        munsell_0.5.0     fansi_0.5.0      
[49] lifecycle_1.0.0   stringi_1.6.2     whisker_0.4       yaml_2.2.1       
[53] plyr_1.8.6        grid_4.1.0        ggrepel_0.9.1     parallel_4.1.0   
[57] promises_1.2.0.1  crayon_1.4.1      lattice_0.20-44   haven_2.4.1      
[61] hms_1.1.0         knitr_1.33        pillar_1.7.0      igraph_1.2.6     
[65] rjson_0.2.20      rngtools_1.5      reshape2_1.4.4    codetools_0.2-18 
[69] reprex_2.0.0      glue_1.4.2        evaluate_0.14     data.table_1.14.0
[73] modelr_0.1.8      png_0.1-7         vctrs_0.3.8       httpuv_1.6.1     
[77] foreach_1.5.1     cellranger_1.1.0  gtable_0.3.0      assertthat_0.2.1 
[81] xfun_0.24         broom_0.7.8       later_1.2.0       iterators_1.0.13 
[85] beeswarm_0.4.0    ellipsis_0.3.2