Last updated: 2022-05-12

Checks: 5 2

Knit directory: cTWAS_analysis/

This reproducible R Markdown analysis was created with workflowr (version 1.6.2). The Checks tab describes the reproducibility checks that were applied when the results were created. The Past versions tab lists the development history.


The R Markdown file has unstaged changes. To know which version of the R Markdown file created these results, you’ll want to first commit it to the Git repo. If you’re still working on the analysis, you can ignore this warning. When you’re finished, you can run wflow_publish to commit the R Markdown file and build the HTML.

Great job! The global environment was empty. Objects defined in the global environment can affect the analysis in your R Markdown file in unknown ways. For reproduciblity it’s best to always run the code in an empty environment.

The command set.seed(20211220) was run prior to running the code in the R Markdown file. Setting a seed ensures that any results that rely on randomness, e.g. subsampling or permutations, are reproducible.

Great job! Recording the operating system, R version, and package versions is critical for reproducibility.

Nice! There were no cached chunks for this analysis, so you can be confident that you successfully produced the results during this run.

Using absolute paths to the files within your workflowr project makes it difficult for you and others to run your code on a different machine. Change the absolute path(s) below to the suggested relative path(s) to make your code more reproducible.

absolute relative
/project2/xinhe/shengqian/cTWAS/cTWAS_analysis/data/ data
/project2/xinhe/shengqian/cTWAS/cTWAS_analysis/code/ctwas_config.R code/ctwas_config.R

Great! You are using Git for version control. Tracking code development and connecting the code version to the results is critical for reproducibility.

The results in this page were generated with repository version 011327d. See the Past versions tab to see a history of the changes made to the R Markdown and HTML files.

Note that you need to be careful to ensure that all relevant files for the analysis have been committed to Git prior to generating the results (you can use wflow_publish or wflow_git_commit). workflowr only checks the R Markdown file, but you know if there are other scripts or data files that it depends on. Below is the status of the Git repository when the results were generated:


Ignored files:
    Ignored:    .Rhistory
    Ignored:    .ipynb_checkpoints/
    Ignored:    data/AF/

Untracked files:
    Untracked:  G_list.RData
    Untracked:  Rplot.png
    Untracked:  SCZ_annotation.xlsx
    Untracked:  analysis/.ipynb_checkpoints/
    Untracked:  code/.ipynb_checkpoints/
    Untracked:  code/AF_out/
    Untracked:  code/Autism_out/
    Untracked:  code/BMI_S_out/
    Untracked:  code/BMI_out/
    Untracked:  code/Glucose_out/
    Untracked:  code/LDL_S_out/
    Untracked:  code/SCZ_2014_EUR_out/
    Untracked:  code/SCZ_2018_S_out/
    Untracked:  code/SCZ_2018_out/
    Untracked:  code/SCZ_2020_Single_out/
    Untracked:  code/SCZ_2020_out/
    Untracked:  code/SCZ_S_out/
    Untracked:  code/SCZ_out/
    Untracked:  code/T2D_out/
    Untracked:  code/ctwas_config.R
    Untracked:  code/mapping.R
    Untracked:  code/out/
    Untracked:  code/process_scz_2018_snps.R
    Untracked:  code/run_AF_analysis.sbatch
    Untracked:  code/run_AF_analysis.sh
    Untracked:  code/run_AF_ctwas_rss_LDR.R
    Untracked:  code/run_Autism_analysis.sbatch
    Untracked:  code/run_Autism_analysis.sh
    Untracked:  code/run_Autism_ctwas_rss_LDR.R
    Untracked:  code/run_BMI_analysis.sbatch
    Untracked:  code/run_BMI_analysis.sh
    Untracked:  code/run_BMI_analysis_S.sbatch
    Untracked:  code/run_BMI_analysis_S.sh
    Untracked:  code/run_BMI_ctwas_rss_LDR.R
    Untracked:  code/run_BMI_ctwas_rss_LDR_S.R
    Untracked:  code/run_Glucose_analysis.sbatch
    Untracked:  code/run_Glucose_analysis.sh
    Untracked:  code/run_Glucose_ctwas_rss_LDR.R
    Untracked:  code/run_LDL_analysis_S.sbatch
    Untracked:  code/run_LDL_analysis_S.sh
    Untracked:  code/run_LDL_ctwas_rss_LDR_S.R
    Untracked:  code/run_SCZ_2014_EUR_analysis.sbatch
    Untracked:  code/run_SCZ_2014_EUR_analysis.sh
    Untracked:  code/run_SCZ_2014_EUR_ctwas_rss_LDR.R
    Untracked:  code/run_SCZ_2018_analysis.sbatch
    Untracked:  code/run_SCZ_2018_analysis.sh
    Untracked:  code/run_SCZ_2018_analysis_S.sbatch
    Untracked:  code/run_SCZ_2018_analysis_S.sh
    Untracked:  code/run_SCZ_2018_ctwas_rss_LDR.R
    Untracked:  code/run_SCZ_2018_ctwas_rss_LDR_S.R
    Untracked:  code/run_SCZ_2020_Single_analysis.sbatch
    Untracked:  code/run_SCZ_2020_Single_analysis.sh
    Untracked:  code/run_SCZ_2020_Single_ctwas_rss_LDR.R
    Untracked:  code/run_SCZ_2020_analysis.sbatch
    Untracked:  code/run_SCZ_2020_analysis.sh
    Untracked:  code/run_SCZ_2020_ctwas_rss_LDR.R
    Untracked:  code/run_SCZ_analysis.sbatch
    Untracked:  code/run_SCZ_analysis.sh
    Untracked:  code/run_SCZ_analysis_S.sbatch
    Untracked:  code/run_SCZ_analysis_S.sh
    Untracked:  code/run_SCZ_ctwas_rss_LDR.R
    Untracked:  code/run_SCZ_ctwas_rss_LDR_S.R
    Untracked:  code/run_T2D_analysis.sbatch
    Untracked:  code/run_T2D_analysis.sh
    Untracked:  code/run_T2D_ctwas_rss_LDR.R
    Untracked:  code/wflow_build.R
    Untracked:  code/wflow_build.sbatch
    Untracked:  data/.ipynb_checkpoints/
    Untracked:  data/BMI/
    Untracked:  data/GO_Terms/
    Untracked:  data/PGC3_SCZ_wave3_public.v2.tsv
    Untracked:  data/SCZ/
    Untracked:  data/SCZ_2014_EUR/
    Untracked:  data/SCZ_2018/
    Untracked:  data/SCZ_2018_S/
    Untracked:  data/SCZ_2020/
    Untracked:  data/SCZ_2020_Single/
    Untracked:  data/SCZ_S/
    Untracked:  data/Supplementary Table 15 - MAGMA.xlsx
    Untracked:  data/Supplementary Table 20 - Prioritised Genes.xlsx
    Untracked:  data/T2D/
    Untracked:  data/UKBB/
    Untracked:  data/UKBB_SNPs_Info.text
    Untracked:  data/gene_OMIM.txt
    Untracked:  data/gene_pip_0.8.txt
    Untracked:  data/mashr_Heart_Atrial_Appendage.db
    Untracked:  data/mashr_sqtl/
    Untracked:  data/scz_2018.RDS
    Untracked:  data/summary_known_genes_annotations.xlsx
    Untracked:  data/untitled.txt
    Untracked:  top_genes_32.txt
    Untracked:  top_genes_37.txt
    Untracked:  top_genes_43.txt
    Untracked:  top_genes_81.txt
    Untracked:  z_snp_pos_SCZ.RData
    Untracked:  z_snp_pos_SCZ_2014_EUR.RData
    Untracked:  z_snp_pos_SCZ_2018.RData
    Untracked:  z_snp_pos_SCZ_2020.RData

Unstaged changes:
    Deleted:    analysis/BMI_S_results.Rmd
    Modified:   analysis/SCZ_2018_Brain_Amygdala_S.Rmd
    Modified:   analysis/SCZ_2018_Brain_Anterior_cingulate_cortex_BA24_S.Rmd
    Modified:   analysis/SCZ_2018_Brain_Caudate_basal_ganglia_S.Rmd
    Modified:   analysis/SCZ_2018_Brain_Cerebellar_Hemisphere_S.Rmd
    Modified:   analysis/SCZ_2018_Brain_Cerebellum_S.Rmd
    Modified:   analysis/SCZ_2018_Brain_Cortex_S.Rmd
    Modified:   analysis/SCZ_2018_Brain_Frontal_Cortex_BA9_S.Rmd
    Modified:   analysis/SCZ_2018_Brain_Hippocampus_S.Rmd
    Modified:   analysis/SCZ_2018_Brain_Hypothalamus_S.Rmd
    Modified:   analysis/SCZ_2018_Brain_Nucleus_accumbens_basal_ganglia_S.Rmd
    Modified:   analysis/SCZ_2018_Brain_Putamen_basal_ganglia_S.Rmd
    Modified:   analysis/SCZ_2018_Brain_Spinal_cord_cervical_c-1_S.Rmd
    Modified:   analysis/SCZ_2018_Brain_Substantia_nigra_S.Rmd

Note that any generated files, e.g. HTML, png, CSS, etc., are not included in this status report because it is ok for generated content to have uncommitted changes.


These are the previous versions of the repository in which changes were made to the R Markdown (analysis/SCZ_2018_Brain_Amygdala_S.Rmd) and HTML (docs/SCZ_2018_Brain_Amygdala_S.html) files. If you’ve configured a remote Git repository (see ?wflow_git_remote), click on the hyperlinks in the table below to view the files as they were in that past version.

File Version Author Date Message
html 011327d sq-96 2022-05-12 update
Rmd 6c6abbd sq-96 2022-05-12 update

library(reticulate)
use_python("/scratch/midway2/shengqian/miniconda3/envs/PythonForR/bin/python",required=T)

Weight QC

#number of imputed weights
nrow(qclist_all)
[1] 15774
#number of imputed weights by chromosome
table(qclist_all$chr)

   1    2    3    4    5    6    7    8    9   10   11   12   13   14   15   16 
1480 1078  891  631  654  819  931  556  642  735  948  858  322  583  532  618 
  17   18   19   20   21   22 
1087  198 1146  543   34  488 
#number of imputed weights without missing variants
sum(qclist_all$nmiss==0)
[1] 14087
#proportion of imputed weights without missing variants
mean(qclist_all$nmiss==0)
[1] 0.8931
finish

Attaching package: 'dplyr'
The following objects are masked from 'package:stats':

    filter, lag
The following objects are masked from 'package:base':

    intersect, setdiff, setequal, union

Check convergence of parameters

     gene       snp 
0.0048321 0.0003214 
 gene   snp 
 9.84 10.49 
[1] 105318
[1]    6521 6309950
    gene      snp 
0.002944 0.202078 
[1] 0.007243 1.090103

Genes with highest PIPs

      genename region_tag susie_pip   mu2       PVE      z num_intron num_sqtl
2879 LINC00320       21_6    1.0610 28.32 2.926e-04  5.336          6        6
2988      LRP8       1_33    0.9257 24.92 1.962e-04 -4.820          3        3
1218      COA8      14_54    0.8221 43.94 2.808e-04 -7.429          5        6
6271   ZDHHC20       13_2    0.7562 24.63 1.313e-04 -4.784          3        4
2825     LAMA5      20_37    0.7084 30.04 1.381e-04 -4.371         10       14
1635    DPYSL3       5_86    0.7025 25.41 1.190e-04 -4.157          1        1
597    BDNF-AS      11_19    0.6948 23.25 1.066e-04 -4.348          1        1
4083     PLCB2      15_14    0.6903 25.17 9.585e-05 -4.470          5        5
1513      DGKZ      11_28    0.6889 46.65 2.102e-04  7.216          2        2
537     B3GAT1      11_84    0.6636 25.37 9.490e-05  4.345          7       11
4406   PYROXD2      10_62    0.6365 24.28 8.561e-05  3.755         10       10
98      ACTR1B       2_57    0.6179 21.59 7.826e-05  3.978          3        3
1433     DBF4B      17_26    0.5949 20.61 6.733e-05 -3.890          4        4
3740     NTRK3      15_41    0.5946 22.72 7.628e-05 -4.457          1        1
5637     TMED4       7_32    0.5724 22.25 6.781e-05 -4.862          3        3
1574    DNAJB1      19_12    0.5641 19.50 5.863e-05  3.988          2        2
748    C2orf80      2_123    0.5393 25.65 5.402e-05 -3.011         10       12
214       AKT3      1_128    0.5206 34.14 8.239e-05  6.266          4        4
6345    ZNF211      19_39    0.5206 22.95 5.737e-05 -3.624          4        5
2350   GUSBP11       22_6    0.4888 19.24 3.703e-05  2.862         13       16

Genes with highest PVE

      genename region_tag susie_pip   mu2       PVE      z num_intron num_sqtl
2879 LINC00320       21_6    1.0610 28.32 2.926e-04  5.336          6        6
1218      COA8      14_54    0.8221 43.94 2.808e-04 -7.429          5        6
1513      DGKZ      11_28    0.6889 46.65 2.102e-04  7.216          2        2
2988      LRP8       1_33    0.9257 24.92 1.962e-04 -4.820          3        3
2825     LAMA5      20_37    0.7084 30.04 1.381e-04 -4.371         10       14
6271   ZDHHC20       13_2    0.7562 24.63 1.313e-04 -4.784          3        4
1635    DPYSL3       5_86    0.7025 25.41 1.190e-04 -4.157          1        1
597    BDNF-AS      11_19    0.6948 23.25 1.066e-04 -4.348          1        1
4083     PLCB2      15_14    0.6903 25.17 9.585e-05 -4.470          5        5
537     B3GAT1      11_84    0.6636 25.37 9.490e-05  4.345          7       11
4406   PYROXD2      10_62    0.6365 24.28 8.561e-05  3.755         10       10
3729     NT5C2      10_66    0.4472 46.77 8.490e-05 -8.511          7        9
214       AKT3      1_128    0.5206 34.14 8.239e-05  6.266          4        4
4907     SF3B1      2_117    0.4467 43.83 8.117e-05  7.002          2        2
98      ACTR1B       2_57    0.6179 21.59 7.826e-05  3.978          3        3
3740     NTRK3      15_41    0.5946 22.72 7.628e-05 -4.457          1        1
5637     TMED4       7_32    0.5724 22.25 6.781e-05 -4.862          3        3
492     ATP2B2        3_8    0.4886 31.83 6.736e-05  4.229          3        3
1433     DBF4B      17_26    0.5949 20.61 6.733e-05 -3.890          4        4
3677    NPEPL1      20_34    0.4774 34.73 6.443e-05  3.996         11       13

Comparing z scores and PIPs

[1] 0.01641
     genename region_tag susie_pip    mu2       PVE       z num_intron num_sqtl
3990    PGBD1       6_22 1.867e-02 157.22 2.841e-07 -13.087          2        3
348      APOM       6_26 2.029e-01 123.81 4.840e-05  11.541          1        1
1459     DDR1       6_26 1.524e-02 119.73 2.641e-07 -11.175          2        2
464     ATAT1       6_24 2.265e-02  79.41 3.868e-07  11.039          1        1
761  C6orf136       6_24 4.367e-02  79.12 1.433e-06 -11.031          2        2
2059    FLOT1       6_24 1.053e-01  77.80 8.177e-06 -10.981          6        6
554      BAG6       6_26 5.880e-04 108.10 3.549e-10 -10.247          5        7
4657     RNF5       6_26 6.016e-05  96.37 3.312e-12  -9.714          1        1
932    CCHCR1       6_26 8.126e-10  89.72 5.578e-22  -9.376          8       12
3729    NT5C2      10_66 4.472e-01  46.77 8.490e-05  -8.511          7        9
3053   MAD1L1        7_3 2.348e-01  63.35 2.501e-05  -8.215          3        3
2454    HLA-F       6_23 3.791e-02  61.03 6.413e-07  -8.066          2        3
2196   GIGYF2      2_137 3.745e-01  50.80 5.226e-05  -7.841          4        4
6514  ZSCAN26       6_22 2.206e-02  37.39 1.296e-07   7.631          4        4
6510  ZSCAN16       6_22 1.933e-02  52.88 1.201e-07   7.468          3        3
1218     COA8      14_54 8.221e-01  43.94 2.808e-04  -7.429          5        6
1513     DGKZ      11_28 6.889e-01  46.65 2.102e-04   7.216          2        2
4970   SKIV2L       6_26 2.813e-08  77.27 5.804e-19   7.101          4        5
1404   CYP2D7      22_17 5.014e-02  34.90 8.329e-07   7.071          1        2
4907    SF3B1      2_117 4.467e-01  43.83 8.117e-05   7.002          2        2

GO enrichment analysis for genes with PIP>0.5

#number of genes for gene set enrichment
length(genes)
[1] 19
Uploading data to Enrichr... Done.
  Querying GO_Biological_Process_2021... Done.
  Querying GO_Cellular_Component_2021... Done.
  Querying GO_Molecular_Function_2021... Done.
Parsing results... Done.
[1] "GO_Biological_Process_2021"

                                                               Term Overlap
1 positive regulation of neuron projection development (GO:0010976)    3/88
  Adjusted.P.value             Genes
1          0.01675 NTRK3;DPYSL3;LRP8
[1] "GO_Cellular_Component_2021"

[1] Term             Overlap          Adjusted.P.value Genes           
<0 rows> (or 0-length row.names)
[1] "GO_Molecular_Function_2021"

[1] Term             Overlap          Adjusted.P.value Genes           
<0 rows> (or 0-length row.names)

DisGeNET enrichment analysis for genes with PIP>0.5

                                                          Description     FDR
4                                                        Fibrosarcoma 0.02003
10                                              Myocardial Infarction 0.02003
25                             Fibrolamellar Hepatocellular Carcinoma 0.02003
38                                    Congenital Mesoblastic Nephroma 0.02003
45 MEGALENCEPHALY-POLYMICROGYRIA-POLYDACTYLY-HYDROCEPHALUS SYNDROME 1 0.02003
46 MEGALENCEPHALY-POLYMICROGYRIA-POLYDACTYLY-HYDROCEPHALUS SYNDROME 2 0.02003
40  Megalanecephaly Polymicrogyria-Polydactyly Hydrocephalus Syndrome 0.02212
30                                                 Hemimegalencephaly 0.02901
29                                                 Cortical Dysplasia 0.05400
42                              Malformations of Cortical Development 0.05400
   Ratio BgRatio
4    1/8  3/9703
10   2/8 95/9703
25   1/8  2/9703
38   1/8  2/9703
45   1/8  3/9703
46   1/8  1/9703
40   1/8  4/9703
30   1/8  6/9703
29   1/8 14/9703
42   1/8 14/9703

WebGestalt enrichment analysis for genes with PIP>0.5

Warning: replacing previous import 'lifecycle::last_warnings' by
'rlang::last_warnings' when loading 'hms'
Loading the functional categories...
Loading the ID list...
Loading the reference list...
Performing the enrichment analysis...
Warning in oraEnrichment(interestGeneList, referenceGeneList, geneSet, minNum =
minNum, : No significant gene set is identified based on FDR 0.05!
NULL

PIP Manhattan Plot

Sensitivity, specificity and precision for silver standard genes

#number of genes in known annotations
print(length(known_annotations))
[1] 130
#number of genes in known annotations with imputed expression
print(sum(known_annotations %in% ctwas_gene_res$genename))
[1] 51
#significance threshold for TWAS
print(sig_thresh)
[1] 4.474
#number of ctwas genes
length(ctwas_genes)
[1] 3
#number of TWAS genes
length(twas_genes)
[1] 107
#show novel genes (ctwas genes with not in TWAS genes)
ctwas_gene_res[ctwas_gene_res$genename %in% novel_genes,report_cols]
[1] genename   region_tag susie_pip  mu2        PVE        z          num_intron
[8] num_sqtl  
<0 rows> (or 0-length row.names)
#sensitivity / recall
print(sensitivity)
   ctwas     TWAS 
0.007692 0.100000 
#specificity
print(specificity)
 ctwas   TWAS 
0.9997 0.9855 
#precision / PPV
print(precision)
 ctwas   TWAS 
0.3333 0.1215 

sessionInfo()
R version 4.1.0 (2021-05-18)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Scientific Linux 7.4 (Nitrogen)

Matrix products: default
BLAS/LAPACK: /software/openblas-0.3.13-el7-x86_64/lib/libopenblas_haswellp-r0.3.13.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] readxl_1.4.0      forcats_0.5.1     stringr_1.4.0     purrr_0.3.4      
 [5] readr_1.4.0       tidyr_1.1.3       tidyverse_1.3.1   tibble_3.1.7     
 [9] WebGestaltR_0.4.4 disgenet2r_0.99.2 enrichR_3.0       cowplot_1.1.1    
[13] ggplot2_3.3.5     dplyr_1.0.7       reticulate_1.20   workflowr_1.6.2  

loaded via a namespace (and not attached):
 [1] fs_1.5.0          lubridate_1.7.10  doParallel_1.0.16 httr_1.4.2       
 [5] rprojroot_2.0.2   tools_4.1.0       backports_1.2.1   doRNG_1.8.2      
 [9] bslib_0.2.5.1     utf8_1.2.1        R6_2.5.0          vipor_0.4.5      
[13] DBI_1.1.1         colorspace_2.0-2  withr_2.4.2       ggrastr_1.0.1    
[17] tidyselect_1.1.1  curl_4.3.2        compiler_4.1.0    git2r_0.28.0     
[21] rvest_1.0.0       cli_3.0.0         Cairo_1.5-15      xml2_1.3.2       
[25] labeling_0.4.2    sass_0.4.0        scales_1.1.1      systemfonts_1.0.4
[29] apcluster_1.4.9   digest_0.6.27     rmarkdown_2.9     svglite_2.0.0    
[33] pkgconfig_2.0.3   htmltools_0.5.1.1 dbplyr_2.1.1      highr_0.9        
[37] rlang_1.0.2       rstudioapi_0.13   jquerylib_0.1.4   farver_2.1.0     
[41] generics_0.1.0    jsonlite_1.7.2    magrittr_2.0.1    Matrix_1.3-3     
[45] ggbeeswarm_0.6.0  Rcpp_1.0.7        munsell_0.5.0     fansi_0.5.0      
[49] lifecycle_1.0.0   stringi_1.6.2     whisker_0.4       yaml_2.2.1       
[53] plyr_1.8.6        grid_4.1.0        ggrepel_0.9.1     parallel_4.1.0   
[57] promises_1.2.0.1  crayon_1.4.1      lattice_0.20-44   haven_2.4.1      
[61] hms_1.1.0         knitr_1.33        pillar_1.7.0      igraph_1.2.6     
[65] rjson_0.2.20      rngtools_1.5      reshape2_1.4.4    codetools_0.2-18 
[69] reprex_2.0.0      glue_1.4.2        evaluate_0.14     data.table_1.14.0
[73] modelr_0.1.8      png_0.1-7         vctrs_0.3.8       httpuv_1.6.1     
[77] foreach_1.5.1     cellranger_1.1.0  gtable_0.3.0      assertthat_0.2.1 
[81] xfun_0.24         broom_0.7.8       later_1.2.0       iterators_1.0.13 
[85] beeswarm_0.4.0    ellipsis_0.3.2