Last updated: 2022-11-04

Checks: 5 2

Knit directory: cTWAS_analysis/

This reproducible R Markdown analysis was created with workflowr (version 1.7.0). The Checks tab describes the reproducibility checks that were applied when the results were created. The Past versions tab lists the development history.


The R Markdown file has unstaged changes. To know which version of the R Markdown file created these results, you’ll want to first commit it to the Git repo. If you’re still working on the analysis, you can ignore this warning. When you’re finished, you can run wflow_publish to commit the R Markdown file and build the HTML.

Great job! The global environment was empty. Objects defined in the global environment can affect the analysis in your R Markdown file in unknown ways. For reproduciblity it’s best to always run the code in an empty environment.

The command set.seed(20211220) was run prior to running the code in the R Markdown file. Setting a seed ensures that any results that rely on randomness, e.g. subsampling or permutations, are reproducible.

Great job! Recording the operating system, R version, and package versions is critical for reproducibility.

Nice! There were no cached chunks for this analysis, so you can be confident that you successfully produced the results during this run.

Using absolute paths to the files within your workflowr project makes it difficult for you and others to run your code on a different machine. Change the absolute path(s) below to the suggested relative path(s) to make your code more reproducible.

absolute relative
/project2/xinhe/shengqian/cTWAS/cTWAS_analysis/data/ data
/project2/xinhe/shengqian/cTWAS/cTWAS_analysis/code/ctwas_config_b38.R code/ctwas_config_b38.R

Great! You are using Git for version control. Tracking code development and connecting the code version to the results is critical for reproducibility.

The results in this page were generated with repository version 22a3f5e. See the Past versions tab to see a history of the changes made to the R Markdown and HTML files.

Note that you need to be careful to ensure that all relevant files for the analysis have been committed to Git prior to generating the results (you can use wflow_publish or wflow_git_commit). workflowr only checks the R Markdown file, but you know if there are other scripts or data files that it depends on. Below is the status of the Git repository when the results were generated:


Ignored files:
    Ignored:    .Rhistory
    Ignored:    .ipynb_checkpoints/
    Ignored:    analysis/figure/

Untracked files:
    Untracked:  Proposal plots.R
    Untracked:  RGS14.pdf
    Untracked:  RNF186.pdf
    Untracked:  SCZ_annotation.xlsx
    Untracked:  SLC8B1.pdf
    Untracked:  analysis/.ipynb_checkpoints/
    Untracked:  cache/
    Untracked:  code/.ipynb_checkpoints/
    Untracked:  data/.ipynb_checkpoints/
    Untracked:  data/FUMA_output/
    Untracked:  data/GO_Terms/
    Untracked:  data/IBD_ME/
    Untracked:  data/LDL/
    Untracked:  data/LDL_S/
    Untracked:  data/PGC3_SCZ_wave3_public.v2.tsv
    Untracked:  data/SCZ/
    Untracked:  data/SCZ_2014_EUR/
    Untracked:  data/SCZ_2014_EUR_ME/
    Untracked:  data/SCZ_2018/
    Untracked:  data/SCZ_2018_ME/
    Untracked:  data/SCZ_2018_S/
    Untracked:  data/SCZ_2020/
    Untracked:  data/SCZ_S/
    Untracked:  data/Supplementary Table 15 - MAGMA.xlsx
    Untracked:  data/Supplementary Table 20 - Prioritised Genes.xlsx
    Untracked:  data/UKBB/
    Untracked:  data/UKBB_SNPs_Info.text
    Untracked:  data/gene_OMIM.txt
    Untracked:  data/gene_pip_0.8.txt
    Untracked:  data/gwas_sumstats/
    Untracked:  data/magma.genes.out
    Untracked:  data/mashr_Heart_Atrial_Appendage.db
    Untracked:  data/mashr_sqtl/
    Untracked:  data/notes.txt
    Untracked:  data/scz_2018.RDS
    Untracked:  data/summary_known_genes_annotations.xlsx
    Untracked:  temp_LDR/
    Untracked:  top_genes_32.txt
    Untracked:  top_genes_37.txt
    Untracked:  top_genes_43.txt
    Untracked:  top_genes_54.txt
    Untracked:  top_genes_81.txt
    Untracked:  z_snp_pos_SCZ.RData
    Untracked:  z_snp_pos_SCZ_2014_EUR.RData
    Untracked:  z_snp_pos_SCZ_2018.RData
    Untracked:  z_snp_pos_SCZ_2020.RData

Unstaged changes:
    Deleted:    analysis/BMI_S_results.Rmd
    Modified:   analysis/LDL_Liver.Rmd
    Modified:   analysis/LDL_Liver_S.Rmd
    Modified:   analysis/LDL_Liver_S_gene_level.Rmd
    Modified:   analysis/SCZ_E_S_Analysis.Rmd
    Deleted:    code/run_IBD_ctwas_rss_LDR_ME.R

Note that any generated files, e.g. HTML, png, CSS, etc., are not included in this status report because it is ok for generated content to have uncommitted changes.


These are the previous versions of the repository in which changes were made to the R Markdown (analysis/LDL_Liver_S_gene_level.Rmd) and HTML (docs/LDL_Liver_S_gene_level.html) files. If you’ve configured a remote Git repository (see ?wflow_git_remote), click on the hyperlinks in the table below to view the files as they were in that past version.

File Version Author Date Message
Rmd 22a3f5e sq-96 2022-11-04 update
html 22a3f5e sq-96 2022-11-04 update

Weight QC

#number of imputed weights
nrow(qclist_all)
[1] 21553
#number of imputed weights by chromosome
table(qclist_all$chr)

   1    2    3    4    5    6    7    8    9   10   11   12   13   14   15   16 
2079 1523 1276  769  878 1116 1176  701  854  931 1344 1116  391  772  677 1085 
  17   18   19   20   21   22 
1462  258 1524  659  262  700 
#number of imputed weights without missing variants
sum(qclist_all$nmiss==0)
[1] 19937
#proportion of imputed weights without missing variants
mean(qclist_all$nmiss==0)
[1] 0.925
library(reticulate)
use_python("/scratch/midway2/shengqian/miniconda3/envs/PythonForR/bin/python",required=T)
INFO:numexpr.utils:Note: NumExpr detected 28 cores but "NUMEXPR_MAX_THREADS" not set, so enforcing safe limit of 8.
finish
#add z scores to results
load(paste0(results_dir, "/", analysis_id, "_expr_z_gene.Rd"))
ctwas_gene_res <- py$ctwas_gene_res_df
ctwas_gene_res$z <- z_gene[ctwas_gene_res$intron_id,]$z

z_snp <- z_snp[z_snp$id %in% ctwas_snp_res$id,]
ctwas_snp_res$z <- z_snp$z[match(ctwas_snp_res$id, z_snp$id)]

#merge gene and snp results with added information
ctwas_snp_res$genename=NA
ctwas_snp_res$gene_type=NA
ctwas_snp_res$intron_id=NA
ctwas_snp_res$intron_pos=NA

saveRDS(ctwas_gene_res, file = paste0(results_dir,"/",analysis_id,"_ctwas_intron_res.RDS"))
saveRDS(ctwas_snp_res, file =  paste0(results_dir,"/",analysis_id,"_ctwas_snp_res.RDS"))

Attaching package: 'dplyr'
The following objects are masked from 'package:stats':

    filter, lag
The following objects are masked from 'package:base':

    intersect, setdiff, setequal, union
ctwas_res <- rbind(ctwas_gene_res,
                   ctwas_snp_res[,colnames(ctwas_gene_res)])

#store columns to report
report_cols <- colnames(ctwas_gene_res)[!(colnames(ctwas_gene_res) %in% c("type", "region_tag1", "region_tag2", "cs_index", "gene_type", "z_flag", "id", "chrom", "pos"))]
first_cols <- c("genename", "region_tag")
report_cols <- c(first_cols, report_cols[!(report_cols %in% first_cols)])

report_cols_snps <- c("id", report_cols[-1])
report_cols_snps <- report_cols_snps[!(report_cols_snps %in% "num_sqtl")]

#get number of SNPs from s1 results; adjust for thin argument
ctwas_res_s1 <- data.table::fread(paste0(results_dir, "/", analysis_id, "_ctwas.s1.susieIrss.txt"))
n_snps <- sum(ctwas_res_s1$type=="SNP")/thin
rm(ctwas_res_s1)

Genes with highest PIPs

#distribution of PIPs
hist(ctwas_gene_res$susie_pip, xlim=c(0,1), main="Distribution of Gene PIPs")

Version Author Date
22a3f5e sq-96 2022-11-04
#genes with PIP>0.8 or 20 highest PIPs
head(ctwas_gene_res[order(-ctwas_gene_res$susie_pip),report_cols], max(sum(ctwas_gene_res$susie_pip>0.8), 20))
        genename region_tag susie_pip    mu2       PVE       z num_intron
1353       CNIH4      1_114    1.8027  35.42 2.610e-04   5.273          6
2343        FKRP      19_33    1.7010  18.81 1.495e-04  -3.018          5
1053      CCDC57      17_47    1.4400  20.38 1.053e-04  -4.344         19
4796         PXK       3_40    1.2808  53.31 1.553e-04   6.852          6
190       ADHFE1       8_50    1.2508  18.20 6.146e-05  -3.584         14
5488    SLC22A18       11_2    1.1989  19.89 7.752e-05   4.096         13
1623     CYP4F12      19_13    1.1917  36.25 1.315e-04  -5.868         14
3812      MYO15B      17_42    1.1814  35.52 8.339e-05  -4.053         26
3848     NADSYN1      11_40    1.1678  11.24 2.297e-05   1.456         20
119         ACP6       1_73    1.1614  18.88 7.057e-05   3.994          5
4247       PARP9       3_76    1.1591  40.89 1.482e-04   6.409          8
1530 CTC-360G5.8      19_26    1.1535  21.58 7.356e-05   4.428         11
4637      PPP6R2      22_24    1.1406  25.23 8.252e-05  -4.832         14
5313      SEC16B       1_87    1.1343  22.62 6.981e-05  -3.767         15
2919      IL17RC        3_8    1.1248  21.95 5.256e-05   2.782         19
2089      ERGIC3      20_21    1.1191  51.67 1.667e-04  -7.267          5
3301       LRCH4       7_61    1.0948  31.67 1.087e-04   5.294          6
6059       THOP1       19_3    1.0915  27.51 8.892e-05   5.087          7
4378      PHLDB1      11_71    1.0895  23.26 6.229e-05   3.789          8
101         ACCS      11_27    1.0606  12.34 2.970e-05  -1.952         15
2820          HP      16_38    1.0598 273.62 8.764e-04  21.869          6
21         ABCA8      17_39    1.0506  27.86 8.623e-05  -4.775         12
3018       ITGAL      16_24    1.0361  21.82 6.601e-05  -4.428          3
4047        NQO2        6_3    1.0351  13.31 2.705e-05  -1.687         22
534        ASGR1       17_6    1.0344  83.95 2.614e-04   9.645          2
5736      SPRED2       2_42    1.0246  32.05 7.230e-05  -4.438          2
5788     ST3GAL4      11_77    1.0244  76.58 2.204e-04 -12.154          5
2349       FLOT2      17_17    1.0193  31.16 8.363e-05  -3.738          3
3847       NADK2       5_24    1.0159  20.93 6.200e-05   4.321          3
3981        NINL      20_19    1.0000  25.08 6.795e-05  -4.733          7
3226        LDLR       19_9    1.0000 751.97 2.188e-03  26.898          2
6593       USP53       4_77    0.9986  26.01 7.209e-05   4.888          4
3200       LAMA5      20_36    0.9982  14.84 2.480e-05   1.946         16
883      C2orf42       2_46    0.9963  22.94 5.154e-05  -3.356         12
2771       HLA-B       6_25    0.9916  62.55 1.790e-04   8.790         11
6591       USP47       11_9    0.9913  19.00 4.980e-05   3.654          7
6135    TMEM150A       2_54    0.9866  19.99 5.486e-05   4.079          4
4626     PPP2R5C      14_53    0.9859  20.51 5.247e-05  -3.976          6
813    C14orf159      14_46    0.9817  18.47 2.745e-05   2.224         17
4413      PIH1D1      19_34    0.9810  23.13 6.245e-05   3.988          7
45        ABHD12      20_18    0.9713  26.14 6.203e-05  -4.696          7
681        BCAT2      19_34    0.9682  27.94 7.568e-05  -5.197          2
4870      RALGDS       9_70    0.9665  17.53 4.588e-05   3.277          4
5408      SH3BP2        4_3    0.9603  21.48 4.788e-05   2.887         10
3121      KIF13B       8_28    0.9476  23.56 6.155e-05  -4.718          1
4569         POR       7_48    0.9384  33.80 7.885e-05  -5.646          5
5910       TACC1       8_34    0.9325  18.93 4.616e-05   3.936          4
1040     CCDC159      19_10    0.9318  40.93 7.676e-05  -3.958          7
6115       TMED4       7_32    0.9194  28.69 6.984e-05  -4.609          3
4448      PLA2G6      22_15    0.9176  15.78 3.232e-05  -2.419         14
5721       SPHK2      19_33    0.9153  42.12 1.026e-04   8.721          2
31         ABCC3      17_29    0.9118  13.25 2.257e-05  -1.908         16
3778      MTSS1L      16_37    0.9096  17.51 4.056e-05  -2.815          7
6524      UGT1A1      2_137    0.9090  30.83 6.881e-05   5.450          4
898           C5       9_62    0.9073  24.46 4.818e-05  -4.725          8
4365        PHC1       12_9    0.9054  36.98 8.822e-05   6.156          1
3083       KDM1A       1_16    0.8913  22.63 4.809e-05   4.158          3
4965       REPS1       6_92    0.8869  34.02 6.208e-05   3.381          8
1609      CYP2C8      10_61    0.8844  17.74 3.879e-05   3.549          8
2552        GNMT       6_33    0.8810  24.77 5.145e-05  -4.897          4
2358        FMO3       1_84    0.8770  23.11 3.982e-05   4.221          9
6240        TNK2      3_120    0.8753  23.32 3.771e-05   2.814         10
1392     COL18A1      21_23    0.8702  18.80 2.432e-05  -2.110         15
763         BRD9        5_1    0.8564  19.17 3.832e-05   3.964         12
5790     ST3GAL6       3_62    0.8559  27.99 4.872e-05   3.387          9
4818      R3HDM2      12_36    0.8555  36.25 7.375e-05  -5.818          3
525        ASAP3       1_16    0.8532  25.69 5.020e-05  -4.936          7
3619        MMAB      12_66    0.8529  18.44 3.180e-05   3.833         11
282      ALDH1A2      15_26    0.8444  62.53 1.146e-04  -7.783          9
1724     DENND5A       11_8    0.8428  22.43 3.220e-05   3.050          9
4580       PPCDC      15_35    0.8401  18.78 3.588e-05   4.030          5
1630      CYSTM1       5_83    0.8389  18.90 3.795e-05  -4.021          4
5814       STAU1      20_30    0.8386  21.35 4.004e-05   4.296          6
4361        PGS1      17_45    0.8372  48.60 9.077e-05  -7.141          5
1948       EFNA1       1_76    0.8364  28.21 5.706e-05   3.331          3
5926      TANGO2       22_4    0.8325  19.01 3.475e-05  -3.713          9
5365    SERPINF2       17_2    0.8273  20.50 3.732e-05   4.092          5
5352       SEPT2      2_144    0.8273  19.09 3.419e-05   3.767          9
2963      INPPL1      11_40    0.8260  12.15 1.504e-05  -1.424         12
3821       MYO9B      19_14    0.8226  20.72 3.883e-05  -4.344          7
1480      CRELD1        3_8    0.8185  19.42 2.475e-05   2.483         14
5440       SIPA1      11_36    0.8120  20.64 3.793e-05  -4.440          5
4132       NUP88       17_5    0.8105  19.78 2.973e-05   3.087         11
6328       TRIM4       7_61    0.8100  24.15 4.431e-05  -4.329          3
6693       WDR27      6_111    0.8061  16.95 1.531e-05   1.828         22
5255       SBNO2       19_2    0.8007  31.61 4.431e-05   3.473         12
     num_sqtl
1353        7
2343        5
1053       26
4796        7
190        14
5488       20
1623       16
3812       38
3848       27
119         6
4247       10
1530       13
4637       15
5313       16
2919       24
2089        6
3301        7
6059        7
4378        8
101        19
2820        9
21         15
3018        3
4047       39
534         2
5736        2
5788       10
2349        3
3847        3
3981        7
3226        3
6593        5
3200       25
883        14
2771       28
6591        7
6135        4
4626        7
813        21
4413        7
45          9
681         2
4870        4
5408       11
3121        1
4569        6
5910        4
1040        8
6115        4
4448       22
5721        2
31         20
3778        7
6524        4
898         9
4365        1
3083        3
4965       10
1609        8
2552        4
2358        9
6240       11
1392       16
763        15
5790       10
4818        3
525         7
3619       19
282         9
1724       10
4580        7
1630        6
5814        6
4361        5
1948        4
5926        9
5365        7
5352       10
2963       13
3821        7
1480       16
5440        6
4132       15
6328        5
6693       33
5255       14

Comparing z scores and PIPs

#set nominal signifiance threshold for z scores
alpha <- 0.05

#bonferroni adjusted threshold for z scores
sig_thresh <- qnorm(1-(alpha/nrow(ctwas_gene_res)/2), lower=T)

PIP Manhattan Plot

Sensitivity, specificity and precision for silver standard genes

library("readxl")

known_annotations <- read_xlsx("data/summary_known_genes_annotations.xlsx", sheet="LDL")
New names:
known_annotations <- unique(known_annotations$`Gene Symbol`)

unrelated_genes <- ctwas_gene_res$genename[!(ctwas_gene_res$genename %in% known_annotations)]

#number of genes in known annotations
print(length(known_annotations))
[1] 69
#number of genes in known annotations with imputed expression
print(sum(known_annotations %in% ctwas_gene_res$genename))
[1] 38
#assign ctwas, TWAS, and bystander genes
ctwas_genes <- ctwas_gene_res$genename[ctwas_gene_res$susie_pip>0.9]

#significance threshold for TWAS
print(sig_thresh)
[1] 4.49
#number of ctwas genes
length(ctwas_genes)
[1] 56
#sensitivity / recall
sensitivity <- rep(NA,1)
names(sensitivity) <- c("ctwas")
sensitivity["ctwas"] <- sum(ctwas_genes %in% known_annotations)/length(known_annotations)
sensitivity
  ctwas 
0.01449 
#specificity
specificity <- rep(NA,1)
names(specificity) <- c("ctwas")
specificity["ctwas"] <- sum(!(unrelated_genes %in% ctwas_genes))/length(unrelated_genes)
specificity
 ctwas 
0.9921 
#precision / PPV
precision <- rep(NA,1)
names(precision) <- c("ctwas")
precision["ctwas"] <- sum(ctwas_genes %in% known_annotations)/length(ctwas_genes)
precision
  ctwas 
0.01786 
#ROC curves

pip_range <- (0:1000)/1000
sensitivity <- rep(NA, length(pip_range))
specificity <- rep(NA, length(pip_range))

sessionInfo()
R version 4.1.0 (2021-05-18)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)

Matrix products: default
BLAS/LAPACK: /software/openblas-0.3.13-el7-x86_64/lib/libopenblas_haswellp-r0.3.13.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] readxl_1.4.0    dplyr_1.0.9     reticulate_1.26 workflowr_1.7.0

loaded via a namespace (and not attached):
 [1] tidyselect_1.1.2  xfun_0.24         bslib_0.4.0       purrr_0.3.4      
 [5] lattice_0.20-44   vctrs_0.4.1       generics_0.1.2    htmltools_0.5.3  
 [9] yaml_2.2.1        utf8_1.2.2        rlang_1.0.4       later_1.2.0      
[13] pillar_1.7.0      jquerylib_0.1.4   DBI_1.1.2         glue_1.6.2       
[17] lifecycle_1.0.1   stringr_1.4.0     cellranger_1.1.0  evaluate_0.15    
[21] knitr_1.33        callr_3.7.0       fastmap_1.1.0     httpuv_1.6.1     
[25] ps_1.7.0          fansi_1.0.3       highr_0.9         Rcpp_1.0.9       
[29] promises_1.2.0.1  cachem_1.0.6      jsonlite_1.8.0    fs_1.5.2         
[33] png_0.1-7         digest_0.6.29     stringi_1.7.6     processx_3.5.3   
[37] getPass_0.2-2     rprojroot_2.0.3   grid_4.1.0        here_1.0.1       
[41] cli_3.3.0         tools_4.1.0       magrittr_2.0.3    sass_0.4.0       
[45] tibble_3.1.7      crayon_1.5.1      whisker_0.4       pkgconfig_2.0.3  
[49] ellipsis_0.3.2    Matrix_1.3-3      data.table_1.14.2 assertthat_0.2.1 
[53] rmarkdown_2.9     httr_1.4.3        rstudioapi_0.13   R6_2.5.1         
[57] git2r_0.28.0      compiler_4.1.0