Last updated: 2021-04-22

Knit directory: funcFinemapping/

This reproducible R Markdown analysis was created with workflowr (version 1.6.2). The Checks tab describes the reproducibility checks that were applied when the results were created. The Past versions tab lists the development history.


Great! Since the R Markdown file has been committed to the Git repository, you know the exact version of the code that produced these results.

Great job! The global environment was empty. Objects defined in the global environment can affect the analysis in your R Markdown file in unknown ways. For reproducibility it's best to always run the code in an empty environment.

The command set.seed(20210404) was run prior to running the code in the R Markdown file. Setting a seed ensures that any results that rely on randomness, e.g. subsampling or permutations, are reproducible.

Great job! Recording the operating system, R version, and package versions is critical for reproducibility.

Nice! There were no cached chunks for this analysis, so you can be confident that you successfully produced the results during this run.

Great job! Using relative paths to the files within your workflowr project makes it easier to run your code on other machines.

Great! You are using Git for version control. Tracking code development and connecting the code version to the results is critical for reproducibility.

The results in this page were generated with repository version 4dd58d7. See the Past versions tab to see a history of the changes made to the R Markdown and HTML files.

Note that you need to be careful to ensure that all relevant files for the analysis have been committed to Git prior to generating the results (you can use wflow_publish or wflow_git_commit). workflowr only checks the R Markdown file, but you know if there are other scripts or data files that it depends on. Below is the status of the Git repository when the results were generated:


Ignored files:
    Ignored:    analysis/results.nb.html

Untracked files:
    Untracked:  output/constrained_and_OPC.txt
    Untracked:  output/prop_constrained_variants_with_OCR.txt

Note that any generated files, e.g. HTML, png, CSS, etc., are not included in this status report because it is ok for generated content to have uncommitted changes.


These are the previous versions of the repository in which changes were made to the R Markdown (analysis/results.Rmd) and HTML (docs/results.html) files. If you've configured a remote Git repository (see ?wflow_git_remote), click on the hyperlinks in the table below to view the files as they were in that past version.

File Version Author Date Message
Rmd 4dd58d7 Jing Gu 2021-04-22 compare annotations
html 540e287 Jing Gu 2021-04-21 Build site.
html e597d4e Jing Gu 2021-04-21 Build site.
Rmd b93bab1 Jing Gu 2021-04-21 compare with other conservation annotations
html 491ca5c Jing Gu 2021-04-14 Build site.
Rmd 1ed77cb Jing Gu 2021-04-14 characterize annotations
html 1ed77cb Jing Gu 2021-04-14 characterize annotations
html ba251cb Jing Gu 2021-04-14 Build site.
Rmd 76883f1 Jing Gu 2021-04-14 characterize annotations
html 76883f1 Jing Gu 2021-04-14 characterize annotations
html 1a76b29 Jing Gu 2021-04-06 Build site.
Rmd ae12e8a Jing Gu 2021-04-06 evaluate sequence constraints
html 752cb39 Jing Gu 2021-04-05 Build site.
html f1c5950 Jing Gu 2021-04-05 Build site.
html d16a5e0 Jing Gu 2021-04-05 Build site.
Rmd 6f7214b Jing Gu 2021-04-05 edit index page

Fine-mapping with functional annotations as priors has been shown to improve the identification of causal variants. This project evaluates the utility of novel annotation features and adopts those that improve fine-mapping results.

Evaluation

GWAS summary statistics

Schizophrenia - Pardinas et al., 2018

  • 40675 cases and 64643 controls
  • CLOZUK sample + PGC sample (independent)
  • 179 independent GWAS significant SNPs mapped to 145 independent loci
  • SNPs were imputed using a combination of the 1KGPp3 and UK 10K datasets.
  • SNPs were filtered by INFO > 0.6 and MAF > 0.01
  • LD-score regression analysis: An LD reference was generated from 1KGPp3 after restricting this dataset to strictly unrelated individuals and retaining only markers with MAF > 0.01.

GWAS QC Procedures

  • The current procedures are based on Alan's finemappeR pipeline.
  • Criteria for filtering GWAS SNPs:
  1. Remove all non-biallelic SNPs
  2. Remove all SNPs with strand-ambiguous alleles (A/T or C/G)
  3. Remove SNPs without rs IDs, or with duplicated rs IDs or base-pair positions
  4. Remove SNPs not in the reference panels
  5. Remove SNPs whose base-pair positions or alleles do not match the reference panels
  6. Remove all SNPs on chromosomes X, Y, and MT

After filtering, around 6 million variants remained.
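The allele, ID, and chromosome filters above can be sketched in base R as follows. This is a minimal illustration, not the finemappeR implementation: the column names (`snp`, `chr`, `pos`, `a1`, `a2`) and the function `filter_gwas` are hypothetical, duplicated entries are assumed to be dropped entirely rather than deduplicated, and the reference-panel checks (criteria 4 and 5) are left out because they depend on the panel files.

```r
# Toy summary statistics; all column names here are hypothetical.
sumstats <- data.frame(
  snp = c("rs1", "rs2", "rs3", "rs3", ".", "rs6", "rs7"),
  chr = c("1", "1", "2", "2", "3", "X", "4"),
  pos = c(100, 200, 300, 300, 400, 500, 600),
  a1  = c("A", "A", "C", "C", "G", "A", "AT"),
  a2  = c("G", "T", "T", "T", "A", "C", "A"),
  stringsAsFactors = FALSE
)

filter_gwas <- function(df) {
  bases <- c("A", "C", "G", "T")
  # Keep single-nucleotide variants with two distinct alleles (drops indels).
  snv <- df$a1 %in% bases & df$a2 %in% bases & df$a1 != df$a2
  # Strand-ambiguous pairs: A/T and C/G in either order.
  ambiguous <- paste0(df$a1, df$a2) %in% c("AT", "TA", "CG", "GC")
  # Require an rs ID and drop every copy of a duplicated ID or position.
  has_rsid <- grepl("^rs[0-9]+$", df$snp)
  dup_id   <- duplicated(df$snp) | duplicated(df$snp, fromLast = TRUE)
  key      <- df[c("chr", "pos")]
  dup_pos  <- duplicated(key) | duplicated(key, fromLast = TRUE)
  # Keep autosomes only (removes X, Y, and MT).
  autosomal <- df$chr %in% as.character(1:22)
  df[snv & !ambiguous & has_rsid & !dup_id & !dup_pos & autosomal, ]
}

filtered <- filter_gwas(sumstats)
```

In this toy input only `rs1` passes every filter; each of the other rows trips exactly one of the criteria.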

Plots for GWAS summary statistics

Features

  1. Sequence constraints:
    • Context-dependent tolerance scores (CDTS), expressed in percentiles
    • A score was computed for each 10 bp bin in the genome.
    • The lower the score, the more intolerant the bin is to variation.

Procedures

  1. GWAS summary statistics were pre-processed to remove sex chromosomes, indels, and ambiguous and duplicated SNPs.
  2. Currently, genotypes from 1000 Genomes (1kg) European samples are used to compute LD between SNPs.
  3. SNPs in the GWAS summary statistics were matched with the reference panel and assigned to a total of 1687 independent LD blocks.
  4. TORUS was run to perform genome-wide enrichment analyses.
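Step 3 (assigning SNPs to LD blocks) might look like the sketch below. It assumes blocks are given as non-overlapping half-open intervals `[start, end)` per chromosome; the data frames, column names, and the helper `assign_locus` are hypothetical and not taken from the pipeline.

```r
# Hypothetical LD blocks on one chromosome, as half-open intervals [start, end).
blocks <- data.frame(
  locus = 1:3,
  chr   = c(1, 1, 1),
  start = c(0, 1000, 2500),
  end   = c(1000, 2500, 4000)
)
snps <- data.frame(
  snp = c("rs1", "rs2", "rs3", "rs4"),
  chr = 1,
  pos = c(10, 1000, 3999, 4500)
)

assign_locus <- function(snps, blocks) {
  locus <- rep(NA_integer_, nrow(snps))
  for (chrom in unique(snps$chr)) {
    b <- blocks[blocks$chr == chrom, ]
    b <- b[order(b$start), ]
    i <- which(snps$chr == chrom)
    # Index of the last block whose start is <= the SNP position (0 if none).
    idx <- findInterval(snps$pos[i], b$start)
    ok <- idx > 0
    # The SNP must also fall before the end of that block.
    ok[ok] <- snps$pos[i][ok] < b$end[idx[ok]]
    locus[i[ok]] <- b$locus[idx[ok]]
  }
  locus
}

snps$locus <- assign_locus(snps, blocks)
```

SNPs that fall outside every block receive `NA` and would need to be dropped before the enrichment analysis.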

Results

All variants were categorized by whether or not they fall in genomic bins with CDTS up to the 1st or the 5th percentile.
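One way to build these binary annotations is sketched below. It assumes a CDTS table with one row per 10 bp bin, keyed by a 0-based bin start and carrying a genome-wide percentile; the table layout, column names, and the function `annotate_cdts` are hypothetical.

```r
# Hypothetical CDTS table: one row per 10 bp bin with its percentile.
cdts <- data.frame(chr = 1, bin_start = c(0, 10, 20, 30),
                   percentile = c(0.5, 3, 5, 40))
snps <- data.frame(snp = paste0("rs", 1:5), chr = 1,
                   pos = c(4, 12, 29, 35, 57), stringsAsFactors = FALSE)

annotate_cdts <- function(snps, cdts, bin_size = 10) {
  # Map each position to the start of its bin (assumes 0-based bins).
  snps$bin_start <- floor(snps$pos / bin_size) * bin_size
  out <- merge(snps, cdts, by = c("chr", "bin_start"), all.x = TRUE)
  # SNPs in bins without a score are treated as not constrained.
  out$cdts_1pct <- as.integer(!is.na(out$percentile) & out$percentile <= 1)
  out$cdts_5pct <- as.integer(!is.na(out$percentile) & out$percentile <= 5)
  # Restore the input order, which merge() does not guarantee.
  out[order(match(out$snp, snps$snp)), c("snp", "cdts_1pct", "cdts_5pct")]
}

annot <- annotate_cdts(snps, cdts)
```

Because the 1st percentile is nested in the 5th, every SNP flagged by `cdts_1pct` is also flagged by `cdts_5pct`.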

Examine the CDTS feature

Check the proportion of variants under high sequence constraint that also have functional annotations in brain

Summary of the percentage of genetic variants within the 1st/5th percentile of CDTS that overlap with OCRs in brain

          iN_Dopa  iN_GABA  iN_Glut  iPSC   NPC    Any_OCR
CDTS_1%   53.7%    62.6%    53.2%    70.8%  51.9%  76.4%
CDTS_5%   24.4%    27.8%    21.6%    33.9%  20.6%  40.4%

Check the percentage of constrained sequences that overlap with open chromatin regions from neurons

  • Overlaps between the two sets of genomic features were identified using bedtools intersect. A constrained sequence was counted as overlapping when at least 20% of it (>= 2 bp) intersected with peaks called from ATAC-seq profiles.

Summary of the overlap between constrained sequences and OCRs in brain

          iN_Dopa  iN_GABA  iN_Glut  iPSC   NPC
CDTS_1%   46.9%    55.6%    45.4%    62.2%  44.2%
CDTS_5%   18%      21.3%    17.4%    23.8%  16.9%
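The 20% criterion can be illustrated in base R. The analysis itself used bedtools intersect, and the exact command is not shown here, so this is only a sketch of the rule for 10 bp bins: the intervals are hypothetical, half-open, and the peaks are assumed not to overlap each other. Note that a minimum-overlap fraction in bedtools (its `-f` option) is evaluated per overlapping feature, whereas this sketch sums coverage over all peaks.

```r
# Base pairs shared by the half-open intervals [a_start, a_end) and [b_start, b_end).
overlap_bp <- function(a_start, a_end, b_start, b_end) {
  pmax(0, pmin(a_end, b_end) - pmax(a_start, b_start))
}

# Flag bins whose covered fraction reaches min_frac.
# Assumes peaks do not overlap each other, so summing does not double count.
flag_overlap <- function(bins, peaks, min_frac = 0.2) {
  covered <- vapply(seq_len(nrow(bins)), function(i) {
    sum(overlap_bp(bins$start[i], bins$end[i], peaks$start, peaks$end))
  }, numeric(1))
  covered / (bins$end - bins$start) >= min_frac
}

# Hypothetical 10 bp constrained bins and ATAC-seq peaks.
bins  <- data.frame(start = c(100, 110, 120), end = c(110, 120, 130))
peaks <- data.frame(start = c(90, 119), end = c(102, 125))
flag_overlap(bins, peaks)
```

Here the first bin shares exactly 2 bp with a peak and is counted, the second shares only 1 bp and is not, and the third shares 5 bp and is counted.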

Enrichment analysis for sequence constraints

Version Author Date
76883f1 Jing Gu 2021-04-14

The enrichment estimates have confidence intervals above zero for CDTS and the positive controls. This shows that SNPs associated with SCZ are on average ~9-fold enriched in genomic bins with CDTS up to the 5th percentile.
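Assuming the enrichment is estimated on the log odds ratio scale and the fold figure is its exponential, the conversion is a one-liner. The helper `to_fold` and the input values below are hypothetical and chosen only to illustrate the arithmetic; they are not estimates from this analysis.

```r
# Convert a log-scale enrichment estimate and its interval to fold enrichment.
to_fold <- function(log_or, lower, upper) {
  c(fold = exp(log_or), lower = exp(lower), upper = exp(upper))
}

# Hypothetical values for illustration only (not results from this analysis).
example_fold <- to_fold(log_or = log(9), lower = log(4), upper = log(20))
```

Under that assumption, an interval that excludes zero on the log scale corresponds to a fold-enrichment interval that excludes 1.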

Compare with other conservation annotations

  • LINSIGHT
    • predicts how likely noncoding nucleotide sites are to have deleterious fitness consequences and hence to be phenotypically important
    • genome-wide average of LINSIGHT scores was ~0.07 (range: 0.03-0.99)
    • estimated mean LINSIGHT score for conserved TFBSs was 0.24, which is used as the cutoff for whether a nucleotide site is conserved
    • 2.5% of GWAS SNPs are above the LINSIGHT threshold.

  • CADD - Combined Annotation-Dependent Depletion
    • provides metrics of deleteriousness
    • scaled PHRED score [-10 log10(P)]
    • a 5 percent cutoff was chosen, which represents the top 5% of all possible reference genome SNVs
  • GERP - Genomic Evolutionary Rate Profiling
    • produces position-specific estimates of evolutionary constraint
    • constraint intensity is quantified as a "rejected substitutions" (RS) score ranging from -12.3 to 6.17
    • UCSC suggests an RS score threshold of 2, which provides high sensitivity while remaining strongly enriched for truly constrained sites
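For the CADD cutoff listed above, the PHRED-style scaling makes it easy to translate a top fraction of variants into a scaled score. The sketch below assumes P in the formula is the fraction of all possible SNVs ranked at or above a variant; the function name `phred` is introduced here for illustration.

```r
# Scaled score corresponding to the top fraction of ranked variants.
phred <- function(top_fraction) -10 * log10(top_fraction)

phred(0.05)  # top 5% cutoff, roughly 13
phred(0.10)  # top 10%
phred(0.01)  # top 1%
```

Under this assumption a top-5% cutoff corresponds to a scaled score of about 13.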

Summary of the pair-wise correlations between conservation annotations

           CDTS       LINSIGHT   GERP       CADD
CDTS       1.0000000  0.0401113  0.0207126  0.0321172
LINSIGHT   0.0401113  1.0000000  0.4004946  0.5376002
GERP       0.0207126  0.4004946  1.0000000  0.2748316
CADD       0.0321172  0.5376002  0.2748316  1.0000000

The table shows the pair-wise correlations between the binary annotations. With the current thresholds, CDTS at the 5th percentile is essentially uncorrelated with the other binary annotations (all correlations below 0.05). In contrast, LINSIGHT, GERP and CADD are substantially correlated with one another (0.27 to 0.54).
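A correlation matrix of this kind can be produced by thresholding each score and calling `cor()` on the resulting 0/1 columns. The sketch below uses made-up scores for six SNPs and only two annotations; the thresholds come from the descriptions above, while the strict `>` comparison and the column names are assumptions.

```r
# Hypothetical raw scores for six SNPs (not data from this analysis).
scores <- data.frame(
  linsight = c(0.05, 0.30, 0.80, 0.10, 0.25, 0.04),
  gerp     = c(-1.0, 3.2, 4.5, 0.5, 1.0, 2.1)
)

# Binarize with the thresholds described above (strictness is an assumption).
binary <- data.frame(
  LINSIGHT = as.integer(scores$linsight > 0.24),
  GERP     = as.integer(scores$gerp > 2)
)

# Pearson correlation between 0/1 columns.
cor_mat <- cor(binary)
```

For binary columns the Pearson correlation equals the phi coefficient, so the entries reflect how often two annotations flag the same SNPs rather than how similar the underlying scores are.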

Joint TORUS enrichment analysis over conservation-related annotations

Version Author Date
e597d4e Jing Gu 2021-04-21

With the other conservation annotations included as predictors in the model, CDTS within the 5th percentile still shows around 8-fold enrichment.


sessionInfo()
R version 4.0.4 (2021-02-15)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Scientific Linux 7.4 (Nitrogen)

Matrix products: default
BLAS/LAPACK: /software/openblas-0.3.13-el7-x86_64/lib/libopenblas_haswellp-r0.3.13.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] ggplot2_3.3.3     knitr_1.31        data.table_1.14.0 workflowr_1.6.2  

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.6        pillar_1.5.0      compiler_4.0.4    bslib_0.2.4      
 [5] later_1.1.0.1     jquerylib_0.1.3   git2r_0.28.0      highr_0.8        
 [9] tools_4.0.4       digest_0.6.27     gtable_0.3.0      jsonlite_1.7.2   
[13] evaluate_0.14     lifecycle_1.0.0   tibble_3.0.6      pkgconfig_2.0.3  
[17] rlang_0.4.10      DBI_1.1.1         yaml_2.2.1        xfun_0.21        
[21] withr_2.4.1       dplyr_1.0.4       stringr_1.4.0     generics_0.1.0   
[25] fs_1.5.0          vctrs_0.3.6       sass_0.3.1        tidyselect_1.1.0 
[29] rprojroot_2.0.2   grid_4.0.4        glue_1.4.2        R6_2.5.0         
[33] fansi_0.4.2       rmarkdown_2.7     farver_2.0.3      purrr_0.3.4      
[37] magrittr_2.0.1    whisker_0.4       scales_1.1.1      promises_1.2.0.1 
[41] ellipsis_0.3.1    htmltools_0.5.1.1 assertthat_0.2.1  colorspace_2.0-0 
[45] httpuv_1.5.5      labeling_0.4.2    utf8_1.1.4        stringi_1.5.3    
[49] munsell_0.5.0     crayon_1.4.1