Last updated: 2022-06-21

Knit directory: rotation2/

This reproducible R Markdown analysis was created with workflowr (version 1.7.0). The Checks tab describes the reproducibility checks that were applied when the results were created. The Past versions tab lists the development history.


Great! Since the R Markdown file has been committed to the Git repository, you know the exact version of the code that produced these results.

Great job! The global environment was empty. Objects defined in the global environment can affect the analysis in your R Markdown file in unknown ways. For reproducibility it’s best to always run the code in an empty environment.

The command set.seed(20220607) was run prior to running the code in the R Markdown file. Setting a seed ensures that any results that rely on randomness, e.g. subsampling or permutations, are reproducible.

Great job! Recording the operating system, R version, and package versions is critical for reproducibility.

Nice! There were no cached chunks for this analysis, so you can be confident that you successfully produced the results during this run.

Great job! Using relative paths to the files within your workflowr project makes it easier to run your code on other machines.

Great! You are using Git for version control. Tracking code development and connecting the code version to the results is critical for reproducibility.

The results in this page were generated with repository version f753baa. See the Past versions tab to see a history of the changes made to the R Markdown and HTML files.

Note that you need to be careful to ensure that all relevant files for the analysis have been committed to Git prior to generating the results (you can use wflow_publish or wflow_git_commit). workflowr only checks the R Markdown file, but you know if there are other scripts or data files that it depends on. Below is the status of the Git repository when the results were generated:


Ignored files:
    Ignored:    data/

Note that any generated files, e.g. HTML, png, CSS, etc., are not included in this status report because it is ok for generated content to have uncommitted changes.


These are the previous versions of the repository in which changes were made to the R Markdown (analysis/project_1.Rmd) and HTML (docs/project_1.html) files. If you’ve configured a remote Git repository (see ?wflow_git_remote), click on the hyperlinks in the table below to view the files as they were in that past version.

File Version Author Date Message
Rmd f753baa chenh19 2022-06-21 wflow_publish("./analysis/*.Rmd")
html d376ad0 chenh19 2022-06-21 Build site.
Rmd 4a9db6a chenh19 2022-06-21 wflow_publish("./analysis/*.Rmd")
html 325a212 chenh19 2022-06-21 Build site.
Rmd d54cc77 chenh19 2022-06-21 wflow_publish("./analysis/*.Rmd")
html e14c55c chenh19 2022-06-21 Build site.
Rmd 2ab041a chenh19 2022-06-21 wflow_publish("./analysis/*.Rmd")
html f971bbd chenh19 2022-06-21 Build site.
html dca882f chenh19 2022-06-21 Build site.
html 3a80eaf chenh19 2022-06-21 Build site.
Rmd b6fb1d2 chenh19 2022-06-21 wflow_publish("./analysis/*.Rmd")
html 33829a5 chenh19 2022-06-21 Build site.
Rmd a949aec chenh19 2022-06-21 wflow_publish("./analysis/*.Rmd")
html 5b446cf chenh19 2022-06-21 Build site.
Rmd c4ad45d chenh19 2022-06-21 wflow_publish("./analysis/*.Rmd")
html 28a06ee chenh19 2022-06-21 Build site.
Rmd 53f2292 chenh19 2022-06-21 wflow_publish("./analysis/*.Rmd")
html d5b4ff0 chenh19 2022-06-21 Build site.
Rmd e982ac7 chenh19 2022-06-21 wflow_publish("./analysis/*.Rmd")
html 16400a7 chenh19 2022-06-21 Build site.
Rmd d6aa5b2 chenh19 2022-06-21 wflow_publish("./analysis/*.Rmd")
html a324166 chenh19 2022-06-21 Build site.
Rmd ab8cbc3 chenh19 2022-06-21 wflow_publish("./analysis/*.Rmd")
html 27a4d51 chenh19 2022-06-21 Build site.
Rmd 624e791 chenh19 2022-06-21 wflow_publish("./analysis/*.Rmd")
html f765024 chenh19 2022-06-21 Build site.
html bc55dbb chenh19 2022-06-21 Build site.
Rmd 8b66c3b chenh19 2022-06-21 update
html b3b7ed6 chenh19 2022-06-21 Build site.
Rmd 405116a chenh19 2022-06-21 update
html 405d57e chenh19 2022-06-21 Build site.
Rmd bbbaab0 chenh19 2022-06-21 wflow_publish("./analysis/*.Rmd")
html c10a1a8 chenh19 2022-06-21 Build site.
Rmd a8f7999 chenh19 2022-06-21 wflow_publish("./analysis/*.Rmd")
html 2443e38 chenh19 2022-06-21 Build site.
Rmd 8e0c8ad chenh19 2022-06-21 wflow_publish("./analysis/*.Rmd")
html 5d192d2 chenh19 2022-06-21 Build site.
Rmd 356550c chenh19 2022-06-21 wflow_publish("./analysis/*.Rmd")
html 21e9501 chenh19 2022-06-21 Build site.
Rmd 4a8f7ee chenh19 2022-06-21 wflow_publish("./analysis/*.Rmd")
html b002776 chenh19 2022-06-20 Build site.
Rmd 6d78198 chenh19 2022-06-20 wflow_publish("./analysis/*.Rmd")
Rmd 0e1817a chenh19 2022-06-19 update
html 6211c60 chenh19 2022-06-15 Build site.
Rmd 46e8cc3 chenh19 2022-06-15 wflow_publish("./analysis/*.Rmd")
html 0da18e2 chenh19 2022-06-15 Build site.
Rmd d999122 chenh19 2022-06-15 wflow_publish("./analysis/*.Rmd")
html 0aff555 chenh19 2022-06-15 Build site.
Rmd eda2d56 chenh19 2022-06-15 wflow_publish("./analysis/*.Rmd")
html a4e1e73 chenh19 2022-06-15 Build site.
Rmd 7229c17 chenh19 2022-06-15 wflow_publish("./analysis/*.Rmd")
html f0e98f9 chenh19 2022-06-14 Build site.
Rmd e0aa022 chenh19 2022-06-14 wflow_publish("./analysis/*.Rmd")
html eafa16b chenh19 2022-06-14 Build site.
Rmd 69b29f1 chenh19 2022-06-14 wflow_publish("./analysis/*.Rmd")
html dfd60ce chenh19 2022-06-14 Build site.
Rmd 49f1922 chenh19 2022-06-14 wflow_publish("./analysis/*.Rmd")
html fd7271e chenh19 2022-06-14 Build site.
html 1b4d12e chenh19 2022-06-14 Build site.
html a6c402d chenh19 2022-06-14 Build site.
html cedad99 chenh19 2022-06-14 Build site.
Rmd 6b3f021 chenh19 2022-06-14 wflow_publish("analysis/*.Rmd")
html e3b9788 chenh19 2022-06-14 Build site.
html aed5eed chenh19 2022-06-14 Build site.
Rmd c552123 chenh19 2022-06-14 wflow_publish("analysis/*.Rmd")
html 45bb6ed chenh19 2022-06-14 Build site.
Rmd 81fdf42 chenh19 2022-06-14 wflow_publish("analysis/*.Rmd")
html c23765a chenh19 2022-06-14 Build site.
html b5fb71c chenh19 2022-06-14 Build site.
Rmd 10f6641 chenh19 2022-06-14 wflow_publish("analysis/*.Rmd")
html 152325b chenh19 2022-06-14 Build site.
Rmd 1739dd9 chenh19 2022-06-14 wflow_publish("analysis/*.Rmd")
html 543ef4c chenh19 2022-06-14 Build site.
html 9b6cb27 chenh19 2022-06-14 Build site.
Rmd d8908c0 chenh19 2022-06-14 wflow_publish("analysis/*.Rmd")
html 8674c8a chenh19 2022-06-14 Build site.
Rmd 2972ce6 chenh19 2022-06-14 wflow_publish("analysis/*.Rmd")
html ada4068 chenh19 2022-06-14 Build site.
Rmd 7c5402e chenh19 2022-06-14 wflow_publish("analysis/*.Rmd")
html 0d08121 chenh19 2022-06-14 Build site.
Rmd 6226526 chenh19 2022-06-14 wflow_publish("analysis/*.Rmd")
html dd2046d chenh19 2022-06-14 Build site.
Rmd 71f5d04 chenh19 2022-06-14 wflow_publish("analysis/*.Rmd")
html 2c4cab1 chenh19 2022-06-14 Build site.
Rmd 6e73c04 chenh19 2022-06-14 wflow_publish("analysis/*.Rmd")
html d46eaab chenh19 2022-06-14 Build site.
Rmd be56d9d chenh19 2022-06-14 wflow_publish("analysis/*.Rmd")
html 02a0c26 chenh19 2022-06-14 Build site.
Rmd e2c15c0 chenh19 2022-06-14 wflow_publish("analysis/*.Rmd")
html abad46e chenh19 2022-06-14 Build site.
Rmd 68948e3 chenh19 2022-06-14 wflow_publish("analysis/*.Rmd")
html 741027b chenh19 2022-06-14 Build site.
Rmd 065d7e9 chenh19 2022-06-14 wflow_publish("analysis/*.Rmd")
html bb15812 chenh19 2022-06-14 Build site.
Rmd b6e0993 chenh19 2022-06-14 wflow_publish("analysis/*.Rmd")
html 93a27ae chenh19 2022-06-14 Build site.
Rmd b4a7331 chenh19 2022-06-14 wflow_publish("analysis/*.Rmd")
html 27121a9 chenh19 2022-06-14 Build site.
Rmd fce1ffd chenh19 2022-06-14 wflow_publish("analysis/*.Rmd")
html 44517c1 chenh19 2022-06-13 Build site.
Rmd cc6a40a chenh19 2022-06-13 wflow_publish("analysis/*.Rmd")
html da08f11 chenh19 2022-06-13 Build site.
Rmd 05ccc35 chenh19 2022-06-13 wflow_publish("analysis/*.Rmd")
html 572f6ba chenh19 2022-06-13 Build site.
html b8870d3 chenh19 2022-06-13 Build site.
html 719925e chenh19 2022-06-13 Build site.
html e7541fa chenh19 2022-06-13 Build site.
html 9d9615d chenh19 2022-06-13 Build site.
Rmd 04feaa7 chenh19 2022-06-13 wflow_publish("analysis/*.Rmd")
html bbd8978 chenh19 2022-06-13 Build site.
Rmd 0ec2bfa chenh19 2022-06-13 wflow_publish("analysis/*.Rmd")
html e5a5b52 chenh19 2022-06-13 Build site.
Rmd c43ae1f chenh19 2022-06-13 wflow_publish("analysis/*.Rmd")
html 4d8bd72 chenh19 2022-06-13 Build site.
html 3373521 chenh19 2022-06-13 Build site.
html af21ea8 chenh19 2022-06-13 Build site.
Rmd 6e56d75 chenh19 2022-06-13 wflow_publish("analysis/*.Rmd")
html f653f7b chenh19 2022-06-13 Build site.
Rmd 2723e7f chenh19 2022-06-13 wflow_publish("analysis/*.Rmd")
html d69c892 chenh19 2022-06-13 Build site.
html 34d877d chenh19 2022-06-13 Build site.
html e72400b chenh19 2022-06-13 Build site.
html c411223 chenh19 2022-06-13 Build site.
html 1daccd2 chenh19 2022-06-13 Build site.
Rmd 63f46d2 chenh19 2022-06-13 wflow_publish("analysis/*.Rmd")
html 26adb45 chenh19 2022-06-13 Build site.
html a6022a8 chenh19 2022-06-13 Build site.
Rmd 1215832 chenh19 2022-06-13 wflow_publish("analysis/*.Rmd")
html 9abc4b8 chenh19 2022-06-13 Build site.
Rmd 7efcfe0 chenh19 2022-06-13 wflow_publish("analysis/*.Rmd")
html f18d385 chenh19 2022-06-13 Build site.
Rmd a7c1ce0 chenh19 2022-06-13 wflow_publish("analysis/*.Rmd")
html e991f56 chenh19 2022-06-13 Build site.
html 3c9b1d9 chenh19 2022-06-13 Build site.
Rmd ae1553a chenh19 2022-06-13 wflow_publish("analysis/*.Rmd")
html 34e0d02 chenh19 2022-06-13 Build site.
Rmd e69aa83 chenh19 2022-06-13 wflow_publish("analysis/*.Rmd")
html 9be31af chenh19 2022-06-13 Build site.
Rmd ead84c2 chenh19 2022-06-13 wflow_publish("analysis/*.Rmd")
html 0f41de8 chenh19 2022-06-13 Build site.
html 31ad035 chenh19 2022-06-13 Build site.
html bdf3b44 chenh19 2022-06-13 Build site.
html 8d0890c chenh19 2022-06-13 Build site.
Rmd 26f455b chenh19 2022-06-13 update
html 26f455b chenh19 2022-06-13 update

1. Understand RNA-seq

a. Read about RNA-seq analysis

Yalamanchili et al. 2017: RNA-seq analysis pipeline

Some key points:

Protocol-1 (differential expression of genes):

  • demuxed raw reads (FastQC)
  • trimming reads (awk)
  • aligning reads (TopHat2)
  • counting reads (HTSeq; may filter out genes with low counts before next step)
  • detect DE using counted reads (DESeq2)
  • more QC (PCA/correlation heatmap)
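
The optional low-count filter mentioned in the counting step can be sketched as follows. This is a minimal Python illustration, not the pipeline from the paper: the function name `filter_low_count_genes`, the toy counts, and the threshold of 10 total reads are all assumptions.

```python
def filter_low_count_genes(counts, min_total=10):
    """Keep genes whose summed count across samples is at least min_total.

    counts: dict mapping gene name -> list of per-sample read counts.
    min_total: hypothetical cutoff, chosen only for illustration.
    """
    return {gene: row for gene, row in counts.items() if sum(row) >= min_total}


# Toy count table (made-up numbers): three genes, three samples.
counts = {
    "GENE_A": [120, 98, 143],
    "GENE_B": [0, 1, 0],
    "GENE_C": [5, 4, 3],
}
kept = filter_low_count_genes(counts)
print(sorted(kept))  # ['GENE_A', 'GENE_C']
```

GENE_B is dropped because it has only 1 read in total, so it carries almost no information for the DE test.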

Protocol-2 (differential usage of isoforms):

  • Protocol-1
  • counting isoforms (Kallisto; also check Cell Ranger)
  • detect DU using counted isoforms (Sleuth)
  • more QC (also PCA/correlation heatmap)

Protocol-3 (cryptic splicing):

  • Protocol-1
  • detect differential junctions (CrypSplice)

b. Read more about RNA-seq analysis

Luecken et al. 2019: RNA-seq analysis pipeline

Some key points:

c. Read the Morris paper

Morris et al. 2021: STING-Seq

Some key points:

Some key ideas:

  • STING-seq: Systematic Targeting and Inhibition of Noncoding GWAS loci with scRNA-seq
  • prioritizes candidate cis-regulatory elements (cCREs, 1 kb < distance to TSS < 1 Mb) using fine-mapped GWAS
  • selected 88 variants (in 56 loci) with enhancer activity
  • dual CRISPR inhibition: dCas9 as the GPS, MeCP2 and KRAB as the repressors
  • confirming dual CRISPRi efficacy: gRNAs target TSS of MRPS23, CTSB, FSCN1
  • CRISPRi on the 88 variants: two gRNAs for each variant, both within 200 bp of the variant
  • ECCITE-seq: captures gRNAs and epitopes
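
The two distance rules in the list above (cCRE between 1 kb and 1 Mb from the TSS, gRNAs within 200 bp of the variant) can be written as simple checks. This is only a sketch in Python under my own assumptions: positions are on the same chromosome, the function names are hypothetical, and treating "within 200 bp" as inclusive is a guess.

```python
def is_candidate_cre(variant_pos, tss_pos, min_dist=1_000, max_dist=1_000_000):
    """True if 1 kb < |variant - TSS| < 1 Mb (strict bounds, as in the notes)."""
    return min_dist < abs(variant_pos - tss_pos) < max_dist


def grna_near_variant(grna_pos, variant_pos, window=200):
    """True if the gRNA lies within `window` bp of the variant (inclusive; assumed)."""
    return abs(grna_pos - variant_pos) <= window


# Made-up coordinates for illustration only.
print(is_candidate_cre(150_000, 100_000))   # True: 50 kb from the TSS
print(is_candidate_cre(100_500, 100_000))   # False: only 500 bp from the TSS
print(grna_near_variant(150_120, 150_000))  # True: 120 bp from the variant
```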

Some data processing steps and results:

  • QC: remove cells with low total reads or excessive mitochondrial reads, gRNA assignment UMI>5 (9,343 cells after QC)
  • Kallisto: read counting; read more on the official website
  • Seurat: QC and reference mapping? Read more on the official website
  • SCEPTRE: pairwise gRNA-to-gene-expression test
  • non-targeting gRNA-gene pairs: not significant (negative ctrl)
  • TSS-targeting gRNA-gene pairs: expression significantly decreased (positive ctrl)
  • 37 of the 88 variants were significant
  • Trans-regulatory elements: I'll come back later
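
To make the QC step in the list above concrete, here is a rough Python sketch of the per-cell filter and the gRNA assignment rule. Only the "UMI > 5" gRNA cutoff comes from my notes; the total-UMI and mitochondrial-fraction thresholds, the field names, and the function names are placeholders I made up, not the values used in the paper.

```python
def passes_qc(cell, min_total_umis=500, max_mito_fraction=0.2):
    """Drop cells with low total UMIs or an excessive mitochondrial fraction.

    Both thresholds are hypothetical placeholders.
    """
    total = cell["total_umis"]
    if total == 0 or total < min_total_umis:
        return False
    return cell["mito_umis"] / total <= max_mito_fraction


def assigned_grnas(grna_umis, min_umis=5):
    """Assign a gRNA to a cell when its UMI count is strictly above min_umis."""
    return sorted(g for g, n in grna_umis.items() if n > min_umis)


# Made-up cells for illustration.
good_cell = {"total_umis": 4200, "mito_umis": 210}
dying_cell = {"total_umis": 3000, "mito_umis": 1500}
print(passes_qc(good_cell), passes_qc(dying_cell))  # True False
print(assigned_grnas({"gRNA_1": 12, "gRNA_2": 5, "gRNA_3": 0}))  # ['gRNA_1']
```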

Note:

2. Prelim QC for raw STING-seq data

a. Download all data

Code: download.sh

b. Perform FastQC on all fastq files

Code: fastqc.sh

FastQC reports for SRR14141135 through SRR14141146 (figures not shown)

A brief summary:

  • length: 26bp or 57bp (trimmed?)
  • depth: 30-35x
  • overall quality: good (within ~40 bp)
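
The read lengths in the summary above can be double-checked outside FastQC by tallying sequence lengths directly. This is a minimal Python sketch, assuming plain four-line FASTQ records (no wrapped sequences); the function name `read_length_counts` and the toy reads are hypothetical.

```python
import io
from collections import Counter


def read_length_counts(handle):
    """Tally sequence lengths from a FASTQ stream with 4 lines per record."""
    lengths = Counter()
    for i, line in enumerate(handle):
        if i % 4 == 1:  # the second line of each record is the sequence
            lengths[len(line.rstrip("\r\n"))] += 1
    return lengths


# Two made-up reads: one 26 bp and one 57 bp.
records = ["@read1", "A" * 26, "+", "I" * 26,
           "@read2", "C" * 57, "+", "I" * 57]
example = io.StringIO("\n".join(records) + "\n")
print(dict(read_length_counts(example)))  # {26: 1, 57: 1}
```

For a real gzipped file, the same function should work on a handle from `gzip.open(path, "rt")`.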

d. Kallisto | bustools pipeline

Code: pip3-kb.sh
Code: anaconda_kallisto.sh

3. Analyze QC’ed STING-seq data

a. Install packages

Code: seurat.sh

b. Data overview

Code: overview.R
Note: about sparse matrix

The [Expression] matrix has: 
- 35,606 rows/genes/targets 
- 686,612 columns/barcodes/cells 
- 24,447,506,872 values in total 
- 82,507,471 values that are non-zero 
- 50,421,358 values that are 1 
- 32,086,113 values that are bigger than 1 
- 3,370,699 values that are bigger than 10 
- 259,734 values that are bigger than 100 
- 2,515 values that are bigger than 1,000 
- 0 values that are bigger than 10,000 
- 0 values that are bigger than 100,000 
 
The [gRNA] matrix has: 
- 210 rows/genes/targets 
- 137,347 columns/barcodes/cells 
- 28,842,870 values in total 
- 2,506,474 values that are non-zero 
- 1,510,919 values that are 1 
- 995,555 values that are bigger than 1 
- 121,554 values that are bigger than 10 
- 41,071 values that are bigger than 100 
- 2,232 values that are bigger than 1,000 
- 20 values that are bigger than 10,000 
- 0 values that are bigger than 100,000 
 
The [Hashtag] matrix has: 
- 4 rows/genes/targets 
- 410,228 columns/barcodes/cells 
- 1,640,912 values in total 
- 739,820 values that are non-zero 
- 409,830 values that are 1 
- 329,990 values that are bigger than 1 
- 218,280 values that are bigger than 10 
- 8,155 values that are bigger than 100 
- 282 values that are bigger than 1,000 
- 46 values that are bigger than 10,000 
- 0 values that are bigger than 100,000 
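
For reference, the kind of tally shown above can be sketched in a few lines. This is a Python illustration, not the contents of overview.R; the function name `summarize_counts` and the toy input are assumptions. It takes the matrix shape plus the stored (non-zero) values, which is how a sparse matrix keeps its data.

```python
def summarize_counts(n_rows, n_cols, nonzero_values,
                     thresholds=(1, 10, 100, 1_000, 10_000, 100_000)):
    """Summarize a sparse count matrix from its shape and stored values."""
    values = [v for v in nonzero_values if v != 0]  # ignore explicit zeros
    summary = {
        "rows": n_rows,
        "columns": n_cols,
        "total": n_rows * n_cols,
        "non_zero": len(values),
        "equal_to_1": sum(1 for v in values if v == 1),
    }
    for t in thresholds:
        summary[f"bigger_than_{t}"] = sum(1 for v in values if v > t)
    return summary


# Toy 3 x 4 matrix with five stored values (made-up numbers).
toy = summarize_counts(3, 4, [1, 1, 2, 15, 150])
print(toy["total"], toy["non_zero"], toy["equal_to_1"], toy["bigger_than_1"])
# 12 5 2 3
```

For integer counts, "non-zero" should equal "values that are 1" plus "values bigger than 1", which holds for all three matrices above (e.g. 50,421,358 + 32,086,113 = 82,507,471 for Expression). The totals also match rows × columns (e.g. 4 × 410,228 = 1,640,912 for Hashtag).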

d. Expression (cDNA) dataset

i) Expression barcodes

Code: Expression_barcode_dist_plot.R

(Figures: pending)

ii) Expression targets

Code: Expression_target_dist_plot.R

Comment: The gene with the highest mean expression is the WDR45-like (WDR45L) pseudogene (high UMI counts in all cells)

Comment: The gene with the lowest mean expression is the RP4-669L17.1 pseudogene (zero UMI counts in all cells)

Comment: Non-zero UMI counts for all genes (~35k, including mito genes; 686,612 cells in total)

Comment: CDF plot: ~80% of genes have < ~5000 UMI counts across all cells (not every gene is captured in each cell, but I guess that is still a lot)

Comment: PDF plot: same conclusion as above
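
The "~80% of genes below a threshold" reading of the CDF plot can be reproduced numerically. A minimal Python sketch, assuming a list of per-gene total UMI counts; the function names and the toy totals are made up and this is not the code in Expression_target_dist_plot.R.

```python
def fraction_below(totals, threshold):
    """Fraction of entries strictly below the threshold."""
    return sum(1 for t in totals if t < threshold) / len(totals)


def empirical_cdf(totals):
    """Return (value, cumulative fraction) pairs, sorted by value."""
    ordered = sorted(totals)
    n = len(ordered)
    return [(value, (i + 1) / n) for i, value in enumerate(ordered)]


# Made-up per-gene totals for illustration.
totals = [0, 3, 40, 900, 12_000]
print(fraction_below(totals, 5_000))  # 0.8
print(empirical_cdf(totals)[-2])      # (900, 0.8)
```

The same two functions apply to the gRNA and Hashtag CDF comments by swapping in per-cell or per-target totals.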

e. gRNA dataset

i) gRNA barcodes

Code: gRNA_barcode_dist_plot.R

Comment: The cell with the highest overall (mean) gRNA counts; it has 15 gRNAs

Comment: The cell with lowest overall (mean) gRNAs (transfection/transduction failed in this cell)

Comment: Non-zero UMI counts in all cells

Comment: CDF plot: ~80% cells have < ~40 UMI counts for each gRNA (note: the authors mentioned MOI ~ 10)

Comment: PDF plot: same conclusion as above

ii) gRNA targets

Code: gRNA_target_dist_plot.R

Comment: The highest (mean) gRNA in all cells (gRNA targeting PPIA-2, which is a control)

Comment: The lowest (mean) gRNA in all cells (likely a low-scoring gRNA site, but the authors didn’t have better choices)

Comment: Non-zero UMI counts for all gRNAs (I’d say the transfection/transduction efficiency varies among gRNAs. The authors designed all the gRNAs within 200 bp of the targeted variants, so there must be limitations in terms of gRNA selection)

Comment: CDF plot: ~80% gRNAs have < ~20,000 UMI counts in all cells (137,347 cells in total, ~15% transfection/transduction success rate, acceptable)

Comment: PDF plot: same conclusion as above

f. Hashtag dataset

i) Hashtag barcodes

Code: Hashtag_barcode_dist_plot.R

Comment: The cell with highest (mean) Hashtags (note: the authors used only 4 Hashtags, I might check which antibodies they are when performing association)

Comment: The cell with lowest (mean) Hashtags (not tagged by any of the antibodies)

Comment: This figure is not an error. All cells have 1/2/3/4 UMI counts, and because many of them have 4, it looks like a solid block when the points are this dense

Comment: CDF plot: ~80% of cells have < ~2 UMI counts for each Hashtag (It makes sense to me because the authors are likely trying to label different cell types)

Comment: PDF plot: same conclusion as above

ii) Hashtag targets

Code: Hashtag_target_dist_plot.R

Comment: The highest (mean) Hashtag (HTO23) in all cells (I would guess this is the relatively more common cell type; also, there was some non-specific antibody binding)

Comment: The lowest (mean) Hashtag (HTO25) in all cells (I would guess this is the relatively less common cell type; also, it doesn’t seem to overlap with HTO25, which is a good thing)

Comment: Non-zero UMI counts for the 4 Hashtags (I’d say the 4 cell types are relatively even)

Comment: CDF plot: ~80% of Hashtags have < ~200,000 UMI counts in all cells (410,228 cells in total; I think the antibody binding efficiency is pretty good)

Comment: PDF plot: same conclusion as above


sessionInfo()
R version 4.2.0 (2022-04-22)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 22.04 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.10.0
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.10.0

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] workflowr_1.7.0

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.8.3     bslib_0.3.1      compiler_4.2.0   pillar_1.7.0    
 [5] later_1.3.0      git2r_0.30.1     jquerylib_0.1.4  tools_4.2.0     
 [9] getPass_0.2-2    digest_0.6.29    jsonlite_1.8.0   evaluate_0.15   
[13] tibble_3.1.7     lifecycle_1.0.1  pkgconfig_2.0.3  rlang_1.0.2     
[17] cli_3.3.0        rstudioapi_0.13  yaml_2.3.5       xfun_0.31       
[21] fastmap_1.1.0    httr_1.4.3       stringr_1.4.0    knitr_1.39      
[25] sass_0.4.1       fs_1.5.2         vctrs_0.4.1      rprojroot_2.0.3 
[29] glue_1.6.2       R6_2.5.1         processx_3.6.0   fansi_1.0.3     
[33] rmarkdown_2.14   callr_3.7.0      magrittr_2.0.3   whisker_0.4     
[37] ps_1.7.0         promises_1.2.0.1 htmltools_0.5.2  ellipsis_0.3.2  
[41] httpuv_1.6.5     utf8_1.2.2       stringi_1.7.6    crayon_1.5.1