Last updated: 2022-06-15

Checks: 7 0

Knit directory: rotation2/

This reproducible R Markdown analysis was created with workflowr (version 1.7.0). The Checks tab describes the reproducibility checks that were applied when the results were created. The Past versions tab lists the development history.


Great! Since the R Markdown file has been committed to the Git repository, you know the exact version of the code that produced these results.

Great job! The global environment was empty. Objects defined in the global environment can affect the analysis in your R Markdown file in unknown ways. For reproduciblity it’s best to always run the code in an empty environment.

The command set.seed(20220607) was run prior to running the code in the R Markdown file. Setting a seed ensures that any results that rely on randomness, e.g. subsampling or permutations, are reproducible.

Great job! Recording the operating system, R version, and package versions is critical for reproducibility.

Nice! There were no cached chunks for this analysis, so you can be confident that you successfully produced the results during this run.

Great job! Using relative paths to the files within your workflowr project makes it easier to run your code on other machines.

Great! You are using Git for version control. Tracking code development and connecting the code version to the results is critical for reproducibility.

The results in this page were generated with repository version d999122. See the Past versions tab to see a history of the changes made to the R Markdown and HTML files.

Note that you need to be careful to ensure that all relevant files for the analysis have been committed to Git prior to generating the results (you can use wflow_publish or wflow_git_commit). workflowr only checks the R Markdown file, but you know if there are other scripts or data files that it depends on. Below is the status of the Git repository when the results were generated:


Unstaged changes:
    Modified:   .RData
    Modified:   .Rhistory

Note that any generated files, e.g. HTML, png, CSS, etc., are not included in this status report because it is ok for generated content to have uncommitted changes.


These are the previous versions of the repository in which changes were made to the R Markdown (analysis/project_1.Rmd) and HTML (docs/project_1.html) files. If you’ve configured a remote Git repository (see ?wflow_git_remote), click on the hyperlinks in the table below to view the files as they were in that past version.

File Version Author Date Message
Rmd d999122 chenh19 2022-06-15 wflow_publish("./analysis/*.Rmd")
html 0aff555 chenh19 2022-06-15 Build site.
Rmd eda2d56 chenh19 2022-06-15 wflow_publish("./analysis/*.Rmd")
html a4e1e73 chenh19 2022-06-15 Build site.
Rmd 7229c17 chenh19 2022-06-15 wflow_publish("./analysis/*.Rmd")
html f0e98f9 chenh19 2022-06-14 Build site.
Rmd e0aa022 chenh19 2022-06-14 wflow_publish("./analysis/*.Rmd")
html eafa16b chenh19 2022-06-14 Build site.
Rmd 69b29f1 chenh19 2022-06-14 wflow_publish("./analysis/*.Rmd")
html dfd60ce chenh19 2022-06-14 Build site.
Rmd 49f1922 chenh19 2022-06-14 wflow_publish("./analysis/*.Rmd")
html fd7271e chenh19 2022-06-14 Build site.
html 1b4d12e chenh19 2022-06-14 Build site.
html a6c402d chenh19 2022-06-14 Build site.
html cedad99 chenh19 2022-06-14 Build site.
Rmd 6b3f021 chenh19 2022-06-14 wflow_publish("analysis/*.Rmd")
html e3b9788 chenh19 2022-06-14 Build site.
html aed5eed chenh19 2022-06-14 Build site.
Rmd c552123 chenh19 2022-06-14 wflow_publish("analysis/*.Rmd")
html 45bb6ed chenh19 2022-06-14 Build site.
Rmd 81fdf42 chenh19 2022-06-14 wflow_publish("analysis/*.Rmd")
html c23765a chenh19 2022-06-14 Build site.
html b5fb71c chenh19 2022-06-14 Build site.
Rmd 10f6641 chenh19 2022-06-14 wflow_publish("analysis/*.Rmd")
html 152325b chenh19 2022-06-14 Build site.
Rmd 1739dd9 chenh19 2022-06-14 wflow_publish("analysis/*.Rmd")
html 543ef4c chenh19 2022-06-14 Build site.
html 9b6cb27 chenh19 2022-06-14 Build site.
Rmd d8908c0 chenh19 2022-06-14 wflow_publish("analysis/*.Rmd")
html 8674c8a chenh19 2022-06-14 Build site.
Rmd 2972ce6 chenh19 2022-06-14 wflow_publish("analysis/*.Rmd")
html ada4068 chenh19 2022-06-14 Build site.
Rmd 7c5402e chenh19 2022-06-14 wflow_publish("analysis/*.Rmd")
html 0d08121 chenh19 2022-06-14 Build site.
Rmd 6226526 chenh19 2022-06-14 wflow_publish("analysis/*.Rmd")
html dd2046d chenh19 2022-06-14 Build site.
Rmd 71f5d04 chenh19 2022-06-14 wflow_publish("analysis/*.Rmd")
html 2c4cab1 chenh19 2022-06-14 Build site.
Rmd 6e73c04 chenh19 2022-06-14 wflow_publish("analysis/*.Rmd")
html d46eaab chenh19 2022-06-14 Build site.
Rmd be56d9d chenh19 2022-06-14 wflow_publish("analysis/*.Rmd")
html 02a0c26 chenh19 2022-06-14 Build site.
Rmd e2c15c0 chenh19 2022-06-14 wflow_publish("analysis/*.Rmd")
html abad46e chenh19 2022-06-14 Build site.
Rmd 68948e3 chenh19 2022-06-14 wflow_publish("analysis/*.Rmd")
html 741027b chenh19 2022-06-14 Build site.
Rmd 065d7e9 chenh19 2022-06-14 wflow_publish("analysis/*.Rmd")
html bb15812 chenh19 2022-06-14 Build site.
Rmd b6e0993 chenh19 2022-06-14 wflow_publish("analysis/*.Rmd")
html 93a27ae chenh19 2022-06-14 Build site.
Rmd b4a7331 chenh19 2022-06-14 wflow_publish("analysis/*.Rmd")
html 27121a9 chenh19 2022-06-14 Build site.
Rmd fce1ffd chenh19 2022-06-14 wflow_publish("analysis/*.Rmd")
html 44517c1 chenh19 2022-06-13 Build site.
Rmd cc6a40a chenh19 2022-06-13 wflow_publish("analysis/*.Rmd")
html da08f11 chenh19 2022-06-13 Build site.
Rmd 05ccc35 chenh19 2022-06-13 wflow_publish("analysis/*.Rmd")
html 572f6ba chenh19 2022-06-13 Build site.
html b8870d3 chenh19 2022-06-13 Build site.
html 719925e chenh19 2022-06-13 Build site.
html e7541fa chenh19 2022-06-13 Build site.
html 9d9615d chenh19 2022-06-13 Build site.
Rmd 04feaa7 chenh19 2022-06-13 wflow_publish("analysis/*.Rmd")
html bbd8978 chenh19 2022-06-13 Build site.
Rmd 0ec2bfa chenh19 2022-06-13 wflow_publish("analysis/*.Rmd")
html e5a5b52 chenh19 2022-06-13 Build site.
Rmd c43ae1f chenh19 2022-06-13 wflow_publish("analysis/*.Rmd")
html 4d8bd72 chenh19 2022-06-13 Build site.
html 3373521 chenh19 2022-06-13 Build site.
html af21ea8 chenh19 2022-06-13 Build site.
Rmd 6e56d75 chenh19 2022-06-13 wflow_publish("analysis/*.Rmd")
html f653f7b chenh19 2022-06-13 Build site.
Rmd 2723e7f chenh19 2022-06-13 wflow_publish("analysis/*.Rmd")
html d69c892 chenh19 2022-06-13 Build site.
html 34d877d chenh19 2022-06-13 Build site.
html e72400b chenh19 2022-06-13 Build site.
html c411223 chenh19 2022-06-13 Build site.
html 1daccd2 chenh19 2022-06-13 Build site.
Rmd 63f46d2 chenh19 2022-06-13 wflow_publish("analysis/*.Rmd")
html 26adb45 chenh19 2022-06-13 Build site.
html a6022a8 chenh19 2022-06-13 Build site.
Rmd 1215832 chenh19 2022-06-13 wflow_publish("analysis/*.Rmd")
html 9abc4b8 chenh19 2022-06-13 Build site.
Rmd 7efcfe0 chenh19 2022-06-13 wflow_publish("analysis/*.Rmd")
html f18d385 chenh19 2022-06-13 Build site.
Rmd a7c1ce0 chenh19 2022-06-13 wflow_publish("analysis/*.Rmd")
html e991f56 chenh19 2022-06-13 Build site.
html 3c9b1d9 chenh19 2022-06-13 Build site.
Rmd ae1553a chenh19 2022-06-13 wflow_publish("analysis/*.Rmd")
html 34e0d02 chenh19 2022-06-13 Build site.
Rmd e69aa83 chenh19 2022-06-13 wflow_publish("analysis/*.Rmd")
html 9be31af chenh19 2022-06-13 Build site.
Rmd ead84c2 chenh19 2022-06-13 wflow_publish("analysis/*.Rmd")
html 0f41de8 chenh19 2022-06-13 Build site.
html 31ad035 chenh19 2022-06-13 Build site.
html bdf3b44 chenh19 2022-06-13 Build site.
html 8d0890c chenh19 2022-06-13 Build site.
Rmd 26f455b chenh19 2022-06-13 update
html 26f455b chenh19 2022-06-13 update

1. Understand STING-seq

a. Read the Morris paper

Morris et al. 2021: STING-Seq

Some key points:

Some key ideas:

  • STING-seq: Systematic Targeting and Inbition of Noncoding GWAS loci with scRNA-seq
  • prioritizes candidate cis-regulatory elements (cCREs, 1kb<distance to TSS<1Mb) using fine-mapped GWAS
  • selected 88 variants (in 56 loci) with enhancer activity
  • dual CRISPR inhibition: dCas9 as the GPS, MeCP2 and KRAB as the repressors
  • confirming dual CRISPRi efficacy: gRNAs target TSS of MRPS23, CTSB, FSCN1
  • CRIPSRi on the 88 variants: two gRNAs for each variant, both within 200bp of the variant
  • ECCITE-seq: captures gRNAs and epitopes

Some data processing steps and results:

  • QC: remove cells with low total reads or excessive mitochondrial reads, gRNA assignment UMI>5 (9,343 cells after QC)
  • Kallisto: counting read more on the official website
  • Seurat: QC and reference mapping? read more on the official website
  • SCEPTRE: gRNA_to_gene-expression pairwise test
  • non-targeting gRNA-gene pairs: not significant (negative ctrl)
  • TSS-targeting gRNA-gene pairs: expression significantly decreased (positive ctrl)
  • 37 of the 88 variants were significant
  • Trans-regulatory elements: I'll come back later

Note:

2. Prelim QC for STING-seq data

a. Read about RNA-seq analysis

Yalamanchili et al. 2017: RNA-seq analysis pipeline

Some key points:

Protocol-1 (differential expression of genes):

  • demuxed raw reads (FastQC)
  • trimming reads (awk)
  • aligning reads (TopHat2)
  • counting reads (HTSeq; may filter out genes with low counts before next step)
  • detect DE using counted reads (DEseq2)
  • more QC (PCA/correlation heatmap)

Protocol-2 (differential usage of isoforms):

  • Protocol-1
  • counting isoforms (Kallisto, also check cell ranger)
  • detect DU using counted isoforms (Sleuth)
  • more QC (aslo PCA/correlation heatmap)

Protocol-3 (crypic splicing):

  • Protocol-1
  • detect differential junstions (CrypSplice)

b. Download data

rsync -a hchen131@midway2.rcc.uchicago.edu:/project2/xuanyao/data/Morris_2021 ~/Desktop

c. Perform FastQC on all fastq files

# install fastqc
sudo apt-get update && sudo apt-get install fastqc -y

# run fastqc in parallel
cd ~/rotation2/
[ ! -d ../Morris_2021/fastqc_results/ ] && mkdir ../Morris_2021/fastqc_results/
fastqc --threads 16 ../Morris_2021/STINGseq_Morris_2021_raw/*.fastq.gz --outdir=../Morris_2021/fastqc_results/

SRR14141135:

SRR14141136:

SRR14141137:

SRR14141138:

SRR14141139:

SRR14141140:

SRR14141141:

SRR14141142:

SRR14141143:

SRR14141144:

SRR14141145:

SRR14141146:

A brief summary:

  • length: 26bp or 57bp (trimmed?)
  • depth: 30-35x
  • overall quality: good (within ~40 bp)

d. Kallisto | bustools pipeline

Install package: pip3 install kb:

# install via python3-pip (python3 is installed by default)
sudo apt-get update && sudo apt install python3-pip -y
sudo pip3 install kb-python

# run
kb

Side note: conda install kallisto (not necessary for kb):

# install anaconda
[ ! -d ./shscript/ ] && mkdir ./shscript/
wget -O ./shscript/Anaconda-latest-Linux-x86_64.sh https://repo.anaconda.com/archive/Anaconda3-2022.05-Linux-x86_64.sh
bash ./shscript/Anaconda-latest-Linux-x86_64.sh && sleep 3
conda update anaconda -y && conda update --all -y
conda config --add channels defaults
conda config --add channels bioconda
conda config --add channels conda-forge
conda install kallisto -y
conda config --set auto_activate_base false # disable auto activate base in terminal
#conda activate # activate base when needed
#rm -rf ~/anaconda3/ # uninstall anaconda
rm -rf ./shscript/

3. Analyze QC’ed STING-seq data

Note:

  • cDNA, HTO (Hashing antibody-oligos), GDO (gRNAs)
  • ECCITE-seq:

a. Install R packages

sudo apt-get update && sudo apt-get install libgeos-dev -y
sudo -i R
installe.packages("Seurat")
q()

b. Import QC’ed data

Note: about sparse matrix

# load expression data
Expression <- Seurat::ReadMtx(mtx = "../Morris_2021/GSM5225857_cDNA.mtx", 
                                features = "../Morris_2021/GSM5225857_cDNA.features.txt", 
                                cells = "../Morris_2021/GSM5225857_cDNA.barcodes.txt", 
                                cell.column=1, 
                                feature.column=1, 
                                mtx.transpose=T)

# load gRNA data
gRNA <- Seurat::ReadMtx(mtx = "../Morris_2021/GSM5225858_GDO.mtx", 
                                 features = "../Morris_2021/GSM5225858_GDO.features.txt", 
                                 cells = "../Morris_2021/GSM5225858_GDO.barcodes.txt", 
                                 cell.column=1, 
                                 feature.column=1, 
                                 mtx.transpose=T)
overall <- function(mtx){
  cat(paste0('The [', deparse(substitute(mtx)), '] matrix has: \n', 
        '- ', format(mtx@Dim[1], big.mark=",", scientific=FALSE), 
        ' rows/genes/targets and \n', 
        '- ', format(mtx@Dim[2], big.mark=",", scientific=FALSE), 
        ' columns/barcodes/cells \n', 
        '- ', format(length(mtx), big.mark=",", scientific=FALSE), 
        ' values in total \n', 
        '- ', format(sum(mtx != 0), big.mark=",", scientific=FALSE), 
        ' values that are non-zero \n', 
        '- ', format(sum(mtx == 1), big.mark=",", scientific=FALSE), 
        ' values that are 1 \n', 
        '- ', format(sum(mtx > 1), big.mark=",", scientific=FALSE), 
        ' values that are bigger than 1 \n', 
        '- ', format(sum(mtx > 10), big.mark=",", scientific=FALSE), 
        ' values that are bigger than 10 \n', 
        '- ', format(sum(mtx > 100), big.mark=",", scientific=FALSE), 
        ' values that are bigger than 100 \n', 
        '- ', format(sum(mtx > 1000), big.mark=",", scientific=FALSE), 
        ' values that are bigger than 1,000 \n', 
        '- ', format(sum(mtx > 10000), big.mark=",", scientific=FALSE), 
        ' values that are bigger than 10,000 \n', 
        '- ', format(sum(mtx > 100000), big.mark=",", scientific=FALSE), 
        ' values that are bigger than 100,000 \n', 
        ' \n'))
}

overall(Expression)
overall(gRNA)
> overall(Expression)
The [Expression] matrix has: 
- 35,606 rows/genes/targets and 
- 686,612 columns/barcodes/cells 
- 24,447,506,872 values in total 
- 82,507,471 values that are non-zero 
- 50,421,358 values that are 1 
- 32,086,113 values that are bigger than 1 
- 3,370,699 values that are bigger than 10 
- 259,734 values that are bigger than 100 
- 2,515 values that are bigger than 1,000 
- 0 values that are bigger than 10,000 
- 0 values that are bigger than 100,000 
 
> overall(gRNA)
The [gRNA] matrix has: 
- 210 rows/genes/targets and 
- 137,347 columns/barcodes/cells 
- 28,842,870 values in total 
- 2,506,474 values that are non-zero 
- 1,510,919 values that are 1 
- 995,555 values that are bigger than 1 
- 121,554 values that are bigger than 10 
- 41,071 values that are bigger than 100 
- 2,232 values that are bigger than 1,000 
- 20 values that are bigger than 10,000 
- 0 values that are bigger than 100,000 

try:

class(f_exp_matrix)
attributes(f_exp_matrix)
dim(f_exp_matrix)
f_exp_matrix[1,1]
summary(f_exp_matrix[,1])


col_num <- 199
test <- f_exp_matrix[,col_num]
sum(test !=0)
plot(test)

row_num <- 199

row_num <- 199
test <- f_exp_matrix[row_num,]
sum(test != 0)
plot(test)

sum(f_exp_matrix >= 1)
sum(f_exp_matrix > 1)
sum(f_exp_matrix == 1)
  • gene by cell
  • check how many cells with non zero expression genes
  • check how many gene with non zero expression cells
  • hist()

sessionInfo()
R version 4.2.0 (2022-04-22)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 22.04 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.10.0
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.10.0

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] workflowr_1.7.0

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.8.3     bslib_0.3.1      compiler_4.2.0   pillar_1.7.0    
 [5] later_1.3.0      git2r_0.30.1     jquerylib_0.1.4  tools_4.2.0     
 [9] getPass_0.2-2    digest_0.6.29    jsonlite_1.8.0   evaluate_0.15   
[13] tibble_3.1.7     lifecycle_1.0.1  pkgconfig_2.0.3  rlang_1.0.2     
[17] cli_3.3.0        rstudioapi_0.13  yaml_2.3.5       xfun_0.31       
[21] fastmap_1.1.0    httr_1.4.3       stringr_1.4.0    knitr_1.39      
[25] sass_0.4.1       fs_1.5.2         vctrs_0.4.1      rprojroot_2.0.3 
[29] glue_1.6.2       R6_2.5.1         processx_3.6.0   fansi_1.0.3     
[33] rmarkdown_2.14   callr_3.7.0      magrittr_2.0.3   whisker_0.4     
[37] ps_1.7.0         promises_1.2.0.1 htmltools_0.5.2  ellipsis_0.3.2  
[41] httpuv_1.6.5     utf8_1.2.2       stringi_1.7.6    crayon_1.5.1