Last updated: 2023-11-08
Checks: 5 2
Knit directory:
20211209_JingxinRNAseq/analysis/
This reproducible R Markdown analysis was created with workflowr (version 1.7.0). The Checks tab describes the reproducibility checks that were applied when the results were created. The Past versions tab lists the development history.
The R Markdown file has unstaged changes. To know which version of
the R Markdown file created these results, you’ll want to first commit
it to the Git repo. If you’re still working on the analysis, you can
ignore this warning. When you’re finished, you can run
wflow_publish to commit the R Markdown file and build the
HTML.
Great job! The global environment was empty. Objects defined in the global environment can affect the analysis in your R Markdown file in unknown ways. For reproduciblity it’s best to always run the code in an empty environment.
The command set.seed(19900924) was run prior to running
the code in the R Markdown file. Setting a seed ensures that any results
that rely on randomness, e.g. subsampling or permutations, are
reproducible.
Great job! Recording the operating system, R version, and package versions is critical for reproducibility.
Nice! There were no cached chunks for this analysis, so you can be confident that you successfully produced the results during this run.
Using absolute paths to the files within your workflowr project makes it difficult for you and others to run your code on a different machine. Change the absolute path(s) below to the suggested relative path(s) to make your code more reproducible.
| absolute | relative |
|---|---|
| /project2/yangili1/bjf79/20211209_JingxinRNAseq/code/bigwigs/unstranded/ | ../code/bigwigs/unstranded |
| /project2/yangili1/bjf79/20211209_JingxinRNAseq/code/SplicingAnalysis/leafcutter_all_samples_202310/leafcutter_perind_numers.bed.gz | ../code/SplicingAnalysis/leafcutter_all_samples_202310/leafcutter_perind_numers.bed.gz |
| /project2/yangili1/bjf79/20211209_JingxinRNAseq/code/ | ../code |
Great! You are using Git for version control. Tracking code development and connecting the code version to the results is critical for reproducibility.
The results in this page were generated with repository version 5dce412. See the Past versions tab to see a history of the changes made to the R Markdown and HTML files.
Note that you need to be careful to ensure that all relevant files for
the analysis have been committed to Git prior to generating the results
(you can use wflow_publish or
wflow_git_commit). workflowr only checks the R Markdown
file, but you know if there are other scripts or data files that it
depends on. Below is the status of the Git repository when the results
were generated:
Ignored files:
Ignored: .DS_Store
Ignored: .Rhistory
Ignored: .Rproj.user/
Ignored: analysis/.RData
Ignored: analysis/.Rhistory
Ignored: analysis/20220707_TitrationSeries_DE_testing.nb.html
Ignored: analysis/figure/
Ignored: code/.DS_Store
Ignored: code/._DOCK7.pdf
Ignored: code/._DOCK7_DMSO1.pdf
Ignored: code/._DOCK7_SM2_1.pdf
Ignored: code/._FKTN_DMSO_1.pdf
Ignored: code/._FKTN_SM2_1.pdf
Ignored: code/._MAPT.pdf
Ignored: code/._PKD1_DMSO_1.pdf
Ignored: code/._PKD1_SM2_1.pdf
Ignored: code/.snakemake/
Ignored: code/1KG_HighCoverageCalls.samplelist.txt
Ignored: code/5ssSeqs.tab
Ignored: code/Alignments/
Ignored: code/Branaplam_Risdiplam_specific_introns.bed.gz
Ignored: code/Branaplam_Risdiplam_specific_introns.bed.gz.tbi
Ignored: code/ChemCLIP/
Ignored: code/ClinVar/
Ignored: code/DE_testing/
Ignored: code/DE_tests.mat.counts.gz
Ignored: code/DE_tests.txt.gz
Ignored: code/DataNotToCommit/
Ignored: code/DonorMotifSearches/
Ignored: code/DoseResponseData/
Ignored: code/Exp202310_tracks.xml
Ignored: code/Exp202310_tracks_e.xml
Ignored: code/Exp202310_tracks_overlaid.xml
Ignored: code/Fastq/
Ignored: code/FastqFastp/
Ignored: code/FivePrimeSpliceSites.txt
Ignored: code/FragLenths/
Ignored: code/Ishigami/
Ignored: code/Meme/
Ignored: code/Multiqc/
Ignored: code/OMIM/
Ignored: code/OldBigWigs/
Ignored: code/PhyloP/
Ignored: code/QC/
Ignored: code/ReferenceGenomes/
Ignored: code/Rplots.pdf
Ignored: code/Session.vim
Ignored: code/Session2.vim
Ignored: code/SplicingAnalysis/
Ignored: code/TracksSession
Ignored: code/bigwigs/
Ignored: code/featureCounts/
Ignored: code/figs/
Ignored: code/fimo_out/
Ignored: code/geena/
Ignored: code/hg38ToMm39.over.chain.gz
Ignored: code/igv_session.template.xml
Ignored: code/igv_session.xml
Ignored: code/log
Ignored: code/logs/
Ignored: code/rstudio-server.job
Ignored: code/scratch/
Ignored: code/temp/
Ignored: code/test.txt.gz
Ignored: code/testPlottingWithMyScript.ForJingxin.sh
Ignored: code/testPlottingWithMyScript.ForJingxin2.sh
Ignored: code/testPlottingWithMyScript.ForJingxin3.sh
Ignored: code/testPlottingWithMyScript.ForJingxin4.sh
Ignored: code/testPlottingWithMyScript.sh
Ignored: code/tracks.with_chRNA.RisdiBrana.xml
Ignored: code/tracks.with_chRNA.RisdiBranaWithExtras.xml
Ignored: code/tracks.xml
Ignored: data/~$52CompoundsTempPlateLayoutForPipettingConvenience.xlsx
Untracked files:
Untracked: code/scripts/DE_edgeR_ExpOf3.R
Untracked: data/202310_FullMetadata.tsv.gz
Unstaged changes:
Modified: analysis/2023-11-02_3MoleculesOfInterestLeafcutterDs.Rmd
Modified: code/Snakefile
Modified: code/rules/ExpMoleculesOfInterest.smk
Modified: code/rules/ProcessTitrationSeries.smk
Modified: code/rules/Process_chRNA.smk
Modified: code/rules/RNASeqProcessing.smk
Modified: code/scripts/GenometracksByGenotype
Note that any generated files, e.g. HTML, png, CSS, etc., are not included in this status report because it is ok for generated content to have uncommitted changes.
These are the previous versions of the repository in which changes were
made to the R Markdown
(analysis/2023-11-02_3MoleculesOfInterestLeafcutterDs.Rmd)
and HTML
(docs/2023-11-02_3MoleculesOfInterestLeafcutterDs.html)
files. If you’ve configured a remote Git repository (see
?wflow_git_remote), click on the hyperlinks in the table
below to view the files as they were in that past version.
| File | Version | Author | Date | Message |
|---|---|---|---|---|
| Rmd | 5dce412 | Benjmain Fair | 2023-11-02 | updated nb |
| html | 5dce412 | Benjmain Fair | 2023-11-02 | updated nb |
I’ve processed each of three doses for each of three molecules as a leafcutter contrast with 5 DMSO samples… Previously in the firboblast experiment, I noticed a correlation between effect size and different types of NN in NNGA|GT introns… I will similarly look for that here with these three new molecules.
load libs
library(tidyverse)
library(RColorBrewer)
library(data.table)
library(gplots)
library(ggrepel)
library(GGally)
library(edgeR)
# define some useful funcs
sample_n_of <- function(data, size, ...) {
dots <- quos(...)
group_ids <- data %>%
group_by(!!! dots) %>%
group_indices()
sampled_groups <- sample(unique(group_ids), size)
data %>%
filter(group_ids %in% sampled_groups)
}
# Set theme
theme_set(
theme_classic() +
theme(text=element_text(size=16, family="Helvetica")))
# I use layer a lot, to rotate long x-axis labels
Rotate_x_labels <- theme(axis.text.x = element_text(angle = 45, vjust = 1, hjust=1))
Read in data
juncs <- fread("../code/SplicingAnalysis/FullSpliceSiteAnnotations/JuncfilesMerged.annotated.basic.bed.gz") %>%
mutate(Junc = paste(chrom, start, end, strand, sep="_"))
donors <- fread("../code/SplicingAnalysis/FullSpliceSiteAnnotations/JuncfilesMerged.annotated.basic.bed.5ss.tab.gz", col.names = c("Junc.Donor", "seq", "DonorScore")) %>%
separate(Junc.Donor, into=c("Junc", "Donor"), sep="::")
Cluster_sig <-
Sys.glob("../code/SplicingAnalysis/Exp202310_3Molecules_Contrasts/*_cluster_significance.txt") %>%
setNames(str_replace(., "../code/SplicingAnalysis/Exp202310_3Molecules_Contrasts/(.+?)_cluster_significance.txt", "\\1")) %>%
append(
Sys.glob("../code/SplicingAnalysis/leafcutter/differential_splicing/*_cluster_significance.txt") %>%
grep("ExpOf52", ., invert=T, value=T) %>%
setNames(str_replace(., "../code/SplicingAnalysis/leafcutter/differential_splicing/(.+?)_cluster_significance.txt", "\\1"))
) %>%
lapply(fread) %>%
bind_rows(.id="Contrast")
effect_sizes <- Sys.glob("../code/SplicingAnalysis/Exp202310_3Molecules_Contrasts/*_effect_sizes.txt") %>%
setNames(str_replace(., "../code/SplicingAnalysis/Exp202310_3Molecules_Contrasts/(.+?)_effect_sizes.txt", "\\1")) %>%
append(
Sys.glob("../code/SplicingAnalysis/leafcutter/differential_splicing/*_effect_sizes.txt") %>%
grep("ExpOf52", ., invert=T, value=T) %>%
setNames(str_replace(., "../code/SplicingAnalysis/leafcutter/differential_splicing/(.+?)_effect_sizes.txt", "\\1"))
) %>%
lapply(fread, col.names=c("intron", "logef", "DMSO", "treatment", "deltapsi")) %>%
bind_rows(.id="Contrast") %>%
mutate(Junc = str_replace(intron, "^(.+?):(.+?):(.+?):clu_.+?_([+-])$", "\\1_\\2_\\3_\\4")) %>%
mutate(cluster = str_replace(intron, "^(.+?):.+?:.+?:(clu_.+?_[+-])$", "\\1:\\2")) %>%
inner_join(
inner_join(juncs, donors)
) %>%
inner_join(Cluster_sig) %>%
mutate(CellType = if_else(str_detect(Contrast, "_"), "LCL", "Fibroblast")) %>%
mutate(logef = if_else(CellType=="Fibroblast", -1*logef, logef)) %>%
mutate(logef = if_else(CellType=="Fibroblast", -1*deltapsi, deltapsi)) %>%
mutate(NN__GU = str_extract(seq, "^\\w{2}")) %>%
mutate(NNGU = substr(seq, 3, 4))
effect_sizes %>%
filter(p.adjust < 0.01) %>%
ggplot(aes(x=logef, color=Contrast)) +
stat_ecdf() +
geom_vline(xintercept = 0, linetype="dashed") +
geom_vline(
data = . %>%
group_by(NNGU, Contrast) %>%
summarise(med=median(logef, na.rm=T)),
aes(xintercept=med, color=Contrast),
linetype='dashed'
) +
coord_cartesian(xlim=c(-0.5, 0.5)) +
ylab("CumulativeFraction") +
facet_wrap(~NNGU) +
theme_light() +
ggtitle("Effect sizes for NN|GT introns, grouped by NN") +
labs(caption="Only included introns in FDR<0.01 significant clusters")

| Version | Author | Date |
|---|---|---|
| 5dce412 | Benjmain Fair | 2023-11-02 |
effect_sizes %>%
# filter(p.adjust < 0.01) %>%
filter(NNGU=="GA") %>%
ggplot(aes(x=logef, color=Contrast)) +
stat_ecdf() +
geom_vline(xintercept = 0, linetype="dashed") +
geom_vline(
data = . %>%
group_by(NN__GU, Contrast) %>%
summarise(med=median(logef, na.rm=T)),
aes(xintercept=med, color=Contrast),
linetype='dashed'
) +
coord_cartesian(xlim=c(-0.5, 0.5)) +
ylab("CumulativeFraction") +
facet_wrap(~NN__GU) +
theme_light() +
ggtitle("Effect sizes for NNGA|GT introns, grouped by NN") +
labs(caption="All tested GA|GU")

| Version | Author | Date |
|---|---|---|
| 5dce412 | Benjmain Fair | 2023-11-02 |
Make same plot with less samples… There’s a lot of clutter…
Colors <- effect_sizes %>%
distinct(Contrast) %>%
mutate(Colors = recode(Contrast, SMSM27_3000="#3182bd", SMSM27_600="#9ecae1", SMSM27_120="#deebf7", SMSM32_120="#fee0d2", SMSM32_3000="#de2d26", SMSM32_600="#fc9272", SMSM6_10000="#a65628", SMSM6_2000="#a65628", SMSM6_50000="#a65628", A10="#7fc97f", C2="#beaed4", CUOME="#fdc086", SM2="#ffff99")) %>%
dplyr::select(Contrast, Colors) %>% deframe()
Colors.Subset <- effect_sizes %>%
distinct(Contrast) %>%
filter(Contrast %in% c("SM2", "C2", "SMSM27_3000", "SMSM27_600", "SMSM32_3000", "SMSM32_600", "SMSM6_50000")) %>%
mutate(Colors = recode(Contrast, SMSM6_50000="#b15928", SMSM27_3000="#e31a1c", SMSM27_600="#fb9a99", SMSM32_3000="#ff7f00", SMSM32_600="#fdbf6f", C2="#377eb8", SM2="#4daf4a")) %>%
dplyr::select(Contrast, Colors) %>% deframe()
effect_sizes %>%
# filter(p.adjust < 0.01) %>%
filter(Contrast %in% names(Colors.Subset)) %>%
filter(NNGU=="GA") %>%
group_by(NN__GU) %>%
mutate(Unique_Elements = n_distinct(intron)) %>% # Now summarise with unique elements per group
ungroup() %>%
mutate(FacetGroup = str_glue("{NN__GU}; n={Unique_Elements}")) %>%
ggplot(aes(x=logef, color=Contrast)) +
stat_ecdf() +
geom_vline(xintercept = 0, linetype="dashed") +
geom_vline(
data = . %>%
group_by(FacetGroup, Contrast) %>%
summarise(med=median(logef, na.rm=T)),
aes(xintercept=med, color=Contrast),
linetype='dashed'
) +
scale_color_manual(values=Colors.Subset, drop=T) +
coord_cartesian(xlim=c(-0.1, 0.1)) +
ylab("CumulativeFraction") +
facet_wrap(~FacetGroup) +
theme_light() +
ggtitle("Effect sizes for NNGA|GT introns, grouped by NN") +
labs(caption="All leafcutter-tested GA|GU")

| Version | Author | Date |
|---|---|---|
| 5dce412 | Benjmain Fair | 2023-11-02 |
So first recall a similar plot I have made in the past (in this notebook: https://bfairkun.github.io/SmallMoleculeRNASeq/20220923_ExploreSpecificityEstimates.html), where the x-axis is the spearman dose:PSI correlation coefficient, sort of a proxy for effects on different NN sub-categories of NNGA|GU introns. branaplam has stronger effects on GA, CA, and TA… but AA looks more or less the same between risdiplam/branaplam… AT maybe looks a bit risdiplam specific… This btw is the same sequence in that risdiplam-specific sequence in HSD17B4 which has two poison exons. In any case, now look at the second plot, where the x-axis is the logef (from leafcutter) for introns with these different groups… I also included some data from the fibroblast experiment: SM2 (colored green, because it is a branaplam scaffold) and C2 (colored blue because it is a risdiplam scaffold). Again just looking at SM2 and C2, the relative effects between classes are mostly consistent with the dose titration experiment with risdiplam and branaplam… Anyway, now looking at SMSM27 and SMSM32 (the red and orange), the effects look stronger than C2 for the GA and CA facets… Again, its tricky to compare, because the effective genome-wide dose isn’t exactly the same between experiments, and the cell type is different… But I figure that by looking across different facets you can sort of control for those things. So from that, I do feel in some ways these molecules do have some slight branaplam-like-character moreso than risdiplam.
I used edge to make contrasts for each treatment:dose vs DMSO… I will look at the results, with a particular focus on identifying any totally novel genes for now, that don’t have any apparent effect in risdiplam/branaplam…
DE.results <-
bind_rows(
fread("../code/DE_testing/ExpOf3_Results.txt.gz") %>%
mutate(CellType = "LCL") %>%
mutate(treatment = str_replace(treatment, "^group", "")),
fread("../code/DE_testing/Results.txt.gz") %>%
mutate(CellType = "Fibroblast")
)
DE.results %>%
ggplot(aes(x=logFC, y=-log10(FDR), color=FDR<0.01)) +
geom_point(alpha=0.01) +
facet_wrap(~treatment)

DE.results %>%
dplyr::select(treatment, logCPM, Geneid) %>%
pivot_wider(names_from="treatment", values_from="logCPM") %>%
column_to_rownames("Geneid") %>%
drop_na() %>%
cor(method='s')
SMSM27_120 SMSM27_3000 SMSM27_600 SMSM32_120 SMSM32_3000 SMSM32_600
SMSM27_120 1.0000000 1.0000000 1.0000000 1.0000000 1.0000000 1.0000000
SMSM27_3000 1.0000000 1.0000000 1.0000000 1.0000000 1.0000000 1.0000000
SMSM27_600 1.0000000 1.0000000 1.0000000 1.0000000 1.0000000 1.0000000
SMSM32_120 1.0000000 1.0000000 1.0000000 1.0000000 1.0000000 1.0000000
SMSM32_3000 1.0000000 1.0000000 1.0000000 1.0000000 1.0000000 1.0000000
SMSM32_600 1.0000000 1.0000000 1.0000000 1.0000000 1.0000000 1.0000000
SMSM6_10000 1.0000000 1.0000000 1.0000000 1.0000000 1.0000000 1.0000000
SMSM6_2000 1.0000000 1.0000000 1.0000000 1.0000000 1.0000000 1.0000000
SMSM6_50000 1.0000000 1.0000000 1.0000000 1.0000000 1.0000000 1.0000000
C2 0.6718307 0.6718307 0.6718307 0.6718307 0.6718307 0.6718307
CUOME 0.6718307 0.6718307 0.6718307 0.6718307 0.6718307 0.6718307
SM2 0.6718307 0.6718307 0.6718307 0.6718307 0.6718307 0.6718307
A10 0.6718307 0.6718307 0.6718307 0.6718307 0.6718307 0.6718307
SMSM6_10000 SMSM6_2000 SMSM6_50000 C2 CUOME SM2
SMSM27_120 1.0000000 1.0000000 1.0000000 0.6718307 0.6718307 0.6718307
SMSM27_3000 1.0000000 1.0000000 1.0000000 0.6718307 0.6718307 0.6718307
SMSM27_600 1.0000000 1.0000000 1.0000000 0.6718307 0.6718307 0.6718307
SMSM32_120 1.0000000 1.0000000 1.0000000 0.6718307 0.6718307 0.6718307
SMSM32_3000 1.0000000 1.0000000 1.0000000 0.6718307 0.6718307 0.6718307
SMSM32_600 1.0000000 1.0000000 1.0000000 0.6718307 0.6718307 0.6718307
SMSM6_10000 1.0000000 1.0000000 1.0000000 0.6718307 0.6718307 0.6718307
SMSM6_2000 1.0000000 1.0000000 1.0000000 0.6718307 0.6718307 0.6718307
SMSM6_50000 1.0000000 1.0000000 1.0000000 0.6718307 0.6718307 0.6718307
C2 0.6718307 0.6718307 0.6718307 1.0000000 1.0000000 1.0000000
CUOME 0.6718307 0.6718307 0.6718307 1.0000000 1.0000000 1.0000000
SM2 0.6718307 0.6718307 0.6718307 1.0000000 1.0000000 1.0000000
A10 0.6718307 0.6718307 0.6718307 1.0000000 1.0000000 1.0000000
A10
SMSM27_120 0.6718307
SMSM27_3000 0.6718307
SMSM27_600 0.6718307
SMSM32_120 0.6718307
SMSM32_3000 0.6718307
SMSM32_600 0.6718307
SMSM6_10000 0.6718307
SMSM6_2000 0.6718307
SMSM6_50000 0.6718307
C2 1.0000000
CUOME 1.0000000
SM2 1.0000000
A10 1.0000000
Ok, so it is clear that the logCPM column probably refers to the DMSO expression, or maybe the mean expession, and as it is the same value for amongst each contrast in the group of contrasts… I am anticipating looking for things that are significant and strong effect size in LCL, and further filtering out the genes for which that may be the case because there is lower expression and therefore likely less power in LCLs.
GenesHigherExpressedInLCLs <- DE.results %>%
distinct(CellType, Geneid, logCPM) %>%
pivot_wider(names_from="CellType", values_from="logCPM") %>%
filter(LCL > Fibroblast) %>%
pull(Geneid)
diag_limitrange <- function(data, mapping, ...) {
ggplot(data = data, mapping = mapping, ...) +
geom_density(...) +
# coord_cartesian(xlim = c(1, 1E7)) +
theme_bw()
}
upper_point <- function(data, mapping, ...) {
ggplot(data = data, mapping = mapping, ...) +
geom_hex(...) +
geom_abline() +
theme_bw()
}
my_fn <- function(data, mapping, method="s", use="everything", ...){
# grab data
x <- eval_data_col(data, mapping$x)
y <- eval_data_col(data, mapping$y)
# calculate correlation
corr <- cor(x, y, method=method, use=use)
# calculate colour based on correlation value
# Here I have set a correlation of minus one to blue,
# zero to white, and one to red
# Change this to suit: possibly extend to add as an argument of `my_fn`
colFn <- colorRampPalette(c("blue", "white", "red"), interpolate ='spline')
fill <- colFn(100)[findInterval(corr, seq(-1, 1, length=100))]
ggally_cor(data = data, mapping = mapping, ...) +
theme_void() +
theme(panel.background = element_rect(fill=fill) + theme_classic())
}
DE.results %>%
dplyr::select(treatment, logFC, Geneid) %>%
# filter(Geneid %in% GenesHigherExpressedInLCLs) %>%
pivot_wider(names_from="treatment", values_from="logFC") %>%
dplyr::select(-Geneid) %>%
drop_na() %>%
ggpairs(
title="log2FC across contrasts",
# upper=list(continuous = my_fn),
upper=list(continuous = wrap("cor", method = "spearman", hjust=0.7)),
lower=list(continuous = upper_point),
diag=list(continuous = diag_limitrange)) +
theme_classic()

DE.results %>%
dplyr::select(treatment, logFC, Geneid) %>%
# filter(Geneid %in% GenesHigherExpressedInLCLs) %>%
pivot_wider(names_from="treatment", values_from="logFC") %>%
drop_na() %>%
column_to_rownames("Geneid") %>%
cor(method='s') %>%
heatmap.2(trace='none')

Let’s find some genes that are DE in SMSM27 or SMSM32, but not in SM2 or C2, and then check for their dose response curve in branaplam/risdiplam…
GenesSignificantInFibroblast <- DE.results %>%
filter(CellType == "Fibroblast") %>%
filter(FDR < 0.01 & abs(logFC) > 0.5) %>%
filter(treatment %in% c("C2", "SM2")) %>%
distinct(Geneid) %>% pull(Geneid)
DE.results %>%
filter(CellType == "LCL") %>%
filter(FDR < 0.01 & abs(logFC) > 1) %>%
filter(!Geneid %in% GenesSignificantInFibroblast) %>%
filter(!Geneid %in% GenesHigherExpressedInLCLs) %>%
ggplot(aes(x=logFC, y=-log10(FDR))) +
geom_point(alpha=0.1) +
facet_wrap(~treatment)

DE.results %>%
filter(CellType == "LCL") %>%
filter(FDR < 0.01 & abs(logFC) > 1) %>%
filter(!Geneid %in% GenesSignificantInFibroblast) %>%
filter(!Geneid %in% GenesHigherExpressedInLCLs)
treatment Geneid logFC logCPM F
1: SMSM27_120 ENSG00000133063.16_CHIT1 2.052872 2.0811621 99.63585
2: SMSM27_120 ENSG00000130032.17_PRRG3 6.983977 -0.1006592 44.53688
3: SMSM27_120 ENSG00000204822.7_MRPL53 1.334302 2.6444511 34.40114
4: SMSM27_120 ENSG00000263264.2_NA -1.069657 1.5626708 25.81330
5: SMSM27_120 ENSG00000079337.16_RAPGEF3 -1.569852 0.2900860 25.81021
---
1159: SMSM6_50000 ENSG00000101443.18_WFDC2 -1.788250 -0.7560406 10.28984
1160: SMSM6_50000 ENSG00000080823.23_MOK 1.080904 -0.4729347 10.25412
1161: SMSM6_50000 ENSG00000184786.6_DYNLT2 -2.522072 -1.6502297 10.16735
1162: SMSM6_50000 ENSG00000089505.17_CMTM1 -1.453388 -0.9389790 10.16304
1163: SMSM6_50000 ENSG00000185561.10_TLCD2 1.713482 -0.8396798 10.14342
PValue FDR CellType
1: 1.087421e-11 6.396874e-09 LCL
2: 1.106004e-07 9.845675e-06 LCL
3: 1.240318e-06 5.484152e-05 LCL
4: 1.314963e-05 2.924082e-04 LCL
5: 1.316161e-05 2.924082e-04 LCL
---
1159: 2.892821e-03 8.756121e-03 LCL
1160: 2.936126e-03 8.869342e-03 LCL
1161: 3.044220e-03 9.140766e-03 LCL
1162: 3.049700e-03 9.151131e-03 LCL
1163: 3.074773e-03 9.207986e-03 LCL
So I have a list of genes that are pretty meaningfully DE in SMSM27 or SMSM32 and not in SM2 or C2… Let’s check their dose response curve in branaplam/risdiplam…
FullMetadata <- read_tsv('../data/202310_FullMetadata.tsv.gz')
doseResponse.ColorKey <- FullMetadata %>%
group_by(treatment) %>%
filter(dose.nM == max(dose.nM)) %>%
ungroup() %>%
filter(treatment %in% c("Risdiplam", "Branaplam", "C2C5", "SMSM6", "SMSM32", "SMSM27")) %>%
distinct(treatment, color) %>%
add_row(treatment = "Old.DMSO", color = "#cccccc") %>%
add_row(treatment = "New.DMSO", color = "#252525")
gene.counts <- read_tsv("../code/featureCounts/AllSamples_Counts.txt", comment = "#") %>%
rename_with(~ str_replace(.x, "Alignments/STAR_Align/(.+?)/Aligned.sortedByCoord.out.bam", "\\1"), contains("Alignments")) %>%
dplyr::select(-c(2:6)) %>%
column_to_rownames("Geneid") %>%
DGEList() %>%
calcNormFactors()
CPM.mat <- cpm(gene.counts, prior.count = 0.1, log = T)
symbols <- read_tsv("../data/Genes.list.txt")
PotentialNewMoleculeSpecificDEGenes <- DE.results %>%
filter(CellType == "LCL") %>%
filter(FDR < 0.01 & abs(logFC) > 2) %>%
filter(!Geneid %in% GenesSignificantInFibroblast) %>%
filter(!Geneid %in% GenesHigherExpressedInLCLs) %>%
separate(Geneid, into=c("ensembl_id", "hgnc_symbol"), sep="_")
set.seed(0)
CPM.mat %>%
as.data.frame() %>%
rownames_to_column() %>%
mutate(ensembl_gene_id = str_replace_all(rowname, "\\..+$", "")) %>%
as_tibble() %>%
left_join(symbols) %>%
filter(hgnc_symbol %in% PotentialNewMoleculeSpecificDEGenes$hgnc_symbol) %>%
sample_n_of(size = 25, hgnc_symbol) %>%
dplyr::select(-rowname) %>%
pivot_longer(names_to = "old.sample.name", values_to = "log2CPM", -c("hgnc_symbol", "ensembl_gene_id")) %>%
left_join(FullMetadata) %>%
filter(Experiment %in% c("Dose response titration", "Exp090823")) %>%
mutate(doseInApproxRisdiscale = case_when(
treatment == "Risdiplam" ~ dose.nM,
treatment == "C2C5" ~ dose.nM * 10,
treatment == "Branaplam" ~ dose.nM * sqrt(10),
treatment == "DMSO" ~ 0.316,
TRUE ~ dose.nM * sqrt(10),
)) %>%
mutate(treatment = case_when(
treatment == "DMSO" & Experiment == "Dose response titration" ~ "Old.DMSO",
treatment == "DMSO" & Experiment == "Exp090823" ~ "New.DMSO",
TRUE ~ treatment
)) %>%
ggplot(aes(x=doseInApproxRisdiscale, y=log2CPM, group=interaction(rep, treatment), color=treatment)) +
geom_point() +
geom_line() +
scale_x_continuous(trans='log10') +
facet_wrap(~hgnc_symbol, scales="free") +
scale_color_manual(values=deframe(doseResponse.ColorKey)) +
Rotate_x_labels +
labs(x="nanomolar of risdiplam (or approx equivalent)") +
theme(legend.position='bottom')

Most don’t seem all that unique. Here is a few cherry picked example that does look unique… It’d be nice to find the unproductive splice site for this.
CPM.mat %>%
as.data.frame() %>%
rownames_to_column() %>%
mutate(ensembl_gene_id = str_replace_all(rowname, "\\..+$", "")) %>%
as_tibble() %>%
left_join(symbols) %>%
filter(hgnc_symbol %in% c("ACTL8", "TERT", "ADAMTSL2", "IL2B", "DECR2")) %>%
dplyr::select(-rowname) %>%
pivot_longer(names_to = "old.sample.name", values_to = "log2CPM", -c("hgnc_symbol", "ensembl_gene_id")) %>%
left_join(FullMetadata) %>%
filter(Experiment %in% c("Dose response titration", "Exp090823")) %>%
mutate(doseInApproxRisdiscale = case_when(
treatment == "Risdiplam" ~ dose.nM,
treatment == "C2C5" ~ dose.nM * 10,
treatment == "Branaplam" ~ dose.nM * sqrt(10),
treatment == "DMSO" ~ 0.316,
TRUE ~ dose.nM * sqrt(10),
)) %>%
mutate(treatment = case_when(
treatment == "DMSO" & Experiment == "Dose response titration" ~ "Old.DMSO",
treatment == "DMSO" & Experiment == "Exp090823" ~ "New.DMSO",
TRUE ~ treatment
)) %>%
ggplot(aes(x=doseInApproxRisdiscale, y=log2CPM, group=interaction(rep, treatment), color=treatment)) +
geom_point() +
geom_line() +
scale_x_continuous(trans='log10') +
facet_wrap(~hgnc_symbol, scales="free") +
scale_color_manual(values=deframe(doseResponse.ColorKey)) +
Rotate_x_labels +
labs(x="nanomolar of risdiplam (or approx equivalent)") +
theme(legend.position='bottom')

Let’s go about this a different way: Look for DE genes in this LCL experiment that have a low dose:response spearman correlation coefficient in the titration series experiment.
doseResponseExperimentDat <- fread("../code/DoseResponseData/LCL/TidyExpressionDoseData_logFCTransformedAndAllDMSORepsInEachSeries.txt.gz")
DE.results$treatment %>% unique()
[1] "SMSM27_120" "SMSM27_3000" "SMSM27_600" "SMSM32_120" "SMSM32_3000"
[6] "SMSM32_600" "SMSM6_10000" "SMSM6_2000" "SMSM6_50000" "C2"
[11] "CUOME" "SM2" "A10"
DE.results %>%
filter(CellType == "LCL") %>%
filter(treatment %in% c("C2", "SM2", "SMSM6_50000", "SMSM27_3000", "SMSM32_3000")) %>%
group_by(Geneid) %>%
filter(any(FDR < 0.01 & abs(logFC) > 2)) %>%
ungroup() %>%
separate(Geneid, into=c("Geneid", "hgnc_symbol"), sep="_") %>%
inner_join(
doseResponseExperimentDat %>%
distinct(treatment, Geneid, spearman, .keep_all=T),
by=c("Geneid", "hgnc_symbol")
) %>%
mutate(label = case_when(
hgnc_symbol %in% c("HTT", "MYB", "STAT1", "HSD17B4", "CRYL1", "ATG5", "BTG2") ~ hgnc_symbol,
log2FC < -1.5 & spearman > -0.3 ~ hgnc_symbol,
TRUE ~ NA_character_
)) %>%
ggplot(aes(x=spearman, y=log2FC)) +
geom_point(alpha=0.1) +
geom_point(
data = . %>% filter(!is.na(label)),
color='red') +
geom_text_repel(aes(label=label), max.overlaps=Inf) +
facet_grid(treatment.x~treatment.y) +
labs(title="High correlation between spearman and high dose log2FC")

Okay let’s manually inspect some of those genes in dose respnse curves…
CPM.mat %>%
as.data.frame() %>%
rownames_to_column() %>%
mutate(ensembl_gene_id = str_replace_all(rowname, "\\..+$", "")) %>%
as_tibble() %>%
left_join(symbols) %>%
filter(hgnc_symbol %in% c("ASIC1", "LEXM", "CCR4", "ORG6", "GNS", "PAX9", "GPR25", "HDAC5", "CSF3", "FBXO9", "CDC40", "PHEX", "MBOAT2", "PPP3CA", "MYEOV", "TERT", "ADAMTSL2", "ACTL8")) %>%
dplyr::select(-rowname) %>%
pivot_longer(names_to = "old.sample.name", values_to = "log2CPM", -c("hgnc_symbol", "ensembl_gene_id")) %>%
left_join(FullMetadata) %>%
filter(Experiment %in% c("Dose response titration", "Exp090823")) %>%
mutate(doseInApproxRisdiscale = case_when(
treatment == "Risdiplam" ~ dose.nM,
treatment == "C2C5" ~ dose.nM * 10,
treatment == "Branaplam" ~ dose.nM * sqrt(10),
treatment == "DMSO" ~ 0.316,
TRUE ~ dose.nM * sqrt(10),
)) %>%
mutate(treatment = case_when(
treatment == "DMSO" & Experiment == "Dose response titration" ~ "Old.DMSO",
treatment == "DMSO" & Experiment == "Exp090823" ~ "New.DMSO",
TRUE ~ treatment
)) %>%
ggplot(aes(x=doseInApproxRisdiscale, y=log2CPM, group=interaction(rep, treatment), color=treatment)) +
geom_point() +
geom_line() +
scale_x_continuous(trans='log10') +
facet_wrap(~hgnc_symbol, scales="free") +
scale_color_manual(values=deframe(doseResponse.ColorKey)) +
Rotate_x_labels +
labs(x="nanomolar of risdiplam (or approx equivalent)") +
theme(legend.position='bottom')

All of these are very very lowly expressed (eg ASIC1, PAX9, MYEOV), which I don’t think I trust these to reproduce, or they do have clear effects in one treatment but maybe the spearman coef is smaller in another (eg FBXO9, HDAC5 has obvious effect in branaplam, and maybe less so in risdi)… None of these have a convincing effect that wouldn’t be found in high dose of risdiplam/branaplam experiment.
Nonetheless, I think to help me to better understand this experiment by manually browse coverage plots, I ought to make an IGV session with the help of my script that requires a metadata table.
Rewrites files that are read in this block. be careful not to rerun without thinking
MetadataForHelperScript <- read_tsv("../code/bigwigs/BigwigList.tsv", col_names=c("SampleName", "filepath", "Group", "Strand"))
MetadataForHelperScript %>%
filter(Strand == ".")
Exp090823.Metadata <- FullMetadata %>%
filter(Experiment == "Exp090823") %>%
mutate(Group = case_when(
treatment == "DMSO" ~ paste(Experiment, treatment, dose.nM, cell.type, libType, rep, "u", sep="_"),
TRUE ~ paste(Experiment, treatment, dose.nM, cell.type, libType, "u", sep="_"))
) %>%
mutate(filepath = paste0("/project2/yangili1/bjf79/20211209_JingxinRNAseq/code/bigwigs/unstranded/" ,old.sample.name, ".bw")) %>%
mutate(Strand = ".") %>%
mutate(Supergroup = case_when(
old.sample.name %in% c("Exp090823_DMSO_NA_1", "Exp090823_DMSO_NA_2") ~ "SMSM27_titration_u",
old.sample.name %in% c("Exp090823_DMSO_NA_3", "Exp090823_DMSO_NA_4") ~ "SMSM32_titration_u",
old.sample.name %in% c("Exp090823_DMSO_NA_5") ~ "SMSM6_titration_u" ,
TRUE ~ paste(treatment, "titration_u", sep="_")
)) %>%
mutate(Color = case_when(
str_detect(old.sample.name, "SMSM27_120") ~ "#fee0d2",
str_detect(old.sample.name, "SMSM27_600") ~ "#fc9272",
str_detect(old.sample.name, "SMSM27_3000") ~ "#de2d26",
str_detect(old.sample.name, "SMSM32_120") ~ "#fee6ce",
str_detect(old.sample.name, "SMSM32_600") ~ "#fdae6b",
str_detect(old.sample.name, "SMSM32_3000") ~ "#e6550d",
str_detect(old.sample.name, "SMSM6") ~ "#b15928",
str_detect(old.sample.name, "DMSO") ~ "#969696",
TRUE ~ "color"
))
ColorsForTitration <-
FullMetadata %>%
filter(treatment %in% c("C2C5", "Risdiplam", "Branaplam")) %>%
distinct(dose.nM, treatment) %>%
arrange(treatment, dose.nM) %>%
mutate(TitrationColor = c(brewer.pal(8, "Greens"), brewer.pal(8, "Purples"), brewer.pal(8, "Blues")))
ColorsForTitration %>%
filter(treatment %in% c("C2C5", "Risdiplam", "Branaplam")) %>%
group_by(treatment) %>%
mutate(doseRank = dense_rank(dose.nM)) %>%
ungroup() %>%
ggplot(aes(x=doseRank, y=treatment, fill=TitrationColor)) +
geom_raster() +
scale_fill_identity()
Exp090823.Metadata$Supergroup %>% unique()
FullBwList <- Exp090823.Metadata %>%
dplyr::select(SampleName = old.sample.name, filepath, Group, Strand, Supergroup, color=Color) %>%
bind_rows(
FullMetadata %>%
filter(!is.na(bigwig)) %>%
filter(!str_detect(old.sample.name, "Exp090823")) %>%
mutate(Strand = ".") %>%
dplyr::select(SampleName, filepath=bigwig, Group=group, Strand, Supergroup=supergroup, color)
) %>%
distinct()
FullBwList %>%
dplyr::select(-Supergroup, -color) %>%
write_tsv("../code/bigwigs/BigwigList.tsv")
FullBwList %>%
mutate(bed = "/project2/yangili1/bjf79/20211209_JingxinRNAseq/code/SplicingAnalysis/leafcutter_all_samples_202310/leafcutter_perind_numers.bed.gz") %>%
dplyr::select(Group, color, bed, Supergroup) %>%
distinct() %>%
write_tsv("../code/bigwigs/BigwigList.groups.tsv")
# GroupsFile <- read_tsv("../code/bigwigs/BigwigList.groups.tsv")
# FullMetadata %>%
# filter(!is.na(bigwig)) %>%
# filter(!str_detect(old.sample.name, "Exp090823")) %>%
# mutate(Strand = ".") %>%
# dplyr::select(SampleName, filepath=bigwig, Group=group)
#
# Exp090823.Metadata %>%
# # dplyr::select(SampleName = old.sample.name, filepath, Group, Strand) %>%
# bind_rows(MetadataForHelperScript) %>%
# distinct(Group, color, Supergroup, .keep_all=T) %>%
# group_by()
# # mutate color, supergroup
# mutate(bed = "/project2/yangili1/bjf79/20211209_JingxinRNAseq/code/SplicingAnalysis/leafcutter_all_samples_202310/leafcutter_perind_numers.bed.gz") %>%
# dplyr::select(name = SampleName, color)
#
# # bind_rows(
# # GroupsFile,
# # Exp090823.Metadata %>%
# # distinct(Group, Color, Supergroup) %>%
# # mutate(PSI = "/project2/yangili1/bjf79/20211209_JingxinRNAseq/code/SplicingAnalysis/leafcutter_all_samples_202310/leafcutter_perind_numers.bed.gz")) %>%
# # filter(!Group=="Exp090823_DMSO_NA_LCL_polyA_u") %>%
# # drop_na() %>%
# # distinct(Group, Color, Supergroup) %>%
# # write_tsv("../code/bigwigs/BigwigList.groups.tsv")
read_tsv("../code/bigwigs/BigwigList.tsv") %>%
mutate(filepath = str_replace_all(filepath, "/project2/yangili1/bjf79/20211209_JingxinRNAseq/code/", "")) %>%
write_tsv("../code/bigwigs/BigwigList.relativepaths.tsv")
Now create the session
cd /project2/yangili1/bjf79/20211209_JingxinRNAseq/code
conda activate GenometracksByGenotype
python scripts/GenometracksByGenotype/AggregateBigwigsForPlotting.py --GroupSettingsFile <(awk '$1 ~ /_u$/'
bigwigs/BigwigList.groups.tsv | grep 'LCL' | grep -e 'SMSM32' -e 'SMSM27' -e 'SMSM6' -e 'Risdiplam' -e 'Branaplam' -e 'DMSO' | awk -v OFS='\t' 'BEGIN {print "name", "color", "bed", "supergroup"} {print}') --BigwigList bigwigs/BigwigList.relativepaths.tsv --Normalization WholeGenome --Region chr2:190968033-191021960 --BigwigListType KeyFile --OutputPrefix scratch/Exp202310_ --TracksTemplate scripts/GenometracksByGenotype/tracks_templates/GeneralPurposeColoredByGenotypeWithSupergroups.ini --Bed12GenesToIni scripts/GenometracksByGenotype/PremadeTracks/gencode.v26.FromGTEx.genes.bed12.gz --NoSashimi --IGVSessionOutput --IGVSessionTemplate scripts/GenometracksByGenotype/tracks_templates/PlotAllOriginalBigwigs.xml
mv scratch/Exp202310_tracks.xml ./
Then open the xml session file in IGV
sessionInfo()
R version 4.2.0 (2022-04-22)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)
Matrix products: default
BLAS/LAPACK: /software/openblas-0.3.13-el7-x86_64/lib/libopenblas_haswellp-r0.3.13.so
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=C
[4] LC_COLLATE=C LC_MONETARY=C LC_MESSAGES=C
[7] LC_PAPER=C LC_NAME=C LC_ADDRESS=C
[10] LC_TELEPHONE=C LC_MEASUREMENT=C LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] edgeR_3.38.4 limma_3.52.4 GGally_2.1.2 ggrepel_0.9.1
[5] gplots_3.1.3 data.table_1.14.2 RColorBrewer_1.1-3 forcats_0.5.1
[9] stringr_1.4.0 dplyr_1.0.9 purrr_0.3.4 readr_2.1.2
[13] tidyr_1.2.0 tibble_3.1.7 ggplot2_3.3.6 tidyverse_1.3.1
loaded via a namespace (and not attached):
[1] bitops_1.0-7 fs_1.5.2 bit64_4.0.5 lubridate_1.8.0
[5] httr_1.4.3 rprojroot_2.0.3 tools_4.2.0 backports_1.4.1
[9] bslib_0.3.1 utf8_1.2.2 R6_2.5.1 KernSmooth_2.23-20
[13] DBI_1.1.2 colorspace_2.0-3 withr_2.5.0 tidyselect_1.1.2
[17] bit_4.0.4 compiler_4.2.0 git2r_0.30.1 cli_3.3.0
[21] rvest_1.0.2 xml2_1.3.3 labeling_0.4.2 sass_0.4.1
[25] caTools_1.18.2 scales_1.2.0 hexbin_1.28.3 digest_0.6.29
[29] rmarkdown_2.14 R.utils_2.11.0 pkgconfig_2.0.3 htmltools_0.5.2
[33] dbplyr_2.1.1 fastmap_1.1.0 highr_0.9 rlang_1.0.2
[37] readxl_1.4.0 rstudioapi_0.13 jquerylib_0.1.4 generics_0.1.2
[41] farver_2.1.0 jsonlite_1.8.0 vroom_1.5.7 gtools_3.9.2
[45] R.oo_1.24.0 magrittr_2.0.3 Rcpp_1.0.8.3 munsell_0.5.0
[49] fansi_1.0.3 lifecycle_1.0.1 R.methodsS3_1.8.1 stringi_1.7.6
[53] whisker_0.4 yaml_2.3.5 plyr_1.8.7 grid_4.2.0
[57] parallel_4.2.0 promises_1.2.0.1 crayon_1.5.1 lattice_0.20-45
[61] haven_2.5.0 hms_1.1.1 locfit_1.5-9.7 knitr_1.39
[65] pillar_1.7.0 reprex_2.0.1 glue_1.6.2 evaluate_0.15
[69] modelr_0.1.8 vctrs_0.4.1 tzdb_0.3.0 httpuv_1.6.5
[73] cellranger_1.1.0 gtable_0.3.0 reshape_0.8.9 assertthat_0.2.1
[77] xfun_0.30 broom_0.8.0 later_1.3.0 workflowr_1.7.0
[81] ellipsis_0.3.2