Last updated: 2023-11-02
Checks: 5 2
Knit directory:
20211209_JingxinRNAseq/analysis/
This reproducible R Markdown analysis was created with workflowr (version 1.7.0). The Checks tab describes the reproducibility checks that were applied when the results were created. The Past versions tab lists the development history.
The R Markdown file has unstaged changes. To know which version of
the R Markdown file created these results, you’ll want to first commit
it to the Git repo. If you’re still working on the analysis, you can
ignore this warning. When you’re finished, you can run
wflow_publish to commit the R Markdown file and build the
HTML.
Great job! The global environment was empty. Objects defined in the global environment can affect the analysis in your R Markdown file in unknown ways. For reproduciblity it’s best to always run the code in an empty environment.
The command set.seed(19900924) was run prior to running
the code in the R Markdown file. Setting a seed ensures that any results
that rely on randomness, e.g. subsampling or permutations, are
reproducible.
Great job! Recording the operating system, R version, and package versions is critical for reproducibility.
Nice! There were no cached chunks for this analysis, so you can be confident that you successfully produced the results during this run.
Using absolute paths to the files within your workflowr project makes it difficult for you and others to run your code on a different machine. Change the absolute path(s) below to the suggested relative path(s) to make your code more reproducible.
| absolute | relative |
|---|---|
| /project2/yangili1/bjf79/20211209_JingxinRNAseq/code/bigwigs/unstranded/(.+?).bw | ../code/bigwigs/unstranded/(.+?).bw |
Great! You are using Git for version control. Tracking code development and connecting the code version to the results is critical for reproducibility.
The results in this page were generated with repository version 3c2aa40. See the Past versions tab to see a history of the changes made to the R Markdown and HTML files.
Note that you need to be careful to ensure that all relevant files for
the analysis have been committed to Git prior to generating the results
(you can use wflow_publish or
wflow_git_commit). workflowr only checks the R Markdown
file, but you know if there are other scripts or data files that it
depends on. Below is the status of the Git repository when the results
were generated:
Ignored files:
Ignored: .DS_Store
Ignored: .Rhistory
Ignored: .Rproj.user/
Ignored: analysis/.RData
Ignored: analysis/.Rhistory
Ignored: analysis/20220707_TitrationSeries_DE_testing.nb.html
Ignored: code/.DS_Store
Ignored: code/._DOCK7.pdf
Ignored: code/._DOCK7_DMSO1.pdf
Ignored: code/._DOCK7_SM2_1.pdf
Ignored: code/._FKTN_DMSO_1.pdf
Ignored: code/._FKTN_SM2_1.pdf
Ignored: code/._MAPT.pdf
Ignored: code/._PKD1_DMSO_1.pdf
Ignored: code/._PKD1_SM2_1.pdf
Ignored: code/.snakemake/
Ignored: code/1KG_HighCoverageCalls.samplelist.txt
Ignored: code/5ssSeqs.tab
Ignored: code/Alignments/
Ignored: code/Branaplam_Risdiplam_specific_introns.bed.gz
Ignored: code/Branaplam_Risdiplam_specific_introns.bed.gz.tbi
Ignored: code/ChemCLIP/
Ignored: code/ClinVar/
Ignored: code/DE_testing/
Ignored: code/DE_tests.mat.counts.gz
Ignored: code/DE_tests.txt.gz
Ignored: code/DataNotToCommit/
Ignored: code/DonorMotifSearches/
Ignored: code/DoseResponseData/
Ignored: code/Fastq/
Ignored: code/FastqFastp/
Ignored: code/FivePrimeSpliceSites.txt
Ignored: code/FragLenths/
Ignored: code/Ishigami/
Ignored: code/Meme/
Ignored: code/Multiqc/
Ignored: code/OMIM/
Ignored: code/OldBigWigs/
Ignored: code/PhyloP/
Ignored: code/QC/
Ignored: code/ReferenceGenomes/
Ignored: code/Rplots.pdf
Ignored: code/Session.vim
Ignored: code/Session2.vim
Ignored: code/SplicingAnalysis/
Ignored: code/TracksSession
Ignored: code/bigwigs/
Ignored: code/featureCounts/
Ignored: code/figs/
Ignored: code/fimo_out/
Ignored: code/geena/
Ignored: code/hg38ToMm39.over.chain.gz
Ignored: code/igv_session.template.xml
Ignored: code/igv_session.xml
Ignored: code/log
Ignored: code/logs/
Ignored: code/rstudio-server.job
Ignored: code/rules/.ExpMoleculesOfInterest.smk.swp
Ignored: code/scratch/
Ignored: code/temp/
Ignored: code/test.txt.gz
Ignored: code/testPlottingWithMyScript.ForJingxin.sh
Ignored: code/testPlottingWithMyScript.ForJingxin2.sh
Ignored: code/testPlottingWithMyScript.ForJingxin3.sh
Ignored: code/testPlottingWithMyScript.ForJingxin4.sh
Ignored: code/testPlottingWithMyScript.sh
Ignored: code/tracks.with_chRNA.RisdiBrana.xml
Ignored: code/tracks.with_chRNA.RisdiBranaWithExtras.xml
Ignored: code/tracks.xml
Ignored: data/~$52CompoundsTempPlateLayoutForPipettingConvenience.xlsx
Untracked files:
Untracked: analysis/2023-11-02_3MoleculesOfInterestLeafcutterDs.Rmd
Untracked: code/config/samples.3MoleculesOfInterestExperiment.Contrasts.tsv
Untracked: code/scripts/Exp202310_MakeLeafcutterContrasts.R
Unstaged changes:
Modified: analysis/2023-10-31_Explore3MoleculesOfInterest.Rmd
Modified: code/rules/ExpMoleculesOfInterest.smk
Modified: code/rules/common.smk
Modified: code/scripts/GenometracksByGenotype
Note that any generated files, e.g. HTML, png, CSS, etc., are not included in this status report because it is ok for generated content to have uncommitted changes.
These are the previous versions of the repository in which changes were
made to the R Markdown
(analysis/2023-10-31_Explore3MoleculesOfInterest.Rmd) and
HTML (docs/2023-10-31_Explore3MoleculesOfInterest.html)
files. If you’ve configured a remote Git repository (see
?wflow_git_remote), click on the hyperlinks in the table
below to view the files as they were in that past version.
| File | Version | Author | Date | Message |
|---|---|---|---|---|
| Rmd | 3c2aa40 | Benjmain Fair | 2023-11-01 | add 3molecule exp nb |
| html | 3c2aa40 | Benjmain Fair | 2023-11-01 | add 3molecule exp nb |
Based on previous experiment of 52 molecules, we dtermined three interesting molecules to sequence deeper and at 3 doses… Each dose (and DMSO control) was prepared in biological triplicate and subject to RNA-seq…
library(tidyverse)
library(RColorBrewer)
library(data.table)
library(gplots)
library(ggrepel)
library(edgeR)
# define some useful funcs
sample_n_of <- function(data, size, ...) {
dots <- quos(...)
group_ids <- data %>%
group_by(!!! dots) %>%
group_indices()
sampled_groups <- sample(unique(group_ids), size)
data %>%
filter(group_ids %in% sampled_groups)
}
# Set theme
theme_set(
theme_classic() +
theme(text=element_text(size=16, family="Helvetica")))
# I use layer a lot, to rotate long x-axis labels
Rotate_x_labels <- theme(axis.text.x = element_text(angle = 45, vjust = 1, hjust=1))
Metadata.ExpOf52 <- read_tsv("../code/config/samples.52MoleculeExperiment.tsv") %>%
mutate(cell.type = "LCL", libType="polyA", rep=BioRep, old.sample.name=SampleID, dose.nM=NA) %>%
mutate(treatment = if_else(Treatment=="DMSO", "DMSO", paste0("W", Treatment))) %>%
dplyr::select(treatment, cell.type, dose.nM, libType, rep, old.sample.name) %>%
mutate(SampleName = paste(treatment, dose.nM, cell.type, libType, rep, sep = "_"))
Metadata.PreviousExperiments <- read_tsv("../code/bigwigs/BigwigList.tsv",col_names = c("SampleName", "bigwig", "group", "strand")) %>%
filter(strand==".") %>%
dplyr::select(-strand) %>%
mutate(old.sample.name = str_replace(bigwig, "/project2/yangili1/bjf79/20211209_JingxinRNAseq/code/bigwigs/unstranded/(.+?).bw", "\\1")) %>%
separate(SampleName, into=c("treatment", "dose.nM", "cell.type", "libType", "rep"), convert=T, remove=F, sep="_") %>%
left_join(
read_tsv("../code/bigwigs/BigwigList.groups.tsv", col_names = c("group", "color", "bed", "supergroup")),
by="group"
)
Metadata.ExpOf3 <- read_tsv("../code/config/samples.3MoleculesOfInterestExperiment.tsv") %>%
dplyr::select(old.sample.name=SampleName) %>%
distinct() %>%
separate(old.sample.name, into=c("Experiment", "treatment","dose.nM", "rep"), convert=T, remove=F, sep="_") %>%
mutate(cell.type="LCL", libType="polyA") %>%
arrange(treatment, dose.nM)
FullMetadata <- bind_rows(Metadata.ExpOf52, Metadata.PreviousExperiments) %>%
mutate(Experiment = case_when(
cell.type == "Fibroblast" ~ "Single high dose fibroblast",
startsWith(old.sample.name, "TitrationExp") ~ "Dose response titration",
startsWith(old.sample.name, "chRNA") ~ "nascent RNA profiling",
startsWith(old.sample.name, "NewMolecule") ~ "Single high dose LCL"
)) %>%
bind_rows(Metadata.ExpOf3) %>%
mutate(color = case_when(
treatment == "DMSO" ~ "#969696",
Experiment == "Single high dose LCL" ~ "#252525",
treatment == "SMSM6" ~ "#a65628",
treatment == "SMSM32" ~ "#ff7f00",
treatment == "SMSM27" ~ "#e41a1c",
TRUE ~ color
)) %>%
mutate(leafcutter.name = str_replace_all(old.sample.name, "-", "."))
Metadata.ExpOf3 %>% distinct(treatment)
# A tibble: 4 × 1
treatment
<chr>
1 DMSO
2 SMSM27
3 SMSM32
4 SMSM6
PSI <- fread("../code/SplicingAnalysis/leafcutter_all_samples_202310/leafcutter_perind_numers.bed.gz", sep = '\t' ) %>%
dplyr::select(-"NewMolecule.C04.2")
PreviouslyModeledJuncs <- read_tsv("../output/EC50Estimtes.FromPSI.txt.gz")
Modelled.introns <- PreviouslyModeledJuncs %>%
filter(!is.na(Steepness)) %>%
dplyr::select(`#Chrom`, start, end, strand=strand.y)
make heatmap of juncs
PSI.GAGT.Only <- PSI %>%
inner_join(Modelled.introns) %>%
dplyr::select(junc, A10.1:Exp090823_SMSM6_50000_3
) %>%
drop_na()
PSI.GAGT.Only.cormat <- PSI.GAGT.Only %>%
column_to_rownames("junc") %>%
cor(method='s')
heatmap.2(PSI.GAGT.Only.cormat, trace='none')

| Version | Author | Date |
|---|---|---|
| 3c2aa40 | Benjmain Fair | 2023-11-01 |
PSI.GAGT.Only.Only3Molecules <- PSI %>%
inner_join(Modelled.introns) %>%
dplyr::select(junc, contains("Exp090823")
) %>%
drop_na()
PSI.GAGT.Only.cormat <- PSI.GAGT.Only.Only3Molecules %>%
column_to_rownames("junc") %>%
cor(method='s')
PSI.GAGT.Only.cormat.colors <- PSI.GAGT.Only.Only3Molecules %>%
dplyr::select(-junc) %>%
colnames() %>%
as.data.frame() %>%
left_join(FullMetadata, by=c("."="leafcutter.name")) %>%
group_by(treatment) %>%
mutate(doseRank = dense_rank(dose.nM)) %>%
ungroup() %>%
mutate(dosecolor = dplyr::recode(doseRank, !!!c("1"="#cccccc", "2"="#969696", "3"="#525252"))) %>%
replace_na(list(dosecolor="#f7f7f7"))
heatmap.2(PSI.GAGT.Only.cormat, trace='none', cexRow = 0.5, cexCol=0.5, ColSideColors=PSI.GAGT.Only.cormat.colors$dosecolor, RowSideColors=PSI.GAGT.Only.cormat.colors$color)

| Version | Author | Date |
|---|---|---|
| 3c2aa40 | Benjmain Fair | 2023-11-01 |
PSI.GAGT.Only <- PSI %>%
inner_join(Modelled.introns) %>%
dplyr::select(junc, A10.1:Exp090823_SMSM6_50000_3
) %>%
drop_na()
pca.results <- PSI.GAGT.Only %>%
column_to_rownames("junc") %>%
scale() %>% t() %>% prcomp(scale=T)
summary(pca.results)
Importance of components:
PC1 PC2 PC3 PC4 PC5 PC6 PC7
Standard deviation 9.1089 5.1977 4.28795 4.01418 3.29582 2.77333 2.65510
Proportion of Variance 0.3332 0.1085 0.07384 0.06471 0.04362 0.03089 0.02831
Cumulative Proportion 0.3332 0.4417 0.51556 0.58028 0.62390 0.65479 0.68310
PC8 PC9 PC10 PC11 PC12 PC13 PC14
Standard deviation 2.2922 1.94967 1.78253 1.74744 1.66844 1.61446 1.52818
Proportion of Variance 0.0211 0.01527 0.01276 0.01226 0.01118 0.01047 0.00938
Cumulative Proportion 0.7042 0.71947 0.73223 0.74449 0.75567 0.76614 0.77552
PC15 PC16 PC17 PC18 PC19 PC20 PC21
Standard deviation 1.50098 1.42748 1.42231 1.34748 1.31648 1.30291 1.29333
Proportion of Variance 0.00905 0.00818 0.00812 0.00729 0.00696 0.00682 0.00672
Cumulative Proportion 0.78457 0.79275 0.80087 0.80817 0.81513 0.82194 0.82866
PC22 PC23 PC24 PC25 PC26 PC27 PC28
Standard deviation 1.21346 1.20821 1.18801 1.16246 1.13052 1.12255 1.10596
Proportion of Variance 0.00591 0.00586 0.00567 0.00543 0.00513 0.00506 0.00491
Cumulative Proportion 0.83457 0.84044 0.84610 0.85153 0.85666 0.86173 0.86664
PC29 PC30 PC31 PC32 PC33 PC34 PC35
Standard deviation 1.08474 1.06351 1.05042 1.0223 1.00101 0.9857 0.97727
Proportion of Variance 0.00473 0.00454 0.00443 0.0042 0.00402 0.0039 0.00384
Cumulative Proportion 0.87136 0.87591 0.88034 0.8845 0.88856 0.8925 0.89630
PC36 PC37 PC38 PC39 PC40 PC41 PC42
Standard deviation 0.94208 0.92799 0.89820 0.89491 0.88503 0.86344 0.85586
Proportion of Variance 0.00356 0.00346 0.00324 0.00322 0.00315 0.00299 0.00294
Cumulative Proportion 0.89986 0.90332 0.90656 0.90977 0.91292 0.91591 0.91886
PC43 PC44 PC45 PC46 PC47 PC48 PC49
Standard deviation 0.84070 0.83022 0.80076 0.79794 0.76846 0.7562 0.75277
Proportion of Variance 0.00284 0.00277 0.00258 0.00256 0.00237 0.0023 0.00228
Cumulative Proportion 0.92169 0.92446 0.92704 0.92960 0.93197 0.9343 0.93654
PC50 PC51 PC52 PC53 PC54 PC55 PC56
Standard deviation 0.74404 0.72969 0.71372 0.7064 0.69480 0.68674 0.67508
Proportion of Variance 0.00222 0.00214 0.00205 0.0020 0.00194 0.00189 0.00183
Cumulative Proportion 0.93876 0.94090 0.94295 0.9449 0.94689 0.94878 0.95061
PC57 PC58 PC59 PC60 PC61 PC62 PC63
Standard deviation 0.66586 0.66323 0.6509 0.64068 0.62846 0.60634 0.59381
Proportion of Variance 0.00178 0.00177 0.0017 0.00165 0.00159 0.00148 0.00142
Cumulative Proportion 0.95239 0.95416 0.9559 0.95751 0.95910 0.96057 0.96199
PC64 PC65 PC66 PC67 PC68 PC69 PC70
Standard deviation 0.5910 0.57817 0.57155 0.55737 0.55568 0.5457 0.53302
Proportion of Variance 0.0014 0.00134 0.00131 0.00125 0.00124 0.0012 0.00114
Cumulative Proportion 0.9634 0.96473 0.96605 0.96729 0.96853 0.9697 0.97087
PC71 PC72 PC73 PC74 PC75 PC76 PC77
Standard deviation 0.5227 0.51210 0.50508 0.48483 0.47846 0.46855 0.46660
Proportion of Variance 0.0011 0.00105 0.00102 0.00094 0.00092 0.00088 0.00087
Cumulative Proportion 0.9720 0.97302 0.97405 0.97499 0.97591 0.97679 0.97767
PC78 PC79 PC80 PC81 PC82 PC83 PC84
Standard deviation 0.46283 0.45924 0.4462 0.43611 0.43333 0.42579 0.41051
Proportion of Variance 0.00086 0.00085 0.0008 0.00076 0.00075 0.00073 0.00068
Cumulative Proportion 0.97853 0.97937 0.9802 0.98094 0.98169 0.98242 0.98310
PC85 PC86 PC87 PC88 PC89 PC90 PC91
Standard deviation 0.39996 0.39767 0.38903 0.38150 0.38016 0.37425 0.36672
Proportion of Variance 0.00064 0.00064 0.00061 0.00058 0.00058 0.00056 0.00054
Cumulative Proportion 0.98374 0.98437 0.98498 0.98557 0.98615 0.98671 0.98725
PC92 PC93 PC94 PC95 PC96 PC97 PC98
Standard deviation 0.34811 0.34641 0.34331 0.33488 0.33123 0.32712 0.32363
Proportion of Variance 0.00049 0.00048 0.00047 0.00045 0.00044 0.00043 0.00042
Cumulative Proportion 0.98773 0.98822 0.98869 0.98914 0.98958 0.99001 0.99043
PC99 PC100 PC101 PC102 PC103 PC104 PC105
Standard deviation 0.31832 0.30911 0.30532 0.30220 0.29553 0.28147 0.27979
Proportion of Variance 0.00041 0.00038 0.00037 0.00037 0.00035 0.00032 0.00031
Cumulative Proportion 0.99084 0.99122 0.99160 0.99196 0.99231 0.99263 0.99295
PC106 PC107 PC108 PC109 PC110 PC111 PC112
Standard deviation 0.2726 0.26903 0.26452 0.25634 0.25277 0.24650 0.24030
Proportion of Variance 0.0003 0.00029 0.00028 0.00026 0.00026 0.00024 0.00023
Cumulative Proportion 0.9932 0.99354 0.99382 0.99408 0.99434 0.99458 0.99481
PC113 PC114 PC115 PC116 PC117 PC118 PC119
Standard deviation 0.23359 0.23125 0.22953 0.2254 0.2236 0.21395 0.21104
Proportion of Variance 0.00022 0.00021 0.00021 0.0002 0.0002 0.00018 0.00018
Cumulative Proportion 0.99503 0.99525 0.99546 0.9957 0.9959 0.99605 0.99623
PC120 PC121 PC122 PC123 PC124 PC125 PC126
Standard deviation 0.20723 0.20234 0.19712 0.19377 0.19291 0.18584 0.18463
Proportion of Variance 0.00017 0.00016 0.00016 0.00015 0.00015 0.00014 0.00014
Cumulative Proportion 0.99640 0.99656 0.99672 0.99687 0.99702 0.99716 0.99730
PC127 PC128 PC129 PC130 PC131 PC132 PC133
Standard deviation 0.17532 0.17109 0.16921 0.1606 0.1599 0.1578 0.1548
Proportion of Variance 0.00012 0.00012 0.00011 0.0001 0.0001 0.0001 0.0001
Cumulative Proportion 0.99742 0.99754 0.99765 0.9978 0.9979 0.9980 0.9980
PC134 PC135 PC136 PC137 PC138 PC139 PC140
Standard deviation 0.1540 0.15120 0.14779 0.14386 0.14219 0.13826 0.13429
Proportion of Variance 0.0001 0.00009 0.00009 0.00008 0.00008 0.00008 0.00007
Cumulative Proportion 0.9981 0.99824 0.99833 0.99841 0.99849 0.99857 0.99864
PC141 PC142 PC143 PC144 PC145 PC146 PC147
Standard deviation 0.13314 0.12991 0.12630 0.12547 0.12327 0.12112 0.11832
Proportion of Variance 0.00007 0.00007 0.00006 0.00006 0.00006 0.00006 0.00006
Cumulative Proportion 0.99871 0.99878 0.99884 0.99891 0.99897 0.99903 0.99908
PC148 PC149 PC150 PC151 PC152 PC153 PC154
Standard deviation 0.11396 0.11166 0.10926 0.10361 0.10223 0.10150 0.09994
Proportion of Variance 0.00005 0.00005 0.00005 0.00004 0.00004 0.00004 0.00004
Cumulative Proportion 0.99914 0.99919 0.99923 0.99928 0.99932 0.99936 0.99940
PC155 PC156 PC157 PC158 PC159 PC160 PC161
Standard deviation 0.09668 0.09318 0.09127 0.08861 0.08832 0.08426 0.08295
Proportion of Variance 0.00004 0.00003 0.00003 0.00003 0.00003 0.00003 0.00003
Cumulative Proportion 0.99944 0.99947 0.99951 0.99954 0.99957 0.99960 0.99963
PC162 PC163 PC164 PC165 PC166 PC167 PC168
Standard deviation 0.08161 0.07976 0.07738 0.07494 0.07325 0.07049 0.06599
Proportion of Variance 0.00003 0.00003 0.00002 0.00002 0.00002 0.00002 0.00002
Cumulative Proportion 0.99965 0.99968 0.99970 0.99972 0.99975 0.99977 0.99978
PC169 PC170 PC171 PC172 PC173 PC174 PC175
Standard deviation 0.06483 0.06210 0.06156 0.05933 0.05758 0.05485 0.05171
Proportion of Variance 0.00002 0.00002 0.00002 0.00001 0.00001 0.00001 0.00001
Cumulative Proportion 0.99980 0.99982 0.99983 0.99985 0.99986 0.99987 0.99988
PC176 PC177 PC178 PC179 PC180 PC181 PC182
Standard deviation 0.05093 0.04982 0.04690 0.04615 0.04317 0.04297 0.04043
Proportion of Variance 0.00001 0.00001 0.00001 0.00001 0.00001 0.00001 0.00001
Cumulative Proportion 0.99989 0.99990 0.99991 0.99992 0.99993 0.99993 0.99994
PC183 PC184 PC185 PC186 PC187 PC188 PC189
Standard deviation 0.03933 0.03755 0.03554 0.0348 0.03384 0.03038 0.03018
Proportion of Variance 0.00001 0.00001 0.00001 0.0000 0.00000 0.00000 0.00000
Cumulative Proportion 0.99995 0.99995 0.99996 1.0000 0.99997 0.99997 0.99997
PC190 PC191 PC192 PC193 PC194 PC195 PC196
Standard deviation 0.02808 0.02778 0.02606 0.02506 0.02403 0.02326 0.02136
Proportion of Variance 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000
Cumulative Proportion 0.99998 0.99998 0.99998 0.99999 0.99999 0.99999 0.99999
PC197 PC198 PC199 PC200 PC201 PC202 PC203
Standard deviation 0.02082 0.01785 0.01753 0.01634 0.01576 0.01313 0.01288
Proportion of Variance 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000
Cumulative Proportion 0.99999 1.00000 1.00000 1.00000 1.00000 1.00000 1.00000
PC204
Standard deviation 4.045e-15
Proportion of Variance 0.000e+00
Cumulative Proportion 1.000e+00
pca.results.to.plot <- pca.results$x %>%
as.data.frame() %>%
rownames_to_column("leafcutter.name") %>%
dplyr::select(leafcutter.name, PC1:PC6) %>%
left_join(FullMetadata, by="leafcutter.name")
pca.results.to.plot %>%
# filter(!Experiment=="Single high dose LCL") %>%
mutate(label = case_when(
Experiment == "Exp090823" & rep==1 ~ str_glue("{treatment}, {dose.nM}"),
TRUE ~ NA_character_
)) %>%
ggplot(aes(x=PC1, y=PC2, color=color, shape=Experiment)) +
geom_point(size=3) +
scale_color_identity() +
geom_text_repel(aes(label=label), max.overlaps=15, size=3) +
theme_bw() +
labs(title = "PCA using 294 GAGT introns\nAll samples")

| Version | Author | Date |
|---|---|---|
| 3c2aa40 | Benjmain Fair | 2023-11-01 |
gene.counts <- read_tsv("../code/featureCounts/AllSamples_Counts.txt", comment = "#") %>%
rename_with(~ str_replace(.x, "Alignments/STAR_Align/(.+?)/Aligned.sortedByCoord.out.bam", "\\1"), contains("Alignments")) %>%
dplyr::select(-c(2:6)) %>%
column_to_rownames("Geneid") %>%
DGEList() %>%
calcNormFactors()
counts.plot.dat <- gene.counts$samples %>%
rownames_to_column("old.sample.name") %>%
inner_join(FullMetadata, by="old.sample.name") %>%
mutate(dose.nM = case_when(
treatment == "DMSO" ~ "NA",
cell.type == "Fibroblast" ~ "CC50 dose",
Experiment == "Single high dose LCL" ~ "SMN_EC90 dose",
TRUE ~ as.character(dose.nM)
)) %>%
mutate(label = dose.nM) %>%
arrange(Experiment, treatment, dose.nM)
counts.plot.labels <- counts.plot.dat %>%
dplyr::select(old.sample.name, label) %>% deframe()
ReadsPerDataset <- ggplot(counts.plot.dat, aes(x=old.sample.name, y=lib.size/2E6, fill=color)) +
geom_col() +
scale_fill_identity() +
scale_x_discrete(name="dose (nanomolar)", label=counts.plot.labels) +
scale_y_continuous(expand=c(0,0)) +
theme(axis.text.x = element_text(angle = 45, vjust = 1, hjust=1, size=3)) +
theme(strip.text.x = element_text(size = 8)) +
facet_grid(~Experiment, scales = "free_x", space = "free_x", labeller = label_wrap_gen(15)) +
labs(title="RNA-seq datasets", y="Read count (M)")
ReadsPerDataset

| Version | Author | Date |
|---|---|---|
| 3c2aa40 | Benjmain Fair | 2023-11-01 |
CPM.mat <- cpm(gene.counts, prior.count = 0.1, log = T)
symbols <- read_tsv("../data/Genes.list.txt")
FullMetadata
# A tibble: 205 × 14
treatment cell.type dose.nM libType rep old.sample.name SampleName bigwig
<chr> <chr> <dbl> <chr> <dbl> <chr> <chr> <chr>
1 WA01 LCL NA polyA 1 NewMolecule.A01-1 WA01_NA_L… <NA>
2 WA02 LCL NA polyA 1 NewMolecule.A02-1 WA02_NA_L… <NA>
3 WA03 LCL NA polyA 1 NewMolecule.A03-1 WA03_NA_L… <NA>
4 WA04 LCL NA polyA 1 NewMolecule.A04-1 WA04_NA_L… <NA>
5 WA05 LCL NA polyA 1 NewMolecule.A05-1 WA05_NA_L… <NA>
6 WA06 LCL NA polyA 1 NewMolecule.A06-1 WA06_NA_L… <NA>
7 WA07 LCL NA polyA 1 NewMolecule.A07-1 WA07_NA_L… <NA>
8 WA08 LCL NA polyA 1 NewMolecule.A08-1 WA08_NA_L… <NA>
9 WA09 LCL NA polyA 1 NewMolecule.A09-1 WA09_NA_L… <NA>
10 WA10 LCL NA polyA 1 NewMolecule.A10-1 WA10_NA_L… <NA>
# … with 195 more rows, and 6 more variables: group <chr>, color <chr>,
# bed <chr>, supergroup <chr>, Experiment <chr>, leafcutter.name <chr>
FullMetadata %>%
filter(Experiment == "Exp090823")
# A tibble: 32 × 14
treatment cell.type dose.nM libType rep old.sample.name SampleName bigwig
<chr> <chr> <dbl> <chr> <dbl> <chr> <chr> <chr>
1 DMSO LCL NA polyA 1 Exp090823_DMSO_N… <NA> <NA>
2 DMSO LCL NA polyA 2 Exp090823_DMSO_N… <NA> <NA>
3 DMSO LCL NA polyA 3 Exp090823_DMSO_N… <NA> <NA>
4 DMSO LCL NA polyA 4 Exp090823_DMSO_N… <NA> <NA>
5 DMSO LCL NA polyA 5 Exp090823_DMSO_N… <NA> <NA>
6 SMSM27 LCL 120 polyA 1 Exp090823_SMSM27… <NA> <NA>
7 SMSM27 LCL 120 polyA 2 Exp090823_SMSM27… <NA> <NA>
8 SMSM27 LCL 120 polyA 3 Exp090823_SMSM27… <NA> <NA>
9 SMSM27 LCL 600 polyA 1 Exp090823_SMSM27… <NA> <NA>
10 SMSM27 LCL 600 polyA 2 Exp090823_SMSM27… <NA> <NA>
# … with 22 more rows, and 6 more variables: group <chr>, color <chr>,
# bed <chr>, supergroup <chr>, Experiment <chr>, leafcutter.name <chr>
CPM.mat %>%
as.data.frame() %>%
rownames_to_column() %>%
mutate(ensembl_gene_id = str_replace_all(rowname, "\\..+$", "")) %>%
as_tibble() %>%
left_join(symbols) %>%
filter(hgnc_symbol %in% c("HTT", "MYB", "FHOD1", "STAT1")) %>%
dplyr::select(-rowname) %>%
pivot_longer(names_to = "old.sample.name", values_to = "log2CPM", -c("hgnc_symbol", "ensembl_gene_id")) %>%
left_join(FullMetadata) %>%
filter(Experiment %in% c("Dose response titration", "Exp090823")) %>%
group_by(Experiment, treatment, ensembl_gene_id) %>%
mutate(doseRank = dense_rank(dose.nM)) %>%
ungroup() %>%
ggplot(aes(x=dose.nM, y=log2CPM, group=rep, color=color)) +
scale_color_identity() +
geom_point() +
scale_x_continuous(trans='log10') +
facet_grid(treatment~hgnc_symbol, scales="free")

| Version | Author | Date |
|---|---|---|
| 3c2aa40 | Benjmain Fair | 2023-11-01 |
Colors <- FullMetadata %>%
group_by(treatment) %>%
filter(dose.nM == max(dose.nM) | treatment == "DMSO") %>%
distinct(dose.nM, treatment, color) %>%
dplyr::select(treatment, color)
CPM.mat %>%
as.data.frame() %>%
rownames_to_column() %>%
mutate(ensembl_gene_id = str_replace_all(rowname, "\\..+$", "")) %>%
as_tibble() %>%
left_join(symbols) %>%
filter(hgnc_symbol %in% c("HTT", "MYB", "FHOD1", "STAT1", "CRYL1", "DENND5A", "ATG5", "BTG2", "ZIK1", "HSD17B4","EIF2AK4", "SOS2", "ALG8", "MLLT10")) %>%
dplyr::select(-rowname) %>%
pivot_longer(names_to = "old.sample.name", values_to = "log2CPM", -c("hgnc_symbol", "ensembl_gene_id")) %>%
left_join(FullMetadata) %>%
filter(Experiment %in% c("Dose response titration", "Exp090823")) %>%
mutate(doseInApproxRisdiscale = case_when(
treatment == "Risdiplam" ~ dose.nM,
treatment == "C2C5" ~ dose.nM * 10,
treatment == "Branaplam" ~ dose.nM * sqrt(10),
treatment == "DMSO" ~ 0.316,
TRUE ~ dose.nM * sqrt(10),
)) %>%
mutate(treatment = case_when(
treatment == "DMSO" & Experiment == "Dose response titration" ~ "Old.DMSO",
treatment == "DMSO" & Experiment == "Exp090823" ~ "New.DMSO",
TRUE ~ treatment
)) %>%
ggplot(aes(x=doseInApproxRisdiscale, y=log2CPM, group=interaction(rep, treatment), color=treatment)) +
geom_point() +
geom_line() +
scale_x_continuous(trans='log10') +
facet_wrap(~hgnc_symbol, scales="free") +
# scale_color_manual(values=FullMetadata) +
Rotate_x_labels +
labs(x="nanomolar of risdiplam (or approx equivalent)") +
theme(legend.position='bottom')

| Version | Author | Date |
|---|---|---|
| 3c2aa40 | Benjmain Fair | 2023-11-01 |
PSI.GAGT.Only %>%
pivot_longer(names_to = "leafcutter.name", values_to = "LeafcutterPSI", -c("junc")) %>%
left_join(FullMetadata) %>%
filter(Experiment %in% c("Dose response titration", "Exp090823")) %>%
mutate(doseInApproxRisdiscale = case_when(
treatment == "Risdiplam" ~ dose.nM,
treatment == "C2C5" ~ dose.nM * 10,
treatment == "Branaplam" ~ dose.nM * sqrt(10),
treatment == "DMSO" ~ 0.316,
TRUE ~ dose.nM * sqrt(10),
)) %>%
mutate(treatment = case_when(
treatment == "DMSO" & Experiment == "Dose response titration" ~ "Old.DMSO",
treatment == "DMSO" & Experiment == "Exp090823" ~ "New.DMSO",
TRUE ~ treatment
)) %>%
mutate(junc = str_replace(junc, "^(.+?):clu_.+$", "\\1")) %>%
sample_n_of(20, junc) %>%
left_join(
PreviouslyModeledJuncs %>%
mutate(junc = str_replace(junc, "^(.+?):clu_.+$", "\\1")) %>%
dplyr::select(junc, seq, gene_names),
) %>%
mutate(PosMinus4Minus3 = str_extract(seq, "^\\w{2}")) %>%
mutate(facetName = str_glue("{PosMinus4Minus3}, {gene_names}\n{junc}")) %>%
ggplot(aes(x=doseInApproxRisdiscale, y=LeafcutterPSI, group=interaction(rep, treatment), color=treatment)) +
geom_point() +
geom_line() +
scale_x_continuous(trans='log10') +
facet_wrap(~facetName, scales="free") +
# scale_color_manual(values=FullMetadata) +
Rotate_x_labels +
labs(x="nanomolar of risdiplam (or approx equivalent)", caption="FacetTitle:\nNN in NNGA|GT, gene\njunc coords") +
theme(legend.position='bottom')

| Version | Author | Date |
|---|---|---|
| 3c2aa40 | Benjmain Fair | 2023-11-01 |
PSI.GAGT.Only %>%
pivot_longer(names_to = "leafcutter.name", values_to = "LeafcutterPSI", -c("junc")) %>%
left_join(FullMetadata) %>%
filter(Experiment %in% c("Dose response titration", "Exp090823")) %>%
mutate(doseInApproxRisdiscale = case_when(
treatment == "Risdiplam" ~ dose.nM,
treatment == "C2C5" ~ dose.nM * 10,
treatment == "Branaplam" ~ dose.nM * sqrt(10),
treatment == "DMSO" ~ 0.316,
TRUE ~ dose.nM * sqrt(10),
)) %>%
mutate(treatment = case_when(
treatment == "DMSO" & Experiment == "Dose response titration" ~ "Old.DMSO",
treatment == "DMSO" & Experiment == "Exp090823" ~ "New.DMSO",
TRUE ~ treatment
)) %>%
mutate(junc = str_replace(junc, "^(.+?):clu_.+$", "\\1")) %>%
left_join(
PreviouslyModeledJuncs %>%
mutate(junc = str_replace(junc, "^(.+?):clu_.+$", "\\1")) %>%
dplyr::select(junc, seq, gene_names),
) %>%
filter(gene_names %in% c("HTT", "MYB", "FHOD1", "STAT1", "CRYL1", "DENND5A", "ATG5", "BTG2", "ZIK1", "HSD17B4","EIF2AK4", "SOS2", "ALG8", "MLLT10")) %>%
mutate(PosMinus4Minus3 = str_extract(seq, "^\\w{2}")) %>%
mutate(facetName = str_glue("{PosMinus4Minus3}, {gene_names}\n{junc}")) %>%
ggplot(aes(x=doseInApproxRisdiscale, y=LeafcutterPSI, group=interaction(rep, treatment), color=treatment)) +
geom_point() +
geom_line() +
scale_x_continuous(trans='log10') +
facet_wrap(~facetName, scales="free") +
# scale_color_manual(values=FullMetadata) +
Rotate_x_labels +
labs(x="nanomolar of risdiplam (or approx equivalent)", caption="FacetTitle:\nNN in NNGA|GT, gene\njunc coords") +
theme(legend.position='bottom')

| Version | Author | Date |
|---|---|---|
| 3c2aa40 | Benjmain Fair | 2023-11-01 |
I think to further process this data, one useful approach will be to fit a loglogistic model, with the upper limit, lower limit, and slope fixed, and then only EC50 may change… This is how I processed some of the branaplam/C2C5/risdiplam titration data… so it would make it easier to integrate results.
Perhaps the other way to make sense of the data is to look at the distribution of spearman correlatino of log2FC of introns, grouped by NNGA|GT motif…
I will accomplish these tasks in the snakemake or in some other notebooks…
sessionInfo()
R version 4.2.0 (2022-04-22)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)
Matrix products: default
BLAS/LAPACK: /software/openblas-0.3.13-el7-x86_64/lib/libopenblas_haswellp-r0.3.13.so
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=C
[4] LC_COLLATE=C LC_MONETARY=C LC_MESSAGES=C
[7] LC_PAPER=C LC_NAME=C LC_ADDRESS=C
[10] LC_TELEPHONE=C LC_MEASUREMENT=C LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] edgeR_3.38.4 limma_3.52.4 ggrepel_0.9.1 gplots_3.1.3
[5] data.table_1.14.2 RColorBrewer_1.1-3 forcats_0.5.1 stringr_1.4.0
[9] dplyr_1.0.9 purrr_0.3.4 readr_2.1.2 tidyr_1.2.0
[13] tibble_3.1.7 ggplot2_3.3.6 tidyverse_1.3.1
loaded via a namespace (and not attached):
[1] bitops_1.0-7 fs_1.5.2 lubridate_1.8.0 bit64_4.0.5
[5] httr_1.4.3 rprojroot_2.0.3 tools_4.2.0 backports_1.4.1
[9] bslib_0.3.1 utf8_1.2.2 R6_2.5.1 KernSmooth_2.23-20
[13] DBI_1.1.2 colorspace_2.0-3 withr_2.5.0 tidyselect_1.1.2
[17] bit_4.0.4 compiler_4.2.0 git2r_0.30.1 cli_3.3.0
[21] rvest_1.0.2 xml2_1.3.3 labeling_0.4.2 sass_0.4.1
[25] caTools_1.18.2 scales_1.2.0 digest_0.6.29 rmarkdown_2.14
[29] R.utils_2.11.0 pkgconfig_2.0.3 htmltools_0.5.2 dbplyr_2.1.1
[33] fastmap_1.1.0 highr_0.9 rlang_1.0.2 readxl_1.4.0
[37] rstudioapi_0.13 farver_2.1.0 jquerylib_0.1.4 generics_0.1.2
[41] jsonlite_1.8.0 gtools_3.9.2 vroom_1.5.7 R.oo_1.24.0
[45] magrittr_2.0.3 Rcpp_1.0.8.3 munsell_0.5.0 fansi_1.0.3
[49] lifecycle_1.0.1 R.methodsS3_1.8.1 stringi_1.7.6 whisker_0.4
[53] yaml_2.3.5 grid_4.2.0 parallel_4.2.0 promises_1.2.0.1
[57] crayon_1.5.1 lattice_0.20-45 haven_2.5.0 hms_1.1.1
[61] locfit_1.5-9.7 knitr_1.39 pillar_1.7.0 reprex_2.0.1
[65] glue_1.6.2 evaluate_0.15 modelr_0.1.8 vctrs_0.4.1
[69] tzdb_0.3.0 httpuv_1.6.5 cellranger_1.1.0 gtable_0.3.0
[73] assertthat_0.2.1 xfun_0.30 broom_0.8.0 later_1.3.0
[77] workflowr_1.7.0 ellipsis_0.3.2