Last updated: 2023-02-22
Checks: 5 2
Knit directory:
20211209_JingxinRNAseq/analysis/
This reproducible R Markdown analysis was created with workflowr (version 1.7.0). The Checks tab describes the reproducibility checks that were applied when the results were created. The Past versions tab lists the development history.
The R Markdown file has unstaged changes. To know which version of
the R Markdown file created these results, you’ll want to first commit
it to the Git repo. If you’re still working on the analysis, you can
ignore this warning. When you’re finished, you can run
wflow_publish to commit the R Markdown file and build the
HTML.
Great job! The global environment was empty. Objects defined in the global environment can affect the analysis in your R Markdown file in unknown ways. For reproduciblity it’s best to always run the code in an empty environment.
The command set.seed(19900924) was run prior to running
the code in the R Markdown file. Setting a seed ensures that any results
that rely on randomness, e.g. subsampling or permutations, are
reproducible.
Great job! Recording the operating system, R version, and package versions is critical for reproducibility.
Nice! There were no cached chunks for this analysis, so you can be confident that you successfully produced the results during this run.
Using absolute paths to the files within your workflowr project makes it difficult for you and others to run your code on a different machine. Change the absolute path(s) below to the suggested relative path(s) to make your code more reproducible.
| absolute | relative |
|---|---|
| /project2/yangili1/bjf79/20211209_JingxinRNAseq/code/bigwigs/unstranded/(.+?).bw | ../code/bigwigs/unstranded/(.+?).bw |
Great! You are using Git for version control. Tracking code development and connecting the code version to the results is critical for reproducibility.
The results in this page were generated with repository version 94cef29. See the Past versions tab to see a history of the changes made to the R Markdown and HTML files.
Note that you need to be careful to ensure that all relevant files for
the analysis have been committed to Git prior to generating the results
(you can use wflow_publish or
wflow_git_commit). workflowr only checks the R Markdown
file, but you know if there are other scripts or data files that it
depends on. Below is the status of the Git repository when the results
were generated:
Ignored files:
Ignored: .DS_Store
Ignored: .Rhistory
Ignored: .Rproj.user/
Ignored: analysis/.RData
Ignored: analysis/.Rhistory
Ignored: analysis/20220707_TitrationSeries_DE_testing.nb.html
Ignored: analysis/figure/
Ignored: code/.DS_Store
Ignored: code/._DOCK7.pdf
Ignored: code/._DOCK7_DMSO1.pdf
Ignored: code/._DOCK7_SM2_1.pdf
Ignored: code/._FKTN_DMSO_1.pdf
Ignored: code/._FKTN_SM2_1.pdf
Ignored: code/._MAPT.pdf
Ignored: code/._PKD1_DMSO_1.pdf
Ignored: code/._PKD1_SM2_1.pdf
Ignored: code/.snakemake/
Ignored: code/1KG_HighCoverageCalls.samplelist.txt
Ignored: code/5ssSeqs.tab
Ignored: code/Alignments/
Ignored: code/ChemCLIP/
Ignored: code/ClinVar/
Ignored: code/DE_testing/
Ignored: code/DE_tests.mat.counts.gz
Ignored: code/DE_tests.txt.gz
Ignored: code/DataNotToCommit/
Ignored: code/DoseResponseData/
Ignored: code/Fastq/
Ignored: code/FastqFastp/
Ignored: code/FragLenths/
Ignored: code/Meme/
Ignored: code/Multiqc/
Ignored: code/OMIM/
Ignored: code/OldBigWigs/
Ignored: code/PhyloP/
Ignored: code/QC/
Ignored: code/ReferenceGenomes/
Ignored: code/Session.vim
Ignored: code/SplicingAnalysis/
Ignored: code/TracksSession
Ignored: code/bigwigs/
Ignored: code/featureCounts/
Ignored: code/figs/
Ignored: code/geena/
Ignored: code/hg38ToMm39.over.chain.gz
Ignored: code/igv_session.template.xml
Ignored: code/igv_session.xml
Ignored: code/log
Ignored: code/logs/
Ignored: code/rstudio-server.job
Ignored: code/rules/.Process_ExpOf52.smk.swp
Ignored: code/rules/.RNASeqProcessing.smk.swp
Ignored: code/scratch/
Ignored: code/test.txt.gz
Ignored: code/testPlottingWithMyScript.ForJingxin.sh
Ignored: code/testPlottingWithMyScript.ForJingxin2.sh
Ignored: code/testPlottingWithMyScript.ForJingxin3.sh
Ignored: code/testPlottingWithMyScript.ForJingxin4.sh
Ignored: code/testPlottingWithMyScript.sh
Ignored: data/~$52CompoundsTempPlateLayoutForPipettingConvenience.xlsx
Ignored: output/._PioritizedIntronTargets.pdf
Unstaged changes:
Modified: analysis/20220221_ExploreExpOf52.Rmd
Modified: code/rules/Process_ExpOf52.smk
Note that any generated files, e.g. HTML, png, CSS, etc., are not included in this status report because it is ok for generated content to have uncommitted changes.
These are the previous versions of the repository in which changes were
made to the R Markdown
(analysis/20220221_ExploreExpOf52.Rmd) and HTML
(docs/20220221_ExploreExpOf52.html) files. If you’ve
configured a remote Git repository (see ?wflow_git_remote),
click on the hyperlinks in the table below to view the files as they
were in that past version.
| File | Version | Author | Date | Message |
|---|---|---|---|---|
| Rmd | 94cef29 | Benjmain Fair | 2023-02-22 | added Rmd for 52Exp |
knitr::opts_chunk$set(echo = TRUE, warning = F, message = F)
library(tidyverse)
── Attaching packages ─────────────────────────────────────── tidyverse 1.3.1 ──
✔ ggplot2 3.3.6 ✔ purrr 0.3.4
✔ tibble 3.1.7 ✔ dplyr 1.0.9
✔ tidyr 1.2.0 ✔ stringr 1.4.0
✔ readr 2.1.2 ✔ forcats 0.5.1
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag() masks stats::lag()
library(RColorBrewer)
library(data.table)
Attaching package: 'data.table'
The following objects are masked from 'package:dplyr':
between, first, last
The following object is masked from 'package:purrr':
transpose
library(edgeR)
Loading required package: limma
library(qvalue)
# library(Heatplus)
library(gplots)
Attaching package: 'gplots'
The following object is masked from 'package:stats':
lowess
# define some useful funcs
sample_n_of <- function(data, size, ...) {
dots <- quos(...)
group_ids <- data %>%
group_by(!!! dots) %>%
group_indices()
sampled_groups <- sample(unique(group_ids), size)
data %>%
filter(group_ids %in% sampled_groups)
}
# Set theme
theme_set(
theme_classic() +
theme(text=element_text(size=16, family="Helvetica")))
# I use layer a lot, to rotate long x-axis labels
Rotate_x_labels <- theme(axis.text.x = element_text(angle = 45, vjust = 1, hjust=1))
#test plot
ggplot(mtcars, aes(x=mpg, y=cyl)) +
geom_point()

Jingxin’s lab has synthesized (or maybe contracted out synthesis of) hundred(s) branaplam/risdiplam derivatives. They screened them for splice-modifying activity using a SMN2 minigene I believe, resulting in 52 compounds that they send Gabi and I for further experiments: We recieved the 52 compounds in a 96 well plate, and in subsequent plots we are naming them just by the well position they were shipped in (ie A01, A02, …, E04). The molar concentration of each molecule was determined by Jingxin’s lab and shipped at 500x concentration. I think molar concentration is based on EC90 of SMN2 splice modifying activity in their initial screen. We grew LCLs cells to a density of approximately 1M cells per mL, and split into seperate flasks each containing 7mL: 2 seperate replicate flasks for each molecule treatment, and 6 flasks of DMSO control (total of 110 samples), and applied 14uL of treatment. The following day, cells were collected by centrifugation, supernate was discarded and stored in 1mL of Trizol at -20C for futher processing. We extracted RNA by phase seperation (Trizol manufacturer’s instructions). The top phase was mixed with an equal volume of ethanol, and applied to Zymo5 nucleic-acid purification columns. After two washes, we performed on-column DNAse I treatment, followed by an extra application of binding buffer and two more washes, and finally elution in water. Qubit quantification was used to quantify RNA before using NEBNext ultra directional ii RNAseq kits with NEB’s polyA capture kit. There are a couple optional steps in the kit preparation: We fragmented the RNA for only5 minutes, and performed size selection (step 6.3) to preferentially obtain longer insert sizes. The libraries were intially sequenced on a MiSeq (~20M reads total), to determine optimal re-pooling volumes (aiming for equal representation of each sample) and to confirm long insert sizes to justify the 150+150 paired end sequencing that we ended up doing on the NovaSeq S4 (~10B reads total). In practice, I noticed that one sample had very very low number of reads, and will likely have to be dropped from analysis (see below). Also, sample pooling volumes weren’t as perfect as I would have hoped, but I think they are acceptable. see below.
In the remainder of this notebook, I will explore this data is ways similar to my previous notebooks regarding Jingxin’s RNAseq - analyses that I can do quickly. The pre-processing that I have done leading up to this notebook includes read trimming (fastp), alignment with STAR (I used single pass mode, but perhaps before publication it might be worth aligning all samples and re-aligning samples for STAR’s “two-pass” procedure for most sensitive identification of unannotated junctions), gene counting with featureCounts, differential expression analysis with edgeR (a separate contrast for each treatment, comparing to the same 6 DMSO replicates ), splicing quantification and differential testing with leafcutter (again, comparing each treatment to 6 DMSO replicates).
Let’s just consider protein coding genes, as written out in my reformatted featureCounts output, which I will read in here. I will occasionally read in data from previous experiments too for comparison.
# read in sample metadata
Metadata.ExpOf52 <- read_tsv("../code/config/samples.52MoleculeExperiment.tsv") %>%
mutate(cell.type = "LCL", libType="polyA", rep=BioRep, old.sample.name=SampleID, dose.nM=NA) %>%
mutate(treatment = if_else(Treatment=="DMSO", "DMSO", paste0("W", Treatment))) %>%
dplyr::select(treatment, cell.type, dose.nM, libType, rep, old.sample.name) %>%
mutate(SampleName = paste(treatment, dose.nM, cell.type, libType, rep, sep = "_"))
Metadata.PreviousExperiments <- read_tsv("../code/bigwigs/BigwigList.tsv",col_names = c("SampleName", "bigwig", "group", "strand")) %>%
filter(strand==".") %>%
dplyr::select(-strand) %>%
mutate(old.sample.name = str_replace(bigwig, "/project2/yangili1/bjf79/20211209_JingxinRNAseq/code/bigwigs/unstranded/(.+?).bw", "\\1")) %>%
separate(SampleName, into=c("treatment", "dose.nM", "cell.type", "libType", "rep"), convert=T, remove=F, sep="_") %>%
left_join(
read_tsv("../code/bigwigs/BigwigList.groups.tsv", col_names = c("group", "color", "bed", "supergroup")),
by="group"
)
FullMetadata <- bind_rows(Metadata.ExpOf52, Metadata.PreviousExperiments) %>%
mutate(Experiment = case_when(
cell.type == "Fibroblast" ~ "Single high dose fibroblast",
startsWith(old.sample.name, "TitrationExp") ~ "Dose response titration",
startsWith(old.sample.name, "chRNA") ~ "nascent RNA profiling",
startsWith(old.sample.name, "NewMolecule") ~ "Single high dose LCL"
)) %>%
mutate(color = case_when(
treatment == "DMSO" ~ "#969696",
Experiment == "Single high dose LCL" ~ "#252525",
TRUE ~ color
)) %>%
mutate(leafcutter.name = str_replace_all(old.sample.name, "-", "."))
gene.counts <- read_tsv("../code/DE_testing/ExpOf52_Counts.mat.txt.gz") %>%
column_to_rownames("Geneid") %>%
DGEList() %>%
calcNormFactors()
counts.plot.dat <- gene.counts$samples %>%
rownames_to_column("old.sample.name") %>%
inner_join(FullMetadata, by="old.sample.name") %>%
mutate(dose.nM = case_when(
treatment == "DMSO" ~ "NA",
cell.type == "Fibroblast" ~ "CC50 dose",
Experiment == "Single high dose LCL" ~ "SMN_EC90 dose",
TRUE ~ as.character(dose.nM)
)) %>%
mutate(label = dose.nM) %>%
arrange(Experiment, treatment, dose.nM)
counts.plot.labels <- counts.plot.dat %>%
dplyr::select(old.sample.name, label) %>% deframe()
counts.plot.dat %>%
filter(Experiment=="Single high dose LCL")
old.sample.name group.x lib.size norm.factors treatment cell.type
1 NewMolecule.E05-1 1 135983671 0.9401592 DMSO LCL
2 NewMolecule.E06-2 1 317779082 0.6907965 DMSO LCL
3 NewMolecule.E07-3 1 72297626 0.9693137 DMSO LCL
4 NewMolecule.E05-4 1 129001545 0.8532237 DMSO LCL
5 NewMolecule.E06-5 1 182905676 0.9465600 DMSO LCL
6 NewMolecule.E07-6 1 126772689 0.7745714 DMSO LCL
7 NewMolecule.A01-1 1 183728666 0.9614895 WA01 LCL
8 NewMolecule.A01-2 1 198632154 0.6962443 WA01 LCL
9 NewMolecule.A02-1 1 312182548 0.9826652 WA02 LCL
10 NewMolecule.A02-2 1 183395930 0.9897056 WA02 LCL
11 NewMolecule.A03-1 1 177704482 0.9400929 WA03 LCL
12 NewMolecule.A03-2 1 249934473 0.9561665 WA03 LCL
13 NewMolecule.A04-1 1 174653379 0.9222987 WA04 LCL
14 NewMolecule.A04-2 1 548033015 0.7589615 WA04 LCL
15 NewMolecule.A05-1 1 178435056 0.8614118 WA05 LCL
16 NewMolecule.A05-2 1 153251808 0.7788587 WA05 LCL
17 NewMolecule.A06-1 1 190563753 1.0080036 WA06 LCL
18 NewMolecule.A06-2 1 151314210 0.6000523 WA06 LCL
19 NewMolecule.A07-1 1 123503847 0.7953700 WA07 LCL
20 NewMolecule.A07-2 1 163742739 0.7842479 WA07 LCL
21 NewMolecule.A08-1 1 128532357 0.9945259 WA08 LCL
22 NewMolecule.A08-2 1 191110810 0.8467907 WA08 LCL
23 NewMolecule.A09-1 1 250617437 0.9637614 WA09 LCL
24 NewMolecule.A09-2 1 198092385 0.8870051 WA09 LCL
25 NewMolecule.A10-1 1 233224920 0.9467818 WA10 LCL
26 NewMolecule.A10-2 1 125874155 0.8707373 WA10 LCL
27 NewMolecule.A11-1 1 151421732 0.7724431 WA11 LCL
28 NewMolecule.A11-2 1 159203721 0.6239473 WA11 LCL
29 NewMolecule.A12-1 1 242979912 0.8638066 WA12 LCL
30 NewMolecule.A12-2 1 197958602 0.6814641 WA12 LCL
31 NewMolecule.B01-1 1 176245134 0.8257129 WB01 LCL
32 NewMolecule.B01-2 1 144824255 0.7994645 WB01 LCL
33 NewMolecule.B02-1 1 159560647 0.9525037 WB02 LCL
34 NewMolecule.B02-2 1 181332941 0.8442828 WB02 LCL
35 NewMolecule.B03-1 1 501440283 0.9212623 WB03 LCL
36 NewMolecule.B03-2 1 161562099 0.8031914 WB03 LCL
37 NewMolecule.B04-1 1 363641016 0.9203852 WB04 LCL
38 NewMolecule.B04-2 1 262007104 0.9186215 WB04 LCL
39 NewMolecule.B05-1 1 134091835 0.9803464 WB05 LCL
40 NewMolecule.B05-2 1 129596579 0.9316498 WB05 LCL
41 NewMolecule.B06-1 1 159743971 0.9776352 WB06 LCL
42 NewMolecule.B06-2 1 84886553 0.9970722 WB06 LCL
43 NewMolecule.B07-1 1 160357588 0.8486932 WB07 LCL
44 NewMolecule.B07-2 1 137933937 0.7173181 WB07 LCL
45 NewMolecule.B08-1 1 172609974 1.0109620 WB08 LCL
46 NewMolecule.B08-2 1 82813127 0.9109179 WB08 LCL
47 NewMolecule.B09-1 1 178767085 0.9536717 WB09 LCL
48 NewMolecule.B09-2 1 264104038 0.8378014 WB09 LCL
49 NewMolecule.B10-1 1 238413032 0.8317911 WB10 LCL
50 NewMolecule.B10-2 1 110987864 0.8812357 WB10 LCL
51 NewMolecule.B11-1 1 174607137 0.8259890 WB11 LCL
52 NewMolecule.B11-2 1 168465700 0.7937547 WB11 LCL
53 NewMolecule.B12-1 1 181271838 0.9129708 WB12 LCL
54 NewMolecule.B12-2 1 172779925 0.6964442 WB12 LCL
55 NewMolecule.C01-1 1 244342093 0.9665939 WC01 LCL
56 NewMolecule.C01-2 1 138058013 0.6168776 WC01 LCL
57 NewMolecule.C02-1 1 154946142 0.9809795 WC02 LCL
58 NewMolecule.C02-2 1 187610831 0.7841767 WC02 LCL
59 NewMolecule.C03-1 1 165446125 0.8161006 WC03 LCL
60 NewMolecule.C03-2 1 174266943 0.7833399 WC03 LCL
61 NewMolecule.C04-1 1 148019761 0.9639406 WC04 LCL
62 NewMolecule.C04-2 1 4025684 0.8837338 WC04 LCL
63 NewMolecule.C05-1 1 165455144 0.8796405 WC05 LCL
64 NewMolecule.C05-2 1 174718878 0.7735656 WC05 LCL
65 NewMolecule.C06-1 1 140277168 0.9390193 WC06 LCL
66 NewMolecule.C06-2 1 168324527 0.7628401 WC06 LCL
67 NewMolecule.C07-1 1 261153647 0.9823486 WC07 LCL
68 NewMolecule.C07-2 1 143163993 0.8017761 WC07 LCL
69 NewMolecule.C08-1 1 147676792 0.9694391 WC08 LCL
70 NewMolecule.C08-2 1 108832959 0.9102188 WC08 LCL
71 NewMolecule.C09-1 1 408007760 0.9806656 WC09 LCL
72 NewMolecule.C09-2 1 173089550 0.8327315 WC09 LCL
73 NewMolecule.C10-1 1 167604989 0.9559976 WC10 LCL
74 NewMolecule.C10-2 1 137455261 0.8389724 WC10 LCL
75 NewMolecule.C11-1 1 187805337 0.8355439 WC11 LCL
76 NewMolecule.C11-2 1 177066236 0.7906897 WC11 LCL
77 NewMolecule.C12-1 1 171294673 0.8092100 WC12 LCL
78 NewMolecule.C12-2 1 207161226 0.9439126 WC12 LCL
79 NewMolecule.D01-1 1 148474840 0.8973409 WD01 LCL
80 NewMolecule.D01-2 1 176919098 0.8387012 WD01 LCL
81 NewMolecule.D02-1 1 182860841 0.9258999 WD02 LCL
82 NewMolecule.D02-2 1 186819768 0.9105906 WD02 LCL
83 NewMolecule.D03-1 1 131745383 0.8897832 WD03 LCL
84 NewMolecule.D03-2 1 186120074 0.8657665 WD03 LCL
85 NewMolecule.D04-1 1 147445482 0.9188042 WD04 LCL
86 NewMolecule.D04-2 1 170662427 0.8789052 WD04 LCL
87 NewMolecule.D05-1 1 109693197 0.9849873 WD05 LCL
88 NewMolecule.D05-2 1 55187567 0.9279879 WD05 LCL
89 NewMolecule.D06-1 1 122174618 0.9893905 WD06 LCL
90 NewMolecule.D06-2 1 211272103 0.7724558 WD06 LCL
91 NewMolecule.D07-1 1 186303795 0.9672297 WD07 LCL
92 NewMolecule.D07-2 1 167009833 0.7753588 WD07 LCL
93 NewMolecule.D08-1 1 183462792 1.0004535 WD08 LCL
94 NewMolecule.D08-2 1 146479099 1.0033560 WD08 LCL
95 NewMolecule.D09-1 1 268002864 0.9646019 WD09 LCL
96 NewMolecule.D09-2 1 91945124 0.9215091 WD09 LCL
97 NewMolecule.D10-1 1 278370909 0.9590191 WD10 LCL
98 NewMolecule.D10-2 1 116412533 0.8802938 WD10 LCL
99 NewMolecule.D11-1 1 175024946 0.9210652 WD11 LCL
100 NewMolecule.D11-2 1 106706050 0.8053986 WD11 LCL
101 NewMolecule.D12-1 1 82466922 0.8892412 WD12 LCL
102 NewMolecule.D12-2 1 243296548 0.7777244 WD12 LCL
103 NewMolecule.E01-1 1 123440553 0.9716875 WE01 LCL
104 NewMolecule.E01-2 1 128461448 0.7058736 WE01 LCL
105 NewMolecule.E02-1 1 440121482 0.9524480 WE02 LCL
106 NewMolecule.E02-2 1 127160501 0.7928684 WE02 LCL
107 NewMolecule.E03-1 1 128754472 0.9062433 WE03 LCL
108 NewMolecule.E03-2 1 185922881 0.8547261 WE03 LCL
109 NewMolecule.E04-1 1 468869018 0.9580410 WE04 LCL
110 NewMolecule.E04-2 1 168588670 0.9200381 WE04 LCL
dose.nM libType rep SampleName bigwig group.y color bed
1 NA polyA 1 DMSO_NA_LCL_polyA_1 <NA> <NA> #969696 <NA>
2 NA polyA 2 DMSO_NA_LCL_polyA_2 <NA> <NA> #969696 <NA>
3 NA polyA 3 DMSO_NA_LCL_polyA_3 <NA> <NA> #969696 <NA>
4 NA polyA 4 DMSO_NA_LCL_polyA_4 <NA> <NA> #969696 <NA>
5 NA polyA 5 DMSO_NA_LCL_polyA_5 <NA> <NA> #969696 <NA>
6 NA polyA 6 DMSO_NA_LCL_polyA_6 <NA> <NA> #969696 <NA>
7 SMN_EC90 dose polyA 1 WA01_NA_LCL_polyA_1 <NA> <NA> #252525 <NA>
8 SMN_EC90 dose polyA 2 WA01_NA_LCL_polyA_2 <NA> <NA> #252525 <NA>
9 SMN_EC90 dose polyA 1 WA02_NA_LCL_polyA_1 <NA> <NA> #252525 <NA>
10 SMN_EC90 dose polyA 2 WA02_NA_LCL_polyA_2 <NA> <NA> #252525 <NA>
11 SMN_EC90 dose polyA 1 WA03_NA_LCL_polyA_1 <NA> <NA> #252525 <NA>
12 SMN_EC90 dose polyA 2 WA03_NA_LCL_polyA_2 <NA> <NA> #252525 <NA>
13 SMN_EC90 dose polyA 1 WA04_NA_LCL_polyA_1 <NA> <NA> #252525 <NA>
14 SMN_EC90 dose polyA 2 WA04_NA_LCL_polyA_2 <NA> <NA> #252525 <NA>
15 SMN_EC90 dose polyA 1 WA05_NA_LCL_polyA_1 <NA> <NA> #252525 <NA>
16 SMN_EC90 dose polyA 2 WA05_NA_LCL_polyA_2 <NA> <NA> #252525 <NA>
17 SMN_EC90 dose polyA 1 WA06_NA_LCL_polyA_1 <NA> <NA> #252525 <NA>
18 SMN_EC90 dose polyA 2 WA06_NA_LCL_polyA_2 <NA> <NA> #252525 <NA>
19 SMN_EC90 dose polyA 1 WA07_NA_LCL_polyA_1 <NA> <NA> #252525 <NA>
20 SMN_EC90 dose polyA 2 WA07_NA_LCL_polyA_2 <NA> <NA> #252525 <NA>
21 SMN_EC90 dose polyA 1 WA08_NA_LCL_polyA_1 <NA> <NA> #252525 <NA>
22 SMN_EC90 dose polyA 2 WA08_NA_LCL_polyA_2 <NA> <NA> #252525 <NA>
23 SMN_EC90 dose polyA 1 WA09_NA_LCL_polyA_1 <NA> <NA> #252525 <NA>
24 SMN_EC90 dose polyA 2 WA09_NA_LCL_polyA_2 <NA> <NA> #252525 <NA>
25 SMN_EC90 dose polyA 1 WA10_NA_LCL_polyA_1 <NA> <NA> #252525 <NA>
26 SMN_EC90 dose polyA 2 WA10_NA_LCL_polyA_2 <NA> <NA> #252525 <NA>
27 SMN_EC90 dose polyA 1 WA11_NA_LCL_polyA_1 <NA> <NA> #252525 <NA>
28 SMN_EC90 dose polyA 2 WA11_NA_LCL_polyA_2 <NA> <NA> #252525 <NA>
29 SMN_EC90 dose polyA 1 WA12_NA_LCL_polyA_1 <NA> <NA> #252525 <NA>
30 SMN_EC90 dose polyA 2 WA12_NA_LCL_polyA_2 <NA> <NA> #252525 <NA>
31 SMN_EC90 dose polyA 1 WB01_NA_LCL_polyA_1 <NA> <NA> #252525 <NA>
32 SMN_EC90 dose polyA 2 WB01_NA_LCL_polyA_2 <NA> <NA> #252525 <NA>
33 SMN_EC90 dose polyA 1 WB02_NA_LCL_polyA_1 <NA> <NA> #252525 <NA>
34 SMN_EC90 dose polyA 2 WB02_NA_LCL_polyA_2 <NA> <NA> #252525 <NA>
35 SMN_EC90 dose polyA 1 WB03_NA_LCL_polyA_1 <NA> <NA> #252525 <NA>
36 SMN_EC90 dose polyA 2 WB03_NA_LCL_polyA_2 <NA> <NA> #252525 <NA>
37 SMN_EC90 dose polyA 1 WB04_NA_LCL_polyA_1 <NA> <NA> #252525 <NA>
38 SMN_EC90 dose polyA 2 WB04_NA_LCL_polyA_2 <NA> <NA> #252525 <NA>
39 SMN_EC90 dose polyA 1 WB05_NA_LCL_polyA_1 <NA> <NA> #252525 <NA>
40 SMN_EC90 dose polyA 2 WB05_NA_LCL_polyA_2 <NA> <NA> #252525 <NA>
41 SMN_EC90 dose polyA 1 WB06_NA_LCL_polyA_1 <NA> <NA> #252525 <NA>
42 SMN_EC90 dose polyA 2 WB06_NA_LCL_polyA_2 <NA> <NA> #252525 <NA>
43 SMN_EC90 dose polyA 1 WB07_NA_LCL_polyA_1 <NA> <NA> #252525 <NA>
44 SMN_EC90 dose polyA 2 WB07_NA_LCL_polyA_2 <NA> <NA> #252525 <NA>
45 SMN_EC90 dose polyA 1 WB08_NA_LCL_polyA_1 <NA> <NA> #252525 <NA>
46 SMN_EC90 dose polyA 2 WB08_NA_LCL_polyA_2 <NA> <NA> #252525 <NA>
47 SMN_EC90 dose polyA 1 WB09_NA_LCL_polyA_1 <NA> <NA> #252525 <NA>
48 SMN_EC90 dose polyA 2 WB09_NA_LCL_polyA_2 <NA> <NA> #252525 <NA>
49 SMN_EC90 dose polyA 1 WB10_NA_LCL_polyA_1 <NA> <NA> #252525 <NA>
50 SMN_EC90 dose polyA 2 WB10_NA_LCL_polyA_2 <NA> <NA> #252525 <NA>
51 SMN_EC90 dose polyA 1 WB11_NA_LCL_polyA_1 <NA> <NA> #252525 <NA>
52 SMN_EC90 dose polyA 2 WB11_NA_LCL_polyA_2 <NA> <NA> #252525 <NA>
53 SMN_EC90 dose polyA 1 WB12_NA_LCL_polyA_1 <NA> <NA> #252525 <NA>
54 SMN_EC90 dose polyA 2 WB12_NA_LCL_polyA_2 <NA> <NA> #252525 <NA>
55 SMN_EC90 dose polyA 1 WC01_NA_LCL_polyA_1 <NA> <NA> #252525 <NA>
56 SMN_EC90 dose polyA 2 WC01_NA_LCL_polyA_2 <NA> <NA> #252525 <NA>
57 SMN_EC90 dose polyA 1 WC02_NA_LCL_polyA_1 <NA> <NA> #252525 <NA>
58 SMN_EC90 dose polyA 2 WC02_NA_LCL_polyA_2 <NA> <NA> #252525 <NA>
59 SMN_EC90 dose polyA 1 WC03_NA_LCL_polyA_1 <NA> <NA> #252525 <NA>
60 SMN_EC90 dose polyA 2 WC03_NA_LCL_polyA_2 <NA> <NA> #252525 <NA>
61 SMN_EC90 dose polyA 1 WC04_NA_LCL_polyA_1 <NA> <NA> #252525 <NA>
62 SMN_EC90 dose polyA 2 WC04_NA_LCL_polyA_2 <NA> <NA> #252525 <NA>
63 SMN_EC90 dose polyA 1 WC05_NA_LCL_polyA_1 <NA> <NA> #252525 <NA>
64 SMN_EC90 dose polyA 2 WC05_NA_LCL_polyA_2 <NA> <NA> #252525 <NA>
65 SMN_EC90 dose polyA 1 WC06_NA_LCL_polyA_1 <NA> <NA> #252525 <NA>
66 SMN_EC90 dose polyA 2 WC06_NA_LCL_polyA_2 <NA> <NA> #252525 <NA>
67 SMN_EC90 dose polyA 1 WC07_NA_LCL_polyA_1 <NA> <NA> #252525 <NA>
68 SMN_EC90 dose polyA 2 WC07_NA_LCL_polyA_2 <NA> <NA> #252525 <NA>
69 SMN_EC90 dose polyA 1 WC08_NA_LCL_polyA_1 <NA> <NA> #252525 <NA>
70 SMN_EC90 dose polyA 2 WC08_NA_LCL_polyA_2 <NA> <NA> #252525 <NA>
71 SMN_EC90 dose polyA 1 WC09_NA_LCL_polyA_1 <NA> <NA> #252525 <NA>
72 SMN_EC90 dose polyA 2 WC09_NA_LCL_polyA_2 <NA> <NA> #252525 <NA>
73 SMN_EC90 dose polyA 1 WC10_NA_LCL_polyA_1 <NA> <NA> #252525 <NA>
74 SMN_EC90 dose polyA 2 WC10_NA_LCL_polyA_2 <NA> <NA> #252525 <NA>
75 SMN_EC90 dose polyA 1 WC11_NA_LCL_polyA_1 <NA> <NA> #252525 <NA>
76 SMN_EC90 dose polyA 2 WC11_NA_LCL_polyA_2 <NA> <NA> #252525 <NA>
77 SMN_EC90 dose polyA 1 WC12_NA_LCL_polyA_1 <NA> <NA> #252525 <NA>
78 SMN_EC90 dose polyA 2 WC12_NA_LCL_polyA_2 <NA> <NA> #252525 <NA>
79 SMN_EC90 dose polyA 1 WD01_NA_LCL_polyA_1 <NA> <NA> #252525 <NA>
80 SMN_EC90 dose polyA 2 WD01_NA_LCL_polyA_2 <NA> <NA> #252525 <NA>
81 SMN_EC90 dose polyA 1 WD02_NA_LCL_polyA_1 <NA> <NA> #252525 <NA>
82 SMN_EC90 dose polyA 2 WD02_NA_LCL_polyA_2 <NA> <NA> #252525 <NA>
83 SMN_EC90 dose polyA 1 WD03_NA_LCL_polyA_1 <NA> <NA> #252525 <NA>
84 SMN_EC90 dose polyA 2 WD03_NA_LCL_polyA_2 <NA> <NA> #252525 <NA>
85 SMN_EC90 dose polyA 1 WD04_NA_LCL_polyA_1 <NA> <NA> #252525 <NA>
86 SMN_EC90 dose polyA 2 WD04_NA_LCL_polyA_2 <NA> <NA> #252525 <NA>
87 SMN_EC90 dose polyA 1 WD05_NA_LCL_polyA_1 <NA> <NA> #252525 <NA>
88 SMN_EC90 dose polyA 2 WD05_NA_LCL_polyA_2 <NA> <NA> #252525 <NA>
89 SMN_EC90 dose polyA 1 WD06_NA_LCL_polyA_1 <NA> <NA> #252525 <NA>
90 SMN_EC90 dose polyA 2 WD06_NA_LCL_polyA_2 <NA> <NA> #252525 <NA>
91 SMN_EC90 dose polyA 1 WD07_NA_LCL_polyA_1 <NA> <NA> #252525 <NA>
92 SMN_EC90 dose polyA 2 WD07_NA_LCL_polyA_2 <NA> <NA> #252525 <NA>
93 SMN_EC90 dose polyA 1 WD08_NA_LCL_polyA_1 <NA> <NA> #252525 <NA>
94 SMN_EC90 dose polyA 2 WD08_NA_LCL_polyA_2 <NA> <NA> #252525 <NA>
95 SMN_EC90 dose polyA 1 WD09_NA_LCL_polyA_1 <NA> <NA> #252525 <NA>
96 SMN_EC90 dose polyA 2 WD09_NA_LCL_polyA_2 <NA> <NA> #252525 <NA>
97 SMN_EC90 dose polyA 1 WD10_NA_LCL_polyA_1 <NA> <NA> #252525 <NA>
98 SMN_EC90 dose polyA 2 WD10_NA_LCL_polyA_2 <NA> <NA> #252525 <NA>
99 SMN_EC90 dose polyA 1 WD11_NA_LCL_polyA_1 <NA> <NA> #252525 <NA>
100 SMN_EC90 dose polyA 2 WD11_NA_LCL_polyA_2 <NA> <NA> #252525 <NA>
101 SMN_EC90 dose polyA 1 WD12_NA_LCL_polyA_1 <NA> <NA> #252525 <NA>
102 SMN_EC90 dose polyA 2 WD12_NA_LCL_polyA_2 <NA> <NA> #252525 <NA>
103 SMN_EC90 dose polyA 1 WE01_NA_LCL_polyA_1 <NA> <NA> #252525 <NA>
104 SMN_EC90 dose polyA 2 WE01_NA_LCL_polyA_2 <NA> <NA> #252525 <NA>
105 SMN_EC90 dose polyA 1 WE02_NA_LCL_polyA_1 <NA> <NA> #252525 <NA>
106 SMN_EC90 dose polyA 2 WE02_NA_LCL_polyA_2 <NA> <NA> #252525 <NA>
107 SMN_EC90 dose polyA 1 WE03_NA_LCL_polyA_1 <NA> <NA> #252525 <NA>
108 SMN_EC90 dose polyA 2 WE03_NA_LCL_polyA_2 <NA> <NA> #252525 <NA>
109 SMN_EC90 dose polyA 1 WE04_NA_LCL_polyA_1 <NA> <NA> #252525 <NA>
110 SMN_EC90 dose polyA 2 WE04_NA_LCL_polyA_2 <NA> <NA> #252525 <NA>
supergroup Experiment leafcutter.name label
1 <NA> Single high dose LCL NewMolecule.E05.1 NA
2 <NA> Single high dose LCL NewMolecule.E06.2 NA
3 <NA> Single high dose LCL NewMolecule.E07.3 NA
4 <NA> Single high dose LCL NewMolecule.E05.4 NA
5 <NA> Single high dose LCL NewMolecule.E06.5 NA
6 <NA> Single high dose LCL NewMolecule.E07.6 NA
7 <NA> Single high dose LCL NewMolecule.A01.1 SMN_EC90 dose
8 <NA> Single high dose LCL NewMolecule.A01.2 SMN_EC90 dose
9 <NA> Single high dose LCL NewMolecule.A02.1 SMN_EC90 dose
10 <NA> Single high dose LCL NewMolecule.A02.2 SMN_EC90 dose
11 <NA> Single high dose LCL NewMolecule.A03.1 SMN_EC90 dose
12 <NA> Single high dose LCL NewMolecule.A03.2 SMN_EC90 dose
13 <NA> Single high dose LCL NewMolecule.A04.1 SMN_EC90 dose
14 <NA> Single high dose LCL NewMolecule.A04.2 SMN_EC90 dose
15 <NA> Single high dose LCL NewMolecule.A05.1 SMN_EC90 dose
16 <NA> Single high dose LCL NewMolecule.A05.2 SMN_EC90 dose
17 <NA> Single high dose LCL NewMolecule.A06.1 SMN_EC90 dose
18 <NA> Single high dose LCL NewMolecule.A06.2 SMN_EC90 dose
19 <NA> Single high dose LCL NewMolecule.A07.1 SMN_EC90 dose
20 <NA> Single high dose LCL NewMolecule.A07.2 SMN_EC90 dose
21 <NA> Single high dose LCL NewMolecule.A08.1 SMN_EC90 dose
22 <NA> Single high dose LCL NewMolecule.A08.2 SMN_EC90 dose
23 <NA> Single high dose LCL NewMolecule.A09.1 SMN_EC90 dose
24 <NA> Single high dose LCL NewMolecule.A09.2 SMN_EC90 dose
25 <NA> Single high dose LCL NewMolecule.A10.1 SMN_EC90 dose
26 <NA> Single high dose LCL NewMolecule.A10.2 SMN_EC90 dose
27 <NA> Single high dose LCL NewMolecule.A11.1 SMN_EC90 dose
28 <NA> Single high dose LCL NewMolecule.A11.2 SMN_EC90 dose
29 <NA> Single high dose LCL NewMolecule.A12.1 SMN_EC90 dose
30 <NA> Single high dose LCL NewMolecule.A12.2 SMN_EC90 dose
31 <NA> Single high dose LCL NewMolecule.B01.1 SMN_EC90 dose
32 <NA> Single high dose LCL NewMolecule.B01.2 SMN_EC90 dose
33 <NA> Single high dose LCL NewMolecule.B02.1 SMN_EC90 dose
34 <NA> Single high dose LCL NewMolecule.B02.2 SMN_EC90 dose
35 <NA> Single high dose LCL NewMolecule.B03.1 SMN_EC90 dose
36 <NA> Single high dose LCL NewMolecule.B03.2 SMN_EC90 dose
37 <NA> Single high dose LCL NewMolecule.B04.1 SMN_EC90 dose
38 <NA> Single high dose LCL NewMolecule.B04.2 SMN_EC90 dose
39 <NA> Single high dose LCL NewMolecule.B05.1 SMN_EC90 dose
40 <NA> Single high dose LCL NewMolecule.B05.2 SMN_EC90 dose
41 <NA> Single high dose LCL NewMolecule.B06.1 SMN_EC90 dose
42 <NA> Single high dose LCL NewMolecule.B06.2 SMN_EC90 dose
43 <NA> Single high dose LCL NewMolecule.B07.1 SMN_EC90 dose
44 <NA> Single high dose LCL NewMolecule.B07.2 SMN_EC90 dose
45 <NA> Single high dose LCL NewMolecule.B08.1 SMN_EC90 dose
46 <NA> Single high dose LCL NewMolecule.B08.2 SMN_EC90 dose
47 <NA> Single high dose LCL NewMolecule.B09.1 SMN_EC90 dose
48 <NA> Single high dose LCL NewMolecule.B09.2 SMN_EC90 dose
49 <NA> Single high dose LCL NewMolecule.B10.1 SMN_EC90 dose
50 <NA> Single high dose LCL NewMolecule.B10.2 SMN_EC90 dose
51 <NA> Single high dose LCL NewMolecule.B11.1 SMN_EC90 dose
52 <NA> Single high dose LCL NewMolecule.B11.2 SMN_EC90 dose
53 <NA> Single high dose LCL NewMolecule.B12.1 SMN_EC90 dose
54 <NA> Single high dose LCL NewMolecule.B12.2 SMN_EC90 dose
55 <NA> Single high dose LCL NewMolecule.C01.1 SMN_EC90 dose
56 <NA> Single high dose LCL NewMolecule.C01.2 SMN_EC90 dose
57 <NA> Single high dose LCL NewMolecule.C02.1 SMN_EC90 dose
58 <NA> Single high dose LCL NewMolecule.C02.2 SMN_EC90 dose
59 <NA> Single high dose LCL NewMolecule.C03.1 SMN_EC90 dose
60 <NA> Single high dose LCL NewMolecule.C03.2 SMN_EC90 dose
61 <NA> Single high dose LCL NewMolecule.C04.1 SMN_EC90 dose
62 <NA> Single high dose LCL NewMolecule.C04.2 SMN_EC90 dose
63 <NA> Single high dose LCL NewMolecule.C05.1 SMN_EC90 dose
64 <NA> Single high dose LCL NewMolecule.C05.2 SMN_EC90 dose
65 <NA> Single high dose LCL NewMolecule.C06.1 SMN_EC90 dose
66 <NA> Single high dose LCL NewMolecule.C06.2 SMN_EC90 dose
67 <NA> Single high dose LCL NewMolecule.C07.1 SMN_EC90 dose
68 <NA> Single high dose LCL NewMolecule.C07.2 SMN_EC90 dose
69 <NA> Single high dose LCL NewMolecule.C08.1 SMN_EC90 dose
70 <NA> Single high dose LCL NewMolecule.C08.2 SMN_EC90 dose
71 <NA> Single high dose LCL NewMolecule.C09.1 SMN_EC90 dose
72 <NA> Single high dose LCL NewMolecule.C09.2 SMN_EC90 dose
73 <NA> Single high dose LCL NewMolecule.C10.1 SMN_EC90 dose
74 <NA> Single high dose LCL NewMolecule.C10.2 SMN_EC90 dose
75 <NA> Single high dose LCL NewMolecule.C11.1 SMN_EC90 dose
76 <NA> Single high dose LCL NewMolecule.C11.2 SMN_EC90 dose
77 <NA> Single high dose LCL NewMolecule.C12.1 SMN_EC90 dose
78 <NA> Single high dose LCL NewMolecule.C12.2 SMN_EC90 dose
79 <NA> Single high dose LCL NewMolecule.D01.1 SMN_EC90 dose
80 <NA> Single high dose LCL NewMolecule.D01.2 SMN_EC90 dose
81 <NA> Single high dose LCL NewMolecule.D02.1 SMN_EC90 dose
82 <NA> Single high dose LCL NewMolecule.D02.2 SMN_EC90 dose
83 <NA> Single high dose LCL NewMolecule.D03.1 SMN_EC90 dose
84 <NA> Single high dose LCL NewMolecule.D03.2 SMN_EC90 dose
85 <NA> Single high dose LCL NewMolecule.D04.1 SMN_EC90 dose
86 <NA> Single high dose LCL NewMolecule.D04.2 SMN_EC90 dose
87 <NA> Single high dose LCL NewMolecule.D05.1 SMN_EC90 dose
88 <NA> Single high dose LCL NewMolecule.D05.2 SMN_EC90 dose
89 <NA> Single high dose LCL NewMolecule.D06.1 SMN_EC90 dose
90 <NA> Single high dose LCL NewMolecule.D06.2 SMN_EC90 dose
91 <NA> Single high dose LCL NewMolecule.D07.1 SMN_EC90 dose
92 <NA> Single high dose LCL NewMolecule.D07.2 SMN_EC90 dose
93 <NA> Single high dose LCL NewMolecule.D08.1 SMN_EC90 dose
94 <NA> Single high dose LCL NewMolecule.D08.2 SMN_EC90 dose
95 <NA> Single high dose LCL NewMolecule.D09.1 SMN_EC90 dose
96 <NA> Single high dose LCL NewMolecule.D09.2 SMN_EC90 dose
97 <NA> Single high dose LCL NewMolecule.D10.1 SMN_EC90 dose
98 <NA> Single high dose LCL NewMolecule.D10.2 SMN_EC90 dose
99 <NA> Single high dose LCL NewMolecule.D11.1 SMN_EC90 dose
100 <NA> Single high dose LCL NewMolecule.D11.2 SMN_EC90 dose
101 <NA> Single high dose LCL NewMolecule.D12.1 SMN_EC90 dose
102 <NA> Single high dose LCL NewMolecule.D12.2 SMN_EC90 dose
103 <NA> Single high dose LCL NewMolecule.E01.1 SMN_EC90 dose
104 <NA> Single high dose LCL NewMolecule.E01.2 SMN_EC90 dose
105 <NA> Single high dose LCL NewMolecule.E02.1 SMN_EC90 dose
106 <NA> Single high dose LCL NewMolecule.E02.2 SMN_EC90 dose
107 <NA> Single high dose LCL NewMolecule.E03.1 SMN_EC90 dose
108 <NA> Single high dose LCL NewMolecule.E03.2 SMN_EC90 dose
109 <NA> Single high dose LCL NewMolecule.E04.1 SMN_EC90 dose
110 <NA> Single high dose LCL NewMolecule.E04.2 SMN_EC90 dose
ReadsPerDataset <- ggplot(counts.plot.dat, aes(x=old.sample.name, y=lib.size/2E6, fill=color)) +
geom_col() +
scale_fill_identity() +
scale_x_discrete(name="dose (nanomolar)", label=counts.plot.labels) +
scale_y_continuous(expand=c(0,0)) +
theme(axis.text.x = element_text(angle = 45, vjust = 1, hjust=1, size=3)) +
theme(strip.text.x = element_text(size = 8)) +
facet_grid(~Experiment, scales = "free_x", space = "free_x", labeller = label_wrap_gen(15)) +
labs(title="RNA-seq datasets", y="Read count (M)")
ReadsPerDataset

So for the most part, these libraries were sequenced deeper than any of our previous experiments with Jingxin. Let’s make note of that one outlier sample that was clearly not sequenced well. This sample will probably have to be excluded from further analysis…
counts.plot.dat %>%
arrange(lib.size) %>%
head(1)
old.sample.name group.x lib.size norm.factors treatment cell.type
1 NewMolecule.C04-2 1 4025684 0.8837338 WC04 LCL
dose.nM libType rep SampleName bigwig group.y color bed
1 SMN_EC90 dose polyA 2 WC04_NA_LCL_polyA_2 <NA> <NA> #252525 <NA>
supergroup Experiment leafcutter.name label
1 <NA> Single high dose LCL NewMolecule.C04.2 SMN_EC90 dose
Ok so rep2 of the molecule in well C04 will probably have to be excluded. 4M reads isn’t enough to get much out of, and I sort of worry whether those 4M reads are even from the corresponding library versus barcode contamination from other samples on the lane…
First let’s perform PCA with all samples, including fibroblasts. I know this isn’t necessarily the most interpretable, but I am just curious what the first few PCs will look like…
SamplesToInclude <- FullMetadata %>%
pull(old.sample.name)
CPM <- gene.counts %>%
cpm(log=T, prior.count=T) %>%
as.data.frame() %>%
rownames_to_column("Geneid") %>%
dplyr::select(Geneid, all_of(SamplesToInclude))
Top14K_ExpressedGenes <- (CPM %>%
column_to_rownames("Geneid") %>%
apply(1, mean) %>%
sort(decreasing=T))[1:14000] %>%
names()
pca.results <- CPM %>%
filter(Geneid %in% Top14K_ExpressedGenes) %>%
column_to_rownames("Geneid") %>%
scale() %>% t() %>% prcomp(scale=T)
summary(pca.results)
Importance of components:
PC1 PC2 PC3 PC4 PC5 PC6
Standard deviation 72.6439 62.2561 42.4165 30.06520 18.23340 14.95023
Proportion of Variance 0.3769 0.2768 0.1285 0.06457 0.02375 0.01596
Cumulative Proportion 0.3769 0.6538 0.7823 0.84686 0.87061 0.88657
PC7 PC8 PC9 PC10 PC11 PC12
Standard deviation 12.10732 11.51993 10.52554 8.64035 8.05303 7.52829
Proportion of Variance 0.01047 0.00948 0.00791 0.00533 0.00463 0.00405
Cumulative Proportion 0.89704 0.90652 0.91443 0.91977 0.92440 0.92845
PC13 PC14 PC15 PC16 PC17 PC18 PC19
Standard deviation 7.43240 6.07546 5.90323 5.71783 5.45541 4.70763 4.5829
Proportion of Variance 0.00395 0.00264 0.00249 0.00234 0.00213 0.00158 0.0015
Cumulative Proportion 0.93239 0.93503 0.93752 0.93985 0.94198 0.94356 0.9451
PC20 PC21 PC22 PC23 PC24 PC25 PC26
Standard deviation 4.34842 4.2642 4.18983 4.05111 3.70776 3.66428 3.52860
Proportion of Variance 0.00135 0.0013 0.00125 0.00117 0.00098 0.00096 0.00089
Cumulative Proportion 0.94641 0.9477 0.94897 0.95014 0.95112 0.95208 0.95297
PC27 PC28 PC29 PC30 PC31 PC32 PC33
Standard deviation 3.46988 3.42074 3.30835 3.25718 3.20878 3.19963 3.1289
Proportion of Variance 0.00086 0.00084 0.00078 0.00076 0.00074 0.00073 0.0007
Cumulative Proportion 0.95383 0.95466 0.95545 0.95620 0.95694 0.95767 0.9584
PC34 PC35 PC36 PC37 PC38 PC39 PC40
Standard deviation 3.09510 3.07249 3.04599 3.02764 2.96070 2.91136 2.87777
Proportion of Variance 0.00068 0.00067 0.00066 0.00065 0.00063 0.00061 0.00059
Cumulative Proportion 0.95905 0.95973 0.96039 0.96105 0.96167 0.96228 0.96287
PC41 PC42 PC43 PC44 PC45 PC46 PC47
Standard deviation 2.85418 2.81059 2.76061 2.75270 2.74102 2.70975 2.66896
Proportion of Variance 0.00058 0.00056 0.00054 0.00054 0.00054 0.00052 0.00051
Cumulative Proportion 0.96345 0.96402 0.96456 0.96510 0.96564 0.96616 0.96667
PC48 PC49 PC50 PC51 PC52 PC53 PC54
Standard deviation 2.6481 2.62696 2.59712 2.56767 2.55087 2.53274 2.49007
Proportion of Variance 0.0005 0.00049 0.00048 0.00047 0.00046 0.00046 0.00044
Cumulative Proportion 0.9672 0.96767 0.96815 0.96862 0.96908 0.96954 0.96998
PC55 PC56 PC57 PC58 PC59 PC60 PC61
Standard deviation 2.48213 2.47583 2.46033 2.44854 2.41859 2.40764 2.38683
Proportion of Variance 0.00044 0.00044 0.00043 0.00043 0.00042 0.00041 0.00041
Cumulative Proportion 0.97042 0.97086 0.97129 0.97172 0.97214 0.97255 0.97296
PC62 PC63 PC64 PC65 PC66 PC67 PC68
Standard deviation 2.3676 2.34748 2.33774 2.32323 2.30991 2.29735 2.28203
Proportion of Variance 0.0004 0.00039 0.00039 0.00039 0.00038 0.00038 0.00037
Cumulative Proportion 0.9734 0.97375 0.97415 0.97453 0.97491 0.97529 0.97566
PC69 PC70 PC71 PC72 PC73 PC74 PC75
Standard deviation 2.26430 2.25797 2.25197 2.23527 2.21472 2.20971 2.19590
Proportion of Variance 0.00037 0.00036 0.00036 0.00036 0.00035 0.00035 0.00034
Cumulative Proportion 0.97603 0.97639 0.97675 0.97711 0.97746 0.97781 0.97815
PC76 PC77 PC78 PC79 PC80 PC81 PC82
Standard deviation 2.19124 2.16918 2.15949 2.14600 2.13137 2.11768 2.10960
Proportion of Variance 0.00034 0.00034 0.00033 0.00033 0.00032 0.00032 0.00032
Cumulative Proportion 0.97850 0.97883 0.97917 0.97950 0.97982 0.98014 0.98046
PC83 PC84 PC85 PC86 PC87 PC88 PC89
Standard deviation 2.10420 2.09067 2.07932 2.07158 2.06890 2.0552 2.0510
Proportion of Variance 0.00032 0.00031 0.00031 0.00031 0.00031 0.0003 0.0003
Cumulative Proportion 0.98077 0.98109 0.98140 0.98170 0.98201 0.9823 0.9826
PC90 PC91 PC92 PC93 PC94 PC95 PC96
Standard deviation 2.0325 2.02628 2.02095 2.01505 2.01453 2.00140 1.99207
Proportion of Variance 0.0003 0.00029 0.00029 0.00029 0.00029 0.00029 0.00028
Cumulative Proportion 0.9829 0.98320 0.98349 0.98378 0.98407 0.98436 0.98464
PC97 PC98 PC99 PC100 PC101 PC102 PC103
Standard deviation 1.97879 1.96978 1.96142 1.95843 1.94605 1.93103 1.92697
Proportion of Variance 0.00028 0.00028 0.00027 0.00027 0.00027 0.00027 0.00027
Cumulative Proportion 0.98492 0.98520 0.98547 0.98574 0.98602 0.98628 0.98655
PC104 PC105 PC106 PC107 PC108 PC109 PC110
Standard deviation 1.91522 1.90976 1.90458 1.90223 1.89465 1.89031 1.88708
Proportion of Variance 0.00026 0.00026 0.00026 0.00026 0.00026 0.00026 0.00025
Cumulative Proportion 0.98681 0.98707 0.98733 0.98759 0.98784 0.98810 0.98835
PC111 PC112 PC113 PC114 PC115 PC116 PC117
Standard deviation 1.86688 1.86214 1.85424 1.85032 1.83938 1.82714 1.82318
Proportion of Variance 0.00025 0.00025 0.00025 0.00024 0.00024 0.00024 0.00024
Cumulative Proportion 0.98860 0.98885 0.98910 0.98934 0.98958 0.98982 0.99006
PC118 PC119 PC120 PC121 PC122 PC123 PC124
Standard deviation 1.80790 1.80509 1.79744 1.78928 1.77623 1.77575 1.76304
Proportion of Variance 0.00023 0.00023 0.00023 0.00023 0.00023 0.00023 0.00022
Cumulative Proportion 0.99029 0.99052 0.99075 0.99098 0.99121 0.99143 0.99166
PC125 PC126 PC127 PC128 PC129 PC130 PC131
Standard deviation 1.75581 1.74527 1.74072 1.73710 1.72480 1.71605 1.71063
Proportion of Variance 0.00022 0.00022 0.00022 0.00022 0.00021 0.00021 0.00021
Cumulative Proportion 0.99188 0.99209 0.99231 0.99253 0.99274 0.99295 0.99316
PC132 PC133 PC134 PC135 PC136 PC137 PC138
Standard deviation 1.70481 1.69772 1.6895 1.6757 1.6641 1.6591 1.64981
Proportion of Variance 0.00021 0.00021 0.0002 0.0002 0.0002 0.0002 0.00019
Cumulative Proportion 0.99336 0.99357 0.9938 0.9940 0.9942 0.9944 0.99456
PC139 PC140 PC141 PC142 PC143 PC144 PC145
Standard deviation 1.64177 1.63931 1.62764 1.61481 1.61010 1.60438 1.59635
Proportion of Variance 0.00019 0.00019 0.00019 0.00019 0.00019 0.00018 0.00018
Cumulative Proportion 0.99476 0.99495 0.99514 0.99532 0.99551 0.99569 0.99587
PC146 PC147 PC148 PC149 PC150 PC151 PC152
Standard deviation 1.59185 1.58410 1.57662 1.56689 1.55334 1.55064 1.54392
Proportion of Variance 0.00018 0.00018 0.00018 0.00018 0.00017 0.00017 0.00017
Cumulative Proportion 0.99606 0.99624 0.99641 0.99659 0.99676 0.99693 0.99710
PC153 PC154 PC155 PC156 PC157 PC158 PC159
Standard deviation 1.52924 1.52064 1.51619 1.50377 1.49740 1.48478 1.47533
Proportion of Variance 0.00017 0.00017 0.00016 0.00016 0.00016 0.00016 0.00016
Cumulative Proportion 0.99727 0.99743 0.99760 0.99776 0.99792 0.99808 0.99823
PC160 PC161 PC162 PC163 PC164 PC165 PC166
Standard deviation 1.46910 1.45457 1.44128 1.42765 1.41422 1.39923 1.38427
Proportion of Variance 0.00015 0.00015 0.00015 0.00015 0.00014 0.00014 0.00014
Cumulative Proportion 0.99839 0.99854 0.99869 0.99883 0.99898 0.99912 0.99925
PC167 PC168 PC169 PC170 PC171 PC172
Standard deviation 1.37542 1.37181 1.35146 1.31545 1.27766 1.22693
Proportion of Variance 0.00014 0.00013 0.00013 0.00012 0.00012 0.00011
Cumulative Proportion 0.99939 0.99952 0.99965 0.99978 0.99989 1.00000
PC173
Standard deviation 1.839e-14
Proportion of Variance 0.000e+00
Cumulative Proportion 1.000e+00
pca.results.to.plot <- pca.results$x %>%
as.data.frame() %>%
rownames_to_column("old.sample.name") %>%
dplyr::select(old.sample.name, PC1:PC6) %>%
left_join(FullMetadata, by="old.sample.name")
ggplot(pca.results.to.plot, aes(x=PC1, y=PC2, color=color, shape=Experiment)) +
geom_point(size=3) +
scale_color_identity() +
theme_bw() +
labs(title = "PCA using top 14K genes\nAll samples")

ggplot(pca.results.to.plot, aes(x=PC3, y=PC4, color=color, shape=Experiment)) +
geom_point(size=3) +
scale_color_identity() +
theme_bw() +
labs(title = "PCA using top 14K genes\nAll samples")

ggplot(pca.results.to.plot, aes(x=PC5, y=PC6, color=color, shape=Experiment)) +
geom_point(size=3) +
scale_color_identity() +
theme_bw() +
labs(title = "PCA using top 14K genes\nAll samples")

Let’s repeat but just include the dose titration experiment samples and this recent experiment of 52 molecules…
SamplesToInclude <- FullMetadata %>%
filter(Experiment %in% c("Single high dose LCL", "Dose response titration")) %>%
pull(old.sample.name)
CPM <- gene.counts %>%
cpm(log=T, prior.count=T) %>%
as.data.frame() %>%
rownames_to_column("Geneid") %>%
dplyr::select(Geneid, all_of(SamplesToInclude))
Top14K_ExpressedGenes <- (CPM %>%
column_to_rownames("Geneid") %>%
apply(1, mean) %>%
sort(decreasing=T))[1:14000] %>%
names()
pca.results <- CPM %>%
filter(Geneid %in% Top14K_ExpressedGenes) %>%
column_to_rownames("Geneid") %>%
scale() %>% t() %>% prcomp(scale=T)
summary(pca.results)
Importance of components:
PC1 PC2 PC3 PC4 PC5 PC6
Standard deviation 83.1894 55.0561 24.77276 24.28695 19.94844 14.70160
Proportion of Variance 0.4943 0.2165 0.04383 0.04213 0.02842 0.01544
Cumulative Proportion 0.4943 0.7108 0.75467 0.79680 0.82522 0.84066
PC7 PC8 PC9 PC10 PC11 PC12
Standard deviation 13.00748 11.7139 10.86657 9.32844 8.28935 7.53691
Proportion of Variance 0.01209 0.0098 0.00843 0.00622 0.00491 0.00406
Cumulative Proportion 0.85275 0.8626 0.87098 0.87720 0.88211 0.88616
PC13 PC14 PC15 PC16 PC17 PC18 PC19
Standard deviation 7.24574 6.92730 6.29126 5.97731 5.69311 5.48199 5.2932
Proportion of Variance 0.00375 0.00343 0.00283 0.00255 0.00232 0.00215 0.0020
Cumulative Proportion 0.88991 0.89334 0.89617 0.89872 0.90104 0.90318 0.9052
PC20 PC21 PC22 PC23 PC24 PC25 PC26
Standard deviation 5.04198 4.97192 4.91997 4.79399 4.69313 4.61243 4.55639
Proportion of Variance 0.00182 0.00177 0.00173 0.00164 0.00157 0.00152 0.00148
Cumulative Proportion 0.90700 0.90877 0.91050 0.91214 0.91371 0.91523 0.91671
PC27 PC28 PC29 PC30 PC31 PC32 PC33
Standard deviation 4.54571 4.46480 4.4291 4.38057 4.30252 4.24036 4.19862
Proportion of Variance 0.00148 0.00142 0.0014 0.00137 0.00132 0.00128 0.00126
Cumulative Proportion 0.91819 0.91961 0.9210 0.92238 0.92371 0.92499 0.92625
PC34 PC35 PC36 PC37 PC38 PC39 PC40
Standard deviation 4.17229 4.12458 4.0955 4.07422 4.06222 3.98905 3.96266
Proportion of Variance 0.00124 0.00122 0.0012 0.00119 0.00118 0.00114 0.00112
Cumulative Proportion 0.92749 0.92871 0.9299 0.93109 0.93227 0.93341 0.93453
PC41 PC42 PC43 PC44 PC45 PC46 PC47
Standard deviation 3.9299 3.89363 3.89227 3.82138 3.81024 3.77689 3.75136
Proportion of Variance 0.0011 0.00108 0.00108 0.00104 0.00104 0.00102 0.00101
Cumulative Proportion 0.9356 0.93672 0.93780 0.93884 0.93988 0.94090 0.94190
PC48 PC49 PC50 PC51 PC52 PC53 PC54
Standard deviation 3.72866 3.72341 3.69228 3.66300 3.65323 3.62103 3.60460
Proportion of Variance 0.00099 0.00099 0.00097 0.00096 0.00095 0.00094 0.00093
Cumulative Proportion 0.94289 0.94388 0.94486 0.94582 0.94677 0.94771 0.94864
PC55 PC56 PC57 PC58 PC59 PC60 PC61
Standard deviation 3.58833 3.57787 3.5416 3.51746 3.50854 3.48229 3.46762
Proportion of Variance 0.00092 0.00091 0.0009 0.00088 0.00088 0.00087 0.00086
Cumulative Proportion 0.94955 0.95047 0.9514 0.95225 0.95313 0.95399 0.95485
PC62 PC63 PC64 PC65 PC66 PC67 PC68
Standard deviation 3.45692 3.43346 3.42045 3.39272 3.38289 3.36342 3.3474
Proportion of Variance 0.00085 0.00084 0.00084 0.00082 0.00082 0.00081 0.0008
Cumulative Proportion 0.95571 0.95655 0.95738 0.95821 0.95902 0.95983 0.9606
PC69 PC70 PC71 PC72 PC73 PC74 PC75
Standard deviation 3.32739 3.31967 3.30262 3.28465 3.28166 3.25750 3.23148
Proportion of Variance 0.00079 0.00079 0.00078 0.00077 0.00077 0.00076 0.00075
Cumulative Proportion 0.96142 0.96221 0.96299 0.96376 0.96453 0.96529 0.96603
PC76 PC77 PC78 PC79 PC80 PC81 PC82
Standard deviation 3.22940 3.21138 3.19718 3.18893 3.17672 3.16148 3.14200
Proportion of Variance 0.00074 0.00074 0.00073 0.00073 0.00072 0.00071 0.00071
Cumulative Proportion 0.96678 0.96751 0.96825 0.96897 0.96969 0.97041 0.97111
PC83 PC84 PC85 PC86 PC87 PC88 PC89
Standard deviation 3.1396 3.1281 3.11272 3.09873 3.07685 3.03933 3.03570
Proportion of Variance 0.0007 0.0007 0.00069 0.00069 0.00068 0.00066 0.00066
Cumulative Proportion 0.9718 0.9725 0.97321 0.97389 0.97457 0.97523 0.97589
PC90 PC91 PC92 PC93 PC94 PC95 PC96
Standard deviation 3.02075 3.01671 2.99872 2.98609 2.97603 2.95891 2.94129
Proportion of Variance 0.00065 0.00065 0.00064 0.00064 0.00063 0.00063 0.00062
Cumulative Proportion 0.97654 0.97719 0.97783 0.97847 0.97910 0.97973 0.98034
PC97 PC98 PC99 PC100 PC101 PC102 PC103
Standard deviation 2.91912 2.91648 2.8908 2.87791 2.86451 2.84156 2.83717
Proportion of Variance 0.00061 0.00061 0.0006 0.00059 0.00059 0.00058 0.00057
Cumulative Proportion 0.98095 0.98156 0.9822 0.98275 0.98333 0.98391 0.98449
PC104 PC105 PC106 PC107 PC108 PC109 PC110
Standard deviation 2.82141 2.79989 2.77712 2.76955 2.75960 2.74038 2.73493
Proportion of Variance 0.00057 0.00056 0.00055 0.00055 0.00054 0.00054 0.00053
Cumulative Proportion 0.98505 0.98561 0.98617 0.98671 0.98726 0.98779 0.98833
PC111 PC112 PC113 PC114 PC115 PC116 PC117
Standard deviation 2.72667 2.71699 2.69161 2.67425 2.67144 2.6408 2.6329
Proportion of Variance 0.00053 0.00053 0.00052 0.00051 0.00051 0.0005 0.0005
Cumulative Proportion 0.98886 0.98939 0.98990 0.99041 0.99092 0.9914 0.9919
PC118 PC119 PC120 PC121 PC122 PC123 PC124
Standard deviation 2.62226 2.60273 2.57230 2.56833 2.53999 2.53069 2.50099
Proportion of Variance 0.00049 0.00048 0.00047 0.00047 0.00046 0.00046 0.00045
Cumulative Proportion 0.99241 0.99289 0.99337 0.99384 0.99430 0.99475 0.99520
PC125 PC126 PC127 PC128 PC129 PC130 PC131
Standard deviation 2.48410 2.47870 2.46989 2.41798 2.40858 2.39057 2.3799
Proportion of Variance 0.00044 0.00044 0.00044 0.00042 0.00041 0.00041 0.0004
Cumulative Proportion 0.99564 0.99608 0.99652 0.99693 0.99735 0.99776 0.9982
PC132 PC133 PC134 PC135 PC136 PC137
Standard deviation 2.3681 2.33293 2.28534 2.21770 2.13166 3.325e-14
Proportion of Variance 0.0004 0.00039 0.00037 0.00035 0.00032 0.000e+00
Cumulative Proportion 0.9986 0.99895 0.99932 0.99968 1.00000 1.000e+00
pca.results.to.plot <- pca.results$x %>%
as.data.frame() %>%
rownames_to_column("old.sample.name") %>%
dplyr::select(old.sample.name, PC1:PC6) %>%
left_join(FullMetadata, by="old.sample.name")
ggplot(pca.results.to.plot, aes(x=PC1, y=PC2, color=color, shape=Experiment)) +
geom_point(size=3) +
scale_color_identity() +
theme_bw() +
labs(title = "PCA using top 14K genes\nTitration experiment and ExpOf52")

ggplot(pca.results.to.plot, aes(x=PC3, y=PC4, color=color, shape=Experiment)) +
geom_point(size=3) +
scale_color_identity() +
theme_bw() +
labs(title = "PCA using top 14K genes\nTitration experiment and ExpOf52")

ggplot(pca.results.to.plot, aes(x=PC5, y=PC6, color=color, shape=Experiment)) +
geom_point(size=3) +
scale_color_identity() +
theme_bw() +
labs(title = "PCA using top 14K genes\nTitration experiment and ExpOf52")

pca.results.to.plot %>%
filter(PC5>200)
old.sample.name PC1 PC2 PC3 PC4 PC5 PC6
1 NewMolecule.C04-2 78.41428 41.81794 7.094914 17.13685 217.0802 48.79238
treatment cell.type dose.nM libType rep SampleName bigwig group
1 WC04 LCL NA polyA 2 WC04_NA_LCL_polyA_2 <NA> <NA>
color bed supergroup Experiment leafcutter.name
1 #252525 <NA> <NA> Single high dose LCL NewMolecule.C04.2
Ok… some interesting patterns. That one outlier in PC5 is the sample with low read depth that I want to exclude. It seems like the lab/batch effects are quite strong, the first two PCs have nothing to do with dose. (Lighter gray points are the DMSO controls in both experiments).. Maybe PC3 is related to this? But it’s not going to be so straightforward to integrate this experiment of 52 data with the previous dose titration experiment also done in LCLs. Well at least previously when I perform PCA based on splicing junction excision ratios these batch and tissue effects seem to dissappear in the PCA, so maybe those splicing comparisons might be more fair without more careful integration. Now let’s try PCA again but now just looking at the samples in this recent experiment of 52 molecules, to check at least that the DMSO samples cluster together…
SamplesToInclude <- FullMetadata %>%
filter(Experiment %in% c("Single high dose LCL")) %>%
filter(!old.sample.name == "NewMolecule.C04-2") %>%
pull(old.sample.name)
CPM <- gene.counts %>%
cpm(log=T, prior.count=T) %>%
as.data.frame() %>%
rownames_to_column("Geneid") %>%
dplyr::select(Geneid, all_of(SamplesToInclude))
Top14K_ExpressedGenes <- (CPM %>%
column_to_rownames("Geneid") %>%
apply(1, mean) %>%
sort(decreasing=T))[1:14000] %>%
names()
pca.results <- CPM %>%
filter(Geneid %in% Top14K_ExpressedGenes) %>%
column_to_rownames("Geneid") %>%
scale() %>% t() %>% prcomp(scale=T)
summary(pca.results)
Importance of components:
PC1 PC2 PC3 PC4 PC5 PC6
Standard deviation 90.959 35.44498 25.33393 18.9319 16.70679 13.71600
Proportion of Variance 0.591 0.08974 0.04584 0.0256 0.01994 0.01344
Cumulative Proportion 0.591 0.68071 0.72655 0.7521 0.77209 0.78553
PC7 PC8 PC9 PC10 PC11 PC12
Standard deviation 13.2829 11.50160 10.55358 9.41743 8.79989 8.34584
Proportion of Variance 0.0126 0.00945 0.00796 0.00633 0.00553 0.00498
Cumulative Proportion 0.7981 0.80758 0.81553 0.82187 0.82740 0.83238
PC13 PC14 PC15 PC16 PC17 PC18 PC19
Standard deviation 8.08116 7.48997 7.28621 7.0990 6.92833 6.66918 6.61082
Proportion of Variance 0.00466 0.00401 0.00379 0.0036 0.00343 0.00318 0.00312
Cumulative Proportion 0.83704 0.84105 0.84484 0.8484 0.85187 0.85505 0.85817
PC20 PC21 PC22 PC23 PC24 PC25 PC26
Standard deviation 6.51037 6.40642 6.33377 6.27073 6.12692 6.07714 5.94719
Proportion of Variance 0.00303 0.00293 0.00287 0.00281 0.00268 0.00264 0.00253
Cumulative Proportion 0.86119 0.86413 0.86699 0.86980 0.87248 0.87512 0.87765
PC27 PC28 PC29 PC30 PC31 PC32 PC33
Standard deviation 5.87452 5.82229 5.78341 5.75898 5.68407 5.60317 5.5522
Proportion of Variance 0.00247 0.00242 0.00239 0.00237 0.00231 0.00224 0.0022
Cumulative Proportion 0.88011 0.88253 0.88492 0.88729 0.88960 0.89184 0.8940
PC34 PC35 PC36 PC37 PC38 PC39 PC40
Standard deviation 5.53457 5.40381 5.40268 5.35630 5.33855 5.27676 5.24385
Proportion of Variance 0.00219 0.00209 0.00208 0.00205 0.00204 0.00199 0.00196
Cumulative Proportion 0.89623 0.89832 0.90040 0.90245 0.90449 0.90647 0.90844
PC41 PC42 PC43 PC44 PC45 PC46 PC47
Standard deviation 5.21234 5.20952 5.18056 5.13364 5.12455 5.08510 5.05057
Proportion of Variance 0.00194 0.00194 0.00192 0.00188 0.00188 0.00185 0.00182
Cumulative Proportion 0.91038 0.91232 0.91424 0.91612 0.91799 0.91984 0.92166
PC48 PC49 PC50 PC51 PC52 PC53 PC54
Standard deviation 5.02761 4.96764 4.91994 4.90446 4.85925 4.81311 4.80058
Proportion of Variance 0.00181 0.00176 0.00173 0.00172 0.00169 0.00165 0.00165
Cumulative Proportion 0.92347 0.92523 0.92696 0.92868 0.93036 0.93202 0.93367
PC55 PC56 PC57 PC58 PC59 PC60 PC61
Standard deviation 4.78097 4.76876 4.74355 4.70452 4.67266 4.66706 4.64767
Proportion of Variance 0.00163 0.00162 0.00161 0.00158 0.00156 0.00156 0.00154
Cumulative Proportion 0.93530 0.93692 0.93853 0.94011 0.94167 0.94323 0.94477
PC62 PC63 PC64 PC65 PC66 PC67 PC68
Standard deviation 4.63563 4.59928 4.56367 4.54641 4.53988 4.52220 4.49649
Proportion of Variance 0.00153 0.00151 0.00149 0.00148 0.00147 0.00146 0.00144
Cumulative Proportion 0.94630 0.94781 0.94930 0.95078 0.95225 0.95371 0.95516
PC69 PC70 PC71 PC72 PC73 PC74 PC75
Standard deviation 4.48580 4.46001 4.43564 4.4250 4.39833 4.35304 4.33560
Proportion of Variance 0.00144 0.00142 0.00141 0.0014 0.00138 0.00135 0.00134
Cumulative Proportion 0.95659 0.95801 0.95942 0.9608 0.96220 0.96355 0.96490
PC76 PC77 PC78 PC79 PC80 PC81 PC82
Standard deviation 4.28467 4.2674 4.2613 4.23460 4.21085 4.19227 4.17845
Proportion of Variance 0.00131 0.0013 0.0013 0.00128 0.00127 0.00126 0.00125
Cumulative Proportion 0.96621 0.9675 0.9688 0.97009 0.97135 0.97261 0.97385
PC83 PC84 PC85 PC86 PC87 PC88 PC89
Standard deviation 4.14368 4.14020 4.1010 4.07135 4.05284 3.99964 3.95561
Proportion of Variance 0.00123 0.00122 0.0012 0.00118 0.00117 0.00114 0.00112
Cumulative Proportion 0.97508 0.97631 0.9775 0.97869 0.97986 0.98101 0.98212
PC90 PC91 PC92 PC93 PC94 PC95 PC96
Standard deviation 3.93752 3.90476 3.89645 3.86114 3.84045 3.82990 3.81073
Proportion of Variance 0.00111 0.00109 0.00108 0.00106 0.00105 0.00105 0.00104
Cumulative Proportion 0.98323 0.98432 0.98541 0.98647 0.98752 0.98857 0.98961
PC97 PC98 PC99 PC100 PC101 PC102 PC103
Standard deviation 3.75910 3.71712 3.68479 3.61557 3.5502 3.51461 3.48465
Proportion of Variance 0.00101 0.00099 0.00097 0.00093 0.0009 0.00088 0.00087
Cumulative Proportion 0.99062 0.99160 0.99257 0.99351 0.9944 0.99529 0.99616
PC104 PC105 PC106 PC107 PC108 PC109
Standard deviation 3.44532 3.39627 3.26300 3.17868 3.10257 4.622e-14
Proportion of Variance 0.00085 0.00082 0.00076 0.00072 0.00069 0.000e+00
Cumulative Proportion 0.99701 0.99783 0.99859 0.99931 1.00000 1.000e+00
pca.results.to.plot <- pca.results$x %>%
as.data.frame() %>%
rownames_to_column("old.sample.name") %>%
dplyr::select(old.sample.name, PC1:PC6) %>%
left_join(FullMetadata, by="old.sample.name")
ggplot(pca.results.to.plot, aes(x=PC1, y=PC2, color=color, shape=Experiment)) +
geom_text(size=3, aes(label=treatment)) +
scale_color_identity() +
theme_bw() +
labs(title = "PCA using top 14K genes\nExpOf52")

ggplot(pca.results.to.plot, aes(x=PC3, y=PC4, color=color, shape=Experiment)) +
geom_text(size=3, aes(label=treatment)) +
scale_color_identity() +
theme_bw() +
labs(title = "PCA using top 14K genes\nExpOf52")

ggplot(pca.results.to.plot, aes(x=PC5, y=PC6, color=color, shape=Experiment)) +
geom_text(size=3, aes(label=treatment)) +
scale_color_identity() +
theme_bw() +
labs(title = "PCA using top 14K genes\nExpOf52")

Ok, even though I am looking at only the first 6 PCs, it seems pretty clear that the DMSO samples don’t cleanly cluster together. Perhaps some of these molecules weren’t very active.
Now just for curiosity, let’s just include the fibroblast and the dose titration experiment.
SamplesToInclude <- FullMetadata %>%
filter(Experiment %in% c("Dose response titration", "Single high dose fibroblast")) %>%
pull(old.sample.name)
CPM <- gene.counts %>%
cpm(log=T, prior.count=T) %>%
as.data.frame() %>%
rownames_to_column("Geneid") %>%
dplyr::select(Geneid, all_of(SamplesToInclude))
Top14K_ExpressedGenes <- (CPM %>%
column_to_rownames("Geneid") %>%
apply(1, mean) %>%
sort(decreasing=T))[1:14000] %>%
names()
pca.results <- CPM %>%
filter(Geneid %in% Top14K_ExpressedGenes) %>%
column_to_rownames("Geneid") %>%
scale() %>% t() %>% prcomp(scale=T)
summary(pca.results)
Importance of components:
PC1 PC2 PC3 PC4 PC5 PC6
Standard deviation 95.5166 43.8246 29.44168 20.92836 16.18447 14.81627
Proportion of Variance 0.6517 0.1372 0.06192 0.03129 0.01871 0.01568
Cumulative Proportion 0.6517 0.7889 0.85077 0.88206 0.90077 0.91645
PC7 PC8 PC9 PC10 PC11 PC12 PC13
Standard deviation 12.69624 10.41019 9.97714 8.53642 8.01215 6.54003 6.1490
Proportion of Variance 0.01151 0.00774 0.00711 0.00521 0.00459 0.00306 0.0027
Cumulative Proportion 0.92796 0.93570 0.94281 0.94802 0.95260 0.95566 0.9584
PC14 PC15 PC16 PC17 PC18 PC19 PC20
Standard deviation 5.81417 5.60050 5.26490 5.17174 5.10546 5.04332 4.95929
Proportion of Variance 0.00241 0.00224 0.00198 0.00191 0.00186 0.00182 0.00176
Cumulative Proportion 0.96078 0.96302 0.96500 0.96691 0.96877 0.97058 0.97234
PC21 PC22 PC23 PC24 PC25 PC26 PC27
Standard deviation 4.92047 4.86882 4.79843 4.69269 4.66221 4.60350 4.56525
Proportion of Variance 0.00173 0.00169 0.00164 0.00157 0.00155 0.00151 0.00149
Cumulative Proportion 0.97407 0.97576 0.97741 0.97898 0.98053 0.98205 0.98354
PC28 PC29 PC30 PC31 PC32 PC33 PC34
Standard deviation 4.38148 4.37470 4.29248 4.2711 4.13309 4.1052 4.06510
Proportion of Variance 0.00137 0.00137 0.00132 0.0013 0.00122 0.0012 0.00118
Cumulative Proportion 0.98491 0.98627 0.98759 0.9889 0.99011 0.9913 0.99250
PC35 PC36 PC37 PC38 PC39 PC40 PC41
Standard deviation 4.04592 3.98561 3.94879 3.89471 3.83228 3.76342 3.62769
Proportion of Variance 0.00117 0.00113 0.00111 0.00108 0.00105 0.00101 0.00094
Cumulative Proportion 0.99367 0.99480 0.99592 0.99700 0.99805 0.99906 1.00000
PC42
Standard deviation 4.242e-14
Proportion of Variance 0.000e+00
Cumulative Proportion 1.000e+00
pca.results.to.plot <- pca.results$x %>%
as.data.frame() %>%
rownames_to_column("old.sample.name") %>%
dplyr::select(old.sample.name, PC1:PC6) %>%
left_join(FullMetadata, by="old.sample.name")
ggplot(pca.results.to.plot, aes(x=PC1, y=PC2, color=color, shape=Experiment)) +
geom_point(size=3) +
scale_color_identity() +
theme_bw() +
labs(title = "PCA using top 14K genes\nTitrationExp and FibroblastExp")

ggplot(pca.results.to.plot, aes(x=PC3, y=PC4, color=color, shape=Experiment)) +
geom_point(size=3) +
scale_color_identity() +
theme_bw() +
labs(title = "PCA using top 14K genes\nTitrationExp and FibroblastExp")

ggplot(pca.results.to.plot, aes(x=PC5, y=PC6, color=color, shape=Experiment)) +
geom_point(size=3) +
scale_color_identity() +
theme_bw() +
labs(title = "PCA using top 14K genes\nTitrationExp and FibroblastExp")

Wow, so in that, the second PC seems to pretty obviously reflect dose. I am quite surprised we don’t see something so obvious with the new samples… Somewhat concerning. Of course, perhaps it’s not too surprising considering this experiment just has so many more samples, only a few of which are DMSO, that maybe I shouldn’t expect some dose effect in the first few PCs… One other thing I have done that was helpful last time was to perform PCA on the titration series then project new samples in the same PC space. I’ll try that later…
Now Let’s check for some splicing effects. Maybe that will be more illuminating…
Let’s start by checking for global enrichment of GA|GT juncs… If this doesn’t show an effect I’ll be very skeptical about the usefulness of this data.
leafcutter.sig <- Sys.glob("../code/SplicingAnalysis/leafcutter/differential_splicing/ExpOf52_*_cluster_significance.txt") %>%
setNames(str_replace(., "../code/SplicingAnalysis/leafcutter/differential_splicing/ExpOf52_(.+?)_cluster_significance.txt", "W\\1")) %>%
lapply(fread) %>%
bind_rows(.id="treatment")
leafcutter.effects <- Sys.glob("../code/SplicingAnalysis/leafcutter/differential_splicing/ExpOf52_*_effect_sizes.txt") %>%
setNames(str_replace(., "../code/SplicingAnalysis/leafcutter/differential_splicing/ExpOf52_(.+?)_effect_sizes.txt", "W\\1")) %>%
lapply(fread, col.names=c("intron", "logef", "treatment_PSI", "DMSO_PSI", "deltapsi")) %>%
bind_rows(.id="treatment") %>%
mutate(name = str_replace(intron, "(.+?):(.+?):(.+?):clu.+([+-])", "\\1_\\2_\\3_\\4")) %>%
mutate(deltapsi = deltapsi * -1, logef=logef*-1) %>%
mutate(cluster = str_replace(intron, "(.+?):.+?:.+?:(clu.+[+-])", "\\1:\\2"))
Introns <- read_tsv("../code/SplicingAnalysis/FullSpliceSiteAnnotations/JuncfilesMerged.annotated.basic.bed.5ss.tab.gz", col_names = c("name", "seq", "score")) %>%
separate(name, into=c("name", "pos"), sep = "::")
Introns.annotations <- read_tsv("../code/SplicingAnalysis/FullSpliceSiteAnnotations/JuncfilesMerged.annotated.basic.bed.gz") %>%
mutate(name = paste(chrom, start, end, strand, sep = "_")) %>%
left_join(Introns, by="name")
leafcutter.effects %>%
inner_join(
Introns.annotations %>%
mutate(IsGAGT = if_else(str_detect(seq, "^\\w\\wGAGT"), "GA-GT", "not GA-GT")),
by="name") %>%
# count(IsGAGT, treatment)
ggplot(aes(x=logef, group=treatment)) +
stat_ecdf() +
coord_cartesian(xlim=c(-2,2)) +
facet_wrap(~IsGAGT) +
theme_bw() +
labs(title="Distribution of log(FC) for all tested introns\nExpOf52", y="ecdf", x="Splicing log(FC)")

Ok so I see some subtle general up-regulation of GA-GT introns… That’s reassuring. But the degree to which all samples have about the same general effect size is surprising to me… I expected some of the compounds to be less active… Even though we normalized concentration based on SMN2 minigene EC90, I suspected a bit of noise in that estimate anyway. Let’s compare this to the analogous plot from the previous fibroblast dataset where I also did differential splicing with leafcutter to estimate splicing log(FC)…
leafcutter.effects.fibroblasts <- list.files(path="../code/SplicingAnalysis/leafcutter/differential_splicing", pattern="^[A-Z]+[0-9]*_effect_sizes.txt", full.names=T) %>%
setNames(str_replace(., "../code/SplicingAnalysis/leafcutter/differential_splicing/(.+?)_effect_sizes.txt", "\\1")) %>%
lapply(fread, col.names=c("intron", "logef", "treatment_PSI", "DMSO_PSI", "deltapsi")) %>%
bind_rows(.id="treatment") %>%
mutate(name = str_replace(intron, "(.+?):(.+?):(.+?):clu.+([+-])", "\\1_\\2_\\3_\\4")) %>%
mutate(deltapsi = deltapsi * -1, logef=logef*-1) %>%
mutate(cluster = str_replace(intron, "(.+?):.+?:.+?:(clu.+[+-])", "\\1:\\2"))
leafcutter.sig.fibroblasts <-list.files(path="../code/SplicingAnalysis/leafcutter/differential_splicing", pattern="^[A-Z]+[0-9]*_cluster_significance.txt", full.names=T) %>%
setNames(str_replace(., "../code/SplicingAnalysis/leafcutter/differential_splicing/(.+?)_cluster_significance.txt", "\\1")) %>%
lapply(fread) %>%
bind_rows(.id="treatment")
Fibroblast.colors <- FullMetadata %>%
filter(cell.type=="Fibroblast") %>%
distinct(treatment, .keep_all=T) %>%
dplyr::select(treatment, color) %>%
deframe()
leafcutter.effects.fibroblasts %>%
inner_join(
Introns.annotations %>%
mutate(IsGAGT = if_else(str_detect(seq, "^\\w\\wGAGT"), "GA-GT", "not GA-GT")),
by="name") %>%
# count(IsGAGT, treatment)
ggplot(aes(x=logef, color=treatment)) +
stat_ecdf() +
scale_color_manual(values=Fibroblast.colors) +
coord_cartesian(xlim=c(-2,2)) +
facet_wrap(~IsGAGT) +
theme_bw() +
labs(title="Distribution of log(FC) for all tested introns\nFibroblast Exp", y="ecdf", x="Splicing log(FC)")
Ok, so maybe the degree of these 52 treatments are roughly on the order
of the CUOME treatment in fibroblast data. Not too strong. I’m still not
totally sure this is even real…
Ok, so now let’s doing PCA just using GA|GT juncs… I think this should really work. Let’s start by including all samples. If I don’t really see DMSO control pop out from the rest, I will be a bit worried.
Actually before doing PCA, to get a broad overview of results, let’s calculate the correlation coefficient across samples based on splicing. For sake of quickly getting an overview, and picking reasonable GA-GT introns (that aren’t too noisy to measure) let’s just consider the 500ish GA-GT introns I previously modelled. These are a pretty ‘good’ set of differentially spliced GA-GT introns based on risdiplam,branaplam,C2C5…
#Read in splice junction count table
PSI <- fread("../code/SplicingAnalysis/leafcutter_all_samples/leafcutter_perind_numers.bed.gz", sep = '\t' ) %>%
dplyr::select(-"NewMolecule.C04.2")
Modelled.introns <- read_tsv("../output/EC50Estimtes.FromPSI.txt.gz") %>%
dplyr::select(`#Chrom`, start, end, strand=strand.y)
colnames(PSI)
[1] "#Chrom" "start" "end"
[4] "junc" "gid" "strand"
[7] "A10.1" "A10.2" "A10.3"
[10] "C2.1" "C2.2" "C2.3"
[13] "CUOME.1" "CUOME.2" "CUOME.3"
[16] "DMSO.1" "DMSO.2" "DMSO.3"
[19] "SM2.1" "SM2.2" "SM2.3"
[22] "TitrationExpBran.1" "TitrationExpBran.2" "TitrationExpBran.3"
[25] "TitrationExpBran.4" "TitrationExpBran.5" "TitrationExpBran.6"
[28] "TitrationExpBran.7" "TitrationExpBran.8" "TitrationExpC2C5.1"
[31] "TitrationExpC2C5.2" "TitrationExpC2C5.3" "TitrationExpC2C5.4"
[34] "TitrationExpC2C5.5" "TitrationExpC2C5.6" "TitrationExpC2C5.7"
[37] "TitrationExpC2C5.8" "TitrationExpDMSO.1" "TitrationExpDMSO.2"
[40] "TitrationExpDMSO.3" "TitrationExpRis.1" "TitrationExpRis.2"
[43] "TitrationExpRis.3" "TitrationExpRis.4" "TitrationExpRis.5"
[46] "TitrationExpRis.6" "TitrationExpRis.7" "TitrationExpRis.8"
[49] "chRNA_1" "chRNA_2" "chRNA_3"
[52] "chRNA_4" "chRNA_5" "chRNA_6"
[55] "chRNA_7" "chRNA_8" "chRNA_9"
[58] "chRNA_10" "chRNA_11" "chRNA_12"
[61] "chRNA_13" "chRNA_14" "chRNA_15"
[64] "chRNA_16" "chRNA_17" "chRNA_18"
[67] "chRNA_19" "chRNA_20" "chRNA_21"
[70] "NewMolecule.A01.1" "NewMolecule.A02.1" "NewMolecule.A03.1"
[73] "NewMolecule.A04.1" "NewMolecule.A05.1" "NewMolecule.A06.1"
[76] "NewMolecule.A07.1" "NewMolecule.A08.1" "NewMolecule.A09.1"
[79] "NewMolecule.A10.1" "NewMolecule.A11.1" "NewMolecule.A12.1"
[82] "NewMolecule.B01.1" "NewMolecule.B02.1" "NewMolecule.B03.1"
[85] "NewMolecule.B04.1" "NewMolecule.B05.1" "NewMolecule.B06.1"
[88] "NewMolecule.B07.1" "NewMolecule.B08.1" "NewMolecule.B09.1"
[91] "NewMolecule.B10.1" "NewMolecule.B11.1" "NewMolecule.B12.1"
[94] "NewMolecule.C01.1" "NewMolecule.C02.1" "NewMolecule.C03.1"
[97] "NewMolecule.C04.1" "NewMolecule.C05.1" "NewMolecule.C06.1"
[100] "NewMolecule.C07.1" "NewMolecule.C08.1" "NewMolecule.C09.1"
[103] "NewMolecule.C10.1" "NewMolecule.C11.1" "NewMolecule.C12.1"
[106] "NewMolecule.D01.1" "NewMolecule.D02.1" "NewMolecule.D03.1"
[109] "NewMolecule.D04.1" "NewMolecule.D05.1" "NewMolecule.D06.1"
[112] "NewMolecule.D07.1" "NewMolecule.D08.1" "NewMolecule.D09.1"
[115] "NewMolecule.D10.1" "NewMolecule.D11.1" "NewMolecule.D12.1"
[118] "NewMolecule.E01.1" "NewMolecule.E02.1" "NewMolecule.E03.1"
[121] "NewMolecule.E04.1" "NewMolecule.E05.1" "NewMolecule.E06.2"
[124] "NewMolecule.E07.3" "NewMolecule.A01.2" "NewMolecule.A02.2"
[127] "NewMolecule.A03.2" "NewMolecule.A04.2" "NewMolecule.A05.2"
[130] "NewMolecule.A06.2" "NewMolecule.A07.2" "NewMolecule.A08.2"
[133] "NewMolecule.A09.2" "NewMolecule.A10.2" "NewMolecule.A11.2"
[136] "NewMolecule.A12.2" "NewMolecule.B01.2" "NewMolecule.B02.2"
[139] "NewMolecule.B03.2" "NewMolecule.B04.2" "NewMolecule.B05.2"
[142] "NewMolecule.B06.2" "NewMolecule.B07.2" "NewMolecule.B08.2"
[145] "NewMolecule.B09.2" "NewMolecule.B10.2" "NewMolecule.B11.2"
[148] "NewMolecule.B12.2" "NewMolecule.C01.2" "NewMolecule.C02.2"
[151] "NewMolecule.C03.2" "NewMolecule.C05.2" "NewMolecule.C06.2"
[154] "NewMolecule.C07.2" "NewMolecule.C08.2" "NewMolecule.C09.2"
[157] "NewMolecule.C10.2" "NewMolecule.C11.2" "NewMolecule.C12.2"
[160] "NewMolecule.D01.2" "NewMolecule.D02.2" "NewMolecule.D03.2"
[163] "NewMolecule.D04.2" "NewMolecule.D05.2" "NewMolecule.D06.2"
[166] "NewMolecule.D07.2" "NewMolecule.D08.2" "NewMolecule.D09.2"
[169] "NewMolecule.D10.2" "NewMolecule.D11.2" "NewMolecule.D12.2"
[172] "NewMolecule.E01.2" "NewMolecule.E02.2" "NewMolecule.E03.2"
[175] "NewMolecule.E04.2" "NewMolecule.E05.4" "NewMolecule.E06.5"
[178] "NewMolecule.E07.6"
PSI.GAGT.Only <- PSI %>%
inner_join(Modelled.introns) %>%
dplyr::select(junc, A10.1:NewMolecule.E07.6) %>%
drop_na()
PSI.GAGT.Only.cormat <- PSI.GAGT.Only %>%
column_to_rownames("junc") %>%
cor(method='s')
heatmap.2(PSI.GAGT.Only.cormat, trace='none')

And now let’s do PCA based on these introns…
pca.results <- PSI.GAGT.Only %>%
column_to_rownames("junc") %>%
scale() %>% t() %>% prcomp(scale=T)
summary(pca.results)
Importance of components:
PC1 PC2 PC3 PC4 PC5 PC6 PC7
Standard deviation 17.8068 10.8573 8.17051 7.30330 5.95972 5.17196 4.98289
Proportion of Variance 0.2909 0.1081 0.06125 0.04893 0.03259 0.02454 0.02278
Cumulative Proportion 0.2909 0.3991 0.46029 0.50923 0.54181 0.56635 0.58913
PC8 PC9 PC10 PC11 PC12 PC13 PC14
Standard deviation 4.65906 4.15658 4.07768 3.91171 3.76570 3.7066 3.51788
Proportion of Variance 0.01991 0.01585 0.01525 0.01404 0.01301 0.0126 0.01135
Cumulative Proportion 0.60905 0.62490 0.64015 0.65419 0.66720 0.6798 0.69116
PC15 PC16 PC17 PC18 PC19 PC20 PC21
Standard deviation 3.35281 3.25740 3.16906 3.12639 3.09122 3.02097 3.00494
Proportion of Variance 0.01031 0.00973 0.00921 0.00897 0.00877 0.00837 0.00828
Cumulative Proportion 0.70147 0.71121 0.72042 0.72939 0.73815 0.74653 0.75481
PC22 PC23 PC24 PC25 PC26 PC27 PC28
Standard deviation 2.91815 2.83474 2.79655 2.72807 2.67897 2.57623 2.56316
Proportion of Variance 0.00781 0.00737 0.00717 0.00683 0.00658 0.00609 0.00603
Cumulative Proportion 0.76262 0.77000 0.77717 0.78400 0.79058 0.79667 0.80270
PC29 PC30 PC31 PC32 PC33 PC34 PC35
Standard deviation 2.4933 2.42762 2.38281 2.32648 2.22686 2.14178 2.12131
Proportion of Variance 0.0057 0.00541 0.00521 0.00497 0.00455 0.00421 0.00413
Cumulative Proportion 0.8084 0.81381 0.81902 0.82398 0.82853 0.83274 0.83687
PC36 PC37 PC38 PC39 PC40 PC41 PC42
Standard deviation 2.11105 2.07792 2.04445 1.99915 1.96822 1.95962 1.8972
Proportion of Variance 0.00409 0.00396 0.00383 0.00367 0.00355 0.00352 0.0033
Cumulative Proportion 0.84096 0.84492 0.84876 0.85242 0.85598 0.85950 0.8628
PC43 PC44 PC45 PC46 PC47 PC48 PC49
Standard deviation 1.87969 1.8689 1.82108 1.8094 1.77040 1.72243 1.70644
Proportion of Variance 0.00324 0.0032 0.00304 0.0030 0.00288 0.00272 0.00267
Cumulative Proportion 0.86604 0.8692 0.87229 0.8753 0.87817 0.88089 0.88356
PC50 PC51 PC52 PC53 PC54 PC55 PC56
Standard deviation 1.70530 1.65973 1.63623 1.60037 1.59820 1.58595 1.5491
Proportion of Variance 0.00267 0.00253 0.00246 0.00235 0.00234 0.00231 0.0022
Cumulative Proportion 0.88623 0.88876 0.89121 0.89356 0.89591 0.89821 0.9004
PC57 PC58 PC59 PC60 PC61 PC62 PC63
Standard deviation 1.52812 1.50637 1.49055 1.46312 1.44493 1.4382 1.41790
Proportion of Variance 0.00214 0.00208 0.00204 0.00196 0.00192 0.0019 0.00184
Cumulative Proportion 0.90256 0.90464 0.90668 0.90864 0.91056 0.9124 0.91430
PC64 PC65 PC66 PC67 PC68 PC69 PC70
Standard deviation 1.40272 1.38000 1.3627 1.35727 1.33489 1.32463 1.30675
Proportion of Variance 0.00181 0.00175 0.0017 0.00169 0.00163 0.00161 0.00157
Cumulative Proportion 0.91610 0.91785 0.9196 0.92125 0.92288 0.92449 0.92606
PC71 PC72 PC73 PC74 PC75 PC76 PC77
Standard deviation 1.29953 1.29335 1.28311 1.26960 1.25385 1.24148 1.23204
Proportion of Variance 0.00155 0.00153 0.00151 0.00148 0.00144 0.00141 0.00139
Cumulative Proportion 0.92761 0.92914 0.93065 0.93213 0.93357 0.93499 0.93638
PC78 PC79 PC80 PC81 PC82 PC83 PC84
Standard deviation 1.21811 1.20481 1.20317 1.1894 1.17549 1.17356 1.15990
Proportion of Variance 0.00136 0.00133 0.00133 0.0013 0.00127 0.00126 0.00123
Cumulative Proportion 0.93774 0.93907 0.94040 0.9417 0.94297 0.94423 0.94546
PC85 PC86 PC87 PC88 PC89 PC90 PC91
Standard deviation 1.15034 1.1443 1.13021 1.12064 1.11255 1.10493 1.10202
Proportion of Variance 0.00121 0.0012 0.00117 0.00115 0.00114 0.00112 0.00111
Cumulative Proportion 0.94668 0.9479 0.94905 0.95020 0.95134 0.95246 0.95357
PC92 PC93 PC94 PC95 PC96 PC97 PC98
Standard deviation 1.09981 1.07942 1.06509 1.05582 1.05330 1.03539 1.02657
Proportion of Variance 0.00111 0.00107 0.00104 0.00102 0.00102 0.00098 0.00097
Cumulative Proportion 0.95468 0.95575 0.95679 0.95781 0.95883 0.95982 0.96078
PC99 PC100 PC101 PC102 PC103 PC104 PC105
Standard deviation 1.01193 1.01054 1.00006 0.9921 0.98595 0.98306 0.97870
Proportion of Variance 0.00094 0.00094 0.00092 0.0009 0.00089 0.00089 0.00088
Cumulative Proportion 0.96172 0.96266 0.96358 0.9645 0.96537 0.96626 0.96714
PC106 PC107 PC108 PC109 PC110 PC111 PC112
Standard deviation 0.96883 0.96234 0.95920 0.9351 0.9323 0.92778 0.91778
Proportion of Variance 0.00086 0.00085 0.00084 0.0008 0.0008 0.00079 0.00077
Cumulative Proportion 0.96800 0.96885 0.96969 0.9705 0.9713 0.97208 0.97285
PC113 PC114 PC115 PC116 PC117 PC118 PC119
Standard deviation 0.91312 0.89464 0.88541 0.8759 0.8726 0.86852 0.86330
Proportion of Variance 0.00076 0.00073 0.00072 0.0007 0.0007 0.00069 0.00068
Cumulative Proportion 0.97362 0.97435 0.97507 0.9758 0.9765 0.97717 0.97785
PC120 PC121 PC122 PC123 PC124 PC125 PC126
Standard deviation 0.85787 0.85264 0.84570 0.84076 0.82717 0.82139 0.82103
Proportion of Variance 0.00068 0.00067 0.00066 0.00065 0.00063 0.00062 0.00062
Cumulative Proportion 0.97853 0.97919 0.97985 0.98050 0.98112 0.98174 0.98236
PC127 PC128 PC129 PC130 PC131 PC132 PC133
Standard deviation 0.81732 0.80003 0.79463 0.79015 0.77880 0.77161 0.76729
Proportion of Variance 0.00061 0.00059 0.00058 0.00057 0.00056 0.00055 0.00054
Cumulative Proportion 0.98297 0.98356 0.98414 0.98471 0.98527 0.98582 0.98636
PC134 PC135 PC136 PC137 PC138 PC139 PC140
Standard deviation 0.75267 0.74685 0.74292 0.73295 0.72225 0.71640 0.71031
Proportion of Variance 0.00052 0.00051 0.00051 0.00049 0.00048 0.00047 0.00046
Cumulative Proportion 0.98688 0.98739 0.98789 0.98839 0.98887 0.98934 0.98980
PC141 PC142 PC143 PC144 PC145 PC146 PC147
Standard deviation 0.70834 0.69835 0.69170 0.68614 0.68132 0.66946 0.6641
Proportion of Variance 0.00046 0.00045 0.00044 0.00043 0.00043 0.00041 0.0004
Cumulative Proportion 0.99026 0.99071 0.99115 0.99158 0.99200 0.99242 0.9928
PC148 PC149 PC150 PC151 PC152 PC153 PC154
Standard deviation 0.6576 0.65180 0.64783 0.63299 0.62971 0.62458 0.61835
Proportion of Variance 0.0004 0.00039 0.00039 0.00037 0.00036 0.00036 0.00035
Cumulative Proportion 0.9932 0.99361 0.99399 0.99436 0.99472 0.99508 0.99543
PC155 PC156 PC157 PC158 PC159 PC160 PC161
Standard deviation 0.61050 0.60903 0.58801 0.58311 0.57708 0.56122 0.55556
Proportion of Variance 0.00034 0.00034 0.00032 0.00031 0.00031 0.00029 0.00028
Cumulative Proportion 0.99577 0.99611 0.99643 0.99674 0.99705 0.99734 0.99762
PC162 PC163 PC164 PC165 PC166 PC167 PC168
Standard deviation 0.54262 0.53499 0.53018 0.52517 0.51671 0.51438 0.50128
Proportion of Variance 0.00027 0.00026 0.00026 0.00025 0.00024 0.00024 0.00023
Cumulative Proportion 0.99789 0.99815 0.99841 0.99866 0.99891 0.99915 0.99938
PC169 PC170 PC171 PC172
Standard deviation 0.48872 0.48374 0.44743 1.373e-14
Proportion of Variance 0.00022 0.00021 0.00018 0.000e+00
Cumulative Proportion 0.99960 0.99982 1.00000 1.000e+00
pca.results.to.plot <- pca.results$x %>%
as.data.frame() %>%
rownames_to_column("leafcutter.name") %>%
dplyr::select(leafcutter.name, PC1:PC6) %>%
left_join(FullMetadata, by="leafcutter.name")
ggplot(pca.results.to.plot, aes(x=PC1, y=PC2, color=color, shape=Experiment)) +
geom_point(size=3) +
scale_color_identity() +
theme_bw() +
labs(title = "PCA using 1090 GAGT introns\nAll samples")

ggplot(pca.results.to.plot, aes(x=PC3, y=PC4, color=color, shape=Experiment)) +
geom_point(size=3) +
scale_color_identity() +
theme_bw() +
labs(title = "PCA using 1090 GAGT introns\nAll samples")

ggplot(pca.results.to.plot, aes(x=PC5, y=PC6, color=color, shape=Experiment)) +
geom_point(size=3) +
scale_color_identity() +
theme_bw() +
labs(title = "PCA using 1090 GAGT introns\nAll samples")

That is actually kind of reassuring. The first PC is definitely related to dose. And all DMSO samples cluster together. However, again, the effective dosage seems a bit weak compared to fibroblast experiments… Looks like most of the samples are roughly equivalent to ~100nM risdiplam, so still perhaps on the high end of what would be realistic clinical concentration but I think a slightly higher dose would have been more informative. In this particular PCA, it looks like PC3 sepearates branaplam vs risdiplam, and it’s not totally clear how these molecules are different along that axis… I could try, as mentioned before just doing PCA using the dose titration experiments then projecting the new samples in that space… But first let’s do a PCA just using the samples from the new experiment..
SamplesToInclude <- FullMetadata %>%
filter(Experiment %in% c("Single high dose LCL")) %>%
filter(!old.sample.name == "NewMolecule.C04-2") %>%
pull(leafcutter.name)
pca.results <- PSI.GAGT.Only %>%
dplyr::select(junc, all_of(SamplesToInclude)) %>%
column_to_rownames("junc") %>%
scale() %>% t() %>% prcomp(scale=T)
summary(pca.results)
Importance of components:
PC1 PC2 PC3 PC4 PC5 PC6 PC7
Standard deviation 17.3652 8.02663 7.63797 4.89303 4.62299 4.47221 4.29379
Proportion of Variance 0.2767 0.05911 0.05352 0.02196 0.01961 0.01835 0.01691
Cumulative Proportion 0.2767 0.33576 0.38928 0.41125 0.43085 0.44920 0.46612
PC8 PC9 PC10 PC11 PC12 PC13 PC14
Standard deviation 3.9200 3.84117 3.69235 3.60719 3.54779 3.47655 3.44963
Proportion of Variance 0.0141 0.01354 0.01251 0.01194 0.01155 0.01109 0.01092
Cumulative Proportion 0.4802 0.49375 0.50626 0.51820 0.52974 0.54083 0.55175
PC15 PC16 PC17 PC18 PC19 PC20 PC21
Standard deviation 3.33808 3.32645 3.30802 3.26244 3.24349 3.17997 3.10580
Proportion of Variance 0.01022 0.01015 0.01004 0.00976 0.00965 0.00928 0.00885
Cumulative Proportion 0.56197 0.57212 0.58216 0.59193 0.60158 0.61086 0.61971
PC22 PC23 PC24 PC25 PC26 PC27 PC28
Standard deviation 3.04656 3.01811 2.97941 2.92856 2.91952 2.8791 2.87597
Proportion of Variance 0.00852 0.00836 0.00814 0.00787 0.00782 0.0076 0.00759
Cumulative Proportion 0.62822 0.63658 0.64472 0.65259 0.66041 0.6680 0.67560
PC29 PC30 PC31 PC32 PC33 PC34 PC35
Standard deviation 2.84331 2.80557 2.77632 2.74624 2.73980 2.69767 2.6825
Proportion of Variance 0.00742 0.00722 0.00707 0.00692 0.00689 0.00668 0.0066
Cumulative Proportion 0.68302 0.69024 0.69731 0.70423 0.71112 0.71780 0.7244
PC36 PC37 PC38 PC39 PC40 PC41 PC42
Standard deviation 2.67372 2.66036 2.6208 2.58876 2.5790 2.55420 2.51670
Proportion of Variance 0.00656 0.00649 0.0063 0.00615 0.0061 0.00599 0.00581
Cumulative Proportion 0.73096 0.73745 0.7438 0.74990 0.7560 0.76199 0.76780
PC43 PC44 PC45 PC46 PC47 PC48 PC49
Standard deviation 2.50155 2.48335 2.48004 2.45997 2.4489 2.42392 2.41933
Proportion of Variance 0.00574 0.00566 0.00564 0.00555 0.0055 0.00539 0.00537
Cumulative Proportion 0.77354 0.77920 0.78484 0.79039 0.7959 0.80128 0.80665
PC50 PC51 PC52 PC53 PC54 PC55 PC56
Standard deviation 2.38644 2.36665 2.33817 2.31661 2.30298 2.29857 2.27662
Proportion of Variance 0.00522 0.00514 0.00502 0.00492 0.00487 0.00485 0.00476
Cumulative Proportion 0.81188 0.81702 0.82203 0.82695 0.83182 0.83667 0.84142
PC57 PC58 PC59 PC60 PC61 PC62 PC63
Standard deviation 2.26629 2.25224 2.22404 2.2140 2.17655 2.16350 2.15976
Proportion of Variance 0.00471 0.00465 0.00454 0.0045 0.00435 0.00429 0.00428
Cumulative Proportion 0.84613 0.85079 0.85533 0.8598 0.86417 0.86846 0.87274
PC64 PC65 PC66 PC67 PC68 PC69 PC70
Standard deviation 2.12927 2.10607 2.09074 2.07866 2.05863 2.05530 2.02185
Proportion of Variance 0.00416 0.00407 0.00401 0.00396 0.00389 0.00388 0.00375
Cumulative Proportion 0.87690 0.88097 0.88498 0.88895 0.89283 0.89671 0.90046
PC71 PC72 PC73 PC74 PC75 PC76 PC77
Standard deviation 2.00272 1.99787 1.9816 1.96398 1.94003 1.93185 1.92771
Proportion of Variance 0.00368 0.00366 0.0036 0.00354 0.00345 0.00342 0.00341
Cumulative Proportion 0.90414 0.90780 0.9114 0.91494 0.91840 0.92182 0.92523
PC78 PC79 PC80 PC81 PC82 PC83 PC84
Standard deviation 1.88660 1.86190 1.85577 1.83633 1.82108 1.80387 1.78728
Proportion of Variance 0.00327 0.00318 0.00316 0.00309 0.00304 0.00299 0.00293
Cumulative Proportion 0.92849 0.93168 0.93483 0.93793 0.94097 0.94396 0.94689
PC85 PC86 PC87 PC88 PC89 PC90 PC91
Standard deviation 1.77038 1.76254 1.73891 1.72961 1.69315 1.68904 1.67390
Proportion of Variance 0.00288 0.00285 0.00277 0.00274 0.00263 0.00262 0.00257
Cumulative Proportion 0.94976 0.95261 0.95539 0.95813 0.96076 0.96338 0.96595
PC92 PC93 PC94 PC95 PC96 PC97 PC98
Standard deviation 1.64617 1.6161 1.59282 1.58560 1.55184 1.52377 1.51716
Proportion of Variance 0.00249 0.0024 0.00233 0.00231 0.00221 0.00213 0.00211
Cumulative Proportion 0.96843 0.9708 0.97316 0.97547 0.97767 0.97980 0.98192
PC99 PC100 PC101 PC102 PC103 PC104 PC105
Standard deviation 1.49663 1.46361 1.46094 1.45131 1.43022 1.42266 1.39131
Proportion of Variance 0.00205 0.00197 0.00196 0.00193 0.00188 0.00186 0.00178
Cumulative Proportion 0.98397 0.98594 0.98789 0.98983 0.99170 0.99356 0.99534
PC106 PC107 PC108 PC109
Standard deviation 1.35016 1.32492 1.22667 3.724e-14
Proportion of Variance 0.00167 0.00161 0.00138 0.000e+00
Cumulative Proportion 0.99701 0.99862 1.00000 1.000e+00
pca.results.to.plot <- pca.results$x %>%
as.data.frame() %>%
rownames_to_column("leafcutter.name") %>%
dplyr::select(leafcutter.name, PC1:PC6) %>%
left_join(FullMetadata, by="leafcutter.name")
ggplot(pca.results.to.plot, aes(x=PC1, y=PC2, color=color, shape=Experiment)) +
geom_text(size=3, aes(label=treatment)) +
scale_color_identity() +
theme_bw() +
labs(title = "PCA using 1090 GAGT introns\nExpOf52 samples")

ggplot(pca.results.to.plot, aes(x=PC3, y=PC4, color=color, shape=Experiment)) +
geom_text(size=3, aes(label=treatment)) +
scale_color_identity() +
theme_bw() +
labs(title = "PCA using 1090 GAGT introns\nExpOf52 samples")

ggplot(pca.results.to.plot, aes(x=PC5, y=PC6, color=color, shape=Experiment)) +
geom_text(size=3, aes(label=treatment)) +
scale_color_identity() +
theme_bw() +
labs(title = "PCA using 1090 GAGT introns\nExpOf52 samples")

Ok, reassuringly, again DMSO samples stick out in the first PC, and I see a few biological replicates are close together.
Ok, now I want to try projecting the old titration series data into this same PCA spaace, which might help interpretability.
SamplesToProject <- FullMetadata %>%
filter(!Experiment %in% c("Single high dose LCL")) %>%
filter(!old.sample.name == "NewMolecule.C04-2") %>%
pull(leafcutter.name)
pca.results.to.plot_AdditionProjections <- PSI.GAGT.Only %>%
dplyr::select(junc, all_of(SamplesToProject)) %>%
column_to_rownames("junc") %>%
scale() %>% t() %>% predict(pca.results, .) %>%
as.data.frame() %>%
rownames_to_column("leafcutter.name") %>%
dplyr::select(leafcutter.name, PC1:PC6) %>%
left_join(FullMetadata, by="leafcutter.name")
TreatmentColorsForLabels.FibroblastColorsAdded <-
FullMetadata %>%
group_by(treatment) %>%
filter(dose.nM == max(dose.nM) | treatment == "DMSO" | cell.type=="Fibroblast") %>%
ungroup() %>%
distinct(treatment, .keep_all=T) %>%
arrange(dose.nM) %>%
mutate(vjust=row_number()*1.2)
TreatmentColorsLabels.Layer.FibroblastColorsAdded <- geom_text(
data = TreatmentColorsForLabels.FibroblastColorsAdded,
aes(label=treatment, color=color, vjust=vjust),
y=Inf, x=Inf, hjust=1.05
)
ggplot(pca.results.to.plot, aes(x=PC1, y=PC2, color=color, shape=Experiment)) +
geom_text(size=3, aes(label=treatment)) +
geom_point(data = pca.results.to.plot_AdditionProjections) +
TreatmentColorsLabels.Layer.FibroblastColorsAdded +
scale_color_identity() +
theme_bw() +
labs(title = "PCA using 1090 GAGT introns\nExpOf52 samples", caption="Additional samples projected")

ggplot(pca.results.to.plot, aes(x=PC3, y=PC4, color=color, shape=Experiment)) +
geom_text(size=3, aes(label=treatment)) +
geom_point(data = pca.results.to.plot_AdditionProjections) +
TreatmentColorsLabels.Layer.FibroblastColorsAdded +
scale_color_identity() +
theme_bw() +
labs(title = "PCA using 1090 GAGT introns\nAll samples")

I think these PCA are interseting… it’s still hard to make sense of it all… And this isn’t really the way to identify new axes of specifity since the introns were selected precisely because of their effect in risdiplam/branaplam/C2C5 in previous experiments… Better would be to identify all differentially spliced introns and perform a PCA on that, also projecting the old risdiplam/branaplam/C2C5 data in the PCA for help with interpretation… Let’s work towards that….
First, let’s explore the leafcutter differential splicing results… From previous plots, I suspect the effective dosage of these new data to be much less than the fibroblast data, and perhaps we will get much fewer significant differential splicing events… First Let’s check it out, the number of differentially spliced clusters, some P value histograms, etc.
leafcutter.sig %>%
sample_n_of(12, treatment) %>%
filter(!is.na(p)) %>%
ggplot(aes(x=p)) +
geom_histogram() +
facet_wrap(~treatment, scales="free") +
labs(caption="P value histograms for some random splicing contrasts", title="Exp of 52")

…and compare that to the fibroblast tests…
leafcutter.sig.fibroblasts %>%
filter(!is.na(p)) %>%
ggplot(aes(x=p)) +
geom_histogram() +
facet_wrap(~treatment, scales="free") +
labs(caption="P value histograms for some random splicing contrasts", title="fibroblast experiment")

ok, so a lot these will probably have even less significant hits than the fibroblast CUOME contrast.
Let’s see how many hits at an FDR of 10%…
bind_rows(
bind_rows(leafcutter.sig.fibroblasts, leafcutter.sig) %>%
filter(p.adjust < 0.01) %>%
count(treatment) %>%
mutate(FDR = 0.01),
bind_rows(leafcutter.sig.fibroblasts, leafcutter.sig) %>%
filter(p.adjust < 0.05) %>%
count(treatment) %>%
mutate(FDR = 0.05),
bind_rows(leafcutter.sig.fibroblasts, leafcutter.sig) %>%
filter(p.adjust < 0.1) %>%
count(treatment) %>%
mutate(FDR = 0.1)) %>%
left_join(FullMetadata, by='treatment') %>%
ggplot(aes(x=treatment, y=n)) +
geom_bar(stat="identity",position = "identity", alpha=.15) +
facet_grid(~Experiment, scales = "free", space = "free_x", labeller = label_wrap_gen(10)) +
Rotate_x_labels +
theme(axis.text.x = element_text(size=6)) +
theme(strip.text.x = element_text(size = 5)) +
labs(title="Number of leafcutter signigicant clusters", x="treatment", caption="FDR 1%, 5%, 10%")

bind_rows(leafcutter.sig.fibroblasts, leafcutter.sig) %>%
filter(p.adjust < 2) %>%
count(treatment) %>%
left_join(FullMetadata, by='treatment') %>%
ggplot(aes(x=treatment, y=n)) +
geom_bar(stat="identity",position = "identity", alpha=1) +
facet_grid(~Experiment, scales = "free", space = "free_x", labeller = label_wrap_gen(10)) +
Rotate_x_labels +
theme(axis.text.x = element_text(size=6)) +
theme(strip.text.x = element_text(size = 5)) +
labs(title="Number of leafcutter tests", x="treatment")

Ok, most treatments have even less hits than the CUOME contrast. In principle, sharing information across samples could help with this. I think this might be important towards finding difference amongst samples and identification of new specificities. Like mash could be useful. But that takes a bit of time to implement… leafcutter doesn’t just output standard errors, which I think are required. I suppose one hackish solution would be to perform a contrast with all treatments combined, vs DMSO… then identify significant clusters. To find more molecule-specific effects, I could then look at the nominal P value from the non aggregated treatment contrast.
Following this idea, I did a contrast with all molecules vs DMSO… Here are the leafcutter results…
leafcutter.effects.MergedContrast <- read_tsv("../code/SplicingAnalysis/MergedExp52_Contrast/_effect_sizes.txt")
leafcutter.sig.MergedContrast <- read_tsv("../code/SplicingAnalysis/MergedExp52_Contrast/_cluster_significance.txt")
bind_rows(
bind_rows(leafcutter.sig.fibroblasts, leafcutter.sig, leafcutter.sig.MergedContrast) %>%
filter(p.adjust < 0.01) %>%
count(treatment) %>%
mutate(FDR = 0.01),
bind_rows(leafcutter.sig.fibroblasts, leafcutter.sig, leafcutter.sig.MergedContrast) %>%
filter(p.adjust < 0.05) %>%
count(treatment) %>%
mutate(FDR = 0.05),
bind_rows(leafcutter.sig.fibroblasts, leafcutter.sig, leafcutter.sig.MergedContrast) %>%
filter(p.adjust < 0.1) %>%
count(treatment) %>%
mutate(FDR = 0.1)) %>%
left_join(FullMetadata, by='treatment') %>%
replace_na(list(treatment="Merged", Experiment="Single high dose LCL")) %>%
ggplot(aes(x=treatment, y=n)) +
geom_bar(stat="identity",position = "identity", alpha=.15) +
facet_grid(~Experiment, scales = "free", space = "free_x", labeller = label_wrap_gen(10)) +
Rotate_x_labels +
theme(axis.text.x = element_text(size=6)) +
theme(strip.text.x = element_text(size = 5)) +
labs(title="Number of leafcutter signigicant clusters", x="treatment", caption="FDR 1%, 5%, 10%")

bind_rows(leafcutter.sig.fibroblasts, leafcutter.sig, leafcutter.sig.MergedContrast) %>%
filter(p.adjust < 2) %>%
count(treatment) %>%
left_join(FullMetadata, by='treatment') %>%
replace_na(list(treatment="Merged", Experiment="Single high dose LCL")) %>%
ggplot(aes(x=treatment, y=n)) +
geom_bar(stat="identity",position = "identity", alpha=1) +
facet_grid(~Experiment, scales = "free", space = "free_x", labeller = label_wrap_gen(10)) +
Rotate_x_labels +
theme(axis.text.x = element_text(size=6)) +
theme(strip.text.x = element_text(size = 5)) +
labs(title="Number of leafcutter tests", x="treatment")

Wow. That did surprisingly little. I don’t have the time to troubleshoot if there is a bug or if this is a legitamite result of doing this merged contrast.
So now let’s try counting how many unique clusters there are amongst all the attempted contrasts. Let’s be permissive and consider FDR 10%
#Number of total hits (not necessarily unique)
leafcutter.sig %>%
filter(p.adjust < 0.1) %>% nrow()
[1] 32162
#Number of unique cluster hits amongst all hits
leafcutter.sig %>%
filter(p.adjust < 0.1) %>%
distinct(cluster) %>% nrow()
[1] 10803
Ok, so a lot of the hits are unique to some contrasts, in the sense that I get 10803/32162 unique clusters with FDR<10%. I think what I will do next is do the PCA, but just include introns that are FDR<10% in at least one contrast, and also consider some minimal effect size, and also consider just GAGT introns within those clusters.
IntronsToInclude <- leafcutter.sig %>%
filter(p.adjust < 0.1) %>%
inner_join(leafcutter.effects, by=c("treatment", "cluster")) %>%
filter(deltapsi > 0.01) %>%
inner_join(Introns.annotations, by='name') %>%
filter(str_detect(seq, "^\\w\\wGAGT")) %>%
pull(intron)
length(IntronsToInclude)
[1] 6383
PSI.IntronsToInclude <- PSI %>%
filter(junc %in% IntronsToInclude) %>%
dplyr::select(junc, A10.1:NewMolecule.E07.6) %>%
drop_na()
dim(PSI.IntronsToInclude)
[1] 475 173
Ok, so only 475 introns left… This probably won’t change anything from my previous PCA with ~1000 GAGT introns… Well let’s follow through anyway…
SamplesToInclude <- FullMetadata %>%
filter(Experiment %in% c("Single high dose LCL")) %>%
filter(!old.sample.name == "NewMolecule.C04-2") %>%
pull(leafcutter.name)
pca.results <- PSI.IntronsToInclude %>%
dplyr::select(junc, all_of(SamplesToInclude)) %>%
column_to_rownames("junc") %>%
scale() %>% t() %>% prcomp(scale=T)
summary(pca.results)
Importance of components:
PC1 PC2 PC3 PC4 PC5 PC6 PC7
Standard deviation 11.6045 6.86645 5.16078 3.74247 3.39272 3.03012 2.81941
Proportion of Variance 0.2835 0.09926 0.05607 0.02949 0.02423 0.01933 0.01673
Cumulative Proportion 0.2835 0.38276 0.43883 0.46832 0.49255 0.51188 0.52862
PC8 PC9 PC10 PC11 PC12 PC13 PC14
Standard deviation 2.76866 2.66673 2.5319 2.46868 2.42788 2.38280 2.30959
Proportion of Variance 0.01614 0.01497 0.0135 0.01283 0.01241 0.01195 0.01123
Cumulative Proportion 0.54476 0.55973 0.5732 0.58605 0.59846 0.61042 0.62165
PC15 PC16 PC17 PC18 PC19 PC20 PC21
Standard deviation 2.24564 2.21327 2.18855 2.1681 2.1462 2.10843 2.08510
Proportion of Variance 0.01062 0.01031 0.01008 0.0099 0.0097 0.00936 0.00915
Cumulative Proportion 0.63226 0.64258 0.65266 0.6626 0.6723 0.68161 0.69076
PC22 PC23 PC24 PC25 PC26 PC27 PC28
Standard deviation 2.02730 2.01845 1.95205 1.91161 1.86509 1.84638 1.83184
Proportion of Variance 0.00865 0.00858 0.00802 0.00769 0.00732 0.00718 0.00706
Cumulative Proportion 0.69942 0.70799 0.71602 0.72371 0.73103 0.73821 0.74527
PC29 PC30 PC31 PC32 PC33 PC34 PC35
Standard deviation 1.82624 1.81628 1.76089 1.73452 1.69435 1.68276 1.66454
Proportion of Variance 0.00702 0.00695 0.00653 0.00633 0.00604 0.00596 0.00583
Cumulative Proportion 0.75230 0.75924 0.76577 0.77210 0.77815 0.78411 0.78994
PC36 PC37 PC38 PC39 PC40 PC41 PC42
Standard deviation 1.65906 1.62985 1.6161 1.59049 1.57398 1.56278 1.55085
Proportion of Variance 0.00579 0.00559 0.0055 0.00533 0.00522 0.00514 0.00506
Cumulative Proportion 0.79574 0.80133 0.8068 0.81215 0.81737 0.82251 0.82757
PC43 PC44 PC45 PC46 PC47 PC48 PC49
Standard deviation 1.54608 1.53399 1.53190 1.52334 1.51515 1.46995 1.45829
Proportion of Variance 0.00503 0.00495 0.00494 0.00489 0.00483 0.00455 0.00448
Cumulative Proportion 0.83260 0.83756 0.84250 0.84738 0.85222 0.85677 0.86124
PC50 PC51 PC52 PC53 PC54 PC55 PC56
Standard deviation 1.43807 1.41947 1.4118 1.39282 1.37367 1.36640 1.35806
Proportion of Variance 0.00435 0.00424 0.0042 0.00408 0.00397 0.00393 0.00388
Cumulative Proportion 0.86560 0.86984 0.8740 0.87812 0.88209 0.88602 0.88991
PC57 PC58 PC59 PC60 PC61 PC62 PC63
Standard deviation 1.34707 1.3249 1.31220 1.28503 1.27454 1.26128 1.24039
Proportion of Variance 0.00382 0.0037 0.00362 0.00348 0.00342 0.00335 0.00324
Cumulative Proportion 0.89373 0.8974 0.90105 0.90452 0.90794 0.91129 0.91453
PC64 PC65 PC66 PC67 PC68 PC69 PC70
Standard deviation 1.2326 1.21995 1.20051 1.18288 1.17106 1.16242 1.14475
Proportion of Variance 0.0032 0.00313 0.00303 0.00295 0.00289 0.00284 0.00276
Cumulative Proportion 0.9177 0.92086 0.92390 0.92684 0.92973 0.93257 0.93533
PC71 PC72 PC73 PC74 PC75 PC76 PC77
Standard deviation 1.13640 1.12952 1.10683 1.10248 1.08830 1.07329 1.06585
Proportion of Variance 0.00272 0.00269 0.00258 0.00256 0.00249 0.00243 0.00239
Cumulative Proportion 0.93805 0.94074 0.94332 0.94588 0.94837 0.95079 0.95319
PC78 PC79 PC80 PC81 PC82 PC83 PC84
Standard deviation 1.05289 1.03925 1.02051 1.01535 0.99384 0.96728 0.96611
Proportion of Variance 0.00233 0.00227 0.00219 0.00217 0.00208 0.00197 0.00196
Cumulative Proportion 0.95552 0.95779 0.95999 0.96216 0.96424 0.96621 0.96817
PC85 PC86 PC87 PC88 PC89 PC90 PC91
Standard deviation 0.95319 0.93714 0.92882 0.89236 0.88891 0.87637 0.86367
Proportion of Variance 0.00191 0.00185 0.00182 0.00168 0.00166 0.00162 0.00157
Cumulative Proportion 0.97008 0.97193 0.97375 0.97542 0.97709 0.97870 0.98028
PC92 PC93 PC94 PC95 PC96 PC97 PC98
Standard deviation 0.85877 0.85077 0.83298 0.81957 0.81333 0.80289 0.76496
Proportion of Variance 0.00155 0.00152 0.00146 0.00141 0.00139 0.00136 0.00123
Cumulative Proportion 0.98183 0.98335 0.98481 0.98623 0.98762 0.98898 0.99021
PC99 PC100 PC101 PC102 PC103 PC104 PC105
Standard deviation 0.7542 0.74544 0.71378 0.70698 0.70179 0.68516 0.66594
Proportion of Variance 0.0012 0.00117 0.00107 0.00105 0.00104 0.00099 0.00093
Cumulative Proportion 0.9914 0.99258 0.99365 0.99470 0.99574 0.99673 0.99766
PC106 PC107 PC108 PC109
Standard deviation 0.64548 0.59660 0.58250 8.508e-15
Proportion of Variance 0.00088 0.00075 0.00071 0.000e+00
Cumulative Proportion 0.99854 0.99929 1.00000 1.000e+00
pca.results.to.plot <- pca.results$x %>%
as.data.frame() %>%
rownames_to_column("leafcutter.name") %>%
dplyr::select(leafcutter.name, PC1:PC6) %>%
left_join(FullMetadata, by="leafcutter.name")
SamplesToProject <- FullMetadata %>%
filter(!Experiment %in% c("Single high dose LCL")) %>%
filter(!old.sample.name == "NewMolecule.C04-2") %>%
pull(leafcutter.name)
pca.results.to.plot_AdditionProjections <- PSI.IntronsToInclude %>%
dplyr::select(junc, all_of(SamplesToProject)) %>%
column_to_rownames("junc") %>%
scale() %>% t() %>% predict(pca.results, .) %>%
as.data.frame() %>%
rownames_to_column("leafcutter.name") %>%
dplyr::select(leafcutter.name, PC1:PC6) %>%
left_join(FullMetadata, by="leafcutter.name")
TreatmentColorsForLabels.FibroblastColorsAdded <-
FullMetadata %>%
group_by(treatment) %>%
filter(dose.nM == max(dose.nM) | treatment == "DMSO" | cell.type=="Fibroblast") %>%
ungroup() %>%
distinct(treatment, .keep_all=T) %>%
arrange(dose.nM) %>%
mutate(vjust=row_number()*1.2)
TreatmentColorsLabels.Layer.FibroblastColorsAdded <- geom_text(
data = TreatmentColorsForLabels.FibroblastColorsAdded,
aes(label=treatment, color=color, vjust=vjust),
y=Inf, x=Inf, hjust=1.05
)
ggplot(pca.results.to.plot, aes(x=PC1, y=PC2, color=color, shape=Experiment)) +
geom_text(size=3, aes(label=treatment)) +
geom_point(data = pca.results.to.plot_AdditionProjections) +
TreatmentColorsLabels.Layer.FibroblastColorsAdded +
scale_color_identity() +
theme_bw() +
labs(title = "PCA using 475 GAGT introns\nAll samples", caption="Introns based on being significant in any contrast")

ggplot(pca.results.to.plot, aes(x=PC3, y=PC4, color=color, shape=Experiment)) +
geom_text(size=3, aes(label=treatment)) +
geom_point(data = pca.results.to.plot_AdditionProjections) +
TreatmentColorsLabels.Layer.FibroblastColorsAdded +
scale_color_identity() +
theme_bw() +
labs(title = "PCA using 475 GAGT introns\nAll samples", caption="Introns based on being significant in any contrast")

Ok this is actually somewhat interpretable. Again the main axis of
specificity seems to be along branaplam vs risdiplam effects, and there
is plenty of stuff in between. WD07 is actually 500nM risdiplam, which
looks fitting in relation to previous experiments. I think perhaps WC05
(SMSM32) and WD08 (4F-Py) the most interesting point I see so far… as
both replicates doesn’t neatly lie on either the risdiplam axis or the
branaplam axis. Just browsing these molecules structure that ZT shared
with me, these seem by my eye pretty structurally distinct. I would plot
the structures of these in the PCA but ggimage is kind of
tricky to install in my current R environment, so that may have to wait
for later…
some of the first things to do: - repeat analysis without constricting to GA-GT introns, to look for really novel types of induced splicing - Check dependencies at -4 and -3 positions, as these positions distinguish rsidiplam. - Both of those things are sort of difficult with the limited power with the limited dosage. Obviously higher dose experiments would be nice, but maybe some of the power issues can be compensated by thoughtfully combining similar samples for estimating effects - like with mash - replot PCA with ggimage, to help relate effects to structural motifs
sessionInfo()
R version 4.2.0 (2022-04-22)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)
Matrix products: default
BLAS/LAPACK: /software/openblas-0.3.13-el7-x86_64/lib/libopenblas_haswellp-r0.3.13.so
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=C
[4] LC_COLLATE=C LC_MONETARY=C LC_MESSAGES=C
[7] LC_PAPER=C LC_NAME=C LC_ADDRESS=C
[10] LC_TELEPHONE=C LC_MEASUREMENT=C LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] gplots_3.1.3 qvalue_2.28.0 edgeR_3.38.4 limma_3.52.4
[5] data.table_1.14.2 RColorBrewer_1.1-3 forcats_0.5.1 stringr_1.4.0
[9] dplyr_1.0.9 purrr_0.3.4 readr_2.1.2 tidyr_1.2.0
[13] tibble_3.1.7 ggplot2_3.3.6 tidyverse_1.3.1
loaded via a namespace (and not attached):
[1] bitops_1.0-7 fs_1.5.2 lubridate_1.8.0 bit64_4.0.5
[5] httr_1.4.3 rprojroot_2.0.3 tools_4.2.0 backports_1.4.1
[9] bslib_0.3.1 utf8_1.2.2 R6_2.5.1 KernSmooth_2.23-20
[13] DBI_1.1.2 colorspace_2.0-3 withr_2.5.0 tidyselect_1.1.2
[17] bit_4.0.4 compiler_4.2.0 git2r_0.30.1 cli_3.3.0
[21] rvest_1.0.2 xml2_1.3.3 labeling_0.4.2 sass_0.4.1
[25] caTools_1.18.2 scales_1.2.0 digest_0.6.29 R.utils_2.11.0
[29] rmarkdown_2.14 pkgconfig_2.0.3 htmltools_0.5.2 dbplyr_2.1.1
[33] fastmap_1.1.0 highr_0.9 rlang_1.0.2 readxl_1.4.0
[37] rstudioapi_0.13 jquerylib_0.1.4 generics_0.1.2 farver_2.1.0
[41] jsonlite_1.8.0 gtools_3.9.2 vroom_1.5.7 R.oo_1.24.0
[45] magrittr_2.0.3 Rcpp_1.0.8.3 munsell_0.5.0 fansi_1.0.3
[49] R.methodsS3_1.8.1 lifecycle_1.0.1 stringi_1.7.6 whisker_0.4
[53] yaml_2.3.5 plyr_1.8.7 grid_4.2.0 parallel_4.2.0
[57] promises_1.2.0.1 crayon_1.5.1 lattice_0.20-45 haven_2.5.0
[61] splines_4.2.0 hms_1.1.1 locfit_1.5-9.7 knitr_1.39
[65] pillar_1.7.0 reshape2_1.4.4 reprex_2.0.1 glue_1.6.2
[69] evaluate_0.15 modelr_0.1.8 vctrs_0.4.1 tzdb_0.3.0
[73] httpuv_1.6.5 cellranger_1.1.0 gtable_0.3.0 assertthat_0.2.1
[77] xfun_0.30 broom_0.8.0 later_1.3.0 workflowr_1.7.0
[81] ellipsis_0.3.2