Last updated: 2021-02-25

Checks: 7 passed, 0 failed

Knit directory: neural_scRNAseq/

This reproducible R Markdown analysis was created with workflowr (version 1.6.2). The Checks tab describes the reproducibility checks that were applied when the results were created. The Past versions tab lists the development history.


Great! Since the R Markdown file has been committed to the Git repository, you know the exact version of the code that produced these results.

Great job! The global environment was empty. Objects defined in the global environment can affect the analysis in your R Markdown file in unknown ways. For reproducibility it's best to always run the code in an empty environment.

The command set.seed(20200522) was run prior to running the code in the R Markdown file. Setting a seed ensures that any results that rely on randomness, e.g. subsampling or permutations, are reproducible.
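
As a minimal sketch of what the seed guarantees (base R only; the vector `1:100` and the sample size are arbitrary choices for illustration, not part of the analysis), resetting the same seed before a random draw yields the same draw:

```r
# Draw 5 random values twice, resetting the seed before each draw.
set.seed(20200522)
first <- sample(1:100, 5)

set.seed(20200522)
second <- sample(1:100, 5)

# Both draws are the same because the random number generator
# started from the same state each time.
identical(first, second)
```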

Great job! Recording the operating system, R version, and package versions is critical for reproducibility.

Nice! There were no cached chunks for this analysis, so you can be confident that you successfully produced the results during this run.

Great job! Using relative paths to the files within your workflowr project makes it easier to run your code on other machines.

Great! You are using Git for version control. Tracking code development and connecting the code version to the results is critical for reproducibility.

The results in this page were generated with repository version f01a91a. See the Past versions tab to see a history of the changes made to the R Markdown and HTML files.

Note that you need to be careful to ensure that all relevant files for the analysis have been committed to Git prior to generating the results (you can use wflow_publish or wflow_git_commit). workflowr only checks the R Markdown file, but you know if there are other scripts or data files that it depends on. Below is the status of the Git repository when the results were generated:


Ignored files:
    Ignored:    .DS_Store
    Ignored:    .Rhistory
    Ignored:    .Rproj.user/
    Ignored:    ._.DS_Store
    Ignored:    ._Rplots.pdf
    Ignored:    ._Unfiltered.pdf
    Ignored:    .__workflowr.yml
    Ignored:    ._coverage.pdf
    Ignored:    ._coverage_sashimi.pdf
    Ignored:    ._coverage_sashimi.png
    Ignored:    ._neural_scRNAseq.Rproj
    Ignored:    ._sashimi.pdf
    Ignored:    ._stmn2.pdf
    Ignored:    ._tdp.pdf
    Ignored:    analysis/.DS_Store
    Ignored:    analysis/.Rhistory
    Ignored:    analysis/._.DS_Store
    Ignored:    analysis/._01-preprocessing.Rmd
    Ignored:    analysis/._01-preprocessing.html
    Ignored:    analysis/._02.1-SampleQC.Rmd
    Ignored:    analysis/._03-filtering.Rmd
    Ignored:    analysis/._04-clustering.Rmd
    Ignored:    analysis/._04-clustering.knit.md
    Ignored:    analysis/._04.1-cell_cycle.Rmd
    Ignored:    analysis/._05-annotation.Rmd
    Ignored:    analysis/._Lam-0-NSC_no_integration.Rmd
    Ignored:    analysis/._Lam-01-NSC_integration.Rmd
    Ignored:    analysis/._Lam-02-NSC_annotation.Rmd
    Ignored:    analysis/._NSC-1-clustering.Rmd
    Ignored:    analysis/._NSC-2-annotation.Rmd
    Ignored:    analysis/.__site.yml
    Ignored:    analysis/._additional_filtering.Rmd
    Ignored:    analysis/._additional_filtering_clustering.Rmd
    Ignored:    analysis/._index.Rmd
    Ignored:    analysis/._organoid-01-1-qualtiy-control.Rmd
    Ignored:    analysis/._organoid-01-clustering.Rmd
    Ignored:    analysis/._organoid-02-integration.Rmd
    Ignored:    analysis/._organoid-03-cluster_analysis.Rmd
    Ignored:    analysis/._organoid-04-group_integration.Rmd
    Ignored:    analysis/._organoid-04-stage_integration.Rmd
    Ignored:    analysis/._organoid-05-group_integration_cluster_analysis.Rmd
    Ignored:    analysis/._organoid-05-stage_integration_cluster_analysis.Rmd
    Ignored:    analysis/._organoid-06-1-prepare-sce.Rmd
    Ignored:    analysis/._organoid-06-conos-analysis-Seurat.Rmd
    Ignored:    analysis/._organoid-06-conos-analysis-function.Rmd
    Ignored:    analysis/._organoid-06-conos-analysis.Rmd
    Ignored:    analysis/._organoid-06-group-integration-conos-analysis.Rmd
    Ignored:    analysis/._organoid-07-conos-visualization.Rmd
    Ignored:    analysis/._organoid-07-group-integration-conos-visualization.Rmd
    Ignored:    analysis/._organoid-08-conos-comparison.Rmd
    Ignored:    analysis/._organoid-0x-sample_integration.Rmd
    Ignored:    analysis/01-preprocessing_cache/
    Ignored:    analysis/02-1-SampleQC_cache/
    Ignored:    analysis/02-quality_control_cache/
    Ignored:    analysis/02.1-SampleQC_cache/
    Ignored:    analysis/03-filtering_cache/
    Ignored:    analysis/04-clustering_cache/
    Ignored:    analysis/04.1-cell_cycle_cache/
    Ignored:    analysis/05-annotation_cache/
    Ignored:    analysis/06-clustering-all-timepoints_cache/
    Ignored:    analysis/07-cluster-analysis-all-timepoints_cache/
    Ignored:    analysis/Lam-01-NSC_integration_cache/
    Ignored:    analysis/Lam-02-NSC_annotation_cache/
    Ignored:    analysis/NSC-1-clustering_cache/
    Ignored:    analysis/NSC-2-annotation_cache/
    Ignored:    analysis/TDP-01-preprocessing_cache/
    Ignored:    analysis/TDP-02-quality_control_cache/
    Ignored:    analysis/TDP-03-filtering_cache/
    Ignored:    analysis/TDP-04-clustering_cache/
    Ignored:    analysis/TDP-05-00-filtering-plasmid-QC_cache/
    Ignored:    analysis/TDP-05-plasmid_expression_cache/
    Ignored:    analysis/TDP-06-cluster_analysis_cache/
    Ignored:    analysis/TDP-07-cluster_12_cache/
    Ignored:    analysis/TDP-08-00-clustering-HA-D96_cache/
    Ignored:    analysis/TDP-08-clustering-timeline-HA_cache/
    Ignored:    analysis/additional_filtering_cache/
    Ignored:    analysis/additional_filtering_clustering_cache/
    Ignored:    analysis/figure/
    Ignored:    analysis/organoid-01-1-qualtiy-control_cache/
    Ignored:    analysis/organoid-01-clustering_cache/
    Ignored:    analysis/organoid-02-integration_cache/
    Ignored:    analysis/organoid-03-cluster_analysis_cache/
    Ignored:    analysis/organoid-04-group_integration_cache/
    Ignored:    analysis/organoid-04-stage_integration_cache/
    Ignored:    analysis/organoid-05-group_integration_cluster_analysis_cache/
    Ignored:    analysis/organoid-05-stage_integration_cluster_analysis_cache/
    Ignored:    analysis/organoid-06-conos-analysis_cache/
    Ignored:    analysis/organoid-06-conos-analysis_test_cache/
    Ignored:    analysis/organoid-06-group-integration-conos-analysis_cache/
    Ignored:    analysis/organoid-07-conos-visualization_cache/
    Ignored:    analysis/organoid-07-group-integration-conos-visualization_cache/
    Ignored:    analysis/organoid-08-conos-comparison_cache/
    Ignored:    analysis/organoid-0x-sample_integration_cache/
    Ignored:    analysis/sample5_QC_cache/
    Ignored:    analysis/timepoints-01-organoid-integration_cache/
    Ignored:    data/.DS_Store
    Ignored:    data/._.DS_Store
    Ignored:    data/._.smbdeleteAAA17ed8b4b
    Ignored:    data/._Lam_figure2_markers.R
    Ignored:    data/._Reactive_astrocytes_markers.xlsx
    Ignored:    data/._known_NSC_markers.R
    Ignored:    data/._known_cell_type_markers.R
    Ignored:    data/._metadata.csv
    Ignored:    data/._virus_cell_tropism_markers.R
    Ignored:    data/._~$Reactive_astrocytes_markers.xlsx
    Ignored:    data/data_sushi/
    Ignored:    data/filtered_feature_matrices/
    Ignored:    output/.DS_Store
    Ignored:    output/._.DS_Store
    Ignored:    output/._NSC_cluster2_marker_genes.txt
    Ignored:    output/._TDP-06-no_integration_cluster12_marker_genes.txt
    Ignored:    output/._TDP-06-no_integration_cluster13_marker_genes.txt
    Ignored:    output/._organoid_integration_cluster1_marker_genes.txt
    Ignored:    output/Lam-01-clustering.rds
    Ignored:    output/NSC_1_clustering.rds
    Ignored:    output/NSC_cluster1_marker_genes.txt
    Ignored:    output/NSC_cluster2_marker_genes.txt
    Ignored:    output/NSC_cluster3_marker_genes.txt
    Ignored:    output/NSC_cluster4_marker_genes.txt
    Ignored:    output/NSC_cluster5_marker_genes.txt
    Ignored:    output/NSC_cluster6_marker_genes.txt
    Ignored:    output/NSC_cluster7_marker_genes.txt
    Ignored:    output/TDP-06-no_integration_cluster0_marker_genes.txt
    Ignored:    output/TDP-06-no_integration_cluster10_marker_genes.txt
    Ignored:    output/TDP-06-no_integration_cluster11_marker_genes.txt
    Ignored:    output/TDP-06-no_integration_cluster12_marker_genes.txt
    Ignored:    output/TDP-06-no_integration_cluster13_marker_genes.txt
    Ignored:    output/TDP-06-no_integration_cluster14_marker_genes.txt
    Ignored:    output/TDP-06-no_integration_cluster15_marker_genes.txt
    Ignored:    output/TDP-06-no_integration_cluster16_marker_genes.txt
    Ignored:    output/TDP-06-no_integration_cluster17_marker_genes.txt
    Ignored:    output/TDP-06-no_integration_cluster1_marker_genes.txt
    Ignored:    output/TDP-06-no_integration_cluster2_marker_genes.txt
    Ignored:    output/TDP-06-no_integration_cluster3_marker_genes.txt
    Ignored:    output/TDP-06-no_integration_cluster4_marker_genes.txt
    Ignored:    output/TDP-06-no_integration_cluster5_marker_genes.txt
    Ignored:    output/TDP-06-no_integration_cluster6_marker_genes.txt
    Ignored:    output/TDP-06-no_integration_cluster7_marker_genes.txt
    Ignored:    output/TDP-06-no_integration_cluster8_marker_genes.txt
    Ignored:    output/TDP-06-no_integration_cluster9_marker_genes.txt
    Ignored:    output/TDP-06_scran_markers.rds
    Ignored:    output/additional_filtering.rds
    Ignored:    output/conos/
    Ignored:    output/conos_organoid-06-conos-analysis.rds
    Ignored:    output/conos_organoid-06-group-integration-conos-analysis.rds
    Ignored:    output/figures/
    Ignored:    output/organoid_integration_cluster10_marker_genes.txt
    Ignored:    output/organoid_integration_cluster11_marker_genes.txt
    Ignored:    output/organoid_integration_cluster12_marker_genes.txt
    Ignored:    output/organoid_integration_cluster13_marker_genes.txt
    Ignored:    output/organoid_integration_cluster14_marker_genes.txt
    Ignored:    output/organoid_integration_cluster15_marker_genes.txt
    Ignored:    output/organoid_integration_cluster16_marker_genes.txt
    Ignored:    output/organoid_integration_cluster17_marker_genes.txt
    Ignored:    output/organoid_integration_cluster1_marker_genes.txt
    Ignored:    output/organoid_integration_cluster2_marker_genes.txt
    Ignored:    output/organoid_integration_cluster3_marker_genes.txt
    Ignored:    output/organoid_integration_cluster4_marker_genes.txt
    Ignored:    output/organoid_integration_cluster5_marker_genes.txt
    Ignored:    output/organoid_integration_cluster6_marker_genes.txt
    Ignored:    output/organoid_integration_cluster7_marker_genes.txt
    Ignored:    output/organoid_integration_cluster8_marker_genes.txt
    Ignored:    output/organoid_integration_cluster9_marker_genes.txt
    Ignored:    output/sce_01_preprocessing.rds
    Ignored:    output/sce_02_quality_control.rds
    Ignored:    output/sce_03_filtering.rds
    Ignored:    output/sce_03_filtering_all_genes.rds
    Ignored:    output/sce_06-1-prepare-sce.rds
    Ignored:    output/sce_TDP_01_preprocessing.rds
    Ignored:    output/sce_TDP_02_quality_control.rds
    Ignored:    output/sce_TDP_03_filtering.rds
    Ignored:    output/sce_TDP_03_filtering_all_genes.rds
    Ignored:    output/sce_organoid-01-clustering.rds
    Ignored:    output/sce_preprocessing.rds
    Ignored:    output/so_04-group_integration.rds
    Ignored:    output/so_04-stage_integration.rds
    Ignored:    output/so_04_1_cell_cycle.rds
    Ignored:    output/so_04_clustering.rds
    Ignored:    output/so_06-clustering_all_timepoints.rds
    Ignored:    output/so_08-00_clustering_HA_D96.rds
    Ignored:    output/so_08-clustering_timeline_HA.rds
    Ignored:    output/so_0x-sample_integration.rds
    Ignored:    output/so_TDP-06-cluster-analysis.rds
    Ignored:    output/so_TDP_04_clustering.rds
    Ignored:    output/so_TDP_05_plasmid_expression.rds
    Ignored:    output/so_additional_filtering_clustering.rds
    Ignored:    output/so_integrated_organoid-02-integration.rds
    Ignored:    output/so_merged_organoid-02-integration.rds
    Ignored:    output/so_organoid-01-clustering.rds
    Ignored:    output/so_sample_organoid-01-clustering.rds
    Ignored:    scripts/.DS_Store
    Ignored:    scripts/._.DS_Store
    Ignored:    scripts/._bu_Rcode.R
    Ignored:    scripts/._plasmid_expression.sh

Untracked files:
    Untracked:  Filtered.pdf
    Untracked:  Rplots.pdf
    Untracked:  Unfiltered
    Untracked:  Unfiltered.pdf
    Untracked:  analysis/Lam-0-NSC_no_integration.Rmd
    Untracked:  analysis/TDP-07-01-STMN2_expression copy.Rmd
    Untracked:  analysis/TDP-08-01-HA-D96-expression-changes.Rmd
    Untracked:  analysis/additional_filtering.Rmd
    Untracked:  analysis/additional_filtering_clustering.Rmd
    Untracked:  analysis/organoid-01-1-qualtiy-control.Rmd
    Untracked:  analysis/organoid-06-conos-analysis-Seurat.Rmd
    Untracked:  analysis/organoid-06-conos-analysis-function.Rmd
    Untracked:  analysis/organoid-07-conos-visualization.Rmd
    Untracked:  analysis/organoid-07-group-integration-conos-visualization.Rmd
    Untracked:  analysis/organoid-08-conos-comparison.Rmd
    Untracked:  analysis/organoid-0x-sample_integration.Rmd
    Untracked:  analysis/sample5_QC.Rmd
    Untracked:  coverage.pdf
    Untracked:  coverage_sashimi.pdf
    Untracked:  coverage_sashimi.png
    Untracked:  data/Homo_sapiens.GRCh38.98.sorted.gtf
    Untracked:  data/Kanton_et_al/
    Untracked:  data/Lam_et_al/
    Untracked:  data/Sep2020/
    Untracked:  data/reference/
    Untracked:  data/virus_cell_tropism_markers.R
    Untracked:  data/~$Reactive_astrocytes_markers.xlsx
    Untracked:  sashimi.pdf
    Untracked:  scripts/bu_Rcode.R
    Untracked:  scripts/salmon-latest_linux_x86_64/
    Untracked:  stmn2.pdf
    Untracked:  tdp.pdf

Unstaged changes:
    Modified:   analysis/05-annotation.Rmd
    Modified:   analysis/Lam-02-NSC_annotation.Rmd
    Modified:   analysis/TDP-04-clustering.Rmd
    Modified:   analysis/TDP-06-cluster_analysis.Rmd
    Modified:   analysis/_site.yml
    Modified:   analysis/organoid-02-integration.Rmd
    Modified:   analysis/organoid-04-group_integration.Rmd
    Modified:   analysis/organoid-06-conos-analysis.Rmd
    Modified:   analysis/timepoints-01-organoid-integration.Rmd

Note that any generated files, e.g. HTML, png, CSS, etc., are not included in this status report because it is ok for generated content to have uncommitted changes.


These are the previous versions of the repository in which changes were made to the R Markdown (analysis/TDP-07-01-STMN2_expression.Rmd) and HTML (docs/TDP-07-01-STMN2_expression.html) files. If you've configured a remote Git repository (see ?wflow_git_remote), click on the hyperlinks in the table below to view the files as they were in that past version.

File  Version  Author    Date        Message
Rmd   f01a91a  khembach  2021-02-25  add cryptic exon location and splice junctions to plot
html  5340095  khembach  2021-02-19  Build site.
Rmd   4ddbb0d  khembach  2021-02-19  text size
html  a7c4a5b  khembach  2021-02-18  Build site.
Rmd   1755a27  khembach  2021-02-18  plot stathmin2 read coverage of cells from cluster 12

Load packages

library(Seurat)
library(SingleCellExperiment)
library(dplyr)
library(Gviz)
library(TxDb.Hsapiens.UCSC.hg38.knownGene)
library(Rsamtools)
library(GenomicAlignments)
library(rtracklayer)

Load data & convert to SCE

so <- readRDS(file.path("output", "so_TDP_05_plasmid_expression.rds"))
so <- SetIdent(so, value = "RNA_snn_res.0.4")
so@meta.data$cluster_id <- Idents(so)

Cells from cluster 12

We want to compare the stathmin-2 (STMN2) read coverage of cells expressing TDP-HA (cluster 12) with that of other neuronal cells that do not express TDP-HA. To do this, we randomly select 5 cells from each group and extract the corresponding STMN2 reads from the BAM file.
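
The random selection step could look roughly like the following sketch. It uses a toy metadata table in base R; the barcodes, the group labels and the object names `meta` and `picked` are hypothetical and only stand in for the real cell metadata.

```r
# Hypothetical toy metadata: 20 cells per group with made-up barcodes.
set.seed(20200522)
meta <- data.frame(
  barcode = sprintf("CELL%03d-1", 1:40),
  group = rep(c("cluster12", "other_neurons"), each = 20),
  stringsAsFactors = FALSE
)

# Split the barcodes by group and draw 5 per group without replacement.
picked <- lapply(split(meta$barcode, meta$group), sample, size = 5)
picked
```

The resulting list holds one character vector of barcodes per group, which could then be used to pick out the matching reads by their cell barcode.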

clus12 <- subset(so, subset = cluster_id == "12")
## which samples do the cells come from?
clus12$sample_id %>% table
.
 TDP2wON TDP4wOFF TDP4wONa TDP4wONb 
      97        3       88       36 
## what is the range of TDP-HA expression in all cells in cluster 12?
dat_ha <- GetAssayData(object = clus12, slot = "data")["TDP43-HA",]
summary(dat_ha)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  0.000   1.226   1.771   1.800   2.454   4.112 
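
Given a per-cell expression vector like this, cells can be grouped by thresholding it and then selecting the matching columns, since cells are stored as columns and features as rows. The sketch below shows the idea on a plain base R matrix; the matrix `expr`, the cell names and the values are invented for illustration, and a Seurat object is not a plain matrix, so this only mirrors the indexing logic.

```r
# Toy features x cells matrix with hypothetical values.
expr <- matrix(
  c(0, 3.9, 0.2, 1.5,
    2, 1.0, 0.0, 4.0),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("TDP43-HA", "STMN2"),
                  c("cellA", "cellB", "cellC", "cellD"))
)

# One expression value per cell for the feature of interest.
ha <- expr["TDP43-HA", ]

# Threshold the named vector to get cell names for each group.
high_cells <- names(ha)[ha > 3.5]
low_cells <- names(ha)[ha > 0 & ha < 0.5]

# Cells are columns, so the selection goes in the second index position.
expr_high <- expr[, high_cells, drop = FALSE]
expr_low <- expr[, low_cells, drop = FALSE]
```

Selecting by cell name rather than by position keeps the selection correct even if the column order changes.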
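## select cells with high TDP-HA expression
## cells are the columns of a Seurat object, so the index goes in the second position;
## a single index, clus12[idx], selects features and keeps every cell
high <- clus12[, which(dat_ha > 3.5)]
high$barcode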
 AAAGGATGTGTCATTG-1.TDP2wON  AAAGTGAAGAGATTCA-1.TDP2wON 
       "AAAGGATGTGTCATTG-1"        "AAAGTGAAGAGATTCA-1" 
 AACAACCGTGGCCTCA-1.TDP2wON  AAGAACAAGCGTACAG-1.TDP2wON 
       "AACAACCGTGGCCTCA-1"        "AAGAACAAGCGTACAG-1" 
 AAGACTCGTTGCAACT-1.TDP2wON  AATCACGCAAATGGAT-1.TDP2wON 
       "AAGACTCGTTGCAACT-1"        "AATCACGCAAATGGAT-1" 
 AATCGTGTCCGTGGGT-1.TDP2wON  AATTCCTCACACAGCC-1.TDP2wON 
       "AATCGTGTCCGTGGGT-1"        "AATTCCTCACACAGCC-1" 
 ACCATTTAGGCTCCCA-1.TDP2wON  ACGGTCGCAACGAGGT-1.TDP2wON 
       "ACCATTTAGGCTCCCA-1"        "ACGGTCGCAACGAGGT-1" 
 ACTATGGAGCCATCCG-1.TDP2wON  ACTGATGCATAGATGA-1.TDP2wON 
       "ACTATGGAGCCATCCG-1"        "ACTGATGCATAGATGA-1" 
 AGCCAATGTCATGCAT-1.TDP2wON  AGCCACGGTGTACATC-1.TDP2wON 
       "AGCCAATGTCATGCAT-1"        "AGCCACGGTGTACATC-1" 
 AGGAGGTTCCGAAATC-1.TDP2wON  AGGGTTTAGTTGGGAC-1.TDP2wON 
       "AGGAGGTTCCGAAATC-1"        "AGGGTTTAGTTGGGAC-1" 
 ATACCGACAAGATTGA-1.TDP2wON  ATAGACCAGCATTTCG-1.TDP2wON 
       "ATACCGACAAGATTGA-1"        "ATAGACCAGCATTTCG-1" 
 ATCGGATGTGTGATGG-1.TDP2wON  ATGGATCAGCCAGAGT-1.TDP2wON 
       "ATCGGATGTGTGATGG-1"        "ATGGATCAGCCAGAGT-1" 
 ATTACCTAGTTGCCCG-1.TDP2wON  ATTACTCAGAAAGTCT-1.TDP2wON 
       "ATTACCTAGTTGCCCG-1"        "ATTACTCAGAAAGTCT-1" 
 ATTACTCTCCGGTTCT-1.TDP2wON  ATTCCCGAGGTTGGAC-1.TDP2wON 
       "ATTACTCTCCGGTTCT-1"        "ATTCCCGAGGTTGGAC-1" 
 ATTCCTAAGCTTTCCC-1.TDP2wON  CAACAACCAACCGACC-1.TDP2wON 
       "ATTCCTAAGCTTTCCC-1"        "CAACAACCAACCGACC-1" 
 CAACGATTCCCGTGAG-1.TDP2wON  CACTTCGCAACGTTAC-1.TDP2wON 
       "CAACGATTCCCGTGAG-1"        "CACTTCGCAACGTTAC-1" 
 CAGCCAGTCAATCTCT-1.TDP2wON  CAGGGCTCACCCTAGG-1.TDP2wON 
       "CAGCCAGTCAATCTCT-1"        "CAGGGCTCACCCTAGG-1" 
 CATGAGTCACCAAAGG-1.TDP2wON  CATTGAGGTACGTGAG-1.TDP2wON 
       "CATGAGTCACCAAAGG-1"        "CATTGAGGTACGTGAG-1" 
 CCTTCAGGTAGATCGG-1.TDP2wON  CGCCATTCACAGCTTA-1.TDP2wON 
       "CCTTCAGGTAGATCGG-1"        "CGCCATTCACAGCTTA-1" 
 CGTAAGTAGCAATTAG-1.TDP2wON  CGTCCATTCTTACTGT-1.TDP2wON 
       "CGTAAGTAGCAATTAG-1"        "CGTCCATTCTTACTGT-1" 
 CGTTAGAAGGCATGGT-1.TDP2wON  CGTTCTGTCTTCGTGC-1.TDP2wON 
       "CGTTAGAAGGCATGGT-1"        "CGTTCTGTCTTCGTGC-1" 
 CTAGACAGTCGAGTGA-1.TDP2wON  CTCAGAATCTAGTCAG-1.TDP2wON 
       "CTAGACAGTCGAGTGA-1"        "CTCAGAATCTAGTCAG-1" 
 CTCGAGGCAGCGTATT-1.TDP2wON  CTGAGCGAGGTCACAG-1.TDP2wON 
       "CTCGAGGCAGCGTATT-1"        "CTGAGCGAGGTCACAG-1" 
 CTGCAGGAGTACTGGG-1.TDP2wON  CTGCGAGAGACCTTTG-1.TDP2wON 
       "CTGCAGGAGTACTGGG-1"        "CTGCGAGAGACCTTTG-1" 
 CTTAGGAAGTTATGGA-1.TDP2wON  CTTCTCTAGCCTATCA-1.TDP2wON 
       "CTTAGGAAGTTATGGA-1"        "CTTCTCTAGCCTATCA-1" 
 GAAACCTGTGGTAATA-1.TDP2wON  GACCGTGGTAGTGGCA-1.TDP2wON 
       "GAAACCTGTGGTAATA-1"        "GACCGTGGTAGTGGCA-1" 
 GACTTCCAGGGCTTCC-1.TDP2wON  GAGGCAATCGGTAACT-1.TDP2wON 
       "GACTTCCAGGGCTTCC-1"        "GAGGCAATCGGTAACT-1" 
 GAGGGATTCCGTTGGG-1.TDP2wON  GAGTTGTTCCTGGTCT-1.TDP2wON 
       "GAGGGATTCCGTTGGG-1"        "GAGTTGTTCCTGGTCT-1" 
 GATAGCTCATCAGCTA-1.TDP2wON  GCTACCTGTCAGATTC-1.TDP2wON 
       "GATAGCTCATCAGCTA-1"        "GCTACCTGTCAGATTC-1" 
 GCTCAAACATCGATCA-1.TDP2wON  GCTGCAGCAACGCATT-1.TDP2wON 
       "GCTCAAACATCGATCA-1"        "GCTGCAGCAACGCATT-1" 
 GCTTTCGCACTACACA-1.TDP2wON  GGCTGTGCACTGCGTG-1.TDP2wON 
       "GCTTTCGCACTACACA-1"        "GGCTGTGCACTGCGTG-1" 
 GGCTTTCTCGTCAGAT-1.TDP2wON  GGGTTATGTTACAGCT-1.TDP2wON 
       "GGCTTTCTCGTCAGAT-1"        "GGGTTATGTTACAGCT-1" 
 GGTTAACAGGTCGACA-1.TDP2wON  GTAGTACAGTCGGGAT-1.TDP2wON 
       "GGTTAACAGGTCGACA-1"        "GTAGTACAGTCGGGAT-1" 
 GTCAAGTAGGACAAGA-1.TDP2wON  GTCATTTAGCTGGTGA-1.TDP2wON 
       "GTCAAGTAGGACAAGA-1"        "GTCATTTAGCTGGTGA-1" 
 GTCTAGACAATTGAAG-1.TDP2wON  GTGAGGAGTCGTTGGC-1.TDP2wON 
       "GTCTAGACAATTGAAG-1"        "GTGAGGAGTCGTTGGC-1" 
 GTGGTTACAAGATCCT-1.TDP2wON  GTGTAACAGAGCCGTA-1.TDP2wON 
       "GTGGTTACAAGATCCT-1"        "GTGTAACAGAGCCGTA-1" 
 TAACACGAGATTCGAA-1.TDP2wON  TACCCGTAGTTAGTAG-1.TDP2wON 
       "TAACACGAGATTCGAA-1"        "TACCCGTAGTTAGTAG-1" 
 TACTTCAGTGGTAACG-1.TDP2wON  TAGATCGCACGAAGAC-1.TDP2wON 
       "TACTTCAGTGGTAACG-1"        "TAGATCGCACGAAGAC-1" 
 TATCAGGTCGTTGCCT-1.TDP2wON  TCAATCTTCGCCAACG-1.TDP2wON 
       "TATCAGGTCGTTGCCT-1"        "TCAATCTTCGCCAACG-1" 
 TCACTCGCATGAGGGT-1.TDP2wON  TCAGGGCGTGAGACCA-1.TDP2wON 
       "TCACTCGCATGAGGGT-1"        "TCAGGGCGTGAGACCA-1" 
 TCAGTCCAGGGCCAAT-1.TDP2wON  TCATGGAGTCTCTCCA-1.TDP2wON 
       "TCAGTCCAGGGCCAAT-1"        "TCATGGAGTCTCTCCA-1" 
 TCCATCGCATTGTAGC-1.TDP2wON  TCGGGTGAGGTCGTCC-1.TDP2wON 
       "TCCATCGCATTGTAGC-1"        "TCGGGTGAGGTCGTCC-1" 
 TCGGGTGGTCGTCTCT-1.TDP2wON  TCTAACTAGCACTAGG-1.TDP2wON 
       "TCGGGTGGTCGTCTCT-1"        "TCTAACTAGCACTAGG-1" 
 TCTACATGTTGTCCCT-1.TDP2wON  TCTACCGGTTTCCATT-1.TDP2wON 
       "TCTACATGTTGTCCCT-1"        "TCTACCGGTTTCCATT-1" 
 TGAGCGCTCCATCGTC-1.TDP2wON  TGCAGTATCGCCGTGA-1.TDP2wON 
       "TGAGCGCTCCATCGTC-1"        "TGCAGTATCGCCGTGA-1" 
 TGCTCGTCAAAGACGC-1.TDP2wON  TGCTTCGAGCAGCAGT-1.TDP2wON 
       "TGCTCGTCAAAGACGC-1"        "TGCTTCGAGCAGCAGT-1" 
 TGGGAGACAGCGTGCT-1.TDP2wON  TGTCCTGAGGTCTTTG-1.TDP2wON 
       "TGGGAGACAGCGTGCT-1"        "TGTCCTGAGGTCTTTG-1" 
 TTCCAATAGCGACTTT-1.TDP2wON  TTCCTCTGTACCGGAA-1.TDP2wON 
       "TTCCAATAGCGACTTT-1"        "TTCCTCTGTACCGGAA-1" 
 TTCGCTGAGCTTTCCC-1.TDP2wON  TTGGATGAGCGGCTCT-1.TDP2wON 
       "TTCGCTGAGCTTTCCC-1"        "TTGGATGAGCGGCTCT-1" 
 TTGGGTAAGTATAGGT-1.TDP2wON  TTTACGTCACAGCCAC-1.TDP2wON 
       "TTGGGTAAGTATAGGT-1"        "TTTACGTCACAGCCAC-1" 
 TTTGACTGTTATGTCG-1.TDP2wON AACCCAACATTCTTCA-1.TDP4wOFF 
       "TTTGACTGTTATGTCG-1"        "AACCCAACATTCTTCA-1" 
ATTCCCGAGTGATGGC-1.TDP4wOFF CAGTTCCTCATGCCGG-1.TDP4wOFF 
       "ATTCCCGAGTGATGGC-1"        "CAGTTCCTCATGCCGG-1" 
AAAGGATTCGTCAAAC-1.TDP4wONa AATGGCTTCAGGGATG-1.TDP4wONa 
       "AAAGGATTCGTCAAAC-1"        "AATGGCTTCAGGGATG-1" 
ACAGAAACATCGGCCA-1.TDP4wONa ACGATCAGTCATAAAG-1.TDP4wONa 
       "ACAGAAACATCGGCCA-1"        "ACGATCAGTCATAAAG-1" 
ACTTTCATCCACATAG-1.TDP4wONa AGCGATTGTGGGTCAA-1.TDP4wONa 
       "ACTTTCATCCACATAG-1"        "AGCGATTGTGGGTCAA-1" 
AGGGCTCAGTTTCTTC-1.TDP4wONa AGTTAGCAGTGCTACT-1.TDP4wONa 
       "AGGGCTCAGTTTCTTC-1"        "AGTTAGCAGTGCTACT-1" 
ATAGAGAGTGTTGATC-1.TDP4wONa ATCGCCTAGCTAGAGC-1.TDP4wONa 
       "ATAGAGAGTGTTGATC-1"        "ATCGCCTAGCTAGAGC-1" 
ATGAGGGGTACGATCT-1.TDP4wONa ATGATCGCACCGGAAA-1.TDP4wONa 
       "ATGAGGGGTACGATCT-1"        "ATGATCGCACCGGAAA-1" 
ATTCAGGTCGGACAAG-1.TDP4wONa ATTTACCCATCACAGT-1.TDP4wONa 
       "ATTCAGGTCGGACAAG-1"        "ATTTACCCATCACAGT-1" 
ATTTCACAGAGTCACG-1.TDP4wONa CAACGATGTGAGACCA-1.TDP4wONa 
       "ATTTCACAGAGTCACG-1"        "CAACGATGTGAGACCA-1" 
CACCAAAGTTAACCTG-1.TDP4wONa CAGATACCACGGATCC-1.TDP4wONa 
       "CACCAAAGTTAACCTG-1"        "CAGATACCACGGATCC-1" 
CATCAAGTCCCAAGTA-1.TDP4wONa CCAATGAGTGCGGTAA-1.TDP4wONa 
       "CATCAAGTCCCAAGTA-1"        "CCAATGAGTGCGGTAA-1" 
CCGGTAGTCTCGGGAC-1.TDP4wONa CCTCATGCATAGGTAA-1.TDP4wONa 
       "CCGGTAGTCTCGGGAC-1"        "CCTCATGCATAGGTAA-1" 
CCTGCATAGAGCCGTA-1.TDP4wONa CGACAGCAGAGGCCAT-1.TDP4wONa 
       "CCTGCATAGAGCCGTA-1"        "CGACAGCAGAGGCCAT-1" 
CGGAGAAGTTCTGAGT-1.TDP4wONa CGGGTCACATTACTCT-1.TDP4wONa 
       "CGGAGAAGTTCTGAGT-1"        "CGGGTCACATTACTCT-1" 
CGTTAGAAGGCCACTC-1.TDP4wONa CTAGACAAGATCACCT-1.TDP4wONa 
       "CGTTAGAAGGCCACTC-1"        "CTAGACAAGATCACCT-1" 
CTCCCTCCATATGGCT-1.TDP4wONa CTCGAGGAGTTGCATC-1.TDP4wONa 
       "CTCCCTCCATATGGCT-1"        "CTCGAGGAGTTGCATC-1" 
CTGCCTATCAAAGACA-1.TDP4wONa CTGTCGTTCTACCAGA-1.TDP4wONa 
       "CTGCCTATCAAAGACA-1"        "CTGTCGTTCTACCAGA-1" 
CTTCTAACACCACATA-1.TDP4wONa GAAGCGAGTTGTCCCT-1.TDP4wONa 
       "CTTCTAACACCACATA-1"        "GAAGCGAGTTGTCCCT-1" 
GAAGTAAGTCCCAAAT-1.TDP4wONa GACCGTGGTCTTCCGT-1.TDP4wONa 
       "GAAGTAAGTCCCAAAT-1"        "GACCGTGGTCTTCCGT-1" 
GAGACCCTCGAGCACC-1.TDP4wONa GAGGGATCACGAGAAC-1.TDP4wONa 
       "GAGACCCTCGAGCACC-1"        "GAGGGATCACGAGAAC-1" 
GATAGCTGTACTGCGC-1.TDP4wONa GATGCTAGTCACTGAT-1.TDP4wONa 
       "GATAGCTGTACTGCGC-1"        "GATGCTAGTCACTGAT-1" 
GCAGTTAAGGTGGCTA-1.TDP4wONa GCATCTCGTAGTATAG-1.TDP4wONa 
       "GCAGTTAAGGTGGCTA-1"        "GCATCTCGTAGTATAG-1" 
GCATCTCTCCGGGACT-1.TDP4wONa GCCCAGATCTCCTGAC-1.TDP4wONa 
       "GCATCTCTCCGGGACT-1"        "GCCCAGATCTCCTGAC-1" 
GCTGGGTTCCCGAGAC-1.TDP4wONa GGAGAACGTAAGATTG-1.TDP4wONa 
       "GCTGGGTTCCCGAGAC-1"        "GGAGAACGTAAGATTG-1" 
GGGTATTCAAGCGCAA-1.TDP4wONa GGGTCTGTCGATACTG-1.TDP4wONa 
       "GGGTATTCAAGCGCAA-1"        "GGGTCTGTCGATACTG-1" 
GGTTCTCAGACCAAGC-1.TDP4wONa GTAGCTACATACAGGG-1.TDP4wONa 
       "GGTTCTCAGACCAAGC-1"        "GTAGCTACATACAGGG-1" 
GTAGTACAGTTGCCTA-1.TDP4wONa GTATTGGCATGATGCT-1.TDP4wONa 
       "GTAGTACAGTTGCCTA-1"        "GTATTGGCATGATGCT-1" 
GTCTACCGTACAGGTG-1.TDP4wONa GTCTCACCACCATATG-1.TDP4wONa 
       "GTCTACCGTACAGGTG-1"        "GTCTCACCACCATATG-1" 
GTCTTTATCAGCTCTC-1.TDP4wONa GTCTTTATCATCCTGC-1.TDP4wONa 
       "GTCTTTATCAGCTCTC-1"        "GTCTTTATCATCCTGC-1" 
GTGCAGCAGTCCTGCG-1.TDP4wONa GTGTTCCTCCAAGGGA-1.TDP4wONa 
       "GTGCAGCAGTCCTGCG-1"        "GTGTTCCTCCAAGGGA-1" 
GTTAGACTCGCTACGG-1.TDP4wONa GTTCGCTCACTGGACC-1.TDP4wONa 
       "GTTAGACTCGCTACGG-1"        "GTTCGCTCACTGGACC-1" 
TAACGACGTCTTCCGT-1.TDP4wONa TACAGGTAGTTGGGAC-1.TDP4wONa 
       "TAACGACGTCTTCCGT-1"        "TACAGGTAGTTGGGAC-1" 
TACCCACAGGGAGAAT-1.TDP4wONa TACGGGCTCGAAATCC-1.TDP4wONa 
       "TACCCACAGGGAGAAT-1"        "TACGGGCTCGAAATCC-1" 
TATATCCGTAGCCCTG-1.TDP4wONa TCACACCCATCACCAA-1.TDP4wONa 
       "TATATCCGTAGCCCTG-1"        "TCACACCCATCACCAA-1" 
TCACATTAGTCCGTCG-1.TDP4wONa TCACTCGCAACAGAGC-1.TDP4wONa 
       "TCACATTAGTCCGTCG-1"        "TCACTCGCAACAGAGC-1" 
TCCCATGCAATTGCGT-1.TDP4wONa TCTACCGGTAAGTTAG-1.TDP4wONa 
       "TCCCATGCAATTGCGT-1"        "TCTACCGGTAAGTTAG-1" 
TGACTCCTCCACCTCA-1.TDP4wONa TGAGACTAGGACACTG-1.TDP4wONa 
       "TGACTCCTCCACCTCA-1"        "TGAGACTAGGACACTG-1" 
TGCATCCGTTACGCCG-1.TDP4wONa TGCATGATCGCGATCG-1.TDP4wONa 
       "TGCATCCGTTACGCCG-1"        "TGCATGATCGCGATCG-1" 
TGTAAGCTCCGTGCGA-1.TDP4wONa TGTCAGAAGTTTGCTG-1.TDP4wONa 
       "TGTAAGCTCCGTGCGA-1"        "TGTCAGAAGTTTGCTG-1" 
TGTCCCAAGTGCAACG-1.TDP4wONa TGTGGCGAGCACTAGG-1.TDP4wONa 
       "TGTCCCAAGTGCAACG-1"        "TGTGGCGAGCACTAGG-1" 
TGTTCTACACATAGCT-1.TDP4wONa TTACGTTTCGCGTTTC-1.TDP4wONa 
       "TGTTCTACACATAGCT-1"        "TTACGTTTCGCGTTTC-1" 
TTAGTCTGTCCTGAAT-1.TDP4wONa TTCCTTCCAGTTCCAA-1.TDP4wONa 
       "TTAGTCTGTCCTGAAT-1"        "TTCCTTCCAGTTCCAA-1" 
TTCGATTTCTCAAAGC-1.TDP4wONa TTCTAGTAGAAGATCT-1.TDP4wONa 
       "TTCGATTTCTCAAAGC-1"        "TTCTAGTAGAAGATCT-1" 
TTCTCTCCAGCGAGTA-1.TDP4wONa TTGGGCGAGGATCATA-1.TDP4wONa 
       "TTCTCTCCAGCGAGTA-1"        "TTGGGCGAGGATCATA-1" 
TTTACCAAGCGGGTTA-1.TDP4wONa TTTATGCAGCGAACTG-1.TDP4wONa 
       "TTTACCAAGCGGGTTA-1"        "TTTATGCAGCGAACTG-1" 
AATCGACGTTCGAGCC-1.TDP4wONb ACACTGAAGACGGATC-1.TDP4wONb 
       "AATCGACGTTCGAGCC-1"        "ACACTGAAGACGGATC-1" 
AGAACCTTCCCTATTA-1.TDP4wONb AGATGCTTCTGTCAGA-1.TDP4wONb 
       "AGAACCTTCCCTATTA-1"        "AGATGCTTCTGTCAGA-1" 
AGCGCTGTCAAGTGTC-1.TDP4wONb ATCCTATCAATAGTGA-1.TDP4wONb 
       "AGCGCTGTCAAGTGTC-1"        "ATCCTATCAATAGTGA-1" 
CAGGGCTAGGTCATAA-1.TDP4wONb CCCTAACCACCTTCGT-1.TDP4wONb 
       "CAGGGCTAGGTCATAA-1"        "CCCTAACCACCTTCGT-1" 
CGAAGTTAGACGGATC-1.TDP4wONb CGCCAGATCGCAACAT-1.TDP4wONb 
       "CGAAGTTAGACGGATC-1"        "CGCCAGATCGCAACAT-1" 
CGTAGTACAAGCTGTT-1.TDP4wONb CTCATCGAGTATAGGT-1.TDP4wONb 
       "CGTAGTACAAGCTGTT-1"        "CTCATCGAGTATAGGT-1" 
CTCATCGCAAGCAGGT-1.TDP4wONb CTCATTAGTAGGAGGG-1.TDP4wONb 
       "CTCATCGCAAGCAGGT-1"        "CTCATTAGTAGGAGGG-1" 
CTGAGGCTCCTCTAAT-1.TDP4wONb CTGCCATGTACGTGTT-1.TDP4wONb 
       "CTGAGGCTCCTCTAAT-1"        "CTGCCATGTACGTGTT-1" 
GAAATGACAAGGCTTT-1.TDP4wONb GAGTCATCAGAGTCAG-1.TDP4wONb 
       "GAAATGACAAGGCTTT-1"        "GAGTCATCAGAGTCAG-1" 
GCAGCTGTCACGGGCT-1.TDP4wONb GCCGATGCAATCACGT-1.TDP4wONb 
       "GCAGCTGTCACGGGCT-1"        "GCCGATGCAATCACGT-1" 
GGGAGTACAGCTGTTA-1.TDP4wONb GTAGAGGCATATCTGG-1.TDP4wONb 
       "GGGAGTACAGCTGTTA-1"        "GTAGAGGCATATCTGG-1" 
GTGTTCCCACGCCACA-1.TDP4wONb TAAGTCGAGATCGCCC-1.TDP4wONb 
       "GTGTTCCCACGCCACA-1"        "TAAGTCGAGATCGCCC-1" 
TCAGTGAAGCTGGCCT-1.TDP4wONb TCATACTTCAAGGACG-1.TDP4wONb 
       "TCAGTGAAGCTGGCCT-1"        "TCATACTTCAAGGACG-1" 
TCTACCGTCTCTGGTC-1.TDP4wONb TCTCACGCAATTGCTG-1.TDP4wONb 
       "TCTACCGTCTCTGGTC-1"        "TCTCACGCAATTGCTG-1" 
TGATCAGAGCTCCGAC-1.TDP4wONb TGCTTGCAGTGGACTG-1.TDP4wONb 
       "TGATCAGAGCTCCGAC-1"        "TGCTTGCAGTGGACTG-1" 
TGTTCCGGTACTCGAT-1.TDP4wONb TGTTTGTAGCCGAATG-1.TDP4wONb 
       "TGTTCCGGTACTCGAT-1"        "TGTTTGTAGCCGAATG-1" 
TTCATGTAGCATCCCG-1.TDP4wONb TTCATGTGTTGTTTGG-1.TDP4wONb 
       "TTCATGTAGCATCCCG-1"        "TTCATGTGTTGTTTGG-1" 
TTGCCTGAGCACTTTG-1.TDP4wONb TTGTTGTTCAGAGCAG-1.TDP4wONb 
       "TTGCCTGAGCACTTTG-1"        "TTGTTGTTCAGAGCAG-1" 
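## cells with low (but non-zero) TDP-HA expression
## cells are the columns of a Seurat object, so the index goes in the second position;
## a single index, clus12[idx], selects features and keeps every cell
low <- clus12[, which(dat_ha < 0.5 & dat_ha > 0)]
low$barcode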
 AAAGGATGTGTCATTG-1.TDP2wON  AAAGTGAAGAGATTCA-1.TDP2wON 
       "AAAGGATGTGTCATTG-1"        "AAAGTGAAGAGATTCA-1" 
 AACAACCGTGGCCTCA-1.TDP2wON  AAGAACAAGCGTACAG-1.TDP2wON 
       "AACAACCGTGGCCTCA-1"        "AAGAACAAGCGTACAG-1" 
 AAGACTCGTTGCAACT-1.TDP2wON  AATCACGCAAATGGAT-1.TDP2wON 
       "AAGACTCGTTGCAACT-1"        "AATCACGCAAATGGAT-1" 
 AATCGTGTCCGTGGGT-1.TDP2wON  AATTCCTCACACAGCC-1.TDP2wON 
       "AATCGTGTCCGTGGGT-1"        "AATTCCTCACACAGCC-1" 
 ACCATTTAGGCTCCCA-1.TDP2wON  ACGGTCGCAACGAGGT-1.TDP2wON 
       "ACCATTTAGGCTCCCA-1"        "ACGGTCGCAACGAGGT-1" 
 ACTATGGAGCCATCCG-1.TDP2wON  ACTGATGCATAGATGA-1.TDP2wON 
       "ACTATGGAGCCATCCG-1"        "ACTGATGCATAGATGA-1" 
 AGCCAATGTCATGCAT-1.TDP2wON  AGCCACGGTGTACATC-1.TDP2wON 
       "AGCCAATGTCATGCAT-1"        "AGCCACGGTGTACATC-1" 
 AGGAGGTTCCGAAATC-1.TDP2wON  AGGGTTTAGTTGGGAC-1.TDP2wON 
       "AGGAGGTTCCGAAATC-1"        "AGGGTTTAGTTGGGAC-1" 
 ATACCGACAAGATTGA-1.TDP2wON  ATAGACCAGCATTTCG-1.TDP2wON 
       "ATACCGACAAGATTGA-1"        "ATAGACCAGCATTTCG-1" 
 ATCGGATGTGTGATGG-1.TDP2wON  ATGGATCAGCCAGAGT-1.TDP2wON 
       "ATCGGATGTGTGATGG-1"        "ATGGATCAGCCAGAGT-1" 
 ATTACCTAGTTGCCCG-1.TDP2wON  ATTACTCAGAAAGTCT-1.TDP2wON 
       "ATTACCTAGTTGCCCG-1"        "ATTACTCAGAAAGTCT-1" 
 ATTACTCTCCGGTTCT-1.TDP2wON  ATTCCCGAGGTTGGAC-1.TDP2wON 
       "ATTACTCTCCGGTTCT-1"        "ATTCCCGAGGTTGGAC-1" 
 ATTCCTAAGCTTTCCC-1.TDP2wON  CAACAACCAACCGACC-1.TDP2wON 
       "ATTCCTAAGCTTTCCC-1"        "CAACAACCAACCGACC-1" 
 CAACGATTCCCGTGAG-1.TDP2wON  CACTTCGCAACGTTAC-1.TDP2wON 
       "CAACGATTCCCGTGAG-1"        "CACTTCGCAACGTTAC-1" 
 CAGCCAGTCAATCTCT-1.TDP2wON  CAGGGCTCACCCTAGG-1.TDP2wON 
       "CAGCCAGTCAATCTCT-1"        "CAGGGCTCACCCTAGG-1" 
 CATGAGTCACCAAAGG-1.TDP2wON  CATTGAGGTACGTGAG-1.TDP2wON 
       "CATGAGTCACCAAAGG-1"        "CATTGAGGTACGTGAG-1" 
 CCTTCAGGTAGATCGG-1.TDP2wON  CGCCATTCACAGCTTA-1.TDP2wON 
       "CCTTCAGGTAGATCGG-1"        "CGCCATTCACAGCTTA-1" 
 CGTAAGTAGCAATTAG-1.TDP2wON  CGTCCATTCTTACTGT-1.TDP2wON 
       "CGTAAGTAGCAATTAG-1"        "CGTCCATTCTTACTGT-1" 
 CGTTAGAAGGCATGGT-1.TDP2wON  CGTTCTGTCTTCGTGC-1.TDP2wON 
       "CGTTAGAAGGCATGGT-1"        "CGTTCTGTCTTCGTGC-1" 
 CTAGACAGTCGAGTGA-1.TDP2wON  CTCAGAATCTAGTCAG-1.TDP2wON 
       "CTAGACAGTCGAGTGA-1"        "CTCAGAATCTAGTCAG-1" 
 CTCGAGGCAGCGTATT-1.TDP2wON  CTGAGCGAGGTCACAG-1.TDP2wON 
       "CTCGAGGCAGCGTATT-1"        "CTGAGCGAGGTCACAG-1" 
 CTGCAGGAGTACTGGG-1.TDP2wON  CTGCGAGAGACCTTTG-1.TDP2wON 
       "CTGCAGGAGTACTGGG-1"        "CTGCGAGAGACCTTTG-1" 
 CTTAGGAAGTTATGGA-1.TDP2wON  CTTCTCTAGCCTATCA-1.TDP2wON 
       "CTTAGGAAGTTATGGA-1"        "CTTCTCTAGCCTATCA-1" 
 GAAACCTGTGGTAATA-1.TDP2wON  GACCGTGGTAGTGGCA-1.TDP2wON 
       "GAAACCTGTGGTAATA-1"        "GACCGTGGTAGTGGCA-1" 
 GACTTCCAGGGCTTCC-1.TDP2wON  GAGGCAATCGGTAACT-1.TDP2wON 
       "GACTTCCAGGGCTTCC-1"        "GAGGCAATCGGTAACT-1" 
 GAGGGATTCCGTTGGG-1.TDP2wON  GAGTTGTTCCTGGTCT-1.TDP2wON 
       "GAGGGATTCCGTTGGG-1"        "GAGTTGTTCCTGGTCT-1" 
 GATAGCTCATCAGCTA-1.TDP2wON  GCTACCTGTCAGATTC-1.TDP2wON 
       "GATAGCTCATCAGCTA-1"        "GCTACCTGTCAGATTC-1" 
 GCTCAAACATCGATCA-1.TDP2wON  GCTGCAGCAACGCATT-1.TDP2wON 
       "GCTCAAACATCGATCA-1"        "GCTGCAGCAACGCATT-1" 
 GCTTTCGCACTACACA-1.TDP2wON  GGCTGTGCACTGCGTG-1.TDP2wON 
       "GCTTTCGCACTACACA-1"        "GGCTGTGCACTGCGTG-1" 
 GGCTTTCTCGTCAGAT-1.TDP2wON  GGGTTATGTTACAGCT-1.TDP2wON 
       "GGCTTTCTCGTCAGAT-1"        "GGGTTATGTTACAGCT-1" 
 GGTTAACAGGTCGACA-1.TDP2wON  GTAGTACAGTCGGGAT-1.TDP2wON 
       "GGTTAACAGGTCGACA-1"        "GTAGTACAGTCGGGAT-1" 
 GTCAAGTAGGACAAGA-1.TDP2wON  GTCATTTAGCTGGTGA-1.TDP2wON 
       "GTCAAGTAGGACAAGA-1"        "GTCATTTAGCTGGTGA-1" 
 GTCTAGACAATTGAAG-1.TDP2wON  GTGAGGAGTCGTTGGC-1.TDP2wON 
       "GTCTAGACAATTGAAG-1"        "GTGAGGAGTCGTTGGC-1" 
 GTGGTTACAAGATCCT-1.TDP2wON  GTGTAACAGAGCCGTA-1.TDP2wON 
       "GTGGTTACAAGATCCT-1"        "GTGTAACAGAGCCGTA-1" 
 TAACACGAGATTCGAA-1.TDP2wON  TACCCGTAGTTAGTAG-1.TDP2wON 
       "TAACACGAGATTCGAA-1"        "TACCCGTAGTTAGTAG-1" 
 TACTTCAGTGGTAACG-1.TDP2wON  TAGATCGCACGAAGAC-1.TDP2wON 
       "TACTTCAGTGGTAACG-1"        "TAGATCGCACGAAGAC-1" 
 TATCAGGTCGTTGCCT-1.TDP2wON  TCAATCTTCGCCAACG-1.TDP2wON 
       "TATCAGGTCGTTGCCT-1"        "TCAATCTTCGCCAACG-1" 
 TCACTCGCATGAGGGT-1.TDP2wON  TCAGGGCGTGAGACCA-1.TDP2wON 
       "TCACTCGCATGAGGGT-1"        "TCAGGGCGTGAGACCA-1" 
 TCAGTCCAGGGCCAAT-1.TDP2wON  TCATGGAGTCTCTCCA-1.TDP2wON 
       "TCAGTCCAGGGCCAAT-1"        "TCATGGAGTCTCTCCA-1" 
 TCCATCGCATTGTAGC-1.TDP2wON  TCGGGTGAGGTCGTCC-1.TDP2wON 
       "TCCATCGCATTGTAGC-1"        "TCGGGTGAGGTCGTCC-1" 
 TCGGGTGGTCGTCTCT-1.TDP2wON  TCTAACTAGCACTAGG-1.TDP2wON 
       "TCGGGTGGTCGTCTCT-1"        "TCTAACTAGCACTAGG-1" 
 TCTACATGTTGTCCCT-1.TDP2wON  TCTACCGGTTTCCATT-1.TDP2wON 
       "TCTACATGTTGTCCCT-1"        "TCTACCGGTTTCCATT-1" 
 TGAGCGCTCCATCGTC-1.TDP2wON  TGCAGTATCGCCGTGA-1.TDP2wON 
       "TGAGCGCTCCATCGTC-1"        "TGCAGTATCGCCGTGA-1" 
 TGCTCGTCAAAGACGC-1.TDP2wON  TGCTTCGAGCAGCAGT-1.TDP2wON 
       "TGCTCGTCAAAGACGC-1"        "TGCTTCGAGCAGCAGT-1" 
 TGGGAGACAGCGTGCT-1.TDP2wON  TGTCCTGAGGTCTTTG-1.TDP2wON 
       "TGGGAGACAGCGTGCT-1"        "TGTCCTGAGGTCTTTG-1" 
 TTCCAATAGCGACTTT-1.TDP2wON  TTCCTCTGTACCGGAA-1.TDP2wON 
       "TTCCAATAGCGACTTT-1"        "TTCCTCTGTACCGGAA-1" 
 TTCGCTGAGCTTTCCC-1.TDP2wON  TTGGATGAGCGGCTCT-1.TDP2wON 
       "TTCGCTGAGCTTTCCC-1"        "TTGGATGAGCGGCTCT-1" 
 TTGGGTAAGTATAGGT-1.TDP2wON  TTTACGTCACAGCCAC-1.TDP2wON 
       "TTGGGTAAGTATAGGT-1"        "TTTACGTCACAGCCAC-1" 
 TTTGACTGTTATGTCG-1.TDP2wON AACCCAACATTCTTCA-1.TDP4wOFF 
       "TTTGACTGTTATGTCG-1"        "AACCCAACATTCTTCA-1" 
ATTCCCGAGTGATGGC-1.TDP4wOFF CAGTTCCTCATGCCGG-1.TDP4wOFF 
       "ATTCCCGAGTGATGGC-1"        "CAGTTCCTCATGCCGG-1" 
AAAGGATTCGTCAAAC-1.TDP4wONa AATGGCTTCAGGGATG-1.TDP4wONa 
       "AAAGGATTCGTCAAAC-1"        "AATGGCTTCAGGGATG-1" 
ACAGAAACATCGGCCA-1.TDP4wONa ACGATCAGTCATAAAG-1.TDP4wONa 
       "ACAGAAACATCGGCCA-1"        "ACGATCAGTCATAAAG-1" 
ACTTTCATCCACATAG-1.TDP4wONa AGCGATTGTGGGTCAA-1.TDP4wONa 
       "ACTTTCATCCACATAG-1"        "AGCGATTGTGGGTCAA-1" 
AGGGCTCAGTTTCTTC-1.TDP4wONa AGTTAGCAGTGCTACT-1.TDP4wONa 
       "AGGGCTCAGTTTCTTC-1"        "AGTTAGCAGTGCTACT-1" 
ATAGAGAGTGTTGATC-1.TDP4wONa ATCGCCTAGCTAGAGC-1.TDP4wONa 
       "ATAGAGAGTGTTGATC-1"        "ATCGCCTAGCTAGAGC-1" 
ATGAGGGGTACGATCT-1.TDP4wONa ATGATCGCACCGGAAA-1.TDP4wONa 
       "ATGAGGGGTACGATCT-1"        "ATGATCGCACCGGAAA-1" 
ATTCAGGTCGGACAAG-1.TDP4wONa ATTTACCCATCACAGT-1.TDP4wONa 
       "ATTCAGGTCGGACAAG-1"        "ATTTACCCATCACAGT-1" 
ATTTCACAGAGTCACG-1.TDP4wONa CAACGATGTGAGACCA-1.TDP4wONa 
       "ATTTCACAGAGTCACG-1"        "CAACGATGTGAGACCA-1" 
CACCAAAGTTAACCTG-1.TDP4wONa CAGATACCACGGATCC-1.TDP4wONa 
       "CACCAAAGTTAACCTG-1"        "CAGATACCACGGATCC-1" 
CATCAAGTCCCAAGTA-1.TDP4wONa CCAATGAGTGCGGTAA-1.TDP4wONa 
       "CATCAAGTCCCAAGTA-1"        "CCAATGAGTGCGGTAA-1" 
CCGGTAGTCTCGGGAC-1.TDP4wONa CCTCATGCATAGGTAA-1.TDP4wONa 
       "CCGGTAGTCTCGGGAC-1"        "CCTCATGCATAGGTAA-1" 
CCTGCATAGAGCCGTA-1.TDP4wONa CGACAGCAGAGGCCAT-1.TDP4wONa 
       "CCTGCATAGAGCCGTA-1"        "CGACAGCAGAGGCCAT-1" 
CGGAGAAGTTCTGAGT-1.TDP4wONa CGGGTCACATTACTCT-1.TDP4wONa 
       "CGGAGAAGTTCTGAGT-1"        "CGGGTCACATTACTCT-1" 
CGTTAGAAGGCCACTC-1.TDP4wONa CTAGACAAGATCACCT-1.TDP4wONa 
       "CGTTAGAAGGCCACTC-1"        "CTAGACAAGATCACCT-1" 
CTCCCTCCATATGGCT-1.TDP4wONa CTCGAGGAGTTGCATC-1.TDP4wONa 
       "CTCCCTCCATATGGCT-1"        "CTCGAGGAGTTGCATC-1" 
CTGCCTATCAAAGACA-1.TDP4wONa CTGTCGTTCTACCAGA-1.TDP4wONa 
       "CTGCCTATCAAAGACA-1"        "CTGTCGTTCTACCAGA-1" 
CTTCTAACACCACATA-1.TDP4wONa GAAGCGAGTTGTCCCT-1.TDP4wONa 
       "CTTCTAACACCACATA-1"        "GAAGCGAGTTGTCCCT-1" 
GAAGTAAGTCCCAAAT-1.TDP4wONa GACCGTGGTCTTCCGT-1.TDP4wONa 
       "GAAGTAAGTCCCAAAT-1"        "GACCGTGGTCTTCCGT-1" 
GAGACCCTCGAGCACC-1.TDP4wONa GAGGGATCACGAGAAC-1.TDP4wONa 
       "GAGACCCTCGAGCACC-1"        "GAGGGATCACGAGAAC-1" 
GATAGCTGTACTGCGC-1.TDP4wONa GATGCTAGTCACTGAT-1.TDP4wONa 
       "GATAGCTGTACTGCGC-1"        "GATGCTAGTCACTGAT-1" 
GCAGTTAAGGTGGCTA-1.TDP4wONa GCATCTCGTAGTATAG-1.TDP4wONa 
       "GCAGTTAAGGTGGCTA-1"        "GCATCTCGTAGTATAG-1" 
GCATCTCTCCGGGACT-1.TDP4wONa GCCCAGATCTCCTGAC-1.TDP4wONa 
       "GCATCTCTCCGGGACT-1"        "GCCCAGATCTCCTGAC-1" 
GCTGGGTTCCCGAGAC-1.TDP4wONa GGAGAACGTAAGATTG-1.TDP4wONa 
       "GCTGGGTTCCCGAGAC-1"        "GGAGAACGTAAGATTG-1" 
GGGTATTCAAGCGCAA-1.TDP4wONa GGGTCTGTCGATACTG-1.TDP4wONa 
       "GGGTATTCAAGCGCAA-1"        "GGGTCTGTCGATACTG-1" 
GGTTCTCAGACCAAGC-1.TDP4wONa GTAGCTACATACAGGG-1.TDP4wONa 
       "GGTTCTCAGACCAAGC-1"        "GTAGCTACATACAGGG-1" 
GTAGTACAGTTGCCTA-1.TDP4wONa GTATTGGCATGATGCT-1.TDP4wONa 
       "GTAGTACAGTTGCCTA-1"        "GTATTGGCATGATGCT-1" 
GTCTACCGTACAGGTG-1.TDP4wONa GTCTCACCACCATATG-1.TDP4wONa 
       "GTCTACCGTACAGGTG-1"        "GTCTCACCACCATATG-1" 
GTCTTTATCAGCTCTC-1.TDP4wONa GTCTTTATCATCCTGC-1.TDP4wONa 
       "GTCTTTATCAGCTCTC-1"        "GTCTTTATCATCCTGC-1" 
GTGCAGCAGTCCTGCG-1.TDP4wONa GTGTTCCTCCAAGGGA-1.TDP4wONa 
       "GTGCAGCAGTCCTGCG-1"        "GTGTTCCTCCAAGGGA-1" 
GTTAGACTCGCTACGG-1.TDP4wONa GTTCGCTCACTGGACC-1.TDP4wONa 
       "GTTAGACTCGCTACGG-1"        "GTTCGCTCACTGGACC-1" 
TAACGACGTCTTCCGT-1.TDP4wONa TACAGGTAGTTGGGAC-1.TDP4wONa 
       "TAACGACGTCTTCCGT-1"        "TACAGGTAGTTGGGAC-1" 
TACCCACAGGGAGAAT-1.TDP4wONa TACGGGCTCGAAATCC-1.TDP4wONa 
       "TACCCACAGGGAGAAT-1"        "TACGGGCTCGAAATCC-1" 
TATATCCGTAGCCCTG-1.TDP4wONa TCACACCCATCACCAA-1.TDP4wONa 
       "TATATCCGTAGCCCTG-1"        "TCACACCCATCACCAA-1" 
TCACATTAGTCCGTCG-1.TDP4wONa TCACTCGCAACAGAGC-1.TDP4wONa 
       "TCACATTAGTCCGTCG-1"        "TCACTCGCAACAGAGC-1" 
TCCCATGCAATTGCGT-1.TDP4wONa TCTACCGGTAAGTTAG-1.TDP4wONa 
       "TCCCATGCAATTGCGT-1"        "TCTACCGGTAAGTTAG-1" 
TGACTCCTCCACCTCA-1.TDP4wONa TGAGACTAGGACACTG-1.TDP4wONa 
       "TGACTCCTCCACCTCA-1"        "TGAGACTAGGACACTG-1" 
TGCATCCGTTACGCCG-1.TDP4wONa TGCATGATCGCGATCG-1.TDP4wONa 
       "TGCATCCGTTACGCCG-1"        "TGCATGATCGCGATCG-1" 
TGTAAGCTCCGTGCGA-1.TDP4wONa TGTCAGAAGTTTGCTG-1.TDP4wONa 
       "TGTAAGCTCCGTGCGA-1"        "TGTCAGAAGTTTGCTG-1" 
TGTCCCAAGTGCAACG-1.TDP4wONa TGTGGCGAGCACTAGG-1.TDP4wONa 
       "TGTCCCAAGTGCAACG-1"        "TGTGGCGAGCACTAGG-1" 
TGTTCTACACATAGCT-1.TDP4wONa TTACGTTTCGCGTTTC-1.TDP4wONa 
       "TGTTCTACACATAGCT-1"        "TTACGTTTCGCGTTTC-1" 
TTAGTCTGTCCTGAAT-1.TDP4wONa TTCCTTCCAGTTCCAA-1.TDP4wONa 
       "TTAGTCTGTCCTGAAT-1"        "TTCCTTCCAGTTCCAA-1" 
TTCGATTTCTCAAAGC-1.TDP4wONa TTCTAGTAGAAGATCT-1.TDP4wONa 
       "TTCGATTTCTCAAAGC-1"        "TTCTAGTAGAAGATCT-1" 
TTCTCTCCAGCGAGTA-1.TDP4wONa TTGGGCGAGGATCATA-1.TDP4wONa 
       "TTCTCTCCAGCGAGTA-1"        "TTGGGCGAGGATCATA-1" 
TTTACCAAGCGGGTTA-1.TDP4wONa TTTATGCAGCGAACTG-1.TDP4wONa 
       "TTTACCAAGCGGGTTA-1"        "TTTATGCAGCGAACTG-1" 
AATCGACGTTCGAGCC-1.TDP4wONb ACACTGAAGACGGATC-1.TDP4wONb 
       "AATCGACGTTCGAGCC-1"        "ACACTGAAGACGGATC-1" 
AGAACCTTCCCTATTA-1.TDP4wONb AGATGCTTCTGTCAGA-1.TDP4wONb 
       "AGAACCTTCCCTATTA-1"        "AGATGCTTCTGTCAGA-1" 
AGCGCTGTCAAGTGTC-1.TDP4wONb ATCCTATCAATAGTGA-1.TDP4wONb 
       "AGCGCTGTCAAGTGTC-1"        "ATCCTATCAATAGTGA-1" 
CAGGGCTAGGTCATAA-1.TDP4wONb CCCTAACCACCTTCGT-1.TDP4wONb 
       "CAGGGCTAGGTCATAA-1"        "CCCTAACCACCTTCGT-1" 
CGAAGTTAGACGGATC-1.TDP4wONb CGCCAGATCGCAACAT-1.TDP4wONb 
       "CGAAGTTAGACGGATC-1"        "CGCCAGATCGCAACAT-1" 
CGTAGTACAAGCTGTT-1.TDP4wONb CTCATCGAGTATAGGT-1.TDP4wONb 
       "CGTAGTACAAGCTGTT-1"        "CTCATCGAGTATAGGT-1" 
CTCATCGCAAGCAGGT-1.TDP4wONb CTCATTAGTAGGAGGG-1.TDP4wONb 
       "CTCATCGCAAGCAGGT-1"        "CTCATTAGTAGGAGGG-1" 
CTGAGGCTCCTCTAAT-1.TDP4wONb CTGCCATGTACGTGTT-1.TDP4wONb 
       "CTGAGGCTCCTCTAAT-1"        "CTGCCATGTACGTGTT-1" 
GAAATGACAAGGCTTT-1.TDP4wONb GAGTCATCAGAGTCAG-1.TDP4wONb 
       "GAAATGACAAGGCTTT-1"        "GAGTCATCAGAGTCAG-1" 
GCAGCTGTCACGGGCT-1.TDP4wONb GCCGATGCAATCACGT-1.TDP4wONb 
       "GCAGCTGTCACGGGCT-1"        "GCCGATGCAATCACGT-1" 
GGGAGTACAGCTGTTA-1.TDP4wONb GTAGAGGCATATCTGG-1.TDP4wONb 
       "GGGAGTACAGCTGTTA-1"        "GTAGAGGCATATCTGG-1" 
GTGTTCCCACGCCACA-1.TDP4wONb TAAGTCGAGATCGCCC-1.TDP4wONb 
       "GTGTTCCCACGCCACA-1"        "TAAGTCGAGATCGCCC-1" 
TCAGTGAAGCTGGCCT-1.TDP4wONb TCATACTTCAAGGACG-1.TDP4wONb 
       "TCAGTGAAGCTGGCCT-1"        "TCATACTTCAAGGACG-1" 
TCTACCGTCTCTGGTC-1.TDP4wONb TCTCACGCAATTGCTG-1.TDP4wONb 
       "TCTACCGTCTCTGGTC-1"        "TCTCACGCAATTGCTG-1" 
TGATCAGAGCTCCGAC-1.TDP4wONb TGCTTGCAGTGGACTG-1.TDP4wONb 
       "TGATCAGAGCTCCGAC-1"        "TGCTTGCAGTGGACTG-1" 
TGTTCCGGTACTCGAT-1.TDP4wONb TGTTTGTAGCCGAATG-1.TDP4wONb 
       "TGTTCCGGTACTCGAT-1"        "TGTTTGTAGCCGAATG-1" 
TTCATGTAGCATCCCG-1.TDP4wONb TTCATGTGTTGTTTGG-1.TDP4wONb 
       "TTCATGTAGCATCCCG-1"        "TTCATGTGTTGTTTGG-1" 
TTGCCTGAGCACTTTG-1.TDP4wONb TTGTTGTTCAGAGCAG-1.TDP4wONb 
       "TTGCCTGAGCACTTTG-1"        "TTGTTGTTCAGAGCAG-1" 

Get stathmin2 reads of selected cells

We extract the reads that cover the stathmin2 (STMN2) gene from the selected cells.

bams <- list(TDP4wOFF = file.path("data", "Sep2020", "CellRangerCount_50076_2020-09-22--15-40-54",
            "no1_Neural_cuture_d_96_TDP-43-HA_4w_DOXoff",
            "possorted_genome_bam.bam"),
            TDP2wON = file.path("data", "Sep2020", "CellRangerCount_50076_2020-09-22--15-40-54",
            "no2_Neural_cuture_d_96_TDP-43-HA_2w_DOXON",
            "possorted_genome_bam.bam"),
            TDP4wONa = file.path("data", "Sep2020", "CellRangerCount_50076_2020-09-22--15-40-54",
            "no3_Neural_cuture_d_96_TDP-43-HA_4w_DOXONa",
            "possorted_genome_bam.bam"),
            TDP4wONb = file.path("data", "Sep2020", "CellRangerCount_50076_2020-09-22--15-40-54",
            "no4_Neural_cuture_d_96_TDP-43-HA_4w_DOXONb",
            "possorted_genome_bam.bam"))

chr <- "chr8"
region_start <- 79611100
region_end <- 79666200
stmn2 <- GRanges(chr, IRanges(region_start, region_end), "+")

# keep all reads from cells in cluster 12
param <- ScanBamParam(which=stmn2, what = c("qname"), tag = "CB", 
                      tagFilter = list(CB = clus12$barcode))
gals <- lapply(bams, function(x) {
  readGAlignments(x, use.names = TRUE, param=param)
  })

covs <- lapply(gals, coverage)
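
The `tagFilter` above passes every cluster 12 barcode to each BAM file. As a minimal sketch of a per-sample alternative, the barcodes could be split by sample first. The data frame `clus12_toy` is a hypothetical stand-in that only mimics the `barcode` and `sample_id` columns of `clus12`; its four barcodes are copied from the listing above.

```r
## toy stand-in for clus12 (hypothetical object, real barcodes from the listing)
clus12_toy <- data.frame(
  barcode = c("TTTGACTGTTATGTCG-1", "AACCCAACATTCTTCA-1",
              "ATTCCCGAGTGATGGC-1", "AAAGGATTCGTCAAAC-1"),
  sample_id = c("TDP2wON", "TDP4wOFF", "TDP4wOFF", "TDP4wONa"),
  stringsAsFactors = FALSE
)

## one character vector of barcodes per sample
barcodes_by_sample <- split(clus12_toy$barcode, clus12_toy$sample_id)
```

Each list element could then serve as the `CB` entry of the `tagFilter` for the matching BAM file, for example `list(CB = barcodes_by_sample[["TDP4wOFF"]])`.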

Plotting with Gviz

Plot the stathmin2 transcripts and the read coverage of all cells from cluster 12.

## gene annotations from UCSC
options(ucscChromosomeNames = FALSE)
eTrack <- GeneRegionTrack(TxDb.Hsapiens.UCSC.hg38.knownGene, 
                          chromosome = chr, start = region_start, 
                          end = region_end, name = "annotation")

## one more coordinate than positions, so that start/end below get one entry per base
coords <- region_start:(region_end + 1)
dat <- matrix(c(as.vector(covs[[1]]$chr8[region_start:region_end]),
                as.vector(covs[[2]]$chr8[region_start:region_end]),
                as.vector(covs[[3]]$chr8[region_start:region_end]),
                as.vector(covs[[4]]$chr8[region_start:region_end])), 
              nrow = 4, byrow = TRUE)
rownames(dat) <- names(covs)
dtrack <- DataTrack(data = dat,
                    start = coords[-length(coords)], end = coords[-1], chromosome = chr,
                    genome = "hg38")

plotTracks(c(dtrack, eTrack),  
           type = "histogram", showSampleNames = TRUE,
           shape = "arrow", geneSymbols = TRUE, aggregateGroups=FALSE,
           groups = c("TDP4wOFF", "TDP2wON", "TDP4wONa", "TDP4wONb"),
           stackedBars = FALSE, fontsize=13 )

Version Author Date
5340095 khembach 2021-02-19
a7c4a5b khembach 2021-02-18
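
The data track above needs one start and one end coordinate per column of the coverage matrix. The following base R sketch only checks that bookkeeping for the STMN2 window; the helper names `n_pos`, `starts` and `ends` are introduced here for illustration.

```r
region_start <- 79611100
region_end   <- 79666200

## number of positions in the closed interval [region_start, region_end]
n_pos <- region_end - region_start + 1

## one extra coordinate, so that dropping the last/first element
## leaves exactly one start and one end per position
coords <- region_start:(region_end + 1)
starts <- coords[-length(coords)]
ends   <- coords[-1]

stopifnot(length(starts) == n_pos, length(ends) == n_pos)
```

If the window is changed, the coverage vectors and `coords` have to be recomputed together, otherwise the number of matrix columns and the number of coordinates no longer agree.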
## one data track per sample
dats <- list("4wOFF" = matrix(as.vector(covs[[1]]$chr8[region_start:region_end]),
                              nrow = 1, byrow = TRUE),
             "2wON" = matrix(as.vector(covs[[2]]$chr8[region_start:region_end]),
                              nrow = 1, byrow = TRUE),
             "4wONa" = matrix(as.vector(covs[[3]]$chr8[region_start:region_end]),
                              nrow = 1, byrow = TRUE),
             "4wONb" = matrix(as.vector(covs[[4]]$chr8[region_start:region_end]),
                              nrow = 1, byrow = TRUE))

# rownames(dat) <- names(covs)
dtrack_4wOFF <- DataTrack(data = dats[[1]],
                    start = coords[-length(coords)], end = coords[-1], chromosome = chr,
                    genome = "hg38", name = "4wOFF")
dtrack_2wON <- DataTrack(data = dats[[2]],
                    start = coords[-length(coords)], end = coords[-1], chromosome = chr,
                    genome = "hg38", name = "2wON")
dtrack_4wONa <- DataTrack(data = dats[[3]],
                    start = coords[-length(coords)], end = coords[-1], chromosome = chr,
                    genome = "hg38", name = "4wONa")
dtrack_4wONb <- DataTrack(data = dats[[4]],
                    start = coords[-length(coords)], end = coords[-1], chromosome = chr,
                    genome = "hg38", name = "4wONb")

plotTracks(c(dtrack_4wOFF, dtrack_2wON, dtrack_4wONa, dtrack_4wONb, eTrack), 
           type = "histogram", showSampleNames = TRUE,
           shape = "arrow", geneSymbols = TRUE, aggregateGroups=FALSE,
           stackedBars = FALSE, fontsize=13)

## zoom into intron 1 that contains the cryptic exon
chr <- "chr8"
region_start <- 79611100
region_end <- 79637000

plotTracks(c(dtrack_4wOFF, dtrack_2wON, dtrack_4wONa, dtrack_4wONb, eTrack),
           type = "histogram", showSampleNames = TRUE,
           chromosome = chr, from = region_start, to = region_end,
           shape = "arrow", geneSymbols = TRUE, aggregateGroups=FALSE,
           stackedBars = FALSE, fontsize=13)

## cryptic exon location: HG19 Chr8: 80,529,075-80,529,28
## in hg38: chr8:79616840-79617049
ce_start <- 79616840
ce_end <- 79617049
ceTrack <- AnnotationTrack(start = ce_start, end = ce_end, chromosome = chr, 
                          strand = "*", genome = "hg38", name = "CE")
plotTracks(c(dtrack_4wOFF, dtrack_2wON, dtrack_4wONa, dtrack_4wONb, 
             ceTrack, eTrack),
           type = "histogram", showSampleNames = TRUE,
           chromosome = chr, from = region_start, to = region_end,
           shape = "arrow", geneSymbols = TRUE, aggregateGroups=FALSE,
           stackedBars = FALSE, fontsize=13)

Splice junctions

Are there any splice junctions in the first intron?

We define new functions that allow us to filter the BAM files based on cell barcodes:

####### import only reads from cells of cluster 12
## function is copied and modified from the Gviz package
.import.bam.alignments.cells <- function(file, selection) {
    indNames <- c(sub("\\.bam$", ".bai", file), paste(file, "bai", sep = "."))
    index <- NULL
    for (i in indNames) {
        if (file.exists(i)) {
            index <- i
            break
        }
    }
    if (is.null(index)) {
          stop(
              "Unable to find index for BAM file '", file, "'. You can build an index using the following command:\n\t",
              "library(Rsamtools)\n\tindexBam(\"", file, "\")"
          )
      }
    pairedEnd <- parent.env(environment())[["._isPaired"]]
    if (is.null(pairedEnd)) {
          pairedEnd <- TRUE
      }
    flag <- parent.env(environment())[["._flag"]]
    if (is.null(flag)) {
          flag <- scanBamFlag(isUnmappedQuery = FALSE)
      }
    bf <- BamFile(file, index = index, asMates = pairedEnd)
    cells <- parent.env(environment())[["._cells"]]
    if(!is.null(cells)){
      param <- ScanBamParam(which = selection, what = scanBamWhat(),
                          tag = c("MD", "CB"), flag = flag, 
                          tagFilter = list(CB = cells))
    } else{
      param <- ScanBamParam(which = selection, what = scanBamWhat(), 
                            tag = "MD", flag = flag)
    }
    
    reads <- if (as.character(seqnames(selection)[1]) %in% names(scanBamHeader(bf)$targets)) scanBam(bf, param = param)[[1]] else list()
    md <- if (is.null(reads$tag$MD)) rep(as.character(NA), length(reads$pos)) else reads$tag$MD
    if (length(reads$pos)) {
        layed_seq <- sequenceLayer(reads$seq, reads$cigar)
        region <- unlist(bamWhich(param), use.names = FALSE)
        ans <- stackStrings(layed_seq, start(region), end(region), shift = reads$pos - 1L, Lpadding.letter = "+", Rpadding.letter = "+")
        names(ans) <- seq_along(reads$qname)
    } else {
        ans <- DNAStringSet()
    }
    return(GRanges(
        seqnames = if (is.null(reads$rname)) character() else reads$rname,
        strand = if (is.null(reads$strand)) character() else reads$strand,
        ranges = IRanges(start = reads$pos, width = reads$qwidth),
        id = if (is.null(reads$qname)) character() else reads$qname,
        cigar = if (is.null(reads$cigar)) character() else reads$cigar,
        mapq = if (is.null(reads$mapq)) integer() else reads$mapq,
        flag = if (is.null(reads$flag)) integer() else reads$flag,
        md = md, seq = ans,
        isize = if (is.null(reads$isize)) integer() else reads$isize,
        groupid = if (pairedEnd) if (is.null(reads$groupid)) integer() else reads$groupid else seq_along(reads$pos),
        status = if (pairedEnd) {
            if (is.null(reads$mate_status)) factor(levels = c("mated", "ambiguous", "unmated")) else reads$mate_status
        } else {
            rep(
                factor("unmated", levels = c("mated", "ambiguous", "unmated")),
                length(reads$pos)
            )
        }
    ))
}


## Constructor
AlignmentsTrack <- function(range = NULL, start = NULL, end = NULL, width = NULL, strand, chromosome, genome,
                            stacking = "squish", id, cigar, mapq, flag = scanBamFlag(isUnmappedQuery = FALSE), isize, groupid, status, md, seqs,
                            name = "AlignmentsTrack", isPaired = TRUE, importFunction, referenceSequence, cells = NULL, ...) {
    ## Some defaults
    if (missing(importFunction)) {
        ## internal Gviz function, not exported, hence the ':::'
        importFunction <- Gviz:::.import.bam.alignments
    }
    covars <- Gviz:::.getCovars(range)
    isStream <- FALSE
    if (!is.character(range)) {
        n <- max(c(length(start), length(end), length(width)), nrow(covars))
        id <- Gviz:::.covDefault(id, covars[["id"]], paste("read", seq_len(n), sep = "_"))
        cigar <- Gviz:::.covDefault(cigar, covars[["cigar"]], paste(if (is(range, "GRangesOrIRanges")) width(range) else width, "M", sep = ""))
        mapq <- Gviz:::.covDefault(mapq, covars[["mapq"]], rep(as.integer(NA), n))
        flag <- Gviz:::.covDefault(flag, covars[["flag"]], rep(as.integer(NA), n))
        isize <- Gviz:::.covDefault(isize, covars[["isize"]], rep(as.integer(NA), n))
        groupid <- Gviz:::.covDefault(groupid, covars[["groupid"]], seq_len(n))
        md <- Gviz:::.covDefault(md, covars[["md"]], rep(as.character(NA), n))
        status <- Gviz:::.covDefault(status, covars[["status"]], ifelse(groupid %in% groupid[duplicated(groupid)], "mated", "unmated"))
    }
    ## Build a GRanges object from the inputs
    Gviz:::.missingToNull(c(
        "strand", "chromosome", "importFunction", "genome", "id", "cigar", "mapq", "flag", "isize", "groupid", "status",
        "md", "seqs", "referenceSequence"
    ))
    args <- list(
        id = id, cigar = cigar, mapq = mapq, flag = flag, isize = isize, groupid = groupid, status = status, strand = strand, md = md,
        chromosome = chromosome, genome = genome
    )
    defs <- list(
        strand = "*", chromosome = "chrNA", genome = NA, id = as.character(NA), cigar = as.character(NA), mapq = as.integer(NA),
        flag = as.integer(NA), isize = as.integer(NA), groupid = as.character(NA), status = as.character(NA), md = as.character(NA)
    )
    range <- Gviz:::.buildRange(
        range = range, start = start, end = end, width = width,
        args = args, defaults = defs, chromosome = chromosome, trackType = "AlignmentsTrack",
        importFun = importFunction, stream = TRUE, autodetect = TRUE, ...
    )
    ## This is going to be a list if we have to stream data from a file, otherwise we can compute some additional values
    if (is.list(range)) {
        isStream <- TRUE
        slist <- range
        range <- GRanges()
        stackRanges <- GRanges()
        stacks <- NULL
        seqs <- DNAStringSet()
    } else {
        if (is.null(seqs)) {
            seqs <- DNAStringSet(vapply(width(range), function(x) paste(rep("N", x), collapse = ""), character(1)))
        }
        addArgs <- list(...)
        if ("showIndels" %in% names(addArgs)) {
            showIndels <- addArgs$showIndels
        } else {
            showIndels <- FALSE
        }
        tmp <- Gviz:::.computeAlignments(range, drop.D.ranges = showIndels)
        range <- tmp$range
        stackRanges <- tmp$stackRange
        stacks <- tmp$stacks
    }
    ## If no chromosome was explicitly asked for we just take the first one in the GRanges object
    if (missing(chromosome) || is.null(chromosome)) {
        chromosome <- if (length(range) > 0) Gviz:::.chrName(as.character(seqnames(range)[1])) else "chrNA"
    }
    ## And finally the object instantiation
    genome <- Gviz:::.getGenomeFromGRange(range, ifelse(is.null(genome), character(), genome[1]))
    if (!isStream) {
        return(new("AlignmentsTrack",
            chromosome = chromosome[1], range = range, stacks = stacks,
            name = name, genome = genome, stacking = stacking, stackRanges = stackRanges, sequences = seqs,
            referenceSequence = referenceSequence, ...
        ))
    } else {
        ## A bit hackish but for some functions we may want to know which track type we need but at the
        ## same time we do not want to enforce this as an additional argument
        e <- new.env()
        e[["._trackType"]] <- "AlignmentsTrack"
        e[["._isPaired"]] <- isPaired
        e[["._flag"]] <- flag
        e[["._cells"]] <- cells
        environment(slist[["stream"]]) <- e
        return(new("ReferenceAlignmentsTrack",
            chromosome = chromosome[1], range = range, stackRanges = stackRanges,
            name = name, genome = genome, stacking = stacking, stream = slist[["stream"]], reference = slist[["reference"]],
            mapping = slist[["mapping"]], args = args, defaults = defs, stacks = stacks, referenceSequence = referenceSequence, ...
        ))
    }
}
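
The constructor above hands `cells` to the import function through the function's enclosing environment instead of through an argument. A minimal base R sketch of that pattern follows; `make_reader` and `reader` are hypothetical names and the barcodes are reduced to single letters.

```r
make_reader <- function(cells = NULL) {
  reader <- function(x) {
    ## look up the hidden setting in the enclosing environment
    cells <- parent.env(environment())[["._cells"]]
    if (is.null(cells)) x else x[x %in% cells]
  }
  ## store the setting in a fresh environment and attach it to the function
  e <- new.env()
  e[["._cells"]] <- cells
  environment(reader) <- e
  reader
}

read_all      <- make_reader()
read_selected <- make_reader(cells = c("A", "C"))

kept_all      <- read_all(c("A", "B", "C"))
kept_selected <- read_selected(c("A", "B", "C"))
```

The advantage of this design is that the signature of the import function stays `(file, selection)`, which is the form in which it is called above, while the barcode list still reaches it.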
## STMN2 gene range
chr <- "chr8"
region_start <- 79611100
region_end <- 79666200

# Create the alignments track
alTrack_4wOFF <- AlignmentsTrack(
  range = bams[["TDP4wOFF"]],
  isPaired = FALSE, chromosome = chr, from = region_start, to = region_end,
  cells = clus12$barcode[clus12$sample_id == "TDP4wOFF"], 
  importFunction = .import.bam.alignments.cells, name = "4wOFF")
alTrack_2wON <- AlignmentsTrack(
  range = bams[["TDP2wON"]],
  isPaired = FALSE, chromosome = chr, from = region_start, to = region_end,
  cells = clus12$barcode[clus12$sample_id == "TDP2wON"], 
  importFunction = .import.bam.alignments.cells, name = "2wON")
alTrack_4wONa <- AlignmentsTrack(
  range = bams[["TDP4wONa"]],
  isPaired = FALSE, chromosome = chr, from = region_start, to = region_end,
  cells = clus12$barcode[clus12$sample_id == "TDP4wONa"], 
  importFunction = .import.bam.alignments.cells, name = "4wONa")
alTrack_4wONb <- AlignmentsTrack(
  range = bams[["TDP4wONb"]],
  isPaired = FALSE, chromosome = chr, from = region_start, to = region_end,
  cells = clus12$barcode[clus12$sample_id == "TDP4wONb"], 
  importFunction = .import.bam.alignments.cells, name = "4wONb")

## import GTF with gene annotation
gtf <- import(file.path("data", "Homo_sapiens.GRCh38.98.sorted.gtf"))
### import GTF, convert to TxDb and create GeneRegionTrack
seqlevelsStyle(gtf) <- "UCSC"
txdb <- makeTxDbFromGRanges(gtf)
Warning in .get_cds_IDX(mcols0$type, mcols0$phase): The "phase" metadata column contains non-NA values for features of type
  stop_codon. This information was ignored.
gtftrack <- GeneRegionTrack(txdb, name = "annotation") 


## Whole gene
plotTracks(c(alTrack_4wOFF, alTrack_2wON, alTrack_4wONa, alTrack_4wONb,
             ceTrack, gtftrack),  
           type = c("coverage", "sashimi"), 
           chromosome = chr, from = region_start, to = region_end,
           extend.left = 500, extend.right = 100,
           fontsize=13,
           sizes = c(rep(3, 4), 1, 3),
           transcriptAnnotation = "transcript")

## only first intron 
## zoom into intron 1 that contains the cryptic exon
region_start <- 79611100
region_end <- 79637000

plotTracks(c(alTrack_4wOFF, 
             alTrack_2wON, alTrack_4wONa, alTrack_4wONb,
             ceTrack, gtftrack),  
           type = c("coverage", "sashimi"), 
           chromosome = chr, from = region_start, to = region_end,
           extend.left = 500, extend.right = 100,
           fontsize=13,
           sizes = c(rep(3, 4), 1, 3),
           transcriptAnnotation = "transcript")
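
Spliced reads are recorded in a BAM file with an `N` operation in the CIGAR string, and these gaps are what a sashimi plot summarises as arcs. As a rough illustration, independent of Gviz, the sketch below recovers the intron coordinates implied by a single alignment; `cigar_junctions` is a hypothetical helper and the example alignment is made up.

```r
## Return the reference coordinates of all 'N' gaps of one alignment.
## pos:   1-based leftmost mapped position of the read
## cigar: CIGAR string of the read
cigar_junctions <- function(pos, cigar) {
  ops  <- regmatches(cigar, gregexpr("[0-9]+[MIDNSHP=X]", cigar))[[1]]
  len  <- as.integer(sub("[A-Z=]$", "", ops))
  type <- sub("^[0-9]+", "", ops)
  ## only these operations consume bases on the reference
  ref_len  <- ifelse(type %in% c("M", "D", "N", "=", "X"), len, 0L)
  op_start <- pos + cumsum(c(0L, ref_len[-length(ref_len)]))
  is_n     <- type == "N"
  data.frame(start = op_start[is_n], end = op_start[is_n] + len[is_n] - 1L)
}

## made-up read: 10 matched bases, a 50 bp gap, 5 matched bases
junc <- cigar_junctions(100, "10M50N5M")
```

Here the first block covers positions 100 to 109, so the gap spans 110 to 159. Counting identical `start`/`end` pairs over all reads of a sample would give the junction support per intron.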

Read coverage in cells with low and high TDP-HA expression from cluster 12

## LOW ##
chr <- "chr8"
region_start <- 79611100
region_end <- 79666200

# Create the alignments track
## barcodes are selected with the sample labels of `low` itself
## (assumes `low` has a sample_id column like clus12)
alTrack_4wOFF <- AlignmentsTrack(
  range = bams[["TDP4wOFF"]],
  isPaired = FALSE, chromosome = chr, from = region_start, to = region_end,
  cells = low$barcode[low$sample_id == "TDP4wOFF"], 
  importFunction = .import.bam.alignments.cells, name = "4wOFF")
alTrack_2wON <- AlignmentsTrack(
  range = bams[["TDP2wON"]],
  isPaired = FALSE, chromosome = chr, from = region_start, to = region_end,
  cells = low$barcode[low$sample_id == "TDP2wON"], 
  importFunction = .import.bam.alignments.cells, name = "2wON")
alTrack_4wONa <- AlignmentsTrack(
  range = bams[["TDP4wONa"]],
  isPaired = FALSE, chromosome = chr, from = region_start, to = region_end,
  cells = low$barcode[low$sample_id == "TDP4wONa"], 
  importFunction = .import.bam.alignments.cells, name = "4wONa")
alTrack_4wONb <- AlignmentsTrack(
  range = bams[["TDP4wONb"]],
  isPaired = FALSE, chromosome = chr, from = region_start, to = region_end,
  cells = low$barcode[low$sample_id == "TDP4wONb"], 
  importFunction = .import.bam.alignments.cells, name = "4wONb")

plotTracks(c(alTrack_4wOFF, alTrack_2wON, alTrack_4wONa, alTrack_4wONb,
             ceTrack, gtftrack),  
           type = c("coverage", "sashimi"), 
           chromosome = chr, from = region_start, to = region_end,
           extend.left = 500, extend.right = 100,
           fontsize=13,
           sizes = c(rep(3, 4), 1, 3),
           transcriptAnnotation = "transcript")

## show individual reads
plotTracks(c(alTrack_4wOFF, alTrack_2wON, alTrack_4wONa, alTrack_4wONb,
             ceTrack, gtftrack),
           chromosome = chr, from = region_start, to = region_end,
           extend.left = 500, extend.right = 100,
           fontsize=13,
           sizes = c(rep(3, 4), 1, 3),
           transcriptAnnotation = "transcript")

## only first intron 
## zoom into intron 1 that contains the cryptic exon
region_start <- 79611100
region_end <- 79637000

plotTracks(c(alTrack_4wOFF, 
             alTrack_2wON, alTrack_4wONa, alTrack_4wONb,
             ceTrack, gtftrack),  
           type = c("coverage", "sashimi"), 
           chromosome = chr, from = region_start, to = region_end,
           extend.left = 500, extend.right = 100,
           fontsize=13,
           sizes = c(rep(3, 4), 1, 3),
           transcriptAnnotation = "transcript")

## HIGH ##
region_start <- 79611100
region_end <- 79666200

# Create the alignments track
## barcodes are selected with the sample labels of `high` itself
## (assumes `high` has a sample_id column like clus12)
alTrack_4wOFF <- AlignmentsTrack(
  range = bams[["TDP4wOFF"]],
  isPaired = FALSE, chromosome = chr, from = region_start, to = region_end,
  cells = high$barcode[high$sample_id == "TDP4wOFF"], 
  importFunction = .import.bam.alignments.cells, name = "4wOFF")
alTrack_2wON <- AlignmentsTrack(
  range = bams[["TDP2wON"]],
  isPaired = FALSE, chromosome = chr, from = region_start, to = region_end,
  cells = high$barcode[high$sample_id == "TDP2wON"], 
  importFunction = .import.bam.alignments.cells, name = "2wON")
alTrack_4wONa <- AlignmentsTrack(
  range = bams[["TDP4wONa"]],
  isPaired = FALSE, chromosome = chr, from = region_start, to = region_end,
  cells = high$barcode[high$sample_id == "TDP4wONa"], 
  importFunction = .import.bam.alignments.cells, name = "4wONa")
alTrack_4wONb <- AlignmentsTrack(
  range = bams[["TDP4wONb"]],
  isPaired = FALSE, chromosome = chr, from = region_start, to = region_end,
  cells = high$barcode[high$sample_id == "TDP4wONb"], 
  importFunction = .import.bam.alignments.cells, name = "4wONb")

plotTracks(c(alTrack_4wOFF, alTrack_2wON, alTrack_4wONa, alTrack_4wONb,
             ceTrack, gtftrack),  
           type = c("coverage", "sashimi"), 
           chromosome = chr, from = region_start, to = region_end,
           extend.left = 500, extend.right = 100, fontsize=13,
           sizes = c(rep(3, 4), 1, 2),
           transcriptAnnotation = "transcript")

plotTracks(c(alTrack_4wOFF, alTrack_2wON, alTrack_4wONa, alTrack_4wONb,
             ceTrack, gtftrack),
           chromosome = chr, from = region_start, to = region_end,
           extend.left = 500, extend.right = 100, fontsize=13,
           sizes = c(rep(3, 4), 1, 3),
           transcriptAnnotation = "transcript")

## only first intron 
## zoom into intron 1 that contains the cryptic exon
region_start <- 79611100
region_end <- 79637000

plotTracks(c(alTrack_4wOFF, 
             alTrack_2wON, alTrack_4wONa, alTrack_4wONb,
             ceTrack, gtftrack),  
           type = c("coverage", "sashimi"), 
           chromosome = chr, from = region_start, to = region_end,
           extend.left = 500, extend.right = 100, fontsize=13,
           sizes = c(rep(3, 4), 1, 3),
           transcriptAnnotation = "transcript")


sessionInfo()
R version 4.0.0 (2020-04-24)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 16.04.6 LTS

Matrix products: default
BLAS:   /usr/local/R/R-4.0.0/lib/libRblas.so
LAPACK: /usr/local/R/R-4.0.0/lib/libRlapack.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
 [1] grid      parallel  stats4    stats     graphics  grDevices utils    
 [8] datasets  methods   base     

other attached packages:
 [1] rtracklayer_1.48.0                      
 [2] GenomicAlignments_1.24.0                
 [3] Rsamtools_2.4.0                         
 [4] Biostrings_2.56.0                       
 [5] XVector_0.28.0                          
 [6] TxDb.Hsapiens.UCSC.hg38.knownGene_3.10.0
 [7] GenomicFeatures_1.40.0                  
 [8] AnnotationDbi_1.50.1                    
 [9] Gviz_1.32.0                             
[10] dplyr_1.0.2                             
[11] SingleCellExperiment_1.10.1             
[12] SummarizedExperiment_1.18.1             
[13] DelayedArray_0.14.0                     
[14] matrixStats_0.56.0                      
[15] Biobase_2.48.0                          
[16] GenomicRanges_1.40.0                    
[17] GenomeInfoDb_1.24.2                     
[18] IRanges_2.22.2                          
[19] S4Vectors_0.26.1                        
[20] BiocGenerics_0.34.0                     
[21] Seurat_3.1.5                            
[22] workflowr_1.6.2                         

loaded via a namespace (and not attached):
  [1] backports_1.1.9          Hmisc_4.4-1              BiocFileCache_1.12.0    
  [4] plyr_1.8.6               igraph_1.2.5             lazyeval_0.2.2          
  [7] splines_4.0.0            BiocParallel_1.22.0      listenv_0.8.0           
 [10] ggplot2_3.3.2            digest_0.6.25            ensembldb_2.12.1        
 [13] htmltools_0.5.0          checkmate_2.0.0          magrittr_1.5            
 [16] memoise_1.1.0            BSgenome_1.56.0          cluster_2.1.0           
 [19] ROCR_1.0-11              globals_0.12.5           askpass_1.1             
 [22] prettyunits_1.1.1        jpeg_0.1-8.1             colorspace_1.4-1        
 [25] blob_1.2.1               rappdirs_0.3.1           ggrepel_0.8.2           
 [28] xfun_0.15                crayon_1.3.4             RCurl_1.98-1.2          
 [31] jsonlite_1.7.0           VariantAnnotation_1.34.0 survival_3.2-3          
 [34] zoo_1.8-8                ape_5.4                  glue_1.4.2              
 [37] gtable_0.3.0             zlibbioc_1.34.0          leiden_0.3.3            
 [40] future.apply_1.6.0       scales_1.1.1             DBI_1.1.0               
 [43] Rcpp_1.0.5               htmlTable_2.0.1          viridisLite_0.3.0       
 [46] progress_1.2.2           reticulate_1.16          foreign_0.8-80          
 [49] bit_1.1-15.2             rsvd_1.0.3               Formula_1.2-3           
 [52] tsne_0.1-3               htmlwidgets_1.5.1        httr_1.4.1              
 [55] RColorBrewer_1.1-2       ellipsis_0.3.1           ica_1.0-2               
 [58] pkgconfig_2.0.3          XML_3.99-0.4             nnet_7.3-14             
 [61] uwot_0.1.8               dbplyr_1.4.4             tidyselect_1.1.0        
 [64] rlang_0.4.7              reshape2_1.4.4           later_1.1.0.1           
 [67] munsell_0.5.0            tools_4.0.0              generics_0.0.2          
 [70] RSQLite_2.2.0            ggridges_0.5.2           evaluate_0.14           
 [73] stringr_1.4.0            yaml_2.2.1               knitr_1.29              
 [76] bit64_0.9-7              fs_1.4.2                 fitdistrplus_1.1-1      
 [79] purrr_0.3.4              RANN_2.6.1               AnnotationFilter_1.12.0 
 [82] pbapply_1.4-2            future_1.17.0            nlme_3.1-148            
 [85] whisker_0.4              biomaRt_2.44.1           rstudioapi_0.11         
 [88] compiler_4.0.0           plotly_4.9.2.1           curl_4.3                
 [91] png_0.1-7                tibble_3.0.3             stringi_1.4.6           
 [94] lattice_0.20-41          ProtGenerics_1.20.0      Matrix_1.2-18           
 [97] vctrs_0.3.4              pillar_1.4.6             lifecycle_0.2.0         
[100] lmtest_0.9-37            RcppAnnoy_0.0.16         data.table_1.12.8       
[103] cowplot_1.0.0            bitops_1.0-6             irlba_2.3.3             
[106] httpuv_1.5.4             patchwork_1.0.1          R6_2.4.1                
[109] latticeExtra_0.6-29      promises_1.1.1           KernSmooth_2.23-17      
[112] gridExtra_2.3            codetools_0.2-16         dichromat_2.0-0         
[115] MASS_7.3-51.6            assertthat_0.2.1         openssl_1.4.2           
[118] rprojroot_1.3-2          sctransform_0.2.1        GenomeInfoDbData_1.2.3  
[121] hms_0.5.3                rpart_4.1-15             tidyr_1.1.0             
[124] rmarkdown_2.3            Rtsne_0.15               biovizBase_1.36.0       
[127] git2r_0.27.1             base64enc_0.1-3