Last updated: 2023-03-14
Checks: 7 passed, 0 failed
Knit directory: Hevesi_2023/
This reproducible R Markdown analysis was created with workflowr (version 1.7.0). The Checks tab describes the reproducibility checks that were applied when the results were created. The Past versions tab lists the development history.
Great! Since the R Markdown file has been committed to the Git repository, you know the exact version of the code that produced these results.
Great job! The global environment was empty. Objects defined in the global environment can affect the analysis in your R Markdown file in unknown ways. For reproducibility it’s best to always run the code in an empty environment.
The command set.seed(20230121) was run prior to running the code in the R Markdown file. Setting a seed ensures that any results that rely on randomness, e.g. subsampling or permutations, are reproducible.
Great job! Recording the operating system, R version, and package versions is critical for reproducibility.
Nice! There were no cached chunks for this analysis, so you can be confident that you successfully produced the results during this run.
Great job! Using relative paths to the files within your workflowr project makes it easier to run your code on other machines.
Great! You are using Git for version control. Tracking code development and connecting the code version to the results is critical for reproducibility.
The results in this page were generated with repository version a7f002d. See the Past versions tab to see a history of the changes made to the R Markdown and HTML files.
Note that you need to be careful to ensure that all relevant files for the analysis have been committed to Git prior to generating the results (you can use wflow_publish or wflow_git_commit). workflowr only checks the R Markdown file, but you know if there are other scripts or data files that it depends on. Below is the status of the Git repository when the results were generated:
Ignored files:
Ignored: .DS_Store
Ignored: .Rhistory
Ignored: .Rproj.user/
Ignored: .cache/
Ignored: .config/
Ignored: .nv/
Ignored: .snakemake/
Ignored: cellbender/
Ignored: cellbender_latest.sif
Ignored: cellranger/
Ignored: data/Pr5P7_clusters.h5Seurat
Ignored: data/Pr5P7_clusters.h5ad
Ignored: data/THP7_clusters.h5Seurat
Ignored: data/THP7_clusters.h5ad
Ignored: data/neuro_fin-THP7.h5seurat
Ignored: data/neuro_fin.h5seurat
Ignored: fastq/
Ignored: mm10_optimized/
Ignored: souporcell/
Ignored: souporcell_latest.sif
Unstaged changes:
Modified: output/figures/Pr5P7_sex_umap.pdf
Modified: output/figures/Pr5P7_stress_umap.pdf
Modified: output/figures/Pr5P7_top5_umap.pdf
Modified: output/figures/THP7_sex_umap.pdf
Modified: output/figures/THP7_stress_umap.pdf
Modified: output/figures/THP7_top5_umap.pdf
Modified: output/figures/combined_sex_umap.pdf
Modified: output/figures/combined_stress_umap.pdf
Modified: output/figures/combined_top5_umap.pdf
Modified: output/tables/hevesi2023-all-mrk_logreg-sct_combined.csv
Modified: output/tables/hevesi2023-all-mrk_wilcox-sct_combined.csv
Note that any generated files, e.g. HTML, png, CSS, etc., are not included in this status report because it is ok for generated content to have uncommitted changes.
These are the previous versions of the repository in which changes were made to the R Markdown (analysis/methods.Rmd) and HTML (docs/methods.html) files. If you’ve configured a remote Git repository (see ?wflow_git_remote), click on the hyperlinks in the table below to view the files as they were in that past version.
File | Version | Author | Date | Message |
---|---|---|---|---|
Rmd | a7f002d | EugOT | 2023-03-14 | update bib |
Rmd | edcb936 | Evgenii O. Tretiakov | 2023-03-14 | update methods |
Rmd | 3b155ad | Evgenii O. Tretiakov | 2023-03-05 | add figures code |
# Presentation
library("glue")
library("knitr")
# JSON
library("jsonlite")
# Tidyverse
library("tidyverse")
dir.create(here::here("output", DOCNAME), showWarnings = FALSE)
write_bib(c("base", "Seurat", "SeuratWrappers", "SeuratDisk", "sctransform",
"glmGamPoi", "patchwork", "scCustomize", "Nebulosa", "clustree",
"mrtree", "gprofiler2", "cowplot", "UpSetR", "ggstatsplot",
"gridExtra", "tidyverse", "dplyr", "tidyr", "magrittr", "stringr",
"skimr", "future", "purrr", "here", "workflowr", "zeallot", "knitr",
"kableExtra", "rmarkdown", "reticulate"),
file = here::here("output", DOCNAME, "packages.bib"))
versions <- list(
biomaRt = packageVersion("biomaRt"),
cellbender = "docker://etretiakov/cellbender:v0.0.1",
cellranger = "7.1.0",
cellranger_ref = "mm10_optimized_v.1.0",
clustree = packageVersion("clustree"),
cowplot = packageVersion("cowplot"),
dplyr = packageVersion("dplyr"),
future = packageVersion("future"),
ggplot2 = packageVersion("ggplot2"),
ggstatsplot = packageVersion("ggstatsplot"),
glmGamPoi = packageVersion("glmGamPoi"),
gprofiler2 = packageVersion("gprofiler2"),
gridExtra = packageVersion("gridExtra"),
here = packageVersion("here"),
kableExtra = packageVersion("kableExtra"),
knitr = packageVersion("knitr"),
magrittr = packageVersion("magrittr"),
mrtree = packageVersion("mrtree"),
Nebulosa = packageVersion("Nebulosa"),
pandoc = rmarkdown::pandoc_version(),
patchwork = packageVersion("patchwork"),
purrr = packageVersion("purrr"),
python = "3.8.8",
R = str_extract(R.version.string, "[0-9\\.]+"),
reticulate = packageVersion("reticulate"),
rmarkdown = packageVersion("rmarkdown"),
scCustomize = packageVersion("scCustomize"),
sctransform = packageVersion("sctransform"),
Seurat = packageVersion("Seurat"),
skimr = packageVersion("skimr"),
stringr = packageVersion("stringr"),
Snakemake = "7.21.0",
souporcell = "shub://wheaton5/souporcell",
tidyr = packageVersion("tidyr"),
tidyverse = packageVersion("tidyverse"),
UpSetR = packageVersion("UpSetR"),
viridis = packageVersion("viridis"),
workflowr = packageVersion("workflowr"),
zeallot = packageVersion("zeallot")
)
The Cell Ranger pipeline (v7.1.0) (Zheng et al. 2017) was used to perform sample demultiplexing, barcode processing and single-nucleus gene counting. Briefly, samples were demultiplexed to produce a pair of FASTQ files for each sample. Reads containing sequence information were aligned to the optimised mouse genome reference (mm10_optimized_v.1.0) provided by Pool’s lab, which is based on the default Cell Ranger mm10 genome (version 2020-A) cleared of gene overlaps, poorly annotated exons and 3’-UTRs, and intergenic fragments (Pool et al. 2022). PCR duplicates were removed by selecting unique combinations of cell barcodes, unique molecular identifiers (UMIs) and gene IDs; the final result was a gene expression matrix that was used for further analysis.
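The duplicate-removal step described above can be illustrated with a minimal base R sketch. The read table, its column names and the toy barcodes are hypothetical and only show the idea of collapsing reads to unique barcode/UMI/gene combinations; Cell Ranger itself additionally corrects sequencing errors in barcodes and UMIs, which is not reproduced here.

```r
# Toy read-level table: one row per aligned read (hypothetical column names).
reads <- data.frame(
  barcode = c("AAAC", "AAAC", "AAAC", "AAAG", "AAAG", "AAAG"),
  umi     = c("TTGCA", "TTGCA", "CCGTA", "GGATC", "GGATC", "GGATC"),
  gene    = c("Gad1", "Gad1", "Gad1", "Slc17a6", "Slc17a6", "Gad1"),
  stringsAsFactors = FALSE
)

# Reads sharing barcode, UMI and gene are treated as PCR duplicates
# of a single molecule, so only one copy of each combination is kept.
molecules <- unique(reads[, c("barcode", "umi", "gene")])

# Count distinct molecules per gene (rows) and barcode (columns).
counts <- unclass(table(molecules$gene, molecules$barcode))
print(counts)
```

Six reads collapse to four molecules in this toy input, and the resulting gene-by-barcode table has the same shape as the expression matrix used downstream.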
The droplet selection method of Cell Ranger, which is based on the EmptyDrops method (Lun et al. 2019) incorporated into the cellranger count pipeline, identified 923 nuclei in the ventrobasal thalamus and 731 nuclei in the principal sensory trigeminal nucleus.
Using those values as the expected number of cells, we applied the neural network-based approach CellBender (docker://etretiakov/cellbender:v0.0.1) (flemingUnsupervisedRemovalSystematic2022?). We set the false positive rate threshold to 0.01 and trained the neural network for 150 epochs, with a total of 5000 droplets included based on knee plots (please see the online supplementary Cell Ranger reports).
For every cell, we quantified the log probability of being a doublet based on a priori knowledge of the genotypes of each input sample and the called variant occurrence frequencies, which allowed us to cluster nuclei by source organism or classify them as doublets (heatonSouporcellRobustClustering2020?) (see shub://wheaton5/souporcell).
Gene annotation information was added using the gprofiler2 package (v0.2.1) [@reimandProfileraWebServer2016]. Using this annotation, we filtered out cells with a high content of mitochondrial, ribosomal or hemoglobin genes, with thresholds of 1%, 1% and 0.5%, respectively; pseudogenes and poorly annotated genes were also removed from the count matrix. Moreover, cells of low complexity (\(\log_{10}Genes/\log_{10}UMI < 0.8\)) were filtered out. Cells were then assigned cell cycle scores using the CellCycleScoring function in the Seurat package (v4.3.0).
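As a rough illustration of the cell-level filters described above, the following base R sketch computes the percentage of counts from each gene class and the complexity ratio on a hand-built toy matrix. The gene-name patterns (`^mt-`, `^Rp[sl]`, `^Hb[ab]-`), the toy cells and the choice of `<=` at the thresholds are assumptions made for this example and are not taken from the analysis code.

```r
# Toy count matrix (genes x cells), built by hand so the outcome is easy to follow.
other <- paste0("Gene", 1:200)
genes <- c("mt-Co1", "mt-Nd1", "Rpl3", "Rps5", "Hba-a1", "Hbb-bs", other)
cells <- c("clean", "mito_high", "ribo_ok", "hb_high", "low_complexity")
counts <- matrix(0, nrow = length(genes), ncol = length(cells),
                 dimnames = list(genes, cells))
counts[other, c("clean", "mito_high", "ribo_ok", "hb_high")] <- 1
counts["mt-Co1", "mito_high"] <- 10
counts["Rpl3", "ribo_ok"] <- 1
counts["Hbb-bs", "hb_high"] <- 2
counts[c("Gene1", "Gene2"), "low_complexity"] <- c(1000, 1)

# Percentage of each cell's counts coming from genes matching a name pattern.
pct <- function(pattern) {
  idx <- grepl(pattern, rownames(counts))
  100 * colSums(counts[idx, , drop = FALSE]) / colSums(counts)
}

qc <- data.frame(
  n_umi    = colSums(counts),
  n_genes  = colSums(counts > 0),
  pct_mito = pct("^mt-"),      # assumed mouse mitochondrial prefix
  pct_ribo = pct("^Rp[sl]"),   # assumed ribosomal protein prefixes
  pct_hb   = pct("^Hb[ab]-")   # assumed hemoglobin prefixes
)
qc$complexity <- log10(qc$n_genes) / log10(qc$n_umi)

# Thresholds from the text: 1% mito, 1% ribo, 0.5% hemoglobin, complexity >= 0.8.
keep <- qc$pct_mito <= 1 & qc$pct_ribo <= 1 & qc$pct_hb <= 0.5 &
  qc$complexity >= 0.8
filtered <- counts[, keep, drop = FALSE]
print(colnames(filtered))
```

In this toy input only the `clean` and `ribo_ok` cells pass all four criteria; the other three each fail exactly one filter.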
We used the selection method in the Seurat package (v4.3.0) (Satija2015-or?; Stuart and Satija 2019), which uses a modern variance stabilising transformation technique based on scaling to Pearson residuals (Hafemeister and Satija 2019). In this way we selected 3000 highly variable genes per dataset and regressed out complexity and cell-cycle variability prior to the final scaling of the filtered matrices.
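To give an intuition for ranking genes by the variance of their Pearson residuals, here is a deliberately simplified base R sketch. It is not the sctransform procedure used in the analysis: it replaces the regularised regression with a depth-only null model and a fixed overdispersion (`theta = 100` is an arbitrary assumed value), and the function names are hypothetical.

```r
# Variance of Pearson residuals per gene under a simple depth-only null model.
pearson_residual_variance <- function(counts, theta = 100) {
  counts <- counts[rowSums(counts) > 0, , drop = FALSE]
  # Expected count if a gene's expression scaled with cell depth only.
  mu <- outer(rowSums(counts), colSums(counts)) / sum(counts)
  # Negative binomial variance with a fixed, assumed overdispersion.
  res <- (counts - mu) / sqrt(mu + mu^2 / theta)
  # Clip extreme residuals so that single cells do not dominate.
  clip <- sqrt(ncol(counts))
  res[res > clip] <- clip
  res[res < -clip] <- -clip
  apply(res, 1, var)
}

# Return the names of the n genes with the largest residual variance.
select_variable_genes <- function(counts, n = 3000) {
  v <- pearson_residual_variance(counts)
  names(sort(v, decreasing = TRUE))[seq_len(min(n, length(v)))]
}

# Toy data: two constant genes and two genes that alternate between cells.
toy <- rbind(
  flat_a     = rep(5, 6),
  flat_b     = rep(10, 6),
  variable_a = rep(c(0, 20), 3),
  variable_b = rep(c(20, 0), 3)
)
colnames(toy) <- paste0("cell", 1:6)

print(select_variable_genes(toy, n = 2))
```

Because every toy cell has the same depth, the constant genes have zero residual variance and the two alternating genes are selected. With real data the same ranking would be cut at 3000 genes, as in the text.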
We performed graph-based clustering with the Leiden algorithm. PCA was performed using the selected genes, and the jackknife-tested (chungStatisticalSignificanceVariables2015?) principal components (we tested the significance of features for a randomly picked 1% of the data over 1000 iterations; see the PCScore function in the functions.R script of the code directory) were used to construct a shared nearest neighbour graph using the overlap between the 15 nearest neighbours of each cell. Leiden modularity optimisation (Traag, Waltman, and Eck 2019) was used to partition this graph with an array of resolution parameters, where 30 modularity events were sampled between 0.2 and 2.5. Clustering tree visualisations (Zappia and Oshlack 2018) were produced using the clustree package (v0.5.0), showing the resolution of previously identified clusters. The final resolution was chosen by inspecting these resolutions together with the reconciled tree produced by the mrtree package (v0.0.0.9000) (pengCellTypeHierarchy2021?) and by calculating the adjusted multi-resolution Rand index (AMRI): the resolution with the maximum AMRI was selected if there was no higher modularity within a 0.05 AMRI difference (see SelectResolution in the functions.R file of the code directory).
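The graph construction described above can be sketched in base R as follows. This is a minimal illustration: it assumes that neighbour overlap is measured with the Jaccard index and that the 30 resolution values are evenly spaced between 0.2 and 2.5, neither of which is stated in the text. The function name `snn_graph` and the toy embedding are hypothetical, and the Leiden partitioning itself is not reproduced.

```r
# Shared nearest neighbour graph from a cell embedding (rows = cells).
snn_graph <- function(embedding, k = 15) {
  d <- as.matrix(dist(embedding))
  n <- nrow(d)
  # k nearest neighbours of each cell (the cell itself is included); k x n matrix.
  nn <- apply(d, 1, function(row) order(row)[seq_len(k)])
  membership <- matrix(0, n, n)
  membership[cbind(rep(seq_len(n), each = k), as.vector(nn))] <- 1
  # Number of neighbours shared by each pair of cells.
  shared <- membership %*% t(membership)
  # Jaccard index: intersection over union of two neighbourhoods of size k.
  jaccard <- shared / (2 * k - shared)
  diag(jaccard) <- 0
  jaccard
}

# Toy embedding: two well-separated groups of 20 cells along one axis.
embedding <- cbind(c((1:20) * 0.1, 100 + (1:20) * 0.1), 0)
g <- snn_graph(embedding, k = 15)

# Assumed evenly spaced grid of 30 resolution values between 0.2 and 2.5.
resolutions <- seq(0.2, 2.5, length.out = 30)
print(range(g[1:20, 21:40]))
print(head(resolutions))
```

In the toy embedding no edge connects the two groups, so any partitioning of this graph would separate them. Each value in `resolutions` would then be passed to the clustering step in turn.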
Marker genes for each cluster were identified using the logreg test (ntranosDiscriminativeLearningApproach2019?) implemented in the Seurat framework (v4.3.0) (Stuart and Satija 2019). Genes were considered significant markers for a cluster if they had an FDR less than 0.001. Identities were assigned to each cluster by comparing the detected genes to previously published markers and our own validation experiments.
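The significance cut-off can be illustrated with a small base R sketch. The marker table, its gene names and its p-values are invented for illustration, and the Benjamini-Hochberg correction is an assumption: the text does not say how the FDR was computed, and Seurat's own adjusted p-value column uses a Bonferroni correction by default.

```r
# Hypothetical marker test results (not real output from the analysis).
markers <- data.frame(
  gene    = c("Gad1", "Slc17a6", "Snap25", "Actb"),
  cluster = c("1", "2", "1", "2"),
  p_val   = c(1e-12, 3e-8, 8e-4, 0.2),
  stringsAsFactors = FALSE
)

# Benjamini-Hochberg false discovery rate (assumed correction method).
markers$fdr <- p.adjust(markers$p_val, method = "BH")

# Keep genes passing the FDR < 0.001 threshold from the text.
significant <- markers[markers$fdr < 0.001, ]
print(significant$gene)
```

Here `Snap25` has a raw p-value below 0.001 but does not pass after correction, which is why the threshold is applied to the adjusted values.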
Visualisations and figures were primarily created using the ggplot2 (v3.4.1) (Wickham2010-zq?) and cowplot (v1.1.1) (Wilke 2020) packages, with the viridis colour palette (v0.6.2) for continuous data. UpSet plots (Conway, Lex, and Gehlenborg 2017) were produced using the UpSetR package (v1.4.0) (Gehlenborg 2019) with help from the gridExtra package (v2.3) (Auguie 2017). Data manipulation was performed using other packages in the tidyverse (v1.3.2.9000) (Wickham 2023), particularly dplyr (v1.1.0) (Wickham et al. 2023), tidyr (v1.3.0) (Wickham, Vaughan, and Girlich 2023) and purrr (v1.0.1) (Wickham and Henry 2023). The analysis project was managed using the Snakemake system (v7.21.0) (Mölder et al. 2021) and the workflowr (v1.7.0) (Blischak, Carbonetto, and Stephens 2021) package, which was also used to produce the publicly available website displaying the analysis code, results and output. Reproducible reports were produced using knitr (v1.42) (Xie2014-ha?; Xie2016-ct?; Xie 2023) and R Markdown (v2.20) (Xie2018-tw?; Allaire et al. 2023) and converted to HTML using Pandoc (v2.19.2).
versions <- purrr::map(versions, as.character)
versions <- jsonlite::toJSON(versions, pretty = TRUE)
readr::write_lines(versions,
here::here("output", DOCNAME, "package-versions.json"))
devtools::session_info()
─ Session info ───────────────────────────────────────────────────────────────
setting value
version R version 4.2.2 (2022-10-31)
os Ubuntu 22.04.1 LTS
system x86_64, linux-gnu
ui X11
language en_US:en
collate en_US.UTF-8
ctype en_US.UTF-8
tz Etc/UTC
date 2023-03-14
pandoc 2.19.2 @ /usr/lib/rstudio-server/bin/quarto/bin/tools/ (via rmarkdown)
─ Packages ───────────────────────────────────────────────────────────────────
package * version date (UTC) lib source
bit 4.0.5 2022-11-15 [2] RSPM (R 4.2.0)
bit64 4.0.5 2020-08-30 [2] RSPM (R 4.2.0)
bslib 0.4.2 2022-12-16 [2] RSPM (R 4.2.0)
cachem 1.0.6 2021-08-19 [2] RSPM (R 4.2.0)
callr 3.7.3 2022-11-02 [2] RSPM (R 4.2.0)
cli 3.6.0 2023-01-09 [2] RSPM (R 4.2.2)
colorspace 2.1-0 2023-01-23 [2] RSPM (R 4.2.2)
crayon 1.5.2 2022-09-29 [2] RSPM (R 4.2.0)
devtools 2.4.5 2022-10-11 [2] RSPM (R 4.2.0)
digest 0.6.31 2022-12-11 [2] RSPM (R 4.2.0)
dplyr * 1.1.0 2023-01-29 [2] RSPM (R 4.2.2)
ellipsis 0.3.2 2021-04-29 [2] RSPM (R 4.2.0)
evaluate 0.20 2023-01-17 [2] RSPM (R 4.2.2)
fansi 1.0.4 2023-01-22 [2] RSPM (R 4.2.2)
fastmap 1.1.0 2021-01-25 [2] RSPM (R 4.2.0)
forcats * 0.5.2 2022-08-19 [2] RSPM (R 4.2.0)
fs 1.6.1 2023-02-06 [2] RSPM (R 4.2.2)
generics 0.1.3 2022-07-05 [2] RSPM (R 4.2.0)
getPass 0.2-2 2017-07-21 [2] RSPM (R 4.2.0)
ggplot2 * 3.4.1 2023-02-10 [2] RSPM (R 4.2.2)
git2r 0.30.1 2022-03-16 [2] RSPM (R 4.2.0)
glue * 1.6.2 2022-02-24 [2] RSPM (R 4.2.0)
gtable 0.3.1 2022-09-01 [2] RSPM (R 4.2.0)
here 1.0.1 2020-12-13 [2] RSPM (R 4.2.0)
hms 1.1.2 2022-08-19 [2] RSPM (R 4.2.0)
htmltools 0.5.4 2022-12-07 [2] RSPM (R 4.2.0)
htmlwidgets 1.6.1 2023-01-07 [2] RSPM (R 4.2.0)
httpuv 1.6.9 2023-02-14 [2] RSPM (R 4.2.2)
httr 1.4.4 2022-08-17 [2] RSPM (R 4.2.0)
jquerylib 0.1.4 2021-04-26 [2] RSPM (R 4.2.0)
jsonlite * 1.8.4 2022-12-06 [2] RSPM (R 4.2.0)
knitr * 1.42 2023-01-25 [2] RSPM (R 4.2.2)
later 1.3.0 2021-08-18 [2] RSPM (R 4.2.0)
lifecycle 1.0.3 2022-10-07 [2] RSPM (R 4.2.0)
lubridate * 1.9.0 2022-11-06 [2] RSPM (R 4.2.0)
magrittr 2.0.3 2022-03-30 [2] RSPM (R 4.2.0)
memoise 2.0.1 2021-11-26 [2] RSPM (R 4.2.0)
mime 0.12 2021-09-28 [2] RSPM (R 4.2.0)
miniUI 0.1.1.1 2018-05-18 [2] RSPM (R 4.2.0)
munsell 0.5.0 2018-06-12 [2] RSPM (R 4.2.0)
pillar 1.8.1 2022-08-19 [2] RSPM (R 4.2.0)
pkgbuild 1.4.0 2022-11-27 [2] RSPM (R 4.2.0)
pkgconfig 2.0.3 2019-09-22 [2] RSPM (R 4.2.0)
pkgload 1.3.2 2022-11-16 [2] RSPM (R 4.2.0)
prettyunits 1.1.1 2020-01-24 [2] RSPM (R 4.2.0)
processx 3.8.0 2022-10-26 [2] RSPM (R 4.2.0)
profvis 0.3.7 2020-11-02 [2] RSPM (R 4.2.0)
promises 1.2.0.1 2021-02-11 [2] RSPM (R 4.2.0)
ps 1.7.2 2022-10-26 [2] RSPM (R 4.2.0)
purrr * 1.0.1 2023-01-10 [2] RSPM (R 4.2.2)
R6 2.5.1 2021-08-19 [2] RSPM (R 4.2.0)
Rcpp 1.0.10 2023-01-22 [2] RSPM (R 4.2.2)
readr * 2.1.3 2022-10-01 [2] RSPM (R 4.2.0)
remotes 2.4.2 2021-11-30 [2] RSPM (R 4.2.0)
rlang 1.0.6 2022-09-24 [2] RSPM (R 4.2.0)
rmarkdown 2.20 2023-01-19 [2] RSPM (R 4.2.2)
rprojroot 2.0.3 2022-04-02 [2] RSPM (R 4.2.0)
rstudioapi 0.14 2022-08-22 [2] RSPM (R 4.2.0)
sass 0.4.5 2023-01-24 [2] RSPM (R 4.2.2)
scales 1.2.1 2022-08-20 [2] RSPM (R 4.2.0)
sessioninfo 1.2.2 2021-12-06 [2] RSPM (R 4.2.0)
shiny 1.7.4 2022-12-15 [2] RSPM (R 4.2.0)
stringi 1.7.12 2023-01-11 [2] RSPM (R 4.2.2)
stringr * 1.5.0 2022-12-02 [2] RSPM (R 4.2.0)
tibble * 3.1.8 2022-07-22 [2] RSPM (R 4.2.0)
tidyr * 1.3.0 2023-01-24 [2] RSPM (R 4.2.2)
tidyselect 1.2.0 2022-10-10 [2] RSPM (R 4.2.0)
tidyverse * 1.3.2.9000 2023-02-21 [2] Github (tidyverse/tidyverse@d4a33f6)
timechange * 0.1.1 2022-11-04 [2] RSPM (R 4.2.0)
tzdb 0.3.0 2022-03-28 [2] RSPM (R 4.2.0)
urlchecker 1.0.1 2021-11-30 [2] RSPM (R 4.2.0)
usethis 2.1.6 2022-05-25 [2] RSPM (R 4.2.0)
utf8 1.2.3 2023-01-31 [2] RSPM (R 4.2.2)
vctrs 0.5.2 2023-01-23 [2] RSPM (R 4.2.2)
vroom 1.6.0 2022-09-30 [2] RSPM (R 4.2.0)
whisker 0.4.1 2022-12-05 [2] RSPM (R 4.2.0)
withr 2.5.0 2022-03-03 [2] RSPM (R 4.2.0)
workflowr * 1.7.0 2021-12-21 [2] RSPM (R 4.2.0)
xfun 0.37 2023-01-31 [2] RSPM (R 4.2.2)
xtable 1.8-4 2019-04-21 [2] RSPM (R 4.2.0)
yaml 2.3.7 2023-01-23 [2] RSPM (R 4.2.2)
[1] /home/etretiakov/R/x86_64-pc-linux-gnu-library/4.2
[2] /opt/R/4.2.2/lib/R/library
──────────────────────────────────────────────────────────────────────────────
sessionInfo()
R version 4.2.2 (2022-10-31)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 22.04.1 LTS
Matrix products: default
BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.20.so
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] lubridate_1.9.0 timechange_0.1.1 forcats_0.5.2
[4] stringr_1.5.0 dplyr_1.1.0 purrr_1.0.1
[7] readr_2.1.3 tidyr_1.3.0 tibble_3.1.8
[10] ggplot2_3.4.1 tidyverse_1.3.2.9000 jsonlite_1.8.4
[13] knitr_1.42 glue_1.6.2 workflowr_1.7.0
loaded via a namespace (and not attached):
[1] Rcpp_1.0.10 here_1.0.1 prettyunits_1.1.1 getPass_0.2-2
[5] ps_1.7.2 rprojroot_2.0.3 digest_0.6.31 utf8_1.2.3
[9] mime_0.12 R6_2.5.1 evaluate_0.20 httr_1.4.4
[13] pillar_1.8.1 rlang_1.0.6 rstudioapi_0.14 miniUI_0.1.1.1
[17] urlchecker_1.0.1 whisker_0.4.1 callr_3.7.3 jquerylib_0.1.4
[21] rmarkdown_2.20 devtools_2.4.5 htmlwidgets_1.6.1 bit_4.0.5
[25] munsell_0.5.0 shiny_1.7.4 compiler_4.2.2 httpuv_1.6.9
[29] xfun_0.37 pkgconfig_2.0.3 pkgbuild_1.4.0 htmltools_0.5.4
[33] tidyselect_1.2.0 fansi_1.0.4 crayon_1.5.2 tzdb_0.3.0
[37] withr_2.5.0 later_1.3.0 grid_4.2.2 xtable_1.8-4
[41] gtable_0.3.1 lifecycle_1.0.3 git2r_0.30.1 magrittr_2.0.3
[45] scales_1.2.1 cli_3.6.0 stringi_1.7.12 vroom_1.6.0
[49] cachem_1.0.6 remotes_2.4.2 fs_1.6.1 promises_1.2.0.1
[53] bslib_0.4.2 ellipsis_0.3.2 generics_0.1.3 vctrs_0.5.2
[57] tools_4.2.2 bit64_4.0.5 hms_1.1.2 pkgload_1.3.2
[61] processx_3.8.0 parallel_4.2.2 fastmap_1.1.0 yaml_2.3.7
[65] colorspace_2.1-0 sessioninfo_1.2.2 memoise_2.0.1 usethis_2.1.6
[69] profvis_0.3.7 sass_0.4.5