Last updated: 2023-03-14
Checks: 6 1
Knit directory: Hevesi_2023/
This reproducible R Markdown analysis was created with workflowr (version 1.7.0). The Checks tab describes the reproducibility checks that were applied when the results were created. The Past versions tab lists the development history.
The R Markdown file has unstaged changes. To know which version of
the R Markdown file created these results, you’ll want to first commit
it to the Git repo. If you’re still working on the analysis, you can
ignore this warning. When you’re finished, you can run
wflow_publish
to commit the R Markdown file and build the
HTML.
Great job! The global environment was empty. Objects defined in the global environment can affect the analysis in your R Markdown file in unknown ways. For reproduciblity it’s best to always run the code in an empty environment.
The command set.seed(20230121)
was run prior to running
the code in the R Markdown file. Setting a seed ensures that any results
that rely on randomness, e.g. subsampling or permutations, are
reproducible.
Great job! Recording the operating system, R version, and package versions is critical for reproducibility.
Nice! There were no cached chunks for this analysis, so you can be confident that you successfully produced the results during this run.
Great job! Using relative paths to the files within your workflowr project makes it easier to run your code on other machines.
Great! You are using Git for version control. Tracking code development and connecting the code version to the results is critical for reproducibility.
The results in this page were generated with repository version 98a2fc9. See the Past versions tab to see a history of the changes made to the R Markdown and HTML files.
Note that you need to be careful to ensure that all relevant files for
the analysis have been committed to Git prior to generating the results
(you can use wflow_publish
or
wflow_git_commit
). workflowr only checks the R Markdown
file, but you know if there are other scripts or data files that it
depends on. Below is the status of the Git repository when the results
were generated:
Ignored files:
Ignored: .DS_Store
Ignored: .Rhistory
Ignored: .Rproj.user/
Ignored: .cache/
Ignored: .config/
Ignored: .nv/
Ignored: .snakemake/
Ignored: cellbender/
Ignored: cellbender_latest.sif
Ignored: cellranger/
Ignored: data/Pr5P7_clusters.h5Seurat
Ignored: data/Pr5P7_clusters.h5ad
Ignored: data/THP7_clusters.h5Seurat
Ignored: data/THP7_clusters.h5ad
Ignored: data/neuro_fin-THP7.h5seurat
Ignored: data/neuro_fin.h5seurat
Ignored: fastq/
Ignored: mm10_optimized/
Ignored: souporcell/
Ignored: souporcell_latest.sif
Unstaged changes:
Modified: analysis/methods.Rmd
Modified: data/references/references.bib
Modified: output/figures/Pr5P7_sex_umap.pdf
Modified: output/figures/Pr5P7_stress_umap.pdf
Modified: output/figures/Pr5P7_top5_umap.pdf
Modified: output/figures/THP7_sex_umap.pdf
Modified: output/figures/THP7_stress_umap.pdf
Modified: output/figures/THP7_top5_umap.pdf
Modified: output/figures/combined_sex_umap.pdf
Modified: output/figures/combined_stress_umap.pdf
Modified: output/figures/combined_top5_umap.pdf
Modified: output/tables/hevesi2023-all-mrk_logreg-sct_combined.csv
Modified: output/tables/hevesi2023-all-mrk_wilcox-sct_combined.csv
Note that any generated files, e.g. HTML, png, CSS, etc., are not included in this status report because it is ok for generated content to have uncommitted changes.
These are the previous versions of the repository in which changes were
made to the R Markdown (analysis/methods.Rmd
) and HTML
(docs/methods.html
) files. If you’ve configured a remote
Git repository (see ?wflow_git_remote
), click on the
hyperlinks in the table below to view the files as they were in that
past version.
File | Version | Author | Date | Message |
---|---|---|---|---|
html | 98a2fc9 | Evgenii O. Tretiakov | 2023-03-14 | Build site. |
Rmd | a7f002d | EugOT | 2023-03-14 | update bib |
Rmd | edcb936 | Evgenii O. Tretiakov | 2023-03-14 | update methods |
Rmd | 3b155ad | Evgenii O. Tretiakov | 2023-03-05 | add figures code |
# Presentation
library("glue")
library("knitr")
# JSON
library("jsonlite")
# Tidyverse
library("tidyverse")
dir.create(here::here("output", DOCNAME), showWarnings = FALSE)
write_bib(c("base", "Seurat", "SeuratWrappers", "SeuratDisk", "sctransform",
"glmGamPoi", "patchwork", "scCustomize", "Nebulosa", "clustree",
"mrtree", "gprofiler2", "cowplot", "UpSetR", "ggstatsplot",
"gridExtra", "tidyverse", "dplyr", "tidyr", "magrittr", "stringr",
"skimr", "future", "purrr", "here", "workflowr", "zeallot", "knitr",
"kableExtra", "rmarkdown", "reticulate"),
file = here::here("output", DOCNAME, "packages.bib"))
versions <- list(
biomaRt = packageVersion("biomaRt"),
cellbender = "docker://etretiakov/cellbender:v0.0.1",
cellranger = "7.1.0",
cellranger_ref = "mm10_optimized_v.1.0",
clustree = packageVersion("clustree"),
cowplot = packageVersion("cowplot"),
dplyr = packageVersion("dplyr"),
future = packageVersion("future"),
ggplot2 = packageVersion("ggplot2"),
ggstatsplot = packageVersion("ggstatsplot"),
glmGamPoi = packageVersion("glmGamPoi"),
gprofiler2 = packageVersion("gprofiler2"),
gridExtra = packageVersion("gridExtra"),
here = packageVersion("here"),
kableExtra = packageVersion("kableExtra"),
knitr = packageVersion("knitr"),
magrittr = packageVersion("magrittr"),
mrtree = packageVersion("mrtree"),
Nebulosa = packageVersion("Nebulosa"),
pandoc = rmarkdown::pandoc_version(),
patchwork = packageVersion("patchwork"),
purrr = packageVersion("purrr"),
python = "3.8.8",
R = str_extract(R.version.string, "[0-9\\.]+"),
reticulate = packageVersion("reticulate"),
rmarkdown = packageVersion("rmarkdown"),
scCustomize = packageVersion("scCustomize"),
sctransform = packageVersion("sctransform"),
Seurat = packageVersion("Seurat"),
skimr = packageVersion("skimr"),
stringr = packageVersion("stringr"),
Snakemake = "7.21.0",
souporcell = "shub://wheaton5/souporcell",
tidyr = packageVersion("tidyr"),
tidyverse = packageVersion("tidyverse"),
UpSetR = packageVersion("UpSetR"),
viridis = packageVersion("viridis"),
workflowr = packageVersion("workflowr"),
zeallot = packageVersion("zeallot")
)
The Cell Ranger pipeline (v7.1.0) (Zheng et al. 2017) was used to perform sample demultiplexing, barcode processing and single-nuclei gene counting. Briefly, samples were demultiplexed to produce a pair of FASTQ files for each sample. Reads containing sequence information were aligned using the optimised mouse genome reference (vmm10_optimized_v.1.0) provided by Pool’s lab based on the default Cell Ranger mm10 genome version 2020-A that was cleared from gene overlaps, poorly annotated exons and 3’-UTRs and intergenic fragments (Pool et al. 2022). PCR duplicates were removed by selecting unique combinations of cell barcodes, unique molecular identifiers (UMIs) and gene ids with the final results being a gene expression matrix that was used for further analysis.
The droplet selection method of Cell Ranger identified 923 nuclei in ventrobasal thalamus and in principal sensory trigeminal nucleus - 731 nuclei based on EmptyDrops method (Lun et al. 2019) incorporated into cellranger count pipeline.
Using those values as the expected number of cells, we applied a neural network-based approach called CellBender (docker://etretiakov/cellbender:v0.0.1) (Fleming et al. 2022). We set a false positive rate threshold at the level of 0.01 and set the neural network to learn over 150 epochs with a total of 5000 droplets included based on knee plots (please see online supplementary Cell Ranger reports).
We quantified log probability to be a doublet for every cell based on apriori knowledge of genotypes from each input sample and called variant occurrence frequencies that allowed to cluster nuclei to source organism or classify as a doublet (Heaton et al. 2020) (see shub://wheaton5/souporcell).
Gene annotation information was added using the gprofiler2 package
(v0.2.1) (Reimand et al. 2016); thus we
filter cells based on high content of mitochondrial, ribosomal or
hemoglobin proteins genes, specifically thresholds set as 1%, 1% and
0.5%; additionally pseudogenes and poorly annotated genes were also
deleted from count matrix. Moreover, cells of low complexity were
filtered out as (\(\log_{10}Genes/\log_{10}UMI
< 0.8\)). Therefore, cells were assigned cell cycle scores
using the CellCycleScoring
function in the Seurat package
(v4.3.0).
We used the selection method in the Seurat package (v4.3.0) (Satija et al. 2015; Stuart and Satija 2019), which uses a modern variance stabilising transformation statistical technic that utilises scaling to person residuals (Hafemeister and Satija 2019). That way, we selected 3000 highly variable genes per dataset and regressed out complexity and cell-cycle variability prior to the final scaling of filtered matrixes.
We performed Leiden algorithm graph-based clustering. PCA was
performed using the selected genes and the jackknife tested (Chung and Storey 2015) principal components (we
tested the significance of feature for randomly picked 1% of data over
1000 iterations; see PCScore
function in
functions.R
script of code directory) were used to
construct a shared nearest neighbour graph using the overlap between the
15
nearest neighbours of each cell. Leiden modularity
optimisation (Traag, Waltman, and Eck
2019) was used to partition this graph with an array of
resolution parameters where 30 modularity events were sampled between
0.2
and 2.5
. Clustering tree visualisations
(Zappia and Oshlack 2018) were produced
using the clustree package (v0.5.0) showing the resolution of previously
identified clusters. By inspecting these resolutions reconcile tree
produced by mrtree package (v0.0.0.9000) (Peng et
al. 2021) and calculating adjusted multi-resolution Rand index
chosen as maximum value if there is no higher modularity within
0.05
AMRI difference (see SelectResolution
in
function.R
file of code directory).
Marker genes for each cluster were identified using logreg test (Ntranos et al. 2019) implemented in Seurat
framework (v) (Stuart and Satija 2019).
Genes were considered significant markers for a cluster if they had an
FDR less than 0.001
. Identities were assigned to each
cluster by comparing the detected genes to previously published markers
and our own validation experiments.
Visualisations and figures were primarily created using the ggplot2 (v3.4.1), cowplot (v1.1.1) (Wilke 2020) and patchwork (v1.1.2.9000) packages using the viridis colour palette (v0.6.2) for continuous data. UpSet plots (Conway, Lex, and Gehlenborg 2017) were produced using the UpSetR package (v1.4.0) (Gehlenborg 2019) with help from the gridExtra package (v2.3) (Auguie 2017). Data manipulation was performed using other packages in the tidyverse (v1.3.2.9000) (Wickham 2023) particularly dplyr (v1.1.0) (Wickham et al. 2023), tidyr (v1.1.0) (Wickham, Vaughan, and Girlich 2023) and purrr (v1.0.1) (Wickham and Henry 2023). The analysis project was managed using the Snakemake system (v ) (Mölder et al. 2021) and the workflowr (v1.7.0) (Blischak, Carbonetto, and Stephens 2021) package which was also used to produce the publicly available website displaying the analysis code, results and output. Reproducible reports were produced using knitr (v1.42) (Xie 2023) and R Markdown (v2.20) (Allaire et al. 2023) and converted to HTML using Pandoc (v2.19.2).
versions <- purrr::map(versions, as.character)
versions <- jsonlite::toJSON(versions, pretty = TRUE)
readr::write_lines(versions,
here::here("output", DOCNAME, "package-versions.json"))
sessionInfo()
R version 4.2.2 (2022-10-31)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 22.04.1 LTS
Matrix products: default
BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.20.so
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] lubridate_1.9.0 timechange_0.1.1 forcats_0.5.2
[4] stringr_1.5.0 dplyr_1.1.0 purrr_1.0.1
[7] readr_2.1.3 tidyr_1.3.0 tibble_3.1.8
[10] ggplot2_3.4.1 tidyverse_1.3.2.9000 jsonlite_1.8.4
[13] knitr_1.42 glue_1.6.2
loaded via a namespace (and not attached):
[1] tidyselect_1.2.0 xfun_0.37 bslib_0.4.2 colorspace_2.1-0
[5] vctrs_0.5.2 generics_0.1.3 htmltools_0.5.4 yaml_2.3.7
[9] utf8_1.2.3 rlang_1.0.6 jquerylib_0.1.4 later_1.3.0
[13] pillar_1.8.1 withr_2.5.0 bit64_4.0.5 lifecycle_1.0.3
[17] munsell_0.5.0 gtable_0.3.1 workflowr_1.7.0 evaluate_0.20
[21] tzdb_0.3.0 fastmap_1.1.0 httpuv_1.6.9 parallel_4.2.2
[25] fansi_1.0.4 Rcpp_1.0.10 promises_1.2.0.1 scales_1.2.1
[29] cachem_1.0.6 vroom_1.6.0 bit_4.0.5 fs_1.6.1
[33] hms_1.1.2 digest_0.6.31 stringi_1.7.12 rprojroot_2.0.3
[37] grid_4.2.2 here_1.0.1 cli_3.6.0 tools_4.2.2
[41] magrittr_2.0.3 sass_0.4.5 crayon_1.5.2 whisker_0.4.1
[45] pkgconfig_2.0.3 ellipsis_0.3.2 rmarkdown_2.20 rstudioapi_0.14
[49] R6_2.5.1 git2r_0.30.1 compiler_4.2.2