• Dependencies
  • Differential abundance testing with Milo
    • Data
    • Pre-processing
    • Milo object
    • Construct KNN graph
    • Defining representative neighbourhoods
    • Counting cells in neighbourhoods
    • Differential abundance testing
    • Visualise neighbourhoods displaying DA
  • Mouse gastrulation example
    • Data
    • Visualize the data
    • Create a Milo object
    • Construct KNN graph
    • Defining representative neighbourhoods on the KNN graph
    • Counting cells in neighbourhoods
    • Defining experimental design
    • Computing neighbourhood connectivity
    • Testing
    • Inspecting DA testing results
    • Finding markers of DA populations
    • Automatic grouping of neighbourhoods
    • Finding gene signatures for neighbourhoods
    • Visualize detected markers
    • DGE testing within a group

Last updated: 2025-01-10

Checks: 7 0

Knit directory: muse/

This reproducible R Markdown analysis was created with workflowr (version 1.7.1). The Checks tab describes the reproducibility checks that were applied when the results were created. The Past versions tab lists the development history.


Great! Since the R Markdown file has been committed to the Git repository, you know the exact version of the code that produced these results.

Great job! The global environment was empty. Objects defined in the global environment can affect the analysis in your R Markdown file in unknown ways. For reproduciblity it’s best to always run the code in an empty environment.

The command set.seed(20200712) was run prior to running the code in the R Markdown file. Setting a seed ensures that any results that rely on randomness, e.g. subsampling or permutations, are reproducible.

Great job! Recording the operating system, R version, and package versions is critical for reproducibility.

Nice! There were no cached chunks for this analysis, so you can be confident that you successfully produced the results during this run.

Great job! Using relative paths to the files within your workflowr project makes it easier to run your code on other machines.

Great! You are using Git for version control. Tracking code development and connecting the code version to the results is critical for reproducibility.

The results in this page were generated with repository version 803a1f5. See the Past versions tab to see a history of the changes made to the R Markdown and HTML files.

Note that you need to be careful to ensure that all relevant files for the analysis have been committed to Git prior to generating the results (you can use wflow_publish or wflow_git_commit). workflowr only checks the R Markdown file, but you know if there are other scripts or data files that it depends on. Below is the status of the Git repository when the results were generated:


Ignored files:
    Ignored:    .Rhistory
    Ignored:    .Rproj.user/
    Ignored:    data/pbmc3k.csv
    Ignored:    data/pbmc3k.csv.gz
    Ignored:    data/pbmc3k/
    Ignored:    r_packages_4.4.0/
    Ignored:    r_packages_4.4.1/

Note that any generated files, e.g. HTML, png, CSS, etc., are not included in this status report because it is ok for generated content to have uncommitted changes.


These are the previous versions of the repository in which changes were made to the R Markdown (analysis/miloR.Rmd) and HTML (docs/miloR.html) files. If you’ve configured a remote Git repository (see ?wflow_git_remote), click on the hyperlinks in the table below to view the files as they were in that past version.

File Version Author Date Message
Rmd 803a1f5 Dave Tang 2025-01-10 Mouse gastrulation example
html 5477e49 Dave Tang 2025-01-10 Build site.
Rmd 28f42c0 Dave Tang 2025-01-10 Update notebook
html 710ddf2 Dave Tang 2025-01-10 Build site.
Rmd 2dd7fac Dave Tang 2025-01-10 DA testing with miloR

miloR:

Milo is a tool for analysis of complex single cell datasets generated from replicated multi-condition experiments, which detects changes in composition between conditions. While differential abundance (DA) is commonly quantified in discrete cell clusters, Milo uses partially overlapping neighbourhoods of cells on a KNN graph. Starting from a graph that faithfully recapitulates the biology of the cell population, Milo analysis consists of 3 steps:

Sampling of representative neighbourhoods Testing for differential abundance of conditions in all neighbourhoods Accounting for multiple hypothesis testing using a weighted FDR procedure that accounts for the overlap of neighbourhoods

Dependencies

Install Bioconductor packages using BiocManager::install().

if (!require("BiocManager", quietly = TRUE))
    install.packages("BiocManager")

BiocManager::install("SingleCellExperiment")
BiocManager::install("scran")
BiocManager::install("scater")
BiocManager::install("miloR")
BiocManager::install("MouseGastrulationData")

install.packages('dplyr')
install.packages('patchwork')

Load libraries.

suppressPackageStartupMessages(library(miloR))
suppressPackageStartupMessages(library(SingleCellExperiment))
suppressPackageStartupMessages(library(scater))
suppressPackageStartupMessages(library(scran))
suppressPackageStartupMessages(library(dplyr))
suppressPackageStartupMessages(library(patchwork))
suppressPackageStartupMessages(library(ggplot2))
suppressPackageStartupMessages(library(MouseGastrulationData))

Differential abundance testing with Milo

Vignette.

Data

Load testing data.

data("sim_trajectory", package = "miloR")

## Extract SingleCellExperiment object
traj_sce <- sim_trajectory[['SCE']]

## Extract sample metadata to use for testing
traj_meta <- sim_trajectory[["meta"]]

## Add metadata to colData slot
colData(traj_sce) <- DataFrame(traj_meta)
colnames(traj_sce) <- colData(traj_sce)$cell_id

redim <- reducedDim(traj_sce, "PCA")
dimnames(redim) <- list(colnames(traj_sce), paste0("PC", c(1:50)))
reducedDim(traj_sce, "PCA") <- redim 

Sample and conditions.

table(colData(traj_sce)$Sample)

A_R1 A_R2 A_R3 B_R1 B_R2 B_R3 
  46   42   58  103  107  144 

Pre-processing

set.seed(1984)
logcounts(traj_sce) <- log(counts(traj_sce) + 1)
traj_sce <- runPCA(traj_sce, ncomponents=30)
traj_sce <- runUMAP(traj_sce)

cbind(
  colData(traj_sce),
  reducedDim(traj_sce, "UMAP")
) |>
  ggplot(aes(UMAP1, UMAP2, colour = Condition)) +
  geom_point() +
  theme_minimal() -> my_umap

my_umap

Version Author Date
5477e49 Dave Tang 2025-01-10
710ddf2 Dave Tang 2025-01-10

Milo object

traj_milo <- Milo(traj_sce)
reducedDim(traj_milo, "UMAP") <- reducedDim(traj_sce, "UMAP")

traj_milo
class: Milo 
dim: 500 500 
metadata(0):
assays(2): counts logcounts
rownames(500): G1 G2 ... G499 G500
rowData names(0):
colnames(500): C1 C2 ... C499 C500
colData names(5): cell_id group_id Condition Replicate Sample
reducedDimNames(2): PCA UMAP
mainExpName: NULL
altExpNames(0):
nhoods dimensions(2): 1 1
nhoodCounts dimensions(2): 1 1
nhoodDistances dimension(1): 0
graph names(0):
nhoodIndex names(1): 0
nhoodExpression dimension(2): 1 1
nhoodReducedDim names(0):
nhoodGraph names(0):
nhoodAdjacency dimension(2): 1 1

Construct KNN graph

traj_milo <- buildGraph(traj_milo, k = 10, d = 30)
Constructing kNN graph with k:10

Defining representative neighbourhoods

traj_milo <- makeNhoods(traj_milo, prop = 0.1, k = 10, d=30, refined = TRUE)
Checking valid object
Running refined sampling with reduced_dim
plotNhoodSizeHist(traj_milo)

Version Author Date
5477e49 Dave Tang 2025-01-10
710ddf2 Dave Tang 2025-01-10

Counting cells in neighbourhoods

traj_milo <- countCells(traj_milo, meta.data = data.frame(colData(traj_milo)), samples="Sample")
Checking meta.data validity
Counting cells in neighbourhoods
head(nhoodCounts(traj_milo))
6 x 6 sparse Matrix of class "dgCMatrix"
  B_R1 A_R1 A_R2 B_R2 B_R3 A_R3
1   15    .    1   22   29    1
2    8    .    .   14   28    1
3    9    6    7    7    6   10
4   13    2    1   19   21    1
5    5    6    4    8    6    7
6    6    7    5   12    8   11

Differential abundance testing

traj_design <- data.frame(colData(traj_milo))[,c("Sample", "Condition")]
traj_design <- distinct(traj_design)
rownames(traj_design) <- traj_design$Sample
## Reorder rownames to match columns of nhoodCounts(milo)
traj_design <- traj_design[colnames(nhoodCounts(traj_milo)), , drop=FALSE]

traj_design
     Sample Condition
B_R1   B_R1         B
A_R1   A_R1         A
A_R2   A_R2         A
B_R2   B_R2         B
B_R3   B_R3         B
A_R3   A_R3         A
traj_milo <- calcNhoodDistance(traj_milo, d=30)
'as(<dgTMatrix>, "dgCMatrix")' is deprecated.
Use 'as(., "CsparseMatrix")' instead.
See help("Deprecated") and help("Matrix-deprecated").
da_results <- testNhoods(traj_milo, design = ~ Condition, design.df = traj_design)
Using TMM normalisation
Running with model contrasts
Performing spatial FDR correction with k-distance weighting
da_results %>%
  dplyr::arrange(SpatialFDR) %>%
  head() 
       logFC   logCPM        F       PValue          FDR Nhood   SpatialFDR
1   3.886596 16.30369 33.87551 6.099827e-08 1.646953e-06     1 1.611592e-06
2   4.337415 15.95908 29.07187 4.140178e-07 5.589241e-06     2 5.504741e-06
4   2.655301 16.09039 19.26657 2.658333e-05 2.392499e-04     4 2.371232e-04
8   2.897422 15.53655 20.75784 6.887852e-05 4.649300e-04     8 4.627806e-04
24 -1.527532 15.73713 29.71687 5.308236e-04 2.866447e-03    24 2.872952e-03
23  2.278126 15.50551 10.21171 1.830055e-03 8.235247e-03    23 8.283248e-03

Visualise neighbourhoods displaying DA

traj_milo <- buildNhoodGraph(traj_milo)

my_umap + plotNhoodGraphDA(traj_milo, da_results, alpha=0.05) +
  plot_layout(guides="collect")
Adding nhood effect sizes to neighbourhood graph attributes

Version Author Date
5477e49 Dave Tang 2025-01-10
710ddf2 Dave Tang 2025-01-10

Mouse gastrulation example

Vignette.

Data

select_samples <- c(2,  3,  6, 4, #15,
                    # 19, 
                    10, 14#, 20 #30
                    #31, 32
                    )
embryo_data = EmbryoAtlasData(samples = select_samples)
see ?MouseGastrulationData and browseVignettes('MouseGastrulationData') for documentation
loading from cache
see ?MouseGastrulationData and browseVignettes('MouseGastrulationData') for documentation
loading from cache
see ?MouseGastrulationData and browseVignettes('MouseGastrulationData') for documentation
loading from cache
see ?MouseGastrulationData and browseVignettes('MouseGastrulationData') for documentation
loading from cache
see ?MouseGastrulationData and browseVignettes('MouseGastrulationData') for documentation
loading from cache
see ?MouseGastrulationData and browseVignettes('MouseGastrulationData') for documentation
loading from cache
see ?MouseGastrulationData and browseVignettes('MouseGastrulationData') for documentation
loading from cache
see ?MouseGastrulationData and browseVignettes('MouseGastrulationData') for documentation
loading from cache
see ?MouseGastrulationData and browseVignettes('MouseGastrulationData') for documentation
loading from cache
see ?MouseGastrulationData and browseVignettes('MouseGastrulationData') for documentation
loading from cache
see ?MouseGastrulationData and browseVignettes('MouseGastrulationData') for documentation
loading from cache
see ?MouseGastrulationData and browseVignettes('MouseGastrulationData') for documentation
loading from cache
see ?MouseGastrulationData and browseVignettes('MouseGastrulationData') for documentation
loading from cache
see ?MouseGastrulationData and browseVignettes('MouseGastrulationData') for documentation
loading from cache
see ?MouseGastrulationData and browseVignettes('MouseGastrulationData') for documentation
loading from cache
see ?MouseGastrulationData and browseVignettes('MouseGastrulationData') for documentation
loading from cache
see ?MouseGastrulationData and browseVignettes('MouseGastrulationData') for documentation
loading from cache
see ?MouseGastrulationData and browseVignettes('MouseGastrulationData') for documentation
loading from cache
see ?MouseGastrulationData and browseVignettes('MouseGastrulationData') for documentation
loading from cache
see ?MouseGastrulationData and browseVignettes('MouseGastrulationData') for documentation
loading from cache
see ?MouseGastrulationData and browseVignettes('MouseGastrulationData') for documentation
loading from cache
see ?MouseGastrulationData and browseVignettes('MouseGastrulationData') for documentation
loading from cache
see ?MouseGastrulationData and browseVignettes('MouseGastrulationData') for documentation
loading from cache
see ?MouseGastrulationData and browseVignettes('MouseGastrulationData') for documentation
loading from cache
see ?MouseGastrulationData and browseVignettes('MouseGastrulationData') for documentation
loading from cache
embryo_data
class: SingleCellExperiment 
dim: 29452 7558 
metadata(0):
assays(1): counts
rownames(29452): ENSMUSG00000051951 ENSMUSG00000089699 ...
  ENSMUSG00000096730 ENSMUSG00000095742
rowData names(2): ENSEMBL SYMBOL
colnames(7558): cell_361 cell_362 ... cell_29013 cell_29014
colData names(17): cell barcode ... colour sizeFactor
reducedDimNames(2): pca.corrected umap
mainExpName: NULL
altExpNames(0):

Visualize the data

embryo_data <- embryo_data[,apply(reducedDim(embryo_data, "pca.corrected"), 1, function(x) !all(is.na(x)))]
embryo_data <- runUMAP(embryo_data, dimred = "pca.corrected", name = 'umap')

plotReducedDim(embryo_data, colour_by="stage", dimred = "umap") 

Create a Milo object

embryo_milo <- Milo(embryo_data)
embryo_milo
class: Milo 
dim: 29452 6875 
metadata(0):
assays(1): counts
rownames(29452): ENSMUSG00000051951 ENSMUSG00000089699 ...
  ENSMUSG00000096730 ENSMUSG00000095742
rowData names(2): ENSEMBL SYMBOL
colnames(6875): cell_361 cell_362 ... cell_29013 cell_29014
colData names(17): cell barcode ... colour sizeFactor
reducedDimNames(2): pca.corrected umap
mainExpName: NULL
altExpNames(0):
nhoods dimensions(2): 1 1
nhoodCounts dimensions(2): 1 1
nhoodDistances dimension(1): 0
graph names(0):
nhoodIndex names(1): 0
nhoodExpression dimension(2): 1 1
nhoodReducedDim names(0):
nhoodGraph names(0):
nhoodAdjacency dimension(2): 1 1

Construct KNN graph

embryo_milo <- buildGraph(embryo_milo, k = 30, d = 30, reduced.dim = "pca.corrected")
Constructing kNN graph with k:30

Defining representative neighbourhoods on the KNN graph

embryo_milo <- makeNhoods(embryo_milo, prop = 0.1, k = 30, d=30, refined = TRUE, reduced_dims = "pca.corrected")
Checking valid object
Running refined sampling with reduced_dim
plotNhoodSizeHist(embryo_milo)

Counting cells in neighbourhoods

embryo_milo <- countCells(embryo_milo, meta.data = as.data.frame(colData(embryo_milo)), sample="sample")
Checking meta.data validity
Counting cells in neighbourhoods
head(nhoodCounts(embryo_milo))
6 x 6 sparse Matrix of class "dgCMatrix"
   2 3  6 4 10 14
1  . 2 24 2 11  .
2  1 4 48 7 25  6
3  3 4 87 5 15 11
4  . .  5 . 90 19
5  . . 27 . 47  7
6 11 5 26 2  .  .

Defining experimental design

embryo_design <- data.frame(colData(embryo_milo))[,c("sample", "stage", "sequencing.batch")]

## Convert batch info from integer to factor
embryo_design$sequencing.batch <- as.factor(embryo_design$sequencing.batch) 
embryo_design <- distinct(embryo_design)
rownames(embryo_design) <- embryo_design$sample

embryo_design
   sample stage sequencing.batch
2       2  E7.5                1
3       3  E7.5                1
6       6  E7.5                1
4       4  E7.5                1
10     10  E7.0                1
14     14  E7.0                2

Computing neighbourhood connectivity

embryo_milo <- calcNhoodDistance(embryo_milo, d=30, reduced.dim = "pca.corrected")

Testing

da_results <- testNhoods(embryo_milo, design = ~ sequencing.batch + stage, design.df = embryo_design, reduced.dim="pca.corrected")
Using TMM normalisation
Running with model contrasts
Performing spatial FDR correction with k-distance weighting
da_results %>%
  arrange(SpatialFDR) %>%
  head() 
        logFC   logCPM        F       PValue          FDR Nhood   SpatialFDR
36  -8.702572 11.38854 51.62516 1.091251e-12 5.063407e-10    36 3.911834e-10
377 -8.791174 11.72929 47.66587 7.658840e-12 1.776851e-09   377 1.394183e-09
314 -7.170721 11.85656 45.98225 1.758371e-11 2.039710e-09   314 1.593529e-09
347 -8.426464 11.27041 46.53411 1.338823e-11 2.039710e-09   347 1.593529e-09
52  -7.838842 12.18774 44.65865 3.383398e-11 3.139793e-09    52 2.453204e-09
223 -6.168287 11.96552 42.75561 8.685459e-11 6.566003e-09   223 5.087165e-09

Inspecting DA testing results

ggplot(da_results, aes(PValue)) + geom_histogram(bins=50)

ggplot(da_results, aes(logFC, -log10(SpatialFDR))) + 
  geom_point() +
  geom_hline(yintercept = 1) ## Mark significance threshold (10% FDR)

embryo_milo <- buildNhoodGraph(embryo_milo)

## Plot single-cell UMAP
umap_pl <- plotReducedDim(embryo_milo, dimred = "umap", colour_by="stage", text_by = "celltype", 
                          text_size = 3, point_size=0.5) +
  guides(fill="none")

## Plot neighbourhood graph
nh_graph_pl <- plotNhoodGraphDA(embryo_milo, da_results, layout="umap",alpha=0.1) 
Adding nhood effect sizes to neighbourhood graph attributes
umap_pl + nh_graph_pl +
  plot_layout(guides="collect")
Warning: ggrepel: 19 unlabeled data points (too many overlaps). Consider
increasing max.overlaps

da_results <- annotateNhoods(embryo_milo, da_results, coldata_col = "celltype")
Converting celltype to factor...
head(da_results)
       logFC   logCPM          F       PValue          FDR Nhood   SpatialFDR
1 -0.7073196 10.38623  0.5596000 4.545478e-01 5.131635e-01     1 5.170902e-01
2 -0.6852005 11.51879  0.6659869 4.145939e-01 4.749915e-01     2 4.773720e-01
3  0.4505361 11.73195  0.2354735 6.275710e-01 6.694091e-01     3 6.705023e-01
4 -6.0397103 12.02021 38.5362677 7.078557e-10 2.189634e-08     4 1.690890e-08
5 -3.1492070 11.32228 16.2297553 5.913724e-05 1.932372e-04     5 1.756735e-04
6  6.7795549 10.89194 10.2510770 1.396926e-03 2.842866e-03     6 2.831445e-03
           celltype celltype_fraction
1 Parietal endoderm          1.000000
2  Primitive Streak          0.956044
3      ExE ectoderm          1.000000
4          Epiblast          0.754386
5      ExE ectoderm          1.000000
6   Caudal epiblast          0.750000
ggplot(da_results, aes(celltype_fraction)) + geom_histogram(bins=50)

da_results$celltype <- ifelse(da_results$celltype_fraction < 0.7, "Mixed", da_results$celltype)
plotDAbeeswarm(da_results, group.by = "celltype")
Converting group_by to factor...

Finding markers of DA populations

## Add log normalized count to Milo object
embryo_milo <- logNormCounts(embryo_milo)

da_results$NhoodGroup <- as.numeric(da_results$SpatialFDR < 0.1 & da_results$logFC < 0)
da_nhood_markers <- findNhoodGroupMarkers(embryo_milo, da_results, subset.row = rownames(embryo_milo)[1:10])
Warning: Zero sample variances detected, have been offset away from zero
Warning: Zero sample variances detected, have been offset away from zero
head(da_nhood_markers)
              GeneID       logFC_0  adj.P.Val_0       logFC_1  adj.P.Val_1
1 ENSMUSG00000025900  9.299023e-05 1.000000e+00 -9.299023e-05 1.000000e+00
2 ENSMUSG00000025902  1.130534e-01 9.610514e-06 -1.130534e-01 9.610514e-06
3 ENSMUSG00000025903 -6.605353e-02 4.487599e-03  6.605353e-02 4.487599e-03
4 ENSMUSG00000033813 -6.175050e-02 8.928182e-03  6.175050e-02 8.928182e-03
5 ENSMUSG00000033845 -6.025957e-02 1.563741e-02  6.025957e-02 1.563741e-02
6 ENSMUSG00000051951  0.000000e+00 1.000000e+00  0.000000e+00 1.000000e+00
da_nhood_markers <- findNhoodGroupMarkers(embryo_milo, da_results, subset.row = rownames(embryo_milo)[1:10], 
                                          aggregate.samples = TRUE, sample_col = "sample")
Warning: Zero sample variances detected, have been offset away from zero
Warning: Zero sample variances detected, have been offset away from zero
head(da_nhood_markers)
              GeneID      logFC_0 adj.P.Val_0      logFC_1 adj.P.Val_1
1 ENSMUSG00000025900  0.001765224           1 -0.001765224           1
2 ENSMUSG00000025902 -0.205886718           1  0.205886718           1
3 ENSMUSG00000025903 -0.010020237           1  0.010020237           1
4 ENSMUSG00000033813 -0.050052716           1  0.050052716           1
5 ENSMUSG00000033845 -0.168226202           1  0.168226202           1
6 ENSMUSG00000051951  0.000000000           1  0.000000000           1

Automatic grouping of neighbourhoods

## Run buildNhoodGraph to store nhood adjacency matrix
embryo_milo <- buildNhoodGraph(embryo_milo)

## Find groups
da_results <- groupNhoods(embryo_milo, da_results, max.lfc.delta = 10)
Found 332 DA neighbourhoods at FDR 10%
nhoodAdjacency found - using for nhood grouping
head(da_results)
       logFC   logCPM          F       PValue          FDR Nhood   SpatialFDR
1 -0.7073196 10.38623  0.5596000 4.545478e-01 5.131635e-01     1 5.170902e-01
2 -0.6852005 11.51879  0.6659869 4.145939e-01 4.749915e-01     2 4.773720e-01
3  0.4505361 11.73195  0.2354735 6.275710e-01 6.694091e-01     3 6.705023e-01
4 -6.0397103 12.02021 38.5362677 7.078557e-10 2.189634e-08     4 1.690890e-08
5 -3.1492070 11.32228 16.2297553 5.913724e-05 1.932372e-04     5 1.756735e-04
6  6.7795549 10.89194 10.2510770 1.396926e-03 2.842866e-03     6 2.831445e-03
           celltype celltype_fraction NhoodGroup
1 Parietal endoderm          1.000000          1
2  Primitive Streak          0.956044          2
3      ExE ectoderm          1.000000          3
4          Epiblast          0.754386          4
5      ExE ectoderm          1.000000          3
6   Caudal epiblast          0.750000          5
plotNhoodGroups(embryo_milo, da_results, layout="umap") 

plotDAbeeswarm(da_results, "NhoodGroup")
Converting group_by to factor...

set.seed(42)
da_results <- groupNhoods(embryo_milo, da_results, max.lfc.delta = 10, overlap=1)
Found 332 DA neighbourhoods at FDR 10%
nhoodAdjacency found - using for nhood grouping
plotNhoodGroups(embryo_milo, da_results, layout="umap")

Finding gene signatures for neighbourhoods

## Exclude zero counts genes
keep.rows <- rowSums(logcounts(embryo_milo)) != 0
embryo_milo <- embryo_milo[keep.rows, ]

## Find HVGs
set.seed(101)
dec <- modelGeneVar(embryo_milo)
hvgs <- getTopHVGs(dec, n=2000)

# this vignette randomly fails to identify HVGs for some reason
if(!length(hvgs)){
    set.seed(42)
    dec <- modelGeneVar(embryo_milo)
    hvgs <- getTopHVGs(dec, n=2000)
}

head(hvgs)
[1] "ENSMUSG00000032083" "ENSMUSG00000095180" "ENSMUSG00000061808"
[4] "ENSMUSG00000002985" "ENSMUSG00000024990" "ENSMUSG00000024391"
set.seed(42)
nhood_markers <- findNhoodGroupMarkers(embryo_milo, da_results, subset.row = hvgs, 
                                       aggregate.samples = TRUE, sample_col = "sample")

head(nhood_markers)
              GeneID    logFC_1  adj.P.Val_1    logFC_2 adj.P.Val_2    logFC_3
1 ENSMUSG00000000031 -0.1957851 9.023691e-01 -1.6227811 0.131661210  1.0818991
2 ENSMUSG00000000078  1.3761092 4.763130e-07 -0.7577163 0.027746376 -0.1254065
3 ENSMUSG00000000088  0.8440478 1.708425e-02 -0.2230027 0.542175995 -0.2398095
4 ENSMUSG00000000125 -0.3215022 1.333035e-01  0.4961217 0.006035858 -0.1759920
5 ENSMUSG00000000149  0.1267912 5.214792e-01 -0.0666690 0.674509763 -0.1369532
6 ENSMUSG00000000184 -0.9431346 1.760344e-01  0.6236021 0.330231952 -0.9388169
  adj.P.Val_3    logFC_4 adj.P.Val_4     logFC_5 adj.P.Val_5     logFC_6
1  0.23946003 -1.6554796   0.1458825 -1.80398024   0.1388493 -0.89469039
2  0.67224856 -0.1319154   0.6784129 -0.15102950   0.6054605  0.02395376
3  0.41954806 -0.3931160   0.3057337 -0.32144387   0.3363805 -0.43803782
4  0.28562140 -0.1463749   0.4448630  0.06021375   0.7152295  0.13762273
5  0.28861385 -0.1423463   0.3819111 -0.10926021   0.4165005 -0.05889050
6  0.06449184 -0.3613323   0.5274175  0.97209298   0.1673039  1.77682169
   adj.P.Val_6      logFC_7  adj.P.Val_7    logFC_8 adj.P.Val_8
1 4.170448e-01  3.064364566 8.723146e-07  1.3586017   0.1234551
2 9.538828e-01  0.021188483 9.356372e-01  0.2273650   0.4628588
3 2.394639e-01  0.708399915 1.668138e-03  0.3362223   0.2712942
4 4.999481e-01 -0.006688093 9.634529e-01 -0.1069250   0.5488718
5 7.417181e-01  0.278006093 5.600597e-03  0.1330199   0.3244233
6 1.769196e-05 -0.941127033 2.679463e-02 -0.4483902   0.4263436
gr5_markers <- nhood_markers[c("logFC_5", "adj.P.Val_5")] 
colnames(gr5_markers) <- c("logFC", "adj.P.Val")

head(gr5_markers[order(gr5_markers$adj.P.Val), ])
         logFC    adj.P.Val
979  0.4954478 3.303464e-07
363  0.2585361 1.623317e-06
1737 0.3048098 1.623317e-06
832  0.3793630 4.815722e-06
974  0.2832659 6.186392e-06
478  0.4508483 2.593135e-05
nhood_markers <- findNhoodGroupMarkers(embryo_milo, da_results, subset.row = hvgs, 
                                       aggregate.samples = TRUE, sample_col = "sample",
                                       subset.groups = c("5")
                                       )

head(nhood_markers)
                     logFC_5  adj.P.Val_5             GeneID
ENSMUSG00000029838 0.4954478 3.303464e-07 ENSMUSG00000029838
ENSMUSG00000059246 0.3048098 1.623317e-06 ENSMUSG00000059246
ENSMUSG00000020427 0.2585361 1.623317e-06 ENSMUSG00000020427
ENSMUSG00000027996 0.3793630 4.815722e-06 ENSMUSG00000027996
ENSMUSG00000029755 0.2832659 6.186392e-06 ENSMUSG00000029755
ENSMUSG00000022054 0.4508483 2.593135e-05 ENSMUSG00000022054
nhood_markers <- findNhoodGroupMarkers(embryo_milo, da_results, subset.row = hvgs,
                                       subset.nhoods = da_results$NhoodGroup %in% c('5','6'),
                                       aggregate.samples = TRUE, sample_col = "sample")
head(nhood_markers)
              GeneID     logFC_5 adj.P.Val_5     logFC_6 adj.P.Val_6
1 ENSMUSG00000000031 -0.98660033  0.01729991  0.98660033  0.01729991
2 ENSMUSG00000000078  0.09373713  0.74682071 -0.09373713  0.74682071
3 ENSMUSG00000000088  0.19162019  0.53797687 -0.19162019  0.53797687
4 ENSMUSG00000000125 -0.04908371  0.82997598  0.04908371  0.82997598
5 ENSMUSG00000000149 -0.03932969  0.44348577  0.03932969  0.44348577
6 ENSMUSG00000000184 -0.73167698  0.13642895  0.73167698  0.13642895

Visualize detected markers

ggplot(nhood_markers, aes(logFC_5, -log10(adj.P.Val_5 ))) + 
  geom_point(alpha=0.5, size=0.5) +
  geom_hline(yintercept = 3)

markers <- nhood_markers$GeneID[nhood_markers$adj.P.Val_5 < 0.01 & nhood_markers$logFC_5 > 0]
set.seed(42)
plotNhoodExpressionGroups(embryo_milo, da_results, features=intersect(rownames(embryo_milo), markers[1:10]),
                          subset.nhoods = da_results$NhoodGroup %in% c('6','5'), 
                          scale=TRUE,
                          grid.space = "fixed")
Warning in plotNhoodExpressionGroups(embryo_milo, da_results, features =
intersect(rownames(embryo_milo), : Nothing in nhoodExpression(x): computing for
requested features...

DGE testing within a group

dge_6 <- testDiffExp(embryo_milo, da_results, design = ~ stage, meta.data = data.frame(colData(embryo_milo)),
                     subset.row = rownames(embryo_milo)[1:5], subset.nhoods=da_results$NhoodGroup=="6")
Warning in fitFDist(var, df1 = df, covariate = covariate): More than half of
residual variances are exactly zero: eBayes unreliable
dge_6
$`6`
                          logFC   AveExpr         t    P.Value adj.P.Val
ENSMUSG00000025902 -0.080567087 0.3520449 -2.023978 0.04314914 0.2157457
ENSMUSG00000033845 -0.009033338 2.2775883 -0.209271 0.83426503 1.0000000
ENSMUSG00000051951  0.000000000 0.0000000  0.000000 1.00000000 1.0000000
ENSMUSG00000102343  0.000000000 0.0000000  0.000000 1.00000000 1.0000000
ENSMUSG00000025900  0.000000000 0.0000000  0.000000 1.00000000 1.0000000
                            B Nhood.Group
ENSMUSG00000025902  -9.440543           6
ENSMUSG00000033845 -11.465446           6
ENSMUSG00000051951 -11.487357           6
ENSMUSG00000102343 -11.487357           6
ENSMUSG00000025900 -11.487357           6

sessionInfo()
R version 4.4.1 (2024-06-14)
Platform: x86_64-pc-linux-gnu
Running under: Ubuntu 22.04.5 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3 
LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.20.so;  LAPACK version 3.10.0

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

time zone: Etc/UTC
tzcode source: system (glibc)

attached base packages:
[1] stats4    stats     graphics  grDevices utils     datasets  methods  
[8] base     

other attached packages:
 [1] MouseGastrulationData_1.20.0 SpatialExperiment_1.16.0    
 [3] patchwork_1.3.0              dplyr_1.1.4                 
 [5] scran_1.34.0                 scater_1.34.0               
 [7] ggplot2_3.5.1                scuttle_1.16.0              
 [9] SingleCellExperiment_1.28.1  SummarizedExperiment_1.36.0 
[11] Biobase_2.66.0               GenomicRanges_1.58.0        
[13] GenomeInfoDb_1.42.1          IRanges_2.40.1              
[15] S4Vectors_0.44.0             BiocGenerics_0.52.0         
[17] MatrixGenerics_1.18.0        matrixStats_1.4.1           
[19] miloR_2.2.0                  edgeR_4.4.1                 
[21] limma_3.62.1                 workflowr_1.7.1             

loaded via a namespace (and not attached):
  [1] RColorBrewer_1.1-3      rstudioapi_0.17.1       jsonlite_1.8.9         
  [4] magrittr_2.0.3          ggbeeswarm_0.7.2        magick_2.8.5           
  [7] farver_2.1.2            rmarkdown_2.28          fs_1.6.4               
 [10] zlibbioc_1.52.0         vctrs_0.6.5             memoise_2.0.1          
 [13] htmltools_0.5.8.1       S4Arrays_1.6.0          curl_6.0.1             
 [16] AnnotationHub_3.14.0    BiocNeighbors_2.0.1     SparseArray_1.6.0      
 [19] sass_0.4.9              pracma_2.4.4            bslib_0.8.0            
 [22] cachem_1.1.0            whisker_0.4.1           igraph_2.1.2           
 [25] mime_0.12               lifecycle_1.0.4         pkgconfig_2.0.3        
 [28] rsvd_1.0.5              Matrix_1.7-0            R6_2.5.1               
 [31] fastmap_1.2.0           GenomeInfoDbData_1.2.13 digest_0.6.37          
 [34] numDeriv_2016.8-1.1     colorspace_2.1-1        AnnotationDbi_1.68.0   
 [37] ps_1.8.1                rprojroot_2.0.4         dqrng_0.4.1            
 [40] irlba_2.3.5.1           ExperimentHub_2.14.0    RSQLite_2.3.9          
 [43] beachmat_2.22.0         labeling_0.4.3          filelock_1.0.3         
 [46] fansi_1.0.6             httr_1.4.7              polyclip_1.10-7        
 [49] abind_1.4-8             compiler_4.4.1          bit64_4.5.2            
 [52] withr_3.0.2             BiocParallel_1.40.0     DBI_1.2.3              
 [55] viridis_0.6.5           highr_0.11              ggforce_0.4.2          
 [58] MASS_7.3-60.2           rappdirs_0.3.3          DelayedArray_0.32.0    
 [61] rjson_0.2.23            bluster_1.16.0          gtools_3.9.5           
 [64] tools_4.4.1             vipor_0.4.7             beeswarm_0.4.0         
 [67] httpuv_1.6.15           glue_1.8.0              callr_3.7.6            
 [70] promises_1.3.0          grid_4.4.1              getPass_0.2-4          
 [73] cluster_2.1.6           generics_0.1.3          gtable_0.3.6           
 [76] tidyr_1.3.1             BiocSingular_1.22.0     tidygraph_1.3.1        
 [79] ScaledMatrix_1.14.0     metapod_1.14.0          utf8_1.2.4             
 [82] XVector_0.46.0          RcppAnnoy_0.0.22        BiocVersion_3.20.0     
 [85] ggrepel_0.9.6           pillar_1.9.0            stringr_1.5.1          
 [88] BumpyMatrix_1.14.0      later_1.3.2             splines_4.4.1          
 [91] tweenr_2.0.3            BiocFileCache_2.14.0    lattice_0.22-6         
 [94] FNN_1.1.4.1             bit_4.5.0               tidyselect_1.2.1       
 [97] locfit_1.5-9.10         Biostrings_2.74.1       knitr_1.48             
[100] git2r_0.35.0            gridExtra_2.3           xfun_0.48              
[103] graphlayouts_1.2.0      statmod_1.5.0           stringi_1.8.4          
[106] UCSC.utils_1.2.0        yaml_2.3.10             evaluate_1.0.1         
[109] codetools_0.2-20        ggraph_2.2.1            tibble_3.2.1           
[112] BiocManager_1.30.25     cli_3.6.3               uwot_0.2.2             
[115] munsell_0.5.1           processx_3.8.4          jquerylib_0.1.4        
[118] Rcpp_1.0.13             dbplyr_2.5.0            png_0.1-8              
[121] parallel_4.4.1          blob_1.2.4              viridisLite_0.4.2      
[124] scales_1.3.0            purrr_1.0.2             crayon_1.5.3           
[127] rlang_1.1.4             KEGGREST_1.46.0         cowplot_1.1.3