Last updated: 2022-02-22

Checks: 7 0

Knit directory: MelanomaIMC/

This reproducible R Markdown analysis was created with workflowr (version 1.7.0). The Checks tab describes the reproducibility checks that were applied when the results were created. The Past versions tab lists the development history.

R Markdown file: up-to-date

Great! Since the R Markdown file has been committed to the Git repository, you know the exact version of the code that produced these results.

Environment: empty

Great job! The global environment was empty. Objects defined in the global environment can affect the analysis in your R Markdown file in unknown ways. For reproduciblity it’s best to always run the code in an empty environment.

Seed: set.seed(20200728)

The command set.seed(20200728) was run prior to running the code in the R Markdown file. Setting a seed ensures that any results that rely on randomness, e.g. subsampling or permutations, are reproducible.

Session information: recorded

Great job! Recording the operating system, R version, and package versions is critical for reproducibility.

Cache: none

Nice! There were no cached chunks for this analysis, so you can be confident that you successfully produced the results during this run.

File paths: relative

Great job! Using relative paths to the files within your workflowr project makes it easier to run your code on other machines.

Repository version: d246c15

Great! You are using Git for version control. Tracking code development and connecting the code version to the results is critical for reproducibility.

The results in this page were generated with repository version d246c15. See the Past versions tab to see a history of the changes made to the R Markdown and HTML files.

Note that you need to be careful to ensure that all relevant files for the analysis have been committed to Git prior to generating the results (you can use wflow_publish or wflow_git_commit). workflowr only checks the R Markdown file, but you know if there are other scripts or data files that it depends on. Below is the status of the Git repository when the results were generated:


Ignored files:
    Ignored:    .DS_Store
    Ignored:    .Rproj.user/
    Ignored:    Table_S4.csv
    Ignored:    code/.DS_Store
    Ignored:    code/._.DS_Store
    Ignored:    data/.DS_Store
    Ignored:    data/._.DS_Store
    Ignored:    data/data_for_analysis/
    Ignored:    data/full_data/

Unstaged changes:
    Modified:   .gitignore
    Modified:   analysis/Supp-Figure_10.rmd
    Modified:   analysis/_site.yml
    Deleted:    analysis/license.Rmd

Note that any generated files, e.g. HTML, png, CSS, etc., are not included in this status report because it is ok for generated content to have uncommitted changes.

These are the previous versions of the repository in which changes were made to the R Markdown (analysis/04_2_protein_classification_subclustering.rmd) and HTML (docs/04_2_protein_classification_subclustering.html) files. If you’ve configured a remote Git repository (see ?wflow_git_remote), click on the hyperlinks in the table below to view the files as they were in that past version.

File	Version	Author	Date	Message
Rmd	b24a994	toobiwankenobi	2022-02-08	update code to work under R 4.1.2
Rmd	f9a3a83	toobiwankenobi	2022-02-08	clean repo for release
html	4109ff1	toobiwankenobi	2021-07-07	delete html files and adapt gitignore
Rmd	3203891	toobiwankenobi	2021-02-19	change celltype names
html	3203891	toobiwankenobi	2021-02-19	change celltype names
Rmd	ee1595d	toobiwankenobi	2021-02-12	clean repo and adapt files
html	ee1595d	toobiwankenobi	2021-02-12	clean repo and adapt files
html	3f5af3f	toobiwankenobi	2021-02-09	add .html files
Rmd	d8819f2	toobiwankenobi	2020-10-08	read new data (nuclei expansion) and adapt scripts
Rmd	fb0f7cb	SchulzDan	2020-08-24	more paths adapted
Rmd	b68b4c3	SchulzDan	2020-08-17	added scores and coxph models. not finished

load packages

sapply(list.files("code/helper_functions/", full.names = TRUE), source)


Attaching package: 'dplyr'

The following objects are masked from 'package:stats':

    filter, lag

The following objects are masked from 'package:base':

    intersect, setdiff, setequal, union

        code/helper_functions//calculateSummary.R
value   ?                                        
visible FALSE                                    
        code/helper_functions//censor_dat.R
value   ?                                  
visible FALSE                              
        code/helper_functions//detect_mRNA_expression.R
value   ?                                              
visible FALSE                                          
        code/helper_functions//DistanceToClusterCenter.R
value   ?                                               
visible FALSE                                           
        code/helper_functions//findMilieu.R code/helper_functions//findPatch.R
value   ?                                   ?                                 
visible FALSE                               FALSE                             
        code/helper_functions//getInfoFromString.R
value   ?                                         
visible FALSE                                     
        code/helper_functions//getSpotnumber.R
value   ?                                     
visible FALSE                                 
        code/helper_functions//plotCellCounts.R
value   ?                                      
visible FALSE                                  
        code/helper_functions//plotCellFractions.R
value   ?                                         
visible FALSE                                     
        code/helper_functions//plotDist.R code/helper_functions//read_Data.R
value   ?                                 ?                                 
visible FALSE                             FALSE                             
        code/helper_functions//scatter_function.R
value   ?                                        
visible FALSE                                    
        code/helper_functions//sceChecks.R
value   ?                                 
visible FALSE                             
        code/helper_functions//validityChecks.R
value   ?                                      
visible FALSE

library(LSD)
library(SingleCellExperiment)

Loading required package: SummarizedExperiment

Loading required package: MatrixGenerics

Loading required package: matrixStats


Attaching package: 'matrixStats'

The following object is masked from 'package:dplyr':

    count


Attaching package: 'MatrixGenerics'

The following objects are masked from 'package:matrixStats':

    colAlls, colAnyNAs, colAnys, colAvgsPerRowSet, colCollapse,
    colCounts, colCummaxs, colCummins, colCumprods, colCumsums,
    colDiffs, colIQRDiffs, colIQRs, colLogSumExps, colMadDiffs,
    colMads, colMaxs, colMeans2, colMedians, colMins, colOrderStats,
    colProds, colQuantiles, colRanges, colRanks, colSdDiffs, colSds,
    colSums2, colTabulates, colVarDiffs, colVars, colWeightedMads,
    colWeightedMeans, colWeightedMedians, colWeightedSds,
    colWeightedVars, rowAlls, rowAnyNAs, rowAnys, rowAvgsPerColSet,
    rowCollapse, rowCounts, rowCummaxs, rowCummins, rowCumprods,
    rowCumsums, rowDiffs, rowIQRDiffs, rowIQRs, rowLogSumExps,
    rowMadDiffs, rowMads, rowMaxs, rowMeans2, rowMedians, rowMins,
    rowOrderStats, rowProds, rowQuantiles, rowRanges, rowRanks,
    rowSdDiffs, rowSds, rowSums2, rowTabulates, rowVarDiffs, rowVars,
    rowWeightedMads, rowWeightedMeans, rowWeightedMedians,
    rowWeightedSds, rowWeightedVars

Loading required package: GenomicRanges

Loading required package: stats4

Loading required package: BiocGenerics


Attaching package: 'BiocGenerics'

The following objects are masked from 'package:dplyr':

    combine, intersect, setdiff, union

The following objects are masked from 'package:stats':

    IQR, mad, sd, var, xtabs

The following objects are masked from 'package:base':

    anyDuplicated, append, as.data.frame, basename, cbind, colnames,
    dirname, do.call, duplicated, eval, evalq, Filter, Find, get, grep,
    grepl, intersect, is.unsorted, lapply, Map, mapply, match, mget,
    order, paste, pmax, pmax.int, pmin, pmin.int, Position, rank,
    rbind, Reduce, rownames, sapply, setdiff, sort, table, tapply,
    union, unique, unsplit, which.max, which.min

Loading required package: S4Vectors


Attaching package: 'S4Vectors'

The following objects are masked from 'package:dplyr':

    first, rename

The following objects are masked from 'package:base':

    expand.grid, I, unname

Loading required package: IRanges


Attaching package: 'IRanges'

The following objects are masked from 'package:dplyr':

    collapse, desc, slice

Loading required package: GenomeInfoDb

Loading required package: Biobase

Welcome to Bioconductor

    Vignettes contain introductory material; view with
    'browseVignettes()'. To cite Bioconductor, see
    'citation("Biobase")', and for packages 'citation("pkgname")'.


Attaching package: 'Biobase'

The following object is masked from 'package:MatrixGenerics':

    rowMedians

The following objects are masked from 'package:matrixStats':

    anyMissing, rowMedians

library(ggplot2)
library(scater)

Loading required package: scuttle

library(viridis)

Loading required package: viridisLite

library(igraph)


Attaching package: 'igraph'

The following object is masked from 'package:scater':

    normalize

The following object is masked from 'package:GenomicRanges':

    union

The following object is masked from 'package:IRanges':

    union

The following object is masked from 'package:S4Vectors':

    union

The following objects are masked from 'package:BiocGenerics':

    normalize, path, union

The following objects are masked from 'package:dplyr':

    as_data_frame, groups, union

The following objects are masked from 'package:stats':

    decompose, spectrum

The following object is masked from 'package:base':

    union

library(CATALYST)
library(workflowr)

load data

sce = readRDS(file = "data/data_for_analysis/sce_protein.rds")

for(i in 1:10){
scatter_x_y(sce,x = "CD8",y = "pERK",imagenumber =i, xlim = c(0,5), ylim = c(0,5))
  abline(h = 1.25)
  abline(v=1.5)
  abline(a = 0, b= 1)
}

mean expression of markers per celltype

# Aggregate the counts. calculateSummary is a function written from Nils
mean_sce <- calculateSummary(sce, split_by = "celltype", 
                             exprs_values = "counts")

# Exclude DNA and Histone
mean_sce <- mean_sce[!grepl("DNA|Histone", rownames(mean_sce)),]

# Transform and scale
assay(mean_sce, "arcsinh") <- asinh(assay(mean_sce, "meanCounts")) 
assay(mean_sce, "arcsinh_scaled") <- t(scale(t(asinh(assay(mean_sce, "meanCounts")))))

# Linear scale 
plotHeatmap(mean_sce, exprs_values = "arcsinh", 
            features = c(rownames(mean_sce)), 
            colour_columns_by = "celltype", 
            color = viridis(100), cluster_rows = FALSE, gaps_row = 10)

scaled mean expression of markers per celltype

# Linear scale 
plotHeatmap(mean_sce, exprs_values = "arcsinh_scaled", 
            features = c(rownames(mean_sce)), 
            colour_columns_by = "celltype", 
            color = colorRampPalette(c("dark blue", "white", "dark red"))(100),
            zlim = c(-3,3), cluster_rows = TRUE, gaps_row = 10)

definition of markers used for subclustering the super-classes

# Select markers
marker_list <- list()

marker_list$Tumor <- c("Ki67", "PDL1", "PARP", "H3K27me3","CXCR2","HLADR","S100","Sox9","pERK","CD36",
                       "p75","MiTF","bCatenin","Collagen1","pS6","IDO1","SOX10")

marker_list$Neutrophil <- c("CD15", "PDL1", "MPO", "CD11b")


marker_list$`FOXP3+ T cell` <- c("IDO1", "PD1", "FOXP3", "ICOS", "CD4", "Ki67", "GrzB",
                        "CD45RA", "CD45RO")

marker_list$`CD8+ T cell` <- c("IDO1", "TOX1", "PD1", "TCF7", "ICOS", "CD8","Ki67", "GrzB",
                        "CD45RA", "CD45RO")

marker_list$`CD4+ T cell` <- c("IDO1", "TOX1", "PD1", "TCF7", 
                        "FOXP3", "ICOS", "CD4", "Ki67", "GrzB",
                        "CD45RA", "CD45RO")

marker_list$`BnT cell` <- c("IDO1", "TOX1", "PD1", "TCF7", 
                        "FOXP3", "ICOS", "CD8", "CD4", "Ki67", "GrzB",
                        "CD45RA", "CD45RO", "CD20","HLADR","H3K27me3","pS6","pERK", "CD7")

marker_list$pDC <- c("CD303","IDO1","GrzB","CXCR2","CD11b")

marker_list$`B cell` <- c("Ki67","CD45RA", "CD45RO", "CD20","HLADR","H3K27me3","pS6","pERK")

marker_list$Macrophage <- c("Caveolin1", "PDL1", "PARP", "H3K27me3","CXCR2","HLADR","CD45RO","CD45RA","CD68","pERK","CD36", "Collagen1","CD11c","pS6","IDO1","CD206")

marker_list$Stroma <- c("Caveolin1", "CD68","CD36", "Collagen1","SMA")

marker_list$unknown <- rownames(sce[rowData(sce)$good_marker,])

cluster the whole dataset with FlowSOM

FlowSOM first because it is faster

## the FlowSOM function from CATALYST needs an another column in the rowData of the sce to work properly:
rowData(sce)$marker_class <- "state"

# vector for clustering
fs_clustering <- vector(length = ncol(sce))

# create the "exprs" slot in the assay data (needed for CATALYST)
assay(sce, "exprs") <- assay(sce,"asinh")

# Macrophage, B cells, CD4+ T cell, CD8+ T cell, Tother and BnT cells will be clustered for a total of 6 clustes each
set.seed(12345)
for(i in c("Macrophage","CD4+ T cell","CD8+ T cell")){
  cur_sce <- sce[,sce$celltype == i]
  
  cur_sce <- CATALYST::cluster(cur_sce,features = marker_list[i][[1]], ydim = 2,xdim = 3,maxK = 4)
  
  fs_clustering[sce$celltype == i] <- cur_sce$cluster_id
}

o running FlowSOM clustering...

o running ConsensusClusterPlus metaclustering...

o running FlowSOM clustering...

o running ConsensusClusterPlus metaclustering...

o running FlowSOM clustering...

o running ConsensusClusterPlus metaclustering...

# pDCs and Neutrophils will be clustered to 4 clusteres
for(i in c("pDC","Neutrophil","BnT cell","B cell","FOXP3+ T cell","Stroma", "unknown")){
  cur_sce <- sce[,sce$celltype == i]
  
  cur_sce <- CATALYST::cluster(cur_sce,features = marker_list[i][[1]],ydim = 2,xdim = 2,maxK = 3)
  
  fs_clustering[sce$celltype == i] <- cur_sce$cluster_id
}

o running FlowSOM clustering...
o running ConsensusClusterPlus metaclustering...

o running FlowSOM clustering...

o running ConsensusClusterPlus metaclustering...

o running FlowSOM clustering...

o running ConsensusClusterPlus metaclustering...

o running FlowSOM clustering...

o running ConsensusClusterPlus metaclustering...

o running FlowSOM clustering...

o running ConsensusClusterPlus metaclustering...

o running FlowSOM clustering...

o running ConsensusClusterPlus metaclustering...

o running FlowSOM clustering...

o running ConsensusClusterPlus metaclustering...

# Tumor will be clustered into 9 clusters
cur_sce <- sce[,sce$celltype == "Tumor"]
  
cur_sce <- CATALYST::cluster(cur_sce,features = marker_list["Tumor"][[1]],ydim = 3,xdim = 3,maxK = 7)

o running FlowSOM clustering...
o running ConsensusClusterPlus metaclustering...

fs_clustering[sce$celltype == "Tumor"] <- cur_sce$cluster_id

# Save in SCE object
colData(sce)$celltype_clustered <- as.factor(fs_clustering)

sce$celltype_clustered <- paste0(sce$celltype, "_", sce$celltype_clustered)

visulalize clustering results for FlowSOM

# Aggregate the counts
mean_sce <- calculateSummary(sce, split_by = c("celltype","celltype_clustered"), 
                             exprs_values = "counts")

# Exclude DNA and Histone
mean_sce <- mean_sce[!grepl("DNA|Histone", rownames(mean_sce)),]

# Transform and scale
assay(mean_sce, "arcsinh") <- asinh(assay(mean_sce, "meanCounts")) 
assay(mean_sce, "arcsinh_scaled") <- t(scale(t(asinh(assay(mean_sce, "meanCounts")))))

# Linear scale 
plotHeatmap(mean_sce, exprs_values = "arcsinh",
            features = rownames(sce)[rowData(sce)$good_marker],
            colour_columns_by = c("celltype"), 
            color = viridis(100), 
            labels_col = mean_sce$celltype_clustered,
            show_colnames = TRUE, annotation_legend = FALSE, borders_color = NA)

# Scaled
plotHeatmap(mean_sce, exprs_values = "arcsinh_scaled", 
            features = rownames(sce)[rowData(sce)$good_marker],
            colour_columns_by = c("celltype"), 
            labels_col = mean_sce$celltype_clustered,
            show_colnames = TRUE, annotation_legend = TRUE, borders_color = NA,
            color = colorRampPalette(c("dark blue", "white", "dark red"))(100),
            zlim = c(-4, 4),legend = TRUE)

numbers of cells per cluster

table(sce$celltype_clustered)


       B cell_1        B cell_2        B cell_3        B cell_4      BnT cell_1 
           4315           11731           12914           21486            7024 
     BnT cell_2      BnT cell_3      BnT cell_4   CD4+ T cell_1   CD4+ T cell_2 
           7961            6527            9082           11334           11205 
  CD4+ T cell_3   CD4+ T cell_4   CD4+ T cell_5   CD4+ T cell_6   CD8+ T cell_1 
           9717           11569           13073            2997            3671 
  CD8+ T cell_2   CD8+ T cell_3   CD8+ T cell_4   CD8+ T cell_5   CD8+ T cell_6 
          11058           11808           12549            9536            4671 
FOXP3+ T cell_1 FOXP3+ T cell_2 FOXP3+ T cell_3 FOXP3+ T cell_4    Macrophage_1 
           2976            2375            2827            1293           12010 
   Macrophage_2    Macrophage_3    Macrophage_4    Macrophage_5    Macrophage_6 
           8126           10022            8439           14226            9699 
   Neutrophil_1    Neutrophil_2    Neutrophil_3    Neutrophil_4           pDC_1 
           3339            2473            1770            3612            2508 
          pDC_2           pDC_3           pDC_4        Stroma_1        Stroma_2 
           1810             891            1882           14333           17709 
       Stroma_3        Stroma_4         Tumor_1         Tumor_2         Tumor_3 
          10161           18751           84616           76055           82520 
        Tumor_4         Tumor_5         Tumor_6         Tumor_7         Tumor_8 
          51185           72087           86565           43957           51597 
        Tumor_9       unknown_1       unknown_2       unknown_3       unknown_4 
          75909            2795           10315            1678            4665

assign names to clusters

annotations <- sce$celltype_clustered

# annotations[annotations == "other_4"] <- "Other_CD206Low_pS6Low"
# annotations[annotations == "other_2"] <- "Tumor_SOX9Low_SOX10+_MITF+_S100+_bCatenin++"
# annotations[annotations == "other_3"] <- "Other_SOX10Low_MITFLow_bCatenin+"
# annotations[annotations == "other_1"] <- "Other_CaveolinLow"
# annotations[annotations == "other_5"] <- "Vasculature_SMA++_Collagen+_Caveolin+"
# annotations[annotations == "other_6"] <- "Stroma_SMA+_Collagen++_Caveolin++_CD36++"
# 
# annotations[annotations == "Tumor_2"] <- "Tumor_SOX9++_p75+"
# annotations[annotations == "Tumor_3"] <- "Tumor_SOX10+_MITFLow_CD36Low"
# annotations[annotations == "Tumor_4"] <- "Tumor_SOX10Low_S100+"
# annotations[annotations == "Tumor_5"] <- "Tumor_SOX9++_SOX10+_MITF+_S100Low"
# annotations[annotations == "Tumor_8"] <- "Tumor_SOX9Low_SOX10Low_MITFLow_p75+"
# annotations[annotations == "Tumor_9"] <- "Tumor_SOX10+_MITF+_S100++"
# annotations[annotations == "Tumor_6"] <- "Tumor_S100+"
# annotations[annotations == "Tumor_1"] <- "Tumor_SOX10Low_MITFLow_HLADR+"
# annotations[annotations == "Tumor_7"] <- "Tumor_SOX10+_MITF++_S100+_pS6+"
# 
# annotations[annotations == "B cells_1"] <- "B cells_pErkLow_H3K27meLow"
# annotations[annotations == "B cells_4"] <- "B cells_pErkLow_H3K27meLow"
# annotations[annotations == "B cells_2"] <- "B cells_H3K27me_LowpS6Low"
# annotations[annotations == "B cells_3"] <- "B cells_CD19++_pErkLow_H3K27meLow"
# annotations[annotations == "B cells_5"] <- "B cells_Ki67++"
# annotations[annotations == "B cells_6"] <- "B cells_SMALow"
# 
# annotations[annotations == "BnT_1"] <- "B cells_CD3Low_CD4Low"
# annotations[annotations == "BnT_2"] <- "B cells_pErkLow_H3K27meLow"
# annotations[annotations == "BnT_6"] <- "BnT_CD8+"
# annotations[annotations == "BnT_4"] <- "BnT_CD8+"
# annotations[annotations == "BnT_5"] <- "BnT_CD4+_TCF7+"
# annotations[annotations == "BnT_3"] <- "BnT_CD4+_ICOS++_PD1++_Ki67+"
# 
# annotations[annotations == "T_other_4"] <- "Tcells_undefined"
# annotations[annotations == "T_other_5"] <- "Tcells_undefined"
# annotations[annotations == "T_other_3"] <- "Tcells_CD15_LowGrzB+"
# annotations[annotations == "T_other_1"] <- "Tcells_Ki67++"
# annotations[annotations == "T_other_2"] <- "Tcells_CaveolinLow_SMALow"
# annotations[annotations == "T_other_6"] <- "Tcells_undefined"
# 
# annotations[annotations == "T_helper_2"] <- "CD4+ T cell_CD11cLow_HLADRLow"
# annotations[annotations == "T_helper_5"] <- "CD4+ T cell_SMALow_CaveolinLow"
# annotations[annotations == "T_helper_1"] <- "CD4+ T cell_TCF7++"
# annotations[annotations == "T_helper_6"] <- "CD4+ T cell_FoxP3++_ICOS+_CD11cLow_CD206Low"
# annotations[annotations == "T_helper_3"] <- "CD4+ T cell_FoxP3Low_pERKLow"
# annotations[annotations == "T_helper_4"] <- "CD4+ T cell_FoxP3Low"
# 
# annotations[annotations == "T_cytotoxic_3"] <- "CD8+ T cell_PD1+_TOX1+_GrzBLow"
# annotations[annotations == "T_cytotoxic_5"] <- "CD8+ T cell_TCF7+_H3K27meLow"
# annotations[annotations == "T_cytotoxic_6"] <- "CD8+ T cell_PD1+_TOX1+_GrzB+_ICOSLow_Ki67+"
# annotations[annotations == "T_cytotoxic_1"] <- "CD8+ T cell_GrzBLow_SMALow_CaveolinLow"
# annotations[annotations == "T_cytotoxic_2"] <- "CD8+ T cell_GrzBLow_pS6Low"
# annotations[annotations == "T_cytotoxic_4"] <- "CD8+ T cell_PD1+_TOX1+_GrzBLow_ICOSLow_CD11cLow_CD206Low"
# 
# annotations[annotations == "Macrophage_4"] <- "Macrophage_CD68Low_CaveolinLow"
# annotations[annotations == "Macrophage_6"] <- "Tumor_CD68Low_S100Low_bCateninLow_MITFLow_SOX10Low_HLADR+"
# annotations[annotations == "Macrophage_2"] <- "Macrophage_CD68Low_CD206++_CD36Low"
# annotations[annotations == "Macrophage_1"] <- "Macrophage_CD68Low_pS6Low_S100Low"
# annotations[annotations == "Macrophage_3"] <- "Macrophage_CD68+_CD11c+_CD206++_CXCR2+_PDL1+"
# annotations[annotations == "Macrophage_5"] <- "Macrophage_CD68++_CD11c+_CD36+_Caveolin+_Collagen+"
# 
# annotations[annotations == "pDC_1"] <- "pDC_IDO1+_GrzB+"
# annotations[annotations == "pDC_2"] <- "pDC_IDO1+_TOX1+_CD11b+"
# annotations[annotations == "pDC_3"] <- "pDC_IDO1++_GrzB++"
# annotations[annotations == "pDC_4"] <- "pDC_Ido1+_PARP++"
# 
# annotations[annotations == "Neutrophil_2"] <- "Neutrophil_CD11b+_CD15++_MPO+_PDL1++"
# annotations[annotations == "Neutrophil_4"] <- "Neutrophil_CD11b+_CD15+_MPO+_PDL1Low"
# annotations[annotations == "Neutrophil_3"] <- "Neutrophil_CD11b++_CD15++_MPO++_CXCR2Low_PDL1+"
# annotations[annotations == "Neutrophil_1"] <- "Neutrophil_CD11b+_CD15+_MPO++"

sce$cellAnnotation <- annotations

mean_sce <- calculateSummary(sce, split_by = c("celltype","cellAnnotation"), 
                             exprs_values = "counts")

# Exclude DNA and Histone
mean_sce <- mean_sce[!grepl("DNA|Histone", rownames(mean_sce)),]

# Transform and scale
assay(mean_sce, "arcsinh") <- asinh(assay(mean_sce, "meanCounts")) 
assay(mean_sce, "arcsinh_scaled") <- t(scale(t(asinh(assay(mean_sce, "meanCounts")))))

plotHeatmap(mean_sce, exprs_values = "arcsinh_scaled", 
            features = rownames(sce)[rowData(sce)$good_marker],
            colour_columns_by = c("celltype"), 
            labels_col = mean_sce$cellAnnotation,
            show_colnames = TRUE, annotation_legend = TRUE, borders_color = NA,
            color = colorRampPalette(c("dark blue", "white", "dark red"))(100),
            zlim = c(-4, 4),legend = TRUE)

clustering does not really pick up PDL1 positive Tumor cells (which definitely exist). Potentially we have to investigate this with a manual cut off. arcsinh > 1.75 seems to be reasonable

Additionally, the clustering did not pick up proliferating and non prolifertating tumor clusters. this should also be investigated by manual cut offs. arcsinh > 1.25 seems reasonable.

celltype_counts <- sce$celltype

table(celltype_counts)

save the single cell object with the clusters assigned to the cells

# delete the "exprs" slot from the single cell experiment again.
assay(sce,"exprs") <- NULL

saveRDS(sce,file = "data/data_for_analysis/sce_protein.rds")

sessionInfo()

R version 4.1.2 (2021-11-01)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 20.04.3 LTS

Matrix products: default
BLAS/LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.8.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats4    stats     graphics  grDevices utils     datasets  methods  
[8] base     

other attached packages:
 [1] CATALYST_1.18.1             igraph_1.2.11              
 [3] viridis_0.6.2               viridisLite_0.4.0          
 [5] scater_1.22.0               scuttle_1.4.0              
 [7] ggplot2_3.3.5               SingleCellExperiment_1.16.0
 [9] SummarizedExperiment_1.24.0 Biobase_2.54.0             
[11] GenomicRanges_1.46.1        GenomeInfoDb_1.30.1        
[13] IRanges_2.28.0              S4Vectors_0.32.3           
[15] BiocGenerics_0.40.0         MatrixGenerics_1.6.0       
[17] matrixStats_0.61.0          LSD_4.1-0                  
[19] dplyr_1.0.7                 workflowr_1.7.0            

loaded via a namespace (and not attached):
  [1] utf8_1.2.2                  tidyselect_1.1.1           
  [3] grid_4.1.2                  BiocParallel_1.28.3        
  [5] Rtsne_0.15                  aws.signature_0.6.0        
  [7] flowCore_2.6.0              munsell_0.5.0              
  [9] ScaledMatrix_1.2.0          codetools_0.2-18           
 [11] withr_2.4.3                 colorspace_2.0-2           
 [13] highr_0.9                   knitr_1.37                 
 [15] rstudioapi_0.13             ggsignif_0.6.3             
 [17] git2r_0.29.0                GenomeInfoDbData_1.2.7     
 [19] polyclip_1.10-0             farver_2.1.0               
 [21] pheatmap_1.0.12             flowWorkspace_4.6.0        
 [23] rprojroot_2.0.2             vctrs_0.3.8                
 [25] generics_0.1.2              TH.data_1.1-0              
 [27] xfun_0.29                   R6_2.5.1                   
 [29] doParallel_1.0.16           ggbeeswarm_0.6.0           
 [31] clue_0.3-60                 rsvd_1.0.5                 
 [33] bitops_1.0-7                DelayedArray_0.20.0        
 [35] assertthat_0.2.1            promises_1.2.0.1           
 [37] scales_1.1.1                multcomp_1.4-18            
 [39] beeswarm_0.4.0              gtable_0.3.0               
 [41] beachmat_2.10.0             processx_3.5.2             
 [43] RProtoBufLib_2.6.0          sandwich_3.0-1             
 [45] rlang_1.0.0                 GlobalOptions_0.1.2        
 [47] splines_4.1.2               rstatix_0.7.0              
 [49] hexbin_1.28.2               broom_0.7.12               
 [51] reshape2_1.4.4              yaml_2.2.2                 
 [53] abind_1.4-5                 backports_1.4.1            
 [55] httpuv_1.6.5                RBGL_1.70.0                
 [57] tools_4.1.2                 ellipsis_0.3.2             
 [59] jquerylib_0.1.4             RColorBrewer_1.1-2         
 [61] ggridges_0.5.3              Rcpp_1.0.8                 
 [63] plyr_1.8.6                  base64enc_0.1-3            
 [65] sparseMatrixStats_1.6.0     zlibbioc_1.40.0            
 [67] purrr_0.3.4                 RCurl_1.98-1.5             
 [69] ps_1.6.0                    FlowSOM_2.2.0              
 [71] ggpubr_0.4.0                GetoptLong_1.0.5           
 [73] cowplot_1.1.1               zoo_1.8-9                  
 [75] ggrepel_0.9.1               cluster_2.1.2              
 [77] colorRamps_2.3              fs_1.5.2                   
 [79] magrittr_2.0.2              ncdfFlow_2.40.0            
 [81] data.table_1.14.2           scattermore_0.7            
 [83] circlize_0.4.13             mvtnorm_1.1-3              
 [85] whisker_0.4                 ggnewscale_0.4.5           
 [87] evaluate_0.14               XML_3.99-0.8               
 [89] jpeg_0.1-9                  gridExtra_2.3              
 [91] shape_1.4.6                 ggcyto_1.22.0              
 [93] compiler_4.1.2              tibble_3.1.6               
 [95] crayon_1.4.2                ggpointdensity_0.1.0       
 [97] htmltools_0.5.2             later_1.3.0                
 [99] tidyr_1.2.0                 RcppParallel_5.1.5         
[101] aws.s3_0.3.21               DBI_1.1.2                  
[103] tweenr_1.0.2                ComplexHeatmap_2.10.0      
[105] MASS_7.3-55                 Matrix_1.4-0               
[107] car_3.0-12                  cli_3.1.1                  
[109] parallel_4.1.2              pkgconfig_2.0.3            
[111] getPass_0.2-2               xml2_1.3.3                 
[113] foreach_1.5.2               vipor_0.4.5                
[115] bslib_0.3.1                 XVector_0.34.0             
[117] drc_3.0-1                   stringr_1.4.0              
[119] callr_3.7.0                 digest_0.6.29              
[121] ConsensusClusterPlus_1.58.0 graph_1.72.0               
[123] rmarkdown_2.11              DelayedMatrixStats_1.16.0  
[125] curl_4.3.2                  gtools_3.9.2               
[127] rjson_0.2.21                lifecycle_1.0.1            
[129] jsonlite_1.7.3              carData_3.0-5              
[131] BiocNeighbors_1.12.0        fansi_1.0.2                
[133] pillar_1.7.0                lattice_0.20-45            
[135] plotrix_3.8-2               fastmap_1.1.0              
[137] httr_1.4.2                  survival_3.2-13            
[139] glue_1.6.1                  png_0.1-7                  
[141] iterators_1.0.13            Rgraphviz_2.38.0           
[143] nnls_1.4                    ggforce_0.3.3              
[145] stringi_1.7.6               sass_0.4.0                 
[147] BiocSingular_1.10.0         CytoML_2.6.0               
[149] latticeExtra_0.6-29         cytolib_2.6.1              
[151] irlba_2.3.5

subluclustering and cell type definition

Daniel

19 12 2019