Last updated: 2022-07-25
Checks: 7 0
Knit directory: GSFA_analysis/
This reproducible R Markdown analysis was created with workflowr (version 1.7.0). The Checks tab describes the reproducibility checks that were applied when the results were created. The Past versions tab lists the development history.
Great! Since the R Markdown file has been committed to the Git repository, you know the exact version of the code that produced these results.
Great job! The global environment was empty. Objects defined in the global environment can affect the analysis in your R Markdown file in unknown ways. For reproduciblity it's best to always run the code in an empty environment.
The command set.seed(20220524)
was run prior to running the code in the R Markdown file. Setting a seed ensures that any results that rely on randomness, e.g. subsampling or permutations, are reproducible.
Great job! Recording the operating system, R version, and package versions is critical for reproducibility.
Nice! There were no cached chunks for this analysis, so you can be confident that you successfully produced the results during this run.
Great job! Using relative paths to the files within your workflowr project makes it easier to run your code on other machines.
Great! You are using Git for version control. Tracking code development and connecting the code version to the results is critical for reproducibility.
The results in this page were generated with repository version 3cbce15. See the Past versions tab to see a history of the changes made to the R Markdown and HTML files.
Note that you need to be careful to ensure that all relevant files for the analysis have been committed to Git prior to generating the results (you can use wflow_publish
or wflow_git_commit
). workflowr only checks the R Markdown file, but you know if there are other scripts or data files that it depends on. Below is the status of the Git repository when the results were generated:
Ignored files:
Ignored: .Rhistory
Ignored: .Rproj.user/
Untracked files:
Untracked: analysis/interpret_gsfa_LUHMES.Rmd
Untracked: analysis/spca_LUHMES_data.Rmd
Untracked: code/music_LUHMES_Yifan.R
Untracked: code/plotting_functions.R
Untracked: code/run_music_LUHMES.R
Untracked: code/run_music_LUHMES_data.sbatch
Untracked: code/run_music_Tcells_stimulated_data.R
Untracked: code/run_music_Tcells_stimulated_data.sbatch
Untracked: code/run_music_Tcells_unstimulated_data.R
Untracked: code/run_music_Tcells_unstimulated_data.sbatch
Untracked: code/run_sceptre_LUHMES_data.sbatch
Untracked: code/run_sceptre_Tcells_stimulated_data.sbatch
Untracked: code/run_sceptre_Tcells_unstimulated_data.sbatch
Untracked: code/run_spca_LUHMES.R
Untracked: code/run_spca_TCells.R
Untracked: code/sceptre_LUHMES_data.R
Untracked: code/sceptre_Tcells_stimulated_data.R
Untracked: code/sceptre_Tcells_unstimulated_data.R
Unstaged changes:
Modified: analysis/sceptre_LUHMES_data.Rmd
Modified: code/run_sceptre_cropseq_data.sbatch
Modified: code/sceptre_analysis.R
Note that any generated files, e.g. HTML, png, CSS, etc., are not included in this status report because it is ok for generated content to have uncommitted changes.
These are the previous versions of the repository in which changes were made to the R Markdown (analysis/music_LUHMES_data.Rmd
) and HTML (docs/music_LUHMES_data.html
) files. If you've configured a remote Git repository (see ?wflow_git_remote
), click on the hyperlinks in the table below to view the files as they were in that past version.
File | Version | Author | Date | Message |
---|---|---|---|---|
Rmd | 3cbce15 | kevinlkx | 2022-07-25 | update music results for LUHMES data |
html | 5200b77 | kevinlkx | 2022-07-19 | Build site. |
Rmd | 358270c | kevinlkx | 2022-07-19 | run MUSIC on LUHMES data |
slurm setting
sinteractive --partition=broadwl --account=pi-xinhe --mem=30G --time=10:00:00 --cpus-per-task=10
MUSIC website: https://github.com/bm2-lab/MUSIC
Scripts for running the analysis:
cd /project2/xinhe/kevinluo/GSFA/music_analysis/log
sbatch --mem=50G --cpus-per-task=10 ~/projects/GSFA_analysis/code/run_music_LUHMES_data.sbatch
CROP-seq datasets: /project2/xinhe/yifan/Factor_analysis/shared_data/
The data are Seurat objects, with raw gene counts stored in obj@assays$RNA@counts, and cell meta data stored in obj@meta.data. Normalized and scaled data used for GSFA are stored in obj@assays$RNA@scale.data , the rownames of which are the 6k genes used for GSFA.
Load packages
suppressPackageStartupMessages(library(data.table))
suppressPackageStartupMessages(library(Seurat))
suppressPackageStartupMessages(library(MUSIC))
suppressPackageStartupMessages(library(ComplexHeatmap))
suppressPackageStartupMessages(library(ggplot2))
theme_set(theme_bw() + theme(plot.title = element_text(size = 14, hjust = 0.5),
axis.title = element_text(size = 14),
axis.text = element_text(size = 13),
legend.title = element_text(size = 13),
legend.text = element_text(size = 12),
panel.grid.minor = element_blank())
)
functions
## Adapted over MUSIC's Diff_topic_distri() function
Empirical_topic_prob_diff <- function(model, perturb_information,
permNum = 10^4, seed = 1000){
require(reshape2)
require(dplyr)
require(ComplexHeatmap)
options(warn = -1)
prob_mat <- model@gamma
row.names(prob_mat) <- model@documents
topicNum <- ncol(prob_mat)
topicName <- paste0('Topic_', 1:topicNum)
colnames(prob_mat) <- topicName
ko_name <- unique(perturb_information)
prob_df <- data.frame(prob_mat,
samples = rownames(prob_mat),
knockout = perturb_information)
prob_df <- melt(prob_df, id = c('samples', 'knockout'), variable.name = "topic")
summary_df <- prob_df %>%
group_by(knockout, topic) %>%
summarise(number = sum(value)) %>%
ungroup() %>%
group_by(knockout) %>%
mutate(cellNum = sum(number)) %>%
ungroup() %>%
mutate(ratio = number/cellNum)
summary_df$ctrlNum <- rep(summary_df$cellNum[summary_df$knockout == "CTRL"],
length(ko_name))
summary_df$ctrl_ratio <- rep(summary_df$ratio[summary_df$knockout == "CTRL"],
length(ko_name))
summary_df <- summary_df %>% mutate(diff_index = ratio - ctrl_ratio)
test_df <- data.frame(matrix(nrow = length(ko_name) * topicNum, ncol = 5))
colnames(test_df) <- c("knockout", "topic", "obs_t_stats", "obs_pval", "empirical_pval")
k <- 1
for(i in topicName){
prob_df.topic <- prob_df[prob_df$topic == i, ]
ctrl_topic <- prob_df.topic$value[prob_df.topic$knockout == "CTRL"]
ctrl_topic_z <- (ctrl_topic - mean(ctrl_topic)) / sqrt(var(ctrl_topic))
for(j in ko_name){
ko_topic <- prob_df.topic$value[prob_df.topic$knockout == j]
ko_topic_z <- (ko_topic - mean(ctrl_topic)) / sqrt(var(ctrl_topic))
test_df$knockout[k] <- j
test_df$topic[k] <- i
test <- t.test(ko_topic_z, ctrl_topic_z)
test_df$obs_t_stats[k] <- test$statistic
test_df$obs_pval[k] <- test$p.value
k <- k + 1
}
}
## Permutation on the perturbation conditions:
# permNum <- 10^4
print(paste0("Performing permutation for ", permNum, " rounds."))
perm_t_stats <- matrix(0, nrow = nrow(test_df), ncol = permNum)
set.seed(seed)
for (perm in 1:permNum){
perm_prob_df <- data.frame(prob_mat,
samples = rownames(prob_mat),
knockout = perturb_information[sample(length(perturb_information))])
perm_prob_df <- melt(perm_prob_df, id = c('samples', 'knockout'), variable.name = "topic")
k <- 1
for(i in topicName){
perm_prob_df.topic <- perm_prob_df[perm_prob_df$topic == i, ]
ctrl_topic <- perm_prob_df.topic$value[perm_prob_df.topic$knockout == "CTRL"]
ctrl_topic_z <- (ctrl_topic - mean(ctrl_topic)) / sqrt(var(ctrl_topic))
for(j in ko_name){
ko_topic <- perm_prob_df.topic$value[perm_prob_df.topic$knockout == j]
ko_topic_z <- (ko_topic - mean(ctrl_topic)) / sqrt(var(ctrl_topic))
test <- t.test(ko_topic_z, ctrl_topic_z)
perm_t_stats[k, perm] <- test$statistic
k <- k + 1
}
}
if (perm %% 1000 == 0){
print(paste0(perm, " rounds finished."))
}
}
## Compute two-sided empirical p value:
for (k in 1:nrow(test_df)){
test_df$empirical_pval[k] <-
2 * min(mean(perm_t_stats[k, ] <= test_df$obs_t_stats[k]),
mean(perm_t_stats[k, ] >= test_df$obs_t_stats[k]))
}
test_df <- test_df %>%
mutate(empirical_pval = ifelse(empirical_pval == 0, 1/permNum, empirical_pval)) %>%
mutate(empirical_pval = ifelse(empirical_pval > 1, 1, empirical_pval))
summary_df <- inner_join(summary_df, test_df, by = c("knockout", "topic"))
summary_df <- summary_df %>%
mutate(polar_log10_pval = ifelse(obs_t_stats > 0, -log10(empirical_pval), log10(empirical_pval)))
return(summary_df)
}
Set directories
data_dir <- "/project2/xinhe/yifan/Factor_analysis/LUHMES/"
res_dir <- "/project2/xinhe/kevinluo/GSFA/music_analysis/LUHMES"
# setwd(res_dir)
dir.create(file.path(res_dir,"/music_output"), recursive = TRUE, showWarnings = FALSE)
feature.names <- data.frame(fread(paste0(data_dir, "GSE142078_raw/GSM4219575_Run1_genes.tsv.gz"),
header = FALSE), stringsAsFactors = FALSE)
# combined_obj <- readRDS("processed_data/seurat_obj.merged_scaled_detect_01.corrected_new.rds")
combined_obj <- readRDS("/project2/xinhe/yifan/Factor_analysis/shared_data/LUHMES_cropseq_data_seurat.rds")
expression_profile <- combined_obj@assays$RNA@counts
rownames(expression_profile) <- feature.names$V2[match(rownames(expression_profile),
feature.names$V1)]
cat("Dimension of expression profile matrix: \n")
dim(expression_profile)
targets <- names(combined_obj@meta.data)[4:18]
targets[targets == "Nontargeting"] <- "CTRL"
cat("Targets: \n")
print(targets)
perturb_information <- apply(combined_obj@meta.data[4:18], 1,
function(x){ targets[which(x > 0)] })
Dimension of expression profile matrix:
[1] 33694 8708
Targets:
[1] "ADNP" "ARID1B" "ASH1L" "CHD2" "CHD8" "CTNND2" "DYRK1A" "HDAC5"
[9] "MECP2" "MYT1L" "CTRL" "POGZ" "PTEN" "RELN" "SETD5"
crop_seq_list <- Input_preprocess(expression_profile, perturb_information)
crop_seq_qc <- Cell_qc(crop_seq_list$expression_profile,
crop_seq_list$perturb_information,
species = "Hs", plot = F)
crop_seq_imputation <- Data_imputation(crop_seq_qc$expression_profile,
crop_seq_qc$perturb_information,
cpu_num = 10)
saveRDS(crop_seq_imputation, "music_output/music_imputation.merged.rds")
crop_seq_filtered <- Cell_filtering(crop_seq_imputation$expression_profile,
crop_seq_imputation$perturb_information,
cpu_num = 10)
saveRDS(crop_seq_filtered, "music_output/music_filtered.merged.rds")
Obtain highly dispersion differentially expressed genes.
crop_seq_filtered <- readRDS("music_output/music_filtered.merged.rds")
dim(crop_seq_filtered$expression_profile)
length(crop_seq_filtered$perturb_information)
crop_seq_vargene <- Get_high_varGenes(crop_seq_filtered$expression_profile,
crop_seq_filtered$perturb_information, plot = T)
saveRDS(crop_seq_vargene, "music_output/music_vargene.merged.rds")
crop_seq_vargene <- readRDS("music_output/music_vargene.merged.rds")
dim(crop_seq_vargene$expression_profile)
crop_seq_vargene$expression_profile[1:5,1:5]
length(crop_seq_vargene$perturb_information)
get topics.
## Get_topics() can take up to a few hours to finish,
## depending on the size of data
system.time(
topic_1 <- Get_topics(crop_seq_vargene$expression_profile,
crop_seq_vargene$perturb_information,
topic_number = 5))
saveRDS(topic_1, "music_output/music_merged_5_topics.rds")
system.time(
topic_2 <- Get_topics(crop_seq_vargene$expression_profile,
crop_seq_vargene$perturb_information,
topic_number = 10))
saveRDS(topic_2, "music_output/music_merged_10_topics.rds")
system.time(
topic_3 <- Get_topics(crop_seq_vargene$expression_profile,
crop_seq_vargene$perturb_information,
topic_number = 15))
saveRDS(topic_3, "music_output/music_merged_15_topics.rds")
system.time(
topic_4 <- Get_topics(crop_seq_vargene$expression_profile,
crop_seq_vargene$perturb_information,
topic_number = 20))
saveRDS(topic_4, "music_output/music_merged_20_topics.rds")
## try fewer numbers of topics
system.time(
topic_5 <- Get_topics(crop_seq_vargene$expression_profile,
crop_seq_vargene$perturb_information,
topic_number = 4))
saveRDS(topic_5, "music_output/music_merged_4_topics.rds")
system.time(
topic_6 <- Get_topics(crop_seq_vargene$expression_profile,
crop_seq_vargene$perturb_information,
topic_number = 6))
saveRDS(topic_6, "music_output/music_merged_6_topics.rds")
system.time(
topic_7 <- Get_topics(crop_seq_vargene$expression_profile,
crop_seq_vargene$perturb_information,
topic_number = 7))
saveRDS(topic_7, "music_output/music_merged_7_topics.rds")
system.time(
topic_8 <- Get_topics(crop_seq_vargene$expression_profile,
crop_seq_vargene$perturb_information,
topic_number = 3))
saveRDS(topic_8, "music_output/music_merged_3_topics.rds")
topic_1 <- readRDS("music_output/music_merged_4_topics.rds")
topic_2 <- readRDS("music_output/music_merged_5_topics.rds")
topic_3 <- readRDS("music_output/music_merged_6_topics.rds")
topic_4 <- readRDS("music_output/music_merged_10_topics.rds")
topic_5 <- readRDS("music_output/music_merged_15_topics.rds")
topic_6 <- readRDS("music_output/music_merged_20_topics.rds")
topic_model_list <- list()
topic_model_list$models <- list()
topic_model_list$perturb_information <- topic_1$perturb_information
topic_model_list$models[[1]] <- topic_1$models[[1]]
topic_model_list$models[[2]] <- topic_2$models[[1]]
topic_model_list$models[[3]] <- topic_3$models[[1]]
topic_model_list$models[[4]] <- topic_4$models[[1]]
topic_model_list$models[[5]] <- topic_5$models[[1]]
topic_model_list$models[[6]] <- topic_6$models[[1]]
optimalModel <- Select_topic_number(topic_model_list$models,
plot = T,
plot_path = "music_output/select_topic_number_4to6to20.pdf")
Gene ontology annotations for top topics
topic_res <- readRDS("music_output/music_merged_20_topics.rds")
topic_func <- Topic_func_anno(topic_res$models[[1]], species = "Hs")
saveRDS(topic_func, "music_output/topic_func.rds")
topic_func <- readRDS("music_output/topic_func.rds")
# pdf("music_output/music_merged_20_topics_GO_annotations.pdf",
# width = 14, height = 12)
ggplot(topic_func$topic_annotation_result) +
geom_point(aes(x = Cluster, y = Description,
size = Count, color = -log10(qvalue))) +
scale_color_gradientn(colors = c("blue", "red")) +
theme_bw() +
theme(axis.title = element_blank(),
axis.text.x = element_text(angle = 45, hjust = 1))
# dev.off()
Perturbation effect prioritizing
# calculate topic distribution for each cell.
distri_diff <- Diff_topic_distri(topic_res$models[[1]],
topic_res$perturb_information,
plot = T)
saveRDS(distri_diff, "music_output/distri_diff.rds")
distri_diff <- readRDS("music_output/distri_diff.rds")
t_D_diff_matrix <- dcast(distri_diff %>% dplyr::select(knockout, variable, t_D_diff),
knockout ~ variable)
rownames(t_D_diff_matrix) <- t_D_diff_matrix$knockout
t_D_diff_matrix$knockout <- NULL
# pdf("music_output/music_merged_20_topics_TPD_heatmap.pdf", width = 12, height = 8)
Heatmap(t_D_diff_matrix,
name = "Topic probability difference (vs ctrl)",
cluster_rows = T, cluster_columns = T,
column_names_rot = 45,
heatmap_legend_param = list(title_gp = gpar(fontsize = 12, fontface = "bold")))
# dev.off()
Overall perturbation effect ranking list.
distri_diff <- readRDS(file.path(res_dir, "music_output/distri_diff.rds"))
rank_overall_result <- Rank_overall(distri_diff)
print(rank_overall_result)
# saveRDS(rank_overall_result, "music_output/rank_overall_result.rds")
perturbation ranking Score off_target
1 ASH1L 1 85.91013 none
2 SETD5 2 79.56861 none
3 ARID1B 3 77.59034 none
4 CHD8 4 70.16751 none
5 PTEN 5 66.48328 none
6 POGZ 6 64.12727 none
7 MECP2 7 62.10662 none
8 RELN 8 59.26017 none
9 CHD2 9 55.85265 none
10 MYT1L 10 39.67237 none
11 ADNP 11 39.06289 none
12 DYRK1A 12 38.98911 none
13 HDAC5 13 38.48395 none
topic-specific ranking list.
rank_topic_specific_result <- Rank_specific(distri_diff)
head(rank_topic_specific_result, 10)
# saveRDS(rank_topic_specific_result, "music_output/rank_topic_specific_result.rds")
topic perturbation ranking
1 Topic1 ARID1B 1
2 Topic1 CHD8 2
3 Topic1 SETD5 3
4 Topic1 MECP2 4
5 Topic1 POGZ 5
6 Topic1 PTEN 6
7 Topic1 HDAC5 7
8 Topic1 DYRK1A 8
9 Topic1 ADNP 9
10 Topic1 RELN 10
perturbation correlation.
perturb_cor <- Correlation_perturbation(distri_diff,
cutoff = 0.5, gene = "all", plot = T,
plot_path = file.path(res_dir, "music_output/correlation_network_20_topics.pdf"))
head(perturb_cor, 10)
# saveRDS(perturb_cor, "music_output/perturb_cor.rds")
Perturbation_1 Perturbation_2 Correlation
2 ARID1B ADNP -0.36642339
3 ASH1L ADNP 0.60467107
16 ASH1L ARID1B 0.12772609
30 CHD2 ASH1L 0.91852307
4 CHD2 ADNP 0.81036468
17 CHD2 ARID1B 0.03762586
18 CHD8 ARID1B 0.89232727
5 CHD8 ADNP -0.26907399
31 CHD8 ASH1L 0.25264119
44 CHD8 CHD2 0.13836576
Adaptation to the code to generate calibrated empirical TPD scores
summary_df <- Empirical_topic_prob_diff(topic_res$models[[1]],
topic_res$perturb_information)
saveRDS(summary_df, "music_output/music_merged_20_topics_ttest_summary.rds")
summary_df <- readRDS("music_output/music_merged_20_topics_ttest_summary.rds")
log10_pval_mat <- dcast(summary_df %>% dplyr::select(knockout, topic, polar_log10_pval),
knockout ~ topic)
rownames(log10_pval_mat) <- log10_pval_mat$knockout
log10_pval_mat$knockout <- NULL
pdf("music_output/music_merged_20_topics_empirical_tstats_heatmap.pdf",
width = 12, height = 8)
ht <- Heatmap(log10_pval_mat,
name = "Polarized empirical t-test -log10(p-value)\n(KO vs ctrl cell topic probs)",
col = circlize::colorRamp2(breaks = c(-4, 0, 4), colors = c("blue", "grey90", "red")),
cluster_rows = T, cluster_columns = T,
column_names_rot = 45,
heatmap_legend_param = list(title_gp = gpar(fontsize = 12,
fontface = "bold")))
draw(ht)
dev.off()
Pick optimal number of topics
topic_1 <- readRDS("music_output/music_merged_4_topics.rds")
topic_2 <- readRDS("music_output/music_merged_5_topics.rds")
topic_3 <- readRDS("music_output/music_merged_6_topics.rds")
topic_4 <- readRDS("music_output/music_merged_7_topics.rds")
topic_model_list <- list()
topic_model_list$models <- list()
topic_model_list$perturb_information <- topic_1$perturb_information
topic_model_list$models[[1]] <- topic_1$models[[1]]
topic_model_list$models[[2]] <- topic_2$models[[1]]
topic_model_list$models[[3]] <- topic_3$models[[1]]
topic_model_list$models[[4]] <- topic_4$models[[1]]
optimalModel <- Select_topic_number(topic_model_list$models,
plot = T,
plot_path = "music_output/select_topic_number_4to7.pdf")
optimalModel
saveRDS(optimalModel, "music_output/optimalModel_4_topics.rds")
Gene ontology annotations for top topics
topic_func <- Topic_func_anno(optimalModel, species = "Hs", plot_path = "music_output/topic_annotation_GO_4_topics.pdf")
saveRDS(topic_func, "music_output/topic_func_4_topics.rds")
topic_func <- readRDS("music_output/topic_func_4_topics.rds")
pdf("music_output/music_merged_4_topics_GO_annotations.pdf",
width = 14, height = 12)
ggplot(topic_func$topic_annotation_result) +
geom_point(aes(x = Cluster, y = Description,
size = Count, color = -log10(qvalue))) +
scale_color_gradientn(colors = c("blue", "red")) +
theme_bw() +
theme(axis.title = element_blank(),
axis.text.x = element_text(angle = 45, hjust = 1))
dev.off()
Perturbation effect prioritizing
# calculate topic distribution for each cell.
distri_diff <- Diff_topic_distri(optimalModel,
topic_model_list$perturb_information,
plot = T,
plot_path = "music_output/distribution_of_topic_4_topics.pdf")
saveRDS(distri_diff, "music_output/distri_diff_4_topics.rds")
t_D_diff_matrix <- dcast(distri_diff %>% dplyr::select(knockout, variable, t_D_diff),
knockout ~ variable)
rownames(t_D_diff_matrix) <- t_D_diff_matrix$knockout
t_D_diff_matrix$knockout <- NULL
pdf("music_output/music_merged_4_topics_TPD_heatmap.pdf", width = 12, height = 8)
Heatmap(t_D_diff_matrix,
name = "Topic probability difference (vs ctrl)",
cluster_rows = T, cluster_columns = T,
column_names_rot = 45,
heatmap_legend_param = list(title_gp = gpar(fontsize = 12, fontface = "bold")))
dev.off()
The overall perturbation effect ranking list.
distri_diff <- readRDS(file.path(res_dir, "music_output/distri_diff_4_topics.rds"))
rank_overall_result <- Rank_overall(distri_diff)
print(rank_overall_result)
# saveRDS(rank_overall_result, "music_output/rank_overall_4_topics_result.rds")
perturbation ranking Score off_target
1 SETD5 1 32.193659 none
2 CHD8 2 29.778960 none
3 ASH1L 3 27.549804 none
4 ARID1B 4 26.055173 none
5 PTEN 5 21.015782 none
6 RELN 6 19.725323 none
7 POGZ 7 19.274733 none
8 MECP2 8 16.098968 none
9 CHD2 9 15.626195 none
10 MYT1L 10 13.654875 none
11 DYRK1A 11 11.714480 none
12 HDAC5 12 9.397238 none
13 ADNP 13 8.510481 none
Topic-specific ranking list.
rank_topic_specific_result <- Rank_specific(distri_diff)
print(rank_topic_specific_result)
# saveRDS(rank_topic_specific_result, "music_output/rank_topic_specific_4_topics_result.rds")
topic perturbation ranking
1 Topic1 ARID1B 1
2 Topic1 POGZ 2
3 Topic1 CHD8 3
4 Topic1 MECP2 4
5 Topic1 RELN 5
6 Topic1 SETD5 6
7 Topic1 MYT1L 7
8 Topic1 CHD2 8
9 Topic1 ASH1L 9
10 Topic2 MECP2 1
11 Topic2 MYT1L 2
12 Topic2 HDAC5 3
13 Topic2 RELN 4
14 Topic2 ASH1L 5
15 Topic2 CHD8 6
16 Topic2 PTEN 7
17 Topic2 POGZ 8
18 Topic3 ASH1L 1
19 Topic3 CHD2 2
20 Topic3 PTEN 3
21 Topic3 DYRK1A 4
22 Topic3 ADNP 5
23 Topic3 HDAC5 6
24 Topic3 POGZ 7
25 Topic3 SETD5 8
26 Topic3 MECP2 9
27 Topic3 MYT1L 10
28 Topic4 SETD5 1
29 Topic4 CHD8 2
30 Topic4 ARID1B 3
31 Topic4 RELN 4
32 Topic4 ADNP 5
33 Topic4 MYT1L 6
34 Topic4 POGZ 7
35 Topic4 PTEN 8
36 Topic4 DYRK1A 9
Perturbation correlation.
perturb_cor <- Correlation_perturbation(distri_diff,
cutoff = 0.5, gene = "all", plot = T,
plot_path = file.path(res_dir, "music_output/correlation_network_4_topics.pdf"))
perturb_cor
# saveRDS(perturb_cor, "music_output/perturb_cor_4_topics.rds")
Perturbation_1 Perturbation_2 Correlation
2 ARID1B ADNP -0.447523272
3 ASH1L ADNP 0.808812895
16 ASH1L ARID1B 0.111932621
30 CHD2 ASH1L 0.993900166
4 CHD2 ADNP 0.838181223
17 CHD2 ARID1B 0.094168491
18 CHD8 ARID1B 0.978666101
5 CHD8 ADNP -0.500026993
31 CHD8 ASH1L 0.092206774
44 CHD8 CHD2 0.052593036
6 DYRK1A ADNP 0.955029925
45 DYRK1A CHD2 0.919477277
32 DYRK1A ASH1L 0.919100625
58 DYRK1A CHD8 -0.297668475
19 DYRK1A ARID1B -0.288603303
72 HDAC5 DYRK1A 0.932784067
33 HDAC5 ASH1L 0.874456238
46 HDAC5 CHD2 0.835012221
7 HDAC5 ADNP 0.797584426
20 HDAC5 ARID1B -0.221043654
59 HDAC5 CHD8 -0.157648224
86 MECP2 HDAC5 0.811847524
73 MECP2 DYRK1A 0.672188849
21 MECP2 ARID1B -0.631761939
8 MECP2 ADNP 0.586633586
60 MECP2 CHD8 -0.507132674
34 MECP2 ASH1L 0.433309624
47 MECP2 CHD2 0.378577277
61 MYT1L CHD8 0.840005209
22 MYT1L ARID1B 0.799747668
35 MYT1L ASH1L 0.592301247
48 MYT1L CHD2 0.540801963
87 MYT1L HDAC5 0.398597255
74 MYT1L DYRK1A 0.255949785
100 MYT1L MECP2 -0.059212860
9 MYT1L ADNP 0.005279451
101 POGZ MECP2 0.958316633
23 POGZ ARID1B -0.805004765
88 POGZ HDAC5 0.736344855
62 POGZ CHD8 -0.722861540
75 POGZ DYRK1A 0.688067491
10 POGZ ADNP 0.687278591
36 POGZ ASH1L 0.380222120
49 POGZ CHD2 0.353597376
114 POGZ MYT1L -0.288746616
76 PTEN DYRK1A -0.985212923
50 PTEN CHD2 -0.962750106
11 PTEN ADNP -0.951787011
37 PTEN ASH1L -0.950209735
89 PTEN HDAC5 -0.877983243
128 PTEN POGZ -0.561378750
102 PTEN MECP2 -0.535492807
115 PTEN MYT1L -0.311745226
63 PTEN CHD8 0.216529114
24 PTEN ARID1B 0.178082584
116 RELN MYT1L 0.974354811
64 RELN CHD8 0.938901611
25 RELN ARID1B 0.912206685
129 RELN POGZ -0.491290755
38 RELN ASH1L 0.422418671
51 RELN CHD2 0.377326661
103 RELN MECP2 -0.264536001
12 RELN ADNP -0.187659173
90 RELN HDAC5 0.182050610
142 RELN PTEN -0.121093154
77 RELN DYRK1A 0.047179655
65 SETD5 CHD8 0.935879993
26 SETD5 ARID1B 0.893863822
130 SETD5 POGZ -0.793996102
13 SETD5 ADNP -0.772282356
156 SETD5 RELN 0.767194513
117 SETD5 MYT1L 0.619428909
78 SETD5 DYRK1A -0.600332391
104 SETD5 MECP2 -0.590159858
143 SETD5 PTEN 0.543236545
91 SETD5 HDAC5 -0.425593665
52 SETD5 CHD2 -0.300856393
39 SETD5 ASH1L -0.257298553
Adaptation to the code to generate calibrated empirical TPD scores
summary_df <- Empirical_topic_prob_diff(optimalModel,
topic_model_list$perturb_information)
saveRDS(summary_df, "music_output/music_merged_4_topics_ttest_summary.rds")
summary_df <- readRDS("music_output/music_merged_4_topics_ttest_summary.rds")
log10_pval_mat <- dcast(summary_df %>% dplyr::select(knockout, topic, polar_log10_pval),
knockout ~ topic)
rownames(log10_pval_mat) <- log10_pval_mat$knockout
log10_pval_mat$knockout <- NULL
pdf("music_output/music_merged_4_topics_empirical_tstats_heatmap.pdf",
width = 12, height = 8)
ht <- Heatmap(log10_pval_mat,
name = "Polarized empirical t-test -log10(p-value)\n(KO vs ctrl cell topic probs)",
col = circlize::colorRamp2(breaks = c(-4, 0, 4), colors = c("blue", "grey90", "red")),
cluster_rows = T, cluster_columns = T,
column_names_rot = 45,
heatmap_legend_param = list(title_gp = gpar(fontsize = 12,
fontface = "bold")))
draw(ht)
dev.off()
sessionInfo()
R version 4.0.4 (2021-02-15)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Scientific Linux 7.4 (Nitrogen)
Matrix products: default
BLAS/LAPACK: /software/openblas-0.3.13-el7-x86_64/lib/libopenblas_haswellp-r0.3.13.so
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] grid stats4 parallel stats graphics grDevices utils
[8] datasets methods base
other attached packages:
[1] igraph_1.2.11 reshape2_1.4.4 gplots_3.1.1
[4] entropy_1.3.1 dplyr_1.0.8 ggplot2_3.3.5
[7] ComplexHeatmap_2.6.2 MUSIC_1.0 SAVER_1.1.2
[10] clusterProfiler_3.18.1 hash_2.2.6.2 topicmodels_0.2-12
[13] Biostrings_2.58.0 XVector_0.30.0 IRanges_2.24.1
[16] S4Vectors_0.28.1 BiocGenerics_0.36.1 SeuratObject_4.0.4
[19] Seurat_4.1.0 data.table_1.14.2 workflowr_1.7.0
loaded via a namespace (and not attached):
[1] utf8_1.2.2 R.utils_2.11.0 reticulate_1.24
[4] tidyselect_1.1.2 RSQLite_2.2.11 AnnotationDbi_1.52.0
[7] htmlwidgets_1.5.4 BiocParallel_1.24.1 Rtsne_0.15
[10] scatterpie_0.1.7 munsell_0.5.0 codetools_0.2-18
[13] ica_1.0-2 future_1.24.0 miniUI_0.1.1.1
[16] withr_2.5.0 spatstat.random_2.1-0 colorspace_2.0-3
[19] GOSemSim_2.16.1 Biobase_2.50.0 NLP_0.2-1
[22] knitr_1.38 rstudioapi_0.13 ROCR_1.0-11
[25] tensor_1.5 DOSE_3.16.0 listenv_0.8.0
[28] git2r_0.30.1 slam_0.1-50 polyclip_1.10-0
[31] bit64_4.0.5 farver_2.1.0 rprojroot_2.0.2
[34] downloader_0.4 parallelly_1.31.0 vctrs_0.4.1
[37] generics_0.1.2 xfun_0.30 R6_2.5.1
[40] clue_0.3-60 graphlayouts_0.8.0 bitops_1.0-7
[43] spatstat.utils_2.3-0 cachem_1.0.6 fgsea_1.21.0
[46] assertthat_0.2.1 promises_1.2.0.1 scales_1.2.0
[49] ggraph_2.0.5 enrichplot_1.10.2 gtable_0.3.0
[52] Cairo_1.5-15 globals_0.14.0 processx_3.5.3
[55] goftest_1.2-3 tidygraph_1.2.0 rlang_1.0.2
[58] GlobalOptions_0.1.2 splines_4.0.4 lazyeval_0.2.2
[61] spatstat.geom_2.3-2 BiocManager_1.30.16 yaml_2.3.5
[64] abind_1.4-5 httpuv_1.6.5 qvalue_2.22.0
[67] tools_4.0.4 ellipsis_0.3.2 spatstat.core_2.4-0
[70] jquerylib_0.1.4 RColorBrewer_1.1-3 ggridges_0.5.3
[73] Rcpp_1.0.9 plyr_1.8.6 zlibbioc_1.36.0
[76] purrr_0.3.4 ps_1.6.0 rpart_4.1-15
[79] deldir_1.0-6 GetoptLong_1.0.5 pbapply_1.5-0
[82] viridis_0.6.2 cowplot_1.1.1 zoo_1.8-9
[85] ggrepel_0.9.1 cluster_2.1.2 fs_1.5.2
[88] magrittr_2.0.3 scattermore_0.7 DO.db_2.9
[91] circlize_0.4.14 lmtest_0.9-40 RANN_2.6.1
[94] whisker_0.4 fitdistrplus_1.1-8 matrixStats_0.61.0
[97] patchwork_1.1.1 mime_0.12 evaluate_0.15
[100] xtable_1.8-4 shape_1.4.6 gridExtra_2.3
[103] compiler_4.0.4 tibble_3.1.6 KernSmooth_2.23-20
[106] crayon_1.5.1 shadowtext_0.1.1 R.oo_1.24.0
[109] htmltools_0.5.2 ggfun_0.0.5 mgcv_1.8-39
[112] later_1.3.0 tidyr_1.2.0 DBI_1.1.2
[115] tweenr_1.0.2 MASS_7.3-56 Matrix_1.4-1
[118] cli_3.2.0 R.methodsS3_1.8.1 pkgconfig_2.0.3
[121] getPass_0.2-2 rvcheck_0.2.1 plotly_4.10.0
[124] spatstat.sparse_2.1-0 foreach_1.5.2 xml2_1.3.3
[127] bslib_0.3.1 yulab.utils_0.0.4 stringr_1.4.0
[130] callr_3.7.0 digest_0.6.29 sctransform_0.3.3
[133] RcppAnnoy_0.0.19 spatstat.data_2.1-2 tm_0.7-8
[136] rmarkdown_2.13 leiden_0.3.9 fastmatch_1.1-3
[139] uwot_0.1.11 gtools_3.9.2 shiny_1.7.1
[142] modeltools_0.2-23 rjson_0.2.21 lifecycle_1.0.1
[145] nlme_3.1-155 jsonlite_1.8.0 viridisLite_0.4.0
[148] fansi_1.0.3 pillar_1.7.0 lattice_0.20-45
[151] fastmap_1.1.0 httr_1.4.2 survival_3.3-1
[154] GO.db_3.12.1 glue_1.6.2 iterators_1.0.14
[157] png_0.1-7 bit_4.0.4 ggforce_0.3.3
[160] stringi_1.7.6 sass_0.4.1 blob_1.2.3
[163] caTools_1.18.2 memoise_2.0.1 irlba_2.3.5
[166] future.apply_1.8.1