  1. Consider the FUCCI-expressing human ESCs in Leng et al. 2015.

  2. What are the expression profiles of the sorted cells, given the phase-specific scores defined in Macosko et al. 2015?

  • G1 cells score high on M/G1, G1/S, and M.
  • S cells score high on G1/S and S.
  • G2 cells score high on G2 and M.

Compared with the preliminary analysis of our data (here), we observe this pattern in some individuals but not in all of them. It is clear that phase-specific scores alone are not adequate for predicting cell cycle phase. Importantly, there appears to be a pattern in average expression levels within each FUCCI-assigned cell state: G1 cells peak at M/G1 and G1/S, S cells peak at S, and G2 cells peak at G2.
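The per-state pattern described above amounts to averaging each phase score within each FUCCI-assigned state and asking which phase has the highest average. A minimal sketch on a small made-up score matrix follows; the values, the cell names, and the `state_means` and `peak_phase` variables are illustrative only and are not data from Leng et al. 2015. With the real data, `scores_z` and `pdata$cell_state` from the analysis below would take the place of the toy objects.

```r
# Made-up z-scored phase scores: rows are cells, columns are phases
scores_z <- matrix(
  c( 1.2,  0.9, -0.8, -0.5, -0.8,
     1.0,  1.1, -0.6, -0.7, -0.8,
    -0.2,  0.3,  1.4, -0.9, -0.6,
    -0.4,  0.1,  1.5, -0.6, -0.6,
    -0.9, -0.7, -0.3,  1.3,  0.6,
    -1.0, -0.5, -0.2,  1.2,  0.5),
  nrow = 6, byrow = TRUE,
  dimnames = list(paste0("cell", 1:6), c("M/G1", "G1/S", "S", "G2", "M"))
)
cell_state <- c("G1", "G1", "S", "S", "G2", "G2")  # hypothetical FUCCI labels

# Mean of each phase score within each cell state (states x phases)
states <- sort(unique(cell_state))
state_means <- t(sapply(states, function(s) {
  colMeans(scores_z[cell_state == s, , drop = FALSE])
}))

# Phase with the highest average score in each state
peak_phase <- setNames(colnames(state_means)[apply(state_means, 1, which.max)],
                       rownames(state_means))
print(round(state_means, 2))
print(peak_phase)
```

Taking only the single highest phase per state is a simplification; as noted above, G1 cells peak at two phases, so the full `state_means` row is the more informative summary.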

  3. A total of 123 genes were identified as variable along the cell cycle.

Data and packages



Load the normalized and filtered data from Leng et al. 2015.


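library(Biobase)  # exprs(), pData()
library(tidyr)    # gather()

counts <- exprs(HumanLengESC)
libsize <- colSums(counts)
cpm <- t(t(counts)*(10^6)/libsize)  # counts per million, per cell
log2cpm <- log2(cpm+1)
pdata <- pData(HumanLengESC)
table(pdata$cell_state)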


 G1  G2  H1   S 
 91  76 213  80 
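The CPM step above divides each column of the count matrix by that cell's library size and rescales to one million. The double transpose is needed because R recycles a vector down columns: after `t()`, cells are rows, so `libsize[i]` divides row `i`. A minimal check on a made-up 3 × 2 count matrix (toy values, not from the data set), with `sweep()` shown as an equivalent, more explicit alternative:

```r
# Hypothetical counts: 3 genes x 2 cells
counts <- matrix(c(10, 30, 60,
                    5,  5,  0),
                 nrow = 3,
                 dimnames = list(c("geneA", "geneB", "geneC"), c("cell1", "cell2")))
libsize <- colSums(counts)                            # total counts per cell
cpm <- t(t(counts) * 1e6 / libsize)                   # divide each column by its library size
cpm_sweep <- sweep(counts, 2, libsize, "/") * 1e6     # same result, column-wise by construction
log2cpm <- log2(cpm + 1)                              # pseudocount of 1 keeps zero counts at 0
print(cpm)
```

Every column of `cpm` sums to one million, and a zero count maps to exactly 0 on the log2(CPM + 1) scale.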

Select the FUCCI-expressing cells (exclude the 213 cells labeled H1).

cpm <- cpm[,pdata$cell_state != "H1"]
log2cpm <- log2cpm[,pdata$cell_state != "H1"]
pdata <- pdata[pdata$cell_state != "H1", ]
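The three lines above only work in this order: `cpm` and `log2cpm` must be subset before `pdata` is overwritten, because the mask is recomputed from `pdata` each time. Computing the mask once removes that ordering constraint. A sketch on toy objects (the `keep` name and all values are hypothetical):

```r
# Toy annotation and expression for five hypothetical cells
pdata <- data.frame(cell_state = c("G1", "H1", "S", "G2", "H1"),
                    row.names = paste0("cell", 1:5))
log2cpm <- matrix(seq_len(10), nrow = 2,
                  dimnames = list(c("geneA", "geneB"), rownames(pdata)))

keep <- pdata$cell_state != "H1"            # compute the mask once
log2cpm <- log2cpm[, keep, drop = FALSE]    # subset columns (cells)
pdata <- pdata[keep, , drop = FALSE]        # subset rows (cells), keep a data frame
print(colnames(log2cpm))
```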

Filter genes: keep genes with mean CPM greater than 1 across the FUCCI-expressing cells.

genes_to_include <- which(rowMeans(cpm)>1)
log2cpm <- log2cpm[genes_to_include,]
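The filter keeps genes whose mean CPM across the retained cells is strictly greater than 1, so a gene averaging exactly 1 CPM is dropped. On a made-up 4 × 3 matrix (toy values), using a logical mask instead of `which()`, which selects the same rows:

```r
# Hypothetical CPM values: 4 genes x 3 cells
cpm <- matrix(c(0,   0,   3,
                2,   1,   0,
                0.5, 0.5, 0.5,
                10,  20,  30),
              nrow = 4, byrow = TRUE,
              dimnames = list(paste0("gene", 1:4), paste0("cell", 1:3)))
keep_genes <- rowMeans(cpm) > 1              # gene1 and gene2 average exactly 1
cpm_filtered <- cpm[keep_genes, , drop = FALSE]
print(rowMeans(cpm))
print(rownames(cpm_filtered))
```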

Import the cell cycle genes and their phase assignments.

cellcycle <- readRDS("../data/cellcycle-genes-previous-studies/rds/macosko-2017.rds")
which_cc <- which(rownames(log2cpm) %in% cellcycle$hgnc)

Subset the expression data to the cell cycle genes.

log2cpm_cc <- log2cpm[which_cc, ]
cc_genes <- cellcycle[which(cellcycle$hgnc %in% rownames(log2cpm)),]
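The two subsetting lines above should leave both objects describing the same genes: expressed genes that also appear in the annotation. A toy version with an explicit consistency check (the gene names, phases, and values are hypothetical placeholders; only the `hgnc` and `phase` column names follow the code above):

```r
# Hypothetical expression matrix and annotation
log2cpm <- matrix(1:8, nrow = 4,
                  dimnames = list(c("geneA", "geneB", "geneC", "geneX"), c("cell1", "cell2")))
cellcycle <- data.frame(hgnc  = c("geneA", "geneB", "geneC", "geneD"),
                        phase = c("G1/S", "S", "S", "M"),
                        stringsAsFactors = FALSE)

which_cc <- which(rownames(log2cpm) %in% cellcycle$hgnc)         # expressed genes that are annotated
log2cpm_cc <- log2cpm[which_cc, , drop = FALSE]
cc_genes <- cellcycle[cellcycle$hgnc %in% rownames(log2cpm), ]   # annotated genes that are expressed

# Both objects should now describe the same genes
stopifnot(setequal(rownames(log2cpm_cc), cc_genes$hgnc))
print(table(cc_genes$phase))   # genes available per phase
```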

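Compute phase-specific scores. For each phase, average the expression of its genes in each cell, keep the genes whose correlation with this average exceeds 0.3, average again over the retained genes, and z-score the result across cells.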

phases <- unique(cc_genes$phase)
cc_scores <- lapply(phases, function(ph) {
  # expression of the genes assigned to this phase
  df_sub <- log2cpm_cc[rownames(log2cpm_cc) %in% cc_genes$hgnc[cc_genes$phase == ph], , drop = FALSE]
  mn <- colMeans(df_sub)
  # correlation of each gene with the phase average (first column, without the self-correlation)
  cc <- cor(t(rbind(mn, df_sub)))
  cc_mean <- cc[-1,1]
  genes_cc <- names(cc_mean)[which(cc_mean > .3)]
  scores_raw <- colMeans(df_sub[rownames(df_sub) %in% genes_cc, , drop = FALSE])
  scores_z <- scale(scores_raw)  # z-score across cells
  return(list(scores_z=scores_z, ngenes = length(genes_cc)))
})
names(cc_scores) <- phases
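The per-phase steps in the loop above can be isolated in a small helper and checked on a made-up phase with four genes, one of which runs against the others. This is a sketch under stated assumptions: `score_phase` is a hypothetical name, and it computes one correlation per gene with `apply()` instead of building the full gene × gene matrix that `cor(t(rbind(mn, df_sub)))` produces, of which only the first column is used.

```r
# Made-up log2 expression for four genes annotated to one phase, six cells
expr <- rbind(
  geneA = c(1, 2, 3, 4, 5, 6),
  geneB = c(2, 3, 4, 5, 6, 8),
  geneC = c(0, 1, 1, 2, 3, 3),
  geneD = c(6, 5, 4, 3, 2, 1)   # anti-correlated with the other three
)
colnames(expr) <- paste0("cell", 1:6)

# Hypothetical helper: score one phase following the steps described above
score_phase <- function(expr, min_cor = 0.3) {
  phase_mean <- colMeans(expr)                      # per-cell average over the phase genes
  gene_cor <- apply(expr, 1, cor, y = phase_mean)   # correlation of each gene with that average
  keep <- names(gene_cor)[gene_cor > min_cor]       # retain genes that track the average
  raw <- colMeans(expr[keep, , drop = FALSE])       # re-average over the retained genes
  list(scores_z = as.vector(scale(raw)),            # z-score across cells
       genes = keep)
}

res <- score_phase(expr)
print(res$genes)
print(round(res$scores_z, 2))
```

The `drop = FALSE` guards matter when a phase retains a single gene: without it, the subset collapses to a vector and `colMeans()` fails.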

ngenes <- sapply(cc_scores, function(x) x[[2]])
ngenes
   S G1/S M/G1    M   G2 
  10    6   27   46   34 
scores <- do.call(cbind, lapply(cc_scores, function(x) x[[1]]))
colnames(scores) <- unique(cc_genes$phase)

scores_z <- t(apply(scores, 1, scale))
colnames(scores_z) <- unique(cc_genes$phase)
scores_z <- as.data.frame(scores_z)  # gather() expects a data frame
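Unlike the `scale()` call inside the loop, which standardizes each phase score across cells, `t(apply(scores, 1, scale))` standardizes across the five phases within each cell, so a high value means the cell scores high on that phase relative to its own other phases. The transpose is needed because `apply()` over rows returns one column per cell. (The per-phase gene counts printed above sum to 10 + 6 + 27 + 46 + 34 = 123.) A toy check with made-up scores for two cells:

```r
# Made-up phase scores: 2 cells x 5 phases
scores <- matrix(c( 0.5, 1.0, -0.2, 0.3, 0.0,
                   -1.0, 0.2,  0.8, 1.5, 0.1),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(c("cell1", "cell2"), c("S", "G1/S", "M/G1", "M", "G2")))
scores_z <- t(apply(scores, 1, scale))   # apply() returns phases x cells, so transpose back
colnames(scores_z) <- colnames(scores)   # restore the phase column names
print(round(scores_z, 2))
print(rowMeans(scores_z))                # each cell is centered at 0
```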

scores_z_long <- gather(scores_z, key=phase, value=scores)
scores_z_long$sample_id <- rep(rownames(scores_z), ncol(scores_z))

scores_z_long$cell_state <- pdata$cell_state[match(scores_z_long$sample_id, rownames(pdata))]
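`gather()` stacks the columns of `scores_z` one after another, which is why `rep(rownames(scores_z), ncol(scores_z))` lines up with its rows. The same long table can be built in base R, making that column-major ordering explicit. This sketch uses toy scores and labels; `gather()` still works, although tidyr now marks it as superseded by `pivot_longer()`.

```r
# Made-up scores: 2 cells x 3 phases, plus hypothetical FUCCI labels
scores_z <- matrix(c(0.1, -0.4, 1.2, 0.7, -1.3, -0.3),
                   nrow = 2,
                   dimnames = list(c("cell1", "cell2"), c("S", "G1/S", "G2")))
cell_state <- c(cell1 = "G1", cell2 = "S")

scores_z_long <- data.frame(
  sample_id = rep(rownames(scores_z), times = ncol(scores_z)),  # cells vary fastest
  phase     = rep(colnames(scores_z), each = nrow(scores_z)),   # one block per phase
  scores    = as.vector(scores_z),                              # column-major flattening
  stringsAsFactors = FALSE
)
scores_z_long$cell_state <- unname(cell_state[scores_z_long$sample_id])
print(scores_z_long)
```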


