Model training and prediction results
Early analysis using trendfilter and evaluating model performance
Investigate the properties and quality of the cell time labels derived from GFP and RFP: the proportion of variance explained (PVE) in intensities by cell time; comparison of PVEs by cell times from DAPI and FUCCI vs. from FUCCI alone; comparison of predictions before and after removing PC outliers (slightly better after removal); and comparison across different sets of genes.
Investigate the properties of Seurat classes and cell time
- Noisy label analysis
- Analyze the training datasets
- Analyze the withheld samples
- Evaluate the withheld samples including noisy labels on the top 10 and top 101 cyclical genes, and compare peco methods with Seurat predictions.
- Evaluate the withheld samples after removing noisy labels on the top 10 and top 101 cyclical genes, and compare peco methods with Seurat predictions.
- Finalizing training for predicting cell times for cells from individuals
- Compute and compare cell time estimates derived under two different assumptions: assuming equal variance in PC1 and PC2 (circle) vs. unequal variances in PC1 and PC2.
- Results assuming equal variance, considering when
- Cell times derived from FUCCI only: cyclical genes selected, and gene annotations
- Cell times derived from FUCCI and DAPI: cyclical genes selected
- Comparisons: Prediction error is smaller when based on FUCCI time only than when based on FUCCI and DAPI time, though the difference is very small. The genes selected in the two analyses are similar, and both sets contain 37 genes in the Cell Cycle GO term (GO:0007049).
- Finalizing training for predicting cell times for one individual
- Results assuming equal variance, considering when
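The equal-variance (circle) case in the list above can be illustrated with a minimal sketch. It assumes that a cell's time is the angle of its centered (PC1, PC2) scores around the origin, and that prediction error is the shortest arc between predicted and reference angles. The function names and the toy scores are hypothetical and are not taken from the analysis code.

```python
import math

TWO_PI = 2 * math.pi

def cell_time_from_pcs(pc1, pc2):
    """Angle in [0, 2*pi) of the point (pc1, pc2) around the origin.

    Assumption: PC scores are already centered and have equal variance,
    so cells lie roughly on a circle.
    """
    return math.atan2(pc2, pc1) % TWO_PI

def circular_distance(a, b):
    """Shortest arc length between two angles, in [0, pi]."""
    d = abs(a - b) % TWO_PI
    return min(d, TWO_PI - d)

def mean_prediction_error(predicted, reference):
    """Average circular distance between paired angle estimates."""
    dists = [circular_distance(p, r) for p, r in zip(predicted, reference)]
    return sum(dists) / len(dists)

# Hypothetical centered PC scores for four cells.
pcs = [(1.0, 0.0), (0.0, 1.0), (-1.0, 0.0), (0.0, -1.0)]
times = [cell_time_from_pcs(x, y) for x, y in pcs]
```

Using the circular distance rather than a plain difference keeps cells near 0 and near 2π close to each other, which matters when comparing predicted and reference cell times.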
Approaches to fitting cyclical trend in gene expression data
Cell cycle signal in gene expression data
- We investigated cell cycle signals in the sequencing data alone.
- We then assigned categorical cell cycle labels and explored the expression profiles of these categories.
- We ordered cells on a circle using FUCCI intensities alone.
- We used nonparametric methods to identify genes that may be cyclical along cell cycle phases.
- We fit smash and kernel regression for circular variables on the subset of genes with detection rate > 0.8.
- We fit trendfilter on a subset of 5 genes that visually show a cyclical pattern. trendfilter is robust to a small proportion of undetected cells (approximately 2 to 3%). In simulations where the proportion of undetected cells was increased to 20%, the fitted expression trend was flat for genes previously identified as tending toward a cyclical pattern.
- Next, we fit trendfilter on all genes after transforming the data to follow a standard normal distribution. Permutation-based p-values for PVE were used to select 101 significant cyclical genes.
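The permutation-based selection described above can be sketched as follows. This is an illustration under stated assumptions, not the analysis code: a von Mises kernel smoother stands in for trendfilter, PVE is taken to be 1 - var(residuals) / var(expression), the standard-normal transformation is skipped, and all function names, the bandwidth `kappa`, and the simulated gene are hypothetical.

```python
import math
import random

def variance(x):
    """Population variance of a list of numbers."""
    m = sum(x) / len(x)
    return sum((xi - m) ** 2 for xi in x) / len(x)

def circular_kernel_fit(theta, y, kappa=4.0):
    """Nadaraya-Watson fit at each angle using a von Mises kernel."""
    fitted = []
    for t in theta:
        w = [math.exp(kappa * math.cos(t - s)) for s in theta]
        fitted.append(sum(wi * yi for wi, yi in zip(w, y)) / sum(w))
    return fitted

def pve(theta, y, kappa=4.0):
    """Assumed PVE definition: 1 - var(residuals) / var(y)."""
    fitted = circular_kernel_fit(theta, y, kappa)
    resid = [yi - fi for yi, fi in zip(y, fitted)]
    return 1.0 - variance(resid) / variance(y)

def permutation_pvalue(theta, y, n_perm=200, kappa=4.0, seed=0):
    """Compare observed PVE with PVEs after shuffling expression across cells."""
    rng = random.Random(seed)
    observed = pve(theta, y, kappa)
    shuffled = list(y)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if pve(theta, shuffled, kappa) >= observed:
            exceed += 1
    # Add-one correction keeps the p-value strictly positive.
    return observed, (exceed + 1) / (n_perm + 1)

# Simulated cyclical gene: sinusoidal trend over cell time plus Gaussian noise.
rng = random.Random(1)
theta = [rng.uniform(0, 2 * math.pi) for _ in range(60)]
cyclic = [math.sin(t) + rng.gauss(0, 0.3) for t in theta]
obs_pve, p_value = permutation_pvalue(theta, cyclic)
```

Shuffling expression values across cells breaks any link between expression and cell time while preserving the gene's marginal distribution, so the permuted PVEs approximate the null distribution for that gene.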
RNA-seq data preprocessing
- The first step in preprocessing RNA-seq data consists of QC and filtering.
- Sample QC and filtering
- Gene QC and filtering
- We then analyzed and corrected for batch effect due to C1 plate in the sequencing data
Microscopy image analysis
We evaluated and pre-processed the results of image analysis as follows:
- We visually inspected images detected to have no nucleus or more than one nucleus. Where the detected count was inconsistent with visual inspection, we corrected the number of nuclei.
- We applied background correction to the intensity measurements of GFP, RFP and DAPI based on the following analyses.
- We analyzed intensity variation across individuals and batches and considered approaches for removing batch effects in the data.
- We investigated the cell time estimates based on FUCCI intensities.
This R Markdown site was created with workflowr