Last updated: 2025-02-27

Checks: 7 0

Knit directory: Recovery_5FU/

This reproducible R Markdown analysis was created with workflowr (version 1.7.1). The Checks tab describes the reproducibility checks that were applied when the results were created. The Past versions tab lists the development history.


Great! Since the R Markdown file has been committed to the Git repository, you know the exact version of the code that produced these results.

Great job! The global environment was empty. Objects defined in the global environment can affect the analysis in your R Markdown file in unknown ways. For reproduciblity it’s best to always run the code in an empty environment.

The command set.seed(20250217) was run prior to running the code in the R Markdown file. Setting a seed ensures that any results that rely on randomness, e.g. subsampling or permutations, are reproducible.

Great job! Recording the operating system, R version, and package versions is critical for reproducibility.

Nice! There were no cached chunks for this analysis, so you can be confident that you successfully produced the results during this run.

Great job! Using relative paths to the files within your workflowr project makes it easier to run your code on other machines.

Great! You are using Git for version control. Tracking code development and connecting the code version to the results is critical for reproducibility.

The results in this page were generated with repository version e77e0d4. See the Past versions tab to see a history of the changes made to the R Markdown and HTML files.

Note that you need to be careful to ensure that all relevant files for the analysis have been committed to Git prior to generating the results (you can use wflow_publish or wflow_git_commit). workflowr only checks the R Markdown file, but you know if there are other scripts or data files that it depends on. Below is the status of the Git repository when the results were generated:


Ignored files:
    Ignored:    .Rhistory
    Ignored:    .Rproj.user/

Untracked files:
    Untracked:  code/Recovery_RNAseq_Analysis_EMP_250210.Rmd

Unstaged changes:
    Deleted:    analysis/Recovery_RNAseq_Analysis_EMP_250210.Rmd

Note that any generated files, e.g. HTML, png, CSS, etc., are not included in this status report because it is ok for generated content to have uncommitted changes.


These are the previous versions of the repository in which changes were made to the R Markdown (analysis/Recovery_DE.Rmd) and HTML (docs/Recovery_DE.html) files. If you’ve configured a remote Git repository (see ?wflow_git_remote), click on the hyperlinks in the table below to view the files as they were in that past version.

File Version Author Date Message
Rmd e77e0d4 emmapfort 2025-02-27 Make website cohesive and dataframes concise.

Welcome to my research website.

── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.5
✔ forcats   1.0.0     ✔ stringr   1.5.1
✔ lubridate 1.9.3     ✔ tidyr     1.3.1
✔ purrr     1.0.2     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
Loading required package: limma

Loading required package: ggrepel


Attaching package: 'PCAtools'


The following objects are masked from 'package:stats':

    biplot, screeplot


Loading required package: affy

Loading required package: BiocGenerics


Attaching package: 'BiocGenerics'


The following object is masked from 'package:limma':

    plotMA


The following objects are masked from 'package:lubridate':

    intersect, setdiff, union


The following objects are masked from 'package:dplyr':

    combine, intersect, setdiff, union


The following objects are masked from 'package:stats':

    IQR, mad, sd, var, xtabs


The following objects are masked from 'package:base':

    anyDuplicated, aperm, append, as.data.frame, basename, cbind,
    colnames, dirname, do.call, duplicated, eval, evalq, Filter, Find,
    get, grep, grepl, intersect, is.unsorted, lapply, Map, mapply,
    match, mget, order, paste, pmax, pmax.int, pmin, pmin.int,
    Position, rank, rbind, Reduce, rownames, sapply, saveRDS, setdiff,
    table, tapply, union, unique, unsplit, which.max, which.min


Loading required package: Biobase

Welcome to Bioconductor

    Vignettes contain introductory material; view with
    'browseVignettes()'. To cite Bioconductor, see
    'citation("Biobase")', and for packages 'citation("pkgname")'.



Attaching package: 'affy'


The following object is masked from 'package:lubridate':

    pm


Loading required package: EDASeq

Loading required package: ShortRead

Loading required package: BiocParallel

Loading required package: Biostrings

Loading required package: S4Vectors

Loading required package: stats4


Attaching package: 'S4Vectors'


The following objects are masked from 'package:lubridate':

    second, second<-


The following objects are masked from 'package:dplyr':

    first, rename


The following object is masked from 'package:tidyr':

    expand


The following object is masked from 'package:utils':

    findMatches


The following objects are masked from 'package:base':

    expand.grid, I, unname


Loading required package: IRanges


Attaching package: 'IRanges'


The following object is masked from 'package:lubridate':

    %within%


The following objects are masked from 'package:dplyr':

    collapse, desc, slice


The following object is masked from 'package:purrr':

    reduce


The following object is masked from 'package:grDevices':

    windows


Loading required package: XVector


Attaching package: 'XVector'


The following object is masked from 'package:purrr':

    compact


Loading required package: GenomeInfoDb


Attaching package: 'Biostrings'


The following object is masked from 'package:base':

    strsplit


Loading required package: Rsamtools

Loading required package: GenomicRanges

Loading required package: GenomicAlignments

Loading required package: SummarizedExperiment

Loading required package: MatrixGenerics

Loading required package: matrixStats


Attaching package: 'matrixStats'


The following objects are masked from 'package:Biobase':

    anyMissing, rowMedians


The following object is masked from 'package:dplyr':

    count



Attaching package: 'MatrixGenerics'


The following objects are masked from 'package:matrixStats':

    colAlls, colAnyNAs, colAnys, colAvgsPerRowSet, colCollapse,
    colCounts, colCummaxs, colCummins, colCumprods, colCumsums,
    colDiffs, colIQRDiffs, colIQRs, colLogSumExps, colMadDiffs,
    colMads, colMaxs, colMeans2, colMedians, colMins, colOrderStats,
    colProds, colQuantiles, colRanges, colRanks, colSdDiffs, colSds,
    colSums2, colTabulates, colVarDiffs, colVars, colWeightedMads,
    colWeightedMeans, colWeightedMedians, colWeightedSds,
    colWeightedVars, rowAlls, rowAnyNAs, rowAnys, rowAvgsPerColSet,
    rowCollapse, rowCounts, rowCummaxs, rowCummins, rowCumprods,
    rowCumsums, rowDiffs, rowIQRDiffs, rowIQRs, rowLogSumExps,
    rowMadDiffs, rowMads, rowMaxs, rowMeans2, rowMedians, rowMins,
    rowOrderStats, rowProds, rowQuantiles, rowRanges, rowRanks,
    rowSdDiffs, rowSds, rowSums2, rowTabulates, rowVarDiffs, rowVars,
    rowWeightedMads, rowWeightedMeans, rowWeightedMedians,
    rowWeightedSds, rowWeightedVars


The following object is masked from 'package:Biobase':

    rowMedians



Attaching package: 'GenomicAlignments'


The following object is masked from 'package:dplyr':

    last



Attaching package: 'ShortRead'


The following object is masked from 'package:affy':

    intensity


The following object is masked from 'package:dplyr':

    id


The following object is masked from 'package:purrr':

    compose


The following object is masked from 'package:tibble':

    view

These samples (63 in total) have been aligned to the Ensembl 113 release of GRCh38 via subread, and counts have been generated using featureCounts with inbuilt hg38 genome

Data is located under 5FU Project on my computer All analysis saved in RDirectory Recovery_RNAseq

Individual 1 - 84-1 (9 samples)

Individual 2 - 87-1 (9 samples)

Individual 3 - 75-1 (9 samples)

Individual 4 - 75-1 (9 samples)

Individual 5 - 17-3 (9 samples)

Individual 6 - 90-1 (9 samples)

Individual 7 - 90-1 REP (9 samples)

Now that I have imported all of these counts files, I will make a matrix with all of the samples

First I’ll break them up by individual, and then combine them all together

Now that I’ve made individual matrices for each individual - I will combine them together into one large matrix

#Now that I've put together my matrix - I will do some QC checks

#now that I have all of my factors and my gene info, I will make some graphs to show the QC data

reads_by_sample <- c(tx_col)
fC_AllCounts %>% 
  ggplot(., aes (x = Conditions, y = Total_Align, fill = Treatment, group_by = Line))+
  geom_col()+
 geom_hline(aes(yintercept=20000000))+
 scale_fill_manual(values=reads_by_sample)+
  ggtitle(expression("Total number of reads by sample"))+
  xlab("")+
  ylab(expression("RNA-sequencing reads"))+
  theme_bw()+
  theme(plot.title = element_text(size = rel(2), hjust = 0.5),
        axis.title = element_text(size = 15, color = "black"),
        axis.ticks = element_line(linewidth = 1.5),
        axis.line = element_line(linewidth = 1.5),
        axis.text.y = element_text(size =10, color = "black", angle = 0, hjust = 0.8, vjust = 0.5),
        axis.text.x = element_text(size =10, color = "black", angle = 90, hjust = 1, vjust = 0.2),
        #strip.text.x = element_text(size = 15, color = "black", face = "bold"),
        strip.text.y = element_text(color = "white"))



####Read Counts by Treatment####
fC_AllCounts %>% 
  ggplot(., aes (x =Treatment, y= Total_Align, fill = Treatment))+
  geom_boxplot()+
 scale_fill_manual(values=reads_by_sample)+
  ggtitle(expression("Total number of reads by treatment"))+
  xlab("")+
  ylab(expression("RNA-sequencing reads"))+
  theme_bw()+
  
  theme(plot.title = element_text(size = rel(2), hjust = 0.5),
        axis.title = element_text(size = 15, color = "black"),
        axis.ticks = element_line(linewidth = 1.5),
        axis.line = element_line(linewidth = 1.5),
        axis.text.y = element_text(size =10, color = "black", angle = 0, hjust = 0.8, vjust = 0.5),
        axis.text.x = element_text(size =10, color = "black", angle = 90, hjust = 1, vjust = 0.2),
        #strip.text.x = element_text(size = 15, color = "black", face = "bold"),
        strip.text.y = element_text(color = "white"))


####Total Reads Per Individual####
fC_AllCounts %>% 
  ggplot(., aes (x =as.factor(Line), y=Total_Align))+
  geom_boxplot(aes(fill=as.factor(Line)))+
 scale_fill_brewer(palette = "Dark2", name = "Individual")+
  ggtitle(expression("Total number of reads by individual"))+
  xlab("")+
  ylab(expression("RNA-sequencing reads"))+
  theme_bw()+

  theme(plot.title = element_text(size = rel(2), hjust = 0.5),
        axis.title = element_text(size = 15, color = "black"),
        axis.ticks = element_line(linewidth = 1.5),
        axis.line = element_line(linewidth = 1.5),
        axis.text.y = element_text(size =10, color = "black", angle = 0, hjust = 0.8, vjust = 0.5),
        axis.text.x = element_text(size =10, color = "black", angle = 0, hjust = 1),
        #strip.text.x = element_text(size = 15, color = "black", face = "bold"),
        strip.text.y = element_text(color = "white"))


####Total Mapped Reads Per Drug####

reads_by_sample <- c(tx_col)
fC_AllCounts %>% 
  ggplot(., aes (x = Conditions, y = Assigned_Align, fill = Treatment, group_by = Line))+
  geom_col()+
 geom_hline(aes(yintercept=20000000))+
 scale_fill_manual(values=reads_by_sample)+
  ggtitle(expression("Total number of mapped reads by sample"))+
  xlab("")+
  ylab(expression("RNA-sequencing reads"))+
  theme_bw()+
  theme(plot.title = element_text(size = rel(2), hjust = 0.5),
        axis.title = element_text(size = 15, color = "black"),
        axis.ticks = element_line(linewidth = 1.5),
        axis.line = element_line(linewidth = 1.5),
        axis.text.y = element_text(size =10, color = "black", angle = 0, hjust = 0.8, vjust = 0.5),
        axis.text.x = element_text(size =10, color = "black", angle = 90, hjust = 1, vjust = 0.2),
        #strip.text.x = element_text(size = 15, color = "black", face = "bold"),
        strip.text.y = element_text(color = "white"))



####Read Counts by Treatment####
fC_AllCounts %>% 
  ggplot(., aes (x =Treatment, y= Assigned_Align, fill = Treatment))+
  geom_boxplot()+
 scale_fill_manual(values=reads_by_sample)+
  ggtitle(expression("Total number of mapped reads by treatment"))+
  xlab("")+
  ylab(expression("RNA-sequencing reads"))+
  theme_bw()+
  
  theme(plot.title = element_text(size = rel(2), hjust = 0.5),
        axis.title = element_text(size = 15, color = "black"),
        axis.ticks = element_line(linewidth = 1.5),
        axis.line = element_line(linewidth = 1.5),
        axis.text.y = element_text(size =10, color = "black", angle = 0, hjust = 0.8, vjust = 0.5),
        axis.text.x = element_text(size =10, color = "black", angle = 90, hjust = 1, vjust = 0.2),
        #strip.text.x = element_text(size = 15, color = "black", face = "bold"),
        strip.text.y = element_text(color = "white"))


####Total Reads Per Individual####
fC_AllCounts %>% 
  ggplot(., aes (x =as.factor(Line), y=Assigned_Align))+
  geom_boxplot(aes(fill=as.factor(Line)))+
 scale_fill_brewer(palette = "Dark2", name = "Individual")+
  ggtitle(expression("Total number of mapped reads by individual"))+
  xlab("")+
  ylab(expression("RNA-sequencing reads"))+
  theme_bw()+

  theme(plot.title = element_text(size = rel(2), hjust = 0.5),
        axis.title = element_text(size = 15, color = "black"),
        axis.ticks = element_line(linewidth = 1.5),
        axis.line = element_line(linewidth = 1.5),
        axis.text.y = element_text(size =10, color = "black", angle = 0, hjust = 0.8, vjust = 0.5),
        axis.text.x = element_text(size =10, color = "black", angle = 0, hjust = 1),
        #strip.text.x = element_text(size = 15, color = "black", face = "bold"),
        strip.text.y = element_text(color = "white"))

Now that I’ve made the final full matrix of all samples - 1. convert to log2cpm using cpm 2. filter for lowly expressed reads (rowMeans > 0)

boxplot(fC_Matrix_Full_cpm, xlab = "Conditions", ylab = expression("log"[2]* "counts per million"), ylim= c(-10, 20))

hist(fC_Matrix_Full_cpm, xlab = expression("log" [2]* "counts per million"))

#write this to a csv just in case to save it
#write.csv(fC_Matrix_Full_cpm, "C:/Users/emmap/RDirectory/Recovery_RNAseq/Recovery_RNAseq/featureCounts_Concat_Matrix_AllSamples_cpm_EMP_250210.csv")

#first check the number of variables in the cpm file
dim(fC_Matrix_Full_cpm)
[1] 28395    63
#[1] 28395    63

#now use rowMeans to filter out values with mean < 0 (this means rowMeans > 0 is what I'm left with) for each row/gene in the original file
rowMeans_All <- rowMeans(fC_Matrix_Full_cpm)
fC_Matrix_Full_cpm_filter <- fC_Matrix_Full_cpm[rowMeans_All > 0,]
dim(fC_Matrix_Full_cpm_filter)
[1] 14225    63
#[1] 14225    63

#with this I've filtered out a significant number of rows when filtering after cpm conversion - does this make it more balanced?

hist(fC_Matrix_Full_cpm_filter, 
     main = "Recovery All Individuals log2cpm Filtered", 
        xlab = expression("log"[2]*"counts per million"))

boxplot(fC_Matrix_Full_cpm_filter, 
        main = "Recovery All Individuals log2cpm Filtered", 
        xlab = "Conditions", 
        ylab = expression("log"[2]*"counts per million"),
        ylim = c(-10,15))


#write this to a csv to save
#write.csv(fC_Matrix_Full_cpm_filter, "C:/Users/emmap/RDirectory/Recovery_RNAseq/Recovery_RNAseq/featureCounts_Concat_Matrix_AllSamples_cpm_filtered_EMP_250210.csv")
####histogram of total counts, unfiltered####
hist(fC_Matrix_Full_cpm, main = "Recovery All Individuals log2cpm Unfiltered",
     xlab = expression("log" [2]* "counts per million"))

####histogram of total counts, filtered####
hist(fC_Matrix_Full_cpm_filter, 
     main = "Recovery All Individuals log2cpm Filtered rowMeans > 0", 
        xlab = expression("log"[2]*"counts per million"))

####boxplot log2cpm all samples####

boxplot(fC_Matrix_Full_cpm, xlab = "Conditions", ylab = expression("log"[2]* "counts per million"), ylim= c(-10, 20), main = "Boxplots of log2cpm per sample")

####boxplot log2cpm filtered all samples####

boxplot(fC_Matrix_Full_cpm_filter, xlab = "Conditions", ylab = expression("log"[2]* "counts per million"), ylim= c(-10, 20), main = "Boxplots of log2cpm Filtered rowMeans > 0 per sample")

Now that I’ve filtered the samples after log2cpm conversion, I can start to make heatmaps and PCA plots to look at my data

After trying and failing to get all three factors on one PCA plot - I will make iterations of each of these with two factors each

Now, I want to update these PCA plots to have treatment and time as the primary variables Individual will be a fill? Or somehow get some numbers labelled on the points.

PCA_data <- fC_Matrix_Full_cpm_filter %>% 
  prcomp(.) %>% 
  t()

PCA_data_test <- (prcomp(t(fC_Matrix_Full_cpm_filter), scale. = TRUE))

####now make an annotation for my PCA####
annot <- data.frame("samples" = colnames(fC_Matrix_Full_cpm_filter)) %>% separate_wider_delim(., cols = samples, names = c("Tx", "Time", "Ind"), delim = "_", cols_remove = FALSE) %>% unite(., col = "Tx_Time", Tx, Time, sep = "_", remove = FALSE)

#combine the prcomp matrix and annotation
annot_PCA_matrix <- PCA_data_test$x %>% cbind(., annot)

#now I can make a graph where I have filled values for individual! I have seven colors for seven individuals
# I have three fill values for three timepoints

#using annotation matrix above as well as annotated PCA matrix (annot_PCA_matrix)

#extra info for colors in the graph (fill parameter)
fill_col_ind <- c("#66C2A5", "#FC8D62", "#1F78B4", "#E78AC3", "#A6D854", "#FFD92A", "#8B3E9B")

fill_col_ind_dark <- c("#003F5C", "#45AE91",  "#58508D", "#BC4099", "#8B3E9B", "#FF6361", "#FF2362")

fill_col_tx <- c("#63666D", "#DCACED", "#499FBD")

#Now I have to switch the fill and color parameters as it's doing the opposite of what I wanted: need individuals to be the outline color (color), and tx to be the inside color (fill)

####PC1/PC2####
annot_PCA_matrix %>% ggplot(., aes(x=PC1, y=PC2, size = 10)) +
  geom_point(aes(color = Ind, shape = Time, fill = Tx)) +
                       scale_shape_manual(values = c(21, 22, 24)) +
  scale_fill_manual(values =  fill_col_tx)+
  scale_color_manual(values = c(fill_col_ind_dark))+
  guides(fill=guide_legend(override.aes=list(shape=21)))+
  guides(color=guide_legend(override.aes=list(shape=21)))+
  guides(fill=guide_legend(override.aes=list(shape=21,fill=fill_col_tx,color=fill_col_tx)))


####PC2/PC3####
annot_PCA_matrix %>% ggplot(., aes(x=PC2, y=PC3)) +
  geom_point(aes(color = Ind, shape = Time, fill = Tx, size = 10)) +
                       scale_shape_manual(values = c(21, 22, 24)) +
  scale_fill_manual(values =  fill_col_tx)+
  scale_color_manual(values = c(fill_col_ind_dark))+
  guides(fill=guide_legend(override.aes=list(shape=21)))+
  guides(color=guide_legend(override.aes=list(shape=21)))+
  guides(fill=guide_legend(override.aes=list(shape=21,fill=fill_col_tx,color=fill_col_tx)))


####PC3/PC4####
annot_PCA_matrix %>% ggplot(., aes(x=PC3, y=PC4)) +
  geom_point(aes(color = Ind, shape = Time, fill = Tx, size = 10)) +
                       scale_shape_manual(values = c(21, 22, 24)) +
  scale_fill_manual(values =  fill_col_tx)+
  scale_color_manual(values = c(fill_col_ind_dark))+
  guides(fill=guide_legend(override.aes=list(shape=21)))+
  guides(color=guide_legend(override.aes=list(shape=21)))+
  guides(fill=guide_legend(override.aes=list(shape=21,fill=fill_col_tx,color=fill_col_tx)))


#correlation heatmap for all samples

####PEARSON FILTERED####
fC_Matrix_Full_cpm_filter_pearsoncor <-
  cor(
    fC_Matrix_Full_cpm_filter,
    y = NULL,
    use = "everything",
    method = "pearson"
  )

####SPEARMAN FILTERED####
fC_Matrix_Full_cpm_filter_spearmancor <-
  cor(
    fC_Matrix_Full_cpm_filter,
    y = NULL,
    use = "everything",
    method = "spearman"
  )



#Now let's graph these correlations

pheatmap(fC_Matrix_Full_cpm_filter_pearsoncor, border_color = "black", legend = TRUE, angle_col = 90, display_numbers = FALSE, number_color = "black", fontsize = 12, fontsize_number = 9)


pheatmap(fC_Matrix_Full_cpm_filter_spearmancor, border_color = "black", legend = TRUE, angle_col = 90, display_numbers = FALSE, number_color = "black", fontsize = 12, fontsize_number = 9)




#now annotate these heatmaps
#Factor 1 - Individual
Individual <- as.factor(c(rep("Ind1", 9), rep("Ind2", 9), rep("Ind3", 9), rep("Ind4", 9), rep("Ind5", 9), rep("Ind6", 9), rep("Ind6REP", 9)))

#Factor 2 - Treatment
tx_factor <- c("DOX", "FLUO", "DMSO")
Tx <- as.factor(c(rep(tx_factor, 21)))
#view(Treatment)

#Factor 3 - Timepoint
time_factor <- c(rep("24", 3), rep("24rec", 3), rep("144rec", 3))
Time <- as.factor(c(rep(time_factor, 7)))

####annotation for colors####
annot_col_hm = list(Tx = c(DOX = "red", FLUO = "blue", DMSO = "black"),
                             Ind = c(Ind1 = "#66C2A5", Ind2 = "#FC8D62", Ind3 = "#1F78B4", Ind4 = "#E78AC3", Ind5 = "#A6D854", Ind6 = "#FFD92A", Ind6REP = "#8B3E9B"),
                             Time = c("24" = "#046A38", "24rec" = "#0050B5", "144rec" = "#B725AD"))

####annotation for values####
annot_list_hm <- data.frame(Individual = as.factor(c(rep("Ind1", 9), rep("Ind2", 9), rep("Ind3", 9), rep("Ind4", 9), rep("Ind5", 9), rep("Ind6", 9), rep("Ind6REP", 9))),
                                               Tx = as.factor(c(rep(tx_factor, 21))), 
                                               Time = as.factor(c(rep(time_factor, 7))))

  
##add in the annotations from above into the dataframe
row.names(annot_list_hm) <- colnames(fC_Matrix_Full_cpm_filter_pearsoncor)
row.names(annot_list_hm) <- colnames(fC_Matrix_Full_cpm_filter_spearmancor)

####ANNOTATED HEATMAPS####
pheatmap(fC_Matrix_Full_cpm_filter_pearsoncor, border_color = "black", legend = TRUE, angle_col = 90, display_numbers = FALSE, number_color = "black", fontsize = 10, fontsize_number = 5, annotation_col = annot_list_hm, annotation_colors = annot_col_hm)


pheatmap(fC_Matrix_Full_cpm_filter_spearmancor, border_color = "black", legend = TRUE, angle_col = 90, display_numbers = FALSE, number_color = "black", fontsize = 10, fontsize_number = 5, annotation_col = annot_list_hm, annotation_colors = annot_col_hm)


# group <- c(rep(c(1, 2, 3, 4, 5, 6, 7, 8, 9), 6))
# group <- factor(group, levels = c("1", "2", "3", "4", "5", "6", "7", "8", "9"))


####rowMeans > 0 Filtering####
rowMeans_DE <- rowMeans(counts_DE)
counts_DE_filter <- counts_DE[rowMeans_DE > 0,]
dim(counts_DE_filter)
[1] 26445    54

group_1 <- rep(c("DOX_24","FLUO_24",
                                "DMSO_24",
                                "DOX_24rec",
                                "FLUO_24rec",
                                "DMSO_24rec",
                                "DOX_144rec",
                                "FLUO_144rec",
                                "DMSO_144rec"), 6)

dge <- DGEList.data.frame(counts = counts_DE_filter, group = group_1, genes = row.names(counts_DE_filter))

#this is the inital file before norm factors are calculated
dge$samples
                       group lib.size norm.factors
DOX_24_Ind1           DOX_24 23533985            1
FLUO_24_Ind1         FLUO_24 26970695            1
DMSO_24_Ind1         DMSO_24 22892686            1
DOX_24rec_Ind1     DOX_24rec 23932153            1
FLUO_24rec_Ind1   FLUO_24rec 21879931            1
DMSO_24rec_Ind1   DMSO_24rec 21337454            1
DOX_144rec_Ind1   DOX_144rec 18260440            1
FLUO_144rec_Ind1 FLUO_144rec 18780865            1
DMSO_144rec_Ind1 DMSO_144rec 28167799            1
DOX_24_Ind2           DOX_24 20017534            1
FLUO_24_Ind2         FLUO_24 23930827            1
DMSO_24_Ind2         DMSO_24 21334270            1
DOX_24rec_Ind2     DOX_24rec 25714988            1
FLUO_24rec_Ind2   FLUO_24rec 25160237            1
DMSO_24rec_Ind2   DMSO_24rec 26356290            1
DOX_144rec_Ind2   DOX_144rec 23497279            1
FLUO_144rec_Ind2 FLUO_144rec 24769290            1
DMSO_144rec_Ind2 DMSO_144rec 25878841            1
DOX_24_Ind3           DOX_24 23195184            1
FLUO_24_Ind3         FLUO_24 25146622            1
DMSO_24_Ind3         DMSO_24 25648723            1
DOX_24rec_Ind3     DOX_24rec 18250610            1
FLUO_24rec_Ind3   FLUO_24rec 23122465            1
DMSO_24rec_Ind3   DMSO_24rec 24364234            1
DOX_144rec_Ind3   DOX_144rec 19801072            1
FLUO_144rec_Ind3 FLUO_144rec 29845161            1
DMSO_144rec_Ind3 DMSO_144rec 22672688            1
DOX_24_Ind4           DOX_24 20726954            1
FLUO_24_Ind4         FLUO_24 23704697            1
DMSO_24_Ind4         DMSO_24 28213836            1
DOX_24rec_Ind4     DOX_24rec 25911782            1
FLUO_24rec_Ind4   FLUO_24rec 40622129            1
DMSO_24rec_Ind4   DMSO_24rec 26124863            1
DOX_144rec_Ind4   DOX_144rec 24701727            1
FLUO_144rec_Ind4 FLUO_144rec 24610641            1
DMSO_144rec_Ind4 DMSO_144rec 25455999            1
DOX_24_Ind5           DOX_24 22697304            1
FLUO_24_Ind5         FLUO_24 27856508            1
DMSO_24_Ind5         DMSO_24 24639151            1
DOX_24rec_Ind5     DOX_24rec 24885980            1
FLUO_24rec_Ind5   FLUO_24rec 29665028            1
DMSO_24rec_Ind5   DMSO_24rec 26036618            1
DOX_144rec_Ind5   DOX_144rec 27513167            1
FLUO_144rec_Ind5 FLUO_144rec 23993986            1
DMSO_144rec_Ind5 DMSO_144rec 24942365            1
DOX_24_Ind6           DOX_24 25308109            1
FLUO_24_Ind6         FLUO_24 25269296            1
DMSO_24_Ind6         DMSO_24 25666994            1
DOX_24rec_Ind6     DOX_24rec 26209951            1
FLUO_24rec_Ind6   FLUO_24rec 25079566            1
DMSO_24rec_Ind6   DMSO_24rec 24465515            1
DOX_144rec_Ind6   DOX_144rec 23359517            1
FLUO_144rec_Ind6 FLUO_144rec 22471386            1
DMSO_144rec_Ind6 DMSO_144rec 25460267            1

#calculate the normalization factors with method TMM
dge_calc <- calcNormFactors(dge, method = "TMM")

#final file after norm calculation
dge_calc$samples
                       group lib.size norm.factors
DOX_24_Ind1           DOX_24 23533985    1.0837997
FLUO_24_Ind1         FLUO_24 26970695    0.9310699
DMSO_24_Ind1         DMSO_24 22892686    0.9611153
DOX_24rec_Ind1     DOX_24rec 23932153    1.2332319
FLUO_24rec_Ind1   FLUO_24rec 21879931    0.9480185
DMSO_24rec_Ind1   DMSO_24rec 21337454    0.9725610
DOX_144rec_Ind1   DOX_144rec 18260440    1.0159884
FLUO_144rec_Ind1 FLUO_144rec 18780865    0.9797092
DMSO_144rec_Ind1 DMSO_144rec 28167799    0.9690593
DOX_24_Ind2           DOX_24 20017534    1.1031490
FLUO_24_Ind2         FLUO_24 23930827    0.9960906
DMSO_24_Ind2         DMSO_24 21334270    0.9579222
DOX_24rec_Ind2     DOX_24rec 25714988    1.0897455
FLUO_24rec_Ind2   FLUO_24rec 25160237    0.9709110
DMSO_24rec_Ind2   DMSO_24rec 26356290    0.9632263
DOX_144rec_Ind2   DOX_144rec 23497279    0.8853001
FLUO_144rec_Ind2 FLUO_144rec 24769290    0.9818596
DMSO_144rec_Ind2 DMSO_144rec 25878841    0.9747532
DOX_24_Ind3           DOX_24 23195184    0.8512838
FLUO_24_Ind3         FLUO_24 25146622    0.9755381
DMSO_24_Ind3         DMSO_24 25648723    0.9934558
DOX_24rec_Ind3     DOX_24rec 18250610    1.3034379
FLUO_24rec_Ind3   FLUO_24rec 23122465    0.9351582
DMSO_24rec_Ind3   DMSO_24rec 24364234    0.9693937
DOX_144rec_Ind3   DOX_144rec 19801072    1.0246641
FLUO_144rec_Ind3 FLUO_144rec 29845161    0.9803051
DMSO_144rec_Ind3 DMSO_144rec 22672688    0.9806025
DOX_24_Ind4           DOX_24 20726954    1.1649074
FLUO_24_Ind4         FLUO_24 23704697    0.9849905
DMSO_24_Ind4         DMSO_24 28213836    0.9912683
DOX_24rec_Ind4     DOX_24rec 25911782    1.1791631
FLUO_24rec_Ind4   FLUO_24rec 40622129    0.9751981
DMSO_24rec_Ind4   DMSO_24rec 26124863    0.9981792
DOX_144rec_Ind4   DOX_144rec 24701727    0.9176890
FLUO_144rec_Ind4 FLUO_144rec 24610641    0.9816716
DMSO_144rec_Ind4 DMSO_144rec 25455999    0.9636547
DOX_24_Ind5           DOX_24 22697304    1.0773544
FLUO_24_Ind5         FLUO_24 27856508    0.9397778
DMSO_24_Ind5         DMSO_24 24639151    0.9538894
DOX_24rec_Ind5     DOX_24rec 24885980    1.2233210
FLUO_24rec_Ind5   FLUO_24rec 29665028    0.9193648
DMSO_24rec_Ind5   DMSO_24rec 26036618    0.9527365
DOX_144rec_Ind5   DOX_144rec 27513167    0.9469333
FLUO_144rec_Ind5 FLUO_144rec 23993986    0.9381287
DMSO_144rec_Ind5 DMSO_144rec 24942365    0.9285333
DOX_24_Ind6           DOX_24 25308109    1.0981454
FLUO_24_Ind6         FLUO_24 25269296    0.9948090
DMSO_24_Ind6         DMSO_24 25666994    1.0159532
DOX_24rec_Ind6     DOX_24rec 26209951    1.1400925
FLUO_24rec_Ind6   FLUO_24rec 25079566    0.9794435
DMSO_24rec_Ind6   DMSO_24rec 24465515    0.9860363
DOX_144rec_Ind6   DOX_144rec 23359517    0.9397642
FLUO_144rec_Ind6 FLUO_144rec 22471386    0.9892227
DMSO_144rec_Ind6 DMSO_144rec 25460267    0.9800208

#View(dge_calc)

#Pull out factors
snames <- data.frame("samples" = colnames(dge_calc)) %>% separate_wider_delim(., cols = samples, names = c("Treatment", "Time", "Individual"), delim = "_", cols_remove = FALSE)

#snames_list <- as.list(snames)

snames_time <- snames$Time
snames_tx <- snames$Treatment
snames_ind <- snames$Individual

#Create my model matrix
mm_r <- model.matrix(~0 + group_1)

p <- voom(dge_calc$counts, mm_r, plot = TRUE)


corfit <- duplicateCorrelation(p, mm_r, block = snames_ind)

v <- voom(dge_calc$counts, mm_r, block = snames_ind, correlation = corfit$consensus)

fit <- lmFit(v, mm_r, block = snames_ind, correlation = corfit$consensus)

#make sure to check which order the columns are in - otherwise they won't match right (it was moved into alphabetical and number order)
colnames(mm_r) <- c("DMSO_144rec","DMSO_24","DMSO_24rec","DOX_144rec","DOX_24","DOX_24rec","FLUO_144rec","FLUO_24","FLUO_24rec")

cm_r <- makeContrasts(
        V.D24 = DOX_24 - DMSO_24,
        V.F24 = FLUO_24 - DMSO_24,
        V.D24r = DOX_24rec - DMSO_24rec,
        V.F24r = FLUO_24rec - DMSO_24rec,
        V.D144r = DOX_144rec - DMSO_144rec,
        V.F144r = FLUO_144rec - DMSO_144rec,
        levels = mm_r
)

vfit_r <- lmFit(p, mm_r)
vfit_r <- contrasts.fit(vfit_r, contrasts = cm_r)

efit2 <- eBayes(vfit_r)

results = decideTests(efit2)
summary(results)
       V.D24 V.F24 V.D24r V.F24r V.D144r V.F144r
Down    4567     0   2083      0     288       0
NotSig 14521 26445  16339  26445   25296   26445
Up      7357     0   8023      0     861       0
#inbuilt annotation results rowMeans >0
#         V.D24 V.F24 V.D24r V.F24r V.D144r V.F144r
# Down    4567     0   2083      0     288       0
# NotSig 14521 26445  16339  26445   25296   26445
# Up      7357     0   8023      0     861       0

## have no DE genes detected for 5FU, only DOX


####plot your voom####
voom_plot <- voom(dge_calc, mm_r, plot = TRUE)

The voom plot for this with rowMeans > 0 has a dip in the front, meaning that there is likely some noise left over in my data. I am going to filter a bit stricter at rowMeans > 1 with the same process as above.


####rowMeans >1####
rowMeans_DE <- rowMeans(counts_DE)
counts_DE_filter1 <- counts_DE[rowMeans_DE > 1,]
dim(counts_DE_filter1)
[1] 20443    54


dge1 <- DGEList.data.frame(counts = counts_DE_filter1, group = group_1, genes = row.names(counts_DE_filter1))

#this is the inital file before norm factors are calculated
dge1$samples
                       group lib.size norm.factors
DOX_24_Ind1           DOX_24 23529436            1
FLUO_24_Ind1         FLUO_24 26969621            1
DMSO_24_Ind1         DMSO_24 22891830            1
DOX_24rec_Ind1     DOX_24rec 23928177            1
FLUO_24rec_Ind1   FLUO_24rec 21879021            1
DMSO_24rec_Ind1   DMSO_24rec 21336552            1
DOX_144rec_Ind1   DOX_144rec 18258678            1
FLUO_144rec_Ind1 FLUO_144rec 18779997            1
DMSO_144rec_Ind1 DMSO_144rec 28166609            1
DOX_24_Ind2           DOX_24 20015891            1
FLUO_24_Ind2         FLUO_24 23929971            1
DMSO_24_Ind2         DMSO_24 21333555            1
DOX_24rec_Ind2     DOX_24rec 25712271            1
FLUO_24rec_Ind2   FLUO_24rec 25159540            1
DMSO_24rec_Ind2   DMSO_24rec 26355579            1
DOX_144rec_Ind2   DOX_144rec 23495279            1
FLUO_144rec_Ind2 FLUO_144rec 24768504            1
DMSO_144rec_Ind2 DMSO_144rec 25877847            1
DOX_24_Ind3           DOX_24 23190452            1
FLUO_24_Ind3         FLUO_24 25145882            1
DMSO_24_Ind3         DMSO_24 25647796            1
DOX_24rec_Ind3     DOX_24rec 18243178            1
FLUO_24rec_Ind3   FLUO_24rec 23121928            1
DMSO_24rec_Ind3   DMSO_24rec 24363550            1
DOX_144rec_Ind3   DOX_144rec 19797829            1
FLUO_144rec_Ind3 FLUO_144rec 29844009            1
DMSO_144rec_Ind3 DMSO_144rec 22672047            1
DOX_24_Ind4           DOX_24 20724269            1
FLUO_24_Ind4         FLUO_24 23703826            1
DMSO_24_Ind4         DMSO_24 28212829            1
DOX_24rec_Ind4     DOX_24rec 25908531            1
FLUO_24rec_Ind4   FLUO_24rec 40620908            1
DMSO_24rec_Ind4   DMSO_24rec 26124015            1
DOX_144rec_Ind4   DOX_144rec 24699442            1
FLUO_144rec_Ind4 FLUO_144rec 24609834            1
DMSO_144rec_Ind4 DMSO_144rec 25455151            1
DOX_24_Ind5           DOX_24 22693201            1
FLUO_24_Ind5         FLUO_24 27855535            1
DMSO_24_Ind5         DMSO_24 24638257            1
DOX_24rec_Ind5     DOX_24rec 24881937            1
FLUO_24rec_Ind5   FLUO_24rec 29663662            1
DMSO_24rec_Ind5   DMSO_24rec 26035482            1
DOX_144rec_Ind5   DOX_144rec 27510410            1
FLUO_144rec_Ind5 FLUO_144rec 23992921            1
DMSO_144rec_Ind5 DMSO_144rec 24941416            1
DOX_24_Ind6           DOX_24 25305699            1
FLUO_24_Ind6         FLUO_24 25268538            1
DMSO_24_Ind6         DMSO_24 25666263            1
DOX_24rec_Ind6     DOX_24rec 26207393            1
FLUO_24rec_Ind6   FLUO_24rec 25078845            1
DMSO_24rec_Ind6   DMSO_24rec 24464720            1
DOX_144rec_Ind6   DOX_144rec 23357605            1
FLUO_144rec_Ind6 FLUO_144rec 22470700            1
DMSO_144rec_Ind6 DMSO_144rec 25459487            1

#calculate the normalization factors with method TMM
dge1_calc <- calcNormFactors(dge1, method = "TMM")

#final file after norm calculation
dge1_calc$samples
                       group lib.size norm.factors
DOX_24_Ind1           DOX_24 23529436    1.1010478
FLUO_24_Ind1         FLUO_24 26969621    0.9224026
DMSO_24_Ind1         DMSO_24 22891830    0.9523318
DOX_24rec_Ind1     DOX_24rec 23928177    1.2419707
FLUO_24rec_Ind1   FLUO_24rec 21879021    0.9414647
DMSO_24rec_Ind1   DMSO_24rec 21336552    0.9714352
DOX_144rec_Ind1   DOX_144rec 18258678    1.0061282
FLUO_144rec_Ind1 FLUO_144rec 18779997    0.9577701
DMSO_144rec_Ind1 DMSO_144rec 28166609    0.9633660
DOX_24_Ind2           DOX_24 20015891    1.1311547
FLUO_24_Ind2         FLUO_24 23929971    1.0037288
DMSO_24_Ind2         DMSO_24 21333555    0.9711101
DOX_24rec_Ind2     DOX_24rec 25712271    1.0919406
FLUO_24rec_Ind2   FLUO_24rec 25159540    0.9757042
DMSO_24rec_Ind2   DMSO_24rec 26355579    0.9694902
DOX_144rec_Ind2   DOX_144rec 23495279    0.8831165
FLUO_144rec_Ind2 FLUO_144rec 24768504    0.9802117
DMSO_144rec_Ind2 DMSO_144rec 25877847    0.9735050
DOX_24_Ind3           DOX_24 23190452    0.8944444
FLUO_24_Ind3         FLUO_24 25145882    0.9787822
DMSO_24_Ind3         DMSO_24 25647796    0.9998010
DOX_24rec_Ind3     DOX_24rec 18243178    1.3366353
FLUO_24rec_Ind3   FLUO_24rec 23121928    0.9387564
DMSO_24rec_Ind3   DMSO_24rec 24363550    0.9645367
DOX_144rec_Ind3   DOX_144rec 19797829    1.0329079
FLUO_144rec_Ind3 FLUO_144rec 29844009    0.9794210
DMSO_144rec_Ind3 DMSO_144rec 22672047    0.9717442
DOX_24_Ind4           DOX_24 20724269    1.2119035
FLUO_24_Ind4         FLUO_24 23703826    0.9874501
DMSO_24_Ind4         DMSO_24 28212829    0.9971983
DOX_24rec_Ind4     DOX_24rec 25908531    1.1839389
FLUO_24rec_Ind4   FLUO_24rec 40620908    0.9545701
DMSO_24rec_Ind4   DMSO_24rec 26124015    0.9903031
DOX_144rec_Ind4   DOX_144rec 24699442    0.9135958
FLUO_144rec_Ind4 FLUO_144rec 24609834    0.9800993
DMSO_144rec_Ind4 DMSO_144rec 25455151    0.9499049
DOX_24_Ind5           DOX_24 22693201    1.1040571
FLUO_24_Ind5         FLUO_24 27855535    0.9171950
DMSO_24_Ind5         DMSO_24 24638257    0.9316924
DOX_24rec_Ind5     DOX_24rec 24881937    1.2117783
FLUO_24rec_Ind5   FLUO_24rec 29663662    0.8980900
DMSO_24rec_Ind5   DMSO_24rec 26035482    0.9397879
DOX_144rec_Ind5   DOX_144rec 27510410    0.9273815
FLUO_144rec_Ind5 FLUO_144rec 23992921    0.9249558
DMSO_144rec_Ind5 DMSO_144rec 24941416    0.9177884
DOX_24_Ind6           DOX_24 25305699    1.1312672
FLUO_24_Ind6         FLUO_24 25268538    0.9892086
DMSO_24_Ind6         DMSO_24 25666263    1.0174650
DOX_24rec_Ind6     DOX_24rec 26207393    1.1422058
FLUO_24rec_Ind6   FLUO_24rec 25078845    0.9747427
DMSO_24rec_Ind6   DMSO_24rec 24464720    0.9836358
DOX_144rec_Ind6   DOX_144rec 23357605    0.9424696
FLUO_144rec_Ind6 FLUO_144rec 22470700    0.9862254
DMSO_144rec_Ind6 DMSO_144rec 25459487    0.9808584

#View(dge1_calc)

#Pull out factors
snames1 <- data.frame("samples" = colnames(dge1_calc)) %>% separate_wider_delim(., cols = samples, names = c("Treatment", "Time", "Individual"), delim = "_", cols_remove = FALSE)

#snames_list <- as.list(snames)

snames1_time <- snames1$Time
snames1_tx <- snames1$Treatment
snames1_ind <- snames1$Individual

#Create my model matrix
mm_r1 <- model.matrix(~0 + group_1)

p1 <- voom(dge1_calc$counts, mm_r1, plot = TRUE)


corfit1 <- duplicateCorrelation(p1, mm_r1, block = snames1_ind)

v1 <- voom(dge1_calc$counts, mm_r1, block = snames1_ind, correlation = corfit1$consensus)

fit1 <- lmFit(v1, mm_r1, block = snames1_ind, correlation = corfit1$consensus)

#make sure to check which order the columns are in - otherwise they won't match right (it was moved into alphabetical and number order)
colnames(mm_r1) <- c("DMSO_144rec","DMSO_24","DMSO_24rec","DOX_144rec","DOX_24","DOX_24rec","FLUO_144rec","FLUO_24","FLUO_24rec")

cm_r1 <- makeContrasts(
        V.D24 = DOX_24 - DMSO_24,
        V.F24 = FLUO_24 - DMSO_24,
        V.D24r = DOX_24rec - DMSO_24rec,
        V.F24r = FLUO_24rec - DMSO_24rec,
        V.D144r = DOX_144rec - DMSO_144rec,
        V.F144r = FLUO_144rec - DMSO_144rec,
        levels = mm_r1
)

vfit_r1 <- lmFit(p1, mm_r1)
vfit_r1 <- contrasts.fit(vfit_r1, contrasts = cm_r1)

efit4 <- eBayes(vfit_r1)

results1 = decideTests(efit4)
summary(results1)
       V.D24 V.F24 V.D24r V.F24r V.D144r V.F144r
Down    4640     0   2120      0     263       0
NotSig  9460 20443  11485  20443   19838   20443
Up      6343     0   6838      0     342       0
#inbuilt annotation results rowMeans >0
#         V.D24 V.F24 V.D24r V.F24r V.D144r V.F144r
# Down    4567     0   2083      0     288       0
# NotSig 14521 26445  16339  26445   25296   26445
# Up      7357     0   8023      0     861       0

## have no DE genes detected for 5FU, only DOX


####plot your voom####
voom_plot1 <- voom(dge1_calc, mm_r1, plot = TRUE)


top.table_V.D24 <- topTable(fit = efit4, coef = "V.D24", number = nrow(dge_calc), adjust.method = "BH", p.value = 1, sort.by = "none")
head(top.table_V.D24)

top.table_V.F24 <- topTable(fit = efit4, coef = "V.F24", number = nrow(dge_calc), adjust.method = "BH", p.value = 1, sort.by = "none")
head(top.table_V.F24)

top.table_V.D24r <- topTable(fit = efit4, coef = "V.D24r", number = nrow(dge_calc), adjust.method = "BH", p.value = 1, sort.by = "none")
head(top.table_V.D24r)

top.table_V.F24r <- topTable(fit = efit4, coef = "V.F24r", number = nrow(dge_calc), adjust.method = "BH", p.value = 1, sort.by = "none")
head(top.table_V.F24r)

top.table_V.D144r <- topTable(fit = efit4, coef = "V.D144r", number = nrow(dge_calc), adjust.method = "BH", p.value = 1, sort.by = "none")
head(top.table_V.D144r)

top.table_V.F144r <- topTable(fit = efit4, coef = "V.F144r", number = nrow(dge_calc), adjust.method = "BH", p.value = 1, sort.by = "none")
head(top.table_V.F144r)

Volcano Plots


generate_volcano_plot <- function(toptable, title) {
  
  # Add Significance Labels
  toptable$Significance <- "Not Significant"
  toptable$Significance[toptable$logFC > 0 & toptable$adj.P.Val < 0.05] <- "Upregulated"
  toptable$Significance[toptable$logFC < 0 & toptable$adj.P.Val < 0.05] <- "Downregulated"

  # Generate Volcano Plot
  ggplot(toptable, aes(x = logFC, y = -log10(P.Value), color = Significance)) +
    geom_point(alpha = 0.4, size = 2) + 
    scale_color_manual(values = c("Downregulated" = "red", "Upregulated" = "blue", "Not Significant" = "gray")) +
    xlim(-5, 5) +
    labs(title = title, x = "log2 Fold Change", y = "-log10 P-value") +
    theme(legend.position = "none", 
          plot.title = element_text(size = rel(1.5), hjust = 0.5),
          axis.title = element_text(size = rel(1.25))) +
    theme_bw()
}

#now that I've made a function, I can make volcano plots for each of my comparisons (6 total)

volcano_plots <- list(
  "V.D24" = generate_volcano_plot(top.table_V.D24, "Volcano Plot DOX 24hr (adj P-val<0.05)"),
  "V.F24" = generate_volcano_plot(top.table_V.F24, "Volcano Plot 5FU 24hr (adj P-val<0.05)"),
  "V.D24r" = generate_volcano_plot(top.table_V.D24r, "Volcano Plot DOX 24hr Recovery (adj P-val<0.05)"),
  "V.F24r" = generate_volcano_plot(top.table_V.F24r, "Volcano Plot 5FU 24hr Recovery (adj P-val<0.05)"),
  "V.D144r" = generate_volcano_plot(top.table_V.D144r, "Volcano Plot DOX 6d Recovery (adj P-val<0.05)"),
  "V.F144r" = generate_volcano_plot(top.table_V.F144r, "Volcano Plot 5FU 6d Recovery (adj P-val<0.05)")
)

# Display each volcano plot
for (plot_name in names(volcano_plots)) {
  print(volcano_plots[[plot_name]])
}
Warning: Removed 344 rows containing missing values or values outside the scale range
(`geom_point()`).

Warning: Removed 66 rows containing missing values or values outside the scale range
(`geom_point()`).

Warning: Removed 23 rows containing missing values or values outside the scale range
(`geom_point()`).

Now that I’ve gone through and figured out the basics of DE - let’s remove unwanted variation using ruv This uses my replicates to figure out covariates in the data and remove variation occurring due to batch effects or other effects We’re seeing a batch effect occurring here in PCA and cor. heatmaps

# #counts need to be integer values and in a numeric matrix for ruv
# counts_DE_matrix <- as.matrix(counts_DE_filter1)
# dim(counts_DE_matrix)
# #[1] 20443    54 - new inbuilt
# #this does NOT contain the replicate samples, save for later
# 
# ##this file contains the replicates that I can pull from
# fC_Matrix_cpm_filter_ruv <- as.matrix (fC_Matrix_Full_cpm_filter)
# 
# #create a matrix specifying the replicates
# reps <- as.data.frame(fC_Matrix_Full_cpm_filter) %>%
#   dplyr::select(contains("Ind6"))
# dim(reps)
# #[1] 14225    18
# reps_mat <- as.matrix(reps)
# 
# #using the documentation do this:
# #controls: gene-by-sample numeric matrix containing counts
# controls <- rownames(fC_Matrix_Full_cpm_filter)
# 
# #I have replicates for Individual 6, which I can compare using this method
# #They are matched as following:
# ##col46:55, 47:56, 48:57, 49:58, 50:59, 51:60, 52:61, 53:62, 54:63
# ###this is the scIdx argument for reps, the cIdx argument is for genes
# #differences <- matrix(data = c(1, 10, 2, 11, 3, 12, 4, 13, 5, 14, 6, 15, 7, 16, 8, 17, 9, 18), byrow = TRUE, nrow = 9)
# differences_full <- matrix(data = c(46, 55, 47, 56, 48 , 57, 49, 58, 50 , 59, 51, 60, 52, 61, 53, 62, 54, 63), byrow = TRUE, nrow = 9)
# 
# 
# signature(x = "matrix", cIdx = "ANY", k = "numeric", scIdx = "matrix")
# 
# 
# seqRUVs <- RUVs(rep, controls, k=3, differences_full, isLog = TRUE)
# 
# pData(counts_DE)
# head(seqRUVs)
# 
# 
# ###now try putting this in your matrix?
# design <- model.matrix(~0 + group_1, seqRUVs)
# 
# 
# 
# 
# ####try it exactly like the documentation and filter afterwards?####
# genes <- rownames(fC_Matrix_Full)[grep("^"), rownames(fC_Matrix_Full)]
# spikes <- rownames()
# 
# ####leaving off here EMP 25/02/26####

sessionInfo()
R version 4.4.2 (2024-10-31 ucrt)
Platform: x86_64-w64-mingw32/x64
Running under: Windows 11 x64 (build 22000)

Matrix products: default


locale:
[1] LC_COLLATE=English_United States.utf8 
[2] LC_CTYPE=English_United States.utf8   
[3] LC_MONETARY=English_United States.utf8
[4] LC_NUMERIC=C                          
[5] LC_TIME=English_United States.utf8    

time zone: America/Chicago
tzcode source: internal

attached base packages:
[1] stats4    stats     graphics  grDevices utils     datasets  methods  
[8] base     

other attached packages:
 [1] RUVSeq_1.40.0               EDASeq_2.40.0              
 [3] ShortRead_1.64.0            GenomicAlignments_1.42.0   
 [5] SummarizedExperiment_1.36.0 MatrixGenerics_1.18.1      
 [7] matrixStats_1.4.1           Rsamtools_2.22.0           
 [9] GenomicRanges_1.58.0        Biostrings_2.74.0          
[11] GenomeInfoDb_1.42.3         XVector_0.46.0             
[13] IRanges_2.40.0              S4Vectors_0.44.0           
[15] BiocParallel_1.40.0         biomaRt_2.62.1             
[17] RColorBrewer_1.1-3          Cormotif_1.52.0            
[19] affy_1.84.0                 Biobase_2.66.0             
[21] BiocGenerics_0.52.0         PCAtools_2.18.0            
[23] ggrepel_0.9.6               ggfortify_0.4.17           
[25] pheatmap_1.0.12             edgeR_4.4.0                
[27] limma_3.62.1                readxl_1.4.3               
[29] edgebundleR_0.1.4           lubridate_1.9.3            
[31] forcats_1.0.0               stringr_1.5.1              
[33] dplyr_1.1.4                 purrr_1.0.2                
[35] readr_2.1.5                 tidyr_1.3.1                
[37] tidyverse_2.0.0             tibble_3.2.1               
[39] hrbrthemes_0.8.7            reshape2_1.4.4             
[41] ggplot2_3.5.1               workflowr_1.7.1            

loaded via a namespace (and not attached):
  [1] later_1.4.1               BiocIO_1.16.0            
  [3] bitops_1.0-9              filelock_1.0.3           
  [5] R.oo_1.27.0               cellranger_1.1.0         
  [7] preprocessCore_1.68.0     XML_3.99-0.18            
  [9] lifecycle_1.0.4           httr2_1.1.0              
 [11] pwalign_1.2.0             rprojroot_2.0.4          
 [13] vroom_1.6.5               MASS_7.3-61              
 [15] processx_3.8.5            lattice_0.22-6           
 [17] magrittr_2.0.3            sass_0.4.9               
 [19] rmarkdown_2.29            jquerylib_0.1.4          
 [21] yaml_2.3.10               httpuv_1.6.15            
 [23] cowplot_1.1.3             DBI_1.2.3                
 [25] abind_1.4-8               zlibbioc_1.52.0          
 [27] R.utils_2.12.3            RCurl_1.98-1.16          
 [29] rappdirs_0.3.3            git2r_0.35.0             
 [31] gdtools_0.4.1             GenomeInfoDbData_1.2.13  
 [33] irlba_2.3.5.1             dqrng_0.4.1              
 [35] DelayedMatrixStats_1.28.1 codetools_0.2-20         
 [37] DelayedArray_0.32.0       xml2_1.3.6               
 [39] tidyselect_1.2.1          farver_2.1.2             
 [41] UCSC.utils_1.2.0          ScaledMatrix_1.14.0      
 [43] BiocFileCache_2.14.0      jsonlite_1.8.9           
 [45] systemfonts_1.1.0         tools_4.4.2              
 [47] progress_1.2.3            Rcpp_1.0.13-1            
 [49] glue_1.8.0                gridExtra_2.3            
 [51] Rttf2pt1_1.3.12           SparseArray_1.6.0        
 [53] xfun_0.49                 withr_3.0.2              
 [55] BiocManager_1.30.25       fastmap_1.2.0            
 [57] latticeExtra_0.6-30       callr_3.7.6              
 [59] digest_0.6.37             rsvd_1.0.5               
 [61] timechange_0.3.0          R6_2.6.1                 
 [63] mime_0.12                 colorspace_2.1-1         
 [65] jpeg_0.1-10               RSQLite_2.3.8            
 [67] R.methodsS3_1.8.2         generics_0.1.3           
 [69] fontLiberation_0.1.0      rtracklayer_1.66.0       
 [71] prettyunits_1.2.0         httr_1.4.7               
 [73] htmlwidgets_1.6.4         S4Arrays_1.6.0           
 [75] whisker_0.4.1             pkgconfig_2.0.3          
 [77] gtable_0.3.6              blob_1.2.4               
 [79] hwriter_1.3.2.1           htmltools_0.5.8.1        
 [81] fontBitstreamVera_0.1.1   scales_1.3.0             
 [83] png_0.1-8                 knitr_1.49               
 [85] rstudioapi_0.17.1         tzdb_0.4.0               
 [87] rjson_0.2.23              curl_6.0.1               
 [89] cachem_1.1.0              parallel_4.4.2           
 [91] extrafont_0.19            AnnotationDbi_1.68.0     
 [93] restfulr_0.0.15           pillar_1.10.1            
 [95] grid_4.4.2                vctrs_0.6.5              
 [97] promises_1.3.2            BiocSingular_1.22.0      
 [99] dbplyr_2.5.0              beachmat_2.22.0          
[101] xtable_1.8-4              extrafontdb_1.0          
[103] evaluate_1.0.3            GenomicFeatures_1.58.0   
[105] cli_3.6.3                 locfit_1.5-9.10          
[107] compiler_4.4.2            rlang_1.1.4              
[109] crayon_1.5.3              labeling_0.4.3           
[111] aroma.light_3.36.0        interp_1.1-6             
[113] ps_1.8.1                  getPass_0.2-4            
[115] plyr_1.8.9                fs_1.6.5                 
[117] stringi_1.8.4             deldir_2.0-4             
[119] munsell_0.5.1             fontquiver_0.2.1         
[121] Matrix_1.7-1              hms_1.1.3                
[123] sparseMatrixStats_1.18.0  bit64_4.5.2              
[125] KEGGREST_1.46.0           statmod_1.5.0            
[127] shiny_1.10.0              igraph_2.1.1             
[129] memoise_2.0.1             affyio_1.76.0            
[131] bslib_0.9.0               bit_4.5.0