Last updated: 2023-08-16

Checks: 7 0

Knit directory: mecfs-dge-analysis/

This reproducible R Markdown analysis was created with workflowr (version 1.7.0). The Checks tab describes the reproducibility checks that were applied when the results were created. The Past versions tab lists the development history.


Great! Since the R Markdown file has been committed to the Git repository, you know the exact version of the code that produced these results.

Great job! The global environment was empty. Objects defined in the global environment can affect the analysis in your R Markdown file in unknown ways. For reproduciblity it’s best to always run the code in an empty environment.

The command set.seed(20230618) was run prior to running the code in the R Markdown file. Setting a seed ensures that any results that rely on randomness, e.g. subsampling or permutations, are reproducible.

Great job! Recording the operating system, R version, and package versions is critical for reproducibility.

Nice! There were no cached chunks for this analysis, so you can be confident that you successfully produced the results during this run.

Great job! Using relative paths to the files within your workflowr project makes it easier to run your code on other machines.

Great! You are using Git for version control. Tracking code development and connecting the code version to the results is critical for reproducibility.

The results in this page were generated with repository version f22d1e2. See the Past versions tab to see a history of the changes made to the R Markdown and HTML files.

Note that you need to be careful to ensure that all relevant files for the analysis have been committed to Git prior to generating the results (you can use wflow_publish or wflow_git_commit). workflowr only checks the R Markdown file, but you know if there are other scripts or data files that it depends on. Below is the status of the Git repository when the results were generated:


Ignored files:
    Ignored:    .DS_Store
    Ignored:    .Rhistory
    Ignored:    .Rproj.user/
    Ignored:    output/batch-correction-limma/

Untracked files:
    Untracked:  code/edit_metadata.R
    Untracked:  data/MECFS_RNAseq_metadata_2023_08_16.csv
    Untracked:  data/MECFS_RNAseq_metadata_2023_08_17.csv
    Untracked:  output/batch-analysis.RData
    Untracked:  output/condition-analysis.RData
    Untracked:  output/counts_vst2.csv
    Untracked:  output/res_batch.csv
    Untracked:  output/results_tier.csv
    Untracked:  output/results_tier_1_vs_2.csv
    Untracked:  output/results_tier_1_vs_3.csv

Unstaged changes:
    Modified:   README.md
    Deleted:    analysis/analysis.Rmd
    Deleted:    analysis/initial-analysis.Rmd
    Modified:   code/helpers.R
    Modified:   data/MECFS_RNAseq_metadata_2023_06_23.csv
    Modified:   output/counts_vst_limma.csv

Note that any generated files, e.g. HTML, png, CSS, etc., are not included in this status report because it is ok for generated content to have uncommitted changes.


These are the previous versions of the repository in which changes were made to the R Markdown (analysis/batch-analysis.Rmd) and HTML (docs/batch-analysis.html) files. If you’ve configured a remote Git repository (see ?wflow_git_remote), click on the hyperlinks in the table below to view the files as they were in that past version.

File Version Author Date Message
Rmd f22d1e2 sdhutchins 2023-08-16 Add new experiment designs.

Differential Gene Expression Analysis by Batch

All libraries from the prior analysis should be loaded.

library(tidyverse) # Available via CRAN
library(DESeq2) # Available via Bioconductor
library(RColorBrewer) # Available via CRAN
library(pheatmap) # Available via CRAN
library(genefilter) # Available via Bioconductor
library(limma) # Available via Bioconductor
library(gprofiler2) # Available via CRAN
library(biomaRt) # Available via Bioconductor
library(plotly) # Available via CRAN
library(ggpubr) # Available via CRAN
library(rmarkdown) # Available via CRAN

Data Import

We will be importing counts data from the star-salmon pipeline and our metadata for the project which is hosted on Box. This also ensures data is properly ordered by sample id.

counts <- read_tsv("data/star-salmon/salmon.merged.gene_counts_length_scaled.tsv")

# Import variants of interest
vars_of_interest <- read_csv("data/RNAseq Variants of Interest List_V2.csv")
genes_of_interest <- unique(vars_of_interest$Gene)
genes_of_interest <- genes_of_interest[!is.na(genes_of_interest)]

# Use first column (gene_id) for row names
counts <- data.frame(counts, row.names = 1)
counts$Ensembl_ID <- row.names(counts)
drop <- c("Ensembl_ID", "gene_name")
gene_info <- counts[, drop]
counts <- counts[, !(names(counts) %in% drop)] # remove both columns

# Import metadata
sample_metadata <- read_csv("data/MECFS_RNAseq_metadata_2023_08_17.csv")
row.names(sample_metadata) <- sample_metadata$RNA_Samples_id

# Check that data is ordered properly
sample_metadata <- check_order(sample_metadata = sample_metadata, counts = counts)

genes_biomart <- retrieve_gene_info(values = gene_info$Ensembl_ID, filters = "ensembl_gene_id_version")

DESeq2 Analysis

sample_metadata$Family <- factor(sample_metadata$Family)
sample_metadata$Affected <- factor(sample_metadata$Affected)
sample_metadata$Batch <- factor(sample_metadata$Batch)
sample_metadata$Gender <- factor(sample_metadata$Gender)
sample_metadata$Tier <- factor(sample_metadata$Tier)
sample_metadata$Sex <- factor(sample_metadata$Sex)

# Account for Family later but batch is accounted for
# Account for another factor seems to be an issue.
dds_batch <- DESeqDataSetFromMatrix(countData = round(counts), colData = sample_metadata, design = ~ Batch)

# Pre-filtering: Keep only rows that have at least 10 reads total
keep <- rowSums(counts(dds_batch)) >= 10
dds_batch <- dds_batch[keep, ]

# Run DESeq function
dds_batch <- DESeq(dds_batch)
head(counts(dds_batch, normalized = TRUE))
                       LW001974    LW001975     LW001976     LW001977
ENSG00000000003.15     6.283344     2.05296     7.223942     4.264822
ENSG00000000419.14   921.557182  1043.92997   850.017210   859.361676
ENSG00000000457.14  1114.246411   746.25082   829.549373   916.936776
ENSG00000000460.17   234.578192   189.89876   193.842452   328.391311
ENSG00000000938.13 12201.207645 11118.82929 15551.943765 17490.035904
ENSG00000000971.17    83.777926   130.36294   167.354663   197.248027
                       LW001978    LW001979     LW001980    LW001981
ENSG00000000003.15     3.565091    14.47998     4.573315    10.46587
ENSG00000000419.14   910.286661   866.16626   868.015180  1118.80153
ENSG00000000457.14  1100.424867   971.47523  1048.203789  1003.67696
ENSG00000000460.17   294.714219   276.43604   185.676587   252.22747
ENSG00000000938.13 10116.540921 10099.13001 18749.676681 21708.30816
ENSG00000000971.17   553.777525   143.48347   222.263107   216.64352
                      LW001982     LW001983     LW001984     LW001985
ENSG00000000003.15    17.45822     6.161279     5.718447     9.434676
ENSG00000000419.14   979.40627  1059.740056   924.754609   919.356806
ENSG00000000457.14  1104.23256   635.844033   901.880820   939.274457
ENSG00000000460.17   374.47887   144.173938   303.894624   263.122644
ENSG00000000938.13 19973.95218 23219.397520 19682.078435 13836.477178
ENSG00000000971.17   107.36807    27.109629    86.593629   116.361010
                     LW001986     LW001987    LW001988    LW001989     LW001990
ENSG00000000003.15    14.3010     8.261848    20.80204    11.32375     7.592455
ENSG00000000419.14   894.8338   840.642993   896.56804   949.93645   832.458476
ENSG00000000457.14  1046.6967   747.008720   897.60814   859.34649  1079.755586
ENSG00000000460.17   305.7689   232.708707   200.73971   223.95853   264.651294
ENSG00000000938.13 14919.3442 14147.037031 13752.23045 14215.07557 13223.345284
ENSG00000000971.17   222.6869   227.200809   160.17573   163.56522   143.172011
                       LW001991    LW001992     LW001993    LW001994
ENSG00000000003.15     5.130748    3.925784     7.688938    12.43332
ENSG00000000419.14  1002.206201  917.324885   997.639691   920.06541
ENSG00000000457.14   903.866855 1092.676575   897.683498   885.63469
ENSG00000000460.17   217.201685  378.183869   149.934289   273.53296
ENSG00000000938.13 14667.099624 8978.268242 13184.606244 15375.23032
ENSG00000000971.17   149.646830  294.433808   152.817641   124.33316
                       LW001995    LW001996    LW001997   LW001998     LW001999
ENSG00000000003.15     4.865262    13.74314    6.723075   11.82344     3.302019
ENSG00000000419.14  1119.010269   992.12394  808.113622 1212.74707   893.746405
ENSG00000000457.14   904.938740   897.23081  935.852048  728.83058   989.504949
ENSG00000000460.17   515.717776   293.84146  404.729118  143.57034   159.597572
ENSG00000000938.13 15497.481354 14288.28620 9146.071308 8941.89828 16707.114147
ENSG00000000971.17   229.478193   126.96045  208.415327  369.06021   184.913049
                     LW002000    LW002001     LW002080
ENSG00000000003.15   10.49324     9.01276     4.661212
ENSG00000000419.14 1437.57367   736.47124   918.258773
ENSG00000000457.14 1143.76299   987.54098  1041.780892
ENSG00000000460.17  339.71860   350.21010   205.093330
ENSG00000000938.13 8890.39630 10409.73770 12261.318282
ENSG00000000971.17  247.90276   133.90386   209.754542
resultsNames(dds_batch)
[1] "Intercept"    "Batch_B_vs_A" "Batch_C_vs_A"
# Normalize gene counts for differences in seq. depth/global differences
counts_norm_dds_batch <- counts(dds_batch, normalized = TRUE)

Data transformation and visualization

Perform count data transformation by variance stabilizing transformation (vst) on normalized counts.

vsd_batch <- vst(dds_batch, blind = FALSE)

Batch correction with limma

counts_vsd_batch <- assay(vsd_batch)
write.csv(counts_vsd_batch, file = "output/counts_vst2.csv")
mm <- model.matrix(~Batch, colData(vsd_batch))

counts_vsd_batch_limma <- limma::removeBatchEffect(counts_vsd_batch, batch = vsd_batch$Batch, design = mm)
Coefficients not estimable: batch1 batch2 
write.csv(counts_vsd_batch_limma, file = "output/counts_vst_limma.csv")

vsd_batch_limma <- vsd_batch
assay(vsd_batch_limma) <- counts_vsd_batch_limma

Comparison/Contrast of Batches

res_batch <- results(dds_batch, contrast = c("Batch", "A", "B"))
res_batch <- res_batch[order(res_batch$padj), ]
summary(res_batch)

out of 29605 with nonzero total read count
adjusted p-value < 0.1
LFC > 0 (up)       : 1571, 5.3%
LFC < 0 (down)     : 1716, 5.8%
outliers [1]       : 0, 0%
low counts [2]     : 4043, 14%
(mean count < 1)
[1] see 'cooksCutoff' argument of ?results
[2] see 'independentFiltering' argument of ?results
write.csv(res_batch, file = "output/res_batch.csv")
res_batch_df <- as.data.frame(res_batch)
res_batch_df_05 <- subset(res_batch_df, padj < 0.05)

Save data

save.image(file = "output/batch-analysis.RData")

sessionInfo()
R version 4.1.1 (2021-08-10)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Big Sur 10.16

Matrix products: default
BLAS:   /Library/Frameworks/R.framework/Versions/4.1/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.1/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats4    stats     graphics  grDevices utils     datasets  methods  
[8] base     

other attached packages:
 [1] rmarkdown_2.24              ggpubr_0.6.0               
 [3] plotly_4.10.2               biomaRt_2.50.3             
 [5] gprofiler2_0.2.2            limma_3.50.3               
 [7] genefilter_1.76.0           pheatmap_1.0.12            
 [9] RColorBrewer_1.1-3          DESeq2_1.34.0              
[11] SummarizedExperiment_1.24.0 Biobase_2.54.0             
[13] MatrixGenerics_1.6.0        matrixStats_1.0.0          
[15] GenomicRanges_1.46.1        GenomeInfoDb_1.30.1        
[17] IRanges_2.28.0              S4Vectors_0.32.4           
[19] BiocGenerics_0.40.0         lubridate_1.9.2            
[21] forcats_1.0.0               stringr_1.5.0              
[23] dplyr_1.1.2                 purrr_1.0.2                
[25] readr_2.1.4                 tidyr_1.3.0                
[27] tibble_3.2.1                ggplot2_3.4.3              
[29] tidyverse_2.0.0             workflowr_1.7.0            

loaded via a namespace (and not attached):
 [1] colorspace_2.1-0       ggsignif_0.6.4         rprojroot_2.0.3       
 [4] XVector_0.34.0         fs_1.6.3               rstudioapi_0.15.0     
 [7] bit64_4.0.5            AnnotationDbi_1.56.2   fansi_1.0.4           
[10] xml2_1.3.5             splines_4.1.1          cachem_1.0.8          
[13] geneplotter_1.72.0     knitr_1.43             jsonlite_1.8.7        
[16] broom_1.0.5            annotate_1.72.0        dbplyr_2.3.3          
[19] png_0.1-8              compiler_4.1.1         httr_1.4.7            
[22] backports_1.4.1        Matrix_1.6-1           fastmap_1.1.1         
[25] lazyeval_0.2.2         cli_3.6.1              later_1.3.1           
[28] htmltools_0.5.6        prettyunits_1.1.1      tools_4.1.1           
[31] gtable_0.3.3           glue_1.6.2             GenomeInfoDbData_1.2.7
[34] rappdirs_0.3.3         Rcpp_1.0.11            carData_3.0-5         
[37] jquerylib_0.1.4        vctrs_0.6.3            Biostrings_2.62.0     
[40] xfun_0.40              ps_1.7.5               timechange_0.2.0      
[43] lifecycle_1.0.3        rstatix_0.7.2          XML_3.99-0.14         
[46] getPass_0.2-2          zlibbioc_1.40.0        scales_1.2.1          
[49] vroom_1.6.3            hms_1.1.3              promises_1.2.1        
[52] parallel_4.1.1         yaml_2.3.7             curl_5.0.2            
[55] memoise_2.0.1          sass_0.4.7             stringi_1.7.12        
[58] RSQLite_2.3.1          filelock_1.0.2         BiocParallel_1.28.3   
[61] rlang_1.1.1            pkgconfig_2.0.3        bitops_1.0-7          
[64] evaluate_0.21          lattice_0.21-8         htmlwidgets_1.6.2     
[67] bit_4.0.5              processx_3.8.2         tidyselect_1.2.0      
[70] magrittr_2.0.3         R6_2.5.1               generics_0.1.3        
[73] DelayedArray_0.20.0    DBI_1.1.3              pillar_1.9.0          
[76] whisker_0.4.1          withr_2.5.0            abind_1.4-5           
[79] survival_3.5-7         KEGGREST_1.34.0        RCurl_1.98-1.12       
[82] car_3.1-2              crayon_1.5.2           utf8_1.2.3            
[85] BiocFileCache_2.2.1    tzdb_0.4.0             progress_1.2.2        
[88] locfit_1.5-9.8         grid_4.1.1             data.table_1.14.8     
[91] blob_1.2.4             callr_3.7.3            git2r_0.32.0          
[94] digest_0.6.33          xtable_1.8-4           httpuv_1.6.11         
[97] munsell_0.5.0          viridisLite_0.4.2      bslib_0.5.1