Last updated: 2025-02-19

Checks: 5 2

Knit directory: locust-comparative-genomics/

This reproducible R Markdown analysis was created with workflowr (version 1.7.1). The Checks tab describes the reproducibility checks that were applied when the results were created. The Past versions tab lists the development history.

R Markdown file: uncommitted changes

The R Markdown file has unstaged changes. To know which version of the R Markdown file created these results, you’ll want to first commit it to the Git repo. If you’re still working on the analysis, you can ignore this warning. When you’re finished, you can run wflow_publish to commit the R Markdown file and build the HTML.

Environment: empty

Great job! The global environment was empty. Objects defined in the global environment can affect the analysis in your R Markdown file in unknown ways. For reproduciblity it’s best to always run the code in an empty environment.

Seed: set.seed(20221025)

The command set.seed(20221025) was run prior to running the code in the R Markdown file. Setting a seed ensures that any results that rely on randomness, e.g. subsampling or permutations, are reproducible.

Session information: recorded

Great job! Recording the operating system, R version, and package versions is critical for reproducibility.

Cache: none

Nice! There were no cached chunks for this analysis, so you can be confident that you successfully produced the results during this run.

File paths: absolute

Using absolute paths to the files within your workflowr project makes it difficult for you and others to run your code on a different machine. Change the absolute path(s) below to the suggested relative path(s) to make your code more reproducible.

absolute	relative
/Users/maevatecher/Documents/GitHub/locust-comparative-genomics/data/RefSeq/GCF_023897955.1_iqSchGreg1.2_genomic.gtf	data/RefSeq/GCF_023897955.1_iqSchGreg1.2_genomic.gtf
/Users/maevatecher/Documents/GitHub/locust-comparative-genomics/data/list/GO_Annotations/blast2go_gregaria.annot.mgp_removed	data/list/GO_Annotations/blast2go_gregaria.annot.mgp_removed
/Users/maevatecher/Documents/GitHub/locust-comparative-genomics/data/custom_sgregaria_orgdb	data/custom_sgregaria_orgdb
/Users/maevatecher/Documents/GitHub/locust-comparative-genomics/data/custom_sgregaria_orgdb/org.Sgregaria.eg.db	data/custom_sgregaria_orgdb/org.Sgregaria.eg.db
/Users/maevatecher/Documents/GitHub/locust-comparative-genomics/data	data

Repository version: d7fa779

Great! You are using Git for version control. Tracking code development and connecting the code version to the results is critical for reproducibility.

The results in this page were generated with repository version d7fa779. See the Past versions tab to see a history of the changes made to the R Markdown and HTML files.

Note that you need to be careful to ensure that all relevant files for the analysis have been committed to Git prior to generating the results (you can use wflow_publish or wflow_git_commit). workflowr only checks the R Markdown file, but you know if there are other scripts or data files that it depends on. Below is the status of the Git repository when the results were generated:


Ignored files:
    Ignored:    .DS_Store
    Ignored:    analysis/.DS_Store
    Ignored:    analysis/.Rhistory
    Ignored:    data/.DS_Store
    Ignored:    data/DEG_results/.DS_Store
    Ignored:    data/DEG_results/Bulk_RNAseq/.DS_Store
    Ignored:    data/DEG_results/Bulk_RNAseq/americana/.DS_Store
    Ignored:    data/DEG_results/Bulk_RNAseq/cancellata/.DS_Store
    Ignored:    data/DEG_results/Bulk_RNAseq/cubense/.DS_Store
    Ignored:    data/DEG_results/Bulk_RNAseq/gregaria/.DS_Store
    Ignored:    data/DEG_results/Bulk_RNAseq/nitens/.DS_Store
    Ignored:    data/DEG_results/Bulk_RNAseq/piceifrons/.DS_Store
    Ignored:    data/DEG_results/RNAi/.DS_Store
    Ignored:    data/DEG_results/RNAi/All/.DS_Store
    Ignored:    data/DEG_results/RNAi/All_no_rRNA/.DS_Store
    Ignored:    data/DEG_results/RNAi/Head_no_rRNA/.DS_Store
    Ignored:    data/DEG_results/RNAi/Thorax_no_rRNA/.DS_Store
    Ignored:    data/WGCNA/.DS_Store
    Ignored:    data/WGCNA/input/.DS_Store
    Ignored:    data/WGCNA/input/Bulk_RNAseq/.DS_Store
    Ignored:    data/WGCNA/output/
    Ignored:    data/behavioral_data/.DS_Store
    Ignored:    data/behavioral_data/Raw_data/.DS_Store
    Ignored:    data/custom_sgregaria_orgdb/.DS_Store
    Ignored:    data/list/.DS_Store
    Ignored:    data/list/Bulk_RNAseq/.DS_Store
    Ignored:    data/list/GO_Annotations/.DS_Store
    Ignored:    data/orthofinder/.DS_Store
    Ignored:    data/orthofinder/Polyneoptera/.DS_Store
    Ignored:    data/orthofinder/Polyneoptera/Results_I2/.DS_Store
    Ignored:    data/orthofinder/Polyneoptera/Results_I2/Orthogroups/.DS_Store
    Ignored:    data/orthofinder/Polyneoptera/Results_I5/.DS_Store
    Ignored:    data/orthofinder/Polyneoptera/Results_I5/Orthogroups/.DS_Store
    Ignored:    data/orthofinder/Schistocerca/.DS_Store
    Ignored:    data/orthofinder/Schistocerca/Results_I2/.DS_Store
    Ignored:    data/orthofinder/Schistocerca/Results_I2/Orthogroups/.DS_Store
    Ignored:    data/orthofinder/Schistocerca/Results_I5/.DS_Store
    Ignored:    data/orthofinder/Schistocerca/Results_I5/Orthogroups/.DS_Store
    Ignored:    data/overlap/.DS_Store
    Ignored:    data/overlap/Bulk_RNAseq/.DS_Store
    Ignored:    data/overlap/Bulk_RNAseq/cancellata/
    Ignored:    data/readcounts/.DS_Store
    Ignored:    data/readcounts/Bulk_RNAseq/.DS_Store
    Ignored:    data/readcounts/RNAi/.DS_Store

Untracked files:
    Untracked:  VennDiagram.2025-02-19_00-26-10.090756.log
    Untracked:  VennDiagram.2025-02-19_00-26-10.673464.log
    Untracked:  VennDiagram.2025-02-19_00-26-11.487897.log
    Untracked:  VennDiagram.2025-02-19_00-26-12.000365.log
    Untracked:  VennDiagram.2025-02-19_00-26-12.563903.log
    Untracked:  VennDiagram.2025-02-19_00-26-13.147608.log
    Untracked:  VennDiagram.2025-02-19_00-26-13.241595.log
    Untracked:  VennDiagram.2025-02-19_00-26-13.40181.log
    Untracked:  VennDiagram.2025-02-19_00-26-14.299368.log
    Untracked:  VennDiagram.2025-02-19_00-26-14.491279.log
    Untracked:  VennDiagram.2025-02-19_00-26-14.608716.log
    Untracked:  VennDiagram.2025-02-19_00-26-15.84461.log
    Untracked:  VennDiagram.2025-02-19_00-26-15.888073.log
    Untracked:  VennDiagram.2025-02-19_00-26-16.020382.log
    Untracked:  VennDiagram.2025-02-19_00-26-16.691562.log
    Untracked:  VennDiagram.2025-02-19_00-26-16.734494.log
    Untracked:  VennDiagram.2025-02-19_00-26-16.808353.log
    Untracked:  VennDiagram.2025-02-19_00-26-17.664569.log
    Untracked:  VennDiagram.2025-02-19_00-26-17.773688.log
    Untracked:  VennDiagram.2025-02-19_00-26-17.955978.log
    Untracked:  VennDiagram.2025-02-19_00-26-18.912722.log
    Untracked:  VennDiagram.2025-02-19_00-26-19.023018.log
    Untracked:  VennDiagram.2025-02-19_00-26-19.201162.log
    Untracked:  VennDiagram.2025-02-19_00-26-20.198876.log
    Untracked:  VennDiagram.2025-02-19_00-26-20.339394.log
    Untracked:  VennDiagram.2025-02-19_00-26-20.495203.log
    Untracked:  VennDiagram.2025-02-19_00-26-20.754769.log
    Untracked:  VennDiagram.2025-02-19_00-26-20.915182.log
    Untracked:  VennDiagram.2025-02-19_00-26-21.110117.log
    Untracked:  VennDiagram.2025-02-19_00-26-28.319055.log
    Untracked:  VennDiagram.2025-02-19_00-26-28.902249.log
    Untracked:  VennDiagram.2025-02-19_00-26-29.434464.log
    Untracked:  VennDiagram.2025-02-19_00-26-29.981347.log
    Untracked:  VennDiagram.2025-02-19_00-26-30.48741.log
    Untracked:  VennDiagram.2025-02-19_00-26-33.503353.log
    Untracked:  VennDiagram.2025-02-19_00-26-33.864508.log
    Untracked:  VennDiagram.2025-02-19_00-26-33.937407.log
    Untracked:  VennDiagram.2025-02-19_00-26-39.23504.log
    Untracked:  VennDiagram.2025-02-19_00-26-39.35344.log
    Untracked:  VennDiagram.2025-02-19_00-26-39.426602.log
    Untracked:  VennDiagram.2025-02-19_00-26-45.123997.log
    Untracked:  VennDiagram.2025-02-19_00-26-45.174552.log
    Untracked:  VennDiagram.2025-02-19_00-26-45.245105.log
    Untracked:  VennDiagram.2025-02-19_00-26-49.933369.log
    Untracked:  VennDiagram.2025-02-19_00-26-49.989636.log
    Untracked:  VennDiagram.2025-02-19_00-26-50.067423.log
    Untracked:  VennDiagram.2025-02-19_00-26-55.250141.log
    Untracked:  VennDiagram.2025-02-19_00-26-55.345536.log
    Untracked:  VennDiagram.2025-02-19_00-26-55.455281.log
    Untracked:  VennDiagram.2025-02-19_00-27-00.668035.log
    Untracked:  VennDiagram.2025-02-19_00-27-00.815473.log
    Untracked:  VennDiagram.2025-02-19_00-27-00.921827.log
    Untracked:  VennDiagram.2025-02-19_00-27-07.539614.log
    Untracked:  VennDiagram.2025-02-19_00-27-07.629385.log
    Untracked:  VennDiagram.2025-02-19_00-27-07.734975.log
    Untracked:  VennDiagram.2025-02-19_00-27-07.89401.log
    Untracked:  VennDiagram.2025-02-19_00-27-07.995417.log
    Untracked:  VennDiagram.2025-02-19_00-27-08.149242.log
    Untracked:  VennDiagram.2025-02-19_00-27-16.554396.log
    Untracked:  VennDiagram.2025-02-19_00-27-16.707058.log
    Untracked:  VennDiagram.2025-02-19_00-27-16.801039.log
    Untracked:  VennDiagram.2025-02-19_00-27-23.737163.log
    Untracked:  VennDiagram.2025-02-19_00-27-23.824665.log
    Untracked:  VennDiagram.2025-02-19_00-27-23.928958.log
    Untracked:  VennDiagram.2025-02-19_00-33-20.698591.log
    Untracked:  VennDiagram.2025-02-19_00-33-21.318215.log
    Untracked:  VennDiagram.2025-02-19_00-33-21.853493.log
    Untracked:  VennDiagram.2025-02-19_00-33-22.432323.log
    Untracked:  VennDiagram.2025-02-19_00-33-23.299242.log
    Untracked:  VennDiagram.2025-02-19_00-33-23.921532.log
    Untracked:  VennDiagram.2025-02-19_00-33-24.056721.log
    Untracked:  VennDiagram.2025-02-19_00-33-24.230483.log
    Untracked:  VennDiagram.2025-02-19_00-33-25.132705.log
    Untracked:  VennDiagram.2025-02-19_00-33-25.271917.log
    Untracked:  VennDiagram.2025-02-19_00-33-25.364172.log
    Untracked:  VennDiagram.2025-02-19_00-33-26.382907.log
    Untracked:  VennDiagram.2025-02-19_00-33-26.429725.log
    Untracked:  VennDiagram.2025-02-19_00-33-26.511566.log
    Untracked:  VennDiagram.2025-02-19_00-33-27.339106.log
    Untracked:  VennDiagram.2025-02-19_00-33-27.38223.log
    Untracked:  VennDiagram.2025-02-19_00-33-27.510075.log
    Untracked:  VennDiagram.2025-02-19_00-33-28.610624.log
    Untracked:  VennDiagram.2025-02-19_00-33-28.766455.log
    Untracked:  VennDiagram.2025-02-19_00-33-28.894421.log
    Untracked:  VennDiagram.2025-02-19_00-33-29.797531.log
    Untracked:  VennDiagram.2025-02-19_00-33-29.899743.log
    Untracked:  VennDiagram.2025-02-19_00-33-30.067137.log
    Untracked:  VennDiagram.2025-02-19_00-33-31.300233.log
    Untracked:  VennDiagram.2025-02-19_00-33-31.390845.log
    Untracked:  VennDiagram.2025-02-19_00-33-31.49441.log
    Untracked:  VennDiagram.2025-02-19_00-33-31.696759.log
    Untracked:  VennDiagram.2025-02-19_00-33-31.866906.log
    Untracked:  VennDiagram.2025-02-19_00-33-31.976004.log
    Untracked:  VennDiagram.2025-02-19_00-33-38.41082.log
    Untracked:  VennDiagram.2025-02-19_00-33-39.052663.log
    Untracked:  VennDiagram.2025-02-19_00-33-39.597515.log
    Untracked:  VennDiagram.2025-02-19_00-33-40.154527.log
    Untracked:  VennDiagram.2025-02-19_00-33-40.717342.log
    Untracked:  VennDiagram.2025-02-19_00-33-44.143492.log
    Untracked:  VennDiagram.2025-02-19_00-33-44.206999.log
    Untracked:  VennDiagram.2025-02-19_00-33-44.282855.log
    Untracked:  VennDiagram.2025-02-19_00-33-49.323427.log
    Untracked:  VennDiagram.2025-02-19_00-33-49.381072.log
    Untracked:  VennDiagram.2025-02-19_00-33-49.451647.log
    Untracked:  VennDiagram.2025-02-19_00-33-54.707214.log
    Untracked:  VennDiagram.2025-02-19_00-33-54.756589.log
    Untracked:  VennDiagram.2025-02-19_00-33-54.888789.log
    Untracked:  VennDiagram.2025-02-19_00-33-59.923873.log
    Untracked:  VennDiagram.2025-02-19_00-33-59.976814.log
    Untracked:  VennDiagram.2025-02-19_00-34-00.059093.log
    Untracked:  VennDiagram.2025-02-19_00-34-05.246014.log
    Untracked:  VennDiagram.2025-02-19_00-34-05.386983.log
    Untracked:  VennDiagram.2025-02-19_00-34-05.495627.log
    Untracked:  VennDiagram.2025-02-19_00-34-10.66816.log
    Untracked:  VennDiagram.2025-02-19_00-34-10.758927.log
    Untracked:  VennDiagram.2025-02-19_00-34-10.866676.log
    Untracked:  VennDiagram.2025-02-19_00-34-17.794858.log
    Untracked:  VennDiagram.2025-02-19_00-34-17.88475.log
    Untracked:  VennDiagram.2025-02-19_00-34-18.038407.log
    Untracked:  VennDiagram.2025-02-19_00-34-18.13879.log
    Untracked:  VennDiagram.2025-02-19_00-34-18.304421.log
    Untracked:  VennDiagram.2025-02-19_00-34-18.411287.log
    Untracked:  VennDiagram.2025-02-19_00-34-27.235827.log
    Untracked:  VennDiagram.2025-02-19_00-34-27.325741.log
    Untracked:  VennDiagram.2025-02-19_00-34-27.426145.log
    Untracked:  VennDiagram.2025-02-19_00-34-34.155129.log
    Untracked:  VennDiagram.2025-02-19_00-34-34.241188.log
    Untracked:  VennDiagram.2025-02-19_00-34-34.409073.log
    Untracked:  VennDiagram.2025-02-19_00-52-33.095813.log
    Untracked:  VennDiagram.2025-02-19_00-52-33.77208.log
    Untracked:  VennDiagram.2025-02-19_00-52-34.323694.log
    Untracked:  VennDiagram.2025-02-19_00-52-34.876006.log
    Untracked:  VennDiagram.2025-02-19_00-52-35.682899.log
    Untracked:  VennDiagram.2025-02-19_00-52-36.285793.log
    Untracked:  VennDiagram.2025-02-19_00-52-36.385768.log
    Untracked:  VennDiagram.2025-02-19_00-52-36.533758.log
    Untracked:  VennDiagram.2025-02-19_00-52-37.396762.log
    Untracked:  VennDiagram.2025-02-19_00-52-37.522269.log
    Untracked:  VennDiagram.2025-02-19_00-52-37.672476.log
    Untracked:  VennDiagram.2025-02-19_00-52-38.548478.log
    Untracked:  VennDiagram.2025-02-19_00-52-38.890685.log
    Untracked:  VennDiagram.2025-02-19_00-52-38.963824.log
    Untracked:  VennDiagram.2025-02-19_00-52-39.629011.log
    Untracked:  VennDiagram.2025-02-19_00-52-39.669878.log
    Untracked:  VennDiagram.2025-02-19_00-52-39.795557.log
    Untracked:  VennDiagram.2025-02-19_00-52-40.791824.log
    Untracked:  VennDiagram.2025-02-19_00-52-40.898838.log
    Untracked:  VennDiagram.2025-02-19_00-52-41.067889.log
    Untracked:  VennDiagram.2025-02-19_00-52-41.902756.log
    Untracked:  VennDiagram.2025-02-19_00-52-42.060632.log
    Untracked:  VennDiagram.2025-02-19_00-52-42.176172.log
    Untracked:  VennDiagram.2025-02-19_00-52-43.124775.log
    Untracked:  VennDiagram.2025-02-19_00-52-43.26273.log
    Untracked:  VennDiagram.2025-02-19_00-52-43.414173.log
    Untracked:  VennDiagram.2025-02-19_00-52-43.624641.log
    Untracked:  VennDiagram.2025-02-19_00-52-43.723864.log
    Untracked:  VennDiagram.2025-02-19_00-52-43.876686.log
    Untracked:  VennDiagram.2025-02-19_00-52-50.395671.log
    Untracked:  VennDiagram.2025-02-19_00-52-50.982669.log
    Untracked:  VennDiagram.2025-02-19_00-52-51.506272.log
    Untracked:  VennDiagram.2025-02-19_00-52-52.025053.log
    Untracked:  VennDiagram.2025-02-19_00-52-52.535647.log
    Untracked:  VennDiagram.2025-02-19_00-52-55.520631.log
    Untracked:  VennDiagram.2025-02-19_00-52-55.577491.log
    Untracked:  VennDiagram.2025-02-19_00-52-55.704049.log
    Untracked:  VennDiagram.2025-02-19_00-53-01.225018.log
    Untracked:  VennDiagram.2025-02-19_00-53-01.283446.log
    Untracked:  VennDiagram.2025-02-19_00-53-01.35707.log
    Untracked:  VennDiagram.2025-02-19_00-53-07.009745.log
    Untracked:  VennDiagram.2025-02-19_00-53-07.05761.log
    Untracked:  VennDiagram.2025-02-19_00-53-07.127274.log
    Untracked:  VennDiagram.2025-02-19_00-53-12.622344.log
    Untracked:  VennDiagram.2025-02-19_00-53-12.727351.log
    Untracked:  VennDiagram.2025-02-19_00-53-12.803588.log
    Untracked:  VennDiagram.2025-02-19_00-53-18.384503.log
    Untracked:  VennDiagram.2025-02-19_00-53-18.566148.log
    Untracked:  VennDiagram.2025-02-19_00-53-18.742062.log
    Untracked:  VennDiagram.2025-02-19_00-53-24.158518.log
    Untracked:  VennDiagram.2025-02-19_00-53-24.257476.log
    Untracked:  VennDiagram.2025-02-19_00-53-24.426949.log
    Untracked:  VennDiagram.2025-02-19_00-53-31.42607.log
    Untracked:  VennDiagram.2025-02-19_00-53-31.517315.log
    Untracked:  VennDiagram.2025-02-19_00-53-31.943246.log
    Untracked:  VennDiagram.2025-02-19_00-53-32.045029.log
    Untracked:  VennDiagram.2025-02-19_00-53-32.158777.log
    Untracked:  VennDiagram.2025-02-19_00-53-32.314934.log
    Untracked:  VennDiagram.2025-02-19_00-53-40.482904.log
    Untracked:  VennDiagram.2025-02-19_00-53-40.568202.log
    Untracked:  VennDiagram.2025-02-19_00-53-40.671108.log
    Untracked:  VennDiagram.2025-02-19_00-53-47.727146.log
    Untracked:  VennDiagram.2025-02-19_00-53-47.807306.log
    Untracked:  VennDiagram.2025-02-19_00-53-47.904046.log
    Untracked:  analysis/VennDiagram.2025-02-19_00-02-41.857736.log
    Untracked:  analysis/VennDiagram.2025-02-19_00-02-42.342479.log
    Untracked:  analysis/VennDiagram.2025-02-19_00-02-42.786429.log
    Untracked:  analysis/VennDiagram.2025-02-19_00-02-43.346076.log
    Untracked:  analysis/VennDiagram.2025-02-19_00-02-43.692998.log
    Untracked:  analysis/VennDiagram.2025-02-19_00-02-44.080721.log
    Untracked:  analysis/VennDiagram.2025-02-19_00-06-06.617464.log
    Untracked:  analysis/VennDiagram.2025-02-19_00-06-06.937883.log
    Untracked:  analysis/VennDiagram.2025-02-19_00-06-07.107717.log
    Untracked:  analysis/VennDiagram.2025-02-19_00-07-49.53549.log
    Untracked:  analysis/VennDiagram.2025-02-19_00-07-49.838215.log
    Untracked:  analysis/VennDiagram.2025-02-19_00-07-50.194331.log
    Untracked:  analysis/VennDiagram.2025-02-19_00-07-51.368246.log
    Untracked:  analysis/VennDiagram.2025-02-19_00-07-51.669745.log
    Untracked:  analysis/VennDiagram.2025-02-19_00-07-52.153586.log
    Untracked:  analysis/VennDiagram.2025-02-19_00-15-35.217343.log
    Untracked:  analysis/VennDiagram.2025-02-19_00-15-35.441086.log
    Untracked:  analysis/VennDiagram.2025-02-19_00-15-35.662977.log
    Untracked:  analysis/VennDiagram.2025-02-19_00-15-35.883515.log
    Untracked:  analysis/VennDiagram.2025-02-19_00-15-36.174919.log
    Untracked:  analysis/VennDiagram.2025-02-19_00-15-36.37529.log
    Untracked:  analysis/VennDiagram.2025-02-19_00-16-35.357156.log
    Untracked:  analysis/VennDiagram.2025-02-19_00-16-35.505389.log
    Untracked:  analysis/VennDiagram.2025-02-19_00-16-35.779052.log
    Untracked:  analysis/VennDiagram.2025-02-19_00-17-29.763089.log
    Untracked:  analysis/VennDiagram.2025-02-19_00-17-29.914926.log
    Untracked:  analysis/VennDiagram.2025-02-19_00-17-30.198819.log
    Untracked:  analysis/VennDiagram.2025-02-19_00-17-30.941132.log
    Untracked:  analysis/VennDiagram.2025-02-19_00-17-31.112726.log
    Untracked:  analysis/VennDiagram.2025-02-19_00-17-31.382201.log
    Untracked:  analysis/VennDiagram.2025-02-19_00-17-38.602576.log
    Untracked:  analysis/VennDiagram.2025-02-19_00-17-38.829729.log
    Untracked:  analysis/VennDiagram.2025-02-19_00-17-39.0535.log
    Untracked:  analysis/VennDiagram.2025-02-19_00-17-39.997323.log
    Untracked:  analysis/VennDiagram.2025-02-19_00-17-40.24828.log
    Untracked:  analysis/VennDiagram.2025-02-19_00-17-40.467315.log
    Untracked:  analysis/VennDiagram.2025-02-19_00-24-27.135103.log
    Untracked:  analysis/VennDiagram.2025-02-19_00-24-27.433968.log
    Untracked:  analysis/VennDiagram.2025-02-19_00-24-27.946018.log
    Untracked:  analysis/VennDiagram.2025-02-19_00-26-14.710653.log
    Untracked:  analysis/VennDiagram.2025-02-19_00-26-14.926229.log
    Untracked:  analysis/VennDiagram.2025-02-19_00-26-15.260861.log
    Untracked:  analysis/VennDiagram.2025-02-19_00-27-20.796686.log
    Untracked:  analysis/VennDiagram.2025-02-19_00-27-21.157414.log
    Untracked:  analysis/VennDiagram.2025-02-19_00-27-21.420226.log
    Untracked:  analysis/VennDiagram.2025-02-19_00-27-53.716011.log
    Untracked:  analysis/VennDiagram.2025-02-19_00-27-53.978443.log
    Untracked:  analysis/VennDiagram.2025-02-19_00-27-54.235876.log
    Untracked:  analysis/VennDiagram.2025-02-19_00-28-42.999843.log
    Untracked:  analysis/VennDiagram.2025-02-19_00-28-43.291244.log
    Untracked:  analysis/VennDiagram.2025-02-19_00-28-43.495347.log
    Untracked:  analysis/VennDiagram.2025-02-19_00-29-08.797774.log
    Untracked:  analysis/VennDiagram.2025-02-19_00-29-09.000699.log
    Untracked:  analysis/VennDiagram.2025-02-19_00-29-09.231426.log
    Untracked:  analysis/VennDiagram.2025-02-19_00-29-48.724336.log
    Untracked:  analysis/VennDiagram.2025-02-19_00-29-48.96818.log
    Untracked:  analysis/VennDiagram.2025-02-19_00-29-49.195993.log
    Untracked:  analysis/VennDiagram.2025-02-19_00-30-13.873298.log
    Untracked:  analysis/VennDiagram.2025-02-19_00-30-14.054743.log
    Untracked:  analysis/VennDiagram.2025-02-19_00-30-14.207976.log
    Untracked:  analysis/VennDiagram.2025-02-19_00-30-42.323592.log
    Untracked:  analysis/VennDiagram.2025-02-19_00-30-42.490849.log
    Untracked:  analysis/VennDiagram.2025-02-19_00-30-42.687886.log
    Untracked:  analysis/VennDiagram.2025-02-19_00-32-09.35965.log
    Untracked:  analysis/VennDiagram.2025-02-19_00-32-09.541393.log
    Untracked:  analysis/VennDiagram.2025-02-19_00-32-09.721318.log
    Untracked:  analysis/VennDiagram.2025-02-19_00-32-29.804779.log
    Untracked:  analysis/VennDiagram.2025-02-19_00-32-29.977269.log
    Untracked:  analysis/VennDiagram.2025-02-19_00-32-30.147694.log
    Untracked:  analysis/VennDiagram.2025-02-19_01-20-21.674127.log
    Untracked:  analysis/VennDiagram.2025-02-19_01-20-21.931179.log
    Untracked:  analysis/VennDiagram.2025-02-19_01-20-22.173368.log
    Untracked:  data/DEG_results/RNAi/All_no_rRNA/sva_scatter_SV1_SV2.png
    Untracked:  data/DEG_results/RNAi/All_no_rRNA/sva_scatter_SV1_SV3.png
    Untracked:  data/DEG_results/RNAi/All_no_rRNA/sva_scatter_SV2_SV3.png
    Untracked:  data/DEG_results/RNAi/All_no_rRNA/sva_stripchart_SV3.png
    Untracked:  data/RefSeq/
    Untracked:  data/orthofinder/Polyneoptera/Results_I5/Orthogroups/Orthogroups.tsv
    Untracked:  data/orthofinder/Polyneoptera/Results_I5/Orthogroups/Orthogroups.txt
    Untracked:  data/orthofinder/Polyneoptera/Results_I5/Orthogroups/Orthogroups_reprocessed.tsv
    Untracked:  data/orthofinder/Polyneoptera/Results_I5/Orthogroups/Orthogroups_reprocessed.txt
    Untracked:  data/orthofinder/Schistocerca/Results_I2/Orthogroups/Orthogroups.tsv
    Untracked:  data/orthofinder/Schistocerca/Results_I2/Orthogroups/Orthogroups.txt
    Untracked:  data/orthofinder/Schistocerca/Results_I2/Orthogroups/Orthogroups_reprocessed.tsv
    Untracked:  data/orthofinder/Schistocerca/Results_I2/Orthogroups/Orthogroups_reprocessed.txt

Unstaged changes:
    Modified:   analysis/3_deseq2-results.Rmd
    Modified:   analysis/3_overlap-venn.Rmd
    Modified:   analysis/4_RNAi_degs.Rmd
    Modified:   data/DEG_results/Bulk_RNAseq/All_species/PCA_VST_togregaria.png
    Modified:   data/DEG_results/Bulk_RNAseq/All_species/PCA_labelled_VST_togregaria.png
    Modified:   data/DEG_results/RNAi/All_no_rRNA/All_raw_counts.csv
    Modified:   data/DEG_results/RNAi/All_no_rRNA/Hex1_vs_GFP/DEG_results_Hex1_vs_GFP.csv
    Modified:   data/DEG_results/RNAi/All_no_rRNA/Hex1_vs_GFP/heatmap_plot_Hex1_vs_GFP.tiff
    Modified:   data/DEG_results/RNAi/All_no_rRNA/Hex1_vs_GFP/volcano_plot_Hex1_vs_GFP.tiff
    Modified:   data/DEG_results/RNAi/All_no_rRNA/Hex2_vs_GFP/DEG_results_Hex2_vs_GFP.csv
    Modified:   data/DEG_results/RNAi/All_no_rRNA/Hex2_vs_GFP/heatmap_plot_Hex2_vs_GFP.tiff
    Modified:   data/DEG_results/RNAi/All_no_rRNA/Hex2_vs_GFP/volcano_plot_Hex2_vs_GFP.tiff
    Modified:   data/DEG_results/RNAi/All_no_rRNA/JHMT_vs_GFP/DEG_results_JHMT_vs_GFP.csv
    Modified:   data/DEG_results/RNAi/All_no_rRNA/JHMT_vs_GFP/heatmap_plot_JHMT_vs_GFP.tiff
    Modified:   data/DEG_results/RNAi/All_no_rRNA/JHMT_vs_GFP/volcano_plot_JHMT_vs_GFP.tiff
    Modified:   data/DEG_results/RNAi/All_no_rRNA/MIOX_vs_GFP/DEG_results_MIOX_vs_GFP.csv
    Modified:   data/DEG_results/RNAi/All_no_rRNA/MIOX_vs_GFP/heatmap_plot_MIOX_vs_GFP.tiff
    Modified:   data/DEG_results/RNAi/All_no_rRNA/MIOX_vs_GFP/volcano_plot_MIOX_vs_GFP.tiff
    Modified:   data/DEG_results/RNAi/All_no_rRNA/PCA_Tissue_Gene_Label.png
    Modified:   data/DEG_results/RNAi/All_no_rRNA/PCA_Tissue_Gene_NoLabel.png
    Modified:   data/DEG_results/RNAi/All_no_rRNA/UNCH_vs_GFP/DEG_results_UNCH_vs_GFP.csv
    Modified:   data/DEG_results/RNAi/All_no_rRNA/UNCH_vs_GFP/heatmap_plot_UNCH_vs_GFP.tiff
    Modified:   data/DEG_results/RNAi/All_no_rRNA/UNCH_vs_GFP/volcano_plot_UNCH_vs_GFP.tiff
    Modified:   data/DEG_results/RNAi/All_no_rRNA/sva_stripchart_SV1.png
    Modified:   data/DEG_results/RNAi/All_no_rRNA/sva_stripchart_SV2.png
    Modified:   data/DEG_results/RNAi/Head/UNCH_vs_GFP/volcano_plot_UNCH_vs_GFP.tiff
    Modified:   data/DEG_results/RNAi/Head_no_rRNA/Hex1_vs_GFP/volcano_plot_Hex1_vs_GFP.tiff
    Modified:   data/DEG_results/RNAi/Head_no_rRNA/Hex2_vs_GFP/volcano_plot_Hex2_vs_GFP.tiff
    Modified:   data/DEG_results/RNAi/Head_no_rRNA/JHMT_vs_GFP/volcano_plot_JHMT_vs_GFP.tiff
    Modified:   data/DEG_results/RNAi/Head_no_rRNA/MIOX_vs_GFP/volcano_plot_MIOX_vs_GFP.tiff
    Modified:   data/DEG_results/RNAi/Head_no_rRNA/PCA_vst_Gene_labelled.png
    Modified:   data/DEG_results/RNAi/Head_no_rRNA/UNCH_vs_GFP/volcano_plot_UNCH_vs_GFP.tiff
    Modified:   data/DEG_results/RNAi/Head_no_rRNA/sva_stripchart_SV1.png
    Modified:   data/DEG_results/RNAi/Head_no_rRNA/sva_stripchart_SV2.png
    Modified:   data/DEG_results/RNAi/Head_no_rRNA/sva_stripchart_SV3.png
    Modified:   data/DEG_results/RNAi/Thorax/Hex1_vs_GFP/DEG_results_Hex1_vs_GFP.csv
    Modified:   data/DEG_results/RNAi/Thorax/Hex1_vs_GFP/heatmap_plot_Hex1_vs_GFP.tiff
    Modified:   data/DEG_results/RNAi/Thorax/Hex1_vs_GFP/volcano_plot_Hex1_vs_GFP.tiff
    Modified:   data/DEG_results/RNAi/Thorax/Hex2_vs_GFP/DEG_results_Hex2_vs_GFP.csv
    Modified:   data/DEG_results/RNAi/Thorax/Hex2_vs_GFP/heatmap_plot_Hex2_vs_GFP.tiff
    Modified:   data/DEG_results/RNAi/Thorax/Hex2_vs_GFP/volcano_plot_Hex2_vs_GFP.tiff
    Modified:   data/DEG_results/RNAi/Thorax/JHMT_vs_GFP/DEG_results_JHMT_vs_GFP.csv
    Modified:   data/DEG_results/RNAi/Thorax/JHMT_vs_GFP/heatmap_plot_JHMT_vs_GFP.tiff
    Modified:   data/DEG_results/RNAi/Thorax/JHMT_vs_GFP/volcano_plot_JHMT_vs_GFP.tiff
    Modified:   data/DEG_results/RNAi/Thorax/MIOX_vs_GFP/DEG_results_MIOX_vs_GFP.csv
    Modified:   data/DEG_results/RNAi/Thorax/MIOX_vs_GFP/heatmap_plot_MIOX_vs_GFP.tiff
    Modified:   data/DEG_results/RNAi/Thorax/MIOX_vs_GFP/volcano_plot_MIOX_vs_GFP.tiff
    Modified:   data/DEG_results/RNAi/Thorax/PCA_vst_Gene_labelled.png
    Modified:   data/DEG_results/RNAi/Thorax/UNCH_vs_GFP/DEG_results_UNCH_vs_GFP.csv
    Modified:   data/DEG_results/RNAi/Thorax/UNCH_vs_GFP/heatmap_plot_UNCH_vs_GFP.tiff
    Modified:   data/DEG_results/RNAi/Thorax/UNCH_vs_GFP/volcano_plot_UNCH_vs_GFP.tiff
    Modified:   data/DEG_results/RNAi/Thorax/sva_scatter_SV1_SV2.png
    Modified:   data/DEG_results/RNAi/Thorax/sva_scatter_SV1_SV3.png
    Modified:   data/DEG_results/RNAi/Thorax/sva_scatter_SV2_SV3.png
    Modified:   data/DEG_results/RNAi/Thorax/sva_stripchart_SV1.png
    Modified:   data/DEG_results/RNAi/Thorax/sva_stripchart_SV2.png
    Modified:   data/DEG_results/RNAi/Thorax/sva_stripchart_SV3.png

Note that any generated files, e.g. HTML, png, CSS, etc., are not included in this status report because it is ok for generated content to have uncommitted changes.

These are the previous versions of the repository in which changes were made to the R Markdown (analysis/4_RNAi_degs.Rmd) and HTML (docs/4_RNAi_degs.html) files. If you’ve configured a remote Git repository (see ?wflow_git_remote), click on the hyperlinks in the table below to view the files as they were in that past version.

File	Version	Author	Date	Message
Rmd	d7fa779	Maeva TECHER	2025-02-14	Update RNAi and overlap
html	d7fa779	Maeva TECHER	2025-02-14	Update RNAi and overlap
Rmd	e9e41d7	Maeva TECHER	2025-02-12	change layout RNAi
html	e9e41d7	Maeva TECHER	2025-02-12	change layout RNAi
Rmd	3746422	Maeva TECHER	2025-02-12	Add RNAi
html	3746422	Maeva TECHER	2025-02-12	Add RNAi

Following the overlap analysis of bulk tissue RNA-seq data from the whole head and thorax across all species, we selected a subset of differentially expressed genes between isolated and crowded individuals. The selection criteria were as follows:

Genes must be shared by at least two or three locust species.
Genes were ranked based on log fold change, prioritizing those with the highest absolute values (whether upregulated or downregulated in gregarious nymphs), and only genes with a significant corrected p-value were considered.
Genes with functional descriptions suggesting a role in phenotypic plasticity in other arthropods were prioritized.

A total of X genes were included in this list and used for functional validation to assess their impact on collective behavior and the transcriptome landscape of gregarious nymphs in the Desert Locust S. gregaria. Following RNAi probes engineering, only genes with a knockdown efficacy exceeding X% in both males and females were kept for further analysis.

Hypothesis: Genes that are highly differentiated between phases are part of the downstream molecular machinery responding to density changes. If these genes do not directly drive rapid behavioral changes, they may instead contribute to the maintenance of phase-specific traits. Disrupting their function could interfere with gene-gene interactions essential for stabilizing either the solitarious or gregarious phase, triggering compensatory maintenance mechanism.

1. RNAi probe engineering

For Seema to add her part

Candidate genes for RNAi (decided from literature): - LOC126355014: S. gregaria heat shock 70 kDa protein 4 - LOC126297585: S. gregaria cAMP-dependent protein kinase catalytic subunit 1 - LOC126284097: S. gregaria DNA (cytosine-5)-methyltransferase 3B-like (Dnmt3)

Candidate genes for RNAi (decided from DEG and overlap analysis): - LOC126336408 (Hex1): S. gregaria hexamerin-like, transcript variant X2 - LOC126334874 (Hex2): S. gregaria hexamerin-like - LOC126335148 (jhmt): S. gregaria juvenile hormone acid O-methyltransferase-like, transcript variant X1 - LOC126334877: S. gregaria allergen Cr-PI-like - LOC126268104 (unch): S. gregaria zona pellucida domain-containing protein miniature - LOC126277894 (miox): S. gregaria inositol oxygenase-like

LOC126335513: S. gregaria protein yellow-like
LOC126328344: S. gregaria protein takeout-like
LOC126272949: S. gregaria putative beta-carotene-binding protein
LOC126355774: S. gregaria cuticle protein 18.7-like

2. Behavioral assays

For Seema and Audelia to add their parts

3. Prepare OrgDB for S. gregaria

To prepare future query of gene annotations for enrichment analysis, we can choose to use R packages that dynamically query them from online resources. We attempted two methods here: one is to build an OrgDB project for S. gregaria using NCBI RefSeq, and the other is using blast2go evidence previously generated for the cross-species RNAseq.

The first method created close to 100 Gb worth of files to cache the NCBI and corresponding files for S. gregaria. However, due to errors with the NCBI genes, we preferred to opt for the second option. Below we present the code used:

library(AnnotationForge)
library(rtracklayer)
library(Biostrings)

# First attempt with NCBI data
makeOrgPackageFromNCBI(version = "0.1",
                       author = "Devon J. Boland <devonjboland@tamu.edu>",
                       maintainer = "Devon J. Boland <devonjboland@tamu.edu>",
                       outputDir = ".",
                       tax_id = "7010",
                       genus = "Schistocerca",
                       species = "gregaria")

# Create custom ORGdb project using blast2go evidence

ggtf <- import("/Users/maevatecher/Documents/GitHub/locust-comparative-genomics/data/RefSeq/GCF_023897955.1_iqSchGreg1.2_genomic.gtf")
gtf_df <- as.data.frame(ggtf)

protein_coding_genes <- gtf_df[which(gtf_df$gene_biotype == "protein_coding"), ]
protein_coding_genes <- protein_coding_genes[which(protein_coding_genes$source != "RefSeq"), ]
rownames(protein_coding_genes) <- NULL

gregariaSym <- protein_coding_genes[, c(10, 12, 13)]
gregariaSym$db_xref <- gsub("GeneID:", "", gregariaSym$db_xref)
colnames(gregariaSym) <- c("GID", "ENTREZ", "GENENAME")

gregariaChr <- protein_coding_genes[, c(10, 1)]
colnames(gregariaChr) <- c("GID", "CHROMOSOME")

# Removed predicted NCBI genes as they were causing errors with package, and not having appropriate information, or model confidence. Additionally, blast2GO assinged some of these EC codes over GO codes so they were removed
gregariaGO <- read.delim("/Users/maevatecher/Documents/GitHub/locust-comparative-genomics/data/list/GO_Annotations/blast2go_gregaria.annot.mgp_removed", sep = "\t", header = F)
colnames(gregariaGO) <- c("GID", "GO", "EVIDENCE")
gregariaGO$EVIDENCE <- "ISS"

gregariaGO <- gregariaGO[!grepl("EC:", gregariaGO$GO), ]  # remove rows containing EC annotation codes

custom_db_package <- "/Users/maevatecher/Documents/GitHub/locust-comparative-genomics/data/custom_sgregaria_orgdb"
dir.create(custom_db_package)

orgdb_df <- data.frame(
  organism = "Schistocerca gregaria",
  tax_id = "7010",
  genus = "Schistocerca",
  species = "gregaria",
  genome_build = "GCF_023897955.1_iqSchGreg1.2"
)

makeOrgPackage(gene_info=gregariaSym,
               chromosome=gregariaChr,
               go=gregariaGO,
               version="1.0.0",
               maintainer= "Devon J. Boland <devonjboland@tamu.edu>",
               author="Devon J. Boland <devonjboland@tamu.edu>",
               outputDir=custom_db_package,
               tax_id = "7010",
               genus = "Schistocerca",
               species = "gregaria",
               goTable="go",
               verbose=TRUE)

Do these two steps before to install the new package:

install.packages("remotes")  # Install remotes if not installed
remotes::install_local("/Users/maevatecher/Documents/GitHub/locust-comparative-genomics/data/custom_sgregaria_orgdb/org.Sgregaria.eg.db")

library("org.Sgregaria.eg.db")
keytypes(org.Sgregaria.eg.db)

 [1] "CHROMOSOME"  "ENTREZ"      "EVIDENCE"    "EVIDENCEALL" "GENENAME"   
 [6] "GID"         "GO"          "GOALL"       "ONTOLOGY"    "ONTOLOGYALL"

4. DEGs and enrichment of RNAi samples

The following results were obtained using the same RNA-seq workflow as the non-RNAi bulk tissue transcriptomics. This includes RNA extraction using Maxwell Promega simplyRNA tisse kit, RNA library preparation with the Illumina Total Stranded RNA kit with RiboDepletion, and short-read sequencing on an Illumina NovaSeq PE150 platform. Differentially expressed genes between GFP-injected controls and RNAi-injected last nymphal instar females of the gregarious phase were analyzed using DESeq2.

We start by loading all the required R packages with in particular DESeq2 for DEG analysis, biomaRt for pathway annotations and clusterProfiler for GO enrichment and visualization.

library("DESeq2")
library("ggplot2")
library("ggrepel")
library("ggConvexHull")
library("AnnotationHub")
library("ensembldb")
library("ComplexHeatmap")
library("RColorBrewer")
library("circlize")
library("EnhancedVolcano")
library("clusterProfiler")
library("sva")
library("cowplot")
library("ashr")
library("dplyr")
library("purrr")
library("httr2")
library("biomaRt")
library("rafalib")
library("DT")

workDir <- "/Users/maevatecher/Documents/GitHub/locust-comparative-genomics/data"
setwd(workDir)
allspecies_path <- file.path(workDir, "/list/13polyneoptera_geneid_ncbi.csv")
allspecies_df <- read.table(allspecies_path, sep = ",", header = TRUE, quote = "", fill = TRUE, stringsAsFactors = FALSE)

We also create ahead function that we will use to output graphs (thanks to Devon’s touch) as files and visible in line in this report.

######################################################################################## 
# DEGs FUNCTIONS
######################################################################################## 

create_output_dirs <- function(label) {
  dir.create(file.path(saveDir, label), showWarnings = FALSE)
  return()
}

######################################################################################## 

create_pca_plots <- function(norm.dds, saveDir, transformation = "vst", intgroup = "Condition", title = NULL) {
  
  # Ensure saveDir exists
  dir.create(saveDir, showWarnings = FALSE, recursive = TRUE)

  # Apply the requested transformation
  if (transformation == "vst") {
    vsd <- vst(dds, blind = FALSE)
  } else if (transformation == "rlog") {
    vsd <- rlog(dds, blind = FALSE)
  } else if (transformation == "log2") {
    vsd <- log2(counts(dds, normalized = TRUE) + 1)
  } else {
    stop("Invalid transformation type. Choose from 'vst', 'rlog', or 'log2'.")
  }

  # If no title is provided, create one dynamically
  if (is.null(title)) {
    title <- paste("PCA on", intgroup, "(", transformation, "transformation)")
  }

  # Construct filename prefix based on transformation & grouping
  file_prefix <- paste0("PCA_", transformation, "_", intgroup)

  # First PCA: **with labels**
  pca_labelled <- plotPCA(vsd, intgroup = intgroup) + 
    geom_text_repel(aes(label = rownames(colData(vsd))), size = 4, max.overlaps = 20) +
    geom_point(size = 3) +
    theme_bw() +
    theme(legend.title = element_blank(),
          legend.text = element_text(face = "bold", size = 16),
          axis.text = element_text(size = 12),
          axis.title = element_text(size = 12)) +
    ggtitle(title)

  # Save labelled PCA plot
  ggsave(paste0(saveDir, "/", file_prefix, "_labelled.png"), width = 10, height = 10, 
         dpi = 600, device = "png", plot = pca_labelled)

  # Second PCA: **Convex Hulls** around groups
  pca_hull <- plotPCA(vsd, intgroup = intgroup) +
    geom_point(size = 3) +
    theme_bw() +
    theme(legend.title = element_blank(),
          legend.text = element_text(face = "bold", size = 16),
          axis.text = element_text(size = 12),
          axis.title = element_text(size = 12)) + 
    geom_convexhull(aes(fill = .data[[intgroup]]), alpha = 0.5) +  # Fully dynamic grouping
    ggtitle(title, subtitle = paste0(transformation, " transformation"))

  # Save hull PCA plot
  ggsave(paste0(saveDir, "/", file_prefix, "_hull.png"), width = 10, height = 10, 
         dpi = 600, device = "png", plot = pca_hull)

  # Return plots for inline display in knitr/RMarkdown
  return(list(PCA_Labelled = pca_labelled, PCA_Hull = pca_hull))
}

######################################################################################## 

create_sva_plots <- function(svseq, dds, saveDir, intgroup = c("Tissue", "Gene"), max_sv = 3) {
  
  # Ensure output directory exists
  dir.create(saveDir, showWarnings = FALSE, recursive = TRUE)

  # **Create grouping factor dynamically**
  tissue_gene_groups <- interaction(dds[[intgroup[1]]], dds[[intgroup[2]]], drop = TRUE)
  unique_groups <- unique(tissue_gene_groups)
  
  # Assign colors per unique group
  group_colors <- setNames(colorRampPalette(brewer.pal(min(length(unique_groups), 8), "Set1"))(length(unique_groups)), unique_groups)

  # **First plot: Stripchart of first N surrogate variables**
  stripchart_list <- list()
  
  for (i in 1:max_sv) {
    sv_values <- svseq$sv[, i]
    
    p <- ggplot(data.frame(SV = sv_values, Group = tissue_gene_groups), aes(x = Group, y = SV, fill = Group)) +
      geom_jitter(shape = 21, size = 3, width = 0.2, color = "black") +
      scale_fill_manual(values = group_colors) +
      theme_minimal() +
      labs(title = paste0("Surrogate Variable ", i, " (SV", i, ") - Technical Variation"), 
           y = "SV Value") +
      theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
      geom_hline(yintercept = 0, linetype = "dashed", color = "gray")
    
    # Save each stripchart plot
    stripchart_file <- file.path(saveDir, paste0("sva_stripchart_SV", i, ".png"))
    ggsave(stripchart_file, plot = p, width = 10, height = 5, dpi = 300)
    
    stripchart_list[[i]] <- p
  }

  # **Second plot: SV scatter plots (pairwise comparisons)**
  scatter_list <- list()
  scatter_pairs <- list(c(1,2), c(1,3), c(2,3))  # Pairwise SV combinations
  
  # Define label colors
  gene_labels <- as.character(dds[[intgroup[2]]])
  unique_genes <- unique(gene_labels)
  gene_colors <- setNames(colorRampPalette(brewer.pal(min(length(unique_genes), 8), "Set1"))(length(unique_genes)), unique_genes)

  for (pair in scatter_pairs) {
    p <- ggplot(data.frame(SV1 = svseq$sv[, pair[1]], SV2 = svseq$sv[, pair[2]], Gene = gene_labels), 
                aes(x = SV1, y = SV2, color = Gene)) +
      geom_point(size = 3) +
      scale_color_manual(values = gene_colors) +
      theme_minimal() +
      labs(title = paste("SVA Analysis: SV", pair[1], " vs SV", pair[2]), 
           x = paste0("SV", pair[1]), y = paste0("SV", pair[2]))
    
    # Save each scatter plot
    scatter_file <- file.path(saveDir, paste0("sva_scatter_SV", pair[1], "_SV", pair[2], ".png"))
    ggsave(scatter_file, plot = p, width = 10, height = 5, dpi = 300)
    
    scatter_list[[paste(pair[1], pair[2], sep = "_")]] <- p
  }

  # **Return plots for knitr/RMarkdown**
  return(list(Stripcharts = stripchart_list, ScatterPlots = scatter_list))
}

######################################################################################## 

# Retrieve various accession IDs
get_sig_genes <- function(res) {
  sig_genes <- res[which(res$padj < 0.05 & abs(res$log2FoldChange)>=1.0), ]
  sig_genes <- sig_genes[order(sig_genes, decreasing = T), ]
  return(sig_genes)
}

######################################################################################## 

library(kableExtra)
library(DT)  # For interactive tables
library(dplyr)

generate_deg_table <- function(ddssva, contrast_name, allspecies_df, tresh_padj = 0.05, tresh_logfold = 1) {
  
  # --- Extract DESeq2 Results ---
  deg_results <- results(ddssva, name = contrast_name, alpha = tresh_padj)
  summary(deg_results)
  
  # Convert to DataFrame and retain GeneID
  deg_df <- as.data.frame(deg_results)
  deg_df$GeneID <- rownames(deg_df)

  # --- Filter Significant DEGs ---
  deg_df <- deg_df %>%
    filter(!is.na(padj) & padj < tresh_padj & abs(log2FoldChange) > tresh_logfold)  # Remove NA values and filter by thresholds
  
  # --- Merge with Metadata ---
  meta_deg_df <- merge(deg_df, allspecies_df, by = "GeneID", all.x = TRUE)

  # Ensure GeneType is retained and replace NA values
  if (!"GeneType" %in% colnames(meta_deg_df)) {
    message("GeneType column missing, filling with 'Unknown'")
    meta_deg_df$GeneType <- "Unknown"
  }
  meta_deg_df$GeneType[is.na(meta_deg_df$GeneType)] <- "Unknown"

  # Select and reorder relevant columns
  meta_deg_df <- meta_deg_df %>%
    dplyr::select(GeneID, GeneType, Description, Species, 
                  baseMean, log2FoldChange, lfcSE, stat, pvalue, padj)

  # Round numeric columns
  numeric_cols <- c("baseMean", "log2FoldChange", "lfcSE", "stat", "pvalue", "padj")
  meta_deg_df[numeric_cols] <- round(meta_deg_df[numeric_cols], 2)

  # --- Apply Row Styling for Visualization ---
  meta_deg_df$row_color <- ifelse(meta_deg_df$log2FoldChange > 1, "red", 
                                  ifelse(meta_deg_df$log2FoldChange < -1, "blue", "black"))

  # --- Create Searchable DataTable with Row Coloring ---
  deg_kable <- datatable(meta_deg_df, options = list(
    pageLength = 10, scrollX = TRUE, autoWidth = TRUE, searchHighlight = TRUE
  ),
  rownames = FALSE, escape = FALSE,
  caption = paste("DEG Table:", contrast_name)  # Title added here
  ) %>%
  formatStyle(
    columns = names(meta_deg_df),  # Apply styling to all columns
    target = 'row',
    backgroundColor = styleEqual(c("red", "blue", "black"), c("#FFDDDD", "#DDDDFF", "white")),  # Light red for up, light blue for down
    color = styleEqual(c("red", "blue", "black"), c("red", "blue", "black")), 
    fontWeight = styleEqual(c("bold", "normal"), c("bold", "normal"))
  ) %>%
  formatStyle(
    'Species', target = 'cell', fontStyle = 'italic'
  )

  # --- Return Both Raw and Interactive Table ---
  return(list(
    meta_results = meta_deg_df, 
    kable_table = deg_kable
  ))
}


######################################################################################## 

create_volcano <- function(res, label) {
  mypalette <- brewer.pal(9, "Set1")
  volcano <-EnhancedVolcano(res,
                            lab=rownames(res),
                            x='log2FoldChange',
                            y='padj',
                            title=paste("Volcano Plot:", label),
                            col=c(mypalette[9], mypalette[3], mypalette[2], 
                                  mypalette[1]),
                            labSize = 4,
                            pCutoff = 0.05,
                            FCcutoff = 1,
                            pointSize = 3,
                            drawConnectors = T,
                            widthConnectors = 0.5,
                            colConnectors = "black",
                            max.overlaps = 25,
                            gridlines.major = F,
                            gridlines.minor = F)
  # Save plot at TIFF
  ggsave(paste0(saveDir, "/", label,"/volcano_plot_",label,".tiff"), device = "tiff",
         plot = volcano, width = 10, height = 10)
  
  # Retrurn the plot for inline display
  return(volcano)
}

######################################################################################## 

create_heatmap <- function(res, label, contrast_) {
  mat <- counts(dds, normalized = TRUE)
  mat.z <- t(apply(mat, 1, scale))
  colnames(mat.z) <- colnames(mat)
  mat.z <- mat.z[rownames(res), contrast_, drop = FALSE]
  rownames(mat.z) <- rownames(res)
  
  # Create the heatmap
  heatmap_plot <- Heatmap(mat.z,
                          cluster_rows = TRUE,
                          cluster_columns = FALSE,
                          column_labels = contrast_,
                          name = "Z-Transformed Counts",
                          row_labels = rownames(mat.z),
                          row_names_gp = gpar(fontsize = 8))
  
  # Save in TIFF
  tiff(paste0(saveDir, "/", label, "/heatmap_plot_", label, ".tiff"),
       units = "in", res = 300, width = 5, height = 10)
  draw(heatmap_plot)
  dev.off()

  # Return the heatmap object for inline display
  return(heatmap_plot)
}

######################################################################################## 

visualize_data <- function(res, label, contrast_) {
  sig_genes <- get_sig_genes(res)
  create_output_dirs(label)
  
 # Save results
  write.csv(as.data.frame(sig_genes),
            paste0(saveDir, "/", label, "/DEG_results_", label, ".csv"))
  
# Generate and display plots
  volcano_plot <- create_volcano(res, label)
  heatmap_plot <- create_heatmap(sig_genes, label, contrast_)

  # Return plots for knitr inline visualization
  list(volcano = volcano_plot, heatmap = heatmap_plot)
}

######################################################################################## 
# ANNOTATION PART
######################################################################################## 

get_ids <- function(res) {
  rownames(res) <- as.character(rownames(res))
  res$ensembl_gene_id <- row.names(res)
  annotations <- getBM(attributes = c("ensembl_gene_id", "geneid"),
                       filters = "ensembl_gene_id",
                       values = rownames(res),
                       mart = dataset)
  return(annotations$geneid)
}

######################################################################################## 

GOMFEnrichment <- function(res, label) {
  # Check if there are valid gene IDs
  if (!is.null(res)) {
    
    # Perform GO enrichment analysis
    ego <- enrichGO(
      gene = rownames(res),
      OrgDb = org.Sgregaria.eg.db,
      keyType = "GID",
      ont = "MF",  # Cellular Component
      pAdjustMethod = "BH",  # Benjamini-Hochberg adjustment
      pvalueCutoff = 0.1
    )
    
    # Check if the result has any significant enrichment terms
    if (nrow(as.data.frame(ego)) > 0) {
      # Create the barplot
      go_barplot <- barplot(ego, showCategory = 20) +  # Show top 20 categories
        ggtitle(paste("GO MF Enrichment:", label))
      
      # Print the plot
      ggsave(paste0(saveDir, "/", label,"/gp_MF_barplot_",label,".tiff"), device = "tiff",
             plot=go_barplot, width=10, height = 10)
      
      change_vec <- res$log2FoldChange
      names(change_vec) <- rownames(res)
      RYD  = brewer.pal(n = 8, name = "RdBu")
      go_network <- cnetplot(ego, foldChange=change_vec) + 
        scale_color_gradientn(colours = RYD, limits=c(-2,2))
      ggsave(paste0(saveDir, "/", label,"/gp_MF_cnetplot_",label,".tiff"), device = "tiff",
             plot=go_network, width=30, height = 30, bg = "white")
      
      write.csv(as.data.frame(ego), paste0(saveDir, "/", label,
                                           "/GO_MF_Enrichment_Results_", label,".csv"))
    } else {
      message("No significant MF GO terms found.")
    }
    
  } else {
    message("No valid gene IDs found.")
  }
  return()
}

######################################################################################## 

GOCCEnrichment <- function(res, label) {
  # Check if there are valid gene IDs
  if (!is.null(res)) {
    
    # Perform GO enrichment analysis
    ego <- enrichGO(
      gene = rownames(res),
      OrgDb = org.Sgregaria.eg.db,
      keyType = "GID",
      ont = "CC",  # Cellular Component
      pAdjustMethod = "BH",  # Benjamini-Hochberg adjustment
      pvalueCutoff = 0.1
    )
    
    # Check if the result has any significant enrichment terms
    if (nrow(as.data.frame(ego)) > 0) {
      # Create the barplot
      go_barplot <- barplot(ego, showCategory = 20) +  # Show top 20 categories
        ggtitle(paste("GO CC Enrichment:", label))
      
      # Print the plot
      ggsave(paste0(saveDir, "/", label,"/gp_CC_barplot_",label,".tiff"), device = "tiff",
             plot=go_barplot, width=10, height = 10)
      
      change_vec <- res$log2FoldChange
      names(change_vec) <- rownames(res)
      RYD  = brewer.pal(n = 8, name = "RdBu")
      go_network <- cnetplot(ego, foldChange=change_vec) + 
        scale_color_gradientn(colours = RYD, limits=c(-2,2))
      ggsave(paste0(saveDir, "/", label,"/gp_CC_cnetplot_",label,".tiff"), device = "tiff",
             plot=go_network, width=30, height = 30, bg = "white")
      
      write.csv(as.data.frame(ego), paste0(saveDir, "/", label,
                                           "/GO_CC_Enrichment_Results_", label,".csv"))
    } else {
      message("No significant CC GO terms found.")
    }
    
  } else {
    message("No valid gene IDs found.")
  }
  return()
}

######################################################################################## 

GOBPEnrichment <- function(res, label) {
  # Check if there are valid gene IDs
  if (!is.null(res)) {
    
    # Perform GO enrichment analysis
    ego <- enrichGO(
      gene = rownames(res),
      OrgDb = org.Sgregaria.eg.db,
      keyType = "GID",
      ont = "BP",  # Cellular Component
      pAdjustMethod = "BH",  # Benjamini-Hochberg adjustment
      pvalueCutoff = 0.1
    )
    
    # Check if the result has any significant enrichment terms
    if (nrow(as.data.frame(ego)) > 0) {
      # Create the barplot
      go_barplot <- barplot(ego, showCategory = 20) +  # Show top 20 categories
        ggtitle(paste("GO BP Enrichment:", label))
      
      # Print the plot
      ggsave(paste0(saveDir, "/", label,"/gp_BP_barplot_",label,".tiff"), device = "tiff",
             plot=go_barplot, width=10, height = 10)
      
      change_vec <- res$log2FoldChange
      names(change_vec) <- rownames(res)
      RYD  = brewer.pal(n = 8, name = "RdBu")
      go_network <- cnetplot(ego, foldChange=change_vec) + 
        scale_color_gradientn(colours = RYD, limits=c(-2,2))
      ggsave(paste0(saveDir, "/", label,"/gp_BP_cnetplot_",label,".tiff"), device = "tiff",
             plot=go_network, width=30, height = 30, bg="white")
      
      write.csv(as.data.frame(ego), paste0(saveDir, "/", label,
                                           "/GO_BP_Enrichment_Results_", label,".csv"))
    } else {
      message("No significant BP GO terms found.")
    }
    
  } else {
    message("No valid gene IDs found.")
  }
  return()
}

######################################################################################## 

KEGGEnrichment <- function(res, label) {
  # Check if there are valid gene IDs
  if (!is.null(res)) {
    
    kk <- enrichKEGG(gene = rownames(res),
                     organism = "sgre",
                     pvalueCutoff = 0.1)
    
    # Check if the result has any significant enrichment terms
    if (nrow(as.data.frame(kk)) > 0) {
      
      kk_barplot <- barplot(kk) + ggtitle(paste("KEGG Enrichment:", label))
      ggsave(paste0(saveDir, "/", label,"/kk_barplot_",label,".tiff"), device = "tiff",
             plot=kk_barplot, width=10, height = 10)
      
      change_vec <- res$log2FoldChange
      names(change_vec) <- rownames(res)
      RYD  = brewer.pal(n = 8, name = "RdBu")
      kk_network <- cnetplot(kk, foldChange=change_vec) + 
        scale_color_gradientn(colours = RYD, limits=c(-2,2))
      ggsave(paste0(saveDir, "/", label,"/kk_cnetplot_",label,".tiff"), device = "tiff",
             plot=kk_network, width=30, height = 30, bg = "white")
      
      write.csv(as.data.frame(kk), paste0(saveDir,  "/",label,
                                          "/KEGG_Enrichment_Results_",
                                          label,".csv"))
    } else {
      message("No significant KEGG terms found.")
    }
    
  } else {
    message("No valid gene IDs found.")
  }
  return()
}

######################################################################################## 

enrich_data <- function(res, label, contrast_) {
  sig_genes <- get_sig_genes(res)
  create_output_dirs(label)
  GOMFEnrichment(sig_genes, label)
  GOBPEnrichment(sig_genes, label)
  GOCCEnrichment(sig_genes, label)
  KEGGEnrichment(sig_genes, label)
  return()
}

########################################################################################

All genes included

All tissue

Minor changes here are made compared to the DESeq2 results regarding the importation of samples to transform into a matrix. Sample names are structured as follow: {Sg}{gene}{#} {Sg} = Schistocerca gregaria {gene} = gene abbreviation gfp, hex1, hex2, jhmt, miox and unch H/T{#} = biological replicate

saveDir <- paste0(workDir,"/DEG_results/RNAi/All")
dir.create(saveDir)
### Prepare Sample CSV file #####
samples <- read.delim(file.path(workDir, "list/RNAi/All_RNAisample_list.csv"), sep = ",", row.names = 1, header = TRUE)
files <- file.path(workDir, "readcounts/RNAi/", samples$Tissue, samples$Filename)
names(files) <- row.names(samples)
all(file.exists(files))

[1] TRUE

### Create count sample matrix
cts <- map_dfc(files, function(sample) {
  data_count <- read.delim(sample, sep = "\t", header = FALSE)
  col_name <- gsub("_counts.txt", "", basename(sample)) 
  setNames(data.frame(data_count[, 2]), col_name)
})

row_get <- read.delim(files[1], sep = "\t", row.names = 1, header = F) # Get proper row names
rownames(cts) <- rownames(row_get)
rm(row_get) # remove unused object from memory

While for bulk RNAseq on head and thorax for all species, the DEGs model was made between isolated and crowded individuals (with isolated as the reference state), here, the DEG analysis will be carried between GFP knock-down nymphs (as reference state) vs Hexamerins / Juvenile Hormones / Uncharacterized proteins.

### Build DESeq2 Object
dds <- DESeqDataSetFromMatrix(countData = cts,
                              colData = samples,
                              design = ~ Gene)
dds$Gene <- relevel(dds$Gene, ref = "GFP")

smallestGroupSize <- 5
keep <- rowSums(counts(dds) >= 10) >= smallestGroupSize
dds <- dds[keep,]

dds <- DESeq(dds)

Following the generation of the DEseq2 object, we annotate the genes with the GeneID using biomaRt.

### Fetch Annotation Gene IDs using biomaRt
ensembl <- useMart("metazoa_mart", host = "https://metazoa.ensembl.org")
metazoa_list <- listDatasets(ensembl)
dataset <- useMart("metazoa_mart", dataset = "sggca023897955v2rs_eg_gene",
                   host = "https://metazoa.ensembl.org")
#listAttributes(dataset)

test_raw_counts <- as.data.frame(counts(dds))
rownames(test_raw_counts) <- as.character(rownames(test_raw_counts))
test_raw_counts$ensembl_gene_id <- row.names(test_raw_counts)
annotations <- getBM(attributes = c("ensembl_gene_id", "geneid"),
                     filters = "ensembl_gene_id",
                     values = rownames(test_raw_counts),
                     mart = dataset)

# Merge dataframes to retain geneid information from biomaRt
test_raw_counts_annotated <- merge(test_raw_counts, annotations,
                                   by = "ensembl_gene_id",
                                   all.x = T)

write.csv(test_raw_counts_annotated, file=paste0(saveDir,"/All_raw_counts.csv"))

Normalization and PCA

# Plot PCA and investigate quality metrics
vsd <- vst(dds, blind = TRUE) 

# Perform PCA
pca_data <- plotPCA(vsd, intgroup = c("Tissue", "Gene"), returnData = TRUE)

# Define colors for genes (slightly transparent) and shapes for tissues
gene_colors <- scale_color_manual(values = alpha(brewer.pal(n = length(unique(pca_data$Gene)), name = "Set1"), 0.8))  # Points are transparent
tissue_shapes <- scale_shape_manual(values = seq(15, 15 + length(unique(pca_data$Tissue))))

# **PCA without labels**
p_pca_nolabel <- ggplot(pca_data, aes(x = PC1, y = PC2, color = Gene, shape = Tissue)) +
  geom_point(size = 4) +
  gene_colors + 
  tissue_shapes +
  theme_bw() +
  theme(legend.title = element_blank(),
        legend.text = element_text(face = "bold", size = 14),
        axis.text = element_text(size = 12),
        axis.title = element_text(size = 14)) +
  ggtitle("PCA of All Tissues (No Labels)", subtitle = "Tissues differentiated by shape, Genes by color")

# Save PCA without labels
ggsave(paste0(saveDir, "/PCA_Tissue_Gene_NoLabel.png"), plot = p_pca_nolabel, width = 10, height = 10, dpi = 600, device = "png")

# **PCA with labels**
p_pca_label <- ggplot(pca_data, aes(x = PC1, y = PC2, color = Gene, shape = Tissue)) +
  geom_point(size = 4) +
  geom_text_repel(aes(label = name), size = 4, color = "black", max.overlaps = 20) +  # Labels are fully visible
  gene_colors + 
  tissue_shapes +
  theme_bw() +
  theme(legend.title = element_blank(),
        legend.text = element_text(face = "bold", size = 14),
        axis.text = element_text(size = 12),
        axis.title = element_text(size = 14)) +
  ggtitle("PCA of All Tissues (With Labels)", subtitle = "Tissues differentiated by shape, Genes by color")

# Save PCA with labels
ggsave(paste0(saveDir, "/PCA_Tissue_Gene_Label.png"), plot = p_pca_label, width = 10, height = 10, dpi = 600, device = "png")

# **Return plots for knitr/RMarkdown**
list(NoLabel = p_pca_nolabel, WithLabel = p_pca_label)

$NoLabel

Version	Author	Date
d7fa779	Maeva TECHER	2025-02-14


$WithLabel

Version	Author	Date
d7fa779	Maeva TECHER	2025-02-14

The PCA plot shows clear distinction between tissue types, while gene silencing has a large variation within each tissue, and presents no distinct clear groupings for a single gene.

SVA

### SVA analysis to control for technical variation 
dat  <- counts(dds, normalized = TRUE)
idx  <- rowMeans(dat) > 1
dat  <- dat[idx, ]
mod  <- model.matrix(~ Gene, colData(dds))
mod0 <- model.matrix(~ 1, colData(dds))
svseq <- svaseq(dat, mod, mod0)

Number of significant surrogate variables is:  3 
Iteration (out of 5 ):1  2  3  4  5

sva_plots <- create_sva_plots(svseq, dds, saveDir, intgroup = c("Tissue", "Gene"))

# Show stripcharts in the report
sva_plots$Stripcharts[[1]]  # Show first stripchart

Version	Author	Date
d7fa779	Maeva TECHER	2025-02-14

sva_plots$Stripcharts[[2]]  # Show second stripchart

Version	Author	Date
d7fa779	Maeva TECHER	2025-02-14

sva_plots$Stripcharts[[3]]  # Show third stripchart

Version	Author	Date
d7fa779	Maeva TECHER	2025-02-14

# Show scatter plots in the report
sva_plots$ScatterPlots[["1_2"]]  # Show SV1 vs SV2

Version	Author	Date
d7fa779	Maeva TECHER	2025-02-14

sva_plots$ScatterPlots[["1_3"]]  # Show SV1 vs SV3

Version	Author	Date
d7fa779	Maeva TECHER	2025-02-14

sva_plots$ScatterPlots[["2_3"]]  # Show SV2 vs SV3

Version	Author	Date
d7fa779	Maeva TECHER	2025-02-14

SV1 is clearly showing an effect of tissue. We rerun the DESeq2 model but this time including the surrogate variable SV2 and SV3 as a covariates only, as we know that the modeled variation is more likely explained by tissue and gene variation rather than batch effects.

ddssva <- dds
ddssva$SV2 <- svseq$sv[,2]
ddssva$SV3 <- svseq$sv[,3]
design(ddssva) <- ~ SV2 + SV3 + Gene
ddssva$Gene <- relevel(ddssva$Gene, ref = "GFP")

smallestGroupSize <- 5
keep <- rowSums(counts(ddssva) >= 10) >= smallestGroupSize
ddssva <- ddssva[keep,]

ddssva <- DESeq(ddssva)

ddssva <- ddssva[which(mcols(ddssva)$betaConv),] # remove non converging rows

### Extract results
resultsNames(ddssva)

[1] "Intercept"        "SV2"              "SV3"              "Gene_HEX1_vs_GFP"
[5] "Gene_HEX2_vs_GFP" "Gene_JHMT_vs_GFP" "Gene_MIOX_vs_GFP" "Gene_UNCH_vs_GFP"

hex1 <- results(ddssva, name = "Gene_HEX1_vs_GFP", alpha = 0.05)
hex2 <- results(ddssva, name = "Gene_HEX2_vs_GFP", alpha = 0.05)
jhmt <- results(ddssva, name = "Gene_JHMT_vs_GFP", alpha = 0.05)
miox <- results(ddssva, name = "Gene_MIOX_vs_GFP", alpha = 0.05)
unch <- results(ddssva, name = "Gene_UNCH_vs_GFP", alpha = 0.05)

Volcano plots and Heatmaps

First we create function to generate the plots we are interested to obtain and then run the whole pipeline for each gene.

# Define contrast_sets
hex1_samples <- c("SggfpH1","SggfpH2","SggfpH3","SggfpH4","SggfpH5",
                  "Sghex1H1","Sghex1H2","Sghex1H3","Sghex1H4","Sghex1H5",
                  "SggfpT1","SggfpT2","SggfpT3","SggfpT4","SggfpT5",
                  "Sghex1T1","Sghex1T2","Sghex1T3","Sghex1T4","Sghex1T5")
hex2_samples <- c("SggfpH1","SggfpH2","SggfpH3","SggfpH4","SggfpH5",
                  "Sghex2H1","Sghex2H2","Sghex2H3","Sghex2H4","Sghex2H5",
                  "SggfpT1","SggfpT2","SggfpT3","SggfpT4","SggfpT5",
                  "Sghex2T1","Sghex2T2","Sghex2T3","Sghex2T4","Sghex2T5")
jhmt_samples <- c("SggfpH1","SggfpH2","SggfpH3","SggfpH4","SggfpH5",
                  "SgjhmtH1","SgjhmtH2","SgjhmtH3","SgjhmtH4","SgjhmtH5",
                  "SggfpT1","SggfpT2","SggfpT3","SggfpT4","SggfpT5",
                  "SgjhmtT1","SgjhmtT2","SgjhmtT3","SgjhmtT4","SgjhmtT5")
miox_samples <- c("SggfpH1","SggfpH2","SggfpH3","SggfpH4","SggfpH5",
                  "SgmioxH1","SgmioxH2","SgmioxH3","SgmioxH4","SgmioxH5",
                  "SggfpT1","SggfpT2","SggfpT3","SggfpT4","SggfpT5",
                  "SgmioxT1","SgmioxT2","SgmioxT3","SgmioxT4","SgmioxT5")
unch_samples <- c("SggfpH1","SggfpH2","SggfpH3","SggfpH4","SggfpH5",
                  "SgunchH1","SgunchH2","SgunchH3","SgunchH4","SgunchH5",
                  "SggfpT1","SggfpT2","SggfpT3","SggfpT4","SggfpT5",
                  "SgunchT1","SgunchT2","SgunchT3","SgunchT4","SgunchT5")

# Run full analysis
hex1_plots <- visualize_data(hex1, "Hex1_vs_GFP", hex1_samples)
hex2_plots <- visualize_data(hex2, "Hex2_vs_GFP", hex2_samples)
jhmt_plots <- visualize_data(jhmt, "JHMT_vs_GFP", jhmt_samples)
miox_plots <- visualize_data(miox, "MIOX_vs_GFP", miox_samples)
unch_plots <- visualize_data(unch, "UNCH_vs_GFP", unch_samples)

hex1_plots$volcano; hex1_plots$heatmap

Version	Author	Date
d7fa779	Maeva TECHER	2025-02-14

Version	Author	Date
d7fa779	Maeva TECHER	2025-02-14

hex2_plots$volcano; hex2_plots$heatmap

Version	Author	Date
d7fa779	Maeva TECHER	2025-02-14

Version	Author	Date
d7fa779	Maeva TECHER	2025-02-14

jhmt_plots$volcano; jhmt_plots$heatmap

Version	Author	Date
d7fa779	Maeva TECHER	2025-02-14

Version	Author	Date
d7fa779	Maeva TECHER	2025-02-14

miox_plots$volcano; miox_plots$heatmap

Version	Author	Date
d7fa779	Maeva TECHER	2025-02-14

Version	Author	Date
d7fa779	Maeva TECHER	2025-02-14

unch_plots$volcano; unch_plots$heatmap

Version	Author	Date
d7fa779	Maeva TECHER	2025-02-14

Version	Author	Date
d7fa779	Maeva TECHER	2025-02-14

DEG table

# Generate tables for each contrast
table_hex1 <- generate_deg_table(ddssva, "Gene_HEX1_vs_GFP", allspecies_df)


out of 16363 with nonzero total read count
adjusted p-value < 0.05
LFC > 0 (up)       : 896, 5.5%
LFC < 0 (down)     : 993, 6.1%
outliers [1]       : 0, 0%
low counts [2]     : 1269, 7.8%
(mean count < 9)
[1] see 'cooksCutoff' argument of ?results
[2] see 'independentFiltering' argument of ?results

table_hex2 <- generate_deg_table(ddssva, "Gene_HEX2_vs_GFP", allspecies_df)


out of 16363 with nonzero total read count
adjusted p-value < 0.05
LFC > 0 (up)       : 522, 3.2%
LFC < 0 (down)     : 541, 3.3%
outliers [1]       : 0, 0%
low counts [2]     : 1587, 9.7%
(mean count < 11)
[1] see 'cooksCutoff' argument of ?results
[2] see 'independentFiltering' argument of ?results

table_jhmt <- generate_deg_table(ddssva, "Gene_JHMT_vs_GFP", allspecies_df)


out of 16363 with nonzero total read count
adjusted p-value < 0.05
LFC > 0 (up)       : 371, 2.3%
LFC < 0 (down)     : 672, 4.1%
outliers [1]       : 0, 0%
low counts [2]     : 318, 1.9%
(mean count < 6)
[1] see 'cooksCutoff' argument of ?results
[2] see 'independentFiltering' argument of ?results

table_miox <- generate_deg_table(ddssva, "Gene_MIOX_vs_GFP", allspecies_df)


out of 16363 with nonzero total read count
adjusted p-value < 0.05
LFC > 0 (up)       : 71, 0.43%
LFC < 0 (down)     : 131, 0.8%
outliers [1]       : 0, 0%
low counts [2]     : 0, 0%
(mean count < 1)
[1] see 'cooksCutoff' argument of ?results
[2] see 'independentFiltering' argument of ?results

table_unch <- generate_deg_table(ddssva, "Gene_UNCH_vs_GFP", allspecies_df)


out of 16363 with nonzero total read count
adjusted p-value < 0.05
LFC > 0 (up)       : 213, 1.3%
LFC < 0 (down)     : 337, 2.1%
outliers [1]       : 0, 0%
low counts [2]     : 0, 0%
(mean count < 1)
[1] see 'cooksCutoff' argument of ?results
[2] see 'independentFiltering' argument of ?results

table_hex1$kable_table

table_hex2$kable_table

table_jhmt$kable_table

table_miox$kable_table

table_unch$kable_table

Head tissue

saveDir <- paste0(workDir,"/DEG_results/RNAi/Head")
dir.create(saveDir)
### Prepare Sample CSV file #####
samples <- read.delim(file.path(workDir, "list/RNAi/Head_RNAisample_list.csv"), sep = ",", row.names = 1, header = TRUE)
files <- file.path(workDir, "readcounts/RNAi/", samples$Tissue, samples$Filename)
names(files) <- row.names(samples)
all(file.exists(files))

[1] TRUE

### Create count sample matrix
cts <- map_dfc(files, function(sample) {
  data_count <- read.delim(sample, sep = "\t", header = FALSE)
  col_name <- gsub("_counts.txt", "", basename(sample)) 
  setNames(data.frame(data_count[, 2]), col_name)
})

row_get <- read.delim(files[1], sep = "\t", row.names = 1, header = F) # Get proper row names
rownames(cts) <- rownames(row_get)
rm(row_get) # remove unused object from memory

### Build DESeq2 Object
dds <- DESeqDataSetFromMatrix(countData = cts,
                              colData = samples,
                              design = ~ Gene)
dds$Gene <- relevel(dds$Gene, ref = "GFP")

smallestGroupSize <- 5
keep <- rowSums(counts(dds) >= 10) >= smallestGroupSize
dds <- dds[keep,]

dds <- DESeq(dds)

Following the generation of the DEseq2 object, we annotate the genes with the GeneID using biomaRt.

### Fetch Annotation Gene IDs using biomaRt
ensembl <- useMart("metazoa_mart", host = "https://metazoa.ensembl.org")
metazoa_list <- listDatasets(ensembl)
dataset <- useMart("metazoa_mart", dataset = "sggca023897955v2rs_eg_gene",
                   host = "https://metazoa.ensembl.org")
#listAttributes(dataset)

test_raw_counts <- as.data.frame(counts(dds))
rownames(test_raw_counts) <- as.character(rownames(test_raw_counts))
test_raw_counts$ensembl_gene_id <- row.names(test_raw_counts)
annotations <- getBM(attributes = c("ensembl_gene_id", "geneid"),
                     filters = "ensembl_gene_id",
                     values = rownames(test_raw_counts),
                     mart = dataset)

# Merge dataframes to retain geneid information from biomaRt
test_raw_counts_annotated <- merge(test_raw_counts, annotations,
                                   by = "ensembl_gene_id",
                                   all.x = T)

write.csv(test_raw_counts_annotated, file=paste0(saveDir,"/Head_raw_counts.csv"))

Normalization and PCA

# Plot PCA and investigate quality metrics
vsd <- vst(dds, blind=T)

pca_results <- create_pca_plots(norm.dds = vsd, saveDir, transformation = "vst", intgroup = "Gene")
pca_results$PCA_Labelled

Version	Author	Date
d7fa779	Maeva TECHER	2025-02-14
3746422	Maeva TECHER	2025-02-12

pca_results$PCA_Hull

Version	Author	Date
d7fa779	Maeva TECHER	2025-02-14
3746422	Maeva TECHER	2025-02-12

SVA

### SVA analysis to control for technical variation 
dat  <- counts(dds, normalized = TRUE)
idx  <- rowMeans(dat) > 1
dat  <- dat[idx, ]
mod  <- model.matrix(~ Gene, colData(dds))
mod0 <- model.matrix(~ 1, colData(dds))
svseq <- svaseq(dat, mod, mod0)

Number of significant surrogate variables is:  3 
Iteration (out of 5 ):1  2  3  4  5

sva_plots <- create_sva_plots(svseq, dds, saveDir, intgroup = c("Tissue", "Gene"))

# Show stripcharts in the report
sva_plots$Stripcharts[[1]]  # Show first stripchart

Version	Author	Date
d7fa779	Maeva TECHER	2025-02-14
3746422	Maeva TECHER	2025-02-12

sva_plots$Stripcharts[[2]]  # Show second stripchart

Version	Author	Date
d7fa779	Maeva TECHER	2025-02-14
3746422	Maeva TECHER	2025-02-12

sva_plots$Stripcharts[[3]]  # Show third stripchart

Version	Author	Date
d7fa779	Maeva TECHER	2025-02-14

# Show scatter plots in the report
sva_plots$ScatterPlots[["1_2"]]  # Show SV1 vs SV2

Version	Author	Date
d7fa779	Maeva TECHER	2025-02-14

sva_plots$ScatterPlots[["1_3"]]  # Show SV1 vs SV3

Version	Author	Date
d7fa779	Maeva TECHER	2025-02-14

sva_plots$ScatterPlots[["2_3"]]  # Show SV2 vs SV3

Version	Author	Date
d7fa779	Maeva TECHER	2025-02-14

We rerun the DESeq2 model but this time including the surrogate variable as a covariate, as we know that the modeled variation is more likely explained by tissue and gene variation rather than batch effects.

ddssva <- dds
ddssva$SV1 <- svseq$sv[,1]
ddssva$SV2 <- svseq$sv[,2]
ddssva$SV3 <- svseq$sv[,3]
design(ddssva) <- ~ SV1 + SV2 + SV3 + Gene
ddssva$Gene <- relevel(ddssva$Gene, ref = "GFP")

smallestGroupSize <- 5
keep <- rowSums(counts(ddssva) >= 10) >= smallestGroupSize
ddssva <- ddssva[keep,]

ddssva <- DESeq(ddssva)

ddssva <- ddssva[which(mcols(ddssva)$betaConv),] # remove non converging rows

### Extract results
resultsNames(ddssva)

[1] "Intercept"        "SV1"              "SV2"              "SV3"             
[5] "Gene_HEX1_vs_GFP" "Gene_HEX2_vs_GFP" "Gene_JHMT_vs_GFP" "Gene_MIOX_vs_GFP"
[9] "Gene_UNCH_vs_GFP"

hex1 <- results(ddssva, name = "Gene_HEX1_vs_GFP", alpha = 0.05)
hex2 <- results(ddssva, name = "Gene_HEX2_vs_GFP", alpha = 0.05)
jhmt <- results(ddssva, name = "Gene_JHMT_vs_GFP", alpha = 0.05)
miox <- results(ddssva, name = "Gene_MIOX_vs_GFP", alpha = 0.05)
unch <- results(ddssva, name = "Gene_UNCH_vs_GFP", alpha = 0.05)

Volcano plots and Heatmaps

First we create function to generate the plots we are interested to obtain and then run the whole pipeline for each gene.

# Define contrast_sets
hex1_samples <- c("SggfpH1","SggfpH2","SggfpH3","SggfpH4","SggfpH5",
                  "Sghex1H1","Sghex1H2","Sghex1H3","Sghex1H4","Sghex1H5")
hex2_samples <- c("SggfpH1","SggfpH2","SggfpH3","SggfpH4","SggfpH5",
                  "Sghex2H1","Sghex2H2","Sghex2H3","Sghex2H4","Sghex2H5")
jhmt_samples <- c("SggfpH1","SggfpH2","SggfpH3","SggfpH4","SggfpH5",
                  "SgjhmtH1","SgjhmtH2","SgjhmtH3","SgjhmtH4","SgjhmtH5")
miox_samples <- c("SggfpH1","SggfpH2","SggfpH3","SggfpH4","SggfpH5",
                  "SgmioxH1","SgmioxH2","SgmioxH3","SgmioxH4","SgmioxH5")
unch_samples <- c("SggfpH1","SggfpH2","SggfpH3","SggfpH4","SggfpH5",
                  "SgunchH1","SgunchH2","SgunchH3","SgunchH4","SgunchH5")

# Run full analysis
hex1_plots <- visualize_data(hex1, "Hex1_vs_GFP", hex1_samples)
hex2_plots <- visualize_data(hex2, "Hex2_vs_GFP", hex2_samples)
jhmt_plots <- visualize_data(jhmt, "JHMT_vs_GFP", jhmt_samples)
miox_plots <- visualize_data(miox, "MIOX_vs_GFP", miox_samples)
unch_plots <- visualize_data(unch, "UNCH_vs_GFP", unch_samples)

hex1_plots$volcano; hex1_plots$heatmap

Version	Author	Date
d7fa779	Maeva TECHER	2025-02-14

Version	Author	Date
d7fa779	Maeva TECHER	2025-02-14

hex2_plots$volcano; hex2_plots$heatmap

Version	Author	Date
d7fa779	Maeva TECHER	2025-02-14

Version	Author	Date
d7fa779	Maeva TECHER	2025-02-14

jhmt_plots$volcano; jhmt_plots$heatmap

Version	Author	Date
d7fa779	Maeva TECHER	2025-02-14

Version	Author	Date
d7fa779	Maeva TECHER	2025-02-14

miox_plots$volcano; miox_plots$heatmap

Version	Author	Date
d7fa779	Maeva TECHER	2025-02-14

Version	Author	Date
d7fa779	Maeva TECHER	2025-02-14

unch_plots$volcano; unch_plots$heatmap

Version	Author	Date
d7fa779	Maeva TECHER	2025-02-14

Version	Author	Date
d7fa779	Maeva TECHER	2025-02-14

DEG table

# Generate tables for each contrast
table_hex1 <- generate_deg_table(ddssva, "Gene_HEX1_vs_GFP", allspecies_df)


out of 15915 with nonzero total read count
adjusted p-value < 0.05
LFC > 0 (up)       : 430, 2.7%
LFC < 0 (down)     : 557, 3.5%
outliers [1]       : 0, 0%
low counts [2]     : 1852, 12%
(mean count < 17)
[1] see 'cooksCutoff' argument of ?results
[2] see 'independentFiltering' argument of ?results

table_hex2 <- generate_deg_table(ddssva, "Gene_HEX2_vs_GFP", allspecies_df)


out of 15915 with nonzero total read count
adjusted p-value < 0.05
LFC > 0 (up)       : 143, 0.9%
LFC < 0 (down)     : 370, 2.3%
outliers [1]       : 0, 0%
low counts [2]     : 4011, 25%
(mean count < 50)
[1] see 'cooksCutoff' argument of ?results
[2] see 'independentFiltering' argument of ?results

table_jhmt <- generate_deg_table(ddssva, "Gene_JHMT_vs_GFP", allspecies_df)


out of 15915 with nonzero total read count
adjusted p-value < 0.05
LFC > 0 (up)       : 79, 0.5%
LFC < 0 (down)     : 104, 0.65%
outliers [1]       : 0, 0%
low counts [2]     : 1543, 9.7%
(mean count < 14)
[1] see 'cooksCutoff' argument of ?results
[2] see 'independentFiltering' argument of ?results

table_miox <- generate_deg_table(ddssva, "Gene_MIOX_vs_GFP", allspecies_df)


out of 15915 with nonzero total read count
adjusted p-value < 0.05
LFC > 0 (up)       : 60, 0.38%
LFC < 0 (down)     : 123, 0.77%
outliers [1]       : 0, 0%
low counts [2]     : 1235, 7.8%
(mean count < 12)
[1] see 'cooksCutoff' argument of ?results
[2] see 'independentFiltering' argument of ?results

table_unch <- generate_deg_table(ddssva, "Gene_UNCH_vs_GFP", allspecies_df)


out of 15915 with nonzero total read count
adjusted p-value < 0.05
LFC > 0 (up)       : 1461, 9.2%
LFC < 0 (down)     : 2252, 14%
outliers [1]       : 0, 0%
low counts [2]     : 0, 0%
(mean count < 3)
[1] see 'cooksCutoff' argument of ?results
[2] see 'independentFiltering' argument of ?results

table_hex1$kable_table

table_hex2$kable_table

table_jhmt$kable_table

table_miox$kable_table

table_unch$kable_table

GO and KEGG enrichment

#enrich_data(hex1, "Hex1_vs_GFP", hex1_samples)
#enrich_data(hex2, "Hex2_vs_GFP", hex2_samples)
#enrich_data(jhmt, "JHMT_vs_GFP", jhmt_samples)
#enrich_data(miox, "MIOX_vs_GFP", miox_samples)
#enrich_data(unch, "UNCH_vs_GFP", unch_samples)

Thorax tissue

saveDir <- paste0(workDir,"/DEG_results/RNAi/Thorax")
dir.create(saveDir)
### Prepare Sample CSV file #####
samples <- read.delim(file.path(workDir, "list/RNAi/Thorax_RNAisample_list.csv"), sep = ",", row.names = 1, header = TRUE)
files <- file.path(workDir, "readcounts/RNAi/", samples$Tissue, samples$Filename)
names(files) <- row.names(samples)
all(file.exists(files))

[1] TRUE

### Create count sample matrix
cts <- map_dfc(files, function(sample) {
  data_count <- read.delim(sample, sep = "\t", header = FALSE)
  col_name <- gsub("_counts.txt", "", basename(sample)) 
  setNames(data.frame(data_count[, 2]), col_name)
})

row_get <- read.delim(files[1], sep = "\t", row.names = 1, header = F) # Get proper row names
rownames(cts) <- rownames(row_get)
rm(row_get) # remove unused object from memory

### Build DESeq2 Object
dds <- DESeqDataSetFromMatrix(countData = cts,
                              colData = samples,
                              design = ~ Gene)
dds$Gene <- relevel(dds$Gene, ref = "GFP")

smallestGroupSize <- 5
keep <- rowSums(counts(dds) >= 10) >= smallestGroupSize
dds <- dds[keep,]

dds <- DESeq(dds)

Following the generation of the DEseq2 object, we annotate the genes with the GeneID using biomaRt.

### Fetch Annotation Gene IDs using biomaRt
ensembl <- useMart("metazoa_mart", host = "https://metazoa.ensembl.org")
metazoa_list <- listDatasets(ensembl)
dataset <- useMart("metazoa_mart", dataset = "sggca023897955v2rs_eg_gene",
                   host = "https://metazoa.ensembl.org")
#listAttributes(dataset)

test_raw_counts <- as.data.frame(counts(dds))
rownames(test_raw_counts) <- as.character(rownames(test_raw_counts))
test_raw_counts$ensembl_gene_id <- row.names(test_raw_counts)
annotations <- getBM(attributes = c("ensembl_gene_id", "geneid"),
                     filters = "ensembl_gene_id",
                     values = rownames(test_raw_counts),
                     mart = dataset)

# Merge dataframes to retain geneid information from biomaRt
test_raw_counts_annotated <- merge(test_raw_counts, annotations,
                                   by = "ensembl_gene_id",
                                   all.x = T)

write.csv(test_raw_counts_annotated, file=paste0(saveDir,"/Thorax_raw_counts.csv"))

Normalization and PCA

# Plot PCA and investigate quality metrics
vsd <- vst(dds, blind=T)

pca_results <- create_pca_plots(norm.dds = vsd, saveDir, transformation = "vst", intgroup = "Gene")
pca_results$PCA_Labelled

Version	Author	Date
d7fa779	Maeva TECHER	2025-02-14
3746422	Maeva TECHER	2025-02-12

pca_results$PCA_Hull

Version	Author	Date
d7fa779	Maeva TECHER	2025-02-14
3746422	Maeva TECHER	2025-02-12

SVA

### SVA analysis to control for technical variation 
dat  <- counts(dds, normalized = TRUE)
idx  <- rowMeans(dat) > 1
dat  <- dat[idx, ]
mod  <- model.matrix(~ Gene, colData(dds))
mod0 <- model.matrix(~ 1, colData(dds))
svseq <- svaseq(dat, mod, mod0)

Number of significant surrogate variables is:  4 
Iteration (out of 5 ):1  2  3  4  5

sva_plots <- create_sva_plots(svseq, dds, saveDir, intgroup = c("Tissue", "Gene"))

# Show stripcharts in the report
sva_plots$Stripcharts[[1]]  # Show first stripchart

Version	Author	Date
d7fa779	Maeva TECHER	2025-02-14
3746422	Maeva TECHER	2025-02-12

sva_plots$Stripcharts[[2]]  # Show second stripchart

Version	Author	Date
d7fa779	Maeva TECHER	2025-02-14
3746422	Maeva TECHER	2025-02-12

sva_plots$Stripcharts[[3]]  # Show third stripchart

Version	Author	Date
d7fa779	Maeva TECHER	2025-02-14

# Show scatter plots in the report
sva_plots$ScatterPlots[["1_2"]]  # Show SV1 vs SV2

Version	Author	Date
d7fa779	Maeva TECHER	2025-02-14

sva_plots$ScatterPlots[["1_3"]]  # Show SV1 vs SV3

Version	Author	Date
d7fa779	Maeva TECHER	2025-02-14

sva_plots$ScatterPlots[["2_3"]]  # Show SV2 vs SV3

Version	Author	Date
d7fa779	Maeva TECHER	2025-02-14

ddssva <- dds
ddssva$SV1 <- svseq$sv[,1]
ddssva$SV2 <- svseq$sv[,2]
ddssva$SV3 <- svseq$sv[,3]
design(ddssva) <- ~ SV1 + SV2 + SV3 + Gene
ddssva$Gene <- relevel(ddssva$Gene, ref = "GFP")

smallestGroupSize <- 5
keep <- rowSums(counts(ddssva) >= 10) >= smallestGroupSize
ddssva <- ddssva[keep,]

ddssva <- DESeq(ddssva)

ddssva <- ddssva[which(mcols(ddssva)$betaConv),] # remove non converging rows

### Extract results
resultsNames(ddssva)

[1] "Intercept"        "SV1"              "SV2"              "SV3"             
[5] "Gene_HEX1_vs_GFP" "Gene_HEX2_vs_GFP" "Gene_JHMT_vs_GFP" "Gene_MIOX_vs_GFP"
[9] "Gene_UNCH_vs_GFP"

hex1 <- results(ddssva, name = "Gene_HEX1_vs_GFP", alpha = 0.05)
hex2 <- results(ddssva, name = "Gene_HEX2_vs_GFP", alpha = 0.05)
jhmt <- results(ddssva, name = "Gene_JHMT_vs_GFP", alpha = 0.05)
miox <- results(ddssva, name = "Gene_MIOX_vs_GFP", alpha = 0.05)
unch <- results(ddssva, name = "Gene_UNCH_vs_GFP", alpha = 0.05)

Volcano plots and Heatmaps

First we create function to generate the plots we are interested to obtain and then run the whole pipeline for each gene.

# Define contrast_sets
hex1_samples <- c("SggfpT1","SggfpT2","SggfpT3","SggfpT4","SggfpT5",
                  "Sghex1T1","Sghex1T2","Sghex1T3","Sghex1T4","Sghex1T5")
hex2_samples <- c("SggfpT1","SggfpT2","SggfpT3","SggfpT4","SggfpT5",
                  "Sghex2T1","Sghex2T2","Sghex2T3","Sghex2T4","Sghex2T5")
jhmt_samples <- c("SggfpT1","SggfpT2","SggfpT3","SggfpT4","SggfpT5",
                  "SgjhmtT1","SgjhmtT2","SgjhmtT3","SgjhmtT4","SgjhmtT5")
miox_samples <- c("SggfpT1","SggfpT2","SggfpT3","SggfpT4","SggfpT5",
                  "SgmioxT1","SgmioxT2","SgmioxT3","SgmioxT4","SgmioxT5")
unch_samples <- c("SggfpT1","SggfpT2","SggfpT3","SggfpT4","SggfpT5",
                  "SgunchT1","SgunchT2","SgunchT3","SgunchT4","SgunchT5")

# Run full analysis
hex1_plots <- visualize_data(hex1, "Hex1_vs_GFP", hex1_samples)
hex2_plots <- visualize_data(hex2, "Hex2_vs_GFP", hex2_samples)
jhmt_plots <- visualize_data(jhmt, "JHMT_vs_GFP", jhmt_samples)
miox_plots <- visualize_data(miox, "MIOX_vs_GFP", miox_samples)
unch_plots <- visualize_data(unch, "UNCH_vs_GFP", unch_samples)

hex1_plots$volcano; hex1_plots$heatmap

Version	Author	Date
d7fa779	Maeva TECHER	2025-02-14

Version	Author	Date
d7fa779	Maeva TECHER	2025-02-14

hex2_plots$volcano; hex2_plots$heatmap

Version	Author	Date
d7fa779	Maeva TECHER	2025-02-14

Version	Author	Date
d7fa779	Maeva TECHER	2025-02-14

jhmt_plots$volcano; jhmt_plots$heatmap

Version	Author	Date
d7fa779	Maeva TECHER	2025-02-14

Version	Author	Date
d7fa779	Maeva TECHER	2025-02-14

miox_plots$volcano; miox_plots$heatmap

Version	Author	Date
d7fa779	Maeva TECHER	2025-02-14

Version	Author	Date
d7fa779	Maeva TECHER	2025-02-14

unch_plots$volcano; unch_plots$heatmap

Version	Author	Date
d7fa779	Maeva TECHER	2025-02-14

Version	Author	Date
d7fa779	Maeva TECHER	2025-02-14

DEG table

# Generate tables for each contrast
table_hex1 <- generate_deg_table(ddssva, "Gene_HEX1_vs_GFP", allspecies_df)


out of 15462 with nonzero total read count
adjusted p-value < 0.05
LFC > 0 (up)       : 436, 2.8%
LFC < 0 (down)     : 598, 3.9%
outliers [1]       : 0, 0%
low counts [2]     : 2998, 19%
(mean count < 27)
[1] see 'cooksCutoff' argument of ?results
[2] see 'independentFiltering' argument of ?results

table_hex2 <- generate_deg_table(ddssva, "Gene_HEX2_vs_GFP", allspecies_df)


out of 15462 with nonzero total read count
adjusted p-value < 0.05
LFC > 0 (up)       : 224, 1.4%
LFC < 0 (down)     : 149, 0.96%
outliers [1]       : 0, 0%
low counts [2]     : 2399, 16%
(mean count < 21)
[1] see 'cooksCutoff' argument of ?results
[2] see 'independentFiltering' argument of ?results

table_jhmt <- generate_deg_table(ddssva, "Gene_JHMT_vs_GFP", allspecies_df)


out of 15462 with nonzero total read count
adjusted p-value < 0.05
LFC > 0 (up)       : 40, 0.26%
LFC < 0 (down)     : 40, 0.26%
outliers [1]       : 0, 0%
low counts [2]     : 3298, 21%
(mean count < 31)
[1] see 'cooksCutoff' argument of ?results
[2] see 'independentFiltering' argument of ?results

table_miox <- generate_deg_table(ddssva, "Gene_MIOX_vs_GFP", allspecies_df)


out of 15462 with nonzero total read count
adjusted p-value < 0.05
LFC > 0 (up)       : 226, 1.5%
LFC < 0 (down)     : 230, 1.5%
outliers [1]       : 0, 0%
low counts [2]     : 2998, 19%
(mean count < 27)
[1] see 'cooksCutoff' argument of ?results
[2] see 'independentFiltering' argument of ?results

table_unch <- generate_deg_table(ddssva, "Gene_UNCH_vs_GFP", allspecies_df)


out of 15462 with nonzero total read count
adjusted p-value < 0.05
LFC > 0 (up)       : 256, 1.7%
LFC < 0 (down)     : 251, 1.6%
outliers [1]       : 0, 0%
low counts [2]     : 1499, 9.7%
(mean count < 13)
[1] see 'cooksCutoff' argument of ?results
[2] see 'independentFiltering' argument of ?results

table_hex1$kable_table

table_hex2$kable_table

table_jhmt$kable_table

table_miox$kable_table

table_unch$kable_table

GO and KEGG enrichment

#enrich_data(hex1, "Hex1_vs_GFP", hex1_samples)
#enrich_data(hex2, "Hex2_vs_GFP", hex2_samples)
#enrich_data(jhmt, "JHMT_vs_GFP", jhmt_samples)
#enrich_data(miox, "MIOX_vs_GFP", miox_samples)
#enrich_data(unch, "UNCH_vs_GFP", unch_samples)

Excluding rRNA

All tissue

saveDir <- paste0(workDir,"/DEG_results/RNAi/All_no_rRNA")
dir.create(saveDir)
### Prepare Sample CSV file #####
samples <- read.delim(file.path(workDir, "list/RNAi/All_RNAisample_list.csv"), sep = ",", row.names = 1, header = TRUE)
files <- file.path(workDir, "readcounts/RNAi/", samples$Tissue, samples$Filename)
names(files) <- row.names(samples)
all(file.exists(files))

[1] TRUE

### Create count sample matrix
cts <- map_dfc(files, function(sample) {
  data_count <- read.delim(sample, sep = "\t", header = FALSE)
  col_name <- gsub("_counts.txt", "", basename(sample)) 
  setNames(data.frame(data_count[, 2]), col_name)
})

row_get <- read.delim(files[1], sep = "\t", row.names = 1, header = F) # Get proper row names
rownames(cts) <- rownames(row_get)
rm(row_get) # remove unused object from memory

### Build DESeq2 Object
dds <- DESeqDataSetFromMatrix(countData = cts,
                              colData = samples,
                              design = ~ Gene)
dds$Gene <- relevel(dds$Gene, ref = "GFP")

smallestGroupSize <- 5
keep <- rowSums(counts(dds) >= 10) >= smallestGroupSize
dds <- dds[keep,]

loci_to_exclude <- readLines(file.path(workDir, "list/excluded_loci/gregaria_rrna_list.txt"))
dds <- dds[!(rownames(dds) %in% loci_to_exclude), ]

dds <- DESeq(dds)
cbind(resultsNames(dds))

     [,1]              
[1,] "Intercept"       
[2,] "Gene_HEX1_vs_GFP"
[3,] "Gene_HEX2_vs_GFP"
[4,] "Gene_JHMT_vs_GFP"
[5,] "Gene_MIOX_vs_GFP"
[6,] "Gene_UNCH_vs_GFP"

Following the generation of the DEseq2 object, we annotate the genes with the GeneID using biomaRt.

### Fetch Annotation Gene IDs using biomaRt
ensembl <- useMart("metazoa_mart", host = "https://metazoa.ensembl.org")
metazoa_list <- listDatasets(ensembl)
dataset <- useMart("metazoa_mart", dataset = "sggca023897955v2rs_eg_gene",
                   host = "https://metazoa.ensembl.org")
#listAttributes(dataset)

test_raw_counts <- as.data.frame(counts(dds))
rownames(test_raw_counts) <- as.character(rownames(test_raw_counts))
test_raw_counts$ensembl_gene_id <- row.names(test_raw_counts)
annotations <- getBM(attributes = c("ensembl_gene_id", "geneid"),
                     filters = "ensembl_gene_id",
                     values = rownames(test_raw_counts),
                     mart = dataset)

# Merge dataframes to retain geneid information from biomaRt
test_raw_counts_annotated <- merge(test_raw_counts, annotations,
                                   by = "ensembl_gene_id",
                                   all.x = T)

write.csv(test_raw_counts_annotated, file=paste0(saveDir,"/All_raw_counts.csv"))

Normalization and PCA

# Plot PCA and investigate quality metrics
vsd <- vst(dds, blind = TRUE) 

# Perform PCA
pca_data <- plotPCA(vsd, intgroup = c("Tissue", "Gene"), returnData = TRUE)

# Define colors for genes (slightly transparent) and shapes for tissues
gene_colors <- scale_color_manual(values = alpha(brewer.pal(n = length(unique(pca_data$Gene)), name = "Set1"), 0.8))  # Points are transparent
tissue_shapes <- scale_shape_manual(values = seq(15, 15 + length(unique(pca_data$Tissue))))

# **PCA without labels**
p_pca_nolabel <- ggplot(pca_data, aes(x = PC1, y = PC2, color = Gene, shape = Tissue)) +
  geom_point(size = 4) +
  gene_colors + 
  tissue_shapes +
  theme_bw() +
  theme(legend.title = element_blank(),
        legend.text = element_text(face = "bold", size = 14),
        axis.text = element_text(size = 12),
        axis.title = element_text(size = 14)) +
  ggtitle("PCA of All Tissues (No Labels)", subtitle = "Tissues differentiated by shape, Genes by color")

# Save PCA without labels
ggsave(paste0(saveDir, "/PCA_Tissue_Gene_NoLabel.png"), plot = p_pca_nolabel, width = 10, height = 10, dpi = 600, device = "png")

# **PCA with labels**
p_pca_label <- ggplot(pca_data, aes(x = PC1, y = PC2, color = Gene, shape = Tissue)) +
  geom_point(size = 4) +
  geom_text_repel(aes(label = name), size = 4, color = "black", max.overlaps = 20) +  # Labels are fully visible
  gene_colors + 
  tissue_shapes +
  theme_bw() +
  theme(legend.title = element_blank(),
        legend.text = element_text(face = "bold", size = 14),
        axis.text = element_text(size = 12),
        axis.title = element_text(size = 14)) +
  ggtitle("PCA of All Tissues (With Labels)", subtitle = "Tissues differentiated by shape, Genes by color")

# Save PCA with labels
ggsave(paste0(saveDir, "/PCA_Tissue_Gene_Label.png"), plot = p_pca_label, width = 10, height = 10, dpi = 600, device = "png")

# **Return plots for knitr/RMarkdown**
list(NoLabel = p_pca_nolabel, WithLabel = p_pca_label)

$NoLabel

Version	Author	Date
d7fa779	Maeva TECHER	2025-02-14


$WithLabel

Version	Author	Date
d7fa779	Maeva TECHER	2025-02-14

The PCA plot shows clear distinction between tissue types, while gene silencing has a large variation within each tissue, and presents no distinct clear groupings for a single gene.

SVA

### SVA analysis to control for technical variation 
dat  <- counts(dds, normalized = TRUE)
idx  <- rowMeans(dat) > 1
dat  <- dat[idx, ]
mod  <- model.matrix(~ Gene, colData(dds))
mod0 <- model.matrix(~ 1, colData(dds))
svseq <- svaseq(dat, mod, mod0)

Number of significant surrogate variables is:  2 
Iteration (out of 5 ):1  2  3  4  5

#sva_plots <- create_sva_plots(svseq, dds, saveDir, intgroup = c("Tissue", "Gene"))

# Show stripcharts in the report
sva_plots$Stripcharts[[1]]  # Show first stripchart

Version	Author	Date
d7fa779	Maeva TECHER	2025-02-14

sva_plots$Stripcharts[[2]]  # Show second stripchart

Version	Author	Date
d7fa779	Maeva TECHER	2025-02-14

# Show scatter plots in the report
sva_plots$ScatterPlots[["1_2"]]  # Show SV1 vs SV2

Version	Author	Date
d7fa779	Maeva TECHER	2025-02-14

ddssva <- dds
ddssva$SV2 <- svseq$sv[,2]
design(ddssva) <- ~ SV2 + Gene
ddssva$Gene <- relevel(ddssva$Gene, ref = "GFP")

smallestGroupSize <- 5
keep <- rowSums(counts(ddssva) >= 10) >= smallestGroupSize
ddssva <- ddssva[keep,]

ddssva <- DESeq(ddssva)

ddssva <- ddssva[which(mcols(ddssva)$betaConv),] # remove non converging rows

### Extract results
resultsNames(ddssva)

[1] "Intercept"        "SV2"              "Gene_HEX1_vs_GFP" "Gene_HEX2_vs_GFP"
[5] "Gene_JHMT_vs_GFP" "Gene_MIOX_vs_GFP" "Gene_UNCH_vs_GFP"

hex1 <- results(ddssva, name = "Gene_HEX1_vs_GFP", alpha = 0.05)
hex2 <- results(ddssva, name = "Gene_HEX2_vs_GFP", alpha = 0.05)
jhmt <- results(ddssva, name = "Gene_JHMT_vs_GFP", alpha = 0.05)
miox <- results(ddssva, name = "Gene_MIOX_vs_GFP", alpha = 0.05)
unch <- results(ddssva, name = "Gene_UNCH_vs_GFP", alpha = 0.05)

Volcano plots and Heatmaps

First we create function to generate the plots we are interested to obtain and then run the whole pipeline for each gene.

# Define contrast_sets
hex1_samples <- c("SggfpH1","SggfpH2","SggfpH3","SggfpH4","SggfpH5",
                  "Sghex1H1","Sghex1H2","Sghex1H3","Sghex1H4","Sghex1H5",
                  "SggfpT1","SggfpT2","SggfpT3","SggfpT4","SggfpT5",
                  "Sghex1T1","Sghex1T2","Sghex1T3","Sghex1T4","Sghex1T5")
hex2_samples <- c("SggfpH1","SggfpH2","SggfpH3","SggfpH4","SggfpH5",
                  "Sghex2H1","Sghex2H2","Sghex2H3","Sghex2H4","Sghex2H5",
                  "SggfpT1","SggfpT2","SggfpT3","SggfpT4","SggfpT5",
                  "Sghex2T1","Sghex2T2","Sghex2T3","Sghex2T4","Sghex2T5")
jhmt_samples <- c("SggfpH1","SggfpH2","SggfpH3","SggfpH4","SggfpH5",
                  "SgjhmtH1","SgjhmtH2","SgjhmtH3","SgjhmtH4","SgjhmtH5",
                  "SggfpT1","SggfpT2","SggfpT3","SggfpT4","SggfpT5",
                  "SgjhmtT1","SgjhmtT2","SgjhmtT3","SgjhmtT4","SgjhmtT5")
miox_samples <- c("SggfpH1","SggfpH2","SggfpH3","SggfpH4","SggfpH5",
                  "SgmioxH1","SgmioxH2","SgmioxH3","SgmioxH4","SgmioxH5",
                  "SggfpT1","SggfpT2","SggfpT3","SggfpT4","SggfpT5",
                  "SgmioxT1","SgmioxT2","SgmioxT3","SgmioxT4","SgmioxT5")
unch_samples <- c("SggfpH1","SggfpH2","SggfpH3","SggfpH4","SggfpH5",
                  "SgunchH1","SgunchH2","SgunchH3","SgunchH4","SgunchH5",
                  "SggfpT1","SggfpT2","SggfpT3","SggfpT4","SggfpT5",
                  "SgunchT1","SgunchT2","SgunchT3","SgunchT4","SgunchT5")

# Run full analysis
hex1_plots <- visualize_data(hex1, "Hex1_vs_GFP", hex1_samples)
hex2_plots <- visualize_data(hex2, "Hex2_vs_GFP", hex2_samples)
jhmt_plots <- visualize_data(jhmt, "JHMT_vs_GFP", jhmt_samples)
miox_plots <- visualize_data(miox, "MIOX_vs_GFP", miox_samples)
unch_plots <- visualize_data(unch, "UNCH_vs_GFP", unch_samples)

hex1_plots$volcano; hex1_plots$heatmap

Version	Author	Date
d7fa779	Maeva TECHER	2025-02-14

Version	Author	Date
d7fa779	Maeva TECHER	2025-02-14

hex2_plots$volcano; hex2_plots$heatmap

Version	Author	Date
d7fa779	Maeva TECHER	2025-02-14

Version	Author	Date
d7fa779	Maeva TECHER	2025-02-14

jhmt_plots$volcano; jhmt_plots$heatmap

Version	Author	Date
d7fa779	Maeva TECHER	2025-02-14

Version	Author	Date
d7fa779	Maeva TECHER	2025-02-14

miox_plots$volcano; miox_plots$heatmap

Version	Author	Date
d7fa779	Maeva TECHER	2025-02-14

Version	Author	Date
d7fa779	Maeva TECHER	2025-02-14

unch_plots$volcano; unch_plots$heatmap

Version	Author	Date
d7fa779	Maeva TECHER	2025-02-14

Version	Author	Date
d7fa779	Maeva TECHER	2025-02-14

DEG table

# Generate tables for each contrast
table_hex1 <- generate_deg_table(ddssva, "Gene_HEX1_vs_GFP", allspecies_df)


out of 14890 with nonzero total read count
adjusted p-value < 0.05
LFC > 0 (up)       : 839, 5.6%
LFC < 0 (down)     : 950, 6.4%
outliers [1]       : 0, 0%
low counts [2]     : 0, 0%
(mean count < 2)
[1] see 'cooksCutoff' argument of ?results
[2] see 'independentFiltering' argument of ?results

table_hex2 <- generate_deg_table(ddssva, "Gene_HEX2_vs_GFP", allspecies_df)


out of 14890 with nonzero total read count
adjusted p-value < 0.05
LFC > 0 (up)       : 201, 1.3%
LFC < 0 (down)     : 321, 2.2%
outliers [1]       : 0, 0%
low counts [2]     : 1444, 9.7%
(mean count < 13)
[1] see 'cooksCutoff' argument of ?results
[2] see 'independentFiltering' argument of ?results

table_jhmt <- generate_deg_table(ddssva, "Gene_JHMT_vs_GFP", allspecies_df)


out of 14890 with nonzero total read count
adjusted p-value < 0.05
LFC > 0 (up)       : 266, 1.8%
LFC < 0 (down)     : 486, 3.3%
outliers [1]       : 0, 0%
low counts [2]     : 0, 0%
(mean count < 2)
[1] see 'cooksCutoff' argument of ?results
[2] see 'independentFiltering' argument of ?results

table_miox <- generate_deg_table(ddssva, "Gene_MIOX_vs_GFP", allspecies_df)


out of 14890 with nonzero total read count
adjusted p-value < 0.05
LFC > 0 (up)       : 111, 0.75%
LFC < 0 (down)     : 100, 0.67%
outliers [1]       : 0, 0%
low counts [2]     : 0, 0%
(mean count < 2)
[1] see 'cooksCutoff' argument of ?results
[2] see 'independentFiltering' argument of ?results

table_unch <- generate_deg_table(ddssva, "Gene_UNCH_vs_GFP", allspecies_df)


out of 14890 with nonzero total read count
adjusted p-value < 0.05
LFC > 0 (up)       : 172, 1.2%
LFC < 0 (down)     : 231, 1.6%
outliers [1]       : 0, 0%
low counts [2]     : 0, 0%
(mean count < 2)
[1] see 'cooksCutoff' argument of ?results
[2] see 'independentFiltering' argument of ?results

table_hex1$kable_table

table_hex2$kable_table

table_jhmt$kable_table

table_miox$kable_table

table_unch$kable_table

Head tissue

saveDir <- paste0(workDir,"/DEG_results/RNAi/Head_no_rRNA")
dir.create(saveDir)

### Prepare Sample CSV file #####
samples <- read.delim(file.path(workDir, "list/RNAi/Head_RNAisample_list.csv"), sep = ",", row.names = 1, header = TRUE)
files <- file.path(workDir, "readcounts/RNAi/", samples$Tissue, samples$Filename)
names(files) <- row.names(samples)
all(file.exists(files))

[1] TRUE

### Create count sample matrix
cts <- map_dfc(files, function(sample) {
  data_count <- read.delim(sample, sep = "\t", header = FALSE)
  col_name <- gsub("_counts.txt", "", basename(sample)) 
  setNames(data.frame(data_count[, 2]), col_name)
})

row_get <- read.delim(files[1], sep = "\t", row.names = 1, header = F) # Get proper row names
rownames(cts) <- rownames(row_get)
rm(row_get) # remove unused object from memory

### Build DESeq2 Object
dds <- DESeqDataSetFromMatrix(countData = cts,
                              colData = samples,
                              design = ~ Gene)
dds$Gene <- relevel(dds$Gene, ref = "GFP")

smallestGroupSize <- 5
keep <- rowSums(counts(dds) >= 10) >= smallestGroupSize
dds <- dds[keep,]

loci_to_exclude <- readLines(file.path(workDir, "list/excluded_loci/gregaria_rrna_list.txt"))
dds <- dds[!(rownames(dds) %in% loci_to_exclude), ]

dds <- DESeq(dds)
cbind(resultsNames(dds))

     [,1]              
[1,] "Intercept"       
[2,] "Gene_HEX1_vs_GFP"
[3,] "Gene_HEX2_vs_GFP"
[4,] "Gene_JHMT_vs_GFP"
[5,] "Gene_MIOX_vs_GFP"
[6,] "Gene_UNCH_vs_GFP"

Following the generation of the DEseq2 object, we annotate the genes with the GeneID using biomaRt.

### Fetch Annotation Gene IDs using biomaRt
ensembl <- useMart("metazoa_mart", host = "https://metazoa.ensembl.org")
metazoa_list <- listDatasets(ensembl)
dataset <- useMart("metazoa_mart", dataset = "sggca023897955v2rs_eg_gene",
                   host = "https://metazoa.ensembl.org")
#listAttributes(dataset)

test_raw_counts <- as.data.frame(counts(dds))
rownames(test_raw_counts) <- as.character(rownames(test_raw_counts))
test_raw_counts$ensembl_gene_id <- row.names(test_raw_counts)
annotations <- getBM(attributes = c("ensembl_gene_id", "geneid"),
                     filters = "ensembl_gene_id",
                     values = rownames(test_raw_counts),
                     mart = dataset)

# Merge dataframes to retain geneid information from biomaRt
test_raw_counts_annotated <- merge(test_raw_counts, annotations,
                                   by = "ensembl_gene_id",
                                   all.x = T)

write.csv(test_raw_counts_annotated, file=paste0(saveDir,"/Head_raw_counts.csv"))

Normalization and PCA

# Plot PCA and investigate quality metrics
vsd <- vst(dds, blind=T)

pca_results <- create_pca_plots(norm.dds = vsd, saveDir, transformation = "vst", intgroup = "Gene")
pca_results$PCA_Labelled

Version	Author	Date
d7fa779	Maeva TECHER	2025-02-14

pca_results$PCA_Hull

Version	Author	Date
d7fa779	Maeva TECHER	2025-02-14

SVA

### SVA analysis to control for technical variation 
dat  <- counts(dds, normalized = TRUE)
idx  <- rowMeans(dat) > 1
dat  <- dat[idx, ]
mod  <- model.matrix(~ Gene, colData(dds))
mod0 <- model.matrix(~ 1, colData(dds))
svseq <- svaseq(dat, mod, mod0)

Number of significant surrogate variables is:  4 
Iteration (out of 5 ):1  2  3  4  5

sva_plots <- create_sva_plots(svseq, dds, saveDir, intgroup = c("Tissue", "Gene"))

# Show stripcharts in the report
sva_plots$Stripcharts[[1]]  # Show first stripchart

Version	Author	Date
d7fa779	Maeva TECHER	2025-02-14

sva_plots$Stripcharts[[2]]  # Show second stripchart

Version	Author	Date
d7fa779	Maeva TECHER	2025-02-14

sva_plots$Stripcharts[[3]]  # Show third stripchart

Version	Author	Date
d7fa779	Maeva TECHER	2025-02-14

# Show scatter plots in the report
sva_plots$ScatterPlots[["1_2"]]  # Show SV1 vs SV2

Version	Author	Date
d7fa779	Maeva TECHER	2025-02-14

sva_plots$ScatterPlots[["1_3"]]  # Show SV1 vs SV3

Version	Author	Date
d7fa779	Maeva TECHER	2025-02-14

sva_plots$ScatterPlots[["2_3"]]  # Show SV2 vs SV3

Version	Author	Date
d7fa779	Maeva TECHER	2025-02-14

ddssva <- dds
ddssva$SV1 <- svseq$sv[,1]
ddssva$SV2 <- svseq$sv[,2]
ddssva$SV3 <- svseq$sv[,3]
design(ddssva) <- ~ SV1 + SV2 + SV3 + Gene
ddssva$Gene <- relevel(ddssva$Gene, ref = "GFP")

smallestGroupSize <- 5
keep <- rowSums(counts(ddssva) >= 10) >= smallestGroupSize
ddssva <- ddssva[keep,]

ddssva <- DESeq(ddssva)

ddssva <- ddssva[which(mcols(ddssva)$betaConv),] # remove non converging rows

### Extract results
resultsNames(ddssva)

[1] "Intercept"        "SV1"              "SV2"              "SV3"             
[5] "Gene_HEX1_vs_GFP" "Gene_HEX2_vs_GFP" "Gene_JHMT_vs_GFP" "Gene_MIOX_vs_GFP"
[9] "Gene_UNCH_vs_GFP"

hex1 <- results(ddssva, name = "Gene_HEX1_vs_GFP", alpha = 0.05)
hex2 <- results(ddssva, name = "Gene_HEX2_vs_GFP", alpha = 0.05)
jhmt <- results(ddssva, name = "Gene_JHMT_vs_GFP", alpha = 0.05)
miox <- results(ddssva, name = "Gene_MIOX_vs_GFP", alpha = 0.05)
unch <- results(ddssva, name = "Gene_UNCH_vs_GFP", alpha = 0.05)

Volcano plots and Heatmaps

First we create function to generate the plots we are interested to obtain and then run the whole pipeline for each gene.

# Define contrast_sets
hex1_samples <- c("SggfpH1","SggfpH2","SggfpH3","SggfpH4","SggfpH5",
                  "Sghex1H1","Sghex1H2","Sghex1H3","Sghex1H4","Sghex1H5")
hex2_samples <- c("SggfpH1","SggfpH2","SggfpH3","SggfpH4","SggfpH5",
                  "Sghex2H1","Sghex2H2","Sghex2H3","Sghex2H4","Sghex2H5")
jhmt_samples <- c("SggfpH1","SggfpH2","SggfpH3","SggfpH4","SggfpH5",
                  "SgjhmtH1","SgjhmtH2","SgjhmtH3","SgjhmtH4","SgjhmtH5")
miox_samples <- c("SggfpH1","SggfpH2","SggfpH3","SggfpH4","SggfpH5",
                  "SgmioxH1","SgmioxH2","SgmioxH3","SgmioxH4","SgmioxH5")
unch_samples <- c("SggfpH1","SggfpH2","SggfpH3","SggfpH4","SggfpH5",
                  "SgunchH1","SgunchH2","SgunchH3","SgunchH4","SgunchH5")

# Run full analysis
hex1_plots <- visualize_data(hex1, "Hex1_vs_GFP", hex1_samples)
hex2_plots <- visualize_data(hex2, "Hex2_vs_GFP", hex2_samples)
jhmt_plots <- visualize_data(jhmt, "JHMT_vs_GFP", jhmt_samples)
miox_plots <- visualize_data(miox, "MIOX_vs_GFP", miox_samples)
unch_plots <- visualize_data(unch, "UNCH_vs_GFP", unch_samples)

hex1_plots$volcano; hex1_plots$heatmap

Version	Author	Date
d7fa779	Maeva TECHER	2025-02-14

Version	Author	Date
d7fa779	Maeva TECHER	2025-02-14

hex2_plots$volcano; hex2_plots$heatmap

Version	Author	Date
d7fa779	Maeva TECHER	2025-02-14

Version	Author	Date
d7fa779	Maeva TECHER	2025-02-14

jhmt_plots$volcano; jhmt_plots$heatmap

Version	Author	Date
d7fa779	Maeva TECHER	2025-02-14

Version	Author	Date
d7fa779	Maeva TECHER	2025-02-14

miox_plots$volcano; miox_plots$heatmap

Version	Author	Date
d7fa779	Maeva TECHER	2025-02-14

Version	Author	Date
d7fa779	Maeva TECHER	2025-02-14

unch_plots$volcano; unch_plots$heatmap

Version	Author	Date
d7fa779	Maeva TECHER	2025-02-14

Version	Author	Date
d7fa779	Maeva TECHER	2025-02-14

DEG table

# Generate tables for each contrast
table_hex1 <- generate_deg_table(ddssva, "Gene_HEX1_vs_GFP", allspecies_df)


out of 14587 with nonzero total read count
adjusted p-value < 0.05
LFC > 0 (up)       : 432, 3%
LFC < 0 (down)     : 463, 3.2%
outliers [1]       : 0, 0%
low counts [2]     : 2546, 17%
(mean count < 31)
[1] see 'cooksCutoff' argument of ?results
[2] see 'independentFiltering' argument of ?results

table_hex2 <- generate_deg_table(ddssva, "Gene_HEX2_vs_GFP", allspecies_df)


out of 14587 with nonzero total read count
adjusted p-value < 0.05
LFC > 0 (up)       : 171, 1.2%
LFC < 0 (down)     : 287, 2%
outliers [1]       : 0, 0%
low counts [2]     : 2263, 16%
(mean count < 26)
[1] see 'cooksCutoff' argument of ?results
[2] see 'independentFiltering' argument of ?results

table_jhmt <- generate_deg_table(ddssva, "Gene_JHMT_vs_GFP", allspecies_df)


out of 14587 with nonzero total read count
adjusted p-value < 0.05
LFC > 0 (up)       : 73, 0.5%
LFC < 0 (down)     : 106, 0.73%
outliers [1]       : 0, 0%
low counts [2]     : 1980, 14%
(mean count < 22)
[1] see 'cooksCutoff' argument of ?results
[2] see 'independentFiltering' argument of ?results

table_miox <- generate_deg_table(ddssva, "Gene_MIOX_vs_GFP", allspecies_df)


out of 14587 with nonzero total read count
adjusted p-value < 0.05
LFC > 0 (up)       : 61, 0.42%
LFC < 0 (down)     : 112, 0.77%
outliers [1]       : 0, 0%
low counts [2]     : 3677, 25%
(mean count < 60)
[1] see 'cooksCutoff' argument of ?results
[2] see 'independentFiltering' argument of ?results

table_unch <- generate_deg_table(ddssva, "Gene_UNCH_vs_GFP", allspecies_df)


out of 14587 with nonzero total read count
adjusted p-value < 0.05
LFC > 0 (up)       : 1355, 9.3%
LFC < 0 (down)     : 1860, 13%
outliers [1]       : 0, 0%
low counts [2]     : 283, 1.9%
(mean count < 8)
[1] see 'cooksCutoff' argument of ?results
[2] see 'independentFiltering' argument of ?results

table_hex1$kable_table

table_hex2$kable_table

table_jhmt$kable_table

table_miox$kable_table

table_unch$kable_table

Thorax tissue

saveDir <- paste0(workDir,"/DEG_results/RNAi/Thorax_no_rRNA")
dir.create(saveDir)
### Prepare Sample CSV file #####
samples <- read.delim(file.path(workDir, "list/RNAi/Thorax_RNAisample_list.csv"), sep = ",", row.names = 1, header = TRUE)
files <- file.path(workDir, "readcounts/RNAi/", samples$Tissue, samples$Filename)
names(files) <- row.names(samples)
all(file.exists(files))

[1] TRUE

### Create count sample matrix
cts <- map_dfc(files, function(sample) {
  data_count <- read.delim(sample, sep = "\t", header = FALSE)
  col_name <- gsub("_counts.txt", "", basename(sample)) 
  setNames(data.frame(data_count[, 2]), col_name)
})

row_get <- read.delim(files[1], sep = "\t", row.names = 1, header = F) # Get proper row names
rownames(cts) <- rownames(row_get)
rm(row_get) # remove unused object from memory

### Build DESeq2 Object
dds <- DESeqDataSetFromMatrix(countData = cts,
                              colData = samples,
                              design = ~ Gene)
dds$Gene <- relevel(dds$Gene, ref = "GFP")

smallestGroupSize <- 5
keep <- rowSums(counts(dds) >= 10) >= smallestGroupSize
dds <- dds[keep,]

loci_to_exclude <- readLines(file.path(workDir, "list/excluded_loci/gregaria_rrna_list.txt"))
dds <- dds[!(rownames(dds) %in% loci_to_exclude), ]

dds <- DESeq(dds)
cbind(resultsNames(dds))

     [,1]              
[1,] "Intercept"       
[2,] "Gene_HEX1_vs_GFP"
[3,] "Gene_HEX2_vs_GFP"
[4,] "Gene_JHMT_vs_GFP"
[5,] "Gene_MIOX_vs_GFP"
[6,] "Gene_UNCH_vs_GFP"

Following the generation of the DEseq2 object, we annotate the genes with the GeneID using biomaRt.

### Fetch Annotation Gene IDs using biomaRt
ensembl <- useMart("metazoa_mart", host = "https://metazoa.ensembl.org")
metazoa_list <- listDatasets(ensembl)
dataset <- useMart("metazoa_mart", dataset = "sggca023897955v2rs_eg_gene",
                   host = "https://metazoa.ensembl.org")
#listAttributes(dataset)

test_raw_counts <- as.data.frame(counts(dds))
rownames(test_raw_counts) <- as.character(rownames(test_raw_counts))
test_raw_counts$ensembl_gene_id <- row.names(test_raw_counts)
annotations <- getBM(attributes = c("ensembl_gene_id", "geneid"),
                     filters = "ensembl_gene_id",
                     values = rownames(test_raw_counts),
                     mart = dataset)

# Merge dataframes to retain geneid information from biomaRt
test_raw_counts_annotated <- merge(test_raw_counts, annotations,
                                   by = "ensembl_gene_id",
                                   all.x = T)

write.csv(test_raw_counts_annotated, file=paste0(saveDir,"/Thorax_raw_counts.csv"))

Normalization and PCA

# Plot PCA and investigate quality metrics
vsd <- vst(dds, blind=T)

pca_results <- create_pca_plots(norm.dds = vsd, saveDir, transformation = "vst", intgroup = "Gene")
pca_results$PCA_Labelled

Version	Author	Date
d7fa779	Maeva TECHER	2025-02-14

pca_results$PCA_Hull

Version	Author	Date
d7fa779	Maeva TECHER	2025-02-14

SVA

### SVA analysis to control for technical variation 
dat  <- counts(dds, normalized = TRUE)
idx  <- rowMeans(dat) > 1
dat  <- dat[idx, ]
mod  <- model.matrix(~ Gene, colData(dds))
mod0 <- model.matrix(~ 1, colData(dds))
svseq <- svaseq(dat, mod, mod0)

Number of significant surrogate variables is:  5 
Iteration (out of 5 ):1  2  3  4  5

sva_plots <- create_sva_plots(svseq, dds, saveDir, intgroup = c("Tissue", "Gene"))

# Show stripcharts in the report
sva_plots$Stripcharts[[1]]  # Show first stripchart

Version	Author	Date
d7fa779	Maeva TECHER	2025-02-14

sva_plots$Stripcharts[[2]]  # Show second stripchart

Version	Author	Date
d7fa779	Maeva TECHER	2025-02-14

sva_plots$Stripcharts[[3]]  # Show third stripchart

Version	Author	Date
d7fa779	Maeva TECHER	2025-02-14

# Show scatter plots in the report
sva_plots$ScatterPlots[["1_2"]]  # Show SV1 vs SV2

Version	Author	Date
d7fa779	Maeva TECHER	2025-02-14

sva_plots$ScatterPlots[["1_3"]]  # Show SV1 vs SV3

Version	Author	Date
d7fa779	Maeva TECHER	2025-02-14

sva_plots$ScatterPlots[["2_3"]]  # Show SV2 vs SV3

Version	Author	Date
d7fa779	Maeva TECHER	2025-02-14

ddssva <- dds
ddssva$SV1 <- svseq$sv[,1]
ddssva$SV2 <- svseq$sv[,2]
ddssva$SV3 <- svseq$sv[,3]
design(ddssva) <- ~ SV1 + SV2 + SV3 + Gene
ddssva$Gene <- relevel(ddssva$Gene, ref = "GFP")

smallestGroupSize <- 5
keep <- rowSums(counts(ddssva) >= 10) >= smallestGroupSize
ddssva <- ddssva[keep,]

ddssva <- DESeq(ddssva)

ddssva <- ddssva[which(mcols(ddssva)$betaConv),] # remove non converging rows

### Extract results
resultsNames(ddssva)

[1] "Intercept"        "SV1"              "SV2"              "SV3"             
[5] "Gene_HEX1_vs_GFP" "Gene_HEX2_vs_GFP" "Gene_JHMT_vs_GFP" "Gene_MIOX_vs_GFP"
[9] "Gene_UNCH_vs_GFP"

hex1 <- results(ddssva, name = "Gene_HEX1_vs_GFP", alpha = 0.05)
hex2 <- results(ddssva, name = "Gene_HEX2_vs_GFP", alpha = 0.05)
jhmt <- results(ddssva, name = "Gene_JHMT_vs_GFP", alpha = 0.05)
miox <- results(ddssva, name = "Gene_MIOX_vs_GFP", alpha = 0.05)
unch <- results(ddssva, name = "Gene_UNCH_vs_GFP", alpha = 0.05)

Volcano plots and Heatmaps

First we create function to generate the plots we are interested to obtain and then run the whole pipeline for each gene.

# Define contrast_sets
hex1_samples <- c("SggfpT1","SggfpT2","SggfpT3","SggfpT4","SggfpT5",
                  "Sghex1T1","Sghex1T2","Sghex1T3","Sghex1T4","Sghex1T5")
hex2_samples <- c("SggfpT1","SggfpT2","SggfpT3","SggfpT4","SggfpT5",
                  "Sghex2T1","Sghex2T2","Sghex2T3","Sghex2T4","Sghex2T5")
jhmt_samples <- c("SggfpT1","SggfpT2","SggfpT3","SggfpT4","SggfpT5",
                  "SgjhmtT1","SgjhmtT2","SgjhmtT3","SgjhmtT4","SgjhmtT5")
miox_samples <- c("SggfpT1","SggfpT2","SggfpT3","SggfpT4","SggfpT5",
                  "SgmioxT1","SgmioxT2","SgmioxT3","SgmioxT4","SgmioxT5")
unch_samples <- c("SggfpT1","SggfpT2","SggfpT3","SggfpT4","SggfpT5",
                  "SgunchT1","SgunchT2","SgunchT3","SgunchT4","SgunchT5")

# Run full analysis
hex1_plots <- visualize_data(hex1, "Hex1_vs_GFP", hex1_samples)
hex2_plots <- visualize_data(hex2, "Hex2_vs_GFP", hex2_samples)
jhmt_plots <- visualize_data(jhmt, "JHMT_vs_GFP", jhmt_samples)
miox_plots <- visualize_data(miox, "MIOX_vs_GFP", miox_samples)
unch_plots <- visualize_data(unch, "UNCH_vs_GFP", unch_samples)

hex1_plots$volcano; hex1_plots$heatmap

Version	Author	Date
d7fa779	Maeva TECHER	2025-02-14

Version	Author	Date
d7fa779	Maeva TECHER	2025-02-14

hex2_plots$volcano; hex2_plots$heatmap

Version	Author	Date
d7fa779	Maeva TECHER	2025-02-14

Version	Author	Date
d7fa779	Maeva TECHER	2025-02-14

jhmt_plots$volcano; jhmt_plots$heatmap

Version	Author	Date
d7fa779	Maeva TECHER	2025-02-14

Version	Author	Date
d7fa779	Maeva TECHER	2025-02-14

miox_plots$volcano; miox_plots$heatmap

Version	Author	Date
d7fa779	Maeva TECHER	2025-02-14

Version	Author	Date
d7fa779	Maeva TECHER	2025-02-14

unch_plots$volcano; unch_plots$heatmap

Version	Author	Date
d7fa779	Maeva TECHER	2025-02-14

Version	Author	Date
d7fa779	Maeva TECHER	2025-02-14

DEG table

# Generate tables for each contrast
table_hex1 <- generate_deg_table(ddssva, "Gene_HEX1_vs_GFP", allspecies_df)


out of 14217 with nonzero total read count
adjusted p-value < 0.05
LFC > 0 (up)       : 213, 1.5%
LFC < 0 (down)     : 244, 1.7%
outliers [1]       : 0, 0%
low counts [2]     : 2757, 19%
(mean count < 30)
[1] see 'cooksCutoff' argument of ?results
[2] see 'independentFiltering' argument of ?results

table_hex2 <- generate_deg_table(ddssva, "Gene_HEX2_vs_GFP", allspecies_df)


out of 14217 with nonzero total read count
adjusted p-value < 0.05
LFC > 0 (up)       : 195, 1.4%
LFC < 0 (down)     : 161, 1.1%
outliers [1]       : 0, 0%
low counts [2]     : 3308, 23%
(mean count < 42)
[1] see 'cooksCutoff' argument of ?results
[2] see 'independentFiltering' argument of ?results

table_jhmt <- generate_deg_table(ddssva, "Gene_JHMT_vs_GFP", allspecies_df)


out of 14217 with nonzero total read count
adjusted p-value < 0.05
LFC > 0 (up)       : 145, 1%
LFC < 0 (down)     : 128, 0.9%
outliers [1]       : 0, 0%
low counts [2]     : 2481, 17%
(mean count < 26)
[1] see 'cooksCutoff' argument of ?results
[2] see 'independentFiltering' argument of ?results

table_miox <- generate_deg_table(ddssva, "Gene_MIOX_vs_GFP", allspecies_df)


out of 14217 with nonzero total read count
adjusted p-value < 0.05
LFC > 0 (up)       : 475, 3.3%
LFC < 0 (down)     : 403, 2.8%
outliers [1]       : 0, 0%
low counts [2]     : 827, 5.8%
(mean count < 10)
[1] see 'cooksCutoff' argument of ?results
[2] see 'independentFiltering' argument of ?results

table_unch <- generate_deg_table(ddssva, "Gene_UNCH_vs_GFP", allspecies_df)


out of 14217 with nonzero total read count
adjusted p-value < 0.05
LFC > 0 (up)       : 287, 2%
LFC < 0 (down)     : 278, 2%
outliers [1]       : 0, 0%
low counts [2]     : 3032, 21%
(mean count < 35)
[1] see 'cooksCutoff' argument of ?results
[2] see 'independentFiltering' argument of ?results

table_hex1$kable_table

table_hex2$kable_table

table_jhmt$kable_table

table_miox$kable_table

table_unch$kable_table

GO and KEGG enrichment

#enrich_data(hex1, "Hex1_vs_GFP", hex1_samples)
#enrich_data(hex2, "Hex2_vs_GFP", hex2_samples)
#enrich_data(jhmt, "JHMT_vs_GFP", jhmt_samples)
#enrich_data(miox, "MIOX_vs_GFP", miox_samples)
#enrich_data(unch, "UNCH_vs_GFP", unch_samples)

sessionInfo()

R version 4.4.2 (2024-10-31)
Platform: aarch64-apple-darwin20
Running under: macOS Sequoia 15.3

Matrix products: default
BLAS:   /Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/lib/libRblas.0.dylib 
LAPACK: /Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/lib/libRlapack.dylib;  LAPACK version 3.12.0

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

time zone: Asia/Tokyo
tzcode source: internal

attached base packages:
[1] grid      stats4    stats     graphics  grDevices utils     datasets 
[8] methods   base     

other attached packages:
 [1] kableExtra_1.4.0            DT_0.33                    
 [3] rafalib_1.0.0               biomaRt_2.62.1             
 [5] httr2_1.1.0                 purrr_1.0.4                
 [7] dplyr_1.1.4                 ashr_2.2-63                
 [9] cowplot_1.1.3               sva_3.54.0                 
[11] BiocParallel_1.40.0         genefilter_1.88.0          
[13] mgcv_1.9-1                  nlme_3.1-167               
[15] clusterProfiler_4.14.4      EnhancedVolcano_1.24.0     
[17] circlize_0.4.16             RColorBrewer_1.1-3         
[19] ComplexHeatmap_2.22.0       ensembldb_2.30.0           
[21] AnnotationFilter_1.30.0     GenomicFeatures_1.58.0     
[23] AnnotationHub_3.14.0        BiocFileCache_2.14.0       
[25] dbplyr_2.5.0                ggConvexHull_0.1.0         
[27] ggrepel_0.9.6               ggplot2_3.5.1              
[29] DESeq2_1.46.0               SummarizedExperiment_1.36.0
[31] MatrixGenerics_1.18.1       matrixStats_1.5.0          
[33] GenomicRanges_1.58.0        GenomeInfoDb_1.42.3        
[35] org.Sgregaria.eg.db_1.0.0   AnnotationDbi_1.68.0       
[37] IRanges_2.40.1              S4Vectors_0.44.0           
[39] Biobase_2.66.0              BiocGenerics_0.52.0        

loaded via a namespace (and not attached):
  [1] splines_4.4.2            later_1.4.1              BiocIO_1.16.0           
  [4] bitops_1.0-9             ggplotify_0.1.2          filelock_1.0.3          
  [7] tibble_3.2.1             R.oo_1.27.0              XML_3.99-0.18           
 [10] lifecycle_1.0.4          mixsqp_0.3-54            edgeR_4.4.2             
 [13] doParallel_1.0.17        rprojroot_2.0.4          lattice_0.22-6          
 [16] crosstalk_1.2.1          magrittr_2.0.3           limma_3.62.2            
 [19] sass_0.4.9               rmarkdown_2.29           jquerylib_0.1.4         
 [22] yaml_2.3.10              httpuv_1.6.15            ggtangle_0.0.6          
 [25] DBI_1.2.3                abind_1.4-8              zlibbioc_1.52.0         
 [28] R.utils_2.12.3           RCurl_1.98-1.16          yulab.utils_0.2.0       
 [31] rappdirs_0.3.3           git2r_0.35.0             GenomeInfoDbData_1.2.13 
 [34] enrichplot_1.26.6        irlba_2.3.5.1            tidytree_0.4.6          
 [37] annotate_1.84.0          svglite_2.1.3            codetools_0.2-20        
 [40] DelayedArray_0.32.0      xml2_1.3.6               DOSE_4.0.0              
 [43] tidyselect_1.2.1         shape_1.4.6.1            aplot_0.2.4             
 [46] UCSC.utils_1.2.0         farver_2.1.2             GenomicAlignments_1.42.0
 [49] jsonlite_1.8.9           GetoptLong_1.0.5         survival_3.8-3          
 [52] iterators_1.0.14         systemfonts_1.2.1        foreach_1.5.2           
 [55] progress_1.2.3           tools_4.4.2              ragg_1.3.3              
 [58] treeio_1.30.0            Rcpp_1.0.14              glue_1.8.0              
 [61] SparseArray_1.6.1        xfun_0.50                qvalue_2.38.0           
 [64] withr_3.0.2              BiocManager_1.30.25      fastmap_1.2.0           
 [67] truncnorm_1.0-9          digest_0.6.37            R6_2.5.1                
 [70] gridGraphics_0.5-1       textshaping_1.0.0        colorspace_2.1-1        
 [73] GO.db_3.20.0             RSQLite_2.3.9            R.methodsS3_1.8.2       
 [76] tidyr_1.3.1              generics_0.1.3           data.table_1.16.4       
 [79] rtracklayer_1.66.0       htmlwidgets_1.6.4        prettyunits_1.2.0       
 [82] httr_1.4.7               S4Arrays_1.6.0           whisker_0.4.1           
 [85] pkgconfig_2.0.3          gtable_0.3.6             blob_1.2.4              
 [88] workflowr_1.7.1          XVector_0.46.0           htmltools_0.5.8.1       
 [91] fgsea_1.32.2             ProtGenerics_1.38.0      clue_0.3-66             
 [94] scales_1.3.0             png_0.1-8                ggfun_0.1.8             
 [97] knitr_1.49               rstudioapi_0.17.1        reshape2_1.4.4          
[100] rjson_0.2.23             curl_6.2.0               cachem_1.1.0            
[103] GlobalOptions_0.1.2      stringr_1.5.1            BiocVersion_3.20.0      
[106] parallel_4.4.2           restfulr_0.0.15          pillar_1.10.1           
[109] vctrs_0.6.5              promises_1.3.2           xtable_1.8-4            
[112] cluster_2.1.8            evaluate_1.0.3           invgamma_1.1            
[115] cli_3.6.3                locfit_1.5-9.11          compiler_4.4.2          
[118] Rsamtools_2.22.0         rlang_1.1.5              crayon_1.5.3            
[121] SQUAREM_2021.1           labeling_0.4.3           plyr_1.8.9              
[124] fs_1.6.5                 stringi_1.8.4            viridisLite_0.4.2       
[127] munsell_0.5.1            Biostrings_2.74.1        lazyeval_0.2.2          
[130] GOSemSim_2.32.0          Matrix_1.7-2             hms_1.1.3               
[133] patchwork_1.3.0          bit64_4.6.0-1            statmod_1.5.0           
[136] KEGGREST_1.46.0          igraph_2.1.4             memoise_2.0.1           
[139] bslib_0.9.0              ggtree_3.14.0            fastmatch_1.1-6         
[142] bit_4.5.0.1              ape_5.8-1                gson_0.1.0

Behavior and transcriptomics following RNAi

Devon Boland and Maeva Techer

2025-02-19

1. RNAi probe engineering

2. Behavioral assays

3. Prepare OrgDB for S. gregaria

4. DEGs and enrichment of RNAi samples

All genes included

All tissue

Normalization and PCA

SVA

Volcano plots and Heatmaps

DEG table

Head tissue

Normalization and PCA

SVA

Volcano plots and Heatmaps

DEG table

GO and KEGG enrichment

Thorax tissue

Normalization and PCA

SVA

Volcano plots and Heatmaps

DEG table

GO and KEGG enrichment

Excluding rRNA

All tissue

Normalization and PCA

SVA

Volcano plots and Heatmaps

DEG table

Head tissue

Normalization and PCA

SVA

Volcano plots and Heatmaps

DEG table

Thorax tissue

Normalization and PCA

SVA

Volcano plots and Heatmaps

DEG table

GO and KEGG enrichment