Last updated: 2025-02-19
Checks: 5 2
Knit directory:
locust-comparative-genomics/
This reproducible R Markdown analysis was created with workflowr (version 1.7.1). The Checks tab describes the reproducibility checks that were applied when the results were created. The Past versions tab lists the development history.
The R Markdown file has unstaged changes. To know which version of
the R Markdown file created these results, you’ll want to first commit
it to the Git repo. If you’re still working on the analysis, you can
ignore this warning. When you’re finished, you can run
wflow_publish to commit the R Markdown file and build the
HTML.
Great job! The global environment was empty. Objects defined in the global environment can affect the analysis in your R Markdown file in unknown ways. For reproduciblity it’s best to always run the code in an empty environment.
The command set.seed(20221025) was run prior to running
the code in the R Markdown file. Setting a seed ensures that any results
that rely on randomness, e.g. subsampling or permutations, are
reproducible.
Great job! Recording the operating system, R version, and package versions is critical for reproducibility.
Nice! There were no cached chunks for this analysis, so you can be confident that you successfully produced the results during this run.
Using absolute paths to the files within your workflowr project makes it difficult for you and others to run your code on a different machine. Change the absolute path(s) below to the suggested relative path(s) to make your code more reproducible.
| absolute | relative |
|---|---|
| /Users/maevatecher/Documents/GitHub/locust-comparative-genomics/data/RefSeq/GCF_023897955.1_iqSchGreg1.2_genomic.gtf | data/RefSeq/GCF_023897955.1_iqSchGreg1.2_genomic.gtf |
| /Users/maevatecher/Documents/GitHub/locust-comparative-genomics/data/list/GO_Annotations/blast2go_gregaria.annot.mgp_removed | data/list/GO_Annotations/blast2go_gregaria.annot.mgp_removed |
| /Users/maevatecher/Documents/GitHub/locust-comparative-genomics/data/custom_sgregaria_orgdb | data/custom_sgregaria_orgdb |
| /Users/maevatecher/Documents/GitHub/locust-comparative-genomics/data/custom_sgregaria_orgdb/org.Sgregaria.eg.db | data/custom_sgregaria_orgdb/org.Sgregaria.eg.db |
| /Users/maevatecher/Documents/GitHub/locust-comparative-genomics/data | data |
Great! You are using Git for version control. Tracking code development and connecting the code version to the results is critical for reproducibility.
The results in this page were generated with repository version d7fa779. See the Past versions tab to see a history of the changes made to the R Markdown and HTML files.
Note that you need to be careful to ensure that all relevant files for
the analysis have been committed to Git prior to generating the results
(you can use wflow_publish or
wflow_git_commit). workflowr only checks the R Markdown
file, but you know if there are other scripts or data files that it
depends on. Below is the status of the Git repository when the results
were generated:
Ignored files:
Ignored: .DS_Store
Ignored: analysis/.DS_Store
Ignored: analysis/.Rhistory
Ignored: data/.DS_Store
Ignored: data/DEG_results/.DS_Store
Ignored: data/DEG_results/Bulk_RNAseq/.DS_Store
Ignored: data/DEG_results/Bulk_RNAseq/americana/.DS_Store
Ignored: data/DEG_results/Bulk_RNAseq/cancellata/.DS_Store
Ignored: data/DEG_results/Bulk_RNAseq/cubense/.DS_Store
Ignored: data/DEG_results/Bulk_RNAseq/gregaria/.DS_Store
Ignored: data/DEG_results/Bulk_RNAseq/nitens/.DS_Store
Ignored: data/DEG_results/Bulk_RNAseq/piceifrons/.DS_Store
Ignored: data/DEG_results/RNAi/.DS_Store
Ignored: data/DEG_results/RNAi/All/.DS_Store
Ignored: data/DEG_results/RNAi/All_no_rRNA/.DS_Store
Ignored: data/DEG_results/RNAi/Head_no_rRNA/.DS_Store
Ignored: data/DEG_results/RNAi/Thorax_no_rRNA/.DS_Store
Ignored: data/WGCNA/.DS_Store
Ignored: data/WGCNA/input/.DS_Store
Ignored: data/WGCNA/input/Bulk_RNAseq/.DS_Store
Ignored: data/WGCNA/output/
Ignored: data/behavioral_data/.DS_Store
Ignored: data/behavioral_data/Raw_data/.DS_Store
Ignored: data/custom_sgregaria_orgdb/.DS_Store
Ignored: data/list/.DS_Store
Ignored: data/list/Bulk_RNAseq/.DS_Store
Ignored: data/list/GO_Annotations/.DS_Store
Ignored: data/orthofinder/.DS_Store
Ignored: data/orthofinder/Polyneoptera/.DS_Store
Ignored: data/orthofinder/Polyneoptera/Results_I2/.DS_Store
Ignored: data/orthofinder/Polyneoptera/Results_I2/Orthogroups/.DS_Store
Ignored: data/orthofinder/Polyneoptera/Results_I5/.DS_Store
Ignored: data/orthofinder/Polyneoptera/Results_I5/Orthogroups/.DS_Store
Ignored: data/orthofinder/Schistocerca/.DS_Store
Ignored: data/orthofinder/Schistocerca/Results_I2/.DS_Store
Ignored: data/orthofinder/Schistocerca/Results_I2/Orthogroups/.DS_Store
Ignored: data/orthofinder/Schistocerca/Results_I5/.DS_Store
Ignored: data/orthofinder/Schistocerca/Results_I5/Orthogroups/.DS_Store
Ignored: data/overlap/.DS_Store
Ignored: data/overlap/Bulk_RNAseq/.DS_Store
Ignored: data/overlap/Bulk_RNAseq/cancellata/
Ignored: data/readcounts/.DS_Store
Ignored: data/readcounts/Bulk_RNAseq/.DS_Store
Ignored: data/readcounts/RNAi/.DS_Store
Untracked files:
Untracked: VennDiagram.2025-02-19_00-26-10.090756.log
Untracked: VennDiagram.2025-02-19_00-26-10.673464.log
Untracked: VennDiagram.2025-02-19_00-26-11.487897.log
Untracked: VennDiagram.2025-02-19_00-26-12.000365.log
Untracked: VennDiagram.2025-02-19_00-26-12.563903.log
Untracked: VennDiagram.2025-02-19_00-26-13.147608.log
Untracked: VennDiagram.2025-02-19_00-26-13.241595.log
Untracked: VennDiagram.2025-02-19_00-26-13.40181.log
Untracked: VennDiagram.2025-02-19_00-26-14.299368.log
Untracked: VennDiagram.2025-02-19_00-26-14.491279.log
Untracked: VennDiagram.2025-02-19_00-26-14.608716.log
Untracked: VennDiagram.2025-02-19_00-26-15.84461.log
Untracked: VennDiagram.2025-02-19_00-26-15.888073.log
Untracked: VennDiagram.2025-02-19_00-26-16.020382.log
Untracked: VennDiagram.2025-02-19_00-26-16.691562.log
Untracked: VennDiagram.2025-02-19_00-26-16.734494.log
Untracked: VennDiagram.2025-02-19_00-26-16.808353.log
Untracked: VennDiagram.2025-02-19_00-26-17.664569.log
Untracked: VennDiagram.2025-02-19_00-26-17.773688.log
Untracked: VennDiagram.2025-02-19_00-26-17.955978.log
Untracked: VennDiagram.2025-02-19_00-26-18.912722.log
Untracked: VennDiagram.2025-02-19_00-26-19.023018.log
Untracked: VennDiagram.2025-02-19_00-26-19.201162.log
Untracked: VennDiagram.2025-02-19_00-26-20.198876.log
Untracked: VennDiagram.2025-02-19_00-26-20.339394.log
Untracked: VennDiagram.2025-02-19_00-26-20.495203.log
Untracked: VennDiagram.2025-02-19_00-26-20.754769.log
Untracked: VennDiagram.2025-02-19_00-26-20.915182.log
Untracked: VennDiagram.2025-02-19_00-26-21.110117.log
Untracked: VennDiagram.2025-02-19_00-26-28.319055.log
Untracked: VennDiagram.2025-02-19_00-26-28.902249.log
Untracked: VennDiagram.2025-02-19_00-26-29.434464.log
Untracked: VennDiagram.2025-02-19_00-26-29.981347.log
Untracked: VennDiagram.2025-02-19_00-26-30.48741.log
Untracked: VennDiagram.2025-02-19_00-26-33.503353.log
Untracked: VennDiagram.2025-02-19_00-26-33.864508.log
Untracked: VennDiagram.2025-02-19_00-26-33.937407.log
Untracked: VennDiagram.2025-02-19_00-26-39.23504.log
Untracked: VennDiagram.2025-02-19_00-26-39.35344.log
Untracked: VennDiagram.2025-02-19_00-26-39.426602.log
Untracked: VennDiagram.2025-02-19_00-26-45.123997.log
Untracked: VennDiagram.2025-02-19_00-26-45.174552.log
Untracked: VennDiagram.2025-02-19_00-26-45.245105.log
Untracked: VennDiagram.2025-02-19_00-26-49.933369.log
Untracked: VennDiagram.2025-02-19_00-26-49.989636.log
Untracked: VennDiagram.2025-02-19_00-26-50.067423.log
Untracked: VennDiagram.2025-02-19_00-26-55.250141.log
Untracked: VennDiagram.2025-02-19_00-26-55.345536.log
Untracked: VennDiagram.2025-02-19_00-26-55.455281.log
Untracked: VennDiagram.2025-02-19_00-27-00.668035.log
Untracked: VennDiagram.2025-02-19_00-27-00.815473.log
Untracked: VennDiagram.2025-02-19_00-27-00.921827.log
Untracked: VennDiagram.2025-02-19_00-27-07.539614.log
Untracked: VennDiagram.2025-02-19_00-27-07.629385.log
Untracked: VennDiagram.2025-02-19_00-27-07.734975.log
Untracked: VennDiagram.2025-02-19_00-27-07.89401.log
Untracked: VennDiagram.2025-02-19_00-27-07.995417.log
Untracked: VennDiagram.2025-02-19_00-27-08.149242.log
Untracked: VennDiagram.2025-02-19_00-27-16.554396.log
Untracked: VennDiagram.2025-02-19_00-27-16.707058.log
Untracked: VennDiagram.2025-02-19_00-27-16.801039.log
Untracked: VennDiagram.2025-02-19_00-27-23.737163.log
Untracked: VennDiagram.2025-02-19_00-27-23.824665.log
Untracked: VennDiagram.2025-02-19_00-27-23.928958.log
Untracked: VennDiagram.2025-02-19_00-33-20.698591.log
Untracked: VennDiagram.2025-02-19_00-33-21.318215.log
Untracked: VennDiagram.2025-02-19_00-33-21.853493.log
Untracked: VennDiagram.2025-02-19_00-33-22.432323.log
Untracked: VennDiagram.2025-02-19_00-33-23.299242.log
Untracked: VennDiagram.2025-02-19_00-33-23.921532.log
Untracked: VennDiagram.2025-02-19_00-33-24.056721.log
Untracked: VennDiagram.2025-02-19_00-33-24.230483.log
Untracked: VennDiagram.2025-02-19_00-33-25.132705.log
Untracked: VennDiagram.2025-02-19_00-33-25.271917.log
Untracked: VennDiagram.2025-02-19_00-33-25.364172.log
Untracked: VennDiagram.2025-02-19_00-33-26.382907.log
Untracked: VennDiagram.2025-02-19_00-33-26.429725.log
Untracked: VennDiagram.2025-02-19_00-33-26.511566.log
Untracked: VennDiagram.2025-02-19_00-33-27.339106.log
Untracked: VennDiagram.2025-02-19_00-33-27.38223.log
Untracked: VennDiagram.2025-02-19_00-33-27.510075.log
Untracked: VennDiagram.2025-02-19_00-33-28.610624.log
Untracked: VennDiagram.2025-02-19_00-33-28.766455.log
Untracked: VennDiagram.2025-02-19_00-33-28.894421.log
Untracked: VennDiagram.2025-02-19_00-33-29.797531.log
Untracked: VennDiagram.2025-02-19_00-33-29.899743.log
Untracked: VennDiagram.2025-02-19_00-33-30.067137.log
Untracked: VennDiagram.2025-02-19_00-33-31.300233.log
Untracked: VennDiagram.2025-02-19_00-33-31.390845.log
Untracked: VennDiagram.2025-02-19_00-33-31.49441.log
Untracked: VennDiagram.2025-02-19_00-33-31.696759.log
Untracked: VennDiagram.2025-02-19_00-33-31.866906.log
Untracked: VennDiagram.2025-02-19_00-33-31.976004.log
Untracked: VennDiagram.2025-02-19_00-33-38.41082.log
Untracked: VennDiagram.2025-02-19_00-33-39.052663.log
Untracked: VennDiagram.2025-02-19_00-33-39.597515.log
Untracked: VennDiagram.2025-02-19_00-33-40.154527.log
Untracked: VennDiagram.2025-02-19_00-33-40.717342.log
Untracked: VennDiagram.2025-02-19_00-33-44.143492.log
Untracked: VennDiagram.2025-02-19_00-33-44.206999.log
Untracked: VennDiagram.2025-02-19_00-33-44.282855.log
Untracked: VennDiagram.2025-02-19_00-33-49.323427.log
Untracked: VennDiagram.2025-02-19_00-33-49.381072.log
Untracked: VennDiagram.2025-02-19_00-33-49.451647.log
Untracked: VennDiagram.2025-02-19_00-33-54.707214.log
Untracked: VennDiagram.2025-02-19_00-33-54.756589.log
Untracked: VennDiagram.2025-02-19_00-33-54.888789.log
Untracked: VennDiagram.2025-02-19_00-33-59.923873.log
Untracked: VennDiagram.2025-02-19_00-33-59.976814.log
Untracked: VennDiagram.2025-02-19_00-34-00.059093.log
Untracked: VennDiagram.2025-02-19_00-34-05.246014.log
Untracked: VennDiagram.2025-02-19_00-34-05.386983.log
Untracked: VennDiagram.2025-02-19_00-34-05.495627.log
Untracked: VennDiagram.2025-02-19_00-34-10.66816.log
Untracked: VennDiagram.2025-02-19_00-34-10.758927.log
Untracked: VennDiagram.2025-02-19_00-34-10.866676.log
Untracked: VennDiagram.2025-02-19_00-34-17.794858.log
Untracked: VennDiagram.2025-02-19_00-34-17.88475.log
Untracked: VennDiagram.2025-02-19_00-34-18.038407.log
Untracked: VennDiagram.2025-02-19_00-34-18.13879.log
Untracked: VennDiagram.2025-02-19_00-34-18.304421.log
Untracked: VennDiagram.2025-02-19_00-34-18.411287.log
Untracked: VennDiagram.2025-02-19_00-34-27.235827.log
Untracked: VennDiagram.2025-02-19_00-34-27.325741.log
Untracked: VennDiagram.2025-02-19_00-34-27.426145.log
Untracked: VennDiagram.2025-02-19_00-34-34.155129.log
Untracked: VennDiagram.2025-02-19_00-34-34.241188.log
Untracked: VennDiagram.2025-02-19_00-34-34.409073.log
Untracked: VennDiagram.2025-02-19_00-52-33.095813.log
Untracked: VennDiagram.2025-02-19_00-52-33.77208.log
Untracked: VennDiagram.2025-02-19_00-52-34.323694.log
Untracked: VennDiagram.2025-02-19_00-52-34.876006.log
Untracked: VennDiagram.2025-02-19_00-52-35.682899.log
Untracked: VennDiagram.2025-02-19_00-52-36.285793.log
Untracked: VennDiagram.2025-02-19_00-52-36.385768.log
Untracked: VennDiagram.2025-02-19_00-52-36.533758.log
Untracked: VennDiagram.2025-02-19_00-52-37.396762.log
Untracked: VennDiagram.2025-02-19_00-52-37.522269.log
Untracked: VennDiagram.2025-02-19_00-52-37.672476.log
Untracked: VennDiagram.2025-02-19_00-52-38.548478.log
Untracked: VennDiagram.2025-02-19_00-52-38.890685.log
Untracked: VennDiagram.2025-02-19_00-52-38.963824.log
Untracked: VennDiagram.2025-02-19_00-52-39.629011.log
Untracked: VennDiagram.2025-02-19_00-52-39.669878.log
Untracked: VennDiagram.2025-02-19_00-52-39.795557.log
Untracked: VennDiagram.2025-02-19_00-52-40.791824.log
Untracked: VennDiagram.2025-02-19_00-52-40.898838.log
Untracked: VennDiagram.2025-02-19_00-52-41.067889.log
Untracked: VennDiagram.2025-02-19_00-52-41.902756.log
Untracked: VennDiagram.2025-02-19_00-52-42.060632.log
Untracked: VennDiagram.2025-02-19_00-52-42.176172.log
Untracked: VennDiagram.2025-02-19_00-52-43.124775.log
Untracked: VennDiagram.2025-02-19_00-52-43.26273.log
Untracked: VennDiagram.2025-02-19_00-52-43.414173.log
Untracked: VennDiagram.2025-02-19_00-52-43.624641.log
Untracked: VennDiagram.2025-02-19_00-52-43.723864.log
Untracked: VennDiagram.2025-02-19_00-52-43.876686.log
Untracked: VennDiagram.2025-02-19_00-52-50.395671.log
Untracked: VennDiagram.2025-02-19_00-52-50.982669.log
Untracked: VennDiagram.2025-02-19_00-52-51.506272.log
Untracked: VennDiagram.2025-02-19_00-52-52.025053.log
Untracked: VennDiagram.2025-02-19_00-52-52.535647.log
Untracked: VennDiagram.2025-02-19_00-52-55.520631.log
Untracked: VennDiagram.2025-02-19_00-52-55.577491.log
Untracked: VennDiagram.2025-02-19_00-52-55.704049.log
Untracked: VennDiagram.2025-02-19_00-53-01.225018.log
Untracked: VennDiagram.2025-02-19_00-53-01.283446.log
Untracked: VennDiagram.2025-02-19_00-53-01.35707.log
Untracked: VennDiagram.2025-02-19_00-53-07.009745.log
Untracked: VennDiagram.2025-02-19_00-53-07.05761.log
Untracked: VennDiagram.2025-02-19_00-53-07.127274.log
Untracked: VennDiagram.2025-02-19_00-53-12.622344.log
Untracked: VennDiagram.2025-02-19_00-53-12.727351.log
Untracked: VennDiagram.2025-02-19_00-53-12.803588.log
Untracked: VennDiagram.2025-02-19_00-53-18.384503.log
Untracked: VennDiagram.2025-02-19_00-53-18.566148.log
Untracked: VennDiagram.2025-02-19_00-53-18.742062.log
Untracked: VennDiagram.2025-02-19_00-53-24.158518.log
Untracked: VennDiagram.2025-02-19_00-53-24.257476.log
Untracked: VennDiagram.2025-02-19_00-53-24.426949.log
Untracked: VennDiagram.2025-02-19_00-53-31.42607.log
Untracked: VennDiagram.2025-02-19_00-53-31.517315.log
Untracked: VennDiagram.2025-02-19_00-53-31.943246.log
Untracked: VennDiagram.2025-02-19_00-53-32.045029.log
Untracked: VennDiagram.2025-02-19_00-53-32.158777.log
Untracked: VennDiagram.2025-02-19_00-53-32.314934.log
Untracked: VennDiagram.2025-02-19_00-53-40.482904.log
Untracked: VennDiagram.2025-02-19_00-53-40.568202.log
Untracked: VennDiagram.2025-02-19_00-53-40.671108.log
Untracked: VennDiagram.2025-02-19_00-53-47.727146.log
Untracked: VennDiagram.2025-02-19_00-53-47.807306.log
Untracked: VennDiagram.2025-02-19_00-53-47.904046.log
Untracked: analysis/VennDiagram.2025-02-19_00-02-41.857736.log
Untracked: analysis/VennDiagram.2025-02-19_00-02-42.342479.log
Untracked: analysis/VennDiagram.2025-02-19_00-02-42.786429.log
Untracked: analysis/VennDiagram.2025-02-19_00-02-43.346076.log
Untracked: analysis/VennDiagram.2025-02-19_00-02-43.692998.log
Untracked: analysis/VennDiagram.2025-02-19_00-02-44.080721.log
Untracked: analysis/VennDiagram.2025-02-19_00-06-06.617464.log
Untracked: analysis/VennDiagram.2025-02-19_00-06-06.937883.log
Untracked: analysis/VennDiagram.2025-02-19_00-06-07.107717.log
Untracked: analysis/VennDiagram.2025-02-19_00-07-49.53549.log
Untracked: analysis/VennDiagram.2025-02-19_00-07-49.838215.log
Untracked: analysis/VennDiagram.2025-02-19_00-07-50.194331.log
Untracked: analysis/VennDiagram.2025-02-19_00-07-51.368246.log
Untracked: analysis/VennDiagram.2025-02-19_00-07-51.669745.log
Untracked: analysis/VennDiagram.2025-02-19_00-07-52.153586.log
Untracked: analysis/VennDiagram.2025-02-19_00-15-35.217343.log
Untracked: analysis/VennDiagram.2025-02-19_00-15-35.441086.log
Untracked: analysis/VennDiagram.2025-02-19_00-15-35.662977.log
Untracked: analysis/VennDiagram.2025-02-19_00-15-35.883515.log
Untracked: analysis/VennDiagram.2025-02-19_00-15-36.174919.log
Untracked: analysis/VennDiagram.2025-02-19_00-15-36.37529.log
Untracked: analysis/VennDiagram.2025-02-19_00-16-35.357156.log
Untracked: analysis/VennDiagram.2025-02-19_00-16-35.505389.log
Untracked: analysis/VennDiagram.2025-02-19_00-16-35.779052.log
Untracked: analysis/VennDiagram.2025-02-19_00-17-29.763089.log
Untracked: analysis/VennDiagram.2025-02-19_00-17-29.914926.log
Untracked: analysis/VennDiagram.2025-02-19_00-17-30.198819.log
Untracked: analysis/VennDiagram.2025-02-19_00-17-30.941132.log
Untracked: analysis/VennDiagram.2025-02-19_00-17-31.112726.log
Untracked: analysis/VennDiagram.2025-02-19_00-17-31.382201.log
Untracked: analysis/VennDiagram.2025-02-19_00-17-38.602576.log
Untracked: analysis/VennDiagram.2025-02-19_00-17-38.829729.log
Untracked: analysis/VennDiagram.2025-02-19_00-17-39.0535.log
Untracked: analysis/VennDiagram.2025-02-19_00-17-39.997323.log
Untracked: analysis/VennDiagram.2025-02-19_00-17-40.24828.log
Untracked: analysis/VennDiagram.2025-02-19_00-17-40.467315.log
Untracked: analysis/VennDiagram.2025-02-19_00-24-27.135103.log
Untracked: analysis/VennDiagram.2025-02-19_00-24-27.433968.log
Untracked: analysis/VennDiagram.2025-02-19_00-24-27.946018.log
Untracked: analysis/VennDiagram.2025-02-19_00-26-14.710653.log
Untracked: analysis/VennDiagram.2025-02-19_00-26-14.926229.log
Untracked: analysis/VennDiagram.2025-02-19_00-26-15.260861.log
Untracked: analysis/VennDiagram.2025-02-19_00-27-20.796686.log
Untracked: analysis/VennDiagram.2025-02-19_00-27-21.157414.log
Untracked: analysis/VennDiagram.2025-02-19_00-27-21.420226.log
Untracked: analysis/VennDiagram.2025-02-19_00-27-53.716011.log
Untracked: analysis/VennDiagram.2025-02-19_00-27-53.978443.log
Untracked: analysis/VennDiagram.2025-02-19_00-27-54.235876.log
Untracked: analysis/VennDiagram.2025-02-19_00-28-42.999843.log
Untracked: analysis/VennDiagram.2025-02-19_00-28-43.291244.log
Untracked: analysis/VennDiagram.2025-02-19_00-28-43.495347.log
Untracked: analysis/VennDiagram.2025-02-19_00-29-08.797774.log
Untracked: analysis/VennDiagram.2025-02-19_00-29-09.000699.log
Untracked: analysis/VennDiagram.2025-02-19_00-29-09.231426.log
Untracked: analysis/VennDiagram.2025-02-19_00-29-48.724336.log
Untracked: analysis/VennDiagram.2025-02-19_00-29-48.96818.log
Untracked: analysis/VennDiagram.2025-02-19_00-29-49.195993.log
Untracked: analysis/VennDiagram.2025-02-19_00-30-13.873298.log
Untracked: analysis/VennDiagram.2025-02-19_00-30-14.054743.log
Untracked: analysis/VennDiagram.2025-02-19_00-30-14.207976.log
Untracked: analysis/VennDiagram.2025-02-19_00-30-42.323592.log
Untracked: analysis/VennDiagram.2025-02-19_00-30-42.490849.log
Untracked: analysis/VennDiagram.2025-02-19_00-30-42.687886.log
Untracked: analysis/VennDiagram.2025-02-19_00-32-09.35965.log
Untracked: analysis/VennDiagram.2025-02-19_00-32-09.541393.log
Untracked: analysis/VennDiagram.2025-02-19_00-32-09.721318.log
Untracked: analysis/VennDiagram.2025-02-19_00-32-29.804779.log
Untracked: analysis/VennDiagram.2025-02-19_00-32-29.977269.log
Untracked: analysis/VennDiagram.2025-02-19_00-32-30.147694.log
Untracked: analysis/VennDiagram.2025-02-19_01-20-21.674127.log
Untracked: analysis/VennDiagram.2025-02-19_01-20-21.931179.log
Untracked: analysis/VennDiagram.2025-02-19_01-20-22.173368.log
Untracked: data/DEG_results/RNAi/All_no_rRNA/sva_scatter_SV1_SV2.png
Untracked: data/DEG_results/RNAi/All_no_rRNA/sva_scatter_SV1_SV3.png
Untracked: data/DEG_results/RNAi/All_no_rRNA/sva_scatter_SV2_SV3.png
Untracked: data/DEG_results/RNAi/All_no_rRNA/sva_stripchart_SV3.png
Untracked: data/RefSeq/
Untracked: data/orthofinder/Polyneoptera/Results_I5/Orthogroups/Orthogroups.tsv
Untracked: data/orthofinder/Polyneoptera/Results_I5/Orthogroups/Orthogroups.txt
Untracked: data/orthofinder/Polyneoptera/Results_I5/Orthogroups/Orthogroups_reprocessed.tsv
Untracked: data/orthofinder/Polyneoptera/Results_I5/Orthogroups/Orthogroups_reprocessed.txt
Untracked: data/orthofinder/Schistocerca/Results_I2/Orthogroups/Orthogroups.tsv
Untracked: data/orthofinder/Schistocerca/Results_I2/Orthogroups/Orthogroups.txt
Untracked: data/orthofinder/Schistocerca/Results_I2/Orthogroups/Orthogroups_reprocessed.tsv
Untracked: data/orthofinder/Schistocerca/Results_I2/Orthogroups/Orthogroups_reprocessed.txt
Unstaged changes:
Modified: analysis/3_deseq2-results.Rmd
Modified: analysis/3_overlap-venn.Rmd
Modified: analysis/4_RNAi_degs.Rmd
Modified: data/DEG_results/Bulk_RNAseq/All_species/PCA_VST_togregaria.png
Modified: data/DEG_results/Bulk_RNAseq/All_species/PCA_labelled_VST_togregaria.png
Modified: data/DEG_results/RNAi/All_no_rRNA/All_raw_counts.csv
Modified: data/DEG_results/RNAi/All_no_rRNA/Hex1_vs_GFP/DEG_results_Hex1_vs_GFP.csv
Modified: data/DEG_results/RNAi/All_no_rRNA/Hex1_vs_GFP/heatmap_plot_Hex1_vs_GFP.tiff
Modified: data/DEG_results/RNAi/All_no_rRNA/Hex1_vs_GFP/volcano_plot_Hex1_vs_GFP.tiff
Modified: data/DEG_results/RNAi/All_no_rRNA/Hex2_vs_GFP/DEG_results_Hex2_vs_GFP.csv
Modified: data/DEG_results/RNAi/All_no_rRNA/Hex2_vs_GFP/heatmap_plot_Hex2_vs_GFP.tiff
Modified: data/DEG_results/RNAi/All_no_rRNA/Hex2_vs_GFP/volcano_plot_Hex2_vs_GFP.tiff
Modified: data/DEG_results/RNAi/All_no_rRNA/JHMT_vs_GFP/DEG_results_JHMT_vs_GFP.csv
Modified: data/DEG_results/RNAi/All_no_rRNA/JHMT_vs_GFP/heatmap_plot_JHMT_vs_GFP.tiff
Modified: data/DEG_results/RNAi/All_no_rRNA/JHMT_vs_GFP/volcano_plot_JHMT_vs_GFP.tiff
Modified: data/DEG_results/RNAi/All_no_rRNA/MIOX_vs_GFP/DEG_results_MIOX_vs_GFP.csv
Modified: data/DEG_results/RNAi/All_no_rRNA/MIOX_vs_GFP/heatmap_plot_MIOX_vs_GFP.tiff
Modified: data/DEG_results/RNAi/All_no_rRNA/MIOX_vs_GFP/volcano_plot_MIOX_vs_GFP.tiff
Modified: data/DEG_results/RNAi/All_no_rRNA/PCA_Tissue_Gene_Label.png
Modified: data/DEG_results/RNAi/All_no_rRNA/PCA_Tissue_Gene_NoLabel.png
Modified: data/DEG_results/RNAi/All_no_rRNA/UNCH_vs_GFP/DEG_results_UNCH_vs_GFP.csv
Modified: data/DEG_results/RNAi/All_no_rRNA/UNCH_vs_GFP/heatmap_plot_UNCH_vs_GFP.tiff
Modified: data/DEG_results/RNAi/All_no_rRNA/UNCH_vs_GFP/volcano_plot_UNCH_vs_GFP.tiff
Modified: data/DEG_results/RNAi/All_no_rRNA/sva_stripchart_SV1.png
Modified: data/DEG_results/RNAi/All_no_rRNA/sva_stripchart_SV2.png
Modified: data/DEG_results/RNAi/Head/UNCH_vs_GFP/volcano_plot_UNCH_vs_GFP.tiff
Modified: data/DEG_results/RNAi/Head_no_rRNA/Hex1_vs_GFP/volcano_plot_Hex1_vs_GFP.tiff
Modified: data/DEG_results/RNAi/Head_no_rRNA/Hex2_vs_GFP/volcano_plot_Hex2_vs_GFP.tiff
Modified: data/DEG_results/RNAi/Head_no_rRNA/JHMT_vs_GFP/volcano_plot_JHMT_vs_GFP.tiff
Modified: data/DEG_results/RNAi/Head_no_rRNA/MIOX_vs_GFP/volcano_plot_MIOX_vs_GFP.tiff
Modified: data/DEG_results/RNAi/Head_no_rRNA/PCA_vst_Gene_labelled.png
Modified: data/DEG_results/RNAi/Head_no_rRNA/UNCH_vs_GFP/volcano_plot_UNCH_vs_GFP.tiff
Modified: data/DEG_results/RNAi/Head_no_rRNA/sva_stripchart_SV1.png
Modified: data/DEG_results/RNAi/Head_no_rRNA/sva_stripchart_SV2.png
Modified: data/DEG_results/RNAi/Head_no_rRNA/sva_stripchart_SV3.png
Modified: data/DEG_results/RNAi/Thorax/Hex1_vs_GFP/DEG_results_Hex1_vs_GFP.csv
Modified: data/DEG_results/RNAi/Thorax/Hex1_vs_GFP/heatmap_plot_Hex1_vs_GFP.tiff
Modified: data/DEG_results/RNAi/Thorax/Hex1_vs_GFP/volcano_plot_Hex1_vs_GFP.tiff
Modified: data/DEG_results/RNAi/Thorax/Hex2_vs_GFP/DEG_results_Hex2_vs_GFP.csv
Modified: data/DEG_results/RNAi/Thorax/Hex2_vs_GFP/heatmap_plot_Hex2_vs_GFP.tiff
Modified: data/DEG_results/RNAi/Thorax/Hex2_vs_GFP/volcano_plot_Hex2_vs_GFP.tiff
Modified: data/DEG_results/RNAi/Thorax/JHMT_vs_GFP/DEG_results_JHMT_vs_GFP.csv
Modified: data/DEG_results/RNAi/Thorax/JHMT_vs_GFP/heatmap_plot_JHMT_vs_GFP.tiff
Modified: data/DEG_results/RNAi/Thorax/JHMT_vs_GFP/volcano_plot_JHMT_vs_GFP.tiff
Modified: data/DEG_results/RNAi/Thorax/MIOX_vs_GFP/DEG_results_MIOX_vs_GFP.csv
Modified: data/DEG_results/RNAi/Thorax/MIOX_vs_GFP/heatmap_plot_MIOX_vs_GFP.tiff
Modified: data/DEG_results/RNAi/Thorax/MIOX_vs_GFP/volcano_plot_MIOX_vs_GFP.tiff
Modified: data/DEG_results/RNAi/Thorax/PCA_vst_Gene_labelled.png
Modified: data/DEG_results/RNAi/Thorax/UNCH_vs_GFP/DEG_results_UNCH_vs_GFP.csv
Modified: data/DEG_results/RNAi/Thorax/UNCH_vs_GFP/heatmap_plot_UNCH_vs_GFP.tiff
Modified: data/DEG_results/RNAi/Thorax/UNCH_vs_GFP/volcano_plot_UNCH_vs_GFP.tiff
Modified: data/DEG_results/RNAi/Thorax/sva_scatter_SV1_SV2.png
Modified: data/DEG_results/RNAi/Thorax/sva_scatter_SV1_SV3.png
Modified: data/DEG_results/RNAi/Thorax/sva_scatter_SV2_SV3.png
Modified: data/DEG_results/RNAi/Thorax/sva_stripchart_SV1.png
Modified: data/DEG_results/RNAi/Thorax/sva_stripchart_SV2.png
Modified: data/DEG_results/RNAi/Thorax/sva_stripchart_SV3.png
Note that any generated files, e.g. HTML, png, CSS, etc., are not included in this status report because it is ok for generated content to have uncommitted changes.
These are the previous versions of the repository in which changes were
made to the R Markdown (analysis/4_RNAi_degs.Rmd) and HTML
(docs/4_RNAi_degs.html) files. If you’ve configured a
remote Git repository (see ?wflow_git_remote), click on the
hyperlinks in the table below to view the files as they were in that
past version.
| File | Version | Author | Date | Message |
|---|---|---|---|---|
| Rmd | d7fa779 | Maeva TECHER | 2025-02-14 | Update RNAi and overlap |
| html | d7fa779 | Maeva TECHER | 2025-02-14 | Update RNAi and overlap |
| Rmd | e9e41d7 | Maeva TECHER | 2025-02-12 | change layout RNAi |
| html | e9e41d7 | Maeva TECHER | 2025-02-12 | change layout RNAi |
| Rmd | 3746422 | Maeva TECHER | 2025-02-12 | Add RNAi |
| html | 3746422 | Maeva TECHER | 2025-02-12 | Add RNAi |
Following the overlap analysis of bulk tissue RNA-seq data from the whole head and thorax across all species, we selected a subset of differentially expressed genes between isolated and crowded individuals. The selection criteria were as follows:
A total of X genes were included in this list and used for functional validation to assess their impact on collective behavior and the transcriptome landscape of gregarious nymphs in the Desert Locust S. gregaria. Following RNAi probes engineering, only genes with a knockdown efficacy exceeding X% in both males and females were kept for further analysis.
Hypothesis: Genes that are highly differentiated between phases are part of the downstream molecular machinery responding to density changes. If these genes do not directly drive rapid behavioral changes, they may instead contribute to the maintenance of phase-specific traits. Disrupting their function could interfere with gene-gene interactions essential for stabilizing either the solitarious or gregarious phase, triggering compensatory maintenance mechanism.
For Seema to add her part
Candidate genes for RNAi (decided from literature): - LOC126355014: S. gregaria heat shock 70 kDa protein 4 - LOC126297585: S. gregaria cAMP-dependent protein kinase catalytic subunit 1 - LOC126284097: S. gregaria DNA (cytosine-5)-methyltransferase 3B-like (Dnmt3)
Candidate genes for RNAi (decided from DEG and overlap analysis): - LOC126336408 (Hex1): S. gregaria hexamerin-like, transcript variant X2 - LOC126334874 (Hex2): S. gregaria hexamerin-like - LOC126335148 (jhmt): S. gregaria juvenile hormone acid O-methyltransferase-like, transcript variant X1 - LOC126334877: S. gregaria allergen Cr-PI-like - LOC126268104 (unch): S. gregaria zona pellucida domain-containing protein miniature - LOC126277894 (miox): S. gregaria inositol oxygenase-like
For Seema and Audelia to add their parts
To prepare future query of gene annotations for enrichment analysis,
we can choose to use R packages that dynamically query them from online
resources. We attempted two methods here: one is to build an OrgDB
project for S. gregaria using NCBI RefSeq, and the other is
using blast2go evidence previously generated for the
cross-species RNAseq.
The first method created close to 100 Gb worth of files to cache the NCBI and corresponding files for S. gregaria. However, due to errors with the NCBI genes, we preferred to opt for the second option. Below we present the code used:
library(AnnotationForge)
library(rtracklayer)
library(Biostrings)
# First attempt with NCBI data
makeOrgPackageFromNCBI(version = "0.1",
author = "Devon J. Boland <devonjboland@tamu.edu>",
maintainer = "Devon J. Boland <devonjboland@tamu.edu>",
outputDir = ".",
tax_id = "7010",
genus = "Schistocerca",
species = "gregaria")
# Create custom ORGdb project using blast2go evidence
ggtf <- import("/Users/maevatecher/Documents/GitHub/locust-comparative-genomics/data/RefSeq/GCF_023897955.1_iqSchGreg1.2_genomic.gtf")
gtf_df <- as.data.frame(ggtf)
protein_coding_genes <- gtf_df[which(gtf_df$gene_biotype == "protein_coding"), ]
protein_coding_genes <- protein_coding_genes[which(protein_coding_genes$source != "RefSeq"), ]
rownames(protein_coding_genes) <- NULL
gregariaSym <- protein_coding_genes[, c(10, 12, 13)]
gregariaSym$db_xref <- gsub("GeneID:", "", gregariaSym$db_xref)
colnames(gregariaSym) <- c("GID", "ENTREZ", "GENENAME")
gregariaChr <- protein_coding_genes[, c(10, 1)]
colnames(gregariaChr) <- c("GID", "CHROMOSOME")
# Removed predicted NCBI genes as they were causing errors with package, and not having appropriate information, or model confidence. Additionally, blast2GO assinged some of these EC codes over GO codes so they were removed
gregariaGO <- read.delim("/Users/maevatecher/Documents/GitHub/locust-comparative-genomics/data/list/GO_Annotations/blast2go_gregaria.annot.mgp_removed", sep = "\t", header = F)
colnames(gregariaGO) <- c("GID", "GO", "EVIDENCE")
gregariaGO$EVIDENCE <- "ISS"
gregariaGO <- gregariaGO[!grepl("EC:", gregariaGO$GO), ] # remove rows containing EC annotation codes
custom_db_package <- "/Users/maevatecher/Documents/GitHub/locust-comparative-genomics/data/custom_sgregaria_orgdb"
dir.create(custom_db_package)
orgdb_df <- data.frame(
organism = "Schistocerca gregaria",
tax_id = "7010",
genus = "Schistocerca",
species = "gregaria",
genome_build = "GCF_023897955.1_iqSchGreg1.2"
)
makeOrgPackage(gene_info=gregariaSym,
chromosome=gregariaChr,
go=gregariaGO,
version="1.0.0",
maintainer= "Devon J. Boland <devonjboland@tamu.edu>",
author="Devon J. Boland <devonjboland@tamu.edu>",
outputDir=custom_db_package,
tax_id = "7010",
genus = "Schistocerca",
species = "gregaria",
goTable="go",
verbose=TRUE)
Do these two steps before to install the new package:
install.packages("remotes") # Install remotes if not installed
remotes::install_local("/Users/maevatecher/Documents/GitHub/locust-comparative-genomics/data/custom_sgregaria_orgdb/org.Sgregaria.eg.db")
library("org.Sgregaria.eg.db")
keytypes(org.Sgregaria.eg.db)
[1] "CHROMOSOME" "ENTREZ" "EVIDENCE" "EVIDENCEALL" "GENENAME"
[6] "GID" "GO" "GOALL" "ONTOLOGY" "ONTOLOGYALL"
The following results were obtained using the same RNA-seq workflow
as the non-RNAi bulk tissue transcriptomics. This includes RNA
extraction using Maxwell Promega simplyRNA tisse kit, RNA library
preparation with the Illumina Total Stranded RNA kit with RiboDepletion,
and short-read sequencing on an Illumina NovaSeq PE150 platform.
Differentially expressed genes between GFP-injected controls and
RNAi-injected last nymphal instar females of the gregarious phase were
analyzed using DESeq2.
We start by loading all the required R packages with in particular
DESeq2 for DEG analysis, biomaRt for pathway
annotations and clusterProfiler for GO enrichment and
visualization.
library("DESeq2")
library("ggplot2")
library("ggrepel")
library("ggConvexHull")
library("AnnotationHub")
library("ensembldb")
library("ComplexHeatmap")
library("RColorBrewer")
library("circlize")
library("EnhancedVolcano")
library("clusterProfiler")
library("sva")
library("cowplot")
library("ashr")
library("dplyr")
library("purrr")
library("httr2")
library("biomaRt")
library("rafalib")
library("DT")
workDir <- "/Users/maevatecher/Documents/GitHub/locust-comparative-genomics/data"
setwd(workDir)
allspecies_path <- file.path(workDir, "/list/13polyneoptera_geneid_ncbi.csv")
allspecies_df <- read.table(allspecies_path, sep = ",", header = TRUE, quote = "", fill = TRUE, stringsAsFactors = FALSE)
We also create ahead function that we will use to output graphs (thanks to Devon’s touch) as files and visible in line in this report.
########################################################################################
# DEGs FUNCTIONS
########################################################################################
create_output_dirs <- function(label) {
dir.create(file.path(saveDir, label), showWarnings = FALSE)
return()
}
########################################################################################
create_pca_plots <- function(norm.dds, saveDir, transformation = "vst", intgroup = "Condition", title = NULL) {
# Ensure saveDir exists
dir.create(saveDir, showWarnings = FALSE, recursive = TRUE)
# Apply the requested transformation
if (transformation == "vst") {
vsd <- vst(dds, blind = FALSE)
} else if (transformation == "rlog") {
vsd <- rlog(dds, blind = FALSE)
} else if (transformation == "log2") {
vsd <- log2(counts(dds, normalized = TRUE) + 1)
} else {
stop("Invalid transformation type. Choose from 'vst', 'rlog', or 'log2'.")
}
# If no title is provided, create one dynamically
if (is.null(title)) {
title <- paste("PCA on", intgroup, "(", transformation, "transformation)")
}
# Construct filename prefix based on transformation & grouping
file_prefix <- paste0("PCA_", transformation, "_", intgroup)
# First PCA: **with labels**
pca_labelled <- plotPCA(vsd, intgroup = intgroup) +
geom_text_repel(aes(label = rownames(colData(vsd))), size = 4, max.overlaps = 20) +
geom_point(size = 3) +
theme_bw() +
theme(legend.title = element_blank(),
legend.text = element_text(face = "bold", size = 16),
axis.text = element_text(size = 12),
axis.title = element_text(size = 12)) +
ggtitle(title)
# Save labelled PCA plot
ggsave(paste0(saveDir, "/", file_prefix, "_labelled.png"), width = 10, height = 10,
dpi = 600, device = "png", plot = pca_labelled)
# Second PCA: **Convex Hulls** around groups
pca_hull <- plotPCA(vsd, intgroup = intgroup) +
geom_point(size = 3) +
theme_bw() +
theme(legend.title = element_blank(),
legend.text = element_text(face = "bold", size = 16),
axis.text = element_text(size = 12),
axis.title = element_text(size = 12)) +
geom_convexhull(aes(fill = .data[[intgroup]]), alpha = 0.5) + # Fully dynamic grouping
ggtitle(title, subtitle = paste0(transformation, " transformation"))
# Save hull PCA plot
ggsave(paste0(saveDir, "/", file_prefix, "_hull.png"), width = 10, height = 10,
dpi = 600, device = "png", plot = pca_hull)
# Return plots for inline display in knitr/RMarkdown
return(list(PCA_Labelled = pca_labelled, PCA_Hull = pca_hull))
}
########################################################################################
create_sva_plots <- function(svseq, dds, saveDir, intgroup = c("Tissue", "Gene"), max_sv = 3) {
# Ensure output directory exists
dir.create(saveDir, showWarnings = FALSE, recursive = TRUE)
# **Create grouping factor dynamically**
tissue_gene_groups <- interaction(dds[[intgroup[1]]], dds[[intgroup[2]]], drop = TRUE)
unique_groups <- unique(tissue_gene_groups)
# Assign colors per unique group
group_colors <- setNames(colorRampPalette(brewer.pal(min(length(unique_groups), 8), "Set1"))(length(unique_groups)), unique_groups)
# **First plot: Stripchart of first N surrogate variables**
stripchart_list <- list()
for (i in 1:max_sv) {
sv_values <- svseq$sv[, i]
p <- ggplot(data.frame(SV = sv_values, Group = tissue_gene_groups), aes(x = Group, y = SV, fill = Group)) +
geom_jitter(shape = 21, size = 3, width = 0.2, color = "black") +
scale_fill_manual(values = group_colors) +
theme_minimal() +
labs(title = paste0("Surrogate Variable ", i, " (SV", i, ") - Technical Variation"),
y = "SV Value") +
theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
geom_hline(yintercept = 0, linetype = "dashed", color = "gray")
# Save each stripchart plot
stripchart_file <- file.path(saveDir, paste0("sva_stripchart_SV", i, ".png"))
ggsave(stripchart_file, plot = p, width = 10, height = 5, dpi = 300)
stripchart_list[[i]] <- p
}
# **Second plot: SV scatter plots (pairwise comparisons)**
scatter_list <- list()
scatter_pairs <- list(c(1,2), c(1,3), c(2,3)) # Pairwise SV combinations
# Define label colors
gene_labels <- as.character(dds[[intgroup[2]]])
unique_genes <- unique(gene_labels)
gene_colors <- setNames(colorRampPalette(brewer.pal(min(length(unique_genes), 8), "Set1"))(length(unique_genes)), unique_genes)
for (pair in scatter_pairs) {
p <- ggplot(data.frame(SV1 = svseq$sv[, pair[1]], SV2 = svseq$sv[, pair[2]], Gene = gene_labels),
aes(x = SV1, y = SV2, color = Gene)) +
geom_point(size = 3) +
scale_color_manual(values = gene_colors) +
theme_minimal() +
labs(title = paste("SVA Analysis: SV", pair[1], " vs SV", pair[2]),
x = paste0("SV", pair[1]), y = paste0("SV", pair[2]))
# Save each scatter plot
scatter_file <- file.path(saveDir, paste0("sva_scatter_SV", pair[1], "_SV", pair[2], ".png"))
ggsave(scatter_file, plot = p, width = 10, height = 5, dpi = 300)
scatter_list[[paste(pair[1], pair[2], sep = "_")]] <- p
}
# **Return plots for knitr/RMarkdown**
return(list(Stripcharts = stripchart_list, ScatterPlots = scatter_list))
}
########################################################################################
# Retrieve various accession IDs
get_sig_genes <- function(res) {
sig_genes <- res[which(res$padj < 0.05 & abs(res$log2FoldChange)>=1.0), ]
sig_genes <- sig_genes[order(sig_genes, decreasing = T), ]
return(sig_genes)
}
########################################################################################
library(kableExtra)
library(DT) # For interactive tables
library(dplyr)
generate_deg_table <- function(ddssva, contrast_name, allspecies_df, tresh_padj = 0.05, tresh_logfold = 1) {
# --- Extract DESeq2 Results ---
deg_results <- results(ddssva, name = contrast_name, alpha = tresh_padj)
summary(deg_results)
# Convert to DataFrame and retain GeneID
deg_df <- as.data.frame(deg_results)
deg_df$GeneID <- rownames(deg_df)
# --- Filter Significant DEGs ---
deg_df <- deg_df %>%
filter(!is.na(padj) & padj < tresh_padj & abs(log2FoldChange) > tresh_logfold) # Remove NA values and filter by thresholds
# --- Merge with Metadata ---
meta_deg_df <- merge(deg_df, allspecies_df, by = "GeneID", all.x = TRUE)
# Ensure GeneType is retained and replace NA values
if (!"GeneType" %in% colnames(meta_deg_df)) {
message("GeneType column missing, filling with 'Unknown'")
meta_deg_df$GeneType <- "Unknown"
}
meta_deg_df$GeneType[is.na(meta_deg_df$GeneType)] <- "Unknown"
# Select and reorder relevant columns
meta_deg_df <- meta_deg_df %>%
dplyr::select(GeneID, GeneType, Description, Species,
baseMean, log2FoldChange, lfcSE, stat, pvalue, padj)
# Round numeric columns
numeric_cols <- c("baseMean", "log2FoldChange", "lfcSE", "stat", "pvalue", "padj")
meta_deg_df[numeric_cols] <- round(meta_deg_df[numeric_cols], 2)
# --- Apply Row Styling for Visualization ---
meta_deg_df$row_color <- ifelse(meta_deg_df$log2FoldChange > 1, "red",
ifelse(meta_deg_df$log2FoldChange < -1, "blue", "black"))
# --- Create Searchable DataTable with Row Coloring ---
deg_kable <- datatable(meta_deg_df, options = list(
pageLength = 10, scrollX = TRUE, autoWidth = TRUE, searchHighlight = TRUE
),
rownames = FALSE, escape = FALSE,
caption = paste("DEG Table:", contrast_name) # Title added here
) %>%
formatStyle(
columns = names(meta_deg_df), # Apply styling to all columns
target = 'row',
backgroundColor = styleEqual(c("red", "blue", "black"), c("#FFDDDD", "#DDDDFF", "white")), # Light red for up, light blue for down
color = styleEqual(c("red", "blue", "black"), c("red", "blue", "black")),
fontWeight = styleEqual(c("bold", "normal"), c("bold", "normal"))
) %>%
formatStyle(
'Species', target = 'cell', fontStyle = 'italic'
)
# --- Return Both Raw and Interactive Table ---
return(list(
meta_results = meta_deg_df,
kable_table = deg_kable
))
}
########################################################################################
create_volcano <- function(res, label) {
mypalette <- brewer.pal(9, "Set1")
volcano <-EnhancedVolcano(res,
lab=rownames(res),
x='log2FoldChange',
y='padj',
title=paste("Volcano Plot:", label),
col=c(mypalette[9], mypalette[3], mypalette[2],
mypalette[1]),
labSize = 4,
pCutoff = 0.05,
FCcutoff = 1,
pointSize = 3,
drawConnectors = T,
widthConnectors = 0.5,
colConnectors = "black",
max.overlaps = 25,
gridlines.major = F,
gridlines.minor = F)
# Save plot at TIFF
ggsave(paste0(saveDir, "/", label,"/volcano_plot_",label,".tiff"), device = "tiff",
plot = volcano, width = 10, height = 10)
# Retrurn the plot for inline display
return(volcano)
}
########################################################################################
create_heatmap <- function(res, label, contrast_) {
mat <- counts(dds, normalized = TRUE)
mat.z <- t(apply(mat, 1, scale))
colnames(mat.z) <- colnames(mat)
mat.z <- mat.z[rownames(res), contrast_, drop = FALSE]
rownames(mat.z) <- rownames(res)
# Create the heatmap
heatmap_plot <- Heatmap(mat.z,
cluster_rows = TRUE,
cluster_columns = FALSE,
column_labels = contrast_,
name = "Z-Transformed Counts",
row_labels = rownames(mat.z),
row_names_gp = gpar(fontsize = 8))
# Save in TIFF
tiff(paste0(saveDir, "/", label, "/heatmap_plot_", label, ".tiff"),
units = "in", res = 300, width = 5, height = 10)
draw(heatmap_plot)
dev.off()
# Return the heatmap object for inline display
return(heatmap_plot)
}
########################################################################################
visualize_data <- function(res, label, contrast_) {
sig_genes <- get_sig_genes(res)
create_output_dirs(label)
# Save results
write.csv(as.data.frame(sig_genes),
paste0(saveDir, "/", label, "/DEG_results_", label, ".csv"))
# Generate and display plots
volcano_plot <- create_volcano(res, label)
heatmap_plot <- create_heatmap(sig_genes, label, contrast_)
# Return plots for knitr inline visualization
list(volcano = volcano_plot, heatmap = heatmap_plot)
}
########################################################################################
# ANNOTATION PART
########################################################################################
get_ids <- function(res) {
rownames(res) <- as.character(rownames(res))
res$ensembl_gene_id <- row.names(res)
annotations <- getBM(attributes = c("ensembl_gene_id", "geneid"),
filters = "ensembl_gene_id",
values = rownames(res),
mart = dataset)
return(annotations$geneid)
}
########################################################################################
GOMFEnrichment <- function(res, label) {
# Check if there are valid gene IDs
if (!is.null(res)) {
# Perform GO enrichment analysis
ego <- enrichGO(
gene = rownames(res),
OrgDb = org.Sgregaria.eg.db,
keyType = "GID",
ont = "MF", # Cellular Component
pAdjustMethod = "BH", # Benjamini-Hochberg adjustment
pvalueCutoff = 0.1
)
# Check if the result has any significant enrichment terms
if (nrow(as.data.frame(ego)) > 0) {
# Create the barplot
go_barplot <- barplot(ego, showCategory = 20) + # Show top 20 categories
ggtitle(paste("GO MF Enrichment:", label))
# Print the plot
ggsave(paste0(saveDir, "/", label,"/gp_MF_barplot_",label,".tiff"), device = "tiff",
plot=go_barplot, width=10, height = 10)
change_vec <- res$log2FoldChange
names(change_vec) <- rownames(res)
RYD = brewer.pal(n = 8, name = "RdBu")
go_network <- cnetplot(ego, foldChange=change_vec) +
scale_color_gradientn(colours = RYD, limits=c(-2,2))
ggsave(paste0(saveDir, "/", label,"/gp_MF_cnetplot_",label,".tiff"), device = "tiff",
plot=go_network, width=30, height = 30, bg = "white")
write.csv(as.data.frame(ego), paste0(saveDir, "/", label,
"/GO_MF_Enrichment_Results_", label,".csv"))
} else {
message("No significant MF GO terms found.")
}
} else {
message("No valid gene IDs found.")
}
return()
}
########################################################################################
GOCCEnrichment <- function(res, label) {
# Check if there are valid gene IDs
if (!is.null(res)) {
# Perform GO enrichment analysis
ego <- enrichGO(
gene = rownames(res),
OrgDb = org.Sgregaria.eg.db,
keyType = "GID",
ont = "CC", # Cellular Component
pAdjustMethod = "BH", # Benjamini-Hochberg adjustment
pvalueCutoff = 0.1
)
# Check if the result has any significant enrichment terms
if (nrow(as.data.frame(ego)) > 0) {
# Create the barplot
go_barplot <- barplot(ego, showCategory = 20) + # Show top 20 categories
ggtitle(paste("GO CC Enrichment:", label))
# Print the plot
ggsave(paste0(saveDir, "/", label,"/gp_CC_barplot_",label,".tiff"), device = "tiff",
plot=go_barplot, width=10, height = 10)
change_vec <- res$log2FoldChange
names(change_vec) <- rownames(res)
RYD = brewer.pal(n = 8, name = "RdBu")
go_network <- cnetplot(ego, foldChange=change_vec) +
scale_color_gradientn(colours = RYD, limits=c(-2,2))
ggsave(paste0(saveDir, "/", label,"/gp_CC_cnetplot_",label,".tiff"), device = "tiff",
plot=go_network, width=30, height = 30, bg = "white")
write.csv(as.data.frame(ego), paste0(saveDir, "/", label,
"/GO_CC_Enrichment_Results_", label,".csv"))
} else {
message("No significant CC GO terms found.")
}
} else {
message("No valid gene IDs found.")
}
return()
}
########################################################################################
GOBPEnrichment <- function(res, label) {
# Check if there are valid gene IDs
if (!is.null(res)) {
# Perform GO enrichment analysis
ego <- enrichGO(
gene = rownames(res),
OrgDb = org.Sgregaria.eg.db,
keyType = "GID",
ont = "BP", # Cellular Component
pAdjustMethod = "BH", # Benjamini-Hochberg adjustment
pvalueCutoff = 0.1
)
# Check if the result has any significant enrichment terms
if (nrow(as.data.frame(ego)) > 0) {
# Create the barplot
go_barplot <- barplot(ego, showCategory = 20) + # Show top 20 categories
ggtitle(paste("GO BP Enrichment:", label))
# Print the plot
ggsave(paste0(saveDir, "/", label,"/gp_BP_barplot_",label,".tiff"), device = "tiff",
plot=go_barplot, width=10, height = 10)
change_vec <- res$log2FoldChange
names(change_vec) <- rownames(res)
RYD = brewer.pal(n = 8, name = "RdBu")
go_network <- cnetplot(ego, foldChange=change_vec) +
scale_color_gradientn(colours = RYD, limits=c(-2,2))
ggsave(paste0(saveDir, "/", label,"/gp_BP_cnetplot_",label,".tiff"), device = "tiff",
plot=go_network, width=30, height = 30, bg="white")
write.csv(as.data.frame(ego), paste0(saveDir, "/", label,
"/GO_BP_Enrichment_Results_", label,".csv"))
} else {
message("No significant BP GO terms found.")
}
} else {
message("No valid gene IDs found.")
}
return()
}
########################################################################################
KEGGEnrichment <- function(res, label) {
# Check if there are valid gene IDs
if (!is.null(res)) {
kk <- enrichKEGG(gene = rownames(res),
organism = "sgre",
pvalueCutoff = 0.1)
# Check if the result has any significant enrichment terms
if (nrow(as.data.frame(kk)) > 0) {
kk_barplot <- barplot(kk) + ggtitle(paste("KEGG Enrichment:", label))
ggsave(paste0(saveDir, "/", label,"/kk_barplot_",label,".tiff"), device = "tiff",
plot=kk_barplot, width=10, height = 10)
change_vec <- res$log2FoldChange
names(change_vec) <- rownames(res)
RYD = brewer.pal(n = 8, name = "RdBu")
kk_network <- cnetplot(kk, foldChange=change_vec) +
scale_color_gradientn(colours = RYD, limits=c(-2,2))
ggsave(paste0(saveDir, "/", label,"/kk_cnetplot_",label,".tiff"), device = "tiff",
plot=kk_network, width=30, height = 30, bg = "white")
write.csv(as.data.frame(kk), paste0(saveDir, "/",label,
"/KEGG_Enrichment_Results_",
label,".csv"))
} else {
message("No significant KEGG terms found.")
}
} else {
message("No valid gene IDs found.")
}
return()
}
########################################################################################
enrich_data <- function(res, label, contrast_) {
sig_genes <- get_sig_genes(res)
create_output_dirs(label)
GOMFEnrichment(sig_genes, label)
GOBPEnrichment(sig_genes, label)
GOCCEnrichment(sig_genes, label)
KEGGEnrichment(sig_genes, label)
return()
}
########################################################################################
Minor changes here are made compared to the DESeq2 results regarding the importation of samples to transform into a matrix. Sample names are structured as follow: {Sg}{gene}{#} {Sg} = Schistocerca gregaria {gene} = gene abbreviation gfp, hex1, hex2, jhmt, miox and unch H/T{#} = biological replicate
saveDir <- paste0(workDir,"/DEG_results/RNAi/All")
dir.create(saveDir)
### Prepare Sample CSV file #####
samples <- read.delim(file.path(workDir, "list/RNAi/All_RNAisample_list.csv"), sep = ",", row.names = 1, header = TRUE)
files <- file.path(workDir, "readcounts/RNAi/", samples$Tissue, samples$Filename)
names(files) <- row.names(samples)
all(file.exists(files))
[1] TRUE
### Create count sample matrix
cts <- map_dfc(files, function(sample) {
data_count <- read.delim(sample, sep = "\t", header = FALSE)
col_name <- gsub("_counts.txt", "", basename(sample))
setNames(data.frame(data_count[, 2]), col_name)
})
row_get <- read.delim(files[1], sep = "\t", row.names = 1, header = F) # Get proper row names
rownames(cts) <- rownames(row_get)
rm(row_get) # remove unused object from memory
While for bulk RNAseq on head and thorax for all species, the DEGs model was made between isolated and crowded individuals (with isolated as the reference state), here, the DEG analysis will be carried between GFP knock-down nymphs (as reference state) vs Hexamerins / Juvenile Hormones / Uncharacterized proteins.
### Build DESeq2 Object
dds <- DESeqDataSetFromMatrix(countData = cts,
colData = samples,
design = ~ Gene)
dds$Gene <- relevel(dds$Gene, ref = "GFP")
smallestGroupSize <- 5
keep <- rowSums(counts(dds) >= 10) >= smallestGroupSize
dds <- dds[keep,]
dds <- DESeq(dds)
Following the generation of the DEseq2 object, we
annotate the genes with the GeneID using biomaRt.
### Fetch Annotation Gene IDs using biomaRt
ensembl <- useMart("metazoa_mart", host = "https://metazoa.ensembl.org")
metazoa_list <- listDatasets(ensembl)
dataset <- useMart("metazoa_mart", dataset = "sggca023897955v2rs_eg_gene",
host = "https://metazoa.ensembl.org")
#listAttributes(dataset)
test_raw_counts <- as.data.frame(counts(dds))
rownames(test_raw_counts) <- as.character(rownames(test_raw_counts))
test_raw_counts$ensembl_gene_id <- row.names(test_raw_counts)
annotations <- getBM(attributes = c("ensembl_gene_id", "geneid"),
filters = "ensembl_gene_id",
values = rownames(test_raw_counts),
mart = dataset)
# Merge dataframes to retain geneid information from biomaRt
test_raw_counts_annotated <- merge(test_raw_counts, annotations,
by = "ensembl_gene_id",
all.x = T)
write.csv(test_raw_counts_annotated, file=paste0(saveDir,"/All_raw_counts.csv"))
# Plot PCA and investigate quality metrics
vsd <- vst(dds, blind = TRUE)
# Perform PCA
pca_data <- plotPCA(vsd, intgroup = c("Tissue", "Gene"), returnData = TRUE)
# Define colors for genes (slightly transparent) and shapes for tissues
gene_colors <- scale_color_manual(values = alpha(brewer.pal(n = length(unique(pca_data$Gene)), name = "Set1"), 0.8)) # Points are transparent
tissue_shapes <- scale_shape_manual(values = seq(15, 15 + length(unique(pca_data$Tissue))))
# **PCA without labels**
p_pca_nolabel <- ggplot(pca_data, aes(x = PC1, y = PC2, color = Gene, shape = Tissue)) +
geom_point(size = 4) +
gene_colors +
tissue_shapes +
theme_bw() +
theme(legend.title = element_blank(),
legend.text = element_text(face = "bold", size = 14),
axis.text = element_text(size = 12),
axis.title = element_text(size = 14)) +
ggtitle("PCA of All Tissues (No Labels)", subtitle = "Tissues differentiated by shape, Genes by color")
# Save PCA without labels
ggsave(paste0(saveDir, "/PCA_Tissue_Gene_NoLabel.png"), plot = p_pca_nolabel, width = 10, height = 10, dpi = 600, device = "png")
# **PCA with labels**
p_pca_label <- ggplot(pca_data, aes(x = PC1, y = PC2, color = Gene, shape = Tissue)) +
geom_point(size = 4) +
geom_text_repel(aes(label = name), size = 4, color = "black", max.overlaps = 20) + # Labels are fully visible
gene_colors +
tissue_shapes +
theme_bw() +
theme(legend.title = element_blank(),
legend.text = element_text(face = "bold", size = 14),
axis.text = element_text(size = 12),
axis.title = element_text(size = 14)) +
ggtitle("PCA of All Tissues (With Labels)", subtitle = "Tissues differentiated by shape, Genes by color")
# Save PCA with labels
ggsave(paste0(saveDir, "/PCA_Tissue_Gene_Label.png"), plot = p_pca_label, width = 10, height = 10, dpi = 600, device = "png")
# **Return plots for knitr/RMarkdown**
list(NoLabel = p_pca_nolabel, WithLabel = p_pca_label)
$NoLabel

| Version | Author | Date |
|---|---|---|
| d7fa779 | Maeva TECHER | 2025-02-14 |
$WithLabel

| Version | Author | Date |
|---|---|---|
| d7fa779 | Maeva TECHER | 2025-02-14 |
The PCA plot shows clear distinction between tissue types, while gene silencing has a large variation within each tissue, and presents no distinct clear groupings for a single gene.
### SVA analysis to control for technical variation
dat <- counts(dds, normalized = TRUE)
idx <- rowMeans(dat) > 1
dat <- dat[idx, ]
mod <- model.matrix(~ Gene, colData(dds))
mod0 <- model.matrix(~ 1, colData(dds))
svseq <- svaseq(dat, mod, mod0)
Number of significant surrogate variables is: 3
Iteration (out of 5 ):1 2 3 4 5
sva_plots <- create_sva_plots(svseq, dds, saveDir, intgroup = c("Tissue", "Gene"))
# Show stripcharts in the report
sva_plots$Stripcharts[[1]] # Show first stripchart

| Version | Author | Date |
|---|---|---|
| d7fa779 | Maeva TECHER | 2025-02-14 |
sva_plots$Stripcharts[[2]] # Show second stripchart

| Version | Author | Date |
|---|---|---|
| d7fa779 | Maeva TECHER | 2025-02-14 |
sva_plots$Stripcharts[[3]] # Show third stripchart

| Version | Author | Date |
|---|---|---|
| d7fa779 | Maeva TECHER | 2025-02-14 |
# Show scatter plots in the report
sva_plots$ScatterPlots[["1_2"]] # Show SV1 vs SV2

| Version | Author | Date |
|---|---|---|
| d7fa779 | Maeva TECHER | 2025-02-14 |
sva_plots$ScatterPlots[["1_3"]] # Show SV1 vs SV3

| Version | Author | Date |
|---|---|---|
| d7fa779 | Maeva TECHER | 2025-02-14 |
sva_plots$ScatterPlots[["2_3"]] # Show SV2 vs SV3

| Version | Author | Date |
|---|---|---|
| d7fa779 | Maeva TECHER | 2025-02-14 |
SV1 is clearly showing an effect of tissue. We rerun the
DESeq2 model but this time including the surrogate variable
SV2 and SV3 as a covariates only, as we know that the modeled variation
is more likely explained by tissue and gene variation rather than batch
effects.
ddssva <- dds
ddssva$SV2 <- svseq$sv[,2]
ddssva$SV3 <- svseq$sv[,3]
design(ddssva) <- ~ SV2 + SV3 + Gene
ddssva$Gene <- relevel(ddssva$Gene, ref = "GFP")
smallestGroupSize <- 5
keep <- rowSums(counts(ddssva) >= 10) >= smallestGroupSize
ddssva <- ddssva[keep,]
ddssva <- DESeq(ddssva)
ddssva <- ddssva[which(mcols(ddssva)$betaConv),] # remove non converging rows
### Extract results
resultsNames(ddssva)
[1] "Intercept" "SV2" "SV3" "Gene_HEX1_vs_GFP"
[5] "Gene_HEX2_vs_GFP" "Gene_JHMT_vs_GFP" "Gene_MIOX_vs_GFP" "Gene_UNCH_vs_GFP"
hex1 <- results(ddssva, name = "Gene_HEX1_vs_GFP", alpha = 0.05)
hex2 <- results(ddssva, name = "Gene_HEX2_vs_GFP", alpha = 0.05)
jhmt <- results(ddssva, name = "Gene_JHMT_vs_GFP", alpha = 0.05)
miox <- results(ddssva, name = "Gene_MIOX_vs_GFP", alpha = 0.05)
unch <- results(ddssva, name = "Gene_UNCH_vs_GFP", alpha = 0.05)
First we create function to generate the plots we are interested to obtain and then run the whole pipeline for each gene.
# Define contrast_sets
hex1_samples <- c("SggfpH1","SggfpH2","SggfpH3","SggfpH4","SggfpH5",
"Sghex1H1","Sghex1H2","Sghex1H3","Sghex1H4","Sghex1H5",
"SggfpT1","SggfpT2","SggfpT3","SggfpT4","SggfpT5",
"Sghex1T1","Sghex1T2","Sghex1T3","Sghex1T4","Sghex1T5")
hex2_samples <- c("SggfpH1","SggfpH2","SggfpH3","SggfpH4","SggfpH5",
"Sghex2H1","Sghex2H2","Sghex2H3","Sghex2H4","Sghex2H5",
"SggfpT1","SggfpT2","SggfpT3","SggfpT4","SggfpT5",
"Sghex2T1","Sghex2T2","Sghex2T3","Sghex2T4","Sghex2T5")
jhmt_samples <- c("SggfpH1","SggfpH2","SggfpH3","SggfpH4","SggfpH5",
"SgjhmtH1","SgjhmtH2","SgjhmtH3","SgjhmtH4","SgjhmtH5",
"SggfpT1","SggfpT2","SggfpT3","SggfpT4","SggfpT5",
"SgjhmtT1","SgjhmtT2","SgjhmtT3","SgjhmtT4","SgjhmtT5")
miox_samples <- c("SggfpH1","SggfpH2","SggfpH3","SggfpH4","SggfpH5",
"SgmioxH1","SgmioxH2","SgmioxH3","SgmioxH4","SgmioxH5",
"SggfpT1","SggfpT2","SggfpT3","SggfpT4","SggfpT5",
"SgmioxT1","SgmioxT2","SgmioxT3","SgmioxT4","SgmioxT5")
unch_samples <- c("SggfpH1","SggfpH2","SggfpH3","SggfpH4","SggfpH5",
"SgunchH1","SgunchH2","SgunchH3","SgunchH4","SgunchH5",
"SggfpT1","SggfpT2","SggfpT3","SggfpT4","SggfpT5",
"SgunchT1","SgunchT2","SgunchT3","SgunchT4","SgunchT5")
# Run full analysis
hex1_plots <- visualize_data(hex1, "Hex1_vs_GFP", hex1_samples)
hex2_plots <- visualize_data(hex2, "Hex2_vs_GFP", hex2_samples)
jhmt_plots <- visualize_data(jhmt, "JHMT_vs_GFP", jhmt_samples)
miox_plots <- visualize_data(miox, "MIOX_vs_GFP", miox_samples)
unch_plots <- visualize_data(unch, "UNCH_vs_GFP", unch_samples)
hex1_plots$volcano; hex1_plots$heatmap

| Version | Author | Date |
|---|---|---|
| d7fa779 | Maeva TECHER | 2025-02-14 |

| Version | Author | Date |
|---|---|---|
| d7fa779 | Maeva TECHER | 2025-02-14 |
hex2_plots$volcano; hex2_plots$heatmap

| Version | Author | Date |
|---|---|---|
| d7fa779 | Maeva TECHER | 2025-02-14 |

| Version | Author | Date |
|---|---|---|
| d7fa779 | Maeva TECHER | 2025-02-14 |
jhmt_plots$volcano; jhmt_plots$heatmap

| Version | Author | Date |
|---|---|---|
| d7fa779 | Maeva TECHER | 2025-02-14 |

| Version | Author | Date |
|---|---|---|
| d7fa779 | Maeva TECHER | 2025-02-14 |
miox_plots$volcano; miox_plots$heatmap

| Version | Author | Date |
|---|---|---|
| d7fa779 | Maeva TECHER | 2025-02-14 |

| Version | Author | Date |
|---|---|---|
| d7fa779 | Maeva TECHER | 2025-02-14 |
unch_plots$volcano; unch_plots$heatmap

| Version | Author | Date |
|---|---|---|
| d7fa779 | Maeva TECHER | 2025-02-14 |

| Version | Author | Date |
|---|---|---|
| d7fa779 | Maeva TECHER | 2025-02-14 |
# Generate tables for each contrast
table_hex1 <- generate_deg_table(ddssva, "Gene_HEX1_vs_GFP", allspecies_df)
out of 16363 with nonzero total read count
adjusted p-value < 0.05
LFC > 0 (up) : 896, 5.5%
LFC < 0 (down) : 993, 6.1%
outliers [1] : 0, 0%
low counts [2] : 1269, 7.8%
(mean count < 9)
[1] see 'cooksCutoff' argument of ?results
[2] see 'independentFiltering' argument of ?results
table_hex2 <- generate_deg_table(ddssva, "Gene_HEX2_vs_GFP", allspecies_df)
out of 16363 with nonzero total read count
adjusted p-value < 0.05
LFC > 0 (up) : 522, 3.2%
LFC < 0 (down) : 541, 3.3%
outliers [1] : 0, 0%
low counts [2] : 1587, 9.7%
(mean count < 11)
[1] see 'cooksCutoff' argument of ?results
[2] see 'independentFiltering' argument of ?results
table_jhmt <- generate_deg_table(ddssva, "Gene_JHMT_vs_GFP", allspecies_df)
out of 16363 with nonzero total read count
adjusted p-value < 0.05
LFC > 0 (up) : 371, 2.3%
LFC < 0 (down) : 672, 4.1%
outliers [1] : 0, 0%
low counts [2] : 318, 1.9%
(mean count < 6)
[1] see 'cooksCutoff' argument of ?results
[2] see 'independentFiltering' argument of ?results
table_miox <- generate_deg_table(ddssva, "Gene_MIOX_vs_GFP", allspecies_df)
out of 16363 with nonzero total read count
adjusted p-value < 0.05
LFC > 0 (up) : 71, 0.43%
LFC < 0 (down) : 131, 0.8%
outliers [1] : 0, 0%
low counts [2] : 0, 0%
(mean count < 1)
[1] see 'cooksCutoff' argument of ?results
[2] see 'independentFiltering' argument of ?results
table_unch <- generate_deg_table(ddssva, "Gene_UNCH_vs_GFP", allspecies_df)
out of 16363 with nonzero total read count
adjusted p-value < 0.05
LFC > 0 (up) : 213, 1.3%
LFC < 0 (down) : 337, 2.1%
outliers [1] : 0, 0%
low counts [2] : 0, 0%
(mean count < 1)
[1] see 'cooksCutoff' argument of ?results
[2] see 'independentFiltering' argument of ?results
table_hex1$kable_table
table_hex2$kable_table
table_jhmt$kable_table
table_miox$kable_table
table_unch$kable_table
Minor changes here are made compared to the DESeq2 results regarding the importation of samples to transform into a matrix. Sample names are structured as follow: {Sg}{gene}{#} {Sg} = Schistocerca gregaria {gene} = gene abbreviation gfp, hex1, hex2, jhmt, miox and unch H{#} = biological replicate
saveDir <- paste0(workDir,"/DEG_results/RNAi/Head")
dir.create(saveDir)
### Prepare Sample CSV file #####
samples <- read.delim(file.path(workDir, "list/RNAi/Head_RNAisample_list.csv"), sep = ",", row.names = 1, header = TRUE)
files <- file.path(workDir, "readcounts/RNAi/", samples$Tissue, samples$Filename)
names(files) <- row.names(samples)
all(file.exists(files))
[1] TRUE
### Create count sample matrix
cts <- map_dfc(files, function(sample) {
data_count <- read.delim(sample, sep = "\t", header = FALSE)
col_name <- gsub("_counts.txt", "", basename(sample))
setNames(data.frame(data_count[, 2]), col_name)
})
row_get <- read.delim(files[1], sep = "\t", row.names = 1, header = F) # Get proper row names
rownames(cts) <- rownames(row_get)
rm(row_get) # remove unused object from memory
While for bulk RNAseq on head and thorax for all species, the DEGs model was made between isolated and crowded individuals (with isolated as the reference state), here, the DEG analysis will be carried between GFP knock-down nymphs (as reference state) vs Hexamerins / Juvenile Hormones / Uncharacterized proteins.
### Build DESeq2 Object
dds <- DESeqDataSetFromMatrix(countData = cts,
colData = samples,
design = ~ Gene)
dds$Gene <- relevel(dds$Gene, ref = "GFP")
smallestGroupSize <- 5
keep <- rowSums(counts(dds) >= 10) >= smallestGroupSize
dds <- dds[keep,]
dds <- DESeq(dds)
Following the generation of the DEseq2 object, we
annotate the genes with the GeneID using biomaRt.
### Fetch Annotation Gene IDs using biomaRt
ensembl <- useMart("metazoa_mart", host = "https://metazoa.ensembl.org")
metazoa_list <- listDatasets(ensembl)
dataset <- useMart("metazoa_mart", dataset = "sggca023897955v2rs_eg_gene",
host = "https://metazoa.ensembl.org")
#listAttributes(dataset)
test_raw_counts <- as.data.frame(counts(dds))
rownames(test_raw_counts) <- as.character(rownames(test_raw_counts))
test_raw_counts$ensembl_gene_id <- row.names(test_raw_counts)
annotations <- getBM(attributes = c("ensembl_gene_id", "geneid"),
filters = "ensembl_gene_id",
values = rownames(test_raw_counts),
mart = dataset)
# Merge dataframes to retain geneid information from biomaRt
test_raw_counts_annotated <- merge(test_raw_counts, annotations,
by = "ensembl_gene_id",
all.x = T)
write.csv(test_raw_counts_annotated, file=paste0(saveDir,"/Head_raw_counts.csv"))
# Plot PCA and investigate quality metrics
vsd <- vst(dds, blind=T)
pca_results <- create_pca_plots(norm.dds = vsd, saveDir, transformation = "vst", intgroup = "Gene")
pca_results$PCA_Labelled

pca_results$PCA_Hull

### SVA analysis to control for technical variation
dat <- counts(dds, normalized = TRUE)
idx <- rowMeans(dat) > 1
dat <- dat[idx, ]
mod <- model.matrix(~ Gene, colData(dds))
mod0 <- model.matrix(~ 1, colData(dds))
svseq <- svaseq(dat, mod, mod0)
Number of significant surrogate variables is: 3
Iteration (out of 5 ):1 2 3 4 5
sva_plots <- create_sva_plots(svseq, dds, saveDir, intgroup = c("Tissue", "Gene"))
# Show stripcharts in the report
sva_plots$Stripcharts[[1]] # Show first stripchart

sva_plots$Stripcharts[[2]] # Show second stripchart

sva_plots$Stripcharts[[3]] # Show third stripchart

| Version | Author | Date |
|---|---|---|
| d7fa779 | Maeva TECHER | 2025-02-14 |
# Show scatter plots in the report
sva_plots$ScatterPlots[["1_2"]] # Show SV1 vs SV2

| Version | Author | Date |
|---|---|---|
| d7fa779 | Maeva TECHER | 2025-02-14 |
sva_plots$ScatterPlots[["1_3"]] # Show SV1 vs SV3

| Version | Author | Date |
|---|---|---|
| d7fa779 | Maeva TECHER | 2025-02-14 |
sva_plots$ScatterPlots[["2_3"]] # Show SV2 vs SV3

| Version | Author | Date |
|---|---|---|
| d7fa779 | Maeva TECHER | 2025-02-14 |
We rerun the DESeq2 model but this time including the
surrogate variable as a covariate, as we know that the modeled variation
is more likely explained by tissue and gene variation rather than batch
effects.
ddssva <- dds
ddssva$SV1 <- svseq$sv[,1]
ddssva$SV2 <- svseq$sv[,2]
ddssva$SV3 <- svseq$sv[,3]
design(ddssva) <- ~ SV1 + SV2 + SV3 + Gene
ddssva$Gene <- relevel(ddssva$Gene, ref = "GFP")
smallestGroupSize <- 5
keep <- rowSums(counts(ddssva) >= 10) >= smallestGroupSize
ddssva <- ddssva[keep,]
ddssva <- DESeq(ddssva)
ddssva <- ddssva[which(mcols(ddssva)$betaConv),] # remove non converging rows
### Extract results
resultsNames(ddssva)
[1] "Intercept" "SV1" "SV2" "SV3"
[5] "Gene_HEX1_vs_GFP" "Gene_HEX2_vs_GFP" "Gene_JHMT_vs_GFP" "Gene_MIOX_vs_GFP"
[9] "Gene_UNCH_vs_GFP"
hex1 <- results(ddssva, name = "Gene_HEX1_vs_GFP", alpha = 0.05)
hex2 <- results(ddssva, name = "Gene_HEX2_vs_GFP", alpha = 0.05)
jhmt <- results(ddssva, name = "Gene_JHMT_vs_GFP", alpha = 0.05)
miox <- results(ddssva, name = "Gene_MIOX_vs_GFP", alpha = 0.05)
unch <- results(ddssva, name = "Gene_UNCH_vs_GFP", alpha = 0.05)
First we create function to generate the plots we are interested to obtain and then run the whole pipeline for each gene.
# Define contrast_sets
hex1_samples <- c("SggfpH1","SggfpH2","SggfpH3","SggfpH4","SggfpH5",
"Sghex1H1","Sghex1H2","Sghex1H3","Sghex1H4","Sghex1H5")
hex2_samples <- c("SggfpH1","SggfpH2","SggfpH3","SggfpH4","SggfpH5",
"Sghex2H1","Sghex2H2","Sghex2H3","Sghex2H4","Sghex2H5")
jhmt_samples <- c("SggfpH1","SggfpH2","SggfpH3","SggfpH4","SggfpH5",
"SgjhmtH1","SgjhmtH2","SgjhmtH3","SgjhmtH4","SgjhmtH5")
miox_samples <- c("SggfpH1","SggfpH2","SggfpH3","SggfpH4","SggfpH5",
"SgmioxH1","SgmioxH2","SgmioxH3","SgmioxH4","SgmioxH5")
unch_samples <- c("SggfpH1","SggfpH2","SggfpH3","SggfpH4","SggfpH5",
"SgunchH1","SgunchH2","SgunchH3","SgunchH4","SgunchH5")
# Run full analysis
hex1_plots <- visualize_data(hex1, "Hex1_vs_GFP", hex1_samples)
hex2_plots <- visualize_data(hex2, "Hex2_vs_GFP", hex2_samples)
jhmt_plots <- visualize_data(jhmt, "JHMT_vs_GFP", jhmt_samples)
miox_plots <- visualize_data(miox, "MIOX_vs_GFP", miox_samples)
unch_plots <- visualize_data(unch, "UNCH_vs_GFP", unch_samples)
hex1_plots$volcano; hex1_plots$heatmap

| Version | Author | Date |
|---|---|---|
| d7fa779 | Maeva TECHER | 2025-02-14 |

| Version | Author | Date |
|---|---|---|
| d7fa779 | Maeva TECHER | 2025-02-14 |
hex2_plots$volcano; hex2_plots$heatmap

| Version | Author | Date |
|---|---|---|
| d7fa779 | Maeva TECHER | 2025-02-14 |

| Version | Author | Date |
|---|---|---|
| d7fa779 | Maeva TECHER | 2025-02-14 |
jhmt_plots$volcano; jhmt_plots$heatmap

| Version | Author | Date |
|---|---|---|
| d7fa779 | Maeva TECHER | 2025-02-14 |

| Version | Author | Date |
|---|---|---|
| d7fa779 | Maeva TECHER | 2025-02-14 |
miox_plots$volcano; miox_plots$heatmap

| Version | Author | Date |
|---|---|---|
| d7fa779 | Maeva TECHER | 2025-02-14 |

| Version | Author | Date |
|---|---|---|
| d7fa779 | Maeva TECHER | 2025-02-14 |
unch_plots$volcano; unch_plots$heatmap

| Version | Author | Date |
|---|---|---|
| d7fa779 | Maeva TECHER | 2025-02-14 |

| Version | Author | Date |
|---|---|---|
| d7fa779 | Maeva TECHER | 2025-02-14 |
# Generate tables for each contrast
table_hex1 <- generate_deg_table(ddssva, "Gene_HEX1_vs_GFP", allspecies_df)
out of 15915 with nonzero total read count
adjusted p-value < 0.05
LFC > 0 (up) : 430, 2.7%
LFC < 0 (down) : 557, 3.5%
outliers [1] : 0, 0%
low counts [2] : 1852, 12%
(mean count < 17)
[1] see 'cooksCutoff' argument of ?results
[2] see 'independentFiltering' argument of ?results
table_hex2 <- generate_deg_table(ddssva, "Gene_HEX2_vs_GFP", allspecies_df)
out of 15915 with nonzero total read count
adjusted p-value < 0.05
LFC > 0 (up) : 143, 0.9%
LFC < 0 (down) : 370, 2.3%
outliers [1] : 0, 0%
low counts [2] : 4011, 25%
(mean count < 50)
[1] see 'cooksCutoff' argument of ?results
[2] see 'independentFiltering' argument of ?results
table_jhmt <- generate_deg_table(ddssva, "Gene_JHMT_vs_GFP", allspecies_df)
out of 15915 with nonzero total read count
adjusted p-value < 0.05
LFC > 0 (up) : 79, 0.5%
LFC < 0 (down) : 104, 0.65%
outliers [1] : 0, 0%
low counts [2] : 1543, 9.7%
(mean count < 14)
[1] see 'cooksCutoff' argument of ?results
[2] see 'independentFiltering' argument of ?results
table_miox <- generate_deg_table(ddssva, "Gene_MIOX_vs_GFP", allspecies_df)
out of 15915 with nonzero total read count
adjusted p-value < 0.05
LFC > 0 (up) : 60, 0.38%
LFC < 0 (down) : 123, 0.77%
outliers [1] : 0, 0%
low counts [2] : 1235, 7.8%
(mean count < 12)
[1] see 'cooksCutoff' argument of ?results
[2] see 'independentFiltering' argument of ?results
table_unch <- generate_deg_table(ddssva, "Gene_UNCH_vs_GFP", allspecies_df)
out of 15915 with nonzero total read count
adjusted p-value < 0.05
LFC > 0 (up) : 1461, 9.2%
LFC < 0 (down) : 2252, 14%
outliers [1] : 0, 0%
low counts [2] : 0, 0%
(mean count < 3)
[1] see 'cooksCutoff' argument of ?results
[2] see 'independentFiltering' argument of ?results
table_hex1$kable_table
table_hex2$kable_table
table_jhmt$kable_table
table_miox$kable_table
table_unch$kable_table
#enrich_data(hex1, "Hex1_vs_GFP", hex1_samples)
#enrich_data(hex2, "Hex2_vs_GFP", hex2_samples)
#enrich_data(jhmt, "JHMT_vs_GFP", jhmt_samples)
#enrich_data(miox, "MIOX_vs_GFP", miox_samples)
#enrich_data(unch, "UNCH_vs_GFP", unch_samples)
Minor changes here are made compared to the DESeq2 results regarding the importation of samples to transform into a matrix. Sample names are structured as follow: {Sg}{gene}{#} {Sg} = Schistocerca gregaria {gene} = gene abbreviation gfp, hex1, hex2, jhmt, miox and unch T{#} = biological replicate
saveDir <- paste0(workDir,"/DEG_results/RNAi/Thorax")
dir.create(saveDir)
### Prepare Sample CSV file #####
samples <- read.delim(file.path(workDir, "list/RNAi/Thorax_RNAisample_list.csv"), sep = ",", row.names = 1, header = TRUE)
files <- file.path(workDir, "readcounts/RNAi/", samples$Tissue, samples$Filename)
names(files) <- row.names(samples)
all(file.exists(files))
[1] TRUE
### Create count sample matrix
cts <- map_dfc(files, function(sample) {
data_count <- read.delim(sample, sep = "\t", header = FALSE)
col_name <- gsub("_counts.txt", "", basename(sample))
setNames(data.frame(data_count[, 2]), col_name)
})
row_get <- read.delim(files[1], sep = "\t", row.names = 1, header = F) # Get proper row names
rownames(cts) <- rownames(row_get)
rm(row_get) # remove unused object from memory
While for bulk RNAseq on head and thorax for all species, the DEGs model was made between isolated and crowded individuals (with isolated as the reference state), here, the DEG analysis will be carried between GFP knock-down nymphs (as reference state) vs Hexamerins / Juvenile Hormones / Uncharacterized proteins.
### Build DESeq2 Object
dds <- DESeqDataSetFromMatrix(countData = cts,
colData = samples,
design = ~ Gene)
dds$Gene <- relevel(dds$Gene, ref = "GFP")
smallestGroupSize <- 5
keep <- rowSums(counts(dds) >= 10) >= smallestGroupSize
dds <- dds[keep,]
dds <- DESeq(dds)
Following the generation of the DEseq2 object, we
annotate the genes with the GeneID using biomaRt.
### Fetch Annotation Gene IDs using biomaRt
ensembl <- useMart("metazoa_mart", host = "https://metazoa.ensembl.org")
metazoa_list <- listDatasets(ensembl)
dataset <- useMart("metazoa_mart", dataset = "sggca023897955v2rs_eg_gene",
host = "https://metazoa.ensembl.org")
#listAttributes(dataset)
test_raw_counts <- as.data.frame(counts(dds))
rownames(test_raw_counts) <- as.character(rownames(test_raw_counts))
test_raw_counts$ensembl_gene_id <- row.names(test_raw_counts)
annotations <- getBM(attributes = c("ensembl_gene_id", "geneid"),
filters = "ensembl_gene_id",
values = rownames(test_raw_counts),
mart = dataset)
# Merge dataframes to retain geneid information from biomaRt
test_raw_counts_annotated <- merge(test_raw_counts, annotations,
by = "ensembl_gene_id",
all.x = T)
write.csv(test_raw_counts_annotated, file=paste0(saveDir,"/Thorax_raw_counts.csv"))
# Plot PCA and investigate quality metrics
vsd <- vst(dds, blind=T)
pca_results <- create_pca_plots(norm.dds = vsd, saveDir, transformation = "vst", intgroup = "Gene")
pca_results$PCA_Labelled

pca_results$PCA_Hull

### SVA analysis to control for technical variation
dat <- counts(dds, normalized = TRUE)
idx <- rowMeans(dat) > 1
dat <- dat[idx, ]
mod <- model.matrix(~ Gene, colData(dds))
mod0 <- model.matrix(~ 1, colData(dds))
svseq <- svaseq(dat, mod, mod0)
Number of significant surrogate variables is: 4
Iteration (out of 5 ):1 2 3 4 5
sva_plots <- create_sva_plots(svseq, dds, saveDir, intgroup = c("Tissue", "Gene"))
# Show stripcharts in the report
sva_plots$Stripcharts[[1]] # Show first stripchart

sva_plots$Stripcharts[[2]] # Show second stripchart

sva_plots$Stripcharts[[3]] # Show third stripchart

| Version | Author | Date |
|---|---|---|
| d7fa779 | Maeva TECHER | 2025-02-14 |
# Show scatter plots in the report
sva_plots$ScatterPlots[["1_2"]] # Show SV1 vs SV2

| Version | Author | Date |
|---|---|---|
| d7fa779 | Maeva TECHER | 2025-02-14 |
sva_plots$ScatterPlots[["1_3"]] # Show SV1 vs SV3

| Version | Author | Date |
|---|---|---|
| d7fa779 | Maeva TECHER | 2025-02-14 |
sva_plots$ScatterPlots[["2_3"]] # Show SV2 vs SV3

| Version | Author | Date |
|---|---|---|
| d7fa779 | Maeva TECHER | 2025-02-14 |
We rerun the DESeq2 model but this time including the
surrogate variable as a covariate, as we know that the modeled variation
is more likely explained by tissue and gene variation rather than batch
effects.
ddssva <- dds
ddssva$SV1 <- svseq$sv[,1]
ddssva$SV2 <- svseq$sv[,2]
ddssva$SV3 <- svseq$sv[,3]
design(ddssva) <- ~ SV1 + SV2 + SV3 + Gene
ddssva$Gene <- relevel(ddssva$Gene, ref = "GFP")
smallestGroupSize <- 5
keep <- rowSums(counts(ddssva) >= 10) >= smallestGroupSize
ddssva <- ddssva[keep,]
ddssva <- DESeq(ddssva)
ddssva <- ddssva[which(mcols(ddssva)$betaConv),] # remove non converging rows
### Extract results
resultsNames(ddssva)
[1] "Intercept" "SV1" "SV2" "SV3"
[5] "Gene_HEX1_vs_GFP" "Gene_HEX2_vs_GFP" "Gene_JHMT_vs_GFP" "Gene_MIOX_vs_GFP"
[9] "Gene_UNCH_vs_GFP"
hex1 <- results(ddssva, name = "Gene_HEX1_vs_GFP", alpha = 0.05)
hex2 <- results(ddssva, name = "Gene_HEX2_vs_GFP", alpha = 0.05)
jhmt <- results(ddssva, name = "Gene_JHMT_vs_GFP", alpha = 0.05)
miox <- results(ddssva, name = "Gene_MIOX_vs_GFP", alpha = 0.05)
unch <- results(ddssva, name = "Gene_UNCH_vs_GFP", alpha = 0.05)
First we create function to generate the plots we are interested to obtain and then run the whole pipeline for each gene.
# Define contrast_sets
hex1_samples <- c("SggfpT1","SggfpT2","SggfpT3","SggfpT4","SggfpT5",
"Sghex1T1","Sghex1T2","Sghex1T3","Sghex1T4","Sghex1T5")
hex2_samples <- c("SggfpT1","SggfpT2","SggfpT3","SggfpT4","SggfpT5",
"Sghex2T1","Sghex2T2","Sghex2T3","Sghex2T4","Sghex2T5")
jhmt_samples <- c("SggfpT1","SggfpT2","SggfpT3","SggfpT4","SggfpT5",
"SgjhmtT1","SgjhmtT2","SgjhmtT3","SgjhmtT4","SgjhmtT5")
miox_samples <- c("SggfpT1","SggfpT2","SggfpT3","SggfpT4","SggfpT5",
"SgmioxT1","SgmioxT2","SgmioxT3","SgmioxT4","SgmioxT5")
unch_samples <- c("SggfpT1","SggfpT2","SggfpT3","SggfpT4","SggfpT5",
"SgunchT1","SgunchT2","SgunchT3","SgunchT4","SgunchT5")
# Run full analysis
hex1_plots <- visualize_data(hex1, "Hex1_vs_GFP", hex1_samples)
hex2_plots <- visualize_data(hex2, "Hex2_vs_GFP", hex2_samples)
jhmt_plots <- visualize_data(jhmt, "JHMT_vs_GFP", jhmt_samples)
miox_plots <- visualize_data(miox, "MIOX_vs_GFP", miox_samples)
unch_plots <- visualize_data(unch, "UNCH_vs_GFP", unch_samples)
hex1_plots$volcano; hex1_plots$heatmap

| Version | Author | Date |
|---|---|---|
| d7fa779 | Maeva TECHER | 2025-02-14 |

| Version | Author | Date |
|---|---|---|
| d7fa779 | Maeva TECHER | 2025-02-14 |
hex2_plots$volcano; hex2_plots$heatmap

| Version | Author | Date |
|---|---|---|
| d7fa779 | Maeva TECHER | 2025-02-14 |

| Version | Author | Date |
|---|---|---|
| d7fa779 | Maeva TECHER | 2025-02-14 |
jhmt_plots$volcano; jhmt_plots$heatmap

| Version | Author | Date |
|---|---|---|
| d7fa779 | Maeva TECHER | 2025-02-14 |

| Version | Author | Date |
|---|---|---|
| d7fa779 | Maeva TECHER | 2025-02-14 |
miox_plots$volcano; miox_plots$heatmap

| Version | Author | Date |
|---|---|---|
| d7fa779 | Maeva TECHER | 2025-02-14 |

| Version | Author | Date |
|---|---|---|
| d7fa779 | Maeva TECHER | 2025-02-14 |
unch_plots$volcano; unch_plots$heatmap

| Version | Author | Date |
|---|---|---|
| d7fa779 | Maeva TECHER | 2025-02-14 |

| Version | Author | Date |
|---|---|---|
| d7fa779 | Maeva TECHER | 2025-02-14 |
# Generate tables for each contrast
table_hex1 <- generate_deg_table(ddssva, "Gene_HEX1_vs_GFP", allspecies_df)
out of 15462 with nonzero total read count
adjusted p-value < 0.05
LFC > 0 (up) : 436, 2.8%
LFC < 0 (down) : 598, 3.9%
outliers [1] : 0, 0%
low counts [2] : 2998, 19%
(mean count < 27)
[1] see 'cooksCutoff' argument of ?results
[2] see 'independentFiltering' argument of ?results
table_hex2 <- generate_deg_table(ddssva, "Gene_HEX2_vs_GFP", allspecies_df)
out of 15462 with nonzero total read count
adjusted p-value < 0.05
LFC > 0 (up) : 224, 1.4%
LFC < 0 (down) : 149, 0.96%
outliers [1] : 0, 0%
low counts [2] : 2399, 16%
(mean count < 21)
[1] see 'cooksCutoff' argument of ?results
[2] see 'independentFiltering' argument of ?results
table_jhmt <- generate_deg_table(ddssva, "Gene_JHMT_vs_GFP", allspecies_df)
out of 15462 with nonzero total read count
adjusted p-value < 0.05
LFC > 0 (up) : 40, 0.26%
LFC < 0 (down) : 40, 0.26%
outliers [1] : 0, 0%
low counts [2] : 3298, 21%
(mean count < 31)
[1] see 'cooksCutoff' argument of ?results
[2] see 'independentFiltering' argument of ?results
table_miox <- generate_deg_table(ddssva, "Gene_MIOX_vs_GFP", allspecies_df)
out of 15462 with nonzero total read count
adjusted p-value < 0.05
LFC > 0 (up) : 226, 1.5%
LFC < 0 (down) : 230, 1.5%
outliers [1] : 0, 0%
low counts [2] : 2998, 19%
(mean count < 27)
[1] see 'cooksCutoff' argument of ?results
[2] see 'independentFiltering' argument of ?results
table_unch <- generate_deg_table(ddssva, "Gene_UNCH_vs_GFP", allspecies_df)
out of 15462 with nonzero total read count
adjusted p-value < 0.05
LFC > 0 (up) : 256, 1.7%
LFC < 0 (down) : 251, 1.6%
outliers [1] : 0, 0%
low counts [2] : 1499, 9.7%
(mean count < 13)
[1] see 'cooksCutoff' argument of ?results
[2] see 'independentFiltering' argument of ?results
table_hex1$kable_table
table_hex2$kable_table
table_jhmt$kable_table
table_miox$kable_table
table_unch$kable_table
#enrich_data(hex1, "Hex1_vs_GFP", hex1_samples)
#enrich_data(hex2, "Hex2_vs_GFP", hex2_samples)
#enrich_data(jhmt, "JHMT_vs_GFP", jhmt_samples)
#enrich_data(miox, "MIOX_vs_GFP", miox_samples)
#enrich_data(unch, "UNCH_vs_GFP", unch_samples)
Minor changes here are made compared to the DESeq2 results regarding the importation of samples to transform into a matrix. Sample names are structured as follow: {Sg}{gene}{#} {Sg} = Schistocerca gregaria {gene} = gene abbreviation gfp, hex1, hex2, jhmt, miox and unch H/T{#} = biological replicate
saveDir <- paste0(workDir,"/DEG_results/RNAi/All_no_rRNA")
dir.create(saveDir)
### Prepare Sample CSV file #####
samples <- read.delim(file.path(workDir, "list/RNAi/All_RNAisample_list.csv"), sep = ",", row.names = 1, header = TRUE)
files <- file.path(workDir, "readcounts/RNAi/", samples$Tissue, samples$Filename)
names(files) <- row.names(samples)
all(file.exists(files))
[1] TRUE
### Create count sample matrix
cts <- map_dfc(files, function(sample) {
data_count <- read.delim(sample, sep = "\t", header = FALSE)
col_name <- gsub("_counts.txt", "", basename(sample))
setNames(data.frame(data_count[, 2]), col_name)
})
row_get <- read.delim(files[1], sep = "\t", row.names = 1, header = F) # Get proper row names
rownames(cts) <- rownames(row_get)
rm(row_get) # remove unused object from memory
While for bulk RNAseq on head and thorax for all species, the DEGs model was made between isolated and crowded individuals (with isolated as the reference state), here, the DEG analysis will be carried between GFP knock-down nymphs (as reference state) vs Hexamerins / Juvenile Hormones / Uncharacterized proteins.
### Build DESeq2 Object
dds <- DESeqDataSetFromMatrix(countData = cts,
colData = samples,
design = ~ Gene)
dds$Gene <- relevel(dds$Gene, ref = "GFP")
smallestGroupSize <- 5
keep <- rowSums(counts(dds) >= 10) >= smallestGroupSize
dds <- dds[keep,]
loci_to_exclude <- readLines(file.path(workDir, "list/excluded_loci/gregaria_rrna_list.txt"))
dds <- dds[!(rownames(dds) %in% loci_to_exclude), ]
dds <- DESeq(dds)
cbind(resultsNames(dds))
[,1]
[1,] "Intercept"
[2,] "Gene_HEX1_vs_GFP"
[3,] "Gene_HEX2_vs_GFP"
[4,] "Gene_JHMT_vs_GFP"
[5,] "Gene_MIOX_vs_GFP"
[6,] "Gene_UNCH_vs_GFP"
Following the generation of the DEseq2 object, we
annotate the genes with the GeneID using biomaRt.
### Fetch Annotation Gene IDs using biomaRt
ensembl <- useMart("metazoa_mart", host = "https://metazoa.ensembl.org")
metazoa_list <- listDatasets(ensembl)
dataset <- useMart("metazoa_mart", dataset = "sggca023897955v2rs_eg_gene",
host = "https://metazoa.ensembl.org")
#listAttributes(dataset)
test_raw_counts <- as.data.frame(counts(dds))
rownames(test_raw_counts) <- as.character(rownames(test_raw_counts))
test_raw_counts$ensembl_gene_id <- row.names(test_raw_counts)
annotations <- getBM(attributes = c("ensembl_gene_id", "geneid"),
filters = "ensembl_gene_id",
values = rownames(test_raw_counts),
mart = dataset)
# Merge dataframes to retain geneid information from biomaRt
test_raw_counts_annotated <- merge(test_raw_counts, annotations,
by = "ensembl_gene_id",
all.x = T)
write.csv(test_raw_counts_annotated, file=paste0(saveDir,"/All_raw_counts.csv"))
# Plot PCA and investigate quality metrics
vsd <- vst(dds, blind = TRUE)
# Perform PCA
pca_data <- plotPCA(vsd, intgroup = c("Tissue", "Gene"), returnData = TRUE)
# Define colors for genes (slightly transparent) and shapes for tissues
gene_colors <- scale_color_manual(values = alpha(brewer.pal(n = length(unique(pca_data$Gene)), name = "Set1"), 0.8)) # Points are transparent
tissue_shapes <- scale_shape_manual(values = seq(15, 15 + length(unique(pca_data$Tissue))))
# **PCA without labels**
p_pca_nolabel <- ggplot(pca_data, aes(x = PC1, y = PC2, color = Gene, shape = Tissue)) +
geom_point(size = 4) +
gene_colors +
tissue_shapes +
theme_bw() +
theme(legend.title = element_blank(),
legend.text = element_text(face = "bold", size = 14),
axis.text = element_text(size = 12),
axis.title = element_text(size = 14)) +
ggtitle("PCA of All Tissues (No Labels)", subtitle = "Tissues differentiated by shape, Genes by color")
# Save PCA without labels
ggsave(paste0(saveDir, "/PCA_Tissue_Gene_NoLabel.png"), plot = p_pca_nolabel, width = 10, height = 10, dpi = 600, device = "png")
# **PCA with labels**
p_pca_label <- ggplot(pca_data, aes(x = PC1, y = PC2, color = Gene, shape = Tissue)) +
geom_point(size = 4) +
geom_text_repel(aes(label = name), size = 4, color = "black", max.overlaps = 20) + # Labels are fully visible
gene_colors +
tissue_shapes +
theme_bw() +
theme(legend.title = element_blank(),
legend.text = element_text(face = "bold", size = 14),
axis.text = element_text(size = 12),
axis.title = element_text(size = 14)) +
ggtitle("PCA of All Tissues (With Labels)", subtitle = "Tissues differentiated by shape, Genes by color")
# Save PCA with labels
ggsave(paste0(saveDir, "/PCA_Tissue_Gene_Label.png"), plot = p_pca_label, width = 10, height = 10, dpi = 600, device = "png")
# **Return plots for knitr/RMarkdown**
list(NoLabel = p_pca_nolabel, WithLabel = p_pca_label)
$NoLabel

| Version | Author | Date |
|---|---|---|
| d7fa779 | Maeva TECHER | 2025-02-14 |
$WithLabel

| Version | Author | Date |
|---|---|---|
| d7fa779 | Maeva TECHER | 2025-02-14 |
The PCA plot shows clear distinction between tissue types, while gene silencing has a large variation within each tissue, and presents no distinct clear groupings for a single gene.
### SVA analysis to control for technical variation
dat <- counts(dds, normalized = TRUE)
idx <- rowMeans(dat) > 1
dat <- dat[idx, ]
mod <- model.matrix(~ Gene, colData(dds))
mod0 <- model.matrix(~ 1, colData(dds))
svseq <- svaseq(dat, mod, mod0)
Number of significant surrogate variables is: 2
Iteration (out of 5 ):1 2 3 4 5
#sva_plots <- create_sva_plots(svseq, dds, saveDir, intgroup = c("Tissue", "Gene"))
# Show stripcharts in the report
sva_plots$Stripcharts[[1]] # Show first stripchart

| Version | Author | Date |
|---|---|---|
| d7fa779 | Maeva TECHER | 2025-02-14 |
sva_plots$Stripcharts[[2]] # Show second stripchart

| Version | Author | Date |
|---|---|---|
| d7fa779 | Maeva TECHER | 2025-02-14 |
# Show scatter plots in the report
sva_plots$ScatterPlots[["1_2"]] # Show SV1 vs SV2

| Version | Author | Date |
|---|---|---|
| d7fa779 | Maeva TECHER | 2025-02-14 |
SV1 is clearly showing an effect of tissue. We rerun the
DESeq2 model but this time including the surrogate variable
SV2 and SV3 as a covariates only, as we know that the modeled variation
is more likely explained by tissue and gene variation rather than batch
effects.
ddssva <- dds
ddssva$SV2 <- svseq$sv[,2]
design(ddssva) <- ~ SV2 + Gene
ddssva$Gene <- relevel(ddssva$Gene, ref = "GFP")
smallestGroupSize <- 5
keep <- rowSums(counts(ddssva) >= 10) >= smallestGroupSize
ddssva <- ddssva[keep,]
ddssva <- DESeq(ddssva)
ddssva <- ddssva[which(mcols(ddssva)$betaConv),] # remove non converging rows
### Extract results
resultsNames(ddssva)
[1] "Intercept" "SV2" "Gene_HEX1_vs_GFP" "Gene_HEX2_vs_GFP"
[5] "Gene_JHMT_vs_GFP" "Gene_MIOX_vs_GFP" "Gene_UNCH_vs_GFP"
hex1 <- results(ddssva, name = "Gene_HEX1_vs_GFP", alpha = 0.05)
hex2 <- results(ddssva, name = "Gene_HEX2_vs_GFP", alpha = 0.05)
jhmt <- results(ddssva, name = "Gene_JHMT_vs_GFP", alpha = 0.05)
miox <- results(ddssva, name = "Gene_MIOX_vs_GFP", alpha = 0.05)
unch <- results(ddssva, name = "Gene_UNCH_vs_GFP", alpha = 0.05)
First we create function to generate the plots we are interested to obtain and then run the whole pipeline for each gene.
# Define contrast_sets
hex1_samples <- c("SggfpH1","SggfpH2","SggfpH3","SggfpH4","SggfpH5",
"Sghex1H1","Sghex1H2","Sghex1H3","Sghex1H4","Sghex1H5",
"SggfpT1","SggfpT2","SggfpT3","SggfpT4","SggfpT5",
"Sghex1T1","Sghex1T2","Sghex1T3","Sghex1T4","Sghex1T5")
hex2_samples <- c("SggfpH1","SggfpH2","SggfpH3","SggfpH4","SggfpH5",
"Sghex2H1","Sghex2H2","Sghex2H3","Sghex2H4","Sghex2H5",
"SggfpT1","SggfpT2","SggfpT3","SggfpT4","SggfpT5",
"Sghex2T1","Sghex2T2","Sghex2T3","Sghex2T4","Sghex2T5")
jhmt_samples <- c("SggfpH1","SggfpH2","SggfpH3","SggfpH4","SggfpH5",
"SgjhmtH1","SgjhmtH2","SgjhmtH3","SgjhmtH4","SgjhmtH5",
"SggfpT1","SggfpT2","SggfpT3","SggfpT4","SggfpT5",
"SgjhmtT1","SgjhmtT2","SgjhmtT3","SgjhmtT4","SgjhmtT5")
miox_samples <- c("SggfpH1","SggfpH2","SggfpH3","SggfpH4","SggfpH5",
"SgmioxH1","SgmioxH2","SgmioxH3","SgmioxH4","SgmioxH5",
"SggfpT1","SggfpT2","SggfpT3","SggfpT4","SggfpT5",
"SgmioxT1","SgmioxT2","SgmioxT3","SgmioxT4","SgmioxT5")
unch_samples <- c("SggfpH1","SggfpH2","SggfpH3","SggfpH4","SggfpH5",
"SgunchH1","SgunchH2","SgunchH3","SgunchH4","SgunchH5",
"SggfpT1","SggfpT2","SggfpT3","SggfpT4","SggfpT5",
"SgunchT1","SgunchT2","SgunchT3","SgunchT4","SgunchT5")
# Run full analysis
hex1_plots <- visualize_data(hex1, "Hex1_vs_GFP", hex1_samples)
hex2_plots <- visualize_data(hex2, "Hex2_vs_GFP", hex2_samples)
jhmt_plots <- visualize_data(jhmt, "JHMT_vs_GFP", jhmt_samples)
miox_plots <- visualize_data(miox, "MIOX_vs_GFP", miox_samples)
unch_plots <- visualize_data(unch, "UNCH_vs_GFP", unch_samples)
hex1_plots$volcano; hex1_plots$heatmap

| Version | Author | Date |
|---|---|---|
| d7fa779 | Maeva TECHER | 2025-02-14 |

| Version | Author | Date |
|---|---|---|
| d7fa779 | Maeva TECHER | 2025-02-14 |
hex2_plots$volcano; hex2_plots$heatmap

| Version | Author | Date |
|---|---|---|
| d7fa779 | Maeva TECHER | 2025-02-14 |

| Version | Author | Date |
|---|---|---|
| d7fa779 | Maeva TECHER | 2025-02-14 |
jhmt_plots$volcano; jhmt_plots$heatmap

| Version | Author | Date |
|---|---|---|
| d7fa779 | Maeva TECHER | 2025-02-14 |

| Version | Author | Date |
|---|---|---|
| d7fa779 | Maeva TECHER | 2025-02-14 |
miox_plots$volcano; miox_plots$heatmap

| Version | Author | Date |
|---|---|---|
| d7fa779 | Maeva TECHER | 2025-02-14 |

| Version | Author | Date |
|---|---|---|
| d7fa779 | Maeva TECHER | 2025-02-14 |
unch_plots$volcano; unch_plots$heatmap

| Version | Author | Date |
|---|---|---|
| d7fa779 | Maeva TECHER | 2025-02-14 |

| Version | Author | Date |
|---|---|---|
| d7fa779 | Maeva TECHER | 2025-02-14 |
# Generate tables for each contrast
table_hex1 <- generate_deg_table(ddssva, "Gene_HEX1_vs_GFP", allspecies_df)
out of 14890 with nonzero total read count
adjusted p-value < 0.05
LFC > 0 (up) : 839, 5.6%
LFC < 0 (down) : 950, 6.4%
outliers [1] : 0, 0%
low counts [2] : 0, 0%
(mean count < 2)
[1] see 'cooksCutoff' argument of ?results
[2] see 'independentFiltering' argument of ?results
table_hex2 <- generate_deg_table(ddssva, "Gene_HEX2_vs_GFP", allspecies_df)
out of 14890 with nonzero total read count
adjusted p-value < 0.05
LFC > 0 (up) : 201, 1.3%
LFC < 0 (down) : 321, 2.2%
outliers [1] : 0, 0%
low counts [2] : 1444, 9.7%
(mean count < 13)
[1] see 'cooksCutoff' argument of ?results
[2] see 'independentFiltering' argument of ?results
table_jhmt <- generate_deg_table(ddssva, "Gene_JHMT_vs_GFP", allspecies_df)
out of 14890 with nonzero total read count
adjusted p-value < 0.05
LFC > 0 (up) : 266, 1.8%
LFC < 0 (down) : 486, 3.3%
outliers [1] : 0, 0%
low counts [2] : 0, 0%
(mean count < 2)
[1] see 'cooksCutoff' argument of ?results
[2] see 'independentFiltering' argument of ?results
table_miox <- generate_deg_table(ddssva, "Gene_MIOX_vs_GFP", allspecies_df)
out of 14890 with nonzero total read count
adjusted p-value < 0.05
LFC > 0 (up) : 111, 0.75%
LFC < 0 (down) : 100, 0.67%
outliers [1] : 0, 0%
low counts [2] : 0, 0%
(mean count < 2)
[1] see 'cooksCutoff' argument of ?results
[2] see 'independentFiltering' argument of ?results
table_unch <- generate_deg_table(ddssva, "Gene_UNCH_vs_GFP", allspecies_df)
out of 14890 with nonzero total read count
adjusted p-value < 0.05
LFC > 0 (up) : 172, 1.2%
LFC < 0 (down) : 231, 1.6%
outliers [1] : 0, 0%
low counts [2] : 0, 0%
(mean count < 2)
[1] see 'cooksCutoff' argument of ?results
[2] see 'independentFiltering' argument of ?results
table_hex1$kable_table
table_hex2$kable_table
table_jhmt$kable_table
table_miox$kable_table
table_unch$kable_table
saveDir <- paste0(workDir,"/DEG_results/RNAi/Head_no_rRNA")
dir.create(saveDir)
### Prepare Sample CSV file #####
samples <- read.delim(file.path(workDir, "list/RNAi/Head_RNAisample_list.csv"), sep = ",", row.names = 1, header = TRUE)
files <- file.path(workDir, "readcounts/RNAi/", samples$Tissue, samples$Filename)
names(files) <- row.names(samples)
all(file.exists(files))
[1] TRUE
### Create count sample matrix
cts <- map_dfc(files, function(sample) {
data_count <- read.delim(sample, sep = "\t", header = FALSE)
col_name <- gsub("_counts.txt", "", basename(sample))
setNames(data.frame(data_count[, 2]), col_name)
})
row_get <- read.delim(files[1], sep = "\t", row.names = 1, header = F) # Get proper row names
rownames(cts) <- rownames(row_get)
rm(row_get) # remove unused object from memory
While for bulk RNAseq on head and thorax for all species, the DEGs model was made between isolated and crowded individuals (with isolated as the reference state), here, the DEG analysis will be carried between GFP knock-down nymphs (as reference state) vs Hexamerins / Juvenile Hormones / Uncharacterized proteins.
### Build DESeq2 Object
dds <- DESeqDataSetFromMatrix(countData = cts,
colData = samples,
design = ~ Gene)
dds$Gene <- relevel(dds$Gene, ref = "GFP")
smallestGroupSize <- 5
keep <- rowSums(counts(dds) >= 10) >= smallestGroupSize
dds <- dds[keep,]
loci_to_exclude <- readLines(file.path(workDir, "list/excluded_loci/gregaria_rrna_list.txt"))
dds <- dds[!(rownames(dds) %in% loci_to_exclude), ]
dds <- DESeq(dds)
cbind(resultsNames(dds))
[,1]
[1,] "Intercept"
[2,] "Gene_HEX1_vs_GFP"
[3,] "Gene_HEX2_vs_GFP"
[4,] "Gene_JHMT_vs_GFP"
[5,] "Gene_MIOX_vs_GFP"
[6,] "Gene_UNCH_vs_GFP"
Following the generation of the DEseq2 object, we
annotate the genes with the GeneID using biomaRt.
### Fetch Annotation Gene IDs using biomaRt
ensembl <- useMart("metazoa_mart", host = "https://metazoa.ensembl.org")
metazoa_list <- listDatasets(ensembl)
dataset <- useMart("metazoa_mart", dataset = "sggca023897955v2rs_eg_gene",
host = "https://metazoa.ensembl.org")
#listAttributes(dataset)
test_raw_counts <- as.data.frame(counts(dds))
rownames(test_raw_counts) <- as.character(rownames(test_raw_counts))
test_raw_counts$ensembl_gene_id <- row.names(test_raw_counts)
annotations <- getBM(attributes = c("ensembl_gene_id", "geneid"),
filters = "ensembl_gene_id",
values = rownames(test_raw_counts),
mart = dataset)
# Merge dataframes to retain geneid information from biomaRt
test_raw_counts_annotated <- merge(test_raw_counts, annotations,
by = "ensembl_gene_id",
all.x = T)
write.csv(test_raw_counts_annotated, file=paste0(saveDir,"/Head_raw_counts.csv"))
# Plot PCA and investigate quality metrics
vsd <- vst(dds, blind=T)
pca_results <- create_pca_plots(norm.dds = vsd, saveDir, transformation = "vst", intgroup = "Gene")
pca_results$PCA_Labelled

| Version | Author | Date |
|---|---|---|
| d7fa779 | Maeva TECHER | 2025-02-14 |
pca_results$PCA_Hull

| Version | Author | Date |
|---|---|---|
| d7fa779 | Maeva TECHER | 2025-02-14 |
### SVA analysis to control for technical variation
dat <- counts(dds, normalized = TRUE)
idx <- rowMeans(dat) > 1
dat <- dat[idx, ]
mod <- model.matrix(~ Gene, colData(dds))
mod0 <- model.matrix(~ 1, colData(dds))
svseq <- svaseq(dat, mod, mod0)
Number of significant surrogate variables is: 4
Iteration (out of 5 ):1 2 3 4 5
sva_plots <- create_sva_plots(svseq, dds, saveDir, intgroup = c("Tissue", "Gene"))
# Show stripcharts in the report
sva_plots$Stripcharts[[1]] # Show first stripchart

| Version | Author | Date |
|---|---|---|
| d7fa779 | Maeva TECHER | 2025-02-14 |
sva_plots$Stripcharts[[2]] # Show second stripchart

| Version | Author | Date |
|---|---|---|
| d7fa779 | Maeva TECHER | 2025-02-14 |
sva_plots$Stripcharts[[3]] # Show third stripchart

| Version | Author | Date |
|---|---|---|
| d7fa779 | Maeva TECHER | 2025-02-14 |
# Show scatter plots in the report
sva_plots$ScatterPlots[["1_2"]] # Show SV1 vs SV2

| Version | Author | Date |
|---|---|---|
| d7fa779 | Maeva TECHER | 2025-02-14 |
sva_plots$ScatterPlots[["1_3"]] # Show SV1 vs SV3

| Version | Author | Date |
|---|---|---|
| d7fa779 | Maeva TECHER | 2025-02-14 |
sva_plots$ScatterPlots[["2_3"]] # Show SV2 vs SV3

| Version | Author | Date |
|---|---|---|
| d7fa779 | Maeva TECHER | 2025-02-14 |
We rerun the DESeq2 model but this time including the
surrogate variable as a covariate, as we know that the modeled variation
is more likely explained by tissue and gene variation rather than batch
effects.
ddssva <- dds
ddssva$SV1 <- svseq$sv[,1]
ddssva$SV2 <- svseq$sv[,2]
ddssva$SV3 <- svseq$sv[,3]
design(ddssva) <- ~ SV1 + SV2 + SV3 + Gene
ddssva$Gene <- relevel(ddssva$Gene, ref = "GFP")
smallestGroupSize <- 5
keep <- rowSums(counts(ddssva) >= 10) >= smallestGroupSize
ddssva <- ddssva[keep,]
ddssva <- DESeq(ddssva)
ddssva <- ddssva[which(mcols(ddssva)$betaConv),] # remove non converging rows
### Extract results
resultsNames(ddssva)
[1] "Intercept" "SV1" "SV2" "SV3"
[5] "Gene_HEX1_vs_GFP" "Gene_HEX2_vs_GFP" "Gene_JHMT_vs_GFP" "Gene_MIOX_vs_GFP"
[9] "Gene_UNCH_vs_GFP"
hex1 <- results(ddssva, name = "Gene_HEX1_vs_GFP", alpha = 0.05)
hex2 <- results(ddssva, name = "Gene_HEX2_vs_GFP", alpha = 0.05)
jhmt <- results(ddssva, name = "Gene_JHMT_vs_GFP", alpha = 0.05)
miox <- results(ddssva, name = "Gene_MIOX_vs_GFP", alpha = 0.05)
unch <- results(ddssva, name = "Gene_UNCH_vs_GFP", alpha = 0.05)
First we create function to generate the plots we are interested to obtain and then run the whole pipeline for each gene.
# Define contrast_sets
hex1_samples <- c("SggfpH1","SggfpH2","SggfpH3","SggfpH4","SggfpH5",
"Sghex1H1","Sghex1H2","Sghex1H3","Sghex1H4","Sghex1H5")
hex2_samples <- c("SggfpH1","SggfpH2","SggfpH3","SggfpH4","SggfpH5",
"Sghex2H1","Sghex2H2","Sghex2H3","Sghex2H4","Sghex2H5")
jhmt_samples <- c("SggfpH1","SggfpH2","SggfpH3","SggfpH4","SggfpH5",
"SgjhmtH1","SgjhmtH2","SgjhmtH3","SgjhmtH4","SgjhmtH5")
miox_samples <- c("SggfpH1","SggfpH2","SggfpH3","SggfpH4","SggfpH5",
"SgmioxH1","SgmioxH2","SgmioxH3","SgmioxH4","SgmioxH5")
unch_samples <- c("SggfpH1","SggfpH2","SggfpH3","SggfpH4","SggfpH5",
"SgunchH1","SgunchH2","SgunchH3","SgunchH4","SgunchH5")
# Run full analysis
hex1_plots <- visualize_data(hex1, "Hex1_vs_GFP", hex1_samples)
hex2_plots <- visualize_data(hex2, "Hex2_vs_GFP", hex2_samples)
jhmt_plots <- visualize_data(jhmt, "JHMT_vs_GFP", jhmt_samples)
miox_plots <- visualize_data(miox, "MIOX_vs_GFP", miox_samples)
unch_plots <- visualize_data(unch, "UNCH_vs_GFP", unch_samples)
hex1_plots$volcano; hex1_plots$heatmap

| Version | Author | Date |
|---|---|---|
| d7fa779 | Maeva TECHER | 2025-02-14 |

| Version | Author | Date |
|---|---|---|
| d7fa779 | Maeva TECHER | 2025-02-14 |
hex2_plots$volcano; hex2_plots$heatmap

| Version | Author | Date |
|---|---|---|
| d7fa779 | Maeva TECHER | 2025-02-14 |

| Version | Author | Date |
|---|---|---|
| d7fa779 | Maeva TECHER | 2025-02-14 |
jhmt_plots$volcano; jhmt_plots$heatmap

| Version | Author | Date |
|---|---|---|
| d7fa779 | Maeva TECHER | 2025-02-14 |

| Version | Author | Date |
|---|---|---|
| d7fa779 | Maeva TECHER | 2025-02-14 |
miox_plots$volcano; miox_plots$heatmap

| Version | Author | Date |
|---|---|---|
| d7fa779 | Maeva TECHER | 2025-02-14 |

| Version | Author | Date |
|---|---|---|
| d7fa779 | Maeva TECHER | 2025-02-14 |
unch_plots$volcano; unch_plots$heatmap

| Version | Author | Date |
|---|---|---|
| d7fa779 | Maeva TECHER | 2025-02-14 |

| Version | Author | Date |
|---|---|---|
| d7fa779 | Maeva TECHER | 2025-02-14 |
# Generate tables for each contrast
table_hex1 <- generate_deg_table(ddssva, "Gene_HEX1_vs_GFP", allspecies_df)
out of 14587 with nonzero total read count
adjusted p-value < 0.05
LFC > 0 (up) : 432, 3%
LFC < 0 (down) : 463, 3.2%
outliers [1] : 0, 0%
low counts [2] : 2546, 17%
(mean count < 31)
[1] see 'cooksCutoff' argument of ?results
[2] see 'independentFiltering' argument of ?results
table_hex2 <- generate_deg_table(ddssva, "Gene_HEX2_vs_GFP", allspecies_df)
out of 14587 with nonzero total read count
adjusted p-value < 0.05
LFC > 0 (up) : 171, 1.2%
LFC < 0 (down) : 287, 2%
outliers [1] : 0, 0%
low counts [2] : 2263, 16%
(mean count < 26)
[1] see 'cooksCutoff' argument of ?results
[2] see 'independentFiltering' argument of ?results
table_jhmt <- generate_deg_table(ddssva, "Gene_JHMT_vs_GFP", allspecies_df)
out of 14587 with nonzero total read count
adjusted p-value < 0.05
LFC > 0 (up) : 73, 0.5%
LFC < 0 (down) : 106, 0.73%
outliers [1] : 0, 0%
low counts [2] : 1980, 14%
(mean count < 22)
[1] see 'cooksCutoff' argument of ?results
[2] see 'independentFiltering' argument of ?results
table_miox <- generate_deg_table(ddssva, "Gene_MIOX_vs_GFP", allspecies_df)
out of 14587 with nonzero total read count
adjusted p-value < 0.05
LFC > 0 (up) : 61, 0.42%
LFC < 0 (down) : 112, 0.77%
outliers [1] : 0, 0%
low counts [2] : 3677, 25%
(mean count < 60)
[1] see 'cooksCutoff' argument of ?results
[2] see 'independentFiltering' argument of ?results
table_unch <- generate_deg_table(ddssva, "Gene_UNCH_vs_GFP", allspecies_df)
out of 14587 with nonzero total read count
adjusted p-value < 0.05
LFC > 0 (up) : 1355, 9.3%
LFC < 0 (down) : 1860, 13%
outliers [1] : 0, 0%
low counts [2] : 283, 1.9%
(mean count < 8)
[1] see 'cooksCutoff' argument of ?results
[2] see 'independentFiltering' argument of ?results
table_hex1$kable_table
table_hex2$kable_table
table_jhmt$kable_table
table_miox$kable_table
table_unch$kable_table
saveDir <- paste0(workDir,"/DEG_results/RNAi/Thorax_no_rRNA")
dir.create(saveDir)
### Prepare Sample CSV file #####
samples <- read.delim(file.path(workDir, "list/RNAi/Thorax_RNAisample_list.csv"), sep = ",", row.names = 1, header = TRUE)
files <- file.path(workDir, "readcounts/RNAi/", samples$Tissue, samples$Filename)
names(files) <- row.names(samples)
all(file.exists(files))
[1] TRUE
### Create count sample matrix
cts <- map_dfc(files, function(sample) {
data_count <- read.delim(sample, sep = "\t", header = FALSE)
col_name <- gsub("_counts.txt", "", basename(sample))
setNames(data.frame(data_count[, 2]), col_name)
})
row_get <- read.delim(files[1], sep = "\t", row.names = 1, header = F) # Get proper row names
rownames(cts) <- rownames(row_get)
rm(row_get) # remove unused object from memory
While for bulk RNAseq on head and thorax for all species, the DEGs model was made between isolated and crowded individuals (with isolated as the reference state), here, the DEG analysis will be carried between GFP knock-down nymphs (as reference state) vs Hexamerins / Juvenile Hormones / Uncharacterized proteins.
### Build DESeq2 Object
dds <- DESeqDataSetFromMatrix(countData = cts,
colData = samples,
design = ~ Gene)
dds$Gene <- relevel(dds$Gene, ref = "GFP")
smallestGroupSize <- 5
keep <- rowSums(counts(dds) >= 10) >= smallestGroupSize
dds <- dds[keep,]
loci_to_exclude <- readLines(file.path(workDir, "list/excluded_loci/gregaria_rrna_list.txt"))
dds <- dds[!(rownames(dds) %in% loci_to_exclude), ]
dds <- DESeq(dds)
cbind(resultsNames(dds))
[,1]
[1,] "Intercept"
[2,] "Gene_HEX1_vs_GFP"
[3,] "Gene_HEX2_vs_GFP"
[4,] "Gene_JHMT_vs_GFP"
[5,] "Gene_MIOX_vs_GFP"
[6,] "Gene_UNCH_vs_GFP"
Following the generation of the DEseq2 object, we
annotate the genes with the GeneID using biomaRt.
### Fetch Annotation Gene IDs using biomaRt
ensembl <- useMart("metazoa_mart", host = "https://metazoa.ensembl.org")
metazoa_list <- listDatasets(ensembl)
dataset <- useMart("metazoa_mart", dataset = "sggca023897955v2rs_eg_gene",
host = "https://metazoa.ensembl.org")
#listAttributes(dataset)
test_raw_counts <- as.data.frame(counts(dds))
rownames(test_raw_counts) <- as.character(rownames(test_raw_counts))
test_raw_counts$ensembl_gene_id <- row.names(test_raw_counts)
annotations <- getBM(attributes = c("ensembl_gene_id", "geneid"),
filters = "ensembl_gene_id",
values = rownames(test_raw_counts),
mart = dataset)
# Merge dataframes to retain geneid information from biomaRt
test_raw_counts_annotated <- merge(test_raw_counts, annotations,
by = "ensembl_gene_id",
all.x = T)
write.csv(test_raw_counts_annotated, file=paste0(saveDir,"/Thorax_raw_counts.csv"))
# Plot PCA and investigate quality metrics
vsd <- vst(dds, blind=T)
pca_results <- create_pca_plots(norm.dds = vsd, saveDir, transformation = "vst", intgroup = "Gene")
pca_results$PCA_Labelled

| Version | Author | Date |
|---|---|---|
| d7fa779 | Maeva TECHER | 2025-02-14 |
pca_results$PCA_Hull

| Version | Author | Date |
|---|---|---|
| d7fa779 | Maeva TECHER | 2025-02-14 |
### SVA analysis to control for technical variation
dat <- counts(dds, normalized = TRUE)
idx <- rowMeans(dat) > 1
dat <- dat[idx, ]
mod <- model.matrix(~ Gene, colData(dds))
mod0 <- model.matrix(~ 1, colData(dds))
svseq <- svaseq(dat, mod, mod0)
Number of significant surrogate variables is: 5
Iteration (out of 5 ):1 2 3 4 5
sva_plots <- create_sva_plots(svseq, dds, saveDir, intgroup = c("Tissue", "Gene"))
# Show stripcharts in the report
sva_plots$Stripcharts[[1]] # Show first stripchart

| Version | Author | Date |
|---|---|---|
| d7fa779 | Maeva TECHER | 2025-02-14 |
sva_plots$Stripcharts[[2]] # Show second stripchart

| Version | Author | Date |
|---|---|---|
| d7fa779 | Maeva TECHER | 2025-02-14 |
sva_plots$Stripcharts[[3]] # Show third stripchart

| Version | Author | Date |
|---|---|---|
| d7fa779 | Maeva TECHER | 2025-02-14 |
# Show scatter plots in the report
sva_plots$ScatterPlots[["1_2"]] # Show SV1 vs SV2

| Version | Author | Date |
|---|---|---|
| d7fa779 | Maeva TECHER | 2025-02-14 |
sva_plots$ScatterPlots[["1_3"]] # Show SV1 vs SV3

| Version | Author | Date |
|---|---|---|
| d7fa779 | Maeva TECHER | 2025-02-14 |
sva_plots$ScatterPlots[["2_3"]] # Show SV2 vs SV3

| Version | Author | Date |
|---|---|---|
| d7fa779 | Maeva TECHER | 2025-02-14 |
We rerun the DESeq2 model but this time including the
surrogate variable as a covariate, as we know that the modeled variation
is more likely explained by tissue and gene variation rather than batch
effects.
ddssva <- dds
ddssva$SV1 <- svseq$sv[,1]
ddssva$SV2 <- svseq$sv[,2]
ddssva$SV3 <- svseq$sv[,3]
design(ddssva) <- ~ SV1 + SV2 + SV3 + Gene
ddssva$Gene <- relevel(ddssva$Gene, ref = "GFP")
smallestGroupSize <- 5
keep <- rowSums(counts(ddssva) >= 10) >= smallestGroupSize
ddssva <- ddssva[keep,]
ddssva <- DESeq(ddssva)
ddssva <- ddssva[which(mcols(ddssva)$betaConv),] # remove non converging rows
### Extract results
resultsNames(ddssva)
[1] "Intercept" "SV1" "SV2" "SV3"
[5] "Gene_HEX1_vs_GFP" "Gene_HEX2_vs_GFP" "Gene_JHMT_vs_GFP" "Gene_MIOX_vs_GFP"
[9] "Gene_UNCH_vs_GFP"
hex1 <- results(ddssva, name = "Gene_HEX1_vs_GFP", alpha = 0.05)
hex2 <- results(ddssva, name = "Gene_HEX2_vs_GFP", alpha = 0.05)
jhmt <- results(ddssva, name = "Gene_JHMT_vs_GFP", alpha = 0.05)
miox <- results(ddssva, name = "Gene_MIOX_vs_GFP", alpha = 0.05)
unch <- results(ddssva, name = "Gene_UNCH_vs_GFP", alpha = 0.05)
First we create function to generate the plots we are interested to obtain and then run the whole pipeline for each gene.
# Define contrast_sets
hex1_samples <- c("SggfpT1","SggfpT2","SggfpT3","SggfpT4","SggfpT5",
"Sghex1T1","Sghex1T2","Sghex1T3","Sghex1T4","Sghex1T5")
hex2_samples <- c("SggfpT1","SggfpT2","SggfpT3","SggfpT4","SggfpT5",
"Sghex2T1","Sghex2T2","Sghex2T3","Sghex2T4","Sghex2T5")
jhmt_samples <- c("SggfpT1","SggfpT2","SggfpT3","SggfpT4","SggfpT5",
"SgjhmtT1","SgjhmtT2","SgjhmtT3","SgjhmtT4","SgjhmtT5")
miox_samples <- c("SggfpT1","SggfpT2","SggfpT3","SggfpT4","SggfpT5",
"SgmioxT1","SgmioxT2","SgmioxT3","SgmioxT4","SgmioxT5")
unch_samples <- c("SggfpT1","SggfpT2","SggfpT3","SggfpT4","SggfpT5",
"SgunchT1","SgunchT2","SgunchT3","SgunchT4","SgunchT5")
# Run full analysis
hex1_plots <- visualize_data(hex1, "Hex1_vs_GFP", hex1_samples)
hex2_plots <- visualize_data(hex2, "Hex2_vs_GFP", hex2_samples)
jhmt_plots <- visualize_data(jhmt, "JHMT_vs_GFP", jhmt_samples)
miox_plots <- visualize_data(miox, "MIOX_vs_GFP", miox_samples)
unch_plots <- visualize_data(unch, "UNCH_vs_GFP", unch_samples)
hex1_plots$volcano; hex1_plots$heatmap

| Version | Author | Date |
|---|---|---|
| d7fa779 | Maeva TECHER | 2025-02-14 |

| Version | Author | Date |
|---|---|---|
| d7fa779 | Maeva TECHER | 2025-02-14 |
hex2_plots$volcano; hex2_plots$heatmap

| Version | Author | Date |
|---|---|---|
| d7fa779 | Maeva TECHER | 2025-02-14 |

| Version | Author | Date |
|---|---|---|
| d7fa779 | Maeva TECHER | 2025-02-14 |
jhmt_plots$volcano; jhmt_plots$heatmap

| Version | Author | Date |
|---|---|---|
| d7fa779 | Maeva TECHER | 2025-02-14 |

| Version | Author | Date |
|---|---|---|
| d7fa779 | Maeva TECHER | 2025-02-14 |
miox_plots$volcano; miox_plots$heatmap

| Version | Author | Date |
|---|---|---|
| d7fa779 | Maeva TECHER | 2025-02-14 |

| Version | Author | Date |
|---|---|---|
| d7fa779 | Maeva TECHER | 2025-02-14 |
unch_plots$volcano; unch_plots$heatmap

| Version | Author | Date |
|---|---|---|
| d7fa779 | Maeva TECHER | 2025-02-14 |

| Version | Author | Date |
|---|---|---|
| d7fa779 | Maeva TECHER | 2025-02-14 |
# Generate tables for each contrast
table_hex1 <- generate_deg_table(ddssva, "Gene_HEX1_vs_GFP", allspecies_df)
out of 14217 with nonzero total read count
adjusted p-value < 0.05
LFC > 0 (up) : 213, 1.5%
LFC < 0 (down) : 244, 1.7%
outliers [1] : 0, 0%
low counts [2] : 2757, 19%
(mean count < 30)
[1] see 'cooksCutoff' argument of ?results
[2] see 'independentFiltering' argument of ?results
table_hex2 <- generate_deg_table(ddssva, "Gene_HEX2_vs_GFP", allspecies_df)
out of 14217 with nonzero total read count
adjusted p-value < 0.05
LFC > 0 (up) : 195, 1.4%
LFC < 0 (down) : 161, 1.1%
outliers [1] : 0, 0%
low counts [2] : 3308, 23%
(mean count < 42)
[1] see 'cooksCutoff' argument of ?results
[2] see 'independentFiltering' argument of ?results
table_jhmt <- generate_deg_table(ddssva, "Gene_JHMT_vs_GFP", allspecies_df)
out of 14217 with nonzero total read count
adjusted p-value < 0.05
LFC > 0 (up) : 145, 1%
LFC < 0 (down) : 128, 0.9%
outliers [1] : 0, 0%
low counts [2] : 2481, 17%
(mean count < 26)
[1] see 'cooksCutoff' argument of ?results
[2] see 'independentFiltering' argument of ?results
table_miox <- generate_deg_table(ddssva, "Gene_MIOX_vs_GFP", allspecies_df)
out of 14217 with nonzero total read count
adjusted p-value < 0.05
LFC > 0 (up) : 475, 3.3%
LFC < 0 (down) : 403, 2.8%
outliers [1] : 0, 0%
low counts [2] : 827, 5.8%
(mean count < 10)
[1] see 'cooksCutoff' argument of ?results
[2] see 'independentFiltering' argument of ?results
table_unch <- generate_deg_table(ddssva, "Gene_UNCH_vs_GFP", allspecies_df)
out of 14217 with nonzero total read count
adjusted p-value < 0.05
LFC > 0 (up) : 287, 2%
LFC < 0 (down) : 278, 2%
outliers [1] : 0, 0%
low counts [2] : 3032, 21%
(mean count < 35)
[1] see 'cooksCutoff' argument of ?results
[2] see 'independentFiltering' argument of ?results
table_hex1$kable_table
table_hex2$kable_table
table_jhmt$kable_table
table_miox$kable_table
table_unch$kable_table
#enrich_data(hex1, "Hex1_vs_GFP", hex1_samples)
#enrich_data(hex2, "Hex2_vs_GFP", hex2_samples)
#enrich_data(jhmt, "JHMT_vs_GFP", jhmt_samples)
#enrich_data(miox, "MIOX_vs_GFP", miox_samples)
#enrich_data(unch, "UNCH_vs_GFP", unch_samples)
sessionInfo()
R version 4.4.2 (2024-10-31)
Platform: aarch64-apple-darwin20
Running under: macOS Sequoia 15.3
Matrix products: default
BLAS: /Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/lib/libRlapack.dylib; LAPACK version 3.12.0
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
time zone: Asia/Tokyo
tzcode source: internal
attached base packages:
[1] grid stats4 stats graphics grDevices utils datasets
[8] methods base
other attached packages:
[1] kableExtra_1.4.0 DT_0.33
[3] rafalib_1.0.0 biomaRt_2.62.1
[5] httr2_1.1.0 purrr_1.0.4
[7] dplyr_1.1.4 ashr_2.2-63
[9] cowplot_1.1.3 sva_3.54.0
[11] BiocParallel_1.40.0 genefilter_1.88.0
[13] mgcv_1.9-1 nlme_3.1-167
[15] clusterProfiler_4.14.4 EnhancedVolcano_1.24.0
[17] circlize_0.4.16 RColorBrewer_1.1-3
[19] ComplexHeatmap_2.22.0 ensembldb_2.30.0
[21] AnnotationFilter_1.30.0 GenomicFeatures_1.58.0
[23] AnnotationHub_3.14.0 BiocFileCache_2.14.0
[25] dbplyr_2.5.0 ggConvexHull_0.1.0
[27] ggrepel_0.9.6 ggplot2_3.5.1
[29] DESeq2_1.46.0 SummarizedExperiment_1.36.0
[31] MatrixGenerics_1.18.1 matrixStats_1.5.0
[33] GenomicRanges_1.58.0 GenomeInfoDb_1.42.3
[35] org.Sgregaria.eg.db_1.0.0 AnnotationDbi_1.68.0
[37] IRanges_2.40.1 S4Vectors_0.44.0
[39] Biobase_2.66.0 BiocGenerics_0.52.0
loaded via a namespace (and not attached):
[1] splines_4.4.2 later_1.4.1 BiocIO_1.16.0
[4] bitops_1.0-9 ggplotify_0.1.2 filelock_1.0.3
[7] tibble_3.2.1 R.oo_1.27.0 XML_3.99-0.18
[10] lifecycle_1.0.4 mixsqp_0.3-54 edgeR_4.4.2
[13] doParallel_1.0.17 rprojroot_2.0.4 lattice_0.22-6
[16] crosstalk_1.2.1 magrittr_2.0.3 limma_3.62.2
[19] sass_0.4.9 rmarkdown_2.29 jquerylib_0.1.4
[22] yaml_2.3.10 httpuv_1.6.15 ggtangle_0.0.6
[25] DBI_1.2.3 abind_1.4-8 zlibbioc_1.52.0
[28] R.utils_2.12.3 RCurl_1.98-1.16 yulab.utils_0.2.0
[31] rappdirs_0.3.3 git2r_0.35.0 GenomeInfoDbData_1.2.13
[34] enrichplot_1.26.6 irlba_2.3.5.1 tidytree_0.4.6
[37] annotate_1.84.0 svglite_2.1.3 codetools_0.2-20
[40] DelayedArray_0.32.0 xml2_1.3.6 DOSE_4.0.0
[43] tidyselect_1.2.1 shape_1.4.6.1 aplot_0.2.4
[46] UCSC.utils_1.2.0 farver_2.1.2 GenomicAlignments_1.42.0
[49] jsonlite_1.8.9 GetoptLong_1.0.5 survival_3.8-3
[52] iterators_1.0.14 systemfonts_1.2.1 foreach_1.5.2
[55] progress_1.2.3 tools_4.4.2 ragg_1.3.3
[58] treeio_1.30.0 Rcpp_1.0.14 glue_1.8.0
[61] SparseArray_1.6.1 xfun_0.50 qvalue_2.38.0
[64] withr_3.0.2 BiocManager_1.30.25 fastmap_1.2.0
[67] truncnorm_1.0-9 digest_0.6.37 R6_2.5.1
[70] gridGraphics_0.5-1 textshaping_1.0.0 colorspace_2.1-1
[73] GO.db_3.20.0 RSQLite_2.3.9 R.methodsS3_1.8.2
[76] tidyr_1.3.1 generics_0.1.3 data.table_1.16.4
[79] rtracklayer_1.66.0 htmlwidgets_1.6.4 prettyunits_1.2.0
[82] httr_1.4.7 S4Arrays_1.6.0 whisker_0.4.1
[85] pkgconfig_2.0.3 gtable_0.3.6 blob_1.2.4
[88] workflowr_1.7.1 XVector_0.46.0 htmltools_0.5.8.1
[91] fgsea_1.32.2 ProtGenerics_1.38.0 clue_0.3-66
[94] scales_1.3.0 png_0.1-8 ggfun_0.1.8
[97] knitr_1.49 rstudioapi_0.17.1 reshape2_1.4.4
[100] rjson_0.2.23 curl_6.2.0 cachem_1.1.0
[103] GlobalOptions_0.1.2 stringr_1.5.1 BiocVersion_3.20.0
[106] parallel_4.4.2 restfulr_0.0.15 pillar_1.10.1
[109] vctrs_0.6.5 promises_1.3.2 xtable_1.8-4
[112] cluster_2.1.8 evaluate_1.0.3 invgamma_1.1
[115] cli_3.6.3 locfit_1.5-9.11 compiler_4.4.2
[118] Rsamtools_2.22.0 rlang_1.1.5 crayon_1.5.3
[121] SQUAREM_2021.1 labeling_0.4.3 plyr_1.8.9
[124] fs_1.6.5 stringi_1.8.4 viridisLite_0.4.2
[127] munsell_0.5.1 Biostrings_2.74.1 lazyeval_0.2.2
[130] GOSemSim_2.32.0 Matrix_1.7-2 hms_1.1.3
[133] patchwork_1.3.0 bit64_4.6.0-1 statmod_1.5.0
[136] KEGGREST_1.46.0 igraph_2.1.4 memoise_2.0.1
[139] bslib_0.9.0 ggtree_3.14.0 fastmatch_1.1-6
[142] bit_4.5.0.1 ape_5.8-1 gson_0.1.0