Last updated: 2023-03-05

Checks: 2 passed, 0 failed

Knit directory: dgrp-starve/

This reproducible R Markdown analysis was created with workflowr (version 1.7.0). The Checks tab describes the reproducibility checks that were applied when the results were created. The Past versions tab lists the development history.

Great! Since the R Markdown file has been committed to the Git repository, you know the exact version of the code that produced these results.

Great! You are using Git for version control. Tracking code development and connecting the code version to the results is critical for reproducibility.

The results in this page were generated with repository version 82d0314. See the Past versions tab to see a history of the changes made to the R Markdown and HTML files.

Note that you need to be careful to ensure that all relevant files for the analysis have been committed to Git prior to generating the results (you can use wflow_publish or wflow_git_commit). workflowr only checks the R Markdown file, but you know if there are other scripts or data files that it depends on. Below is the status of the Git repository when the results were generated:

Ignored files:
    Ignored:    .RData

Untracked files:
    Untracked:  analysis/linearReg.Rmd
    Untracked:  code/PCA/
    Untracked:  code/data-prep/
    Untracked:  code/fabio/
    Untracked:  code/intro-starve/
    Untracked:  code/regress/
    Untracked:  code/stepwise/
    Untracked:  data/corLoop-f.rds
    Untracked:  data/corLoop-m.rds
    Untracked:  data/eQTL_traits_females.csv
    Untracked:  data/eQTL_traits_males.csv
    Untracked:  data/fRegress.txt
    Untracked:  data/fRegress_adj.txt
    Untracked:  data/gbayesC-f.Rds
    Untracked:  data/gbayesC-m.Rds
    Untracked:  data/gbayesC.Rds
    Untracked:  data/goGroups.txt
    Untracked:  data/mPart.txt
    Untracked:  data/mRegress.txt
    Untracked:  data/mRegress_adj.txt
    Untracked:  data/multiReg.rData
    Untracked:  data/starve-f.txt
    Untracked:  data/starve-m.txt
    Untracked:  data/xp-f.txt
    Untracked:  data/xp-m.txt
    Untracked:  data/y_save.txt
    Untracked:  figure/
    Untracked:  notes/

Unstaged changes:
    Deleted:    analysis/gremlo.R
    Deleted:    analysis/stepwise-f.Rmd
    Deleted:    analysis/stepwise-m.Rmd
    Deleted:    analysis/testing.R
    Deleted:    code/baseScript-lineComp.R
    Deleted:    code/combineSNP.R
    Deleted:    code/four-comp.76979.err
    Deleted:    code/four-comp.76979.out
    Deleted:    code/four-comp.sbatch
    Deleted:    code/fourLinePrep.R
    Deleted:    code/line_avgMinus.R
    Deleted:    code/line_avgPlus.R
    Deleted:    code/line_difMinus.R
    Deleted:    code/line_difPlus.R
    Deleted:    code/snpGene.R
    Deleted:    code/starveDataPrep.R

Note that any generated files, e.g. HTML, png, CSS, etc., are not included in this status report because it is ok for generated content to have uncommitted changes.

These are the previous versions of the repository in which changes were made to the R Markdown (analysis/recap.Rmd) and HTML (docs/recap.html) files. If you’ve configured a remote Git repository (see ?wflow_git_remote), click on the hyperlinks in the table below to view the files as they were in that past version.

File  Version  Author   Date        Message
Rmd   82d0314  nklimko  2023-03-05  wflow_publish("analysis/recap.Rmd")
html  91713ac  nklimko  2023-01-22  Build site.
Rmd   709ca14  nklimko  2023-01-22  wflow_publish("analysis/recap.Rmd")
html  e4cf8cb  nklimko  2023-01-10  Build site.
Rmd   c826636  nklimko  2023-01-10  wflow_publish("analysis/recap.Rmd")
html  bf1900e  nklimko  2023-01-10  Build site.
Rmd   a05fc48  nklimko  2023-01-10  wflow_publish("analysis/recap.Rmd")


• Reinstall R packages and rerun method comparison

• Use plotBayes source code to create custom plot function for trace plots

• Assess convergence using custom function plots

• Rename stepwise to something useful and descriptive

• Set partition to fm-bigmem-1 for sbatch scripts
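
The custom trace-plot idea above could look roughly like the sketch below. It uses base R only; `plot_trace` and its arguments are hypothetical names, and the chain is simulated rather than taken from a gbayesC fit or from the plotBayes source.

```r
# Hypothetical helper: trace plot of one MCMC chain with a running-mean overlay.
# `samples` is a numeric vector of sampled values; `nburn` marks the burn-in cutoff.
plot_trace <- function(samples, nburn = 0, main = "Trace plot") {
  stopifnot(is.numeric(samples), nburn >= 0, nburn < length(samples))
  iter <- seq_along(samples)
  running_mean <- cumsum(samples) / iter
  plot(iter, samples, type = "l", col = "grey40",
       xlab = "Iteration", ylab = "Sampled value", main = main)
  lines(iter, running_mean, col = "red", lwd = 2)
  if (nburn > 0) abline(v = nburn, lty = 2)  # dashed line at the end of burn-in
  # Return the post-burn-in mean invisibly so the result can also be checked numerically
  invisible(mean(samples[(nburn + 1):length(samples)]))
}

# Toy autocorrelated chain standing in for real sampler output (assumption)
set.seed(1)
chain <- 0.5 + as.numeric(arima.sim(model = list(ar = 0.9), n = 1000))
post_mean <- plot_trace(chain, nburn = 500)
```

A chain that has converged should fluctuate around a stable level after the dashed line, with the running mean flattening out.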


• Move heavy lifting from the Rmd to an sbatch script for compute-node processing: save output to an .Rds file, then perform the analysis in workflowr

• Upload stepwise-f.html with working ggplot figures when complete

• Run gbayesC with nit = 10k and nburn = 5k to ensure convergence

• Use qgg::plotBayes with what = "trace" to generate trace plots of MCMC convergence

• Follow up with Dr. Starr-Moss Tuesday morning through email

• nextflow/snakemake for long-term parallelization workflow
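
The split between the compute-node script and the Rmd can be sketched as a save/load round trip. The file path and the `fit` object are placeholders for illustration, not the actual model output.

```r
set.seed(2)

# On the compute node (script submitted through sbatch): run the heavy step, save the result.
# `fit` is a placeholder list standing in for real model output.
out_file <- file.path(tempdir(), "fit-example.Rds")  # hypothetical path
fit <- list(b = rnorm(10), nit = 10000, nburn = 5000)
saveRDS(fit, out_file)

# In the Rmd: load the saved object and analyze it without refitting
fit_loaded <- readRDS(out_file)
str(fit_loaded)
```

Keeping the fit out of the Rmd means the page knits quickly and the expensive step only reruns when the sbatch job is resubmitted.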


• Fit four models to real expression data for both sexes

• Email Vijay/Maria about secretariat access or use OoD to rename files

• Email Dr. Starr-Moss (cc) about admission to graduate program in the Fall

• Varbvs uses a spike-slab prior, glmnet uses LASSO variable selection


• Continue implementing models to generated expression data



• Restructure loop to use the same validate set for test and train IDs for each iteration

• Correlate y_test to y_obs to measure accuracy
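
A minimal sketch of the restructured loop, assuming `validate` is a matrix with one column of test IDs per iteration and the training IDs are the complement. The data are simulated and `lm` stands in for the real model, so only the loop structure carries over.

```r
set.seed(42)
n <- 200; p <- 5; n_iter <- 10; n_test <- 40

# Simulated predictors and phenotype (toy data, not DGRP data)
X <- matrix(rnorm(n * p), n, p)
y <- as.numeric(X %*% rnorm(p) + rnorm(n))
dat <- data.frame(y = y, X)

# One column of test IDs per iteration, drawn once and reused for every model
validate <- replicate(n_iter, sample(n, n_test))

accuracy <- numeric(n_iter)
for (i in seq_len(n_iter)) {
  test_id <- validate[, i]
  train_id <- setdiff(seq_len(n), test_id)
  fit <- lm(y ~ ., data = dat[train_id, ])
  y_test <- predict(fit, newdata = dat[test_id, ])  # predicted phenotype
  y_obs <- dat$y[test_id]                           # observed phenotype
  accuracy[i] <- cor(y_test, y_obs)
}
summary(accuracy)
```

Because `validate` is built once outside the loop, every method compared later sees identical train/test splits.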


• Simulate data and fit the BayesC model to toy data in preparation for secretariat reactivation.

• Look through gbayes documentation for help.

• Change fMeans and mMeans data files to exp_f and exp_m when secretariat reopens - tweak older pages accordingly.

• Next meeting will be over Zoom time/day TBD.


• Split pages into association and prediction.

• Male data was appropriate; finish details on prediction with final data.

• Only the cross-validation should be done with a for loop; greml still runs for each fit.

• Perform custom cv on a toy data set as secretariat is down until around Feb. 3rd.

• Review GBLUP, Gaussian prior, and spike-slab prior from Whole-Genome Regression paper

• Review GBLUP from Leveraging Data to Predict Complex Traits paper.


• Upload rds/Rdata file of y_adj, mu, TRM, and validate for further review.

• Perform cross validation using own code: generate TRM with both model and validation set.

• Use fitG and fitB for transcriptomic predictors and intercepts respectively.

• Summary statistics of accuracy: compare predicted phenotype to observed.

• Address qqman not taking input/arguments.

• Next meeting at 2:00 pm on Monday the 23rd.
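
One common way to build a transcriptomic relationship matrix is to standardize each gene and average the cross-products over genes. The sketch below assumes that form and uses a simulated expression matrix; the object names are illustrative, and the real analysis may construct the TRM differently.

```r
set.seed(7)
n_lines <- 20; n_genes <- 100

# Toy expression matrix: rows are lines, columns are genes
W <- matrix(rnorm(n_lines * n_genes), n_lines, n_genes,
            dimnames = list(paste0("line", 1:n_lines), paste0("gene", 1:n_genes)))

# Center and scale each gene, then average cross-products over genes
W_std <- scale(W)
TRM <- tcrossprod(W_std) / ncol(W_std)

# Hold out some lines for validation; the TRM still covers all lines
validate_id <- sample(rownames(W), 5)
train_id <- setdiff(rownames(W), validate_id)
TRM[validate_id, train_id][1:2, 1:3]
```

Building the matrix from both the model and validation lines keeps the validation-by-training block available for prediction.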


• Adjust phenotypes for Wolbachia infection status and inversions for Linear Regression

• Use validate to perform partitioned prediction on data using Pearson’s correlation coefficient ($corr) from qqman methods


• Plan to review progress each Monday at 11:00 am

• Troubleshoot stat from qqman not taking input/arguments -> determine issue
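
Adjusting a phenotype for covariates is often done by regressing it on them and keeping the residuals. The sketch below assumes that approach, with simulated data and a made-up coding for Wolbachia status and a single inversion.

```r
set.seed(3)
n <- 100

# Toy covariates (assumption: 0/1 Wolbachia status, one inversion coded 0/1/2)
wolbachia <- rbinom(n, 1, 0.5)
inversion <- sample(0:2, n, replace = TRUE)
y <- 50 + 3 * wolbachia - 2 * inversion + rnorm(n)

# Regress the phenotype on the covariates and keep the residuals
adj_fit <- lm(y ~ factor(wolbachia) + factor(inversion))
y_adj <- resid(adj_fit)

# Residuals are uncorrelated with the covariates by construction
cor(y_adj, wolbachia)
```

The adjusted vector `y_adj` can then replace the raw phenotype in the per-gene regressions.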


• Fit data to model using starvation resistance vector

• Reorganize site to hyperlinks from main index rather than new pages every week

• Next meeting will be January 10th at CHG

• Merry Christmas!


• Fix the incorrect x-axis label for the uniform distribution.

• Inflection in qq plot indicates inflation of p-values.

• Address this by using mixed model of fixed effect and random effect.

• Sample data will be sent later on how to perform this with simulated data.

• Send shift schedule to plan next week’s meeting.


• Fix correlations: lm not reading in data properly, p-values wrong

• Add -log10 scale on qqplot, not just Manhattan plot

• qqman package for both qqplot and Manhattan plot

• Next meeting Wednesday, Dec. 14th 2:30-3:00 via Zoom due to exams.


• Use gene expression data to perform simple regression with starvation resistance.

• Run summary(lm(y~x)) for starvation resistance against gene expression for each gene in both sexes.

• p-value of slope can be accessed from summary()[[4]][8]

• Create a QQ plot on the -log10 scale comparing the observed p-values against a uniform distribution.

• Determine why this statistical approach is inappropriate.
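
The per-gene regression and QQ plot steps above can be sketched as follows. The data are simulated with no true signal, and the object names are illustrative rather than those used in the analysis.

```r
set.seed(10)
n_lines <- 50; n_genes <- 200

# Toy data: expression matrix (lines x genes) and a phenotype with no true signal
expr <- matrix(rnorm(n_lines * n_genes), n_lines, n_genes)
y <- rnorm(n_lines)

# One simple regression per gene; [[4]] is the coefficient matrix and
# element 8 (column-major order) is the slope p-value
pvals <- vapply(seq_len(n_genes), function(j) {
  summary(lm(y ~ expr[, j]))[[4]][8]
}, numeric(1))

# Same value looked up by name, which is easier to read
fit1 <- summary(lm(y ~ expr[, 1]))
p_named <- unname(fit1$coefficients[2, "Pr(>|t|)"])

# QQ plot on the -log10 scale: expected uniform quantiles vs observed p-values
expected <- -log10(ppoints(n_genes))
observed <- -log10(sort(pvals))
plot(expected, observed, xlab = "Expected -log10(p)", ylab = "Observed -log10(p)")
abline(0, 1, col = "red")
```

Points that rise above the red line indicate more small p-values than a uniform distribution would produce.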


• Multiple comparison scatter plots should be sex specific.

• PCA should be performed with genotype data, not gene expression.

o Calculate mean expression and variance for gene expression.

o Scatter plot of overlapping genes between males and females.

o Count # of genes expressed in males, females, and both.
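
A base R sketch of the three expression summaries above, using small simulated matrices. The gene names and dimensions are made up for illustration.

```r
set.seed(5)
genes_f <- paste0("gene", 1:8)
genes_m <- paste0("gene", 4:12)

# Toy expression matrices: rows are lines, columns are genes
expr_f <- matrix(rnorm(10 * 8), 10, 8, dimnames = list(NULL, genes_f))
expr_m <- matrix(rnorm(10 * 9), 10, 9, dimnames = list(NULL, genes_m))

# Per-gene mean and variance across lines
mean_f <- colMeans(expr_f)
var_f <- apply(expr_f, 2, var)
mean_m <- colMeans(expr_m)

# Genes measured in both sexes, and counts for each category
shared <- intersect(genes_f, genes_m)
counts <- c(females = length(genes_f), males = length(genes_m), both = length(shared))
counts

# Scatter plot of mean expression for the overlapping genes
plot(mean_f[shared], mean_m[shared],
     xlab = "Female mean expression", ylab = "Male mean expression")
```

`colMeans` is vectorized and is usually much faster than looping over columns, which matters for matrices with thousands of genes.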


• Continue multiple comparisons with trendlines, multiple tips:

o cor.test() for summary statistics

o List notation to store ggplots

o Use print() with ggplot, cowplot for grid arrangement

• Try to find an efficient method for column means of gene expression

• Principal Component analysis – begin brainstorming

• Discuss further PhD plans mid-February
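
The cor.test() and list-storage tips above can be combined in one pattern: run the test for each trait, keep the results in a named list, then pull out the summary statistics. The trait names and data below are hypothetical.

```r
set.seed(8)
n <- 60

# Toy trait table (hypothetical trait names)
traits <- data.frame(starvation = rnorm(n))
traits$trait_a <- 0.6 * traits$starvation + rnorm(n)
traits$trait_b <- rnorm(n)

# Store one cor.test result per trait in a named list
others <- setdiff(names(traits), "starvation")
tests <- lapply(others, function(tr) cor.test(traits$starvation, traits[[tr]]))
names(tests) <- others

# Pull the summary statistics into a small table
cor_table <- data.frame(
  trait = others,
  r = vapply(tests, function(x) unname(x$estimate), numeric(1)),
  p = vapply(tests, function(x) x$p.value, numeric(1))
)
cor_table
```

The same list pattern can hold ggplot objects, which need an explicit print() when drawn inside a loop.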


• Presented starvation analysis of DGRP lines

o Include “code_folding: hide” in yaml headers for Rmd

o Include variance in results

• Multiple comparison of starvation against all other traits

• Perform analysis of gene expression by line


• Get creative with analysis – scatterplot trendline, normality, beyond

• Run workflowr from the command line, develop website

• Base repositories in /data/morgante_lab/nklimko instead of home drive

• Next meeting Friday the 11th at 10am by Zoom due to election


• Set github ssh keys and config settings on personal laptop and secretariat

• Recap of workflowr and walkthrough of model layout

• Introductory project to analyze starvation resistance trait in male and female lines on computational node in secretariat.

• Postdoctoral candidate meeting and seminar on Wed 10/27 – provide feedback on both candidates


• Presented SNP Prediction Paper on Plant and Animal breeding

• Begin working with GitHub and workflowr to start data processing in upcoming weeks


• Create simple presentation for Whole-Genome Regression, de los Campos by 10/11 - build habit of summarizing papers/distilling main points

• Class selection: Adv. Biochem, Seminar, Intro to Quant Gen, and Regression + Least Squares

o Consult on further class selection semester basis

• Talk with Dr. Starr-Moss regarding credit hours for master’s research – three recommended


• Review of presentation on Leveraging Data to Predict Complex Traits

o Transcriptome data higher accuracy than SNP variation

o External data boosts prediction accuracy of transcriptome alone (TBLUP vs. GO-TBLUP)

o Redundancy of genome and transcriptome – additive vs non-additive

• Complete final Dataquest subunit by Friday

• Review Whole-Genome Regression Paper


• Goal to complete R modules on Dataquest by 9/30: Dataquest is paid software

• Read two papers focused on DGRP creation and usage

• Begin reading additional paper for class presentation

• Meeting time shift to 2:30pm for travel considerations