Last updated: 2023-03-05

• Reinstall R packages and rerun method comparison

• Use plotBayes source code to create custom plot function for trace plots

• Assess convergence using custom function plots

• Rename stepwise to something useful and descriptive

• Set partition to fm-bigmem-1 for sbatch scripts


• Move heavy lifting from Rmd to an sbatch script for compute-node processing; save output to an .Rds file and perform analysis in workflowR

• Upload stepwise-f.html with working ggplot figures when complete

• Run gbayesC with nit = 10k and nburn = 5k to ensure convergence

• Use qgg::plotBayes with what = "trace" to generate trace plots of MCMC convergence

• Follow-up with Dr. Starr-Moss Tuesday morning through email

• nextflow/snakemake for long-term parallelization workflow
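
A minimal sketch of the "fit on the compute node, analyze in workflowR" item above, assuming base R only. The stand-in fit, the object names, and the output path are hypothetical; the real job would run the actual model code inside the script called by sbatch.

```r
# Hypothetical compute-node script (e.g. run with Rscript from an sbatch job).
set.seed(1)

# Stand-in for an expensive model fit; replace with the real model call.
fit <- list(estimate = mean(rnorm(1000)), n_iter = 1000L)

# Hypothetical output path; tempdir() keeps this sketch self-contained.
out_path <- file.path(tempdir(), "fit_results.Rds")
saveRDS(fit, out_path)

# Later, inside the workflowR Rmd: load the saved object instead of refitting.
fit_loaded <- readRDS(out_path)
identical(fit, fit_loaded)
```

Saving a single object with saveRDS() keeps the Rmd quick to knit, since only readRDS() runs at build time.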


• Fit four models to real expression data for both sexes

• Email Vijay/Maria about secretariat access or use OoD to rename files

• Email Dr. Starr-Moss (cc) about admission to graduate program in the Fall

• varbvs uses a spike-and-slab prior; glmnet uses LASSO variable selection


• Continue fitting models to generated expression data



• Restructure loop to use the same validate set for test and train IDs for each iteration

• Correlate y_test to y_obs to measure accuracy
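
A rough sketch of the restructured loop described above, assuming simulated data and using lm() purely as a stand-in for the real models. Here y_test is taken to mean the predicted values for the held-out IDs and y_obs the observed ones; the data, dimensions, and the layout of validate (one column of test IDs per iteration) are assumptions for illustration.

```r
set.seed(42)
n <- 100   # number of lines (simulated)
p <- 5     # number of predictors (simulated)
X <- matrix(rnorm(n * p), n, p)
y <- as.vector(X %*% rnorm(p) + rnorm(n))
dat <- data.frame(y = y, X)

n_rep  <- 10   # cross-validation iterations
n_test <- 20   # held-out IDs per iteration

# Draw the test IDs once, so every model can reuse the same splits.
validate <- replicate(n_rep, sample(n, n_test))

acc <- numeric(n_rep)
for (i in seq_len(n_rep)) {
  test_id  <- validate[, i]
  train_id <- setdiff(seq_len(n), test_id)   # train IDs derived from the same column
  fit      <- lm(y ~ ., data = dat[train_id, ])
  y_test   <- predict(fit, newdata = dat[test_id, ])   # predicted
  y_obs    <- dat$y[test_id]                           # observed
  acc[i]   <- cor(y_test, y_obs)                       # Pearson correlation
}
summary(acc)
```

Deriving train_id from the same validate column guarantees the two sets never overlap within an iteration.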


• Simulate data and fit the BayesC model to toy data in preparation for secretariat reactivation.

• Look through gbayes documentation for help.

• Change fMeans and mMeans data files to exp_f and exp_m when secretariat reopens - tweak older pages accordingly.

• Next meeting will be over Zoom time/day TBD.


• Split pages into association and prediction.

• Male data was appropriate; finish details on prediction with final data.

• Only the cross-validation should be done with a for loop; greml still runs for each fit.

• Perform custom cv on a toy data set as secretariat is down until around Feb. 3rd.

• Review GBLUP, Gaussian prior, and spike-slab prior from Whole-Genome Regression paper

• Review GBLUP from Leveraging Data to Predict Complex Traits paper.


• Upload rds/Rdata file of y_adj, mu, TRM, and validate for further review.

• Perform cross validation using own code: generate TRM with both model and validation set.

• Use fitG and fitB for transcriptomic predictors and intercepts respectively.

• Summary statistics of accuracy: compare predicted phenotype to observed.

• Address qqman not taking input/arguments.

• Next meeting at 2:00pm on Monday the 23rd.


• Adjust phenotypes for Wolbachia infection status and inversions for Linear Regression

• Use validate to perform partitioned prediction on data using Pearson’s correlation coefficient ($corr) from qqman methods

• Troubleshoot stat from qqman not taking input/arguments -> determine issue

• Plan to review progress each Monday at 11:00 am



• Fit data to model using starvation resistance vector

• Reorganize site to use hyperlinks from the main index rather than new pages every week

• Next meeting will be January 10th at CHG

• Merry Christmas!


• Fix the incorrect x-axis label for the uniform distribution.

• Inflection in qq plot indicates inflation of p-values.

• Address this by using mixed model of fixed effect and random effect.

• Sample data will be sent later on how to perform this with simulated data.

• Send shift schedule to plan next week’s meeting.


• Fix correlations: lm not reading in data properly, p-values wrong

• Add -log10 scale on qqplot, not just Manhattan plot

• qqman package for both qqplot and Manhattan plot

• Next meeting Wednesday, Dec. 14th 2:30-3:00 via Zoom due to exams.


• Use gene expression data to perform simple regression with starvation resistance.

• Run summary(lm(y~x)) for starvation resistance against gene expression for each gene in both sexes.

• p-value of slope can be accessed from summary()[[4]][8]

• Create a -log10 scale QQ-plot of the expected uniform distribution versus the observed p-values.

• Determine why this statistical approach is inappropriate.
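
A small sketch of the per-gene regression and QQ-plot steps above, assuming simulated expression and phenotype data (the matrix layout, with rows as lines and columns as genes, is an assumption). For a simple regression the coefficient table is 2 x 4, so element 8 is the slope's p-value.

```r
set.seed(123)
n_lines <- 50
n_genes <- 200
expr  <- matrix(rnorm(n_lines * n_genes), n_lines, n_genes)  # rows = lines, cols = genes
starv <- rnorm(n_lines)                                      # simulated starvation resistance

# Slope p-value for each gene: summary()[[4]] is the coefficient table,
# and element 8 of that 2 x 4 table is Pr(>|t|) for the slope.
pvals <- apply(expr, 2, function(x) summary(lm(starv ~ x))[[4]][8])

# Cross-check the index against the named coefficient table for the first gene.
fit1 <- summary(lm(starv ~ expr[, 1]))
all.equal(unname(pvals[1]), unname(coef(fit1)[2, 4]))

# QQ plot on the -log10 scale: expected uniform quantiles vs observed p-values.
observed <- -log10(sort(pvals))
expected <- -log10(ppoints(n_genes))
plot(expected, observed,
     xlab = "Expected -log10(p)", ylab = "Observed -log10(p)")
abline(0, 1, col = "red")   # points above this line suggest inflated p-values
```

Indexing by position works here, but coef(summary(fit))[2, 4] is less fragile if the model ever gains another term.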


• Multiple comparison scatter plots should be sex specific.

• PCA should be performed with genotype data, not gene expression.

o Calculate mean expression and variance for gene expression.

o Scatter plot of overlapping genes between males and females.

o Count # of genes expressed in males, females, and both.


• Continue multiple comparisons with trendlines, multiple tips:

o cor.test() for summary statistics

o List notation to store ggplots

o Use print() with ggplot, cowplot for grid arrangement

• Try to find an efficient method for column means of gene expression

• Principal Component analysis – begin brainstorming

• Discuss further PhD plans mid-February
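
A brief sketch for the column-means and cor.test() tips above, assuming a simulated expression matrix and trait vector (all names here are hypothetical). The same list notation used for the test results also works for storing ggplot objects.

```r
set.seed(7)
expr <- matrix(rnorm(40 * 6), nrow = 40,
               dimnames = list(NULL, paste0("gene", 1:6)))
trait <- rnorm(40)

# colMeans() is vectorised and usually much faster than apply(expr, 2, mean).
gene_means <- colMeans(expr)
gene_vars  <- apply(expr, 2, var)

# Store one cor.test() result per gene in a named list, indexed with [[ ]].
tests <- vector("list", ncol(expr))
names(tests) <- colnames(expr)
for (g in colnames(expr)) {
  tests[[g]] <- cor.test(expr[, g], trait)
}

# Pull summary statistics back out of the list.
r_est <- sapply(tests, function(t) unname(t$estimate))
p_val <- sapply(tests, function(t) t$p.value)
```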


• Presented starvation analysis of DGRP lines

o Include “code_folding: hide” in yaml headers for Rmd

o Include variance in results

• Multiple comparison of starvation against all other traits

• Perform analysis of gene expression by line


• Get creative with analysis – scatterplot trendline, normality, beyond

• WorkflowR from cmd line, develop website

• Base repositories in /data/morgante_lab/nklimko instead of home drive

• Next meeting Friday the 11th at 10am by Zoom due to election


• Set github ssh keys and config settings on personal laptop and secretariat

• Recap of workflowR and walkthrough of model layout

• Introductory project to analyze starvation resistance trait in male and female lines on computational node in secretariat.

• Postdoctoral candidate meeting and seminar on Wed 10/27 – provide feedback on both candidates


• Presented SNP Prediction Paper on Plant and Animal breeding

• Begin working with github and workflowR to begin data processing in upcoming weeks


• Create simple presentation for Whole-Genome Regression, de los Campos by 10/11 - build habit of summarizing papers/distilling main points

• Class selection: Adv. Biochem, Seminar, Intro to Quant Gen, and Regression + Least Squares

o Consult on further class selection semester basis

• Talk with Dr. Starr-Moss regarding credit hours for master’s research – three recommended


• Review of Presentation for Leveraging Data to Predict Complex Traits

o Transcriptome data higher accuracy than SNP variation

o External data boosts prediction accuracy of transcriptome alone (TBLUP vs GO-TBLUP)

o Redundancy of genome and transcriptome – additive vs non-additive

• Complete final Dataquest subunit by Friday

• Review Whole-Genome Regression Paper


• Goal to complete R modules on Dataquest by 9/30: Dataquest is paid software

• Read two papers focused on DGRP creation and usage

• Begin reading additional paper for class presentation

• Meeting time shift to 2:30pm for travel considerations