Last updated: 2023-01-22
Knit directory: dgrp-starve/
This reproducible R Markdown analysis was created with workflowr (version 1.7.0). The Checks tab describes the reproducibility checks that were applied when the results were created. The Past versions tab lists the development history.
Great! Since the R Markdown file has been committed to the Git repository, you know the exact version of the code that produced these results.
Great! You are using Git for version control. Tracking code development and connecting the code version to the results is critical for reproducibility.
The results in this page were generated with repository version 709ca14. See the Past versions tab to see a history of the changes made to the R Markdown and HTML files.
Note that you need to be careful to ensure that all relevant files for
the analysis have been committed to Git prior to generating the results
(you can use wflow_publish or wflow_git_commit). workflowr only checks
the R Markdown file, but you know if there are other scripts or data
files that it depends on. Below is the status of the Git repository when
the results were generated:
Ignored files:
Ignored: .RData
Untracked files:
Untracked: analysis/gremlo.R
Untracked: analysis/linearReg.Rmd
Untracked: analysis/rewrite.Rmd
Untracked: code/aaaTest
Untracked: code/analysisSR.R
Untracked: code/geneGO.R
Untracked: code/multiPrep.R
Untracked: code/regress.81916.err
Untracked: code/regress.81916.out
Untracked: code/regress.81918.err
Untracked: code/regress.81918.out
Untracked: code/regress.R
Untracked: code/regress.sbatch
Untracked: code/regressF.81919.err
Untracked: code/regressF.81919.out
Untracked: code/regressF.R
Untracked: code/regressF.sbatch
Untracked: code/regress_f_adj.109973.err
Untracked: code/regress_f_adj.109973.out
Untracked: code/regress_f_adj.109974.err
Untracked: code/regress_f_adj.109974.out
Untracked: code/regress_f_adj.R
Untracked: code/regress_f_adj.sbatch
Untracked: code/regress_m_adj.109971.err
Untracked: code/regress_m_adj.109971.out
Untracked: code/regress_m_adj.109972.err
Untracked: code/regress_m_adj.109972.out
Untracked: code/regress_m_adj.R
Untracked: code/regress_m_adj.sbatch
Untracked: code/snpGene.77509.err
Untracked: code/snpGene.77509.out
Untracked: code/snpGene.77515.err
Untracked: code/snpGene.77515.out
Untracked: code/snpGene.sbatch
Untracked: data/eQTL_traits_females.csv
Untracked: data/eQTL_traits_males.csv
Untracked: data/fMeans.txt
Untracked: data/fRegress.txt
Untracked: data/fRegress_adj.txt
Untracked: data/f_adj.txt
Untracked: data/goGroups.txt
Untracked: data/mMeans.txt
Untracked: data/mPart.txt
Untracked: data/mRegress.txt
Untracked: data/mRegress_adj.txt
Untracked: data/m_adj.txt
Untracked: data/multiReg.rData
Untracked: data/starve-f.txt
Untracked: data/starve-m.txt
Untracked: data/xp-f.txt
Untracked: data/xp-m.txt
Untracked: data/y_save.txt
Untracked: figure/
Untracked: lmm.R
Untracked: qqdum.R
Untracked: scoreAnalysisMulticomp.R
Untracked: temp.Rmd
Unstaged changes:
Deleted: analysis/database.Rmd
Modified: analysis/index.Rmd
Modified: analysis/linReg.Rmd
Modified: analysis/multiComp.Rmd
Modified: analysis/multiReg.Rmd
Deleted: analysis/scripts.Rmd
Modified: code/baseScript-lineComp.R
Modified: code/fourLinePrep.R
Note that any generated files, e.g. HTML, png, CSS, etc., are not included in this status report because it is ok for generated content to have uncommitted changes.
These are the previous versions of the repository in which changes were
made to the R Markdown (analysis/recap.Rmd) and HTML (docs/recap.html)
files. If you’ve configured a remote Git repository (see ?wflow_git_remote),
click on the hyperlinks in the table below to view the files as they were
in that past version.
File | Version | Author | Date | Message |
---|---|---|---|---|
Rmd | 709ca14 | nklimko | 2023-01-22 | wflow_publish("analysis/recap.Rmd") |
html | e4cf8cb | nklimko | 2023-01-10 | Build site. |
Rmd | c826636 | nklimko | 2023-01-10 | wflow_publish("analysis/recap.Rmd") |
html | bf1900e | nklimko | 2023-01-10 | Build site. |
Rmd | a05fc48 | nklimko | 2023-01-10 | wflow_publish("analysis/recap.Rmd") |
• Upload rds/Rdata file of y_adj, mu, TRM, and validate for further review.
• Perform cross validation using own code: generate TRM with both model and validation set.
• Use fitG and fitB for transcriptomic predictors and intercepts respectively.
• Summary statistics of accuracy: compare predicted phenotype to observed.
• Address qqman not taking input/arguments.
• Next meeting: 2:00 pm on Monday the 23rd.
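The cross-validation items above can be sketched in base R. This is a minimal sketch under stated assumptions, not the lab's pipeline: the data are simulated, the TRM is taken to be W W' / p with W the centered and scaled expression matrix, and the variance ratio lambda is fixed by hand rather than estimated. The fitG and fitB objects mentioned in the notes are not used here because the notes do not say how they are produced.

```r
set.seed(3)
n <- 100   # hypothetical number of lines
p <- 300   # hypothetical number of genes

# Simulated stand-in for expression data: lines in rows, genes in columns
W <- scale(matrix(rnorm(n * p), nrow = n, ncol = p))

# TRM built once from all lines, so it covers both model and validation sets
TRM <- tcrossprod(W) / p

# Simulated phenotype: intercept + transcriptomic effect + noise
u <- as.vector(W %*% rnorm(p, sd = sqrt(0.5 / p)))
y <- 10 + u + rnorm(n, sd = sqrt(0.5))

lambda  <- 1                                  # assumed residual / transcriptomic variance ratio
n_folds <- 5
folds   <- sample(rep(seq_len(n_folds), length.out = n))
acc     <- numeric(n_folds)

for (k in seq_len(n_folds)) {
  val <- which(folds == k)                    # validation lines
  trn <- which(folds != k)                    # model (training) lines
  mu  <- mean(y[trn])                         # intercept estimated from training lines only
  alpha <- solve(TRM[trn, trn] + lambda * diag(length(trn)), y[trn] - mu)
  y_hat <- mu + as.vector(TRM[val, trn] %*% alpha)
  acc[k] <- cor(y_hat, y[val])                # Pearson correlation, predicted vs observed
}

summary(acc)
```

Computing the TRM before splitting keeps the training and validation blocks on the same scale; only the phenotypes of the validation lines are hidden from the fit.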
• Adjust phenotypes for Wolbachia infection status and inversions for Linear Regression
• Use validate to perform partitioned prediction on data using Pearson’s correlation coefficient ($corr) from qqman methods
• Troubleshoot stat from qqman not taking input/arguments -> determine issue
• Plan to review progress each Monday at 11:00 am
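One common way to adjust a phenotype for covariates is to regress it on them and keep the residuals. The sketch below assumes that approach; the notes do not specify the method. The column names (wolbachia, inv_a, inv_b) and all data are hypothetical stand-ins.

```r
set.seed(2)
n <- 60   # hypothetical number of lines

# Simulated covariates: infection status and two hypothetical inversions
pheno <- data.frame(
  wolbachia = factor(sample(c("n", "y"), n, replace = TRUE)),
  inv_a     = factor(sample(c("ST", "INV"), n, replace = TRUE)),
  inv_b     = factor(sample(c("ST", "INV"), n, replace = TRUE))
)

# Simulated phenotype with a Wolbachia effect
pheno$y <- 50 + 3 * (pheno$wolbachia == "y") + rnorm(n)

# Regress the phenotype on the covariates and keep the residuals
fit <- lm(y ~ wolbachia + inv_a + inv_b, data = pheno)
pheno$y_adj <- resid(fit)

head(pheno)
```

The residuals are centered on zero; if the original scale matters, the intercept can be added back with coef(fit)[1].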
• Troubleshoot stat from qqman not taking input/arguments -> determine issue
• Fit data to model using starvation resistance vector
• Reorganize site to hyperlinks from main index rather than new pages every week
• Next meeting will be January 10th at CHG
• Merry Christmas!
• Fix the incorrect x-axis label for the uniform distribution.
• Inflection in qq plot indicates inflation of p-values.
• Address this by using mixed model of fixed effect and random effect.
• Sample data will be sent later on how to perform this with simulated data.
• Send shift schedule to plan next week’s meeting.
• Fix correlations: lm not reading in data properly, p-values wrong
• Add -log10 scale on qqplot, not just Manhattan plot
• qqman package for both qqplot and Manhattan plot
• Next meeting Wednesday, Dec. 14th 2:30-3:00 via Zoom due to exams.
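As a rough illustration of the qqman calls mentioned above, the sketch below uses simulated results with hypothetical column values. qqman's manhattan() expects numeric chromosome, position, and p-value columns, so chromosome labels such as arm names would need to be recoded to numbers first. The plotting calls are guarded so the block still runs if qqman is not installed.

```r
set.seed(6)

# Simulated results table using qqman's default column names
res <- data.frame(
  SNP = paste0("gene", 1:500),
  CHR = rep(1:5, each = 100),                    # must be numeric for manhattan()
  BP  = rep(seq(1e4, 1e6, length.out = 100), 5), # position within chromosome
  P   = runif(500)                               # p-values under the null
)

if (requireNamespace("qqman", quietly = TRUE)) {
  # QQ plot of observed vs expected p-values, drawn on the -log10 scale
  qqman::qq(res$P)
  # Manhattan plot; column names are passed as strings
  qqman::manhattan(res, chr = "CHR", bp = "BP", p = "P", snp = "SNP")
}
```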
• Use gene expression data to perform simple regression with starvation resistance.
• Run summary(lm(y~x)) for starvation resistance against gene expression for each gene in both sexes.
• p-value of slope can be accessed from summary()[[4]][8]
• Create a -log10 scale QQ-plot for a uniform distribution versus p-values.
• Determine why this statistical approach is inappropriate.
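The regression and QQ-plot steps above might look like the following in base R. This is a sketch on simulated data; the object names (y, expr, slope_p) are illustrative and not taken from the project code.

```r
set.seed(1)
n_lines <- 50    # hypothetical number of lines
n_genes <- 200   # hypothetical number of genes

# Simulated stand-ins: y is the phenotype, expr is a lines x genes matrix
y <- rnorm(n_lines)
expr <- matrix(rnorm(n_lines * n_genes), nrow = n_lines,
               dimnames = list(NULL, paste0("gene", seq_len(n_genes))))

# Slope p-value for one gene: row 2 is the slope, column 4 is Pr(>|t|)
slope_p <- function(x, y) {
  coefs <- summary(lm(y ~ x))$coefficients
  coefs[2, 4]   # same element as summary(lm(y ~ x))[[4]][8]
}

# One regression per gene (column of expr)
pvals <- apply(expr, 2, slope_p, y = y)

# QQ plot on the -log10 scale against uniform expected p-values
observed <- -log10(sort(pvals))
expected <- -log10(ppoints(length(pvals)))
plot(expected, observed,
     xlab = "Expected -log10(p), uniform",
     ylab = "Observed -log10(p)")
abline(0, 1, col = "red")   # points on this line match the null expectation
```

The index [[4]][8] works because the coefficient table is a 2 x 4 matrix stored column by column, so its eighth element is the slope's p-value. Indexing by row and column is less fragile if the model gains more terms.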
• Multiple comparison scatter plots should be sex specific.
• PCA should be performed with genotype data, not gene expression.
o Calculate mean expression and variance for gene expression.
o Scatter plot of overlapping genes between males and females.
o Count # of genes expressed in males, females, and both.
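A base R sketch of the summaries listed above, using simulated matrices in place of the real expression and genotype data. The gene names, matrix sizes, and 0/2 genotype coding are assumptions made for illustration.

```r
set.seed(4)

# Simulated expression matrices (lines in rows, genes in columns);
# the two sexes share only part of their gene sets
genes_f <- paste0("g", 1:80)
genes_m <- paste0("g", 41:130)
xp_f <- matrix(rnorm(20 * 80, mean = 8), nrow = 20, dimnames = list(NULL, genes_f))
xp_m <- matrix(rnorm(20 * 90, mean = 8), nrow = 20, dimnames = list(NULL, genes_m))

# Per-gene mean and variance; colMeans() is a fast built-in for column means
mean_f <- colMeans(xp_f)
mean_m <- colMeans(xp_m)
var_f  <- apply(xp_f, 2, var)
var_m  <- apply(xp_m, 2, var)

# Genes expressed in females only, males only, and both
both <- intersect(colnames(xp_f), colnames(xp_m))
counts <- c(
  female_only = length(setdiff(colnames(xp_f), colnames(xp_m))),
  male_only   = length(setdiff(colnames(xp_m), colnames(xp_f))),
  both        = length(both)
)
print(counts)

# Scatter plot of mean expression for the overlapping genes
plot(mean_f[both], mean_m[both],
     xlab = "Mean expression, females", ylab = "Mean expression, males")

# PCA on genotype data (simulated, coded 0/2), not on expression
geno <- matrix(sample(c(0, 2), 20 * 50, replace = TRUE), nrow = 20)
pca  <- prcomp(geno, center = TRUE, scale. = FALSE)
var_explained <- pca$sdev^2 / sum(pca$sdev^2)
```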
• Continue multiple comparisons with trendlines, multiple tips:
o cor.test() for summary statistics
o List notation to store ggplots
o Use print() with ggplot, cowplot for grid arrangement
• Try to find an efficient method for column means of gene expression
• Principal Component analysis – begin brainstorming
• Discuss further PhD plans mid-February
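The plotting tips above can be combined as in the sketch below. The trait names and data are simulated placeholders. The plotting part is guarded because it needs the ggplot2 and cowplot packages; the cor.test() summaries use base R only.

```r
set.seed(5)

# Simulated traits; trait_a is built to correlate with starvation
traits <- data.frame(starvation = rnorm(40))
traits$trait_a <- 0.6 * traits$starvation + rnorm(40)
traits$trait_b <- rnorm(40)

others <- setdiff(names(traits), "starvation")

# cor.test() for each trait against starvation, collected into one table
stats <- lapply(others, function(tr) cor.test(traits$starvation, traits[[tr]]))
names(stats) <- others
summary_tab <- data.frame(
  trait = others,
  r     = sapply(stats, function(s) unname(s$estimate)),
  p     = sapply(stats, function(s) s$p.value)
)
print(summary_tab)

if (requireNamespace("ggplot2", quietly = TRUE) &&
    requireNamespace("cowplot", quietly = TRUE)) {
  library(ggplot2)

  # Build each plot inside a function so every plot keeps its own trait name
  make_plot <- function(tr) {
    ggplot(traits, aes(x = starvation, y = .data[[tr]])) +
      geom_point() +
      geom_smooth(method = "lm", formula = y ~ x) +   # linear trendline
      labs(y = tr)
  }

  # List notation: plots are stored by trait name
  plots <- lapply(others, make_plot)
  names(plots) <- others

  # ggplot objects created in loops or functions need an explicit print()
  print(cowplot::plot_grid(plotlist = plots, ncol = 2))
}
```

Building the plots with lapply() rather than a for loop avoids a common pitfall where every stored plot ends up showing the last trait, because aes() is evaluated lazily.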
• Presented starvation analysis of DGRP lines
o Include “code_folding: hide” in yaml headers for Rmd
o Include variance in results
• Multiple comparison of starvation against all other traits
• Perform analysis of gene expression by line
• Get creative with analysis – scatterplot trendline, normality, beyond
• Run workflowr from the command line, develop website
• Base repositories in /data/morgante_lab/nklimko instead of home drive
• Next meeting Friday the 11th at 10am by Zoom due to election
• Set GitHub SSH keys and config settings on personal laptop and secretariat
• Recap of workflowr and walkthrough of model layout
• Introductory project to analyze starvation resistance trait in male and female lines on computational node in secretariat.
• Postdoctoral candidate meeting and seminar on Wed 10/27 – provide feedback on both candidates
• Presented SNP Prediction Paper on Plant and Animal breeding
• Begin working with GitHub and workflowr to start data processing in upcoming weeks
• Create simple presentation for https://www.genetics.org/content/genetics/193/2/327.full.pdf by 10/11 - build habit of summarizing papers/distilling main points
• Class selection: Adv. Biochem, Seminar, Intro to Quant Gen, and Regression + Least Squares
o Consult on further class selection semester basis
• Talk with Dr. Starr-Moss regarding credit hours for master’s research – three recommended
• Review of Presentation for https://academic.oup.com/g3journal/article/10/12/4599/6048688
o Transcriptome data higher accuracy than SNP variation
o External data boosts prediction accuracy of transcriptome alone (TBLUP vs GO-TBLUP)
o Redundancy of genome and transcriptome – additive vs non-additive
• Complete final Dataquest subunit by Friday
• Review https://www.genetics.org/content/genetics/193/2/327.full.pdf
• Goal to complete R modules on Dataquest by 9/30: Dataquest is paid software
• Read two papers focused on DGRP creation and usage
• Begin reading additional paper for class presentation
• Meeting time shift to 2:30pm for travel considerations