Last updated: 2024-01-16

Knit directory: dgrp-starve/

options(knitr.kable.NA = '')


The R2 constraints were removed from the GBLUP model so that each component of the model can explain as much variance as possible. The overall maximum is still 0.8.

Although this removes a degree of certainty, prediction accuracy improved substantially and the model recovered results similar to BayesC.

library(ggplot2)
library(viridis)

# Scatter plot of prediction accuracy (cor) per GO term for one sex.
# Terms above upperCutoff or below lowerCutoff are labelled; a horizontal
# reference line is drawn at nullInt.
partMake <- function(data, sex, nullInt, upperCutoff, lowerCutoff, psize, custom.title, custom.Xlab, custom.Ylab){
  plothole <- ggplot(data, aes(x=term, y=cor, label=term))+
    geom_point(color=viridis(1, begin=0.5), size=psize)+
    geom_text(aes(label=ifelse(cor>upperCutoff, as.character(term),'')), hjust=0, size=2, angle=0)+
    geom_text(aes(label=ifelse(cor<lowerCutoff, as.character(term),'')), hjust=1, size=2, angle=90)+
    geom_hline(yintercept = nullInt) +
    theme_minimal() +
    labs(x=custom.Xlab, y=custom.Ylab, tag=sex, title=custom.title) +
    theme(text=element_text(size=10), plot.tag = element_text(size=15))
  plothole
}


#Cutoff selection
sigFactor <- 3

sdf <- sd(unlist(allDataF[,2]))
meanf <- mean(unlist(allDataF[,2]))
cutoffF <- meanf + sigFactor*sdf

sdm <- sd(unlist(allDataM[,2]))
meanm <- mean(unlist(allDataM[,2]))
cutoffM <- meanm + sigFactor*sdm

# Collect the per-sex plots in a list
gg <- list()
gg[[1]] <- partMake(allDataF, 'F', 0.31, cutoffF, 0.2, 1, 'Effect of GO Annotations in TBLUP models', 'GO Term', 'Prediction Accuracy')

gg[[2]] <- partMake(allDataM, 'M', 0.43, cutoffM, 0.2, 1, 'Effect of GO Annotations in TBLUP models', 'GO Term', 'Prediction Accuracy')

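# Keep terms above the cutoff. Inside `[`, `cor` resolves to the accuracy column;
# this assumes allDataF and allDataM are data.tables.
subF <- allDataF[which(cor>cutoffF),]
subM <- allDataM[which(cor>cutoffM),]

# Sort by decreasing accuracy
subF <- subF[order(-cor),]
subM <- subM[order(-cor),]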

#write ordered GO terms to table file for enrichment purposes
cat(unlist(subF[,1]), sep = '\n', file='snake/data/go/50_tables/topHitsF.txt')
cat(unlist(subM[,1]), sep = '\n', file='snake/data/go/50_tables/topHitsM.txt')
cat(unlist(subF[,1]), sep = '\n', file='snake/code/go/enrichment/blup/f/topHitsF.txt')
cat(unlist(subM[,1]), sep = '\n', file='snake/code/go/enrichment/blup/m/topHitsM.txt')

topBlupSoloF <- readRDS('snake/code/go/enrichment/blup/f/finalData.Rds')
topBlupSoloM <- readRDS('snake/code/go/enrichment/blup/m/finalData.Rds')

Initial Findings

Comparing the top 20 terms from BayesC and TBLUP yields similar results: 11 of the top 20 terms match for females and 10 of the top 20 match for males.

This agreement suggests that both models detect a common set of GO terms of interest, even with the R2 constraints removed.
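The comparison tables below could be assembled along the following lines. This is only a sketch on made-up accuracies: the data frames `bayes` and `tblup`, the `TERM.xx` labels, and `topN` are hypothetical, and the column names are borrowed from the tables on this page. The original code that built `finF` and `finM` is not shown here and may differ.

```r
# Toy data: hypothetical accuracies for 30 made-up terms
set.seed(1)
terms <- sprintf('TERM.%02d', 1:30)
bayes <- data.frame(Term = terms, BayesC_Cor = runif(30, 0.30, 0.45))
tblup <- data.frame(Term = terms, TBLUP_Cor = bayes$BayesC_Cor + rnorm(30, 0, 0.02))

# Rank each model separately (rank 1 = highest accuracy)
bayes$BayesC_Rank <- rank(-bayes$BayesC_Cor, ties.method = 'first')
tblup$TBLUP_Rank  <- rank(-tblup$TBLUP_Cor, ties.method = 'first')

# Keep the terms that are in the top N of both models
topN <- 20
shared <- merge(bayes[bayes$BayesC_Rank <= topN, ],
                tblup[tblup$TBLUP_Rank <= topN, ],
                by = 'Term')

# Order by BayesC rank and arrange columns as in the comparison tables
shared <- shared[order(shared$BayesC_Rank),
                 c('BayesC_Cor', 'TBLUP_Cor', 'Term', 'BayesC_Rank', 'TBLUP_Rank')]
nrow(shared)  # number of terms shared between the two top-N lists
```

The number of rows in `shared` is the overlap reported above (11 for females, 10 for males in the real data).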


kable(finF, caption = 'Female BayesC/BLUP Comparison', "simple")
Female BayesC/BLUP Comparison

BayesC_Cor  TBLUP_Cor  Term        BayesC_Rank  TBLUP_Rank
----------  ---------  ----------  -----------  ----------
 0.4382497  0.4319931  GO.0045819            1           3
 0.4231229  0.3881740  GO.0033500            2          11
 0.4178735  0.4514600  GO.0055088            3           1
 0.4014416  0.4147260  GO.0042675            4           6
 0.3984704  0.4366905  GO.0008586            6           2
 0.3919625  0.3811383  GO.0016042            7          15
 0.3873729  0.3805084  GO.0007368            8          17
 0.3799043  0.3833308  GO.0006644           10          13
 0.3777651  0.3805090  GO.0046488           12          16
 0.3736580  0.4223885  GO.0017056           14           5
 0.3691317  0.3786734  GO.0061883           19          18


kable(finM, caption = 'Male BayesC/BLUP Comparison', "simple")
Male BayesC/BLUP Comparison

BayesC_Cor  TBLUP_Cor  Term        BayesC_Rank  TBLUP_Rank
----------  ---------  ----------  -----------  ----------
 0.5116379  0.5225415  GO.0035008            1           2
 0.5064311  0.5079819  GO.0140042            2           8
 0.5062906  0.5156219  GO.0007485            3           4
 0.4984272  0.5117790  GO.0042461            6           7
 0.4978650  0.5141982  GO.0016327            8           5
 0.4924000  0.5048001  GO.0006044           11          11
 0.4923077  0.5124686  GO.0040018           12           6
 0.4918994  0.5326720  GO.0042593           13           1
 0.4882313  0.5156778  GO.0001738           15           3
 0.4866070  0.4983168  GO.0045196           17          18

Overall Results

For GO-TBLUP, we kept the terms whose prediction accuracy was more than 3 standard deviations above the mean for each sex.

We then translated the top GO terms into human-readable categories to assess our findings. Below are the terms that passed the cutoff (17 for females, 15 for males), ordered by correlation.
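As a small worked illustration of the cutoff rule, the sketch below applies mean + 3 SD to made-up accuracies. The data frame `toy`, its `TERM.xxx` labels, and the planted high values are hypothetical; only `sigFactor <- 3` and the mean-plus-SD rule come from the code above.

```r
# Toy accuracies: 200 hypothetical terms near a baseline, plus three planted high values
set.seed(42)
toy <- data.frame(term = sprintf('TERM.%03d', 1:203),
                  cor  = c(rnorm(200, mean = 0.31, sd = 0.01), 0.40, 0.42, 0.45))

# Cutoff: mean plus sigFactor standard deviations
sigFactor <- 3
cutoff <- mean(toy$cor) + sigFactor * sd(toy$cor)

# Keep terms above the cutoff, sorted by decreasing accuracy
top <- toy[toy$cor > cutoff, ]
top <- top[order(-top$cor), ]
top
```

Because the standard deviation is computed over all terms, a few very strong terms raise the cutoff for everything else, which is one reason the number of retained terms can differ between sexes.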


All Data
plot_grid(gg[[1]], ncol=1)

Version Author Date
7327d20 nklimko 2024-01-16
Top Terms

id: GO:0055088 name: lipid homeostasis

id: GO:0008586 name: imaginal disc-derived wing vein morphogenesis

id: GO:0045819 name: positive regulation of glycogen catabolic process

id: GO:0043066 name: negative regulation of apoptotic process

id: GO:0017056 name: structural constituent of nuclear pore

id: GO:0042675 name: compound eye cone cell differentiation

id: GO:0035556 name: intracellular signal transduction

id: GO:0006606 name: protein import into nucleus

id: GO:0005524 name: ATP binding

id: GO:0000281 name: mitotic cytokinesis

id: GO:0033500 name: carbohydrate homeostasis

id: GO:0042749 name: regulation of circadian sleep/wake cycle

id: GO:0006644 name: phospholipid metabolic process

id: GO:0004672 name: protein kinase activity

id: GO:0016042 name: lipid catabolic process

id: GO:0046488 name: phosphatidylinositol metabolic process

id: GO:0007368 name: determination of left/right symmetry


All Data
plot_grid(gg[[2]], ncol=1)

Top Terms

id: GO:0042593 name: glucose homeostasis

id: GO:0035008 name: positive regulation of melanization defense response

id: GO:0001738 name: morphogenesis of a polarized epithelium

id: GO:0007485 name: imaginal disc-derived male genitalia development

id: GO:0016327 name: apicolateral plasma membrane

id: GO:0040018 name: positive regulation of multicellular organism growth

id: GO:0042461 name: photoreceptor cell development

id: GO:0140042 name: lipid droplet formation

id: GO:0030295 name: protein kinase activator activity

id: GO:0050830 name: defense response to Gram-positive bacterium

id: GO:0006044 name: N-acetylglucosamine metabolic process

id: GO:0070328 name: triglyceride homeostasis

id: GO:0045793 name: positive regulation of cell size

id: GO:0007166 name: cell surface receptor signaling pathway

id: GO:0007419 name: ventral cord development

Post Processing

Beyond this, we used the selected terms to determine whether certain genes recur across the GO terms of interest. We pooled the genes associated with the selected terms and counted how often each gene occurred.

In females, 81 unique genes appeared at least 3 times across the top terms. Of these, 7 appeared 4 or more times.

Males had far fewer genes than expected: only 7 genes appeared at least 3 times across the top terms. This may suggest that the selection criteria are too stringent for males, even though males have a higher base prediction accuracy.

After establishing the unique genes, we translated the FlyBase gene IDs to human-readable gene symbols.
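The pooling and counting step can be sketched as follows. The term-to-gene list `termGenes`, the `termA` to `termD` labels, `minCount`, and the resulting counts are hypothetical toy values; the FlyBase IDs and symbols are copied from the tables on this page. The actual enrichment code behind `finalData.Rds` is not shown and may work differently.

```r
# Hypothetical term-to-gene annotation; labels and memberships are made up
termGenes <- list(
  termA = c('FBgn0283499', 'FBgn0020386', 'FBgn0010379'),
  termB = c('FBgn0283499', 'FBgn0020386'),
  termC = c('FBgn0283499', 'FBgn0010379', 'FBgn0020386'),
  termD = c('FBgn0010379', 'FBgn0003731')
)

# Count in how many terms each gene occurs (each gene counted once per term)
counts <- table(unlist(lapply(termGenes, unique)))
hits <- data.frame(flybase = names(counts), count = as.integer(counts),
                   stringsAsFactors = FALSE)

# Keep genes seen at least minCount times
minCount <- 3
hits <- hits[hits$count >= minCount, ]

# Translate FlyBase IDs to gene symbols with a named lookup vector
symbols <- c(FBgn0283499 = 'InR', FBgn0020386 = 'Pdk1',
             FBgn0010379 = 'Akt1', FBgn0003731 = 'Egfr')
hits$gene <- unname(symbols[hits$flybase])

# Order by decreasing count, then by symbol
hits <- hits[order(-hits$count, hits$gene), ]
hits
```

In this toy input Egfr occurs in a single term and is dropped by the `minCount` filter, which mirrors how the tables below list only genes with a count of 3 or more.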


kable(topBlupSoloF, caption = 'GO-TBLUP Genes', "simple")
flybase count gene
FBgn0010379 5 Akt1
FBgn0003731 5 Egfr
FBgn0025595 4 AkhR
FBgn0262103 4 Sik3
FBgn0283499 4 InR
FBgn0020386 4 Pdk1
FBgn0028484 4 Ack
FBgn0004552 3 Akh
FBgn0035039 3 Adck
FBgn0261984 3 Ire1
FBgn0283472 3 S6k
FBgn0000575 3 emc
FBgn0004635 3 rho
FBgn0026252 3 msk
FBgn0035142 3 Hipk
FBgn0003169 3 put
FBgn0011300 3 babo
FBgn0260945 3 Atg1
FBgn0002413 3 dco
FBgn0032006 3 Pvr
FBgn0003091 3 Pkc53E
FBgn0003093 3 Pkc98E
FBgn0003256 3 rl
FBgn0003502 3 Btk29A
FBgn0003744 3 trc
FBgn0004784 3 inaC
FBgn0004864 3 hop
FBgn0010197 3 Gyc32E
FBgn0010441 3 pll
FBgn0011817 3 nmo
FBgn0013987 3 MAPk-Ak2
FBgn0015765 3 p38a
FBgn0017581 3 Lk6
FBgn0020621 3 Pkn
FBgn0023169 3 AMPKalpha
FBgn0024846 3 p38b
FBgn0025625 3 Sik2
FBgn0025743 3 mbt
FBgn0026063 3 KP78b
FBgn0026064 3 KP78a
FBgn0027497 3 Madm
FBgn0028741 3 fab1
FBgn0031299 3 CG4629
FBgn0031784 3 CG9222
FBgn0033915 3 CG8485
FBgn0034568 3 CG3216
FBgn0034950 3 Pask
FBgn0036368 3 CG10738
FBgn0036544 3 sff
FBgn0037098 3 Wnk
FBgn0038167 3 Lkb1
FBgn0038630 3 CG14305
FBgn0039083 3 CG10177
FBgn0040056 3 CG17698
FBgn0044826 3 Pak3
FBgn0046706 3 Haspin
FBgn0051183 3 CG31183
FBgn0052666 3 Drak
FBgn0052703 3 Erk7
FBgn0052944 3 CG32944
FBgn0085386 3 CG34357
FBgn0259712 3 CG42366
FBgn0260399 3 gwl
FBgn0260934 3 par-1
FBgn0261278 3 grp
FBgn0261360 3 CG42637
FBgn0261387 3 CG17528
FBgn0261456 3 hpo
FBgn0261854 3 aPKC
FBgn0262738 3 norpA
FBgn0262866 3 S6kII
FBgn0263395 3 hppy
FBgn0266136 3 Gyc76C
FBgn0267339 3 p38c
FBgn0267390 3 dop
FBgn0267698 3 Pak
FBgn0283657 3 Tlk
FBgn0002466 3 sti
FBgn0010303 3 hep
FBgn0024227 3 aurB
FBgn0026181 3 Rok


kable(topBlupSoloM, caption = 'GO-TBLUP Genes', "simple")
flybase count gene
FBgn0036046 4 Ilp2
FBgn0086687 4 Desat1
FBgn0283499 4 InR
FBgn0020386 4 Pdk1
FBgn0024248 3 chico
FBgn0261873 3 sdt
FBgn0037874 3 Tctp


# Pair the top 7 female genes (rows 1 to 7) with the 7 male genes
allSolo <- cbind(topBlupSoloF[1:7, ], topBlupSoloM)

names(allSolo) <- c('Female Gene', 'Count', 'Name', 'Male Gene', 'Count', 'Name')

kable(allSolo, caption = 'GO-TBLUP Gene Comparison', "simple")
GO-TBLUP Gene Comparison

Female Gene  Count  Name  Male Gene    Count  Name
-----------  -----  ----  -----------  -----  ------
FBgn0010379      5  Akt1  FBgn0036046      4  Ilp2
FBgn0003731      5  Egfr  FBgn0086687      4  Desat1
FBgn0025595      4  AkhR  FBgn0283499      4  InR
FBgn0262103      4  Sik3  FBgn0020386      4  Pdk1
FBgn0283499      4  InR   FBgn0024248      3  chico
FBgn0020386      4  Pdk1  FBgn0261873      3  sdt
FBgn0028484      4  Ack   FBgn0037874      3  Tctp

Looking at both sexes together, the only two genes found in both rankings are InR and Pdk1. Each appears 4 times in the top terms of each sex.

  • InR is the insulin-like receptor.

  • Pdk1 is phosphoinositide-dependent kinase 1, which acts downstream of InR in insulin signaling.

Intuitively, both sit in the insulin signaling pathway, which is closely tied to carbohydrate metabolism.
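The overlap between the two rankings can be checked directly with `intersect()`. The vectors `female` and `male` below are typed in by hand from the gene comparison table above and are not part of the original analysis code.

```r
# Gene symbols copied from the GO-TBLUP gene comparison table
female <- c('Akt1', 'Egfr', 'AkhR', 'Sik3', 'InR', 'Pdk1', 'Ack')
male   <- c('Ilp2', 'Desat1', 'InR', 'Pdk1', 'chico', 'sdt', 'Tctp')

# Genes present in both lists, in the order they appear in the female list
shared <- intersect(female, male)
shared
```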

