Last updated: 2017-04-14
Code version: 64083de
I wanted to try out David’s vicar package, particularly mouthwash_second_step, and see how the \(t\) likelihood compares with the normal likelihood, particularly with the variance inflation option.
I start with “null” data, but with t errors.
set.seed(1)
n=1000
bhat = rt(n,df=4) # t with 4 df
shat = rep(1,n)
library(ashr)
bhat.ash.t4 = ash(bhat,shat,df = 4)
bhat.ash.norm = ash(bhat,shat)
get_pi0(bhat.ash.norm)
[1] 0.8433816
get_pi0(bhat.ash.t4)
[1] 0.988749
min(get_lfsr(bhat.ash.norm))
[1] 0
min(get_lfsr(bhat.ash.t4))
[1] 0.2116995
So we see that using a normal likelihood with \(t\) data creates “false positives”, not surprisingly.
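To see why, it helps to compare tail probabilities directly: an observation that is routine under \(t_4\) errors sits far out in the tail of a normal. A minimal base-R sketch (the cutoff of 4 is just an illustrative value, not taken from the fits above):

```r
# Two-sided tail probability of observing |bhat| > 4 under each error model.
p_norm <- 2 * pnorm(-4)       # normal tail: ~6e-05, looks like a strong signal
p_t4   <- 2 * pt(-4, df = 4)  # t_4 tail: ~0.016, quite unremarkable
p_t4 / p_norm                 # the t_4 tail is a couple of hundred times fatter
```

So under a normal likelihood the heaviest \(t_4\) draws are essentially impossible as noise, and the mixture fit explains them with non-null components.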
Now I wanted to check whether this also occurs in mouthwash - the idea was that maybe the variance inflation parameter in mouthwash would avoid this behaviour. However, it appears not to.
library("vicar")
a = matrix(rep(1,n),nrow = n, ncol=1) # just put in an "intercept" confounder with no effect
a_seq = bhat.ash.norm$fitted_g$a
b_seq = bhat.ash.norm$fitted_g$b
lambda_seq = rep(1,length(a_seq))
lambda_seq[1] = 10
bhat.m.norm = mouthwash_second_step(bhat, shat, a, lambda_seq = lambda_seq, a_seq = a_seq, b_seq = b_seq, mixing_dist = "uniform", likelihood = "normal", scale_var = FALSE, degrees_freedom = 100000) # I had to set a very high df to get it to run
bhat.m.t4 = mouthwash_second_step(bhat, shat, a, lambda_seq = lambda_seq, a_seq = a_seq, b_seq = b_seq, mixing_dist = "uniform", likelihood = "t", scale_var = FALSE, degrees_freedom = 4)
bhat.m.norm.c = mouthwash_second_step(bhat, shat, a, lambda_seq = lambda_seq, a_seq = a_seq, b_seq = b_seq, mixing_dist = "uniform", likelihood = "normal", scale_var = TRUE, degrees_freedom = 100000) # again a very high df was needed
bhat.m.t4.c = mouthwash_second_step(bhat, shat, a, lambda_seq = lambda_seq, a_seq = a_seq, b_seq = b_seq, mixing_dist = "uniform", likelihood = "t", scale_var = TRUE, degrees_freedom = 4)
bhat.m.norm$pi0
[1] 0.8440323
bhat.m.t4$pi0
[1] 0.988799
bhat.m.norm.c$pi0
[1] 0.6972446
bhat.m.t4.c$pi0
[1] 0.9484263
min(bhat.m.norm$result$lfsr)
[1] 0
min(bhat.m.t4$result$lfsr)
[1] 0.2131923
min(bhat.m.norm.c$result$lfsr)
[1] 0
min(bhat.m.t4.c$result$lfsr)
[1] 0.1497321
So the scaling actually seems to make things worse here. This appears to be because the scaling parameter is estimated to be less than 1.
bhat.m.norm.c$xi
[1] 0.7074148
bhat.m.t4.c$xi
[1] 0.8854449
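For intuition on why \(\xi < 1\) hurts: assuming the inflation enters by multiplying the variances by \(\xi\), the effective standard error is \(\sqrt{\xi}\,\hat{s}\), so with \(\xi < 1\) it shrinks and the same estimates look more, not less, significant. A small sketch using roughly the \(\xi\) estimated above (the estimate of 4 is a hypothetical extreme draw, not from the data):

```r
# Effect of variance "inflation" when xi < 1 (assuming variance is scaled by xi).
xi   <- 0.71  # roughly the value estimated above under the normal likelihood
bhat <- 4     # a hypothetical extreme t_4 draw
shat <- 1
2 * pnorm(-abs(bhat) / shat)               # normal tail without scaling
2 * pnorm(-abs(bhat) / (sqrt(xi) * shat))  # smaller tail: looks *more* significant
```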
sessionInfo()
R version 3.3.2 (2016-10-31)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: OS X El Capitan 10.11.6
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] ashr_2.1-8 vicar_0.1.6 workflowr_0.4.0 rmarkdown_1.4
loaded via a namespace (and not attached):
[1] Rcpp_0.12.10 rstudioapi_0.6 knitr_1.15.1
[4] magrittr_1.5 REBayes_0.73 MASS_7.3-45
[7] etrunct_0.1 doParallel_1.0.10 pscl_1.4.9
[10] SQUAREM_2016.8-2 lattice_0.20-34 foreach_1.4.3
[13] stringr_1.2.0 tools_3.3.2 grid_3.3.2
[16] parallel_3.3.2 git2r_0.18.0 htmltools_0.3.5
[19] iterators_1.0.8 assertthat_0.2.0 yaml_2.1.14
[22] rprojroot_1.2 digest_0.6.12 Matrix_1.2-8
[25] codetools_0.2-15 rsconnect_0.7 evaluate_0.10
[28] stringi_1.1.2 Rmosek_7.1.2 backports_1.0.5
[31] truncnorm_1.0-7
This R Markdown site was created with workflowr