Last updated: 2018-11-24
workflowr checks: (Click a bullet for more information) ✔ R Markdown file: up-to-date
Great! Since the R Markdown file has been committed to the Git repository, you know the exact version of the code that produced these results.
✔ Environment: empty
Great job! The global environment was empty. Objects defined in the global environment can affect the analysis in your R Markdown file in unknown ways. For reproduciblity it’s best to always run the code in an empty environment.
✔ Seed:
set.seed(1)
The command set.seed(1)
was run prior to running the code in the R Markdown file. Setting a seed ensures that any results that rely on randomness, e.g. subsampling or permutations, are reproducible.
✔ Session information: recorded
Great job! Recording the operating system, R version, and package versions is critical for reproducibility.
✔ Repository version: c09cb2a
wflow_publish
or wflow_git_commit
). workflowr only checks the R Markdown file, but you know if there are other scripts or data files that it depends on. Below is the status of the Git repository when the results were generated:
Ignored files:
Ignored: .DS_Store
Ignored: .Rhistory
Ignored: .Rproj.user/
Ignored: analysis/.DS_Store
Ignored: analysis/.Rhistory
Ignored: analysis/include/.DS_Store
Ignored: code/.DS_Store
Ignored: data/.DS_Store
Ignored: docs/.DS_Store
Ignored: output/.DS_Store
Untracked files:
Untracked: analysis/Classify.Rmd
Untracked: analysis/EstimateCorMaxGD.Rmd
Untracked: analysis/HierarchicalFlashSim.Rmd
Untracked: analysis/MashLowSignalGTEx4.Rmd
Untracked: analysis/Mash_GTEx.Rmd
Untracked: analysis/MeanAsh.Rmd
Untracked: analysis/OutlierDetection.Rmd
Untracked: analysis/OutlierDetection2.Rmd
Untracked: analysis/OutlierDetection3.Rmd
Untracked: analysis/OutlierDetection4.Rmd
Untracked: analysis/mash_missing_row.Rmd
Untracked: code/GTExNullModel.R
Untracked: code/MashClassify.R
Untracked: code/MashCorResult.R
Untracked: code/MashCormVResult.R
Untracked: code/MashNULLCorResult.R
Untracked: code/MashSource.R
Untracked: code/Weight_plot.R
Untracked: code/addemV.R
Untracked: code/estimate_cor.R
Untracked: code/generateDataV.R
Untracked: code/johnprocess.R
Untracked: code/mV.R
Untracked: code/sim_mean_sig.R
Untracked: code/summary.R
Untracked: data/Blischak_et_al_2015/
Untracked: data/scale_data.rds
Untracked: docs/figure/Classify.Rmd/
Untracked: docs/figure/OutlierDetection.Rmd/
Untracked: docs/figure/OutlierDetection2.Rmd/
Untracked: docs/figure/OutlierDetection3.Rmd/
Untracked: docs/figure/Test.Rmd/
Untracked: docs/figure/mash_missing_whole_row_5.Rmd/
Untracked: docs/include/
Untracked: output/AddEMV/
Untracked: output/CovED_UKBio_strong.rds
Untracked: output/CovED_UKBio_strong_Z.rds
Untracked: output/Flash_UKBio_strong.rds
Untracked: output/GTExNULLres/
Untracked: output/GTEx_2.5_nullData.rds
Untracked: output/GTEx_2.5_nullModel.rds
Untracked: output/GTEx_2.5_nullPermData.rds
Untracked: output/GTEx_2.5_nullPermModel.rds
Untracked: output/GTEx_3.5_nullData.rds
Untracked: output/GTEx_3.5_nullModel.rds
Untracked: output/GTEx_3.5_nullPermData.rds
Untracked: output/GTEx_3.5_nullPermModel.rds
Untracked: output/GTEx_3_nullData.rds
Untracked: output/GTEx_3_nullModel.rds
Untracked: output/GTEx_3_nullPermData.rds
Untracked: output/GTEx_3_nullPermModel.rds
Untracked: output/GTEx_4.5_nullData.rds
Untracked: output/GTEx_4.5_nullModel.rds
Untracked: output/GTEx_4.5_nullPermData.rds
Untracked: output/GTEx_4.5_nullPermModel.rds
Untracked: output/GTEx_4_nullData.rds
Untracked: output/GTEx_4_nullModel.rds
Untracked: output/GTEx_4_nullPermData.rds
Untracked: output/GTEx_4_nullPermModel.rds
Untracked: output/MASH.10.em2.result.rds
Untracked: output/MASH.10.mle.result.rds
Untracked: output/MashCorSim--midway/
Untracked: output/Mash_EE_Cov_0_plusR1.rds
Untracked: output/UKBio_mash_model.rds
Untracked: output/mVIterations/
Untracked: output/mVUlist/
Untracked: output/mVsubset/
Untracked: output/result.em.rds
Unstaged changes:
Modified: analysis/EstimateCorIndex.Rmd
Modified: analysis/Mash_UKBio.Rmd
Modified: analysis/mash_missing_samplesize.Rmd
Modified: output/Flash_T2_0.rds
Modified: output/Flash_T2_0_mclust.rds
Modified: output/Mash_model_0_plusR1.rds
Modified: output/PresiAddVarCol.rds
Note that any generated files, e.g. HTML, png, CSS, etc., are not included in this status report because it is ok for generated content to have uncommitted changes.
File | Version | Author | Date | Message |
---|---|---|---|---|
Rmd | c09cb2a | zouyuxin | 2018-11-24 | wflow_publish(“analysis/EstimateCorMaxMVSample.Rmd”) |
library(mashr)
Loading required package: ashr
library(ggplot2)
source('../code/generateDataV.R')
Simple simulation in \(R^2\): \[ \hat{\beta}|\beta \sim N_{2}(\hat{\beta}; \beta, \left(\begin{matrix} 1 & 0.5 \\ 0.5 & 1 \end{matrix}\right)) \]
\[ \beta \sim \frac{1}{4}\delta_{0} + \frac{1}{4}N_{2}(0, \left(\begin{matrix} 1 & 0 \\ 0 & 0 \end{matrix}\right)) + \frac{1}{4}N_{2}(0, \left(\begin{matrix} 0 & 0 \\ 0 & 1 \end{matrix}\right)) + \frac{1}{4}N_{2}(0, \left(\begin{matrix} 1 & 1 \\ 1 & 1 \end{matrix}\right)) \]
set.seed(1)
n = 4000; p = 2
Sigma = matrix(c(1,0.5,0.5,1),p,p)
U0 = matrix(0,2,2)
U1 = U0; U1[1,1] = 1
U2 = U0; U2[2,2] = 1
U3 = matrix(1,2,2)
Utrue = list(U0=U0, U1=U1, U2=U2, U3=U3)
data = generate_data(n, p, Sigma, Utrue)
m.data = mash_set_data(data$Bhat, data$Shat)
U.c = cov_canonical(m.data)
samples = sample(1:n, 1000)
m.data.subset = mash_set_data(Bhat = data$Bhat[samples,], Shat = data$Shat)
result.mV.full <- estimate_null_correlation(m.data, U.c, tol=1e-3)
result.mV.subset <- estimate_null_correlation(m.data.subset, U.c, tol=1e-3)
Fit full mash model using estimated V
m.data.subset.V = mash_update_data(m.data, V=result.mV.subset$V)
model.subset = mash(m.data.subset.V, U.c)
- Computing 4000 x 106 likelihood matrix.
- Likelihood calculations took 0.09 seconds.
- Fitting model with 106 mixture components.
- Model fitting took 0.64 seconds.
- Computing posterior matrices.
- Computation allocated took 0.05 seconds.
| model with V.Full | model with V.subset
————–|——————-|——————– loglikelihood | get_loglik(result.mV.full$loglik) | get_loglik(model.subset)
I randomly generate 10 positive definite correlation matrices, V. The sample size is 4000.
\[ \hat{b}_{j}|b_{j} \sim N_{5}(z, S_{j}VS_{j}) \] \[ b_{j}\sim\frac{1}{4}\delta_{0} + \frac{1}{4}N_{5}(0,\left(\begin{matrix} 1 & \mathbf{0}_{1\times 4} \\ \mathbf{0}_{4\times 1} & \mathbf{0}_{4\times 4} \end{matrix}\right)) + \frac{1}{4}N_{5}(0,\left(\begin{matrix} \mathbf{1}_{2\times 2} & \mathbf{0}_{1\times 3} \\ \mathbf{0}_{3\times 1} & \mathbf{0}_{3\times 3} \end{matrix}\right)) + \frac{1}{4}N_{5}(0,\mathbf{1}_{5\times 5}) \]
set.seed(20181124)
n=4000; p = 5
U0 = matrix(0,p,p)
U1 = U0; U1[1,1] = 1
U2 = U0; U2[c(1:2), c(1:2)] = 1
U3 = matrix(1, p,p)
Utrue = list(U0 = U0, U1 = U1, U2 = U2, U3 = U3)
for(t in 1:10){
print(paste0('Data: ', t))
Vtrue = clusterGeneration::rcorrmatrix(p)
data = generate_data(n, p, Vtrue, Utrue)
# mash cov
m.data = mash_set_data(Bhat = data$Bhat, Shat = data$Shat)
m.1by1 = mash_1by1(m.data)
strong = get_significant_results(m.1by1)
U.pca = cov_pca(m.data, 3, subset = strong)
U.ed = cov_ed(m.data, U.pca, subset = strong)
U.c = cov_canonical(m.data)
samples = sample(1:n, 1000)
m.data.subset = mash_set_data(Bhat = data$Bhat[samples,], Shat = data$Shat)
print('Method: mV')
print('Full')
Vhat.full <- estimate_null_correlation(m.data, c(U.c, U.ed),
tol=1e-3, max_iter = 5)
print('Subset')
Vhat.subset <- estimate_null_correlation(m.data.subset, c(U.c, U.ed),
tol=1e-3, max_iter = 5)
m.data.subset.V = mash_update_data(m.data, V = Vhat.subset$V)
model.subset = mash(m.data.subset.V, c(U.c,U.ed))
saveRDS(list(V.true = Vtrue, V.mV.Full = Vhat.full, V.mV.subset = Vhat.subset,
model.subset = model.subset, data = data, strong=strong),
paste0('../output/mVsubset/MASH.mV.subset.result.',t,'.rds'))
}
files = dir("../output/mVsubset/"); files = files[grep("MASH.mV.subset.result",files)]
times = length(files)
result = vector(mode="list",length = times)
for(i in 1:times) {
result[[i]] = readRDS(paste("../output/mVsubset/", files[[i]], sep=""))
}
result_wrap = vector("list", times)
for(i in 1:times){
m.data = mash_set_data(result[[i]]$data$Bhat, result[[i]]$data$Shat)
m.1by1 = mash_1by1(m.data)
strong = get_significant_results(m.1by1)
U.c = cov_canonical(m.data)
U.pca = cov_pca(m.data, 3, subset = strong)
U.ed = cov_ed(m.data, U.pca, subset = strong)
m.data.true = mash_set_data(Bhat = m.data$Bhat, Shat = m.data$Shat, V = result[[i]]$V.true)
m.model.true = mash(m.data.true, c(U.c,U.ed), verbose = FALSE)
# mV
result_wrap[[i]]$V.true = result[[i]]$V.true
result_wrap[[i]]$V.full = result[[i]]$V.mV.Full$V
result_wrap[[i]]$V.subset = result[[i]]$V.mV.subset$V
result_wrap[[i]]$m.model = list(m.model.true = m.model.true, m.model.full = result[[i]]$V.mV.Full$mash.model,
m.model.subset = result[[i]]$model.subset)
}
The Frobenius norm is
norm.type='F'
temp = matrix(0,nrow = times, ncol = 2)
for(i in 1:times){
temp[i, ] = c(norm(result_wrap[[i]]$V.full - result_wrap[[i]]$V.true, type = norm.type),
norm(result_wrap[[i]]$V.subset - result_wrap[[i]]$V.true, type = norm.type))
}
colnames(temp) = c('Full','Subset')
temp = reshape2::melt(temp[])
colnames(temp) = c('Data', 'Method', 'FrobError')
ggplot(temp, aes(x = Data, y=FrobError, group = Method, color = Method)) + geom_line()
The spectral norm is
norm.type='2'
temp = matrix(0,nrow = times, ncol = 2)
for(i in 1:times){
temp[i, ] = c(norm(result_wrap[[i]]$V.full - result_wrap[[i]]$V.true, type = norm.type),
norm(result_wrap[[i]]$V.subset - result_wrap[[i]]$V.true, type = norm.type))
}
colnames(temp) = c('Full','Subset')
temp = reshape2::melt(temp[])
colnames(temp) = c('Data', 'Method', 'SpecError')
ggplot(temp, aes(x = Data, y=SpecError, group = Method, color = Method)) + geom_line()
temp = matrix(0,nrow = times, ncol = 3)
for(i in 1:times){
temp[i, ] = c(get_loglik(result_wrap[[i]]$m.model$m.model.true), get_loglik(result_wrap[[i]]$m.model$m.model.full),
get_loglik(result_wrap[[i]]$m.model$m.model.subset))
}
colnames(temp) = c('True', 'Full','Subset')
temp = reshape2::melt(temp)
colnames(temp) = c('Data', 'Method', 'loglikelihood')
ggplot(temp, aes(x = Data, y=loglikelihood, group = Method, color = Method)) + geom_line()
ROC.table = function(data, model){
sign.test = data*model$result$PosteriorMean
thresh.seq = seq(0, 1, by=0.005)[-1]
m.seq = matrix(0,length(thresh.seq), 2)
colnames(m.seq) = c('TPR', 'FPR')
for(t in 1:length(thresh.seq)){
m.seq[t,] = c(sum(sign.test>0 & model$result$lfsr <= thresh.seq[t])/sum(data!=0),
sum(data==0 & model$result$lfsr <=thresh.seq[t])/sum(data==0))
}
return(m.seq)
}
plotROC = function(data.true, result.model, title){
m.full.seq = ROC.table(data.true, result.model$m.model.full)
m.subset.seq = ROC.table(data.true, result.model$m.model.subset)
m.true.seq = ROC.table(data.true, result.model$m.model.true)
plot(m.true.seq[,'FPR'], m.true.seq[,'TPR'],type='l',xlab = 'FPR', ylab='TPR',
main=paste0(title, 'True Pos vs False Pos'), cex=1.5, lwd = 1.5)
lines(m.full.seq[,'FPR'], m.full.seq[,'TPR'], col='red', lwd = 1.5)
lines(m.subset.seq[,'FPR'], m.subset.seq[,'TPR'], col='darkorchid', lwd = 1.5)
legend('bottomright', c('True','Full', 'Subset'),col=c('black','red','darkorchid'),
lty=c(1,1,1), lwd=c(1.5,1.5,1.5))
}
par(mfrow=c(1,2))
for(i in 1:times){
plotROC(result[[i]]$data$B, result_wrap[[i]]$m.model, title=paste0('Data', i, ' '))
}
RRMSE = function(datatrue, dataobs, model){
model = Filter(length, model)
rrmse = numeric(length(model))
for(k in 1:length(model)){
rrmse[k] = sqrt(mean((datatrue - model[[k]]$result$PosteriorMean)^2)/mean((datatrue - dataobs)^2))
}
rrmse = as.matrix(t(rrmse))
colnames(rrmse) = names(model)
return(rrmse)
}
par(mfrow=c(1,2))
for(i in 1:times){
rrmse = rbind(RRMSE(result[[i]]$data$B, result[[i]]$data$Bhat, result_wrap[[i]]$m.model))
barplot(rrmse, ylim=c(0,(1+max(rrmse))/2), las=2, cex.names = 0.7, main='RRMSE')
}
sessionInfo()
R version 3.5.1 (2018-07-02)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS 10.14.1
Matrix products: default
BLAS: /Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRlapack.dylib
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] ggplot2_3.1.0 mashr_0.2.18.0536 ashr_2.2-23
loaded via a namespace (and not attached):
[1] tidyselect_0.2.5 purrr_0.2.5 reshape2_1.4.3
[4] lattice_0.20-35 Rmosek_8.0.69 colorspace_1.3-2
[7] htmltools_0.3.6 yaml_2.2.0 rlang_0.3.0.1
[10] R.oo_1.22.0 pillar_1.3.0 glue_1.3.0
[13] withr_2.1.2 R.utils_2.7.0 REBayes_1.3
[16] bindrcpp_0.2.2 foreach_1.4.4 plyr_1.8.4
[19] bindr_0.1.1 stringr_1.3.1 munsell_0.5.0
[22] gtable_0.2.0 workflowr_1.1.1 R.methodsS3_1.7.1
[25] mvtnorm_1.0-8 codetools_0.2-15 evaluate_0.12
[28] labeling_0.3 knitr_1.20 pscl_1.5.2
[31] doParallel_1.0.14 parallel_3.5.1 Rcpp_1.0.0
[34] scales_1.0.0 backports_1.1.2 rmeta_3.0
[37] truncnorm_1.0-8 abind_1.4-5 digest_0.6.18
[40] stringi_1.2.4 dplyr_0.7.6 grid_3.5.1
[43] rprojroot_1.3-2 tools_3.5.1 magrittr_1.5
[46] lazyeval_0.2.1 tibble_1.4.2 crayon_1.3.4
[49] whisker_0.3-2 pkgconfig_2.0.2 MASS_7.3-50
[52] Matrix_1.2-14 SQUAREM_2017.10-1 assertthat_0.2.0
[55] rmarkdown_1.10 iterators_1.0.10 R6_2.3.0
[58] git2r_0.23.0 compiler_3.5.1
This reproducible R Markdown analysis was created with workflowr 1.1.1