Last updated: 2018-08-20

workflowr checks: (Click a bullet for more information)
  • R Markdown file: up-to-date

    Great! Since the R Markdown file has been committed to the Git repository, you know the exact version of the code that produced these results.

  • Environment: empty

    Great job! The global environment was empty. Objects defined in the global environment can affect the analysis in your R Markdown file in unknown ways. For reproduciblity it’s best to always run the code in an empty environment.

  • Seed: set.seed(1)

    The command set.seed(1) was run prior to running the code in the R Markdown file. Setting a seed ensures that any results that rely on randomness, e.g. subsampling or permutations, are reproducible.

  • Session information: recorded

    Great job! Recording the operating system, R version, and package versions is critical for reproducibility.

  • Repository version: 88274cc

    Great! You are using Git for version control. Tracking code development and connecting the code version to the results is critical for reproducibility. The version displayed above was the version of the Git repository at the time these results were generated.

    Note that you need to be careful to ensure that all relevant files for the analysis have been committed to Git prior to generating the results (you can use wflow_publish or wflow_git_commit). workflowr only checks the R Markdown file, but you know if there are other scripts or data files that it depends on. Below is the status of the Git repository when the results were generated:
    
    Ignored files:
        Ignored:    .DS_Store
        Ignored:    .Rhistory
        Ignored:    .Rproj.user/
        Ignored:    analysis/.DS_Store
        Ignored:    analysis/.Rhistory
        Ignored:    analysis/include/.DS_Store
        Ignored:    data/.DS_Store
        Ignored:    docs/.DS_Store
        Ignored:    output/.DS_Store
    
    Untracked files:
        Untracked:  analysis/Classify.Rmd
        Untracked:  analysis/EstimateCorMaxEM.Rmd
        Untracked:  analysis/EstimateCorMaxEMGD.Rmd
        Untracked:  analysis/EstimateCorPrior.Rmd
        Untracked:  analysis/EstimateCorSol.Rmd
        Untracked:  analysis/HierarchicalFlashSim.Rmd
        Untracked:  analysis/MashLowSignal.Rmd
        Untracked:  analysis/Mash_GTEx.Rmd
        Untracked:  analysis/MeanAsh.Rmd
        Untracked:  analysis/OutlierDetection.Rmd
        Untracked:  analysis/OutlierDetection2.Rmd
        Untracked:  analysis/OutlierDetection3.Rmd
        Untracked:  analysis/OutlierDetection4.Rmd
        Untracked:  analysis/mash_missing_row.Rmd
        Untracked:  code/MashClassify.R
        Untracked:  code/MashCorResult.R
        Untracked:  code/MashSource.R
        Untracked:  code/Weight_plot.R
        Untracked:  code/addemV.R
        Untracked:  code/estimate_cor.R
        Untracked:  code/generateDataV.R
        Untracked:  code/johnprocess.R
        Untracked:  code/sim_mean_sig.R
        Untracked:  code/summary.R
        Untracked:  data/Blischak_et_al_2015/
        Untracked:  data/scale_data.rds
        Untracked:  docs/figure/Classify.Rmd/
        Untracked:  docs/figure/OutlierDetection.Rmd/
        Untracked:  docs/figure/OutlierDetection2.Rmd/
        Untracked:  docs/figure/OutlierDetection3.Rmd/
        Untracked:  docs/figure/Test.Rmd/
        Untracked:  docs/figure/mash_missing_whole_row_5.Rmd/
        Untracked:  docs/include/
        Untracked:  output/AddEMV/
        Untracked:  output/CovED_UKBio_strong.rds
        Untracked:  output/CovED_UKBio_strong_Z.rds
        Untracked:  output/Flash_UKBio_strong.rds
        Untracked:  output/MASH.10.em2.result.rds
        Untracked:  output/MASH.10.mle.result.rds
        Untracked:  output/MASH.result.1.rds
        Untracked:  output/MASH.result.10.rds
        Untracked:  output/MASH.result.2.rds
        Untracked:  output/MASH.result.3.rds
        Untracked:  output/MASH.result.4.rds
        Untracked:  output/MASH.result.5.rds
        Untracked:  output/MASH.result.6.rds
        Untracked:  output/MASH.result.7.rds
        Untracked:  output/MASH.result.8.rds
        Untracked:  output/MASH.result.9.rds
        Untracked:  output/Mash_EE_Cov_0_plusR1.rds
        Untracked:  output/Trail 1/
        Untracked:  output/Trail 2/
        Untracked:  output/UKBio_mash_model.rds
    
    Unstaged changes:
        Modified:   analysis/Mash_UKBio.Rmd
        Modified:   analysis/mash_missing_samplesize.Rmd
        Modified:   output/Flash_T2_0.rds
        Modified:   output/Flash_T2_0_mclust.rds
        Modified:   output/Mash_model_0_plusR1.rds
        Modified:   output/PresiAddVarCol.rds
    
    
    Note that any generated files, e.g. HTML, png, CSS, etc., are not included in this status report because it is ok for generated content to have uncommitted changes.
Expand here to see past versions:
    File Version Author Date Message
    Rmd 88274cc zouyuxin 2018-08-20 wflow_publish(“analysis/EstimateCorMaxMash.Rmd”)
    html cb0eb4d zouyuxin 2018-08-16 Build site.
    Rmd 6357604 zouyuxin 2018-08-16 wflow_publish(c(“analysis/EstimateCorIndex.Rmd”, “analysis/EstimateCorMaxMash.Rmd”))
    html 6281062 zouyuxin 2018-08-15 Build site.
    Rmd 3e3e128 zouyuxin 2018-08-15 wflow_publish(c(“analysis/EstimateCor.Rmd”, “analysis/EstimateCorMax.Rmd”,
    html 055978d zouyuxin 2018-08-15 Build site.
    Rmd 40dc525 zouyuxin 2018-08-15 wflow_publish(“analysis/EstimateCorMaxMash.Rmd”)
    html 96ec142 zouyuxin 2018-08-15 Build site.
    Rmd 56743af zouyuxin 2018-08-15 wflow_publish(“analysis/EstimateCorMaxMash.Rmd”)
    html 802266a zouyuxin 2018-08-15 Build site.
    Rmd c7b3665 zouyuxin 2018-08-15 wflow_publish(“analysis/EstimateCorMaxMash.Rmd”)
    html b9e760f zouyuxin 2018-08-03 Build site.
    Rmd 1e6ff48 zouyuxin 2018-08-03 wflow_publish(c(“analysis/EstimateCorMaxMash.Rmd”))


library(mashr)
Loading required package: ashr
source('../code/generateDataV.R')
source('../code/estimate_cor.R')
source('../code/summary.R')
library(knitr)
library(kableExtra)

Apply the max methods for correlation matrix on mash data.

After we estimate each pairwise correlation, the final resulting \(p\times p\) correlation matrix may not be positive definite. I estimate the nearest PD cor matrix with nearPD function.

The estimated V from MLE(optim function) and \(EM_\rho\) perform better than the truncated correlation (error, mash log likelihood, ROC).

Comparing the estimated V from MLE and EM, \(EM_\rho\) algorithm tends to compute faster, and the estimated correlation is slightly better than the one from MLE in terms of estimation error, mash log likelihood, ROC.

One example: p = 3

\[ \hat{\beta}|\beta \sim N_{3}(\hat{\beta}; \beta, \left(\begin{matrix} 1 & 0.7 & 0.2 \\ 0.7 & 1 & 0.4 \\ 0.2 & 0.4 & 1 \end{matrix}\right)) \]

\[ \beta \sim \frac{1}{4}\delta_{0} + \frac{1}{4}N_{3}(0, \left(\begin{matrix} 1 & 0 &0\\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{matrix}\right)) + \frac{1}{4}N_{3}(0, \left(\begin{matrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 0 \end{matrix}\right)) + \frac{1}{4}N_{3}(0, \left(\begin{matrix} 1 & 1 & 1 \\ 1 & 1 & 1 \\ 1 & 1 & 1 \end{matrix}\right)) \]

set.seed(1)
Sigma = cbind(c(1,0.7,0.2), c(0.7,1,0.4), c(0.2,0.4,1))
U0 = matrix(0,3,3)
U1 = matrix(0,3,3); U1[1,1] = 1
U2 = diag(3); U2[3,3] = 0
U3 = matrix(1,3,3)
data = generate_data(n=4000, p=3, V=Sigma, Utrue = list(U0=U0, U1=U1,U2=U2,U3=U3))

We find the estimate of V with canonical covariances and the PCA covariances.

m.data = mash_set_data(data$Bhat, data$Shat)
m.1by1 = mash_1by1(m.data)
strong = get_significant_results(m.1by1)

U.pca = cov_pca(m.data, 3, subset = strong)
U.ed = cov_ed(m.data, U.pca, subset = strong)
U.c = cov_canonical(m.data)

The PCA correlation matrices are:

  1. We run the algorithm in estimate cor mle with 3 different initial points for \(\rho\), (-0.5,0,0.5). The \(\rho\) in each iteration is estimated using optim function. The estimated correlation is
Vhat.mle = estimateV(m.data, c(U.c, U.ed), init_rho = c(-0.5,0,0.5), tol=1e-4, optmethod = 'mle')
Vhat.mle$V
          [,1]      [,2]      [,3]
[1,] 1.0000000 0.7009268 0.1588733
[2,] 0.7009268 1.0000000 0.4001842
[3,] 0.1588733 0.4001842 1.0000000
  1. The result uses algorithm in estimate cor em. \(\rho\) in each iteration is the root of a third degree polynomial.
Vhat.em = estimateV(m.data, c(U.c, U.ed), init_rho = c(-0.5,0,0.5), tol = 1e-4, optmethod = 'em2')
Vhat.em$V
          [,1]      [,2]      [,3]
[1,] 1.0000000 0.7086927 0.1706391
[2,] 0.7086927 1.0000000 0.4187296
[3,] 0.1706391 0.4187296 1.0000000

The running time (in sec.) for each pairwise correlation is

table = data.frame(rbind(Vhat.mle$ttime, Vhat.em$ttime), row.names = c('mle', 'em'))
colnames(table) = c('12','13','23')
table %>% kable() %>% kable_styling()
12 13 23
mle 312.547 228.092 70.155
em 179.781 65.843 54.555

The time is the total running time with different initial point.

  1. The result uses algorithm in estimate cor em V.
Vhat.emV = estimateV(m.data, c(U.c, U.ed), init_V = list(diag(ncol(m.data$Bhat)), clusterGeneration::rcorrmatrix(3), clusterGeneration::rcorrmatrix(3)), tol = 1e-4, optmethod = 'emV')
Vhat.emV$V
          [,1]      [,2]      [,3]
[1,] 1.0000000 0.5487183 0.2669639
[2,] 0.5487183 1.0000000 0.4654711
[3,] 0.2669639 0.4654711 1.0000000
  1. Using the original truncated correlation:
Vhat.tru = estimate_null_correlation(m.data)
Vhat.tru
          [,1]      [,2]      [,3]
[1,] 1.0000000 0.4296283 0.1222433
[2,] 0.4296283 1.0000000 0.3324459
[3,] 0.1222433 0.3324459 1.0000000

The truncated correlation underestimates the correlations.

  1. mash 1by1: Run ash for each condition, and estimate correlation matrix based on the non-significant samples.
V.mash = cor((data$Bhat/data$Shat)[-strong,])
V.mash
          [,1]      [,2]      [,3]
[1,] 1.0000000 0.5313446 0.2445663
[2,] 0.5313446 1.0000000 0.4490049
[3,] 0.2445663 0.4490049 1.0000000

All the estimated correlation matrices above are positive definite.

Error

Check the estimation error:

FError = c(norm(Vhat.mle$V - Sigma, 'F'),
           norm(Vhat.em$V - Sigma, 'F'),
           norm(Vhat.emV$V - Sigma, 'F'),
           norm(Vhat.tru - Sigma, 'F'),
           norm(V.mash - Sigma, 'F'))
OpError = c(norm(Vhat.mle$V - Sigma, '2'),
           norm(Vhat.em$V - Sigma, '2'),
           norm(Vhat.emV$V - Sigma, '2'),
           norm(Vhat.tru - Sigma, '2'),
           norm(V.mash - Sigma, '2'))
table = data.frame(FrobeniusError = FError, SpectralError = OpError, row.names = c('mle','em','emV','trunc','m.1by1'))
table %>% kable() %>% kable_styling()
FrobeniusError SpectralError
mle 0.0581773 0.0411417
em 0.0507627 0.0391489
emV 0.2516219 0.1960202
trunc 0.4091712 0.3049974
m.1by1 0.2562509 0.1915171

mash log likelihood

In mash model, the model with correlation from mle has larger loglikelihood.

m.data.mle = mash_set_data(data$Bhat, data$Shat, V=Vhat.mle$V)
m.model.mle = mash(m.data.mle, c(U.c,U.ed), verbose = FALSE)
m.data.em = mash_set_data(data$Bhat, data$Shat, V=Vhat.em$V)
m.model.em = mash(m.data.em, c(U.c,U.ed), verbose = FALSE)
m.data.emV = mash_set_data(data$Bhat, data$Shat, V=Vhat.emV$V)
m.model.emV = mash(m.data.emV, c(U.c,U.ed), verbose = FALSE)
m.data.trunc = mash_set_data(data$Bhat, data$Shat, V=Vhat.tru)
m.model.trunc = mash(m.data.trunc, c(U.c,U.ed), verbose = FALSE)
m.data.1by1 = mash_set_data(data$Bhat, data$Shat, V=V.mash)
m.model.1by1 = mash(m.data.1by1, c(U.c,U.ed), verbose = FALSE)
m.data.correct = mash_set_data(data$Bhat, data$Shat, V=Sigma)
m.model.correct = mash(m.data.correct, c(U.c,U.ed), verbose = FALSE)

The results are summarized in table:

null.ind = which(apply(data$B,1,sum) == 0)
V.trun = c(get_loglik(m.model.trunc), length(get_significant_results(m.model.trunc)), sum(get_significant_results(m.model.trunc) %in% null.ind))
V.mle = c(get_loglik(m.model.mle), length(get_significant_results(m.model.mle)), sum(get_significant_results(m.model.mle) %in% null.ind))
V.em = c(get_loglik(m.model.em), length(get_significant_results(m.model.em)), sum(get_significant_results(m.model.em) %in% null.ind))
V.emV = c(get_loglik(m.model.emV), length(get_significant_results(m.model.emV)), sum(get_significant_results(m.model.emV) %in% null.ind))
V.1by1 = c(get_loglik(m.model.1by1), length(get_significant_results(m.model.1by1)), sum(get_significant_results(m.model.1by1) %in% null.ind))
V.correct = c(get_loglik(m.model.correct), length(get_significant_results(m.model.correct)), sum(get_significant_results(m.model.correct) %in% null.ind))
temp = cbind(V.mle, V.em, V.emV, V.trun, V.1by1, V.correct)
colnames(temp) = c('MLE','EM','EMV', 'Truncate', 'm.1by1', 'True')
row.names(temp) = c('log likelihood', '# significance', '# False positive')
temp %>% kable() %>% kable_styling()
MLE EM EMV Truncate m.1by1 True
log likelihood -17917.69 -17919.23 -17945.16 -17951.46 -17943.49 -17913.58
# significance 146.00 149.00 82.00 85.00 73.00 149.00
# False positive 1.00 1.00 0.00 1.00 0.00 1.00

The estimated pi is

par(mfrow=c(2,3))
barplot(get_estimated_pi(m.model.mle), las=2, cex.names = 0.7, main='MLE', ylim=c(0,0.8))
barplot(get_estimated_pi(m.model.em), las=2, cex.names = 0.7, main='EM', ylim=c(0,0.8))
barplot(get_estimated_pi(m.model.emV), las=2, cex.names = 0.7, main='EMV', ylim=c(0,0.8))
barplot(get_estimated_pi(m.model.trunc), las=2, cex.names = 0.7, main='Truncate', ylim=c(0,0.8))
barplot(get_estimated_pi(m.model.1by1), las=2, cex.names = 0.7, main='m.1by1', ylim=c(0,0.8))
barplot(get_estimated_pi(m.model.correct), las=2, cex.names = 0.7, main='True', ylim=c(0,0.8))

ROC

m.mle.seq = ROC.table(data$B, m.model.mle)
m.em.seq = ROC.table(data$B, m.model.em)
m.emV.seq = ROC.table(data$B, m.model.emV)
m.trun.seq = ROC.table(data$B, m.model.trunc)
m.1by1.seq = ROC.table(data$B, m.model.1by1)
m.correct.seq = ROC.table(data$B, m.model.correct)

RRMSE

rrmse = rbind(RRMSE(data$B, data$Bhat, list(m.model.mle, m.model.em, m.model.emV, m.model.trunc, m.model.1by1, m.model.correct)))
colnames(rrmse) = c('MLE','EM','EMV', 'Truncate','m.1by1','True')
row.names(rrmse) = 'RRMSE'
rrmse %>% kable() %>% kable_styling()
MLE EM EMV Truncate m.1by1 True
RRMSE 0.5277488 0.5285126 0.5437355 0.5592648 0.5442074 0.5283068
barplot(rrmse, ylim=c(0,(1+max(rrmse))/2), names.arg = c('MLE','EM', 'EMV','Truncate','m.1by1','True'), las=2, cex.names = 0.7, main='RRMSE')

More simulations: p = 5

I randomly generate 10 positive definite correlation matrices, V. The sample size is 4000.

\[ \hat{z}|z \sim N_{5}(z, V) \] \[ z\sim\frac{1}{4}\delta_{0} + \frac{1}{4}N_{5}(0,\left(\begin{matrix} 1 & \mathbf{0}_{1\times 4} \\ \mathbf{0}_{4\times 1} & \mathbf{0}_{4\times 4} \end{matrix}\right)) + \frac{1}{4}N_{5}(0,\left(\begin{matrix} \mathbf{1}_{2\times 2} & \mathbf{0}_{1\times 3} \\ \mathbf{0}_{3\times 1} & \mathbf{0}_{3\times 3} \end{matrix}\right)) + \frac{1}{4}N_{5}(0,\mathbf{1}_{5\times 5}) \]

set.seed(100)
n=4000; p = 5
U0 = matrix(0,p,p)
U1 = U0; U1[1,1] = 1
U2 = U0; U2[c(1:2), c(1:2)] = 1
U3 = matrix(1, p,p)
Utrue = list(U0 = U0, U1 = U1, U2 = U2, U3 = U3)
for(t in 1:10){
  Vtrue = clusterGeneration::rcorrmatrix(p)
  data = generate_data(n, p, Vtrue, Utrue)
  # mash cov
  m.data = mash_set_data(Bhat = data$Bhat, Shat = data$Shat)
  m.1by1 = mash_1by1(m.data)
  strong = get_significant_results(m.1by1)

  U.pca = cov_pca(m.data, 3, subset = strong)
  U.ed = cov_ed(m.data, U.pca, subset = strong)
  U.c = cov_canonical(m.data)
  Vhat.mle <- estimateV(m.data, c(U.c, U.ed), init_rho = c(-0.5,0,0.5), tol=1e-4, optmethod = 'mle')
  Vhat.em <- estimateV(m.data, c(U.c, U.ed), init_rho = c(-0.5,0,0.5), tol=1e-4, optmethod = 'em2')
  Vhat.emV <- estimateV(m.data, c(U.c, U.ed), init_V = list(diag(ncol(m.data$Bhat)), clusterGeneration::rcorrmatrix(p), clusterGeneration::rcorrmatrix(p)),tol=1e-4, optmethod = 'emV')
  saveRDS(list(V.true = Vtrue, V.mle = Vhat.mle, V.em = Vhat.em, V.emV = Vhat.emV, data = data, strong=strong),
          paste0('../output/MASH.result.',t,'.rds'))
}
files = dir("../output/AddEMV/"); files = files[grep("MASH.result",files)]
times = length(files)
result = vector(mode="list",length = times)
for(i in 1:times) {
  result[[i]] = readRDS(paste("../output/AddEMV/", files[[i]], sep=""))
}
mle.pd = numeric(times)
em.pd = numeric(times)
for(i in 1:times){
  m.data = mash_set_data(result[[i]]$data$Bhat, result[[i]]$data$Shat)
  
  result[[i]]$V.trun = estimate_null_correlation(m.data, apply_lower_bound = FALSE)
  m.1by1 = mash_1by1(m.data)
  strong = get_significant_results(m.1by1)
  result[[i]]$V.1by1 = cor(m.data$Bhat[-strong,])
  U.c = cov_canonical(m.data)
  U.pca = cov_pca(m.data, 3, subset = strong)
  U.ed = cov_ed(m.data, U.pca, subset = strong)

  m.data.true = mash_set_data(Bhat = m.data$Bhat, Shat = m.data$Shat, V = result[[i]]$V.true)
  m.model.true = mash(m.data.true, c(U.c,U.ed), verbose = FALSE)
  
  m.data.trunc = mash_set_data(Bhat = m.data$Bhat, Shat = m.data$Shat, V = result[[i]]$V.trun)
  m.model.trunc = mash(m.data.trunc, c(U.c,U.ed), verbose = FALSE)
  
  m.data.1by1 = mash_set_data(Bhat = m.data$Bhat, Shat = m.data$Shat, V = result[[i]]$V.1by1)
  m.model.1by1 = mash(m.data.1by1, c(U.c,U.ed), verbose = FALSE)
  
  m.data.emV = mash_set_data(Bhat = m.data$Bhat, Shat = m.data$Shat, V = result[[i]]$V.emV$V)
  m.model.emV = mash(m.data.emV, c(U.c,U.ed), verbose = FALSE)
  
  # MLE
  
  m.model.mle = m.model.mle.F = m.model.mle.2 = list()
  R <- tryCatch(chol(result[[i]]$V.mle$V),error = function (e) FALSE)
  if(is.matrix(R)){
    mle.pd[i] = 1
    m.data.mle = mash_set_data(Bhat = m.data$Bhat, Shat = m.data$Shat, V = result[[i]]$V.mle$V)
    m.model.mle = mash(m.data.mle, c(U.c,U.ed), verbose = FALSE)
  }else{
    V.mle.near.F = as.matrix(Matrix::nearPD(result[[i]]$V.mle$V, conv.norm.type = 'F', keepDiag = TRUE)$mat)
    V.mle.near.2 = as.matrix(Matrix::nearPD(result[[i]]$V.mle$V, conv.norm.type = '2', keepDiag = TRUE)$mat)
    
    result[[i]]$V.mle.F = V.mle.near.F
    result[[i]]$V.mle.2 = V.mle.near.2
    
    # mashmodel
    m.data.mle.F = mash_set_data(Bhat = m.data$Bhat, Shat = m.data$Shat, V = V.mle.near.F)
    m.model.mle.F = mash(m.data.mle.F, c(U.c,U.ed), verbose = FALSE)
    
    m.data.mle.2 = mash_set_data(Bhat = m.data$Bhat, Shat = m.data$Shat, V = V.mle.near.2)
    m.model.mle.2 = mash(m.data.mle.2, c(U.c,U.ed), verbose = FALSE)
  }
  
  # EM
  m.model.em = m.model.em.F = m.model.em.2 = list()
  R <- tryCatch(chol(result[[i]]$V.em$V),error = function (e) FALSE)
  if(is.matrix(R)){
    em.pd[i] = 1
    m.data.em = mash_set_data(Bhat = m.data$Bhat, Shat = m.data$Shat, V = result[[i]]$V.em$V)
    m.model.em = mash(m.data.em, c(U.c,U.ed), verbose = FALSE)
  }else{
    V.em.near.F = as.matrix(Matrix::nearPD(result[[i]]$V.em$V, conv.norm.type = 'F', keepDiag = TRUE)$mat)
    V.em.near.2 = as.matrix(Matrix::nearPD(result[[i]]$V.em$V, conv.norm.type = '2', keepDiag = TRUE)$mat)
    
    result[[i]]$V.em.F = V.em.near.F
    result[[i]]$V.em.2 = V.em.near.2
    
    # mashmodel
    m.data.em.F = mash_set_data(Bhat = m.data$Bhat, Shat = m.data$Shat, V = V.em.near.F)
    m.model.em.F = mash(m.data.em.F, c(U.c,U.ed), verbose = FALSE)
      
    m.data.em.2 = mash_set_data(Bhat = m.data$Bhat, Shat = m.data$Shat, V = V.em.near.2)
    m.model.em.2 = mash(m.data.em.2, c(U.c,U.ed), verbose = FALSE)
  }
  
  result[[i]]$m.model = list(m.model.true = m.model.true, m.model.trunc = m.model.trunc, 
                             m.model.1by1 = m.model.1by1, m.model.emV = m.model.emV,
                             m.model.mle = m.model.mle,
                             m.model.mle.F = m.model.mle.F, m.model.mle.2 = m.model.mle.2, 
                             m.model.em = m.model.em,
                             m.model.em.F = m.model.em.F, m.model.em.2 = m.model.em.2)
}

Error

The column with .F, .2 are from the nearest positive definite matrix with respect to Frobenius norm and spectral norm.

The Frobenius norm is

temp = matrix(0,nrow = times, ncol = 7)
for(i in 1:times){
  temp[i, ] = error.cor(result[[i]], norm.type='F', mle.pd = mle.pd[i], em.pd = em.pd[i])
}
colnames(temp) = c('Trunc','m.1by1', 'MLE','MLE.F', 'EM', 'EM.F', 'EMV')
temp[temp==0] = NA
temp %>% kable() %>% kable_styling()
Trunc m.1by1 MLE MLE.F EM EM.F EMV
0.5847549 0.9286016 0.3185293 0.3053278 0.0941757 NA 1.0198618
0.6345196 0.8140108 0.1169154 NA 0.0981075 NA 0.8913631
0.7201453 0.9531300 0.2344733 NA 0.1734402 NA 1.0216638
0.8370832 1.0534335 0.1968091 NA 0.2411372 NA 1.1115619
0.8206008 0.8466408 0.1194142 NA 0.1189734 NA 0.9176794
0.8455747 1.1393764 0.1650726 0.1479670 0.1653665 0.1516188 1.2178752
0.5173056 0.8194980 0.1211599 NA 0.0861518 NA 0.8914736
0.8840057 0.9546138 0.1642744 NA 0.1563607 NA 1.0169747
0.6535878 1.0554833 0.1240352 NA 0.1323659 NA 1.1491978
0.6425639 0.9346341 0.0794776 NA 0.0700813 NA 1.0196620

The spectral norm is

temp = matrix(0,nrow = times, ncol = 7)
for(i in 1:times){
  temp[i, ] = error.cor(result[[i]], norm.type='2', mle.pd = mle.pd[i], em.pd = em.pd[i])
}
colnames(temp) = c('Trunc','m.1by1', 'MLE','MLE.2', 'EM', 'EM.2', 'EMV')
temp[temp==0] = NA
temp %>% kable() %>% kable_styling()
Trunc m.1by1 MLE MLE.2 EM EM.2 EMV
0.4230698 0.7526315 0.2258153 0.2205340 0.0662442 NA 0.8335745
0.5228925 0.6281308 0.0783098 NA 0.0677285 NA 0.7069909
0.5156642 0.7945237 0.1750836 NA 0.1239438 NA 0.8577274
0.6529121 0.8335401 0.1484773 NA 0.1768351 NA 0.8944724
0.6500336 0.6377762 0.0778356 NA 0.0849870 NA 0.7052165
0.5607948 0.8613851 0.1001500 0.0937816 0.1085885 0.1015186 0.9343139
0.3157614 0.6406310 0.0795125 NA 0.0659174 NA 0.6982291
0.7134025 0.7290570 0.1090477 NA 0.1103719 NA 0.7987348
0.4767534 0.8894951 0.0964157 NA 0.0957622 NA 0.9760343
0.4591215 0.7926789 0.0568252 NA 0.0558999 NA 0.8740268

Time

The total running time for each matrix is

mle.time = em.time = numeric(times)
for(i in 1:times){
  mle.time[i] = sum(result[[i]]$V.mle$ttime)
  em.time[i] = sum(result[[i]]$V.em$ttime)
}
temp = cbind(mle.time, em.time)
colnames(temp) = c('MLE', 'EM')
row.names(temp) = 1:10
temp %>% kable() %>% kable_styling()
MLE EM
3171.983 848.351
2341.424 704.601
1990.764 740.046
3249.162 1072.233
1988.220 717.808
2580.794 958.710
1928.634 597.511
2992.277 1114.932
2339.513 708.779
2727.560 772.780

mash log likelihood

The NA means the estimated correlation matrix is not positive definite.

temp = matrix(0,nrow = times, ncol = 10)
for(i in 1:times){
  temp[i, ] = loglik.cor(result[[i]]$m.model, mle.pd = mle.pd[i], em.pd = em.pd[i])
}
colnames(temp) = c('True', 'Trunc','m.1by1', 'MLE','MLE.F', 'MLE.2', 'EM', 'EM.F', 'EM.2','EMV')
temp[temp == 0] = NA
temp[,-c(6,9)] %>% kable() %>% kable_styling()
True Trunc m.1by1 MLE MLE.F EM EM.F EMV
-26039.92 -26130.50 -26112.94 NA -26265.49 -26120.81 NA -26136.19
-25669.59 -26997.92 -26950.67 -26028.96 NA -25859.86 NA -26967.47
-27473.71 -27547.11 -27535.67 -27465.76 NA -27463.43 NA -27551.75
-28215.48 -28604.98 -28646.72 -28301.41 NA -28338.57 NA -28669.24
-24988.68 -25236.18 -25110.41 -25056.59 NA -25048.86 NA -25123.96
-24299.89 -24978.79 -24972.25 NA -24492.32 NA -24478.09 -25020.38
-27574.71 -27698.40 -27662.48 -27517.65 NA -27540.65 NA -27684.83
-27941.86 -28159.65 -28182.04 -27979.99 NA -27953.02 NA -28222.45
-29788.75 -29824.45 -29921.87 -29760.81 NA -29759.50 NA -29954.96
-28542.34 -28922.58 -29163.51 -28537.96 NA -28550.73 NA -29221.61

ROC

par(mfrow=c(1,2))
for(i in 1:times){
  plotROC(result[[i]]$data$B, result[[i]]$m.model, mle.pd = mle.pd[i], em.pd = em.pd[i], title=paste0('Data', i, ' '))
}

RRMSE

par(mfrow=c(1,2))
for(i in 1:times){
  rrmse = rbind(RRMSE(result[[i]]$data$B, result[[i]]$data$Bhat, result[[i]]$m.model))
  barplot(rrmse, ylim=c(0,(1+max(rrmse))/2), las=2, cex.names = 0.7, main='RRMSE')
}

Expand here to see past versions of unnamed-chunk-31-1.png:
Version Author Date
802266a zouyuxin 2018-08-15

Expand here to see past versions of unnamed-chunk-31-2.png:
Version Author Date
802266a zouyuxin 2018-08-15

Expand here to see past versions of unnamed-chunk-31-3.png:
Version Author Date
802266a zouyuxin 2018-08-15

Expand here to see past versions of unnamed-chunk-31-4.png:
Version Author Date
802266a zouyuxin 2018-08-15

Expand here to see past versions of unnamed-chunk-31-5.png:
Version Author Date
802266a zouyuxin 2018-08-15

Session information

sessionInfo()
R version 3.5.1 (2018-07-02)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS High Sierra 10.13.6

Matrix products: default
BLAS: /Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] kableExtra_0.9.0 knitr_1.20       mashr_0.2-11     ashr_2.2-10     

loaded via a namespace (and not attached):
 [1] corrplot_0.84            lattice_0.20-35         
 [3] Rmosek_8.0.69            colorspace_1.3-2        
 [5] htmltools_0.3.6          viridisLite_0.3.0       
 [7] yaml_2.2.0               rlang_0.2.1             
 [9] R.oo_1.22.0              pillar_1.3.0            
[11] R.utils_2.6.0            ExtremeDeconvolution_1.3
[13] REBayes_1.3              foreach_1.4.4           
[15] plyr_1.8.4               stringr_1.3.1           
[17] munsell_0.5.0            workflowr_1.1.1         
[19] rvest_0.3.2              R.methodsS3_1.7.1       
[21] mvtnorm_1.0-8            codetools_0.2-15        
[23] evaluate_0.11            pscl_1.5.2              
[25] doParallel_1.0.11        parallel_3.5.1          
[27] highr_0.7                Rcpp_0.12.18            
[29] readr_1.1.1              backports_1.1.2         
[31] scales_0.5.0             rmeta_3.0               
[33] clusterGeneration_1.3.4  truncnorm_1.0-8         
[35] hms_0.4.2                digest_0.6.15           
[37] stringi_1.2.4            grid_3.5.1              
[39] rprojroot_1.3-2          tools_3.5.1             
[41] magrittr_1.5             tibble_1.4.2            
[43] crayon_1.3.4             whisker_0.3-2           
[45] pkgconfig_2.0.1          MASS_7.3-50             
[47] Matrix_1.2-14            SQUAREM_2017.10-1       
[49] xml2_1.2.0               assertthat_0.2.0        
[51] rmarkdown_1.10           httr_1.3.1              
[53] rstudioapi_0.7           iterators_1.0.10        
[55] R6_2.2.2                 git2r_0.23.0            
[57] compiler_3.5.1          

This reproducible R Markdown analysis was created with workflowr 1.1.1