Estimate cor—max MASH

Last updated: 2018-08-20

workflowr checks: (Click a bullet for more information)

✔ R Markdown file: up-to-date

Great! Since the R Markdown file has been committed to the Git repository, you know the exact version of the code that produced these results.
✔ Environment: empty

Great job! The global environment was empty. Objects defined in the global environment can affect the analysis in your R Markdown file in unknown ways. For reproduciblity it’s best to always run the code in an empty environment.
✔ Seed: set.seed(1)

The command set.seed(1) was run prior to running the code in the R Markdown file. Setting a seed ensures that any results that rely on randomness, e.g. subsampling or permutations, are reproducible.
✔ Session information: recorded

Great job! Recording the operating system, R version, and package versions is critical for reproducibility.

✔ Repository version: 88274cc

Great! You are using Git for version control. Tracking code development and connecting the code version to the results is critical for reproducibility. The version displayed above was the version of the Git repository at the time these results were generated.

Note that you need to be careful to ensure that all relevant files for the analysis have been committed to Git prior to generating the results (you can use wflow_publish or wflow_git_commit). workflowr only checks the R Markdown file, but you know if there are other scripts or data files that it depends on. Below is the status of the Git repository when the results were generated:


Ignored files:
    Ignored:    .DS_Store
    Ignored:    .Rhistory
    Ignored:    .Rproj.user/
    Ignored:    analysis/.DS_Store
    Ignored:    analysis/.Rhistory
    Ignored:    analysis/include/.DS_Store
    Ignored:    data/.DS_Store
    Ignored:    docs/.DS_Store
    Ignored:    output/.DS_Store

Untracked files:
    Untracked:  analysis/Classify.Rmd
    Untracked:  analysis/EstimateCorMaxEM.Rmd
    Untracked:  analysis/EstimateCorMaxEMGD.Rmd
    Untracked:  analysis/EstimateCorPrior.Rmd
    Untracked:  analysis/EstimateCorSol.Rmd
    Untracked:  analysis/HierarchicalFlashSim.Rmd
    Untracked:  analysis/MashLowSignal.Rmd
    Untracked:  analysis/Mash_GTEx.Rmd
    Untracked:  analysis/MeanAsh.Rmd
    Untracked:  analysis/OutlierDetection.Rmd
    Untracked:  analysis/OutlierDetection2.Rmd
    Untracked:  analysis/OutlierDetection3.Rmd
    Untracked:  analysis/OutlierDetection4.Rmd
    Untracked:  analysis/mash_missing_row.Rmd
    Untracked:  code/MashClassify.R
    Untracked:  code/MashCorResult.R
    Untracked:  code/MashSource.R
    Untracked:  code/Weight_plot.R
    Untracked:  code/addemV.R
    Untracked:  code/estimate_cor.R
    Untracked:  code/generateDataV.R
    Untracked:  code/johnprocess.R
    Untracked:  code/sim_mean_sig.R
    Untracked:  code/summary.R
    Untracked:  data/Blischak_et_al_2015/
    Untracked:  data/scale_data.rds
    Untracked:  docs/figure/Classify.Rmd/
    Untracked:  docs/figure/OutlierDetection.Rmd/
    Untracked:  docs/figure/OutlierDetection2.Rmd/
    Untracked:  docs/figure/OutlierDetection3.Rmd/
    Untracked:  docs/figure/Test.Rmd/
    Untracked:  docs/figure/mash_missing_whole_row_5.Rmd/
    Untracked:  docs/include/
    Untracked:  output/AddEMV/
    Untracked:  output/CovED_UKBio_strong.rds
    Untracked:  output/CovED_UKBio_strong_Z.rds
    Untracked:  output/Flash_UKBio_strong.rds
    Untracked:  output/MASH.10.em2.result.rds
    Untracked:  output/MASH.10.mle.result.rds
    Untracked:  output/MASH.result.1.rds
    Untracked:  output/MASH.result.10.rds
    Untracked:  output/MASH.result.2.rds
    Untracked:  output/MASH.result.3.rds
    Untracked:  output/MASH.result.4.rds
    Untracked:  output/MASH.result.5.rds
    Untracked:  output/MASH.result.6.rds
    Untracked:  output/MASH.result.7.rds
    Untracked:  output/MASH.result.8.rds
    Untracked:  output/MASH.result.9.rds
    Untracked:  output/Mash_EE_Cov_0_plusR1.rds
    Untracked:  output/Trail 1/
    Untracked:  output/Trail 2/
    Untracked:  output/UKBio_mash_model.rds

Unstaged changes:
    Modified:   analysis/Mash_UKBio.Rmd
    Modified:   analysis/mash_missing_samplesize.Rmd
    Modified:   output/Flash_T2_0.rds
    Modified:   output/Flash_T2_0_mclust.rds
    Modified:   output/Mash_model_0_plusR1.rds
    Modified:   output/PresiAddVarCol.rds

Note that any generated files, e.g. HTML, png, CSS, etc., are not included in this status report because it is ok for generated content to have uncommitted changes.

Expand here to see past versions:

File	Version	Author	Date	Message
Rmd	88274cc	zouyuxin	2018-08-20	wflow_publish(“analysis/EstimateCorMaxMash.Rmd”)
html	cb0eb4d	zouyuxin	2018-08-16	Build site.
Rmd	6357604	zouyuxin	2018-08-16	wflow_publish(c(“analysis/EstimateCorIndex.Rmd”, “analysis/EstimateCorMaxMash.Rmd”))
html	6281062	zouyuxin	2018-08-15	Build site.
Rmd	3e3e128	zouyuxin	2018-08-15	wflow_publish(c(“analysis/EstimateCor.Rmd”, “analysis/EstimateCorMax.Rmd”,
html	055978d	zouyuxin	2018-08-15	Build site.
Rmd	40dc525	zouyuxin	2018-08-15	wflow_publish(“analysis/EstimateCorMaxMash.Rmd”)
html	96ec142	zouyuxin	2018-08-15	Build site.
Rmd	56743af	zouyuxin	2018-08-15	wflow_publish(“analysis/EstimateCorMaxMash.Rmd”)
html	802266a	zouyuxin	2018-08-15	Build site.
Rmd	c7b3665	zouyuxin	2018-08-15	wflow_publish(“analysis/EstimateCorMaxMash.Rmd”)
html	b9e760f	zouyuxin	2018-08-03	Build site.
Rmd	1e6ff48	zouyuxin	2018-08-03	wflow_publish(c(“analysis/EstimateCorMaxMash.Rmd”))

library(mashr)

Loading required package: ashr

source('../code/generateDataV.R')
source('../code/estimate_cor.R')
source('../code/summary.R')
library(knitr)
library(kableExtra)

Apply the max methods for correlation matrix on mash data.

After we estimate each pairwise correlation, the final resulting \(p\times p\) correlation matrix may not be positive definite. I estimate the nearest PD cor matrix with nearPD function.

The estimated V from MLE(optim function) and \(EM_\rho\) perform better than the truncated correlation (error, mash log likelihood, ROC).

Comparing the estimated V from MLE and EM, \(EM_\rho\) algorithm tends to compute faster, and the estimated correlation is slightly better than the one from MLE in terms of estimation error, mash log likelihood, ROC.

One example: p = 3

\[ \hat{\beta}|\beta \sim N_{3}(\hat{\beta}; \beta, \left(\begin{matrix} 1 & 0.7 & 0.2 \\ 0.7 & 1 & 0.4 \\ 0.2 & 0.4 & 1 \end{matrix}\right)) \]

\[ \beta \sim \frac{1}{4}\delta_{0} + \frac{1}{4}N_{3}(0, \left(\begin{matrix} 1 & 0 &0\\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{matrix}\right)) + \frac{1}{4}N_{3}(0, \left(\begin{matrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 0 \end{matrix}\right)) + \frac{1}{4}N_{3}(0, \left(\begin{matrix} 1 & 1 & 1 \\ 1 & 1 & 1 \\ 1 & 1 & 1 \end{matrix}\right)) \]

set.seed(1)
Sigma = cbind(c(1,0.7,0.2), c(0.7,1,0.4), c(0.2,0.4,1))
U0 = matrix(0,3,3)
U1 = matrix(0,3,3); U1[1,1] = 1
U2 = diag(3); U2[3,3] = 0
U3 = matrix(1,3,3)
data = generate_data(n=4000, p=3, V=Sigma, Utrue = list(U0=U0, U1=U1,U2=U2,U3=U3))

We find the estimate of V with canonical covariances and the PCA covariances.

m.data = mash_set_data(data$Bhat, data$Shat)
m.1by1 = mash_1by1(m.data)
strong = get_significant_results(m.1by1)

U.pca = cov_pca(m.data, 3, subset = strong)
U.ed = cov_ed(m.data, U.pca, subset = strong)
U.c = cov_canonical(m.data)

The PCA correlation matrices are:

We run the algorithm in estimate cor mle with 3 different initial points for \(\rho\), (-0.5,0,0.5). The \(\rho\) in each iteration is estimated using optim function. The estimated correlation is

Vhat.mle = estimateV(m.data, c(U.c, U.ed), init_rho = c(-0.5,0,0.5), tol=1e-4, optmethod = 'mle')
Vhat.mle$V

          [,1]      [,2]      [,3]
[1,] 1.0000000 0.7009268 0.1588733
[2,] 0.7009268 1.0000000 0.4001842
[3,] 0.1588733 0.4001842 1.0000000

The result uses algorithm in estimate cor em. \(\rho\) in each iteration is the root of a third degree polynomial.

Vhat.em = estimateV(m.data, c(U.c, U.ed), init_rho = c(-0.5,0,0.5), tol = 1e-4, optmethod = 'em2')
Vhat.em$V

          [,1]      [,2]      [,3]
[1,] 1.0000000 0.7086927 0.1706391
[2,] 0.7086927 1.0000000 0.4187296
[3,] 0.1706391 0.4187296 1.0000000

The running time (in sec.) for each pairwise correlation is

table = data.frame(rbind(Vhat.mle$ttime, Vhat.em$ttime), row.names = c('mle', 'em'))
colnames(table) = c('12','13','23')
table %>% kable() %>% kable_styling()

	12	13	23
mle	312.547	228.092	70.155
em	179.781	65.843	54.555

The time is the total running time with different initial point.

The result uses algorithm in estimate cor em V.

Vhat.emV = estimateV(m.data, c(U.c, U.ed), init_V = list(diag(ncol(m.data$Bhat)), clusterGeneration::rcorrmatrix(3), clusterGeneration::rcorrmatrix(3)), tol = 1e-4, optmethod = 'emV')
Vhat.emV$V

          [,1]      [,2]      [,3]
[1,] 1.0000000 0.5487183 0.2669639
[2,] 0.5487183 1.0000000 0.4654711
[3,] 0.2669639 0.4654711 1.0000000

Using the original truncated correlation:

Vhat.tru = estimate_null_correlation(m.data)
Vhat.tru

          [,1]      [,2]      [,3]
[1,] 1.0000000 0.4296283 0.1222433
[2,] 0.4296283 1.0000000 0.3324459
[3,] 0.1222433 0.3324459 1.0000000

The truncated correlation underestimates the correlations.

mash 1by1: Run ash for each condition, and estimate correlation matrix based on the non-significant samples.

V.mash = cor((data$Bhat/data$Shat)[-strong,])
V.mash

          [,1]      [,2]      [,3]
[1,] 1.0000000 0.5313446 0.2445663
[2,] 0.5313446 1.0000000 0.4490049
[3,] 0.2445663 0.4490049 1.0000000

All the estimated correlation matrices above are positive definite.

Error

Check the estimation error:

FError = c(norm(Vhat.mle$V - Sigma, 'F'),
           norm(Vhat.em$V - Sigma, 'F'),
           norm(Vhat.emV$V - Sigma, 'F'),
           norm(Vhat.tru - Sigma, 'F'),
           norm(V.mash - Sigma, 'F'))
OpError = c(norm(Vhat.mle$V - Sigma, '2'),
           norm(Vhat.em$V - Sigma, '2'),
           norm(Vhat.emV$V - Sigma, '2'),
           norm(Vhat.tru - Sigma, '2'),
           norm(V.mash - Sigma, '2'))
table = data.frame(FrobeniusError = FError, SpectralError = OpError, row.names = c('mle','em','emV','trunc','m.1by1'))
table %>% kable() %>% kable_styling()

	FrobeniusError	SpectralError
mle	0.0581773	0.0411417
em	0.0507627	0.0391489
emV	0.2516219	0.1960202
trunc	0.4091712	0.3049974
m.1by1	0.2562509	0.1915171

mash log likelihood

In mash model, the model with correlation from mle has larger loglikelihood.

m.data.mle = mash_set_data(data$Bhat, data$Shat, V=Vhat.mle$V)
m.model.mle = mash(m.data.mle, c(U.c,U.ed), verbose = FALSE)

m.data.em = mash_set_data(data$Bhat, data$Shat, V=Vhat.em$V)
m.model.em = mash(m.data.em, c(U.c,U.ed), verbose = FALSE)

m.data.emV = mash_set_data(data$Bhat, data$Shat, V=Vhat.emV$V)
m.model.emV = mash(m.data.emV, c(U.c,U.ed), verbose = FALSE)

m.data.trunc = mash_set_data(data$Bhat, data$Shat, V=Vhat.tru)
m.model.trunc = mash(m.data.trunc, c(U.c,U.ed), verbose = FALSE)

m.data.1by1 = mash_set_data(data$Bhat, data$Shat, V=V.mash)
m.model.1by1 = mash(m.data.1by1, c(U.c,U.ed), verbose = FALSE)

m.data.correct = mash_set_data(data$Bhat, data$Shat, V=Sigma)
m.model.correct = mash(m.data.correct, c(U.c,U.ed), verbose = FALSE)

The results are summarized in table:

null.ind = which(apply(data$B,1,sum) == 0)
V.trun = c(get_loglik(m.model.trunc), length(get_significant_results(m.model.trunc)), sum(get_significant_results(m.model.trunc) %in% null.ind))
V.mle = c(get_loglik(m.model.mle), length(get_significant_results(m.model.mle)), sum(get_significant_results(m.model.mle) %in% null.ind))
V.em = c(get_loglik(m.model.em), length(get_significant_results(m.model.em)), sum(get_significant_results(m.model.em) %in% null.ind))
V.emV = c(get_loglik(m.model.emV), length(get_significant_results(m.model.emV)), sum(get_significant_results(m.model.emV) %in% null.ind))
V.1by1 = c(get_loglik(m.model.1by1), length(get_significant_results(m.model.1by1)), sum(get_significant_results(m.model.1by1) %in% null.ind))
V.correct = c(get_loglik(m.model.correct), length(get_significant_results(m.model.correct)), sum(get_significant_results(m.model.correct) %in% null.ind))
temp = cbind(V.mle, V.em, V.emV, V.trun, V.1by1, V.correct)
colnames(temp) = c('MLE','EM','EMV', 'Truncate', 'm.1by1', 'True')
row.names(temp) = c('log likelihood', '# significance', '# False positive')
temp %>% kable() %>% kable_styling()

	MLE	EM	EMV	Truncate	m.1by1	True
log likelihood	-17917.69	-17919.23	-17945.16	-17951.46	-17943.49	-17913.58
# significance	146.00	149.00	82.00	85.00	73.00	149.00
# False positive	1.00	1.00	0.00	1.00	0.00	1.00

The estimated pi is

par(mfrow=c(2,3))
barplot(get_estimated_pi(m.model.mle), las=2, cex.names = 0.7, main='MLE', ylim=c(0,0.8))
barplot(get_estimated_pi(m.model.em), las=2, cex.names = 0.7, main='EM', ylim=c(0,0.8))
barplot(get_estimated_pi(m.model.emV), las=2, cex.names = 0.7, main='EMV', ylim=c(0,0.8))
barplot(get_estimated_pi(m.model.trunc), las=2, cex.names = 0.7, main='Truncate', ylim=c(0,0.8))
barplot(get_estimated_pi(m.model.1by1), las=2, cex.names = 0.7, main='m.1by1', ylim=c(0,0.8))
barplot(get_estimated_pi(m.model.correct), las=2, cex.names = 0.7, main='True', ylim=c(0,0.8))

ROC

m.mle.seq = ROC.table(data$B, m.model.mle)
m.em.seq = ROC.table(data$B, m.model.em)
m.emV.seq = ROC.table(data$B, m.model.emV)
m.trun.seq = ROC.table(data$B, m.model.trunc)
m.1by1.seq = ROC.table(data$B, m.model.1by1)
m.correct.seq = ROC.table(data$B, m.model.correct)

RRMSE

rrmse = rbind(RRMSE(data$B, data$Bhat, list(m.model.mle, m.model.em, m.model.emV, m.model.trunc, m.model.1by1, m.model.correct)))
colnames(rrmse) = c('MLE','EM','EMV', 'Truncate','m.1by1','True')
row.names(rrmse) = 'RRMSE'
rrmse %>% kable() %>% kable_styling()

	MLE	EM	EMV	Truncate	m.1by1	True
RRMSE	0.5277488	0.5285126	0.5437355	0.5592648	0.5442074	0.5283068

barplot(rrmse, ylim=c(0,(1+max(rrmse))/2), names.arg = c('MLE','EM', 'EMV','Truncate','m.1by1','True'), las=2, cex.names = 0.7, main='RRMSE')

More simulations: p = 5

I randomly generate 10 positive definite correlation matrices, V. The sample size is 4000.

\[ \hat{z}|z \sim N_{5}(z, V) \] \[ z\sim\frac{1}{4}\delta_{0} + \frac{1}{4}N_{5}(0,\left(\begin{matrix} 1 & \mathbf{0}_{1\times 4} \\ \mathbf{0}_{4\times 1} & \mathbf{0}_{4\times 4} \end{matrix}\right)) + \frac{1}{4}N_{5}(0,\left(\begin{matrix} \mathbf{1}_{2\times 2} & \mathbf{0}_{1\times 3} \\ \mathbf{0}_{3\times 1} & \mathbf{0}_{3\times 3} \end{matrix}\right)) + \frac{1}{4}N_{5}(0,\mathbf{1}_{5\times 5}) \]

set.seed(100)
n=4000; p = 5
U0 = matrix(0,p,p)
U1 = U0; U1[1,1] = 1
U2 = U0; U2[c(1:2), c(1:2)] = 1
U3 = matrix(1, p,p)
Utrue = list(U0 = U0, U1 = U1, U2 = U2, U3 = U3)
for(t in 1:10){
  Vtrue = clusterGeneration::rcorrmatrix(p)
  data = generate_data(n, p, Vtrue, Utrue)
  # mash cov
  m.data = mash_set_data(Bhat = data$Bhat, Shat = data$Shat)
  m.1by1 = mash_1by1(m.data)
  strong = get_significant_results(m.1by1)

  U.pca = cov_pca(m.data, 3, subset = strong)
  U.ed = cov_ed(m.data, U.pca, subset = strong)
  U.c = cov_canonical(m.data)
  Vhat.mle <- estimateV(m.data, c(U.c, U.ed), init_rho = c(-0.5,0,0.5), tol=1e-4, optmethod = 'mle')
  Vhat.em <- estimateV(m.data, c(U.c, U.ed), init_rho = c(-0.5,0,0.5), tol=1e-4, optmethod = 'em2')
  Vhat.emV <- estimateV(m.data, c(U.c, U.ed), init_V = list(diag(ncol(m.data$Bhat)), clusterGeneration::rcorrmatrix(p), clusterGeneration::rcorrmatrix(p)),tol=1e-4, optmethod = 'emV')
  saveRDS(list(V.true = Vtrue, V.mle = Vhat.mle, V.em = Vhat.em, V.emV = Vhat.emV, data = data, strong=strong),
          paste0('../output/MASH.result.',t,'.rds'))
}

files = dir("../output/AddEMV/"); files = files[grep("MASH.result",files)]
times = length(files)
result = vector(mode="list",length = times)
for(i in 1:times) {
  result[[i]] = readRDS(paste("../output/AddEMV/", files[[i]], sep=""))
}

mle.pd = numeric(times)
em.pd = numeric(times)
for(i in 1:times){
  m.data = mash_set_data(result[[i]]$data$Bhat, result[[i]]$data$Shat)
  
  result[[i]]$V.trun = estimate_null_correlation(m.data, apply_lower_bound = FALSE)
  m.1by1 = mash_1by1(m.data)
  strong = get_significant_results(m.1by1)
  result[[i]]$V.1by1 = cor(m.data$Bhat[-strong,])
  U.c = cov_canonical(m.data)
  U.pca = cov_pca(m.data, 3, subset = strong)
  U.ed = cov_ed(m.data, U.pca, subset = strong)

  m.data.true = mash_set_data(Bhat = m.data$Bhat, Shat = m.data$Shat, V = result[[i]]$V.true)
  m.model.true = mash(m.data.true, c(U.c,U.ed), verbose = FALSE)
  
  m.data.trunc = mash_set_data(Bhat = m.data$Bhat, Shat = m.data$Shat, V = result[[i]]$V.trun)
  m.model.trunc = mash(m.data.trunc, c(U.c,U.ed), verbose = FALSE)
  
  m.data.1by1 = mash_set_data(Bhat = m.data$Bhat, Shat = m.data$Shat, V = result[[i]]$V.1by1)
  m.model.1by1 = mash(m.data.1by1, c(U.c,U.ed), verbose = FALSE)
  
  m.data.emV = mash_set_data(Bhat = m.data$Bhat, Shat = m.data$Shat, V = result[[i]]$V.emV$V)
  m.model.emV = mash(m.data.emV, c(U.c,U.ed), verbose = FALSE)
  
  # MLE
  
  m.model.mle = m.model.mle.F = m.model.mle.2 = list()
  R <- tryCatch(chol(result[[i]]$V.mle$V),error = function (e) FALSE)
  if(is.matrix(R)){
    mle.pd[i] = 1
    m.data.mle = mash_set_data(Bhat = m.data$Bhat, Shat = m.data$Shat, V = result[[i]]$V.mle$V)
    m.model.mle = mash(m.data.mle, c(U.c,U.ed), verbose = FALSE)
  }else{
    V.mle.near.F = as.matrix(Matrix::nearPD(result[[i]]$V.mle$V, conv.norm.type = 'F', keepDiag = TRUE)$mat)
    V.mle.near.2 = as.matrix(Matrix::nearPD(result[[i]]$V.mle$V, conv.norm.type = '2', keepDiag = TRUE)$mat)
    
    result[[i]]$V.mle.F = V.mle.near.F
    result[[i]]$V.mle.2 = V.mle.near.2
    
    # mashmodel
    m.data.mle.F = mash_set_data(Bhat = m.data$Bhat, Shat = m.data$Shat, V = V.mle.near.F)
    m.model.mle.F = mash(m.data.mle.F, c(U.c,U.ed), verbose = FALSE)
    
    m.data.mle.2 = mash_set_data(Bhat = m.data$Bhat, Shat = m.data$Shat, V = V.mle.near.2)
    m.model.mle.2 = mash(m.data.mle.2, c(U.c,U.ed), verbose = FALSE)
  }
  
  # EM
  m.model.em = m.model.em.F = m.model.em.2 = list()
  R <- tryCatch(chol(result[[i]]$V.em$V),error = function (e) FALSE)
  if(is.matrix(R)){
    em.pd[i] = 1
    m.data.em = mash_set_data(Bhat = m.data$Bhat, Shat = m.data$Shat, V = result[[i]]$V.em$V)
    m.model.em = mash(m.data.em, c(U.c,U.ed), verbose = FALSE)
  }else{
    V.em.near.F = as.matrix(Matrix::nearPD(result[[i]]$V.em$V, conv.norm.type = 'F', keepDiag = TRUE)$mat)
    V.em.near.2 = as.matrix(Matrix::nearPD(result[[i]]$V.em$V, conv.norm.type = '2', keepDiag = TRUE)$mat)
    
    result[[i]]$V.em.F = V.em.near.F
    result[[i]]$V.em.2 = V.em.near.2
    
    # mashmodel
    m.data.em.F = mash_set_data(Bhat = m.data$Bhat, Shat = m.data$Shat, V = V.em.near.F)
    m.model.em.F = mash(m.data.em.F, c(U.c,U.ed), verbose = FALSE)
      
    m.data.em.2 = mash_set_data(Bhat = m.data$Bhat, Shat = m.data$Shat, V = V.em.near.2)
    m.model.em.2 = mash(m.data.em.2, c(U.c,U.ed), verbose = FALSE)
  }
  
  result[[i]]$m.model = list(m.model.true = m.model.true, m.model.trunc = m.model.trunc, 
                             m.model.1by1 = m.model.1by1, m.model.emV = m.model.emV,
                             m.model.mle = m.model.mle,
                             m.model.mle.F = m.model.mle.F, m.model.mle.2 = m.model.mle.2, 
                             m.model.em = m.model.em,
                             m.model.em.F = m.model.em.F, m.model.em.2 = m.model.em.2)
}

Error

The column with .F, .2 are from the nearest positive definite matrix with respect to Frobenius norm and spectral norm.

The Frobenius norm is

temp = matrix(0,nrow = times, ncol = 7)
for(i in 1:times){
  temp[i, ] = error.cor(result[[i]], norm.type='F', mle.pd = mle.pd[i], em.pd = em.pd[i])
}
colnames(temp) = c('Trunc','m.1by1', 'MLE','MLE.F', 'EM', 'EM.F', 'EMV')
temp[temp==0] = NA
temp %>% kable() %>% kable_styling()

Trunc	m.1by1	MLE	MLE.F	EM	EM.F	EMV
0.5847549	0.9286016	0.3185293	0.3053278	0.0941757	NA	1.0198618
0.6345196	0.8140108	0.1169154	NA	0.0981075	NA	0.8913631
0.7201453	0.9531300	0.2344733	NA	0.1734402	NA	1.0216638
0.8370832	1.0534335	0.1968091	NA	0.2411372	NA	1.1115619
0.8206008	0.8466408	0.1194142	NA	0.1189734	NA	0.9176794
0.8455747	1.1393764	0.1650726	0.1479670	0.1653665	0.1516188	1.2178752
0.5173056	0.8194980	0.1211599	NA	0.0861518	NA	0.8914736
0.8840057	0.9546138	0.1642744	NA	0.1563607	NA	1.0169747
0.6535878	1.0554833	0.1240352	NA	0.1323659	NA	1.1491978
0.6425639	0.9346341	0.0794776	NA	0.0700813	NA	1.0196620

The spectral norm is

temp = matrix(0,nrow = times, ncol = 7)
for(i in 1:times){
  temp[i, ] = error.cor(result[[i]], norm.type='2', mle.pd = mle.pd[i], em.pd = em.pd[i])
}
colnames(temp) = c('Trunc','m.1by1', 'MLE','MLE.2', 'EM', 'EM.2', 'EMV')
temp[temp==0] = NA
temp %>% kable() %>% kable_styling()

Trunc	m.1by1	MLE	MLE.2	EM	EM.2	EMV
0.4230698	0.7526315	0.2258153	0.2205340	0.0662442	NA	0.8335745
0.5228925	0.6281308	0.0783098	NA	0.0677285	NA	0.7069909
0.5156642	0.7945237	0.1750836	NA	0.1239438	NA	0.8577274
0.6529121	0.8335401	0.1484773	NA	0.1768351	NA	0.8944724
0.6500336	0.6377762	0.0778356	NA	0.0849870	NA	0.7052165
0.5607948	0.8613851	0.1001500	0.0937816	0.1085885	0.1015186	0.9343139
0.3157614	0.6406310	0.0795125	NA	0.0659174	NA	0.6982291
0.7134025	0.7290570	0.1090477	NA	0.1103719	NA	0.7987348
0.4767534	0.8894951	0.0964157	NA	0.0957622	NA	0.9760343
0.4591215	0.7926789	0.0568252	NA	0.0558999	NA	0.8740268

Time

The total running time for each matrix is

mle.time = em.time = numeric(times)
for(i in 1:times){
  mle.time[i] = sum(result[[i]]$V.mle$ttime)
  em.time[i] = sum(result[[i]]$V.em$ttime)
}
temp = cbind(mle.time, em.time)
colnames(temp) = c('MLE', 'EM')
row.names(temp) = 1:10
temp %>% kable() %>% kable_styling()

MLE	EM
3171.983	848.351
2341.424	704.601
1990.764	740.046
3249.162	1072.233
1988.220	717.808
2580.794	958.710
1928.634	597.511
2992.277	1114.932
2339.513	708.779
2727.560	772.780

mash log likelihood

The NA means the estimated correlation matrix is not positive definite.

temp = matrix(0,nrow = times, ncol = 10)
for(i in 1:times){
  temp[i, ] = loglik.cor(result[[i]]$m.model, mle.pd = mle.pd[i], em.pd = em.pd[i])
}
colnames(temp) = c('True', 'Trunc','m.1by1', 'MLE','MLE.F', 'MLE.2', 'EM', 'EM.F', 'EM.2','EMV')
temp[temp == 0] = NA
temp[,-c(6,9)] %>% kable() %>% kable_styling()

True	Trunc	m.1by1	MLE	MLE.F	EM	EM.F	EMV
-26039.92	-26130.50	-26112.94	NA	-26265.49	-26120.81	NA	-26136.19
-25669.59	-26997.92	-26950.67	-26028.96	NA	-25859.86	NA	-26967.47
-27473.71	-27547.11	-27535.67	-27465.76	NA	-27463.43	NA	-27551.75
-28215.48	-28604.98	-28646.72	-28301.41	NA	-28338.57	NA	-28669.24
-24988.68	-25236.18	-25110.41	-25056.59	NA	-25048.86	NA	-25123.96
-24299.89	-24978.79	-24972.25	NA	-24492.32	NA	-24478.09	-25020.38
-27574.71	-27698.40	-27662.48	-27517.65	NA	-27540.65	NA	-27684.83
-27941.86	-28159.65	-28182.04	-27979.99	NA	-27953.02	NA	-28222.45
-29788.75	-29824.45	-29921.87	-29760.81	NA	-29759.50	NA	-29954.96
-28542.34	-28922.58	-29163.51	-28537.96	NA	-28550.73	NA	-29221.61

ROC

par(mfrow=c(1,2))
for(i in 1:times){
  plotROC(result[[i]]$data$B, result[[i]]$m.model, mle.pd = mle.pd[i], em.pd = em.pd[i], title=paste0('Data', i, ' '))
}

RRMSE

par(mfrow=c(1,2))
for(i in 1:times){
  rrmse = rbind(RRMSE(result[[i]]$data$B, result[[i]]$data$Bhat, result[[i]]$m.model))
  barplot(rrmse, ylim=c(0,(1+max(rrmse))/2), las=2, cex.names = 0.7, main='RRMSE')
}

Expand here to see past versions of unnamed-chunk-31-1.png:

Version	Author	Date
802266a	zouyuxin	2018-08-15

Expand here to see past versions of unnamed-chunk-31-2.png:

Version	Author	Date
802266a	zouyuxin	2018-08-15

Expand here to see past versions of unnamed-chunk-31-3.png:

Version	Author	Date
802266a	zouyuxin	2018-08-15

Expand here to see past versions of unnamed-chunk-31-4.png:

Version	Author	Date
802266a	zouyuxin	2018-08-15

Expand here to see past versions of unnamed-chunk-31-5.png:

Version	Author	Date
802266a	zouyuxin	2018-08-15

Session information

sessionInfo()

R version 3.5.1 (2018-07-02)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS High Sierra 10.13.6

Matrix products: default
BLAS: /Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] kableExtra_0.9.0 knitr_1.20       mashr_0.2-11     ashr_2.2-10     

loaded via a namespace (and not attached):
 [1] corrplot_0.84            lattice_0.20-35         
 [3] Rmosek_8.0.69            colorspace_1.3-2        
 [5] htmltools_0.3.6          viridisLite_0.3.0       
 [7] yaml_2.2.0               rlang_0.2.1             
 [9] R.oo_1.22.0              pillar_1.3.0            
[11] R.utils_2.6.0            ExtremeDeconvolution_1.3
[13] REBayes_1.3              foreach_1.4.4           
[15] plyr_1.8.4               stringr_1.3.1           
[17] munsell_0.5.0            workflowr_1.1.1         
[19] rvest_0.3.2              R.methodsS3_1.7.1       
[21] mvtnorm_1.0-8            codetools_0.2-15        
[23] evaluate_0.11            pscl_1.5.2              
[25] doParallel_1.0.11        parallel_3.5.1          
[27] highr_0.7                Rcpp_0.12.18            
[29] readr_1.1.1              backports_1.1.2         
[31] scales_0.5.0             rmeta_3.0               
[33] clusterGeneration_1.3.4  truncnorm_1.0-8         
[35] hms_0.4.2                digest_0.6.15           
[37] stringi_1.2.4            grid_3.5.1              
[39] rprojroot_1.3-2          tools_3.5.1             
[41] magrittr_1.5             tibble_1.4.2            
[43] crayon_1.3.4             whisker_0.3-2           
[45] pkgconfig_2.0.1          MASS_7.3-50             
[47] Matrix_1.2-14            SQUAREM_2017.10-1       
[49] xml2_1.2.0               assertthat_0.2.0        
[51] rmarkdown_1.10           httr_1.3.1              
[53] rstudioapi_0.7           iterators_1.0.10        
[55] R6_2.2.2                 git2r_0.23.0            
[57] compiler_3.5.1

This reproducible R Markdown analysis was created with workflowr 1.1.1

Estimate cor—max MASH

Yuxin Zou

2018-07-25

One example: p = 3

Error

mash log likelihood

ROC

RRMSE

More simulations: p = 5

Error

Time

mash log likelihood

ROC

RRMSE

Session information