Last updated: 2020-09-28

Checks: 7 0

Knit directory: mr_mash_test/

This reproducible R Markdown analysis was created with workflowr (version 1.6.2). The Checks tab describes the reproducibility checks that were applied when the results were created. The Past versions tab lists the development history.

R Markdown file: up-to-date

Great! Since the R Markdown file has been committed to the Git repository, you know the exact version of the code that produced these results.

Environment: empty

Great job! The global environment was empty. Objects defined in the global environment can affect the analysis in your R Markdown file in unknown ways. For reproduciblity it’s best to always run the code in an empty environment.

Seed: set.seed(20200328)

The command set.seed(20200328) was run prior to running the code in the R Markdown file. Setting a seed ensures that any results that rely on randomness, e.g. subsampling or permutations, are reproducible.

Session information: recorded

Great job! Recording the operating system, R version, and package versions is critical for reproducibility.

Cache: none

Nice! There were no cached chunks for this analysis, so you can be confident that you successfully produced the results during this run.

File paths: relative

Great job! Using relative paths to the files within your workflowr project makes it easier to run your code on other machines.

Repository version: 686329d

Great! You are using Git for version control. Tracking code development and connecting the code version to the results is critical for reproducibility.

The results in this page were generated with repository version 686329d. See the Past versions tab to see a history of the changes made to the R Markdown and HTML files.

Note that you need to be careful to ensure that all relevant files for the analysis have been committed to Git prior to generating the results (you can use wflow_publish or wflow_git_commit). workflowr only checks the R Markdown file, but you know if there are other scripts or data files that it depends on. Below is the status of the Git repository when the results were generated:


Ignored files:
    Ignored:    .sos/
    Ignored:    code/assess_mrmash_speed.2835257.err
    Ignored:    code/assess_mrmash_speed.2835257.out
    Ignored:    code/assess_mrmash_speed.2914525.err
    Ignored:    code/assess_mrmash_speed.2914525.out
    Ignored:    code/fit_mr_mash.66662433.err
    Ignored:    code/fit_mr_mash.66662433.out
    Ignored:    dsc/.sos/
    Ignored:    dsc/dsc_mvreg_07_29_20.3339807.err
    Ignored:    dsc/dsc_mvreg_07_29_20.3339807.out
    Ignored:    dsc/dsc_mvreg_07_29_20.3340460.err
    Ignored:    dsc/dsc_mvreg_07_29_20.3340460.out
    Ignored:    dsc/dsc_mvreg_07_29_20.3341113.err
    Ignored:    dsc/dsc_mvreg_07_29_20.3341113.out
    Ignored:    dsc/dsc_mvreg_07_29_20.3352804.err
    Ignored:    dsc/dsc_mvreg_07_29_20.3352804.out
    Ignored:    dsc/outfiles/
    Ignored:    output/dsc.html
    Ignored:    output/dsc/
    Ignored:    output/dsc_05_18_20.html
    Ignored:    output/dsc_05_18_20/
    Ignored:    output/dsc_07_29_20.html
    Ignored:    output/dsc_07_29_20/
    Ignored:    output/dsc_OLD.html
    Ignored:    output/dsc_OLD/
    Ignored:    output/dsc_test.html
    Ignored:    output/dsc_test/
    Ignored:    output/dsc_test_inter/
    Ignored:    output/mvreg_all_genes_prior_corrX_indepV_sharedB_2blocksr10.html
    Ignored:    output/mvreg_all_genes_prior_corrX_indepV_sharedB_2blocksr10/
    Ignored:    output/mvreg_all_genes_prior_corrX_indepV_sharedB_2blocksr10_inter/
    Ignored:    output/mvreg_all_genes_prior_highcorrX_indepV_sharedB_2blocksr10.html
    Ignored:    output/mvreg_all_genes_prior_highcorrX_indepV_sharedB_2blocksr10/
    Ignored:    output/mvreg_all_genes_prior_highcorrX_indepV_sharedB_2blocksr10_inter/
    Ignored:    output/mvreg_all_genes_prior_indepX_indepV_sharedB_2blocksr10.html
    Ignored:    output/mvreg_all_genes_prior_indepX_indepV_sharedB_2blocksr10/
    Ignored:    output/mvreg_all_genes_prior_indepX_indepV_sharedB_2blocksr10_inter/
    Ignored:    output/mvreg_all_genes_prior_indepX_indepV_sharedB_3causalrespr10.html
    Ignored:    output/mvreg_all_genes_prior_indepX_indepV_sharedB_3causalrespr10/
    Ignored:    output/mvreg_all_genes_prior_indepX_indepV_sharedB_3causalrespr10_inter/
    Ignored:    output/test_dense_issue.rds
    Ignored:    output/test_sparse_issue.rds
    Ignored:    scripts/.sos/
    Ignored:    scripts/mvreg_all_genes_prior_corrX_indepV_sharedB_2blocksr10_pipeline.5297178.err
    Ignored:    scripts/mvreg_all_genes_prior_corrX_indepV_sharedB_2blocksr10_pipeline.5297178.out
    Ignored:    scripts/mvreg_all_genes_prior_corrX_indepV_sharedB_2blocksr10_pipeline.5297184.err
    Ignored:    scripts/mvreg_all_genes_prior_corrX_indepV_sharedB_2blocksr10_pipeline.5297184.out
    Ignored:    scripts/mvreg_all_genes_prior_corrX_indepV_sharedB_2blocksr10_pipeline.5299552.err
    Ignored:    scripts/mvreg_all_genes_prior_corrX_indepV_sharedB_2blocksr10_pipeline.5299552.out
    Ignored:    scripts/mvreg_all_genes_prior_corrX_indepV_sharedB_2blocksr10_pipeline.5300596.err
    Ignored:    scripts/mvreg_all_genes_prior_corrX_indepV_sharedB_2blocksr10_pipeline.5300596.out
    Ignored:    scripts/mvreg_all_genes_prior_corrX_indepV_sharedB_2blocksr10_pipeline.5300796.err
    Ignored:    scripts/mvreg_all_genes_prior_corrX_indepV_sharedB_2blocksr10_pipeline.5300796.out
    Ignored:    scripts/mvreg_all_genes_prior_highcorrX_indepV_sharedB_2blocksr10_pipeline.5305725.err
    Ignored:    scripts/mvreg_all_genes_prior_highcorrX_indepV_sharedB_2blocksr10_pipeline.5305725.out
    Ignored:    scripts/mvreg_all_genes_prior_highcorrX_indepV_sharedB_2blocksr10_pipeline.5305727.err
    Ignored:    scripts/mvreg_all_genes_prior_highcorrX_indepV_sharedB_2blocksr10_pipeline.5305727.out
    Ignored:    scripts/mvreg_all_genes_prior_highcorrX_indepV_sharedB_2blocksr10_pipeline.5308354.err
    Ignored:    scripts/mvreg_all_genes_prior_highcorrX_indepV_sharedB_2blocksr10_pipeline.5308354.out
    Ignored:    scripts/mvreg_all_genes_prior_highcorrX_indepV_sharedB_2blocksr10_pipeline.5309181.err
    Ignored:    scripts/mvreg_all_genes_prior_highcorrX_indepV_sharedB_2blocksr10_pipeline.5309181.out
    Ignored:    scripts/mvreg_all_genes_prior_highcorrX_indepV_sharedB_2blocksr10_pipeline.5310380.err
    Ignored:    scripts/mvreg_all_genes_prior_highcorrX_indepV_sharedB_2blocksr10_pipeline.5310380.out
    Ignored:    scripts/mvreg_all_genes_prior_indepX_indepV_sharedB_2blocksr10_pipeline.5290364.err
    Ignored:    scripts/mvreg_all_genes_prior_indepX_indepV_sharedB_2blocksr10_pipeline.5290364.out
    Ignored:    scripts/mvreg_all_genes_prior_indepX_indepV_sharedB_2blocksr10_pipeline.5290365.err
    Ignored:    scripts/mvreg_all_genes_prior_indepX_indepV_sharedB_2blocksr10_pipeline.5290365.out
    Ignored:    scripts/mvreg_all_genes_prior_indepX_indepV_sharedB_2blocksr10_pipeline.5293048.err
    Ignored:    scripts/mvreg_all_genes_prior_indepX_indepV_sharedB_2blocksr10_pipeline.5293048.out
    Ignored:    scripts/mvreg_all_genes_prior_indepX_indepV_sharedB_2blocksr10_pipeline.5294079.err
    Ignored:    scripts/mvreg_all_genes_prior_indepX_indepV_sharedB_2blocksr10_pipeline.5294079.out
    Ignored:    scripts/mvreg_all_genes_prior_indepX_indepV_sharedB_2blocksr10_pipeline.5295637.err
    Ignored:    scripts/mvreg_all_genes_prior_indepX_indepV_sharedB_2blocksr10_pipeline.5295637.out
    Ignored:    scripts/mvreg_all_genes_prior_indepX_indepV_sharedB_2blocksr10_pipeline.5297100.err
    Ignored:    scripts/mvreg_all_genes_prior_indepX_indepV_sharedB_2blocksr10_pipeline.5297100.out
    Ignored:    scripts/mvreg_all_genes_prior_indepX_indepV_sharedB_3causalrespr10_pipeline.5315542.err
    Ignored:    scripts/mvreg_all_genes_prior_indepX_indepV_sharedB_3causalrespr10_pipeline.5315542.out
    Ignored:    scripts/mvreg_all_genes_prior_indepX_indepV_sharedB_3causalrespr10_pipeline.5315575.err
    Ignored:    scripts/mvreg_all_genes_prior_indepX_indepV_sharedB_3causalrespr10_pipeline.5315575.out
    Ignored:    scripts/mvreg_all_genes_prior_indepX_indepV_sharedB_3causalrespr10_pipeline.5318936.err
    Ignored:    scripts/mvreg_all_genes_prior_indepX_indepV_sharedB_3causalrespr10_pipeline.5318936.out
    Ignored:    scripts/mvreg_all_genes_prior_indepX_indepV_sharedB_3causalrespr10_pipeline.5374687.err
    Ignored:    scripts/mvreg_all_genes_prior_indepX_indepV_sharedB_3causalrespr10_pipeline.5374687.out
    Ignored:    scripts/mvreg_all_genes_prior_indepX_indepV_sharedB_3causalrespr10_pipeline.5374902.err
    Ignored:    scripts/mvreg_all_genes_prior_indepX_indepV_sharedB_3causalrespr10_pipeline.5374902.out

Untracked files:
    Untracked:  analysis/dense_issue_investigation2/
    Untracked:  code/20200502_Prepare_ED_prior.ipynb
    Untracked:  code/plot_test.R
    Untracked:  dsc/mvreg_all_genes_prior_09_11_20_COPY.dsc

Unstaged changes:
    Deleted:    analysis/results_mvreg_all_genes_prior_indepX_indepV_sharedB_2blocksr10.Rmd
    Modified:   dsc/midway2.yml

Note that any generated files, e.g. HTML, png, CSS, etc., are not included in this status report because it is ok for generated content to have uncommitted changes.

These are the previous versions of the repository in which changes were made to the R Markdown (analysis/results_mvreg_all_genes_prior_allscenariosX_indepV_sharedB_3causalrespr10.Rmd) and HTML (docs/results_mvreg_all_genes_prior_allscenariosX_indepV_sharedB_3causalrespr10.html) files. If you’ve configured a remote Git repository (see ?wflow_git_remote), click on the hyperlinks in the table below to view the files as they were in that past version.

File	Version	Author	Date	Message
Rmd	686329d	fmorgante	2020-09-28	Fix bullets (maybe now?)
html	0cfeb33	fmorgante	2020-09-28	Build site.
Rmd	736065e	fmorgante	2020-09-28	Fix bullets (for real?)
html	98989fe	fmorgante	2020-09-28	Build site.
html	d3995e4	fmorgante	2020-09-28	Build site.
Rmd	4266133	fmorgante	2020-09-28	Change order of methods
html	a91280e	fmorgante	2020-09-28	Build site.
Rmd	b7427f7	fmorgante	2020-09-28	A few small changes
html	ec37fde	fmorgante	2020-09-28	Build site.
Rmd	d6974b3	fmorgante	2020-09-28	Add 3/10 causal responses results

###Set options
options(stringsAsFactors=FALSE)

###Load libraries
library(dscrutils)
library(ggplot2)
library(cowplot)
library(scales)

###Function to convert dscquery output from list to data.frame suitable for plotting
convert_dsc_to_dataframe <- function(dsc){
  ###Data.frame to store the results after convertion
  dsc_df <- data.frame()
  
  ###Get length of list elements 
  n_elem <- length(dsc$DSC)
  
  ###Loop through the dsc list
  for(i in 1:n_elem){
    ##Prepare vectors making up the final data frame
    r_scalar <- dsc$simulate.r[i]
    repp <- rep(dsc$DSC[i], times=r_scalar)
    n <- rep(dsc$simulate.n[i], times=r_scalar)
    p <- rep(dsc$simulate.p[i], times=r_scalar)
    p_causal <- rep(dsc$simulate.p_causal[i], times=r_scalar)
    r <- rep(dsc$simulate.r[i], times=r_scalar)
    response <- 1:r_scalar
    pve <- rep(dsc$simulate.pve[i], times=r_scalar)
    simulate <- rep(dsc$simulate[i], times=r_scalar)
    fit <- rep(dsc$fit[i], times=r_scalar)
    score <- rep(dsc$score[i], times=r_scalar)
    score.err <- dsc$score.err[[i]]
    timing <- rep(dsc$fit.time[i], times=r_scalar)
    
    ##Build the data frame
    df <- data.frame(rep=repp, n=n, p=p, p_num_caus=p_causal, r=r, response=response, pve=pve,  
                     scenario=simulate, method=fit, score_metric=score, score_value=score.err, time=timing)
    dsc_df <- rbind(dsc_df, df)
  }
  
  return(dsc_df)
}

###Function to compute rmse (relative to mr_mash_consec_em)
compute_rrmse <- function(dsc_plot, log10_scale=FALSE){
  dsc_plot <- transform(dsc_plot, experiment=paste(rep, response, scenario, sep="-"))
  t <- 0
  for (i in unique(dsc_plot$experiment)) {
    t <- t+1
    rmse_data  <- dsc_plot[which(dsc_plot$experiment == i & dsc_plot$score_metric=="scaled_mse"), ]
    mse_mr_mash_consec_em <- rmse_data[which(rmse_data$method=="mr_mash_em_can"), "score_value"]
    if(!log10_scale)
      rmse_data$score_value <- rmse_data$score_value/mse_mr_mash_consec_em
    else
      rmse_data$score_value <- log10(rmse_data$score_value/mse_mr_mash_consec_em)
    rmse_data$score_metric <- "rrmse"
    if(t>1){
      rmse_data_tot <- rbind(rmse_data_tot, rmse_data)
    } else if(t==1){
      rmse_data_tot <- rmse_data
    }
  }
  
  rmse_data_tot$experiment <- NULL
  
  return(rmse_data_tot)
}

###Function to shift legend in the empty facet
shift_legend <- function(p) {
  library(gtable)
  library(lemon)
  # check if p is a valid object
  if(!(inherits(p, "gtable"))){
    if(inherits(p, "ggplot")){
      gp <- ggplotGrob(p) # convert to grob
    } else {
      message("This is neither a ggplot object nor a grob generated from ggplotGrob. Returning original plot.")
      return(p)
    }
  } else {
    gp <- p
  }
  
  # check for unfilled facet panels
  facet.panels <- grep("^panel", gp[["layout"]][["name"]])
  empty.facet.panels <- sapply(facet.panels, function(i) "zeroGrob" %in% class(gp[["grobs"]][[i]]), 
                               USE.NAMES = F)
  empty.facet.panels <- facet.panels[empty.facet.panels]
  
  if(length(empty.facet.panels) == 0){
    message("There are no unfilled facet panels to shift legend into. Returning original plot.")
    return(p)
  }
  
  # establish name of empty panels
  empty.facet.panels <- gp[["layout"]][empty.facet.panels, ]
  names <- empty.facet.panels$name
  
  # return repositioned legend
  reposition_legend(p, 'center', panel=names)
}

###Set some quantities used in the following plots
colors <- c("skyblue", "dodgerblue", "limegreen", "green", "gold", "orange", "red", "firebrick", "darkmagenta", "mediumpurple")
facet_labels <- c(r2 = "r2", bias = "bias", rrmse="rRMSE")

###Load the dsc results
dsc_out <- dscquery("output/mvreg_all_genes_prior_indepX_indepV_sharedB_3causalrespr10", 
                    c("simulate.n", "simulate.p", "simulate.p_causal", "simulate.r",   
                      "simulate.w","simulate.r_causal", "simulate.pve", "simulate.B_cor",  
                      "simulate.B_scale", "simulate.X_cor", "simulate.X_scale",
                      "simulate.V_cor", "simulate", "fit", "score", "score.err", "fit.time"), 
                    groups="fit: mr_mash_em_can, mr_mash_em_data, mr_mash_em_dataAndcan, mlasso, mridge, menet", 
                    verbose=FALSE)

###Obtain simulation parameters
prop_testset <- 0.2
n <- unique(dsc_out$simulate.n)
p <- unique(dsc_out$simulate.p)
p_causal <- unique(dsc_out$simulate.p_causal)
r <- unique(dsc_out$simulate.r)
r_causal <- eval(parse(text=unique(dsc_out$simulate.r_causal)))
pve <- unique(dsc_out$simulate.pve)
w <- eval(parse(text=unique(dsc_out$simulate.w)))
B_cor <- eval(parse(text=unique(dsc_out$simulate.B_cor)))
B_scale <- eval(parse(text=unique(dsc_out$simulate.B_scale)))
X_cor <- unique(dsc_out$simulate.X_cor)
X_scale <- unique(dsc_out$simulate.X_scale)
V_cor <- unique(dsc_out$simulate.V_cor)

###Remove list elements that are not useful anymore
dsc_out$simulate.r_causal <- NULL
dsc_out$simulate.w <- NULL
dsc_out$simulate.B_cor <- NULL
dsc_out$simulate.B_scale <- NULL
dsc_out$simulate.X_cor <- NULL
dsc_out$simulate.X_scale <- NULL
dsc_out$simulate.V_cor <- NULL

Independent predictors

Simulation set up

The results below are based on simulations with 900 samples, 5000 variables of which 5 were causal, 10 responses with a per-response proportion of variance explained (PVE) of 0.2. Variables, X, were drawn from MVN(0, Gamma), where Gamma is such that it achieves a correlation between variables of 0 and a scale of 1. Causal effects, B, were drawn from MVN(0, Sigma), Sigma is such that it achieves a correlation between responses of 1. Response 1, 2, 3 have causal effects while the remaining seven responses do not have any causal effect. The responses, Y, were drawn from MN(XB, I, V), where V is such that it achieves a correlation between responses of 0 and a scale defined by PVE, in the case of cuasal responses. Non causal responses are from N(0, 1).

2000 such datasets (i.e., “genes”) were simulated and univariate summary statistics were obtained by simple linear regression in the training data (80% of the data. To mirror what would happen in real data analysis, the indexes of the training-test individuals were the same for all the datasets. However, since these 2000 datasets were simulated independently, I do not think it matters). These regression coefficients and standard errors were used as input in the mash pipeline (from Gao) to compute data-driven covariance matrices (up to the ED step included). In particular, the top variable per dataset was used to define a “strong” set and 4 random variables per dataset were used to define a “random” set. Covariance matrices were estimated using flash, PCA (including the top 3 PCs), and the empirical covariance matrix.

The first 50 datasets were used for the prediction analysis. mr.mash was fitted to the training data, updating V (imposing a diagonal structure) and updating the prior weights using EM updates. The mixture prior consisted of components defined by:

canonical matrices correpsonding to different settings of effect sharing/specificity (i.e., singletons, independent, low heterogeneity, medium heterogeneity, high heterogeneity, shared) plus the spike.
data-driven matrices estimated as described above plus the spike.
both canonical and data-driven matrices plus the spike.

The covariance matrices were scaled by a grid of values computed from the univariate summary statistics as in the mash paper. The posterior mean of the regression coefficients were initialized to the estimates of the group-LASSO. The mixture weights were initialized with the proportion of zero-coefficients from the group-LASSO estimate as the weight on the spike and the proportion of non-zero-coefficients split equally among the remaining components.Convergence was declared when the maximum difference in the ELBO between two successive iterations was smaller than 1e-2.

Then, responses were predicted on the test data (20% of the data).

Here, we evaluate the accuracy of prediction assessed by \(r^2\) and bias (slope) from the regression of the true response on the predicted response, and the relative root mean square error (rRMSE) scaled by the standard deviation of the true responses in the test data. The boxplots are across the 50 datasets, for each response separately.

mr.mash vs other methods

Here, we compare mr.mash to the multivariate versions of LASSO as implemented in glmnet. The form of the penalty id the following: \(\lambda[(1-\alpha)/2 ||\mathbf{\beta}_j||^2_2 + \alpha ||\mathbf{\beta}_j||_2]\). \(\lambda\) is chosen by cross-validation in the training set. All the methods were using 4 threads – mr.mash loops over the mixture components in parallel, glmnet loops over folds in parallel.

###Convert from list to data.frame for plotting
dsc_plots <- convert_dsc_to_dataframe(dsc_out)

###Compute rmse score (relative to mr_mash_consec_em) and add it to the data
rrmse_dat <- compute_rrmse(dsc_plots)
dsc_plots <- rbind(dsc_plots, rrmse_dat)

###Remove mse from scores and keep only methods wanted
dsc_plots <- dsc_plots[which(dsc_plots$score_metric!="scaled_mse" ), ]
dsc_plots <- dsc_plots[which(dsc_plots$method %in% c("mr_mash_em_can", "mr_mash_em_data",
                                                          "mr_mash_em_dataAndcan", "mlasso")), ]

###Create factor version of method
dsc_plots$method_fac <- factor(dsc_plots$method, levels=c("mr_mash_em_can", "mr_mash_em_data",
                                                          "mr_mash_em_dataAndcan", "mlasso"),
                                                labels=c("mr_mash_can", "mr_mash_data", "mr_mash_both",
                                                         "mlasso"))

###Build data.frame with best accuracy achievable
hlines <- data.frame(score_metric=c("r2", "bias", "rrmse"), max_val=c(unique(dsc_plots$pve), 1, 1))

###Create factor version of response
dsc_plots$response_fac <- as.factor(dsc_plots$response)

###Plot bias
p_bias <- ggplot(dsc_plots[which(dsc_plots$score_metric=="bias"), ], aes_string(x = "response_fac", y = "score_value", fill = "method_fac")) +
  geom_boxplot(color = "black", outlier.size = 1, width = 0.85) +
  scale_fill_manual(values = colors) +
  labs(x = "Response", y = "Bias", title = "Bias", fill="Method") +
  geom_hline(data=hlines[which(hlines$score_metric=="bias"), ], aes(yintercept = max_val), linetype="dashed", size=1) +
  theme_cowplot(font_size = 20) +
  theme(plot.title = element_text(hjust = 0.5))

print(p_bias)

Version	Author	Date
d3995e4	fmorgante	2020-09-28
a91280e	fmorgante	2020-09-28
ec37fde	fmorgante	2020-09-28

###Plot r2
p_r2 <- ggplot(dsc_plots[which(dsc_plots$score_metric=="r2"), ], aes_string(x = "response_fac", y = "score_value", fill = "method_fac")) +
  geom_boxplot(color = "black", outlier.size = 1, width = 0.85) +
  scale_fill_manual(values = colors) +
  labs(x = "Response", y = "r2", title = "r2", fill="Method") +
  geom_hline(data=hlines[which(hlines$score_metric=="r2"), ], aes(yintercept = max_val), linetype="dashed", size=1) +
  theme_cowplot(font_size = 20) +
  theme(plot.title = element_text(hjust = 0.5))

print(p_r2)

Version	Author	Date
d3995e4	fmorgante	2020-09-28
a91280e	fmorgante	2020-09-28
ec37fde	fmorgante	2020-09-28

###Plot rrmse
p_rrmse <- ggplot(dsc_plots[which(dsc_plots$score_metric=="rrmse"), ], aes_string(x = "response_fac", y = "score_value", fill = "method_fac")) +
  geom_boxplot(color = "black", outlier.size = 1, width = 0.85) +
  scale_fill_manual(values = colors) +
  labs(x = "Response", y = "rRMSE", title = "rMSE", fill="Method") +
  geom_hline(data=hlines[which(hlines$score_metric=="rrmse"), ], aes(yintercept = max_val), linetype="dashed", size=1) +
  theme_cowplot(font_size = 20) +
  theme(plot.title = element_text(hjust = 0.5))

print(p_rrmse)

Version	Author	Date
d3995e4	fmorgante	2020-09-28
a91280e	fmorgante	2020-09-28
ec37fde	fmorgante	2020-09-28

Let’s now remove outliers from the plots to make things a little clearer.

###Plot bias
p_bias <- ggplot(dsc_plots[which(dsc_plots$score_metric=="bias"), ], aes_string(x = "response_fac", y = "score_value", fill = "method_fac")) +
  Ipaper::geom_boxplot2(color = "black",width.errorbar = 0, width = 0.85) +  
  scale_fill_manual(values = colors) +
  labs(x = "Response", y = "Bias", title = "Bias", fill="Method") +
  geom_hline(data=hlines[which(hlines$score_metric=="bias"), ], aes(yintercept = max_val), linetype="dashed", size=1) +
  theme_cowplot(font_size = 20) +
  theme(plot.title = element_text(hjust = 0.5))

print(p_bias)

Version	Author	Date
d3995e4	fmorgante	2020-09-28
a91280e	fmorgante	2020-09-28

###Plot r2
p_r2 <- ggplot(dsc_plots[which(dsc_plots$score_metric=="r2"), ], aes_string(x = "response_fac", y = "score_value", fill = "method_fac")) +
  Ipaper::geom_boxplot2(color = "black",width.errorbar = 0, width = 0.85) +  
  scale_fill_manual(values = colors) +
  labs(x = "Response", y = "r2", title = "r2", fill="Method") +
  geom_hline(data=hlines[which(hlines$score_metric=="r2"), ], aes(yintercept = max_val), linetype="dashed", size=1) +
  theme_cowplot(font_size = 20) +
  theme(plot.title = element_text(hjust = 0.5))

print(p_r2)

Version	Author	Date
d3995e4	fmorgante	2020-09-28
a91280e	fmorgante	2020-09-28

###Plot rrmse
p_rrmse <- ggplot(dsc_plots[which(dsc_plots$score_metric=="rrmse"), ], aes_string(x = "response_fac", y = "score_value", fill = "method_fac")) +
  Ipaper::geom_boxplot2(color = "black",width.errorbar = 0, width = 0.85) +
  scale_fill_manual(values = colors) +
  labs(x = "Response", y = "rRMSE", title = "rMSE", fill="Method") +
  geom_hline(data=hlines[which(hlines$score_metric=="rrmse"), ], aes(yintercept = max_val), linetype="dashed", size=1) +
  theme_cowplot(font_size = 20) +
  theme(plot.title = element_text(hjust = 0.5))

print(p_rrmse)

Version	Author	Date
d3995e4	fmorgante	2020-09-28
a91280e	fmorgante	2020-09-28

Here, we look at the elapsed time (\(log_{10}\) seconds) of each method. Note that the mr.mash run time does not include the run time of group-LASSO (but should be considered since we used it to initialize mr.mash).

dsc_plots_time <- dsc_plots[which(dsc_plots$response==1 & dsc_plots$score_metric=="r2"), 
                          -which(colnames(dsc_plots) %in% c("score_metric", "score_value", "response"))]

p_time <- ggplot(dsc_plots_time, aes_string(x = "method_fac", y = "time", fill = "method_fac")) +
  geom_boxplot(color = "black", outlier.size = 1, width = 0.85) +
  scale_fill_manual(values = colors) +
  scale_y_continuous(trans="log10", breaks = trans_breaks("log10", function(x) 10^x),
                      labels = trans_format("log10", math_format(10^.x))) +
  labs(x = "", y = "Elapsed time (seconds) in log10 scale",title = "Run time", fill="Method") +
  theme_cowplot(font_size = 20) +
  theme(axis.line.x = element_blank(), axis.text.x = element_blank(), axis.ticks.x = element_blank(),
        plot.title = element_text(hjust = 0.5))

print(p_time)

Version	Author	Date
a91280e	fmorgante	2020-09-28
ec37fde	fmorgante	2020-09-28

In summary, mr.mash with the data-driven covariance matrices does very well and runs pretty fast. Using only the canonical covariance matrices is not as effective (as expected) in this case, but it’s still good. Using both types of covariance matrices does not add anything in terms of performance to using only the data-driven matrices. However, it makes the method much slower to run.

sessionInfo()

R version 3.5.1 (2018-07-02)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Scientific Linux 7.4 (Nitrogen)

Matrix products: default
BLAS/LAPACK: /software/openblas-0.2.19-el7-x86_64/lib/libopenblas_haswellp-r0.2.19.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] devtools_2.1.0  usethis_1.5.1   magrittr_1.5    scales_1.1.1   
[5] cowplot_1.0.0   ggplot2_3.3.2   dscrutils_0.4.2

loaded via a namespace (and not attached):
 [1] pkgload_1.1.0      jsonlite_1.6       foreach_1.4.4     
 [4] Ipaper_0.1.5       assertthat_0.2.1   sp_1.3-1          
 [7] cellranger_1.1.0   yaml_2.2.1         remotes_2.1.0     
[10] progress_1.2.2     sessioninfo_1.1.1  pillar_1.4.4      
[13] backports_1.1.8    lattice_0.20-38    glue_1.4.1        
[16] digest_0.6.25      RColorBrewer_1.1-2 promises_1.0.1    
[19] colorspace_1.4-1   htmltools_0.3.6    httpuv_1.4.5      
[22] plyr_1.8.6         clipr_0.4.1        pkgconfig_2.0.3   
[25] purrr_0.3.3        processx_3.4.2     whisker_0.3-2     
[28] openxlsx_4.1.0     later_0.7.5        git2r_0.26.1      
[31] tibble_3.0.1       farver_2.0.3       ellipsis_0.3.1    
[34] withr_2.2.0        repr_0.17          cli_2.0.2         
[37] crayon_1.3.4       readxl_1.1.0       memoise_1.1.0     
[40] evaluate_0.14      ps_1.3.3           fs_1.3.1          
[43] fansi_0.4.1        doParallel_1.0.14  xml2_1.2.0        
[46] pkgbuild_1.0.8     tools_3.5.1        data.table_1.12.8 
[49] prettyunits_1.1.1  hms_0.5.3          matrixStats_0.56.0
[52] lifecycle_0.2.0    stringr_1.4.0      munsell_0.5.0     
[55] zip_1.0.0          callr_3.4.3        compiler_3.5.1    
[58] rlang_0.4.6        grid_3.5.1         rstudioapi_0.11   
[61] iterators_1.0.10   base64enc_0.1-3    labeling_0.3      
[64] rmarkdown_1.10     boot_1.3-20        testthat_2.3.2    
[67] gtable_0.3.0       codetools_0.2-15   reshape2_1.4.4    
[70] R6_2.4.1           lubridate_1.7.4    knitr_1.20        
[73] dplyr_0.8.0.1      workflowr_1.6.2    rprojroot_1.3-2   
[76] desc_1.2.0         stringi_1.4.6      parallel_3.5.1    
[79] IRdisplay_0.6.1    Rcpp_1.0.5         vctrs_0.3.1       
[82] tidyselect_0.2.5

Effects structure with 3 causal responses out of 10

Fabio Morgante

September 28, 2020

Independent predictors

Simulation set up

mr.mash vs other methods