Last updated: 2020-03-30

Checks: 7 passed, 0 failed

Knit directory: mr_mash_test/

This reproducible R Markdown analysis was created with workflowr (version 1.6.1). The Checks tab describes the reproducibility checks that were applied when the results were created. The Past versions tab lists the development history.

R Markdown file: up-to-date

Great! Since the R Markdown file has been committed to the Git repository, you know the exact version of the code that produced these results.

Environment: empty

Great job! The global environment was empty. Objects defined in the global environment can affect the analysis in your R Markdown file in unknown ways. For reproducibility it’s best to always run the code in an empty environment.

Seed: set.seed(20200328)

The command set.seed(20200328) was run prior to running the code in the R Markdown file. Setting a seed ensures that any results that rely on randomness, e.g. subsampling or permutations, are reproducible.

Session information: recorded

Great job! Recording the operating system, R version, and package versions is critical for reproducibility.

Cache: none

Nice! There were no cached chunks for this analysis, so you can be confident that you successfully produced the results during this run.

File paths: relative

Great job! Using relative paths to the files within your workflowr project makes it easier to run your code on other machines.

Repository version: cf442a2

Great! You are using Git for version control. Tracking code development and connecting the code version to the results is critical for reproducibility.

The results in this page were generated with repository version cf442a2. See the Past versions tab to see a history of the changes made to the R Markdown and HTML files.

Note that you need to be careful to ensure that all relevant files for the analysis have been committed to Git prior to generating the results (you can use wflow_publish or wflow_git_commit). workflowr only checks the R Markdown file, but you know if there are other scripts or data files that it depends on. Below is the status of the Git repository when the results were generated:


Ignored files:
    Ignored:    code/fit_mr_mash.66662433.err
    Ignored:    code/fit_mr_mash.66662433.out

Note that any generated files, e.g. HTML, png, CSS, etc., are not included in this status report because it is ok for generated content to have uncommitted changes.

These are the previous versions of the repository in which changes were made to the R Markdown (analysis/results_accuracy.Rmd) and HTML (docs/results_accuracy.html) files. If you’ve configured a remote Git repository (see ?wflow_git_remote), click on the hyperlinks in the table below to view the files as they were in that past version.

File	Version	Author	Date	Message
Rmd	cf442a2	fmorgante	2020-03-30	Fix prediction results
html	4e9aad6	fmorgante	2020-03-30	Build site.
Rmd	1f94def	fmorgante	2020-03-30	Tweak prediction results
html	7911e81	fmorgante	2020-03-30	Build site.
Rmd	a6fd418	fmorgante	2020-03-30	Add prediction results
html	e1707b2	fmorgante	2020-03-30	Build site.
Rmd	2b58162	fmorgante	2020-03-30	Start adding prediction results
html	f2451af	fmorgante	2020-03-29	Build site.
html	2fe7214	fmorgante	2020-03-29	Build site.
Rmd	d43e6a6	fmorgante	2020-03-29	Set current date automatically
html	dbf0d8e	fmorgante	2020-03-29	Build site.
html	c71dfff	fmorgante	2020-03-29	Build site.
Rmd	bafd72f	fmorgante	2020-03-29	Modify titles and author
html	796c93e	fmorgante	2020-03-29	Build site.
Rmd	ed9843a	fmorgante	2020-03-29	Add additional pages

options(stringsAsFactors = FALSE)

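# Mean squared error between observed (Y) and predicted (Yhat) values,
# computed separately for each response (column).
mse <- function(Y, Yhat) {
    msee <- rep(NA, ncol(Y))
    
    for(i in seq_len(ncol(Y))){
        msee[i] <- mean((Y[, i] - Yhat[, i])^2)
    }
    
    return(msee)
}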

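# Per-response accuracy from regressing observed values (Y) on predictions (Yhat):
# bias is the regression slope (1 means unbiased) and r2 is the R-squared.
accuracy <- function(Y, Yhat) {
  bias <- rep(NA, ncol(Y))
  r2 <- rep(NA, ncol(Y))
  
  for(i in seq_len(ncol(Y))){
    fit  <- lm(Y[, i] ~ Yhat[, i])
    bias[i] <- coef(fit)[2]  # slope of Y on Yhat
    r2[i] <- summary(fit)$r.squared
  }
  
  return(list(bias=bias, r2=r2))
}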
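As a quick sanity check on these metrics, the sketch below computes the same three quantities without explicit `lm()` fits, using the identities that the slope of a simple regression is cov(Y, Yhat) / var(Yhat) and its R-squared is the squared correlation. The toy matrices `Y_toy` and `Yhat_toy` are hypothetical and are not part of the analysis.

```r
# Toy data (hypothetical, for illustration only): 50 samples, 3 responses.
set.seed(1)
Y_toy    <- matrix(rnorm(150), nrow = 50, ncol = 3)
Yhat_toy <- Y_toy + matrix(rnorm(150, sd = 0.5), nrow = 50, ncol = 3)

# Per-response MSE in one vectorized step.
mse_vec <- colMeans((Y_toy - Yhat_toy)^2)

# Slope of Y on Yhat and R-squared, one value per response.
idx      <- seq_len(ncol(Y_toy))
bias_vec <- sapply(idx, function(i) cov(Y_toy[, i], Yhat_toy[, i]) / var(Yhat_toy[, i]))
r2_vec   <- sapply(idx, function(i) cor(Y_toy[, i], Yhat_toy[, i])^2)

# Cross-check the first response against an explicit lm() fit.
fit1 <- lm(Y_toy[, 1] ~ Yhat_toy[, 1])
slope_matches <- isTRUE(all.equal(unname(coef(fit1)[2]), bias_vec[1]))
r2_matches    <- isTRUE(all.equal(summary(fit1)$r.squared, r2_vec[1]))
```

A slope close to 1 indicates that predictions are on the same scale as the observations; slopes above 1 indicate predictions that are shrunk too strongly toward the mean.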

Simulation 1 – Shared effects, independent variables

dat1 <- readRDS("output/fit_mr_mash_n600_p1000_p_caus50_r5_pve0.5_sigmaoffdiag1_sigmascale0.8_gammaoffdiag0_gammascale0.8_Voffdiag0.2_Vscale0_updatew0TRUE_updatew0TRUE_updatew0methodmixsqp_updateVTRUE.rds")
n1 <- dat1$params$n
p1 <- dat1$params$p
p_causal1 <- dat1$params$p_causal
r1 <- dat1$params$r
pve1 <- dat1$params$pve
prop_testset1 <- dat1$params$prop_testset
B1 <- dat1$inputs$B
V1 <- dat1$inputs$V
Sigma1 <- dat1$inputs$Sigma
Gamma1 <- dat1$inputs$Gamma
Ytrain1 <- dat1$Ytrain
Ytest1 <- dat1$Ytest
mu11 <- dat1$fit$mu1
fitted1 <- dat1$fit$fitted
Yhat_test1 <- dat1$Yhat_test

The simulation below uses 600 samples, 1000 variables (of which 50 are causal), and 5 responses, each with a proportion of variance explained (PVE) of 0.5. The variables, X, were drawn from MVN(0, Gamma) and the causal effects, B, from MVN(0, Sigma). The responses, Y, were drawn from MN(XB, I, V), a matrix normal distribution with mean XB, independent rows, and covariance V among the responses.
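The generative model described above can be sketched in base R as follows. This is a minimal illustration, not the script used to produce the results: it hard-codes the structure shown below for this scenario (Gamma = 0.8 I, every entry of Sigma equal to 0.8, diagonal V) and assumes that the residual variance is chosen so that XB explains a fraction `pve` of each response's variance. All object names are hypothetical.

```r
set.seed(1)
n <- 600; p <- 1000; p_causal <- 50; r <- 5; pve <- 0.5

# X ~ MVN(0, Gamma) with Gamma = 0.8 * I, i.e. independent variables.
X <- matrix(rnorm(n * p, sd = sqrt(0.8)), nrow = n, ncol = p)

# B is zero except for p_causal randomly chosen rows. Every entry of Sigma is
# 0.8 (perfect correlation), so a causal variable has the same effect on all
# r responses: the length-p_causal vector is recycled across the columns.
B <- matrix(0, nrow = p, ncol = r)
causal <- sample(p, p_causal)
B[causal, ] <- rnorm(p_causal, sd = sqrt(0.8))

# Assumption: V = v * I, with v scaled so that XB accounts for a fraction
# pve of the variance of each response.
G <- X %*% B
v <- var(G[, 1]) * (1 - pve) / pve
E <- matrix(rnorm(n * r, sd = sqrt(v)), nrow = n, ncol = r)

# Y ~ MN(XB, I, V): independent rows, residual covariance V across responses.
Y <- G + E
```

Because the columns of `B` are identical in this scenario, the columns of `G` are identical too, so a single value of `v` gives the same PVE for every response.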

cat("Gamma (First 5 elements)")

Gamma (First 5 elements)

Gamma1[1:5, 1:5]

     [,1] [,2] [,3] [,4] [,5]
[1,]  0.8  0.0  0.0  0.0  0.0
[2,]  0.0  0.8  0.0  0.0  0.0
[3,]  0.0  0.0  0.8  0.0  0.0
[4,]  0.0  0.0  0.0  0.8  0.0
[5,]  0.0  0.0  0.0  0.0  0.8

cat("Sigma")

Sigma

Sigma1

     [,1] [,2] [,3] [,4] [,5]
[1,]  0.8  0.8  0.8  0.8  0.8
[2,]  0.8  0.8  0.8  0.8  0.8
[3,]  0.8  0.8  0.8  0.8  0.8
[4,]  0.8  0.8  0.8  0.8  0.8
[5,]  0.8  0.8  0.8  0.8  0.8

cat("V")

V1

         [,1]     [,2]     [,3]     [,4]     [,5]
[1,] 25.55836  0.00000  0.00000  0.00000  0.00000
[2,]  0.00000 25.55836  0.00000  0.00000  0.00000
[3,]  0.00000  0.00000 25.55836  0.00000  0.00000
[4,]  0.00000  0.00000  0.00000 25.55836  0.00000
[5,]  0.00000  0.00000  0.00000  0.00000 25.55836

mr.mash was fitted to the training data (80% of the data), updating V and updating the prior weights with mixSQP. Responses were then predicted for the test data (the remaining 20%).
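For reference, an 80/20 split of this kind can be sketched in base R as below. The matrices are hypothetical stand-ins for the simulated data, and the exact splitting code used for this analysis is not shown on this page, so treat this as one reasonable way to do it rather than the actual procedure.

```r
set.seed(1)
n <- 600
prop_testset <- 0.2

# Hypothetical stand-ins for the simulated variables and responses.
X <- matrix(rnorm(n * 10), nrow = n)
Y <- matrix(rnorm(n * 5), nrow = n)

# Randomly assign 20% of the samples to the test set.
test_idx <- sort(sample(n, size = round(n * prop_testset)))

Xtrain <- X[-test_idx, , drop = FALSE]
Ytrain <- Y[-test_idx, , drop = FALSE]
Xtest  <- X[test_idx, , drop = FALSE]
Ytest  <- Y[test_idx, , drop = FALSE]
```

The model is fitted on `Xtrain` and `Ytrain` only, so that accuracy on `Xtest` and `Ytest` reflects out-of-sample prediction.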

In the plots below, each color/symbol defines a different response.

Here, we compare the estimated effects with the true effects.

plot(B1[, 1], mu11[, 1], xlab="True effects", ylab="Estimated effects", main="True vs Estimated Effects", pch=1, cex.lab=1.5)
points(B1[, 2], mu11[, 2], col="blue", pch=2)
points(B1[, 3], mu11[, 3], col="red", pch=3)
points(B1[, 4], mu11[, 4], col="green", pch=4)
points(B1[, 5], mu11[, 5], col="yellow", pch=8)

Version	Author	Date
4e9aad6	fmorgante	2020-03-30
7911e81	fmorgante	2020-03-30
e1707b2	fmorgante	2020-03-30
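The repeated `points()` calls used for these plots can be collapsed into a small helper. The sketch below is a hypothetical convenience function (`plot_by_response` is not part of the analysis code) that sets the axis limits from all responses rather than from the first one only, and adds a legend mapping each color/symbol to a response. The toy matrices are for illustration only.

```r
# Hypothetical helper: scatter plot with one color/symbol per response (column).
plot_by_response <- function(x, y,
                             cols = c("black", "blue", "red", "green", "yellow"),
                             pchs = c(1, 2, 3, 4, 8), ...) {
  k <- seq_len(ncol(x))
  # Empty frame whose limits cover every response, then one layer per column.
  plot(range(x), range(y), type = "n", ...)
  for (i in k) {
    points(x[, i], y[, i], col = cols[i], pch = pchs[i])
  }
  legend("topleft", legend = paste("Response", k), col = cols[k], pch = pchs[k])
  invisible(NULL)
}

# Toy data (hypothetical): 20 effects for each of 5 responses.
set.seed(1)
B_toy  <- matrix(rnorm(100), ncol = 5)
mu_toy <- B_toy + matrix(rnorm(100, sd = 0.3), ncol = 5)

plot_by_response(B_toy, mu_toy, xlab = "True effects", ylab = "Estimated effects",
                 main = "True vs Estimated Effects", cex.lab = 1.5)
```

Calling `plot()` on the first response alone fixes the axis limits to that response, so points from other responses that fall outside that range are not drawn; using the range of all columns avoids this.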

Then, we compare the predicted responses with the true responses in the training data (left panel) and test data (right panel).

par(mfrow=c(1,2))
plot(Ytrain1[, 1], fitted1[, 1], xlab="True responses", ylab="Fitted values", main="True vs Fitted values \nTraining data", pch=1, cex.lab=1.5)
points(Ytrain1[, 2], fitted1[, 2], col="blue", pch=2)
points(Ytrain1[, 3], fitted1[, 3], col="red", pch=3)
points(Ytrain1[, 4], fitted1[, 4], col="green", pch=4)
points(Ytrain1[, 5], fitted1[, 5], col="yellow", pch=8)
abline(0, 1)

plot(Ytest1[, 1], Yhat_test1[, 1], xlab="True responses", ylab="Predicted responses", main="True vs Predicted Responses \nTest data", pch=1, cex.lab=1.5)
points(Ytest1[, 2], Yhat_test1[, 2], col="blue", pch=2)
points(Ytest1[, 3], Yhat_test1[, 3], col="red", pch=3)
points(Ytest1[, 4], Yhat_test1[, 4], col="green", pch=4)
points(Ytest1[, 5], Yhat_test1[, 5], col="yellow", pch=8)
abline(0, 1)

Version	Author	Date
4e9aad6	fmorgante	2020-03-30
7911e81	fmorgante	2020-03-30

par(mfrow=c(1,1))

cat("Training data r2 =", accuracy(Ytrain1, fitted1)$r2)

Training data r2 = 0.5337638 0.5418288 0.4904039 0.5238892 0.5620165

cat("Test data r2 =", accuracy(Ytest1, Yhat_test1)$r2)

Test data r2 = 0.4590632 0.4475933 0.4630462 0.4522742 0.4236621

cat("Training data bias =", accuracy(Ytrain1, fitted1)$bias)

Training data bias = 1.060574 1.058574 0.9935321 1.00128 1.097094

cat("Test data bias =", accuracy(Ytest1, Yhat_test1)$bias)

Test data bias = 0.989153 1.066026 1.066558 1.09046 0.9659183

cat("Training data MSE =", mse(Ytrain1, fitted1))

Training data MSE = 24.69854 23.85404 25.39129 22.46033 23.9106

cat("Test data MSE =", mse(Ytest1, Yhat_test1))

Test data MSE = 24.50541 29.489 27.16416 29.67339 26.74964
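To compare training and test accuracy at a glance, the values printed above can be collected into a single table. The sketch below copies the printed numbers by hand for illustration; in the analysis itself the same table could be built directly from the outputs of `accuracy()` and `mse()`.

```r
# Values copied from the output printed above (one row per response).
metrics <- data.frame(
  response   = 1:5,
  r2_train   = c(0.5337638, 0.5418288, 0.4904039, 0.5238892, 0.5620165),
  r2_test    = c(0.4590632, 0.4475933, 0.4630462, 0.4522742, 0.4236621),
  bias_train = c(1.060574, 1.058574, 0.9935321, 1.00128, 1.097094),
  bias_test  = c(0.989153, 1.066026, 1.066558, 1.09046, 0.9659183),
  mse_train  = c(24.69854, 23.85404, 25.39129, 22.46033, 23.9106),
  mse_test   = c(24.50541, 29.489, 27.16416, 29.67339, 26.74964)
)

print(metrics)

# Average of each metric across the five responses.
metric_means <- colMeans(metrics[, -1])
print(round(metric_means, 3))
```

In this run the test r2 is lower than the training r2 for every response, which is the expected direction for out-of-sample prediction.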

sessionInfo()

R version 3.5.1 (2018-07-02)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Scientific Linux 7.4 (Nitrogen)

Matrix products: default
BLAS/LAPACK: /software/openblas-0.2.19-el7-x86_64/lib/libopenblas_haswellp-r0.2.19.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

loaded via a namespace (and not attached):
 [1] workflowr_1.6.1 Rcpp_1.0.3      digest_0.6.23   later_0.7.5    
 [5] rprojroot_1.3-2 R6_2.4.1        backports_1.1.5 git2r_0.26.1   
 [9] magrittr_1.5    evaluate_0.12   stringi_1.4.3   fs_1.3.1       
[13] promises_1.0.1  whisker_0.3-2   rmarkdown_1.10  tools_3.5.1    
[17] stringr_1.4.0   glue_1.3.1      httpuv_1.4.5    yaml_2.2.0     
[21] compiler_3.5.1  htmltools_0.3.6 knitr_1.20

Prediction accuracy

Fabio Morgante

30 March, 2020
