Last updated: 2026-05-17
Checks: 7 0
Knit directory: misc/
This reproducible R Markdown analysis was created with workflowr (version 1.7.2). The Checks tab describes the reproducibility checks that were applied when the results were created. The Past versions tab lists the development history.
Great! Since the R Markdown file has been committed to the Git repository, you know the exact version of the code that produced these results.
Great job! The global environment was empty. Objects defined in the global environment can affect the analysis in your R Markdown file in unknown ways. For reproduciblity it’s best to always run the code in an empty environment.
The command set.seed(20251108) was run prior to running
the code in the R Markdown file. Setting a seed ensures that any results
that rely on randomness, e.g. subsampling or permutations, are
reproducible.
Great job! Recording the operating system, R version, and package versions is critical for reproducibility.
Nice! There were no cached chunks for this analysis, so you can be confident that you successfully produced the results during this run.
Great job! Using relative paths to the files within your workflowr project makes it easier to run your code on other machines.
Great! You are using Git for version control. Tracking code development and connecting the code version to the results is critical for reproducibility.
The results in this page were generated with repository version f011595. See the Past versions tab to see a history of the changes made to the R Markdown and HTML files.
Note that you need to be careful to ensure that all relevant files for
the analysis have been committed to Git prior to generating the results
(you can use wflow_publish or
wflow_git_commit). workflowr only checks the R Markdown
file, but you know if there are other scripts or data files that it
depends on. Below is the status of the Git repository when the results
were generated:
Ignored files:
Ignored: .DS_Store
Ignored: .Rhistory
Ignored: .Rproj.user/
Ignored: analysis/presentation.html
Untracked files:
Untracked: Rplots.pdf
Untracked: analysis/eb_ica_backup.Rmd
Untracked: analysis/ebica_math.md
Untracked: analysis/ica_vs_flash_r1_rad copy.Rmd
Untracked: analysis/ica_vs_flash_r1_rad.Rmd
Untracked: analysis/ica_vs_flash_r1_update_2.Rmd
Untracked: analysis/index_bacup.Rmd
Untracked: analysis/preemble.tex
Untracked: analysis/presentation.qmd
Untracked: analysis/presentation_files/
Untracked: analysis/references.bib
Untracked: analysis/tree_example.Rmd
Untracked: code/references.bib
Untracked: code/utils.R
Unstaged changes:
Modified: .gitignore
Modified: analysis/eb_ica.Rmd
Modified: analysis/ica_vs_flash_r1_update_1.Rmd
Modified: analysis/ica_vs_flashier_r1_rad.Rmd
Modified: analysis/ica_with_noise.Rmd
Modified: code/ebcd_functions.R
Modified: code/ica_functions.R
Note that any generated files, e.g. HTML, png, CSS, etc., are not included in this status report because it is ok for generated content to have uncommitted changes.
These are the previous versions of the repository in which changes were
made to the R Markdown
(analysis/flashier_whitened_and_tree_example.Rmd) and HTML
(docs/flashier_whitened_and_tree_example.html) files. If
you’ve configured a remote Git repository (see
?wflow_git_remote), click on the hyperlinks in the table
below to view the files as they were in that past version.
| File | Version | Author | Date | Message |
|---|---|---|---|---|
| Rmd | f011595 | junmingguan | 2026-05-17 | wflow_publish(c("analysis/index.Rmd", "analysis/flashier_whitened_and_tree_example.Rmd")) |
In this analysis, we want to understand two things:
For the first question, we again look at the 9-overlapping-group example. The answer seems to depend on how many factors we keep in the whitening step. When we keep the correct number of factors, rank-1 flashierRad is more likely to find a true source, but it still does not perform as well as fastICA. This is probably due to the choice of noise variance. If we let flashier estimate the variance, it backfits everything to zero. If we fix the variance instead, we may need to run flashier on the whitened data first to estimate the noise, but flashier may again treat everything as noise. For now, I set it to the true noise level of the data matrix, which is not ideal.
For the second question, the tiny objective differences in the tree example appear to come from how directly each candidate contrast is represented in the whitened data with the intercept. Although all solutions have objectives very close to the maximum, they form distinct plateaus: the highest plateau corresponds to the trivial solution, the next to the 2-vs-2 top split, and the lower plateaus to different 1-vs-3 contrasts. The trivial and top-split solutions can be recovered directly from individual rows of the whitened-plus-intercept matrix, while the 1-vs-3 contrasts require combinations of several factors. These combinations are more error-prone, so their fitted values are slightly farther from exact +/-1 vectors, which lowers the log-cosh objective.
library(ebnm)
library(flashier)
library(Matrix)
library(fastICA)
Warning: package 'fastICA' was built under R version 4.3.3
library(fastTopics)
library(proxy)
Attaching package: 'proxy'
The following object is masked from 'package:Matrix':
as.matrix
The following objects are masked from 'package:stats':
as.dist, dist
The following object is masked from 'package:base':
as.matrix
library(pheatmap)
Warning: package 'pheatmap' was built under R version 4.3.3
source('code/ebcd_functions.R')
source('code/ica_functions.R')
source('code/ebnm_rademacher.R')
source('code/ebnm_point_rademacher.R')
source('code/utils.R')
flash_x_rademacher <- function(X, w, ebnm_fn = ebnm_normal, s = NULL) {
if (is.vector(w)) w <- matrix(w, ncol = 1)
S_init = t(X) %*% w
A_init = t(solve(crossprod(S_init), crossprod(S_init, t(X))))
if (is.null(s)) {
var_type = 0
} else {
var_type = NULL
}
res <- flash_init(X, var_type = var_type, S = s) |>
flash_factors_init(
init = list(A_init, S_init),
ebnm_fn = list(ebnm_fn, ebnm_rademacher)
) |>
flash_backfit(verbose = 0)
return(res)
}
K=9
p = 1000
n = 100
set.seed(1)
L = matrix(-1,nrow=n,ncol=K)
for(i in 1:K){L[sample(1:n,20),i]=1}
FF = matrix(rnorm(p*K), nrow = p, ncol=K)
X = L %*% t(FF) + rnorm(n*p,0,0.01)
plot(X %*% FF[,1])

Here we keep more factors than the true rank, so the whitened data still contain substantial noise.
X1 = preprocess(X,n.comp=20)
X1 = rbind(rep(1,ncol(X1)), X1)
init_W = matrix(0, ncol = 100, nrow = dim(X1)[1])
for (j in 1:100) {
set.seed(j)
init_W[,j] = rnorm(dim(X1)[1])
}
W = fastica_vectorized_rank_one_update(X1, init_W = init_W, include_G2 = TRUE)
obj = apply(W, 2, FUN = function(w) {compute_objective(X1, w)})
cor_max = apply(W, 2, FUN = function(w) {max(abs(cor(L, t(X1) %*% w)))})
sum(cor_max > 0.9)
[1] 17
plot(sort(obj))

which(obj > 0.43)
[1] 11 13 27 28 31 38 50 52 53 60 62 65 66 74 76 79 87
which(cor_max > 0.9)
[1] 11 12 13 27 28 31 38 50 52 53 60 62 66 74 76 79 87
Lhat = t(t(X1) %*% W)
obj_ = obj[obj>0.43]
Lhat_ = Lhat[obj>0.43, ]
plot_heatmap(t(Lhat_[order(obj_),]))

For seed 12, fastICA finds a Bernoulli-type solution that is highly correlated with a true source, but it does not maximize the objective.
set.seed(12)
init_w = rnorm(dim(X1)[1])
w = fastica_rank_one_update(X1, init_w)
max(abs(cor(L, t(X1) %*% w)))
[1] 0.9998218
compute_objective(X1, w)
[1] 0.310858
plot(t(X1) %*% w)

For seed 65, fastICA finds a binary solution with the maximum objective, but it does not correspond to any true source.
set.seed(65)
init_w = rnorm(dim(X1)[1])
w = fastica_rank_one_update(X1, init_w)
max(abs(cor(L, t(X1) %*% w)))
[1] 0.6123724
compute_objective(X1, w)
[1] 0.4337808
plot(t(X1) %*% w)

Rank-1 flashierRad applied to whitened data plus an intercept can sometimes find a true source, although not very often. The encouraging result is that solutions corresponding to a true source have the largest objectives, while solutions that do not correspond to a true source appear far from binary.
# ebnm_normal_fixed <- flash_ebnm(
# prior_family = "normal",
# mode = 0,
# scale = 1
# )
obj = c()
cor_max = c()
Lhat = matrix(0, nrow = 100, ncol = dim(X1)[2])
for (i in 1:100) {
set.seed(i)
init_w = rnorm(dim(X1)[1])
res = flash_x_rademacher(X1, init_w, s=0.01, ebnm_fn = ebnm_normal)
cor_max[i] = max(abs(cor(L, t(X1) %*% res$L_pm)))
obj[i] = compute_objective(X1, res$L_pm)
Lhat[i,] = as.vector(t(X1) %*% res$L_pm)
}
# sum(cor_max > 0.9)
plot(sort(obj))

which(obj > 0.43)
[1] 6 89
which(cor_max > 0.9)
[1] 6 89
# Lhat_harm = harminize_L_sign(Lhat, obj)
obj_ = obj[obj>0.43]
Lhat_ = Lhat[obj>0.43, ]
plot_heatmap(t(Lhat_[order(obj_),]))

For comparison, applying rank-1 flashierRad to the original data fails to recover a true source. As we have shown before, the solution is essentially the first PC.
obj = c()
cor_max = c()
X2 = t(scale(X,center=FALSE))
Lhat = matrix(0, nrow = 100, ncol = dim(X1)[2])
for (i in 1:100) {
set.seed(i)
init_w = rnorm(dim(X2)[1])
res = flash_x_rademacher(X2, init_w, s=0.01, ebnm_fn = ebnm_normal)
cor_max[i] = max(abs(cor(L, t(X2) %*% res$L_pm)))
obj[i] = compute_objective(X2, res$L_pm)
Lhat[i,] = as.vector(t(X2) %*% res$L_pm)
}
# sum(cor_max > 0.9)
plot(sort(obj))

which(obj > 0.43)
[1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18
[19] 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36
[37] 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54
[55] 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72
[73] 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90
[91] 91 92 93 94 95 96 97 98 99 100
which(cor_max > 0.9)
integer(0)
Examining the solution with seed 2:
set.seed(2)
init_w = rnorm(dim(X1)[1])
res = flash_x_rademacher(X1, init_w, s=0.01, ebnm_fn = ebnm_normal)
max(abs(cor(L, t(X1) %*% res$L_pm)))
[1] 0.3007622
compute_objective(X1, res$L_pm)
[1] 0.3224484
plot(t(X1) %*% res$L_pm)

plot(svd(X1)$u[,1], res$L_pm)

Here we also consider estimating the noise variance, but the fit is numerically zero.
obj = c()
cor_max = c()
for (i in 1:5) {
set.seed(i)
init_w = rnorm(dim(X1)[1])
res = flash_x_rademacher(X1, init_w, s=NULL, ebnm_fn = ebnm_normal)
cor_max[i] = max(abs(cor(L, t(X1) %*% res$L_pm)))
obj[i] = compute_objective(X1, res$L_pm)
}
--Estimate of factor 1 is numerically zero!
--Estimate of factor 1 is numerically zero!
--Estimate of factor 1 is numerically zero!
--Estimate of factor 1 is numerically zero!
sum(cor_max > 0.9)
[1] 0
Here we keep the correct number of factors, so the whitened data can be thought of as noise-free.
X1 = preprocess(X,n.comp=9)
X1 = rbind(rep(1,ncol(X1)), X1)
init_W = matrix(0, ncol = 100, nrow = dim(X1)[1])
for (j in 1:100) {
set.seed(j)
init_W[,j] = rnorm(dim(X1)[1])
}
W = fastica_vectorized_rank_one_update(X1, init_W = init_W, include_G2 = TRUE)
obj = apply(W, 2, FUN = function(w) {compute_objective(X1, w)})
cor_max = apply(W, 2, FUN = function(w) {max(abs(cor(L, t(X1) %*% w)))})
sum(cor_max > 0.9)
[1] 79
plot(sort(obj))

which(obj > 0.43)
[1] 1 2 3 4 5 6 7 8 9 12 14 17 18 19 20 21 23 25 26
[20] 27 28 30 34 35 36 38 39 40 41 42 43 44 47 48 49 50 52 53
[39] 54 56 57 58 59 60 62 63 64 65 66 67 68 69 70 71 72 73 74
[58] 75 78 79 80 81 82 83 85 86 87 88 90 91 92 93 94 95 96 97
[77] 98 99 100
which(cor_max > 0.9)
[1] 1 2 3 4 5 6 7 8 9 11 12 14 16 17 18 19 20 21 22
[20] 24 25 26 27 28 30 31 34 35 36 38 40 41 42 44 45 46 47 48
[39] 49 50 51 52 53 54 56 57 59 60 62 63 65 66 67 68 69 70 71
[58] 72 73 77 78 79 80 81 84 86 87 88 89 90 91 92 93 94 95 96
[77] 97 99 100
Lhat = t(t(X1) %*% W)
obj_ = obj[obj>0.43]
Lhat_ = Lhat[obj>0.43, ]
plot_heatmap(t(Lhat_[order(obj_),]))

Seeds with solutions that are highly correlated with one true source but have small objectives:
setdiff(which(cor_max > 0.9), which(obj > 0.425))
[1] 11 16 22 24 31 45 46 51 77 84 89
Seeds with high objective values that do not correspond to any true source. Some of them correspond to the trivial solution.
setdiff(which(obj > 0.425), which(cor_max > 0.9))
[1] 23 39 43 58 64 74 75 82 83 85 98
For seed 11, fastICA finds a Bernoulli-type solution that is highly correlated with a true source, but it does not maximize the objective.
set.seed(11)
init_w = rnorm(dim(X1)[1])
w = fastica_rank_one_update(X1, init_w)
max(abs(cor(L, t(X1) %*% w)))
[1] 0.9999999
compute_objective(X1, w)
[1] 0.3108558
plot(t(X1) %*% w)

For seed 23:
set.seed(23)
init_w = rnorm(dim(X1)[1])
w = fastica_rank_one_update(X1, init_w)
max(abs(cor(L, t(X1) %*% w)))
[1] 0.6123724
compute_objective(X1, w)
[1] 0.4337808
plot(t(X1) %*% w)

Rank-1 flashierRad applied to whitened data plus an intercept is more likely to find a true source with the “correct” whitening.
# ebnm_normal_fixed <- flash_ebnm(
# prior_family = "normal",
# mode = 0,
# scale = 1
# )
obj = c()
cor_max = c()
Lhat = matrix(0, nrow = 100, ncol = dim(X1)[2])
for (i in 1:100) {
set.seed(i)
init_w = rnorm(dim(X1)[1])
res = flash_x_rademacher(X1, init_w, s=0.01, ebnm_fn = ebnm_normal)
cor_max[i] = max(abs(cor(L, t(X1) %*% res$L_pm)))
obj[i] = compute_objective(X1, res$L_pm)
Lhat[i,] = as.vector(t(X1) %*% res$L_pm)
}
# sum(cor_max > 0.9)
plot(sort(obj))

which(obj > 0.43)
[1] 3 14 16 18 19 23 26 27 28 30 35 36 40 41 42 43 44 47 48
[20] 49 50 52 54 59 60 61 62 66 67 68 73 74 79 81 82 85 88 94
[39] 98 100
which(cor_max > 0.9)
[1] 3 14 16 18 26 27 28 30 35 36 44 49 50 52 54 59 60 61 62
[20] 67 68 73 79 81 82 85 88 94 98 100
obj_ = obj[obj>0.43]
Lhat_ = Lhat[obj>0.43, ]
plot_heatmap(t(Lhat_[order(obj_),]))

Similarly, if we estimate the noise variance, the fit is numerically zero.
obj = c()
cor_max = c()
for (i in 1:100) {
set.seed(i)
init_w = rnorm(dim(X1)[1])
res = flash_x_rademacher(X1, init_w, s=NULL, ebnm_fn = ebnm_normal)
cor_max[i] = max(abs(cor(L, t(X1) %*% res$L_pm)))
obj[i] = compute_objective(X1, res$L_pm)
}
--Estimate of factor 1 is numerically zero!
--Estimate of factor 1 is numerically zero!
--Estimate of factor 1 is numerically zero!
--Estimate of factor 1 is numerically zero!
--Estimate of factor 1 is numerically zero!
--Estimate of factor 1 is numerically zero!
--Estimate of factor 1 is numerically zero!
--Estimate of factor 1 is numerically zero!
--Estimate of factor 1 is numerically zero!
--Estimate of factor 1 is numerically zero!
--Estimate of factor 1 is numerically zero!
--Estimate of factor 1 is numerically zero!
--Estimate of factor 1 is numerically zero!
--Estimate of factor 1 is numerically zero!
--Estimate of factor 1 is numerically zero!
--Estimate of factor 1 is numerically zero!
--Estimate of factor 1 is numerically zero!
--Estimate of factor 1 is numerically zero!
--Estimate of factor 1 is numerically zero!
--Estimate of factor 1 is numerically zero!
--Estimate of factor 1 is numerically zero!
--Estimate of factor 1 is numerically zero!
--Estimate of factor 1 is numerically zero!
--Estimate of factor 1 is numerically zero!
--Estimate of factor 1 is numerically zero!
--Estimate of factor 1 is numerically zero!
--Estimate of factor 1 is numerically zero!
--Estimate of factor 1 is numerically zero!
--Estimate of factor 1 is numerically zero!
--Estimate of factor 1 is numerically zero!
--Estimate of factor 1 is numerically zero!
--Estimate of factor 1 is numerically zero!
--Estimate of factor 1 is numerically zero!
--Estimate of factor 1 is numerically zero!
--Estimate of factor 1 is numerically zero!
--Estimate of factor 1 is numerically zero!
--Estimate of factor 1 is numerically zero!
--Estimate of factor 1 is numerically zero!
--Estimate of factor 1 is numerically zero!
--Estimate of factor 1 is numerically zero!
--Estimate of factor 1 is numerically zero!
--Estimate of factor 1 is numerically zero!
sum(cor_max > 0.9)
[1] 0
# set up L to be a tree with 4 tips and 7 branches (including top shared branch)
set.seed(1)
n = 100
p = 1000
L = matrix(0,nrow=n,ncol=6)
L[1:50,1] = 1 #top split L
L[51:100,2] = 1 # top split R
L[1:25,3] = 1
L[26:50,4] = 1
L[51:75,5] = 1
L[76:100,6] = 1
FF = matrix(rnorm(p*6),nrow=p)
# FF <- qr.Q(qr(FF))
X = L %*% t(FF) + rnorm(n*p,0,0.01)
image(X%*% t(X))

X1 = preprocess(X,n.comp=4)
X1 = rbind(rep(1,ncol(X1)), X1)
Here we examine the bifurcating tree example and try to understand why there are tiny differences between the objectives of the different solutions.
To illustrate the phenomenon, we see that although the objectives of all 100 solutions are very close to the maximum, there appear to be several plateaus, each corresponding to one class of solutions. For example, the highest plateau corresponds to the trivial solution, the second corresponds to the 2-vs-2 top-split solutions, and the lower plateaus correspond to different types of 1-vs-3 solutions.
init_W = matrix(0, ncol = 100, nrow = dim(X1)[1])
for (j in 1:100) {
set.seed(j)
init_W[,j] = rnorm(dim(X1)[1])
}
W = fastica_vectorized_rank_one_update(X1, init_W = init_W, include_G2 = TRUE)
obj = apply(W, 2, FUN = function(w) {compute_objective(X1, w)})
cor_max = apply(W, 2, FUN = function(w) {max(abs(cor(L, t(X1) %*% w)))})
plot(sort(obj))

Lhat = t(t(X1) %*% W)
Lhat_harm = harminize_L_sign(Lhat, obj)
plot_heatmap(t(Lhat_harm))

If we look at the MSE of each solution relative to its closest population solution (an exact +/-1 vector), we see that the trend is opposite to what we see in the objective values. We know that log-cosh is maximized at the exact +/-1 solutions. Any slight perturbation from these exact solutions, due to the unit variance constraint and the geometry of the log-cosh function, results in a lower objective than the maximum.

Recall that in rank-1 fastICA, we seek \(\mathbf{w}\) such that \(\log \cosh (X_1^\top \mathbf{w})\) is maximized. Therefore, the columns of the whitened data with intercept, \(X_1\), are crucial for understanding this phenomenon. Indeed, the trivial solution and the top-split solution can be recovered directly from \(X_1\) using \(\mathbf{w}=(1,0,0,0)\) and \(\mathbf{w}=(0,1,0,0)\), while the 1-vs-3 contrasts require several factors and are therefore more error-prone.
plot_heatmap(t(X1))

sessionInfo()
R version 4.3.1 (2023-06-16)
Platform: aarch64-apple-darwin20 (64-bit)
Running under: macOS 26.2
Matrix products: default
BLAS: /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib/libRlapack.dylib; LAPACK version 3.11.0
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
time zone: America/Chicago
tzcode source: internal
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] pheatmap_1.0.13 proxy_0.4-29 fastTopics_0.7-44 fastICA_1.2-7
[5] Matrix_1.6-4 flashier_1.0.59 ebnm_1.1-42 workflowr_1.7.2
loaded via a namespace (and not attached):
[1] tidyselect_1.2.1 viridisLite_0.4.3 dplyr_1.1.4
[4] farver_2.1.2 fastmap_1.2.0 lazyeval_0.2.3
[7] promises_1.5.0 digest_0.6.37 lifecycle_1.0.5
[10] processx_3.8.7 invgamma_1.2 magrittr_2.0.5
[13] compiler_4.3.1 rlang_1.1.6 sass_0.4.10
[16] progress_1.2.3 tools_4.3.1 yaml_2.3.12
[19] data.table_1.17.8 knitr_1.51 prettyunits_1.2.0
[22] htmlwidgets_1.6.4 scatterplot3d_0.3-45 plyr_1.8.9
[25] RColorBrewer_1.1-3 Rtsne_0.17 purrr_1.0.4
[28] grid_4.3.1 git2r_0.36.2 colorspace_2.1-2
[31] ggplot2_3.5.2 scales_1.4.0 gtools_3.9.5
[34] cli_3.6.5 rmarkdown_2.31 crayon_1.5.3
[37] generics_0.1.4 otel_0.2.0 RcppParallel_5.1.11-2
[40] rstudioapi_0.18.0 httr_1.4.8 reshape2_1.4.5
[43] pbapply_1.7-4 cachem_1.1.0 stringr_1.6.0
[46] splines_4.3.1 parallel_4.3.1 softImpute_1.4-3
[49] matrixStats_1.5.0 vctrs_0.6.5 jsonlite_2.0.0
[52] callr_3.7.6 hms_1.1.4 mixsqp_0.3-54
[55] ggrepel_0.9.6 irlba_2.3.7 trust_0.1-9
[58] plotly_4.12.0 jquerylib_0.1.4 tidyr_1.3.2
[61] glue_1.8.0 cowplot_1.2.0 uwot_0.2.4
[64] stringi_1.8.7 Polychrome_1.5.1 gtable_0.3.6
[67] later_1.4.8 quadprog_1.5-8 tibble_3.3.1
[70] pillar_1.11.1 htmltools_0.5.9 truncnorm_1.0-9
[73] R6_2.6.1 rprojroot_2.1.1 evaluate_1.0.5
[76] lattice_0.22-9 RhpcBLASctl_0.23-42 SQUAREM_2026.1
[79] ashr_2.2-63 httpuv_1.6.17 bslib_0.10.0
[82] Rcpp_1.1.0 deconvolveR_1.2-1 whisker_0.4.1
[85] xfun_0.57 fs_1.6.6 getPass_0.2-4
[88] pkgconfig_2.0.3