Last updated: 2020-08-03
Checks: 7 0
Knit directory: misc/analysis/
This reproducible R Markdown analysis was created with workflowr (version 1.6.2). The Checks tab describes the reproducibility checks that were applied when the results were created. The Past versions tab lists the development history.
Great! Since the R Markdown file has been committed to the Git repository, you know the exact version of the code that produced these results.
Great job! The global environment was empty. Objects defined in the global environment can affect the analysis in your R Markdown file in unknown ways. For reproduciblity it’s best to always run the code in an empty environment.
The command set.seed(1)
was run prior to running the code in the R Markdown file. Setting a seed ensures that any results that rely on randomness, e.g. subsampling or permutations, are reproducible.
Great job! Recording the operating system, R version, and package versions is critical for reproducibility.
Nice! There were no cached chunks for this analysis, so you can be confident that you successfully produced the results during this run.
Great job! Using relative paths to the files within your workflowr project makes it easier to run your code on other machines.
Great! You are using Git for version control. Tracking code development and connecting the code version to the results is critical for reproducibility.
The results in this page were generated with repository version d41fc88. See the Past versions tab to see a history of the changes made to the R Markdown and HTML files.
Note that you need to be careful to ensure that all relevant files for the analysis have been committed to Git prior to generating the results (you can use wflow_publish
or wflow_git_commit
). workflowr only checks the R Markdown file, but you know if there are other scripts or data files that it depends on. Below is the status of the Git repository when the results were generated:
Ignored files:
Ignored: .DS_Store
Ignored: .Rhistory
Ignored: .Rproj.user/
Ignored: analysis/.RData
Ignored: analysis/.Rhistory
Ignored: analysis/ALStruct_cache/
Ignored: data/.Rhistory
Ignored: data/pbmc/
Untracked files:
Untracked: .dropbox
Untracked: Icon
Untracked: analysis/GHstan.Rmd
Untracked: analysis/GTEX-cogaps.Rmd
Untracked: analysis/PACS.Rmd
Untracked: analysis/Rplot.png
Untracked: analysis/SPCAvRP.rmd
Untracked: analysis/admm_02.Rmd
Untracked: analysis/admm_03.Rmd
Untracked: analysis/compare-transformed-models.Rmd
Untracked: analysis/cormotif.Rmd
Untracked: analysis/cp_ash.Rmd
Untracked: analysis/eQTL.perm.rand.pdf
Untracked: analysis/eb_prepilot.Rmd
Untracked: analysis/eb_var.Rmd
Untracked: analysis/ebpmf1.Rmd
Untracked: analysis/flash_test_tree.Rmd
Untracked: analysis/flash_tree.Rmd
Untracked: analysis/ieQTL.perm.rand.pdf
Untracked: analysis/lasso_em_03.Rmd
Untracked: analysis/m6amash.Rmd
Untracked: analysis/mash_bhat_z.Rmd
Untracked: analysis/mash_ieqtl_permutations.Rmd
Untracked: analysis/mixsqp.Rmd
Untracked: analysis/mr.ash_lasso_init.Rmd
Untracked: analysis/mr.mash.test.Rmd
Untracked: analysis/mr_ash_modular.Rmd
Untracked: analysis/mr_ash_parameterization.Rmd
Untracked: analysis/mr_ash_pen.Rmd
Untracked: analysis/mr_ash_ridge.Rmd
Untracked: analysis/nejm.Rmd
Untracked: analysis/normalize.Rmd
Untracked: analysis/pbmc.Rmd
Untracked: analysis/poisson_transform.Rmd
Untracked: analysis/pseudodata.Rmd
Untracked: analysis/qrnotes.txt
Untracked: analysis/ridge_iterative_02.Rmd
Untracked: analysis/ridge_iterative_splitting.Rmd
Untracked: analysis/samps/
Untracked: analysis/sc_bimodal.Rmd
Untracked: analysis/shrinkage_comparisons_changepoints.Rmd
Untracked: analysis/susie_en.Rmd
Untracked: analysis/susie_z_investigate.Rmd
Untracked: analysis/svd-timing.Rmd
Untracked: analysis/temp.RDS
Untracked: analysis/temp.Rmd
Untracked: analysis/test-figure/
Untracked: analysis/test.Rmd
Untracked: analysis/test.Rpres
Untracked: analysis/test.md
Untracked: analysis/test_qr.R
Untracked: analysis/test_sparse.Rmd
Untracked: analysis/z.txt
Untracked: code/multivariate_testfuncs.R
Untracked: code/rqb.hacked.R
Untracked: data/4matthew/
Untracked: data/4matthew2/
Untracked: data/E-MTAB-2805.processed.1/
Untracked: data/ENSG00000156738.Sim_Y2.RDS
Untracked: data/GDS5363_full.soft.gz
Untracked: data/GSE41265_allGenesTPM.txt
Untracked: data/Muscle_Skeletal.ACTN3.pm1Mb.RDS
Untracked: data/Thyroid.FMO2.pm1Mb.RDS
Untracked: data/bmass.HaemgenRBC2016.MAF01.Vs2.MergedDataSources.200kRanSubset.ChrBPMAFMarkerZScores.vs1.txt.gz
Untracked: data/bmass.HaemgenRBC2016.Vs2.NewSNPs.ZScores.hclust.vs1.txt
Untracked: data/bmass.HaemgenRBC2016.Vs2.PreviousSNPs.ZScores.hclust.vs1.txt
Untracked: data/eb_prepilot/
Untracked: data/finemap_data/fmo2.sim/b.txt
Untracked: data/finemap_data/fmo2.sim/dap_out.txt
Untracked: data/finemap_data/fmo2.sim/dap_out2.txt
Untracked: data/finemap_data/fmo2.sim/dap_out2_snp.txt
Untracked: data/finemap_data/fmo2.sim/dap_out_snp.txt
Untracked: data/finemap_data/fmo2.sim/data
Untracked: data/finemap_data/fmo2.sim/fmo2.sim.config
Untracked: data/finemap_data/fmo2.sim/fmo2.sim.k
Untracked: data/finemap_data/fmo2.sim/fmo2.sim.k4.config
Untracked: data/finemap_data/fmo2.sim/fmo2.sim.k4.snp
Untracked: data/finemap_data/fmo2.sim/fmo2.sim.ld
Untracked: data/finemap_data/fmo2.sim/fmo2.sim.snp
Untracked: data/finemap_data/fmo2.sim/fmo2.sim.z
Untracked: data/finemap_data/fmo2.sim/pos.txt
Untracked: data/logm.csv
Untracked: data/m.cd.RDS
Untracked: data/m.cdu.old.RDS
Untracked: data/m.new.cd.RDS
Untracked: data/m.old.cd.RDS
Untracked: data/mainbib.bib.old
Untracked: data/mat.csv
Untracked: data/mat.txt
Untracked: data/mat_new.csv
Untracked: data/matrix_lik.rds
Untracked: data/paintor_data/
Untracked: data/temp.txt
Untracked: data/y.txt
Untracked: data/y_f.txt
Untracked: data/zscore_jointLCLs_m6AQTLs_susie_eQTLpruned.rds
Untracked: data/zscore_jointLCLs_random.rds
Untracked: explore_udi.R
Untracked: output/fit.k10.rds
Untracked: output/fit.varbvs.RDS
Untracked: output/glmnet.fit.RDS
Untracked: output/test.bv.txt
Untracked: output/test.gamma.txt
Untracked: output/test.hyp.txt
Untracked: output/test.log.txt
Untracked: output/test.param.txt
Untracked: output/test2.bv.txt
Untracked: output/test2.gamma.txt
Untracked: output/test2.hyp.txt
Untracked: output/test2.log.txt
Untracked: output/test2.param.txt
Untracked: output/test3.bv.txt
Untracked: output/test3.gamma.txt
Untracked: output/test3.hyp.txt
Untracked: output/test3.log.txt
Untracked: output/test3.param.txt
Untracked: output/test4.bv.txt
Untracked: output/test4.gamma.txt
Untracked: output/test4.hyp.txt
Untracked: output/test4.log.txt
Untracked: output/test4.param.txt
Untracked: output/test5.bv.txt
Untracked: output/test5.gamma.txt
Untracked: output/test5.hyp.txt
Untracked: output/test5.log.txt
Untracked: output/test5.param.txt
Unstaged changes:
Modified: analysis/ash_delta_operator.Rmd
Modified: analysis/ash_pois_bcell.Rmd
Modified: analysis/lasso_em.Rmd
Modified: analysis/minque.Rmd
Modified: analysis/mr_missing_data.Rmd
Modified: analysis/ridge_admm.Rmd
Note that any generated files, e.g. HTML, png, CSS, etc., are not included in this status report because it is ok for generated content to have uncommitted changes.
These are the previous versions of the repository in which changes were made to the R Markdown (analysis/tree_pca_02.Rmd
) and HTML (docs/tree_pca_02.html
) files. If you’ve configured a remote Git repository (see ?wflow_git_remote
), click on the hyperlinks in the table below to view the files as they were in that past version.
File | Version | Author | Date | Message |
---|---|---|---|---|
Rmd | d41fc88 | Matthew Stephens | 2020-08-03 | workflowr::wflow_publish(“tree_pca_02.Rmd”) |
Following up on a previous analysis to look at how PCA behaves on some tree-structure covariance matrices.
I look at the simple 4-tip tree where the root first splits into 2 branches, and then each branch splits into two tips.
This function produces the covariance matrix of this tree, with branch lengths given by b
. The first two elements of b
give the left and right branches of the first split, etc (There are 6 branches in all.)
Sigma4_fun = function(b){
Sigma4 = matrix(0,nrow=4, ncol=4) # shared branch
Sigma4[1:2,1:2] = Sigma4[1:2,1:2] + b[1] # left
Sigma4[3:4,3:4] = Sigma4[3:4,3:4] + b[2] # right split
for(i in 1:4)
Sigma4[i,i] = Sigma4[i,i] + b[2+i]
Sigma4
}
Here we assume all the branch lengths are the same. We see that the PCA (eigenvectors) reproduce the tree.
Sigma4_even = Sigma4_fun(rep(1,6))
Sigma4_even
[,1] [,2] [,3] [,4]
[1,] 2 1 0 0
[2,] 1 2 0 0
[3,] 0 0 2 1
[4,] 0 0 1 2
eigen(Sigma4_even)
eigen() decomposition
$values
[1] 3 3 1 1
$vectors
[,1] [,2] [,3] [,4]
[1,] 0.0000000 0.7071068 0.0000000 0.7071068
[2,] 0.0000000 0.7071068 0.0000000 -0.7071068
[3,] 0.7071068 0.0000000 0.7071068 0.0000000
[4,] 0.7071068 0.0000000 -0.7071068 0.0000000
However, note that the equal eigenvalues (3,3) and (1,1) mean that there is ambiguity about these eigenvectors. Any linear cobination of these last two eigenvectors will also be an eigenvector. For example:
v=c(-1,1,-1,1)
Sigma4_even %*% v
[,1]
[1,] -1
[2,] 1
[3,] -1
[4,] 1
This means that if the truth is a tree like this, the representation learned could be quite prior dependent. It makes it more attractive to make assumptions like sparsity, or even a hierarchical structure on the loadings. (One long-term question is whether a sparsity assumption is enough, or whether we really need to put in a hierarchical assumption to recover trees.)
We can also center the covariance matrix before eigen-decomposition, which is analogous to including a constant factor. This way each eigenvector represents a “split” of the samples in the tree (rather than the top two branches being put in two different eigen vectors).
Sigma4_even_centered = Sigma4_even-mean(Sigma4_even)
eigen(Sigma4_even_centered)
eigen() decomposition
$values
[1] 3.000000e+00 1.000000e+00 1.000000e+00 2.664535e-15
$vectors
[,1] [,2] [,3] [,4]
[1,] -0.5 7.071068e-01 0.000000e+00 0.5
[2,] -0.5 -7.071068e-01 6.824628e-17 0.5
[3,] 0.5 -9.992007e-16 -7.071068e-01 0.5
[4,] 0.5 -1.054712e-15 7.071068e-01 0.5
By a (locally-)balanced tree I mean one where every split is balanced (although not all splits are the same). We see similar results in this case to the even tree (but this is not true for the locally-balanced 8 tip tree!)
Sigma4_balanced = Sigma4_fun(c(1,1,2,2,3,3))
Sigma4_balanced
[,1] [,2] [,3] [,4]
[1,] 3 1 0 0
[2,] 1 3 0 0
[3,] 0 0 4 1
[4,] 0 0 1 4
eigen(Sigma4_balanced)
eigen() decomposition
$values
[1] 5 4 3 2
$vectors
[,1] [,2] [,3] [,4]
[1,] 0.0000000 0.7071068 0.0000000 0.7071068
[2,] 0.0000000 0.7071068 0.0000000 -0.7071068
[3,] 0.7071068 0.0000000 0.7071068 0.0000000
[4,] 0.7071068 0.0000000 -0.7071068 0.0000000
eigen(Sigma4_balanced-mean(Sigma4_balanced))
eigen() decomposition
$values
[1] 4.55488611 3.00000000 2.00000000 -0.05488611
$vectors
[,1] [,2] [,3] [,4]
[1,] 0.4424561 0.000000e+00 7.071068e-01 -0.5515729
[2,] 0.4424561 -5.985912e-17 -7.071068e-01 -0.5515729
[3,] -0.5515729 -7.071068e-01 -3.885781e-16 -0.4424561
[4,] -0.5515729 7.071068e-01 -3.330669e-16 -0.4424561
With arbitrary branch lengths we don’t see the nice exact results. For example, the first eigen-vector, the loadings are not equal for all the samples that split the same way. However the general patterns are somewhat similar.
Sigma4_unbalanced = Sigma4_fun(c(1,2,3,4,5,6))
Sigma4_unbalanced
[,1] [,2] [,3] [,4]
[1,] 4 1 0 0
[2,] 1 5 0 0
[3,] 0 0 7 2
[4,] 0 0 2 8
eigen(Sigma4_unbalanced)
eigen() decomposition
$values
[1] 9.561553 5.618034 5.438447 3.381966
$vectors
[,1] [,2] [,3] [,4]
[1,] 0.0000000 0.5257311 0.0000000 0.8506508
[2,] 0.0000000 0.8506508 0.0000000 -0.5257311
[3,] 0.6154122 0.0000000 -0.7882054 0.0000000
[4,] 0.7882054 0.0000000 0.6154122 0.0000000
eigen(Sigma4_unbalanced-mean(Sigma4_unbalanced))
eigen() decomposition
$values
[1] 8.085568 5.441261 3.529426 -0.556255
$vectors
[,1] [,2] [,3] [,4]
[1,] 0.3181096 0.06396412 0.66046264 -0.6771292
[2,] 0.3959714 0.10834478 -0.74306199 -0.5285139
[3,] -0.4823772 -0.77854288 -0.09273996 -0.3906181
[4,] -0.7136702 0.61485038 -0.05520215 -0.3310386
Part of the issue is that with arbitrary branch lengths the different nodes do not have the same variance. Taking the correlation matrix instead of the covariance matrix seemms to make the results for irregular trees closer to the “regular” trees.
eigen(cov2cor(Sigma4_unbalanced)-mean(cov2cor(Sigma4_unbalanced)))
eigen() decomposition
$values
[1] 1.245816442 0.776393202 0.732738758 -0.000382422
$vectors
[,1] [,2] [,3] [,4]
[1,] 0.4911644 -7.071068e-01 0.000000e+00 -0.5086821
[2,] 0.4911644 7.071068e-01 -6.636048e-17 -0.5086821
[3,] -0.5086821 1.665335e-16 -7.071068e-01 -0.4911644
[4,] -0.5086821 3.330669e-16 7.071068e-01 -0.4911644
Here I create an 8-tip tree by joining two 4-tip trees with a top splitting branch whose branch lengths are given by b_top
.
Sigma8_fun = function(b_top,b_left,b_right){
Sigma8 = matrix(0,nrow=8,ncol=8)
Sigma8[1:4,1:4] = b_top[1] + Sigma4_fun(b_left) # left
Sigma8[5:8,5:8] = b_top[2] + Sigma4_fun(b_right) # right split
Sigma8
}
Note that for the 8-tip case, the (locally-)balanced trees don’t give such clean results as the 4-tip case. In particular, the first eigenvector varies among samples that split the same way. And the second eigenvector is not really reflecting the tree!
Sigma8_balanced = Sigma8_fun(c(1,1),c(2,2,3,3,4,4),c(5,5,6,6,7,7))
Sigma8_balanced
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
[1,] 6 3 1 1 0 0 0 0
[2,] 3 6 1 1 0 0 0 0
[3,] 1 1 7 3 0 0 0 0
[4,] 1 1 3 7 0 0 0 0
[5,] 0 0 0 0 12 6 1 1
[6,] 0 0 0 0 6 12 1 1
[7,] 0 0 0 0 1 1 13 6
[8,] 0 0 0 0 1 1 6 13
eigen(Sigma8_balanced)
eigen() decomposition
$values
[1] 20.561553 16.438447 11.561553 7.438447 7.000000 6.000000 4.000000
[8] 3.000000
$vectors
[,1] [,2] [,3] [,4] [,5] [,6]
[1,] 0.0000000 0.0000000 -0.4351621 0.5573454 0.000000e+00 0.000000e+00
[2,] 0.0000000 0.0000000 -0.4351621 0.5573454 0.000000e+00 0.000000e+00
[3,] 0.0000000 0.0000000 -0.5573454 -0.4351621 0.000000e+00 0.000000e+00
[4,] 0.0000000 0.0000000 -0.5573454 -0.4351621 0.000000e+00 0.000000e+00
[5,] 0.4351621 0.5573454 0.0000000 0.0000000 0.000000e+00 7.071068e-01
[6,] 0.4351621 0.5573454 0.0000000 0.0000000 1.369272e-17 -7.071068e-01
[7,] 0.5573454 -0.4351621 0.0000000 0.0000000 -7.071068e-01 -3.191891e-16
[8,] 0.5573454 -0.4351621 0.0000000 0.0000000 7.071068e-01 -3.191891e-16
[,7] [,8]
[1,] 0.000000e+00 7.071068e-01
[2,] -2.643072e-17 -7.071068e-01
[3,] -7.071068e-01 -3.885781e-16
[4,] 7.071068e-01 -3.885781e-16
[5,] 0.000000e+00 0.000000e+00
[6,] 0.000000e+00 0.000000e+00
[7,] 0.000000e+00 0.000000e+00
[8,] 0.000000e+00 0.000000e+00
eigen(Sigma8_balanced-mean(Sigma8_balanced))
eigen() decomposition
$values
[1] 17.394206 16.294806 7.514771 7.000000 6.000000 4.000000 3.000000
[8] -1.203784
$vectors
[,1] [,2] [,3] [,4] [,5]
[1,] -0.2134273 0.1091374 0.48446566 0.000000e+00 -4.100270e-16
[2,] -0.2134273 0.1091374 0.48446566 6.824628e-17 -4.909405e-18
[3,] -0.2361463 0.1222948 -0.51396174 -5.556812e-17 1.251287e-15
[4,] -0.2361463 0.1222948 -0.51396174 2.681578e-17 -5.824634e-16
[5,] 0.1717947 -0.6346232 -0.02512398 -6.742347e-18 7.071068e-01
[6,] 0.1717947 -0.6346232 -0.02512398 -2.757250e-17 -7.071068e-01
[7,] 0.6075939 0.2653043 -0.02247523 -7.071068e-01 2.335306e-16
[8,] 0.6075939 0.2653043 -0.02247523 7.071068e-01 2.335306e-16
[,6] [,7] [,8]
[1,] -9.969376e-17 7.071068e-01 -0.4558847
[2,] -1.193919e-16 -7.071068e-01 -0.4558847
[3,] 7.071068e-01 -8.881784e-16 -0.4063524
[4,] -7.071068e-01 -5.273559e-16 -0.4063524
[5,] -6.840255e-16 -2.220446e-16 -0.2590535
[6,] 9.257979e-16 -5.551115e-16 -0.2590535
[7,] -1.618737e-16 -2.775558e-16 -0.2448228
[8,] -1.618737e-16 -2.775558e-16 -0.2448228
Using the correlation matrix instead helps make the results cleaner and more “tree-like”:
eigen(cov2cor(Sigma8_balanced)-mean(cov2cor(Sigma8_balanced)))
eigen() decomposition
$values
[1] 1.711116942 1.320360834 1.154943654 0.571428571 0.538461538
[6] 0.500000000 0.500000000 -0.003206439
$vectors
[,1] [,2] [,3] [,4] [,5]
[1,] -0.3900132 -0.01923418 0.48904193 -3.038644e-16 0.000000e+00
[2,] -0.3900132 -0.01923418 0.48904193 2.368025e-16 -4.303177e-17
[3,] -0.3428881 -0.01237841 -0.50960415 7.071068e-01 2.720817e-15
[4,] -0.3428881 -0.01237841 -0.50960415 -7.071068e-01 -2.691877e-15
[5,] 0.3556323 -0.49240356 0.02097782 -5.244414e-15 -7.324422e-16
[6,] 0.3556323 -0.49240356 0.02097782 5.691283e-15 7.938681e-16
[7,] 0.3222470 0.50696701 0.02648651 2.471636e-15 -7.071068e-01
[8,] 0.3222470 0.50696701 0.02648651 -2.968457e-15 7.071068e-01
[,6] [,7] [,8]
[1,] 2.174194e-02 7.067724e-01 -0.3291774
[2,] -2.174194e-02 -7.067724e-01 -0.3291774
[3,] 7.133183e-15 1.040834e-15 -0.3501116
[4,] -7.219919e-15 1.290634e-15 -0.3501116
[5,] 7.067724e-01 -2.174194e-02 -0.3614199
[6,] -7.067724e-01 2.174194e-02 -0.3614199
[7,] -4.839878e-16 3.330669e-16 -0.3720749
[8,] 9.037909e-16 3.330669e-16 -0.3720749
Here I add a 9th “outgroup” to an even 8-tip tree to see what happens. I try eigen-decomposition of correlation matrix both with and without centering. In both cases the first eigen-vector has no weight on the 9th population. This might not get in the way of partition-based approaches because in both cases the first eigenvector partitions the samples consistent with the unrooted tree. However, it may complicate the issue of taking account of admixed populations; how to distinguish admixed vs an outgroup?
Sigma8_even = Sigma8_fun(c(1,1),rep(1,6),rep(1,6))
Sigma9 = rbind(cbind(1+Sigma8_even, rep(0,8)),rep(0,9))
Sigma9[9,9] = 1
Sigma9
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9]
[1,] 4 3 2 2 1 1 1 1 0
[2,] 3 4 2 2 1 1 1 1 0
[3,] 2 2 4 3 1 1 1 1 0
[4,] 2 2 3 4 1 1 1 1 0
[5,] 1 1 1 1 4 3 2 2 0
[6,] 1 1 1 1 3 4 2 2 0
[7,] 1 1 1 1 2 2 4 3 0
[8,] 1 1 1 1 2 2 3 4 0
[9,] 0 0 0 0 0 0 0 0 1
Sigma9_cor = cov2cor(Sigma9)
Sigma9_cor
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9]
[1,] 1.00 0.75 0.50 0.50 0.25 0.25 0.25 0.25 0
[2,] 0.75 1.00 0.50 0.50 0.25 0.25 0.25 0.25 0
[3,] 0.50 0.50 1.00 0.75 0.25 0.25 0.25 0.25 0
[4,] 0.50 0.50 0.75 1.00 0.25 0.25 0.25 0.25 0
[5,] 0.25 0.25 0.25 0.25 1.00 0.75 0.50 0.50 0
[6,] 0.25 0.25 0.25 0.25 0.75 1.00 0.50 0.50 0
[7,] 0.25 0.25 0.25 0.25 0.50 0.50 1.00 0.75 0
[8,] 0.25 0.25 0.25 0.25 0.50 0.50 0.75 1.00 0
[9,] 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 1
eigen(Sigma9_cor)
eigen() decomposition
$values
[1] 3.75 1.75 1.00 0.75 0.75 0.25 0.25 0.25 0.25
$vectors
[,1] [,2] [,3] [,4] [,5] [,6]
[1,] -0.3535534 0.3535534 0 0.000000e+00 5.000000e-01 7.071068e-01
[2,] -0.3535534 0.3535534 0 -2.296293e-16 5.000000e-01 -7.071068e-01
[3,] -0.3535534 0.3535534 0 1.442622e-16 -5.000000e-01 1.665335e-16
[4,] -0.3535534 0.3535534 0 3.633327e-16 -5.000000e-01 3.885781e-16
[5,] -0.3535534 -0.3535534 0 -5.000000e-01 -7.632783e-16 2.775558e-16
[6,] -0.3535534 -0.3535534 0 -5.000000e-01 -5.967449e-16 2.220446e-16
[7,] -0.3535534 -0.3535534 0 5.000000e-01 1.804112e-16 -1.665335e-16
[8,] -0.3535534 -0.3535534 0 5.000000e-01 -7.632783e-16 2.775558e-16
[9,] 0.0000000 0.0000000 1 0.000000e+00 0.000000e+00 0.000000e+00
[,7] [,8] [,9]
[1,] 0.000000e+00 0.000000e+00 0.000000e+00
[2,] 4.018513e-17 4.592587e-16 -5.740733e-18
[3,] 8.357616e-03 1.342745e-01 6.941905e-01
[4,] -8.357616e-03 -1.342745e-01 -6.941905e-01
[5,] -6.977392e-01 1.139040e-01 -1.363166e-02
[6,] 6.977392e-01 -1.139040e-01 1.363166e-02
[7,] -1.144119e-01 -6.848330e-01 1.338420e-01
[8,] 1.144119e-01 6.848330e-01 -1.338420e-01
[9,] 0.000000e+00 0.000000e+00 0.000000e+00
eigen(Sigma9_cor-mean(Sigma9_cor))
eigen() decomposition
$values
[1] 1.7500000 1.7358440 0.7500000 0.7500000 0.2500000 0.2500000 0.2500000
[8] 0.2500000 -0.4302884
$vectors
[,1] [,2] [,3] [,4] [,5]
[1,] -3.535534e-01 -0.2540634 4.983457e-01 -4.063962e-02 1.342371e-02
[2,] -3.535534e-01 -0.2540634 4.983457e-01 -4.063962e-02 -1.342371e-02
[3,] -3.535534e-01 -0.2540634 -4.983457e-01 4.063962e-02 -6.078290e-01
[4,] -3.535534e-01 -0.2540634 -4.983457e-01 4.063962e-02 6.078290e-01
[5,] 3.535534e-01 -0.2540634 4.063962e-02 4.983457e-01 3.522214e-01
[6,] 3.535534e-01 -0.2540634 4.063962e-02 4.983457e-01 -3.522214e-01
[7,] 3.535534e-01 -0.2540634 -4.063962e-02 -4.983457e-01 -7.939662e-02
[8,] 3.535534e-01 -0.2540634 -4.063962e-02 -4.983457e-01 7.939662e-02
[9,] -3.019807e-14 0.6954238 -1.693090e-15 2.983724e-16 -8.153200e-17
[,6] [,7] [,8] [,9]
[1,] -2.763869e-03 0.000000e+00 7.069739e-01 -0.2458694
[2,] 2.763869e-03 -1.791617e-16 -7.069739e-01 -0.2458694
[3,] -3.160485e-01 -1.747888e-01 1.030561e-02 -0.2458694
[4,] 3.160485e-01 1.747888e-01 -1.030561e-02 -0.2458694
[5,] -4.436719e-01 -4.231127e-01 -8.422330e-03 -0.2458694
[6,] 4.436719e-01 4.231127e-01 8.422330e-03 -0.2458694
[7,] 4.508447e-01 -5.389105e-01 3.270096e-03 -0.2458694
[8,] -4.508447e-01 5.389105e-01 -3.270096e-03 -0.2458694
[9,] -6.982262e-17 -7.597605e-17 1.110223e-15 -0.7185999
This section is a bit of a diversion and may be ignored… I just wanted to remind myself about the relationship between the pairwise distances and covariances.
Basically if \(D\) is a distance matrix, and \(L\) is the centering matrix (so premultiplying by \(L\) subtracts the) then \(W=-LDL'\) is symmetric positive semi-definite, and closely related to the covariance matrix. See Petkova et al for more discussion….
D4_fun = function(b){
D4 = matrix(0,nrow=4, ncol=4)
D4[1:2,3:4] = b[2] + b[3] # top split
D4[1,-1] = D4[1,-1] + b[4]
D4[2,-2] = D4[2,-2] + b[5]
D4[3,-3] = D4[3,-3] + b[6]
D4[4,-4] = D4[4,-4] + b[7]
D4 + t(D4)
}
D4_even = D4_fun(rep(1,7))
D4_even
[,1] [,2] [,3] [,4]
[1,] 0 2 4 4
[2,] 2 0 4 4
[3,] 4 4 0 2
[4,] 4 4 2 0
L = diag(4) - matrix(1/4,nrow=4,ncol=4)
W = -L %*% D4_even %*% t(L)
W
[,1] [,2] [,3] [,4]
[1,] 2.5 0.5 -1.5 -1.5
[2,] 0.5 2.5 -1.5 -1.5
[3,] -1.5 -1.5 2.5 0.5
[4,] -1.5 -1.5 0.5 2.5
0.5*W - Sigma4_even
[,1] [,2] [,3] [,4]
[1,] -0.75 -0.75 -0.75 -0.75
[2,] -0.75 -0.75 -0.75 -0.75
[3,] -0.75 -0.75 -0.75 -0.75
[4,] -0.75 -0.75 -0.75 -0.75
eigen(W)
eigen() decomposition
$values
[1] 6.000000e+00 2.000000e+00 2.000000e+00 5.329071e-15
$vectors
[,1] [,2] [,3] [,4]
[1,] -0.5 7.071068e-01 0.000000e+00 0.5
[2,] -0.5 -7.071068e-01 6.824628e-17 0.5
[3,] 0.5 -9.992007e-16 -7.071068e-01 0.5
[4,] 0.5 -1.054712e-15 7.071068e-01 0.5
D4_unbalanced_noshare = D4_fun(c(0,2,3,4,5,6,7))
D4_unbalanced_noshare
[,1] [,2] [,3] [,4]
[1,] 0 9 15 16
[2,] 9 0 16 17
[3,] 15 16 0 13
[4,] 16 17 13 0
W = -L %*% D4_unbalanced_noshare %*% t(L)
W
[,1] [,2] [,3] [,4]
[1,] 9.25 0.75 -4.75 -5.25
[2,] 0.75 10.25 -5.25 -5.75
[3,] -4.75 -5.25 11.25 -1.25
[4,] -5.25 -5.75 -1.25 12.25
eigen(W)
eigen() decomposition
$values
[1] 2.110302e+01 1.293894e+01 8.958048e+00 3.552714e-14
$vectors
[,1] [,2] [,3] [,4]
[1,] -0.4561794 0.03209393 0.73543887 0.5
[2,] -0.5383516 0.05393445 -0.67621645 0.5
[3,] 0.4358975 -0.74741486 -0.03693718 0.5
[4,] 0.5586334 0.66138648 -0.02228524 0.5
sessionInfo()
R version 3.6.0 (2019-04-26)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS Mojave 10.14.6
Matrix products: default
BLAS: /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRlapack.dylib
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] stats graphics grDevices utils datasets methods base
loaded via a namespace (and not attached):
[1] Rcpp_1.0.5 rstudioapi_0.11 whisker_0.4 knitr_1.29
[5] magrittr_1.5 workflowr_1.6.2 R6_2.4.1 rlang_0.4.7
[9] stringr_1.4.0 tools_3.6.0 xfun_0.16 git2r_0.27.1
[13] htmltools_0.5.0 ellipsis_0.3.1 yaml_2.2.1 digest_0.6.25
[17] rprojroot_1.3-2 tibble_3.0.3 lifecycle_0.2.0 crayon_1.3.4
[21] later_1.1.0.1 vctrs_0.3.2 fs_1.4.2 promises_1.1.1
[25] glue_1.4.1 evaluate_0.14 rmarkdown_2.3 stringi_1.4.6
[29] compiler_3.6.0 pillar_1.4.6 backports_1.1.8 httpuv_1.5.4
[33] pkgconfig_2.0.3