Last updated: 2020-08-06
Checks: 7 0
Knit directory: misc/analysis/
This reproducible R Markdown analysis was created with workflowr (version 1.6.2). The Checks tab describes the reproducibility checks that were applied when the results were created. The Past versions tab lists the development history.
Great! Since the R Markdown file has been committed to the Git repository, you know the exact version of the code that produced these results.
Great job! The global environment was empty. Objects defined in the global environment can affect the analysis in your R Markdown file in unknown ways. For reproduciblity it’s best to always run the code in an empty environment.
The command set.seed(1)
was run prior to running the code in the R Markdown file. Setting a seed ensures that any results that rely on randomness, e.g. subsampling or permutations, are reproducible.
Great job! Recording the operating system, R version, and package versions is critical for reproducibility.
Nice! There were no cached chunks for this analysis, so you can be confident that you successfully produced the results during this run.
Great job! Using relative paths to the files within your workflowr project makes it easier to run your code on other machines.
Great! You are using Git for version control. Tracking code development and connecting the code version to the results is critical for reproducibility.
The results in this page were generated with repository version 3502017. See the Past versions tab to see a history of the changes made to the R Markdown and HTML files.
Note that you need to be careful to ensure that all relevant files for the analysis have been committed to Git prior to generating the results (you can use wflow_publish
or wflow_git_commit
). workflowr only checks the R Markdown file, but you know if there are other scripts or data files that it depends on. Below is the status of the Git repository when the results were generated:
Ignored files:
Ignored: .DS_Store
Ignored: .Rhistory
Ignored: .Rproj.user/
Ignored: analysis/.RData
Ignored: analysis/.Rhistory
Ignored: analysis/ALStruct_cache/
Ignored: data/.Rhistory
Ignored: data/pbmc/
Untracked files:
Untracked: .dropbox
Untracked: Icon
Untracked: analysis/GHstan.Rmd
Untracked: analysis/GTEX-cogaps.Rmd
Untracked: analysis/PACS.Rmd
Untracked: analysis/Rplot.png
Untracked: analysis/SPCAvRP.rmd
Untracked: analysis/admm_02.Rmd
Untracked: analysis/admm_03.Rmd
Untracked: analysis/compare-transformed-models.Rmd
Untracked: analysis/cormotif.Rmd
Untracked: analysis/cp_ash.Rmd
Untracked: analysis/eQTL.perm.rand.pdf
Untracked: analysis/eb_prepilot.Rmd
Untracked: analysis/eb_var.Rmd
Untracked: analysis/ebpmf1.Rmd
Untracked: analysis/flash_test_tree.Rmd
Untracked: analysis/flash_tree.Rmd
Untracked: analysis/ieQTL.perm.rand.pdf
Untracked: analysis/lasso_em_03.Rmd
Untracked: analysis/m6amash.Rmd
Untracked: analysis/mash_bhat_z.Rmd
Untracked: analysis/mash_ieqtl_permutations.Rmd
Untracked: analysis/mixsqp.Rmd
Untracked: analysis/mr.ash_lasso_init.Rmd
Untracked: analysis/mr.mash.test.Rmd
Untracked: analysis/mr_ash_modular.Rmd
Untracked: analysis/mr_ash_parameterization.Rmd
Untracked: analysis/mr_ash_pen.Rmd
Untracked: analysis/mr_ash_ridge.Rmd
Untracked: analysis/nejm.Rmd
Untracked: analysis/normalize.Rmd
Untracked: analysis/pbmc.Rmd
Untracked: analysis/poisson_transform.Rmd
Untracked: analysis/pseudodata.Rmd
Untracked: analysis/qrnotes.txt
Untracked: analysis/ridge_iterative_02.Rmd
Untracked: analysis/ridge_iterative_splitting.Rmd
Untracked: analysis/samps/
Untracked: analysis/sc_bimodal.Rmd
Untracked: analysis/shrinkage_comparisons_changepoints.Rmd
Untracked: analysis/susie_en.Rmd
Untracked: analysis/susie_z_investigate.Rmd
Untracked: analysis/svd-timing.Rmd
Untracked: analysis/temp.RDS
Untracked: analysis/temp.Rmd
Untracked: analysis/test-figure/
Untracked: analysis/test.Rmd
Untracked: analysis/test.Rpres
Untracked: analysis/test.md
Untracked: analysis/test_qr.R
Untracked: analysis/test_sparse.Rmd
Untracked: analysis/z.txt
Untracked: code/multivariate_testfuncs.R
Untracked: code/rqb.hacked.R
Untracked: data/4matthew/
Untracked: data/4matthew2/
Untracked: data/E-MTAB-2805.processed.1/
Untracked: data/ENSG00000156738.Sim_Y2.RDS
Untracked: data/GDS5363_full.soft.gz
Untracked: data/GSE41265_allGenesTPM.txt
Untracked: data/Muscle_Skeletal.ACTN3.pm1Mb.RDS
Untracked: data/Thyroid.FMO2.pm1Mb.RDS
Untracked: data/bmass.HaemgenRBC2016.MAF01.Vs2.MergedDataSources.200kRanSubset.ChrBPMAFMarkerZScores.vs1.txt.gz
Untracked: data/bmass.HaemgenRBC2016.Vs2.NewSNPs.ZScores.hclust.vs1.txt
Untracked: data/bmass.HaemgenRBC2016.Vs2.PreviousSNPs.ZScores.hclust.vs1.txt
Untracked: data/eb_prepilot/
Untracked: data/finemap_data/fmo2.sim/b.txt
Untracked: data/finemap_data/fmo2.sim/dap_out.txt
Untracked: data/finemap_data/fmo2.sim/dap_out2.txt
Untracked: data/finemap_data/fmo2.sim/dap_out2_snp.txt
Untracked: data/finemap_data/fmo2.sim/dap_out_snp.txt
Untracked: data/finemap_data/fmo2.sim/data
Untracked: data/finemap_data/fmo2.sim/fmo2.sim.config
Untracked: data/finemap_data/fmo2.sim/fmo2.sim.k
Untracked: data/finemap_data/fmo2.sim/fmo2.sim.k4.config
Untracked: data/finemap_data/fmo2.sim/fmo2.sim.k4.snp
Untracked: data/finemap_data/fmo2.sim/fmo2.sim.ld
Untracked: data/finemap_data/fmo2.sim/fmo2.sim.snp
Untracked: data/finemap_data/fmo2.sim/fmo2.sim.z
Untracked: data/finemap_data/fmo2.sim/pos.txt
Untracked: data/logm.csv
Untracked: data/m.cd.RDS
Untracked: data/m.cdu.old.RDS
Untracked: data/m.new.cd.RDS
Untracked: data/m.old.cd.RDS
Untracked: data/mainbib.bib.old
Untracked: data/mat.csv
Untracked: data/mat.txt
Untracked: data/mat_new.csv
Untracked: data/matrix_lik.rds
Untracked: data/paintor_data/
Untracked: data/temp.txt
Untracked: data/y.txt
Untracked: data/y_f.txt
Untracked: data/zscore_jointLCLs_m6AQTLs_susie_eQTLpruned.rds
Untracked: data/zscore_jointLCLs_random.rds
Untracked: explore_udi.R
Untracked: output/fit.k10.rds
Untracked: output/fit.varbvs.RDS
Untracked: output/glmnet.fit.RDS
Untracked: output/test.bv.txt
Untracked: output/test.gamma.txt
Untracked: output/test.hyp.txt
Untracked: output/test.log.txt
Untracked: output/test.param.txt
Untracked: output/test2.bv.txt
Untracked: output/test2.gamma.txt
Untracked: output/test2.hyp.txt
Untracked: output/test2.log.txt
Untracked: output/test2.param.txt
Untracked: output/test3.bv.txt
Untracked: output/test3.gamma.txt
Untracked: output/test3.hyp.txt
Untracked: output/test3.log.txt
Untracked: output/test3.param.txt
Untracked: output/test4.bv.txt
Untracked: output/test4.gamma.txt
Untracked: output/test4.hyp.txt
Untracked: output/test4.log.txt
Untracked: output/test4.param.txt
Untracked: output/test5.bv.txt
Untracked: output/test5.gamma.txt
Untracked: output/test5.hyp.txt
Untracked: output/test5.log.txt
Untracked: output/test5.param.txt
Unstaged changes:
Modified: analysis/ash_delta_operator.Rmd
Modified: analysis/ash_pois_bcell.Rmd
Modified: analysis/lasso_em.Rmd
Modified: analysis/minque.Rmd
Modified: analysis/mr_missing_data.Rmd
Modified: analysis/ridge_admm.Rmd
Note that any generated files, e.g. HTML, png, CSS, etc., are not included in this status report because it is ok for generated content to have uncommitted changes.
These are the previous versions of the repository in which changes were made to the R Markdown (analysis/tree_pca_02.Rmd
) and HTML (docs/tree_pca_02.html
) files. If you’ve configured a remote Git repository (see ?wflow_git_remote
), click on the hyperlinks in the table below to view the files as they were in that past version.
File | Version | Author | Date | Message |
---|---|---|---|---|
Rmd | 3502017 | Matthew Stephens | 2020-08-06 | workflowr::wflow_publish(“tree_pca_02.Rmd”) |
html | ea38312 | Matthew Stephens | 2020-08-06 | Build site. |
Rmd | d749eb4 | Matthew Stephens | 2020-08-06 | workflowr::wflow_publish(“tree_pca_02.Rmd”) |
html | 2091ae2 | Matthew Stephens | 2020-08-03 | Build site. |
Rmd | d41fc88 | Matthew Stephens | 2020-08-03 | workflowr::wflow_publish(“tree_pca_02.Rmd”) |
Following up on a previous analysis to look at how PCA behaves on some tree-structure covariance matrices.
I look at the simple 4-tip tree where the root first splits into 2 branches, and then each branch splits into two tips.
This function produces the covariance matrix of this tree, with branch lengths given by b
. The first two elements of b
give the left and right branches of the first split, etc (There are 6 branches in all.)
Sigma4_fun = function(b){
Sigma4 = matrix(0,nrow=4, ncol=4) # shared branch
Sigma4[1:2,1:2] = Sigma4[1:2,1:2] + b[1] # left
Sigma4[3:4,3:4] = Sigma4[3:4,3:4] + b[2] # right split
for(i in 1:4)
Sigma4[i,i] = Sigma4[i,i] + b[2+i]
Sigma4
}
Here we assume all the branch lengths are the same. We see that the PCA (eigenvectors) reproduce the tree.
Sigma4_even = Sigma4_fun(rep(1,6))
Sigma4_even
[,1] [,2] [,3] [,4]
[1,] 2 1 0 0
[2,] 1 2 0 0
[3,] 0 0 2 1
[4,] 0 0 1 2
eigen(Sigma4_even)
eigen() decomposition
$values
[1] 3 3 1 1
$vectors
[,1] [,2] [,3] [,4]
[1,] 0.0000000 0.7071068 0.0000000 0.7071068
[2,] 0.0000000 0.7071068 0.0000000 -0.7071068
[3,] 0.7071068 0.0000000 0.7071068 0.0000000
[4,] 0.7071068 0.0000000 -0.7071068 0.0000000
However, note that the equal eigenvalues (3,3) and (1,1) mean that there is ambiguity about these eigenvectors. Any linear cobination of these last two eigenvectors will also be an eigenvector. For example:
v=c(-1,1,-1,1)
Sigma4_even %*% v
[,1]
[1,] -1
[2,] 1
[3,] -1
[4,] 1
This means that if the truth is a tree like this, the representation learned could be quite prior dependent. It makes it more attractive to make assumptions like sparsity, or even a hierarchical structure on the loadings. (One long-term question is whether a sparsity assumption is enough, or whether we really need to put in a hierarchical assumption to recover trees.)
We can also center the covariance matrix before eigen-decomposition, which is analogous to including a constant factor. This way each eigenvector represents a “split” of the samples in the tree (rather than the top two branches being put in two different eigen vectors).
Sigma4_even_centered = Sigma4_even-mean(Sigma4_even)
eigen(Sigma4_even_centered)
eigen() decomposition
$values
[1] 3.000000e+00 1.000000e+00 1.000000e+00 2.664535e-15
$vectors
[,1] [,2] [,3] [,4]
[1,] -0.5 7.071068e-01 0.000000e+00 0.5
[2,] -0.5 -7.071068e-01 6.824628e-17 0.5
[3,] 0.5 -9.992007e-16 -7.071068e-01 0.5
[4,] 0.5 -1.054712e-15 7.071068e-01 0.5
By a (locally-)balanced tree I mean one where every split is balanced (although not all splits are the same). We see similar results in this case to the even tree (but this is not true for the locally-balanced 8 tip tree!)
Sigma4_balanced = Sigma4_fun(c(1,1,2,2,3,3))
Sigma4_balanced
[,1] [,2] [,3] [,4]
[1,] 3 1 0 0
[2,] 1 3 0 0
[3,] 0 0 4 1
[4,] 0 0 1 4
eigen(Sigma4_balanced)
eigen() decomposition
$values
[1] 5 4 3 2
$vectors
[,1] [,2] [,3] [,4]
[1,] 0.0000000 0.7071068 0.0000000 0.7071068
[2,] 0.0000000 0.7071068 0.0000000 -0.7071068
[3,] 0.7071068 0.0000000 0.7071068 0.0000000
[4,] 0.7071068 0.0000000 -0.7071068 0.0000000
eigen(Sigma4_balanced-mean(Sigma4_balanced))
eigen() decomposition
$values
[1] 4.55488611 3.00000000 2.00000000 -0.05488611
$vectors
[,1] [,2] [,3] [,4]
[1,] 0.4424561 0.000000e+00 7.071068e-01 -0.5515729
[2,] 0.4424561 -5.985912e-17 -7.071068e-01 -0.5515729
[3,] -0.5515729 -7.071068e-01 -3.885781e-16 -0.4424561
[4,] -0.5515729 7.071068e-01 -3.330669e-16 -0.4424561
With arbitrary branch lengths we don’t see the nice exact results. For example, the first eigen-vector, the loadings are not equal for all the samples that split the same way. However the general patterns are somewhat similar.
Sigma4_unbalanced = Sigma4_fun(c(1,2,3,4,5,6))
Sigma4_unbalanced
[,1] [,2] [,3] [,4]
[1,] 4 1 0 0
[2,] 1 5 0 0
[3,] 0 0 7 2
[4,] 0 0 2 8
eigen(Sigma4_unbalanced)
eigen() decomposition
$values
[1] 9.561553 5.618034 5.438447 3.381966
$vectors
[,1] [,2] [,3] [,4]
[1,] 0.0000000 0.5257311 0.0000000 0.8506508
[2,] 0.0000000 0.8506508 0.0000000 -0.5257311
[3,] 0.6154122 0.0000000 -0.7882054 0.0000000
[4,] 0.7882054 0.0000000 0.6154122 0.0000000
eigen(Sigma4_unbalanced-mean(Sigma4_unbalanced))
eigen() decomposition
$values
[1] 8.085568 5.441261 3.529426 -0.556255
$vectors
[,1] [,2] [,3] [,4]
[1,] 0.3181096 0.06396412 0.66046264 -0.6771292
[2,] 0.3959714 0.10834478 -0.74306199 -0.5285139
[3,] -0.4823772 -0.77854288 -0.09273996 -0.3906181
[4,] -0.7136702 0.61485038 -0.05520215 -0.3310386
Part of the issue is that with arbitrary branch lengths the different nodes do not have the same variance. Taking the correlation matrix instead of the covariance matrix seemms to make the results for irregular trees closer to the “regular” trees.
eigen(cov2cor(Sigma4_unbalanced)-mean(cov2cor(Sigma4_unbalanced)))
eigen() decomposition
$values
[1] 1.245816442 0.776393202 0.732738758 -0.000382422
$vectors
[,1] [,2] [,3] [,4]
[1,] 0.4911644 -7.071068e-01 0.000000e+00 -0.5086821
[2,] 0.4911644 7.071068e-01 -6.636048e-17 -0.5086821
[3,] -0.5086821 1.665335e-16 -7.071068e-01 -0.4911644
[4,] -0.5086821 3.330669e-16 7.071068e-01 -0.4911644
Here I create an 8-tip tree by joining two 4-tip trees with a top splitting branch whose branch lengths are given by b_top
.
Sigma8_fun = function(b_top,b_left,b_right){
Sigma8 = matrix(0,nrow=8,ncol=8)
Sigma8[1:4,1:4] = b_top[1] + Sigma4_fun(b_left) # left
Sigma8[5:8,5:8] = b_top[2] + Sigma4_fun(b_right) # right split
Sigma8
}
Note that for the 8-tip case, the (locally-)balanced trees don’t give such clean results as the 4-tip case. In particular, the first eigenvector varies among samples that split the same way. However if you take the strategy of splitting each eigenvector at its largest gap, then I think every eigenvector splits the tree according to some branch…
Sigma8_balanced = Sigma8_fun(c(1,1),c(2,2,3,3,4,4),c(5,5,6,6,7,7))
Sigma8_balanced
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
[1,] 6 3 1 1 0 0 0 0
[2,] 3 6 1 1 0 0 0 0
[3,] 1 1 7 3 0 0 0 0
[4,] 1 1 3 7 0 0 0 0
[5,] 0 0 0 0 12 6 1 1
[6,] 0 0 0 0 6 12 1 1
[7,] 0 0 0 0 1 1 13 6
[8,] 0 0 0 0 1 1 6 13
eigen(Sigma8_balanced)
eigen() decomposition
$values
[1] 20.561553 16.438447 11.561553 7.438447 7.000000 6.000000 4.000000
[8] 3.000000
$vectors
[,1] [,2] [,3] [,4] [,5] [,6]
[1,] 0.0000000 0.0000000 -0.4351621 0.5573454 0.000000e+00 0.000000e+00
[2,] 0.0000000 0.0000000 -0.4351621 0.5573454 0.000000e+00 0.000000e+00
[3,] 0.0000000 0.0000000 -0.5573454 -0.4351621 0.000000e+00 0.000000e+00
[4,] 0.0000000 0.0000000 -0.5573454 -0.4351621 0.000000e+00 0.000000e+00
[5,] 0.4351621 0.5573454 0.0000000 0.0000000 0.000000e+00 7.071068e-01
[6,] 0.4351621 0.5573454 0.0000000 0.0000000 1.369272e-17 -7.071068e-01
[7,] 0.5573454 -0.4351621 0.0000000 0.0000000 -7.071068e-01 -3.191891e-16
[8,] 0.5573454 -0.4351621 0.0000000 0.0000000 7.071068e-01 -3.191891e-16
[,7] [,8]
[1,] 0.000000e+00 7.071068e-01
[2,] -2.643072e-17 -7.071068e-01
[3,] -7.071068e-01 -3.885781e-16
[4,] 7.071068e-01 -3.885781e-16
[5,] 0.000000e+00 0.000000e+00
[6,] 0.000000e+00 0.000000e+00
[7,] 0.000000e+00 0.000000e+00
[8,] 0.000000e+00 0.000000e+00
eigen(Sigma8_balanced-mean(Sigma8_balanced))
eigen() decomposition
$values
[1] 17.394206 16.294806 7.514771 7.000000 6.000000 4.000000 3.000000
[8] -1.203784
$vectors
[,1] [,2] [,3] [,4] [,5]
[1,] -0.2134273 0.1091374 0.48446566 0.000000e+00 -4.100270e-16
[2,] -0.2134273 0.1091374 0.48446566 6.824628e-17 -4.909405e-18
[3,] -0.2361463 0.1222948 -0.51396174 -5.556812e-17 1.251287e-15
[4,] -0.2361463 0.1222948 -0.51396174 2.681578e-17 -5.824634e-16
[5,] 0.1717947 -0.6346232 -0.02512398 -6.742347e-18 7.071068e-01
[6,] 0.1717947 -0.6346232 -0.02512398 -2.757250e-17 -7.071068e-01
[7,] 0.6075939 0.2653043 -0.02247523 -7.071068e-01 2.335306e-16
[8,] 0.6075939 0.2653043 -0.02247523 7.071068e-01 2.335306e-16
[,6] [,7] [,8]
[1,] -9.969376e-17 7.071068e-01 -0.4558847
[2,] -1.193919e-16 -7.071068e-01 -0.4558847
[3,] 7.071068e-01 -8.881784e-16 -0.4063524
[4,] -7.071068e-01 -5.273559e-16 -0.4063524
[5,] -6.840255e-16 -2.220446e-16 -0.2590535
[6,] 9.257979e-16 -5.551115e-16 -0.2590535
[7,] -1.618737e-16 -2.775558e-16 -0.2448228
[8,] -1.618737e-16 -2.775558e-16 -0.2448228
Using the correlation matrix instead helps make the results cleaner and more “tree-like”:
eigen(cov2cor(Sigma8_balanced)-mean(cov2cor(Sigma8_balanced)))
eigen() decomposition
$values
[1] 1.711116942 1.320360834 1.154943654 0.571428571 0.538461538
[6] 0.500000000 0.500000000 -0.003206439
$vectors
[,1] [,2] [,3] [,4] [,5]
[1,] -0.3900132 -0.01923418 0.48904193 -3.038644e-16 0.000000e+00
[2,] -0.3900132 -0.01923418 0.48904193 2.368025e-16 -4.303177e-17
[3,] -0.3428881 -0.01237841 -0.50960415 7.071068e-01 2.720817e-15
[4,] -0.3428881 -0.01237841 -0.50960415 -7.071068e-01 -2.691877e-15
[5,] 0.3556323 -0.49240356 0.02097782 -5.244414e-15 -7.324422e-16
[6,] 0.3556323 -0.49240356 0.02097782 5.691283e-15 7.938681e-16
[7,] 0.3222470 0.50696701 0.02648651 2.471636e-15 -7.071068e-01
[8,] 0.3222470 0.50696701 0.02648651 -2.968457e-15 7.071068e-01
[,6] [,7] [,8]
[1,] 2.174194e-02 7.067724e-01 -0.3291774
[2,] -2.174194e-02 -7.067724e-01 -0.3291774
[3,] 7.133183e-15 1.040834e-15 -0.3501116
[4,] -7.219919e-15 1.290634e-15 -0.3501116
[5,] 7.067724e-01 -2.174194e-02 -0.3614199
[6,] -7.067724e-01 2.174194e-02 -0.3614199
[7,] -4.839878e-16 3.330669e-16 -0.3720749
[8,] 9.037909e-16 3.330669e-16 -0.3720749
Here I add a 9th “outgroup” to an even 8-tip tree to see what happens. I try eigen-decomposition of correlation matrix both with and without centering. In both cases the first eigen-vector has no weight on the 9th population. This might not get in the way of partition-based approaches because in both cases the first eigenvector partitions the samples consistent with the unrooted tree. However, it may complicate the issue of taking account of admixed populations; how to distinguish admixed vs an outgroup?
Sigma8_even = Sigma8_fun(c(1,1),rep(1,6),rep(1,6))
Sigma9 = rbind(cbind(1+Sigma8_even, rep(0,8)),rep(0,9))
Sigma9[9,9] = 1
Sigma9
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9]
[1,] 4 3 2 2 1 1 1 1 0
[2,] 3 4 2 2 1 1 1 1 0
[3,] 2 2 4 3 1 1 1 1 0
[4,] 2 2 3 4 1 1 1 1 0
[5,] 1 1 1 1 4 3 2 2 0
[6,] 1 1 1 1 3 4 2 2 0
[7,] 1 1 1 1 2 2 4 3 0
[8,] 1 1 1 1 2 2 3 4 0
[9,] 0 0 0 0 0 0 0 0 1
Sigma9_cor = cov2cor(Sigma9)
Sigma9_cor
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9]
[1,] 1.00 0.75 0.50 0.50 0.25 0.25 0.25 0.25 0
[2,] 0.75 1.00 0.50 0.50 0.25 0.25 0.25 0.25 0
[3,] 0.50 0.50 1.00 0.75 0.25 0.25 0.25 0.25 0
[4,] 0.50 0.50 0.75 1.00 0.25 0.25 0.25 0.25 0
[5,] 0.25 0.25 0.25 0.25 1.00 0.75 0.50 0.50 0
[6,] 0.25 0.25 0.25 0.25 0.75 1.00 0.50 0.50 0
[7,] 0.25 0.25 0.25 0.25 0.50 0.50 1.00 0.75 0
[8,] 0.25 0.25 0.25 0.25 0.50 0.50 0.75 1.00 0
[9,] 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 1
eigen(Sigma9_cor)
eigen() decomposition
$values
[1] 3.75 1.75 1.00 0.75 0.75 0.25 0.25 0.25 0.25
$vectors
[,1] [,2] [,3] [,4] [,5] [,6]
[1,] -0.3535534 0.3535534 0 0.000000e+00 5.000000e-01 7.071068e-01
[2,] -0.3535534 0.3535534 0 -2.296293e-16 5.000000e-01 -7.071068e-01
[3,] -0.3535534 0.3535534 0 1.442622e-16 -5.000000e-01 1.665335e-16
[4,] -0.3535534 0.3535534 0 3.633327e-16 -5.000000e-01 3.885781e-16
[5,] -0.3535534 -0.3535534 0 -5.000000e-01 -7.632783e-16 2.775558e-16
[6,] -0.3535534 -0.3535534 0 -5.000000e-01 -5.967449e-16 2.220446e-16
[7,] -0.3535534 -0.3535534 0 5.000000e-01 1.804112e-16 -1.665335e-16
[8,] -0.3535534 -0.3535534 0 5.000000e-01 -7.632783e-16 2.775558e-16
[9,] 0.0000000 0.0000000 1 0.000000e+00 0.000000e+00 0.000000e+00
[,7] [,8] [,9]
[1,] 0.000000e+00 0.000000e+00 0.000000e+00
[2,] 4.018513e-17 4.592587e-16 -5.740733e-18
[3,] 8.357616e-03 1.342745e-01 6.941905e-01
[4,] -8.357616e-03 -1.342745e-01 -6.941905e-01
[5,] -6.977392e-01 1.139040e-01 -1.363166e-02
[6,] 6.977392e-01 -1.139040e-01 1.363166e-02
[7,] -1.144119e-01 -6.848330e-01 1.338420e-01
[8,] 1.144119e-01 6.848330e-01 -1.338420e-01
[9,] 0.000000e+00 0.000000e+00 0.000000e+00
eigen(Sigma9_cor-mean(Sigma9_cor))
eigen() decomposition
$values
[1] 1.7500000 1.7358440 0.7500000 0.7500000 0.2500000 0.2500000 0.2500000
[8] 0.2500000 -0.4302884
$vectors
[,1] [,2] [,3] [,4] [,5]
[1,] -3.535534e-01 -0.2540634 4.983457e-01 -4.063962e-02 1.342371e-02
[2,] -3.535534e-01 -0.2540634 4.983457e-01 -4.063962e-02 -1.342371e-02
[3,] -3.535534e-01 -0.2540634 -4.983457e-01 4.063962e-02 -6.078290e-01
[4,] -3.535534e-01 -0.2540634 -4.983457e-01 4.063962e-02 6.078290e-01
[5,] 3.535534e-01 -0.2540634 4.063962e-02 4.983457e-01 3.522214e-01
[6,] 3.535534e-01 -0.2540634 4.063962e-02 4.983457e-01 -3.522214e-01
[7,] 3.535534e-01 -0.2540634 -4.063962e-02 -4.983457e-01 -7.939662e-02
[8,] 3.535534e-01 -0.2540634 -4.063962e-02 -4.983457e-01 7.939662e-02
[9,] -3.019807e-14 0.6954238 -1.693090e-15 2.983724e-16 -8.153200e-17
[,6] [,7] [,8] [,9]
[1,] -2.763869e-03 0.000000e+00 7.069739e-01 -0.2458694
[2,] 2.763869e-03 -1.791617e-16 -7.069739e-01 -0.2458694
[3,] -3.160485e-01 -1.747888e-01 1.030561e-02 -0.2458694
[4,] 3.160485e-01 1.747888e-01 -1.030561e-02 -0.2458694
[5,] -4.436719e-01 -4.231127e-01 -8.422330e-03 -0.2458694
[6,] 4.436719e-01 4.231127e-01 8.422330e-03 -0.2458694
[7,] 4.508447e-01 -5.389105e-01 3.270096e-03 -0.2458694
[8,] -4.508447e-01 5.389105e-01 -3.270096e-03 -0.2458694
[9,] -6.982262e-17 -7.597605e-17 1.110223e-15 -0.7185999
Here I add an admixed population that I intended to represent \(X=qX_1 + (1-q)X_2\), from which one obtains \(cov(X,X_k) = q cov(X_1,X_k) + (1-q) cov(X_2,X_k)\).
q = 0.5
Sigma8_even = Sigma8_fun(c(1,1),rep(1,6),rep(1,6))
Sigma9 = rbind(Sigma8_even, q*Sigma8_even[1,]+(1-q)*Sigma8_even[2,])
Sigma9 = cbind(Sigma9, c(q*Sigma8_even[1,]+(1-q)*Sigma8_even[2,], 0))
Sigma9[9,9] = q^2*Sigma8_even[1,1]+(1-q)^2*Sigma8_even[1,1] + 2*q*(1-q)*Sigma8_even[1,2]
Sigma9
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9]
[1,] 3.0 2.0 1 1 0 0 0 0 2.5
[2,] 2.0 3.0 1 1 0 0 0 0 2.5
[3,] 1.0 1.0 3 2 0 0 0 0 1.0
[4,] 1.0 1.0 2 3 0 0 0 0 1.0
[5,] 0.0 0.0 0 0 3 2 1 1 0.0
[6,] 0.0 0.0 0 0 2 3 1 1 0.0
[7,] 0.0 0.0 0 0 1 1 3 2 0.0
[8,] 0.0 0.0 0 0 1 1 2 3 0.0
[9,] 2.5 2.5 1 1 0 0 0 0 2.5
cov2cor(Sigma9)
[,1] [,2] [,3] [,4] [,5] [,6] [,7]
[1,] 1.0000000 0.6666667 0.3333333 0.3333333 0.0000000 0.0000000 0.0000000
[2,] 0.6666667 1.0000000 0.3333333 0.3333333 0.0000000 0.0000000 0.0000000
[3,] 0.3333333 0.3333333 1.0000000 0.6666667 0.0000000 0.0000000 0.0000000
[4,] 0.3333333 0.3333333 0.6666667 1.0000000 0.0000000 0.0000000 0.0000000
[5,] 0.0000000 0.0000000 0.0000000 0.0000000 1.0000000 0.6666667 0.3333333
[6,] 0.0000000 0.0000000 0.0000000 0.0000000 0.6666667 1.0000000 0.3333333
[7,] 0.0000000 0.0000000 0.0000000 0.0000000 0.3333333 0.3333333 1.0000000
[8,] 0.0000000 0.0000000 0.0000000 0.0000000 0.3333333 0.3333333 0.6666667
[9,] 0.9128709 0.9128709 0.3651484 0.3651484 0.0000000 0.0000000 0.0000000
[,8] [,9]
[1,] 0.0000000 0.9128709
[2,] 0.0000000 0.9128709
[3,] 0.0000000 0.3651484
[4,] 0.0000000 0.3651484
[5,] 0.3333333 0.0000000
[6,] 0.3333333 0.0000000
[7,] 0.6666667 0.0000000
[8,] 1.0000000 0.0000000
[9,] 0.0000000 1.0000000
eigen(Sigma9)
eigen() decomposition
$values
[1] 9.000000e+00 7.000000e+00 3.500000e+00 3.000000e+00 1.000000e+00
[6] 1.000000e+00 1.000000e+00 1.000000e+00 -2.220446e-16
$vectors
[,1] [,2] [,3] [,4] [,5] [,6] [,7]
[1,] 0.4923660 0.0 0.3015113 0.0 0.000000e+00 -7.071068e-01 0.000000e+00
[2,] 0.4923660 0.0 0.3015113 0.0 0.000000e+00 7.071068e-01 0.000000e+00
[3,] 0.3692745 0.0 -0.6030227 0.0 0.000000e+00 3.330669e-16 0.000000e+00
[4,] 0.3692745 0.0 -0.6030227 0.0 0.000000e+00 4.440892e-16 0.000000e+00
[5,] 0.0000000 -0.5 0.0000000 0.5 -7.071068e-01 0.000000e+00 0.000000e+00
[6,] 0.0000000 -0.5 0.0000000 0.5 7.071068e-01 0.000000e+00 1.110223e-16
[7,] 0.0000000 -0.5 0.0000000 -0.5 8.881784e-16 0.000000e+00 -7.071068e-01
[8,] 0.0000000 -0.5 0.0000000 -0.5 8.326673e-16 0.000000e+00 7.071068e-01
[9,] 0.4923660 0.0 0.3015113 0.0 0.000000e+00 3.330669e-16 0.000000e+00
[,8] [,9]
[1,] 0.000000e+00 4.082483e-01
[2,] -3.965082e-18 4.082483e-01
[3,] 7.071068e-01 -6.071532e-17
[4,] -7.071068e-01 -3.295975e-17
[5,] 0.000000e+00 0.000000e+00
[6,] 0.000000e+00 0.000000e+00
[7,] 0.000000e+00 0.000000e+00
[8,] 0.000000e+00 0.000000e+00
[9,] -5.731346e-17 -8.164966e-01
eigen(cov2cor(Sigma9)-mean(cov2cor(Sigma9)))
eigen() decomposition
$values
[1] 2.76344712 1.23405077 1.00000000 0.33333333 0.33333333 0.33333333
[7] 0.33333333 0.03681731 -0.12084230
$vectors
[,1] [,2] [,3] [,4] [,5]
[1,] 0.3845929 0.3325932 0.000000e+00 0.000000e+00 0.000000e+00
[2,] 0.3845929 0.3325932 -3.334252e-17 2.815022e-17 9.516514e-17
[3,] 0.2522873 -0.5423974 -4.654053e-16 1.358119e-02 -3.808985e-02
[4,] 0.2522873 -0.5423974 5.203529e-16 -1.358119e-02 3.808985e-02
[5,] -0.3144151 0.1124435 5.000000e-01 -6.379040e-02 7.031336e-01
[6,] -0.3144151 0.1124435 5.000000e-01 6.379040e-02 -7.031336e-01
[7,] -0.3144151 0.1124435 -5.000000e-01 7.040926e-01 6.443823e-02
[8,] -0.3144151 0.1124435 -5.000000e-01 -7.040926e-01 -6.443823e-02
[9,] 0.4259712 0.3738977 3.574992e-17 -3.018272e-17 -1.427755e-16
[,6] [,7] [,8] [,9]
[1,] 6.539159e-01 -2.690613e-01 0.2682864 0.4116946
[2,] -6.539159e-01 2.690613e-01 0.2682864 0.4116946
[3,] 2.686210e-01 6.528457e-01 -0.1894673 0.3259729
[4,] -2.686210e-01 -6.528457e-01 -0.1894673 0.3259729
[5,] 1.490273e-02 3.621900e-02 -0.1845749 0.3231590
[6,] -1.490273e-02 -3.621900e-02 -0.1845749 0.3231590
[7,] -3.831233e-03 -9.311274e-03 -0.1845749 0.3231590
[8,] 3.831233e-03 9.311274e-03 -0.1845749 0.3231590
[9,] 3.330669e-16 3.330669e-16 -0.8049705 -0.1754182
Both the correlation and covariance decompose in an intutive way, with the admixed population being loaded as approximately the average of populations 1 and 2 (exactly in the case of the covariance).
Also try a different q:
q = 0.25
Sigma9 = rbind(Sigma8_even, q*Sigma8_even[1,]+(1-q)*Sigma8_even[2,])
Sigma9 = cbind(Sigma9, c(q*Sigma8_even[1,]+(1-q)*Sigma8_even[2,], 0))
Sigma9[9,9] = q^2*Sigma8_even[1,1]+(1-q)^2*Sigma8_even[1,1] + 2*q*(1-q)*Sigma8_even[1,2]
eigen(Sigma9)
eigen() decomposition
$values
[1] 9.034512e+00 7.000000e+00 3.516293e+00 3.000000e+00 1.074194e+00
[6] 1.000000e+00 1.000000e+00 1.000000e+00 -6.824962e-17
$vectors
[,1] [,2] [,3] [,4] [,5] [,6] [,7]
[1,] 0.4752866 0.0 0.2629910 0.0 0.81638031 0.000000e+00 0.000000e+00
[2,] 0.5063125 0.0 0.3244004 0.0 -0.54060905 0.000000e+00 0.000000e+00
[3,] 0.3668734 0.0 -0.6041893 0.0 -0.01895396 0.000000e+00 0.000000e+00
[4,] 0.3668734 0.0 -0.6041893 0.0 -0.01895396 0.000000e+00 0.000000e+00
[5,] 0.0000000 -0.5 0.0000000 0.5 0.00000000 -7.071068e-01 0.000000e+00
[6,] 0.0000000 -0.5 0.0000000 0.5 0.00000000 7.071068e-01 1.110223e-16
[7,] 0.0000000 -0.5 0.0000000 -0.5 0.00000000 8.881784e-16 -7.071068e-01
[8,] 0.0000000 -0.5 0.0000000 -0.5 0.00000000 8.326673e-16 7.071068e-01
[9,] 0.4985560 0.0 0.3090481 0.0 -0.20136171 0.000000e+00 0.000000e+00
[,8] [,9]
[1,] 0.000000e+00 1.961161e-01
[2,] 8.199773e-17 5.883484e-01
[3,] -7.071068e-01 2.220446e-16
[4,] 7.071068e-01 -1.387779e-16
[5,] 0.000000e+00 0.000000e+00
[6,] 0.000000e+00 0.000000e+00
[7,] 0.000000e+00 0.000000e+00
[8,] 0.000000e+00 0.000000e+00
[9,] -2.087095e-17 -7.844645e-01
The graph laplacian for a graph with edge weights (distances) \(w\) is \(L=D-W\) where \(D\) is a diagonal matrix with \(d_{ii} = \sum_j w_{ij}\). Cutting a graph into two can be acheived by taking the top eigenvector of the Graph Laplacian, as here for example.
Here we have functions to compute distance matrices for the 4-tip and 8-tip trees as above.
D4_fun = function(b){
D4 = matrix(0,nrow=4, ncol=4)
D4[1:2,3:4] = b[1] + b[2] # top split
D4[1,-1] = D4[1,-1] + b[3]
D4[2,-2] = D4[2,-2] + b[4]
D4[3,-3] = D4[3,-3] + b[5]
D4[4,-4] = D4[4,-4] + b[6]
D4 + t(D4)
}
D8_fun = function(b_top, b_left, b_right){
D8 = matrix(0,nrow=8, ncol=8)
D8[1:4,-(1:4)] = b_top[1]
D8[5:8,-(5:8)] = b_top[2] # top split
D8[1:2,-(1:2)] = D8[1:2,-(1:2)] + b_left[1]
D8[3:4,-(3:4)] = D8[3:4,-(3:4)] + b_left[2]
D8[1,-1] = D8[1,-1] + b_left[3]
D8[2,-2] = D8[2,-2] + b_left[4]
D8[3,-3] = D8[3,-3] + b_left[5]
D8[4,-4] = D8[4,-4] + b_left[6]
D8[5:6,-(5:6)] = D8[5:6,-(5:6)] + b_right[1]
D8[7:8,-(7:8)] = D8[7:8,-(7:8)] + b_right[2]
D8[5,-5] = D8[5,-5] + b_right[3]
D8[6,-6] = D8[6,-6] + b_right[4]
D8[7,-7] = D8[7,-7] + b_right[5]
D8[8,-8] = D8[8,-8] + b_right[6]
D8 + t(D8)
}
Interestingly, in the case with even branches not even the top eigenvector is unique – there are three equal eigenvalues. (However, all the top eigenvectors would produce a “valid” split of the tree if you applied the strategy of splitting them at the largest gap. The second and third eigenvectors would split into 2 vs 6 rather than 4 vs 4. Would all linear combinations of eigenvectors produce a valid split? Not sure…)
D8_even = D8_fun(c(1,1),rep(1,6),rep(1,6))
D8_even
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
[1,] 0 2 4 4 6 6 6 6
[2,] 2 0 4 4 6 6 6 6
[3,] 4 4 0 2 6 6 6 6
[4,] 4 4 2 0 6 6 6 6
[5,] 6 6 6 6 0 2 4 4
[6,] 6 6 6 6 2 0 4 4
[7,] 6 6 6 6 4 4 0 2
[8,] 6 6 6 6 4 4 2 0
L_even = diag(rowSums(D8_even)) - D8_even
L_even
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
[1,] 34 -2 -4 -4 -6 -6 -6 -6
[2,] -2 34 -4 -4 -6 -6 -6 -6
[3,] -4 -4 34 -2 -6 -6 -6 -6
[4,] -4 -4 -2 34 -6 -6 -6 -6
[5,] -6 -6 -6 -6 34 -2 -4 -4
[6,] -6 -6 -6 -6 -2 34 -4 -4
[7,] -6 -6 -6 -6 -4 -4 34 -2
[8,] -6 -6 -6 -6 -4 -4 -2 34
eigen(L_even)
eigen() decomposition
$values
[1] 4.80000e+01 4.00000e+01 4.00000e+01 3.60000e+01 3.60000e+01 3.60000e+01
[7] 3.60000e+01 7.81597e-14
$vectors
[,1] [,2] [,3] [,4] [,5] [,6]
[1,] -0.3535534 0.4819643 0.1330804 -0.0726462 0.3162704 0.000000e+00
[2,] -0.3535534 0.4819643 0.1330804 0.0726462 -0.3162704 6.378627e-17
[3,] -0.3535534 -0.4819643 -0.1330804 0.2512920 -0.5715778 9.894067e-02
[4,] -0.3535534 -0.4819643 -0.1330804 -0.2512920 0.5715778 -9.894067e-02
[5,] 0.3535534 0.1330804 -0.4819643 0.6338747 0.2198524 -2.201603e-01
[6,] 0.3535534 0.1330804 -0.4819643 -0.6338747 -0.2198524 2.201603e-01
[7,] 0.3535534 -0.1330804 0.4819643 -0.1725623 -0.1579137 -6.646354e-01
[8,] 0.3535534 -0.1330804 0.4819643 0.1725623 0.1579137 6.646354e-01
[,7] [,8]
[1,] 0.62824800 -0.3535534
[2,] -0.62824800 -0.3535534
[3,] 0.31679938 -0.3535534
[4,] -0.31679938 -0.3535534
[5,] -0.03738048 -0.3535534
[6,] 0.03738048 -0.3535534
[7,] 0.05954248 -0.3535534
[8,] -0.05954248 -0.3535534
D8_balanced = D8_fun(c(1,1),c(2,2,3,3,4,4),c(5,5,6,6,7,7))
D8_balanced
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
[1,] 0 6 11 11 18 18 19 19
[2,] 6 0 11 11 18 18 19 19
[3,] 11 11 0 8 19 19 20 20
[4,] 11 11 8 0 19 19 20 20
[5,] 18 18 19 19 0 12 23 23
[6,] 18 18 19 19 12 0 23 23
[7,] 19 19 20 20 23 23 0 14
[8,] 19 19 20 20 23 23 14 0
eigen(diag(rowSums(D8_balanced))-D8_balanced)
eigen() decomposition
$values
[1] 1.684900e+02 1.520000e+02 1.517593e+02 1.440000e+02 1.197507e+02
[6] 1.160000e+02 1.080000e+02 2.273737e-13
$vectors
[,1] [,2] [,3] [,4] [,5]
[1,] -0.05537472 0.000000e+00 -0.3032768 -2.494094e-16 0.52910948
[2,] -0.05537472 0.000000e+00 -0.3032768 -4.141320e-16 0.52910948
[3,] -0.06533195 7.050795e-17 -0.3906786 -2.572166e-16 -0.46701389
[4,] -0.06533195 -6.592973e-17 -0.3906786 1.117597e-16 -0.46701389
[5,] -0.43228260 -5.438079e-15 0.4324365 -7.071068e-01 -0.03362174
[6,] -0.43228260 5.392787e-15 0.4324365 7.071068e-01 -0.03362174
[7,] 0.55298927 -7.071068e-01 0.2615189 5.419620e-15 -0.02847385
[8,] 0.55298927 7.071068e-01 0.2615189 -5.349543e-15 -0.02847385
[,6] [,7] [,8]
[1,] -1.008211e-14 7.071068e-01 -0.3535534
[2,] -7.754136e-15 -7.071068e-01 -0.3535534
[3,] -7.071068e-01 9.436896e-16 -0.3535534
[4,] 7.071068e-01 3.219647e-15 -0.3535534
[5,] 9.632002e-17 -5.551115e-17 -0.3535534
[6,] 2.073423e-16 -1.110223e-16 -0.3535534
[7,] -3.562959e-16 -3.330669e-16 -0.3535534
[8,] -1.342513e-16 -3.330669e-16 -0.3535534
Add an outgroup to the even tree. The top eigenvector splits 5 vs 4.
D9 = rbind(cbind(D8_even, rep(5,8)),rep(5,9))
D9[9,9] = 0
L9 = diag(rowSums(D9)) - D9
eigen(L9)
eigen() decomposition
$values
[1] 5.300000e+01 4.500000e+01 4.500000e+01 4.500000e+01 4.100000e+01
[6] 4.100000e+01 4.100000e+01 4.100000e+01 5.684342e-14
$vectors
[,1] [,2] [,3] [,4] [,5]
[1,] -3.535534e-01 -0.2436648 0.42289173 -0.16024642 4.753566e-01
[2,] -3.535534e-01 -0.2436648 0.42289173 -0.16024642 -4.753566e-01
[3,] -3.535534e-01 0.2597599 -0.43688105 -0.07448918 -4.804414e-01
[4,] -3.535534e-01 0.2597599 -0.43688105 -0.07448918 4.804414e-01
[5,] 3.535534e-01 0.4387158 0.24661338 -0.10295233 1.856798e-01
[6,] 3.535534e-01 0.4387158 0.24661338 -0.10295233 -1.856798e-01
[7,] 3.535534e-01 -0.4226207 -0.26060270 -0.13178327 -9.346202e-02
[8,] 3.535534e-01 -0.4226207 -0.26060270 -0.13178327 9.346202e-02
[9,] 2.354887e-16 -0.0643802 0.05595726 0.93894242 -3.469447e-15
[,6] [,7] [,8] [,9]
[1,] -3.682758e-01 2.438096e-01 -2.810086e-01 -0.3333333
[2,] 3.682758e-01 -2.438096e-01 2.810086e-01 -0.3333333
[3,] -2.496118e-01 3.721106e-02 -4.533049e-01 -0.3333333
[4,] 2.496118e-01 -3.721106e-02 4.533049e-01 -0.3333333
[5,] 2.050295e-02 -6.305419e-01 -2.598453e-01 -0.3333333
[6,] -2.050295e-02 6.305419e-01 2.598453e-01 -0.3333333
[7,] -5.492236e-01 -2.039342e-01 3.847456e-01 -0.3333333
[8,] 5.492236e-01 2.039342e-01 -3.847456e-01 -0.3333333
[9,] -7.494005e-16 2.345346e-15 -7.494005e-16 -0.3333333
Notice here that, unlike the correlation case, the length of the outgroup will matter. If we make it long enough then the first eigenvector will split 8 vs 1.
D9 = rbind(cbind(D8_even, rep(50,8)),rep(50,9))
D9[9,9] = 0
L9 = diag(rowSums(D9)) - D9
eigen(L9)
eigen() decomposition
$values
[1] 4.500000e+02 9.800000e+01 9.000000e+01 9.000000e+01 8.600000e+01
[6] 8.600000e+01 8.600000e+01 8.600000e+01 9.753393e-14
$vectors
[,1] [,2] [,3] [,4] [,5]
[1,] 0.1178511 -3.535534e-01 0.000000e+00 -5.000000e-01 0.000000e+00
[2,] 0.1178511 -3.535534e-01 -2.409528e-16 -5.000000e-01 5.334851e-16
[3,] 0.1178511 -3.535534e-01 -2.248180e-15 5.000000e-01 7.241999e-02
[4,] 0.1178511 -3.535534e-01 2.969590e-15 5.000000e-01 -7.241999e-02
[5,] 0.1178511 3.535534e-01 -5.000000e-01 1.387779e-15 1.576739e-01
[6,] 0.1178511 3.535534e-01 -5.000000e-01 -4.996004e-16 -1.576739e-01
[7,] 0.1178511 3.535534e-01 5.000000e-01 -4.996004e-16 6.854884e-01
[8,] 0.1178511 3.535534e-01 5.000000e-01 2.498002e-16 -6.854884e-01
[9,] -0.9428090 -8.326673e-17 1.795032e-17 1.110223e-16 3.430958e-18
[,6] [,7] [,8] [,9]
[1,] 0.000000e+00 0.000000e+00 7.071068e-01 -0.3333333
[2,] -3.833340e-15 8.563212e-15 -7.071068e-01 -0.3333333
[3,] -2.107800e-01 6.710642e-01 8.659740e-15 -0.3333333
[4,] 2.107800e-01 -6.710642e-01 -5.828671e-16 -0.3333333
[5,] -6.525791e-01 -2.219898e-01 2.275957e-15 -0.3333333
[6,] 6.525791e-01 2.219898e-01 1.276756e-15 -0.3333333
[7,] 1.723725e-01 -1.983473e-02 1.276756e-15 -0.3333333
[8,] -1.723725e-01 1.983473e-02 1.915135e-15 -0.3333333
[9,] -3.969815e-17 1.914446e-16 0.000000e+00 -0.3333333
This section is a bit of a diversion and may be ignored… I just wanted to remind myself about the relationship between the pairwise distances and covariances.
Basically if \(D\) is a distance matrix, and \(L\) is the centering matrix (so premultiplying by \(L\) subtracts the) then \(W=-LDL'\) is symmetric positive semi-definite, and closely related to the covariance matrix. See Petkova et al for more discussion….
D4_fun = function(b){
D4 = matrix(0,nrow=4, ncol=4)
D4[1:2,3:4] = b[1] + b[2] # top split
D4[1,-1] = D4[1,-1] + b[3]
D4[2,-2] = D4[2,-2] + b[4]
D4[3,-3] = D4[3,-3] + b[5]
D4[4,-4] = D4[4,-4] + b[6]
D4 + t(D4)
}
D4_even = D4_fun(rep(1,6))
D4_even
[,1] [,2] [,3] [,4]
[1,] 0 2 4 4
[2,] 2 0 4 4
[3,] 4 4 0 2
[4,] 4 4 2 0
L = diag(4) - matrix(1/4,nrow=4,ncol=4)
W = -L %*% D4_even %*% t(L)
W
[,1] [,2] [,3] [,4]
[1,] 2.5 0.5 -1.5 -1.5
[2,] 0.5 2.5 -1.5 -1.5
[3,] -1.5 -1.5 2.5 0.5
[4,] -1.5 -1.5 0.5 2.5
0.5*W - Sigma4_even
[,1] [,2] [,3] [,4]
[1,] -0.75 -0.75 -0.75 -0.75
[2,] -0.75 -0.75 -0.75 -0.75
[3,] -0.75 -0.75 -0.75 -0.75
[4,] -0.75 -0.75 -0.75 -0.75
eigen(W)
eigen() decomposition
$values
[1] 6.000000e+00 2.000000e+00 2.000000e+00 5.329071e-15
$vectors
[,1] [,2] [,3] [,4]
[1,] -0.5 7.071068e-01 0.000000e+00 0.5
[2,] -0.5 -7.071068e-01 6.824628e-17 0.5
[3,] 0.5 -9.992007e-16 -7.071068e-01 0.5
[4,] 0.5 -1.054712e-15 7.071068e-01 0.5
sessionInfo()
R version 3.6.0 (2019-04-26)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS Mojave 10.14.6
Matrix products: default
BLAS: /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRlapack.dylib
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] stats graphics grDevices utils datasets methods base
loaded via a namespace (and not attached):
[1] Rcpp_1.0.5 rstudioapi_0.11 whisker_0.4 knitr_1.29
[5] magrittr_1.5 workflowr_1.6.2 R6_2.4.1 rlang_0.4.7
[9] stringr_1.4.0 tools_3.6.0 xfun_0.16 git2r_0.27.1
[13] htmltools_0.5.0 ellipsis_0.3.1 yaml_2.2.1 digest_0.6.25
[17] rprojroot_1.3-2 tibble_3.0.3 lifecycle_0.2.0 crayon_1.3.4
[21] later_1.1.0.1 vctrs_0.3.2 fs_1.4.2 promises_1.1.1
[25] glue_1.4.1 evaluate_0.14 rmarkdown_2.3 stringi_1.4.6
[29] compiler_3.6.0 pillar_1.4.6 backports_1.1.8 httpuv_1.5.4
[33] pkgconfig_2.0.3