Last updated: 2020-05-13
Checks: 7 0
Knit directory: drift-workflow/analysis/
This reproducible R Markdown analysis was created with workflowr (version 1.6.1). The Checks tab describes the reproducibility checks that were applied when the results were created. The Past versions tab lists the development history.
Great! Since the R Markdown file has been committed to the Git repository, you know the exact version of the code that produced these results.
Great job! The global environment was empty. Objects defined in the global environment can affect the analysis in your R Markdown file in unknown ways. For reproduciblity it’s best to always run the code in an empty environment.
The command set.seed(20190211)
was run prior to running the code in the R Markdown file. Setting a seed ensures that any results that rely on randomness, e.g. subsampling or permutations, are reproducible.
Great job! Recording the operating system, R version, and package versions is critical for reproducibility.
Nice! There were no cached chunks for this analysis, so you can be confident that you successfully produced the results during this run.
Great job! Using relative paths to the files within your workflowr project makes it easier to run your code on other machines.
Great! You are using Git for version control. Tracking code development and connecting the code version to the results is critical for reproducibility.
The results in this page were generated with repository version 90947d3. See the Past versions tab to see a history of the changes made to the R Markdown and HTML files.
Note that you need to be careful to ensure that all relevant files for the analysis have been committed to Git prior to generating the results (you can use wflow_publish
or wflow_git_commit
). workflowr only checks the R Markdown file, but you know if there are other scripts or data files that it depends on. Below is the status of the Git repository when the results were generated:
Ignored files:
Ignored: .snakemake/
Ignored: analysis/figure/
Ignored: data/datasets/
Ignored: data/raw/
Ignored: data/simulations/
Ignored: nb-log-1272639.err
Ignored: nb-log-1272639.out
Ignored: notebooks/.ipynb_checkpoints/
Ignored: output/
Ignored: sandbox/.ipynb_checkpoints/
Unstaged changes:
Modified: analysis/OutOfAfrica_3G09.Rmd
Deleted: test.png
Note that any generated files, e.g. HTML, png, CSS, etc., are not included in this status report because it is ok for generated content to have uncommitted changes.
These are the previous versions of the repository in which changes were made to the R Markdown (analysis/AmericanAdmixture_4B11.Rmd
) and HTML (docs/AmericanAdmixture_4B11.html
) files. If you’ve configured a remote Git repository (see ?wflow_git_remote
), click on the hyperlinks in the table below to view the files as they were in that past version.
File | Version | Author | Date | Message |
---|---|---|---|---|
Rmd | 90947d3 | Joseph Marcus | 2020-05-13 | wflow_publish(“analysis/AmericanAdmixture_4B11.Rmd”) |
Here I visualize population structure with simulated data from the AmericanAdmixture_4B11 scenario. See Browning et al. 2018 for details.
Import the required libraries and scripts:
suppressMessages({
library(lfa)
library(flashier)
library(drift.alpha)
library(ggplot2)
library(reshape2)
library(tidyverse)
library(alstructure)
})
data_path <- "../output/simulations/AmericanAdmixture_4B11/rep1.txt"
Y <- t(as.matrix(read.table(data_path, sep=" ")))
n <- nrow(Y)
maf <- colSums(Y) / (2*n)
# filter out too rare and too common SNPs
Y <- Y[,((maf>=.05) & (maf <=.95))]
p <- ncol(Y)
Z <- scale(Y)
print(n)
[1] 160
print(p)
[1] 24643
# sub-population labels from stdpop
labs <- rep(c("AFR", "EUR", "ASIA", "ADMIX"), each=40)
we end up with 160 individuals and ~24643 SNPs.
Lets run PCA
on the centered and scaled genotype matrix:
svd_res <- lfa:::trunc.svd(Z, 5)
L_hat <- svd_res$u
plot_loadings(L_hat, labs)
Plot the first two factors against each other:
qplot(L_hat[,1], L_hat[,2], color=labs) +
xlab("PC1") +
ylab("PC2") +
theme_bw()
the admixed population is in the center of the PC1 vs PC2 bi-plot.
Run ALStructure
with \(K=3\):
admix_res <- alstructure::alstructure(t(Y), d_hat=3)
Qhat <- t(admix_res$Q_hat)
plot_loadings(Qhat, labs)
the three factors seems to represent “ancestral populations” from ASIA, EUR, and AFR and the admixed population draws ancestry from all three of them as expected.
Run the greedy algorithm:
fl <- flash(Y,
greedy.Kmax=8,
prior.family=c(prior.bimodal(), prior.normal()))
Adding factor 1 to flash object...
Adding factor 2 to flash object...
Adding factor 3 to flash object...
Adding factor 4 to flash object...
Adding factor 5 to flash object...
Factor doesn't significantly increase objective and won't be added.
Wrapping up...
Done.
Nullchecking 4 factors...
Done.
plot_loadings(fl$flash.fit$EF[[1]], labs)
there are 4 factors learned by the greedy algorithm: one share factor with a slightly lower loading in the AFR population and 3 sparser factors. The third factor seems to be defined by ASIA individuals and the admixed population has a small weight on this factor.
Run flash [backfit]
initializing from the greedy solution:
flbf <- fl %>%
flash.backfit() %>%
flash.nullcheck(remove = TRUE)
Backfitting 4 factors (tolerance: 5.88e-02)...
Difference between iterations is within 1.0e+03...
Difference between iterations is within 1.0e+02...
Difference between iterations is within 1.0e+01...
Difference between iterations is within 1.0e+00...
Difference between iterations is within 1.0e-01...
Wrapping up...
Done.
Nullchecking 4 factors...
Done.
plot_loadings(flbf$flash.fit$EF[[1]], labs)
the results looks qualitatively similar to the greedy algorithm though now the the third factor is completely defined ASIA individuals.
Run drift
initializing from the greedy solution:
init <- init_from_flash(fl)
dr <- drift(init, miniter=2, maxiter=500, tol = 0.01, verbose = TRUE)
1 : -3229696.286
2 : -3228212.910
3 : -3227165.987
4 : -3226509.597
5 : -3226125.724
6 : -3225898.835
7 : -3225767.675
8 : -3225688.709
9 : -3225636.374
10 : -3225601.979
11 : -3225577.336
12 : -3225560.772
13 : -3225549.245
14 : -3225540.549
15 : -3225533.633
16 : -3225527.895
17 : -3225522.022
18 : -3225517.249
19 : -3225513.311
20 : -3225510.013
21 : -3225507.188
22 : -3225504.714
23 : -3225502.511
24 : -3225500.495
25 : -3225498.561
26 : -3225496.605
27 : -3225494.601
28 : -3225492.551
29 : -3225490.591
30 : -3225488.818
31 : -3225487.163
32 : -3225485.714
33 : -3225484.471
34 : -3225483.377
35 : -3225482.388
36 : -3225481.452
37 : -3225480.461
38 : -3225479.105
39 : -3225477.126
40 : -3225475.658
41 : -3225474.857
42 : -3225474.300
43 : -3225473.842
44 : -3225473.437
45 : -3225473.071
46 : -3225472.735
47 : -3225472.425
48 : -3225472.139
49 : -3225471.872
50 : -3225471.622
51 : -3225471.389
52 : -3225471.169
53 : -3225470.962
54 : -3225470.767
55 : -3225470.582
56 : -3225470.407
57 : -3225470.241
58 : -3225470.083
59 : -3225469.933
60 : -3225469.789
61 : -3225469.652
62 : -3225469.522
63 : -3225469.397
64 : -3225469.277
65 : -3225469.163
66 : -3225469.053
67 : -3225468.948
68 : -3225468.847
69 : -3225468.750
70 : -3225468.656
71 : -3225468.567
72 : -3225468.480
73 : -3225468.397
74 : -3225468.316
75 : -3225468.239
76 : -3225468.164
77 : -3225468.091
78 : -3225468.021
79 : -3225467.953
80 : -3225467.888
81 : -3225467.824
82 : -3225467.762
83 : -3225467.702
84 : -3225467.644
85 : -3225467.587
86 : -3225467.532
87 : -3225467.478
88 : -3225467.426
89 : -3225467.375
90 : -3225467.325
91 : -3225467.277
92 : -3225467.229
93 : -3225467.183
94 : -3225467.138
95 : -3225467.093
96 : -3225467.050
97 : -3225467.007
98 : -3225466.966
99 : -3225466.925
100 : -3225466.885
101 : -3225466.846
102 : -3225466.807
103 : -3225466.769
104 : -3225466.732
105 : -3225466.696
106 : -3225466.660
107 : -3225466.624
108 : -3225466.590
109 : -3225466.556
110 : -3225466.522
111 : -3225466.489
112 : -3225466.456
113 : -3225466.424
114 : -3225466.392
115 : -3225466.361
116 : -3225466.330
117 : -3225466.300
118 : -3225466.270
119 : -3225466.241
120 : -3225466.211
121 : -3225466.183
122 : -3225466.154
123 : -3225466.126
124 : -3225466.099
125 : -3225466.071
126 : -3225466.044
127 : -3225466.018
128 : -3225465.991
129 : -3225465.965
130 : -3225465.940
131 : -3225465.914
132 : -3225465.889
133 : -3225465.864
134 : -3225465.840
135 : -3225465.816
136 : -3225465.792
137 : -3225465.768
138 : -3225465.744
139 : -3225465.721
140 : -3225465.698
141 : -3225465.675
142 : -3225465.653
143 : -3225465.631
144 : -3225465.609
145 : -3225465.587
146 : -3225465.565
147 : -3225465.544
148 : -3225465.523
149 : -3225465.502
150 : -3225465.481
151 : -3225465.460
152 : -3225465.440
153 : -3225465.420
154 : -3225465.400
155 : -3225465.380
156 : -3225465.361
157 : -3225465.341
158 : -3225465.322
159 : -3225465.303
160 : -3225465.284
161 : -3225465.266
162 : -3225465.247
163 : -3225465.229
164 : -3225465.211
165 : -3225465.193
166 : -3225465.175
167 : -3225465.157
168 : -3225465.140
169 : -3225465.122
170 : -3225465.105
171 : -3225465.088
172 : -3225465.071
173 : -3225465.054
174 : -3225465.038
175 : -3225465.021
176 : -3225465.005
177 : -3225464.989
178 : -3225464.973
179 : -3225464.957
180 : -3225464.941
181 : -3225464.925
182 : -3225464.910
183 : -3225464.894
184 : -3225464.879
185 : -3225464.863
186 : -3225464.846
187 : -3225464.829
188 : -3225464.811
189 : -3225464.792
190 : -3225464.772
191 : -3225464.752
192 : -3225464.730
193 : -3225464.708
194 : -3225464.684
195 : -3225464.660
196 : -3225464.634
197 : -3225464.607
198 : -3225464.578
199 : -3225464.548
200 : -3225464.517
201 : -3225464.483
202 : -3225464.448
203 : -3225464.411
204 : -3225464.372
205 : -3225464.330
206 : -3225464.286
207 : -3225464.239
208 : -3225464.189
209 : -3225464.136
210 : -3225464.080
211 : -3225464.020
212 : -3225463.956
213 : -3225463.887
214 : -3225463.813
215 : -3225463.734
216 : -3225463.649
217 : -3225463.557
218 : -3225463.459
219 : -3225463.352
220 : -3225463.237
221 : -3225463.111
222 : -3225462.975
223 : -3225462.828
224 : -3225462.666
225 : -3225462.490
226 : -3225462.297
227 : -3225462.086
228 : -3225461.853
229 : -3225461.598
230 : -3225461.317
231 : -3225461.007
232 : -3225460.666
233 : -3225460.290
234 : -3225459.876
235 : -3225459.422
236 : -3225458.925
237 : -3225458.385
238 : -3225457.800
239 : -3225457.170
240 : -3225456.498
241 : -3225455.822
242 : -3225455.175
243 : -3225454.557
244 : -3225453.966
245 : -3225453.402
246 : -3225452.863
247 : -3225452.349
248 : -3225451.858
249 : -3225451.390
250 : -3225450.944
251 : -3225450.520
252 : -3225450.115
253 : -3225449.731
254 : -3225449.365
255 : -3225449.018
256 : -3225448.688
257 : -3225448.374
258 : -3225448.076
259 : -3225447.793
260 : -3225447.525
261 : -3225447.270
262 : -3225447.027
263 : -3225446.797
264 : -3225446.577
265 : -3225446.369
266 : -3225446.170
267 : -3225445.981
268 : -3225445.801
269 : -3225445.629
270 : -3225445.465
271 : -3225445.308
272 : -3225445.158
273 : -3225445.015
274 : -3225444.878
275 : -3225444.746
276 : -3225444.620
277 : -3225444.499
278 : -3225444.383
279 : -3225444.272
280 : -3225444.165
281 : -3225444.062
282 : -3225443.962
283 : -3225443.867
284 : -3225443.775
285 : -3225443.686
286 : -3225443.600
287 : -3225443.518
288 : -3225443.438
289 : -3225443.361
290 : -3225443.286
291 : -3225443.214
292 : -3225443.145
293 : -3225443.077
294 : -3225443.012
295 : -3225442.948
296 : -3225442.887
297 : -3225442.827
298 : -3225442.769
299 : -3225442.713
300 : -3225442.659
301 : -3225442.606
302 : -3225442.554
303 : -3225442.504
304 : -3225442.456
305 : -3225442.408
306 : -3225442.362
307 : -3225442.317
308 : -3225442.274
309 : -3225442.231
310 : -3225442.190
311 : -3225442.150
312 : -3225442.110
313 : -3225442.072
314 : -3225442.035
315 : -3225441.998
316 : -3225441.962
317 : -3225441.928
318 : -3225441.894
319 : -3225441.861
320 : -3225441.828
321 : -3225441.796
322 : -3225441.766
323 : -3225441.735
324 : -3225441.706
325 : -3225441.677
326 : -3225441.648
327 : -3225441.621
328 : -3225441.593
329 : -3225441.567
330 : -3225441.541
331 : -3225441.515
332 : -3225441.490
333 : -3225441.466
334 : -3225441.442
335 : -3225441.418
336 : -3225441.395
337 : -3225441.373
338 : -3225441.350
339 : -3225441.329
340 : -3225441.307
341 : -3225441.286
342 : -3225441.266
343 : -3225441.246
344 : -3225441.226
345 : -3225441.206
346 : -3225441.187
347 : -3225441.168
348 : -3225441.150
349 : -3225441.131
350 : -3225441.114
351 : -3225441.096
352 : -3225441.079
353 : -3225441.062
354 : -3225441.045
355 : -3225441.028
356 : -3225441.012
357 : -3225440.996
358 : -3225440.981
359 : -3225440.965
360 : -3225440.950
361 : -3225440.935
362 : -3225440.920
363 : -3225440.906
364 : -3225440.891
365 : -3225440.877
366 : -3225440.863
367 : -3225440.849
368 : -3225440.836
369 : -3225440.823
370 : -3225440.809
371 : -3225440.796
372 : -3225440.784
373 : -3225440.771
374 : -3225440.759
375 : -3225440.746
376 : -3225440.734
377 : -3225440.722
378 : -3225440.710
379 : -3225440.699
380 : -3225440.687
381 : -3225440.676
382 : -3225440.665
383 : -3225440.653
384 : -3225440.642
385 : -3225440.632
386 : -3225440.621
387 : -3225440.610
388 : -3225440.600
389 : -3225440.590
390 : -3225440.579
391 : -3225440.569
392 : -3225440.559
plot_loadings(dr$EL, labs)
the drift
algorithm seems to give a pretty different result from flash
. There is one shared factor but then the admixed population has intermediate values on all the other factors. Here are the admixture proportions from the coalescent simulation:
ADMIX percentage 1/6 | Amount African admixture
ADMIX percentage 1/3 | Amount European admixture
ADMIX percentage 1/2 | Amount Asian admixture
which actually qualitatively lines up with ADMIX population loadings on factors 2-4! It’s kinda interesting that we recapitulate a similar behavior we saw in real data for “known” admixed populations.
sessionInfo()
R version 3.5.1 (2018-07-02)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Scientific Linux 7.4 (Nitrogen)
Matrix products: default
BLAS/LAPACK: /software/openblas-0.2.19-el7-x86_64/lib/libopenblas_haswellp-r0.2.19.so
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] alstructure_0.1.0 forcats_0.5.0 stringr_1.4.0
[4] dplyr_0.8.5 purrr_0.3.4 readr_1.3.1
[7] tidyr_1.0.2 tibble_3.0.1 tidyverse_1.3.0
[10] reshape2_1.4.3 ggplot2_3.3.0 drift.alpha_0.0.9
[13] flashier_0.2.4 lfa_1.9.0
loaded via a namespace (and not attached):
[1] httr_1.4.1 jsonlite_1.6 modelr_0.1.6 assertthat_0.2.1
[5] mixsqp_0.3-17 cellranger_1.1.0 yaml_2.2.0 ebnm_0.1-24
[9] pillar_1.4.3 backports_1.1.6 lattice_0.20-38 glue_1.4.0
[13] digest_0.6.25 promises_1.0.1 rvest_0.3.5 colorspace_1.4-1
[17] htmltools_0.3.6 httpuv_1.4.5 Matrix_1.2-15 plyr_1.8.4
[21] pkgconfig_2.0.3 invgamma_1.1 broom_0.5.6 haven_2.2.0
[25] corpcor_1.6.9 scales_1.1.0 whisker_0.3-2 later_0.7.5
[29] git2r_0.26.1 farver_2.0.3 generics_0.0.2 ellipsis_0.3.0
[33] withr_2.2.0 ashr_2.2-50 cli_2.0.2 magrittr_1.5
[37] crayon_1.3.4 readxl_1.3.1 evaluate_0.14 fs_1.3.1
[41] fansi_0.4.1 nlme_3.1-137 xml2_1.3.2 truncnorm_1.0-8
[45] tools_3.5.1 hms_0.5.3 lifecycle_0.2.0 munsell_0.5.0
[49] reprex_0.3.0 irlba_2.3.3 compiler_3.5.1 rlang_0.4.5
[53] grid_3.5.1 rstudioapi_0.11 labeling_0.3 rmarkdown_1.10
[57] gtable_0.3.0 DBI_1.0.0 R6_2.4.1 lubridate_1.7.4
[61] knitr_1.20 workflowr_1.6.1 rprojroot_1.3-2 stringi_1.4.6
[65] parallel_3.5.1 SQUAREM_2020.2 Rcpp_1.0.4.6 vctrs_0.2.4
[69] dbplyr_1.4.3 tidyselect_1.0.0