Last updated: 2020-11-24
Knit directory: 2020_HairPheno_manuscript/analysis/
This reproducible R Markdown analysis was created with workflowr (version 1.6.2).
The command set.seed(12345) was run prior to running the code in the R Markdown file. Setting a seed ensures that any results that rely on randomness, e.g. subsampling or permutations, are reproducible.
The results in this page were generated with repository version 4b704ad. See the Past versions tab to see a history of the changes made to the R Markdown and HTML files.
Note that you need to be careful to ensure that all relevant files for the analysis have been committed to Git prior to generating the results (you can use wflow_publish or wflow_git_commit). workflowr only checks the R Markdown file, but you know if there are other scripts or data files that it depends on. Below is the status of the Git repository when the results were generated:
Ignored files:
Ignored: .Rproj.user/
Ignored: output/
Untracked files:
Untracked: .DS_Store
Untracked: .gitignore
Untracked: 2020_HairPheno_manuscript.Rproj
Untracked: analysis/
Untracked: code/
Untracked: data/
Note that any generated files, e.g. HTML, png, CSS, etc., are not included in this status report because it is ok for generated content to have uncommitted changes.
There are no past versions. Publish this analysis with wflow_publish() to start tracking its development.
To assess the value of quantifying hair fiber morphology, we explore the relationships among quantitative hair traits, categorical data, and genotype data collected from the same sample.
Our data comprise 193 individuals for whom we have quantitative hair phenotype data. As a first quality control step, we keep only individuals who have more than 4 hair fragments in their curvature image and over 10% African ancestry. For the cross-sectional data, we calculate the mean and median across the fibers collected for each individual (approximately 6 sectioned hair fibers per individual). In our analyses, we use median values because they are less affected by intra-individual outliers.
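The filtering and summarising steps described above can be sketched in base R as follows. This is only an illustration under stated assumptions: the data frames, the column names (`n_fragments`, `AFR`, `eccentricity`) and all values are hypothetical and are not taken from the study data.

```r
# Hypothetical per-fiber cross-section measurements (names and values are made up)
sections <- data.frame(
  id           = rep(c("A", "B", "C"), each = 6),
  eccentricity = c(0.70, 0.72, 0.71, 0.69, 0.95, 0.70,   # A: one outlying fiber
                   0.60, 0.61, 0.62, 0.60, 0.59, 0.61,
                   0.80, 0.81, 0.79, 0.80, 0.82, 0.80)
)

# Hypothetical per-individual table: number of hair fragments in the
# curvature image and proportion of African ancestry
individuals <- data.frame(
  id          = c("A", "B", "C"),
  n_fragments = c(12, 3, 20),
  AFR         = c(0.45, 0.60, 0.05)
)

# QC: keep individuals with more than 4 fragments and over 10% African ancestry
kept <- subset(individuals, n_fragments > 4 & AFR > 0.10)

# Per-individual mean and median of the cross-sectional measurement
ecc_mean   <- aggregate(eccentricity ~ id, data = sections, FUN = mean)
ecc_median <- aggregate(eccentricity ~ id, data = sections, FUN = median)
names(ecc_mean)[2]   <- "eccentricity_mean"
names(ecc_median)[2] <- "eccentricity_median"

# Attach the summaries to the individuals that passed QC
summary_tbl <- merge(kept, merge(ecc_mean, ecc_median, by = "id"), by = "id")
print(summary_tbl)
```

In this toy table only individual A passes both filters, and the single outlying fiber pulls A's mean further from the typical fiber value than its median, which is the motivation for using medians.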
We compare the self-reported hair texture with mean and median curvature for our sample.
The high mean curvature of the single outlying individual in the straight group is the result of an artefact in the image.
The red arrow points to a stray fiber that likely contaminated the sample and was missed during imaging. Such potential outliers are the reason we chose to use the median curvature for a sample in our analyses.
To explore how much information is lost when binning continuous variation, we compared mean and median curvature to classified hair texture. This classification is based on Loussouarn et al.’s 2007 paper "Worldwide diversity of hair curliness: a new method of assessment."
While the authors propose a number of parameters to distinguish curlier hair types (based on number of twists and waves among other factors), their primary classification is based on curvature. We demonstrate that, regardless of additional parameters, a considerable range of curvature is obscured when collapsing hair variation according to their curvature thresholds.
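As a rough illustration of how binning hides continuous variation, the sketch below assigns made-up curvature values to classes and reports the spread of curvature within each class. The cut points are arbitrary placeholders chosen for this example; they are not the thresholds published by Loussouarn et al., and the curvature values are hypothetical.

```r
# Hypothetical median curvature values (arbitrary units) for ten individuals
curv_median <- c(0.02, 0.05, 0.11, 0.18, 0.26, 0.33, 0.41, 0.55, 0.68, 0.90)

# Illustrative cut points only; NOT the thresholds from Loussouarn et al.
breaks <- c(0, 0.1, 0.3, Inf)
labels <- c("low", "intermediate", "high")
curv_class <- cut(curv_median, breaks = breaks, labels = labels)

# Width of the curvature range that each class collapses into a single label
range_by_class <- tapply(curv_median, curv_class, function(x) diff(range(x)))

print(table(curv_class))
print(range_by_class)
```

With these placeholder cut points, the "high" class alone spans more of the curvature scale than the other two classes combined, even though every individual in it receives the same label.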
We carried out a number of analyses using the genotype data collected for this diverse sample. In an admixed sample where a continuous trait has divergent distributions in the parental ancestry groups, the resulting admixed population can show a correlation between ancestry and that trait. Finding such a correlation may imply a polygenic trait with high heritability.
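The reasoning above can be illustrated with a small simulation. The sketch below is not the analysis used here; it assumes a purely additive trait controlled by 100 loci whose allele frequencies differ between two hypothetical parental groups, with all frequencies, effect sizes and noise levels made up. High heritability is built in by construction.

```r
set.seed(1)  # arbitrary seed for this sketch
n <- 500

# Hypothetical ancestry proportions (fraction from parental group 2)
anc <- runif(n)

# Polygenic trait: n_loci loci whose allele frequencies differ between the
# two parental groups (frequencies are made up for illustration)
n_loci   <- 100
p_group1 <- runif(n_loci, 0.1, 0.5)
p_group2 <- p_group1 + 0.3   # trait-increasing alleles more common in group 2

# Each individual's expected allele frequency at a locus is a mixture of the
# parental frequencies, weighted by ancestry (n x n_loci matrix)
p_ind <- outer(1 - anc, p_group1) + outer(anc, p_group2)
geno  <- matrix(rbinom(n * n_loci, size = 2, prob = p_ind), nrow = n)

# Additive trait with equal per-allele effects plus environmental noise
genetic <- rowSums(geno)
trait   <- genetic + rnorm(n, sd = sd(genetic) / 2)

# Correlation between ancestry and the trait in the admixed sample
ct <- cor.test(anc, trait)
print(ct$estimate)
```

Because many loci each shift slightly in the same direction with ancestry, their summed effect tracks ancestry closely, which produces a strong ancestry-trait correlation in the simulated admixed sample.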
Our sample consists of admixed individuals with primarily African and European ancestry.
The colors represent ancestries that correspond to the following 1000 Genomes populations:
- SAS = South Asian
- AMR = American
- AFR = African
- EUR = European
- EAS = East Asian
Each of these is a metapopulation formed by grouping multiple (sub)continental population groups in the 1000 Genomes repository.
Here we plot the correlation between proportion of African ancestry and m-index, median curvature, and eccentricity.
The relationship between cross-sectional shape (eccentricity) and curvature has long been debated. Because cross-sectional shape and curvature coincide in the populations that are often contrasted (e.g. East Asian vs. North European vs. West African), it has been unclear whether these traits have a causal relationship (specifically, whether higher eccentricity predicts higher curvature). Our admixed sample gives us the opportunity to test this question by fitting models relating these traits with and without ancestry.
First we examine the data without correcting for ancestry.
Call:
lm(formula = curv_median ~ eccentricity_median, data = df_curv_ecc)

Residuals:
    Min      1Q  Median      3Q     Max
-0.2843 -0.1245 -0.0264  0.1125  0.4770

Coefficients:
                    Estimate Std. Error t value Pr(>|t|)
(Intercept)          -0.5682     0.1270  -4.474 1.94e-05 ***
eccentricity_median   1.0284     0.1692   6.076 1.96e-08 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.1576 on 106 degrees of freedom
Multiple R-squared:  0.2583,  Adjusted R-squared:  0.2513
F-statistic: 36.92 on 1 and 106 DF,  p-value: 1.959e-08
If we consider the relationship between curvature and eccentricity without taking into account ancestry, we find that eccentricity is a significant predictor of curvature.
We then re-analyze the data with ancestry as a covariate.
Call:
lm(formula = curv_median ~ eccentricity_median + AFR, data = df_curv_ecc)

Residuals:
     Min       1Q   Median       3Q      Max
-0.37902 -0.04122 -0.00657  0.03606  0.29971

Coefficients:
                    Estimate Std. Error t value Pr(>|t|)
(Intercept)         -0.07237    0.08661  -0.836    0.405
eccentricity_median  0.10891    0.12511   0.871    0.386
AFR                  0.48101    0.03632  13.243   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.09693 on 105 degrees of freedom
Multiple R-squared:  0.7222,  Adjusted R-squared:  0.7169
F-statistic: 136.5 on 2 and 105 DF,  p-value: < 2.2e-16
However, once ancestry is included as a covariate, eccentricity is no longer a significant predictor of curvature (p = 0.386). This supports the idea that these traits may be independent.
To demonstrate the potential effect of population stratification on traits, we compare hair curvature with skin pigmentation (m-index). These two traits are not biologically related, yet, in an admixed population, we may see a correlation that is due to population stratification of these polygenic traits.
First we examine the relationship between curvature and skin pigmentation without correcting for ancestry.
Call:
lm(formula = curv_median ~ m_index, data = df_curv_mindex)

Residuals:
     Min       1Q   Median       3Q      Max
-0.20715 -0.06804 -0.03471  0.04818  0.51483

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.280232   0.041533  -6.747 4.91e-10 ***
m_index      0.012887   0.001031  12.501  < 2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.1261 on 126 degrees of freedom
Multiple R-squared:  0.5536,  Adjusted R-squared:  0.5501
F-statistic: 156.3 on 1 and 126 DF,  p-value: < 2.2e-16
As expected, we see a significant correlation between the two traits.
We then apply a correction for ancestry and re-analyze the data.
Call:
lm(formula = curv_median ~ m_index + AFR, data = df_curv_mindex)

Residuals:
     Min       1Q   Median       3Q      Max
-0.34321 -0.04877 -0.01348  0.03562  0.34806

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.061501   0.042384  -1.451    0.149
m_index      0.002734   0.001469   1.862    0.065 .
AFR          0.406609   0.048561   8.373 9.63e-14 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.1014 on 125 degrees of freedom
Multiple R-squared:  0.714,  Adjusted R-squared:  0.7094
F-statistic: 156 on 2 and 125 DF,  p-value: < 2.2e-16
As with curvature and eccentricity, the relationship between curvature and skin pigmentation is no longer significant at the 0.05 level (p = 0.065) when ancestry is taken into account.
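The pattern seen in both pairs of models, a strong association that weakens once ancestry enters the model, is what confounding by ancestry would produce. The simulation below is a minimal sketch of that mechanism, not a reanalysis of the study data: `trait_a`, `trait_b`, and every coefficient and noise level are hypothetical, and the two traits are generated so that each depends on ancestry but not on the other.

```r
set.seed(2)  # arbitrary seed for this sketch
n <- 200

# Hypothetical ancestry proportion and two traits that each depend on
# ancestry but have no direct effect on one another (coefficients made up)
AFR     <- runif(n)
trait_a <- 0.5 * AFR + rnorm(n, sd = 0.1)
trait_b <- 40 + 50 * AFR + rnorm(n, sd = 8)
sim <- data.frame(AFR, trait_a, trait_b)

# Model without ancestry, then with ancestry as a covariate
fit_naive    <- lm(trait_a ~ trait_b, data = sim)
fit_adjusted <- lm(trait_a ~ trait_b + AFR, data = sim)

# Slope and p-value for trait_b in each model
coefs_naive    <- summary(fit_naive)$coefficients
coefs_adjusted <- summary(fit_adjusted)$coefficients
print(coefs_naive["trait_b", c("Estimate", "Pr(>|t|)")])
print(coefs_adjusted["trait_b", c("Estimate", "Pr(>|t|)")])
```

In the unadjusted model, `trait_b` acts as a proxy for ancestry and so appears to predict `trait_a`. Once `AFR` is in the model, the `trait_b` slope shrinks toward zero because the shared dependence on ancestry is absorbed by the covariate. The same logic applies to any pair of traits that differ between the parental populations of an admixed sample.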
R version 3.6.3 (2020-02-29)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS Mojave 10.14.6
Matrix products: default
BLAS: /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRlapack.dylib
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] cowplot_1.1.0 knitr_1.30 forcats_0.5.0 stringr_1.4.0
[5] dplyr_1.0.2 purrr_0.3.4 readr_1.4.0 tidyr_1.1.2
[9] tibble_3.0.4 ggplot2_3.3.2 tidyverse_1.3.0
loaded via a namespace (and not attached):
[1] Rcpp_1.0.5 lattice_0.20-38 lubridate_1.7.9.2 assertthat_0.2.1
[5] rprojroot_2.0.2 digest_0.6.27 R6_2.5.0 cellranger_1.1.0
[9] plyr_1.8.6 backports_1.2.0 reprex_0.3.0 evaluate_0.14
[13] httr_1.4.2 pillar_1.4.6 rlang_0.4.8 readxl_1.3.1
[17] rstudioapi_0.13 Matrix_1.2-18 rmarkdown_2.5 labeling_0.4.2
[21] splines_3.6.3 munsell_0.5.0 broom_0.7.2 compiler_3.6.3
[25] httpuv_1.5.4 modelr_0.1.8 xfun_0.19 pkgconfig_2.0.3
[29] mgcv_1.8-31 htmltools_0.5.0 tidyselect_1.1.0 workflowr_1.6.2
[33] fansi_0.4.1 crayon_1.3.4 dbplyr_2.0.0 withr_2.3.0
[37] later_1.1.0.1 grid_3.6.3 nlme_3.1-144 jsonlite_1.7.1
[41] gtable_0.3.0 lifecycle_0.2.0 DBI_1.1.0 git2r_0.27.1
[45] magrittr_1.5 scales_1.1.1 cli_2.1.0 stringi_1.5.3
[49] farver_2.0.3 reshape2_1.4.4 fs_1.5.0 promises_1.1.1
[53] xml2_1.3.2 ellipsis_0.3.1 generics_0.1.0 vctrs_0.3.4
[57] tools_3.6.3 glue_1.4.2 hms_0.5.3 yaml_2.2.1
[61] colorspace_2.0-0 rvest_0.3.6 haven_2.3.1