Last updated: 2022-03-17

Checks: 1 1

Knit directory: ~/Documents/Winter_Quarter_2022/Fundamentals/jaurbanChicago.github.io/bin/

This reproducible R Markdown analysis was created with workflowr (version 1.7.0). The Checks tab describes the reproducibility checks that were applied when the results were created. The Past versions tab lists the development history.

R Markdown file: uncommitted changes

The R Markdown file has unstaged changes. To know which version of the R Markdown file created these results, you’ll want to first commit it to the Git repo. If you’re still working on the analysis, you can ignore this warning. When you’re finished, you can run wflow_publish to commit the R Markdown file and build the HTML.

Repository version: 6ad5355

Great! You are using Git for version control. Tracking code development and connecting the code version to the results is critical for reproducibility.

The results in this page were generated with repository version 6ad5355. See the Past versions tab to see a history of the changes made to the R Markdown and HTML files.

Note that you need to be careful to ensure that all relevant files for the analysis have been committed to Git prior to generating the results (you can use wflow_publish or wflow_git_commit). workflowr only checks the R Markdown file, but you know if there are other scripts or data files that it depends on. Below is the status of the Git repository when the results were generated:


Untracked files:
    Untracked:  .DS_Store
    Untracked:  .Rhistory

Unstaged changes:
    Deleted:    New Folder With Items/.DS_Store
    Deleted:    New Folder With Items/Genetic_Drift_Markov_Chain.Rmd
    Deleted:    New Folder With Items/_site.yml
    Deleted:    New Folder With Items/include/footer.html
    Modified:   bin/Genetic_Drift_Introduction.Rmd
    Modified:   bin/Genetic_Drift_Markov.Rmd
    Deleted:    index.html

Note that any generated files, e.g. HTML, png, CSS, etc., are not included in this status report because it is ok for generated content to have uncommitted changes.

These are the previous versions of the repository in which changes were made to the R Markdown (bin/Genetic_Drift_Markov.Rmd) and HTML (docs/Genetic_Drift_Markov.html) files. If you’ve configured a remote Git repository (see ?wflow_git_remote), click on the hyperlinks in the table below to view the files as they were in that past version.

File	Version	Author	Date	Message
html	112890c	jaurbanChicago	2022-03-17	Added Genetic_Drift_Markov.html
Rmd	6b81ff1	jaurbanChicago	2022-03-17	Added Genetic_Drift_Markov.Rmd

Pre-Requisites

Wright-Fisher Model
Discrete Markov Chains
Basic notions of probability theory
Basic notions of population genetics

Introduction to Discrete Markov Chains

Before getting into the specific details of the relationship between Markov chains and genetic drift, it is necessary to briefly explain what a discrete, finite Markov chain is.

Let´s consider a discrete random variable \(X\), which at time points \(0,1,2,3,...\), takes values \(0,1,2,...,M\). Therefore, it is precise to say that each state \(E_i\), \(X\) takes the value \(i\) [2]. This means that at each state \(E_i\) at some time \(t\), \(X\) is in some state \(E_i\). A variable is said to be Markovian if the probability of a certain outcome in the next time step \(t+1\) depends only on the present state at time \(t\) and is memoryless with regard to any previous time step \(t-1,t-2,...\). Therefore, we can now define a discrete Markov chain as a sequence of discrete random variables in which the probability distribution of the different states at each time \(t\) depends only on the state at time \(t-1\) [1,2].

Some important mathematical notation to describe a discrete Markov chain is [2,3]:
\[ P(E_{n+1}|E_{n})=i,E_{n-1}=i_{n-1},...,E_1=i_1,E_0=i_0)=P(E_{n+1}=j|E_n=i)\\ \mathbb{A \space discrete \space Markov \space is \space homogeneous \space if:}\\ P(E_{n+1}|E_{n})=P(E_i=j|E_{i-1}=i) \space \mathbb{for \space all} \space n, \mathbb{so} \space P(E_i=j|E_{i-1})=P_{ij}\\ P_{ij} \gt 0;i,j \ge 0; \sum_{j=0}^{\infty}P_{ij}=1 \\ \mathbb{A \space transition \space probability \space matrix \space can \space be \space assembled \space to \space represent \space the \space probabilities \space of \space} P_{ij} \space \\ P_{ij}= \begin{pmatrix} p_{00} & p_{01} & ... & p_{0M}\\ p_{00} & p_{11} & ... & p_{1M} \\ .\\ .\\ .\\ p_{M0} & p_{M1} & ... & p_{MM} \end{pmatrix} \\ \mathbb{This \space is \space an \space stochastic \space matrix, so \space all \space rows \space must \space sum \space to \space 1} \]

Next, we are going to classify the chains in two different classes:

Irreducible: All states can communicate with one another[3].
Ergodic: Irreducible, recurrent, and aperiodic. Recurrent means that every state \(E_i\) can reenter \(E_i\) often infinitely when the chain starts at this particular \(E_i\)., i.e. \(f_{i}=1\). Aperiodic means all of the Markov chain states are aperiodic [3].

For the remainder of this vignette, it is important to denote that we will be working with Markov chains that have no periodicities in them, which hardly arise in any genetical applications [2]. Finally, we need to define the concept of stationary distribution of a Markov chain. A stationary distribution may be defined with the following mathematical terms [3]:

\[ \mathbb{Consider \space}P^t \space \mathbb{as \space t \space gets \space large}:\\ \lim_{t\to\infty}(\pi^{(0)})^TP^t=\pi^T ; \space \mathbb{the \space \pi \space vector \space is \space the \space limiting \space distribution \space of \space the \space Markov \space chain}\\ \mathbb{Stationarity \space property \space of \space \pi}:\\ \pi^T=\pi^TP; \space \mathbb{because} \space \pi_i=\sum_k \pi_kP_{ki} \] For ergodic chains, the limiting distribution is also the stationary distribution and there is only one unique stationary distribution (i.e., equilibrium distribution) [3]. To obtain the stationary distribution of the Markov chain, we can use matrix exponentiation (each row of the resulting matrix will have \(\pi\) in each row), solve linear equations (solve for \(\pi^T=\pi^TP\)), and use eigendecomposition (\(\pi\) is left eigenvector of \(P\), so eigendecomposition can lead to \(\pi\)).

Markov Chains in Genetic Drift

A discrete Markov chain can be quite helpful

This site was created with R Markdown

Markov Chain Model of Genetic Drift

Jose Antonio Urban Aragon, Ethan Zhong

3/16/2022

Pre-Requisites

Introduction to Discrete Markov Chains

Markov Chains in Genetic Drift