Markov Chains

Introduction
The Markov assumption
- The central dogma of biology as a Markov chain
Components of Markov Chains
Hidden Markov Models

Last updated: 2019-05-22


Knit directory: MSTPsummerstatistics/

This reproducible R Markdown analysis was created with workflowr (version 1.3.0).


Repository version: dd1e411


These are the previous versions of the R Markdown and HTML files.

File	Version	Author	Date	Message
Rmd	4ce8e85	Anthony Hung	2019-05-21	bandersnatch add
html	096760a	Anthony Hung	2019-05-18	Build site.
Rmd	193ab25	Anthony Hung	2019-05-18	additions to complete mult testing
html	193ab25	Anthony Hung	2019-05-18	additions to complete mult testing
html	da98ae8	Anthony Hung	2019-05-17	Build site.
Rmd	239723e	Anthony Hung	2019-05-08	Update learning objectives
html	2ec7944	Anthony Hung	2019-05-06	Build site.
Rmd	d45dca4	Anthony Hung	2019-05-06	Republish
html	d45dca4	Anthony Hung	2019-05-06	Republish
Rmd	ee75486	Anthony Hung	2019-05-04	Build site.

Introduction

Markov chains are models that describe a sequence of possible events, drawn from a countable set of states, in which the probability of each transition depends only on the state immediately preceding it. Markov chains are a staple in computational statistics. Our objective today is to learn the basics behind Markov chains and their long-run behavior.

The Markov assumption

Under the Markov assumption, predicting the future behavior of a system requires only knowledge of its present state, not of its past states. For example, given a set of times \(t_1, t_2, t_3, t_4\) and the corresponding states \(X_1, X_2, X_3, X_4\), the Markov assumption (also called the Markov property) gives:

\[P(X_4=1|X_3=0, X_2=1, X_1=1) = P(X_4=1|X_3=0)\]

In other words, “the past and the future are conditionally independent given the present”. If we have knowledge about the present, then knowing the past does not give us any more information to predict what will happen in the future. Another term that is commonly used to describe Markov chains is “memorylessness.”
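The Markov property can be checked empirically. The following is a minimal Python sketch, assuming a two-state chain with made-up transition probabilities (the matrix `P`, the number of simulated chains, and the seed are illustrative choices, not values from these notes). It simulates many short chains and compares an estimate of \(P(X_4=1|X_3=0)\) with an estimate of \(P(X_4=1|X_3=0, X_2=1, X_1=1)\).

```python
import random

# Hypothetical transition probabilities for a two-state chain (states 0 and 1).
# P[i][j] is the probability of moving from state i to state j.
P = [[0.7, 0.3],
     [0.4, 0.6]]

def simulate_chain(n_steps, start, rng):
    """Simulate n_steps states of the chain, beginning at `start`."""
    states = [start]
    for _ in range(n_steps - 1):
        current = states[-1]
        # The next state is drawn using only the current state.
        next_state = 1 if rng.random() < P[current][1] else 0
        states.append(next_state)
    return states

def estimate_conditionals(n_chains=200_000, seed=1):
    """Estimate P(X4=1 | X3=0) and P(X4=1 | X3=0, X2=1, X1=1) by simulation."""
    rng = random.Random(seed)
    n_present = hits_present = 0
    n_history = hits_history = 0
    for _ in range(n_chains):
        x1, x2, x3, x4 = simulate_chain(4, rng.choice([0, 1]), rng)
        if x3 == 0:
            n_present += 1
            hits_present += x4
            # Additionally condition on the earlier history X2=1, X1=1.
            if x2 == 1 and x1 == 1:
                n_history += 1
                hits_history += x4
    return hits_present / n_present, hits_history / n_history

given_present, given_history = estimate_conditionals()
print("P(X4=1 | X3=0)             ~", round(given_present, 3))
print("P(X4=1 | X3=0, X2=1, X1=1) ~", round(given_history, 3))
```

Under these assumptions both estimates should land close to `P[0][1]`, the probability of moving from state 0 to state 1, because conditioning on the earlier states adds no information once \(X_3\) is known.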

Question: What distribution that we have discussed in probability is also described by the property of “memorylessness”?

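The exponential distribution (and its discrete analogue, the geometric distribution) is memoryless. The waiting times between events in a Poisson process are exponentially distributed, which is why you can set any point along a Poisson process as time 0 and have what follows be another Poisson process.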
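As a quick numerical check of memorylessness, the sketch below draws waiting times between events of a Poisson process, which follow an exponential distribution, and compares \(P(T > s + t | T > s)\) with \(P(T > t)\). The rate, the values of `s` and `t`, the sample size, and the seed are arbitrary choices for illustration.

```python
import math
import random

def conditional_survival(rate, s, t, n=200_000, seed=2):
    """Compare P(T > s + t | T > s) with P(T > t) for exponential waiting times."""
    rng = random.Random(seed)
    # Waiting times between events of a Poisson process with the given rate.
    waits = [rng.expovariate(rate) for _ in range(n)]
    # Keep only the waits that have already lasted longer than s.
    survived_s = [w for w in waits if w > s]
    p_conditional = sum(w > s + t for w in survived_s) / len(survived_s)
    p_fresh = sum(w > t for w in waits) / n
    return p_conditional, p_fresh

p_conditional, p_fresh = conditional_survival(rate=1.0, s=1.0, t=0.5)
print("P(T > s + t | T > s) ~", round(p_conditional, 3))
print("P(T > t)             ~", round(p_fresh, 3))
print("exp(-rate * t)       =", round(math.exp(-0.5), 3))
```

If the waiting times are memoryless, the two estimates should agree with each other and with the theoretical value \(e^{-\lambda t}\), up to simulation noise: having already waited `s` units of time does not change how much longer you expect to wait.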

The central dogma of biology as a Markov chain

The central dogma of biology describes how information moves from DNA to RNA to Protein.

\[DNA \rightarrow RNA \rightarrow Protein\]

The assumption under the central dogma is that information flows only in one direction, and never backwards. Under a Markov chain model of the central dogma, the amount of RNA you observe in a cell is some function of the genetic variation at the DNA sequence level (in coding and noncoding regulatory regions), and the amount of protein you see in the cell is some function of the abundance of the RNA transcripts coding for that protein. If you know the amount of RNA in the cell, then knowing the underlying DNA sequence at the gene encoding the protein gives you no additional information for predicting the amount of protein in the cell. Obviously, there are exceptions to such a simple model of biology, but in many cases it serves as a useful approximation of biological networks.
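The conditional independence described above can be illustrated with a toy simulation. This is only a sketch under invented assumptions: the genotype coding (0, 1, or 2 copies of a hypothetical allele), the linear relationships, and every coefficient and noise level are made up for illustration and are not biological estimates. RNA is generated from genotype only, and protein from RNA only, so genotype should be correlated with protein overall but carry no extra information once RNA is accounted for.

```python
import random

def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

def simulate_central_dogma(n_cells=50_000, seed=3):
    """Toy chain DNA -> RNA -> Protein; all numbers are hypothetical."""
    rng = random.Random(seed)
    genotype, rna, protein = [], [], []
    for _ in range(n_cells):
        g = rng.choice([0, 1, 2])             # copies of a hypothetical allele
        r = 5.0 + 2.0 * g + rng.gauss(0, 1)   # RNA depends only on genotype
        p = 3.0 * r + rng.gauss(0, 2)         # protein depends only on RNA
        genotype.append(g)
        rna.append(r)
        protein.append(p)
    return genotype, rna, protein

genotype, rna, protein = simulate_central_dogma()

# Fit protein ~ RNA by simple least squares and keep what RNA cannot explain.
mean_r = sum(rna) / len(rna)
mean_p = sum(protein) / len(protein)
slope = (sum((r - mean_r) * (p - mean_p) for r, p in zip(rna, protein))
         / sum((r - mean_r) ** 2 for r in rna))
intercept = mean_p - slope * mean_r
residual = [p - (intercept + slope * r) for r, p in zip(rna, protein)]

marginal = pearson(genotype, protein)   # genotype vs protein, ignoring RNA
partial = pearson(genotype, residual)   # genotype vs protein after removing RNA
print("corr(genotype, protein)           ~", round(marginal, 3))
print("corr(genotype, protein given RNA) ~", round(partial, 3))
```

In this toy setting the first correlation should be clearly positive while the second should be near zero, which mirrors the statement that DNA adds nothing to the prediction of protein once RNA is known.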

Components of Markov Chains

A Markov chain can be described by two

Hidden Markov Models

https://www.sciencedirect.com/science/article/pii/S0006349596794341