Using Longitudinal 16S rRNA Abundance Data to Identify Microbial Interaction Network

Last updated: 2022-10-22

Knit directory: lglasso_data_analysis/

This reproducible R Markdown analysis was created with workflowr (version 1.7.0).

Repository version: 32641ef

The results in this page were generated with repository version 32641ef.

Note that you need to be careful to ensure that all relevant files for the analysis have been committed to Git prior to generating the results (you can use wflow_publish or wflow_git_commit). workflowr only checks the R Markdown file, but you know if there are other scripts or data files that it depends on. Below is the status of the Git repository when the results were generated:


Ignored files:
    Ignored:    .Rhistory
    Ignored:    .Rproj.user/
    Ignored:    analysis/.Rhistory
    Ignored:    analysis/networkcommon_he.png
    Ignored:    data/heterc1/

Unstaged changes:
    Modified:   analysis/index.Rmd

Note that any generated files, e.g. HTML, png, CSS, etc., are not included in this status report because it is ok for generated content to have uncommitted changes.

These are the previous versions of the repository in which changes were made to the R Markdown (analysis/index.Rmd) and HTML (docs/index.html) files.

File	Version	Author	Date	Message
Rmd	32641ef	Jie Zhou	2022-10-21	minor revisions
html	32641ef	Jie Zhou	2022-10-21	minor revisions
html	534bd70	Jie Zhou	2022-10-21	minor revisions
html	5246568	Jie Zhou	2022-10-21	updated code for all the figures and tables
Rmd	f1fba0a	Jie Zhou	2022-10-21	updated code for all the figures and tables
html	f1fba0a	Jie Zhou	2022-10-21	updated code for all the figures and tables
Rmd	520f495	Jie Zhou	2022-10-21	updated code for all the figures and tables
html	520f495	Jie Zhou	2022-10-21	updated code for all the figures and tables
Rmd	e045a4e	Jie Zhou	2022-09-29	complete version
html	e045a4e	Jie Zhou	2022-09-29	complete version
Rmd	dd32d09	Jie Zhou	2022-09-28	create the repo
html	dd32d09	Jie Zhou	2022-09-28	create the repo
html	4d8e172	Jie Zhou	2022-09-27	Build site.
Rmd	716a0c1	Jie Zhou	2022-09-27	Start workflowr project.

Introduction

This website demonstrates the procedures for reproducing the results in the paper Identifying Microbial Interaction Networks Based on Irregularly Spaced Longitudinal 16S rRNA sequence data.
In the paper, we compared the proposed network identification algorithm lglasso with conventional algorithms, namely glasso, the neighborhood method, GGMselect-C01 and GGMselect-LA. The proposed lglasso outperforms the other methods when the data are longitudinal. To carry out the simulation studies, we defined some additional functions beyond those in the package lglasso; these functions are sourced into the simulation scripts.
Because the authors have not been authorized to disclose the real data used in the paper, we demonstrate only the procedure used to reproduce the simulation results.

Reproduce the results

To run the scripts, first install the package with the following code:

remotes::install_github("jiezhou-2/lglasso", ref = "conditional")
Each figure investigates four scenarios that differ only in their parameter settings, so the code for only one of the four scenarios is displayed. You can change the parameter settings to obtain the results for the other scenarios. The same rule applies to the results in the tables. Because running the code can take hours, I suggest submitting it to a server rather than running it on your local computer, if possible.
All the simulations are implemented with the R function power_compare1, which has the following form:

result = power_compare1(m,n,p,coe,l,rho,prob,heter,community2,zirate)

where
- m is the number of subjects to be simulated
- n is the number of observations for each subject
- p is the number of nodes in the network to be simulated
- coe is the coefficient for the covariate-adjusted lglasso algorithm
- l is the number of replications for the simulation
- rho is a list of length 5. Each component of rho is a sequence of tuning parameters on which the solution path is computed. The five components correspond to the algorithms lglasso, glasso, nh (the neighborhood method), GGMselect-C01 and GGMselect-LA, respectively.
- prob is the edge density of the network to be generated
- heter is a binary indicator. If heter=0, the data are generated from the homogeneous SGGM; if heter=1, the data are generated from the heterogeneous SGGM.
- community2 is a binary indicator. If community2=TRUE, the data are generated from a homogeneous microbial community; if community2=FALSE, the data are generated from a heterogeneous microbial community.
- zirate is a 2-component vector which controls the zero inflation rate in the simulated data.
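
As a rough illustration of how the arguments listed above fit together, the sketch below assembles one scenario as a list and checks the documented shapes (rho of length 5, zirate of length 2). Every numeric value is a hypothetical placeholder and not a setting from the paper; in particular, the expected form of coe is not specified on this page, so the value used here is only a stand-in. The call itself is attempted only if power_compare1 has already been sourced into the session.

```r
# Hypothetical argument values for a single scenario; none of these numbers
# are taken from the paper.
tuning_grid <- seq(0.01, 1, length.out = 20)  # assumed grid, reused for all five algorithms

sim_args <- list(
  m = 20,                  # number of subjects
  n = 10,                  # observations per subject
  p = 30,                  # nodes in the network
  coe = 0,                 # placeholder: the form of coe is an assumption
  l = 5,                   # number of replications
  rho = list(              # one tuning-parameter sequence per algorithm
    lglasso = tuning_grid,
    glasso  = tuning_grid,
    nh      = tuning_grid,
    C01     = tuning_grid,
    LA      = tuning_grid
  ),
  prob = 0.1,              # edge density
  heter = 0,               # 0 = homogeneous SGGM, 1 = heterogeneous SGGM
  community2 = TRUE,       # TRUE = homogeneous microbial community
  zirate = c(0.2, 0.2)     # placeholder zero-inflation settings
)

# Shape checks that follow directly from the argument descriptions.
stopifnot(
  length(sim_args$rho) == 5,
  length(sim_args$zirate) == 2,
  sim_args$heter %in% c(0, 1),
  is.logical(sim_args$community2)
)

# Arguments are passed by position, in the order shown in the call form.
# The call runs only if the simulation functions have been sourced.
if (exists("power_compare1", mode = "function")) {
  result <- do.call(power_compare1, unname(sim_args))
}
```

Keeping a scenario in a single list makes it easy to copy the list and change one or two entries to obtain the other scenarios of a figure or table.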
Output result:
- result[[1]] is a list of length 5 corresponding to the five algorithms. Each of the five components of result[[1]] is an l * 2 matrix. Each row of this matrix is a (TPR, FPR) pair corresponding to one value in the tuning parameter sequence. The figures are plotted from these results.
- result[[2]] is a 5*2 matrix containing the (TPR, FPR) pairs of the five networks selected by the five algorithms based on EBIC.
- result[[3]] is a list recording all (TPR, FPR) results for each replicate.

Procedures for the individual figures and tables:
- Procedure for generating Figure 1.
- Procedure for generating Figure 2.
- Procedure for generating Figure 3.
- Procedure for generating Table 1.
- Procedure for generating Table 2.
- Procedure for generating Table 3.
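
To make the (TPR, FPR) pairs in the output of power_compare1 more concrete, here is a minimal base R sketch of how such a pair can be computed for an undirected network by comparing estimated edges with true edges. The helper edge_rates, the toy 4-node network and the edge scores are all hypothetical. Thresholding the scores is only a stand-in for a tuning-parameter path and is not what lglasso or the other algorithms do.

```r
# True and false positive rates for edge recovery in an undirected graph.
# Only the upper triangle is used, so each edge is counted once.
edge_rates <- function(est_adj, true_adj) {
  ut <- upper.tri(true_adj)
  est <- est_adj[ut] != 0
  tru <- true_adj[ut] != 0
  c(TPR = sum(est & tru) / sum(tru),
    FPR = sum(est & !tru) / sum(!tru))
}

# Toy true network: the chain 1-2-3-4 (illustrative only).
true_adj <- matrix(0, 4, 4)
true_adj[cbind(1:3, 2:4)] <- 1
true_adj <- true_adj + t(true_adj)

# Hypothetical edge scores, filled in column-major order of the upper
# triangle: (1,2), (1,3), (2,3), (1,4), (2,4), (3,4).
score <- matrix(0, 4, 4)
score[upper.tri(score)] <- c(0.9, 0.2, 0.8, 0.1, 0.6, 0.3)
score <- score + t(score)

# One (TPR, FPR) row per threshold, mimicking one row per tuning parameter.
thresholds <- c(0.7, 0.5, 0.25, 0.05)
path <- t(sapply(thresholds, function(th) edge_rates(score > th, true_adj)))
print(path)
```

Plotting the second column of such a matrix against the first gives an ROC-type curve for one algorithm.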