Last updated: 2021-05-23
Knit directory: wildlife-bacteria/
This reproducible R Markdown analysis was created with workflowr (version 1.6.2). The Checks tab describes the reproducibility checks that were applied when the results were created. The Past versions tab lists the development history.
Great! Since the R Markdown file has been committed to the Git repository, you know the exact version of the code that produced these results.
Great! You are using Git for version control. Tracking code development and connecting the code version to the results is critical for reproducibility.
The results in this page were generated with repository version 3a8e4cd. See the Past versions tab to see a history of the changes made to the R Markdown and HTML files.
Note that you need to be careful to ensure that all relevant files for the analysis have been committed to Git prior to generating the results (you can use wflow_publish or wflow_git_commit). workflowr only checks the R Markdown file, but you know if there are other scripts or data files that it depends on. Below is the status of the Git repository when the results were generated:
Ignored files:
Ignored: .DS_Store
Ignored: .Rhistory
Ignored: .Rproj.user/
Ignored: analysis/.DS_Store
Ignored: data/.DS_Store
Ignored: output/.DS_Store
Ignored: output/plots/.DS_Store
Ignored: output/plots/QC/.DS_Store
Ignored: output/plots/boxplots_select_taxa/.DS_Store
Ignored: output/plots/heatmaps/.DS_Store
Ignored: output/plots/maps/.DS_Store
Ignored: output/plots/tax_prev_abund/.DS_Store
Untracked files:
Untracked: NCBI_data/
Untracked: analysis/microbiome-viz-extra.Rmd
Untracked: analysis/tois.Rmd
Untracked: data/dada2/
Untracked: data/dada2_tois/
Untracked: data/taxa_trees/
Untracked: data/tmp/
Untracked: output/beta-div-statistics.txt
Untracked: output/supp_table_pos.xlsx
Untracked: tmp/
Unstaged changes:
Modified: analysis/_site.yml
Modified: analysis/index.Rmd
Note that any generated files, e.g. HTML, png, CSS, etc., are not included in this status report because it is ok for generated content to have uncommitted changes.
These are the previous versions of the repository in which changes were made to the R Markdown (analysis/QIIME2.Rmd) and HTML (docs/QIIME2.html) files. If you’ve configured a remote Git repository (see ?wflow_git_remote), click on the hyperlinks in the table below to view the files as they were in that past version.
File | Version | Author | Date | Message |
---|---|---|---|---|
Rmd | 0d602a3 | siobhon-egan | 2021-05-23 | updates to pages |
html | 0d602a3 | siobhon-egan | 2021-05-23 | updates to pages |
html | a69dea3 | siobhon-egan | 2021-04-24 | Build site. |
Rmd | 6ebcc5d | siobhon-egan | 2021-04-24 | updates |
html | 6ebcc5d | siobhon-egan | 2021-04-24 | updates |
html | 5486f7e | siobhon-egan | 2021-02-26 | Build site. |
Rmd | ebe20d0 | siobhon-egan | 2021-02-26 | add qiime and phyloseq pages |
Analysis of 16S rRNA (hypervariable region v1-2) metabarcoding.
Raw Illumina MiSeq .fastq.gz reads were analysed with the QIIME2-2020.11 pipeline, using DADA2 denoising to create ASVs.
Author: Siobhon Egan (siobhonegan@hotmail.com, siobhon.egan@murdoch.edu.au). Last updated: Jan 2021
Background
This workflow is written for analysing amplicon data in QIIME2. The input data are Illumina MiSeq paired-end reads prepared using Nextera XT indexes, so no additional demultiplexing steps are needed in this case. If your data do require demultiplexing, that step can easily be added.
This workflow uses the QIIME2 command-line interface.
Requires miniconda/conda, see here
Latest version = QIIME2-2020.11, see QIIME2 documentation for install based on your platform.
Activate qiime2 environment
conda activate qiime2-2020.11
Assumes paired-end data that does not require demultiplexing
Place the raw data files, in gzipped format (i.e. .fastq.gz), in a directory called raw_data/.
In Casava 1.8 demultiplexed (paired-end) format, there are two .fastq.gz files for each sample in the study, each containing the forward or reverse reads for that sample. The file name includes the sample identifier. The forward and reverse read file names for a single sample might look like XXXX_L001_R1_001.fastq.gz and XXXX_L001_R2_001.fastq.gz, respectively. The underscore-separated fields in this file name are the sample identifier, the barcode sequence or a barcode identifier, the lane number, the direction of the read (i.e. R1 or R2), and the set number; here XXXX stands in for the first two fields.
Depending on your sequencing facility, you may need to add the _001 suffix to the sample file names.
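Before renaming anything, it can help to list which files already follow this naming scheme. The sketch below is one possible check, not part of the original workflow: the function names is_casava_name and list_badly_named are hypothetical, and the glob only approximates the layout described above (anything, then _L plus three digits, then _R1 or _R2, then _001.fastq.gz).

```shell
#!/bin/sh
# Succeeds if the file name ends in _L<3 digits>_R1_001.fastq.gz or _R2_001.fastq.gz.
# The glob is an approximation of the Casava 1.8 layout, not QIIME2's own validator.
is_casava_name() {
    case "$(basename "$1")" in
        ?*_L[0-9][0-9][0-9]_R[12]_001.fastq.gz) return 0 ;;
        *) return 1 ;;
    esac
}

# Prints every .fastq.gz file in a directory (default: raw_data) that fails the check.
list_badly_named() {
    dir="${1:-raw_data}"
    for file in "$dir"/*.fastq.gz; do
        [ -e "$file" ] || continue          # skip the literal pattern when nothing matches
        is_casava_name "$file" || echo "$file"
    done
}
```

Running list_badly_named raw_data prints nothing when every file already matches; any file it prints is a candidate for renaming.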
Note, however, that you do not need to unzip the fastq data before analysis.
From the directory that contains raw_data/, rename the raw data files:
# Drop the facility-specific "0_BPDNR" tag and add the "_001" set number before ".fastq"
for file in raw_data/*.fastq.gz; do
    newname=$(echo "$file" | sed -e 's/0_BPDNR//' -e 's/\.fastq/_001.fastq/')
    mv -- "$file" "$newname"
done
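Because mv changes files in place, a dry run is a cheap safeguard. The following sketch applies the same two sed substitutions as the loop above but only prints the mv commands. The function names casava_newname and preview_renames are hypothetical, and skipping files that already end in _001.fastq.gz is an added assumption so that the preview can be repeated safely.

```shell
#!/bin/sh
# Prints the name a file would get: strip the "0_BPDNR" tag, add "_001" before ".fastq".
casava_newname() {
    printf '%s\n' "$1" | sed -e 's/0_BPDNR//' -e 's/\.fastq/_001.fastq/'
}

# Dry run: prints one mv command per file instead of executing it.
preview_renames() {
    dir="${1:-raw_data}"
    for file in "$dir"/*.fastq.gz; do
        [ -e "$file" ] || continue
        case "$file" in
            *_001.fastq.gz) continue ;;     # assumed to be renamed already
        esac
        echo "mv -- '$file' '$(casava_newname "$file")'"
    done
}
```

Check the printed commands from preview_renames raw_data by eye before running the real loop.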
Import the .fastq.gz data into QIIME2 format using the Casava 1.8 demultiplexed (paired-end) option. Remember that this assumes the raw data are in a directory named raw_data/ and that the file names follow the format above.
qiime tools import \
--type 'SampleData[PairedEndSequencesWithQuality]' \
--input-path raw_data \
--input-format CasavaOneEightSingleLanePerSampleDirFmt \
--output-path 16S_demux_seqs.qza
In this case we are using Nextera indexes, which means the reads are demultiplexed automatically by BaseSpace, so we can skip any demultiplexing steps.
Inspect reads for quality
To inspect the raw reads:
qiime demux summarize \
--i-data 16S_demux_seqs.qza \
--o-visualization 16S_demux_seqs.qzv
View this output by importing it into QIIME 2 View. Use it to choose your QC parameters, such as where to trim low-quality bases and where to truncate the sequences.
The sample metadata file holds the metadata associated with your samples (e.g. host information, sampling data, etc.). Tutorial here
The metadata needs to be in .tsv format. The easiest way to create it is to open the QIIME2 Google Sheets example, save a copy, and edit or add your sample details, then select File > Download as > Tab-separated values. Alternatively, the command wget -O "sample-metadata.tsv" "https://data.qiime2.org/2020.11/tutorials/moving-pictures/sample_metadata.tsv" will download the example sample metadata as tab-separated text and save it in the file sample-metadata.tsv. It is important that you don’t change the header of the first column, sample-id.
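A quick check of the metadata file before it is passed to QIIME2 can catch the most common slips. This is a minimal sketch with hypothetical function names (has_sample_id_header, duplicate_sample_ids). It assumes a tab-separated file whose first column header is sample-id, as described above, and it treats lines starting with # as comments or directives.

```shell
#!/bin/sh
# Succeeds if the first column header of a tab-separated metadata file is "sample-id".
has_sample_id_header() {
    [ "$(head -n 1 "$1" | cut -f 1)" = "sample-id" ]
}

# Prints each sample ID that occurs more than once
# (ignores the header row, empty IDs and lines starting with "#").
duplicate_sample_ids() {
    awk -F '\t' 'NR > 1 && $1 !~ /^#/ && $1 != "" { if (seen[$1]++ == 1) print $1 }' "$1"
}
```

For example, has_sample_id_header sample-metadata.tsv || echo "check the first column header" gives an early warning, and duplicate_sample_ids sample-metadata.tsv should print nothing.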
Denoise using dada2
Based on the quality plots in the output above (16S_demux_seqs.qzv), set the truncation lengths (--p-trunc-len-f and --p-trunc-len-r) to the positions where read quality drops off.
You can also trim primers at this step: --p-trim-left-f and --p-trim-left-r remove the given number of bases from the 5' end of the forward and reverse reads.
Example data: amplicon NGS data targeting bacteria using the 16S rRNA hypervariable region 1-2 with the following primers:
qiime dada2 denoise-paired \
--i-demultiplexed-seqs 16S_demux_seqs.qza \
--p-trim-left-f 20 \
--p-trim-left-r 19 \
--p-trunc-len-f 250 \
--p-trunc-len-r 250 \
--o-table 16S_denoise_table.qza \
--o-representative-sequences 16S_denoise_rep-seqs.qza \
--o-denoising-stats 16S_denoise-stats.qza
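When choosing truncation lengths, it is worth checking that the forward and reverse reads still overlap enough to be merged. The arithmetic is overlap = trunc-len-f + trunc-len-r - amplicon length (primers included). The sketch below wraps that in two hypothetical functions. The amplicon length used in the usage note (350) is a placeholder, not a value from this workflow, and the default minimum of 12 bases is an assumption taken from the default of DADA2's mergePairs, so check it against the version you run.

```shell
#!/bin/sh
# Overlap (in bases) left between forward and reverse reads after truncation.
# $1 = --p-trunc-len-f, $2 = --p-trunc-len-r, $3 = amplicon length including primers
read_overlap() {
    echo $(( $1 + $2 - $3 ))
}

# Succeeds if the overlap reaches a minimum ($4, default 12: an assumed threshold).
overlap_ok() {
    [ "$(read_overlap "$1" "$2" "$3")" -ge "${4:-12}" ]
}
```

With the truncation lengths used above and a placeholder amplicon of 350 bases, read_overlap 250 250 350 gives 150. Leave extra margin for natural length variation in the amplicon.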
At this stage, you will have artifacts containing the feature table, corresponding feature sequences, and DADA2 denoising stats. You can generate summaries of these as follows.
qiime feature-table summarize \
--i-table 16S_denoise_table.qza \
--o-visualization 16S_denoise_table.qzv \
--m-sample-metadata-file sample-metadata.tsv # Can skip this bit if needed.
qiime feature-table tabulate-seqs \
--i-data 16S_denoise_rep-seqs.qza \
--o-visualization 16S_denoise_rep-seqs.qzv
qiime metadata tabulate \
--m-input-file 16S_denoise-stats.qza \
--o-visualization 16S_denoise-stats.qzv
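The denoising stats can also be screened outside the viewer. If the stats artifact is exported with qiime tools export, the result is a tab-separated file. The sketch below flags samples that kept only a small share of their input reads. The function name low_retention_samples is hypothetical, and the column names input and non-chimeric and the #q2:types directive line are assumptions about that exported file, so confirm them against your own export.

```shell
#!/bin/sh
# Prints "sample<TAB>percent retained" for samples below a threshold ($2, default 50).
# Columns are found by header name; "input" and "non-chimeric" are assumed names.
# Samples with zero input reads are skipped.
low_retention_samples() {
    awk -F '\t' -v min="${2:-50}" '
        NR == 1 { for (i = 1; i <= NF; i++) col[$i] = i; next }
        /^#/    { next }                      # skip directive lines such as #q2:types
        $(col["input"]) > 0 {
            pct = 100 * $(col["non-chimeric"]) / $(col["input"])
            if (pct < min) printf "%s\t%.1f\n", $1, pct
        }
    ' "$1"
}
```

A call such as low_retention_samples stats.tsv 30 lists samples that retained less than 30 % of their reads, which may point to truncation settings that are too strict.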
To merge denoised data sets and generate a single FeatureTable[Frequency] artifact and a single FeatureData[Sequence] artifact:
qiime feature-table merge \
--i-tables table-1.qza \
--i-tables table-2.qza \
--o-merged-table table.qza
qiime feature-table merge-seqs \
--i-data rep-seqs-1.qza \
--i-data rep-seqs-2.qza \
--o-merged-data rep-seqs.qza
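With more than two sequencing runs, the --i-tables option is simply repeated once per table. A small helper can assemble the command so that no table is forgotten. This is a sketch only: merge_tables_cmd is a hypothetical name, it prints the command rather than running it, and it assumes file names without spaces.

```shell
#!/bin/sh
# Prints a "qiime feature-table merge" command for any number of input tables.
# $1 = output artifact; remaining arguments = tables to merge
merge_tables_cmd() {
    out="$1"; shift
    printf 'qiime feature-table merge'
    for table in "$@"; do
        printf ' --i-tables %s' "$table"
    done
    printf ' --o-merged-table %s\n' "$out"
}
```

For instance, merge_tables_cmd table.qza table-1.qza table-2.qza table-3.qza prints a single command with three --i-tables options, which can be reviewed and then pasted into the terminal.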
To produce an ASV table with the number of reads of each ASV per sample that you can open in Excel, use the tutorial here.
You need to make a biom file first:
qiime tools export \
--input-path 16S_denoise_table.qza \
--output-path feature-table
biom convert \
-i feature-table/feature-table.biom \
-o feature-table/feature-table.tsv \
--to-tsv
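The resulting feature-table.tsv can be summarised directly on the command line, for example to get the total number of reads per sample. The sketch below assumes the layout that biom convert usually writes: a comment line, then a header row starting with #OTU ID, then one row per ASV. The function name reads_per_sample is hypothetical.

```shell
#!/bin/sh
# Prints "sample<TAB>total reads" for each sample column of the exported table.
# Assumes the header row starts with "#OTU ID"; other lines starting with "#" are skipped.
reads_per_sample() {
    awk -F '\t' '
        /^#OTU ID/ { n = NF; for (i = 2; i <= NF; i++) name[i] = $i; next }
        /^#/       { next }
        { for (i = 2; i <= n; i++) total[i] += $i }
        END { for (i = 2; i <= n; i++) printf "%s\t%d\n", name[i], total[i] }
    ' "$1"
}
```

Running reads_per_sample feature-table/feature-table.tsv gives a quick view of sequencing depth per sample before any diversity analysis.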
Several downstream diversity metrics, available within QIIME 2, require that a phylogenetic tree be constructed using the Operational Taxonomic Units (OTUs) or Amplicon Sequence Variants (ASVs) being investigated. Documentation here
qiime phylogeny align-to-tree-mafft-fasttree \
--i-sequences rep-seqs.qza \
--o-alignment aligned-rep-seqs.qza \
--o-masked-alignment masked-aligned-rep-seqs.qza \
--o-tree unrooted-tree.qza \
--o-rooted-tree rooted-tree.qza
Export
Convert the unrooted tree output to a Newick-formatted file
qiime tools export \
--input-path unrooted-tree.qza \
--output-path exported-tree
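One simple sanity check on the exported Newick file is to count its tips and compare that number with the number of ASVs. In a Newick string the number of tips equals the number of commas plus one. The sketch below relies on that and therefore assumes that no tip label contains a comma. The function name count_tips and the file name in the usage note are assumptions.

```shell
#!/bin/sh
# Counts the tips of a Newick tree as (number of commas + 1).
# Assumes tip labels themselves contain no commas.
count_tips() {
    commas=$(tr -cd ',' < "$1" | wc -c)
    echo $(( commas + 1 ))
}
```

If the export wrote a file such as exported-tree/tree.nwk, count_tips exported-tree/tree.nwk should match the number of representative sequences that went into the tree.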
Assign taxonomy to the denoised sequences using a pre-trained naive Bayes classifier and the q2-feature-classifier plugin. Details on how to create a classifier are available here.
Note that taxonomic classifiers perform best when they are trained based on your specific sample preparation and sequencing parameters, including the primers that were used for amplification and the length of your sequence reads.
qiime feature-classifier classify-sklearn \
--i-classifier /Taxonomy/QIIME2_classifiers_v2020.11/Silva_99_Otus/27F-388Y/classifier.qza \
--i-reads 16S_denoise_rep-seqs.qza \
--o-classification qiime2-taxa-silva/taxonomy.qza
qiime metadata tabulate \
--m-input-file qiime2-taxa-silva/taxonomy.qza \
--o-visualization qiime2-taxa-silva/taxonomy.qzv
To download the sample OTU table, you first need to do the taxonomy assignment and then make the taxa barplot. You can then download a csv file with sequence number, samples and taxonomy. See here
qiime taxa barplot \
--i-table table.qza \
--i-taxonomy taxonomy.qza \
--m-metadata-file sample-metadata.tsv \
--o-visualization taxa-bar-plots.qzv
Details on sample metadata available here
Extra bit of code to export the taxonomy table to tsv from the command line
qiime tools export \
--input-path taxonomy.qza \
--output-path exports
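Once both the taxonomy and the feature table exist as tsv files, they can be joined so that each ASV row carries its taxon string. This is a minimal sketch under several assumptions: the export above wrote exports/taxonomy.tsv with the columns Feature ID, Taxon and Confidence, the feature table was converted to feature-table/feature-table.tsv as earlier on this page, and ASVs missing from the taxonomy file are labelled NA. The function name add_taxonomy is hypothetical.

```shell
#!/bin/sh
# Appends the Taxon string to each row of the feature table.
# $1 = taxonomy tsv (Feature ID, Taxon, ...), $2 = feature table tsv from biom convert
add_taxonomy() {
    awk -F '\t' -v OFS='\t' '
        NR == FNR  { if (FNR > 1) taxon[$1] = $2; next }   # first file: Feature ID -> Taxon
        /^#OTU ID/ { print $0, "Taxon"; next }             # header row gains a Taxon column
        /^#/       { next }                                # drop other comment lines
        { t = ($1 in taxon) ? taxon[$1] : "NA"; print $0, t }
    ' "$1" "$2"
}
```

For example, add_taxonomy exports/taxonomy.tsv feature-table/feature-table.tsv > feature-table-with-taxonomy.tsv writes a table that opens directly in a spreadsheet program.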