Tool Documentation Index


Showing docs for 3.7-0

Name Summary
CommandLineGATK Command line parameters accepted by most if not all tools in the GATK

Name Summary
ASEReadCounter Calculate read counts per allele for allele-specific expression analysis
AnalyzeCovariates Create plots to visualize base recalibration results
CallableLoci Collect statistics on callable, uncallable, poorly mapped, and other parts of the genome
CheckPileup Compare GATK's internal pileup to a reference Samtools pileup
CompareCallableLoci Compare callability statistics
ContEst Estimate cross-sample contamination
CountBases Count the number of bases in a set of reads
CountIntervals Count contiguous regions in an interval list
CountLoci Count the total number of covered loci
CountMales Count the number of reads seen from male samples
CountRODs Count the number of ROD objects encountered
CountRODsByRef Count the number of ROD objects encountered along the reference
CountReadEvents Count the number of read events
CountReads Count the number of reads
CountTerminusEvent Count the number of reads ending in insertions, deletions or soft-clips
DepthOfCoverage Assess sequence coverage by a wide array of metrics, partitioned by sample, read group, or library
DiagnoseTargets Analyze coverage distribution and validate read mates per interval and per sample
DiffObjects A generic engine for comparing tree-structured objects
ErrorRatePerCycle Compute the read error rate per position
FastaStats Calculate basic statistics about the reference sequence itself
FindCoveredIntervals Outputs a list of intervals that are covered to or above a given threshold
FlagStat Collect statistics about sequence reads based on their SAM flags
GCContentByInterval Calculates the GC content of the reference sequence for each interval
GatherBqsrReports Gather recalibration reports from parallelized base recalibration runs. Intended for combining recalibration tables from runs of BaseRecalibrator parallelized per interval.
Pileup Print read alignments in Pileup-style format
PrintRODs Print out all of the RODs in the input data set
QualifyMissingIntervals Collect quality metrics for a set of intervals
ReadClippingStats Collect read clipping statistics
ReadGroupProperties Collect statistics about read groups and their properties
ReadLengthDistribution Collect read length statistics
SimulateReadsForVariants Generate simulated reads for variants

Name Summary
BaseRecalibrator Detect systematic errors in base quality scores
ClipReads Read clipping based on quality, position or sequence matching
IndelRealigner Perform local realignment of reads around indels
LeftAlignIndels Left-align indels within reads in a bam file
PrintReads Write out sequence read data (for filtering, merging, subsetting etc)
RealignerTargetCreator Define intervals to target for local realignment
SplitNCigarReads Splits reads that contain Ns in their CIGAR string
SplitSamFile Split a BAM file by sample

Name Summary
ApplyRecalibration Apply a score cutoff to filter variants based on a recalibration table
CalculateGenotypePosteriors Calculate genotype posterior likelihoods given panel data
GATKPaperGenotyper Simple Bayesian genotyper used in the original GATK paper
GenotypeGVCFs Perform joint genotyping on gVCF files produced by HaplotypeCaller
HaplotypeCaller Call germline SNPs and indels via local re-assembly of haplotypes
MuTect2 Call somatic SNPs and indels via local re-assembly of haplotypes
RegenotypeVariants Regenotypes the variants from a VCF containing PLs or GLs
UnifiedGenotyper Call SNPs and indels on a per-locus basis
VariantRecalibrator Build a recalibration model to score variant quality for filtering purposes

Name Summary
GenotypeConcordance Genotype concordance between two callsets
ValidateVariants Validate a VCF file with an extra strict set of criteria
VariantEval General-purpose tool for variant evaluation (% in dbSNP, genotype concordance, Ti/Tv ratios, and a lot more)
VariantFiltration Filter variant calls based on INFO and FORMAT annotations

Name Summary
CatVariants Concatenate VCF files of non-overlapping genome intervals, all with the same set of samples
CombineGVCFs Combine per-sample gVCF files produced by HaplotypeCaller into a multi-sample gVCF file
CombineVariants Combine variant records from different sources
HaplotypeResolver Haplotype-based resolution of variants in separate callsets
LeftAlignAndTrimVariants Left-align indels in a variant callset
PhaseByTransmission Compute the most likely genotype combination and phasing for trios and parent/child pairs
RandomlySplitVariants Randomly split variants into different sets
ReadBackedPhasing Annotate physical phasing information
SelectHeaders Selects headers from a VCF source
SelectVariants Select a subset of variants from a larger callset
ValidationSiteSelector Randomly select variant records according to specified options
VariantAnnotator Annotate variant calls with context information
VariantsToAllelicPrimitives Simplify multi-nucleotide variants (MNPs) into more basic/primitive alleles
VariantsToBinaryPed Convert VCF to binary pedigree file
VariantsToTable Extract specific fields from a VCF file to a tab-delimited table
VariantsToVCF Convert variants from other file formats to VCF format

Annotations available to VariantAnnotator and the variant callers (some restrictions apply)

Name Summary
AS_BaseQualityRankSumTest Allele-specific Rank Sum Test of REF versus ALT base quality scores
AS_FisherStrand Allele-specific strand bias estimated using Fisher's Exact Test
AS_InbreedingCoeff Allele-specific likelihood-based test for the inbreeding among samples
AS_InsertSizeRankSum Allele-specific Rank Sum Test for insert sizes of REF versus ALT reads
AS_MQMateRankSumTest Allele-specific Rank Sum Test for mate's mapping qualities of REF versus ALT reads
AS_MappingQualityRankSumTest Allele-specific Rank Sum Test for mapping qualities of REF versus ALT reads
AS_QualByDepth Allele-specific call confidence normalized by depth of sample reads supporting the allele
AS_RMSMappingQuality Allele-specific Root Mean Square of the mapping quality of reads across all samples
AS_ReadPosRankSumTest Allele-specific Rank Sum Test for relative positioning of REF versus ALT alleles within reads
AS_StrandOddsRatio Allele-specific strand bias estimated by the Symmetric Odds Ratio test
AlleleBalance Allele balance across all samples
AlleleBalanceBySample Allele balance per sample
AlleleCountBySample Allele count and frequency expectation per sample
BaseCounts Count of A, C, G, T bases across all samples
BaseCountsBySample Count of A, C, G, T bases for each sample
BaseQualityRankSumTest Rank Sum Test of REF versus ALT base quality scores
BaseQualitySumPerAlleleBySample Sum of evidence in reads supporting each allele for each sample
ChromosomeCounts Counts and frequency of alleles in called genotypes
ClippingRankSumTest Rank Sum Test for hard-clipped bases on REF versus ALT reads
ClusteredReadPosition Detect clustering of variants near the ends of reads
Coverage Total depth of coverage per sample and over all samples
DepthPerAlleleBySample Depth of coverage of each allele per sample
DepthPerSampleHC Depth of informative coverage for each sample
ExcessHet Phred-scaled p-value for exact test of excess heterozygosity
FisherStrand Strand bias estimated using Fisher's Exact Test
FractionInformativeReads The fraction of reads deemed informative over the entire cohort
GCContent GC content of the reference around the given site
GenotypeSummaries Summarize genotype statistics from all samples at the site level
HaplotypeScore Consistency of the site with strictly two segregating haplotypes
HardyWeinberg Hardy-Weinberg test for departure from equilibrium
HomopolymerRun Largest contiguous homopolymer run of the variant allele
InbreedingCoeff Likelihood-based test for the inbreeding among samples
LikelihoodRankSumTest Rank Sum Test of per-read likelihoods of REF versus ALT reads
LowMQ Proportion of low quality reads
MVLikelihoodRatio Likelihood of being a Mendelian Violation
MappingQualityRankSumTest Rank Sum Test for mapping qualities of REF versus ALT reads
MappingQualityZero Count of all reads with MAPQ = 0 across all samples
MappingQualityZeroBySample Count of reads with mapping quality zero for each sample
NBaseCount Percentage of N bases
OxoGReadCounts Count of read pairs in the F1R2 and F2R1 configurations supporting the reference and alternate alleles
PossibleDeNovo Existence of a de novo mutation in at least one of the given families
QualByDepth Variant call confidence normalized by depth of sample reads supporting a variant
RMSMappingQuality Root Mean Square of the mapping quality of reads across all samples
ReadPosRankSumTest Rank Sum Test for relative positioning of REF versus ALT alleles within reads
SampleList List samples that are non-reference at a given site
SnpEff Top effect from SnpEff functional predictions
SpanningDeletions Fraction of reads containing spanning deletions
StrandAlleleCountsBySample Number of forward and reverse reads that support each allele
StrandBiasBySample Number of forward and reverse reads that support REF and ALT alleles
StrandOddsRatio Strand bias estimated by the Symmetric Odds Ratio test
TandemRepeatAnnotator Tandem repeat unit composition and counts per allele
TransmissionDisequilibriumTest Wittkowski transmission disequilibrium test
VariantType General category of variant

GATK Engine arguments that filter or transform incoming SAM/BAM data files

Name Summary
BadCigarFilter Filter out reads with wonky CIGAR strings
BadMateFilter Filter out reads whose mate maps to a different contig
CountingFilteringIterator.CountingReadFilter
DuplicateReadFilter Filter out duplicate reads
FailsVendorQualityCheckFilter Filter out reads that fail the vendor quality check
HCMappingQualityFilter Filter out reads with low mapping qualities for HaplotypeCaller
LibraryReadFilter Only use reads from the specified library
MalformedReadFilter Filter out malformed reads
MappingQualityFilter Filter out reads with low mapping qualities
MappingQualityUnavailableFilter Filter out reads with no mapping quality information
MappingQualityZeroFilter Filter out reads with mapping quality zero
MateSameStrandFilter Filter out reads with bad pairing (and related) properties
MaxInsertSizeFilter Filter out reads that exceed a given insert size
MissingReadGroupFilter Filter out reads without read group information
NoOriginalQualityScoresFilter Filter out reads that do not have an original quality score (OQ) tag
NotPrimaryAlignmentFilter Filter out read records that are secondary alignments
OverclippedReadFilter Filter out reads that are over-soft-clipped
Platform454Filter Filter out reads produced by 454 technology
PlatformFilter Filter out reads that were generated by a specific sequencing platform
PlatformUnitFilter Filter out reads with blacklisted platform unit tags
ReadGroupBlackListFilter Filter out reads matching a read group tag value
ReadLengthFilter Filter out reads based on length
ReadNameFilter Only use reads with this read name
ReadStrandFilter Filter out reads based on strand orientation
ReassignMappingQualityFilter Set the mapping quality of all reads to a given value
ReassignOneMappingQualityFilter Set the mapping quality of reads with a given value to another given value
ReassignOriginalMQAfterIndelRealignmentFilter Revert the MQ of reads that were modified by IndelRealigner
SampleFilter Only use reads belonging to a specific sample
SingleReadGroupFilter Only use reads from the specified read group
UnmappedReadFilter Filter out unmapped reads

Codecs for reading reference-ordered data (ROD) resource files such as BED

Name Summary
BeagleCodec Codec for Beagle imputation engine
BedTableCodec The standard table codec that expects loci as contig start stop, not contig:start-stop
RawHapMapCodec A codec for the file types produced by the HapMap consortium
RefSeqCodec Allows for reading in RefSeq information
SAMPileupCodec Decoder for SAM pileup data
SAMReadCodec Decodes a simple SAM text string
TableCodec Reads tab-delimited tabular text files

Name Summary
FastaAlternateReferenceMaker Generate an alternative reference sequence over the specified interval
FastaReferenceMaker Create a subset of a FASTA reference sequence
QCRef Quality control for the reference fasta