    Showing docs for version 3.7-0


    ApplyRecalibration

    Apply a score cutoff to filter variants based on a recalibration table

    Category: Variant Discovery Tools

    Traversal: LocusWalker

    PartitionBy: LOCUS


    Overview

    This tool performs the second pass in a two-stage process called VQSR; the first pass is performed by the VariantRecalibrator tool. In brief, the first pass consists of creating a Gaussian mixture model by looking at the distribution of annotation values over a high quality subset of the input call set, and then scoring all input variants according to the model. The second pass consists of filtering variants based on score cutoffs identified in the first pass.

    Using the tranche file and recalibration table generated by the previous step, the ApplyRecalibration tool looks at each variant's VQSLOD value and decides which tranche it falls in. Variants in tranches that fall below the specified truth sensitivity filter level have their FILTER field annotated with the corresponding tranche level. This will result in a call set that is filtered to the desired level but retains the information necessary to increase sensitivity if needed.
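
    The tranche decision described above can be sketched as follows. This is a simplified illustration, not GATK's implementation: the function name classify_vqslod, the two-column input (truth sensitivity and minimum VQSLOD per tranche, sorted by increasing sensitivity), and the output labels are all assumptions made for the example. The real .tranches file has more columns, and the filter names GATK writes may differ.

```shell
# classify_vqslod TS_LEVEL VQSLOD < table
# Hypothetical helper, not a GATK command. The table on stdin has two
# whitespace-separated columns, "sensitivity minVQSLOD", sorted by
# increasing sensitivity (a simplified stand-in for the .tranches file).
classify_vqslod() {
  awk -v level="$1" -v lod="$2" '
    { sens[NR] = $1 + 0; minlod[NR] = $2 + 0 }
    END {
      level += 0; lod += 0
      # c: last tranche whose sensitivity does not exceed the requested level
      c = 0
      for (i = 1; i <= NR; i++) if (sens[i] <= level) c = i
      if (c == 0) { print "NO_MATCHING_TRANCHE"; exit 1 }
      # At or above the cutoff score: the variant passes
      if (lod >= minlod[c]) { print "PASS"; exit 0 }
      # Otherwise report the first wider tranche that still contains the score
      for (i = c + 1; i <= NR; i++)
        if (lod >= minlod[i]) {
          printf "Tranche%.2fto%.2f\n", sens[i - 1], sens[i]
          exit 0
        }
      print "BELOW_ALL_TRANCHES"
    }
  '
}
```

    With tranches at 90.0, 99.0, 99.9 and 100.0 and a filter level of 99.0, a variant whose VQSLOD is at least the 99.0 tranche's minimum passes; a lower score is labelled with the first wider tranche that still contains it.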

    To be clear, please note that by "filtered", we mean that variants failing the requested tranche cutoff are marked as filtered in the output VCF; they are not discarded.
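
    Because filtered records stay in the output, it can be useful to tally the FILTER column after a run. The following is a minimal sketch using standard awk and sort; the function name count_filters is illustrative and not part of GATK.

```shell
# Tally the FILTER column (column 7) of a VCF read from stdin.
# count_filters is a hypothetical helper name, not a GATK command.
count_filters() {
  awk -F'\t' '
    /^#/ { next }                 # skip header lines
    { n[$7]++ }                   # count each distinct FILTER value
    END { for (f in n) printf "%s\t%d\n", f, n[f] }
  ' | sort
}
```

    Usage: count_filters < path/to/output.recalibrated.filtered.vcf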

    VQSR is probably the hardest part of the Best Practices to get right, so be sure to read the method documentation, parameter recommendations and tutorial to really understand what these tools do and how to use them for best results on your own data.

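    Input

    The raw input variants to be recalibrated, plus the recalibration table (.recal) and tranches file (.tranches) generated by VariantRecalibrator.

    Output

    A filtered and recalibrated VCF file in which each variant is annotated with its VQSLOD value.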

    Usage example for filtering SNPs

     java -jar GenomeAnalysisTK.jar \
       -T ApplyRecalibration \
       -R reference.fasta \
       -input raw_variants.vcf \
       --ts_filter_level 99.0 \
       -tranchesFile output.tranches \
       -recalFile output.recal \
       -mode SNP \
       -o path/to/output.recalibrated.filtered.vcf
     

    Allele-specific usage

     java -jar GenomeAnalysisTK.jar \
       -T ApplyRecalibration \
       -R reference.fasta \
       -input raw_variants.withASannotations.vcf \
       -AS \
       --ts_filter_level 99.0 \
       -tranchesFile output.AS.tranches \
       -recalFile output.AS.recal \
       -mode SNP \
       -o path/to/output.recalibrated.ASfiltered.vcf
     
    Each allele will be annotated by its corresponding entry in the AS_FilterStatus INFO field annotation. Allele-specific VQSLOD and culprit are also carried through from VariantRecalibrator and stored in the AS_VQSLOD and AS_culprit INFO fields, respectively. The site-level filter is set to the most lenient of any of the allele filters. That is, if one allele passes, the whole site will be PASS. If no alleles pass, the site-level filter will be set to the lowest sensitivity tranche among all the alleles.

    Note that the .tranches and .recal files should be derived from an allele-specific run of VariantRecalibrator. Also note that the AS_culprit, AS_FilterStatus, and AS_VQSLOD fields will have placeholder values (NA or NaN) for alleles of a type that has not yet been processed by ApplyRecalibration. The spanning deletion allele (*) will not be recalibrated because it represents missing data. Its VQSLOD will remain NaN, and its culprit and FilterStatus will be NA.
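
    To check how the site-level filter relates to the per-allele annotations, the relevant fields can be pulled out of the output VCF side by side. The sketch below uses standard awk; the function name show_as_status is illustrative and not part of GATK, and the INFO values are printed raw, with no assumption about how per-allele entries are delimited.

```shell
# Print CHROM, POS, FILTER and the raw AS_FilterStatus / AS_VQSLOD values
# for each record of a VCF read from stdin ("." when a key is absent).
# show_as_status is a hypothetical helper name, not a GATK command.
show_as_status() {
  awk -F'\t' -v OFS='\t' '
    /^#/ { next }                              # skip header lines
    {
      status = "."; lod = "."
      n = split($8, kv, ";")                   # INFO is a ;-separated key=value list
      for (i = 1; i <= n; i++) {
        if (kv[i] ~ /^AS_FilterStatus=/) status = substr(kv[i], 17)
        if (kv[i] ~ /^AS_VQSLOD=/)       lod    = substr(kv[i], 11)
      }
      print $1, $2, $7, status, lod
    }
  '
}
```

    Usage: show_as_status < path/to/output.recalibrated.ASfiltered.vcf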

    Caveats


    Additional Information

    Read filters

    These Read Filters are automatically applied to the data by the Engine before processing by ApplyRecalibration.

    Parallelism options

    This tool can be run in multi-threaded mode using this option.


    Command-line Arguments

    Engine arguments

    All tools inherit arguments from the GATK Engine's "CommandLineGATK" argument collection, which can be used to modify various aspects of the tool's function. For example, the -L argument directs the GATK engine to restrict processing to specific genomic intervals, and the -rf argument applies additional read filters to exclude some of the data from the analysis.
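
    As an illustration of an engine argument in use, the SNP filtering command from the usage example can be restricted to a single interval with -L. The wrapper function name and the output file name are placeholders chosen for this sketch; nothing runs until the function is called.

```shell
# Hypothetical wrapper (not part of GATK): apply SNP filtering to one interval.
# $1 is an interval string such as a contig name; file names are placeholders.
apply_recal_on_interval() {
  java -jar GenomeAnalysisTK.jar \
    -T ApplyRecalibration \
    -R reference.fasta \
    -input raw_variants.vcf \
    -L "$1" \
    --ts_filter_level 99.0 \
    -tranchesFile output.tranches \
    -recalFile output.recal \
    -mode SNP \
    -o output.recalibrated.filtered.vcf
}
```

    Usage: apply_recal_on_interval 20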

    ApplyRecalibration specific arguments

    This table summarizes the command-line arguments that are specific to this tool. For more details on each argument, see the list further down below the table or click on an argument name to jump directly to that entry in the list.

    Argument name(s)                          Default value  Summary

    Required Inputs
    --input                                   NA             The raw input variants to be recalibrated
    --recal_file, -recalFile                  NA             The input recal file used by ApplyRecalibration

    Optional Inputs
    --tranches_file, -tranchesFile            NA             The input tranches file describing where to cut the data

    Optional Outputs
    --out, -o                                 stdout         The output filtered and recalibrated VCF file in which each variant is annotated with its VQSLOD value

    Optional Parameters
    --ignore_filter, -ignoreFilter            NA             If specified, the recalibration will be applied to variants marked as filtered by the specified filter name in the input VCF file
    --mode                                    SNP            Recalibration mode to employ: 1.) SNP for recalibrating only SNPs (emitting indels untouched in the output VCF); 2.) INDEL for indels; and 3.) BOTH for recalibrating both SNPs and indels simultaneously.
    --ts_filter_level                         NA             The truth sensitivity level at which to start filtering

    Optional Flags
    --excludeFiltered, -ef                    false          Don't output filtered loci after applying the recalibration
    --ignore_all_filters, -ignoreAllFilters   false          If specified, the variant recalibrator will ignore all input filters. Useful to rerun the VQSR from a filtered output file.
    --useAlleleSpecificAnnotations, -AS       false          If specified, the tool will attempt to apply a filter to each allele based on the input tranches and allele-specific .recal file.

    Advanced Parameters
    --lodCutoff                               NA             The VQSLOD score below which to start filtering

    Argument details

    Arguments in this list are specific to this tool. Keep in mind that other arguments are available that are shared with other tools (e.g. command-line GATK arguments); see Engine arguments above.


    --excludeFiltered / -ef

    Don't output filtered loci after applying the recalibration

    boolean  false


    --ignore_all_filters / -ignoreAllFilters

    If specified, the variant recalibrator will ignore all input filters. Useful to rerun the VQSR from a filtered output file.

    boolean  false


    --ignore_filter / -ignoreFilter

    If specified, the recalibration will be applied to variants marked as filtered by the specified filter name in the input VCF file
    For this to work properly, the -ignoreFilter argument should also be applied to the VariantRecalibrator command.

    String[]  NA


    --input / -input

    The raw input variants to be recalibrated
    These calls should be unfiltered and annotated with the error covariates that are intended to be used for modeling.

    This argument supports reference-ordered data (ROD) files in the following formats: BCF2, VCF, VCF3

    R List[RodBinding[VariantContext]]  NA


    --lodCutoff / -lodCutoff

    The VQSLOD score below which to start filtering

    Double  NA
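
    A run that filters on a fixed VQSLOD score might look like the sketch below. It assumes that --lodCutoff can be given in place of --ts_filter_level; the argument table hints at this by listing the tranches file as optional, but this page does not state it outright. The wrapper function name, the output file name and the cutoff value are placeholders, and nothing runs until the function is called.

```shell
# Hypothetical wrapper (not part of GATK): filter SNPs on a fixed VQSLOD cutoff.
# $1 is the cutoff score; file names are placeholders.
# Assumption: --lodCutoff is used instead of --ts_filter_level and -tranchesFile.
apply_recal_lod_cutoff() {
  java -jar GenomeAnalysisTK.jar \
    -T ApplyRecalibration \
    -R reference.fasta \
    -input raw_variants.vcf \
    --lodCutoff "$1" \
    -recalFile output.recal \
    -mode SNP \
    -o output.recalibrated.lodfiltered.vcf
}
```

    Usage: apply_recal_lod_cutoff 0.0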


    --mode / -mode

    Recalibration mode to employ: 1.) SNP for recalibrating only SNPs (emitting indels untouched in the output VCF); 2.) INDEL for indels; and 3.) BOTH for recalibrating both SNPs and indels simultaneously.

    The --mode argument is an enumerated type (Mode), which can have one of the following values:

    SNP
    INDEL
    BOTH

    Mode  SNP
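
    Because SNP mode emits indels untouched, one way to filter both variant types with separate models is to run the tool twice, feeding the output of the SNP pass into an INDEL pass. The sketch below assumes that separate SNP and indel .recal and .tranches files already exist from two VariantRecalibrator runs; the wrapper function name, all file names and the sensitivity levels are placeholders.

```shell
# Hypothetical wrapper (not part of GATK): filter SNPs, then indels.
# All file names and sensitivity levels are placeholders.
apply_recal_snp_then_indel() {
  # Pass 1: SNP mode; indels are emitted untouched.
  java -jar GenomeAnalysisTK.jar \
    -T ApplyRecalibration \
    -R reference.fasta \
    -input raw_variants.vcf \
    --ts_filter_level 99.0 \
    -tranchesFile snp.tranches \
    -recalFile snp.recal \
    -mode SNP \
    -o recalibrated_snps.raw_indels.vcf || return 1

  # Pass 2: INDEL mode, reading the output of pass 1.
  java -jar GenomeAnalysisTK.jar \
    -T ApplyRecalibration \
    -R reference.fasta \
    -input recalibrated_snps.raw_indels.vcf \
    --ts_filter_level 99.0 \
    -tranchesFile indel.tranches \
    -recalFile indel.recal \
    -mode INDEL \
    -o recalibrated_variants.vcf
}
```

    The second pass stops being attempted if the first one fails, so a broken SNP pass does not leave a misleading final file behind.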


    --out / -o

    The output filtered and recalibrated VCF file in which each variant is annotated with its VQSLOD value

    VariantContextWriter  stdout


    --recal_file / -recalFile

    The input recal file used by ApplyRecalibration

    This argument supports reference-ordered data (ROD) files in the following formats: BCF2, VCF, VCF3

    R RodBinding[VariantContext]  NA


    --tranches_file / -tranchesFile

    The input tranches file describing where to cut the data

    File  NA


    --ts_filter_level / -ts_filter_level

    The truth sensitivity level at which to start filtering

    Double  NA


    --useAlleleSpecificAnnotations / -AS

    If specified, the tool will attempt to apply a filter to each allele based on the input tranches and allele-specific .recal file.
    Filter the input file based on allele-specific recalibration data. See tool docs for site-level and allele-level filtering details. Requires a .recal file produced using an allele-specific run of VariantRecalibrator.

    boolean  false