    Showing docs for version 3.7-0


    VariantFiltration

    Filter variant calls based on INFO and FORMAT annotations

    Category Variant Evaluation Tools

    Traversal LocusWalker

    PartitionBy LOCUS


    Overview

    This tool is designed for hard-filtering variant calls based on certain criteria. Records are hard-filtered by changing the value in the FILTER field to something other than PASS. Filtered records will be preserved in the output unless their removal is requested in the command line.

    The most common way of specifying filtering criteria is by using JEXL queries. See the article on JEXL expressions in the documentation Guide for detailed information and examples.

    Input

    A variant set to filter.

    Output

    A filtered VCF.
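
    One quick way to check what a run did is to tally the FILTER column of the output. The sketch below does this with awk on a small stand-in file. The records and the temporary file are made up for illustration; only the filter name SomeFilterName is borrowed from the usage example on this page.

```shell
# Toy stand-in for output.vcf (hypothetical records, tab-separated columns).
vcf=$(mktemp)
{
  printf '##fileformat=VCFv4.2\n'
  printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n'
  printf '1\t100\t.\tA\tG\t50\tPASS\t.\n'
  printf '1\t200\t.\tC\tT\t12\tSomeFilterName\t.\n'
  printf '1\t300\t.\tG\tA\t80\tPASS\t.\n'
} > "$vcf"

# Count records per FILTER value (column 7), skipping header lines.
tally_filters() {
  awk -F '\t' '!/^#/ { n[$7]++ } END { for (f in n) print f "\t" n[f] }' "$1" | sort
}

tally_filters "$vcf"
```

    Records whose FILTER value is anything other than PASS are the hard-filtered ones; they remain in the file alongside the passing records.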

    Usage example

     java -jar GenomeAnalysisTK.jar \
       -T VariantFiltration \
       -R reference.fasta \
       -o output.vcf \
       --variant input.vcf \
       --filterExpression "AB < 0.2 || MQ0 > 50" \
       --filterName "SomeFilterName" 
     

    Caveat

    When you run VariantFiltration with a command that includes multiple logical parts, each part is applied individually to the original form of the VCF record. Say you ran a VariantFiltration command that includes three parts: one applies some genotype filters, another applies --setFilteredGtToNocall (which changes sample genotypes to ./. whenever a sample has a genotype-level FT annotation), and a third filters sites based on whether any samples have a no-call there. You might expect such a command to filter sites based on sample-level annotations in one go. However, that would only work if the parts of the command were applied internally in series (like a pipeline), and that is not the case: they are applied in parallel to the same original record. To achieve the desired result, apply these filters as separate commands, each taking the previous command's output as its input.
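
    The three-part scenario in this caveat could be run as three chained commands, roughly as sketched here. The intermediate file names, the filter names LowGQ and HasNoCall, and the JEXL expression in step 3 are illustrative assumptions, not values prescribed by the tool. The block is a dry run that only prints the commands unless GATK_CMD is overridden.

```shell
# Dry run: GATK_CMD starts with "echo", so each command is printed, not executed.
# Set GATK_CMD="java -jar GenomeAnalysisTK.jar" to run the steps for real.
GATK_CMD="${GATK_CMD:-echo java -jar GenomeAnalysisTK.jar}"

# Step 1: tag low-quality genotypes with a sample-level FT annotation.
step1_genotype_filter() {
  $GATK_CMD -T VariantFiltration -R reference.fasta \
    --variant input.vcf -o step1.vcf \
    --genotypeFilterExpression "GQ < 5.0" \
    --genotypeFilterName "LowGQ"
}

# Step 2: turn the genotypes tagged in step 1 into no-calls (./.).
step2_set_nocall() {
  $GATK_CMD -T VariantFiltration -R reference.fasta \
    --variant step1.vcf -o step2.vcf \
    --setFilteredGtToNocall
}

# Step 3: filter sites that now contain at least one no-call.
# The JEXL expression is an assumption; adapt it to your own criterion.
step3_site_filter() {
  $GATK_CMD -T VariantFiltration -R reference.fasta \
    --variant step2.vcf -o step3.vcf \
    --filterExpression "vc.getNoCallCount() > 0" \
    --filterName "HasNoCall"
}

step1_genotype_filter && step2_set_nocall && step3_site_filter
```

    Each step reads the file written by the previous one, which gives the serial behaviour that a single combined command does not provide.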


    Additional Information

    Read filters

    These Read Filters are automatically applied to the data by the Engine before processing by VariantFiltration.

    Parallelism options

    This tool can be run in multi-threaded mode using this option.

    Window size

    This tool uses a sliding window on the reference.


    Command-line Arguments

    Engine arguments

    All tools inherit arguments from the GATK Engine's "CommandLineGATK" argument collection, which can be used to modify various aspects of the tool's function. For example, the -L argument directs the GATK engine to restrict processing to specific genomic intervals, and the -rf argument allows you to apply certain read filters to exclude some of the data from the analysis.
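
    As an illustration of combining an engine argument with this tool, the sketch below adds -L to the usage example so that only one region is processed. The interval 20:1000000-2000000 is a hypothetical placeholder; the block prints the command rather than running it unless GATK_CMD is overridden.

```shell
# Dry run: GATK_CMD starts with "echo", so the command is printed, not executed.
# Set GATK_CMD="java -jar GenomeAnalysisTK.jar" to run it for real.
GATK_CMD="${GATK_CMD:-echo java -jar GenomeAnalysisTK.jar}"

# Same filter as the usage example, restricted to one interval via -L.
filter_region() {
  $GATK_CMD -T VariantFiltration -R reference.fasta \
    --variant input.vcf -o output.vcf \
    -L 20:1000000-2000000 \
    --filterExpression "AB < 0.2 || MQ0 > 50" \
    --filterName "SomeFilterName"
}

filter_region
```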

    VariantFiltration specific arguments

    This table summarizes the command-line arguments that are specific to this tool. For more details on each argument, see the list further down below the table or click on an argument name to jump directly to that entry in the list.

    Argument name(s)                                     Default value  Summary
    Required Inputs
    --variant, -V                                        NA             Input VCF file
    Optional Inputs
    --mask                                               none           Input ROD mask
    Optional Outputs
    --out, -o                                            stdout         File to which variants should be written
    Optional Parameters
    --clusterSize, -cluster                              3              The number of SNPs which make up a cluster
    --clusterWindowSize, -window                         0              The window size (in bases) in which to evaluate clustered SNPs
    --filterExpression, -filter                          []             One or more expressions used with INFO fields to filter
    --filterName                                         []             Names to use for the list of filters
    --genotypeFilterExpression, -G_filter                []             One or more expressions used with FORMAT (sample/genotype-level) fields to filter (see documentation guide for more info)
    --genotypeFilterName, -G_filterName                  []             Names to use for the list of sample/genotype filters (must be a 1-to-1 mapping); this name is put in the sample-level FT field of genotypes that get filtered
    --maskExtension, -maskExtend                         0              How many bases beyond records from a provided 'mask' rod should variants be filtered
    --maskName                                           Mask           The text to put in the FILTER field if a 'mask' rod is provided and overlaps with a variant call
    Optional Flags
    --filterNotInMask                                    false          Filter records NOT in given input mask.
    --invalidatePreviousFilters                          false          Remove previous filters applied to the VCF
    --invertFilterExpression, -invfilter                 false          Invert the selection criteria for --filterExpression
    --invertGenotypeFilterExpression, -invG_filter       false          Invert the selection criteria for --genotypeFilterExpression
    --missingValuesInExpressionsShouldEvaluateAsFailing  false          When evaluating the JEXL expressions, missing values should be considered failing the expression
    --setFilteredGtToNocall                              false          Set filtered genotypes to no-call

    Argument details

    Arguments in this list are specific to this tool. Keep in mind that other arguments are available that are shared with other tools (e.g. command-line GATK arguments); see Engine arguments above.


    --clusterSize / -cluster

    The number of SNPs which make up a cluster
    Works together with the --clusterWindowSize argument.

    Integer  3  [ [ -∞  ∞ ] ]


    --clusterWindowSize / -window

    The window size (in bases) in which to evaluate clustered SNPs
    Works together with the --clusterSize argument. To disable the clustered SNP filter, set this value to less than 1.

    Integer  0  [ [ -∞  ∞ ] ]
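
    Because the window size defaults to 0, the clustered SNP filter is off unless a window is given. A minimal sketch of enabling it follows; the window of 10 bases is an arbitrary illustrative value, and the block only prints the command unless GATK_CMD is overridden.

```shell
# Dry run: GATK_CMD starts with "echo", so the command is printed, not executed.
# Set GATK_CMD="java -jar GenomeAnalysisTK.jar" to run it for real.
GATK_CMD="${GATK_CMD:-echo java -jar GenomeAnalysisTK.jar}"

# Evaluate clusters of 3 SNPs (the default size) within a 10-base window.
filter_snp_clusters() {
  $GATK_CMD -T VariantFiltration -R reference.fasta \
    --variant input.vcf -o output.vcf \
    --clusterSize 3 --clusterWindowSize 10
}

filter_snp_clusters
```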


    --filterExpression / -filter

    One or more expressions used with INFO fields to filter
    VariantFiltration accepts any number of JEXL expressions (so you can have two named filters by using --filterName One --filterExpression "X < 1" --filterName Two --filterExpression "X > 2").

    ArrayList[String]  []


    --filterName / -filterName

    Names to use for the list of filters
    This name is put in the FILTER field for variants that get filtered. Note that there must be a 1-to-1 mapping between filter expressions and filter names.

    ArrayList[String]  []
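
    The 1-to-1 pairing of expressions and names can be sketched by splitting the combined expression from the usage example into two separately named filters. The names LowAB and HighMQ0 are hypothetical; the block only prints the command unless GATK_CMD is overridden.

```shell
# Dry run: GATK_CMD starts with "echo", so the command is printed, not executed.
# Set GATK_CMD="java -jar GenomeAnalysisTK.jar" to run it for real.
GATK_CMD="${GATK_CMD:-echo java -jar GenomeAnalysisTK.jar}"

# Each --filterExpression is paired with exactly one --filterName.
filter_two_names() {
  $GATK_CMD -T VariantFiltration -R reference.fasta \
    --variant input.vcf -o output.vcf \
    --filterExpression "AB < 0.2" --filterName "LowAB" \
    --filterExpression "MQ0 > 50" --filterName "HighMQ0"
}

filter_two_names
```

    Separate names make it possible to tell from the FILTER field which criterion a record failed, which a single combined expression does not show.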


    --filterNotInMask / -filterNotInMask

    Filter records NOT in given input mask.
    By default, if the -mask argument is used, any variant falling in a mask will be filtered. If this argument is used, logic is reversed, and variants falling outside a given mask will be filtered. Use case is, for example, if we have an interval list or BED file with "good" sites. Note that it is up to the user to adapt the name of the mask to make it clear that the reverse logic was used (e.g. if masking against Hapmap, use -maskName=hapmap for the normal masking and -maskName=not_hapmap for the reverse masking).

    boolean  false


    --genotypeFilterExpression / -G_filter

    One or more expressions used with FORMAT (sample/genotype-level) fields to filter (see documentation guide for more info)
    Similar to the INFO field based expressions, but used on the FORMAT (genotype) fields instead. VariantFiltration will add the sample-level FT tag to the FORMAT field of filtered samples (this does not affect the record's FILTER tag). One can filter normally based on most fields (e.g. "GQ < 5.0"), but the GT (genotype) field is an exception. Convenience methods are available so that one can filter out hets ("isHet == 1"), refs ("isHomRef == 1"), or homs ("isHomVar == 1"). Also available are the expressions isCalled, isNoCall, isMixed, and isAvailable, in accordance with the methods of the Genotype object.

    ArrayList[String]  []


    --genotypeFilterName / -G_filterName

    Names to use for the list of sample/genotype filters (must be a 1-to-1 mapping with the genotype filter expressions); this name is put in the sample-level FT field of genotypes that get filtered
    Similar to --filterName for the INFO field based expressions, but used for the FORMAT (genotype) field based expressions instead.

    ArrayList[String]  []
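
    A genotype-level run might look like the sketch below, which pairs two of the expressions mentioned above with names. The names LowGQ and Het are hypothetical; the block only prints the command unless GATK_CMD is overridden.

```shell
# Dry run: GATK_CMD starts with "echo", so the command is printed, not executed.
# Set GATK_CMD="java -jar GenomeAnalysisTK.jar" to run it for real.
GATK_CMD="${GATK_CMD:-echo java -jar GenomeAnalysisTK.jar}"

# Genotype expressions and names are paired 1-to-1, like the site-level ones.
filter_genotypes() {
  $GATK_CMD -T VariantFiltration -R reference.fasta \
    --variant input.vcf -o output.vcf \
    --genotypeFilterExpression "GQ < 5.0" --genotypeFilterName "LowGQ" \
    --genotypeFilterExpression "isHet == 1" --genotypeFilterName "Het"
}

filter_genotypes
```

    Samples matching an expression receive the corresponding name in their FT tag, while the record's FILTER field is left unchanged.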


    --invalidatePreviousFilters / NA

    Remove previous filters applied to the VCF
    Invalidate previous filters applied to the VariantContext, applying only the filters here.

    boolean  false


    --invertFilterExpression / -invfilter

    Invert the selection criteria for --filterExpression.

    boolean  false


    --invertGenotypeFilterExpression / -invG_filter

    Invert the selection criteria for --genotypeFilterExpression.

    boolean  false


    --mask / -mask

    Input ROD mask
    Any variant which overlaps entries from the provided mask rod will be filtered. If the user wants logic to be reversed, i.e. filter variants that do not overlap with provided mask, then argument -filterNotInMask can be used. Note that it is up to the user to adapt the name of the mask to make it clear that the reverse logic was used (e.g. if masking against Hapmap, use -maskName=hapmap for the normal masking and -maskName=not_hapmap for the reverse masking).

    This argument supports reference-ordered data (ROD) files in the following formats: BCF2, BEAGLE, BED, BEDTABLE, EXAMPLEBINARY, GELITEXT, RAWHAPMAP, REFSEQ, SAMPILEUP, SAMREAD, TABLE, VCF, VCF3

    RodBinding[Feature]  none


    --maskExtension / -maskExtend

    How many bases beyond records from a provided 'mask' rod should variants be filtered

    Integer  0  [ [ -∞  ∞ ] ]


    --maskName / -maskName

    The text to put in the FILTER field if a 'mask' rod is provided and overlaps with a variant call
    When using the -mask argument, the maskName will be annotated in the variant record. Note that when using the -filterNotInMask argument to reverse the masking logic, it is up to the user to adapt the name of the mask to make it clear that the reverse logic was used (e.g. if masking against Hapmap, use -maskName=hapmap for the normal masking and -maskName=not_hapmap for the reverse masking).

    String  Mask
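
    The normal and reversed masking modes could be invoked as in the following sketch. The mask file hapmap_sites.vcf, the output names, and the extension of 5 bases are hypothetical; the mask names follow the hapmap / not_hapmap convention suggested above. The block only prints the commands unless GATK_CMD is overridden.

```shell
# Dry run: GATK_CMD starts with "echo", so each command is printed, not executed.
# Set GATK_CMD="java -jar GenomeAnalysisTK.jar" to run them for real.
GATK_CMD="${GATK_CMD:-echo java -jar GenomeAnalysisTK.jar}"

# Normal masking: filter variants overlapping the mask, or within 5 bases of it.
mask_normal() {
  $GATK_CMD -T VariantFiltration -R reference.fasta \
    --variant input.vcf -o masked.vcf \
    --mask hapmap_sites.vcf --maskExtension 5 --maskName hapmap
}

# Reversed masking: filter variants that do NOT overlap the mask.
mask_reversed() {
  $GATK_CMD -T VariantFiltration -R reference.fasta \
    --variant input.vcf -o not_masked.vcf \
    --mask hapmap_sites.vcf --filterNotInMask --maskName not_hapmap
}

mask_normal
mask_reversed
```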


    --missingValuesInExpressionsShouldEvaluateAsFailing / NA

    When evaluating the JEXL expressions, missing values should be considered failing the expression
    By default, if JEXL cannot evaluate your expression for a particular record because one of the annotations is not present, the whole expression evaluates as PASSing. Use this argument to have it evaluate as failing filters instead for these cases.

    Boolean  false
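
    To treat records that lack an annotation as failing rather than passing, the flag is simply added to the command, as in this sketch. The filter name HighMQ0 is hypothetical; the block only prints the command unless GATK_CMD is overridden.

```shell
# Dry run: GATK_CMD starts with "echo", so the command is printed, not executed.
# Set GATK_CMD="java -jar GenomeAnalysisTK.jar" to run it for real.
GATK_CMD="${GATK_CMD:-echo java -jar GenomeAnalysisTK.jar}"

# Records without an MQ0 annotation are filtered too, instead of passing.
filter_strict_missing() {
  $GATK_CMD -T VariantFiltration -R reference.fasta \
    --variant input.vcf -o output.vcf \
    --filterExpression "MQ0 > 50" --filterName "HighMQ0" \
    --missingValuesInExpressionsShouldEvaluateAsFailing
}

filter_strict_missing
```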


    --out / -o

    File to which variants should be written

    VariantContextWriter  stdout


    --setFilteredGtToNocall / NA

    Set filtered genotypes to no-call
    If this argument is provided, set filtered genotypes to no-call (./.).

    boolean  false


    --variant / -V

    Input VCF file
    Variants from this VCF file are used by this tool as input. The file must at least contain the standard VCF header lines, but can be empty (i.e., no variants are contained in the file).

    This argument supports reference-ordered data (ROD) files in the following formats: BCF2, VCF, VCF3

    RodBinding[VariantContext]  NA  (required)