    Showing docs for version 3.7-0


    VariantAnnotator

    Annotate variant calls with context information

    Category: Variant Manipulation Tools

    Traversal: LocusWalker

    PartitionBy: LOCUS


    Overview

    This tool is designed to annotate variant calls based on their context (as opposed to functional annotation). Various annotation modules are available; see the "Annotation Modules" page linked in the Tool Documentation sidebar for a complete list.

    Input

    A variant set to annotate and optionally one or more BAM files.

    Output

    An annotated VCF.

    Usage examples


    Annotate a VCF with dbSNP IDs and depth of coverage for each sample

     java -jar GenomeAnalysisTK.jar \
       -R reference.fasta \
       -T VariantAnnotator \
       -I input.bam \
       -V input.vcf \
       -o output.vcf \
       -A Coverage \
       -L input.vcf \
       --dbsnp dbsnp.vcf
     

    Annotate a VCF with allele frequency from an external resource. Annotation occurs only if the alleles are concordant between the resource and the input VCF

     java -jar GenomeAnalysisTK.jar \
       -R reference.fasta \
       -T VariantAnnotator \
       -I input.bam \
       -V input.vcf \
       -o output.vcf \
       -L input.vcf \
       --resource:foo resource.vcf \
       -E foo.AF \
       --resourceAlleleConcordance
     

    Annotate with AF and FILTER fields from an external resource

     java -jar GenomeAnalysisTK.jar \
       -R reference.fasta \
       -T VariantAnnotator \
       -V input.vcf \
       -o output.vcf \
       --resource:foo resource.vcf \
       --expression foo.AF \
       --expression foo.FILTER
     

    Additional Information

    Read filters

    These Read Filters are automatically applied to the data by the Engine before processing by VariantAnnotator.

    Parallelism options

    This tool can be run in multi-threaded mode using this option.

    Window size

    This tool uses a sliding window on the reference.


    Command-line Arguments

    Engine arguments

    All tools inherit arguments from the GATK Engine's "CommandLineGATK" argument collection, which can be used to modify various aspects of the tool's function. For example, the -L argument directs the GATK engine to restrict processing to specific genomic intervals, and the -rf argument allows you to apply certain read filters to exclude some of the data from the analysis.

    VariantAnnotator specific arguments

    This table summarizes the command-line arguments that are specific to this tool. For more details on each argument, see the list further down below the table or click on an argument name to jump directly to that entry in the list.

    Argument name(s)                                  Default value  Summary

    Required Inputs
    --variant, -V                                     NA             Input VCF file

    Optional Inputs
    --comp                                            []             Comparison VCF file
    --dbsnp, -D                                       none           dbSNP file
    --resource                                        []             External resource VCF file
    --snpEffFile                                      none           SnpEff file from which to get annotations

    Optional Outputs
    --out, -o                                         stdout         File to which variants should be written

    Optional Parameters
    --annotation, -A                                  []             One or more specific annotations to apply to variant calls
    --excludeAnnotation, -XA                          []             One or more specific annotations to exclude
    --expression, -E                                  {}             One or more specific expressions to apply to variant calls
    --group, -G                                       []             One or more classes/groups of annotations to apply to variant calls
    --MendelViolationGenotypeQualityThreshold, -mvq   0.0            GQ threshold for annotating MV ratio

    Optional Flags
    --alwaysAppendDbsnpId                             false          Add dbSNP ID even if one is already present
    --list, -ls                                       false          List the available annotations and exit
    --resourceAlleleConcordance, -rac                 false          Check for allele concordances when using an external resource VCF file
    --useAllAnnotations, -all                         false          Use all possible annotations (not for the faint of heart)

    Argument details

    Arguments in this list are specific to this tool. Keep in mind that other arguments are available that are shared with other tools (e.g. command-line GATK arguments); see Engine arguments above.


    --alwaysAppendDbsnpId / -alwaysAppendDbsnpId

    Add dbSNP ID even if one is already present
    By default, a dbSNP ID is added only when the ID field in the variant record is empty (not already annotated). This argument overrides that behavior and appends the new ID to the existing one. It is used in conjunction with the --dbsnp argument.

    Boolean  false


    --annotation / -A

    One or more specific annotations to apply to variant calls
    See the --list argument to view available annotations.

    List[String]  []


    --comp / -comp

    Comparison VCF file
    If a record in the 'variant' track overlaps with a record from the provided comp track, the INFO field will be annotated as such in the output with the track name (e.g. -comp:FOO will have 'FOO' in the INFO field). Records that are filtered in the comp track will be ignored. Note that 'dbSNP' has been special-cased (see the --dbsnp argument).

    This argument supports reference-ordered data (ROD) files in the following formats: BCF2, VCF, VCF3

    List[RodBinding[VariantContext]]  []


    --dbsnp / -D

    dbSNP file
    rsIDs from this file are used to populate the ID column of the output. Also, the DB INFO flag will be set when appropriate.

    This argument supports reference-ordered data (ROD) files in the following formats: BCF2, VCF, VCF3

    RodBinding[VariantContext]  none
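
    One way to check the effect of --dbsnp is to count how many output records have a populated ID column and how many carry the DB flag in INFO. The sketch below is not part of GATK: the helper name dbsnp_summary is hypothetical, and it assumes an uncompressed, tab-delimited VCF in which column 3 is ID and column 8 is INFO.

```shell
# Hypothetical helper (not a GATK feature): summarize dbSNP annotation in a VCF.
# Assumes an uncompressed VCF; column 3 is ID and column 8 is INFO.
dbsnp_summary() {   # usage: dbsnp_summary FILE
  awk -F'\t' '
    /^#/ { next }                      # skip header lines
    { total++ }
    $3 != "." { with_id++ }            # ID column is populated
    {
      n = split($8, info, ";")         # INFO entries are semicolon-separated
      for (i = 1; i <= n; i++)
        if (info[i] == "DB") { db++; break }
    }
    END { printf "records=%d with_id=%d db_flag=%d\n", total, with_id, db }
  ' "$1"
}
```

    For example, dbsnp_summary output.vcf prints a single summary line for the annotated file.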


    --excludeAnnotation / -XA

    One or more specific annotations to exclude
    Note that this argument has higher priority than the -A or -G arguments, so annotations will be excluded even if they are explicitly included with the other options.

    List[String]  []
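
    Because -XA takes priority over -A and -G, a group can be requested and then trimmed. The sketch below only assembles and prints such a command so it can be reviewed first. The file names are hypothetical, the wrapper function build_cmd is illustrative, and StandardAnnotation is assumed to be an available group name; confirm the group and annotation names with --list before relying on them.

```shell
# Hypothetical file names. StandardAnnotation is assumed to be an available
# group name; confirm it with --list.
build_cmd() {
  echo java -jar GenomeAnalysisTK.jar \
    -R reference.fasta \
    -T VariantAnnotator \
    -I input.bam \
    -V input.vcf \
    -o output.vcf \
    -G StandardAnnotation \
    -XA FisherStrand
}

build_cmd                  # dry run: print the command for review
# eval "$(build_cmd)"      # execute it once the paths are correct
```

    With this command line, FisherStrand would be excluded even if the requested group includes it.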


    --expression / -E

    One or more specific expressions to apply to variant calls
    This option enables you to add annotations from one VCF to another. For example, if you want to annotate your callset with the AC field value from a VCF file named 'resource_file.vcf', you tag it with '-resource:my_resource resource_file.vcf' (see the -resource argument, also documented on this page) and you specify '-E my_resource.AC'. In the resulting output VCF, any records for which there is a record at the same position in the resource file will be annotated with 'my_resource.AC=N'. INFO field data, ID, ALT, and FILTER fields may be used as expression values. Note that if there are multiple records in the resource file that overlap the given position, one is chosen randomly.

    Set[String]  {}
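
    To see how many records actually received an expression-derived annotation such as my_resource.AC, the INFO column of the output can be scanned for that key. This is a minimal sketch rather than a GATK feature: the helper name count_annotated is hypothetical, and it assumes an uncompressed, tab-delimited VCF with INFO in column 8.

```shell
# Hypothetical helper (not a GATK feature): count records whose INFO column
# contains an entry starting with KEY=, e.g. my_resource.AC=...
count_annotated() {   # usage: count_annotated KEY FILE
  awk -F'\t' -v key="$1" '
    /^#/ { next }                      # skip header lines
    {
      n = split($8, info, ";")         # INFO entries are semicolon-separated
      for (i = 1; i <= n; i++)
        if (index(info[i], key "=") == 1) { hits++; break }
    }
    END { print hits + 0 }
  ' "$2"
}
```

    For example, count_annotated my_resource.AC output.vcf prints the number of annotated records.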


    --group / -G

    One or more classes/groups of annotations to apply to variant calls
    If specified, all available annotations in the group will be applied. See the VariantAnnotator --list argument to view available groups. Keep in mind that RODRequiringAnnotations are not intended to be used as a group, because they require specific ROD inputs.

    List[String]  []


    --list / -ls

    List the available annotations and exit
    Note that the --list argument requires a fully resolved and correct command-line to work. As an alternative, you can use ListAnnotations (see Help Utilities).

    Boolean  false


    --MendelViolationGenotypeQualityThreshold / -mvq

    GQ threshold for annotating MV ratio
    The genotype quality (GQ) threshold above which the Mendelian violation ratio should be annotated.

    double  0.0  [ -∞, ∞ ]


    --out / -o

    File to which variants should be written

    VariantContextWriter  stdout


    --resource / -resource

    External resource VCF file
    An external resource VCF file or files from which to annotate. Use this option to add annotations from a resource file to the output. For example, if you want to annotate your callset with the AC field value from a VCF file named 'resource_file.vcf', you tag it with '-resource:my_resource resource_file.vcf' and you additionally specify '-E my_resource.AC' (-E is short for --expression, also documented on this page). In the resulting output VCF, any records for which there is a record at the same position in the resource file will be annotated with 'my_resource.AC=N'. Note that if there are multiple records in the resource file that overlap the given position, one is chosen randomly. If --resourceAlleleConcordance is specified, allele concordance is also checked; otherwise the match is based on position only.

    This argument supports reference-ordered data (ROD) files in the following formats: BCF2, VCF, VCF3

    List[RodBinding[VariantContext]]  []


    --resourceAlleleConcordance / -rac

    Check for allele concordances when using an external resource VCF file
    If this argument is specified, annotations (specified by --expression) from an external resource (specified by --resource) are added to the input VCF (specified by --variant) only if the alleles are concordant between the input and resource VCFs. Otherwise, the annotations are always added, with matching based on position only.

    Boolean  false


    --snpEffFile / -snpEffFile

    SnpEff file from which to get annotations
    The INFO field will be annotated with information on the most biologically significant effect listed for each variant in the SnpEff file.

    This argument supports reference-ordered data (ROD) files in the following formats: BCF2, VCF, VCF3

    RodBinding[VariantContext]  none


    --useAllAnnotations / -all

    Use all possible annotations (not for the faint of heart)
    You can use the -XA argument in combination with this one to exclude specific annotations. Note that some annotations may not actually be applied if they are not applicable to the data provided or if they are unavailable to the tool (e.g. there are several annotations that are currently not hooked up to HaplotypeCaller). At present no error or warning message is provided; the annotation is simply skipped silently. You can check the output VCF header to see which annotations were actually applied, although this does not guarantee that the annotation was applied to all records in the VCF, since some annotations have additional requirements (e.g. a minimum number of samples, or heterozygous sites only); see the documentation for individual annotations' requirements.

    Boolean  false
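
    The header check described above can be scripted. The sketch below lists the INFO IDs declared in a VCF header; it is not a GATK feature, the helper name list_info_ids is hypothetical, and it assumes an uncompressed VCF. It covers INFO lines only, so per-sample annotations declared on FORMAT lines would need a similar pattern.

```shell
# Hypothetical helper (not a GATK feature): print the INFO IDs declared in
# the header of an uncompressed VCF, one per line, sorted.
list_info_ids() {   # usage: list_info_ids FILE
  sed -n 's/^##INFO=<ID=\([^,>]*\).*/\1/p' "$1" | sort
}
```

    For example, list_info_ids output.vcf shows which INFO annotations the output declares. As noted above, a declared annotation is not necessarily present on every record.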


    --variant / -V

    Input VCF file
    Variants from this VCF file are used by this tool as input. The file must at least contain the standard VCF header lines, but can be empty (i.e., no variants are contained in the file).

    This argument supports reference-ordered data (ROD) files in the following formats: BCF2, VCF, VCF3

    RodBinding[VariantContext]  NA  (required)