  • Showing docs for version 3.7-0


    ValidateVariants

    Validate a VCF file with an extra strict set of criteria

    Category Variant Evaluation Tools

    Traversal LocusWalker

    PartitionBy LOCUS


    Overview

    This tool validates much of the information inside a VCF file. In addition to checking adherence to the VCF specification, it performs extra strict validations to ensure that the information contained within the file is correct. These include:

    REF
    the correctness of the reference base(s).
    CHR_COUNTS
    the accuracy of AC and AN values.
    IDS
    tests against rsIDs when a dbSNP file is provided. For this check to work, you need to provide the dbSNP variant file using the --dbsnp argument, as shown in the examples below.
    ALLELES
    that all alternate alleles are present in at least one sample.

    By default, the tool applies all the strict validations unless you indicate which ones to exclude using -Xtype|--validationTypeToExclude <code>, where code is one of the types listed above. You can exclude as many types as you want.

    You can exclude all strict validations with the special code ALL. In this case the tool only tests adherence to the VCF specification.
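
    As a sketch of excluding more than one validation type, the command below repeats -Xtype once per type. Repeating the flag is how list-valued GATK 3 arguments are normally supplied, but treat it as an assumption here, since the examples on this page only show a single exclusion. The function name validate_skip_ids_and_counts is hypothetical, and the leading echo makes this a dry run that only prints the command.

```shell
# Hypothetical wrapper (not part of GATK): prints a ValidateVariants command
# that skips the IDS and CHR_COUNTS strict validations.
# Assumption: -Xtype may be repeated, once per validation type to exclude.
validate_skip_ids_and_counts() {
  # Remove the leading "echo" to actually run the tool.
  echo java -jar GenomeAnalysisTK.jar \
    -T ValidateVariants \
    -R reference.fasta \
    -V input.vcf \
    -Xtype IDS \
    -Xtype CHR_COUNTS
}

validate_skip_ids_and_counts
```

    Skipping IDS this way also removes the need for --dbsnp, since that file is only used for the rsID check described above.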

    Input

    A variant set to validate using -V or --variant as shown below.

    Usage examples

    To perform VCF format tests and all strict validations

     java -jar GenomeAnalysisTK.jar \
       -T ValidateVariants \
       -R reference.fasta \
       -V input.vcf \
       --dbsnp dbsnp.vcf
     

    To perform VCF format tests and all strict validations on a VCF containing alleles <= 208 bases

     java -jar GenomeAnalysisTK.jar \
       -T ValidateVariants \
       -R reference.fasta \
       -V input.vcf \
       --dbsnp dbsnp.vcf \
       --reference_window_stop 208
     

    To perform only VCF format tests

     java -jar GenomeAnalysisTK.jar \
       -T ValidateVariants \
       -R reference.fasta \
       -V input.vcf \
       --validationTypeToExclude ALL
     

    To perform all validations except the strict ALLELES validation

     java -jar GenomeAnalysisTK.jar \
       -T ValidateVariants \
       -R reference.fasta \
       -V input.vcf \
       --validationTypeToExclude ALLELES
     

    Additional Information

    Read filters

    These Read Filters are automatically applied to the data by the Engine before processing by ValidateVariants.

    Window size

    This tool uses a sliding window on the reference.


    Command-line Arguments

    Engine arguments

    All tools inherit arguments from the GATK Engine's "CommandLineGATK" argument collection, which can be used to modify various aspects of the tool's function. For example, the -L argument directs the GATK engine to restrict processing to specific genomic intervals, and the -rf argument lets you apply certain read filters to exclude some of the data from the analysis.

    ValidateVariants specific arguments

    This table summarizes the command-line arguments that are specific to this tool. For more details on each argument, see the list further down below the table or click on an argument name to jump directly to that entry in the list.

    Argument name(s)                       Default value  Summary

    Required Inputs
    --variant, -V                          NA             Input VCF file

    Optional Inputs
    --dbsnp, -D                            none           dbSNP file

    Optional Parameters
    --validationTypeToExclude, -Xtype      []             which validation type to exclude from a full strict validation

    Optional Flags
    --doNotValidateFilteredRecords         false          skip validation on filtered records
    --validateGVCF, -gvcf                  false          Validate this file as a GVCF
    --warnOnErrors                         false          just emit warnings on errors instead of terminating the run at the first instance

    Argument details

    Arguments in this list are specific to this tool. Keep in mind that other arguments are available that are shared with other tools (e.g. command-line GATK arguments); see Engine arguments above.


    --dbsnp / -D

    dbSNP file

    This argument supports reference-ordered data (ROD) files in the following formats: BCF2, VCF, VCF3

    RodBinding[VariantContext]  none


    --doNotValidateFilteredRecords / -doNotValidateFilteredRecords

    skip validation on filtered records
    By default, even filtered records are validated.

    Boolean  false


    --validateGVCF / -gvcf

    Validate this file as a GVCF
    This validation option REQUIRES that the input GVCF satisfies the following conditions: (1) every variant record must feature a <NON_REF> allele in the list of ALT alleles, and (2) every position in the genomic territory under consideration must be covered by a record, whether a single-position record or a reference block record. If the analysis that produced the file was restricted to a subset of genomic regions (for example using the -L or -XL arguments), the same intervals must be provided for validation. Otherwise, the validation tool will find positions that are not covered by records and will fail.

    Boolean  false
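
    The interval requirement above can be sketched as follows: a GVCF produced with -L is validated with the same -L value. The function name validate_gvcf and the file names sample.g.vcf and targets.interval_list are placeholders, not names taken from this page, and the leading echo makes this a dry run that only prints the command.

```shell
# Hypothetical wrapper (not part of GATK): prints a GVCF validation command.
#   $1  GVCF to validate (placeholder name in the call below)
#   $2  the same intervals that were passed with -L when the GVCF was produced
validate_gvcf() {
  # Remove the leading "echo" to actually run the tool.
  echo java -jar GenomeAnalysisTK.jar \
    -T ValidateVariants \
    -R reference.fasta \
    -V "$1" \
    --validateGVCF \
    -L "$2"
}

validate_gvcf sample.g.vcf targets.interval_list
```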


    --validationTypeToExclude / -Xtype

    which validation type to exclude from a full strict validation

    List[ValidationType]  []


    --variant / -V

    Input VCF file
    Variants from this VCF file are used by this tool as input. The file must at least contain the standard VCF header lines, but can be empty (i.e., no variants are contained in the file).

    This argument supports reference-ordered data (ROD) files in the following formats: BCF2, VCF, VCF3

    RodBinding[VariantContext]  NA  (required)


    --warnOnErrors / -warnOnErrors

    just emit warnings on errors instead of terminating the run at the first instance

    Boolean  false
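
    As an illustration of combining the optional flags, the sketch below reports problems as warnings instead of stopping at the first one, and skips records that are already filtered. Only flags documented on this page are used; the function name validate_lenient is hypothetical, and the leading echo makes this a dry run that only prints the command.

```shell
# Hypothetical wrapper (not part of GATK): prints a ValidateVariants command
# that warns on errors rather than terminating, and ignores filtered records.
validate_lenient() {
  # Remove the leading "echo" to actually run the tool.
  echo java -jar GenomeAnalysisTK.jar \
    -T ValidateVariants \
    -R reference.fasta \
    -V input.vcf \
    --warnOnErrors \
    --doNotValidateFilteredRecords
}

validate_lenient
```

    This combination may suit a first pass over a file where you want to see every problem in one run rather than fixing them one at a time.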