Validate a VCF file with an extra strict set of criteria
This tool is designed to validate much of the information inside a VCF file. In addition to standard adherence to the VCF specification, this tool performs extra strict validations to ensure the information contained within the file is correct. These include:
Validation against dbSNP requires a dbSNP file supplied with --dbsnp, as shown in the examples below.
By default, the tool applies all the strict validations unless you indicate which ones to exclude using -Xtype|--validationTypeToExclude <code>, where code is one of the types listed above. You can exclude as many types as you want. You can exclude all strict validations with the special code ALL; in this case the tool only tests adherence to the VCF specification.
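As a rough sketch of excluding more than one validation type, the snippet below assembles a command line that repeats --validationTypeToExclude once per code. Two assumptions are made here: that a list-valued argument is passed by repeating the flag, and that REF is a valid type code alongside ALLELES. Check both against the list of codes for your GATK version. The command is only printed, not run.

```shell
# Dry run: assemble the ValidateVariants command and print it.
set -- -T ValidateVariants -R reference.fasta -V input.vcf

# Assumption: REF is a valid type code alongside ALLELES.
# Assumption: a list-valued argument is given by repeating the flag.
for type in ALLELES REF; do
    set -- "$@" --validationTypeToExclude "$type"
done

# Join the accumulated arguments into one printable command line.
cmd="java -jar GenomeAnalysisTK.jar $*"
echo "$cmd"
```

Once the printed command looks right, it can be pasted into a terminal and run directly.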
A variant set to validate, supplied with -V or --variant as shown below.
java -jar GenomeAnalysisTK.jar \
    -T ValidateVariants \
    -R reference.fasta \
    -V input.vcf \
    --dbsnp dbsnp.vcf

java -jar GenomeAnalysisTK.jar \
    -T ValidateVariants \
    -R reference.fasta \
    -V input.vcf \
    --dbsnp dbsnp.vcf \
    --reference_window_stop 208

java -jar GenomeAnalysisTK.jar \
    -T ValidateVariants \
    -R reference.fasta \
    -V input.vcf \
    --validationTypeToExclude ALL

java -jar GenomeAnalysisTK.jar \
    -T ValidateVariants \
    -R reference.fasta \
    -V input.vcf \
    --validationTypeToExclude ALLELES
These Read Filters are automatically applied to the data by the Engine before processing by ValidateVariants.
This tool uses a sliding window on the reference.
All tools inherit arguments from the GATK Engine's "CommandLineGATK" argument collection, which can be used to modify various aspects of the tool's function. For example, the -L argument directs the GATK engine to restrict processing to specific genomic intervals, and the -rf argument applies read filters that exclude some of the data from the analysis.
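To illustrate an inherited engine argument, the following sketch adds -L to a ValidateVariants command so that only one region is processed. The interval 20:1-1000000 is a placeholder chosen for this example; contig names must match those in your reference. The command is printed rather than executed.

```shell
# Dry run: restrict validation to one interval with the inherited -L argument.
# Assumption: "20:1-1000000" is a placeholder; use a contig from your reference.
interval="20:1-1000000"

cmd="java -jar GenomeAnalysisTK.jar -T ValidateVariants -R reference.fasta -V input.vcf -L $interval"
echo "$cmd"
```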
This table summarizes the command-line arguments that are specific to this tool. For more details on each argument, see the list below the table.
Argument name(s) | Default value | Summary
---|---|---
Required Inputs | |
--variant, -V | NA | Input VCF file
Optional Inputs | |
--dbsnp, -D | none | dbSNP file
Optional Parameters | |
--validationTypeToExclude, -Xtype | [] | which validation type to exclude from a full strict validation
Optional Flags | |
--doNotValidateFilteredRecords | false | skip validation on filtered records
--validateGVCF, -gvcf | false | Validate this file as a GVCF
--warnOnErrors | false | just emit warnings on errors instead of terminating the run at the first instance
Arguments in this list are specific to this tool. Keep in mind that other arguments are available that are shared with other tools (e.g. command-line GATK arguments); see Inherited arguments above.
--dbsnp / -D
dbSNP file
This argument supports reference-ordered data (ROD) files in the following formats: BCF2, VCF, VCF3
RodBinding[VariantContext] none
--doNotValidateFilteredRecords
skip validation on filtered records
By default, even filtered records are validated.
Boolean false
--validateGVCF / -gvcf
Validate this file as a GVCF
This validation option REQUIRES that the input GVCF satisfies the following conditions:
(1) every variant record must feature a <NON_REF> allele in the list of ALT alleles, and
(2) every position in the genomic territory under consideration must be covered by a record, whether a single-position record or a reference block record.
If the analysis that produced the file was restricted to a subset of genomic regions (for example using the -L or -XL arguments), the same intervals must be provided for validation.
Otherwise, the validation tool will find positions that are not covered by records and will fail.
Boolean false
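A minimal sketch of the interval requirement described above: if the GVCF was produced with a restricted set of intervals, the same intervals are passed to the validation run through -L. The file names input.g.vcf and targets.intervals are placeholders assumed for this example. The command is printed rather than executed.

```shell
# Dry run: validate a GVCF over the same intervals used to produce it.
# Assumption: input.g.vcf and targets.intervals are placeholder file names.
gvcf="input.g.vcf"
intervals="targets.intervals"

# --validateGVCF is a flag and takes no value; -L reuses the original intervals
# so that positions outside them are not reported as uncovered.
cmd="java -jar GenomeAnalysisTK.jar -T ValidateVariants -R reference.fasta -V $gvcf --validateGVCF -L $intervals"
echo "$cmd"
```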
--validationTypeToExclude / -Xtype
which validation type to exclude from a full strict validation
List[ValidationType] []
--variant / -V
Input VCF file
Variants from this VCF file are used by this tool as input.
The file must at least contain the standard VCF header lines, but
can be empty (i.e., no variants are contained in the file).
This argument supports reference-ordered data (ROD) files in the following formats: BCF2, VCF, VCF3
RodBinding[VariantContext] NA (required)
--warnOnErrors
just emit warnings on errors instead of terminating the run at the first instance
Boolean false