Concatenate VCF files of non-overlapping genome intervals, all with the same set of samples
The main purpose of this tool is to speed up the gather function when using scatter-gather parallelization. This tool concatenates the scattered output VCF files. It assumes that all the input files contain the same samples in the same order, and that the variants in each input file come from non-overlapping intervals.
When the input files are already sorted by interval start position, use -assumeSorted.
The input is two or more variant sets to combine. They must cover non-overlapping genome intervals and contain the same samples, sorted in the same order. If the files are listed in the order in which their intervals appear in the reference genome, you can use the -assumeSorted flag.
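The sample requirement above can be checked before running the tool. The following is a minimal sketch, not part of GATK: it assumes you have already read the `#CHROM` header line from each input VCF as a tab-separated string, and the function names `sample_names` and `same_samples` are hypothetical.

```python
def sample_names(chrom_header_line):
    """Return the sample names from a VCF '#CHROM' header line."""
    fields = chrom_header_line.rstrip("\n").split("\t")
    # The first nine columns are fixed by the VCF format:
    # CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO, FORMAT.
    return fields[9:]


def same_samples(chrom_header_lines):
    """True if every header lists the same samples in the same order."""
    names = [sample_names(line) for line in chrom_header_lines]
    return all(n == names[0] for n in names[1:])
```

Because the comparison is between ordered lists, two files with the same samples in a different column order are reported as mismatched, which matches the requirement stated above.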
The output is a combined VCF or BCF. The output file should have the same extension as the input(s).
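As a rough illustration of the extension rule, the sketch below compares only the final suffix of each path. That simplification is an assumption made here, and the helper name `extensions_match` is hypothetical rather than anything the tool provides.

```python
import os


def extensions_match(input_paths, output_path):
    """True if the output path ends with the same suffix as every input."""
    out_ext = os.path.splitext(output_path)[1]
    return all(os.path.splitext(p)[1] == out_ext for p in input_paths)
```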
This is a command-line utility that bypasses the GATK engine. As a result, the command line you must use to invoke it is a little different from that of other GATK tools (see example below), and it does not accept any of the classic "CommandLineGATK" arguments.
    java -cp GenomeAnalysisTK.jar org.broadinstitute.gatk.tools.CatVariants \
        -R reference.fasta \
        -V input1.vcf \
        -V input2.vcf \
        -out output.vcf \
        -assumeSorted
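In a scatter-gather pipeline the number of input files usually varies, so the command is often assembled by a script. The sketch below builds the same argument list as the example above for any number of inputs. The function name `cat_variants_command` and its parameters are hypothetical; the class name and flags are taken from the example.

```python
def cat_variants_command(reference, inputs, output, assume_sorted=False,
                         jar="GenomeAnalysisTK.jar"):
    """Build the CatVariants command line as a list of arguments."""
    cmd = ["java", "-cp", jar,
           "org.broadinstitute.gatk.tools.CatVariants",
           "-R", reference]
    # One -V argument per scattered input file.
    for path in inputs:
        cmd += ["-V", path]
    cmd += ["-out", output]
    if assume_sorted:
        cmd.append("-assumeSorted")
    return cmd
```

The returned list can be passed to `subprocess.run(cmd, check=True)` to launch the tool without going through a shell.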
Currently the tool is more efficient when working with VCFs than with BCFs.
This table summarizes the command-line arguments that are specific to this tool. For more details on each argument, see the list below the table.
Argument name(s) | Default value | Summary
---|---|---
Required Inputs | |
--reference, -R | NA | genome reference file
--variant, -V | NA | Input VCF file/s
Required Outputs | |
--outputFile, -out | NA | output file
Optional Outputs | |
--log_to_file, -log | NA | Set the logging location
Optional Parameters | |
--logging_level, -l | INFO | Set the minimum level of logging
--variant_index_parameter | -1 | the parameter (bin width or features per bin) to pass to the VCF/BCF IndexCreator
--variant_index_type | DYNAMIC_SEEK | which type of IndexCreator to use for VCF/BCF indices
Optional Flags | |
--assumeSorted | false | assumeSorted should be true if the input files are already sorted (based on the position of the variants)
--help, -h | false | Generate the help message
--version | false | Output version information
Arguments in this list are specific to this tool. Keep in mind that other arguments are available that are shared with other tools (e.g. command-line GATK arguments); see Inherited arguments above.
--assumeSorted
assumeSorted should be true if the input files are already sorted (based on the position of the variants)
Type: Boolean. Default: false.
--help, -h
Generate the help message
This will produce a help message in the terminal with general usage information, listing available arguments as well as tool-specific information if applicable.
Type: Boolean. Default: false.
--log_to_file, -log
Set the logging location
File to save the logging output.
Type: String. Default: NA.
--logging_level, -l
Set the minimum level of logging
Setting INFO gets you INFO up to FATAL, setting ERROR gets you ERROR and FATAL level logging, and so on.
Type: String. Default: INFO.
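The threshold behaviour described above can be sketched as follows. The text names only INFO, ERROR and FATAL; the full ordering used here (DEBUG, INFO, WARN, ERROR, FATAL) is an assumption based on the usual log4j-style levels, and `levels_logged` is a hypothetical helper, not part of the tool.

```python
# Assumed ordering, least to most severe (log4j-style convention).
LEVELS = ["DEBUG", "INFO", "WARN", "ERROR", "FATAL"]


def levels_logged(minimum):
    """Return the levels emitted when the minimum level is `minimum`."""
    return LEVELS[LEVELS.index(minimum):]
```

Under this assumed ordering, `levels_logged("ERROR")` yields only ERROR and FATAL, which matches the description above.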
--outputFile, -out
output file
Required. Type: File. Default: NA.
--reference, -R
genome reference file
Required. Type: File. Default: NA.
--variant, -V
Input VCF file/s
The VCF or BCF files to merge together
CatVariants can take any number of -V arguments on the command line. Each file given with -V is included in the final merged output VCF/BCF. The order of the arguments does not matter, but the tool runs more efficiently if the files are listed in interval order and the assumeSorted argument is used.
Required. Type: List[File]. Default: NA.
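One way to get the inputs into interval order before passing them with -V is to sort them by the position of their first record. This is a minimal sketch under stated assumptions: the contig order is supplied by the caller (for example, from the reference sequence dictionary), each file has at least one record, and the helper names `first_position` and `order_by_interval` are hypothetical.

```python
def first_position(vcf_lines):
    """Return (contig, start) of the first record in an iterable of VCF lines."""
    for line in vcf_lines:
        if line.startswith("#"):
            continue  # skip meta-information and header lines
        fields = line.split("\t")
        return fields[0], int(fields[1])
    raise ValueError("no variant records found")


def order_by_interval(files, contig_order):
    """Sort (path, contig, start) tuples by reference contig order, then start."""
    rank = {contig: i for i, contig in enumerate(contig_order)}
    ordered = sorted(files, key=lambda f: (rank[f[1]], f[2]))
    return [path for path, _contig, _start in ordered]
```

The sorted paths can then be given as consecutive -V arguments together with -assumeSorted.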
--variant_index_parameter
the parameter (bin width or features per bin) to pass to the VCF/BCF IndexCreator
Type: Integer. Default: -1. Allowed range: [-∞, ∞].
--variant_index_type
which type of IndexCreator to use for VCF/BCF indices
The --variant_index_type argument takes a value of the enumerated type GATKVCFIndexType.
Type: GATKVCFIndexType. Default: DYNAMIC_SEEK.
--version
Output version information
Use this to check the version number of the GATK executable you are invoking. Note that the version number is always included in the output at the start of every run, as well as in any error message.
Type: Boolean. Default: false.