    Showing docs for version 3.7-0


    CatVariants

    Concatenate VCF files of non-overlapping genome intervals, all with the same set of samples

    Category Variant Manipulation Tools


    Overview

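    The main purpose of this tool is to speed up the gather function when using scatter-gather parallelization. This tool concatenates the scattered output VCF files. It assumes that the input files cover non-overlapping genome intervals and that they all contain the same set of samples, sorted in the same order.

    When the input files are already sorted based on the start positions of their intervals, use -assumeSorted.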

    Input

    Two or more variant sets to combine. They should cover non-overlapping genome intervals and contain the same samples (sorted in the same order). If the files are ordered according to the position of their intervals in the reference genome, the -assumeSorted flag can be used.
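
    The same-samples requirement can be checked before concatenation by comparing the #CHROM header line of each input, since that line lists the sample names in column order. The following is a minimal sketch, assuming plain-text (uncompressed) VCF inputs; same_samples is a hypothetical helper name and is not part of GATK.

```shell
# Hypothetical helper (not part of GATK): succeeds only if every VCF given
# as an argument has an identical #CHROM header line, i.e. the same samples
# in the same order. Assumes plain-text (uncompressed) VCF files.
same_samples() {
    first_header=""
    for vcf in "$@"; do
        # The #CHROM line closes the header and lists the sample names.
        header=$(grep -m 1 '^#CHROM' "$vcf") || {
            echo "no #CHROM header line in $vcf" >&2
            return 2
        }
        if [ -z "$first_header" ]; then
            first_header=$header
        elif [ "$header" != "$first_header" ]; then
            echo "sample columns differ in $vcf" >&2
            return 1
        fi
    done
    return 0
}
```

    A call such as same_samples input1.vcf input2.vcf returns a non-zero status when the sample columns differ, so it can guard the concatenation step in a script.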

    Output

    A combined VCF or BCF. The output file should have the same extension as the input(s).
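
    The extension rule can be checked with plain shell parameter expansion before the tool is invoked. This is a sketch under the assumption that comparing only the text after the last dot is sufficient; same_extension is a hypothetical helper, not a GATK command.

```shell
# Hypothetical helper (not part of GATK): succeeds if the output path has
# the same final extension (e.g. vcf or bcf) as every input path.
# Usage: same_extension OUTPUT INPUT [INPUT...]
same_extension() {
    out_ext=${1##*.}   # text after the last dot of the output path
    shift
    for input in "$@"; do
        if [ "${input##*.}" != "$out_ext" ]; then
            echo "extension of $input differs from .$out_ext" >&2
            return 1
        fi
    done
    return 0
}
```

    For example, same_extension output.vcf input1.vcf input2.vcf succeeds, while same_extension output.bcf input1.vcf fails.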

    Important note

    This is a command-line utility that bypasses the GATK engine. As a result, the command line you must use to invoke it is a little different from that of other GATK tools (see example below), and it does not accept any of the classic "CommandLineGATK" arguments.

    Usage example

     java -cp GenomeAnalysisTK.jar org.broadinstitute.gatk.tools.CatVariants \
        -R reference.fasta \
        -V input1.vcf \
        -V input2.vcf \
        -out output.vcf \
        -assumeSorted
     

    Caveat

    Currently the tool is more efficient when working with VCFs than with BCFs.


    Command-line Arguments

    CatVariants specific arguments

    This table summarizes the command-line arguments that are specific to this tool. For more details on each argument, see the list below the table.

    Argument name(s)            Default value   Summary

    Required Inputs
    --reference, -R             NA              genome reference file .fasta
    --variant, -V               NA              Input VCF file/s

    Required Outputs
    --outputFile, -out          NA              output file

    Optional Outputs
    --log_to_file, -log         NA              Set the logging location

    Optional Parameters
    --logging_level, -l         INFO            Set the minimum level of logging
    --variant_index_parameter   -1              the parameter (bin width or features per bin) to pass to the VCF/BCF IndexCreator
    --variant_index_type        DYNAMIC_SEEK    which type of IndexCreator to use for VCF/BCF indices

    Optional Flags
    --assumeSorted              false           assumeSorted should be true if the input files are already sorted (based on the position of the variants)
    --help, -h                  false           Generate the help message
    --version                   false           Output version information

    Argument details

    Arguments in this list are specific to this tool. Because CatVariants bypasses the GATK engine, the command-line GATK arguments shared by other tools are not available here.


    --assumeSorted / -assumeSorted

    assumeSorted should be true if the input files are already sorted (based on the position of the variants)

    Boolean  false


    --help / -h

    Generate the help message
    This will produce a help message in the terminal with general usage information, listing available arguments as well as tool-specific information if applicable.

    Boolean  false


    --log_to_file / -log

    Set the logging location
    File to save the logging output.

    String  NA


    --logging_level / -l

    Set the minimum level of logging
    Setting INFO gets you INFO up to FATAL, setting ERROR gets you ERROR and FATAL level logging, and so on.

    String  INFO
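
    The threshold behavior described above can be sketched as follows. The ordering DEBUG, INFO, WARN, ERROR, FATAL is an assumption based on common log4j-style level names; the text only confirms INFO, ERROR and FATAL. levels_logged is a hypothetical helper for illustration, not a GATK command.

```shell
# Assumed ordering, least to most severe (log4j-style names; the text above
# only confirms INFO, ERROR and FATAL).
LEVELS="DEBUG INFO WARN ERROR FATAL"

# Hypothetical helper: prints the levels that are logged for a given
# --logging_level threshold, i.e. the threshold and everything more severe.
levels_logged() {
    emit=0
    result=""
    for level in $LEVELS; do
        if [ "$level" = "$1" ]; then
            emit=1
        fi
        if [ "$emit" -eq 1 ]; then
            result="$result $level"
        fi
    done
    echo "${result# }"   # drop the leading space
}

levels_logged INFO
levels_logged ERROR
```

    Under the assumed ordering, the first call lists INFO through FATAL and the second lists only ERROR and FATAL, matching the description of the argument.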


    --outputFile / -out

    output file

    R File  NA


    --reference / -R

    genome reference file .fasta

    R File  NA


    --variant / -V

    Input VCF file/s
    The VCF or BCF files to merge together. CatVariants can take any number of -V arguments on the command line. Each -V argument will be included in the final merged output VCF/BCF. The order of the arguments does not matter, but the tool runs more efficiently if they are sorted based on the intervals and the -assumeSorted flag is used.

    R List[File]  NA
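
    Because each input needs its own -V argument, the command line for many scattered files is easiest to assemble in a loop. The sketch below only prints the command and does not run it. catvariants_command and the shard file names are hypothetical, the jar and class names are taken from the usage example above, and file names containing spaces are not handled.

```shell
# Hypothetical helper (not part of GATK): prints a CatVariants command line
# with one -V argument per input file. Nothing is executed.
# Usage: catvariants_command REFERENCE OUTPUT INPUT [INPUT...]
catvariants_command() {
    reference=$1
    output=$2
    shift 2
    cmd="java -cp GenomeAnalysisTK.jar org.broadinstitute.gatk.tools.CatVariants -R $reference"
    for vcf in "$@"; do
        cmd="$cmd -V $vcf"   # one -V per scattered input
    done
    # -assumeSorted is appended on the assumption that the caller passes
    # the inputs in genomic order; drop it otherwise.
    echo "$cmd -out $output -assumeSorted"
}

# Placeholder file names, listed in genomic order.
catvariants_command reference.fasta output.vcf shard1.vcf shard2.vcf shard3.vcf
```

    Printing the command first makes it possible to inspect the argument list before running it on real data.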


    --variant_index_parameter / NA

    the parameter (bin width or features per bin) to pass to the VCF/BCF IndexCreator

    Integer  -1  [ -∞, ∞ ]


    --variant_index_type / NA

    which type of IndexCreator to use for VCF/BCF indices

    The --variant_index_type argument is an enumerated type (GATKVCFIndexType), which can have one of the following values:

    DYNAMIC_SEEK
    DYNAMIC_SIZE
    LINEAR
    INTERVAL

    GATKVCFIndexType  DYNAMIC_SEEK
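
    Since --variant_index_type accepts only the four values listed above, a wrapper script can reject anything else before launching Java. A minimal sketch; valid_index_type is a hypothetical helper and not part of GATK.

```shell
# Hypothetical helper (not part of GATK): succeeds if its argument is one
# of the four GATKVCFIndexType values listed in this section.
valid_index_type() {
    case "$1" in
        DYNAMIC_SEEK|DYNAMIC_SIZE|LINEAR|INTERVAL)
            return 0
            ;;
        *)
            echo "unknown --variant_index_type: $1" >&2
            return 1
            ;;
    esac
}
```

    The comparison is case-sensitive, so valid_index_type LINEAR succeeds while valid_index_type linear fails.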


    --version / -version

    Output version information
    Use this to check the version number of the GATK executable you are invoking. Note that the version number is always included in the output at the start of every run as well as any error message.

    Boolean  false