    Showing docs for version 3.7-0


    CombineGVCFs

    Combine per-sample gVCF files produced by HaplotypeCaller into a multi-sample gVCF file

    Category: Variant Manipulation Tools

    Traversal: LocusWalker

    PartitionBy: LOCUS


    Overview

    CombineGVCFs is meant to be used for hierarchical merging of gVCFs that will eventually be input into GenotypeGVCFs. One would use this tool when needing to genotype too large a number of individual gVCFs; instead of passing them all in to GenotypeGVCFs, one would first use CombineGVCFs on smaller batches of samples and then pass these combined gVCFs to GenotypeGVCFs.
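    The two-level merge described above can be sketched as a dry run. This is a minimal illustration, not an official workflow: the function name print_batched_commands, the batchN.g.vcf intermediate files, and the cohort.vcf output name are assumptions made for the example, and the script only prints commands rather than running GATK. A batch that ends up with a single file is passed straight to GenotypeGVCFs, since CombineGVCFs expects two or more inputs.

```shell
#!/bin/sh
# Dry-run sketch: prints the commands for two-level merging; nothing is executed.
# Hypothetical names: print_batched_commands, batchN.g.vcf, cohort.vcf.
print_batched_commands() {
  batch_size=$1
  shift
  gatk="java -jar GenomeAnalysisTK.jar"
  ref="reference.fasta"
  pending=""      # --variant arguments collected for the current batch
  count=0         # number of files in the current batch
  batch=0         # number of CombineGVCFs commands printed so far
  final=""        # --variant arguments for the GenotypeGVCFs command
  remaining=$#
  for f in "$@"; do
    pending="$pending --variant $f"
    count=$((count + 1))
    remaining=$((remaining - 1))
    # Close the batch when it is full or when the inputs run out.
    if [ "$count" -ge "$batch_size" ] || [ "$remaining" -eq 0 ]; then
      if [ "$count" -ge 2 ]; then
        batch=$((batch + 1))
        echo "$gatk -T CombineGVCFs -R $ref$pending -o batch$batch.g.vcf"
        final="$final --variant batch$batch.g.vcf"
      else
        # A lone file needs no merging; hand it to GenotypeGVCFs directly.
        final="$final$pending"
      fi
      pending=""
      count=0
    fi
  done
  echo "$gatk -T GenotypeGVCFs -R $ref$final -o cohort.vcf"
}

print_batched_commands 2 sample1.g.vcf sample2.g.vcf sample3.g.vcf sample4.g.vcf sample5.g.vcf
```

    With five inputs and a batch size of 2, the sketch prints two CombineGVCFs commands followed by one GenotypeGVCFs command that takes both batch files plus the unmerged fifth sample. Real batch sizes would be far larger; 2 is used only to keep the output short.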

    Input

    Two or more HaplotypeCaller gVCFs to combine.

    Output

    A combined multisample gVCF.

    Usage example

     java -jar GenomeAnalysisTK.jar \
       -T CombineGVCFs \
       -R reference.fasta \
       --variant sample1.g.vcf \
       --variant sample2.g.vcf \
       -o cohort.g.vcf
     

    Caveat

    Only gVCF files produced by HaplotypeCaller (or CombineGVCFs) can be used as input for this tool. Some other programs produce files that they call gVCFs but those lack some important information (accurate genotype likelihoods for every position) that GenotypeGVCFs requires for its operation.


    Additional Information

    Read filters

    These Read Filters are automatically applied to the data by the Engine before processing by CombineGVCFs.

    Window size

    This tool uses a sliding window on the reference.


    Command-line Arguments

    Engine arguments

    All tools inherit arguments from the GATK Engine's "CommandLineGATK" argument collection, which can be used to modify various aspects of the tool's function. For example, the -L argument directs the GATK engine to restrict processing to specific genomic intervals, and the -rf argument applies additional read filters that exclude some of the data from the analysis.
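    As an illustration of an engine argument combined with this tool, the sketch below prints one CombineGVCFs command per interval using -L. It is a dry run under stated assumptions: the function name print_scattered_commands, the interval names chr1 and chr2, and the per-interval output file names are hypothetical, and nothing is executed.

```shell
#!/bin/sh
# Dry-run sketch: one CombineGVCFs command per interval, restricted with -L.
# Hypothetical names: print_scattered_commands, cohort.<interval>.g.vcf.
print_scattered_commands() {
  for interval in "$@"; do
    echo "java -jar GenomeAnalysisTK.jar -T CombineGVCFs -R reference.fasta --variant sample1.g.vcf --variant sample2.g.vcf -L $interval -o cohort.$interval.g.vcf"
  done
}

print_scattered_commands chr1 chr2
```

    The interval names must match the contig names in the reference; chr1 and chr2 are placeholders.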

    CombineGVCFs specific arguments

    This table summarizes the command-line arguments that are specific to this tool. For more details on each argument, see the list further down below the table or click on an argument name to jump directly to that entry in the list.

    Argument name(s)                              Default value           Summary

    Required Inputs
    --variant, -V                                 NA                      One or more input gVCF files

    Optional Inputs
    --dbsnp, -D                                   none                    dbSNP file

    Optional Outputs
    --out, -o                                     stdout                  File to which the combined gVCF should be written

    Optional Parameters
    --breakBandsAtMultiplesOf                     0                       If > 0, reference bands will be broken up at genomic positions that are multiples of this number
    --group, -G                                   [StandardAnnotation]    One or more classes/groups of annotations to apply to variant calls

    Optional Flags
    --convertToBasePairResolution, -bpResolution  false                   If specified, convert banded gVCFs to all-sites gVCFs

    Advanced Parameters
    --annotation, -A                              [AS_RMSMappingQuality]  One or more specific annotations to recompute. The single value 'none' removes the default annotations

    Argument details

    Arguments in this list are specific to this tool. Keep in mind that other arguments are available that are shared with other tools (e.g. command-line GATK arguments); see Engine arguments above.


    --annotation / -A

    One or more specific annotations to recompute. The single value 'none' removes the default annotations
    Which annotations to recompute for the combined output VCF file.

    List[String]  [AS_RMSMappingQuality]


    --breakBandsAtMultiplesOf / -breakBandsAtMultiplesOf

    If > 0, reference bands will be broken up at genomic positions that are multiples of this number
    To reduce file sizes our gVCFs group similar reference positions into bands. However, there are cases when users will want to know that no bands span across a given genomic position (e.g. when scatter-gathering jobs across a compute farm). The option below enables users to break bands at pre-defined positions. For example, a value of 10,000 would mean that we would ensure that no bands span across chr1:10000, chr1:20000, etc. Note that the --convertToBasePairResolution argument is just a special case of this argument with a value of 1.
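    To make the arithmetic concrete, the sketch below lists the positions inside a region at which bands would be broken for a given value of this argument. The helper break_positions is hypothetical and exists only for this illustration; it simply enumerates the multiples and makes no claim about which side of each position the band boundary falls on. The commented command shows where the argument would go in a real invocation.

```shell
#!/bin/sh
# Hypothetical helper: print every multiple of $3 that lies within [$1, $2].
# These are the positions no reference band would be allowed to span.
break_positions() {
  start=$1
  end=$2
  multiple=$3
  # A value of 0 (the default) disables band breaking.
  [ "$multiple" -gt 0 ] || return 0
  # First multiple at or after the region start (integer division rounds down).
  pos=$(( (start + multiple - 1) / multiple * multiple ))
  while [ "$pos" -le "$end" ]; do
    echo "$pos"
    pos=$((pos + multiple))
  done
}

break_positions 9500 30500 10000

# Corresponding invocation (not executed here):
# java -jar GenomeAnalysisTK.jar -T CombineGVCFs -R reference.fasta \
#   --variant sample1.g.vcf --variant sample2.g.vcf \
#   --breakBandsAtMultiplesOf 10000 -o cohort.g.vcf
```

    For a scatter-gather setup, choosing scatter boundaries from this list means no band in the combined gVCF straddles two jobs.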

    int  0  [ [ -∞  ∞ ] ]


    --convertToBasePairResolution / -bpResolution

    If specified, convert banded gVCFs to all-sites gVCFs

    boolean  false


    --dbsnp / -D

    dbSNP file
    The rsIDs from this file are used to populate the ID column of the output. Also, the DB INFO flag will be set when appropriate. Note that dbSNP is not used in any way for the calculations themselves.

    This argument supports reference-ordered data (ROD) files in the following formats: BCF2, VCF, VCF3

    RodBinding[VariantContext]  none


    --group / -G

    One or more classes/groups of annotations to apply to variant calls
    Which groups of annotations to add to the output VCF file. The single value 'none' removes the default group. See the VariantAnnotator -list argument to view available groups. Note that this usage is not recommended because it obscures the specific requirements of individual annotations. Any requirements that are not met (e.g. failing to provide a pedigree file for a pedigree-based annotation) may cause the run to fail.

    String[]  [StandardAnnotation]


    --out / -o

    File to which the combined gVCF should be written

    VariantContextWriter  stdout


    --variant / -V

    One or more input gVCF files
    The gVCF files to merge together

    R List[RodBindingCollection[VariantContext]]  NA