  • Showing docs for version 3.7-0


    GenotypeGVCFs

    Perform joint genotyping on gVCF files produced by HaplotypeCaller

    Category Variant Discovery Tools

    Traversal LocusWalker

    PartitionBy LOCUS


    Overview

    GenotypeGVCFs merges gVCF records that were produced as part of the Best Practices workflow for variant discovery (see the Best Practices documentation for more details) using the '-ERC GVCF' or '-ERC BP_RESOLUTION' mode of HaplotypeCaller, or that result from combining such gVCF files using CombineGVCFs. This tool performs the multi-sample joint aggregation step and merges the records together in a sophisticated manner: at each position of the input gVCFs, it combines all spanning records, produces correct genotype likelihoods, re-genotypes the newly merged record, and then re-annotates it.

    Input

    One or more HaplotypeCaller gVCFs to genotype.

    Output

    A combined, genotyped VCF.

    Usage example

     java -jar GenomeAnalysisTK.jar \
       -T GenotypeGVCFs \
       -R reference.fasta \
       --variant sample1.g.vcf \
       --variant sample2.g.vcf \
       -o output.vcf
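
    When there are many input gVCFs, the repeated --variant arguments can be generated rather than typed by hand. The following is a minimal sketch, not part of the official documentation: the helper name build_genotype_gvcfs_cmd is hypothetical, the file names are placeholders, and it assumes paths without spaces. It only prints the command so it can be inspected before running.

```shell
# Hypothetical helper: print a GenotypeGVCFs command line for any number of
# input gVCFs. Usage: build_genotype_gvcfs_cmd REFERENCE OUTPUT GVCF...
# The body runs in a subshell, so its variables do not leak to the caller.
build_genotype_gvcfs_cmd() (
  reference="$1"
  output="$2"
  shift 2
  cmd="java -jar GenomeAnalysisTK.jar -T GenotypeGVCFs -R $reference"
  # Add one --variant argument per input gVCF.
  for gvcf in "$@"; do
    cmd="$cmd --variant $gvcf"
  done
  printf '%s\n' "$cmd -o $output"
)

# Placeholder file names, as in the usage example above.
build_genotype_gvcfs_cmd reference.fasta output.vcf \
  sample1.g.vcf sample2.g.vcf sample3.g.vcf
```

    Printing the command instead of executing it keeps the sketch safe to try without a GATK installation.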
     

    Caveat

    Only gVCF files produced by HaplotypeCaller (or CombineGVCFs) can be used as input for this tool. Some other programs produce files that they call gVCFs but those lack some important information (accurate genotype likelihoods for every position) that GenotypeGVCFs requires for its operation.

    Special note on ploidy

    This tool is able to handle any ploidy (or mix of ploidies) intelligently; there is no need to specify ploidy for non-diploid organisms.


    Additional Information

    Read filters

    These Read Filters are automatically applied to the data by the Engine before processing by GenotypeGVCFs.

    Parallelism options

    This tool can be run in multi-threaded mode using this option.

    Window size

    This tool uses a sliding window on the reference.


    Command-line Arguments

    Engine arguments

    All tools inherit arguments from the GATK Engine's "CommandLineGATK" argument collection, which can be used to modify various aspects of the tool's function. For example, the -L argument directs the GATK engine to restrict processing to specific genomic intervals, and the -rf argument allows you to apply certain read filters to exclude some of the data from the analysis.

    GenotypeGVCFs specific arguments

    This table summarizes the command-line arguments that are specific to this tool. For more details on each argument, see the list further down below the table or click on an argument name to jump directly to that entry in the list.

    Argument name(s) | Default value | Summary
    Required Inputs
    --variant, -V | NA | One or more input gVCF files
    Optional Inputs
    --dbsnp, -D | none | dbSNP file
    Optional Outputs
    --out, -o | stdout | File to which variants should be written
    Optional Parameters
    --group, -G | [StandardAnnotation] | One or more classes/groups of annotations to apply to variant calls
    --heterozygosity, -hets | 0.001 | Heterozygosity value used to compute prior likelihoods for any locus
    --heterozygosity_stdev, -heterozygosityStandardDeviation | 0.01 | Standard deviation of heterozygosity for SNP and indel calling.
    --indel_heterozygosity, -indelHeterozygosity | 1.25E-4 | Heterozygosity for indel calling
    --sample_ploidy, -ploidy | 2 | Ploidy per sample. For pooled data, set to (Number of samples in each pool * Sample Ploidy).
    --standard_min_confidence_threshold_for_calling, -stand_call_conf | 10.0 | The minimum phred-scaled confidence threshold at which variants should be called
    Optional Flags
    --annotateNDA, -nda | false | Annotate number of alleles observed
    --includeNonVariantSites, -allSites | false | Include loci found to be non-variant after genotyping
    --useNewAFCalculator, -newQual | false | Use new AF model instead of the so-called exact model
    Advanced Parameters
    --annotation, -A | [] | One or more specific annotations to recompute. The single value 'none' removes the default annotations
    --input_prior, -inputPrior | [] | Input prior for calls
    --max_alternate_alleles, -maxAltAlleles | 6 | Maximum number of alternate alleles to genotype
    --max_genotype_count, -maxGT | 1024 | Maximum number of genotypes to consider at any site
    --max_num_PL_values, -maxNumPLValues | 100 | Maximum number of PL values to output

    Argument details

    Arguments in this list are specific to this tool. Keep in mind that other arguments are available that are shared with other tools (e.g. command-line GATK arguments); see Engine arguments above.


    --annotateNDA / -nda

    Annotate number of alleles observed
    Depending on the value of the --max_alternate_alleles argument, we may genotype only a fraction of the alleles being sent on for genotyping. Using this argument instructs the genotyper to annotate (in the INFO field) the number of alternate alleles that were originally discovered (but not necessarily genotyped) at the site.

    boolean  false


    --annotation / -A

    One or more specific annotations to recompute. The single value 'none' removes the default annotations
    Which annotations to recompute for the combined output VCF file.

    List[String]  []


    --dbsnp / -D

    dbSNP file
    The rsIDs from this file are used to populate the ID column of the output. Also, the DB INFO flag will be set when appropriate. Note that dbSNP is not used in any way for the calculations themselves.

    This argument supports reference-ordered data (ROD) files in the following formats: BCF2, VCF, VCF3

    RodBinding[VariantContext]  none


    --group / -G

    One or more classes/groups of annotations to apply to variant calls
    Which groups of annotations to add to the output VCF file. The single value 'none' removes the default group. See the VariantAnnotator -list argument to view available groups. Note that this usage is not recommended because it obscures the specific requirements of individual annotations. Any requirements that are not met (e.g. failing to provide a pedigree file for a pedigree-based annotation) may cause the run to fail.

    List[String]  [StandardAnnotation]


    --heterozygosity / -hets

    Heterozygosity value used to compute prior likelihoods for any locus
    The expected heterozygosity value used to compute prior probability that a locus is non-reference. See https://software.broadinstitute.org/gatk/documentation/article?id=8603 for more details.

    Double  0.001  [ [ -∞  ∞ ] ]


    --heterozygosity_stdev / -heterozygosityStandardDeviation

    Standard deviation of heterozygosity for SNP and indel calling.
    The standard deviation of the distribution of alt allele fractions. The above heterozygosity parameters give the *mean* of this distribution; this parameter gives its spread.

    double  0.01  [ [ -∞  ∞ ] ]


    --includeNonVariantSites / -allSites

    Include loci found to be non-variant after genotyping

    boolean  false


    --indel_heterozygosity / -indelHeterozygosity

    Heterozygosity for indel calling
    This argument informs the prior probability of having an indel at a site.

    double  1.25E-4  [ [ -∞  ∞ ] ]


    --input_prior / -inputPrior

    Input prior for calls
    By default, the prior specified with the argument --heterozygosity/-hets is used for variant discovery at a particular locus, using an infinite sites model (see e.g. Watterson, 1975 or Tajima, 1996). This model asserts that the probability of having a population of k variant sites in N chromosomes is proportional to theta/k, for k=1:N. However, there are instances where using this prior might not be desirable, e.g. for population studies where it might not be appropriate, as for example when the ancestral status of the reference allele is not known. This argument allows you to manually specify a list of probabilities for each AC>=1 to be used as priors for genotyping, with the following restrictions: only diploid calls are supported; you must specify 2 * N values where N is the number of samples; probability values must be positive and specified in Double format, in linear space (not log10 space nor Phred-scale); and all values must sum to 1. For completely flat priors, specify the same value (=1/(2*N+1)) 2*N times, e.g. -inputPrior 0.33 -inputPrior 0.33 for the single-sample diploid case.

    List[Double]  []
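
    The arithmetic described above can be sketched in shell with awk. This is an illustration only: the helper names infinite_sites_weights and flat_input_prior_args are hypothetical, and the first helper prints unnormalized theta/k weights (the text only says the prior is proportional to theta/k), not the exact values GATK uses internally.

```shell
# Hypothetical helper: unnormalized infinite-sites weights theta/k, k = 1..N.
# Usage: infinite_sites_weights THETA N
infinite_sites_weights() (
  LC_ALL=C awk -v theta="$1" -v n="$2" 'BEGIN {
    for (k = 1; k <= n; k++)
      printf "%s%g", (k > 1 ? " " : ""), theta / k
    printf "\n"
  }'
)

# Hypothetical helper: flat -inputPrior arguments for N diploid samples,
# i.e. the value 1/(2*N+1) repeated 2*N times, rounded to two decimals
# to match the example in the text.
flat_input_prior_args() (
  LC_ALL=C awk -v n="$1" 'BEGIN {
    count = 2 * n
    value = 1 / (2 * n + 1)
    for (i = 1; i <= count; i++)
      printf "%s-inputPrior %.2f", (i > 1 ? " " : ""), value
    printf "\n"
  }'
)

infinite_sites_weights 0.001 2   # default --heterozygosity, k = 1..2
flat_input_prior_args 1          # single-sample diploid case
```

    For a single diploid sample the second call prints -inputPrior 0.33 -inputPrior 0.33, matching the example above. More decimal places may be preferable for larger cohorts.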


    --max_alternate_alleles / -maxAltAlleles

    Maximum number of alternate alleles to genotype
    If there are more than this number of alternate alleles presented to the genotyper (either through discovery or GENOTYPE_GIVEN_ALLELES), then only this many alleles will be used. Note that genotyping sites with many alternate alleles is both CPU and memory intensive, and the cost scales exponentially with the number of alternate alleles. Unless there is a good reason to change the default value, we highly recommend that you not play around with this parameter. See also --max_genotype_count.

    int  6  [ [ -∞  ∞ ] ]


    --max_genotype_count / -maxGT

    Maximum number of genotypes to consider at any site
    If there are more than this number of genotypes at a locus presented to the genotyper, then only this many genotypes will be used. This is intended to deal with sites where the combination of high ploidy and high alt allele count can lead to an explosion in the number of possible genotypes, with extreme adverse effects on runtime performance.
    How does it work? The possible genotypes are simply different ways of partitioning alleles given a specific ploidy assumption. Therefore, we remove genotypes from consideration by removing the alternate alleles that are the least well supported. The estimate of allele support is based on the ranking of the candidate haplotypes coming out of the graph building step. Note however that the reference allele is always kept.
    The maximum number of alternative alleles used in the genotyping step will be the lesser of the two:
    1. the largest number of alt alleles, given ploidy, that yields a genotype count no higher than --max_genotype_count
    2. the value of --max_alternate_alleles
    As noted above, genotyping sites with large genotype counts is both CPU and memory intensive. Unless you have a good reason to change the default value, we highly recommend that you not play around with this parameter. See also --max_alternate_alleles.

    int  1024  [ [ -∞  ∞ ] ]
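
    To get a feel for how the two limits interact, the rule above can be sketched in shell. This assumes the standard combinatorial count of unordered genotypes, C(ploidy + alleles - 1, ploidy), where alleles counts the reference plus the alternates. The helper names genotype_count and effective_max_alt_alleles are hypothetical, and the sketch does not model how GATK ranks alleles by support.

```shell
# Number of unordered genotypes: C(ploidy + alleles - 1, ploidy),
# where alleles = reference allele + alternate alleles.
genotype_count() (
  ploidy="$1"
  alleles="$2"
  n=$(( ploidy + alleles - 1 ))
  k="$ploidy"
  # C(n, k) == C(n, n - k); use the smaller k to shorten the loop.
  if [ "$k" -gt $(( n - k )) ]; then k=$(( n - k )); fi
  result=1
  i=1
  while [ "$i" -le "$k" ]; do
    # After step i, result == C(n - k + i, i), so the division is exact.
    result=$(( result * (n - k + i) / i ))
    i=$(( i + 1 ))
  done
  echo "$result"
)

# Lesser of (1) the largest alt-allele count whose genotype count stays
# within max_gt and (2) max_alt.
# Usage: effective_max_alt_alleles PLOIDY MAX_GT MAX_ALT
effective_max_alt_alleles() (
  ploidy="$1"
  max_gt="$2"
  max_alt="$3"
  alt=0
  # Candidate alt + 1 alternates means alt + 2 alleles including the reference.
  while [ "$alt" -lt "$max_alt" ] &&
        [ "$(genotype_count "$ploidy" $(( alt + 2 )))" -le "$max_gt" ]; do
    alt=$(( alt + 1 ))
  done
  echo "$alt"
)

effective_max_alt_alleles 2 1024 6    # diploid, default limits
effective_max_alt_alleles 10 1024 6   # ploidy 10, default limits
```

    With the default limits, the diploid case prints 6 (only 28 genotypes, so --max_alternate_alleles is the binding limit), while ploidy 10 prints 4, because five alternate alleles would already give 3003 genotypes.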


    --max_num_PL_values / -maxNumPLValues

    Maximum number of PL values to output
    Determines the maximum number of PL values that will be logged in the output. If the number of genotypes (which is determined by the ploidy and the number of alleles) exceeds the value provided by this argument, then output of all of the PL values will be suppressed.

    int  100  [ [ -∞  ∞ ] ]
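
    Whether PL values survive at a site can be estimated ahead of time. The sketch below assumes the usual genotype-count formula C(ploidy + alleles - 1, ploidy) and the "exceeds" rule stated above. The helper names are hypothetical, and the actual allele count at a site is only known after genotyping.

```shell
# Number of unordered genotypes: C(ploidy + alleles - 1, ploidy).
genotype_count() (
  n=$(( $1 + $2 - 1 ))
  k="$1"
  if [ "$k" -gt $(( n - k )) ]; then k=$(( n - k )); fi
  result=1
  i=1
  while [ "$i" -le "$k" ]; do
    result=$(( result * (n - k + i) / i ))
    i=$(( i + 1 ))
  done
  echo "$result"
)

# Report whether PLs would be written for a site.
# Usage: pl_values_status PLOIDY ALT_ALLELES MAX_NUM_PL_VALUES
pl_values_status() (
  count=$(genotype_count "$1" $(( $2 + 1 )))   # +1 for the reference allele
  if [ "$count" -gt "$3" ]; then
    echo "suppressed ($count genotypes)"
  else
    echo "emitted ($count genotypes)"
  fi
)

pl_values_status 2 6 100   # diploid site with 6 alternate alleles
pl_values_status 6 6 100   # hexaploid site with 6 alternate alleles
```

    Under these assumptions the diploid site keeps its PLs (28 genotypes), while the hexaploid site has 924 genotypes and loses them unless -maxNumPLValues is raised.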


    --out / -o

    File to which variants should be written

    VariantContextWriter  stdout


    --sample_ploidy / -ploidy

    Ploidy per sample. For pooled data, set to (Number of samples in each pool * Sample Ploidy).
    Sample ploidy - equivalent to number of chromosome copies per pool. For pooled experiments this should be set to the number of samples in pool multiplied by individual sample ploidy.

    int  2  [ [ -∞  ∞ ] ]
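
    For pooled experiments, the value to pass follows directly from the rule above. A minimal sketch, with a hypothetical helper name and example numbers:

```shell
# Pool ploidy = number of samples in the pool * ploidy of each sample.
# Usage: pool_ploidy SAMPLES_PER_POOL SAMPLE_PLOIDY
pool_ploidy() (
  echo $(( $1 * $2 ))
)

# Example: pools of 10 diploid individuals.
echo "-ploidy $(pool_ploidy 10 2)"
```

    This prints -ploidy 20, which can be appended to the GenotypeGVCFs command line.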


    --standard_min_confidence_threshold_for_calling / -stand_call_conf

    The minimum phred-scaled confidence threshold at which variants should be called
    The minimum phred-scaled Qscore threshold to separate high confidence from low confidence calls. Only genotypes with confidence >= this threshold are emitted as called sites. A reasonable threshold is 30 for high-pass calling; the default is 10.0.

    double  10.0  [ [ -∞  ∞ ] ]
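
    To interpret a threshold, it can help to convert it to a probability. The sketch below assumes the standard Phred definition, Q = -10 * log10(P), where P is the probability that the call is wrong. The helper name is hypothetical.

```shell
# Convert a phred-scaled confidence to the corresponding error probability:
# P = 10^(-Q/10).
phred_to_error_prob() (
  LC_ALL=C awk -v q="$1" 'BEGIN { printf "%g\n", 10 ^ (-q / 10) }'
)

phred_to_error_prob 10   # the default threshold
phred_to_error_prob 30   # the high-pass threshold mentioned above
```

    Under this definition, a threshold of 10 corresponds to an error probability of 0.1 and a threshold of 30 to 0.001.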


    --useNewAFCalculator / -newQual

    Use new AF model instead of the so-called exact model
    This activates a model for calculating QUAL that was introduced in version 3.7 (November 2016). We expect this model will become the default in future versions.

    boolean  false


    --variant / -V

    One or more input gVCF files
    The gVCF files to merge together

    R List[RodBindingCollection[VariantContext]]  NA