  • Showing docs for version 3.7-0


    CalculateGenotypePosteriors

    Calculate genotype posterior likelihoods given panel data

    Category Variant Discovery Tools

    Traversal LocusWalker

    PartitionBy LOCUS


    Overview

    Given a VCF with genotype likelihoods from the HaplotypeCaller, UnifiedGenotyper, or another source which provides unbiased genotype likelihoods, calculate the posterior genotype state and likelihood given allele frequency information from both the samples themselves and input VCFs describing allele frequencies in related populations.

    The AF field is not used in this calculation because it provides no way to estimate the confidence interval or uncertainty around the allele frequency, whereas AN provides this information. The uncertainty is modeled by a Dirichlet distribution: the allele frequency is known only up to a Dirichlet distribution with parameters AC1+q, AC2+q, ..., (AN-AC1-AC2-...)+q, where "q" is the global frequency prior (typically q << 1). The genotype priors then follow a Dirichlet-Multinomial distribution in which the 2 alleles of each sample are drawn independently. This assumption of independent draws is the assumption of Hardy-Weinberg Equilibrium (HWE), so CalculateGenotypePosteriors imposes HWE on the likelihoods.
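
    The prior and posterior described above can be sketched in Python. This is an illustration of the Dirichlet-Multinomial arithmetic for two draws per sample, not the GATK implementation: the function names, the ref-first ordering of the parameters, and the example counts and PL values are all assumptions made for this sketch.

```python
def dirichlet_params(alt_counts, an, q=0.001):
    """Dirichlet parameters, reference allele first (ordering is an assumption).

    alt_counts: AC for each alternate allele; an: total allele number (AN);
    q: global frequency prior added to every allele count.
    """
    ref_count = an - sum(alt_counts)
    return [ref_count + q] + [ac + q for ac in alt_counts]


def genotype_priors(alpha):
    """Dirichlet-Multinomial probabilities for two draws (one diploid genotype).

    Genotypes are listed in VCF order: 0/0, 0/1, 1/1, 0/2, 1/2, 2/2, ...
    """
    total = sum(alpha)
    norm = total * (total + 1.0)
    priors = []
    for j in range(len(alpha)):
        for i in range(j + 1):
            if i == j:
                # Same allele drawn twice.
                priors.append(alpha[i] * (alpha[i] + 1.0) / norm)
            else:
                # Two different alleles, in either order.
                priors.append(2.0 * alpha[i] * alpha[j] / norm)
    return priors


def posteriors_from_pl(pl, priors):
    """Combine Phred-scaled genotype likelihoods (PL) with priors and normalize."""
    unnormalized = [10.0 ** (-p / 10.0) * pr for p, pr in zip(pl, priors)]
    z = sum(unnormalized)
    return [u / z for u in unnormalized]


# Hypothetical site: supporting panel with AC=5, AN=2000; sample PLs favor 0/1.
alpha = dirichlet_params(alt_counts=[5], an=2000)
priors = genotype_priors(alpha)
posteriors = posteriors_from_pl([20, 0, 30], priors)
print(priors)
print(posteriors)
```

    With these illustrative numbers the rare-allele prior outweighs the modest likelihood advantage of the heterozygous call, so the hom-ref genotype receives the largest posterior.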

    Input

    A collection of VCFs to use for informing allele frequency priors. Each VCF must have one of: AC and AN fields, MLEAC and AN fields, or genotypes.

    Output

    A new VCF with:

    Notes

    By default, priors are applied to a variant only if it has at least 10 called samples. SNP sites in the input callset that have a SNP at the matching site in the supporting VCF will have priors applied based on the AC from the supporting samples and the input callset (unless the --ignoreInputSamples flag is used). If the site is not called in the supporting VCF, priors will be applied using the discovered AC from the input samples (unless the --discoveredACpriorsOff flag is used). Flat priors are applied for any non-SNP sites in the input callset.
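
    One reading of these rules can be written as a small decision function. This is a sketch of the paragraph above, not the tool's code: the function name, the string labels, and the handling of cases the paragraph does not spell out (fewer than 10 called samples, or --ignoreInputSamples at a site absent from the supporting VCF) are assumptions.

```python
def prior_sources(is_snp, in_supporting, called_samples,
                  ignore_input_samples=False,
                  discovered_ac_priors_off=False,
                  min_called_samples=10):
    """Return which allele counts would inform the prior at one site."""
    # Non-SNP sites get flat priors; treating sites with too few called
    # samples the same way is an assumption of this sketch.
    if not is_snp or called_samples < min_called_samples:
        return ["flat"]

    if in_supporting:
        # SNP present in the supporting VCF: use its AC, plus the input
        # callset's AC unless --ignoreInputSamples is set.
        sources = ["supporting AC"]
        if not ignore_input_samples:
            sources.append("input AC")
        return sources

    # Site absent from the supporting VCF: fall back to the discovered AC
    # unless that is switched off. Falling back to flat priors when
    # --ignoreInputSamples is set is an assumption.
    if discovered_ac_priors_off or ignore_input_samples:
        return ["flat"]
    return ["input AC"]


print(prior_sources(is_snp=True, in_supporting=True, called_samples=50))
print(prior_sources(is_snp=True, in_supporting=False, called_samples=50,
                    discovered_ac_priors_off=True))
```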

    Usage examples

    Inform the genotype assignment of NA12878 using the 1000G Euro panel

     java -jar GenomeAnalysisTK.jar \
       -T CalculateGenotypePosteriors \
       -R reference.fasta \
       -V NA12878.wgs.HC.vcf \
       -supporting 1000G_EUR.genotypes.combined.vcf \
       -o NA12878.wgs.HC.posteriors.vcf 
     

    Refine the genotypes of a large panel based on the discovered allele frequency

     java -jar GenomeAnalysisTK.jar \
       -T CalculateGenotypePosteriors \
       -R reference.fasta \
       -V input.vcf \
       -o output.withPosteriors.vcf
     

    Apply frequency and HWE-based priors to the genotypes of a family without including the family allele counts in the allele frequency estimates

     java -jar GenomeAnalysisTK.jar \
       -T CalculateGenotypePosteriors \
       -R reference.fasta \
       -V input.vcf \
       -o output.withPosteriors.vcf \
       --ignoreInputSamples
     

    Calculate the posterior genotypes of a callset, and impose that a variant *not seen* in the external panel is tantamount to being AC=0, AN=100 within that panel

     java -jar GenomeAnalysisTK.jar \
       -T CalculateGenotypePosteriors \
       -R reference.fasta \
       -supporting external.panel.vcf \
       -V input.vcf \
       -o output.withPosteriors.vcf \
       --numRefSamplesIfNoCall 100
     

    Apply only family priors to a callset

     java -jar GenomeAnalysisTK.jar \
       -T CalculateGenotypePosteriors \
       -R reference.fasta \
       -V input.vcf \
       --skipPopulationPriors \
       -ped family.ped \
       -o output.withPosteriors.vcf
     

    Additional Information

    Read filters

    These Read Filters are automatically applied to the data by the Engine before processing by CalculateGenotypePosteriors.


    Command-line Arguments

    Engine arguments

    All tools inherit arguments from the GATK Engine's "CommandLineGATK" argument collection, which can be used to modify various aspects of the tool's function. For example, the -L argument directs the GATK engine to restrict processing to specific genomic intervals, and the -rf argument applies additional read filters to exclude some of the data from the analysis.

    CalculateGenotypePosteriors specific arguments

    This table summarizes the command-line arguments that are specific to this tool. For more details on each argument, see the list further down below the table or click on an argument name to jump directly to that entry in the list.

    Argument name(s)                      Default value   Summary
    Required Inputs
    --variant, -V                         NA              Input VCF file
    Optional Inputs
    --supporting                          []              Other callsets to use in generating genotype posteriors
    Optional Outputs
    --out, -o                             stdout          File to which variants should be written
    Optional Parameters
    --deNovoPrior, -DNP                   1.0E-6          The de novo mutation prior
    --globalPrior, -G                     0.001           The global Dirichlet prior parameters for the allele frequency
    --numRefSamplesIfNoCall, -nrs         0               The number of homozygous-reference samples to infer at a position where an "other callset" contains no site or genotype information
    Optional Flags
    --defaultToAC, -useAC                 false           Use the AC field as opposed to MLEAC. Does nothing if VCF lacks MLEAC field
    --discoveredACpriorsOff, -useACoff    false           Do not use discovered allele count in the input callset for variants that do not appear in the external callset.
    --ignoreInputSamples, -ext            false           Use external information only; do not inform genotype priors by the discovered allele frequency in the callset whose posteriors are being calculated. Useful for callsets containing related individuals.
    --skipFamilyPriors, -skipFam          false           Skip application of family-based priors
    --skipPopulationPriors, -skipPop      false           Skip application of population-based priors

    Argument details

    Arguments in this list are specific to this tool. Keep in mind that other arguments are available that are shared with other tools (e.g. command-line GATK arguments); see Engine arguments above.


    --defaultToAC / -useAC

    Use the AC field as opposed to MLEAC. Does nothing if VCF lacks MLEAC field
    Rather than looking for the MLEAC field first, and then falling back to AC; first look for the AC field and then fall back to MLEAC or raw genotypes

    boolean  false
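
    The lookup order described for this flag can be sketched as follows. The function name and the dictionary representation of the INFO field are assumptions made for illustration; only the ordering (MLEAC, then AC, then raw genotypes by default, with AC and MLEAC swapped when the flag is set) comes from the description above.

```python
def allele_count_source(info, genotype_alt_count, default_to_ac=False):
    """Pick the allele count for a site, following the documented order.

    info: dict of INFO-field values for the site (hypothetical representation).
    genotype_alt_count: count derived from raw genotypes, used as last resort.
    """
    order = ("AC", "MLEAC") if default_to_ac else ("MLEAC", "AC")
    for key in order:
        if key in info:
            return key, info[key]
    return "genotypes", genotype_alt_count


site_info = {"AC": 4, "MLEAC": 5, "AN": 200}
print(allele_count_source(site_info, genotype_alt_count=4))
print(allele_count_source(site_info, genotype_alt_count=4, default_to_ac=True))
```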


    --deNovoPrior / -DNP

    The de novo mutation prior
    The mutation prior -- i.e. the probability that a new mutation occurs. Sensitivity analysis on known de novo mutations suggests a default value of 10^-6.

    double  1.0E-6  [ [ -∞  ∞ ] ]


    --discoveredACpriorsOff / -useACoff

    Do not use discovered allele count in the input callset for variants that do not appear in the external callset.
    By default, priors for variants missing from the external callsets are calculated from the input sample data. When this flag is set, flat priors are applied to those variants instead.

    boolean  false


    --globalPrior / -G

    The global Dirichlet prior parameters for the allele frequency
    The global prior of a variant site -- i.e. the expected allele frequency distribution knowing only that N alleles exist, and having observed none of them. This is the "typical" 1/x trend, modeled here as not varying across alleles. The calculation for this parameter is (Effective population size) * (steady state mutation rate)

    double  0.001  [ [ -∞  ∞ ] ]


    --ignoreInputSamples / -ext

    Use external information only; do not inform genotype priors by the discovered allele frequency in the callset whose posteriors are being calculated. Useful for callsets containing related individuals.
    Do not use the [MLE] allele count from the input samples (the ones for which you're calculating posteriors) in the site frequency distribution; only use the AC and AN calculated from external sources.

    boolean  false


    --numRefSamplesIfNoCall / -nrs

    The number of homozygous-reference samples to infer at a position where an "other callset" contains no site or genotype information
    When a variant is not seen in a panel, whether to infer (and with what effective strength) that only reference alleles were ascertained at that site. E.g. "If not seen in 1000Genomes, treat it as AC=0, AN=2000". This is applied across all external panels, so if --numRefSamplesIfNoCall is 10 and the variant is absent in two panels, this confers evidence of AC=0, AN=20.

    int  0  [ [ -∞  ∞ ] ]
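
    The arithmetic in this description can be sketched as below. The function name and the use of None for a panel that lacks the site are assumptions; the sketch only reproduces the counting stated above (each panel missing the site contributes AC=0 and AN equal to the value of this argument) and is not the tool's code.

```python
def combine_panel_counts(panels, num_ref_if_no_call=0):
    """Sum (AC, AN) over external panels.

    panels: list of (ac, an) tuples, with None for a panel in which the
    variant is absent. Each absent panel adds num_ref_if_no_call to AN only.
    """
    total_ac = 0
    total_an = 0
    for panel in panels:
        if panel is None:
            total_an += num_ref_if_no_call
        else:
            ac, an = panel
            total_ac += ac
            total_an += an
    return total_ac, total_an


# Variant absent from two panels, argument value 10: evidence of AC=0, AN=20.
print(combine_panel_counts([None, None], num_ref_if_no_call=10))
```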


    --out / -o

    File to which variants should be written

    VariantContextWriter  stdout


    --skipFamilyPriors / -skipFam

    Skip application of family-based priors
    Skip application of family-based priors. Note: if pedigree file is absent, family-based priors will be skipped.

    boolean  false


    --skipPopulationPriors / -skipPop

    Skip application of population-based priors

    boolean  false


    --supporting / -supporting

    Other callsets to use in generating genotype posteriors
    Supporting external panels. Allele counts from these panels (taken from AC,AN or MLEAC,AN or raw genotypes) will be used to inform the frequency distribution underlying the genotype priors. These files must be VCF 4.2 spec or later.

    This argument supports reference-ordered data (ROD) files in the following formats: BCF2, VCF, VCF3

    List[RodBinding[VariantContext]]  []


    --variant / -V

    Input VCF file
    Variants from this VCF file are used by this tool as input. The file must at least contain the standard VCF header lines, but can be empty (i.e., no variants are contained in the file).

    This argument supports reference-ordered data (ROD) files in the following formats: BCF2, VCF, VCF3

    R RodBinding[VariantContext]  NA