    Showing docs for version 3.7-0


    VariantsToTable

    Extract specific fields from a VCF file to a tab-delimited table

    Category Variant Manipulation Tools

    Traversal LocusWalker

    PartitionBy LOCUS


    Overview

    This tool is designed to extract fields from the VCF to a table format that is more convenient to work with in downstream analyses.

    The user specifies one or more fields to print with -F NAME. Each field appears as a single column in the output file, with a header named NAME and one value per output line. NAME can be any standard VCF column (CHROM, ID, QUAL) or any binding in the INFO field (e.g., AC=10). In addition, there are specially supported values like EVENTLENGTH (length of the event), TRANSITION (for SNPs), HET (count of het genotypes), HOM-REF (count of homozygous reference genotypes), HOM-VAR (count of homozygous variant genotypes), NO-CALL (count of no-call genotypes), TYPE (the type of event), VAR (count of non-reference genotypes), NSAMPLES (number of samples), NCALLED (number of called samples), GQ (from the genotype field; works only for a file with a single sample), and MULTI-ALLELIC (whether the record is from a multi-allelic site).
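
    The genotype-count fields listed above can be illustrated with a minimal Python sketch. It is not the tool's implementation: the function name genotype_counts is hypothetical, genotypes are assumed to be diploid GT strings such as 0/1, and treating any genotype with a missing allele as a no-call is a simplifying assumption.

```python
def genotype_counts(gts):
    """Summarize GT strings such as '0/1' or './.' (illustrative sketch only)."""
    counts = {"HET": 0, "HOM-REF": 0, "HOM-VAR": 0, "NO-CALL": 0}
    for gt in gts:
        # Treat phased ('|') and unphased ('/') separators the same way.
        alleles = gt.replace("|", "/").split("/")
        if "." in alleles:
            # Assumption: any missing allele makes the genotype a no-call.
            counts["NO-CALL"] += 1
        elif len(set(alleles)) > 1:
            counts["HET"] += 1
        elif alleles[0] == "0":
            counts["HOM-REF"] += 1
        else:
            counts["HOM-VAR"] += 1
    # VAR counts non-reference genotypes: heterozygous plus homozygous variant.
    counts["VAR"] = counts["HET"] + counts["HOM-VAR"]
    counts["NSAMPLES"] = len(gts)
    counts["NCALLED"] = len(gts) - counts["NO-CALL"]
    return counts


summary = genotype_counts(["0/0", "0/1", "1/1", "./.", "1|2"])
print(summary)
```

    Each key of the returned dictionary corresponds to one of the special -F values, so a site with these five genotypes would contribute one number per requested column.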

    Input

    A VCF file, supplied with the -V (--variant) argument

    Output

    A tab-delimited file containing the values of the requested fields in the VCF file

    Usage example

         java -jar GenomeAnalysisTK.jar \
         -R reference.fasta \
         -T VariantsToTable \
         -V file.vcf \
         -F CHROM -F POS -F ID -F QUAL -F AC \
         -o results.table
     

    would produce a file that looks like:

         CHROM    POS ID      QUAL    AC
         1        10  .       50      1
         1        20  rs10    99      10
         et cetera...
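
    Because the output is tab-delimited with a header row, it can be loaded with standard tooling. The following is a minimal Python sketch, assuming the table text matches the example above; the read_table helper is hypothetical and the table is embedded as a string so the snippet is self-contained.

```python
import csv
import io

# The example table from above, written with real tab separators.
TABLE_TEXT = (
    "CHROM\tPOS\tID\tQUAL\tAC\n"
    "1\t10\t.\t50\t1\n"
    "1\t20\trs10\t99\t10\n"
)


def read_table(handle):
    """Return the table rows as dicts keyed by the header names."""
    return list(csv.DictReader(handle, delimiter="\t"))


rows = read_table(io.StringIO(TABLE_TEXT))
for row in rows:
    print(row["CHROM"], row["POS"], row["ID"], row["AC"])
```

    With a real file, the same helper could be called on open("results.table", newline=""). All values arrive as strings, so numeric columns such as POS or QUAL need explicit conversion.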
     

    Caveat

    If a VCF record is missing a value, then the tool by default throws an error, but the special value NA can be emitted instead if requested at the command line using --allowMissingData.


    Additional Information

    Read filters

    These Read Filters are automatically applied to the data by the Engine before processing by VariantsToTable.


    Command-line Arguments

    Engine arguments

    All tools inherit arguments from the GATK Engine's "CommandLineGATK" argument collection, which can be used to modify various aspects of the tool's function. For example, the -L argument directs the GATK engine to restrict processing to specific genomic intervals, and the -rf argument applies read filters that exclude some of the data from the analysis.

    VariantsToTable specific arguments

    This table summarizes the command-line arguments that are specific to this tool. For more details on each argument, see the list further down below the table or click on an argument name to jump directly to that entry in the list.

    Argument name(s)            Default value   Summary

    Required Inputs
    --variant, -V               NA              Input VCF file

    Optional Outputs
    --out, -o                   stdout          File to which results should be written

    Optional Parameters
    --fields, -F                []              The name of each field to capture for output in the table
    --genotypeFields, -GF       []              The name of each genotype field to capture for output in the table
    --maxRecords, -M            -1              If provided, we will emit at most maxRecord records to the table

    Optional Flags
    --splitMultiAllelic, -SMA   false           If provided, we will split multi-allelic records into multiple lines of output

    Advanced Flags
    --allowMissingData, -AMD    false           If provided, we will not require every record to contain every field
    --moltenize                 false           If provided, we will produce molten output
    --showFiltered, -raw        false           If provided, field values from filtered records will be included in the output

    Argument details

    Arguments in this list are specific to this tool. Keep in mind that other arguments are available that are shared with other tools (e.g. command-line GATK arguments); see Engine arguments above.


    --allowMissingData / -AMD

    If provided, we will not require every record to contain every field
    By default, this tool throws a UserException when it encounters a field without a value in some record. This is generally useful when you mistype a field name (e.g., -F CHRMO instead of -F CHROM), because you get a friendly error about the field not being found before the tool runs through 40M 1000G records. However, in some cases you genuinely want to allow such fields (e.g., AC not being calculated for filtered records, if included). When provided, this argument causes VariantsToTable to write out NA values for missing fields instead of throwing an error.

    boolean  false
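
    The difference between the default behavior and --allowMissingData can be sketched in a few lines of Python. This is an illustration of the documented behavior, not the tool's code: the extract_fields function and the dictionary-based record are hypothetical, and ValueError stands in for the UserException mentioned above.

```python
def extract_fields(record, fields, allow_missing=False):
    """Return one output value per requested field (illustrative sketch only)."""
    values = []
    for name in fields:
        if record.get(name) is not None:
            values.append(str(record[name]))
        elif allow_missing:
            # With --allowMissingData, a missing value is written as NA.
            values.append("NA")
        else:
            # Default: fail fast. ValueError is a stand-in for UserException.
            raise ValueError(f"Missing field {name} in record {record}")
    return values


record = {"CHROM": "1", "POS": 10}  # hypothetical record without AC
print(extract_fields(record, ["CHROM", "POS", "AC"], allow_missing=True))
```

    Calling the same function without allow_missing=True raises on the first missing field, which mirrors why a mistyped field name is caught immediately by default.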


    --fields / -F

    The name of each field to capture for output in the table
    -F NAME can be any standard VCF column (CHROM, ID, QUAL) or any binding in the INFO field (e.g., AC=10). To capture GENOTYPE (FORMAT) field values, see the -GF argument. This argument accepts any number of inputs, so -F CHROM -F POS is allowed.

    List[String]  []


    --genotypeFields / -GF

    The name of each genotype field to capture for output in the table
    -GF NAME can be any binding in the FORMAT field (e.g., GQ, PL). This argument accepts any number of inputs, so -GF GQ -GF PL is allowed.

    List[String]  []


    --maxRecords / -M

    If provided, we will emit at most maxRecord records to the table
    If provided, then this tool will exit with success after this number of VCF records have been emitted to the file.

    int  -1  [ -∞, ∞ ]


    --moltenize / -moltenize

    If provided, we will produce molten output
    By default, this tool emits one line per usable VCF record (or per allele if the -SMA flag is provided). Using the -moltenize flag will cause records to be split into multiple lines of output: one for each field provided with -F or one for each combination of sample and field provided with -GF. Note that the "Sample" column for -F fields will always be "site".

    boolean  false
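
    The reshaping that -moltenize performs can be sketched in Python as a wide-to-long conversion. This is a sketch under stated assumptions: the moltenize function, the (sample, field, value) tuple layout, and the sample name NA12878 are illustrative, and the tool's actual molten output may contain additional columns. Only the "site" label for -F fields comes from the description above.

```python
def moltenize(site_fields, genotype_fields):
    """Turn one record into long-format rows (illustrative sketch only).

    site_fields:     {field: value} for fields requested with -F
    genotype_fields: {sample: {field: value}} for fields requested with -GF
    """
    rows = []
    # One row per -F field; the Sample column is always "site".
    for field, value in site_fields.items():
        rows.append(("site", field, value))
    # One row per combination of sample and -GF field.
    for sample, fields in genotype_fields.items():
        for field, value in fields.items():
            rows.append((sample, field, value))
    return rows


molten = moltenize({"CHROM": "1", "AC": "10"}, {"NA12878": {"GQ": "99"}})
for row in molten:
    print("\t".join(row))
```

    A record with two -F fields and one sample with one -GF field therefore yields three output lines instead of one.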


    --out / -o

    File to which results should be written

    PrintStream  stdout


    --showFiltered / -raw

    If provided, field values from filtered records will be included in the output
    By default, this tool only emits values for records whose FILTER field is either PASS or . (unfiltered). Setting this flag causes VariantsToTable to emit values regardless of the FILTER field value.

    boolean  false
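
    The rule described above amounts to a simple predicate on the FILTER column. A minimal Python sketch, assuming FILTER is available as a plain string; the is_emitted function and the filter name LowQual are used only for illustration.

```python
def is_emitted(filter_value, show_filtered=False):
    """Decide whether a record is written to the table (illustrative sketch only)."""
    # Default: keep only PASS or unfiltered (".") records.
    # With --showFiltered, every record is kept regardless of FILTER.
    return show_filtered or filter_value in ("PASS", ".")


for value in ("PASS", ".", "LowQual"):
    print(value, is_emitted(value), is_emitted(value, show_filtered=True))
```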


    --splitMultiAllelic / -SMA

    If provided, we will split multi-allelic records into multiple lines of output
    By default, a record with multiple ALT alleles produces just one line of output. Note that this can make the resulting file unreadable or malformed for certain tools like R, because multi-allelic INFO field values are often represented as comma-separated lists. Using this flag causes multi-allelic records to be split into multiple lines of output (one for each allele in the ALT field); INFO field values that are not lists are copied to each output line, while only the appropriate entry is used for lists.

    boolean  false
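
    The splitting rule can be sketched in Python as follows. This is an illustration, not the tool's implementation: the split_multi_allelic function is hypothetical, records are modeled as dictionaries of strings, and the caller is assumed to name the fields that hold one comma-separated entry per ALT allele.

```python
def split_multi_allelic(record, list_fields):
    """Split one record into one row per ALT allele (illustrative sketch only).

    record:      {column: string value}, with ALT as a comma-separated list
    list_fields: columns assumed to hold one comma-separated entry per ALT allele
    """
    alts = record["ALT"].split(",")
    rows = []
    for index, alt in enumerate(alts):
        row = dict(record)  # non-list values are copied unchanged
        row["ALT"] = alt
        for name in list_fields:
            # Only the entry that belongs to this allele is kept.
            row[name] = record[name].split(",")[index]
        rows.append(row)
    return rows


record = {"CHROM": "1", "POS": "10", "ALT": "A,T", "AC": "3,7", "DP": "40"}
split_rows = split_multi_allelic(record, ["AC"])
for row in split_rows:
    print(row)
```

    Here the single input record becomes two rows: DP is copied to both, while AC is 3 for allele A and 7 for allele T, so every cell holds a single value.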


    --variant / -V

    Input VCF file
    Variants from this VCF file are used by this tool as input. The file must at least contain the standard VCF header lines, but can be empty (i.e., no variants are contained in the file).

    This argument supports reference-ordered data (ROD) files in the following formats: BCF2, VCF, VCF3

    R List[RodBinding[VariantContext]]  NA