  • Showing docs for version 3.7-0


    VariantsToBinaryPed

    Convert VCF to binary pedigree file

    Category: Variant Manipulation Tools

    Traversal: LocusWalker

    PartitionBy: LOCUS


    Overview

    This tool takes a VCF and produces a binary pedigree as used by PLINK, consisting of three associated files (.bed/.bim/.fam).

    Inputs

    A VCF file and a metadata file.

    The metadata file can take one of two formats. The first consists of the first 6 columns of a standard pedigree file, which PLINK calls a .fam file. Note that the sex encoding convention is 1=male; 2=female; other=unknown. An example .fam file follows (note that there is no header):

     CEUTrio NA12878 NA12891 NA12892 2 -9
     CEUTrio NA12891 UNKN1 UNKN2 1 -9
     CEUTrio NA12892 UNKN3 UNKN4 2 -9
     

    where the entries are: FamilyID IndividualID DadID MomID Sex Phenotype.
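    As a minimal sketch (not GATK code), one row of this format can be split into its six named fields and the sex code decoded with the convention above. The names `FamRecord`, `parse_fam_line` and `decode_sex` are hypothetical helpers introduced for illustration.

```python
from typing import NamedTuple

# Sex codes per the .fam convention: 1=male, 2=female, anything else=unknown.
SEX_CODES = {"1": "male", "2": "female"}


class FamRecord(NamedTuple):
    family_id: str
    individual_id: str
    dad_id: str
    mom_id: str
    sex: str
    phenotype: str


def parse_fam_line(line: str) -> FamRecord:
    """Split one whitespace-delimited .fam row into its six named fields."""
    fields = line.split()
    if len(fields) != 6:
        raise ValueError(f"expected 6 columns, got {len(fields)}: {line!r}")
    return FamRecord(*fields)


def decode_sex(code: str) -> str:
    """Map a .fam sex code to a label; unrecognized codes mean unknown."""
    return SEX_CODES.get(code, "unknown")


if __name__ == "__main__":
    rec = parse_fam_line("CEUTrio NA12878 NA12891 NA12892 2 -9")
    print(rec.individual_id, decode_sex(rec.sex))  # NA12878 female
```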

    An alternate format is a two-column key-value file:

     NA12878        fid=CEUTrio;dad=NA12891;mom=NA12892;sex=2;phenotype=-9
     NA12891        fid=CEUTrio;sex=1;phenotype=-9
     NA12892        fid=CEUTrio;sex=2;phenotype=-9
     

    where unknown parents do not need to be specified. The two columns are the individual ID and a semicolon-separated list of key=value pairs.
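    The conversion from this key-value format to a six-column .fam row can be sketched as follows. This is an illustration, not the tool's implementation: `kv_line_to_fam` is a hypothetical name, and filling omitted keys with "0" (PLINK's usual "unknown" code) is an assumption; GATK may use a different fill value.

```python
from typing import Dict

# Assumption: value written for keys the metadata line omits. "0" is PLINK's
# common "unknown" code; the real tool's choice may differ.
MISSING = "0"


def kv_line_to_fam(line: str, missing: str = MISSING) -> str:
    """Turn 'ID<whitespace>key=value;key=value' into a six-column .fam row."""
    parts = line.split(None, 1)
    individual_id = parts[0]
    pairs = parts[1].strip() if len(parts) > 1 else ""
    meta: Dict[str, str] = {}
    for item in pairs.split(";"):
        if item:
            key, _, value = item.partition("=")
            meta[key.strip()] = value.strip()
    # Column order: FamilyID IndividualID DadID MomID Sex Phenotype.
    return " ".join([
        meta.get("fid", missing),
        individual_id,
        meta.get("dad", missing),
        meta.get("mom", missing),
        meta.get("sex", missing),
        meta.get("phenotype", missing),
    ])
```

    For example, the NA12891 line above, which omits both parents, becomes `CEUTrio NA12891 0 0 1 -9` under this sketch's fill convention.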

    Regardless of which format is used, the tool outputs a .fam file alongside the .bed and .bim files. If the metadata file is passed as "-m [name].fam", the .fam entries are subset and reordered to match the samples and sample order of the VCF. If instead a metadata file in the alternate format is passed as "-m [name].txt", the tool constructs a formatted .fam file from its data.
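    The subset-and-reorder step for a .fam input can be sketched like this: read the sample IDs from the VCF `#CHROM` header line (columns 10 onward), then emit only the matching .fam rows in that order. The helper names are hypothetical, and raising an error for a VCF sample with no metadata is an assumption of this sketch rather than documented GATK behavior.

```python
from typing import Dict, Iterable, List


def vcf_sample_names(chrom_header: str) -> List[str]:
    """Sample IDs from the tab-separated '#CHROM' header line (columns 10+)."""
    cols = chrom_header.rstrip("\n").split("\t")
    if cols[0] != "#CHROM":
        raise ValueError("expected the '#CHROM' header line")
    return cols[9:]


def subset_and_reorder_fam(fam_lines: Iterable[str], samples: List[str]) -> List[str]:
    """Keep only .fam rows whose IndividualID is a VCF sample, in VCF order."""
    by_id: Dict[str, str] = {}
    for line in fam_lines:
        fields = line.split()
        if len(fields) >= 2:
            by_id[fields[1]] = " ".join(fields)  # column 2 is IndividualID
    missing = [s for s in samples if s not in by_id]
    if missing:
        # Assumption: unannotated samples are treated as an error here.
        raise ValueError(f"no metadata for sample(s): {missing}")
    return [by_id[s] for s in samples]
```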

    Outputs

    A binary pedigree in PLINK format, composed of three files (.bed/.bim/.fam). See the PLINK format specification for more details.
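    A quick way to sanity-check the .bed output is to inspect its first three bytes. Per the PLINK 1 binary format, a .bed file starts with the magic bytes 0x6C 0x1B, followed by a mode byte: 0x01 for SNP-major and 0x00 for individual-major. The sketch below (hypothetical helper names, not part of GATK) reports which mode a file was written in.

```python
from typing import Dict

BED_MAGIC = bytes([0x6C, 0x1B])
MODE_BYTES: Dict[int, str] = {0x01: "SNP_MAJOR", 0x00: "INDIVIDUAL_MAJOR"}


def bed_mode(header: bytes) -> str:
    """Return the storage mode named by the first three bytes of a .bed file."""
    if len(header) < 3 or header[:2] != BED_MAGIC:
        raise ValueError("not a PLINK 1 binary .bed file (bad magic number)")
    mode = MODE_BYTES.get(header[2])
    if mode is None:
        raise ValueError(f"unknown .bed mode byte {header[2]:#04x}")
    return mode


def bed_mode_of_file(path: str) -> str:
    """Read only the 3-byte header of a .bed file on disk."""
    with open(path, "rb") as fh:
        return bed_mode(fh.read(3))
```

    For instance, `bed_mode_of_file("output.bed")` on the output of the example below would be expected to return the value chosen with --outputMode.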

    Example

     java -jar GenomeAnalysisTK.jar \
       -T VariantsToBinaryPed \
       -R reference.fasta \
       -V variants.vcf \
       -m metadata.fam \
       -bed output.bed \
       -bim output.bim \
       -fam output.fam
     

    Additional Information

    Read filters

    These Read Filters are automatically applied to the data by the Engine before processing by VariantsToBinaryPed.

    Window size

    This tool uses a sliding window on the reference.


    Command-line Arguments

    Engine arguments

    All tools inherit arguments from the GATK engine's "CommandLineGATK" argument collection, which can be used to modify various aspects of the tool's function. For example, the -L argument restricts processing to specific genomic intervals, and the -rf argument applies additional read filters to exclude some of the data from the analysis.

    VariantsToBinaryPed specific arguments

    This table summarizes the command-line arguments that are specific to this tool. For more details on each argument, see the argument details that follow the table.

    Argument name(s)               Default value      Summary
    Required Inputs
    --metaData / -m                NA                 Sample metadata file
    --variant / -V                 NA                 Input VCF file
    Required Outputs
    --bed                          NA                 Output bed file
    --bim                          NA                 Output map file
    --fam                          NA                 Output fam file
    Required Parameters
    --minGenotypeQuality / -mgq    0                  If genotype quality is lower than this value, output NO_CALL
    Optional Inputs
    --dbsnp / -D                   none               dbSNP file
    --outputMode / -mode           INDIVIDUAL_MAJOR   The output file mode (SNP major or individual major)
    Optional Flags
    --checkAlternateAlleles        false              Checks that alternate alleles actually appear in samples, erroring out if they do not
    --majorAlleleFirst             false              Sets the major allele to be 'reference' for the bim file, rather than the ref allele

    Argument details

    Arguments in this list are specific to this tool. Keep in mind that other arguments shared with other tools (e.g. the GATK engine arguments) are also available; see Engine arguments above.


    --bed / -bed

    Output bed file

    Required. Type: PrintStream. Default: NA


    --bim / -bim

    Output map file

    Required. Type: PrintStream. Default: NA


    --checkAlternateAlleles (no short name)

    Checks that alternate alleles actually appear in samples, erroring out if they do not

    Optional. Type: boolean. Default: false
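    The check described by this flag can be sketched per site: collect every allele called in any sample and fail if some ALT allele never appears. This is an illustrative reimplementation under the assumption that genotypes are given as lists of allele strings; `check_alternate_alleles` is a hypothetical name, not GATK's API.

```python
from typing import Sequence


def check_alternate_alleles(alts: Sequence[str], genotypes: Sequence[Sequence[str]]) -> None:
    """Raise if any ALT allele at a site is carried by no sample genotype."""
    observed = {allele for gt in genotypes for allele in gt}
    unused = [alt for alt in alts if alt not in observed]
    if unused:
        raise ValueError(f"ALT allele(s) not found in any sample genotype: {unused}")
```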


    --dbsnp / -D

    dbSNP file

    This argument supports reference-ordered data (ROD) files in the following formats: BCF2, VCF, VCF3

    Optional. Type: RodBinding[VariantContext]. Default: none


    --fam / -fam

    Output fam file

    Required. Type: PrintStream. Default: NA


    --majorAlleleFirst (no short name)

    Sets the major allele to be 'reference' for the bim file, rather than the ref allele

    Optional. Type: boolean. Default: false
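    For a biallelic site, the effect of this flag can be sketched as choosing which allele is listed first: by default the VCF REF allele, and with the flag the more frequent allele among called genotypes. This is an assumption-laden illustration, not GATK's code: `bim_allele_order` is hypothetical, no-calls are written as ".", and ties keep REF first.

```python
from collections import Counter
from typing import Sequence, Tuple


def bim_allele_order(ref: str, alt: str, genotypes: Sequence[Sequence[str]],
                     major_allele_first: bool = False) -> Tuple[str, str]:
    """Return the (first, second) allele pair for a biallelic site."""
    if not major_allele_first:
        return ref, alt
    # Count alleles across called genotypes only; '.' marks a no-call.
    counts = Counter(a for gt in genotypes for a in gt if a != ".")
    if counts[alt] > counts[ref]:
        return alt, ref
    return ref, alt  # assumption: ties keep REF first
```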


    --metaData / -m

    Sample metadata file

    Required. Type: File. Default: NA


    --minGenotypeQuality / -mgq

    If genotype quality is lower than this value, output NO_CALL

    Required. Type: int. Default: 0. Valid range: -∞ to ∞
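    The thresholding rule reads as: any genotype whose GQ is below -mgq is written as a no-call. A minimal sketch, assuming genotypes are VCF-style strings and that a genotype with no GQ value is left unchanged (the tool's actual handling of missing GQ is not documented here); `apply_min_gq` is a hypothetical name. With the default of 0, no genotype with a non-negative GQ is affected.

```python
from typing import Optional

NO_CALL = "./."


def apply_min_gq(gt: str, gq: Optional[int], min_gq: int = 0) -> str:
    """Return a no-call if GQ is below the threshold, else the genotype as-is."""
    if gq is not None and gq < min_gq:
        return NO_CALL
    return gt  # assumption: a missing GQ does not trigger NO_CALL
```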


    --outputMode / -mode

    The output file mode (SNP major or individual major)

    The --outputMode argument is an enumerated type (OutputMode), which can have one of the following values:

    INDIVIDUAL_MAJOR
    SNP_MAJOR

    Optional. Type: OutputMode. Default: INDIVIDUAL_MAJOR
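    The two modes differ only in how the genotype matrix is laid out after the 3-byte header. Following the PLINK 1 .bed specification, each genotype takes 2 bits (00 = homozygous first allele, 01 = missing, 10 = heterozygous, 11 = homozygous second allele), four genotypes are packed per byte starting from the low-order bits, and each row is padded to a whole byte. In SNP-major mode a row is one variant across all samples; in individual-major mode a row is one sample across all variants. The sketch below is written from that specification, not from GATK's writer; `pack_row` and `encode_bed` are hypothetical, and genotypes are given as copies of the second allele (0, 1, 2) or None for missing.

```python
from typing import Dict, List, Optional, Sequence

# 2-bit codes from the PLINK 1 .bed spec, keyed by copies of the second allele.
CODE: Dict[Optional[int], int] = {0: 0b00, 1: 0b10, 2: 0b11, None: 0b01}
MODE_BYTE = {"SNP_MAJOR": 0x01, "INDIVIDUAL_MAJOR": 0x00}


def pack_row(calls: Sequence[Optional[int]]) -> bytes:
    """Pack one row, four genotypes per byte, lowest-order bits first."""
    out = bytearray((len(calls) + 3) // 4)  # pad the row to a whole byte
    for i, call in enumerate(calls):
        out[i // 4] |= CODE[call] << (2 * (i % 4))
    return bytes(out)


def encode_bed(matrix: Sequence[Sequence[Optional[int]]], mode: str = "INDIVIDUAL_MAJOR") -> bytes:
    """matrix[v][s] is the genotype of sample s at variant v."""
    if mode not in MODE_BYTE:
        raise ValueError(f"unknown mode {mode!r}")
    if mode == "SNP_MAJOR":
        rows: List[Sequence[Optional[int]]] = list(matrix)
    else:
        rows = [list(col) for col in zip(*matrix)]  # transpose: one row per sample
    body = b"".join(pack_row(r) for r in rows)
    return bytes([0x6C, 0x1B, MODE_BYTE[mode]]) + body
```

    For example, `pack_row([0, 2, None, 2])` yields the single byte 0xDC, since the four 2-bit codes 00, 11, 01, 11 fill the byte from the lowest bits upward.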


    --variant / -V

    Input VCF file
    Variants from this VCF file are used by this tool as input. The file must at least contain the standard VCF header lines, but can be empty (i.e., no variants are contained in the file).

    This argument supports reference-ordered data (ROD) files in the following formats: BCF2, VCF, VCF3

    Required. Type: RodBinding[VariantContext]. Default: NA