    Showing docs for version 3.7-0


    CommandLineGATK

    Command line parameters accepted by most if not all tools in the GATK

    Category Engine Parameters (available to all tools)


    Overview

    Info for end users

    This is a list of options and parameters that are generally available to all tools in the GATK.

    There may be a few restrictions, which are indicated in individual argument descriptions. For example the -BQSR argument is only meant to be used with a subset of tools, and the -pedigree argument will only be effectively used by a subset of tools as well. Some arguments conflict with others, and some conversely are dependent on others. This is all indicated in the detailed argument descriptions, so be sure to read those in their entirety rather than just skimming the one-line summary in the table.

    Info for developers

    This class is the GATK engine itself, which manages map/reduce data access and runs walkers.

    Command-line GATK programs are run through this class. It reads the command-line arguments, parses them, and hands the parsed information to the GATK engine. Pretty much anything dealing with the underlying system should go here; the GATK engine should deal with any data-related information.


    Command-line Arguments

    CommandLineGATK specific arguments

    This table summarizes the command-line arguments that are specific to this tool. For more details on each argument, see the list further down below the table or click on an argument name to jump directly to that entry in the list.

    Argument name(s) | Default value | Summary

    Required Parameters
    --analysis_type, -T | NA | Name of the tool to run

    Required Flags
    --showFullBamList | false | Emit list of input BAM/CRAM files to log

    Optional Inputs
    --BQSR | NA | Input covariates table file for on-the-fly base quality score recalibration
    --excludeIntervals, -XL | NA | One or more genomic intervals to exclude from processing
    --input_file, -I | [] | Input file containing sequence data (BAM or CRAM)
    --intervals, -L | NA | One or more genomic intervals over which to operate
    --read_group_black_list, -rgbl | NA | Exclude read groups based on tags
    --reference_sequence, -R | NA | Reference sequence file

    Optional Outputs
    --log_to_file, -log | NA | Set the logging location

    Optional Parameters
    --defaultBaseQualities, -DBQ | -1 | Assign a default base quality
    --disable_read_filter, -drf | [] | Read filters to disable
    --downsample_to_coverage, -dcov | NA | Target coverage threshold for downsampling to coverage
    --downsample_to_fraction, -dfrac | NA | Fraction of reads to downsample to
    --downsampling_type, -dt | NA | Type of read downsampling to employ at a given locus
    --interval_merging, -im | ALL | Interval merging rule for abutting intervals
    --interval_padding, -ip | 0 | Amount of padding (in bp) to add to each interval
    --interval_set_rule, -isr | UNION | Set merging approach to use for combining interval inputs
    --logging_level, -l | INFO | Set the minimum level of logging
    --maxRuntime | -1 | Stop execution cleanly as soon as maxRuntime has been reached
    --maxRuntimeUnits | MINUTES | Unit of time used by maxRuntime
    --num_cpu_threads_per_data_thread, -nct | 1 | Number of CPU threads to allocate per data thread
    --num_threads, -nt | 1 | Number of data threads to allocate to this analysis
    --pedigree, -ped | [] | Pedigree files for samples
    --pedigreeString, -pedString | [] | Pedigree string for samples
    --pedigreeValidationType, -pedValidationType | STRICT | Validation strictness for pedigree
    --performanceLog, -PF | NA | Write GATK runtime performance log to this file
    --quantize_quals, -qq | 0 | Quantize quality scores to a given number of levels (with -BQSR)
    --read_filter, -rf | [] | Filters to apply to reads before analysis
    --validation_strictness, -S | SILENT | How strict should we be with validation

    Optional Flags
    --allow_potentially_misencoded_quality_scores, -allowPotentiallyMisencodedQuals | false | Ignore warnings about base quality score encoding
    --disable_indel_quals, -DIQ | false | Disable printing of base insertion and deletion tags (with -BQSR)
    --emit_original_quals, -EOQ | false | Emit the OQ tag with the original base qualities (with -BQSR)
    --fix_misencoded_quality_scores, -fixMisencodedQuals | false | Fix mis-encoded base quality scores
    --generate_md5 | false | Enable on-the-fly creation of md5s for output BAM files.
    --help, -h | false | Generate the help message
    --keep_program_records, -kpr | false | Keep program records in the SAM header
    --monitorThreadEfficiency, -mte | false | Enable threading efficiency monitoring
    --never_trim_vcf_format_field, -writeFullFormat | false | Always output all the records in VCF FORMAT fields, even if some are missing
    --no_cmdline_in_header | false | Don't include the command line in output VCF headers
    --nonDeterministicRandomSeed, -ndrs | false | Use a non-deterministic random seed
    --refactor_NDN_cigar_string, -fixNDN | false | Reduce NDN elements in CIGAR string
    --remove_program_records, -rpr | false | Remove program records from the SAM header
    --sites_only | false | Output sites-only VCF
    --useOriginalQualities, -OQ | false | Use the base quality scores from the OQ tag
    --version | false | Output version information

    Advanced Parameters
    --bam_compression, -compress | NA | Compression level to use for writing BAM files (0 - 9, higher is more compressed)
    --baq | OFF | Type of BAQ calculation to apply in the engine
    --baqGapOpenPenalty, -baqGOP | 40.0 | BAQ gap open penalty
    --globalQScorePrior | -1.0 | Global Qscore Bayesian prior to use for BQSR
    --preserve_qscores_less_than, -preserveQ | 6 | Don't recalibrate bases with quality scores less than this threshold (with -BQSR)
    --read_buffer_size, -rbs | NA | Number of reads per SAM file to buffer in memory
    --reference_window_stop, -ref_win_stop | 0 | Reference window stop
    --sample_rename_mapping_file | NA | Rename sample IDs on-the-fly at runtime using the provided mapping file
    --secondsBetweenProgressUpdates | 10 | Time interval for process meter information output (in seconds)
    --static_quantized_quals, -SQQ | NA | Use static quantized quality scores to a given number of levels (with -BQSR)
    --unsafe, -U | NA | Enable unsafe operations: nothing will be checked at runtime
    --variant_index_parameter | -1 | Parameter to pass to the VCF/BCF IndexCreator
    --variant_index_type | DYNAMIC_SEEK | Type of IndexCreator to use for VCF/BCF indices

    Advanced Flags
    --disable_auto_index_creation_and_locking_when_reading_rods | false | Disable both auto-generation of index files and index file locking
    --disable_bam_indexing | false | Turn off on-the-fly creation of indices for output BAM/CRAM files
    --simplifyBAM | false | Strip down read content and tags

    Argument details

    Arguments in this list are specific to this tool. Keep in mind that other arguments are available that are shared with other tools (e.g. command-line GATK arguments); see Inherited arguments above.


    --allow_potentially_misencoded_quality_scores / -allowPotentiallyMisencodedQuals

    Ignore warnings about base quality score encoding
    This flag tells GATK to ignore warnings when encountering base qualities that are too high and that seemingly indicate a problem with the base quality encoding of the BAM or CRAM file. You should only use this if you really know what you are doing; otherwise you could seriously mess up your data and ruin your analysis.

    boolean  false


    --analysis_type / -T

    Name of the tool to run
    A complete list of tools (sometimes also called walkers because they "walk" through the data to perform analyses) is available in the online documentation.

    R String  NA


    --bam_compression / -compress

    Compression level to use for writing BAM files (0 - 9, higher is more compressed)

    Integer  NA


    --baq / -baq

    Type of BAQ calculation to apply in the engine

    The --baq argument is an enumerated type (CalculationMode), which can have one of the following values:

    OFF
    CALCULATE_AS_NECESSARY
    RECALCULATE

    CalculationMode  OFF


    --baqGapOpenPenalty / -baqGOP

    BAQ gap open penalty
    Phred-scaled gap open penalty for BAQ calculation. Although the default value is 40, a value of 30 may be better for whole genome call sets.

    double  40.0  [ [ 0  ∞ ] ]


    --BQSR / -BQSR

    Input covariates table file for on-the-fly base quality score recalibration
    Enables on-the-fly recalibration of base qualities, intended primarily for use with BaseRecalibrator and PrintReads (see Best Practices workflow documentation). The covariates tables are produced by the BaseRecalibrator tool. Please be aware that you should only run recalibration with the covariates file created on the same input BAM(s) or CRAM(s).

    File  NA


    --defaultBaseQualities / -DBQ

    Assign a default base quality
    If reads are missing some or all base quality scores, this value will be used for all base quality scores. By default this is set to -1 to disable default base quality assignment.

    byte  -1  [ [ 0  127 ] ]


    --disable_auto_index_creation_and_locking_when_reading_rods / -disable_auto_index_creation_and_locking_when_reading_rods

    Disable both auto-generation of index files and index file locking
    Not recommended for general use. Disables both auto-generation of index files and index file locking when reading VCFs and other rods for which an index isn't present or is out-of-date. The file locking necessary for auto index generation to work safely is prone to random failures/hangs on certain platforms, which makes it desirable to disable it for situations like test suite runs where the indices are already known to exist. However, this option is unsafe in general because it allows reading from index files without first acquiring a lock.

    boolean  false


    --disable_bam_indexing / NA

    Turn off on-the-fly creation of indices for output BAM/CRAM files

    boolean  false


    --disable_indel_quals / -DIQ

    Disable printing of base insertion and deletion tags (with -BQSR)
    Turns off printing of the base insertion and base deletion tags when using the -BQSR argument. Only the base substitution qualities will be produced.

    boolean  false


    --disable_read_filter / -drf

    Read filters to disable

    List[String]  []


    --downsample_to_coverage / -dcov

    Target coverage threshold for downsampling to coverage
    The principle of this downsampling type is to downsample reads to a given capping threshold coverage. Its purpose is to get rid of excessive coverage, because above a certain depth, having additional data is not informative and imposes unreasonable computational costs. The downsampling process takes two different forms depending on the type of analysis it is used with. For locus-based traversals (LocusWalkers like UnifiedGenotyper and ActiveRegionWalkers like HaplotypeCaller), downsample_to_coverage controls the maximum depth of coverage at each locus. For read-based traversals (ReadWalkers like BaseRecalibrator), it controls the maximum number of reads sharing the same alignment start position. For ReadWalkers you will typically need to use much lower dcov values than you would with LocusWalkers to see an effect. Note that this downsampling option does not produce an unbiased random sampling from all available reads at each locus: instead, the primary goal of the to-coverage downsampler is to maintain an even representation of reads from all alignment start positions when removing excess coverage. For a truly unbiased random sampling of reads, use -dfrac instead. Also note that the coverage target is an approximate goal that is not guaranteed to be met exactly: the downsampling algorithm will under some circumstances retain slightly more or less coverage than requested.

    Integer  NA


    --downsample_to_fraction / -dfrac

    Fraction of reads to downsample to
    Reads will be downsampled so the specified fraction remains; e.g. if you specify -dfrac 0.25, three-quarters of the reads will be removed, and the remaining one quarter will be used in the analysis. This method of downsampling is truly unbiased and random. It is typically used to simulate the effect of generating different amounts of sequence data for a given sample. For example, you can use this in a pilot experiment to evaluate how much target coverage you need to aim for in order to obtain enough coverage in all loci of interest.

    Double  NA
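
    The idea of fraction-based downsampling can be sketched as follows. This is only an illustration of unbiased random retention, not the GATK implementation: the function name, the seed parameter, and the representation of reads as strings are all assumptions made for the example.

```python
import random

def downsample_to_fraction(reads, fraction, seed=0):
    """Keep each read independently with probability `fraction` (hypothetical helper)."""
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    rng = random.Random(seed)  # fixed seed keeps this sketch reproducible
    return [read for read in reads if rng.random() < fraction]

reads = [f"read{i}" for i in range(10_000)]
kept = downsample_to_fraction(reads, 0.25)
print(len(kept) / len(reads))  # close to 0.25, not exactly 0.25
```

    Because every read is kept or dropped independently, the retained fraction is only approximately the requested one for any finite set of reads.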


    --downsampling_type / -dt

    Type of read downsampling to employ at a given locus
    There are several ways to downsample reads, i.e. to remove reads from the pile of reads that will be used for analysis. See the documentation of the individual downsampling options for details on how they work. Note that many GATK tools specify a default downsampling type and target, but this behavior can be overridden from the command line using the downsampling arguments.

    The --downsampling_type argument is an enumerated type (DownsampleType), which can have one of the following values:

    NONE
    ALL_READS
    BY_SAMPLE

    DownsampleType  NA


    --emit_original_quals / -EOQ

    Emit the OQ tag with the original base qualities (with -BQSR)
    By default, the OQ tag is not emitted when using the -BQSR argument. Use this flag to include OQ tags in the output BAM or CRAM file. Note that this may result in a significant increase in file size.

    boolean  false


    --excludeIntervals / -XL

    One or more genomic intervals to exclude from processing
    Use this option to exclude certain parts of the genome from the analysis (like -L, but the opposite). This argument can be specified multiple times. You can use samtools-style intervals either explicitly on the command line (e.g. -XL chr1 or -XL chr1:100-200) or by loading in a file containing a list of intervals (e.g. -XL myFile.intervals). Additionally, you can also specify a ROD file (such as a VCF file) in order to exclude specific positions from the analysis based on the records present in the file (e.g. -XL file.vcf).

    List[IntervalBinding[Feature]]  NA


    --fix_misencoded_quality_scores / -fixMisencodedQuals

    Fix mis-encoded base quality scores
    By default the GATK assumes that base quality scores start at Q0 == ASCII 33 according to the SAM specification. However, encoding in some datasets (especially older Illumina ones) starts at Q0 == ASCII 64. This argument will fix the encodings on the fly (as the data is read in) by subtracting 31 (the difference between the two offsets) from every quality score. Note that this argument should NEVER be used by default; you should only use it when you have confirmed that the quality scores in your data are not in the correct encoding.

    boolean  false
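
    The arithmetic behind the fix can be illustrated with a small sketch. It assumes the mis-encoded data uses an ASCII offset of 64 and has been decoded with the standard offset of 33; the function names and the guard against negative scores are illustrative choices, not GATK internals.

```python
def decode_phred33(quality_string):
    """Decode an ASCII quality string assuming the standard offset of 33."""
    return [ord(ch) - 33 for ch in quality_string]

def fix_misencoded(quals):
    """Shift scores that were really offset-64 encoded down by 31 (64 - 33)."""
    fixed = [q - 31 for q in quals]
    if any(q < 0 for q in fixed):
        # Assumption for this sketch: a negative result means the data
        # was not offset-64 encoded in the first place.
        raise ValueError("negative score: data does not look offset-64 encoded")
    return fixed

raw = "hhgf"  # offset-64 characters for Q40, Q40, Q39, Q38
print(decode_phred33(raw))                  # [71, 71, 70, 69]
print(fix_misencoded(decode_phred33(raw)))  # [40, 40, 39, 38]
```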


    --generate_md5 / NA

    Enable on-the-fly creation of md5s for output BAM files.

    boolean  false


    --globalQScorePrior / -globalQScorePrior

    Global Qscore Bayesian prior to use for BQSR
    If specified, this value will be used as the prior for all mismatch quality scores instead of the actual reported quality score.

    double  -1.0  [ [ -∞  ∞ ] ]


    --help / -h

    Generate the help message
    This will produce a help message in the terminal with general usage information, listing available arguments as well as tool-specific information if applicable.

    Boolean  false


    --input_file / -I

    Input file containing sequence data (BAM or CRAM)
    An input file containing sequence data mapped to a reference, in BAM or CRAM format, or a text file containing a list of input files (with extension .list). Note that the GATK requires an accompanying .bai index for each BAM or CRAM file. Please see our online documentation for more details on input formatting requirements.

    List[String]  []


    --interval_merging / -im

    Interval merging rule for abutting intervals
    By default, the program merges abutting intervals (i.e. intervals that are directly side-by-side but do not actually overlap) into a single continuous interval. However you can change this behavior if you want them to be treated as separate intervals instead.

    The --interval_merging argument is an enumerated type (IntervalMergingRule), which can have one of the following values:

    ALL
    OVERLAPPING_ONLY

    IntervalMergingRule  ALL


    --interval_padding / -ip

    Amount of padding (in bp) to add to each interval
    Use this to add padding to the intervals specified using -L and/or -XL. For example, '-L chr1:100' with a padding value of 20 would turn into '-L chr1:80-120'. This is typically used to add padding around exons when analyzing exomes. The general Broad exome calling pipeline uses 100 bp padding by default.

    int  0  [ [ 0  ∞ ] ]
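
    The padding arithmetic in the example above can be sketched as follows. The function name is hypothetical, and clamping the start at position 1 is an assumption of this sketch; clamping the end to the contig length is omitted because it would need a sequence dictionary.

```python
def pad_interval(contig, start, stop, padding):
    """Extend a 1-based inclusive interval by `padding` bp on each side."""
    if padding < 0:
        raise ValueError("padding must be non-negative")
    # Assumption: never extend before the first base of the contig.
    return contig, max(1, start - padding), stop + padding

print(pad_interval("chr1", 100, 100, 20))  # ('chr1', 80, 120)
```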


    --interval_set_rule / -isr

    Set merging approach to use for combining interval inputs
    By default, the program will take the UNION of all intervals specified using -L and/or -XL. However, you can change this setting for -L, for example if you want to take the INTERSECTION of the sets instead. E.g. to perform the analysis on positions for which there is a record in a VCF, but restrict this to just those on chromosome 20, you would do -L chr20 -L file.vcf -isr INTERSECTION. However, it is not possible to modify the merging approach for intervals passed using -XL (they will always be merged using UNION). Note that if you specify both -L and -XL, the -XL interval set will be subtracted from the -L interval set.

    The --interval_set_rule argument is an enumerated type (IntervalSetRule), which can have one of the following values:

    UNION
    Take the union of all intervals
    INTERSECTION
    Take the intersection of intervals (the subset that overlaps all intervals specified)

    IntervalSetRule  UNION
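
    A minimal sketch of how the two rules and the -XL subtraction interact, assuming a single contig and intervals given as 1-based inclusive (start, stop) pairs. The function names are hypothetical, and positions are expanded into Python sets purely for clarity; a real implementation would work on interval boundaries instead.

```python
def positions(intervals):
    """Expand 1-based inclusive (start, stop) pairs into a set of positions."""
    covered = set()
    for start, stop in intervals:
        covered.update(range(start, stop + 1))
    return covered

def resolve(include_sets, exclude_sets, rule="UNION"):
    """Combine the -L inputs by `rule`, then subtract the union of the -XL inputs."""
    included = [positions(s) for s in include_sets]
    if rule == "UNION":
        combined = set().union(*included)
    elif rule == "INTERSECTION":
        combined = set.intersection(*included) if included else set()
    else:
        raise ValueError(f"unknown rule: {rule}")
    # -XL inputs are always merged by union, whatever `rule` says.
    excluded = set().union(*(positions(s) for s in exclude_sets))
    return combined - excluded

a, b, xl = [(1, 10)], [(5, 15)], [(8, 9)]
print(sorted(resolve([a, b], [xl], "INTERSECTION")))  # [5, 6, 7, 10]
print(len(resolve([a, b], [xl], "UNION")))            # 13
```

    The sketch does not cover the case where only -XL is given, in which the exclusions apply to the whole reference.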


    --intervals / -L

    One or more genomic intervals over which to operate
    Use this option to perform the analysis over only part of the genome. This argument can be specified multiple times. You can use samtools-style intervals either explicitly on the command line (e.g. -L chr1 or -L chr1:100-200) or by loading in a file containing a list of intervals (e.g. -L myFile.intervals). Additionally, you can also specify a ROD file (such as a VCF file) in order to perform the analysis at specific positions based on the records present in the file (e.g. -L file.vcf). Finally, you can also use this to perform the analysis on the reads that are completely unmapped in the BAM file (i.e. those without a reference contig) by specifying -L unmapped.

    List[IntervalBinding[Feature]]  NA
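
    For illustration, the samtools-style strings mentioned above can be parsed with a small regular expression. This sketch is not the GATK parser: the function name is hypothetical, and it assumes contig names contain no colon and coordinates contain no thousands separators.

```python
import re

# contig, optionally followed by :start or :start-stop
INTERVAL_RE = re.compile(
    r"^(?P<contig>[^:]+)(?::(?P<start>\d+)(?:-(?P<stop>\d+))?)?$"
)

def parse_interval(text):
    """Return (contig, start, stop); start and stop are None for a whole contig."""
    match = INTERVAL_RE.match(text)
    if match is None:
        raise ValueError(f"not a samtools-style interval: {text!r}")
    contig = match.group("contig")
    if match.group("start") is None:
        return contig, None, None
    start = int(match.group("start"))
    # A single position such as chr1:100 is treated as a 1 bp interval.
    stop = int(match.group("stop")) if match.group("stop") else start
    return contig, start, stop

print(parse_interval("chr1"))          # ('chr1', None, None)
print(parse_interval("chr1:100-200"))  # ('chr1', 100, 200)
print(parse_interval("chr1:100"))      # ('chr1', 100, 100)
```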


    --keep_program_records / -kpr

    Keep program records in the SAM header
    Some tools discard program records from the SAM header by default. Use this argument to override that behavior and keep program records in the SAM header.

    boolean  false


    --log_to_file / -log

    Set the logging location
    File to save the logging output.

    String  NA


    --logging_level / -l

    Set the minimum level of logging
    Setting INFO gets you INFO up to FATAL, setting ERROR gets you ERROR and FATAL level logging, and so on.

    String  INFO
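
    The threshold behaviour can be pictured as a cut through an ordered list of levels. The list below follows the usual log4j ordering and is an assumption of this sketch; the exact set of level names accepted by -l is not stated here.

```python
# Assumed ordering, from least to most severe (log4j convention).
LEVELS = ["TRACE", "DEBUG", "INFO", "WARN", "ERROR", "FATAL"]

def emitted_levels(minimum):
    """Return the levels that are logged when `minimum` is the threshold."""
    return LEVELS[LEVELS.index(minimum.upper()):]

print(emitted_levels("INFO"))   # ['INFO', 'WARN', 'ERROR', 'FATAL']
print(emitted_levels("ERROR"))  # ['ERROR', 'FATAL']
```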


    --maxRuntime / -maxRuntime

    Stop execution cleanly as soon as maxRuntime has been reached
    This will truncate the run but without exiting with a failure. By default the value is interpreted in minutes, but this can be changed with the maxRuntimeUnits argument.

    long  -1  [ [ -∞  ∞ ] ]


    --maxRuntimeUnits / -maxRuntimeUnits

    Unit of time used by maxRuntime

    The --maxRuntimeUnits argument is an enumerated type (TimeUnit), which can have one of the following values:

    NANOSECONDS
    MICROSECONDS
    MILLISECONDS
    SECONDS
    MINUTES
    HOURS
    DAYS

    TimeUnit  MINUTES


    --monitorThreadEfficiency / -mte

    Enable threading efficiency monitoring
    Enable GATK to monitor its own threading efficiency, at a very small cost (< 0.1%) in runtime because of turning on the JavaBean. This is largely for debugging purposes. Note that this argument is not compatible with -nt; it only works with -nct.

    Boolean  false


    --never_trim_vcf_format_field / -writeFullFormat

    Always output all the records in VCF FORMAT fields, even if some are missing

    The VCF specification permits missing records to be dropped from the end of FORMAT fields, so long as GT is always output. This option prevents GATK from performing that trimming.

    For example, given a FORMAT of GT:AD:DP:PL, GATK will by default emit ./. for a variant with no reads present (i.e., the AD, DP, and PL fields are trimmed). If you specify -writeFullFormat, this record would be emitted as ./.:.:.:.

    boolean  false
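
    The trimming rule can be sketched in a few lines. The function names are hypothetical, and the sketch only treats a bare "." as missing; it is meant to reproduce the GT:AD:DP:PL example above, not the VCF writer itself.

```python
def trim_sample_field(values):
    """Drop trailing missing ('.') values, always keeping the first one (GT)."""
    trimmed = list(values)
    while len(trimmed) > 1 and trimmed[-1] == ".":
        trimmed.pop()
    return ":".join(trimmed)

def full_sample_field(values):
    """Emit every value, as with -writeFullFormat."""
    return ":".join(values)

no_reads = ["./.", ".", ".", "."]    # GT:AD:DP:PL with no reads present
print(trim_sample_field(no_reads))   # ./.
print(full_sample_field(no_reads))   # ./.:.:.:.
```

    Only trailing values are dropped: a missing value in the middle of the field has to stay so that later values keep their position.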


    --no_cmdline_in_header / -no_cmdline_in_header

    Don't include the command line in output VCF headers
    This option is intended to be used FOR DEBUGGING PURPOSES ONLY. Note to developers: it is required in order to pass integration tests.

    boolean  false


    --nonDeterministicRandomSeed / -ndrs

    Use a non-deterministic random seed
    If this flag is enabled, the random numbers generated will be different in every run, causing GATK to behave non-deterministically.

    boolean  false


    --num_cpu_threads_per_data_thread / -nct

    Number of CPU threads to allocate per data thread
    Each CPU thread operates the map cycle independently, but may run into scaling problems with IO earlier than data threads do. CPU threads have the benefit of not requiring X times as much memory per thread as data threads do, but rather only a constant overhead. See online documentation FAQs for more information.

    int  1  [ [ 1  ∞ ] ]


    --num_threads / -nt

    Number of data threads to allocate to this analysis
    Each data thread contains N CPU threads (see -nct), and data threads process data completely in parallel, so running M data threads increases the memory usage of GATK roughly M-fold. Data threads generally scale extremely effectively, up to 24 cores. See online documentation FAQs for more information.

    Integer  1  [ [ 1  ∞ ] ]


    --pedigree / -ped

    Pedigree files for samples
    Reads PED file-formatted tabular text files describing meta-data about the samples being processed in the GATK. See https://www.broadinstitute.org/gatk/guide/article?id=7696 for more information on format requirements. Note that most GATK tools do not use pedigree information; for those that do it is indicated in their documentation.

    List[File]  []


    --pedigreeString / -pedString

    Pedigree string for samples
    Inline PED records. Each -pedString STRING can contain one or more valid PED records (see -ped) separated by semi-colons. Each -pedString supports all the tags that -ped supports.

    List[String]  []


    --pedigreeValidationType / -pedValidationType

    Validation strictness for pedigree
    How strict should we be in parsing the PED files?

    The --pedigreeValidationType argument is an enumerated type (PedigreeValidationType), which can have one of the following values:

    STRICT
    Require, if a pedigree file is provided at all, that all samples in the VCF or BAM files have a corresponding entry in the pedigree file(s).
    SILENT
    Do not enforce any overlap between the VCF/BAM samples and the pedigree data

    PedigreeValidationType  STRICT


    --performanceLog / -PF

    Write GATK runtime performance log to this file
    The file name for the GATK performance log output, or null if you don't want to generate the detailed performance logging table. This table is suitable for importing into R or any other analysis software that can read tsv files.

    File  NA


    --preserve_qscores_less_than / -preserveQ

    Don't recalibrate bases with quality scores less than this threshold (with -BQSR)
    This flag tells GATK not to modify quality scores less than this value. Instead they will be written out unmodified in the recalibrated BAM or CRAM file. In general it's unsafe to change quality scores below 6, since base callers use these values to indicate random or bad bases. For example, Illumina writes Q2 bases when the machine has really gone wrong. This would be fine in and of itself, but when you select a subset of these reads based on their ability to align to the reference and their dinucleotide effect, your Q2 bin can be elevated to Q8 or Q10, leading to issues downstream.

    int  6  [ [ 0  [ 6  ∞ ] ]


    --quantize_quals / -qq

    Quantize quality scores to a given number of levels (with -BQSR)
    Turns on the base quantization module. It requires a recalibration report (-BQSR). A value of 0 here means "do not quantize". Any value greater than zero will be used to recalculate the quantization using that many levels. Negative values mean that we should quantize using the recalibration report's quantization level.

    int  0  [ [ -∞  ∞ ] ]
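
    The three cases for the -qq value can be summarised in a small sketch. The function name and the `report_levels` parameter (standing in for the quantization level stored in the recalibration report) are hypothetical.

```python
def quantization_levels(qq, report_levels):
    """Map a -qq value to the number of levels to use, or None for no quantization."""
    if qq == 0:
        return None           # do not quantize
    if qq > 0:
        return qq             # requantize using this many levels
    return report_levels      # negative: use the level from the report

print(quantization_levels(0, 16))   # None
print(quantization_levels(8, 16))   # 8
print(quantization_levels(-1, 16))  # 16
```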


    --read_buffer_size / -rbs

    Number of reads per SAM file to buffer in memory

    Integer  NA


    --read_filter / -rf

    Filters to apply to reads before analysis
    Reads that fail the specified filters will not be used in the analysis. Multiple filters can be specified separately, e.g. you can do -rf MalformedRead -rf BadCigar and so on. Available read filters are listed in the online tool documentation. Note that read filter names in the documentation take the form e.g. MalformedReadFilter, but at the command line the filter name should be given without the Filter suffix; e.g. -rf MalformedRead (NOT -rf MalformedReadFilter, which is not recognized by the program). Note also that some read filters are applied by default for some analysis tools; this is specified in each tool's documentation. The default filters can only be disabled if they are DisableableReadFilters.

    List[String]  []
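
    When building command lines from a script, the suffix rule above is easy to get wrong. The following sketch shows one way to derive the -rf arguments from documented filter names; both helper names are hypothetical.

```python
def filter_cli_name(class_name):
    """Strip a trailing 'Filter' from a documented read filter name."""
    suffix = "Filter"
    if class_name.endswith(suffix) and len(class_name) > len(suffix):
        return class_name[: -len(suffix)]
    return class_name

def read_filter_args(class_names):
    """Build one '-rf NAME' pair per filter, in the order given."""
    args = []
    for name in class_names:
        args += ["-rf", filter_cli_name(name)]
    return args

print(read_filter_args(["MalformedReadFilter", "BadCigarFilter"]))
# ['-rf', 'MalformedRead', '-rf', 'BadCigar']
```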


    --read_group_black_list / -rgbl

    Exclude read groups based on tags
    This will filter out read groups matching TAG:STRING (e.g. SM:sample1) or a .txt file containing the filter strings one per line.

    List[String]  NA
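
    A minimal sketch of matching read groups against filter strings of the kind shown in the example (a tag, a colon, and a value, as in SM:sample1). Representing a read group as a dictionary of tags and using exact string comparison are assumptions of this sketch, and the function names are hypothetical.

```python
def parse_filters(entries):
    """Split each tag:value entry into a (tag, value) pair."""
    parsed = []
    for entry in entries:
        tag, sep, value = entry.partition(":")
        if not sep or not tag or not value:
            raise ValueError(f"expected tag:value, got {entry!r}")
        parsed.append((tag, value))
    return parsed

def is_blacklisted(read_group, filters):
    """True if any filter's tag is present in the read group with that exact value."""
    return any(read_group.get(tag) == value for tag, value in filters)

filters = parse_filters(["SM:sample1"])
print(is_blacklisted({"ID": "rg1", "SM": "sample1"}, filters))  # True
print(is_blacklisted({"ID": "rg2", "SM": "sample2"}, filters))  # False
```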


    --refactor_NDN_cigar_string / -fixNDN

    Reduce NDN elements in CIGAR string
    Some RNAseq aligners that use a known transcriptome resource (such as TopHat2) produce NDN elements in read CIGARS when a small exon is entirely deleted during transcription, which ends up looking like [exon1]NDN[exon3]. These rarely happen, but when they do they cause GATK to fail with an error. Setting this flag tells the GATK to reduce "NDN" to a simpler CIGAR representation with one N element (with total length of the three refactored elements). From the point of view of variant calling, there is no meaningful difference between the two representations.

    boolean  false
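
    The CIGAR rewrite described above can be sketched as a single pass that replaces each N, D, N run with one N element of the combined length. This is an illustration of the idea rather than the GATK code; the function name is hypothetical.

```python
import re

CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")

def refactor_ndn(cigar):
    """Collapse every N-D-N triple into one N whose length is the sum of the three."""
    ops = [(int(n), op) for n, op in CIGAR_RE.findall(cigar)]
    out = []
    i = 0
    while i < len(ops):
        window = [op for _, op in ops[i:i + 3]]
        if window == ["N", "D", "N"]:
            total = sum(n for n, _ in ops[i:i + 3])
            out.append((total, "N"))
            i += 3  # skip the three elements just merged
        else:
            out.append(ops[i])
            i += 1
    return "".join(f"{n}{op}" for n, op in out)

print(refactor_ndn("10M5N3D7N10M"))  # 10M15N10M
```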


    --reference_sequence / -R

    Reference sequence file
    The reference genome against which the sequence data was mapped. The GATK requires an index file and a dictionary file accompanying the reference (please see the online documentation FAQs for more details on these files). Although this argument is indicated as being optional, almost all GATK tools require a reference in order to run. Note also that while GATK can in theory process genomes from any organism with any number of chromosomes or contigs, it is not designed to process draft genome assemblies and performance will decrease as the number of contigs in the reference increases. We strongly discourage the use of unfinished genome assemblies containing more than a few hundred contigs. Contig numbers in the thousands will most probably cause memory-related crashes.

    File  NA


    --reference_window_stop / -ref_win_stop

    Reference window stop
    Stop of the expanded window for which the reference context should be provided, relative to the locus.

    int  0  [ [ 0  ∞ ] ]


    --remove_program_records / -rpr

    Remove program records from the SAM header
    Some tools keep program records in the SAM header by default. Use this argument to override that behavior and discard program records from the SAM header. Does not work on CRAM files.

    boolean  false


    --sample_rename_mapping_file / -sample_rename_mapping_file

    Rename sample IDs on-the-fly at runtime using the provided mapping file
    On-the-fly sample renaming works only with single-sample BAM, CRAM, and VCF files. Each line of the mapping file must contain the absolute path to a BAM, CRAM, or VCF file, followed by whitespace, followed by the new sample name for that BAM, CRAM, or VCF file. The sample name may contain non-tab whitespace, but leading or trailing whitespace will be ignored. The engine will verify at runtime that each BAM/CRAM/VCF targeted for sample renaming has only a single sample specified in its header (though, in the case of BAM/CRAM files, there may be multiple read groups for that sample).

    File  NA
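
    As a rough illustration of the mapping file format described above, the following Python sketch parses a single line. The function name parse_rename_line is hypothetical, and the sketch assumes that file paths themselves contain no whitespace, which the documentation above does not state. It does not open the files or check their headers.

```python
import posixpath

def parse_rename_line(line):
    """Split one mapping-file line into (file_path, new_sample_name).

    Hypothetical helper; assumes the path contains no whitespace.
    """
    # Leading and trailing whitespace is ignored; split on the first
    # whitespace run so the sample name may contain internal spaces.
    fields = line.strip().split(None, 1)
    if len(fields) != 2:
        raise ValueError(f"expected '<path> <sample name>', got: {line!r}")
    path, sample = fields
    if not posixpath.isabs(path):
        raise ValueError(f"path must be absolute: {path}")
    if "\t" in sample:
        raise ValueError("sample name may not contain tabs")
    return path, sample

print(parse_rename_line("/data/run1/a.bam   Sample One \n"))
```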


    --secondsBetweenProgressUpdates / -secondsBetweenProgressUpdates

    Time interval for progress meter information output (in seconds)

    long  10  [ [ -∞  ∞ ] ]


    --showFullBamList / NA

    Emit list of input BAM/CRAM files to log
    This emits a log entry (level INFO) containing the full list of sequence data files to be included in the analysis (including files inside .bam.list or .cram.list files).

    Boolean  false


    --simplifyBAM / -simplifyBAM

    Strip down read content and tags
    If provided, output BAM/CRAM files will be simplified to include only key reads for downstream variation discovery analyses (removing duplicates, PF-, non-primary reads), as well as stripping all extended tags from the kept reads except the read group identifier.

    boolean  false


    --sites_only / -sites_only

    Output sites-only VCF
    This produces a VCF with only the first 8 columns of site-level information and without any sample-level info (genotypes etc).

    boolean  false
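
    To make the effect concrete, here is a minimal Python sketch of trimming VCF text lines to the first 8 columns. It only illustrates the column layout and is not how GATK writes its output; the function name to_sites_only is hypothetical, and the sketch leaves "##" meta-information lines as they are.

```python
def to_sites_only(vcf_line):
    """Keep only the 8 site-level columns of a VCF header or record line.

    Hypothetical helper; "##" meta-information lines are returned unchanged.
    """
    line = vcf_line.rstrip("\n")
    if line.startswith("##"):
        return line
    # CHROM POS ID REF ALT QUAL FILTER INFO are the first 8 columns;
    # FORMAT and the per-sample columns follow and are dropped here.
    return "\t".join(line.split("\t")[:8])

record = "20\t14370\trs6054257\tG\tA\t29\tPASS\tDP=14\tGT:GQ\t0|0:48\t1|0:48"
print(to_sites_only(record))
```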


    --static_quantized_quals / -SQQ

    Use static quantized quality scores to a given number of levels (with -BQSR)
    Static quantized quals are entirely separate from the quantize_qual option which uses dynamic binning. The two types of binning should not be used together.

    List[Integer]  NA
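
    The idea of static binning can be sketched as follows. This is an assumption-laden illustration, not GATK's algorithm: it assumes each quality score is mapped to the nearest of the user-supplied levels, with ties resolved toward the lower level, and the function name quantize_static is hypothetical.

```python
def quantize_static(quals, levels):
    """Map each quality score to the nearest fixed level.

    Assumption for illustration: nearest-level binning, ties go to the
    lower level. GATK's actual rounding rules may differ.
    """
    if not levels:
        raise ValueError("at least one level is required")
    # Sorting key: distance first, then the level itself to break ties low.
    return [min(levels, key=lambda lv: (abs(lv - q), lv)) for q in quals]

print(quantize_static([3, 12, 25, 38], [10, 20, 30]))
```

    Because the levels are fixed up front, the same input score always lands in the same bin regardless of the data, which is what distinguishes this from dynamic binning.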


    --unsafe / -U

    Enable unsafe operations: nothing will be checked at runtime
    For expert users only. We do not support usage of this argument, so we may refuse to help you if you use it and something goes wrong. The one exception to this rule is ALLOW_N_CIGAR_READS, which is necessary for RNAseq analysis.

    The --unsafe argument is an enumerated type (TYPE), which can have one of the following values:

    ALLOW_N_CIGAR_READS
    ALLOW_UNINDEXED_BAM
    ALLOW_UNSET_BAM_SORT_ORDER
    NO_READ_ORDER_VERIFICATION
    ALLOW_SEQ_DICT_INCOMPATIBILITY
    LENIENT_VCF_PROCESSING
    ALL

    TYPE  NA


    --useOriginalQualities / -OQ

    Use the base quality scores from the OQ tag
    This flag tells GATK to use the original base qualities (that were in the data before BQSR/recalibration) which are stored in the OQ tag, if they are present, rather than use the post-recalibration quality scores. If no OQ tag is present for a read, the standard qual score will be used.

    Boolean  false
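
    The fallback rule described above amounts to the selection logic below. The read is modeled as a plain dictionary with hypothetical "qual" and "tags" keys purely for illustration; GATK operates on its own read objects.

```python
def effective_quals(read, use_original_qualities=False):
    """Return the OQ-tag qualities if requested and present, else the standard ones.

    The dict layout ("qual", "tags") is a stand-in for a real read object.
    """
    tags = read.get("tags", {})
    if use_original_qualities and "OQ" in tags:
        return tags["OQ"]
    return read["qual"]

recalibrated = {"qual": "IIII", "tags": {"OQ": "####"}}
untagged = {"qual": "FFFF", "tags": {}}
print(effective_quals(recalibrated, use_original_qualities=True))
print(effective_quals(untagged, use_original_qualities=True))
```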


    --validation_strictness / -S

    How strict should we be with validation
    Keep in mind that if you set this to LENIENT, we may refuse to provide you with support if anything goes wrong.

    The --validation_strictness argument is an enumerated type (ValidationStringency), which can have one of the following values:

    STRICT
    LENIENT
    SILENT

    ValidationStringency  SILENT


    --variant_index_parameter / -variant_index_parameter

    Parameter to pass to the VCF/BCF IndexCreator
    This is either the bin width or the number of features per bin, depending on the indexing strategy. This argument is no longer necessary when producing GVCF files. Using the ".g.vcf" output file extension will automatically set the appropriate value.

    int  -1  [ [ -∞  ∞ ] ]


    --variant_index_type / -variant_index_type

    Type of IndexCreator to use for VCF/BCF indices
    Specify the Tribble indexing strategy to use for VCFs.
    LINEAR creates a LinearIndex with bins of equal width, specified by the Bin Width parameter.
    INTERVAL creates an IntervalTreeIndex with bins containing an equal number of features, specified by the Features Per Bin parameter.
    DYNAMIC_SEEK attempts to optimize for minimal seek time by choosing an appropriate strategy and parameter (the user-supplied parameter is ignored).
    DYNAMIC_SIZE attempts to optimize for minimal index size by choosing an appropriate strategy and parameter (the user-supplied parameter is ignored).
    This argument is no longer necessary when producing GVCF files. Using the ".g.vcf" output file extension will automatically set the appropriate value.

    The --variant_index_type argument is an enumerated type (GATKVCFIndexType), which can have one of the following values:

    DYNAMIC_SEEK
    DYNAMIC_SIZE
    LINEAR
    INTERVAL

    GATKVCFIndexType  DYNAMIC_SEEK
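
    The difference between the LINEAR and INTERVAL strategies can be pictured with a small Python sketch. It is a conceptual illustration of "bins of equal width" versus "bins with an equal number of features" only; it does not reproduce Tribble's index structures, and both function names are hypothetical.

```python
def linear_bins(positions, bin_width):
    """Group feature positions into fixed-width bins keyed by bin number."""
    bins = {}
    for pos in sorted(positions):
        bins.setdefault(pos // bin_width, []).append(pos)
    return bins

def interval_bins(positions, features_per_bin):
    """Group feature positions into consecutive bins of equal feature count."""
    ordered = sorted(positions)
    return [ordered[i:i + features_per_bin]
            for i in range(0, len(ordered), features_per_bin)]

sites = [5, 12, 18, 101, 250, 251]
print(linear_bins(sites, 100))
print(interval_bins(sites, 2))
```

    With unevenly spaced sites, fixed-width bins hold varying numbers of features, while fixed-count bins span varying widths.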


    --version / -version

    Output version information
    Use this to check the version number of the GATK executable you are invoking. Note that the version number is always included in the output at the start of every run, as well as in any error message.

    Boolean  false


    GATK version 3.7-0-gcfedb67 built at 2017/02/09 12:35:06.