    Showing docs for version 3.7-0


    IndelRealigner

    Perform local realignment of reads around indels

    Category Sequence Data Processing Tools

    Traversal ReadWalker

    PartitionBy READ


    Overview

    The local realignment process is designed to consume one or more BAM files and to locally realign reads such that the number of mismatching bases is minimized across all the reads. In general, a large percentage of regions requiring local realignment are due to the presence of an insertion or deletion (indel) in the individual's genome with respect to the reference genome. Such alignment artifacts result in many bases mismatching the reference near the misalignment, which are easily mistaken for SNPs. Moreover, since read mapping algorithms operate on each read independently, they cannot place reads on the reference genome such that mismatches are minimized across all reads. Consequently, even when some reads are correctly mapped with indels, reads that cover the indel only near their start or end are often incorrectly mapped with respect to the true indel and also require realignment. Local realignment serves to transform regions with misalignments due to indels into clean reads containing a consensus indel suitable for standard variant discovery approaches.

    Note that indel realignment is no longer necessary for variant discovery if you plan to use a variant caller that performs a haplotype assembly step, such as HaplotypeCaller or MuTect2. However, it is still required when using legacy callers such as UnifiedGenotyper or the original MuTect.

    There are 2 steps to the realignment process:

    1. Determining (small) suspicious intervals which are likely in need of realignment (see the RealignerTargetCreator tool)
    2. Running the realigner over those intervals (IndelRealigner)

    For more details, see the indel realignment method documentation.
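    The two steps above can be sketched as a small shell script. This is an illustration under stated assumptions, not an official pipeline: the file names are placeholders, the RealignerTargetCreator arguments are not documented on this page and should be checked against that tool's own documentation, and the `run` helper and `GATK_JAR` variable are hypothetical conveniences introduced here.

```shell
#!/bin/sh
# Sketch of the two-step realignment workflow (placeholder file names).
set -eu

GATK_JAR="${GATK_JAR:-GenomeAnalysisTK.jar}"   # hypothetical variable: path to the GATK jar
TARGETS="intervalListFromRTC.intervals"        # output of step 1, input of step 2

# Hypothetical helper: prints each command, and executes it only when
# the GATK jar exists at $GATK_JAR (otherwise this is a dry run).
run() {
  printf '%s\n' "$*"
  if [ -f "$GATK_JAR" ]; then "$@"; fi
}

# Step 1: determine suspicious intervals likely in need of realignment.
# (Arguments assumed from the RealignerTargetCreator documentation.)
run java -jar "$GATK_JAR" -T RealignerTargetCreator \
  -R reference.fasta -I input.bam -known indels.vcf -o "$TARGETS"

# Step 2: run the realigner over those intervals, using the same
# BAM, reference, and known indel file as in step 1.
run java -jar "$GATK_JAR" -T IndelRealigner \
  -R reference.fasta -I input.bam -known indels.vcf \
  -targetIntervals "$TARGETS" -o realignedBam.bam
```

    Keeping the interval file name in one variable makes it harder to pass IndelRealigner a target list that was generated from different inputs.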

    Input

    One or more aligned BAM files and optionally one or more lists of known indels.

    Output

    A realigned version of your input BAM file(s).

    Usage example

     java -jar GenomeAnalysisTK.jar \
       -T IndelRealigner \
       -R reference.fasta \
       -I input.bam \
       -known indels.vcf \
       -targetIntervals intervalListFromRTC.intervals \
       -o realignedBam.bam
     

    Caveats


    Additional Information

    Read filters

    These Read Filters are automatically applied to the data by the Engine before processing by IndelRealigner.

    Downsampling settings

    This tool does not apply any downsampling by default.


    Command-line Arguments

    Engine arguments

    All tools inherit arguments from the GATK Engine's "CommandLineGATK" argument collection, which can be used to modify various aspects of the tool's function. For example, the -L argument directs the GATK engine to restrict processing to specific genomic intervals, and the -rf argument allows you to apply certain read filters to exclude some of the data from the analysis.
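    As a hedged illustration of an engine argument, the sketch below adds -L to the usage example so that processing is restricted to one region. The contig name and coordinates are assumptions that depend on your reference, and the `realign_cmd` function and `REGION` variable are hypothetical names used only here. The function prints the command for review rather than running it.

```shell
#!/bin/sh
# Sketch: restrict IndelRealigner to a genomic interval with the engine's -L argument.
set -eu

GATK_JAR="${GATK_JAR:-GenomeAnalysisTK.jar}"   # hypothetical variable: path to the GATK jar
REGION="${REGION:-20:10000000-10100000}"       # assumed contig name and range

# Prints the command line on one line; nothing is executed.
realign_cmd() {
  echo java -jar "$GATK_JAR" -T IndelRealigner \
    -R reference.fasta -I input.bam \
    -targetIntervals intervalListFromRTC.intervals \
    -L "$REGION" -o realignedBam.bam
}

realign_cmd
```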

    IndelRealigner specific arguments

    This table summarizes the command-line arguments that are specific to this tool. For more details on each argument, see the list further down below the table or click on an argument name to jump directly to that entry in the list.

    Argument name(s)                         Default value  Summary

    Required Inputs
    --targetIntervals                        NA             Intervals file output from RealignerTargetCreator

    Optional Inputs
    --knownAlleles, -known                   []             Input VCF file(s) with known indels

    Optional Outputs
    --out, -o                                NA             Output bam

    Optional Parameters
    --consensusDeterminationModel, -model    USE_READS      Determines how to compute the possible alternate consensuses
    --LODThresholdForCleaning, -LOD          5.0            LOD threshold above which the cleaner will clean
    --nWayOut                                NA             Generate one output file for each input (-I) bam file (not compatible with --out)

    Advanced Parameters
    --entropyThreshold, -entropy             0.15           Percentage of mismatches at a locus to be considered having high entropy (0.0 < entropy <= 1.0)
    --maxConsensuses                         30             Max alternate consensuses to try (necessary to improve performance in deep coverage)
    --maxIsizeForMovement, -maxIsize         3000           maximum insert size of read pairs that we attempt to realign
    --maxPositionalMoveAllowed, -maxPosMove  200            Maximum positional move in basepairs that a read can be adjusted during realignment
    --maxReadsForConsensuses, -greedy        120            Max reads used for finding the alternate consensuses (necessary to improve performance in deep coverage)
    --maxReadsForRealignment, -maxReads      20000          Max reads allowed at an interval for realignment
    --maxReadsInMemory, -maxInMemory         150000         max reads allowed to be kept in memory at a time by the SAMFileWriter

    Advanced Flags
    --noOriginalAlignmentTags, -noTags       false          Don't output the original cigar or alignment start tags for each realigned read in the output bam

    Argument details

    Arguments in this list are specific to this tool. Keep in mind that other arguments are available that are shared with other tools (e.g. command-line GATK arguments); see Engine arguments above.


    --consensusDeterminationModel / -model

    Determines how to compute the possible alternate consensuses
    We recommend that users run with USE_READS when trying to realign high-quality, longer-read data mapped with a gapped aligner; Smith-Waterman is really only necessary when using an ungapped aligner (e.g. MAQ in the case of single-end read data).

    The --consensusDeterminationModel argument is an enumerated type (ConsensusDeterminationModel), which can have one of the following values:

    KNOWNS_ONLY
    Uses only indels from a provided ROD of known indels.
    USE_READS
    Additionally uses indels already present in the original alignments of the reads.
    USE_SW
    Additionally uses 'Smith-Waterman' to generate alternate consensuses.

    ConsensusDeterminationModel  USE_READS


    --entropyThreshold / -entropy

    Percentage of mismatches at a locus to be considered having high entropy (0.0 < entropy <= 1.0)
    For expert users only! This is similar to the argument in the RealignerTargetCreator walker. The point here is that the realigner will only proceed with the realignment (even above the given threshold) if it minimizes entropy among the reads (and doesn't simply push the mismatch column to another position). This parameter is just a heuristic and should be adjusted based on your particular data set.

    double  0.15  [ [ -∞  ∞ ] ]


    --knownAlleles / -known

    Input VCF file(s) with known indels
    Any number of VCF files representing known indels to be used for constructing alternate consensuses. These could be, for example, dbSNP and/or official 1000 Genomes indel calls. Non-indel variants in these files will be ignored.

    This argument supports reference-ordered data (ROD) files in the following formats: BCF2, VCF, VCF3

    List[RodBinding[VariantContext]]  []


    --LODThresholdForCleaning / -LOD

    LOD threshold above which the cleaner will clean
    This term is equivalent to "significance" - i.e. is the improvement significant enough to merit realignment? Note that this number should be adjusted based on your particular data set. For low coverage and/or when looking for indels with low allele frequency, this number should be smaller.

    double  5.0  [ [ -∞  ∞ ] ]


    --maxConsensuses / -maxConsensuses

    Max alternate consensuses to try (necessary to improve performance in deep coverage)
    For expert users only! If you need to find the optimal solution regardless of running time, use a higher number.

    int  30  [ [ -∞  ∞ ] ]


    --maxIsizeForMovement / -maxIsize

    maximum insert size of read pairs that we attempt to realign
    For expert users only!

    int  3000  [ [ -∞  ∞ ] ]


    --maxPositionalMoveAllowed / -maxPosMove

    Maximum positional move in basepairs that a read can be adjusted during realignment
    For expert users only!

    int  200  [ [ -∞  ∞ ] ]


    --maxReadsForConsensuses / -greedy

    Max reads used for finding the alternate consensuses (necessary to improve performance in deep coverage)
    For expert users only! If you need to find the optimal solution regardless of running time, use a higher number.

    int  120  [ [ -∞  ∞ ] ]


    --maxReadsForRealignment / -maxReads

    Max reads allowed at an interval for realignment
    For expert users only! If this value is exceeded at a given interval, realignment is not attempted and the reads are passed to the output file(s) as-is. If you need to allow more reads (e.g. with very deep coverage) regardless of memory, use a higher number.

    int  20000  [ [ -∞  ∞ ] ]


    --maxReadsInMemory / -maxInMemory

    max reads allowed to be kept in memory at a time by the SAMFileWriter
    For expert users only! To minimize memory consumption you can lower this number (but then the tool may skip realignment on regions with too much coverage; and if the number is too low, it may generate errors during realignment). Just make sure to give Java enough memory! 4 GB should be enough with the default value.

    int  150000  [ [ -∞  ∞ ] ]
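    The memory advice for --maxReadsInMemory can be made concrete with the standard JVM -Xmx option, which sets the maximum Java heap size. The sketch below is only an illustration: the `HEAP` and `MAX_IN_MEMORY` variables and the `realign_cmd` function are hypothetical names, and the file names are placeholders. It prints the command for review rather than running it.

```shell
#!/bin/sh
# Sketch: pair the default -maxInMemory value with a 4 GB Java heap.
set -eu

GATK_JAR="${GATK_JAR:-GenomeAnalysisTK.jar}"   # hypothetical variable: path to the GATK jar
HEAP="${HEAP:-4g}"                             # passed to the JVM as -Xmx
MAX_IN_MEMORY="${MAX_IN_MEMORY:-150000}"       # tool default for -maxInMemory

# Prints the command line on one line; nothing is executed.
realign_cmd() {
  echo java "-Xmx$HEAP" -jar "$GATK_JAR" -T IndelRealigner \
    -R reference.fasta -I input.bam \
    -targetIntervals intervalListFromRTC.intervals \
    -maxInMemory "$MAX_IN_MEMORY" -o realignedBam.bam
}

realign_cmd
```

    If you lower MAX_IN_MEMORY to save memory, keep in mind the caveats described for this argument: realignment may be skipped on regions with too much coverage, and very low values may cause errors.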


    --noOriginalAlignmentTags / -noTags

    Don't output the original cigar or alignment start tags for each realigned read in the output bam

    boolean  false


    --nWayOut / -nWayOut

    Generate one output file for each input (-I) bam file (not compatible with --out)
    Reads from all input files will be realigned together, but then each read will be saved in the output file corresponding to the input file that the read came from. There are two ways to generate output bam file names: 1) if the value of this argument is a general string (e.g. '.cleaned.bam'), then extensions (".bam" or ".sam") will be stripped from the input file names and the provided string value will be appended instead; 2) if the value ends with '.map' (e.g. input_output.map), then a two-column tab-separated file with the specified name must exist and list a unique output file name (2nd column) for each input file name (1st column). Note that some GATK arguments do NOT work in conjunction with nWayOut (e.g. --disable_bam_indexing).

    String  NA
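    The two naming rules described for --nWayOut can be mimicked in a few lines of shell. This sketch models only the extension handling stated above and does not reproduce the tool's actual code; in particular, how directory components of input paths are treated is not covered here. The function names `nway_name` and `nway_map_line` are hypothetical.

```shell
#!/bin/sh
# Sketch of the two --nWayOut naming rules (illustration only).
set -eu

# Rule 1: strip a trailing ".bam" or ".sam" from the input name,
# then append the provided suffix.
nway_name() {
  input=$1
  suffix=$2
  case $input in
    *.bam) base=${input%.bam} ;;
    *.sam) base=${input%.sam} ;;
    *)     base=$input ;;
  esac
  printf '%s%s\n' "$base" "$suffix"
}

# Rule 2: one line of a two-column, tab-separated map file
# (input name in column 1, unique output name in column 2).
nway_map_line() {
  printf '%s\t%s\n' "$1" "$2"
}

nway_name sampleA.bam .cleaned.bam                 # prints sampleA.cleaned.bam
nway_map_line sampleA.bam sampleA.realigned.bam
nway_map_line sampleB.bam sampleB.realigned.bam
```

    To use the second rule, redirect the map lines to a file whose name ends in '.map' (e.g. input_output.map) and pass that file name as the value of --nWayOut.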


    --out / -o

    Output bam
    The realigned bam file.

    GATKSAMFileWriter  NA


    --targetIntervals / -targetIntervals

    Intervals file output from RealignerTargetCreator
    The interval list output from the RealignerTargetCreator tool using the same bam(s), reference, and known indel file(s).

    R IntervalBinding[Feature]  NA