Last updated: 2022-03-15

Checks: 7 0

Knit directory: chipseq-cross-species/

This reproducible R Markdown analysis was created with workflowr (version 1.7.0). The Checks tab describes the reproducibility checks that were applied when the results were created. The Past versions tab lists the development history.


Great! Since the R Markdown file has been committed to the Git repository, you know the exact version of the code that produced these results.

Great job! The global environment was empty. Objects defined in the global environment can affect the analysis in your R Markdown file in unknown ways. For reproduciblity it’s best to always run the code in an empty environment.

The command set.seed(20220209) was run prior to running the code in the R Markdown file. Setting a seed ensures that any results that rely on randomness, e.g. subsampling or permutations, are reproducible.

Great job! Recording the operating system, R version, and package versions is critical for reproducibility.

Nice! There were no cached chunks for this analysis, so you can be confident that you successfully produced the results during this run.

Great job! Using relative paths to the files within your workflowr project makes it easier to run your code on other machines.

Great! You are using Git for version control. Tracking code development and connecting the code version to the results is critical for reproducibility.

The results in this page were generated with repository version 61f21eb. See the Past versions tab to see a history of the changes made to the R Markdown and HTML files.

Note that you need to be careful to ensure that all relevant files for the analysis have been committed to Git prior to generating the results (you can use wflow_publish or wflow_git_commit). workflowr only checks the R Markdown file, but you know if there are other scripts or data files that it depends on. Below is the status of the Git repository when the results were generated:


Ignored files:
    Ignored:    .RData
    Ignored:    .Rhistory
    Ignored:    analysis/.RData
    Ignored:    analysis/.Rhistory
    Ignored:    data/genomic_data/
    Ignored:    output/annotations/
    Ignored:    output/bam_files/
    Ignored:    output/filtered_peaks/
    Ignored:    output/logs/
    Ignored:    output/peaks/
    Ignored:    output/plots/
    Ignored:    output/qc/A-1_input.SeqDepthNorm.bw
    Ignored:    output/qc/A-1_input.SeqDepthNorm_dunnart_downSampled.bw
    Ignored:    output/qc/A-1_input.ccurve.txt
    Ignored:    output/qc/A-1_input.extrap.txt
    Ignored:    output/qc/A-1_input_R1_trimmed.fastq.gz
    Ignored:    output/qc/A-1_input_est_lib_complex_metrics.txt
    Ignored:    output/qc/A-2_H3K4me3.SeqDepthNorm.bw
    Ignored:    output/qc/A-2_H3K4me3.SeqDepthNorm_dunnart_downSampled.bw
    Ignored:    output/qc/A-2_H3K4me3.ccurve.txt
    Ignored:    output/qc/A-2_H3K4me3.extrap.txt
    Ignored:    output/qc/A-2_H3K4me3_R1_trimmed.fastq.gz
    Ignored:    output/qc/A-2_H3K4me3_est_lib_complex_metrics.txt
    Ignored:    output/qc/A-2_H3K4me3_vs_A-1_input.frip_default.txt
    Ignored:    output/qc/A-2_H3K4me3_vs_B-1_input.frip_default.txt
    Ignored:    output/qc/A-3_H3K27ac.SeqDepthNorm.bw
    Ignored:    output/qc/A-3_H3K27ac.SeqDepthNorm_dunnart_downSampled.bw
    Ignored:    output/qc/A-3_H3K27ac.ccurve.txt
    Ignored:    output/qc/A-3_H3K27ac.extrap.txt
    Ignored:    output/qc/A-3_H3K27ac_R1_trimmed.fastq.gz
    Ignored:    output/qc/A-3_H3K27ac_est_lib_complex_metrics.txt
    Ignored:    output/qc/A-3_H3K27ac_vs_A-1_input.frip_default.txt
    Ignored:    output/qc/A-3_H3K27ac_vs_B-1_input.frip_default.txt
    Ignored:    output/qc/B-1_input.SeqDepthNorm.bw
    Ignored:    output/qc/B-1_input.SeqDepthNorm_dunnart_downSampled.bw
    Ignored:    output/qc/B-1_input.ccurve.txt
    Ignored:    output/qc/B-1_input.extrap.txt
    Ignored:    output/qc/B-1_input_R1_trimmed.fastq.gz
    Ignored:    output/qc/B-1_input_est_lib_complex_metrics.txt
    Ignored:    output/qc/B-2_H3K4me3.SeqDepthNorm.bw
    Ignored:    output/qc/B-2_H3K4me3.SeqDepthNorm_dunnart_downSampled.bw
    Ignored:    output/qc/B-2_H3K4me3.ccurve.txt
    Ignored:    output/qc/B-2_H3K4me3.extrap.txt
    Ignored:    output/qc/B-2_H3K4me3_R1_trimmed.fastq.gz
    Ignored:    output/qc/B-2_H3K4me3_est_lib_complex_metrics.txt
    Ignored:    output/qc/B-2_H3K4me3_vs_A-1_input.frip_default.txt
    Ignored:    output/qc/B-2_H3K4me3_vs_A-1_input.frip_default_dunnart_downSampled.txt
    Ignored:    output/qc/B-2_H3K4me3_vs_B-1_input.frip_default.txt
    Ignored:    output/qc/B-3_H3K27ac.SeqDepthNorm.bw
    Ignored:    output/qc/B-3_H3K27ac.SeqDepthNorm_dunnart_downSampled.bw
    Ignored:    output/qc/B-3_H3K27ac.ccurve.txt
    Ignored:    output/qc/B-3_H3K27ac.extrap.txt
    Ignored:    output/qc/B-3_H3K27ac_R1_trimmed.fastq.gz
    Ignored:    output/qc/B-3_H3K27ac_est_lib_complex_metrics.txt
    Ignored:    output/qc/B-3_H3K27ac_vs_A-1_input.frip_default.txt
    Ignored:    output/qc/B-3_H3K27ac_vs_B-1_input.frip_default.txt
    Ignored:    output/qc/bamPEFragmentSize_rawcounts.tab
    Ignored:    output/qc/bamPEFragmentSize_rawcounts_dunnart_downSampled.tab
    Ignored:    output/qc/multiBAM_fingerprint_metrics.txt
    Ignored:    output/qc/multiBAM_fingerprint_metrics_dunnart_downSampled.txt
    Ignored:    output/qc/multiBAM_fingerprint_rawcounts.txt
    Ignored:    output/qc/multiBAM_fingerprint_rawcounts_dunnart_downSampled.txt
    Ignored:    output/qc/multibamsum.npz
    Ignored:    output/qc/multibamsum.tab
    Ignored:    output/qc/multibamsum_dunnart_downSampled.npz
    Ignored:    output/qc/multibamsum_dunnart_downSampled.tab
    Ignored:    output/qc/pearsoncor_multibamsum_matrix.txt
    Ignored:    output/qc/pearsoncor_multibamsum_matrix_dunnart_downSampled.txt
    Ignored:    output/wga/
    Ignored:    results/

Untracked files:
    Untracked:  .snakemake/
    Untracked:  Rplots.pdf
    Untracked:  data/raw_reads/
    Untracked:  output/qc/H3K27ac_overlap_default.frip
    Untracked:  output/qc/ucsc_alignment/
    Untracked:  output/rnaseq/
    Untracked:  slurm-33931929.out
    Untracked:  slurm-33931930.out
    Untracked:  slurm-33931931.out
    Untracked:  slurm-33931932.out
    Untracked:  slurm-33931933.out
    Untracked:  slurm-33931934.out
    Untracked:  slurm-33934389.out
    Untracked:  slurm-33934390.out
    Untracked:  slurm-33934391.out
    Untracked:  slurm-33934392.out
    Untracked:  slurm-33934393.out
    Untracked:  slurm-33934394.out
    Untracked:  slurm-33935177.out
    Untracked:  slurm-33935178.out
    Untracked:  slurm-33935179.out
    Untracked:  slurm-33935180.out
    Untracked:  slurm-33935181.out
    Untracked:  slurm-33935182.out
    Untracked:  slurm-33938210.out
    Untracked:  slurm-33938211.out
    Untracked:  slurm-33938212.out
    Untracked:  slurm-33938213.out
    Untracked:  slurm-33938214.out
    Untracked:  slurm-33938215.out
    Untracked:  slurm-33948861.out
    Untracked:  slurm-33948862.out
    Untracked:  slurm-33948863.out
    Untracked:  slurm-33948864.out
    Untracked:  slurm-33948865.out
    Untracked:  slurm-33948866.out

Note that any generated files, e.g. HTML, png, CSS, etc., are not included in this status report because it is ok for generated content to have uncommitted changes.


These are the previous versions of the repository in which changes were made to the R Markdown (analysis/whole_genome_alignment.Rmd) and HTML (docs/whole_genome_alignment.html) files. If you’ve configured a remote Git repository (see ?wflow_git_remote), click on the hyperlinks in the table below to view the files as they were in that past version.

File Version Author Date Message
Rmd 61f21eb lecook 2022-03-15 wflow_publish(“analysis/whole_genome_alignment.Rmd”)
html c3418c7 lecook 2022-03-15 Build site.
Rmd 09ea90e lecook 2022-03-15 wflow_publish(“analysis/whole_genome_alignment.Rmd”)
html a557720 lecook 2022-03-15 Build site.
Rmd 4b30eee lecook 2022-03-15 wflow_publish(“analysis/whole_genome_alignment.Rmd”)
Rmd 6ebc287 lecook 2022-03-14 fix
Rmd bd46533 lecook 2022-03-14 update
Rmd d8b90f6 lecook 2022-03-14 updated
Rmd 290c569 lecook 2022-03-14 update
Rmd 7e64d0f lecook 2022-03-14 add wga notes
Rmd a97d3c5 lecook 2022-03-01 first commit
Rmd 1fa92d9 lecook 2022-02-23 add analysis files

Notes on Whole Genome Alignment

There are a number of online tutorials to build the alignment and nets using Lastz: http://genomewiki.ucsc.edu/index.php/Whole_genome_alignment_howto https://www.bioconductor.org/packages/release/bioc/vignettes/CNEr/inst/doc/PairwiseWholeGenomeAlignment.html https://github.com/hillerlab/GenomeAlignmentTools (more fine-tuned and advanced)

Methods from Hecker & Hiller 2020 Whole Genome Alignment:

To compute pairwise and multiple genome alignments, we used the human hg38 assembly as the reference (Supplementary Fig. 1 shows the entire workflow). We first built pairwise alignments between human and a query species using lastz and axtChain to compute co-linear alignment chains [82]. To align placental mammals, we used previously determined lastz parameters (K = 2400, L = 3000, Y = 9400, H = 2000, and the lastz default scoring matrix) that have a sufficient sensitivity to capture orthologous exons [16]. To align chimpanzee, bonobo, and gorilla, we changed the lastz parameters (K = 4500 and L = 4500).

After building chains, we applied RepeatFiller (RRID:SCR_017414), a method that performs another round of local alignment, considering unaligning regions ≤20 kb in size that are bounded by co-linear alignment blocks up- and downstream. RepeatFiller removes any repeat masking from the unaligned region and is therefore able to detect novel alignments between repetitive regions. We have previously shown that RepeatFiller detects several megabases of aligning repetitive sequences that would be missed otherwise. After RepeatFiller, we applied chainCleaner with parameters -LRfoldThreshold = 2.5 -doPairs -LRfoldThresholdPairs = 10 -maxPairDistance = 10000 -maxSuspectScore = 100000 -minBrokenChainScore = 75000 to improve alignment specificity. Pairwise alignment chains were converted into alignment nets using a modified version of chainNet that computes real scores of partial nets. Nets were filtered using NetFilterNonNested.perl with parameters -doUCSCSynFilter -keepSynNetsWithScore 5000 -keepInvNetsWithScore 5000, which applies the UCSC “syntenic net” score thresholds (minTopScore of 300000 and minSynScore of 200000) and keeps nested nets that align to the same locus (inversions or local translocations; net type “inv” or “syn” according to netClass) if they score ≥5,000. For the Mongolian gerbil, tarsier, Malayan flying lemur, sperm whale, Przewalski’s horse, Weddell seal, Malayan pangolin, Chinese pangolin, Hoffmann’s two-fingered sloth, and Cape rock hyrax that have genome assemblies with a scaffold N50 ≤1,000,000 and a contig N50 ≤100,000, we just required that nets have a score ≥100,000. For marsupials and platypus, we lowered the score threshold for nets to 10,000 and kept inv or syn nets with scores ≥3,000.

Some definitions

In chain and net lingo, the target is the reference genome sequence and the query is some other genome sequence. For example, if you are viewing Human-Mouse alignments in the Human genome browser, human is the target and mouse is the query.

A gapless block is a base-for-base alignment between part of the target and part of the query, possibly including mismatching bases. It has the same length in bases on the target and the query. This is the output of the most primitive alignment algorithms.

A gap is a link between two gapless blocks, indicating that the target or the query has sequence that should be skipped over in order to make the best-scoring alignment. In other words, the scoring penalty for skipping over one or more bases is less than the penalty for continuing to align the sequences without skipping.

A single-sided gap is a gap in which sequence in either target or query must be skipped over. A plausible explanation for needing to skip over a base in the target while not skipping a base in the query is that either the target has an inserted base or the query has a deleted base. Many alignment tools produce alignments with single-sided gaps between gapless blocks.

A double-sided gap skips over sequence in both target and query because the sum of penalties for mismatching bases exceeds the penalty for extending a gap across them. This is possible only when the penalty for extending a gap is less than the penalty for creating a new gap and less than the penalty for a mismatch, and when the alignment algorithm is capable of considering double-sided gaps.

A chain is a sequence of non-overlapping gapless blocks, with single- or double-sided gaps between blocks. Within a chain, target and query coords are monotonically non-decreasing (i.e. always increasing or flat). Chains are constructed by the axtChain program which finds pairwise alignments with the same target and query sequence, on the same strand, that can be merged if overlapping and joined into one longer alignment with a higher score under an affine gap-scoring system (progressively decreasing penalties for longer gaps).

  • double-sided gaps are a new capability (blastz can’t do that) that allow extremely long chains to be constructed.
  • not just orthologs, but paralogs too, can result in good chains. but that’s useful!
  • chains should be symmetrical – e.g. swap human-mouse -> mouse-human chains, and you should get approx. the same chains as if you chain swapped mouse-human blastz alignments. However, Blastz’s dynamic masking is asymmetrical, so in practice those results are not exactly symmetrical. Also, dynamic masking in conjunction with changed chunk sizes can cause differences in results from one run to the next.
  • chained blastz alignments are not single-coverage in either target or query unless some subsequent filtering (like netting) is done.
  • chain tracks can contain massive pileups when a piece of the target aligns well to many places in the query. Common causes of this include insufficient masking of repeats and high-copy-number genes (or paralogs).

A net is a hierarchical collection of chains, with the highest-scoring non-overlapping chains on top, and their gaps filled in where possible by lower-scoring chains, which in turn may have gaps filled in by lower-level chains and so on.

  • I think a chain’s qName also helps to determine which level it lands in, i.e. it makes a difference whether a chain’s qName is the same as the top-level chain’s qName or not, because the levels have meanings associated with them – see details page.
  • a net is single-coverage for target but not for query, unless it has been filtered to be single-coverage on both target and query. By convention we add “rbest” to the net filename in that case.
  • because it’s single-coverage in the target, it’s no longer symmetrical.
  • the netter has two outputs, one of which we usually ignore: the target-centric net in query coordinates. The reciprocal best process uses that output: the query-referenced (but target-centric / target single-cov) net is turned back into component chains, and then those are netted to get single coverage in the query too; the two outputs of that netting are reciprocal-best in query and target coords. Reciprocal-best nets are symmetrical again.
  • nets do a good job of filtering out massive pileups by collapsing them down to (usually) a single level.
  • “LiftOver chains” are actually chains extracted from nets, or chains filtered by the netting process.

Preparation

Spartan modules

module load foss
module load lastz
module load ucsc/21072020
module load perl

Repeat mask dunnart genome

Create conda environment will all the dependencies:

conda create -n wga
conda activate wga
conda config --add channels conda-forge
conda config --add channels biocore
conda config --add channels bioconda
conda install repeatmasker

Run RepeatModeler to de novo find repeat regions in the dunnart genome:

BuildDatabase -name dunnart -engine ncbi data/genomic_data/Scras_dunnart_assem1.0_pb-ont-illsr_flyeassem_red-rd-scfitr2_pil2xwgs2_60chr.fasta

nohup RepeatModeler -database dunnart -pa 20 >& output/wga/repeat_modeler
repeat_modeler/repeatmodeler.out

Run RepeatMasker to mask repeats in dunnart genome (makes repeats lowercase). Run as an array for scaffolds to make it quicker. Create commands for array slurm script: repeatMasker.sh

RepeatMasker -q -xsmall data/genomic_data/Scras_dunnart_assem1.0_pb-ont-illsr_flyeassem_red-rd-scfitr2_pil2xwgs2_60chr.fasta -default_search_engine hmmer -trf_prgm /home/lecook/.conda/envs/wga/bin/trf -hmmer_dir /home/lecook/.conda/envs/wga/bin/

Split into scaffolds

Using faSplit from the UCSC Kent Tools

faSplit byName data/genomic_data/Scras_dunnart_assem1.0_pb-ont-illsr_flyeassem_red-rd-scfitr2_pil2xwgs2_60chr.fasta data/genomic_data/smiCra1_scaffolds/

Create .2bit and .sizes files

faToTwoBit data/genomic_data/Scras_dunnart_assem1.0_pb-ont-illsr_flyeassem_red-rd-scfitr2_pil2xwgs2_60chr_RM.fasta data/genomic_data/smiCra1.2bit
twoBitInfo data/genomic_data/smiCra1.2bit stdout | sort -k2rn > data/genomic_data/smiCra1.chrom.sizes

mm10 target genome

http://hgdownload.cse.ucsc.edu/goldenpath/mm10/bigZips/mm10.2bit

mm10.2bit - contains the complete mouse/mm10 genome sequence in the 2bit file format. Repeats from RepeatMasker and Tandem Repeats Finder (with period of 12 or less) are shown in lower case; non-repeating sequence is shown in upper case.

lastZ

To align placental mammals, we used previously determined lastz parameters (K = 2400, L = 3000, Y = 9400, H = 2000, and the lastz default scoring matrix) that have a sufficient sensitivity to capture orthologous exons

To align placental mammals, we used the lastz alignment parameters K = 2400, L = 3000, Y = 9400, H = 2000 and the lastz default scoring matrix, correspond- ing to parameter set 2 in Table 1. To align vertebrates, we used K = 2400, L = 3000, Y = 3400, H = 2000 and the HoxD55 scoring matrix. Citation: Increased alignment sensitivity improves the usage of genome alignments for comparative gene annotation. Nucleic Acids Res. 2017;45(14):8369–77.

Create commands for running lastZ for all scaffolds: code/whole_genome_alignment/lastz.sh

Repeated with vertebrate alignment parameters and HoxD55 scoring matrix. Retrieve A LOT more aligned sequences. For example scaffold00002 with mammal parameters retrieved 863M of data, while with the new parameters it’s 2.5GB.

Run as an array on slurm: code/whole_genome_alignment//array_wrapper.slurm

Convert maf to axt-format

http://last.cbrc.jp/doc/maf-convert.html

module load last/last/1066
maf-convert axt output/wga/maf/${tr}.maf > output/wga/axt/${tr}.axt

script: code/whole_genome_alignment//maf-convert_commands.sh run with: code/whole_genome_alignment//array_wrapper.slurm

axtChain

We use axtChain (http://www.soe.ucsc.edu/~kent; default parameters) to build co-linear alignment chains.

axtChain -linearGap=loose -scoreScheme=lecook/bin/GenomeAlignmentTools/HoxD55.q output/axt/[axt_files] data/genomic_data/mm10.2bit data/genomic_data/smiCra1.2bit output/wga/axt_chain/mm10_smiCra1.chain

script: code/whole_genome_alignment//axt_commands.sh run with: code/whole_genome_alignment//array_wrapper.slurm

chainMergeSort

Merge short chains into longer ones, concatenate chains and sort

chainMergeSort output/wga/axt_chain/*.chain > output/wga/chain_merge/smiCra1_mm10.chain

Installing Genomic Alignment Tools (Hiller group)

git clone https://github.com/hillerlab/GenomeAlignmentTools.git

cd GenomeAlignmentTools/kent/src

make

# Kent binaries
PATH=$PATH:/Users/lauracook/bin/GenomeAlignmentTools/kent/bin;export PATH

export KENTSRC_DIR=/Users/lauracook/bin/GenomeAlignmentTools/kent/src/

cd /Users/lauracook/bin/GenomeAlignmentTools/src
export MACHTYPE=x86_64
make 

PATH=$PATH:/Users/lauracook/bin/GenomeAlignmentTools/src

RepeatFiller

https://github.com/hillerlab/GenomeAlignmentTools

RepeatFiller [5] is a tool to incorporate newly-detected repeat-overlapping alignments into pairwise alignment chains [4]. Its runtime adds little to the computationally more expensive step of generating chains in pairwise whole-genome alignments. RepeatFiller circumvents the problem that considering all repeat-overlapping alignment seeds during whole genome alignment is computationally not feasible. Therefore, RepeatFiller only aligns local genomic regions that are bounded by colinear aligning blocks, as provided in the chains, which makes it feasible to consider all seeds including those that overlap repetitive regions. RepeatFiller application to mammalian genome alignment chains can add between 22 and 84 Mb of previously-undetected alignments that mostly originate from transposable elements [5]. This helps to comprehensively align repetitive regions and improves the annotation of conserved non-coding elements.

python3 RepeatFiller.py -c output/wga/chain_merge/smiCra1_mm10.chain -T2 data/genomic_data/mm10.2bit -Q2 data/genomic_data/smiCra1.2bit

python3 RepeatFiller.py -c smiCra1_mm10_patched_sorted.chain -T2 data/genomic_data/mm10.2bit -Q2 data/genomic_data/smiCra1.2bit

RepeatFiller adds 0.3G data to the alignment - 2 Feb 2021

patchChain

patchChain.perl performs a highly sensitive local pairwise alignment for loci flanked by aligning blocks [1,3]. Given an alignment chain [4], it considers all chains that pass the score and span filters (optional parameters), extracts all the unaligning loci and creates local alignment jobs. After executing these alignment jobs, the newly found and the original local alignments are combined and used to produce a new set of improved chains.

This procedure is recommended for comparisons between species that are separated by >0.75 substitutions per neutral site [1].

patchChain.perl output/wga/chain_merge/smiCra1_mm10.chain data/genomic_data/mm10.2bit data/genomic_data/smiCra1.2bit data/genomic_data/mm10.chrom.sizes data/genomic_data/smiCra1.chrom.sizes -chainMinScore 5000 -gapMaxSizeT 500000 -gapMaxSizeQ 500000 -gapMinSizeT 30 -gapMinSizeQ 30 -numJobs 10 -jobDir jobs -jobList jobList -outputDir output/wga/patch_chain/pslOutput -minEntropy 1.8 -windowSize 30 -minIdentity 60 -lastzParameters "--format=axt K=1500 L=2500 M=0 T=0 W=5 Q=lecook/bin/GenomeAlignmentTools/example/HoxD55.q"

This results in 10 alignment jobs that are located in jobs/ and listed in ‘jobList’ now execute these jobs on a compute cluster or run them sequentially by doing ‘chmod +x jobList; ./jobList’

# concatenate all new results
find output/wga/psl -name "*.psl" | xargs -i cat {} > output/wga/patch_chain/newAlignments.psl

# concatenate all genomewide psl lastz alignments
find psl -name "*.psl" | xargs -i cat {} > otuput/wga/psl/genomeWide.lastz.psl

# combine the genome-wide lastz results (the combined psl file that was used to create the input chains) and the newly found psl alignments
cat psl/genomeWide.lastz.psl patchChain/newAlignments.psl > patch_chain/all.psl 

patchChain adds 1.1G to the genome wide alignment file

use axtChain from the Kent source to compute alignment chains that include the new alignments

This create a folder with the jobs and a jobList which can be called in an array slurm script. Then you can run the jobs in parallel.

axtChain again…

axtChain on psl alignments (patchChain new alignments plus genomeWide lastz alignments)

axtChain -psl -linearGap=loose -scoreScheme=/lecook/bin/GenomeAlignmentTools/HoxD55.q output/wga/patch_chain/all.psl data/genomic_data/mm10.2bit data/genomic_data/smiCra1.2bit output/wga/patch_chain/smiCra1_mm10_patched.chain

4.7G Jan 31 18:42 smiCra1_mm10.chain 5.3G Feb 1 17:52 smiCra1_mm10_patched.chain

chainCleaner

https://github.com/hillerlab/GenomeAlignmentTools

After RepeatFiller, we applied chainCleaner with parameters -LRfoldThreshold = 2.5 -doPairs -LRfoldThresholdPairs = 10 -maxPairDistance = 10000 -maxSuspectScore = 100000 -minBrokenChainScore = 75000 to improve alignment specificity.

chainCleaner improves the specificity in genome alignment chains by detecting and removing local alignments that obscure the evolutionary history of genomic rearrangements [2]. The input is a chain file, ideally after adding alignments found with highly sensitive parameters if distal species are compared. The output is a chain file that contains re-scored and score-sorted chains after removing the local alignments from the parent chains and adding them as individual chains. The resulting output file can be used to get alignment nets by running chainNet [4].

chainCleaner output/wga/repeat_filler/smiCra1_mm10_repFil.chain data/genomic_data/mm10.2bit data/genomic_data/smiCra1.2bit output/wga/chain_cleaner/smiCra1_mm10_repFill_chainCl.chain output/wga/chain_cleaner/smiCra1_mm10_repFill_chainCl.bed -tSizes=mm10.chrom.sizes -qSizes=smiCra1.chrom.sizes -LRfoldThreshold=2.5 -doPairs -LRfoldThresholdPairs=10 -maxPairDistance=10000 -maxSuspectScore=100000 -minBrokenChainScore=75000 -linearGap=loose

Run chain cleaner on patched chain file

chainCleaner smiCra1_mm10_patched_sorted_repFil.chain data/genomic_data/mm10.2bit data/genomic_data/smiCra1.2bit output/wga/chain_cleaner/smiCra1_mm10_patched_sorted_repFil_chainCl.chain output/wga/chain_cleaner/smiCra1_mm10_patched_sorted_repFil_chainCl.bed -tSizes=data/genomic_data/mm10.sizes -qSizes=data/genomic_data/smiCra1.sizes -LRfoldThreshold=2.5 -doPairs -LRfoldThresholdPairs=10 -maxPairDistance=10000 -maxSuspectScore=100000 -minBrokenChainScore=75000 -linearGap=loose

-LRfoldThreshold = threshold for removing local alignment blocks if the score of the left and right fill of brokenChain. Default is 2.5 -doPairs = flag: if set, do test if pairs of chain breaking alignments can be removed -LRfoldThresholdPairs = threshold for removing local alignment blocks if the score of the left and right fill of broken chains (for pairs). Default = 10 -maxPairDistance = only consider pairs of chain breaking alignments where the distance between the end of the upstream CBA and the start of the downstream CBA is at most that many bp (default 10000) -maxSuspectScore = threshold for score of suspect subChain. If higher, do not remove suspect.

-linearGap=loose

directory: output/wga/chain_cleaner/

For filtering the chains, we need the size of each chromosome

faSize data/genomic_data/smiCra1.fa -detailed > data/genomic_data/smiCra1.sizes
faSize data/genomic_data/mm10.fa -detailed > data/genomic_data/mm10.sizes

chainPreNet

chainPreNet output/wga/chain_cleaner/smiCra1_mm10_patched_sorted_chainCl.chain data/genomic_data/mm10.sizes data/genomic_data/smiCra1.sizes smiCra1_mm10_patched_sorted_chainCl_preNet.chain

chainPreNet without chainCleaner

chainPreNet smiCra1_mm10_patched_sorted_repFil_chainCl.chain data/genomic_data/mm10.sizes data/genomic_data/smiCra1.sizes output/wga/chain_prenet/smiCra1_mm10_patched_sorted_repFil_chainCl_preNet.chain

chainNet

Given a set of alignment chains, chainNet produces alignment nets, which is a hierarchical collection of chains or parts of chains that attempt to capture only orthologous alignments [4]. The original chainNet implementation approximates the score of “sub-nets” (nets that come from a part of a chain and fill a gap in a higher-level net) by the fraction of aligning bases. This can lead to a bias in case the aligning blocks of a chain are not equally distributed. We implemented a new parameter “-rescore” in chainNet that computes the real score of each subnet [2].

Make the alignment nets:

chainNet output/wga/chain_prenet/smiCra1_mm10_patched_sorted_repFil_chainCl_preNet.chain -minSpace=1 data/genomic_data/mm10.sizes data/genomic_data/smiCra1.sizes stdout /dev/null | netSyntenic stdin output/wga/chain_net/smiCra1_mm10_patched_sorted_repFil_chainCl_preNet.net

netChainSubset

Creates a single chain file using only the chains that also appear in the net.

netChainSubset output/wga/chain_prenet/smiCra1_mm10_patched_sorted_repFil_chainCl_preNet.net output/wga/chain_cleaner/smiCra1_mm10_patched_sorted_repFil_chainCl.chain output/wga/chain_net/smiCra1_mm10_patched_sorted_repFil_chainCl_preNet_chainNet.chain

Join chain fragments

chainStitchId output/wga/chain_net/smiCra1_mm10_patched_sorted_chainCl_netted.chain chainStitchId output/wga/chain_stitch/smiCra1_mm10_patched_sorted_chainCl_netted_stitched.chain

chainStitchId chainStitchId output/wga/chain_net/smiCra1_mm10_patched_sorted_repFil_chainCl_preNet_chainNet.chain chainStitchId output/wga/chain_stitch/smiCra1_mm10_patched_sorted_repFil_chainCl_preNet_chainNet_stitched.chain

From the synteny files (positions), get the sequences and re-create alignments.

Create MAF files

smiCra1_mm10_patched_sorted_repFil_chainCl_preNet_chainNet_stitched.chain

netToAxt output/wga/chain_net/smiCra1_mm10_patched_sorted_repFil_chainCl_preNet.net chain_stitch/smiCra1_mm10_patched_sorted_repFil_chainCl_preNet_chainNet_stitched.chain data/genomic_data/mm10.2bit data/genomic_data/smiCra1.2bit stdout | axtSort stdin output/wga/net_to_axt/mm10-smiCra1.axt

axtToMaf output/wga/net_to_axt/mm10-smiCra1.axt data/genomic_data/mm10.sizes data/genomic_data/smiCra1.sizes output/wga/maf/mm10-smiCra1.maf -tPrefix=mm10. -qPrefix=smiCra1.

chainSwap

chainSwap mm10_to_smiCra1_patched_sorted_chainCl_netted_stitched.chain stdout | nice chainSort stdin stdout | nice gzip -c > smiCra1_to_mm10_patched_sorted_chainCl_netted_stitched.chain

chainSwap smiCra1_mm10_patched_sorted_repFil_chainCl_preNet_chainNet_stitched.chain stdout | nice chainSort stdin stdout > mm10_smiCra1_patched_sorted_repFil_chainCl_preNet_chainNet_stitched.chain

Quality control of alignment blocks

MafFilter. Using mm10-smiCra1 maf file generated using axtToMaf look at the block lengths and genomic coverage.

maffilter param=options-file

Parameter options file

DATA=mm10-smiCra1 //A user-defined variable, representing the input maf file, without extension
input.file=$(DATA).maf.gz  //Input maf file, gzipped
input.file.compression=gzip
output.log=$(DATA).maffilter.log //Output log file
maf.filter=                                 \
    SequenceStatistics(                     \
        statistics=(                        \
                BlockLength,                \
            AlnScore,                       \
            BlockCounts,                    \
            PairwiseDivergence(             \
                species1=mm10,              \
                species2=smiCra1)),         \
        ref_species=mm10,                   \
        file=divergence.statistics.csv) 

Summarise block lengths, block counts, pairwise divergence and calculate exon coverage in the maf file between species in R.

# Load libraries

library(plyr)
library(data.table)
library(seqinr)
library(Biostrings)
library(tidyverse)

Summarise statistics from MafFilter

mm10 <- fread("output/qc/ucsc_alignment/mm10.statistics.csv") # read in table

smiCra1 <- fread("output/qc/ucsc_alignment/smiCra1.statistics.csv") # read in table
head(mm10) #check file has been imported correctly
    Chr   Start    Stop BlockLength AlnScore Counts.A Counts.C Counts.G
1: chr1 3012982 3013097         118     2740       23       16       12
2: chr1 3027832 3027936         104     2614       31       23       11
3: chr1 3028771 3028920         149     2821       51       18       36
4: chr1 3029833 3029947         122     2687       38       19       21
5: chr1 3044214 3044384         170      787       49       21       38
6: chr1 3044384 3044519         142     2483       30       35       29
   Counts.T Counts.Gap Counts.Unresolved
1:       64          3                 0
2:       39          0                 0
3:       44          0                 0
4:       36          8                 0
5:       62          0                 0
6:       41          7                 0
mm10ChrCount <- mm10 %>% dplyr::count(Chr, sort=TRUE) # how many blocks per chromosome
smiCra1ChrCount <- smiCra1 %>% dplyr::count(Chr, sort=TRUE) # how many blocks per chromosome

mm10 number of blocks per chromosome

mm10ChrCount
                     Chr      n
 1:                 chr1 207321
 2:                 chr2 206723
 3:                 chr3 167622
 4:                 chr6 163865
 5:                 chr4 163152
 6:                 chr5 163055
 7:                 chr7 152609
 8:                 chr8 142300
 9:                chr10 141726
10:                 chrX 138973
11:                chr11 138669
12:                 chr9 137467
13:                chr12 134230
14:                chr14 132548
15:                chr13 130426
16:                chr15 114484
17:                chr16 106874
18:                chr17 103335
19:                chr18  97483
20:                 chrY  73394
21:                chr19  67800
22: chr5_JH584299_random    417
23: chrY_JH584301_random    233
24: chr1_GL456211_random    224
25: chr4_JH584293_random    218
26: chr4_GL456350_random    209
27: chr1_GL456210_random    199
28: chr4_JH584294_random    182
29: chr1_GL456221_random    181
30: chrY_JH584300_random    179
31: chrX_GL456233_random    170
32: chr1_GL456212_random    141
33: chrY_JH584303_random    132
34: chrY_JH584302_random    101
35:       chrUn_GL456379     88
36: chr4_GL456216_random     87
37: chr5_GL456354_random     87
38: chr7_GL456219_random     87
39: chr5_JH584298_random     84
40: chr5_JH584297_random     83
41: chr5_JH584296_random     78
42:       chrUn_JH584304     77
43: chr1_GL456213_random     53
44:       chrUn_GL456359     38
45:       chrUn_GL456378     35
46:       chrUn_GL456393     35
47:       chrUn_GL456367     32
48:       chrUn_GL456382     30
49:       chrUn_GL456370     27
50:       chrUn_GL456372     27
51:       chrUn_GL456387     27
52:       chrUn_GL456239     25
53:       chrUn_GL456366     23
54: chr4_JH584292_random     22
55:       chrUn_GL456385     19
56:       chrUn_GL456394     18
57:       chrUn_GL456381     17
58:       chrUn_GL456360     12
59:       chrUn_GL456368     10
60:                 chrM      7
61: chr4_JH584295_random      3
                     Chr      n

dunnart number of blocks per chromosome

smiCra1ChrCount
                           Chr      n
  1: scaffold00002_pilon_pilon 456793
  2: scaffold00001_pilon_pilon 105481
  3: scaffold00004_pilon_pilon  73635
  4: scaffold00020_pilon_pilon  73492
  5: scaffold00009_pilon_pilon  69917
 ---                                 
609: scaffold00699_pilon_pilon      1
610: scaffold00704_pilon_pilon      1
611: scaffold00708_pilon_pilon      1
612: scaffold00714_pilon_pilon      1
613: scaffold00715_pilon_pilon      1

total size (incl gaps)

just use mouse as the statistics are the same

size1to10bp <- mm10[which(mm10$BlockLength>=1 & mm10$BlockLength<=10),]
size10to100bp <- mm10[which(mm10$BlockLength>=10 & mm10$BlockLength<=100),]
size100to1kb <- mm10[which(mm10$BlockLength>=100 & mm10$BlockLength<=1000),]
size1kbto10kb <- mm10[which(mm10$BlockLength>=1000 & mm10$BlockLength<=10000),]
size10kbto100kb <- mm10[which(mm10$BlockLength>=10000 & mm10$BlockLength<=100000),]
size100kbto1Mb <- mm10[which(mm10$BlockLength>=100000 & mm10$BlockLength<=1000000),]

dplyr::count(size1to10bp)
      n
1: 5523
count(size10to100bp)
        n
1: 816883
count(size100to1kb)
         n
1: 1975414
count(size1kbto10kb)
        n
1: 103456
count(size10kbto100kb)
    n
1: 17
count(size100kbto1Mb)
   n
1: 0
sum(size1to10bp$BlockLength)
[1] 28265
sum(size10to100bp$BlockLength)
[1] 57934059
sum(size100to1kb$BlockLength)
[1] 578314504
sum(size1kbto10kb$BlockLength)
[1] 165302684
sum(size10kbto100kb$BlockLength)
[1] 212515
sum(size100kbto1Mb$BlockLength)
[1] 0
df <- data.table(
  size = c("size1to10bp", "size10to100bp", "size100to1kb", "size1kbto10kb", "size10kbto100kb", "size100kbto1Mb"),
  sum_block_length = c(sum(size1to10bp$BlockLength),
  sum(size10to100bp$BlockLength),
  sum(size100to1kb$BlockLength),
  sum(size1kbto10kb$BlockLength),
  sum(size10kbto100kb$BlockLength),
  sum(size100kbto1Mb$BlockLength)),
  number_blocks = c(dplyr::count(size1to10bp), dplyr::count(size10to100bp), dplyr::count(size100to1kb), 
  dplyr::count(size1kbto10kb), dplyr::count(size10kbto100kb), dplyr::count(size100kbto1Mb))
)

df
              size sum_block_length number_blocks
1:     size1to10bp            28265          5523
2:   size10to100bp         57934059        816883
3:    size100to1kb        578314504       1975414
4:   size1kbto10kb        165302684        103456
5: size10kbto100kb           212515            17
6:  size100kbto1Mb                0             0
  • Fraction of mouse exons are in the alignment, and what fraction you actually recover by using liftover.

mm10 exons from refseq (downloaded from UCSC)

dunnart exons from lifted devil annotation gtf file converted to a bed file using BEDOPS gtf2bed function

gff2bed < data/genomic_data/Scras_dunnart_assem1.0_pb-ont-illsr_flyeassem_red-rd-scfitr2_pil2xwgs2_60chr2.gff > data/genomic_data/dunnart_gff.bed
less data/genomic_data/dunnart_gff.bed | grep "exon" > data/genomic_data/dunnart_exons.bed

Mouse exon coverage

mm10 refseq ncbi curated exons downloaded from ucsc

system("bedtools sort -i data/genomic_data/mm10exons.bed > output/qc/ucsc_alignment/mm10exonsSorted.bed", intern=TRUE) ## sort by coordinates

system("bedtools merge -i output/qc/ucsc_alignment/mm10exonsSorted.bed > output/qc/ucsc_alignment/mm10exonsMerged.bed", intern=TRUE) ## bedtools merge where transcripts are the same coordinates or within each other

system("bedtools merge -i output/qc/ucsc_alignment/mm10_mafAlignment.bed > output/qc/ucsc_alignment/mm10_mafAlignmentMerged.bed", intern=TRUE) ## also merge the maf alignments from the liftOver

system("bedtools intersect -a output/qc/ucsc_alignment/mm10exonsMerged.bed -b output/qc/ucsc_alignment/mm10_mafAlignmentBlocks.bed -wo > output/qc/ucsc_alignment/mm10exonOverlaps.bed", intern=TRUE) ## intersect exon coordinates with maf alignment file
mm10exon <- fread("output/qc/ucsc_alignment/mm10exonsMerged.bed") ## exon file
mm10exon$length <- (mm10exon$V3 - mm10exon$V2) ## exon length in ncbi refseq 
mm10overlaps <- fread("output/qc/ucsc_alignment/mm10exonOverlaps.bed") ## overlap file
dim(mm10overlaps) ## check file imported correctly
[1] 190176     10
sum(mm10overlaps$V10) ## 31776094bps of exons found in maf alignment
[1] 31776094
sum(mm10exon$length) ## 34221920bps of exons in refseq mm10
[1] 34221920
mm10exonCoverage <- sum(mm10overlaps$V10)/sum(mm10exon$length) ## converge alignment exon bps/total exon bps

print(paste0("mm10 exon coverage = ", mm10exonCoverage))
[1] "mm10 exon coverage = 0.928530427281695"

Dunnart exon coverage

system("bedtools sort -i data/genomic_data/dunnart_exons.bed > output/qc/ucsc_alignment/dunnart_exonsSorted.bed", intern = TRUE) ## sort by coordinates

system("bedtools merge -i output/qc/ucsc_alignment/dunnart_exonsSorted.bed > output/qc/ucsc_alignment/smiCra1exonsMerged.bed", intern = TRUE) ## bedtools merge where transcripts are the same coordinates or within each other

system("bedtools merge -i output/qc/ucsc_alignment/smiCra1_mafAlignmentBlocksSorted.bed > output/qc/ucsc_alignment/smiCra1_mafAlignmentMerged.bed", intern = TRUE) ## also merge the maf alignments from the liftOver
system("bedtools intersect -a output/qc/ucsc_alignment/smiCra1exonsMerged.bed -b output/qc/ucsc_alignment/smiCra1_mafAlignmentMerged.bed -wo > output/qc/ucsc_alignment/smiCra1_exonOverlaps.bed", intern=TRUE) ## intersect exon coordinates with maf alignment file
smiCra1exon <- fread("output/qc/ucsc_alignment/smiCra1exonsMerged.bed") ## exon file
dim(smiCra1exon) ## check file imported correctly
[1] 193182      3
smiCra1exon$length <- (smiCra1exon$V3 - smiCra1exon$V2)  ## exon length
smiCra1overlaps <- fread("output/qc/ucsc_alignment/smiCra1_exonOverlaps.bed") ## exon overlaps with maf alignment file
dim(smiCra1overlaps) ## check imported correctly
[1] 193996      7
sum(smiCra1exon$length) # 63502228
[1] 63502228
sum(smiCra1overlaps$V7) # 43939135
[1] 43939135
smiCra1exonCoverage <- sum(smiCra1overlaps$V7)/sum(smiCra1exon$length)

print(paste0("dunnart exon coverage = ", smiCra1exonCoverage))
[1] "dunnart exon coverage = 0.691930604387613"

Average sequence percentage mismatches for dunnart and mouse

mm10 <- fread("output/qc/ucsc_alignment/mm10.divergence.statistics.csv")
smiCra1 <- fread("output/qc/ucsc_alignment/smiCra1.divergence.statistics.csv")

mean(mm10$`Div.mm10-smiCra1`)
[1] 37.1781
mean(smiCra1$`Div.smiCra1-mm10`)
[1] 37.1781
## Genome coverage
readDNAmm10 <- readDNAStringSet("output/qc/ucsc_alignment/mm10_mafAlignment_nogaps.fasta") #number bp in alignment 743350114
readDNAsmiCra1 <- readDNAStringSet("output/qc/ucsc_alignment/smiCra1_mafAlignment_nogaps.fasta") #number bp in alignment 747935780

mm10genomeSize <- 2652783500
smiCra1genomeSize <-2838290115

mm10coverage <- readDNAmm10@ranges@width/mm10genomeSize
smiCra1coverage <- readDNAsmiCra1@ranges@width/smiCra1genomeSize

print(paste0("mm10 genome coverage = ", mm10coverage))
[1] "mm10 genome coverage = 0.280215145336964"
print(paste0("dunnart genome coverage =", smiCra1coverage))
[1] "dunnart genome coverage =0.263516324863077"

LiftOver

liftOver data/genomic_data/smiCra1exonsMerged.bed output/wga/liftOver_chains/mm10_smiCra1_patched_sorted_repFil_chainCl_preNet_chainNet_stitched.chain output/qc/ucsc_alignment/smiCra1exonsMapped.txt output/qc/ucsc_alignment/smiCra1exonsUnmapped.txt

number of dunnart exons = 193182

number of dunnart exons that can be mapped to mouse = 111693

liftOver output/qc/ucsc_alignment/mm10exonsMerged.bed output/wga/liftOver_chains/smiCra1_mm10_patched_sorted_repFil_chainCl_preNet_chainNet_stitched.chain output/qc/ucsc_alignment/mm10exonsMapped.txt output/qc/ucsc_alignment/mm10exonsUnmapped.txt

number of mouse exons = 193776

number of mouse exons that can be lifted over to dunnart = 160213

References

[1] Sharma V, Hiller M. Increased alignment sensitivity improves the usage of genome alignments for comparative gene annotation. Nucleic Acids Res., 45(14), 8369–8377, 2017

[2] Suarez H, Langer BE, Ladde P, Hiller M. chainCleaner improves genome alignment specificity and sensitivity. Bioinformatics, 33(11):1596-1603, 2017

[3] Hiller M, Agarwal S, Notwell JH, Parikh R, Guturu H, Wenger AM, Bejerano G. Computational methods to detect conserved non-genic elements in phylogenetically isolated genomes: application to zebrafish. Nucleic Acids Res, 41(15):e151.

[4] Kent WJ, Baertsch R, Hinrichs A, Miller W, Haussler D. Evolution’s cauldron: duplication, deletion, and rearrangement in the mouse and human genomes. PNAS, 100(20):11484-9, 2003

[5] Osipova E, Hecker N, Hiller M. RepeatFiller newly identifies megabases of aligning repetitive sequences and improves annotations of conserved non-exonic elements, submitted

[6] MafFilter

[7] Kent tools UCSC


sessionInfo()
R version 4.1.0 (2021-05-18)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Red Hat Enterprise Linux

Matrix products: default
BLAS/LAPACK: /usr/local/easybuild-2019/easybuild/software/compiler/gcc/10.2.0/openblas/0.3.12/lib/libopenblas_haswellp-r0.3.12.so

locale:
 [1] LC_CTYPE=en_AU.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_AU.UTF-8        LC_COLLATE=en_AU.UTF-8    
 [5] LC_MONETARY=en_AU.UTF-8    LC_MESSAGES=en_AU.UTF-8   
 [7] LC_PAPER=en_AU.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_AU.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats4    stats     graphics  grDevices utils     datasets  methods  
[8] base     

other attached packages:
 [1] forcats_0.5.1       stringr_1.4.0       dplyr_1.0.8        
 [4] purrr_0.3.4         readr_1.4.0         tidyr_1.1.3        
 [7] tibble_3.1.2        ggplot2_3.3.3       tidyverse_1.3.1    
[10] Biostrings_2.62.0   GenomeInfoDb_1.30.1 XVector_0.34.0     
[13] IRanges_2.28.0      S4Vectors_0.32.3    BiocGenerics_0.40.0
[16] seqinr_4.2-5        data.table_1.14.0   plyr_1.8.6         
[19] workflowr_1.7.0    

loaded via a namespace (and not attached):
 [1] httr_1.4.2             sass_0.4.0             jsonlite_1.7.2        
 [4] modelr_0.1.8           bslib_0.2.5.1          assertthat_0.2.1      
 [7] getPass_0.2-2          cellranger_1.1.0       GenomeInfoDbData_1.2.7
[10] yaml_2.2.1             progress_1.2.2         pillar_1.6.1          
[13] backports_1.2.1        glue_1.4.2             digest_0.6.27         
[16] promises_1.2.0.1       rvest_1.0.0            colorspace_2.0-1      
[19] htmltools_0.5.1.1      httpuv_1.6.1           pkgconfig_2.0.3       
[22] broom_0.7.6            haven_2.4.1            zlibbioc_1.40.0       
[25] scales_1.1.1           processx_3.5.2         whisker_0.4           
[28] later_1.2.0            git2r_0.28.0           generics_0.1.0        
[31] ellipsis_0.3.2         withr_2.4.2            cli_2.5.0             
[34] magrittr_2.0.1         crayon_1.4.1           readxl_1.3.1          
[37] evaluate_0.14          ps_1.6.0               fs_1.5.0              
[40] fansi_0.5.0            MASS_7.3-54            xml2_1.3.2            
[43] tools_4.1.0            prettyunits_1.1.1      hms_1.1.0             
[46] lifecycle_1.0.1        reprex_2.0.0           munsell_0.5.0         
[49] callr_3.7.0            ade4_1.7-16            compiler_4.1.0        
[52] jquerylib_0.1.4        rlang_1.0.2            grid_4.1.0            
[55] RCurl_1.98-1.3         rstudioapi_0.13        bitops_1.0-7          
[58] rmarkdown_2.8          gtable_0.3.0           DBI_1.1.1             
[61] R6_2.5.0               lubridate_1.7.10       knitr_1.33            
[64] utf8_1.2.1             rprojroot_2.0.2        stringi_1.6.2         
[67] Rcpp_1.0.8             vctrs_0.3.8            dbplyr_2.1.1          
[70] tidyselect_1.1.1       xfun_0.23