tigmint

⛓ Correct misassemblies using linked AND long reads

Correct misassemblies in genome assembly drafts using linked or long sequencing reads

Cut sequences at positions with few spanning molecules.

Written by Shaun Jackman, Lauren Coombe, Justin Chu, and Janet Li.

Paper · Slides · Poster

Citation

Shaun D. Jackman, Lauren Coombe, Justin Chu, Rene L. Warren, Benjamin P. Vandervalk, Sarah Yeo, Zhuyi Xue, Hamid Mohamadi, Joerg Bohlmann, Steven J.M. Jones and Inanc Birol (2018). Tigmint: correcting assembly errors using linked reads from large molecules. BMC Bioinformatics, 19(1). doi:10.1186/s12859-018-2425-6

Description

Tigmint identifies and corrects misassemblies using linked (e.g. MGI’s stLFR, 10x Genomics Chromium) or long (e.g. Oxford Nanopore Technologies long reads) DNA sequencing reads. The reads are first aligned to the assembly, and the extents of the large DNA molecules are inferred from the alignments of the reads. The physical coverage of the large molecules is more consistent and less prone to coverage dropouts than that of the short read sequencing data. The sequences are cut at positions that have insufficient spanning molecules. Tigmint outputs a BED file of these cut points, and a FASTA file of the cut sequences.
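The step of inferring molecule extents from read alignments can be sketched as follows. This is a minimal illustration, not Tigmint's own code: it assumes each alignment has already been reduced to a `(barcode, seq_name, start, end)` tuple, and the names `infer_molecules`, `max_gap`, and `min_reads` are hypothetical. Reads that share a barcode and lie on the same sequence are merged into one molecule as long as the gap between consecutive reads does not exceed `max_gap`.

```python
from collections import defaultdict


def infer_molecules(alignments, max_gap=50000, min_reads=2):
    """Group reads by (barcode, sequence) and merge nearby reads into molecules.

    alignments: iterable of (barcode, seq_name, start, end) tuples.
    Returns a list of (seq_name, start, end, barcode, read_count) tuples.
    max_gap and min_reads are illustrative thresholds, not Tigmint defaults.
    """
    groups = defaultdict(list)
    for barcode, seq_name, start, end in alignments:
        groups[(barcode, seq_name)].append((start, end))

    molecules = []
    for (barcode, seq_name), reads in sorted(groups.items()):
        reads.sort()
        cur_start, cur_end = reads[0]
        count = 1
        for start, end in reads[1:]:
            if start - cur_end > max_gap:
                # Gap too large: close the current molecule and start a new one.
                if count >= min_reads:
                    molecules.append((seq_name, cur_start, cur_end, barcode, count))
                cur_start, cur_end, count = start, end, 1
            else:
                cur_end = max(cur_end, end)
                count += 1
        if count >= min_reads:
            molecules.append((seq_name, cur_start, cur_end, barcode, count))
    return molecules
```

Molecules supported by fewer than `min_reads` reads are dropped in this sketch, on the assumption that a single read says little about the extent of a large molecule.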

Tigmint can also use long reads from Oxford Nanopore Technologies. The long reads are segmented and assigned barcodes, and the remaining steps of the pipeline are the same as described above.
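One way to picture the segmentation of long reads is the sketch below. It is only an illustration under stated assumptions: the function name `segment_long_read`, the `segment_len` value, the choice to drop a short trailing segment, and the reuse of the `BX:Z:` header convention from the Tips section are all assumptions, not a description of Tigmint's implementation. Every segment of a read carries the same barcode, so the segments behave like linked reads from one molecule.

```python
def segment_long_read(read_id, sequence, barcode, segment_len=500):
    """Yield (header, subsequence) pairs for fixed-length segments of one long read.

    All segments share the same barcode so that downstream steps can treat
    them like linked reads. A trailing piece shorter than segment_len is
    dropped (an assumption made for this sketch).
    """
    for index, offset in enumerate(range(0, len(sequence), segment_len)):
        chunk = sequence[offset:offset + segment_len]
        if len(chunk) < segment_len:
            break
        yield f"{read_id}_{index} BX:Z:{barcode}", chunk
```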

Each window of a specified fixed size is checked for a minimum number of spanning molecules. Sequences are cut at positions where a window with sufficient coverage is followed by one or more windows with insufficient coverage, which are in turn followed by a window with sufficient coverage.
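The window check described above can be sketched in a few lines. This is a simplified illustration rather than Tigmint's code: the names `count_spanning`, `find_low_support_runs`, `window`, `min_spanning`, and `step` are hypothetical, molecules are plain `(start, end)` pairs on a single sequence, and the sketch reports the whole under-supported interval instead of choosing an exact cut coordinate.

```python
def count_spanning(molecules, start, end):
    """Count molecules whose extent covers the whole window [start, end)."""
    return sum(1 for m_start, m_end in molecules if m_start <= start and m_end >= end)


def find_low_support_runs(molecules, seq_len, window=1000, min_spanning=2, step=None):
    """Return (run_start, run_end) intervals of consecutive under-supported
    windows that are flanked on both sides by well-supported windows."""
    step = step or window
    starts = range(0, seq_len - window + 1, step)
    supported = [count_spanning(molecules, s, s + window) >= min_spanning for s in starts]

    runs = []
    run_begin = None   # index of the first window in the current low-support run
    seen_good = False  # True once a well-supported window has been seen
    for index, good in enumerate(supported):
        if good:
            if run_begin is not None:
                runs.append((starts[run_begin], starts[index - 1] + window))
                run_begin = None
            seen_good = True
        elif seen_good and run_begin is None:
            run_begin = index
    # A low-support run that reaches the end of the sequence is not reported,
    # because it is not followed by a well-supported window.
    return runs


molecules = [(0, 5000), (0, 4800), (5200, 10000), (5300, 9900)]
print(find_low_support_runs(molecules, seq_len=10000, window=1000, min_spanning=2))
# prints [(4000, 6000)]
```

Low-support windows at the very start or end of a sequence are ignored here, which matches the requirement that the run be flanked by windows with sufficient coverage on both sides.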

Installation

Install Tigmint using Brew

Install Linuxbrew on Linux or Windows Subsystem for Linux (WSL), or install Homebrew on macOS, and then run the command:

brew install tigmint

Install Tigmint using Conda

conda install -c bioconda tigmint

Install Tigmint using PyPI

pip3 install tigmint

Run Tigmint using Docker

docker run -it bcgsc/tigmint

Install Tigmint from the source code

Download and extract the source code, then build it with make.

git clone https://github.com/bcgsc/tigmint && cd tigmint
cd src
make

or

curl -L https://github.com/bcgsc/tigmint/releases/download/v1.2.4/tigmint-1.2.4.tar.gz | tar xz && cd tigmint-1.2.4
cd src
make

Dependencies

Install Python package dependencies

pip3 install intervaltree pybedtools pysam numpy

Tigmint uses Bedtools, minimap2, BWA, zsh and Samtools. These dependencies may be installed using Homebrew on macOS or Linuxbrew on Linux.

Install the dependencies of Tigmint

brew install bedtools bwa samtools
brew tap brewsci/bio
brew install minimap2

Install the dependencies of ARCS (optional)

brew tap brewsci/bio
brew install arcs links-scaffolder

Install the dependencies for calculating assembly metrics (optional)

brew install abyss seqtk

Usage

To run Tigmint on the draft assembly myassembly.fa with the reads myreads.fq.gz, which have been run through longranger basic:

tigmint-make tigmint draft=myassembly reads=myreads

To run Tigmint and then scaffold the corrected assembly with ARCS:

tigmint-make arcs draft=myassembly reads=myreads

To run Tigmint and ARCS, and then calculate assembly metrics using the reference genome GRCh38.fa:

tigmint-make metrics draft=myassembly reads=myreads ref=GRCh38 G=3088269832

To run Tigmint with long reads in fasta or fastq format (myreads.fa.gz or myreads.fq.gz) on the draft assembly myassembly.fa for an organism with a genome size of gsize:

tigmint-make tigmint-long draft=myassembly reads=myreads span=auto G=gsize dist=auto
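In the commands above, the `draft` and `reads` parameters take the file names without their extensions. For anyone driving Tigmint from a Python script, the helper below shows one way to build such a command line from file names. It is a hypothetical convenience function (`tigmint_make_args` is not part of Tigmint), and it assumes the input files are in the current working directory.

```python
def _strip_suffix(filename, suffixes):
    """Return filename without the first matching suffix, or raise ValueError."""
    for suffix in suffixes:
        if filename.endswith(suffix):
            return filename[: -len(suffix)]
    raise ValueError(f"{filename!r} must end with one of {suffixes}")


def tigmint_make_args(target, draft_file, reads_file, **params):
    """Build the argument list for a tigmint-make invocation.

    target: a tigmint-make target such as "tigmint", "arcs", "metrics",
            or "tigmint-long" (the targets shown in the Usage section).
    params: extra name=value parameters, e.g. span="auto", G=3088269832.
    """
    draft = _strip_suffix(draft_file, (".fa",))
    reads = _strip_suffix(reads_file, (".fq.gz", ".fa.gz"))
    args = ["tigmint-make", target, f"draft={draft}", f"reads={reads}"]
    args += [f"{name}={value}" for name, value in params.items()]
    return args


print(" ".join(tigmint_make_args("tigmint", "myassembly.fa", "myreads.fq.gz")))
# prints tigmint-make tigmint draft=myassembly reads=myreads
```

The resulting list can be passed to `subprocess.run(args, check=True)` on a machine where Tigmint is installed.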

Note

tigmint-make commands

Parameters of Tigmint

Parameters of ARCS

Parameters of LINKS

Parameters for calculating assembly metrics

Tips

To use stLFR linked reads with Tigmint, you will need to reformat the reads so that the barcode is in a BX:Z: tag in the read header. For example, a record in this format

@V100002302L1C001R017000000#0_0_0/1 0	1
TGTCTTCCTGGACAGCTGACATCCCTTTTGTTTTTCTGTTTGCTCAGATGCTGTCTCTTATACACATCTTAGGAAGACAAGCACTGACGACATGATCACC
+
FFFFFFFGFGFFGFDFGFFFFFFFFFFFGFFF@FFFFFFFFFFFF@FFFFFFFFFGGFFEFEFFFF?FFFFGFFFGFFFFFFFGFFEFGFGGFGFFFGFF

should be changed to:

@V100002302L1C001R017000000 BX:Z:0_0_0
TGTCTTCCTGGACAGCTGACATCCCTTTTGTTTTTCTGTTTGCTCAGATGCTGTCTCTTATACACATCTTAGGAAGACAAGCACTGACGACATGATCACC
+
FFFFFFFGFGFFGFDFGFFFFFFFFFFFGFFF@FFFFFFFFFFFF@FFFFFFFFFGGFFEFEFFFF?FFFFGFFFGFFFFFFFGFFEFGFGGFGFFFGFF
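The header change shown above can be scripted. The sketch below is one possible approach, not an official Tigmint tool: the names `stlfr_header_to_bx` and `reformat_fastq` are hypothetical, and it assumes four-line FASTQ records whose headers look like the example (`@name#barcode/1` or `/2`, optionally followed by extra fields, which are dropped).

```python
import re

# Matches "@<name>#<barcode>/<1 or 2>" with optional trailing fields.
STLFR_HEADER = re.compile(r"^@(\S+)#(\S+)/[12](\s.*)?$")


def stlfr_header_to_bx(header):
    """Move the stLFR barcode from the read name into a BX:Z: tag.

    Headers that do not match the expected pattern are returned unchanged.
    """
    header = header.rstrip("\n")
    match = STLFR_HEADER.match(header)
    if match is None:
        return header
    return f"@{match.group(1)} BX:Z:{match.group(2)}"


def reformat_fastq(lines):
    """Yield FASTQ lines with every header (first line of each record) rewritten.

    Headers are located by position rather than by a leading "@", because
    quality lines may also start with "@".
    """
    for index, line in enumerate(lines):
        line = line.rstrip("\n")
        yield stlfr_header_to_bx(line) if index % 4 == 0 else line


print(stlfr_header_to_bx("@V100002302L1C001R017000000#0_0_0/1 0\t1"))
# prints @V100002302L1C001R017000000 BX:Z:0_0_0
```

For compressed input, the lines can be read with `gzip.open(path, "rt")` and the output of `reformat_fastq` written to a new file opened the same way in `"wt"` mode.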

Support

After first checking for an existing issue at https://github.com/bcgsc/tigmint/issues, please report a new issue at https://github.com/bcgsc/tigmint/issues/new. Please include the names of your input files, the exact command line that you are using, and the entire output of Tigmint.

Pipeline

Tigmint pipeline illustration