Correct misassemblies using Linked Reads
Split sequences at positions with low depth of coverage and high number of molecule starts.
Written by Shaun Jackman.
Tigmint identifies and corrects misassemblies using linked reads from 10x Genomics Chromium. The reads are first aligned to the assembly, and the extents of the large DNA molecules are inferred from the alignments of the reads. The physical coverage of the large molecules is more consistent and less prone to coverage dropouts than that of the short read sequencing data. Atypical drops in physical molecule coverage, less than the median minus two times the inter-quartile range, reveal possible misassemblies. Clipped alignments of the first and last reads of a molecule are used to refine the coordinates of the misassembly with base-pair accuracy.
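The coverage cutoff described above can be sketched in a few lines of shell and awk. This is a minimal illustration, not Tigmint's actual implementation: the function name depth_cutoff is hypothetical, the input is assumed to be one molecule depth value per line, and the quartiles use the simple nearest-rank method, which may differ from the quantile definition that Tigmint uses.

```shell
# Hypothetical helper, not part of Tigmint.
# Reads one depth value per line on stdin and prints
# median - 2 * IQR, using nearest-rank quartiles (an assumption).
depth_cutoff() {
  sort -n | awk '
    function ceil(v) { return (v == int(v)) ? v : int(v) + 1 }
    { x[NR] = $1 }
    END {
      if (NR == 0) exit 1
      q1  = x[ceil(NR / 4)]
      med = x[ceil(NR / 2)]
      q3  = x[ceil(3 * NR / 4)]
      print med - 2 * (q3 - q1)
    }'
}

# Example: seven typical depths and one dropout.
printf '%s\n' 10 12 11 13 12 11 2 12 | depth_cutoff
```

With these values the quartiles are 10, 11 and 12, so the cutoff is 11 - 2 * 2 = 7, and only the depth of 2 falls below it.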
Download and extract the source code. Compiling is not needed.
git clone https://github.com/bcgsc/tigmint && cd tigmint
curl -L https://github.com/bcgsc/tigmint/archive/master.tar.gz | tar xz && mv tigmint-master tigmint && cd tigmint
Install the dependencies of Tigmint
brew tap homebrew/science
brew install bedtools bwa gawk gnu-sed miller pigz r samtools seqtk
Rscript -e 'install.packages(c("ggplot2", "rmarkdown", "tidyverse", "uniqtag"))'
Install the dependencies of ARCS (optional)
brew install arcs links-scaffolder
Install the dependencies for calculating assembly metrics (optional)
brew install abyss
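After installing, it may help to confirm that the tools are on the PATH. The sketch below is only an illustration: missing_tools is a hypothetical helper, and the executable names are assumptions about what the Homebrew packages install (for example, that miller provides mlr and that gnu-sed provides gsed on macOS).

```shell
# Hypothetical helper, not part of Tigmint.
# Prints each command name that is not found on the PATH.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# Executable names are assumptions; adjust them for your system.
missing_tools bedtools bwa gawk gsed mlr pigz Rscript samtools seqtk
```

Any name that is printed was not found. No output means every listed command was located.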
Change your current working directory to the directory in which Tigmint is installed:
To run Tigmint on the draft assembly myassembly.fa with the reads myreads.fq.gz, which have been run through
tigmint-make tigmint draft=myassembly reads=myreads
To run both Tigmint and scaffold the corrected assembly with ARCS:
tigmint-make arcs draft=myassembly reads=myreads
To run Tigmint, ARCS, and calculate assembly metrics using the reference genome
tigmint-make metrics draft=myassembly reads=myreads ref=GRCh38 G=3088269832
tigmint-make is a Makefile script, and so any make options may also be used with tigmint-make, such as
- The file extension of the assembly must be .fa and that of the reads .fq.gz, and the extension is not included in the parameters draft and reads. These specific file name requirements result from implementing the pipeline in GNU Make.
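As an illustration of the naming rule above, the parameter values can be derived from the file names with shell parameter expansion. The helper name param_name is hypothetical and is not part of Tigmint.

```shell
# Hypothetical helper, not part of Tigmint.
# Strips a trailing .fa or .fq.gz so that the result can be passed
# as draft= or reads=.
param_name() {
  name=$1
  name=${name%.fa}
  name=${name%.fq.gz}
  printf '%s\n' "$name"
}

draft=$(param_name myassembly.fa)
reads=$(param_name myreads.fq.gz)
# Print the command rather than running it, so it can be checked first.
echo "tigmint-make tigmint draft=$draft reads=$reads"
```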
tigmint: Run Tigmint, and produce a file named
arcs: Run Tigmint and ARCS, and produce a file named
metrics: Run Tigmint and ARCS, calculate assembly metrics using abyss-samtobreak, and produce TSV files.
Parameters of Tigmint
draft: Name of the draft assembly,
reads: Name of the reads,
depth_threshold=100: Depth of coverage threshold
starts_threshold=4: Number of molecule starts threshold
minsize=2000: Minimum molecule size
as=100: Minimum alignment score
nm=5: Maximum number of mismatches
t=8: Number of threads
gzip=gzip: gzip compression program, use pigz -p8 for parallelized compression
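To illustrate how depth_threshold and starts_threshold might act together, here is a rough sketch of a filter for candidate cut positions. It is not Tigmint's code. The function name, the three-column input layout, and the exact comparisons (depth strictly below its threshold, molecule starts at or above theirs) are all assumptions made for the example.

```shell
# Hypothetical helper, not part of Tigmint.
# Input (assumed layout): position, depth of coverage, number of
# molecule starts, separated by whitespace.
# Prints positions whose depth is below the first argument and whose
# number of molecule starts is at least the second argument.
candidate_breakpoints() {
  awk -v depth="${1:-100}" -v starts="${2:-4}" \
    '$2 < depth && $3 >= starts { print $1 }'
}

printf '%s\n' '1000 250 1' '2000 80 5' '3000 90 2' '4000 300 6' |
  candidate_breakpoints 100 4
```

Of the four example rows, only position 2000 has both low depth (80) and enough molecule starts (5), so it is the only position printed.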
Parameters of ARCS
Parameters of LINKS
Parameters for calculating assembly metrics
ref: Reference genome, ref.fa, for calculating assembly contiguity metrics
G: Size of the reference genome, for calculating NG50 and NGA50
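To show why G is needed, the following sketch computes NG50 under its usual definition: the length of the sequence at which the running total of lengths, taken from longest to shortest, first reaches half of G. The function ng50 is a hypothetical helper for illustration only. It is not how the pipeline computes its metrics, and it does not cover NGA50, which also requires alignments to the reference.

```shell
# Hypothetical helper, not part of Tigmint or ABySS.
# Reads one sequence length per line on stdin; the first argument is G.
# Prints the length at which the running total, from longest to
# shortest, first reaches half of G. Prints nothing if it never does.
ng50() {
  sort -rn | awk -v g="$1" '
    { sum += $1 }
    sum >= g / 2 { print $1; exit }'
}

printf '%s\n' 10 50 30 20 40 | ng50 200
```

Here half of G is 100, and the running totals are 50, 90 and 120, so the third-longest sequence gives an NG50 of 30.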
After first looking for an existing issue at https://github.com/bcgsc/tigmint/issues, please report a new issue at https://github.com/bcgsc/tigmint/issues/new. Please report the names of your input files, the exact command line that you are using, and the entire output of