FracSim
FracSim is a fast, memory-efficient command-line tool for estimating bacterial genome similarity with the FracMinHash sketching algorithm. It computes both the Jaccard index and the Average Nucleotide Identity (ANI) between genomes, which makes large-scale comparative genomics studies practical.
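The sketching idea can be illustrated with a short Python example. This is not FracSim's code, and the document does not say which language FracSim is written in. Several pieces are illustrative assumptions: the BLAKE2b hash, the hypothetical names `frac_minhash`, `jaccard`, and `ani_from_jaccard`, and the Mash-style conversion ANI ≈ (2J / (1 + J))^(1/k). Each genome keeps only the canonical k-mer hashes that fall below `MAX_HASH / scale`. Because both genomes use the same threshold, their sketches can be compared directly with set operations.

```python
import hashlib

MAX_HASH = 2**64  # hashes are 64-bit unsigned integers
_COMPLEMENT = str.maketrans("ACGT", "TGCA")
_VALID = frozenset("ACGT")


def kmer_hash(kmer: str) -> int:
    """64-bit hash of a k-mer (BLAKE2b here: an illustrative choice, not FracSim's)."""
    return int.from_bytes(hashlib.blake2b(kmer.encode(), digest_size=8).digest(), "big")


def canonical(kmer: str) -> str:
    """Return the lexicographically smaller of a k-mer and its reverse complement."""
    return min(kmer, kmer.translate(_COMPLEMENT)[::-1])


def frac_minhash(seq: str, k: int = 21, scale: int = 100) -> set[int]:
    """Keep every canonical k-mer hash below MAX_HASH / scale (about 1/scale of all k-mers)."""
    threshold = MAX_HASH // scale
    seq = seq.upper()
    sketch = set()
    for i in range(len(seq) - k + 1):
        kmer = seq[i : i + k]
        if not _VALID.issuperset(kmer):  # skip k-mers containing N or other ambiguity codes
            continue
        h = kmer_hash(canonical(kmer))
        if h < threshold:
            sketch.add(h)
    return sketch


def jaccard(a: set[int], b: set[int]) -> float:
    """Jaccard index of two FracMinHash sketches built with the same k and scale."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def ani_from_jaccard(j: float, k: int) -> float:
    """Mash-style point estimate (an assumption): ANI ~= (2J / (1 + J)) ** (1 / k)."""
    if j <= 0.0:
        return 0.0
    return (2.0 * j / (1.0 + j)) ** (1.0 / k)
```

The `-s` flag in the quick example may be a scale factor like `scale` here, though this is an assumption. If so, `-s 100` would keep roughly 1% of k-mers. That subsampling is what keeps memory low, at the cost of some sampling noise in the Jaccard estimate.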
✨ Key Features
- Fast: FracMinHash sketching keeps only a small, fixed fraction of each genome's k-mer hashes, which sharply reduces runtime and memory footprint (≈33 MB per pair).
- Accurate: Provides Jaccard index and ANI estimates, with a mean absolute error (MAE) below 0.25% for ANI above 95%.
- Fully self‑contained: No dependency on BioPython, k‑mer counting libraries, or heavy bioinformatics frameworks.
- Flexible input: Supports FASTA/FASTQ (plain or gzip/bzip2/xz/zip compressed), single or multiple files, and file lists.
- Easy to use: Clean command‑line interface with multi‑processing support and progress indicators.
- Multiple output formats: results in CSV, TSV, or JSON, optionally saved to a file.
- Open source: MIT licensed and hosted on GitHub; contributions and usage are welcome.
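The input handling described under "Flexible input" can also be sketched in Python. The helpers `open_sequence_file` and `read_sequences` are hypothetical and are not FracSim's API. The sketch makes three assumptions: it detects compression by magic bytes rather than by file extension, a zip archive holds a single sequence file, and FASTQ records span exactly four lines.

```python
import bz2
import gzip
import io
import lzma
import zipfile
from typing import Iterator, TextIO


def open_sequence_file(path: str) -> TextIO:
    """Open plain, gzip, bzip2, xz, or zip input as text, detected by magic bytes."""
    with open(path, "rb") as raw:
        head = raw.read(6)
    if head.startswith(b"\x1f\x8b"):
        return gzip.open(path, "rt")
    if head.startswith(b"BZh"):
        return bz2.open(path, "rt")
    if head.startswith(b"\xfd7zXZ\x00"):
        return lzma.open(path, "rt")
    if head.startswith(b"PK\x03\x04"):
        # Assumption: one sequence file per archive; its contents are read into memory.
        with zipfile.ZipFile(path) as zf:
            return io.StringIO(zf.read(zf.namelist()[0]).decode("utf-8"))
    return open(path, "rt")


def read_sequences(path: str) -> Iterator[tuple[str, str]]:
    """Yield (header, sequence) pairs from a FASTA or FASTQ file."""
    with open_sequence_file(path) as fh:
        lines = (line.rstrip("\r\n") for line in fh)
        lines = (line for line in lines if line)  # ignore blank lines
        first = next(lines, None)
        if first is None:
            return
        if first.startswith(">"):  # FASTA: a sequence may span several lines
            header, chunks = first[1:], []
            for line in lines:
                if line.startswith(">"):
                    yield header, "".join(chunks)
                    header, chunks = line[1:], []
                else:
                    chunks.append(line)
            yield header, "".join(chunks)
        elif first.startswith("@"):  # FASTQ: assumes 4-line records
            header = first
            while header is not None:
                seq = next(lines)
                next(lines)  # '+' separator line
                next(lines)  # quality line
                yield header[1:], seq
                header = next(lines, None)
        else:
            raise ValueError(f"{path}: not FASTA or FASTQ")
```

Because the format is detected from the leading bytes, a mislabeled file (for example, gzip data saved as `.fasta`) is still read correctly.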
🚀 Quick Example
Compare two E. coli genomes:
fracsim -i ecoli_k12.fasta ecoli_o157.fasta -k 21 -s 100 --ani