FracSim

Latest Version Conda PyPI Python 3.8+ Platform License: MIT

FracSim is a fast, memory-efficient command-line tool for estimating bacterial genome similarity using the FracMinHash sketching algorithm. It computes both Jaccard index and Average Nucleotide Identity (ANI) between genomes, enabling large-scale comparative genomics studies.


✨ Key Features

  • Fast: FracMinHash sketching substantially reduces runtime and memory footprint (≈33 MB per genome pair).
  • Accurate: Provides Jaccard index and ANI estimates; achieves a mean absolute error (MAE) below 0.25% for genome pairs with ANI above 95%.
  • Fully self‑contained: No dependency on BioPython, k‑mer counting libraries, or heavy bioinformatics frameworks.
  • Flexible input: Supports FASTA/FASTQ (plain or gzip/bzip2/xz/zip compressed), single or multiple files, and file lists.
  • Easy to use: Clean command‑line interface with multi‑processing support and progress indicators.
  • Multiple output formats: CSV, TSV, or JSON; results can be written to a file.
  • Open source: MIT licensed and hosted on GitHub; contributions and usage are welcome.

🚀 Quick Example

Compare two E. coli genomes:

fracsim -i ecoli_k12.fasta ecoli_o157.fasta -k 21 -s 100 --ani
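
For intuition about what a comparison like this estimates, here is a minimal Python sketch of the general FracMinHash idea. It is not FracSim's implementation. The function names, the BLAKE2b hash, and the Mash-style Jaccard-to-ANI conversion are all assumptions chosen for illustration, and the `k` and `scale` parameters are not claimed to correspond to the `-k` and `-s` flags above.

```python
import hashlib
import math
import random

MAX_HASH = 2 ** 64  # size of the 64-bit hash space used below

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def kmer_hash(kmer):
    """Deterministic 64-bit hash of a k-mer (BLAKE2b is an illustrative choice)."""
    digest = hashlib.blake2b(kmer.encode("ascii"), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def frac_sketch(seq, k=21, scale=100):
    """Keep the hashes of canonical k-mers that fall in the lowest 1/scale of hash space."""
    threshold = MAX_HASH // scale
    seq = seq.upper()
    sketch = set()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) - set("ACGT"):
            continue  # skip k-mers containing ambiguous bases such as N
        canonical = min(kmer, reverse_complement(kmer))
        h = kmer_hash(canonical)
        if h < threshold:
            sketch.add(h)
    return sketch


def jaccard(a, b):
    """Jaccard index of two sketches; defined as 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def ani_from_jaccard(j, k):
    """Mash-style estimate: ANI = 1 + ln(2J / (1 + J)) / k, clamped to [0, 1]."""
    if j <= 0:
        return 0.0
    return max(0.0, min(1.0, 1.0 + math.log(2 * j / (1 + j)) / k))


if __name__ == "__main__":
    random.seed(0)
    genome = "".join(random.choice("ACGT") for _ in range(5000))
    # Toy "relative": substitute one base at every 50th position.
    mutated = "".join(
        ("C" if c == "A" else "A") if i % 50 == 0 else c
        for i, c in enumerate(genome)
    )
    a = frac_sketch(genome, k=21, scale=10)
    b = frac_sketch(mutated, k=21, scale=10)
    j = jaccard(a, b)
    print("Jaccard:", j, "ANI estimate:", ani_from_jaccard(j, 21))
```

Because the threshold depends only on the hash value, both genomes retain the same subset of shared k-mers. That is why the Jaccard index of the two sketches approximates the Jaccard index of the full k-mer sets while storing roughly 1/scale of the hashes.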