Usage

Command Line Options

Argument	Description
`-i, --input`	Space-separated list of input genome files (FASTA or FASTQ).
`-l, --list`	File containing one input file path per line.
`-k, --kmer-size`	k-mer length (1–64, default: 16).
`-s, --scaled`	FracMinHash sampling rate (integer >= 1, default: 100).
`--seed`	Random seed for hashing (default: 42).
`-a, --ani`	Compute ANI in addition to Jaccard index (Percentage).
`-t, --threads`	Number of parallel tasks (default: 1).
`-f, --format`	Output format: `table`, `csv`, `json`, `tsv` (default: `table`).
`-o, --output`	Output file path (default: stdout).
`-m, --min-similarity`	Minimum similarity threshold (Jaccard index, or ANI if `--ani` is set). Only pairs with similarity ≥ this value are output (default: 0.0).
`-p, --performance`	Enable performance monitoring (total time and peak memory). Monitoring itself adds significant overhead and may slow down computation.
`-V, --verbose`	Print detailed progress information.
`-v, --version`	Show program version and exit.
`-h, --help`	Show this help message and exit.

Option	Description
`-i, --input`	Space-separated list of input genome files.
`-l, --list`	File containing one input file path per line. Alternative to `-i` for handling many files.

FASTA (.fasta, .fa, .fna, .ffn, .frn): sequences only.
FASTQ (.fastq, .fq): sequences and quality scores (quality is ignored for sketching).
Compressed files (.gz, .bz2, .xz, .zip): any of the above formats may be compressed; compression is auto-detected and files are decompressed on the fly.
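
When a directory holds many genomes, the list file expected by `-l` can be generated rather than typed by hand. The following is a minimal Python sketch, not part of FracSim: the helper names `is_sequence_file` and `write_genome_list` are hypothetical, and files are selected purely by the extensions listed above (FracSim's own detection may work differently).

```python
from pathlib import Path

# Extensions taken from the supported-format list above.
SEQUENCE_SUFFIXES = {".fasta", ".fa", ".fna", ".ffn", ".frn", ".fastq", ".fq"}
COMPRESSION_SUFFIXES = {".gz", ".bz2", ".xz", ".zip"}


def is_sequence_file(path: Path) -> bool:
    """Return True if the file name looks like a (possibly compressed) FASTA/FASTQ file."""
    suffixes = [s.lower() for s in path.suffixes]
    if suffixes and suffixes[-1] in COMPRESSION_SUFFIXES:
        suffixes = suffixes[:-1]  # ignore the compression suffix
    return bool(suffixes) and suffixes[-1] in SEQUENCE_SUFFIXES


def write_genome_list(genome_dir: str, list_path: str) -> int:
    """Write one matching file path per line; return the number of paths written."""
    paths = sorted(
        p for p in Path(genome_dir).rglob("*") if p.is_file() and is_sequence_file(p)
    )
    Path(list_path).write_text("".join(f"{p}\n" for p in paths))
    return len(paths)
```

Calling `write_genome_list("genomes", "genome_list.txt")` produces a file that can then be passed as `fracsim -l genome_list.txt`.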

Option	Description
`-k, --kmer-size`	k-mer length (1–64, default: 31). Longer k-mers provide higher specificity but require more memory.
`-s, --scaled`	FracMinHash sampling rate (integer >= 1, default: 100). Controls sketch size: roughly 1 in every `s` k-mer hashes is kept, so higher values = smaller sketches (faster but less sensitive).
`--seed`	Random seed for hashing (default: 42). Change to get different hash values for the same input.
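
To make the roles of `-k`, `-s`, and `--seed` concrete, here is a minimal Python sketch of the general FracMinHash idea. It is an illustration under stated assumptions, not FracSim's implementation: the hash function (BLAKE2b), the 64-bit hash space, and the function names are all chosen for this example, and reverse-complement (canonical) k-mers are ignored for brevity.

```python
import hashlib

MAX_HASH = 2**64  # size of the 64-bit hash space assumed in this sketch


def kmer_hash(kmer: str, seed: int) -> int:
    """Hash a k-mer to a 64-bit integer; a different seed gives different values."""
    digest = hashlib.blake2b(
        kmer.encode(), digest_size=8, salt=seed.to_bytes(8, "little")
    ).digest()
    return int.from_bytes(digest, "little")


def frac_minhash(sequence: str, k: int = 31, scaled: int = 100, seed: int = 42) -> set[int]:
    """Keep the hashes of all k-mers whose hash falls below MAX_HASH / scaled."""
    threshold = MAX_HASH // scaled
    sequence = sequence.upper()
    sketch = set()
    for i in range(len(sequence) - k + 1):
        h = kmer_hash(sequence[i:i + k], seed)
        if h < threshold:  # retains roughly 1 in `scaled` distinct k-mers
            sketch.add(h)
    return sketch


def jaccard(a: set[int], b: set[int]) -> float:
    """Jaccard index of two sketches (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0
```

With `scaled=1` every k-mer hash is kept; larger values keep a proportionally smaller subset of the same hashes, which is why sketches built with the same `k`, `scaled`, and `seed` remain comparable.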

Option	Description
`-a, --ani`	Compute ANI (Average Nucleotide Identity) in addition to Jaccard index. Adds ANI column to output.
`-f, --format`	Output format: `table`, `csv`, `json`, `tsv` (default: `table`). Console always displays formatted results.
`-o, --output`	Output file path. Format is auto-detected from extension (`.csv`, `.json`, `.txt`, `.tsv`).
`-m, --min-similarity`	Filter results: only output pairs with similarity ≥ this value (default: 0.0). Useful for focusing on high-similarity genomes.
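
The relationship between the Jaccard index, ANI, and the `-m` threshold can be sketched as follows. This example uses the Mash-style estimate, ANI ≈ 1 + (1/k)·ln(2J / (1 + J)), which is one common k-mer based approximation; whether FracSim uses this exact estimator is an assumption, as are the function names. ANI is expressed as a percentage here, while the Jaccard index stays on a 0 to 1 scale.

```python
import math


def ani_from_jaccard(jaccard: float, k: int) -> float:
    """Mash-style ANI estimate in percent (assumed estimator, for illustration)."""
    if jaccard <= 0.0:
        return 0.0  # no shared k-mers: treat as no detectable identity
    distance = -math.log(2.0 * jaccard / (1.0 + jaccard)) / k
    return max(0.0, 1.0 - distance) * 100.0


def passes_threshold(jaccard: float, k: int, min_similarity: float = 0.0,
                     use_ani: bool = False) -> bool:
    """Apply the threshold to ANI when use_ani is set, otherwise to the Jaccard index."""
    score = ani_from_jaccard(jaccard, k) if use_ani else jaccard
    return score >= min_similarity
```

Under this estimator, a modest Jaccard index can still correspond to a high ANI at large `k`, so a threshold value should be chosen on the scale of the metric it is applied to.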

FracSim supports multiple output formats to suit different needs:

Table: Human-readable formatted table
CSV: Comma-separated values, ideal for spreadsheet software or further analysis
JSON: Structured data format, perfect for programmatic processing and integration
TSV: Tab-separated values; machine-friendly and directly importable by tools such as Excel, R, and Python (pandas)

Option	Description
`-t, --threads`	Number of parallel tasks (default: 1). The sketch generation phase is accelerated using multiprocessing.

Option	Description
`-V, --verbose`	Print detailed progress information, including sketch sizes and processing time.
`-v, --version`	Show program version and exit.
`-h, --help`	Show this help message and exit.

# Compare two genomes and report ANI alongside the Jaccard index
fracsim -i genome1.fna genome2.fna -k 31 -s 100 --ani

# Using space-separated list
fracsim -i genome1.fna genome2.fna genome3.fna -k 31 -s 1000
# Using list file
fracsim -l genome_list.txt -k 31 -s 1000

# Save results to a CSV file
fracsim -i genome1.fna genome2.fna -k 31 -s 100 --ani -o results.csv

# Only report pairs at or above the similarity threshold
fracsim -i genome1.fna genome2.fna -k 31 -s 100 --min-similarity 80

# Write JSON output and print detailed progress information
fracsim -i genome1.fna genome2.fna -k 31 -s 100 --format json -o results.json -V

# Use 8 parallel tasks for sketch generation
fracsim -i large_genome.fna reference_genome.fna -k 31 -s 100 --threads 8

# Auto-detects and decompresses .gz files
fracsim -i genome1.fna.gz genome2.fna.gz -k 31 -s 100 --ani
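
The commands above can also be assembled from a script. The following minimal Python sketch builds a `fracsim` command line using only the flags documented in this section and runs it with `subprocess`; the helper names `build_fracsim_command` and `run_fracsim` are hypothetical and not part of FracSim.

```python
import shutil
import subprocess


def build_fracsim_command(inputs, kmer_size=31, scaled=100, ani=False,
                          threads=1, fmt=None, output=None):
    """Assemble an argument list from the documented command line options."""
    cmd = ["fracsim", "-i", *inputs,
           "-k", str(kmer_size), "-s", str(scaled), "-t", str(threads)]
    if ani:
        cmd.append("--ani")
    if fmt:
        cmd += ["--format", fmt]
    if output:
        cmd += ["-o", output]
    return cmd


def run_fracsim(cmd):
    """Run the command and return its standard output as text."""
    if shutil.which(cmd[0]) is None:
        raise FileNotFoundError("fracsim executable not found on PATH")
    result = subprocess.run(cmd, check=True, capture_output=True, text=True)
    return result.stdout
```

For example, `run_fracsim(build_fracsim_command(["genome1.fna", "genome2.fna"], ani=True, threads=8))` corresponds to running `fracsim -i genome1.fna genome2.fna -k 31 -s 100 -t 8 --ani`.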