Usage

Command Line Options

Argument          Description
-i, --input       Space-separated list of input genome files (FASTA/FASTQ).
-l, --list        File containing one input file path per line.
-k, --kmer-size   k-mer length (1–64, default: 31).
-s, --scaled      FracMinHash sampling rate (0–1, default: 0.01).
--seed            Random seed for hashing (default: 42).
-a, --ani         Compute ANI in addition to Jaccard index.
-t, --threads     Number of parallel tasks (default: 1).
-f, --format      Output format: table, csv, json (default: table).
-o, --output      Output file path (default: stdout).
--min-similarity  Only output pairs with Jaccard index ≥ this value (default: 0.0).
-V, --verbose     Print detailed progress information.
-v, --version     Show program version and exit.
-h, --help        Show this help message and exit.

Parameters

Input Options

Option       Description
-i, --input  Space-separated list of input genome files.
-l, --list   File containing one input file path per line. An alternative to -i when handling many files.
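A list file for -l can be generated rather than typed by hand. The following is a minimal sketch, assuming the genomes sit in a single directory and are recognized by the extensions listed under "Supported file formats"; the helper names `is_sequence_file` and `write_genome_list` are illustrative and not part of FracSim.

```python
from pathlib import Path

# Extensions taken from the "Supported file formats" list.
EXTENSIONS = (".fasta", ".fa", ".fna", ".ffn", ".frn", ".fastq", ".fq")
COMPRESSION = (".gz", ".bz2", ".xz", ".zip")


def is_sequence_file(path):
    """Return True if the file name looks like a (possibly compressed) FASTA/FASTQ file."""
    name = path.name.lower()
    for comp in COMPRESSION:
        if name.endswith(comp):
            name = name[: -len(comp)]  # strip one compression suffix
            break
    return name.endswith(EXTENSIONS)


def write_genome_list(directory, list_path="genome_list.txt"):
    """Write one input path per line, the layout -l expects, and return the paths."""
    paths = sorted(
        p for p in Path(directory).iterdir() if p.is_file() and is_sequence_file(p)
    )
    Path(list_path).write_text("".join(f"{p}\n" for p in paths))
    return paths
```

The resulting file can then be passed as `fracsim -l genome_list.txt`.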

Supported file formats

  • FASTA (.fasta, .fa, .fna, .ffn, .frn): sequences only.
  • FASTQ (.fastq, .fq): sequences and quality scores (quality is ignored for sketching).
  • Compressed files (.gz, .bz2, .xz, .zip): any of the above formats may be compressed; compression is auto-detected and files are decompressed on the fly.
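One common way to auto-detect compression is to inspect the first few bytes of the file instead of trusting its extension. The sketch below illustrates that idea for gzip, bzip2, and xz using only the Python standard library. It is an assumption about how such detection can work, not FracSim's actual implementation, and it leaves out .zip archives for brevity; `open_maybe_compressed` is a hypothetical helper name.

```python
import bz2
import gzip
import lzma

# Leading "magic" bytes that identify each compression format.
MAGIC = {
    b"\x1f\x8b": gzip.open,       # gzip
    b"BZh": bz2.open,             # bzip2
    b"\xfd7zXZ\x00": lzma.open,   # xz
}


def open_maybe_compressed(path):
    """Open a sequence file as text, decompressing it if the magic bytes match."""
    with open(path, "rb") as fh:
        head = fh.read(6)  # the longest signature above is 6 bytes
    for magic, opener in MAGIC.items():
        if head.startswith(magic):
            return opener(path, "rt")
    return open(path, "rt")  # no known signature: treat as plain text
```

Because the decision is based on content, a gzip-compressed file is still read correctly if it was saved without the .gz suffix.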

k-mer Settings

Option           Description
-k, --kmer-size  k-mer length (1–64, default: 31). Longer k-mers provide higher specificity but require more memory.
-s, --scaled     FracMinHash sampling rate (0–1, default: 0.01). Controls sketch size: lower values produce smaller sketches (faster but less sensitive).
--seed           Random seed for hashing (default: 42). Change it to obtain different hash values for the same input.
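To make the interaction of these three settings concrete, here is a minimal FracMinHash sketch in Python. It hashes every k-mer to a 64-bit integer and keeps only hashes below `scaled × 2^64`, so roughly a `scaled` fraction of distinct k-mers survives. The choice of hash function (seeded BLAKE2b), the use of canonical k-mers, and the skipping of k-mers with non-ACGT characters are assumptions made for illustration; FracSim's internals may differ.

```python
import hashlib

MAX_HASH = 2**64
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(kmer):
    """Return the lexicographically smaller of a k-mer and its reverse complement."""
    rc = kmer.translate(_COMPLEMENT)[::-1]
    return min(kmer, rc)


def fracminhash(seq, k=31, scaled=0.01, seed=42):
    """Return the set of 64-bit k-mer hashes that fall below scaled * 2**64."""
    threshold = int(MAX_HASH * scaled)
    seed_bytes = seed.to_bytes(8, "little")  # assumes a non-negative seed
    sketch = set()
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        kmer = seq[i : i + k]
        if set(kmer) - set("ACGT"):
            continue  # skip k-mers containing N or other ambiguity codes
        digest = hashlib.blake2b(
            canonical(kmer).encode(), digest_size=8, key=seed_bytes
        ).digest()
        h = int.from_bytes(digest, "little")
        if h < threshold:
            sketch.add(h)
    return sketch
```

With the same seed, a sketch built at a lower `scaled` value is always a subset of one built at a higher value, which is why lowering -s shrinks the sketch without changing which hashes are comparable between genomes.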

Output Control

Option            Description
-a, --ani         Compute ANI (Average Nucleotide Identity) in addition to Jaccard index. Adds an ANI column to the output.
-f, --format      Output format: table, csv, json (default: table). Console always displays formatted results.
-o, --output      Output file path (default: stdout). Format is auto-detected from extension (.csv, .json, .txt).
--min-similarity  Filter results: only output pairs with Jaccard index ≥ this value (default: 0.0). Useful for focusing on high-similarity genomes.
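The relationship between these options can be illustrated with a short Python sketch: the Jaccard index is computed from two sketches (sets of hashes), an ANI estimate is derived from it, and pairs below a threshold are dropped. The ANI formula used here is the Mash-style estimate, ANI ≈ 1 + ln(2J / (1 + J)) / k. Whether FracSim uses this exact estimator is an assumption, and `filter_pairs` is a hypothetical helper.

```python
import math


def jaccard(a, b):
    """Jaccard index of two hash sets: |A ∩ B| / |A ∪ B|."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def ani_from_jaccard(j, k=31):
    """Mash-style ANI estimate from a Jaccard index (an assumed estimator)."""
    if j <= 0:
        return 0.0
    return max(0.0, 1.0 + math.log(2 * j / (1 + j)) / k)


def filter_pairs(pairs, min_similarity=0.0):
    """Keep (name_a, name_b, jaccard) tuples with Jaccard >= min_similarity."""
    return [p for p in pairs if p[2] >= min_similarity]
```

Note that under this estimator even a moderate Jaccard index maps to a high ANI, because a single mismatch disrupts up to k overlapping k-mers; a --min-similarity threshold applies to the Jaccard value, not to ANI.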

Output Formats

FracSim supports three output formats:

  • Table: human-readable formatted table (default).
  • CSV: comma-separated values, suited to spreadsheet software or further analysis.
  • JSON: structured data, suited to programmatic processing and integration.
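For programmatic use, the JSON format can be consumed from a script. The sketch below builds a command line from the flags documented above, runs it, and loads the resulting file. It assumes `fracsim` is on the PATH; `build_command` and `run_fracsim` are hypothetical wrapper names, and the layout of the JSON document is not described in this section, so the code returns the parsed object without assuming any keys.

```python
import json
import subprocess


def build_command(inputs, output, kmer_size=31, scaled=0.01, ani=False):
    """Assemble a fracsim argument list using only the documented flags."""
    cmd = [
        "fracsim", "-i", *inputs,
        "-k", str(kmer_size),
        "-s", str(scaled),
        "--format", "json",
        "-o", str(output),
    ]
    if ani:
        cmd.append("--ani")
    return cmd


def run_fracsim(inputs, output="results.json", **options):
    """Run fracsim and return the parsed JSON results."""
    subprocess.run(build_command(inputs, output, **options), check=True)
    with open(output) as fh:
        # The JSON schema is not documented here; inspect it before relying on keys.
        return json.load(fh)
```

Passing the arguments as a list avoids shell quoting problems with file names that contain spaces.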

Performance

Option         Description
-t, --threads  Number of parallel tasks (default: 1). The sketch generation phase is accelerated using multiprocessing.

Information

Option         Description
-V, --verbose  Print detailed progress information, including sketch sizes and processing time.
-v, --version  Show program version and exit.
-h, --help     Show this help message and exit.

Usage Examples

Basic pairwise comparison

fracsim -i genome1.fna genome2.fna -k 31 -s 0.01 --ani 

Batch processing multiple files

# Using space-separated list
fracsim -i genome1.fna genome2.fna genome3.fna -k 31 -s 0.001
# Using list file
fracsim -l genome_list.txt -k 31 -s 0.001

Controlling output

# Save results as CSV
fracsim -i genome1.fna genome2.fna -k 31 -s 0.01 --ani -o results.csv
# Filter high-similarity pairs only
fracsim -i genome1.fna genome2.fna -k 31 -s 0.01 --min-similarity 0.8
# JSON output with verbose logging
fracsim -i genome1.fna genome2.fna -k 31 -s 0.01 --format json -o results.json -V

Performance optimization

# Parallel processing of large genomes
fracsim -i large_genome.fna reference_genome.fna -k 31 -s 0.01 --threads 8

Working with compressed files

# Auto-detects and decompresses .gz files
fracsim -i genome1.fna.gz genome2.fna.gz -k 31 -s 0.01 --ani