Usage
Command Line Options

| Argument | Description |
|---|---|
| `-i, --input` | Space-separated list of input genome files (FASTA/Q). |
| `-l, --list` | File containing one input file path per line. |
| `-k, --kmer-size` | k-mer length (1–64, default: 16). |
| `-s, --scaled` | FracMinHash sampling rate (integer >= 1, default: 100). |
| `--seed` | Random seed for hashing (default: 42). |
| `-a, --ani` | Compute ANI in addition to Jaccard index (percentage). |
| `-t, --threads` | Number of parallel tasks (default: 1). |
| `-f, --format` | Output format: table, csv, json, tsv (default: table). |
| `-o, --output` | Output file path (default: stdout). |
| `-m, --min-similarity` | Minimum similarity threshold (Jaccard index, or ANI if `--ani` is set). Only pairs with similarity ≥ this value are output (default: 0.0). |
| `-p, --performance` | Enable performance monitoring (total time and peak memory). The monitoring itself adds significant overhead and may slow down computation. |
| `-V, --verbose` | Print detailed progress information. |
| `-v, --version` | Show program version and exit. |
| `-h, --help` | Show this help message and exit. |

Parameters

| Option | Description |
|---|---|
| `-i, --input` | Space-separated list of input genome files. |
| `-l, --list` | File containing one input file path per line. Alternative to `-i` for handling many files. |

Supported input formats:
- FASTA (.fasta, .fa, .fna, .ffn, .frn): sequences only.
- FASTQ (.fastq, .fq): sequences and quality scores (quality is ignored for sketching).
- Compressed files (.gz, .bz2, .xz, .zip): all of the above formats can be compressed; compression is auto-detected and files are decompressed on the fly.
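
For many inputs, the list file expected by `-l` can be generated rather than written by hand. The following is a minimal sketch, not part of FracSim: the helper names `is_genome_file` and `write_genome_list` are hypothetical, and it assumes that matching on the file extensions listed above is sufficient.

```python
from pathlib import Path

# Extensions copied from the supported-format list; the helpers are illustrative only.
SEQ_EXTS = {".fasta", ".fa", ".fna", ".ffn", ".frn", ".fastq", ".fq"}
COMPRESSED_EXTS = {".gz", ".bz2", ".xz", ".zip"}


def is_genome_file(path: Path) -> bool:
    """Return True if the name ends in a supported, optionally compressed, extension."""
    suffixes = [s.lower() for s in path.suffixes]
    if suffixes and suffixes[-1] in COMPRESSED_EXTS:
        suffixes = suffixes[:-1]  # inspect the extension beneath the compression suffix
    return bool(suffixes) and suffixes[-1] in SEQ_EXTS


def write_genome_list(genome_dir: str, list_path: str) -> int:
    """Write one matching file path per line and return the number of paths written."""
    paths = sorted(
        p for p in Path(genome_dir).rglob("*") if p.is_file() and is_genome_file(p)
    )
    Path(list_path).write_text("".join(f"{p}\n" for p in paths))
    return len(paths)
```

Calling `write_genome_list("genomes", "genome_list.txt")` would produce a file that can then be passed as `fracsim -l genome_list.txt`.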
k-mer Settings

| Option | Description |
|---|---|
| `-k, --kmer-size` | k-mer length (1–64, default: 31). Longer k-mers provide higher specificity but require more memory. |
| `-s, --scaled` | FracMinHash sampling rate (integer >= 1, default: 100). Controls sketch size: higher values = smaller sketches (faster but less sensitive). |
| `--seed` | Random seed for hashing (default: 42). Change to get different hash values for the same input. |

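
To make the roles of `-k`, `-s`, and `--seed` concrete, here is a minimal sketch of FracMinHash sampling followed by a Jaccard comparison. It is an illustration under stated assumptions, not FracSim's implementation: the hash function (BLAKE2b salted with the seed), the 64-bit hash space, and all function names are choices made for this example, and reverse-complement (canonical) k-mers are ignored for brevity.

```python
import hashlib


def kmers(seq: str, k: int):
    """Yield every overlapping substring of length k from the sequence."""
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def hash_kmer(kmer: str, seed: int) -> int:
    """Map a k-mer to a 64-bit integer; a different seed gives a different mapping."""
    digest = hashlib.blake2b(
        kmer.encode(), digest_size=8, salt=seed.to_bytes(8, "little")
    ).digest()
    return int.from_bytes(digest, "little")


def frac_minhash(seq: str, k: int = 31, scaled: int = 100, seed: int = 42) -> set:
    """Keep hashes below 2**64 // scaled, i.e. roughly 1/scaled of distinct k-mers."""
    threshold = 2**64 // scaled
    return {h for h in (hash_kmer(km, seed) for km in kmers(seq, k)) if h < threshold}


def jaccard(a: set, b: set) -> float:
    """Jaccard index of two sketches; defined as 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0
```

With `scaled=1` every k-mer hash is kept, and each larger value retains a smaller fraction, which is why sketch size shrinks as the value grows.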
Output Control

| Option | Description |
|---|---|
| `-a, --ani` | Compute ANI (Average Nucleotide Identity) in addition to Jaccard index. Adds an ANI column to the output. |
| `-f, --format` | Output format: table, csv, json, tsv (default: table). Console always displays formatted results. |
| `-o, --output` | Output file path. Format is auto-detected from the extension (.csv, .json, .txt, .tsv). |
| `-m, --min-similarity` | Filter results: only output pairs with similarity ≥ this value (default: 0.0). Useful for focusing on high-similarity genomes. |

FracSim supports multiple output formats to suit different needs:
- Table: Human-readable formatted table
- CSV: Comma-separated values, ideal for spreadsheet software or further analysis
- JSON: Structured data format, perfect for programmatic processing and integration
- TSV: Tab-separated values, machine-friendly; can be imported directly by tools such as Excel, R, and Python (pandas)
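
The `--ani` option described above reports an ANI value alongside the Jaccard index. This document does not state which estimator FracSim uses, so the sketch below assumes the widely used Mash-style relation, D = -(1/k) ln(2J / (1 + J)) with ANI = 1 - D, purely to show how the two quantities can be related. The function name `ani_from_jaccard` is hypothetical.

```python
import math


def ani_from_jaccard(jaccard: float, k: int) -> float:
    """Mash-style ANI estimate as a percentage (an assumed formula, not FracSim's)."""
    if jaccard <= 0.0:
        return 0.0  # no shared k-mers: the estimate is undefined, report 0
    distance = -math.log(2.0 * jaccard / (1.0 + jaccard)) / k
    return max(0.0, 100.0 * (1.0 - distance))
```

Under this relation a moderate Jaccard index can still correspond to a high ANI, because a single mismatch disrupts up to k overlapping k-mers.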

| Option | Description |
|---|---|
| `-t, --threads` | Number of parallel tasks (default: 1). The sketch generation phase is accelerated using multiprocessing. |

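
Because each input can be sketched independently, the sketching phase parallelizes naturally across processes. The following is a minimal sketch of that pattern using Python's standard `multiprocessing.Pool`. It is an assumption about the general approach, not FracSim's code: `count_distinct_kmers` is a hypothetical stand-in for real sketch construction, and `sketch_all` is an illustrative name.

```python
from multiprocessing import Pool


def count_distinct_kmers(record):
    """Worker: stand-in for building one sketch; returns (name, distinct k-mer count)."""
    name, seq, k = record
    return name, len({seq[i:i + k] for i in range(len(seq) - k + 1)})


def sketch_all(records, threads=1):
    """Process records serially when threads <= 1, otherwise in a pool of processes."""
    if threads <= 1:
        return [count_distinct_kmers(r) for r in records]
    with Pool(processes=threads) as pool:
        # pool.map preserves input order, so results line up with the records
        return pool.map(count_distinct_kmers, records)
```

When calling `sketch_all(records, threads=8)` from a script, place the call under an `if __name__ == "__main__":` guard so that worker processes can import the module safely.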

| Option | Description |
|---|---|
| `-V, --verbose` | Print detailed progress information, including sketch sizes and processing time. |
| `-v, --version` | Show program version and exit. |
| `-h, --help` | Show this help message and exit. |

Usage Examples
Basic pairwise comparison
fracsim -i genome1.fna genome2.fna -k 31 -s 100 --ani
Batch processing multiple files
# Using space-separated list
fracsim -i genome1.fna genome2.fna genome3.fna -k 31 -s 1000
# Using list file
fracsim -l genome_list.txt -k 31 -s 1000
Controlling output
Save results as CSV
fracsim -i genome1.fna genome2.fna -k 31 -s 100 --ani -o results.csv
Filter high-similarity pairs only
fracsim -i genome1.fna genome2.fna -k 31 -s 100 --min-similarity 80
JSON output with verbose logging
fracsim -i genome1.fna genome2.fna -k 31 -s 100 --format json -o results.json -V
Parallel processing of large genomes
fracsim -i large_genome.fna reference_genome.fna -k 31 -s 100 --threads 8
Working with compressed files
# Auto-detects and decompresses .gz files
fracsim -i genome1.fna.gz genome2.fna.gz -k 31 -s 100 --ani
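
Auto-detection of compressed input can be done by inspecting a file's leading magic bytes instead of trusting its extension. The sketch below illustrates that idea with Python's standard `gzip`, `bz2`, and `lzma` modules. It is an assumption about one way to do this, not FracSim's actual logic; `open_maybe_compressed` is a hypothetical name, and `.zip` archives are left out because they need archive-member handling.

```python
import bz2
import gzip
import lzma


def open_maybe_compressed(path: str):
    """Open a file in text mode, choosing a decompressor from its magic bytes."""
    with open(path, "rb") as fh:
        magic = fh.read(6)
    if magic.startswith(b"\x1f\x8b"):  # gzip
        return gzip.open(path, "rt")
    if magic.startswith(b"BZh"):  # bzip2
        return bz2.open(path, "rt")
    if magic.startswith(b"\xfd7zXZ\x00"):  # xz
        return lzma.open(path, "rt")
    return open(path, "rt")  # assume plain text otherwise


def count_fasta_records(path: str) -> int:
    """Count FASTA header lines (those starting with '>') in a possibly compressed file."""
    with open_maybe_compressed(path) as fh:
        return sum(1 for line in fh if line.startswith(">"))
```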