Skip to content

Benchmarks

Reproducible benchmarks on bare-metal servers. 3-node cluster with a dedicated Go load generator on a private VLAN.

Scenariobatch=500batch=1
Async4.01M evt/s266K evt/s
Sync1.70M evt/s24.1K evt/s

Async delivery responds before data is fsynced to disk. Sync delivery waits for WAL fsync and replication acknowledgement before responding — every event is durable on disk when the client receives “ok”.

Round-trip latency for a single durable sync write, measured end-to-end from the load generator.

PercentileLatency
p50171 µs
p90198 µs
p99224 µs
p999250 µs

Async throughput with batch=500, varying the number of concurrent workers.

WorkersEvents/sec
11,442K
22,554K
84,011K
164,034K
643,942K

The cluster saturates at 8 workers with approximately 4.0M events per second. Each node runs a single-threaded event loop — throughput scales with the number of nodes, not CPU cores.

All numbers were collected on Cherry Servers (US-Chicago, hourly billing):

  • CPU: AMD Ryzen 7 7700X, 8 cores / 16 threads, 5.6 GHz boost
  • RAM: 64 GB DDR5
  • Disk: 2× Micron 7450 NVMe, enterprise grade
  • Network: private VLAN, approximately 64 µs RTT between nodes
  • OS: Ubuntu 22.04, kernel 6.8

Benchmark scripts and the Go load generator are available in the benchmarks repository.

Terminal window
cd benchmarks
make up # provision servers, build, start cluster
make bench # run all scenarios with batch=500
make bench-single # run batch=1 scenarios
make clean # destroy servers and stop billing

The benchmarks README covers setup, configuration, and scenario details.