Benchmarks

Reproducible benchmarks on bare-metal servers. 3-node cluster with a dedicated Go load generator on a private VLAN.

Throughput

Scenario	batch=500	batch=1
Async	4.01M evt/s	266K evt/s
Sync	1.70M evt/s	24.1K evt/s

Async delivery responds before data is fsynced to disk. Sync delivery waits for WAL fsync and replication acknowledgement before responding — every event is durable on disk when the client receives “ok”.

Latency

Round-trip latency for a single durable sync write, measured end-to-end from the load generator.

Percentile	Latency
p50	171 µs
p90	198 µs
p99	224 µs
p999	250 µs

Scaling

Async throughput with batch=500, varying the number of concurrent workers.

Workers	Events/sec
1	1,442K
2	2,554K
8	4,011K
16	4,034K
64	3,942K

The cluster saturates at 8 workers with approximately 4.0M events per second. Each node runs a single-threaded event loop — throughput scales with the number of nodes, not CPU cores.

Hardware

All numbers were collected on Cherry Servers (US-Chicago, hourly billing):

CPU: AMD Ryzen 7 7700X, 8 cores / 16 threads, 5.6 GHz boost
RAM: 64 GB DDR5
Disk: 2× Micron 7450 NVMe, enterprise grade
Network: private VLAN, approximately 64 µs RTT between nodes
OS: Ubuntu 22.04, kernel 6.8

Reproduce

Benchmark scripts and the Go load generator are available in the benchmarks repository.

cd benchmarks
make up            # provision servers, build, start cluster
make bench         # run all scenarios with batch=500
make bench-single  # run batch=1 scenarios
make clean         # destroy servers and stop billing

The benchmarks README covers setup, configuration, and scenario details.