Benchmarks
Reproducible benchmarks on bare-metal servers. 3-node cluster with a dedicated Go load generator on a private VLAN.
Throughput
Section titled “Throughput”| Scenario | batch=500 | batch=1 |
|---|---|---|
| Async | 4.01M evt/s | 266K evt/s |
| Sync | 1.70M evt/s | 24.1K evt/s |
Async delivery responds before data is fsynced to disk. Sync delivery waits for WAL fsync and replication acknowledgement before responding — every event is durable on disk when the client receives “ok”.
Latency
Section titled “Latency”Round-trip latency for a single durable sync write, measured end-to-end from the load generator.
| Percentile | Latency |
|---|---|
| p50 | 171 µs |
| p90 | 198 µs |
| p99 | 224 µs |
| p999 | 250 µs |
Scaling
Section titled “Scaling”Async throughput with batch=500, varying the number of concurrent workers.
| Workers | Events/sec |
|---|---|
| 1 | 1,442K |
| 2 | 2,554K |
| 8 | 4,011K |
| 16 | 4,034K |
| 64 | 3,942K |
The cluster saturates at 8 workers with approximately 4.0M events per second. Each node runs a single-threaded event loop — throughput scales with the number of nodes, not CPU cores.
Hardware
Section titled “Hardware”All numbers were collected on Cherry Servers (US-Chicago, hourly billing):
- CPU: AMD Ryzen 7 7700X, 8 cores / 16 threads, 5.6 GHz boost
- RAM: 64 GB DDR5
- Disk: 2× Micron 7450 NVMe, enterprise grade
- Network: private VLAN, approximately 64 µs RTT between nodes
- OS: Ubuntu 22.04, kernel 6.8
Reproduce
Section titled “Reproduce”Benchmark scripts and the Go load generator are available in the benchmarks repository.
cd benchmarksmake up # provision servers, build, start clustermake bench # run all scenarios with batch=500make bench-single # run batch=1 scenariosmake clean # destroy servers and stop billingThe benchmarks README covers setup, configuration, and scenario details.