
How it stays fast

Unimeter is meant to handle the event rates that real production billing systems throw at it, which is sometimes hundreds of thousands of events per second from a single customer during traffic peaks. It is also meant to never lose a committed event. Those two goals normally pull in opposite directions, because durability means syncing to disk and disk is slow. This page explains how we reconciled them, so you can judge whether the design fits your situation.

One binary, written in Zig, no dependencies


Unimeter is a single statically linked executable. It does not pull in a database, a message broker, or an orchestration framework. Everything it needs is compiled into the binary, which is a few megabytes and has no runtime dependencies beyond a modern Linux kernel.

The language choice is Zig. We wanted a compiled, systems-level language with explicit memory management and no hidden control flow, which ruled out Go and the JVM languages. We wanted something that compiles quickly and produces a predictable binary, which, for our taste, ruled out Rust. Zig fits, and its standard library has first-class support for the Linux primitives we rely on.

Being dependency-free has practical benefits beyond simplicity. There is no PostgreSQL to upgrade, no Kafka cluster to operate, no version-skew concerns between internal services. You run one binary on some machines and monitor it. That is the whole operational surface.

The hot path of Unimeter, the place where an event spends its milliseconds from arrival to acknowledgement, never talks to a database. It talks to the kernel through io_uring and to a few hundred megabytes of carefully laid out in-process memory. That is it.

When an event arrives, it is appended to a write-ahead log on disk through an asynchronous io_uring submission. While the disk I/O is in flight, the event is also added to a running aggregate in memory, which is a hash map keyed by account and metric. When the kernel confirms the disk write, the client gets an acknowledgement. The whole path is zero-allocation on the critical path; buffers are preallocated at startup and reused forever.

In-memory aggregates are what make reads fast. Every time a new event updates the running total for a customer, the in-memory record is updated in place. Answering a query is an O(1) hash lookup that returns the current total in about a microsecond. There is no aggregation query, no scan, no join.
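The shape of the write and read paths can be sketched as follows. This is an illustrative model only: the names are made up for this sketch, and Unimeter's real hot path is Zig with preallocated buffers and io_uring, not Python dicts.

```python
# Sketch of the in-memory aggregate map: one running total per
# (account, metric) key, updated in place on every ingested event.
# Names here are illustrative, not Unimeter's actual API.

aggregates: dict[tuple[str, str], float] = {}

def apply_event(account: str, metric: str, value: float) -> None:
    """Fold one event into the running total (done while the WAL write is in flight)."""
    key = (account, metric)
    aggregates[key] = aggregates.get(key, 0.0) + value

def query_total(account: str, metric: str) -> float:
    """Answering a query is a single O(1) hash lookup: no scan, no join."""
    return aggregates.get((account, metric), 0.0)

apply_event("acct-1", "api_calls", 3)
apply_event("acct-1", "api_calls", 2)
print(query_total("acct-1", "api_calls"))  # 5.0
```

The point of the sketch is the shape of the read path: the total is already computed by the time a query arrives, so the query does no work beyond the lookup.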

The in-memory aggregates are fast but volatile. If the process crashes, the memory is gone. The write-ahead log is what makes them durable. Every event is appended to the log with a checksum chain before the client is acknowledged, and the log is the source of truth. On restart, Unimeter replays the log to rebuild the in-memory aggregates and then continues serving traffic.
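A checksum chain works by having each record's checksum cover its payload plus the previous record's checksum, so a corrupted or torn record invalidates everything after it during replay. The sketch below uses CRC32 purely for illustration; the real log format and checksum algorithm are Unimeter internals.

```python
import zlib

# Sketch of a checksum-chained log. Each record stores (payload, crc),
# where crc is seeded with the previous record's crc, linking the
# records into a chain that replay can verify front to back.

def append(log: list[tuple[bytes, int]], payload: bytes) -> None:
    prev = log[-1][1] if log else 0
    crc = zlib.crc32(payload, prev)  # chain: seed with the previous checksum
    log.append((payload, crc))

def replay(log: list[tuple[bytes, int]]) -> list[bytes]:
    """Return the prefix of records whose chain verifies; stop at the first break."""
    good, prev = [], 0
    for payload, crc in log:
        if zlib.crc32(payload, prev) != crc:
            break  # corruption detected: everything after this point is suspect
        good.append(payload)
        prev = crc
    return good

log: list[tuple[bytes, int]] = []
append(log, b"event-1")
append(log, b"event-2")
append(log, b"event-3")
log[1] = (b"tampered", log[1][1])  # corrupt the middle record in place
print(replay(log))                  # [b'event-1'] -- chain breaks at record 2
```

Because replay stops at the first broken link, a crash mid-write can at worst lose the uncommitted tail of the log, never a record that was acknowledged.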

To keep startup fast even after the log has grown large, Unimeter periodically writes a checkpoint to disk. A checkpoint is a snapshot of all the in-memory aggregates at a specific log position. On startup, Unimeter loads the latest checkpoint and then replays only the events that arrived after it. This keeps recovery time proportional to the amount of recent activity rather than the full history of the system.
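Recovery with checkpoints reduces to: load the snapshot, then fold in only the log entries written after the position it covers. A minimal sketch, with illustrative formats (the real checkpoint layout is internal to Unimeter):

```python
# Sketch of checkpoint-based recovery: a checkpoint is a snapshot of the
# aggregate map taken at a known log position, so startup replays only
# the log tail written after that position.

def take_checkpoint(aggregates: dict, log_position: int) -> tuple[dict, int]:
    return (dict(aggregates), log_position)  # snapshot + the position it covers

def recover(checkpoint: tuple[dict, int], log: list[tuple[int, str, int]]) -> dict:
    """Load the snapshot, then fold in only events past the checkpointed position."""
    aggregates, position = dict(checkpoint[0]), checkpoint[1]
    for pos, key, value in log:
        if pos > position:  # events at or before `position` are already in the snapshot
            aggregates[key] = aggregates.get(key, 0) + value
    return aggregates

log = [(1, "acct-1", 10), (2, "acct-1", 5), (3, "acct-2", 7)]
cp = take_checkpoint({"acct-1": 15}, 2)  # snapshot taken after positions 1-2
print(recover(cp, log))                   # {'acct-1': 15, 'acct-2': 7}
```

The recovery cost is proportional to the tail length, which is why restart time tracks recent activity rather than total history.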

The log and checkpoints together mean that a crash never loses a committed event and never requires a long rebuild. In our tests, a server with a year of production data typically restarts and resumes serving within a few seconds.

A single node’s log can still be lost to hardware failure, so Unimeter replicates every event to a quorum of cluster nodes before acknowledging the client. The replication protocol is Viewstamped Replication, the same algorithm used in TigerBeetle and similar in spirit to Raft. We chose it because it is well-studied, it handles leader failure cleanly, and its correctness properties have been proven formally.
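The commit rule itself is simple to state, even though the full protocol around view changes is not: the leader acknowledges the client once a majority of the cluster, counting itself, has durably written the event. A sketch of just that quorum rule (not of Viewstamped Replication as a whole):

```python
# Sketch of the quorum rule only, not the full Viewstamped Replication
# protocol: an event is committed once a majority of nodes, leader
# included, have persisted it.

def quorum(cluster_size: int) -> int:
    return cluster_size // 2 + 1

def can_ack(acks_received: int, cluster_size: int) -> bool:
    """acks_received counts nodes that persisted the event, leader included."""
    return acks_received >= quorum(cluster_size)

# In a 3-node cluster, the leader plus one replica is enough: 2 of 3 disks.
print(quorum(3))      # 2
print(can_ack(2, 3))  # True
print(can_ack(1, 3))  # False -- the leader's own write alone is not a quorum
```

A majority quorum is what makes the guarantee survive a node loss: any future leader election also requires a majority, and any two majorities overlap in at least one node that has the committed event.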

In practice this means that, with sync delivery, every event your application sends is written to at least two of three disks before your client.Ingest call returns. With async delivery the acknowledgement comes slightly earlier and replication completes in the background within milliseconds, which is usually fine for usage metering and buys you higher throughput.

Replication also gives you leader failover. If the leader for a partition crashes, the remaining nodes run an election and one of them becomes the new leader within a second or two. Clients notice the change through a redirect response and update their partition map automatically. No manual intervention is needed.

A lot of distributed systems solve the client-side routing problem by putting a proxy in front of the cluster. Every write and every read goes through a proxy that knows the topology. This adds a network hop and a point of failure, and it means the proxy itself needs to be load-balanced, highly available, and operated.

Unimeter takes the approach used by ScyllaDB, Cassandra, and Redis Cluster instead. The client library is smart. It downloads the cluster topology on startup, keeps it cached, and sends each request directly to the node that owns the relevant data. When the topology changes, the cluster pushes an update to every connected client and they refresh. The client handles redirects gracefully, so even if a node’s cached topology is stale, the worst case is one extra roundtrip.
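The client-side routing loop can be sketched like this. Everything here is illustrative: the partition map structure, the redirect response shape, and the node names are assumptions, not Unimeter's wire protocol.

```python
# Sketch of smart-client routing: the client caches a partition map and
# sends each request straight to the owning node. If its map is stale,
# the node answers with a redirect, the client patches its map, and the
# worst case is one extra roundtrip.

def owner_of(key: str, partition_map: dict[int, str], num_partitions: int) -> str:
    return partition_map[hash(key) % num_partitions]

def request(key: str, partition_map: dict[int, str], cluster, num_partitions: int):
    node = owner_of(key, partition_map, num_partitions)
    resp = cluster[node](key)
    if resp[0] == "redirect":  # stale map: the node tells us the new owner
        partition_map[hash(key) % num_partitions] = resp[1]
        resp = cluster[resp[1]](key)  # retry once against the real owner
    return resp

num_partitions = 4
partition_map = {p: "node-a" for p in range(num_partitions)}  # stale everywhere

cluster = {
    "node-a": lambda key: ("redirect", "node-b"),  # node-b took over ownership
    "node-b": lambda key: ("ok", f"total for {key}"),
}

print(request("acct-1", partition_map, cluster, num_partitions))
# ('ok', 'total for acct-1'), and the cached map entry is now corrected
```

After the first redirect, subsequent requests for keys in that partition go directly to the right node, which is why staleness costs at most one extra roundtrip per partition rather than a hop on every request.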

The benefit is dramatic. There is no proxy tier to operate. There is no extra network hop between your application and the storage node. Your application talks to Unimeter with one TCP roundtrip per request, and that roundtrip goes directly to the node that has the data in memory.

Distributed systems are hard to test because failures that matter are rare in practice and happen in bad combinations. Unimeter is tested with a deterministic simulator (VOPR) that runs the full cluster logic in a single process with a seeded random number generator that controls when nodes crash, when messages are dropped, when the network partitions, and when disks corrupt writes. For every seed, the simulator checks a set of invariants after each simulated step.

The invariants cover the properties that matter. No two nodes ever think they are leader for the same partition in the same view. A committed event is never lost after a view change. Aggregate sums always equal the sum of events that fed them. Filtered totals match the events that carry the matching tag. Running the simulator for ten thousand iterations across several seeds takes a minute and exercises millions of operations under chaos conditions.
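The core idea of deterministic simulation can be shown in miniature: one seeded PRNG drives every nondeterministic choice, and an invariant is asserted after every step. The toy below checks only the "aggregate equals the sum of logged events" invariant on a single node through random crash-and-replay cycles; the real VOPR simulates the whole cluster with message loss, partitions, and disk corruption.

```python
import random

# Toy deterministic simulation: the seed fully determines the event
# stream and the crash schedule, so any failing seed replays exactly.

def simulate(seed: int, steps: int = 1000) -> int:
    """Run one seeded simulation; returns the final aggregate total."""
    rng = random.Random(seed)      # same seed -> same events, same crashes
    log: list[int] = []
    total = 0
    for step in range(steps):
        value = rng.randrange(1, 100)
        log.append(value)          # durable append happens before the ack...
        total += value             # ...then the in-memory aggregate is updated
        if rng.random() < 0.01:    # simulated crash: memory is lost, log survives
            total = sum(log)       # recovery replays the log
        assert total == sum(log), f"invariant violated: seed {seed}, step {step}"
    return total

for seed in range(20):
    simulate(seed)
print("all seeds passed")
```

Determinism is the whole point: when a seed trips an assertion, rerunning that seed reproduces the exact interleaving, which turns a rare distributed-systems bug into an ordinary debuggable failure.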

The simulator runs on every change to the code. A commit that breaks an invariant is caught immediately, well before reaching production. This is the main way we get to a system that does what it claims under all the awkward conditions that distributed systems get into.

No system gets everything. To keep Unimeter fast and simple, we chose not to support a few things that systems aimed at different tradeoffs would include.

There is no ad-hoc query language. You cannot write SQL against Unimeter. The query surface is small and targeted at the billing use case: totals by account, metric, and period. For analytical queries over raw events, export the events to a system built for that and query there.

There is no storage engine to tune. Unimeter keeps everything that matters in memory, so it is fast but the footprint scales with the number of unique customer-metric-period combinations you have. At a million customers times ten metrics, you use about half a gigabyte of memory, which is fine. At a hundred million customers you would feel the pressure and probably want a different system.
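The half-gigabyte figure follows from a back-of-envelope calculation. The roughly 50 bytes per aggregate entry used below is an assumption (key, running total, and hash-map overhead); the actual per-entry size is not documented here.

```python
# Back-of-envelope for the memory footprint: entries scale with unique
# customer-metric combinations. The 50 bytes/entry figure is assumed.

customers = 1_000_000
metrics = 10
bytes_per_entry = 50  # assumption: key + counter + hash-map overhead

entries = customers * metrics
gib = entries * bytes_per_entry / 2**30
print(f"{entries:,} entries \u2248 {gib:.2f} GiB")  # 10,000,000 entries ≈ 0.47 GiB
```

The same arithmetic shows where the pressure comes from at a hundred million customers: two orders of magnitude more entries means tens of gigabytes of resident aggregates.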

There is no multi-region replication today. A Unimeter cluster is assumed to live within one datacenter or one availability zone. Multi-region active-active is conceptually straightforward given the replication protocol but it is not yet implemented. If your billing system spans continents, Unimeter is currently not the answer for the global layer.

If you have read this far, you know enough about Unimeter’s design to decide whether it fits. It is fast because the hot path avoids every avoidable cost. It is correct because every event is durable, replicated, and tested against a simulator. It is simple because it is one binary and it does one job.

To put it into production, review Running a cluster. To understand how your application talks to it, start with the Quickstart.