Deploy a cluster
Production-style Ursula runs as a static-membership Raft cluster: a fixed list of nodes with stable IDs, gRPC peer-to-peer Raft traffic, and a shared S3 cold backend. The migration benchmark target is three voting nodes across availability zones.
For Kubernetes, use the Helm chart. It maps the same static-membership model onto StatefulSet pod ordinals, stable peer DNS, and per-pod PVCs.
Topology
Each node needs:
- a stable Raft node_id
- a TCP listen address (the same address every peer dials over gRPC)
- the full peer list (including itself)
- the same storage mode (--storage-backend memory or --storage-backend disk --disk-path DIR) on every peer
- a one-time membership initializer on the first start (--raft-init-membership-per-group)
- a shared cold backend (filesystem path for single-host smoke, S3 for production)
There is no separate "advertise" address. The --listen address must be reachable by other peers.
Cluster config file
The --raft-cluster-config flag accepts a small JSON file that is shared across all nodes; only node_id differs per host:
{
  "node_id": 1,
  "init_membership_per_group": true,
  "peers": [
    {"node_id": 1, "url": "http://10.0.0.1:4437"},
    {"node_id": 2, "url": "http://10.0.0.2:4437"},
    {"node_id": 3, "url": "http://10.0.0.3:4437"}
  ]
}
Use the same file on each host; only the top-level node_id field changes.
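Because only node_id varies, the per-host files can be rendered from a single template. The following is a minimal sketch and not part of Ursula's tooling: render_cluster_config is a hypothetical helper name, and the peer addresses are the example values used above.

```shell
# Hypothetical helper (not shipped with Ursula): print the shared cluster
# config for one node. Only the top-level node_id changes between hosts.
render_cluster_config() {
  local node_id="$1"
  # Reject empty or non-numeric IDs before emitting any JSON.
  case "$node_id" in
    ''|*[!0-9]*) echo "node_id must be an integer" >&2; return 1 ;;
  esac
  cat <<JSON
{
  "node_id": ${node_id},
  "init_membership_per_group": true,
  "peers": [
    {"node_id": 1, "url": "http://10.0.0.1:4437"},
    {"node_id": 2, "url": "http://10.0.0.2:4437"},
    {"node_id": 3, "url": "http://10.0.0.3:4437"}
  ]
}
JSON
}
```

For example, render_cluster_config 2 > cluster-node2.json produces the file for node 2, which can then be copied to /etc/ursula/cluster.json on that host.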
Start node 1
ursula \
  --listen 0.0.0.0:4437 \
  --core-count 16 \
  --raft-group-count 256 \
  --storage-backend disk \
  --disk-path /var/lib/ursula \
  --raft-cluster-config /etc/ursula/cluster.json
init_membership_per_group: true only needs to be present on the very first start of a fresh cluster. After that, Ursula remembers membership. Flip it to false (or remove it) for subsequent starts.
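Flipping the flag after the first successful start can be scripted. This is a sketch under assumptions: disable_init_membership is a hypothetical helper name, and the sed pattern assumes the field is written as in the example config (the key followed by the literal true).

```shell
# Hypothetical helper: rewrite init_membership_per_group from true to false
# in a cluster config file. Assumes the JSON layout shown in the example.
disable_init_membership() {
  local config="$1"
  local tmp="${config}.tmp"
  # Write to a temp file and rename it, so a failed sed never leaves a
  # half-written config behind.
  sed 's/"init_membership_per_group"[[:space:]]*:[[:space:]]*true/"init_membership_per_group": false/' \
    "$config" > "$tmp" && mv "$tmp" "$config"
}
```

Run it once per host after the cluster has bootstrapped, for example disable_init_membership /etc/ursula/cluster.json, so that later restarts do not carry the initializer.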
Start nodes 2 and 3
Run the same command on nodes 2 and 3, using the same JSON file with node_id set to 2 and 3 respectively. Use --storage-backend memory instead of --storage-backend disk --disk-path DIR if you want a non-durable test cluster.
Verify with ursulactl
Once daemons are up on every node, point ursulactl at a one-file manifest of the cluster and block until each Raft group has elected a leader:
cat > cluster-manifest.json <<'JSON'
{
  "nodes": [
    {"id": 1, "http_url": "http://10.0.0.1:4437", "host": "10.0.0.1"},
    {"id": 2, "http_url": "http://10.0.0.2:4437", "host": "10.0.0.2"},
    {"id": 3, "http_url": "http://10.0.0.3:4437", "host": "10.0.0.3"}
  ]
}
JSON
ursulactl wait-ready --config cluster-manifest.json --expected-groups 256
ursulactl status --config cluster-manifest.json
wait-ready returns non-zero with a single-line reason if the timeout elapses. status prints one line per node showing the raft group count and per-leader group counts as observed from that node's metrics. Healthy clusters report the same distribution from every reporter.
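Since wait-ready signals failure through its exit code, a script can use it as a gate. The wrapper below is a sketch: run_when_ready is a hypothetical name, and it uses only the wait-ready flags shown above.

```shell
# Hypothetical wrapper: run a follow-up command only once wait-ready succeeds.
# Usage: run_when_ready MANIFEST EXPECTED_GROUPS COMMAND [ARGS...]
run_when_ready() {
  local manifest="$1"
  local expected_groups="$2"
  shift 2
  if ursulactl wait-ready --config "$manifest" --expected-groups "$expected_groups"; then
    "$@"
  else
    # In the else branch, $? still holds wait-ready's exit status.
    local rc=$?
    echo "cluster not ready (wait-ready exited $rc); skipping: $*" >&2
    return "$rc"
  fi
}
```

For example, run_when_ready cluster-manifest.json 256 ursulactl status --config cluster-manifest.json prints the status report only after every group has a leader.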
If you don't have ursulactl handy, the raw metrics endpoint works as a fallback:
for host in 10.0.0.1 10.0.0.2 10.0.0.3; do
  curl -s "http://$host:4437/__ursula/metrics" | jq '.raft_groups | length'
done
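The loop above only prints numbers; comparing them can be automated as well. This sketch assumes every node should report the same count as the value passed to --expected-groups (256 in this example). Both counts_match and collect_counts are hypothetical helper names, and collect_counts needs curl and jq.

```shell
# Hypothetical helper: succeed only if every reported count equals the
# expected one. Usage: counts_match EXPECTED COUNT [COUNT...]
counts_match() {
  local expected="$1"
  shift
  [ "$#" -gt 0 ] || return 1   # no reports at all is treated as a failure
  local count
  for count in "$@"; do
    [ "$count" = "$expected" ] || return 1
  done
  return 0
}

# Print one count per host using the same curl | jq pipeline as above.
# An unreachable host yields the word "unreachable" so the comparison fails.
collect_counts() {
  local host n
  for host in "$@"; do
    n="$(curl -s "http://$host:4437/__ursula/metrics" | jq '.raft_groups | length')"
    echo "${n:-unreachable}"
  done
}
```

A possible invocation is counts_match 256 $(collect_counts 10.0.0.1 10.0.0.2 10.0.0.3), which exits non-zero if any node is unreachable or reports a different number of groups.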
Storage exclusivity
Pick one storage backend per cluster.
- --storage-backend memory: fast but volatile; state does not survive a restart.
- --storage-backend disk --disk-path DIR: durable OpenRaft log mode under DIR/raft-log (recommended for clusters).
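One way to keep the choice consistent is to build each node's start command from a single helper. This is a sketch: storage_flags is a hypothetical name, and it emits only the two flag combinations listed above.

```shell
# Hypothetical helper: print the storage flags for one mode, so that every
# node's start command is derived from the same choice.
storage_flags() {
  local mode="$1"
  local disk_path="${2:-}"
  case "$mode" in
    memory)
      echo "--storage-backend memory" ;;
    disk)
      if [ -z "$disk_path" ]; then
        echo "disk mode requires a --disk-path directory" >&2
        return 1
      fi
      echo "--storage-backend disk --disk-path $disk_path" ;;
    *)
      echo "unknown storage mode: $mode" >&2
      return 1 ;;
  esac
}
```

For example, storage_flags disk /var/lib/ursula prints the flags used in the node 1 command. The output is meant to be word-split by the shell, so this sketch does not handle paths containing spaces.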
Cold storage
A shared object store is recommended for any multi-node deployment because peers need to read each other's flushed chunks. Configure via environment variables on every node:
URSULA_COLD_BACKEND=s3
URSULA_COLD_S3_BUCKET=my-ursula-bucket
URSULA_COLD_S3_REGION=us-east-1
URSULA_COLD_ROOT=ursula-prod-20260518
See configure S3 for the full set of URSULA_COLD_* variables.
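A preflight check can catch a node that is missing one of these variables before the daemon starts. The sketch below assumes all four variables from the example must be set for an S3 deployment; the authoritative list is in the S3 configuration page. check_cold_env is a hypothetical helper name.

```shell
# Hypothetical preflight check: report every cold-storage variable from the
# example above that is unset or empty in the current environment.
check_cold_env() {
  local missing=0 name value
  for name in URSULA_COLD_BACKEND URSULA_COLD_S3_BUCKET \
              URSULA_COLD_S3_REGION URSULA_COLD_ROOT; do
    # Indirect lookup: read the variable whose name is stored in $name.
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "missing: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

Running check_cold_env on each host before starting ursula returns non-zero and names the missing variables on stderr.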
Operating the cluster
The first tool for day-2 work is ursulactl:
- ursulactl restart: drain-aware rolling restart with applied-index catch-up gates.
- ursulactl status: leadership distribution per node.
- ursulactl wait-ready: gate scripts on group + leader counts.
For SSH/AWS-side plumbing — pushing binaries, writing systemd units, EC2 Instance Connect, S3 cleanup — use scripts/ursula_ec2.py. See operations for the full split.
The HTTP admin surface underneath both tools is small and stable enough to script against directly:
- GET /__ursula/metrics: per-node JSON snapshot.
- POST /__ursula/raft/{group_id}/snapshot: manually trigger a Raft snapshot.
- POST /__ursula/raft/{group_id}/purge: purge stale log entries.
- POST /__ursula/raft/{group_id}/learners/{node_id}: add a learner (non-voting) replica.
- POST /__ursula/raft/{group_id}/leader/transfer/{node_id}: hand off leadership to another voter (the primitive ursulactl restart builds on).
- POST /__ursula/flush-cold/{bucket}/{stream}: force a cold flush for one stream.
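As an illustration of scripting against these endpoints, the sketch below wraps the snapshot route. It assumes the endpoint needs no request body or authentication, which the list above does not state, and it leaves the group ID format to the caller. snapshot_url and trigger_snapshot are hypothetical helper names.

```shell
# Hypothetical helpers built on the snapshot path listed above.
# BASE is a node's HTTP URL, for example http://10.0.0.1:4437.
snapshot_url() {
  local base="${1%/}"      # drop one trailing slash, if present
  local group_id="$2"
  echo "${base}/__ursula/raft/${group_id}/snapshot"
}

# POST to the snapshot endpoint. -f makes curl exit non-zero on HTTP errors,
# -sS hides the progress meter but still prints error messages.
trigger_snapshot() {
  curl -fsS -X POST "$(snapshot_url "$1" "$2")"
}
```

Calling trigger_snapshot http://10.0.0.1:4437 7 would request a snapshot of group 7 from node 1; the other POST routes can be wrapped the same way.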
Limits in the current build
- Membership changes (adding/removing voters after bootstrap) are not yet exposed as a routine workflow. Learners can be added via the admin endpoint above.
- ursulactl restart packages the safe rolling restart loop (drain → restart-cmd → wait-ready); version-upgrade tooling that diffs binaries before restart is not yet packaged.
- There is no zero-downtime config reload. Restart the process to pick up new flags or env vars.