OSS HTTP Streams Benchmark

A comparison of Ursula, Durable Streams, and S2 Lite across multi-stream writes, catch-up replay, and SSE live tail.

What was measured

All three systems ran the same three workloads, driven by the same client binary. The bench client picks a backend with --api-style ursula|durable|s2 and switches its HTTP plumbing (URLs, body shape, auth headers) accordingly, so the workload itself is identical across backends.

Ursula

  • 3 × c7g.4xlarge, one voter per AZ
  • 256 Raft groups, 16 cores per node
  • Every commit replicates to a majority quorum (2 of 3)
  • S3 cold flush enabled; ~675 MiB uploaded in this run
  • Bench targets all 3 nodes via round-robin

Durable Streams

  • 1 × c7g.4xlarge, single Rust server process
  • durable-streams-server v0.3.0
  • file-durable storage on the root EBS volume
  • Capacity limits raised above workload size; replay uses ?offset=-1

S2 Lite

  • 1 × c7g.4xlarge, single S2 Lite process
  • s2-cli v0.33.0 (s2-lite)
  • S3 backend (S3 Standard, same region)
  • S2 Lite's own API, not Durable Streams protocol

All three backends are persistent in this run. Ursula commits each write to a 3-voter Raft quorum across three c7g.4xlarge nodes and runs background S3 cold flush. Durable Streams' file-backed store fsyncs to the root EBS volume on a single node. S2 Lite writes through to S3 on a single node. Ursula append acknowledgements are not gated by the S3 flush, but this run did exercise that background path. This is a durable-vs-durable comparison. Aggregate throughput therefore reflects Ursula running on 3× the hardware. It spends that hardware on cross-AZ quorum replication and on staying available through an instance or AZ loss, which neither single-node deployment provides here.

The OS file descriptor limit was set to 65,535 on the client and on all servers. When S2 Lite is artificially constrained to 256 fds, the same harness reproduces connection failures reported as "Too many open files". Results from that constrained configuration are excluded from the headline numbers.
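A pre-flight check can catch a low descriptor limit before it shows up as connection failures mid-run. The helper below is hypothetical: the name check_fd_limit is ours and is not part of the bench tooling. It only inspects the soft limit of the current shell, which child processes inherit.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight helper, not part of ursula-bench.
# The 65,535 target mirrors the limit used for the headline runs.
REQUIRED_FDS=65535

# Return 0 if the soft open-file limit of this shell (inherited by the
# processes it launches) is at least $1; otherwise print a hint and return 1.
check_fd_limit() {
  local required="$1" current
  current="$(ulimit -n)"
  if [ "$current" = "unlimited" ]; then
    return 0
  fi
  if [ "$current" -lt "$required" ]; then
    echo "open-file limit is $current, need $required (try: ulimit -n $required)" >&2
    return 1
  fi
  return 0
}
```

Typical use is `ulimit -n "$REQUIRED_FDS" && check_fd_limit "$REQUIRED_FDS"` before launching a server or the bench client. Raising the soft limit past the hard limit fails for unprivileged users. For services run under systemd, the LimitNOFILE= setting is the usual place to raise it.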

Durable Streams' max_memory_bytes is a hard payload capacity limit, not an eviction cache. It is raised here only to avoid benchmark-induced 413 responses. The data directory is on EBS, not on a tmpfs-backed /tmp.

Multi-stream write

The question this scenario answers: when many streams are writing concurrently, does the system commit them in parallel or does some shared point serialize them? Ursula's bet is multi-Raft sharding across nodes and cores.
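A toy placement function makes the sharding bet concrete. It hashes each stream ID onto one of 256 groups, so independent streams mostly land on different groups, which can then commit in parallel instead of queuing behind one shared log. This is a hypothetical illustration, not Ursula's placement logic. cksum (the POSIX CRC utility) stands in for whatever hash a real system would use.

```shell
#!/usr/bin/env bash
# Hypothetical stream-to-group placement; NOT Ursula's actual scheme.
RAFT_GROUPS=256

# Map a stream ID to a group index in [0, groups) using the CRC from cksum.
shard_for() {
  local stream_id="$1" groups="$2" crc
  crc="$(printf '%s' "$stream_id" | cksum | cut -d' ' -f1)"
  echo $(( crc % groups ))
}

# Count how many distinct groups N synthetic streams ("stream-0".."stream-N-1")
# occupy; more distinct groups means less write serialization in this model.
distinct_groups() {
  local n="$1" groups="$2" i
  for (( i = 0; i < n; i++ )); do
    shard_for "stream-$i" "$groups"
  done | sort -u | wc -l | tr -d ' '
}
```

`distinct_groups 500 "$RAFT_GROUPS"` reports how many of the 256 groups the 500 streams occupy. With a well-mixed hash, most groups receive at least one stream, so no single group's log has to order all 500 writers.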

Multi-stream write - aggregate throughput

higher is better

N independent streams, one writer per stream, 256 B payload, 30 s. All three systems run with persistent backends: Ursula commits to a 3-voter Raft quorum with S3 cold flush enabled, Durable Streams runs file-durable storage on EBS, and S2 Lite runs against S3.

[Chart: aggregate commits/s (linear y, 0 to ~47k) vs. concurrent active streams (log x, 100 to 2k); series: Ursula, Durable Streams, S2 Lite]

Ursula keeps every append on a 3-voter Raft quorum while asynchronously flushing cold chunks to S3; this run uploaded ~675 MiB through that background path. Durable Streams is shown on a real EBS-backed data directory; earlier tmpfs-backed file-durable numbers are excluded.
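The aggregate figure is total acknowledged appends across all streams divided by the run's wall-clock duration. At the 500-stream point, Ursula's 41.6k commits/s over 30 s corresponds to roughly 1.25 million appends, or about 83 commits/s per stream. The sketch assumes a hypothetical input of one per-stream commit count per line. That is not the ursula-bench output format.

```shell
#!/usr/bin/env bash
# Aggregate throughput = sum of per-stream acknowledged appends / duration (s).
# Input: one per-stream commit count per line (hypothetical format).
aggregate_rate() {
  local duration_secs="$1"
  awk -v d="$duration_secs" '{ total += $1 } END { printf "%.1f\n", total / d }'
}
```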

Multi-stream write - p99 latency

lower is better

Same workload. Lower is better.

[Chart: p99 append latency (linear y, 0 to ~1.1k) vs. concurrent active streams (log x, 100 to 2k); series: Ursula, Durable Streams, S2 Lite]

S2 Lite's per-append latency is dominated by the S3 PUT round-trip. Durable Streams pays an fdatasync to its EBS volume on the file-durable path. Ursula pays the cross-node quorum round-trip plus background cold-flush pressure, and it still stays below both at every measured concurrency.
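A p99 is the latency at or below which 99% of samples fall. The sketch below uses the nearest-rank definition, which takes the ceil(0.99 × N)-th smallest sample. The harness may use nearest-rank or an interpolated percentile, and this section does not say which, so treat the definition as an assumption.

```shell
#!/usr/bin/env bash
# Nearest-rank p99 over latency samples read from stdin, one number per line.
# rank = ceil(0.99 * N), computed with integers as floor((99*N + 99) / 100).
p99() {
  sort -n | awk '
    { v[NR] = $1 }
    END {
      if (NR == 0) exit 1                 # no samples, no percentile
      rank = int((99 * NR + 99) / 100)
      print v[rank]
    }'
}
```

With only 100 samples, the nearest-rank p99 is the second-largest value. Tail estimates from short runs or few requests are therefore noisy.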

SSE fan-out

One popular document with N concurrent SSE viewers and a steady-rate publisher. The bet is that a server whose wake path costs O(unique requests) delivers each event to all viewers in one round, because viewers parked on identical read requests share one wake-up. A naive O(N) wake loop, or a tail path that goes through storage, can add latency as the subscriber count grows.
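A toy model shows the difference between the two wake strategies. Each input line is a waiting subscriber and the read request (here, an offset) it is parked on. An O(N) loop wakes every subscriber individually. A coalescing path wakes once per unique request and hands the same event to everyone waiting on it. This is a hypothetical illustration, not code from any of the three servers, and it needs bash 4+ for associative arrays.

```shell
#!/usr/bin/env bash
# Toy wake-path model (hypothetical). Input lines: "<subscriber_id> <offset>".

# O(N): one wake-up per waiting subscriber.
naive_wakeups() {
  local count=0 id offset
  while read -r id offset; do
    count=$(( count + 1 ))
  done
  echo "$count"
}

# O(unique requests): group waiters by the offset they wait on; one wake-up
# per group delivers the event to every subscriber in that group.
coalesced_wakeups() {
  declare -A by_offset=()
  local id offset
  while read -r id offset; do
    by_offset["$offset"]+="$id "
  done
  echo "${#by_offset[@]}"
}
```

All viewers in the fan-out scenario tail one stream. If they are caught up, they tend to park on the same request, so the coalesced count stays near one wake-up per event regardless of N. The naive count grows linearly with N.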

SSE fan-out - per-event delivery p99

lower is better

One stream, one writer at 50 events / s, N concurrent SSE subscribers. End-to-end publish-to-receive latency measured at each subscriber.

[Chart: p99 fan-out latency (linear y, 0 to ~126) vs. concurrent SSE subscribers on one stream (log x, 50 to 1k); series: Ursula, Durable Streams, S2 Lite]

Ursula and Durable Streams both keep fan-out p99 in single-digit milliseconds through 1,000 subscribers. S2 Lite remains around 100 ms because the S3-backed path dominates the live-tail floor in this setup.
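Each subscriber in this scenario consumes a text/event-stream response. A minimal payload extractor following the SSE field rules from the HTML Standard might look like this. Those rules say that payloads arrive on data: lines with one optional leading space stripped, that a blank line dispatches the event, and that lines starting with ":" are comments. The sketch is simplified: it assumes LF line endings and ignores the event, id, and retry fields.

```shell
#!/usr/bin/env bash
# Print one payload per dispatched SSE event; multi-line data is joined with
# "\n" as the spec requires. Simplified: LF endings only, non-data fields ignored.
sse_payloads() {
  awk '
    /^data:/ {
      v = substr($0, 6)                          # text after "data:"
      if (substr(v, 1, 1) == " ") v = substr(v, 2)
      if (has) buf = buf "\n" v; else buf = v
      has = 1
      next
    }
    /^$/ {                                       # blank line ends the event
      if (has) { print buf; fflush() }
      buf = ""; has = 0
    }
  '
}
```

Usage would be along the lines of `curl -sN "$STREAM_URL" | sse_payloads`, where STREAM_URL is a placeholder for the backend's live-tail URL. curl's -N disables output buffering, so events print as they arrive.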

Catch-up replay

After a deploy or a network blip, many clients reconnect, each to its own document, and each wants "give me the full current state of this stream". The mechanism differs by system. Ursula uses /bootstrap, which returns a snapshot plus the tail since that snapshot. Durable Streams (DS) and S2 Lite must replay the full log, because neither ships a matching snapshot endpoint in this harness.

Catch-up replay - p99 latency

lower is better

N clients, each on its own stream pre-filled with 200 events × 1 KiB. Ursula uses GET /bootstrap (snapshot + tail-since-snapshot); DS and S2 Lite replay the full log in this harness.

[Chart: p99 latency among successful clients (linear y, 0 to ~889) vs. concurrent clients, each on a unique stream (log x, 100 to 1k); series: Ursula, Durable Streams, S2 Lite]

At 1,000 concurrent clients, Ursula has the lowest replay p99 (253 ms) and the smallest response body (172 KB), ahead of Durable Streams at 366 ms and S2 Lite at 794 ms.
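A size check puts the 172 KB figure in context. Each stream holds 200 events × 1 KiB = 204,800 bytes of raw payload. A full-log replay must therefore move at least that much, plus framing, unless the response is compressed. The /bootstrap body comes in at roughly 84% of that payload if 172 KB means 172,000 bytes, or about 86% if it means KiB. Which unit is meant is an assumption.

```shell
#!/usr/bin/env bash
# Worked arithmetic for the catch-up scenario (payload only: no framing,
# headers, or compression). 172 KB is read as 172,000 bytes (assumption).
EVENTS=200
EVENT_BYTES=1024          # 1 KiB per pre-filled event
BOOTSTRAP_BYTES=172000    # Ursula /bootstrap body at 1,000 clients

full_log_bytes=$(( EVENTS * EVENT_BYTES ))

ratio() {
  awk -v a="$1" -v b="$2" 'BEGIN { printf "%.2f\n", a / b }'
}

echo "full-log payload per stream: ${full_log_bytes} bytes"
echo "bootstrap body / full-log payload: $(ratio "$BOOTSTRAP_BYTES" "$full_log_bytes")"
```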

Takeaways

  • Write throughput at 500 streams: Ursula 41.6k quorum commits/s vs S2 Lite 6.0k; Durable Streams reaches 3.4k on its single-node EBS file-durable path.
  • SSE fan-out at 1,000 subscribers: Ursula 8.3 ms p99 and DS 6.5 ms p99 both stay in single-digit milliseconds; S2 Lite is 112 ms p99 on the S3-backed path.
  • Catch-up replay at 1,000 clients: Ursula has the lowest p99 at 253 ms and the smallest response body at 172 KB.
  • Caveats: Ursula uses 3 × c7g vs DS / S2 Lite's 1 × c7g (deployment-shape comparison, not per-CPU). DS numbers use file-durable on EBS with capacity limits raised above the benchmark footprint, and S2 Lite uses S3 Standard with a 65,535-fd process limit.
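The first caveat can be made concrete by dividing each 500-stream aggregate by its node count. This is only a per-instance view on identical c7g.4xlarge nodes, not a per-CPU or cost model. It also understates Ursula's per-node work: with one voter per AZ on three nodes, every node stores a replica of each commit, not just the commits it leads.

```shell
#!/usr/bin/env bash
# Crude per-instance normalization of the 500-stream takeaway numbers.
per_node() {
  awk -v rate="$1" -v nodes="$2" 'BEGIN { printf "%.1f\n", rate / nodes }'
}

echo "Ursula:          $(per_node 41600 3) commits/s per node"
echo "S2 Lite:         $(per_node 6000 1) commits/s per node"
echo "Durable Streams: $(per_node 3400 1) commits/s per node"
```

Even on this per-node view, Ursula's ~13.9k commits/s stays above S2 Lite's 6.0k and Durable Streams' 3.4k at 500 streams.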

Durability and availability posture

Throughput and latency are only fair to compare if the durability properties are clear. Here is what each system actually guarantees in this benchmark's configuration. Ursula pays a quorum round-trip on every commit; S2 Lite pays an S3 PUT; the file-durable Durable Streams server writes to a single EBS volume.

System | Committed data lives on | One instance lost | One AZ lost | Approx. annual data-loss probability
Ursula | 3 Raft voters across us-east-1a / 1b / 1c | Service stays up; data preserved (2/3 quorum) | Service stays up; data preserved (2/3 quorum) | ~10^-7 (needs concurrent loss of 2 voters across AZs before recovery)
Durable Streams (file-durable) | Local disk on one EBS volume, one instance, one AZ | Service down; acknowledged data potentially unrecoverable | Service down; acknowledged data potentially unrecoverable | ~10^-5 (bounded by single EBS volume / instance failure rate)
S2 Lite (S3) | S3 Standard (cross-AZ replicated by S3, 11-nines object durability) | Service down until restart; committed data preserved on S3 | Service down until restart; committed data preserved on S3 | ~10^-11 per object (S3 durability); service availability bounded by single instance

Three different shapes of "durable". Ursula gives you replicated availability too - the cluster keeps serving on instance or AZ loss. Durable Streams is the weakest on availability: data on one disk, service on one process. S2 Lite has the best raw object-storage durability but its service front-end is single-instance, so an instance failure means downtime even though the data is intact. Read the throughput and latency numbers above with this in mind: Ursula is paying for that quorum replication on every write.
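The "2/3 quorum" entries follow from standard Raft majority arithmetic. An n-voter group needs floor(n/2) + 1 acknowledgements per commit and keeps committing with up to floor((n - 1)/2) voters down. For Ursula's 3-voter groups, that is a quorum of 2 and one tolerated failure, which is why losing a single instance or AZ (one voter) leaves the service up. The sketch just evaluates those two formulas; the 5-voter line is shown only for comparison.

```shell
#!/usr/bin/env bash
# Majority-quorum arithmetic for an n-voter Raft group (integer division).
quorum_size()        { echo $(( $1 / 2 + 1 )); }    # acks needed per commit
failures_tolerated() { echo $(( ($1 - 1) / 2 )); }  # voters that may be down

for n in 3 5; do
  echo "voters=$n quorum=$(quorum_size "$n") tolerates=$(failures_tolerated "$n")"
done
```

Even voter counts buy nothing here: 4 voters need 3 acknowledgements and still tolerate only 1 failure. That is why odd group sizes like 3 are the usual choice.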

Reproduce

# 1. Build the bench client and the Ursula HTTP server
cargo build --release -p ursula -p ursula-bench

# 2. Bring up each backend on the same instance type (c7g.4xlarge): 3 nodes for Ursula, 1 each for S2 Lite and Durable Streams
export URSULA_COLD_BACKEND=s3
export URSULA_COLD_S3_BUCKET=<s3-bucket>
export URSULA_COLD_S3_REGION=<region>
export URSULA_COLD_FLUSH_MIN_HOT_BYTES=65536
export URSULA_COLD_FLUSH_MAX_BYTES=65536
python3 scripts/ursula_ec2.py --config <manifest>.json start
~/.cargo/bin/s2 lite --bucket <s3-bucket> --path s2-lite --port 4439
durable-streams-server --profile dev --config ds-ebs-file-durable.toml

# 3. Run the multi-stream write and SSE fan-out scenarios against each backend
for api in ursula durable s2; do
 ursula-bench multi-stream --target http://NODE:PORT --api-style "$api" \
 --streams 500 --duration-secs 30 --payload-bytes 256
 ursula-bench fan-out --target http://NODE:PORT --api-style "$api" \
 --subscribers 1000 --writer-rate 50 --duration-secs 30
done

# 4. Catch-up replay: same client workload on every backend (Ursula serves it via /bootstrap; DS and S2 Lite replay the full log)
for api in ursula durable s2; do
 ursula-bench bootstrap --target http://NODE:PORT --api-style "$api" \
 --clients 1000 --pre-events 200 --per-client-stream
done
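As written, both loops send every backend to the same http://NODE:PORT placeholder. A small wrapper is one way to keep the three endpoints straight. The variable names URSULA_URL, DS_URL and S2_URL are hypothetical. The original script does not show how ursula-bench is pointed at all three Ursula nodes for round-robin, so URSULA_URL stands for whatever target form the harness accepts. Only flags already used in the script above are passed.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: one endpoint per backend, optional dry run.
# Resolve the target for an api style from placeholder environment variables.
target_for() {
  case "$1" in
    ursula)  echo "${URSULA_URL:?set URSULA_URL}" ;;
    durable) echo "${DS_URL:?set DS_URL}" ;;
    s2)      echo "${S2_URL:?set S2_URL}" ;;
    *)       echo "unknown api style: $1" >&2; return 1 ;;
  esac
}

# With DRY_RUN=1, print the command instead of executing it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then echo "$*"; else "$@"; fi
}

# Run all three scenarios against each backend, each with its own target.
run_all() {
  local api target
  for api in ursula durable s2; do
    target="$(target_for "$api")" || return 1
    run ursula-bench multi-stream --target "$target" --api-style "$api" \
      --streams 500 --duration-secs 30 --payload-bytes 256
    run ursula-bench fan-out --target "$target" --api-style "$api" \
      --subscribers 1000 --writer-rate 50 --duration-secs 30
    run ursula-bench bootstrap --target "$target" --api-style "$api" \
      --clients 1000 --pre-events 200 --per-client-stream
  done
}
```

Calling run_all with DRY_RUN=1 and the three URL variables exported prints the nine commands, so they can be checked before a real run.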