patchbay — Sam Latino

A gateway touches every token you stream, so it has exactly two jobs — add almost nothing, and never send a private prompt to someone else’s computer; patchbay does the second one at compile time.

deployment: 1 (single binary)
private→external routes: 0 (unrepresentable in the types)
retry window: pre-byte (never after first byte)
overhead vs LiteLLM: unmeasured (head-to-head planned, same host)

the zero is structural

Requests are tagged at ingress, and the tag is a type parameter, not a field. Routing a private request to an external backend is therefore not a guarded error path — it is a program that does not compile. Shape of the idea, simplified:

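// simplified sketch; the real types carry more context
use std::marker::PhantomData;

struct Private;
struct Shareable;

// placeholder stand-ins so the sketch compiles on its own
struct LocalPool;
struct AnyPool;
struct Routed;

// PhantomData makes Tag part of the type without storing a value
struct Request<Tag> { _tag: PhantomData<Tag> /* payload, key, budget … */ }

impl Request<Private> {
    // the only route() that exists for Private takes local backends
    fn route(self, _pool: &LocalPool) -> Routed { Routed /* … */ }
}

impl Request<Shareable> {
    fn route(self, _pool: &AnyPool) -> Routed { Routed /* … */ }
}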

The boundary where tags get assigned (config parsing, header mapping) is runtime code and stays dangerous, so a proptest hammers exactly that border: across generated request/backend combinations, a private request never resolves to an external backend. Types carry the guarantee; the proptest guards the crossing where untyped input becomes typed.
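A minimal sketch of that crossing, under stated assumptions: the marker string, the fail-closed default, and every name here (`classify`, `Ingress`, `resolve`, the pool shapes) are hypothetical and not taken from the patchbay source. It shows untyped input becoming one of two typed requests, after which only the matching `route()` is reachable.

```rust
use std::marker::PhantomData;

// Tag types, mirroring the simplified sketch.
pub struct Private;
pub struct Shareable;

pub struct Request<Tag> {
    _tag: PhantomData<Tag>,
}

// Hypothetical pool shapes: backends are just names here.
pub struct LocalPool(pub Vec<&'static str>);
pub struct AnyPool {
    pub local: LocalPool,
    pub external: Vec<&'static str>,
}

#[derive(Debug, PartialEq)]
pub enum Routed {
    Local(&'static str),
    External(&'static str),
}

// Result of the runtime boundary: one of two typed requests.
pub enum Ingress {
    Private(Request<Private>),
    Shareable(Request<Shareable>),
}

// Assumption: fail closed. Only an explicit "shareable" marker opts out of
// Private; missing or unrecognised input stays Private.
pub fn classify(marker: Option<&str>) -> Ingress {
    match marker.map(str::trim) {
        Some(m) if m.eq_ignore_ascii_case("shareable") => {
            Ingress::Shareable(Request { _tag: PhantomData })
        }
        _ => Ingress::Private(Request { _tag: PhantomData }),
    }
}

impl Request<Private> {
    // Takes only a LocalPool, so there is no external backend to pick.
    pub fn route(self, pool: &LocalPool) -> Option<Routed> {
        pool.0.first().copied().map(Routed::Local)
    }
}

impl Request<Shareable> {
    // Arbitrary choice for the sketch: prefer external, fall back to local.
    pub fn route(self, pool: &AnyPool) -> Option<Routed> {
        pool.external
            .first()
            .copied()
            .map(Routed::External)
            .or_else(|| pool.local.0.first().copied().map(Routed::Local))
    }
}

// Runtime entry point: classify, then dispatch to the typed route().
pub fn resolve(marker: Option<&str>, pools: &AnyPool) -> Option<Routed> {
    match classify(marker) {
        Ingress::Private(r) => r.route(&pools.local),
        Ingress::Shareable(r) => r.route(pools),
    }
}
```

A property test over this boundary would generate markers and pool contents and assert that anything not explicitly shareable never yields `Routed::External`. The only code that can break that property is `classify`, which is the point of keeping the border small.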

route path

Figure (docs/route-path.svg): ingress → keys/budgets → governor → router; private traffic can only resolve to the local pool

Backend selection blends an EWMA of observed per-backend latency with static priority, so a slow backend drains traffic without a config push and a recovered one earns it back. Fallback chains retry with jittered backoff under one hard rule: retries fire only before the first response byte. After first byte, a retry would splice duplicate output into a stream a client is already consuming — so past that point, failures surface instead of retrying.
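One way the two rules above could look in code, as a sketch rather than patchbay's implementation. The smoothing factor, the choice to express static priority as a latency handicap in milliseconds, and all names are assumptions. Jittered backoff is left out because it needs a random source.

```rust
use std::time::Duration;

// Hypothetical smoothing factor: higher reacts faster to new samples.
const ALPHA: f64 = 0.2;

pub struct Backend {
    pub name: &'static str,
    // Assumption: static priority modelled as a fixed handicap in ms.
    pub priority_penalty_ms: f64,
    // Exponentially weighted moving average of observed latency, in ms.
    pub ewma_ms: f64,
}

impl Backend {
    // Fold one observed latency into the moving average.
    pub fn observe(&mut self, latency: Duration) {
        let sample_ms = latency.as_secs_f64() * 1000.0;
        self.ewma_ms = ALPHA * sample_ms + (1.0 - ALPHA) * self.ewma_ms;
    }

    // Lower is better: observed latency blended with static priority.
    pub fn score(&self) -> f64 {
        self.ewma_ms + self.priority_penalty_ms
    }
}

// Pick the backend with the lowest blended score.
pub fn pick(backends: &[Backend]) -> Option<&Backend> {
    backends.iter().min_by(|a, b| a.score().total_cmp(&b.score()))
}

#[derive(Clone, Copy, Debug, PartialEq)]
pub enum StreamState {
    NoBytesSent,
    Streaming,
}

#[derive(Debug, PartialEq)]
pub enum Decision {
    Retry,
    Surface,
}

// The hard rule: a retry is only possible before the first response byte.
pub fn on_failure(state: StreamState, attempts_left: u32) -> Decision {
    match state {
        StreamState::NoBytesSent if attempts_left > 0 => Decision::Retry,
        _ => Decision::Surface,
    }
}
```

With these numbers, a preferred backend that starts answering slowly loses to a lower-priority one once its average climbs past the handicap, and wins again after enough fast samples pull the average back down. No config change is involved in either direction.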

The SSE relay is byte-faithful. Upstream bytes pass through untouched, with one exception: the final usage frame is intercepted to feed accounting. That is the entire parse surface of the streaming path.
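A sketch of a relay in that spirit, with loud assumptions: frames are taken to end at a blank line with LF line endings only, a usage frame is recognised by a hypothetical `"usage"` substring, and the sketch copies the frame for accounting while still forwarding it. The text does not say whether the real relay forwards or withholds that frame, and the names `Relay` and `is_usage_frame` are invented here.

```rust
use std::io::{self, Write};

pub struct Relay<W: Write> {
    downstream: W,
    // Bytes of the frame currently being assembled; used for inspection only.
    frame: Vec<u8>,
    // Last completed frame that looked like a usage frame.
    pub usage: Option<Vec<u8>>,
}

impl<W: Write> Relay<W> {
    pub fn new(downstream: W) -> Self {
        Relay { downstream, frame: Vec::new(), usage: None }
    }

    // Forward one upstream chunk. Chunk boundaries need not match frames.
    pub fn push(&mut self, chunk: &[u8]) -> io::Result<()> {
        // Forward first and unmodified: the client sees the upstream bytes.
        self.downstream.write_all(chunk)?;

        // Keep a side copy and look for completed frames (blank-line end).
        self.frame.extend_from_slice(chunk);
        while let Some(end) = self.frame.windows(2).position(|w| w == b"\n\n") {
            let done: Vec<u8> = self.frame.drain(..end + 2).collect();
            if is_usage_frame(&done) {
                self.usage = Some(done);
            }
        }
        Ok(())
    }

    pub fn into_inner(self) -> W {
        self.downstream
    }
}

// Hypothetical detector: a real relay would match its backend's actual
// usage frame format rather than a substring.
fn is_usage_frame(frame: &[u8]) -> bool {
    let needle = b"\"usage\"";
    frame.windows(needle.len()).any(|w| w == needle)
}
```

The side buffer never feeds the output path, so a bug in frame detection can miscount tokens but cannot alter what the client receives.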

operations

Virtual keys with per-key token budgets, governor-enforced RPM/TPM limits, and a Prometheus /metrics endpoint. The route path has a criterion bench in the repo — so when the LiteLLM comparison runs, the harness is already waiting. 13 tests, clippy -D warnings clean; the proptest does the heavy lifting on the invariant that matters.
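Per-key token budgets can be sketched with a map and a checked subtraction. This is an illustration only: `Budgets`, `BudgetError`, and the all-or-nothing charge policy are assumptions, and rate limiting (RPM/TPM) is not modelled.

```rust
use std::collections::HashMap;

#[derive(Debug, PartialEq)]
pub enum BudgetError {
    UnknownKey,
    // The request would overdraw the key; nothing was charged.
    Exhausted { remaining: u64 },
}

#[derive(Default)]
pub struct Budgets {
    // Remaining tokens per virtual key.
    remaining: HashMap<String, u64>,
}

impl Budgets {
    pub fn new() -> Self {
        Self::default()
    }

    // Create or reset a virtual key with a token allowance.
    pub fn issue(&mut self, key: &str, tokens: u64) {
        self.remaining.insert(key.to_string(), tokens);
    }

    // Charge tokens against a key; returns what is left on success.
    // Assumption: a charge that does not fit is rejected whole.
    pub fn charge(&mut self, key: &str, tokens: u64) -> Result<u64, BudgetError> {
        let left = self
            .remaining
            .get_mut(key)
            .ok_or(BudgetError::UnknownKey)?;
        match left.checked_sub(tokens) {
            Some(rest) => {
                *left = rest;
                Ok(rest)
            }
            None => Err(BudgetError::Exhausted { remaining: *left }),
        }
    }
}
```

In a streaming gateway the token count is usually only known at the end of a response, so a real implementation would have to decide whether to reserve up front or settle afterwards; the sketch takes no position on that.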

numbers, including the missing one

bench/overhead-vs-litellm: pending

Planned and not yet run: patchbay vs LiteLLM on the same host against the same backend — added latency p50/p99 on the route path, plus time-to-first-byte under streaming. The criterion bench of the route path exists; the cross-gateway numbers appear here when the run happens, not before.
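For reference, p50 and p99 over a set of latency samples can be computed with the nearest-rank method. This helper is a generic sketch, not part of the repo's bench harness, and it contains no measurements.

```rust
use std::time::Duration;

// Nearest-rank percentile: the smallest sample such that at least p percent
// of samples are less than or equal to it. Valid for p in (0, 100].
pub fn percentile(samples: &[Duration], p: f64) -> Option<Duration> {
    if samples.is_empty() || !(p > 0.0 && p <= 100.0) {
        return None;
    }
    let mut sorted = samples.to_vec();
    sorted.sort();
    // 1-based rank; always in 1..=len because 0 < p <= 100.
    let rank = ((p / 100.0) * sorted.len() as f64).ceil() as usize;
    Some(sorted[rank - 1])
}
```

Nearest-rank always returns an observed sample rather than an interpolated value, which keeps a p99 honest when the sample count is small.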