sam@latino:~/projects/patchbay$ cat README.md
patchbay
Single-binary OpenAI-compatible gateway where privacy routing is enforced in the type system — a private request cannot select an external backend, by construction.
lang Rust status active tests 13 repo github.com/slatino-dev/patchbay
- [LLM gateway]
- [type-driven design]
- [privacy routing]
- [SSE streaming]
- [self-hosted inference]
A gateway touches every token you stream, so it has exactly two jobs — add almost nothing, and never send a private prompt to someone else’s computer; patchbay does the second one at compile time.
- deployment
- 1 single binary
- private→external routes
- 0 unrepresentable in the types
- retry window
- pre-byte never after first byte
- overhead vs LiteLLM
- unmeasured head-to-head planned, same host
the zero is structural
Requests are tagged at ingress, and the tag is a type parameter, not a field. Routing a private request to an external backend is therefore not a guarded error path — it is a program that does not compile. Shape of the idea, simplified:
// simplified sketch — the real types carry more context
struct Private;
struct Shareable;
struct Request<Tag> { /* payload, key, budget … */ }
impl Request<Private> {
// the only route() that exists for Private takes local backends
fn route(self, pool: &LocalPool) -> Routed { /* … */ }
}
impl Request<Shareable> {
fn route(self, pool: &AnyPool) -> Routed { /* … */ }
}
The boundary where tags get assigned — config parsing, header mapping — is runtime code and stays dangerous, so an exhaustive proptest hammers exactly that border: across generated request/backend combinations, a private request never resolves to an external backend. Types carry the guarantee; the proptest guards the crossing where untyped input becomes typed.
route path
Backend selection blends an EWMA of observed per-backend latency with static priority, so a slow backend drains traffic without a config push and a recovered one earns it back. Fallback chains retry with jittered backoff under one hard rule: retries fire only before the first response byte. After first byte, a retry would splice duplicate output into a stream a client is already consuming — so past that point, failures surface instead of retrying.
The SSE relay is byte-faithful. Upstream bytes pass through untouched, with one exception: the final usage frame is intercepted to feed accounting. That is the entire parse surface of the streaming path.
operations
Virtual keys with per-key token budgets, governor-enforced RPM/TPM limits,
and a Prometheus /metrics endpoint. The route path has a criterion bench in
the repo — so when the LiteLLM comparison runs, the harness is already
waiting. 13 tests, clippy -D warnings clean; the proptest does the heavy
lifting on the invariant that matters.
numbers, including the missing one
Planned and not yet run: patchbay vs LiteLLM on the same host against the same backend — added latency p50/p99 on the route path, plus time-to-first-byte under streaming. The criterion bench of the route path exists; the cross-gateway numbers appear here when the run happens, not before.