sam@latino:~/projects$ ls -lt

projects

Agent systems, evals, retrieval, and self-hosted inference infrastructure. Every test count is from the project's own suite — no estimates.

6 projects · 524 tests total

~/projects/redcell · python

redcell

Defensive agent-robustness harness mapped to the OWASP Top 10 for Agentic Applications — deterministic oracles, a 146-case probe corpus, and a sandboxed three-level target.

status: active · 132 tests

~/projects/longhaul · rust

longhaul

Early-adopter MCP 2026-07-28 release-candidate server in Rust — stateless core, the Tasks extension, and statelessness proven by round-robining one client across two server instances.

status: active · 67 tests

~/projects/patchbay · rust

patchbay

Single-binary OpenAI-compatible gateway where privacy routing is enforced in the type system — a private request cannot select an external backend, by construction.

status: active · 13 tests

~/projects/millstone · rust

millstone

Clean-room BM25 + tree-sitter repo-map retrieval crate, with a bench harness against tantivy and SQLite FTS5 — the "you probably don't need embeddings (yet)" thesis.
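
For readers unfamiliar with BM25, the scoring step can be sketched in a few lines. This is a generic Okapi BM25 illustration in Python, not millstone's code: the function name `bm25_scores`, the pre-tokenized inputs, and the defaults `k1=1.2`, `b=0.75` are all assumptions made for the example.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score each tokenized document against a tokenized query (Okapi BM25).

    Hypothetical helper for illustration; assumes `docs` is non-empty and
    that at least one document contains a token.
    """
    n = len(docs)
    avgdl = sum(len(d) for d in docs) / n

    # Document frequency: how many documents contain each term.
    df = Counter()
    for d in docs:
        df.update(set(d))

    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query:
            if term not in tf:
                continue
            # Smoothed IDF; the "+ 1" inside the log keeps it non-negative.
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            # Term-frequency saturation with document-length normalization.
            norm = tf[term] + k1 * (1 - b + b * len(d) / avgdl)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


if __name__ == "__main__":
    docs = [["parse", "tree"], ["parse", "parse", "query"], ["index"]]
    ranked = sorted(enumerate(bm25_scores(["parse"], docs)),
                    key=lambda pair: pair[1], reverse=True)
    print(ranked)
```

Only term counts and document lengths are involved, which is why a lexical ranker like this needs no embedding model or vector index.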

status: active · 35 tests

~/projects/callcheck · python

callcheck

Tool-calling and structured-output conformance matrix for vLLM-served open models, with an 11-label failure taxonomy and a mock server that proves the scorer itself is correct.

status: active · 165 tests

~/projects/eval-gate · python + typescript

eval-gate

Regression-eval CI gate — embedding-free scorers, drift detection with a k-repeat noise floor, and a sticky PR comment via a GitHub Action. Dogfooded by every other repo here.
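
One way to read "drift detection with a k-repeat noise floor": run the same eval k times on the baseline to measure run-to-run variation, then flag a candidate only when its score moves by more than that variation. The sketch below assumes the floor is a multiple of the sample standard deviation; the names `noise_floor`, `has_drifted`, and `sigmas` are hypothetical and may not match how eval-gate actually computes it.

```python
from statistics import mean, stdev


def noise_floor(baseline_runs, sigmas=2.0):
    """Estimate run-to-run noise from k repeated baseline scores.

    Assumption for this sketch: floor = sigmas * sample standard deviation.
    """
    if len(baseline_runs) < 2:
        raise ValueError("need at least 2 baseline repeats to estimate noise")
    return sigmas * stdev(baseline_runs)


def has_drifted(baseline_runs, candidate_runs, sigmas=2.0):
    """Return (drifted, delta, floor) for candidate vs. baseline mean scores."""
    floor = noise_floor(baseline_runs, sigmas)
    delta = mean(candidate_runs) - mean(baseline_runs)
    # Only a shift larger than the measured noise counts as drift.
    return abs(delta) > floor, delta, floor


if __name__ == "__main__":
    baseline = [0.80, 0.82, 0.81, 0.79, 0.80]  # k = 5 repeats, made-up scores
    print(has_drifted(baseline, [0.80, 0.81, 0.80]))
    print(has_drifted(baseline, [0.70, 0.72, 0.71]))
```

Comparing against a measured floor rather than a fixed threshold keeps a CI gate from failing on ordinary sampling variance.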

status: active · 112 tests