redcell
Defensive agent-robustness harness mapped to the OWASP Top 10 for Agentic Applications — deterministic oracles, a 146-case probe corpus, and a sandboxed three-level target.
status: active · 132 tests
sam@latino:~/projects$ ls -lt
Agent systems, evals, retrieval, and self-hosted inference infrastructure. Every test count is from the project's own suite — no estimates.
Early-adopter MCP 2026-07-28 release-candidate server in Rust — stateless core, the Tasks extension, and statelessness proven by round-robining one client across two server instances.
status: active · 67 tests
Single-binary OpenAI-compatible gateway where privacy routing is enforced in the type system — a private request cannot select an external backend, by construction.
status: active · 13 tests
Clean-room BM25 + tree-sitter repo-map retrieval crate, with a bench harness against tantivy and SQLite FTS5 — the "you probably don't need embeddings (yet)" thesis.
status: active · 35 tests
Tool-calling and structured-output conformance matrix for vLLM-served open models, with an 11-label failure taxonomy and a mock server that proves the scorer itself is correct.
status: active · 165 tests
Regression-eval CI gate — embedding-free scorers, drift detection with a k-repeat noise floor, and a sticky PR comment via a GitHub Action. Dogfooded by every other repo here.
status: active · 112 tests
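
The gateway entry above says a private request cannot select an external backend "by construction". One way such a guarantee could be expressed in Rust is sketched below. This is not the project's actual code: every name here (`Private`, `Public`, `LocalBackend`, `ExternalBackend`, `Accepts`, `Request`, `route`) is hypothetical and chosen only for illustration. The idea is that the allowed (sensitivity, backend) pairs are exactly the `Accepts` impls that exist, so a disallowed pair fails to compile rather than failing at runtime.

```rust
use std::marker::PhantomData;

// Hypothetical sensitivity markers: zero-sized types used only at compile time.
pub struct Private;
pub struct Public;

// Hypothetical backends.
pub struct LocalBackend;
pub struct ExternalBackend;

pub trait Backend {
    fn name(&self) -> &'static str;
}

impl Backend for LocalBackend {
    fn name(&self) -> &'static str {
        "local"
    }
}

impl Backend for ExternalBackend {
    fn name(&self) -> &'static str {
        "external"
    }
}

/// Marker trait: implemented only for permitted (sensitivity, backend) pairs.
pub trait Accepts<S>: Backend {}

impl Accepts<Private> for LocalBackend {}
impl Accepts<Public> for LocalBackend {}
impl Accepts<Public> for ExternalBackend {}
// Deliberately no `impl Accepts<Private> for ExternalBackend`.

/// A request tagged with its sensitivity level `S`.
pub struct Request<S> {
    pub prompt: String,
    _sensitivity: PhantomData<S>,
}

impl<S> Request<S> {
    pub fn new(prompt: &str) -> Self {
        Request {
            prompt: prompt.to_string(),
            _sensitivity: PhantomData,
        }
    }
}

/// Routing compiles only when backend `B` accepts sensitivity `S`.
pub fn route<S, B: Accepts<S>>(req: &Request<S>, backend: &B) -> String {
    format!("{} <- {}", backend.name(), req.prompt)
}

// With `let private = Request::<Private>::new("...")`, the call
// `route(&private, &ExternalBackend)` is rejected by the compiler because
// the bound `ExternalBackend: Accepts<Private>` is not satisfied.
```

In this sketch the routing policy lives entirely in the list of `Accepts` impls, so changing what a private request may reach means adding or removing an impl, and the compiler then checks every call site against it.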