sam@latino:~$ whoami

AI engineer. Agent systems, evals, and self-hosted LLM infrastructure — Rust + Python.

6 open-source projects · 524 tests · Rust · Python · TypeScript

~/projects

6 dirs

~/projects/redcell python

redcell

Defensive agent-robustness harness mapped to the OWASP Top 10 for Agentic Applications — deterministic oracles, a 146-case probe corpus, and a sandboxed three-level target.

status: active · 132 tests

~/projects/longhaul rust

longhaul

Early-adopter Rust server for the MCP 2026-07-28 release candidate: a stateless core, the Tasks extension, and statelessness proven by round-robining one client across two server instances.

status: active · 67 tests

~/projects/patchbay rust

patchbay

Single-binary OpenAI-compatible gateway where privacy routing is enforced in the type system — a private request cannot select an external backend, by construction.

status: active · 13 tests
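
A minimal sketch of what "enforced in the type system" can look like, not patchbay's actual code. Every name here (`Private`, `Public`, `Request`, `LocalBackend`, `ExternalBackend`, `Serves`, `dispatch`) is hypothetical. The privacy level is a type parameter on the request, and a backend can only be selected for a privacy level it implements `Serves` for.

```rust
use std::marker::PhantomData;

// Hypothetical privacy levels, modelled as zero-sized marker types.
pub struct Private;
pub struct Public;

/// A request tagged with its privacy level at the type level.
pub struct Request<P> {
    pub prompt: String,
    _privacy: PhantomData<P>,
}

impl<P> Request<P> {
    pub fn new(prompt: &str) -> Self {
        Request { prompt: prompt.to_string(), _privacy: PhantomData }
    }
}

// Hypothetical backends: one self-hosted, one third-party.
pub struct LocalBackend { pub name: &'static str }
pub struct ExternalBackend { pub name: &'static str }

/// A backend implements `Serves<P>` only for privacy levels it may handle.
pub trait Serves<P> {
    fn route(&self, req: &Request<P>) -> String;
}

// The local backend may serve any privacy level.
impl<P> Serves<P> for LocalBackend {
    fn route(&self, req: &Request<P>) -> String {
        format!("{} <- {}", self.name, req.prompt)
    }
}

// The external backend serves public requests only.
// There is deliberately no `impl Serves<Private> for ExternalBackend`.
impl Serves<Public> for ExternalBackend {
    fn route(&self, req: &Request<Public>) -> String {
        format!("{} <- {}", self.name, req.prompt)
    }
}

/// Routing is only callable when the backend can serve this privacy level.
pub fn dispatch<P, B: Serves<P>>(backend: &B, req: &Request<P>) -> String {
    backend.route(req)
}
```

In this sketch, `dispatch(&external, &private_req)` with a `Request<Private>` is rejected at compile time because the trait bound `ExternalBackend: Serves<Private>` is not satisfied. A real gateway would also need to control how `Request<Public>` values get constructed, which this sketch leaves open.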

~/projects/millstone rust

millstone

Clean-room BM25 + tree-sitter repo-map retrieval crate, with a bench harness against tantivy and SQLite FTS5 — the "you probably don't need embeddings (yet)" thesis.

status: active · 35 tests
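
For readers who have not seen BM25 written out, here is a small sketch of textbook Okapi BM25 scoring. It is not millstone's implementation: the whitespace tokenizer, the function names `tokenize` and `bm25_scores`, and the smoothed IDF variant `ln(1 + (N - n + 0.5) / (n + 0.5))` are assumptions chosen for illustration, and there is no tree-sitter or repo-map logic here.

```rust
use std::collections::HashMap;

/// Lowercasing whitespace tokenizer (a stand-in; code-aware tokenizers differ).
fn tokenize(text: &str) -> Vec<String> {
    text.split_whitespace().map(|t| t.to_lowercase()).collect()
}

/// Scores every document against the query with Okapi BM25.
/// `k1` controls term-frequency saturation, `b` controls length normalization.
pub fn bm25_scores(docs: &[&str], query: &str, k1: f64, b: f64) -> Vec<f64> {
    let tokenized: Vec<Vec<String>> = docs.iter().map(|d| tokenize(d)).collect();
    if tokenized.is_empty() {
        return Vec::new();
    }
    let n_docs = tokenized.len() as f64;
    let avg_len = tokenized.iter().map(|d| d.len()).sum::<usize>() as f64 / n_docs;

    // Document frequency: how many documents contain each term at least once.
    let mut df: HashMap<&str, usize> = HashMap::new();
    for doc in &tokenized {
        let mut seen: Vec<&str> = doc.iter().map(|s| s.as_str()).collect();
        seen.sort_unstable();
        seen.dedup();
        for term in seen {
            *df.entry(term).or_insert(0) += 1;
        }
    }

    let query_terms = tokenize(query);
    tokenized
        .iter()
        .map(|doc| {
            // Relative document length; 1.0 if every document is empty.
            let rel_len = if avg_len > 0.0 { doc.len() as f64 / avg_len } else { 1.0 };
            query_terms
                .iter()
                .map(|q| {
                    let tf = doc.iter().filter(|t| *t == q).count() as f64;
                    let n_q = df.get(q.as_str()).copied().unwrap_or(0) as f64;
                    let idf = (1.0 + (n_docs - n_q + 0.5) / (n_q + 0.5)).ln();
                    idf * tf * (k1 + 1.0) / (tf + k1 * (1.0 - b + b * rel_len))
                })
                .sum::<f64>()
        })
        .collect()
}
```

Calling `bm25_scores(&docs, "config", 1.2, 0.75)` returns one score per document, with 0.0 for documents that contain none of the query terms. The values 1.2 and 0.75 are common defaults, not numbers taken from millstone.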

~/projects/callcheck python

callcheck

Tool-calling and structured-output conformance matrix for vLLM-served open models, with an 11-label failure taxonomy and a mock server that proves the scorer itself is correct.

status: active · 165 tests

~/projects/eval-gate python + typescript

eval-gate

Regression-eval CI gate — embedding-free scorers, drift detection with a k-repeat noise floor, and a sticky PR comment via a GitHub Action. Dogfooded by every other repo here.

status: active · 112 tests

~/writing

2026-06-10 · retrieval · bm25 · benchmarks

BM25 beat my vector database (sometimes)

A crossover framework for lexical versus vector retrieval on code — and the adversarial bench harness I built so my own argument can lose.

2026-06-06 · tool calling · evals · vllm · gateways

Every model fails tool calling differently

Tool calling is the load-bearing primitive of every agent stack, and open models break it in at least eleven distinguishable ways. Naming the failure modes changes how you build the layer above.

2026-06-03 · agent security · owasp · prompt injection · evals

Red-teaming my own agents with the OWASP Agentic Top 10

Turning "resists prompt injection" into a regression number: a deterministic harness, 146 probes across five OWASP agentic categories, and a hardening sweep that went 73% → 3% → 0%.

~/contact

email: latinosammy2@gmail.com
github: github.com/slatino-dev
linkedin: linkedin.com/in/samlatino
hf: huggingface.co/SamLatino