about — Sam Latino

AI engineer · Rust + Python · agent systems, evals, self-hosted LLM infrastructure

I build production AI systems — the orchestration layer, the evaluation harness, and the inference infrastructure underneath. Six open-source projects in 2026, all tested, all honest about what the numbers say (and where the numbers aren't in yet).

Background

B.S. Biological Engineering, Louisiana State University (2016–2020). The degree taught me how to read a methods section critically and distrust a result that wasn't reproduced. I apply the same instinct to ML benchmarks.

I came to engineering by way of enterprise software sales. That context is useful: it gave me a product-oriented perspective on technical decisions and a working model of how organizations actually adopt infrastructure. I know what "simple to operate" means to the people who have to operate it.

What I build

My work clusters around three areas:

Agent systems. Orchestration harnesses, permission models, tool-call routing, multi-round-trip flows. The practical question is always: what breaks at the boundary between the model and the environment, and how do you know before a user finds out?
Evals. Deterministic oracles where possible; statistical noise floors where not. I treat an eval suite as code: it has to be falsifiable, reproducible, and honest about confidence intervals. My own repos are dogfooded through eval-gate.
Self-hosted inference. Running Qwen models on local GPUs via vLLM. The operational cost of a cloud dependency compounds in ways that only become visible when you run your own stack and compare. Local inference also makes privacy routing tractable — one of the core ideas in patchbay.
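
A minimal sketch of the noise-floor idea from the Evals item, not eval-gate's actual code. It assumes each eval run produces one scalar score, that the baseline is rerun k times, and that a drop counts as a regression only when it exceeds z sample standard deviations of those reruns. The function names and the default z = 2.0 are hypothetical choices for illustration.

```python
from statistics import mean, stdev


def noise_floor(baseline_scores, z=2.0):
    """Run-to-run spread of the baseline: z times the sample standard deviation.

    `z` is an illustrative threshold (an assumption), not a value from any project.
    """
    if len(baseline_scores) < 2:
        raise ValueError("need at least two repeats to estimate noise")
    return z * stdev(baseline_scores)


def is_regression(baseline_scores, candidate_scores, z=2.0):
    """Flag a drop only when it is larger than the baseline's own noise."""
    drop = mean(baseline_scores) - mean(candidate_scores)
    return drop > noise_floor(baseline_scores, z)


baseline = [0.80, 0.82, 0.81]  # k=3 repeats of the same system
within_noise = is_regression(baseline, [0.80, 0.80, 0.80])
real_drop = is_regression(baseline, [0.70, 0.72, 0.71])
```

With these made-up numbers the first candidate sits inside the baseline's spread and the second falls well outside it. A single-run comparison cannot make that distinction, which is the reason for repeating runs at all.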

The projects are in Rust and Python. Rust for anything that runs as a daemon, holds state, or needs a single-binary deploy. Python for eval harnesses and anything that lives close to the model API surface.

Current projects

All six are open source. Brief descriptions with links:

redcell — agent robustness testing mapped to the OWASP Top 10 for Agentic Applications. Deterministic oracles, 146-case probe corpus, in-repo sandboxed target with three hardening levels.
longhaul — an MCP server targeting the 2026-07-28 release-candidate spec: stateless core, Tasks extension, InputRequired multi-round-trip, JSON Schema 2020-12 tool schemas. Statelessness is proven by round-robining a task lifecycle across two server instances sharing only SQLite.
patchbay — single-binary OpenAI-compatible gateway. Privacy routing is enforced in the type system: a private request cannot select an external backend, because that code path does not type-check.
millstone — clean-room BM25 + tree-sitter repo-map retrieval crate. Built to find out where the crossover between lexical and vector retrieval actually lives on code corpora.
callcheck — tool-calling and structured-output conformance matrix for vLLM-served open models. An 11-label failure taxonomy, k=3 resumable runner, in-repo mockserver so CI proves the scorer.
eval-gate — regression-eval CI gate. Embedding-free scorers, drift detection with a k-repeat noise floor, sticky PR comments via a GitHub Action.
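
For readers unfamiliar with the lexical side of millstone, here is a short sketch of standard Okapi BM25 scoring. It is written in Python for brevity and is not millstone's code, which is a Rust crate. It assumes whitespace tokenization and the commonly used defaults k1 = 1.2 and b = 0.75. The function name and the sample documents are hypothetical.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score each document against the query with Okapi BM25."""
    if not docs:
        return []
    tokenized = [d.lower().split() for d in docs]
    n = len(tokenized)
    avgdl = sum(len(t) for t in tokenized) / n
    # Document frequency: number of documents containing each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            # Smoothed IDF that stays non-negative for common terms.
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            # Length-normalized term-frequency saturation.
            norm = tf[term] + k1 * (1 - b + b * len(tokens) / avgdl)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


docs = [
    "fn parse_token lexer",
    "vector embedding index",
    "parse parse parse tree",
]
scores = bm25_scores("parse", docs)
```

A document with no query term scores zero, and repeated matches raise the score with diminishing returns. On code corpora the tokenizer matters a great deal: under whitespace splitting, "parse" does not match "parse_token".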

Contact

Open to conversations about agent infrastructure, evaluation methodology, and self-hosted inference. No cold outreach forms — direct contact only.

email: latinosammy2@gmail.com
github: github.com/slatino-dev
linkedin: linkedin.com/in/samlatino
huggingface: huggingface.co/SamLatino