BM25 beat my vector database (sometimes)
A crossover framework for lexical versus vector retrieval on code — and the adversarial bench harness I built so my own argument can lose.
sam@latino:~/writing$ ls -lt
Notes from building agent systems, evals, and self-hosted LLM infrastructure. Every post is backed by a repo you can clone and a case study you can check it against.
Tool calling is the load-bearing primitive of every agent stack, and open models break it in at least eleven distinguishable ways. Naming the failure modes changes how you build the layer above.
Turning "resists prompt injection" into a regression number: a deterministic harness, 146 probes across five OWASP agentic categories, and a hardening sweep that went 73% → 3% → 0%.