sam@latino:~$ cat uses.md
uses
The stack I reach for. No affiliate links; no sponsored placement.
Languages
- Rust
- daemons, CLI tools, anything that needs a static binary and zero runtime surprise
- Python
- eval harnesses, model-API-adjacent code, anything with a fast iteration loop
- TypeScript
- Astro, GitHub Actions, one-off scripts
Inference
- vLLM
- serving open-weight models, tool-call formatting, structured output
- LiteLLM
- routing layer over vLLM; single key surface for all tooling
- Self-hosted GPU server
- one machine on the local network; no cloud inference dependency for production workloads
Storage
- SQLite (FTS5/BM25)
- lexical retrieval by default; single-file deploys; zero ops overhead
- LanceDB
- optional vector tier when semantic recall is genuinely required
- PostgreSQL
- relational workloads that need constraints and joins
Rust crates (frequent)
- axum + tokio
- HTTP servers; async runtime
- rusqlite
- SQLite, including FTS5 virtual tables
- criterion
- micro-benchmarks; the patchbay route path runs under it
- insta
- snapshot tests; catches regressions in rendered output
- serde / serde_json
- everywhere
Python packages (frequent)
- pytest + hypothesis
- unit tests and property-based testing
- httpx
- async HTTP client for model APIs and mockservers
- jsonschema
- conformance checking in callcheck and eval-gate
- rich
- terminal output in CLI harnesses
Editor and terminal
- Neovim
- primary editor
- Windows Terminal
- daily driver shell
- Git + GitHub CLI (gh)
- version control; CI via GitHub Actions
Hardware
One desktop as the daily driver. One GPU server on the local network for model inference — Qwen models, served via vLLM. The GPU server handles all production inference; nothing goes to a cloud provider during a working session.
The machines sit on a private mesh network; SSH is the only management interface, and no web dashboards are exposed.
What I don't use
Cloud LLM APIs during development. The cost model and the latency model both change if you can assume the model is local, and I'd rather build to that assumption than paper over the seams. The tradeoffs show up in patchbay's privacy-routing design and in millstone's case against an embedding dependency in the indexing path.