tag

#ollama

4 posts

Hermes on CPU: Diagnosing a 39-Minute Response Time April 27, 2026

A disk usage query taking 39 minutes isn't slow hardware; it's a misconfigured agent. Here's how to find the real bottleneck and fix it.

Local Model or API: Choosing the Right Backend for Hermes April 27, 2026

CPU-only inference on an OptiPlex 7070 is painfully slow. Here's a practical comparison of cloud APIs and second-hand GPU upgrades, with real costs in NZD.

Choosing a Local LLM for CPU-Only Inference April 26, 2026

Comparing local models for CPU-only inference on a 16 GB machine, focused on the constraints that actually matter for a Hermes agent: 64k context, reliable tool use, fitting in RAM, and response speed.

Hermes Agent on a CPU-Only Mini PC April 25, 2026

Installing Ollama and Hermes on a headless Ubuntu box, wiring up the Claude gateway, and working through the rough edges, including a gateway install bug and a model that wouldn't use tools.