Pi Agent - Saboor

Why this exists

Off-the-shelf agent frameworks optimize for capability. Wrap a model in a tool-use loop, add function calling, ship. You get capability fast and you understand nothing about what the agent is actually doing.

I wanted the opposite: harness engineering as the point, and fine-grained control over the tools I use every day.

[SAB — the real origin.] What were you doing when you decided to build your own harness instead of using someone else's? This is the paragraph that earns the rest of the page.

What it does

Pi is a headless coding agent. It's on every machine I own and runs against whatever LLM provider is convenient. Interactive mode keeps human-in-the-loop control over what it can do; non-interactive work is described as packets — bounded YAML or JSON descriptions of work — and run by the harness.

The packet is the contract, and the runner is deliberately dumb.
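To make "the packet is the contract" concrete, here is a minimal sketch of validating a packet before any work starts. The field names (id, goal, allowedTools, maxSteps) are assumptions for illustration, not Pi's actual schema, and the hand-written checks stand in for the JSON Schema validation the real runner uses.

```typescript
// Hypothetical packet shape; field names are illustrative, not Pi's schema.
interface Packet {
  id: string;
  goal: string;
  allowedTools: string[];
  maxSteps: number;
}

type ValidationResult =
  | { ok: true; packet: Packet }
  | { ok: false; errors: string[] };

// Validate untrusted input (already parsed from YAML or JSON).
// All problems are collected so the operator sees them in one pass.
function validatePacket(input: unknown): ValidationResult {
  if (typeof input !== "object" || input === null || Array.isArray(input)) {
    return { ok: false, errors: ["packet must be an object"] };
  }
  const { id, goal, allowedTools, maxSteps } = input as Record<string, unknown>;
  const errors: string[] = [];

  if (typeof id !== "string" || id === "") {
    errors.push("id: non-empty string required");
  }
  if (typeof goal !== "string" || goal === "") {
    errors.push("goal: non-empty string required");
  }
  if (
    !Array.isArray(allowedTools) ||
    !allowedTools.every((t: unknown) => typeof t === "string")
  ) {
    errors.push("allowedTools: array of strings required");
  }
  if (typeof maxSteps !== "number" || !Number.isInteger(maxSteps) || maxSteps < 1) {
    errors.push("maxSteps: positive integer required");
  }

  if (errors.length > 0) return { ok: false, errors };
  return {
    ok: true,
    packet: {
      id: id as string,
      goal: goal as string,
      allowedTools: allowedTools as string[],
      maxSteps: maxSteps as number,
    },
  };
}
```

With the bounds living in the packet, a runner only has to refuse anything that fails validation and execute what passes, which is one way to keep it deliberately dumb.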

Every run produces timestamped artifacts plus an independent verification pass. That pass exists not because verification is good hygiene, but because small models confabulate: the verifier's trust has to come from tool-trace evidence, not from how a claim is phrased. A confident-sounding answer is not evidence that anything happened.
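A small sketch of what trusting the trace rather than the claim can look like. The record shapes and the matching rule below are assumptions made for illustration; Pi's real artifact format is not described here.

```typescript
// Hypothetical record shapes; not Pi's real artifact format.
interface ToolCall {
  tool: string;     // e.g. "bash"
  args: string;     // raw argument string as recorded in the trace
  exitCode: number; // 0 means the call succeeded
}

interface Claim {
  statement: string;   // what the agent says it did (never inspected)
  tool: string;        // the tool that should have done it
  argsInclude: string; // substring expected in that call's arguments
}

type Verdict = "supported" | "unsupported";

// A claim is supported only if the trace contains a matching call
// that exited successfully. The wording of the claim plays no part.
function verifyClaim(claim: Claim, trace: ToolCall[]): Verdict {
  const hit = trace.some(
    (call) =>
      call.tool === claim.tool &&
      call.args.includes(claim.argsInclude) &&
      call.exitCode === 0,
  );
  return hit ? "supported" : "unsupported";
}
```

A claim such as "All tests pass, I ran the full suite" comes back unsupported when the trace holds no successful test invocation, however confident the sentence sounds.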

That's also Pi's distinguishing job on a team of agents. Alongside Hermes as an always-on coordinator and Codex and Claude as peer coders, Pi is the one that verifies what the peers claim they did.

How it works

The gate is auto-mode.json. It holds twenty-eight natural-language deny rules, judged by a classifier model, alongside allow rules, an explicit tool allowlist, a failOpen setting, and ceilings on consecutive and total denials. The rules read like policy, not regex: "Irreversible Local Destruction: deleting, truncating, or overwriting local files that existed before the session started without explicit user direction naming the specific targets." Two of them govern the agent's honesty and its ability to touch its own guardrails.
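The interaction between the allowlist, failOpen, and the two denial ceilings can be sketched as follows. Only failOpen is a name taken from the text; every other field name, the verdict type, and the halt behavior are assumptions, and the classifier model is stubbed out as an input.

```typescript
// Hypothetical config shape; only `failOpen` is a name from the text.
interface GateConfig {
  denyRules: string[];   // natural-language rules handed to the classifier
  allowRules: string[];
  toolAllowlist: string[];
  failOpen: boolean;     // what to do when the classifier itself fails
  maxConsecutiveDenials: number;
  maxTotalDenials: number;
}

// The classifier is not modeled here; its verdict is passed in.
type ClassifierVerdict = "allow" | "deny" | "error";
type Decision = "allow" | "deny" | "halt";

function createGate(cfg: GateConfig) {
  let consecutive = 0;
  let total = 0;

  return function decide(tool: string, verdict: ClassifierVerdict): Decision {
    let denied: boolean;
    if (!cfg.toolAllowlist.includes(tool)) {
      denied = true; // the allowlist is checked before the classifier matters
    } else if (verdict === "error") {
      denied = !cfg.failOpen; // fail closed unless configured otherwise
    } else {
      denied = verdict === "deny";
    }

    if (!denied) {
      consecutive = 0; // an allowed call resets the streak, not the total
      return "allow";
    }
    consecutive += 1;
    total += 1;
    if (consecutive >= cfg.maxConsecutiveDenials || total >= cfg.maxTotalDenials) {
      return "halt"; // a ceiling was reached: stop the run, do not keep retrying
    }
    return "deny";
  };
}
```

The design point the sketch tries to capture is that an agent which keeps getting denied is a signal in itself, so repeated denials escalate to stopping the run instead of letting it keep probing the gate.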

The extensions are the deterministic layer — around 25 of them in TypeScript, each owning one concern: cwd-guard, mutation-confirm, recoverability-gate, language-guard, audit-log, cost-tracker, harness-state, stop-and-checkpoint, needle-shadow. Adding one is bounded and mechanical.
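As an illustration of "each owning one concern", here is a sketch of what a cwd-guard style extension could look like. The Extension interface, the beforeToolCall hook, and the first-block-wins dispatch are all hypothetical, since Pi's real extension API is not shown here; the path handling is POSIX-only and does not resolve symlinks.

```typescript
// Hypothetical extension contract; not Pi's real extension API.
interface ToolCall {
  tool: string;
  path?: string; // present only for calls that touch the filesystem
}

type HookResult = { action: "continue" } | { action: "block"; reason: string };

interface Extension {
  name: string;
  beforeToolCall(call: ToolCall): HookResult;
}

// Resolve a POSIX-style path against a base, collapsing "." and "..".
function resolvePosix(base: string, target: string): string {
  const joined = target.startsWith("/") ? target : `${base}/${target}`;
  const out: string[] = [];
  for (const seg of joined.split("/")) {
    if (seg === "" || seg === ".") continue;
    if (seg === "..") out.pop();
    else out.push(seg);
  }
  return "/" + out.join("/");
}

// One concern only: block any call whose path resolves outside the root.
function cwdGuard(root: string): Extension {
  const normRoot = resolvePosix("/", root);
  const prefix = normRoot === "/" ? "/" : normRoot + "/";
  return {
    name: "cwd-guard",
    beforeToolCall(call) {
      if (call.path === undefined) return { action: "continue" };
      const resolved = resolvePosix(normRoot, call.path);
      const inside = resolved === normRoot || resolved.startsWith(prefix);
      return inside
        ? { action: "continue" }
        : { action: "block", reason: `${resolved} is outside ${normRoot}` };
    },
  };
}

// Run extensions in load order; the first block wins.
function runHooks(extensions: Extension[], call: ToolCall): HookResult {
  for (const ext of extensions) {
    const result = ext.beforeToolCall(call);
    if (result.action === "block") return result;
  }
  return { action: "continue" };
}
```

Because the check is plain string logic with no model in the loop, the same call gets the same answer every time, which is what makes this layer deterministic.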

The runner is Bash: pi-packet validate, pi-packet run, and pi-packet last, all working against a JSON Schema packet contract. A Rust TUI (pi-hub-rs) serves as the operator hub for auditing and observing runs.

[SAB — worth answering here:] why deterministic extensions and a classifier gate, rather than one or the other? Has the gate ever blocked something it shouldn't have?

Where it stands

Daily driver, every machine. The harness audit graded the extension system well — adding an extension, packet, or dispatch target is bounded work. The open seams are extension load ordering and capability metadata on the tool catalog.

The repo is private today. Going public is gated on a scrub: operator hostnames and SSH paths still live in infra-topology.md and the incident reports, and those are tracked in git.

[SAB — status specifics.] What was the first thing you had Pi do? What does it do for you on a normal day?

What's next

Needle is a 26M on-device tool router that decides local-verb-vs-delegate-to-frontier across a five-verb registry, taking the routing decision out of a large fraction of tool calls. The project brief calls it the centerpiece.

[SAB — Needle deserves more than this paragraph, and you parked it for now.] When you're ready it's either its own section here or a fourth project. The ship-gate numbers and the finetune story are already written down in ~/.pi/needle/.