MyAPI - Saboor

Why this exists

Every agent session starts cold. Claude Code, Codex, Cursor — each one opens on a project it has never seen, scans files, infers what it can from names and READMEs, and starts working from a partial picture.

The context already existed. Three years of it: Obsidian notes, exported ChatGPT and Claude conversations, CLI agent session logs. The problem was never that the information didn't exist — it was that none of it was findable the way agents consume information. Agents don't browse. They query, and they expect structured evidence back.

[SAB — your origin moment goes here.] The brief asks: what specifically tipped you from "this is annoying" to "I'm building a system for this"? A session where an agent hallucinated something that was sitting documented in your vault? This paragraph is the one I can't write for you.

What it does

An agent sends a natural-language question to a single /query endpoint. MyAPI classifies the intent, routes it through a multi-lane retrieval pipeline, and returns reranked evidence with source metadata — in under 3 seconds.

Two audiences, one pipeline. The retrieval is identical; only the response shape changes. Agents get structured JSON — ranked evidence, source metadata, confidence. Humans get episodic recall ("find that thread where I worked out the Khoj sync race") and decision retrieval across years of notes. Same corpus, same provenance, different envelope.

How it works

The corpus is roughly 3,200 Obsidian markdown files plus conversation archives and CLI session logs. context_refinery/ normalizes those heterogeneous sources into canonical knowledge objects before anything hits the index.

Khoj provides semantic vector search. Context Refinery sits on top of it and does the actual work:

Query classification — what kind of question is this?
Multi-lane retrieval — semantic, keyword, and synthesized-note boosting, in parallel
Metadata-aware filtering — time, source type, tags
Reranking — so what comes back is evidence to reason with, not a list of file paths

context_refinery/retrieval.py carries most of that at ~1,500 lines, with ~770 lines of tests behind it.

Where it stands

Phase 1 is live — a cloud VM behind Tailscale. Not on the tailnet, can't reach it. No public endpoint, no auth layer; the network is the trust boundary.

Retrieval quality is measured, not felt. Every query in the bank gets diagnosed into one of seven buckets:

| Bucket | What it means | What it tells me to fix | |---|---|---| | Win | Correct evidence, top-ranked | Nothing — baseline | | Weak win | Correct evidence, buried | Ranking | | Corpus gap | The information isn't there | Content, not retrieval | | Retrieval gap | It's there, search missed it | Embeddings / query expansion | | Metadata gap | Filtered out by time/source/tag | The classifier's assumptions | | Intent gap | Routed to the wrong lane | Intent classification | | Answer-shape gap | Right evidence, wrong structure | Response envelope |

The buckets exist to name the lever. Corpus shaping and intent classification are the primary ones. Swapping models or tuning hyperparameters is the last resort, not the first instinct.

[SAB — the brief asks for numbers you have and I don't:] how many queries in the bank, which bucket is most common, the actual measured cold-start improvement, VM spec and real monthly cost.

What's next

Phase 1 proved the pipeline works. Phase 2 is trust calibration — working the bucket gaps, tuning the intent classifier, raising corpus signal.

The active direction is the corpus v1 substrate: normalized markdown, meaningful folder structure, frontmatter provenance, stable headers, explicit trust and canonicality fields. Corpus quality turns out to be source-of-truth quality — if what you're indexing isn't well-structured, no amount of retrieval tuning recovers the signal.

Done looks like this: an agent cold-starts, queries MyAPI, and begins working from accurate context without asking me to confirm any of it.