Cheaper models with Oxagen
How a typed code graph and business ontology let your agents run on smaller, faster models without giving up accuracy — the model-selection argument with the eval methodology to verify it.
The thesis
Everyone has the same models. The difference between a frontier model guessing at your domain and a small model retrieving from your typed graph is context, not capability.
Oxagen ingests your business ontology and your codebase into a Neo4j-backed knowledge graph. Agents query that graph through MCP. They get the schema, the call paths, the commits, and the tests in a typed, deterministic payload — and they stop paying for a frontier model to re-derive context that the graph already holds.
The trade is concrete: swap Opus for Haiku and hold accuracy on the eval suite that ships with your workspace. The saving comes from model selection, the largest variable cost in any agent deployment.
Where the savings come from
Smaller context windows
A typed traversal returns 5–20 nodes with the properties the agent needs. A vanilla agent dumps whole files. The token delta compounds across calls in a session — a five-step debugging thread on Oxagen fits in a fraction of the context the same thread spends on file dumps.
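To see why the delta compounds, here is a minimal sketch of a session in which every call re-sends the base prompt plus every tool payload gathered so far. The token sizes are placeholders chosen for illustration, not measurements from the Oxagen harness, and the sketch ignores provider-side prompt caching.

```python
def cumulative_input_tokens(payload_tokens_per_step, base_prompt_tokens=0):
    """Total input tokens across a session where each call re-sends
    the base prompt plus all tool payloads gathered so far."""
    total = 0
    context = base_prompt_tokens
    for payload in payload_tokens_per_step:
        context += payload  # the new tool result joins the context
        total += context    # the whole context is sent on this call
    return total


# Placeholder sizes, NOT measurements: a five-step debugging thread.
file_dump_steps = [4000] * 5   # assumed tokens per dumped file
typed_node_steps = [300] * 5   # assumed tokens per typed-node payload

file_dump_total = cumulative_input_tokens(file_dump_steps, base_prompt_tokens=500)
typed_total = cumulative_input_tokens(typed_node_steps, base_prompt_tokens=500)

print(file_dump_total, typed_total)
```

Because each payload is paid for again on every later call, the gap between the two totals grows faster than the per-call gap. Replace the placeholder sizes with the token counts the harness records for your own threads.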
Fewer tool-call round trips
ontology.explain_function is one call. The vanilla equivalent (read the file, grep for callers, git log, git show, grep for tests, read the tests) is ten or more. Each round trip pays for prompt cache turnover and serialization on top of the model's own inference. Walk through the agentic-coding cookbook for the exact tool-call deltas on three real threads.
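As a rough illustration of what "one call" means on the wire, the sketch below builds a single MCP tool invocation. MCP tool calls are JSON-RPC 2.0 requests with the method tools/call; the argument name and the function name passed to ontology.explain_function are hypothetical, since the tool's real input schema is not given here.

```python
import json


def build_tool_call(request_id, tool_name, arguments):
    """Build an MCP tools/call request (JSON-RPC 2.0 envelope)."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }


# Hypothetical argument shape: the real schema comes from the MCP server.
request = build_tool_call(
    1,
    "ontology.explain_function",
    {"function": "compute_invoice_total"},
)
wire = json.dumps(request)

# The kinds of steps the vanilla loop spends separate round trips on.
vanilla_steps = [
    "read the file",
    "grep for callers",
    "git log",
    "git show",
    "grep for tests",
    "read the tests",
]

print(wire)
print(f"1 typed call vs at least {len(vanilla_steps)} kinds of shell step")
```

In practice the MCP client in your coding agent builds this envelope for you; the sketch only shows that one request carries the whole question.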
Deterministic node IDs
Once the agent has a UUID for a function, that ID is stable across the session and across runs. Cache hits are real — both at the LLM provider's prompt-cache layer and at the Oxagen response layer. Free-form retrieval that returns slightly different shapes call to call defeats both.
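A minimal sketch of why a stable ID makes caching work: the cache key is the pair (tool, node ID), so a repeated question about the same function never reaches the backend twice. The ResponseCache class, the fake fetch function, and the way the UUID is produced are all stand-ins for illustration; how Oxagen mints IDs and caches responses is not specified here.

```python
import uuid


class ResponseCache:
    """Toy response cache keyed by (tool name, node ID)."""

    def __init__(self):
        self._store = {}
        self.hits = 0
        self.misses = 0

    def get_or_fetch(self, tool, node_id, fetch):
        key = (tool, node_id)
        if key in self._store:
            self.hits += 1
            return self._store[key]
        self.misses += 1
        value = fetch(node_id)
        self._store[key] = value
        return value


# Stand-in for a stable node ID: uuid5 is deterministic for a given name.
node_id = str(uuid.uuid5(uuid.NAMESPACE_URL, "repo://example/billing.py#compute_total"))

backend_calls = []


def fake_fetch(nid):
    """Hypothetical backend lookup that records each invocation."""
    backend_calls.append(nid)
    return {"id": nid, "name": "compute_total"}


cache = ResponseCache()
first = cache.get_or_fetch("ontology.explain_function", node_id, fake_fetch)
second = cache.get_or_fetch("ontology.explain_function", node_id, fake_fetch)

print(cache.hits, cache.misses, len(backend_calls))
```

A retrieval layer that returns a slightly different identifier or payload shape on each call would produce a new key every time, and the hit counter would stay at zero.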
Typed traversal beats vector misses
Vector search returns whatever is similar in embedding space, which is mostly the right thing, until it is not. A graph traversal over typed "calls" edges returns the exact callers, every time, in the order the connector wrote them. There is no false-positive blast radius for the agent to clean up.
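The difference can be sketched with a tiny in-memory edge list. The edge data and function names below are invented for illustration, and a plain Python list stands in for the Neo4j-backed graph; the point is only that filtering on the edge type returns exactly the callers and nothing that merely looks similar.

```python
from typing import NamedTuple


class Edge(NamedTuple):
    source: str
    type: str
    target: str


# Hypothetical edges, in the order a connector might have written them.
EDGES = [
    Edge("api.create_invoice", "calls", "billing.compute_total"),
    Edge("jobs.nightly_rebill", "calls", "billing.compute_total"),
    Edge("tests.test_compute_total", "tests", "billing.compute_total"),
    Edge("billing.compute_total", "calls", "tax.lookup_rate"),
]


def callers_of(function_id, edges):
    """Return sources of 'calls' edges pointing at function_id, in stored order."""
    return [e.source for e in edges if e.type == "calls" and e.target == function_id]


print(callers_of("billing.compute_total", EDGES))
```

The test edge and the outgoing call are excluded by type and direction rather than by a similarity threshold, so the result is the same on every run.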
A concrete cost-shape comparison
Anthropic's list pricing puts Opus at roughly 27× the per-token cost of Haiku. The flagship eval run on the Oxagen harness (a typed agent task using the Oxagen MCP server, scored against the same task on a stateless Opus baseline) measures a 95% inference cost reduction with accuracy held. That is the canonical number on the Oxagen homepage, and it is sourced from the Evals dashboard rather than from a marketing draft.
The shape of the comparison, on the eval task:
| | Stateless Opus baseline | Haiku + Oxagen MCP |
|---|---|---|
| Model | Claude Opus 4.7 | Claude Haiku 4.5 |
| Per-query model cost | ~$0.0220 (list) | ~$0.0008 (list) |
| Tool calls per task | 10–14 (file reads + greps + git) | 1–3 (typed MCP calls) |
| Context per call | full files / shell output | 5–20 typed nodes |
| Accuracy on harness | 1.00 baseline | held within the harness's tolerance |
| Cost delta per query | — | ~95% reduction |
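The arithmetic behind the last row can be checked directly from the per-query costs in the table. This is a sketch using only the two rounded list figures shown above; the harness output remains the authoritative source.

```python
# Per-query model costs from the table (approximate list figures, in USD).
opus_cost_per_query = 0.0220
haiku_cost_per_query = 0.0008

# How many times more expensive the Opus query is.
cost_ratio = opus_cost_per_query / haiku_cost_per_query

# Fraction of the per-query cost removed by switching models.
reduction = 1 - haiku_cost_per_query / opus_cost_per_query
reduction_pct = round(reduction * 100, 1)

print(f"ratio: {cost_ratio:.1f}x, reduction: {reduction_pct}%")
```

With these rounded inputs the ratio comes out near 27.5 and the reduction near 96%, in line with the roughly 27× and ~95% figures quoted on this page.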
Do not take the math on faith: clone oxagenai/oxagen-evals, point it at your workspace, and run it. The harness records the prompt + completion tokens per call, the tool-call count, and the correctness score. The accuracy and cost deltas above are the early observed range from the published harness. Your repos and your tasks will produce their own numbers, and the harness tells you what they are.
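To make the three recorded quantities concrete, here is a minimal sketch of how per-task records could be rolled up into a summary. The CallRecord and TaskRun shapes are hypothetical and are not the harness's actual output format, and the sample values are placeholders rather than results.

```python
from dataclasses import dataclass


@dataclass
class CallRecord:
    prompt_tokens: int
    completion_tokens: int


@dataclass
class TaskRun:
    calls: list        # list of CallRecord, one per model call
    tool_calls: int    # number of tool invocations in the task
    correctness: float # score in [0, 1]


def summarize(runs):
    """Aggregate token totals, mean tool calls, and mean correctness."""
    n = len(runs)
    total_tokens = sum(
        c.prompt_tokens + c.completion_tokens for r in runs for c in r.calls
    )
    return {
        "tasks": n,
        "total_tokens": total_tokens,
        "mean_tool_calls": sum(r.tool_calls for r in runs) / n,
        "mean_correctness": sum(r.correctness for r in runs) / n,
    }


# Placeholder runs, NOT harness output.
sample_runs = [
    TaskRun(calls=[CallRecord(900, 100), CallRecord(1200, 150)], tool_calls=2, correctness=1.0),
    TaskRun(calls=[CallRecord(800, 50)], tool_calls=3, correctness=0.5),
]
summary = summarize(sample_runs)
print(summary)
```

Running the same roll-up over a baseline configuration and a graph-backed configuration gives the two columns you would compare.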
Numerical honesty: the only number you should trust is the number you can reproduce. Every figure on this page comes from the linked harness. When the Evals dashboard moves, this page moves with it.
What this is NOT
- Not a model-quality argument. Haiku is a smaller model than Opus. We are not claiming it reasons better. We are claiming that a Haiku agent reading a typed graph beats an Opus agent reading file dumps on bounded retrieval tasks — because the bottleneck on those tasks is context, not reasoning.
- Not a claim that Oxagen replaces the model. The model still composes the answer. Oxagen replaces the file-dump-and-grep loop the model spends most of its tokens on.
- Not universal. Tasks that are unbounded reasoning over vague prose (open-ended writing, novel design from scratch) do not benefit from a code or business graph. The savings live in the retrieval-heavy tasks that dominate any production agent: code understanding, debug triage, refactor, ticket triage, support routing, internal Q&A.
How to verify on your own workspace
- Connect a GitHub repository and let the backfill complete.
- Install the Oxagen MCP server in your coding agent.
- Run the agentic-coding cookbook on a function you care about — once on Opus without the MCP server, once on Haiku with it.
- Read the Code Graph page to understand which queries compress to one MCP call.
- Clone oxagenai/oxagen-evals and run the harness against your own codebase.
The harness is the argument. The page is the index.
Where to go next
- Code Graph — what gets ingested and how the canonical query is shaped.
- Cookbook: agentic coding with Oxagen — the three threads with real MCP payloads.
- MCP Server — install in Claude Code, Cursor, VS Code, Windsurf, or Codex.
- oxagenai/oxagen-evals — methodology and harness for the numbers above.