Cheaper models with Oxagen

How a typed code graph and business ontology let your agents run on smaller, faster models without giving up accuracy — the model-selection argument with the eval methodology to verify it.

The thesis

Everyone has the same models. The difference between a frontier model guessing at your domain and a small model retrieving from your typed graph is context, not capability.

Oxagen ingests your business ontology and your codebase into a Neo4j-backed knowledge graph. Agents query that graph through MCP. They get the schema, the call paths, the commits, and the tests in a typed, deterministic payload — and they stop paying for a frontier model to re-derive context that the graph already holds.

The trade is concrete: swap Opus for Haiku, hold accuracy on the eval suite that ships with your workspace, and the inference bill drops by a factor that lives at model selection — the largest variable cost in any agent deployment.

Where the savings come from

Smaller context windows

A typed traversal returns 5–20 nodes with the properties the agent needs. A vanilla agent dumps whole files. The token delta compounds across calls in a session — a five-step debugging thread on Oxagen fits in a fraction of the context the same thread spends on file dumps.

Fewer tool-call round trips

ontology.explain_function is one call. The vanilla equivalent — read the file, grep for callers, git log, git show, grep for tests, read the tests — is ten or more. Each round trip pays for prompt cache turnover and serialization on top of the model's own inference. Walk through the agentic-coding cookbook for the exact tool-call deltas on three real threads.

Deterministic node IDs

Once the agent has a UUID for a function, that ID is stable across the session and across runs. Cache hits are real — both at the LLM provider's prompt-cache layer and at the Oxagen response layer. Free-form retrieval that returns slightly different shapes call to call defeats both.

Typed traversal beats vector misses

Vector search returns whatever is similar in embedding space, which is mostly the right thing — until it is not. A graph traversal of typed calls edges returns the exact callers, every time, in the order the connector wrote them. There is no false-positive blast radius for the agent to clean up.

A concrete cost-shape comparison

Anthropic's list pricing puts Opus at roughly 27× the per-token cost of Haiku. The flagship eval run on the Oxagen harness — a typed agent task using the Oxagen MCP server, scored against the same task on a stateless Opus baseline — measures a 95% inference cost reduction with accuracy held. That is the canonical number on the Oxagen homepage; it ships sourced from the Evals dashboard rather than a marketing draft.

The shape of the comparison, on the eval task:

	Stateless Opus baseline	Haiku + Oxagen MCP
Model	Claude Opus 4.7	Claude Haiku 4.5
Per-query model cost	~$0.0220 (list)	~$0.0008 (list)
Tool calls per task	10–14 (file reads + greps + git)	1–3 (typed MCP calls)
Context per call	full files / shell output	5–20 typed nodes
Accuracy on harness	1.00 baseline	held within the harness's tolerance
Cost delta per query	—	~95% reduction

The math you should not take on faith: clone oxagenai/oxagen-evals, point it at your workspace, and run it. The harness records the prompt + completion tokens per call, the tool-call count, and the correctness score. The accuracy and cost deltas above are the early observed range from the published harness — your repos and your tasks will produce their own numbers, and the harness tells you what they are.

Numerical honesty: the only number you should trust is the number you can reproduce. Every figure on this page comes from the linked harness. When the Evals dashboard moves, this page moves with it.

What this is NOT

Not a model-quality argument. Haiku is a smaller model than Opus. We are not claiming it reasons better. We are claiming that a Haiku agent reading a typed graph beats an Opus agent reading file dumps on bounded retrieval tasks — because the bottleneck on those tasks is context, not reasoning.
Not a claim that Oxagen replaces the model. The model still composes the answer. Oxagen replaces the file-dump-and-grep loop the model spends most of its tokens on.
Not universal. Tasks that are unbounded reasoning over vague prose (open-ended writing, novel design from scratch) do not benefit from a code or business graph. The savings live in the retrieval-heavy tasks that dominate any production agent: code understanding, debug triage, refactor, ticket triage, support routing, internal Q&A.

How to verify on your own workspace

Connect a GitHub repository and let the backfill complete.
Install the Oxagen MCP server in your coding agent.
Run the agentic-coding cookbook on a function you care about — once on Opus without the MCP server, once on Haiku with it.
Read the Code Graph page to understand which queries compress to one MCP call.
Clone oxagenai/oxagen-evals and run the harness against your own codebase.

The harness is the argument. The page is the index.

Where to go next

Code Graph — what gets ingested and how the canonical query is shaped.
Cookbook: agentic coding with Oxagen — the three threads with real MCP payloads.
MCP Server — install in Claude Code, Cursor, VS Code, Windsurf, or Codex.
oxagenai/oxagen-evals — methodology and harness for the numbers above.

Get started free · Read the docs

Cheaper models with Oxagen

On this page