Oxagen Docs

How Oxagen ingests a repository into a typed, queryable code graph, and what your agents can traverse in it.

Why a code graph?

Grep and embedding-only retrieval treat a codebase as a bag of strings. An agent asked "who calls parseJWT?" either runs ten greps and stitches the answers together, or it gets a vector-similarity guess that misses exact callers and surfaces unrelated files that happen to mention the substring. Both modes pay tokens for context the agent then has to read to find what it actually needed.

A code graph encodes the relationships directly. Files, functions, classes, imports, calls, and tests are typed nodes; the edges between them are typed too. One traversal answers structural questions exactly, in one MCP call, with deterministic node IDs the agent can cite back.

Oxagen ingests every connected GitHub repository into the same Neo4j-backed workspace knowledge graph that holds your business ontology — so an agent doing code work and an agent doing business work read from the same store, with the same RLS scoping, over the same MCP surface.

What gets ingested

Connect a GitHub repository through the GitHub App and Oxagen runs a deterministic pipeline: clone at HEAD, walk the include / exclude globs, parse each supported file with tree-sitter, resolve cross-file references with stack-graphs, fetch repo + identity + CI + tests + security metadata through the GitHub REST + GraphQL APIs, and write typed nodes and edges into the workspace graph. The source parser ships TypeScript / JavaScript and Python; the GitHub-metadata, CI, tests, and security passes apply to repositories in any language.

Every node and edge carries an envelope of provenance: tenant_id, workspace_id, source (one of github, git, tree-sitter, lsp, derived, manual), source_id, fetched_at, observed_at, schema_version. Edges carry valid_from and a nullable valid_to so every traversal is bitemporal — the graph reflects HEAD by default and can be queried at any commit (see Time-travel queries).
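
As a rough illustration of the envelope, the sketch below models the fields named above as a Python dataclass. The class name, the field types, and the sample values are assumptions made for the example; only the field names and the allowed source values come from the text.

```python
from dataclasses import dataclass
from datetime import datetime, timezone

# Source values listed in the text above.
ALLOWED_SOURCES = {"github", "git", "tree-sitter", "lsp", "derived", "manual"}


@dataclass(frozen=True)
class ProvenanceEnvelope:
    """Hypothetical container for the envelope fields; types are assumed."""

    tenant_id: str
    workspace_id: str
    source: str
    source_id: str
    fetched_at: datetime
    observed_at: datetime
    schema_version: str

    def __post_init__(self) -> None:
        # Reject any source outside the documented set.
        if self.source not in ALLOWED_SOURCES:
            raise ValueError(f"unknown source: {self.source!r}")


# Sample values are invented for illustration.
now = datetime(2026, 4, 22, 14, 3, 12, tzinfo=timezone.utc)
envelope = ProvenanceEnvelope(
    tenant_id="t-1",
    workspace_id="w-1",
    source="tree-sitter",
    source_id="services/api/lib/orders.ts",
    fetched_at=now,
    observed_at=now,
    schema_version="1",
)
```

Validating source at construction time is one way to keep the provenance field restricted to the documented values; the real pipeline may enforce this elsewhere.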

Node types

The ingestion pipeline emits nodes in nine categories. Every node inherits the provenance envelope above; the tables below name only the type-specific properties.

Repository layer

Node type	Represents	Key properties
`code.repo`	The connected repository	`full_name`, `default_branch`, `visibility`, `last_synced_sha`, `is_public`, `installation_id`
`code.branch`	A named branch on the repo	`name`, `is_default`, `is_protected`, `head_commit_id`
`code.tag`	A git tag	`name`, `target_commit_id`, `is_annotated`
`code.commit`	A commit on the default branch	`sha`, `short_sha`, `message`, `message_body`, `author_name`, `author_email`, `authored_at`, `diff_patch`, `files_changed`, `insertions`, `deletions`
`code.tree`	A git tree at a commit	`sha`, `commit_id`, `entry_count`
`code.conventional_commit_scope`	Parsed Conventional Commit scope	`scope`, `usage_count`

Identity layer

Node type	Represents	Key properties
`code.author`	A `(name, email)` pair from `git log`	`name`, `email`, `commit_count`, `first_seen_at`, `last_seen_at`
`github.user`	A GitHub account	`login`, `github_id`, `name`, `email_public`, `is_bot`, `is_member_of_org`, `avatar_url`
`person`	The deduplicated human behind one or more git authors / GitHub users	`canonical_name`, `primary_email`, `aliases[]`, `oxagen_user_id`

The identity-resolution pass runs after every sync and folds code.author + github.user into a single person via deterministic strategies, each with a fixed confidence: exact-email match (1.0), GitHub login match (0.8), name match (0.6). Each match is recorded as an edge so the resolution is auditable.
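
The strategy order can be sketched as a small scoring function. This is a minimal illustration, not the connector's code: the class names are hypothetical, and the rule used for a "login match" (comparing the email local part, including GitHub's noreply form, to the login) is an assumption, since the text does not define it.

```python
from dataclasses import dataclass
from typing import Optional

# Confidence per strategy, as listed in the text above.
EXACT_EMAIL, LOGIN_MATCH, NAME_MATCH = 1.0, 0.8, 0.6


@dataclass(frozen=True)
class GitAuthor:
    name: str
    email: str


@dataclass(frozen=True)
class GithubUser:
    login: str
    name: Optional[str]
    email_public: Optional[str]


def match_confidence(author: GitAuthor, user: GithubUser) -> Optional[float]:
    """Return the confidence of the strongest matching strategy, or None."""
    email = author.email.strip().lower()
    if user.email_public and email == user.email_public.strip().lower():
        return EXACT_EMAIL
    # Assumption: a login match compares the email local part to the login,
    # covering the noreply form "<id>+<login>@users.noreply.github.com".
    local = email.split("@", 1)[0].split("+")[-1]
    if local == user.login.lower():
        return LOGIN_MATCH
    if user.name and author.name.strip().lower() == user.name.strip().lower():
        return NAME_MATCH
    return None
```

Because the strategies are tried strongest first, the returned score is always the best available evidence for the pair, which is the value one would expect to see on the recorded edge.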

Filesystem layer

Node type	Represents	Key properties
`code.file`	One source file at HEAD	`path`, `lang`, `blob_sha`, `line_count`, `is_test_file`, `is_generated`, `parse_error`
`code.file_version`	A file at a specific commit — the anchor every code-semantic node ties to	`file_id`, `commit_id`, `blob_sha`, `path_at_commit`, `loc`, `change_type`, `previous_path`, `previous_blob_sha`
`code.package`	An internal package directory	`path`, `package_name`, `version`, `private`

Code semantics

Node type	Represents	Key properties
`code.function`	A function, method, or arrow declaration	`file_id`, `start_line`, `end_line`, `signature`, `is_exported`, `is_async`, `kind`, `parent_class`
`code.class`	A class declaration	`file_id`, `start_line`, `end_line`, `is_exported`, `export_name`
`code.symbol`	An interface, type alias, enum, or const export	`file_id`, `kind`, `start_line`, `end_line`, `is_exported`, `export_name`
`code.namespace`	A TS / Python namespace or module-level scope	`name`, `file_id`
`code.import`	A resolved import statement	`from`, `to`, `is_external`
`code.variable`	A module-level binding the parser surfaces for resolution	`name`, `file_id`, `start_line`
`code.decorator`	A decorator applied to a function or class	`name`, `file_id`
`code.type_reference`	A reference to a typed symbol from inside a function signature	`name`, `referrer_id`, `target_id`
`code.exception`	A named exception class observed in `raise` / `throw`	`name`
`code.external_package`	A declared third-party dependency	`name`, `ecosystem` (`npm`, `pypi`, `mixed`)
`code.chunk`	A summarised slice of source for hybrid retrieval	`file_id`, `start_line`, `end_line`, `summary`, `embedding_id`

Markdown / mdx documents are stored as code.file nodes with lang in {markdown, mdx} — not a separate code.doc type. Filter on lang when querying for "all docs in this repo".

The diff_patch property on code.commit (Repository layer) is byte-capped at 16 KB per commit. Larger diffs are truncated UTF-8-safely with a [truncated] marker so an agent can tell the diff was clipped before reasoning over it.
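
UTF-8-safe truncation means the cut never leaves half of a multi-byte character behind. The sketch below shows one way to do that in Python. It assumes 16 KB means 16 × 1024 bytes, that the cap applies to the diff body with the marker appended afterwards, and that the marker sits on its own line; the function name is hypothetical.

```python
MAX_DIFF_BYTES = 16 * 1024   # assumption: 16 KB = 16 * 1024 bytes
MARKER = "\n[truncated]"     # assumption: marker is appended on its own line


def cap_diff(patch: str, limit: int = MAX_DIFF_BYTES) -> str:
    """Clip a diff to at most `limit` UTF-8 bytes without splitting a character."""
    raw = patch.encode("utf-8")
    if len(raw) <= limit:
        return patch
    # errors="ignore" drops the incomplete multi-byte sequence left by the cut.
    clipped = raw[:limit].decode("utf-8", errors="ignore")
    return clipped + MARKER
```

An agent reading diff_patch can then check whether the value ends with the marker before treating the diff as complete.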

Documentation surfaces

Node type	Represents	Key properties
`code.adr`	An ADR file under `docs/adr/` or similar	`number`, `title`, `status`, `file_id`
`code.changelog_entry`	One release entry in `CHANGELOG.md`	`version`, `date`, `kind`, `body`

GitHub metadata

Node type	Represents	Key properties
`github.pull_request`	A pull request on the repo	`number`, `title`, `body`, `state`, `merged`, `merged_at`, `base_ref`, `head_ref`
`github.issue`	An issue	`number`, `title`, `body`, `state`, `closed_at`
`github.discussion`	A discussion thread	`number`, `title`, `body`, `category_id`, `is_answered`
`github.discussion_category`	A discussion category	`name`, `slug`
`github.review`	A code review on a PR	`state`, `submitted_at`, `body`
`github.review_comment`	An in-line review comment anchored to a file + line	`path`, `line`, `body`
`github.comment`	A free-form comment on an issue / PR / discussion	`body`, `created_at`
`github.label`	A repo label	`name`, `color`, `description`
`github.milestone`	A repo milestone	`title`, `due_on`, `state`
`github.release`	A published release	`tag_name`, `name`, `body`, `published_at`, `prerelease`
`github.project` / `github.project_item`	A GitHub Project board + its items	`name`, `title`, `body`, `status`

CI

Node type	Represents	Key properties
`github.workflow`	A workflow definition	`path`, `name`, `state`
`github.workflow_run`	One execution of a workflow	`run_id`, `status`, `conclusion`, `started_at`, `completed_at`
`github.workflow_job`	A job inside a workflow run	`name`, `status`, `conclusion`, `runner_id`
`github.workflow_step`	A step inside a job	`name`, `status`, `conclusion`, `number`
`github.workflow_artifact`	An artifact produced by a run	`name`, `size_bytes`, `expired`
`github.runner`	A self-hosted or hosted runner	`name`, `os`, `is_self_hosted`
`github.action`	A reusable action referenced by a workflow	`name`, `repo`, `version`

Tests, coverage, knowledge

Node type	Represents	Key properties
`code.test_suite`	A test suite (file or module)	`name`, `file_id`, `framework`
`code.test_case`	One test inside a suite	`name`, `suite_id`, `start_line`, `end_line`
`code.test_run`	An execution of a test suite tied to a workflow run	`workflow_run_id`, `started_at`, `total`, `passed`, `failed`, `skipped`
`code.test_result`	The outcome of one case in one run	`case_id`, `run_id`, `status`, `duration_ms`, `failure_message`
`code.coverage_record`	Coverage for one file at a commit	`file_id`, `commit_id`, `lines_total`, `lines_hit`

Security

Node type	Represents	Key properties
`security.dependabot_alert`	A Dependabot alert	`number`, `state`, `severity`, `package`, `ecosystem`, `cve`
`security.code_scanning_alert`	A code-scanning alert	`number`, `state`, `severity`, `rule_id`, `file_id`
`security.secret_scanning_alert`	A secret-scanning alert	`number`, `state`, `secret_type`, `validity`
`security.secret_scanning_location`	The file + line a secret was observed at	`file_id`, `start_line`, `end_line`
`security.vulnerability`	A CVE / GHSA advisory referenced by an alert	`ghsa_id`, `cve_id`, `severity`, `summary`

Memory

Node type	Represents	Key properties
`memory.episode`	One agent session: what it worked on, ran, observed, produced	`agent_id`, `started_at`, `ended_at`, `outcome`, `salience_score`
`memory.procedure`	A pattern distilled from ≥3 episodes that share an anchor	`trigger`, `precision_score`, `reuse_count`, `is_pinned`, `recommendation`
`memory.archived_episode`	An episode demoted below the salience floor after 30 days	mirrors `memory.episode` minus the live HNSW embedding

Edge types

The connector emits structural edges deterministically from the AST and the GitHub API. Semantic edges are LLM-inferred with a confidence score and can be turned off per-connection. Derived edges are computed by the M7 derivation jobs and refresh on every push.

Every edge carries valid_from and a nullable valid_to (the bitemporal envelope). The structural edges below are the canonical ones agents traverse — see packages/oxagen/oxagen/connectors/github/types.py:EdgeKind for the complete enum, including review / comment / project plumbing surfaces.

Code structure

Edge type	Source → Target	Provenance	Semantics
`contains`	`code.repo` → `code.package` / `code.file`; `code.package` → `code.file`	structural	Hierarchical containment.
`defines`	`code.file` → function / class / symbol; `code.class` → method	structural	This file (or class) declares this child.
`defined_in`	`code.function` / `code.class` / `code.symbol` → `code.file_version`	structural	Anchors the symbol to the file at a specific commit.
`member_of`	`code.function` → `code.class`	structural	Inverse of class→method `defines`. One-hop "what class does this method belong to?".
`imports`	`code.file` → `code.file`	structural	One file imports another.
`resolves_to_import`	`code.import` → `code.file` / `code.external_package`	structural	Stack-graphs resolution of the import statement.
`calls`	`code.function` → `code.function`	structural	Resolved call site. Carries `call_sites` count and `is_cross_file`.
`extends`	`code.class` → `code.class`	structural	Inheritance.
`implements`	`code.class` → `code.class`	structural	Interface implementation.
`instantiates`	`code.function` → `code.class`	structural	A function constructs the class.
`decorated_by`	`code.function` / `code.class` → `code.decorator`	structural	Decorator application.
`has_namespace`	`code.function` / `code.class` / `code.symbol` → `code.namespace`	structural	Symbol lives inside a namespace.
`has_type_reference`	`code.function` → `code.symbol`	structural	Type referenced in a function signature.
`declares_type`	`code.function` → `code.symbol`	structural	Function annotated with this type.
`re_exports`	`code.symbol` → `code.symbol`	structural	Symbol re-exported from another module.
`depends_on`	`code.file` → `code.external_package`	structural	Declared package dependency. Carries `version_spec` and `dep_kind`.
`references`	`code.file` → `code.file`	structural	Internal markdown / doc link.
`tests` / `is_tested_by`	`code.function` ↔ `code.function`	structural	Heuristic test pairing — both directions emitted.
`throws` / `is_thrown_by`	`code.function` ↔ `code.exception`	structural	Function raises a named exception.
`has_chunk`	`code.file` → `code.chunk`	structural	File chunked for hybrid retrieval.

Semantic (LLM-inferred, optional)

Edge type	Source → Target	Semantics
`reads`	`code.function` → `code.symbol` / field	Reads a value or field. Off by default on large repos.
`writes`	`code.function` → `code.symbol` / field	Writes a value or field.
`returns`	`code.function` → `code.symbol`	Returns this shape.
`modifies`	`code.function` → `code.symbol`	Mutates the target.
`calculates`	`code.function` → `code.symbol`	Computes the target value.
`validates`	`code.function` → `code.symbol`	Validates the target.
`configures`	`code.function` → `code.symbol`	Configures a target.
`computes`	`code.function` → `code.symbol`	Derived computation.

Git history and identity

Edge type	Source → Target	Provenance	Semantics
`has_commit`	`code.repo` → `code.commit`	structural	Repo has this commit on the default branch.
`contains_commit`	`code.branch` → `code.commit`	structural	Branch contains this commit.
`head_at`	`code.branch` → `code.commit`	structural	Branch points at this head commit.
`targets_branch`	`github.pull_request` → `code.branch`	structural	PR targets this branch.
`merged_as`	`github.pull_request` → `code.commit`	structural	PR was merged as this commit.
`authored_by`	`code.commit` → `code.author`	structural	Commit was authored by this person. Authors deduplicated by email.
`has_scope`	`code.commit` → `code.conventional_commit_scope`	structural	Conventional Commits scope parsed from the message.
`touched`	`code.commit` → `code.file` / `code.file_version` / `code.function`	structural	Files / file versions (always) and functions (best-effort from diff hunk markers) the commit changed. Carries `action` — `added` / `modified` / `removed` / `renamed` / `copied` — sourced from `git show --name-status`. Distinct from `modifies` — see note below.
`touches_symbol`	`code.commit` → `code.symbol` / `code.function`	structural	Symbol-precise version of `touched`, materialised by the M3 commit→symbol resolver.
`in_release`	`code.commit` → `github.release`	structural	Commit was included in this release.

GitHub metadata (PRs, issues, discussions, reviews)

Edge type	Source → Target	Semantics
`of` / `by`	`github.review` → `github.pull_request` / `github.user`	Review belongs to a PR; review was written by a user.
`of_review`	`github.review_comment` → `github.review`	Review comment belongs to a review.
`anchored_to`	`github.review_comment` → `code.file_version` / `code.file`	Review comment is anchored to a file at a commit.
`on`	`github.comment` → `github.issue` / `github.pull_request` / `github.discussion` / `code.commit`	Comment is on a parent surface.
`replies_to`	`github.comment` → `github.comment`	Threaded reply.
`assigned_to`	`github.issue` → `github.user`	Issue assignee.
`labeled`	`github.issue` / `github.pull_request` → `github.label`	Item carries a label.
`in_milestone`	`github.issue` / `github.pull_request` → `github.milestone`	Item is inside a milestone.
`in_category`	`github.discussion` → `github.discussion_category`	Discussion sits in a category.
`answered_by`	`github.discussion` → `github.comment`	The accepted answer of a discussion.
`mentions`	`github.issue` / `github.pull_request` → `github.issue` / `github.pull_request` / `code.commit` / `github.user`	Cross-link parsed from issue/PR bodies.
`closes` / `links`	`github.pull_request` → `github.issue`	PR closes / references an issue.

CI

Edge type	Source → Target	Semantics
`of_workflow`	`github.workflow_run` → `github.workflow`	Run belongs to this workflow.
`of_run`	`github.workflow_job` → `github.workflow_run`	Job belongs to this run.
`of_job`	`github.workflow_step` → `github.workflow_job`	Step belongs to this job.
`uses`	`github.workflow` / `github.workflow_step` → `github.action`	Workflow / step references this reusable action.
`triggered_by_commit`	`github.workflow_run` → `code.commit`	Run was triggered by this commit.
`triggered_by_pr`	`github.workflow_run` → `github.pull_request`	Run was triggered by this PR.
`triggered_by_user`	`github.workflow_run` → `github.user`	Run was triggered by this user (manual dispatch).
`produced`	`github.workflow_run` → `github.workflow_artifact`	Run produced this artifact.
`ran_on`	`github.workflow_job` → `github.runner`	Job ran on this runner.
`job_depends_on`	`github.workflow_job` → `github.workflow_job`	Inter-job dependency from `needs:`.

Tests and coverage

Edge type	Source → Target	Semantics
`of_suite`	`code.test_case` → `code.test_suite`	Test case belongs to a suite.
`of_case`	`code.test_result` → `code.test_case`	Result is for this case.
`of_workflow_run`	`code.test_run` → `github.workflow_run`	Test run was produced by this workflow run.
`of_file`	`code.coverage_record` → `code.file`	Coverage record applies to this file.
`at_commit`	`code.coverage_record` → `code.commit`	Coverage record is at this commit.
`hit`	`code.coverage_record` → `code.function`	Functions exercised by the run.

Security

Edge type	Source → Target	Semantics
`defined_by`	`security.dependabot_alert` → `security.vulnerability`	Alert references this CVE/GHSA.
`anchored_to`	`security.code_scanning_alert` → `code.file_version`	Alert is anchored to a file at a commit.
`of_file`	`security.secret_scanning_location` → `code.file`	Secret-scanning hit lives in this file.
`supersedes`	`security.dependabot_alert` → `security.dependabot_alert`	A new alert supersedes an older one for the same dependency.

Memory

Edge type	Source → Target	Semantics
`worked_on`	`memory.episode` → `code.symbol` / `code.file` / `github.issue` / `github.pull_request`	What the agent's session targeted.
`ran`	`memory.episode` → `code.test_case`	Tests the agent ran during the session.
`observed`	`memory.episode` → `code.test_result` / `github.workflow_run` / error fingerprint	What the agent saw happen.
`produced`	`memory.episode` → `code.commit` / `github.pull_request`	Commits / PRs the agent's session produced.
`used_tool`	`memory.episode` → `MCPTool`	MCP tool the agent called. Carries `tool`, `args_hash`, `result_hash`.
`triggers_on`	`memory.procedure` → `code.symbol` / `code.test_case` / pattern	Anchor that activates a procedure on recall.
`derived_from`	`memory.procedure` → `memory.episode`	Provenance: which episodes promoted to this procedure.
`supersedes`	`memory.procedure` → `memory.procedure`	Newer procedure replaces an older one.
`invalidated_by`	`memory.procedure` → `code.commit`	Commit that invalidated the procedure.

Derived (M7)

The derivation jobs run after every push and recompute the following edges from the structural graph. They carry confidence_score and computed_at.

Edge type	Source → Target	Semantics
`expert_on`	`person` → `code.symbol` / `code.file`	Top contributor by modification volume + recency.
`co_changes_with`	`code.symbol` → `code.symbol`	Symbols that historically change together. Powers `code.co_changes_with`.
`introduced_bug`	`code.commit` → `code.commit`	SZZ — this commit introduced the bug a later commit fixed.
`regressed_by`	`code.test_case` → `code.commit`	The commit that regressed a previously-passing test.
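
To make "symbols that historically change together" concrete, the sketch below counts how often two symbols are touched by the same commit. It is an illustration of the idea only: the text does not describe how the M7 jobs score co_changes_with, and the function name, the input shape, and the symbols verifyToken and loadKeys are all hypothetical.

```python
from collections import Counter
from itertools import combinations


def co_change_counts(commits: dict[str, set[str]]) -> Counter:
    """Count how often each unordered symbol pair appears in the same commit."""
    pairs: Counter = Counter()
    for symbols in commits.values():
        # Sort so each pair has one canonical (a, b) ordering.
        for a, b in combinations(sorted(symbols), 2):
            pairs[(a, b)] += 1
    return pairs


# Hypothetical commit -> touched-symbols history.
history = {
    "c1": {"parseJWT", "verifyToken"},
    "c2": {"parseJWT", "verifyToken", "loadKeys"},
    "c3": {"loadKeys"},
}
counts = co_change_counts(history)
```

A real derivation would likely normalise these raw counts into the confidence_score the edges carry, for example by weighting recent commits more heavily.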

Agent-authored (incidents)

Edge type	Source → Target	Semantics
`reports` / `impacts`	`code.incident` ↔ code node	Agent-authored incident edges; see the MCP server.

Why touched and not modifies for commit edges? modifies is reserved for the LLM-inferred semantic edge above (code.function → code.symbol — this function mutates this value). The structural commit→file edge would be a different relationship with the same string, which Neo4j would silently merge into a single edge type. An agent calling code.find_path { edge_types: ["modifies"] } would then get a mix of structural commit history and semantic data flow with no way to disambiguate. touched keeps the two cleanly apart.

The EdgeKind enum in packages/oxagen/oxagen/connectors/github/types.py is the source of truth for the names — every traversal tool accepts these strings as filters.

The canonical agent query

The whole graph is shaped to make the following query a one-call answer. Given a function name, return everything an agent needs to fix or refactor it: input shape, output shape, last commit, last author, diff, test coverage, and the docs that reference it.

MATCH (f:Node {type: 'code.function', name: $fn_name})
OPTIONAL MATCH (c:Node {type: 'code.commit'})-[ct:touched]->(f)
OPTIONAL MATCH (c)-[:authored_by]->(a:Node {type: 'code.author'})
OPTIONAL MATCH (f)-[:is_tested_by]->(t:Node {type: 'code.function'})
OPTIONAL MATCH (doc:Node {type: 'code.file'})-[:references]->(f)
WHERE doc.properties.lang IN ['markdown', 'mdx']
RETURN
  f.properties.params         AS input_shape,
  f.properties.return_type    AS output_shape,
  f.properties.signature      AS signature,
  c.properties.message        AS last_commit_message,
  c.properties.diff_patch     AS last_diff,
  c.properties.authored_at    AS last_authored_at,
  ct.action                   AS last_change_kind,  // added | modified | removed | renamed | copied
  a.properties.name           AS last_author,
  collect(DISTINCT t.name)    AS test_coverage,
  collect(DISTINCT doc.name)  AS referencing_docs
ORDER BY last_authored_at DESC
LIMIT 1

What each clause does, one at a time:

The first MATCH anchors on the function. name is a property; the type filter restricts to code.function so a class or symbol of the same name does not collide.
c-[ct:touched]->(f) walks back to every commit whose diff touched this function, binding the edge as ct so its action can be returned. The touched edge from a commit to a function is best-effort (emitted when the diff hunk header includes a function marker), but touched from commit to file is exhaustive, which is why code.commit is queryable as a sibling and not the only handle.
c-[:authored_by]->(a) resolves the commit's author node. Authors are deduplicated on email across the repo, so an agent can ask "who else has touched this code?" with one more hop.
f-[:is_tested_by]->(t) returns the test functions that exercise the target. The connector also emits the inverse tests edge, from the test function to the subject; pick the direction that fits the query.
doc-[:references]->(f) returns markdown / mdx documents that name the function or its file. Design docs, changelogs, and incident write-ups all link back to the symbols they describe. Markdown / mdx files are stored as code.file nodes with lang in {markdown, mdx}; the WHERE clause above filters to those.
ORDER BY last_authored_at DESC with LIMIT 1 keeps only the row for the most recent commit. The timestamp is projected in RETURN because Cypher cannot order by a variable that was not returned once the RETURN clause aggregates.

A vanilla file-search agent reproduces this with: git log --follow, git blame for the author, grep -r 'def parseJWT' to find the implementation, two more greps to locate tests, a final pass to find docs that mention the symbol — every step a separate tool call, every hop paying for tokens. The graph collapses it to one MCP call and returns typed records the agent does not have to re-parse.

Impact analysis across class members

The member_of edge makes the "if I change a method, what else in the class might break?" query a single hop. Expand the canonical query above to walk class siblings + their tests:

MATCH (m:Node {type: 'code.function', name: $method_name})
MATCH (m)-[:member_of]->(cls:Node {type: 'code.class'})
MATCH (cls)-[:defines]->(sibling:Node {type: 'code.function'})
WHERE sibling.id <> m.id
OPTIONAL MATCH (sibling)-[:is_tested_by]->(t:Node {type: 'code.function'})
RETURN
  cls.name                       AS containing_class,
  sibling.name                   AS sibling_method,
  sibling.properties.signature   AS sibling_signature,
  EXISTS { MATCH (sibling)-[:calls]->(m) } AS calls_changed_method,
  collect(DISTINCT t.name)       AS sibling_tests

What this answers in one MCP call: which methods sit beside the one the agent is about to change, which of them call into it, and the tests that exercise each of those siblings. The agent does not need to grep, git blame, or re-parse — every relationship is already typed in the graph.

You do not write Cypher directly — ontology.explain_function, ontology.symbol_context, ontology.traverse, and ontology.ask compile to this shape. ontology.symbol_context is the closest direct wrapper: pass a name (or node_id) and it returns the target node, its containing class, sibling methods, tests, callers, callees, semantic returns, and the most-recent commits with action labels and resolved authors — every collection independently capped so a hub function can't blow up the response. See How agents query the graph below for the MCP tool surface.

// laser-context bundle for a single symbol — one MCP call, no Cypher
{
  "tool": "ontology.symbol_context",
  "args": { "name": "processOrder" }
}
// →
{
  "target": {
    "id": "…", "name": "processOrder", "kind": "function",
    "signature": "async function processOrder(input)",
    "file": "services/api/lib/orders.ts",
    "is_exported": true, "is_async": true
  },
  "containing_class": { "id": "…", "name": "OrderService" },
  "siblings":         [ { "id": "…", "name": "cancelOrder" } ],
  "tests":            [ { "id": "…", "name": "processOrder_handles_decline" } ],
  "callers":          [ { "id": "…", "name": "checkoutHandler" } ],
  "callees":          [ { "id": "…", "name": "chargeStripe" } ],
  "semantic_returns": [ { "id": "…", "name": "OrderResult" } ],
  "recent_commits": [
    {
      "commit": {
        "id": "…", "name": "9a4c1f10",
        "sha": "9a4c1f10abcdef…", "short_sha": "9a4c1f10",
        "message": "fix(api): retry chargeStripe on transient 5xx",
        "authored_at": "2026-04-22T14:03:12Z",
        "files_changed": 3, "insertions": 41, "deletions": 12
      },
      "action": "modified",
      "author": { "id": "…", "name": "Alex Rivera", "email": "alex@example.com" },
      "diff_excerpt": "@@ -1,3 +1,5 @@\n  retry()"
    }
  ]
}

Caps are configurable: siblings_limit, tests_limit, callers_limit, callees_limit, returns_limit, and commits_limit default to 50 / 50 / 50 / 50 / 25 / 10 respectively. Identity comes from the bearer token — workspace_id is never read from input.
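
A client-side helper for building the call might look like the sketch below. The helper name is hypothetical, and passing the caps inside args next to name is an assumption about the input shape; the cap names and defaults are the ones listed above.

```python
# Defaults as stated in the text above.
DEFAULT_CAPS = {
    "siblings_limit": 50,
    "tests_limit": 50,
    "callers_limit": 50,
    "callees_limit": 50,
    "returns_limit": 25,
    "commits_limit": 10,
}


def symbol_context_call(name: str, **caps: int) -> dict:
    """Build an ontology.symbol_context payload, sending only non-default caps."""
    unknown = set(caps) - set(DEFAULT_CAPS)
    if unknown:
        raise ValueError(f"unknown cap(s): {sorted(unknown)}")
    # workspace_id is deliberately absent: identity comes from the bearer token.
    args: dict = {"name": name}
    args.update({k: v for k, v in caps.items() if v != DEFAULT_CAPS[k]})
    return {"tool": "ontology.symbol_context", "args": args}


call = symbol_context_call("processOrder", callers_limit=200, commits_limit=10)
```

Here commits_limit=10 matches the default and is omitted, so the payload carries only the name and the raised callers_limit.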

How agents query the graph

Every entry in the catalogue below is a real MCP tool exposed by mcp.oxagen.ai. Tool names and input field names match the running server — see MCP Server for installation. The catalogue follows the SPEC §7 grouping so it stays one-to-one with the spec doc.

Discovery and structure

Tool	What it returns
`code.repo_overview`	Size, languages, top modules, top contributors, hot files, alert counts.
`code.module_tree`	Logical module hierarchy from `code.file` nodes.
`code.find_symbol`	Resolve a name to `code.symbol` / `code.function` / `code.class` nodes.
`code.describe_symbol`	Full record per symbol — signature, callers count, callees count, summary.
`code.read_symbol`	Source bytes for one symbol — line-sliced to its range with N lines of surrounding context (default 5, configurable 0–200). Replaces grep + git-show + base64 decode in one MCP call.

Traversal and impact

Tool	What it returns
`code.find_callers`	Inbound `calls` edges to a target node, transitive up to `depth=5`.
`code.callees_of`	Outbound `calls` from a target node.
`code.find_dependencies`	Outbound `calls` / `imports` edges from a target node.
`code.find_path`	Shortest path between two nodes constrained by `edge_types`.
`code.references_to`	All inbound references — calls, imports, uses.
`code.co_changes_with`	Symbols that historically change together (M7 derived `co_changes_with` edge).
`code.get_neighborhood`	Bidirectional 1- or 2-hop expansion around a target.
`code.find_dead_code`	Nodes with no inbound `calls` / `imports` from elsewhere in the workspace.
`code.find_cycles`	Cycles in the dependency graph, filtered by `kind` and `min_length`.
`code.stats`	Aggregate counts (files, functions, classes), fan-in / fan-out, cycle count.
`ontology.explain_function`	Full edge neighbourhood for a named function — what it calls, throws, reads, writes, returns.
`ontology.impact_of`	Reverse traversal: every function that reads / writes / modifies / calculates a named symbol.
`ontology.symbol_context`	One-call laser-context bundle for a symbol — class, sibling methods, tests, callers, callees, semantic returns, recent commits. Replaces 5+ separate traversal calls.
`ontology.traverse`	Bidirectional path enumeration from a node id, capped at `max_hops=5`, optionally filtered by `edge_types`.
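
The depth-capped inbound traversal behind a tool like code.find_callers can be pictured as a breadth-first search over calls edges. The sketch below runs on an in-memory adjacency map rather than the graph store, and is an illustration of the technique, not the server's implementation; retryWorker is a hypothetical function name.

```python
from collections import deque


def find_callers(calls: dict[str, set[str]], target: str, depth: int = 5) -> dict[str, int]:
    """Map each transitive caller of `target` to its hop distance (1..depth)."""
    # Invert caller -> callees into callee -> callers for inbound traversal.
    inbound: dict[str, set[str]] = {}
    for caller, callees in calls.items():
        for callee in callees:
            inbound.setdefault(callee, set()).add(caller)

    found: dict[str, int] = {}
    queue = deque([(target, 0)])
    while queue:
        node, hops = queue.popleft()
        if hops == depth:
            continue  # depth cap reached; do not expand further
        for caller in inbound.get(node, ()):
            # Skip nodes already seen so cycles terminate.
            if caller not in found and caller != target:
                found[caller] = hops + 1
                queue.append((caller, hops + 1))
    return found


# Hypothetical call graph: caller -> callees.
calls = {
    "checkoutHandler": {"processOrder"},
    "processOrder": {"chargeStripe"},
    "retryWorker": {"chargeStripe"},
}
callers = find_callers(calls, "chargeStripe")
```

With this sample graph, processOrder and retryWorker are found at one hop and checkoutHandler at two; lowering depth to 1 drops the second-hop caller.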

SPEC §7.2 aliases dual-register at the same credit cost: code.callers_of → code.find_callers, code.dependency_path → code.find_path, code.affected_by → ontology.impact_of. Both names roll up to the canonical in usage analytics.
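
If you aggregate tool usage on the client side, the same roll-up can be mirrored with a small lookup. This is a sketch under the assumption that the three aliases above are the full alias set; the function name is hypothetical.

```python
from collections import Counter

# Alias -> canonical name, as listed in the text above.
ALIASES = {
    "code.callers_of": "code.find_callers",
    "code.dependency_path": "code.find_path",
    "code.affected_by": "ontology.impact_of",
}


def canonical_tool(name: str) -> str:
    """Resolve an alias to its canonical tool name; other names pass through."""
    return ALIASES.get(name, name)


# Both spellings count towards the canonical name.
invoked = ["code.callers_of", "code.find_callers", "code.affected_by"]
usage = Counter(canonical_tool(name) for name in invoked)
```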

History and ownership

Tool	What it returns
`code.recent_changes`	Recently-changed files plus the commits that touched them.
`code.blame_enriched`	Git blame plus the introducing PR body and its reviews.
`code.pr_history`	PRs touching a target with merge status.
`code.who_knows_about`	Top experts by `expert_on` score (M7 derivation).
`code.expertise`	What symbols a person owns (M7 derivation).

Tests and CI

Tool	What it returns
`code.tests_for`	Tests that exercise a target.
`code.coverage_for`	Coverage records for a file.
`code.failing_tests`	Currently failing test cases.
`code.flaky_tests`	Tests that flip pass/fail across recent runs.
`code.last_run`	Last workflow run for a workflow.
`code.run_failures`	Per-job failures inside a workflow run.

Dependencies and security

Tool	What it returns
`code.dependencies`	Direct dependencies with constraints, resolved versions, and licenses.
`code.dependency_graph`	Transitive dependency subgraph for a package.
`security.open_alerts`	Aggregate open Dependabot / code-scanning / secret-scanning alerts.
`security.alert_context`	Full context for one alert — anchor, vulnerability, related commits.

Issues, PRs, discussions

Tool	What it returns
`code.pr_context`	A PR plus its linked issues, reviews, and comments.
`code.issue_context`	An issue plus linked PRs, commits, and discussion.
`code.find_issues`	Substring + filter search across `github.issue` nodes.
`code.discussion_context`	A discussion plus its accepted answer and related code.

Search

Tool	What it returns
`ontology.search`	Hybrid vector + structural search across the workspace graph.
`code.find_pattern`	Hybrid score over `code.chunk` nodes — summary match plus structural anchors.
`ontology.ask`	Hybrid retrieval, then an LLM composes a grounded answer with cited node UUIDs.

Memory

Tool	What it returns
`memory.recall`	Procedures-first recall envelope — top-K procedures plus the episodes that promoted them.
`memory.remember`	Anchor a fresh `memory.episode` to the active session.
`memory.procedure_for`	Direct trigger-pattern lookup against `memory.procedure` nodes.
`memory.forget`	Soft-delete a memory. Pinned memories require an explicit grant.

Maintenance

Tool	What it returns
`ontology.refresh_repo`	Trigger a paths-restricted re-ingest on a connection.

Identity (workspace_id, user_id) is derived from the verified bearer token — never passed as input. Cross-workspace queries return empty results.

Response envelope

Every code-graph tool returns the same canonical envelope so the agent parses one shape regardless of which tool emitted it.

{
  "results": [...],
  "evidence": [
    {"node_id": "…", "kind": "code.function", "url": "https://github.com/…"}
  ],
  "tokens_used_estimate": 142,
  "counterfactual_estimate_tokens": 8400,
  "counterfactual_method": "grep_plus_read_n_files",
  "cursor": null,
  "tenant_scoped_at": {
    "tenant_id": "…",
    "workspace_id": "…",
    "project_id": null
  }
}

results — the typed records the tool was asked for.
evidence — node IDs the agent can cite back. Every record in results is reachable from at least one evidence entry.
tokens_used_estimate — tokens spent on this tool's reply, computed from the assembled payload (Anthropic SDK token-count API → tiktoken cl100k_base approximation → deterministic fallback). Note: the tiktoken fallback uses GPT-4 tokenisation, so estimates may differ slightly from Anthropic-billed token counts.
counterfactual_estimate_tokens — what the same answer would have cost a vanilla file-search agent, computed by a per-tool estimator. This is the receipt for the "fewer LLM calls, smaller models, lower bills" claim.
counterfactual_method — stable method-id naming the estimator (e.g. grep_plus_read_n_files, lockfile_transitive_parse_plus_manifests).
cursor — opaque pagination token, or null for unpaginated results.
tenant_scoped_at — the workspace the tool resolved to. Always populated.
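
As a rough illustration of how an agent might consume `cursor`, the sketch below keeps calling a tool until the envelope reports `cursor: null`. The `call_tool` callable, the `iter_results` helper, and the idea of passing the token back as a `cursor` argument are assumptions made for this example; this page does not specify how the token is returned to the server.

```python
from typing import Any, Callable, Iterator

Envelope = dict[str, Any]


def iter_results(
    call_tool: Callable[[str, dict[str, Any]], Envelope],
    tool: str,
    args: dict[str, Any],
) -> Iterator[Any]:
    """Yield every record across pages by following the opaque cursor."""
    cursor = None
    while True:
        page_args = dict(args)
        if cursor is not None:
            # Assumption: the token is sent back under the key "cursor".
            page_args["cursor"] = cursor
        envelope = call_tool(tool, page_args)
        yield from envelope["results"]
        cursor = envelope.get("cursor")
        if cursor is None:  # null cursor means there are no further pages
            break


def fake_call_tool(tool: str, args: dict[str, Any]) -> Envelope:
    """Hypothetical two-page stub standing in for a real MCP client."""
    pages = {
        None: {"results": ["a", "b"], "cursor": "p2"},
        "p2": {"results": ["c"], "cursor": None},
    }
    return pages[args.get("cursor")]


callers = list(iter_results(fake_call_tool, "code.find_callers", {"name": "parseJWT"}))
```

The token is treated as opaque: the loop never inspects it, only hands it back.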

The envelope is enforced — malformed responses are rejected before they reach the agent.
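
A client can apply a similar check on its own side. The sketch below is a minimal shape check against the fields listed above; it is not Oxagen's enforcement logic, and the `envelope_errors` helper and the assumption that a non-null `cursor` is a string are illustrative only.

```python
from typing import Any

# Field name -> expected Python type, taken from the envelope shown above.
REQUIRED_KEYS: dict[str, type] = {
    "results": list,
    "evidence": list,
    "tokens_used_estimate": int,
    "counterfactual_estimate_tokens": int,
    "counterfactual_method": str,
    "tenant_scoped_at": dict,
}


def envelope_errors(envelope: dict[str, Any]) -> list[str]:
    """Return a list of shape problems; an empty list means the envelope passes."""
    errors = []
    for key, expected in REQUIRED_KEYS.items():
        if key not in envelope:
            errors.append(f"missing key: {key}")
        elif not isinstance(envelope[key], expected):
            errors.append(f"{key} should be {expected.__name__}")
    # cursor must be present, but may be null for unpaginated results.
    if "cursor" not in envelope:
        errors.append("missing key: cursor")
    elif envelope["cursor"] is not None and not isinstance(envelope["cursor"], str):
        errors.append("cursor should be str or None")  # assumed token type
    return errors


sample = {
    "results": [],
    "evidence": [],
    "tokens_used_estimate": 142,
    "counterfactual_estimate_tokens": 8400,
    "counterfactual_method": "grep_plus_read_n_files",
    "cursor": None,
    "tenant_scoped_at": {"tenant_id": "t", "workspace_id": "w", "project_id": None},
}
sample_errors = envelope_errors(sample)
```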

Time-travel queries (`at_commit`)

Every relationship in the graph carries valid_from and a nullable valid_to, so a query can ask the graph as it stood at any commit. Tools that traverse the graph accept an optional at_commit parameter; the Cypher rewriter ANDs r.valid_from <= $at AND coalesce(r.valid_to, '9999-…') > $at into the first MATCH for every named relationship variable. Bare patterns are left alone.

// callers of parseJWT as of commit 9a4c1f2
{
  "tool": "code.find_callers",
  "args": {
    "name": "parseJWT",
    "at_commit": "9a4c1f2"
  }
}

The same shape works on ontology.explain_function, code.find_path, code.references_to, memory.recall, and any other tool that traverses bitemporal edges. The response envelope's tenant_scoped_at records the workspace; evidence records the node IDs at that commit.

To reproduce this, a vanilla file-search agent has to run git checkout <sha>, repeat every grep, parse every file again, and then discard the working tree. The graph keeps the historical state queryable without checking anything out.
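
The validity predicate that the rewriter adds can be sketched in plain Python. This is an in-memory illustration, not the Cypher rewriter itself: the `Edge` class, the `valid_at` helper, and the sample dates are hypothetical, and it assumes `at_commit` has already been resolved to a timestamp. An open-ended `valid_to` is modelled as `None` rather than as a far-future sentinel.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class Edge:
    source: str
    target: str
    valid_from: datetime
    valid_to: Optional[datetime]  # None means the edge is still valid


def valid_at(edge: Edge, at: datetime) -> bool:
    """True when valid_from <= at and valid_to is open or strictly after at."""
    return edge.valid_from <= at and (edge.valid_to is None or edge.valid_to > at)


def utc(year: int, month: int, day: int) -> datetime:
    return datetime(year, month, day, tzinfo=timezone.utc)


# Hypothetical call edges into parseJWT with different validity windows.
edges = [
    Edge("login", "parseJWT", utc(2026, 1, 10), None),
    Edge("refresh", "parseJWT", utc(2026, 1, 10), utc(2026, 3, 1)),
    Edge("logout", "parseJWT", utc(2026, 4, 1), None),
]

at = utc(2026, 2, 1)
callers = [e.source for e in edges if valid_at(e, at)]
```

Because the upper bound is exclusive, an edge closed at a given instant is no longer visible to a query at exactly that instant.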

Two agents that never run in the same process can still share state through the workspace graph. The first writes a node — a finding, a pattern, a code.commit — and the next agent's first MCP call sees it.

A concrete shape:

2026-04-22, 14:03 UTC. Agent A (Claude Code in a developer's editor) merges PR #517 to main. The push webhook triggers an incremental sync. code.commit { sha: "9a4c1f…" }, code.author { email: "alex@…" }, and the code.commit-[:touched]->code.function { name: "parseJWT" } edge land in the workspace graph within ~30 seconds.
2026-04-25, 08:11 UTC. Agent B (a triage agent invoked by an on-call engineer) receives a stack trace mentioning parseJWT. Its first MCP call is ontology.explain_function { name: "parseJWT" }. The response contains every commit since 04-22, author Alex, the diff for #517, and the tests that exercise the function — without Agent B running git log or cloning the repo.

The agents never coordinated. The graph did. Multi-agent coordination is structural, not bolt-on: the second agent reads the first agent's output the same way it reads any other graph node.

Memory and the graph are the same store

The agent memory layer (Agent Memory) writes _mem:action, _mem:sequence, and _mem:pattern nodes into the same Neo4j workspace graph as your code.* nodes. A pattern that applies to a specific function is structurally connected to it, so retrieving the pattern pulls the symbol with it.

When the evaluator promotes a pattern keyed on a code symbol — for example, "ontology mutations on parseJWT fail with NodeNotFoundError 83% of the time" — it writes:

{
  "type": "_mem:pattern",
  "name": "ontology_mutation:code.function:NodeNotFoundError",
  "properties": {
    "confidence_score": 0.83,
    "applies_to": {
      "type": "code.function",
      "name": "parseJWT"
    },
    "recommendation": "Validate the node exists and is accessible before proceeding."
  }
}

The applies_to shape is matched by the pre-execution context hook, which traverses from the pattern node to the referenced code node in one hop. An agent calling memory.context for parseJWT receives the pattern, the function node, its commits, and its tests in the same payload — one MCP call, no follow-up.

{
  "patterns": [
    {
      "id": "5c7c…",
      "name": "ontology_mutation:code.function:NodeNotFoundError",
      "confidence_score": 0.83,
      "recommendation": "Validate the node exists and is accessible before proceeding."
    }
  ],
  "code_context": {
    "function": {
      "id": "a3f2…",
      "name": "parseJWT",
      "signature": "function parseJWT(token: string): JWTClaims",
      "file": "services/api/lib/auth.ts",
      "is_exported": true
    },
    "tests": [
      { "id": "b8c1…", "name": "parseJWT_handles_expired_token" }
    ],
    "recent_commits": []
  }
}

The shape is deterministic. The agent does not need to interpret free-form text — it reads the typed record and decides.
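
A minimal sketch of "reads the typed record and decides", assuming the payload shape shown above. The `warnings_for` helper and the 0.8 confidence threshold are hypothetical choices for this example, not part of the Oxagen API.

```python
from typing import Any


def warnings_for(context: dict[str, Any], min_confidence: float = 0.8) -> list[str]:
    """Collect recommendations from patterns at or above a confidence threshold."""
    function_name = context["code_context"]["function"]["name"]
    return [
        f"{function_name}: {pattern['recommendation']}"
        for pattern in context.get("patterns", [])
        if pattern["confidence_score"] >= min_confidence
    ]


# Trimmed copy of the payload above; only the fields the helper reads are kept.
context = {
    "patterns": [
        {
            "name": "ontology_mutation:code.function:NodeNotFoundError",
            "confidence_score": 0.83,
            "recommendation": "Validate the node exists and is accessible before proceeding.",
        }
    ],
    "code_context": {
        "function": {"name": "parseJWT"},
        "tests": [],
        "recent_commits": [],
    },
}

warnings = warnings_for(context)
```

No text parsing is involved: the decision reduces to a field comparison on `confidence_score`.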

Per-repo configuration

Every GitHub connection ships with the manifest fields below. They are read on the first sync and editable from Connections → Settings (or via the API on the connection record).

Field	Type	Default	What it does
`repo_full_name`	string	—	The repository in `owner/repo` form.
`default_branch`	string	`main`	Branch to ingest. Other branches are not parsed; cross-branch comparison happens via the GitHub PR / commit metadata pass.
`include_paths`	string[]	`apps/**`, `packages/**`, `services/**`	Glob set for files to parse.
`exclude_paths`	string[]	`**/node_modules/**`, `**/.next/**`, `**/dist/**`, `**/*.d.ts`	Glob set for files to skip before parsing.
`languages`	string[]	`["typescript", "python"]`	Parser languages enabled for this repo. The source parser ships TypeScript / JavaScript and Python.
`ingest_commit_history`	boolean	`true`	Creates `code.commit` nodes for recent commits with diffs. Lets agents query what changed, who, and why.
`enable_semantic_edges`	boolean	`true`	Runs an LLM pass to infer `reads` / `writes` / `returns` / `validates` between functions. Disable on large repos to reduce cost.
`sync_schedule`	string	`daily`	`manual`, `daily`, or `weekly` — when to re-index automatically. Push webhooks always trigger an incremental sync regardless.
`commit_depth`	number	`100`	How many recent commits to ingest on first sync. `min: 10`, `max: 500`.
`min_import_edge_referrers`	number	`3`	Suppresses file-level `imports` edges to any target imported by more than this many files in the same repo; those imports are aggregated into a single `depends_on` edge instead. Reduces hub-node clutter (React, lodash).

Existing connections inherit the documented defaults on the first sync after a settings update — the manifest defines the schema, the API merges new keys non-destructively, and the worker honours the merged blob on the next run.
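
One way to picture a non-destructive merge is a dictionary merge in which stored values always win and documented defaults only fill missing keys. The sketch below assumes exactly that; the `merge_settings` helper is hypothetical, and the real API may validate or merge differently. The defaults are copied from the table above, with the path globs left out for brevity.

```python
from typing import Any

# Defaults copied from the manifest table (path globs omitted).
DEFAULTS: dict[str, Any] = {
    "default_branch": "main",
    "ingest_commit_history": True,
    "enable_semantic_edges": True,
    "sync_schedule": "daily",
    "commit_depth": 100,
    "min_import_edge_referrers": 3,
}


def merge_settings(stored: dict[str, Any], defaults: dict[str, Any] = DEFAULTS) -> dict[str, Any]:
    """Fill keys missing from the stored settings without overwriting any stored value."""
    return {**defaults, **stored}


# A connection created before commit_depth existed, with a custom schedule.
merged = merge_settings({"repo_full_name": "owner/repo", "sync_schedule": "weekly"})
```

Here `merged` keeps the stored `weekly` schedule and picks up `commit_depth` and the other new keys from the defaults.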

Sync lifecycle

A repo connection moves through four states, defined by the ConnectionStatus enum:

`status`	Meaning
`pending`	Created but no successful sync yet. The first backfill is running or queued.
`active`	At least one sync has completed. The connection is live and accepting webhooks.
`error`	The most recent sync raised. `last_error` carries the message; retry is one click.
`paused`	Operator-paused. No automatic syncs run; manual `ontology.refresh_repo` still works.

Five things drive a sync:

Initial backfill. Triggered automatically on connect. Reads commit_depth commits, walks include / exclude globs, parses every supported file, resolves edges, writes nodes.
Push webhook. GitHub pushes to the default branch trigger an incremental sync that re-parses changed files and adds new code.commit nodes since last_synced_sha.
Schedule. sync_schedule runs a defensive re-sync daily or weekly to recover from missed webhooks. manual opts out — only push webhooks and explicit ontology.refresh_repo calls trigger a sync.
Reconciliation cadence. Even if a webhook is dropped or rate-limited, the graph self-heals on a known schedule: an hourly drift sweep refreshes issues / PRs / discussions, a nightly metadata pass refreshes repository / branch / label / milestone / CODEOWNERS / branch-protection state, a nightly security pass refreshes Dependabot / code-scanning / secret-scanning alerts, and a Sunday-night full reparse re-derives every code.symbol from source. State per (workspace, surface) is tracked in a watermark table so each sweep resumes from where the last one left off.
Manual. Force a re-sync from the connection card or from any agent with ontology.refresh_repo { connection_id, paths? }. Pass paths to limit the re-ingest to a subset of the repo — useful after a targeted edit when you do not want to wait for the next webhook.
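
The rules stated above for paused connections and for `sync_schedule` can be condensed into a small decision function. This is a sketch under assumptions: it models only three of the five triggers (push webhook, schedule, manual), the `Trigger` enum and `should_sync` helper are hypothetical, and the behaviour of automatic triggers in the `pending` and `error` states is assumed rather than documented here.

```python
from enum import Enum


class ConnectionStatus(str, Enum):
    PENDING = "pending"
    ACTIVE = "active"
    ERROR = "error"
    PAUSED = "paused"


class Trigger(str, Enum):  # hypothetical names for three of the five triggers
    PUSH_WEBHOOK = "push_webhook"
    SCHEDULE = "schedule"
    MANUAL = "manual"


def should_sync(status: ConnectionStatus, trigger: Trigger, sync_schedule: str) -> bool:
    """Decide whether a trigger starts a sync, per the rules described above."""
    if trigger is Trigger.MANUAL:
        return True  # an explicit ontology.refresh_repo works even when paused
    if status is ConnectionStatus.PAUSED:
        return False  # no automatic syncs run while paused
    if trigger is Trigger.SCHEDULE:
        return sync_schedule in ("daily", "weekly")  # "manual" opts out
    return True  # push webhooks fire regardless of sync_schedule


paused_manual = should_sync(ConnectionStatus.PAUSED, Trigger.MANUAL, "daily")
paused_push = should_sync(ConnectionStatus.PAUSED, Trigger.PUSH_WEBHOOK, "daily")
```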

Where to go next

Cookbook: Index your codebase and query it with Claude — first-sync walkthrough.
Agentic coding cookbook — three end-to-end agent threads with real MCP payloads.
MCP Server — install the server in Claude Code, Cursor, VS Code, Windsurf, or Codex.
Agent Memory — how patterns and sequences attach to code nodes.
Cheaper models with Oxagen — the cost argument with eval methodology.
