Code Graph
How Oxagen ingests a repository into a typed, queryable code graph and what your agents can traverse against it.
Why a code graph?
Grep and embedding-only retrieval treat a codebase as a bag of strings. An
agent asked "who calls parseJWT?" either runs ten greps and stitches
the answers together, or it gets a vector-similarity guess that misses
exact callers and surfaces unrelated files that happen to mention the
substring. Both modes pay tokens for context the agent then has to read
to find what it actually needed.
A code graph encodes the relationships directly. Files, functions, classes, imports, calls, and tests are typed nodes; the edges between them are typed too. One traversal answers structural questions exactly, in one MCP call, with deterministic node IDs the agent can cite back.
Oxagen ingests every connected GitHub repository into the same Neo4j-backed workspace knowledge graph that holds your business ontology — so an agent doing code work and an agent doing business work read from the same store, with the same RLS scoping, over the same MCP surface.
What gets ingested
Connect a GitHub repository through the GitHub App and Oxagen runs a deterministic pipeline: clone at HEAD, walk the include / exclude globs, parse each supported file with tree-sitter, resolve cross-file references with stack-graphs, fetch repo, identity, CI, test, and security metadata through the GitHub REST and GraphQL APIs, and write typed nodes and edges into the workspace graph. The source parser supports TypeScript / JavaScript and Python; the GitHub-metadata, CI, tests, and security passes apply to repositories in any language.
Every node and edge carries an envelope of provenance:
tenant_id, workspace_id, source (one of github, git,
tree-sitter, lsp, derived, manual), source_id, fetched_at,
observed_at, schema_version. Edges carry valid_from and a nullable
valid_to so every traversal is bitemporal — the graph reflects HEAD
by default and can be queried at any commit (see
Time-travel queries).
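A minimal sketch of how the envelope could be modelled in Python. The field names and the six `source` values come from the list above; the class name `Provenance`, the field types, and the validation step are illustrative assumptions, not Oxagen's actual schema.

```python
from dataclasses import dataclass
from datetime import datetime, timezone

# The six `source` values named in the text above.
ALLOWED_SOURCES = {"github", "git", "tree-sitter", "lsp", "derived", "manual"}


@dataclass(frozen=True)
class Provenance:
    """Hypothetical model of the provenance envelope; all types are assumed."""

    tenant_id: str
    workspace_id: str
    source: str
    source_id: str
    fetched_at: datetime
    observed_at: datetime
    schema_version: str

    def __post_init__(self) -> None:
        # Reject any source outside the documented set.
        if self.source not in ALLOWED_SOURCES:
            raise ValueError(f"unknown source: {self.source!r}")


# Example values are made up for illustration.
now = datetime(2026, 4, 22, 14, 3, 12, tzinfo=timezone.utc)
envelope = Provenance(
    tenant_id="t-1", workspace_id="w-1", source="tree-sitter",
    source_id="services/api/lib/orders.ts", fetched_at=now,
    observed_at=now, schema_version="1",
)
```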
Node types
The ingestion pipeline emits nodes in nine categories. Every node inherits the provenance envelope above; the tables below name only the type-specific properties.
Repository layer
| Node type | Represents | Key properties |
|---|---|---|
code.repo | The connected repository | full_name, default_branch, visibility, last_synced_sha, is_public, installation_id |
code.branch | A named branch on the repo | name, is_default, is_protected, head_commit_id |
code.tag | A git tag | name, target_commit_id, is_annotated |
code.commit | A commit on the default branch | sha, short_sha, message, message_body, author_name, author_email, authored_at, diff_patch, files_changed, insertions, deletions |
code.tree | A git tree at a commit | sha, commit_id, entry_count |
code.conventional_commit_scope | Parsed Conventional Commit scope | scope, usage_count |
Identity layer
| Node type | Represents | Key properties |
|---|---|---|
code.author | A (name, email) pair from git log | name, email, commit_count, first_seen_at, last_seen_at |
github.user | A GitHub account | login, github_id, name, email_public, is_bot, is_member_of_org, avatar_url |
person | The deduplicated human behind one or more git authors / GitHub users | canonical_name, primary_email, aliases[], oxagen_user_id |
The identity-resolution pass runs after every sync and folds
code.author + github.user into a single person via deterministic
strategies — exact-email match (1.0), GitHub login match (0.8), name
match (0.6) — recorded as edges so the resolution is auditable.
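The strategy ordering above can be sketched as a function that tries the strongest match first and returns the confidence to record on the edge. The confidence values come from the text; the type names, the case-insensitive comparisons, and the reading of "GitHub login match" as "git author name equals the login" are assumptions made for this sketch.

```python
from typing import NamedTuple, Optional, Tuple


class Author(NamedTuple):  # mirrors code.author (name, email)
    name: str
    email: str


class GithubUser(NamedTuple):  # subset of github.user properties
    login: str
    name: Optional[str]
    email_public: Optional[str]


def resolve(author: Author, user: GithubUser) -> Optional[Tuple[str, float]]:
    """Return (strategy, confidence) for the strongest matching strategy, else None."""
    if user.email_public and author.email.lower() == user.email_public.lower():
        return ("exact_email", 1.0)
    # Assumption: a "login match" means the git author name equals the login.
    if author.name.lower() == user.login.lower():
        return ("github_login", 0.8)
    if user.name and author.name.strip().lower() == user.name.strip().lower():
        return ("name", 0.6)
    return None
```

Returning the strategy name alongside the score is one way to keep the resolution auditable: both values can be stored on the edge that links the author or user to the `person` node.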
Filesystem layer
| Node type | Represents | Key properties |
|---|---|---|
code.file | One source file at HEAD | path, lang, blob_sha, line_count, is_test_file, is_generated, parse_error |
code.file_version | A file at a specific commit — the anchor every code-semantic node ties to | file_id, commit_id, blob_sha, path_at_commit, loc, change_type, previous_path, previous_blob_sha |
code.package | An internal package directory | path, package_name, version, private |
Code semantics
| Node type | Represents | Key properties |
|---|---|---|
code.function | A function, method, or arrow declaration | file_id, start_line, end_line, signature, is_exported, is_async, kind, parent_class |
code.class | A class declaration | file_id, start_line, end_line, is_exported, export_name |
code.symbol | An interface, type alias, enum, or const export | file_id, kind, start_line, end_line, is_exported, export_name |
code.namespace | A TS / Python namespace or module-level scope | name, file_id |
code.import | A resolved import statement | from, to, is_external |
code.variable | A module-level binding the parser surfaces for resolution | name, file_id, start_line |
code.decorator | A decorator applied to a function or class | name, file_id |
code.type_reference | A reference to a typed symbol from inside a function signature | name, referrer_id, target_id |
code.exception | A named exception class observed in raise / throw | name |
code.external_package | A declared third-party dependency | name, ecosystem (npm, pypi, mixed) |
code.chunk | A summarised slice of source for hybrid retrieval | file_id, start_line, end_line, summary, embedding_id |
Markdown / mdx documents are stored as `code.file` nodes with `lang` in `{markdown, mdx}`, not as a separate `code.doc` type. Filter on `lang` when querying for "all docs in this repo".
`diff_patch` is byte-capped at 16 KB per commit. Larger diffs are truncated UTF-8-safely with a `[truncated]` marker so an agent can tell the diff was clipped before reasoning over it.
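A minimal sketch of UTF-8-safe truncation, to show what "byte-capped" means for multi-byte text. The 16 KB cap and the `[truncated]` marker come from the note above; the function name `cap_diff`, the reading of 16 KB as 16 × 1024 bytes, and the choice to count the marker toward the cap are assumptions.

```python
CAP_BYTES = 16 * 1024   # assumption: "16 KB" means 16 * 1024 bytes
MARKER = "[truncated]"  # marker text stated in the note above


def cap_diff(patch: str, cap: int = CAP_BYTES) -> str:
    """Truncate `patch` to at most `cap` UTF-8 bytes without splitting a character."""
    raw = patch.encode("utf-8")
    if len(raw) <= cap:
        return patch
    marker = MARKER.encode("utf-8")
    # Assumption: the marker counts toward the cap.
    head = raw[: max(cap - len(marker), 0)]
    # errors="ignore" drops a trailing partial multi-byte sequence left by the slice.
    return head.decode("utf-8", errors="ignore") + MARKER
```

An agent consuming `diff_patch` can then check `patch.endswith("[truncated]")` before treating the diff as complete.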
Documentation surfaces
| Node type | Represents | Key properties |
|---|---|---|
code.adr | An ADR file under docs/adr/ or similar | number, title, status, file_id |
code.changelog_entry | One release entry in CHANGELOG.md | version, date, kind, body |
GitHub metadata
| Node type | Represents | Key properties |
|---|---|---|
github.pull_request | A pull request on the repo | number, title, body, state, merged, merged_at, base_ref, head_ref |
github.issue | An issue | number, title, body, state, closed_at |
github.discussion | A discussion thread | number, title, body, category_id, is_answered |
github.discussion_category | A discussion category | name, slug |
github.review | A code review on a PR | state, submitted_at, body |
github.review_comment | An in-line review comment anchored to a file + line | path, line, body |
github.comment | A free-form comment on an issue / PR / discussion | body, created_at |
github.label | A repo label | name, color, description |
github.milestone | A repo milestone | title, due_on, state |
github.release | A published release | tag_name, name, body, published_at, prerelease |
github.project / github.project_item | A GitHub Project board + its items | name, title, body, status |
CI
| Node type | Represents | Key properties |
|---|---|---|
github.workflow | A workflow definition | path, name, state |
github.workflow_run | One execution of a workflow | run_id, status, conclusion, started_at, completed_at |
github.workflow_job | A job inside a workflow run | name, status, conclusion, runner_id |
github.workflow_step | A step inside a job | name, status, conclusion, number |
github.workflow_artifact | An artifact produced by a run | name, size_bytes, expired |
github.runner | A self-hosted or hosted runner | name, os, is_self_hosted |
github.action | A reusable action referenced by a workflow | name, repo, version |
Tests, coverage, knowledge
| Node type | Represents | Key properties |
|---|---|---|
code.test_suite | A test suite (file or module) | name, file_id, framework |
code.test_case | One test inside a suite | name, suite_id, start_line, end_line |
code.test_run | An execution of a test suite tied to a workflow run | workflow_run_id, started_at, total, passed, failed, skipped |
code.test_result | The outcome of one case in one run | case_id, run_id, status, duration_ms, failure_message |
code.coverage_record | Coverage for one file at a commit | file_id, commit_id, lines_total, lines_hit |
Security
| Node type | Represents | Key properties |
|---|---|---|
security.dependabot_alert | A Dependabot alert | number, state, severity, package, ecosystem, cve |
security.code_scanning_alert | A code-scanning alert | number, state, severity, rule_id, file_id |
security.secret_scanning_alert | A secret-scanning alert | number, state, secret_type, validity |
security.secret_scanning_location | The file + line a secret was observed at | file_id, start_line, end_line |
security.vulnerability | A CVE / GHSA advisory referenced by an alert | ghsa_id, cve_id, severity, summary |
Memory
| Node type | Represents | Key properties |
|---|---|---|
memory.episode | One agent session: what it worked on, ran, observed, produced | agent_id, started_at, ended_at, outcome, salience_score |
memory.procedure | A pattern distilled from ≥3 episodes that share an anchor | trigger, precision_score, reuse_count, is_pinned, recommendation |
memory.archived_episode | An episode demoted below the salience floor after 30 days | mirrors memory.episode minus the live HNSW embedding |
Edge types
The connector emits structural edges deterministically from the AST and the GitHub API. Semantic edges are LLM-inferred with a confidence score and can be turned off per-connection. Derived edges are computed by the M7 derivation jobs and refresh on every push.
Every edge carries valid_from and a nullable valid_to (the
bitemporal envelope). The structural
edges below are the canonical ones agents traverse — see
packages/oxagen/oxagen/connectors/github/types.py:EdgeKind for the
complete enum, including review / comment / project plumbing surfaces.
Code structure
| Edge type | Source → Target | Provenance | Semantics |
|---|---|---|---|
contains | code.repo → code.package / code.file; code.package → code.file | structural | Hierarchical containment. |
defines | code.file → function / class / symbol; code.class → method | structural | This file (or class) declares this child. |
defined_in | code.function / code.class / code.symbol → code.file_version | structural | Anchors the symbol to the file at a specific commit. |
member_of | code.function → code.class | structural | Inverse of class→method defines. One-hop "what class does this method belong to?". |
imports | code.file → code.file | structural | One file imports another. |
resolves_to_import | code.import → code.file / code.external_package | structural | Stack-graphs resolution of the import statement. |
calls | code.function → code.function | structural | Resolved call site. Carries call_sites count and is_cross_file. |
extends | code.class → code.class | structural | Inheritance. |
implements | code.class → code.class | structural | Interface implementation. |
instantiates | code.function → code.class | structural | A function constructs the class. |
decorated_by | code.function / code.class → code.decorator | structural | Decorator application. |
has_namespace | code.function / code.class / code.symbol → code.namespace | structural | Symbol lives inside a namespace. |
has_type_reference | code.function → code.symbol | structural | Type referenced in a function signature. |
declares_type | code.function → code.symbol | structural | Function annotated with this type. |
re_exports | code.symbol → code.symbol | structural | Symbol re-exported from another module. |
depends_on | code.file → code.external_package | structural | Declared package dependency. Carries version_spec and dep_kind. |
references | code.file → code.file | structural | Internal markdown / doc link. |
tests / is_tested_by | code.function ↔ code.function | structural | Heuristic test pairing — both directions emitted. |
throws / is_thrown_by | code.function ↔ code.exception | structural | Function raises a named exception. |
has_chunk | code.file → code.chunk | structural | File chunked for hybrid retrieval. |
Semantic (LLM-inferred, optional)
| Edge type | Source → Target | Semantics |
|---|---|---|
reads | code.function → code.symbol / field | Reads a value or field. Off by default on large repos. |
writes | code.function → code.symbol / field | Writes a value or field. |
returns | code.function → code.symbol | Returns this shape. |
modifies | code.function → code.symbol | Mutates the target. |
calculates | code.function → code.symbol | Computes the target value. |
validates | code.function → code.symbol | Validates the target. |
configures | code.function → code.symbol | Configures a target. |
computes | code.function → code.symbol | Derived computation. |
Git history and identity
| Edge type | Source → Target | Provenance | Semantics |
|---|---|---|---|
has_commit | code.repo → code.commit | structural | Repo has this commit on the default branch. |
contains_commit | code.branch → code.commit | structural | Branch contains this commit. |
head_at | code.branch → code.commit | structural | Branch points at this head commit. |
targets_branch | github.pull_request → code.branch | structural | PR targets this branch. |
merged_as | github.pull_request → code.commit | structural | PR was merged as this commit. |
authored_by | code.commit → code.author | structural | Commit was authored by this person. Authors deduplicated by email. |
has_scope | code.commit → code.conventional_commit_scope | structural | Conventional Commits scope parsed from the message. |
touched | code.commit → code.file / code.file_version / code.function | structural | Files / file versions (always) and functions (best-effort from diff hunk markers) the commit changed. Carries action — added / modified / removed / renamed / copied — sourced from git show --name-status. Distinct from modifies — see note below. |
touches_symbol | code.commit → code.symbol / code.function | structural | Symbol-precise version of touched, materialised by the M3 commit→symbol resolver. |
in_release | code.commit → github.release | structural | Commit was included in this release. |
GitHub metadata (PRs, issues, discussions, reviews)
| Edge type | Source → Target | Semantics |
|---|---|---|
of / by | github.review → github.pull_request / github.user | Review belongs to a PR; review was written by a user. |
of_review | github.review_comment → github.review | Review comment belongs to a review. |
anchored_to | github.review_comment → code.file_version / code.file | Review comment is anchored to a file at a commit. |
on | github.comment → github.issue / github.pull_request / github.discussion / code.commit | Comment is on a parent surface. |
replies_to | github.comment → github.comment | Threaded reply. |
assigned_to | github.issue → github.user | Issue assignee. |
labeled | github.issue / github.pull_request → github.label | Item carries a label. |
in_milestone | github.issue / github.pull_request → github.milestone | Item is inside a milestone. |
in_category | github.discussion → github.discussion_category | Discussion sits in a category. |
answered_by | github.discussion → github.comment | The accepted answer of a discussion. |
mentions | github.issue / github.pull_request → github.issue / github.pull_request / code.commit / github.user | Cross-link parsed from issue/PR bodies. |
closes / links | github.pull_request → github.issue | PR closes / references an issue. |
CI
| Edge type | Source → Target | Semantics |
|---|---|---|
of_workflow | github.workflow_run → github.workflow | Run belongs to this workflow. |
of_run | github.workflow_job → github.workflow_run | Job belongs to this run. |
of_job | github.workflow_step → github.workflow_job | Step belongs to this job. |
uses | github.workflow / github.workflow_step → github.action | Workflow / step references this reusable action. |
triggered_by_commit | github.workflow_run → code.commit | Run was triggered by this commit. |
triggered_by_pr | github.workflow_run → github.pull_request | Run was triggered by this PR. |
triggered_by_user | github.workflow_run → github.user | Run was triggered by this user (manual dispatch). |
produced | github.workflow_run → github.workflow_artifact | Run produced this artifact. |
ran_on | github.workflow_job → github.runner | Job ran on this runner. |
job_depends_on | github.workflow_job → github.workflow_job | Inter-job dependency from needs:. |
Tests and coverage
| Edge type | Source → Target | Semantics |
|---|---|---|
of_suite | code.test_case → code.test_suite | Test case belongs to a suite. |
of_case | code.test_result → code.test_case | Result is for this case. |
of_workflow_run | code.test_run → github.workflow_run | Test run was produced by this workflow run. |
of_file | code.coverage_record → code.file | Coverage record applies to this file. |
at_commit | code.coverage_record → code.commit | Coverage record is at this commit. |
hit | code.coverage_record → code.function | Functions exercised by the run. |
Security
| Edge type | Source → Target | Semantics |
|---|---|---|
defined_by | security.dependabot_alert → security.vulnerability | Alert references this CVE/GHSA. |
anchored_to | security.code_scanning_alert → code.file_version | Alert is anchored to a file at a commit. |
of_file | security.secret_scanning_location → code.file | Secret-scanning hit lives in this file. |
supersedes | security.dependabot_alert → security.dependabot_alert | A new alert supersedes an older one for the same dependency. |
Memory
| Edge type | Source → Target | Semantics |
|---|---|---|
worked_on | memory.episode → code.symbol / code.file / github.issue / github.pull_request | What the agent's session targeted. |
ran | memory.episode → code.test_case | Tests the agent ran during the session. |
observed | memory.episode → code.test_result / github.workflow_run / error fingerprint | What the agent saw happen. |
produced | memory.episode → code.commit / github.pull_request | Commits / PRs the agent's session produced. |
used_tool | memory.episode → MCPTool | MCP tool the agent called. Carries tool, args_hash, result_hash. |
triggers_on | memory.procedure → code.symbol / code.test_case / pattern | Anchor that activates a procedure on recall. |
derived_from | memory.procedure → memory.episode | Provenance: which episodes promoted to this procedure. |
supersedes | memory.procedure → memory.procedure | Newer procedure replaces an older one. |
invalidated_by | memory.procedure → code.commit | Commit that invalidated the procedure. |
Derived (M7)
The derivation jobs run after every push and recompute the following
edges from the structural graph. They carry confidence_score and
computed_at.
| Edge type | Source → Target | Semantics |
|---|---|---|
expert_on | person → code.symbol / code.file | Top contributor by modification volume + recency. |
co_changes_with | code.symbol → code.symbol | Symbols that historically change together. Powers code.co_changes_with. |
introduced_bug | code.commit → code.commit | SZZ — this commit introduced the bug a later commit fixed. |
regressed_by | code.test_case → code.commit | The commit that regressed a previously-passing test. |
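The derivation method behind `co_changes_with` is not described here. As a generic illustration of co-change analysis, the sketch below counts how often two symbols are touched by the same commit. The input shape (commit SHA mapped to a set of symbol names) and the function name are hypothetical, and the real job may weight or normalise these counts differently.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, Set


def co_change_counts(commits: Dict[str, Set[str]]) -> Counter:
    """Count how often each unordered pair of symbols is touched by the same commit."""
    pairs: Counter = Counter()
    for symbols in commits.values():
        # sorted() gives each pair one canonical (a, b) ordering.
        for a, b in combinations(sorted(symbols), 2):
            pairs[(a, b)] += 1
    return pairs


# Made-up history for illustration.
history = {
    "c1": {"parseJWT", "verifyToken"},
    "c2": {"parseJWT", "verifyToken", "login"},
    "c3": {"login"},
}
counts = co_change_counts(history)
```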
Agent-authored (incidents)
| Edge type | Source → Target | Semantics |
|---|---|---|
reports / impacts | code.incident ↔ code node | Agent-authored incident edges; see the MCP server. |
Why `touched` and not `modifies` for commit edges? `modifies` is reserved for the LLM-inferred semantic edge above (`code.function → code.symbol`: this function mutates this value). The structural commit→file edge would be a different relationship with the same string, which Neo4j would silently merge into a single edge type. An agent calling `code.find_path { edge_types: ["modifies"] }` would then get a mix of structural commit history and semantic data flow with no way to disambiguate. `touched` keeps the two cleanly apart.
The EdgeKind enum in
packages/oxagen/oxagen/connectors/github/types.py is the source of
truth for the names — every traversal tool accepts these strings as
filters.
The canonical agent query
The whole graph is shaped to make the following query a one-call answer. Given a function name, return everything an agent needs to fix or refactor it: input shape, output shape, last commit, last author, diff, test coverage, and the docs that reference it.
```cypher
// Anchor on the function by name; the type filter excludes same-named classes and symbols.
MATCH (f:Node {type: 'code.function', name: $fn_name})
// Bind the touched edge as ct so its action property can be returned.
OPTIONAL MATCH (c:Node {type: 'code.commit'})-[ct:touched]->(f)
OPTIONAL MATCH (c)-[:authored_by]->(a:Node {type: 'code.author'})
OPTIONAL MATCH (f)<-[:is_tested_by]-(t:Node {type: 'code.function'})
OPTIONAL MATCH (doc:Node {type: 'code.file'})-[:references]->(f)
  WHERE doc.properties.lang IN ['markdown', 'mdx']
RETURN
  f.properties.params AS input_shape,
  f.properties.return_type AS output_shape,
  f.properties.signature AS signature,
  c.properties.message AS last_commit_message,
  c.properties.diff_patch AS last_diff,
  ct.action AS last_change_kind, // added | modified | removed | renamed | copied
  c.properties.authored_at AS last_authored_at, // projected so ORDER BY can use it after aggregation
  a.properties.name AS last_author,
  collect(DISTINCT t.name) AS test_coverage,
  collect(DISTINCT doc.name) AS referencing_docs
ORDER BY last_authored_at DESC
LIMIT 1
```
What the clauses do, one at a time:
- The first `MATCH` anchors on the function. `name` is a property; the `type` filter restricts to `code.function` so a class or symbol of the same name does not collide.
- `(c)-[ct:touched]->(f)` walks back to every commit whose diff touched this function. The `touched` edge from a commit to a function is best-effort (emitted when the diff hunk header includes a function marker), but `touched` from commit to file is exhaustive, which is why `code.commit` is queryable as a sibling and not the only handle.
- `(c)-[:authored_by]->(a)` resolves the commit's author node. Authors are deduplicated on email across the repo, so an agent can ask "who else has touched this code?" with one more hop.
- `(f)<-[:is_tested_by]-(t)` returns the test functions that exercise the target. The connector also emits the inverse `tests` edge from the test function to the subject; pick the direction that fits the query.
- `(doc)-[:references]->(f)` returns markdown / mdx documents that name the function or its file. Design docs, changelogs, and incident write-ups all link back to the symbols they describe. Markdown / mdx files are stored as `code.file` nodes with `lang` in `{markdown, mdx}`; the `WHERE` clause above filters to those.
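The commit→function `touched` edge is best-effort because it depends on the optional context text git appends after the second `@@` of a unified-diff hunk header. A minimal sketch of reading that context follows; the regex and the function name `hunk_context` are illustrative assumptions, not the connector's actual parser.

```python
import re
from typing import Optional

# Unified-diff hunk header: "@@ -old[,n] +new[,n] @@ optional context"
HUNK_RE = re.compile(r"^@@ -\d+(?:,\d+)? \+\d+(?:,\d+)? @@(?: (.*))?$")


def hunk_context(header: str) -> Optional[str]:
    """Return the trailing context text of a hunk header, or None if absent."""
    m = HUNK_RE.match(header)
    if m is None:
        return None
    context = (m.group(1) or "").strip()
    return context or None
```

When the header carries no context (for example, a change near the top of a file), there is nothing to resolve to a function, and only the exhaustive commit→file edge can be emitted.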
A vanilla file-search agent reproduces this with: git log --follow,
git blame for the author, grep -r 'def parseJWT' to find the
implementation, two more greps to locate tests, a final pass to find
docs that mention the symbol — every step a separate tool call, every
hop paying for tokens. The graph collapses it to one MCP call and
returns typed records the agent does not have to re-parse.
Impact analysis across class members
The member_of edge makes the "if I change a method, what else in the
class might break?" query a single hop. Expand the canonical query
above to walk class siblings + their tests:
```cypher
MATCH (m:Node {type: 'code.function', name: $method_name})
MATCH (m)-[:member_of]->(cls:Node {type: 'code.class'})
MATCH (cls)-[:defines]->(sibling:Node {type: 'code.function'})
  WHERE sibling.id <> m.id
OPTIONAL MATCH (sibling)<-[:is_tested_by]-(t:Node {type: 'code.function'})
RETURN
  cls.name AS containing_class,
  sibling.name AS sibling_method,
  sibling.properties.signature AS sibling_signature,
  EXISTS { MATCH (sibling)-[:calls]->(m) } AS calls_changed_method,
  collect(DISTINCT t.name) AS sibling_tests
```
What this answers in one MCP call: which methods sit beside the one
the agent is about to change, which of them call into it, and the
tests that exercise each of those siblings. The agent does not need
to grep, git blame, or re-parse — every relationship is already
typed in the graph.
You do not write Cypher directly — ontology.explain_function,
ontology.symbol_context, ontology.traverse, and ontology.ask
compile to this shape. ontology.symbol_context is the closest
direct wrapper: pass a name (or node_id) and it returns the
target node, its containing class, sibling methods, tests, callers,
callees, semantic returns, and the most-recent commits with action
labels and resolved authors — every collection independently capped
so a hub function can't blow up the response. See
How agents query the graph below for
the MCP tool surface.
```jsonc
// laser-context bundle for a single symbol: one MCP call, no Cypher
{
  "tool": "ontology.symbol_context",
  "args": { "name": "processOrder" }
}
// →
{
  "target": {
    "id": "…", "name": "processOrder", "kind": "function",
    "signature": "async function processOrder(input)",
    "file": "services/api/lib/orders.ts",
    "is_exported": true, "is_async": true
  },
  "containing_class": { "id": "…", "name": "OrderService" },
  "siblings": [ { "id": "…", "name": "cancelOrder" } ],
  "tests": [ { "id": "…", "name": "processOrder_handles_decline" } ],
  "callers": [ { "id": "…", "name": "checkoutHandler" } ],
  "callees": [ { "id": "…", "name": "chargeStripe" } ],
  "semantic_returns": [ { "id": "…", "name": "OrderResult" } ],
  "recent_commits": [
    {
      "commit": {
        "id": "…", "name": "9a4c1f10",
        "sha": "9a4c1f10abcdef…", "short_sha": "9a4c1f10",
        "message": "fix(api): retry chargeStripe on transient 5xx",
        "authored_at": "2026-04-22T14:03:12Z",
        "files_changed": 3, "insertions": 41, "deletions": 12
      },
      "action": "modified",
      "author": { "id": "…", "name": "Alex Rivera", "email": "alex@example.com" },
      "diff_excerpt": "@@ -1,3 +1,5 @@\n retry()"
    }
  ]
}
```
Caps are configurable: siblings_limit, tests_limit, callers_limit,
callees_limit, returns_limit, and commits_limit default to 50 / 50
/ 50 / 50 / 25 / 10 respectively. Identity comes from the bearer token —
workspace_id is never read from input.
How agents query the graph
Every entry in the catalogue below is a real MCP tool exposed by
mcp.oxagen.ai. Tool names and input field names match the running
server — see MCP Server for installation. The
catalogue follows the SPEC §7 grouping so it stays one-to-one with
the spec doc.
Discovery and structure
| Tool | What it returns |
|---|---|
code.repo_overview | Size, languages, top modules, top contributors, hot files, alert counts. |
code.module_tree | Logical module hierarchy from code.file nodes. |
code.find_symbol | Resolve a name to code.symbol / code.function / code.class nodes. |
code.describe_symbol | Full record per symbol — signature, callers count, callees count, summary. |
code.read_symbol | Source bytes for one symbol — line-sliced to its range with N lines of surrounding context (default 5, configurable 0–200). Replaces grep + git-show + base64 decode in one MCP call. |
Traversal and impact
| Tool | What it returns |
|---|---|
code.find_callers | Inbound calls edges to a target node, transitive up to depth=5. |
code.callees_of | Outbound calls from a target node. |
code.find_dependencies | Outbound calls / imports edges from a target node. |
code.find_path | Shortest path between two nodes constrained by edge_types. |
code.references_to | All inbound references — calls, imports, uses. |
code.co_changes_with | Symbols that historically change together (M7 derived co_changes_with edge). |
code.get_neighborhood | Bidirectional 1- or 2-hop expansion around a target. |
code.find_dead_code | Nodes with no inbound calls / imports from elsewhere in the workspace. |
code.find_cycles | Cycles in the dependency graph, filtered by kind and min_length. |
code.stats | Aggregate counts (files, functions, classes), fan-in / fan-out, cycle count. |
ontology.explain_function | Full edge neighbourhood for a named function — what it calls, throws, reads, writes, returns. |
ontology.impact_of | Reverse traversal: every function that reads / writes / modifies / calculates a named symbol. |
ontology.symbol_context | One-call laser-context bundle for a symbol — class, sibling methods, tests, callers, callees, semantic returns, recent commits. Replaces 5+ separate traversal calls. |
ontology.traverse | Bidirectional path enumeration from a node id, capped at max_hops=5, optionally filtered by edge_types. |
SPEC §7.2 aliases dual-register at the same credit cost:
code.callers_of → code.find_callers,
code.dependency_path → code.find_path,
code.affected_by → ontology.impact_of. Both names roll up to the
canonical name in usage analytics.
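As a rough illustration of what a depth-capped transitive lookup such as `code.find_callers` involves, the sketch below runs a breadth-first search over inbound `calls` edges held in memory. The edge-list input, the function signature, and the returned hop counts are assumptions for the example; the real tool runs against the workspace graph and returns the response envelope.

```python
from collections import deque
from typing import Dict, List, Set, Tuple


def find_callers(calls: List[Tuple[str, str]], target: str, depth: int = 5) -> Dict[str, int]:
    """Return {caller: hops} for every function reaching `target` within `depth` calls."""
    # Invert the (caller, callee) edges so the walk follows inbound calls.
    inbound: Dict[str, Set[str]] = {}
    for caller, callee in calls:
        inbound.setdefault(callee, set()).add(caller)

    hops: Dict[str, int] = {}
    queue = deque([(target, 0)])
    while queue:
        node, dist = queue.popleft()
        if dist == depth:
            continue  # depth cap reached; do not expand further
        for caller in inbound.get(node, ()):
            # Skip already-seen callers so cycles terminate.
            if caller not in hops and caller != target:
                hops[caller] = dist + 1
                queue.append((caller, dist + 1))
    return hops


# Made-up call edges, written as (caller, callee).
edges = [
    ("checkoutHandler", "processOrder"),
    ("processOrder", "chargeStripe"),
    ("router", "checkoutHandler"),
]
```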
History and ownership
| Tool | What it returns |
|---|---|
code.recent_changes | Recently-changed files plus the commits that touched them. |
code.blame_enriched | Git blame plus the introducing PR body and its reviews. |
code.pr_history | PRs touching a target with merge status. |
code.who_knows_about | Top experts by expert_on score (M7 derivation). |
code.expertise | What symbols a person owns (M7 derivation). |
Tests and CI
| Tool | What it returns |
|---|---|
code.tests_for | Tests that exercise a target. |
code.coverage_for | Coverage records for a file. |
code.failing_tests | Currently failing test cases. |
code.flaky_tests | Tests that flip pass/fail across recent runs. |
code.last_run | Last workflow run for a workflow. |
code.run_failures | Per-job failures inside a workflow run. |
Dependencies and security
| Tool | What it returns |
|---|---|
code.dependencies | Direct dependencies with constraints, resolved versions, and licenses. |
code.dependency_graph | Transitive dependency subgraph for a package. |
security.open_alerts | Aggregate open Dependabot / code-scanning / secret-scanning alerts. |
security.alert_context | Full context for one alert — anchor, vulnerability, related commits. |
Issues, PRs, discussions
| Tool | What it returns |
|---|---|
code.pr_context | A PR plus its linked issues, reviews, and comments. |
code.issue_context | An issue plus linked PRs, commits, and discussion. |
code.find_issues | Substring + filter search across github.issue nodes. |
code.discussion_context | A discussion plus its accepted answer and related code. |
Search
| Tool | What it returns |
|---|---|
ontology.search | Hybrid vector + structural search across the workspace graph. |
code.find_pattern | Hybrid score over code.chunk nodes — summary match plus structural anchors. |
ontology.ask | Hybrid retrieval, then an LLM composes a grounded answer with cited node UUIDs. |
Memory
| Tool | What it returns |
|---|---|
memory.recall | Procedures-first recall envelope — top-K procedures plus the episodes that promoted them. |
memory.remember | Anchor a fresh memory.episode to the active session. |
memory.procedure_for | Direct trigger-pattern lookup against memory.procedure nodes. |
memory.forget | Soft-delete a memory. Pinned memories require an explicit grant. |
Maintenance
| Tool | What it returns |
|---|---|
ontology.refresh_repo | Trigger a paths-restricted re-ingest on a connection. |
Identity (workspace_id, user_id) is derived from the verified
bearer token — never passed as input. Cross-workspace queries return
empty results.
Response envelope
Every code-graph tool returns the same canonical envelope so the agent parses one shape regardless of which tool emitted it.
```json
{
  "results": [...],
  "evidence": [
    {"node_id": "…", "kind": "code.function", "url": "https://github.com/…"}
  ],
  "tokens_used_estimate": 142,
  "counterfactual_estimate_tokens": 8400,
  "counterfactual_method": "grep_plus_read_n_files",
  "cursor": null,
  "tenant_scoped_at": {
    "tenant_id": "…",
    "workspace_id": "…",
    "project_id": null
  }
}
```
- `results`: the typed records the tool was asked for.
- `evidence`: node IDs the agent can cite back. Every record in `results` is reachable from at least one `evidence` entry.
- `tokens_used_estimate`: tokens spent on this tool's reply, computed from the assembled payload (Anthropic SDK token-count API → `tiktoken` `cl100k_base` approximation → deterministic fallback). Note: the `tiktoken` fallback uses GPT-4 tokenisation, so estimates may differ slightly from Anthropic-billed token counts.
- `counterfactual_estimate_tokens`: what the same answer would have cost a vanilla file-search agent, computed by a per-tool estimator. This is the receipt for the "fewer LLM calls, smaller models, lower bills" claim.
- `counterfactual_method`: stable method-id naming the estimator (e.g. `grep_plus_read_n_files`, `lockfile_transitive_parse_plus_manifests`).
- `cursor`: opaque pagination token, or `null` for unpaginated results.
- `tenant_scoped_at`: the workspace the tool resolved to. Always populated.
The envelope is enforced — malformed responses are rejected before they reach the agent.
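As a rough illustration of how an agent-side client might consume this envelope, the sketch below checks the top-level shape and derives a savings ratio from the two token estimates. The function names (`check_envelope`, `estimated_savings`) and the specific checks are assumptions made for this example; they are not Oxagen's actual validator.

```python
from typing import Any

# Top-level keys copied from the envelope shown above.
REQUIRED_KEYS = {
    "results",
    "evidence",
    "tokens_used_estimate",
    "counterfactual_estimate_tokens",
    "counterfactual_method",
    "cursor",
    "tenant_scoped_at",
}


def check_envelope(envelope: dict[str, Any]) -> list[str]:
    """Return a list of problems; an empty list means the shape looks right.

    Hypothetical client-side check, not the server's enforcement logic.
    """
    missing = sorted(REQUIRED_KEYS - set(envelope))
    if missing:
        return [f"missing key: {key}" for key in missing]
    problems: list[str] = []
    if not isinstance(envelope["results"], list):
        problems.append("results must be a list")
    if not isinstance(envelope["evidence"], list):
        problems.append("evidence must be a list")
    else:
        for index, entry in enumerate(envelope["evidence"]):
            if not entry.get("node_id"):
                problems.append(f"evidence[{index}] has no node_id")
    if not envelope["tenant_scoped_at"].get("workspace_id"):
        problems.append("tenant_scoped_at.workspace_id is empty")
    return problems


def estimated_savings(envelope: dict[str, Any]) -> float:
    """Fraction of the counterfactual token cost that the graph call avoided."""
    baseline = envelope["counterfactual_estimate_tokens"]
    if baseline <= 0:
        return 0.0
    return 1 - envelope["tokens_used_estimate"] / baseline


# Placeholder IDs; the token figures are the ones in the sample envelope.
example = {
    "results": [],
    "evidence": [{"node_id": "node-1", "kind": "code.function"}],
    "tokens_used_estimate": 142,
    "counterfactual_estimate_tokens": 8400,
    "counterfactual_method": "grep_plus_read_n_files",
    "cursor": None,
    "tenant_scoped_at": {"tenant_id": "t-1", "workspace_id": "w-1", "project_id": None},
}

print(check_envelope(example))
print(f"{estimated_savings(example):.1%}")
```

With the sample figures (142 tokens used against a counterfactual of 8400), the ratio comes out at roughly 98%.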
Time-travel queries (at_commit)
Every relationship in the graph carries valid_from and a nullable
valid_to, so a query can ask the graph as it stood at any commit.
Tools that traverse the graph accept an optional at_commit parameter;
the Cypher rewriter ANDs r.valid_from <= $at AND coalesce(r.valid_to, '9999-…') > $at into the first MATCH for every named relationship
variable r. Patterns with no named relationship variable are left alone.
// callers of parseJWT as of commit 9a4c1f2
{
"tool": "code.find_callers",
"args": {
"name": "parseJWT",
"at_commit": "9a4c1f2"
}
}
The same shape works on ontology.explain_function,
code.find_path, code.references_to, memory.recall, and any
other tool that traverses bitemporal edges. The response envelope's
tenant_scoped_at records the workspace; evidence records the
node IDs at that commit.
A vanilla file-search agent reproduces this with git checkout <sha>, repeats every grep, parses every file again, and discards the
working tree. The graph keeps the historical state queryable without
checking anything out.
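The validity predicate described above can be sketched outside Cypher as a plain filter over edges. This is a minimal in-memory model, not the rewriter itself. It assumes that at_commit has already been resolved to a timestamp, and that timestamps are ISO-8601 UTC strings in a single format so that string comparison matches chronological order. The `Edge` class, the "calls" edge kind, and the sample function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Edge:
    """Hypothetical in-memory stand-in for a bitemporal graph relationship."""
    kind: str
    source: str
    target: str
    valid_from: str
    valid_to: Optional[str] = None  # None means the edge is still current


def visible_at(edge: Edge, at: str) -> bool:
    """Mirror of valid_from <= $at AND coalesce(valid_to, far future) > $at."""
    return edge.valid_from <= at and (edge.valid_to is None or edge.valid_to > at)


def callers_at(edges: list[Edge], target: str, at: str) -> list[str]:
    """Names of functions that called `target` at the given instant."""
    return sorted(
        edge.source
        for edge in edges
        if edge.kind == "calls" and edge.target == target and visible_at(edge, at)
    )


# Illustrative data only: one call edge that was later removed, one still live.
edges = [
    Edge("calls", "login", "parseJWT", "2026-01-10T00:00:00Z", "2026-03-01T00:00:00Z"),
    Edge("calls", "refreshSession", "parseJWT", "2026-02-01T00:00:00Z"),
]

print(callers_at(edges, "parseJWT", "2026-02-15T00:00:00Z"))
print(callers_at(edges, "parseJWT", "2026-03-01T00:00:00Z"))
```

The upper bound is exclusive, so an edge closed at a given instant is no longer visible at that instant; the second query returns only the edge that is still open.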
How agents share state through the graph
Two agents that never run in the same process can still share state
through the workspace graph. The first writes a node — a finding, a
pattern, a code.commit — and the next agent's first MCP call sees
it.
A concrete shape:
- 2026-04-22, 14:03 UTC. Agent A (Claude Code in a developer's editor) merges PR #517 to main. The push webhook triggers an incremental sync. code.commit { sha: "9a4c1f…" }, code.author { email: "alex@…" }, and the code.commit-[:touched]->code.function { name: "parseJWT" } edge land in the workspace graph within ~30 seconds.
- 2026-04-25, 08:11 UTC. Agent B (a triage agent invoked by an on-call engineer) receives a stack trace mentioning parseJWT. Its first MCP call is ontology.explain_function { name: "parseJWT" }. The response contains every commit since 04-22, author Alex, the diff for #517, and the tests that exercise the function, without Agent B running git log or cloning the repo.
The agents never coordinated. The graph did. Multi-agent coordination is structural, not bolt-on: the second agent reads the first agent's output the same way it reads any other graph node.
Memory and the graph are the same store
The agent memory layer (Agent Memory) writes
_mem:action, _mem:sequence, and _mem:pattern nodes into the same
Neo4j workspace graph as your code.* nodes. A pattern that applies
to a specific function is structurally connected to it, so retrieving
the pattern pulls the symbol with it.
When the evaluator promotes a pattern keyed on a code symbol — for
example, "ontology mutations on parseJWT fail with
NodeNotFoundError 83% of the time" — it writes:
{
"type": "_mem:pattern",
"name": "ontology_mutation:code.function:NodeNotFoundError",
"properties": {
"confidence_score": 0.83,
"applies_to": {
"type": "code.function",
"name": "parseJWT"
},
"recommendation": "Validate the node exists and is accessible before proceeding."
}
}
The applies_to shape is matched by the pre-execution context hook,
which traverses from the pattern node to the referenced code node in
one hop. An agent calling memory.context for parseJWT receives
the pattern, the function node, its commits, and its tests
in the same payload: one MCP call, no follow-up.
{
"patterns": [
{
"id": "5c7c…",
"name": "ontology_mutation:code.function:NodeNotFoundError",
"confidence_score": 0.83,
"recommendation": "Validate the node exists and is accessible before proceeding."
}
],
"code_context": {
"function": {
"id": "a3f2…",
"name": "parseJWT",
"signature": "function parseJWT(token: string): JWTClaims",
"file": "services/api/lib/auth.ts",
"is_exported": true
},
"tests": [
{ "id": "b8c1…", "name": "parseJWT_handles_expired_token" }
],
"recent_commits": []
}
}
The shape is deterministic. The agent does not need to interpret free-form text; it reads the typed record and decides.
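Because the payload is typed, an agent can act on it with ordinary field access. The sketch below shows one way that might look, using the payload shape from the example above. The helper names and the 0.8 confidence threshold are assumptions made for illustration, not part of the memory.context contract.

```python
from typing import Any


def actionable_recommendations(payload: dict[str, Any], min_confidence: float = 0.8) -> list[str]:
    """Recommendations from patterns at or above the threshold, strongest first."""
    patterns = [
        pattern
        for pattern in payload.get("patterns", [])
        if pattern.get("confidence_score", 0.0) >= min_confidence
    ]
    patterns.sort(key=lambda pattern: pattern["confidence_score"], reverse=True)
    return [pattern["recommendation"] for pattern in patterns]


def covering_tests(payload: dict[str, Any]) -> list[str]:
    """Names of the tests attached to the function in code_context."""
    return [test["name"] for test in payload.get("code_context", {}).get("tests", [])]


# Payload copied from the example above; the elided IDs are kept as-is.
payload = {
    "patterns": [
        {
            "id": "5c7c…",
            "name": "ontology_mutation:code.function:NodeNotFoundError",
            "confidence_score": 0.83,
            "recommendation": "Validate the node exists and is accessible before proceeding.",
        }
    ],
    "code_context": {
        "function": {"id": "a3f2…", "name": "parseJWT"},
        "tests": [{"id": "b8c1…", "name": "parseJWT_handles_expired_token"}],
        "recent_commits": [],
    },
}

print(actionable_recommendations(payload))
print(covering_tests(payload))
```

Raising `min_confidence` above 0.83 would filter the sample pattern out, which is one way an agent could ignore weak patterns without parsing any prose.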
Per-repo configuration
Every GitHub connection ships with the manifest fields below. They are read on the first sync and editable from Connections → Settings (or via the API on the connection record).
| Field | Type | Default | What it does |
|---|---|---|---|
| repo_full_name | string | none | The repository in owner/repo form. |
| default_branch | string | main | Branch to ingest. Other branches are not parsed; cross-branch comparison happens via the GitHub PR / commit metadata pass. |
| include_paths | string[] | apps/**, packages/**, services/** | Glob set for files to parse. |
| exclude_paths | string[] | **/node_modules/**, **/.next/**, **/dist/**, **/*.d.ts | Glob set for files to skip before parsing. |
| languages | string[] | ["typescript", "python"] | Parser languages enabled for this repo. The source parser ships TypeScript / JavaScript and Python. |
| ingest_commit_history | boolean | true | Creates code.commit nodes for recent commits with diffs. Lets agents query what changed, who, and why. |
| enable_semantic_edges | boolean | true | Runs an LLM pass to infer reads / writes / returns / validates between functions. Disable on large repos to reduce cost. |
| sync_schedule | string | daily | One of manual, daily, or weekly: when to re-index automatically. Push webhooks always trigger an incremental sync regardless. |
| commit_depth | number | 100 | How many recent commits to ingest on first sync. min: 10, max: 500. |
| min_import_edge_referrers | number | 3 | Suppress file-level imports edges to any target imported by more than this many files in the same repo; they are aggregated to a single depends_on edge instead. Reduces hub-node clutter (React, lodash). |
Existing connections inherit the documented defaults on the first sync after a settings update — the manifest defines the schema, the API merges new keys non-destructively, and the worker honours the merged blob on the next run.
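A non-destructive merge of this kind can be sketched as "defaults first, stored values on top". The defaults below are copied from the table. The function name, the choice to clamp commit_depth into its documented range (rather than reject it), and the error raised for an unknown schedule are assumptions for illustration; the actual API may behave differently.

```python
import copy
from typing import Any

# Defaults copied from the manifest table; repo_full_name has no default.
DEFAULTS: dict[str, Any] = {
    "default_branch": "main",
    "include_paths": ["apps/**", "packages/**", "services/**"],
    "exclude_paths": ["**/node_modules/**", "**/.next/**", "**/dist/**", "**/*.d.ts"],
    "languages": ["typescript", "python"],
    "ingest_commit_history": True,
    "enable_semantic_edges": True,
    "sync_schedule": "daily",
    "commit_depth": 100,
    "min_import_edge_referrers": 3,
}

ALLOWED_SCHEDULES = {"manual", "daily", "weekly"}


def merge_settings(stored: dict[str, Any]) -> dict[str, Any]:
    """Fill missing keys from DEFAULTS without overwriting stored values."""
    # Deep-copy so callers cannot mutate the shared default lists.
    merged = {**copy.deepcopy(DEFAULTS), **stored}
    # Assumption: out-of-range depths are clamped to the documented 10..500.
    merged["commit_depth"] = max(10, min(500, merged["commit_depth"]))
    if merged["sync_schedule"] not in ALLOWED_SCHEDULES:
        raise ValueError(f"unknown sync_schedule: {merged['sync_schedule']!r}")
    return merged


# A connection saved before some keys existed keeps its own values.
stored = {"repo_full_name": "owner/repo", "sync_schedule": "weekly", "commit_depth": 1000}
settings = merge_settings(stored)
print(settings["sync_schedule"], settings["commit_depth"], settings["default_branch"])
```

Stored keys always win over defaults, so adding a new manifest field later only affects connections that have never set it.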
Sync lifecycle
A repo connection moves through four states, defined by the
ConnectionStatus enum:
| Status | Meaning |
|---|---|
| pending | Created but no successful sync yet. The first backfill is running or queued. |
| active | At least one sync has completed. The connection is live and accepting webhooks. |
| error | The most recent sync raised an error. last_error carries the message; retry is one click. |
| paused | Operator-paused. No automatic syncs run; manual ontology.refresh_repo still works. |
Five things drive a sync:
- Initial backfill. Triggered automatically on connect. Reads commit_depth commits, walks include / exclude globs, parses every supported file, resolves edges, writes nodes.
- Push webhook. GitHub pushes to the default branch trigger an incremental sync that re-parses changed files and adds new code.commit nodes since last_synced_sha.
- Schedule. sync_schedule runs a defensive re-sync daily or weekly to recover from missed webhooks. manual opts out: only push webhooks and explicit ontology.refresh_repo calls trigger a sync.
- Reconciliation cadence. Even if a webhook is dropped or rate-limited, the graph self-heals on a known schedule: an hourly drift sweep refreshes issues / PRs / discussions, a nightly metadata pass refreshes repository / branch / label / milestone / CODEOWNERS / branch-protection state, a nightly security pass refreshes Dependabot / code-scanning / secret-scanning alerts, and a Sunday-night full reparse re-derives every code.symbol from source. State per (workspace, surface) is tracked in a watermark table so each sweep resumes from where the last one left off.
- Manual. Force a re-sync from the connection card or from any agent with ontology.refresh_repo { connection_id, paths? }. Pass paths to limit the re-ingest to a subset of the repo, which is useful after a targeted edit when you do not want to wait for the next webhook.
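For the manual trigger, a request might be assembled as below. The sketch reuses the tool / args shape from the at_commit example earlier on this page; the helper name and the connection ID value are hypothetical, and whether paths accepts globs or only literal paths is not stated here, so the example passes a literal file path.

```python
import json
from typing import Optional, Sequence


def refresh_repo_request(connection_id: str, paths: Optional[Sequence[str]] = None) -> dict:
    """Build an ontology.refresh_repo call; omit paths to re-ingest the whole repo."""
    args: dict[str, object] = {"connection_id": connection_id}
    if paths:
        # paths is optional, so it is only sent when the caller restricts the re-ingest.
        args["paths"] = list(paths)
    return {"tool": "ontology.refresh_repo", "args": args}


# "conn_123" is a made-up connection ID for illustration.
full = refresh_repo_request("conn_123")
targeted = refresh_repo_request("conn_123", ["services/api/lib/auth.ts"])

print(json.dumps(full))
print(json.dumps(targeted, indent=2))
```

Leaving paths out entirely, rather than sending an empty list, keeps the full-repo and targeted requests unambiguous on the wire.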
Where to go next
- Cookbook: Index your codebase and query it with Claude — first-sync walkthrough.
- Agentic coding cookbook — three end-to-end agent threads with real MCP payloads.
- MCP Server — install the server in Claude Code, Cursor, VS Code, Windsurf, or Codex.
- Agent Memory — how patterns and sequences attach to code nodes.
- Cheaper models with Oxagen — the cost argument with eval methodology.