Oxagen Docs

Code Graph

How Oxagen ingests a repository into a typed, queryable code graph and what your agents can traverse against it.

Why a code graph?

Grep and embedding-only retrieval treat a codebase as a bag of strings. An agent asked "who calls parseJWT?" either runs ten greps and stitches the answers together, or it gets a vector-similarity guess that misses exact callers and surfaces unrelated files that happen to mention the substring. Both modes pay tokens for context the agent then has to read to find what it actually needed.

A code graph encodes the relationships directly. Files, functions, classes, imports, calls, and tests are typed nodes; the edges between them are typed too. One traversal answers structural questions exactly, in one MCP call, with deterministic node IDs the agent can cite back.

Oxagen ingests every connected GitHub repository into the same Neo4j-backed workspace knowledge graph that holds your business ontology — so an agent doing code work and an agent doing business work read from the same store, with the same RLS scoping, over the same MCP surface.

What gets ingested

Connect a GitHub repository through the GitHub App and Oxagen runs a deterministic pipeline: clone at HEAD, walk the include / exclude globs, parse each supported file with tree-sitter, resolve cross-file references with stack-graphs, fetch repo + identity + CI + tests + security metadata through the GitHub REST + GraphQL APIs, and write typed nodes and edges into the workspace graph. The source parser supports TypeScript / JavaScript and Python; the GitHub-metadata, CI, tests, and security passes apply to repositories in any language.
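A minimal sketch of the include / exclude step, assuming fnmatch-style globs matched against repo-relative POSIX paths. Oxagen's actual glob syntax is not specified here, and `select_files` and the `PARSEABLE` extension map are hypothetical names used only for illustration.

```python
import posixpath
from fnmatch import fnmatch

# Hypothetical extension -> parser-language map for the languages the source
# parser supports (TypeScript / JavaScript and Python).
PARSEABLE = {
    ".ts": "typescript", ".tsx": "typescript",
    ".js": "javascript", ".jsx": "javascript",
    ".py": "python",
}


def select_files(paths, include, exclude):
    """Return (path, parser_lang) for every path matched by include but not exclude.

    parser_lang is None for files the source parser does not handle.
    Note: fnmatch's '*' also matches '/', so '*.ts' matches at any depth.
    """
    selected = []
    for path in paths:
        if not any(fnmatch(path, pat) for pat in include):
            continue
        if any(fnmatch(path, pat) for pat in exclude):
            continue
        ext = posixpath.splitext(path)[1].lower()
        selected.append((path, PARSEABLE.get(ext)))
    return selected


files = ["src/auth.ts", "node_modules/jwt/index.js", "tools/gen.py", "README.md"]
print(select_files(files, include=["*"], exclude=["node_modules/*"]))
# [('src/auth.ts', 'typescript'), ('tools/gen.py', 'python'), ('README.md', None)]
```

Exclusion is checked after inclusion, so an exclude glob always wins over a broad include.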

Every node and edge carries an envelope of provenance: tenant_id, workspace_id, source (one of github, git, tree-sitter, lsp, derived, manual), source_id, fetched_at, observed_at, schema_version. Edges carry valid_from and a nullable valid_to so every traversal is bitemporal — the graph reflects HEAD by default and can be queried at any commit (see Time-travel queries).
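The envelope and the validity check can be sketched as a Python dataclass. The field names come from the paragraph above; the Python types (timezone-aware datetimes, an integer schema_version) and the `is_valid_at` helper are assumptions. The half-open interval mirrors the predicate described under Time-travel queries.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Literal, Optional

Source = Literal["github", "git", "tree-sitter", "lsp", "derived", "manual"]


@dataclass(frozen=True)
class EdgeEnvelope:
    # Provenance fields named in the text; the Python types are assumptions.
    tenant_id: str
    workspace_id: str
    source: Source
    source_id: str
    fetched_at: datetime
    observed_at: datetime
    schema_version: int
    # Validity interval: valid_to is None while the edge still holds at HEAD.
    valid_from: datetime
    valid_to: Optional[datetime] = None

    def is_valid_at(self, at: datetime) -> bool:
        """Half-open check: valid_from <= at < valid_to (open-ended if None)."""
        return self.valid_from <= at and (self.valid_to is None or at < self.valid_to)


def day(d: int) -> datetime:
    return datetime(2026, 4, d, tzinfo=timezone.utc)


edge = EdgeEnvelope("t1", "w1", "tree-sitter", "hypothetical-source-id", day(1), day(1), 1,
                    valid_from=day(10), valid_to=day(20))
print(edge.is_valid_at(day(15)), edge.is_valid_at(day(20)))  # True False
```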

Node types

The ingestion pipeline emits nodes in nine categories. Every node inherits the provenance envelope above; the tables below name only the type-specific properties.

Repository layer

| Node type | Represents | Key properties |
| --- | --- | --- |
| code.repo | The connected repository | full_name, default_branch, visibility, last_synced_sha, is_public, installation_id |
| code.branch | A named branch on the repo | name, is_default, is_protected, head_commit_id |
| code.tag | A git tag | name, target_commit_id, is_annotated |
| code.commit | A commit on the default branch | sha, short_sha, message, message_body, author_name, author_email, authored_at, diff_patch, files_changed, insertions, deletions |
| code.tree | A git tree at a commit | sha, commit_id, entry_count |
| code.conventional_commit_scope | Parsed Conventional Commit scope | scope, usage_count |

Identity layer

| Node type | Represents | Key properties |
| --- | --- | --- |
| code.author | A (name, email) pair from git log | name, email, commit_count, first_seen_at, last_seen_at |
| github.user | A GitHub account | login, github_id, name, email_public, is_bot, is_member_of_org, avatar_url |
| person | The deduplicated human behind one or more git authors / GitHub users | canonical_name, primary_email, aliases[], oxagen_user_id |

The identity-resolution pass runs after every sync and folds code.author + github.user into a single person via deterministic strategies — exact-email match (1.0), GitHub login match (0.8), name match (0.6) — recorded as edges so the resolution is auditable.
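A minimal sketch of the strongest-first strategy check, using the three confidences above. How Oxagen defines a "login match" is not stated; the sketch assumes it means the git author name equals the GitHub login, and it compares emails case-insensitively. `GitAuthor`, `GitHubUser`, and `resolve` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional

# Strategy confidences from the text, checked strongest-first.
STRATEGIES = (("exact_email", 1.0), ("github_login", 0.8), ("name", 0.6))


@dataclass(frozen=True)
class GitAuthor:
    name: str
    email: str


@dataclass(frozen=True)
class GitHubUser:
    login: str
    name: Optional[str] = None
    email_public: Optional[str] = None


def _norm(value: Optional[str]) -> str:
    # Collapse whitespace and case so "Alex  Rivera" == "alex rivera".
    return " ".join(value.split()).casefold() if value else ""


def resolve(author: GitAuthor, user: GitHubUser) -> Optional[tuple]:
    """Return (strategy, confidence) for the strongest rule that fires, or None."""
    fired = {
        "exact_email": bool(user.email_public)
        and author.email.strip().lower() == user.email_public.strip().lower(),
        # Assumption: a "login match" means the git author name equals the login.
        "github_login": _norm(author.name) == _norm(user.login),
        "name": bool(user.name) and _norm(author.name) == _norm(user.name),
    }
    for strategy, confidence in STRATEGIES:
        if fired[strategy]:
            return strategy, confidence
    return None


alex = GitAuthor("Alex Rivera", "alex@example.com")
print(resolve(alex, GitHubUser("arivera", name="Alex Rivera")))  # ('name', 0.6)
```

Storing the returned strategy and confidence on the resolution edge is what keeps each merge auditable.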

Filesystem layer

| Node type | Represents | Key properties |
| --- | --- | --- |
| code.file | One source file at HEAD | path, lang, blob_sha, line_count, is_test_file, is_generated, parse_error |
| code.file_version | A file at a specific commit; the anchor every code-semantic node ties to | file_id, commit_id, blob_sha, path_at_commit, loc, change_type, previous_path, previous_blob_sha |
| code.package | An internal package directory | path, package_name, version, private |

Code semantics

| Node type | Represents | Key properties |
| --- | --- | --- |
| code.function | A function, method, or arrow declaration | file_id, start_line, end_line, signature, is_exported, is_async, kind, parent_class |
| code.class | A class declaration | file_id, start_line, end_line, is_exported, export_name |
| code.symbol | An interface, type alias, enum, or const export | file_id, kind, start_line, end_line, is_exported, export_name |
| code.namespace | A TS / Python namespace or module-level scope | name, file_id |
| code.import | A resolved import statement | from, to, is_external |
| code.variable | A module-level binding the parser surfaces for resolution | name, file_id, start_line |
| code.decorator | A decorator applied to a function or class | name, file_id |
| code.type_reference | A reference to a typed symbol from inside a function signature | name, referrer_id, target_id |
| code.exception | A named exception class observed in raise / throw | name |
| code.external_package | A declared third-party dependency | name, ecosystem (npm, pypi, mixed) |
| code.chunk | A summarised slice of source for hybrid retrieval | file_id, start_line, end_line, summary, embedding_id |

Markdown / mdx documents are stored as code.file nodes with lang in {markdown, mdx} — not a separate code.doc type. Filter on lang when querying for "all docs in this repo".

diff_patch is byte-capped at 16 KB per commit. Larger diffs are truncated UTF-8-safely with a [truncated] marker so an agent can tell the diff was clipped before reasoning over it.
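A minimal sketch of UTF-8-safe truncation at a 16 KB byte cap. Whether the marker sits on its own line and whether it counts toward the cap are assumptions; `cap_diff_patch` is a hypothetical name.

```python
DIFF_CAP_BYTES = 16 * 1024
# Assumption: the marker sits on its own line and counts toward the cap.
TRUNCATION_MARKER = "\n[truncated]"


def cap_diff_patch(patch: str, cap: int = DIFF_CAP_BYTES) -> str:
    """Clip a diff to at most `cap` UTF-8 bytes without splitting a character."""
    raw = patch.encode("utf-8")
    if len(raw) <= cap:
        return patch
    # Never slice with a negative index if the cap is smaller than the marker.
    budget = max(0, cap - len(TRUNCATION_MARKER.encode("utf-8")))
    # Cutting at an arbitrary byte can leave half of a multi-byte character at
    # the end; errors="ignore" drops that incomplete trailing sequence.
    head = raw[:budget].decode("utf-8", errors="ignore")
    return head + TRUNCATION_MARKER


clipped = cap_diff_patch("+ café\n" * 5000)
print(len(clipped.encode("utf-8")) <= DIFF_CAP_BYTES, clipped.endswith("[truncated]"))  # True True
```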

Documentation surfaces

| Node type | Represents | Key properties |
| --- | --- | --- |
| code.adr | An ADR file under docs/adr/ or similar | number, title, status, file_id |
| code.changelog_entry | One release entry in CHANGELOG.md | version, date, kind, body |

GitHub metadata

| Node type | Represents | Key properties |
| --- | --- | --- |
| github.pull_request | A pull request on the repo | number, title, body, state, merged, merged_at, base_ref, head_ref |
| github.issue | An issue | number, title, body, state, closed_at |
| github.discussion | A discussion thread | number, title, body, category_id, is_answered |
| github.discussion_category | A discussion category | name, slug |
| github.review | A code review on a PR | state, submitted_at, body |
| github.review_comment | An in-line review comment anchored to a file + line | path, line, body |
| github.comment | A free-form comment on an issue / PR / discussion | body, created_at |
| github.label | A repo label | name, color, description |
| github.milestone | A repo milestone | title, due_on, state |
| github.release | A published release | tag_name, name, body, published_at, prerelease |
| github.project / github.project_item | A GitHub Project board + its items | name, title, body, status |

CI

| Node type | Represents | Key properties |
| --- | --- | --- |
| github.workflow | A workflow definition | path, name, state |
| github.workflow_run | One execution of a workflow | run_id, status, conclusion, started_at, completed_at |
| github.workflow_job | A job inside a workflow run | name, status, conclusion, runner_id |
| github.workflow_step | A step inside a job | name, status, conclusion, number |
| github.workflow_artifact | An artifact produced by a run | name, size_bytes, expired |
| github.runner | A self-hosted or hosted runner | name, os, is_self_hosted |
| github.action | A reusable action referenced by a workflow | name, repo, version |

Tests, coverage, knowledge

| Node type | Represents | Key properties |
| --- | --- | --- |
| code.test_suite | A test suite (file or module) | name, file_id, framework |
| code.test_case | One test inside a suite | name, suite_id, start_line, end_line |
| code.test_run | An execution of a test suite tied to a workflow run | workflow_run_id, started_at, total, passed, failed, skipped |
| code.test_result | The outcome of one case in one run | case_id, run_id, status, duration_ms, failure_message |
| code.coverage_record | Coverage for one file at a commit | file_id, commit_id, lines_total, lines_hit |

Security

| Node type | Represents | Key properties |
| --- | --- | --- |
| security.dependabot_alert | A Dependabot alert | number, state, severity, package, ecosystem, cve |
| security.code_scanning_alert | A code-scanning alert | number, state, severity, rule_id, file_id |
| security.secret_scanning_alert | A secret-scanning alert | number, state, secret_type, validity |
| security.secret_scanning_location | The file + line a secret was observed at | file_id, start_line, end_line |
| security.vulnerability | A CVE / GHSA advisory referenced by an alert | ghsa_id, cve_id, severity, summary |

Memory

| Node type | Represents | Key properties |
| --- | --- | --- |
| memory.episode | One agent session: what it worked on, ran, observed, produced | agent_id, started_at, ended_at, outcome, salience_score |
| memory.procedure | A pattern distilled from ≥3 episodes that share an anchor | trigger, precision_score, reuse_count, is_pinned, recommendation |
| memory.archived_episode | An episode demoted below the salience floor after 30 days | Mirrors memory.episode, minus the live HNSW embedding |
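The two thresholds in this table (at least 3 episodes per anchor for promotion, 30 days for archiving) can be sketched as eligibility checks. The salience floor value is not given, so it is a parameter here; the single-anchor-per-episode model and the helper names are assumptions, and the actual distillation of a procedure is out of scope.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

MIN_EPISODES_PER_PROCEDURE = 3      # "distilled from >=3 episodes"
ARCHIVE_AGE = timedelta(days=30)    # "demoted ... after 30 days"


@dataclass(frozen=True)
class Episode:
    id: str
    anchor: str            # assumption: one anchor node id per episode
    ended_at: datetime
    salience_score: float


def procedure_candidates(episodes):
    """Group episode ids by shared anchor; keep groups big enough to promote."""
    by_anchor = defaultdict(list)
    for ep in episodes:
        by_anchor[ep.anchor].append(ep.id)
    return {anchor: ids for anchor, ids in by_anchor.items()
            if len(ids) >= MIN_EPISODES_PER_PROCEDURE}


def should_archive(ep, now, salience_floor):
    """True once an episode is at least 30 days old and below the floor."""
    return now - ep.ended_at >= ARCHIVE_AGE and ep.salience_score < salience_floor
```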

Edge types

The connector emits structural edges deterministically from the AST and the GitHub API. Semantic edges are LLM-inferred with a confidence score and can be turned off per-connection. Derived edges are computed by the M7 derivation jobs and refresh on every push.

Every edge carries valid_from and a nullable valid_to (the bitemporal envelope). The structural edges below are the canonical ones agents traverse — see packages/oxagen/oxagen/connectors/github/types.py:EdgeKind for the complete enum, including review / comment / project plumbing surfaces.

Code structure

| Edge type | Source → Target | Provenance | Semantics |
| --- | --- | --- | --- |
| contains | code.repo → code.package / code.file; code.package → code.file | structural | Hierarchical containment. |
| defines | code.file → function / class / symbol; code.class → method | structural | This file (or class) declares this child. |
| defined_in | code.function / code.class / code.symbol → code.file_version | structural | Anchors the symbol to the file at a specific commit. |
| member_of | code.function → code.class | structural | Inverse of the class → method defines edge. One hop answers "what class does this method belong to?". |
| imports | code.file → code.file | structural | One file imports another. |
| resolves_to_import | code.import → code.file / code.external_package | structural | Stack-graphs resolution of the import statement. |
| calls | code.function → code.function | structural | Resolved call site. Carries call_sites count and is_cross_file. |
| extends | code.class → code.class | structural | Inheritance. |
| implements | code.class → code.class | structural | Interface implementation. |
| instantiates | code.function → code.class | structural | A function constructs the class. |
| decorated_by | code.function / code.class → code.decorator | structural | Decorator application. |
| has_namespace | code.function / code.class / code.symbol → code.namespace | structural | Symbol lives inside a namespace. |
| has_type_reference | code.function → code.symbol | structural | Type referenced in a function signature. |
| declares_type | code.function → code.symbol | structural | Function annotated with this type. |
| re_exports | code.symbol → code.symbol | structural | Symbol re-exported from another module. |
| depends_on | code.file → code.external_package | structural | Declared package dependency. Carries version_spec and dep_kind. |
| references | code.file → code.file | structural | Internal markdown / doc link. |
| tests / is_tested_by | code.function → code.function | structural | Heuristic test pairing; both directions emitted. |
| throws / is_thrown_by | code.function → code.exception | structural | Function raises a named exception. |
| has_chunk | code.file → code.chunk | structural | File chunked for hybrid retrieval. |

Semantic (LLM-inferred, optional)

| Edge type | Source → Target | Semantics |
| --- | --- | --- |
| reads | code.function → code.symbol / field | Reads a value or field. Off by default on large repos. |
| writes | code.function → code.symbol / field | Writes a value or field. |
| returns | code.function → code.symbol | Returns this shape. |
| modifies | code.function → code.symbol | Mutates the target. |
| calculates | code.function → code.symbol | Computes the target value. |
| validates | code.function → code.symbol | Validates the target. |
| configures | code.function → code.symbol | Configures a target. |
| computes | code.function → code.symbol | Derived computation. |

Git history and identity

| Edge type | Source → Target | Provenance | Semantics |
| --- | --- | --- | --- |
| has_commit | code.repo → code.commit | structural | Repo has this commit on the default branch. |
| contains_commit | code.branch → code.commit | structural | Branch contains this commit. |
| head_at | code.branch → code.commit | structural | Branch points at this head commit. |
| targets_branch | github.pull_request → code.branch | structural | PR targets this branch. |
| merged_as | github.pull_request → code.commit | structural | PR was merged as this commit. |
| authored_by | code.commit → code.author | structural | Commit was authored by this person. Authors deduplicated by email. |
| has_scope | code.commit → code.conventional_commit_scope | structural | Conventional Commits scope parsed from the message. |
| touched | code.commit → code.file / code.file_version / code.function | structural | Files / file versions (always) and functions (best-effort, from diff hunk markers) the commit changed. Carries action (added / modified / removed / renamed / copied), sourced from git show --name-status. Distinct from modifies; see the note below. |
| touches_symbol | code.commit → code.symbol / code.function | structural | Symbol-precise version of touched, materialised by the M3 commit→symbol resolver. |
| in_release | code.commit → github.release | structural | Commit was included in this release. |

GitHub metadata (PRs, issues, discussions, reviews)

| Edge type | Source → Target | Semantics |
| --- | --- | --- |
| of / by | github.review → github.pull_request / github.user | Review belongs to a PR; review was written by a user. |
| of_review | github.review_comment → github.review | Review comment belongs to a review. |
| anchored_to | github.review_comment → code.file_version / code.file | Review comment is anchored to a file at a commit. |
| on | github.comment → github.issue / github.pull_request / github.discussion / code.commit | Comment is on a parent surface. |
| replies_to | github.comment → github.comment | Threaded reply. |
| assigned_to | github.issue → github.user | Issue assignee. |
| labeled | github.issue / github.pull_request → github.label | Item carries a label. |
| in_milestone | github.issue / github.pull_request → github.milestone | Item is inside a milestone. |
| in_category | github.discussion → github.discussion_category | Discussion sits in a category. |
| answered_by | github.discussion → github.comment | The accepted answer of a discussion. |
| mentions | github.issue / github.pull_request → github.issue / github.pull_request / code.commit / github.user | Cross-link parsed from issue / PR bodies. |
| closes / links | github.pull_request → github.issue | PR closes / references an issue. |

CI

| Edge type | Source → Target | Semantics |
| --- | --- | --- |
| of_workflow | github.workflow_run → github.workflow | Run belongs to this workflow. |
| of_run | github.workflow_job → github.workflow_run | Job belongs to this run. |
| of_job | github.workflow_step → github.workflow_job | Step belongs to this job. |
| uses | github.workflow / github.workflow_step → github.action | Workflow / step references this reusable action. |
| triggered_by_commit | github.workflow_run → code.commit | Run was triggered by this commit. |
| triggered_by_pr | github.workflow_run → github.pull_request | Run was triggered by this PR. |
| triggered_by_user | github.workflow_run → github.user | Run was triggered by this user (manual dispatch). |
| produced | github.workflow_run → github.workflow_artifact | Run produced this artifact. |
| ran_on | github.workflow_job → github.runner | Job ran on this runner. |
| job_depends_on | github.workflow_job → github.workflow_job | Inter-job dependency from needs:. |

Tests and coverage

| Edge type | Source → Target | Semantics |
| --- | --- | --- |
| of_suite | code.test_case → code.test_suite | Test case belongs to a suite. |
| of_case | code.test_result → code.test_case | Result is for this case. |
| of_workflow_run | code.test_run → github.workflow_run | Test run was produced by this workflow run. |
| of_file | code.coverage_record → code.file | Coverage record applies to this file. |
| at_commit | code.coverage_record → code.commit | Coverage record is at this commit. |
| hit | code.coverage_record → code.function | Functions exercised by the run. |

Security

| Edge type | Source → Target | Semantics |
| --- | --- | --- |
| defined_by | security.dependabot_alert → security.vulnerability | Alert references this CVE / GHSA. |
| anchored_to | security.code_scanning_alert → code.file_version | Alert is anchored to a file at a commit. |
| of_file | security.secret_scanning_location → code.file | Secret-scanning hit lives in this file. |
| supersedes | security.dependabot_alert → security.dependabot_alert | A new alert supersedes an older one for the same dependency. |

Memory

| Edge type | Source → Target | Semantics |
| --- | --- | --- |
| worked_on | memory.episode → code.symbol / code.file / github.issue / github.pull_request | What the agent's session targeted. |
| ran | memory.episode → code.test_case | Tests the agent ran during the session. |
| observed | memory.episode → code.test_result / github.workflow_run / error fingerprint | What the agent saw happen. |
| produced | memory.episode → code.commit / github.pull_request | Commits / PRs the agent's session produced. |
| used_tool | memory.episode → MCPTool | MCP tool the agent called. Carries tool, args_hash, result_hash. |
| triggers_on | memory.procedure → code.symbol / code.test_case / pattern | Anchor that activates a procedure on recall. |
| derived_from | memory.procedure → memory.episode | Provenance: which episodes promoted to this procedure. |
| supersedes | memory.procedure → memory.procedure | Newer procedure replaces an older one. |
| invalidated_by | memory.procedure → code.commit | Commit that invalidated the procedure. |

Derived (M7)

The derivation jobs run after every push and recompute the following edges from the structural graph. They carry confidence_score and computed_at.

| Edge type | Source → Target | Semantics |
| --- | --- | --- |
| expert_on | person → code.symbol / code.file | Top contributor by modification volume + recency. |
| co_changes_with | code.symbol → code.symbol | Symbols that historically change together. Powers code.co_changes_with. |
| introduced_bug | code.commit → code.commit | SZZ: this commit introduced the bug a later commit fixed. |
| regressed_by | code.test_case → code.commit | The commit that regressed a previously-passing test. |
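One common way to derive co-change candidates is to count how often two symbols appear in the same commit's touched set. The sketch below does that from a commit → symbols mapping; the min_support threshold and the confidence formula (the fraction of the source symbol's commits that also touched the target) are assumptions, not Oxagen's published M7 scoring.

```python
from collections import Counter
from itertools import combinations


def co_change_edges(commit_symbols, min_support=2):
    """Derive directed co_changes_with candidates from commit -> symbols touches.

    commit_symbols maps a commit sha to the symbol ids it touched. Returns
    (source, target, confidence) triples, where confidence is the fraction of
    commits touching `source` that also touched `target`.
    """
    changes = Counter()   # commits per symbol
    pairs = Counter()     # commits per unordered symbol pair
    for symbols in commit_symbols.values():
        uniq = sorted(set(symbols))
        changes.update(uniq)
        pairs.update(combinations(uniq, 2))
    edges = []
    for (a, b), together in pairs.items():
        if together < min_support:
            continue
        edges.append((a, b, together / changes[a]))
        edges.append((b, a, together / changes[b]))
    return sorted(edges)


history = {"c1": {"A", "B"}, "c2": {"A", "B", "C"}, "c3": {"A"}}
print(co_change_edges(history))
```

The confidence is directional: B always changes with A, but A changes with B in only two of its three commits. A production job would likely also skip very large commits (mass renames, formatting sweeps) that would otherwise pair every symbol with every other.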

Agent-authored (incidents)

| Edge type | Source → Target | Semantics |
| --- | --- | --- |
| reports / impacts | code.incident ↔ code node | Agent-authored incident edges; see the MCP server. |

Why touched and not modifies for commit edges? modifies is reserved for the LLM-inferred semantic edge above (code.function → code.symbol: this function mutates this value). Reusing it for the structural commit→file edge would give two different relationships the same type string, and Neo4j would store both under a single relationship type. An agent calling code.find_path { edge_types: ["modifies"] } would then get a mix of structural commit history and semantic data flow with no way to disambiguate. touched keeps the two cleanly apart.

The EdgeKind enum in packages/oxagen/oxagen/connectors/github/types.py is the source of truth for the names — every traversal tool accepts these strings as filters.
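An illustrative subset of such an enum shows why the two relationships stay separate: each member has its own string value, so a filter naming one can never select the other. The members and the `parse_edge_types` helper below are assumptions; the authoritative enum in types.py may differ in members and base class.

```python
from enum import Enum


class EdgeKind(str, Enum):
    """Illustrative subset; the authoritative enum is EdgeKind in
    packages/oxagen/oxagen/connectors/github/types.py and may differ."""
    CALLS = "calls"
    IMPORTS = "imports"
    MEMBER_OF = "member_of"
    IS_TESTED_BY = "is_tested_by"
    TOUCHED = "touched"    # structural: code.commit -> file / file_version / function
    MODIFIES = "modifies"  # semantic, LLM-inferred: code.function -> code.symbol


def parse_edge_types(names):
    """Validate an edge_types filter; an unknown string raises ValueError."""
    return [EdgeKind(name) for name in names]


print(parse_edge_types(["touched"]))  # [<EdgeKind.TOUCHED: 'touched'>]
```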

The canonical agent query

The whole graph is shaped to make the following query a one-call answer. Given a function name, return everything an agent needs to fix or refactor it: input shape, output shape, last commit, last author, diff, test coverage, and the docs that reference it.

MATCH (f:Node {type: 'code.function', name: $fn_name})
OPTIONAL MATCH (c:Node {type: 'code.commit'})-[ct:touched]->(f)
OPTIONAL MATCH (c)-[:authored_by]->(a:Node {type: 'code.author'})
OPTIONAL MATCH (f)-[:is_tested_by]->(t:Node {type: 'code.function'})
OPTIONAL MATCH (doc:Node {type: 'code.file'})-[:references]->(f)
  WHERE doc.properties.lang IN ['markdown', 'mdx']
RETURN
  f.properties.params         AS input_shape,
  f.properties.return_type    AS output_shape,
  f.properties.signature      AS signature,
  c.properties.message        AS last_commit_message,
  c.properties.diff_patch     AS last_diff,
  c.properties.authored_at    AS last_commit_at,
  ct.action                   AS last_change_kind,  // added | modified | removed | renamed | copied
  a.properties.name           AS last_author,
  collect(DISTINCT t.name)    AS test_coverage,
  collect(DISTINCT doc.name)  AS referencing_docs
ORDER BY last_commit_at DESC
LIMIT 1

What each clause does, one at a time:

  1. The MATCH anchors on the function. name is a property; the type filter restricts to code.function so a class or symbol of the same name does not collide.
  2. c-[ct:touched]->(f) walks back to every commit whose diff touched this function and binds the edge as ct so its action can be returned. The touched edge from a commit to a function is best-effort (emitted when the diff hunk header includes a function marker), while touched from commit to file is exhaustive, so a commit that lacks a function-level edge can still be reached through the function's file.
  3. c-[:authored_by]->(a) resolves the commit's author node. Authors are deduplicated on email across the repo, so an agent can ask "who else has touched this code?" with one more hop.
  4. f-[:is_tested_by]->(t) returns the test functions that exercise the target. The connector also emits the inverse tests edge from the test function to the subject, so (t)-[:tests]->(f) is equivalent; pick the direction that fits the query.
  5. doc-[:references]->(f) returns markdown / mdx documents that name the function or its file. Design docs, changelogs, and incident write-ups all link back to the symbols they describe. Markdown / mdx files are stored as code.file nodes with lang in {markdown, mdx}, and the WHERE clause above filters to those.

Because collect() aggregates, every non-aggregated column acts as a grouping key, so the query produces one row per commit, and ORDER BY last_commit_at DESC with LIMIT 1 keeps the most recent one. After an aggregating RETURN, Cypher only lets ORDER BY reference projected columns, which is why authored_at is returned as last_commit_at.

A vanilla file-search agent reproduces this with: git log --follow, git blame for the author, grep -r 'def parseJWT' to find the implementation, two more greps to locate tests, a final pass to find docs that mention the symbol — every step a separate tool call, every hop paying for tokens. The graph collapses it to one MCP call and returns typed records the agent does not have to re-parse.

Impact analysis across class members

The member_of edge makes the "if I change a method, what else in the class might break?" query a single hop. Expand the canonical query above to walk class siblings + their tests:

MATCH (m:Node {type: 'code.function', name: $method_name})
MATCH (m)-[:member_of]->(cls:Node {type: 'code.class'})
MATCH (cls)-[:defines]->(sibling:Node {type: 'code.function'})
WHERE sibling.id <> m.id
OPTIONAL MATCH (sibling)-[:is_tested_by]->(t:Node {type: 'code.function'})
RETURN
  cls.name                       AS containing_class,
  sibling.name                   AS sibling_method,
  sibling.properties.signature   AS sibling_signature,
  EXISTS { MATCH (sibling)-[:calls]->(m) } AS calls_changed_method,
  collect(DISTINCT t.name)       AS sibling_tests

What this answers in one MCP call: which methods sit beside the one the agent is about to change, which of them call into it, and the tests that exercise each of those siblings. The agent does not need to grep, git blame, or re-parse — every relationship is already typed in the graph.

You do not write Cypher directly — ontology.explain_function, ontology.symbol_context, ontology.traverse, and ontology.ask compile to this shape. ontology.symbol_context is the closest direct wrapper: pass a name (or node_id) and it returns the target node, its containing class, sibling methods, tests, callers, callees, semantic returns, and the most-recent commits with action labels and resolved authors — every collection independently capped so a hub function can't blow up the response. See How agents query the graph below for the MCP tool surface.

// laser-context bundle for a single symbol — one MCP call, no Cypher
{
  "tool": "ontology.symbol_context",
  "args": { "name": "processOrder" }
}
// →
{
  "target": {
    "id": "…", "name": "processOrder", "kind": "function",
    "signature": "async function processOrder(input)",
    "file": "services/api/lib/orders.ts",
    "is_exported": true, "is_async": true
  },
  "containing_class": { "id": "…", "name": "OrderService" },
  "siblings":         [ { "id": "…", "name": "cancelOrder" } ],
  "tests":            [ { "id": "…", "name": "processOrder_handles_decline" } ],
  "callers":          [ { "id": "…", "name": "checkoutHandler" } ],
  "callees":          [ { "id": "…", "name": "chargeStripe" } ],
  "semantic_returns": [ { "id": "…", "name": "OrderResult" } ],
  "recent_commits": [
    {
      "commit": {
        "id": "…", "name": "9a4c1f10",
        "sha": "9a4c1f10abcdef…", "short_sha": "9a4c1f10",
        "message": "fix(api): retry chargeStripe on transient 5xx",
        "authored_at": "2026-04-22T14:03:12Z",
        "files_changed": 3, "insertions": 41, "deletions": 12
      },
      "action": "modified",
      "author": { "id": "…", "name": "Alex Rivera", "email": "alex@example.com" },
      "diff_excerpt": "@@ -1,3 +1,5 @@\n  retry()"
    }
  ]
}

Caps are configurable: siblings_limit, tests_limit, callers_limit, callees_limit, returns_limit, and commits_limit default to 50 / 50 / 50 / 50 / 25 / 10 respectively. Identity comes from the bearer token — workspace_id is never read from input.
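The per-collection caps can be sketched as a trim over an assembled bundle. The limit names and defaults come from the paragraph above; the mapping from each limit to a response field follows the example response, and `apply_caps` is a hypothetical helper. A real implementation would more likely push each limit into its sub-query rather than trimming afterwards.

```python
DEFAULT_LIMITS = {
    "siblings_limit": 50,
    "tests_limit": 50,
    "callers_limit": 50,
    "callees_limit": 50,
    "returns_limit": 25,
    "commits_limit": 10,
}
# Which response collection each limit applies to (from the example response).
LIMIT_TO_FIELD = {
    "siblings_limit": "siblings",
    "tests_limit": "tests",
    "callers_limit": "callers",
    "callees_limit": "callees",
    "returns_limit": "semantic_returns",
    "commits_limit": "recent_commits",
}


def apply_caps(bundle: dict, args: dict) -> dict:
    """Return a copy of bundle with every collection trimmed to its limit."""
    capped = dict(bundle)  # non-collection keys such as "target" pass through
    for limit_name, field in LIMIT_TO_FIELD.items():
        limit = args.get(limit_name, DEFAULT_LIMITS[limit_name])
        if not isinstance(limit, int) or limit < 0:
            raise ValueError(f"{limit_name} must be a non-negative integer")
        capped[field] = list(bundle.get(field, []))[:limit]
    return capped


bundle = {"target": {"name": "processOrder"},
          "callers": list(range(80)), "recent_commits": list(range(30))}
out = apply_caps(bundle, {"callers_limit": 5})
print(len(out["callers"]), len(out["recent_commits"]))  # 5 10
```

Because each collection is capped independently, a hub function with thousands of callers still returns at most the configured number of callers, without crowding out its tests or commits.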

How agents query the graph

Every entry in the catalogue below is a real MCP tool exposed by mcp.oxagen.ai. Tool names and input field names match the running server — see MCP Server for installation. The catalogue follows the SPEC §7 grouping so it stays one-to-one with the spec doc.

Discovery and structure

| Tool | What it returns |
| --- | --- |
| code.repo_overview | Size, languages, top modules, top contributors, hot files, alert counts. |
| code.module_tree | Logical module hierarchy from code.file nodes. |
| code.find_symbol | Resolve a name to code.symbol / code.function / code.class nodes. |
| code.describe_symbol | Full record per symbol: signature, callers count, callees count, summary. |
| code.read_symbol | Source bytes for one symbol, line-sliced to its range with N lines of surrounding context (default 5, configurable 0–200). Replaces grep + git-show + base64 decode in one MCP call. |

Traversal and impact

| Tool | What it returns |
| --- | --- |
| code.find_callers | Inbound calls edges to a target node, transitive up to depth=5. |
| code.callees_of | Outbound calls from a target node. |
| code.find_dependencies | Outbound calls / imports edges from a target node. |
| code.find_path | Shortest path between two nodes constrained by edge_types. |
| code.references_to | All inbound references: calls, imports, uses. |
| code.co_changes_with | Symbols that historically change together (M7 derived co_changes_with edge). |
| code.get_neighborhood | Bidirectional 1- or 2-hop expansion around a target. |
| code.find_dead_code | Nodes with no inbound calls / imports from elsewhere in the workspace. |
| code.find_cycles | Cycles in the dependency graph, filtered by kind and min_length. |
| code.stats | Aggregate counts (files, functions, classes), fan-in / fan-out, cycle count. |
| ontology.explain_function | Full edge neighbourhood for a named function: what it calls, throws, reads, writes, returns. |
| ontology.impact_of | Reverse traversal: every function that reads / writes / modifies / calculates a named symbol. |
| ontology.symbol_context | One-call laser-context bundle for a symbol: class, sibling methods, tests, callers, callees, semantic returns, recent commits. Replaces 5+ separate traversal calls. |
| ontology.traverse | Bidirectional path enumeration from a node id, capped at max_hops=5, optionally filtered by edge_types. |

SPEC §7.2 aliases dual-register at the same credit cost: code.callers_of → code.find_callers, code.dependency_path → code.find_path, code.affected_by → ontology.impact_of. Both names roll up to the canonical name in usage analytics.
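The roll-up can be sketched as an alias table folded into a usage counter. Only the three alias pairs come from the text; `canonical_tool` and `usage_rollup` are hypothetical helpers, not part of the server's API.

```python
from collections import Counter

# SPEC §7.2 aliases -> canonical tool names, as listed above.
TOOL_ALIASES = {
    "code.callers_of": "code.find_callers",
    "code.dependency_path": "code.find_path",
    "code.affected_by": "ontology.impact_of",
}


def canonical_tool(name: str) -> str:
    return TOOL_ALIASES.get(name, name)


def usage_rollup(tool_calls):
    """Count calls per canonical tool, folding aliases into their canonical name."""
    return Counter(canonical_tool(name) for name in tool_calls)


print(usage_rollup(["code.callers_of", "code.find_callers", "code.stats"]))
# Counter({'code.find_callers': 2, 'code.stats': 1})
```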

History and ownership

| Tool | What it returns |
| --- | --- |
| code.recent_changes | Recently-changed files plus the commits that touched them. |
| code.blame_enriched | Git blame plus the introducing PR body and its reviews. |
| code.pr_history | PRs touching a target with merge status. |
| code.who_knows_about | Top experts by EXPERT_ON score (M7 derivation). |
| code.expertise | What symbols a person owns (M7 derivation). |

Tests and CI

| Tool | What it returns |
| --- | --- |
| code.tests_for | Tests that exercise a target. |
| code.coverage_for | Coverage records for a file. |
| code.failing_tests | Currently failing test cases. |
| code.flaky_tests | Tests that flip pass/fail across recent runs. |
| code.last_run | Last workflow run for a workflow. |
| code.run_failures | Per-job failures inside a workflow run. |

Dependencies and security

| Tool | What it returns |
| --- | --- |
| code.dependencies | Direct dependencies with constraints, resolved versions, and licenses. |
| code.dependency_graph | Transitive dependency subgraph for a package. |
| security.open_alerts | Aggregate open Dependabot / code-scanning / secret-scanning alerts. |
| security.alert_context | Full context for one alert: anchor, vulnerability, related commits. |

Issues, PRs, discussions

| Tool | What it returns |
| --- | --- |
| code.pr_context | A PR plus its linked issues, reviews, and comments. |
| code.issue_context | An issue plus linked PRs, commits, and discussion. |
| code.find_issues | Substring + filter search across github.issue nodes. |
| code.discussion_context | A discussion plus its accepted answer and related code. |

Search and retrieval

| Tool | What it returns |
| --- | --- |
| ontology.search | Hybrid vector + structural search across the workspace graph. |
| code.find_pattern | Hybrid score over code.chunk nodes: summary match plus structural anchors. |
| ontology.ask | Hybrid retrieval, then an LLM composes a grounded answer with cited node UUIDs. |

Memory

| Tool | What it returns |
| --- | --- |
| memory.recall | Procedures-first recall envelope: top-K procedures plus the episodes that promoted them. |
| memory.remember | Anchor a fresh memory.episode to the active session. |
| memory.procedure_for | Direct trigger-pattern lookup against memory.procedure nodes. |
| memory.forget | Soft-delete a memory. Pinned memories require an explicit grant. |

Maintenance

| Tool | What it returns |
| --- | --- |
| ontology.refresh_repo | Trigger a paths-restricted re-ingest on a connection. |

Identity (workspace_id, user_id) is derived from the verified bearer token — never passed as input. Cross-workspace queries return empty results.

Response envelope

Every code-graph tool returns the same canonical envelope so the agent parses one shape regardless of which tool emitted it.

{
  "results": [...],
  "evidence": [
    {"node_id": "…", "kind": "code.function", "url": "https://github.com/…"}
  ],
  "tokens_used_estimate": 142,
  "counterfactual_estimate_tokens": 8400,
  "counterfactual_method": "grep_plus_read_n_files",
  "cursor": null,
  "tenant_scoped_at": {
    "tenant_id": "…",
    "workspace_id": "…",
    "project_id": null
  }
}
  • results — the typed records the tool was asked for.
  • evidence — node IDs the agent can cite back. Every record in results is reachable from at least one evidence entry.
  • tokens_used_estimate — tokens spent on this tool's reply, computed from the assembled payload by the first estimator that succeeds: the Anthropic SDK token-count API, then a tiktoken cl100k_base approximation, then a deterministic fallback. The tiktoken step uses GPT-4 tokenisation, so its estimates may differ slightly from Anthropic-billed token counts.
  • counterfactual_estimate_tokens — what the same answer would have cost a vanilla file-search agent, computed by a per-tool estimator. This is the receipt for the "fewer LLM calls, smaller models, lower bills" claim.
  • counterfactual_method — stable method-id naming the estimator (e.g. grep_plus_read_n_files, lockfile_transitive_parse_plus_manifests).
  • cursor — opaque pagination token, or null for unpaginated results.
  • tenant_scoped_at — the workspace the tool resolved to. Always populated.
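The estimator chain can be sketched as an ordered list of counters, each returning None when it cannot answer. The provider-backed counter (for example, a wrapper around the Anthropic SDK's token-counting call) is passed in by the caller and not shown, because it needs credentials. The 4-characters-per-token fallback is an assumption, not Oxagen's deterministic rule.

```python
import math
from typing import Callable, Optional, Sequence

Estimator = Callable[[str], Optional[int]]


def tiktoken_count(text: str) -> Optional[int]:
    """cl100k_base count, or None if tiktoken or its encoding file is unavailable."""
    try:
        import tiktoken
        encoding = tiktoken.get_encoding("cl100k_base")
        return len(encoding.encode(text, disallowed_special=()))
    except Exception:
        return None


def char_heuristic(text: str) -> int:
    """Deterministic fallback: roughly 4 characters per token (assumption)."""
    return max(1, math.ceil(len(text) / 4))


def estimate_tokens(payload: str, preferred: Sequence[Estimator] = ()) -> int:
    # Try the caller's preferred estimators first, then tiktoken, then the
    # deterministic fallback, returning the first count that is available.
    for estimator in (*preferred, tiktoken_count):
        count = estimator(payload)
        if count is not None:
            return count
    return char_heuristic(payload)
```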

The envelope is enforced — malformed responses are rejected before they reach the agent.
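A shape check for the envelope could look like the sketch below. It validates only what can be checked locally: the key types, cursor, evidence entries, and tenant scope. The invariant that every result is reachable from an evidence entry needs the graph, so it is not checked here. `validate_envelope` is a hypothetical helper, not the server's validator.

```python
def validate_envelope(env: dict) -> list:
    """Return a list of problems; an empty list means the envelope is well formed."""
    problems = []
    expected = {
        "results": list,
        "evidence": list,
        "tokens_used_estimate": int,
        "counterfactual_estimate_tokens": int,
        "counterfactual_method": str,
        "tenant_scoped_at": dict,
    }
    for key, typ in expected.items():
        if not isinstance(env.get(key), typ):
            problems.append(f"{key}: expected {typ.__name__}")
    # cursor must be present, and either an opaque string or null.
    if "cursor" not in env or not (env["cursor"] is None or isinstance(env["cursor"], str)):
        problems.append("cursor: expected str or null")
    evidence = env.get("evidence")
    if isinstance(evidence, list):
        for i, item in enumerate(evidence):
            if not isinstance(item, dict) or not item.get("node_id") or not item.get("kind"):
                problems.append(f"evidence[{i}]: needs node_id and kind")
    scope = env.get("tenant_scoped_at")
    if isinstance(scope, dict):
        for key in ("tenant_id", "workspace_id"):
            if not scope.get(key):
                problems.append(f"tenant_scoped_at.{key}: must be populated")
    return problems


good = {
    "results": [],
    "evidence": [{"node_id": "n1", "kind": "code.function", "url": "https://github.com/example"}],
    "tokens_used_estimate": 142,
    "counterfactual_estimate_tokens": 8400,
    "counterfactual_method": "grep_plus_read_n_files",
    "cursor": None,
    "tenant_scoped_at": {"tenant_id": "t1", "workspace_id": "w1", "project_id": None},
}
print(validate_envelope(good))  # []
```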

Time-travel queries (at_commit)

Every relationship in the graph carries valid_from and a nullable valid_to, so a query can ask the graph as it stood at any commit. Tools that traverse the graph accept an optional at_commit parameter; the Cypher rewriter ANDs r.valid_from <= $at AND coalesce(r.valid_to, '9999-…') > $at into the first MATCH for every named relationship variable r. Bare relationship patterns with no variable, such as -[:calls]->, are left alone.

// callers of parseJWT as of commit 9a4c1f2
{
  "tool": "code.find_callers",
  "args": {
    "name": "parseJWT",
    "at_commit": "9a4c1f2"
  }
}

The same shape works on ontology.explain_function, code.find_path, code.references_to, memory.recall, and any other tool that traverses bitemporal edges. The response envelope's tenant_scoped_at records the workspace; evidence records the node IDs at that commit.

A vanilla file-search agent has to run git checkout <sha>, repeat every grep, re-parse every file, and then discard the working tree. The graph keeps the historical state queryable without checking anything out.
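The rewriter's predicate-building step can be sketched with a regular expression that finds named relationship variables. Several details are assumptions: how the at_commit sha is resolved to the value bound to $at, and the open-ended sentinel (elided as '9999-…' in the text, so a placeholder is used here). The regex is deliberately simple and does not handle variable-length patterns such as [r:calls*1..3], where r binds a list.

```python
import re

# Placeholder for the open-ended valid_to sentinel; the real value is elided
# in the text ('9999-…'), so this string is an assumption.
OPEN_END = "9999-12-31T23:59:59Z"

# A relationship pattern with a variable, e.g. [r:calls] or [ct :touched].
# Bare patterns such as [:calls] have no variable and do not match.
_NAMED_REL = re.compile(r"\[\s*([A-Za-z_]\w*)\s*(?::[^\]]*)?\]")


def bitemporal_predicate(cypher: str, at: str = "$at") -> str:
    """Build the validity filter for every named relationship variable."""
    names = list(dict.fromkeys(_NAMED_REL.findall(cypher)))  # dedupe, keep order
    return " AND ".join(
        f"{n}.valid_from <= {at} AND coalesce({n}.valid_to, '{OPEN_END}') > {at}"
        for n in names
    )


query = "MATCH (c)-[ct:touched]->(f) OPTIONAL MATCH (f)-[:is_tested_by]->(t)"
print(bitemporal_predicate(query))
```

Only ct is filtered in this example; the bare is_tested_by pattern contributes nothing to the predicate, matching the rule that unnamed patterns are left alone.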

How agents share state through the graph

Two agents that never run in the same process can still share state through the workspace graph. The first writes a node — a finding, a pattern, a code.commit — and the next agent's first MCP call sees it.

A concrete shape:

  1. 2026-04-22, 14:03 UTC. Agent A (Claude Code in a developer's editor) merges PR #517 to main. The push webhook triggers an incremental sync. code.commit { sha: "9a4c1f…" }, code.author { email: "alex@…" }, and the code.commit-[:touched]->code.function { name: "parseJWT" } edge land in the workspace graph within ~30 seconds.
  2. 2026-04-25, 08:11 UTC. Agent B (a triage agent invoked by an on-call engineer) receives a stack trace mentioning parseJWT. Its first MCP call is ontology.explain_function { name: "parseJWT" }. The response contains every commit that has touched parseJWT since 04-22, author Alex, the diff for #517, and the tests that exercise the function. Agent B never runs git log or clones the repo.

The agents never coordinated. The graph did. Multi-agent coordination is structural, not bolt-on: the second agent reads the first agent's output the same way it reads any other graph node.

Memory and the graph are the same store

The agent memory layer (Agent Memory) writes _mem:action, _mem:sequence, and _mem:pattern nodes into the same Neo4j workspace graph as your code.* nodes. A pattern that applies to a specific function is structurally connected to it, so retrieving the pattern pulls the symbol with it.

When the evaluator promotes a pattern keyed on a code symbol — for example, "ontology mutations on parseJWT fail with NodeNotFoundError 83% of the time" — it writes:

{
  "type": "_mem:pattern",
  "name": "ontology_mutation:code.function:NodeNotFoundError",
  "properties": {
    "confidence_score": 0.83,
    "applies_to": {
      "type": "code.function",
      "name": "parseJWT"
    },
    "recommendation": "Validate the node exists and is accessible before proceeding."
  }
}

The applies_to shape is matched by the pre-execution context hook, which traverses from the pattern node to the referenced code node in one hop. An agent calling memory.context for parseJWT receives the pattern, the function node, its commits, and its tests in the same payload — one MCP call, no follow-up.
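The following TypeScript sketch shows what that match does to the data. It keeps stored _mem:pattern records whose applies_to names the requested code node and flattens them into the summary shape used in the response below. The real hook does this as a one-hop graph traversal; this version scans an in-memory list. The types and the stored id field are assumptions, because the example record above shows no id. Matching on type and name alone, as the example record does, would conflate same-named functions in different files, and the sketch has the same limitation.

```typescript
// Hypothetical types mirroring the _mem:pattern record above.
interface CodeRef {
  type: string; // e.g. "code.function"
  name: string; // e.g. "parseJWT"
}

interface StoredPattern {
  id: string; // assumed graph node id; not shown in the example record
  type: "_mem:pattern";
  name: string;
  properties: {
    confidence_score: number;
    applies_to?: CodeRef; // patterns not keyed on a symbol omit this
    recommendation: string;
  };
}

// Flattened shape used in the memory.context response's "patterns" array.
interface PatternSummary {
  id: string;
  name: string;
  confidence_score: number;
  recommendation: string;
}

// Keep patterns whose applies_to names the given code node, then flatten them.
function patternsFor(node: CodeRef, stored: StoredPattern[]): PatternSummary[] {
  return stored
    .filter((p) => {
      const ref = p.properties.applies_to;
      return ref !== undefined && ref.type === node.type && ref.name === node.name;
    })
    .map((p) => ({
      id: p.id,
      name: p.name,
      confidence_score: p.properties.confidence_score,
      recommendation: p.properties.recommendation,
    }));
}
```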

{
  "patterns": [
    {
      "id": "5c7c…",
      "name": "ontology_mutation:code.function:NodeNotFoundError",
      "confidence_score": 0.83,
      "recommendation": "Validate the node exists and is accessible before proceeding."
    }
  ],
  "code_context": {
    "function": {
      "id": "a3f2…",
      "name": "parseJWT",
      "signature": "function parseJWT(token: string): JWTClaims",
      "file": "services/api/lib/auth.ts",
      "is_exported": true
    },
    "tests": [
      { "id": "b8c1…", "name": "parseJWT_handles_expired_token" }
    ],
    "recent_commits": []
  }
}

The shape is deterministic. The agent does not need to interpret free-form text — it reads the typed record and decides.
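Because the payload is typed, agent-side handling can be plain code rather than text interpretation. The following TypeScript sketch declares a hypothetical interface for the memory.context payload above and two illustrative decision helpers. The 0.8 confidence threshold and both helper names are arbitrary examples, not Oxagen defaults. The element type of recent_commits is left as unknown because the example payload shows an empty array.

```typescript
// Hypothetical TypeScript view of the memory.context payload above.
interface MemoryContext {
  patterns: {
    id: string;
    name: string;
    confidence_score: number;
    recommendation: string;
  }[];
  code_context: {
    function: {
      id: string;
      name: string;
      signature: string;
      file: string;
      is_exported: boolean;
    };
    tests: { id: string; name: string }[];
    recent_commits: unknown[]; // element shape not shown in the example
  };
}

// Illustrative rule: act on recommendations whose confidence meets a threshold,
// strongest first. filter() returns a new array, so sort() does not mutate ctx.
function recommendationsToApply(ctx: MemoryContext, minConfidence = 0.8): string[] {
  return ctx.patterns
    .filter((p) => p.confidence_score >= minConfidence)
    .sort((a, b) => b.confidence_score - a.confidence_score)
    .map((p) => p.recommendation);
}

// Illustrative check: is the function covered by at least one test node?
function isTested(ctx: MemoryContext): boolean {
  return ctx.code_context.tests.length > 0;
}
```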

Per-repo configuration

Every GitHub connection ships with the manifest fields below. They are read on the first sync and editable from Connections → Settings (or via the API on the connection record).

| Field | Type | Default | What it does |
|---|---|---|---|
| repo_full_name | string | | The repository in owner/repo form. |
| default_branch | string | main | Branch to ingest. Other branches are not parsed; cross-branch comparison happens via the GitHub PR / commit metadata pass. |
| include_paths | string[] | apps/**, packages/**, services/** | Glob set for files to parse. |
| exclude_paths | string[] | **/node_modules/**, **/.next/**, **/dist/**, **/*.d.ts | Glob set for files to skip before parsing. |
| languages | string[] | ["typescript", "python"] | Parser languages enabled for this repo. The source parser ships TypeScript / JavaScript and Python. |
| ingest_commit_history | boolean | true | Creates code.commit nodes for recent commits with diffs, so agents can query what changed, who changed it, and why. |
| enable_semantic_edges | boolean | true | Runs an LLM pass to infer reads / writes / returns / validates edges between functions. Disable on large repos to reduce cost. |
| sync_schedule | string | daily | manual, daily, or weekly: when to re-index automatically. Push webhooks trigger an incremental sync regardless of this setting. |
| commit_depth | number | 100 | How many recent commits to ingest on first sync (min 10, max 500). |
| min_import_edge_referrers | number | 3 | Suppresses file-level imports edges to any target imported by more than this many files in the same repo and aggregates them into a single depends_on edge instead. Reduces hub-node clutter (React, lodash). |
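The min_import_edge_referrers rule amounts to counting distinct importers per target. The following TypeScript sketch is a minimal illustration of that counting step, with hypothetical names. The docs do not say which node the single depends_on edge hangs from, so the sketch only returns the hub targets alongside the imports edges it keeps. The comparison is strict ("more than this many files"), so with the default of 3 a target imported by exactly 3 files keeps its per-file edges.

```typescript
// A file-level import: `from` imports `to` (repo-relative path or package name).
interface ImportEdge {
  from: string;
  to: string;
}

interface ImportPlan {
  imports: ImportEdge[]; // file-level imports edges that are kept
  dependsOn: string[]; // hub targets, each to be collapsed into one depends_on edge
}

// Count distinct importing files per target. Any target imported by more than
// `threshold` files loses its per-file imports edges and is listed in dependsOn.
function aggregateHubImports(edges: ImportEdge[], threshold = 3): ImportPlan {
  const referrers = new Map<string, Set<string>>();
  for (const e of edges) {
    let files = referrers.get(e.to);
    if (files === undefined) {
      files = new Set<string>();
      referrers.set(e.to, files);
    }
    files.add(e.from); // a Set ignores duplicate imports from the same file
  }

  const hubs = new Set<string>();
  referrers.forEach((files, target) => {
    if (files.size > threshold) hubs.add(target);
  });

  return {
    imports: edges.filter((e) => !hubs.has(e.to)),
    dependsOn: Array.from(hubs).sort(),
  };
}
```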

Existing connections pick up the documented defaults for any keys they do not already set on the first sync after a settings update. The manifest defines the schema, the API merges new keys non-destructively, and the worker honours the merged blob on the next run.
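A non-destructive merge of this kind can be expressed as an object spread where the stored blob wins. The following TypeScript sketch uses hypothetical type and function names and takes the default values from the table above. Clamping commit_depth into its documented 10 to 500 range is an assumption; the real API might reject out-of-range values instead.

```typescript
type SyncSchedule = "manual" | "daily" | "weekly";

// Hypothetical TypeScript view of the manifest fields in the table above.
interface RepoManifest {
  repo_full_name: string;
  default_branch: string;
  include_paths: string[];
  exclude_paths: string[];
  languages: string[];
  ingest_commit_history: boolean;
  enable_semantic_edges: boolean;
  sync_schedule: SyncSchedule;
  commit_depth: number;
  min_import_edge_referrers: number;
}

// Documented defaults (repo_full_name has none).
const DEFAULTS: Omit<RepoManifest, "repo_full_name"> = {
  default_branch: "main",
  include_paths: ["apps/**", "packages/**", "services/**"],
  exclude_paths: ["**/node_modules/**", "**/.next/**", "**/dist/**", "**/*.d.ts"],
  languages: ["typescript", "python"],
  ingest_commit_history: true,
  enable_semantic_edges: true,
  sync_schedule: "daily",
  commit_depth: 100,
  min_import_edge_referrers: 3,
};

// Keys already stored on the connection win; keys it lacks fall back to the defaults.
// A JSON blob cannot hold undefined, so a present key always carries a value.
function mergeManifest(stored: Partial<RepoManifest> & { repo_full_name: string }): RepoManifest {
  const merged: RepoManifest = { ...DEFAULTS, ...stored };
  // Assumption: out-of-range commit_depth is clamped rather than rejected.
  merged.commit_depth = Math.min(500, Math.max(10, merged.commit_depth));
  return merged;
}
```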

Sync lifecycle

A repo connection moves through four states, defined by the ConnectionStatus enum:

| status | Meaning |
|---|---|
| pending | Created but no successful sync yet. The first backfill is running or queued. |
| active | At least one sync has completed. The connection is live and accepting webhooks. |
| error | The most recent sync failed. last_error carries the message; retry is one click. |
| paused | Operator-paused. No automatic syncs run; a manual ontology.refresh_repo still works. |
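The table implies two small rules: paused blocks automatic syncs but not manual ones, and a finished sync leaves the connection active on success or error on failure. The following TypeScript sketch encodes those rules with hypothetical names; the trigger list mirrors the sync drivers described below. It also makes two assumptions the table does not state: a manual sync does not unpause a paused connection, and a successful sync clears last_error.

```typescript
type ConnectionStatus = "pending" | "active" | "error" | "paused";
type SyncTrigger = "backfill" | "webhook" | "schedule" | "reconciliation" | "manual";

interface ConnectionState {
  status: ConnectionStatus;
  last_error: string | null;
}

// paused blocks every automatic trigger; a manual ontology.refresh_repo still runs.
function mayRunSync(status: ConnectionStatus, trigger: SyncTrigger): boolean {
  return status !== "paused" || trigger === "manual";
}

// Status after a sync finishes (error === null means success).
// Assumptions: paused stays paused, and success clears last_error.
function afterSync(state: ConnectionState, error: string | null): ConnectionState {
  if (state.status === "paused") {
    return { status: "paused", last_error: error };
  }
  return error === null
    ? { status: "active", last_error: null }
    : { status: "error", last_error: error };
}
```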

Five things drive a sync:

  • Initial backfill. Triggered automatically on connect. Reads commit_depth commits, walks include / exclude globs, parses every supported file, resolves edges, writes nodes.
  • Push webhook. GitHub pushes to the default branch trigger an incremental sync that re-parses changed files and adds new code.commit nodes since last_synced_sha.
  • Schedule. sync_schedule runs a defensive re-sync daily or weekly to recover from missed webhooks. manual opts out — only push webhooks and explicit ontology.refresh_repo calls trigger a sync.
  • Reconciliation cadence. Even if a webhook is dropped or rate-limited, the graph self-heals on a known schedule: an hourly drift sweep refreshes issues / PRs / discussions, a nightly metadata pass refreshes repository / branch / label / milestone / CODEOWNERS / branch-protection state, a nightly security pass refreshes Dependabot / code-scanning / secret-scanning alerts, and a Sunday-night full reparse re-derives every code.symbol from source. State per (workspace, surface) is tracked in a watermark table so each sweep resumes from where the last one left off.
  • Manual. Force a re-sync from the connection card or from any agent with ontology.refresh_repo { connection_id, paths? }. Pass paths to limit the re-ingest to a subset of the repo — useful after a targeted edit when you do not want to wait for the next webhook.
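The following TypeScript sketch shows how a push-triggered or manual sync could choose which changed files to re-parse. A path is kept if it matches an include_paths glob and no exclude_paths glob. When ontology.refresh_repo passes paths, the file must also fall under one of them. Several parts are assumptions rather than documented behaviour: the glob dialect (*, **, ?), the prefix semantics for paths, and the helper names. Computing the changed-file list from last_synced_sha and applying the languages filter are out of scope.

```typescript
// Minimal glob-to-RegExp conversion: "**/" = zero or more directories,
// "**" = anything, "*" = anything except "/", "?" = one non-"/" character.
function globToRegExp(glob: string): RegExp {
  let re = "";
  for (let i = 0; i < glob.length; i++) {
    const c = glob.charAt(i);
    if (c === "*") {
      if (glob.charAt(i + 1) === "*") {
        if (glob.charAt(i + 2) === "/") {
          re += "(?:.*/)?";
          i += 2;
        } else {
          re += ".*";
          i += 1;
        }
      } else {
        re += "[^/]*";
      }
    } else if (c === "?") {
      re += "[^/]";
    } else {
      re += c.replace(/[.+^${}()|[\]\\]/g, "\\$&"); // escape regex metacharacters
    }
  }
  return new RegExp(`^${re}$`);
}

interface GlobConfig {
  include_paths: string[];
  exclude_paths: string[];
}

// Hypothetical selection step: which changed files an incremental sync re-parses.
function filesToReparse(changed: string[], cfg: GlobConfig, paths?: string[]): string[] {
  const inc = cfg.include_paths.map(globToRegExp);
  const exc = cfg.exclude_paths.map(globToRegExp);
  const underPaths = (f: string): boolean =>
    paths === undefined ||
    paths.some((p) => f === p || f.startsWith(p.endsWith("/") ? p : p + "/"));
  return changed.filter(
    (f) => inc.some((r) => r.test(f)) && !exc.some((r) => r.test(f)) && underPaths(f),
  );
}
```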
