Oxagen Docs

Ingestion

How data enters the Oxagen knowledge graph.

Overview

Ingestion is the path from raw content to typed rows in your workspace ontology. The public API exposes ingestion through the prompt endpoint, which accepts free-form text, and through the structured node and edge endpoints, which accept typed payloads.

Code-graph ingestion → see Code Graph. Connecting a GitHub repository runs a parallel pipeline that emits typed nodes across nine categories — repository, identity, filesystem, code semantics, GitHub metadata (PRs / issues / discussions / reviews), CI (workflows / runs / jobs), tests + coverage, security alerts, and agent memory — plus the structural, semantic, derived, and bitemporal edges that connect them. The Code Graph page covers node and edge inventories, time-travel queries, the M9 MCP catalogue, the canonical response envelope, and the per-repo manifest. The prompt and structured-input pipeline below remains the entry point for all non-code data.

Entry point | API path                 | What it accepts
Prompt      | POST /v1/ontology/prompt | Free-form text: notes, emails, snippets
Direct      | POST /v1/ontology/nodes  | Structured node payload
Edges       | POST /v1/ontology/edges  | Structured edge payload

Every path goes through the same extraction pipeline and produces the same kind of typed nodes and edges, scoped to the caller's workspace.

Additional ingestion surfaces (persisted notes, file uploads, OAuth data-source connectors, GitHub repository code, URL auto-fetch from chat) are available inside the Oxagen web app at app.oxagen.ai and run on the same underlying pipeline. See Connectors for the full list of first-party data sources and the typed nodes + edges each one produces. For private GitHub repositories, install and configure the Oxagen GitHub App as described in GitHub App (private repositories).

The Pipeline

Every ingested item flows through the same five-step pipeline in a background worker:

  1. Classify — an LLM assigns a node type (person, meeting, receipt, etc.) and generates a short title/summary.
  2. Extract — text is normalised. For prompts and structured inputs, content is already text.
  3. Embed — a 512-dim vector is generated for every node and stored in pgvector for semantic search.
  4. Relate — the refiner discovers edges to existing entities in the workspace (for example, a new meeting linked to an existing person node).
  5. Persist — nodes and edges are written with workspace_id stamped; RLS ensures every subsequent read is scoped to your workspace.

The prompt endpoint runs extraction inline and returns the extracted shape directly. Structured node / edge creation is synchronous.
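
To make the data flow through the five steps concrete, here is a minimal in-memory sketch. It is not Oxagen's implementation: the keyword classifier, the hash-based embedding, and the title-matching refiner are toy stand-ins for the LLM, the embedding model, and the real refiner, and every class and function name is hypothetical. Only the step order, the 512-dim vector size, and the workspace_id stamping come from the description above.

```python
import hashlib
import math
from dataclasses import dataclass, field

EMBEDDING_DIM = 512  # vector size stated in the pipeline description


@dataclass
class Node:
    workspace_id: str
    type: str
    title: str
    text: str
    embedding: list


@dataclass
class Edge:
    workspace_id: str
    type: str
    source_title: str
    target_title: str


@dataclass
class Workspace:
    workspace_id: str
    nodes: list = field(default_factory=list)
    edges: list = field(default_factory=list)


def classify(text):
    """Step 1. Toy stand-in for the LLM: pick a node type and a short title."""
    node_type = "meeting" if "met with" in text.lower() else "note"
    title = text.strip().split(".")[0][:60]
    return node_type, title


def extract(text):
    """Step 2. Normalise whitespace; prompts are already text."""
    return " ".join(text.split())


def embed(text):
    """Step 3. Toy deterministic unit vector; a real system calls an embedding model."""
    values = []
    counter = 0
    while len(values) < EMBEDDING_DIM:
        digest = hashlib.sha256(f"{counter}:{text}".encode()).digest()
        values.extend(b / 255.0 - 0.5 for b in digest)
        counter += 1
    values = values[:EMBEDDING_DIM]
    norm = math.sqrt(sum(v * v for v in values)) or 1.0
    return [v / norm for v in values]


def relate(ws, node):
    """Step 4. Toy refiner: link to existing nodes whose title appears in the text."""
    return [
        Edge(ws.workspace_id, "mentions", node.title, other.title)
        for other in ws.nodes
        if other.title.lower() in node.text.lower()
    ]


def ingest(ws, raw):
    """Run all five steps and persist the result, stamped with the workspace id."""
    node_type, title = classify(raw)
    text = extract(raw)
    node = Node(ws.workspace_id, node_type, title, text, embed(text))
    edges = relate(ws, node)
    ws.nodes.append(node)   # Step 5. Persist
    ws.edges.extend(edges)
    return node
```

In this sketch, ingesting a meeting note into a workspace that already holds a "Sarah Chen" person node yields one new node and one edge pointing back to that person, mirroring the Relate example above.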

Entry point details

Prompt — one-shot text

Best for: pasting a block of text (a meeting note, an email body, a snippet of a document).

curl -X POST https://api.oxagen.ai/v1/ontology/prompt \
  -H "Authorization: Bearer <token>" \
  -H "Content-Type: application/json" \
  -d '{"prompt": "Met with Sarah Chen at Acme. They want to close the enterprise deal by May 30."}'

The response includes nodes_created and edges_created counts plus an intent classification (update, query, or both). See Ontology.
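
For clients that prefer Python to curl, the same request can be sketched with the standard library. The URL, headers, and body shape are taken from the curl example above; the helper names build_prompt_request and send are hypothetical, and the 30-second timeout is an arbitrary choice.

```python
import json
import urllib.request

API_BASE = "https://api.oxagen.ai"


def build_prompt_request(token, prompt):
    """Build (but do not send) the POST request for the prompt endpoint."""
    body = json.dumps({"prompt": prompt}).encode("utf-8")
    return urllib.request.Request(
        f"{API_BASE}/v1/ontology/prompt",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


def send(request):
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling send(build_prompt_request(token, text)) should return a dictionary that carries the nodes_created and edges_created counts described above. Building the request separately from sending it keeps the payload easy to inspect and test without network access.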

Structured nodes and edges

Best for: clients that already have typed entities (for example, an agent that has computed a new relationship and wants to persist it deterministically).

# Create a node
curl -X POST https://api.oxagen.ai/v1/ontology/nodes \
  -H "Authorization: Bearer <token>" \
  -H "Content-Type: application/json" \
  -d '{
    "type": "Person",
    "title": "Sarah Chen",
    "properties": { "email": "sarah@acme.com" }
  }'

# Create an edge between two nodes
curl -X POST https://api.oxagen.ai/v1/ontology/edges \
  -H "Authorization: Bearer <token>" \
  -H "Content-Type: application/json" \
  -d '{
    "type": "works_at",
    "source_id": "<person_node_id>",
    "target_id": "<org_node_id>"
  }'
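
The two calls are usually chained: create the nodes, then connect them. A minimal Python sketch follows, assuming the payload shapes shown in the curl examples. The helper names are hypothetical, the "Organization" type name is a guess modelled on "Person", and the assumption that the create-node response exposes the new node's identifier under "id" is not confirmed on this page.

```python
import json
import urllib.request

API_BASE = "https://api.oxagen.ai"


def node_payload(node_type, title, properties=None):
    """Body for POST /v1/ontology/nodes, matching the curl example."""
    return {"type": node_type, "title": title, "properties": properties or {}}


def edge_payload(edge_type, source_id, target_id):
    """Body for POST /v1/ontology/edges, matching the curl example."""
    return {"type": edge_type, "source_id": source_id, "target_id": target_id}


def post_json(path, token, payload):
    """POST a JSON payload to the API and decode the JSON response."""
    request = urllib.request.Request(
        f"{API_BASE}{path}",
        data=json.dumps(payload).encode("utf-8"),
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.loads(response.read().decode("utf-8"))


def link_person_to_org(token):
    """Create a person and an organization, then connect them with works_at."""
    person = post_json(
        "/v1/ontology/nodes",
        token,
        node_payload("Person", "Sarah Chen", {"email": "sarah@acme.com"}),
    )
    # ASSUMPTION: "Organization" is a valid node type name in your ontology.
    org = post_json("/v1/ontology/nodes", token, node_payload("Organization", "Acme"))
    # ASSUMPTION: the create-node response carries the new node's id under "id".
    return post_json(
        "/v1/ontology/edges",
        token,
        edge_payload("works_at", person["id"], org["id"]),
    )
```

Check the actual response shape of the node endpoint before relying on the "id" lookup.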

What Gets Stored

Regardless of entry point, Oxagen stores:

  • Nodes — the entities we recognize (people, organizations, tasks, dates, documents).
  • Edges — typed relationships between those nodes.
  • Embeddings — 512-dim vectors that power semantic search.
  • Provenance — a pointer back to the source item so every node can trace its origin.

Oxagen does not store raw document text or fetched HTML. Ingested content is reduced to structured rows on ingest.

See Security & Privacy for the full data handling model.

Errors You May See

Status | Meaning
400    | Missing or malformed content
402    | Insufficient credits
422    | Validation error (check the detail field)
429    | Rate limited
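
One way a client might act on these statuses is to retry only the rate-limit case and surface the rest immediately, since a 400, 402, or 422 will not succeed on a plain retry. The sketch below is an assumption about reasonable client behaviour, not documented Oxagen guidance: the function names, the attempt count, and the exponential backoff schedule are all hypothetical, and no server-provided retry hint is assumed.

```python
import time
import urllib.error
import urllib.request

# Meanings copied from the status table above.
STATUS_MEANINGS = {
    400: "Missing or malformed content",
    402: "Insufficient credits",
    422: "Validation error",
    429: "Rate limited",
}


def should_retry(status):
    """Only rate limiting is treated as transient in this sketch."""
    return status == 429


def send_with_retry(request, attempts=3, base_delay=1.0,
                    opener=urllib.request.urlopen, sleep=time.sleep):
    """Send a request, backing off exponentially while the API answers 429."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            with opener(request, timeout=30) as response:
                return response.read()
        except urllib.error.HTTPError as err:
            # Re-raise non-retryable errors, and the final 429 once attempts run out.
            if not should_retry(err.code) or attempt == attempts - 1:
                raise
            sleep(base_delay * 2 ** attempt)
```

The opener and sleep parameters exist so the retry logic can be exercised in tests without real network calls or real delays.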
