Workspace Memory
How memory works inside an Oxagen workspace — the four layers (semantic, episodic, telemetry, working set), the master toggle, what each layer retains, and how the chat path reads them on every turn.
Overview
A workspace is not just a graph — it is the unit of memory for every agent
that touches it. Memory in Oxagen is workspace-scoped, layered, and
opt-in. One toggle (workspace.config.memory_enabled) controls whether
the workspace records and recalls anything beyond the active conversation
window.
There are four layers, each with a different lifetime, store, and read path. They are independent — disabling memory disables capture into the durable layers but leaves the active conversation untouched, and the ontology graph itself is always available because it is the workspace.
| Layer | Lifetime | Store | What it holds |
|---|---|---|---|
| Semantic | Permanent (until soft-deleted) | Neo4j + Postgres mirror | The workspace ontology — typed nodes and edges. The thing the agent reasons over. |
| Episodic | Permanent (summary only) | Postgres app.memory_episodes | Rolled-up summaries of older conversational turns. Source raw turns are hard-deleted on consolidation. |
| Telemetry | Permanent | Postgres agent_memory.* + Neo4j _mem:* | Per-action audit, grouped sequences, promoted patterns, benchmark trends. |
| Working set | TTL (default 1h, refreshed on read) | Redis | Volatile, per-conversation scratchpad — planner decisions, recent retrievals, intermediate tool outputs. |
The project-wide rule that governs all four: no raw user data. Episodic consolidation deletes source messages once a summary exists. Telemetry records action types and outcomes, not payload bodies. The working set is volatile by construction. The ontology stores typed entities and embeddings, never raw documents.
Enabling memory for a workspace
The master toggle lives at workspace.config.memory_enabled (a JSONB
field on org.workspaces.config). When false, the workspace still has
its ontology graph — semantic memory is structural, not optional — but
durable capture into the episodic and telemetry layers is paused.
Via the app
- Open the workspace.
- Settings → Memory → Enable memory.
The setting round-trips through PUT /v1/workspaces/{workspace_id}/settings
under the memory.enabled key. The server bridges that into
workspace.config["memory_enabled"] so every read path
(is_memory_enabled(workspace)) sees the same value.
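A minimal sketch of that bridge, assuming the settings body has the shape `{"memory": {"enabled": ...}}` and that a missing key reads as disabled (consistent with memory being opt-in). `apply_memory_setting` is a hypothetical name, and this `is_memory_enabled` takes the config dict rather than a workspace object; neither is the server's actual implementation.

```python
from typing import Any


def apply_memory_setting(config: dict[str, Any], settings_body: dict[str, Any]) -> dict[str, Any]:
    """Copy memory.enabled from a settings payload into a workspace config dict.

    Hypothetical helper: the real server-side bridge is not shown on this page.
    """
    updated = dict(config)  # leave the caller's dict untouched
    memory = settings_body.get("memory", {})
    if "enabled" in memory:
        updated["memory_enabled"] = bool(memory["enabled"])
    return updated


def is_memory_enabled(config: dict[str, Any]) -> bool:
    """Single read path; a missing key is treated as disabled (assumed default)."""
    return bool(config.get("memory_enabled", False))
```

Routing every reader through one predicate is what keeps the app, MCP, and REST toggles from drifting apart.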
Via MCP
workspace.enable_memory
workspace.disable_memory
The bearer token resolves the workspace; no arguments are needed.
Via REST
POST /v1/workspaces/{workspace_id}/memory/enable
POST /v1/workspaces/{workspace_id}/memory/disable
These endpoints are internal (not in /openapi.json) and gate on
workspace membership.
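As an illustration, the toggle requests could be built like this with the standard library. The host name is a placeholder, and bearer-token authentication on the REST endpoints is an assumption carried over from the MCP description; the response shape is not documented here, so the sketch only constructs the requests.

```python
import json
import urllib.request

# Assumption: placeholder origin; substitute your deployment's API host.
BASE_URL = "https://oxagen.example"


def memory_toggle_request(workspace_id: str, enable: bool, token: str) -> urllib.request.Request:
    """Build (but do not send) the POST that enables or disables memory."""
    action = "enable" if enable else "disable"
    return urllib.request.Request(
        f"{BASE_URL}/v1/workspaces/{workspace_id}/memory/{action}",
        method="POST",
        # Assumption: bearer-token auth, as described for the MCP tools.
        headers={"Authorization": f"Bearer {token}"},
    )


def memory_settings_request(workspace_id: str, enabled: bool, token: str) -> urllib.request.Request:
    """Build the equivalent PUT against the settings endpoint."""
    body = json.dumps({"memory": {"enabled": enabled}}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/v1/workspaces/{workspace_id}/settings",
        data=body,
        method="PUT",
        headers={"Authorization": f"Bearer {token}", "Content-Type": "application/json"},
    )
```

Passing either request to `urllib.request.urlopen` would send it.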
Layer 1 — Semantic memory (the ontology)
The workspace knowledge graph is the semantic memory layer. Every node and edge is workspace-scoped, RLS-enforced at the database, and written through the same provider so vector search and graph traversal agree on what exists.
The chat path embeds this layer into every turn through the ontology tools — hybrid search picks the most relevant nodes, the prompt-builder injects them into context, and the response links back to the node IDs that supported it.
Semantic memory is always live for a workspace. The
memory_enabled toggle does not turn it off. To hide a node, soft-delete
it; to remove a slice of the graph, use the connection-delete flow.
See Ontology for the full data model, hybrid-search behavior, and the credit-metered prompt-run audit.
Layer 2 — Episodic memory
Episodic memory holds summaries of older conversation turns so the chat path can recall context that has scrolled out of the active window without reading raw transcripts.
Data model
One row per consolidated chunk in app.memory_episodes:
| Column | Purpose |
|---|---|
| conversation_id | The conversation the episode belongs to. Cascade-deletes with the parent conversation. |
| summary | LLM-generated third-person paragraph covering the consolidated turns. |
| source_message_ids | UUIDs of the source app.messages rows. The rows themselves are deleted on insert; this column is a forensics breadcrumb, not a live foreign key. |
| turn_count | How many source messages folded into this episode. |
| workspace_id | RLS scope. Inherited from WorkspaceScopedBase. |
The summary prompt explicitly prefers types and roles over names — "the user" rather than "Mac", "the assistant", named entities only when the conversation is meaningfully about them.
Read path — live
On every chat turn, _get_episodic_context reads the five most recent
episodes for the active conversation and prepends them to the system
prompt as:
Earlier conversation summaries (most recent last):
- (12 turns) <summary>
- (8 turns) <summary>
- ...
Read failures (missing table, transient DB error) are swallowed and logged at WARNING: episodic recall is best-effort and never breaks a chat turn. If the workspace has no episodes yet, the block is empty and nothing is injected.
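The rendering step can be sketched as follows. The `Episode` dataclass and `format_episodic_context` are hypothetical stand-ins for the real row type and helper; the sketch assumes episodes arrive newest-first from the query and are flipped so the most recent lands last, matching the header text.

```python
from dataclasses import dataclass

HEADER = "Earlier conversation summaries (most recent last):"


@dataclass
class Episode:
    summary: str
    turn_count: int


def format_episodic_context(episodes_newest_first: list[Episode], limit: int = 5) -> str:
    """Render up to `limit` recent episodes, oldest of them first.

    Returns "" when there is nothing to inject.
    """
    recent = episodes_newest_first[:limit]
    if not recent:
        return ""
    lines = [f"- ({e.turn_count} turns) {e.summary}" for e in reversed(recent)]
    return "\n".join([HEADER, *lines])
```

Returning an empty string rather than a bare header keeps the system prompt free of an empty section when no episodes exist.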
Write path — consolidation
Consolidation rolls up older raw messages into one episode and
hard-deletes the sources. summarize_old_turns(session, conversation_id, retain_last=20, max_age_days=30) does the work:
- Picks messages older than max_age_days that fall outside the retain_last most-recent rows.
- Asks the orchestration router (kind="qa") to write a single-paragraph summary using the workspace's configured model.
- Inserts a MemoryEpisode row.
- Hard-deletes the source app.messages rows.
The function is idempotent: consolidated rows are gone, so a re-run finds no new candidates among them. Tails shorter than two messages are skipped.
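The candidate-selection rule can be illustrated in memory. This is a sketch under stated assumptions, not the body of `summarize_old_turns`: the `Message` type and `select_consolidation_candidates` are hypothetical, the age cutoff is taken as strictly older than `max_age_days`, and the real function works against the database session.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Message:
    id: str
    created_at: datetime


def select_consolidation_candidates(
    messages: list[Message],
    now: datetime,
    retain_last: int = 20,
    max_age_days: int = 30,
) -> list[Message]:
    """Return messages eligible for roll-up, oldest first."""
    ordered = sorted(messages, key=lambda m: m.created_at)
    # Protect the retain_last newest rows regardless of their age.
    outside_window = ordered[:-retain_last] if retain_last > 0 else ordered
    cutoff = now - timedelta(days=max_age_days)
    candidates = [m for m in outside_window if m.created_at < cutoff]
    # Tails shorter than two messages are not worth summarising.
    return candidates if len(candidates) >= 2 else []
```

Because the selected rows are deleted after the summary is written, running the same selection again over what remains returns none of them, which is where the idempotence comes from.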
How episodes are produced. Callers invoke the consolidator function directly — the worker does not run it on a schedule. Recall reads episodes when they exist and returns the empty string when they do not, so a workspace with no consolidator calls still gets a well-formed recall response.
Lifecycle
Episodes are soft-deletable (the is_deleted column inherited
from SQLTableBase) and cascade-delete with the parent
conversation. Deleting a conversation removes both the remaining raw
messages and every episode summarising it.
Layer 3 — Agent telemetry
Telemetry captures what agents did in the workspace — every tool call, every mutation, every dead end — and promotes recurring outcomes into patterns that are injected back into the agent system prompt before the next action.
This is the layer documented in detail under Agent Memory. The short version:
- agent_memory.agent_actions: one row per tool call, with action type, target node type, outcome, and error code (no payloads).
- agent_memory.agent_sequences: actions grouped by session.
- agent_memory.agent_patterns: promoted when ≥5 sequences share the same (action_type, node_type, error_code) at ≥60% confidence.
- agent_memory.agent_benchmarks: nightly metric values per workspace.
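The promotion threshold for agent_patterns can be sketched as below. This page does not define how confidence is computed, so the sketch assumes one plausible reading: the share of sequences for a given (action_type, node_type) that ended with the same error_code. That definition, the `SequenceOutcome` type, and the function name are all assumptions; see Agent Memory for the authoritative rule.

```python
from collections import Counter
from typing import NamedTuple


class SequenceOutcome(NamedTuple):
    action_type: str
    node_type: str
    error_code: str


def promotable_patterns(
    outcomes: list[SequenceOutcome],
    min_sequences: int = 5,
    min_confidence: float = 0.6,
) -> dict[SequenceOutcome, float]:
    """Return {pattern key: confidence} for keys that clear both thresholds.

    Assumed definition: confidence = sequences with this exact key divided by
    all sequences sharing its (action_type, node_type).
    """
    by_key = Counter(outcomes)
    by_target = Counter((o.action_type, o.node_type) for o in outcomes)
    promoted: dict[SequenceOutcome, float] = {}
    for key, count in by_key.items():
        confidence = count / by_target[(key.action_type, key.node_type)]
        if count >= min_sequences and confidence >= min_confidence:
            promoted[key] = confidence
    return promoted
```

Requiring both a minimum count and a minimum share keeps a single noisy session from being promoted into every later prompt.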
Each Postgres row has a Neo4j twin under the _mem:* label space, so
telemetry travels alongside the ontology in the same graph and an
operator can hop from any domain node to its full agent history in
one edge.
Telemetry is the layer where the memory_enabled toggle has the most
behavioural impact — it is the only layer that also changes how the
agent runs (via pre-execution context injection). Disabling memory
stops capture and stops injection; existing rows remain queryable for
audit.
The legacy module is named oxagen.domains.agent_memory; the
oxagen.domains.memory.__init__ docstring marks it for a rename to
agent_telemetry in a follow-up. Treat agent telemetry as the
canonical name for this layer; the existing /docs/agent-memory page
covers the full surface.
Layer 4 — Working set
The working set is the volatile, per-conversation scratchpad. It holds whatever the planner needs to remember inside a single conversation but does not need to persist beyond it: the last planner decision, the most recent retrieval result, intermediate tool outputs.
Layout
oxa:ws:<workspace_id>:<conversation_id>:<slot>
Values are JSON-encoded. Slot names are caller-defined; the API takes
any string. Suggested conventions: planner_decision,
recent_retrieval, intermediate_results.
Behaviour
- TTL. Default one hour. Every read refreshes the TTL — every read is a heartbeat.
- Eviction. A slot quietly disappears once its TTL expires. Readers get None and must reconstruct.
- Clear. clear_working(workspace_id, conversation_id) wipes every slot for one conversation in a single SCAN+DELETE pass; called when the conversation ends.
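These semantics can be modelled without Redis. The class below is an in-memory illustration only: `InMemoryWorkingSet`, its `write`/`read` methods, and the injectable clock are hypothetical and are not the API of oxagen.domains.memory.working_set. It shows the key layout, JSON encoding, refresh-on-read, expiry to None, and the per-conversation clear.

```python
import json
import time
from typing import Any, Callable


class InMemoryWorkingSet:
    """Dict-backed model of the working set; not the Redis implementation."""

    def __init__(self, ttl_seconds: float = 3600, clock: Callable[[], float] = time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._slots: dict[str, tuple[float, str]] = {}  # key -> (expires_at, JSON payload)

    @staticmethod
    def _key(workspace_id: str, conversation_id: str, slot: str) -> str:
        return f"oxa:ws:{workspace_id}:{conversation_id}:{slot}"

    def write(self, workspace_id: str, conversation_id: str, slot: str, value: Any) -> None:
        key = self._key(workspace_id, conversation_id, slot)
        self._slots[key] = (self._clock() + self._ttl, json.dumps(value))

    def read(self, workspace_id: str, conversation_id: str, slot: str) -> Any:
        key = self._key(workspace_id, conversation_id, slot)
        entry = self._slots.get(key)
        if entry is None:
            return None
        expires_at, payload = entry
        if self._clock() >= expires_at:
            del self._slots[key]  # expired: the slot quietly disappears
            return None
        # Every read is a heartbeat: push the expiry out by a full TTL.
        self._slots[key] = (self._clock() + self._ttl, payload)
        return json.loads(payload)

    def clear_working(self, workspace_id: str, conversation_id: str) -> int:
        """Drop every slot for one conversation; returns how many were removed."""
        prefix = f"oxa:ws:{workspace_id}:{conversation_id}:"
        doomed = [k for k in self._slots if k.startswith(prefix)]
        for k in doomed:
            del self._slots[k]
        return len(doomed)
```

Callers should treat a None read as a cache miss and rebuild the value, since a slot that is not read within one TTL is gone.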
Maturity note
The working-set module ships in oxagen.domains.memory.working_set
and has Redis-coupled tests. It is a primitive callers opt into
directly — the chat path reads from the semantic and episodic layers
and does not consult the working set on its own.
How a chat turn uses memory
Each turn assembles its prompt from multiple layers. The flow:
User message arrives
│
▼
Check workspace.config.memory_enabled
│
▼ (always)
Hybrid-search the ontology — semantic memory hits
│
▼ (always, best-effort)
recall_recent_episodes() — episodic summaries
│
▼ (only if memory_enabled)
Inject promoted telemetry patterns referencing the target nodes
│
▼
Compose system prompt:
[episodic summaries] + [ontology context] + [pattern context]
│
▼
Run model, stream response
│
▼ (only if memory_enabled)
Capture action(s) into agent_memory.agent_actions
│
▼ (eventual, when consolidation runs)
Old turns rolled into a new MemoryEpisode; source messages deleted
What each layer contributes:
- Semantic answers what does the workspace know about this?
- Episodic answers what has this conversation already covered?
- Telemetry answers what tends to go wrong when an agent does this?
- Working set answers what was I just thinking? (within one conversation, for one planner)
The turn never blocks on a memory layer. Failures in episodic recall, pattern lookup, or the working set are logged and skipped — the chat path always has the active conversation messages plus the ontology to fall back on.
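The skip-on-failure composition can be sketched as follows. The function names and the zero-argument fetch callables are hypothetical; the sketch only illustrates the ordering from the flow above, the toggle gate on pattern injection, and the rule that a failing optional layer contributes nothing.

```python
import logging
from typing import Callable

logger = logging.getLogger("memory_sketch")


def best_effort(layer: str, fetch: Callable[[], str]) -> str:
    """Run one optional layer's fetch; on failure, log a warning and return ""."""
    try:
        return fetch()
    except Exception:
        logger.warning("memory layer %s failed; skipping", layer, exc_info=True)
        return ""


def compose_system_prompt(
    memory_enabled: bool,
    fetch_episodic: Callable[[], str],
    fetch_ontology: Callable[[], str],
    fetch_patterns: Callable[[], str],
) -> str:
    """Assemble [episodic summaries] + [ontology context] + [pattern context]."""
    episodic = best_effort("episodic", fetch_episodic)
    ontology = fetch_ontology()  # the fallback layer, always read
    # Pattern injection is the only read gated on the toggle.
    patterns = best_effort("telemetry", fetch_patterns) if memory_enabled else ""
    return "\n\n".join(block for block in (episodic, ontology, patterns) if block)
```

Empty blocks are dropped before joining, so a skipped layer leaves no gap in the prompt.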
Privacy and data retention
Workspace memory follows the project-wide rule: no raw user data persists past the active window.
| Layer | What persists | What does not |
|---|---|---|
| Semantic | Typed entities, edges, embeddings, location references | Raw documents, raw email bodies, file contents |
| Episodic | LLM summary, message UUIDs, turn count | Source message bodies (hard-deleted on consolidation) |
| Telemetry | Action type, target node type, outcome, error code, latency | Tool arguments, response bodies, user-supplied strings |
| Working set | JSON values the caller chooses to write | Anything past the TTL — values evaporate on expiry |
Cross-workspace memory is explicitly out of scope. Patterns, episodes, and working-set slots never leave the workspace they were written in. RLS enforces this at the database level for the Postgres layers; key namespacing enforces it for Redis.
REST and MCP surface
Toggle
POST /v1/workspaces/{workspace_id}/memory/enable
POST /v1/workspaces/{workspace_id}/memory/disable
PUT /v1/workspaces/{workspace_id}/settings # body: {"memory": {"enabled": true}}
workspace.enable_memory
workspace.disable_memory
Telemetry (agent memory)
See Agent Memory — MCP tools reference for the full surface. Highlights:
memory.search # query patterns and sequences
memory.context # semantic search over sequence intent_summary
memory.annotate # annotate or suppress a pattern
memory.metric_create # define a custom metric
memory.benchmark_get # benchmark trend for a metric
Episodic and working set
The episodic and working-set layers do not currently expose dedicated
REST or MCP tools — they are internal primitives the chat path uses.
Conversation deletion (DELETE /v1/conversations/{id}) cascades
through app.memory_episodes. Episode rows are visible via the
admin tooling for incident response.
Disabling memory
Per-workspace:
workspace.disable_memory
Disabling stops new telemetry capture and stops pre-execution pattern injection. Existing telemetry rows, episodes, and ontology data are preserved and remain queryable. Re-enabling resumes capture from the point of re-enable; nothing back-fills.
To delete memory:
- Conversation-scoped — delete the conversation. Episodes cascade.
- Workspace-scoped — soft-delete or hard-delete the workspace. Every layer is workspace-scoped, so the data goes with it.
- Pattern-scoped: call memory.annotate with suppressed: true to hide a telemetry pattern from injection without deleting the row.
Related
- Agent Memory — the telemetry layer in detail.
- Ontology — the semantic layer, the data model, and the credit-metered prompt-run audit.
- Security — RLS, workspace isolation, and the cross-workspace boundary.