Oxagen Docs

Artifact storage

Where Oxagen stores everything the agent produces — the unified app.documents table, the source / provider / generated_by_run_id discriminators, and the documents browser that answers "what did the agent ship last week?".

Every artifact the agent generates — Google Doc, Google Sheet, Google Slides deck, PDF — lands in the same app.documents table that holds user-uploaded files and in-product markdown notes. There is no agent-specific document store, no parallel agent_artifacts table, no hidden bucket. The agent's output is a first-class workspace document.

The unified app.documents row

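Three discriminator columns (source, generated_by_run_id, provider) disambiguate the agent surface; three more (external_id, external_url, parent_document_id) locate the vendor object and link a row to the document it was derived from: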

| Column | Values | Meaning |
| --- | --- | --- |
| source | user_upload · agent_generated · external_import | Where the document came from. Constrained by a CHECK so SQL consumers can rely on the closed value set. |
| generated_by_run_id | UUID | The agent run that produced this document. Populated when source='agent_generated'. |
| provider | google · microsoft · local · connector-specific | The external system responsible for the binary. local means the bytes live in object storage; the rest mean a vendor copy. |
| external_id | string | Stable id for the vendor object (e.g. Google Drive file id). |
| external_url | string | Viewer URL: the link that opens the live doc in the vendor's UI. |
| parent_document_id | UUID | Self-FK to the document this row was derived from. A docs.export_pdf PDF points at the source Doc; an external sync revision points at the previous revision. |
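To illustrate how a CHECK on source keeps the value set closed, here is a minimal sketch. It uses an in-memory SQLite database as a stand-in for Postgres, a table named documents instead of app.documents, and only a handful of columns; the insert_document helper and the sample ids are hypothetical.

```python
import sqlite3

# Stand-in schema: SQLite instead of Postgres, a small subset of columns.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE documents (
        id TEXT PRIMARY KEY,
        source TEXT NOT NULL
            CHECK (source IN ('user_upload', 'agent_generated', 'external_import')),
        generated_by_run_id TEXT,
        provider TEXT NOT NULL
    )
    """
)


def insert_document(doc_id, source, provider, run_id=None):
    """Insert one row; the CHECK rejects any source outside the closed set."""
    conn.execute(
        "INSERT INTO documents (id, source, generated_by_run_id, provider) "
        "VALUES (?, ?, ?, ?)",
        (doc_id, source, run_id, provider),
    )


# A valid agent-generated row is accepted.
insert_document("d1", "agent_generated", "google", run_id="r1")

# A value outside the closed set is refused by the database itself.
try:
    insert_document("d2", "agent_scratch", "local")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
```

Because the constraint lives in the table definition, any consumer that reads source can enumerate the three values without defensive handling for unknown ones.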

Beyond these, the row carries the same fields a user upload carries — label, name, extension, mime_type, storage_path, preview_pdf_path, page_count, file_size_bytes, ingestion_status. Generation does not skip ingestion: an agent-authored doc is indexed into the workspace graph the same way an uploaded doc is, so an agent reading "every doc that mentions Acme" sees its own output alongside human uploads.

What gets stamped on each format

| Capability | kind | extension | mime_type | provider |
| --- | --- | --- | --- | --- |
| docs.create_from_spec | uploaded | .gdoc | application/vnd.google-apps.document | google |
| sheets.create_from_spec | uploaded | .gsheet | application/vnd.google-apps.spreadsheet | google |
| slides.create_from_spec | uploaded | .gslides | application/vnd.google-apps.presentation | google |
| docs.export_pdf / sheets.export_pdf / slides.export_pdf | uploaded | .pdf | application/pdf | google (re-export from Drive) |
| pdf.convert | uploaded | .pdf | application/pdf | local (Gotenberg) |

source is always agent_generated. generated_by_run_id always points at the run that authored it.
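The stamping rules above can be read as a lookup table. The sketch below restates them in Python; the Stamp dataclass, the STAMPS mapping, and the stamp_row function are hypothetical names used for illustration, not part of the Oxagen API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stamp:
    kind: str
    extension: str
    mime_type: str
    provider: str


# Values copied from the table above; the structure itself is an assumption.
_PDF_FROM_DRIVE = Stamp("uploaded", ".pdf", "application/pdf", "google")
STAMPS = {
    "docs.create_from_spec": Stamp(
        "uploaded", ".gdoc", "application/vnd.google-apps.document", "google"
    ),
    "sheets.create_from_spec": Stamp(
        "uploaded", ".gsheet", "application/vnd.google-apps.spreadsheet", "google"
    ),
    "slides.create_from_spec": Stamp(
        "uploaded", ".gslides", "application/vnd.google-apps.presentation", "google"
    ),
    "docs.export_pdf": _PDF_FROM_DRIVE,
    "sheets.export_pdf": _PDF_FROM_DRIVE,
    "slides.export_pdf": _PDF_FROM_DRIVE,
    "pdf.convert": Stamp("uploaded", ".pdf", "application/pdf", "local"),
}


def stamp_row(capability, run_id):
    """Build the format-specific fields for a new agent-generated row."""
    stamp = STAMPS[capability]
    return {
        "kind": stamp.kind,
        "extension": stamp.extension,
        "mime_type": stamp.mime_type,
        "provider": stamp.provider,
        # These two are the same for every capability.
        "source": "agent_generated",
        "generated_by_run_id": run_id,
    }
```

For example, stamp_row("pdf.convert", "r2") yields a row with provider "local", while the three export_pdf capabilities share one google-provider stamp.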

Provenance chain

The parent_document_id self-FK is the chain agents traverse to answer "what was this PDF rendered from?":

Drive Doc (source='agent_generated', generated_by_run_id=R1)
   ↑ parent_document_id
PDF export (source='agent_generated', generated_by_run_id=R1)
   ↑ parent_document_id
LibreOffice-converted PDF of a user upload (source='agent_generated', generated_by_run_id=R2)

The chain is workspace-scoped — a follow-up parent_document_id hop never leaves the workspace. Re-rendering an artifact reuses the chain (same parent, new sibling row) so audit reconstructs "who exported this, when, with which run".
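One way to walk the chain is a recursive query that follows parent_document_id upward and filters every hop by workspace. The sketch below assumes SQLite as a stand-in for Postgres, a reduced documents table, and made-up ids and names; the provenance function is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE documents (
        id TEXT PRIMARY KEY,
        workspace_id TEXT NOT NULL,
        name TEXT NOT NULL,
        parent_document_id TEXT REFERENCES documents(id)
    )
    """
)
# Sample rows: one Drive Doc and two PDF renders that share it as parent.
conn.executemany(
    "INSERT INTO documents VALUES (?, ?, ?, ?)",
    [
        ("doc-1", "ws-1", "Plan (Drive Doc)", None),
        ("pdf-1", "ws-1", "Plan (PDF export)", "doc-1"),
        ("pdf-2", "ws-1", "Plan (re-rendered PDF)", "doc-1"),
    ],
)

ANCESTORS = """
WITH RECURSIVE chain(id, parent_document_id, depth) AS (
    SELECT id, parent_document_id, 0
    FROM documents
    WHERE id = :start AND workspace_id = :ws
    UNION ALL
    SELECT d.id, d.parent_document_id, chain.depth + 1
    FROM documents d
    JOIN chain ON d.id = chain.parent_document_id
    WHERE d.workspace_id = :ws
)
SELECT id FROM chain ORDER BY depth
"""


def provenance(start_id, workspace_id):
    """Return ids from the starting document up to its root, nearest first."""
    rows = conn.execute(ANCESTORS, {"start": start_id, "ws": workspace_id})
    return [row[0] for row in rows]
```

Both pdf-1 and pdf-2 resolve to doc-1, which is the "same parent, new sibling row" shape a re-render produces. Asking for the chain under a different workspace id returns an empty list because the workspace filter applies at every hop.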

The documents browser

The dashboard's documents browser reads app.documents directly. The agent-output view applies one filter:

SELECT *
FROM app.documents
WHERE workspace_id = $1
  AND source = 'agent_generated'
  AND is_deleted = false
ORDER BY created_at DESC
LIMIT 50

A composite index (workspace_id, source, created_at DESC) supports the dominant pagination pattern without an extra sort. The same index powers the "agent output, this week" query on the workspace overview card.
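As a rough illustration of the filter and the index working together, the sketch below recreates both in an in-memory SQLite database. SQLite stands in for Postgres, the table is reduced to the columns the query touches, is_deleted is stored as 0/1, and the index name, the agent_output function, and all sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE documents (
        id TEXT PRIMARY KEY,
        workspace_id TEXT NOT NULL,
        source TEXT NOT NULL,
        is_deleted INTEGER NOT NULL DEFAULT 0,
        created_at TEXT NOT NULL
    );
    -- Same column order as the composite index described above.
    CREATE INDEX documents_ws_source_created
        ON documents (workspace_id, source, created_at DESC);
    """
)
conn.executemany(
    "INSERT INTO documents (id, workspace_id, source, is_deleted, created_at) "
    "VALUES (?, ?, ?, ?, ?)",
    [
        ("d1", "ws-1", "agent_generated", 0, "2024-05-01T09:00:00Z"),
        ("d2", "ws-1", "agent_generated", 0, "2024-05-03T09:00:00Z"),
        ("d3", "ws-1", "user_upload", 0, "2024-05-04T09:00:00Z"),
        ("d4", "ws-1", "agent_generated", 1, "2024-05-05T09:00:00Z"),
        ("d5", "ws-2", "agent_generated", 0, "2024-05-06T09:00:00Z"),
    ],
)


def agent_output(workspace_id, limit=50):
    """Newest-first ids of live agent-generated documents in one workspace."""
    rows = conn.execute(
        """
        SELECT id
        FROM documents
        WHERE workspace_id = ?
          AND source = 'agent_generated'
          AND is_deleted = 0
        ORDER BY created_at DESC
        LIMIT ?
        """,
        (workspace_id, limit),
    )
    return [row[0] for row in rows]
```

In this sample, the user upload, the soft-deleted row, and the row from another workspace are all filtered out, leaving d2 and d1 in newest-first order.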

Tags

Workspace tags (app.document_tags joined via app.document_tag_links) attach to agent-authored documents the same way they attach to uploads. The generation tools auto-apply three tags when present in the workspace tag dictionary:

| Tag | Value |
| --- | --- |
| agent | <agent_slug>: the named agent that produced the doc, when set. |
| kind | doc · sheet · slides · pdf |
| run | <run_id>: back-reference to the agent run. |

Tags missing from the dictionary are silently skipped — the agent never invents a tag. Workspace admins manage the dictionary at Settings → Document tags.
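The skip-if-missing rule can be sketched as a small filter. This assumes the tag dictionary can be represented as a set of tag names; the auto_tags function and its parameters are hypothetical, not the generation tools' actual interface.

```python
def auto_tags(dictionary, *, kind, run_id, agent_slug=None):
    """Return the (tag, value) pairs to attach to a newly generated document.

    `dictionary` is assumed to be the set of tag names defined in the
    workspace. Tags that are not in it are skipped, never created.
    """
    candidates = []
    if agent_slug is not None:
        # The agent tag only applies when the run has a named agent.
        candidates.append(("agent", agent_slug))
    candidates.append(("kind", kind))
    candidates.append(("run", run_id))
    return [(tag, value) for tag, value in candidates if tag in dictionary]
```

With a dictionary of {"agent", "kind"}, a PDF produced by an agent named "reporter" gets the agent and kind tags and no run tag; with an empty dictionary it gets none.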

Linked nodes

The ingestion pipeline writes a document node to the workspace ontology for every row and joins it to extracted entities via mentions edges. app.document_node_links keeps the Postgres ↔ Neo4j bridge — an agent calling ontology.list_nodes { type: 'document' } gets every agent-authored doc alongside user uploads, queryable by the same mentions graph traversals.

Soft delete and retention

Documents follow the standard soft-delete contract from WorkspaceScopedBase: is_deleted = true plus deleted_at and deleted_by_id. A soft-deleted agent artifact stops appearing in the documents browser and stops contributing to ingestion, but the row remains queryable from the audit chain. Hard delete is a workspace-admin-only operation; the audit.event row recording the delete is not deletable (the audit schema revokes DELETE from the oxagen role).
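A minimal sketch of the soft-delete contract, assuming SQLite as a stand-in for Postgres and a reduced documents table with is_deleted stored as 0/1. The soft_delete and browser_ids helpers and the sample ids are hypothetical.

```python
import sqlite3
from datetime import datetime, timezone

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE documents (
        id TEXT PRIMARY KEY,
        workspace_id TEXT NOT NULL,
        source TEXT NOT NULL,
        is_deleted INTEGER NOT NULL DEFAULT 0,
        deleted_at TEXT,
        deleted_by_id TEXT
    )
    """
)
conn.execute(
    "INSERT INTO documents (id, workspace_id, source) "
    "VALUES ('d1', 'ws-1', 'agent_generated')"
)


def soft_delete(doc_id, actor_id):
    """Flag the row as deleted and record when and by whom; keep the row."""
    conn.execute(
        "UPDATE documents "
        "SET is_deleted = 1, deleted_at = ?, deleted_by_id = ? "
        "WHERE id = ? AND is_deleted = 0",
        (datetime.now(timezone.utc).isoformat(), actor_id, doc_id),
    )


def browser_ids(workspace_id):
    """Ids the documents browser would list: live rows only."""
    rows = conn.execute(
        "SELECT id FROM documents WHERE workspace_id = ? AND is_deleted = 0",
        (workspace_id,),
    )
    return [row[0] for row in rows]


visible_before = browser_ids("ws-1")
soft_delete("d1", "user-7")
visible_after = browser_ids("ws-1")
# The row is still present for audit-style lookups by id.
stored = conn.execute(
    "SELECT is_deleted, deleted_by_id FROM documents WHERE id = 'd1'"
).fetchone()
```

The row drops out of the browser listing but can still be fetched by id, which is the property the audit chain relies on.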

Storage targets

The binary location depends on the provider:

  • provider = 'google' — the canonical bytes live in Google Drive. storage_path is null; external_url is the viewer URL. The PDF export of a Google artifact is also stored locally (via the export endpoint's byte stream).
  • provider = 'local' — the bytes live in object storage at storage_path (gs://oxagen-documents/...). Used by pdf.convert and by user uploads.
  • provider = 'microsoft' — same as Google but the file lives in the Microsoft Graph drive item. Used by external imports from the Microsoft 365 connector.

All providers share the same app.documents surface, so an agent reading the documents browser never has to branch on provider.
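A client that needs to open a document can follow the same idea and avoid inspecting provider at all. The sketch below assumes a row is available as a plain dict; the open_target function, the bucket path, and the viewer URL are hypothetical placeholders.

```python
def open_target(doc):
    """Return where a reader finds this document.

    Prefers storage_path when it is set and falls back to external_url,
    so the caller never inspects the provider column.
    """
    return doc.get("storage_path") or doc.get("external_url")


# Placeholder rows; the path and URL are illustrative only.
local_pdf = {
    "provider": "local",
    "storage_path": "gs://example-bucket/report.pdf",
    "external_url": None,
}
drive_doc = {
    "provider": "google",
    "storage_path": None,
    "external_url": "https://example.com/viewer/abc",
}
```

Here open_target(local_pdf) returns the object-storage path and open_target(drive_doc) returns the viewer URL, with no provider-specific branch in the caller.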

Audit

Each generation, conversion, share, and delete writes an audit.event row chained to the workspace's audit stream. See Events, triggers, and audits.

