Oxagen Docs

Ingest Zoom meetings, attendees, and recorded transcripts into the workspace knowledge graph. The connector infers action items, deliverables, decisions, problems, and opportunities as typed nodes and wires them to the people they belong to.

Zoom Meetings

Meetings + transcripts → action items, decisions, deliverables.

The Zoom connector turns post-meeting cleanup into structured graph data. Every past meeting the authenticated host attended becomes a meeting node, every participant becomes a person node connected to that meeting, and — when cloud recording is enabled and a VTT transcript is available — an LLM extraction pass mines the transcript for action items, deliverables, decisions, problems, opportunities, and open-ended concepts. Each inferred item lands as its own typed node, edged back to the source meeting and (when the owner is on the participant list) to the person responsible.

The end result is a graph your agents can query in a hop or two: "What did Alice commit to in last week's meetings?", "Every customer pain we heard in Q2", "All deliverables that mention the Acme contract". Each is a short MATCH pattern against the ontology.

What gets ingested

Source	Node type	Edge type	Direction
Past meeting	`meeting`	—	—
Host (from meeting payload)	`person`	`hosted` / `hosted_by`	Person → Meeting + reverse
Participant (from past-meeting participants API)	`person`	`had_participant` / `attended`	Meeting → Person + reverse
Inferred action item	`action_item`	`produced_action_item`	Meeting → Action item
Inferred deliverable	`deliverable`	`produced_deliverable`	Meeting → Deliverable
Inferred decision	`decision`	`produced_decision`	Meeting → Decision
Inferred problem	`problem`	`surfaced_problem`	Meeting → Problem
Inferred opportunity	`opportunity`	`surfaced_opportunity`	Meeting → Opportunity
Inferred concept	`concept`	`discussed_concept`	Meeting → Concept
Action item / deliverable owner (when match exists)	`person`	`assigned_to`	Action item / Deliverable → Person

Every meeting node carries the raw Zoom UUID, topic, start and end times, duration, host email, join URL, and (when ingested) the cleaned transcript text plus a 2–4 sentence meeting summary. Every inferred-insight node carries the model's title, a 1–2 sentence grounded paraphrase, the claimed owner email, and a 0.0–1.0 confidence score.
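As a rough illustration of the insight-node shape described above, the sketch below parses an extraction payload into typed records. The six array names and `meeting_summary` come from this page; the per-item keys (`title`, `description`, `owner_email`, `confidence`) and the names `Insight` and `parse_extraction` are assumptions made for the example, not the connector's actual schema.

```python
import json
from dataclasses import dataclass
from typing import Optional

# Array names as listed on this page; per-item keys below are assumed.
INSIGHT_KINDS = ("action_items", "deliverables", "decisions",
                 "problems", "opportunities", "concepts")


@dataclass(frozen=True)
class Insight:
    kind: str
    title: str
    description: str
    owner_email: Optional[str]
    confidence: float


def parse_extraction(raw: str) -> tuple[str, list[Insight]]:
    """Return (meeting_summary, insights) from the model's JSON object."""
    payload = json.loads(raw)
    insights = []
    for kind in INSIGHT_KINDS:
        for item in payload.get(kind, []):
            title = str(item.get("title", "")).strip()
            if not title:
                continue  # an insight without a title cannot become a node
            # Clamp the score into the documented 0.0-1.0 range.
            confidence = min(1.0, max(0.0, float(item.get("confidence", 0.0))))
            insights.append(Insight(
                kind=kind,
                title=title,
                description=str(item.get("description", "")).strip(),
                owner_email=item.get("owner_email") or None,
                confidence=confidence,
            ))
    return str(payload.get("meeting_summary", "")), insights
```

Clamping the confidence and dropping untitled items are defensive choices for the sketch; the real pipeline may validate differently.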

How a single meeting flows through the graph

Consider a real sales call:

Acme weekly sync — Tuesday 2pm. Host: mac@oxagen.ai. Attendees: alice@oxagen.ai, bob@acme.com, carol@acme.com. Cloud recording on.

Transcript excerpt: "Bob: We're blocked on getting the SSO config approved by our security team — that's our biggest pain right now. Mac: I'll send over our SOC 2 report by Friday so you can fast-track approval. Alice: I'll draft the technical onboarding deck for the security review. Carol: We can probably commit to 100 seats if onboarding goes smoothly. Mac: We'll go with the Pro plan then. Bob: Sounds good."

After this meeting syncs, the workspace graph contains:

(meeting "zoom:abc==", topic="Acme weekly sync", start="...")
   ├─[:hosted_by]──> (person "mac@oxagen.ai")
   ├─[:had_participant]──> (person "alice@oxagen.ai")
   ├─[:had_participant]──> (person "bob@acme.com")
   ├─[:had_participant]──> (person "carol@acme.com")
   │
   ├─[:produced_action_item]──> (action_item "Send SOC 2 report to Acme by Friday")
   │                              └─[:assigned_to]──> (person "mac@oxagen.ai")
   ├─[:produced_action_item]──> (action_item "Draft technical onboarding deck")
   │                              └─[:assigned_to]──> (person "alice@oxagen.ai")
   ├─[:produced_deliverable]──> (deliverable "Technical onboarding deck")
   ├─[:produced_decision]──> (decision "Go with Pro plan for Acme")
   ├─[:surfaced_problem]──> (problem "SSO approval blocked at Acme security")
   ├─[:surfaced_opportunity]──> (opportunity "Acme commits to ~100 seats")
   ├─[:discussed_concept]──> (concept "SOC 2 compliance")
   ├─[:discussed_concept]──> (concept "SSO onboarding")
   └─[:discussed_concept]──> (concept "Pro plan pricing")

Every one of those nodes is queryable, embeddable, and edgeable. The alice@oxagen.ai node is the same node that Gmail, Calendar, Meet, and Contacts write to — so cross-source traversals "just work."

Real use cases

Sales: weekly customer-pain digest

"What problems did customers raise in the last 7 days, grouped by account?"

The agent runs:

MATCH (m:meeting)-[:surfaced_problem]->(p:problem)
WHERE m.start_time >= datetime() - duration({days: 7})
MATCH (m)-[:had_participant]->(person:person)
WHERE person.email ENDS WITH '@acme.com'
   OR person.email ENDS WITH '@globex.com'
RETURN person.email AS account_contact,
       p.title AS problem,
       p.description AS context,
       m.start_time AS heard_at
ORDER BY m.start_time DESC
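
The query above returns one row per contact and problem, so the "grouped by account" part of the question still has to happen somewhere. One possible way to do it client-side is sketched below, treating the email domain as the account. The row keys match the query's RETURN aliases; `group_by_account` is a hypothetical helper, and domain-as-account is an assumption.

```python
from collections import defaultdict


def group_by_account(rows):
    """Group problem titles by the email domain of the account contact."""
    grouped = defaultdict(list)
    for row in rows:
        domain = row["account_contact"].rsplit("@", 1)[-1].lower()
        # The same problem appears once per matching attendee; keep it once.
        if row["problem"] not in grouped[domain]:
            grouped[domain].append(row["problem"])
    return dict(grouped)
```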

The same problem node is then a natural edge target for competing_with, solved_by_feature, or builds edges written from other surfaces, for example Linear issues that name a feature, or GitHub PR descriptions that reference the same problem text.

Operations: who owes what, by Friday

"Show me every open action item assigned to me from last week's meetings."

MATCH (m:meeting)-[:produced_action_item]->(a:action_item)
      -[:assigned_to]->(p:person {email: 'mac@oxagen.ai'})
WHERE m.start_time >= datetime() - duration({days: 7})
RETURN a.title, a.description, m.topic, m.start_time
ORDER BY m.start_time DESC

Because assigned_to only fires when the LLM-extracted owner email matches an existing person node, this query never returns hallucinated owners. The connector deliberately does not mint a person node from a model-extracted email. Instead, it looks the owner up against the participant list (and the rest of your workspace's people corpus) and skips attribution when there is no match. The owner_email property on the action item preserves the model's raw claim for downstream review.
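
The lookup-only attribution rule can be sketched as follows. This is a minimal illustration, not the connector's code: `resolve_owner` and the `people_by_email` index are hypothetical, and case-insensitive matching is an assumption. The `zoom.owner_unattributed` log event name is the one this page mentions.

```python
import logging
from typing import Optional

logger = logging.getLogger("zoom")


def resolve_owner(owner_email: Optional[str],
                  people_by_email: dict[str, str]) -> Optional[str]:
    """Return the id of an existing person node, or None to skip the edge.

    Never creates a person: an unmatched email is only logged.
    """
    if not owner_email:
        return None
    node_id = people_by_email.get(owner_email.strip().lower())
    if node_id is None:
        logger.info("zoom.owner_unattributed owner_email=%s", owner_email)
    return node_id
```

A caller would emit the assigned_to edge only when the return value is not None, and store the raw `owner_email` on the insight node either way.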

Product: opportunities by theme

"Surface every expansion / partnership / new-use-case opportunity mentioned across all meetings this quarter, clustered by topic."

The opportunity nodes carry both their own embedding and a discussed_concept neighborhood from the same meeting, so the agent can group them with k-NN against the concept embeddings and write back a clustered_with edge as a derived view.
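
A simplified version of that grouping, assuming embeddings are plain lists of floats, assigns each opportunity to its single nearest concept by cosine similarity (k-NN with k = 1). The function names and input shapes are hypothetical; a production setup would more likely use a vector index than a brute-force scan.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 for a zero vector)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def cluster_by_nearest_concept(opportunities, concepts):
    """Map each concept title to the opportunity titles closest to it.

    Both arguments are dicts of title -> embedding vector.
    """
    if not concepts:
        return {}
    clusters = {}
    for opp_title, opp_vec in opportunities.items():
        nearest = max(concepts, key=lambda c: cosine(opp_vec, concepts[c]))
        clusters.setdefault(nearest, []).append(opp_title)
    return clusters
```

Each resulting group could then be written back as clustered_with edges between its members.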

Engineering: tech-debt and risk surface

"Every problem that came up in our last 30 days of engineering syncs, and who raised it."

The problem type is topic-agnostic: customer pain, market threats, blockers, broken systems, tech debt, and architectural risks all land under the same problem type. The system prompt nudges the model toward business, product, sales, and engineering use cases but explicitly leaves room for research, classroom, medical, legal, and personal contexts via the open-ended concept category.

Customer success: the "what happened in this account" timeline

MATCH (org:concept {title: 'Acme'})<-[:discussed_concept]-(m:meeting)
OPTIONAL MATCH (m)-[:produced_action_item]->(a:action_item)
OPTIONAL MATCH (m)-[:surfaced_problem]->(p:problem)
OPTIONAL MATCH (m)-[:produced_decision]->(d:decision)
RETURN m.start_time, m.topic, m.meeting_summary,
       collect(DISTINCT a.title) AS actions,
       collect(DISTINCT p.title) AS problems,
       collect(DISTINCT d.title) AS decisions
ORDER BY m.start_time DESC

One query, one timeline: only meetings where Acme was discussed, each with the action items, problems, and decisions it produced. That includes meetings where Acme came up but no one from Acme attended.

How invites and attendance are wired

Zoom's API model and the connector's response to it:

Zoom field	Where it comes from	Where it lands in the graph
`meeting.uuid`	`/users/me/meetings?type=past` (per-occurrence id — distinct from `meeting_id` which is shared across a recurring series)	`meeting.name = "zoom:{uuid}"`, `properties.zoom_uuid`
`meeting.host_email`	Same payload	`(person)-[:hosted]->(meeting)` and reverse
`meeting.start_time` / `end_time` / `duration`	Same payload	`meeting.properties.start_time`, `end_time`, `duration_minutes`
`meeting.topic`	Same payload	`meeting.properties.topic` plus a `display_name` decorated with the start timestamp so recurring meetings with identical topics render as distinguishable cards
Participants list	`/past_meetings/{uuid}/participants` (paginated)	`(meeting)-[:had_participant]->(person)` + reverse `(person)-[:attended]->(meeting)`, with `display_name` and `join_time` stamped on the edge

People-resolution uses the same helper Gmail / Calendar / Meet use (upsert_person_by_email), so a Zoom attendee whose email also appears in your inbox or calendar resolves to the same person node — no duplicates, no reconciliation step. Phone-only / anonymous callers (no email) fall back to display-name keying so the relationship still lands; they just won't dedupe across surfaces.
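
The keying behavior described above (email first, display name as a fallback) might look roughly like this. `person_key` is a hypothetical stand-in, not the real `upsert_person_by_email` helper, and the lowercasing and whitespace normalization are assumptions.

```python
from typing import Optional


def person_key(email: Optional[str], display_name: str) -> str:
    """Build the dedup key for a meeting participant.

    Participants with an email share a key across sources; phone-only or
    anonymous callers fall back to a normalized display name.
    """
    if email and email.strip():
        return "email:" + email.strip().lower()
    return "name:" + " ".join(display_name.split()).lower()
```

Under this scheme a participant seen as `Alice@Oxagen.ai` in Zoom and `alice@oxagen.ai` in a calendar invite resolves to one key, while a dial-in caller only matches another record with the same display name.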

How transcripts and insights are inferred

When the host had cloud recording enabled and Zoom finished processing the VTT transcript, the connector:

1. Fetches the transcript: calls GET /meetings/{uuid}/recordings, filters recording_files for file_type == TRANSCRIPT, and downloads the .vtt file with the bearer token in the Authorization header (never in the URL; see the security notes below).
2. Parses the VTT: strips the WEBVTT header, cue numbers, and timestamp lines, then merges consecutive cues from the same speaker so half-sentences don't shred the downstream LLM context. The result is a clean transcript of "Speaker: text" lines.
3. Persists the transcript: stores it on meeting.properties.transcript (capped at the max_transcript_chars setting, default 120,000 chars) along with a transcript_char_count property. The transcript download URL stops working once Zoom deletes the recording, so persisting the parsed text locally is what keeps your graph queryable beyond Zoom's retention window.
4. Runs an LLM extraction pass: uses a FAST-tier model (Haiku-class by default; Sonnet or Opus is overkill for structured paraphrase and classification). The system prompt instructs the model to be precise, terse, and grounded, and never to invent participants, projects, or commitments. The model returns one JSON object with six insight arrays (action_items, deliverables, decisions, problems, opportunities, concepts) plus a meeting_summary string.
5. Wires the insights: each insight becomes a typed node, found or created by a deterministic content slug (BLAKE2b hash of kind:title.lower()), so re-running the sync on the same transcript collapses to the same nodes instead of duplicating them. The corresponding produced_* / surfaced_* / discussed_* edge from the meeting fires last. For action items and deliverables, an assigned_to edge is emitted only when the model-extracted owner email matches an existing person node; hallucinated emails are logged as zoom.owner_unattributed and the edge is skipped.
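
The parsing step can be sketched as below. This is a minimal illustration that assumes every cue's text starts with a "Speaker: " prefix, as in the transcript excerpt earlier on this page; `parse_vtt` is a hypothetical name, and the real parser may handle more edge cases (cue settings, multi-line cues with colons in the text, numeric-only utterances).

```python
import re

# Assumed cue text shape: "Speaker Name: what they said".
SPEAKER = re.compile(r"^([^:]{1,60}):\s*(.*)$")


def parse_vtt(vtt: str) -> str:
    """Reduce a WebVTT transcript to merged "Speaker: text" lines."""
    turns = []  # each entry is [speaker, text]
    for raw_line in vtt.splitlines():
        line = raw_line.strip()
        if not line or line.isdigit() or "-->" in line:
            continue  # blank line, cue number, or timestamp line
        if not turns and line.startswith("WEBVTT"):
            continue  # file header
        match = SPEAKER.match(line)
        speaker = match.group(1).strip() if match else None
        text = match.group(2).strip() if match else line
        if turns and (speaker is None or speaker == turns[-1][0]):
            # Same speaker (or a continuation line): extend the previous turn.
            turns[-1][1] = (turns[-1][1] + " " + text).strip()
        else:
            turns.append([speaker or "Unknown", text])
    return "\n".join(f"{speaker}: {text}" for speaker, text in turns)
```

Merging consecutive cues keeps each speaker's sentence intact, which matters because Zoom splits long utterances across several timed cues.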

The system prompt deliberately leans toward business use cases, so sales calls produce problem and opportunity nodes more often than research calls do, but the open-ended concept category catches everything else. In testing, transcripts from research interviews, classroom debates, medical consultations, and architecture reviews all produced high-signal concept nodes; they just don't necessarily generate action items.

Settings

Key	Type	Default	Description
`backfill_days`	number	90	How far back to scan on first sync. Subsequent syncs are incremental from `last_synced_at`.
`ingest_transcripts`	boolean	true	Master switch for the transcript pipeline. When false, meetings + participants still ingest; transcripts are skipped entirely.
`infer_insights`	boolean	true	When false, the transcript is still stored on the meeting node as a property but the LLM extraction pass is skipped (saves credits — useful for archival-only setups).
`min_transcript_chars`	number	200	Transcripts shorter than this skip the LLM pass. Standups and 1:1s often fall below this — they still produce meeting + participant nodes, just no insight nodes.
`max_transcript_chars`	number	120,000	Transcripts longer than this are truncated before extraction to keep prompt cost bounded. The cap is in characters, not tokens, because that's what's predictable from the source.

Edit these per-connection in Settings → Connections → Zoom → Settings.
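
To make the interaction between these settings concrete, here is one way the gating could be expressed. The field names and defaults mirror the table above; `ZoomSettings` and `plan_transcript` are hypothetical, and the choice to still store a transcript that is too short for extraction is an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ZoomSettings:
    backfill_days: int = 90
    ingest_transcripts: bool = True
    infer_insights: bool = True
    min_transcript_chars: int = 200
    max_transcript_chars: int = 120_000


def plan_transcript(transcript: str,
                    settings: ZoomSettings) -> tuple[Optional[str], Optional[str]]:
    """Return (text_to_store, text_for_llm); either may be None."""
    if not settings.ingest_transcripts:
        return None, None  # meetings and participants only
    capped = transcript[: settings.max_transcript_chars]
    too_short = len(transcript) < settings.min_transcript_chars
    if not settings.infer_insights or too_short:
        return capped, None  # archive the text, skip the LLM pass
    return capped, capped
```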

OAuth scopes

The Zoom Marketplace app requests read-only scopes only:

Scope	What it's for
`meeting:read:list_past_instances`	List the authenticated host's past meeting occurrences
`meeting:read:past_meeting`	Pull individual meeting metadata
`meeting:read:list_past_participants`	Pull the participants list per occurrence
`cloud_recording:read:list_user_recordings`	Find the user's cloud recordings
`cloud_recording:read:recording`	Download the transcript file
`user:read:user`	Resolve the connecting user's email (used as the connection nickname)

No write scopes. No scheduling scopes. The connector cannot create, modify, or delete Zoom meetings.

Sync semantics

Incremental cutoff — last_synced_at is the lower bound on subsequent runs; the first sync uses the backfill_days window from the manifest.
Mid-pagination failure handling — if a list call fails part-way through (rate limit, transient 5xx, token revocation), the connection's last_synced_at is left unchanged so the next run retries from the same cutoff. No silent gaps.
Token rotation propagation — Zoom rotates the refresh token on every use. A 401 mid-sync triggers a refresh-and-retry; the rotated credentials propagate forward through participant paging into the transcript download so a single rotation doesn't drop the transcript.
Idempotency — meeting dedup uses zoom:{uuid} as the canonical name (UUID is per-occurrence, so recurring meetings get distinct nodes). Inferred-insight dedup uses a content slug, so re-running the sync on the same transcript converges instead of duplicating.
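
The two dedup keys from the idempotency rule can be sketched like this. The `zoom:{uuid}` name and the BLAKE2b hash over `kind:title.lower()` come from this page; the function names and the 16-byte digest size are assumptions.

```python
import hashlib


def meeting_name(zoom_uuid: str) -> str:
    """Canonical meeting name; the UUID is unique per occurrence."""
    return f"zoom:{zoom_uuid}"


def insight_slug(kind: str, title: str) -> str:
    """Deterministic content slug for an inferred insight.

    The same kind and title (ignoring case) always hash to the same slug,
    so a re-run finds the existing node instead of creating a new one.
    """
    key = f"{kind}:{title.lower()}"
    return hashlib.blake2b(key.encode("utf-8"), digest_size=16).hexdigest()
```

Because the slug includes the kind, an action item and a deliverable with identical titles stay separate nodes.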

Security notes

The bearer token is passed in the Authorization header on every Zoom API call and on the transcript download — never embedded in the URL query string. This avoids leaking the token into Nginx / Cloudflare / proxy access logs and Referer headers.
Credentials are encrypted at rest with AES-GCM, scoped to the workspace's encryption key. Disconnecting the connector revokes the grant at Zoom and tears down the local row; opting into Purge nodes also hard-deletes every node and edge ingested from that connection.
The connector never mints a person node from an LLM-extracted email. Owner attribution for action items and deliverables is a lookup against the existing People in the workspace; unmatched emails are preserved as a property on the insight node but do not create new entities.
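
As an illustration of the header-only token rule, a transcript download request could be built as follows using the standard library. `build_transcript_request` is a hypothetical helper and the URL in the usage note is a placeholder, not a real Zoom download URL.

```python
import urllib.request


def build_transcript_request(download_url: str,
                             access_token: str) -> urllib.request.Request:
    """Build a GET request that carries the token only in a header.

    The URL is passed through untouched, so the token never appears in a
    query string that proxies or access logs might record.
    """
    return urllib.request.Request(
        download_url,
        headers={"Authorization": f"Bearer {access_token}"},
    )
```

The request would then be sent with `urllib.request.urlopen(build_transcript_request(url, token))`, where `url` is the transcript file's download URL from the recordings response.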
