Oxagen Docs

Ingest any Google BigQuery query result into the workspace knowledge graph as typed nodes, with incremental delta loads and cross-domain edges to existing entities from other connectors.

BigQuery

Any SQL result → typed nodes + cross-domain edges into the workspace graph.

The BigQuery connector lets you point the workspace at any analytic table or query and ingest the result as typed nodes. It's the escape hatch for data that doesn't fit any of the purpose-built connectors — internal warehouses, dbt models, event tables, CRM mirrors — and the natural cross-domain bridge: the manifest's edge_mappings knob declares how each row links to entities that already exist in your graph from other connectors.

Authentication is via a tenant-supplied service-account JSON. Queries are dry-run first to forecast bytes billed and aborted when the estimate exceeds the configured cost cap, so a malformed query can't quietly drain your BigQuery budget.

What gets ingested

Each row of your query result becomes one node of the configured type (default bigquery.row, but you can choose any free-form type — convention is to prefix with bigquery.):

Source	Node type	Properties
Query row	`<node_type>` (your choice)	Every selected column, plus `bigquery_external_id` (from your `external_id_column`) for idempotent upserts

name_column controls the human-readable display name; rows fall back to the external id when the configured column is null.

Cross-domain edges

The edge_mappings setting declares how each ingested row links to existing nodes in the workspace — regardless of which connector originally created them:

{
  "edge_type": "paid_to",
  "column": "vendor_email",
  "target_node_type": "person",
  "match_property": "email",
  "match_strategy": "normalized_email"
}

For every row whose vendor_email column matches an existing person node's email property (via the chosen match_strategy), the connector writes a paid_to edge from the new BigQuery-row node to that person. match_strategy can be exact, normalized_email, or normalized_name.

Malformed mappings log and skip rather than aborting the run.

Real use cases

CRM mirror — ingest your crm.opportunities table as bigquery.opportunity nodes, with edge_mappings pointing at account_owner_email → person (so every opportunity is edged to the AE who owns it) and customer_domain → organization (so opportunities cluster by account).
Product event rollups — point at a dbt model that summarizes "weekly active workspaces per customer" and ingest each row as a bigquery.usage_snapshot node with edges to the corresponding organization. The agent can now answer "which customers are growing usage week-over-week?" as a graph traversal.
Vendor → person bridge — pair with Plaid: ingest a vendor CSV from BigQuery and stamp vendor_of edges between bigquery.vendor and existing financial.merchant nodes so every transaction inherits the vendor's organization context.

Settings

Key	Type	Default	Description
`project_id`	string	—	GCP project the query bills to. Must match the service account or grant it BigQuery Job User on this project.
`location`	string	`US`	BigQuery dataset location. Mismatches surface as `location` errors at job submit time.
`sql`	string	—	The query to ingest. Standard SQL only. The connector wraps it with the delta filter when one is configured.
`delta_column`	string	""	Column tracking "what we've already ingested." Next sync filters for rows strictly greater than the stored cursor.
`delta_mode`	list	`timestamp`	One of `timestamp` / `numeric` / `none`.
`node_type`	string	`bigquery.row`	Free-form type tag. Convention: prefix with `bigquery.` so traversals can isolate warehouse-sourced nodes.
`name_column`	string	""	Column whose value becomes the display name. Falls back to external id when blank.
`external_id_column`	string	`id`	Dedup key — upserts on `(workspace, node_type, properties.bigquery_external_id)`.
`edge_mappings`	string[]	`[]`	Cross-domain edge declarations (see above).
`max_rows_per_sync`	number	100,000	Hard cap on rows per run.
`cost_cap_bytes`	number	1 GiB	Per-sync ceiling on bytes billed; dry-run aborts the job if the estimate exceeds this.
`use_query_cache`	boolean	true	Pass `use_query_cache=True` to BigQuery. Cached results hide upstream changes — flip off when the source mutates within the cache TTL.

Authentication

Tenant-supplied service-account JSON. Paste it into the connection wizard; Oxagen encrypts it at rest. The connector requests only the bigquery.readonly scope.

Cost guardrails

Every sync dry-runs the query first via client.query(..., dry_run=True) and reads total_bytes_processed from the estimate. If the estimate exceeds cost_cap_bytes, the run aborts with a clean error before any real bytes are billed. The same cap is also passed as maximum_bytes_billed to the real query, so even a wildly inaccurate dry-run estimate can't blow past the cap.

Get started free · Connectors overview

BigQuery

BigQuery

What gets ingested

Cross-domain edges

Real use cases

Settings

Authentication

Cost guardrails

On this page