Oxagen Docs

BigQuery

Ingest any Google BigQuery query result into the workspace knowledge graph as typed nodes, with incremental delta loads and cross-domain edges to existing entities from other connectors.

Any SQL result → typed nodes + cross-domain edges into the workspace graph.

The BigQuery connector lets you point the workspace at any analytic table or query and ingest the result as typed nodes. It's the escape hatch for data that doesn't fit any of the purpose-built connectors — internal warehouses, dbt models, event tables, CRM mirrors — and the natural cross-domain bridge: the manifest's edge_mappings knob declares how each row links to entities that already exist in your graph from other connectors.

Authentication is via a tenant-supplied service-account JSON. Queries are dry-run first to forecast bytes billed and aborted when the estimate exceeds the configured cost cap, so a malformed query can't quietly drain your BigQuery budget.

What gets ingested

Each row of your query result becomes one node of the configured type (default bigquery.row, but you can choose any free-form type — convention is to prefix with bigquery.):

| Source | Node type | Properties |
| --- | --- | --- |
| Query row | <node_type> (your choice) | Every selected column, plus bigquery_external_id (from your external_id_column) for idempotent upserts |

name_column controls the human-readable display name; rows fall back to the external id when the configured column is null.
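The row-to-node mapping described above can be sketched in a few lines of Python. This is an illustration only, not the connector's actual code: the function name `row_to_node`, the shape of the returned dict, and the choice to reject rows with no external id are all assumptions.

```python
from typing import Any, Mapping


def row_to_node(
    row: Mapping[str, Any],
    node_type: str = "bigquery.row",
    external_id_column: str = "id",
    name_column: str = "",
) -> dict:
    """Turn one query-result row into a node payload (hypothetical shape)."""
    external_id = row.get(external_id_column)
    if external_id is None:
        # Assumption: a row without a dedup key cannot be upserted idempotently.
        raise ValueError(f"row has no value in {external_id_column!r}")
    external_id = str(external_id)

    # Every selected column becomes a property, plus the dedup key.
    properties = dict(row)
    properties["bigquery_external_id"] = external_id

    # Display name: the configured column when it has a value, else the external id.
    name = row.get(name_column) if name_column else None
    display_name = str(name) if name is not None else external_id

    return {"type": node_type, "name": display_name, "properties": properties}
```

Because the external id is copied into the properties, re-running the same query produces the same (node type, external id) pair for each row, which is what makes the upsert idempotent.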

Cross-domain edges

The edge_mappings setting declares how each ingested row links to existing nodes in the workspace — regardless of which connector originally created them:

{
  "edge_type": "paid_to",
  "column": "vendor_email",
  "target_node_type": "person",
  "match_property": "email",
  "match_strategy": "normalized_email"
}

For every row whose vendor_email column matches an existing person node's email property (via the chosen match_strategy), the connector writes a paid_to edge from the new BigQuery-row node to that person. match_strategy can be exact, normalized_email, or normalized_name.

Malformed mappings log and skip rather than aborting the run.
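To make the matching behaviour concrete, here is a minimal Python sketch of how a mapping might be resolved against existing nodes. It is not the connector's implementation. The normalization rules (trim and lowercase for emails, collapse whitespace and lowercase for names), the in-memory node list, and the names `resolve_edges` and `STRATEGIES` are assumptions made for illustration.

```python
import logging
import re
from typing import Any, Callable, Dict, Iterable, List, Mapping, Tuple

log = logging.getLogger(__name__)

# Assumed normalization rules; the real connector may normalize differently.
STRATEGIES: Dict[str, Callable[[Any], str]] = {
    "exact": lambda v: str(v),
    "normalized_email": lambda v: str(v).strip().lower(),
    "normalized_name": lambda v: re.sub(r"\s+", " ", str(v)).strip().lower(),
}
REQUIRED_KEYS = ("edge_type", "column", "target_node_type", "match_property", "match_strategy")


def resolve_edges(
    row_node_id: str,
    row: Mapping[str, Any],
    mappings: Iterable[Any],
    existing_nodes: Iterable[Mapping[str, Any]],
) -> List[Tuple[str, str, str]]:
    """Return (source_id, edge_type, target_id) triples for one ingested row."""
    nodes = list(existing_nodes)
    edges: List[Tuple[str, str, str]] = []
    for mapping in mappings:
        # Malformed mappings are logged and skipped, never fatal.
        if (
            not isinstance(mapping, Mapping)
            or any(key not in mapping for key in REQUIRED_KEYS)
            or mapping["match_strategy"] not in STRATEGIES
        ):
            log.warning("skipping malformed edge mapping: %r", mapping)
            continue
        normalize = STRATEGIES[mapping["match_strategy"]]
        value = row.get(mapping["column"])
        if value is None:
            continue  # nothing to match on for this row
        wanted = normalize(value)
        for node in nodes:
            if node["type"] != mapping["target_node_type"]:
                continue
            candidate = node["properties"].get(mapping["match_property"])
            if candidate is not None and normalize(candidate) == wanted:
                edges.append((row_node_id, mapping["edge_type"], node["id"]))
    return edges
```

Under these assumed rules, a row whose vendor_email is " ada@example.com" links to a person whose email is stored as "Ada@Example.com " with normalized_email, but not with exact.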

Real use cases

  • CRM mirror: ingest your crm.opportunities table as bigquery.opportunity nodes, with edge_mappings pointing account_owner_email → person (so every opportunity is edged to the AE who owns it) and customer_domain → organization (so opportunities cluster by account).
  • Product event rollups: point at a dbt model that summarizes "weekly active workspaces per customer" and ingest each row as a bigquery.usage_snapshot node with edges to the corresponding organization. The agent can now answer "which customers are growing usage week-over-week?" as a graph traversal.
  • Vendor → person bridge: pair with Plaid by ingesting a vendor CSV from BigQuery and stamping vendor_of edges between bigquery.vendor and existing financial.merchant nodes, so every transaction inherits the vendor's organization context.

Settings

| Key | Type | Default | Description |
| --- | --- | --- | --- |
| project_id | string | | GCP project the query bills to. Must be the service account's own project, or one on which the service account has been granted BigQuery Job User. |
| location | string | US | BigQuery dataset location. Mismatches surface as location errors at job submit time. |
| sql | string | | The query to ingest. Standard SQL only. The connector wraps it with the delta filter when one is configured. |
| delta_column | string | "" | Column tracking "what we've already ingested." The next sync filters for rows strictly greater than the stored cursor. |
| delta_mode | list | timestamp | One of timestamp / numeric / none. |
| node_type | string | bigquery.row | Free-form type tag. Convention: prefix with bigquery. so traversals can isolate warehouse-sourced nodes. |
| name_column | string | "" | Column whose value becomes the display name. Falls back to the external id when blank. |
| external_id_column | string | id | Dedup key. Upserts on (workspace, node_type, properties.bigquery_external_id). |
| edge_mappings | string[] | [] | Cross-domain edge declarations (see above). |
| max_rows_per_sync | number | 100,000 | Hard cap on rows per run. |
| cost_cap_bytes | number | 1 GiB | Per-sync ceiling on bytes billed; the dry run aborts the job if the estimate exceeds this. |
| use_query_cache | boolean | true | Passed to BigQuery as use_query_cache. Cached results hide upstream changes, so turn it off when the source mutates within the cache TTL. |
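The interaction between sql, delta_column, and delta_mode can be pictured with a small Python sketch. The exact SQL the connector generates is not documented here, so the subquery wrapping, the `src` alias, the `@cursor` query parameter, and the function name `wrap_with_delta` are all assumptions. The sketch also assumes that a first sync with no stored cursor loads everything.

```python
import re
from typing import Any, Dict, Optional, Tuple

_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def wrap_with_delta(
    sql: str,
    delta_column: str = "",
    delta_mode: str = "timestamp",
    cursor: Optional[Any] = None,
) -> Tuple[str, Dict[str, Any]]:
    """Return (query, params), adding a strict '>' filter when a delta applies."""
    inner = sql.strip().rstrip(";")
    if delta_mode not in ("timestamp", "numeric", "none"):
        raise ValueError(f"unknown delta_mode: {delta_mode!r}")

    # No delta configured, delta disabled, or first sync: run the query as-is.
    if not delta_column or delta_mode == "none" or cursor is None:
        return inner, {}

    # The column name is interpolated into SQL, so only allow plain identifiers.
    if not _IDENTIFIER.fullmatch(delta_column):
        raise ValueError(f"invalid delta_column: {delta_column!r}")

    # Strictly greater than the stored cursor, passed as a named query parameter.
    query = f"SELECT * FROM (\n{inner}\n) AS src\nWHERE src.{delta_column} > @cursor"
    return query, {"cursor": cursor}
```

Wrapping the user's query as a subquery keeps the filter independent of whatever WHERE, GROUP BY, or ORDER BY clauses the query already contains. Note that the delta column must appear in the query's select list for the outer filter to resolve.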

Authentication

Tenant-supplied service-account JSON. Paste it into the connection wizard; Oxagen encrypts it at rest. The connector requests only the bigquery.readonly scope.

Cost guardrails

Every sync first dry-runs the query (client.query(...) with dry_run=True set in the job config) and reads total_bytes_processed from the resulting job. If the estimate exceeds cost_cap_bytes, the run aborts with a clean error before any real bytes are billed. The same cap is also passed as maximum_bytes_billed to the real query, so even a wildly inaccurate dry-run estimate can't blow past the cap.
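The two-step guard can be sketched as follows. This is an illustrative pattern rather than the connector's source: `run_with_cost_cap` and `CostCapExceeded` are hypothetical names, and the client and job-config class are passed in as arguments so the sketch does not depend on any particular library being installed. With the google-cloud-bigquery Python package, `client` would be a `bigquery.Client` and `make_config` would be `bigquery.QueryJobConfig`, whose dry_run, use_query_cache, and maximum_bytes_billed options are set on the job config rather than on the query call itself.

```python
from typing import Any, Callable


class CostCapExceeded(RuntimeError):
    """Raised when the dry-run estimate is above the configured cap."""


def run_with_cost_cap(
    client: Any,
    make_config: Callable[..., Any],
    sql: str,
    cost_cap_bytes: int,
    use_query_cache: bool = True,
) -> Any:
    """Dry-run first, then run for real with the same cap enforced by BigQuery."""
    # Step 1: dry run. Nothing is billed; the job only carries an estimate.
    # The cache is disabled here so the estimate reflects a real scan.
    dry_job = client.query(
        sql, job_config=make_config(dry_run=True, use_query_cache=False)
    )
    estimate = dry_job.total_bytes_processed or 0
    if estimate > cost_cap_bytes:
        raise CostCapExceeded(
            f"estimated {estimate} bytes exceeds cap of {cost_cap_bytes} bytes"
        )

    # Step 2: real run. maximum_bytes_billed is a second line of defence in
    # case the estimate was wrong: BigQuery fails the job instead of billing
    # past the limit.
    job = client.query(
        sql,
        job_config=make_config(
            maximum_bytes_billed=cost_cap_bytes,
            use_query_cache=use_query_cache,
        ),
    )
    return job.result()
```

A call with the real library might look like `run_with_cost_cap(bigquery.Client(project="my-project"), bigquery.QueryJobConfig, sql, 1 << 30)`, where the project name is a placeholder and `1 << 30` is the 1 GiB default from the settings table.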

