Kobai | May 21, 2026 1:53:05 AM | 9 min read

Building a Knowledge Graph on Databricks: Architecture Walkthrough with Kobai Saturn


A step-by-step walkthrough of how Kobai Saturn brings graph capability into the Databricks Lakehouse, from raw Delta tables through to governed graph queries, published SQL views, and downstream intelligence.

Knowledge graphs have a reputation for being operationally complex. Much of that reputation comes from architectures that require data to be exported to a standalone graph platform — introducing a separate system to provision, a separate governance model to maintain, and a synchronization pipeline between the two.

Kobai Saturn brings graph capability into the Databricks Lakehouse architecture. The graph representation is built within Databricks, as Delta tables within Unity Catalog. Graph traversals execute on Databricks compute. Saturn operates within the Databricks governance and compute model, rather than introducing a separate graph platform with a separate control plane.

This post walks through how that works — the architecture, the build process from raw data to deployed knowledge graph, and the outputs Saturn produces for analytics, AI, and downstream teams.

Saturn’s central principle: graph capability should be built inside the Databricks Lakehouse using Delta tables, Databricks compute, and Unity Catalog governance rather than requiring data to be exported to a separate graph platform.

Architecture: How Saturn fits within Databricks

Saturn is a graph layer that operates within the Databricks Data Intelligence Platform. Understanding the key components and their relationship to Databricks infrastructure clarifies what Saturn adds and what it inherits.

Component	What it is	How it relates to Databricks
Delta Lake tables	Source data in open Delta format, managed by Unity Catalog.	Source data remains governed in the Lakehouse. Saturn reads from these tables to build the graph representation. The source tables are not modified.
Saturn Graph Schema	The graph representation materialized as Delta tables within Unity Catalog — entity instances, relationship instances, and semantic metadata.	Created as a Unity Catalog schema. Governed by the same access controls, lineage tracking, and policies as all other Databricks assets.
Saturn Graph Engine	Translates graph traversals and semantic queries into optimized SQL and Spark. Executes on Databricks compute clusters.	Uses Databricks compute natively. Separate compute clusters for ingestion and query workloads. No external compute infrastructure required.
Unity Catalog integration	Centralized security, lineage, and governance. SSO passthrough ensures user identity is preserved to the Databricks backend.	Saturn inherits Unity Catalog governance. Access controls defined in Unity Catalog apply to all graph queries without additional configuration.
Published query views	SQL-accessible views generated from graph traversals, published as Unity Catalog objects.	Standard Unity Catalog views. Discoverable and consumable by any SQL-compatible tool in the Databricks ecosystem — Notebooks, BI tools, Genie, and APIs.
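
To make the Graph Engine row more concrete, here is a minimal sketch of what "translating a traversal into SQL" can look like. It is an illustration under stated assumptions, not Saturn's implementation: the `entity` and `relationship` tables, the `compile_traversal` helper, and the sample data are all hypothetical, and SQLite from the Python standard library stands in for Databricks SQL.

```python
import sqlite3


def compile_traversal(start_type, hops):
    """Compile a start entity type plus a path of relationship types into SQL.

    Hypothetical helper: every hop adds one JOIN on the relationship table
    and one JOIN on the entity that the relationship points to.
    """
    joins = []
    prev = "e0"
    for i, _ in enumerate(hops, start=1):
        joins.append(
            f"JOIN relationship r{i} ON r{i}.src_id = {prev}.id AND r{i}.rel_type = ? "
            f"JOIN entity e{i} ON e{i}.id = r{i}.dst_id"
        )
        prev = f"e{i}"
    sql = (
        f"SELECT e0.name, {prev}.name FROM entity e0 "
        + " ".join(joins)
        + " WHERE e0.entity_type = ? ORDER BY 1, 2"
    )
    # Placeholders appear in JOIN order first, then in the WHERE clause.
    return sql, list(hops) + [start_type]


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE entity (id INTEGER PRIMARY KEY, entity_type TEXT, name TEXT);
    CREATE TABLE relationship (src_id INTEGER, rel_type TEXT, dst_id INTEGER);
    """
)
conn.executemany(
    "INSERT INTO entity VALUES (?, ?, ?)",
    [
        (1, "Asset", "Turbine-7"),
        (2, "WorkOrder", "WO-100"),
        (3, "Engineer", "Ada"),
        (4, "Asset", "Turbine-9"),
    ],
)
conn.executemany(
    "INSERT INTO relationship VALUES (?, ?, ?)",
    [(1, "HAS_WORK_ORDER", 2), (2, "ASSIGNED_TO", 3)],
)

# Two-hop traversal: Asset -> WorkOrder -> Engineer.
sql, params = compile_traversal("Asset", ["HAS_WORK_ORDER", "ASSIGNED_TO"])
rows = conn.execute(sql, params).fetchall()
print(rows)
```

The point of the sketch is that a multi-hop traversal reduces to a chain of ordinary joins, which is why it can run on a SQL engine rather than a dedicated graph store.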

Ontology standards and interoperability

Saturn is designed to support interoperability with standards-based ontology approaches where required, including through integrations and import/export patterns. For organizations working with industry ontology frameworks — in financial services, energy, or aerospace — this allows existing semantic model investments to be carried forward rather than rebuilt.

The build process: From raw data to deployed knowledge graph

Saturn’s build process follows a structured workflow. The steps are sequential for the initial build and designed for incremental extension once the foundation is in place.

Step 1: Define the target questions and scope

Before any ontology is designed, the team identifies 3–5 business questions the knowledge graph needs to answer. These become the acceptance criteria. Starting with specific questions rather than a broad data model keeps the initial build focused and produces demonstrable value quickly.

Deliverable: a written scope with target questions, the entities those questions involve, and the source systems that hold relevant data.
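
One way to keep that deliverable honest is to hold it as structured data and check it mechanically. The sketch below is only an illustration: the `TargetQuestion` and `PilotScope` classes, the sample questions, and the table names are hypothetical and are not part of any Kobai tooling.

```python
from dataclasses import dataclass


@dataclass
class TargetQuestion:
    text: str
    entities: list[str]


@dataclass
class PilotScope:
    questions: list[TargetQuestion]
    entity_sources: dict[str, str]  # entity name -> source system or table

    def problems(self) -> list[str]:
        """Return human-readable gaps in the scope document."""
        issues = []
        if not 3 <= len(self.questions) <= 5:
            issues.append(f"expected 3-5 target questions, got {len(self.questions)}")
        for question in self.questions:
            for entity in question.entities:
                if entity not in self.entity_sources:
                    issues.append(f"no source for entity '{entity}' (needed by: {question.text})")
        return issues


scope = PilotScope(
    questions=[
        TargetQuestion("Which engineers are certified for this asset?", ["Engineer", "Asset"]),
        TargetQuestion("Which work orders are blocked on parts?", ["WorkOrder", "Part"]),
        TargetQuestion("Which assets have overdue work orders?", ["Asset", "WorkOrder"]),
    ],
    entity_sources={
        "Engineer": "hr.engineers",
        "Asset": "ops.assets",
        "WorkOrder": "ops.work_orders",
    },
)
issues = scope.problems()
print(issues)
```

Here the check flags that no source system has been named for `Part`, which is the kind of gap worth finding before ontology work starts.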

Step 2: Author the semantic model in Kobai Studio

Domain experts use Kobai Studio’s no-code visual environment to define entity types, relationship types, properties, and constraints. No specialist graph language is required. The resulting ontology is stored in the Saturn Graph Schema as a Unity Catalog object, versioned automatically, and governed by role-based access control.

Studio runs within the Databricks Workspace. Ontology changes take effect without a separate deployment cycle.
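
Studio is a visual tool, so no code is needed to author an ontology. Still, it can help to see the kind of structure being authored. The following is a rough sketch with hypothetical class names and a hypothetical `validate_ontology` check; it does not reflect how Saturn stores or validates a semantic model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EntityType:
    name: str
    properties: tuple[str, ...]
    required: tuple[str, ...] = ()


@dataclass(frozen=True)
class RelationshipType:
    name: str
    source: str  # name of the entity type the relationship starts from
    target: str  # name of the entity type the relationship points to


def validate_ontology(entity_types, relationship_types):
    """Check two simple constraints and return a list of error messages."""
    known = {entity.name for entity in entity_types}
    errors = []
    for entity in entity_types:
        for prop in entity.required:
            if prop not in entity.properties:
                errors.append(f"{entity.name}: required property '{prop}' is not declared")
    for rel in relationship_types:
        for end in (rel.source, rel.target):
            if end not in known:
                errors.append(f"{rel.name}: unknown entity type '{end}'")
    return errors


entity_types = [
    EntityType("Engineer", ("name", "site"), required=("name",)),
    EntityType("Certification", ("title", "valid_until"), required=("title",)),
]
relationship_types = [
    RelationshipType("HOLDS", "Engineer", "Certification"),
    RelationshipType("REQUIRES", "WorkOrder", "Certification"),  # WorkOrder not defined yet
]
errors = validate_ontology(entity_types, relationship_types)
print(errors)
```

The second relationship refers to an entity type that has not been defined, so the check reports it. Constraints of this kind are what a modeling environment enforces on the author's behalf.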

Step 3: Map source data to the semantic model using Precursor

Kobai Precursor is an AI-assisted data mapping tool that analyzes source Delta tables and recommends mappings to semantic model entities. Domain experts review and approve suggestions, with the option to override or extend. Precursor reduces the data engineering effort that graph builds typically require.

Mappings are stored as configuration in the Saturn Graph Schema. Source tables are not modified. The mapping is a logical declaration of how source columns correspond to semantic entities.
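
As an illustration of a mapping as a "logical declaration", the sketch below expresses one as plain configuration and applies it to source rows without changing them. The configuration keys, the `apply_mapping` function, and the table and column names are assumptions made for this example; Precursor's actual mapping format is not described here.

```python
import copy

# Hypothetical mapping: source column -> semantic property.
MAPPING = {
    "entity_type": "Engineer",
    "source_table": "hr.engineers",
    "key_column": "emp_id",
    "properties": {"full_name": "name", "home_site": "site"},
}


def apply_mapping(mapping, rows):
    """Project source rows onto the semantic entity. Source rows are only read."""
    instances = []
    for row in rows:
        instances.append(
            {
                "entity_type": mapping["entity_type"],
                "key": str(row[mapping["key_column"]]),
                "properties": {
                    prop: row[column] for column, prop in mapping["properties"].items()
                },
            }
        )
    return instances


source_rows = [
    {"emp_id": 17, "full_name": "Ada Reyes", "home_site": "North Ridge", "badge_color": "blue"},
]
snapshot = copy.deepcopy(source_rows)
instances = apply_mapping(MAPPING, source_rows)
print(instances)
```

Columns that are not mentioned in the mapping, such as `badge_color`, simply do not appear in the semantic layer, and the source rows are identical before and after.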

Step 4: Build the graph representation

Saturn reads from mapped source Delta tables and materializes the graph representation — entity instances, relationship instances, and semantic metadata — as Delta tables within the Saturn Graph Schema in Unity Catalog. Separate compute clusters handle ingestion and query workloads independently, so build jobs do not affect query performance.

The graph is materialized inside Databricks. Source data remains governed in the Lakehouse. Kobai does not require exporting data to a separate graph platform.
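
A simplified picture of materialization: source rows become entity instances and relationship instances. The sketch below uses deterministic identifiers so that rebuilding from the same source yields the same graph. That keying scheme is an assumption chosen for the example, not a description of how Saturn assigns identifiers, and the `instance_id` and `materialize` functions are hypothetical.

```python
import hashlib


def instance_id(entity_type, key):
    """Derive a stable identifier from the entity type and its business key."""
    return hashlib.sha256(f"{entity_type}|{key}".encode("utf-8")).hexdigest()[:16]


def materialize(work_orders):
    """Turn work-order rows into entity instances and relationship instances."""
    entities = {}
    relationships = set()
    for order in work_orders:
        order_id = instance_id("WorkOrder", order["wo_no"])
        asset_id = instance_id("Asset", order["asset_tag"])
        entities[order_id] = ("WorkOrder", order["wo_no"])
        entities[asset_id] = ("Asset", order["asset_tag"])  # same asset collapses to one instance
        relationships.add((asset_id, "HAS_WORK_ORDER", order_id))
    return entities, relationships


work_orders = [
    {"wo_no": "WO-1", "asset_tag": "T-7"},
    {"wo_no": "WO-2", "asset_tag": "T-7"},
    {"wo_no": "WO-3", "asset_tag": "T-9"},
]
entities, relationships = materialize(work_orders)
print(len(entities), len(relationships))
```

Two work orders reference the same asset, so the asset appears once as an entity instance while each work order keeps its own relationship to it.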

Step 5: Author queries and publish SQL views

With the graph built, authorized users author graph queries in Kobai Studio using a visual interface. The underlying execution is SQL/Spark on Databricks compute. Completed queries are published as SQL-accessible views derived from the semantic model — consumable from Databricks Notebooks, BI tools, Genie, and APIs without requiring knowledge of graph traversal.

Saturn publishes four categories of output: (1) Query views — traversal results as SQL views; (2) Ontology views — entity and relationship definitions as queryable metadata; (3) Semantic vectors — embeddings from graph-aligned data for vector search and RAG; (4) Graph frames — Spark DataFrames accessible from Notebooks for ML workflows.
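
The idea behind a query view can be shown with any SQL engine. In this sketch, SQLite from the Python standard library stands in for Unity Catalog, and the table names, view name, and data are hypothetical. The joins that express the traversal are defined once in the view; the consumer only sees a flat table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE engineer (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE certification (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE holds (engineer_id INTEGER, certification_id INTEGER);

    INSERT INTO engineer VALUES (1, 'Ada'), (2, 'Bo');
    INSERT INTO certification VALUES (10, 'HV-Safety'), (11, 'Rope-Access');
    INSERT INTO holds VALUES (1, 10), (1, 11), (2, 11);

    -- The traversal Engineer -> HOLDS -> Certification, published as a view.
    CREATE VIEW qv_engineer_certifications AS
    SELECT e.name AS engineer, c.title AS certification
    FROM engineer e
    JOIN holds h ON h.engineer_id = e.id
    JOIN certification c ON c.id = h.certification_id;
    """
)

# A consumer queries the view like any other table.
rows = conn.execute(
    "SELECT engineer FROM qv_engineer_certifications "
    "WHERE certification = ? ORDER BY engineer",
    ("Rope-Access",),
).fetchall()
print(rows)
```

A BI tool or notebook issuing the final `SELECT` needs no knowledge of how engineers and certifications are connected in the graph.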

Step 6: Deploy to the intelligence layer

Published views are connected to downstream consumers: Kobai Tower for visual graph exploration, Kobai Episteme for AI-assisted Q&A with graphical lineage, Databricks Genie spaces via the Kobai SDK, and Databricks Notebooks for ML workflows. All connections inherit Unity Catalog governance and SSO passthrough.

All intelligence layer access routes through Databricks compute and Unity Catalog enforcement. There is no elevation of privilege at the semantic layer.

What Saturn produces

Saturn’s query publishing layer produces four output types that integrate with existing Databricks tooling without requiring new tools or data access patterns.

Output type	What it is	Who consumes it
Query views	SQL-accessible views generated from graph traversals and published as Unity Catalog objects. A query view for “certified engineer availability” returns a table assembled through graph traversal but queryable as a standard SQL view.	Data analysts via Databricks SQL or BI tools; Genie spaces via the Kobai SDK; dashboards and reporting pipelines. Any SQL-compatible tool consumes a query view without requiring knowledge of the underlying graph traversal.
Ontology views	SQL-accessible views representing the semantic model itself — entity type definitions, relationship type definitions, and property schemas. These make the ontology queryable and auditable.	Governance and data management teams; the Episteme AI module for question-to-query matching; automated documentation tools.
Semantic vectors	Vector embeddings generated from graph-aligned data, reflecting semantic meaning rather than raw text similarity. Generated by Saturn from the materialized graph representation.	Databricks Vector Search for RAG-based AI; Episteme for GraphAI question matching; data science teams building hybrid retrieval workflows.
Graph frames	Spark DataFrames containing the graph representation — entity instances, relationship instances, and semantic metadata — accessible directly from Databricks Notebooks via supported Kobai APIs.	Data scientists and ML engineers working in Databricks Notebooks. Graph frames can be used as training data, feature sources, or inputs to network analysis. Accessed through the Kobai Python SDK.
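
As a small illustration of the network analysis mentioned for graph frames, the sketch below runs a breadth-first search over an edge list. It deliberately avoids any Kobai SDK call, because those signatures are not shown in this post. The edge list is hypothetical sample data; in a Notebook, the rows would come from a graph frame instead.

```python
from collections import defaultdict, deque


def hops_from(edges, start):
    """Return the number of hops from `start` to every reachable node."""
    adjacency = defaultdict(list)
    for src, dst in edges:
        adjacency[src].append(dst)
        adjacency[dst].append(src)  # treat relationships as undirected for reachability
    distance = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in adjacency[node]:
            if neighbor not in distance:
                distance[neighbor] = distance[node] + 1
                queue.append(neighbor)
    return distance


edges = [
    ("Turbine-7", "WO-100"),
    ("WO-100", "Ada"),
    ("Ada", "HV-Safety"),
    ("Turbine-9", "WO-101"),
]
distances = hops_from(edges, "Turbine-7")
print(distances)
```

Hop counts like these can serve as simple features, for example how far an asset is from an engineer holding a given certification.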

Compute model and developer access

Separated ingestion and query compute

Saturn maintains separate compute clusters for graph ingestion (building and updating the graph representation) and graph queries (traversals at runtime). A large ingestion job — onboarding a new data source or refreshing entity instances — does not affect the performance of live queries. Each workload type can be scaled independently. Multiple use cases running against the same graph can have isolated compute allocations.

Developer access through standard Databricks tooling

Developers access Saturn outputs through standard Databricks tooling. Published query views and ontology views are accessible from Databricks SQL, Notebooks, and any SQL-compatible BI tool as Unity Catalog objects. Graph frames are accessible from Notebooks via supported Kobai Python SDK calls. Genie spaces connect to Saturn’s semantic context through the Kobai SDK. All access routes through Databricks compute and Unity Catalog governance.

For teams building AI workflows, semantic vectors published by Saturn are compatible with Databricks Vector Search, enabling graph-grounded RAG without exporting data from the Lakehouse. For agent-driven workflows, the Kobai SDK exposes governed graph traversal to autonomous AI agents operating within the Databricks environment.
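
For readers new to vector retrieval, the core operation is nearest-neighbor search over embeddings. The sketch below shows it with cosine similarity in plain Python. The three-dimensional vectors and labels are made up for illustration, and the code does not call Databricks Vector Search or the Kobai SDK; a managed vector index performs the same kind of ranking at scale.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query, index, k=2):
    """Return the k labels whose vectors are most similar to the query."""
    ranked = sorted(index.items(), key=lambda item: cosine(query, item[1]), reverse=True)
    return [label for label, _ in ranked[:k]]


# Toy embeddings standing in for semantic vectors.
index = {
    "Turbine-7 gearbox work order": [0.9, 0.1, 0.0],
    "Ada HV-Safety certification": [0.1, 0.9, 0.1],
    "Blade inventory record": [0.0, 0.2, 0.9],
}
matches = top_k([0.8, 0.3, 0.0], index, k=2)
print(matches)
```

In a RAG workflow, the returned items would be passed to the model as context, so the quality of the embeddings determines what the model gets to see.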

Standalone graph platforms vs. Saturn within Databricks

The architectural distinction between deploying a standalone graph platform alongside the Lakehouse and using Saturn within it has practical operational consequences.

Consideration	Standalone graph platform	Saturn within Databricks
Data location	Data typically exported to or replicated into the graph platform’s own store.	Graph representation materialized inside Databricks as Delta tables. Source data remains governed in the Lakehouse.
Governance model	Separate access control configuration required. Must be kept in sync with the source platform.	Unity Catalog governance inherited. SSO passthrough preserves user identity. Access policies apply consistently.
Compute	Typically managed separately from the Lakehouse compute environment.	Saturn runs on Databricks compute clusters. Ingestion and query workloads managed within the Databricks environment.
Query interface	Often requires a graph-specific query language (e.g. Cypher, SPARQL).	Traversal results published as standard SQL-accessible views. Consumable from any SQL-compatible tool.
ML/AI integration	Typically requires data movement or API calls across system boundaries.	Graph frames and semantic vectors available within the Databricks environment through supported Kobai APIs.

A practical example: Phased pilot on an existing Databricks Lakehouse

To make the walkthrough concrete: a wind energy operator builds a maintenance knowledge graph on an existing Databricks Lakehouse. Starting state: Delta tables containing asset records, maintenance work orders, engineer certifications, inventory records, and operational schedules — governed by Unity Catalog, but with no shared semantic model connecting them.

Phase	Steps	What happens	Output
Pilot, Week 1	Scope + ontology	Reliability engineer and planner identify 3 target questions. Ontology authored in Studio: asset, engineer, certification, part, work order, schedule entities and relationships.	Ontology stored in Unity Catalog. Target questions defined as acceptance criteria.
Pilot, Weeks 1–2	Mapping + ingestion	Precursor analyzes source Delta tables and recommends entity-to-column mappings. Reviewed and approved. Saturn builds the graph representation from mapped source tables.	Graph schema populated. Entity and relationship instances materialized as Delta tables within Unity Catalog.
Pilot, Weeks 2–4	Queries + deploy	Query views authored in Studio for each target question. Episteme connected for AI Q&A. Genie space created via Kobai SDK. Tower deployed for graph exploration.	Query views published. Intelligence layer deployed. All consumer types connected with Unity Catalog governance validated end-to-end.

An initial pilot that starts delivering value in 2–4 weeks is achievable for organizations with clean Delta tables and engaged domain experts. The phased approach — scope and ontology first, then mapping and ingestion, then query and deployment — produces a working intelligence layer before the full graph scope is defined.

Graph capability inside the Lakehouse you already run

Saturn is designed for teams that have invested in the Databricks Lakehouse and want to extend it with knowledge graph and semantic AI capabilities. The graph representation is built and governed within Databricks. Source data remains in the Lakehouse. The governance model already in place extends to every semantic query.

If your team is exploring knowledge graph capabilities on Databricks, the Genie Spaces Accelerator Kit and the Semantic Graph Pilot provide structured ways to get started through the Databricks Marketplace. Both are supported by Kobai engineers who work alongside your team through the build and deployment process.

To explore Kobai Saturn in your Databricks environment, visit kobai.io or reach us at contact@kobai.io.
