Agentic AI Needs a Semantic Foundation. Here’s Why Retrieval and Reasoning Are Different Problems.
Vector search is a powerful capability for retrieval. As AI agents move from answering questions to executing workflows, organizations are finding that retrieval and reasoning are different problems and that both need to be well-served for agentic AI to scale reliably. This post explains what each layer contributes and why Databricks + Kobai addresses both.
Vector search has become the default grounding mechanism for enterprise AI. Connect a model to a vector store, embed your documents, retrieve the most similar chunks at query time, and the model has access to private data. It is a genuine improvement over ungrounded generation, and for many use cases it is the right starting point.
Agentic AI, however, operates differently. An agent does not just answer a question and hand control back to the user. It takes a sequence of actions: gathers context, makes decisions, calls tools, triggers workflows, and produces outputs that drive real-world consequences. At each step, the quality of the action depends on how well the agent understands not just what the data says, but what it means — how the entities involved relate to each other, what rules apply, and what constraints govern the decision.
Retrieval and reasoning are different problems. Vector search addresses the first very well. Shared semantic context addresses the second. For agents operating across multi-step workflows, both matter because the quality of each action depends not just on finding the right information, but on understanding what it means and how the entities involved relate to each other.
Vectors find information. Semantics make it usable. Enterprise agentic AI benefits from both: Databricks Vector Search for retrieval and discovery, and Kobai’s semantic intelligence for reasoning, context, and governance.
Retrieval and reasoning: where vector search excels, and where semantic context helps
Vector search works by encoding data as numerical embeddings that capture semantic similarity. When a query arrives, it is encoded in the same space and the closest matching records are returned. It is fast, scalable, and effective at surface-level semantic matching — finding documents, passages, and records that are topically related to a question.
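The mechanism described above can be illustrated with a minimal sketch. It assumes tiny hand-written three-dimensional embeddings and a brute-force cosine-similarity scan; the record names and vectors are hypothetical, and a production vector index would use model-generated embeddings and an approximate nearest-neighbour structure instead.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query_vec, index, k=2):
    """Return the ids of the k records whose embeddings are closest to the query."""
    scored = [(cosine_similarity(query_vec, vec), rec_id)
              for rec_id, vec in index.items()]
    scored.sort(reverse=True)  # highest similarity first
    return [rec_id for _, rec_id in scored[:k]]

# Hypothetical embeddings, for illustration only.
index = {
    "gearbox_manual": [0.9, 0.1, 0.0],
    "turbine_inspection": [0.7, 0.3, 0.1],
    "hr_policy": [0.0, 0.2, 0.9],
}
query = [0.8, 0.2, 0.0]  # the query, encoded in the same space

nearest = top_k(query, index, k=2)
```

The ranking reflects only geometric closeness. Nothing in the score says whether a returned record is authoritative, current, or about the right entity.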
For agentic AI operating across complex, multi-step workflows, there are three areas where semantic context can significantly improve on retrieval alone:
1. Retrieval by similarity, not by meaning
Vector search returns records that look semantically similar to the query. It does not determine whether those records are from the authoritative source, whether the retrieved information is current, or whether the entity referred to in the query is the same entity as the one in the retrieved record. Two engineers with similar job titles may produce similar embeddings even if they are qualified for entirely different asset classes. An agent relying on vector retrieval may act on the wrong entity without knowing it.
2. Entity relationship traversal
The most consequential agentic decisions involve multiple connected entities. Which engineers are certified for this asset class, available on this date, and within travel range of this site? Which counterparties carry exposure above the regulatory threshold under this contract type? These questions require traversing declared relationships between entities, not finding documents that mention the relevant terms. Vector search alone can struggle with this class of question, which is where declared entity relationships in a semantic model provide more reliable, governed answers.
3. Context consistency across agent steps
In a multi-step agent workflow, each retrieval call produces its own context window. Without a shared semantic model governing what entities mean and how they relate, the context assembled at step two may be inconsistent with the context assembled at step five. The agent may resolve “customer” differently across steps, or encounter a relationship at one step that conflicts with a constraint retrieved at another. Many organizations find that shared semantic context can significantly improve agent consistency across complex workflows.
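The relationship-traversal question from item 2 above (certified for the asset class, available on the date, within range of the site) can be sketched as a filter over declared relationships. This is an illustrative in-memory model, not a semantic-layer API; every name, date, and distance below is hypothetical.

```python
from dataclasses import dataclass, field
from datetime import date

@dataclass
class Engineer:
    name: str
    certified_classes: set                              # declared: engineer -> asset classes
    unavailable_on: set = field(default_factory=set)    # declared: dates already assigned
    distance_km: float = 0.0                            # distance to the site in question

# Declared relationship: asset -> asset class (hypothetical data).
ASSET_CLASS_OF = {"G-07": "gearbox"}

ENGINEERS = [
    Engineer("A. Rivera", {"gearbox"}, set(), 40.0),
    Engineer("B. Chen", {"gearbox"}, {date(2025, 3, 10)}, 15.0),
    Engineer("C. Okafor", {"generator"}, set(), 10.0),
    Engineer("D. Novak", {"gearbox"}, set(), 300.0),
]

def qualified_engineers(asset_id, on_date, max_km):
    """Follow asset -> asset class -> certified engineers, then apply constraints."""
    asset_class = ASSET_CLASS_OF[asset_id]
    return [
        e.name
        for e in ENGINEERS
        if asset_class in e.certified_classes
        and on_date not in e.unavailable_on
        and e.distance_km <= max_km
    ]

result = qualified_engineers("G-07", date(2025, 3, 10), max_km=100.0)
```

Each exclusion has an explicit reason (wrong asset class, unavailable, out of range), which is harder to guarantee when the answer is inferred from retrieved text.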
The compounding error problem
In single-turn Q&A, a retrieval error produces a wrong answer. In agentic workflows, a retrieval error at step N produces a wrong action at step N, which propagates as corrupted context to step N+1, which takes a wrong action based on corrupted context, which propagates further. The accuracy requirements for agentic AI are substantially higher than for Q&A AI precisely because errors compound rather than terminate.
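The compounding effect can be put in rough numbers. The sketch below assumes each step succeeds independently with the same probability and that any single error spoils the whole workflow; both are simplifying assumptions, and the 95% figure is an illustrative value rather than a measurement.

```python
def workflow_success_probability(per_step_accuracy, steps):
    """Probability that every step succeeds, assuming independent steps."""
    return per_step_accuracy ** steps

# Illustrative per-step accuracy of 95%.
p5 = workflow_success_probability(0.95, 5)    # roughly 0.77
p10 = workflow_success_probability(0.95, 10)  # roughly 0.60
```

Under these assumptions, a step that is right 95% of the time yields a five-step workflow that is fully correct only about 77% of the time, and a ten-step workflow about 60% of the time.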
What shared business context adds for agentic AI
Shared business context (a formal definition of enterprise entities, their relationships, and the rules that govern them) addresses the limitations above at the architectural level rather than the retrieval level. It does not replace vector search; it complements it.
The practical framing from the Databricks + Kobai architecture is straightforward: vectors find information; semantics make it usable. Databricks Vector Search handles similarity retrieval and discovery. Kobai’s semantic layer adds the entity context, relationship traversal, and governed reasoning that agents need to act reliably on what they find.
Entities and relationships are declared, not retrieved
In a semantic model, the fact that Engineer J. Santos is certified for Turbine Model GE-7F is not a passage to be retrieved — it is a declared relationship that the agent can traverse directly. When the agent asks “who can service this asset?” it follows a declared relationship chain rather than relying on a document being embedded close enough to the query to surface. The result is typically more reliable and more traceable than similarity retrieval alone.
For agents that take real-world actions, this distinction matters. An agent that dispatches a maintenance engineer based on a retrieved document mentioning certification is more likely to act on incomplete or ambiguous information than one that traverses a declared certification relationship validated against the ontology.
Business rules constrain agent behaviour
Enterprise operations are governed by rules that exist above the data level. A contract may not be modified without approval from a specific role. An asset may not be taken offline during a regulatory inspection window. An exposure may not exceed a defined threshold without triggering a review. These rules are not typically embedded in documents that vector search can retrieve. They belong in the semantic model, where they constrain agent behaviour at decision time rather than being surfaced as context the agent may or may not retrieve.
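One way to picture rules that constrain behaviour at decision time is a check that runs on every proposed action before it is executed. The sketch below is a hypothetical illustration: the rule tables, role name, and dates are invented, and it does not represent how any particular semantic layer stores or evaluates rules.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

@dataclass(frozen=True)
class ProposedAction:
    kind: str                            # e.g. "take_offline", "modify_contract"
    target_id: str
    on_date: date
    approver_role: Optional[str] = None

# Hypothetical declared facts and rules.
INSPECTION_WINDOWS = {"G-07": (date(2025, 3, 1), date(2025, 3, 15))}
REQUIRED_APPROVER = {"modify_contract": "contract_manager"}

def violations(action):
    """Return the declared rules that the proposed action would break."""
    found = []
    window = INSPECTION_WINDOWS.get(action.target_id)
    if (action.kind == "take_offline" and window
            and window[0] <= action.on_date <= window[1]):
        found.append("asset is inside a regulatory inspection window")
    required = REQUIRED_APPROVER.get(action.kind)
    if required and action.approver_role != required:
        found.append(f"requires approval from role '{required}'")
    return found

blocked = violations(ProposedAction("take_offline", "G-07", date(2025, 3, 10)))
allowed = violations(ProposedAction("take_offline", "G-07", date(2025, 4, 2)))
```

Because the check runs on every proposed action, a rule cannot be skipped simply because the relevant policy document was not retrieved.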
Consistent context across the full workflow
A shared semantic model provides the same definitions to every step in the agent workflow. “Customer” means the same thing at step one and step six. “Active contract” carries the same definition across every tool call. The agent operates from a shared ground truth rather than assembling a local interpretation of the schema at each step. Shared semantic context can significantly improve agent consistency and reliability, particularly in cross-domain workflows where entity definitions span multiple systems.
Governance and traceability extend to agent actions
When an agent acts on semantically grounded context, the reasoning chain behind each action is inspectable. Compliance teams can trace an agent’s decision back through the semantic query, through the entity traversal, to the specific governed data that informed it. For enterprises in regulated industries and for any organization that needs to audit why an agent took a particular action, this traceability is the practical implementation of AI accountability.
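A minimal sketch of what an inspectable reasoning chain could look like as data: each step records the query it ran, the entities it touched, and the sources it read. The structure, field names, and table names are assumptions made for illustration, not a description of any product's lineage format.

```python
import json
from dataclasses import dataclass, field, asdict

@dataclass
class TraceStep:
    step: str        # what the agent was doing
    query: str       # the semantic query it issued
    entities: list   # entities traversed
    sources: list    # governed sources the answer came from

@dataclass
class ActionTrace:
    action: str
    steps: list = field(default_factory=list)

    def record(self, step, query, entities, sources):
        self.steps.append(TraceStep(step, query, entities, sources))

    def to_json(self):
        """Serialize the whole chain so it can be stored and audited."""
        return json.dumps(asdict(self), indent=2)

# Hypothetical trace for a drafted work order.
trace = ActionTrace(action="draft_work_order:G-07")
trace.record("identify_engineers",
             "engineers certified for asset class of G-07",
             ["G-07", "gearbox"],
             ["maintenance.certifications"])
trace.record("check_parts",
             "depots stocking parts for G-07",
             ["G-07", "depot-B"],
             ["inventory.parts"])

audit = json.loads(trace.to_json())
```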
Enterprise AI quality improves with consistent semantic context. Shared business meaning is what allows agents to act reliably, repeatedly, and traceably across complex multi-step workflows.
The hybrid AI pattern: Vectors + Semantics
The answer for enterprise agentic AI is not to replace vector search with semantic context, or vice versa. It is to use each for what it is best suited for and to combine them in a pattern that takes advantage of both.
| Capability | What it addresses well | Where the other layer adds value |
| --- | --- | --- |
| Vector search (e.g. Databricks Vector Search) | Similarity-based retrieval: finding documents, passages, and records topically related to a query. Discovery across large unstructured data. Fast recall at scale. | Entity relationship traversal. Declared constraint checking. Consistent definitions across multi-step workflows. Governed, auditable reasoning chains. |
| Semantic context (e.g. Kobai on Databricks) | Declared entity relationships. Business rule enforcement. Cross-domain reasoning. Consistent definitions across agent workflow steps. Traceable, governed context for every action. | Open-ended discovery across unstructured data. Surface-level semantic matching. High-recall retrieval where all relevant records are unknown in advance. |
| Combined (Databricks + Kobai) | Similarity retrieval surfaces candidate entities and relevant documents. Semantic context validates, constrains, and enriches agent reasoning. LLMs and agents reason using governed context from both layers. | Enterprise agentic AI benefits from both working together. Each layer addresses the challenges the other is less suited for. |
In the Databricks + Kobai architecture, this pattern is operationalized natively. Databricks Vector Search handles similarity retrieval across Lakehouse data. Kobai’s semantic layer — built within the Databricks Lakehouse under Unity Catalog governance — provides entity context, relationship traversal, and governed business rules. Agents access both layers through the Kobai SDK, with all queries executing on Databricks compute under the same access controls.
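The hybrid pattern can be sketched in a few lines: similarity search proposes candidates, and declared relationships decide which candidates the agent may act on. Both functions below are plain-Python stand-ins with hard-coded, hypothetical data; they do not call Databricks Vector Search or the Kobai SDK.

```python
def vector_candidates(query):
    """Stand-in for a similarity search: returns (entity_id, score) pairs."""
    # Hard-coded hypothetical results, ordered by similarity score.
    return [("eng-101", 0.93), ("eng-204", 0.91), ("eng-317", 0.88)]

# Stand-in for declared certification relationships (hypothetical data).
CERTIFIED_FOR = {
    "eng-101": {"generator"},
    "eng-204": {"gearbox"},
    "eng-317": {"gearbox", "generator"},
}

def validate(candidates, asset_class):
    """Keep only candidates with a declared certification for the asset class."""
    return [
        (entity_id, score)
        for entity_id, score in candidates
        if asset_class in CERTIFIED_FOR.get(entity_id, set())
    ]

grounded = validate(vector_candidates("gearbox maintenance engineer"), "gearbox")
```

In this example the highest-scoring candidate is dropped because it has no declared gearbox certification, which is exactly the case where similarity alone would mislead the agent.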
What this looks like in practice: A human-supervised agent workflow
Consider a human-supervised agent workflow at an energy operator, a scenario where the distinction between retrieval and reasoning becomes operational.
Agent task
A sensor anomaly on Gearbox G-07 at Site B triggers an agent-assisted maintenance workflow. The agent’s role: identify the most appropriate engineer to recommend, confirm parts availability, and draft a work order for planner review before the upcoming peak generation window. A human planner approves the final dispatch.
| Agent step | Vector search approach | Semantic context approach |
| --- | --- | --- |
| Identify qualified engineers | Retrieve documents mentioning gearbox maintenance and engineer names. Model infers certification from text. Surface-level matches may return engineers without the relevant qualification. | Traverse declared certification relationships: Gearbox Model G-07 → Asset Class → Engineers certified for this class. More reliable and governed by the ontology. |
| Check availability | Retrieve scheduling documents or calendar entries for retrieved engineer names. No guarantee of completeness or currency. | Query declared assignment relationships and schedule constraints directly. Agent receives current availability from the semantic model, not from retrieved text. |
| Confirm parts availability | Retrieve documents mentioning part numbers and depot inventory. May return stale or incomplete records. | Traverse parts-to-depot relationship in the semantic model. Inventory status from governed Lakehouse data. |
| Apply operational constraints | Retrieve documents about the peak generation window and maintenance scheduling policies. Agent must synthesize rules from retrieved text with risk of missed or misapplied constraints. | Business rules (maintenance window constraints, generation period protections) are declared in the ontology. Agent behaviour is constrained by declared rules, not inferred from retrieved documents. |
| Draft work order for planner review | Agent produces work order based on synthesized context. Lineage from decision to source data is limited. | Work order drafted from governed semantic queries. Every step of the reasoning chain is traceable to governed source data in the Lakehouse. Planner reviews and approves. |
Across this workflow, the distinction is reliability and traceability. The vector search approach produces a plausible workflow. The semantic context approach produces a more governed and more traceable one. For a human-supervised agent workflow that informs real-world actions (recommending an engineer, confirming parts, drafting a compliance record), that difference is operationally meaningful.
Three requirements for enterprise-grade Agentic AI
Based on what agentic AI actually requires to operate reliably at enterprise scale, three capabilities need to be in place before agents are deployed for consequential workflows.
1. A shared semantic model that agents can query programmatically
The semantic model needs to be accessible to agents through an API or SDK, not just through a natural language interface designed for human users. Agents need to traverse entity relationships, apply constraint checks, and retrieve governed business rules as part of their execution flow. In the Databricks + Kobai architecture, this access is provided through the Kobai SDK, which exposes governed semantic context to agents running within the Databricks environment.
2. Governance that extends to agent actions
An agent should never be able to access data that its operator cannot. Governance policies that apply to human users must apply equally to agents operating on their behalf. When Kobai’s semantic layer is built within the Databricks Lakehouse under Unity Catalog governance, SSO passthrough ensures that every agent query executes with the identity and access permissions of the user or service account that authorized the agent. There is no privilege escalation at the semantic layer.
3. Traceable reasoning chains for every agent action
As enterprises deploy agents for consequential decisions — maintenance scheduling, risk assessment, procurement approvals — the question of why the agent took a particular action becomes a governance and compliance requirement. The semantic context that drives each step of an agent’s reasoning needs to be logged and auditable. Kobai’s Episteme module provides graphical lineage from agent action back through the semantic query to the governed source data, producing an audit trail as a native output of the agent’s execution.
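The second requirement above, that an agent query runs with the permissions of the principal who authorized it, can be sketched as a guard around every query. This is a toy model of the idea only: the grants table, principals, and table names are hypothetical, and it is not how Unity Catalog or SSO passthrough is implemented.

```python
class AccessDenied(Exception):
    """Raised when the authorizing principal lacks access to a table."""

# Hypothetical grants: principal -> tables that principal may read.
GRANTS = {
    "planner@example.com": {"maintenance.work_orders", "maintenance.certifications"},
    "analyst@example.com": {"maintenance.work_orders"},
}

def run_as(principal, table, query_fn):
    """Run query_fn(table) only if the principal is granted access to the table."""
    if table not in GRANTS.get(principal, set()):
        raise AccessDenied(f"{principal} cannot read {table}")
    return query_fn(table)

def count_rows(table):
    # Stand-in for a real query against governed data.
    return {"table": table, "rows": 3}

# The agent inherits the planner's access and nothing more.
ok = run_as("planner@example.com", "maintenance.certifications", count_rows)

try:
    run_as("analyst@example.com", "maintenance.certifications", count_rows)
    denied = False
except AccessDenied:
    denied = True
```

The agent has no identity of its own in this model, so it cannot read anything the authorizing user could not read directly.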
The agentic AI governance test
Before deploying an agent for a consequential workflow, ask: if this agent takes a wrong action, can we trace exactly what context it was operating from, which entity relationships it traversed, and which rules it applied or missed? If the answer is no, the agent’s context is not sufficiently governed for enterprise deployment. Shared semantic context is the architectural condition that makes the answer yes.
Databricks + Kobai: Governed semantic intelligence for enterprise agents
Kobai extends the Databricks Lakehouse with the shared semantic intelligence that agentic AI requires to operate reliably at enterprise scale. Graph structures are built directly within Databricks under Unity Catalog governance. Semantic context is available to agents through the Kobai SDK, with all queries executing on Databricks compute. Governance, lineage, and access controls apply consistently to every agent action.
The pattern is complementary: Databricks Vector Search for similarity retrieval and discovery; Kobai’s semantic layer for entity context, relationship traversal, and governed business rules; LLMs and agents reasoning over both to produce actions that are grounded, consistent, and traceable.
For enterprises deploying agents on Databricks — in operations, finance, compliance, or commercial workflows — this combination provides the semantic foundation that determines whether agentic AI scales reliably or accumulates errors that compound across every workflow it touches.
To explore how semantic intelligence on the Databricks Lakehouse supports enterprise agentic AI, visit kobai.io or contact us at contact@kobai.io.

