Knowledge Graphs 101: What They Are, What They’re Not, and When You Need One
The term “knowledge graph” gets used to describe everything from a simple lookup table to a multi-billion-node AI system. This guide cuts through the noise, plainly explaining what knowledge graphs are, busting the most common misconceptions, and giving you a clear decision framework for when one genuinely belongs in your architecture.
Ask five people what a knowledge graph is and you will likely get five different answers. A data engineer might describe it as a graph database. A machine learning team might call it a way to reduce hallucinations in AI. An enterprise architect might frame it as a unified metadata model. A consultant might point to Google’s Knowledge Panel as the canonical example.
All of them are partly right. None of them gives the complete picture. And the confusion matters, because organisations are making significant architectural decisions based on incomplete or inaccurate mental models of what a knowledge graph actually is.
This post is a plain-English guide. We will explain what knowledge graphs are, from first principles. We will address the five most common misconceptions directly. We will show you the real-world scenarios where a knowledge graph is the right tool and the ones where it is not. And at the end, we will show you how to get knowledge graph capabilities on the data platforms you already run, without building a new silo.
A knowledge graph is not a product you buy. It is an architectural pattern — a way of representing data as interconnected entities with explicit meaning. You can implement it on many different technology foundations, including the data lakehouse you may already operate.
PART ONE
What a Knowledge Graph actually is
A knowledge graph is a structured representation of real-world entities and the relationships between them. It answers two questions about every piece of data: what is this thing, and how does it relate to everything else?
The word “graph” here is used in the mathematical sense — a structure made of nodes (the entities) and edges (the relationships between them). The word “knowledge” is the important part: it means those nodes and edges carry explicit meaning, not just identifiers.
Every knowledge graph has three fundamental building blocks:
- Entities: the real-world things your business cares about. Customers, assets, products, engineers, suppliers, contracts, locations. Anything that can be named and described.
- Relationships: how those entities connect to each other. An engineer is certified for a turbine model. A turbine is located at a wind farm. A product is supplied by a vendor. These are not just foreign keys — they are named, typed connections that carry meaning.
- Ontology: the formal definition of what each entity type means, what properties it has, and what kinds of relationships it can participate in. The ontology is the blueprint; the knowledge graph is what you build on top of it.
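To make the three building blocks concrete, here is a minimal sketch in Python (illustrative only, not any particular product's API) of an ontology acting as the blueprint that constrains which triples are well-formed. All type and relationship names are invented for the example.

```python
# Illustrative sketch: an ontology as the "blueprint" for a knowledge graph.
# Entity types, relationship types, and the shapes of valid triples.

from dataclasses import dataclass, field

@dataclass(frozen=True)
class RelationType:
    name: str          # e.g. "is certified for"
    subject_type: str  # entity type allowed on the left
    object_type: str   # entity type allowed on the right

@dataclass
class Ontology:
    entity_types: set[str] = field(default_factory=set)
    relation_types: dict[str, RelationType] = field(default_factory=dict)

    def add_relation(self, rel: RelationType) -> None:
        self.entity_types.update({rel.subject_type, rel.object_type})
        self.relation_types[rel.name] = rel

    def validates(self, subj_type: str, predicate: str, obj_type: str) -> bool:
        """A triple is well-formed only if the ontology defines its shape."""
        rel = self.relation_types.get(predicate)
        return (rel is not None
                and rel.subject_type == subj_type
                and rel.object_type == obj_type)

onto = Ontology()
onto.add_relation(RelationType("is certified for", "Engineer", "TurbineModel"))
onto.add_relation(RelationType("is located at", "Turbine", "WindFarm"))

print(onto.validates("Engineer", "is certified for", "TurbineModel"))  # True
print(onto.validates("Turbine", "is certified for", "WindFarm"))       # False
```

The point of the sketch is the separation of concerns: the ontology defines what can exist; the graph itself holds what does exist.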
The classic way to represent a knowledge graph fact is as a triple: Subject — Predicate — Object. For example:
| Subject (Entity) | Predicate (Relationship) | Object (Entity) |
| --- | --- | --- |
| Engineer: J. Santos | is certified for | Turbine Model: GE-7F |
| Turbine: T-4421 | is located at | Wind Farm: Site B, Texas |
| Component: Bearing #88X | is part of | Engine Assembly: EA-20 |
| Supplier: AcmeSteel Co. | manufactures | Part: Bearing #88X |
| Work Order: WO-9981 | is assigned to | Engineer: J. Santos |
What makes these triples powerful is not any single fact in isolation. It is the ability to traverse them: to start at one entity and follow relationships across the graph to arrive at an answer. Which engineers can service Turbine T-4421? Start at the turbine, find its model, find who is certified for that model. That traversal is trivial in a knowledge graph. In a relational database, it requires reconstructing that path through joins, and the path itself must have been anticipated when the schema was designed.
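The traversal just described can be sketched directly as code. The following Python snippet is a toy, in-memory illustration, not a real graph engine: it stores facts from the table above as triples and answers the question in the text. The "is a" link from the turbine to its model is an assumption added for the example; it does not appear in the table itself.

```python
# Toy triple store: each fact is a (subject, predicate, object) tuple.
triples = [
    ("Engineer: J. Santos", "is certified for", "Turbine Model: GE-7F"),
    ("Turbine: T-4421", "is a", "Turbine Model: GE-7F"),  # assumed link
    ("Turbine: T-4421", "is located at", "Wind Farm: Site B, Texas"),
    ("Component: Bearing #88X", "is part of", "Engine Assembly: EA-20"),
    ("Work Order: WO-9981", "is assigned to", "Engineer: J. Santos"),
]

def objects(subject, predicate):
    """Follow an edge forwards: subject -[predicate]-> ?"""
    return {o for s, p, o in triples if s == subject and p == predicate}

def subjects(predicate, obj):
    """Follow an edge backwards: ? -[predicate]-> object"""
    return {s for s, p, o in triples if p == predicate and o == obj}

def engineers_for(turbine):
    # Traverse: turbine -> its model -> engineers certified for that model.
    found = set()
    for model in objects(turbine, "is a"):
        found |= subjects("is certified for", model)
    return found

print(engineers_for("Turbine: T-4421"))  # {'Engineer: J. Santos'}
```

Notice that the traversal logic never mentions a schema: adding new triples extends what the same two functions can answer.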
The critical distinction: storage vs. meaning. A relational database stores data efficiently. A knowledge graph represents what that data means and how the things it describes relate to each other. Neither replaces the other. They are complementary layers. The most effective modern data architectures use a data platform for storage and governance, and a knowledge graph layer for meaning and relationship traversal.
PART TWO
What a Knowledge Graph is not
The misconceptions around knowledge graphs are at least as damaging as the lack of understanding. Here are the five we encounter most often.
Misconception 1: A knowledge graph is a type of database
This is the most common and most consequential confusion. A knowledge graph is not a database product. It is a data modelling pattern and a way of structuring and representing information. You can implement a knowledge graph on top of a graph database, but you can also implement one on top of a relational database, a document store, or (as Kobai does) directly on a data lakehouse using Delta tables.
The database is the infrastructure. The knowledge graph is the semantic model that runs on top of it. Conflating the two leads organizations to assume they must buy and operate a dedicated graph database to get knowledge graph capabilities. That assumption is not only wrong, it is also expensive.
Misconception 2: A knowledge graph is only for AI and machine learning
Knowledge graphs pre-date the current wave of enterprise AI by many years. Search engines, encyclopaedias, recommendation systems, and fraud detection networks were using knowledge graph patterns long before large language models became mainstream. The value of a knowledge graph for human analysts (the ability to navigate complex entity relationships visually and answer multi-hop questions) is independent of any AI capability.
That said, knowledge graphs have become significantly more valuable in the AI era. They are the most reliable mechanism for grounding AI answers in a governed, structured enterprise context. GraphRAG and GraphAI (which we cover in detail in a separate post) demonstrate that combining knowledge graphs with large language models produces materially better answers than either approach alone. But the knowledge graph does not exist to serve AI; it exists to represent reality, and AI benefits from that representation.
Misconception 3: A knowledge graph is the same as a semantic layer
These two terms are often used interchangeably, but they describe different things. A semantic layer is a business abstraction on top of data that standardizes metric definitions, business terms, and query patterns. Tools like dbt, Databricks Business Semantics, and AtScale all provide semantic layers in this sense.
A knowledge graph is a different kind of semantic layer: it models real-world entities and their relationships, not just metric definitions. The distinction matters because a semantic layer optimized for KPIs will tell you that “revenue” means net_revenue in every system. A knowledge graph will also tell you that a specific customer is a subsidiary of a parent account, which is managed by a specific account team, which has a specific contract expiry date. The latter enables a much richer class of reasoning.
A well-designed enterprise data architecture will often include both: an analytics semantic layer for metric consistency, and a knowledge graph layer for entity relationships and cross-domain reasoning.
Misconception 4: Building a knowledge graph requires a data science team
This misconception comes from the historical experience of building knowledge graphs: it was genuinely hard. Defining ontologies in RDF/OWL notation, loading data into triple stores, writing SPARQL queries — these were specialized skills that required dedicated data engineering capacity.
Modern knowledge graph platforms have changed this picture significantly. No-code ontology modelling tools allow domain experts (operations managers, supply chain analysts, reliability engineers) to define entities and relationships visually without writing code. The technical plumbing (ingestion, indexing, query translation) happens automatically. This shift is important because the people who best understand what the entities mean and how they relate to each other are domain experts, not data scientists.
Misconception 5: A knowledge graph is a project, not a product
Historically, knowledge graph initiatives were large, bespoke, multi-year engineering projects that rarely delivered on their promises. This history has made many organizations sceptical. But it reflects the state of the technology a decade ago, not today.
Modern approaches start with a narrow, high-value domain (a single asset class, a specific product line, a defined customer segment) and expand incrementally. A semantic model for one domain takes weeks, not years. And because the model is designed to be extended, every new domain added increases the value of everything already in the graph through network effects.
PART THREE
Knowledge graphs vs. relational databases: What’s actually different
This comparison is the one that matters most for most organizations, because the relational database (or its modern equivalent, the data warehouse or lakehouse) is where the vast majority of enterprise data already lives. The question is not "should we use a knowledge graph instead of a relational database?" It is "what does a knowledge graph add on top of one?"
| Dimension | Relational / Lakehouse | Knowledge Graph Layer |
| --- | --- | --- |
| Primary purpose | Store and process data at scale | Represent meaning and relationships between entities |
| Data model | Tables, rows, columns, foreign keys | Nodes (entities), edges (relationships), ontology |
| Relationships | Reconstructed through JOINs at query time; must be anticipated in schema design | Declared explicitly; traversed directly; schema-independent |
| Multi-hop queries | Complex; performance degrades with depth; requires careful index design | Native; traversing multiple relationship hops is a first-class operation |
| Meaning | Implicit in column names and schema conventions | Explicit in the ontology; defined once, available everywhere |
| Schema changes | Disruptive; breaking schema changes cascade through downstream queries | Additive; new entity types and relationships extend the model without breaking existing queries |
| Governance | Row/column-level access controls; lineage tracked at table level | Inherits platform governance; semantic lineage traceable to specific entities and relationships |
| Who authors it | Data engineers define schema and transformations | Domain experts define entity types and relationships; engineers wire up data sources |
The practical implication of this table is that relational databases and knowledge graphs solve different problems and are at their best when used together. The relational layer stores data with scale, performance, and governance. The knowledge graph layer makes that data traversable and meaningful. Building the second on top of the first without moving data is the architecture that gets the most from both.
You do not have to choose between a relational database and a knowledge graph. The modern pattern is to keep data in your governed data platform and express meaning and relationships as a semantic layer on top. Data stays in place. Governance is inherited. Knowledge graph capabilities are added without creating a new silo.
PART FOUR
The core components of a knowledge graph
A knowledge graph is not a single thing. It is a combination of several components that together produce the “connected intelligence” effect. Understanding each one helps clarify what you are building and what you are not.
| Component | What it is and why it matters |
| --- | --- |
| Ontology | The formal definition of your entity types, relationship types, and constraints. Think of it as the schema for your knowledge graph, except that it describes business reality ("an Engineer can be certified for an Asset Class") rather than technical storage. The ontology is what makes the graph semantic rather than just structural. |
| Entities | The actual instances of your defined types — specific engineers, specific assets, specific customers, specific work orders. These are the nodes in the graph. Each entity inherits the definition of its type from the ontology. |
| Relationships | The named connections between entities. Unlike a foreign key, a relationship in a knowledge graph has a type ("certified for", "located at", "supplies", "reports to"), a direction, and can carry properties of its own. Relationships are first-class citizens, not just structural plumbing. |
| Properties | Attributes that describe entities and relationships. A Turbine entity might have properties for model number, installation date, and rated capacity. A "supplies" relationship might carry a lead time and a contract reference. |
| Inference rules | Optional but powerful: logical rules that derive new relationships from existing ones. If Engineer A is certified for Asset Class X, and Asset B belongs to Asset Class X, then Engineer A is qualified to service Asset B. Inference rules let the graph reason about what it knows. |
| Query layer | The mechanism for traversing and interrogating the graph. This might be SPARQL (for RDF graphs), Cypher (for property graphs), or, in a lakehouse-native implementation, a translation layer that converts graph traversals into optimized SQL. The query layer is what makes the graph usable by AI systems, BI tools, and human analysts. |
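The inference-rule idea can be made concrete with a short sketch. This Python snippet (illustrative only; real systems express such rules declaratively in the ontology) derives the "qualified to service" relationship described above from two existing facts.

```python
# Illustrative inference rule over a toy fact set:
#   certified_for(E, C) AND belongs_to(A, C)  =>  qualified_to_service(E, A)
facts = {
    ("Engineer A", "certified for", "Asset Class X"),
    ("Asset B", "belongs to", "Asset Class X"),
}

def infer_qualified(facts):
    """Derive new triples from existing ones; nothing is stored twice."""
    derived = set()
    for eng, p1, cls in facts:
        if p1 != "certified for":
            continue
        for asset, p2, cls2 in facts:
            if p2 == "belongs to" and cls2 == cls:
                derived.add((eng, "qualified to service", asset))
    return derived

print(infer_qualified(facts))
# {('Engineer A', 'qualified to service', 'Asset B')}
```

The derived fact was never entered by anyone; it follows logically from facts that were, which is what "the graph reasons about what it knows" means in practice.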
PART FIVE
When you need a knowledge graph and when you don’t
The most useful guidance we can offer is a clear decision framework. Knowledge graphs are powerful, but they are not the right tool for every problem. Here is how to tell the difference.
Strong signals that a knowledge graph belongs in your architecture
Consider a knowledge graph when your data challenges exhibit one or more of the following characteristics:
- Your questions span multiple domains. When answering a single business question requires connecting data from multiple systems (operations and finance, customers and supply chain, assets and people), a knowledge graph layer provides the unified semantic model to traverse those connections without reconstructing them through brittle joins.
- Relationships between entities are more important than the entities themselves. In fraud detection, the suspicious pattern is not a single transaction but the network of connections between accounts, devices, and people. In aerospace traceability, what matters is not the part specification alone but the chain from design to manufacture to installation to inspection. When the relationship is the insight, a graph is the right data structure.
- Your schema changes frequently and you need flexibility. Relational schemas are rigid by design: adding a new entity type or relationship type requires schema migration and downstream query updates. Knowledge graphs are additive: you extend the ontology without breaking existing queries. Organizations with fast-changing data models benefit materially from this flexibility.
- You need multi-hop reasoning. How many hops separate this supplier from our most critical product line? Which components share a batch with the part that failed? Which regulatory clauses apply to this maintenance action, given the asset class, the operator certification, and the jurisdiction? These questions require traversing multiple relationships in sequence. Graph traversal is designed for this; SQL JOINs are not.
- AI systems need to reason across your enterprise, not just retrieve documents. RAG-based AI is good at finding relevant documents. It is less good at reasoning across connected entities. If you are building AI copilots, agents, or decision support systems that need to understand how your business works, not just what documents say about it, a knowledge graph provides the structured context that makes that reasoning reliable.
- Governance and traceability matter for your AI outputs. In regulated industries, AI answers must be traceable to their sources. A knowledge graph layer that inherits your data platform’s governance and records every traversal provides the audit trail that compliance requires.
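Multi-hop questions like "how many hops separate this supplier from our most critical product line?" come down to graph search. Here is a minimal sketch in Python, with invented entity names, using breadth-first search over a small set of relationships:

```python
# Illustrative multi-hop reasoning: breadth-first search over entity edges.
from collections import deque

edges = [
    ("Supplier S1", "supplies", "Part P7"),
    ("Part P7", "part of", "Assembly A3"),
    ("Assembly A3", "part of", "Product Line PL-Critical"),
]

def hops_between(start, goal):
    """Shortest number of relationship hops between two entities, or None."""
    # Treat relationships as undirected for reachability purposes.
    neighbours = {}
    for s, _, o in edges:
        neighbours.setdefault(s, set()).add(o)
        neighbours.setdefault(o, set()).add(s)
    queue, seen = deque([(start, 0)]), {start}
    while queue:
        node, depth = queue.popleft()
        if node == goal:
            return depth
        for nxt in neighbours.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, depth + 1))
    return None

print(hops_between("Supplier S1", "Product Line PL-Critical"))  # 3
```

In a graph model this search is a first-class operation; expressing the same variable-depth question in SQL requires recursive joins that were never designed for it.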
| Strong signal: you likely need a knowledge graph | Weaker signal: other tools may serve you better |
| --- | --- |
| Questions span 3+ systems or data domains | Questions are confined to a single system or domain |
| You need to traverse entity relationships (multi-hop) | You primarily aggregate, filter, and calculate metrics |
| Your business model or data schema is evolving rapidly | Your schema is stable and well-understood |
| AI systems need to reason about how things connect | AI systems need to retrieve specific documents or records |
| Governance and explainability of AI answers are required | Governance requirements are met at the data platform level |
| Domain experts need to define and own meaning | Meaning can be captured in schema conventions and documentation |
When a knowledge graph is not the right answer
It is equally important to be clear about when a knowledge graph is unlikely to add value:
- When your primary need is consistent metric definitions. If the goal is to ensure that “monthly active users” means the same thing in every BI tool, you need an analytics semantic layer, not a knowledge graph. Tools like dbt or Databricks Business Semantics are designed for exactly this.
- When you have a single, well-defined query pattern. If every question your stakeholders ask follows the same shape, a well-designed relational view or materialized query will be faster and simpler to maintain.
- When data volume is massive and latency requirements are sub-millisecond. High-frequency, low-latency transactional systems are not where knowledge graphs shine. They are optimized for complex reasoning across moderate data volumes, not for processing millions of events per second.
- When your organisation is not ready to invest in semantic modelling. A knowledge graph is only as good as the ontology behind it. If your team does not have the time or willingness to formally define entities and relationships, the graph will not deliver on its promise.
PART SIX
Knowledge graphs in the real world: Six use cases that work
The following use cases consistently demonstrate strong returns from knowledge graph architectures. Each one exhibits the pattern of multi-hop, cross-domain reasoning that knowledge graphs are best suited for.
1. Digital thread in aerospace and manufacturing. An aircraft part has a history: design specification, manufacturing batch, quality inspection, installation record, service history, removal record. That history spans multiple systems and multiple organizations. A knowledge graph connects them into a traversable chain (the digital thread) so that when a sensor flags an anomaly, an engineer can trace the component’s full provenance in seconds rather than days. A large aerospace manufacturer using Kobai’s semantic graph achieved 10x faster equipment diagnosis, on-time delivery improvements, and significant cost savings by connecting parts data across manufacturing, maintenance, and service systems.

2. Predictive maintenance and field operations. Predicting equipment failure requires connecting sensor readings, maintenance history, environmental conditions, engineer certifications, and parts availability. These data elements live in different systems and have relationships that no single table captures. A knowledge graph provides a unified model that allows a question like “which assets are at risk and do we have the right engineers available?” to be answered correctly and completely. Energy and utility operators have used this pattern to move from reactive to proactive maintenance, reducing unplanned downtime and the lost revenue that comes with it.

3. Supply chain resilience and risk. When a supply chain disruption occurs, the critical questions are multi-hop: which components are affected, which products depend on those components, which customer orders are at risk, and which alternative suppliers can fill the gap. A knowledge graph that connects suppliers, parts, assemblies, products, and customer orders can answer these questions in real time rather than requiring days of manual investigation across siloed systems. Organizations with complex, multi-tier supply chains have used knowledge graphs to build dynamic digital twins of their supply network, enabling scenario planning and disruption response at a speed that was previously impossible.

4. Customer 360 and commercial intelligence. A “customer” in enterprise data is rarely a single entity. It is a network of contacts, accounts, subsidiaries, contracts, product relationships, and interaction histories spread across CRM, ERP, billing, and service systems. A knowledge graph that unifies this network allows commercial teams to ask questions that previously required manual investigation: who are our key contacts at this account, what products do they own, which contracts are expiring, and which of our other services are a natural fit? Professional services firms have used this pattern to improve bid success rates and cross-sell revenue by enabling self-service commercial intelligence on Databricks without duplicating data into a separate CDP.

5. Enterprise knowledge and expertise discovery. In large organizations, expertise is distributed and often invisible. Which engineers have worked on nuclear operations projects? Which consultants have deep experience in regulatory compliance? What lessons were learned on a similar project in another region? A knowledge graph that connects people, skills, projects, documents, and outcomes makes this knowledge searchable, turning institutional knowledge from a cultural asset into a queryable resource. This pattern is particularly valuable in asset-intensive industries where experienced engineers retire and take decades of tacit knowledge with them. A knowledge graph does not replace that knowledge, but it preserves and surfaces what was captured.

6. Explainable AI in regulated industries. In financial services, healthcare, aerospace, and energy, AI systems that make decisions or recommendations must be able to explain those decisions. A knowledge graph provides the structured, governed context that makes AI explanations traceable: not “the model predicted X” but “the model predicted X because of entities A, B, and C, which are related in the following way, and the data supporting each relationship can be found here.” Regulatory frameworks across jurisdictions are increasingly requiring this level of explainability. Organizations that build their AI on a knowledge graph foundation are better positioned to meet these requirements without having to retrofit traceability after the fact.
PART SEVEN
How to implement a knowledge graph: Architecture options
The choice of implementation architecture has significant implications for cost, governance, time to value, and operational complexity. There are three broad approaches in common use.
Option 1: Standalone graph database
This approach involves deploying a dedicated graph database product and loading data from your source systems into it. The graph database then handles ontology management, storage, and query execution.
This approach makes sense when:
- Your use case is primarily graph-native (e.g. fraud detection, real-time network analysis)
- Your organization does not already operate a modern data platform
- Your data volumes are within the range that in-memory graph execution handles well
The trade-offs are real: you introduce a second data platform with its own governance model, its own ingestion pipelines, its own cost profile, and its own operational overhead. Data must be kept in sync between your source systems and the graph database. Access controls must be managed separately. These are not insurmountable, but they are costs that compound over time.
Option 2: Graph layer on top of a data warehouse or lakehouse
This approach keeps data in your existing governed data platform (Databricks, Snowflake, etc.) and adds a semantic knowledge graph layer on top. The graph layer expresses the ontology and manages entity relationships, but the data remains in place. Graph traversals are translated into optimized SQL or Spark queries that execute on your existing compute, governed by your existing access controls.
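Here is a minimal sketch of this translation idea, using SQLite as a stand-in for the lakehouse: a two-hop graph pattern is rewritten as a self-join over a single triples table. The table layout and query shape are assumptions for illustration, not Kobai's actual implementation.

```python
# Illustrative "graph layer on a relational engine": the graph traversal
# (start -[hop1]-> x <-[hop2]- result) becomes a self-join in SQL.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE triples (s TEXT, p TEXT, o TEXT)")
conn.executemany("INSERT INTO triples VALUES (?, ?, ?)", [
    ("T-4421", "is a", "GE-7F"),
    ("J. Santos", "is certified for", "GE-7F"),
])

def traverse_sql(start, hop1, hop2):
    """Translate a two-hop graph pattern into one SQL self-join."""
    sql = """
        SELECT t2.s
        FROM triples t1
        JOIN triples t2 ON t2.o = t1.o AND t2.p = ?
        WHERE t1.s = ? AND t1.p = ?
    """
    return [row[0] for row in conn.execute(sql, (hop2, start, hop1))]

print(traverse_sql("T-4421", "is a", "is certified for"))  # ['J. Santos']
```

The data never leaves the relational engine; only the query shape changes, which is what lets the graph layer inherit the platform's storage, compute, and access controls.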
This approach makes sense when:
- Your organization already operates a modern data platform at scale
- You want knowledge graph capabilities without introducing a second system of record
- Governance consistency and lineage traceability are priorities
- You want elastic, on-demand compute rather than a permanently running in-memory graph cluster
This is the approach that Kobai takes on the Databricks Lakehouse, and the one that Databricks itself recommends for enterprise-grade knowledge graph capabilities. Data stays in Delta Lake tables, governed by Unity Catalog. The semantic model and relationship traversal layer sits on top.
Option 3: Build your own
Some organizations attempt to implement knowledge graph capabilities using general-purpose tools — representing triples in relational tables, writing custom traversal logic in SQL or Python, managing ontologies in spreadsheets. This approach is common in early-stage or low-budget initiatives.
The practical limits of this approach become apparent quickly. Multi-hop traversals in SQL are painful to write and maintain. Ontology management without purpose-built tooling becomes a documentation problem. Scaling to enterprise data volumes without a graph-optimized execution layer creates performance bottlenecks. Most organizations that start here eventually migrate to one of the first two options.
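To make those limits concrete: even a simple variable-depth traversal over a hand-rolled triples table requires a recursive common table expression, and every new question needs another one written by hand. A sketch using SQLite with invented data:

```python
# DIY triples-in-SQL: a variable-depth "part of" traversal needs a
# hand-written recursive CTE. Each new question means another such query.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE triples (s TEXT, p TEXT, o TEXT)")
conn.executemany("INSERT INTO triples VALUES (?, ?, ?)", [
    ("Bearing #88X", "part of", "Gearbox G2"),
    ("Gearbox G2", "part of", "Engine Assembly EA-20"),
    ("Engine Assembly EA-20", "part of", "Turbine T-4421"),
])

# All ancestors of a component via "part of", any number of hops.
sql = """
WITH RECURSIVE ancestors(node) AS (
    SELECT o FROM triples WHERE s = ? AND p = 'part of'
    UNION
    SELECT t.o FROM triples t JOIN ancestors a ON t.s = a.node
    WHERE t.p = 'part of'
)
SELECT node FROM ancestors
"""
print(sorted(r[0] for r in conn.execute(sql, ("Bearing #88X",))))
# ['Engine Assembly EA-20', 'Gearbox G2', 'Turbine T-4421']
```

This works at small scale, which is exactly why teams start here; the maintenance burden appears once dozens of such queries exist and the ontology lives only in people's heads.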
PART EIGHT
How to get started: A practical approach
The most common mistake in knowledge graph initiatives is starting too large. Organizations attempt to model everything at once, get bogged down in ontology debates, and never ship anything useful. The most successful approaches do the opposite.
- Start with a single, high-value domain. Pick one area of your business where relationship-traversal questions matter and where the data is accessible. A specific asset class, a defined product line, a single customer segment. Build a minimal ontology for that domain and connect the data sources that are most relevant.
- Identify two or three questions you cannot currently answer without manual work. These are your proof-of-concept questions. They should be specific, answerable, and have a measurable value to the business. “Which engineers are certified to service the gearboxes flagged as high-risk this month?” is a good proof-of-concept question. “Give us a complete digital twin of our operations” is not.
- Involve domain experts from day one. The ontology should be defined by the people who understand the business, not designed in a database modelling tool by data engineers. Use no-code tools that allow a reliability engineer or a supply chain manager to define entity types and relationships directly.
- Build on your existing data platform. Do not provision a new graph database until you have demonstrated value and have a clear case for why an in-place semantic layer is insufficient. Start by expressing your knowledge graph on top of the data you already have, where it already lives.
- Expand incrementally by adding adjacent domains. Once the first domain is working and delivering value, add the adjacent domain that shares entities with it. Assets connect to engineers; engineers connect to work orders; work orders connect to parts. Each addition compounds the value of everything already in the graph.
The key principle: start narrow, expand through network effects. A knowledge graph that covers one domain well is more valuable than a half-finished graph that covers everything poorly. Build depth before breadth. The network effects, where each new domain multiplies the value of existing domains through shared entities, will follow naturally.

