Organizing Brownfield Data Across Multiple Plants
Connected Data Is Not the Same as Connected Understanding
“We moved the data to the cloud. Why are simple questions still so difficult to answer?”
That’s the frustration that surfaces, reliably, in conversations with industrial leaders who have made significant investments in platforms like AVEVA and Databricks. The infrastructure is performing. The data is flowing. And yet an engineer at Plant B trying to compare pump failure patterns with Plant A still ends up in a two-day investigation across systems that should already be connected.
Here is the version of that problem that every industrial company immediately recognizes.
Plant A calls it P-101. Plant B calls it PU-001. Plant C uses a legacy naming convention: VUIRATION_PMP_10. Same kind of asset. Same failure mode. Different names. An engineer asking “show me pumps similar to P-101 that failed under high vibration conditions” should get an answer that spans the fleet. Instead, they get an investigation. Different naming conventions. Different historian configurations. Different answers, or no answer at all.
This is not an edge case. It is the standard condition in industrial organizations that have grown through acquisition, operated across regions, or run facilities built in different decades with different instrumentation standards. And it is the reason why AI that works at one site consistently struggles to scale across the fleet.
The challenge has shifted. Getting data into the cloud is largely solved. The new challenge is shared meaning: making industrial data consistent, traversable, and usable by both humans and AI systems without requiring a data engineering team to mediate every question.
Connected data is not the same as connected understanding. Industrial intelligence requires more than connectivity. It requires context.
PART ONE
The industrial data problem has evolved
The first wave of industrial digital transformation was about data acquisition: getting operational data off local historians and into scalable cloud platforms. That wave has largely succeeded. Most large industrial organizations now have the data infrastructure in place.
The second wave — the one most organizations are now in the middle of — is about making that data useful for decisions. Not dashboards. Decisions. Faster root cause investigations. Confident answers to cross-domain questions. AI that can reason across the fleet rather than within a single site.
This is where the gap between investment and operational outcome is most visible. The data exists. The platform is capable. And yet maintenance decisions are still slow because the relevant context is scattered. Engineering and operations still get different answers because no shared definition of “downtime” exists. AI pilots still stall because the asset model at Plant A is inconsistent with Plant B.
We’ve digitized the data. We haven’t digitized the meaning. And without shared meaning, every cross-domain question requires a manual investigation.
PART TWO
Brownfield alignment: the decision that can’t be made
Return to the pump example from the introduction. The engineer wants to understand failure patterns across the fleet. The data exists. The PI historians are fed into Databricks. But without a shared model that says “P-101, PU-001, and VUIRATION_PMP_10 are all instances of a centrifugal pump in the same asset class, subject to the same maintenance logic” — the fleet-wide question cannot be answered automatically.
This is the brownfield alignment problem. Every plant was built or acquired with its own naming conventions, its own instrumentation model, its own historian configuration. Nobody made a wrong decision. The plants just grew independently. Now the organization wants to make fleet-wide decisions about predictive maintenance, performance optimization, and sustainability reporting, and the data does not speak a common language.
The operational consequence is measured in hours and decisions deferred. Every cross-site analysis that should take minutes takes days. Every fleet-wide AI initiative requires a mapping exercise before the model can even start. Every new site added to the business re-introduces the alignment problem from scratch.
What this costs operationally
The brownfield alignment problem is not just a data quality inconvenience. It is a decision latency problem. When a reliability engineer cannot get a fleet-wide answer in the time available before a maintenance decision must be made, they make the decision on incomplete information. Over a fleet of hundreds of assets, across dozens of sites, the cumulative cost of those decisions (unplanned downtime, reactive rather than proactive maintenance, missed optimization opportunities) is significant.
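As a minimal sketch of what a shared model has to encode, assume a hand-maintained alias table that maps each site-local tag to a canonical asset class. The class name `centrifugal_pump`, the extra `FAN-7` entry, and the helper `fleet_peers` are hypothetical, chosen only for illustration; in practice this mapping would live in a governed catalog rather than a Python list.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AssetAlias:
    site: str         # plant that owns the local tag
    local_tag: str    # name used in that plant's historian
    asset_class: str  # canonical class shared across the fleet (hypothetical)


# Illustrative alias table. The three pump tags come from the example in the
# text; "FAN-7" and the class names are invented for this sketch.
ALIASES = [
    AssetAlias("Plant A", "P-101", "centrifugal_pump"),
    AssetAlias("Plant B", "PU-001", "centrifugal_pump"),
    AssetAlias("Plant C", "VUIRATION_PMP_10", "centrifugal_pump"),
    AssetAlias("Plant B", "FAN-7", "cooling_fan"),
]


def fleet_peers(local_tag: str) -> list[AssetAlias]:
    """Return every asset, at any site, in the same class as local_tag."""
    classes = {a.asset_class for a in ALIASES if a.local_tag == local_tag}
    return [a for a in ALIASES if a.asset_class in classes]


peers = fleet_peers("P-101")
print([(a.site, a.local_tag) for a in peers])
```

The point of the sketch is that the mapping is declared once and reused by every question, rather than being rebuilt inside each analysis.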
PART THREE
Connected data still lacks shared meaning
Even when the brownfield naming problem is partially addressed, a deeper challenge remains. The data describes what happened. It does not describe what it means, and it does not connect the dots between what happened in one system and what that implies in another.
When the pump failure event appears in the CMMS, the system records the work order, the component replaced, and the completion date. What is not recorded — because no individual system was designed to record it — is the relationship between that failure and the specific operating conditions in AVEVA PI at the time, the engineering design specification in the document system, the sustainability impact of the unplanned downtime, or whether similar patterns exist in other assets.
Each of those relationships matters for the decision. An engineer investigating a recurring failure needs all of them. But they are buried in pipelines, in SQL, in institutional knowledge that lives in people’s heads rather than in a form that can be shared, reused, and made available to AI systems.
| What the data says | What the decision requires |
| --- | --- |
| Pump P-101 had a work order on 14 March | Was this failure related to the vibration readings in PI from the prior 72 hours? |
| The vibration sensor shows anomalous readings | Which other pumps in the fleet show the same pattern right now? |
| A component was replaced | Was this component from the same supplier batch as others currently in service? |
| Downtime was logged as 4.2 hours | What was the sustainability impact, and how does it affect our ESG reporting? |
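The first question in the table, whether a work order relates to vibration readings from the prior 72 hours, comes down to a time-window join between two systems. The sketch below shows that join in plain Python under loud assumptions: the readings, the year and hour of the work order, the units, and the 7.0 alarm limit are all invented for illustration and do not come from any real historian or CMMS.

```python
from datetime import datetime, timedelta

# Hypothetical vibration readings for one pump: (timestamp, velocity in mm/s).
readings = [
    (datetime(2024, 3, 10, 8, 0), 2.1),
    (datetime(2024, 3, 12, 6, 0), 4.8),
    (datetime(2024, 3, 13, 18, 0), 7.3),
    (datetime(2024, 3, 14, 2, 0), 9.0),
]


def readings_before(event_time, readings, window_hours=72):
    """Return readings inside the look-back window that ends at event_time."""
    start = event_time - timedelta(hours=window_hours)
    return [(ts, v) for ts, v in readings if start <= ts <= event_time]


def exceeded(readings, limit):
    """True if any reading is above the alarm limit."""
    return any(v > limit for _, v in readings)


# The text gives only "14 March"; the year and time of day are assumed.
work_order_time = datetime(2024, 3, 14, 9, 0)
window = readings_before(work_order_time, readings)
print(len(window), exceeded(window, limit=7.0))
```

The logic is simple. What is usually missing is a shared statement that this work order and these readings belong to the same asset, which is what lets the join run without a manual lookup.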
PART FOUR
Why AI struggles without shared context
The three problems above converge into a single AI challenge. Industrial AI quality improves significantly when context is shared, governed, and consistent. When it is not, the same symptoms appear across every organization attempting to scale AI beyond a single site or use case.
AI pilots that work at Plant A fail to generalize to Plant B because the asset models are inconsistent. Predictive maintenance models trained on clean single-site data cannot be deployed fleet-wide without a mapping exercise that takes longer than the model took to build. Operations teams stop trusting AI outputs because the same question asked in two different Genie spaces returns two different answers based on inconsistently configured business logic.
The organizational consequence is that AI projects that demonstrate real value in a controlled PoC environment do not get scaled. They get documented as “successful pilots” and deprioritized. The business case for enterprise AI is real; the ability to deliver it at scale is what is missing.
The question for every industrial organization investing in AI is not whether the technology works at a site level. It does. The question is whether shared context exists to make it work across the enterprise.
PART FIVE
How a shared context layer bridges AVEVA and Databricks
The architectural response is a context layer that sits between industrial data sources and the AI and analytics capabilities that consume them. Not a new data store. Not another pipeline. A layer that adds meaning, relationships, and shared business rules to data that already exists, built directly within the Databricks Lakehouse under Unity Catalog governance.
| Layer | What it provides |
| --- | --- |
| AVEVA operational data | Trusted industrial data from PI, historian, ERP, CMMS, and sustainability systems: the authoritative operational record |
| Databricks Lakehouse | Open, scalable, governed foundation for all industrial data where compute, governance, and AI capabilities live |
| Kobai context layer | Shared meaning, relationships, and business rules added directly within the Lakehouse, so that P-101, PU-001, and VUIRATION_PMP_10 are all understood as the same class of asset, subject to the same operational logic |
| AI / analytics / decisions | Trusted insights and faster decisions: root cause analysis in minutes rather than days, fleet-wide AI that generalizes across sites, explainable outputs that operations teams actually trust |
With this layer in place, the engineer who wanted to find pumps similar to P-101 that failed under high vibration conditions can ask that question across the fleet and receive an answer that spans Plant A, Plant B, and Plant C, because the context layer understands that they are all instances of the same asset class. The investigation that previously took days takes minutes. The decision that was made on incomplete information is now made on a connected picture.
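To make the idea concrete, the fleet-wide question can be pictured as a traversal over declared relationships. The sketch below is not Kobai's implementation or any Databricks API. It is a toy in-memory set of triples, and the relation names (`instance_of`, `located_at`, `failed_under`) and the failure records are hypothetical.

```python
# Hypothetical (subject, relation, object) triples. The asset tags come from
# the text; the relations and failure records are invented for illustration.
TRIPLES = [
    ("P-101", "instance_of", "centrifugal_pump"),
    ("PU-001", "instance_of", "centrifugal_pump"),
    ("VUIRATION_PMP_10", "instance_of", "centrifugal_pump"),
    ("P-101", "located_at", "Plant A"),
    ("PU-001", "located_at", "Plant B"),
    ("VUIRATION_PMP_10", "located_at", "Plant C"),
    ("P-101", "failed_under", "high_vibration"),
    ("VUIRATION_PMP_10", "failed_under", "high_vibration"),
    ("PU-001", "failed_under", "seal_leak"),
]


def objects(subject, relation):
    """All objects linked from subject by relation."""
    return {o for s, r, o in TRIPLES if s == subject and r == relation}


def subjects(relation, obj):
    """All subjects linked to obj by relation."""
    return {s for s, r, o in TRIPLES if r == relation and o == obj}


def similar_failures(asset, condition):
    """Other assets in the same class as asset that failed under condition."""
    peers = set()
    for cls in objects(asset, "instance_of"):
        peers |= subjects("instance_of", cls)
    hits = (peers & subjects("failed_under", condition)) - {asset}
    return sorted(
        (a, next(iter(objects(a, "located_at")), "unknown")) for a in hits
    )


print(similar_failures("P-101", "high_vibration"))
```

Because class membership is declared once, the query never mentions Plant C's naming convention, and the legacy tag is still found.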
If AI is the interface, context is the foundation. The future of industrial intelligence is not just connected data. It’s connected understanding.
PART SIX
What this means for AVEVA customers on Databricks
For industrial organizations that have invested in AVEVA and Databricks, the context question is the practical next step. The platform investment has been made. The data is flowing. The question is whether the shared understanding needed to make that investment deliver decisions at scale has been built alongside it.
Faster decisions from existing data
The most immediate impact of a shared context layer is decision speed. Root cause investigations that currently require a data engineer to assemble context from multiple systems become self-service queries. Fleet-wide questions that are currently unanswerable without a manual mapping exercise become routine. The data has not changed. The ability to reason over it has.
AI that scales beyond the pilot site
For organizations where AI has succeeded at one site but stalled at fleet-wide rollout, shared context is the missing precondition. A common asset model that normalizes naming conventions across sites and declares the relationships between assets, events, and operational context is what allows a predictive maintenance model to generalize from Plant A to Plant B without a site-by-site reconfiguration.
Trusted AI with explainable outputs
As AI governance requirements tighten in energy, chemicals, and process manufacturing, the ability to explain why an AI recommendation was made, and to trace it back to the specific data and relationships that produced it, becomes a business and regulatory requirement. A context layer built within the Databricks Lakehouse under Unity Catalog governance produces that traceability as a natural output of the architecture.
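One way to picture traceability is a recommendation that carries references to the records it was derived from. The sketch below assumes a simple in-memory structure; the class names, the recommendation text, and the record identifiers are hypothetical and do not represent the output format of any product named here.

```python
from dataclasses import dataclass, field


@dataclass
class Evidence:
    source: str  # system the record came from, e.g. a CMMS or historian
    record: str  # identifier or short description of the record


@dataclass
class Recommendation:
    text: str
    evidence: list[Evidence] = field(default_factory=list)

    def explain(self) -> str:
        """Render the recommendation followed by the records behind it."""
        lines = [self.text]
        lines += [f"  based on {e.source}: {e.record}" for e in self.evidence]
        return "\n".join(lines)


# Invented example values for illustration only.
rec = Recommendation(
    "Inspect PU-001 bearings",
    [
        Evidence("CMMS", "work order on P-101"),
        Evidence("historian", "vibration trend, prior 72 hours"),
    ],
)
print(rec.explain())
```

Keeping the evidence attached to the answer means the explanation can be checked against the source systems rather than reconstructed afterwards.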
Databricks + Kobai: shared context for industrial AI
Kobai extends the Databricks Lakehouse with the shared context layer that industrial AI requires to scale. Graph structures are built directly within Databricks under Unity Catalog governance, connecting AVEVA operational data with engineering, maintenance, sustainability, and enterprise data into a unified operational picture.
Watch Ryan Oattes’ full AVEVA World presentation here:
To explore how shared context on the Databricks Lakehouse can extend the value of your AVEVA investment, visit kobai.io or contact us at contact@kobai.io.

