Blazing Fast Data Ingestion with Kobai Saturn on Databricks

In today’s fast-paced data-driven world, speed and efficiency are paramount. At Kobai, we’ve taken data ingestion performance to the next level by harnessing the full power of Databricks.

September 30, 2024

Our codeless, user-friendly interface configures everything, and once the data starts flowing, Kobai steps back and lets Databricks handle the heavy lifting. The result? A blazing-fast Knowledge Graph that’s not just quick to assemble but robust, flexible, and capable of scaling with your business demands. Here’s a closer look at how Kobai Saturn supercharges data ingestion.

Performance That Speaks for Itself

Recently, we ran an internal benchmark that shows what our technology can do at scale:

  • 28.9 Billion Triples Ingested in Just 46 Minutes
    • Unlike lab-grown tests with curated data, this was a real-world example. Kobai was running on a Serverless SQL cluster in Azure Databricks, and the data, in CSV format, was stored across multiple S3 buckets on AWS. There were no shortcuts. We leveraged Databricks’ best-in-class technologies, including Liquid Clustering, the Photon query engine, and Spark's parallel processing. Together, these features enabled us to process 28.9 billion triples in well under an hour.
  • Real Results, Real Savings
    • Not only was ingestion lightning-fast, but it was also cost-effective.
    • Cost: $145.66 on Databricks 3XL Serverless ($190.40/hr) for 46 minutes

For comparison, using a smaller XL cluster took 121 minutes and cost $112.93. Why not use even more powerful hardware? Our tests found that the 4XL configuration ran up against ingress limits on our Azure account. But don’t worry, we’re always optimizing!
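
The arithmetic behind these figures can be checked with a short Python sketch. The inputs are the numbers quoted above; the helper names are illustrative, and the sketch assumes simple pro rata billing at the hourly rate and that the XL run ingested the same 28.9 billion triples. The XL hourly rate is not quoted in this post, so it is back-calculated from the reported cost and runtime.

```python
# Figures quoted in the benchmark above.
TRIPLES = 28.9e9            # triples ingested
RATE_3XL_PER_HOUR = 190.40  # USD/hr, Databricks 3XL Serverless
MINUTES_3XL = 46
REPORTED_COST_3XL = 145.66  # USD
REPORTED_COST_XL = 112.93   # USD
MINUTES_XL = 121


def run_cost(rate_per_hour: float, minutes: float) -> float:
    """Cost of a run, assuming pro rata billing at an hourly rate."""
    return rate_per_hour * minutes / 60


def triples_per_second(triples: float, minutes: float) -> float:
    """Average ingestion throughput over the whole run."""
    return triples / (minutes * 60)


cost_3xl = run_cost(RATE_3XL_PER_HOUR, MINUTES_3XL)
# Assumption: the XL rate is back-calculated, not quoted in the post.
implied_xl_rate = REPORTED_COST_XL / (MINUTES_XL / 60)
speedup = MINUTES_XL / MINUTES_3XL
cost_ratio = REPORTED_COST_3XL / REPORTED_COST_XL

print(f"3XL cost at a full 46 minutes: ${cost_3xl:.2f}")
print(f"Implied XL hourly rate: ${implied_xl_rate:.2f}")
print(f"3XL throughput: {triples_per_second(TRIPLES, MINUTES_3XL):,.0f} triples/s")
print(f"XL throughput: {triples_per_second(TRIPLES, MINUTES_XL):,.0f} triples/s")
print(f"3XL vs XL: {speedup:.2f}x faster for {cost_ratio:.2f}x the cost")
```

Under these assumptions, a full 46 minutes on 3XL comes to about $145.97; the small gap to the reported $145.66 would be consistent with a runtime slightly under 46 minutes, though that is an inference rather than a stated fact. The implied XL rate is about $56/hr, and the 3XL run finishes roughly 2.6 times faster for roughly 1.3 times the cost.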

The lesson here is clear: with Kobai Saturn, you get the flexibility to run at full throttle when needed, without committing to long-term high infrastructure costs.
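
For readers who want a feel for the Databricks features named in the benchmark, the sketch below builds the kind of SQL that could load CSV files from S3 into a Delta table that uses Liquid Clustering. It is an illustration under stated assumptions, not Kobai's actual pipeline (Kobai configures ingestion through its codeless interface): the table name `kg.triples`, the three-column schema, the clustering keys, and the bucket paths are all hypothetical, and credential setup for reading S3 is omitted.

```python
# Hypothetical source locations; the real benchmark paths are not published.
SOURCE_PATHS = [
    "s3://example-bucket-a/triples/",
    "s3://example-bucket-b/triples/",
]


def create_table_sql(table: str) -> str:
    """Delta table definition; CLUSTER BY enables Liquid Clustering."""
    return (
        f"CREATE TABLE IF NOT EXISTS {table} "
        "(subject STRING, predicate STRING, object STRING) "
        "CLUSTER BY (predicate, subject)"
    )


def copy_into_sql(table: str, source_path: str) -> str:
    """Bulk-load CSV files from one location into the target table."""
    return (
        f"COPY INTO {table} FROM '{source_path}' "
        "FILEFORMAT = CSV "
        "FORMAT_OPTIONS ('header' = 'true')"
    )


# One CREATE TABLE followed by one COPY INTO per source location.
statements = [create_table_sql("kg.triples")]
statements += [copy_into_sql("kg.triples", path) for path in SOURCE_PATHS]

for statement in statements:
    print(statement + ";")
```

On Databricks, each generated statement could be run in a SQL editor or passed to `spark.sql(...)`; the choice of clustering keys here is only a guess at what suits a triple table.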

Maximize Efficiency with Flexible Compute

Once your Knowledge Graph is built, it’s ready to go. Unlike other solutions that require continuous server activity, Kobai stores data efficiently in Databricks Delta tables, allowing you to choose when and how to allocate resources.

  • You can separate compute resources for ingestion and queries. This means you can run ingestion at full speed without locking in expensive infrastructure for longer than necessary.
  • Got an active data science team running deep analysis on the same data your company-wide dashboards rely on? No problem! With Kobai, you can assign right-sized clusters for every use case, ensuring everyone has what they need without over-committing resources.
  • If your ML team takes a long weekend, you can rest easy. Simply shut down their compute resources without losing any of the data. When they return, the compute will be back online in seconds, and no re-ingestion is required.

A Smarter Alternative to Virtual Graphs

Virtual graphs might sound appealing at first—leaving your data where it resides and running queries across multiple silos without ingestion. But remember why you want a Knowledge Graph: it’s about integrating data across your entire business for contextualized decision-making. In reality, virtual graphs introduce challenges, especially when your queries span multiple silos.

With Kobai, ingestion ensures that your data is ready to deliver insights in real time across your organization, without the need for extensive architectural planning. Because your data is kept within a Knowledge Graph, it’s always available, scalable, and prepared to handle emerging workloads, including AI-driven analyses.

Continuous Optimization and Incremental Updates

High-speed bulk ingestion is only part of the story: Kobai also supports incremental updates. You can continuously add new data without needing to re-ingest everything from scratch. This means your system evolves as your data does, staying relevant and efficient without unnecessary costs.
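
One common way to express this kind of incremental update on a Delta table is a `MERGE INTO` statement that inserts only rows not already present. The sketch below builds such a statement in Python. Whether Kobai uses this approach internally is not stated here, so treat it purely as an illustration: the table names `kg.triples` and `kg.staged_triples` and the three triple columns are hypothetical.

```python
# Hypothetical triple columns used to decide whether a row already exists.
TRIPLE_COLUMNS = ("subject", "predicate", "object")


def merge_new_triples_sql(target: str, staged: str) -> str:
    """Build a Delta MERGE that appends only triples missing from the target."""
    match_condition = " AND ".join(f"t.{col} = s.{col}" for col in TRIPLE_COLUMNS)
    return (
        f"MERGE INTO {target} AS t "
        f"USING {staged} AS s "
        f"ON {match_condition} "
        "WHEN NOT MATCHED THEN INSERT *"
    )


sql = merge_new_triples_sql("kg.triples", "kg.staged_triples")
print(sql)
```

Because existing rows are matched rather than rewritten, only the newly staged triples are added, which is the behavior described above: no re-ingestion of data that is already in the graph.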

A Seamless User Experience

While these technical advances are game-changing, what truly sets Kobai apart is how it integrates into your business. From data engineers and scientists to business leaders, everyone benefits from faster data ingestion and more flexible compute. Imagine how much more quickly your team can make data-driven decisions when the data is instantly available and perfectly contextualized.

Let Kobai Do the Heavy Lifting

We hope you don’t need this horsepower very often, but when a new data source becomes available, you’ll appreciate how fast Kobai can bring it into your Knowledge Graph. With Kobai Saturn, you’re not just ingesting data—you’re adding layers of context and insight that help power smarter decisions across your business.

Ready to see it in action? Contact us today to learn more or schedule a demo and experience the speed and efficiency of Kobai Saturn for yourself.
