DataPelago aims to save enterprises money via universal data processing

As data continues to be key to business success, enterprises are racing to drive maximum value from the information in hand. But the volume of enterprise data is growing so quickly — doubling every two years — that the computing power to process it in a timely and cost-efficient manner is hitting a ceiling. 

California-based DataPelago aims to solve this with a “universal data processing engine” that lets enterprises supercharge the performance of existing data query engines (including open-source ones) using accelerated computing elements such as GPUs and FPGAs (field-programmable gate arrays). This enables the engines to process exponentially growing volumes of complex data across varied formats.

The startup has just emerged from stealth but is already claiming to deliver a five-fold reduction in query/job latency while providing significant cost benefits. It has also raised $47 million in funding with the backing of multiple venture capital firms, including Eclipse, Taiwania Capital, Qualcomm Ventures, Alter Venture Partners, Nautilus Venture Partners and Silicon Valley Bank.

Addressing the data challenge

More than a decade ago, structured and semi-structured data analysis was the go-to option for data-driven growth, providing enterprises with a snapshot of how their business was performing and what needed to be fixed.

The approach worked well, but the evolution of technology also led to the rise of unstructured data (images, PDFs, audio and video files) within enterprise systems. Initially, the volume of this data was small, but today it accounts for 90% of all information created (far more than structured/semi-structured data) and is critical for advanced enterprise applications like large language models.

Now, as enterprises look to mobilize all their data assets, including large volumes of unstructured data, for these use cases, they are running into performance bottlenecks and struggling to process the data in a timely, cost-effective way.

The reason, as DataPelago CEO Rajan Goyal says, is the computing limitation of legacy platforms, which were originally designed for structured data and general-purpose computing (CPUs).

“Today, companies have two choices for accelerated data processing…Open-source systems offered as a managed service by cloud service providers have smaller licensing fees but require users to pay more for cloud infrastructure compute costs to reach an acceptable level of performance. On the other hand, proprietary services (built with open-source frameworks or otherwise) can be inherently more performant, but they have much higher licensing fees. Both choices result in higher total cost of ownership (TCO) for customers,” he explained.

To address this performance and cost gap for next-gen data workloads, Goyal started building DataPelago, a unified platform that dynamically accelerates query engines with accelerated computing hardware like GPUs and FPGAs, enabling them to handle advanced processing needs for all types of data without a massive increase in TCO.

“Our engine accelerates open-source query engines like Apache Spark or Trino with the power of GPUs resulting in a 10:1 reduction in the server count, which results in lower infrastructure cost and lower licensing cost in the same proportion. Customers see disruptive price/performance advantages, making it viable to leverage all the data they have at their disposal,” the CEO noted.

At the core, DataPelago’s offering uses three main components: DataApp, DataVM and DataOS. The DataApp is a pluggable layer that integrates DataPelago with open data processing frameworks like Apache Spark or Trino, extending them at the planner and executor node level.

Once the framework is deployed and the user runs a query or data pipeline, it executes unmodified, with no changes required to the user-facing application. On the backend, the framework’s planner converts the query into a plan, which DataPelago then picks up. The engine uses an open-source library such as Apache Gluten to convert the plan into an open-standard intermediate representation (IR) called Substrait. The IR is sent to the executor node, where DataOS converts it into an executable data flow graph (DFG).
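
For readers who want a concrete picture, the snippet below is a minimal sketch, assuming vanilla Apache Spark with the open-source Apache Gluten plugin mentioned above, of how this style of pluggable acceleration is typically switched on: the query itself stays plain Spark SQL, and the accelerator is enabled purely through session configuration. The plugin class name and memory settings vary across Gluten releases (and the Gluten jars must be on the classpath), so treat the exact values as illustrative rather than authoritative.

```python
# Minimal sketch: enabling a pluggable accelerator (Apache Gluten) on Spark.
# Settings are illustrative; newer Gluten releases use the class name
# "org.apache.gluten.GlutenPlugin" instead.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("gluten-accelerated-query")
    .config("spark.plugins", "io.glutenproject.GlutenPlugin")
    # Gluten's native backends execute on off-heap memory.
    .config("spark.memory.offHeap.enabled", "true")
    .config("spark.memory.offHeap.size", "4g")
    .getOrCreate()
)

# The query below is ordinary Spark SQL; the plugin intercepts the physical
# plan behind the scenes and offloads supported operators.
df = spark.createDataFrame(
    [("emea", 120.0), ("apac", 80.0), ("emea", 40.0)],
    ["region", "revenue"],
)
df.createOrReplaceTempView("sales")
spark.sql("SELECT region, SUM(revenue) AS total FROM sales GROUP BY region").show()
```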

Finally, the DataVM evaluates the nodes of the DFG and dynamically maps each one to the best-suited computing element (CPU, FPGA, Nvidia GPU or AMD GPU) based on availability and cost/performance characteristics. This way, the system routes the workload to the most suitable hardware available from hyperscalers or GPU cloud providers to maximize performance and cost benefits.
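
DataPelago has not published this layer’s API, so the short Python sketch below only illustrates the dispatch idea in the abstract: score each node of a data flow graph against the available compute elements and route it to the cheapest one that supports its operator. All names, operators and cost figures here are hypothetical.

```python
# Conceptual sketch of DFG-node-to-hardware dispatch. Not DataPelago's API;
# every name and number below is hypothetical.
from dataclasses import dataclass


@dataclass
class ComputeElement:
    name: str            # e.g. "cpu", "nvidia-gpu", "fpga"
    supported_ops: set   # operator kinds this element can execute
    cost_per_op: float   # illustrative relative cost score (lower is better)


@dataclass
class DfgNode:
    op: str              # e.g. "scan", "filter", "hash-join"


# Hypothetical inventory: the CPU can run everything, while the accelerators
# cover a subset of operators at a much lower cost score.
ELEMENTS = [
    ComputeElement("cpu", {"scan", "filter", "hash-join", "regex-extract"}, 1.0),
    ComputeElement("nvidia-gpu", {"filter", "hash-join"}, 0.2),
    ComputeElement("fpga", {"regex-extract", "filter"}, 0.1),
]


def place(node: DfgNode) -> ComputeElement:
    # Pick the cheapest element that supports the operator; because the CPU
    # supports every operator in this toy setup, a candidate always exists.
    candidates = [e for e in ELEMENTS if node.op in e.supported_ops]
    return min(candidates, key=lambda e: e.cost_per_op)


plan = [DfgNode("scan"), DfgNode("filter"), DfgNode("hash-join"), DfgNode("regex-extract")]
for node in plan:
    print(f"{node.op:>14} -> {place(node).name}")
```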

Significant savings for early DataPelago adopters

While dynamically accelerating query engines with specialized hardware is a new approach, the company already claims it can deliver a five-fold reduction in query/job latency and a two-fold reduction in TCO compared with existing data processing engines.

“One company we’re working with was spending $140M on one workload, with 90% of this cost going to compute. We are able to decrease their total spend to < $50M,” Goyal said.
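
Taking the quoted figures at face value, a quick back-of-the-envelope calculation shows what they imply; the 90% compute share comes from the quote, while the assumption that non-compute spend stays flat is ours.

```python
# Back-of-the-envelope check of the quoted figures. The 90%/10% split is
# from the quote; the flat non-compute spend is our assumption.
total = 140.0            # $M, original annual spend on the workload
compute = 0.9 * total    # $126M of it goes to compute
other = total - compute  # $14M of non-compute spend, assumed unchanged

target = 50.0            # $M, the new total spend Goyal cites
# Compute would have to fall to target - other = $36M,
# i.e. roughly a 3.5x reduction in compute cost.
required_factor = compute / (target - other)
print(f"required compute-cost reduction: {required_factor:.1f}x")
```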

The CEO did not share the total number of companies working with DataPelago, but he did point out that the company is seeing significant traction from enterprises across verticals such as security, manufacturing, finance, telecommunications, SaaS and retail. The existing customer base includes notable names such as Samsung SDS, McAfee and insurance technology provider Akad Seguros, he added.

“DataPelago’s engine allows us to unify our GenAI and data analytics pipelines by processing structured, semi-structured, and unstructured data on the same pipeline while reducing our costs by more than 50%,” André Fichel, CTO at Akad Seguros, said in a statement.

As the next step, Goyal plans to build on this work and take the company’s solution to more enterprises looking to accelerate their data workloads while keeping costs in check.

“The next phase of growth for DataPelago is building out our go-to-market team to help us manage the high number of customer conversations we’re already engaging in, as well as continue to grow into a global service,” he said.


