Data lakes were designed to offer a strong alternative to traditional data warehouses by delivering optimal flexibility, agility and quick time-to-market for a wide range of analytics use cases. As storage becomes a commodity, data consumers demand to analyze all available data instantly, without waiting for lengthy data modeling projects. The data lake promise centers on zero data modeling or duplication and the ability to run queries on any data in its granular format as the single source of truth.
The critical challenges are often related to efficiency. Data warehouses deliver very strong price / performance but fail to deliver agility. Data lake query engines are based on brute-force technology, which means 90% of compute resources are wasted on scanning and filtering through massive amounts of data, resulting in a poor price / performance balance. To truly realize the value of data lakes, data teams are burdened with heavy optimizations and DataOps, which often push them away from data lake architecture and back towards optimized data silos.
Who are your first pilot customers? (It’s okay if you don’t have one yet, but if you do, please list):
We are currently working with a wide range of customers, including large cybersecurity vendors, international pharmaceuticals, large media companies, etc.
In most deployments, there is significant pressure on data platform teams to adopt and expand the usage of a data lake architecture. In Varada’s case, these customers also demand that the analytics stack run within their VPC to meet agility and policy requirements, as well as to avoid lock-in.
Across the board, our customers experience challenges in meeting performance and cost requirements on highly dimensional data. This is where Varada’s autonomous indexing technology shines.
“By keeping our existing data lake and operating within our own VPC, Varada’s solution was not only easy to deploy, it also enabled us to apply our own strict security and data policies and maintain full control throughout the data and application stack.” Head of data platforms, endpoint protection vendor.
Our customers see a 10x-100x performance uplift on 10x more data, often with a 40%-60% TCO reduction.
What technologies do you use?:
Varada offers a data lake analytics platform with unique autonomous indexing technology, driving dramatic performance uplift while eliminating the need for data modeling and multiple optimized data platforms. The solution runs in the customer’s VPC and on its data lake, and can be deployed as a standalone cluster or connected to existing Presto / Trino clusters.
Varada is 10x-100x faster than any other data lake query engine. Data teams and users no longer need to compromise on performance or fast time-to-insights.
Varada’s proprietary indexing logic automatically analyzes the data lake and introduces indexes for filtering, joins and aggregates, continuously evaluating query performance on the fly.
Varada’s engine automatically prioritizes the data to index or cache based on a smart observability layer that continuously monitors demand. Varada indexes data directly from the data lake across any column. This means that every query is optimized automatically. We recently introduced elastic scaling capabilities that further extend TCO & performance advantages.
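To make the idea of demand-driven prioritization concrete, here is a minimal sketch in Python. It is not Varada's proprietary logic: the names `ColumnStats`, `record_query` and `choose_columns_to_index`, the per-column cost figures and the greedy "demand per GB" rule are all assumptions made purely for illustration.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class ColumnStats:
    name: str
    index_cost_gb: float  # hypothetical cost of indexing this column (must be > 0)


def record_query(demand: Counter, filtered_columns: list[str]) -> None:
    """Count each column a query filtered on (stand-in for an observability layer)."""
    demand.update(filtered_columns)


def choose_columns_to_index(
    demand: Counter, columns: list[ColumnStats], budget_gb: float
) -> list[str]:
    """Greedily pick the columns with the highest demand per GB that fit the budget."""
    ranked = sorted(
        (c for c in columns if demand[c.name] > 0),  # ignore columns nobody queries
        key=lambda c: demand[c.name] / c.index_cost_gb,
        reverse=True,
    )
    chosen, used = [], 0.0
    for col in ranked:
        if used + col.index_cost_gb <= budget_gb:
            chosen.append(col.name)
            used += col.index_cost_gb
    return chosen


# Illustrative data only: four columns and four observed queries.
columns = [
    ColumnStats("user_id", 2.0),
    ColumnStats("event_type", 1.0),
    ColumnStats("country", 2.0),
    ColumnStats("payload", 10.0),
]
demand = Counter()
for filters in (["user_id", "event_type"], ["event_type"],
                ["country", "event_type"], ["user_id"]):
    record_query(demand, filters)

print(choose_columns_to_index(demand, columns, budget_gb=3.0))
```

With this made-up workload and a 3 GB budget, the sketch selects `event_type` and `user_id`, the two columns with the most filter demand per GB, and leaves the unqueried `payload` column unindexed. A real system would also weigh selectivity, data freshness and cache eviction, none of which are modeled here.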
Which category do you see yourself in?:
Data to Decisions and AI
Where is the company headquartered?:
Tel Aviv
How many employees do you have today?:
26-50
What is the state of your minimal viable offering?:
The video above demonstrates the value of Varada's platform vs. AWS Athena, a native AWS serverless Presto platform.
Customers can expect similar advantages vs. other Presto and Trino implementations, as well as cloud data warehouses such as Redshift or Snowflake.