About this Constellation ShortList

Data lakes serve as the first destination for data in all its forms, including structured transactional records, semi-structured/variably structured data types (e.g., log files, clickstreams, email, social streams, text documents) and unstructured content (e.g., images, audio and video files). Data lake management enables organizations to bring order and accessibility to data lakes, filling gaps in the management capabilities of platforms including Hadoop, Spark and high-scale object stores.

The category covers data management, data lineage, metadata management and governance. It is important for any organization that wants to take advantage of high-scale data, particularly as data sources and data-driven applications multiply and as data scale and diversity grow, adding to the complexity of data lakes and the risk that data within them is lost or unusable.

With this update, Kylo has been added based on the combination of dozens of successful deployments within large, well-known enterprises and significant contributions from a growing open source community. Unifi Software was added for its breadth of capabilities, extending into data cataloging and self-service data preparation. 

Threshold Criteria

Constellation considers the following criteria for these solutions: 

  • Native ability to connect to myriad data sources and ingest diverse data types, including structured, semi-structured and unstructured data sources, and file formats native to Hadoop, NoSQL databases and relational platforms
  • Visual data-flow orchestration interfaces, data-parsing and data-transformation capabilities, job-scheduling modules and services-based data-delivery capabilities
  • Works with Hadoop, Spark and high-scale object stores, including compatibility with YARN, native security, governance modules, MapReduce processing, and open-source tools such as Hive and Presto
  • Metadata management, including the ability to capture and apply metadata classifications by source, asset type and business language, giving organizations better insight into data lake content
  • Governance capabilities ensure that organizations can meet strict policies and compliance requirements, supplementing Hadoop-native security controls and data-lineage tracking 

The Constellation ShortList™

Constellation evaluates more than a dozen solutions categorized in this market. This Constellation ShortList is determined by client inquiries, partner conversations, customer references, vendor selection projects, market share and internal research.

  • IBM
  • Informatica
  • Kylo
  • Oracle
  • Podium Data
  • Unifi Software
  • Zaloni

Frequency of Evaluation

Each Constellation ShortList evaluation will be updated every 180 days as needed. 

Evaluation Services

Constellation clients may work with the analyst and research team to conduct a more thorough discussion of this ShortList. Constellation can also provide guidance in vendor selection and contract negotiation.

