About this Constellation ShortList

Data lakes serve as the first destination for data in all its forms, including structured transactional records, semi-structured or variably structured data types (e.g., log files, clickstreams, email, social streams, text documents) and unstructured content (e.g., images, audio and video files). Data lake management enables organizations to bring order and accessibility to data lakes, filling gaps in the management capabilities of platforms including Hadoop, Spark and high-scale object stores.

The category covers data management, data lineage, metadata management and governance. It is important for any organization that wants to take advantage of high-scale data, particularly as data sources and data-driven applications multiply and as data scale and diversity grow, adding to the complexity of data lakes and to the risk of lost and unusable data within them.

The criteria for inclusion in this Constellation ShortList have been modified to reflect heightened expectations for supporting cloud deployments. Since the Q3 2018 publication of this list, Qlik has completed its acquisition of Podium Data and has rebranded and updated the product as Qlik Data Catalyst. Kylo has been removed from this update, as it will no longer be supported by Teradata after September 2019.

Threshold Criteria

Constellation considers the following criteria for these solutions: 

  • Native ability to connect to myriad data sources and ingest diverse data types, including structured, semi-structured and unstructured data sources, and file formats native to Hadoop, NoSQL databases, relational platforms and cloud object stores.
  • Includes visual data-flow orchestration interfaces, data-parsing and data-transformation capabilities, job-scheduling modules and services-based data-delivery capabilities.
  • Works with Hadoop, Spark and cloud-based object stores and data lake services. Includes compatibility with Sentry, Ranger, other native security and access controls, governance modules, and open-source tools such as Hive and Presto.
  • Metadata management, including the ability to capture and apply metadata classifications by source, asset type and business language, giving organizations better insight into data lake content.
  • Governance capabilities that ensure organizations can meet strict policy and compliance requirements, supplementing Hadoop-, Spark- and cloud-service-native security, access controls and data-lineage tracking.

The Constellation ShortList™

Constellation evaluates more than a dozen solutions categorized in this market. This Constellation ShortList is determined by client inquiries, partner conversations, customer references, vendor selection projects, market share and internal research.

  • IBM
  • Informatica
  • Oracle
  • Qlik
  • Unifi Software
  • Zaloni

Frequency of Evaluation

Each Constellation ShortList evaluation will be updated every 180 days as needed. 

Evaluation Services

Constellation clients may work with the analyst and research team to conduct a more thorough discussion of this ShortList. Constellation can also provide guidance in vendor selection and contract negotiation.
