About this Constellation ShortList™

Data lakes serve as the first destination for data in all its forms, including structured transactional records, semi-structured or variably structured data (e.g., log files, clickstreams, email, social streams, text documents) and unstructured content (e.g., images, audio and video files). Data lake management enables organizations to bring order and accessibility to data lakes, filling gaps in the management capabilities of platforms such as object stores, Spark and Hadoop.

This category covers data management, data lineage, metadata management and governance. It is important for any organization that wants to take advantage of data at high scale. Management prowess becomes particularly important as data sources and data-driven applications multiply and as data scale and diversity grow. All of these factors add complexity and increase the risk of lost or unusable data.

The criteria for inclusion in this list have been modified to reflect heightened expectations for supporting hybrid and multi-cloud deployment. As a result, Oracle has been dropped because it has yet to make its OCI Data Flow (Spark) and OCI Data Catalog services generally available. Constellation is also monitoring Oracle’s containerization efforts and its partnership with Microsoft Azure with respect to multi-cloud data lake management, and will reassess Oracle for the Q1 2020 update.

Threshold Criteria

Constellation considers the following criteria for these solutions: 

  • Native ability to connect to myriad data sources and ingest diverse data types, including structured, semi-structured and unstructured data sources, and file formats native to cloud object stores, relational platforms, NoSQL databases, Spark and Hadoop.
  • Includes visual data-flow orchestration interfaces, data-parsing and data-transformation capabilities, job-scheduling modules and services-based data-delivery capabilities.
  • Works with cloud-based object stores and data-lake services, Apache Spark and Hadoop distributions. Is compatible with the native security, access-control and governance modules of big-data platforms, and supports management of data lakes deployed on premises or across multiple public clouds.
  • Metadata management, including the ability to capture and apply metadata classifications by source, asset type and business language, giving organizations better insight into data lake content.
  • Governance capabilities that help organizations meet strict policy and compliance requirements, supplementing the native security, access controls and data-lineage tracking of Hadoop, Spark and cloud services.

The Constellation ShortList™

Constellation evaluates more than a dozen solutions categorized in this market. This Constellation ShortList is determined by client inquiries, partner conversations, customer references, vendor selection projects, market share and internal research.

  • IBM
  • Informatica
  • Qlik
  • Unifi Software
  • Zaloni

Frequency of Evaluation

Each Constellation ShortList evaluation will be updated every 180 days as needed. 

Evaluation Services

If you would like to put our extensive expertise in vendor selections and contract negotiations to work for you, please contact us at ShortList@constellationr.com.
