Teradata QueryGrid

Teradata QueryGrid, Teradata Corporation

2015

Supernova Award Category

The Problem

The types of data and the types of analytical engines have grown more diverse. Companies now run many different technologies, sourced from different vendors, for their operational and analytical systems. Because these systems are not integrated, users treat them as separate, which limits many companies' ability to get timely answers to their business questions and ultimately hurts productivity. Rather than struggling with growing data volumes, organizations need to access and benefit from the data coming from this diverse and expanding array of sources. Enterprises learn more, derive more value from their data assets, and excel in their business by applying the best analytical tool to comprehensive, integrated data. The challenge is to retrieve and analyze this tremendous volume of data efficiently when it resides in non-integrated systems from various vendors, which often have complex processing requirements. Users need uninterrupted, self-service access to all types of data, including big data, regardless of where that data is stored. That means IT must be able to move data between systems and incorporate data from multiple systems without delay.

The Solution

Teradata identified the need to give SQL queries submitted to the Teradata Integrated Data Warehouse (IDW) or the Teradata Aster Discovery Platform high-performance, transparent access to data and processing capabilities on other systems. Teradata QueryGrid was conceived and designed to extend these platforms across the analytical environment, delivering the value of integration to modern, multi-faceted data architectures. Teradata also addressed the need to move data and processing requests bi-directionally between the data warehouse and other systems, covering a variety of target systems from Teradata and other vendors. The vision grew out of a narrower project called SQL-H, which provides access to the increasingly popular Hadoop data lake from the Teradata Aster Discovery Platform and the Integrated Data Warehouse. That project showed us the benefit of a larger solution that connects and orchestrates queries across a variety of platforms.

The Results

Teradata QueryGrid bridges the islands of data and processing within an organization. It delivers a cohesive data fabric spanning multiple systems, giving companies the flexibility to pick the file systems, operating systems, data types, analytic engines, and system design characteristics that meet their business needs. Teradata QueryGrid orchestrates query execution across multiple processing engines and repositories in a secure and optimized manner. Because users no longer need to transfer intermediate results or switch between systems, it eliminates the wait for IT to move and incorporate data across systems and lets analyses and lines of thought proceed uninterrupted. Teradata QueryGrid connects the isolated data and analytical environments within an enterprise and gives business users self-service access to big data. It unifies the various analytical data stores and makes specialized analytical capabilities in one type of system (e.g., graph functions) available to operate on data in another system. Ultimately, the solution extends the analytical capabilities of users and application developers alike through data and system integration.

Metrics

Key metrics used to evaluate Teradata QueryGrid include the time-to-value of new applications and “data augmentation frequency,” which measures how much more often data of previously unknown value from the data lake is used together with other data, or other data is augmented with known facts from the data warehouse.

Teradata QueryGrid enables and streamlines processes in which data of different kinds, multiple analytical systems, or both need to be used together: on demand, self-service, at query run time. This situation occurs in a wide variety of business use cases. For example, web click data in a data lake may need to be combined with customer contact and purchase data from other channels for customer service or business analysis. In a very different scenario, sensor data from a product in the field, such as diagnostic data sent from a car that is overheating, needs to be combined with manufacturing supplier data from the data warehouse. In yet another, a customer-facing web application backed by a MongoDB database needs to be augmented with customer data and next-best-offer determinations from the data warehouse.

The Technology

The Teradata Database table operator feature enables the database to process data from any source, even one outside the database, by streaming results in parallel into the database's working area. Teradata used this capability to create a parallel, bi-directional connection from the remote data source directly into the working area of Teradata Database, so that data and processing on the other platform behave as if they were local to Teradata Database.
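The mechanism described above can be illustrated with a small conceptual model. This is a minimal sketch, not Teradata code, and it uses no Teradata API: the names `REMOTE_PARTITIONS`, `LOCAL_CUSTOMERS`, `stream_partition`, `remote_table_operator`, and `join_with_local` are hypothetical. It assumes a remote system whose data is split into partitions, reads each partition as its own stream concurrently into a local "working area," and then joins the result with a local table as if both were local. It models only the import direction of the bi-directional connection.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, Iterator, List, Tuple

Row = Tuple[int, str]  # (customer_id, event)

# Hypothetical remote system (e.g., a data lake) whose rows are split across partitions.
REMOTE_PARTITIONS: List[List[Row]] = [
    [(1, "click:/home"), (2, "click:/cart")],
    [(1, "click:/checkout")],
    [(3, "click:/help")],
]

# Hypothetical local warehouse table: customer_id -> customer name.
LOCAL_CUSTOMERS: Dict[int, str] = {1: "Alice", 2: "Bob", 3: "Carol"}


def stream_partition(partition: List[Row]) -> Iterator[Row]:
    """Yield the rows of one remote partition one at a time (one parallel stream)."""
    for row in partition:
        yield row


def remote_table_operator(partitions: List[List[Row]]) -> List[Row]:
    """Read all remote partitions concurrently and collect them in a local working area."""
    working_area: List[Row] = []
    with ThreadPoolExecutor(max_workers=max(1, len(partitions))) as pool:
        # pool.map returns results in partition order, so the output is deterministic.
        for rows in pool.map(lambda p: list(stream_partition(p)), partitions):
            working_area.extend(rows)
    return working_area


def join_with_local(remote_rows: List[Row]) -> List[Tuple[str, str]]:
    """Join the imported remote rows with the local customer table as if both were local."""
    return [
        (LOCAL_CUSTOMERS[cid], event)
        for cid, event in remote_rows
        if cid in LOCAL_CUSTOMERS
    ]


if __name__ == "__main__":
    for name, event in join_with_local(remote_table_operator(REMOTE_PARTITIONS)):
        print(name, event)
```

The thread pool here only stands in for the parallel result streams; in the database itself, a table operator would run across the system's parallel processing units rather than Python threads, and the query would be written in SQL.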

Disruptive Factor

The benefit of integrating data in a data warehouse, rather than in isolated subject- or application-specific data marts, is well recognized. Companies trying to use information in an integrated manner faced challenges, and often delayed applications or abandoned them altogether while waiting for relevant data to be moved in a batch process. Teradata QueryGrid brings the same value of integration to big data and to the new types of analytical engines that have recently become available.

By removing extensive data collection and connectivity challenges, the data fabric puts the focus on getting answers to business questions rather than on the underlying IT processes or infrastructure. Businesses can put big data to work, extending access to users throughout the organization and enabling more self-service access to data, which raises the level at which employees can work in their roles. It transparently harnesses the combined power of multiple analytic engines and datasets to address a business question.

QueryGrid is an innovative approach to integrating fundamentally different types of data for the most efficient use, and it enables capabilities that were not previously available: web applications can now use data from multiple systems in a single request; a single query from an application can use data in multiple systems; and end users who know SQL can make ad hoc requests without waiting for IT to move data between systems or prepare extracts in advance, reducing data preparation time.

Shining Moment

Many areas of analytical technology can benefit from breaking down barriers and widening the potential scope of analyses. Repetitive or programmatic analyses, such as dashboards and reports, as well as ad hoc or self-service analyses, can now draw on all available data, analytical tools, and resources, including the newest techniques. IT services also benefit, because data no longer always has to be combined in the same physical system or moved to an engine with a unique analytical capability.