Supernova Award Category
Data to Decisions
The Problem
More and more companies are moving their offerings to the cloud so that users can access their products from mobile and other devices. Trifacta v4, a new version of the company's flagship Trifacta Wrangler Enterprise product, made Builder generally available. Builder is a menu-driven workflow that guides users through data wrangling steps. With this release, Trifacta also wanted to support cloud deployment through integrations with AWS, Google Cloud Platform and Microsoft Azure.
As organizations continue to invest in Google Cloud Platform (GCP), Google sought to break down the barriers to data analytics so that it could be adopted more widely across organizations. Google identified data preparation as a consistent bottleneck for customers: by some estimates it consumes up to 80% of total workload process time, and more often than not it was limited to customers' most technical staff.
The Solution
The goal of this innovation was to add data preparation support to GCP, allowing existing customers to extend data preparation to a wider range of users and, ultimately, to get more value from their data across the organization. More specifically, Google envisioned a solution that would enable seamless preparation of the vast amount of marketing data generated by Google Analytics, Google AdWords, Google DoubleClick, etc.
Trifacta, as part of Google Cloud Dataprep, plays a critical role in enabling everyday business users to explore and prepare data themselves to drive faster, more accurate analysis. These users are now leveraging the full potential of data in Google Cloud services to drive new sources of business value, such as improving operational efficiency, personalizing products and services, and uncovering new insights. To provide this level of service, Google looked to Trifacta for an intuitive approach to data preparation.
The Results
With an embedded version of Trifacta as part of the Google Cloud Platform, Google is now able to offer its many customers a more intuitive approach to preparing data. This encourages greater adoption of the product and increases GCP’s value among customers. In addition, the embedded version of Trifacta in GCP was built in tandem with the Google team, with each side informing the other on strategy and growth tactics, which has given Google confidence in the longevity of this collaboration.
As a self-service data preparation technology now embedded in GCP, Trifacta shifts the burden of data preparation work away from IT and toward end users. Cloud Dataprep will enable any Google Cloud user (with the appropriate permissions) to access, explore and prepare diverse data in services such as Cloud Storage and BigQuery for a variety of downstream uses.
Cloud Dataprep benefits Google Cloud by expanding and deepening the usage of Google Cloud services such as Cloud Storage, Dataflow and BigQuery. For Google’s customers, the offering can benefit all facets of their organization, depending on the given initiative. From increasing awareness of customer behavior to improving supply chain efficiency to forecasting sales more accurately, Google’s customers are finding new and unexpected ways to leverage cloud-based data.
Metrics
Though the product is still in private beta, Google’s Fortune 100 customers have already planned new initiatives that center on the use of self-service data preparation leveraging Google Cloud Dataprep. Since the launch of the private beta of Google Cloud Dataprep in May, over 2,000 users from over 100 companies have gained access to the service. In total, over 5,000 users from over 400 companies have requested access to the service during the private beta.
The users who have gained access to the service have created more than 6,000 Google Cloud Dataflow jobs, using Cloud Dataprep to define the logic of those jobs. The largest job to date ran against a 4.5 PB dataset. Once the Cloud Dataprep service exits beta in January, we’ll be able to release more information on the customers using this service and the business value they are creating.
The Technology
Google Cloud Dataprep embeds Trifacta's intelligent, user-friendly interface and Photon Compute Framework. It natively integrates Google Cloud Dataflow for serverless, auto-scaling execution of data preparation recipes with record performance and optimal resource utilization. Google Cloud Dataprep gives analysts the ability to intuitively explore and prepare diverse datasets within Google Cloud Platform for a variety of downstream uses, including analytics and machine learning.
Disruptive Factor
Companies are augmenting existing infrastructure investments with cloud-based analytics infrastructure and applications, in part because of the rise of new or unconventional data sources such as IoT data, which is being generated by an increasing number of cloud-based sensors and devices. Trifacta’s collaboration with Google on Cloud Dataprep will help organizations fully utilize this new information by allowing non-technical users to leverage and wrangle the data themselves.
IT organizations that are familiar with Trifacta Enterprise, which leverages Hadoop Distributed File System (HDFS), Apache Hive and Apache Spark for deployment, can offer self-service data preparation that integrates seamlessly with Google Cloud Storage, BigQuery and Cloud Dataflow. For organizations managing both cloud and on-premises deployments, Trifacta interoperates across all environments, ensuring common metadata, lineage, and governance across those investments.
Shining Moment
From Google’s perspective, it is the first cloud provider to offer self-service data preparation directly within its platform. The collaboration between Trifacta and GCP drives innovation in the cloud space and is already being adopted at companies around the globe. For these companies, the value of this innovation will come from increased adoption and greater throughput across the organization.
