Google Cloud’s Dataplex, Datastream and Vertex AI announcements point to a well-rounded platform.
The star performers in baseball are known as five-tool players, meaning they hit for average, hit for power, and excel at base-running, throwing and fielding. Think Hank Aaron, Willie Mays and Ken Griffey, Jr. The five-tool equivalents in data excel at data integration, data platforms, data governance, analytics and data science.
Enterprises scouting for star-caliber vendors have certainly had their heads turned by Google Cloud on the strength of its data platforms, with BigQuery being a standout, and its data science capabilities, leading with TensorFlow. (Indeed, Major League Baseball itself is a Google customer that uses BigQuery, among other services.) But Google has been akin to an up-and-coming baseball star like Fernando Tatis, Jr. or Vladimir Guerrero, Jr.: It’s clear that both are going to be superstars based on their prodigious hitting, but Tatis needs to work on his fielding while Guerrero is just average when it comes to running the bases. These players need time to mature and develop all five tools.
So it goes with Google Cloud, which is maturing into a five-tool data platform. During the Google Data Cloud Summit last month, it became clear that the pace of maturation is accelerating, with three crucial services announced: Dataplex, Datastream and Vertex AI. These services will help to fill out an integrated, end-to-end platform for data engineers, data scientists, developers, data analysts and business users (see slide below).
Google Cloud is filling out an integrated data platform aimed at a broad spectrum of users.
The idea with data fabrics is to virtualize access to data, with queries reaching out to myriad, distributed sources without having to move or copy that information into a centralized data warehouse. Fabrics increasingly extend across data lakes and data science environments.
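To make the virtualization idea concrete, here is a minimal sketch in Python. It is purely illustrative and is not Dataplex's API: the names `make_source` and `federated_query`, and the sample rows, are assumptions of mine. Each source filters its own data in place, and the caller sees one combined result without anything being copied into a central store first.

```python
from typing import Callable, Dict, Iterable, Iterator, List

Row = Dict[str, object]
Predicate = Callable[[Row], bool]
# A "source" is anything that can scan its own rows with a filter applied.
Source = Callable[[Predicate], Iterable[Row]]


def make_source(rows: List[Row]) -> Source:
    """Hypothetical stand-in for a remote system (warehouse, lake, etc.)."""
    def scan(predicate: Predicate) -> Iterable[Row]:
        # The filter runs where the data lives; only matching rows leave.
        return (row for row in rows if predicate(row))
    return scan


def federated_query(sources: Dict[str, Source], predicate: Predicate) -> Iterator[Row]:
    """Push one predicate to every source and stream back tagged results."""
    for name, scan in sources.items():
        for row in scan(predicate):
            yield {**row, "_source": name}


# Two independent, made-up sources that are never merged into one store.
warehouse = make_source([
    {"customer": "acme", "spend": 120},
    {"customer": "globex", "spend": 40},
])
lake = make_source([{"customer": "initech", "spend": 300}])

results = list(
    federated_query({"warehouse": warehouse, "lake": lake},
                    lambda row: row["spend"] > 100)
)
```

A real fabric adds query planning, security and metadata on top, but the access pattern is the same: the query travels to the data rather than the other way around.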
The “intelligent” part of Dataplex promises “automatic data discovery, metadata harvesting… and data quality with built-in AI.” Google is also touting centralized security and governance capabilities, including “data policy management, monitoring and auditing for data authorization, retention and classification.”
It’s pretty clear that Dataplex, which is in private preview, combines data virtualization, which I would put in the data integration camp, with data governance, a role addressed elsewhere by metadata management and governance offerings such as Collibra (an independent vendor), IBM Watson Knowledge Catalog and (also in preview) Microsoft Azure Purview.
Given Google’s multi-cloud efforts with Google Anthos and BigQuery Omni (which now extends BigQuery data access to AWS and Azure), Dataplex will surely extend beyond Google Cloud, but we’ll have to wait to see what’s available at launch and what comes later. Google has stated that support for other data sources is coming “soon.”
Datastream, a second big announcement at last month’s Google Data Cloud Summit, is a change data capture (CDC) and replication service, now in preview, that will support low-latency requirements including real-time analytics, heterogeneous database synchronization and event-driven architectures. Squarely in the data integration camp, CDC technology is also not new; long-standing market leaders include Oracle GoldenGate and Qlik Data Streaming (CDC) (formerly Attunity). Nonetheless, Google says Datastream takes a different approach: it is a serverless service that scales automatically while replicating and synchronizing data with minimal latency. It integrates with services including BigQuery, Cloud Spanner, Dataflow and Data Fusion.
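For readers less familiar with CDC, the core mechanism can be sketched in a few lines of Python. This is a generic illustration under my own assumptions, not Datastream's event format or API: the `ChangeEvent` shape and the `apply_changes` helper are hypothetical. The source database emits an ordered log of inserts, updates and deletes, and the target stays in sync by replaying that log rather than by re-copying whole tables.

```python
from dataclasses import dataclass
from typing import Any, Dict, Iterable, Optional


@dataclass
class ChangeEvent:
    """Hypothetical change record: one row-level operation from the source."""
    op: str                                # "insert", "update" or "delete"
    key: int                               # primary key of the affected row
    row: Optional[Dict[str, Any]] = None   # new row image (None for deletes)


def apply_changes(replica: Dict[int, Dict[str, Any]],
                  events: Iterable[ChangeEvent]) -> Dict[int, Dict[str, Any]]:
    """Replay change events, in order, against an in-memory replica."""
    for event in events:
        if event.op in ("insert", "update"):
            # Upsert the latest row image for this key.
            replica[event.key] = dict(event.row or {})
        elif event.op == "delete":
            replica.pop(event.key, None)
        else:
            raise ValueError(f"unknown operation: {event.op}")
    return replica


# A made-up change log captured from a source table.
change_log = [
    ChangeEvent("insert", 1, {"name": "Mays", "hits": 10}),
    ChangeEvent("insert", 2, {"name": "Aaron", "hits": 12}),
    ChangeEvent("update", 1, {"name": "Mays", "hits": 11}),
    ChangeEvent("delete", 2),
]

replica = apply_changes({}, change_log)
```

Because only the changes move, the target can trail the source by very little, which is what makes CDC a fit for real-time analytics and event-driven designs.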
Vertex AI has been enhanced with new services to ease the production deployment of AI and ML models.
In addition to the new integration, governance and data science tools detailed above, Google also announced a preview Analytics Hub service that will provide a collaborative library for curating analytic assets and sharing and monetizing data. So here again, Google is building on core strengths like BigQuery and rounding out the portfolio to be a complete data player. I’m looking forward to an increasingly competitive public cloud playoff season that will extend over the months and years to come.