Among the more than 200 announcements slated for #AWSreinvent 2023, new Zero-ETL integrations with Amazon Redshift and AI Recommendations for Amazon DataZone were two data-related announcements that caught my eye. Both were announced by Amazon Web Services CEO Adam Selipsky during his November 28 keynote address.
Zero-ETL Integrations with Amazon Redshift: Last year, AWS announced its intention to develop Zero-ETL integrations between Amazon Aurora, the vendor’s transactional database service, and Amazon Redshift, its analytical database service for data warehousing. That led to the general availability of the Zero-ETL integration between Amazon Aurora MySQL and Redshift.
During his Tuesday morning keynote, Selipsky announced the private preview of three additional Zero-ETL integrations for Redshift: Amazon Aurora PostgreSQL, Amazon RDS for MySQL, and Amazon DynamoDB. AWS is harnessing the same technology, which is essentially low-latency integration based on change data capture (CDC), to move data to Redshift from these three additional transactional database services without a separately built pipeline. What's more, AWS has also used the technology to deliver a now generally available Zero-ETL integration between DynamoDB and the Amazon OpenSearch Service.
Doug’s take: Zero-ETL is compelling because it promises considerable savings in time, effort and administrative headaches over conventional extract, transform, load (ETL) development work. It promotes low-latency insight while also reducing ETL processing and development costs. The Zero-ETL service will clearly introduce its own costs, but the time and labor savings should outweigh them. Keep in mind, though, that the "T" in ETL is missing from this service, as data will be moved from the source to Redshift as is. AWS execs tell me customers are either doing the transformations within Redshift, essentially taking an ELT approach, or they are adapting their Redshift schema to work with loaded data as is.
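To make the ELT pattern mentioned above concrete, here is a minimal sketch in Python. It is an illustration only: it uses an in-memory SQLite database as a stand-in for the warehouse (it is not Redshift and does not use any AWS API), and the table, view and column names (orders_raw, orders_clean, amount_cents) are hypothetical. The point is simply that rows land untransformed, and the "T" happens afterward in SQL inside the target.

```python
import sqlite3


def load_then_transform(raw_orders):
    """Load rows as-is, then transform inside the warehouse (ELT).

    SQLite stands in for the warehouse here; all table and column
    names are hypothetical and chosen only for illustration.
    """
    conn = sqlite3.connect(":memory:")

    # "L": land the source rows untouched, as a zero-ETL feed would.
    conn.execute(
        "CREATE TABLE orders_raw "
        "(order_id INTEGER, amount_cents INTEGER, status TEXT)"
    )
    conn.executemany("INSERT INTO orders_raw VALUES (?, ?, ?)", raw_orders)

    # "T": reshape the data with SQL after it has been loaded.
    conn.execute(
        """
        CREATE VIEW orders_clean AS
        SELECT order_id,
               amount_cents / 100.0 AS amount_usd,
               UPPER(status) AS status
        FROM orders_raw
        WHERE status IS NOT NULL
        """
    )

    rows = conn.execute(
        "SELECT order_id, amount_usd, status "
        "FROM orders_clean ORDER BY order_id"
    ).fetchall()
    conn.close()
    return rows
```

Calling load_then_transform([(1, 1250, "shipped"), (2, 300, None)]) returns only the first order, with its amount converted to dollars and its status upper-cased, while the raw table still holds both rows exactly as they arrived.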
As for the DynamoDB to OpenSearch integration, this will enable data from massive, customer-facing DynamoDB-based transactional deployments to be quickly available to OpenSearch full-text search, fuzzy search, auto-complete, and vector search for machine learning (ML) capabilities. After talking to AWS executives, it’s pretty clear that a future step might be using the Zero-ETL capability to do reverse ETL from Redshift back into operational databases such as the various flavors of Aurora, RDS and DynamoDB.
GenAI Recommendations for Amazon DataZone: DataZone is the metadata management, data cataloging and budding governance service that AWS introduced in preview at re:Invent 2022. The problem with data catalogs is that it can be an administrative pain to add and curate new data resources as part of the collection – unless it can be done automatically with the aid of ML/AI. AWS took just this step by announcing, in private preview, generative AI recommendations for DataZone. The service will automatically generate descriptions for data (presumably covering both technical and business relevance) that humans can review, tweak and adopt as they add data to the catalog. DataZone is also the place where global access controls can be defined.
Doug's take: The use of ML/AI for augmented cataloging is pervasive among metadata management, cataloging and governance platforms, with examples including Alation, Collibra, Microsoft Purview and Google Dataplex. What's novel here is the application of GenAI, which is an obvious next step that multiple vendors are either previewing or adding to their roadmaps. Given that so much is still in preview across all vendors, it's hard to say whether anybody has an edge in using GenAI at this point. DataZone is in the early days of its adoption by customers, so anything AWS can do to remove friction from using the service will help to promote wider adoption. This service, I'm told, will gradually assume a larger role as it matures, supporting rule-based governance over data and related assets.