Supernova Award Category
The Problem
The explosion of digital channels roughly ten years ago resulted in a few tough years for Macy’s. The company was losing customers and had to overhaul its operations across every digital channel. It needed access to both traditional and unstructured data so that it could capture every customer touchpoint and analyze relevant content based on past and current purchases, and it needed a system nimble and flexible enough to accommodate the scale of its data.
Macy’s implemented Hadoop in 2010, when it was still relatively new. It migrated to Hortonworks in 2014 and built a sophisticated asset in its Hortonworks cluster that was able to determine advertising effectiveness. The problem was that it was too sophisticated: BI users could only access the business data on Hadoop through Hue, a crude web-based SQL editor that required end-users to hand-write complex SQL queries. Many users made mistakes, writing queries that produced Cartesian products; a single such query could flood the entire cluster and bring it to a halt. In addition, end-users were creating thousands of temporary tables, which degraded the cluster further and made it virtually impossible to use.
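To make that failure mode concrete, here is a minimal sketch (table and column names are hypothetical, not Macy’s schema) of how omitting a join condition turns an innocent-looking query into a Cartesian product:

    -- Hypothetical tables, for illustration only.
    -- Mistake: no join condition, so every row of orders is paired with
    -- every row of customers. Two 1-million-row tables yield 10^12
    -- intermediate rows and can swamp the cluster.
    SELECT c.customer_name, o.order_total
    FROM customers c, orders o
    WHERE o.order_date >= '2016-01-01';

    -- Intended query: the join key bounds the result to matching rows.
    SELECT c.customer_name, o.order_total
    FROM customers c
    JOIN orders o ON o.customer_id = c.customer_id
    WHERE o.order_date >= '2016-01-01';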
The Solution
At Macy’s, hundreds of experiments run at the same time. Data from different sources comes into Hadoop, with many people accessing it. Even with Hadoop, the team found it still spent 60-70% of its time on data engineering. In addition, analytic iteration times were long and collaboration was frustratingly difficult.
They needed a solution that could be shared with the entire organization without compromising data quality, and it needed to seamlessly integrate with existing BI tools.
Macy’s chose AtScale, deploying it as a single semantic layer for BI tools on Hadoop. Macy’s appreciated how AtScale lets end-users directly and efficiently access valuable data in Hadoop while preserving the control, security and responsiveness of the existing big data platform. With AtScale, we can leave the data in Hadoop where it lies, use the right BI tool for the right job, and provide a single semantic layer for consistency in data definitions and results.
The Results
Data is key for an organization like Macy’s. We understood that it becomes an advantage when we get ahead of it and learn to manage and optimize it quickly, efficiently and securely.
Before Macy’s did BI on Hadoop with AtScale on Hortonworks, analysts’ only interface to the Hadoop data was Hue, a bare SQL interface that required sophisticated hand-written queries. Business users could make mistakes in queries that would create Cartesian products, flood the cluster and bring it to a crawl. And users created thousands of temp tables, potentially making the Hive cluster sluggish.
AtScale allows Macy’s to analyze massive amounts of data across diverse touchpoints without business users having to write queries by hand. Analysts can use the same tools they are familiar and comfortable with, such as Tableau and Excel, without hitting the wrong data. The result is quicker performance, and the Hadoop environment and cluster remain “healthy” rather than being brought to a halt. AtScale enables Hadoop to serve many more users at the same time, because queries are short-lived and consistent, managed through AtScale’s virtual cubes.
AtScale allowed us to keep the data in Hadoop instead of moving it, and gave all users the ability to use the data in the same way. It opened up access to the data in Hadoop for teams beyond Marketing and for users beyond just the data scientists.
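As an illustration of what a single semantic layer buys (a generic sketch using a plain Hive view, not AtScale’s actual virtual-cube modeling, which is richer; all names here are hypothetical): a metric such as net sales is defined once, so Tableau, Excel and ad hoc SQL users all compute it identically.

    -- Hypothetical shared definition: every tool resolves "net sales"
    -- the same way, against data left in place in Hadoop.
    CREATE VIEW sales_semantics AS
    SELECT order_date,
           channel,
           SUM(gross_amount - discount_amount - return_amount) AS net_sales
    FROM raw_sales
    GROUP BY order_date, channel;

    -- Tableau, Excel (over ODBC) and analysts all issue the same simple query:
    SELECT channel, SUM(net_sales) AS net_sales
    FROM sales_semantics
    WHERE order_date >= '2016-01-01'
    GROUP BY channel;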
Metrics
Metrics supporting the results of implementing AtScale include:
- Completeness: moved from multiple silos to one combined source for big-picture analysis
- Timeliness: reduced time to insight from several hours to real-time
- Scale: millions of paid keywords analyzed in real time with sub-second queries
- Speed: from three data moves to zero. Eliminated three extraction steps; data now stays where it lands in Hadoop, ready for immediate analysis
- Insight: 360-degree closed-loop analysis, operationalizing analysis → insight → decision → action
- Impact: the opportunity to make and save millions of dollars with instant bid decisions over a 6-week season that drives 60% of annual revenue
The Technology
AtScale BI on Hadoop Platform
Disruptive Factor
When we started with Hadoop five years ago, there was no precedent, so success came from organic learning and modifying as we went. A challenge we hit early on with BI on Hadoop was that analysts, used to traditional OLAP/BI query times, felt analytic iterations on Hadoop (historically batch-oriented) were too slow. Because my team and I focus on delivering value every step of the way and consider adoption from the beginning, we were ready to respond to business needs as we went:
- Understand what the business is trying to do first, then find a solution
- Ensure confidence in data quality
- Standardize release processes to create consistency for internal users
- Socialize, train, empower users to drive collaboration and ownership
- Monitor quantitative and qualitative measures that show business impact
Using these guidelines, we disrupted the status quo of ‘bringing data to the tool’ and are operationalizing BI on Hadoop: bringing BI tools to the data. What started in Marketing has branched into Finance, Supply Chain and Merchandising. This could be the first enterprise analytics platform that caters to the entire company and its different users: data scientists, analysts, senior leadership...
What sets me apart may be my tolerance for a maturing solution for the greater good. I’m an innovator willing to take a chance on something new even if it isn’t perfect. I work with my vendors as partners; we are in this together, looking for ways to improve the way we work.
Shining Moment
What started in Marketing as an effort to innovate around data and analysis has turned into a broad movement across the business. This could be the first real enterprise analytics platform that caters to the entire company. As a result of my Big Data and BI expertise, I have been invited to speak as a thought leader at many Hadoop and Business Intelligence events, including Hadoop Summit 2015 and 2016 and Strata Hadoop 2016, and I serve on several Customer Advisory Boards.
