Reid Levesque

Head, Solution Engineering, Royal Bank of Canada

The Problem

Our legacy platforms could not process the volumes of data required by a broad range of analytic end users across the bank’s businesses. The incumbent stack was too costly, too slow and too cumbersome. As a result, the business couldn’t do its job: analysts who needed rapid, iterative access to a variety of datasets were forced into workarounds, splitting their work into dozens of spreadsheets spread across 10-15 laptops. We had worked this way for years without success. We tried a columnar database, but it did not perform; we tried an in-memory data grid, but found it too narrowly applicable.

Hadoop seemed a good fit, but putting it into operation meant matching its architectural virtues to the economics. We evaluated four ‘classic’ IT-centric data management use cases: migrating relational workloads, grid computing, relocating ETL and providing a general-purpose file store. Once Hadoop’s TCO was factored in, none of these delivered the net gain we required in cost and productivity.

The Solution

We chose Hadoop for its broad appeal across the organization and for architectural features that address a diverse set of use cases; driving adoption was the next challenge. Business users didn’t grasp the architectural benefits the way IT did. For example, they didn’t understand that they could process data on the Hadoop clusters themselves rather than extract it to standalone files.

We solved this by giving users self-service access to the data through Arcadia Data, a visualization and BI tool that provides web-based visual analytics directly on the Hadoop clusters. Its ways of exploring and manipulating data were familiar enough that end users immediately recognized what it could do.
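
To make the contrast concrete, here is a minimal sketch of what “processing in place” means, assuming a HiveServer2 endpoint on the cluster and Python’s pyhive client. Arcadia itself provides a visual front end rather than code, and the host, table, and column names below are hypothetical:

```python
# Illustrative sketch only (not Arcadia's actual interface): the shift users
# had to grasp was running analysis in place on the cluster instead of
# extracting data to local files. Host, table, and column names are made up.
from pyhive import hive  # assumes a HiveServer2 endpoint on the cluster

conn = hive.Connection(host="hadoop-edge.example.com", port=10000,
                       username="analyst")
cursor = conn.cursor()

# The aggregation executes on the cluster; only the small summarized result
# comes back, instead of a multi-gigabyte extract split across spreadsheets.
cursor.execute("""
    SELECT client_segment, COUNT(*) AS txn_count
    FROM client_activity
    WHERE txn_date >= '2010-01-01'
    GROUP BY client_segment
""")
for segment, txn_count in cursor.fetchall():
    print(segment, txn_count)
```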

Once business users saw how they could access data directly, create visualizations, drill down to raw data, and share results, end user engagement with the data on the Hadoop clusters skyrocketed. Users reported order-of-magnitude acceleration in data access and analysis cycle times.

The Results

End user collaboration: End users had spent most of their time finding data and reshaping it to fit the tools at hand, such as spreadsheets or bespoke local applications; the actual analysis to expose data relationships required long setup. Once Hadoop and Arcadia collapsed that setup with quick, collaborative access, users could focus on iterative, interactive analysis without losing momentum on data preparation.

Networking: The underlying IT strategy for data networking was premised on limiting access across NAS shares. Hadoop enabled a great deal of cross-domain access, which loaded the network far more than we anticipated, so we built a network sized for peak data interactions.

IT agility: On the old platform we had separated IT functions into specialties that worked effectively in isolation. Where development, operations, security, and infrastructure teams once operated as separate entities, we now sat them next to each other to cut the latency of interaction between them. The result: fewer tickets and faster time to value.

TCO: With these productivity gains in place, we could extend Hadoop to address the original IT use cases profitably: migrating relational workloads, grid computing, relocating ETL and providing a general-purpose file store were now economically viable.

Metrics

Prior to switching to Hadoop and Arcadia, analysts could perform only one analysis per day, constrained by a batch process with no interactivity. After the project was completed, users could perform dozens of analysis cycles per day on an interactive, easy-to-use system. Analysts now spend only a quarter of their time on this activity, freeing them to do more risk management and less reporting.

For the classic IT use cases, the cost of migrating workloads to Hadoop, added to the cost of setting up Hadoop in the first place, exceeded the benefit; this is why the TCO never worked for us.

Take the database replacement use case: we could save the $200,000 Oracle license. However, migrating from Oracle to Hive would take six months of labor plus overhead at an internal cost of $100,000, followed by the Hadoop cluster setup itself, which takes about 24 person-months of labor at an internal cost of $400,000. Hardware and licensing outlay added another $400,000. Standing up Hadoop was therefore far more expensive than the Oracle license it replaced. By contrast, using Arcadia on Hadoop eliminated the migration costs.
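
As a back-of-envelope check on these figures (an illustrative calculation only, using the internal cost estimates above):

```python
# Back-of-envelope TCO for the database-replacement use case, using the
# internal cost estimates cited in the text (illustrative only).
oracle_license_saved  = 200_000   # Oracle license we could stop paying
migration_labor       = 100_000   # ~6 months migrating Oracle to Hive
cluster_setup_labor   = 400_000   # ~24 person-months of Hadoop cluster setup
hardware_and_licenses = 400_000   # hardware and Hadoop licensing outlay

total_setup = migration_labor + cluster_setup_labor + hardware_and_licenses
print(f"One-time setup cost: ${total_setup:,}")          # $900,000
print(f"License avoided:     ${oracle_license_saved:,}")  # $200,000
# Migration (the $100,000 line item here) is the cost that Arcadia-on-Hadoop
# removed, by letting users work on the cluster without porting the workload.
```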

The Technology

Arcadia Enterprise from Arcadia Data, a visual analytics and BI platform running natively on Hadoop clusters. Hadoop clusters provided by Hortonworks.

Disruptive Factor

The traditional separation of IT specialties had earned the bank a lot of efficiencies, but we had optimized ourselves into narrow silos that no longer centered on current user needs or use cases.

Structuring work around delivering results rather than maximizing utilization was a significant change in how we approached the problem, and providing a common platform for that collaboration marked a real shift in the division of labor within IT.

A large number of IT policies were explicitly contradicted by Hadoop’s operating requirements; we had to challenge the status quo in configuration best practices to accommodate the platform.

We made the productivity of analytic end users a specific objective across multiple data sources. Previously, end users had to broker their own data extraction across those sources and then organize local computation in bespoke applications, spreadsheets, local visualization tools and the like. In the new world, we put a premium on how quickly users could gather, analyze, manipulate and visualize the data they needed to do their jobs.

Shining Moment

We needed to analyze up to seven years of client activity for possible patterns through interactive analysis and visualization. Once the data was set up in Hadoop behind the easy-to-use Arcadia platform, we could hand it directly to the business users. Broad adoption of the tool and its visualizations across hundreds of users quickly demonstrated that when you give people a tool that makes sense to them and data they already understand, it’s amazing what they get done.
