VP, Enterprise Information Management, Technology Services, TD Bank
Data to Decisions
TD Bank, America's Most Convenient Bank, is one of the 10 largest banks in the U.S., providing more than 9 million customers with a full range of retail, small business and commercial banking products and services at more than 1,200 convenient locations throughout the Northeast, Mid-Atlantic, Metro D.C., the Carolinas and Florida. In addition, TD Bank and its subsidiaries offer customized private banking and wealth management services through TD Wealth, and vehicle financing and dealer commercial services through TD Auto Finance.
In 2013, TD Bank set itself an ambitious goal: to transform how it uses IT to drive the business. Traditional competitors like Bank of America, online banks like Ally, and new tech contenders like Apple Pay were pressuring TD to innovate quickly with new products that anticipated emerging market opportunities. But without fast, agile access to data to drive pervasive analytics, TD Bank’s leaders struggled to meet the new competitive challenge.
Joe DosSantos, TD’s SVP of IT, captured the challenge when he asked: “How do we make our use of data and analytics a game changer for the bank at a rate 10x greater than our past cost savings?” The existing set of technologies the bank used to deliver data to the business (specifically ETL tools, RDBMS, and analytic appliances) was inadequate:
- It took 6 months to deliver new data to the business.
- Redundant data provisioning created excess costs, duplicate storage, and silos.
- Only highly conformed, strictly formatted data was accessible to analysts, limiting ad hoc and exploratory analytics.
- Because of concern that unprotected PII might be accidentally exposed, many potentially valuable data sets were never provisioned to the business.
TD Bank’s “IT 3.0” program is a multi-year initiative aimed at fundamentally re-architecting how data is delivered to the business. TD has replaced its existing processes and tech stack with a new model of fast, secure delivery of business-ready data. A managed enterprise data lake, called the Enterprise Data Provisioning Platform (EDPP), pre-positions data from hard-to-reach enterprise systems, legacy platforms, and third-party data sources in a secure, governed platform. Incoming data is automatically scanned to detect and protect PII. Raw data is successively cleansed, conformed, transformed, and documented to produce data sets appropriate for use by data scientists or business users.
Users directly request access to data via a GUI application for new analytic or reporting needs. Users can also submit their own data for inclusion in the EDPP, a process that completes in 2 days with no IT help.
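The automatic PII scanning described above can be illustrated with a simple pattern-based scan-and-mask step. The patterns, field names, and masking policy below are illustrative assumptions for this sketch, not TD's actual implementation, which is not detailed in the source.

```python
import re

# Illustrative PII patterns (assumptions; a production scanner would use a
# far richer rule set plus data-dictionary lookups and checksum validation).
PII_PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
}

def scan_and_mask(record):
    """Scan each field of an incoming record and mask detected PII.

    Returns the protected record plus the set of PII types found, so
    downstream governance can tag and restrict the data set accordingly.
    """
    found = set()
    protected = {}
    for field, value in record.items():
        text = str(value)
        for pii_type, pattern in PII_PATTERNS.items():
            if pattern.search(text):
                found.add(pii_type)
                text = pattern.sub("***MASKED***", text)
        protected[field] = text
    return protected, found

# Example: a raw record landing in the data lake (hypothetical data)
raw = {"name": "J. Smith", "note": "SSN 123-45-6789, contact j.smith@example.com"}
clean, pii_types = scan_and_mask(raw)
```

In a pipeline like the EDPP's, a step of this kind would run before any cleansing or conformance work, so that raw zones never expose unprotected sensitive values.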
For Core IT/Data Provisioning Process:
- Reduced time required to deliver new data to the business from 2 months to 2 days
- Lowered the cost to provision a single file of data to the business from $12,000 to $400
- Eliminated $4.4M in data provisioning costs in the first 3 years of the project.
For Data Security and Governance:
- Achieved 100% automatic identification and protection of all sensitive data in the EDPP.
- Implemented consistent governance across all data in the EDPP, including consistently enforced data quality standards, the creation of business and technical metadata definitions and values, integration of official master data and reference data sets at an enterprise level, and implementation of consistent data preparation, delivery, and certification processes.
For Enterprise Applications (Example: Monthly Branch Scorecard Reports):
- By year 2 of the project, TD had enough data in the EDPP to use it as the source system for downstream applications. Sourcing the monthly Branch Scorecard Reports from the EDPP delivered the following results:
- Sunset 200+ ETL jobs
- $300k/yr lower personnel cost, freeing 2 full-time staff for other projects
- $40k+/yr lower ETL costs going forward
- From 6 mos to 2 days to add new data to the scorecard
- Improved quality, lineage, transparency, and metadata
- Migration of data to more cost-effective s/w and h/w
- Stronger PII protection
- New integrated, consistent data sets for data scientists
- Empower business users to bring their own data and drive it through the provisioning process without IT help. This freed up thousands of IT staff hours for other projects, while allowing the business to use data more independently and more frequently.
- Implement consistent governance across all data including consistently enforced data quality standards, the creation of business and technical metadata definitions and values, integration of official master data and reference data sets.
- Increase consistency, security, and efficiency of enterprise data by eliminating silos and increasing reuse and collaboration.
- Provide a standalone data prep application that generates data for monthly updates of the Branch Scorecard Reporting system.
- Empower TD business units with the most agile, cost-effective, and scalable platform to deliver business-ready data to their business users.
Hadoop: CDH 5.7.1; 100+ node production cluster
Schedulers: Autosys, Oozie, Pipeline Controller
Reporting: Tableau 10.x
Other: Talend, AtScale
TD Bank is a trailblazer. Interestingly, the disruptiveness of this project transcends what was achieved with technology: reducing data costs, reducing data complexity, and accelerating time-to-answers.
What stands out is the story behind the technology and how TD worked to invoke and influence new thinking among its management ranks to ultimately achieve this Big Data transformation.
Perpetual evolution and embracing disruptive technology didn’t make TD’s data delivery team very popular. With the EDPP project, they had to influence others within the organization to embrace the concept of a “data marketplace” as a strategic vision as well as a technological advancement.
Building advocacy for a project that delivered a completely new way to turn data into higher-value assets was often more challenging than the labyrinth of technology complexities they were trying to eliminate.
This use case from TD truly represents uncharted territory – where traditional organizational boundaries are being breached, long-standing customs and practices are changing, and well-accepted roles and responsibilities have been redefined.
In their own words, they describe the project’s impressive outcomes, which brought automation and publishing of business-ready data to hundreds of users, driving down the delivery times and costs of data to lines of business (LOBs).