Supernova Award Category
Data to Decisions
The Problem
As an 80-year-old company, Kaplan has no shortage of legacy systems and applications. In 2015, the organization recognized that it lacked a centralized way to bring diverse information together for data-driven decision-making.
As part of a company-wide move to Amazon Web Services (AWS) that began in 2013, Kaplan selected Amazon Redshift as its analytical data platform. It needed to consolidate access to data from more than 40 data sources on the platform and retire legacy systems where possible, all while decreasing operating expenses.
The Solution
After determining that writing and maintaining Python code would be too demanding, Parekh needed a data integration vendor that could handle a variety of sources, data types and data velocity requirements. The solution would have to be easy to maintain, scale as demand grew, and pass security and compliance reviews, while supporting data integration for both on-premises environments and virtual private cloud (VPC) environments on AWS.
After Parekh worked through his team’s requirements, the shortlist of vendors consisted of SnapLogic, Talend, Informatica, and Microsoft SQL Server Integration Services (SSIS), which the company was already using.
SSIS required too much administrative support. Informatica was cost-prohibitive, and while Talend was a viable open-source option, it required expertise that was hard to find.
SnapLogic, however, was easy to use, competitively priced and met the rigorous security requirements, since the platform does not interact with any data behind customers’ firewalls.
The Results
Kaplan has integrated more than 50 applications in less than a year and plans to integrate 40 more in the next year. Instead of taking four weeks to provide reporting and meet data requests, Parekh’s team can turn these requests around, with insights, in less than three days. Now, data scientists are empowered to distill the meaning behind trends in the data, and business departments can confidently pinpoint areas to improve.
SnapLogic has helped Kaplan integrate data not just to sell more products to its customers but also to provide a central source of truth that propels students to the next step in their school and career paths. From product development to finance, Kaplan uses the power of big data within its data lake to identify and predict the areas that have the greatest impact on a student’s learning journey, and to eliminate inefficient financial activities to improve the bottom line.
“We are now able to gather real-time student feedback, undertake analysis, and tap into data profiling, which empowers our end users, and prepares us for building a product platform that helps millions of students across various channels, devices, and locations,” said Parekh. “Essentially, SnapLogic helped transition our IT organization from a service provider to that of a partner in decision making.”
Metrics
- Kaplan has saved ~$1 million in IT costs since late 2016 by:
  - retiring seven legacy off-the-shelf and home-grown systems
  - consolidating 50+ apps and data sources onto Redshift
- The platform is ingesting 20 million to 30 million new records per day while keeping storage demands under 3 terabytes
- New data archiving workflows will cut CRM system storage costs by $150,000 annually
- Activity-based cost analyses supported by Redshift have helped Kaplan streamline operations and boost profits
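For a rough sense of scale, the daily ingestion figures above can be converted into average per-second rates. The sketch below is a back-of-the-envelope calculation only: it assumes records arrive evenly across a 24-hour day, which the source does not state, and the function name is purely illustrative.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds


def average_records_per_second(records_per_day: int) -> float:
    """Average ingestion rate, assuming an even spread over 24 hours (an assumption)."""
    return records_per_day / SECONDS_PER_DAY


# The 20 million and 30 million figures come from the metrics above.
low = average_records_per_second(20_000_000)
high = average_records_per_second(30_000_000)
print(f"Roughly {low:.0f} to {high:.0f} records per second on average")
```

Real traffic is likely to be burstier than this average suggests, so peak rates could be considerably higher.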
The Technology
- Primary solution: SnapLogic Enterprise Integration Cloud, Amazon Redshift
- Secondary connections/integrations (via SnapLogic): Amazon DynamoDB, Amazon EMR, Amazon Kinesis, Google Analytics, Google BigQuery, MongoDB, Salesforce.com, SQL Server, Zuora
Disruptive Factor
Drawing on his history as a consultant at PwC and Deloitte, Parekh was able to take a long-term approach and invest in a solution that not only saves Kaplan money and streamlines business processes while delivering better results, but also provides a flexible framework with which he can experiment. At the 80-year-old company, he has ambitious plans, including:
- experimenting with machine learning via Spark-based processing of large data sets
- implementing a schema-on-read architecture using NoSQL data from sources including MongoDB
- possibly helping the Kaplan CRM team by enabling a near-real-time customer reporting application using Python, MongoDB, Apache Airflow and microservices
- issuing a streaming analytics proof of concept (POC) that combines Spark Streaming, Amazon Kinesis, Elasticsearch and Kibana for visualization
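Schema-on-read, one of the plans listed above, means storing documents in their raw form and applying a structure only when they are read. The following is a minimal sketch of that idea using only the Python standard library. The sample documents, field names and the `read_with_schema` helper are all hypothetical; they do not represent Kaplan's data or its implementation.

```python
import json

# Raw documents kept as-is, the way a document store might hold them.
# These records are invented for illustration.
RAW_DOCUMENTS = [
    '{"student_id": "42", "course": "algebra", "score": "71"}',
    '{"student_id": 43, "course": "biology"}',
    '{"student_id": "44", "course": "chemistry", "score": 88, "extra": "ignored"}',
]

# The schema lives with the reader, not the store: field -> (type, default).
SCHEMA = {
    "student_id": (int, None),
    "course": (str, ""),
    "score": (int, None),
}


def read_with_schema(raw, schema):
    """Parse one raw JSON document and coerce it to the reader's schema."""
    doc = json.loads(raw)
    row = {}
    for field, (cast, default) in schema.items():
        value = doc.get(field, default)
        # Missing fields stay None; unknown fields in the document are dropped.
        row[field] = cast(value) if value is not None else None
    return row


rows = [read_with_schema(raw, SCHEMA) for raw in RAW_DOCUMENTS]
for row in rows:
    print(row)
```

Because the structure is applied at read time, a different consumer could define its own schema over the same raw documents without any change to what is stored.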
Shining Moment
Instead of throwing costly solutions at the problem, Parekh took time to understand it, developed a vision and quickly won internal buy-in despite being a relatively new employee. Rather than getting locked into a legacy system or moving everything to one platform (and likely losing data along the way), Parekh helped transform an 80-year-old education services company into a forward-thinking model of modern IT.
