Supernova Award Category
Data to Decisions
The Problem
When Etsy was founded in 2005, the company started out with a monolithic PostgreSQL database in which everything was stored – from listings, sellers, and buyers to conversations and forums. Etsy quickly outgrew the original database and began to shard it – first vertically and then horizontally with hundreds of MySQL servers.
Performing analytics with this setup became difficult over time, and Etsy began replicating data from the shards back into a new PostgreSQL business intelligence (BI) server – which proved to be too slow. Postgres is designed for single record lookup – as opposed to analytic-type queries – and this particular setup didn’t perform as well as a relational database.
The Solution
The Vertica Analytics Platform is purpose-built for Big Data analytics and is designed for use in a range of analytical use cases and workloads where speed, scalability, simplicity and openness are crucial to the success of data analytics initiatives. Vertica relies on a tested, reliable distributed architecture and columnar compression where massively parallel processing enables data to be handled at petabyte scale for the most demanding use cases.
With some queries on PostgreSQL taking days to complete, the Etsy team migrated its BI server to Vertica – as migration was a straight forward process. The IT team bulk loaded large amounts of data into Vertica at once using the COPY statement – and ultimately, the organization was able to get up to speed and hit the ground running quickly. This was especially true since the PostgreSQL queries could be run without change on Vertica!
The results
The total cost of ownership for Vertica has remained low due in part to Etsy’s ability to administer Vertica with in-house Database administrators. Moreover, the Etsy team did not have to alter existing queries – tools that the organization had invested countless hours and placed into hundreds of reporting queries that were running on the Postgres BI server. Etsy ported and ran them unchanged on Vertica which resulted in enhanced time-to-value.
On staff analysts have proven to be a big benefactor as the organization has found that providing them with tools that they can use quickly and easily to leverage has led to new business insights. Etsy’s management encourages analysis and the discovering of new insights, and analysts have discovered things about the business by working in the database, finding interesting data, and then exploring it. As an example, Etsy uses Vertica alongside Hadoop and each night, analysts download search terms used on the market site to Hadoop and then run MapReduce jobs to improve search algorithms.
Moreover, Etsy is leveraging Vertica to complete ad hoc queries as analysts and employees that understand SQL demand it.
Metrics
Initially, Etsy planned for only a small group of analysts to use Vertica, but that number grew quickly. When the organization saw that that they could run financial reports that used to take days in literally seconds – demand spiked. Approximately 25 percent of employees use Vertica. Many analysts depend on Vertica every day resulting in difficulties for the IT team to find a window to perform updates.
While retrieving results in seconds instead of days is an important advantage, it is the fundamental changes enabled by the performance increase that have most benefitted the company. With Vertica, Etsy is noticing result sets so much faster that it’s akin to texting versus the Pony Express.
One important metric to underscore is the size of the Vertica license. Etsy started off with a 5 TB license, which soon scaled to 10 TB, then to 30 TB, then to 50 TB – and now is up to 270 TB. Vertica is foundational to all aspects of the business – from data scientists, to researchers, to in house analysts.
The Technology
Vertica
Disruptive Factor
An initial challenge that Etsy had to overcome was importing existing data into the platform. To address this challenge, engineers had developed a Schlep program into Vertica – which is still used to this day as it leverages MySQL.
Vertica easily facilitated this process as a result from going from a BI system based on PostgreSQL. Again, Etsy did not have to rewrite any existing queries. It was important when a platform was selected that minimal edits would be required to avoid workflow disruption.
Shining Moment
Vertica is now at the heart of Etsy’s analytics stack and the “Ah Ha” moment was actualized when important business queries that typically took 4 days to process were completing in just a few minutes.
