Supernova Award Category
The Problem
eHarmony’s match-making science is based on a three-step process: compatibility matching, affinity matching, and match distribution. With approximately 60 million registered eHarmony users, more than ten million pairings need to be scored every night; better matches result from re-training its matchmaking models across each specific user ID every day. While eHarmony has been running a Hadoop-based analytics environment for several years, the organization sought an infrastructure improvement that would make model creation, training, and evaluation simple, in real time and at scale.
The Solution
eHarmony enhanced its Hadoop-based analytics environment by signing up for a Cloudera Enterprise subscription (versus self-supporting its CDH environment, as it had previously been doing) and upgrading its cloud infrastructure to utilize the latest Intel Xeon E5 Processors. The Modeling team, led by Dr. Jonathan Morra, deployed Maestro, Model Trainer, and Apache Spark within the Hadoop system to expedite and simplify the model creation and execution process.
The Results
With its next-generation analytics environment, eHarmony can create more personalized affinity matching, delivering better results for more customers by boosting performance and accommodating more complex analyses. In other words, the new environment pairs users with people they are likely both to like and to communicate well with, ultimately improving their chances of relationship success.
Metrics
With eHarmony’s new environment, data models can be created, run, and trained faster than before. Maestro, the legacy model training environment, trained models in about 28 hours each. The new system, Model Trainer, executes models in only three hours.
Meanwhile, the volume of code has been reduced from 2.8 MB to 1.6 MB, using fewer languages, making it easier to compare and understand data models.
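To put these figures in relative terms, the short sketch below derives the implied speedup and code reduction from the numbers quoted above. The function and variable names are illustrative only and are not part of eHarmony's system.

```python
def speedup(old_hours: float, new_hours: float) -> float:
    """Return how many times faster the new run time is than the old one."""
    return old_hours / new_hours


def percent_reduction(old_size: float, new_size: float) -> float:
    """Return the reduction from old_size to new_size as a percentage."""
    return (old_size - new_size) / old_size * 100


# Figures taken from the Metrics section: 28 hours vs. 3 hours per model,
# and 2.8 MB vs. 1.6 MB of code.
training_speedup = speedup(28, 3)
code_reduction = percent_reduction(2.8, 1.6)

print(f"Model run time: about {training_speedup:.1f}x faster")
print(f"Code volume: about {code_reduction:.0f}% smaller")
```

On these figures, the new system is roughly nine times faster, and the code base shrinks by a little over 40%.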
The Technology
eHarmony uses Cloudera Enterprise as its Hadoop-based platform, leveraging ecosystem components including Apache Spark, Maestro, and Model Trainer, running on the Intel Xeon processor E5 family.
Disruptive Factor
As one of the earliest adopters of Hadoop as an enterprise platform, eHarmony is once again demonstrating its prowess as an innovator by incorporating an up-and-coming technology, Spark, into its analytics stack to deliver real-time analytics with unprecedented complexity and scalability. These strong results come with challenges as well: eHarmony has learned that Spark, being relatively immature, is complicated to configure. It requires a strong understanding of all of its configuration options and how to tune them. But with the appropriate knowledge, the tool can deliver impressive results. In eHarmony’s case, this translates into a larger number of happy relationships.
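To illustrate the kind of tuning surface involved, here is a minimal sketch that assembles a spark-submit command from a handful of standard Spark configuration properties. The values, the helper function, and the train_model.py script name are hypothetical; they do not reflect eHarmony's actual settings or code.

```python
import shlex

# Standard Spark property names; the values are placeholders chosen for
# illustration and would need tuning for any real cluster and workload.
SPARK_CONF = {
    "spark.executor.instances": "20",
    "spark.executor.cores": "4",
    "spark.executor.memory": "8g",
    "spark.serializer": "org.apache.spark.serializer.KryoSerializer",
}


def build_spark_submit(app: str, conf: dict, master: str = "yarn") -> list:
    """Build a spark-submit argument list with one --conf flag per property."""
    args = ["spark-submit", "--master", master]
    for key in sorted(conf):
        args += ["--conf", f"{key}={conf[key]}"]
    args.append(app)  # hypothetical application script
    return args


command = build_spark_submit("train_model.py", SPARK_CONF)
print(" ".join(shlex.quote(arg) for arg in command))
```

Even this small set of properties interacts: executor count, cores, and memory together determine how much of the cluster a job claims, which is part of why tuning takes experience.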
Shining Moment
eHarmony demonstrates a unique case in which the successful application of cutting-edge technologies truly makes the world a happier place. Peer-reviewed research published in the Proceedings of the National Academy of Sciences in 2012 indicates that eHarmony has the lowest divorce rate of all meeting places measured. On average, 438 people marry every day in the U.S. as a result of being matched on eHarmony, which is nearly 4% of new marriages.
