The Climate Corporation

Overview

The Climate Corporation (Climate) describes itself as “innovating at the intersection of agriculture and technology,” with a mission to “help all the world’s farmers sustainably increase their productivity with digital tools.” The Climate FieldView™ (FieldView) platform helps farmers manage their crops using data on their acreage, soil type, elevation, average precipitation, crop and yield histories and other measures. Climate’s data science team builds models based on this data and each farmer’s goals to deliver personalized recommendations on what seeds and hybrids to plant, how to plant them, how to care for the crops throughout the growing season and when to harvest to maximize yields.

Supernova Award Category

Data to Decisions

The Problem

By 2016, Climate already managed data by the petabyte, storing everything in Amazon Simple Storage Service (S3) and running Spark and YARN on Amazon Web Services (AWS) EC2 compute infrastructure to process the data and run analyses. Climate had already moved its data scientists away from desktop analyses to spinning up development instances on AWS, but the homegrown infrastructure the company built to support cloud-based modeling work had limitations:

• Climate lacked insight into the number, type, duration, cost and other workload parameters associated with each cloud-based job and project.
• Data engineers had trouble responding quickly to data scientist requests for special hardware capacity, such as graphics processing unit (GPU) instances, because nonstandard compute options weren’t supported.
• When jobs failed, data scientists (also known internally at Climate as researchers) would lose their analyses before they had a chance to extract vital workload information from the AWS Spark/S3 environment.
• Available metadata and documentation on the data, languages, dependencies and requirements associated with each model varied, often making it difficult for data scientists to repeat analyses and for engineers to bring models into production.

The Solution

Climate quickly settled on the Domino Data Science Platform from Domino Data Lab, attracted by the following features and capabilities:

• Uses configurable Docker containers to create and document data science environments that can be shared and versioned for reuse and repeatability.
• Supports discovery and reuse of data sources, including databases and distributed platforms deployed on-premises or in virtual or public cloud environments.
• Documents jobs and resources used to provide insight into data and languages used, dependencies and costs by project, model, user and other parameters.

Domino Data Science Platform was deployed within two days in Climate’s AWS environment with configuration and management support from Domino Data Lab. Climate tailored Domino to make it easy for data scientists to securely access Climate’s data on S3 and deploy a variety of preconfigured development environments instantiated in Docker containers.
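To make the idea of per-job cost insight concrete, here is a minimal Python sketch of how job records might be captured and rolled up by project or user. It is not Domino's API or Climate's implementation: the `JobRecord` fields, the `cost_by` helper, the instance names, the hourly rates and the sample jobs are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class JobRecord:
    """One cloud job. Every field name here is a hypothetical example."""
    project: str
    user: str
    instance_type: str
    hours: float
    hourly_rate_usd: float  # assumed price, not a real AWS quote

    @property
    def cost_usd(self) -> float:
        # Cost is simply duration multiplied by the assumed hourly rate.
        return self.hours * self.hourly_rate_usd


def cost_by(records, key):
    """Sum job cost grouped by one JobRecord attribute, e.g. 'project'."""
    totals = defaultdict(float)
    for record in records:
        totals[getattr(record, key)] += record.cost_usd
    return dict(totals)


# Made-up sample records, for illustration only.
JOBS = [
    JobRecord("disease-risk", "alice", "gpu-large", 2.0, 3.0),
    JobRecord("disease-risk", "bob", "cpu-small", 4.0, 0.5),
    JobRecord("seed-advisor", "alice", "cpu-small", 10.0, 0.5),
]

if __name__ == "__main__":
    print(cost_by(JOBS, "project"))
    print(cost_by(JOBS, "user"))
```

Grouping by a single attribute keeps the sketch short; the same records could just as easily be grouped by instance type to see how much spend goes to GPU capacity.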

The Results

The benefits of the Domino deployment start with the data scientists, but they extend to the data engineering team, business stakeholders and Climate FieldView customers. Where it used to take Climate data engineers anywhere from one day to one week to set up special-request development environments, Domino has made it a self-service proposition for data scientists.

The platform has also set standards around code management, data management and cloud instances that have brought significant productivity gains. Climate estimates that its data science team has doubled its annual model output since Domino was deployed.

Climate’s productivity gains have been driven in large part by Domino’s support for collaboration, repeatability and reuse. “It’s so much easier to share models and see what everybody is working on,” says Hochmuth. “In our homegrown environment, it was up to the data scientist to figure out what artifacts and data they needed to save, but in Domino, that’s all built into the platform. You can select any model run or experiment and see what version of the model and what data were used, the dependencies and what packages of software were used.”
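The kind of run record described in the quote (model version, data used, dependencies and packages) can be sketched in a few lines of Python. This is only an illustration of the general idea, not how Domino stores run metadata: the `build_manifest` function, its field names and the sample inputs are assumptions made for this example.

```python
import hashlib
import json
import platform
from importlib import metadata


def build_manifest(model_version, dataset_bytes, packages):
    """Return a JSON-serializable record of what one model run used.

    All field names are hypothetical. `packages` is an iterable of
    distribution names whose installed versions should be recorded.
    """
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # Record missing packages explicitly instead of failing.
            versions[name] = None
    return {
        "model_version": model_version,
        # A content hash identifies the exact data without storing a copy.
        "data_sha256": hashlib.sha256(dataset_bytes).hexdigest(),
        "python": platform.python_version(),
        "packages": versions,
    }


if __name__ == "__main__":
    sample_data = b"field_id,yield\n1,180.5\n"  # made-up data
    manifest = build_manifest("v1", sample_data, ["pip"])
    print(json.dumps(manifest, indent=2))
```

Hashing the data rather than copying it keeps the record small while still letting a later run confirm it is using byte-identical input.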

Metrics

• According to Climate, tests of Seed Advisor conducted across 100,000 acres in Iowa, Illinois and Minnesota during the 2018 growing season demonstrated an average yield advantage of 9.1 bushels per acre compared with what the farmer would have planted without Climate’s recommendations.
• Disease risk and identification models based on three years of data successfully forecasted the occurrence of disease more than 80 percent of the time in 2018, giving farmers forewarning and recommendations on whether and when to apply fungicides.
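As a rough sense of scale for the Seed Advisor figure, the reported per-acre average can be multiplied by the tested acreage. The short Python sketch below does that arithmetic; it assumes the 9.1-bushel average is acre-weighted across all 100,000 tested acres, which the source does not state, and the function name is introduced here only for illustration.

```python
ACRES_TESTED = 100_000               # acreage reported for the 2018 tests
YIELD_ADVANTAGE_BU_PER_ACRE = 9.1    # reported average advantage


def total_yield_advantage(acres, advantage_per_acre):
    """Implied total extra bushels, assuming an acre-weighted average."""
    return acres * advantage_per_acre


if __name__ == "__main__":
    total = total_yield_advantage(ACRES_TESTED, YIELD_ADVANTAGE_BU_PER_ACRE)
    print(f"{total:,.0f} bushels")  # prints "910,000 bushels"
```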

The Technology

The Climate FieldView™ (FieldView) platform helps farmers manage their crops using data on their acreage, soil type, elevation, average precipitation, crop and yield histories and other measures.

Disruptive Factor

Given the scarcity and cost of data scientists, productivity was the key deliverable Climate needed not only to sustain FieldView but also to support new recommendations, new crops and new geographic markets. “Most data scientists spend 80 percent of their time figuring out where data exists, how to bring it together, how to set up data infrastructure and how to execute an analysis,” says Hochmuth. “For somebody with a Ph.D., that’s not the best use of their time. Using Domino, now 20 percent of data scientists’ time might involve particulars around data and execution and 80 percent of their time is focused on doing model development.”

Shining Moment

The benefits of standardization, automation and documentation supported by Domino have streamlined Climate’s model-development workflow and doubled data scientist productivity. The platform gives data scientists self-service control over their development environments while easing collaboration with data engineers for ongoing optimization work and the handoff of models for production deployment.
