Supernova Award Category
The Problem
Most data science teams end up frustrated with their AI initiatives. Importantly, that frustration does not come from the mathematical models they are building; that, after all, is what data scientists are trained to do. Rather, it stems from the massive, complex datasets that most AI projects require to produce models that perform well in the real world. By underestimating the data effort involved in AI projects, organizations place their data scientists under tremendous time pressure and often unrealistic expectations.
As an example, consider the challenges involved in teaching a drone to deliver a package. GPS can get the drone to the right neighborhood, even to the right house. But then, how does it know to leave the package on the driveway or front stoop, and not on the roof or in the backyard pool? The solution is training. The drone's algorithm has to be shown tens or hundreds of thousands of aerial images of homes, in each of which every significant object (e.g., driveways, bushes, ponds, yards) has been clearly marked and labeled.
Today, humans have to do much of that labeling and marking, and the humans who do the bulk of it are very expensive, highly trained data scientists. Because there are usually only a handful of them to do image annotation work that calls for hundreds or thousands of people, strategic and high-visibility AI projects burn through budgets and fall behind schedule.
The Solution
Artificial intelligence is only as good as the data used to train it. Alegion combines its own AI software with a million-member on-demand workforce of data specialists to produce extremely high-quality AI training data at massive scale. The company’s Data and Task Management Platform can manage both great volumes of data (hundreds of thousands to millions of data items) and very large work teams (thousands of people simultaneously).
The software supports processes designed specifically for machine learning training data. It can define and configure complex data task workflows, including multi-step and conditional paths, for text, photo, video, and audio. Alegion's platform is particularly strong in computer vision: its image and video annotation capabilities include point annotation and counts, keyline points, bounding boxes, polygons, and semantic segmentation, improving the speed and accuracy with which computer vision training data is produced.
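To make the idea of a "multi-step and conditional" workflow more concrete, here is a minimal sketch of one way such a workflow could be modeled. It is not Alegion's platform or API; every name in it (Step, run_workflow, annotate, human_review, the 0.9 confidence threshold) is hypothetical and chosen only for illustration. Each step does some work on a data item and then picks the next step based on the item's state, so a low-confidence first-pass annotation is routed to an extra human review step while a high-confidence one finishes immediately.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional


def end_workflow(item: dict) -> Optional[str]:
    """Default routing rule: no next step, so the workflow ends."""
    return None


@dataclass
class Step:
    name: str
    # Does this step's work (e.g. annotation or review) and returns the item.
    run: Callable[[dict], dict]
    # Chooses the name of the next step from the item's state; None ends the workflow.
    next_step: Callable[[dict], Optional[str]] = end_workflow


def run_workflow(steps: Dict[str, Step], start: str, item: dict) -> dict:
    """Walk the workflow from `start`, recording which steps ran in item['trace']."""
    current: Optional[str] = start
    while current is not None:
        step = steps[current]
        item = step.run(item)
        item.setdefault("trace", []).append(step.name)
        current = step.next_step(item)
    return item


# Hypothetical steps for illustration only.
def annotate(item: dict) -> dict:
    # First-pass label plus a confidence score (here simply copied from the input).
    item["label"] = "driveway"
    item["confidence"] = item.get("model_confidence", 0.5)
    return item


def human_review(item: dict) -> dict:
    # Stand-in for a human reviewer confirming or correcting the label.
    item["reviewed"] = True
    return item


STEPS: Dict[str, Step] = {
    # Conditional path: send low-confidence annotations to review, finish otherwise.
    "annotate": Step("annotate", annotate,
                     lambda it: "review" if it["confidence"] < 0.9 else None),
    "review": Step("review", human_review),
}

if __name__ == "__main__":
    print(run_workflow(STEPS, "annotate", {"model_confidence": 0.95})["trace"])
    print(run_workflow(STEPS, "annotate", {"model_confidence": 0.40})["trace"])
```

Keeping routing rules as plain functions on each step is one simple design choice; a production system would more likely load such rules from configuration.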
The Results
Before: An organization decides to pursue a strategic AI initiative. It recruits expensive data scientists, who develop the necessary algorithms. It then falls to the data scientists to assemble, clean, normalize, label, and annotate the thousands or millions of data items required to train the algorithm so that it can go into production at an acceptable level of confidence. The data scientists, highly skilled mathematicians, are sometimes less skilled in data management, and they almost never have the time required to select and process huge volumes of data. As a result, they burn through budget and project time trying to squeeze performance out of a model with inadequate training data. Ultimately, these highly strategic projects end up in serious jeopardy.
After: An organization decides to pursue a strategic AI initiative. Its data scientists build the appropriate algorithms and reach out to Alegion for help with training data collection and annotation. Alegion discusses the scope and nature of the dataset and determines the annotations and taxonomies it requires. The organization sends its data to Alegion. Alegion structures the annotation tasks in its platform, along with the workflow required to meet the organization's time and quality goals, and selects an appropriate subset of its global workforce to execute those tasks. Alegion then delivers the finished training dataset to the organization.
Metrics
Alegion typically guarantees 99%+ accuracy in its training data. Higher accuracy is achieved by adding more steps to the workflow, including more review steps, both automated and human.
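Why more review steps raise accuracy can be illustrated with a deliberately idealized model. The model and all numbers here are assumptions for illustration, not Alegion's figures or method: suppose the first annotation pass has error rate p, and each review step independently catches a fraction r of the errors still present. The residual error after k reviews is then p(1 - r)^k. Real review steps are rarely fully independent (reviewers can share blind spots), so this overstates the benefit, but it shows why accuracy climbs quickly as review steps are added. The function names below are hypothetical.

```python
from typing import Optional


def residual_error(first_pass_error: float, catch_rate: float, review_steps: int) -> float:
    """Idealized error rate after `review_steps` reviews, assuming each review
    independently catches `catch_rate` of the errors that remain."""
    if not (0.0 <= first_pass_error <= 1.0 and 0.0 <= catch_rate <= 1.0):
        raise ValueError("rates must be in [0, 1]")
    if review_steps < 0:
        raise ValueError("review_steps must be non-negative")
    return first_pass_error * (1.0 - catch_rate) ** review_steps


def reviews_needed(first_pass_error: float, catch_rate: float,
                   target_accuracy: float, max_steps: int = 20) -> Optional[int]:
    """Smallest number of review steps whose idealized accuracy meets the
    target, or None if `max_steps` reviews are not enough."""
    for k in range(max_steps + 1):
        if 1.0 - residual_error(first_pass_error, catch_rate, k) >= target_accuracy:
            return k
    return None


if __name__ == "__main__":
    # Hypothetical: 5% first-pass error, each review catches 60% of what remains.
    for k in range(3):
        print(k, round(1.0 - residual_error(0.05, 0.6, k), 4))
    print(reviews_needed(0.05, 0.6, 0.99))  # → 2
```

Under these assumed rates, accuracy goes from 95% with no review to 98% with one review and 99.2% with two.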
The Technology
Alegion’s technology is a combined Data and Task Management Platform. It provides on-demand workers with tools for labeling and annotating training data. It includes a workflow manager that supports any number or type of annotation and review processes. The platform also manages and monitors the performance of the humans in the loop. It can manage massive amounts of data (hundreds of thousands to millions of data items) and very large work teams (thousands of people simultaneously).
Disruptive Factor
The real disruption in this context is AI and machine learning itself. The lack of high-quality training data at scale is constraining the growth and promise of AI. Despite very significant focus and investment, over half of enterprise AI projects are never deployed, which leaves much of that investment without any return.
Alegion addresses training data, the single biggest blocker of AI deployments. Its disruptive contribution is not only to make AI projects less expensive and faster to deploy, although it does both. More often, Alegion's training data efforts lead to model deployments that otherwise would not happen.
Shining Moment
Alegion's founders are passionate about improving and evolving the distributed, on-demand labor model. Too often today, these opportunities are exploitative and offer nothing beyond per-transaction compensation. Alegion works with governments and like-minded crowdsourcers to provide hundreds of thousands of people around the world with next-generation work opportunities and compensation that improves their life circumstances.
