Salesforce announced a good starting point for big data analysis Thursday with the introduction of Salesforce Wave for Big Data, but it’s just that: a starting point. Stay tuned for more partners, more big data plumbing and, most importantly for customers, proven and repeatable use cases.

This week’s announcement was all about big data access, with free integrations introduced for Google Compute Cloud, the Cloudera and Hortonworks Hadoop distributions, and application-management vendor New Relic. That’s a pretty impressive list, though the obvious holdouts are Amazon Web Services and Splunk. It’s a good bet they’ll join the list as Salesforce Wave adoption grows.

A Salesforce Wave for Big Data analysis in which weather and sensor data is mashed up with customer Bay Area BikeShare’s CRM data.

What you didn’t hear much about was ready-to-run analyses, much less anything resembling applications. That’s pretty typical in the big data space, where the possibilities are limitless but rarely concrete. It’s up to practitioners to come up with something practical, affordable and differentiating.

There are, at least, very obvious themes Salesforce customers can pursue:

  • Sales: Clickstreams and other high-scale customer-interaction data sets could enhance sales lead scoring efforts.
  • Service: With all the discussion of Internet-of-Things opportunities, we can’t miss the idea of applying log and sensor data to customer-service scenarios such as predictive maintenance (see the sketch after this list).
  • Marketing: Mobile data is getting big, and it could help marketers identify their best customers and run more responsive campaigns.

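To make the Service theme concrete, here is a minimal, hypothetical sketch of the predictive-maintenance idea: scan a sensor-log extract and flag the assets whose recent readings drift above a threshold. The file name, column names and threshold are illustrative assumptions, not anything Salesforce or its partners ship.

```python
# Hypothetical sketch: flag assets for proactive service based on sensor logs.
# Column names (asset_id, vibration, reading_ts) and the threshold are
# illustrative assumptions, not part of any Salesforce or partner API.
import pandas as pd


def flag_assets_for_service(sensor_csv: str, vibration_threshold: float = 0.8) -> pd.DataFrame:
    """Return assets whose average vibration over the last week exceeds a threshold."""
    readings = pd.read_csv(sensor_csv, parse_dates=["reading_ts"])

    # Keep only the trailing seven days of readings.
    cutoff = readings["reading_ts"].max() - pd.Timedelta(days=7)
    recent = readings[readings["reading_ts"] >= cutoff]

    # Average the vibration signal per asset and flag the outliers.
    summary = recent.groupby("asset_id", as_index=False)["vibration"].mean()
    return summary[summary["vibration"] > vibration_threshold]


if __name__ == "__main__":
    at_risk = flag_assets_for_service("sensor_readings.csv")
    print(at_risk.head())
```
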
In the absence (for now) of lots of real-world customer examples, the message might be, “If you can dream it, you can do it.” But the data you need won’t magically appear in Salesforce Wave. We’re going to see two personas involved at the back end of Salesforce Wave for Big Data deployments: big data developer types and data wranglers. The developer types will be the people who can spot the right data deep inside these Google/Cloudera/Hortonworks/New Relic data lakes. The data wranglers will then curate that information into slivers of data or aggregates that can be loaded into Wave.
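
What might that wrangling step look like? Below is a hedged sketch, assuming a Spark-capable Hadoop cluster, of how a data wrangler could distill raw clickstream events in the data lake into a compact per-lead aggregate suitable for loading into Wave. The paths and column names are hypothetical.

```python
# Hypothetical sketch of the "data wrangler" step: distill raw clickstream
# events sitting in a Hadoop data lake into a small per-lead aggregate that
# could be loaded into Wave. Paths and column names are illustrative
# assumptions; this is not a Salesforce-provided pipeline.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("wave-clickstream-sliver").getOrCreate()

# Raw events: one row per page view, e.g. (lead_id, page, event_ts).
events = spark.read.parquet("hdfs:///data_lake/clickstream/2015/05/")

# Aggregate billions of events down to a few engagement features per lead.
sliver = (
    events.groupBy("lead_id")
    .agg(
        F.count("*").alias("page_views"),
        F.countDistinct("page").alias("distinct_pages"),
        F.max("event_ts").alias("last_seen"),
    )
)

# Write a compact CSV extract that a data-prep tool (or a loader script)
# can push into Wave alongside CRM lead records.
sliver.coalesce(1).write.mode("overwrite").csv(
    "hdfs:///exports/wave/lead_engagement/", header=True
)
```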

Data curation is where Salesforce Wave for Big Data partners Informatica and Trifacta come in. Informatica is, of course, Salesforce’s biggest cloud-data-integration partner. But this data-munging process doesn’t have to be an IT project: both vendors also offer self-service data-blending and transformation tools (Informatica Rev and Trifacta, respectively) geared to analysts and data-savvy business users who can prepare data for loading into Wave. Once these data-movement jobs are set up, Salesforce says they can be repeated up to 20 times per day; near-hourly latency is another pretty good start.
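
The Informatica and Trifacta tools handle that movement through their own interfaces, but for readers who want to picture what lands on the Salesforce side, here is a hedged sketch of a programmatic load using the Analytics (Wave) External Data API through the simple_salesforce Python client. The dataset alias, file names, schema file and credentials are illustrative assumptions, not part of the announced integrations.

```python
# Hypothetical sketch of pushing a prepared CSV into a Wave dataset via the
# Analytics (Wave) External Data API. In practice the partner tools described
# above handle this; the dataset alias, files and credentials are assumptions.
import base64
from simple_salesforce import Salesforce

sf = Salesforce(username="user@example.com", password="***", security_token="***")


def b64(path: str) -> str:
    """Base64-encode a local file, as the External Data API expects."""
    with open(path, "rb") as f:
        return base64.b64encode(f.read()).decode("ascii")


# 1. Create the header record describing the upload.
header = sf.InsightsExternalData.create({
    "EdgemartAlias": "lead_engagement",   # target Wave dataset (hypothetical)
    "Format": "Csv",
    "Operation": "Overwrite",
    "Action": "None",
    "MetadataJson": b64("lead_engagement_schema.json"),
})

# 2. Attach the CSV data in one or more parts (one part shown for brevity;
#    large files must be split to stay under the API's part-size limit).
sf.InsightsExternalDataPart.create({
    "InsightsExternalDataId": header["id"],
    "PartNumber": 1,
    "DataFile": b64("lead_engagement.csv"),
})

# 3. Tell Wave to process the upload into the dataset.
sf.InsightsExternalData.update(header["id"], {"Action": "Process"})
```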

MyPOV: Great Start, Now For The Second Wave

Salesforce will have more to say about big data later this summer, so the story will get richer than simply extracting data sets and loading them into Wave. We’ll want to see deeper and more repeatable connective tissue between diverse source systems and Wave. Customers will want to see proven, lighthouse-customer case examples and some form of starting point or blueprint for building big data apps.

The imperative for would-be customers – and indeed for all big data practitioners – is getting business leaders engaged in the blue-sky brainstorming. Without good, free-flowing and ongoing collaboration between the business types and the developers and data wranglers, a project can start to feel like an old-school data warehousing effort, with requirements-gathering and time-lag disconnects. As my colleague Holger Mueller points out, the very act of picking slivers of data or aggregates out of those big repositories removes the opportunity to “ask all kinds of ‘crazy’ questions in the hunt for insights.”

Holger is alluding to the flexibility of schema-on-read analysis, but it’s just not practical to put everything in a Hadoop cluster into Wave. The database behind Wave is a proprietary NoSQL key-value store, so we should expect fast and flexible data loading. But from what we’ve seen thus far, it seems the serendipitous big-data-discovery opportunities will remain at the big data platform level.

Salesforce Wave for Big Data is about putting proven insights into production in an end-user-accessible data mart in the cloud. That’s still powerful because Wave is broadly accessible to all those Salesforce business users, with, of course, all the mobile and social goodness that comes with Salesforce.

Have a question about your big data/analytics strategy? Let's talk! Contact me here.