IBM adds cloud services for database deployment, graph analysis, predictive analytics and data enrichment, hoping to attract developers and data scientists to IBM Bluemix.

On February 4, IBM added a handful of new data and analytics services to the more than 25 already available on the IBM Bluemix cloud platform. The new additions are IBM Compose Enterprise, the IBM Graph database service, IBM Predictive Analytics and IBM Analytics Exchange.


IBM added four new services to its Data and Analytics portfolio on February 4. The collection is just one subset of services in the IBM Bluemix cloud platform portfolio.

As is typical with press releases, the details of the announcement were cryptic and buzzword-laden, so I talked to Adam Kocoloski, CTO of IBM’s Analytics Platform Services Marketplace, to get more detail. Here’s what I learned, along with my own analysis.

IBM Compose Enterprise: When IBM acquired ComposeIO (now Compose, an IBM Company) last July, it accelerated its ability to help development teams deliver web-scale apps by enabling them to quickly containerize and deploy open-source databases on multiple public clouds (today that list includes AWS, DigitalOcean and IBM SoftLayer). Compose Enterprise puts the same platform in the hands of enterprise IT so they can deploy MongoDB, Elasticsearch, Redis, PostgreSQL, RabbitMQ and a few other distributed, open-source products in containerized fashion on their private clouds. The service eases the tasks of clustering, containerizing, upgrading and backing up database instances in a uniform way.

MyPOV: This is a great move, giving customers a private-cloud/IaaS containerized database deployment option. As a side note, at one point (the company says it was two years ago), Compose supported deployment on Microsoft Azure. Not sure why that option went away, but it would be nice to see Azure and Google on the public cloud option list along with AWS and SoftLayer. The more portable the container, the better.

IBM Graph: This is a managed, cloud-based graph database service that employs Apache TinkerPop and the Gremlin graph traversal language as its interface. As Wikipedia describes it, Apache TinkerPop and Gremlin are to graph databases what JDBC and SQL are to relational databases. Developed in 2009, Gremlin is supported by notable OLTP graph databases including Neo4j, OrientDB and Titan, as well as by the OLAP graph processors Giraph and Spark. Graph approaches are well suited to use cases including IoT, real-time recommendations, fraud detection and network analysis.
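To make "graph traversal language" a little more concrete, here is a toy, in-memory imitation of a Gremlin-style query in Python. It is a hypothetical illustration only, not the TinkerPop API or anything IBM Graph exposes; the classes `Graph` and `Traversal` and the sample data are invented for this sketch. The chained steps mirror the real Gremlin query `g.V().has('name','alice').out('knows').out('knows').values('name')`, a friends-of-friends lookup of the kind used for recommendations.

```python
"""Toy imitation of a Gremlin-style fluent traversal (hypothetical; NOT TinkerPop).

Real Gremlin equivalent:
    g.V().has('name', 'alice').out('knows').out('knows').values('name')
"""
from dataclasses import dataclass, field


@dataclass
class Graph:
    # vertex id -> property dict
    vertices: dict[str, dict[str, str]] = field(default_factory=dict)
    # (from_id, edge_label, to_id) triples
    edges: list[tuple[str, str, str]] = field(default_factory=list)

    def V(self) -> "Traversal":
        # Start a traversal positioned at every vertex in the graph.
        return Traversal(self, list(self.vertices))


@dataclass
class Traversal:
    graph: Graph
    current: list[str]

    def has(self, key: str, value: str) -> "Traversal":
        # Filter step: keep vertices whose property `key` equals `value`.
        kept = [v for v in self.current if self.graph.vertices[v].get(key) == value]
        return Traversal(self.graph, kept)

    def out(self, label: str) -> "Traversal":
        # Move step: follow outgoing edges carrying the given label.
        nxt = [dst for v in self.current
               for (src, lbl, dst) in self.graph.edges
               if src == v and lbl == label]
        return Traversal(self.graph, nxt)

    def values(self, key: str) -> list[str]:
        # Terminal step: read one property from each vertex reached.
        return [self.graph.vertices[v][key] for v in self.current]


g = Graph(
    vertices={"1": {"name": "alice"}, "2": {"name": "bob"},
              "3": {"name": "carol"}, "4": {"name": "dave"}},
    edges=[("1", "knows", "2"), ("2", "knows", "3"), ("2", "knows", "4")],
)

# Friends of friends of alice.
print(g.V().has("name", "alice").out("knows").out("knows").values("name"))
# -> ['carol', 'dave']
```

The point of the fluent style is that each step narrows or moves a set of "current" vertices, which is why the same query reads naturally whether the graph has four vertices or four billion.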

Why use IBM Graph as opposed to better-known graph databases (like Neo4j, Titan or GraphDB) delivered as a service? Kocoloski said the graph data store under the hood of IBM Graph is “an implementation detail” that’s not exposed to the service user. “Because we’ve standardized on the TinkerPop API, we can experiment with different engines for different use cases. Many graph databases support TinkerPop, so it’s possible to use Neo4j one day and swap it out for something else the next without rewriting the application.”
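Kocoloski's "implementation detail" argument is the classic program-to-an-interface pattern. The sketch below is a hypothetical Python illustration of that idea, not IBM's code: `GraphEngine` stands in for a standard API such as TinkerPop, and the two backend classes are invented stand-ins for different storage engines. The application function is written once against the interface and runs unchanged on either backend.

```python
from typing import Protocol


class GraphEngine(Protocol):
    """Hypothetical stand-in for a standardized graph API such as TinkerPop."""

    def add_edge(self, src: str, label: str, dst: str) -> None: ...
    def neighbors(self, vertex: str, label: str) -> list[str]: ...


class AdjacencyListEngine:
    """Hypothetical backend A: edges kept in a dict of adjacency lists."""

    def __init__(self) -> None:
        self._adj: dict[tuple[str, str], list[str]] = {}

    def add_edge(self, src: str, label: str, dst: str) -> None:
        self._adj.setdefault((src, label), []).append(dst)

    def neighbors(self, vertex: str, label: str) -> list[str]:
        return list(self._adj.get((vertex, label), []))


class EdgeListEngine:
    """Hypothetical backend B: edges kept as a flat list of triples."""

    def __init__(self) -> None:
        self._edges: list[tuple[str, str, str]] = []

    def add_edge(self, src: str, label: str, dst: str) -> None:
        self._edges.append((src, label, dst))

    def neighbors(self, vertex: str, label: str) -> list[str]:
        return [d for (s, lbl, d) in self._edges if s == vertex and lbl == label]


def friends_of_friends(engine: GraphEngine, person: str) -> set[str]:
    """Application code: depends only on the interface, never on a backend."""
    result: set[str] = set()
    for friend in engine.neighbors(person, "knows"):
        result.update(engine.neighbors(friend, "knows"))
    result.discard(person)  # don't recommend people to themselves
    return result


def load(engine: GraphEngine) -> GraphEngine:
    # Same sample data for every backend.
    for src, dst in [("alice", "bob"), ("bob", "carol"), ("bob", "alice")]:
        engine.add_edge(src, "knows", dst)
    return engine


for engine in (AdjacencyListEngine(), EdgeListEngine()):
    print(type(engine).__name__, friends_of_friends(load(engine), "alice"))
```

Swapping engines here means changing only which class is constructed; `friends_of_friends` is untouched, which is the property a standard API is meant to buy.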

MyPOV: IBM had a big hand in pushing TinkerPop to join the Apache Software Foundation. I suspect it did so for the same reasons IBM put big support behind CouchDB, an open source, NoSQL rival to the likes of MongoDB and Couchbase. (It did so after acquiring Cloudant, a database-as-a-service company that based its product on CouchDB.) Yes, both CouchDB and TinkerPop are open source projects, but in both cases IBM can exert a lot of influence. While there’s plenty of competition for MongoDB and Neo4j services and support, IBM is the dominant support provider for CouchDB and the only support option for IBM Graph. In short, IBM stands to benefit more if customers use IBM Graph, a commercial offering with open source TinkerPop and Gremlin under the hood.

IBM Predictive Analytics: IBM says this new service allows developers to “easily self-build machine learning models from a broad library into applications.” As I detailed last year, IBM has put a big push on Apache Spark, redesigning or replatforming more than 15 analytics and commerce offerings to run on Spark. IBM says this “dramatically accelerates real-time processing capabilities.”

Predictive Analytics is aimed at putting predictive analytics in the hands of more users. “Not everyone is going to be comfortable dropping down into Spark and working with MLlib or writing their own Scala jobs,” said Kocoloski. “We’re trying to give people an introduction to the world of multivariate analysis and machine learning.”

MyPOV: The key mode of democratization in this new service is an auto-modeling capability that has its roots in SPSS and is currently limited to models exposed in SPSS. Open source Spark is an enabler under the hood, but this is another service offering that leads back to an IBM commercial product.

IBM Analytics Exchange: This data exchange today offers a catalog of more than 150 public datasets, along with a way to blend private data with public data to come up with fresh insights. The Exchange is also a foundation for metadata and business data management in the cloud, said Kocoloski. It will ultimately offer more data sources and make it easier for customers to find relevant ones. IBM also has a separate data-as-a-service business called Insight Cloud Services, which offers high-value data including Twitter data and data from The Weather Company (an acquisition finalized last week). The Exchange will also be “a foundational component” of Insight Cloud Services, said Kocoloski. In short, think of the Exchange as enabling plumbing for exposing and blending all sorts of data.
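"Blending" private and public data usually comes down to joining them on a shared key. The following is a minimal, hypothetical Python sketch of that step; it does not use any IBM Analytics Exchange API, and both datasets (daily average temperatures and a retailer's hot-drink sales) are invented for illustration. It performs an inner join on (city, date), so private rows with no public match are dropped.

```python
# Hypothetical public dataset: average temperature (deg C) per city per day.
public_weather: dict[tuple[str, str], float] = {
    ("Boston", "2016-02-01"): -2.0,
    ("Boston", "2016-02-02"): 4.5,
    ("Austin", "2016-02-01"): 18.0,
}

# Hypothetical private dataset: a retailer's daily hot-drink sales.
private_sales: list[dict] = [
    {"city": "Boston", "date": "2016-02-01", "units": 120},
    {"city": "Boston", "date": "2016-02-02", "units": 95},
    {"city": "Austin", "date": "2016-02-01", "units": 30},
    {"city": "Denver", "date": "2016-02-01", "units": 70},  # no public match
]


def blend(sales: list[dict], weather: dict[tuple[str, str], float]) -> list[dict]:
    """Inner-join private sales rows with public weather on (city, date)."""
    blended = []
    for row in sales:
        key = (row["city"], row["date"])
        if key in weather:
            # Copy the private row and enrich it with the public value.
            blended.append({**row, "avg_temp_c": weather[key]})
    return blended


for row in blend(private_sales, public_weather):
    print(row)
```

An inner join is only one choice; a left join that keeps unmatched private rows (with a missing temperature) is equally common when the private data must not be silently filtered.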

MyPOV: The ability to offer data in the cloud is of huge and growing importance, so the more the better. I look forward to seeing more data sets and a single, integrated catalog serving up everything from widely available public data to unique, high-value sources such as The Weather Company. I do have a minor quibble with the name “Analytics Exchange,” as it seems what we’re talking about here is a data exchange or data-as-a-service infrastructure, not a place where analytics are developed.

MyPOV Overall

The real power of these offerings is cumulative. It’s not any one tactical service that makes IBM’s case; it’s the combined breadth of offerings on Bluemix that creates a one-stop shop for the developer and data scientist. These data and analytics services are a fraction of the overall catalog, which also includes Watson cognitive services, mobile services, DevOps services, web and application services, business analytics services, storage and security services and so on. The broader and more coordinated the portfolio becomes, the more compelling Bluemix becomes as a cloud-based platform for delivering next-gen, data-driven apps.