In July of this year, Satya Nadella shared our broad vision for big data and analytics when he announced Cortana Analytics. Building on this vision, today we’re announcing a new and expanded Azure Data Lake that makes big data processing and analytics simpler and more accessible.
MyPOV - Good to see Microsoft executing fast on announcements; I'd peg the Data Lake all the way back to May at Build. But there is some urgency, as enterprises are building next-generation applications at an increasing pace these days, and it is important for Microsoft to be there as a strategic technology partner.
The expanded Microsoft Azure Data Lake includes the following:
Azure Data Lake Store, previously announced as Azure Data Lake, will be available in preview later this year. The Data Lake Store provides a single repository where you can easily capture data of any size, type and speed without forcing changes to your application as data scales. In the store, data can be securely shared for collaboration and is accessible for processing and analytics from HDFS applications and tools.
Azure Data Lake Analytics, a new service built on Apache YARN that dynamically scales so you can focus on your business goals, not on distributed infrastructure. This service will be available in preview later this year and includes U-SQL, a language that unifies the benefits of SQL with the expressive power of user code. U-SQL’s scalable distributed query capability enables you to efficiently analyze data in the store and across SQL Servers in Azure, Azure SQL Database and Azure SQL Data Warehouse.
Azure HDInsight, our fully managed Apache Hadoop cluster service with a broad range of open source analytics engines including Hive, Spark, HBase and Storm. Today, we are announcing general availability of managed clusters on Linux with an industry-leading 99.9% uptime SLA. HDInsight will be able to take advantage of capabilities in the Store for increased throughput, scale and security.
Supporting the Azure Data Lake:
Azure Data Lake Tools for Visual Studio provide an integrated development environment that spans the Azure Data Lake, dramatically simplifying authoring, debugging and optimization for processing and analytics at any scale.
Leading Hadoop ISV applications that span security, governance, data preparation and analytics can be easily deployed from the Azure Marketplace on top of Azure Data Lake.
MyPOV - Kudos to Microsoft for defining and thus bringing clarity to what the Azure Data Lake is from a product portfolio perspective. Too often vendors create more confusion with announcements, so this definition is helpful. So let's dissect more below.
Azure Data Lake
MyPOV - Well, that reads almost too good to be true. Removing complexity is always a good thing, but reading about 'seamless' integration always makes the alarm bells ring... we will have to dig a little deeper here. On the flip side, Microsoft can claim broad experience working with the data of its different divisions. If Microsoft can showcase how 'it is drinking its own champagne' with Azure Data Lake, that would be a very powerful showcase and add immediate credibility to the overall offering.
Azure Data Lake includes all the capabilities required to make it easy for developers, data scientists, and analysts to store data of any size, shape and speed, and do all types of processing and analytics across platforms and languages. It removes the complexities of ingesting and storing all of your data while making it faster to get up and running with batch, streaming, and interactive analytics. Azure Data Lake works with existing IT investments for identity, management, and security for simplified data management and governance. It also integrates seamlessly with operational stores and data warehouses so you can extend current data applications. We’ve drawn on the experience of working with enterprise customers and running some of the largest scale processing and analytics in the world for Microsoft businesses like Office 365, Xbox Live, Azure, Windows, Bing and Skype. Azure Data Lake solves many of the productivity and scalability challenges that prevent you from maximizing the value of your data assets with a service that’s ready to meet your current and future business needs.
“Hortonworks and Microsoft have partnered closely over many years to further the Hadoop platform for big data analytics, including contributions to YARN, Hive, and other Apache projects,” said Rob Bearden, CEO at Hortonworks. “Azure Data Lake, including Azure HDInsight powered by Hortonworks Data Platform, demonstrates our shared commitment to make it easier for everyone to work with big data.”
MyPOV - Always good to have partner and ecosystem presence in a press release and Hortonworks is key for the proper working of the Azure Data Lake.
Azure Data Lake Store – A hyper-scale repository for big data processing and analytic workloads
MyPOV - Storage capabilities have become core to the persistence needs of next-gen applications. If, for example, IoT data cannot be ingested fast enough, even the finest analytics tool set on top of it doesn't create value. We will have to see what scalability numbers and use cases Microsoft can offer. A good move is to be 'agnostic' with regard to the Hadoop distribution by partnering with the 'Big 3': Cloudera, Hortonworks and MapR.
The value of a data lake resides in the ability to develop solutions across data of all types – unstructured, semi-structured and structured. This begins with the Azure Data Lake Store, a single repository to capture and access any type of data for high-performance processing and analytics and low latency workloads with enterprise-grade security. For example, data can be ingested in real-time from sensors and devices for IoT solutions, or from online shopping websites into the store without the restriction of fixed limits on account or file size unlike current offerings in the market. As part of Azure Data Lake, the store supports development of your big data solutions with the language or framework of your choice. The store in Azure Data Lake is HDFS compatible so Hadoop distributions like Cloudera, Hortonworks®, and MapR can readily access the data for processing and analytics.
“Cloudera is pleased to be working closely with Microsoft to integrate our enterprise data hub with the Azure Data Lake Store,” said Mike Olson, founder and chief strategy officer at Cloudera. “Cloudera on Azure benefits from the Data Lake Store which acts as a cloud-based landing zone for data in your enterprise data hub. Because the store is compatible with WebHDFS, Cloudera can leverage Data Lake and provide customers with a secure and flexible big data solution.”
MyPOV - Good quote by Olson; making things tangible is always appreciated. It reads like Cloudera-dependent applications and code can be deployed with Azure Data Lake, which is a win for customers and of course allows Microsoft to position the Azure Data Lake as an alternative to other large data storage options in the market.
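To make the WebHDFS compatibility that Olson mentions a bit more tangible, here is a minimal Python sketch of how a WebHDFS-style REST URL is put together. It assumes the standard Apache Hadoop WebHDFS URL layout (`/webhdfs/v1/<path>?op=...`). The host name, port, scheme and paths are hypothetical placeholders, and nothing here is confirmed as the actual endpoint format of the Azure Data Lake Store preview. The sketch only builds URLs and sends no requests.

```python
from urllib.parse import quote, urlencode


def webhdfs_url(host, path, op, port=443, scheme="https", **params):
    """Build a WebHDFS-style REST URL:
    <scheme>://<host>:<port>/webhdfs/v1/<path>?op=<OP>&<param>=<value>...
    """
    # The operation name is passed as the "op" query parameter, upper-cased.
    query = urlencode({"op": op.upper(), **params})
    # Percent-encode the file system path, keeping "/" as the separator.
    encoded_path = quote(path.lstrip("/"))
    return f"{scheme}://{host}:{port}/webhdfs/v1/{encoded_path}?{query}"


# Hypothetical host name, for illustration only (not a real endpoint).
HOST = "example-store.invalid"

# LISTSTATUS lists a directory; OPEN reads a file, optionally from an offset.
list_url = webhdfs_url(HOST, "/landing/sensors", "liststatus")
open_url = webhdfs_url(HOST, "/landing/sensors/day 1.csv", "open", offset=0)

print(list_url)
print(open_url)
```

The point of the sketch is that a tool which already speaks WebHDFS only needs a different host to address a compatible store, which is presumably what makes the integration attractive for Hadoop distributions.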
Azure Data Lake Analytics – a new distributed processing and analytics service
Azure Data Lake Analytics lets you focus on the logic of your application, not the distributed infrastructure running it. Instead of deploying, configuring and tuning hardware, you write queries to transform your data and extract valuable insight. Built on Apache YARN, and designed for the cloud, the analytics service can handle jobs of any scale instantly by simply setting the dial for how much power you need. The analytics service for Azure Data Lake is cost-efficient because you only pay for your job when it is running, and support for Azure Active Directory lets you manage access and roles simply and integrates with your on-premises identity system.
MyPOV - It is always good to reduce complexity, and taking away the details and intricacies of infrastructure is a huge draw for enterprises deploying next-gen applications to cloud-based platforms. It is also good that Microsoft points out usage-based billing, one of the key value propositions of cloud-based offerings. Finally, it is smart of Microsoft to keep using the 'higher ground': on-premises Active Directory integration is one of the big differentiators for Microsoft when moving services from on premises to the cloud or provisioning users for next-generation applications.
We know that many developers and data scientists struggle to be successful with big data using existing technologies and tools. Code-based solutions offer great power, but require significant investments to master, while SQL-based tools make it easy to get started but are difficult to extend. We've faced the same problems inside Microsoft and that’s why we introduced U-SQL, a new query language that unifies the ease of use of SQL with the expressive power of C#. The U-SQL language is built on the same distributed runtime that powers the big data systems inside Microsoft. Millions of SQL and .NET developers can now process and analyze all of their data with the skills they already have. The U-SQL support in Azure Data Lake Tools for Visual Studio includes state of the art support for authoring, debugging and advanced performance analysis features for increased productivity when optimizing jobs running across thousands of nodes.
MyPOV - I am always concerned when new programming languages get created. In my view the world has enough programming languages already. That said, there are new use cases that developers and enterprises have to build applications for. So it comes back to understanding what benefits U-SQL brings to developers. It's good to see Microsoft has used it for its internal use cases and deployments, so before coming to a first verdict on U-SQL, let's understand more details about it.
“U-SQL was especially helpful because we were able to get up and running using our existing skills with .NET and SQL,” says Sam Vanhoutte, Chief Technology Officer at Codit. “This made big data easy because we didn’t have to learn a whole new paradigm. With Azure Data Lake, we were able to process data coming in from smart meters and combine it with the energy spot market prices to give our customers the ability to optimize their energy consumption and potentially save hundreds of thousands of dollars.”
MyPOV - Well, that's a strong endorsement of U-SQL, and it is good to see the compatibility with C# and .NET. This is one of the key advantages of Microsoft in next-generation applications: it can leverage its existing ecosystem and developer community. Making integration with C# easier taps into the second-largest developer community out there (after Java).
Azure HDInsight - Fully Managed Hadoop, Spark, Storm and HBase
Azure Data Lake also includes HDInsight, our Apache Hadoop-based service that allows you to spin up any number of nodes in minutes. As one of the fastest growing services in Azure, HDInsight gives you the breadth of the Hadoop ecosystem in a managed service that’s monitored and supported by Microsoft. Furthering our commitment to productivity, we’ve updated our Visual Studio Tools for authoring, advanced debugging, and tuning for Hive queries and Storm topologies running in HDInsight.
MyPOV – Also a good move, HDInsight has been popular in the Microsoft ecosystem already, as enterprises struggle with standing up and operating Hadoop distributions. And finding synergies with Visual Studio will help this offering further.
Today, we are announcing the general availability of HDInsight on Linux. We work closely with Hortonworks and Canonical to provide the HDP™ distribution on the Ubuntu Operating System that powers the Linux version of HDInsight in the Data Lake. This is another strategic step by Microsoft to meet customers where they are and make it easier for you to run Hadoop workloads in the cloud.
MyPOV – That’s probably one of the bigger news items in the release. It is not too long ago that Microsoft was a Windows-only shop. By endorsing more Linux-based offerings, Microsoft is looking at what developers are familiar with and what they prefer as a platform. Being Linux based has also increasingly become an RfP requirement for the data side of next-gen application platforms, as enterprises still see the need to potentially move these platforms back on premises. Finally, the TCO of running Hadoop on Linux vs. Windows may differ, starting with the availability of distributions, but that is likely a topic Microsoft will not be as eager to address.
Leading Hadoop ISVs on the Azure Data Lake
There is a growing set of leading data management applications for Azure Data Lake. This includes applications that provide end-to-end big data analytics like Datameer, technologies that address big data security and governance like Dataguise and BlueTalon, unified stream and batch with DataTorrent, and tools that give business users the ability to visualize and analyze data in compelling ways like AtScale and Zoomdata. Support from our partners ensures that you have the best applications available as you get started with Azure Data Lake.
MyPOV – Good to see the mention of a number of important ISVs that help enterprises build, operate and maintain their next-generation applications. The ISV investment is a further proof point that Microsoft’s BigData offering is being taken seriously and sees traction in the enterprise (otherwise these ISVs would not invest to bring their platforms to Azure).
We will continue to invest in solutions for big data processing and analytics to make it easier for everyone to work with data of any type, size and speed using the tools, languages and frameworks they want to in a trusted cloud, hybrid or on premise environment. Our goal is to make big data technology simpler and more accessible to the greatest number of people possible: big data professionals: developers, data scientists, analysts, and application developers; but also businesspeople and mainstream IT managers.
MyPOV – Well, that’s a nice mission statement. Let’s check in on how Microsoft delivers on this vision and these goals over the next quarters.
Overall MyPOV
Next-gen applications are key for enterprises to achieve differentiation and create disruption in their respective industries. Almost all of the 7 generic use cases we have identified involve Hadoop-style BigData elements, as well as analytics and machine learning capabilities. Microsoft knows this and is adding capabilities to Azure accordingly. Moreover, BigData projects bring ‘load’ to platforms, which itself creates scale, which then creates economies of scale that reduce the cost to operate the platforms. Again, BigData is great for platform vendors, due to the gravitational nature of BigData: it is not easy to move BigData across platforms.
On the concern side, Microsoft comes late to the overall cloud platform and BigData market. Enterprises had to look elsewhere, and only around spring of this year did Microsoft catch up in product capability and messaging. The capabilities in this press release are another major step forward, but the interested buyer needs to note the GA dates of most of these offerings.
On the brighter side Microsoft is catching up to the leaders in the space, and bringing a large ecosystem and synergy play to the next generation applications market. There are a lot of Microsoft centric enterprises out there, and for them these offerings are attractive.
Stay tuned for more about the exciting platform announcements for building next generation applications.
More about Microsoft:
- News Analysis - Microsoft and Salesforce Strengthen Strategic Partnership at Dreamforce 2015 - Good for joint customers - read here
- News Analysis - NetSuite announced Cloud Alliance with Microsoft - read here
- Event Report - Microsoft Build - Microsoft really wants to make developers' lives easier - read here
- First Hand with Microsoft Hololens - read here
- Event Report - Microsoft TechEd - Top 3 Enterprise takeaways - read here
- First Take - Microsoft discovers data ambience and delivers an organic approach to in memory database - read here
- Event Report - Microsoft Build - Azure grows and blossoms - enough for enterprises (yet)? Read here.
- Event Report - Microsoft Build Day 1 Keynote - Top Enterprise Takeaways - read here.
- Microsoft gets even more serious about devices - acquire Nokia - read here.
- Microsoft does not need one new CEO - but six - read here.
- Microsoft makes the cloud a platform play - Or: Azure and her 7 friends - read here.
- How the Cloud can make the unlikeliest bedfellows - read here.
- How hard is multi-channel CRM in 2013? - Read here.
- How hard is it to install Office 365? Or: The harsh reality of customer support - read here.