
Microsoft Teams - What Office 365 Has Been Waiting For


Microsoft first introduced their new collaboration tool Microsoft Teams (they call it a chat-based workspace) back in Nov 2016 as a preview release. Four months and over 100 product enhancements later, this week they made the first official version available to Office 365 business subscribers. Microsoft says there are more than 85 million active users of Office 365, and since the November preview, more than 50,000 organizations (they don’t report how many users) have started using Microsoft Teams, which is available in 181 markets and 19 languages.

You can read the blog post from Kirk Koenigsbauer, Corporate Vice President of the Microsoft Office team for all the details from the launch, and below is a collection of my observations that you can scroll through.

 

Choices. Choices. Choices.

The Group Messaging market, or as Constellation Research calls it, Conversational Business Platforms (CBP), is highly competitive. In the last 3 months alone there have been launches of Cisco Spark, Slack Enterprise Grid, a preview release of IBM’s Watson Workspace and an early adopter program for Google Hangouts Chat. Next week at the Enterprise Connect conference the traditional Unified Communication vendors such as Ring Central/Glip, ALE Rainbow, Unify Circuit, and others will be announcing their latest news. Add to that products including Convo, Flock (which just raised another $25M), HiBox, Intellinote, Ryver and Zinc, and let’s not forget Workplace by Facebook, and you can see that organizations have a wide variety of choices.

Microsoft Teams Starts Off Strong

For an initial release, Microsoft Teams is already very robust. It provides:

  • Threaded conversations with rich text including custom gifs (make your own cartoons)
  • Voice and video calling (I believe up to 80 people in a meeting)
  • Integration with Office 365 apps such as Word, PowerPoint, Excel, OneNote, Planner and SharePoint
  • Calendar integration and meeting scheduling
  • Highly customizable, add your own tabs along the top of each channel for the applications you want
  • Lots of security and compliance standards: SOC 1, SOC 2, EU Model Clauses, ISO27001 and HIPAA, as well as support for audit log search, eDiscovery and legal holds.
  • Accessibility features including support for screen readers, high contrast and keyboard-only navigation
  • There are already 150+ 3rd party integrations and bots

That’s quite an impressive list for a 1.0 product and shows that Microsoft is taking Teams very seriously. I view Teams as what Office 365 should have been from the start… a single user experience that brings together multiple Microsoft (and partner) products/features allowing people to focus more on the work and less on what product they are in.

Still Need: Deeper Integration and Polish

While Teams is very good for an initial release there are still several areas where it needs more functionality or polish. 

  • The OneNote and Planner integrations are quite rudimentary. You can not convert (or copy) a threaded conversation into a note, nor create a note and broadcast a link into the channel. You can’t create a Planner task from the conversation stream nor are updates to Planner tasks broadcast into the stream. In the current incarnation, these apps are simply tabs in a channel that allow you to access the applications, but there is very little integration and they operate as silos.
  • I don’t see a way to mute or hide conversations, so busy channels can get quite full
  • While you can save favourite conversations, I don’t see a way to get a permalink to one so that you can send a link to someone in email or chat, or add it to a calendar invitation.
  • YouTube videos launch into a separate window instead of playing inline
  • There are no hashtags for grouping similar messages
  • Currently Microsoft Teams is limited to internal use only, meaning you can not invite people outside of your organization into a team. For external communication, Microsoft still recommends using a Yammer community. Microsoft expects to have external guest access available at the end of Q2.

My main issue with Microsoft Teams is that while it is built using Office 365 Groups, conversations across Yammer, Outlook Groups, and Teams do not overlap. What I mean is, if you have an Office 365 group named Marketing, you can't post in a Microsoft Team named Marketing and have that same conversation appear in Yammer and in Outlook Groups. This will lead to confusion over which application to use and when. I would like to see a more consistent experience across all of Microsoft's communication and collaboration applications.

Becoming An Intelligent Workspace

Microsoft is making good strides in adding Artificial Intelligence (AI) features to Office, but I’ve yet to see anything added to Teams. Compare that to IBM’s Watson Workspace which uses AI to classify posts by type (such as question or task) as well as provide a daily summary of key conversations. I look forward to seeing what Microsoft does with their Cognitive Services and Cortana to automate workflows, filter information, provide intelligent recommendations, classify images and files, etc.

What This Means for Customers

Two years ago Slack gave the enterprise collaboration market a wake-up call. Despite the availability of enterprise social networks such as Yammer, Socialcast, Jive, IBM Connections and others, it was clear that small groups of people (teams) wanted an easier way to collaborate. Slack’s popularity led to several “clone products” and forced IBM, Google and Microsoft to answer back with products of their own. Integration with the Office 365 portfolio, the huge Microsoft partner ecosystem, and the fact that it’s included in their license make Microsoft Teams a compelling product for Microsoft customers. However, that’s also its weakness. Being part of Office 365 is not what everyone is looking for. Customers who want a simple chat client without the overhead or complexity of Office 365 may opt to look at one of the other Conversational Business Platforms.

 

Oracle Scoops Up Infinity Big Data Platform for Its Marketing Cloud


Constellation Insights

Oracle has quietly added what appears to be a very powerful big data tool to its arsenal. The company is acquiring the technology assets of Infinity, a big-data platform developed by web analytics pioneer Webtrends, and plans to add it to its Marketing Cloud suite. 

Very little information about Oracle's specific plans for Infinity was released, but it will be offered as a cloud service in six to nine months following the deal's close. 

It wasn't clear why Webtrends decided to sell Infinity to Oracle, as it had only launched the platform last year. In a blog post, one Webtrends executive described Infinity's purpose as such:

Infinity is built to support the data challenges presented by the Internet of Things. It does so by providing an object-centric big data store coupled with unlimited data accessibility and exploration. Being object-centric goes far beyond web analytics and “person” behaviors to collecting and understanding interactions from devices, sensors and any other networked object.

Infinity was created as a successor to Webtrends's older Log Analyzer product, and delivers deeper, more granular and lower-latency analysis, says Constellation Research VP and principal analyst Doug Henschen. "It can analyze customer behavior at the individual visitor and event level, and as users move from phones to laptops and other devices. Thus, it's a valuable tool in supporting multi-channel attribution."

There are some familiar big data names under the hood of Infinity, including Hadoop and real-time technologies such as Kafka and Spark. Infinity should align well with Crosswise, an acquisition Oracle made last year for its Data Cloud. Crosswise aligns customer data across channels, Henschen notes.

"The Oracle Marketing Cloud role for Infinity supports cross-channel identification in real time and in the context of marketing activities," Henschen adds.

Infinity "is a valuable add-on to the Oracle Marketing Cloud," Henschen says. It remains to be seen what capabilities it will deliver once the deal closes, however, so existing customers should avoid deeper investments until the road map is clear, he adds.
 
24/7 Access to Constellation Insights
Subscribe today for unrestricted access to expert analyst views on breaking news.

Event Report - Workforce Vision 2017 - After restart and takeoff, time for the boosters


We had the opportunity to attend the Workforce Software Vision 2017 user conference, held in New Orleans from March 13th to 15th, 2017. The event was well attended, with over 260 attendees managing a collective 500k workers.

 
 



Here is my video of the event:

 

No time to watch – here is the 1 slide update:

 

If you want more details – read on:

Focus on Product and more – It is always good to see vendors focusing on product. In the case of Workforce Software, CEO Morini shared that the vendor will add 80 more developers, a substantial investment for a 600-employee company. Likewise, Workforce Software has shown progress on the partner side (more large SIs are on board), the reseller relationship with SAP appears to be going well, and lastly the vendor unveiled a new implementation methodology to help customers go live quicker.


 
Workforce Software Constellation Research Holger Mueller
Morini introduces the connected Workforce


Roadmap Transparency – In the past Workforce (like many other HCM vendors) was not a poster child for transparency. Luckily for customers, this has changed: Workforce shared a three-year roadmap, with the usual caveats, which is quite a difference from what was shared two years ago (the last user conference I was able to attend). No surprise – unification is a key theme across the coming releases, along with some vertical capabilities and, most importantly, a much-needed UI improvement.

 
Workforce Software 2016 in review


Hard work – will it be enough? – No question Workforce has made a lot of progress, but the question remains whether the vendor can catch up with the 800-pound gorilla of the industry, Kronos. In general the speed of vendors in Workforce Management is increasing, with a strong focus on innovation, so Workforce’s task is not getting easier. There is plenty of room to differentiate, but the vendor now needs to accelerate delivery across the board, hence the blog post title.
 
Broady opens Workforce Vision 2017
 

MyPOV 

Good progress by the new Workforce Software management team, no doubt. With more investment in product, a focus on implementation speed, more partners, successful reseller relationships, and more, the vendor is executing the right strategies. Now these have to materialize and make a difference in the near future.

On the concern side, Workforce Software has to bring together multiple platforms at different levels, from architecture and data centers all the way to the UI. It also needs to deliver the next generation of its product, taking advantage of cloud, microservices etc., and for all the talk on engagement, it must improve its user experience. The days of clumsy screens for power users are numbered. To be fair, the vendor has realized that and plans a UI overhaul, an architecture change and other improvements.

Whether it all will be enough to close the distance to the market leader is too early to tell, but without doubt Workforce Software has positioned itself much better than where the vendor was a few years ago. We will have to check in again, so stay tuned.



More on Workforce Software:
 
  • News Analysis - WorkForce Software Announces Global Reseller Agreement with SAP - read here
  • Progress Report - WorkForce Software powers into more Workforce Management - but needs to watch the Fundamentals - read here


More on Workforce Management:
 
  • Event Report - Kronos KronosWorks - Solid progress and big things loom - read here
  • Progress Report - Ceridian makes good progress, the basics are done now its about next gen capabilities - read here
  • Event Report - Kronos KronosWorks - New Versions, new UX, more mobile - faster implementations - read here
  • Event Report - Ceridian Insights - Momentum and Differentiation Building - read here
 


Find more coverage on the Constellation Research website here and check out my magazine on Flipboard and my YouTube channel here.

Concur Gets Deeper Into Traveler Risk Management


Constellation Insights

SAP's Concur subsidiary is expanding its play in traveler risk management, a key area of innovation in a world increasingly marked by political unrest, extreme weather and terrorism. Here are the details from its announcement:

Concur Risk Messaging will capture unrivaled traveler location data via Concur Travel & Expense, Concur Mobile, Concur TripLink, TripIt from Concur, supplier e-receipts and more, providing travel managers immediate and unparalleled visibility into employees that may be at risk. Concur Active Monitoring, powered by HX Global, will offer 24/7 monitoring, proactive communication capabilities, and assistance coordination. This enables businesses to deliver on their commitment to ensure employee safety and well-being as they travel, across time zones and outside of business hours.

While Concur has offered traveler risk management capabilities for years, the new offering broadens the feature set through the partnership with HX Global. Concur's system uses more granular travel data than that provided by a GDS (global distribution system) such as Sabre, which is used by transportation providers, hotels and travel agencies to make reservations. For example, it can use an employee's expense receipts and card purchases to piece together a location data trail. Concur can also pull in HR system data for a fuller view of the employee.

Overall, it's a growing business for Concur. The company says it sent more than 10 million alerts to travelers last year, and the number of Concur users who were alerted grew from 151,000 to 1.3 million over the course of the year.

Concur's Fusion user conference is ongoing in Chicago this week. I'll be there and plan to dig deeper into Concur's new travel risk management offering. 


CEN Member Chat with R "Ray" Wang on Dynamic Leadership


R "Ray" Wang, founder of Constellation Research, shares his views on what it takes to be a dynamic leader and explains why it's valuable. For those who want regular access to content like this, consider joining the Constellation Executive Network.

Video: https://player.vimeo.com/video/208403091

Mar 22: Join Microsoft CIO Jim Dubois and Mott Macdonald’s Simon Denton on how to achieve success with Office 365


Working with various organizations worldwide, I’ve had the great opportunity to help facilitate sustainable Office 365 adoption. On March 22, I’m excited to host a webinar along with Microsoft CIO Jim Dubois, Mott Macdonald’s Simon Denton and Microsoft Fast Track’s Sharon Liu as we unpack what it takes to achieve success with Office 365.

 

Jim will share advice on how to enable digital transformation with Office 365 and Simon will walk through how he helped his colleagues boost productivity by moving to Office 365. Lastly, Sharon will demo the Fast Track resources for starting your adoption and achieving your goals.

What are you waiting for? Register now!

Inside Intel's $15.3 Billion Bet On Mobileye


Constellation Insights

Intel has already been a key player in the IoT (Internet of Things) market but is looking to significantly strengthen its hand by plunking down $15.3 billion for Mobileye, maker of software, specialized chips and cameras for self-driving cars.

The Israeli company has been in business for 17 years, and has 25 partnerships with automakers. It began working with Intel last year and had already announced plans to launch fully autonomous vehicles in conjunction with BMW and Intel by 2021. Intel plans to create a global autonomous vehicle division based in Israel that combines its existing operations with Mobileye. 

With Mobileye, Intel gains software for each of the three main "pillars" of autonomous driving: mapping, environment sensing and driving policy. Mobileye develops a series of proprietary chips called EyeQ, upon which its software is deployed. Intel sees synergies between Mobileye's specialized tech and its own high-end chips, estimating that self-driving cars could generate in the neighborhood of 4,000 GB of data per day, information that needs to be processed in real time in order to keep the vehicles moving safely down the road.
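
To put that estimate in perspective, here is a back-of-the-envelope conversion into a sustained data rate. It is only a sketch: it assumes decimal units (1 GB = 1,000 MB) and that the data is spread evenly over a full 24-hour day, neither of which comes from Intel's estimate.

```python
# Rough conversion of the quoted 4,000 GB/day into an average rate.
# Assumptions (not from the source): 1 GB = 1,000 MB, data spread
# evenly over 24 hours.
GB_PER_DAY = 4_000
SECONDS_PER_DAY = 24 * 60 * 60

mb_per_second = GB_PER_DAY * 1_000 / SECONDS_PER_DAY
print(f"{mb_per_second:.1f} MB/s")  # prints "46.3 MB/s"
```

A car that only drives a few hours a day would have to handle a correspondingly higher rate while it is on the road.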

While Mobileye is focused on autonomous vehicles, the acquisition speaks to Intel's broader ambitions in IoT and the new wave of computing, says Constellation Research VP and principal analyst Andy Mulholland.

"Intel is actively riding the shift from the traditional computer chip market to the new markets, where an ever increasing number of devices require a processor chip," he says. "Intel has worked steadily over recent years to introduce a new generation of chips that combine low power consumption, low cost, and specialized functionality."

This new generation of chips effectively requires Intel to rewrite Moore's law from a focus on doubling the capacity of a chip every eighteen months towards providing the same capacity but at half the cost every eighteen months, Mulholland adds. "A big part of this challenge is to understand exactly how the processing power will be demanded and this increases the need for specific market expertise," he says. "Clearly, self-driving cars are likely to be a huge marketplace, and introduce very specific processing requirements, making the acquisition of Mobileye a logical move."

"Compared to the 'tab for the fab,' as the investment in the design and production of a new chipset is known, the Mobileye acquisition price could be seen as a good buy to get a world leading chipset right at first release," Mulholland notes.

Intel expects the deal to close within about nine months. 


Google Next - Day 1 Summary


Google Cloud Next 17 Recap


Google Cloud is adding must-have enterprise features and scaling the business to meet data platform, machine learning and AI demand. Here’s a progress report.

Video: https://player.vimeo.com/video/208189440

Down Report - Human error takes AWS S3 down in US-EAST-1 - and it is felt - 3.8 Cloud Load Toads


The Cloud / IaaS industry has grown rapidly in the last few years, and providers have been solidifying their systems along the way. Outages are always unfortunate, and by and large the cloud has shown that it is more resilient than pretty much any on-premises computing setup. Nonetheless outages happen – and we are adding a new blog post type for these events – the “Down Report” – where we plan to dissect and rate what has gone wrong, with a focus on the lessons learnt for the provider affected, for the industry, and most importantly for their customers.
 

To make the effort a little more fun – we assign ‘Cloud Load Toads’ to the overall event and each circumstance. We mean no disrespect to the ‘load toads’ who work valiantly in the world’s air forces, but liked the suggestion of our colleague Alan Lepofsky (@alanlepo), who came up with the term ‘Cloud Load Toad’.
 

On the ‘Cloud Load Toad’ scale, which goes from 1 (bad but ok, can happen) to 5 (very bad, should never ever happen), we rate the severity of the event overall and of the events that led to it.

AWS S3 Down in US-EAST-1

First of all, kudos to AWS, who published the post mortem (see here) within about 48 hours of the event, faster than usual judging from other downtime events in the past. But then each cloud outage is different, and the root cause here – manual error – is easier to establish than, for example, troubleshooting a battery fire that destroys its very evidence (think Samsung).

But let’s dissect the post mortem report:
We’d like to give you some additional information about the service disruption that occurred in the Northern Virginia (US-EAST-1) Region on the morning of February 28th. The Amazon Simple Storage Service (S3) team was debugging an issue causing the S3 billing system to progress more slowly than expected.

MyPOV – Certainly production and billing systems need to be connected, and in many scenarios the production system can create issues with the load triggered for the billing system. But a production system should never be able to be stopped by an administrative system, like a billing system. Production should be kept running, billing can be worried about later. It is likely that the S3 billing system (my speculation) is using S3, too – creating a potential recursive dependency. Needless to say – these systems should be isolated. 
Rating: 3 Cloud Load Toads
 
 
At 9:37AM PST, an authorized S3 team member using an established playbook executed a command which was intended to remove a small number of servers for one of the S3 subsystems that is used by the S3 billing process.

MyPOV – Related to the above, it now looks clear that the billing system also uses S3. It is good to drink your own champagne, but when the champagne goes bad because of a mistake by the champagne maker, both the customers and the champagne maker get food poisoning, which is not what you want. But humans do make mistakes.
 
Unfortunately, one of the inputs to the command was entered incorrectly and a larger set of servers was removed than intended. The servers that were inadvertently removed supported two other S3 subsystems. One of these subsystems, the index subsystem, manages the metadata and location information of all S3 objects in the region. This subsystem is necessary to serve all GET, LIST, PUT, and DELETE requests. The second subsystem, the placement subsystem, manages allocation of new storage and requires the index subsystem to be functioning properly to correctly operate. The placement subsystem is used during PUT requests to allocate storage for new objects. Removing a significant portion of the capacity caused each of these systems to require a full restart. While these subsystems were being restarted, S3 was unable to service requests.

MyPOV – Kudos to AWS for transparency. But any attendee of its re:Invent user conference knows how much the vendor prides itself on not letting humans make mistakes, and on putting key / vital processes into code instead. Certainly, that approach and philosophy wasn’t followed here. It would be good to chat with AWS CTO Werner Vogels about this one… I am sure that enough people in Seattle are pondering how to make sure that in the future typos and other manual errors do not take systems down. Of course, we still need a kill switch for the humans…
Rating: 4 Cloud Load Toads.
 
 
Other AWS services in the US-EAST-1 Region that rely on S3 for storage, including the S3 console, Amazon Elastic Compute Cloud (EC2) new instance launches, Amazon Elastic Block Store (EBS) volumes (when data was needed from a S3 snapshot), and AWS Lambda were also impacted while the S3 APIs were unavailable.

MyPOV – AWS recommends writing critical processes to span regions. Its own website, amazon.com, and subsidiary zappos.com did not go down, and were probably coded that way. The question is (and sorry if I have not read the fine print) – could an AWS client still use the US-EAST-1 services like EC2, EBS, AWS Lambda etc. if they were pointed to other S3 stores, or does an S3 failure take the whole region out? This is a deeply critical issue for any IaaS tech stack in an IaaS data center. So, did customers have a chance here? A question to follow up with AWS. Not Rated.


 
S3 subsystems are designed to support the removal or failure of significant capacity with little or no customer impact. We build our systems with the assumption that things will occasionally fail, and we rely on the ability to remove and replace capacity as one of our core operational processes. While this is an operation that we have relied on to maintain our systems since the launch of S3, we have not completely restarted the index subsystem or the placement subsystem in our larger regions for many years. S3 has experienced massive growth over the last several years and the process of restarting these services and running the necessary safety checks to validate the integrity of the metadata took longer than expected. The index subsystem was the first of the two affected subsystems that needed to be restarted. By 12:26PM PST, the index subsystem had activated enough capacity to begin servicing S3 GET, LIST, and DELETE requests. By 1:18PM PST, the index subsystem was fully recovered and GET, LIST, and DELETE APIs were functioning normally. The S3 PUT API also required the placement subsystem. The placement subsystem began recovery when the index subsystem was functional and finished recovery at 1:54PM PST. At this point, S3 was operating normally. Other AWS services that were impacted by this event began recovering. Some of these services had accumulated a backlog of work during the S3 disruption and required additional time to fully recover.

MyPOV – AWS describes well that things break all the time, and that they can even go down. But IaaS providers need to be certain they can come back up, and part of that is understanding how long it will take. S3 has been very popular, which makes it harder to take it down and test (or simulate) the time it needs to come back, but this is certainly something AWS could and should have done and known. When you run IT and don’t know with reasonable certainty when a system that is down will come back up, you are in a bad spot.
 
 
Rating: 4 Cloud Load Toads
 
 
We are making several changes as a result of this operational event. While removal of capacity is a key operational practice, in this instance, the tool used allowed too much capacity to be removed too quickly. We have modified this tool to remove capacity more slowly and added safeguards to prevent capacity from being removed when it will take any subsystem below its minimum required capacity level. This will prevent an incorrect input from triggering a similar event in the future.

MyPOV – This section reads as if there was a software tool, but it malfunctioned. That of course is not good. Granted, it is hard to simulate and test with systems of this scale, but that is not a good enough answer.
Rating: 3 Cloud Load Toads
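
To illustrate the two safeguards AWS describes (remove capacity more slowly, and never go below a subsystem's minimum required capacity), here is a minimal sketch. It is not AWS's tool: the `Subsystem` type, the `plan_removal` function, the batch size and all numbers are hypothetical and chosen only for illustration. It assumes Python 3.9 or later.

```python
from dataclasses import dataclass


class CapacityError(Exception):
    """Raised when a removal request would breach the capacity floor."""


@dataclass
class Subsystem:
    name: str
    active_servers: int
    min_required: int  # hypothetical minimum capacity floor


def plan_removal(subsystem: Subsystem, requested: int, max_per_step: int = 2) -> list[int]:
    """Return the batch sizes for removing `requested` servers, or raise.

    Safeguard 1: refuse any request that would take the subsystem below
    its minimum required capacity, so a mistyped input fails fast.
    Safeguard 2: split the removal into small batches so capacity goes
    away slowly and can be checked between steps.
    """
    if requested <= 0 or max_per_step <= 0:
        raise ValueError("requested and max_per_step must be positive")

    remaining = subsystem.active_servers - requested
    if remaining < subsystem.min_required:
        raise CapacityError(
            f"{subsystem.name}: removing {requested} leaves {remaining}, "
            f"below the minimum of {subsystem.min_required}"
        )

    batches = []
    left = requested
    while left > 0:
        step = min(max_per_step, left)
        batches.append(step)
        left -= step
    return batches


index = Subsystem(name="index", active_servers=10, min_required=6)
print(plan_removal(index, 3))  # prints [2, 1]
```

With these example numbers, a request to remove 5 servers would raise `CapacityError` instead of being executed, which is the behavior a typo in the input should trigger.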
 
We are also auditing our other operational tools to ensure we have similar safety checks. We will also make changes to improve the recovery time of key S3 subsystems. We employ multiple techniques to allow our services to recover from any failure quickly. One of the most important involves breaking services into small partitions which we call cells. By factoring services into cells, engineering teams can assess and thoroughly test recovery processes of even the largest service or subsystem. As S3 has scaled, the team has done considerable work to refactor parts of the service into smaller cells to reduce blast radius and improve recovery. During this event, the recovery time of the index subsystem still took longer than we expected. The S3 team had planned further partitioning of the index subsystem later this year. We are reprioritizing that work to begin immediately.

MyPOV – Kudos to AWS for transparency, for explaining that it has a solution, and for committing to get better going forward. This is the textbook response that all vendors with an outage should share; not all have.


 
From the beginning of this event until 11:37AM PST, we were unable to update the individual services’ status on the AWS Service Health Dashboard (SHD) because of a dependency the SHD administration console has on Amazon S3. Instead, we used the AWS Twitter feed (@AWSCloud) and SHD banner text to communicate status until we were able to update the individual services’ status on the SHD. We understand that the SHD provides important visibility to our customers during operational events and we have changed the SHD administration console to run across multiple AWS regions.

MyPOV – This is probably the worst finding: an overly optimistic implementation of the key dashboard for the overall status of AWS. It should never have a single point of failure, and yet we see this happening over and over in outages. Vendors need to learn not to rely on their own services to communicate with clients in an outage situation, as those services may not be able to respond. It is a cardinal mistake (see e.g. for another outage issue here)... and yet vendors keep making it.
Rating: 5 Cloud Load Toads

 
 
Finally, we want to apologize for the impact this event caused for our customers. While we are proud of our long track record of availability with Amazon S3, we know how critical this service is to our customers, their applications and end users, and their businesses. We will do everything we can to learn from this event and use it to improve our availability even further.

MyPOV – Kudos for acknowledging and owning the issue. No blame game and no scapegoating (which is often seen in outages too, the most common scapegoat being the network / network provider).
 

A pretty severe event

When doing the tally across the cloud load toads, assuming I did the math right, I count 19 total toads across 5 rated findings, which brings the event to 3.8 cloud load toads. I am sure AWS will be the first to agree that this wasn't an insignificant event. That said, customers could have coded their loads to avoid the downtime. So let's look at the lessons learnt.
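
For readers who want to check the tally, the arithmetic can be reproduced in a few lines. The ratings are the five given above; the labels are only shorthand for the rated findings and do not come from the AWS report.

```python
# The five rated findings from this post, in order of appearance.
# Labels are shorthand; the "Not Rated" finding is left out.
ratings = {
    "billing system can affect production": 3,
    "manual input error removed too many servers": 4,
    "unknown restart time of index and placement": 4,
    "tool allowed too much removal too quickly": 3,
    "status dashboard depended on S3": 5,
}

total = sum(ratings.values())
average = total / len(ratings)
print(total, len(ratings), average)  # prints "19 5 3.8"
```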
 
 

Lessons for IaaS Customers

Here are the key aspects for customers to learn from the AWS S3 outage:

Have you built for resilience? Sure, it costs, but all major IaaS providers offer strategies for avoiding single location / data center failures. Way too many prominent internet properties did not choose to do so. If ‘born on the web’ properties miss this, it’s key to check that regular enterprises do not miss it either. Uptime has a price, so make it a rational decision. Now is a good time to get budget / investment approved, when warranted and needed.
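
As a rough illustration of what coding for resilience can look like at the application level, here is a sketch of a read that falls back to a second region when the first one fails. It does not use a real cloud SDK: the `read_with_fallback` helper and the two fetch functions are hypothetical in-memory stand-ins, and the second region name is only an example. It assumes the data has already been replicated to the second region, and Python 3.9 or later.

```python
from typing import Callable, Sequence

# A fetcher takes an object key and returns its bytes, or raises on failure.
Fetcher = Callable[[str], bytes]


def read_with_fallback(key: str, sources: Sequence[tuple[str, Fetcher]]) -> tuple[str, bytes]:
    """Try each (region, fetcher) pair in order; return the first success."""
    errors = []
    for region, fetch in sources:
        try:
            return region, fetch(key)
        except Exception as exc:  # broad on purpose: any failure triggers fallback
            errors.append(f"{region}: {exc}")
    raise RuntimeError("all regions failed: " + "; ".join(errors))


# Hypothetical stand-ins for per-region storage clients.
def fetch_primary(key: str) -> bytes:
    raise ConnectionError("storage unavailable")  # simulates the outage


REPLICA = {"report.csv": b"a,b\n1,2\n"}  # pretend replicated copy


def fetch_secondary(key: str) -> bytes:
    return REPLICA[key]


region, data = read_with_fallback(
    "report.csv",
    [("us-east-1", fetch_primary), ("us-west-2", fetch_secondary)],
)
print(region)  # prints "us-west-2"
```

The fallback itself is the cheap part; the cost sits in keeping a replicated copy in the second region, which is the budget decision discussed above.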

Ask your IaaS vendor a few questions: Enterprises should not be shy to ask IaaS providers if they have done a few things:
  • Do you run your systems by hand or with software?
     
  • Could the same issue that happened with AWS S3 in US-EAST-1 happen to you?
     
  • How do you test your operational software?
     
  • When did you last take your most popular services down?
     
  • What is the expected up time of your most popular services?
     
  • When did you last produce that test of expected up time, and how has the system usage increased since then?
     
  • How can we code for resilience – and what does it cost?
     
  • What kind of remuneration / payment / cost relief can be expected with a downtime?
     
  • What single point of failure should we be aware of?
     
  • How are your operation consoles built?
     
  • How do you communicate in a downtime situation with customers?
     
  • How often and when do you refresh your older datacenters, servers?
     
  • How often have you reviewed and improved your operational procedures in the last 12 months? Give us a few examples of how you have increased resilience.


And some key internal questions, customers of IaaS vendors have to ask themselves:
  • What are your customer / employee communication tools?
     
  • When your IaaS vendor goes down, so may your customer and employee facing apps. How do you communicate then?
     
  • Make sure to learn from AWS’s mistake – do not let your communication tools rely on the same point of failure / architecture as the production systems – as it will not be available. Simple, but always good to check, and even better to monitor.
 

MyPOV

Outages are always unfortunate. The key thing is to learn from them. Knowing AWS, they will be ruthless in addressing the issues (and hopefully update customers and analysts on progress). Kudos for a fast post mortem, taking responsibility and sharing first strategies to avoid another occurrence.

On the concern side, AWS needs to ask itself how it recycles and reviews architecture and servers. US-EAST is a behemoth that nonetheless remains popular, but it may need more rejuvenation than AWS expects / has planned. In the cloud location monopoly race it is possible that vendors might stretch aging infrastructure beyond the breaking point. Of course, it is easy to armchair everything afterwards, but this remains an area to watch.

Overall hopefully plenty of lessons learnt all around, for AWS, other IaaS providers and customers.