Constellation Insights

As you've probably heard, Amazon Web Services had a bit of an outage this week. Problems with its Simple Storage Service (S3) in the US-EAST-1 region caused a large number of prominent websites to either be inaccessible or perform very slowly. While Amazon characterized the issue as "increased error rates" rather than an outage, for many users it amounted to the same thing.

Among the affected websites were Quora, Medium, Slack, Twitch.tv, Imgur, Heroku, Bitbucket, Citrix, Expedia, Zendesk and Razer, along with many others, as The Register reports. The last of these, Razer, makes cloud-based driver software for mice and other peripherals popular with gamers. Other IoT-related companies, such as Nest, also reported malfunctioning devices due to the S3 problems.

Other Amazon services in the US-EAST-1 region, such as WorkMail, Elastic File System and Elastic Load Balancing, also suffered problems, which were either a remarkable coincidence or somehow related to the S3 errors.

In any case, as of this writing AWS's service health dashboard shows no issues in any region. Moreover, S3's measured availability has often exceeded its SLA (99.99 percent for the standard service and 99.9 percent for the infrequent access option), according to third-party tracker CloudHarmony, as TechCrunch reports.

On Tuesday, AWS said on Twitter that it had determined the root cause of the problems, but it has not yet provided a more detailed explanation of what happened. Customers will surely be asking for one. However, the outage may also turn the lens back on customers themselves. After all, while Amazon provides redundancy for S3 within a region, the safer way to architect massive websites and applications is with redundancy across multiple regions, a choice that is often passed over because of cost and complexity.

"IaaS downtimes are always unfortunate, but with more dependency on public cloud, they're affecting more customers," says Constellation Research VP and principal analyst Holger Mueller. "Who would have thought that email systems, wifi login pages and mouse settings depend on S3? So we not only see a dependency on IaaS providers, but a deeper layering of applications, beyond traditional contained software models."

To that end, "outages are always a good opportunity to see if a cloud-native application has been built correctly or not so much," Mueller adds. "From that perspective, way too many internet properties were affected by S3 having an issue in a single AWS data center."

Once AWS explains the root cause, the most interesting question will be whether it had enough capacity across its other data centers to keep internet properties running even while US-EAST-1 was down, Mueller says. "AWS will likely claim it had, but we will have to speak to customers who coded correctly for high availability and disaster recovery and find out why they still went down."
