June 14 AWS Outage: Building Redundancy is Critical

Brian Taylor

June 18, 2012

Although 2012 has been relatively free of cloud service disruptions, we still cringe whenever we hear about one. The latest: Amazon Web Services (AWS), part of Amazon (NASDAQ: AMZN), went offline for a few hours on June 14. It was not a long outage, but it was long enough to get the cloud naysayers’ tongues wagging once again. And, on some level, they have a point.

Here’s the scoop: AWS suffered a power outage at its Northern Virginia data center on June 14, taking out a number of startups and Internet sites, including Parse, Pinterest, Quora and Heroku. To its credit, Heroku published a transparent incident report. The AWS Service Health Dashboard reported the problem just before 9 p.m. Pacific time, and the outage affected Amazon Elastic Compute Cloud (EC2), Amazon Relational Database Service (RDS) and AWS Elastic Beanstalk served out of the Virginia data center. AWS fully restored services overnight.

Although the outage stemmed from a power failure rather than an IT problem, any time Amazon suffers a service disruption, comments start to fly. It may come as no surprise that competitors, tweeting experts, pundits and conservative cloud-o-phobes all had something to add to the commentary.

So now it’s Talkin’ Cloud’s turn. First, the cloud is here to stay. Second, the most measured commentary I read about the outage boils down to this: Don’t put all your eggs in one basket.

Or, as Barb Darrow wrote at GigaOM: “It shows that building in redundancy is critical — whether your app runs in your own data center or in someone else’s cloud. In short, AWS users should make sure their workloads run across AWS regions to prevent future snafus.”

In her post, Darrow also quoted Carl Brooks, an analyst at Tier1 Research: “AWS outages are still magnified out of proportion to their severity. It doesn’t help their credibility with the paleoconservative enterprise paranoid who will use this as an excuse to buy more absurdly overpriced IT from the usual suspects.”

That brings us to Talkin’ Cloud point No. 3: You can’t turn back the IT clock. IT capability and, increasingly, cloud access have become essential for modern organizations of all stripes, and one of the underlying rules of technology is “innovate or die.”

Last August, a lightning strike in Ireland caused a transformer explosion at AWS’ electricity supplier, knocking out Amazon’s backup generators and disrupting service at the AWS European data center.

The AWS data center in Northern Virginia suffered an earlier outage in April 2011, which affected sites including Foursquare, Twitter, HootSuite, Reddit, Quora and GroupMe. AWS characterized the failure as a “re-mirroring storm,” in which “a large number of volumes were effectively ‘stuck’ while the nodes searched the cluster for the storage space it needed for its new replica.”

On the other hand, AWS’ business continues to impress (or infuriate, depending on your point of view). Last week AWS blogged that storage on Amazon S3 has reached 1 trillion objects, or close to 143 objects for every person alive on the planet.

That’s a lot of objects, and as S3 storage continues to grow, Talkin’ Cloud will be watching.