Partners, Customers Fire Back Over Azure Outage Explanation
“Better late than never” may be good words to live by, but it matters just how late you are. This week, Microsoft is finding that out the hard way after its Azure team finally explained what caused a service outage a month earlier.
As Jason Zander, corporate vice president for the Microsoft Azure team, noted in a blog post, the service interruption of Azure Storage can be chalked up to user error. A software change meant to improve the performance of the cloud storage service was implemented, but things didn’t go as planned. The issue came down to two things, according to Zander.
First, “The standard flighting deployment policy of incrementally deploying changes across small slices was not followed.” And second, “Although validation in test and pre-production had been done against Azure Table storage Front-Ends, the configuration switch was incorrectly enabled for Azure Blob storage Front-Ends.”
So basically: User error. You can read the full details of the incident at the Azure blog.
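For readers unfamiliar with the “flighting” policy Zander refers to, the idea can be sketched roughly as follows. This is only an illustration under stated assumptions, not Microsoft's actual tooling: the function name `flighted_rollout`, the `apply_change` and `is_healthy` callbacks, and the node names are all hypothetical.

```python
from typing import Callable, Iterable, List


def flighted_rollout(
    slices: Iterable[List[str]],
    apply_change: Callable[[str], None],
    is_healthy: Callable[[str], bool],
) -> List[str]:
    """Apply a change one slice at a time, halting at the first unhealthy slice.

    Returns the nodes that received the change. All names are hypothetical.
    """
    updated: List[str] = []
    for batch in slices:
        for node in batch:
            apply_change(node)
            updated.append(node)
        # Verify the whole slice before touching the next, larger one.
        if not all(is_healthy(node) for node in batch):
            break
    return updated


if __name__ == "__main__":
    enabled = {}
    # Slices grow in size so a bad change is caught while its reach is small.
    slices = [["fe-1"], ["fe-2", "fe-3"], ["fe-4", "fe-5", "fe-6"]]
    done = flighted_rollout(
        slices,
        apply_change=lambda node: enabled.__setitem__(node, True),
        is_healthy=lambda node: node != "fe-3",  # simulate one failing node
    )
    print(done)
```

In this sketch the rollout stops after the second slice, so the three nodes in the last slice never receive the change. Skipping that staging, as the blog post describes, means a faulty change reaches everything at once.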
Although it’s good to see Microsoft finally providing more detail on what went wrong, it took longer than expected to analyze the issue and get that information out to partners and customers. Partners and customers lambasted Microsoft for its lack of communication, even as they praised the company for coming clean.
A user going by dotnetchris thanked Zander for the full explanation, but noted, “I’m sure many will be displeased it took a full 30 days for Microsoft to properly communicate what occurred. I hope that lessons will be learned from this and these mistakes regarding communication not repeated.”
The user said Microsoft’s reporting channels are “severely lacking” and that it is “a critical issue and needs resolved immediately. This is an issue that has existed for years.”
Members of the Microsoft Azure team attempted to defend their actions, explaining that they had used Twitter to get the word out, but user Qix offered a rebuttal: “You tweeted people? Nobody saw a tweet until after service was restored.”
Outages happen. They’re not pleasant. They’re not pretty. And they’re frequently in breach of service-level agreements (SLAs) that have been put in place. The cloud still fares rather well when compared to a lot of traditional IT systems. But communication is still sometimes lacking, and cloud services providers would do well to remember that their customers and partners expect to be informed sooner rather than later.