Channel Futures

Cloud


Leap Year and Windows Azure Cloud Outage: Root Cause Analysis

  • Written by Brian Taylor
  • March 12, 2012

Bill Laing of Microsoft has posted a “Root Cause Analysis” (RCA) that not only gives a detailed account of the Leap Year software bug and related issues that hit the Windows Azure servers starting February 28, but also provides unprecedented insights into the Azure cloud architecture.

With the release of the RCA blog, Microsoft announced that it has extended a 33% service credit to ALL customers of Windows Azure. Bill Laing writes: “Microsoft recognizes that this outage had a significant impact on many of our customers. We stand behind the quality of our service and our Service Level Agreement (SLA), and we remain committed to our customers.”

On March 12 Gartner analyst Kyle Hilgendorf blogged that he “was very pleased with the level of detail in Microsoft’s RCA.” In reference to the Amazon Web Services’ EBS outage in April 2011, he wrote “an RCA… is one of the best insights into architecture, testing, recovery, and communication plans in existence at a cloud provider. Microsoft’s RCA was no exception.”

A Closer Look

Bill Laing starts his analysis with a background description of the architectural issues that helped propagate the Leap Year bug. While a detailed reproduction of the RCA is beyond the scope of this Talkin Cloud blog, suffice it to say that proper virtual machine functioning on Azure requires that a Guest Agent (GA) have a valid transfer certificate, which is used to protect both proprietary and client data through encryption. The certificates are valid for one year from the date of creation.

According to the RCA, the Leap Year software bug was triggered at 4:00 pm PST on February 28, which is midnight UTC on February 29, the start of Leap Day itself. The bug gave new transfer certificates a valid-to date of February 29, 2013, a date that does not exist because 2013 is not a leap year. The system rejected the date, causing certificate creation to fail.
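
The failure described above can be reproduced in a few lines. This is a minimal sketch in Python, not Microsoft's code: it assumes the valid-to date was computed by adding one to the year while keeping the month and day unchanged, which is one way to arrive at February 29, 2013. The function names naive_valid_to and safe_valid_to are hypothetical, chosen only for this illustration.

```python
from datetime import date


def naive_valid_to(created: date) -> date:
    """Assumed fragile approach: add one to the year, keep month and day."""
    return created.replace(year=created.year + 1)


def safe_valid_to(created: date) -> date:
    """Add one year, clamping February 29 to February 28 in non-leap years."""
    try:
        return created.replace(year=created.year + 1)
    except ValueError:
        # Only February 29 can fail here, so fall back to February 28.
        return created.replace(year=created.year + 1, day=28)


if __name__ == "__main__":
    leap_day = date(2012, 2, 29)
    try:
        naive_valid_to(leap_day)
    except ValueError as exc:
        # The date library refuses to build February 29, 2013.
        print("naive calculation rejected:", exc)
    print("safe calculation:", safe_valid_to(leap_day))
```

On any day other than February 29 the two functions agree, which is why a bug of this kind can sit unnoticed for years. Clamping to February 28 is only one possible policy; rolling forward to March 1 would be equally defensible.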

This set off a cascade in which numerous Azure server clusters shut down, because the monitoring processes incorrectly assumed that the hardware was faulty. Moreover, a software update was under way (which complicated matters the next day, as described below), and this led to the bug spreading even faster on the affected clusters than on less active storage servers.

Microsoft identified the bug at 6:38 pm PST. At 6:55 pm PST, the company disabled all client service management worldwide for Windows Azure. Bill Laing writes: “This is the first time we have ever taken this step.”

The company worked to fix the GAs and roll them out on the evening of February 28 and through the night. At 5:23 am PST on February 29, Microsoft announced that client service management was back online for the majority of Azure server clusters.

Then, Leap Year Strikes

But as (bad) luck would have it, Azure suffered a second outage on February 29. On the previous day, seven servers were undergoing the software update when the bug hit them. Microsoft decided to unite the old components with the new GA (rolled out the night before) on these servers. Confident of the solution, Microsoft opted for a quicker “blast” update across all servers at 2:47 pm PST.

Microsoft failed to notice, and Bill Laing was very honest on this point, that the new update package built on the older components also included a network plugin created for the update, and that the two were not compatible. As a result, Azure clients suffered a second major loss of service. It was not until 2:15 am on March 1 that Microsoft determined that all servers were cleaned up and fully functional.

Laing concludes the RCA with defined follow-up steps to improve the Azure system, and a repetition of Microsoft’s commitment to its Azure customers. As a result of the outage, Azure has come under considerable criticism and increased scrutiny as a cloud service. But in terms of openness and transparency, this RCA blog from Microsoft rose to the occasion.

Talkin Cloud will keep you updated on Microsoft’s work in providing uninterrupted Windows Azure cloud services.

Tags: Agents Cloud Service Providers MSPs VARs/SIs Channel Research Cloud


Copyright © 2022 Informa PLC. Informa PLC is registered in England and Wales with company number 8860726 whose registered and Head office is 5 Howick Place, London, SW1P 1WG.