Skype CIO Apologizes for Outage: Lessons for VARs and MSPs in the Cloud
When a cloud service goes dark, somebody has to step up to the microphone, apologize, explain what went wrong, and describe how similar outages will be avoided in the future. Such was the case this week for Skype CIO Lars Rabbe, who has posted a blog entry describing Skype’s recent outage and corrective measures. But there’s a bigger issue looming for VARs, managed services providers (MSPs) and cloud integrators: Can you really trust a single cloud service provider for mission-critical offerings like VoIP? Or do channel partners need to hedge their bets?
Of course, Talkin’ Cloud needs to point out that Skype isn’t a traditional cloud platform. Instead, Skype leverages a peer-to-peer design. In a December 29 post, Rabbe wrote: “Last week, the P2P network became unstable and suffered a critical failure. The failure lasted approximately 24 hours from December 22, 0800 PST/1600 GMT to December 23, 0800 PST/1600 GMT.”
Rabbe goes on to describe what caused the failure (basically, a bug in Skype for Windows client version 5.0.0.152), recovery measures and a four-step process that aims to prevent similar outages in the future. The four-step effort, word for word from Rabbe, involves:
- First, we will continue to examine our software for potential issues, and provide ‘hotfixes’ where appropriate, for download or automatic delivery to our users. Since a bug was identified in Skype for Windows (version 5.0.0.152), we had provided a fix to v5.0 of our Windows software prior to the incident, and we will provide further updates for download this week. We will also be reviewing our processes for providing ‘automatic’ updates to our users so that we can help keep everyone on the latest Skype software. We believe these measures will reduce the possibility of this type of failure occurring again.
- Second, we are learning the lessons we can from this incident and reviewing our processes and procedures, looking in particular for ways in which we can detect problems more quickly to potentially avoid such outages altogether, and ways to recover the system more rapidly after a failure.
- Third, while our Windows v5 software release was subject to extensive internal testing and months of Beta testing with hundreds of thousands of users, we will be reviewing our testing processes to determine better ways of detecting and avoiding bugs which could affect the system.
- Finally, as we continue to grow, we will keep under constant review the capacity of our core systems that support the Skype user base, and continue to invest in both capacity and resilience of these systems. An investment program we initiated a year ago has significantly increased our capacity already and more investment is planned for 2011 both to support the ongoing roll out of our paid and enterprise products, and to continue to support the growth of our core Skype software that we know millions of users rely on every day.
Is An Apology Enough for Partners?
Rabbe also conceded that Skype “fell short in both fulfilling your expectations and communicating with you during this incident.”
Over the past year or two, I’ve read similar apologies following outages at Amazon Web Services, Microsoft BPOS (Business Productivity Online Suite) and smaller hosted service providers. But here’s where things get difficult in the channel: VARs and MSPs can certainly “monitor” third-party cloud services for end customers. But when Skype, Amazon Web Services, Microsoft BPOS and similar systems go dark, channel partners are left powerless. Sure, they can alert customers about the outage, but they can’t influence how or when the service comes back online.
Going forward, I wonder if channel partners will begin to leverage two service providers in each market category. That is, two backup providers, two hosted email providers, two hosted VoIP service providers and so on. If one goes dark, the channel partner can assess the situation and potentially shift customers over to an alternative provider. A prime example: When Skype went dark, plenty of folks jumped over to ooVoo.
That sounds simple enough. But can VARs and MSPs build profitable business models juggling two key partners in each cloud service category? Frankly, I don’t have the answer to that financial riddle.
Kevin: Thanks for the comment. Generally speaking, I still think cloud applications are more reliable than traditional on-premises file servers, email systems, etc. But we hear about cloud outages more often because large communities (rather than a single customer) feel the impact…
-jp