Why Your Cloud Provider’s SLA Should Be More Than Just a Number

To determine the targeted duration of time required to restore acceptable service levels to business processes following disaster or disruption, many organizations increasingly rely on RTO and RPO as standards, supplanting more traditional SLAs guaranteed by their cloud provider.

October 12, 2015


By Egenera Guest Blog

To set targets for how quickly acceptable service levels must be restored to business processes following a disaster or disruption, and for how much data loss is tolerable, many organizations increasingly rely on recovery time objective (RTO) and recovery point objective (RPO) as standards, supplanting the more traditional SLAs guaranteed by their cloud provider.

Just like SLAs, however, RTO and RPO don’t always answer every question associated with an organization’s ideal backup and recovery window, questions like: Where is my data held? How secure is it? What kind of payback can I expect if my service fails? And what does it mean for my business’s short- and long-term prospects–and even its reputation–if my data is inaccessible for hours or even days at a time? That’s where disaster recovery as a service (DRaaS) comes in.

Like insurance of any kind (health, auto, etc.), disaster recovery as part of your business continuity planning is unequivocally about risk mitigation: paying an external entity to ensure that business continuity and integrity are preserved in the event of a disaster. But in the real world, does anyone take that planning as seriously as they should?

I ask this question because I came across a recently published 2014 survey by the Disaster Recovery Preparedness Council that suggests the incidence and cost of outages, while remaining a focus for some organizations, are still very much a challenge for most:

  • More than one-third (36%) of organizations lost one or more critical applications, VMs or critical data files for hours at a time over the past year, while nearly one in five companies lost one or more critical applications over a period of days.

  • One in four respondents said that they had lost most or all of a data center for hours or even days.

  • Reported losses from outages ranged from a few thousand euros to millions of euros, with nearly 20% indicating losses of more than €45,000 to over €5 million.

Even more startling, the survey results reveal that roughly three in five (60%) companies don’t have a documented DR plan, another 40% admitted their DR plan didn’t work properly when a disaster occurred, and a quarter of respondents (25%) lost the use of their data center for hours, even days. These outcomes all point to the same conclusion: There is, even today, a serious lack of disaster recovery planning, testing and resources. How is this possible, and how can these shortcomings be addressed?

First of all, hand-wringing or paralysis by analysis isn’t enough. Instead, take time to thoughtfully evaluate which DRaaS provider is right for you using some real-world criteria and best practices.

For example, one criterion might be whether a provider’s DR solution is hands-on or offers only a set of point tools. If it’s the latter, you and your team will shoulder the responsibility for building out the solution and then ensuring it’s correctly configured across your ecosystem. It also means you’ll be responsible for testing and monitoring its efficacy and performance going forward, and that when the individual you’ve tasked with supporting your DR solution is unavailable or away on holiday, you’ve got to hope really hard that nothing bad happens. And if an event does occur, will that individual’s protégé step into the role, pick up the pieces and get your network back up and running?

Here are some other considerations:

Test and retest: More than a third of all companies taking the council’s survey tested their disaster recovery plans only once or twice a year. Nearly 25% of them never tested their DR plans at all. Talk about inviting a very unwelcome surprise. Face it: Networks aren’t held together merely by faith and the infrastructure equivalent of duct tape. Without testing, you have no empirical proof that you can come back from an “event” or extended outage. Even more disturbing: When companies did test their DR plans, more than 65% of them didn’t even pass their own tests. So, for this topic let’s add one more “R”: recalibrate.
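To make “passing your own test” concrete, here is a minimal sketch of how the timestamps from a DR drill might be checked against RTO and RPO targets. The class, function and field names are hypothetical, and the dates and targets are invented for illustration; they do not come from the survey or from any particular DRaaS product.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class DrillResult:
    """Timestamps recorded during one DR test (hypothetical structure)."""
    outage_start: datetime      # when the simulated failure began
    service_restored: datetime  # when the service was usable again
    last_good_backup: datetime  # newest recovery point available at failure time


def evaluate_drill(result: DrillResult, rto: timedelta, rpo: timedelta) -> dict:
    """Compare measured recovery time and data-loss window with the targets."""
    recovery_time = result.service_restored - result.outage_start
    data_loss_window = result.outage_start - result.last_good_backup
    rto_met = recovery_time <= rto
    rpo_met = data_loss_window <= rpo
    return {
        "recovery_time": recovery_time,
        "data_loss_window": data_loss_window,
        "rto_met": rto_met,
        "rpo_met": rpo_met,
        "passed": rto_met and rpo_met,
    }


# Illustrative values only: a 4-hour RTO and a 30-minute RPO.
drill = DrillResult(
    outage_start=datetime(2015, 10, 12, 9, 0),
    service_restored=datetime(2015, 10, 12, 12, 30),
    last_good_backup=datetime(2015, 10, 12, 8, 45),
)
report = evaluate_drill(drill, rto=timedelta(hours=4), rpo=timedelta(minutes=30))
print(report)
```

Recording these three timestamps at every drill, and tightening the targets when a test passes comfortably, is one simple way to put the “recalibrate” step into practice.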

Just because you say you’re a service provider doesn’t mean you are: While I don’t have much visibility into how cloud storage laws apply in the United States, there are many organizations here in Ireland that simply cannot risk their data being stored outside of the isle’s jurisdiction. “Out of sight, out of mind” isn’t practical and, in our case, isn’t even legal. Even more problematic, there are service providers who hang out their shingle and claim they can store and manage your data, no problem. Keep in mind, however, that some of these providers may in fact exist only as a “shell organization”–one that turns your data over to another entity that remains completely undisclosed to you (and one that effectively may have no skin in the game). No harm, no foul, right?

Good intentions are not enough: In my experience, many of the organizations I find on my own, or that find me, either have no DR plan in place or believe they already have one that’s sufficient to handle an instance of downtime or a protracted outage. In fact, these misperceptions–including a lack of understanding of what good DR planning really means for the long haul, its perceived high costs, or well-intentioned excuses such as “we’ve just never gotten around to it”–can be easily overcome.

Come together: Piecing together a complete DR solution requires respect for what you’re trying to accomplish and for all the components and supporting architecture it comprises. For many MSPs, managing, monitoring and supporting a cloud–any cloud–involves lots of overhead that can quickly become onerous, especially since some cloud providers supply only point tools rather than an integrated, fully managed service that intervenes on your behalf behind the scenes to keep your network up and running.

You really are one of a kind: It’s true. Disaster recovery planning is never one size fits all. Like each of us, DR is unique to the organization and the systems that support it. For example, what about hybrid environments–those that couple physical with virtual environments? In this instance, you will need a disaster recovery solution that stretches across both domains and, ideally, does so in real time. In other, more sophisticated settings, implementing a productive and responsive DR solution may require a complete upgrade of the client’s systems.

In sum, RTO and RPO are an accepted litmus test for how rapidly your network comes back following an instance of downtime and how much data you can afford to lose along the way. SLAs continue to be important because they hold your vendor accountable, both practically and, in most cases, financially. DRaaS, too, is increasingly important in the mix.
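One way to see why an SLA should be more than just a number is to translate the headline percentage into the downtime it actually permits. The sketch below does that arithmetic; the function name and the sample percentages are illustrative assumptions, not terms from any specific provider’s SLA, and a real contract may measure availability over a different period or exclude scheduled maintenance.

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 hours; leap years ignored for simplicity


def allowed_downtime_hours(availability_percent: float,
                           period_hours: float = HOURS_PER_YEAR) -> float:
    """Return the downtime an availability target permits over a period."""
    if not 0 <= availability_percent <= 100:
        raise ValueError("availability_percent must be between 0 and 100")
    return period_hours * (1 - availability_percent / 100)


# Sample targets chosen for illustration only.
for pct in (99.0, 99.9, 99.99):
    hours = allowed_downtime_hours(pct)
    print(f"{pct}% availability permits about {hours:.2f} hours of downtime per year")
```

A 99.9% guarantee, for instance, still allows roughly 8.76 hours of outage a year, and the percentage says nothing about how much data is lost during those hours, which is the gap RTO and RPO targets are meant to cover.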

Organizations evaluating MSPs as prospective partners should take into account many or all of the considerations I’ve included here as best practices that separate one provider from the next. That includes making certain that the MSP has a thorough understanding and appreciation of what’s involved in restoring systems, as well as of the resilience and reputation of the company it is supporting following an outage.

Ultimately, it’s about reducing risk–yours as well as the MSP’s–because disaster recovery is not about whether you’ll need it but how soon.

Pete Manca is Egenera’s CEO. Guest blogs such as this one are published monthly and are part of MSPmentor’s annual platinum sponsorship.
