As enterprises move more workloads to the cloud, building resiliency into their plans is critical.

Jeffrey Burt

June 3, 2019


The weekend’s hours-long outage of Google Cloud’s computing network that interrupted such services as G Suite and YouTube is the latest cautionary tale for enterprises that may be considering putting all of their business into the public cloud or leveraging only one cloud service provider.

The internet giant on Sunday issued an alert that there was a problem with its network, noting a high level of congestion in the eastern United States. Almost five hours later, the problem was resolved and the company noted it would “conduct an internal investigation of this issue and make appropriate improvements to our systems to help prevent or minimize future recurrence.”

The company said a more detailed report of the incident – which also reportedly impacted third-party applications like Snapchat, Nest, Discord and Vimeo – would come after the investigation is completed.

Such outages can be highly disruptive for businesses that are increasingly migrating workloads and data to the cloud to take advantage of the greater agility and scalability offered by cloud providers, including top-tier vendors Google Cloud, Amazon Web Services (AWS) and Microsoft Azure. As more workloads move to the cloud, businesses need to be aware of the risk of such outages and adopt hybrid and multicloud approaches that will lessen their exposure if one of these networks goes down.

The Google Cloud incident came just weeks after a similar situation hit software-as-a-service (SaaS) company Salesforce, when an outage that lasted several days impacted such services as Sales Cloud, Service Cloud and Pardot, the vendor’s B2B marketing automation platform.


ThousandEyes’ Angelique Medina

“Moving critical applications and services to the cloud brings unprecedented power to IT teams who no longer have to worry about building and maintaining infrastructure,” said Angelique Medina, director of product marketing at ThousandEyes, a network monitoring company for enterprises and service providers that tracked the Google Cloud problems. “At the same time, cloud computing introduces a solid dose of unpredictability due to the sheer complexity of the internet and cloud connectivity. We get reminded of this reality on a reasonably regular basis when outages occur in the cloud or other service providers.”

ThousandEyes mapped out user locations – denoted by red circles – around the world that were impacted by the outage, and later created another map that showed areas in green that were running once the problem was resolved.

[Images: ThousandEyes maps of Google services during the outage and after it was resolved]

Medina said that what comes out in Google Cloud’s post-mortem report will hold important lessons for customers, noting that it’s “vitally important to ensure your cloud architecture has sufficient resiliency measures, whether on a multiregion basis or even multicloud basis, to protect from future recurrence of outages.”

Enterprises seem to be taking such cautions to heart, with several reports being issued this year showing increasing use of multicloud and hybrid cloud strategies. A report from workload automation vendor Turbonomic found that 84 percent of survey respondents said they use more than one cloud provider. The push toward multiple and hybrid clouds is being driven by the desire to use best-of-breed cloud services and to ensure that applications are available when needed. Eighty-three percent of respondents said they expect their workloads to move easily between their cloud environments.


Vectra’s Chris Morales

Using the cloud, however, comes with risks, says Chris Morales, head of security analytics at Vectra, a cybersecurity firm.

“Relying on any type of service means that there is going to be some level of risk of a service outage,” Morales told Channel Futures. “This could be from digital to physical, like the power going out. Using electricity as an example, some people do run backup generators in the time of an outage, but that gets back basic services and not access to things that also went out, like TV broadcasting or radio.”

There is a difference for users, Morales said, adding that a “service outage is an acceptable risk to most consumers. It is critical services like hospitals [that] face a bigger concern.”

Cloud services can make things much easier in many ways for enterprises, but it’s important to remember that they come with inherent unpredictability, ThousandEyes’ Medina said.

“The cloud and the internet are … massive, complex, and endlessly interconnected,” she said. “The cloud is arguably still the best way to do IT for most businesses today, but it carries risks that no team should be unprepared for. When you no longer have control over all the infrastructure, software and networks you rely on to run your business, you need timely visibility so you can tell what’s going on and get resolution as fast as possible.”

Medina also warned that these challenges are not unique to cloud providers.

“Lest we be tempted to pillory Google for the outage, it’s important to remember that every IT team experiences service outages,” she said.
