Getting Your Data Center Out of the Dark Ages
As I prepared for a podcast interview on Data Center Infrastructure Management (DCIM), I was amazed to learn how quickly the industry is beginning to demand management of power, space and cooling inside those four walls. Ironically, just as I was in the middle of researching data center outages, the lights went out! I feared I would be stuck capturing my ideas for the podcast on quill and parchment, with countless candle-lit days ahead, living like a modern-day Benjamin Franklin.
Turns out, I had just blown a fuse. Crisis managed. But this incident made me realize how much I take power, space, and heating/cooling for granted. They’re things I just expect to be there. While I may be able to survive an hour or two without conducting my research for a podcast or watching the Jets face off against the Patriots (that was painful), service providers especially cannot take power for granted when service levels are do or die for their business. You can differentiate your business all you want, but if you’re missing SLAs, you’ll be missing your customers too, because a missed SLA is the excuse they’ll use to get the funding to bring their operations back in-house or move to a more reliable service provider.
Managing power, space and cooling so we don’t get thrown into the dark ages
Operating a data center today without a DCIM solution is as crazy as relying on Benjamin Franklin’s kite and key to stand in as your back-up power supply. That might be a bit hyperbolic, but DCIM is revolutionary for the data center. It provides a unified, real-time view of power, space, and cooling, replacing the patchwork of separate (often custom-built) tools and manual processes that would otherwise be needed to track them.
I once heard a story from one of my interviewees about a recent outage they had experienced, which I thought captured the importance of what I’m describing. Imagine a post-outage meeting with the CIO and his directs. The enterprise architect stands by the whiteboard and begins to describe how he created redundancy on (what he thought was) every failure point for their most critical customer-facing systems. At some point a catastrophic event occurred and those highly available systems were suddenly highly unavailable for quite some time. What happened? Well, the architect focused on the things that typically fail, not the things we don’t expect to fail. What he didn’t realize was that even though they had two circuit breaker panels, both whips used to power the rack went into the same panel, creating a single point of failure. So, when the panel blew, the rack went down, as depicted in the diagram below.
In hindsight, it’s easy to understand why they didn’t catch it. As we all know, whips can run through the floor and walls, so these points of failure are not always apparent to the naked eye. Long story short, they implemented DCIM after the outage to proactively alert the ops staff to power, space, and cooling failure points, giving them the insight they needed to avoid a similar outage in the future. And they haven’t had one since! Of course, service providers don’t just worry about the risks associated with outages. Although I feel for that architect, who probably still gets asked if he considered utilities in his architecture, service providers have other (possibly even more demanding) issues to deal with, like maintaining and improving their margins.
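The shared-panel problem in that story is the kind of thing a simple check over power-chain data can surface. Here is a minimal sketch in Python, assuming a hypothetical inventory in which each whip records the rack it feeds and the breaker panel it terminates at. The rack names, panel names and record layout are invented for illustration and are not taken from any particular DCIM product.

```python
from collections import defaultdict

# Hypothetical inventory: one record per whip (rack, feed label, breaker panel).
WHIPS = [
    {"rack": "R12", "feed": "A", "panel": "PNL-1"},
    {"rack": "R12", "feed": "B", "panel": "PNL-1"},  # both feeds land on PNL-1
    {"rack": "R13", "feed": "A", "panel": "PNL-1"},
    {"rack": "R13", "feed": "B", "panel": "PNL-2"},
]


def racks_without_panel_redundancy(whips):
    """Return racks whose whips all terminate at fewer than two distinct panels."""
    panels_by_rack = defaultdict(set)
    for whip in whips:
        panels_by_rack[whip["rack"]].add(whip["panel"])
    # A rack served by only one panel goes dark if that panel fails.
    return sorted(rack for rack, panels in panels_by_rack.items() if len(panels) < 2)


print(racks_without_panel_redundancy(WHIPS))
```

With the sample data, only rack R12 is flagged, because both of its feeds share one panel. A real deployment would pull these records from the DCIM system rather than a hard-coded list.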
Capacity planning to meet demand and increase margins
Typically, service providers avoid risky situations like the one described above by overcompensating on their infrastructure (i.e., overprovisioning to reduce risk). While this might appear to be the easiest way to reduce risk, it’s also the most common margin killer for service providers. What makes it even worse is that, to combat this situation, most service providers today rely on manual processes to monitor and map their data center through point solutions. This process is not only time-consuming; it also fails to provide real-time data in the simple, correlated and unified view needed to spot, ahead of time, when the data center is approaching full capacity. That puts them at high risk of not being able to keep up with demand, thereby impacting SLAs and potentially cascading into other problems.
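To make "identifying threshold breaches ahead of time" concrete, here is a minimal sketch in Python. It assumes hypothetical readings of used versus total capacity for power, space and cooling, and an illustrative 80% warning level. The metric names, numbers and threshold are all made up for the example.

```python
# Hypothetical readings: metric name -> (amount in use, total capacity).
READINGS = {
    "power_kw": (412.0, 500.0),
    "rack_units": (1300, 2000),
    "cooling_kw": (450.0, 480.0),
}


def breaches(readings, warn_at=0.8):
    """Return (name, utilization) pairs at or above warn_at, highest first."""
    flagged = []
    for name, (used, capacity) in readings.items():
        utilization = used / capacity
        if utilization >= warn_at:
            flagged.append((name, round(utilization, 2)))
    return sorted(flagged, key=lambda item: item[1], reverse=True)


for name, utilization in breaches(READINGS):
    print(f"{name}: {utilization:.0%} of capacity in use")
```

With these sample numbers, cooling and power are flagged while rack space is not. The value of a unified view is that all three readings arrive in one place and at the same time, so a check like this can run continuously instead of being assembled by hand.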
DCIM, however, gives you a unified view of your entire infrastructure, allowing you to properly plan for peaks and troughs. You’ll get real-time data and analytics, providing you with insights to help you make better decisions. It will help you answer important questions, like whether you really need to start that costly build-out this month or whether you can delay it and better utilize your current infrastructure. DCIM gives you the extra assurance you need to provide superior service to your end customers and increase your margins.
A comprehensive DCIM solution provides capabilities that extend far beyond what I’ve outlined in this blog. It is truly the next generation of automation and intelligent reporting in the data center. As a service provider, you can implement DCIM to operate more effectively, reducing unnecessary complexity, cost and risk. You can also use DCIM to generate new revenue opportunities by selling DCIM as a service to end customers, but I can’t possibly cover everything in this blog! If you’re interested in learning more, please join 451 Research and CA Technologies on 12/11 for a webinar where they’ll dive into all the benefits of properly managing your power, space and cooling in DCIM for Your Business – The Opportunity for Service Providers.
Derek Stevens is a Sr. Product Marketing Manager at CA Technologies. You can connect with him on LinkedIn and on Twitter (@DerekintheCloud).