Wednesday, October 13, 2010

Am I Cheating on PUE or DCiE?

In a parallel blog, we've been discussing the death of the modern data center. It's not as dire or alarmist as it sounds - merely the realization that data centers are evolving in two directions: real estate-based solutions outside the enterprise, and cloud computing for social media, entertainment and now more traditional corporate computing functions. In many ways, we're returning to the days where the MEP infrastructure is specifically engineered to the computing systems it supports. For you old fogies out there (like me, even at 47), chilled water's making a comeback as well. We're not just dumping capacity into a white space anymore.

Recently, I was reviewing an article in ENR stating that a data center achieved a PUE of 1.08. I will say that I found the figure a bit hard to believe. Where we found fault with the statement was that the figure:

  • Was not backed by environmental data.
  • Did not appear to be normalized on an annualized basis, since the facility has been open less than one year.
  • Was not offered in context with the entire data center population.
  • Came with no mention that the facility uses a significant amount of server fan power to achieve it.

It doesn't mean the figure was misstated, just taken out of context.

When viewing some of the home run PUEs of the past couple of years, sites located in areas known for beneficial cooling (like the use of very cold outside air, just off the North or Baltic Sea) were put on a level with sites in the US with only some explanation. We're not contending the 1.08 was hokey, just that when it is compared to a site in Phoenix, Dallas or Northern Virginia, clear disclosure is necessary. The article also doesn't state whether there was a reduction in the kW consumed - just that the ratio is lower. What you should care about in PUE is the reduction in overall power consumption or an increase in MIPS/W or whatever power-to-IT metric you happen to fancy.
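To make the "ratio versus kW" point concrete, here is a minimal sketch in Python. It uses the common definitions (PUE is total facility power divided by IT power, and DCiE is the inverse expressed as a percentage). The site and every number in it are made up for illustration; they are not from the ENR article or any real facility.

```python
def pue(it_kw, support_kw):
    """PUE = total facility power / IT power (1.0 is the theoretical floor)."""
    if it_kw <= 0:
        raise ValueError("IT load must be positive")
    return (it_kw + support_kw) / it_kw


def dcie(it_kw, support_kw):
    """DCiE = IT power / total facility power, as a percentage (inverse of PUE)."""
    if it_kw <= 0:
        raise ValueError("IT load must be positive")
    return 100.0 * it_kw / (it_kw + support_kw)


# Hypothetical site, before and after an expansion (made-up numbers).
before = {"it_kw": 1000.0, "support_kw": 500.0}
after = {"it_kw": 1400.0, "support_kw": 560.0}

for label, site in (("before", before), ("after", after)):
    total_kw = site["it_kw"] + site["support_kw"]
    print(f"{label}: total {total_kw:.0f} kW, "
          f"PUE {pue(**site):.2f}, DCiE {dcie(**site):.1f}%")
```

In this made-up case the PUE improves from 1.50 to 1.40 while the total draw climbs from 1500 kW to 1960 kW, so the better ratio on its own says nothing about whether the site is consuming less power.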

I will also say that the current PUE and DCiE standard is pretty generic. The new ANSI/BICSI Data Center Standard 002 speaks specifically to annualized data, where all four seasons must be considered. The PUE and DCiE figures don't recognize server fan power in the equation. Nor do we expect that to change in the near future.
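As a rough illustration of why the four-season view matters, the sketch below compares a single-month snapshot with an annualized figure. The annualizing method here (total annual facility energy divided by total annual IT energy) is my own simplifying assumption, not a procedure quoted from the standard, and the monthly energy figures are hypothetical.

```python
HOURS_PER_MONTH = 720   # simplifying assumption: every month is 30 days
IT_KW = 1000.0          # hypothetical constant IT load

# Hypothetical supporting energy (cooling, losses, lighting) in MWh,
# January through December, for a site with cold winters and hot summers.
support_mwh = [80, 90, 150, 250, 350, 450, 500, 480, 380, 250, 140, 90]
it_mwh = [IT_KW * HOURS_PER_MONTH / 1000.0] * 12   # 720 MWh every month


def monthly_pue(it, support):
    """PUE for a single month from that month's energy totals."""
    return (it + support) / it


def annualized_pue(it_series, support_series):
    """Energy-weighted PUE over the whole year (assumed annualizing method)."""
    total_it = sum(it_series)
    return (total_it + sum(support_series)) / total_it


snapshots = [monthly_pue(i, s) for i, s in zip(it_mwh, support_mwh)]
annual = annualized_pue(it_mwh, support_mwh)

print(f"best month PUE:  {min(snapshots):.2f}")
print(f"worst month PUE: {max(snapshots):.2f}")
print(f"annualized PUE:  {annual:.2f}")
```

With these made-up numbers, the best winter month looks far better than the annualized value, which is the trap with quoting a PUE for a facility that has not yet run through a full year.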

Here are some areas that we're finding suppress PUE values:
  • Environmental data for the PUE calculation is not taken on either the winter or summer design days. Per the new ANSI/BICSI Data Center Standard 002, PUE must be examined with design day environmentals.
  • Many users are now using server fans as part of the ventilation chain in a more formalized way. While this has been the case with forced ventilation cabinets for years, users like Facebook, Google and Yahoo will actually design the server fan into the supply and return air chain. What happens is that the server fan power is counted with the IT load and not the supporting load. Fair game, but you'll see a PUE as much as 25 - 30 basis points lower than a system that uses a more traditional air handling system, where the server fans are not engineered into the ventilation chain.

Make sure that PUE and DCiE consider the server fan contribution where used when comparing them to facilities that don't employ or consider the server fan power as part of the formal HVAC system.
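Here is a minimal sketch of the server fan effect in Python. Both sites, all kW figures and the `fan_adjusted_pue` helper are hypothetical; the "adjusted" figure is just one way I might level the comparison, not part of the PUE or DCiE definitions.

```python
def pue(it_kw, support_kw):
    """Reported PUE: server fan power sits inside the IT load."""
    return (it_kw + support_kw) / it_kw


def fan_adjusted_pue(it_kw, support_kw, server_fan_kw):
    """Hypothetical comparison figure: treat server fan power as supporting load."""
    adjusted_it = it_kw - server_fan_kw
    if adjusted_it <= 0:
        raise ValueError("server fan power must be less than the IT load")
    adjusted_support = support_kw + server_fan_kw
    return (adjusted_it + adjusted_support) / adjusted_it


# Two made-up sites, each drawing 1400 kW with 980 kW of non-fan IT load.
# Site A uses traditional air handlers; site B engineers the server fans
# into the supply and return air chain, moving fan work into the IT load.
site_a = {"it_kw": 1000.0, "support_kw": 400.0, "server_fan_kw": 20.0}
site_b = {"it_kw": 1080.0, "support_kw": 320.0, "server_fan_kw": 100.0}

for name, s in (("A (traditional)", site_a), ("B (server fans)", site_b)):
    reported = pue(s["it_kw"], s["support_kw"])
    adjusted = fan_adjusted_pue(**s)
    print(f"site {name}: reported PUE {reported:.2f}, "
          f"fan-adjusted {adjusted:.2f}")
```

In this example the two sites burn the same total power for the same non-fan IT load, yet site B reports a PUE roughly 0.10 lower simply because of where the fan power is counted. Once the fan power is moved to the supporting side for both, the figures match.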

How Did the Data Center Come Down With a Fatal Disease?

Many of us in the industry have discussed why data centers of the past are changing. I contend that it's not an unanticipated shift, but merely a return to either a generic or highly bespoke solution for a mainframe computing environment. And if you ever worked in a Cray shop, you know how specific the solutions were for that superb platform.

For a minute, let's take a quick review of where we've been. In the 1980's (and yes, I'm that old, I've been in the MC business that long and I recall chilled water and 415 Hz), data center infrastructure suited a homogeneous platform, in this case the ubiquitous mainframe. With the rise of the microserver, and its 60 Hz power and air cooling, the infrastructure of the data center evolved. Remember, there was no hosting or managed service business back in those days (except for TymeNet, run by Tom O'Rourke, the father of one of my best friends from Santa Clara). So, in the 1980's and early 1990's, you had a single path data center that served a specific platform.

With the rise of the server farm, we began to see an increase in real-time computing operations. With that came the rise of fault-tolerant MEP systems that had to address a more heterogeneous series of platforms in the compute, storage, server and network activities of the enterprise. Simultaneously, power densities were increasing rapidly. This has been the habit of the industry from the mid 1990's to the mid 2000's. While there were certainly facility solutions specific to a user's computing needs, most solutions simply delivered a specific Tier or Class solution at a given power density (rendered in W/sf).

During the mid 2000's, higher density systems, starting with blade-based systems, forced a reversal from evenly spreading capacity throughout the data center to serving the higher density systems with specific power, cooling and racking solutions. Much of what we've seen during this time has been systems-based work outside of the cloud. Most, if not all, of our facility solutions have been addressing the exceptions to the "evenly spread" capacity rather than any form of true reengineering of the ENTIRE process. Containerized computing and infrastructure solutions are a step toward solving this challenge.

Here it is. What is causing the death of the modern data center is the split between two realities: users seeking a real estate-based solution to their data center needs, and the emergence of cloud computing or specific hardware solutions massively deployed across the enterprise. One of these realities transfers the physical asset to outside the enterprise. The other reality, if properly executed, has yielded and will yield significant efficiencies in power consumption and computing throughput when examining W/MIP of your computing systems.

Friday, September 10, 2010

The Death of the Data Center as We Know It

I'm not the most radical person on the planet.

After some consideration, I've come to the realization that in an effort to drive energy efficiencies into the data center, we may have inadvertently changed how users view the relationship between the critical utility infrastructure and the hardware and systems it powers and cools. While this will not be an overnight change, the fact is that rendering any portion of the operation more generic while maintaining application availability will undoubtedly and eventually gain the attention of IT users everywhere.

I do believe that data center utility architecture has taken a tectonic turn for the better. The crux of my argument is that users are now looking at the physical infrastructure from the backplane or the hardware level up, where in the past, we have stopped at the input to the server or drive bay. Nothing bad can come of this, aside from the discomfort of change for a better way of business.

Much of the current state of affairs is the habit, and an indictment, of the user, engineer, builder and manufacturer, who conceive, provide or build facilities that they are comfortable with when operating in a very risk-averse environment. Users are now finding that they may customize their platforms to their applications in a very profound way. And I don't mean from whom or where they are buying their servers - I mean that the servers themselves are bespoke for what is processed on them.

When I mentioned tectonic before, I also believe that the change will be slow. Facility, platform and application adoption are rarely synchronized, and that is one of the key elements in making this kind of leap. That being said, facilities will follow what the hardware is doing and the hardware will follow what the application is doing - that age-old chestnut will never change.


Stay tuned for a multi-part blog and white paper on where this may have started, where this is today, where all of this is going and how the relationships between all stakeholders in the IT service train will change.

Tuesday, July 27, 2010

NEC 708

This one is for those of you who have been following additions to the NEC and have missed the impact of the new Article 708. The intention has been to protect facilities and to provide a certain level of survivability to buildings focused on national defense or vital civil operations. What we have found is that this requires a 2-hour rating on much of the electrical system, specifically the power, ventilation and life-safety systems. While concealing conduit under the floor slab helps ameliorate this situation, exposed conduit falls to GRC or IMC.

This is a newer Code section, and will undoubtedly undergo further refinement as it ages (as all Code Articles typically do).

Tuesday, March 30, 2010

Where There's Smoke, There's EPO

Sorry for the break everyone, but I had some trouble with my Blogspot account, now solved!

This post is about the EPO. There's been a major shift in the industry concerning the EPO. Presently, the EPO is being negotiated out with the Jurisdiction or, in a longer plan, being excluded as a requirement of NEC 645. That's all good and fine, but let's talk about the realities of what happens when you actually have an event in the data center.

I will have to thank my brother-in-law for this musing (the esteemed Dr. Richard Swan, the potentate of retail software inventory tracking & management and an early pioneer in RFID tracking systems. He is the CTO of Retail Solutions, at www.retailsolutions.com, and is a way smarter engineer than I am).

One of Richard's colo spaces suffered a planned EPO after a smoke event. And the source of that EPO was the firefighters that responded to the alarm. It occurred to me that the fear of accidental EPO drives much of the desire to remove the EPO from the data center. In this day and age, a typical EPO can be provided as an intrinsically safe system that is also maintainable. This is a great improvement versus the old days where the systems were not intrinsically safe and had to be dealt with hot. That will pretty much deal with the human error factor, aside from staff not following MOPs and approved sequences of work. What we've never tried to face is what actually happens if you get a fire department response. The response may not be to a fire, but to a smoke event, spill or second-stage alarm.

In the case of Richard's space, there was a moderate smoke event in the data center. Since it was a second alarm on the fire alarm system (and a wet sprinkler that did not discharge) but not a flame event, the fire department rolled to the data center. What most folks may not know about your local FD in a non-high-rise is that one of the first things the firefighters are likely to do when they arrive at an alarm is to cut all of the power to the building. And since we actually train the local FDs here, they are also smart enough to kill the generators, UPS and batteries. And afterwards, you will be dark and dead.

Folks will say this is outlandish! What you have to recall is that firefighters use water and agent to fight fires, all of which conducts electricity. Ergo, they cut the power to avoid electrocuting themselves. Darn it, we didn't think of that.

So while you may want to get rid of your EPO and may be successful, there may be a day when the "manual FD EPO" will come on by as a result of an alarm and take you offline.

What should you do?
  • Ramp down and/or shut off IT operations in the affected area at an alarm or FD alert.
  • Compartmentalize the DC to avoid a single point of platform, network, infrastructure or application failure in the event of a manual EPO.
  • Clarify the FD response with the Jurisdiction during construction or at worst every two years.

Sunday, January 31, 2010

Equipment Sourcing

I can hear it now: "I got equipment from the People's Republic of East Bejesus and I'm worried it will fail!"

Well, most of what we buy is industrial-grade equipment that bears a lot of 3rd party scrutiny, such as UL or an independent lab, not to mention the commissioning we do on systems before we turn a job over to our client.

In the past three years, there have been some real failures in OEM quality control or in assembly control that we've seen result in catastrophic equipment failures. In this day and age when everyone feels we're circling the drain in Dante's Inferno (much better in the original language, by the way), we tend to overreact.

What we forget is that, while it's repugnant and you hope it's not on your job, there's no such thing as a perfect manufacturing or OEM sourcing operation. So what are we going to do without tossing nuclear-level testing into the mix?

Here are a few tips:
  1. Check your "Buy American" clauses for OEM parts allowances.
  2. If you want something tested in the factory before the system assembly, simply specify the part and testing method before it goes in. Right now, we're really watching circuit breakers (as they are coming from a host of sources as opposed to one factory for a given system, like a paralleling gear or a UPS power module), as well as wiring harnesses.
  3. Ask for the source quality program for the OEM supplier of interest during the purchasing process.
  4. Financially handicap poor or good performers to level your bid field.

And finally, expect things to fail. Keep enough time in the factory and field testing to allow for rework time.

And remember, there's no better time than uptime.

Next blog - Where there's smoke, there's EPO.