James Hamilton is Vice President and Distinguished Engineer at Amazon Web Services. He specializes in infrastructure efficiency, reliability and scaling. Prior to joining Amazon.com, James was Microsoft Data Center Futures Architect. He has spent more than 20 years working on high-scale services, database management systems, and compilers.
What do you think is our greatest challenge and our lowest-hanging fruit for reducing energy use by 10% across the industry over the next 18 months? What about the harder task of decreasing energy use by 20% over 36 months?
The big levers are: 1) server utilization, 2) high-efficiency servers, and 3) high-efficiency mechanical systems. In a good, but not even close to industry-leading, data center with a PUE of 1.7, about 59% of the power delivered to the facility actually reaches the servers. That tells us to look first at how efficiently we’re using the servers, since that’s where most of the power is going. Industry-wide, server utilization is terrible: 30% is a good number, and most facilities are significantly worse.
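To see where the power goes, it helps to work the PUE definition directly. Here’s a minimal sketch of the arithmetic above; nothing beyond the definition of PUE is assumed:

```python
def it_power_fraction(pue: float) -> float:
    """PUE = total facility power / IT power, so the share of
    facility power that actually reaches the IT equipment is 1/PUE."""
    return 1.0 / pue

# The good-but-not-industry-leading facility from the discussion above.
print(f"PUE 1.7 -> {it_power_fraction(1.7):.0%} of facility power reaches the servers")
# PUE 1.7 -> 59% of facility power reaches the servers
```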
Many techniques can be employed to raise utilization levels. Most revolve around workload scheduling and shutting servers off. A technique that I really like is called Resource Consumption Shaping, which is, in effect, shifting peak utilization and flattening the sinusoidal consumption curves. In Should We Shut Off Servers, I argue that shutting servers off should not be the first choice when maximizing work done per dollar, although it will help with power consumption. The core of the argument is that server hardware and infrastructure costs are higher than power costs, so the first goal should be to use the servers rather than shut them off or, worse, leave them idle. Shutting them off saves power but doesn’t help with server costs, power provisioning costs, or mechanical systems costs. If the servers can deliver more value to the business than the marginal cost of power, then “off” should not be the first choice.
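Rough numbers make the tradeoff concrete. Everything below is an illustrative assumption, since the server price, amortization period, power draw, and utility rate are not from the discussion above, and the fully burdened cost of power (covering provisioning and cooling) would run higher, but the shape of the comparison holds:

```python
# Illustrative assumptions only -- substitute your own numbers.
server_cost = 2000.0       # purchase price, USD (assumed)
amortization_years = 3     # typical depreciation period (assumed)
server_watts = 300.0       # average draw in watts (assumed)
price_per_kwh = 0.07       # utility rate, USD/kWh (assumed)

hours_per_year = 24 * 365
hardware_per_year = server_cost / amortization_years
power_per_year = (server_watts / 1000.0) * hours_per_year * price_per_kwh

print(f"hardware amortization: ${hardware_per_year:,.0f}/year")
print(f"power at full load:    ${power_per_year:,.0f}/year")
# hardware amortization: $667/year
# power at full load:    $184/year
# An idle server still burns the capital cost, so putting it to work
# beats turning it off whenever the work done is worth more than the
# marginal power consumed.
```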
Over the next few weeks, I’ll post to Perspectives some experimental work where we run a production web service workload on prototype servers built from client-side parts and realize more than a 3x advantage in both work done/joule and work done/dollar, and more than a 9x advantage in work done/rack unit. We still need to get data on increased failure rates but, generally, high-efficiency servers have great promise. In Annual Fully Burdened Cost of Power, we show that replacing inefficient servers early can make good economic (and environmental) sense.
One technique for improving mechanical system efficiency in an existing data center build, without resorting to a rebuild or moving to high-efficiency modular components, is to run the existing data center hotter. ASHRAE publishes specifications on acceptable ambient operating ranges for servers. Nearly all data centers run considerably colder than the high end of this range, and there is a substantial cost to this added safety margin. I wouldn’t recommend just raising the data center temperature and waiting to see what breaks, but the savings are large enough that it is almost certainly worth bringing in a mechanical engineer to study your center and recommend the temperature range at which you can operate. Even small increases in data center temperature can yield substantial savings in overall efficiency and will be reflected in improved PUE.
Other thoughts you’d like to share?
PUEs of 1.2 to 1.3 would be a huge improvement for most enterprise data centers, and that’s a step well within their reach without a technology breakthrough. Better is clearly possible, but it requires more innovation, and it’s tough to fund that at anything other than very high scale. I would argue that getting a few mega-data centers down to a PUE of 1.15 is not nearly as interesting as getting all data centers under 1.3. The former is fun to read about, but the latter has much more leverage and saves much more power, given how many small, inefficient data centers are currently in use.
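A back-of-envelope comparison shows the leverage. The fleet sizes below are invented purely for illustration; the only real inputs are the PUE targets from the argument above:

```python
def facility_power_mw(it_mw: float, pue: float) -> float:
    """Total utility draw needed to deliver a fixed IT load."""
    return it_mw * pue

# Invented fleets, purely for illustration: a handful of mega-centers
# vs. a long tail of small enterprise rooms, both starting at PUE 2.0.
mega_it_mw = 5 * 15.0       # five mega-data centers, 15MW of IT each (assumed)
small_it_mw = 2000 * 0.25   # two thousand small rooms, 0.25MW of IT each (assumed)

mega_saved = facility_power_mw(mega_it_mw, 2.0) - facility_power_mw(mega_it_mw, 1.15)
small_saved = facility_power_mw(small_it_mw, 2.0) - facility_power_mw(small_it_mw, 1.3)

print(f"mega fleet 2.0 -> 1.15: {mega_saved:.0f} MW saved")
print(f"small fleet 2.0 -> 1.3: {small_saved:.0f} MW saved")
# mega fleet 2.0 -> 1.15: 64 MW saved
# small fleet 2.0 -> 1.3: 350 MW saved
```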
Why are you confident that this will work in 5,000 sq ft data centers as well as the mega-data centers you are familiar with at Microsoft?
As a thought experiment, let’s say we have a small-to-medium-sized facility with 2.5MW of total power that operates at a PUE of 2.2. This facility is delivering 1.14MW to the IT equipment, and the remaining 1.36MW is spent on power distribution and cooling losses (mostly cooling). If we improve that facility to a 1.3 PUE, the same 2.5MW would then deliver 1.9MW to the IT equipment. Without changing the power available to operate the facility, we are effectively getting ¾ of a MW of additional power for servers without asking the utility for any more. This is essentially power growth without direct cost. At 300W per server (choose a number appropriate for your configuration), you would be able to add roughly 2,600 new servers without additional power costs in this scenario. A fairly compelling outcome, but how can the owner of a small facility afford to do this?
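The same arithmetic in a short sketch; the 2.5MW, PUE, and 300W-per-server figures are taken from the thought experiment above:

```python
def it_capacity_mw(total_mw: float, pue: float) -> float:
    """IT power available from a fixed utility feed: total / PUE."""
    return total_mw / pue

total_mw = 2.5
before = it_capacity_mw(total_mw, 2.2)    # ~1.14 MW to IT equipment
after = it_capacity_mw(total_mw, 1.3)     # ~1.92 MW to IT equipment
freed_mw = after - before                 # ~0.79 MW of new headroom

watts_per_server = 300.0                  # pick a number for your configuration
extra_servers = freed_mw * 1_000_000 / watts_per_server
print(f"{freed_mw:.2f} MW freed -> ~{extra_servers:,.0f} additional servers")
# 0.79 MW freed -> ~2,622 additional servers
```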
One of the most environmentally friendly decisions that a small enterprise data center operator can make is to move some of the low value-add services to large service providers. These large providers can run large, very efficient data centers and, on products like email, a service offering may be more reliable and will almost certainly be cheaper and more environmentally conscious. However, many workloads can’t easily be moved to a service provider and some workloads need to stay in the enterprise data center. Since not everyone is going to have the scale of Google, Amazon, Microsoft, Yahoo, etc., how can we make a small data center efficient?
One answer to this question that has great potential, and brings more benefits than just power savings, is to use modular designs. I wrote about containerized systems two and a half years ago in Architecture for Modular Data Centers and talked about many of their advantages. The key advantage that makes containers interesting in this discussion is that a well-designed container will have a PUE in the 1.25 to 1.35 range. There is no magic here. The advantage of containers in this context is that they are delivered with efficient infrastructure designs: they come with an excellent power distribution design and good mechanical systems, and manufacturers will provision the modules with the servers of your choice.
Containers are now available from Rackable Systems, Dell, HP, IBM, Verari, and others (see First Containerized Data Center Announcement). I particularly like the Dell and Rackable designs, and both are available in roughly ½MW configurations. Take two of these containers and put them on the roof with the air handling equipment, or in the parking lot near the building. These containers take 3-phase, 480VAC power directly, so just route the 480VAC feed from the legacy data center to wherever the containers are placed. Then convert the existing, inefficient mini-data center into office space and meeting rooms. This approach takes back building space for its intended purpose (people working) and converts an inefficient data center into an environmentally sensitive installation you can be proud of. And, with less power wasted, there is more power to run servers, so this approach not only gains office space but also increases data center capacity without additional power consumption.
Have a GREEN DAY and a GREEN 2009!