Interview with James Hamilton on PUE goals
James Hamilton is Vice President and Distinguished Engineer at Amazon Web Services. He specializes in infrastructure efficiency, reliability and scaling. Prior to joining Amazon.com, James was Microsoft Data Center Futures Architect. He has spent more than 20 years working on high-scale services, database management systems, and compilers.
What do you think is our greatest challenge and our lowest-hanging fruit for reducing energy use 10% across the industry over the next 18 months? What about the harder task of decreasing energy use 20% over 36 months?
The big levers are: 1)
server utilization, 2) high-efficiency servers, and 3) high-efficiency
mechanical systems. In a good but not even close to industry-leading data
center with a PUE of 1.7, about 59% of the power delivered to the center is
delivered to the servers. That tells us to look first at how efficiently we’re
using the servers since that’s where most of the power is going. Industry-wide,
server utilization is terrible: a good utilization number is 30%, and most are significantly worse.
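The 59% figure follows from the definition of PUE (total facility power divided by the power delivered to IT equipment): the share reaching the servers is simply 1/PUE. A minimal Python sketch of that arithmetic follows; the function name is illustrative and not taken from any tool mentioned in the interview.

```python
def it_power_fraction(pue: float) -> float:
    """Fraction of total facility power that reaches the IT equipment."""
    if pue < 1.0:
        raise ValueError("PUE cannot be below 1.0")
    return 1.0 / pue


# Compare a few of the PUE values discussed in this interview.
for pue in (2.2, 1.7, 1.3, 1.15):
    share = it_power_fraction(pue)
    print(f"PUE {pue:.2f}: {share:.0%} of facility power reaches the servers")
```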
Many techniques can be
employed to raise utilization levels. Most revolve around workload scheduling
and shutting servers off. A technique that I really like is called Resource Consumption Shaping, which is, in effect, shifting peak utilization and
flattening the sinusoidal consumption curves. In Should We Shut Off Servers, I argue that shutting servers off should not be
the first choice when maximizing work done per dollar although it will help
with power consumption. The core of the argument is that server hardware and
infrastructure costs are higher than the power costs, so the first goal should be
to use the servers rather than shut them off or, worse, leave them idle. Shutting them
off saves power, but doesn’t help with server costs, power provisioning costs,
and mechanical systems costs. If the servers can deliver more value to the
business than the marginal cost of power, then “off” should not be the first
choice.
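The "value versus marginal cost of power" test can be sketched as a simple comparison. This is only an illustration under stated assumptions: the function names, the electricity price, and the value-per-hour figures are hypothetical, and scaling the server's draw by PUE to account for cooling and distribution overhead is a simplification.

```python
def marginal_power_cost_per_hour(server_watts: float,
                                 price_per_kwh: float,
                                 pue: float) -> float:
    """Hourly cost of the power one running server draws at the utility meter.

    Assumption: facility overhead scales with server load, so the
    server's own draw is multiplied by PUE.
    """
    return (server_watts / 1000.0) * pue * price_per_kwh


def worth_running(value_per_hour: float, server_watts: float,
                  price_per_kwh: float, pue: float) -> bool:
    """True if the work the server could do is worth more than its power."""
    cost = marginal_power_cost_per_hour(server_watts, price_per_kwh, pue)
    return value_per_hour > cost


# Hypothetical inputs: a 300W server, $0.07/kWh, PUE of 1.7.
cost = marginal_power_cost_per_hour(300, 0.07, 1.7)
print(f"Marginal power cost: ${cost:.4f} per server-hour")
print("Run a job worth $0.05/hour?", worth_running(0.05, 300, 0.07, 1.7))
print("Run a job worth $0.02/hour?", worth_running(0.02, 300, 0.07, 1.7))
```

The server purchase and provisioning costs are deliberately left out of the comparison: they are already spent whether the server runs or not, which is the point of the argument.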
Over the next few weeks,
I’ll post to Perspectives some experimental work where we run a production web service workload
on prototype servers built from client-side parts and realize more than a 3x
advantage in work done/joule and work done/dollar and more than a 9x advantage
in work done/rack unit. We still need to get data on increased failure rates
but, generally, high efficiency servers have great promise. In Annual Fully Burdened Cost of Power, we show that replacing inefficient servers early
can make good economic (and environmental) sense.
One technique for improving
mechanical system efficiency in an existing data center build, without
resorting to a rebuild or moving to high-efficiency modular components, is to
run the existing data center hotter. ASHRAE publishes specifications on acceptable ambient operating ranges for
servers. Nearly all
data centers run considerably colder than the high end of this range. There is a
substantial cost to this increased safety margin. I wouldn’t recommend just
raising the data center temperature and waiting to see what breaks, but there are
large enough savings to be had that it is almost certainly worth bringing in a
mechanical engineer to study your center and recommend the
temperature ranges you can operate at. Even small increases in data center temperature can yield substantial
savings in overall efficiency and will be reflected in improved PUE.
Other thoughts you’d like to share?
PUEs of 1.2 to 1.3 are a huge improvement for most enterprise data centers, and it’s a step well within their reach without a technology breakthrough. Better is clearly possible, but it requires more innovation, and it’s tough to fund this at anything other than very high scale. I would argue that getting a few mega-data centers down to a PUE of 1.15 is not nearly as interesting as getting all data centers under 1.3. The former is fun to read about, but the latter has much more leverage and saves much more power, given how many small, inefficient data centers are currently in use.
Why are you confident that this will
work in 5k sq foot data centers as well as the mega datacenters you are
familiar with at Microsoft?
As a thought experiment,
let’s say we have a small to medium-sized facility with 2.5MW of total power
that operates at a PUE of 2.2. This facility is delivering 1.14MW to
the IT equipment and the remaining 1.36MW is spent in power distribution and
cooling losses (mostly cooling). If we improve that facility to a 1.3 PUE
facility, that same 2.5MW would then be able to deliver 1.9MW to the IT
equipment. Without changing the power available to operate the facility, we are
effectively getting ¾ of a MW of additional power for servers without asking
the utility for any more. This is essentially power growth without direct cost.
If you were using 300W per server (choose a number appropriate for your
configuration), you would be able to add 2,600 new servers without additional
power costs in this scenario. A fairly compelling outcome but how can the owner
of a small facility afford to do this?
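The arithmetic in this thought experiment can be reproduced in a few lines of Python. The sketch below uses only the figures quoted above (2.5MW total, PUE of 2.2 improved to 1.3, 300W per server); the function names are illustrative. The unrounded result is slightly above the 2,600 servers quoted, which is a rounded figure.

```python
def it_power_mw(total_mw: float, pue: float) -> float:
    """Power (MW) delivered to IT equipment for a given facility power and PUE."""
    return total_mw / pue


def extra_servers(total_mw: float, pue_before: float, pue_after: float,
                  watts_per_server: float) -> tuple[float, int]:
    """Return (MW of IT power gained, whole servers that gain can support)."""
    gained_mw = it_power_mw(total_mw, pue_after) - it_power_mw(total_mw, pue_before)
    servers = int(gained_mw * 1_000_000 // watts_per_server)
    return gained_mw, servers


before = it_power_mw(2.5, 2.2)   # about 1.14 MW to IT equipment
after = it_power_mw(2.5, 1.3)    # about 1.9 MW to IT equipment
gained, servers = extra_servers(2.5, 2.2, 1.3, 300)

print(f"IT power before: {before:.2f} MW, after: {after:.2f} MW")
print(f"Gained: {gained:.2f} MW, enough for about {servers} servers at 300W each")
```

Substitute your own per-server wattage for the 300W assumption to fit your configuration.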
One of the most environmentally
friendly decisions that a small enterprise data center operator can make is to
move some of the low value-add services to large service providers. These large
providers can run large, very efficient data centers and, on products like
email, a service offering may be more reliable and will almost certainly be
cheaper and more environmentally conscious. However, many workloads can’t
easily be moved to a service provider and some workloads need to stay in the
enterprise data center. Since not everyone is going to have the scale of
Google, Amazon, Microsoft, Yahoo, etc.,
how can we make a small data center efficient?
One answer to this
question that has great potential and brings more benefits than just power
savings is to use modular designs. I wrote about containerized systems two and
a half years ago in Architecture for Modular Data Centers and talked about many of their advantages. The key
advantage that makes containers interesting in this discussion is that a well-designed
container will have a PUE in the 1.25 to 1.35 range. There is no magic
here. The advantage of containers in this context is they are delivered with
efficient infrastructure designs. They come with an excellent power
distribution design, good mechanical systems, and manufacturers will provision
the modules with the servers of your choice.
Containers are now
available from Rackable Systems, Dell, HP, IBM, Verari, and others (see First Containerized Data Center Announcement). I
particularly like the Dell and Rackable designs and both are available in
roughly ½ MW configurations. Take two of these containers and put them on the
roof with the air handling equipment or in the parking lot near the building. These
containers take 3-phase, 480VAC power directly. Just route the 480VAC feed from
the legacy data center and run it to the location where the containers were
placed. Convert the existing, inefficient mini-data center that we are
re-engineering into office space and meeting rooms. This approach takes back
building space for its intended purpose, people working, and converts an
inefficient data center into an environmentally sensitive installation of which
you can be proud. And, with less power wasted, there is more power to run
servers so this approach not only gains office space but it also increases data
center capacity without additional power consumption.
Have A GREEN DAY and a GREEN 2009!
Mr. Hamilton brings much desired new directions to the datacenter world.
There's a lot of work to be done to convince enterprise datacenter managers to look at containers. A lot of container innovations come from non-Tier-1 server vendors, but most brick-n-mortar enterprises are already tied to Tier-1 server vendors. Luckily, the web enterprises, which make money with their servers, usually have much larger and more homogeneous server needs than brick-n-mortar companies, which just use servers to support their business. And web enterprises don't have these legacy 30-year-old relationships with Tier-1 vendors, so they can easily buy modular datacenter infrastructure based on features, innovation, and price/performance/watts/sqft.
I have no doubt that 2009, and probably also 2010, will be the "year of the modular data center". I also sense the beginning of the "container hosting datacenter wars" between datacenter and co-lo operators, who will start building infrastructure to host end-user containers. It’s a great opportunity for smaller and more innovative datacenter operators, and I think that those who are smart enough and fast enough to act now will be able to lead the pack and make money. There’s a lot of work to be done here: for example, we need to settle what model will be used to charge for hosting containers, and we also need an official or at least de facto standard for hooking up containers to power, network, and chilled water.
Posted by: Lior Paster | January 01, 2009 at 08:17 AM