The Power Wall
Why Aren’t Modern CPUs Faster?
What Happened in the Late 1990s?
Edward L. Bosworth, Ph.D.
What is the Problem?
The problem is illustrated in this figure, taken from Patterson & Hennessy. The
SPECint performance of the hottest chip grew by 52% per year from 1986 to
2002, and then grew only 20% in the next three years (about 6% per year).
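The per-year figure comes from compound growth. This is a minimal sketch. It assumes the 52% rate compounds over the 16 years from 1986 to 2002, and that the 20% is the total gain over the three years that followed:

```python
def annualized_rate(total_gain: float, years: float) -> float:
    """Per-year rate r such that (1 + r) ** years == 1 + total_gain."""
    return (1.0 + total_gain) ** (1.0 / years) - 1.0


def growth_factor(annual_rate: float, years: float) -> float:
    """Overall multiplier after compounding annual_rate for the given years."""
    return (1.0 + annual_rate) ** years


if __name__ == "__main__":
    # 52% per year from 1986 to 2002 (16 compounding periods assumed).
    print(f"1986-2002 growth factor: {growth_factor(0.52, 16):.0f}x")
    # A 20% total gain spread evenly over the next 3 years.
    print(f"annual rate after 2002: {annualized_rate(0.20, 3):.1%}")
```

Under these assumptions, the first period multiplies performance roughly 800-fold. The 20% gain over three years works out to about 6.3% per year, which matches the slide's "about 6% per year".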
Here is a Clue to the Problem
The problem is now called “the Power Wall”.
It is illustrated in this figure,
taken from Patterson & Hennessy.
· The design goal
for the late 1990s and early 2000s was to drive the clock
rate up. This was done by adding more transistors to a smaller chip.
However, this increased the power dissipation of the CPU chip
beyond the capacity of inexpensive cooling techniques.
Roadmap for CPU Clock Speed: Circa 2005
Here is the best projection as of 2005. By 2015, the clock speed
of the top “hot chip” was expected to be in the 12 to 15 GHz range.
The CPU Clock Speed Roadmap (A Few Revisions Later)
This reflects the practical experience gained with
dense chips that were literally
“hot”; they radiated considerable thermal power and were difficult to cool.
Law of Physics: All electrical power consumed is eventually radiated as heat.
Summary of What Follows
There are two solutions to the problem of the Power Wall.
We explore each.
1. The technique taken by IBM for
implementation on their large
z/10 and z/11 servers was to include sophisticated and costly cooling technology.
This allowed CPU clock rates in excess of
The commercial products today use 4.67 GHz CPU chips.
2. The technique taken by commodity processor
providers, such as Intel and
AMD, has been to move to a multicore design, in which each CPU chip
contains a moderate number of components. Due to cost constraints, the
chips are cooled by simple fans and all-metal heat radiators.
Each of these components might be called
a “CPU” or “processor”,
except that the design calls for them to be on a single chip, still
called the CPU. For that reason these units are called “cores”.
CPU chips with multiple processors per
chip are called “multicore”.
The term “multiprocessor microprocessor” just does not work.
The IBM Power 6 CPU
The Power 6 is the CPU used in IBM’s large mainframes.
It has 790 million
transistors in a chip of area 341 square millimeters.
In the z/10, the chip runs at 4.67 GHz. Lab prototypes have run at 6.0 GHz.
The Power 595 configuration of the z/10 uses between
16 and 64 of the
Power 6 chips, each running at 5.0 GHz.
Here is a picture of the Power 6 chip and a typical module for its mounting.
IBM Cooling Technology
Most chip manufacturers target commodity computers that cannot be
fitted with expensive cooling; IBM targets the mainframe community.
The IBM Power 6 CPU is generally placed in water cooled units.
Copper tubing feeds cold water to cooling units in direct contact with the
CPU chips. Each CPU chip is laid out to avoid “hot spots”.
Some engineers are using the warm water from the computer to heat buildings.
Cooling a Faster Single-Core CPU
We have seen how IBM does it: they provide a costly water cooling system.
How can Intel and AMD duplicate this design in a market that will not allow
the expense and complexity of a water cooling system?
Here are two options for air cooling of a commercial CPU chip: the Akasa
Copper Heatsink (left) and the Mugen 2 Cooler (right).
A Google search for “Computer
Cooling Radiators” shows a brisk market
in water cooling units for commodity CPU chips.
Making a Faster Single-Core CPU
The standard way is to make a more sophisticated pipeline.
This requires more transistors per chip, but that is not a problem.
In the figure, the code is executed from top to bottom. Each
instruction is handled by a
distinct stage of the pipeline, so the CPU is executing five instructions at once.
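The overlap can be sketched in a few lines. This sketch assumes the classic five-stage pipeline (IF, ID, EX, MEM, WB) described by Patterson & Hennessy. The stage names and the instruction labels I1, I2, and so on are illustrative, and the model ignores hazards and stalls:

```python
# Assumed classic five-stage pipeline: fetch, decode, execute,
# memory access, write back.
STAGES = ["IF", "ID", "EX", "MEM", "WB"]


def pipeline_table(num_instructions: int) -> list[list[str]]:
    """One row per instruction; entry [i][c] is the stage instruction i
    occupies in clock cycle c, or "" if it is not in the pipeline."""
    num_cycles = num_instructions + len(STAGES) - 1
    table = []
    for i in range(num_instructions):
        row = [""] * num_cycles
        for s, stage in enumerate(STAGES):
            row[i + s] = stage  # instruction i reaches stage s in cycle i + s
        table.append(row)
    return table


def in_flight(table: list[list[str]]) -> list[int]:
    """Number of instructions occupying some stage in each cycle."""
    return [sum(1 for row in table if row[c]) for c in range(len(table[0]))]


if __name__ == "__main__":
    t = pipeline_table(7)
    for i, row in enumerate(t):
        print(f"I{i + 1}: " + " ".join(f"{cell:>3}" for cell in row))
    print("instructions in flight per cycle:", in_flight(t))
```

Once the pipeline fills, five instructions are in flight in every cycle, one per stage. Real pipelines lose some of this overlap to hazards and stalls.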
BUT: More transistors mean more power, thus more heat.
We can also raise the CPU clock rate, but this causes more heat as well.
The Power Equation
Each CPU is nothing more than a collection of transistors, acting as switches.
A modern CPU might have about a billion transistors on a single chip. The
IBM Power 6 chip has 790 million transistors in an area of 341 mm².
The power equation for a single transistor can be written as
$$\text{Power} = K \cdot (\text{Capacitive Load}) \cdot (\text{Voltage})^2 \cdot (\text{Frequency Switched})$$
To keep the power the same, one should halve the voltage for every
fourfold increase in switching frequency, since halving the voltage divides $(\text{Voltage})^2$ by four.
Older chips (Intel 8086) ran at 5.0 volts; the newer ones run at about 1.5 volts.
Thus a new chip with the same capacitive load and switching frequency would
emit $(1.5/5.0)^2 = (0.3)^2 = 0.09 \approx 10\%$ of the power of the older chip. This would seem
to allow a chip that is ten times faster (in clock speed).
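The scaling arguments above can be checked as ratios of the power equation. This is a minimal sketch. The function names are hypothetical, and the constant K cancels because only ratios of old and new chips are compared:

```python
def relative_power(cap_ratio: float, volt_ratio: float, freq_ratio: float) -> float:
    """New-chip power divided by old-chip power, from
    Power = K * C * V**2 * f. The constant K cancels in the ratio."""
    return cap_ratio * volt_ratio ** 2 * freq_ratio


def max_freq_ratio(cap_ratio: float, volt_ratio: float, power_ratio: float = 1.0) -> float:
    """Largest clock-rate ratio that keeps relative power at power_ratio."""
    return power_ratio / (cap_ratio * volt_ratio ** 2)


if __name__ == "__main__":
    # Halve the voltage and quadruple the frequency: power is unchanged.
    print(relative_power(1.0, 0.5, 4.0))
    # 5.0 V -> 1.5 V at the same capacitive load and frequency.
    print(f"{relative_power(1.0, 1.5 / 5.0, 1.0):.2f}")
    # Clock-rate headroom if the old power budget is kept.
    print(f"{max_freq_ratio(1.0, 1.5 / 5.0):.1f}")
```

The headroom of about 11x is the basis for the "ten times faster" estimate. As the next paragraph explains, it ignores the need for higher voltages at higher clock rates.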
However, faster clock rates sometimes demand higher voltages.
In addition, higher voltages mean less trouble due to random noise. A random
signal of 0.2 volts might disrupt a chip running at 1.0 volts, but not one running at 5.0 volts.
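As a rough worked comparison, treat the noise as a fraction of the supply voltage. This is a simplification, because real noise margins depend on the logic family. On that basis, the same 0.2-volt disturbance is five times more significant on the low-voltage chip:

$$\frac{0.2\ \text{V}}{1.0\ \text{V}} = 20\%\ \text{of the supply} \qquad \text{versus} \qquad \frac{0.2\ \text{V}}{5.0\ \text{V}} = 4\%\ \text{of the supply}$$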
The Intel Pentium 4 CPU chip code named “Prescott” was designed to push up
the actual clock rate. The fastest mass-produced chip ran at 3.8 GHz, though
some enthusiasts (called “overclockers”) actually ran the chip at 8.0 GHz.
At its release, this chip was thought to generate about 40% more heat per
clock cycle than earlier variants. This gave rise to the name “PresHot”.
The Prescott was built on Intel’s NetBurst architecture, which was intended to be scaled up eventually to ten gigahertz. The heat
problems could never be handled, and Intel abandoned the architecture.
The following observations are adapted from a review by Sander Sassen:
· The only way to keep it below 60 Celsius (140 F) was
to operate it
with the cover off and plenty of ventilation.
· Even equipped with the massive Akasa
King Copper heat sink (see a
previous slide), the system reached 77 Celsius (171 F) when operating
at 3.8 GHz under full load and shut itself down.
Multicore Chips: The Start of a New Line
Rather than continuing to improve single-program performance, many
commercial chip manufacturers have adopted a “server mentality”: increase
the throughput of a number of programs running concurrently.
We shall study parallel processing later.
At that time, we shall note that the
difficulty lies in keeping all processors doing productive work.
The division of a single problem among a large number of processors, or the
use of a large number of processors for cooperating tasks, is difficult.
Recall that a multicore chip is just a CPU chip with multiple processors.
In a server, especially a large one such as the IBM z/10, there are many
independent processes that do not need to intercommunicate. Allocation of
processors (cores) to such a job mix is almost trivial.
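Here is a minimal sketch of why such allocation is easy. It assumes each job is independent and that its run time is known in advance; the job times in the example are made up. The policy shown is greedy list scheduling, where each job goes to whichever core currently has the least work. This is one simple policy, not necessarily what any particular operating system does:

```python
import heapq


def assign_jobs(job_times: list[float], num_cores: int) -> list[list[int]]:
    """Greedy list scheduling for independent jobs.

    Each job, taken in order, is placed on the core with the smallest
    total assigned time so far. Returns the job indices for each core.
    """
    heap = [(0.0, core) for core in range(num_cores)]  # (assigned time, core id)
    heapq.heapify(heap)
    assignment: list[list[int]] = [[] for _ in range(num_cores)]
    for job, t in enumerate(job_times):
        load, core = heapq.heappop(heap)  # least-loaded core so far
        assignment[core].append(job)
        heapq.heappush(heap, (load + t, core))
    return assignment


def core_loads(job_times: list[float], assignment: list[list[int]]) -> list[float]:
    """Total time assigned to each core."""
    return [sum(job_times[j] for j in jobs) for jobs in assignment]


if __name__ == "__main__":
    jobs = [4.0, 2.0, 7.0, 1.0, 3.0, 5.0, 2.0, 6.0]  # hypothetical run times
    plan = assign_jobs(jobs, 4)
    print(plan)
    print(core_loads(jobs, plan))
```

Because the jobs never communicate, no core ever waits on another. The only decision is where to place each job, and that is exactly what makes the server case easy.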
Question: Compare a
single processor operating at 4 GHz to a
dual core processor with each core operating at 2 GHz.
The dual core processor is likely to consume less power, but can it do
the same amount of work per unit time as the faster single core processor?
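One way to frame the question is to combine the power equation above with a simple timing model. This sketch assumes both designs have the same capacitive load per core and the same instructions per cycle. For a single program whose parallelizable fraction is p, it uses Amdahl's law, which is not covered on these slides. Whether 2 GHz cores can run at a lower voltage is also an assumption, so both cases are shown:

```python
def relative_power(cores: int, freq_ratio: float, volt_ratio: float) -> float:
    """Chip power relative to the single-core baseline.

    Each core obeys Power = K * C * V**2 * f with the baseline's C and K,
    so the ratio per core is volt_ratio**2 * freq_ratio.
    """
    return cores * volt_ratio ** 2 * freq_ratio


def relative_run_time(p: float, cores: int, freq_ratio: float) -> float:
    """Time for one program relative to the baseline (Amdahl's law).

    The serial fraction (1 - p) runs on one core; the parallel fraction p
    is split evenly across all cores. Every core runs at freq_ratio times
    the baseline clock with the same instructions per cycle.
    """
    return ((1.0 - p) + p / cores) / freq_ratio


def relative_throughput(cores: int, freq_ratio: float) -> float:
    """Work per unit time for many independent jobs (the server case)."""
    return cores * freq_ratio


if __name__ == "__main__":
    # Dual core at 2 GHz versus single core at 4 GHz: freq_ratio = 0.5.
    print("power, same voltage:  ", relative_power(2, 0.5, 1.0))
    print("power, half voltage:  ", relative_power(2, 0.5, 0.5))
    for p in (0.0, 0.5, 1.0):
        print(f"run time, p = {p}:", relative_run_time(p, 2, 0.5))
    print("throughput, many jobs:", relative_throughput(2, 0.5))
```

Under these assumptions, the dual core matches the single core only on perfectly parallel work or on a mix of independent jobs. It saves power only if the slower cores can also run at a lower voltage. At the same voltage, two cores at half the clock rate draw the same power as one core at the full rate.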
Intel’s Multicore Chip Offerings for 2010
In 2010, Intel Corporation released a new series of multicore processors.
Here is an Intel Corporation overview of this series.
All of these seem to be quad-core.
According to Intel, the multi-core technology will
· permanently alter the course of computing as we know it,
· provide new levels of energy efficient performance,
· deliver full parallel execution of multiple software threads, and
· reduce the amount of electrical power to do the computations.
The current technology
provides for one, two, four, or eight cores in
a single processor.
Intel expects soon to offer
single processors with several tens
of cores, if not one hundred.
This new technology
seems to be targeted at the commercial desktop machine,
which can “run several demanding modern applications at once”.
At present, there is
little hard data on multicore machines.
What we mostly have is marketing hype. That might change soon.
Slide 1 is the title slide.
Slides 2 and 3 are adaptations of Figures 1.15 and 1.16 found in the textbook
Computer Organization and Design, Fourth Edition, by David A. Patterson
and John L. Hennessy, Morgan Kaufmann, 2009, ISBN 978-0-12-374493-7.
Slides 4 and 5 were taken from slides 7 and 8 of a presentation by Katherine
Yelick, NERSC Division Director (U.C. Berkeley and Lawrence Berkeley
National Laboratory). The second slide also appeared in a late 2009 issue of
the Communications of the ACM.
Slide 6 is a summary, with few quotes.
Slides 7 and 8 are based on Google searches, especially for “IBM Power 6”,
which is the CPU chip for the IBM z/10 and (soon to be released) z/11. While
visiting the zSeries manufacturing plant in
the CPU for the z/11 would be run at a speed above 5 GHz.
Slide 7 quotes details on the IBM Power 6 chip from the paper “IBM Power6
Microarchitecture”, IBM Journal Res. & Dev. Vol. 51, No. 6, November 2007.
Slide 9 shows two commercial cooling units for commodity CPUs.
The unit at the left is an Akasa Copper Heatsink. (www.akasa.com.tw/)
The unit on the right is a Mugen 2 Cooler (See
Much of the information on the Intel Prescott chip comes from the Wikipedia
article on “Pentium 4”. Some material is taken from a review by Sander Sassen,
found at http://www.hardwareanalysis.com/content/article/1693/.
Slide 15 is built on several sources, including:
The Intel White Paper Extending the World’s Most Popular
The Intel web site http://www.intel.com/multi-core/index.htm