The Power Wall
Why Aren’t Modern CPUs Faster?
What Happened in the Late 1990s?
Edward L. Bosworth, Ph.D.
Associate Professor
January 2010
What is the Problem?
The problem is illustrated in this figure, taken from Patterson & Hennessy. The SPECint performance of the hottest chip grew by 52% per year from 1986 to 2002, and then grew only 20% in the next three years (about 6% per year).
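As a quick check on the growth rates quoted above, the short Python sketch below compounds them. The function names are illustrative only, and treating each quoted figure as a constant yearly rate is an assumption.

```python
# Illustrative check of the SPECint growth figures quoted in the text.
# Assumption: growth compounds at a constant yearly rate within each era.

def compound(rate_per_year: float, years: int) -> float:
    """Total growth factor after `years` of compounding at `rate_per_year`."""
    return (1.0 + rate_per_year) ** years


def annual_rate(total_growth: float, years: int) -> float:
    """Constant yearly rate that compounds to `total_growth` over `years`."""
    return total_growth ** (1.0 / years) - 1.0


fast_era = compound(0.52, 2002 - 1986)  # 16 years at 52% per year
slow_era = annual_rate(1.20, 3)         # 20% in total over three years

print(f"1986-2002: performance grew by a factor of about {fast_era:.0f}")
print(f"2002-2005: about {slow_era:.1%} per year")
```

Run as is, it reports a factor of roughly 812 for 1986 to 2002 and about 6.3% per year afterwards, in line with the "about 6% per year" figure.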
Here is a Clue to the Problem
The problem is now called “the Power Wall”. It is illustrated in this figure, taken from Patterson & Hennessy.
· The design goal for the late 1990s and early 2000s was to drive the clock rate up. This was done by adding more transistors to a smaller chip.
· Unfortunately, this increased the power dissipation of the CPU chip beyond the capacity of inexpensive cooling techniques.
Roadmap for CPU Clock Speed: Circa 2005
Here is the best forecast available in 2005: by 2015, the clock speed of the top “hot chip” would be in the 12 – 15 GHz range.
The CPU Clock Speed Roadmap (A Few Revisions Later)
This reflects the practical experience gained with dense chips that were literally “hot”; they radiated considerable thermal power and were difficult to cool.
Law of Physics: All electrical power consumed by the chip is eventually converted to heat.
Summary of What Follows
There are two solutions to the problem of the Power Wall. We explore each.
1. The technique taken by IBM for its large z/10 and z/11 servers was to include sophisticated and costly cooling technologies. This allowed CPU clock rates in excess of 5.0 GHz. The commercial products today use 4.67 GHz CPU chips.
2. The technique taken by commodity processor providers, such as Intel and AMD, has been to move to a multicore design, in which each CPU chip contains a moderate number of components. Due to cost constraints, the chips are cooled by simple fans and all-metal heat radiators.
Each of these components might be called a “CPU” or “processor”, except that the design calls for them to be on a single chip, still called the CPU. For that reason these units are called “cores”. CPU chips with multiple processors per chip are called “multicore”. The term “multiprocessor microprocessor” just does not work.
The IBM Power 6 CPU
This is the CPU used in IBM’s large mainframes. It has 790 million transistors in a chip of area 341 square millimeters. In the z/10, the chip runs at 4.67 GHz; lab prototypes have run at 6.0 GHz. The Power 595 configuration of the z/10 uses between 16 and 64 of the Power 6 chips, each running at 5.0 GHz.
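The figures above imply an average transistor density for the Power 6 die; the sketch below does the division. It is a die-wide average only, computed from the 790 million transistors and 341 square millimeters quoted here, and says nothing about how densely any particular region of the chip is packed.

```python
# Average transistor density of the Power 6 die, from the figures quoted in the text.
TRANSISTORS = 790_000_000   # transistors on the chip
DIE_AREA_MM2 = 341.0        # die area in square millimeters

density_per_mm2 = TRANSISTORS / DIE_AREA_MM2
print(f"about {density_per_mm2 / 1e6:.2f} million transistors per mm^2")
```

This works out to roughly 2.3 million transistors per square millimeter.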
Here is a picture of the Power 6 chip and a typical module for its mounting.
IBM Cooling Technology
While most chip manufacturers target commodity computers that cannot be fitted with expensive cooling, IBM is targeting the mainframe community.
The IBM Power 6 CPU is generally placed in water-cooled units.
The copper tubing feeds cold water to cooling units in direct contact with the CPU chips. Each CPU chip is laid out so as to avoid “hot spots”.
Some engineers are using the warm water from the computer to heat buildings.
Cooling a Faster Single-Core CPU
We have seen how IBM does it: it provides a costly water cooling system. Could Intel and AMD duplicate this design in a market that will not bear the expense and complexity of a water cooling system?
Figure: Akasa Copper Heatsink (left) and Mugen 2 Cooler (right)
Here are two options for air cooling of a commercial CPU chip. A Google search for “Computer Cooling Radiators” shows a brisk market in water cooling units for commodity CPU chips.
Making a Faster Single-Core CPU
The standard way is to make a more sophisticated pipeline. This requires more transistors per chip, but that is not a problem.
In the pipeline figure, the code is executed top to bottom. Each instruction is handled by a distinct stage of the pipeline, so the CPU is executing five instructions at once.
BUT: More transistors mean more power, thus more heat.
We can also raise the CPU clock rate, but this causes more heat as well.
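To make “five instructions at once” concrete, the sketch below prints an idealized pipeline diagram. It assumes the classic five stages IF, ID, EX, MEM and WB used by Patterson & Hennessy, one stage per clock cycle, and no stalls or hazards; the helper names are hypothetical.

```python
# Idealized five-stage pipeline: one stage per cycle, no stalls or hazards (assumed).
STAGES = ["IF", "ID", "EX", "MEM", "WB"]


def pipeline_table(num_instructions: int) -> list[list[str]]:
    """Row i lists the stage instruction i occupies in each clock cycle ('' if none)."""
    cycles = num_instructions + len(STAGES) - 1
    table = []
    for i in range(num_instructions):
        row = []
        for c in range(cycles):
            s = c - i  # instruction i enters the first stage in cycle i
            row.append(STAGES[s] if 0 <= s < len(STAGES) else "")
        table.append(row)
    return table


def busy_stages(table: list[list[str]], cycle: int) -> int:
    """Number of instructions occupying some stage during `cycle` (0-based)."""
    return sum(1 for row in table if row[cycle])


table = pipeline_table(5)
for i, row in enumerate(table):  # code read top to bottom, one instruction per row
    print(f"I{i + 1}: " + " ".join(f"{s:>3}" for s in row))
print("instructions in flight during cycle 5:", busy_stages(table, 4))
```

In the fifth cycle every stage holds a different instruction, which is the state described in the text.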
The Power Equation
Each CPU is nothing more than a collection of transistors, acting as switches.
A modern CPU might have about a billion transistors on a single chip. The IBM Power 6 chip has 790 million transistors in an area of 341 mm².
The power equation for a single transistor can be written as
$$\text{Power} = K \cdot (\text{Capacitive Load}) \cdot (\text{Voltage})^2 \cdot (\text{Frequency Switched})$$
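To keep the power the same, one should halve the voltage for every four-fold increase in switching frequency: halving the voltage divides (Voltage)² by four, which exactly offsets the factor of four in frequency.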
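The rule above follows directly from the power equation. In the sketch below K and the capacitive load are held at 1 and all quantities are in arbitrary relative units (an assumption for illustration); only the ratios matter.

```python
def dynamic_power(k: float, capacitive_load: float, voltage: float, frequency: float) -> float:
    """Power = K * (Capacitive Load) * (Voltage)^2 * (Frequency Switched)."""
    return k * capacitive_load * voltage ** 2 * frequency


base = dynamic_power(1.0, 1.0, voltage=1.0, frequency=1.0)
# Four times the frequency at the same voltage: four times the power.
faster_same_v = dynamic_power(1.0, 1.0, voltage=1.0, frequency=4.0)
# Four times the frequency at half the voltage: the same power as before.
faster_half_v = dynamic_power(1.0, 1.0, voltage=0.5, frequency=4.0)

print(base, faster_same_v, faster_half_v)
```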
Older chips (Intel 8086) ran at 5.0 volts; the newer ones run at about 1.5 volts.
Such a new chip, at the same clock rate and capacitive load, would dissipate $(1.5/5.0)^2 = (0.3)^2 = 0.09 \approx 10\%$ of the power of the older chip. This would seem to allow a chip that is ten times faster (in clock speed).
However, faster clock rates sometimes demand higher voltages.
Also, higher voltages mean less trouble due to random noise. A random signal of 0.2 volts might disrupt a chip running at 1.0 volts, but not one running at 5.0 volts.
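The 8086 comparison can be checked the same way. The sketch below assumes, as the text does, an equal clock rate and capacitive load, and it deliberately ignores the voltage and noise caveats just discussed, so the clock headroom it prints is an idealized upper bound, not a prediction.

```python
OLD_VOLTS = 5.0   # Intel 8086-era supply voltage
NEW_VOLTS = 1.5   # approximate supply voltage of newer chips

# Power scales with voltage squared when load and frequency are unchanged.
power_ratio = (NEW_VOLTS / OLD_VOLTS) ** 2
# Clock multiple that would bring power back to the old level (power is linear in frequency).
frequency_headroom = 1.0 / power_ratio

print(f"new chip uses {power_ratio:.0%} of the old chip's power")
print(f"same power budget would allow about {frequency_headroom:.1f}x the clock rate")
```

The result, about 9% of the power and roughly an elevenfold clock headroom, is where the "ten times faster" estimate comes from.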
The Intel
The
CPU chip (code named “
in the actual clock rate. The fastest mass-produced chip ran at 3.8 GHz, though some enthusiasts (called “overclockers”) actually ran the chip at 8.0 GHz.
Upon release, this chip was thought to generate about 40% more heat per clock cycle than earlier variants. This gave rise to the name “PresHot”.
The
which was intended to be scaled up eventually to ten gigahertz. The heat
problems could never be handled, and Intel abandoned the architecture.
The
following are adapted from a review of the
· The
· The only way to keep it below 60 Celsius (140 F) was to operate it with the cover off and plenty of ventilation.
· Even equipped with the massive Akasa King Copper heat sink (see a previous slide), the system reached 77 Celsius (171 F) when operating at 3.8 GHz under full load and shut itself down.
Multicore Chips: The Start of a New Line
Rather than continuing to improve single-program performance, many commercial chip manufacturers have adopted a “server mentality”: increase the throughput of a number of programs running concurrently.
We shall study parallel processing later. At that time, we shall note that the difficulty lies in keeping all processors doing productive work.
The division of a single problem among a large number of processors, or the use of a large number of processors for cooperating tasks, is difficult.
Recall that a multicore chip is just a CPU chip with multiple processors.
In a server, especially a large one such as the IBM z/10, there are a large number of independent processes that do not need to intercommunicate. Allocation of processors (cores) to such a job mix is almost trivial.
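Because such processes never need to communicate, spreading them over cores reduces to balancing load. The sketch below uses a simple greedy rule (give the longest remaining job to the least-loaded core); the job names and run times are hypothetical, and this is not a description of how the z/10 actually dispatches work.

```python
import heapq


def assign_jobs(job_times: dict[str, float], num_cores: int) -> dict[int, list[str]]:
    """Greedy balancing: longest job first, each to the currently least-loaded core."""
    heap = [(0.0, core) for core in range(num_cores)]  # (current load, core id)
    heapq.heapify(heap)
    plan: dict[int, list[str]] = {core: [] for core in range(num_cores)}
    for job, seconds in sorted(job_times.items(), key=lambda kv: kv[1], reverse=True):
        load, core = heapq.heappop(heap)  # least-loaded core
        plan[core].append(job)
        heapq.heappush(heap, (load + seconds, core))
    return plan


def core_loads(plan: dict[int, list[str]], job_times: dict[str, float]) -> dict[int, float]:
    """Total run time assigned to each core."""
    return {core: sum(job_times[j] for j in jobs) for core, jobs in plan.items()}


# Hypothetical independent jobs (run times in arbitrary units).
jobs = {"payroll": 4.0, "web": 3.0, "backup": 3.0, "report": 2.0, "mail": 2.0, "batch": 2.0}
plan = assign_jobs(jobs, num_cores=4)
print(plan)
print(core_loads(plan, jobs))
```

Since no job ever waits on another, each core simply runs its own list; nothing beyond this bookkeeping is needed.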
Question: Compare a single processor operating at 4 GHz to a dual-core processor with each core operating at 2 GHz.
The dual-core processor is likely to consume less power, but can it do the same amount of work per unit time as the faster single-core processor?
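One way to frame the answer is Amdahl's law, a standard model not taken from these notes: only the parallelizable fraction of a job benefits from the second core. The sketch assumes both designs need the same number of clock cycles for a given job (same microarchitecture) and that the parallel part splits perfectly between cores; the 8-billion-cycle workload is hypothetical.

```python
def run_time(total_cycles: float, parallel_fraction: float, cores: int, ghz: float) -> float:
    """Seconds to finish a job under Amdahl's law: only the parallel part is split across cores."""
    serial_cycles = (1.0 - parallel_fraction) * total_cycles
    parallel_cycles = parallel_fraction * total_cycles / cores
    return (serial_cycles + parallel_cycles) / (ghz * 1e9)


WORK = 8e9  # hypothetical job: 8 billion clock cycles on either design

for p in (1.0, 0.9, 0.5, 0.0):
    single = run_time(WORK, p, cores=1, ghz=4.0)
    dual = run_time(WORK, p, cores=2, ghz=2.0)
    print(f"parallel fraction {p:.1f}: single core {single:.2f} s, dual core {dual:.2f} s")
```

Only a perfectly parallel job finishes as fast on the dual-core chip; a purely serial job takes twice as long. Two independent programs behave like the fully parallel case, which is why server workloads suit multicore chips.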
Intel’s Multicore Chip Offerings for 2010
For 2010, Intel Corporation has released a new series of multicore processors. Here is an Intel Corp. overview of this series. All of these seem to be quad-core.
Intel’s Rationale
According to Intel, multi-core technology will
· permanently alter the course of computing as we know it,
· provide new levels of energy-efficient performance,
· deliver full parallel execution of multiple software threads, and
· reduce the amount of electrical power needed to do the computations.
The current technology provides for one, two, four, or eight cores in a single processor.
Intel expects soon to offer single processors with several tens of cores, if not one hundred.
This new technology seems to be targeted at the commercial desktop machine, which can “run several demanding modern applications at once”.
At present, there is little hard data on multicore machines. What we have is mostly marketing hype. That might change soon.
References
Slide 1 is the title slide.
Slides 2 and 3 are adaptations of Figures 1.15 and 1.16 found in the textbook Computer Organization and Design, Fourth Edition, by David A. Patterson and John L. Hennessy, Morgan Kaufmann, 2009, ISBN 978-0-12-374493-7.
Slides 4 and 5 were taken from slides 7 and 8 of a presentation by Katherine Yelick, NERSC Division Director (U.C. Berkeley and Lawrence Berkeley National Laboratory). The second slide also appeared in a late 2009 issue of the Communications of the ACM.
Slide 6 is a summary, with few quotes.
Slides 7 and 8 are based on Google searches, especially for “IBM Power 6”, which is the CPU chip for the IBM z/10 and (soon to be released) z/11. While visiting the zSeries manufacturing plant in
the CPU for the z/11 would be run at a speed above 5 GHz.
Slide 7 quotes details on the IBM Power 6 chip from the paper “IBM Power6 Microarchitecture”, IBM Journal Res. & Dev. Vol. 51, No. 6, November 2007.
Slide 9 shows two commercial cooling units for commodity CPUs. The unit on the left is an Akasa Copper Heatsink (www.akasa.com.tw/). The unit on the right is a Mugen 2 Cooler (see http://www.scythe-eu.com/en/products/cpu-cooler/mugen-cpu-cooler.html).
Much of the information on the Intel Prescott chip comes from the Wikipedia article on “Pentium 4”. Some material is taken from a review by Sander Sassen, found at http://www.hardwareanalysis.com/content/article/1693/.
Slide 15 is built on several sources, including:
· the Intel White Paper Extending the World’s Most Popular Processor Architecture;
· the Intel web site http://www.intel.com/multi-core/index.htm