The Power Wall For CPU Chips

What is the Problem?

The problem is illustrated in this figure, taken from Patterson & Hennessy. The
SPECint performance of the hottest chip grew by 52% per year from 1986 to
2002, and then grew only 20% in the next three years (about 6% per year).
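The two rates are related by compound growth. Here is a minimal Python sketch that checks the figures quoted above. It converts a total growth factor into an equivalent annual rate and back. The helper names are illustrative and do not come from the source.

```python
def annual_rate(total_factor, years):
    """Equivalent compound annual growth rate for a total growth factor."""
    return total_factor ** (1.0 / years) - 1.0

def total_factor(annual, years):
    """Total growth factor after `years` years at a compound annual rate."""
    return (1.0 + annual) ** years

# 20% total growth over three years is roughly 6.3% per year.
print(f"{annual_rate(1.20, 3):.1%}")  # 6.3%
# 52% per year sustained over the 16 years from 1986 to 2002
# compounds to a factor of roughly 800.
print(f"{total_factor(0.52, 16):.0f}")
```

The 16-year span assumes that the 52% rate held for every year from 1986 through 2002, as the text states.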

Here is a Clue to the Problem

The problem is now called “the Power Wall”. It is illustrated in this figure,
taken from Patterson & Hennessy.

· The design goal for the late 1990s and early 2000s was to drive the clock
rate up. This was done by adding more transistors to a smaller chip.

· Unfortunately, this increased the power dissipation of the CPU chip
beyond the capacity of inexpensive cooling techniques.

Roadmap for CPU Clock Speed: Circa 2005

This roadmap reflects the best projections available in 2005. By 2015, the
clock speed of the top “hot chip” was expected to be in the 12 – 15 GHz range.

The CPU Clock Speed Roadmap (A Few Revisions Later)

This reflects the practical experience gained with dense chips that were literally
“hot”; they radiated considerable thermal power and were difficult to cool.

Law of Physics: All electrical power consumed by a chip is eventually dissipated as heat.

Summary of What Follows

There are two solutions to the problem of the Power Wall.
We explore each.

1.    The technique taken by IBM for implementation on their large
       z/10 and z/11 servers was to include sophisticated and costly
       cooling technologies.

       This allowed CPU clock rates in excess of 5.0 GHz.
       The commercial products today use 4.67 GHz CPU chips.

2.    The technique taken by commodity processor providers, such as Intel and
       AMD, was to move to a multicore design, in which each CPU chip
       contains a moderate number of components. Due to cost constraints, the
       chips are cooled by simple fans and all–metal heat radiators.

       Each of these components might be called a “CPU” or “processor”,
       except that the design calls for them to be on a single chip, still
       called the CPU. For that reason these units are called “cores”.

CPU chips with multiple processors per chip are called “multicore”.
The term “multiprocessor microprocessor” just does not work.

The IBM Power 6 CPU

This is the CPU used in IBM’s large mainframes. It has 790 million
transistors in a chip of area 341 square millimeters.

In the z/10, the chip runs at 4.67 GHz. Lab prototypes have run at 6.0 GHz.

The Power 595 configuration of the z/10 uses between 16 and 64 of the
Power 6 chips, each running at 5.0 GHz.

Here is a picture of the Power 6 chip and a typical module for its mounting.

IBM Cooling Technology

While most chip manufacturers target commodity computers that cannot be
fitted with expensive cooling, IBM is targeting the mainframe community.

The IBM Power 6 CPU is generally placed in water–cooled units.

The copper tubing feeds cold water to cooling units in direct contact with the
CPU chips. Each CPU chip is laid out to avoid “hot spots”.

Some engineers are using the warm water from the computer to heat buildings.

Cooling a Faster Single–Core CPU

We have seen how IBM does it; they provide a costly water cooling system.

Could Intel and AMD duplicate this design in a market that will not allow
the expense and complexity of a water cooling system?

Akasa Copper Heatsink and Mugen 2 Cooler

Here are two options for air cooling of a commercial CPU chip.

A Google search for “Computer Cooling Radiators” shows a brisk market
in water cooling units for commodity CPU chips.

Making a Faster Single–Core CPU

The standard way is to build a more sophisticated pipeline.
This requires more transistors per chip, but that is not a problem.

In the pipeline diagram, the code is executed from top to bottom. Each instruction
is handled by a distinct stage of the pipeline, so the CPU is executing five
instructions at once.
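The overlap of five instructions can be shown as a table of clock cycles. The Python sketch below assumes the classic five-stage pipeline described by Patterson & Hennessy (IF, ID, EX, MEM, WB). It also assumes an ideal pipeline: one new instruction enters per cycle, with no stalls or hazards. The names `pipeline_table` and `in_flight` are illustrative.

```python
# Classic five-stage pipeline names, as in Patterson & Hennessy.
STAGES = ["IF", "ID", "EX", "MEM", "WB"]

def pipeline_table(num_instructions):
    """One row per instruction, one column per clock cycle.

    Assumes an ideal pipeline: instruction i enters IF at cycle i,
    advances one stage per cycle, and never stalls.
    """
    num_cycles = num_instructions + len(STAGES) - 1
    table = []
    for i in range(num_instructions):
        row = []
        for cycle in range(num_cycles):
            stage = cycle - i  # how many stages instruction i has advanced
            row.append(STAGES[stage] if 0 <= stage < len(STAGES) else "")
        table.append(row)
    return table

def in_flight(table, cycle):
    """Number of instructions occupying some stage during `cycle`."""
    return sum(1 for row in table if row[cycle])

if __name__ == "__main__":
    table = pipeline_table(7)
    for i, row in enumerate(table):
        print(f"I{i + 1}: " + " ".join(f"{s:>3}" for s in row))
    # At the fifth cycle (index 4), every stage holds a different instruction.
    print("instructions in flight at cycle 5:", in_flight(table, 4))
```

A deeper pipeline adds entries to `STAGES`. That raises the number of instructions in flight, and it also raises the transistor count the text warns about.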

BUT: More transistors mean more power, thus more heat.

We can also raise the CPU clock rate, but this causes more heat as well.

The Power Equation

Each CPU is nothing more than a collection of transistors, acting as switches.

A modern CPU might have about a billion transistors on a single chip. The
IBM Power 6 chip has 790 million transistors in an area of 341 mm².

The power equation for a single transistor can be written as
Power = K (Capacitive Load)·(Voltage)²·(Frequency Switched)

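Since power grows with the square of the voltage but only linearly with the
frequency, keeping the power the same requires halving the voltage for every
fourfold increase in clock speed: (1/2)² · 4 = 1.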
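The constant K cancels when two chips are compared, so the power equation is easiest to use as a ratio of new power to old power. This minimal Python sketch checks the halve-the-voltage rule. The function name and its ratio parameters are illustrative.

```python
def relative_power(cap_ratio=1.0, voltage_ratio=1.0, freq_ratio=1.0):
    """Ratio new_power / old_power from Power = K * C * V**2 * f.

    Each argument is new_value / old_value; the constant K cancels out.
    """
    return cap_ratio * voltage_ratio ** 2 * freq_ratio

# Fourfold clock increase with the voltage halved: power unchanged.
print(relative_power(voltage_ratio=0.5, freq_ratio=4.0))  # 1.0
# Fourfold clock increase at the same voltage: four times the power.
print(relative_power(freq_ratio=4.0))  # 4.0
```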

Older chips (Intel 8086) ran at 5.0 volts; the newer ones run at about 1.5 volts.

Such a new chip, with the same capacitive load and the same clock rate, would
dissipate (1.5/5.0)² = (0.3)² = 0.09, or about 10%, of the power of the older chip.
Within the same power budget, this would seem to allow a chip that is about ten
times faster (in clock speed).
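The arithmetic can be checked in a few lines of Python. The sketch assumes that the capacitive load is unchanged and that power is held at the older chip's level. Under those assumptions, the permitted clock multiplier is the inverse of the voltage-squared ratio. The function names are illustrative.

```python
def power_ratio(v_new, v_old):
    """Power of the new chip relative to the old one at equal C and f."""
    return (v_new / v_old) ** 2

def max_freq_multiplier(v_new, v_old):
    """Clock multiplier that restores the old power at equal C.

    From C * V_new**2 * f_new = C * V_old**2 * f_old.
    """
    return 1.0 / power_ratio(v_new, v_old)

print(f"{power_ratio(1.5, 5.0):.2f}")          # 0.09
print(f"{max_freq_multiplier(1.5, 5.0):.1f}")  # 11.1
```

The result of about 11× is where the "about ten times faster" figure comes from. The paragraphs that follow explain why real chips do not achieve it.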

However, faster clock rates sometimes demand higher voltages.

Also, higher voltages mean less trouble due to random noise. A random
signal of 0.2 volts might disrupt a chip running at 1.0 volts, but not 5.0 volts.

The Intel Prescott: The End of the Line

The Pentium 4 chip code named “Prescott” by Intel appears to mark the high point
in actual clock rate. The fastest mass–produced version ran at 3.8 GHz, though
some enthusiasts (called “overclockers”) actually ran the chip at 8.0 GHz.

Upon release, this chip was thought to generate about 40% more heat per
clock cycle than earlier variants. This gave rise to the nickname “PresHot”.

The Prescott was a later model in the architecture that Intel called “NetBurst”,
which was intended to be scaled up eventually to ten gigahertz. The heat
problems could never be solved, and Intel abandoned the architecture.

The following are adapted from a review of the Prescott by Sander Sassen.

· The Prescott idled at 50 degrees Celsius (122 degrees Fahrenheit)

· The only way to keep it below 60 Celsius (140 F) was to operate it
with the cover off and plenty of ventilation.

· Even equipped with the massive Akasa King Copper heat sink (see a
previous slide), the system reached 77 Celsius (171 F) when operating
at 3.8 GHz under full load and shut itself down.

Multicore Chips: The Start of a New Line

Rather than continuing to improve single–program performance, many
commercial chip manufacturers have adopted a “server mentality”: increasing
the throughput of a number of programs running concurrently.

We shall study parallel processing later. At that time, we shall note that the
difficulty lies in keeping all processors doing productive work.

The division of a single problem among a large number of processors, or the
use of a large number of processors for cooperating tasks, is difficult.

Recall that a multicore chip is just a CPU chip with multiple processors.

In a server, especially a large one such as the IBM z/10, there are a large number
of independent processes that do not need to intercommunicate. Allocation of
processors (cores) to such a job mix is almost trivial.
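To see why allocating independent jobs is easy, consider a simple policy called greedy list scheduling. Each job goes to whichever core currently has the least work. The Python sketch below is only an illustration of such a policy. The job run times are hypothetical, and real operating system schedulers (including those on IBM mainframes) are considerably more involved.

```python
import heapq

def assign_jobs(job_times, num_cores):
    """Greedy list scheduling of independent jobs onto cores.

    Each job goes to the core with the smallest total assigned time so far.
    Returns (core -> list of job indices, core -> total assigned time).
    """
    heap = [(0, core) for core in range(num_cores)]  # (load, core id)
    heapq.heapify(heap)
    assignment = {core: [] for core in range(num_cores)}
    for job, t in enumerate(job_times):
        load, core = heapq.heappop(heap)  # least-loaded core
        assignment[core].append(job)
        heapq.heappush(heap, (load + t, core))
    loads = {core: sum(job_times[j] for j in jobs)
             for core, jobs in assignment.items()}
    return assignment, loads

# Hypothetical run times for seven independent jobs on a quad-core chip.
assignment, loads = assign_jobs([4, 3, 3, 2, 2, 1, 1], 4)
print(loads)  # {0: 4, 1: 4, 2: 4, 3: 4}
```

Because the jobs never communicate, no core waits on another. Balancing the load is the only concern, and in this example all four cores finish at the same time.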

Question: Compare a single processor operating at 4 GHz to a
dual core processor with each core operating at 2 GHz.

The dual core processor is likely to consume less power, but can it do
the same amount of work per unit time as the faster single core processor?
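One way to frame the question is with a simple model, which is essentially Amdahl's law. It assumes that both designs complete the same number of instructions per clock cycle. It also assumes that only a fraction `parallel_fraction` of the work can be split evenly across the cores, while the rest runs on one core. The function and parameter names are illustrative, and the model addresses only work per unit time, not power.

```python
def run_time(work_gcycles, freq_ghz, cores=1, parallel_fraction=1.0):
    """Seconds to finish `work_gcycles` billion clock cycles of work.

    Assumes identical work per cycle at every clock rate, and that the
    parallel fraction splits perfectly across `cores` (Amdahl's law).
    """
    serial = work_gcycles * (1.0 - parallel_fraction) / freq_ghz
    parallel = work_gcycles * parallel_fraction / (freq_ghz * cores)
    return serial + parallel

work = 8  # billion cycles, an arbitrary example workload
print(run_time(work, 4.0))                                    # 2.0
print(run_time(work, 2.0, cores=2))                           # 2.0
print(run_time(work, 2.0, cores=2, parallel_fraction=0.5))    # 3.0
print(run_time(work, 2.0, cores=2, parallel_fraction=0.0))    # 4.0
```

Under these assumptions, the dual-core chip matches the single 4 GHz core only when all of the work can be divided between the cores. This is the case for a server running many independent jobs. For a single program that cannot be split, the dual-core chip takes twice as long.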

Intel’s Multicore Chip Offerings for 2010

For 2010, Intel Corporation has released a new series of multicore processors.
Here is an Intel Corp. overview of this series.

All of these seem to be quad–core.

Intel’s Rationale

According to Intel, the multi–core technology will

· permanently alter the course of computing as we know it,

· provide new levels of energy efficient performance,

· deliver full parallel execution of multiple software threads, and

· reduce the amount of electrical power needed to do the computations.

The current technology provides for one, two, four, or eight cores in
a single processor.

Intel expects soon to offer single processors with several tens of cores,
possibly as many as one hundred.

This new technology seems to be targeted at the commercial desktop machine,
which can “run several demanding modern applications at once”.

At present, there is little hard data on multicore machines.
What we have is mostly marketing hype. That might change soon.

References

Slide 1 is the title slide.

Slides 2 and 3 are adaptations of Figures 1.15 and 1.16 found in the textbook
Computer Organization and Design, Fourth Edition, by David A. Patterson
and John L. Hennessy, Morgan Kaufmann, 2009, ISBN 978–0–12–374493–7.

Slides 4 and 5 were taken from slides 7 and 8 of a presentation by Katherine
Yelick, NERSC Division Director (U.C. Berkeley and Lawrence Berkeley
National Laboratory). The second slide also appeared in a late 2009 issue of
the Communications of the ACM.

Slide 6 is a summary, with few quotes.

Slides 7 and 8 are based on Google searches, especially for “IBM Power 6”,
which is the CPU chip for the IBM z/10 and (soon to be released) z/11. While
visiting the zSeries manufacturing plant in Poughkeepsie, NY, I was told that
the CPU for the z/11 would be run at a speed above 5 GHz.

Slide 7 quotes details on the IBM Power 6 chip from the paper “IBM Power6
Microarchitecture”, IBM Journal Res. & Dev. Vol. 51, No. 6, November 2007.

Slide 9 shows two commercial cooling units for commodity CPUs.
The unit at the left is an Akasa Copper Heatsink. (www.akasa.com.tw/)
The unit on the right is a Mugen 2 Cooler (see
http://www.scythe-eu.com/en/products/cpu-cooler/mugen-cpu-cooler.html).

Much of the information on the Intel Prescott chip comes from the Wikipedia
article on “Pentium 4”. Some material is taken from a review by Sander Sassen,
found at http://www.hardwareanalysis.com/content/article/1693/.

Slide 15 is built on several sources, including:

· the Intel White Paper “Extending the World’s Most Popular
Processor Architecture”, and

· the Intel web site http://www.intel.com/multi-core/index.htm.