Functional Units of a Modern Computer
We begin this lecture by repeating a figure from a previous lecture.
Logically speaking, a computer has four components.
Connecting the Components
Early schemes for connecting the components could be very messy.
The backplane of a PDP–10, circa 1970
Another Motherboard
Copper Traces on a PCB
These traces, which do the job of wires, are the real advantage of a PCB (printed circuit board). All modern busses are just collections of traces, aluminum or copper.
Types of Busses
There are a number of ways to classify computer busses.
Here we focus on one distinction: synchronous and asynchronous.
A bus transaction is an exchange of data or control information.
In simple transactions, a source asserts signals onto a bus, and a destination copies the values into some sort of buffer memory.
More complex transactions might comprise a number of simple transactions.
A memory read transaction might include the following:
1. Assert the memory address on the address lines.
2. Assert READ on the control lines.
3. Memory places data on the data lines.
The distinction between simple and complex transactions is not important.
The term “transaction” covers a variety of possible exchanges.
A synchronous bus is one in which there is a clock signal, used to coordinate bus transactions. Naturally, it has a line (or trace) dedicated to transmitting the clock.
An asynchronous bus is one without a clock signal.
It often uses control signals, such as REQ and ACK, to coordinate transactions.
Clock Terminology
Clock frequencies are specified in hertz, or cycles per second. Common units include kHz (thousand cycles per second), MHz (million cycles per second), and GHz (billion cycles per second).
The inverse of clock frequency is the clock period, or clock cycle time.
Consider a 2 GHz clock. The frequency is 2.0·10^9 per second, written more precisely as 2.0·10^9 sec^-1.
The period is 1.0 / (2.0·10^9 sec^-1) = 0.5·10^-9 sec = 0.5 nanosecond = 500 picoseconds.
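The period calculation above can be checked with a few lines of Python. This is only a sketch; the function name clock_period_seconds is made up for this illustration.

```python
def clock_period_seconds(frequency_hz: float) -> float:
    """Return the clock period (cycle time) in seconds for a frequency in hertz."""
    if frequency_hz <= 0:
        raise ValueError("frequency must be positive")
    return 1.0 / frequency_hz


period = clock_period_seconds(2.0e9)   # the 2 GHz clock from the text
print(f"{period * 1e9:.1f} ns")        # period expressed in nanoseconds
print(f"{period * 1e12:.0f} ps")       # period expressed in picoseconds
```

For the 2 GHz clock this reports half a nanosecond, the same value as 500 picoseconds.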
Clock signals are described with standard terminology.
The clock may be said to “tick” on every rising edge.
The Clock: System and Bus
A modern computer is a synchronous sequential machine.
A synchronous machine is one in which the actions are coordinated by a central clock signal, either directly or indirectly.
A sequential machine is one that has memory and in which the results of a computation depend on both the input and the contents of memory.
Each computer has one “master clock”, probably not called by that name.
All other clock signals are derived from this master clock.
We can have a frequency divider used to produce a slower clock. For example, a 2 GHz (2000 MHz) clock signal can be converted by a divide–by–four circuit into a 500 MHz clock signal.
There are also frequency doublers, used to produce a faster clock. A 1 GHz clock signal can be doubled to produce a 2 GHz clock signal.
Bus clocks are often in the 150 MHz to 250 MHz range.
Bus Clock Signals
The bus clock frequency is usually a fraction of the system clock frequency.
In a typical example, a 2 GHz system clock can be passed through a divide–by–8 circuit to produce a 250 MHz bus clock signal.
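The divider and doubler arithmetic from the last two slides can be sketched as follows. The helper names divide_clock and multiply_clock are invented for this example; real dividers and multipliers are hardware circuits, and only the arithmetic is modeled here.

```python
GHZ = 1.0e9
MHZ = 1.0e6


def divide_clock(source_hz: float, divisor: int) -> float:
    """Frequency produced by a divide-by-N circuit."""
    if divisor < 1:
        raise ValueError("divisor must be at least 1")
    return source_hz / divisor


def multiply_clock(source_hz: float, factor: int) -> float:
    """Frequency produced by a multiplier (factor 2 models a doubler)."""
    if factor < 1:
        raise ValueError("factor must be at least 1")
    return source_hz * factor


print(divide_clock(2 * GHZ, 4) / MHZ, "MHz")    # divide-by-four example
print(divide_clock(2 * GHZ, 8) / MHZ, "MHz")    # system clock to bus clock
print(multiply_clock(1 * GHZ, 2) / GHZ, "GHz")  # doubler example
```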
System clock signals are generally represented as square waves, as we have done in previous slides in this lecture.
Bus clock signals are commonly represented as trapezoidal waves, to emphasize the fact that signal levels do not change instantaneously.
There is no abstract theory behind this representation; it is just common.
Signal Assertion Levels
A control signal is said to be asserted when it causes some bus action to take place.
A signal may be asserted high or asserted low.
The terms high and low refer to the voltages used to represent logical signals.
In the original TTL logic family, the following nominal levels are standard.
    5.0 volts    logic 1
    0.0 volts    logic 0
In TTL, a signal is asserted high when it is driven from 0 volts to 5 volts in order to achieve its effect.
In TTL, a signal is asserted low when it is driven from 5 volts to 0 volts in order to achieve its effect.
A control signal that is active high is simply named; for example, X is active high.
There are several notations used to indicate that a signal is active low. Here are several common notations.
Example: Memory Control Signals
Consider a memory unit that is controlled by the CPU.
There are two signals to the memory. One activates it and one indicates the type of transaction commanded by the CPU.
SELECT#    This active–low signal selects the memory chip and activates its circuitry. When high, the chip is inactive.
R/W#       This is a two–valued signal, indicating which of the two possible transactions is commanded by the CPU.
The symbol with the #, W#, indicates the action commanded when the signal is asserted low.
The symbol without the #, R, indicates the action when the signal is high.
If R/W# = 0, the CPU is writing to memory.
If R/W# = 1, the CPU is reading from memory.
Memory Control
The effect of the control signals on memory is summarized in this truth table.
Select# | R/W# | Action |
1 | 0 | Memory contents are not changed. |
1 | 1 | Memory contents are not changed. |
0 | 0 | CPU writes data to the memory. |
0 | 1 | CPU reads data from the memory. |
There is a standard shorthand for the above table.
Select# | R/W# | Action |
1 | d | Memory contents are not changed or accessed. |
0 | 0 | CPU writes data to the memory. |
0 | 1 | CPU reads data from the memory. |
The “d” in the top row indicates that when Select# = 1, the value of the R/W# signal has no effect on what the memory does. It stands for “don’t care”.
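The truth table can also be written as a small function. This is an illustrative sketch, not a model of any particular memory chip; the name memory_action and the strings it returns are chosen for this example.

```python
def memory_action(select_n: int, rw_n: int) -> str:
    """Decode the two active-low control signals Select# and R/W#."""
    if select_n not in (0, 1) or rw_n not in (0, 1):
        raise ValueError("signals must be 0 or 1")
    if select_n == 1:
        # Chip not selected: R/W# is a "don't care" and is never examined.
        return "idle"
    return "read" if rw_n == 1 else "write"


# Print all four rows of the full truth table.
for select_n in (1, 0):
    for rw_n in (0, 1):
        print(select_n, rw_n, memory_action(select_n, rw_n))
```

Note that the first test returns before rw_n is examined, which is exactly what the “d” entry in the shorthand table expresses.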
Synchronous Control Signals
Here is a depiction of two control signals, each of which is asserted low.
Signal A# is asserted during the second half of clock pulse T1.
Signal B# is asserted during the first half of clock pulse T1 and is shown as asserted for the duration of this diagram.
Signal A# is somewhat unusual in that it is asserted at the same time as a clock transition.
Often signals are asserted before a transition so that the signal is stable at the time of the transition.
Remember the Garden Hose
At the time scales of human experience, propagation of electric signals appears to be instantaneous.
In the world of CPU design, when timings are routinely expressed in tens of picoseconds (10 picoseconds is 1/100 of a nanosecond, or 10^-11 second), times to charge a control line become significant.
Consider a bus with a length of 10 centimeters, about 4 inches.
The speed of light is about 30 centimeters per nanosecond.
The speed of signal transmission on a bus is about 20 centimeters per nanosecond.
The signal will take about 0.5 nanoseconds, or 500 picoseconds, to traverse the length of the bus.
The signal on the bus is said to be stable after 500 picoseconds.
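The traversal time is just length divided by speed. Here is a short sketch using the figures above; the function name and the default speed argument are choices made for this example.

```python
def traverse_time_ns(length_cm: float, speed_cm_per_ns: float = 20.0) -> float:
    """Time in nanoseconds for a signal to travel the length of a bus."""
    if length_cm < 0 or speed_cm_per_ns <= 0:
        raise ValueError("length must be non-negative and speed positive")
    return length_cm / speed_cm_per_ns


t_ns = traverse_time_ns(10.0)                 # the 10 cm bus from the text
print(f"{t_ns} ns = {t_ns * 1000:.0f} ps")
```

At 20 centimeters per nanosecond, the 10 centimeter bus gives 0.5 ns, which is one full period of the 2 GHz clock considered earlier.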
Collections of Signals
We now have notations for control signals.
What about other signals, such as addresses and data?
The interest in representing such non–control signals is not their actual value, but whether or not they are valid.
For example, we do not want to know the actual values on address lines, but whether or not there is a valid address being asserted.
The above notation might be used for a 32–bit address line.
A valid address is asserted on these lines in the middle of the first half of T1. It remains valid until the middle of the first half of T2.
A Synchronous Bus Timing Diagram
This is a bus read. The sequence: the address becomes valid, RD# is asserted, and later the data become valid.
An Asynchronous Bus Timing Diagram
Here, the important point is the interplay of the Master Synchronization (MSYN#) and Slave Synchronization (SSYN#) signals.
The sequence:
1. The address becomes valid; MREQ# and RD# are asserted low.
2. MSYN# is asserted low, causing the memory to react.
3. Data become valid and SSYN# is asserted low.
4. When SSYN# goes high, the data are no longer valid.
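The four steps of the asynchronous read can be traced in software. This is a rough sketch under stated assumptions: the memory is modeled as a Python dictionary, each signal is 1 when not asserted and 0 when asserted low, and in step 4 all four control signals are released together. The slide states only that SSYN# goes high; releasing the other three at the same moment is a simplification made here.

```python
def async_read_trace(address: int, memory: dict) -> list:
    """Return (event, signal levels, data) tuples for one asynchronous read."""
    # Active-low signals: 1 = not asserted, 0 = asserted.
    signals = {"MREQ#": 1, "RD#": 1, "MSYN#": 1, "SSYN#": 1}
    trace = []

    # Step 1: address valid; MREQ# and RD# asserted low.
    signals["MREQ#"] = 0
    signals["RD#"] = 0
    trace.append(("address valid", dict(signals), None))

    # Step 2: the master asserts MSYN#, causing the memory to react.
    signals["MSYN#"] = 0
    trace.append(("MSYN# asserted", dict(signals), None))

    # Step 3: the memory places the data on the bus and asserts SSYN#.
    data = memory[address]
    signals["SSYN#"] = 0
    trace.append(("data valid", dict(signals), data))

    # Step 4 (simplified): all signals return high; data no longer valid.
    for name in signals:
        signals[name] = 1
    trace.append(("bus released", dict(signals), None))
    return trace


for event, levels, data in async_read_trace(0x200, {0x200: 0x42}):
    print(f"{event:15} {levels} data={data}")
```

No clock appears anywhere in the trace; each step is triggered by the signal change before it.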
More on Commercial Memory Modules
Here again is a picture of a commercial memory module.
Here is a picture of the slot into which three memory modules may be placed.
Inserting the DIMM
The next figure shows how to insert the Dual In–Line Memory Module.
The previous memory module had eight memory chips. This module has eight memory chips and a control chip in the middle.
We shall discuss this organization of a module into multiple chips in a later lecture.
Connecting the CPU to the Motherboard
Early CPU chips in the Intel line, such as the 8086 and 8088, could be packaged as DIP (dual in–line package) modules. Here is a picture of the Intel 8086.
Later Pentium CPU chips required more pins, hence different packaging.
Cooling the Central Processor Unit
A modern CPU generates a lot of heat. This has given rise to an industry devoted to building cooling devices. Here is a picture of one such device.
Note that the CPU is attached to the top of this radiator device. We do not see the fan.
The Fetch–Execute Cycle
Again we mention the fetch–execute cycle, which is the key feature of a stored program computer.
The instruction is fetched from memory and then executed.
The cycle can be described differently, with more phases, as useful to support discussions, but it is really the same.
Discussions of the MIPS–32 architecture divide the process into five phases:
1. Fetch instruction from memory.
2. Decode the instruction and read two registers.
3. Execute the operation or calculate an address.
4. Access an operand in data memory or write back a result.
5. For LW only, write the results of the memory read into a register.
LW (Load Word) loads a 32–bit register from memory.
This key cycle focuses on the IR (Instruction Register), which is a part of the CU (Control Unit) of the CPU.
Any binary data copied into the IR are automatically interpreted as an instruction.
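The basic two–phase cycle can be illustrated with a toy machine. Everything about this machine is invented for the sketch and corresponds to neither MIPS nor Intel hardware: each instruction is a 16–bit word whose high byte is an opcode (0x00 HALT, 0x01 LOAD immediate, 0x02 ADD immediate) and whose low byte is an operand, and there is a single accumulator.

```python
def run(memory: list, max_steps: int = 100) -> int:
    """Run the toy fetch-execute loop; return the accumulator at HALT."""
    pc = 0     # program counter
    acc = 0    # accumulator
    for _ in range(max_steps):
        ir = memory[pc]                 # FETCH: copy the word at PC into the IR
        pc += 1
        opcode = ir >> 8                # DECODE: high byte is the opcode,
        operand = ir & 0xFF             #         low byte is the operand
        if opcode == 0x00:              # EXECUTE
            return acc                  # HALT
        elif opcode == 0x01:
            acc = operand               # LOAD immediate
        elif opcode == 0x02:
            acc += operand              # ADD immediate
        else:
            raise ValueError(f"word {ir:#06x} is not a valid instruction")
    raise RuntimeError("step limit reached without HALT")


program = [0x0123, 0x0225, 0x0000]      # LOAD 0x23; ADD 0x25; HALT
print(hex(run(program)))
```

Whatever word reaches the IR is treated as an instruction. A data word such as 0x7FFF is rejected here only because this toy machine has no opcode 0x7F.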
What Is Executed?
Each executable instruction is represented in memory as a binary number.
Your instructor prefers to use hexadecimal representation of these numbers, as he finds it easier to read. Consider the following sequence.
B8 23 01 05
25 00 8B D8
03 D8 8B CB
2B C8 2B C0
EB EE
This is represented in standard format, with four bytes (eight hexadecimal digits) per line. At this point, we cannot interpret this sequence.
The sequence might represent four floating–point numbers, followed by a 16–bit integer. It might represent a strange sequence of character data.
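To see that the same 18 bytes admit several readings, here is a sketch using Python's standard struct module. The choice of little–endian byte order and IEEE 754 single precision for the floating–point reading is an assumption made for this example, as is the Latin–1 character code.

```python
import struct

raw = bytes.fromhex("B8230105 25008BD8 03D88BCB 2BC82BC0 EBEE")

# Reading 1: four 32-bit floats followed by one unsigned 16-bit integer.
floats = struct.unpack("<4f", raw[:16])
(last_word,) = struct.unpack("<H", raw[16:])
print(floats)
print(hex(last_word))

# Reading 2: eighteen characters in the Latin-1 code.
print(repr(raw.decode("latin-1")))
```

Neither reading is wrong. Nothing in the bytes themselves says which interpretation was intended.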
If this sequence is read into the IR (Instruction Register), one instruction at a time rather than all at once, it will be interpreted as a sequence of assembly language instructions.
The process of reverse engineering a sequence of absolute binary code into a sequence of assembly language instructions is called disassembly.
The Disassembly of the Sample Code
Here is the disassembled object code.
B82301  MOV AX, 0123  ; Move value 0x0123 to AX
052500  ADD AX, 0025  ; Add value 0x0025 to AX
8BD8    MOV BX, AX    ; Copy contents of AX into BX
03D8    ADD BX, AX    ; Add contents of AX to BX
8BCB    MOV CX, BX    ; Copy contents of BX into CX
2BC8    SUB CX, AX    ; Subtract AX from CX
2BC0    SUB AX, AX    ; Subtract AX from AX, clearing it
EBEE    JMP 100       ; Go to address 100
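The last instruction is worth a closer look. Opcode EB is a short jump whose second byte is a signed 8–bit displacement, measured from the address of the following instruction. The sketch below checks that displacement EE really leads to address 100 (hexadecimal). It assumes the code is loaded starting at address 0x100, which the listing suggests but does not state; the instruction lengths are read off the left column.

```python
def short_jump_target(jmp_address: int, rel8: int) -> int:
    """Target of a 2-byte short jump (opcode EB) located at jmp_address."""
    # Sign-extend the 8-bit displacement: 0x80..0xFF are negative values.
    displacement = rel8 - 0x100 if rel8 >= 0x80 else rel8
    # The displacement is relative to the instruction after the jump.
    return jmp_address + 2 + displacement


LOAD_ADDRESS = 0x100                         # assumed start of the code
lengths = [3, 3, 2, 2, 2, 2, 2, 2]           # bytes per instruction, in order
jmp_address = LOAD_ADDRESS + sum(lengths[:-1])
target = short_jump_target(jmp_address, 0xEE)
print(f"JMP at {jmp_address:X}, target {target:X}")
```

Under that assumption the jump sits at address 110 and returns to address 100, so the eight instructions form an endless loop.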
We shall discuss disassembly a bit more in future lectures. Basically, this is the main function of a good debugger.
Terminology:
The name “assembly language” refers to the sequence of human readable code as seen in the middle column above. Assembly language instruction names, called “mnemonics”, are supposed to be suggestive to the programmer.
The term “object code” refers to the sequence of binary numbers, as shown in the left column and on the previous slide. Such code is often represented in hexadecimal form for easier reading.
The Assembly–Link–Execute Cycle
Here is the common sequence for processing assembly language programs.
The process starts with a file in the form of readable text.
The assembler produces listing files (to be read by humans) as well as the object file, containing object code, which is almost ready for execution.
The linker connects the object code of the user program with object code from any external procedures used by that code. It produces a file ready for execution.
The loader places the executable file into memory and starts execution.
The process for higher level languages is similar, except for the fact that the source code is said to be compiled. Compilers for IBM mainframe languages tend to emit assembly language that is then assembled. Other compilers emit object code.
The Microsoft Common Language Infrastructure
Microsoft introduced its .NET architecture on February 13, 2002.
One of the goals of this architecture was to make it easier to call programs written in one of Microsoft’s languages from those written in another.
One common option was to call C++ code from Visual Basic code.
The VB code would handle the user interface.
The C++ code would handle the computations.
In order to make the languages interoperable, certain standards were required.
One change gave rise to the CIL (Common Intermediate Language), originally called MSIL (Microsoft Intermediate Language).
The process of producing an executable program in this architecture is a bit indirect.
1. The compiler converts the source code into CIL.
2. The CIL, a form of bytecode, is packaged into a file called a .NET assembly.
3. When the program is run, the just–in–time (JIT) compiler of the Common Language Runtime translates the CIL in the assembly into native machine code, which is then executed.
There are many more features to the CLI (Common Language Infrastructure) than are mentioned in this slide.
The Memory Component
Here are a few introductory remarks on memory. It will be studied in greater detail in two future sets of lectures.
Modern computers tend to have memory that is byte addressable. A byte corresponds to the size required to store the older character codes. It is a grouping of 8 binary bits.
The CPU has two registers dedicated to handling memory.
The MAR (Memory Address Register) holds the address being accessed.
The MBR (Memory Buffer Register) holds the data being written to the memory or being read from the memory. This is sometimes called the Memory Data Register.
Modern memory must allow for addressable entities that have larger sizes. Common lengths are 16, 32, and 64 bits, corresponding to 2, 4, and 8 bytes.
How do we address these larger items? Here, we assume standard memory alignments for multiple byte items, as these yield more efficient programs.
The standard:
    every byte has its own address.
    16–bit (2 byte) words have addresses that are multiples of 2.
    32–bit (4 byte) words have addresses that are multiples of 4.
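The alignment rule reduces to a remainder test. Here is a minimal sketch; the function name is_aligned is chosen for this example.

```python
def is_aligned(address: int, size_bytes: int) -> bool:
    """True if an item of size_bytes bytes may start at this address."""
    if size_bytes not in (1, 2, 4, 8):
        raise ValueError("size must be 1, 2, 4, or 8 bytes")
    return address % size_bytes == 0


for address in (0x200, 0x201, 0x202, 0x204):
    print(hex(address),
          "16-bit:", is_aligned(address, 2),
          "32-bit:", is_aligned(address, 4))
```

Every address is aligned for a single byte, every even address for a 16–bit word, and every fourth address for a 32–bit word.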
Consider a 64 KB Memory
This is considered only because it allows for very easy illustrations.
64 K = 2^16 = 65,536.
Byte addresses run the range from 0 through 65,535.
16–bit word addresses: 0, 2, 4, …, 65,534.
32–bit word addresses: 0, 4, 8, …, 65,532.
The 16–bit word at address 65,534 contains the bytes at addresses 65,534 and 65,535.
The 32–bit word at address 65,532 contains bytes at addresses 65,532, 65,533, 65,534, and 65,535.
In general, a 16–bit word at address N comprises two bytes, at address N and address N + 1.
In general, a 32–bit word at address N comprises four bytes, at addresses N, N + 1, N + 2, and N + 3.
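These address ranges can be generated rather than listed by hand. The sketch below assumes the 64 KB memory of this slide; the names MEMORY_BYTES and word_byte_addresses are invented for the example.

```python
MEMORY_BYTES = 2 ** 16      # 64 K = 65,536 bytes


def word_byte_addresses(address: int, size_bytes: int) -> list:
    """Byte addresses occupied by an aligned word starting at address."""
    if address % size_bytes != 0:
        raise ValueError("address is not aligned for this word size")
    if address < 0 or address + size_bytes > MEMORY_BYTES:
        raise ValueError("word does not fit in memory")
    return list(range(address, address + size_bytes))


last_16 = range(0, MEMORY_BYTES, 2)[-1]     # highest 16-bit word address
last_32 = range(0, MEMORY_BYTES, 4)[-1]     # highest 32-bit word address
print(last_16, word_byte_addresses(last_16, 2))
print(last_32, word_byte_addresses(last_32, 4))
```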
Which bytes go where? This has given rise to the discussion called “Big–Endian vs. Little–Endian”.
Example: A 32–bit Value
Suppose that the 32–bit register EAX contains the value 0x01020304, which has decimal representation 16,909,060. The value in EAX can be represented as:
This figure uses the bit numbering scheme common for Intel computers.
Suppose the instruction MOV Z, EAX is executed. What is placed into address Z?
This depends on whether the computer is a big–endian or little–endian device.
(The Pentium is a little–endian device, but we shall examine both options.)
The little–endian design stores the “little end” (the least significant byte, here 0x04) at address Z.
The big–endian design stores the “big end” (the most significant byte, here 0x01) at address Z.
The 32–Bit Example Continued
The value that goes into each address is a one–byte number, comprising two hexadecimal digits. The hexadecimal digits are not reversed.
Address Big-Endian Little-Endian
Z 01 04
Z + 1 02 03
Z + 2 03 02
Z + 3 04 01
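The table can be reproduced with Python's built–in int.to_bytes, which takes the byte order as an argument. This is offered only as a way to check the table, not as a description of how the hardware performs the store.

```python
value = 0x01020304

big = value.to_bytes(4, "big")          # most significant byte first
little = value.to_bytes(4, "little")    # least significant byte first

print("Address  Big-Endian  Little-Endian")
for offset in range(4):
    print(f"Z + {offset}    {big[offset]:02X}          {little[offset]:02X}")
```

Each byte keeps its two hexadecimal digits in order; only the order of the bytes differs between the two columns.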
Here is a graphic representation of the same process.
Example: “Core Dump” at Address 0x200
Consider this memory map.
Address | 0x200 | 0x201 | 0x202 | 0x203 |
Contents | 02 | 04 | 06 | 08 |
What is the 32–bit value stored at address 0x200?
Big–endian: The number is 0x02040608. Its decimal value is
2·256^3 + 4·256^2 + 6·256^1 + 8·1 = 33,818,120.
Little–endian: The number is 0x08060402. Its decimal value is
8·256^3 + 6·256^2 + 4·256^1 + 2·1 = 134,611,970.
NOTE: Read the bytes backwards, not the hexadecimal digits.
What is the 16–bit value stored at address 0x200?
Big–endian: The number is 0x0204. Its decimal value is
2·256 + 4·1 = 516.
Little–endian: The number is 0x0402. Its decimal value is
4·256 + 2·1 = 1,026.
The 16–bit value at 0x200 comprises bytes at addresses 0x200 and 0x201 only. The values at 0x202 and 0x203 are not relevant.
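The four readings of this memory map can be verified with Python's built–in int.from_bytes. The variable name dump is chosen for this sketch.

```python
dump = bytes([0x02, 0x04, 0x06, 0x08])      # contents of 0x200 through 0x203

# 32-bit value at 0x200: all four bytes take part.
big_32 = int.from_bytes(dump, "big")
little_32 = int.from_bytes(dump, "little")

# 16-bit value at 0x200: only the bytes at 0x200 and 0x201 take part.
big_16 = int.from_bytes(dump[:2], "big")
little_16 = int.from_bytes(dump[:2], "little")

print(f"32-bit big-endian:    {big_32:#010x} = {big_32:,}")
print(f"32-bit little-endian: {little_32:#010x} = {little_32:,}")
print(f"16-bit big-endian:    {big_16:#06x} = {big_16:,}")
print(f"16-bit little-endian: {little_16:#06x} = {little_16:,}")
```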
I/O Ports
Computer memory is best considered as an array of addressable bytes.
I/O ports are addressed similarly, but are best considered as an array of addressable registers, each of fixed width.
Each I/O device is connected to the computer bus through a number of registers called I/O ports. There are three main types used.
Data register       For an input device, this contains data to be sent to the CPU. For an output device, this contains data to be output.
Control register    This allows the CPU to control operation of the I/O device. This register is best considered as a set of Boolean flags, each controlling one option: can the device interrupt the CPU, etc.
Status register     This allows the CPU to detect the status of the device. Is the power on? Is there a paper jam?
The control register is an output register; the CPU writes information to it.
The status register is an input register; the CPU reads status information from it.
Some simple devices combine the two into a Control & Status register.
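The idea of a register as a set of Boolean flags can be sketched with Python's enum.IntFlag. The device, the flag names, and the bit positions below are all hypothetical; real devices define their own register layouts.

```python
from enum import IntFlag


class Control(IntFlag):
    """Hypothetical control register bits, written by the CPU."""
    INTERRUPT_ENABLE = 0x01
    START = 0x02


class Status(IntFlag):
    """Hypothetical status register bits, read by the CPU."""
    POWER_ON = 0x01
    PAPER_JAM = 0x02


class Printer:
    """Toy device exposing one control port and one status port."""

    def __init__(self) -> None:
        self.control = Control(0)
        self.status = Status.POWER_ON

    def write_control(self, flags: Control) -> None:
        self.control = Control(flags)       # CPU -> device

    def read_status(self) -> Status:
        return self.status                  # device -> CPU


printer = Printer()
printer.write_control(Control.INTERRUPT_ENABLE | Control.START)
status = printer.read_status()
print("power on: ", Status.POWER_ON in status)
print("paper jam:", Status.PAPER_JAM in status)
```

Each flag occupies one bit, so several options can be set with a single write to the control port and several conditions can be tested after a single read of the status port.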