Chapter 7C – The Intel Pentium System
The Pentium line
is the culmination of Intel’s IA–32 architecture and, possibly, the beginning
of the IA–64 architecture. In this sub–chapter, we examine its design in
more detail. We shall note some features
that have been created to support modern operating
systems. In order to understand these
features, we need to discuss the operating system
requirements briefly.
As noted several
times previously, a modern computer is a complete system. The major
components of this system include the compiler (used to translate programs written for the
computer),
the motherboard (with its busses and mounting slots), the CPU, and the
operating system.
We begin this
chapter with a basic definition from the study of operating systems. This is the
necessarily vague definition of a process. The term “program” can mean many things,
among
which is the physical listing on paper of the high–level or assembly language
text. When a
program is executing, it acquires a number of assets (memory, registers, etc.)
and becomes a
process. Basically, a process is a
program under execution, along with all the non–sharable
assets required to support that execution.
There is more to the definition than that, but this
imprecise notion will support our discussion below. In memory management, the goal is to
consider two processes logically executing on the computer at the same time,
though probably
executing sequentially, one after another in turn. The assets of each process (including the
binary image of the executing code) must be protected from the other process.
IA–32 Memory Segmentation
Early computers ran
programs mostly written by the users, with only a small amount of system
software to support the user programs.
The logical model of execution was that of a single
program under execution; the user program would call system routines as needed.
As the operating
system evolved, the execution model became more one of parallel processes,
perhaps executing sequentially but better considered logically as executing in
parallel. The
system processes were best seen as separate from the user process, requiring
protection from
accidental corruption by the user program.
Such protection requires some sort of hardware
support for memory management.
Basic to the
idea of memory management is the definition of ranges of the address space that
a particular process can access. In many
modern computers, the address space is divided into
logical segments. For each logical
segment that a process can access, the hardware defines the
starting address of that segment, the size of the segment, and access rights
owned by the process.
The later IA–32
implementations, including all Pentium models, supported three memory
segmentation modes to facilitate memory management by the operating system. These are
real mode, protected mode, and virtual
8086 mode [R018, page 586; R019, page 36].
Real mode implements the programming model of the
Intel 8086 almost exactly, with a few
extra features to allow switching to other modes. This mode, when available, can be used to
run MS–DOS programs that require direct access to system memory and hardware
devices.
Programs running in real mode can cause the operating system to crash. If a real–mode program is
one of many running on the computer at the time, all of the other programs crash as well. There
is no protection among programs; the computer just stops responding to input. In this mode, the
segment registers are used purely to calculate addresses; see the previous
sub–chapter.
There is one
real–mode data structure that requires discussion, as it will lead to a more
general data structure used in protected mode.
This is the IVT (Interrupt Vector Table),
which is used to activate software associated with a specific I/O device. We shall discuss
I/O management, including I/O interrupts and I/O vectors, in chapter 9 of this text. Here is
a brief description of an input I/O operation to show the significance of the IVT.
1. An I/O device signals the CPU that it is ready to transfer data by asserting a
   signal called an “interrupt”. This signal is asserted low.
2. When the CPU is ready to handle the transfer, it sends out a signal, called an
   “acknowledge”, to initiate the I/O process itself.
3. As a first step in the I/O process, the device that asserted the interrupt identifies
   itself to the CPU. It does this by sending a vector, which is merely an address used to
   select an entry in the IVT. The IVT should be considered as an array of entries,
   each of which contains the address of the program to handle a specific I/O device.
4. The ISR (Interrupt Service Routine) appropriate for the device begins to execute.
There
is more to the story than this, but we have hit the essential idea of a single
IVT to
manage the input and output for all executing programs.
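The four steps above amount to a table lookup followed by a call. The short Python model below is only a sketch of that idea: the handler functions, the vector numbers 9 and 14, and the `dispatch` helper are hypothetical names chosen for illustration, and a real IVT holds the addresses of routines rather than Python functions.

```python
# Toy model of interrupt dispatch through an Interrupt Vector Table (IVT).
# Handler names and vector numbers are illustrative assumptions.

def keyboard_isr():
    """Stand-in for the ISR that services a keyboard."""
    return "keyboard serviced"


def disk_isr():
    """Stand-in for the ISR that services a disk controller."""
    return "disk serviced"


# The IVT is modeled as an array indexed by the vector number.
IVT = [None] * 256
IVT[9] = keyboard_isr    # hypothetical vector for the keyboard
IVT[14] = disk_isr       # hypothetical vector for the disk


def dispatch(vector):
    """Steps 3 and 4: use the vector to select an entry, then run that ISR."""
    handler = IVT[vector]
    if handler is None:
        raise LookupError(f"no ISR installed for vector {vector}")
    return handler()


print(dispatch(9))
print(dispatch(14))
```

Because every device goes through the same table, replacing one entry is enough to change how a device is serviced for every program on the machine.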
Protected mode is the native state of the
Pentium processor, in which all instructions and
features are available. Programs are
given separate memory areas called segments,
and the
processor uses the segment registers and associated other registers to manage
access to memory,
so that no program can reference memory outside its assigned area. The operating system is
thus protected from intrusion by user programs.
The operating system operates in a privileged
state in which it can change the segment registers in order to access any area
of memory.
Virtual 8086 mode is a sub–mode of protected
mode. In this mode, many of the
protection
features of protected mode are active.
The processor can execute real–mode software in a safe
multitasking environment. If a virtual
8086 mode process crashes or attempts to access memory
in areas reserved for other processes or the operating system, it can be
terminated without
adversely affecting any other process.
In protected
mode, and its sub–mode virtual 8086 mode, each process is assigned a separate
session, which allows for proper
management of its resources. Part of
that management involves
creation of a separate IVT for that session, allowing the Pentium to allocate
different I/O services
to separate sessions. More importantly, it provides protection against software crashes.
Windows XP can
manage multiple separate virtual 8086 sessions at the same time, possibly in
parallel with execution of programs in protected mode. This idea has been extended successfully
to that of a virtual machine, in
which a number of programs can execute on a given machine
without affecting other programs in any way.
The large IBM mainframes, including the
System z9 and System z10, call this idea an LPAR
(Logical Partition).
One key logical
component of the virtual machine idea has yet to be discussed; this is called
virtual memory. This will be discussed fully in chapter 12 of
this textbook. There is one
important point that can be restated even at this early stage. The program generates addresses
that are modified by the operating system into actual addresses into physical
memory. As a
result, the operating system controls access to real physical memory and can
use that control to
enhance security.
In protected
mode, as well as in its sub–mode virtual 8086, addresses to physical memory are
generated in a number of steps. Three terms related to this process are worth mentioning: the
effective address, linear address, and physical address. With the exception of the term
“physical address”, which references
the actual address in the computer memory, the terms
are somewhat contrived. In the IA–32
designs, the effective address is
the address generated by
the program before modification by the memory management unit. The rules for generation of
this address are specified by the syntax of the assembly language.
The effective
address is passed to the memory management unit, first to the segmentation
unit,
which accesses the segment registers to create the linear address and then accesses a number of
other MMU (Memory Management Unit) registers to determine the
validity of the address value
and the validity of the access: read, write, execute, etc. The translation from linear address to
physical address is controlled by the virtual memory system, the topic of a
later chapter.
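The work of the segmentation unit can be sketched in a few lines of Python. This is a simplified model under stated assumptions, not the Pentium's actual descriptor format: the `Segment` class, its field names, the rights string, and the `to_linear` function are all invented for illustration. It shows the two checks described above (validity of the address and validity of the access) followed by the formation of the linear address.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    base: int      # starting linear address of the segment
    size: int      # size of the segment in bytes
    rights: str    # permitted accesses: "r" read, "w" write, "x" execute


def to_linear(segment, effective, access):
    """Validate the access and the address, then form the linear address."""
    if access not in segment.rights:
        raise PermissionError(f"'{access}' access not permitted")
    if not 0 <= effective < segment.size:
        raise ValueError("effective address lies outside the segment")
    return segment.base + effective


# A hypothetical 4 KB read/write data segment.
data = Segment(base=0x0040_0000, size=0x1000, rights="rw")
print(hex(to_linear(data, 0x0123, "r")))   # 0x400123
```

An attempt to execute code from this segment, or to reach an offset of 0x1000 or more, raises an exception in the model, which corresponds to the hardware refusing the access.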
Cache Memory
Here is another
topic that we continue to mention in passing with a promise to discuss it more
fully at a later time. For the moment,
we shall describe the advantages of such a system, and
again postpone a full discussion for another chapter.
Each Pentium
product is packaged with a cache memory system designed to optimize memory
access in a system that is referencing both data memory and instruction memory
at the same
time. We should note that it is the
general practice to keep both data and executable instructions
in the same main memory, and differentiate the two only in the cache. This is one example of
the common use of cache: cause the memory system to act as if it has a certain
desirable attribute
without having to alter the large main memory to actually have that attribute.
At this time,
let’s state a few facts. Because it is
smaller, the Level 1 cache (L1 cache) is faster
than the L2 cache. Because it is smaller
than main memory, the L2 cache is faster than the main
memory. This multilevel cache applies
the same trick twice. In the above
example, the 32 KB
L1 cache combined with the 1 MB L2 cache acts as if it were a single cache
memory with an
access time only slightly slower than the actual L1 cache. Then the combination of cache
memory and the main memory acts as if it were a single large memory (2 GB) with
an access
time only slightly slower than the cache memory. Now we have a memory that functionally is
both large and fast, while no single element actually has both attributes.
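The claim that the combination is "only slightly slower" than its fastest level can be checked with a weighted average applied twice. The sketch below uses assumed figures that do not come from the text: hit rates of 90 percent at each cache level and access times of 1, 10, and 80 nanoseconds. The function name `effective_access_time` is likewise an illustrative choice.

```python
def effective_access_time(hit_rate, t_fast_ns, t_slow_ns):
    """Average access time of a fast level backed by a slower level."""
    return hit_rate * t_fast_ns + (1.0 - hit_rate) * t_slow_ns


# Assumed figures, chosen only for illustration.
L1_TIME_NS, L2_TIME_NS, MEMORY_TIME_NS = 1.0, 10.0, 80.0
L1_HIT_RATE, L2_HIT_RATE = 0.90, 0.90

# First application of the trick: the L2 cache in front of main memory.
behind_l1 = effective_access_time(L2_HIT_RATE, L2_TIME_NS, MEMORY_TIME_NS)

# Second application: the L1 cache in front of everything behind it.
overall = effective_access_time(L1_HIT_RATE, L1_TIME_NS, behind_l1)

print(f"behind L1: {behind_l1:.1f} ns, overall: {overall:.1f} ns")
```

With these assumed numbers the memory behind the L1 cache appears to take about 17 ns, and the whole hierarchy about 2.6 ns, even though main memory alone takes 80 ns.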
Recent main
memory designs have added a write buffer, allowing for short bursts of memory
writes at a rate much higher than the main memory can sustain. Suppose that the main memory
has a cycle time of 80 nanoseconds, requiring an 80–nanosecond interval between two
independent writes to memory. A fast write buffer might be able to accept eight memory writes
in that time span, sending each to main memory at the slower rate that the memory can sustain.
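The benefit of the write buffer can be put in numbers using the figures in the paragraph above. The following sketch assumes that the buffer is large enough to hold the entire burst; the function name `burst_times` and that capacity assumption are not from the text.

```python
MEMORY_CYCLE_NS = 80     # main memory cycle time, from the example above
WRITES_PER_CYCLE = 8     # writes the buffer can accept in one memory cycle


def burst_times(n_writes):
    """Return (time the CPU spends issuing, time memory spends draining)."""
    accept_ns = MEMORY_CYCLE_NS // WRITES_PER_CYCLE   # 10 ns per buffered write
    cpu_busy_ns = n_writes * accept_ns
    memory_busy_ns = n_writes * MEMORY_CYCLE_NS
    return cpu_busy_ns, memory_busy_ns


cpu_ns, memory_ns = burst_times(8)
print(f"CPU is free after {cpu_ns} ns; memory finishes after {memory_ns} ns")
```

For a burst of eight writes the CPU is released after 80 ns, while the memory needs 640 ns to complete them. Without the buffer the CPU would have to wait for the full 640 ns.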
We mention in
passing that some multi–core Pentium designs have three levels of cache
memory. Here is a picture of the Intel
Core i7 die. This CPU has four cores, each
with its
L1 and L2 caches. In addition, there is
a Level 3 cache that is shared by the four cores. This
design illustrates two advantages of placing cache memory on the CPU chip.
1. Significantly faster execution, as on–chip accesses are faster than accesses to another chip.
2. Better power management, because memory uses less power per unit area than does the CPU logic.
Register Sets
Almost all
modern computers divide storage devices into three classes: registers, memory,
and
external storage (such as disks and magnetic tape). In earlier times, the register set (also
called
the register file) was distinctly associated with the CPU, while main memory
was obviously
separate from the CPU. Now that designs
have on–chip cache memory, the distinction between
register memory and other memory is purely logical. We shall see that difference when we study
a few fragments of IA–32 assembly language.
One of the first
steps in designing a CPU is the determination of the number and naming of the
registers to be associated with the CPU.
There are many general approaches, and then there is
the approach seen on the Pentium. The
design used in all IA–32 and some IA–64 designs is a
reflection of the original Intel 8080 register set.
Register set of the Intel 8080 and 8086
The original
Intel 8080 and Intel 8086 designs date from a time when single accumulator
machines were still common. As mentioned
in a previous chapter, it is quite possible to design
a CPU with only one general–purpose register; this is called the accumulator. The provision of
seven general–purpose registers in the Intel 8080 design was a step up from
existing practice.
We have already discussed the evolution of the register set design leading up to and through
the IA–32 line. The Intel 8080 had 8–bit registers; the Intel 8086, 80186, and 80286 each have
16–bit registers; and the members of the IA–32 line (beginning with the Intel 80386) all have
32–bit registers. The Intel 8080 set the trend; newer models might have additional registers, but
each one had to have the original register set in some fashion.
Register set of the Intel 80386
The Intel 80386
was the first member of the IA–32 design line.
It is a convenient example for
purposes of discussion. In fact, it is
common practice for introductory courses in Pentium
assembly language to focus almost exclusively on the Intel 80386 Instruction
Set Architecture
(register set and assembly language instructions), and to treat the full
Pentium ISA as an
extension. Here is a figure showing the
Intel 80386 register set.
EAX:
This is the general–purpose register used for arithmetic and logical
operations. Recall
from the previous chapter that parts of this register can be separately
accessed. This division is
seen also in the EBX, ECX, and EDX registers; the code can reference BX, BH,
CX, CL, etc.
This register
has an implied role in both multiplication and division. In addition, the A register
(AL in the Intel 80386 usage) is involved in all data transfers to and from the
I/O ports.
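The way AX, AH, and AL overlay EAX can be modeled with masks and shifts. This Python sketch is only an illustration of the layout; the function names `sub_registers` and `write_al` are invented here and are not part of any assembler.

```python
def sub_registers(eax):
    """Return the (AX, AH, AL) views of a 32-bit EAX value."""
    eax &= 0xFFFFFFFF
    ax = eax & 0xFFFF          # low-order 16 bits of EAX
    ah = (ax >> 8) & 0xFF      # high-order byte of AX
    al = ax & 0xFF             # low-order byte of AX
    return ax, ah, al


def write_al(eax, value):
    """Model a write to AL: only the low-order 8 bits of EAX change."""
    return (eax & 0xFFFFFF00) | (value & 0xFF)


ax, ah, al = sub_registers(0x12345678)
print(hex(ax), hex(ah), hex(al))          # 0x5678 0x56 0x78
print(hex(write_al(0x12345678, 0xFF)))    # 0x123456ff
```

The same layout applies to EBX, ECX, and EDX with their own 16–bit and 8–bit names.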
Here are some
examples of IA–32 assembly language involving the EAX register. Note that
the assembly language syntax denotes hexadecimal numbers by appending an “H”.
MOV EAX, 1234H   ; Set the value of EAX to hexadecimal 1234.
                 ; The format is destination, source.
CMP AL, 'Q'      ; Compare the value in AL (the low order 8
                 ; bits of EAX) to 81, the ASCII code for 'Q'.
MOV ZZ, EAX      ; Copy the value in EAX to memory location ZZ.
DIV BX           ; Divide the 32-bit value in DX:AX by the
                 ; 16-bit value in BX. The quotient is placed
                 ; in AX and the remainder in DX.
Here is an
example showing the use of the AX register (AH and AL) in character input.
MOV AH, 1        ; Set AH to 1 to indicate the desired I/O
                 ; function: read a character from standard input.
INT 21H          ; Software interrupt to invoke an Operating System
                 ; function; here the value 21H (33 in decimal)
                 ; indicates a standard I/O call.
MOV XX, AL       ; On return from the function call, register AL
                 ; contains the ASCII code for a single character.
                 ; Store this in memory location XX.
EBX:
This can be used as a general–purpose register, but was originally
designed to be
the base register, holding the address of the base of a data structure. The easiest example of
such a data structure is a singly dimensioned array.
LEA EBX, ARR     ; The LEA instruction loads the address
                 ; associated with a label and not the value
                 ; stored at that location.
MOV AX, [EBX]    ; Using EBX as a memory pointer, get the 16-bit
                 ; value at that address and load it into AX.
ADD EAX, EBX     ; Add the 32-bit value in EBX to that in EAX.
ECX:
This can be used as a general–purpose register, but it is often used in
its special role as
a counter register for loops or bit shifting operations. This code fragment illustrates its use.
     MOV EAX, 0      ; Clear the accumulator EAX
     MOV ECX, 100    ; Set the count to 100 for 100 repetitions
TOP: ADD EAX, ECX    ; Add the count value to EAX
     LOOP TOP        ; Decrement ECX, test for zero, and jump
                     ; back to TOP if non-zero.
At the end of
this loop, EAX contains the value 5,050.
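The final value can be confirmed by stepping through the loop in a small Python model. This is a sketch of the instruction behaviour as described in the comments above; the function name `sum_with_loop` is an illustrative choice, and the registers are modeled as 32-bit unsigned values.

```python
MASK32 = 0xFFFFFFFF


def sum_with_loop(count):
    """Model of: MOV EAX,0 / MOV ECX,count / TOP: ADD EAX,ECX / LOOP TOP."""
    eax = 0
    ecx = count & MASK32
    while True:
        eax = (eax + ecx) & MASK32    # ADD EAX, ECX
        ecx = (ecx - 1) & MASK32      # LOOP: decrement ECX first ...
        if ecx == 0:                  # ... then fall through when it is zero
            break
    return eax


print(sum_with_loop(100))   # 5050
```

Because LOOP decrements before it tests, the body always runs at least once. The model is therefore not called with a count of zero, which would wrap ECX around and run the body 2^32 times.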
EDX:
This can be used as a general–purpose register, but it can also support
input and output
data transfers. It also plays a special
part in executing integer multiplication and division. In
general, the product of two 8–bit integers is a 16–bit integer, the product of
two 16–bit integers
is a 32–bit integer, and the product of two 32–bit integers is a 64–bit
integer. Remember that
register AL is the 8 low–order bits of EAX, and AX is the 16 low–order bits.
One item that is
important to note is that the EAX register, or whatever part is used in the MUL
operation, is implicitly a part of the operation, without being called out
explicitly.
MOV AL, 5H       ; Move decimal 5 to AL
MOV BL, 10H      ; Decimal 16 to BL
MUL BL           ; AX gets the 16–bit number 0050H (80 decimal).
                 ; The instruction says multiply the value in
                 ; AL by that in BL and put the product in AX.
                 ; Only BL is explicitly mentioned.
The 16–bit
multiplications use AX as a 16–bit register.
For compatibility with the Intel 8086,
the full 32 bits of EAX are not used to hold the product. Rather the two 16–bit registers AX and
DX are viewed as forming a 32–bit pair and serve to store it. Again, note that the 16–bit version
of the MUL automatically takes AX as holding one of the integers to be
multiplied.
MOV AX, 6000H
MOV BX, 4000H
MUL BX           ; DX:AX = 1800 0000H.
The 32–bit
implementation of multiplication uses EAX to hold one of the integers to be
multiplied and uses the register pair EDX:EAX to hold the product. Here is an example.
MOV EAX, 12345H
MOV EBX, 10000H
MUL EBX          ; Form the product EAX times EBX.
                 ; EDX:EAX = 0000 0001 2345 0000H
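All three forms of MUL follow one rule: the product is twice as wide as the operands and is split into a high half and a low half. The Python sketch below models that rule; the function name `mul` and its return convention are assumptions made for illustration. The high half corresponds to AH, DX, or EDX, and the low half to AL, AX, or EAX, depending on the operand width.

```python
def mul(bits, a, b):
    """Unsigned multiply of two `bits`-wide values; return (high, low) halves."""
    mask = (1 << bits) - 1
    product = (a & mask) * (b & mask)
    return (product >> bits) & mask, product & mask


# The three examples from the text.
print([hex(x) for x in mul(8, 0x05, 0x10)])          # AH:AL   -> 0x0, 0x50
print([hex(x) for x in mul(16, 0x6000, 0x4000)])     # DX:AX   -> 0x1800, 0x0
print([hex(x) for x in mul(32, 0x12345, 0x10000)])   # EDX:EAX -> 0x1, 0x23450000
```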
Register DX can
also hold the 16–bit port number of an I/O port.
MOV DX, 0200H
IN  AL, DX       ; Get a byte from the port at address 200H.
The ESI and EDI registers are used as source and destination addresses for
string and array
operations. These are sometimes called “Extended Source Index” and “Extended Destination
Index”. They facilitate high–speed
memory transfers.
The EBP register is used to support the
call stack for high level language procedure calls. We
shall discuss this more in the next chapter, in which we discuss subroutines. Briefly put, it
functions much like a stack pointer, but does not point to the top of the
stack.
The next two registers,
EIP and ESP, are 32–bit versions of the older 16–bit counterparts. We
discuss these here, and then introduce the 16–bit variants by discussing
segments again.
The EIP is the 32–bit Instruction Pointer, so called because it points to the instruction
likely to
be executed next. Many other
architectures call this register by the more traditional, if less
appropriate, name “Program Counter”. Jump and branch instructions, unconditional or
conditional (if the condition is true), achieve their effect by forcing a
target address into the EIP.
The ESP is the 32–bit Stack Pointer, used to hold the address of the top of the stack. This
register is not commonly accessed directly except as a part of a procedure
call. We must make
the point here that the stack is not always treated as an ADT (Abstract Data Type) with PUSH as
the only way to place an item on the stack.
We shall investigate direct manipulation of the ESP
in more detail when we discuss allocation of dynamic memory for local
variables.
The EFLAGS register holds a collection of
at most 32 Boolean flags with various meanings.
The flags are divided into two broad categories: control flags and status
flags. Control flags
can cause the CPU to break after every instruction (good for debugging),
interrupt execution on
detecting arithmetic overflow, enter protected mode, or enter virtual 8086
mode.
The status flags
reflect the state of the execution and include CF (the carry flag, indicating a
carry out of the last arithmetic operation), OF (the overflow flag, indicating
that the result is
too large or too small to be represented), SF (the sign flag, indicating that
the last result was
negative), ZF (the zero flag, indicating that the last result was zero), and
several more.
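The meaning of the four status flags named above can be made concrete with a model of an 8–bit addition. This is a sketch only: the function name `add8_flags` and the dictionary it returns are illustrative, and the operand size of 8 bits was chosen to keep the numbers small.

```python
def add8_flags(a, b):
    """Add two 8-bit values; return the 8-bit result and the status flags."""
    a &= 0xFF
    b &= 0xFF
    total = a + b
    result = total & 0xFF
    flags = {
        "CF": total > 0xFF,                 # carry out of bit 7
        # Signed overflow: both operands differ in sign from the result.
        "OF": ((a ^ result) & (b ^ result) & 0x80) != 0,
        "SF": (result & 0x80) != 0,         # sign bit of the result is set
        "ZF": result == 0,                  # result is zero
    }
    return result, flags


print(add8_flags(0x7F, 0x01))   # 127 + 1: signed overflow, no carry
print(add8_flags(0xFF, 0x01))   # 255 + 1: carry out, result is zero
```

The two calls show that CF and OF are independent: the first addition overflows as a signed operation without producing a carry, and the second produces a carry without signed overflow.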
There are six 16–bit segment registers (CS, SS, DS, ES, FS, and GS). The first four are holdovers
from the 16–bit Intel 8086; FS and GS were added with the Intel 80386. As discussed
in the previous chapter, the original segment registers were used to allow
generation of 20–bit addresses from 16–bit registers. The two standard register pairings are
CS:IP (Code Segment and Instruction Pointer) and SS:SP (Stack Segment and Stack
Pointer).
In the more modern Pentium usage, these segment registers are used in
combination with
descriptor registers to support memory management.
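As a reminder of the older 8086 usage, the following Python sketch forms a 20–bit address from a segment register and a 16–bit offset such as CS:IP. The function name `real_mode_address` is invented for illustration, and the final mask models the wrap–around of the original 8086, which had only 20 address lines.

```python
def real_mode_address(segment, offset):
    """8086-style address: shift the segment left 4 bits and add the offset."""
    linear = ((segment & 0xFFFF) << 4) + (offset & 0xFFFF)
    return linear & 0xFFFFF     # keep 20 bits, as on the original 8086


print(hex(real_mode_address(0x1234, 0x0010)))   # 0x12350
print(hex(real_mode_address(0xFFFF, 0x0010)))   # 0x0 (wraps around)
```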
Register set of the Pentium
In addition to
the above register set, the Pentium architecture calls for six 64–bit registers
to
support memory management (CSDCR, SSDCR, DSDCR, ESDCR, FSDCR, and GSDCR), the
TR (Task Register), the IDTR (Interrupt Descriptor Table Register), two
descriptor registers
(GDTR – Global Descriptor Table Register and LDTR – Local Descriptor Table
Register) and a
few more. Then there are the sixteen
specialized data registers (MM0 – MM7 for the multimedia
instructions, and FP0 – FP7 for floating point arithmetic). Newer versions of the architecture
almost certainly contain still more registers.
Especially in
the case of memory management, it is important to remember that the Operating
System functions by setting up and then using some fairly elaborate data
structures. Each of
these structures has a base address stored in one of these registers for fast
access.
Addressing Modes
We now discuss
some of the addressing modes used in the Pentium architecture. We shall use
two–argument instructions to illustrate this, as that is easier. The simplest mode is also the
fastest to execute. This is the data register direct mode. Here is an example.
MOV EAX, EBX     ; Copy the value from EBX into EAX.
                 ; The value in EBX is not changed.
Immediate Mode
In this mode,
one of the arguments is the value to be used.
Here are some examples, a few
of which are not valid.
MOV EBX, 1234H   ; EBX gets the value 01234H.
MOV 123H, EBX    ; NOT VALID. The destination of any move must
                 ; be a register or a memory location, never
                 ; a constant.
MOV AL, 1234H    ; NOT VALID. Only one byte can be moved
                 ; into an 8-bit register. This is 2 bytes.
Memory Direct Mode
In this mode,
one of the arguments is a memory location.
Here are some examples.
MOV ECX, [1234H] ; Move the value at address 1234H to ECX.
                 ; Not the same as MOV ECX, 1234H, which
                 ; would load the constant itself.
MOV EDX, WORD1   ; Move the contents of address WORD1 to EDX.
MOV WORD2, EDX   ; Move the contents of the 32–bit register
                 ; EDX to memory location WORD2.
MOV X, Y         ; NOT VALID. Memory to memory moves are
                 ; not allowed in this architecture.
Address Register Direct
Here, the
address associated with a label is loaded into a register. Here are two examples,
one of which is memory direct and one of which is address register direct.
LEA EBX, VAR1    ; Load the address associated with VAR1
                 ; into register EBX.
                 ; This is address register direct.
MOV EBX, VAR1    ; Load the value at address VAR1 into EBX.
                 ; This is memory direct addressing.
Register Indirect
Here the register
contains the address of the argument.
Here are some examples.
MOV EAX, [EBX]   ; EBX contains the address of a value
                 ; to be moved to EAX.
Note that the
following two code fragments do the same thing to EAX. Only the first
fragment changes the value in EBX.
LEA EBX, VAR1    ; Load the address VAR1 into EBX
MOV EAX, [EBX]   ; Load the value at that address into EAX

MOV EAX, VAR1    ; Load the value at address VAR1 into EAX
Direct Offset Addressing
Suppose that there is an array of 16–bit entries at address AR16. We may employ direct offset
in two ways to access members of the array. Here are two examples.
MOV CX, AR16+2   ; Load the 16–bit value at address
                 ; AR16 + 2 into CX. For a zero-based
                 ; array, this might be AR16[1].
MOV CX, AR16[2]  ; Does the same thing. Computes the
                 ; address (AR16 + 2).
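The point that the offset counts bytes rather than array elements can be seen in a small memory model. In the Python sketch below, the address 4 chosen for AR16, the array contents, and the helper `load16` are all assumptions made for illustration; the values are stored low–order byte first, as the IA–32 does.

```python
memory = bytearray(16)      # a tiny byte-addressable memory
AR16 = 4                    # hypothetical address of the array

# Store three 16-bit entries, low-order byte first.
for i, value in enumerate([0x1111, 0x2222, 0x3333]):
    address = AR16 + 2 * i
    memory[address:address + 2] = value.to_bytes(2, "little")


def load16(address):
    """Model of MOV CX, <address>: fetch the 16-bit value stored there."""
    return int.from_bytes(memory[address:address + 2], "little")


print(hex(load16(AR16 + 2)))   # 0x2222, the entry with index 1
```

An offset of 2 bytes skips exactly one 16–bit entry, so the element with index i lies at address AR16 + 2*i.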
Base Index Addressing
This mode
combines a base register with an index register to form an address.
MOV EAX, [EBP+ESI]   ; Add the contents of ESI to that of EBP
                     ; to form the source address. Move the
                     ; 32–bit value at that address to EAX.
Index Register with Displacement
There are two equivalent versions of this, because the assembler interprets the
second form as the first. Each uses an address, here TABLE, as a base address.
MOV EAX, [TABLE+EBP+ESI]   ; Add the contents of ESI to that
                           ; of EBP to form an offset, then add
                           ; that to the address associated
                           ; with the label TABLE to get the
                           ; address of the source.
MOV EAX, TABLE[EBP+ESI]    ; Interpreted as the same as above.
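The register indirect, base index, and displacement modes above all reduce to one sum. The following Python sketch models that sum; the function name `effective_address` and the sample values for TABLE, EBP, EBX, and ESI are assumptions chosen only for illustration.

```python
def effective_address(base=0, index=0, displacement=0):
    """Sum of base register, index register, and displacement (32 bits)."""
    return (base + index + displacement) & 0xFFFFFFFF


TABLE = 0x1000                       # hypothetical address of the label TABLE
EBX, EBP, ESI = 0x2000, 0x0200, 0x0008   # hypothetical register contents

print(hex(effective_address(base=EBX)))                # [EBX]           -> 0x2000
print(hex(effective_address(base=EBP, index=ESI)))     # [EBP+ESI]       -> 0x208
print(hex(effective_address(EBP, ESI, TABLE)))         # [TABLE+EBP+ESI] -> 0x1208
```

Seen this way, the simpler modes are the general form with one or two of the three terms left at zero.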