The MARIE: A
Simple Computer
Goals for this chapter:
1. Describe the top–level organization of a
modern stored–program computer;
specifically name and describe
the four main components.
2. Describe some of the lower–level components,
such as registers,
that are common to three of the
four main components of the computer.
3. Define an assembly language for this simple
computer and use that assembly
language to investigate the
functioning of a stored program computer.
NOTES:
1. The hypothetical computer designed by our
textbook’s authors is called
the MARIE. See the textbook for the origin of the name.
2. The MARIE has an extremely simple
design. Any computer the student sees or
uses, including the computers in
one’s microwave oven and washing machine,
is more powerful than the MARIE.
3. While a very simple design, the MARIE
illustrates all of the important features
of a modern stored–program
computer. “Simpler” means “easier to
learn”.
Why a Binary
Computer?
The MARIE is typical of all modern computers in that
it is a binary device.
1. All data are stored in binary format, and
2. All arithmetic is performed using
two’s–complement binary conventions.
Why not have decimal format? Just create memory elements that store one of
10 values.
The answer has two parts. Each is valid.
1. It is easy to make binary storage devices and
binary arithmetic devices.
2. There are a number of significant design
challenges in creating ten–state
devices. Mostly these have to do with electronic reliability.
In fact, no modern electronic computer has ever used
true decimal storage and arithmetic.
Early machines, such as the ENIAC, were called decimal
computers. In fact:
1. Each decimal digit was stored in binary
format, and
2. All arithmetic was done in binary format, as
adapted for storage
of individual decimal digits.
Computer
Basics and Organization
The computer has four top–level components.
1. The
CPU (Central Processing Unit)
2. The
Main Memory
3. Input/Output
Devices, including a Hard Disk
4. A
Bus Structure to facilitate communications between the other components.
Major
Components Defined
The system
memory (of which this computer has 512 MB) is used for transient storage of
programs and data. This is accessed much
like an array, with the memory address
serving the function of an array index.
The Input /
Output system (I/O System) is used for the computer to save data and
programs and for it to accept input data and communicate output data.
Technically the hard drive is an I/O
device.
The Central
Processing Unit (CPU) handles execution of the program.
It has four main components:
1. The
ALU (Arithmetic Logic Unit), which
performs all of the arithmetic
and logical operations of the
CPU, including logic tests for branching.
2. The
Control Unit, which causes the CPU
to follow the instructions
found in the assembly language
program being executed.
3. The
register file, which stores data internally in the CPU. There are user
registers and special purpose
registers used by the Control Unit.
4. A
set of internal busses to allow the CPU units to communicate.
A
System Level Bus, which allows the
top–level components to communicate.
The System
Clock
We discussed the system clock when we discussed the basic
flip–flops.
Here are two depictions of the system clock that will
be used in this course.
The top representation is used when discussing the
system bus and the I/O bus.
The bottom representation is used elsewhere.
The
system clock regulates the execution of instructions in the CPU and
synchronizes all of the CPU components to prevent errors due to bad timing.
The System Bus
The system bus
comprises a set of digital lines, each of which carries a signal, power, or
ground. The ground is used to complete
circuits.
If the line carries a signal, that line is set either
to logic 1 (often +5 volts) or logic 0
(often 0 volts). Consider it as
transmitting a Boolean variable, in that the value on
that line can be changed by some control unit as a program is executed.
The ground lines also provide isolation between the
signal lines, which can act as little antennas – either to “broadcast” signals
or receive them from other signal lines.
This is called “cross talk”. It is not
desirable.
High–speed busses are shorter than low–speed
busses. High–speed busses are used for:
1. Connecting
the CPU to main memory,
2. Connecting
the CPU to the graphics system, and
3. Possibly
connecting the CPU to other high–speed devices.
Low–speed busses are used to connect the CPU to I/O
devices, possibly including the
main hard disk.
The
speed of a data bus is determined by the time it takes an electrical signal to
travel its
length. Light travels about 1 foot per
nanosecond; signals travel about 8 inches in that time. A bus operating at 1 GHz cannot be longer
than 8 inches; likely it is shorter.
Notations
Used for a Bus
Here is the way that we would commonly represent a
small bus.
The big “double arrow” notation indicates a bus of a number of
different signals.
Our author calls this a “fat arrow”.
Lines with similar function are grouped together. Their count is denoted with the
“diagonal slash” notation.
From top to bottom, we have
1. Three
data lines D2,
D1, and D0
2. Two address lines A1 and A0
3. The clock signal for the bus F.
Not all busses transmit a clock
signal; the system bus does.
Power and ground lines usually are not shown in this
diagram.
Busses:
Common and Point–to–Point
In general, a design should minimize point–to–point
busses, as they introduce a
number of difficulties into the design.
Shared busses tend to have lower data rates than
point–to–point busses, which are therefore the design choice when the bus
must support a high data rate.
Typical high–rate busses include the memory bus and
the graphics bus. Each of these
is implemented as a point–to–point bus for two reasons:
1. To
maximize the data rate, and
2. Because
there is only one device with which the CPU communicates.
Some high–rate busses are shared busses. An example would be a bus connecting the
ALU to the register file in the CPU. At
any time, at most one of the registers is using
this bus to communicate data to the ALU.
Busses that manage most I/O devices tend to be shared.
Connecting
External Devices to the Computer Bus
External devices include printers, network cards, disk
drives, and the computer keyboard.
Each device must be connected to one of the computer
busses through a device called
an interface, often an “interface card”.
Each device will have software dedicated to controlling
it, called a “device driver”.
Device drivers are often considered part of the
computer operating system, because
they are called only by the operating system.
The main function of the device driver is to translate
between the standard device
control signals used by the operating system and the device–specific control
signals required by the device’s interface card.
The function of the interface card is to present the
data and control signals, properly
formatted, to the device being managed, and accept data back.
From the view of the CPU, each device is represented
as a number of addressable
registers, some containing data and some control information. The interface card
presents these to the actual device. We
shall develop this idea later.
Asynchronous
and Synchronous Busses
One aspect of a bus depends on what assumptions can be
made about the timing of
the devices attached. Can each device be
assumed to work with fixed timing?
Consider a keyboard attached to a common bus. This produces data only when a
user actually presses a key. The timing
of data availability is totally unpredictable.
In cases such as managing most I/O devices, an asynchronous bus is used. This means
that there is no clock signal used to coordinate events.
An asynchronous bus must use specific control signals
to coordinate between the device
producing the data and the device receiving the data. Here is a sample set:
Request    the CPU signals the device that input is required.
Ready      the input device signals the CPU that data are ready to be read.
ACK        the CPU acknowledges that it has received the data.
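
Here is a small C sketch of that Request / Ready / ACK handshake. It is only a
model: the flag variables, the busy–wait loops, and the routine names are my
own, standing in for real control lines and for two independent agents (the
CPU and the device controller).

#include <stdbool.h>

/* Shared handshake flags; on a real bus these are control lines, and the
   two routines below run on two independent agents (CPU and device).      */
static volatile bool request = false;   /* CPU -> device: input is required  */
static volatile bool ready   = false;   /* device -> CPU: data are available */
static volatile bool ack     = false;   /* CPU -> device: data were received */
static volatile int  data_line;         /* the data being transferred        */

/* CPU side: request input, wait until the device signals Ready. */
int cpu_read_input(void) {
    request = true;
    while (!ready) { }          /* wait for the device      */
    int value = data_line;      /* take the data            */
    ack = true;                 /* acknowledge receipt      */
    request = false;
    return value;
}

/* Device side: wait for a Request, post the data, wait for the ACK. */
void device_post_input(int value) {
    while (!request) { }        /* wait until the CPU asks      */
    data_line = value;
    ready = true;
    while (!ack) { }            /* wait for the acknowledgement */
    ready = false;
    ack   = false;
}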
For some devices, such as memory, we may assume
constant timings. Here we have
fewer control signals. The sequence to
read memory is typically simple.
1. The CPU asserts a memory address and the READ
control signal.
2. After a fixed time, the CPU reads the data
from the appropriate memory register.
The data can be assumed to be
present and correct at the specified time.
The Memory
Component
The memory stores the instructions and data for an
executing program.
Memory is characterized by the smallest addressable
unit:
Byte addressable    the smallest unit is an 8–bit byte.
Word addressable    the smallest unit is a word, usually 16 or 32 bits in length.
Most modern computers are byte addressable,
facilitating access to character data.
Logically, computer memory should be considered as an
array.
The index into this array is called the address or “memory address”.
A logical view of such a byte addressable memory might
be written in code as:
Const MemSize = 536870912            // 512 MB = 512·2^20 bytes
byte Memory[MemSize]                 // Indexed 0 … (MemSize – 1)
The CPU has two registers dedicated to handling
memory.
The MAR (Memory Address Register) holds the
address being accessed.
The MBR (Memory Buffer Register) holds the data being written to the
memory or being read from the memory. This is sometimes called the
Memory Data Register.
The
Simplistic Physical View of Memory
I call this the “linear view”, as memory is still
modeled as one large linear array.
The N–bit address selects one of the 2^N entities,
numbered 0 through (2^N – 1).
Read sequence:    First address to MAR; command a READ;
                  then copy the contents of the MBR.
Write sequence:   First address to MAR; data to the MBR;
                  then command a WRITE.
This is logically correct, but difficult to implement
at an acceptable price.
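
The read and write sequences above can be sketched in C, treating the MAR and
MBR as ordinary variables. The array size and the function names are my own;
this models only the logical behavior, not the physical memory.

#include <stdint.h>

#define MEM_SIZE 65536u                 /* a small memory for illustration */

static uint8_t  Memory[MEM_SIZE];       /* indexed 0 … (MEM_SIZE – 1)      */
static uint32_t MAR;                    /* Memory Address Register         */
static uint8_t  MBR;                    /* Memory Buffer Register          */

/* Read sequence: address to the MAR, command a READ, then copy the MBR. */
uint8_t mem_read(uint32_t address) {
    MAR = address;
    MBR = Memory[MAR];                  /* READ  */
    return MBR;
}

/* Write sequence: address to the MAR, data to the MBR, then a WRITE. */
void mem_write(uint32_t address, uint8_t value) {
    MAR = address;
    MBR = value;
    Memory[MAR] = MBR;                  /* WRITE */
}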
Memory
Organization and Addressing
Memory
is based on binary bits. Each bit can
hold one of two values: 0 or 1.
Except
for unusual designs, individual bits in memory are not directly addressable
by the CPU (Central Processing Unit).
The old IBM 1401 could access bits directly.
The
most common memory groupings are as follows:
8 bits     a byte
16 bits    a word (some call this a short word)
32 bits    a longword (some call this a word)
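
For reference, in C these groupings map onto the fixed–width integer types.
The typedef names below are my own, with “word” read as 16 bits, following
this course’s usage.

#include <stdint.h>

typedef uint8_t  byte_t;        /*  8 bits: a byte              */
typedef uint16_t word_t;        /* 16 bits: a word (short word) */
typedef uint32_t longword_t;    /* 32 bits: a longword          */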
The
term “word” is somewhat ambiguous due to multiple definitions. In this course, we refer to “16–bit word”,
“32–bit word”, etc.
In
some computers, a word is the smallest addressable memory unit. Most of these, such as the CDC–6600 (60–bit
words), are now obsolete.
In
a byte–addressable computer (such as the Intel Pentium series), each byte is
addressable individually, although 32–bit words can be directly accessed.
All
computers with byte addressing provide instructions to access both 16–bit words
and
32–bit longwords. The CPU just accesses
two or four bytes at a time.
Memory
Organization and Addressing (Part 2)
Memory
is often described by a notation with the structure (L x W), where
L is the number of addressable units in the memory, and
W is the number of bits in each addressable unit.
The
old CDC–6600 usually had a 256 K x 60 memory.
This was 256 · 1024 = 262,144 words, each of 60 bits. Yes,
this was called a “supercomputer”.
A
modern Pentium might have a memory described as 512 M x 8;
512 · 2^20 = 512 · 1,048,576 = 536,870,912 addressable units, each
with 8 bits.
This would be called a 512 MB
memory.
Main
memory sizes are not quoted in bits.
Memory chip sizes often are quoted in bits,
but could be quoted in numbers of 4–bit elements as well as 8–bit bytes.
Common
notation:   1K = 2^10 = 1,024 (almost never seen these days)
            1M = 2^20 = 1,048,576
            1G = 2^30 = 1,073,741,824
Address
Space and Memory Addressing
N
bits will address 2^N items.
The Pentium has 32–bit addressing and will address 2^32 bytes.
To
address M items, we need N bits, with 2^(N–1) < M ≤ 2^N.
Quite often these days we have M = 2^N exactly (obviously requiring N bits to
address); otherwise we normally just state the number of address bits and
keep the actual memory small enough to fit.
Example:   One of my
recent laptop computers had 384 MB of byte–addressable
memory. How many bits would be required to address
this?
Answer:    This memory has 384 MB and is byte addressable, so it
has to be able
to address 384·2^20 distinct memory cells. 1M = 2^20 = 1,048,576.
Now 384 = 256 + 128 = 2^8 + 2^7,
so 384 MB = 384·2^20 bytes = (2^8 + 2^7)·2^20 bytes = (2^28 + 2^27)
bytes.
If
M = (2^28 + 2^27), it should be obvious that 2^28
< M ≤ 2^29, so we
need a 29–bit address.
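
The rule 2^(N–1) < M ≤ 2^N is easy to check with a few lines of C. The
function name is my own, and the 384 MB case reproduces the 29–bit answer
worked out above.

#include <stdio.h>
#include <stdint.h>

/* Smallest N with M <= 2^N: the number of address bits needed for M items. */
static unsigned address_bits(uint64_t m) {
    unsigned n = 0;
    uint64_t capacity = 1;               /* 2^0            */
    while (capacity < m) {
        capacity <<= 1;                  /* capacity = 2^n */
        n++;
    }
    return n;
}

int main(void) {
    uint64_t m = 384ull * 1024 * 1024;   /* 384 MB, byte addressable */
    printf("384 MB needs a %u-bit address\n", address_bits(m));   /* prints 29 */
    return 0;
}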
Practicality: All Pentium
computers provide a 32–bit address through
a 32–bit MAR (Memory Address Register).
RULE: Memory is cheap. You can never have enough memory on a
computer.
When you buy a computer,
buy all the memory it can be configured to use.
Memory as a
Collection of Chips
In
fact, physical memory is built from standard memory chips. For example, a 256 MB memory might be built
from sixteen 16 MB chips, each of which might itself be implemented as eight 16
Mb (megabit) chips; a total of 128 chips.
Consider
the textbook’s example: a 32 KB memory built from 4KB chips.
32
KB = 2^15 bytes and 4 KB = 2^12 bytes. We need (2^15 / 2^12) = 2^3
= 8 chips. In standard fashion, these
chips will be numbered as 0 through 7 inclusive.
We
need a 15–bit address for this memory.
Address bits are numbered 14 through 0.
Here
we adopt low order interleaving. Consecutive addresses are placed in different
chips. This facilitates faster access to
memory. Here is the textbook’s figure
showing the location of the first 32 addressable bytes.
Low–Order
Interleaving: Partitioning the Address
Low–order
interleaving will always use a chip count that is a power of 2: 2^K chips,
with K > 1.
The
N–bit memory address will be broken into K bits for the chip selection and
(N – K) address bits for each
chip.
In
our example N = 15 and K = 3. In this
low–order interleaving, the three low order bits select the chip to be
used. These are bits 2, 1, and 0.
bit  | 14  13  12  11  10   9   8   7   6   5   4   3 |  2   1   0  |
     |          12–bit address to each chip           | Chip Select |
In high–order
interleaving (also called “memory banking”, not much used)
the high–order K bits select the
chip. In our example
bits 14 – 12 select the chip and
bits 11 – 0 are sent to each chip.
In low–order
interleaving Chip_Number = Address Mod 2^K, the remainder from
division by the number of chips in the chip set. Always this count is a power of 2.
This organization is often
closely connected to the size of the memory cache blocks.
If the memory is 2^K–way interleaved (low order), each cache line might
have 2^K bytes.
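
The address split can be sketched in C. For the textbook example K = 3: the
chip number is Address mod 2^K (the low three bits), and the remaining bits
form the address inside the selected chip. The function names are my own.

#include <stdio.h>
#include <stdint.h>

#define K 3                                   /* 2^K = 8 chips   */

/* Low-order interleaving: the low K bits select the chip. */
static unsigned chip_number(uint32_t address) {
    return address % (1u << K);               /* Address mod 2^K */
}

/* The remaining (N - K) bits form the address sent to that chip. */
static uint32_t address_in_chip(uint32_t address) {
    return address >> K;                      /* Address div 2^K */
}

int main(void) {
    for (uint32_t a = 8; a < 16; a++)         /* eight consecutive addresses */
        printf("address %2u -> chip %u, offset %u\n",
               a, chip_number(a), address_in_chip(a));
    return 0;
}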
Why have
low–order interleaving?
This
choice is due to the principle of locality; memory locations tend to be
accessed one after another. If
consecutive locations are in different chips, the CPU can initiate a number of
memory–read operations at a rate faster than the memory chips can handle.
Consider
the organization from the book, with 8–way low–order interleaving.
Suppose
that the CPU wants to fill a cache line with the eight bytes, indexed 8 to 15.
The
CPU sends an address and READ command to module 0.
Without waiting for a response, the CPU sends an address and READ to module 1.
Finally,
the CPU sends an address and READ command to module 7.
Then,
the CPU actually reads from module 0.
If the memory access time is 80 nanoseconds, the CPU can issue one command every
10 nanoseconds, as it will take 80 nanoseconds before the data from a given
module can be read back.
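
The timing can be laid out with a little arithmetic. The 80 ns access time and
the 10 ns issue interval are the figures used above; the loop below simply
tabulates them.

#include <stdio.h>

int main(void) {
    const int issue_interval_ns = 10;    /* one READ command issued every 10 ns */
    const int access_time_ns    = 80;    /* each module needs 80 ns to respond  */

    for (int module = 0; module < 8; module++) {
        int issued = module * issue_interval_ns;
        int ready  = issued + access_time_ns;
        printf("module %d: READ issued at %3d ns, data ready at %3d ns\n",
               module, issued, ready);
    }
    /* By the time module 0's data are ready (80 ns), all eight READ commands
       have been issued, so one result can be collected every 10 ns after that. */
    return 0;
}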
The Central
Processing Unit (CPU)
The CPU also has four main components:
1. The
Control Unit (along with the IR) interprets the machine language instruction
and issues the control signals to
make the CPU execute it.
2. The
ALU (Arithmetic Logic Unit) that does the arithmetic and logic.
3. The
Register Set (Register File) that stores temporary results related to the
computations. There are also Special Purpose Registers used by the Control Unit.
4. An
internal bus structure for communication.
The Register
File
There are two sets of registers, called “General
Purpose” and “Special Purpose”.
The origin of the register set is simply the need to
have some sort of memory on the computer and the early inability to build
what we now call “main memory”.
When reliable technologies, such as magnetic cores,
became available for main memory, the concept of CPU registers was retained.
Registers are now implemented as a set of flip–flops
physically located on the CPU chip.
These are used because access times for registers are two orders of
magnitude faster than access times for main memory: 1 nanosecond vs. 80
nanoseconds.
General
Purpose Registers
These are mostly used to store intermediate results of computation. The count of such registers is often a power
of 2, say 2^4 = 16 or 2^5 = 32, because N bits address 2^N
items.
The registers are often numbered and named with a
strange notation so that the assembler will not confuse them for variables;
e.g. %R0 … %R15. %R0 is often fixed at
0.
NOTE: The MARIE has
only one general purpose register – the AC (Accumulator).
Think of the AC as the
display on a standard calculator.
The Register
File
Special
Purpose Registers
These are often used by the control
unit in its execution of the program.
PC     the Program Counter, so called because it does not count anything.
       It is also called the IP (Instruction Pointer), a much better name.
       The PC points to the memory location of the instruction to be executed next.
IR     the Instruction Register. This holds the machine language version of
       the instruction currently being executed.
MAR    the Memory Address Register. This holds the address of the memory word
       being referenced. All execution steps begin with PC → MAR.
MBR    the Memory Buffer Register, also called MDR (Memory Data Register).
       This holds the data being read from memory or written to memory.
PSR    the Program Status Register, often called the PSW (Program Status Word),
       contains a collection of logical bits that characterize the status of the
       program execution: the last result was negative, the last result was zero, etc.
SP     on machines that use a stack architecture, this is the Stack Pointer.
Connecting a
D Flip–Flop to a Data Bus
Registers are normally implemented with D flip–flops,
one for each bit stored.
Cache memory can be considered to be fabricated from D
flip–flops. Being made
with a technology called “SRAM” (defined later), this might even be true.
Main memory can be considered to be fabricated from D
flip–flops, but it is not.
Here the idea is just a good logical model.
There needs to be a way to place a number of D
flip–flops on a pair of common data
busses so that each flip–flop can be written to and read from.
To avoid naming problems, I call the two data busses
“To Register” and “From Register”.
Suppose a register file with 32 registers, numbered 0
through 31. Each register in the
file stores 16 bits. Thus, each of the
two data busses must have 16 data lines.
We have two questions to consider at this point.
1. How
to connect a D flip–flop to the “To Register” data line so that it is loaded
with data only when it is
supposed to be, and
2. How to connect a D flip–flop to the “From
Register” data line so that it outputs
data only when it is supposed to
output.
Controlling
Input to a D Flip–Flop
A D flip–flop copies the value on the D input on every
rising clock edge.
It is not possible to shut down the D input; it always has a value.
The only way to control input to the flip–flop is
through use of the CLK input signal,
which is here shown connected to a control called “Load”.
When Load = 0, the flip–flop does not respond to input
and retains its value.
When Load is pulsed from 0 to 1, the flip–flop will
load whatever value is on the
bus line “To Reg”.
More on
Input to a D Flip–Flop
The control signal Load must be synchronized with the
system clock.
The Load signal for this flip–flop is generated by the
Control Unit to arrive just before
the rising edge of the system clock, allowing a valid CLK input signal to be
generated.
There is a load signal for each register in the
register file. Each flip–flop in the
register
receives the same Load signal and loads at the same time.
Two One–Bit
Registers
This two–register set shows how to select the register
to be loaded.
When L0 is pulsed, the top register is
loaded.
When L1 is pulsed, the bottom register is
loaded.
When neither is pulsed, each register keeps its
contents.
Theoretically, one might pulse both L0 and
L1. In practice, this is
seldom done.
Managing
Flip–Flop Output: the Tri–State Buffer
We now need a simple way to connect the output of the
flip–flop to the “From
Register” bus. This is the output
labeled as Q. The best choice is a tri–state buffer.
The tri–state buffer is just an automatic switch that
can be turned on and off.
Here are the diagrams for two of the four most popular
tri–state buffers.
What does the tri–state do when it is enabled?
What does the tri–state do when it is not enabled?
We shall focus on the enabled–high tri–state
buffer. The other is similar.
An
Enabled–High Tri–State Buffer
Consider
an enabled–high tri–state buffer, with the enable signal called “C”.
When C = 1, the buffer is enabled.
When C = 0, the buffer is not
enabled.
What
does the buffer do?
The
buffer should be considered a switch.
When C = 0, there is no connection between the input A and the output
F. When C = 1, the output F is connected
to the input A via what appears to be a non–inverting buffer.
Strictly
speaking, when C = 0 the output F remains connected to input A, but through a
circuit that offers very high resistance to the flow of electricity. For this reason, the state is often called “high impedance”, “impedance” being an
engineer’s word for “resistance”.
Sample Use
of Tri–State Buffers
Here
is a circuit that uses a pair of tri–state buffers to connect exactly
one of two inputs to an output. The
effect of the circuit is at right.
Here
is the equivalent circuit using the standard gates.
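
Functionally, the pair of tri–state buffers acts as a two–input selector. A
tiny C model of the effect (my own sketch, ignoring the high–impedance state
and assuming the top buffer is the one enabled when C = 1) is:

/* Two enabled-high tri-state buffers sharing output F: when C = 1 the buffer
   carrying A drives F; when C = 0 the buffer enabled by NOT C drives F with B. */
static int select_one_of_two(int a, int b, int c) {
    return c ? a : b;
}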
Connecting
the Flip–Flop Output to a Bus
Each flip–flop in a register file can be connected to
a bit line in the “From Register”
bus using a tri–state buffer.
When Read = 0, the Q output is not connected to the
bus line “From Reg” and
nothing is output to that bus.
When Read = 1, the Q output becomes connected to the
bus line “From Reg” on the
next rising edge of the clock. The
flip–flop value is output to the bus.
The control unit of the CPU must ensure proper timing
of the Read control signal.
Registers
and D Flip–Flops
Registers are often built from D flip–flops. A 16–bit register has 16 D flip–flops.
Here is a diagram of a single 2–bit register to show
how the flip–flops are used.
There are two 2–bit busses, one carrying data to the registers and one data
from them.
When Clock = 1 and Load = 1, the register accepts
input from the “To Register” bus.
When Clock = 1 and Read = 1, the register puts data
onto the “From Register” bus.
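
Here is a behavioral C sketch of such a register: on a clock pulse, Load
copies the “To Register” bus into the flip–flops, and Read drives the stored
bits onto the “From Register” bus through the tri–state buffers. The struct
and function names are my own.

#include <stdint.h>

typedef struct {
    uint8_t bits;        /* the two D flip-flops, stored in bits 1 and 0 */
} reg2_t;

/* Model one rising clock edge.  to_reg is the "To Register" bus; from_reg
   points at the shared "From Register" bus.                              */
void clock_edge(reg2_t *r, uint8_t to_reg, uint8_t *from_reg,
                int load, int read) {
    if (load)                    /* Load = 1: capture the input bus         */
        r->bits = to_reg & 0x3;
    if (read)                    /* Read = 1: tri-state buffers enabled, so */
        *from_reg = r->bits;     /* the register drives the output bus      */
    /* Read = 0: the buffers are in high impedance; the bus is untouched.   */
}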
The ALU
(Arithmetic Logic Unit)
The ALU performs all of the arithmetic and logical
operations for the CPU.
These include the following:
Arithmetic: addition, subtraction, negation, etc.
Logical: AND, OR, NOT, Exclusive OR, etc.
This symbol has been used for the ALU since the
mid–1950s.
It shows two inputs and one output.
The reason for two inputs is the fact that many
operations, such as addition and logical AND, are dyadic; that is, they take two input arguments.
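
A minimal C model of such a dyadic ALU follows. The opcode names are invented
for illustration; the MARIE’s actual instruction set is defined later in the
chapter.

#include <stdint.h>

typedef enum { ALU_ADD, ALU_SUB, ALU_AND, ALU_OR, ALU_XOR } alu_op_t;

/* Two inputs, one output: the classic ALU shape. */
int16_t alu(alu_op_t op, int16_t a, int16_t b) {
    switch (op) {
    case ALU_ADD: return a + b;        /* two's-complement addition    */
    case ALU_SUB: return a - b;        /* two's-complement subtraction */
    case ALU_AND: return a & b;        /* bitwise logical AND          */
    case ALU_OR:  return a | b;        /* bitwise logical OR           */
    case ALU_XOR: return a ^ b;        /* exclusive OR                 */
    }
    return 0;
}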
The
Fetch–Execute Cycle
This cycle is the logical basis of all stored program computers.
Instructions are stored in memory as machine language.
Instructions are fetched
from memory and then executed.
The common fetch cycle can be expressed in the
following control sequence.
MAR ← PC.    // The PC contains the address of the instruction.
READ.        // Read memory at that address; the contents arrive in the MBR.
IR ← MBR.    // Place the instruction into the IR.
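
The same fetch sequence as a C sketch. The register names follow the text; the
PC increment, the 16–bit word size, the memory size, and the empty execute
step are placeholders of my own for machinery defined later in the chapter.

#include <stdint.h>

#define MEM_WORDS 4096
static uint16_t Memory[MEM_WORDS];
static uint16_t PC, IR, MAR, MBR;

static void execute(uint16_t instruction) {
    (void)instruction;            /* decode and execute: defined later          */
}

void run(void) {
    for (;;) {                    /* a real machine stops on a halt instruction */
        MAR = PC;                 /* MAR <- PC: address of the next instruction */
        MBR = Memory[MAR];        /* READ: memory contents arrive in the MBR    */
        IR  = MBR;                /* IR <- MBR: instruction now in the IR       */
        PC  = PC + 1;             /* point at the following instruction         */
        execute(IR);              /* the execute half of the cycle              */
    }
}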
This cycle is described in many different ways, most
of which serve to highlight additional steps required to execute the
instruction. Examples of additional
steps are: Decode the Instruction, Fetch the Arguments, Store the Result, etc.
A stored program computer is often called a “von
Neumann Machine” after one of the originators of the EDVAC.
This Fetch–Execute cycle is often called the “von Neumann bottleneck”, as the
necessity for fetching every instruction from memory slows the computer.
Input/Output
System
Each I/O device is connected to the system bus through
a number of registers.
Collectively, these form part of the device
interface.
These fall into three classes:
Data       Contains data to be written to the device or just read from it.
Control    Allows the CPU to control the device. For example, the CPU
           might instruct a printer to insert a CR/LF after each line printed.
Status     Allows the CPU to monitor the status of the device. For example,
           a printer might have a bit that is set when it is out of paper.
There are two major strategies for interfacing I/O
devices.
Memory Mapped I/O    is designated through specific addresses
                     (see the sketch after this list).
     Load KBD_Data       This would be an input, loading into the AC.
     Store LP_Data       This would be an output, storing into a special address.
Isolated I/O (Instruction–Based I/O)    Uses special instructions.
     Input               Read from the designated Input Device.
     Output              Write to the designated Output Device.
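
Memory mapped I/O can be sketched in C with volatile pointers to fixed
addresses. The addresses and the register names KBD_DATA and LP_DATA are
invented for illustration; on a real system they would come from the hardware
documentation, and a user program running under an operating system could not
touch them directly.

#include <stdint.h>

/* Hypothetical device register addresses, for illustration only. */
#define KBD_DATA ((volatile uint8_t *)0xFFF0u)   /* keyboard data register     */
#define LP_DATA  ((volatile uint8_t *)0xFFF2u)   /* line printer data register */

/* "Load KBD_Data": an ordinary memory read brings in the keyboard input. */
static uint8_t read_keyboard(void) {
    return *KBD_DATA;
}

/* "Store LP_Data": an ordinary memory write sends a byte to the printer. */
static void print_char(uint8_t c) {
    *LP_DATA = c;
}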
Interrupts
Efficient
management of Input / Output devices demands that these devices be able to
signal the CPU when they are ready to initiate a data transfer.
For
an input device, this occurs when new data are in its input buffer.
For
an output device, this occurs when the device buffer is empty and the device
can accept new data for later
output.
While
easier to understand within an I/O context, interrupts can occur in other
contexts.
1. Errors
and malfunctions.
2. Page
faults in a virtual memory system (these are hard to handle).
3. Software
interrupts or “traps” that allow user software to signal the
Operating System. These differ slightly from standard procedure
calls.
Interrupts
are either maskable (that is, the
CPU can be set to ignore them) or nonmaskable. Generally, the only reason to mask interrupts
occurs during that small time of program execution in which the CPU is
beginning to process an interrupt.
Improper
masking of interrupts can cause a system to crash.