Variants on a Simple Program Statement
Begin with a program statement in some high-level language.
Z = X + Y
In the MARIE assembly language, this would be written as follows.
Load X
Add Y
Store Z
The hexadecimal representation of the MARIE machine language might be as follows.
10A2
30BC
202D
How do we get to this hexadecimal representation?
There are a few items to discuss before we really answer this one.
1. What is an accumulator and what is an accumulator-based machine?
2. What is assembly language and what is the function of the assembler?
What is an Accumulator?
The textbook definition of an accumulator is completely accurate. That definition might be expanded a bit.
In a classic accumulator-based architecture, the accumulator, denoted “AC” or “ACC”, is the one register that holds temporary results from the computation. It accumulates the results, as in the above example in which it is used to accumulate the sum.
The single accumulator architecture is an artifact of the days in which all registers were very expensive to build and took some trouble to maintain. Architectures with fewer general purpose registers (such as those with only one) were more reliable.
For a good model of an accumulator, think of the single line display on a pocket calculator. The older ones had only one line of digits. The newer “full screen” units still have a line designated to hold the latest results. Think of each of these displays as an output unit copying the contents of a single accumulator.
Early architectures added another register, called “MQ”, to allow for multiplication and division. The multiplication of two 16-bit numbers gives a 32-bit result, which is too wide for a single 16-bit accumulator.
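To see why a second register is needed, the arithmetic can be sketched in a few lines of Python. This is only an illustration: the names `high_word` and `low_word` are labels chosen here, and which half a real machine keeps in the accumulator and which in MQ depends on the design.

```python
# The largest unsigned 16-bit operands give the widest possible product.
a = 0xFFFF
b = 0xFFFF
product = a * b                        # 0xFFFE0001, which needs 32 bits

# Split the 32-bit product into two 16-bit halves (illustrative names).
high_word = (product >> 16) & 0xFFFF   # upper 16 bits
low_word = product & 0xFFFF            # lower 16 bits

print(f"{product:08X} -> high {high_word:04X}, low {low_word:04X}")
# prints: FFFE0001 -> high FFFE, low 0001
```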
Later architectures (except INTEL with the AX, BX, etc.) moved to a number of general purpose registers, often denoted by number: R0, R1, etc.
What Is a Stored Program Computer?
This is often called a “von Neumann architecture”, after John von Neumann, who was the principal author on a paper defining the concept of a stored program computer.
The basic idea is simple:
1. Programs and data are stored in primary computer memory. Obviously the programs (and possibly data) are copied into memory from some secondary storage (think “fixed disk”), but that is not part of the model.
2. Each instruction is represented in memory as a binary number. It is fetched from memory by the “fetch unit” and executed by the “execute unit”.
3. It is likely that some data are written to a secondary storage or print device during the execution of the program.
There are a few corollaries to the above.
a) Each memory location must be individually accessible and associated with a unique identifier, such as an address.
b) Each instruction must be stored at a unique identifiable memory location.
c) Each “variable” must be associated with a unique identifiable memory location.
What Does the Assembler/Loader Do?
Consider the sample program from the first slide.
Load X
Add Y
Store Z
Each of these instructions must be placed at a unique address that is easily identifiable. In very early machines and very primitive machines (such as the MARIE), the first instruction of the program is placed at a fixed address, often called “START”.
After placing one instruction, the assembler must compute the address of the next instruction. For some architectures, such as the INTEL Pentium™ series, this can be complex, as instructions come in a wide variety of lengths: 1, 2, 4 bytes, etc.
The MARIE is simple. All instructions have the same length: one word. If an instruction is at location N, the next is at location (N + 1).
Suppose start = 0x000 (hexadecimal).*
000 Load X
001 Add Y
002 Store Z
* REMEMBER: All addresses are 12-bit binary numbers; so three hexadecimal digits.
More on the Assembler/Loader
The assembler must allocate memory locations for each “variable” used in the computation. In more complex architectures, this needs to account for the number of bytes allocated for the variable: 2 or 4 bytes for an integer, 4 or 8 bytes for a real, etc.
Again, the simplicity of the MARIE architecture is helpful. Only integers are used as data items. Each is exactly 16 bits long; one word per integer.
Let’s expand the program slightly, so that its assembly will make sense. We have:
000 Load X
001 Add Y
002 Store Z
003 Halt
004 X, Dec 4
005 Y, Dec 8
006 Z, Dec 0
END
Note the label notation, as in “X,”. It is a symbol followed by a comma.
Assembler, Pass 1
Now we consider a two pass assembler, which is the “standard variety”.
Pass One: This identifies the three symbols X, Y, and Z. It does so by scanning the labels at the beginning of the lines, where it finds “X,”, “Y,”, and “Z,”.
In a more sophisticated assembler, the declarations for these three labels would identify the type of the variable: integer, real number, fixed length string, etc.
In the MARIE, each variable is a 16-bit integer.
At the end of Pass 1, the assembler has noted the addresses to be assigned to each “variable” and entered the names in its symbol table, which is used as a part of Pass 2 to associate each label with its unique address.
Here is the symbol table generated after Pass 1 for this program.
Label | Address
X | 0x004
Y | 0x005
Z | 0x006
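Pass 1 can be sketched in Python as a single scan that counts words and records labels. This is a minimal sketch, not the MARIE assembler itself: the function name `build_symbol_table` and the list-of-strings input are assumptions made for illustration, and the code relies on the MARIE property that every line occupies exactly one word.

```python
# Pass 1 sketch: assign one word per line and record every label.
source = [
    "Load X",
    "Add Y",
    "Store Z",
    "Halt",
    "X, Dec 4",
    "Y, Dec 8",
    "Z, Dec 0",
    "END",
]

def build_symbol_table(lines, start=0x000):
    """Return {label: address}; every line before END occupies one word."""
    table = {}
    address = start
    for line in lines:
        if line.strip().upper() == "END":
            break
        first = line.split()[0]
        if first.endswith(","):        # a label is a symbol followed by a comma
            table[first[:-1]] = address
        address += 1                   # MARIE: the next item is at N + 1
    return table

symbols = build_symbol_table(source)
for name, addr in symbols.items():
    print(f"{name} | 0x{addr:03X}")
```

Run on the program above, the sketch reports X at 0x004, Y at 0x005, and Z at 0x006, matching the table.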
Assembler, Pass 2
The output of pass 1 of the assembler may be imagined as follows. Each item has been assigned a location and all of the symbols used (think “variables”) have been identified.
000 Load X
001 Add Y
002 Store Z
003 Halt
004 X, Dec 4
005 Y, Dec 8
006 Z, Dec 0
END
Recall the MARIE Instruction Format, shown in Figure 4.10 of the textbook.
15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0
OpCode (bits 15–12) | Address (bits 11–0, represented as 3 hexadecimal digits)
Note that the opcode, being the code that represents the instruction, is a binary number. It is often represented in hexadecimal, but almost never in decimal. Octal notation is occasionally used for the opcodes, but we shall avoid the practice.
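The instruction format amounts to packing a 4-bit opcode and a 12-bit address into one 16-bit word. A minimal Python sketch follows; the helper names `encode` and `decode` are made up for this illustration.

```python
def encode(opcode, address=0):
    """Pack a 4-bit opcode and a 12-bit address into one 16-bit word."""
    if not (0 <= opcode <= 0xF and 0 <= address <= 0xFFF):
        raise ValueError("opcode must fit in 4 bits, address in 12 bits")
    return (opcode << 12) | address

def decode(word):
    """Split a 16-bit word into (opcode, address)."""
    return (word >> 12) & 0xF, word & 0xFFF

word = encode(0x1, 0x0A2)        # Load from address 0A2
print(f"{word:04X}")             # prints: 10A2
print(decode(0x30BC))            # prints: (3, 188), i.e. opcode 3, address 0x0BC
```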
Let’s trace the second pass of the assembler. This uses the symbol table to associate an address with each of the “variables” mentioned in the assembly instructions.
000 Load X The opcode for “Load” is binary 0001, or 0x1 (hexadecimal 1). The address for X is 0x004. The instruction is 0x1004.
001 Add Y The opcode for “Add” is binary 0011, or 0x3. The address for Y is 0x005. The instruction is 0x3005.
002 Store Z The opcode for “Store” is binary 0010, or 0x2. The address for Z is 0x006. The instruction is 0x2006.
003 Halt The opcode for “Halt” is binary 0111, or 0x7. There is no operand, so no address part. The address part of such an instruction is usually set to 000, so the instruction is 0x7000.
NOTE: With “zero operand” instructions, such as HALT, only the first hexadecimal digit is significant. The last three hexadecimal digits can be any value, as they are ignored by the microarchitecture (the circuits that make the computer “go”).
Each of the following is a valid HALT instruction: 0x7000, 0x7123, 0x7777, 0x7ABC, 0x7FFF, etc.
The Hexadecimal Program
The entire assembled program can be represented as an indexed array.
Location | Contents
0x000 | 0x1004
0x001 | 0x3005
0x002 | 0x2006
0x003 | 0x7000
0x004 | 0x0004
0x005 | 0x0008
0x006 | 0x0000
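The second pass can be sketched in Python as a lookup in two tables: opcodes and symbols. This is an illustrative sketch rather than the real MARIE assembler; the function name `assemble` is an assumption, the opcode table lists only the four instructions used here, and the symbol table is written out by hand as Pass 1 would have produced it.

```python
# Opcodes for the instructions used in this program, as given in the trace.
OPCODES = {"Load": 0x1, "Store": 0x2, "Add": 0x3, "Halt": 0x7}
symbols = {"X": 0x004, "Y": 0x005, "Z": 0x006}   # output of Pass 1

source = ["Load X", "Add Y", "Store Z", "Halt",
          "X, Dec 4", "Y, Dec 8", "Z, Dec 0", "END"]

def assemble(lines, symbols):
    """Pass 2 sketch: translate each line into one 16-bit word."""
    words = []
    for line in lines:
        tokens = line.split()
        if tokens[0].upper() == "END":
            break
        if tokens[0].endswith(","):      # drop the label; Pass 1 recorded it
            tokens = tokens[1:]
        mnemonic = tokens[0]
        if mnemonic == "Dec":            # pseudo-op: decimal constant
            words.append(int(tokens[1]) & 0xFFFF)
        elif mnemonic == "Hex":          # pseudo-op: hexadecimal constant
            words.append(int(tokens[1], 16) & 0xFFFF)
        else:
            # Zero-operand instructions get an address part of 000.
            address = symbols[tokens[1]] if len(tokens) > 1 else 0
            words.append((OPCODES[mnemonic] << 12) | address)
    return words

program = assemble(source, symbols)
for location, word in enumerate(program):
    print(f"0x{location:03X} | 0x{word:04X}")
```

The printed rows reproduce the Location and Contents table above.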
Note that the “Dec” (decimal) directives caused the memory allocated to each of the variables to be initialized: X to 4, Y to 8, and Z to 0.
Technically “Dec” is a pseudo-op, in that it is a directive to the assembler to do something and is not an executable instruction. Two other useful pseudo-ops are
Hex declares a hexadecimal number
END declares the end of the assembly unit. Otherwise the assembler gets lost.
The Complete Program (Without Addresses)
Load X
Add Y
Store Z
Halt
X, Dec 4
Y, Dec 8
Z, Dec 0
END
This is the program as it would be input.
Comment: One key difference between most assembly languages and high-level languages that are compiled is that the latter do not require explicit declaration of memory locations, as was done above. A compiler just requires a type definition, from which it will automatically generate the storage assignments.
The SkipCond Instruction
Disassembly
Consider the following “core dump” of a MARIE assembly language program.
All numbers are represented in hexadecimal.
Note that this program uses an advanced instruction (Clear) to clear the accumulator.
If the execution begins at address 000, what does the program do?
Address Contents
000 A000
001 2009
002 5000
003 200A
004 400B
005 8800
006 7000
007 2009
008 7000
009 0000
00A 0000
00B 0030
Disassembling the Program (Page 1)
To disassemble the hexadecimal code, we must identify and determine the effect of each machine language instruction. Let’s do this one instruction at a time.
Look at table 4.7 on page 172 of the textbook to get the definitions.
000 A000 Clear
This is an instruction to clear the accumulator.
001 2009 Store W009
This instruction stores the accumulator into an address. For lack of anything better, I am calling this W009.
002 5000 Input
Place the input data into the accumulator.
003 200A Store W00A
This stores the contents of the accumulator.
004 400B Subt W00B
This subtracts a value from the accumulator.
Disassembling the Program (Page 2)
005 8800 Skipcond 800
This is Skipcond 800. Skip the next instruction if the AC > 0.
006 7000 Halt
Halt.
007 2009 Store W009
Store the accumulator contents into this address.
008 7000 Halt
Halt. As there are no branches around this, no word beyond it can be executed; it is the last instruction in the program. The next three words must hold data for the program.
009 0000 Decimal 0
00A 0000 Decimal 0
00B 0030 Decimal 48
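The hand disassembly above can be mirrored by a short Python sketch. It is illustrative only: the mnemonic table covers just the opcodes that appear in these slides, the `W` prefix follows the naming used above, and the `code_length` parameter is an assumption that encodes the observation that everything after the final Halt is data.

```python
# Opcodes that appear in these slides; not the full MARIE instruction set.
MNEMONICS = {0x1: "Load", 0x2: "Store", 0x3: "Add", 0x4: "Subt",
             0x5: "Input", 0x7: "Halt", 0x8: "Skipcond", 0xA: "Clear"}
NO_OPERAND = {"Input", "Halt", "Clear"}

dump = [0xA000, 0x2009, 0x5000, 0x200A, 0x400B, 0x8800,
        0x7000, 0x2009, 0x7000, 0x0000, 0x0000, 0x0030]

def disassemble(words, code_length):
    """Return one listing line per word; words past code_length are data."""
    lines = []
    for location, word in enumerate(words):
        if location >= code_length:
            lines.append(f"{location:03X} Dec {word}")
            continue
        opcode, address = word >> 12, word & 0xFFF
        name = MNEMONICS[opcode]
        if name in NO_OPERAND:
            lines.append(f"{location:03X} {name}")
        elif name == "Skipcond":         # operand is a condition, not a variable
            lines.append(f"{location:03X} {name} {address:03X}")
        else:
            lines.append(f"{location:03X} {name} W{address:03X}")
    return lines

listing = disassemble(dump, code_length=9)
print("\n".join(listing))
```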
Disassembling the Program (Page 3)
We now list the pseudo-assembly language form of the program. At this point, we do not have any good names for the variables.
000 Clear // Clear the accumulator
001 Store W009 // Store the zero in this location
002 Input // Read from the input device. Call this N.
003 Store W00A // Store the raw input in this location.
004 Subt W00B // Subtract decimal 48 from the input.
005 Skipcond 800 // Skip if (N – 48) > 0, or N > 48.
006 Halt // Halt with 0 in W009 if N ≤ 48.
007 Store W009 // Store (N – 48) in W009
008 Halt // Halt
009 W009, Dec 0
00A W00A, Dec 0
00B W00B, Dec 48 // The ASCII value for the character ‘0’.
This reads in the ASCII value of a digit and stores its numeric value in W009.
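One way to check this reading of the program is to execute the core dump in a tiny interpreter. The Python sketch below is an assumption-laden model, not a full MARIE simulator: it handles only the six opcodes this program uses, models only the Skipcond 800 condition (AC > 0), and keeps the accumulator as an ordinary Python integer rather than a 16-bit register.

```python
dump = [0xA000, 0x2009, 0x5000, 0x200A, 0x400B, 0x8800,
        0x7000, 0x2009, 0x7000, 0x0000, 0x0000, 0x0030]

def run(memory, inputs):
    """Execute from address 000 until Halt; return the final memory."""
    mem = list(memory)
    pending = list(inputs)
    ac, pc = 0, 0
    while True:
        word = mem[pc]
        pc += 1                               # PC now points at the next word
        opcode, address = word >> 12, word & 0xFFF
        if opcode == 0xA:                     # Clear
            ac = 0
        elif opcode == 0x2:                   # Store
            mem[address] = ac & 0xFFFF
        elif opcode == 0x5:                   # Input
            ac = pending.pop(0)
        elif opcode == 0x4:                   # Subt
            ac -= mem[address]
        elif opcode == 0x8:                   # Skipcond (only 800 is modelled)
            if address == 0x800 and ac > 0:
                pc += 1
        elif opcode == 0x7:                   # Halt
            return mem
        else:
            raise ValueError(f"opcode {opcode:X} not modelled in this sketch")

final = run(dump, [ord("7")])                 # the character '7' is ASCII 55
print(final[0x009])                           # prints: 7
```

With the character ‘0’ as input, N – 48 is zero, the skip is not taken, and W009 keeps the 0 stored at the start.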
Sample Program #2
Address Instruction Comment
000 Input // Get a number into the AC
001 Store X // Store it into location X
002 Add X // Add it back to itself, doubling it.
003 Store X2 // Store the doubled value.
004 Add X2 // Now we have four times the value in AC
005 Store X4 // Store X times 4
006 Add X4 // Now we have eight times the value
007 Store X8 // Just for debugging
008 Add X2 // Now we have ten times the value
009 Output // Show the answer
00A Halt
00B X, Dec 0
00C X2, Dec 0
00D X4, Dec 0
00E X8, Dec 0
END
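The arithmetic in Sample Program #2 can be followed step by step in Python, with one statement per MARIE instruction. The function name `times_ten` is made up for this sketch, and 16-bit overflow is ignored.

```python
def times_ten(n):
    """Mirror Sample Program #2: multiply by ten using only additions."""
    ac = n            # Input
    x = ac            # Store X
    ac += x           # Add X    -> 2n
    x2 = ac           # Store X2
    ac += x2          # Add X2   -> 4n
    x4 = ac           # Store X4
    ac += x4          # Add X4   -> 8n
    x8 = ac           # Store X8 (kept only for debugging, as in the listing)
    ac += x2          # Add X2   -> 8n + 2n = 10n
    return ac         # Output

print(times_ten(7))   # prints: 70
```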
Sample Program #3
Address Instruction Comment
000 Input // Get a number into the AC
001 JnS By5 // Call a subroutine
002 Store X5 // Store the value
003 Output // Display the value
004 Halt // And stop.
005 X5, Dec 0
006 By5, Hex 0 // Stores the return address
007 Store X // Store the AC
008 Add X // AC now has X · 2.
009 Store X2 // Store the doubled value
00A Add X2 // AC now has X · 4.
00B Add X // AC now has X · 5.
00C JumpI By5 // Indirect jump to return.
00D X, Dec 0
00E X2, Dec 0
END
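The call and return mechanism in Sample Program #3 can be traced with a small Python model that works at the mnemonic level. This is a sketch under stated assumptions: instructions are stored as (mnemonic, address) pairs rather than as machine words, the function name `run` is made up here, and JnS is modelled as saving the address of the next instruction in the labelled word and continuing at the word after it.

```python
# Sample Program #3 as (mnemonic, operand address) pairs keyed by location.
program = {
    0x000: ("Input", None),
    0x001: ("JnS", 0x006),     # By5
    0x002: ("Store", 0x005),   # X5
    0x003: ("Output", None),
    0x004: ("Halt", None),
    0x007: ("Store", 0x00D),   # X
    0x008: ("Add", 0x00D),
    0x009: ("Store", 0x00E),   # X2
    0x00A: ("Add", 0x00E),
    0x00B: ("Add", 0x00D),
    0x00C: ("JumpI", 0x006),   # return through By5
}

def run(n):
    """Execute the program with input n; return (outputs, data words)."""
    data = {0x005: 0, 0x006: 0, 0x00D: 0, 0x00E: 0}
    ac, pc, shown = 0, 0, []
    while True:
        op, addr = program[pc]
        pc += 1                       # PC now holds the next instruction
        if op == "Input":
            ac = n
        elif op == "JnS":
            data[addr] = pc           # save the return address in By5
            pc = addr + 1             # subroutine body starts after the label
        elif op == "JumpI":
            pc = data[addr]           # jump to the address stored in By5
        elif op == "Store":
            data[addr] = ac
        elif op == "Add":
            ac += data[addr]
        elif op == "Output":
            shown.append(ac)
        elif op == "Halt":
            return shown, data

shown, data = run(6)
print(shown, f"{data[0x006]:03X}")    # prints: [30] 002
```

The saved value 002 in By5 is the address of the Store X5 instruction, which is where execution resumes after the subroutine.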