Symbols, Addresses, and Variables
Why Assembler Language Does Not Use Variables
Edward L. Bosworth, Ph.D.
Associate Professor
TSYS Department of Computer Science
Sample of Assembler Language Code
Consider the assignment statement Z = X + Y.
We are using IBM® 370 Series Assembler Language as an example.
Here is a possible translation of the high-level language statement above.
    LD  0,X        LOAD REGISTER 0 FROM ADDRESS X
    AD  0,Y        ADD VALUE AT ADDRESS Y
    STD 0,Z        STORE RESULT INTO ADDRESS Z
The first question in examining this text is to determine what it does.
I have already told you much of what it does, but let’s start at the beginning.
PLEASE NOTE: This lecture will use much that has yet to be discussed.
Please focus on the “big picture”. The details, such as how to write the code, will be discussed in due time.
A Two-Pass Assembler
Here, we shall focus only on the first pass of either a compiler or assembler.
The goal is to read and interpret the symbols found in the text of the code.
Here is what a two-pass assembler would first do with this text.
    LD  0,X        LOAD REGISTER 0 FROM ADDRESS X
    AD  0,Y        ADD VALUE AT ADDRESS Y
    STD 0,Z        STORE RESULT INTO ADDRESS Z
The symbols LD, AD, and STD would be identified as assembler language operations, and the symbol 0 would be identified as a reference to register 0.
The symbols X, Y, and Z would be identified properly only if those symbols had been properly defined elsewhere in the program. Here is the way to do it for this code.
    X   DC  D'3.0'     DOUBLE-PRECISION FLOAT
    Y   DC  D'4.0'
    Z   DC  D'0.0'
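The first pass can be sketched in Java, the high-level language used later in this lecture. This is only an illustration under stated assumptions, not how any real assembler is written: the class and method names are hypothetical, the location counter is assumed to start at 0, each of LD, AD, and STD is taken to occupy four bytes, and each data item is assumed to be aligned on a multiple of its own length.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of a first pass: walk the source once and record
// the address that each label will stand for.
class FirstPassSketch {
    // Length in bytes of each DC/DS type used in this lecture.
    static int typeLength(char type) {
        switch (type) {
            case 'D': return 8;   // double-precision float
            case 'E': return 4;   // single-precision float
            case 'F': return 4;   // 32-bit integer
            case 'H': return 2;   // 16-bit integer
            default: throw new IllegalArgumentException("unknown type " + type);
        }
    }

    // Each source line is {label, operation, operand}; the label may be empty.
    static Map<String, Integer> buildSymbolTable(String[][] lines) {
        Map<String, Integer> symbols = new LinkedHashMap<>();
        int location = 0;                        // assumed starting address
        for (String[] line : lines) {
            String label = line[0], op = line[1], operand = line[2];
            int length;
            if (op.equals("DC") || op.equals("DS")) {
                length = typeLength(operand.charAt(0));
                // Assumed alignment: round up to a multiple of the item length.
                location = (location + length - 1) / length * length;
            } else {
                length = 4;                      // LD, AD, STD: four bytes each
            }
            if (!label.isEmpty()) {
                symbols.put(label, location);    // the label IS this address
            }
            location += length;
        }
        return symbols;
    }

    public static void main(String[] args) {
        String[][] program = {
            {"",  "LD",  "0,X"}, {"",  "AD", "0,Y"},    {"",  "STD", "0,Z"},
            {"X", "DC",  "D'3.0'"}, {"Y", "DC", "D'4.0'"}, {"Z", "DC", "D'0.0'"},
        };
        System.out.println(buildSymbolTable(program)); // {X=16, Y=24, Z=32}
    }
}
```

Notice that the table holds only addresses. Nothing in it records that X is a floating-point number.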
Rereading the Assembler Text
Here is a somewhat literal translation of the text.
Note that the spacing has been altered so that I could get more on a line.
    LD  0,X            Load register 0 with the 8-byte value from address X
    AD  0,Y            Add the 8-byte value from address Y
    STD 0,Z            Store result into 8 bytes at address Z

    X   DC  D'3.0'     Set aside eight bytes (64 bits)
                       at an address to be associated with
                       the label X, initialize it to 3.0
    Y   DC  D'4.0'     Set aside eight bytes (64 bits)
                       at an address to be associated with
                       the label Y, initialize it to 4.0
    Z   DC  D'0.0'     Set aside eight bytes (64 bits)
                       at an address to be associated with
                       the label Z, initialize it to 0.0
Cautions on the Assembler Process
In the above fragments, we see two independent processes at work.
1) Use of data declarations to reserve space in memory to be associated with labeled addresses.
2) Use of assembly code to perform operations on these data.
Note that these are inherently independent.
It is the responsibility of the coder to apply the operations to the correct data types.
Occasionally, it is proper to apply a different (and apparently inconsistent) operation to a data type.
Consider the following.
    XX  DS  D          Double-precision floating point
All that really says is “Set aside an eight-byte memory area, and associate it with the symbol XX.”
Any eight-byte data item could be placed here, even a 15-digit packed decimal format. (This is commonly done.)
Reading Some BAD Assembler Text
To show what could happen, and commonly does in student programs, let’s rewrite the above fragment.
    LD  0,X            LOAD REGISTER 0 FROM ADDRESS X
    AD  0,Y            ADD VALUE AT ADDRESS Y
    STD 0,Z            STORE RESULT INTO ADDRESS Z

    X   DC  E'3.0'     SINGLE-PRECISION FLOAT, 4 BYTES
    Y   DC  E'4.0'     ANOTHER SINGLE-PRECISION
    Z   DC  D'0.0'     A DOUBLE PRECISION
The first instruction, “LD 0,X”, will go to address X and extract the next eight bytes. This will be four bytes for 3.0 and four bytes for 4.0.
The value retrieved will be 0x4130 0000 4140 0000, which represents a double-precision number with value slightly larger than 3.0.
Had X and Y been properly declared, the value retrieved would have been 0x4130 0000 0000 0000.
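To check the claim about these two bit patterns, here is a small Java sketch that decodes a 64-bit System/370 hexadecimal floating-point value: one sign bit, a 7-bit excess-64 exponent of 16, and a 56-bit fraction. The class and method names are hypothetical. Java itself uses IEEE arithmetic, so the decoding is done by hand.

```java
// Hypothetical sketch: interpret a 64-bit pattern as a System/370
// double-precision (hexadecimal) floating-point number.
class HexFloatSketch {
    static double decodeLong(long bits) {
        double sign = (bits < 0) ? -1.0 : 1.0;             // bit 0 is the sign
        int exponent = (int) ((bits >>> 56) & 0x7F) - 64;  // power of 16, excess-64
        long fraction = bits & 0x00FFFFFFFFFFFFFFL;        // 56-bit fraction
        // value = sign * (fraction / 2^56) * 16^exponent.
        // Note: (double) fraction may round for fractions wider than 53 bits.
        return sign * Math.scalb((double) fraction, 4 * exponent - 56);
    }

    public static void main(String[] args) {
        // X and Y properly declared as D'3.0' and D'4.0'
        System.out.println(decodeLong(0x4130000000000000L));
        // X and Y declared as E'3.0' and E'4.0', then read with LD
        System.out.println(decodeLong(0x4130000041400000L));
    }
}
```

The first call yields exactly 3.0. The second yields a value a little above 3.0 (about 3.0000002), because the four bytes of 4.0 have been read as low-order fraction digits.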
What A Modern Compiler Does
Consider the following fragments of Java code.
    double x = 3.0;   // 64 bits or eight bytes
    double y = 4.0;   // 64 bits or eight bytes
    double z = 0.0;   // 64 bits or eight bytes
    // More declarations and code here.
    z = x + y;        // Do the addition that is
                      // proper for this data type.
                      // Here, it is double-precision
                      // floating point addition.
Note that the compiler will interpret the source-language statement “z = x + y” according to the data types of the operands.
Rereading the Java Text
Here is an informal translation of the Java code. We assume that the Java compiler is multi-pass, and that it has a standard first pass.
The standard first pass will identify all of the symbols, allocate storage for each, and assign a data type for each.
The compiler, as a system program with input and output, maintains a number of internal tables to assist in generating the appropriate code.
Here is our reading. We assume a table used to describe the variables. For each variable, this table stores:
    the character representation,
    the storage location allocated, and
    the data type for the variable.
When an operation on a variable is called for, it is the description stored in this compiler table that is used to select the proper low-level code.
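The table just described can be sketched as follows. This is a hypothetical illustration, not the internals of any real Java compiler: all names are invented, storage is assumed to be handed out sequentially from address 0 with each item aligned to its own size, and only the two floating-point types used in this lecture are handled.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of a compiler's variable table and how the recorded
// data type selects the low-level operations.
class CompilerTableSketch {
    enum DataType { FLOAT, DOUBLE }

    // One table entry: character representation, storage location, data type.
    static final class Entry {
        final String name;
        final int address;
        final DataType type;
        Entry(String name, int address, DataType type) {
            this.name = name;
            this.address = address;
            this.type = type;
        }
    }

    private final Map<String, Entry> table = new HashMap<>();
    private int nextAddress = 0;   // assumed: sequential allocation from 0

    void declare(String name, DataType type) {
        int size = (type == DataType.DOUBLE) ? 8 : 4;
        nextAddress = (nextAddress + size - 1) / size * size;  // align to size
        table.put(name, new Entry(name, nextAddress, type));
        nextAddress += size;
    }

    int addressOf(String name) {
        return table.get(name).address;
    }

    // Emit code for "target = left + right", using register 0.
    String[] emitAdd(String target, String left, String right) {
        DataType type = table.get(target).type;
        if (table.get(left).type != type || table.get(right).type != type) {
            throw new IllegalArgumentException("mixed types are not handled in this sketch");
        }
        String suffix = (type == DataType.DOUBLE) ? "D" : "E";  // LD/AD/STD or LE/AE/STE
        return new String[] {
            "L" + suffix + "  0," + left.toUpperCase(),
            "A" + suffix + "  0," + right.toUpperCase(),
            "ST" + suffix + " 0," + target.toUpperCase(),
        };
    }

    public static void main(String[] args) {
        CompilerTableSketch compiler = new CompilerTableSketch();
        for (String name : new String[] {"x", "y", "z"}) {
            compiler.declare(name, DataType.DOUBLE);
        }
        for (String line : compiler.emitAdd("z", "x", "y")) {
            System.out.println(line);
        }
    }
}
```

The source statement never names an operation code. The choice between the single-precision and double-precision instructions comes entirely from the type stored in the table.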
Rereading the Java Text (page 2)
Let’s begin by rereading and interpreting the code we have seen.
    double x = 3.0;
        // Allocate 8 bytes of storage to be associated with
        // the symbol “x”. Initialize the value to the
        // double-precision floating-point value 3.0.
        // Place entries in the compiler tables allocating
        // eight bytes for storage and indicating that this
        // variable is double-precision floating point.
    double y = 4.0;   // Same for this symbol,
                      // but set the value to 4.0.
    double z = 0.0;   // Same for this symbol,
                      // but set the value to 0.0.
    z = x + y;        // Do the addition that is
                      // proper for this data type.
                      // The data declarations determine the operation.
Another Java Code Fragment
Here is more code, similar to the first fragment.
    float a = 3.0f;   // 32 bits or four bytes
    float b = 4.0f;   // 32 bits or four bytes
    float c = 0.0f;   // 32 bits or four bytes
    double x = 3.0;   // 64 bits or eight bytes
    double y = 4.0;   // 64 bits or eight bytes
    double z = 0.0;   // 64 bits or eight bytes
    // More declarations and code here.
    c = a + b;        // Single-precision floating-point
                      // addition is done here
    z = x + y;        // Double-precision floating-point
                      // addition is done here
The operations “c = a + b” and “z = x + y” have no meaning, apart from the data types recorded by the compiler.
More on This Code: IBM® 370 Assembler Equivalents
Here is the sort of thing that might happen.
Assume the data declarations given above, and repeated here in a somewhat altered fashion.
    float  a = 3.0f, b = 4.0f, c = 0.0f ;
    double x = 3.0,  y = 4.0,  z = 0.0 ;

    c = a + b ;   // LE  0,A    Instructions appropriate
                  // AE  0,B    for single-precision
                  // STE 0,C    floating point data.
    z = x + y ;   // LD  2,X    Instructions appropriate
                  // AD  2,Y    for double-precision
                  // STD 2,Z    floating point data.
IMPORTANT POINT
The assembler code emitted is entirely dependent on the data types for the operands, as declared earlier in the program.
Another note: When possible, the compiler will avoid immediate reuse of registers, in an attempt to keep as much data as possible in registers for later use. This makes the code more efficient.
Side Note: How to Write a Better Compiler
Consider the following Java fragment, focusing on how it might be compiled and how it should be compiled.
    double v = 0.0, w = 0.0, x = 3.0, y = 4.0, z = 5.0 ;
    w = x + y ;
    v = x + y + z ;
The example below shows inefficient code of the type actually emitted by an early-1970s-era compiler. The modern compiler keeps the sum x + y in register 0 and reuses it as a partial sum in the next result, x + y + z.
    Older Compiler        Modern Compiler
    LD  0,X               LD  0,X
    AD  0,Y               AD  0,Y
    STD 0,W               STD 0,W
    LD  0,X
    AD  0,Y
    AD  0,Z               AD  0,Z
    STD 0,V               STD 0,V
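The difference between the two columns can be sketched as a toy code generator that remembers which partial sum register 0 currently holds. This is a hypothetical illustration, far simpler than any real optimizer. It assumes only sums of double-precision operands, a single register, and no operand being reassigned between statements. All names are invented.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: generate LD/AD/STD for "target = a + b + ...",
// optionally reusing a partial sum already sitting in register 0.
class RegisterReuseSketch {
    private final List<String> code = new ArrayList<>();
    private List<String> inRegister = new ArrayList<>();  // operands summed in register 0
    private final boolean reuse;

    RegisterReuseSketch(boolean reuse) {
        this.reuse = reuse;
    }

    void assignSum(String target, String... operands) {
        List<String> wanted = Arrays.asList(operands);
        int start;
        boolean prefixHeld = reuse && !inRegister.isEmpty()
                && wanted.size() >= inRegister.size()
                && wanted.subList(0, inRegister.size()).equals(inRegister);
        if (prefixHeld) {
            start = inRegister.size();            // partial sum is already loaded
        } else {
            code.add("LD  0," + wanted.get(0));   // start the sum from scratch
            start = 1;
        }
        for (int i = start; i < wanted.size(); i++) {
            code.add("AD  0," + wanted.get(i));
        }
        code.add("STD 0," + target);
        inRegister = wanted;                      // register 0 now holds this sum
    }

    List<String> code() {
        return code;
    }

    public static void main(String[] args) {
        for (boolean reuse : new boolean[] {false, true}) {
            RegisterReuseSketch gen = new RegisterReuseSketch(reuse);
            gen.assignSum("W", "X", "Y");
            gen.assignSum("V", "X", "Y", "Z");
            System.out.println((reuse ? "Modern: " : "Older:  ") + gen.code());
        }
    }
}
```

With reuse turned off the sketch emits the seven instructions of the left column. With reuse turned on it emits the five instructions of the right column.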
A Final Comment on Assembler Addresses
Here is some code that will work, though it is strange.
    LD  0,X            LOAD REGISTER 0 FROM ADDRESS X
    AD  0,Y            ADD VALUE AT ADDRESS Y
    STD 0,Z            STORE RESULT INTO ADDRESS Z
    LD  2,Z            RESULT NOW IN REGISTER 2

    X   DC  D'3.0'     DOUBLE-PRECISION FLOAT: 8 BYTES
    Y   DC  D'4.0'     DOUBLE-PRECISION FLOAT: 8 BYTES
    Z   DS  F          32-BIT INTEGER: 4 BYTES
    Z1  DS  H          16-BIT INTEGER: 2 BYTES
    Z2  DS  H          16-BIT INTEGER: 2 BYTES
The code above will work. At the end, register 2 will have the correct result.
What about the strange declarations for Z, Z1, and Z2? All that matters is that the STD 0,Z instruction has eight contiguous bytes into which to place its result so that the LD 2,Z instruction can retrieve it. This is done.
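The same point can be demonstrated in Java with a raw byte buffer standing in for the eight bytes at Z, Z1, and Z2. This is only an analogy under stated assumptions: Java stores doubles in IEEE format rather than the System/370 hexadecimal format, and the class and method names are hypothetical.

```java
import java.nio.ByteBuffer;

// Hypothetical sketch: eight contiguous bytes "declared" as a 4-byte integer
// and two 2-byte integers can still hold and return an 8-byte double.
class ContiguousBytesSketch {
    // Store a double at offset 0 (like STD 0,Z), then reload it (like LD 2,Z).
    static double storeAndReload(double value) {
        ByteBuffer storage = ByteBuffer.allocate(8);   // Z (4) + Z1 (2) + Z2 (2)
        storage.putDouble(0, value);
        return storage.getDouble(0);
    }

    // Read the first four bytes of the stored double as if they were the integer Z.
    static int firstFourBytesAsInt(double value) {
        ByteBuffer storage = ByteBuffer.allocate(8);
        storage.putDouble(0, value);
        return storage.getInt(0);
    }

    public static void main(String[] args) {
        System.out.println(storeAndReload(7.0));
        // The same bytes viewed as the "integer" Z are meaningless as a count:
        System.out.println(Integer.toHexString(firstFourBytesAsInt(7.0)));
    }
}
```

The stored value comes back intact because the load and store agree on the format. The declared pieces of the storage area play no part.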
Summary
The most obvious conclusion is that it is not appropriate to discuss assembler language code in terms of variables.
The name “variable” should be reserved for higher-level compiled languages in which a data type is attached to each data symbol.
Here is a brief comparison.
    Language                    Assembler                  Compiled
    Data type determined by     Operation                  Data declaration
    Attributes of the symbol    Address                    Address
                                Storage size               Storage size
                                (the operation may         Data type as declared
                                override this)