Chapter 7D – The Java Virtual Machine

This sub–chapter discusses another architecture, that of the JVM (Java Virtual Machine).  In
general, a VM (Virtual Machine) is a hypothetical machine (implemented in either hardware
or software) that directly executes a specific language.  The JVM is a machine that directly
executes Java byte code, which is the output of the Java compiler.  While it is reliably rumored
that a hardware implementation of the JVM has been designed, all existing commercial Java
virtual machines are implemented as software running on top of other machines.  It is worth
note that the term “Java” in the context of a programming language is a trademark of the Oracle
Corporation, which acquired Sun Microsystems on January 27, 2010.

Strictly speaking, the term “Java Virtual Machine” was created by Sun Microsystems, Inc. to
refer to the abstract specification of a computing machine designed to run Java programs [R024,
R025].  Correct implementations of the JVM are required to support the syntax of the language
(how to form expressions), as well as the semantics (what these expressions actually do).

As in many similar cases, the name is also applied to any software that actually implements the specification. 
So many books might make claims such as “The Java Virtual Machine (JVM) is
the software that executes compiled Java bytecode … the name given to the machine language
inside compiled Java programs.”[page 321, R019].

The term “Java” is used primarily in two contexts, that of a Java applet and that of a Java
application.  A Java application appears to be a standard program, which may or may not have
a graphical appearance.  A Java applet is run using a Java–enabled web browser.

When a Java method executes, it has its own stack frame, which is managed by the JVM.  The
two areas of the stack frame of interest to us are 1) the operand stack and 2) the local variable
storage.  As we shall see in a future lecture, a stack frame is a dynamically allocated section of
memory used to store values for the execution of a particular method.  While the stack frame
has many of the attributes of a stack, it is accessed using methods other than the standard stack
operators.  On the other hand, the operand stack is handled much as the pure data type.

The Java Operand Stack

The operand stack holds operands for the Java instructions.  Each entry in the stack occupies
four bytes (32 bits), even those holding shorter data types.  The operand stack is handled
basically as a LIFO (Last In First Out) data structure, except that certain instructions will
pop two items at a time.

The local variable storage for a method is handled very much as a zero–based array.  At
the JVM level, each variable is identified by its index in this array.  Interaction between
the operand stack and local variable storage is illustrated by the following Java byte code
examples, which will require more explanation.

iload_1    // Push the integer value in storage location
               // 1 onto the stack

    istore_2   // Pop the value off the stack and store it
               // as an integer in storage location 2.

We shall explain these instructions more when we discuss stack operations.


Operations on the JVM Operand Stack

Here we review the basic operations on a stack, viewed from both the ADT (Abstract Data
Type) approach and also from the modified approach taken by the JVM.  As an ADT, a
stack is a LIFO data structure with only a few basic operations.  The two basic operations
we shall consider are Push and Pop.  These are described first using the IA–32 assembler syntax.

   PUSH X   ; Push the value at location X onto the stack.

   POP  Y   ; Pop the last value pushed onto the stack
            ; and store it into the location Y.

The description of the stack as a LIFO data structure illustrates its basic operation.  Whatever
is last put on the stack is the first removed.  We shall adopt the standard terminology that any
item is added to the top of the stack and removed from that location.  While any correct stack
implementation must be based on a SP (Stack Pointer), we shall not use that detail here. In the
IA–32 architecture, the 32–bit stack pointer is called ESP.

Here is a short sequence of stack operations, just to serve as a review for the ADT. 
We begin by assuming the following associations of values with locations.

Location

Value

X1

1

X2

2

Y

0

In this sequence, we show only that part of the stack that is directly relevant.

   PUSH X1   ; Push the value at location X1 onto the stack.

   PUSH X2   ; Push the value at location X2 onto the stack.

   POP Y     ; Pop the value from the stack and place into Y.
             ; The value 2 is no longer a part of the stack.

   PUSH X1   ; Push the value at location X1 onto the stack.

 


 

The JVM operand stack is handled fairly much as the abstract data type. 
Here are the key differences.
      1.      The push operation is replaced by a variety of load instructions.
      2.      The pop operation is augmented by a variety of store operations.
      3.      There are a number of operations that pop two operands at a time.
      4.      Some of the above operations pop two operands and then push one result.

Java Primitive Data Types

The primitive data types for the JVM are those for the Java language.  Here is a table.

Data Type

Bytes

Format

Range

char

2

Unicode character

\u0000 to \uFFFF *

byte

1

Signed Integer

–128 to + 127

short

2

Signed Integer

– 32768 to + 32767

int

4

Signed Integer

– 2147483648 to
+ 2147483647

long

8

Signed Integer

– (263) to 263 – 1

float

4

IEEE Single Precision Float

Magnitude: 1.40·10–45 to
3.40
·1038, and zero

double

8

IEEE Double Precision Float

Magnitude: 4.94·10–3224 to
1.798
·10308, and zero

* The character with 16–bit code 0x0000 through that with 16–bit code 0xFFFF.

While the primitive data types are as expected, the JVM storage is based on 32–bit (4 byte)
entities.  Specifically, the operand stack is organized into 32–bit slots and the local variable
storage is considered as an array of 32–bit words of no particular data type.

The correspondence between the Java language primitive data types and the storage allocations
of the JVM is simple: the two 64–bit data types (long and double) are each stored in two
32–bit entries in the local variable array and stored as two entries on the operand stack.  All
other data types are assigned one 32–bit storage location.

Put another way, from the JVM storage view, each data type is either a single word or a
double word.  The JVM instructions interpret these as appropriate.  Consider the following
two Java byte code instructions

       dadd     pop two double–precision floating point values from the stack,
               add them, and push the result back onto the stack

       ladd     pop two long (64 bit) signed integer values from the stack,
               add them, and push the result back onto the stack.

In either case, we have the following stack conditions.

Before

After

Value1–Word1

Result–Word1

Value1–Word2

Result–Word2

Value2–Word1

 

Value2–Word1

 

Something else

 

It is interesting to note that Java stores all multi–byte values in big–endian order, as opposed
to the Intel architectures, which use little–endian order.  Big–endian order is also called
network order”, in that it is used for values transmitted over the global Internet.  As the
JVM provides no instructions for direct access of the memory of the base hardware (Intel,
Motorola, Sparc, etc.), this hardware independence causes no problem.

A Sample of Some Instructions

At this time, it might be convenient to show a sample of Java source code and discuss its
representation as instructions at the bytecode level.  To avoid confusion, this author finds
it expedient to discuss the bytecode instructions before using them. 

In Java bytecodes, each instruction contains a one–byte opcode followed by zero or more
operands.  The name “bytecode” comes from the size of the opcode.  It appears that the JVM
can accommodate at most 28 = 256 different opcodes; though there is a way around this limit.

The following example shows two ways to load integers; that is, push the values onto the
operand stack.  The
iload instructions load from the indexed variable storage, and an
interesting variety of
iconst instructions load constant values.  We first examine the
iconst instructions, noting that each has a separate opcode.

Opcode           Instruction           Comment
0x02                iconst_m1        Push the integer constant –1 onto the stack.
0x03               
iconst_0          Push the integer constant 0 onto the stack
0x04               
iconst_1          Push the integer constant 1 onto the stack
0x05               
iconst_2          Push the integer constant 2 onto the stack
0x06               
iconst_3          Push the integer constant 3 onto the stack
0x07               
iconst_4          Push the integer constant 4 onto the stack
0x08               
iconst_5          Push the integer constant 5 onto the stack

There are two variants of the iload instruction, one for general indices into the local
variable array and one dedicated to the first four elements of this array.  Here we focus on
the latter instructions, each with its one byte opcode.  None of these has an argument.

Opcode           Instruction           Comment
0x1A               iload_0             Copy the integer in location 0 and push onto the stack.
0x1B              
iload_1             Copy the integer in location 1 and push onto the stack.
0x1C              
iload_2             Copy the integer in location 2 and push onto the stack.
0x1D              
iload_3             Copy the integer in location 3 and push onto the stack.

The other variant of the iload instruction has opcode 0x15.  Its appearance in byte code
is as the 2 byte entry
0x15 NN, where NN is the hexadecimal value of the variable index.  Thus
the code
0x15 0A would be read as “push the integer in element 10 of the variable array
onto the stack”.  Remember that 0x0A is the hexadecimal representation of decimal 10.


 

There are two variants of the istore instruction, one for general indices into the local
variable array and one dedicated to the first four elements of this array.  Here we focus on
the latter instructions, each with its one byte opcode.  None of these has an argument.

Opcode           Instruction           Comment
0x3B               istore_0          Pop the integer and store into location 0.
0x3C              
istore_1          Pop the integer and store into location 1.
0x3D              
istore_2          Pop the integer and store into location 2.
0x3E               
istore_3          Pop the integer and store into location 3.

The other variant of the istore instruction has opcode 0x36.  Its appearance in byte code
is as the 2 byte entry
0x36 NN, where NN is the hexadecimal value of the variable index.

With all of that said, here is the Java source code example.

   int a = 3 ;

   int b = 2 ;

   int sum = 0 ;

   sum = a + b ;

Here is the local variable table for this example.  Note that we make the reasonable assumption
that table space is reserved for the variables in the order in which they were declared.

Location Index

Variable stored

0

a

1

b

2

sum

Here is the Java bytecode for the above simple source language sequence.  In this, we add a
number of comments, each in the Java style, to help the reader make the connection.

//   int a = 3 ;

     iconst_3        // Push the constant value 3 onto the stack

     istore_0        // Pop the value and store in location 0.

//   int b = 2 ;

     iconst_2        // Push the constant value 2 onto the stack

     istore_1        // Pop the value and store in location 1

//   int sum = 0 ;

     iconst_0        // Push the constant value 0 onto the stack

     istore_2        // Pop the value and store in location 2

//   sum = a + b ;

     iload_0         // Push value from location 0 onto stack

     iload_1         // Push value from location 1 onto stack

     iadd            // Pop the top two values, add them,
                     // and push the value onto the stack.

     istore_2        // Pop the value and store in location 2.

Just to be very explicit about this, we shall show the state of the operand stack and
variable array as the bytecode above is executed.

In the diagrams below, the gray color in a box indicates that the entry is not valid logically
within the current context.  For example, following the execution of the
iconst_3 instruction,
the stack has only one valid entry, which contains the value 3.  Any entry on the stack that is
“below” the current entry might have meaning within another context, but not in this one.

Similarly, there are values in the variable array, but the values have no meaning in this context.

//   int a = 3 ;

     iconst_3        // Push the constant value 3 onto the stack

     istore_0        // Pop the value and store in location 0.

//   int b = 2 ;

     iconst_2        // Push the constant value 2 onto the stack

     istore_1        // Pop the value and store in location 1

//   int sum = 0 ;

     iconst_0        // Push the constant value 0 onto the stack

     istore_2        // Pop the value and store in location 2


 

To facilitate reading the example, we repeat the state of the JVM just before the addition.

//   sum = a + b ;

     iload_0         // Push value from location 0 onto stack

     iload_1         // Push value from location 1 onto stack

     iadd            // Pop the top two values, add them,
                     // and push the value onto the stack.

     istore_2        // Pop the value and store in location 2.

To complete the discussion of the example, we need to list the integer (32 bit two’s complement) arithmetic operations. 
Here is a table of the standard operations.

Opcode       Instruction    Comment
0x60            iadd               Pop the top two values, add them, and push the result onto the stack.
0x64            isub                Pop the two values, subtract the top one from the second one (next to
                                          top on the stack before subtraction), and push the result onto stack.

0x68            imul                Pop the two values, multiply them, and push the result onto the stack.

0x6C           idiv                Pops the two values, divide the value that was second from top by the
                                          value that was at the top of the stack, truncates the result to the nearest
                                          integer, and pushes the result onto the stack.

Here is the Java bytecode for the short sequence used in the example above.  It has ten bytes.

06 3B 05 3C 03 3D 1A 1B 60 3D


 

The Conditional Instructions

Conditional instructions are one of the class of instructions that separate a stored program
computer from a simple calculator.  It must be possible to compare two values of the same
type and make branching decisions based on the result.  The facility to compare two values
of different types (integer and float, float and double) will probably be provided by the
compiler of a higher level language, and involve type conversions.

The JVM divides these operations into two classes: conditional branch instructions and
comparison instructions.  The division between the two classes is based on operand type,
with the operators called “comparison instructions” used to compare long integers (64 bit),
single precision floating point values, and double precision floating point values.

In the JVM, the conditional each branch instructions pops one value from the stack, examines it,
and branches accordingly.  There are fourteen instructions in this class.  Each of these fourteen
is a three-byte instruction, with the following form.

Byte    Type                                          Range                      Comment
1          unsigned 1-byte integer field     0 to 255                    8-bit opcode
2, 3      signed 2-byte integer field         -32768 to 32767       branch offset

For each of these instructions, the branch target address is computed as (pc + branch_offset),
where pc (program counter) references the address of the opcode of the branch instruction,
and branch_offset is a 16-bit signed integer indicating the offset of the branch target.

The first set of instructions to discuss cover the Java null object.

Opcode  Instruction   Comment
0xC6             ifnull             This pops the top item from the operand stack.  If it is an object
                                            reference to the special object
null, the branch is taken.

0xC7             ifnonnull       The branch is taken if the item does not reference null.

The following twelve instructions treat each numeric value as a 32-bit signed integer.  While
the Java language supports signed integers with length of 8 bits and 16 bits, it appears that
each of these types is extended to 32 bits when pushed onto the stack.

Each of the instructions in the next set pops a single item off the stack, examines it as a 32-bit
integer value, and branches accordingly.

Opcode           Instruction     Comment

0x99                ifeq                  jump if the value is zero
0x9E                ifle                   jump if the value is zero or less than zero (not positive)
0x9B               iflt                   jump if the value is less than zero (negative)
0x9C               ifge                  jump if the value is zero or greater than zero (not negative)
0x9D               ifgt                  jump if the value is greater than zero (positive)
0x9A               ifne                  jump if the value is not zero


 

Each of the next instructions pops two items off the stack, compares them as 32-bit signed
integers, and branches accordingly.  For each of these instructions, we assume that the stack
status before the execution of the branch instruction is as follows.

V1  the 32-bit signed integer at the top of the stack
V2  the 32-bit signed integer next to the top of the stack.

Opcode  Instruction    Comment
0x9F      if_icmpeq      Branch if V1 == V2
0xA2      if_icmpge      Branch if V2 ≥ V1
0xA3      if_icmpgt      Branch if V2 > V1
0xA4      if_icmple       Branch if V2 ≤ V1
0xA1      if_icmplt       Branch if V2 < V1
0xA0      if_icmple       Branch if V1 ≠ V2

In the JVM, the comparison operators pop two values from the stack, compare them, and
then push a single integer onto the top of the stack to indicate the result.  Assume that,
prior to the execution of the comparison operation; the state of the stack is as follows.

V1_Word 1
V1_Word 2
V2_Word 1
V2_Word 2

Here is a description of the effect of this class of instructions.

Result of comparison

Value Pushed onto Stack

V1 > V2

– 1

V1 = V2

0

V1 < V2

1

Here is a list of the comparison instructions.

Opcode     Instruction     Comment
0x94          lcmp                Takes long integers from the stack (each being two words, four
                                          stack entries are popped), compares them, and pushes the result.

0x97          fcmpl               Each takes single–precision floats from the stack (two stack
0x98          fcmpg              entries in total), compares them and pushes the result.  The values
                                          +0.0 and –0.0 are treated as being equal.

0x95          dcmpl              Each takes double–precision floats from the stack (four stack
0x96          dcmpg             entries in total), compares them and pushes the result.  The values
                                          +0.0 and –0.0 are treated as being equal.

Note that there are two comparison operations for each floating point type.  This is based on
how one wants to compare the IEEE standard value NaN (Not a Number, use for results of
arithmetic operations such as 0/0).  The
fcmpl and dcmpl instructions return –1 if either
value is NaN.  The
fcmpg and dcmpg instructions return 1 if either value is NaN.


 

The JVM includes a number of other useful bytecode instructions, including the following.

1.    Instructions to push other integer constant values onto the stack.

2.      Instructions to handle other stack operations, such as discarding the top elements,
       and swapping the top element with the one just below it.

3.    A complete suite of logical operations.

4.    A complete suite of type conversion operations.

5.    Instructions for unconditional branches, subroutine calls, and subroutine returns.

The purpose of this sub–chapter has been to use the JVM (Java Virtual Machine) to illustrate
a commonly used stack–oriented architecture.  A full discussion of the JVM must include
very much more.