Views of Memory
We begin with a number of views of computer memory and comment on their use.
The simplest view of memory is that presented at the ISA (Instruction Set Architecture) level. At this level, memory is a monolithic addressable unit: a repository for data and instructions, with no internal structure apparent. For some very primitive computers, this is the actual structure.
In this view, the CPU issues addresses and control signals. It receives instructions and data from the memory and writes data back to the memory. This is the view that suffices for many high–level language programmers. In no modern architecture does the CPU write instructions to the main memory.
The Logical Multi–Level View of Memory
In a course such as this, we want to investigate the internal memory structures that allow for more efficient and secure operations. The logical view for this course is a three–level view with cache memory, main memory, and virtual memory.
The primary memory is backed by a “DASD” (Direct Access Storage Device), an external high–capacity device. While “DASD” is a generic name for any device that meets certain specifications, the standard disk drive is the only such device currently in use that “fits the bill”. Thus DASD = Disk.
This is the view we shall take when we analyze cache memory.
A More Realistic View of Multi–Level Memory
Generic Primary / Secondary Memory
This lecture covers two related subjects: Virtual Memory and Cache Memory. The two subjects have much in common; their theoretical basis is the same. In each case, we have a fast primary memory backed by a bigger secondary memory. The “actors” in the two cases are as follows:
Technology      | Cache Memory      | Virtual Memory
Faster Memory   | SRAM Cache        | DRAM Main Memory
Slower Memory   | DRAM Main Memory  | Disk
Block Name      | Block             | Page
Block Fits Into | Cache Line        | Page Frame
Block Size      | 32 to 128 bytes   | 512 to 4,096 bytes (always a multiple of 512)
NOTE: The logical block of secondary memory is always the same size as its frame.
Cache memory: a 32–byte cache line implies memory blocks of 32 bytes.
Virtual memory: a 512–byte page frame implies pages of 512 bytes.
Effective Access Time
The discussion of effective access time should be considered in the generic model of primary (faster) memory vs. secondary (slower and larger) memory.
Faster memory: access time = TP (Primary Time).
Slower memory: access time = TS (Secondary Time).
We always have TP ≤ 0.10 · TS. For virtual memory, TP ≤ 0.00010 · TS.
Effective Access Time: TE = h · TP + (1 – h) · TS, where h (the primary hit rate) is the fraction of memory accesses satisfied by the primary memory; 0.0 ≤ h ≤ 1.0.
This formula extends to multi–level caches. For example, a two–level cache has
TE = h1 · T1 + (1 – h1) · h2 · T2 + (1 – h1) · (1 – h2) · TS.
NOTATION WARNING: In some contexts, the DRAM main memory is called “primary memory”. I never use that terminology when discussing multi–level memory.
Examples: Cache Memory
Suppose a single cache fronting a main memory that has an 80 nanosecond access time, and suppose the cache memory has an access time of 10 nanoseconds.

If the hit rate is 90%, then
TE = 0.9 · 10.0 + (1 – 0.9) · 80.0 = 0.9 · 10.0 + 0.1 · 80.0 = 9.0 + 8.0 = 17.0 nsec.

If the hit rate is 99%, then
TE = 0.99 · 10.0 + (1 – 0.99) · 80.0 = 0.99 · 10.0 + 0.01 · 80.0 = 9.9 + 0.8 = 10.7 nsec.
Now suppose an L1 cache with T1 = 4 nanoseconds and h1 = 0.9, an L2 cache with T2 = 10 nanoseconds and h2 = 0.99, and a main memory with TS = 80.0 nanoseconds. Here h2 is defined to be the hit rate on references that miss at L1.
TE = h1 · T1 + (1 – h1) · h2 · T2 + (1 – h1) · (1 – h2) · TS
   = 0.90 · 4.0 + 0.1 · 0.99 · 10.0 + 0.1 · 0.01 · 80.0
   = 0.90 · 4.0 + 0.1 · 9.9 + 0.1 · 0.80
   = 3.6 + 0.99 + 0.08 = 4.67 nanoseconds.
Note that with these hit rates, only 0.1 · 0.01 = 0.001 = 0.1% of the memory references are handled by the much slower main memory.
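To tie the formulas to something executable, here is a minimal C sketch (the function names are mine, not the lecture's) that reproduces the three results above:

    #include <stdio.h>

    /* One-level model: TE = h*TP + (1 - h)*TS */
    static double effective_access_time(double h, double tp, double ts) {
        return h * tp + (1.0 - h) * ts;
    }

    /* Two-level model: TE = h1*T1 + (1 - h1)*h2*T2 + (1 - h1)*(1 - h2)*TS */
    static double effective_access_time2(double h1, double t1,
                                         double h2, double t2, double ts) {
        return h1 * t1 + (1.0 - h1) * h2 * t2 + (1.0 - h1) * (1.0 - h2) * ts;
    }

    int main(void) {
        printf("%.1f nsec\n", effective_access_time(0.90, 10.0, 80.0));  /* 17.0 */
        printf("%.1f nsec\n", effective_access_time(0.99, 10.0, 80.0));  /* 10.7 */
        printf("%.2f nsec\n",
               effective_access_time2(0.90, 4.0, 0.99, 10.0, 80.0));     /* 4.67 */
        return 0;
    }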
Example: Virtual Memory
Suppose a main memory with an access time of 80 nanoseconds backed by a disk with an access time of 10 milliseconds (10,000,000 nanoseconds). Suppose 1,024–byte disk blocks and a hit rate of 0.9981 (about 1,022 / 1,024).
Then, TE = h · TP + (1 – h) · TS
         = 0.9981 · 80 + 0.0019 · 10,000,000
         = 79.8 + 19,000 = 19,080 nanoseconds (approximately).
Now suppose 1,024–byte disk blocks and a hit rate of 0.9999.
Then, TE = h · TP + (1 – h) · TS
         = 0.9999 · 80 + 0.0001 · 10,000,000
         = 80.0 + 1,000 = 1,080 nanoseconds (approximately).
One can easily see that effective virtual memory is based on having a very high hit rate.
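The one–level helper from the earlier sketch reproduces these figures directly:

    effective_access_time(0.9981, 80.0, 1.0e7);  /* 19,079.85, about 19,080 nsec */
    effective_access_time(0.9999, 80.0, 1.0e7);  /*  1,079.99, about  1,080 nsec */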
The Principle of Locality
It is the principle of locality that makes this multi–level memory hierarchy work. There are a number of ways to state this principle, which refers to the observed tendency of computer programs to issue addresses in a small range.
Temporal Locality: if a program references an address, it will likely reference it again soon.

Spatial Locality: if a program references an address, the next address issued will likely be “close” (see the sketch after this list).

Sequential Locality: programs tend to issue sequential addresses, executing one instruction after another.
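As a concrete illustration (a hypothetical C sketch, not from the lecture): a two–dimensional array in C is stored in row–major order, so the loop order below determines whether successive references touch adjacent addresses.

    #include <stddef.h>

    #define N 1024
    static double a[N][N];

    /* Good spatial locality: consecutive iterations touch adjacent addresses. */
    double sum_row_major(void) {
        double s = 0.0;
        for (size_t i = 0; i < N; i++)
            for (size_t j = 0; j < N; j++)
                s += a[i][j];
        return s;
    }

    /* Poor spatial locality: consecutive iterations are N * sizeof(double)
       bytes apart, so almost every reference falls in a different block. */
    double sum_col_major(void) {
        double s = 0.0;
        for (size_t j = 0; j < N; j++)
            for (size_t i = 0; i < N; i++)
                s += a[i][j];
        return s;
    }

Both functions also show temporal locality (the repeated references to s) and sequential locality (the instruction stream of the loops).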
In 1968, Peter J. Denning noted what he called the “working set”: the tendency of computer programs to spend considerable periods of time issuing addresses within a small range.
In the Virtual Memory arena, this causes a small number of pages to be read into the main memory and used continuously for a significant time. It is precisely this set of pages that constitutes the working set.
Generic Primary / Secondary Memory View
A small, fast, expensive memory is backed by a large, slow, cheap memory. Memory references are first made to the smaller memory.
1. If the address is present, we have a “hit”.
2. If the address is absent, we have a “miss” and must transfer the addressed item from the slow memory. For efficiency, we transfer as a unit the block containing the addressed item.
The mapping of the secondary memory to the primary memory is “many to one” in that each primary memory block can hold contents from any of a number of secondary memory addresses. To keep track of which addresses are present, we associate a tag with each primary block.
For example, consider a byte–addressable memory with 24–bit addresses and 16–byte blocks. The memory address would have six hexadecimal digits. Consider the 24–bit address 0xAB7129. The block containing that address holds every item whose address begins with 0xAB712: 0xAB7120, 0xAB7121, … , 0xAB7129, 0xAB712A, … , 0xAB712F.
The primary block would have 16 entries, indexed 0 through F. It would have the 20–bit tag 0xAB712 associated with the block, either explicitly or implicitly.
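As a sketch of the address arithmetic (the function names are mine): with 24–bit addresses and 16–byte blocks, the tag and offset are obtained by shifting and masking.

    #include <stdint.h>
    #include <stdio.h>

    /* 24-bit address, 16-byte blocks: low 4 bits are the offset,
       the remaining 20 bits are the tag. */
    static uint32_t tag_of(uint32_t addr)    { return (addr >> 4) & 0xFFFFF; }
    static uint32_t offset_of(uint32_t addr) { return addr & 0xF; }

    int main(void) {
        printf("tag = 0x%05X, offset = 0x%X\n",
               tag_of(0xAB7129), offset_of(0xAB7129));
        /* Prints: tag = 0xAB712, offset = 0x9 */
        return 0;
    }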
Valid and Dirty Bits
Consider a cache line with eight bytes and a tag field.

Tag \ Offset | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7
0            | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0

What does this cache line contain?
1. It contains nothing, as data have not been copied from main memory.
2. It contains the first 8 bytes of an all–zero array at address 0.

How do we know which it is? We need some tags associated with each cache line.
At system start–up, the faster memory contains no valid data; data are copied as needed from the slower memory.
Each block would have three fields associated with it:

The tag field (discussed above), identifying the memory addresses contained.

Valid bit: set to 0 at system start–up;
  set to 1 when valid data have been copied into the block.

Dirty bit: set to 0 at system start–up;
  set to 1 whenever the CPU writes to the faster memory;
  set to 0 whenever the contents are copied to the slower memory.
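A plausible C rendering of such a block with its three fields (a sketch using the 16–byte block size of the running example; the names are mine):

    #include <stdbool.h>
    #include <stdint.h>
    #include <string.h>

    #define BLOCK_SIZE 16  /* bytes per block, matching the running example */

    struct cache_line {
        uint32_t tag;               /* 20-bit tag, held in a 32-bit field    */
        bool     valid;             /* 0 at start-up; 1 once data are copied */
        bool     dirty;             /* 1 once the CPU writes to this block   */
        uint8_t  data[BLOCK_SIZE];
    };

    /* At system start-up, every line is invalid and clean. */
    static void init_line(struct cache_line *line) {
        memset(line, 0, sizeof *line);
    }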
Associative Memory
Associative memory is “content addressable” memory: the contents of the memory are searched in one memory cycle.
Consider an array of 256 entries, indexed from 0 to 255 (or 0x0 to 0xFF). Suppose that we are searching the memory for the entry 0xAB712. Normal memory would be searched using a standard search algorithm, as learned in beginning programming classes. If the memory is unordered, it would take on average 128 searches to find an item; if the memory is ordered, binary search would find it in 8 searches. Associative memory would find the item in one search.
Think of the control circuitry as “broadcasting” the data value (here 0xAB712) to all memory cells at the same time. If one of the memory cells has the value, it raises a Boolean flag and the item is found.
We do not consider duplicate entries in the associative memory. Duplicates can be handled by some rather straightforward circuitry, but this is not done in associative caches.
Associative Cache
We now focus on cache memory, returning to virtual memory only at the end.
Primary memory = Cache Memory (assumed to be one level).
Secondary memory = Main DRAM.
Assume a number of cache lines, each holding 16 bytes, and a 24–bit address.
Conceptually, the simplest arrangement is an associative cache; it is also the hardest to implement. Divide the 24–bit address into two parts: a 20–bit tag and a 4–bit offset.
Bits   | 23 – 4 | 3 – 0
Fields | Tag    | Offset
A cache line in this arrangement would have the following format.
D bit | V bit | Tag     | 16 indexed entries
0     | 1     | 0xAB712 | M[0xAB7120] … M[0xAB712F]
The placement of the 16–byte block of memory into the cache would be determined by a cache line replacement policy. The policy would probably be as follows:

1. First, look for a cache line with V = 0. If one is found, then it is “empty” and available, as nothing is lost by writing into it.
2. If all cache lines have V = 1, look for one with D = 0. Such a cache line can be overwritten without first copying its contents back to main memory.
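The following C sketch models the lookup and the replacement preference just listed; the names and cache size are mine. In real hardware the tag comparison is performed in parallel by the associative memory, so the sequential loops here only model the outcome.

    #include <stdbool.h>
    #include <stdint.h>

    #define NUM_LINES 64  /* an assumed number of cache lines */

    struct assoc_line {
        bool     valid, dirty;
        uint32_t tag;         /* 20-bit tag */
        uint8_t  data[16];
    };

    static struct assoc_line cache[NUM_LINES];

    /* Return the index of the line holding this tag, or -1 on a miss. */
    static int lookup(uint32_t tag) {
        for (int i = 0; i < NUM_LINES; i++)
            if (cache[i].valid && cache[i].tag == tag)
                return i;
        return -1;
    }

    /* Victim selection: prefer a line with V = 0, then one with D = 0;
       otherwise fall back to line 0 (a real policy would choose better). */
    static int choose_victim(void) {
        for (int i = 0; i < NUM_LINES; i++)
            if (!cache[i].valid) return i;
        for (int i = 0; i < NUM_LINES; i++)
            if (!cache[i].dirty) return i;
        return 0;
    }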
Direct–Mapped Cache
This is the simplest to implement, as the cache line index is determined by the address. Assume 256 cache lines, each holding 16 bytes, and a 24–bit address. Recall that 256 = 2^8, so we need eight bits to select the cache line.
Divide the 24–bit address into three fields: a 12–bit explicit tag, an 8–bit line number, and a 4–bit offset within the cache line. Note that the 20–bit memory tag is divided between the 12–bit cache tag and the 8–bit line number.
Bits         | 23 – 12 | 11 – 4 | 3 – 0
Cache View   | Tag     | Line   | Offset
Address View | Block Number     | Offset
Consider the address 0xAB7129. It would have Tag = 0xAB7, Line = 0x12, Offset = 0x9.
Again, the cache line would contain M[0xAB7120] through M[0xAB712F]. The cache line would also have a V bit and a D bit (Valid and Dirty bits).
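The field extraction for this direct–mapped organization is again simple shifting and masking, as in this hypothetical C sketch:

    #include <stdint.h>
    #include <stdio.h>

    /* 24-bit address, 256 lines of 16 bytes:
       bits 23-12 tag, bits 11-4 line, bits 3-0 offset. */
    static uint32_t dm_tag(uint32_t a)    { return (a >> 12) & 0xFFF; }
    static uint32_t dm_line(uint32_t a)   { return (a >> 4) & 0xFF;  }
    static uint32_t dm_offset(uint32_t a) { return a & 0xF; }

    int main(void) {
        uint32_t addr = 0xAB7129;
        printf("tag = 0x%03X, line = 0x%02X, offset = 0x%X\n",
               dm_tag(addr), dm_line(addr), dm_offset(addr));
        /* Prints: tag = 0xAB7, line = 0x12, offset = 0x9 */
        return 0;
    }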
This simple implementation often works, but it is a bit rigid. A design that is a blend of the associative cache and the direct–mapped cache might be useful.
Set–Associative Caches
An N–way set–associative cache uses direct mapping, but allows a set of N memory blocks to be stored in each cache line. This allows some of the flexibility of a fully associative cache, without the complexity of a large associative memory for searching the cache.
Suppose a 2–way set–associative implementation of the same cache memory. Again assume 256 cache lines, each holding 16 bytes, and a 24–bit address. Recall that 256 = 2^8, so we need eight bits to select the cache line.
Consider addresses 0xCD4128 and 0xAB7129. Each would be stored in cache line 0x12. Set 0 of this cache line would hold one block, and set 1 would hold the other.
Entry 0                                 | Entry 1
D | V | Tag   | Contents                | D | V | Tag   | Contents
1 | 1 | 0xCD4 | M[0xCD4120] to M[0xCD412F] | 0 | 1 | 0xAB7 | M[0xAB7120] to M[0xAB712F]
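A minimal C sketch of the lookup in this 2–way organization (the names are mine, following the example above): the line number selects one cache line, and only the two entries in that line need their tags compared.

    #include <stdbool.h>
    #include <stdint.h>

    #define NUM_LINES 256
    #define ENTRIES   2    /* 2-way set-associative */

    struct sa_entry {
        bool     valid, dirty;
        uint32_t tag;          /* 12-bit tag */
        uint8_t  data[16];
    };

    static struct sa_entry cache[NUM_LINES][ENTRIES];

    /* Return which entry of the selected line holds the address,
       or -1 on a miss. */
    static int lookup(uint32_t addr) {
        uint32_t tag  = (addr >> 12) & 0xFFF;
        uint32_t line = (addr >> 4) & 0xFF;
        for (int e = 0; e < ENTRIES; e++)
            if (cache[line][e].valid && cache[line][e].tag == tag)
                return e;
        return -1;
    }

With the table above, both 0xCD4128 and 0xAB7129 select line 0x12, where they occupy entry 0 and entry 1 respectively.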