Mark D. Hill and Alan Jay Smith

EECS Department, University of California, Berkeley

Technical Report No. UCB/CSD-84-175, 1984

http://www2.eecs.berkeley.edu/Pubs/TechRpts/1984/CSD-84-175.pdf

Advances in integrated circuit density are permitting the implementation on a single chip of functions and performance enhancements beyond those of a basic processor. One performance enhancement of proven value is a cache memory; placing a cache on the processor chip can reduce both mean memory access time and bus traffic. In this paper we use trace-driven simulation to study design tradeoffs for small (on-chip) caches. Miss ratio and traffic ratio (bus traffic) are the metrics for cache performance. Particular attention is paid to sub-block caches (also known as sector caches), in which address tags are associated with blocks, each of which contains multiple sub-blocks; sub-blocks are the transfer unit. Using traces from two 16-bit architectures (Z8000, PDP-11) and two 32-bit architectures (VAX-11, System/370), we find that general-purpose caches of 64 bytes (net size) are marginally useful in some cases, while 1024-byte caches perform fairly well; typical miss and traffic ratios for a 1024-byte (net size) cache, 4-way set associative with 8-byte blocks are: PDP-11: .039, .156, .060, VAX-11: .080, .160, Sys/370: .244, .489. (These figures are based on traces of user programs, and the performance obtained in practice is likely to be worse.) The use of sub-blocks allows tradeoffs between miss ratio and traffic ratio for a given cache size. Load forward is quite useful. Extensive simulation results are presented.
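The two metrics named in the abstract can be illustrated with a minimal trace-driven sketch. This is not the authors' simulator: the function name `simulate`, its parameters, the LRU replacement policy, and the definition of traffic ratio used here (bytes fetched on misses divided by the bytes that would cross the bus with no cache, one word per reference, write traffic ignored) are all assumptions made for illustration. Load forward is not modeled.

```python
from collections import OrderedDict

def simulate(trace, cache_bytes=1024, block_bytes=8, sub_block_bytes=8,
             ways=4, word_bytes=2):
    """Return (miss_ratio, traffic_ratio) for a list of byte addresses.

    Hypothetical sketch: LRU replacement, reads only, and traffic ratio
    defined as bytes fetched / (references * word_bytes).
    """
    num_sets = cache_bytes // (block_bytes * ways)
    subs_per_block = block_bytes // sub_block_bytes
    # One OrderedDict per set: tag -> per-sub-block valid bits,
    # ordered from least to most recently used.
    sets = [OrderedDict() for _ in range(num_sets)]
    misses = 0
    bytes_fetched = 0
    for addr in trace:
        block = addr // block_bytes
        index = block % num_sets
        tag = block // num_sets
        sub = (addr % block_bytes) // sub_block_bytes
        lines = sets[index]
        valid = lines.get(tag)
        if valid is None:
            # Tag miss: evict the LRU block if the set is full, then
            # allocate a block with every sub-block invalid.
            if len(lines) >= ways:
                lines.popitem(last=False)
            valid = [False] * subs_per_block
            lines[tag] = valid
        else:
            # Tag hit: mark the block most recently used.
            lines.move_to_end(tag)
        if not valid[sub]:
            # Only the referenced sub-block is transferred.
            valid[sub] = True
            misses += 1
            bytes_fetched += sub_block_bytes
    refs = len(trace)
    return misses / refs, bytes_fetched / (refs * word_bytes)
```

With the toy trace `[0, 2, 0, 2]` and the defaults above, `simulate` returns a miss ratio of 0.25 and a traffic ratio of 1.0; passing `sub_block_bytes=2` raises the miss ratio to 0.5 but lowers the traffic ratio to 0.5, which is the kind of tradeoff the abstract attributes to sub-blocks. Under the assumed definition, the traffic ratio equals the miss ratio times the transfer size over the word size, which agrees with the VAX-11 figures quoted above (.080 × 8/4 = .160).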


BibTeX citation:

@techreport{Hill:CSD-84-175,
    Author= {Hill, Mark D. and Smith, Alan Jay},
    Title= {Experimental Evaluation of On-Chip Microprocessor Cache Memories},
    Year= {1984},
    Month= {Apr},
    Url= {http://www2.eecs.berkeley.edu/Pubs/TechRpts/1984/5964.html},
    Number= {UCB/CSD-84-175},
    Abstract= {Advances in integrated circuit density are permitting the implementation on a single chip of functions and performance enhancements beyond those of a basic processor. One performance enhancement of proven value is a cache memory; placing a cache on the processor chip can reduce both mean memory access time and bus traffic. In this paper we use trace-driven simulation to study design tradeoffs for small (on-chip) caches. Miss ratio and traffic ratio (bus traffic) are the metrics for cache performance. Particular attention is paid to sub-block caches (also known as sector caches), in which address tags are associated with blocks, each of which contains multiple sub-blocks; sub-blocks are the transfer unit. Using traces from two 16-bit architectures (Z8000, PDP-11) and two 32-bit architectures (VAX-11, System/370), we find that general-purpose caches of 64 bytes (net size) are marginally useful in some cases, while 1024-byte caches perform fairly well; typical miss and traffic ratios for a 1024-byte (net size) cache, 4-way set associative with 8-byte blocks are: PDP-11: .039, .156, .060, VAX-11: .080, .160, Sys/370: .244, .489. (These figures are based on traces of user programs, and the performance obtained in practice is likely to be worse.) The use of sub-blocks allows tradeoffs between miss ratio and traffic ratio for a given cache size. Load forward is quite useful. Extensive simulation results are presented.},
}

EndNote citation:

%0 Report
%A Hill, Mark D. 
%A Smith, Alan Jay 
%T Experimental Evaluation of On-Chip Microprocessor Cache Memories
%I EECS Department, University of California, Berkeley
%D 1984
%@ UCB/CSD-84-175
%U http://www2.eecs.berkeley.edu/Pubs/TechRpts/1984/5964.html
%F Hill:CSD-84-175