Performance Evaluation of Cache Prefetch Implementation

John Tse and Alan Jay Smith

EECS Department
University of California, Berkeley
Technical Report No. UCB/CSD-95-877
June 1995

http://www2.eecs.berkeley.edu/Pubs/TechRpts/1995/CSD-95-877.pdf

Prefetching into CPU caches has long been known to be effective in reducing the cache miss ratio, but implementations of prefetching have been unsuccessful in improving CPU performance. The reasons for this are that prefetches interfere with normal cache operation by keeping the cache address and data ports, the memory bus, and the memory banks busy, and that a prefetch is not necessarily complete by the time the prefetched data is actually referenced. In this paper, we present the results of a very detailed cycle-by-cycle, trace-driven simulation of a uniprocessor memory system, in which we vary several relevant parameters in order to determine when and if prefetching is useful. We find that in order for prefetching to actually improve performance, the address array needs to be double ported, and the data array needs to either be double ported or fully buffered. It is also very helpful for the bus to be reasonably wide, bus transactions to be split, and main memory to be interleaved. Under the best circumstances, i.e., with a significant investment in extra hardware, prefetching can significantly improve performance.
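The distinction the abstract draws between miss ratio and performance can be illustrated with a minimal sketch. The code below is not the authors' simulator: it is a hypothetical, untimed model of a direct-mapped cache with sequential prefetch-on-miss, and the function name `miss_ratio`, the cache geometry, and the trace are all assumptions chosen for illustration. It counts only demand misses and models none of the port, bus, or memory-bank contention that the report identifies as the reason a lower miss ratio may fail to translate into a speedup.

```python
def miss_ratio(trace, num_lines=64, line_size=16, prefetch=False):
    """Demand miss ratio of a direct-mapped cache over a byte-address trace.

    Illustrative assumptions: prefetches complete instantly and cost nothing,
    which is exactly what a cycle-by-cycle simulation would NOT assume.
    """
    if not trace:
        return 0.0
    tags = [None] * num_lines          # block number currently held by each line
    misses = 0
    for addr in trace:
        block = addr // line_size      # memory block containing this address
        index = block % num_lines      # direct-mapped placement
        if tags[index] != block:
            misses += 1
            tags[index] = block        # demand fetch
            if prefetch:
                nxt = block + 1        # prefetch the next sequential block on a miss
                tags[nxt % num_lines] = nxt
    return misses / len(trace)


# Hypothetical trace: a sequential sweep over 4 KiB in 4-byte words.
trace = list(range(0, 4096, 4))
print(miss_ratio(trace, prefetch=False))  # 0.25
print(miss_ratio(trace, prefetch=True))   # 0.125
```

On this purely sequential trace the prefetching cache takes half as many demand misses. Whether that halving improves run time depends on the timing effects this sketch omits, which is the question the report's detailed simulation addresses.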


BibTeX citation:

@techreport{Tse:CSD-95-877,
    Author = {Tse, John and Smith, Alan Jay},
    Title = {Performance Evaluation of Cache Prefetch Implementation},
    Institution = {EECS Department, University of California, Berkeley},
    Year = {1995},
    Month = {Jun},
    URL = {http://www2.eecs.berkeley.edu/Pubs/TechRpts/1995/5515.html},
    Number = {UCB/CSD-95-877},
    Abstract = {Prefetching into CPU caches has long been known to be effective in reducing the cache miss ratio, but implementations of prefetching have been unsuccessful in improving CPU performance. The reasons for this are that prefetches interfere with normal cache operation by keeping the cache address and data ports, the memory bus, and the memory banks busy, and that a prefetch is not necessarily complete by the time the prefetched data is actually referenced. In this paper, we present the results of a very detailed cycle-by-cycle, trace-driven simulation of a uniprocessor memory system, in which we vary several relevant parameters in order to determine when and if prefetching is useful. We find that in order for prefetching to actually improve performance, the address array needs to be double ported, and the data array needs to either be double ported or fully buffered. It is also very helpful for the bus to be reasonably wide, bus transactions to be split, and main memory to be interleaved. Under the best circumstances, i.e., with a significant investment in extra hardware, prefetching can significantly improve performance.}
}

EndNote citation:

%0 Report
%A Tse, John
%A Smith, Alan Jay
%T Performance Evaluation of Cache Prefetch Implementation
%I EECS Department, University of California, Berkeley
%D 1995
%@ UCB/CSD-95-877
%U http://www2.eecs.berkeley.edu/Pubs/TechRpts/1995/5515.html
%F Tse:CSD-95-877