Evaluation of Multithreading and Caching in Large Shared Memory Parallel Computers

Robert Francis Boothe

EECS Department
University of California, Berkeley
Technical Report No. UCB/CSD-93-766
July 1993

http://www2.eecs.berkeley.edu/Pubs/TechRpts/1993/CSD-93-766.pdf

Shared memory multiprocessors are considered among the easiest parallel computers to program. However, building shared memory machines with thousands of processors has proven difficult. Two main problems are the long latencies to shared memory and the large network bandwidth required to support the shared memory programming style.

In this dissertation, we quantify the magnitude of these problems and evaluate multithreading and caching as mechanisms for solving them. Multithreading works by overlapping communication with computation, and caching works by filtering out a large fraction of the remote accesses.
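
To make the latency-hiding argument concrete, consider a standard back-of-the-envelope model (our illustration here; the dissertation's own analysis is more detailed). Suppose each thread computes for R cycles, pays a context-switch cost of C cycles, and then waits L cycles on a remote access. With N threads per processor, efficiency is

    E(N) = \frac{N \cdot R}{\max\left( N(R + C),\; R + C + L \right)}

which saturates at R / (R + C) once N >= 1 + L / (R + C). Plugging in the numbers below, with L = 200 cycles and N = 10 threads, hiding the latency requires only R + C of roughly 22 cycles of useful work per thread between remote accesses.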

We evaluate several multithreading models using simulations of eight benchmark applications. On systems with multithreading but without caching, we have found that the best results are obtained with the explicit-switch multithreading model. This model provides an explicit context switch instruction that lets the compiler select the points at which context switches occur. Our results suggest that a 200 cycle memory access latency can be tolerated with 10 or fewer threads per processor. On systems with both multithreading and caching, we have found that the switch-on-miss multithreading model is best. For this model, our results suggest that a 200 cycle memory access latency can be tolerated with 3 or fewer threads per processor.
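
The explicit-switch code shape can be sketched as follows. This is a minimal illustration of ours, not code from the dissertation; remote_load_async and ctx_switch are hypothetical primitives, stubbed out so the sketch compiles and runs.

    #include <stdio.h>

    /* Hypothetical primitives, stubbed so this sketch is self-contained.
     * On a real machine, remote_load_async would issue a split-phase
     * remote memory access and ctx_switch would be the explicit context
     * switch instruction that the compiler inserts to hide the latency. */
    static int remote_word = 42;                /* stand-in for a word of remote memory */
    static void remote_load_async(int *dst) { *dst = remote_word; }
    static void ctx_switch(void) { /* other ready threads would run here */ }

    int main(void) {
        int x;
        remote_load_async(&x);  /* start the long-latency (~200 cycle) access */
        ctx_switch();           /* compiler-selected switch point overlaps the wait */
        printf("x = %d\n", x);  /* value is consumed only after the switch */
        return 0;
    }

Under the switch-on-miss model, by contrast, no such instruction is needed: the hardware switches threads whenever a load misses in the cache, which is why far fewer threads suffice once caching filters out most remote accesses.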

We show that by using multithreading techniques, systems both with and without caching are able to tolerate long latencies and achieve execution efficiencies of 80%. The difference is that the systems with caching (if properly configured) use an order of magnitude less network bandwidth. For large machines, the network will account for a large fraction of the total cost, and thus the cost and complexity of cache coherency will be offset by the reduction in the network bandwidth requirement.
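
The bandwidth claim reduces to simple arithmetic (our illustration; the 90% figure is assumed, not taken from the report). Per-processor network traffic is proportional to the rate of shared accesses times the fraction that actually traverse the network, so a cache satisfying 90% of shared references locally leaves a miss ratio m = 0.1 and

    \frac{B_{\mathrm{cached}}}{B_{\mathrm{uncached}}} = m \approx 0.1

which is exactly the order-of-magnitude reduction in bandwidth, and hence in network cost, claimed above.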

Advisor: Abhiram G. Ranade


BibTeX citation:

@phdthesis{Boothe:CSD-93-766,
    Author = {Boothe, Robert Francis},
    Title = {Evaluation of Multithreading and Caching in Large Shared Memory Parallel Computers},
    School = {EECS Department, University of California, Berkeley},
    Year = {1993},
    Month = {Jul},
    URL = {http://www2.eecs.berkeley.edu/Pubs/TechRpts/1993/6284.html},
    Number = {UCB/CSD-93-766},
}

EndNote citation:

%0 Thesis
%A Boothe, Robert Francis
%T Evaluation of Multithreading and Caching in Large Shared Memory Parallel Computers
%I EECS Department, University of California, Berkeley
%D 1993
%@ UCB/CSD-93-766
%U http://www2.eecs.berkeley.edu/Pubs/TechRpts/1993/6284.html
%F Boothe:CSD-93-766