A.K.W. Yeung

EECS Department, University of California, Berkeley

Technical Report No. UCB/ERL M90/15, 1990

http://www2.eecs.berkeley.edu/Pubs/TechRpts/1990/ERL-90-15.pdf

With the advent of digital computers, Digital Signal Processing (DSP) has become a dominant force in the fields of signal processing and communication. Example applications include digital audio, speech synthesis and recognition, telecommunication, image and video processing, and robotics. As the complexity of the algorithms increases, the task of verifying and optimizing them becomes formidable. The process often requires high computation throughput and simulation of a large amount of data. For example, a computation rate of 800 MOPS or more is typical for High Definition Television (HDTV) algorithms. Furthermore, to verify the behavior of the algorithms, many frames of data have to be simulated. These requirements dictate a hardware solution. While techniques such as bread-boarding and fast-prototyping can fulfill the requirements, they typically exhibit long development times and offer very little programmability, which is important in optimizing the parameters of some algorithms. Some commercial multiprocessor computers are capable of providing high computation power, but the high overhead of inter-processor communication, the difficulty of mapping the algorithms to the architecture, the lack of instructions supporting DSP applications, and the usual high cost often limit the effectiveness of these machines. In this report, a dedicated compute engine called SMART (an acronym for Switchable Multiprocessor Architecture supporting Real Time applications) is presented. The machine attempts to speed up the simulation of DSP algorithms by at least two orders of magnitude compared with general-purpose computer architectures. The DSP32C from AT&T Bell Labs, a high-performance DSP processor with both floating-point and fixed-point instructions, is used as the core processing unit to provide high computation power, resulting in an order-of-magnitude speedup. An additional order of magnitude in speedup is obtained by exploiting the high degree of concurrency, namely pipelining and parallelism, present in most signal processing algorithms.


BibTeX citation:

@techreport{Yeung:M90/15,
    Author= {Yeung, A.K.W.},
    Title= {VLSI Implementation of a Configurable Multiprocessor System for DSP Behavioral Simulation},
    Year= {1990},
    Month= {Feb},
    Url= {http://www2.eecs.berkeley.edu/Pubs/TechRpts/1990/1420.html},
    Number= {UCB/ERL M90/15},
    Abstract= {With the advent of digital computers, Digital Signal Processing (DSP) has become a dominant force in the fields of signal processing and communication. Example applications include digital audio, speech synthesis and recognition, telecommunication, image and video processing, and robotics. As the complexity of the algorithms increases, the task of verifying and optimizing them becomes formidable. The process often requires high computation throughput and simulation of a large amount of data. For example, a computation rate of 800 MOPS or more is typical for High Definition Television (HDTV) algorithms. Furthermore, to verify the behavior of the algorithms, many frames of data have to be simulated. These requirements dictate a hardware solution. While techniques such as bread-boarding and fast-prototyping can fulfill the requirements, they typically exhibit long development times and offer very little programmability, which is important in optimizing the parameters of some algorithms. Some commercial multiprocessor computers are capable of providing high computation power, but the high overhead of inter-processor communication, the difficulty of mapping the algorithms to the architecture, the lack of instructions supporting DSP applications, and the usual high cost often limit the effectiveness of these machines. In this report, a dedicated compute engine called SMART (an acronym for Switchable Multiprocessor Architecture supporting Real Time applications) is presented. The machine attempts to speed up the simulation of DSP algorithms by at least two orders of magnitude compared with general-purpose computer architectures. The DSP32C from AT&T Bell Labs, a high-performance DSP processor with both floating-point and fixed-point instructions, is used as the core processing unit to provide high computation power, resulting in an order-of-magnitude speedup. An additional order of magnitude in speedup is obtained by exploiting the high degree of concurrency, namely pipelining and parallelism, present in most signal processing algorithms.},
}

EndNote citation:

%0 Report
%A Yeung, A.K.W. 
%T VLSI Implementation of a Configurable Multiprocessor System for DSP Behavioral Simulation
%I EECS Department, University of California, Berkeley
%D 1990
%@ UCB/ERL M90/15
%U http://www2.eecs.berkeley.edu/Pubs/TechRpts/1990/1420.html
%F Yeung:M90/15