Co-design of Algorithms, Hardware, and Scheduling for Deep Learning Applications

Qijing Huang

EECS Department, University of California, Berkeley

Technical Report No. UCB/EECS-2021-202

August 16, 2021

http://www2.eecs.berkeley.edu/Pubs/TechRpts/2021/EECS-2021-202.pdf

For decades, ever-increasing computing power has been a driving force behind many technology revolutions, including the recent advances in artificial intelligence. However, as integrated circuit process scaling slows, system architects must resort to heterogeneous systems with specialized accelerators to keep satisfying the ever-growing compute appetite of today's applications.

Building these accelerator systems, though, is extremely expensive and time-consuming. First, the development cycle for hardware is notoriously long, making it difficult to keep up with the rapid progress in algorithms. Meanwhile, existing compilers are incapable of navigating the intractable mapping space exposed by novel accelerator architectures. Lastly, algorithms are often designed without hardware efficiency as a key metric and therefore pose extra challenges in designing efficient hardware.

This thesis tackles the significant challenges in jointly designing and optimizing algorithms, scheduling, and hardware designs for acceleration. We aim to advance the state of the art through a three-pronged approach: the development of methodologies and tools that automatically generate accelerator systems from high-level abstractions, shortening the hardware development cycle; the adaptation of machine learning and other optimization techniques to improve accelerator design and compilation flows; and the co-design of algorithms and accelerators to exploit more optimization opportunities.

The target application domain of this thesis is deep learning, which has achieved unprecedented success in a wide range of tasks such as computer vision and natural language processing. As intelligent devices proliferate, deep learning is poised to become a major computational demand in everyday life. Therefore, by performing end-to-end system optimization with hardware acceleration, the dissertation aims to enable the ubiquitous adoption of cutting-edge deep learning algorithms to transform various aspects of life.

Advisor: John Wawrzynek

BibTeX citation:

@phdthesis{Huang:EECS-2021-202,
    Author= {Huang, Qijing},
    Title= {Co-design of Algorithms, Hardware, and Scheduling for Deep Learning Applications},
    School= {EECS Department, University of California, Berkeley},
    Year= {2021},
    Month= {Aug},
    Url= {http://www2.eecs.berkeley.edu/Pubs/TechRpts/2021/EECS-2021-202.html},
    Number= {UCB/EECS-2021-202},
    Abstract= {For decades, ever-increasing computing power has been a driving force behind many technology revolutions, including the recent advances in artificial intelligence.
However, as integrated circuit process scaling slows, system architects must resort to heterogeneous systems with specialized accelerators to keep satisfying the ever-growing compute appetite of today's applications.

Building these accelerator systems, though, is extremely expensive and time-consuming.
First, the development cycle for hardware is notoriously long, making it difficult to keep up with the rapid progress in algorithms. 
Meanwhile, existing compilers are incapable of navigating the intractable mapping space exposed by novel accelerator architectures.
Lastly, algorithms are often designed without hardware efficiency as a key metric and therefore pose extra challenges in designing efficient hardware.

This thesis tackles the significant challenges in jointly designing and optimizing algorithms, scheduling, and hardware designs for acceleration. 
We aim to advance the state of the art through a three-pronged approach: the development of methodologies and tools that automatically generate accelerator systems from high-level abstractions, shortening the hardware development cycle; the adaptation of machine learning and other optimization techniques to improve accelerator design and compilation flows; and the co-design of algorithms and accelerators to exploit more optimization opportunities.

The target application domain of this thesis is deep learning, which has achieved unprecedented success in a wide range of tasks such as computer vision and natural language processing.
As intelligent devices proliferate, deep learning is poised to become a major computational demand in everyday life.
Therefore, by performing end-to-end system optimization with hardware acceleration, the dissertation aims to enable the ubiquitous adoption of cutting-edge deep learning algorithms to transform various aspects of life.},
}

EndNote citation:

%0 Thesis
%A Huang, Qijing 
%T Co-design of Algorithms, Hardware, and Scheduling for Deep Learning Applications
%I EECS Department, University of California, Berkeley
%D 2021
%8 August 16
%@ UCB/EECS-2021-202
%U http://www2.eecs.berkeley.edu/Pubs/TechRpts/2021/EECS-2021-202.html
%F Huang:EECS-2021-202