Scalable Specialization for Domain-Specific SoCs
Seah Kim
EECS Department, University of California, Berkeley
Technical Report No. UCB/EECS-2025-222
December 19, 2025
http://www2.eecs.berkeley.edu/Pubs/TechRpts/2025/EECS-2025-222.pdf
Modern edge, client, and robotics platforms increasingly rely on domain-specific systems-on-chip (SoCs) that combine general-purpose CPUs with many accelerators for machine learning and perception workloads. At the same time, the slowdown of technology scaling and the rapid evolution of models and software stacks make it difficult to design accelerator-rich SoCs that remain effective as applications and deployment scenarios change. This dissertation develops system approaches for scalable specialization in heterogeneous SoCs and shows how to make specialized accelerators deliver high performance and continue to be effective as workloads, resource budgets, and system configurations evolve.
The dissertation argues that scalable specialization is fundamentally a full-system problem. Specialized accelerators are most effective when architects can co-explore accelerator designs and system effects with full-system pre-silicon evaluation, when accelerators are integrated and virtualized as a shared pool that software can orchestrate dynamically, and when algorithms are co-designed with hardware and runtime to handle dynamic workloads under real-time constraints. Guided by this full-system view, the dissertation develops three technical thrusts and an end-to-end silicon prototype.
First, Gemmini and MoCA form a full-system methodology for many-accelerator SoCs. Gemmini is an open-source generator that sweeps a wide space of DNN accelerators and RISC-V SoCs and evaluates them under realistic workloads. MoCA extends this platform with a multi-tenant runtime that detects interference and manages accelerators and shared memory resources to improve latency-target satisfaction for co-located workloads. Second, AuRORA addresses integration by virtualizing accelerators through a redesigned CPU-accelerator interface that lets user software acquire, release, and rebind accelerators with low overhead while preserving a tightly coupled programming model. Third, SuperNoVA co-designs a resource-aware SLAM stack spanning the algorithm, runtime, and programmable sparse linear algebra accelerators to deliver bounded-latency performance while preserving accuracy under dynamic workloads. Finally, the MAVERIC chip composes these ideas into a heterogeneous robotics SoC with RISC-V CPUs, INT8 and FP32 accelerators, a shared on-chip memory system, and a multi-layered network-on-chip, demonstrating scalable specialization in silicon for demanding heterogeneous robotics workloads.
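To make the pooled-virtualization idea concrete, the sketch below models a shared accelerator pool that tasks can acquire, release, and rebind at runtime. This is a minimal illustrative model, not the actual AuRORA interface: the class name, method signatures, and allocation policy are all assumptions made for exposition, and a real implementation would involve hardware binding and OS support rather than Python bookkeeping.

```python
class AcceleratorPool:
    """Toy model of a shared pool of accelerators (hypothetical API).

    Tracks which physical accelerator IDs are bound to which task, so
    software can grow or shrink a task's allocation as workloads change.
    """

    def __init__(self, num_accels):
        self.free = set(range(num_accels))  # unbound accelerator IDs
        self.bound = {}                     # task name -> set of IDs

    def acquire(self, task, count):
        """Grant up to `count` free accelerators to `task`."""
        grant = set(list(self.free)[:count])
        self.free -= grant
        self.bound.setdefault(task, set()).update(grant)
        return grant

    def release(self, task):
        """Return all of `task`'s accelerators to the shared pool."""
        self.free |= self.bound.pop(task, set())

    def rebind(self, task, count):
        """Resize `task`'s allocation to at most `count` accelerators."""
        self.release(task)
        return self.acquire(task, count)
```

In this toy policy, `acquire` grants fewer accelerators than requested when the pool is contended, and `rebind` lets a task pick up capacity freed by another task; a real runtime would additionally decide *when* to rebind based on observed interference and latency targets.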
Advisors: Borivoje Nikolic and Sophia Shao
BibTeX citation:
@phdthesis{Kim:EECS-2025-222,
Author= {Kim, Seah},
Title= {Scalable Specialization for Domain-Specific SoCs},
School= {EECS Department, University of California, Berkeley},
Year= {2025},
Month= {Dec},
Url= {http://www2.eecs.berkeley.edu/Pubs/TechRpts/2025/EECS-2025-222.html},
Number= {UCB/EECS-2025-222},
}
EndNote citation:
%0 Thesis
%A Kim, Seah
%T Scalable Specialization for Domain-Specific SoCs
%I EECS Department, University of California, Berkeley
%D 2025
%8 December 19
%@ UCB/EECS-2025-222
%U http://www2.eecs.berkeley.edu/Pubs/TechRpts/2025/EECS-2025-222.html
%F Kim:EECS-2025-222