Systems for Using Far Memory in Datacenters
Emmanuel Amaro
EECS Department, University of California, Berkeley
Technical Report No. UCB/EECS-2021-268
December 23, 2021
http://www2.eecs.berkeley.edu/Pubs/TechRpts/2021/EECS-2021-268.pdf
Datacenter efficiency has become increasingly relevant, as the end of Moore's Law and Dennard scaling has caused CPU and memory performance to begin plateauing. Resource disaggregation is a recent datacenter design point, where server nodes share remote resources through a fast (usually RDMA-based) network, enabling greater execution flexibility and performance in datacenters. Remote or far memory--an instance of resource disaggregation--increases flexibility because nodes can access more memory than is locally available, and it can improve the performance of distributed applications because RDMA provides high-performance access to shared state. This dissertation describes two networked systems that allow server nodes in a datacenter to leverage far memory.
First, WICkit is a framework and runtime for Where-Independent Code (WICs). WICs are a location-independent abstraction representing complex remote memory accesses, e.g., accessing a value in a hashmap. Without code changes, the WICkit runtime can execute a WIC at the client, at the server, or on a SmartNIC CPU. Because each location offers different performance and resource trade-offs, WICkit lets users choose the location when execution begins while obtaining performance comparable to that of location-specific systems.
Second, Cluster Far Memory (CFM) is a system that transparently allows existing jobs to access far memory. CFM includes a fast swapping mechanism and a far memory-aware job scheduler that together enable far memory support at rack scale. Using CFM for memory-intensive workloads, a rack can improve its throughput on the order of 10% or more without increasing the total amount of memory in it.
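The location-independence idea behind WICs can be illustrated with a toy sketch. The code below is not WICkit's API; every name in it (`lookup`, `Runner`, `run_at_client`, `run_at_server`) is hypothetical, and the model assumes that the client already knows the bucket heads and that each remote node read costs one network round trip. It only shows how a single, unchanged hashmap lookup can run at different locations with different network costs. The SmartNIC location is not modeled.

```python
from typing import Callable, Dict, List, Optional, Tuple

# A chained hashmap node stored in far memory: (key, value, address of next node).
Node = Tuple[str, int, Optional[int]]
Read = Callable[[int], Node]


def bucket_of(key: str, num_buckets: int) -> int:
    # Deterministic toy hash (Python's built-in hash() of str varies per process).
    return sum(key.encode()) % num_buckets


def lookup(read: Read, heads: List[Optional[int]], key: str) -> Optional[int]:
    """The location-independent operation: it only sees an abstract `read`."""
    addr = heads[bucket_of(key, len(heads))]
    while addr is not None:
        k, value, next_addr = read(addr)
        if k == key:
            return value
        addr = next_addr
    return None


class Runner:
    """Hypothetical runtime that runs `lookup` at a chosen location."""

    def __init__(self, memory: Dict[int, Node], heads: List[Optional[int]]):
        self.memory = memory
        self.heads = heads
        self.round_trips = 0

    def _local_read(self, addr: int) -> Node:
        return self.memory[addr]

    def _remote_read(self, addr: int) -> Node:
        # Assumption: every node read from the client crosses the network once.
        self.round_trips += 1
        return self.memory[addr]

    def run_at_client(self, key: str) -> Optional[int]:
        return lookup(self._remote_read, self.heads, key)

    def run_at_server(self, key: str) -> Optional[int]:
        # Assumption: one request/response carries the whole operation.
        self.round_trips += 1
        return lookup(self._local_read, self.heads, key)


# One bucket holding a three-node chain: a -> b -> c.
memory: Dict[int, Node] = {0: ("a", 1, 1), 1: ("b", 2, 2), 2: ("c", 3, None)}
heads: List[Optional[int]] = [0]

client = Runner(memory, heads)
server = Runner(memory, heads)
print(client.run_at_client("c"), client.round_trips)
print(server.run_at_server("c"), server.round_trips)
```

In this toy model both locations return the same value, while the client-side run pays one round trip per node traversed and the server-side run pays one in total. Server CPU cycles, the other side of the trade-off, are not counted here.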
Advisor: Scott Shenker
BibTeX citation:
@phdthesis{Amaro:EECS-2021-268,
    Author = {Amaro, Emmanuel},
    Title = {Systems for Using Far Memory in Datacenters},
    School = {EECS Department, University of California, Berkeley},
    Year = {2021},
    Month = {Dec},
    Url = {http://www2.eecs.berkeley.edu/Pubs/TechRpts/2021/EECS-2021-268.html},
    Number = {UCB/EECS-2021-268},
    Abstract = {Datacenter efficiency has become increasingly relevant, as the end of Moore's Law and Dennard scaling have caused CPU and memory performance to begin plateauing. Resource disaggregation is a recent datacenter design point, where server nodes share remote resources through a fast (usually RDMA-based) network, enabling greater execution flexibility and performance in datacenters. Remote or far memory--an instance of resource disaggregation--increases flexibility because nodes can access more memory than locally available. And performance in distributed applications can improve as RDMA provides high-performance access to shared state. This dissertation describes two networked systems that allow server nodes in a data center to leverage far memory. First, WICkit is a framework and runtime for Where-Independent Code. WICs are a location-independent abstraction representing complex remote memory accesses, e.g., accessing a value in a hashmap. Without code changes, the WICkit runtime can execute WICs at the client, server, and SmartNIC CPU locations. As different locations provide different performance and resource trade-offs, WICkit allows users to flexibly choose the location when execution begins while obtaining comparable performance to location-specific systems. Second, Cluster Far Memory is a system that transparently allows existing jobs to access far memory. CFM includes a fast swapping mechanism and a far memory-aware job scheduler that enable far memory support at rack scale. Using CFM for memory-intensive workloads, a rack can improve its throughput on the order of 10% or more without increasing the total amount of memory in it.},
}
EndNote citation:
%0 Thesis
%A Amaro, Emmanuel
%T Systems for Using Far Memory in Datacenters
%I EECS Department, University of California, Berkeley
%D 2021
%8 December 23
%@ UCB/EECS-2021-268
%U http://www2.eecs.berkeley.edu/Pubs/TechRpts/2021/EECS-2021-268.html
%F Amaro:EECS-2021-268