Systems for Using Far Memory in Datacenters

Emmanuel Amaro

EECS Department
University of California, Berkeley
Technical Report No. UCB/EECS-2021-268
December 23, 2021

http://www2.eecs.berkeley.edu/Pubs/TechRpts/2021/EECS-2021-268.pdf

Datacenter efficiency has become increasingly relevant as the end of Moore's Law and Dennard scaling has caused CPU and memory performance to plateau. Resource disaggregation is a recent datacenter design point in which server nodes share remote resources through a fast (usually RDMA-based) network, enabling greater execution flexibility and performance. Remote or far memory--an instance of resource disaggregation--increases flexibility because nodes can access more memory than is locally available. Performance in distributed applications can also improve, as RDMA provides high-performance access to shared state. This dissertation describes two networked systems that allow server nodes in a datacenter to leverage far memory.

First, WICkit is a framework and runtime for Where-Independent Code. WICs are a location-independent abstraction representing complex remote memory accesses, e.g., accessing a value in a hashmap. Without code changes, the WICkit runtime can execute WICs at the client, server, and SmartNIC CPU locations. As different locations provide different performance and resource trade-offs, WICkit allows users to flexibly choose the location when execution begins while obtaining comparable performance to location-specific systems.

Second, Cluster Far Memory is a system that transparently allows existing jobs to access far memory. CFM includes a fast swapping mechanism and a far memory-aware job scheduler that enable far memory support at rack scale. Using CFM for memory-intensive workloads, a rack can improve its throughput on the order of 10% or more without increasing the total amount of memory in it.

Advisor: Scott Shenker


BibTeX citation:

@phdthesis{Amaro:EECS-2021-268,
    Author = {Amaro, Emmanuel},
    Title = {Systems for Using Far Memory in Datacenters},
    School = {EECS Department, University of California, Berkeley},
    Year = {2021},
    Month = {Dec},
    URL = {http://www2.eecs.berkeley.edu/Pubs/TechRpts/2021/EECS-2021-268.html},
    Number = {UCB/EECS-2021-268},
    Abstract = {Datacenter efficiency has become increasingly relevant as the end of Moore's Law and Dennard scaling has caused CPU and memory performance to plateau. Resource disaggregation is a recent datacenter design point in which server nodes share remote resources through a fast (usually RDMA-based) network, enabling greater execution flexibility and performance. Remote or far memory--an instance of resource disaggregation--increases flexibility because nodes can access more memory than is locally available. Performance in distributed applications can also improve, as RDMA provides high-performance access to shared state. This dissertation describes two networked systems that allow server nodes in a datacenter to leverage far memory.

First, WICkit is a framework and runtime for Where-Independent Code. WICs are a location-independent abstraction representing complex remote memory accesses, e.g., accessing a value in a hashmap. Without code changes, the WICkit runtime can execute WICs at the client, server, and SmartNIC CPU locations. As different locations provide different performance and resource trade-offs, WICkit allows users to flexibly choose the location when execution begins while obtaining comparable performance to location-specific systems.

Second, Cluster Far Memory is a system that transparently allows existing jobs to access far memory. CFM includes a fast swapping mechanism and a far memory-aware job scheduler that enable far memory support at rack scale. Using CFM for memory-intensive workloads, a rack can improve its throughput on the order of 10\% or more without increasing the total amount of memory in it.}
}

EndNote citation:

%0 Thesis
%A Amaro, Emmanuel
%T Systems for Using Far Memory in Datacenters
%I EECS Department, University of California, Berkeley
%D 2021
%8 December 23
%@ UCB/EECS-2021-268
%U http://www2.eecs.berkeley.edu/Pubs/TechRpts/2021/EECS-2021-268.html
%F Amaro:EECS-2021-268