Nitesh Mor

EECS Department, University of California, Berkeley

Technical Report No. UCB/EECS-2020-10

January 10, 2020

http://www2.eecs.berkeley.edu/Pubs/TechRpts/2020/EECS-2020-10.pdf

With the advancement of technology, richer computation devices are making their way into everyday life. However, such smarter devices merely act as a source and sink of information; the storage of information is highly centralized in data-centers in today’s world. Even though such data-centers allow for amortization of cost per bit of information, the density and distribution of such data-centers is not necessarily representative of human population density. This disparity of where the information is produced and consumed vs where it is stored only slightly affects the applications of today, but it will be the limiting factor for applications of tomorrow.

The computation resources at the edge are more powerful than ever, and present an opportunity to address this disparity. We envision that a seamless combination of these edge-resources with the data-center resources is the way forward. However, the resulting issues of trust and data-security are not easy to solve in a world full of complexity. Toward this vision of a federated infrastructure composed of resources at the edge as well as those in data-centers, we describe the architecture and design of a widely distributed system for data storage and communication that attempts to alleviate some of these data security challenges; we call this system the Global Data Plane (GDP).

The key abstraction in the GDP is a secure cohesive container of information called a DataCapsule, which provides a layer of uniformity on top of a heterogeneous infrastructure. A DataCapsule represents a secure history of transactions in a persistent form that can be used for building other applications on top. Existing applications can be refactored to use DataCapsules as the ground truth of persistent state; such a refactoring enables cleaner application design that allows for better security analysis of information flows. Not only cleaner design, the GDP also enables locality of access for performance and data privacy—an ever growing concern in the information age.

The DataCapsules are enabled by an underlying routing fabric, called the GDP network, which provides secure routing for datagrams in a flat namespace. The GDP network is a core component of the GDP that enables various GDP components to interact with each other. In addition to the DataCapsules, this underlying network is available to applications for native communication as well. Flat namespace networks are known to provide a number of desirable properties, such as location independence, built-in multicast, etc. However, existing architectures for such networks suffer from routing security issues, typically because malicious entities can claim to possess arbitrary names and thus, receive traffic intended for arbitrary destinations. GDP network takes a different approach by defining an ownership of the name and the associated mechanisms for participants to delegate routing for such names to others. By directly integrating with GDP network, applications can enjoy the benefits of flat namespace networks without compromising routing security.

The Global Data Plane and DataCapsules together represent our vision for secure ubiquitous storage. As opposed to the current approach of perimeter security for infrastructure, i.e. drawing a perimeter around parts of infrastructure and trusting everything inside it, our vision is to use cryptographic tools to enable intrinsic security for the information itself regardless of the context in which such information lives. In this dissertation, we show how to make this vision a reality, and how to adapt real world applications to reap the benefits of secure ubiquitous storage.

Advisors: John D. Kubiatowicz


BibTeX citation:

@phdthesis{Mor:EECS-2020-10,
    Author= {Mor, Nitesh},
    Title= {Global Data Plane: A Widely Distributed Storage and Communication Infrastructure},
    School= {EECS Department, University of California, Berkeley},
    Year= {2020},
    Month= {Jan},
    Url= {http://www2.eecs.berkeley.edu/Pubs/TechRpts/2020/EECS-2020-10.html},
    Number= {UCB/EECS-2020-10},
    Abstract= {With the advancement of technology, richer computation devices are making their way into everyday life. However, such smarter devices merely act as a source and sink of information; the storage of information is highly centralized in data-centers in today’s world. Even though such data-centers allow for amortization of cost per bit of information, the density and distribution of such data-centers is not necessarily representative of human population density. This disparity of where the information is produced and consumed vs where it is stored only slightly affects the applications of today, but it will be the limiting factor for applications of tomorrow.

The computation resources at the edge are more powerful than ever, and present an opportunity to address this disparity. We envision that a seamless combination of these edge-resources with the data-center resources is the way forward. However, the resulting issues of trust and data-security are not easy to solve in a world full of complexity. Toward this vision of a federated infrastructure composed of resources at the edge as well as those in data-centers, we describe the architecture and design of a widely distributed system for data storage and communication that attempts to alleviate some of these data security challenges; we call this system the Global Data Plane (GDP).

The key abstraction in the GDP is a secure cohesive container of information called a DataCapsule, which provides a layer of uniformity on top of a heterogeneous infrastructure. A DataCapsule represents a secure history of transactions in a persistent form that can be used for building other applications on top. Existing applications can be refactored to use DataCapsules as the ground truth of persistent state; such a refactoring enables cleaner application design that allows for better security analysis of information flows. Not only cleaner design, the GDP also enables locality of access for performance and data privacy—an ever growing concern in the information age.

The DataCapsules are enabled by an underlying routing fabric, called the GDP network, which provides secure routing for datagrams in a flat namespace. The GDP network is a core component of the GDP that enables various GDP components to interact with each other. In addition to the DataCapsules, this underlying network is available to applications for native communication as well. Flat namespace networks are known to provide a number of desirable properties, such as location independence, built-in multicast, etc. However, existing architectures for such networks suffer from routing security issues, typically because malicious entities can claim to possess arbitrary names and thus, receive traffic intended for arbitrary destinations. GDP network takes a different approach by defining an ownership of the name and the associated mechanisms for participants to delegate routing for such names to others. By directly integrating with GDP network, applications can enjoy the benefits of flat namespace networks without compromising routing security.

The Global Data Plane and DataCapsules together represent our vision for secure ubiquitous storage. As opposed to the current approach of perimeter security for infrastructure, i.e. drawing a perimeter around parts of infrastructure and trusting everything inside it, our vision is to use cryptographic tools to enable intrinsic security for the information itself regardless of the context in which such information lives. In this dissertation, we show how to make this vision a reality, and how to adapt real world applications to reap the benefits of secure ubiquitous storage.},
}

EndNote citation:

%0 Thesis
%A Mor, Nitesh 
%T Global Data Plane: A Widely Distributed Storage and Communication Infrastructure
%I EECS Department, University of California, Berkeley
%D 2020
%8 January 10
%@ UCB/EECS-2020-10
%U http://www2.eecs.berkeley.edu/Pubs/TechRpts/2020/EECS-2020-10.html
%F Mor:EECS-2020-10