An Architecture for a Widely Distributed Storage and Communication Infrastructure

Nitesh Mor, Eric Allman, Richard Pratt, Kenneth Lutz and John D. Kubiatowicz

EECS Department
University of California, Berkeley
Technical Report No. UCB/EECS-2018-130
August 28, 2018

http://www2.eecs.berkeley.edu/Pubs/TechRpts/2018/EECS-2018-130.pdf

With the advancement of technology, increasingly capable computing devices are making their way into everyday life. However, these smarter devices act merely as sources and sinks of information; in today's world, the storage of information is highly centralized in data-centers. Although data-centers allow the cost per bit of information to be amortized, their density and geographic distribution do not necessarily track human population density. This disparity between where information is produced and consumed and where it is stored only slightly affects today's applications, but it will be the limiting factor for the applications of tomorrow.

Computational resources at the edge are more powerful than ever and present an opportunity to address this disparity. We envision that a seamless combination of these edge resources with data-center resources is the way forward. However, the resulting issues of trust and data security are not easy to solve. Toward this vision of a federated infrastructure composed of resources both at the edge and in data-centers, we describe the architecture and design of a widely distributed system for data storage and communication that attempts to alleviate some of these data-security challenges; we call this system the Global Data Plane (GDP).¹

The core idea of GDP is a secure single-writer append-only log, which provides a layer of uniformity on top of a heterogeneous infrastructure.² This secure single-writer log is a unified storage and communication primitive that represents a refactoring of interfaces. Such a refactoring enables cleaner application design, which in turn allows for better security analysis of information flows. Beyond cleaner design, GDP also enables locality of access, which benefits both performance and data privacy; privacy is an ever-growing concern in the information age.

  1. Note that the Global Data Plane is an ongoing project that continues to evolve. This document is adapted from a doctoral thesis proposal and records the ideas only as of a certain point in time. For more current information, please see the project web-page at https://gdp.cs.berkeley.edu.
  2. Since this was written, we have renamed the 'secure single-writer append-only log' in the GDP to DataCapsule. We believe that this terminology better reflects the desired properties and avoids confusion with a simple log file.
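To make the single-writer append-only log (the DataCapsule) more concrete, here is a minimal sketch of one way such a log could be structured. It assumes, purely for illustration, that each record carries a hash pointer to its predecessor and an authentication tag produced by the single writer. The names `SingleWriterLog`, `Record`, `verify`, and `GENESIS` are hypothetical and are not the GDP API. HMAC stands in for the writer's signature so that the sketch needs only the Python standard library; a real deployment would presumably use public-key signatures so that readers can verify records without holding the writer's secret.

```python
# Hypothetical sketch of a secure single-writer append-only log.
# Not the GDP/DataCapsule API; HMAC is an assumed stand-in for signatures.
import hashlib
import hmac
from dataclasses import dataclass, replace
from typing import List, Sequence, Tuple

GENESIS = "00" * 32  # hypothetical hash pointer used by the first record


def _tag(key: bytes, seqno: int, prev_hash: str, payload: bytes) -> str:
    # Authenticate (seqno, prev_hash, H(payload)). With HMAC, checking the tag
    # requires the writer's key; public-key signatures would avoid that.
    msg = seqno.to_bytes(8, "big") + bytes.fromhex(prev_hash) + hashlib.sha256(payload).digest()
    return hmac.new(key, msg, hashlib.sha256).hexdigest()


@dataclass(frozen=True)
class Record:
    seqno: int       # position in the log, starting at 0
    prev_hash: str   # hex hash pointer to the previous record
    payload: bytes   # application data
    tag: str         # writer's authentication tag

    def digest(self) -> str:
        # Hash of the whole record; the next record stores it as prev_hash.
        h = hashlib.sha256()
        h.update(self.seqno.to_bytes(8, "big"))
        h.update(bytes.fromhex(self.prev_hash))
        h.update(self.payload)
        h.update(bytes.fromhex(self.tag))
        return h.hexdigest()


class SingleWriterLog:
    """Only the holder of writer_key can extend the log; records never change."""

    def __init__(self, writer_key: bytes) -> None:
        self._key = writer_key
        self._records: List[Record] = []

    @property
    def records(self) -> Tuple[Record, ...]:
        return tuple(self._records)  # read-only view handed to readers

    def append(self, payload: bytes) -> Record:
        seqno = len(self._records)
        prev = self._records[-1].digest() if self._records else GENESIS
        rec = Record(seqno, prev, payload, _tag(self._key, seqno, prev, payload))
        self._records.append(rec)
        return rec


def verify(records: Sequence[Record], writer_key: bytes) -> bool:
    """Check sequence numbers, the hash chain, and every tag.

    A hash chain alone cannot detect a truncated tail; that would require an
    authenticated statement of the latest sequence number.
    """
    prev = GENESIS
    for i, rec in enumerate(records):
        if rec.seqno != i or rec.prev_hash != prev:
            return False
        expected = _tag(writer_key, rec.seqno, rec.prev_hash, rec.payload)
        if not hmac.compare_digest(rec.tag, expected):
            return False
        prev = rec.digest()
    return True


key = b"hypothetical-writer-key"
log = SingleWriterLog(key)
log.append(b"temperature=21.5")
log.append(b"temperature=21.7")
print(verify(log.records, key))  # True
forged = (replace(log.records[0], payload=b"temperature=99.9"), log.records[1])
print(verify(forged, key))  # False: the tag no longer matches the payload
```

Because each record commits to its predecessor, a reader that trusts one record can check the entire history before it, and any party other than the writer who modifies or reorders records is detected. This is one plausible way to obtain the kind of analyzable information flow the abstract describes, not necessarily the design the GDP adopts.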


BibTeX citation:

@techreport{Mor:EECS-2018-130,
    Author = {Mor, Nitesh and Allman, Eric and Pratt, Richard and Lutz, Kenneth and Kubiatowicz, John D.},
    Title = {An Architecture for a Widely Distributed Storage and Communication Infrastructure},
    Institution = {EECS Department, University of California, Berkeley},
    Year = {2018},
    Month = {Aug},
    URL = {http://www2.eecs.berkeley.edu/Pubs/TechRpts/2018/EECS-2018-130.html},
    Number = {UCB/EECS-2018-130},
}

EndNote citation:

%0 Report
%A Mor, Nitesh
%A Allman, Eric
%A Pratt, Richard
%A Lutz, Kenneth
%A Kubiatowicz, John D.
%T An Architecture for a Widely Distributed Storage and Communication Infrastructure
%I EECS Department, University of California, Berkeley
%D 2018
%8 August 28
%@ UCB/EECS-2018-130
%U http://www2.eecs.berkeley.edu/Pubs/TechRpts/2018/EECS-2018-130.html
%F Mor:EECS-2018-130