Algebraic Approaches to Distributed Data Systems

Conor Power

EECS Department, University of California, Berkeley

Technical Report No. UCB/EECS-2025-103

May 16, 2025

http://www2.eecs.berkeley.edu/Pubs/TechRpts/2025/EECS-2025-103.pdf

With the rise of cloud computing, software systems have become increasingly distributed. Distributed systems offer myriad benefits such as scalability, availability, and fault tolerance. However, they make it harder for programmers to ensure correctness and to hide non-determinism from end users. To address this challenge of programming cloud-scale systems, the Hydro project at Berkeley explores bringing declarative programming to the distributed systems space. Declarative programming has had enormous success in the field of databases in the form of SQL. Its benefit is that it allows developers to specify their goals at a high level and leave complex implementation decisions to the database system.

In this thesis, we explore the marriage of these two worlds: distributed systems programming and declarative database systems. In pursuit of this marriage, we study independent trends towards algebraic models in distributed systems and in database systems. This thesis extends these works, explores the relationship between them, and demonstrates the practical applicability of algebraic properties to optimizing distributed data systems. In particular, we study four lines of research on algebraic properties for distributed data systems: conflict-free replicated data types (CRDTs), algebraic models of incremental view maintenance (IVM), parallel database aggregates, and the CALM Theorem. While these topics have been studied under different formalisms across different research communities, we build bridges between them. We bring the system model and mathematical model of CRDTs, studied in the distributed systems and programming languages communities, to the three other topics, which have been studied entirely within the database research community. The result is a foundation on which to support the benefits of declarativity in distributed systems programming.

Advisor: Joseph M. Hellerstein

BibTeX citation:

@phdthesis{Power:EECS-2025-103,
Author= {Power, Conor},
Title= {Algebraic Approaches to Distributed Data Systems},
School= {EECS Department, University of California, Berkeley},
Year= {2025},
Month= {May},
Url= {http://www2.eecs.berkeley.edu/Pubs/TechRpts/2025/EECS-2025-103.html},
Number= {UCB/EECS-2025-103},
Abstract= {With the rise of cloud computing, software systems have become increasingly distributed. Distributed systems offer myriad benefits such as scalability, availability, and fault tolerance. However, they make it harder for programmers to ensure correctness and to hide non-determinism from end users. To address this challenge of programming cloud-scale systems, the Hydro project at Berkeley explores bringing declarative programming to the distributed systems space. Declarative programming has had enormous success in the field of databases in the form of SQL. Its benefit is that it allows developers to specify their goals at a high level and leave complex implementation decisions to the database system.

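In this thesis, we explore the marriage of these two worlds: distributed systems programming and declarative database systems. In pursuit of this marriage, we study independent trends towards algebraic models in distributed systems and in database systems. This thesis extends these works, explores the relationship between them, and demonstrates the practical applicability of algebraic properties to optimizing distributed data systems. In particular, we study four lines of research on algebraic properties for distributed data systems: conflict-free replicated data types (CRDTs), algebraic models of incremental view maintenance (IVM), parallel database aggregates, and the CALM Theorem. While these topics have been studied under different formalisms across different research communities, we build bridges between them. We bring the system model and mathematical model of CRDTs, studied in the distributed systems and programming languages communities, to the three other topics, which have been studied entirely within the database research community. The result is a foundation on which to support the benefits of declarativity in distributed systems programming.},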
}

EndNote citation:

%0 Thesis
%A Power, Conor
%T Algebraic Approaches to Distributed Data Systems
%I EECS Department, University of California, Berkeley
%D 2025
%8 May 16
%@ UCB/EECS-2025-103
%U http://www2.eecs.berkeley.edu/Pubs/TechRpts/2025/EECS-2025-103.html
%F Power:EECS-2025-103