BOOM: Data-Centric Programming in the Datacenter

Peter Alvaro, Tyson Condie, Khaled Elmeleegy, Joseph M. Hellerstein and Russell C Sears

EECS Department
University of California, Berkeley
Technical Report No. UCB/EECS-2009-111
August 10, 2009

http://www2.eecs.berkeley.edu/Pubs/TechRpts/2009/EECS-2009-111.pdf

Cloud computing makes clusters a commodity, creating the potential for a wide range of programmers to develop new scalable services. However, current cloud platforms do little to simplify truly distributed systems development. In this paper, we explore the use of a declarative, data-centric programming model to achieve this simplicity. We describe our experience using Overlog and Java to implement a "Big Data" analytics stack that is API-compatible with Hadoop and HDFS, with equivalent performance. We extended the system with complex features not yet available in Hadoop, including availability, scalability, and unique monitoring and debugging facilities. We present our experience to validate the enhanced programmer productivity afforded by declarative programming, and to inform the design of new development environments for distributed programming.


BibTeX citation:

@techreport{Alvaro:EECS-2009-111,
    Author = {Alvaro, Peter and Condie, Tyson and Elmeleegy, Khaled and Hellerstein, Joseph M. and Sears, Russell C},
    Title = {BOOM: Data-Centric Programming in the Datacenter},
    Institution = {EECS Department, University of California, Berkeley},
    Year = {2009},
    Month = {Aug},
    URL = {http://www2.eecs.berkeley.edu/Pubs/TechRpts/2009/EECS-2009-111.html},
    Number = {UCB/EECS-2009-111},
    Abstract = {Cloud computing makes clusters a commodity, creating the potential for a wide range of programmers to develop new scalable services. However, current cloud platforms do little to simplify truly distributed systems development. In this paper, we explore the use of a declarative, data-centric programming model to achieve this simplicity. We describe our experience using Overlog and Java to implement a "Big Data" analytics stack that is API-compatible with Hadoop and HDFS, with equivalent performance. We extended the system with complex features not yet available in Hadoop, including availability, scalability, and unique monitoring and debugging facilities. We present our experience to validate the enhanced programmer productivity afforded by declarative programming, and to inform the design of new development environments for distributed programming.}
}

EndNote citation:

%0 Report
%A Alvaro, Peter
%A Condie, Tyson
%A Elmeleegy, Khaled
%A Hellerstein, Joseph M.
%A Sears, Russell C
%T BOOM: Data-Centric Programming in the Datacenter
%I EECS Department, University of California, Berkeley
%D 2009
%8 August 10
%@ UCB/EECS-2009-111
%U http://www2.eecs.berkeley.edu/Pubs/TechRpts/2009/EECS-2009-111.html
%F Alvaro:EECS-2009-111