BOOM: Data-Centric Programming in the Datacenter

Peter Alvaro, Tyson Condie, Neil Conway, Khaled Elmeleegy, Joseph M. Hellerstein and Russell C Sears

EECS Department
University of California, Berkeley
Technical Report No. UCB/EECS-2009-98
July 9, 2009

http://www2.eecs.berkeley.edu/Pubs/TechRpts/2009/EECS-2009-98.pdf

Cloud computing makes datacenter clusters a commodity, potentially enabling a wide range of programmers to develop new scalable services. However, current cloud platforms do little to simplify truly distributed systems development. In this paper, we explore the use of a declarative, data-centric programming model to achieve this simplicity. We describe our experience using Overlog and Java to implement a "Big Data" analytics stack that is API-compatible with Hadoop and HDFS, with equivalent performance. We extended the system with complex features not yet available in Hadoop, including availability, scalability, and unique monitoring and debugging facilities. We present our experience to validate the enhanced programmer productivity afforded by declarative programming, and inform the design of new development environments for distributed programming.


BibTeX citation:

@techreport{Alvaro:EECS-2009-98,
    Author = {Alvaro, Peter and Condie, Tyson and Conway, Neil and Elmeleegy, Khaled and Hellerstein, Joseph M. and Sears, Russell C},
    Title = {BOOM: Data-Centric Programming in the Datacenter},
    Institution = {EECS Department, University of California, Berkeley},
    Year = {2009},
    Month = {Jul},
    URL = {http://www2.eecs.berkeley.edu/Pubs/TechRpts/2009/EECS-2009-98.html},
    Number = {UCB/EECS-2009-98},
    Abstract = {Cloud computing makes datacenter clusters a commodity, potentially enabling a wide range of programmers to develop new scalable services. However, current cloud platforms do little to simplify truly distributed systems development. In this paper, we explore the use of a declarative, data-centric programming model to achieve this simplicity. We describe our experience using Overlog and Java to implement a "Big Data" analytics stack that is API-compatible with Hadoop and HDFS, with equivalent performance. We extended the system with complex features not yet available in Hadoop, including availability, scalability, and unique monitoring and debugging facilities. We present our experience to validate the enhanced programmer productivity afforded by declarative programming, and inform the design of new development environments for distributed programming.}
}

EndNote citation:

%0 Report
%A Alvaro, Peter
%A Condie, Tyson
%A Conway, Neil
%A Elmeleegy, Khaled
%A Hellerstein, Joseph M.
%A Sears, Russell C
%T BOOM: Data-Centric Programming in the Datacenter
%I EECS Department, University of California, Berkeley
%D 2009
%8 July 9
%@ UCB/EECS-2009-98
%U http://www2.eecs.berkeley.edu/Pubs/TechRpts/2009/EECS-2009-98.html
%F Alvaro:EECS-2009-98