Peter Alvaro, Tyson Condie, Khaled Elmeleegy, Joseph M. Hellerstein, and Russell C Sears

EECS Department, University of California, Berkeley

Technical Report No. UCB/EECS-2009-111

August 10, 2009

http://www2.eecs.berkeley.edu/Pubs/TechRpts/2009/EECS-2009-111.pdf

Cloud computing makes clusters a commodity, creating the potential for a wide range of programmers to develop new scalable services. However, current cloud platforms do little to simplify truly distributed systems development. In this paper, we explore the use of a declarative, data-centric programming model to achieve this simplicity. We describe our experience using Overlog and Java to implement a "Big Data" analytics stack that is API-compatible with Hadoop and HDFS, with equivalent performance. We extended the system with complex features not yet available in Hadoop, including availability, scalability, and unique monitoring and debugging facilities. We present our experience to validate the enhanced programmer productivity afforded by declarative programming, and to inform the design of new development environments for distributed programming.


BibTeX citation:

@techreport{Alvaro:EECS-2009-111,
    Author= {Alvaro, Peter and Condie, Tyson and Elmeleegy, Khaled and Hellerstein, Joseph M. and Sears, Russell C},
    Title= {BOOM: Data-Centric Programming in the Datacenter},
    Year= {2009},
    Month= {Aug},
    Url= {http://www2.eecs.berkeley.edu/Pubs/TechRpts/2009/EECS-2009-111.html},
    Number= {UCB/EECS-2009-111},
    Abstract= {Cloud computing makes clusters a commodity, creating the potential for a wide range of programmers to develop new scalable services. However, current cloud platforms do little to simplify truly distributed systems development. In this paper, we explore the use of a declarative, data-centric programming model to achieve this simplicity. We describe our experience using Overlog and Java to implement a "Big Data" analytics stack that is API-compatible with Hadoop and HDFS, with equivalent performance. We extended the system with complex features not yet available in Hadoop, including availability, scalability, and unique monitoring and debugging facilities. We present our experience to validate the enhanced programmer productivity afforded by declarative programming, and to inform the design of new development environments for distributed programming.},
}

EndNote citation:

%0 Report
%A Alvaro, Peter
%A Condie, Tyson
%A Elmeleegy, Khaled
%A Hellerstein, Joseph M.
%A Sears, Russell C
%T BOOM: Data-Centric Programming in the Datacenter
%I EECS Department, University of California, Berkeley
%D 2009
%8 August 10
%@ UCB/EECS-2009-111
%U http://www2.eecs.berkeley.edu/Pubs/TechRpts/2009/EECS-2009-111.html
%F Alvaro:EECS-2009-111