Data-Centric Scientific Workflow Management Systems
David T Liu
EECS Department, University of California, Berkeley
Technical Report No. UCB/EECS-2007-83
June 15, 2007
http://www2.eecs.berkeley.edu/Pubs/TechRpts/2007/EECS-2007-83.pdf
Recent trends in science and technology augur a rapid increase in the number of computations being employed by scientists. Accompanying increased volumes are growing expectations for the tools that scientists use to handle their computations. These increased volumes and expectations present a new set of problems and opportunities in computation management. In this thesis, I propose Data-Centric Scientific Workflow Management Systems (DSWMSs) to address these issues. DSWMSs supersede current approaches by leveraging a deeper understanding of the data manipulated by computations to provide new features and improve usability and performance. Examples of such features include data provenance, work sharing, and interactive computational steering. In this thesis, I make several contributions towards realizing the concept of a DSWMS. First, in conjunction with scientists from several scientific domains, I propose a set of services that are not provided by current paradigms but are made possible in DSWMSs. Second, I define an abstract model, the Functional Data Model with Relational Covers (FDM/RC), for representing scientific workloads, and a language for defining and manipulating instances (schemas) of the model. Third, I design and implement GridDB, a prototype DSWMS. GridDB is deployed on a large cluster at Lawrence Livermore National Laboratory, where it runs science applications at real-world scales. The deployment uncovers a pair of technical problems involving the provisioning of data provenance and memoization (computational caching), so I also contribute solutions to these problems.
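The abstract names memoization (computational caching) and data provenance as two problems surfaced by the GridDB deployment. As a rough illustration of the two general ideas only, the Python sketch below caches a workflow step's output keyed on the step name and its inputs, and logs which inputs produced each computed output. It is not GridDB's design or the solution the thesis contributes: the class `MemoizingExecutor`, its `run` method, and the JSON/SHA-256 input key are hypothetical choices made for this sketch, which also assumes that step inputs are JSON-serializable and that steps are deterministic.

```python
import hashlib
import json
from typing import Any, Callable, Dict, List, Tuple


class MemoizingExecutor:
    """Hypothetical sketch (not GridDB): memoizes step results and logs provenance.

    Assumes steps are deterministic and their inputs are JSON-serializable.
    """

    def __init__(self) -> None:
        # Cache: (step name, canonical input key) -> previously computed output
        self.cache: Dict[Tuple[str, str], Any] = {}
        # Provenance log: one record per step that was actually executed
        self.provenance: List[Dict[str, Any]] = []
        # Number of real (non-cached) executions, to show the cache working
        self.executions = 0

    @staticmethod
    def _key(inputs: Dict[str, Any]) -> str:
        # Canonicalize inputs (sorted keys) so equal inputs map to the same key
        blob = json.dumps(inputs, sort_keys=True)
        return hashlib.sha256(blob.encode("utf-8")).hexdigest()

    def run(self, name: str, fn: Callable[..., Any], **inputs: Any) -> Any:
        key = (name, self._key(inputs))
        if key in self.cache:
            # Memoization hit: reuse the earlier result instead of recomputing
            return self.cache[key]
        self.executions += 1
        output = fn(**inputs)
        self.cache[key] = output
        # Record which step and which inputs produced this output
        self.provenance.append(
            {"step": name, "inputs": dict(inputs), "output": output}
        )
        return output


if __name__ == "__main__":
    def simulate(seed: int, steps: int) -> int:
        # Stand-in for an expensive scientific program
        return seed * steps

    ex = MemoizingExecutor()
    ex.run("simulate", simulate, seed=3, steps=10)
    ex.run("simulate", simulate, seed=3, steps=10)  # served from the cache
    print(ex.executions, ex.provenance)
```

Keying the cache on a canonical serialization of the inputs is one simple way to detect repeated work; a real system would also need to account for program versions and file contents, which this sketch ignores.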
Advisor: Michael Franklin
BibTeX citation:
@phdthesis{Liu:EECS-2007-83,
    Author = {Liu, David T},
    Title = {Data-Centric Scientific Workflow Management Systems},
    School = {EECS Department, University of California, Berkeley},
    Year = {2007},
    Month = {Jun},
    Url = {http://www2.eecs.berkeley.edu/Pubs/TechRpts/2007/EECS-2007-83.html},
    Number = {UCB/EECS-2007-83},
    Abstract = {Recent trends in science and technology augur a rapid increase in the number of computations being employed by scientists. Accompanying increased volumes are growing expectations for the tools that scientists use to handle their computations. These increased volumes and expectations present a new set of problems and opportunities in computation management. In this thesis, I propose Data-Centric Scientific Workflow Management Systems (DSWMSs) to address these issues. DSWMSs supersede current approaches by leveraging a deeper understanding of the data manipulated by computations to provide new features and improve usability and performance. Examples of such features include data provenance, work sharing, and interactive computational steering. In this thesis, I make several contributions towards realizing the concept of a DSWMS. First, in conjunction with scientists from several scientific domains, I propose a set of services that are not provided by current paradigms but are made possible in DSWMSs. Second, I define an abstract model, the Functional Data Model with Relational Covers (FDM/RC), for representing scientific workloads, and a language for defining and manipulating instances (schemas) of the model. Third, I design and implement GridDB, a prototype DSWMS. GridDB is deployed on a large cluster at Lawrence Livermore National Laboratory, where it runs science applications at real-world scales. The deployment uncovers a pair of technical problems involving the provisioning of data provenance and memoization (computational caching), so I also contribute solutions to these problems.}
}
EndNote citation:
%0 Thesis
%A Liu, David T
%T Data-Centric Scientific Workflow Management Systems
%I EECS Department, University of California, Berkeley
%D 2007
%8 June 15
%@ UCB/EECS-2007-83
%U http://www2.eecs.berkeley.edu/Pubs/TechRpts/2007/EECS-2007-83.html
%F Liu:EECS-2007-83