Sicheng Pan

EECS Department, University of California, Berkeley

Technical Report No. UCB/EECS-2024-140

May 21, 2024

http://www2.eecs.berkeley.edu/Pubs/TechRpts/2024/EECS-2024-140.pdf

A query optimizer searches for an efficient query plan within the space of its alternatives using a search algorithm. The scope of the search space used to be defined as part of the search algorithm, but the need to incorporate new optimization techniques constantly reshapes the search space, and this brings development challenges to correctly maintain the search algorithm while supporting the new optimization techniques.

The extensible query optimizer architecture is proposed to address this challenge by isolating the definition of search space from the search algorithm. Within this architecture, developers implement independent rewrite rules that describe equivalence relations among query plans, and the search space for the query optimizer is implicitly defined by the transitive closure of these equivalence relations. An extensible query optimizer explores the search space by repeatedly applying the rewrite rules it knows. To support a new optimization technique, developers only need to implement the corresponding rewrite rule for the query optimizer, without worrying about any changes to the search algorithm. This architecture is widely adopted in the designs of modern data pipelines because it simplifies the process of extending a query optimizer with new optimization techniques. However, the growth in the complexity of rewrite rules continues to bring development challenges in terms of implementation correctness and cost.

In this paper, we present RuleScript, a domain-specific language designed for expressing rewrite rules in an extensible query optimizer. Rules expressed in RuleScript are automatically verified for correctness, and they can be used to generate implementations of rewrite rules to be invoked by extensible query optimizers. Moreover, RuleScript can be extended to support custom symbols in rewrite rules, which are defined by users to represent query plan components specific to the backend query engine.

We have implemented a prototype of RuleScript in Java, and performed case studies with this implementation. We have verified rewrites rules in Apache Calcite and generated implementations that can be plugged into Apache Calcite codebase to illustrate the usability of our design.

Advisors: Alvin Cheung


BibTeX citation:

@mastersthesis{Pan:EECS-2024-140,
    Author= {Pan, Sicheng},
    Title= {Extensible Rule Language for Query Optimizer},
    School= {EECS Department, University of California, Berkeley},
    Year= {2024},
    Month= {May},
    Url= {http://www2.eecs.berkeley.edu/Pubs/TechRpts/2024/EECS-2024-140.html},
    Number= {UCB/EECS-2024-140},
    Abstract= {A query optimizer searches for an efficient query plan within the space of its alternatives using a search algorithm. The scope of the search space used to be defined as part of the search algorithm, but the need to incorporate new optimization techniques constantly reshapes the search space, and this brings development challenges to correctly maintain the search algorithm while supporting the new optimization techniques.

The extensible query optimizer architecture is proposed to address this challenge by isolating the definition of search space from the search algorithm. Within this architecture, developers implement independent rewrite rules that describe equivalence relations among query plans, and the search space for the query optimizer is implicitly defined by the transitive closure of these equivalence relations. An extensible query optimizer explores the search space by repeatedly applying the rewrite rules it knows. To support a new optimization technique, developers only need to implement the corresponding rewrite rule for the query optimizer, without worrying about any changes to the search algorithm.
    
This architecture is widely adopted in the designs of modern data pipelines because it simplifies the process of extending a query optimizer with new optimization techniques. However, the growth in the complexity of rewrite rules continues to bring development challenges in terms of implementation correctness and cost.

In this paper, we present RuleScript, a domain-specific language designed for expressing rewrite rules in an extensible query optimizer. Rules expressed in RuleScript are automatically verified for correctness, and they can be used to generate implementations of rewrite rules to be invoked by extensible query optimizers. Moreover, RuleScript can be extended to support custom symbols in rewrite rules, which are defined by users to represent query plan components specific to the backend query engine.

We have implemented a prototype of RuleScript in Java, and performed case studies with this implementation. We have verified rewrites rules in Apache Calcite and generated implementations that can be plugged into Apache Calcite codebase to illustrate the usability of our design.},
}

EndNote citation:

%0 Thesis
%A Pan, Sicheng 
%T Extensible Rule Language for Query Optimizer
%I EECS Department, University of California, Berkeley
%D 2024
%8 May 21
%@ UCB/EECS-2024-140
%U http://www2.eecs.berkeley.edu/Pubs/TechRpts/2024/EECS-2024-140.html
%F Pan:EECS-2024-140