Shuxian Wang

EECS Department, University of California, Berkeley

Technical Report No. UCB/EECS-2023-188

May 22, 2023

http://www2.eecs.berkeley.edu/Pubs/TechRpts/2023/EECS-2023-188.pdf

The notion of query equivalence is of great significance in the theory and practise of database systems. Prior work in automated query equivalence checking under bag semantics set the first steps in formally modeling and reasoning about query optimization rules. However, many SQL features such as integrity constraints and NULL semantics are avoided or handled in ad hoc ways in prior attempts, resulting in many unsupported use cases in real world usages. Here we present a new framework for query equivalence checking, based on the U-semiring semantics, that leverage SMT solvers as the reasoning engine with improved modeling of SQL features. Empirically, our implementation can verify 235 out of the 415 query pairs extracted from the latest test suite of the Calcite data management framework, which is 2 times over the state-of-the-art tool (104/415). On the theoretical side, we show that our formalism admits a large query fragment in which query equivalence is decidable relative to an SMT solver, generalizing prior decidability results further.

Advisors: Alvin Cheung


BibTeX citation:

@mastersthesis{Wang:EECS-2023-188,
    Author= {Wang, Shuxian},
    Title= {Towards Practical SQL Equivalence Reasoning},
    School= {EECS Department, University of California, Berkeley},
    Year= {2023},
    Month= {May},
    Url= {http://www2.eecs.berkeley.edu/Pubs/TechRpts/2023/EECS-2023-188.html},
    Number= {UCB/EECS-2023-188},
    Abstract= {The notion of query equivalence is of great significance in the theory and practise of database systems. Prior work in automated query equivalence checking under bag semantics set the first steps in formally modeling and reasoning about query optimization rules. However, many SQL features such as integrity constraints and NULL semantics are avoided or handled in ad hoc ways in prior attempts, resulting in many unsupported use cases in real world usages. Here we present a new framework for query equivalence checking, based on the U-semiring semantics, that leverage SMT solvers as the reasoning engine with improved modeling of SQL features. Empirically, our implementation can verify 235 out of the 415 query pairs extracted from the latest test suite of the Calcite data management framework, which is 2 times over the state-of-the-art tool (104/415). On the theoretical side, we show that our formalism admits a large query fragment in which query equivalence is decidable relative to an SMT solver, generalizing prior decidability results further.},
}

EndNote citation:

%0 Thesis
%A Wang, Shuxian 
%T Towards Practical SQL Equivalence Reasoning
%I EECS Department, University of California, Berkeley
%D 2023
%8 May 22
%@ UCB/EECS-2023-188
%U http://www2.eecs.berkeley.edu/Pubs/TechRpts/2023/EECS-2023-188.html
%F Wang:EECS-2023-188