Efficient Sampling of SAT and SMT Solutions for Testing and Verification

Rafael Dutra

EECS Department, University of California, Berkeley

Technical Report No. UCB/EECS-2019-167

December 3, 2019

http://www2.eecs.berkeley.edu/Pubs/TechRpts/2019/EECS-2019-167.pdf

The problem of generating a large number of diverse solutions to a logical constraint has important applications in testing, verification, and synthesis for both software and hardware. The solutions generated could be used as inputs that exercise some target functionality in a program or as random stimuli to a hardware module. This sampling of solutions can be combined with techniques such as fuzz testing, symbolic execution, and constrained-random verification to uncover bugs and vulnerabilities in real programs and hardware designs. Stimulus generation, in particular, is an essential part of hardware verification, being at the core of widely applied constrained-random verification techniques. For all these applications, the generation of multiple solutions instead of a single solution can lead to better coverage and higher probability of finding bugs. However, generating such solutions efficiently, while achieving a good coverage of the constraint space, is still a challenge today. Moreover, the problem is amplified when the constraints are complex formulas involving several different theories and when the application requires more refined coverage criteria from the solutions.

This work presents three novel techniques developed to tackle the problem of efficient sampling of solutions to logical constraints. They allow the efficient generation of millions of solutions with only tens of queries to a constraint solver, being orders of magnitude faster than previous state-of-the-art samplers. First, a technique called QuickSampler, for sampling of solutions to Boolean (SAT) constraints, with the goal of achieving a close to uniform distribution. Second, a technique called SMTSampler, which is designed to sample solutions to large and complex Satisfiability Modulo Theories (SMT) constraints and aims at providing a good coverage of the constraint itself. Third, a technique called GuidedSampler, which enables coverage-guided sampling of SMT constraints, by shaping the distribution of solutions in a problem-specific basis.

The QuickSampler algorithm takes as input a Boolean constraint and uses only a small number of calls to a constraint solver in order to produce millions of samples that satisfy the constraints with high probability (i.e., 75%) in a few seconds or minutes. Our evaluation of QuickSampler on large real-world benchmarks shows that it can produce unique valid solutions orders of magnitude faster than other state-of-the-art sampling tools. We have also empirically verified that the distribution of solutions is close to uniform, which was our target distribution.

SMTSampler is an extension of the technique that allows efficient sampling of solutions from Satisfiability Modulo Theories (SMT) constraints, including theories such as arrays, bit-vectors and uninterpreted functions. By working over SMT formulas directly, without encoding them into Boolean (SAT) constraints, SMTSampler is able to sample solutions more efficiently, and also achieve a better coverage of the constraint space. Our evaluation on hundreds of benchmarks from SMT-LIB shows that SMTSampler can handle a larger class of SMT problems, outperforming QuickSampler in the number of samples produced and the coverage of the constraint space.

GuidedSampler is an extension of SMTSampler that allows coverage-guided sampling of SMT solutions, by letting the user specify a desired set of coverage points that will shape the distribution of solutions. GuidedSampler enables this greater flexibility by using the specified coverage points to guide the sampling algorithm into generating solutions from diverse coverage classes. This is particularly important for applications that require from a more specific coverage definition, but even for applications where a general notion of coverage suffices, our evaluation shows that the coverage-guided sampling approach is more effective at achieving this desired coverage.

Advisors: Koushik Sen

BibTeX citation:

@phdthesis{Dutra:EECS-2019-167,
    Author= {Dutra, Rafael},
    Title= {Efficient Sampling of SAT and SMT Solutions for Testing and Verification},
    School= {EECS Department, University of California, Berkeley},
    Year= {2019},
    Month= {Dec},
    Url= {http://www2.eecs.berkeley.edu/Pubs/TechRpts/2019/EECS-2019-167.html},
    Number= {UCB/EECS-2019-167},
    Abstract= {The problem of generating a large number of diverse solutions to a logical constraint has important applications in testing, verification, and synthesis for both software and hardware. The solutions generated could be used as inputs that exercise some target functionality in a program or as random stimuli to a hardware module. This sampling of solutions can be combined with techniques such as fuzz testing, symbolic execution, and constrained-random verification to uncover bugs and vulnerabilities in real programs and hardware designs. Stimulus generation, in particular, is an essential part of hardware verification, being at the core of widely applied constrained-random verification techniques. For all these applications, the generation of multiple solutions instead of a single solution can lead to better coverage and higher probability of finding bugs. However, generating such solutions efficiently, while achieving a good coverage of the constraint space, is still a challenge today. Moreover, the problem is amplified when the constraints are complex formulas involving several different theories and when the application requires more refined coverage criteria from the solutions.

This work presents three novel techniques developed to tackle the problem of efficient sampling of solutions to logical constraints. They allow the efficient generation of millions of solutions with only tens of queries to a constraint solver, being orders of magnitude faster than previous state-of-the-art samplers. First, a technique called QuickSampler, for sampling of solutions to Boolean (SAT) constraints, with the goal of achieving a close to uniform distribution. Second, a technique called SMTSampler, which is designed to sample solutions to large and complex Satisfiability Modulo Theories (SMT) constraints and aims at providing a good coverage of the constraint itself. Third, a technique called GuidedSampler, which enables coverage-guided sampling of SMT constraints, by shaping the distribution of solutions in a problem-specific basis.

The QuickSampler algorithm takes as input a Boolean constraint and uses only a small number of calls to a constraint solver in order to produce millions of samples that satisfy the constraints with high probability (i.e., 75%) in a few seconds or minutes. Our evaluation of QuickSampler on large real-world benchmarks shows that it can produce unique valid solutions orders of magnitude faster than other state-of-the-art sampling tools. We have also empirically verified that the distribution of solutions is close to uniform, which was our target distribution.

SMTSampler is an extension of the technique that allows efficient sampling of solutions from Satisfiability Modulo Theories (SMT) constraints, including theories such as arrays, bit-vectors and uninterpreted functions. By working over SMT formulas directly, without encoding them into Boolean (SAT) constraints, SMTSampler is able to sample solutions more efficiently, and also achieve a better coverage of the constraint space. Our evaluation on hundreds of benchmarks from SMT-LIB shows that SMTSampler can handle a larger class of SMT problems, outperforming QuickSampler in the number of samples produced and the coverage of the constraint space.

GuidedSampler is an extension of SMTSampler that allows coverage-guided sampling of SMT solutions, by letting the user specify a desired set of coverage points that will shape the distribution of solutions. GuidedSampler enables this greater flexibility by using the specified coverage points to guide the sampling algorithm into generating solutions from diverse coverage classes. This is particularly important for applications that require from a more specific coverage definition, but even for applications where a general notion of coverage suffices, our evaluation shows that the coverage-guided sampling approach is more effective at achieving this desired coverage.},
}

EndNote citation:

%0 Thesis
%A Dutra, Rafael 
%T Efficient Sampling of SAT and SMT Solutions for Testing and Verification
%I EECS Department, University of California, Berkeley
%D 2019
%8 December 3
%@ UCB/EECS-2019-167
%U http://www2.eecs.berkeley.edu/Pubs/TechRpts/2019/EECS-2019-167.html
%F Dutra:EECS-2019-167