Parallel Unification Scheduling in Prolog

Wayne Victor Citrin

EECS Department
University of California, Berkeley
Technical Report No. UCB/CSD-88-415
April 1988

Unification, the fundamental operation in Prolog, can take up to 50% of the execution time of a typical Prolog system. One approach to speeding up the unification operation is to perform it on parallel hardware. Although it has been shown that, in the worst case, there is no parallel algorithm for unification that is faster than the best sequential algorithm, we show that there is a substantial subset of unifications that may be speeded up by being performed in parallel.

In order to identify these parallel unifications, we employ a static program analysis technique, SDDA, originally described by Chang. Using information supplied by the programmer, SDDA attempts to predict the grounding and coupling status of arguments in Prolog clauses at the time those clauses are called. Although the predictions given by Chang's basic technique are only rough estimates (due to the static nature of the analysis), we have introduced a number of refinements to significantly improve the technique's accuracy. In addition, we have further improved the accuracy of the predictions through a program transformation technique called procedure splitting, which is described in the dissertation. We have also added the ability to predict the structure of arguments in the clause heads of the analyzed program. All of these improvements allow more unification parallelism to be discovered during the subsequent scheduling phase.

Using the predictions (called modes) provided by this improved SDDA, we transform the unifications into a set of basic unification operations, and then create a partition of that set such that all operations in any given block of the partition may be safely unified in parallel. We call this partition a schedule. Although we show that finding an optimal schedule is an intractable problem, we present a heuristic algorithm that can usually find an optimal or near-optimal schedule.

The computed schedule is then used to guide a compiler in generating code to perform the parallel unifications. In this thesis, the object machine is a modified version of the Berkeley PLM, to which parallel unification instructions have been added.

Based on simulations of this parallel-unification PLM, we found speedups of a factor of two to three in the unification operation when it is performed on parallel hardware. This suggests an overall speedup of 25-33% on common Prolog benchmarks. Through the use of further measures suggested in the dissertation, it is likely that additional speedup in the unification operation can be achieved.

The results indicate that future designers of Prolog machines would do well to include at least two, and more likely three, unification units in their designs. For those times when insufficient unification parallelism is available to employ all of the unification units, the unused units may be employed to compute "look-ahead" unifications (attempting to unify the call subgoal with the next candidate clause, or attempting subsequent call subgoals), thereby achieving additional speedup.

Advisor: Alvin M. Despain


BibTeX citation:

@phdthesis{Citrin:CSD-88-415,
    Author = {Citrin, Wayne Victor},
    Title = {Parallel Unification Scheduling in Prolog},
    School = {EECS Department, University of California, Berkeley},
    Year = {1988},
    Month = {Apr},
    URL = {http://www2.eecs.berkeley.edu/Pubs/TechRpts/1988/5859.html},
    Number = {UCB/CSD-88-415},
    Abstract = {Unification, the fundamental operation in Prolog, can take up to 50% of the execution time of a typical Prolog system. One approach to speeding up the unification operation is to perform it on parallel hardware. Although it has been shown that, in the worst case, there is no parallel algorithm for unification that is faster than the best sequential algorithm, we show that there is a substantial subset of unifications that may be speeded up by being performed in parallel. <p> In order to identify these parallel unifications, we employ a static program analysis technique, SDDA, originally described by Chang. Using information supplied by the programmer, SDDA attempts to predict the grounding and coupling status of arguments in Prolog clauses at the time those clauses are called. Although the predictions given by Chang's basic technique are only rough estimates (due to the static nature of the analysis), we have introduced a number of refinements to significantly improve the technique's accuracy. In addition, we have further improved the accuracy of the predictions through a program transformation technique called procedure splitting, which is described in the dissertation. We have also added the ability to predict the structure of arguments in the clause heads of the analyzed program. All of these improvements allow more unification parallelism to be discovered during the subsequent scheduling phase. <p> Using the predictions (called modes) provided by this improved SDDA, we transform the unifications into a set of basic unification operations, and then create a partition of that set such that all operations in any given block of the partition may be safely unified in parallel. We call this partition a schedule. Although we show that finding an optimal schedule is an intractable problem, we present a heuristic algorithm that can usually find an optimal or near-optimal schedule. <p> The computed schedule is then used to guide a compiler in generating code to perform the parallel unifications. In this thesis, the object machine is a modified version of the Berkeley PLM, to which parallel unification instructions have been added. <p> Based on simulations of this parallel-unification PLM, we found speedups of a factor of two to three in the unification operation when it is performed on parallel hardware. This suggests an overall speedup of 25-33% on common Prolog benchmarks. Through the use of further measures suggested in the dissertation, it is likely that additional speedup in the unification operation can be achieved. <p> The results indicate that future designers of Prolog machines would do well to include at least two, and more likely three, unification units in their designs. For those times when insufficient unification parallelism is available to employ all of the unification units, the unused units may be employed to compute "look-ahead" unifications (attempting to unify the call subgoal with the next candidate clause, or attempting subsequent call subgoals), thereby achieving additional speedup.}
}

EndNote citation:

%0 Thesis
%A Citrin, Wayne Victor
%T Parallel Unification Scheduling in Prolog
%I EECS Department, University of California, Berkeley
%D 1988
%@ UCB/CSD-88-415
%U http://www2.eecs.berkeley.edu/Pubs/TechRpts/1988/5859.html
%F Citrin:CSD-88-415