Implicit Storage Schemes for Quick Retrieval

Simeon Naor

EECS Department
University of California, Berkeley
Technical Report No. UCB/CSD-89-510
June 1989

http://www2.eecs.berkeley.edu/Pubs/TechRpts/1989/CSD-89-510.pdf

We address problems of the form: given a collection of n records, each composed of k keys, arrange them in a table of size n so that locating a record with a given key value can be performed while accessing the table as few times as possible. We analyze the amount of additional memory required to achieve optimal search time in a few settings. Schemes that do not use any additional memory are called implicit.

When comparisons are the only basic operation performed on the records, the table can be arranged with no additional memory, so that searching can be performed in optimal logarithmic time. The preprocessing time is also optimal, O(n log n).

When the operations are not restricted to comparisons, e.g. hashing, the problem is analyzed with respect to the size of the domain from which the keys are drawn. It is shown that if the domain is of size at most exponential in the number of records, then there exists an implicit scheme which achieves searching in worst case O(1) probes to the table. Conversely, lower bounds on the domain size for which no implicit O(1) probe search schemes exist are given. These lower bounds improve the previously known bounds. The problem is considered both for the single-key case and the multi-key case. The general conclusion is that the two cases behave in a similar fashion.

The upper bounds for both the comparison and the unrestricted cases are in contrast to lower bounds that appeared in the literature for these models. The seeming contradiction is resolved by observing that our models allow the search algorithm to use information which is not directly related to the value being searched. This points out the delicacy in choosing an appropriate model.

The solutions are achieved by techniques that enable representation of up to O(n log n) arbitrary bits implicitly, by exploiting the order of the records, so that a word of length O(log n) bits can be retrieved by accessing the table a constant number of times. Thus, the general methodology in designing the schemes is as follows: first, give a space efficient algorithm that uses some additional memory, then show how to encode the additional memory implicitly.

Advisor: Manuel Blum


BibTeX citation:

@phdthesis{Naor:CSD-89-510,
    Author = {Naor, Simeon},
    Title = {Implicit Storage Schemes for Quick Retrieval},
    School = {EECS Department, University of California, Berkeley},
    Year = {1989},
    Month = {Jun},
    URL = {http://www2.eecs.berkeley.edu/Pubs/TechRpts/1989/6155.html},
    Number = {UCB/CSD-89-510},
    Abstract = {We address problems of the form: given a collection of <i>n</i> records, each composed of <i>k</i> keys, arrange them in a table of size <i>n</i> so that locating a record with a given key value can be performed while accessing the table as few times as possible. We analyze the amount of additional memory required to achieve optimal search time in a few settings. Schemes that do not use any additional memory are called implicit. <p>When comparisons are the only basic operation performed on the records, the table can be arranged with no additional memory, so that searching can be performed in optimal logarithmic time. The preprocessing time is also optimal, <i>O</i>(<i>n</i> log <i>n</i>). <p>When the operations are not restricted to comparisons, e.g. hashing, the problem is analyzed with respect to the size of the domain from which the keys are drawn. It is shown that if the domain is of size at most exponential in the number of records, then there exists an implicit scheme which achieves searching in worst case <i>O</i>(1) probes to the table. Conversely, lower bounds on the domain size for which no implicit <i>O</i>(1) probe search schemes exist are given. These lower bounds improve the previously known bounds. The problem is considered both for the single-key case and the multi-key case. The general conclusion is that the two cases behave in a similar fashion. <p>The upper bounds for both the comparison and the unrestricted cases are in contrast to lower bounds that appeared in the literature for these models. The seeming contradiction is resolved by observing that our models allow the search algorithm to use information which is not directly related to the value being searched. This points out the delicacy in choosing an appropriate model. <p>The solutions are achieved by techniques that enable representation of up to <i>O</i>(<i>n</i> log <i>n</i>) arbitrary bits implicitly, by exploiting the order of the records, so that a word of length <i>O</i>(log <i>n</i>) bits can be retrieved by accessing the table a constant number of times. Thus, the general methodology in designing the schemes is as follows: first, give a space efficient algorithm that uses some additional memory, then show how to encode the additional memory implicitly.}
}

EndNote citation:

%0 Thesis
%A Naor, Simeon
%T Implicit Storage Schemes for Quick Retrieval
%I EECS Department, University of California, Berkeley
%D 1989
%@ UCB/CSD-89-510
%U http://www2.eecs.berkeley.edu/Pubs/TechRpts/1989/6155.html
%F Naor:CSD-89-510