Finding Files Fast | EECS at UC Berkeley

James A. Woods

EECS Department, University of California, Berkeley

Technical Report No. UCB/CSD-83-148

1983

http://www2.eecs.berkeley.edu/Pubs/TechRpts/1983/CSD-83-148.pdf

A fast filename search facility for UNIX is presented. It consolidates two data compression methods with a novel string search technique to rapidly locate arbitrary files. The code, integrated into the standard find utility, consults a preprocessed database, regenerated daily. This contrasts with the usual mechanism of matching search keys against candidate items generated on-the-fly from a scattered directory structure. The pathname database is an incrementally-encoded lexicographically sorted list (sometimes referred to as a "front-compressed" file) which is also subjected to common bigram coding to effect further space reduction. The storage savings are a factor of five to six over the standard ascii representation. The list is scanned using a modified linear search specially tailored to the incremental encoding; typical "user time" required by this algorithm is 40%-50% less than with naive search.

BibTeX citation:

@techreport{Woods:CSD-83-148,
    Author= {Woods, James A.},
    Title= {Finding Files Fast},
    Year= {1983},
    Month= {Jan},
    Url= {http://www2.eecs.berkeley.edu/Pubs/TechRpts/1983/5392.html},
    Number= {UCB/CSD-83-148},
    Abstract= {A fast filename search facility for UNIX is presented.  It consolidates two data compression methods with a novel string search technique to rapidly locate arbitrary files. The code, integrated into the standard find utility, consults a preprocessed database, regenerated daily. This contrasts with the usual mechanism of matching search keys against candidate items generated on-the-fly from a scattered directory structure. The pathname database is an incrementally-encoded lexicographically sorted list (sometimes referred to as a "front-compressed" file) which is also subjected to common bigram coding to effect further space reduction. The storage savings are a factor of five to six over the standard ascii representation. The list is scanned using a modified linear search specially tailored to the incremental encoding; typical "user time" required by this algorithm is 40%-50% less than with naive search.},
}

EndNote citation:

%0 Report
%A Woods, James A. 
%T Finding Files Fast
%I EECS Department, University of California, Berkeley
%D 1983
%@ UCB/CSD-83-148
%U http://www2.eecs.berkeley.edu/Pubs/TechRpts/1983/5392.html
%F Woods:CSD-83-148