Finding Files Fast
James A. Woods
EECS Department, University of California, Berkeley
Technical Report No. UCB/CSD-83-148
, 1983
http://www2.eecs.berkeley.edu/Pubs/TechRpts/1983/CSD-83-148.pdf
A fast filename search facility for UNIX is presented. It consolidates two data compression methods with a novel string search technique to rapidly locate arbitrary files. The code, integrated into the standard find utility, consults a preprocessed database, regenerated daily. This contrasts with the usual mechanism of matching search keys against candidate items generated on-the-fly from a scattered directory structure. The pathname database is an incrementally-encoded lexicographically sorted list (sometimes referred to as a "front-compressed" file) which is also subjected to common bigram coding to effect further space reduction. The storage savings are a factor of five to six over the standard ascii representation. The list is scanned using a modified linear search specially tailored to the incremental encoding; typical "user time" required by this algorithm is 40%-50% less than with naive search.
BibTeX citation:
@techreport{Woods:CSD-83-148, Author= {Woods, James A.}, Title= {Finding Files Fast}, Year= {1983}, Month= {Jan}, Url= {http://www2.eecs.berkeley.edu/Pubs/TechRpts/1983/5392.html}, Number= {UCB/CSD-83-148}, Abstract= {A fast filename search facility for UNIX is presented. It consolidates two data compression methods with a novel string search technique to rapidly locate arbitrary files. The code, integrated into the standard find utility, consults a preprocessed database, regenerated daily. This contrasts with the usual mechanism of matching search keys against candidate items generated on-the-fly from a scattered directory structure. The pathname database is an incrementally-encoded lexicographically sorted list (sometimes referred to as a "front-compressed" file) which is also subjected to common bigram coding to effect further space reduction. The storage savings are a factor of five to six over the standard ascii representation. The list is scanned using a modified linear search specially tailored to the incremental encoding; typical "user time" required by this algorithm is 40%-50% less than with naive search.}, }
EndNote citation:
%0 Report %A Woods, James A. %T Finding Files Fast %I EECS Department, University of California, Berkeley %D 1983 %@ UCB/CSD-83-148 %U http://www2.eecs.berkeley.edu/Pubs/TechRpts/1983/5392.html %F Woods:CSD-83-148