Smartseer: Continuous Queries over Citeseer
Jayanthkumar Kannan and Beverly Yang and Scott Shenker and Puneet Sharma and Sujata Banerjee and Sujoy Basu and Sung Ju Lee
EECS Department, University of California, Berkeley
Technical Report No. UCB/CSD-05-1371, 2005
http://www2.eecs.berkeley.edu/Pubs/TechRpts/2005/CSD-05-1371.pdf
As the academic world moves away from physical journals and proceedings to online document repositories, the ability to efficiently locate work of interest among the vast sea of newly generated papers will become increasingly important. Towards this end, this paper describes the design of SmartSeer, a system that allows users to register personalized continuous queries over the CiteSeer database of technical documents. These users will then be alerted whenever papers that match their queries are put online. SmartSeer has two main design requirements: it should support rich continuous queries (as opposed to simple keyword searches) to allow effective information retrieval, and it should be capable of running on a loosely maintained group of unreliable machines donated by multiple organizations (as opposed to assuming a reliable and tightly coupled distributed system). Existing work on distributed continuous query systems fails at least one of these requirements. Our design for SmartSeer is based on Distributed Hash Tables (DHTs), and thereby leverages previous work on DHT-based query systems. A prototype of SmartSeer has been implemented and evaluated, and we hope to soon have a publicly available service deployed on PlanetLab. Though we evaluate our design only for the SmartSeer application, we believe it also provides useful insights into other distributed and rich continuous query systems (web alerts, news alerts, etc.).
BibTeX citation:
@techreport{Kannan:CSD-05-1371,
    Author = {Kannan, Jayanthkumar and Yang, Beverly and Shenker, Scott and Sharma, Puneet and Banerjee, Sujata and Basu, Sujoy and Lee, Sung Ju},
    Title = {Smartseer: Continuous Queries over Citeseer},
    Year = {2005},
    Month = {Jan},
    Url = {http://www2.eecs.berkeley.edu/Pubs/TechRpts/2005/5422.html},
    Number = {UCB/CSD-05-1371},
    Abstract = {As the academic world moves away from physical journals and proceedings to online document repositories, the ability to efficiently locate work of interest among the vast sea of newly generated papers will become increasingly important. Towards this end, this paper describes the design of SmartSeer, a system that allows users to register personalized continuous queries over the CiteSeer database of technical documents. These users will then be alerted whenever papers that match their queries are put online. SmartSeer has two main design requirements: it should support rich continuous queries (as opposed to simple keyword searches) to allow effective information retrieval, and it should be capable of running on a loosely maintained group of unreliable machines donated by multiple organizations (as opposed to assuming a reliable and tightly coupled distributed system). Existing work on distributed continuous query systems fails at least one of these requirements. Our design for SmartSeer is based on Distributed Hash Tables (DHTs), and thereby leverages previous work on DHT-based query systems. A prototype of SmartSeer has been implemented and evaluated, and we hope to soon have a publicly available service deployed on PlanetLab. Though we evaluate our design only for the SmartSeer application, we believe it also provides useful insights into other distributed and rich continuous query systems (web alerts, news alerts, etc.).}
}
EndNote citation:
%0 Report
%A Kannan, Jayanthkumar
%A Yang, Beverly
%A Shenker, Scott
%A Sharma, Puneet
%A Banerjee, Sujata
%A Basu, Sujoy
%A Lee, Sung Ju
%T Smartseer: Continuous Queries over Citeseer
%I EECS Department, University of California, Berkeley
%D 2005
%@ UCB/CSD-05-1371
%U http://www2.eecs.berkeley.edu/Pubs/TechRpts/2005/5422.html
%F Kannan:CSD-05-1371