Smartseer: Continuous Queries over Citeseer

Jayanthkumar Kannan, Beverly Yang, Scott Shenker, Puneet Sharma, Sujata Banerjee, Sujoy Basu and Sung Ju Lee

EECS Department
University of California, Berkeley
Technical Report No. UCB/CSD-05-1371
January 2005

http://www2.eecs.berkeley.edu/Pubs/TechRpts/2005/CSD-05-1371.pdf

As the academic world moves away from physical journals and proceedings to online document repositories, the ability to efficiently locate work of interest among the vast sea of newly generated papers will become increasingly important. Towards this end, this paper describes the design of SmartSeer, a system that allows users to register personalized continuous queries over the CiteSeer database of technical documents. These users will then be alerted whenever papers that match their queries are put online. SmartSeer has two main design requirements: it should support rich continuous queries (as opposed to simple keyword searches) to allow effective information retrieval, and it should be capable of running on a loosely maintained group of unreliable machines donated by multiple organizations (as opposed to assuming a reliable and tightly coupled distributed system). Existing work on distributed continuous query systems fails at least one of these requirements. Our design for SmartSeer is based on Distributed Hash Tables (DHTs), and thereby leverages previous work on DHT-based query systems. A prototype of SmartSeer has been implemented and evaluated, and we hope to soon have a publicly available service deployed on PlanetLab. Though we evaluate our design only for the SmartSeer application, we believe it also provides useful insights into other distributed and rich continuous query systems (web alerts, news alerts, etc.).


BibTeX citation:

@techreport{Kannan:CSD-05-1371,
    Author = {Kannan, Jayanthkumar and Yang, Beverly and Shenker, Scott and Sharma, Puneet and Banerjee, Sujata and Basu, Sujoy and Lee, Sung Ju},
    Title = {Smartseer: Continuous Queries over Citeseer},
    Institution = {EECS Department, University of California, Berkeley},
    Year = {2005},
    Month = {Jan},
    URL = {http://www2.eecs.berkeley.edu/Pubs/TechRpts/2005/5422.html},
    Number = {UCB/CSD-05-1371},
    Abstract = {As the academic world moves away from physical journals and proceedings to online document repositories, the ability to efficiently locate work of interest among the vast sea of newly generated papers will become increasingly important. Towards this end, this paper describes the design of SmartSeer, a system that allows users to register personalized continuous queries over the CiteSeer database of technical documents. These users will then be alerted whenever papers that match their queries are put online. SmartSeer has two main design requirements: it should support rich continuous queries (as opposed to simple keyword searches) to allow effective information retrieval, and it should be capable of running on a loosely maintained group of unreliable machines donated by multiple organizations (as opposed to assuming a reliable and tightly coupled distributed system). Existing work on distributed continuous query systems fails at least one of these requirements. Our design for SmartSeer is based on Distributed Hash Tables (DHTs), and thereby leverages previous work on DHT-based query systems. A prototype of SmartSeer has been implemented and evaluated, and we hope to soon have a publicly available service deployed on PlanetLab. Though we evaluate our design only for the SmartSeer application, we believe it also provides useful insights into other distributed and rich continuous query systems (web alerts, news alerts, etc.).}
}

EndNote citation:

%0 Report
%A Kannan, Jayanthkumar
%A Yang, Beverly
%A Shenker, Scott
%A Sharma, Puneet
%A Banerjee, Sujata
%A Basu, Sujoy
%A Lee, Sung Ju
%T Smartseer: Continuous Queries over Citeseer
%I EECS Department, University of California, Berkeley
%D 2005
%@ UCB/CSD-05-1371
%U http://www2.eecs.berkeley.edu/Pubs/TechRpts/2005/5422.html
%F Kannan:CSD-05-1371