Measurement and Analysis of Ultrapeer-based P2P Search Networks

Boon Thau Loo, Joseph Hellerstein, Ryan Huebsch, Scott Shenker and Ion Stoica

EECS Department
University of California, Berkeley
Technical Report No. UCB/CSD-03-1277
2003

http://www2.eecs.berkeley.edu/Pubs/TechRpts/2003/CSD-03-1277.pdf

Unstructured networks are used extensively in today's P2P search systems, primarily for file sharing. These networks exploit heterogeneity among peers and offload most of the query-processing load to more powerful nodes. As an alternative to unstructured networks, recent proposals use inverted indexes on structured networks for searching. These structured networks, also known as distributed hash tables (DHTs), guarantee recall and are well suited for locating rare items. However, they may incur significant bandwidth costs for keyword-based searches. This paper presents a measurement study of Gnutella, a popular unstructured network used for file sharing. We focus primarily on Gnutella's search performance and recall, especially in light of recent ultrapeer enhancements. Our study reveals significant query overheads in Gnutella ultrapeers, as well as the presence of queries that may benefit from the use of DHTs. Based on our study, we propose a hybrid search infrastructure to improve search coverage for rare items and present some preliminary performance results.
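The abstract contrasts flooding-based search in an unstructured network with keyword lookups over an inverted index stored in a DHT, and proposes combining the two. The toy Python sketch below shows one way such a hybrid could behave, under a simple assumed policy: flood the query with a hop limit (TTL) first, and consult the DHT index only when the flood returns fewer than a threshold number of results. The names `DHTIndex`, `flood_search` and `hybrid_search`, the TTL and threshold values, and the choice to publish every file into the index are illustrative assumptions, not the report's design. The "DHT" is a single-process dictionary partitioned by keyword hash, so it only mimics which node would own each posting list.

```python
import hashlib
from collections import defaultdict, deque


class DHTIndex:
    """Toy keyword inverted index partitioned over `num_nodes` buckets by hash.

    Hypothetical stand-in for a DHT: everything lives in one process, but each
    keyword is still owned by exactly one simulated node.
    """

    def __init__(self, num_nodes):
        self.num_nodes = num_nodes
        # buckets[node_id][keyword] -> set of file names containing that keyword
        self.buckets = [defaultdict(set) for _ in range(num_nodes)]

    def _owner(self, keyword):
        # Map a keyword to its owning node, as a DHT would hash a key.
        digest = hashlib.sha1(keyword.encode("utf-8")).hexdigest()
        return int(digest, 16) % self.num_nodes

    def publish(self, filename):
        # Add the file to the posting list of every keyword in its name.
        for kw in filename.lower().split():
            self.buckets[self._owner(kw)][kw].add(filename)

    def search(self, keywords):
        # Fetch one posting list per keyword and intersect them. In a real DHT
        # each fetched list crosses the network, which is where the bandwidth
        # cost of multi-keyword queries comes from.
        postings = [self.buckets[self._owner(kw)].get(kw, set()) for kw in keywords]
        return set.intersection(*postings) if postings else set()


def flood_search(graph, files, start, keywords, ttl):
    """Breadth-first flood from `start` to peers at most `ttl` hops away."""
    matches = set()
    seen = {start}
    frontier = deque([(start, 0)])
    while frontier:
        peer, hops = frontier.popleft()
        for name in files.get(peer, []):
            if all(kw in name.lower().split() for kw in keywords):
                matches.add(name)
        if hops < ttl:
            for nbr in graph.get(peer, []):
                if nbr not in seen:
                    seen.add(nbr)
                    frontier.append((nbr, hops + 1))
    return matches


def hybrid_search(graph, files, dht, start, query, ttl=2, min_results=2):
    """Flood first; fall back to the DHT index when the flood finds too little.

    Returns (results, used_dht).
    """
    keywords = query.lower().split()
    results = flood_search(graph, files, start, keywords, ttl)
    if len(results) >= min_results:
        return results, False
    # Few flood hits suggest a rare item, so consult the DHT index as well.
    return results | dht.search(keywords), True


# Usage: a 5-peer chain 0-1-2-3-4 with a popular title near peer 0
# and a rare title on peer 4.
graph = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
files = {0: ["Free Bird"], 1: ["Free Bird Live"], 4: ["Obscure Bootleg"]}
dht = DHTIndex(num_nodes=8)
for names in files.values():
    for name in names:
        dht.publish(name)

print(hybrid_search(graph, files, dht, start=0, query="free bird"))
print(hybrid_search(graph, files, dht, start=0, query="obscure bootleg"))
```

In this example the popular query is answered by the flood alone, because both matching titles lie within two hops of peer 0. The rare title sits four hops away, beyond the TTL, so it is found only through the DHT fallback. The TTL bounds how much flooding traffic each query generates, and the result threshold is a crude proxy for "this item is rare"; a real system would need to tune both.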


BibTeX citation:

@techreport{Loo:CSD-03-1277,
    Author = {Loo, Boon Thau and Hellerstein, Joseph and Huebsch, Ryan and Shenker, Scott and Stoica, Ion},
    Title = {Measurement and Analysis of Ultrapeer-based P2P Search Networks},
    Institution = {EECS Department, University of California, Berkeley},
    Year = {2003},
    URL = {http://www2.eecs.berkeley.edu/Pubs/TechRpts/2003/5771.html},
    Number = {UCB/CSD-03-1277},
    Abstract = {Unstructured Networks have been used extensively in P2P search systems today primarily for file sharing. These networks exploit heterogeneity in the network and offload most of the query processing load to more powerful nodes. As an alternative to unstructured networks, there have been recent proposals for using inverted indexes on structured networks for searching. These structured networks, otherwise known as distributed hash tables (DHTs), guarantee recall and are well suited for locating rare items. However, they may incur significant bandwidth for keyword-based searches. This paper performs a measurement study of Gnutella, a popular unstructured network used for file sharing. We focus primarily on studying Gnutella's search performance and recall, especially in light of recent ultrapeer enhancements. Our study reveals significant query overheads in Gnutella ultra-peers, and the presence of queries that may benefit from the use of DHTs. Based on our study, we propose the use of a hybrid search infrastructure to improve the search coverage for rare items and present some preliminary performance results.}
}

EndNote citation:

%0 Report
%A Loo, Boon Thau
%A Hellerstein, Joseph
%A Huebsch, Ryan
%A Shenker, Scott
%A Stoica, Ion
%T Measurement and Analysis of Ultrapeer-based P2P Search Networks
%I EECS Department, University of California, Berkeley
%D 2003
%@ UCB/CSD-03-1277
%U http://www2.eecs.berkeley.edu/Pubs/TechRpts/2003/5771.html
%F Loo:CSD-03-1277