Abstract:
Distributed Hash Tables (DHTs) are very efficient for
querying based on key lookups, if only a small number
of keys has to be registered by each individual peer. But
when it comes to huge collections of document terms necessary
for IR-style keyword search their usefulness is limited.
One reason is the high cost of index maintenance. Due to
the large sizes of vocabularies for document terms, joining
peers cause huge amounts of key inserts, and therefore large
numbers of index maintenance messages. We propose to use
DHTs in combination with peer clustering to cope with this
issue. In our approach peers are first clustered into communities,
each of the communities having a representative
super peer. Then each term occurring in a community is
only registered once in a global DHT by the representative
peer. Thus, though especially for frequent terms in a community
index maintenance is reduced drastically, the global
DHT is still complete and allows for correct query processing.
Our extensive simulations show the applicability of the
scheme for practical applications.
Keywords: Peer to Peer, Distributed Clustering, Hierarchical DHT