Abstract:
Distributed crawling can overcome important
limitations of traditional single-source web crawling systems.
However, the full benefit of distributed crawling is usually
limited to the sites hosting the crawlers; the remaining URLs are,
by and large, distributed randomly among the crawlers. In this
work, we propose a location-aware method, called IPMicra, that
utilizes an IP address hierarchy and allows links to be crawled in a
near-optimal, location-aware manner. Our proposal outperforms
earlier distributed crawling schemes, requiring an order of
magnitude less time to crawl the same set of sites.
Keywords: distributed crawling, web crawling, location-aware crawling
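The abstract does not spell out how the IP address hierarchy is used to delegate links. As a rough illustration only, the sketch below assumes each crawler is associated with one or more IP subnets and hands a target address to the crawler whose subnet matches it most specifically (a longest-prefix match). The subnet table, the crawler names, and the `assign_crawler` function are all hypothetical and are not taken from the paper.

```python
import ipaddress

# Hypothetical table: IP subnets mapped to the crawler assumed to be
# closest to them. Subnets and crawler names are illustrative only.
SUBNET_TO_CRAWLER = {
    ipaddress.ip_network("10.0.0.0/8"): "crawler-a",
    ipaddress.ip_network("10.1.0.0/16"): "crawler-b",
    ipaddress.ip_network("192.168.0.0/16"): "crawler-c",
}


def assign_crawler(ip, table=SUBNET_TO_CRAWLER, default="crawler-default"):
    """Return the crawler for the most specific subnet containing ip."""
    address = ipaddress.ip_address(ip)
    # Collect every subnet in the table that contains the address.
    matches = [net for net in table if address in net]
    if not matches:
        # No known subnet covers this address: fall back to a default.
        return default
    # The longest prefix is the deepest matching node in the hierarchy.
    best = max(matches, key=lambda net: net.prefixlen)
    return table[best]


if __name__ == "__main__":
    for ip in ("10.1.2.3", "10.2.0.1", "192.168.5.5", "172.16.0.1"):
        print(ip, assign_crawler(ip))
```

Longest-prefix matching is used here only because it is a simple way to walk a subnet hierarchy from the most specific level upward; the delegation procedure in IPMicra itself may differ.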
@inproceedings{papapetrou:icic,
author = {Odysseas Papapetrou and George Samaras},
title = {IPMicra: An IP-address based Location Aware Distributed Web Crawler},
booktitle = {International Conference on Internet Computing},
year = {2004},
pages = {694--699},
abstract-url={http://www2.cs.ucy.ac.cy/~cspapap/abstracts/icic04.html},
url={http://www2.cs.ucy.ac.cy/~cspapap/publications/IC04.pdf}
}