Software description

URL Analytics is an entity extraction tool for URLs developed at the L3S Research Center. It extracts entities from URLs after applying a series of pre-processing steps to the input data. This can be useful for Web crawling, because hints about the content of a web page can be obtained just by looking at its URL.
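
To illustrate the kind of URL pre-processing mentioned above, here is a minimal Python sketch. It is not the tool's actual implementation: the function name `url_to_terms` and the specific steps (decoding the path, dropping a trailing file extension, splitting on non-alphanumeric characters, discarding purely numeric tokens) are assumptions chosen for illustration. The resulting terms are the sort of input an entity extraction step could work on.

```python
import re
from urllib.parse import urlparse, unquote


def url_to_terms(url):
    """Hypothetical helper: turn a URL path into a list of candidate terms."""
    # Keep only the path component and decode percent-escapes such as %20.
    path = unquote(urlparse(url).path)
    # Drop a trailing file extension such as ".html" (assumed step).
    path = re.sub(r"\.[A-Za-z0-9]{1,5}$", "", path)
    # Split on every run of non-alphanumeric characters (/, -, _, ...).
    tokens = re.split(r"[^A-Za-z0-9]+", path)
    # Lowercase the tokens; discard empty strings and purely numeric tokens.
    return [t.lower() for t in tokens if t and not t.isdigit()]


if __name__ == "__main__":
    example = "http://www.example.org/news/2015/angela-merkel-visits-paris.html"
    print(url_to_terms(example))
    # prints ['news', 'angela', 'merkel', 'visits', 'paris']
```

In this sketch the host name is ignored and only the path is tokenized. Whether the host or query string should also contribute terms is a design choice that depends on the collection being crawled.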

The system is described in the following paper, which also includes detailed analytics information:

Semantic URL Analytics to Support Efficient Annotation of Large Scale Web Archives (2015).
Tarcisio Souza, Elena Demidova, Thomas Risse, Helge Holzmann, Gerhard Gossen, Julian Szymanski