This project extends the Lehigh University Benchmark (LUBM) with fulltext content and queries. The generated dataset contains realistic person names and publication content. The additional queries target the fulltext search capabilities of RDF stores. LUBM was chosen as the basis because of its wide acceptance, frequent usage, and familiar ontology domain. Other existing or future benchmarks can be extended similarly.
The LUBMft extension consists of two parts: the data generator UBAft and the benchmark tester UBTft. The data generator additionally generates realistic names for all persons and realistic content for all publications. The benchmark tester has improved benchmarking capabilities and contains new queries targeting fulltext search and IR features.
E. Minack, W. Siberski and W. Nejdl. "Benchmarking Fulltext Search Performance of RDF Stores", in Proceedings of the 6th European Semantic Web Conference (ESWC), pp. 81-95, Heraklion, Crete, Greece, May 31-June 4, 2009.
@inproceedings{DBLP:conf/esws/MinackSN09,
author = {Enrico Minack and Wolf Siberski and Wolfgang Nejdl},
title = {{B}enchmarking {F}ulltext {S}earch {P}erformance of {RDF} {S}tores},
booktitle = {Proceedings of the 6th European Semantic Web Conference (ESWC)},
address = {Heraklion, Crete, Greece},
month = {May 31--June 4},
year = {2009},
pages = {81--95},
ee = {http://dx.doi.org/10.1007/978-3-642-02121-3_10},
isbn = {978-3-642-02120-6}
}
Since the original code of LUBM is released under the GNU General Public License (GPL), this license also holds for the following source code.
java interpreter. They can be compiled using Eclipse (Eclipse project files included). Use the *.jardesc files to update the jar files.
java -jar ubaft-1.0.0.jar -univ N -onto "http://www.lehigh.edu/~zhp2/2004/0401/univ-bench.owl" -names -docs
java -jar rdf2rdf-VERSION.jar University*.owl .n3
Link /usr/sbin/flush-fs-cache.sh to UBT/flush-fs-cache.sh by doing
cd /usr/sbin
sudo ln -s PATH/UBT/flush-fs-cache.sh
Edit the /etc/sudoers file using sudo sudoedit /etc/sudoers and add the line
YOURUSERNAME ALL=(root) NOPASSWD: /usr/sbin/flush-fs-cache.sh
so that UBTft is allowed to flush the filesystem cache.
virtuoso
in the project folder
UBTWrapper* project folder,
Edit the config.kb.* files to let the data variable point to the folder containing all benchmark data set files of the desired benchmark size.
sh ./load_ubt_FLAVOUR.sh
config.kb.* configuration files (usually different backends).
sh ./query_ubt_FLAVOUR.sh
sh ./evaluate_all.sh
Adapt evaluate_all.sh to your needs.
The fulltext queries are provided as SPARQL templates in which certain macros have to be replaced with the fulltext query syntax of the specific RDF store. These macros are:
| Macro | Description |
|---|---|
| %%FULLTEXT_SEARCH_PREFIX%% | namespace declarations used by the fulltext queries |
| %%FULLTEXT_SEARCH(?X, "keyword")%% | keyword search that binds variable ?X to resources matching the given keyword |
| %%FULLTEXT_SEARCH(?X, ub:publicationText, "keyword")%% | keyword search that binds variable ?X to resources matching the given keyword only in the given predicate |
| %%FULLTEXT_SEARCH(?X, ub:publicationText, "keyword", ?score)%% | additionally returns the relevance score of the matching resource |
| %%FULLTEXT_SEARCH(?X, ub:publicationText, "keyword", ?snippet)%% | additionally returns a snippet of the matching content |
| %%FULLTEXT_SEARCH(?X, ub:publicationText, "keyword", ?score, k)%% | restricts the number of matching resources to the top-k |
| %%FULLTEXT_SEARCH(?X, ub:publicationText, "keyword", ?score, l)%% | restricts the matching resources to those whose score exceeds the given limit l |
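Before a macro can be replaced, it has to be located in the template and split into its arguments. The following is a minimal sketch of that step in Python, not part of UBTft itself. The function name `find_macros`, the regular expressions, and the sample query are hypothetical; only the macro syntax is taken from the table above.

```python
import re

# Matches %%FULLTEXT_SEARCH(...)%% and captures the raw argument list.
# %%FULLTEXT_SEARCH_PREFIX%% is not matched because it has no parentheses.
MACRO_RE = re.compile(r'%%FULLTEXT_SEARCH\((.*?)\)%%')

# One argument is either a double-quoted keyword (which may contain commas
# or spaces) or a run of characters without commas and whitespace.
ARG_RE = re.compile(r'"[^"]*"|[^,\s]+')


def find_macros(template):
    """Return one argument list per FULLTEXT_SEARCH macro in the template."""
    return [ARG_RE.findall(m.group(1)) for m in MACRO_RE.finditer(template)]


# Hypothetical template for illustration; not one of the benchmark queries.
template = '''%%FULLTEXT_SEARCH_PREFIX%%
SELECT ?X WHERE {
  %%FULLTEXT_SEARCH(?X, ub:publicationText, "network", ?score, 10)%%
}'''

for args in find_macros(template):
    print(len(args), args)
```

The number of arguments alone does not distinguish every variant (for example `?score` versus `?snippet`, or top-k versus score limit), so a real replacement step would need an additional convention, such as inspecting the variable name or the numeric type.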
For the following RDF stores, the macros have to be replaced according to these examples:
| Jena + LARQ | |
|---|---|
| Namespace | PREFIX arq: <http://jena.hpl.hp.com/ARQ/property#> |
| Examples | ?lit arq:textMatch "keyword" . |
| | (?lit ?score) arq:textMatch ("keyword" 10) . |
| | (?lit ?score) arq:textMatch ("keyword" 0.75) . |
| Sesame2 + LuceneSail | |
| Namespace | PREFIX ls: <http://www.openrdf.org/contrib/lucenesail#> |
| Example | ?X ls:matches [ |
| Virtuoso5 | |
| Namespace | the bif namespace is built in |
| Example | ?X ?p ?lit . |
| YARS | |
| Namespace | PREFIX yars: <http://sw.deri.org/2004/06/yars#> |
| Example | ?lit yars:keyword "keyword" . |
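As an illustration of how the simplest macro form could be replaced, here is a minimal Python sketch for Jena + LARQ and YARS. The prefixes and keyword patterns are copied from the tables above. Everything else is an assumption: the function name `expand`, the dictionary layout, the sample query, and in particular the `?X ?p ?lit .` triple used to connect the resource `?X` to the matched literal.

```python
import re

# Prefix declaration and keyword pattern per store, as listed in the tables.
STORES = {
    "larq": ('PREFIX arq: <http://jena.hpl.hp.com/ARQ/property#>',
             '?lit arq:textMatch "{kw}" .'),
    "yars": ('PREFIX yars: <http://sw.deri.org/2004/06/yars#>',
             '?lit yars:keyword "{kw}" .'),
}

# Only the two-argument form %%FULLTEXT_SEARCH(?X, "keyword")%% is handled.
SIMPLE_MACRO = re.compile(r'%%FULLTEXT_SEARCH\((\?\w+),\s*"([^"]*)"\)%%')


def expand(template, store):
    """Replace the prefix macro and simple keyword macros for one store."""
    prefix, pattern = STORES[store]

    def repl(match):
        var, kw = match.group(1), match.group(2)
        # Assumption: the resource is linked to the literal by any predicate.
        return f'{var} ?p ?lit . ' + pattern.format(kw=kw)

    query = SIMPLE_MACRO.sub(repl, template)
    return query.replace('%%FULLTEXT_SEARCH_PREFIX%%', prefix)


# Hypothetical template for illustration; not one of the benchmark queries.
sample = ('%%FULLTEXT_SEARCH_PREFIX%%\n'
          'SELECT ?X WHERE { %%FULLTEXT_SEARCH(?X, "network")%% }')
print(expand(sample, "yars"))
```

Keeping the store-specific strings in one table means that supporting another store only requires a new entry, provided its syntax fits the same literal-matching shape. The predicate, score, snippet, and limit variants would each need their own pattern.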