Initiative for the Evaluation of XML Retrieval

XML Entity Ranking (XER) 2009


See also INEX 2009

25 Feb 2010. Official results posted. Qrel files will be posted here when the INEX 2009 proceedings are published.
12 Jul 2010. Test Collection (i.e., qrel files) posted.

Testing Data

The testing data consists of 55 topics (inex09-xer-topics-final.xml) and their assessments in trec_eval format.

XER Topics

60 genuine XER topics were originally selected from the previous two editions to be run on the new INEX Wikipedia collection. The topics were re-assessed by INEX-XER 2009 participants on the new collection.

As last year, topics with fewer than 7 relevant entities (topics 104 and 90) and topics with more than 74 relevant entities (topics 78, 112, and 85) were excluded from the originally proposed set.

The final set consists of 55 genuine XER topics with assessments.

LC Topics

Out of the 55 XER topics, 3 (topics 143, 126, and 132) have been excluded for the LC task.

The reason is that the example entities for these topics were no longer relevant, as the underlying Wikipedia collection had changed.

After this selection, the final set contains 52 List Completion topics, which are considered in the evaluation.

Assessments Data Format (Qrels)

Assessments for the INEX XER topics are provided in qrels format, with the following structure:

The identifier for article ###.xml in the collection is given as WP###.

The possible relevance values are:
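Since the assessments are distributed in trec_eval format (see above), they can be read with any standard qrels parser. A minimal sketch, assuming the usual four-column trec_eval layout (`topic-id iteration document-id relevance`; the topic number and judgment in the sample line are illustrative, not taken from the actual qrels file):

```python
# Parse trec_eval-style qrels lines into a {(topic, doc): relevance} map.
# Assumes the standard four-column layout: topic-id iteration doc-id relevance.

def parse_qrels(lines):
    judgments = {}
    for line in lines:
        if not line.strip():
            continue
        topic, _iteration, doc, rel = line.split()
        judgments[(topic, doc)] = int(rel)
    return judgments

# Article 12345.xml appears in the qrels under the identifier WP12345
# (topic number and relevance value here are made up for illustration).
sample = ["101 0 WP12345 1"]
print(parse_qrels(sample))  # {('101', 'WP12345'): 1}
```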

Guidelines and results

Use inex09-xer-testing-qrels-entity-ranking.txt to evaluate your results on the entity ranking task, and inex09-xer-testing-qrels-list-completion.txt to evaluate results for the list completion task. The two files differ in whether the example entities are included as relevant answers or left out, and in that 3 topics have been removed for LC. Note that your system should not include the given example entities in the answer set when it is evaluated on the list completion task!
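Because example entities must not appear in a list-completion answer set, a run can be filtered before submission or evaluation. A minimal sketch (the entity identifiers below are hypothetical):

```python
# Remove a topic's example entities from a ranked answer list before the
# list completion task is evaluated (all IDs below are hypothetical).

def filter_examples(ranked, examples):
    examples = set(examples)
    return [doc for doc in ranked if doc not in examples]

ranked = ["WP10", "WP20", "WP30", "WP40"]   # system output, best first
examples = ["WP20"]                          # given in the topic, must be excluded
print(filter_examples(ranked, examples))     # ['WP10', 'WP30', 'WP40']
```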

For the entity ranking task, the organizers checked and fixed all cases where example entities provided at topic-creation time had been inserted into the pool and judged as non-relevant at assessment time.

The official evaluation measure is xinfAP, as defined in [1], which uses stratified sampling to estimate Average Precision. To compute this measure, the script sample_eval_xer09.pl (kindly provided by Emine Yilmaz) can be used together with the qrels and run files. The new version of the script treats not-an-entity judgements as non-relevant results.
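For intuition: when the judgment pool is fully assessed (sampling rate 1), inferred measures such as xinfAP reduce to ordinary Average Precision. The following is a toy AP computation, not the official script, which additionally estimates AP from the stratified samples:

```python
# Average Precision over a ranked list, given the set of relevant documents.
# Toy data only; xinfAP estimates this value from stratified samples of the pool.

def average_precision(ranked, relevant):
    hits, precision_sum = 0, 0.0
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / len(relevant) if relevant else 0.0

ranked = ["WP1", "WP2", "WP3", "WP4"]   # hypothetical ranked run
relevant = {"WP1", "WP3"}               # hypothetical judgments
print(average_precision(ranked, relevant))  # (1/1 + 2/3) / 2 ≈ 0.833
```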

The evaluation results measured with xinfAP for the entity ranking task are:

2_UAmsISLA_ER_TC_ERreltop                         0.517266942047346
4_UAmsISLA_ER_TC_ERfeedbackSP                     0.504656609651102
1_AU_ER_TC_mandatoryRun.txt                       0.26960167994075
3_UAmsISLA_ER_TC_ERfeedbackS                      0.209426515126913
2_UAmsISLA_ER_TC_ERfeedback                       0.209372961719828
1_TurfdraagsterpadUvA_ER_TC_base+asscats          0.201481343129621
3_TurfdraagsterpadUvA_ER_TC_base+asscats+prfcats  0.199449553142338
2_TurfdraagsterpadUvA_ER_TC_base+prfcats          0.190188879073875
1_UAmsISLA_ER_TC_ERbaseline                       0.189399588890863
4_TurfdraagsterpadUvA_ER_TC_base                  0.170541471930759
1_PITT_ER_T_MODEL1EDS                             0.15293035801695
1_PITT_ER_T_MODEL1EDR                             0.14605057710165
1_PITT_ER_T_MODEL1ED                              0.130106187928694
1_PITT_ER_T_MODEL1D                               0.128591130396271
1_Waterloo_ER_TC_qap                              0.0953847690953363
5_TurfdraagsterpadUvA_ER_TC_asscats               0.0820370626519062

The evaluation results measured with xinfAP for the list completion task are:

5_UAmsISLA_LC_TE_LCexpTCP                             0.520481785141401
3_UAmsISLA_LC_TE_LCreltop                             0.503709455530018
6_UAmsISLA_LC_TE_LCexpTCSP                            0.503457764392101
1_UAmsISLA_LC_TE_LCexpTC                              0.402108444227398
1_UAmsISLA_LC_TE_LCtermexp                            0.357639763257712
2_UAmsISLA_LC_TEC_LCexpTCS                            0.35095192440502
3_UAmsISLA_LC_TE_LCexpT                               0.31986895359861
1_AU_LC_TE_mandatoryRun.txt                           0.308166069016244
2_UAmsISLA_LC_TE_LCbaseline                           0.253532395195908
4_UAmsISLA_LC_TE_LCexpC                               0.205101342524643
4_TurfdraagsterpadUvA_LC_TE_base+wn20cats             0.173278131477194
3_TurfdraagsterpadUvA_LC_TE_base+wiki20cats+wn20cats  0.165200208905434
2_TurfdraagsterpadUvA_LC_TE_base+wiki20cats+prfcats   0.15992768735345
5_TurfdraagsterpadUvA_LC_TE_base+wiki20cats           0.15661444978422
1_TurfdraagsterpadUvA_LC_TE_base+wiki20cats           0.156389662529707
1_Waterloo_LC_TE                                      0.100157151036525

Guidelines are archived in the original INEX 2009 Entity Ranking guidelines document.

Judging INEX Entity Ranking

See the judging pages.