25 Feb 2010. Official results posted. Qrel files will be posted here when the INEX 2009 proceedings are published.
12 Jul 2010. Test Collection (i.e., qrel files) posted.
The testing data consists of 55 topics (inex09-xer-topics-final.xml) and their assessments in trec_eval format.
Sixty genuine XER topics were originally selected from the previous two editions to be run on the new INEX Wikipedia collection. INEX-XER 2009 participants re-assessed these topics on the new collection.
As last year, topics with fewer than 7 relevant entities (topics 104 and 90) and topics with more than 74 relevant entities (topics 78, 112, and 85) were excluded from the originally proposed set.
The final set consists of 55 genuine XER topics with assessments.
Of the 55 XER topics, 3 were excluded from the LC task (topics 143, 126, and 132).
The example entities for these topics are no longer relevant because the underlying Wikipedia collection has changed.
After this selection, 52 List Completion topics are part of the final set and are considered in the evaluation.
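The selection described above can be sketched as a filter over per-topic counts of relevant entities. This is only an illustration: the function name, the dictionary layout, and the counts in the usage example are hypothetical. Only the thresholds (7 and 74) and the three topic ids excluded for LC are taken from the text.

```python
MIN_RELEVANT = 7     # topics with fewer relevant entities are dropped
MAX_RELEVANT = 74    # topics with more relevant entities are dropped
LC_EXCLUDED = {143, 126, 132}  # dropped for the list completion task only


def select_topics(relevant_counts):
    """Return (er_topics, lc_topics) as sorted lists of topic ids.

    relevant_counts maps a topic id to its number of relevant entities.
    """
    er_topics = sorted(
        topic for topic, n in relevant_counts.items()
        if MIN_RELEVANT <= n <= MAX_RELEVANT
    )
    lc_topics = [topic for topic in er_topics if topic not in LC_EXCLUDED]
    return er_topics, lc_topics


# Made-up counts, for illustration only (not the real assessment numbers).
counts = {104: 3, 78: 120, 143: 20, 60: 15}
er, lc = select_topics(counts)
print(er)  # [60, 143]
print(lc)  # [60]
```

Applied to the real counts, the same filter takes the 60 proposed topics down to 55 for entity ranking and 52 for list completion.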
Assessments for the INEX XER topics are provided in qrels format, with the following structure:
The identifier for article ###.xml in the collection is given as WP### .
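A minimal sketch of this naming rule, assuming that `###` stands for a purely numeric article id. The helper names are hypothetical and are not part of any INEX tool.

```python
import re


def article_to_wp_id(filename):
    """Map an article file name such as '12345.xml' to 'WP12345'."""
    match = re.fullmatch(r"(\d+)\.xml", filename)
    if match is None:
        raise ValueError(f"not an article file name: {filename!r}")
    return "WP" + match.group(1)


def wp_id_to_article(wp_id):
    """Map an identifier such as 'WP12345' back to '12345.xml'."""
    match = re.fullmatch(r"WP(\d+)", wp_id)
    if match is None:
        raise ValueError(f"not a WP identifier: {wp_id!r}")
    return match.group(1) + ".xml"


print(article_to_wp_id("12345.xml"))  # WP12345
print(wp_id_to_article("WP12345"))    # 12345.xml
```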
The possible relevance values are:
Use inex09-xer-testing-qrels-entity-ranking.txt to evaluate your results on the entity ranking task, and inex09-xer-testing-qrels-list-completion.txt to evaluate your results on the list completion task. The two files differ in whether the example entities are counted as relevant answers, and in that the 3 topics excluded for LC have been removed from the list completion file. Note that your system should not include the given example entities in the answer set when evaluating the list completion task!
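One way to meet the list completion requirement is to drop the example entities from each topic's ranked list before writing the run file. The sketch below assumes a run is held in memory as a ranked list of `WP###` identifiers per topic; the function name and the identifiers in the usage example are hypothetical.

```python
def strip_examples(ranked_entities, example_entities):
    """Return the ranking without the topic's example entities.

    The relative order of the remaining entities is preserved.
    """
    examples = set(example_entities)
    return [entity for entity in ranked_entities if entity not in examples]


# Made-up identifiers, for illustration only.
ranking = ["WP10", "WP42", "WP7", "WP99"]
examples = ["WP42", "WP99"]
print(strip_examples(ranking, examples))  # ['WP10', 'WP7']
```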
In the case of the entity ranking task, the organizers checked and fixed all the cases where the examples of relevant entities provided at topic creation time were inserted in the pool and judged as non-relevant at assessment time.
The official evaluation measure is xinfAP as defined in [1], which uses stratified sampling to estimate Average Precision. To compute this measure, use the script sample_eval_xer09.pl (kindly provided by Emine Yilmaz) together with the qrels and the run files. The new version of the script treats not-an-entity judgements as non-relevant results.
The evaluation results measured with xinfAP for the entity ranking task are:

| Run | xinfAP |
|---|---|
| 2_UAmsISLA_ER_TC_ERreltop: | 0.517266942047346 |
| 4_UAmsISLA_ER_TC_ERfeedbackSP: | 0.504656609651102 |
| 1_AU_ER_TC_mandatoryRun.txt: | 0.26960167994075 |
| 3_UAmsISLA_ER_TC_ERfeedbackS: | 0.209426515126913 |
| 2_UAmsISLA_ER_TC_ERfeedback: | 0.209372961719828 |
| 1_TurfdraagsterpadUvA_ER_TC_base+asscats: | 0.201481343129621 |
| 3_TurfdraagsterpadUvA_ER_TC_base+asscats+prfcats: | 0.199449553142338 |
| 2_TurfdraagsterpadUvA_ER_TC_base+prfcats: | 0.190188879073875 |
| 1_UAmsISLA_ER_TC_ERbaseline: | 0.189399588890863 |
| 4_TurfdraagsterpadUvA_ER_TC_base: | 0.170541471930759 |
| 1_PITT_ER_T_MODEL1EDS: | 0.15293035801695 |
| 1_PITT_ER_T_MODEL1EDR: | 0.14605057710165 |
| 1_PITT_ER_T_MODEL1ED: | 0.130106187928694 |
| 1_PITT_ER_T_MODEL1D: | 0.128591130396271 |
| 1_Waterloo_ER_TC_qap: | 0.0953847690953363 |
| 5_TurfdraagsterpadUvA_ER_TC_asscats: | 0.0820370626519062 |

The evaluation results measured with xinfAP for the list completion task are:

| Run | xinfAP |
|---|---|
| 5_UAmsISLA_LC_TE_LCexpTCP: | 0.520481785141401 |
| 3_UAmsISLA_LC_TE_LCreltop: | 0.503709455530018 |
| 6_UAmsISLA_LC_TE_LCexpTCSP: | 0.503457764392101 |
| 1_UAmsISLA_LC_TE_LCexpTC: | 0.402108444227398 |
| 1_UAmsISLA_LC_TE_LCtermexp: | 0.357639763257712 |
| 2_UAmsISLA_LC_TEC_LCexpTCS: | 0.35095192440502 |
| 3_UAmsISLA_LC_TE_LCexpT: | 0.31986895359861 |
| 1_AU_LC_TE_mandatoryRun.txt: | 0.308166069016244 |
| 2_UAmsISLA_LC_TE_LCbaseline: | 0.253532395195908 |
| 4_UAmsISLA_LC_TE_LCexpC: | 0.205101342524643 |
| 4_TurfdraagsterpadUvA_LC_TE_base+wn20cats: | 0.173278131477194 |
| 3_TurfdraagsterpadUvA_LC_TE_base+wiki20cats+wn20cats: | 0.165200208905434 |
| 2_TurfdraagsterpadUvA_LC_TE_base+wiki20cats+prfcats: | 0.15992768735345 |
| 5_TurfdraagsterpadUvA_LC_TE_base+wiki20cats: | 0.15661444978422 |
| 1_TurfdraagsterpadUvA_LC_TE_base+wiki20cats: | 0.156389662529707 |
| 1_Waterloo_LC_TE: | 0.100157151036525 |
Guidelines are archived in the original INEX 2009 Entity Ranking guidelines document.
See the judging pages.