Here I share the experimental datasets that we used in some of our 'timeline summarization' papers. They are available for non-commercial purposes.

1. 17 Timelines: We used this dataset (download here) to conduct the experiments for the following papers:

(1.1) G. B. Tran, T. A. Tran, N. K. Tran, M. Alrifai and N. Kanhabua. 2013. Leverage Learning to rank in an optimization framework for timeline summarization. In Proc. TAIA workshop, SIGIR 2013.

(1.2) G. B. Tran, M. Alrifai and D. Q. Nguyen. 2013. Predicting Relevant News Events for Timeline Summaries. In Proc. 22nd WWW 2013. [pdf]


2. Crisis data: We used the Crisis data (download here) to conduct the experiments for our ECIR 2015 paper. After extracting the tar archive, you will find four crisis stories (Egypt, Libya, Yemen and Syria), each in a folder with the corresponding name. Inside each story folder, the 'content' folder holds the text extracted from the HTML pages of the news articles. The extraction was done automatically with the Boilerpipe toolkit, followed by some additional cleaning rules. In each article, the first sentence is the headline, and the articles are organized by date. The 'headlineNorm' folder contains only the headlines, extracted from the title tag of each article's HTML. The 'timelines' folder contains expert timeline summaries from well-known news agencies. If you would like to refetch the news articles, please use 'urls_for_news.txt'.

(2.1) G. Tran, M. Alrifai, E. Herder. 2015. Timeline Summarization from Relevant Headlines. In Proc. 37th ECIR 2015.
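
The Crisis data layout described in item 2 could be read along the following lines. This is only a minimal sketch, not the code used in the paper. It assumes that each file under 'content' holds one article with one sentence per line, so that the first non-empty line is the headline. The function names (read_article, iter_articles) and the extraction directory name 'crisis_data' are hypothetical, and since the exact file and subfolder naming inside 'content' is not specified above, the sketch simply walks the folder recursively.

```python
from pathlib import Path
from typing import Iterator, List, Tuple


def read_article(path: Path) -> Tuple[str, List[str]]:
    """Return (headline, body_lines) for one extracted article file.

    Assumption: one sentence per line, first non-empty line is the headline.
    """
    text = path.read_text(encoding="utf-8", errors="replace")
    lines = [ln.strip() for ln in text.splitlines()]
    lines = [ln for ln in lines if ln]  # drop blank lines
    if not lines:
        return "", []
    return lines[0], lines[1:]


def iter_articles(story_dir: Path) -> Iterator[Tuple[str, str, List[str]]]:
    """Yield (relative_path, headline, body_lines) for every file under 'content'.

    Files are visited in sorted path order; this is chronological only if the
    date-based names in the archive happen to sort that way (an assumption).
    """
    content_dir = story_dir / "content"
    for path in sorted(p for p in content_dir.rglob("*") if p.is_file()):
        headline, body = read_article(path)
        yield path.relative_to(content_dir).as_posix(), headline, body


if __name__ == "__main__":
    root = Path("crisis_data")  # hypothetical directory holding the extracted archive
    if root.is_dir():
        for story_dir in sorted(p for p in root.iterdir() if p.is_dir()):
            count = sum(1 for _ in iter_articles(story_dir))
            print(f"{story_dir.name}: {count} articles")
```

If the files turn out to use a different encoding or a different sentence layout, read_article is the only place that needs to change.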