Research Seminar Summer Semester 2011

The Research Seminar takes place on Friday at 14:00 in our Multimedia Room (1526), Appelstr. 9a, 15th floor (unless stated otherwise).

Jan 7

organized by:

Speaker: Elena

DivQ: Diversification for Keyword Search over Structured Databases

Keyword queries over structured databases are notoriously ambiguous. No single interpretation of a keyword query can satisfy all users, and multiple interpretations may yield overlapping results. This paper proposes a scheme to balance the relevance and novelty of keyword search results over structured databases. Firstly, we present a probabilistic model which effectively ranks the possible interpretations of a keyword query over structured data. Then, we introduce a scheme to diversify the search results by re-ranking query interpretations, taking into account redundancy of query results. Finally, we propose α-nDCG-W and WS-recall, an adaptation of the α-nDCG and S-recall metrics, taking into account graded relevance of subtopics. Our evaluation on two real-world datasets demonstrates that search results obtained using the proposed diversification algorithms better characterize possible answers available in the database than the results of the initial relevance ranking.
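To make the redundancy-aware re-ranking idea concrete, here is a minimal greedy sketch: each query interpretation is scored by a trade-off between its relevance and the number of result items it adds beyond those already covered. This is an illustration of the general diversification principle, not the paper's actual model; all names and the parameter `lam` are made up for the example.

```python
def greedy_diversify(interpretations, k, lam=0.5):
    """Greedily pick up to k query interpretations, trading off
    relevance against redundancy with already-covered results.
    Each interpretation is a (score, result_ids) pair; illustrative only.
    """
    selected, covered = [], set()
    candidates = list(interpretations)
    while candidates and len(selected) < k:
        def gain(c):
            score, results = c
            novel = len(set(results) - covered)  # results not yet covered
            return lam * score + (1 - lam) * novel / max(len(results), 1)
        best = max(candidates, key=gain)
        candidates.remove(best)
        selected.append(best)
        covered |= set(best[1])
    return selected
```

With `lam=0.5`, a highly relevant interpretation whose results duplicate an already-selected one loses out to a less relevant interpretation that covers new answers.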

Jan 14

organized by: Dimitris

Speakers: Dimitris, Julien, Marco

Efficient Discovery of Frequent Subgraph Patterns in Uncertain Graph Databases (Dimitris)

Mining frequent subgraph patterns in graph databases is a challenging and important problem with applications in several domains. Recently, there has been growing interest in generalizing the problem to uncertain graphs, which can model the inherent uncertainty in the data of many applications. The main difficulty in solving this problem results from the large number of candidate subgraph patterns to be examined and the large number of subgraph isomorphism tests required to find the graphs that contain a given pattern. The latter becomes even more challenging when dealing with uncertain graphs. In this paper, we propose a method that uses an index of the uncertain graph database to reduce the number of comparisons needed to find frequent subgraph patterns. The proposed algorithm relies on the apriori property for enumerating candidate subgraph patterns efficiently. Then, the index is used to reduce the number of comparisons required for computing the expected support of each candidate pattern. It also enables additional optimizations with respect to scheduling and early termination that further increase the efficiency of the method. The evaluation of our approach on three real-world datasets as well as on synthetic uncertain graph databases demonstrates significant cost savings with respect to the state-of-the-art approach.
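The central quantity in this setting is the expected support of a pattern: the sum, over all graphs in the database, of the probability that the pattern occurs in that graph. Under the common model of independent edge probabilities, this can be illustrated with a brute-force enumeration of possible worlds (exponential in the number of edges, so usable only on toy graphs; the paper's indexed algorithm exists precisely to avoid this). The data layout below is an assumption for the example, not the paper's representation.

```python
from itertools import product

def pattern_probability(edge_probs, pattern_edges):
    """Probability that all edges of `pattern_edges` are present in an
    uncertain graph given as {edge: existence probability}, assuming
    independent edges. Brute force over possible worlds."""
    edges = list(edge_probs)
    total = 0.0
    for world in product([True, False], repeat=len(edges)):
        p = 1.0
        for e, present in zip(edges, world):
            p *= edge_probs[e] if present else 1 - edge_probs[e]
        present_set = {e for e, pr in zip(edges, world) if pr}
        if set(pattern_edges) <= present_set:
            total += p  # this world contains the pattern
    return total

def expected_support(db, pattern_edges):
    # Expected support = sum of occurrence probabilities over all graphs.
    return sum(pattern_probability(g, pattern_edges) for g in db)
```

For independent edges this reduces to the product of the pattern's edge probabilities per graph, which the enumeration reproduces.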

Time-Aware Entity-Based Multi-Document Summarisation (Julien)

Automatic news multi-document summarisation has received increased attention lately as a way to cope with the growing number of news articles and sources. Summarisation of news articles has the additional challenge that the documents (news articles) are timestamped, and often relate events which are themselves situated in time.

We propose three contributions which we believe will help improve summarisation quality:

  1. Considering named entities in news articles
  2. Considering time for summarisation and for summary layout
  3. Considering time references in the text in addition to article timestamps

For this, we augment a state-of-the-art summarisation technique with named entities and time references, and adapt a state-of-the-art news event detection method to cluster sentences, improving the summarisation of news articles.

This work is in progress, and I will present the general approach and ideas, as well as the current status of the work.

Detecting Health Events on the Social Web to Enable Epidemic Intelligence (Marco)

Content analysis and clustering of natural language documents is becoming crucial in various domains, including public health. Recent pandemics such as Swine Flu have caused concern for public health officials. Given the ever-increasing pace at which infectious diseases can spread globally, officials must be prepared to react sooner and with greater epidemic intelligence gathering capabilities. There is a need to allow for information gathering from a broader range of sources, including the Web, which in turn requires more robust processing capabilities. To address this limitation, in this paper, we propose a new approach to detect public health events in an unsupervised manner. We address the problems associated with adapting an unsupervised learner to the medical domain and, in doing so, propose an approach which combines aspects of different feature-based event detection methods. We evaluate our approach on a real-world dataset with respect to the quality of article clusters. Our results show that we are able to achieve a precision of 62% and a recall of 75%, evaluated using manually annotated, real-world data.
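As a reminder of how the reported figures are computed, here is the standard set-based definition of precision and recall over detected versus annotated items. This is the textbook formulation, not necessarily the exact cluster-quality protocol used in the paper.

```python
def precision_recall(predicted, relevant):
    """Set-based precision and recall: precision = fraction of
    predictions that are correct, recall = fraction of relevant
    items that were found."""
    predicted, relevant = set(predicted), set(relevant)
    tp = len(predicted & relevant)  # true positives
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(relevant) if relevant else 0.0
    return precision, recall
```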

Wednesday, Feb 2, 14:00

organized by:

Speaker: Alexandros Nanopoulos

Content-based multimedia information retrieval: past, present, and future

Content-based multimedia information retrieval (CBMIR) provides methods for searching through the vast amount of media of different types that are available today. CBMIR is particularly useful in cases where human-assigned text annotations are incomplete. In this presentation we will start by surveying the fundamental concepts of CBMIR (i.e., the past), namely feature extraction, indexing, similarity searching, and relevance feedback. We will also consider the main current trends (i.e., the present), such as human-centered methods, new features and similarity measures, and ways to bridge the “semantic gap”. Finally, we will present some challenges for future research in this area, such as affective computing, new media types, and use of folksonomies.

Friday, Apr 8, 14:00

organized by:

Speaker: Mohammad Alrifai

Service Selection and Transactional Management for Web Service Composition

PhD Defense.

Friday, Apr 15, 14:00

organized by:

Speaker: Ekaterini Ioannou

Entity Linkage for Heterogeneous, Uncertain, and Volatile Data

PhD Defense.

Friday, Apr 29, 14:00

organized by:

Speaker: Eelco Herder

Beyond the Usual Suspects: Context-Aware Revisitation Support

A considerable amount of our activity on the Web involves revisits to pages or sites. Reasons for revisiting include active monitoring of content, verification of information, regular use of online services, and recurring tasks. Browser support for revisitation is mainly focused on frequently and recently visited pages.

We present a dynamic browser toolbar that provides recommendations beyond these usual suspects, balancing diversity and relevance. The recommendation method used is a combination of ranking and propagation methods. Experimental outcomes show that this algorithm performs significantly better than the baseline method. Further experiments address the question whether it is more appropriate to recommend specific pages or rather (portal pages of) Web sites.

We conducted two user studies with a dynamic toolbar that relies on our recommendation algorithm. The outcomes confirm that users appreciate and use the contextual recommendations provided by the toolbar.

Friday, May 6, 14:00

organized by:

Speaker: Anh-Tuan Tran

CATE: Context-Aware Timeline for Entity Exploration

With millions of articles in multiple languages, Wikipedia has become the de-facto source of reference on the Internet today. Each article on Wikipedia contains encyclopedic information about various topics and implicitly represents an entity. Extracting the most important facts about such an entity will help users acquire knowledge more effectively. However, this task is challenging due to the incomplete and noisy nature of Wikipedia.

We have proposed and implemented CATE (Context-Aware Timeline for Entity Exploration), a framework that utilizes Wikipedia to summarize and visualize the important aspects of entities in a timeline fashion. CATE makes it easier for users to draw an informative picture of an entity (e.g. the life of a person, or the evolution of a research topic). The novelty of CATE is two-fold: it explores the entity in different contexts, synchronous with contemporaneous events, and it puts the entity in relationship with other entities, thus offering a broader portrait of it. In order to efficiently query and visualize the results, a number of techniques have been developed, combining information extraction and information retrieval with a novel ranking model.

Speaker: Sascha Tönnies

Developing software in a distributed (research) team – Or how to create sustainable software

The need for good demos at L3S is greater than ever. We have the annual CeBIT, presentations at other fairs and conferences, and promotional events like the recent “Parlamentarischer Abend”. However, only a few demos are available at L3S. In contrast, other very good universities, e.g. Cambridge, invest a lot of effort in developing sustainable software out of research projects and promote it very well. The first step in that direction would be to build a commonly used infrastructure for software development throughout L3S. This short 30-minute talk will introduce the infrastructure I ended up with after 5 years at L3S, developing software for ViFaChem, Cooper and several research papers.

Friday, May 13, 14:00

organized by:

Speaker: Ismail Sengor Altingovde

How does a search engine return results in a few milliseconds… or less?

Web search engines respond to a daily volume of millions of queries over a dataset of billions of Web pages, and do so in (or sometimes less than) a few milliseconds. To achieve this, several intelligently crafted techniques have been developed and embedded into the search architecture. Caching of various data items, such as Web search results, posting lists and their intersections, and documents, is a common and key technique used to increase query throughput and reduce response time.
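The simplest form of result caching can be sketched as an LRU (least-recently-used) store keyed by the query string: a hit returns the cached result list immediately, a miss means the backend must process the query. This is a generic illustration of the technique, not the cache architecture discussed in the talk.

```python
from collections import OrderedDict

class ResultCache:
    """Minimal LRU cache for search results, keyed by query string."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.store = OrderedDict()

    def get(self, query):
        if query not in self.store:
            return None  # cache miss: the backend must answer the query
        self.store.move_to_end(query)  # mark as most recently used
        return self.store[query]

    def put(self, query, results):
        self.store[query] = results
        self.store.move_to_end(query)
        if len(self.store) > self.capacity:
            self.store.popitem(last=False)  # evict least recently used
```

Cost-aware variants, as discussed in the talk, would additionally weigh how expensive each query is to recompute when deciding what to keep.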

In this talk, I will first present some approaches and results from our recent work to improve the performance of search engine caches. In particular, I will discuss cache architecture design and cache content selection strategies that take into account query processing costs. In the second part of the talk, I will present some preliminary findings from a large-scale experiment that shed light on the evolution of Web search results within time. Based on these results, I will point out some possible research directions including, but not limited to, caching.

Speaker: Marcelo Malcher

Using Hadoop to analyze social media data

Hadoop is a distributed computing framework used by major players like Yahoo!, Twitter, Facebook and IBM to analyze large datasets. It is a top-level Apache project and, as such, fully open source with a vibrant community behind it. This short technical talk will introduce Hadoop's main components: the distributed file system and the map-reduce implementation. It will also present its use for the analysis of social media data and show how to implement map-reduce jobs.
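The map-reduce model itself is easy to demonstrate without a cluster: a mapper emits key/value pairs, the framework groups them by key, and a reducer aggregates each group. The toy job below counts hashtags in tweets, simulating both phases locally in plain Python; on Hadoop the same logic would run distributed (e.g. via Hadoop Streaming), and the tweet data here is invented for the example.

```python
from collections import defaultdict

def map_phase(tweets):
    # Mapper: emit (hashtag, 1) for every hashtag occurrence.
    for tweet in tweets:
        for token in tweet.split():
            if token.startswith("#"):
                yield token.lower(), 1

def reduce_phase(pairs):
    # Shuffle + reducer: group the emitted pairs by key and sum counts.
    counts = defaultdict(int)
    for tag, n in pairs:
        counts[tag] += n
    return dict(counts)

tweets = ["Loving #hadoop", "#Hadoop beats grep", "just coffee"]
```

Chaining the two phases, `reduce_phase(map_phase(tweets))`, yields the per-hashtag counts.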

Friday, May 27, 14:00

organized by:

Speaker: Wolf Siberski

Incremental Diversification for Very Large Sets: a Streaming-based Approach

Result diversification is an effective method to reduce the risk that none of the returned results satisfies a user's query intention. It has been shown to decrease query abandonment substantially. On the other hand, computing an optimally diverse set is NP-hard for the usual objectives. Even the greedy diversification algorithms usually exhibit quadratic complexity and require random access to the input set, rendering them impractical in the context of large result sets or continuous data.

To solve this issue, we present a novel diversification approach which treats the input as a stream and processes each element in an incremental fashion, maintaining a near-optimal diverse set at any point in the stream. Our approach exhibits a linear computation and constant memory complexity with respect to input size, without significant loss of diversification quality. In an extensive evaluation on several real-world data sets we show the applicability and efficiency of our algorithm for large result sets as well as for continuous query scenarios such as news stream subscriptions.
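A simple way to see how incremental, stream-based diversification can work is the swap heuristic below: keep a set of at most k elements, and for each arriving element check whether swapping it in for its nearest neighbour improves the set's minimum pairwise distance. This is an illustrative sketch of the general idea (assuming k ≥ 2 and a distance function), not the algorithm proposed in the paper; it uses memory independent of the stream length and, for fixed k, work linear in the number of elements.

```python
def min_pairwise(S, dist):
    # Smallest pairwise distance within S: the diversity objective.
    return min(dist(a, b) for i, a in enumerate(S) for b in S[i + 1:])

def stream_diversify(stream, k, dist):
    """One-pass diversification: maintain up to k elements, swapping a
    new element in for its nearest neighbour whenever that improves
    the minimum pairwise distance. Assumes k >= 2."""
    S = []
    for x in stream:
        if len(S) < k:
            S.append(x)
            continue
        # candidate set: replace x's nearest neighbour in S with x
        i = min(range(len(S)), key=lambda j: dist(x, S[j]))
        candidate = S[:i] + S[i + 1:] + [x]
        if min_pairwise(candidate, dist) > min_pairwise(S, dist):
            S = candidate
    return S
```

On a stream of numbers with absolute difference as the distance, the heuristic keeps the elements that are farthest apart seen so far.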

Friday, June 10, 14:00

organized by: Ivana Marenzi

Speaker: Jaspreet Singh

Design and evolution of LearnWeb2.0

Thanks to the growing success of Web 2.0 platforms, people share information and resources within their social community and beyond. Several tools and platforms have been developed to support different aspects of communication and online collaboration (mash-ups), but one of the keys to success is a good user interface. The interface often acts as a bottleneck to the software's potential. LearnWeb2.0 was developed in 2009 as a collaborative search and sharing platform. It integrates ten popular resource sharing systems as well as social networking systems such as YouTube and Flickr, and provides advanced features for organizing and sharing distributed resources in a collaborative environment. In 2009, the software was developed in PHP; it soon became too difficult to maintain and very slow in terms of response time. LearnWeb2.0 was redeveloped in 2011 in order to make major improvements to all aspects of the software, including a much improved user interface. A better user interface leads to better usability and therefore a higher acceptance rate among users. Since the target user of LearnWeb2.0 is an average person (someone not expected to be very good with computers), the usability of the system is vital to its success. LearnWeb2.0 has recently been evaluated in an educational environment to support collaborative project work. Based on the user evaluation and a deep analysis of the visual interface, new features and usability solutions have been introduced, such as:

• New set of user menus to make site navigation easier and intuitive.

• A new concept of groups was introduced. A group in the new system is a collection of resources, people and the data relevant to both within the group. Resources and users belonging to a group are viewed in the same manner as search results, for better consistency. A thumbnail preview approach with controls was used for better data visualization.

• New admin pages were introduced, offering features such as a change log, a registration wizard, etc.

A questionnaire for user interface satisfaction has been prepared to collect further feedback from the users during the coming Summer and Winter Semesters.

Speaker: Patrick Siehndel


Community news portals and the blogosphere are valuable Internet news sources freely available at all times. They could be more useful, however, if presented and delivered in a more structured way.

SYNC3 is addressing this problem by building a platform that not only supports searching between these unstructured sources, but also displays their connection to the news events that traditional news media refer to. In contrast to the blogosphere, news articles are characterized by a much more structured presentation. SYNC3 analyses these news items, clustering them into appropriate events.

In this talk I will present the structure of the SYNC3 system, especially the user interface. Additionally, I will give a short overview of the frameworks used (Solr and Smart GWT).

Friday, June 24, 14:00

organized by:


Friday, July 1, 14:00

organized by:


Friday, July 15, 14:00

organized by:

Speaker: Marcelo Malcher

Twitter analysis

Friday, September 2, 14:00

organized by: Susanne Elsner

Speaker: Wolfgang Nejdl

L3S extension

From 4 pm we will have a get-together / barbecue.

Please register here to take part.

l3sintern/research_seminar_11.txt · Last modified: 2011/09/07 15:04 by siberski