The ARCOMEM project (FP7-IST-270239) addresses several new challenges that archives, museums and libraries face in the age of social media. Social media are penetrating ever more areas of human life. The aim of ARCOMEM is to create a "collective memory" that is closely linked to its users. It uses Web 2.0 and the wisdom of the masses to make web archiving a more selective and meaningful process.
Arcomem - Using the wisdom of the masses for the intelligent preservation of web content
Motivation
With the rapid growth of content on the Web, the previous "we collect everything" approach to preserving content is no longer practicable, given the resources it requires and the quality of the resulting archives. The vision of the ARCOMEM project is to use the "wisdom of the masses" to select, evaluate and preserve web content. The contents of the resulting archives should reflect user groups' views of the content and their opinions about it.
Goal and Approach
The aim of the ARCOMEM project is to transform static web archives into a kind of "collective memory" that is closely linked to its users. This is to be achieved not only by collecting content from Social Web platforms, but also by analyzing those platforms for pointers to other relevant content. The idea is that social media users include in their posts references to other discussions or external sources they found interesting. The more often a page is recommended, the more relevant it is to the public, so the ARCOMEM web crawler gives such pages higher priority than others. In addition, the importance of a page at the time of collection is documented.
However, recommendations on the Social Web are not the only factor in deciding whether a page is relevant to the archive. The archivist may additionally provide a list of desired entities (e.g., persons, locations, or events such as the "Olympics") and/or topics (e.g., the financial crisis). During the crawl, each page is checked for relevance against this specification. If a page is classified as irrelevant, the links it contains are assumed to point to irrelevant pages as well, so the crawler stops following that direction.
To improve the later use of the collected content, it is automatically subjected to content analysis.
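The crawl strategy described above, which prioritizes pages by how often they are recommended on the Social Web and prunes the links found on irrelevant pages, can be illustrated as a best-first crawl over a priority queue. The following Python sketch is hypothetical and is not ARCOMEM's implementation. It assumes a toy in-memory web (`web` maps a URL to its text and outgoing links), uses raw recommendation counts as the priority, and replaces the project's entity and topic analysis with naive keyword matching in a made-up `CrawlSpec` class.

```python
import heapq
from dataclasses import dataclass, field

# Hypothetical sketch: names and data layout are illustrative assumptions,
# not the ARCOMEM crawler's actual API.


@dataclass
class CrawlSpec:
    """Archivist's crawl specification: desired entities and topics."""
    entities: set[str] = field(default_factory=set)
    topics: set[str] = field(default_factory=set)

    def is_relevant(self, text: str) -> bool:
        # Naive case-insensitive keyword match, standing in for real
        # entity and topic detection.
        lowered = text.lower()
        return any(term.lower() in lowered for term in self.entities | self.topics)


def crawl(
    seeds: list[str],
    web: dict[str, tuple[str, list[str]]],
    recommendations: dict[str, int],
    spec: CrawlSpec,
    max_pages: int = 100,
) -> list[str]:
    """Best-first crawl ordered by Social Web recommendation count.

    Returns the URLs archived as relevant, in the order they were visited.
    """
    # Min-heap of (-recommendations, insertion counter, url): the most
    # recommended URL pops first; the counter breaks ties in FIFO order.
    frontier: list[tuple[int, int, str]] = []
    seen: set[str] = set()
    counter = 0

    def enqueue(url: str) -> None:
        nonlocal counter
        if url not in seen:
            seen.add(url)
            heapq.heappush(frontier, (-recommendations.get(url, 0), counter, url))
            counter += 1

    for url in seeds:
        enqueue(url)

    archived: list[str] = []
    while frontier and len(archived) < max_pages:
        _, _, url = heapq.heappop(frontier)
        if url not in web:
            continue  # page not available in the toy web
        text, links = web[url]
        if not spec.is_relevant(text):
            # Irrelevant page: assume its links are irrelevant too and
            # do not follow them (prune this branch of the crawl).
            continue
        archived.append(url)
        for link in links:
            enqueue(link)
    return archived


if __name__ == "__main__":
    toy_web = {
        "a": ("Olympics coverage", ["b", "c"]),
        "b": ("Cooking recipes", ["d"]),
        "c": ("Financial crisis and the Olympics budget", []),
        "d": ("Olympics results", []),
    }
    spec = CrawlSpec(entities={"Olympics"}, topics={"financial crisis"})
    print(crawl(["a"], toy_web, {"b": 5, "c": 2}, spec))  # prints ['a', 'c']
```

In this example, page `b` is visited before `c` because it has more recommendations, but it matches neither the entity nor the topic, so its link to `d` is never followed even though `d` would have matched. Using a heap keeps each frontier operation logarithmic in the number of queued URLs.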
The content analysis runs independently of the crawl specification: the entities, events and topics occurring in a page, as well as its underlying sentiment, are extracted, analyzed and stored along with the page. The archivist and later users can then use various facets to access specific content in the archive.
Role of the L3S
The role of the L3S in the project is to develop methods for consolidating extracted entities and events, both to enrich the archives and to control the crawler. Furthermore, the L3S is working on new approaches to recognizing the dynamics of language, for example to detect new synonyms for entities. The L3S is also involved in Social Web analysis and in crawler development, with a focus on prioritizing URLs to control the crawler.
Potential applications
To demonstrate and validate the project results, the ARCOMEM crawler is used in two application scenarios. The first scenario addresses the Social Web-driven archiving of media websites, as needed by broadcasters. In the second scenario, two European parliaments use ARCOMEM technology to create political web archives.
More information about the project can be found on the ARCOMEM homepage, or follow us on Twitter.