Photo: ©PubPharm

Issue: 03/2018

Machine Learning in Science

Tailor-made information supply

The digital transformation is in full swing. Big data – the generation, linking and evaluation of large amounts of data – influences almost all areas of life in our digital society and has become an indispensable part of science. But the promises that the digital age holds for literature supply and information infrastructures will not come true on their own. Libraries, as central knowledge providers, face a huge challenge: so-called data lakes, which bring together the most diverse types of scientific data in their native form, need strong structuring, rigorous metadata management and tailor-made services for search and data access if they are not to end up as useless data swamps.

One step in this direction is the programme Fachinformationsdienste für die Wissenschaft (FID), which as of 2011 replaces the special subject collections at university libraries, one of the oldest funding programmes of the German Research Foundation (DFG). The FIDs are intended to give scientists in Germany direct and convenient access to specialist literature and research-relevant information, regardless of their location. As a nationwide system, they supplement the information infrastructures of universities, research institutions and research industry with supraregional services for peak demand.

PubPharm, the specialist information service for pharmacy, has been supported by the Braunschweig University Library since 2014 – in close cooperation with scientists from the L3S at the Institute for Information Systems at the TU Braunschweig. The interdisciplinary research that characterises the L3S is also proving to be a model of success here: the combination of library expertise, specialist scientific competence and research-based computer science creates innovation for the pharmaceutical world.

PubPharm focuses on users and their information needs. To this end, its extensible and personalisable information infrastructure is being developed further, which should also make the range of services more flexible and more precise. Machine learning and deep learning technologies enable semantically enriched searches and direct data access to relevant literature. The specialist information service thus remains oriented towards clearly defined scientific products such as publications, research data sets, patents or software as central elements of knowledge transfer. To advance knowledge further, however, it is indispensable to make these products accessible through comprehensive, high-quality semantic metadata.

In the past, users could navigate complex knowledge spaces only with the help of bibliographic attributes such as authors, publication year or publication type. More recently, however, such services have increasingly concentrated on entity-centred information, i.e. information that relates to the content of individual scientific products, such as active substances, molecules or chemical substances mentioned in publications. Using deep learning techniques, PubPharm learns the context in which this entity information occurs from millions of specialist publications and then links it permanently to the corresponding publications. In contrast to classic Linked Open Data or subject databases, which contain individual pieces of information torn out of context, PubPharm makes the links between concepts visible. Users receive explanations of the links and can approach their research object from different perspectives. The scientific products and their relevant entities no longer stand on their own, but form a network; without a deeper understanding of this network, scientific innovation is hardly possible.
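
The idea of linking entities to publications, and entities to one another, can be illustrated with a small Python sketch. It is not PubPharm's implementation: the article describes deep learning over millions of publications, whereas this toy example uses plain string matching as a stand-in. The corpus, the entity list and the function names `build_entity_index` and `build_entity_network` are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical toy corpus: publication id -> abstract text.
PUBLICATIONS = {
    "pub1": "Ibuprofen and aspirin were compared in patients with inflammation.",
    "pub2": "Aspirin reduces the risk of thrombosis.",
    "pub3": "Ibuprofen showed no effect on thrombosis.",
}

# Hypothetical entity vocabulary (assumed here; a real service would
# recognise entities with trained models rather than a fixed list).
ENTITIES = ["ibuprofen", "aspirin", "thrombosis", "inflammation"]


def build_entity_index(publications, entities):
    """Map each entity to the set of publications that mention it."""
    index = defaultdict(set)
    for pub_id, text in publications.items():
        lowered = text.lower()
        for entity in entities:
            if entity in lowered:  # naive substring match, for illustration only
                index[entity].add(pub_id)
    return index


def build_entity_network(index):
    """Link two entities via the publications in which both occur."""
    network = {}
    for a, b in combinations(sorted(index), 2):
        shared = index[a] & index[b]
        if shared:
            # The shared publications act as the "explanation" for the link.
            network[(a, b)] = sorted(shared)
    return network


index = build_entity_index(PUBLICATIONS, ENTITIES)
network = build_entity_network(index)
```

In this sketch, each link in `network` carries the list of publications that justify it, which mirrors in a very simplified way how a link between two concepts can be shown together with its supporting context instead of as an isolated fact.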