NatLit
Natural language literature research in the metadata of the German National Library based on large language models
The German National Library (Deutsche Nationalbibliothek, DNB) is Germany’s central institution for collecting and indexing publications. Under its legal mandate, it comprehensively records all German and German-language works published since 1913 and provides free access to its metadata catalog. With more than 33 million entries and extensive supplementary content, it is a key resource for academic research, and a natural-language interface based on large language models could make it considerably more accessible.
The NatLit project aims to make the publications indexed in the DNB’s metadata easier and more accurate to find for research questions by letting users interact with LLMs. Users with no bibliographic expertise and no knowledge of the catalog’s query language are supported in carrying out complex, precise, and comprehensive searches.

The interactive literature search takes place in a natural-language chat and proceeds in two phases. In phase 1 (literature search), a user formulates a search query for publications in the DNB holdings. Using a retrieval-augmented generation (RAG) approach, the system extracts the relevant entities from the query, identifies matching metadata in the DNB catalog via a subgraph search, and passes this metadata, together with library background knowledge, to an LLM that answers the question. In phase 2 (media summary), users can ask questions about the publications found, and an LLM answers them.

The quality of the results of this LLM-based literature search should be comparable to or better than that of conventional methods, measured against three criteria: technical functionality, search and response quality, and user experience. As a pilot project for applying large language models to the extensive database of a national library, NatLit is intended to test whether literature search can be made simpler and more efficient.
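The phase-1 retrieval step can be illustrated with a minimal Python sketch. It assumes a toy catalog graph of invented (subject, predicate, object) triples, uses naive case-insensitive string matching as a stand-in for the project's entity extraction, and treats "subgraph search" as a breadth-first expansion of up to `hops` edges around the matched nodes. All names (`TRIPLES`, `extract_entities`, `subgraph`, `build_prompt`) are hypothetical. The sketch does not use the DNB's actual data model or interfaces, and it stops at assembling the prompt rather than calling an LLM.

```python
from collections import defaultdict, deque

# Hypothetical toy catalog graph of (subject, predicate, object) triples.
# Invented for illustration only; NOT real DNB records or the DNB schema.
TRIPLES = [
    ("work:1", "title", "Introduction to Library Science"),
    ("work:1", "creator", "person:1"),
    ("work:1", "topic", "library science"),
    ("person:1", "name", "Jane Example"),
    ("work:2", "title", "Language Models in Practice"),
    ("work:2", "topic", "language models"),
    ("work:2", "creator", "person:1"),
]


def build_adjacency(triples):
    """Index each triple under both endpoints so the graph can be walked in either direction."""
    adj = defaultdict(list)
    for s, p, o in triples:
        adj[s].append((s, p, o))
        adj[o].append((s, p, o))
    return adj


def extract_entities(query, triples):
    """Naive stand-in for entity extraction (assumption): return literal values
    (objects that are not node IDs) that occur verbatim in the query, ignoring case."""
    ids = {s for s, _, _ in triples}
    literals = {o for _, _, o in triples if o not in ids}
    q = query.lower()
    return {lit for lit in literals if lit.lower() in q}


def subgraph(seeds, adj, hops=2):
    """Breadth-first search from the seed nodes; collect every triple reached within `hops` edges."""
    seen = set(seeds)
    frontier = deque((node, 0) for node in seeds)
    found = set()
    while frontier:
        node, depth = frontier.popleft()
        if depth == hops:
            continue  # do not expand beyond the hop limit
        for s, p, o in adj[node]:
            found.add((s, p, o))
            for nxt in (s, o):
                if nxt not in seen:
                    seen.add(nxt)
                    frontier.append((nxt, depth + 1))
    return found


def build_prompt(query, triples):
    """Serialize the retrieved triples as context for an LLM prompt (the LLM call itself is out of scope)."""
    lines = [f"{s} {p} {o}" for s, p, o in sorted(triples)]
    return "Catalog context:\n" + "\n".join(lines) + f"\n\nQuestion: {query}"


adj = build_adjacency(TRIPLES)
query = "Which books cover language models?"
seeds = extract_entities(query, TRIPLES)
context = subgraph(seeds, adj, hops=2)
prompt = build_prompt(query, context)
print(prompt)
```

The hop limit trades context size against recall: with `hops=2` the prompt contains only the triples of `work:2`, not the creator's name, which a third hop would add.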
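DFG (Deutsche Forschungsgemeinschaft, German Research Foundation) funding: Wissenschaftliche Literaturversorgungs- und Informationssysteme (LIS; Scientific Library Services and Information Systems), e-Research-Technologien (e-Research Technologies)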
- Deutsche Nationalbibliothek