Skip to main content

The Uncovr project aims to improve the identification and linking of musical works in videos. Methods from artificial and collective intelligence will be applied and developed for this purpose. For example, we will investigate what types of tasks and incentive systems are suitable for obtaining high-quality video-work assignments with the help of serious games or crowdsourcing. Likewise, it will be investigated how machine learning methods, for example meta-learning and similarity learning approaches, are suitable for solving the task.


On popular online video platforms, such as YouTube, thousands of videos containing music are uploaded every day. Besides official music videos, this also includes videos with cover versions or live interpretations, as well as videos containing background music for the actual content. A correct identification and linking of musical works in music videos is a prerequisite for a fair compensation of the artists who wrote and performed the works.

The Uncovr project tackles this challenge by combining approaches from artificial intelligence and collective intelligence to identify interpretations of musical works in videos, beyond official music videos. For this purpose, already existing web resources with metadata on works, interpretations, and music videos are linked together and are then supplemented and consolidated using collective intelligence, for example with the help of serious games or crowdsourcing.

Main research questions are the design of adequate tasks and incentive systems as well as the selection of suitable (sub-)datasets. The dataset is then used to train machine learning methods, in particular deep neural networks, to enable the automatic identification and linking. The challenges are, for example, the choice of suitable representations and the formulation of adequate tasks.

The project is investigating the extent to which meta-learning and similarity learning approaches are suitable for solving this task. A further goal is to develop methods that intelligently combine algorithmic and human recognition performance in order to efficiently process very large amounts of data.