Mono To3D

Weakly Supervised Learning for Depth Estimation in Monocular Images

Leveraging two types of machine learning methods, namely learning to rank and superset learning, this project develops novel approaches to depth estimation in monocular images. Both approaches require only “weak” supervision, either in the form of relative depth information (“object B is behind object A”) or rough absolute depth information (“object A is close to the camera”), which makes training data easier to acquire. As predictions, they produce qualitative depth maps in the form of rankings that specify the relative order of objects in a scene. This is in contrast to conventional approaches based on statistical regression, which require precise training data and produce (unnecessarily) precise predictions. For both approaches, machine learning algorithms specifically tailored to the problem of depth estimation will be developed. These will be combined with two approaches to feature construction: the systematic (hand-crafted) modeling of the monocular depth cues of human perception, and the use of deep neural networks for representation learning. Our qualitative, weakly supervised approaches to monocular depth estimation will be analyzed and compared with each other, as well as with existing approaches based on statistical regression. Finally, the benefits of our new algorithms will be investigated for several important applications, namely visual concept classification in images and videos, visual concept detection (by means of localization), and image segmentation.
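To make the two kinds of weak supervision concrete, here is a minimal sketch in Python. It is not the project's algorithm: the function names, the logistic pairwise loss, and the interval encoding of "close to the camera" are all assumptions chosen for illustration. Relative supervision is modeled as pairs (near, far), and rough absolute supervision as an interval of admissible depth values, scored with an optimistic superset loss (the smallest loss over all values in the interval).

```python
import math

def pairwise_ranking_loss(scores, pairs):
    """Mean logistic loss over (near, far) pairs.

    scores: hypothetical per-object depth scores (larger = farther away).
    pairs:  weak labels of the form "far is behind near".
    A pair contributes little loss when scores[far] clearly exceeds scores[near].
    """
    total = 0.0
    for near, far in pairs:
        margin = scores[far] - scores[near]
        total += math.log1p(math.exp(-margin))
    return total / len(pairs)

def optimistic_interval_loss(prediction, interval):
    """Absolute error to the closest admissible value in [lo, hi].

    The interval stands in for a rough label such as "close to the camera";
    any prediction inside it incurs zero loss.
    """
    lo, hi = interval
    return max(0.0, lo - prediction, prediction - hi)

def depth_ranking(scores):
    """Qualitative depth map: objects ordered from nearest to farthest."""
    return sorted(scores, key=scores.get)

# Illustrative values only (assumed, not taken from the project).
scores = {"A": 0.2, "B": 1.5, "C": 0.9}
pairs = [("A", "B"), ("A", "C"), ("C", "B")]  # B behind A, C behind A, B behind C

ranking = depth_ranking(scores)
consistent_loss = pairwise_ranking_loss(scores, pairs)
reversed_loss = pairwise_ranking_loss({k: -v for k, v in scores.items()}, pairs)
close_ok = optimistic_interval_loss(scores["A"], (0.0, 0.5))
close_violated = optimistic_interval_loss(scores["C"], (0.0, 0.5))
```

In this sketch the scores only need to order the objects correctly, which is why the prediction is read off as a ranking rather than as metric depth values.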

Funding program

Deutsche Forschungsgemeinschaft (DFG) Research Grant


Prof. Dr. Ralph Ewerth

Project Coordinator