L3S Best Publication of the Quarter (Q2/2025)
Category: Vision-Language Models
Aligning Visual Contrastive Learning Models via Preference Optimization
Authors: Amirabbas Afzali, Borna Khodabandeh, Ali Rasekh, Mahyar JafariNodeh, Sepehr Kazemi, Simon Gottschalk
The paper in a nutshell:
Our paper addresses critical vulnerabilities in AI vision systems that understand both images and text, such as those used in search engines, autonomous vehicles, and content moderation. On the one hand, these systems can be easily fooled by simple tricks like adding misleading text to images; on the other hand, they can exhibit unfair biases towards certain groups. We developed a new training method that teaches these AI systems to behave more reliably by learning from human preferences about what constitutes correct behavior. Our approach is like having a human supervisor guide the AI to make better decisions, resulting in systems that are more robust against attacks and fairer in their judgments while maintaining their original capabilities.
Which problem do you solve with your research?
We tackle two major problems in modern AI vision systems: susceptibility to simple attacks and unfair biases. Vision-language systems such as CLIP can be easily deceived when attackers add misleading text to images (so-called “typographic attacks”), causing them to misidentify the objects shown. Additionally, these systems can exhibit gender and racial biases inherited from their training data, leading to unfair outcomes in real-world applications like hiring tools or content recommendation systems.
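To make the attack concrete, here is a minimal sketch of a typographic attack on CLIP-style zero-shot classification, using the Hugging Face transformers CLIP interface and PIL. The checkpoint name, image path, label prompts, and attack text are illustrative assumptions, not taken from the paper.

```python
# Sketch: fooling CLIP zero-shot classification by writing a misleading
# word onto the image (a "typographic attack"). Checkpoint, file path,
# and labels below are placeholder assumptions for illustration.
import torch
from PIL import Image, ImageDraw
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

labels = ["a photo of an apple", "a photo of a banana"]

# Load a clean image (hypothetical path) and make a copy carrying the attack text.
clean = Image.open("apple.jpg").convert("RGB")
attacked = clean.copy()
ImageDraw.Draw(attacked).text((10, 10), "banana", fill="white")

for name, image in [("clean", clean), ("attacked", attacked)]:
    inputs = processor(text=labels, images=image, return_tensors="pt", padding=True)
    with torch.no_grad():
        probs = model(**inputs).logits_per_image.softmax(dim=-1)
    print(name, dict(zip(labels, probs[0].tolist())))

# The attack succeeds when the written word "banana" flips the
# prediction away from the apple that is actually visible.
```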
What is the potential impact of your findings?
Our research has significant implications for making AI systems safer and more trustworthy in critical applications. Improved robustness against attacks could, for example, enhance security in autonomous vehicles, medical imaging systems, and content moderation platforms, while the bias mitigation capabilities could increase the fairness of AI systems in hiring, lending, and criminal justice applications. Furthermore, our approach provides a framework that other researchers can build upon to accelerate the development of more reliable and ethically sound AI technologies.
What is new about your research?
Our work is the first to successfully apply preference optimization techniques, previously used only for text-generating AI models, to vision-language systems that understand both images and text. We developed novel methods that allow fine-grained control over model behavior: we can make systems more resistant to attacks while preserving their original capabilities, or even reverse biased concepts (like gender stereotypes) without affecting performance on other tasks. This level of precise control over AI behavior was not possible with previous training methods and opens new possibilities for creating more aligned and controllable AI systems.
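As a rough illustration of the core idea, a DPO-style preference loss can be adapted to contrastive image-text similarity scores: the fine-tuned model is rewarded for scoring the preferred pairing above the rejected one, relative to a frozen reference model that anchors the original capabilities. The sketch below is a simplified assumption of this general technique, not the paper's exact objective; all function and variable names are hypothetical.

```python
# Minimal sketch of a DPO-style preference loss over CLIP similarity
# scores. This is a simplified illustration of the general technique,
# not the paper's exact objective; `sim_*` denote image-text cosine
# similarities from the fine-tuned model, `ref_*` from a frozen
# reference model.
import torch
import torch.nn.functional as F

def preference_loss(sim_preferred, sim_rejected,
                    ref_preferred, ref_rejected, beta=0.1):
    """Push the fine-tuned model to prefer the human-preferred pairing,
    measured relative to the frozen reference model."""
    margin = (sim_preferred - sim_rejected) - (ref_preferred - ref_rejected)
    return -F.logsigmoid(beta * margin).mean()

# Toy usage with made-up similarity scores for a batch of two pairs.
sim_w = torch.tensor([0.31, 0.28])   # fine-tuned model, preferred text
sim_l = torch.tensor([0.29, 0.30])   # fine-tuned model, rejected text
ref_w = torch.tensor([0.30, 0.27])   # frozen reference, preferred text
ref_l = torch.tensor([0.30, 0.29])   # frozen reference, rejected text
print(preference_loss(sim_w, sim_l, ref_w, ref_l))
```

Keeping the frozen reference model in the loss is what lets the method steer behavior (robustness, debiasing) without drifting far from the original model's capabilities.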
Paper link: https://arxiv.org/abs/2411.08923