XPL: A Cross-Model framework for Semi-Supervised Prompt Learning in Vision-Language Models
Omprakash Chakraborty1
Aadarsh Sahoo2
Rameswar Panda2
Abir Das1
1 IIT Kharagpur
2 MIT-IBM Watson AI Lab
TMLR 2024

Abstract

Prompt learning, which focuses on learning soft prompts, has emerged as a promising approach for efficiently adapting pretrained vision-language models (VLMs) to multiple downstream tasks. While prior works have shown promising performance on common benchmarks, they typically rely on labeled data samples only. This forgoes the information gain from the vast collection of otherwise unlabeled samples available in the wild. To mitigate this, we propose a simple yet efficient cross-model framework that leverages unlabeled samples, achieving significant gains in model performance. Specifically, we employ a semi-supervised prompt learning approach which makes the learned prompts invariant to different views of a given unlabeled sample. The multiple views are obtained using different augmentations on the images as well as by varying the lengths of the visual and text prompts attached to these samples. Experimenting with this simple yet surprisingly effective approach over a large number of benchmark datasets, we observe a considerable improvement in the quality of soft prompts, thereby yielding substantial gains in image classification performance. Interestingly, our approach also benefits from out-of-domain unlabeled images, highlighting its robustness and generalization capabilities.
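The consistency idea in the abstract — encouraging the model's predictions on an unlabeled sample to agree across views produced by different augmentations and prompt lengths — can be sketched as a pseudo-labeling consistency loss. The sketch below is a minimal, hedged illustration: the function name, the confidence threshold, and the choice of which view supplies pseudo-labels are all assumptions for exposition, not the paper's exact formulation.

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax over the last axis.
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def cross_view_consistency(logits_a, logits_b, threshold=0.7):
    """Illustrative consistency loss between two views of one unlabeled batch.

    View A (e.g. one augmentation / prompt length) supplies pseudo-labels;
    view B (another augmentation / prompt length) is trained to match them.
    The 0.7 confidence threshold is a hypothetical choice, not the paper's.
    """
    probs_a = softmax(logits_a)
    pseudo = probs_a.argmax(-1)                  # hard pseudo-labels from view A
    mask = probs_a.max(-1) >= threshold          # keep only confident samples
    probs_b = softmax(logits_b)
    nll = -np.log(probs_b[np.arange(len(pseudo)), pseudo] + 1e-12)
    return (nll * mask).sum() / max(mask.sum(), 1)
```

When the two views already agree (identical logits), the loss is near zero; when they disagree, it grows, pushing the learned prompts toward view-invariant predictions.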

Experimental Results Overview

Paper & Code

Omprakash Chakraborty, Aadarsh Sahoo, Rameswar Panda, Abir Das
XPL: A Cross-Model framework for Semi-Supervised Prompt Learning in Vision-Language Models
TMLR, 2024
[PDF] [CODE]