Semi-Supervised Action Recognition with
Temporal Contrastive Learning
Ankit Singh¹
Omprakash Chakraborty²
Ashutosh Varshney²
Rameswar Panda³
Rogerio Feris³
Kate Saenko³,⁴
Abir Das²
¹ IIT Madras
² IIT Kharagpur
³ MIT-IBM Watson AI Lab
⁴ Boston University
CVPR 2021

Abstract

Learning to recognize actions from only a handful of labeled videos is a challenging problem due to the scarcity of tediously collected activity labels. We approach this problem by learning a two-pathway temporal contrastive model using unlabeled videos at two different speeds, leveraging the fact that changing video speed does not change an action. Specifically, we propose to maximize the similarity between encoded representations of the same video at two different speeds as well as minimize the similarity between different videos played at different speeds. This way we use the rich supervisory information in terms of ‘time’ that is present in an otherwise unsupervised pool of videos. With this simple yet effective strategy of manipulating video playback rates, we considerably outperform video extensions of sophisticated state-of-the-art semi-supervised image recognition methods across multiple diverse benchmark datasets and network architectures. Interestingly, our proposed approach benefits from out-of-domain unlabeled videos, showing generalization and robustness. We also perform rigorous ablations and analysis to validate our approach.
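
The objective described in the abstract can be illustrated with a minimal sketch. It is not the released implementation: the abstract does not specify the exact loss, so the code assumes an InfoNCE-style formulation with cosine similarity and a temperature, and it assumes that a "speed" corresponds to a frame-sampling stride. The function names (`sample_frame_indices`, `temporal_contrastive_loss`) and the default temperature are hypothetical, and the embeddings are plain lists standing in for the outputs of the two pathways.

```python
import math


def sample_frame_indices(num_frames, num_samples, stride):
    """Pick `num_samples` frame indices spaced `stride` apart.

    A larger stride mimics a faster playback rate (assumption: speed is
    modelled as a sampling stride). Indices are clamped to the clip length.
    """
    return [min(i * stride, num_frames - 1) for i in range(num_samples)]


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    # Small epsilon guards against division by zero for all-zero vectors.
    return dot / (norm_u * norm_v + 1e-12)


def temporal_contrastive_loss(fast_embs, slow_embs, temperature=0.5):
    """InfoNCE-style loss over a batch of videos (assumed formulation).

    fast_embs[i] and slow_embs[i] are embeddings of the SAME video at two
    speeds (the positive pair). slow_embs[j] for j != i come from different
    videos and act as negatives.
    """
    n = len(fast_embs)
    total = 0.0
    for i in range(n):
        # Similarity of video i's fast clip to every slow clip in the batch.
        logits = [
            cosine_similarity(fast_embs[i], slow_embs[j]) / temperature
            for j in range(n)
        ]
        # Numerically stable log-sum-exp over positives and negatives.
        m = max(logits)
        log_denominator = m + math.log(sum(math.exp(l - m) for l in logits))
        # Cross-entropy with the positive pair (index i) as the target.
        total += log_denominator - logits[i]
    return total / n


if __name__ == "__main__":
    # Two strides over a 16-frame clip stand in for two playback rates.
    print(sample_frame_indices(16, 4, 2))
    print(sample_frame_indices(16, 4, 4))

    fast = [[1.0, 0.0], [0.0, 1.0]]
    slow_matched = [[1.0, 0.0], [0.0, 1.0]]
    slow_swapped = [[0.0, 1.0], [1.0, 0.0]]
    print(temporal_contrastive_loss(fast, slow_matched))
    print(temporal_contrastive_loss(fast, slow_swapped))
```

In this toy batch the loss is lower when each video's two speed variants share an embedding than when the embeddings are swapped across videos, which is the behavior the abstract asks for: same-video pairs are pulled together and different-video pairs are pushed apart.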


Comparative Study of TCL


Comparison of top-1 accuracy for TCL (Ours) with Pseudo-Label and FixMatch baselines trained with different percentages of labeled training data.



Results on Mini-Something-V2




Paper, code and other details

Ankit Singh*, Omprakash Chakraborty*, Ashutosh Varshney, Rameswar Panda, Rogerio Feris, Kate Saenko, Abir Das
Semi-Supervised Action Recognition with Temporal Contrastive Learning
Computer Vision and Pattern Recognition (CVPR), 2021
[PDF] [Supp] [Code] [Bibtex] [Video Presentation] [Poster]