Learning Egocentric In-Hand Object Segmentation through Weak Supervision from Human Narrations
One Patch to Caption Them All: A Unified Zero-Shot Captioning Framework
Talking to DINO: Bridging Self-Supervised Vision Backbones with Language for Open-Vocabulary Segmentation