Mar 13, 2024 · Abstract: The pre-trained vision-language model, exemplified by CLIP, advances zero-shot semantic segmentation by aligning visual features with class embeddings through a ...
It uses visual features as keys and values and class embeddings as queries in a cross-attention layer, progressively updating the class embeddings ...
Equipped with a vision-language prompting strategy, the approach significantly boosts the generalization capacity of segmentation models for unseen classes.
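The cross-attention design described above (class embeddings as queries, visual features as keys and values, refined layer by layer) can be sketched in a few lines of PyTorch. The module name, dimensions, layer count, and residual/normalization choices below are illustrative assumptions, not the paper's exact decoder.

```python
# Illustrative sketch: class embeddings attend to visual features and are
# progressively refined across cross-attention layers (assumed shapes/names).
import torch
import torch.nn as nn

class ClassEmbeddingRefiner(nn.Module):
    def __init__(self, dim: int = 512, num_heads: int = 8, num_layers: int = 3):
        super().__init__()
        self.attn_layers = nn.ModuleList(
            nn.MultiheadAttention(dim, num_heads, batch_first=True)
            for _ in range(num_layers)
        )
        self.norms = nn.ModuleList(nn.LayerNorm(dim) for _ in range(num_layers))

    def forward(self, class_emb, visual_feats):
        # class_emb:    (B, num_classes, dim) -- queries
        # visual_feats: (B, num_patches, dim) -- keys and values
        q = class_emb
        for attn, norm in zip(self.attn_layers, self.norms):
            out, _ = attn(query=q, key=visual_feats, value=visual_feats)
            q = norm(q + out)  # residual update: progressively refine the class embeddings
        return q

# Dummy usage
refiner = ClassEmbeddingRefiner()
cls_emb = torch.randn(2, 20, 512)   # 20 hypothetical class embeddings
feats = torch.randn(2, 196, 512)    # e.g. 14x14 patch features
refined = refiner(cls_emb, feats)   # (2, 20, 512)
```

Segmentation logits are then commonly obtained by taking inner products between the refined class embeddings and per-pixel (or per-patch) visual features, though the exact prediction head is not specified in these snippets.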
A new approach to improving zero-shot semantic segmentation, called Language-Driven Visual Consensus (LDVC), is introduced. By using class embeddings as ...
We demonstrate that our approach achieves highly competitive zero-shot performance compared to existing zero- and few-shot semantic segmentation methods ...
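The vision-language prompting strategy mentioned above is only named in these snippets, not detailed. A common instantiation in CLIP-based segmentation work is CoOp-style prompt tuning, where learnable context vectors are prepended to the class-name token embeddings before the text encoder; the sketch below shows that generic idea with a small stand-in encoder and is not claimed to be the paper's specific strategy.

```python
# Generic CoOp-style text prompt tuning (illustrative stand-in, not the
# paper's exact prompting strategy): learnable context vectors are prepended
# to each class name's token embeddings before encoding.
import torch
import torch.nn as nn

class PromptedTextEncoder(nn.Module):
    def __init__(self, num_ctx: int = 8, dim: int = 512):
        super().__init__()
        self.ctx = nn.Parameter(torch.randn(num_ctx, dim) * 0.02)  # learnable prompt tokens
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=8, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)  # stand-in for CLIP's text tower

    def forward(self, class_token_emb):
        # class_token_emb: (num_classes, name_len, dim) token embeddings of class names
        n_cls = class_token_emb.size(0)
        ctx = self.ctx.unsqueeze(0).expand(n_cls, -1, -1)  # share the context across classes
        tokens = torch.cat([ctx, class_token_emb], dim=1)  # [learnable context | class name]
        return self.encoder(tokens).mean(dim=1)            # one pooled embedding per class

# Dummy usage: 20 classes, 4 name tokens each
enc = PromptedTextEncoder()
class_embeddings = enc(torch.randn(20, 4, 512))  # (20, 512)
```

In such schemes only the prompt parameters are trained while the pre-trained encoders stay frozen, which is what makes prompting attractive for generalizing to unseen classes.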
A curated list of publications and resources on open-vocabulary semantic segmentation and related areas (e.g., zero-shot semantic segmentation).
CLIP uses contrastive learning together with high-capacity language models and visual feature encoders to synthesize extremely robust models for zero-shot image classification.
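As a concrete illustration of that zero-shot usage, class names are embedded by CLIP's text encoder, the image by its vision encoder, and the prediction is a cosine-similarity lookup. The snippet below follows the standard usage pattern of the OpenAI `clip` package; the model name, prompt template, label set, and image path are placeholder choices.

```python
# Standard CLIP zero-shot classification pattern (placeholder labels/paths).
import torch
import clip
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

class_names = ["cat", "dog", "horse"]  # placeholder label set
text = clip.tokenize([f"a photo of a {c}" for c in class_names]).to(device)
image = preprocess(Image.open("example.jpg")).unsqueeze(0).to(device)  # placeholder path

with torch.no_grad():
    image_feat = model.encode_image(image)
    text_feat = model.encode_text(text)
    image_feat = image_feat / image_feat.norm(dim=-1, keepdim=True)
    text_feat = text_feat / text_feat.norm(dim=-1, keepdim=True)
    probs = (100.0 * image_feat @ text_feat.T).softmax(dim=-1)

print(class_names[probs.argmax().item()])  # most likely class
```

Zero-shot segmentation methods extend this recipe from whole images to dense per-pixel predictions, which is where the class-embedding alignment discussed above comes in.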