Generative Prompt Model for Weakly Supervised Object Localization.

Weakly supervised object localization (WSOL) remains challenging when learning object localization models from image category labels. ... In this study, we propose a generative prompt model (GenPromp), defining the first generative pipeline to localize less discriminative object parts by formulating WSOL as a conditional image denoising ... Such strong results clearly demonstrate the superiority of the generative model over conventional discriminative models for weakly supervised object localization. ...

arXiv:2307.09756v1 fatcat:e7k54bktn5euzbx23uciyvstm4

Open Access

It also addresses the SAM's problems of requiring prompts and category unawareness for automatic object detection and segmentation. ... This paper introduces WeakSAM and solves the weakly-supervised object detection (WSOD) and segmentation by utilizing the pre-learned world knowledge contained in a vision foundation model, i.e., the Segment ... Weakly-supervised Object Detection Weakly-supervised object detection (WSOD) with image-level labels (Laptev et al.; Diba et al., 2017; Tang et al., 2018b; Gao et al., 2018; Wan et al., 2018; Zhang et ...

arXiv:2402.14812v1 fatcat:hw2dfj6oljesjpzbtrxk6al42i

Then, we design a task-related query prompt module to specifically tailor generated pseudo language queries for visual grounding tasks. ... weakly-supervised visual grounding methods on all the five datasets we have experimented. ... A key difference between our approach and weakly-supervised methods is that we can generate corresponding queries for the detected object which guarantees the correctness of map- . ...

arXiv:2203.08481v2 fatcat:rqbg4kktjvelzpuxqzwcqut4iq

Open Access Multiple Versions

We propose a strategy for augmenting existing text-image datasets with paraphrases using a large language model, and SelfEQ, a weakly-supervised strategy on visual explanation maps for paraphrases that ... Our work shows that the localization ("grounding") abilities of these models can be further improved by finetuning for self-consistent visual explanations. ... Conclusion In this paper, we propose a novel weakly-supervised tuning approach coupled with a data augmentation strategy to enhance the localization capabilities of a purely image-text pair supervised model ...

arXiv:2312.04554v1 fatcat:gkilctlrnvhtlje6ntxnq34i7q

Open Access

To this end, we propose a novel unified weakly supervised OVSS pipeline that can perform ZSS, FSS and Cross-dataset segmentation on novel classes without using pixel-level labels for either the base (seen ... map class prompts to image features using frozen CLIP (a vision-language model) and ii) decouples weak ZSS/FSS into weak semantic segmentation and Zero-Shot segmentation. ... Without any information allowing the model to localize objects, this setting is perhaps the hardest for WSS. ...

arXiv:2302.14163v1 fatcat:ooqrlny3fndkrdyyojypyssvmu

Open Access

Motivated by this progress, in this work we question whether other fundamental problems, such as weakly supervised semantic segmentation (WSSS), can benefit from prompt tuning. ... These results highlight not only the benefits of language-vision models in WSSS but also the potential of prompt learning for this problem. The code is available at https://github.com/rB080/WSS_POLE. ... Results How effective is prompt learning for weakly supervised segmentation? ...

arXiv:2307.00097v3 fatcat:ux4rrqacgffqrlqditbqlbtmym

Open Access Multiple Versions

A promising strategy to address those challenges is to exploit knowledge from large-scale pretrained models (e.g., CLIP), but a direct knowledge distillation strategy does not perform well on the weakly-supervised ... One generalizable and scalable strategy for HOI detection is to use weak supervision, learning from image-level annotations only. ... G THE PROMPT GENERATION FOR V-COCO For the V-COCO dataset, each action has two different semantic roles ('instrument' and 'object') for different objects, like 'cut cake' and 'cut with knife'. ...

arXiv:2303.01313v1 fatcat:zw6sxvpqbnbmpk7kinkeqy3mza

Open Access

Our method automatically generates pseudo-mask annotations by leveraging the localization ability of a pre-trained vision-language model for objects present in image-caption pairs. ... In this work, we overcome this issue by learning both base and novel categories from pseudo-mask annotations generated by the vision-language model in a weakly supervised manner using our proposed Mask-free ... Given a novel object's text name, we can utilize the name as a text prompt to localize this object in an image with a pre-trained vision-language model. ...

arXiv:2303.16891v1 fatcat:rvcenpkosbfrxky77wxys4iyca

The proposed model is evaluated on the TBX-Att dataset and will serve as a solid baseline for future research. ... It also includes the public TBX11K dataset with 11200 X-ray images to facilitate weakly supervised detection. ... The proposed method improves object detection baselines [21, 12] by large margins on TBX-Att, leading to a solid benchmark for weakly supervised TB detection. ...

arXiv:2207.00251v1 fatcat:ntkmjsopubdwdcouxc6fwn3zou

Open Access

In response, this paper introduces a weakly supervised anomaly detection framework that focuses on efficient context modeling and enhanced semantic discriminability. ... Additionally, we propose a Prompt-Enhanced Learning (PEL) module that integrates semantic priors using knowledge-based prompts to boost the discriminative capacity of context features while ensuring separability ... Two-stage self-training methods have emerged to generate high-confidence pseudo-labels for video snippets, recasting weakly supervised anomaly detection as a supervised task with noisy labels. ...

arXiv:2306.14451v2 fatcat:2uu4sha3ibh3hay5so2ruqboaa

Multiple Versions

Specifically, we design prompts and fill them with the bounding box annotations to generate descriptions containing extensive hints and context for instances recognition and localization. ... In this paper, we take advantage of language prompt to introduce effective and unbiased linguistic supervision into object detection, and propose a new mechanism called multimodal knowledge learning (MKL ... To fully improve the efficiency of multimodal supervision, we generate prompt-based object-level descriptions in object-level MKL. ...

arXiv:2205.04072v1 fatcat:uh53zwvphbgtbguws33gfcujum

In this paper we propose a novel approach for creating semantic segmentation masks for every object, without the need for training segmentation networks or seeing any segmentation masks. ... We utilize a vision-language embedding model (specifically CLIP) to create a rough segmentation map for each class, using model interpretability methods. ... Weakly-supervised salient object detec- back and discriminative features for zero-shot classification. ...

arXiv:2112.03185v1 fatcat:k7tgvamso5frzkhqmxqrjs77am

Specifically, we resort to rich image pre-trained models, by which the point-cloud detector learns localizing objects under the supervision of predicted 2D bounding boxes from 2D pre-trained detectors. ... localizing various objects, and 2) connecting textual and point-cloud representations to enable the detector to classify novel object categories based on text prompting. ... For the cross-modal weakly-supervised learning, the 2D bounding boxes predicted by 2D pre-trained models serve as weak supervision for 3D point-cloud detectors. ...

arXiv:2304.00788v1 fatcat:h434kb6i3ngx7b25vohmsgf5cy

a weakly supervised prompt learning model. ... To address this problem, we propose a weakly supervised prompt learning method MedPrompt to automatically generate medical prompts, which includes an unsupervised pre-trained vision-language model and ... Conclusion In this work, we propose a weakly supervised prompt learning method that can automatically generate medical text prompts for large-scale pre-trained vision-language models. ...

arXiv:2402.03783v1 fatcat:oldhkdmd6rdpphybinpsiukbme

To generate such images, these models must understand the semantics of the objects they are asked to generate. ... Specifically, given an image, we optimize the prompt embeddings of these models for maximum attention on the regions of interest. ... For example, as an immediate application, our method can be used to scale up training of 3D generative models such as FigNeRF [49] with images from the web without human supervision. ...

arXiv:2305.15581v2 fatcat:ae54pjjeprfebm5zlyu23vx2bu

Multiple Versions

Generative Prompt Model for Weakly Supervised Object Localization [article]

WeakSAM: Segment Anything Meets Weakly-supervised Instance-level Recognition [article]

Pseudo-Q: Generating Pseudo Language Queries for Visual Grounding [article]

Improved Visual Grounding through Self-Consistent Explanations [article]

A Language-Guided Benchmark for Weakly Supervised Open Vocabulary Semantic Segmentation [article]

Prompting classes: Exploring the Power of Prompt Class Learning in Weakly Supervised Semantic Segmentation [article]

Weakly-supervised HOI Detection via Prior-guided Bi-level Representation Learning [article]

Mask-free OVIS: Open-Vocabulary Instance Segmentation without Manual Mask Annotations [article]

Computer-aided Tuberculosis Diagnosis with Attribute Reasoning Assistance [article]

Learning Prompt-Enhanced Context Features for Weakly-Supervised Video Anomaly Detection [article]

Beyond Bounding Box: Multimodal Knowledge Learning for Object Detection [article]

Semantic Segmentation In-the-Wild Without Seeing Any Segmentation Examples [article]

Open-Vocabulary Point-Cloud Object Detection without 3D Annotation [article]

Exploring Low-Resource Medical Image Classification with Weakly Supervised Prompt Learning [article]

Unsupervised Semantic Correspondence Using Stable Diffusion [article]
