Generative Prompt Model for Weakly Supervised Object Localization
[article]
2023
arXiv
pre-print
Weakly supervised object localization (WSOL) remains challenging when learning object localization models from image category labels. ...
In this study, we propose a generative prompt model (GenPromp), defining the first generative pipeline to localize less discriminative object parts by formulating WSOL as a conditional image denoising ...
Such strong results clearly demonstrate the superiority of the generative model over conventional discriminative models for weakly supervised object localization. ...
arXiv:2307.09756v1
fatcat:e7k54bktn5euzbx23uciyvstm4
WeakSAM: Segment Anything Meets Weakly-supervised Instance-level Recognition
[article]
2024
arXiv
pre-print
It also addresses the SAM's problems of requiring prompts and category unawareness for automatic object detection and segmentation. ...
This paper introduces WeakSAM and solves the weakly-supervised object detection (WSOD) and segmentation by utilizing the pre-learned world knowledge contained in a vision foundation model, i.e., the Segment ...
Weakly-supervised Object Detection. Weakly-supervised object detection (WSOD) with image-level labels (Laptev et al.; Diba et al., 2017; Tang et al., 2018b; Gao et al., 2018; Wan et al., 2018; Zhang et ...
arXiv:2402.14812v1
fatcat:hw2dfj6oljesjpzbtrxk6al42i
Pseudo-Q: Generating Pseudo Language Queries for Visual Grounding
[article]
2022
arXiv
pre-print
Then, we design a task-related query prompt module to specifically tailor generated pseudo language queries for visual grounding tasks. ...
weakly-supervised visual grounding methods on all the five datasets we have experimented. ...
A key difference between our approach and weakly-supervised methods is that we can generate corresponding queries for the detected object which guarantees the correctness of map- . ...
arXiv:2203.08481v2
fatcat:rqbg4kktjvelzpuxqzwcqut4iq
Improved Visual Grounding through Self-Consistent Explanations
[article]
2023
arXiv
pre-print
We propose a strategy for augmenting existing text-image datasets with paraphrases using a large language model, and SelfEQ, a weakly-supervised strategy on visual explanation maps for paraphrases that ...
Our work shows that the localization --"grounding"-- abilities of these models can be further improved by finetuning for self-consistent visual explanations. ...
Conclusion. In this paper, we propose a novel weakly-supervised tuning approach coupled with a data augmentation strategy to enhance the localization capabilities of a purely image-text pair supervised model ...
arXiv:2312.04554v1
fatcat:gkilctlrnvhtlje6ntxnq34i7q
A Language-Guided Benchmark for Weakly Supervised Open Vocabulary Semantic Segmentation
[article]
2023
arXiv
pre-print
To this end, we propose a novel unified weakly supervised OVSS pipeline that can perform ZSS, FSS and Cross-dataset segmentation on novel classes without using pixel-level labels for either the base (seen ...
map class prompts to image features using frozen CLIP (a vision-language model) and ii) decouples weak ZSS/FSS into weak semantic segmentation and Zero-Shot segmentation. ...
Without any information allowing the model to localize objects, this setting is perhaps the hardest for WSS. ...
arXiv:2302.14163v1
fatcat:ooqrlny3fndkrdyyojypyssvmu
Prompting classes: Exploring the Power of Prompt Class Learning in Weakly Supervised Semantic Segmentation
[article]
2024
arXiv
pre-print
Motivated by this progress, in this work we question whether other fundamental problems, such as weakly supervised semantic segmentation (WSSS), can benefit from prompt tuning. ...
These results highlight not only the benefits of language-vision models in WSSS but also the potential of prompt learning for this problem. The code is available at https://github.com/rB080/WSS_POLE. ...
Results. How effective is prompt learning for weakly supervised segmentation? ...
arXiv:2307.00097v3
fatcat:ux4rrqacgffqrlqditbqlbtmym
Weakly-supervised HOI Detection via Prior-guided Bi-level Representation Learning
[article]
2023
arXiv
pre-print
A promising strategy to address those challenges is to exploit knowledge from large-scale pretrained models (e.g., CLIP), but a direct knowledge distillation strategy does not perform well on the weakly-supervised ...
One generalizable and scalable strategy for HOI detection is to use weak supervision, learning from image-level annotations only. ...
G. The Prompt Generation for V-COCO. For the V-COCO dataset, each action has two different semantic roles ('instrument' and 'object') for different objects, like 'cut cake' and 'cut with knife'. ...
arXiv:2303.01313v1
fatcat:zw6sxvpqbnbmpk7kinkeqy3mza
Mask-free OVIS: Open-Vocabulary Instance Segmentation without Manual Mask Annotations
[article]
2023
arXiv
pre-print
Our method automatically generates pseudo-mask annotations by leveraging the localization ability of a pre-trained vision-language model for objects present in image-caption pairs. ...
In this work, we overcome this issue by learning both base and novel categories from pseudo-mask annotations generated by the vision-language model in a weakly supervised manner using our proposed Mask-free ...
Given a novel object's text name, we can utilize the name as a text prompt to localize this object in an image with a pre-trained vision-language model. ...
arXiv:2303.16891v1
fatcat:rvcenpkosbfrxky77wxys4iyca
Computer-aided Tuberculosis Diagnosis with Attribute Reasoning Assistance
[article]
2022
arXiv
pre-print
The proposed model is evaluated on the TBX-Att dataset and will serve as a solid baseline for future research. ...
It also includes the public TBX11K dataset with 11200 X-ray images to facilitate weakly supervised detection. ...
- The proposed method improves object detection baselines [21, 12] by large margins on TBX-Att, leading to a solid benchmark for weakly supervised TB detection. 2 Related Work: Object Detection. Object ...
arXiv:2207.00251v1
fatcat:ntkmjsopubdwdcouxc6fwn3zou
Learning Prompt-Enhanced Context Features for Weakly-Supervised Video Anomaly Detection
[article]
2024
arXiv
pre-print
In response, this paper introduces a weakly supervised anomaly detection framework that focuses on efficient context modeling and enhanced semantic discriminability. ...
Additionally, we propose a Prompt-Enhanced Learning (PEL) module that integrates semantic priors using knowledge-based prompts to boost the discriminative capacity of context features while ensuring separability ...
Two-stage self-training methods have emerged to generate high-confidence pseudo-labels for video snippets, recasting weakly supervised anomaly detection as a supervised task with noisy labels. ...
arXiv:2306.14451v2
fatcat:2uu4sha3ibh3hay5so2ruqboaa
Beyond Bounding Box: Multimodal Knowledge Learning for Object Detection
[article]
2022
arXiv
pre-print
Specifically, we design prompts and fill them with the bounding box annotations to generate descriptions containing extensive hints and context for instances recognition and localization. ...
In this paper, we take advantage of language prompt to introduce effective and unbiased linguistic supervision into object detection, and propose a new mechanism called multimodal knowledge learning (MKL ...
To fully improve the efficiency of multimodal supervision, we generate prompt-based object-level descriptions in object-level MKL. ...
arXiv:2205.04072v1
fatcat:uh53zwvphbgtbguws33gfcujum
Semantic Segmentation In-the-Wild Without Seeing Any Segmentation Examples
[article]
2021
arXiv
pre-print
In this paper we propose a novel approach for creating semantic segmentation masks for every object, without the need for training segmentation networks or seeing any segmentation masks. ...
We utilize a vision-language embedding model (specifically CLIP) to create a rough segmentation map for each class, using model interpretability methods. ...
Weakly-supervised salient object detec-
back and discriminative features for zero-shot classification. ...
arXiv:2112.03185v1
fatcat:k7tgvamso5frzkhqmxqrjs77am
Open-Vocabulary Point-Cloud Object Detection without 3D Annotation
[article]
2023
arXiv
pre-print
Specifically, we resort to rich image pre-trained models, by which the point-cloud detector learns localizing objects under the supervision of predicted 2D bounding boxes from 2D pre-trained detectors. ...
localizing various objects, and 2) connecting textual and point-cloud representations to enable the detector to classify novel object categories based on text prompting. ...
For the cross-modal weakly-supervised learning, the 2D bounding boxes predicted by 2D pre-trained models serve as weak supervision for 3D point-cloud detectors. ...
arXiv:2304.00788v1
fatcat:h434kb6i3ngx7b25vohmsgf5cy
Exploring Low-Resource Medical Image Classification with Weakly Supervised Prompt Learning
[article]
2024
arXiv
pre-print
a weakly supervised prompt learning model. ...
To address this problem, we propose a weakly supervised prompt learning method MedPrompt to automatically generate medical prompts, which includes an unsupervised pre-trained vision-language model and ...
Conclusion. In this work, we propose a weakly supervised prompt learning method that can automatically generate medical text prompts for large-scale pre-trained vision-language models. ...
arXiv:2402.03783v1
fatcat:oldhkdmd6rdpphybinpsiukbme
Unsupervised Semantic Correspondence Using Stable Diffusion
[article]
2023
arXiv
pre-print
To generate such images, these models must understand the semantics of the objects they are asked to generate. ...
Specifically, given an image, we optimize the prompt embeddings of these models for maximum attention on the regions of interest. ...
For example, as an immediate application, our method can be used to scale up training of 3D generative models such as FigNeRF [49] with images from the web without human supervision. ...
arXiv:2305.15581v2
fatcat:ae54pjjeprfebm5zlyu23vx2bu