
Generative Prompt Model for Weakly Supervised Object Localization [article]

Yuzhong Zhao, Qixiang Ye, Weijia Wu, Chunhua Shen, Fang Wan
2023 arXiv   pre-print
Weakly supervised object localization (WSOL) remains challenging when learning object localization models from image category labels.  ...  In this study, we propose a generative prompt model (GenPromp), defining the first generative pipeline to localize less discriminative object parts by formulating WSOL as a conditional image denoising  ...  Such strong results clearly demonstrate the superiority of the generative model over conventional discriminative models for weakly supervised object localization.  ... 
arXiv:2307.09756v1 fatcat:e7k54bktn5euzbx23uciyvstm4

WeakSAM: Segment Anything Meets Weakly-supervised Instance-level Recognition [article]

Lianghui Zhu, Junwei Zhou, Yan Liu, Xin Hao, Wenyu Liu, Xinggang Wang
2024 arXiv   pre-print
It also addresses the SAM's problems of requiring prompts and category unawareness for automatic object detection and segmentation.  ...  This paper introduces WeakSAM and solves the weakly-supervised object detection (WSOD) and segmentation by utilizing the pre-learned world knowledge contained in a vision foundation model, i.e., the Segment  ...  Weakly-supervised Object Detection Weakly-supervised object detection (WSOD) with image-level labels (Laptev et al.; Diba et al., 2017; Tang et al., 2018b; Gao et al., 2018; Wan et al., 2018; Zhang et  ... 
arXiv:2402.14812v1 fatcat:hw2dfj6oljesjpzbtrxk6al42i

Pseudo-Q: Generating Pseudo Language Queries for Visual Grounding [article]

Haojun Jiang, Yuanze Lin, Dongchen Han, Shiji Song, Gao Huang
2022 arXiv   pre-print
Then, we design a task-related query prompt module to specifically tailor generated pseudo language queries for visual grounding tasks.  ...  weakly-supervised visual grounding methods on all the five datasets we have experimented.  ...  A key difference between our approach and weakly-supervised methods is that we can generate corresponding queries for the detected object which guarantees the correctness of map- .  ... 
arXiv:2203.08481v2 fatcat:rqbg4kktjvelzpuxqzwcqut4iq

Improved Visual Grounding through Self-Consistent Explanations [article]

Ruozhen He, Paola Cascante-Bonilla, Ziyan Yang, Alexander C. Berg, Vicente Ordonez
2023 arXiv   pre-print
We propose a strategy for augmenting existing text-image datasets with paraphrases using a large language model, and SelfEQ, a weakly-supervised strategy on visual explanation maps for paraphrases that  ...  Our work shows that the localization --"grounding"-- abilities of these models can be further improved by finetuning for self-consistent visual explanations.  ...  Conclusion In this paper, we propose a novel weakly-supervised tuning approach coupled with a data augmentation strategy to enhance the localization capabilities of a purely image-text pair supervised model  ... 
arXiv:2312.04554v1 fatcat:gkilctlrnvhtlje6ntxnq34i7q

A Language-Guided Benchmark for Weakly Supervised Open Vocabulary Semantic Segmentation [article]

Prashant Pandey, Mustafa Chasmai, Monish Natarajan, Brejesh Lall
2023 arXiv   pre-print
To this end, we propose a novel unified weakly supervised OVSS pipeline that can perform ZSS, FSS and Cross-dataset segmentation on novel classes without using pixel-level labels for either the base (seen  ...  map class prompts to image features using frozen CLIP (a vision-language model) and ii) decouples weak ZSS/FSS into weak semantic segmentation and Zero-Shot segmentation.  ...  Without any information allowing the model to localize objects, this setting is perhaps the hardest for WSS.  ... 
arXiv:2302.14163v1 fatcat:ooqrlny3fndkrdyyojypyssvmu

Prompting classes: Exploring the Power of Prompt Class Learning in Weakly Supervised Semantic Segmentation [article]

Balamurali Murugesan, Rukhshanda Hussain, Rajarshi Bhattacharya, Ismail Ben Ayed, Jose Dolz
2024 arXiv   pre-print
Motivated by this progress, in this work we question whether other fundamental problems, such as weakly supervised semantic segmentation (WSSS), can benefit from prompt tuning.  ...  These results highlight not only the benefits of language-vision models in WSSS but also the potential of prompt learning for this problem. The code is available at https://github.com/rB080/WSS_POLE.  ...  Results How effective is prompt learning for weakly supervised segmentation?  ... 
arXiv:2307.00097v3 fatcat:ux4rrqacgffqrlqditbqlbtmym

Weakly-supervised HOI Detection via Prior-guided Bi-level Representation Learning [article]

Bo Wan, Yongfei Liu, Desen Zhou, Tinne Tuytelaars, Xuming He
2023 arXiv   pre-print
A promising strategy to address those challenges is to exploit knowledge from large-scale pretrained models (e.g., CLIP), but a direct knowledge distillation strategy does not perform well on the weakly-supervised  ...  One generalizable and scalable strategy for HOI detection is to use weak supervision, learning from image-level annotations only.  ...  G THE PROMPT GENERATION FOR V-COCO For the V-COCO dataset, each action has two different semantic roles ('instrument' and 'object') for different objects, like 'cut cake' and 'cut with knife'.  ... 
arXiv:2303.01313v1 fatcat:zw6sxvpqbnbmpk7kinkeqy3mza

Mask-free OVIS: Open-Vocabulary Instance Segmentation without Manual Mask Annotations [article]

Vibashan VS, Ning Yu, Chen Xing, Can Qin, Mingfei Gao, Juan Carlos Niebles, Vishal M. Patel, Ran Xu
2023 arXiv   pre-print
Our method automatically generates pseudo-mask annotations by leveraging the localization ability of a pre-trained vision-language model for objects present in image-caption pairs.  ...  In this work, we overcome this issue by learning both base and novel categories from pseudo-mask annotations generated by the vision-language model in a weakly supervised manner using our proposed Mask-free  ...  Given a novel object's text name, we can utilize the name as a text prompt to localize this object in an image with a pre-trained vision-language model.  ... 
arXiv:2303.16891v1 fatcat:rvcenpkosbfrxky77wxys4iyca

Computer-aided Tuberculosis Diagnosis with Attribute Reasoning Assistance [article]

Chengwei Pan, Gangming Zhao, Junjie Fang, Baolian Qi, Jiaheng Liu, Chaowei Fang, Dingwen Zhang, Jinpeng Li, Yizhou Yu
2022 arXiv   pre-print
The proposed model is evaluated on the TBX-Att dataset and will serve as a solid baseline for future research.  ...  It also includes the public TBX11K dataset with 11200 X-ray images to facilitate weakly supervised detection.  ...  -The proposed method improves object detection baselines [21, 12] by large margins on TBX-Att, leading to a solid benchmark for weakly supervised TB detection. 2 Related Work Object Detection Object  ... 
arXiv:2207.00251v1 fatcat:ntkmjsopubdwdcouxc6fwn3zou

Learning Prompt-Enhanced Context Features for Weakly-Supervised Video Anomaly Detection [article]

Yujiang Pu, Xiaoyu Wu, Lulu Yang, Shengjin Wang
2024 arXiv   pre-print
In response, this paper introduces a weakly supervised anomaly detection framework that focuses on efficient context modeling and enhanced semantic discriminability.  ...  Additionally, we propose a Prompt-Enhanced Learning (PEL) module that integrates semantic priors using knowledge-based prompts to boost the discriminative capacity of context features while ensuring separability  ...  Two-stage self-training methods have emerged to generate high-confidence pseudo-labels for video snippets, recasting weakly supervised anomaly detection as a supervised task with noisy labels.  ... 
arXiv:2306.14451v2 fatcat:2uu4sha3ibh3hay5so2ruqboaa

Beyond Bounding Box: Multimodal Knowledge Learning for Object Detection [article]

Weixin Feng, Xingyuan Bu, Chenchen Zhang, Xubin Li
2022 arXiv   pre-print
Specifically, we design prompts and fill them with the bounding box annotations to generate descriptions containing extensive hints and context for instances recognition and localization.  ...  In this paper, we take advantage of language prompt to introduce effective and unbiased linguistic supervision into object detection, and propose a new mechanism called multimodal knowledge learning (MKL  ...  To fully improve the efficiency of multimodal supervision, we generate prompt-based object-level descriptions in object-level MKL.  ... 
arXiv:2205.04072v1 fatcat:uh53zwvphbgtbguws33gfcujum

Semantic Segmentation In-the-Wild Without Seeing Any Segmentation Examples [article]

Nir Zabari, Yedid Hoshen
2021 arXiv   pre-print
In this paper we propose a novel approach for creating semantic segmentation masks for every object, without the need for training segmentation networks or seeing any segmentation masks.  ...  We utilize a vision-language embedding model (specifically CLIP) to create a rough segmentation map for each class, using model interpretability methods.  ...  Weakly-supervised salient object detec-  ...  back and discriminative features for zero-shot classification.  ... 
arXiv:2112.03185v1 fatcat:k7tgvamso5frzkhqmxqrjs77am

Open-Vocabulary Point-Cloud Object Detection without 3D Annotation [article]

Yuheng Lu, Chenfeng Xu, Xiaobao Wei, Xiaodong Xie, Masayoshi Tomizuka, Kurt Keutzer, Shanghang Zhang
2023 arXiv   pre-print
Specifically, we resort to rich image pre-trained models, by which the point-cloud detector learns localizing objects under the supervision of predicted 2D bounding boxes from 2D pre-trained detectors.  ...  localizing various objects, and 2) connecting textual and point-cloud representations to enable the detector to classify novel object categories based on text prompting.  ...  For the cross-modal weakly-supervised learning, the 2D bounding boxes predicted by 2D pre-trained models serve as weak supervision for 3D point-cloud detectors.  ... 
arXiv:2304.00788v1 fatcat:h434kb6i3ngx7b25vohmsgf5cy

Exploring Low-Resource Medical Image Classification with Weakly Supervised Prompt Learning [article]

Fudan Zheng, Jindong Cao, Weijiang Yu, Zhiguang Chen, Nong Xiao, Yutong Lu
2024 arXiv   pre-print
a weakly supervised prompt learning model.  ...  To address this problem, we propose a weakly supervised prompt learning method MedPrompt to automatically generate medical prompts, which includes an unsupervised pre-trained vision-language model and  ...  Conclusion In this work, we propose a weakly supervised prompt learning method that can automatically generate medical text prompts for large-scale pre-trained vision-language models.  ... 
arXiv:2402.03783v1 fatcat:oldhkdmd6rdpphybinpsiukbme

Unsupervised Semantic Correspondence Using Stable Diffusion [article]

Eric Hedlin, Gopal Sharma, Shweta Mahajan, Hossam Isack, Abhishek Kar, Andrea Tagliasacchi, Kwang Moo Yi
2023 arXiv   pre-print
To generate such images, these models must understand the semantics of the objects they are asked to generate.  ...  Specifically, given an image, we optimize the prompt embeddings of these models for maximum attention on the regions of interest.  ...  For example, as an immediate application, our method can be used to scale up training of 3D generative models such as FigNeRF [49] with images from the web without human supervision.  ... 
arXiv:2305.15581v2 fatcat:ae54pjjeprfebm5zlyu23vx2bu