Unsupervised Image Captioning

Yang Feng, Lin Ma, Wei Liu, Jiebo Luo
2019 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)  
natural sentences to facilitate the unsupervised image captioning scenario.  ...  In this paper, we make the first attempt to train an image captioning model in an unsupervised manner.  ...  Unsupervised Image Captioning Unsupervised image captioning relies on a set of images I = {I_1, ..., I_{N_i}}, a set of sentences Ŝ = {Ŝ_1, ...  ... 
doi:10.1109/cvpr.2019.00425 dblp:conf/cvpr/Feng00L19a fatcat:sxtjh2o3svdrnaj4isa4rgw2sm

Unsupervised Image Captioning [article]

Yang Feng, Lin Ma, Wei Liu, Jiebo Luo
2019 arXiv   pre-print
natural sentences to facilitate the unsupervised image captioning scenario.  ...  In this paper, we make the first attempt to train an image captioning model in an unsupervised manner.  ...  Unsupervised Image Captioning Unsupervised image captioning relies on a set of images I = {I_1, ..., I_{N_i}}, a set of sentences Ŝ = {Ŝ_1, ...  ... 
arXiv:1811.10787v2 fatcat:odrroqn2cnfpxjroupyo5ggxde

Object-Centric Unsupervised Image Captioning [article]

Zihang Meng, David Yang, Xuefei Cao, Ashish Shah, Ser-Nam Lim
2022 arXiv   pre-print
In this paper, we explore the task of unsupervised image captioning which utilizes unpaired images and texts to train the model so that the texts can come from different sources than the images.  ...  Image captioning is a longstanding problem in the field of computer vision and natural language processing.  ...  than English, and really speaks to the importance of making advances in unsupervised image caption.  ... 
arXiv:2112.00969v2 fatcat:zoqnbuwxwve4jotqmgmfkqkcim

Removing Word-Level Spurious Alignment between Images and Pseudo-Captions in Unsupervised Image Captioning [article]

Ukyo Honda, Yoshitaka Ushiku, Atsushi Hashimoto, Taro Watanabe, Yuji Matsumoto
2021 arXiv   pre-print
Unsupervised image captioning is a challenging task that aims at generating captions without the supervision of image-sentence pairs, but only with images and sentences drawn from different sources and  ...  The focus of the previous work was on the alignment of input images and pseudo-captions at the sentence level. However, pseudo-captions contain many words that are irrelevant to a given image.  ...  That is, every image has completely matched captions in pseudo-captions, which is not the case in unsupervised image captioning.  ... 
arXiv:2104.13872v2 fatcat:gh4fxuhdljb3bozcd6sa3d6nwi

UNISON: Unpaired Cross-lingual Image Captioning [article]

Jiahui Gao, Yi Zhou, Philip L. H. Yu, Shafiq Joty, Jiuxiang Gu
2022 arXiv   pre-print
The traditional paradigm of image captioning relies on paired image-caption datasets to train the model in a supervised manner.  ...  Image captioning has emerged as an interesting research field in recent years due to its broad application scenarios.  ...  Since the vocabulary in the collected MT corpus is quite different from the vocabulary in image caption datasets, we filter the sentences in MT datasets according to an existing caption-style dictionary  ... 
arXiv:2010.01288v3 fatcat:byutcnokejfrbm4cdcluyjjn7a

Removing Partial Mismatches in Unsupervised Image Captioning

Ukyo Honda, Atsushi Hashimoto, Taro Watanabe, Yuji Matsumoto
2022 Transactions of the Japanese society for artificial intelligence  
Unsupervised image captioning is a task to describe images without the supervision of image-sentence pairs.  ...  They focused on aligning the pseudo-captions with input images at the sentence level. However, pseudo-captions contain many words that are irrelevant to a given image.  ...  *6 Conceptual Captions, as used in [Laina 19], is text data collected from the web and then filtered, for example by removing sentences that contain low-frequency words and by replacing proper nouns with their hypernyms. The Conceptual Captions vocabulary used for training contained 15,412 words, and ⟨unk⟩ (the special token for unknown words) accounted for about 0.3% of the tokens in the sentences.  ... 
doi:10.1527/tjsai.37-2_h-l82 fatcat:5i2indknbvdkbjzem37nqz6oyu

Improving Generalization of Image Captioning with Unsupervised Prompt Learning [article]

Hongchen Wei, Zhenzhong Chen
2023 arXiv   pre-print
In this paper, we propose an unsupervised prompt learning method to improve Generalization of Image Captioning (GeneIC), which learns a domain-specific prompt vector for the target domain without requiring  ...  Pretrained visual-language models have demonstrated impressive zero-shot abilities in image captioning, when accompanied by hand-crafted prompts.  ...  Therefore, we propose an unsupervised prompt learning method to improve generalization of image captioning.  ... 
arXiv:2308.02862v1 fatcat:vyzcwpuhm5clnphdvau3vy4cpy

Towards Unsupervised Image Captioning with Shared Multimodal Embeddings [article]

Iro Laina, Christian Rupprecht, Nassir Navab
2019 arXiv   pre-print
In this paper, we address image captioning by generating language descriptions of scenes without learning from annotated pairs of images and their captions.  ...  Our approach allows to exploit large text corpora outside the annotated distributions of image/caption data.  ...  In this work we explore unsupervised captioning, where image and language sources are independent.  ... 
arXiv:1908.09317v1 fatcat:nzdwgs22cjd4xbeaz5zu4npnfq

Recurrent Relational Memory Network for Unsupervised Image Captioning [article]

Dan Guo, Yang Wang, Peipei Song, Meng Wang
2020 arXiv   pre-print
Unsupervised image captioning with no annotations is an emerging challenge in computer vision, where the existing arts usually adopt GAN (Generative Adversarial Networks) models.  ...  R^2M encodes visual context through unsupervised training on images, while enabling the memory to learn from irrelevant textual corpus via supervised fashion.  ...  We perform unsupervised captioning through mess occurrences of common visual concepts in disjoint images and sentences.  ... 
arXiv:2006.13611v1 fatcat:w5uwfq6tknevzin2zoxj5gpgy4

Recurrent Relational Memory Network for Unsupervised Image Captioning

Dan Guo, Yang Wang, Peipei Song, Meng Wang
2020 Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence  
Unsupervised image captioning with no annotations is an emerging challenge in computer vision, where the existing arts usually adopt GAN (Generative Adversarial Networks) models.  ...  R2M encodes visual context through unsupervised training on images, while enabling the memory to learn from irrelevant textual corpus via supervised fashion.  ...  We perform unsupervised captioning through mess occurrences of common visual concepts in disjoint images and sentences.  ... 
doi:10.24963/ijcai.2020/128 dblp:conf/ijcai/GuoWSW20 fatcat:ttv3qb2kw5fyjatm3fknc25uk4

Triple Sequence Generative Adversarial Nets for Unsupervised Image Captioning

Yucheng Zhou, Wei Tao, Wenqiang Zhang
2021 ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)  
Labelling image-sentence is expensive and some unsupervised image captioning methods show promising results on caption generation.  ...  In the experiments, we use a large number of unpaired images and sentences to train our model on the unsupervised and unpaired setting.  ...  [7] propose three objectives to train the unsupervised image captioning model without any labelled image-sentence pairs.  ... 
doi:10.1109/icassp39728.2021.9414335 fatcat:76phn3ilwnhmtasabefmtxxzpy

Unsupervised Vision-and-Language Pre-training Without Parallel Images and Captions [article]

Liunian Harold Li, Haoxuan You, Zhecan Wang, Alireza Zareian, Shih-Fu Chang, Kai-Wei Chang
2021 arXiv   pre-print
Inspired by unsupervised machine translation, we investigate if a strong V&L representation model can be learned through unsupervised pre-training without image-caption corpora.  ...  However, existing models require a large amount of parallel image-caption data for pre-training. Such data are costly to collect and require cumbersome curation.  ...  More specifically, we use 3M images from CC and 1M captions from SBU captions (Ordonez et al., 2011).  ... 
arXiv:2010.12831v2 fatcat:ftyzelmc35dg3fwckci4kh5we4

Semantic-Enhanced Cross-Modal Fusion for Improved Unsupervised Image Captioning

Nan Xiang, Ling Chen, Leiyan Liang, Xingdi Rao, Zehao Gong
2023 Electronics  
Unsupervised image captioning often grapples with challenges such as image–text mismatches and modality gaps, resulting in suboptimal captions.  ...  The findings not only contribute to the advancement of image captioning techniques but also open avenues for future research.  ...  [3] presented an unsupervised image description method utilizing generative adversarial networks (GAN) to generate diverse image captions.  ... 
doi:10.3390/electronics12173549 fatcat:r2tfelzxxjbrhe32hpu65wcblm

Removing Word-Level Spurious Alignment between Images and Pseudo-Captions in Unsupervised Image Captioning

Ukyo Honda, Yoshitaka Ushiku, Atsushi Hashimoto, Taro Watanabe, Yuji Matsumoto
2021 Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume   unpublished
Unsupervised image captioning is a challenging task that aims at generating captions without the supervision of image-sentence pairs, but only with images and sentences drawn from different sources and  ...  The focus of the previous work was on the alignment of input images and pseudo-captions at the sentence level. However, pseudo-captions contain many words that are irrelevant to a given image.  ...  That is, every image has completely matched captions in pseudo-captions, which is not the case in unsupervised image captioning.  ... 
doi:10.18653/v1/2021.eacl-main.323 fatcat:onivmomwgjaqla4mmdynipi5hi

Unsupervised Vision-and-Language Pre-training Without Parallel Images and Captions [article]

Liunian Harold Li, Haoxuan You, Zhecan Wang, Alireza Zareian, Shih-Fu Chang, Kai-Wei Chang
2020
Inspired by unsupervised machine translation, we investigate if a strong V&L representation model can be learned through unsupervised pre-training without image-caption corpora.  ...  However, existing models require a large amount of parallel image-caption data for pre-training. Such data are costly to collect and require cumbersome curation.  ...  2018) and unsupervised image captioning (Feng et al., 2019).  ... 
doi:10.48550/arxiv.2010.12831 fatcat:jrc37cyvl5auxiqsizgu7r3gam