Unsupervised Image Captioning. - Internet Archive Scholar

natural sentences to facilitate the unsupervised image captioning scenario. ... In this paper, we make the first attempt to train an image captioning model in an unsupervised manner. ... Unsupervised Image Captioning. Unsupervised image captioning relies on a set of images $I = \{I_1, \ldots, I_{N_i}\}$, a set of sentences $\hat{S} = \{\hat{S}_1, \ldots$ ...

doi:10.1109/cvpr.2019.00425 dblp:conf/cvpr/Feng00L19a fatcat:sxtjh2o3svdrnaj4isa4rgw2sm

arXiv:1811.10787v2 fatcat:odrroqn2cnfpxjroupyo5ggxde
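
The unpaired setup above can be illustrated with a minimal Python sketch. It assumes images are represented by hypothetical file IDs and sentences by plain strings. The class `UnpairedCorpus` and its method `sample_unpaired` are illustrative names, not taken from the paper. The sketch shows one property: a training batch draws from the image set and the sentence set independently, so position k in one list carries no information about position k in the other.

```python
import random
from dataclasses import dataclass


@dataclass
class UnpairedCorpus:
    """Hypothetical container for the unsupervised setting: an image set I
    and a sentence set S_hat, with no pairing between them."""
    images: list[str]     # stand-ins for I_1, ..., I_{N_i} (e.g. file IDs)
    sentences: list[str]  # stand-ins for the sentence set S_hat

    def sample_unpaired(self, batch_size: int,
                        rng: random.Random) -> tuple[list[str], list[str]]:
        # Images and sentences are drawn independently of each other;
        # nothing links imgs[k] to sents[k].
        imgs = rng.sample(self.images, batch_size)
        sents = rng.sample(self.sentences, batch_size)
        return imgs, sents


corpus = UnpairedCorpus(
    images=["img_001.jpg", "img_002.jpg", "img_003.jpg"],
    sentences=["a dog runs on the grass", "two people ride bikes",
               "a red bus on the street", "a cat sleeps on a sofa"],
)
imgs, sents = corpus.sample_unpaired(2, random.Random(0))
print(len(imgs), len(sents))  # 2 2
```

The two sets are deliberately given different sizes. This reflects that the images and the sentences may come from different sources.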

In this paper, we explore the task of unsupervised image captioning which utilizes unpaired images and texts to train the model so that the texts can come from different sources than the images. ... Image captioning is a longstanding problem in the field of computer vision and natural language processing. ... than English, and really speaks to the importance of making advances in unsupervised image captioning. ...

arXiv:2112.00969v2 fatcat:zoqnbuwxwve4jotqmgmfkqkcim

Unsupervised image captioning is a challenging task that aims at generating captions without the supervision of image-sentence pairs, but only with images and sentences drawn from different sources and ... The focus of the previous work was on the alignment of input images and pseudo-captions at the sentence level. However, pseudo-captions contain many words that are irrelevant to a given image. ... That is, every image has completely matched captions in pseudo-captions, which is not the case in unsupervised image captioning. ...

arXiv:2104.13872v2 fatcat:gh4fxuhdljb3bozcd6sa3d6nwi
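
The snippet does not say how the word-level mismatch is removed. As an illustration only, the sketch below applies a deliberately simple, hypothetical rule that is not the method of the cited paper. A pseudo-caption content word gets training weight 0 when it does not match an object label detected in the image, so a masked loss ignores it. The detector output is assumed to be given as input. `token_weights`, `masked_mean`, and `FUNCTION_WORDS` are illustrative names.

```python
# Hypothetical function-word list; a real system would use a proper stop-word list.
FUNCTION_WORDS = {"a", "an", "the", "on", "in", "of", "with", "and", "is"}


def token_weights(pseudo_caption: str, detected: set[str]) -> list[int]:
    """Return a 0/1 weight per token of a pseudo-caption.

    Function words and words matching a detected object label keep weight 1;
    other content words get weight 0 because nothing detected in the image
    supports them.
    """
    weights = []
    for tok in pseudo_caption.lower().split():
        grounded = tok in FUNCTION_WORDS or tok in detected
        weights.append(1 if grounded else 0)
    return weights


def masked_mean(losses: list[float], weights: list[int]) -> float:
    """Average per-token losses over weight-1 tokens only."""
    kept = sum(weights)
    if kept == 0:
        return 0.0
    return sum(loss * w for loss, w in zip(losses, weights)) / kept


w = token_weights("a man riding a horse on a beach", {"man", "horse"})
print(w)  # [1, 1, 0, 1, 1, 1, 1, 0]
print(masked_mean([0.1, 0.2, 0.9, 0.1, 0.3, 0.1, 0.1, 0.8], w))
```

Under this rule, "riding" is also masked because it is not an object label. That limitation is one reason a simple label match is only a sketch.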

The traditional paradigm of image captioning relies on paired image-caption datasets to train the model in a supervised manner. ... Image captioning has emerged as an interesting research field in recent years due to its broad application scenarios. ... Since the vocabulary in the collected MT corpus is quite different from the vocabulary in image caption datasets, we filter the sentences in MT datasets according to an existing caption-style dictionary ...

arXiv:2010.01288v3 fatcat:byutcnokejfrbm4cdcluyjjn7a
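
The snippet says that MT sentences are filtered against a caption-style dictionary but does not give the criterion. A minimal sketch, assuming the criterion is a hypothetical coverage threshold `min_coverage` on whitespace-tokenized text: a sentence is kept only when enough of its tokens appear in the caption vocabulary. The language, tokenization, and threshold are all assumptions made for illustration.

```python
def filter_by_caption_vocab(sentences: list[str], caption_vocab: set[str],
                            min_coverage: float = 0.9) -> list[str]:
    """Keep sentences whose tokens are mostly covered by a caption-style vocabulary.

    min_coverage is a hypothetical threshold; the source does not state the criterion.
    """
    kept = []
    for sentence in sentences:
        tokens = sentence.lower().split()
        if not tokens:
            continue  # skip empty lines
        coverage = sum(tok in caption_vocab for tok in tokens) / len(tokens)
        if coverage >= min_coverage:
            kept.append(sentence)
    return kept


caption_vocab = {"a", "dog", "runs", "on", "the", "grass", "man", "rides", "bike"}
mt_corpus = [
    "A dog runs on the grass",
    "The committee approved the budget",
    "A man rides a bike",
]
print(filter_by_caption_vocab(mt_corpus, caption_vocab))
# ['A dog runs on the grass', 'A man rides a bike']
```

The budget sentence is dropped because only 2 of its 5 tokens are in the caption vocabulary.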

Unsupervised image captioning is a task to describe images without the supervision of image-sentence pairs. ... They focused on aligning the pseudo-captions with input images at the sentence level. However, pseudo-captions contain many words that are irrelevant to a given image. ... *6 The Conceptual Captions data used in [Laina 19] is text collected from the web and then filtered, for example by removing sentences that contain low-frequency words and by replacing proper nouns with their hypernyms. The Conceptual Captions vocabulary used for training has 15,412 words, and ⟨unk⟩ (a special token representing unknown words) accounts for about 0.3% of the tokens in the sentences. ...

doi:10.1527/tjsai.37-2_h-l82 fatcat:5i2indknbvdkbjzem37nqz6oyu
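
The footnote reports two statistics: the vocabulary size and the share of ⟨unk⟩ tokens. Both follow from a frequency cutoff applied to a corpus. A minimal sketch, assuming whitespace tokenization and a hypothetical `min_count` cutoff; the footnote does not say how the 15,412-word vocabulary was chosen. The toy corpus is invented and does not reproduce the reported 0.3%.

```python
from collections import Counter


def build_vocab(sentences: list[str], min_count: int) -> set[str]:
    """Keep tokens occurring at least min_count times (hypothetical cutoff)."""
    counts = Counter(tok for s in sentences for tok in s.split())
    return {tok for tok, c in counts.items() if c >= min_count}


def unk_ratio(sentences: list[str], vocab: set[str]) -> float:
    """Fraction of tokens that would be mapped to <unk> under this vocabulary."""
    total = unk = 0
    for s in sentences:
        for tok in s.split():
            total += 1
            if tok not in vocab:
                unk += 1
    return unk / total if total else 0.0


corpus = ["a dog on grass", "a dog on a sofa", "a cat on a sofa"]
vocab = build_vocab(corpus, min_count=2)
print(sorted(vocab))                     # ['a', 'dog', 'on', 'sofa']
print(f"{unk_ratio(corpus, vocab):.3f}")  # 0.143
```

Raising `min_count` shrinks the vocabulary and raises the ⟨unk⟩ share. Choosing the cutoff trades off these two numbers.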

In this paper, we propose an unsupervised prompt learning method to improve Generalization of Image Captioning (GeneIC), which learns a domain-specific prompt vector for the target domain without requiring ... Pretrained visual-language models have demonstrated impressive zero-shot abilities in image captioning, when accompanied by hand-crafted prompts. ... Therefore, we propose an unsupervised prompt learning method to improve generalization of image captioning. ...

arXiv:2308.02862v1 fatcat:vyzcwpuhm5clnphdvau3vy4cpy

In this paper, we address image captioning by generating language descriptions of scenes without learning from annotated pairs of images and their captions. ... Our approach allows us to exploit large text corpora outside the annotated distributions of image/caption data. ... In this work we explore unsupervised captioning, where image and language sources are independent. ...

arXiv:1908.09317v1 fatcat:nzdwgs22cjd4xbeaz5zu4npnfq
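
Once images and sentences are mapped into one shared embedding space, sentences from an independent text corpus can be compared with an image directly. The sketch below shows the simplest such use: nearest-neighbour retrieval by cosine similarity. It is not the captioning model of the cited paper, which the snippet does not describe. The 3-dimensional vectors are invented stand-ins for the outputs of trained image and text encoders.

```python
import math


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity between two vectors in the shared embedding space."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def nearest_sentence(image_vec: list[float],
                     sentence_vecs: dict[str, list[float]]) -> str:
    """Return the corpus sentence whose embedding is closest to the image embedding."""
    return max(sentence_vecs, key=lambda s: cosine(image_vec, sentence_vecs[s]))


# Toy embeddings (hypothetical); real ones would come from trained encoders.
sentence_vecs = {
    "a dog on the grass": [0.9, 0.1, 0.0],
    "a cat on a sofa": [0.3, 0.9, 0.1],
    "a red bus on the street": [0.0, 0.2, 0.95],
}
image_vec = [0.8, 0.2, 0.1]
print(nearest_sentence(image_vec, sentence_vecs))  # a dog on the grass
```

Retrieval can only return sentences already in the corpus. A generative captioner trained in the same space is needed to describe unseen combinations.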

Unsupervised image captioning with no annotations is an emerging challenge in computer vision, where existing methods usually adopt GAN (Generative Adversarial Network) models. ... R²M encodes visual context through unsupervised training on images, while enabling the memory to learn from an irrelevant textual corpus in a supervised fashion. ... We perform unsupervised captioning through mess occurrences of common visual concepts in disjoint images and sentences. ...

arXiv:2006.13611v1 fatcat:w5uwfq6tknevzin2zoxj5gpgy4

doi:10.24963/ijcai.2020/128 dblp:conf/ijcai/GuoWSW20 fatcat:ttv3qb2kw5fyjatm3fknc25uk4
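
The snippet says the memory learns from a text corpus in a supervised fashion, using visual concepts common to disjoint images and sentences. One way to obtain supervision from text alone is to build (concepts, sentence) pairs by extracting words that belong to a visual-concept vocabulary. A minimal sketch under that assumption: the concept vocabulary and the helper names `concepts_from_sentence` and `build_text_only_pairs` are hypothetical, not the cited paper's extractor. At test time the concept list would instead come from a visual detector run on the image, which is not shown.

```python
def concepts_from_sentence(sentence: str, concept_vocab: set[str]) -> list[str]:
    """Return visual-concept words in order of first appearance (no duplicates)."""
    seen: list[str] = []
    for tok in sentence.lower().split():
        tok = tok.strip(".,!?")  # drop trailing punctuation
        if tok in concept_vocab and tok not in seen:
            seen.append(tok)
    return seen


def build_text_only_pairs(corpus: list[str],
                          concept_vocab: set[str]) -> list[tuple[list[str], str]]:
    """Build (concepts -> sentence) training pairs from text alone."""
    pairs = []
    for s in corpus:
        concepts = concepts_from_sentence(s, concept_vocab)
        if concepts:  # skip sentences that mention no visual concept
            pairs.append((concepts, s))
    return pairs


concept_vocab = {"dog", "frisbee", "grass", "man", "bike"}  # hypothetical
corpus = [
    "A dog catches a frisbee on the grass.",
    "The market closed higher today.",
    "A man rides a bike.",
]
pairs = build_text_only_pairs(corpus, concept_vocab)
print(len(pairs))   # 2
print(pairs[0][0])  # ['dog', 'frisbee', 'grass']
```

Sentences without any visual concept are skipped because they provide no concept-to-sentence signal.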

Labelling image-sentence pairs is expensive, and some unsupervised image captioning methods show promising results on caption generation. ... In the experiments, we use a large number of unpaired images and sentences to train our model in the unsupervised, unpaired setting. ... [7] propose three objectives to train the unsupervised image captioning model without any labelled image-sentence pairs. ...

doi:10.1109/icassp39728.2021.9414335 fatcat:76phn3ilwnhmtasabefmtxxzpy

Inspired by unsupervised machine translation, we investigate if a strong V&L representation model can be learned through unsupervised pre-training without image-caption corpora. ... However, existing models require a large amount of parallel image-caption data for pre-training. Such data are costly to collect and require cumbersome curation. ... More specifically, we use 3M images from CC and 1M captions from SBU captions (Ordonez et al., 2011) . ...

arXiv:2010.12831v2 fatcat:ftyzelmc35dg3fwckci4kh5we4

Unsupervised image captioning often grapples with challenges such as image–text mismatches and modality gaps, resulting in suboptimal captions. ... The findings not only contribute to the advancement of image captioning techniques but also open avenues for future research. ... [3] presented an unsupervised image description method utilizing generative adversarial networks (GAN) to generate diverse image captions. ...

doi:10.3390/electronics12173549 fatcat:r2tfelzxxjbrhe32hpu65wcblm

Unsupervised image captioning is a challenging task that aims at generating captions without the supervision of image-sentence pairs, but only with images and sentences drawn from different sources and ... The focus of the previous work was on the alignment of input images and pseudo-captions at the sentence level. However, pseudo-captions contain many words that are irrelevant to a given image. ... That is, every image has completely matched captions in pseudo-captions, which is not the case in unsupervised image captioning. ...

doi:10.18653/v1/2021.eacl-main.323 fatcat:onivmomwgjaqla4mmdynipi5hi

Inspired by unsupervised machine translation, we investigate if a strong V&L representation model can be learned through unsupervised pre-training without image-caption corpora. ... However, existing models require a large amount of parallel image-caption data for pre-training. Such data are costly to collect and require cumbersome curation. ... 2018) and unsupervised image captioning (Feng et al., 2019) . ...

doi:10.48550/arxiv.2010.12831 fatcat:jrc37cyvl5auxiqsizgu7r3gam

Unsupervised Image Captioning

Unsupervised Image Captioning [article]

Object-Centric Unsupervised Image Captioning [article]

Removing Word-Level Spurious Alignment between Images and Pseudo-Captions in Unsupervised Image Captioning [article]

UNISON: Unpaired Cross-lingual Image Captioning [article]

Removing Partial Mismatches in Unsupervised Image Captioning

Improving Generalization of Image Captioning with Unsupervised Prompt Learning [article]

Towards Unsupervised Image Captioning with Shared Multimodal Embeddings [article]

Recurrent Relational Memory Network for Unsupervised Image Captioning [article]

Recurrent Relational Memory Network for Unsupervised Image Captioning

Triple Sequence Generative Adversarial Nets for Unsupervised Image Captioning

Unsupervised Vision-and-Language Pre-training Without Parallel Images and Captions [article]

Semantic-Enhanced Cross-Modal Fusion for Improved Unsupervised Image Captioning

Removing Word-Level Spurious Alignment between Images and Pseudo-Captions in Unsupervised Image Captioning

Unsupervised Vision-and-Language Pre-training Without Parallel Images and Captions [article]
