502 Hits in 7.7 sec

Image Captioning with Synergy-Gated Attention and Recurrent Fusion LSTM

2022 KSII Transactions on Internet and Information Systems  
Long Short-Term Memory (LSTM) combined with attention mechanism is extensively used to generate semantic sentences of images in image captioning models.  ...  First, the Synergy-Gated Attention (SGA) method is proposed, which can process the spatial features and the salient region features of given images simultaneously.  ...  Introduction Image captioning is a task that makes a sentence from reading an image. The sentence should be fluence and hold semantic consistency with image.  ... 
doi:10.3837/tiis.2022.10.010 fatcat:aqspiix37fcwbhcrr57tisd3ea

A Dual-Attention Learning Network with Word and Sentence Embedding for Medical Visual Question Answering [article]

Xiaofei Huang, Hongfang Gong
2022 arXiv   pre-print
In this study, a dual-attention learning network with word and sentence embedding (WSDAN) is proposed.  ...  We design a module, transformer with sentence embedding (TSE), to extract a double embedding representation of questions containing keywords and medical information.  ...  double embedding of words and sentences, but only uses image-guided attention. (4) WSDAN(NP) only with F(V,Q) has no pretrained model that uses double embedding but only uses question-guided attention  ... 
arXiv:2210.00220v2 fatcat:sjxdelnvwzakleawzdf5tttm4q

ARTEMIS: Attention-based Retrieval with Text-Explicit Matching and Implicit Similarity [article]

Ginger Delmas and Rafael Sampaio de Rezende and Gabriela Csurka and Diane Larlus
2022 arXiv   pre-print
An intuitive way to search for images is to use queries composed of an example image and a complementary text.  ...  While the first provides rich and implicit context for the search, the latter explicitly calls for new traits, or specifies how some elements of the example image should be changed to retrieve the desired  ...  to produce a sentence-level feature.  ... 
arXiv:2203.08101v2 fatcat:uwqqwuwazvbl7ajowvdj4uknze

A Survey of Natural Language Generation [article]

Chenhe Dong, Yinghui Li, Haifan Gong, Miaoxin Chen, Junxin Li, Ying Shen, Min Yang
2021 arXiv   pre-print
NLG tasks and datasets, and draw attention to the challenges in NLG evaluation, focusing on different evaluation methods and their relationships; (c) highlight some future emphasis and relatively recent  ...  research issues that arise due to the increasing synergy between NLG and other artificial intelligence areas, such as computer vision, text and computational creativity.  ...  Then the sentence-level and word-level dynamic attention and , are formulated as: = u ⊤ W 1 h , , = h , ⊤ W 2 h . (43) Finally, the static and dynamic attentions are combined into ˜ , to reweight the article  ... 
arXiv:2112.11739v1 fatcat:ygrpp6f25ja4vfbhcr5ycfpxhy

Vision-and-Language Pretrained Models: A Survey [article]

Siqu Long, Feiqi Cao, Soyeon Caren Han, Haiqin Yang
2022 arXiv   pre-print
In this paper, we present an overview of the major advances achieved in VLPMs for producing joint representations of vision and language.  ...  As the preliminaries, we briefly describe the general task definition and genetic architecture of VLPMs.  ...  such as visual regions/patches and textual words of the aligned image-text pairs.  ... 
arXiv:2204.07356v5 fatcat:uesrj6kfkffvzeycvx7hredl3e

Survey of the State of the Art in Natural Language Generation: Core tasks, applications and evaluation [article]

Albert Gatt, Emiel Krahmer
2018 arXiv   pre-print
recent research topics that have arisen partly as a result of growing synergies between NLG and other areas of artificial intelligence; (c) draw attention to the challenges in NLG evaluation, relating  ...  This survey therefore aims to (a) give an up-to-date synthesis of research on the core tasks in NLG and the architectures adopted in which such tasks are organised; (b) highlight a number of relatively  ...  Acknowledgements We thank the four reviewers for their detailed and constructive comments.  ... 
arXiv:1703.09902v4 fatcat:owx2fgo2bjej3b27ve2f3ledoe

Survey of the State of the Art in Natural Language Generation: Core tasks, applications and evaluation

Albert Gatt, Emiel Krahmer
2018 The Journal of Artificial Intelligence Research  
topics that have arisen partly as a result of growing synergies between NLG and other areas of artificial intelligence; (c) draw attention to the challenges in NLG evaluation, relating them to similar  ...  challenges faced in other areas of NLP, with an emphasis on different evaluation methods and the relationships between them.  ...  Acknowledgments We thank the four reviewers for their detailed and constructive comments.  ... 
doi:10.1613/jair.5477 fatcat:ycuteghjzncn7nx6pzkhzd6mn4

Product1M: Towards Weakly Supervised Instance-Level Product Retrieval via Cross-modal Pretraining [article]

Xunlin Zhan, Yangxin Wu, Xiao Dong, Yunchao Wei, Minlong Lu, Yichi Zhang, Hang Xu, Xiaodan Liang
2021 arXiv   pre-print
Notably, Product1M contains over 1 million image-caption pairs and consists of two sample types, i.e., single-product and multi-product samples, which encompass a wide variety of cosmetics brands.  ...  To promote the study of this challenging task, we contribute Product1M, one of the largest multi-modal cosmetic datasets for real-world instance-level retrieval.  ...  For the 'CAPTURE-1Inst' model, we feed the whole image and an image-level bounding box, which is of the same size as the image, to CAPTURE for inference.  ... 
arXiv:2107.14572v2 fatcat:cemydi2wojbyvcmh44flggtjem

Message in a Bottle: An Advertising Campaign's Appropriation of Obama's Inclusive Rhetoric, and What This Reveals About National Identity

Tyler Naman
2011 Berkeley undergraduate journal  
When Calvin Klein uses "we are one," "one for all" and "for all for ever" combined with the image of the unified group of young people walking arm in arm to sell perfume, these terms and image have moved  ...  In the ck one ad, this contextual framing is completed with very few words: The caption at the top of the ad reads "we are one" and at the bottom reads "for all for ever."  ... 
doi:10.5070/b3242011670 fatcat:j6cwrxmnsjg4ziogrk72wpzxma

FS-COCO: Towards Understanding of Freehand Sketches of Common Objects in Context [article]

Pinaki Nath Chowdhury and Aneeshan Sain and Ayan Kumar Bhunia and Tao Xiang and Yulia Gryaditskaya and Yi-Zhe Song
2022 arXiv   pre-print
of information in sketches and image captions, as well as the potential benefit of combining the two modalities.  ...  Using our dataset, we study for the first time the problem of fine-grained image retrieval from freehand scene sketches and sketch captions.  ...  [63] is one of the first popular works to use the attention mechanism with an LSTM for image captioning.  ... 
arXiv:2203.02113v3 fatcat:zpt353655vejzi7j3fpbp3jq5i

A Comprehensive Survey on Automatic Knowledge Graph Construction [article]

Lingfeng Zhong, Jia Wu, Qian Li, Hao Peng, Xindong Wu
2023 arXiv   pre-print
Thus, there is a demand for a systematic review of paradigms to organize knowledge structures beyond data-level mentions.  ...  The survey concludes with discussions on the challenges and possible directions for future exploration.  ...  the word-level attention and Multi-level CNN [169] developing an input attention mechanism with attention-based pooling.  ... 
arXiv:2302.05019v1 fatcat:7in54wjwyzhfnkx755izrkzr3y

A Roadmap for Big Model [article]

Sha Yuan, Hanyu Zhao, Shuai Zhao, Jiahong Leng, Yangxiao Liang, Xiaozhi Wang, Jifan Yu, Xin Lv, Zhou Shao, Jiaao He, Yankai Lin, Xu Han (+88 others)
2022 arXiv   pre-print
At present, there is a lack of research work that sorts out the overall progress of BMs and guides the follow-up research.  ...  With the rapid development of deep learning, training Big Models (BMs) for multiple downstream tasks becomes a popular paradigm.  ...  However, even if the social bias is eliminated at the word level, the sentence-level bias can still exist due to the imbalanced combination of words.  ... 
arXiv:2203.14101v4 fatcat:rdikzudoezak5b36cf6hhne5u4

Non-Contrastive Learning Meets Language-Image Pre-Training [article]

Jinghao Zhou, Li Dong, Zhe Gan, Lijuan Wang, Furu Wei
2022 arXiv   pre-print
Nonetheless, the loose correlation between images and texts of web-crawled data renders the contrastive objective data inefficient and craving for a large training batch size.  ...  The synergy between two objectives lets xCLIP enjoy the best of both worlds: superior performance in both zero-shot transfer and representation learning.  ...  For each attention head from the last layer, we extract the attention map with [CLS] token as the query.  ... 
arXiv:2210.09304v1 fatcat:tkphueek3be2vas5ygplk2m2zq

Neural Approaches to Conversational AI [article]

Jianfeng Gao, Michel Galley, Lihong Li
2019 arXiv   pre-print
For each category, we present a review of state-of-the-art neural approaches, draw the connection between them and traditional approaches, and discuss the progress that has been made and challenges still  ...  The present paper surveys neural approaches to conversational AI that have been developed in the last few years.  ...  al., 2015a), a sentence pair of different languages in machine translation (Gao et al., 2014a), and an image-text pair in image captioning (Fang et al., 2015) and so on.  ... 
arXiv:1809.08267v3 fatcat:j57xlm4ogferdnrpfs4f2jporq

Effective writing skills for Public Relations

John Foster
2008 Zenodo  
Writing good English must be one of the most difficult jobs in the world. The tracking of a developing language that is rich, diverse, and constantly evolving in use and meaning is not an easy task.  ...  Today's rules and uses quickly become outdated, but this book captures English as it should be used now.  ...  For the illustration 18.1, Diabetes UK is the copyright holder for the website and image. Chinese artist Feng Feng provided oriental influence for WPP's Acknowledgements.  ... 
doi:10.5281/zenodo.5345141 fatcat:hd4nud5fyrg7rk3l36nrzbwrgi