692 Hits in 3.5 sec

Regularizing RNNs for Caption Generation by Reconstructing The Past with The Present [article]

Xinpeng Chen and Lin Ma and Wenhao Jiang and Jian Yao and Wei Liu
2018 arXiv   pre-print
ARNet aims at reconstructing the previous hidden state with the present one, besides behaving as the input-dependent transition operator.  ...  Additionally, ARNet remarkably reduces the discrepancy between training and inference processes for caption generation.  ...  states by reconstructing the past with the present. • ARNet can help regularize the transition dynamics of the RNN, therefore mitigating its discrepancy for sequence prediction. • ARNet coupling with  ... 
arXiv:1803.11439v2 fatcat:4tao76xsobcprcbu5kmbhnank4
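The core idea in the snippet above — a reconstruction head that regularizes the RNN by predicting the previous hidden state from the current one — can be sketched in a few lines. This is a minimal NumPy illustration under my own naming and a plain tanh RNN; the paper's actual ARNet sits on top of an LSTM caption decoder:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative dimensions (not from the paper)
H, X = 8, 4

# Simple tanh RNN transition, standing in for the paper's LSTM
W_hh = rng.normal(scale=0.1, size=(H, H))
W_xh = rng.normal(scale=0.1, size=(H, X))

# ARNet-style reconstruction head: map h_t back toward h_{t-1}
W_rec = rng.normal(scale=0.1, size=(H, H))

def step(h_prev, x_t):
    """One RNN transition: the input-dependent transition operator."""
    return np.tanh(W_hh @ h_prev + W_xh @ x_t)

def arnet_loss(h_prev, h_t):
    """Squared error between the reconstructed and true previous state."""
    h_rec = np.tanh(W_rec @ h_t)
    return float(np.sum((h_rec - h_prev) ** 2))

h = np.zeros(H)
total_rec_loss = 0.0
for t in range(5):
    x_t = rng.normal(size=X)
    h_next = step(h, x_t)
    total_rec_loss += arnet_loss(h, h_next)  # regularizer added to the caption loss
    h = h_next

print(total_rec_loss)
```

In training, this reconstruction term is added to the usual captioning loss, encouraging smoother transition dynamics.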

Regularizing RNNs for Caption Generation by Reconstructing the Past with the Present

Xinpeng Chen, Lin Ma, Wenhao Jiang, Jian Yao, Wei Liu
2018 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition  
ARNet aims at reconstructing the previous hidden state with the present one, besides behaving as the input-dependent transition operator.  ...  Additionally, ARNet remarkably reduces the discrepancy between training and inference processes for caption generation.  ...  states by reconstructing the past with the present. • ARNet can help regularize the transition dynamics of the RNN, therefore mitigating its discrepancy for sequence prediction. • ARNet coupling with  ... 
doi:10.1109/cvpr.2018.00834 dblp:conf/cvpr/Chen0JY018 fatcat:7mporxrrtfa45jckbiv7oujp2q

Bidirectional Multirate Reconstruction for Temporal Modeling in Videos [article]

Linchao Zhu, Zhongwen Xu, Yi Yang
2016 arXiv   pre-print
Notably, our method generates the best single feature for event detection with a relative improvement of 10.4% and achieves the best performance in video captioning across all evaluation metrics on the YouTube2Text  ...  Given a clip sampled from a video, we use its past and future neighboring clips as the temporal context, and reconstruct the two temporal transitions, i.e., present→past transition and present→future transition  ...  transitions, i.e., present→past transition and present→future transition.  ... 
arXiv:1611.09053v1 fatcat:hv4h6zwixneg5m23jvthub2gku
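The bidirectional reconstruction described above — predicting both the past and the future clip from the present one — can be sketched with two linear decoders and a summed squared-error loss. The linear heads and names here are my own stand-ins for the paper's encoder-decoder networks:

```python
import numpy as np

rng = np.random.default_rng(1)
D = 6  # clip feature dimension (illustrative)

# One decoder per temporal direction
W_past = rng.normal(scale=0.1, size=(D, D))
W_future = rng.normal(scale=0.1, size=(D, D))

def bidirectional_loss(f_past, f_present, f_future):
    """Reconstruct both temporal neighbours from the present clip feature."""
    past_hat = W_past @ f_present      # present -> past transition
    future_hat = W_future @ f_present  # present -> future transition
    return float(np.sum((past_hat - f_past) ** 2)
                 + np.sum((future_hat - f_future) ** 2))

f_past, f_present, f_future = (rng.normal(size=D) for _ in range(3))
loss = bidirectional_loss(f_past, f_present, f_future)
print(loss)
```

Minimizing this loss forces the present clip's representation to carry enough temporal context to explain its neighbours.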

Analysis of Different Neural Network Techniques Used for Image Caption Generation

2020 International Journal of Innovative Technology and Exploring Engineering (IJITEE), Volume 8, Issue 10, August 2019, Regular Issue  
Generating captions for images is similar to how we describe images based on what we see. This task can be considered a combination of computer vision and natural language processing.  ...  And recent developments in machine learning and deep learning have paved the way to deal with complex problems easily.  ...  The output for the present input depends on past output computations.  ... 
doi:10.35940/ijitee.i7165.079920 fatcat:r6qtsyl64fhhbjzshcpvvknsee

Hybrid deep neural network for Bangla automated image descriptor

Md Asifuzzaman Jishan, Khan Raqib Mahmud, Abul Kalam Al Azad, Md Shahabub Alam, Anif Minhaz Khan
2020 IJAIN (International Journal of Advances in Intelligent Informatics)  
In this study, a novel dataset was constructed by generating Bangla textual descriptors from visual input, called Bangla Natural Language Image to Text (BNLIT), incorporating 100 classes with annotation  ...  For this task, we implemented a hybrid image captioning model, which achieved remarkable results on the new self-built dataset; this task is new from the Bangladesh perspective.  ...  with the Department of Statistics at Technische Universität Dortmund, Germany.  ... 
doi:10.26555/ijain.v6i2.499 fatcat:5wamb5fycjgtvai3dqhd33yq3a

Image-to-Markup Generation with Coarse-to-Fine Attention [article]

Yuntian Deng, Anssi Kanervisto, Jeffrey Ling, Alexander M. Rush
2017 arXiv   pre-print
Our method is evaluated in the context of image-to-LaTeX generation, and we introduce a new dataset of real-world rendered mathematical expressions paired with LaTeX markup.  ...  Our approach outperforms classical mathematical OCR systems by a large margin on in-domain rendered data, and, with pretraining, also performs well on out-of-domain handwritten data.  ...  Acknowledgements We would like to thank Daniel Kirsch for providing us Detexify data, and Sam Wiseman and Yoon Kim for the helpful feedback on this paper.  ... 
arXiv:1609.04938v2 fatcat:mtrbt22yo5apfjquti3xsf2fzm

Obj2Text: Generating Visually Descriptive Language from Object Layouts

Xuwang Yin, Vicente Ordonez
2017 Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing  
Generating captions for images is a task that has recently received considerable attention.  ...  In this work we focus on caption generation for abstract scenes, or object layouts where the only information provided is a set of objects and their locations.  ...  Acknowledgments This work was supported in part by an NVIDIA Hardware Grant. We are also thankful for the feedback from Mark Yatskar and anonymous reviewers of this paper.  ... 
doi:10.18653/v1/d17-1017 dblp:conf/emnlp/YinO17 fatcat:63fexvdnrbfbdhj5h4fdejogfi

OBJ2TEXT: Generating Visually Descriptive Language from Object Layouts [article]

Xuwang Yin, Vicente Ordonez
2017 arXiv   pre-print
Generating captions for images is a task that has recently received considerable attention.  ...  In this work we focus on caption generation for abstract scenes, or object layouts where the only information provided is a set of objects and their locations.  ...  Acknowledgments This work was supported in part by an NVIDIA Hardware Grant. We are also thankful for the feedback from Mark Yatskar and anonymous reviewers of this paper.  ... 
arXiv:1707.07102v1 fatcat:txhtyue5qbgsfahhbcifsfmw6m

Recurrent Topic-Transition GAN for Visual Paragraph Generation [article]

Xiaodan Liang, Zhiting Hu, Hao Zhang, Chuang Gan, Eric P. Xing
2017 arXiv   pre-print
The joint adversarial training of RTT-GAN drives the model to generate realistic paragraphs with smooth logical transitions between sentence topics.  ...  The paragraph generator generates sentences recurrently by incorporating region-based visual and language attention mechanisms at each step.  ...  The word RNN with language attention then generates each word, with the caption loss defined as: $\mathcal{L}_c(G) = -\sum_{t=1}^{T} \sum_{i=1}^{N_t} \log P_G(w_{t,i} \mid w_{t,1:i-1}, s_{1:t-1}, V)$ (3). Note that the reconstruction loss is only used for supervised examples  ... 
arXiv:1703.07022v2 fatcat:zp55ytm67fayvdp5dcn7wkpxf4
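The caption loss in equation (3) above is a per-word negative log-likelihood summed over sentences $t$ and word positions $i$. A toy computation, with made-up probabilities standing in for the word RNN's conditional outputs $P_G(w_{t,i} \mid \cdot)$:

```python
import numpy as np

# Toy per-word probabilities for a 2-sentence paragraph
# (values are invented; in the paper they come from the word-level RNN)
probs = [
    [0.5, 0.25, 0.125],  # sentence t=1, N_1 = 3 words
    [0.4, 0.1],          # sentence t=2, N_2 = 2 words
]

# L_c(G) = -sum_t sum_i log P_G(w_{t,i} | w_{t,1:i-1}, s_{1:t-1}, V)
loss = -sum(np.log(p) for sent in probs for p in sent)
print(round(float(loss), 4))  # prints 7.3778
```

Lower probabilities assigned to the ground-truth words increase the loss, which is what drives the generator toward the reference captions on supervised examples.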

Deep Learning for Video Captioning: A Review

Shaoxiang Chen, Ting Yao, Yu-Gang Jiang
2019 Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence  
The other is caption generation, which decodes the learned representation into a sequential sentence, word by word.  ...  In this survey, we first formulate the problem of video captioning, then review state-of-the-art methods categorized by their emphasis on vision or language, followed by a summary of standard datasets  ...  WS-DEC tries to reconstruct the ground truth caption by first localizing it and then generating the caption based on the localized segment.  ... 
doi:10.24963/ijcai.2019/877 dblp:conf/ijcai/ChenYJ19 fatcat:3xxssrzqjjd5jbvtgkkp5lw7xa

Recurrent Image Captioner: Describing Images with Spatial-Invariant Transformation and Attention Filtering [article]

Hao Liu, Yang Yang, Fumin Shen, Lixin Duan, Heng Tao Shen
2016 arXiv   pre-print
We also employ a bidirectional LSTM to preprocess sentences for generating better textual representations. Besides, we propose to exploit variational inference to optimize the whole architecture.  ...  Along with the prosperity of recurrent neural networks in modelling sequential data and the power of attention mechanisms in automatically identifying salient information, image captioning, a.k.a., image description  ...  In the following recurrent process, it works like an autoencoder, and serves as a regularization for image captioning.  ... 
arXiv:1612.04949v1 fatcat:l72kpcj4tbb25j4kihp7hcqw5a

State estimation with limited sensors – A deep learning based approach [article]

Yash Kumar, Pranav Bahl, Souvik Chakraborty
2021 arXiv   pre-print
For efficient recovery of the state, the proposed approach is coupled with an autoencoder-based reduced order model.  ...  The importance of state estimation in fluid mechanics is well-established; it is required for accomplishing several tasks including design/optimization, active control, and future state prediction.  ...  Acknowledgements: The authors would like to thank Dr. Arghya Samanta and Nirmal J Nair for the useful discussions during this paper's preparation.  ... 
arXiv:2101.11513v2 fatcat:cn76zb6ya5dxjatpule2w4ed2i

A Survey of Natural Language Generation [article]

Chenhe Dong, Yinghui Li, Haifan Gong, Miaoxin Chen, Junxin Li, Ying Shen, Min Yang
2021 arXiv   pre-print
This paper offers a comprehensive review of the research on Natural Language Generation (NLG) over the past two decades, especially in relation to data-to-text generation and text-to-text generation deep  ...  This survey aims to (a) give the latest synthesis of deep learning research on the NLG core tasks, as well as the architectures adopted in the field; (b) detail meticulously and comprehensively various  ...  [53] present a denoising autoencoder for pre-training, which consists of a series of noising strategies to corrupt text and a training objective to reconstruct the original sentence.  ... 
arXiv:2112.11739v1 fatcat:ygrpp6f25ja4vfbhcr5ycfpxhy

Generating Text through Adversarial Training Using Skip-Thought Vectors

Afroz Ahamad
2019 Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics (NAACL): Student Research Workshop  
Attempts have been made to utilize GANs with word embeddings for text generation.  ...  This study presents an approach to text generation using Skip-Thought sentence embeddings with GANs based on gradient penalty functions and f-measures.  ...  Acknowledgements The author would like to thank Aruna Malapati for providing insights and access to an Nvidia Titan X GPU for the experiments; and Pranesh Bhargava, Greg Durrett and Yash Raj Jain for providing  ... 
doi:10.18653/v1/n19-3008 dblp:conf/naacl/Ahamad19 fatcat:2dim2p7k4rfcvlxtxi7vhs5emq

New Ideas and Trends in Deep Multimodal Content Understanding: A Review [article]

Wei Chen and Weiping Wang and Li Liu and Michael S. Lew
2020 arXiv   pre-print
These models go beyond simple image classifiers in that they can perform uni-directional (e.g. image captioning, image generation) and bi-directional (e.g. cross-modal retrieval, visual question answering) tasks.  ...  Finally, we include several promising directions for future research.  ...  Acknowledgments This work was supported by LIACS MediaLab at Leiden University and the China Scholarship Council (CSC No. 201703170183). We appreciate the helpful editing work from Dr. Erwin Bakker.  ... 
arXiv:2010.08189v1 fatcat:2l7molbcn5hf3oyhe3l52tdwra
Showing results 1 — 15 out of 692 results