Regularizing RNNs for Caption Generation by Reconstructing The Past with The Present
[article]
2018
arXiv
pre-print
ARNet aims at reconstructing the previous hidden state with the present one, besides behaving as the input-dependent transition operator. ...
Additionally, ARNet remarkably reduces the discrepancy between training and inference processes for caption generation. ...
states by reconstructing the past with the present. • ARNet can help regularize the transition dynamics of the RNN, therefore mitigating its discrepancy for sequence prediction. • ARNet coupling with ...
arXiv:1803.11439v2
fatcat:4tao76xsobcprcbu5kmbhnank4
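The ARNet idea described in this entry — reconstruct the previous hidden state from the present one, as a regularizer on the RNN's transition dynamics — can be sketched as follows. This is a minimal NumPy illustration under stated assumptions: the function name, the linear reconstruction map `W`, `b`, and the mean-squared-error penalty are illustrative choices, not the paper's exact architecture.

```python
import numpy as np

def arnet_reconstruction_loss(hidden_states, W, b):
    """ARNet-style regularizer (sketch): reconstruct each previous
    hidden state h_{t-1} from the present one h_t via a learned
    linear map, and penalize the mean squared reconstruction error.

    hidden_states: array of shape (T, d) -- RNN hidden states h_1..h_T
    W, b: parameters of the (hypothetical) reconstruction layer
    """
    h_prev = hidden_states[:-1]   # h_1 .. h_{T-1}: the past
    h_pres = hidden_states[1:]    # h_2 .. h_T: the present
    h_rec = h_pres @ W + b        # reconstruct the past from the present
    return np.mean((h_rec - h_prev) ** 2)
```

In training, this term would be added to the usual caption cross-entropy with a small weight, so the transition operator is encouraged to stay invertible-ish and the train/inference discrepancy shrinks.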
Regularizing RNNs for Caption Generation by Reconstructing the Past with the Present
2018
2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition
ARNet aims at reconstructing the previous hidden state with the present one, besides behaving as the input-dependent transition operator. ...
Additionally, ARNet remarkably reduces the discrepancy between training and inference processes for caption generation. ...
states by reconstructing the past with the present. • ARNet can help regularize the transition dynamics of the RNN, therefore mitigating its discrepancy for sequence prediction. • ARNet coupling with ...
doi:10.1109/cvpr.2018.00834
dblp:conf/cvpr/Chen0JY018
fatcat:7mporxrrtfa45jckbiv7oujp2q
Bidirectional Multirate Reconstruction for Temporal Modeling in Videos
[article]
2016
arXiv
pre-print
Notably, our method generates the best single feature for event detection, with a relative improvement of 10.4%, and achieves the best performance in video captioning across all evaluation metrics on the YouTube2Text ...
Given a clip sampled from a video, we use its past and future neighboring clips as the temporal context, and reconstruct the two temporal transitions, i.e., present→past transition and present→future transition ...
arXiv:1611.09053v1
fatcat:hv4h6zwixneg5m23jvthub2gku
Analysis of Different Neural Network Techniques Used for Image Caption Generation
2020
Volume 8, Issue 10, August 2019, Regular Issue
Generating captions for images is similar to describing images based on what we see. This task can be considered a combination of computer vision and natural language processing. ...
And recent developments in machine learning and deep learning have paved the way to deal with complex problems easily. ...
The output for the present input depends on past output computations. ...
doi:10.35940/ijitee.i7165.079920
fatcat:r6qtsyl64fhhbjzshcpvvknsee
Hybrid deep neural network for Bangla automated image descriptor
2020
IJAIN (International Journal of Advances in Intelligent Informatics)
In this study, a novel dataset was constructed by generating Bangla textual descriptor from visual input, called Bangla Natural Language Image to Text (BNLIT), incorporating 100 classes with annotation ...
For this task, we implemented a hybrid image captioning model, which achieved remarkable results on the new self-made dataset; the task itself is new from the Bangladesh perspective. ...
with the Department of Statistics at Technische Universität Dortmund, Germany. ...
doi:10.26555/ijain.v6i2.499
fatcat:5wamb5fycjgtvai3dqhd33yq3a
Image-to-Markup Generation with Coarse-to-Fine Attention
[article]
2017
arXiv
pre-print
Our method is evaluated in the context of image-to-LaTeX generation, and we introduce a new dataset of real-world rendered mathematical expressions paired with LaTeX markup. ...
Our approach outperforms classical mathematical OCR systems by a large margin on in-domain rendered data, and, with pretraining, also performs well on out-of-domain handwritten data. ...
Acknowledgements We would like to thank Daniel Kirsch for providing us Detexify data, and Sam Wiseman and Yoon Kim for the helpful feedback on this paper. ...
arXiv:1609.04938v2
fatcat:mtrbt22yo5apfjquti3xsf2fzm
Obj2Text: Generating Visually Descriptive Language from Object Layouts
2017
Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing
Generating captions for images is a task that has recently received considerable attention. ...
In this work we focus on caption generation for abstract scenes, or object layouts where the only information provided is a set of objects and their locations. ...
Acknowledgments This work was supported in part by an NVIDIA Hardware Grant. We are also thankful for the feedback from Mark Yatskar and anonymous reviewers of this paper. ...
doi:10.18653/v1/d17-1017
dblp:conf/emnlp/YinO17
fatcat:63fexvdnrbfbdhj5h4fdejogfi
OBJ2TEXT: Generating Visually Descriptive Language from Object Layouts
[article]
2017
arXiv
pre-print
Generating captions for images is a task that has recently received considerable attention. ...
In this work we focus on caption generation for abstract scenes, or object layouts where the only information provided is a set of objects and their locations. ...
Acknowledgments This work was supported in part by an NVIDIA Hardware Grant. We are also thankful for the feedback from Mark Yatskar and anonymous reviewers of this paper. ...
arXiv:1707.07102v1
fatcat:txhtyue5qbgsfahhbcifsfmw6m
Recurrent Topic-Transition GAN for Visual Paragraph Generation
[article]
2017
arXiv
pre-print
The joint adversarial training of RTT-GAN drives the model to generate realistic paragraphs with smooth logical transition between sentence topics. ...
The paragraph generator generates sentences recurrently by incorporating region-based visual and language attention mechanisms at each step. ...
The word RNN with language attention then generates each word. ... defined as: $L_c(G) = -\sum_{t=1}^{T} \sum_{i=1}^{N_t} \log P_G(w_{t,i} \mid w_{t,1:i-1}, s_{1:t-1}, V)$. (3) Note that the reconstruction loss is only used for supervised examples ...
arXiv:1703.07022v2
fatcat:zp55ytm67fayvdp5dcn7wkpxf4
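Equation (3) in this snippet is a standard word-level negative log-likelihood: it sums $-\log P_G$ of each ground-truth word over sentences $t = 1..T$ and word positions $i = 1..N_t$. A hedged illustration (the function name and input layout are my own, not RTT-GAN's code):

```python
import numpy as np

def caption_nll(gt_word_probs):
    """Word-level caption loss in the spirit of Eq. (3): negative sum of
    the log-probabilities the generator assigned to each ground-truth
    word, accumulated over sentences t and word positions i.

    gt_word_probs: list (one entry per sentence) of 1-D arrays holding
    P_G(w_{t,i} | w_{t,1:i-1}, s_{1:t-1}, V) for each position i.
    """
    return -sum(np.log(p).sum() for p in gt_word_probs)
```

A perfectly confident generator (all probabilities 1) yields zero loss; lower assigned probabilities on ground-truth words increase it.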
Deep Learning for Video Captioning: A Review
2019
Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence
The other is caption generation, which decodes the learned representation into a sequential sentence, word by word. ...
In this survey, we first formulate the problem of video captioning, then review state-of-the-art methods categorized by their emphasis on vision or language, and followed by a summary of standard datasets ...
WS-DEC tries to reconstruct the ground truth caption by first localizing it and then generating caption based on the localized segment. ...
doi:10.24963/ijcai.2019/877
dblp:conf/ijcai/ChenYJ19
fatcat:3xxssrzqjjd5jbvtgkkp5lw7xa
Recurrent Image Captioner: Describing Images with Spatial-Invariant Transformation and Attention Filtering
[article]
2016
arXiv
pre-print
We also employ a bidirectional LSTM to preprocess sentences to generate better textual representations. Besides, we propose to exploit variational inference to optimize the whole architecture. ...
Along with the prosperity of recurrent neural networks in modelling sequential data and the power of attention mechanisms in automatically identifying salient information, image captioning, a.k.a. image description ...
In the following recurrent process, it works like an autoencoder, and serves as a regularization for image captioning. ...
arXiv:1612.04949v1
fatcat:l72kpcj4tbb25j4kihp7hcqw5a
State estimation with limited sensors – A deep learning based approach
[article]
2021
arXiv
pre-print
For efficient recovery of the state, the proposed approach is coupled with an auto-encoder based reduced order model. ...
The importance of state estimation in fluid mechanics is well-established; it is required for accomplishing several tasks including design/optimization, active control, and future state prediction. ...
Acknowledgements: The authors would like to thank Dr. Arghya Samanta and Nirmal J Nair for the useful discussions during this paper's preparation. ...
arXiv:2101.11513v2
fatcat:cn76zb6ya5dxjatpule2w4ed2i
A Survey of Natural Language Generation
[article]
2021
arXiv
pre-print
This paper offers a comprehensive review of the research on Natural Language Generation (NLG) over the past two decades, especially in relation to data-to-text generation and text-to-text generation deep ...
This survey aims to (a) give the latest synthesis of deep learning research on the NLG core tasks, as well as the architectures adopted in the field; (b) detail meticulously and comprehensively various ...
[53] present a denoising autoencoder for pre-training, which consists of a series of noising strategies to corrupt text and a training objective to reconstruct the original sentence. ...
arXiv:2112.11739v1
fatcat:ygrpp6f25ja4vfbhcr5ycfpxhy
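The denoising-autoencoder pretraining mentioned in this snippet — corrupt text with noising strategies, then train to reconstruct the original sentence — can be illustrated with one simple noising strategy, random token deletion. A sketch under stated assumptions: the function name and drop probability are illustrative, not the cited paper's exact scheme.

```python
import random

def corrupt(tokens, drop_prob=0.15, seed=0):
    """One simple noising strategy for denoising-autoencoder
    pretraining: randomly delete tokens from the sentence. The
    (separate) training objective would be to reconstruct the
    original token sequence from this corrupted version."""
    rng = random.Random(seed)
    kept = [t for t in tokens if rng.random() >= drop_prob]
    return kept if kept else tokens[:1]   # never emit an empty sentence
```

Other noising strategies in the same spirit include token masking, span infilling, and sentence shuffling; all share the corrupt-then-reconstruct objective.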
Generating Text through Adversarial Training Using Skip-Thought Vectors
2019
Proceedings of the 2019 Conference of the North
Attempts have been made to utilize GANs with word embeddings for text generation. ...
This study presents an approach to text generation using Skip-Thought sentence embeddings with GANs based on gradient penalty functions and f-measures. ...
Acknowledgements The author would like to thank Aruna Malapati for providing insights and access to an Nvidia Titan X GPU for the experiments; and Pranesh Bhargava, Greg Durrett and Yash Raj Jain for providing ...
doi:10.18653/v1/n19-3008
dblp:conf/naacl/Ahamad19
fatcat:2dim2p7k4rfcvlxtxi7vhs5emq
New Ideas and Trends in Deep Multimodal Content Understanding: A Review
[article]
2020
arXiv
pre-print
These models go beyond simple image classifiers in that they can perform uni-directional (e.g. image captioning, image generation) and bi-directional (e.g. cross-modal retrieval, visual question answering ...
Finally, we include several promising directions for future research. ...
Acknowledgments This work was supported by LIACS MediaLab at Leiden University and China Scholarship Council (CSC No. 201703170183). We appreciate the helpful editing work from Dr. Erwin Bakker. ...
arXiv:2010.08189v1
fatcat:2l7molbcn5hf3oyhe3l52tdwra
Showing results 1 — 15 out of 692 results