Regularizing RNNs for Caption Generation by Reconstructing The Past with The Present
[article]
2018
arXiv
pre-print
ARNet aims at reconstructing the previous hidden state with the present one, besides behaving as the input-dependent transition operator. ...
Additionally, ARNet remarkably reduces the discrepancy between training and inference processes for caption generation. ...
states by reconstructing the past with the present. • ARNet can help regularize the transition dynamics of the RNN, therefore mitigating its discrepancy for sequence prediction. • ARNet coupling with ...
arXiv:1803.11439v2
fatcat:4tao76xsobcprcbu5kmbhnank4
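The ARNet idea described in this entry — reconstruct the previous hidden state from the present one, as a regularizer on the RNN's transition dynamics — can be sketched as follows. This is a minimal NumPy illustration under stated assumptions: the function name, the linear reconstruction map `W`, `b`, and the mean-squared-error penalty are illustrative choices, not the paper's exact architecture.

```python
import numpy as np

def arnet_reconstruction_loss(hidden_states, W, b):
    """ARNet-style regularizer (sketch): reconstruct each previous
    hidden state h_{t-1} from the present one h_t via a learned
    linear map, and penalize the mean squared reconstruction error.

    hidden_states: array of shape (T, d) -- RNN hidden states h_1..h_T
    W, b: parameters of the (hypothetical) reconstruction layer
    """
    h_prev = hidden_states[:-1]   # h_1 .. h_{T-1}: the past
    h_pres = hidden_states[1:]    # h_2 .. h_T: the present
    h_rec = h_pres @ W + b        # reconstruct the past from the present
    return np.mean((h_rec - h_prev) ** 2)
```

In training, this term would be added to the usual caption cross-entropy with a small weight, so the transition operator is encouraged to stay invertible-ish and the train/inference discrepancy shrinks.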
Regularizing RNNs for Caption Generation by Reconstructing the Past with the Present
2018
2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition
ARNet aims at reconstructing the previous hidden state with the present one, besides behaving as the input-dependent transition operator. ...
Additionally, ARNet remarkably reduces the discrepancy between training and inference processes for caption generation. ...
states by reconstructing the past with the present. • ARNet can help regularize the transition dynamics of the RNN, therefore mitigating its discrepancy for sequence prediction. • ARNet coupling with ...
doi:10.1109/cvpr.2018.00834
dblp:conf/cvpr/Chen0JY018
fatcat:7mporxrrtfa45jckbiv7oujp2q
Bidirectional Multirate Reconstruction for Temporal Modeling in Videos
[article]
2016
arXiv
pre-print
Notably, our method generates the best single feature for event detection, with a relative improvement of 10.4%, and achieves the best performance in video captioning across all evaluation metrics on the YouTube2Text ...
Given a clip sampled from a video, we use its past and future neighboring clips as the temporal context, and reconstruct the two temporal transitions, i.e., present→past transition and present→future transition ...
arXiv:1611.09053v1
fatcat:hv4h6zwixneg5m23jvthub2gku
Analysis of Different Neural Network Techniques Used for Image Caption Generation
2020
Volume 8, Issue 10, August 2019, Regular Issue
Generating captions for images is similar to describing images based on what we see. This task can be considered a combination of computer vision and natural language processing. ...
And recent developments in machine learning and deep learning have paved the way to deal with complex problems easily. ...
The output for the present input depends on past output computations. ...
doi:10.35940/ijitee.i7165.079920
fatcat:r6qtsyl64fhhbjzshcpvvknsee
Hybrid deep neural network for Bangla automated image descriptor
2020
IJAIN (International Journal of Advances in Intelligent Informatics)
In this study, a novel dataset was constructed by generating Bangla textual descriptor from visual input, called Bangla Natural Language Image to Text (BNLIT), incorporating 100 classes with annotation ...
For this task, we implemented a hybrid image captioning model, which achieved remarkable results on the new self-made dataset; the task itself is new from the Bangladesh perspective. ...
with the Department of Statistics at Technische Universität Dortmund, Germany. ...
doi:10.26555/ijain.v6i2.499
fatcat:5wamb5fycjgtvai3dqhd33yq3a
Image-to-Markup Generation with Coarse-to-Fine Attention
[article]
2017
arXiv
pre-print
Our method is evaluated in the context of image-to-LaTeX generation, and we introduce a new dataset of real-world rendered mathematical expressions paired with LaTeX markup. ...
Our approach outperforms classical mathematical OCR systems by a large margin on in-domain rendered data, and, with pretraining, also performs well on out-of-domain handwritten data. ...
Acknowledgements We would like to thank Daniel Kirsch for providing us Detexify data, and Sam Wiseman and Yoon Kim for the helpful feedback on this paper. ...
arXiv:1609.04938v2
fatcat:mtrbt22yo5apfjquti3xsf2fzm
Obj2Text: Generating Visually Descriptive Language from Object Layouts
2017
Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing
Generating captions for images is a task that has recently received considerable attention. ...
In this work we focus on caption generation for abstract scenes, or object layouts where the only information provided is a set of objects and their locations. ...
Acknowledgments This work was supported in part by an NVIDIA Hardware Grant. We are also thankful for the feedback from Mark Yatskar and anonymous reviewers of this paper. ...
doi:10.18653/v1/d17-1017
dblp:conf/emnlp/YinO17
fatcat:63fexvdnrbfbdhj5h4fdejogfi
OBJ2TEXT: Generating Visually Descriptive Language from Object Layouts
[article]
2017
arXiv
pre-print
Generating captions for images is a task that has recently received considerable attention. ...
In this work we focus on caption generation for abstract scenes, or object layouts where the only information provided is a set of objects and their locations. ...
Acknowledgments This work was supported in part by an NVIDIA Hardware Grant. We are also thankful for the feedback from Mark Yatskar and anonymous reviewers of this paper. ...
arXiv:1707.07102v1
fatcat:txhtyue5qbgsfahhbcifsfmw6m
Recurrent Topic-Transition GAN for Visual Paragraph Generation
[article]
2017
arXiv
pre-print
The joint adversarial training of RTT-GAN drives the model to generate realistic paragraphs with smooth logical transition between sentence topics. ...
The paragraph generator generates sentences recurrently by incorporating region-based visual and language attention mechanisms at each step. ...
The word RNN with language attention then generates each word. ... defined as: $L_c(G) = -\sum_{t=1}^{T} \sum_{i=1}^{N_t} \log P_G(w_{t,i} \mid w_{t,1:i-1}, s_{1:t-1}, V)$. (3) Note that the reconstruction loss is only used for supervised examples ...
arXiv:1703.07022v2
fatcat:zp55ytm67fayvdp5dcn7wkpxf4
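Equation (3) in this snippet is a standard word-level negative log-likelihood: it sums $-\log P_G$ of each ground-truth word over sentences $t = 1..T$ and word positions $i = 1..N_t$. A hedged illustration (the function name and input layout are my own, not RTT-GAN's code):

```python
import numpy as np

def caption_nll(gt_word_probs):
    """Word-level caption loss in the spirit of Eq. (3): negative sum of
    the log-probabilities the generator assigned to each ground-truth
    word, accumulated over sentences t and word positions i.

    gt_word_probs: list (one entry per sentence) of 1-D arrays holding
    P_G(w_{t,i} | w_{t,1:i-1}, s_{1:t-1}, V) for each position i.
    """
    return -sum(np.log(p).sum() for p in gt_word_probs)
```

A perfectly confident generator (all probabilities 1) yields zero loss; lower assigned probabilities on ground-truth words increase it.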
Deep Learning for Video Captioning: A Review
2019
Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence
The other is caption generation, which decodes the learned representation into a sequential sentence, word by word. ...
In this survey, we first formulate the problem of video captioning, then review state-of-the-art methods categorized by their emphasis on vision or language, and followed by a summary of standard datasets ...
WS-DEC tries to reconstruct the ground truth caption by first localizing it and then generating caption based on the localized segment. ...
doi:10.24963/ijcai.2019/877
dblp:conf/ijcai/ChenYJ19
fatcat:3xxssrzqjjd5jbvtgkkp5lw7xa
Recurrent Image Captioner: Describing Images with Spatial-Invariant Transformation and Attention Filtering
[article]
2016
arXiv
pre-print
We also employ a bidirectional LSTM to preprocess sentences to generate better textual representations. Besides, we propose to exploit variational inference to optimize the whole architecture. ...
Along with the prosperity of recurrent neural networks in modelling sequential data and the power of attention mechanisms in automatically identifying salient information, image captioning, a.k.a. image description ...
In the following recurrent process, it works like an autoencoder, and serves as a regularization for image captioning. ...
arXiv:1612.04949v1
fatcat:l72kpcj4tbb25j4kihp7hcqw5a
State estimation with limited sensors – A deep learning based approach
[article]
2021
arXiv
pre-print
For efficient recovery of the state, the proposed approach is coupled with an auto-encoder based reduced order model. ...
The importance of state estimation in fluid mechanics is well-established; it is required for accomplishing several tasks including design/optimization, active control, and future state prediction. ...
Acknowledgements: The authors would like to thank Dr. Arghya Samanta and Nirmal J Nair for the useful discussions during this paper's preparation. ...
arXiv:2101.11513v2
fatcat:cn76zb6ya5dxjatpule2w4ed2i
A Survey of Natural Language Generation
[article]
2021
arXiv
pre-print
This paper offers a comprehensive review of the research on Natural Language Generation (NLG) over the past two decades, especially in relation to data-to-text generation and text-to-text generation deep ...
This survey aims to (a) give the latest synthesis of deep learning research on the NLG core tasks, as well as the architectures adopted in the field; (b) detail meticulously and comprehensively various ...
[53] present a denoising autoencoder for pre-training, which consists of a series of noising strategies to corrupt text and a training objective to reconstruct the original sentence. ...
arXiv:2112.11739v1
fatcat:ygrpp6f25ja4vfbhcr5ycfpxhy
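The denoising-autoencoder pretraining mentioned in this snippet — corrupt text with noising strategies, then train to reconstruct the original sentence — can be illustrated with one simple noising strategy, random token deletion. A sketch under stated assumptions: the function name and drop probability are illustrative, not the cited paper's exact scheme.

```python
import random

def corrupt(tokens, drop_prob=0.15, seed=0):
    """One simple noising strategy for denoising-autoencoder
    pretraining: randomly delete tokens from the sentence. The
    (separate) training objective would be to reconstruct the
    original token sequence from this corrupted version."""
    rng = random.Random(seed)
    kept = [t for t in tokens if rng.random() >= drop_prob]
    return kept if kept else tokens[:1]   # never emit an empty sentence
```

Other noising strategies in the same spirit include token masking, span infilling, and sentence shuffling; all share the corrupt-then-reconstruct objective.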
Generating Text through Adversarial Training Using Skip-Thought Vectors
2019
Proceedings of the 2019 Conference of the North
Attempts have been made to utilize GANs with word embeddings for text generation. ...
This study presents an approach to text generation using Skip-Thought sentence embeddings with GANs based on gradient penalty functions and f-measures. ...
Acknowledgements The author would like to thank Aruna Malapati for providing insights and access to an Nvidia Titan X GPU for the experiments; and Pranesh Bhargava, Greg Durrett and Yash Raj Jain for providing ...
doi:10.18653/v1/n19-3008
dblp:conf/naacl/Ahamad19
fatcat:2dim2p7k4rfcvlxtxi7vhs5emq
New Ideas and Trends in Deep Multimodal Content Understanding: A Review
[article]
2020
arXiv
pre-print
These models go beyond simple image classifiers in that they can perform uni-directional (e.g. image captioning, image generation) and bi-directional (e.g. cross-modal retrieval, visual question answering ...
Finally, we include several promising directions for future research. ...
Acknowledgments This work was supported by LIACS MediaLab at Leiden University and China Scholarship Council (CSC No. 201703170183). We appreciate the helpful editing work from Dr. Erwin Bakker. ...
arXiv:2010.08189v1
fatcat:2l7molbcn5hf3oyhe3l52tdwra
Showing results 1 — 15 out of 692 results