Dense Captioning with Joint Inference and Visual Context
2017
2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
Dense captioning is a newly emerging computer vision topic for understanding images with dense language descriptions. ...
., objects, object parts, and interactions between them) from images, labeling each with a short descriptive phrase. ...
Introduction The computer vision community has recently witnessed the success of deep neural networks for image captioning, in which a sentence is generated to describe a given image. ...
doi:10.1109/cvpr.2017.214
dblp:conf/cvpr/YangTYL17
fatcat:7mcgtr3oinag5pnwob5bqviglq
Dense Captioning with Joint Inference and Visual Context
[article]
2017
arXiv
pre-print
Dense captioning is a newly emerging computer vision topic for understanding images with dense language descriptions. ...
., objects, object parts, and interactions between them) from images, labeling each with a short descriptive phrase. ...
Introduction The computer vision community has recently witnessed the success of deep neural networks for image captioning, in which a sentence is generated to describe a given image. ...
arXiv:1611.06949v2
fatcat:gm2xmjbq65edhpxn5qrlpdnswy
Long-term recurrent convolutional networks for visual recognition and description
2015
2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
Models based on deep convolutional networks have dominated recent image interpretation tasks; we investigate whether models which are also recurrent, or "temporally deep", are effective for tasks involving ...
tasks, image description and retrieval problems, and video narration challenges. ...
Acknowledgements The authors thank Oriol Vinyals for valuable advice and helpful discussion throughout this work. ...
doi:10.1109/cvpr.2015.7298878
dblp:conf/cvpr/DonahueHGRVDS15
fatcat:5w4eeyesm5hipiieav2nzc4et4
Long-Term Recurrent Convolutional Networks for Visual Recognition and Description
2017
IEEE Transactions on Pattern Analysis and Machine Intelligence
Models based on deep convolutional networks have dominated recent image interpretation tasks; we investigate whether models which are also recurrent, or "temporally deep", are effective for tasks involving ...
tasks, image description and retrieval problems, and video narration challenges. ...
Acknowledgements The authors thank Oriol Vinyals for valuable advice and helpful discussion throughout this work. ...
doi:10.1109/tpami.2016.2599174
pmid:27608449
fatcat:xoluu2wo6jfbxddalikzfbrgk4
Robust Visual-Textual Sentiment Analysis
2016
Proceedings of the 2016 ACM on Multimedia Conference - MM '16
Our system first builds a semantic tree structure based on sentence parsing, aimed at aligning textual words and image regions for accurate analysis. ...
Sentiment analysis is crucial for extracting social signals from social media content. ...
Acknowledgment This work was generously supported in part by Adobe Research and New York State through the Goergen Institute for Data Science at the University of Rochester. ...
doi:10.1145/2964284.2964288
dblp:conf/mm/YouCJL16
fatcat:jwftcdellngpblor5mpbhooq6i
Long-term Recurrent Convolutional Networks for Visual Recognition and Description
[article]
2016
arXiv
pre-print
Models based on deep convolutional networks have dominated recent image interpretation tasks; we investigate whether models which are also recurrent, or "temporally deep", are effective for tasks involving ...
tasks, image description and retrieval problems, and video narration challenges. ...
ACKNOWLEDGMENTS The authors thank Oriol Vinyals for valuable advice and helpful discussion throughout this work. ...
arXiv:1411.4389v4
fatcat:24egskhcdjcsfet4onvzhwfskq
Learning Multimodal Attention LSTM Networks for Video Captioning
2017
Proceedings of the 2017 ACM on Multimedia Conference - MM '17
Multimodal Attention Long-Short Term Memory networks (MA-LSTM). ...
Different from existing approaches that employ the same LSTM structure for different modalities, we train modality-specific LSTM to capture the intrinsic representations of individual modalities. ...
In [24], Venugopalan et al. design an encoder-decoder neural network to generate descriptions. ...
doi:10.1145/3123266.3123448
dblp:conf/mm/XuYZM17
fatcat:walf26k3tve45idna47khhtsq4
Deep-Temporal LSTM for Daily Living Action Recognition
[article]
2018
arXiv
pre-print
In this work, we propose a deep-temporal LSTM architecture which extends standard LSTM and allows better encoding of temporal information. ...
In addition, we propose to fuse 3D skeleton geometry with deep static appearance. ...
The complementary nature of the LSTM- and CNN-based networks is evident from the boosted performance on MSRDailyActivity3D and NTU-RGB+D after fusion. ...
arXiv:1802.00421v2
fatcat:th4clkmfzvejrlmhs3n7z5smvu
A Review of Deep Learning-based Human Activity Recognition on Benchmark Video Datasets
2022
Applied Artificial Intelligence
A short comparison is also made with the handcrafted feature-based approach and its fusion with deep learning to show the evolution of HAR methods. ...
We propose a new taxonomy for categorizing the literature as CNN and RNN-based approaches. ...
State-of-the-art deep learning approaches are categorized as CNN-based methods (multi-stream networks and sequential networks) and RNN-based methods (LSTM with CNN and fusion LSTM). ...
doi:10.1080/08839514.2022.2093705
fatcat:6on4g3sp3vaktnyyrk72k4mqta
Deep fusion of gray level co-occurrence matrices for lung nodule classification
[article]
2022
arXiv
pre-print
LSTM fusion structure. ...
A new long short-term memory (LSTM) based deep fusion structure is introduced, where the texture features computed from lung nodules through new volumetric grey-level co-occurrence matrix (GLCM) computations ...
This LSTM-based deep network fusion of adjacent VS features is applied to classify lung nodules by exploiting the spatially correlated information between sequenced VSs. ...
arXiv:2205.05123v1
fatcat:mpgnycr7mnddpju6o7df2eknh4
Deep Auto-Encoders with Sequential Learning for Multimodal Dimensional Emotion Recognition
[article]
2020
arXiv
pre-print
To address these challenges, in this paper, we propose a novel deep neural network architecture consisting of a two-stream auto-encoder and a long short-term memory for effectively integrating visual and audio signal streams for emotion recognition. ...
Those features are then combined via a fusion layer before being fed to the LSTM for sequential learning of the features (for every 0.2s long sequence) from input streams. ...
arXiv:2004.13236v1
fatcat:qgoybjaf3jgphcoed7qablxqlm
Sequential Deep Trajectory Descriptor for Action Recognition With Three-Stream CNN
2017
IEEE transactions on multimedia
To address this problem, this paper proposes a long-term motion descriptor called sequential Deep Trajectory Descriptor (sDTD). ...
Specifically, we project dense trajectories into two-dimensional planes, and subsequently a CNN-RNN network is employed to learn an effective representation for long-term motion. ...
[4] proposed their own recurrent networks respectively by connecting LSTMs to CNNs. Donahue et al. tested their model on activity recognition, image description and video description. Wu et al. ...
doi:10.1109/tmm.2017.2666540
fatcat:lddmr4wecnesbcua2tphb7qvxy
Deep-Temporal LSTM for Daily Living Action Recognition
2018
2018 15th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS)
In this work, we propose a deep-temporal LSTM architecture which extends standard LSTM and allows better encoding of temporal information. ...
In addition, we propose to fuse 3D skeleton geometry with deep static appearance. ...
The complementary nature of the LSTM- and CNN-based networks is evident from the boosted performance on MSRDailyActivity3D and NTU-RGB+D after fusion. ...
doi:10.1109/avss.2018.8639122
dblp:conf/avss/DasKBF18
fatcat:qzl7akjymfeijjjv6mbpolrm5m
Long Short-Term Memory with Gate and State Level Fusion for Light Field-Based Face Recognition
[article]
2020
arXiv
pre-print
Long Short-Term Memory (LSTM) is a prominent recurrent neural network for extracting dependencies from sequential data such as time-series and multi-view data, having achieved impressive results for different ...
The efficacy of the novel LSTM cell architectures is assessed by integrating them into deep learning-based methods for face recognition with multi-view, light field images. ...
Nowadays, due to their superior representation and prediction performance, deep Convolutional Neural Networks (CNNs) are increasingly adopted for visual recognition and description tasks [4]. ...
arXiv:1905.04421v2
fatcat:xi7fxu2qrjde7cnp6su4pujlpe
Fine-Grained Recognition via Attribute-Guided Attentive Feature Aggregation
2017
Proceedings of the 2017 ACM on Multimedia Conference - MM '17
This could be considered as a discriminant aggregation network and informative patch-level features are propagated and accumulated to the deeper nodes of the recurrent network for final classification. ...
First, we develop a novel attribute-guided attentive network to sequentially discover informative parts/regions, by seeking a good registration between attentive regions and predefined object attributes ...
description generation [29]. ...
doi:10.1145/3123266.3123358
dblp:conf/mm/YanNY17
fatcat:q5dlco5zrbfdfenmuhhpwzcp2a