3,943 Hits in 3.7 sec

Dense Captioning with Joint Inference and Visual Context

Linjie Yang, Kevin Tang, Jianchao Yang, Li-Jia Li
2017 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)  
Dense captioning is a newly emerging computer vision topic for understanding images with dense language descriptions.  ...  (e.g., objects, object parts, and interactions between them) from images, labeling each with a short descriptive phrase.  ...  Introduction The computer vision community has recently witnessed the success of deep neural networks for image captioning, in which a sentence is generated to describe a given image.  ... 
doi:10.1109/cvpr.2017.214 dblp:conf/cvpr/YangTYL17 fatcat:7mcgtr3oinag5pnwob5bqviglq
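
The abstract above describes labeling each detected region with a short descriptive phrase. As a rough illustration only, the sketch below conditions a phrase-generating LSTM on a per-region feature vector (PyTorch); all module names, dimensions and the vocabulary size are assumptions, and region detection itself is not shown:

    import torch
    import torch.nn as nn

    class RegionCaptioner(nn.Module):
        """Generates word logits for a short phrase per region feature."""
        def __init__(self, feat_dim=512, vocab=1000, emb=128, hidden=256):
            super().__init__()
            self.init_h = nn.Linear(feat_dim, hidden)   # region feature -> initial LSTM state
            self.embed = nn.Embedding(vocab, emb)
            self.lstm = nn.LSTM(emb, hidden, batch_first=True)
            self.out = nn.Linear(hidden, vocab)

        def forward(self, region_feats, tokens):        # (R, feat_dim), (R, L) word ids
            h0 = torch.tanh(self.init_h(region_feats))[None]   # (1, R, hidden)
            c0 = torch.zeros_like(h0)
            states, _ = self.lstm(self.embed(tokens), (h0, c0))
            return self.out(states)                     # (R, L, vocab) next-word logits

    logits = RegionCaptioner()(torch.randn(6, 512), torch.randint(0, 1000, (6, 5)))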

Dense Captioning with Joint Inference and Visual Context [article]

Linjie Yang, Kevin Tang, Jianchao Yang, Li-Jia Li
2017 arXiv   pre-print
Dense captioning is a newly emerging computer vision topic for understanding images with dense language descriptions.  ...  (e.g., objects, object parts, and interactions between them) from images, labeling each with a short descriptive phrase.  ...  Introduction The computer vision community has recently witnessed the success of deep neural networks for image captioning, in which a sentence is generated to describe a given image.  ... 
arXiv:1611.06949v2 fatcat:gm2xmjbq65edhpxn5qrlpdnswy

Long-term recurrent convolutional networks for visual recognition and description

Jeff Donahue, Lisa Anne Hendricks, Sergio Guadarrama, Marcus Rohrbach, Subhashini Venugopalan, Trevor Darrell, Kate Saenko
2015 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)  
Models based on deep convolutional networks have dominated recent image interpretation tasks; we investigate whether models which are also recurrent, or "temporally deep", are effective for tasks involving  ...  tasks, image description and retrieval problems, and video narration challenges.  ...  Acknowledgements The authors thank Oriol Vinyals for valuable advice and helpful discussion throughout this work.  ... 
doi:10.1109/cvpr.2015.7298878 dblp:conf/cvpr/DonahueHGRVDS15 fatcat:5w4eeyesm5hipiieav2nzc4et4
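
The LRCN abstract describes pairing a deep convolutional encoder with a recurrent ("temporally deep") model. The following is a minimal PyTorch sketch of that general pattern, not the authors' implementation; the tiny CNN, layer sizes and the clip-level classification head are all illustrative assumptions:

    import torch
    import torch.nn as nn

    class CNNLSTMClassifier(nn.Module):
        """Per-frame CNN features integrated over time by an LSTM."""
        def __init__(self, feat_dim=256, hidden_dim=128, num_classes=10):
            super().__init__()
            # Tiny CNN standing in for a deep visual encoder.
            self.cnn = nn.Sequential(
                nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
                nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
                nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                nn.Linear(64, feat_dim),
            )
            self.lstm = nn.LSTM(feat_dim, hidden_dim, batch_first=True)
            self.head = nn.Linear(hidden_dim, num_classes)

        def forward(self, clip):                  # clip: (B, T, 3, H, W)
            b, t = clip.shape[:2]
            feats = self.cnn(clip.flatten(0, 1))  # (B*T, feat_dim)
            _, (h_n, _) = self.lstm(feats.view(b, t, -1))
            return self.head(h_n[-1])             # one prediction per clip

    logits = CNNLSTMClassifier()(torch.randn(2, 8, 3, 64, 64))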

Long-Term Recurrent Convolutional Networks for Visual Recognition and Description

Jeff Donahue, Lisa Anne Hendricks, Marcus Rohrbach, Subhashini Venugopalan, Sergio Guadarrama, Kate Saenko, Trevor Darrell
2017 IEEE Transactions on Pattern Analysis and Machine Intelligence  
Models based on deep convolutional networks have dominated recent image interpretation tasks; we investigate whether models which are also recurrent, or "temporally deep", are effective for tasks involving  ...  tasks, image description and retrieval problems, and video narration challenges.  ...  Acknowledgements The authors thank Oriol Vinyals for valuable advice and helpful discussion throughout this work.  ... 
doi:10.1109/tpami.2016.2599174 pmid:27608449 fatcat:xoluu2wo6jfbxddalikzfbrgk4

Robust Visual-Textual Sentiment Analysis

Quanzeng You, Liangliang Cao, Hailin Jin, Jiebo Luo
2016 Proceedings of the 2016 ACM on Multimedia Conference - MM '16  
Our system first builds a semantic tree structure based on sentence parsing, aimed at aligning textual words and image regions for accurate analysis.  ...  Sentiment analysis is crucial for extracting social signals from social media content.  ...  Acknowledgment This work was generously supported in part by Adobe Research and New York State through the Goergen Institute for Data Science at the University of Rochester.  ... 
doi:10.1145/2964284.2964288 dblp:conf/mm/YouCJL16 fatcat:jwftcdellngpblor5mpbhooq6i
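
The snippet mentions aligning textual words with image regions; below is a generic soft-alignment (dot-product attention) sketch of that idea in PyTorch. It omits the parse-tree construction the paper relies on, and every dimension and module name is an assumption:

    import torch
    import torch.nn as nn

    class WordRegionAlignment(nn.Module):
        """Soft-aligns each word to image regions and returns region context per word."""
        def __init__(self, word_dim=300, region_dim=512, common=256):
            super().__init__()
            self.word_proj = nn.Linear(word_dim, common)
            self.region_proj = nn.Linear(region_dim, common)

        def forward(self, words, regions):        # (B, W, word_dim), (B, R, region_dim)
            w = self.word_proj(words)
            r = self.region_proj(regions)
            scores = w @ r.transpose(1, 2)        # (B, W, R) word-region affinities
            align = torch.softmax(scores, dim=-1) # attention over regions per word
            return align @ r                      # (B, W, common) aligned region context

    ctx = WordRegionAlignment()(torch.randn(2, 7, 300), torch.randn(2, 9, 512))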

Long-term Recurrent Convolutional Networks for Visual Recognition and Description [article]

Jeff Donahue, Lisa Anne Hendricks, Marcus Rohrbach, Subhashini Venugopalan, Sergio Guadarrama, Kate Saenko, Trevor Darrell
2016 arXiv   pre-print
Models based on deep convolutional networks have dominated recent image interpretation tasks; we investigate whether models which are also recurrent, or "temporally deep", are effective for tasks involving  ...  tasks, image description and retrieval problems, and video narration challenges.  ...  ACKNOWLEDGMENTS The authors thank Oriol Vinyals for valuable advice and helpful discussion throughout this work.  ... 
arXiv:1411.4389v4 fatcat:24egskhcdjcsfet4onvzhwfskq

Learning Multimodal Attention LSTM Networks for Video Captioning

Jun Xu, Ting Yao, Yongdong Zhang, Tao Mei
2017 Proceedings of the 2017 ACM on Multimedia Conference - MM '17  
Multimodal Attention Long-Short Term Memory networks (MA-LSTM).  ...  Different from existing approaches that employ the same LSTM structure for different modalities, we train modality-specific LSTM to capture the intrinsic representations of individual modalities.  ...  In [24], Venugopalan et al. design an encoder-decoder neural network to generate descriptions.  ... 
doi:10.1145/3123266.3123448 dblp:conf/mm/XuYZM17 fatcat:walf26k3tve45idna47khhtsq4
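
The MA-LSTM abstract describes modality-specific LSTMs whose outputs are then fused. The sketch below uses a simple softmax attention over per-modality summaries purely for illustration; the feature dimensions and the fusion scheme are assumptions, not the paper's exact design:

    import torch
    import torch.nn as nn

    class ModalityAttentionFusion(nn.Module):
        """One LSTM per modality; summaries fused by softmax attention."""
        def __init__(self, vis_dim=2048, aud_dim=128, hidden=256):
            super().__init__()
            self.vis_lstm = nn.LSTM(vis_dim, hidden, batch_first=True)
            self.aud_lstm = nn.LSTM(aud_dim, hidden, batch_first=True)
            self.attn = nn.Linear(hidden, 1)      # scores one modality summary

        def forward(self, vis_seq, aud_seq):      # (B, Tv, vis_dim), (B, Ta, aud_dim)
            _, (hv, _) = self.vis_lstm(vis_seq)
            _, (ha, _) = self.aud_lstm(aud_seq)
            summaries = torch.stack([hv[-1], ha[-1]], dim=1)   # (B, 2, hidden)
            weights = torch.softmax(self.attn(summaries), dim=1)
            return (weights * summaries).sum(dim=1)            # fused video representation

    fused = ModalityAttentionFusion()(torch.randn(2, 20, 2048), torch.randn(2, 40, 128))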

Deep-Temporal LSTM for Daily Living Action Recognition [article]

Srijan Das, Michal Koperski, Francois Bremond, Gianpiero Francesca
2018 arXiv   pre-print
In this work, we propose a deep-temporal LSTM architecture which extends standard LSTM and allows better encoding of temporal information.  ...  In addition, we propose to fuse 3D skeleton geometry with deep static appearance.  ...  The complementary nature of the LSTM and CNN based networks is evident from the boosted performance for MSRDailyActivity3D and NTU-RGB+D on fusion.  ... 
arXiv:1802.00421v2 fatcat:th4clkmfzvejrlmhs3n7z5smvu
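
The abstract mentions a deep-temporal LSTM over skeleton data fused with deep static appearance. A hedged PyTorch sketch of that combination follows; the stacked three-layer LSTM, the score-level (late) fusion and all sizes are assumptions rather than the paper's exact scheme:

    import torch
    import torch.nn as nn

    class SkeletonAppearanceFusion(nn.Module):
        """Stacked LSTM over skeleton sequences, fused with a static appearance feature."""
        def __init__(self, joints=25, app_dim=2048, hidden=128, classes=60):
            super().__init__()
            self.skel_lstm = nn.LSTM(joints * 3, hidden, num_layers=3,
                                     batch_first=True)       # deep temporal stack
            self.skel_head = nn.Linear(hidden, classes)
            self.app_head = nn.Linear(app_dim, classes)

        def forward(self, skel_seq, app_feat):    # (B, T, joints*3), (B, app_dim)
            _, (h, _) = self.skel_lstm(skel_seq)
            # Late, score-level fusion of the two cues (an assumption).
            return self.skel_head(h[-1]) + self.app_head(app_feat)

    scores = SkeletonAppearanceFusion()(torch.randn(2, 50, 75), torch.randn(2, 2048))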

A Review of Deep Learning-based Human Activity Recognition on Benchmark Video Datasets

Vijeta Sharma, Manjari Gupta, Anil Kumar Pandey, Deepti Mishra, Ajai Kumar
2022 Applied Artificial Intelligence  
A short comparison is also made with the handcrafted feature-based approach and its fusion with deep learning to show the evolution of HAR methods.  ...  We propose a new taxonomy for categorizing the literature as CNN and RNN-based approaches.  ...  State-of-the-art deep learning approaches: CNN-based methods are grouped as multi-stream networks and sequential networks, whereas RNN-based methods as LSTM with CNN and fusion LSTM.  ... 
doi:10.1080/08839514.2022.2093705 fatcat:6on4g3sp3vaktnyyrk72k4mqta

Deep fusion of gray level co-occurrence matrices for lung nodule classification [article]

Ahmed Saihood, Hossein Karshenas, AhmadReza Naghsh Nilchi
2022 arXiv   pre-print
LSTM fusion structure.  ...  A new long short-term memory (LSTM) based deep fusion structure is introduced, where the texture features computed from lung nodules through new volumetric grey-level co-occurrence matrix (GLCM) computations  ...  This LSTM-based deep fusion of adjacent VS features is applied to classify lung nodules, exploiting the spatially correlated information between sequenced VSs.  ... 
arXiv:2205.05123v1 fatcat:mpgnycr7mnddpju6o7df2eknh4
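
The abstract combines GLCM texture descriptors with an LSTM-based fusion over slice sequences. Below is a rough, assumption-laden sketch of that pipeline: per-slice GLCM statistics feed a toy LSTM head. It assumes scikit-image's graycomatrix/graycoprops (spelled greycomatrix/greycoprops before release 0.19); the quantisation, chosen texture properties and classifier are illustrative only:

    import numpy as np
    import torch
    import torch.nn as nn
    from skimage.feature import graycomatrix, graycoprops  # greycomatrix/greycoprops pre-0.19

    def slice_texture(slice_u8, levels=16):
        """GLCM contrast/homogeneity/energy for one quantised slice."""
        q = (slice_u8 // (256 // levels)).astype(np.uint8)
        glcm = graycomatrix(q, distances=[1], angles=[0, np.pi / 2],
                            levels=levels, symmetric=True, normed=True)
        return np.array([graycoprops(glcm, p).mean()
                         for p in ("contrast", "homogeneity", "energy")],
                        dtype=np.float32)

    volume = np.random.randint(0, 256, (12, 32, 32), dtype=np.uint8)  # toy nodule volume
    seq = torch.from_numpy(np.stack([slice_texture(s) for s in volume]))[None]  # (1, 12, 3)
    lstm = nn.LSTM(3, 16, batch_first=True)
    _, (h, _) = lstm(seq)
    score = nn.Linear(16, 1)(h[-1])   # toy benign/malignant logit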

Deep Auto-Encoders with Sequential Learning for Multimodal Dimensional Emotion Recognition [article]

Dung Nguyen, Duc Thanh Nguyen, Rui Zeng, Thanh Thi Nguyen, Son N. Tran, Thin Nguyen, Sridha Sridharan, Clinton Fookes
2020 arXiv   pre-print
To address these challenges, in this paper, we propose a novel deep neural network architecture consisting of a two-stream auto-encoder and a long short term memory for effectively integrating visual and  ...  audio signal streams for emotion recognition.  ...  Those features are then combined via a fusion layer before being fed to the LSTM for sequential learning of the features (for every 0.2s long sequence) from input streams.  ... 
arXiv:2004.13236v1 fatcat:qgoybjaf3jgphcoed7qablxqlm
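
The abstract outlines a two-stream encoder whose outputs pass through a fusion layer before an LSTM models the windowed sequence. The sketch below follows that shape; the linear encoders, concatenation-based fusion and valence/arousal head are assumptions for illustration:

    import torch
    import torch.nn as nn

    class TwoStreamFusionLSTM(nn.Module):
        """Two per-modality encoders, a fusion layer, then an LSTM over time windows."""
        def __init__(self, vis_dim=512, aud_dim=64, fused_dim=128, hidden=64):
            super().__init__()
            self.vis_enc = nn.Sequential(nn.Linear(vis_dim, 128), nn.ReLU())
            self.aud_enc = nn.Sequential(nn.Linear(aud_dim, 128), nn.ReLU())
            self.fuse = nn.Linear(256, fused_dim)   # fusion layer before the LSTM
            self.lstm = nn.LSTM(fused_dim, hidden, batch_first=True)
            self.regress = nn.Linear(hidden, 2)     # e.g. valence and arousal

        def forward(self, vis, aud):                # both (B, T, dim), one step per window
            joint = torch.cat([self.vis_enc(vis), self.aud_enc(aud)], dim=-1)
            out, _ = self.lstm(torch.relu(self.fuse(joint)))
            return self.regress(out)                # per-window prediction

    pred = TwoStreamFusionLSTM()(torch.randn(2, 25, 512), torch.randn(2, 25, 64))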

Sequential Deep Trajectory Descriptor for Action Recognition With Three-Stream CNN

Yemin Shi, Yonghong Tian, Yaowei Wang, Tiejun Huang
2017 IEEE transactions on multimedia  
To address this problem, this paper proposes a long-term motion descriptor called sequential Deep Trajectory Descriptor (sDTD).  ...  Specifically, we project dense trajectories into two-dimensional planes, and subsequently a CNN-RNN network is employed to learn an effective representation for long-term motion.  ...  [4] proposed their own recurrent networks respectively by connecting LSTMs to CNNs. Donahue et al. tested their model on activity recognition, image description and video description. Wu et al.  ... 
doi:10.1109/tmm.2017.2666540 fatcat:lddmr4wecnesbcua2tphb7qvxy
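
The sDTD abstract describes projecting dense trajectories into two-dimensional planes before a CNN-RNN learns long-term motion. The snippet below illustrates only the rasterisation step under assumed conventions (unit-normalised coordinates, a 64x64 grid, additive accumulation); a CNN-LSTM such as the sketch after the LRCN entry above could then consume the resulting maps:

    import numpy as np

    def rasterize_trajectories(trajs, size=64):
        """trajs: list of (T, 2) arrays with x, y in [0, 1]; returns (T, size, size) maps."""
        t_len = trajs[0].shape[0]
        maps = np.zeros((t_len, size, size), dtype=np.float32)
        for traj in trajs:
            idx = np.clip((traj * (size - 1)).astype(int), 0, size - 1)
            for t, (x, y) in enumerate(idx):
                maps[t, y, x] += 1.0      # accumulate trajectory points per frame
        return maps

    maps = rasterize_trajectories([np.random.rand(10, 2) for _ in range(50)])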

Deep-Temporal LSTM for Daily Living Action Recognition

Srijan Das, Michal Koperski, Francois Bremond, Gianpiero Francesca
2018 2018 15th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS)  
In this work, we propose a deep-temporal LSTM architecture which extends standard LSTM and allows better encoding of temporal information.  ...  In addition, we propose to fuse 3D skeleton geometry with deep static appearance.  ...  The complementary nature of the LSTM and CNN based networks is evident from the boosted performance for MSRDailyActivity3D and NTU-RGB+D on fusion.  ... 
doi:10.1109/avss.2018.8639122 dblp:conf/avss/DasKBF18 fatcat:qzl7akjymfeijjjv6mbpolrm5m

Long Short-Term Memory with Gate and State Level Fusion for Light Field-Based Face Recognition [article]

Alireza Sepas-Moghaddam, Ali Etemad, Fernando Pereira, Paulo Lobato Correia
2020 arXiv   pre-print
Long Short-Term Memory (LSTM) is a prominent recurrent neural network for extracting dependencies from sequential data such as time-series and multi-view data, having achieved impressive results for different  ...  The efficacy of the novel LSTM cell architectures is assessed by integrating them into deep learning-based methods for face recognition with multi-view, light field images.  ...  Nowadays, due to their superior representation and prediction performance, deep Convolutional Neural Networks (CNNs) are increasingly adopted for visual recognition and description tasks [4] .  ... 
arXiv:1905.04421v2 fatcat:xi7fxu2qrjde7cnp6su4pujlpe
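
The abstract proposes fusing information inside the LSTM cell at the gate and state level. The cell below shows the general shape of a two-input recurrent cell in PyTorch; it does not reproduce the paper's exact gating equations, and the shared four-gate projection is an assumption:

    import torch
    import torch.nn as nn

    class TwoViewFusionLSTMCell(nn.Module):
        """An LSTM cell whose gates see two input views at once."""
        def __init__(self, in_dim, hidden):
            super().__init__()
            # One projection per view plus the recurrent projection, covering all four gates.
            self.view_a = nn.Linear(in_dim, 4 * hidden)
            self.view_b = nn.Linear(in_dim, 4 * hidden)
            self.recur = nn.Linear(hidden, 4 * hidden)

        def forward(self, xa, xb, state):
            h, c = state
            i, f, g, o = (self.view_a(xa) + self.view_b(xb) + self.recur(h)).chunk(4, dim=-1)
            c = torch.sigmoid(f) * c + torch.sigmoid(i) * torch.tanh(g)
            h = torch.sigmoid(o) * torch.tanh(c)
            return h, c

    cell = TwoViewFusionLSTMCell(64, 32)
    h = c = torch.zeros(2, 32)
    for xa, xb in zip(torch.randn(5, 2, 64), torch.randn(5, 2, 64)):
        h, c = cell(xa, xb, (h, c))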

Fine-Grained Recognition via Attribute-Guided Attentive Feature Aggregation

Yichao Yan, Bingbing Ni, Xiaokang Yang
2017 Proceedings of the 2017 ACM on Multimedia Conference - MM '17  
This could be considered as a discriminant aggregation network, and informative patch-level features are propagated and accumulated to the deeper nodes of the recurrent network for final classification.  ...  First, we develop a novel attribute-guided attentive network to sequentially discover informative parts/regions, by seeking a good registration between attentive regions and predefined object attributes  ...  description generation [29].  ... 
doi:10.1145/3123266.3123358 dblp:conf/mm/YanNY17 fatcat:q5dlco5zrbfdfenmuhhpwzcp2a
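
The snippet describes propagating and accumulating informative patch-level features through a recurrent network. A loose sketch of such attentive, step-wise aggregation follows; the GRU cell, the attention form and the omission of attribute guidance are all simplifying assumptions:

    import torch
    import torch.nn as nn

    class AttentiveAggregator(nn.Module):
        """Step-wise GRU aggregation of attention-weighted patch features."""
        def __init__(self, patch_dim=512, hidden=256, steps=4, classes=200):
            super().__init__()
            self.attn = nn.Linear(hidden + patch_dim, 1)
            self.gru = nn.GRUCell(patch_dim, hidden)
            self.head = nn.Linear(hidden, classes)
            self.steps, self.hidden = steps, hidden

        def forward(self, patches):                   # (B, P, patch_dim)
            b, p, _ = patches.shape
            h = patches.new_zeros(b, self.hidden)
            for _ in range(self.steps):
                q = h[:, None, :].expand(-1, p, -1)
                w = torch.softmax(self.attn(torch.cat([q, patches], dim=-1)), dim=1)
                h = self.gru((w * patches).sum(dim=1), h)   # accumulate selected patch evidence
            return self.head(h)

    logits = AttentiveAggregator()(torch.randn(2, 16, 512))
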
Showing results 1 — 15 out of 3,943 results