4,540 Hits in 3.9 sec

Learning Dynamic Memory Networks for Object Tracking [article]

Tianyu Yang, Antoni B. Chan
2018 arXiv   pre-print
is favorable for memorizing long-term object information.  ...  As the location of the target is at first unknown in the search feature map, an attention mechanism is applied to concentrate the LSTM input on the potential target.  ...  Memory Networks. Recent use of convolutional LSTM for visual tracking [36] shows that memory states are useful for object template management over long timescales.  ... 
arXiv:1803.07268v2 fatcat:2xt4e7jygrhlzhf22at6qfmg2a

Learning Dynamic Memory Networks for Object Tracking [chapter]

Tianyu Yang, Antoni B. Chan
2018 Lecture Notes in Computer Science  
favorable for memorizing long-term object information.  ...  As the location of the target is at first unknown in the search feature map, an attention mechanism is applied to concentrate the LSTM input on the potential target.  ...  Memory Networks. Recent use of convolutional LSTM for visual tracking [36] shows that memory states are useful for object template management over long timescales.  ... 
doi:10.1007/978-3-030-01240-3_10 fatcat:qp6lbqkbdzhjbhhwcyuc6gpxli
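The Yang and Chan snippets say that an attention mechanism concentrates the LSTM input on the likely target location in the search feature map. Below is a minimal, hypothetical sketch of that idea. It treats the feature map as a list of per-location vectors, scores each location against a query vector (for example, the previous hidden state), and returns the softmax-weighted sum. The dot-product scoring and the names `attend` and `softmax` are illustrative assumptions. They are not the paper's exact formulation.

```python
import math
from typing import List

Vector = List[float]  # assumption: one feature vector per spatial location


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(feature_map: List[Vector], query: Vector) -> Vector:
    """Soft spatial attention (simplified, assumed dot-product scoring).

    Each location is scored against the query, the scores are normalized
    with softmax, and the weighted sum of location features is returned.
    That sum stands in for the attended LSTM input.
    """
    scores = [sum(f * q for f, q in zip(loc, query)) for loc in feature_map]
    weights = softmax(scores)
    dim = len(feature_map[0])
    return [sum(w * loc[d] for w, loc in zip(weights, feature_map)) for d in range(dim)]
```

When the query aligns strongly with a single location, the output is close to that location's features. For example, `attend([[0.0, 0.0], [5.0, 0.0]], [10.0, 0.0])` is approximately `[5.0, 0.0]`.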

DAWN: Dual Augmented Memory Network for Unsupervised Video Object Tracking [article]

Zhenmei Shi, Haoyang Fang, Yu-Wing Tai, Chi-Keung Tang
2019 arXiv   pre-print
We are thus interested in memory augmented network, where an external memory remembers the evolving appearance of the target (foreground) object without backpropagation for updating weights.  ...  Our Dual Augmented Memory Network (DAWN) is unique in remembering both target and background, and using an improved attention LSTM memory to guide the focus on memorized features.  ...  For single object tracking, RASNet [34] introduces general attention, residual attention and channel attention.  ... 
arXiv:1908.00777v2 fatcat:xqqcajz4i5hwrecv3o2zvx4cke
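The DAWN snippet describes an external memory that records the target's evolving appearance without backpropagation, meaning writes store features directly instead of taking gradient steps. Below is a hypothetical sketch of that kind of memory. It uses a fixed-capacity FIFO of feature vectors and a read that returns a softmax-weighted blend of slots based on cosine similarity. The class name, the FIFO eviction, and the `sharpness` parameter are assumptions for illustration. To mirror DAWN's emphasis on remembering both target and background, one could keep two such memories, one per role.

```python
import math
from collections import deque
from typing import Deque, List

Vector = List[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    if na == 0 or nb == 0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (na * nb)


class AppearanceMemory:
    """Hypothetical external memory: write = store a vector, read = attention."""

    def __init__(self, capacity: int, sharpness: float = 10.0) -> None:
        # deque(maxlen=...) evicts the oldest slot once capacity is reached
        self.slots: Deque[Vector] = deque(maxlen=capacity)
        self.sharpness = sharpness  # scales similarities before softmax

    def write(self, feature: Vector) -> None:
        """Store a copy of the feature; no weights are updated."""
        self.slots.append(list(feature))

    def read(self, query: Vector) -> Vector:
        """Return a softmax(cosine)-weighted average of the stored slots."""
        if not self.slots:
            raise ValueError("memory is empty")
        scores = [self.sharpness * cosine(query, s) for s in self.slots]
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        dim = len(self.slots[0])
        return [sum(w * s[d] for w, s in zip(weights, self.slots)) for d in range(dim)]
```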

Cooperative Use of Recurrent Neural Network and Siamese Region Proposal Network for Robust Visual Tracking

Xuechen Zhao, Yaoming Liu, Guang Han
2021 IEEE Access  
Image features extracted by the Siamese network are strengthened by the channel and spatial attention mechanisms, and are sent to the RPN for classification and regression.  ...  Temporal information is processed by a recurrent neural network-based Long Short-Term Memory (LSTM) to predict the rough location of the target, it is mapped to the anchor feature map of the RPN for anchor  ...  It involved adding an attention mechanism to the residual network.  ... 
doi:10.1109/access.2021.3072778 fatcat:zed2kglxffeyznqqcefqh7euly
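The Zhao et al. snippet mentions strengthening Siamese features with channel and spatial attention. The sketch below shows the general pattern on a `[channel][row][col]` nested list. It reweights each channel by a sigmoid of its global average, then reweights each location by a sigmoid of its cross-channel mean. This version is deliberately simplified and uses no learned parameters. Published modules such as CBAM add a shared MLP, max pooling, and a convolution, all of which are omitted here. The function names are hypothetical.

```python
import math
from typing import List

Tensor3 = List[List[List[float]]]  # [channel][row][col]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def channel_attention(x: Tensor3) -> Tensor3:
    """Scale each channel by sigmoid(global average of that channel)."""
    out = []
    for ch in x:
        n = sum(len(row) for row in ch)
        weight = sigmoid(sum(sum(row) for row in ch) / n)
        out.append([[weight * v for v in row] for row in ch])
    return out


def spatial_attention(x: Tensor3) -> Tensor3:
    """Scale each location by sigmoid(mean over channels at that location)."""
    c, h, w = len(x), len(x[0]), len(x[0][0])
    mask = [[sigmoid(sum(x[k][i][j] for k in range(c)) / c) for j in range(w)]
            for i in range(h)]
    return [[[x[k][i][j] * mask[i][j] for j in range(w)] for i in range(h)]
            for k in range(c)]
```

Applying `spatial_attention(channel_attention(features))` follows the channel-then-spatial order in the snippet's description.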

A New Video-Based Crash Detection Method: Balancing Speed and Accuracy Using a Feature Fusion Deep Learning Framework

Zhenbo Lu, Wei Zhou, Shixiang Zhang, Chen Wang, Kun Wang
2020 Journal of Advanced Transportation  
In this framework, a residual neural network (ResNet) combined with attention modules was proposed to extract crash-related appearance features from urban traffic videos (i.e., a crash appearance feature  ...  networks.  ...  Crash Appearance Feature Module (ResNet + Attention) Residual Neural Network (ResNet).  ... 
doi:10.1155/2020/8848874 fatcat:gpbmso4isnabdhyn4uwfmdvsua

LSTA: Long Short-Term Attention for Egocentric Action Recognition

Swathikiran Sudhakaran, Sergio Escalera, Oswald Lanz
2019 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)  
In this paper we propose LSTA as a mechanism to focus on features from relevant spatial parts while attention is being tracked smoothly across the video sequence.  ...  It requires a fine-grained discrimination of small objects and their manipulation.  ...  We gratefully acknowledge the support of NVIDIA Corporation with the donation of GPUs used for this research.  ... 
doi:10.1109/cvpr.2019.01019 dblp:conf/cvpr/SudhakaranEL19 fatcat:numtqwnpdjgijhao3s4e7optii

Recognising Human-Object Interactions Using Attention-based LSTMs

Muna Almushyti, Frederick W. B. Li
2019 Computer Graphics and Visual Computing  
The proposed approach applies an attention mechanism to LSTMs in order to focus on important parts of human and object temporal information.  ...  The framework consists of LSTMs that firstly capture both human motion and temporal object information independently, followed by fusing these information through a bilinear layer to aggregate human-object  ...  The contributions of this paper are as follows: • We propose an LSTM-based framework with attention mechanism for recognising human-object interaction in videos.  ... 
doi:10.2312/cgvc.20191269 dblp:conf/tpcg/AlmushytiL19 fatcat:o3wskswsojenxgkmhut6skn5hm
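The Almushyti and Li snippet describes fusing human and object features "through a bilinear layer." A standard bilinear layer computes each output as y_k = xᵀ W_k z + b_k. The pure-Python sketch below implements that formula. It assumes the weights are indexed as `weights[k][i][j]` and is not taken from the paper's code.

```python
from typing import List

Matrix = List[List[float]]


def bilinear(x: List[float], z: List[float],
             weights: List[Matrix], bias: List[float]) -> List[float]:
    """Bilinear fusion: output k is sum_ij x[i] * W_k[i][j] * z[j] + b_k.

    x could hold human-motion features and z object features (assumed roles).
    """
    return [
        sum(x[i] * W[i][j] * z[j] for i in range(len(x)) for j in range(len(z))) + b
        for W, b in zip(weights, bias)
    ]
```

Because every output mixes each pair (x[i], z[j]) multiplicatively, the layer can represent interactions that simple concatenation followed by a linear layer cannot.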

Improved Actor Relation Graph based Group Activity Recognition [article]

Zijian Kuang, Xinran Tie
2020 arXiv   pre-print
A visualization model is further introduced to visualize each input video frame with predicted bounding boxes on each human object and predict individual action and collective activity.  ...  use Normalized cross-correlation (NCC) and the sum of absolute differences (SAD) to calculate the pair-wise appearance similarity and build the actor relationship graph to allow the graph convolution network  ...  Li et al. introduced a novel attention-based framework called Residual attention-based LSTM (Res-ATT [14] ).  ... 
arXiv:2010.12968v2 fatcat:z2mwgjjg2vclxj6cs6o4q2rcyy
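The Kuang and Tie snippet names normalized cross-correlation (NCC) and the sum of absolute differences (SAD) as pairwise appearance-similarity measures for building the actor relation graph. The sketch below implements both measures on flat feature vectors, along with a hypothetical `similarity_graph` helper that fills an n×n matrix. Note that NCC increases with similarity, while SAD decreases with it. A graph built from SAD would therefore typically negate or otherwise invert it before normalizing edge weights. That inversion step is an assumption and is not shown in the snippet.

```python
import math
from typing import Callable, List

Vector = List[float]


def ncc(a: Vector, b: Vector) -> float:
    """Normalized cross-correlation in [-1, 1]; 0.0 if either input is constant."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    da = [x - ma for x in a]
    db = [y - mb for y in b]
    denom = math.sqrt(sum(x * x for x in da) * sum(y * y for y in db))
    return sum(x * y for x, y in zip(da, db)) / denom if denom else 0.0


def sad(a: Vector, b: Vector) -> float:
    """Sum of absolute differences; 0 for identical vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))


def similarity_graph(features: List[Vector],
                     metric: Callable[[Vector, Vector], float]) -> List[List[float]]:
    """Hypothetical helper: pairwise metric values between all actors."""
    n = len(features)
    return [[metric(features[i], features[j]) for j in range(n)] for i in range(n)]
```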

Multiple Object Tracking in Deep Learning Approaches: A Survey

Yesul Park, L. Minh Dang, Sujin Lee, Dongil Han, Hyeonjoon Moon
2021 Electronics  
Multiple Object Tracking (MOT) is a subclass of object tracking that has received growing interest due to its academic and commercial potential.  ...  Object tracking is a fundamental computer vision problem that refers to a set of methods proposed to precisely track the motion trajectory of an object in a video.  ...  LSTM to use a Dual Matching Attention Network (DMAN).  ... 
doi:10.3390/electronics10192406 fatcat:phffjkjt3bbzfct6hs4657ipey

A Framework for Trajectory Prediction of Preceding Target Vehicles in Urban Scenario Using Multi-Sensor Fusion

Bin Zou, Wenbo Li, Xianjun Hou, Luqi Tang, Quan Yuan
2022 Sensors  
Reliable trajectory prediction of preceding vehicles is crucial for making safer planning.  ...  Then, two transformer-based networks are built to predict preceding target vehicles' future trajectory, which are the traditional transformer and the cluster-based transformer.  ...  The FDE for C-TF and TF are 3.519 m and 6.046 m, respectively, which improve 20.781 m and 18.254 m respectively compared to LSTM.  ... 
doi:10.3390/s22134808 pmid:35808302 pmcid:PMC9268907 fatcat:v2blbrqy6zfotbbhpluiyclo7y
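The Zou et al. snippet reports final displacement error (FDE). FDE is the Euclidean distance between the predicted and ground-truth positions at the last time step. The average displacement error (ADE) is the same distance averaged over all steps. The sketch below computes both. It also checks one reading of the snippet: if "improve X m" means "reduce FDE by X m," then both reported pairs imply the same LSTM baseline of about 24.3 m. That interpretation is an inference from the snippet and is not stated in it.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def ade(pred: List[Point], gt: List[Point]) -> float:
    """Average displacement error over all predicted time steps."""
    return sum(math.dist(p, g) for p, g in zip(pred, gt)) / len(gt)


def fde(pred: List[Point], gt: List[Point]) -> float:
    """Final displacement error: distance at the last time step only."""
    return math.dist(pred[-1], gt[-1])


# Implied LSTM FDE under the "reduced by" reading (both about 24.3 m)
lstm_from_ctf = 3.519 + 20.781
lstm_from_tf = 6.046 + 18.254
```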

LSTA: Long Short-Term Attention for Egocentric Action Recognition [article]

Swathikiran Sudhakaran, Sergio Escalera, Oswald Lanz
2019 arXiv   pre-print
In this paper we propose LSTA as a mechanism to focus on features from spatial relevant parts while attention is being tracked smoothly across the video sequence.  ...  It requires a fine-grained discrimination of small objects and their manipulation.  ...  We gratefully acknowledge the support of NVIDIA Corporation with the donation of GPUs used for this research.  ... 
arXiv:1811.10698v3 fatcat:ix72yqsyfnhhfcwcsvffr4xnxq

Revisiting Video Saliency: A Large-scale Benchmark and a New Model [article]

Wenguan Wang, Jianbing Shen, Fang Guo, Ming-Ming Cheng, Ali Borji
2018 arXiv   pre-print
Second, we propose a novel video saliency model that augments the CNN-LSTM network architecture with an attention mechanism to enable fast, end-to-end saliency learning.  ...  The attention mechanism explicitly encodes static saliency information, thus allowing LSTM to focus on learning more flexible temporal saliency representation across successive frames.  ...  They are mainly based on two-stream network architecture [2] that accounts for color images and motion fields separately, or two-layer LSTM with object information [30] .  ... 
arXiv:1801.07424v3 fatcat:kiutlpo76zbnrnyoemha5f25mu

Revisiting Video Saliency: A Large-Scale Benchmark and a New Model

Wenguan Wang, Jianbing Shen, Fang Guo, Ming-Ming Cheng, Ali Borji
2018 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition  
Second, we propose a novel video saliency model that augments the CNN-LSTM network architecture with an attention mechanism to enable fast, end-to-end saliency learning.  ...  The attention mechanism explicitly encodes static saliency information, thus allowing LSTM to focus on learning more flexible temporal saliency representation across successive frames.  ...  They are mainly based on two-stream network architecture [2] that accounts for color images and motion fields separately, or two-layer LSTM with object information [30] .  ... 
doi:10.1109/cvpr.2018.00514 dblp:conf/cvpr/WangSGCB18 fatcat:d72jmowatzaixm7va6ztnkj3xa

CNN-based MultiChannel End-to-End Speech Recognition for everyday home environments [article]

Nelson Yalta, Shinji Watanabe, Takaaki Hori, Kazuhiro Nakadai, Tetsuya Ogata
2019 arXiv   pre-print
The system comprises of an attention-based encoder-decoder neural network that directly generates a text as an output from a sound input.  ...  The multichannel CNN encoder, which uses residual connections and batch renormalization, is trained with augmented data, including white noise injection.  ...  The decoder network had a 1-layer LSTM with 300 cells and a CTC network.  ... 
arXiv:1811.02735v3 fatcat:7pistg3djjgyzlrjepw4hu74iu
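The Yalta et al. snippet pairs an attention decoder with a CTC network. The sketch below shows how CTC frame outputs turn into a label sequence using greedy (best-path) decoding: take the most probable label per frame, collapse consecutive repeats, then drop blanks. It is a generic illustration of the CTC decoding rule and does not reproduce the paper's decoder, which combines attention and CTC. The blank index `0` is an assumption.

```python
from itertools import groupby
from typing import List, Sequence


def ctc_greedy_decode(frame_probs: Sequence[Sequence[float]], blank: int = 0) -> List[int]:
    """Best-path CTC decoding.

    1. Pick the argmax label for every frame.
    2. Collapse runs of identical consecutive labels (groupby).
    3. Remove blank labels; a blank between two equal labels keeps both.
    """
    best = [max(range(len(p)), key=p.__getitem__) for p in frame_probs]
    return [label for label, _ in groupby(best) if label != blank]
```

For example, frame argmaxes `1 1 0 1 2 2 0` decode to `[1, 1, 2]`. The blank separates the two 1s, so both are kept.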

Visual Feature Learning on Video Object and Human Action Detection: A Systematic Review

Dengshan Li, Rujing Wang, Peng Chen, Chengjun Xie, Qiong Zhou, Xiufang Jia
2021 Micromachines  
Video object detection includes object classification and object location within the frame. Human action recognition is the detection of human actions.  ...  Video object and human action detection are applied in many fields, such as video surveillance, face recognition, etc.  ...  [68] propose a fully-convolutional Siamese network for the video object tracking, which is light, and outperform the previous object tracking methods. Cai et al.  ... 
doi:10.3390/mi13010072 pmid:35056238 pmcid:PMC8781209 fatcat:kdc5msiv2rd7zh7qlxymbpdk3y
Showing results 1-15 out of 4,540 results