Residual LSTM Attention Network for Object Tracking.

is favorable for memorizing long-term object information. ... As the location of the target is at first unknown in the search feature map, an attention mechanism is applied to concentrate the LSTM input on the potential target. ... Memory Networks. Recent use of convolutional LSTM for visual tracking [36] shows that memory states are useful for object template management over long timescales. ...

arXiv:1803.07268v2 fatcat:2xt4e7jygrhlzhf22at6qfmg2a

favorable for memorizing long-term object information. ... As the location of the target is at first unknown in the search feature map, an attention mechanism is applied to concentrate the LSTM input on the potential target. ... Memory Networks. Recent use of convolutional LSTM for visual tracking [36] shows that memory states are useful for object template management over long timescales. ...

doi:10.1007/978-3-030-01240-3_10 fatcat:qp6lbqkbdzhjbhhwcyuc6gpxli

We are thus interested in memory-augmented networks, where an external memory remembers the evolving appearance of the target (foreground) object without backpropagation for updating weights. ... Our Dual Augmented Memory Network (DAWN) is unique in remembering both target and background, and using an improved attention LSTM memory to guide the focus on memorized features. ... For single object tracking, RASNet [34] introduces general attention, residual attention and channel attention. ...

arXiv:1908.00777v2 fatcat:xqqcajz4i5hwrecv3o2zvx4cke

Image features extracted by the Siamese network are strengthened by the channel and spatial attention mechanisms, and are sent to the RPN for classification and regression. ... Temporal information is processed by a recurrent neural network-based Long Short-Term Memory (LSTM) to predict the rough location of the target; it is mapped to the anchor feature map of the RPN for anchor ... It involved adding an attention mechanism to the residual network. ...

doi:10.1109/access.2021.3072778 fatcat:zed2kglxffeyznqqcefqh7euly

In this framework, a residual neural network (ResNet) combined with attention modules was proposed to extract crash-related appearance features from urban traffic videos (i.e., a crash appearance feature ... networks. ... Crash Appearance Feature Module (ResNet + Attention) Residual Neural Network (ResNet). ...

doi:10.1155/2020/8848874 fatcat:gpbmso4isnabdhyn4uwfmdvsua

In this paper we propose LSTA as a mechanism to focus on features from relevant spatial parts while attention is being tracked smoothly across the video sequence. ... It requires a fine-grained discrimination of small objects and their manipulation. ... We gratefully acknowledge the support of NVIDIA Corporation with the donation of GPUs used for this research. ...

doi:10.1109/cvpr.2019.01019 dblp:conf/cvpr/SudhakaranEL19 fatcat:numtqwnpdjgijhao3s4e7optii

The proposed approach applies an attention mechanism to LSTMs in order to focus on important parts of human and object temporal information. ... The framework consists of LSTMs that first capture human motion and temporal object information independently, followed by fusing this information through a bilinear layer to aggregate human-object ... The contributions of this paper are as follows: • We propose an LSTM-based framework with attention mechanism for recognising human-object interaction in videos. ...

doi:10.2312/cgvc.20191269 dblp:conf/tpcg/AlmushytiL19 fatcat:o3wskswsojenxgkmhut6skn5hm

A visualization model is further introduced to visualize each input video frame with predicted bounding boxes on each human object and predict individual action and collective activity. ... use Normalized cross-correlation (NCC) and the sum of absolute differences (SAD) to calculate the pair-wise appearance similarity and build the actor relationship graph to allow the graph convolution network ... Li et al. introduced a novel attention-based framework called Residual attention-based LSTM (Res-ATT [14]). ...

arXiv:2010.12968v2 fatcat:z2mwgjjg2vclxj6cs6o4q2rcyy

Multiple Object Tracking (MOT) is a subclass of object tracking that has received growing interest due to its academic and commercial potential. ... Object tracking is a fundamental computer vision problem that refers to a set of methods proposed to precisely track the motion trajectory of an object in a video. ... LSTM to use a Dual Matching Attention Network (DMAN). ...

doi:10.3390/electronics10192406 fatcat:phffjkjt3bbzfct6hs4657ipey

Reliable trajectory prediction of preceding vehicles is crucial for safer planning. ... Then, two transformer-based networks are built to predict preceding target vehicles' future trajectory: the traditional transformer and the cluster-based transformer. ... The FDE for C-TF and TF are 3.519 m and 6.046 m, respectively, improvements of 20.781 m and 18.254 m, respectively, over the LSTM. ...

doi:10.3390/s22134808 pmid:35808302 pmcid:PMC9268907 fatcat:v2blbrqy6zfotbbhpluiyclo7y

In this paper we propose LSTA as a mechanism to focus on features from spatially relevant parts while attention is being tracked smoothly across the video sequence. ... It requires a fine-grained discrimination of small objects and their manipulation. ... We gratefully acknowledge the support of NVIDIA Corporation with the donation of GPUs used for this research. ...

arXiv:1811.10698v3 fatcat:ix72yqsyfnhhfcwcsvffr4xnxq

Second, we propose a novel video saliency model that augments the CNN-LSTM network architecture with an attention mechanism to enable fast, end-to-end saliency learning. ... The attention mechanism explicitly encodes static saliency information, thus allowing LSTM to focus on learning more flexible temporal saliency representation across successive frames. ... They are mainly based on two-stream network architecture [2] that accounts for color images and motion fields separately, or two-layer LSTM with object information [30]. ...

arXiv:1801.07424v3 fatcat:kiutlpo76zbnrnyoemha5f25mu

Second, we propose a novel video saliency model that augments the CNN-LSTM network architecture with an attention mechanism to enable fast, end-to-end saliency learning. ... The attention mechanism explicitly encodes static saliency information, thus allowing LSTM to focus on learning more flexible temporal saliency representation across successive frames. ... They are mainly based on two-stream network architecture [2] that accounts for color images and motion fields separately, or two-layer LSTM with object information [30]. ...

doi:10.1109/cvpr.2018.00514 dblp:conf/cvpr/WangSGCB18 fatcat:d72jmowatzaixm7va6ztnkj3xa

The system comprises an attention-based encoder-decoder neural network that directly generates text as output from a sound input. ... The multichannel CNN encoder, which uses residual connections and batch renormalization, is trained with augmented data, including white noise injection. ... The decoder network had a 1-layer LSTM with 300 cells and a CTC network. ...

arXiv:1811.02735v3 fatcat:7pistg3djjgyzlrjepw4hu74iu

Video object detection includes object classification and object location within the frame. Human action recognition is the detection of human actions. ... Video object and human action detection are applied in many fields, such as video surveillance, face recognition, etc. ... [68] propose a fully-convolutional Siamese network for video object tracking, which is lightweight and outperforms previous object tracking methods. Cai et al. ...

doi:10.3390/mi13010072 pmid:35056238 pmcid:PMC8781209 fatcat:kdc5msiv2rd7zh7qlxymbpdk3y

Learning Dynamic Memory Networks for Object Tracking [article]

Learning Dynamic Memory Networks for Object Tracking [chapter]

DAWN: Dual Augmented Memory Network for Unsupervised Video Object Tracking [article]

Cooperative Use of Recurrent Neural Network and Siamese Region Proposal Network for Robust Visual Tracking

A New Video-Based Crash Detection Method: Balancing Speed and Accuracy Using a Feature Fusion Deep Learning Framework

LSTA: Long Short-Term Attention for Egocentric Action Recognition

Recognising Human-Object Interactions Using Attention-based LSTMs

Improved Actor Relation Graph based Group Activity Recognition [article]

Multiple Object Tracking in Deep Learning Approaches: A Survey

A Framework for Trajectory Prediction of Preceding Target Vehicles in Urban Scenario Using Multi-Sensor Fusion

LSTA: Long Short-Term Attention for Egocentric Action Recognition [article]

Revisiting Video Saliency: A Large-scale Benchmark and a New Model [article]

Revisiting Video Saliency: A Large-Scale Benchmark and a New Model

CNN-based MultiChannel End-to-End Speech Recognition for everyday home environments [article]

Visual Feature Learning on Video Object and Human Action Detection: A Systematic Review
