Learning Dynamic Memory Networks for Object Tracking
[article]
2018
arXiv
pre-print
is favorable for memorizing long-term object information. ...
As the location of the target is at first unknown in the search feature map, an attention mechanism is applied to concentrate the LSTM input on the potential target. ...
Memory Networks. Recent use of convolutional LSTM for visual tracking [36] shows that memory states are useful for object template management over long timescales. ...
arXiv:1803.07268v2
fatcat:2xt4e7jygrhlzhf22at6qfmg2a
Learning Dynamic Memory Networks for Object Tracking
[chapter]
2018
Lecture Notes in Computer Science
favorable for memorizing long-term object information. ...
As the location of the target is at first unknown in the search feature map, an attention mechanism is applied to concentrate the LSTM input on the potential target. ...
Memory Networks. Recent use of convolutional LSTM for visual tracking [36] shows that memory states are useful for object template management over long timescales. ...
doi:10.1007/978-3-030-01240-3_10
fatcat:qp6lbqkbdzhjbhhwcyuc6gpxli
DAWN: Dual Augmented Memory Network for Unsupervised Video Object Tracking
[article]
2019
arXiv
pre-print
We are thus interested in memory augmented network, where an external memory remembers the evolving appearance of the target (foreground) object without backpropagation for updating weights. ...
Our Dual Augmented Memory Network (DAWN) is unique in remembering both target and background, and using an improved attention LSTM memory to guide the focus on memorized features. ...
For single object tracking, RASNet [34] introduces general attention, residual attention and channel attention. ...
arXiv:1908.00777v2
fatcat:xqqcajz4i5hwrecv3o2zvx4cke
Cooperative Use of Recurrent Neural Network and Siamese Region Proposal Network for Robust Visual Tracking
2021
IEEE Access
Image features extracted by the Siamese network are strengthened by the channel and spatial attention mechanisms, and are sent to the RPN for classification and regression. ...
Temporal information is processed by a recurrent neural network-based Long Short-Term Memory (LSTM) to predict the rough location of the target, which is mapped to the anchor feature map of the RPN for anchor ...
It involved adding an attention mechanism to the residual network. ...
doi:10.1109/access.2021.3072778
fatcat:zed2kglxffeyznqqcefqh7euly
A New Video-Based Crash Detection Method: Balancing Speed and Accuracy Using a Feature Fusion Deep Learning Framework
2020
Journal of Advanced Transportation
In this framework, a residual neural network (ResNet) combined with attention modules was proposed to extract crash-related appearance features from urban traffic videos (i.e., a crash appearance feature ...
networks. ...
Crash Appearance Feature Module (ResNet + Attention)
Residual Neural Network (ResNet). ...
doi:10.1155/2020/8848874
fatcat:gpbmso4isnabdhyn4uwfmdvsua
LSTA: Long Short-Term Attention for Egocentric Action Recognition
2019
2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
In this paper we propose LSTA as a mechanism to focus on features from relevant spatial parts while attention is being tracked smoothly across the video sequence. ...
It requires a fine-grained discrimination of small objects and their manipulation. ...
We gratefully acknowledge the support of NVIDIA Corporation with the donation of GPUs used for this research. ...
doi:10.1109/cvpr.2019.01019
dblp:conf/cvpr/SudhakaranEL19
fatcat:numtqwnpdjgijhao3s4e7optii
Recognising Human-Object Interactions Using Attention-based LSTMs
2019
Computer Graphics and Visual Computing
The proposed approach applies an attention mechanism to LSTMs in order to focus on important parts of human and object temporal information. ...
The framework consists of LSTMs that first capture human motion and temporal object information independently, then fuse this information through a bilinear layer to aggregate human-object ...
The contributions of this paper are as follows: • We propose an LSTM-based framework with attention mechanism for recognising human-object interaction in videos. ...
doi:10.2312/cgvc.20191269
dblp:conf/tpcg/AlmushytiL19
fatcat:o3wskswsojenxgkmhut6skn5hm
Improved Actor Relation Graph based Group Activity Recognition
[article]
2020
arXiv
pre-print
A visualization model is further introduced to visualize each input video frame with predicted bounding boxes on each human object and predict individual action and collective activity. ...
use Normalized cross-correlation (NCC) and the sum of absolute differences (SAD) to calculate the pair-wise appearance similarity and build the actor relationship graph to allow the graph convolution network ...
Li et al. introduced a novel attention-based framework called Residual attention-based LSTM (Res-ATT [14] ). ...
arXiv:2010.12968v2
fatcat:z2mwgjjg2vclxj6cs6o4q2rcyy
Multiple Object Tracking in Deep Learning Approaches: A Survey
2021
Electronics
Multiple Object Tracking (MOT) is a subclass of object tracking that has received growing interest due to its academic and commercial potential. ...
Object tracking is a fundamental computer vision problem that refers to a set of methods proposed to precisely track the motion trajectory of an object in a video. ...
LSTM to use a Dual Matching Attention Network (DMAN). ...
doi:10.3390/electronics10192406
fatcat:phffjkjt3bbzfct6hs4657ipey
A Framework for Trajectory Prediction of Preceding Target Vehicles in Urban Scenario Using Multi-Sensor Fusion
2022
Sensors
Reliable trajectory prediction of preceding vehicles is crucial for making safer planning. ...
Then, two transformer-based networks are built to predict preceding target vehicles' future trajectory, which are the traditional transformer and the cluster-based transformer. ...
The FDE for C-TF and TF are 3.519 m and 6.046 m, respectively, improvements of 20.781 m and 18.254 m over LSTM. ...
doi:10.3390/s22134808
pmid:35808302
pmcid:PMC9268907
fatcat:v2blbrqy6zfotbbhpluiyclo7y
LSTA: Long Short-Term Attention for Egocentric Action Recognition
[article]
2019
arXiv
pre-print
In this paper we propose LSTA as a mechanism to focus on features from spatial relevant parts while attention is being tracked smoothly across the video sequence. ...
It requires a fine-grained discrimination of small objects and their manipulation. ...
We gratefully acknowledge the support of NVIDIA Corporation with the donation of GPUs used for this research. ...
arXiv:1811.10698v3
fatcat:ix72yqsyfnhhfcwcsvffr4xnxq
Revisiting Video Saliency: A Large-scale Benchmark and a New Model
[article]
2018
arXiv
pre-print
Second, we propose a novel video saliency model that augments the CNN-LSTM network architecture with an attention mechanism to enable fast, end-to-end saliency learning. ...
The attention mechanism explicitly encodes static saliency information, thus allowing LSTM to focus on learning more flexible temporal saliency representation across successive frames. ...
They are mainly based on two-stream network architecture [2] that accounts for color images and motion fields separately, or two-layer LSTM with object information [30] . ...
arXiv:1801.07424v3
fatcat:kiutlpo76zbnrnyoemha5f25mu
Revisiting Video Saliency: A Large-Scale Benchmark and a New Model
2018
2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition
Second, we propose a novel video saliency model that augments the CNN-LSTM network architecture with an attention mechanism to enable fast, end-to-end saliency learning. ...
The attention mechanism explicitly encodes static saliency information, thus allowing LSTM to focus on learning more flexible temporal saliency representation across successive frames. ...
They are mainly based on two-stream network architecture [2] that accounts for color images and motion fields separately, or two-layer LSTM with object information [30] . ...
doi:10.1109/cvpr.2018.00514
dblp:conf/cvpr/WangSGCB18
fatcat:d72jmowatzaixm7va6ztnkj3xa
CNN-based MultiChannel End-to-End Speech Recognition for everyday home environments
[article]
2019
arXiv
pre-print
The system comprises an attention-based encoder-decoder neural network that directly generates text as output from a sound input. ...
The multichannel CNN encoder, which uses residual connections and batch renormalization, is trained with augmented data, including white noise injection. ...
The decoder network had a 1-layer LSTM with 300 cells and a CTC network. ...
arXiv:1811.02735v3
fatcat:7pistg3djjgyzlrjepw4hu74iu
Visual Feature Learning on Video Object and Human Action Detection: A Systematic Review
2021
Micromachines
Video object detection includes object classification and object location within the frame. Human action recognition is the detection of human actions. ...
Video object and human action detection are applied in many fields, such as video surveillance, face recognition, etc. ...
[68] propose a fully-convolutional Siamese network for video object tracking, which is lightweight and outperforms previous object tracking methods. Cai et al. ...
doi:10.3390/mi13010072
pmid:35056238
pmcid:PMC8781209
fatcat:kdc5msiv2rd7zh7qlxymbpdk3y