Hierarchical spatio-temporal context modeling for action recognition.

In this paper, we propose to model the spatio-temporal context information in a hierarchical way, where three levels of context are exploited in ascending order of abstraction: 1) point-level context ( ... Building on the multichannel nonlinear SVMs, we validate this proposed hierarchical framework on the realistic action (HOHA) and event (LSCOM) recognition databases, and achieve 27% and 66% relative performance ... Schematic diagram on hierarchical spatio-temporal context modeling. ...

doi:10.1109/cvpr.2009.5206721 dblp:conf/cvpr/SunWYCCL09 fatcat:v6ww72arfjfdbokklaxe5qu5vq

In this paper, we propose to model the spatio-temporal context information in a hierarchical way, where three levels of context are exploited in ascending order of abstraction: 1) point-level context ( ... Building on the multichannel nonlinear SVMs, we validate this proposed hierarchical framework on the realistic action (HOHA) and event (LSCOM) recognition databases, and achieve 27% and 66% relative performance ... Schematic diagram on hierarchical spatio-temporal context modeling. ...

doi:10.1109/cvprw.2009.5206721 fatcat:d2qg4be4czfwdbznw7x7oruwum

To address the above issues, we propose an innovative heuristic architecture called Multi-stage Factorized Spatio-Temporal (MFST) for RGB-D action and gesture recognition. ... Specifically, the CDC-Stem module captures bottom-level spatio-temporal features and passes them successively to the following spatio-temporal factored stages to capture the hierarchical spatial and temporal ... With these designs, our MFST model effectively learns spatio-temporal features within each modality for action and gesture recognition. ...

arXiv:2308.12006v2 fatcat:uajfhoofpbg23ac54tnctjbdvy

Multiple Versions

Experiments show that our model is well suited for dense multi-label action recognition, which is a challenging sub-topic of action recognition that requires predicting multiple action labels in each frame ... By stacking spatial and temporal convolutions repeatedly, TAN forms a deep hierarchical representation for capturing spatio-temporal information in videos. ... This demonstrates that our model progressively constructs hierarchical spatio-temporal representations for recognizing human actions. ...

arXiv:1812.06203v1 fatcat:7allnptupzalbnxdoshweo5rpu

Recent video recognition models utilize Transformer models for long-range spatio-temporal context modeling. ... We extensively explore the design space of focal modulation-based spatio-temporal context modeling and demonstrate our parallel spatial and temporal encoding design to be the optimal choice. ... Conclusion To learn spatio-temporal representations that can effectively model both local and global contexts, this paper introduces Video-FocalNets for video action recognition tasks. ...

arXiv:2307.06947v4 fatcat:4mz2hvnhzbbhbmaji4ehjjkupa

Open Access Multiple Versions

At the second level of the hierarchy, a large contextual region containing many STVs (Ensemble of STVs) is considered in order to construct a probabilistic model of STVs and their spatio-temporal compositions ... in the cases of a single training example and cross-dataset action recognition. ... Recently, local STVs have been used in the context of BOV models and have shown promising results for action recognition [2] , [3] , [7] - [13] . ...

doi:10.1109/crv.2012.32 dblp:conf/crv/RoshtkhariL12 fatcat:woqfp4exlbbytbpkydgsjfni5i

The challenge is to recognize human actions with more accuracy and efficiency in recognition time. ... The important area in computer vision is human understanding and recognizing actions. The main aim of action recognition is an automatic analysis of various actions from video data. ... Ryoo and Aggarwal propose a spatio-temporal relationship matching strategy for human action recognition. ...

doi:10.17577/ijertv4is080577 fatcat:hikmv56t6jc5la7ipcny5u4kha

In this paper, we present a methodology for human action recognition from a sequence of depth maps obtained using Microsoft Kinect. ... In order to make the representation insensitive to temporal sequence misalignment, we propose using the Temporal Bag-of-Words model in a hierarchical manner by recursively partitioning the depth maps sequence ... Spatio-temporal features Local spatio-temporal features have shown good performance in action recognition task for color video stream [3] , [8] . ...

dblp:conf/mva/ShuklaBK13 fatcat:oragei32zfdxbbjjumt73uwbne

, person-group interaction, group-group interaction, uses of temporal information, and recognition of group activity frame wise or video wise. ... Findings: Different models of group activity recognition are characterized as per the capabilities of the defined model considering individual pose of person, atomic activity of person, person-person interaction ... as they uses context model like Action Context (AC) descriptor, Relative Action context (RAC) descriptor, Spatio-Temporal Volume (STV). 11, 12 Person-person interaction model like distance based attraction ...

doi:10.17485/ijst/2017/v10i23/113996 fatcat:5ltu45vqmvdgxgeasifu3fipry

Finally, a hierarchical decomposition of the human body is introduced to obtain a discriminative representation for a single action. ... INDEX TERMS Human activity recognition, qualitative spatio-temporal graph, vector quantization, discrete HMMs. ... of the hierarchical decomposition of the human body for activity recognition, we constructed spatio-temporal graphs for actions based on three alternative decompositions: the whole body (no hierarchical ...

doi:10.1109/access.2020.3009136 fatcat:skmwviewnjg2rigyztji55z6ga

DOAJ

Secondly, deviating from existing hierarchical approaches (individual-to-social-to-global activity), we introduce a dual-path architecture for multi-granular activity recognition. ... First, while previous works often focus on spatial distance among individuals within an image, we argue to consider the spatio-temporal proximity. ... For comparisons, we model three types of transformer structure: parallel, hierarchical, and reverse hierarchical. ...

arXiv:2403.14113v1 fatcat:liicsm7wjnbbtosmgtendspopq

This paper addresses the problem of complex activity recognition with a unified hierarchical model. We expand triangularchain CRFs (TriCRFs) to the spatial dimension. ... The proposed architecture can be perceived as a spatio-temporal version of the TriCRFs, in which the labels of actions and activity are modeled jointly and their complex dependencies are exploited. ... ACKNOWLEDGEMENTS We thank the anonymous reviews for their valuable comments. ...

doi:10.1145/2733373.2806304 dblp:conf/mm/CaoZL15 fatcat:47oporwe5fc47ousn5ispxjzba

This paper presents a novel approach for action recognition, localization and video matching based on a hierarchical codebook model of local spatio-temporal video volumes. ... The hierarchical algorithm codes a video as a compact set of spatio-temporal volumes, while considering their spatio-temporal compositions in order to account for spatial and temporal contextual information ... In this paper we present a hierarchical probabilistic codebook method for action recognition and localization in videos. ...

doi:10.1016/j.imavis.2013.08.005 fatcat:oyvytzkhnfalnlxqdp2zkqtwr4

Inspired by the way the human visual system operates, we propose a hierarchical architecture to capture the spatio-temporal information from a given input video at different time scales. ... The proposed network is used for video action detection and localization application that is the foundational element for video analysis. ... [7] proposes transformer networks for video action recognition, which can be used to extract the spatio-temporal context of the person. This can be used for human activity clas-sification. ...

doi:10.1109/cvprw50498.2020.00390 dblp:conf/cvpr/RamaswamySGP20 fatcat:crmhhmo5krfdzj6uyzboqojncq

Experiments on VIRAT 1.0 and 2.0 Ground Datasets demonstrate the effectiveness of the proposed hierarchical context model for improving the event recognition performance even under great challenges like ... To tackle the learning and inference challenges brought in by the model hierarchy, we develop complete learning and inference algorithms for the proposed hierarchical context model based on variational ... In addition, approaches like [26, 23] also utilize hierarchical probabilistic models for event and action recognition. ...

doi:10.1109/cvpr.2014.328 dblp:conf/cvpr/WangJ14 fatcat:vsbin2nebfgapbnv2lxgf7ekai

Hierarchical spatio-temporal context modeling for action recognition

Preserved Fulltext

Hierarchical spatio-temporal context modeling for action recognition

Preserved Fulltext

Multi-stage Factorized Spatio-Temporal Representation for RGB-D Action and Gesture Recognition [article]

Preserved Fulltext

Other Versions

TAN: Temporal Aggregation Network for Dense Multi-label Action Recognition [article]

Preserved Fulltext

Video-FocalNets: Spatio-Temporal Focal Modulation for Video Action Recognition [article]

Preserved Fulltext

Other Versions

A Multi-Scale Hierarchical Codebook Method for Human Action Recognition in Videos Using a Single Example

Preserved Fulltext

Study on Recent Approaches for Human Action Recognition in Real Time

Preserved Fulltext

Action Recognition using Temporal Bag-of-Words from Depth Maps

Preserved Fulltext

A Comprehensive Study of Group Activity Recognition Methods in Video

Preserved Fulltext

Learning Dynamic Spatio-Temporal Relations for Human Activity Recognition

Preserved Fulltext

Spatio-Temporal Proximity-Aware Dual-Path Model for Panoramic Activity Recognition [article]

Preserved Fulltext

Spatio-Temporal Triangular-Chain CRF for Activity Recognition

Preserved Fulltext

Human activity recognition in videos using a single example

Preserved Fulltext

Spatio-temporal action detection and localization using a hierarchical LSTM

Preserved Fulltext

A Hierarchical Context Model for Event Recognition in Surveillance Video

Preserved Fulltext