A curated list of Audio-Visual Question Answering (AVQA) datasets and papers. AVQA is a task where a system analyzes both audio and visual elements in a video ...
Apr 24, 2024 · Connected Papers is a visual tool to help researchers and applied scientists find academic papers relevant to their field of work.
This work proposes a new question answering task on instructional videos, designed to identify a span of a video segment as an answer which contains ...
In audio-visual temporal questions, when asked which instrument in the video sounds first, nearly 80% of answers are “Simultaneously”. In counting questions, ...
Oct 2, 2023 · Abstract: As a newly emerging task, audio-visual question answering (AVQA) has attracted research attention.
Mar 26, 2024 · The Spatio-Temporal Music AVQA (Music-AVQA) dataset was constructed on a large scale for this purpose. YouTube videos of musicians performing ...
To explore scene understanding and spatio-temporal reasoning over audio and visual modalities, we build a large-scale audio-visual dataset, MUSIC-AVQA, which ...
Yun et al. [42] proposed the Pano-AVQA dataset, which comprises 360-degree videos and corresponding question-answer pairs. The Pano-AVQA dataset covers two ...
May 20, 2024 · Current audio-visual question answering (AVQA) methods are hindered by the scarcity of open-ended AVQA datasets. Most existing datasets ...