Real-life scenarios contain more complex relationships between audio-visual objects and a wider variety of daily audio-visual activities. AVQA is an audio-visual question answering dataset for the multimodal understanding of audio-visual objects and activities in real-life scenarios on videos.
ABSTRACT. Audio-visual question answering aims to answer questions regarding both audio and visual modalities in a given video, and has drawn ...
Oct 10, 2022 · Audio-visual question answering aims to answer questions regarding both audio and visual modalities in a given video, and has drawn ...
Oct 9, 2022 · AVQA is an audio-visual question answering dataset for the multimodal understanding of audio-visual objects and activities in real-life ...
The large-scale MUSIC-AVQA dataset of musical performances contains 45,867 question-answer pairs distributed across 9,288 videos, for over 150 hours in total.
Dec 20, 2023 · This paper focuses on the Audio-Visual Question Answering (AVQA) task that aims to answer questions derived from untrimmed audible videos. To ...
We propose a novel benchmark named Pano-AVQA as a large-scale grounded audio-visual question answering dataset on panoramic videos. Using 5.4K 360° video ...
Oct 11, 2021 · We propose a novel benchmark named Pano-AVQA as a large-scale grounded audio-visual question answering dataset on panoramic videos. Using 5.4K ...
In this paper, we focus on the Audio-Visual Question Answering (AVQA) task, which aims to answer questions regarding different visual objects, sounds, ...