AVQA: A Dataset for Audio-Visual Question Answering on Videos.

Real-life scenarios contain more complex relationships between audio-visual objects and a wider variety of audio-visual daily activities. AVQA is an audio-visual question answering dataset for the multimodal understanding of audio-visual objects and activities in real-life scenarios on videos.

AVQA Dataset - Papers With Code

paperswithcode.com › dataset › avqa

[PDF] AVQA: A Dataset for Audio-Visual Question Answering on Videos

mn.cs.tsinghua.edu.cn › papers › 2...

ABSTRACT. Audio-visual question answering aims to answer questions regarding both audio and visual modalities in a given video, and has drawn ...

AVQA: A Dataset for Audio-Visual Question Answering on Videos

dl.acm.org › doi

Oct 10, 2022 · Audio-visual question answering aims to answer questions regarding both audio and visual modalities in a given video, and has drawn ...

AlyssaYoung/AVQA: ACM MM 2022 paper_AVQA - GitHub

github.com › AlyssaYoung › AVQA

AVQA is an audio-visual question answering dataset for the multimodal understanding of audio-visual objects and activities in real-life scenarios on videos.

AVQA Dataset

mn.cs.tsinghua.edu.cn › avqa

Oct 9, 2022 · AVQA is an audio-visual question answering dataset for the multimodal understanding of audio-visual objects and activities in real-life ...

MUSIC-AVQA Dataset - Papers With Code

paperswithcode.com › dataset › music-av...

The large-scale MUSIC-AVQA dataset of musical performance contains 45867 question-answer pairs, distributed in 9288 videos for over 150 hours.

Object-aware Adaptive-Positivity Learning for Audio-Visual Question ...

arxiv.org › cs

Dec 20, 2023 · This paper focuses on the Audio-Visual Question Answering (AVQA) task that aims to answer questions derived from untrimmed audible videos. To ...

[PDF] Pano-AVQA: Grounded Audio-Visual Question Answering on 360

openaccess.thecvf.com › papers

We propose a novel benchmark named Pano-AVQA as a large-scale grounded audio-visual question answering dataset on panoramic videos. Using 5.4K 360° video ...

Pano-AVQA: Grounded Audio-Visual Question Answering on 360 ... - arXiv

arxiv.org › cs

Oct 11, 2021 · We propose a novel benchmark named Pano-AVQA as a large-scale grounded audio-visual question answering dataset on panoramic videos. Using 5.4K ...

[PDF] Learning to Answer Questions in Dynamic Audio-Visual Scenarios

gewu-lab.github.io › MUSIC-AVQA

In this paper, we focus on the Audio-Visual Question Answering (AVQA) task, which aims to answer questions regarding different visual objects, sounds, ...