AVQA: A Dataset for Audio-Visual Question Answering on Videos.

Video Question Answering for People with Visual Impairments Using an ...

2 days ago · ActivityNet-QA: A dataset for understanding complex web videos via question answering. ... Pano-AVQA: Grounded audio-visual question answering on 360° videos.

[PDF] arXiv:2405.19794v1 [cs.CV] 30 May 2024

2 days ago · We introduce a novel visual question answering dataset comprising videos captured with a wearable 360-degree camera, aiming to address common challenges ...

similar - arxiv-sanity

4 days ago · Audio-visual question answering (AVQA) is a challenging task that requires multistep spatio-temporal reasoning over multimodal contexts. Recent works rely on ...

Gunhee Kim's research works | Seoul National University ... - ResearchGate

5 days ago · Pano-AVQA: Grounded Audio-Visual Question Answering on 360° Videos ... Datasets for Audio-Visual Video Representation Learning ... YouTube-8M is the largest video ...
