2 days ago · ActivityNet-QA: A dataset for understanding complex web videos via question answering. ... Pano-AVQA: Grounded audio-visual question answering on 360° videos.
2 days ago · We introduce a novel visual question answering dataset comprising videos captured with a wearable 360-degree camera, aiming to address common challenges ...
4 days ago · Audio-visual question answering (AVQA) is a challenging task that requires multistep spatio-temporal reasoning over multimodal contexts. Recent works rely on ...
5 days ago · Pano-AVQA: Grounded Audio-Visual Question Answering on 360° Videos ... Datasets for Audio-Visual Video Representation Learning ... YouTube-8M is the largest video ...