Knowledge-Constrained Answer Generation for Open-Ended Video Question Answering

Authors

  • Yao Jin Hangzhou Dianzi University
  • Guocheng Niu Baidu Inc.
  • Xinyan Xiao Baidu Inc.
  • Jian Zhang Zhejiang International Studies University
  • Xi Peng College of Computer Science, Sichuan University
  • Jun Yu Hangzhou Dianzi University

DOI:

https://doi.org/10.1609/aaai.v37i7.25983

Keywords:

ML: Multi-Instance/Multi-View Learning, ML: Multimodal Learning

Abstract

Open-ended video question answering (open-ended VideoQA) aims to understand video content and question semantics in order to generate correct answers. Most of the best-performing models treat the problem as a discriminative multi-label classification task. In real-world scenarios, however, it is difficult to define a candidate set that includes all possible answers. In this paper, we propose a Knowledge-constrained Generative VideoQA Algorithm (KcGA) with an encoder-decoder pipeline, which enables out-of-domain answer generation through an adaptive external knowledge module and a multi-stream information control mechanism. We use ClipBERT to extract video-question features. To form the external knowledge features, we extract frame-wise, object-level external knowledge from a commonsense knowledge base and compute context-aware episode memory units via an attention-based GRU. We then use the multi-stream information control mechanism to fuse the video-question and external knowledge features so that semantic complementation and alignment are achieved. We evaluate our model on two open-ended benchmark datasets to demonstrate that it generates high-quality answers effectively and robustly, without being restricted by the training data.
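
As an illustration of the attention-based GRU step mentioned in the abstract, the following is a minimal NumPy sketch, not the authors' implementation. It assumes one common formulation in which the GRU update gate is replaced by a scalar attention weight per knowledge vector, and it assumes dot-product attention against the video-question feature. The function name `attention_gru_episode`, the parameter layout, the dimensions, and the random inputs are all hypothetical.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def softmax(x):
    e = np.exp(x - x.max())  # subtract the max for numerical stability
    return e / e.sum()


def attention_gru_episode(facts, query, params):
    """Summarize knowledge vectors into one episode vector.

    facts:  (T, d) array, one external-knowledge vector per step (assumed layout)
    query:  (d,) array, stand-in for the video-question feature
    params: tuple of four (d, d) weight matrices (Wr, Ur, W, U), all hypothetical
    """
    Wr, Ur, W, U = params
    # Assumption: attention weights come from dot-product similarity to the query.
    gates = softmax(facts @ query)
    h = np.zeros(facts.shape[1])
    for f, g in zip(facts, gates):
        r = sigmoid(Wr @ f + Ur @ h)             # reset gate
        h_tilde = np.tanh(W @ f + U @ (r * h))   # candidate state
        h = g * h_tilde + (1.0 - g) * h          # attention weight replaces the update gate
    return h, gates


# Toy inputs; real features would come from the encoders described in the abstract.
rng = np.random.default_rng(0)
d, T = 8, 5
params = tuple(rng.normal(scale=0.1, size=(d, d)) for _ in range(4))
facts = rng.normal(size=(T, d))
query = rng.normal(size=d)
episode, gates = attention_gru_episode(facts, query, params)
```

In this sketch a knowledge vector with a small attention weight leaves the hidden state almost unchanged, so the final `episode` vector is dominated by the entries most similar to the query.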

Published

2023-06-26

How to Cite

Jin, Y., Niu, G., Xiao, X., Zhang, J., Peng, X., & Yu, J. (2023). Knowledge-Constrained Answer Generation for Open-Ended Video Question Answering. Proceedings of the AAAI Conference on Artificial Intelligence, 37(7), 8141-8149. https://doi.org/10.1609/aaai.v37i7.25983

Section

AAAI Technical Track on Machine Learning II