Data Augmentation for Visual Question Answering

Kushal Kafle, Mohammed Yousefhussien, Christopher Kanan
2017 Proceedings of the 10th International Conference on Natural Language Generation  
However, data augmentation in natural language processing is much less studied. Here, we describe two methods for data augmentation for Visual Question Answering (VQA).  ...  The first uses existing semantic annotations to generate new questions. The second method is a generative approach using recurrent neural networks.  ...  Data augmentation is generating new training data from existing examples. In this paper, we explore two data augmentation methods for generating new question-answer (QA) pairs for images.  ... 
doi:10.18653/v1/w17-3529 dblp:conf/inlg/KafleYK17 fatcat:qjatdchlcfg4bfd355jgmorltq

Retrieval-Augmented Data Augmentation for Low-Resource Domain Tasks [article]

Minju Seo, Jinheon Baek, James Thorne, Sung Ju Hwang
2024 arXiv   pre-print
However, in low-resource settings, the amount of seed data samples to use for data augmentation is very small, which makes generated samples suboptimal and less diverse.  ...  existing LLM-powered data augmentation baselines.  ...  : {question 3} Answer Options: {answer options 3} Answer: {answer 3} Question: {question} I want you to act as a question and answer generator.  ... 
arXiv:2402.13482v1 fatcat:fpoym6cwerdgbj2v52o46gswpi

Cross-Modal Generative Augmentation for Visual Question Answering [article]

Zixu Wang, Yishu Miao, Lucia Specia
2021 arXiv   pre-print
This paper introduces a generative model for data augmentation by leveraging the correlations among multiple modalities.  ...  Experiments on Visual Question Answering as downstream task demonstrate the effectiveness of the proposed generative model, which is able to improve strong UpDn-based models to achieve state-of-the-art  ...  tasks for cross-modal data augmentation.  ... 
arXiv:2105.04780v2 fatcat:jhstbw4wbzgq7dwaca62xyrphq

Tell Me How to Ask Again: Question Data Augmentation with Controllable Rewriting in Continuous Space [article]

Dayiheng Liu, Yeyun Gong, Jie Fu, Yu Yan, Jiusheng Chen, Jiancheng Lv, Nan Duan, Ming Zhou
2020 arXiv   pre-print
In this paper, we propose a novel data augmentation method, referred to as Controllable Rewriting based Question Data Augmentation (CRQDA), for machine reading comprehension (MRC), question generation,  ...  We treat the question data augmentation task as a constrained question rewriting problem to generate context-relevant, high-quality, and diverse question data samples.  ...  Conclusion In this work, we present a novel question data augmentation method, called CRQDA, for context-relevant answerable and unanswerable question generation.  ... 
arXiv:2010.01475v1 fatcat:5ncgxm5okzbmdbl4slqggrrsum

Learning to Rank Question Answer Pairs with Bilateral Contrastive Data Augmentation [article]

Yang Deng, Wenxuan Zhang, Wai Lam
2021 arXiv   pre-print
With the augmented dataset, we design a contrastive training objective for learning to rank question answer pairs.  ...  In this work, we propose a novel and easy-to-apply data augmentation strategy, namely Bilateral Generation (BiG), with a contrastive training objective for improving the performance of ranking question  ...  ., 2020) as the conditional generation model for data augmentation.  ... 
arXiv:2106.11096v2 fatcat:3j5icauh4bhmzhhyxp2dlnxqcm

Are VQA Systems RAD? Measuring Robustness to Augmented Data with Focused Interventions [article]

Daniel Rosenberg, Itai Gat, Amir Feder, Roi Reichart
2021 arXiv   pre-print
Finally, we connect between robustness and generalization, demonstrating the predictive power of RAD for performance on unseen augmentations.  ...  Our proposed augmentations are designed to make a focused intervention on a specific property of the question such that the answer changes.  ...  In our augmentations, we generate "yes/no" questions from "number" and "other" questions. For example, consider the question-answer pair "What color is the vehicle?  ... 
arXiv:2106.04484v2 fatcat:astp5refr5fdrk7yxtlnvmiwne

Can LLMs Augment Low-Resource Reading Comprehension Datasets? Opportunities and Challenges [article]

Vinay Samuel, Houda Aynaou, Arijit Ghosh Chowdhury, Karthik Venkat Ramanan, Aman Chadha
2023 arXiv   pre-print
This work serves to be the first analysis of LLMs as synthetic data augmenters for QA systems, highlighting the unique opportunities and challenges.  ...  Additionally, we release augmented versions of low resource datasets, that will allow the research community to create further benchmarks for evaluation of generated datasets.  ...  Our approach begins by generating supplementary contexts, questions, and answers to augment training sets.  ... 
arXiv:2309.12426v1 fatcat:pvn72q6otbdhvm62fpcc46lqme

Learning to Ask Unanswerable Questions for Machine Reading Comprehension

Haichao Zhu, Li Dong, Furu Wei, Wenhui Wang, Bing Qin, Ting Liu
2019 Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics  
In this work, we propose a data augmentation technique by automatically generating relevant unanswerable questions according to an answerable question paired with its corresponding paragraph that contains  ...  We also present a way to construct training data for our question generation models by leveraging the existing reading comprehension dataset. 
doi:10.18653/v1/p19-1415 dblp:conf/acl/ZhuDWWQL19 fatcat:biioiptm65bzznmmcbxqaqlqi4

When in Doubt, Ask: Generating Answerable and Unanswerable Questions, Unsupervised [article]

Liubov Nikolenko, Pouya Rezazadeh Kalehbasti
2020 arXiv   pre-print
Question Answering (QA) is key for making possible a robust communication between human and machine.  ...  Modern language models used for QA have surpassed the human-performance in several essential tasks; however, these models require large amounts of human-generated training data which are costly and time-consuming  ...  Before this paper, (i) generating training data for SQuAD question-answering and (ii) using unsupervised methods [instead of supervised methods] to generate training data directly on question-answering  ... 
arXiv:2010.01611v2 fatcat:6hdetreda5egharx3kfo7ok7ja

ZusammenQA: Data Augmentation with Specialized Models for Cross-lingual Open-retrieval Question Answering System [article]

Chia-Chien Hung, Tommaso Green, Robert Litschko, Tornike Tsereteli, Sotaro Takeshita, Marco Bombieri, Goran Glavaš, Simone Paolo Ponzetto
2022 arXiv   pre-print
Additionally, for both passage retrieval and answer generation, we augmented the training data provided by the task organizers with automatically generated question-answer pairs created from Wikipedia  ...  We devised several approaches combining different model variants for three main components: Data Augmentation, Passage Retrieval, and Answer Generation.  ...  Due to computational limitations, in our data augmented setting for generation model fine-tuning, we use 2K question-answer pairs with positive/negative passages for each language for our final results  ... 
arXiv:2205.14981v1 fatcat:72s2goyk2zhzvii7knfvicwkku

Synthesize Step-by-Step: Tools, Templates and LLMs as Data Generators for Reasoning-Based Chart VQA [article]

Zhuowan Li, Bhavan Jasani, Peng Tang, Shabnam Ghadar
2024 arXiv   pre-print
We leverage Large Language Models (LLMs), which have shown to have strong reasoning ability, as an automatic data annotator that generates question-answer annotations for chart images.  ...  We hope our work underscores the potential of synthetic data and encourages further exploration of data augmentation using LLMs for reasoning-heavy tasks.  ...  Prompts Tab. 8 shows the prompts to prompt the LLM-based data generator, for controllably generating questions and answers. D.  ... 
arXiv:2403.16385v2 fatcat:3noigtasabbohmunjwplkwfqyy

A Framework for Evaluating MRC Approaches with Unanswerable Questions

Hung Du, Srikanth Thudumu, Sankhya Singh, Scott Barnett, Irini Logothetis, Rajesh Vasa, Kon Mouzakis
2022 Zenodo  
METHODOLOGY Data collection consists of 100,000 human-generated question-answer pairs with 50,000 unanswerable questions.  ...  INTRODUCTION Question generation (QG) and question answering (QA) are challenging machine reading comprehension tasks.  ...  The robustness of U-Net was evaluated using the modified dataset containing augmented data.  ... 
doi:10.5281/zenodo.7146028 fatcat:neakwbluxvfolca7tcnihxfnwu

Can Question Generation Debias Question Answering Models? A Case Study on Question-Context Lexical Overlap [article]

Kazutoshi Shinoda, Saku Sugawara, Akiko Aizawa
2021 arXiv   pre-print
Question generation (QG), a method for augmenting QA datasets, can be a solution for such performance degradation if QG can properly debias QA datasets.  ...  Question answering (QA) models for reading comprehension have been demonstrated to exploit unintended dataset biases such as question-context lexical overlap. 
arXiv:2109.11256v1 fatcat:jatzah33qjbkxjqprpud6jctb4

Semantic Equivalent Adversarial Data Augmentation for Visual Question Answering [article]

Ruixue Tang, Chao Ma, Wei Emma Zhang, Qi Wu, Xiaokang Yang
2020 arXiv   pre-print
In this paper, instead of directly manipulating images and questions, we use generated adversarial examples for both images and questions as the augmented data.  ...  On the other hand, the data augmentation, as one of the major tricks for DNN, has been widely used in many computer vision tasks.  ...  Although there are few works studying the data augmentation problem for VQA [18, 35, 33, 1], they merely generate either new questions or images.  ... 
arXiv:2007.09592v1 fatcat:y7wzntyz6rbhtcvugijeaz445i

Chabbiimen at VQA-Med 2021: Visual Generation of Relevant Natural Language Questions from Radiology Images for Anomaly Detection

Imen Chebbi
2021 Conference and Labs of the Evaluation Forum  
I have used augmentation techniques for increasing data and VGG19 for extraction of the feature from a picture and prediction. VQGR is implemented using Tensorflow.  ...  In addition, systems able to understand clinical pictures and answer the questions about its content can assist objective decision making, objective education.  ...  Technique Of Data Augmentation Technique Of Data Augmentation is a method utilized for augmenting the size of data by adding on moderately adapted duplicate of already available data or recently generated  ... 
dblp:conf/clef/Chebbi21 fatcat:hbspwbgbqvahhkcnlglr2xzttq
