RealCQA: Scientific Chart Question Answering as a Test-Bed for First-Order Logic

Ahmed, Saleem; Jawade, Bhavin; Pandey, Shubham; Setlur, Srirangaraj; Govindaraju, Venu

doi:10.1007/978-3-031-41682-8_5

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 14189)

Included in the following conference series:

International Conference on Document Analysis and Recognition

Abstract

We present a comprehensive study of the chart visual question-answering (QA) task, addressing the challenges of comprehending and extracting data from chart visualizations within documents. Despite efforts to tackle this problem using synthetic charts, solutions are limited by the shortage of annotated real-world data. To fill this gap, we introduce a benchmark and dataset for chart visual QA on real-world charts, offering a systematic analysis of the task and a novel taxonomy for template-based chart question creation. Our contribution includes a new answer type, ‘list’, with both ranked and unranked variations. Our study is conducted on a real-world chart dataset from scientific literature, which shows higher visual complexity than the data used in other works. Our focus is on template-based QA and how it can serve as a standard for evaluating the first-order logic capabilities of models. The results of our experiments, conducted on a real-world out-of-distribution dataset, provide a robust evaluation of large-scale pre-trained models and advance the field of chart visual QA and formal logic verification for neural networks in general. Our code and dataset are publicly available (https://github.com/cse-ai-lab/RealCQA).
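As an illustration of the distinction between ranked and unranked ‘list’ answers, the following is a minimal sketch of how the two variations could be compared against a ground-truth list. It is not the scoring procedure from the paper or its repository; the function names `normalize` and `list_answer_matches`, the exact-match criterion, and the string normalization are all assumptions made for this example.

```python
from collections import Counter
from typing import Sequence


def normalize(item: str) -> str:
    """Lowercase and collapse whitespace (an assumed, illustrative normalization)."""
    return " ".join(item.lower().split())


def list_answer_matches(
    predicted: Sequence[str], expected: Sequence[str], ranked: bool
) -> bool:
    """Hypothetical exact-match check for a 'list' answer.

    ranked=True  : items must agree position by position.
    ranked=False : items must agree as multisets, so order is ignored.
    """
    pred = [normalize(p) for p in predicted]
    gold = [normalize(g) for g in expected]
    if ranked:
        return pred == gold
    return Counter(pred) == Counter(gold)


if __name__ == "__main__":
    gold = ["Group A", "Group B", "Group C"]
    shuffled = ["group c", "Group A", "group b"]
    # Same items in a different order: accepted only when order is ignored.
    print(list_answer_matches(shuffled, gold, ranked=False))
    print(list_answer_matches(shuffled, gold, ranked=True))
```

Under this sketch, a ranked question (for example, one asking for series ordered by value) penalizes any change in order, while an unranked question (for example, one asking which series satisfy a condition) only checks membership and multiplicity.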

Author information

Authors and Affiliations

University at Buffalo, Buffalo, USA
Saleem Ahmed, Bhavin Jawade, Shubham Pandey, Srirangaraj Setlur & Venu Govindaraju

Corresponding author

Correspondence to Saleem Ahmed.

Editor information

Editors and Affiliations

TU Dortmund University, Dortmund, Germany
Gernot A. Fink
Adobe, College Park, MD, USA
Rajiv Jain
Osaka Metropolitan University, Osaka, Japan
Koichi Kise
Rochester Institute of Technology, Rochester, NY, USA
Richard Zanibbi

About this paper

Cite this paper

Ahmed, S., Jawade, B., Pandey, S., Setlur, S., Govindaraju, V. (2023). RealCQA: Scientific Chart Question Answering as a Test-Bed for First-Order Logic. In: Fink, G.A., Jain, R., Kise, K., Zanibbi, R. (eds) Document Analysis and Recognition - ICDAR 2023. ICDAR 2023. Lecture Notes in Computer Science, vol 14189. Springer, Cham. https://doi.org/10.1007/978-3-031-41682-8_5

DOI: https://doi.org/10.1007/978-3-031-41682-8_5
Published: 19 August 2023
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-41681-1
Online ISBN: 978-3-031-41682-8
eBook Packages: Computer Science, Computer Science (R0)
