
RealCQA: Scientific Chart Question Answering as a Test-Bed for First-Order Logic

  • Conference paper
Document Analysis and Recognition - ICDAR 2023 (ICDAR 2023)

Abstract

We present a comprehensive study of the chart visual question-answering (QA) task, addressing the challenges of comprehending and extracting data from chart visualizations within documents. Despite efforts to tackle this problem using synthetic charts, solutions are limited by the shortage of annotated real-world data. To fill this gap, we introduce a benchmark and dataset for chart visual QA on real-world charts, offering a systematic analysis of the task and a novel taxonomy for template-based chart question creation. Our contribution includes the introduction of a new answer type, ‘list’, with both ranked and unranked variations. Our study is conducted on a real-world chart dataset from scientific literature, showcasing higher visual complexity than other works. Our focus is on template-based QA and how it can serve as a standard for evaluating the first-order logic capabilities of models. The results of our experiments, conducted on a real-world out-of-distribution dataset, provide a robust evaluation of large-scale pre-trained models and advance the field of chart visual QA and formal logic verification for neural networks in general. Our code and dataset are publicly available (https://github.com/cse-ai-lab/RealCQA).
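To make the distinction between the ranked and unranked variations of the ‘list’ answer type concrete, the sketch below shows one plausible way such answers might be scored. This is an illustrative assumption, not the paper's actual evaluation code; the function names and matching rules are hypothetical.

```python
# Hypothetical sketch of scoring 'list'-type answers (not from RealCQA itself).
# Unranked lists treat the answer as a multiset of items; ranked lists also
# require the items to appear in the correct order.

def unranked_list_match(predicted, gold):
    """Order-insensitive match: correct iff both lists hold the same items."""
    return sorted(predicted) == sorted(gold)

def ranked_list_match(predicted, gold):
    """Order-sensitive match: correct iff items appear in the same order."""
    return predicted == gold

# Example: a model lists the right bars but in the wrong order.
pred = ["series B", "series A", "series C"]
gold = ["series A", "series B", "series C"]

print(unranked_list_match(pred, gold))  # True: same items, order ignored
print(ranked_list_match(pred, gold))    # False: order matters
```

Under this scheme, a ranked question (e.g. "list the series from highest to lowest peak value") is strictly harder than its unranked counterpart, which matches the intuition behind offering both variations.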



Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Saleem Ahmed.

Editor information

Editors and Affiliations


Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Ahmed, S., Jawade, B., Pandey, S., Setlur, S., Govindaraju, V. (2023). RealCQA: Scientific Chart Question Answering as a Test-Bed for First-Order Logic. In: Fink, G.A., Jain, R., Kise, K., Zanibbi, R. (eds) Document Analysis and Recognition - ICDAR 2023. ICDAR 2023. Lecture Notes in Computer Science, vol 14189. Springer, Cham. https://doi.org/10.1007/978-3-031-41682-8_5

Download citation

  • DOI: https://doi.org/10.1007/978-3-031-41682-8_5

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-41681-1

  • Online ISBN: 978-3-031-41682-8

  • eBook Packages: Computer Science, Computer Science (R0)
