Weakly Supervised Information Extraction from Inscrutable Handwritten Document Images

Paul, Sujoy; Madan, Gagan; Mishra, Akankshya; Hegde, Narayan; Kumar, Pradeep; Aggarwal, Gaurav

doi:10.1007/978-3-031-41685-9_28


Part of the book series: Lecture Notes in Computer Science (LNCS, volume 14190)

Included in the following conference series:

International Conference on Document Analysis and Recognition


Abstract

State-of-the-art information extraction methods are limited by OCR errors. They work well for printed text in form-like documents, but unstructured, handwritten documents remain a challenge. Adapting existing models to domain-specific training data is expensive for two reasons: (1) domain-specific documents (such as handwritten prescriptions and lab notes) are in limited supply, and (2) annotation is harder still, because decoding inscrutable handwritten document images requires domain-specific knowledge. In this work, we focus on the complex problem of extracting medicine names from handwritten prescriptions using only weakly labeled data. The data consists of images, each paired with the list of medicine names it contains but not their locations in the image. We solve the problem by first identifying the regions of interest, i.e., medicine lines, from just the weak labels, and then injecting a domain-specific medicine language model learned using only synthetically generated data. Compared to off-the-shelf state-of-the-art methods, our approach performs \(>2.5\times\) better at extracting medicine names from prescriptions.
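
As a rough illustration of the weak-label setting described above (not the authors' method), the sketch below assumes a hypothetical list of OCR'd text lines for one prescription plus the image-level list of medicine names, and marks a line as a candidate medicine line when one of its tokens fuzzily matches a label. The function names, the 0.7 threshold, and the use of Python's difflib similarity are all assumptions made for illustration.

```python
from difflib import SequenceMatcher


def best_similarity(line: str, name: str) -> float:
    """Highest character-level similarity between `name` and any token of `line`."""
    name = name.lower()
    tokens = line.lower().split()
    return max((SequenceMatcher(None, tok, name).ratio() for tok in tokens), default=0.0)


def pseudo_label_lines(ocr_lines, medicine_names, threshold=0.7):
    """Return (line, matched_name) pairs for lines that likely contain a medicine.

    Hypothetical helper: the weak labels say which medicines appear in the
    image but not where, so every line is compared against every label.
    The threshold is an assumed value, not one taken from the paper.
    """
    labeled = []
    for line in ocr_lines:
        scored = [(best_similarity(line, name), name) for name in medicine_names]
        score, name = max(scored, default=(0.0, None))
        if score >= threshold:
            labeled.append((line, name))
    return labeled
```

Calling `pseudo_label_lines` with noisy OCR lines and the weak label list returns only the lines that plausibly mention a listed medicine. Such pseudo-labels could then serve as line-level training targets, although the paper may localize medicine lines quite differently.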



Acknowledgement

We thank Srujana Merugu, Ansh Khurana, Manish Gupta, Harsh Dhand and Shruti Garg for all the support and discussions during the course of this project. Without their effort, this project would not have been possible.

Author information

Authors and Affiliations

Google Research, Mountain View, USA
Sujoy Paul, Gagan Madan, Akankshya Mishra, Narayan Hegde, Pradeep Kumar & Gaurav Aggarwal


Corresponding author

Correspondence to Sujoy Paul.

Editor information

Editors and Affiliations

TU Dortmund University, Dortmund, Germany
Gernot A. Fink
Adobe, College Park, MN, USA
Rajiv Jain
Osaka Metropolitan University, Osaka, Japan
Koichi Kise
Rochester Institute of Technology, Rochester, NY, USA
Richard Zanibbi


Cite this paper

Paul, S., Madan, G., Mishra, A., Hegde, N., Kumar, P., Aggarwal, G. (2023). Weakly Supervised Information Extraction from Inscrutable Handwritten Document Images. In: Fink, G.A., Jain, R., Kise, K., Zanibbi, R. (eds) Document Analysis and Recognition - ICDAR 2023. ICDAR 2023. Lecture Notes in Computer Science, vol 14190. Springer, Cham. https://doi.org/10.1007/978-3-031-41685-9_28


DOI: https://doi.org/10.1007/978-3-031-41685-9_28
Published: 19 August 2023
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-41684-2
Online ISBN: 978-3-031-41685-9
eBook Packages: Computer Science, Computer Science (R0)
