Abstract
State-of-the-art information extraction methods are limited by OCR errors. They work well for printed text in form-like documents, but unstructured, handwritten documents still remain a challenge. Adapting existing models to domain-specific training data is quite expensive, because of two factors, 1) limited availability of the domain-specific documents (such as handwritten prescriptions, lab notes, etc.), and 2) annotations become even more challenging as one needs domain-specific knowledge to decode inscrutable handwritten document images. In this work, we focus on the complex problem of extracting medicine names from handwritten prescriptions using only weakly labeled data. The data consists of images along with the list of medicine names in it, but not their location in the image. We solve the problem by first identifying the regions of interest, i.e., medicine lines from just weak labels and then injecting a domain-specific medicine language model learned using only synthetically generated data. Compared to off-the-shelf state-of-the-art methods, our approach performs \(>2.5\times \) better in medicine names extraction from prescriptions.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
References
Achkar, R., Ghayad, K., Haidar, R., Saleh, S., Al Hajj, R.: Medical handwritten prescription recognition using CRNN. In: CITS. IEEE (2019)
Araslanov, N., Roth, S.: Self-supervised augmentation consistency for adapting semantic segmentation. In: CVPR (2021)
Bhunia, A.K., Sain, A., Chowdhury, P.N., Song, Y.Z.: Text is text, no matter what: unifying text recognition using knowledge distillation. In: ICCV (2021)
Bissacco, A., Cummins, M., Netzer, Y., Neven, H.: PhotoOCR: reading text in uncontrolled conditions. In: ICCV, pp. 785–792 (2013)
Breuel, T.M., Ul-Hasan, A., Al-Azawi, M.A., Shafait, F.: High-performance OCR for printed English and Fraktur using LSTM networks. In: ICDAR. IEEE (2013)
Bukhari, S.S., Kadi, A., Jouneh, M.A., Mir, F.M., Dengel, A.: anyOCR: an open-source OCR system for historical archives. In: ICDAR (2017)
Cascante-Bonilla, P., Tan, F., Qi, Y., Ordonez, V.: Curriculum labeling: revisiting pseudo-labeling for semi-supervised learning. In: AAAI (2021)
Chen, L.C., Papandreou, G., Kokkinos, I., Murphy, K., Yuille, A.L.: DeepLab: semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFS. IEEE Trans. Pattern Anal. Mach. Intell. 40, 834–848 (2017)
Cheng, B., et al.: Panoptic-DeepLab: a simple, strong, and fast baseline for bottom-up panoptic segmentation. In: CVPR, pp. 12475–12485 (2020)
Diaz, D.H., Qin, S., Ingle, R., Fujii, Y., Bissacco, A.: Rethinking text line recognition models. arXiv preprint arXiv:2104.07787 (2021)
D’hondt, E., Grouin, C., Grau, B.: Generating a training corpus for OCR post-correction using encoder-decoder model. In: IJCNLP (2017)
Fujii, Y., Driesen, K., Baccash, J., Hurst, A., Popat, A.C.: Sequence-to-label script identification for multilingual OCR. In: ICDAR. IEEE (2017)
Graves, A., Fernández, S., Gomez, F., Schmidhuber, J.: Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks. In: Proceedings of the 23rd International Conference on Machine Learning, pp. 369–376 (2006)
Gupta, H., Del Corro, L., Broscheit, S., Hoffart, J., Brenner, E.: Unsupervised multi-view post-OCR error correction with language models. In: EMNLP, pp. 8647–8652 (2021)
Gupta, M., Soeny, K.: Algorithms for rapid digitalization of prescriptions. Visual Inform. 5, 54–69 (2021)
He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPr, pp. 770–778 (2016)
Huang, J., et al.: A multiplexed network for end-to-end, multilingual OCR. In: CVPR (2021)
Ingle, R.R., Fujii, Y., Deselaers, T., Baccash, J., Popat, A.C.: A scalable handwritten text recognition system. In: ICDAR (2019)
Jayakumar, P.: Online doctor consultation market to grow (2021). https://www.businesstoday.in/lifestyle/health/story/online-doctor-consultation-market-to-grow-72-to-836-million-by-march-2024-study-304689-2021-08-19
Karatzas, D., et al.: ICDAR 2015 competition on robust reading. In: ICDAR. IEEE (2015)
Karthikeyan, S., de Herrera, A.G.S., Doctor, F., Mirza, A.: An OCR post-correction approach using deep learning for processing medical reports. IEEE Trans. Circuits Syst. Video Technol. 32, 2574–2581 (2021)
Khoreva, A., Benenson, R., Hosang, J., Hein, M., Schiele, B.: Simple does it: weakly supervised instance and semantic segmentation. In: CVPR (2017)
Kittenplon, Y., Lavi, I., Fogel, S., Bar, Y., Manmatha, R., Perona, P.: Towards weakly-supervised text spotting using a multi-task transformer. In: CVPR (2022)
Lee, J., et al.: BioBERT: a pre-trained biomedical language representation model for biomedical text mining. Bioinformatics 36, 1234–1240 (2019)
Li, D., Huang, J.B., Li, Y., Wang, S., Yang, M.H.: Weakly supervised object localization with progressive domain adaptation. In: CVPR (2016)
Li, M., et al.: TrOCR: transformer-based optical character recognition with pre-trained models. arXiv preprint arXiv:2109.10282 (2021)
Litman, R., Anschel, O., Tsiper, S., Litman, R., Mazor, S., Manmatha, R.: Scatter: selective context attentional scene text recognizer. In: CVPR (2020)
Liu, H., Wang, J., Long, M.: Cycle self-training for domain adaptation. Adv. Neural Inf. Process. Syst. 34, 22968–22981 (2021)
Long, S., He, X., Yao, C.: Scene text detection and recognition: the deep learning era. Int. J. Comput. Vision 129, 161–184 (2021)
Long, S., Qin, S., Panteleev, D., Bissacco, A., Fujii, Y., Raptis, M.: Towards end-to-end unified scene text detection and layout analysis. In: CVPR (2022)
Marti, U.V., Bunke, H.: The IAM-database: an English sentence database for offline handwriting recognition. Int. J. Doc. Anal. Recogn. 5, 39–46 (2002)
Paul, Sujoy, Roy, Sourya, Roy-Chowdhury, Amit K..: W-TALC: weakly-supervised temporal activity localization and classification. In: Ferrari, Vittorio, Hebert, Martial, Sminchisescu, Cristian, Weiss, Yair (eds.) ECCV 2018. LNCS, vol. 11208, pp. 588–607. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-01225-0_35
Pragnadyuti, M., Rabindranath, D., Suhrita, P., Kumar, S.A., Kumar, J.S.: Legibility assessment of handwritten OPD prescriptions of a tertiary care medical college and hospital in Eastern India. SJMPS (2017)
Rani, S., Rehman, A.U., Yousaf, B., Rauf, H.T., Nasr, E.A., Kadry, S.: Recognition of handwritten medical prescription using signature verification techniques. Comput Math Methods Med. (2022)
Rasmy, L., Xiang, Y., Xie, Z., Tao, C., Zhi, D.: Med-BERT: pretrained contextualized embeddings on large-scale structured electronic health records for disease prediction. Nature 4, 86 (2021)
Ratner, A., Bach, S.H., Ehrenberg, H., Fries, J., Wu, S., Ré, C.: Snorkel: rapid training data creation with weak supervision. In: VLDB. NIH Public Access (2017)
Redmon, J., Divvala, S., Girshick, R., Farhadi, A.: You only look once: unified, real-time object detection. In: CVPR (2016)
Ren, S., He, K., Girshick, R., Sun, J.: Faster R-CNN: towards real-time object detection with region proposal networks. Adv. Neural Inf. Process. Syst. 28 (2015)
Sandler, M., Howard, A., Zhu, M., Zhmoginov, A., Chen, L.C.: MobileNetV2: inverted residuals and linear bottlenecks. In: CVPR (2018)
Sohn, K., et al.: FixMatch: simplifying semi-supervised learning with consistency and confidence. Adv. Neural Inf. Process. Syst. 33, 596–608 (2020)
Thompson, P., McNaught, J., Ananiadou, S.: Customised OCR correction for historical medical text. In: 2015 digital heritage. IEEE (2015)
Vaswani, A., et al.: Attention is all you need. Adv. Neural Inf. Process. Syst. 30 (2017)
Wang, P., Li, H., Shen, C.: Towards end-to-end text spotting in natural scenes. IEEE Trans. Pattern Anal. Mach. Intell. 44, 7266–7281 (2021)
Wei, Y., et al.: STC: a simple to complex framework for weakly-supervised semantic segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 39, 2314–2320 (2016)
Yang, X., et al.: GatorTron: a large clinical language model to unlock patient information from unstructured electronic health records. arXiv preprint arXiv:2203.03540 (2022)
Zhang, C., Cao, M., Yang, D., Chen, J., Zou, Y.: CoLa: weakly-supervised temporal action localization with snippet contrastive learning. In: CVPR (2021)
Zhang, D., Han, J., Cheng, G., Yang, M.H.: Weakly supervised object localization and detection: a survey. IEEE Trans. Pattern Anal. Mach. Intell. 44, 5866–5885 (2021)
Acknowledgement
We thank Srujana Merugu, Ansh Khurana, Manish Gupta, Harsh Dhand and Shruti Garg for all the support and discussions during the course of this project. Without their effort, this project would not have been possible.
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Paul, S., Madan, G., Mishra, A., Hegde, N., Kumar, P., Aggarwal, G. (2023). Weakly Supervised Information Extraction from Inscrutable Handwritten Document Images. In: Fink, G.A., Jain, R., Kise, K., Zanibbi, R. (eds) Document Analysis and Recognition - ICDAR 2023. ICDAR 2023. Lecture Notes in Computer Science, vol 14190. Springer, Cham. https://doi.org/10.1007/978-3-031-41685-9_28
Download citation
DOI: https://doi.org/10.1007/978-3-031-41685-9_28
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-41684-2
Online ISBN: 978-3-031-41685-9
eBook Packages: Computer ScienceComputer Science (R0)