Weakly Supervised Information Extraction from Inscrutable Handwritten Document Images

  • Conference paper
Document Analysis and Recognition - ICDAR 2023 (ICDAR 2023)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 14190)

Abstract

State-of-the-art information extraction methods are limited by OCR errors. They work well for printed text in form-like documents, but unstructured, handwritten documents remain a challenge. Adapting existing models with domain-specific training data is expensive for two reasons: 1) domain-specific documents (such as handwritten prescriptions and lab notes) are in limited supply, and 2) annotation is even harder, because decoding inscrutable handwritten document images requires domain-specific knowledge. In this work, we focus on the complex problem of extracting medicine names from handwritten prescriptions using only weakly labeled data. The data consists of images, each paired with the list of medicine names it contains but not their locations in the image. We solve the problem by first identifying the regions of interest, i.e., the medicine lines, from just the weak labels, and then injecting a domain-specific medicine language model learned using only synthetically generated data. Compared to off-the-shelf state-of-the-art methods, our approach performs \(>2.5\times\) better at extracting medicine names from prescriptions.
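To make the weak-label setting concrete, the following is a minimal sketch of one way image-level medicine lists could be turned into line-level pseudo-labels. It is not the method of the paper, whose details are not given in the abstract. It assumes that a generic OCR system has already produced noisy text for each line of a prescription, and it uses fuzzy string matching from the Python standard library. The function names, the example strings, and the threshold of 0.6 are all hypothetical.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Case-insensitive similarity ratio in [0, 1] between two strings."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def best_window_score(line: str, name: str) -> float:
    """Best similarity between `name` and any run of consecutive words
    in `line` that has as many words as `name`."""
    words = line.split()
    n = max(1, len(name.split()))
    if len(words) <= n:
        return similarity(line, name)
    return max(
        similarity(" ".join(words[i:i + n]), name)
        for i in range(len(words) - n + 1)
    )


def pseudo_label_lines(ocr_lines, medicine_names, threshold=0.6):
    """Return (line_index, matched_name) pairs for lines that likely
    contain one of the weakly labeled medicine names.

    `threshold` is an illustrative value, not one taken from the paper.
    """
    labels = []
    for idx, line in enumerate(ocr_lines):
        scored = [(best_window_score(line, name), name) for name in medicine_names]
        score, name = max(scored, default=(0.0, None))
        if name is not None and score >= threshold:
            labels.append((idx, name))
    return labels


# Hypothetical noisy OCR output for one prescription image.
ocr_lines = [
    "Dr. A. Sharma, MBBS",
    "Paracetmol 500mg 1-0-1",
    "Amoxcilin 250 mg twice daily",
    "Review after 5 days",
]
# Weak label: the medicine names in the image, without their locations.
medicine_names = ["Paracetamol", "Amoxicillin"]

print(pseudo_label_lines(ocr_lines, medicine_names))
```

Lines whose best match clears the threshold can serve as candidate medicine lines for training a line detector or recognizer. In practice the threshold trades off missed medicine lines against false matches on header and instruction text.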

Acknowledgement

We thank Srujana Merugu, Ansh Khurana, Manish Gupta, Harsh Dhand and Shruti Garg for all the support and discussions during the course of this project. Without their effort, this project would not have been possible.

Author information

Corresponding author

Correspondence to Sujoy Paul.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Paul, S., Madan, G., Mishra, A., Hegde, N., Kumar, P., Aggarwal, G. (2023). Weakly Supervised Information Extraction from Inscrutable Handwritten Document Images. In: Fink, G.A., Jain, R., Kise, K., Zanibbi, R. (eds) Document Analysis and Recognition - ICDAR 2023. ICDAR 2023. Lecture Notes in Computer Science, vol 14190. Springer, Cham. https://doi.org/10.1007/978-3-031-41685-9_28

  • DOI: https://doi.org/10.1007/978-3-031-41685-9_28

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-41684-2

  • Online ISBN: 978-3-031-41685-9

  • eBook Packages: Computer Science, Computer Science (R0)
