
Drop flow method: an iterative algorithm for complete segmentation of Devanagari ancient manuscripts

Multimedia Tools and Applications

Abstract

Character segmentation is one of the major challenges in recognizing ancient manuscripts. Features typical of ancient documents, such as thick strokes and overlapping or touching characters, make the task very difficult. Devanagari ancient manuscripts consist of vowels, consonants, modifiers, conjuncts and compound characters, and existing techniques have trouble segmenting characters that overlap or touch. This paper presents an iterative character segmentation algorithm for ancient documents in Devanagari script. First, lines are extracted by dividing the document image into vertical stripes and applying piecewise horizontal projection profiles. Next, the lines are segmented into words using vertical projection profiles. Finally, the words are segmented into characters using an iterative algorithm that refines the character segmentation in each iteration. We propose a new algorithm, the ‘Drop Flow Method’, to find the segmentation path between touching components. The proposed algorithm can segment touching characters, and it achieves 96.0% accuracy for complete segmentation of Devanagari ancient manuscripts.
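The line and word stages described in the abstract rest on projection profiles, which can be illustrated with a minimal sketch. The code below is not the authors' implementation and does not attempt the Drop Flow Method itself, whose details the abstract does not give. It assumes a binarized image stored as a NumPy array with ink pixels equal to 1, and the helper names (`runs_of_ink`, `segment_lines`, `segment_words`, `stripe_profiles`) and the `min_gap` parameter are hypothetical, chosen for illustration only.

```python
import numpy as np

def runs_of_ink(profile, min_gap=1):
    """Return (start, end) pairs, end exclusive, of runs where profile > 0.
    Runs separated by fewer than min_gap empty positions are merged."""
    runs, start = [], None
    for i, v in enumerate(profile):
        if v > 0 and start is None:
            start = i                      # a run of ink begins
        elif v == 0 and start is not None:
            runs.append((start, i))        # the run ended at the previous index
            start = None
    if start is not None:
        runs.append((start, len(profile)))
    merged = []
    for s, e in runs:
        if merged and s - merged[-1][1] < min_gap:
            merged[-1] = (merged[-1][0], e)  # gap too small: join with previous run
        else:
            merged.append((s, e))
    return merged

def segment_lines(binary):
    """Horizontal projection profile: ink count per row, split at empty rows."""
    return runs_of_ink(binary.sum(axis=1))

def segment_words(line_img, min_gap=2):
    """Vertical projection profile: ink count per column, split at wide gaps."""
    return runs_of_ink(line_img.sum(axis=0), min_gap)

def stripe_profiles(binary, n_stripes):
    """Piecewise variant: one horizontal profile per vertical stripe."""
    return [s.sum(axis=1) for s in np.array_split(binary, n_stripes, axis=1)]

# Toy page (assumed data): two text lines, the first holding two "words".
page = np.zeros((7, 10), dtype=int)
page[1:3, 0:3] = 1
page[1:3, 6:9] = 1
page[4:6, 2:8] = 1

lines = segment_lines(page)
words = segment_words(page[lines[0][0]:lines[0][1]])
profiles = stripe_profiles(page, 2)
```

Computing one profile per stripe, rather than one for the whole page, is what lets a piecewise approach tolerate skewed or curved lines, since each stripe is narrow enough for its text to be roughly horizontal. How the per-stripe boundaries are then joined into full lines is not specified in the abstract and is left out of this sketch.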

Author information

Corresponding author

Correspondence to Munish Kumar.

About this article

Cite this article

Narang, S.R., Jindal, M.K. & Kumar, M. Drop flow method: an iterative algorithm for complete segmentation of Devanagari ancient manuscripts. Multimed Tools Appl 78, 23255–23280 (2019). https://doi.org/10.1007/s11042-019-7620-6

  • DOI: https://doi.org/10.1007/s11042-019-7620-6
