Abstract
Character segmentation is one of the major challenges in recognizing ancient manuscripts. Distinctive features of ancient documents, such as thick characters and overlapping or touching characters, make it a very difficult task. Devanagari ancient manuscripts consist of vowels, consonants, modifiers, conjuncts and compound characters, and existing techniques have difficulty segmenting overlapping and touching characters. This paper presents an iterative character segmentation algorithm for ancient documents in Devanagari script. First, lines are extracted by dividing the document image into vertical stripes and applying piecewise horizontal projection profiles. The lines are then segmented into words using vertical projection profiles, and finally the words are segmented into characters using an iterative algorithm that refines the character segmentation in each iteration. We propose a new algorithm, named the ‘Drop Flow Method’, to find the segmentation path between touching components. The proposed algorithm can segment touching characters, and an accuracy of 96.0% has been achieved for complete segmentation of Devanagari ancient manuscripts.
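The projection-profile steps named in the abstract can be illustrated with a minimal sketch. It assumes a binarized image stored as a NumPy array with ink pixels equal to 1, and it uses a single whole-page horizontal profile rather than the piecewise, stripe-based profiles the paper describes. The function names (`profile_runs`, `segment_lines`, `segment_words`) and the `min_gap` parameter are hypothetical and chosen for illustration only. The Drop Flow Method itself is not reproduced here, because the abstract does not specify its details.

```python
import numpy as np


def profile_runs(profile, min_gap=1):
    """Return (start, end) pairs of consecutive indices where profile > 0.

    Runs separated by fewer than min_gap empty positions are merged.
    The end index is exclusive.
    """
    runs = []
    start = None
    for i, value in enumerate(profile):
        has_ink = value > 0
        if has_ink and start is None:
            start = i                      # a new run of ink begins
        elif not has_ink and start is not None:
            runs.append((start, i))        # the current run ends
            start = None
    if start is not None:
        runs.append((start, len(profile)))

    merged = []
    for s, e in runs:
        if merged and s - merged[-1][1] < min_gap:
            merged[-1] = (merged[-1][0], e)  # gap too small: join runs
        else:
            merged.append((s, e))
    return merged


def segment_lines(binary):
    """Split a page into text lines using the horizontal projection profile."""
    return profile_runs(binary.sum(axis=1))


def segment_words(line_img, min_gap=3):
    """Split one text line into words using the vertical projection profile."""
    return profile_runs(line_img.sum(axis=0), min_gap=min_gap)


if __name__ == "__main__":
    # Toy page (assumed data): two text lines, the first holding two words.
    page = np.zeros((7, 12), dtype=np.uint8)
    page[1:3, 0:4] = 1
    page[1:3, 8:11] = 1
    page[4:6, 2:9] = 1
    for top, bottom in segment_lines(page):
        print((top, bottom), segment_words(page[top:bottom]))
```

Row and column sums of zero mark the white gaps between lines and between words. Touching or overlapping characters leave no such gap, which is the case the paper's iterative character-level algorithm targets.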
Cite this article
Narang, S.R., Jindal, M.K. & Kumar, M. Drop flow method: an iterative algorithm for complete segmentation of Devanagari ancient manuscripts. Multimed Tools Appl 78, 23255–23280 (2019). https://doi.org/10.1007/s11042-019-7620-6
DOI: https://doi.org/10.1007/s11042-019-7620-6