Table 2 Database figures

From: MDIW-13: a New Multi-Lingual and Multi-Script Database and Benchmark for Script Identification

Script            Abbrev            Handwritten              Printed
                             Docs  Lines  Words   Docs  Lines  Words
Arabic/Per        Arab         48    621   3940     51   1082   6202
Bengali           Ban          67   1486   9320     51    466   2557
Gujarati          Guj           3     41    181     32    384   2211
Gurmukhi/Punjabi  Gurm          6    111    700    115   1062   9104
Devanagari        Hind         21    230   1457     47    397   2782
Japanese          Jap          20    121    441     80    559   1814
Kannada           Kan          15    377   1995     53    582   2157
Malayalam         Mal          12    211    719     70    706   4320
Oriya             Ori          50   1136   7847     42    548   2309
Roman             Rom          90    750   4308     56    961   7627
Tamil             Tam          14    276   1430     46    301   2118
Telugu            Tel          10    154    801     49    483   2126
Thai              Tha          26    473   4472     61    461   3717
Total:                        382   5987  37611    753   7992  49044

Docs: number of documents; Lines: number of lines; Words: number of words.
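
The Total row can be checked by summing each column over the 13 scripts. The following is a minimal sketch of that arithmetic, not part of the original article; the names `COUNTS`, `REPORTED_TOTALS`, and `column_totals` are hypothetical and chosen only for illustration, and the figures are transcribed from Table 2.

```python
# Per-script counts transcribed from Table 2, keyed by the table's abbreviation.
# Tuple order: handwritten (docs, lines, words), then printed (docs, lines, words).
COUNTS = {
    "Arab": (48, 621, 3940, 51, 1082, 6202),
    "Ban": (67, 1486, 9320, 51, 466, 2557),
    "Guj": (3, 41, 181, 32, 384, 2211),
    "Gurm": (6, 111, 700, 115, 1062, 9104),
    "Hind": (21, 230, 1457, 47, 397, 2782),
    "Jap": (20, 121, 441, 80, 559, 1814),
    "Kan": (15, 377, 1995, 53, 582, 2157),
    "Mal": (12, 211, 719, 70, 706, 4320),
    "Ori": (50, 1136, 7847, 42, 548, 2309),
    "Rom": (90, 750, 4308, 56, 961, 7627),
    "Tam": (14, 276, 1430, 46, 301, 2118),
    "Tel": (10, 154, 801, 49, 483, 2126),
    "Tha": (26, 473, 4472, 61, 461, 3717),
}

# Totals as reported in the last row of Table 2.
REPORTED_TOTALS = (382, 5987, 37611, 753, 7992, 49044)


def column_totals(counts):
    """Sum each of the six columns across all scripts."""
    return tuple(sum(column) for column in zip(*counts.values()))


if __name__ == "__main__":
    totals = column_totals(COUNTS)
    print("computed:", totals)
    print("matches reported totals:", totals == REPORTED_TOTALS)
```

Keeping the rows as plain tuples in column order lets `zip(*...)` transpose the table, so one `sum` per column is enough for the check.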