
28,250 Hits in 6.5 sec

Information Extraction from Visually Rich Documents with Font Style Embeddings [article]

Ismail Oussaid, William Vanhuffel, Pirashanth Ratnamogan, Mhamed Hajaiej, Alexis Mathey, Thomas Gilles
2022 arXiv   pre-print
Information extraction (IE) from documents is an intensive area of research with a large set of industrial applications.  ...  Our experiments on three real-world complex datasets demonstrate that using an embedding based on token style attributes instead of a raw visual embedding in the LayoutLM model is beneficial.  ...  In the task of information extraction, one needs to get textual information from the document.  ... 
arXiv:2111.04045v2 fatcat:visu7gjpyravzkp4kwtq26t7uy

Extraction of Logical Structure from Articles in Mathematics [chapter]

Koji Nakagawa, Akihiro Nomura, Masakazu Suzuki
2004 Lecture Notes in Computer Science  
Then the meta-information (e.g. title, author) and the logical structure (e.g. section, theorem) of the documents are automatically extracted.  ...  The purpose of this paper is to show the extraction method of logical structure specialized for mathematical documents.  ...  Conclusion A method of extracting meta-information and logical structure from mathematical documents was presented and implemented on the basis of the INFTY system.  ... 
doi:10.1007/978-3-540-27818-4_20 fatcat:bljntj4l5nb6bn3bhkv2pcvfqu

Information Retrieval System for Handwritten Documents [chapter]

Sargur Srihari, Anantharaman Ganesh, Catalin Tomai, Yong-Chul Shin, Chen Huang
2004 Lecture Notes in Computer Science  
Several types of queries are permitted: (i) entire document image; (ii) a region of interest (ROI) of a document; (iii) a word image; and (iv) textual.  ...  The design and performance of a content-based information retrieval system for handwritten documents is described.  ...  Acknowledgments CEDAR-FOX is the result of the effort of many students and researchers at CEDAR.  ... 
doi:10.1007/978-3-540-28640-0_28 fatcat:dih37sul6nf5tm5irpkqehlnlm


2010 Issues in Information Systems  
semantics of image information.  ...  Therefore, with the rapid development of internet technology, the number of internet users and the amount of web image information on the internet are ever increasing.  ...  Feature vectors are extracted by image preprocessing, and meta information, such as keywords, semantic information, and visual information, is manually or automatically added.  ... 
doi:10.48009/1_iis_2010_483-490 fatcat:j7vx6kcr35cytf7cj6m4wcmolq


Andruid Kerne, Yin Qu, Andrew M. Webb, Sashikanth Damaraju, Nic Lupfer, Abhinav Mathur
2010 Proceedings of the 19th ACM international conference on Information and knowledge management - CIKM '10  
We introduce meta-metadata, a language and software architecture addressing a metadata semantics lifecycle: (1) data structures for representation of metadata in programs; (2) metadata extraction from  ...  Collecting, organizing, and thinking about diverse information resources is the keystone of meaningful digital information experiences, from research to education to leisure.  ...  Future parsers will handle other types of documents, such as images, and various formats of audio and video.  ... 
doi:10.1145/1871437.1871580 dblp:conf/cikm/KerneQWDLM10 fatcat:tamcv6sjzvfurpa7d2axu46p4a

Knowledge Modelling for Deductive Web Mining [chapter]

Vojtěch Svátek, Martin Labský, Miroslav Vacura
2004 Lecture Notes in Computer Science  
We developed a multi-dimensional, ontology-based framework, and a collection of problem-solving methods, which make it possible to characterise DWM applications at an abstract level.  ...  We show that the heterogeneity and unboundedness of the web demand some modifications of the problem-solving method paradigm used in the context of traditional artificial intelligence.  ...  Acknowledgements The research is partially supported by grant no. 201/03/1318 of the Czech Science Foundation.  ... 
doi:10.1007/978-3-540-30202-5_23 fatcat:edk55kn3tjahjkzguj2udsnyau

Document Image Processing for Hospital Information Systems [chapter]

Hiroharu Kawanaka, Koji Yamamoto, Haruhiko Takase, Shinji Tsuruoka
2012 Modern Information Systems  
By converting the data to such files, the readability of the data is guaranteed, and the metadata of the documents, e.g. timestamp, patient ID, and document type, are used as key information for search  ...  Our method can address this problem, as much of this metadata is automatically extracted from the images, which would contribute to improving DACS.  ...  .), ISBN: 978-953-51-0647-0, InTech, Available from: http://www.intechopen.com/books/modern-informationsystems/document-image-processing-for-hospital-information-systems  ... 
doi:10.5772/37400 fatcat:adqfd6tmefeqlon7a7yccqx7rm

Content-level Annotation of Large Collection of Printed Document Images

A. Kumar, C.V. Jawahar
2007 Proceedings of the International Conference on Document Analysis and Recognition  
In this paper, we propose an efficient hierarchical approach for annotating a large collection of printed document images. We align document images with independently keyed-in text.  ...  We employ an XML representation for storing the annotation information.  ...  The authors also thank members of the consortium working on the development of Indian language OCRs for their inputs and cooperation.  ... 
doi:10.1109/icdar.2007.4377025 dblp:conf/icdar/KumarJ07 fatcat:pl74yrflzjd2jm7jxltwjzkleq

Parallel processing considerations for image recognition tasks

Steven J. Simske, John D. Owens, I-Jong Lin, Yu-Jin Zhang, Giordano B. Beretta
2011 Parallel Processing for Imaging Applications  
This type of image analysis is readily addressed by a map-reduce approach. Examples include document skew detection and multiple face detection and tracking.  ...  However, there are three less trivial categories of parallel processing that will be considered in this paper: parallel processing (1) by task; (2) by image region; and (3) by meta-algorithm.  ...  This type of document classification is roughly equivalent to template matching. I term this document classification by style.  ... 
doi:10.1117/12.879645 dblp:conf/ppia/Simske11 fatcat:oad7gfwshfhz5pwh3ghwpq5d3a

Embedding Metadata and Other Semantics in Word Processing Documents

Peter Sefton, Ian Barnes, Ron Ward, Jim Downing
2009 International Journal of Digital Curation  
This paper describes a technique for embedding document metadata, and potentially other semantic references, inline in word processing documents, which the authors have implemented with the help of a software  ...  Several assumptions underlie the approach: it must be available across computing platforms and work with both Microsoft Word (because of its user base) and OpenOffice.org (because of its free availability  ...  Thanks to Peter Murray-Rust and Joe Townsend at the Unilever Centre for Molecular Science Informatics for their input into this document and to Linda Octalina at the University of Southern Queensland  ... 
doi:10.2218/ijdc.v4i2.96 fatcat:nxnki5nl5bazvdn57pn6vauypm


2004 International journal of pattern recognition and artificial intelligence  
Previous research efforts on optical font recognition have mostly limited applications since they deal with only a few types of font attributes and estimate them from a line or block of text.  ...  At the word-level, it has the advantages of obtaining more detailed font attributes including the following: script (Korean and English), font style (regular, bold, italic, and underlined), typeface (Myung-jo  ...  Acknowledgment This work was supported by grant number R05-2003-000-10396-0 from the Program for Regional Scientists of the Korea Science and Engineering Foundation (KOSEF).  ... 
doi:10.1142/s0218001404003307 fatcat:pnafajmfnnbszdwtqm3r54oude

Citation Data-set for Machine Learning Citation Styles and Entity Extraction from Citation Strings [article]

Niall Martin Ryan
2018 arXiv   pre-print
Meticulous extraction is further needed when evaluating the similarity of documents and calculating their citation impact.  ...  This metadata can be difficult to acquire accurately due to the thousands of different styles and the noise that can be applied to a bibliography to create the citation string.  ...  In particular, the document's structure and layout impede metadata extraction. This information usually needs to be reverse engineered for it to be recovered.  ... 
arXiv:1805.04798v2 fatcat:dhbatuekzvf77nehg6juapkucu

Page 12 of The Information Management Journal Vol. 33, Issue 2 [page]

1999 The Information Management Journal  
Structure Based Mark-up Languages One of the major problems that information managers face with digital information is the need to identify the type of documents being maintained on the myriad of disks  ...  , which functions differently from the caption of a graphic or the tabular data extracted from a database table, etc.).  ... 

Web Content Adaptation System

May H. Riadh, Akram M. Othman
2011 International Journal of Computer Applications  
Table of Contents: The document is re-created based on the extracted content, and a list of headings is created.  ...  extract the content from a web page as shown below: HTML Parser: HTML represents a certain range of hypertext information; it is a simple markup language used to create hypertext documents that are platform  ...  The following conclusions were reached from the implementation of the proposed adaptation system. 7.  ... 
doi:10.5120/2978-3817 fatcat:dmq3jpwww5h2hojsljmzb5lsm4

Cross-Domain Document Object Detection: Benchmark Suite and Method [article]

Kai Li, Curtis Wigington, Chris Tensmeyer, Handong Zhao, Nikolaos Barmpalios, Vlad I. Morariu, Varun Manjunatha, Tong Sun, Yun Fu
2020 arXiv   pre-print
For each dataset, we provide the page images, bounding box annotations, PDF files, and the rendering layers extracted from the PDF files.  ...  We establish a benchmark suite consisting of different types of PDF document datasets that can be utilized for cross-domain DOD model training and evaluation.  ...  We thank Richard Cohn and Kana Sethu for coding the tool and instructing how to use it for synthesizing documents.  ... 
arXiv:2003.13197v1 fatcat:t46n3ompnvbc7p2vb2x2j35ebu
Showing results 1 to 15 out of 28,250 results