1,689 Hits in 7.6 sec

Discovery of Frequent Tag Tree Patterns in Semistructured Web Documents [chapter]

Tetsuhiro Miyahara, Yusuke Suzuki, Takayoshi Shoudai, Tomoyuki Uchida, Kenichi Takahashi, Hiroaki Ueda
2002 Lecture Notes in Computer Science  
We propose a new method for discovering frequent tree structured patterns in semistructured Web documents by using a tag tree pattern as a hypothesis.  ...  So a tag tree pattern is suited for representing tree structured patterns in such Web documents. First we show that it is hard to compute the optimum frequent tag tree pattern.  ...  This work is partly supported by Grant-in-Aid for Scientific Research (C) No.13680459 from Japan Society for the Promotion of Science and Grant for Special Academic Research No.1608 from Hiroshima City  ... 
doi:10.1007/3-540-47887-6_35 fatcat:uojoaags6vb6zdjnam74xmu3qu

Discovery of Maximally Frequent Tag Tree Patterns with Contractible Variables from Semistructured Documents [chapter]

Tetsuhiro Miyahara, Yusuke Suzuki, Takayoshi Shoudai, Tomoyuki Uchida, Kenichi Takahashi, Hiroaki Ueda
2004 Lecture Notes in Computer Science  
Many documents such as Web documents or XML files do not have rigid structures.  ...  In this paper, we have considered knowledge discovery from semistructured Web documents such as XML files.  ...  Thus, we have shown that a tag tree pattern and the algorithms are useful for knowledge discovery from semistructured Web documents.  ... 
doi:10.1007/978-3-540-24775-3_17 fatcat:pgt7bdgcffckjhim4a2nykjqqq

Discovering typical structures of documents

Ke Wang, Huiqing Liu
1998 Proceedings of the 21st annual international ACM SIGIR conference on Research and development in information retrieval - SIGIR '98  
The structure of a document refers to the role and hierarchy of subdocument references. Many online documents are similarly structured, though not identically structured.  ...  We study the problem of discovering "typical" structures of a collection of such documents, where the user specifies the minimum frequency of a typical structure.  ...  Discovering interests/access patterns. Detecting user's interests and browsing patterns on the Web can help organize Web pages and attract more businesses.  ... 
doi:10.1145/290941.290982 dblp:conf/sigir/WangL98 fatcat:gx4bjboclvgajapixzg26upgsu

Efficient Algorithms for Discovering Frequent and Maximal Substructures from Large Semistructured Data [chapter]

Hiroki Arimura
2010 Lecture Notes in Electrical Engineering  
In this paper, we review recent advances in efficient algorithms for semi-structured data mining, that is, discovery of rules and patterns from structured data such as sets, sequences, trees, and graphs  ...  After introducing basic definitions and problems, we present efficient algorithms for frequent and maximal pattern mining for classes of sets, sequences, and trees.  ...  The results presented in this talk are obtained in the joint works with Takeaki Uno, Shin-ichi Nakano, Shin-ichi Minato, Tatsuya Asai, Takashi Katoh, and Kouichi Hirata.  ... 
doi:10.1007/978-90-481-9794-1_66 fatcat:pdkbj5txsjauppaknm35ckv5zq

Data mining and the Web

Minos N. Garofalakis, Rajeev Rastogi, S. Seshadri, Kyuseok Shim
1999 Proceedings of the second international workshop on Web information and data management - WIDM '99  
The approach in [4] attempts to mine the relational schema for a set of semistructured documents using a mining algorithm that computes frequent tree patterns in the data.  ...  Mining the Structure of Web Documents Web pages are instances of semistructured data, and thus mining their structure is critical to extracting information from them.  ... 
doi:10.1145/319759.319781 dblp:conf/widm/GarofalakisRSS99 fatcat:zttixd2fajcsvbiibc43g563vq

Mining Is-Part-Of Association Patterns from Semistructured Data

Ke Wang, Huiqing Liu
2001 Knowledge Management and Intelligent Enterprises  
One example of semistructured data sources is the World Wide Web (WWW). In the semistructured world, the individual schema contained in each object has replaced the external schema of the data.  ...  We introduce the framework of is-part-of association patterns to address the issue. We show applications of mining is-part-of association patterns in several disparate domains.  ...  An is-part-of association pattern for such documents contains frequently co-occurred keywords grouped according to topics.  ... 
doi:10.1142/9789812838995_0012 fatcat:4pftzgygtvcchjdkggvf2lzmua

Discovering structural association of semistructured data

Ke Wang, Huiqing Liu
2000 IEEE Transactions on Knowledge and Data Engineering  
The discovery task is impacted by structural features of semistructured data in a non-trivial way and traditional data mining frameworks are inapplicable.  ...  Many semistructured objects are similarly, though not identically, structured. We study the problem of discovering "typical" substructures of a collection of semistructured objects.  ...  In a semistructured document, each subdocument reference is labeled by its role, and the "topic" of a document is represented by the tree-like structure of such roles rooted at the document.  ... 
doi:10.1109/69.846290 fatcat:b2xd4d43cbdh7ogfqm366fmibu

Fast mining of frequent tree structures by hashing and indexing

Dimitrios Katsaros, Alexandros Nanopoulos, Yannis Manolopoulos
2005 Information and Software Technology  
Hierarchical semistructured data arise frequently in the Web, or in biological information processing applications.  ...  In this paper, we study the problem of discovering frequently occurring structures in semistructured objects using the notion of association rules.  ...  Schema discovery for semistructured data Initial works on structural pattern discovery in collections of semistructured objects are described in Refs. [10, 22, 23, 30] .  ... 
doi:10.1016/j.infsof.2004.06.006 fatcat:gyiedp23p5dq3li5fy5wzi4kji

Interactive Query Formulation in Semistructured Databases [chapter]

Agathoniki Trigoni
2002 Lecture Notes in Computer Science  
IR-style searches do not exploit but a small part of the user's knowledge about the data (mainly keywords), and result in sets of documents that cannot be filtered using detailed criteria.  ...  A system is not user-friendly when it requires a detailed knowledge of the database structure, especially in the presence of large amounts of heterogeneous data.  ...  Our framework is equally relevant and could benefit from previous work on discovering frequent tree expressions in documents [28, 29] .  ... 
doi:10.1007/3-540-36109-x_28 fatcat:p5mf2zeasvdptdgojwyzoxbwey

Mining frequent rooted subtrees in XML data with Me-Tree

Wansong Zhang, Daxin Liu, Jianpei Zhang
2004 Proceedings of the 2004 IEEE Systems and Information Engineering Design Symposium, 2004.  
In this paper, we study a data mining problem of discovering frequent subtrees in a large collection of XML data, where both of the patterns and the data are modeled by labeled ordered trees.  ...  Hence, there have been increasing demands for efficient methods for discovering patterns in large collection of semistructured data.  ...  In Section 3, we present our algorithm for solving the frequent pattern discovery problem for labeled ordered trees using the techniques of rightmost-expansion and Me-tree.  ... 
doi:10.1109/sieds.2004.239908 fatcat:pxonwshm5zfvvd23m6um3mfrl4

Proceedings 2002 IEEE International Conference on Data Mining. ICDM 2002

2002 2002 IEEE International Conference on Data Mining 2002 Proceedings ICDM-02  
Sebag  ...  Computing Frequent Graph Patterns from Semistructured Data, N. Vanetik, E. Gudes, and S. E.  ...  Lam  ...  From Path Tree to Frequent Patterns: A Framework for Mining Frequent  ...  Mining Case Bases for Action Recommendation, Q.  ... 
doi:10.1109/icdm.2002.1183878 fatcat:3iufo7cncbbzbn7cwjme73wrpm

An approach for frequent access pattern identification in web usage mining

Murli Manohar Sharma, Anju Bala
2014 2014 International Conference on Advances in Computing, Communications and Informatics (ICACCI)  
In the existing approaches of finding the patterns, tree have been created which is based on the frequent access pattern identification. The creation of tree has increased the overhead of web usage.  ...  The web server logs provide important information. In the field of web mining the analysis of the web logs is done to identify the users' search patterns.  ...  suppressed and crucial information about the frequent-patterns named frequent-pattern tree (FP-tree).  ... 
doi:10.1109/icacci.2014.6968481 dblp:conf/icacci/SharmaB14 fatcat:26ix6zevanflbj7y4rbts6hnfq

Semi-structured Documents Mining: A Review and Comparison

Amina Madani, Omar Boussaid, Djamel Eddine Zegour
2013 Procedia Computer Science  
In this survey paper, we review popular semi-structured documents mining approaches (structure alone and both structure and content).  ...  The number of semi-structured documents that is produced is steadily increasing. Thus, it will be essential for discovering new knowledge from them.  ...  They present an efficient pattern mining algorithm called FREQT for discovering all frequent tree patterns from a large collection of labeled ordered trees.  ... 
doi:10.1016/j.procs.2013.09.110 fatcat:sjh47ru4ofcdtezxs6brnfiqta

Web data extraction based on structural similarity

Zhao Li, Wee Keong Ng, Aixin Sun
2005 Knowledge and Information Systems  
Document schemata are patterns of structures embedded in documents.  ...  In this paper, we demonstrate a holistic approach to Web data extraction. The principal component of our proposal is the notion of a document schema.  ...  We have also introduced an efficient algorithm to discover frequent structures to generate schema in Web documents.  ... 
doi:10.1007/s10115-004-0188-z fatcat:vyqafjj67re57bxiciehhxv2cq

Discovering interesting information with advances in web technology

Richi Nayak, Pierre Senellart, Fabian M. Suchanek, Aparna S. Varde
2013 SIGKDD Explorations  
The Web is a steadily evolving resource comprising much more than mere HTML pages. With its ever-growing data sources in a variety of formats, it provides great potential for knowledge discovery.  ...  Our goal is to show that all these areas can be as useful for knowledge discovery as the HTML-based part of the Web.  ...  Some examples of knowledge discovery from XML documents are mining frequent sub-tree/sub-graph patterns, grouping and classifying documents/schemas, mining XML queries for efficient processing and schema  ... 
doi:10.1145/2481244.2481255 fatcat:lvr2d5k3cre6lpnwnd2udp22pe
Showing results 1-15 out of 1,689 results