10,156 Hits in 3.8 sec

Large Margin Taxonomy Embedding for Document Categorization

Kilian Q. Weinberger, Olivier Chapelle
2008 Neural Information Processing Systems  
The optimization of the semantic space incorporates large margin constraints that ensure that for each instance the correct class prototype is closer than any other.  ...  Applications of multi-class classification, such as document categorization, often appear in cost-sensitive settings.  ...  Figure 2: The schematic layout of the large-margin embedding of the taxonomy and the documents. As a first step, we represent topic α as the vector e_α and document x_i as x_i = A x_i.  ... 
dblp:conf/nips/WeinbergerC08 fatcat:cc6jsbvb7rdujcipflqoccatzq

Hierarchical Taxonomy-Aware and Attentional Graph Capsule RCNNs for Large-Scale Multi-Label Text Classification

Hao Peng, Jianxin Li, Senzhang Wang, Lihong Wang, Qiran Gong, Renyu Yang, Bo Li, Philip Yu, Lifang He
2019 IEEE Transactions on Knowledge and Data Engineering  
To leverage the hierarchical relations among the class labels, we propose a hierarchical taxonomy embedding method to learn their representations, and define a novel weighted margin loss by incorporating  ...  In this paper, we propose a novel hierarchical taxonomy-aware and attentional graph capsule recurrent CNNs framework for large-scale multi-label text classification.  ...  weighted margin loss for large-scale multi-label text classification: Word Order Preserved Graph-of-Words for Document Modeling.  ... 
doi:10.1109/tkde.2019.2959991 fatcat:wi5guiyng5bqxeuwggro6ysgue

A taxonomy fuzzy filtering approach

S. Vrettos, A. Stafylopatis
2003 Journal of Automatic Control  
Our work proposes the use of topic taxonomies as part of a filtering language. Given a taxonomy, a classifier is trained for each one of its topics.  ...  (Topic1 AND Topic2) OR Topic3, in order to filter related documents in a stream.  ...  Text/Hypertext categorization promises not only to help maintain updated and large web taxonomies, but to be used in the context of content-based filtering [3]-[8].  ... 
doi:10.2298/jac0301026v fatcat:gnirpd4vd5btdl76cwxwl3bjxa

An Empirical Study of E-Commerce Website Success Model

Junjun Li, Jianjun Sun
2009 2009 International Conference on Management and Service Science  
In the Rakuten data challenge on taxonomy Classification for eCommerce-scale Product Catalogs, we propose an approach based on deep convolutional neural networks to predict product taxonomies using their  ...  ACKNOWLEDGMENTS The author would like to thank the organizer of SIGIR 2018 eCom Data Challenge (Rakuten Institute of Technology Boston (RIT-Boston)) for their support.  ...  For our cases with large amount of classes, we create a unique binary coding for each taxonomy.  ... 
doi:10.1109/icmss.2009.5302176 fatcat:oazunshqtzbora5syyfqxgyjee

TELEClass: Taxonomy Enrichment and LLM-Enhanced Hierarchical Text Classification with Minimal Supervision [article]

Yunyi Zhang, Ruozhen Yang, Xueqiang Xu, Jinfeng Xiao, Jiaming Shen, Jiawei Han
2024 arXiv   pre-print
Hierarchical text classification aims to categorize each document into a set of classes in a label taxonomy.  ...  topical terms mined from the corpus to facilitate classifier training and (2) utilizes LLMs for both data annotation and creation tailored for the hierarchical label space.  ...  margin.  ... 
arXiv:2403.00165v1 fatcat:maz2bl3lhncujaf3ynn2if7rti

Seeded Hierarchical Clustering for Expert-Crafted Taxonomies [article]

Anish Saha, Amith Ananthram, Emily Allaway, Heng Ji, Kathleen McKeown
2022 arXiv   pre-print
., political science) use expert-crafted taxonomies to make sense of large, unlabeled corpora.  ...  HierSeed assigns documents to topics by weighing document density against topic hierarchical structure.  ...  In fact, HIERSEED with fitting data outperforms both these methods by a large margin across all corpora and metrics.  ... 
arXiv:2205.11602v1 fatcat:jmdwans4jnejlp5rxpwlyd3dbe

Unsupervised Neural Categorization for Scientific Publications [chapter]

Keqian Li, Hanwen Zha, Yu Su, Xifeng Yan
2018 Proceedings of the 2018 SIAM International Conference on Data Mining  
Most conventional document categorization methods require a large number of documents with labeled categories for training.  ...  Finally we categorize documents by jointly considering the category attribution of their concepts.  ...  Acknowledgements The authors would like to thank the anonymous reviewers for their thoughtful comments, and Jiaming Shen, Jingbo Shang and Jiawei Han for the help with Segphrase.  ... 
doi:10.1137/1.9781611975321.5 dblp:conf/sdm/LiZSY18 fatcat:mmphj4avavavriz4hqmhhwhyfa

A Brief Review of Network Embedding

Yaojing Wang, Yuan Yao, Hanghang Tong, Feng Xu, Jian Lu
2019 Big Data Mining and Analytics  
In this article, we briefly review the existing network embedding methods by two taxonomies.  ...  The non-technical taxonomy focuses on the problem setting aspect and categorizes existing work based on whether to preserve special network properties, to consider special network types, or to incorporate  ...  Figure 1 summarizes the proposed two taxonomies. For each taxonomy, we first review and categorize the existing network embedding methods accordingly.  ... 
doi:10.26599/bdma.2018.9020029 dblp:journals/bigdatama/WangYTXL19 fatcat:qg2vj4ueh5fyliuvhb6if6gu7y

Knowledge Enhanced Pretrained Language Models: A Compreshensive Survey [article]

Xiaokai Wei, Shen Wang, Dejiao Zhang, Parminder Bhatia, Andrew Arnold
2021 arXiv   pre-print
We introduce three taxonomies to categorize existing work. Besides, we also survey the various NLU and NLG applications on which KE-PLM has demonstrated superior performance over vanilla PLMs.  ...  Pretrained Language Models (PLM) have established a new paradigm through learning informative contextualized representations on large-scale text corpus.  ...  To provide insights on these models and facilitate future research, we build three taxonomies to categorize the existing KE-PLMs.  ... 
arXiv:2110.08455v1 fatcat:b2nw5jdu7neo3brveddmah6mra

TaxoCom: Topic Taxonomy Completion with Hierarchical Discovery of Novel Topic Clusters [article]

Dongha Lee, Jiaming Shen, SeongKu Kang, Susik Yoon, Jiawei Han, Hwanjo Yu
2022 arXiv   pre-print
We propose a novel framework for topic taxonomy completion, named TaxoCom, which recursively expands the topic taxonomy by discovering novel sub-topic clusters of terms and documents.  ...  other baselines for a downstream task.  ...  In Table 3, TaxoCom significantly outperforms all the baselines in terms of both the measures. For topic completeness, the weakly supervised methods beat the unsupervised methods by a large margin  ... 
arXiv:2201.06771v1 fatcat:kqhnz4a2vnb3ncg3mv2rqlddz4

TagRec: Automated Tagging of Questions with Hierarchical Learning Taxonomy [article]

Venktesh V, Mukesh Mohania, Vikram Goyal
2021 arXiv   pre-print
We demonstrate that our method helps to handle the unseen labels and hence can be used for taxonomy tagging in the wild.  ...  (taxonomy) vector representations.  ...  The margin was set to a value of 0.1, which is a fraction of the norm of the embedding vectors (1.0), and it yields the best performance.  ... 
arXiv:2107.10649v1 fatcat:mltjp7y7xzbzlljkzcq7bekf2q

Analogy-preserving Semantic Embedding for Visual Object Categorization

Sung Ju Hwang, Kristen Grauman, Fei Sha
2013 International Conference on Machine Learning  
., according to a given object taxonomy for visual recognition), limiting the influence to pairwise structures.  ...  We translate semantic analogies into higher-order geometric constraints called analogical parallelograms, and use them in a novel convex regularizer for a discriminatively learned label embedding.  ...  embedding, by a large margin (see dotted circles).  ... 
dblp:conf/icml/HwangGS13 fatcat:rdscg5msejedlj4xb7wpxmzyly

Hierarchical taxonomy preparation for text categorization using consistent bipartite spectral graph copartitioning

Bin Gao, Tie-Yan Liu, Guang Feng, Tao Qin, Qian-Sheng Cheng, Wei-Ying Ma
2005 IEEE Transactions on Knowledge and Data Engineering  
In this paper, we propose a novel algorithm to automatically mine a hierarchical structure from the flat taxonomy of a data corpus as a preparation for the adoption of hierarchical classification.  ...  In particular, we first compute matrices to represent the relations among categories, documents, and terms.  ...  , and V B will be the embedding for term clustering.  ... 
doi:10.1109/tkde.2005.147 fatcat:pt4z6ed3lreorn2taojnlm3shy

MATCH: Metadata-Aware Text Classification in A Large Hierarchy [article]

Yu Zhang, Zhihong Shen, Yuxiao Dong, Kuansan Wang, Jiawei Han
2021 arXiv   pre-print
Commonly, the metadata of the given documents and the hierarchy of the labels are available in real-world applications.  ...  Multi-label text classification refers to the problem of assigning each given document its most relevant labels from the label set.  ...  For our MATCH framework, we set the margin of embedding pre-training 𝛾 = 0.3, number of attention heads 𝑘 = 2, number of [CLS] tokens 𝐶 = 8, number of Transformer layers 𝐿 = 3, and the dropout rate  ... 
arXiv:2102.07349v2 fatcat:tawngtzaj5d7dnmb43dz2ggxpy

MN-DS: A Multilabeled News Dataset for News Articles Hierarchical Classification [article]

Alina Petukhova, Nuno Fachada
2023 arXiv   pre-print
We manually labelled the articles based on a hierarchical taxonomy with 17 first-level and 109 second-level categories.  ...  This dataset can be used to train machine learning models for automatically classifying news articles by topic.  ...  The following embeddings were selected: • Tf-idf embedding, where Tf-idf stands for term frequency-inverse document frequency.  ... 
arXiv:2212.12061v2 fatcat:z33lyo6kv5ebbexjmj3crl6z74