Large Margin Taxonomy Embedding for Document Categorization.

The optimization of the semantic space incorporates large margin constraints that ensure that for each instance the correct class prototype is closer than any other. ... Applications of multi-class classification, such as document categorization, often appear in cost-sensitive settings. ... Figure 2: The schematic layout of the large-margin embedding of the taxonomy and the documents. As a first step, we represent topic α as the vector e_α and document x_i as x_i = A x_i. ...

dblp:conf/nips/WeinbergerC08 fatcat:cc6jsbvb7rdujcipflqoccatzq
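
The large-margin constraint mentioned in the snippet above (the correct class prototype must be closer to the embedded document than any other prototype) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the plain Euclidean distance, the fixed margin of 1.0, and the toy data are all assumptions made for illustration.

```python
import math

def embed(A, x):
    """Map a document vector x into the semantic space as the product A x."""
    return [sum(a * v for a, v in zip(row, x)) for row in A]

def dist(u, v):
    """Euclidean distance between two vectors of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def margin_violations(A, x, label, prototypes, margin=1.0):
    """Hinge penalty per wrong topic (assumed form, fixed margin).

    A penalty is zero when the wrong prototype is at least `margin`
    farther from the embedded document than the correct prototype.
    """
    z = embed(A, x)
    d_true = dist(z, prototypes[label])
    return {
        topic: max(0.0, margin + d_true - dist(z, p))
        for topic, p in prototypes.items()
        if topic != label
    }

# Hypothetical toy data: two topic prototypes in a 2-D semantic space
# and a linear map A from 3-D document features into that space.
prototypes = {"sports": [1.0, 0.0], "politics": [0.0, 1.0]}
A = [[1.0, 0.0, 0.0],
     [0.0, 1.0, 0.0]]
x = [0.9, 0.1, 0.5]

ok = margin_violations(A, x, "sports", prototypes)     # constraint satisfied
bad = margin_violations(A, x, "politics", prototypes)  # constraint violated
```

With this toy data the document embeds close to the "sports" prototype, so labelling it "sports" incurs no penalty, while labelling it "politics" yields a positive penalty. A learning procedure would adjust A (and possibly the prototypes) to drive such penalties toward zero.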

To leverage the hierarchical relations among the class labels, we propose a hierarchical taxonomy embedding method to learn their representations, and define a novel weighted margin loss by incorporating ... In this paper, we propose a novel hierarchical taxonomy-aware and attentional graph capsule recurrent CNNs framework for large-scale multi-label text classification. ... weighted margin loss for large-scale multi-label text classification: Word Order Preserved Graph-of-Words for Document Modeling. ...

doi:10.1109/tkde.2019.2959991 fatcat:wi5guiyng5bqxeuwggro6ysgue

Our work proposes the use of topic taxonomies as part of a filtering language. Given a taxonomy, a classifier is trained for each one of its topics. ... (Topic1 AND Topic2) OR Topic3, in order to filter related documents in a stream. ... Text/Hypertext categorization promises not only to help maintain updated and large web taxonomies, but to be used in the context of content-based filtering [3]-[8]. ...

doi:10.2298/jac0301026v fatcat:gnirpd4vd5btdl76cwxwl3bjxa

In the Rakuten data challenge on taxonomy Classification for eCommerce-scale Product Catalogs, we propose an approach based on deep convolutional neural networks to predict product taxonomies using their ... ACKNOWLEDGMENTS The author would like to thank the organizer of SIGIR 2018 eCom Data Challenge (Rakuten Institute of Technology Boston (RIT-Boston)) for their support. ... For our cases with large amount of classes, we create a unique binary coding for each taxonomy. ...

doi:10.1109/icmss.2009.5302176 fatcat:oazunshqtzbora5syyfqxgyjee

Hierarchical text classification aims to categorize each document into a set of classes in a label taxonomy. ... topical terms mined from the corpus to facilitate classifier training and (2) utilizes LLMs for both data annotation and creation tailored for the hierarchical label space. ... margin. ...

arXiv:2403.00165v1 fatcat:maz2bl3lhncujaf3ynn2if7rti

., political science) use expert-crafted taxonomies to make sense of large, unlabeled corpora. ... HierSeed assigns documents to topics by weighing document density against topic hierarchical structure. ... In fact, HIERSEED with fitting data outperforms both these methods by a large margin across all corpora and metrics. ...

arXiv:2205.11602v1 fatcat:jmdwans4jnejlp5rxpwlyd3dbe

Most conventional document categorization methods require a large number of documents with labeled categories for training. ... Finally we categorize documents by jointly considering the category attribution of their concepts. ... Acknowledgements The authors would like to thank the anonymous reviewers for their thoughtful comments, and Jiaming Shen, Jingbo Shang and Jiawei Han for the help with Segphrase. ...

doi:10.1137/1.9781611975321.5 dblp:conf/sdm/LiZSY18 fatcat:mmphj4avavavriz4hqmhhwhyfa

In this article, we briefly review the existing network embedding methods by two taxonomies. ... The non-technical taxonomy focuses on the problem setting aspect and categorizes existing work based on whether to preserve special network properties, to consider special network types, or to incorporate ... Figure 1 summarizes the proposed two taxonomies. For each taxonomy, we first review and categorize the existing network embedding methods accordingly. ...

doi:10.26599/bdma.2018.9020029 dblp:journals/bigdatama/WangYTXL19 fatcat:qg2vj4ueh5fyliuvhb6if6gu7y

We introduce three taxonomies to categorize existing work. Besides, we also survey the various NLU and NLG applications on which KE-PLM has demonstrated superior performance over vanilla PLMs. ... Pretrained Language Models (PLM) have established a new paradigm through learning informative contextualized representations on large-scale text corpus. ... To provide insights on these models and facilitate future research, we build three taxonomies to categorize the existing KE-PLMs. ...

arXiv:2110.08455v1 fatcat:b2nw5jdu7neo3brveddmah6mra

We propose a novel framework for topic taxonomy completion, named TaxoCom, which recursively expands the topic taxonomy by discovering novel sub-topic clusters of terms and documents. ... other baselines for a downstream task. ... In Table 3, TaxoCom significantly outperforms all the baselines in terms of both the measures. For topic completeness, the weakly supervised methods beat the unsupervised methods by a large margin ...

arXiv:2201.06771v1 fatcat:kqhnz4a2vnb3ncg3mv2rqlddz4

We demonstrate that our method helps to handle the unseen labels and hence can be used for taxonomy tagging in the wild. ... (taxonomy) vector representations. ... The margin was set to a value of 0.1, which is a fraction of the norm of the embedding vectors (1.0), and it yields the best performance. ...

arXiv:2107.10649v1 fatcat:mltjp7y7xzbzlljkzcq7bekf2q

., according to a given object taxonomy for visual recognition), limiting the influence to pairwise structures. ... We translate semantic analogies into higher-order geometric constraints called analogical parallelograms, and use them in a novel convex regularizer for a discriminatively learned label embedding. ... embedding, by a large margin (see dotted circles). ...

dblp:conf/icml/HwangGS13 fatcat:rdscg5msejedlj4xb7wpxmzyly

In this paper, we propose a novel algorithm to automatically mine a hierarchical structure from the flat taxonomy of a data corpus as a preparation for the adoption of hierarchical classification. ... In particular, we first compute matrices to represent the relations among categories, documents, and terms. ... , and V B will be the embedding for term clustering. ...

doi:10.1109/tkde.2005.147 fatcat:pt4z6ed3lreorn2taojnlm3shy

Commonly, the metadata of the given documents and the hierarchy of the labels are available in real-world applications. ... Multi-label text classification refers to the problem of assigning each given document its most relevant labels from the label set. ... For our MATCH framework, we set the margin of embedding pre-training 𝛾 = 0.3, number of attention heads 𝑘 = 2, number of [CLS] tokens 𝐶 = 8, number of Transformer layers 𝐿 = 3, and the dropout rate ...

arXiv:2102.07349v2 fatcat:tawngtzaj5d7dnmb43dz2ggxpy

We manually labelled the articles based on a hierarchical taxonomy with 17 first-level and 109 second-level categories. ... This dataset can be used to train machine learning models for automatically classifying news articles by topic. ... The following embeddings were selected: • Tf-idf embedding, where Tf-idf stands for term frequency-inverse document frequency. ...

arXiv:2212.12061v2 fatcat:z33lyo6kv5ebbexjmj3crl6z74

Large Margin Taxonomy Embedding for Document Categorization

Hierarchical Taxonomy-Aware and Attentional Graph Capsule RCNNs for Large-Scale Multi-Label Text Classification

A taxonomy fuzzy filtering approach

An Empirical Study of E-Commerce Website Success Model

TELEClass: Taxonomy Enrichment and LLM-Enhanced Hierarchical Text Classification with Minimal Supervision [article]

Seeded Hierarchical Clustering for Expert-Crafted Taxonomies [article]

Unsupervised Neural Categorization for Scientific Publications [chapter]

A Brief Review of Network Embedding

Knowledge Enhanced Pretrained Language Models: A Compreshensive Survey [article]

TaxoCom: Topic Taxonomy Completion with Hierarchical Discovery of Novel Topic Clusters [article]

TagRec: Automated Tagging of Questions with Hierarchical Learning Taxonomy [article]

Analogy-preserving Semantic Embedding for Visual Object Categorization

Hierarchical taxonomy preparation for text categorization using consistent bipartite spectral graph copartitioning

MATCH: Metadata-Aware Text Classification in A Large Hierarchy [article]

MN-DS: A Multilabeled News Dataset for News Articles Hierarchical Classification [article]