Large Margin Taxonomy Embedding for Document Categorization
2008
Neural Information Processing Systems
The optimization of the semantic space incorporates large margin constraints that ensure that for each instance the correct class prototype is closer than any other. ...
Applications of multi-class classification, such as document categorization, often appear in cost-sensitive settings. ...
Figure 2: The schematic layout of the large-margin embedding of the taxonomy and the documents. As a first step, we represent topic α as the vector e_α and document x_i as x_i = A x_i. ...
dblp:conf/nips/WeinbergerC08
fatcat:cc6jsbvb7rdujcipflqoccatzq
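The large margin constraint mentioned in the snippets above (each embedded document should be closer to its correct class prototype than to any other) can be illustrated with a minimal sketch. This is not the paper's actual objective: the fixed unit margin, the squared Euclidean distance, and the names `margin_violations`, `prototypes`, and `labels` are all assumptions made for illustration. Only the linear map A applied to the documents comes from the snippet.

```python
import numpy as np

def margin_violations(A, X, prototypes, labels, margin=1.0):
    """Hinge penalties for a generic large-margin prototype constraint.

    A          : (d_embed, d_in) linear map applied to documents (from the snippet)
    X          : (n, d_in) document vectors
    prototypes : (c, d_embed) one prototype per class (hypothetical layout)
    labels     : (n,) integer index of the correct class for each document
    margin     : fixed margin, an assumption of this sketch
    """
    labels = np.asarray(labels)
    rows = np.arange(len(labels))
    Z = X @ A.T  # embedded documents, shape (n, d_embed)
    # Squared distance from every embedded document to every prototype: (n, c)
    d2 = ((Z[:, None, :] - prototypes[None, :, :]) ** 2).sum(axis=2)
    # Distance to the correct prototype, kept as a column for broadcasting
    correct = d2[rows, labels][:, None]
    # A wrong class is penalized when it is not at least `margin` farther away
    hinge = np.maximum(0.0, margin + correct - d2)
    hinge[rows, labels] = 0.0  # the correct class never penalizes itself
    return hinge
```

A zero matrix means every document satisfies the constraint for every competing class; positive entries show which (document, wrong class) pairs violate it and by how much.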
Hierarchical Taxonomy-Aware and Attentional Graph Capsule RCNNs for Large-Scale Multi-Label Text Classification
2019
IEEE Transactions on Knowledge and Data Engineering
To leverage the hierarchical relations among the class labels, we propose a hierarchical taxonomy embedding method to learn their representations, and define a novel weighted margin loss by incorporating ...
In this paper, we propose a novel hierarchical taxonomy-aware and attentional graph capsule recurrent CNNs framework for large-scale multi-label text classification. ...
weighted margin loss for large-scale multi-label text classification: Word Order Preserved Graph-of-Words for Document Modeling. ...
doi:10.1109/tkde.2019.2959991
fatcat:wi5guiyng5bqxeuwggro6ysgue
A taxonomy fuzzy filtering approach
2003
Journal of Automatic Control
Our work proposes the use of topic taxonomies as part of a filtering language. Given a taxonomy, a classifier is trained for each one of its topics. ...
(Topic1 AND Topic2) OR Topic3, in order to filter related documents in a stream. ...
Text/Hypertext categorization promises not only to help maintain updated and large web taxonomies, but to be used in the context of content-based filtering [3]-[8]. ...
doi:10.2298/jac0301026v
fatcat:gnirpd4vd5btdl76cwxwl3bjxa
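The filtering expression quoted above, (Topic1 AND Topic2) OR Topic3, can be sketched as a composition of per-topic classifiers. This is a simplified illustration under stated assumptions: it uses crisp true/false decisions rather than the fuzzy membership the title refers to, and the keyword matchers stand in for trained classifiers. The names `keyword_classifier` and `make_filter` are hypothetical.

```python
from typing import Callable, Dict

TopicClassifier = Callable[[str], bool]

def keyword_classifier(word: str) -> TopicClassifier:
    """Stand-in for a trained topic classifier: fires when `word` occurs in the text."""
    return lambda doc: word in doc.lower().split()

def make_filter(classifiers: Dict[str, TopicClassifier]) -> Callable[[str], bool]:
    """Build a document filter for the expression (Topic1 AND Topic2) OR Topic3."""
    def accept(doc: str) -> bool:
        # Evaluate each topic classifier once per document
        t = {name: clf(doc) for name, clf in classifiers.items()}
        return (t["Topic1"] and t["Topic2"]) or t["Topic3"]
    return accept

# Hypothetical topics, each keyed to a single keyword for the sake of the example
accept = make_filter({
    "Topic1": keyword_classifier("alpha"),
    "Topic2": keyword_classifier("beta"),
    "Topic3": keyword_classifier("gamma"),
})
```

Applied to a stream, `accept` keeps a document only when the expression holds, for example `[d for d in stream if accept(d)]`.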
An Empirical Study of E-Commerce Website Success Model
2009
2009 International Conference on Management and Service Science
In the Rakuten data challenge on taxonomy Classification for eCommerce-scale Product Catalogs, we propose an approach based on deep convolutional neural networks to predict product taxonomies using their ...
ACKNOWLEDGMENTS The author would like to thank the organizer of SIGIR 2018 eCom Data Challenge (Rakuten Institute of Technology Boston (RIT-Boston)) for their support. ...
For our cases with large amount of classes, we create a unique binary coding for each taxonomy. ...
doi:10.1109/icmss.2009.5302176
fatcat:oazunshqtzbora5syyfqxgyjee
TELEClass: Taxonomy Enrichment and LLM-Enhanced Hierarchical Text Classification with Minimal Supervision
[article]
2024
arXiv
pre-print
Hierarchical text classification aims to categorize each document into a set of classes in a label taxonomy. ...
topical terms mined from the corpus to facilitate classifier training and (2) utilizes LLMs for both data annotation and creation tailored for the hierarchical label space. ...
margin. ...
arXiv:2403.00165v1
fatcat:maz2bl3lhncujaf3ynn2if7rti
Seeded Hierarchical Clustering for Expert-Crafted Taxonomies
[article]
2022
arXiv
pre-print
., political science) use expert-crafted taxonomies to make sense of large, unlabeled corpora. ...
HierSeed assigns documents to topics by weighing document density against topic hierarchical structure. ...
In fact, HIERSEED with fitting data outperforms both these methods by a large margin across all corpora and metrics. ...
arXiv:2205.11602v1
fatcat:jmdwans4jnejlp5rxpwlyd3dbe
Unsupervised Neural Categorization for Scientific Publications
[chapter]
2018
Proceedings of the 2018 SIAM International Conference on Data Mining
Most conventional document categorization methods require a large number of documents with labeled categories for training. ...
Finally we categorize documents by jointly considering the category attribution of their concepts. ...
Acknowledgements The authors would like to thank the anonymous reviewers for their thoughtful comments, and Jiaming Shen, Jingbo Shang and Jiawei Han for the help with Segphrase. ...
doi:10.1137/1.9781611975321.5
dblp:conf/sdm/LiZSY18
fatcat:mmphj4avavavriz4hqmhhwhyfa
A Brief Review of Network Embedding
2019
Big Data Mining and Analytics
In this article, we briefly review the existing network embedding methods by two taxonomies. ...
The non-technical taxonomy focuses on the problem setting aspect and categorizes existing work based on whether to preserve special network properties, to consider special network types, or to incorporate ...
Figure 1 summarizes the proposed two taxonomies. For each taxonomy, we first review and categorize the existing network embedding methods accordingly. ...
doi:10.26599/bdma.2018.9020029
dblp:journals/bigdatama/WangYTXL19
fatcat:qg2vj4ueh5fyliuvhb6if6gu7y
Knowledge Enhanced Pretrained Language Models: A Compreshensive Survey
[article]
2021
arXiv
pre-print
We introduce three taxonomies to categorize existing work. Besides, we also survey the various NLU and NLG applications on which KE-PLM has demonstrated superior performance over vanilla PLMs. ...
Pretrained Language Models (PLM) have established a new paradigm through learning informative contextualized representations on large-scale text corpus. ...
To provide insights on these models and facilitate future research, we build three taxonomies to categorize the existing KE-PLMs. ...
arXiv:2110.08455v1
fatcat:b2nw5jdu7neo3brveddmah6mra
TaxoCom: Topic Taxonomy Completion with Hierarchical Discovery of Novel Topic Clusters
[article]
2022
arXiv
pre-print
We propose a novel framework for topic taxonomy completion, named TaxoCom, which recursively expands the topic taxonomy by discovering novel sub-topic clusters of terms and documents. ...
other baselines for a downstream task. ...
In Table 3, TaxoCom significantly outperforms all the baselines in terms of both the measures. For topic completeness, the weakly supervised methods beat the unsupervised methods by a large margin ...
arXiv:2201.06771v1
fatcat:kqhnz4a2vnb3ncg3mv2rqlddz4
TagRec: Automated Tagging of Questions with Hierarchical Learning Taxonomy
[article]
2021
arXiv
pre-print
We demonstrate that our method helps to handle the unseen labels and hence can be used for taxonomy tagging in the wild. ...
(taxonomy) vector representations. ...
The margin was set to a value of 0.1, which is a fraction of the norm of the embedding vectors (1.0), and it yields the best performance. ...
arXiv:2107.10649v1
fatcat:mltjp7y7xzbzlljkzcq7bekf2q
Analogy-preserving Semantic Embedding for Visual Object Categorization
2013
International Conference on Machine Learning
., according to a given object taxonomy for visual recognition), limiting the influence to pairwise structures. ...
We translate semantic analogies into higher-order geometric constraints called analogical parallelograms, and use them in a novel convex regularizer for a discriminatively learned label embedding. ...
embedding, by a large margin (see dotted circles). ...
dblp:conf/icml/HwangGS13
fatcat:rdscg5msejedlj4xb7wpxmzyly
Hierarchical taxonomy preparation for text categorization using consistent bipartite spectral graph copartitioning
2005
IEEE Transactions on Knowledge and Data Engineering
In this paper, we propose a novel algorithm to automatically mine a hierarchical structure from the flat taxonomy of a data corpus as a preparation for the adoption of hierarchical classification. ...
In particular, we first compute matrices to represent the relations among categories, documents, and terms. ...
, and V B will be the embedding for term clustering. ...
doi:10.1109/tkde.2005.147
fatcat:pt4z6ed3lreorn2taojnlm3shy
MATCH: Metadata-Aware Text Classification in A Large Hierarchy
[article]
2021
arXiv
pre-print
Commonly, the metadata of the given documents and the hierarchy of the labels are available in real-world applications. ...
Multi-label text classification refers to the problem of assigning each given document its most relevant labels from the label set. ...
For our MATCH framework, we set the margin of embedding pre-training 𝛾 = 0.3, number of attention heads 𝑘 = 2, number of [CLS] tokens 𝐶 = 8, number of Transformer layers 𝐿 = 3, and the dropout rate ...
arXiv:2102.07349v2
fatcat:tawngtzaj5d7dnmb43dz2ggxpy
MN-DS: A Multilabeled News Dataset for News Articles Hierarchical Classification
[article]
2023
arXiv
pre-print
We manually labelled the articles based on a hierarchical taxonomy with 17 first-level and 109 second-level categories. ...
This dataset can be used to train machine learning models for automatically classifying news articles by topic. ...
The following embeddings were selected: • Tf-idf embedding, where Tf-idf stands for term frequency-inverse document frequency 15 . ...
arXiv:2212.12061v2
fatcat:z33lyo6kv5ebbexjmj3crl6z74