4,092 Hits in 4.9 sec

Estimating incremental dimensional algorithm with sequence data set

S. Adaekalavan
2013 2013 International Conference on Pattern Recognition, Informatics and Mobile Engineering  
This method avoids the need to compute the distance of each data object to the cluster center. It saves running time.  ...  Hierarchical clustering is the grouping of objects of interest according to their similarity into a hierarchy, with different levels reflecting the degree of inter-object resemblance.  ...  Each level of a dendrogram can be evaluated by a cluster validation method and the best level and its corresponding clusters are returned. HAC algorithms are non-parametric.  ... 
doi:10.1109/icprime.2013.6496461 fatcat:djw3tm7tkne4hme7svhjilppou
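The snippet above describes evaluating each level of a dendrogram with a cluster validation method and returning the best level. A minimal sketch of that idea, assuming synthetic Gaussian data and the silhouette score as the validation index (neither is from the paper):

```python
# Sketch: cut a dendrogram at every level from 2 to 9 clusters and keep the
# cut with the best silhouette score. Data and parameters are invented.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from sklearn.metrics import silhouette_score

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(loc=c, scale=0.3, size=(30, 2)) for c in (0, 3, 6)])

Z = linkage(X, method="average")          # agglomerative hierarchical clustering
best = None
for k in range(2, 10):                    # each k is one level of the dendrogram
    labels = fcluster(Z, t=k, criterion="maxclust")
    score = silhouette_score(X, labels)   # cluster validation index for this level
    if best is None or score > best[1]:
        best = (k, score, labels)

print(f"best level: {best[0]} clusters, silhouette = {best[1]:.3f}")
```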

Understanding outside collaborations of the Chinese Academy of Sciences using Jensen-Shannon divergence

Russell Duhon, Katy Börner, Jinah Park
2009 Visualization and Data Analysis 2009  
Applying the approach to data on the outside collaborations of the Chinese Academy of Sciences and visualizing the results reveals interesting structure relevant for science policy decisions.  ...  about how they collaborate with each other.  ...  Since it is a metric, techniques well-founded in a metric space, such as agglomerative hierarchical clustering with Ward's method, can be brought to bear.  ... 
doi:10.1117/12.812383 dblp:conf/vda/Duhon09 fatcat:nbrp6lpy45cnvmwipzs5mjygzi
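Because the square root of the Jensen-Shannon divergence is a proper metric, it can feed directly into agglomerative clustering, as the snippet notes. A hedged sketch with invented "collaboration profile" distributions; SciPy's jensenshannon already returns the square-root distance, and average linkage is used here since SciPy's Ward linkage formally assumes Euclidean input:

```python
# Sketch: pairwise Jensen-Shannon distances between discrete distributions,
# then agglomerative clustering on the resulting metric. Toy data only.
import numpy as np
from scipy.spatial.distance import jensenshannon, squareform
from scipy.cluster.hierarchy import linkage, fcluster

# Three "collaboration profiles" as probability distributions over 4 partners.
profiles = np.array([
    [0.70, 0.20, 0.05, 0.05],
    [0.65, 0.25, 0.05, 0.05],
    [0.10, 0.10, 0.40, 0.40],
])

n = len(profiles)
D = np.zeros((n, n))
for i in range(n):
    for j in range(i + 1, n):
        # SciPy returns the JS *distance* (square root of the divergence),
        # which satisfies the triangle inequality.
        D[i, j] = D[j, i] = jensenshannon(profiles[i], profiles[j], base=2)

Z = linkage(squareform(D), method="average")   # average linkage on the metric
print(fcluster(Z, t=2, criterion="maxclust"))  # e.g. [1 1 2]
```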

Algorithms for hierarchical clustering: an overview, II

Fionn Murtagh, Pedro Contreras
2017 Wiley Interdisciplinary Reviews Data Mining and Knowledge Discovery  
We survey agglomerative hierarchical clustering algorithms and discuss efficient implementations that are available in R and other software environments.  ...  We look at hierarchical self-organizing maps, and mixture models. We review grid-based clustering, focusing on hierarchical density-based approaches.  ...  They also address the question of metrics: results are valid in a wide class of distances including those associated with the Minkowski metrics.  ... 
doi:10.1002/widm.1219 fatcat:4cdvfpypibe3petriqyuaiunk4
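The survey's point that results hold for a wide class of distances, including the Minkowski metrics, can be illustrated by rerunning the same linkage under several Minkowski orders p. The data and the choice of p values below are arbitrary:

```python
# Sketch: the same agglomerative clustering run under several Minkowski
# metrics. Purely illustrative; p values and data are invented.
import numpy as np
from scipy.spatial.distance import pdist
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 0.5, (20, 3)), rng.normal(4, 0.5, (20, 3))])

for p in (1, 2, 4):                              # L1, L2, L4 Minkowski metrics
    d = pdist(X, metric="minkowski", p=p)        # condensed distance vector
    labels = fcluster(linkage(d, method="complete"), t=2, criterion="maxclust")
    print(f"p={p}: cluster sizes {np.bincount(labels)[1:]}")
```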

Happy and Immersive Clustering Segmentations of Biological Co-Expression Patterns [article]

Richard Tjörnhammar
2024 arXiv   pre-print
In this work, we present an approach for evaluating segmentation strategies and solving the biological problem of creating robust interpretable maps of biological data by employing Ward's agglomerative  ...  Finally, we find that the cluster representations and label annotations, in the case with clusters of high immersiveness, correspond to compositionally inferred labels with the highest specificity.  ...  Figure 3: Synthetic data example employing agglomerative hierarchical clustering and complete linkage (maximum distance) method for cluster distance attribution.  ... 
arXiv:2402.06928v1 fatcat:2opogdohifckre2zc2x77wyicu
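Figure 3 of this entry is described as a synthetic-data example with complete (maximum-distance) linkage. A rough, invented reproduction of that setup with scikit-learn:

```python
# Sketch: complete-linkage (maximum distance) agglomerative clustering on
# synthetic blobs, loosely mirroring the figure description. Data invented.
from sklearn.datasets import make_blobs
from sklearn.cluster import AgglomerativeClustering

X, _ = make_blobs(n_samples=150, centers=4, cluster_std=0.8, random_state=0)
model = AgglomerativeClustering(n_clusters=4, linkage="complete")
labels = model.fit_predict(X)
print(labels[:10])
```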

Ultrametric Component Analysis with Application to Analysis of Text and of Emotion [article]

Fionn Murtagh
2013 arXiv   pre-print
It is assumed that the data set, to begin with, is endowed with a metric, and we include discussion of how this can be brought about if a dissimilarity, only, holds.  ...  The basis for part of the metric-endowed data set being ultrametric is to consider triplets of the observables (vectors). We develop a novel consensus of hierarchical clusterings.  ...  Hierarchical agglomerative clustering algorithms are a general and widely-used class of algorithm for inducing an ultrametric on dissimilarity or distance input, or coordinate data on which a metric or  ... 
arXiv:1309.3611v1 fatcat:4sunhvuhwvcihgkp2nvspa73ua
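The snippet frames hierarchical agglomerative clustering as inducing an ultrametric on distance input; that induced ultrametric is the cophenetic distance of the resulting dendrogram, which SciPy exposes directly. A toy sketch (single linkage is an arbitrary choice here):

```python
# Sketch: the ultrametric induced by agglomerative clustering is the
# cophenetic distance of its dendrogram. Toy data for illustration.
import numpy as np
from scipy.spatial.distance import pdist, squareform
from scipy.cluster.hierarchy import linkage, cophenet

rng = np.random.default_rng(2)
X = rng.normal(size=(6, 4))

Z = linkage(pdist(X), method="single")
U = squareform(cophenet(Z))        # induced ultrametric, as a square matrix

# Ultrametric (strong triangle) inequality: d(i,k) <= max(d(i,j), d(j,k)).
ok = all(U[i, k] <= max(U[i, j], U[j, k]) + 1e-12
         for i in range(6) for j in range(6) for k in range(6))
print("ultrametric inequality holds:", ok)
```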

Semi-supervised Hierarchical Clustering

Li Zheng, Tao Li
2011 2011 IEEE 11th International Conference on Data Mining  
In this paper, we propose a novel semi-supervised hierarchical clustering framework based on ultra-metric dendrogram distance.  ...  Semi-supervised clustering (i.e., clustering with knowledge-based constraints) has emerged as an important variant of the traditional clustering paradigms.  ...  Based on the way the clusters are generated, clustering methods can be divided into two categories: partitional clustering and hierarchical clustering [2] [3] .  ... 
doi:10.1109/icdm.2011.130 dblp:conf/icdm/ZhengL11 fatcat:7kwlsml3u5gkjcjc5zxcoaw5z4
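The paper's framework is built on ultra-metric dendrogram distances with knowledge-based constraints. A much cruder illustration of constraint-aware agglomerative clustering, and explicitly not the paper's method, is to zero out the distance of must-link pairs before linkage:

```python
# Sketch: crude must-link handling for agglomerative clustering by zeroing
# the distance of constrained pairs before linkage. Not the paper's method;
# purely an illustration of clustering with knowledge-based constraints.
import numpy as np
from scipy.spatial.distance import pdist, squareform
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(3)
X = rng.normal(size=(8, 2))
must_link = [(0, 5), (2, 7)]            # hypothetical background knowledge

D = squareform(pdist(X))
for i, j in must_link:
    D[i, j] = D[j, i] = 0.0             # force early merging of linked pairs

Z = linkage(squareform(D), method="average")
print(fcluster(Z, t=3, criterion="maxclust"))
```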

Semi-supervised Hierarchical Co-clustering [chapter]

Feifei Huang, Yan Yang, Tao Li, Jinyuan Zhang, Tonny Rutayisire, Amjad Mahmood
2012 Lecture Notes in Computer Science  
In this paper, we propose a novel semi-supervised hierarchical clustering framework based on ultra-metric dendrogram distance.  ...  Semi-supervised clustering (i.e., clustering with knowledge-based constraints) has emerged as an important variant of the traditional clustering paradigms.  ...  Based on the way the clusters are generated, clustering methods can be divided into two categories: partitional clustering and hierarchical clustering [2] [3] .  ... 
doi:10.1007/978-3-642-31900-6_39 fatcat:misnzmthgnevdlrxm7g3g5k2wi

Partially Supervised Speaker Clustering

Hao Tang, S. M. Chu, M. Hasegawa-Johnson, T. S. Huang
2012 IEEE Transactions on Pattern Analysis and Machine Intelligence  
Our speaker clustering experiments on the GALE database clearly indicate that 1) our speaker clustering methods based on the GMM mean supervector representation and vector-based distance metrics outperform traditional speaker clustering methods based on the "bag of acoustic features" representation and statistical model based distance metrics, 2) our advocated use of the cosine distance metric yields consistent  ...  These two metrics are standard for evaluating (general) data clustering results [42].  ... 
doi:10.1109/tpami.2011.174 pmid:21844626 fatcat:g7m7wki6pvcb3gotxydj4k6ewq
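The entry advocates cosine distance on GMM mean supervector representations. A hedged sketch of that pairing, with random vectors standing in for real supervectors:

```python
# Sketch: agglomerative speaker clustering with cosine distance on
# (placeholder) GMM mean supervectors. Real supervectors would come from
# adapted GMMs; random vectors stand in here purely for illustration.
import numpy as np
from scipy.spatial.distance import pdist
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(4)
supervectors = rng.normal(size=(12, 256))     # 12 utterances, 256-dim stand-ins

d = pdist(supervectors, metric="cosine")      # 1 - cosine similarity
Z = linkage(d, method="average")
speaker_labels = fcluster(Z, t=3, criterion="maxclust")
print(speaker_labels)
```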

A Survey of Partitional and Hierarchical Clustering Algorithms [chapter]

Chandan K. Reddy, Bhanukiran Vinzamuri
2018 Data Clustering  
K-modes is a non-parametric clustering algorithm suitable for handling categorical data that optimizes a matching metric (L0 loss function) without using any explicit distance metric.  ...  Ward's method chooses the initial centroids by using the sum of squared errors to evaluate the distance between two clusters.  ... 
doi:10.1201/9781315373515-4 fatcat:nv3tftuhyzcbdfi6g7invscl5u
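The matching metric (L0 loss) used by k-modes simply counts the attributes on which two categorical records disagree. A tiny sketch with invented records:

```python
# Sketch: the simple matching dissimilarity used by k-modes, i.e. the count
# of categorical attributes on which two records differ. Records invented.
def matching_dissimilarity(a, b):
    """Number of positions where the two categorical records differ."""
    return sum(x != y for x, y in zip(a, b))

r1 = ("red", "small", "round")
r2 = ("red", "large", "round")
r3 = ("blue", "large", "square")

print(matching_dissimilarity(r1, r2))  # 1
print(matching_dissimilarity(r1, r3))  # 3
```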

Improving Test Distance for Failure Clustering with Hypergraph Modelling [article]

Gabin An, Juyeon Yoon, Joyce Jiyoung Whang, Shin Yoo
2021 arXiv   pre-print
We introduce a new test distance metric based on hypergraphs and evaluate their accuracy using multi-fault benchmarks that we have built on top of Defects4J and SIR.  ...  Results show that our technique, Hybiscus, can automatically achieve perfect clustering (i.e., the same number of clusters as the ground truth number of root causes, with all failing tests with the same  ...  Our empirical evaluation shows that, when used with Agglomerative Hierarchical Clustering (AHC) and a distance-based estimation of cluster numbers, Hybiscus can significantly outperform other failure clustering  ... 
arXiv:2104.10360v1 fatcat:cfa7r6wsonaphcwzoif5hxp5fy
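The entry couples agglomerative hierarchical clustering with a distance-based estimate of the cluster count. One common distance-based heuristic, not necessarily the one Hybiscus uses, cuts the dendrogram at the largest gap between successive merge heights:

```python
# Sketch: pick the number of clusters by the largest gap between successive
# merge heights in the dendrogram. A generic heuristic; data is invented.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=60, centers=3, cluster_std=0.5, random_state=0)
Z = linkage(X, method="average")

heights = Z[:, 2]                      # merge distances, non-decreasing
gaps = np.diff(heights)
k = len(X) - (np.argmax(gaps) + 1)     # cut just before the largest jump
labels = fcluster(Z, t=k, criterion="maxclust")
print("estimated clusters:", k)
```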

Meta Clustering

Rich Caruana, Mohamed Elhawary, Nam Nguyen, Casey Smith
2006 IEEE International Conference on Data Mining. Proceedings  
We present methods for automatically generating a diverse set of alternate clusterings, as well as methods for grouping clusterings into meta clusters.  ...  We evaluate meta clustering on four test problems and two case studies. Surprisingly, clusterings that would be of most interest to users often are not very compact clusterings.  ...  Pedro Artigas, Anna Goldenberg, and Anton Likhodedov helped with early experiments in meta clustering as part of a class project at CMU.  ... 
doi:10.1109/icdm.2006.103 dblp:conf/icdm/CaruanaENS06 fatcat:t7fij6li3rdmhh23ulwwkx7yfq
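Meta clustering groups many alternative clusterings of the same data. One way to sketch "clustering the clusterings", with the dissimilarity 1 - ARI and all data and parameters invented, is:

```python
# Sketch: "clustering the clusterings" by turning pairwise adjusted Rand
# index into a dissimilarity and running agglomerative clustering on it.
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.cluster import KMeans
from sklearn.metrics import adjusted_rand_score
from scipy.spatial.distance import squareform
from scipy.cluster.hierarchy import linkage, fcluster

X, _ = make_blobs(n_samples=100, centers=4, random_state=0)

# Generate a diverse set of alternative clusterings (different k and seeds).
clusterings = [KMeans(n_clusters=k, n_init=5, random_state=s).fit_predict(X)
               for k in (2, 3, 4, 5) for s in (0, 1)]

n = len(clusterings)
D = np.zeros((n, n))
for i in range(n):
    for j in range(i + 1, n):
        D[i, j] = D[j, i] = 1.0 - adjusted_rand_score(clusterings[i],
                                                      clusterings[j])

meta = fcluster(linkage(squareform(D), method="average"), t=3,
                criterion="maxclust")
print("meta-cluster assignment of each clustering:", meta)
```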

Considerably Improving Clustering Algorithms Using UMAP Dimensionality Reduction Technique: A Comparative Study [chapter]

Mebarka Allaoui, Mohammed Lamine Kherfi, Abdelhakim Cheriet
2020 Lecture Notes in Computer Science  
We compare the results of many well-known clustering algorithms such as k-means, HDBSCAN, GMM and Agglomerative Hierarchical Clustering when they operate on the low-dimension feature space yielded by UMAP  ...  A series of experiments on several image datasets demonstrate that the proposed method allows each of the clustering algorithms studied to improve its performance on each dataset considered.  ...  Evaluation Metrics: In order to validate the performance of unsupervised clustering algorithms, we use the two standard evaluation metrics, accuracy (ACC) and Normalized Mutual Information (NMI).  ... 
doi:10.1007/978-3-030-51935-3_34 fatcat:6yrc4jamwne7nhisg5mod4k3te
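The study clusters on UMAP-reduced features and scores with ACC and NMI. A hedged sketch of that pipeline using the umap-learn package, k-means, and NMI only (dataset and parameters are illustrative, not the paper's exact setup):

```python
# Sketch: reduce with UMAP, cluster the embedding, score with NMI.
# Requires the umap-learn package; dataset and parameters are illustrative.
import umap                                   # pip install umap-learn
from sklearn.datasets import load_digits
from sklearn.cluster import KMeans
from sklearn.metrics import normalized_mutual_info_score

X, y = load_digits(return_X_y=True)

embedding = umap.UMAP(n_components=2, n_neighbors=15,
                      min_dist=0.1, random_state=42).fit_transform(X)

labels = KMeans(n_clusters=10, n_init=10, random_state=42).fit_predict(embedding)
print("NMI:", normalized_mutual_info_score(y, labels))
```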

Russian News Clustering and Headline Selection Shared Task [article]

Ilya Gusev, Ivan Smurov
2021 arXiv   pre-print
As a part of it, we propose the tasks of Russian news event detection, headline selection, and headline generation. These tasks are accompanied by datasets and baselines.  ...  This paper presents the results of the Russian News Clustering and Headline Selection shared task.  ...  Acknowledgements We would like to thank the participants of all three tracks, especially Tatiana Shavrina, Ivan Bondarenko, and Nikita Yudin for helpful comments and valuable suggestions.  ... 
arXiv:2105.00981v3 fatcat:6oiewmaj7rd37ephovhlxeufsu

Identification and Investigation of the User Session for Lan Connectivity Via Enhanced Partition Approach of Clustering Techniques

Gunasekaran K
2012 International Journal of Computer Science Engineering and Information Technology  
This paper mainly presents some technical discussions on the identification and analysis of "LAN user-sessions". The identification of a user-session is non-trivial.  ...  We have defined a clustering-based approach in detail, discussed the positives and negatives of this approach, and applied it to real traffic traces.  ...  To efficiently position, in our unidimensional metric space, the representatives at procedure start-up, we evaluate the distance between any two adjacent samples. According to the distance metric, we take  ... 
doi:10.5121/ijcseit.2012.2604 fatcat:6qkz4b7z7vhfhouftjgkcusiqa
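The snippet mentions evaluating the distance between adjacent samples in a unidimensional metric space. A loose illustration, with an invented gap threshold, splits sorted one-dimensional samples wherever the adjacent gap exceeds that threshold:

```python
# Sketch: a gap-based split of sorted one-dimensional samples, loosely
# following the idea of evaluating distances between adjacent samples.
# The threshold and data are invented.
import numpy as np

samples = np.sort(np.array([0.1, 0.2, 0.25, 5.0, 5.1, 9.7, 9.9, 10.0]))
threshold = 1.0                                  # hypothetical session gap

gaps = np.diff(samples)
labels = np.concatenate([[0], np.cumsum(gaps > threshold)])
print(labels)                                    # [0 0 0 1 1 2 2 2]
```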

KL divergence based agglomerative clustering for automated Vitiligo grading

Mithun Das Gupta, Srinidhi Srinivasa, J. Madhukara, Meryl Antony
2015 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)  
This leads to a very powerful yet elegant method for bottom-up agglomerative clustering with strong theoretical guarantees.  ...  We introduce albedo and reflectance fields as features for the distance computations. We compare against other established methods to bring out possible pros and cons of the proposed method.  ...  A number of recent works present agglomerative schemes for clustering with exponential families.  ... 
doi:10.1109/cvpr.2015.7298886 dblp:conf/cvpr/GuptaSMA15 fatcat:os3gvi6efvg3xgljdclfkzmnwa
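KL divergence is asymmetric, so a common trick before feeding it to an agglomerative clusterer is to symmetrize it; the paper may formulate this differently, so the following is only an illustrative sketch with toy histograms:

```python
# Sketch: symmetrized KL divergence between histograms as a pairwise
# dissimilarity for agglomerative clustering. The symmetrization and toy
# histograms are illustrative, not the paper's exact formulation.
import numpy as np
from scipy.special import rel_entr
from scipy.spatial.distance import squareform
from scipy.cluster.hierarchy import linkage, fcluster

def sym_kl(p, q):
    """0.5 * (KL(p||q) + KL(q||p)) for discrete distributions."""
    return 0.5 * (rel_entr(p, q).sum() + rel_entr(q, p).sum())

hists = np.array([
    [0.60, 0.30, 0.10],
    [0.55, 0.35, 0.10],
    [0.10, 0.20, 0.70],
    [0.15, 0.15, 0.70],
])

n = len(hists)
D = np.zeros((n, n))
for i in range(n):
    for j in range(i + 1, n):
        D[i, j] = D[j, i] = sym_kl(hists[i], hists[j])

Z = linkage(squareform(D), method="average")
print(fcluster(Z, t=2, criterion="maxclust"))   # e.g. [1 1 2 2]
```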
Showing results 1 — 15 out of 4,092 results