Data Independent Method of Constructing Distributed LSH for Large-Scale Dynamic High-Dimensional Indexing.

However, their method needs to estimate a global parameter on the whole dataset beforehand. It is impractical for a large-scale dynamical dataset. ... Constructing effective and efficient indexes for explosive growing multimedia data is a very challenging problem. ... CONCLUSIONS In this paper, we propose a data independent method of constructing distributed LSH for the large-scale high-dimensional multimedia feature sets. ...

doi:10.1109/hpcc.2012.82 dblp:conf/hpcc/GuZZZLB12 fatcat:lk7pbjucong25bpxhx24auzj4e

Citation

Xiaoguang Gu, Lei Zhang, Dongming Zhang, Yongdong Zhang, Jintao Li, Ning Bao. "Data Independent Method of Constructing Distributed LSH for Large-Scale Dynamic High-Dimensional Indexing." 2012 IEEE 14th International Conference on High Performance Computing and Communication & 2012 IEEE 9th International Conference on Embedded Software and Systems (2012) 564-571

To improve the feasibility of large scale data clustering in high dimensional space we propose an enhanced Locality Sensitive Hashing Clustering Method. ... These attributes make it well suited to clustering data in high dimensional space. ... ACKNOWLEDGMENT This work was supported by Nature Science Foundation of China No. 60872142. ...

doi:10.4313/teem.2014.15.3.125 fatcat:yunvp5r3ijdjfgd26o52cdvsrm

In recent years, Locality Sensitive Hashing (LSH) (and its variant Euclidean LSH) has become a popular index structure for large-scale and high-dimensional similarity search problem. ... We also provide a method to get optimal pivot for even larger improvement. Experiments show that our algorithm significantly improves the query performance of LSH. I. ... All these variants of LSH are based on the same structure as Euclidean LSH. LSH is efficient to organize and query large-scale and high-dimensional databases. ...

doi:10.1109/vcip.2011.6115941 dblp:conf/vcip/ZhangGZZL11 fatcat:f2k5tktnjnbzhg6iwjkcvtp2ua

During the query phase of DB-LSH, a small number of high-quality candidates can be generated efficiently by dynamically constructing query-based hypercubic buckets with the required widths through index-based ... To address this dilemma, in this paper we propose a novel LSH scheme called DB-LSH which supports efficient ANN search for large high-dimensional datasets. ... However, it is well known that finding the exact NN in large-scale high-dimensional datasets can be very time-consuming. ...

arXiv:2207.07823v2 fatcat:bxiico7d7neqtbh5bov53fwrke

We consider K-Nearest Neighbor search for high dimensional data in large-scale structured Peer-to-Peer networks. ... We report on a comprehensive performance evaluation using high dimensional real-world data, demonstrating the suitability of our approach. ... Furthermore, as the data sources are naturally distributed in large-scale networks, traditional centralized indexing techniques become impractical. ...

dblp:conf/webdb/HaghaniMCA08 fatcat:r3yxlzjhznbd7gdxug5fnzts4e

Then we talk about recent approaches on designing the indexes and operators for highly efficient similarity query processing on top of embeddings (or more generally, high dimensional data). ... To measure the similarity between data objects, traditional methods normally work on low level or syntax features (e.g., basic visual features on images or bag-of-word features of text), which makes them ... Different from LSH whose hashing functions are data-independent (i.e., selection of the hashing functions is independent from data distribution), learning to hash is a family of data-dependent methods ...

arXiv:2204.07922v1 fatcat:u5osyghs6vgppnj5gpnrzhae5y

LSH is originally proposed for resolving the high-dimensional approximate similarity search problem. Until now, many kinds of variations of LSH have been proposed for large-scale indexing. ... Much of the interest is focused on improving the query accuracy for skewed data distribution and reducing the storage space. ... Conclusion LSH is efficient to index high-dimensional data and its variations can make it index large-scale dataset, thus it has been a popular index structure for large-scale and high-dimensional dataset ...

doi:10.1016/j.sigpro.2012.07.014 fatcat:q3sh7npujra5rftvcfzdf6hsqe

Similarity indices for high-dimensional data are very desirable for building content-based search systems for feature-rich data such as audio, images, videos, and other sensor data. ... We have implemented the multi-probe LSH method and evaluated the implementation with two different high-dimensional datasets. ... For reasonably large datasets, the index data structure may even fit into main memory. • High-dimensional: The indexing scheme should work well for datasets with very high intrinsic dimensionalities (e.g ...

dblp:conf/vldb/LvJWCL07 fatcat:wp3vfusws5dqrhyscgx4i5f6bu

In this paper, we propose LIDER, an efficient high-dimensional Learned Index for large-scale DEnse passage Retrieval. ... But most of the existing learned indexes are designed for low dimensional data, which are not suitable for dense passage retrieval with high-dimensional dense embeddings. ... An effective dimension reduction method for high-dimensional data is locality-sensitive hashing (LSH). ...

arXiv:2205.00970v3 fatcat:qs2w6b465zaabco3nq3ff3p3lq

We consider mappings from the multi-dimensional LSH bucket space to the linearly ordered set of peers that jointly maintain the indexed data and derive requirements to achieve high quality search results ... In this paper we consider distributed K-Nearest Neighbor (KNN) search and range query processing in high dimensional data. ... Furthermore, as the data sources are naturally distributed in large-scale networks, traditional centralized indexing techniques become impractical. ...

doi:10.1145/1516360.1516446 dblp:conf/edbt/HaghaniMA09 fatcat:45xdtdzmdva6jkwbkplfoloc2u

To the best of our knowledge, this is the first work on semantic-sensitive namespace management for ultra-scale file systems. ... Existing large-scale file systems rely on hierarchically structured namespace that leads to severe performance bottlenecks and renders it impossible to support real-time queries on multi-dimensional attributes ... Bounded LSH constructs L hash tables, each of which contains M LSH functions that follow the 2-stable Gaussian distribution for the Euclidean distance. ...

doi:10.1109/tpds.2013.140 fatcat:2zd6qygebvhk7fylw56fsy24au
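The snippet above describes an index built from L hash tables, each keyed by M LSH functions drawn from a 2-stable (Gaussian) distribution for Euclidean distance. A minimal sketch of that general construction follows. It is not the SANE authors' Bounded LSH implementation: the class name `EuclideanLSH`, the parameter names, the bucket width `w`, and the standard hash form h(v) = floor((a · v + b) / w) are assumptions chosen for illustration.

```python
import random
from collections import defaultdict


class EuclideanLSH:
    """Illustrative Gaussian (2-stable) LSH index: L tables, M functions per table.

    All names and defaults here are hypothetical; they are not taken from the
    cited paper.
    """

    def __init__(self, dim, num_tables, funcs_per_table, bucket_width, seed=0):
        rng = random.Random(seed)
        self.w = bucket_width
        self.tables = []
        for _ in range(num_tables):
            # Each function is a pair (a, b): a ~ N(0, 1)^dim, b ~ Uniform[0, w).
            funcs = [
                ([rng.gauss(0.0, 1.0) for _ in range(dim)],
                 rng.uniform(0.0, bucket_width))
                for _ in range(funcs_per_table)
            ]
            self.tables.append((funcs, defaultdict(list)))

    def _key(self, funcs, v):
        # Concatenate the M values floor((a . v + b) / w) into one bucket key.
        return tuple(
            int((sum(ai * vi for ai, vi in zip(a, v)) + b) // self.w)
            for a, b in funcs
        )

    def insert(self, item_id, v):
        # Store the item in one bucket per table.
        for funcs, buckets in self.tables:
            buckets[self._key(funcs, v)].append(item_id)

    def candidates(self, q):
        # Union of the buckets that q falls into across all tables.
        found = set()
        for funcs, buckets in self.tables:
            found.update(buckets.get(self._key(funcs, q), []))
        return found
```

A caller would typically insert every vector once, then rank the returned candidates by exact Euclidean distance. Increasing M makes each bucket more selective, while increasing L raises the chance that a true neighbor shares at least one bucket with the query.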

The purpose of this paper is to examine and review existing indexing techniques for large-scale data. ... ., privacy and large-scale data mining, are also discussed. ... Thus, several challenging areas of research can serve as a basis for possible future research directions for the indexing of large IoT data. ...

doi:10.3390/fi14010019 fatcat:xnlzg7cs2fb3lgng65ha5ucf5m

One popular algorithm for similarity search, especially for high dimensional data (where spatial indexes like kd-trees do not perform well) is Locality Sensitive Hashing (LSH), an approximation algorithm ... In this paper, we describe a new variant of LSH, called Parallel LSH (PLSH) designed to be extremely efficient, capable of scaling out on multiple nodes and multiple cores, and which supports high-throughput ... ACKNOWLEDGEMENTS This work was supported by a grant from Intel, as a part of the Intel Science and Technology Center in Big Data (ISTC-BD). ...

doi:10.14778/2556549.2556574 fatcat:z7c2qdi2lvewlkamfphubde7ky

We present FLASH (Fast LSH Algorithm for Similarity search accelerated with HPC), a similarity search system for ultra-high dimensional datasets on a single machine, that does not require similarity computations ... and is tailored for high-performance computing platforms. ... We would like to thank anonymous reviewers and Rasmus Pagh for discussions on the role of correlations in Theorem 1. ...

arXiv:1709.01190v2 fatcat:vic3h5gjnbfnppyruoc4io6rsu

collections of high-dimensional data. ... Over the last two decades, much research effort has been spent on nearest neighbor search in high-dimensional data sets. ... The authors would like to thank Morgunblaðið for the use of their large picture collection, the authors of LSH and SIFT for giving them access to their implementations, and the anonymous reviewers for ...

doi:10.1109/tpami.2008.130 pmid:19299861 fatcat:7ofvu7ri2jfcjogpziwvldia4u

Data Independent Method of Constructing Distributed LSH for Large-Scale Dynamic High-Dimensional Indexing

Enhanced Locality Sensitive Clustering in High Dimensional Space

A pivot-based filtering algorithm for enhancing query performance of LSH

DB-LSH: Locality-Sensitive Hashing with Query-based Dynamic Bucketing [article]

LSH At Large - Distributed KNN Search in High Dimensions

A Survey on Efficient Processing of Similarity Queries over Neural Embeddings [article]

An improved method of locality sensitive hashing for indexing large-scale and high-dimensional features

Multi-Probe LSH: Efficient Indexing for High-Dimensional Similarity Search

LIDER: An Efficient High-dimensional Learned Index for Large-scale Dense Passage Retrieval [article]

Distributed similarity search in high dimensions using locality sensitive hashing

SANE: Semantic-Aware Namespace in Ultra-Large-Scale File Systems

A Survey on Big IoT Data Indexing: Potential Solutions, Recent Advancements, and Open Issues

Streaming similarity search over one billion tweets using parallel locality-sensitive hashing

FLASH: Randomized Algorithms Accelerated over CPU-GPU for Ultra-High Dimensional Similarity Search [article]

NV-Tree: An Efficient Disk-Based Index for Approximate Search in Very Large High-Dimensional Collections
