Data Independent Method of Constructing Distributed LSH for Large-Scale Dynamic High-Dimensional Indexing
2012
2012 IEEE 14th International Conference on High Performance Computing and Communication & 2012 IEEE 9th International Conference on Embedded Software and Systems
However, their method needs to estimate a global parameter on the whole dataset beforehand. It is impractical for a large-scale dynamical dataset. ...
Constructing effective and efficient indexes for explosive growing multimedia data is a very challenging problem. ...
CONCLUSIONS In this paper, we propose a data independent method of constructing distributed LSH for the large-scale high-dimensional multimedia feature sets. ...
doi:10.1109/hpcc.2012.82
dblp:conf/hpcc/GuZZZLB12
fatcat:lk7pbjucong25bpxhx24auzj4e
Enhanced Locality Sensitive Clustering in High Dimensional Space
2014
Transactions on Electrical and Electronic Materials
To improve the feasibility of large scale data clustering in high dimensional space we propose an enhanced Locality Sensitive Hashing Clustering Method. ...
These attributes make it well suited to clustering data in high dimensional space. ...
ACKNOWLEDGMENT This work was supported by Nature Science Foundation of China No. 60872142. ...
doi:10.4313/teem.2014.15.3.125
fatcat:yunvp5r3ijdjfgd26o52cdvsrm
A pivot-based filtering algorithm for enhancing query performance of LSH
2011
2011 Visual Communications and Image Processing (VCIP)
In recent years, Locality Sensitive Hashing (LSH) (and its variant Euclidean LSH) has become a popular index structure for large-scale and high-dimensional similarity search problem. ...
We also provide a method to get optimal pivot for even larger improvement. Experiments show that our algorithm significantly improves the query performance of LSH. ...
All these variants of LSH are based on the same structure as Euclidean LSH. LSH is efficient to organize and query large-scale and high-dimensional databases. ...
doi:10.1109/vcip.2011.6115941
dblp:conf/vcip/ZhangGZZL11
fatcat:f2k5tktnjnbzhg6iwjkcvtp2ua
DB-LSH: Locality-Sensitive Hashing with Query-based Dynamic Bucketing
[article]
2022
arXiv
pre-print
During the query phase of DB-LSH, a small number of high-quality candidates can be generated efficiently by dynamically constructing query-based hypercubic buckets with the required widths through index-based ...
To address this dilemma, in this paper we propose a novel LSH scheme called DB-LSH which supports efficient ANN search for large high-dimensional datasets. ...
However, it is well known that finding the exact NN in large-scale high-dimensional datasets can be very time-consuming. ...
arXiv:2207.07823v2
fatcat:bxiico7d7neqtbh5bov53fwrke
LSH At Large - Distributed KNN Search in High Dimensions
2008
International Workshop on the Web and Databases
We consider K-Nearest Neighbor search for high dimensional data in large-scale structured Peer-to-Peer networks. ...
We report on a comprehensive performance evaluation using high dimensional real-world data, demonstrating the suitability of our approach. ...
Furthermore, as the data sources are naturally distributed in large-scale networks, traditional centralized indexing techniques become impractical. ...
dblp:conf/webdb/HaghaniMCA08
fatcat:r3yxlzjhznbd7gdxug5fnzts4e
A Survey on Efficient Processing of Similarity Queries over Neural Embeddings
[article]
2022
arXiv
pre-print
Then we talk about recent approaches on designing the indexes and operators for highly efficient similarity query processing on top of embeddings (or more generally, high dimensional data). ...
To measure the similarity between data objects, traditional methods normally work on low level or syntax features (e.g., basic visual features on images or bag-of-word features of text), which makes them ...
Different from LSH whose hashing functions are data-independent (i.e., selection of the hashing functions is independent from data distribution), learning to hash is a family of data-dependent methods ...
arXiv:2204.07922v1
fatcat:u5osyghs6vgppnj5gpnrzhae5y
An improved method of locality sensitive hashing for indexing large-scale and high-dimensional features
2013
Signal Processing
LSH is originally proposed for resolving the high-dimensional approximate similarity search problem. Until now, many kinds of variations of LSH have been proposed for large-scale indexing. ...
Much of the interest is focused on improving the query accuracy for skewed data distribution and reducing the storage space. ...
Conclusion LSH is efficient to index high-dimensional data and its variations can make it index large-scale dataset, thus it has been a popular index structure for large-scale and high-dimensional dataset ...
doi:10.1016/j.sigpro.2012.07.014
fatcat:q3sh7npujra5rftvcfzdf6hsqe
Multi-Probe LSH: Efficient Indexing for High-Dimensional Similarity Search
2007
Very Large Data Bases Conference
Similarity indices for high-dimensional data are very desirable for building content-based search systems for feature-rich data such as audio, images, videos, and other sensor data. ...
We have implemented the multi-probe LSH method and evaluated the implementation with two different high-dimensional datasets. ...
For reasonably large datasets, the index data structure may even fit into main memory. • High-dimensional: The indexing scheme should work well for datasets with very high intrinsic dimensionalities (e.g ...
dblp:conf/vldb/LvJWCL07
fatcat:wp3vfusws5dqrhyscgx4i5f6bu
LIDER: An Efficient High-dimensional Learned Index for Large-scale Dense Passage Retrieval
[article]
2022
arXiv
pre-print
In this paper, we propose LIDER, an efficient high-dimensional Learned Index for large-scale DEnse passage Retrieval. ...
But most of the existing learned indexes are designed for low dimensional data, which are not suitable for dense passage retrieval with high-dimensional dense embeddings. ...
An effective dimension reduction method for high-dimensional data is locality-sensitive hashing (LSH). ...
arXiv:2205.00970v3
fatcat:qs2w6b465zaabco3nq3ff3p3lq
Distributed similarity search in high dimensions using locality sensitive hashing
2009
Proceedings of the 12th International Conference on Extending Database Technology Advances in Database Technology - EDBT '09
We consider mappings from the multi-dimensional LSH bucket space to the linearly ordered set of peers that jointly maintain the indexed data and derive requirements to achieve high quality search results ...
In this paper we consider distributed K-Nearest Neighbor (KNN) search and range query processing in high dimensional data. ...
Furthermore, as the data sources are naturally distributed in large-scale networks, traditional centralized indexing techniques become impractical. ...
doi:10.1145/1516360.1516446
dblp:conf/edbt/HaghaniMA09
fatcat:45xdtdzmdva6jkwbkplfoloc2u
SANE: Semantic-Aware Namespace in Ultra-Large-Scale File Systems
2014
IEEE Transactions on Parallel and Distributed Systems
To the best of our knowledge, this is the first work on semantic-sensitive namespace management for ultra-scale file systems. ...
Existing large-scale file systems rely on hierarchically structured namespace that leads to severe performance bottlenecks and renders it impossible to support real-time queries on multi-dimensional attributes ...
Bounded LSH constructs L hash tables, each of which contains M LSH functions that follow the 2-stable Gaussian distribution for the Euclidean distance. ...
doi:10.1109/tpds.2013.140
fatcat:2zd6qygebvhk7fylw56fsy24au
A Survey on Big IoT Data Indexing: Potential Solutions, Recent Advancements, and Open Issues
2021
Future Internet
The purpose of this paper is to examine and review existing indexing techniques for large-scale data. ...
., privacy and large-scale data mining, are also discussed. ...
Thus, several challenging areas of research can serve as a basis for possible future research directions for the indexing of large IoT data. ...
doi:10.3390/fi14010019
fatcat:xnlzg7cs2fb3lgng65ha5ucf5m
Streaming similarity search over one billion tweets using parallel locality-sensitive hashing
2013
Proceedings of the VLDB Endowment
One popular algorithm for similarity search, especially for high dimensional data (where spatial indexes like kd-trees do not perform well) is Locality Sensitive Hashing (LSH), an approximation algorithm ...
In this paper, we describe a new variant of LSH, called Parallel LSH (PLSH) designed to be extremely efficient, capable of scaling out on multiple nodes and multiple cores, and which supports high-throughput ...
ACKNOWLEDGEMENTS This work was supported by a grant from Intel, as a part of the Intel Science and Technology Center in Big Data (ISTC-BD). ...
doi:10.14778/2556549.2556574
fatcat:z7c2qdi2lvewlkamfphubde7ky
FLASH: Randomized Algorithms Accelerated over CPU-GPU for Ultra-High Dimensional Similarity Search
[article]
2018
arXiv
pre-print
We present FLASH (Fast LSH Algorithm for Similarity search accelerated with HPC), a similarity search system for ultra-high dimensional datasets on a single machine, that does not require similarity computations ...
and is tailored for high-performance computing platforms. ...
We would like to thank anonymous reviewers and Rasmus Pagh for discussions on the role of correlations in Theorem 1. ...
arXiv:1709.01190v2
fatcat:vic3h5gjnbfnppyruoc4io6rsu
NV-Tree: An Efficient Disk-Based Index for Approximate Search in Very Large High-Dimensional Collections
2009
IEEE Transactions on Pattern Analysis and Machine Intelligence
collections of high-dimensional data. ...
Over the last two decades, much research effort has been spent on nearest neighbor search in high-dimensional data sets. ...
The authors would like to thank Morgunblaðið for the use of their large picture collection, the authors of LSH and SIFT for giving them access to their implementations, and the anonymous reviewers for ...
doi:10.1109/tpami.2008.130
pmid:19299861
fatcat:7ofvu7ri2jfcjogpziwvldia4u