955 Hits in 3.6 sec

Data Independent Method of Constructing Distributed LSH for Large-Scale Dynamic High-Dimensional Indexing

Xiaoguang Gu, Lei Zhang, Dongming Zhang, Yongdong Zhang, Jintao Li, Ning Bao
2012 2012 IEEE 14th International Conference on High Performance Computing and Communication & 2012 IEEE 9th International Conference on Embedded Software and Systems  
However, their method needs to estimate a global parameter on the whole dataset beforehand. It is impractical for a large-scale dynamical dataset.  ...  Constructing effective and efficient indexes for explosive growing multimedia data is a very challenging problem.  ...  CONCLUSIONS In this paper, we propose a data independent method of constructing distributed LSH for the large-scale high-dimensional multimedia feature sets.  ... 
doi:10.1109/hpcc.2012.82 dblp:conf/hpcc/GuZZZLB12 fatcat:lk7pbjucong25bpxhx24auzj4e

Enhanced Locality Sensitive Clustering in High Dimensional Space

Gang Chen, Hao-Lin Gao, Bi-Cheng Li, Guo-En Hu
2014 Transactions on Electrical and Electronic Materials  
To improve the feasibility of large scale data clustering in high dimensional space we propose an enhanced Locality Sensitive Hashing Clustering Method.  ...  These attributes make it well suited to clustering data in high dimensional space.  ...  ACKNOWLEDGMENT This work was supported by Nature Science Foundation of China No. 60872142.  ... 
doi:10.4313/teem.2014.15.3.125 fatcat:yunvp5r3ijdjfgd26o52cdvsrm

A pivot-based filtering algorithm for enhancing query performance of LSH

Lei Zhang, Xiao-guang Gu, Yong-dong Zhang, Dong-ming Zhang, Jin-tao Li
2011 2011 Visual Communications and Image Processing (VCIP)  
In recent years, Locality Sensitive Hashing (LSH) (and its variant Euclidean LSH) has become a popular index structure for large-scale and high-dimensional similarity search problem.  ...  We also provide a method to get optimal pivot for even larger improvement. Experiments show that our algorithm significantly improves the query performance of LSH.  ...  All these variants of LSH are based on the same structure as Euclidean LSH. LSH is efficient to organize and query large-scale and high-dimensional databases.  ... 
doi:10.1109/vcip.2011.6115941 dblp:conf/vcip/ZhangGZZL11 fatcat:f2k5tktnjnbzhg6iwjkcvtp2ua

DB-LSH: Locality-Sensitive Hashing with Query-based Dynamic Bucketing [article]

Yao Tian, Xi Zhao, Xiaofang Zhou
2022 arXiv   pre-print
During the query phase of DB-LSH, a small number of high-quality candidates can be generated efficiently by dynamically constructing query-based hypercubic buckets with the required widths through index-based  ...  To address this dilemma, in this paper we propose a novel LSH scheme called DB-LSH which supports efficient ANN search for large high-dimensional datasets.  ...  However, it is well known that finding the exact NN in large-scale high-dimensional datasets can be very time-consuming.  ... 
arXiv:2207.07823v2 fatcat:bxiico7d7neqtbh5bov53fwrke

LSH At Large - Distributed KNN Search in High Dimensions

Parisa Haghani, Sebastian Michel, Philippe Cudré-Mauroux, Karl Aberer
2008 International Workshop on the Web and Databases  
We consider K-Nearest Neighbor search for high dimensional data in large-scale structured Peer-to-Peer networks.  ...  We report on a comprehensive performance evaluation using high dimensional real-world data, demonstrating the suitability of our approach.  ...  Furthermore, as the data sources are naturally distributed in large-scale networks, traditional centralized indexing technique become impractical.  ... 
dblp:conf/webdb/HaghaniMCA08 fatcat:r3yxlzjhznbd7gdxug5fnzts4e

A Survey on Efficient Processing of Similarity Queries over Neural Embeddings [article]

Yifan Wang
2022 arXiv   pre-print
Then we talk about recent approaches on designing the indexes and operators for highly efficient similarity query processing on top of embeddings (or more generally, high dimensional data).  ...  To measure the similarity between data objects, traditional methods normally work on low level or syntax features (e.g., basic visual features on images or bag-of-word features of text), which makes them  ...  Different from LSH whose hashing functions are data-independent (i.e., selection of the hashing functions is independent from data distribution), learning to hash is a family of data-dependent methods  ... 
arXiv:2204.07922v1 fatcat:u5osyghs6vgppnj5gpnrzhae5y

An improved method of locality sensitive hashing for indexing large-scale and high-dimensional features

Xiaoguang Gu, Yongdong Zhang, Lei Zhang, Dongming Zhang, Jintao Li
2013 Signal Processing  
LSH is originally proposed for resolving the high-dimensional approximate similarity search problem. Until now, many kinds of variations of LSH have been proposed for large-scale indexing.  ...  Much of the interest is focused on improving the query accuracy for skewed data distribution and reducing the storage space.  ...  Conclusion LSH is efficient to index high-dimensional data and its variations can make it index large-scale dataset, thus it has been a popular index structure for large-scale and high-dimensional dataset  ... 
doi:10.1016/j.sigpro.2012.07.014 fatcat:q3sh7npujra5rftvcfzdf6hsqe

Multi-Probe LSH: Efficient Indexing for High-Dimensional Similarity Search

Qin Lv, William Josephson, Zhe Wang, Moses Charikar, Kai Li
2007 Very Large Data Bases Conference  
Similarity indices for high-dimensional data are very desirable for building content-based search systems for feature-rich data such as audio, images, videos, and other sensor data.  ...  We have implemented the multi-probe LSH method and evaluated the implementation with two different high-dimensional datasets.  ...  For reasonably large datasets, the index data structure may even fit into main memory. • High-dimensional: The indexing scheme should work well for datasets with very high intrinsic dimensionalities (e.g  ... 
dblp:conf/vldb/LvJWCL07 fatcat:wp3vfusws5dqrhyscgx4i5f6bu

LIDER: An Efficient High-dimensional Learned Index for Large-scale Dense Passage Retrieval [article]

Yifan Wang, Haodi Ma, Daisy Zhe Wang
2022 arXiv   pre-print
In this paper, we propose LIDER, an efficient high-dimensional Learned Index for large-scale DEnse passage Retrieval.  ...  But most of the existing learned indexes are designed for low dimensional data, which are not suitable for dense passage retrieval with high-dimensional dense embeddings.  ...  An effective dimension reduction method for high-dimensional data is locality-sensitive hashing (LSH).  ... 
arXiv:2205.00970v3 fatcat:qs2w6b465zaabco3nq3ff3p3lq

Distributed similarity search in high dimensions using locality sensitive hashing

Parisa Haghani, Sebastian Michel, Karl Aberer
2009 Proceedings of the 12th International Conference on Extending Database Technology Advances in Database Technology - EDBT '09  
We consider mappings from the multi-dimensional LSH bucket space to the linearly ordered set of peers that jointly maintain the indexed data and derive requirements to achieve high quality search results  ...  In this paper we consider distributed K-Nearest Neighbor (KNN) search and range query processing in high dimensional data.  ...  Furthermore, as the data sources are naturally distributed in large-scale networks, traditional centralized indexing techniques become impractical.  ... 
doi:10.1145/1516360.1516446 dblp:conf/edbt/HaghaniMA09 fatcat:45xdtdzmdva6jkwbkplfoloc2u

SANE: Semantic-Aware Namespace in Ultra-Large-Scale File Systems

Yu Hua, Hong Jiang, Yifeng Zhu, Dan Feng, Lei Xu
2014 IEEE Transactions on Parallel and Distributed Systems  
To the best of our knowledge, this is the first work on semantic-sensitive namespace management for ultra-scale file systems.  ...  Existing large-scale file systems rely on hierarchically structured namespace that leads to severe performance bottlenecks and renders it impossible to support real-time queries on multi-dimensional attributes  ...  Bounded LSH constructs L hash tables, each of which contains M LSH functions that follow the 2-stable Gaussian distribution for the Euclidean distance.  ... 
doi:10.1109/tpds.2013.140 fatcat:2zd6qygebvhk7fylw56fsy24au

A Survey on Big IoT Data Indexing: Potential Solutions, Recent Advancements, and Open Issues

Zineddine Kouahla, Ala-Eddine Benrazek, Mohamed Amine Ferrag, Brahim Farou, Hamid Seridi, Muhammet Kurulay, Adeel Anjum, Alia Asheralieva
2021 Future Internet  
The purpose of this paper is to examine and review existing indexing techniques for large-scale data.  ...  ., privacy and large-scale data mining, are also discussed.  ...  Thus, several challenging areas of research can serve as a basis for possible future research directions for the indexing of large IoT data.  ... 
doi:10.3390/fi14010019 fatcat:xnlzg7cs2fb3lgng65ha5ucf5m

Streaming similarity search over one billion tweets using parallel locality-sensitive hashing

Narayanan Sundaram, Aizana Turmukhametova, Nadathur Satish, Todd Mostak, Piotr Indyk, Samuel Madden, Pradeep Dubey
2013 Proceedings of the VLDB Endowment  
One popular algorithm for similarity search, especially for high dimensional data (where spatial indexes like kd-trees do not perform well) is Locality Sensitive Hashing (LSH), an approximation algorithm  ...  In this paper, we describe a new variant of LSH, called Parallel LSH (PLSH) designed to be extremely efficient, capable of scaling out on multiple nodes and multiple cores, and which supports high-throughput  ...  ACKNOWLEDGEMENTS This work was supported by a grant from Intel, as a part of the Intel Science and Technology Center in Big Data (ISTC-BD).  ... 
doi:10.14778/2556549.2556574 fatcat:z7c2qdi2lvewlkamfphubde7ky

FLASH: Randomized Algorithms Accelerated over CPU-GPU for Ultra-High Dimensional Similarity Search [article]

Yiqiu Wang, Anshumali Shrivastava, Jonathan Wang, Junghee Ryu
2018 arXiv   pre-print
We present FLASH (Fast LSH Algorithm for Similarity search accelerated with HPC), a similarity search system for ultra-high dimensional datasets on a single machine, that does not require similarity computations  ...  and is tailored for high-performance computing platforms.  ...  We would like to thank anonymous reviewers and Rasmus Pagh for discussions on the role of correlations in Theorem 1.  ... 
arXiv:1709.01190v2 fatcat:vic3h5gjnbfnppyruoc4io6rsu

NV-Tree: An Efficient Disk-Based Index for Approximate Search in Very Large High-Dimensional Collections

Herwig Lejsek, Friðrik Heiðar Ásmundsson, Björn Þór Jónsson, Laurent Amsaleg
2009 IEEE Transactions on Pattern Analysis and Machine Intelligence  
collections of high-dimensional data.  ...  Over the last two decades, much research effort has been spent on nearest neighbor search in high-dimensional data sets.  ...  The authors would like to thank Morgunblaðið for the use of their large picture collection, the authors of LSH and SIFT for giving them access to their implementations, and the anonymous reviewers for  ... 
doi:10.1109/tpami.2008.130 pmid:19299861 fatcat:7ofvu7ri2jfcjogpziwvldia4u
Showing results 1 to 15 out of 955 results