104 Hits in 4.1 sec

A pivot-based filtering algorithm for enhancing query performance of LSH

Lei Zhang, Xiao-guang Gu, Yong-dong Zhang, Dong-ming Zhang, Jin-tao Li
2011 2011 Visual Communications and Image Processing (VCIP)  
In this paper, we analyze a phenomenon we called "Non-Uniform" that degrades the query performance of LSH and propose a pivot-based algorithm to improve the query performance.  ...  We also provide a method to get optimal pivot for even larger improvement. Experiments show that our algorithm significantly improves the query performance of LSH.  ...  However, when using LSH for query, a final filtering process based on exact similarity measure is needed.  ... 
doi:10.1109/vcip.2011.6115941 dblp:conf/vcip/ZhangGZZL11 fatcat:f2k5tktnjnbzhg6iwjkcvtp2ua

A fast audio similarity retrieval method for millions of music tracks

Dominik Schnitzer, Arthur Flexer, Gerhard Widmer
2010 Multimedia tools and applications  
We present a filter-and-refine method to speed up nearest neighbor searches with the Kullback-Leibler divergence for multivariate Gaussians.  ...  Overall the method accelerates the search for similar music pieces by a factor of 10-30 and yields high recall values of 95-99% compared to a standard linear search.  ...  For very high-dimensional data Locality Sensitive Hashing (LSH, [1]) should be used as the aforementioned algorithms are likely to perform worse or equal than a linear scan with very high dimensional  ... 
doi:10.1007/s11042-010-0679-8 fatcat:bjqi4ujejbgtlm4b4uzhaxz4fa

Double Distance-Calculation-Pruning for Similarity Search

Ives Pola, Fernanda Pola, Danilo Eler
2018 Information  
In this paper, we propose a generic concept that uses both lower and upper bound properties based on the Metric Spaces Theory to increase the avoidance of element comparisons.  ...  We analyzed the prunability power increase and show an example of its application on classical join nested loops algorithms.  ...  Acknowledgments: The authors would like to thank FAPESP (Fundação de Amparo à Pesquisa do Estado de São Paulo) for the financial support.  ... 
doi:10.3390/info9050124 fatcat:ix6dyzasebenridcib3jx2ifqq

Off the Beaten Path

Leonid Boytsov, David Novak, Yury Malkov, Eric Nyberg
2016 Proceedings of the 25th ACM International on Conference on Information and Knowledge Management - CIKM '16  
While an exact brute-force k-NN search using this similarity function is slow, we demonstrate that an approximate algorithm can be nearly two orders of magnitude faster at the expense of only a small loss  ...  We address this issue by replacing the term-based search with a generic k-NN retrieval algorithm, where a similarity function can take into account subtle term associations.  ...  We also thank Di Wang for helping with a Lucene baseline; Chris Dyer for a discussion of IBM Model 1 efficiency; Yoav Goldberg, Manaal Faruqui, Chenyan Xiong, Ruey-Cheng Chen for discussions related to  ... 
doi:10.1145/2983323.2983815 dblp:conf/cikm/BoytsovNMN16 fatcat:7u24e5nm6fev3ni2rppgxfl4cm

A Survey on Efficient Processing of Similarity Queries over Neural Embeddings [article]

Yifan Wang
2022 arXiv   pre-print
Similarity query is the family of queries based on some similarity metrics.  ...  Then we talk about recent approaches on designing the indexes and operators for highly efficient similarity query processing on top of embeddings (or more generally, high dimensional data).  ...  In short, any query looking for answers based on similarity between records instead of exact value match is a similarity query.  ... 
arXiv:2204.07922v1 fatcat:u5osyghs6vgppnj5gpnrzhae5y

A Multi-View Deep Learning Approach for Cross Domain User Modeling in Recommendation Systems

Ali Mamdouh Elkahky, Yang Song, Xiaodong He
2015 Proceedings of the 24th International Conference on World Wide Web - WWW '15  
Results indicate that our approach is significantly better than the state-of-the-art algorithms (up to 49% enhancement on existing users and 115% enhancement on new users).  ...  We propose to use a rich feature set to represent users, according to their web browsing history and search queries.  ...  In user collaborative filtering such as [3], the algorithm computes the similarity between users based on items they liked.  ... 
doi:10.1145/2736277.2741667 dblp:conf/www/ElkahkySH15 fatcat:dbvcoir2qngppc2kqdtxk4kuqi

A Survey of Blocking and Filtering Techniques for Entity Resolution [article]

George Papadakis, Dimitrios Skoutas, Emmanouil Thanos, Themis Palpanas
2020 arXiv   pre-print
knowledge and of Filtering for high similarity thresholds.  ...  This includes large volumes of semi-structured data, which pose challenges not only to the scalability of efficiency techniques, but also to their core assumptions: the requirement of Blocking for schema  ...  MinHash LSH is combined with SN in [92] : when searching for the nearest neighbors of a query entity, the entities in large LSH blocks are sorted via a custom scoring function and, then, a window of fixed  ... 
arXiv:1905.06167v4 fatcat:zoodv75tazg23cfnq4dwfgt6ge

Multimedia Indexing, Search, and Retrieval in Large Databases of Social Networks [chapter]

Theodoros Semertzidis, Dimitrios Rafailidis, Eleftherios Tiakas, Michael G. Strintzis, Petros Daras
2012 Computer Communications and Networks  
This plethora of content created the need for finding the desired media in the social media universe.  ...  Moreover, the diversity of the available content, inspired users to demand and formulate more complicated queries.  ...  Finally, in Fig. 4, we evaluate the retrieval accuracy of LSH, by performing 1000 top-100 queries (denoted by 100-NN queries) and varying the dimensionality of the SIFT datasets.  ... 
doi:10.1007/978-1-4471-4555-4_3 dblp:series/ccn/SemertzidisRTSD13 fatcat:6l3whv5qcjgshmjwcqh6dgelb4

Blocking and Filtering Techniques for Entity Resolution

George Papadakis, Dimitrios Skoutas, Emmanouil Thanos, Themis Palpanas
2020 ACM Computing Surveys  
In this survey, we review a large number of relevant works under two different but related frameworks: Blocking and Filtering.  ...  For each framework we provide a comprehensive list of the relevant works, discussing them in the greater context. We conclude with the most promising directions for future work in the field.  ...  MinHash LSH is combined with SN in [88] : when searching for the nearest neighbors of a query entity, the entities in large LSH blocks are sorted via a custom scoring function, and then a window of fixed  ... 
doi:10.1145/3377455 fatcat:uuzuuxwwzrfg7cwfwzswdqvklm

Query of CC: Unearthing Large Scale Domain-Specific Knowledge from Public Corpora [article]

Zhaoye Fei, Yunfan Shao, Linyang Li, Zhiyuan Zeng, Conghui He, Hang Yan, Dahua Lin, Xipeng Qiu
2024 arXiv   pre-print
To address this limitation, we propose an efficient data collection method Query of CC based on large language models.  ...  Large language models have demonstrated remarkable potential in various tasks, however, there remains a significant scarcity of open-source models and data for specific domains.  ...  Exploring the potential impact of retriever selection on the quality of collected data might be a pivotal direction for future research.  ... 
arXiv:2401.14624v3 fatcat:rivf3fuewbfr3hzwbk34jmoh2m

A Survey on Big IoT Data Indexing: Potential Solutions, Recent Advancements, and Open Issues

Zineddine Kouahla, Ala-Eddine Benrazek, Mohamed Amine Ferrag, Brahim Farou, Hamid Seridi, Muhammet Kurulay, Adeel Anjum, Alia Asheralieva
2021 Future Internet  
A taxonomy of indexing techniques is proposed to enable researchers to understand and select the techniques that will serve as a basis for designing a new indexing scheme.  ...  The purpose of this paper is to examine and review existing indexing techniques for large-scale data.  ...  Thus, several challenging areas of research can serve as a basis for possible future research directions for the indexing of large IoT data.  ... 
doi:10.3390/fi14010019 fatcat:xnlzg7cs2fb3lgng65ha5ucf5m

Efficient Joinable Table Discovery in Data Lakes: A High-Dimensional Similarity-Based Approach [article]

Yuyang Dong, Kunihiro Takeoka, Chuan Xiao, Masafumi Oyamada
2021 arXiv   pre-print
To efficiently find joinable tables with similarity, we propose a block-and-verify method that utilizes pivot-based filtering.  ...  In this paper, we propose PEXESO, a framework for joinable table discovery in data lakes.  ...  Takuma Nozawa (NEC Corporation) for discussions.  ... 
arXiv:2010.13273v4 fatcat:jg5g4jrqnfhpjelexhzwlk4ecy

Content-Based Image Retrieval by Query Adaptive Search using Hash Codes

Aarthi E, Kaviya S, Keerthika D, Kirithika B
2017 IJARCCE  
This can be achieved by first off offline learning bitwise weights of the hash codes for a various set of predefined linguistics thought categories.  ...  This paper introduces associate approach that allows query-adaptive ranking of the came pictures with equal playing distances to the queries.  ...  Thereafter, a progressive algorithm with adaptive filter technique was proposed for efficient skyline computation in this environment and summarizes the key principles of algorithm into a query routing  ... 
doi:10.17148/ijarcce.2017.6342 fatcat:7f36fh4drfe5tcoar6pj4dmvbu

Hashing for Similarity Search: A Survey [article]

Jingdong Wang, Heng Tao Shen, Jingkuan Song, Jianqiu Ji
2014 arXiv   pre-print
Similarity search (nearest neighbor search) is a problem of pursuing the data items whose distances to a query item are the smallest from a large database.  ...  Various methods have been developed to address this problem, and recently a lot of efforts have been devoted to approximate search.  ...  For the subset, an LSH scheme is conducted. The query process first locates a bucket from outer hash tables for a query. If the bucket is empty, the algorithm stops.  ... 
arXiv:1408.2927v1 fatcat:reknwesjnbafvcbouyudrzp4rq

Beyond Precision: A Study on Recall of Initial Retrieval with Neural Representations [article]

Yan Xiao, Jiafeng Guo, Yixing Fan, Yanyan Lan, Jun Xu, Xueqi Cheng
2018 arXiv   pre-print
Specifically, to meet the efficiency requirement of the initial stage, we introduce a neural index for the neural representations of documents, and propose two hybrid search schemes based on both neural  ...  Vocabulary mismatch is a central problem in information retrieval (IR), i.e., the relevant documents may not contain the same (symbolic) terms of the query.  ...  Given a query, the documents sharing a prespecified number of k-NPs with the query are filtered to compute real distance.  ... 
arXiv:1806.10869v2 fatcat:f7ggl2nnszchzdhqmkupfc63y4