Abstract
We propose a new data structure to search in metric spaces. A metric space is formed by a collection of objects and a distance function defined among them which satisfies the triangle inequality. The goal is, given a set of objects and a query, retrieve those objects close enough to the query. The complexity measure is the number of distances computed to achieve this goal. Our data structure, called sa-tree (“spatial approximation tree”), is based on approaching the searched objects spatially, that is, getting closer and closer to them, rather than the classic divide-and-conquer approach of other data structures. We analyze our method and show that the number of distance evaluations to search among n objects is sublinear. We show experimentally that the sa-tree is the best existing technique when the metric space is hard to search or the query has low selectivity. These are the most important unsolved cases in real applications. As a practical advantage, our data structure is one of the few that does not need to tune parameters, which makes it appealing for use by non-experts.
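The spatial-approximation idea can be illustrated with a minimal sketch. It assumes Euclidean points in the plane, and every name in it (`Node`, `build`, `range_search`, `CountingMetric`, `euclidean`) is hypothetical rather than taken from the paper. It is a simplified illustration, not the authors' implementation: each node keeps a set of neighbors, every other object is handed to its closest neighbor, and a range query descends only into neighbors that the triangle inequality cannot rule out. A counter tracks the number of distance evaluations, the cost measure named in the abstract.

```python
import math
import random


def euclidean(a, b):
    """Example metric (an assumption; any function obeying the triangle inequality works)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


class CountingMetric:
    """Wraps a distance function and counts how many times it is evaluated."""

    def __init__(self, dist):
        self.dist = dist
        self.calls = 0

    def __call__(self, a, b):
        self.calls += 1
        return self.dist(a, b)


class Node:
    def __init__(self, obj):
        self.obj = obj
        self.children = []  # subtrees rooted at the neighbors of obj
        self.radius = 0.0   # covering radius: max distance from obj to its subtree


def build(root_obj, others, dist):
    """Build a subtree rooted at root_obj over the objects in `others`."""
    node = Node(root_obj)
    # Visit the other objects from nearest to farthest from the root.
    ranked = sorted(((dist(root_obj, x), x) for x in others), key=lambda t: t[0])
    if ranked:
        node.radius = ranked[-1][0]
    neighbors, rest = [], []
    for d_root, x in ranked:
        # x becomes a neighbor only if it is closer to the root
        # than to every neighbor chosen so far.
        if all(d_root < dist(x, n) for n in neighbors):
            neighbors.append(x)
        else:
            rest.append(x)
    # Each remaining object goes into the bag of its closest neighbor.
    bags = [[] for _ in neighbors]
    for x in rest:
        closest = min(range(len(neighbors)), key=lambda i: dist(x, neighbors[i]))
        bags[closest].append(x)
    node.children = [build(n, bags[i], dist) for i, n in enumerate(neighbors)]
    return node


def range_search(node, q, r, dist, d_node=None, out=None):
    """Collect every object within distance r of q."""
    if out is None:
        out = []
    if d_node is None:
        d_node = dist(q, node.obj)
    if d_node <= r:
        out.append(node.obj)
    d_children = [dist(q, c.obj) for c in node.children]
    d_min = min([d_node] + d_children)
    for child, d_c in zip(node.children, d_children):
        # Enter a child only if it is within 2r of the closest candidate
        # and the query ball can intersect the child's covering ball.
        if d_c <= d_min + 2 * r and d_c <= child.radius + r:
            range_search(child, q, r, dist, d_c, out)
    return out


rng = random.Random(0)
points = [(rng.random(), rng.random()) for _ in range(300)]
metric = CountingMetric(euclidean)
tree = build(points[0], points[1:], metric)

metric.calls = 0  # count only the distances evaluated at query time
query, radius = (0.5, 0.5), 0.1
found = range_search(tree, query, radius, metric)
brute = [p for p in points if euclidean(query, p) <= radius]
print(len(found), "results using", metric.calls, "distance evaluations of", len(points))
```

Comparing `found` with the exhaustive `brute` list is a quick sanity check that the pruning discards no answers. Because each object's distance to the query is evaluated at most once, `metric.calls` never exceeds the number of indexed objects.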