
Searching in metric spaces by spatial approximation

Published: 01 August 2002

Abstract

We propose a new data structure to search in metric spaces. A metric space is formed by a collection of objects and a distance function defined among them that satisfies the triangle inequality. The goal is, given a set of objects and a query, to retrieve those objects close enough to the query. The complexity measure is the number of distances computed to achieve this goal. Our data structure, called sa-tree (“spatial approximation tree”), is based on approaching the searched objects spatially, that is, getting closer and closer to them, rather than on the classic divide-and-conquer approach of other data structures. We analyze our method and show that the number of distance evaluations needed to search among n objects is sublinear. We show experimentally that the sa-tree is the best existing technique when the metric space is hard to search or the query has low selectivity. These are the most important unsolved cases in real applications. As a practical advantage, our data structure is one of the few that does not require parameter tuning, which makes it appealing for use by non-experts.
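
To make the idea of spatial approximation and the cost measure concrete, the following is a minimal sketch, not the article's implementation. It assumes a greedy construction in which each node keeps as neighbors the objects that are closer to it than to any neighbor chosen so far, and a range search that descends only into neighbors that could still hold an answer. The names `SATree`, `manhattan`, `range_search`, and `evals` are illustrative and do not come from the article; the Manhattan distance on integer points is chosen only so that the arithmetic is exact.

```python
def manhattan(p, q):
    """L1 distance on integer tuples; it satisfies the triangle inequality."""
    return sum(abs(a - b) for a, b in zip(p, q))


class SATree:
    """Illustrative spatial-approximation-style tree (a sketch under assumptions)."""

    def __init__(self, points, dist):
        self.dist = dist
        self.evals = 0  # distance evaluations made by the most recent query
        pts = list(points)
        self.root = self._build(pts[0], pts[1:])

    def _build(self, a, rest):
        d = self.dist
        node = {"obj": a, "radius": 0, "children": []}
        neighbors, others = [], []
        # Visit objects from nearest to farthest from a.
        for v in sorted(rest, key=lambda v: d(v, a)):
            dva = d(v, a)
            node["radius"] = max(node["radius"], dva)  # covering radius
            # v becomes a neighbor if it is closer to a than to every
            # neighbor selected so far.
            if all(dva < d(v, b) for b in neighbors):
                neighbors.append(v)
            else:
                others.append(v)
        # Every remaining object goes to the subtree of its closest neighbor.
        bags = [[] for _ in neighbors]
        for v in others:
            i = min(range(len(neighbors)), key=lambda i: d(v, neighbors[i]))
            bags[i].append(v)
        node["children"] = [self._build(b, bag) for b, bag in zip(neighbors, bags)]
        return node

    def _d(self, q, x):
        self.evals += 1  # the complexity measure: one more distance computed
        return self.dist(q, x)

    def range_search(self, q, r):
        """Return all indexed objects within distance r of q."""
        self.evals = 0
        out = []
        dq = self._d(q, self.root["obj"])
        self._search(self.root, q, r, dq, dq, out)
        return out

    def _search(self, node, q, r, d_node, mind, out):
        # Nothing in this subtree can be within r of q.
        if d_node > node["radius"] + r:
            return
        if d_node <= r:
            out.append(node["obj"])
        dists = [self._d(q, c["obj"]) for c in node["children"]]
        # Smallest distance from q to this node, its ancestors, and the
        # neighbors seen along the way.
        mind = min([mind] + dists)
        for c, dc in zip(node["children"], dists):
            # By the triangle inequality, a subtree rooted farther than
            # mind + 2r from q cannot contain an answer.
            if dc <= mind + 2 * r:
                self._search(c, q, r, dc, mind, out)


points = [(1, 1), (2, 5), (6, 2), (7, 7), (3, 3), (8, 1)]
tree = SATree(points, manhattan)
hits = tree.range_search((2, 2), 2)
print(sorted(hits), "distance evaluations:", tree.evals)
```

In this sketch the query never computes the distance to any object more than once, so `evals` is at most the number of indexed objects; how much smaller it is depends on the data and on the search radius. Note also that no tuning parameter appears anywhere in the construction.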

