ABSTRACT
Similarity search in high-dimensional metric spaces is a key operation in many applications, such as multimedia databases, image retrieval, and object recognition. The high dimensionality of the data requires special index structures to facilitate the search. Most existing indexes are constructed by partitioning the data set using distance-based criteria. However, those methods either produce disjoint partitions but ignore the distribution properties of the data, or produce non-disjoint groups, which greatly degrade search performance. In this paper, we study the performance of a new index structure, called the Ball-and-Plane tree (BP-tree), which overcomes these disadvantages. The BP-tree is constructed by recursively dividing the data set into compact clusters. Unlike other techniques, it integrates the advantages of both the disjoint and non-disjoint paradigms to achieve a structure of tight, low-overlap clusters, yielding significantly improved performance. Results obtained from an extensive experimental evaluation with real-world data sets show that the BP-tree consistently outperforms state-of-the-art solutions.
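To make the two partitioning paradigms concrete, the following minimal sketch shows the standard triangle-inequality pruning tests for a range query against a ball region (which may overlap its siblings) and against a generalized-hyperplane split (which is disjoint). It is not the BP-tree algorithm itself, which the abstract does not specify; the function names, the Euclidean distance, and the sample points are assumptions chosen for illustration.

```python
import math


def euclidean(a, b):
    """Illustrative metric; any function obeying the triangle inequality works."""
    return math.dist(a, b)


def ball_may_contain(query, radius, center, covering_radius, dist=euclidean):
    """Ball region test (hypothetical helper).

    Every object o in the ball satisfies d(o, center) <= covering_radius,
    so d(query, o) >= d(query, center) - covering_radius. The ball can be
    skipped when that lower bound exceeds the query radius.
    """
    return dist(query, center) - covering_radius <= radius


def hyperplane_sides(query, radius, p1, p2, dist=euclidean):
    """Generalized-hyperplane test (hypothetical helper).

    Objects are assigned to the closer of the two pivots p1 and p2, giving
    two disjoint sides. For an object o on the p1 side,
    d(query, o) >= (d(query, p1) - d(query, p2)) / 2, and symmetrically
    for the p2 side. Returns which sides may hold query results.
    """
    d1, d2 = dist(query, p1), dist(query, p2)
    visit_first = (d1 - d2) / 2 <= radius
    visit_second = (d2 - d1) / 2 <= radius
    return visit_first, visit_second


if __name__ == "__main__":
    q = (1.0, 0.0)
    print(ball_may_contain(q, 1.0, (6.0, 0.0), 2.0))
    print(hyperplane_sides(q, 1.0, (0.0, 0.0), (10.0, 0.0)))
```

A ball test can prune tightly around a compact cluster, but sibling balls may overlap and force several subtrees to be visited; the hyperplane test never produces overlapping regions but says nothing about how far the data extends on each side. An index that combines both kinds of bound can apply whichever test is stronger for a given query.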
Index Terms
- BP-tree: an efficient index for similarity search in high-dimensional metric spaces