Abstract
Most similarity search techniques map the data objects into some high-dimensional feature space. The similarity search then corresponds to a nearest-neighbor search in the feature space, which is computationally expensive. In this paper, we present a new parallel method for fast nearest-neighbor search in high-dimensional feature spaces. The core problem in designing a parallel nearest-neighbor algorithm is finding an adequate distribution of the data onto the disks. Unfortunately, the known declustering methods do not perform well for high-dimensional nearest-neighbor search. In contrast, our method has been optimized based on the special properties of high-dimensional spaces and therefore provides a near-optimal distribution of the data items among the disks. The basic idea of our data declustering technique is to assign the buckets corresponding to different quadrants of the data space to different disks. We show that our technique, in contrast to other declustering methods, guarantees that all buckets corresponding to neighboring quadrants are assigned to different disks. We evaluate our method using large amounts of real data (up to 40 MBytes) and compare it with the best known data declustering method, the Hilbert curve. Our experiments show that our method provides an almost linear speed-up and a constant scale-up. Additionally, it outperforms the Hilbert approach by a factor of up to 5.
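The quadrant-to-disk idea can be illustrated with a minimal Python sketch. The abstract does not spell out the assignment function, so everything below is an assumption made for illustration: quadrants are encoded as bit strings by splitting each dimension at a hypothetical midpoint of 0.5, and the disk number is obtained by XOR-ing the values (i + 1) over all set bits i. The helper names `quadrant_code`, `disk_for_quadrant`, and `count_neighbor_conflicts` are likewise hypothetical.

```python
def quadrant_code(point, midpoint=0.5):
    """Return the bit code of the quadrant containing `point`.

    Bit i is set when coordinate i lies in the upper half of dimension i.
    The midpoint split is an assumption of this sketch.
    """
    code = 0
    for i, x in enumerate(point):
        if x >= midpoint:
            code |= 1 << i
    return code


def disk_for_quadrant(code, dims):
    """Map a quadrant code to a disk number.

    Assumed coloring: XOR together (i + 1) for every set bit i.
    Flipping a single bit i changes the result by XOR with (i + 1),
    which is never zero, so quadrants that differ in exactly one
    dimension always land on different disks.
    """
    disk = 0
    for i in range(dims):
        if (code >> i) & 1:
            disk ^= i + 1
    return disk


def count_neighbor_conflicts(dims):
    """Count (quadrant, neighbor) pairs that share a disk.

    Two quadrants are treated as neighbors when their codes differ
    in exactly one bit.
    """
    conflicts = 0
    for code in range(1 << dims):
        for i in range(dims):
            neighbor = code ^ (1 << i)
            if disk_for_quadrant(code, dims) == disk_for_quadrant(neighbor, dims):
                conflicts += 1
    return conflicts
```

For example, `count_neighbor_conflicts(8)` can be used to check exhaustively that, under this assumed coloring, no two neighboring quadrants of an 8-dimensional space share a disk. The sketch presumes that enough disks are available to cover every value the XOR can produce.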
Fast parallel similarity search in multimedia databases
SIGMOD '97: Proceedings of the 1997 ACM SIGMOD International Conference on Management of Data