
Fast parallel similarity search in multimedia databases

Published: 01 June 1997

Abstract

Most similarity search techniques map the data objects into some high-dimensional feature space. The similarity search then corresponds to a nearest-neighbor search in the feature space, which is computationally very intensive. In this paper, we present a new parallel method for fast nearest-neighbor search in high-dimensional feature spaces. The core problem of designing a parallel nearest-neighbor algorithm is to find an adequate distribution of the data onto the disks. Unfortunately, the known declustering methods do not perform well for high-dimensional nearest-neighbor search. In contrast, our method has been optimized based on the special properties of high-dimensional spaces and therefore provides a near-optimal distribution of the data items among the disks. The basic idea of our data declustering technique is to assign the buckets corresponding to different quadrants of the data space to different disks. We show that our technique, in contrast to other declustering methods, guarantees that all buckets corresponding to neighboring quadrants are assigned to different disks. We evaluate our method using large amounts of real data (up to 40 MBytes) and compare it with the best known data declustering method, the Hilbert curve. Our experiments show that our method provides an almost linear speed-up and a constant scale-up. Additionally, it outperforms the Hilbert approach by a factor of up to 5.
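
The quadrant-based assignment described in the abstract can be illustrated with a small sketch. The abstract does not state the assignment function, so the XOR-based rule below is an assumption, chosen only because it satisfies the stated property for directly neighboring quadrants (those that differ in exactly one dimension). The names `quadrant_code`, `disk_for_quadrant`, and `neighbors_differ` are hypothetical, as is the midpoint split of a unit data space.

```python
from itertools import product


def quadrant_code(point, split=0.5):
    """Return one bit per dimension: 1 if the coordinate lies in the upper half.

    Assumes (for illustration) that the data space is the unit hypercube
    and that each dimension is split once at `split`.
    """
    return tuple(1 if x >= split else 0 for x in point)


def disk_for_quadrant(code):
    """Map a quadrant bit code to a disk number.

    Assumed rule: XOR together (i + 1) for every dimension i whose bit is set.
    Flipping a single bit i changes the result by XOR with (i + 1), which is
    never zero, so directly neighboring quadrants get different disk numbers.
    """
    disk = 0
    for i, bit in enumerate(code):
        if bit:
            disk ^= i + 1
    return disk


def neighbors_differ(d):
    """Brute-force check: every pair of quadrants that differ in exactly
    one dimension is mapped to two different disks."""
    for code in product((0, 1), repeat=d):
        for i in range(d):
            flipped = code[:i] + (1 - code[i],) + code[i + 1:]
            if disk_for_quadrant(code) == disk_for_quadrant(flipped):
                return False
    return True


if __name__ == "__main__":
    point = (0.1, 0.9, 0.5)
    code = quadrant_code(point)
    print(code, disk_for_quadrant(code))
    print(neighbors_differ(8))
```

Under this assumed rule, the disk numbers for d dimensions stay below the smallest power of two greater than d, so the sketch needs that many disks. How buckets are then stored and searched on each disk is outside the scope of this illustration.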



              Published in

              ACM SIGMOD Record, Volume 26, Issue 2
              June 1997
              583 pages
              ISSN: 0163-5808
              DOI: 10.1145/253262

              SIGMOD '97: Proceedings of the 1997 ACM SIGMOD international conference on Management of data
              June 1997
              594 pages
              ISBN: 0897919114
              DOI: 10.1145/253260

              Copyright © 1997 ACM

              Publisher

              Association for Computing Machinery

              New York, NY, United States
