Fast parallel similarity search in multimedia databases

Authors:
Stefan Berchtold, Christian Böhm, Bernhard Braunmüller, Daniel A. Keim, Hans-Peter Kriegel
University of Munich, Germany

ACM SIGMOD Record, Volume 26, Issue 2, June 1997, pp. 1–12. https://doi.org/10.1145/253262.253263

Published: 01 June 1997

Abstract

Most similarity search techniques map the data objects into some high-dimensional feature space. The similarity search then corresponds to a nearest-neighbor search in the feature space, which is computationally very intensive. In this paper, we present a new parallel method for fast nearest-neighbor search in high-dimensional feature spaces. The core problem of designing a parallel nearest-neighbor algorithm is to find an adequate distribution of the data onto the disks. Unfortunately, the known declustering methods do not perform well for high-dimensional nearest-neighbor search. In contrast, our method has been optimized based on the special properties of high-dimensional spaces and therefore provides a near-optimal distribution of the data items among the disks. The basic idea of our data declustering technique is to assign the buckets corresponding to different quadrants of the data space to different disks. We show that our technique, in contrast to other declustering methods, guarantees that all buckets corresponding to neighboring quadrants are assigned to different disks. We evaluate our method using large amounts of real data (up to 40 MBytes) and compare it with the best known data declustering method, the Hilbert curve. Our experiments show that our method provides an almost linear speed-up and a constant scale-up. Additionally, it outperforms the Hilbert approach by a factor of up to 5.
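The quadrant-based declustering idea can be illustrated with a small sketch. The abstract does not give the assignment function, so everything below is an assumption made for illustration: quadrants are numbered by a bit string (bit i is set when a point lies in the upper half of dimension i), two quadrants count as neighbors when their numbers differ in exactly one bit, and the disk is chosen by XOR-ing the values i + 1 over all set bits. The names `quadrant_id`, `disk_for_quadrant`, `disks_needed`, and `check_neighbors_separated` are hypothetical and do not come from the paper.

```python
def quadrant_id(point, split=0.5):
    """Number the quadrant of a point in the unit cube.

    Bit i of the result is 1 when the point lies in the upper half
    of dimension i (assumed split position: 0.5).
    """
    return sum(1 << i for i, x in enumerate(point) if x >= split)


def disk_for_quadrant(quadrant, dims):
    """Illustrative disk assignment: XOR of (i + 1) over every set bit i.

    This is an assumed coloring, chosen because flipping any single
    bit changes the result by a non-zero value.
    """
    disk = 0
    for i in range(dims):
        if (quadrant >> i) & 1:
            disk ^= i + 1
    return disk


def disks_needed(dims):
    """Number of distinct disk ids the XOR assignment can produce."""
    return 1 << dims.bit_length()


def neighbors(quadrant, dims):
    """Quadrants that differ from the given one in exactly one dimension."""
    return [quadrant ^ (1 << i) for i in range(dims)]


def check_neighbors_separated(dims):
    """Return True if no quadrant shares a disk with any of its neighbors."""
    for q in range(1 << dims):
        for n in neighbors(q, dims):
            if disk_for_quadrant(q, dims) == disk_for_quadrant(n, dims):
                return False
    return True


if __name__ == "__main__":
    dims = 8
    q = quadrant_id((0.2, 0.9, 0.7, 0.1, 0.4, 0.6, 0.3, 0.8))
    print("quadrant:", q, "disk:", disk_for_quadrant(q, dims))
    print("neighbors separated:", check_neighbors_separated(dims))
```

Under these assumptions, flipping bit i changes the disk number by XOR with i + 1, which is never zero, so a quadrant and its direct neighbors always land on different disks. The sketch says nothing about the paper's measured speed-up or its comparison with the Hilbert curve.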


Index Terms

Fast parallel similarity search in multimedia databases
1. Information systems
2. Theory of computation
  1. Models of computation
    1. Concurrency
      1. Parallel computing models
  2. Theory and algorithms for application domains
    1. Database theory
      1. Database query processing and optimization (theory)


Published in
ACM SIGMOD Record, Volume 26, Issue 2, June 1997, 583 pages
ISSN: 0163-5808
DOI: 10.1145/253262
Chairman: Sudha Ram (Univ. of Arizona, Tucson)
Editor: Joan M. Peckham (Univ. of Rhode Island, Kingston)

SIGMOD '97: Proceedings of the 1997 ACM SIGMOD international conference on Management of data, June 1997, 594 pages
ISBN: 0897919114
DOI: 10.1145/253260
Editors: Joan M. Peckham (Univ. of Rhode Island, Kingston), Sudha Ram (Univ. of Arizona, Tucson), Michael Franklin

Copyright © 1997 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States