The Pruning Power: Theory and Heuristics for Mining Databases with Multiple k-Nearest-Neighbor Queries

Böhm, Christian; Braunmüller, Bernhard; Kriegel, Hans-Peter

doi:10.1007/3-540-44466-1_37

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 1874)

Included in the following conference series:

International Conference on Data Warehousing and Knowledge Discovery

Abstract

Numerous data mining algorithms rely heavily on similarity queries. Although many, or even all, of the queries they perform are independent of one another, the algorithms process them sequentially. Recently, a novel technique for efficiently processing multiple similarity queries issued simultaneously has been introduced, and it was shown that multiple similarity queries substantially speed up query-intensive data mining applications. For the important case of multiple k-nearest-neighbor queries on top of a multidimensional index structure, the problem of scheduling directory pages and data pages arises, an aspect that has not been addressed so far. In this paper, we derive the theoretical foundation of this scheduling problem. Additionally, we propose several scheduling algorithms based on our theoretical results. In our experimental evaluation, we show that a scheduling strategy that considers the maximum priority of pages clearly outperforms other scheduling approaches.
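To make the scheduling problem concrete, here is a minimal sketch of processing several k-nearest-neighbor queries simultaneously so that each page load serves every query that can still use it. This is not the authors' algorithm: it assumes a flat, hypothetical index (a list of data pages with bounding boxes, no directory levels), Euclidean distance, and a page priority defined as the negated MINDIST between a query and the page, counted only while that page can still improve the query's current k-NN result. At each step the unprocessed page with the highest maximum priority over all queries is loaded. The names `Page`, `mindist`, and `multi_knn` are hypothetical.

```python
import heapq
import math
from dataclasses import dataclass
from typing import List, Tuple

Point = Tuple[float, ...]


@dataclass
class Page:
    # Hypothetical data page: an axis-aligned bounding box plus its stored points.
    lo: Point
    hi: Point
    points: List[Point]


def mindist(q: Point, page: Page) -> float:
    """Smallest Euclidean distance from q to the page's bounding box."""
    s = 0.0
    for x, lo_i, hi_i in zip(q, page.lo, page.hi):
        d = lo_i - x if x < lo_i else (x - hi_i if x > hi_i else 0.0)
        s += d * d
    return math.sqrt(s)


def multi_knn(queries: List[Point], pages: List[Page], k: int):
    """Answer all k-NN queries together; returns (results, number of page loads)."""
    # One max-heap per query (negated distances) holding the current k candidates.
    cands: List[list] = [[] for _ in queries]

    def prune_dist(i: int) -> float:
        # Distance of the current k-th candidate; infinite until k candidates exist.
        return -cands[i][0][0] if len(cands[i]) == k else math.inf

    unprocessed = set(range(len(pages)))
    loads = 0
    while True:
        # Assumed heuristic: choose the page whose maximum priority over all
        # queries it can still improve is highest (priority = -MINDIST).
        best_page, best_prio = None, -math.inf
        for p in unprocessed:
            for i, q in enumerate(queries):
                md = mindist(q, pages[p])
                if md < prune_dist(i) and -md > best_prio:
                    best_page, best_prio = p, -md
        if best_page is None:
            break  # every remaining page is pruned for every query
        unprocessed.remove(best_page)
        loads += 1
        page = pages[best_page]
        # A single page load is shared by all queries that have not pruned it.
        for i, q in enumerate(queries):
            if mindist(q, page) >= prune_dist(i):
                continue
            for pt in page.points:
                d = math.dist(q, pt)
                if len(cands[i]) < k:
                    heapq.heappush(cands[i], (-d, pt))
                elif d < -cands[i][0][0]:
                    heapq.heapreplace(cands[i], (-d, pt))
    results = [sorted((-nd, pt) for nd, pt in c) for c in cands]
    return results, loads


if __name__ == "__main__":
    pages = [
        Page((0, 0), (1, 1), [(0, 0), (1, 1)]),
        Page((5, 5), (6, 6), [(5, 5), (6, 6)]),
        Page((10, 10), (11, 11), [(10, 10)]),
    ]
    results, loads = multi_knn([(0.2, 0.1), (5.5, 5.4)], pages, k=1)
    print([r[0][1] for r in results])  # [(0, 0), (5, 5)]
    print(loads)  # 2: the third page is pruned for both queries and never read
```

Because pruning distances only shrink, stopping once no page has a MINDIST below any query's pruning distance still yields exact k-NN answers. The scheduling order affects only how quickly those distances shrink, and therefore how many pages are loaded.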

Similar content being viewed by others

SELSH: A Hashing Scheme for Approximate Similarity Search with Early Stop Condition

The Power of Distance Distributions: Cost Models and Scheduling Policies for Quality-Controlled Similarity Queries

Dynamic Multi-probe LSH: An I/O Efficient Index Structure for Approximate Nearest Neighbor Search

References

Berchtold S., Böhm C., Keim D., Kriegel H.-P.: ‘A Cost Model for Nearest Neighbor Search in High-Dimensional Data Spaces’, Proc. 16th ACM SIGACT-SIGMOD-SIGART Symp. on Principles of Database Systems, Tucson, USA, 1997, pp. 78–86.
Braunmüller B., Ester M., Kriegel H.-P., Sander J.: ‘Efficiently Supporting Multiple Similarity Queries for Mining in Metric Databases’, Proc. 16th Int. Conf. on Data Engineering, San Diego, USA, 2000, pp. 256–267.
Berchtold S., Böhm C., Jagadish H.V., Kriegel H.-P., Sander J.: ‘Independent Quantization: An Index Compression Technique for High-Dimensional Spaces’, Proc. Int. Conf. on Data Engineering, San Diego, USA, 2000, pp. 577–588.
Berchtold S., Kriegel H.-P.: ‘S3: Similarity Search in CAD Database Systems’, Proc. ACM SIGMOD Int. Conf. on Management of Data, Tucson, USA, 1997, pp. 564–567.
Berchtold S., Keim D., Kriegel H.-P.: ‘The X-tree: An Index Structure for High-Dimensional Data’, Proc. Int. Conf. on Very Large Data Bases, Mumbai, India, 1996, pp. 28–39.
Breunig M. M., Kriegel H.-P., Ng R., Sander J.: ‘OPTICS-OF: Identifying Local Outliers’, Proc. Conf. on Principles of Data Mining and Knowledge Discovery, Prague, 1999, in: Lecture Notes in Computer Science, Springer, Vol. 1704, 1999, pp. 262–270.
Böhm C.: ‘Efficiently Indexing High-Dimensional Data Spaces’, Ph.D. thesis, University of Munich, Munich, Germany, 1998.
Friedman J. H., Bentley J. L., Finkel R. A.: ‘An Algorithm for Finding Best Matches in Logarithmic Expected Time’, ACM Transactions on Mathematical Software, Vol. 3, No. 3, 1977, pp. 209–226.
Fayyad U. M., Piatetsky-Shapiro G., Smyth P.: ‘From Data Mining to Knowledge Discovery: An Overview’, Advances in Knowledge Discovery and Data Mining, AAAI Press, 1996, pp. 1–34.
Gaede V., Günther O.: ‘Multidimensional Access Methods’, ACM Computing Surveys, Vol. 30, No. 2, 1998, pp. 170–231.
Høg E. et al.: ‘The Tycho Catalogue’, Astronomy and Astrophysics, Vol. 323, 1997, pp. L57–L60.
Hjaltason G. R., Samet H.: ‘Ranking in Spatial Databases’, Proc. Int. Symp. on Large Spatial Databases, Portland, USA, 1995, pp. 83–95.
Knorr E.M., Ng R.T.: ‘Finding Aggregate Proximity Relationships and Commonalities in Spatial Data Mining’, IEEE Transactions on Knowledge and Data Engineering, Vol. 8, No. 6, 1996, pp. 884–897.
Mitchell T.M.: ‘Machine Learning’, McGraw-Hill, 1997.
Roussopoulos N., Kelley S., Vincent F.: ‘Nearest Neighbor Queries’, Proc. ACM SIGMOD Int. Conf. on Management of Data, San Jose, USA, 1995, pp. 71–79.
Samet H.: ‘The Design and Analysis of Spatial Data Structures’, Addison-Wesley, 1989.
Weber R., Schek H.-J., Blott S.: ‘A Quantitative Analysis and Performance Study for Similarity-Search Methods in High-Dimensional Spaces’, Proc. Int. Conf. on Very Large Data Bases, New York, USA, 1998, pp. 194–205.

Author information

Authors and Affiliations

University of Munich, Oettingenstr. 67, D-80538, Munich, Germany
Christian Böhm, Bernhard Braunmüller & Hans-Peter Kriegel

Editor information

Editors and Affiliations

Graduate School of Informatics, Kyoto University, Yoshida-Honmachi, Sakyo-ku, Kyoto, 606-8501, Japan
Yahiko Kambayashi
Computer Science Department, Western Michigan University, Kalamazoo, MI, 49008, USA
Mukesh Mohania
Vienna University of Technology, IFS, Favoritenstr. 9-11/188, 1040, Vienna, Austria
A. Min Tjoa

About this paper

Cite this paper

Böhm, C., Braunmüller, B., Kriegel, HP. (2000). The Pruning Power: Theory and Heuristics for Mining Databases with Multiple k-Nearest-Neighbor Queries. In: Kambayashi, Y., Mohania, M., Tjoa, A.M. (eds) Data Warehousing and Knowledge Discovery. DaWaK 2000. Lecture Notes in Computer Science, vol 1874. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-44466-1_37

DOI: https://doi.org/10.1007/3-540-44466-1_37
Published: 06 July 2001
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-67980-6
Online ISBN: 978-3-540-44466-4
eBook Packages: Springer Book Archive