The Pruning Power: Theory and Heuristics for Mining Databases with Multiple k-Nearest-Neighbor Queries

  • Conference paper
Data Warehousing and Knowledge Discovery (DaWaK 2000)

Part of the book series: Lecture Notes in Computer Science ((LNCS,volume 1874))

Abstract

Numerous data mining algorithms rely heavily on similarity queries. Although many or even all of the performed queries do not depend on each other, the algorithms process them sequentially. Recently, a novel technique for efficiently processing multiple similarity queries issued simultaneously has been introduced. It was shown that multiple similarity queries substantially speed up query-intensive data mining applications. For the important case of multiple k-nearest-neighbor queries on top of a multidimensional index structure, the problem of scheduling directory pages and data pages arises. This aspect has not been addressed so far. In this paper, we derive the theoretical foundation of this scheduling problem. Additionally, we propose several scheduling algorithms based on our theoretical results. In our experimental evaluation, we show that considering the maximum priority of pages clearly outperforms other scheduling approaches.
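
The scheduling idea summarized in the abstract can be illustrated with a small sketch. It is not the paper's algorithm: the abstract does not define "priority", so the code assumes that a page's priority for one query is its minimum distance to that query (closer is better) and that the page scheduled next is the one with the best such value over all queries that still need it. The flat list of data pages (no directory level), the 2-D points, and the names `mindist`, `bounding_box`, and `multi_knn` are all hypothetical choices made for illustration.

```python
import heapq
import math


def mindist(q, box):
    """Smallest Euclidean distance from point q to an axis-aligned box."""
    lo, hi = box
    return math.sqrt(sum(max(l - x, 0.0, x - h) ** 2
                         for x, l, h in zip(q, lo, hi)))


def bounding_box(points):
    """Axis-aligned bounding box (lower corner, upper corner) of a page."""
    lo = tuple(min(c) for c in zip(*points))
    hi = tuple(max(c) for c in zip(*points))
    return lo, hi


def multi_knn(pages, queries, k):
    """Answer all k-NN queries together, loading each data page at most once.

    Assumption (not from the paper): a page is scheduled by the smallest
    mindist over all queries that cannot yet prune it.
    """
    boxes = [bounding_box(p) for p in pages]
    # Per query: a max-heap (via negated distances) of its current k best.
    best = [[] for _ in queries]
    pending = set(range(len(pages)))
    loads = 0

    def kth_dist(i):
        # Pruning distance of query i; infinite until k candidates exist.
        return -best[i][0][0] if len(best[i]) == k else math.inf

    while pending:
        scored = []
        for p in pending:
            needed = [d for i, q in enumerate(queries)
                      if (d := mindist(q, boxes[p])) < kth_dist(i)]
            if needed:
                scored.append((min(needed), p))
        if not scored:
            break  # every remaining page is pruned for every query
        _, page = min(scored)
        pending.remove(page)
        loads += 1  # one simulated page access serves all queries
        for i, q in enumerate(queries):
            if mindist(q, boxes[page]) >= kth_dist(i):
                continue  # this query can prune the page
            for pt in pages[page]:
                d = math.dist(q, pt)
                if len(best[i]) < k:
                    heapq.heappush(best[i], (-d, pt))
                elif d < -best[i][0][0]:
                    heapq.heapreplace(best[i], (-d, pt))

    # Return (distance, point) pairs per query, nearest first.
    results = [sorted((-nd, pt) for nd, pt in h) for h in best]
    return results, loads
```

The returned `loads` counter makes the effect of the schedule visible: a page that every query can prune is never loaded, and a page needed by several queries is loaded only once. Other aggregation rules, such as summing priorities over queries, can be swapped into the `scored` computation to compare schedules on the same data.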

Copyright information

© 2000 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Böhm, C., Braunmüller, B., Kriegel, HP. (2000). The Pruning Power: Theory and Heuristics for Mining Databases with Multiple k-Nearest-Neighbor Queries. In: Kambayashi, Y., Mohania, M., Tjoa, A.M. (eds) Data Warehousing and Knowledge Discovery. DaWaK 2000. Lecture Notes in Computer Science, vol 1874. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-44466-1_37

  • DOI: https://doi.org/10.1007/3-540-44466-1_37

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-67980-6

  • Online ISBN: 978-3-540-44466-4

  • eBook Packages: Springer Book Archive
