Abstract
Numerous data mining algorithms rely heavily on similarity queries. Although many or even all of the queries they issue are independent of each other, these algorithms process them sequentially. Recently, a novel technique for efficiently processing multiple similarity queries issued simultaneously has been introduced, and it was shown that multiple similarity queries substantially speed up query-intensive data mining applications. For the important case of multiple k-nearest neighbor queries on top of a multidimensional index structure, the problem of scheduling directory pages and data pages arises. This aspect has not been addressed so far. In this paper, we derive the theoretical foundation of this scheduling problem. Additionally, we propose several scheduling algorithms based on our theoretical results. In our experimental evaluation, we show that scheduling pages by their maximum priority clearly outperforms other scheduling approaches.
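To make the scheduling problem concrete, the following is a minimal Python sketch of answering several k-nearest neighbor queries in a single pass over a set of data pages. It is an illustration under stated assumptions, not the authors' algorithm. The directory level is omitted for brevity: `Page` is a hypothetical flat bucket of 2-D points with a bounding box. A page's per-query priority is assumed to be its negated MINDIST to that query, so "maximum priority" here means the next page read is the one closest to any query that can still use it. Each page is loaded at most once, and its points are shared by every query for which it is not yet pruned.

```python
import heapq
import math
from dataclasses import dataclass, field

Point = tuple[float, float]


@dataclass
class Page:
    """Hypothetical data page: a bucket of points plus its bounding box (MBR)."""
    points: list[Point]
    lo: Point = field(init=False)
    hi: Point = field(init=False)

    def __post_init__(self) -> None:
        xs = [p[0] for p in self.points]
        ys = [p[1] for p in self.points]
        self.lo = (min(xs), min(ys))
        self.hi = (max(xs), max(ys))


def mindist(q: Point, page: Page) -> float:
    """Smallest possible distance from q to any point inside the page's MBR."""
    dx = max(page.lo[0] - q[0], 0.0, q[0] - page.hi[0])
    dy = max(page.lo[1] - q[1], 0.0, q[1] - page.hi[1])
    return math.hypot(dx, dy)


def multi_knn(queries: list[Point], pages: list[Page], k: int
              ) -> tuple[list[list[tuple[float, Point]]], int]:
    # One max-heap per query, stored as (-distance, point), holding the k best so far.
    results: list[list[tuple[float, Point]]] = [[] for _ in queries]

    def pruning_dist(i: int) -> float:
        # Current k-th nearest distance of query i (infinite until k candidates exist).
        return -results[i][0][0] if len(results[i]) == k else math.inf

    def priority(page: Page):
        # ASSUMPTION: per-query priority = -MINDIST; page priority = maximum over
        # the queries for which the page may still hold a result. None = pruned for all.
        best = None
        for i, q in enumerate(queries):
            d = mindist(q, page)
            if d <= pruning_dist(i):
                best = -d if best is None else max(best, -d)
        return best

    remaining = list(pages)
    loads = 0
    while remaining:
        scored = [(priority(p), idx) for idx, p in enumerate(remaining)]
        live = [(s, idx) for s, idx in scored if s is not None]
        if not live:
            break  # pruning distances only shrink, so these pages stay pruned
        _, idx = max(live)
        page = remaining.pop(idx)
        loads += 1  # the page is read once and shared by all interested queries
        for i, q in enumerate(queries):
            if mindist(q, page) > pruning_dist(i):
                continue
            for p in page.points:
                d = math.dist(q, p)
                if len(results[i]) < k:
                    heapq.heappush(results[i], (-d, p))
                elif d < -results[i][0][0]:
                    heapq.heapreplace(results[i], (-d, p))
    answers = [sorted((-nd, p) for nd, p in heap) for heap in results]
    return answers, loads
```

For example, with pages `[(0, 0), (1, 0), (0, 1)]`, `[(5, 5), (6, 5)]` and `[(10, 10), (9, 9)]`, queries `(0.2, 0.1)` and `(9.5, 9.5)`, and `k=2`, only two of the three pages are loaded: once both queries have their candidates, the middle page is pruned for every query at once. Recomputing all priorities after each load is quadratic in the number of pages; it keeps the sketch short rather than reflecting an efficient scheduler.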
Copyright information
© 2000 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Böhm, C., Braunmüller, B., Kriegel, HP. (2000). The Pruning Power: Theory and Heuristics for Mining Databases with Multiple k-Nearest-Neighbor Queries. In: Kambayashi, Y., Mohania, M., Tjoa, A.M. (eds) Data Warehousing and Knowledge Discovery. DaWaK 2000. Lecture Notes in Computer Science, vol 1874. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-44466-1_37
DOI: https://doi.org/10.1007/3-540-44466-1_37
Published:
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-67980-6
Online ISBN: 978-3-540-44466-4
eBook Packages: Springer Book Archive