
Hardness of String Similarity Search and Other Indexing Problems

  • Conference paper
Automata, Languages and Programming (ICALP 2004)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 3142)

Abstract

Similarity search is a fundamental problem in computer science. Given a set of points A = {A₁, …, Aₚ} from a universe U and a distance measure D, one can pose similarity search queries for a query point Q, either as nearest neighbor queries (e.g., find the string with the smallest edit distance to a query string) or as furthest neighbor queries (e.g., find the string with the longest common subsequence with a query string).
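As a baseline for the two query types above, both can be answered exactly by a linear scan over A using the textbook dynamic programs for Levenshtein edit distance and LCS length. This is a minimal sketch for illustration only, not a method from the paper; the function names are hypothetical.

```python
def edit_distance(s: str, t: str) -> int:
    """Levenshtein distance via the standard O(|s|*|t|) dynamic program."""
    prev = list(range(len(t) + 1))  # distances from "" to prefixes of t
    for i, a in enumerate(s, start=1):
        cur = [i]  # distance from s[:i] to ""
        for j, b in enumerate(t, start=1):
            cur.append(min(prev[j] + 1,               # delete a
                           cur[j - 1] + 1,            # insert b
                           prev[j - 1] + (a != b)))   # substitute or match
        prev = cur
    return prev[-1]


def lcs_length(s: str, t: str) -> int:
    """Length of a longest common subsequence via the standard DP."""
    prev = [0] * (len(t) + 1)
    for a in s:
        cur = [0]
        for j, b in enumerate(t, start=1):
            cur.append(prev[j - 1] + 1 if a == b else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def nearest_by_edit_distance(A: list[str], Q: str) -> str:
    """Exact nearest neighbor of Q in A under edit distance (linear scan)."""
    return min(A, key=lambda s: edit_distance(s, Q))


def best_by_lcs(A: list[str], Q: str) -> str:
    """String in A maximizing the LCS length with Q (linear scan)."""
    return max(A, key=lambda s: lcs_length(s, Q))


A = ["kitten", "sitting", "mitten"]
print(nearest_by_edit_distance(A, "smitten"), best_by_lcs(A, "smitten"))
```

Each query costs p dynamic programs of size O(|Aᵢ|·|Q|), which is exactly the per-query cost that indexing schemes try to avoid.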

Exact similarity search appears to be a very hard problem for most application domains: available solutions require either preprocessing time/space exponential in p or query time exponential in |Q|. For such problems, approximate solutions have recently attracted considerable attention. Approximate nearest (furthest) neighbor search aims to find a point in A whose distance to the query point Q is within a small multiplicative factor of the distance between Q and its nearest (furthest) neighbor.
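The approximation guarantee described above can be stated as a simple check. The sketch below assumes a multiplicative factor c ≥ 1 (the parameter name c and the helper names are hypothetical) and uses Hamming distance as the example measure D.

```python
from typing import Callable, Sequence


def hamming(x: str, y: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    if len(x) != len(y):
        raise ValueError("Hamming distance needs equal-length strings")
    return sum(a != b for a, b in zip(x, y))


def is_c_approx_nearest(A: Sequence[str], Q: str, answer: str, c: float,
                        dist: Callable[[str, str], float]) -> bool:
    """True if answer is in A and D(answer, Q) <= c * min over A of D(x, Q)."""
    best = min(dist(x, Q) for x in A)
    return answer in A and dist(answer, Q) <= c * best


def is_c_approx_furthest(A: Sequence[str], Q: str, answer: str, c: float,
                         dist: Callable[[str, str], float]) -> bool:
    """True if answer is in A and c * D(answer, Q) >= max over A of D(x, Q)."""
    best = max(dist(x, Q) for x in A)
    return answer in A and c * dist(answer, Q) >= best
```

Note that when Q itself lies in A (nearest distance 0), a multiplicative guarantee still forces an exact answer.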

In this paper, we study the hardness of several important similarity search problems for strings and other combinatorial objects, for which exact solutions have proven very difficult to achieve. We show that even the approximate versions of these problems are quite hard; more specifically, they are as hard as exact similarity search in Hamming space. Thus the available cell probe lower bounds for exact similarity search in Hamming space also apply to approximate similarity search in string spaces (under Levenshtein edit distance and longest common subsequence) and in other spaces.
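The abstract does not describe the reductions themselves. The sketch below is not the paper's construction; it only illustrates why one cannot simply reinterpret Hamming vectors as strings under edit distance: for equal-length strings edit distance never exceeds Hamming distance (each mismatch can be fixed by one substitution), but it can be far smaller, so a reduction has to control this distortion. Function names are hypothetical.

```python
def hamming(x: str, y: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    if len(x) != len(y):
        raise ValueError("Hamming distance needs equal-length strings")
    return sum(a != b for a, b in zip(x, y))


def edit_distance(s: str, t: str) -> int:
    """Levenshtein distance via the standard dynamic program."""
    prev = list(range(len(t) + 1))
    for i, a in enumerate(s, start=1):
        cur = [i]
        for j, b in enumerate(t, start=1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (a != b)))
        prev = cur
    return prev[-1]


# A shifted alternating pattern: every position mismatches, yet deleting the
# leading '0' and appending a '0' turns x into y with only two edits.
x = "01" * 4
y = "10" * 4
print(hamming(x, y), edit_distance(x, y))  # 8 2
```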

As a consequence of our reductions, we also make observations about approximate pairwise distance computation. One such observation yields a simple linear-time 2-approximation algorithm for permutation edit distance.
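The abstract does not spell out the 2-approximation, so the following is one standard linear-time 2-approximation, not claimed to be the paper's. It assumes the move-based definition of permutation edit distance, under which the distance d equals n minus the length of a longest common subsequence of the two permutations, i.e. n minus the longest increasing subsequence of P relabelled by positions in Q. A single stack scan discards disjoint inverted pairs; any increasing subsequence keeps at most one element of each pair, so d is at least the number of pairs, while the surviving stack is increasing. Hence the scan returns r with d ≤ r ≤ 2d. Function names are hypothetical.

```python
import bisect


def _positions(P: list, Q: list) -> list[int]:
    """Relabel each element of P by its index in Q (both permutations)."""
    pos = {v: i for i, v in enumerate(Q)}
    return [pos[v] for v in P]


def exact_moves(P: list, Q: list) -> int:
    """Exact d = n - LIS of the relabelled sequence (patience sorting, O(n log n))."""
    seq = _positions(P, Q)
    tails: list[int] = []  # tails[k] = smallest tail of an increasing run of length k+1
    for x in seq:
        k = bisect.bisect_left(tails, x)
        if k == len(tails):
            tails.append(x)
        else:
            tails[k] = x
    return len(seq) - len(tails)


def approx_moves(P: list, Q: list) -> int:
    """Linear-time estimate r with d <= r <= 2d."""
    seq = _positions(P, Q)
    stack: list[int] = []  # always strictly increasing
    removed = 0
    for x in seq:
        if stack and stack[-1] > x:
            stack.pop()    # (top, x) is an inversion: discard both elements
            removed += 2
        else:
            stack.append(x)
    return removed
```

For example, rotating [1, 2, 3, 4, 5] to [2, 3, 4, 5, 1] needs one move, and the scan reports 2, meeting the factor-2 bound with equality.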

Copyright information

© 2004 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Sahinalp, S.C., Utis, A. (2004). Hardness of String Similarity Search and Other Indexing Problems. In: Díaz, J., Karhumäki, J., Lepistö, A., Sannella, D. (eds) Automata, Languages and Programming. ICALP 2004. Lecture Notes in Computer Science, vol 3142. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-27836-8_90

  • DOI: https://doi.org/10.1007/978-3-540-27836-8_90

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-22849-3

  • Online ISBN: 978-3-540-27836-8

  • eBook Packages: Springer Book Archive