Abstract
Similarity search is a fundamental problem in computer science. Given a set of points A = {A_1, ..., A_p} from a universe U and a distance measure D, one can pose similarity search queries on a point Q either as nearest neighbor queries (for example, find the string that has the smallest edit distance to a query string) or as furthest neighbor queries (for example, find the string that has the longest common subsequence with a query string).
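The two query types can be illustrated with a brute-force linear scan. The following is a minimal sketch rather than anything taken from the paper: the function names (`edit_distance`, `lcs_length`, `nearest_neighbor`, `furthest_neighbor`) are hypothetical, and the measures are the textbook dynamic programs for Levenshtein distance and longest common subsequence length.

```python
from typing import Callable, Sequence


def edit_distance(s: str, t: str) -> int:
    """Levenshtein distance via the standard O(|s|*|t|) dynamic program."""
    prev = list(range(len(t) + 1))  # distances from the empty prefix of s
    for i, a in enumerate(s, start=1):
        curr = [i]
        for j, b in enumerate(t, start=1):
            curr.append(min(prev[j] + 1,               # delete a
                            curr[j - 1] + 1,           # insert b
                            prev[j - 1] + (a != b)))   # substitute or match
        prev = curr
    return prev[-1]


def lcs_length(s: str, t: str) -> int:
    """Length of a longest common subsequence of s and t."""
    prev = [0] * (len(t) + 1)
    for a in s:
        curr = [0]
        for j, b in enumerate(t, start=1):
            if a == b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def nearest_neighbor(points: Sequence[str], query: str,
                     dist: Callable[[str, str], int]) -> str:
    """Point minimising dist(point, query); ties go to the earliest point."""
    return min(points, key=lambda p: dist(p, query))


def furthest_neighbor(points: Sequence[str], query: str,
                      measure: Callable[[str, str], int]) -> str:
    """Point maximising measure(point, query); ties go to the earliest point."""
    return max(points, key=lambda p: measure(p, query))


if __name__ == "__main__":
    strings = ["kitten", "sitting", "mitten", "bitter"]
    print(nearest_neighbor(strings, "sittin", edit_distance))
    print(furthest_neighbor(strings, "sittin", lcs_length))
```

Each query here costs one distance computation per stored point, which is the baseline that an index would have to beat.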
Exact similarity search appears to be a very hard problem for most application domains; available solutions require either preprocessing time or space exponential in p, or query time exponential in |Q|. For such problems, approximate solutions have recently attracted considerable attention. Approximate nearest (furthest) neighbor search aims to find a point in A whose distance to the query point Q is within a small multiplicative factor of the distance between Q and its nearest (furthest) neighbor.
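In symbols, the approximation guarantee can be stated as follows. The factor c ≥ 1 is a symbol introduced here for illustration and does not appear in the abstract; A_i denotes the point returned for the query Q.

```latex
\[
  D(Q, A_i) \;\le\; c \cdot \min_{1 \le j \le p} D(Q, A_j)
  \qquad \text{(approximate nearest neighbor)}
\]
\[
  D(Q, A_i) \;\ge\; \frac{1}{c} \cdot \max_{1 \le j \le p} D(Q, A_j)
  \qquad \text{(approximate furthest neighbor)}
\]
```

Setting c = 1 recovers the exact versions of both problems.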
In this paper, we study the hardness of several important similarity search problems for strings as well as other combinatorial objects, for which exact solutions have proven to be very difficult to achieve. We show here that even the approximate versions of these problems are quite hard; more specifically, they are as hard as exact similarity search in Hamming space. Thus available cell probe lower bounds for exact similarity search in Hamming space apply to approximate similarity search in string spaces (under Levenshtein edit distance and longest common subsequence) as well as other spaces.
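For reference, the problem that the reductions target, exact nearest neighbor search in Hamming space, can be stated as a short linear scan. This is only a sketch of the problem statement with hypothetical function names; it says nothing about how the reductions themselves work.

```python
from typing import Sequence


def hamming_distance(x: str, y: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    if len(x) != len(y):
        raise ValueError("Hamming distance needs equal-length inputs")
    return sum(a != b for a, b in zip(x, y))


def exact_hamming_nn(points: Sequence[str], query: str) -> str:
    """Exact nearest neighbor by scanning every point (ties: earliest wins)."""
    return min(points, key=lambda p: hamming_distance(p, query))


if __name__ == "__main__":
    database = ["0000", "1110", "0111"]
    print(exact_hamming_nn(database, "1100"))
```

The scan reads every stored point, whereas cell probe lower bounds concern data structures that answer a query after probing only a limited number of memory cells.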
As a consequence of our reductions we also make observations about pairwise approximate distance computations. One such observation gives a simple linear time 2-approximation algorithm for permutation edit distance.
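As a point of comparison, permutation edit distance can be computed exactly with a longest increasing subsequence. The sketch below assumes the distance counts the minimum number of single-element moves (remove one element and reinsert it elsewhere) needed to turn one permutation into the other, in which case it equals n minus the length of a longest common subsequence. It runs in O(n log n) time and is not the linear time 2-approximation mentioned above, whose details the abstract does not give; the function name is hypothetical.

```python
from bisect import bisect_left
from typing import Sequence


def permutation_edit_distance(p: Sequence[int], q: Sequence[int]) -> int:
    """Exact move distance between two permutations of the same elements.

    Assumption: one edit moves a single element to a new position, so the
    distance is n minus the longest common subsequence length.
    """
    if len(set(p)) != len(p) or sorted(p) != sorted(q):
        raise ValueError("inputs must be permutations of the same elements")

    # Relabel p by the positions of its elements in q. A common subsequence
    # of p and q is then exactly an increasing subsequence of the relabeling.
    position_in_q = {value: index for index, value in enumerate(q)}
    relabeled = [position_in_q[value] for value in p]

    # Patience-sorting LIS: tails[k] is the smallest possible last value of
    # an increasing subsequence of length k + 1 seen so far.
    tails: list[int] = []
    for x in relabeled:
        k = bisect_left(tails, x)
        if k == len(tails):
            tails.append(x)
        else:
            tails[k] = x

    return len(p) - len(tails)


if __name__ == "__main__":
    print(permutation_edit_distance([1, 2, 3, 4, 5], [2, 3, 4, 5, 1]))
```

An exact baseline of this kind is useful for checking the output of any approximation algorithm on small inputs.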
© 2004 Springer-Verlag Berlin Heidelberg
Sahinalp, S.C., Utis, A. (2004). Hardness of String Similarity Search and Other Indexing Problems. In: Díaz, J., Karhumäki, J., Lepistö, A., Sannella, D. (eds) Automata, Languages and Programming. ICALP 2004. Lecture Notes in Computer Science, vol 3142. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-27836-8_90
DOI: https://doi.org/10.1007/978-3-540-27836-8_90
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-22849-3
Online ISBN: 978-3-540-27836-8
eBook Packages: Springer Book Archive