Automatic management of partitioned, replicated search services

Authors:
Florian Leibert (Twitter, San Francisco, California)
Jake Mannix (Twitter, San Francisco, California)
Jimmy Lin (Twitter, San Francisco, California)
Babak Hamadani (Twitter, San Francisco, California)

SOCC '11: Proceedings of the 2nd ACM Symposium on Cloud Computing, October 2011, Article No.: 27, Pages 1–8
https://doi.org/10.1145/2038916.2038943

Published: 26 October 2011

ABSTRACT

Low-latency, high-throughput web services are typically achieved through partitioning, replication, and caching. Although these strategies and the general design of large-scale distributed search systems are well known, the academic literature provides surprisingly few details on deployment and operational considerations in production environments. In this paper, we address this gap by sharing the distributed search architecture that underlies Twitter user search, a service for discovering relevant accounts on the popular microblogging service. Our design makes use of the principle that eliminates the distinction between failure and other anticipated service disruptions: as a result, most operational scenarios share exactly the same code path. This simplicity leads to greater robustness and fault-tolerance. Another salient feature of our architecture is its exclusive reliance on open-source software components, which makes it easier for the community to learn from our experiences and replicate our findings.
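
The principle of a single code path for failures and planned disruptions can be illustrated with a minimal sketch. This is not the authors' implementation: the `ReplicaRegistry` class, its method names, and the host names are all hypothetical, and the membership table is a plain in-memory structure rather than any particular coordination service. The point it shows is only that a crash and a planned restart both reduce to the same `leave` call.

```python
from collections import defaultdict


class ReplicaRegistry:
    """Hypothetical in-memory membership table: partition id -> live replicas."""

    def __init__(self):
        self._live = defaultdict(set)

    def join(self, partition, replica):
        # A replica announces that it can serve this partition.
        self._live[partition].add(replica)

    def leave(self, partition, replica):
        # Single exit path: the registry does not record *why* the replica
        # is gone (crash, deploy, index swap), only that it is gone.
        self._live[partition].discard(replica)

    def replicas(self, partition):
        # Replicas a broker may route queries to, in a stable order.
        return sorted(self._live[partition])


registry = ReplicaRegistry()
registry.join(0, "host-a")
registry.join(0, "host-b")

# Unplanned failure of host-a: handled as a departure.
registry.leave(0, "host-a")

# Planned restart of host-b: the same departure path, followed by a rejoin.
registry.leave(0, "host-b")
registry.join(0, "host-b")
```

Because every disruption goes through `leave`, the routing logic that reads `replicas` needs no special cases, which is the kind of simplification the abstract attributes to the design.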

Index Terms

1. Information systems
  1. Information retrieval

Published in
SOCC '11: Proceedings of the 2nd ACM Symposium on Cloud Computing
October 2011
377 pages
ISBN: 9781450309769
DOI: 10.1145/2038916
Program Chairs: Jeffrey S. Chase (Duke University), Amr El Abbadi (Univ of California, Santa Barbara)
Copyright © 2011 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
configuration management
distributed retrieval architectures
failover
information retrieval
robustness
Qualifiers
- research-article
Acceptance Rates
Overall Acceptance Rate: 169 of 722 submissions, 23%