ABSTRACT
Low-latency, high-throughput web services are typically achieved through partitioning, replication, and caching. Although these strategies and the general design of large-scale distributed search systems are well known, the academic literature provides surprisingly few details on deployment and operational considerations in production environments. In this paper, we address this gap by sharing the distributed search architecture that underlies Twitter user search, a service for discovering relevant accounts on the popular microblogging service. Our design follows a principle that eliminates the distinction between failure and other anticipated service disruptions; as a result, most operational scenarios share exactly the same code path. This simplicity leads to greater robustness and fault tolerance. Another salient feature of our architecture is its exclusive reliance on open-source software components, which makes it easier for the community to learn from our experiences and replicate our findings.
Index Terms
- Automatic management of partitioned, replicated search services