Managing data transfers in computer clusters with Orchestra

Authors:
Mosharaf Chowdhury, University of California, Berkeley, Berkeley, CA, USA
Matei Zaharia, University of California, Berkeley, Berkeley, CA, USA
Justin Ma, University of California, Berkeley, Berkeley, CA, USA
Michael I. Jordan, University of California, Berkeley, Berkeley, CA, USA
Ion Stoica, University of California, Berkeley, Berkeley, CA, USA

SIGCOMM '11: Proceedings of the ACM SIGCOMM 2011 conference, August 2011, Pages 98–109, https://doi.org/10.1145/2018436.2018448

Published: 15 August 2011

ABSTRACT

Cluster computing applications like MapReduce and Dryad transfer massive amounts of data between their computation stages. These transfers can have a significant impact on job performance, accounting for more than 50% of job completion times. Despite this impact, there has been relatively little work on optimizing the performance of these data transfers, with networking researchers traditionally focusing on per-flow traffic management. We address this limitation by proposing a global management architecture and a set of algorithms that (1) improve the transfer times of common communication patterns, such as broadcast and shuffle, and (2) allow scheduling policies at the transfer level, such as prioritizing a transfer over other transfers. Using a prototype implementation, we show that our solution improves broadcast completion times by up to 4.5X compared to the status quo in Hadoop. We also show that transfer-level scheduling can reduce the completion time of high-priority transfers by 1.7X.
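
To make the contrast between per-flow and transfer-level management concrete, the following is a minimal sketch and not the paper's algorithm. It assumes a single shared link; the function names `per_flow_shares` and `transfer_level_shares`, the flow counts, the weights, and the capacity figure are all hypothetical. Under per-flow fair sharing, a transfer's bandwidth grows with the number of flows it opens, whereas a transfer-level policy can divide capacity by per-transfer weights regardless of flow count.

```python
from typing import Dict


def per_flow_shares(capacity: float, flows: Dict[str, int]) -> Dict[str, float]:
    """Split link capacity equally among all flows, then total it per transfer.

    `flows` maps a (hypothetical) transfer name to its number of flows.
    """
    total_flows = sum(flows.values())
    return {name: capacity * count / total_flows for name, count in flows.items()}


def transfer_level_shares(capacity: float, weights: Dict[str, float]) -> Dict[str, float]:
    """Split link capacity among transfers in proportion to assumed weights.

    The number of flows inside each transfer does not affect the result.
    """
    total_weight = sum(weights.values())
    return {name: capacity * w / total_weight for name, w in weights.items()}


# Hypothetical workload: a shuffle with many flows and a broadcast with few.
flows = {"shuffle": 30, "broadcast": 10}
# Hypothetical policy: the broadcast is given three times the shuffle's weight.
weights = {"shuffle": 1.0, "broadcast": 3.0}

print(per_flow_shares(1000.0, flows))          # {'shuffle': 750.0, 'broadcast': 250.0}
print(transfer_level_shares(1000.0, weights))  # {'shuffle': 250.0, 'broadcast': 750.0}
```

In this toy setting the shuffle receives three quarters of the link under per-flow sharing simply because it has more flows, while the weighted transfer-level split gives the prioritized broadcast the larger share.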

Index Terms

1. Computer systems organization
  1. Architectures
    1. Distributed architectures
2. Software and its engineering
  1. Software organization and properties
    1. Software system structures
      1. Distributed systems organizing principles

Published in
SIGCOMM '11: Proceedings of the ACM SIGCOMM 2011 conference
August 2011
502 pages
ISBN: 9781450307970
DOI: 10.1145/2018436
General Chairs: Srinivasan Keshav (University of Waterloo, Canada), Jörg Liebeherr (University of Toronto, Canada)
Program Chairs: John Byers (Boston University, USA), Jeffrey Mogul (HP Labs, USA)

ACM SIGCOMM Computer Communication Review, Volume 41, Issue 4
SIGCOMM '11
August 2011
480 pages
ISSN: 0146-4833
DOI: 10.1145/2043164
Copyright © 2011 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
data transfer
data-intensive applications
datacenter networks
Qualifiers
- research-article
Acceptance Rates
SIGCOMM '11 paper acceptance rate: 32 of 223 submissions, 14%
Overall acceptance rate: 554 of 3,547 submissions, 16%