DOI: 10.1145/2018436.2018448
research-article

Managing data transfers in computer clusters with Orchestra

Published: 15 August 2011

ABSTRACT

Cluster computing applications like MapReduce and Dryad transfer massive amounts of data between their computation stages. These transfers can have a significant impact on job performance, accounting for more than 50% of job completion times. Despite this impact, there has been relatively little work on optimizing the performance of these data transfers, with networking researchers traditionally focusing on per-flow traffic management. We address this limitation by proposing a global management architecture and a set of algorithms that (1) improve the transfer times of common communication patterns, such as broadcast and shuffle, and (2) allow scheduling policies at the transfer level, such as prioritizing a transfer over other transfers. Using a prototype implementation, we show that our solution improves broadcast completion times by up to 4.5X compared to the status quo in Hadoop. We also show that transfer-level scheduling can reduce the completion time of high-priority transfers by 1.7X.

Supplemental Material

sigcomm_3_3.mp4 (MP4, 145.6 MB)

Published in

SIGCOMM '11: Proceedings of the ACM SIGCOMM 2011 conference
August 2011
502 pages
ISBN: 9781450307970
DOI: 10.1145/2018436

ACM SIGCOMM Computer Communication Review, Volume 41, Issue 4
SIGCOMM '11
August 2011
480 pages
ISSN: 0146-4833
DOI: 10.1145/2043164

Copyright © 2011 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery
New York, NY, United States

Acceptance Rates

SIGCOMM '11 paper acceptance rate: 32 of 223 submissions (14%)
Overall acceptance rate: 554 of 3,547 submissions (16%)
