DOI: 10.1145/3361525.3361553

ReLAQS: Reducing Latency for Multi-Tenant Approximate Queries via Scheduling

Published: 09 December 2019

ABSTRACT

Approximate Query Processing has become increasingly popular as larger data sizes have increased query latency in distributed query processing systems. To provide such approximate results, systems return intermediate results and iteratively update these approximations as they process more data. In shared clusters, however, these systems waste resources by directing resources to queries that are no longer improving the results given to users.
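As a minimal illustration of the "intermediate results, iteratively updated" idea, the sketch below computes a running mean with an approximate 95% confidence half-width after each batch of rows. It is not taken from the paper: the function name `online_mean`, the batch size, and the normal-approximation interval are assumptions chosen for illustration, and the rows are assumed to arrive in random order so that each prefix behaves like a random sample.

```python
import math
import random


def online_mean(values, batch_size=1000, z=1.96):
    """Yield (rows_seen, running_mean, ci_half_width) after each full batch.

    Uses Welford's algorithm for a numerically stable running variance.
    The half-width is a normal-approximation interval (an assumption here).
    """
    n = 0
    mean = 0.0
    m2 = 0.0  # running sum of squared deviations from the mean
    for x in values:
        n += 1
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)
        if n % batch_size == 0:
            variance = m2 / (n - 1) if n > 1 else 0.0
            yield n, mean, z * math.sqrt(variance / n)


# Synthetic data standing in for a column being aggregated.
random.seed(0)
data = [random.gauss(50.0, 10.0) for _ in range(5000)]
snapshots = list(online_mean(data))

for rows, estimate, half_width in snapshots:
    print(f"{rows} rows: {estimate:.2f} +/- {half_width:.2f}")
```

Each snapshot is an answer a user could act on; the interval narrows as more rows are processed, which is the signal a scheduler can use to judge how much a query is still improving.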

We describe ReLAQS, a cluster scheduling system for online aggregation queries that aims to reduce latency by assigning resources to queries with the most potential for improvement. ReLAQS uses the approximate results each query returns to periodically estimate how much progress each concurrent query is currently making. It then uses this information to predict how much progress each query is expected to make in the near future and redistributes resources in real time to maximize the overall quality of the answers returned across the cluster. Experiments show that ReLAQS reduces latency by up to 47% compared to traditional fair schedulers.
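The estimate, predict, and redistribute loop described above can be sketched as a greedy allocator. This is a hypothetical sketch, not the ReLAQS algorithm: the abstract does not specify the predictor or the optimizer, so `predicted_gain` (the most recent drop in error) and the diminishing-returns rule (the k-th core given to a query is assumed to be worth gain/k) are both illustrative assumptions.

```python
import heapq


def predicted_gain(error_history):
    """Predict the next-interval error reduction as the most recent drop.

    Hypothetical heuristic; a query with a single observation is treated
    as having its whole current error still to gain.
    """
    if not error_history:
        return 0.0
    if len(error_history) < 2:
        return error_history[-1]
    return max(error_history[-2] - error_history[-1], 0.0)


def allocate(error_histories, total_cores):
    """Hand out cores one at a time to the query with the largest marginal gain.

    Assumes diminishing returns: the k-th core for a query is worth gain / k.
    """
    gains = {q: predicted_gain(h) for q, h in error_histories.items()}
    alloc = {q: 0 for q in error_histories}
    heap = [(-g, q) for q, g in gains.items()]  # max-heap via negated gains
    heapq.heapify(heap)
    if not heap:
        return alloc
    for _ in range(total_cores):
        _, q = heapq.heappop(heap)
        alloc[q] += 1
        # Marginal value of this query's next core under the assumed rule.
        heapq.heappush(heap, (-gains[q] / (alloc[q] + 1), q))
    return alloc


# Error estimates (in percent) reported by three concurrent queries.
histories = {
    "q1": [40.0, 20.0],  # still improving quickly
    "q2": [30.0, 23.0],  # improving slowly
    "q3": [5.0, 5.0],    # no longer improving
}
result = allocate(histories, total_cores=6)
print(result)
```

In this toy run the stalled query receives no cores, while a fair scheduler would give each query two. A real system would rerun the allocation every scheduling interval as new error estimates arrive.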


        • Published in

          Middleware '19: Proceedings of the 20th International Middleware Conference
          December 2019
          342 pages
          ISBN: 9781450370097
          DOI: 10.1145/3361525

          Copyright © 2019 ACM

          Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

          Publisher

          Association for Computing Machinery

          New York, NY, United States


          Qualifiers

          • research-article
          • Research
          • Refereed limited

          Acceptance Rates

          Overall Acceptance Rate: 203 of 948 submissions, 21%
