DOI: 10.1145/1555349.1555360
research-article

Reference-driven performance anomaly identification

Published: 15 June 2009

ABSTRACT

Complex system software can run under a wide variety of execution conditions, defined by system configurations and workload properties. This paper explores a principled use of reference executions (those whose execution conditions are similar to the target's) to help identify the symptoms and causes of performance anomalies. First, to identify anomaly symptoms, we construct change profiles that probabilistically characterize expected performance deviations between target and reference executions. By synthesizing several single-parameter change profiles, we can scalably identify anomalous reference-to-target changes in a complex system with multiple execution parameters. Second, to narrow the scope of anomaly root cause analysis, we identify anomaly-related low-level system metrics as those that manifest very differently between target and reference executions. Our anomaly identification approach requires little expert knowledge or detailed models of system internals, and consequently it can be easily deployed. Using empirical case studies on the Linux I/O subsystem and a J2EE-based distributed online service, we demonstrate our approach's effectiveness in identifying performance anomalies over a wide range of execution conditions as well as multiple system software versions. In particular, we discovered five previously unknown performance anomaly causes in the Linux 2.6.23 kernel. Additionally, our preliminary results suggest that online anomaly detection and system reconfiguration may help evade performance anomalies in complex online systems.
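
The idea of a single-parameter change profile can be illustrated with a minimal sketch. It is not the paper's formulation: it assumes the profile is simply the empirical distribution of relative performance changes observed across sampled reference/target pairs, and that a change is flagged when its lower-tail empirical probability falls at or below a chosen threshold. The function names, the sample data, and the 0.05 threshold are all hypothetical.

```python
from bisect import bisect_right


def build_change_profile(pairs):
    """Build an empirical change profile from (reference, target) pairs.

    Each pair holds a performance value (for example, throughput) measured
    under a reference condition and under a target condition that differs
    in one parameter. The profile is the sorted list of relative changes.
    """
    return sorted((target - ref) / ref for ref, target in pairs)


def lower_tail_probability(profile, ref_perf, target_perf):
    """Fraction of profiled changes that are at or below the observed change."""
    change = (target_perf - ref_perf) / ref_perf
    return bisect_right(profile, change) / len(profile)


def is_anomalous(profile, ref_perf, target_perf, threshold=0.05):
    """Flag a reference-to-target change that is unusually bad.

    The threshold is an illustrative assumption, not a value from the paper.
    """
    return lower_tail_probability(profile, ref_perf, target_perf) <= threshold


# Hypothetical samples: (reference throughput, target throughput).
samples = [(100.0, 90.0), (100.0, 100.0), (100.0, 110.0),
           (100.0, 95.0), (100.0, 105.0)]
profile = build_change_profile(samples)

print(is_anomalous(profile, 100.0, 50.0))   # a drop far below anything profiled
print(is_anomalous(profile, 100.0, 100.0))  # no change at all
```

In this sketch a 50% drop is flagged because no profiled change is that severe, while an unchanged result is not. Combining several such single-parameter profiles to cover multi-parameter changes, as the abstract describes, is beyond what this example shows.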


      • Published in

        SIGMETRICS '09: Proceedings of the eleventh international joint conference on Measurement and modeling of computer systems
        June 2009
        336 pages
        ISBN:9781605585116
        DOI:10.1145/1555349
        • ACM SIGMETRICS Performance Evaluation Review
          Volume 37, Issue 1
          SIGMETRICS '09
          June 2009
          320 pages
          ISSN:0163-5999
          DOI:10.1145/2492101

        Copyright © 2009 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Qualifiers

        • research-article

        Acceptance Rates

        Overall Acceptance Rate: 459 of 2,691 submissions, 17%
