research-article

Reference-driven performance anomaly identification

Authors:
Kai Shen, University of Rochester, Rochester, NY, USA
Christopher Stewart, University of Rochester, Rochester, NY, USA
Chuanpeng Li, University of Rochester, Rochester, NY, USA
Xin Li, University of Rochester, Rochester, NY, USA

SIGMETRICS '09: Proceedings of the eleventh international joint conference on Measurement and modeling of computer systems
June 2009
Pages 85–96
https://doi.org/10.1145/1555349.1555360

Published: 15 June 2009

ABSTRACT

Complex system software allows a variety of execution conditions on system configurations and workload properties. This paper explores a principled use of reference executions--those of similar execution conditions from the target--to help identify the symptoms and causes of performance anomalies. First, to identify anomaly symptoms, we construct change profiles that probabilistically characterize expected performance deviations between target and reference executions. By synthesizing several single-parameter change profiles, we can scalably identify anomalous reference-to-target changes in a complex system with multiple execution parameters. Second, to narrow the scope of anomaly root cause analysis, we filter anomaly-related low-level system metrics as those that manifest very differently between target and reference executions. Our anomaly identification approach requires little expert knowledge or detailed models on system internals and consequently it can be easily deployed. Using empirical case studies on the Linux I/O subsystem and a J2EE-based distributed online service, we demonstrate our approach's effectiveness in identifying performance anomalies over a wide range of execution conditions as well as multiple system software versions. In particular, we discovered five previously unknown performance anomaly causes in the Linux 2.6.23 kernel. Additionally, our preliminary results suggest that online anomaly detection and system reconfiguration may help evade performance anomalies in complex online systems.
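The first step in the abstract, flagging a reference-to-target change as anomalous when it is improbable under a change profile, can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes a change profile is simply the empirical distribution of relative performance changes observed over sampled reference/target pairs for one execution parameter, and that an observed change is a symptom when it falls in the low tail of that distribution. The function names, the sample throughput values, and the 0.05 threshold are all hypothetical.

```python
from bisect import bisect_right


def build_change_profile(pairs):
    """Return sorted relative changes (target - reference) / reference.

    `pairs` holds (reference_perf, target_perf) measurements, e.g. throughput,
    taken from executions that differ in one execution parameter.
    """
    return sorted((target - ref) / ref for ref, target in pairs)


def degradation_p_value(profile, reference_perf, target_perf):
    """Fraction of profiled changes that are at or below the observed change."""
    change = (target_perf - reference_perf) / reference_perf
    return bisect_right(profile, change) / len(profile)


def is_anomalous(profile, reference_perf, target_perf, threshold=0.05):
    """Flag the target when so large a degradation is rare in the profile."""
    return degradation_p_value(profile, reference_perf, target_perf) <= threshold


# Hypothetical (reference, target) throughput samples for one parameter change.
samples = [
    (100.0, 98.0), (80.0, 84.0), (120.0, 114.0), (90.0, 90.0), (110.0, 99.0),
    (70.0, 73.5), (95.0, 76.0), (105.0, 107.1), (60.0, 58.8), (130.0, 123.5),
]
profile = build_change_profile(samples)

print(is_anomalous(profile, 100.0, 50.0))   # a 50% drop, below every profiled change
print(is_anomalous(profile, 100.0, 100.0))  # no change, well inside the profile
```

The sketch covers only a single-parameter profile. How several such profiles are synthesized for multi-parameter changes, and how low-level metrics are filtered for root cause analysis, is described in the paper itself and is not reproduced here.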

Index Terms

Reference-driven performance anomaly identification
1. General and reference
  1. Cross-computing tools and techniques
    1. Measurement
2. Software and its engineering
  1. Software creation and management
    1. Software development process management
      1. Software development methods

Published in
SIGMETRICS '09: Proceedings of the eleventh international joint conference on Measurement and modeling of computer systems
June 2009
336 pages
ISBN:9781605585116
DOI:10.1145/1555349
General Chairs: John Douceur (Microsoft Research, USA), Albert Greenberg (Microsoft Research, USA)
Program Chairs: Thomas Bonald (Orange Labs, France), Jason Nieh (Columbia University, USA)
ACM SIGMETRICS Performance Evaluation Review Volume 37, Issue 1
SIGMETRICS '09
June 2009
320 pages
ISSN:0163-5999
DOI:10.1145/2492101
Copyright © 2009 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
operating system
performance anomaly
Qualifiers
- research-article
Acceptance Rates
Overall Acceptance Rate: 459 of 2,691 submissions, 17%