ABSTRACT
To improve the ineffective warning prioritization of static analysis tools, various approaches have been proposed to compute a ranking score for each warning. In these approaches, an effective training set is vital for exploring which factors impact the ranking score and how. Manual approaches to building a training set achieve high effectiveness but suffer from low efficiency (i.e., high cost), while existing automatic approaches suffer from low effectiveness. In this paper, we propose an automatic approach for constructing an effective training set. In our approach, we select three categories of impact factors as input attributes of the training set, and we propose a new heuristic for identifying actionable warnings to automatically label the training set. Our empirical evaluations show that, with the help of our constructed training set, the precision of the top 22 warnings for Lucene, the top 20 for ANT, and the top 6 for Spring reaches 100%.
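As an illustration only, the sketch below shows one way such a training set could be assembled: each warning becomes one labeled instance whose attributes come from the kinds of impact factors the abstract mentions (warning category, code history, code characteristics), and the label is assigned by a heuristic. All class and field names here are hypothetical, and the disappearance-based labeling rule (a warning counts as actionable if it vanishes in a later revision of a file touched by a fix) is an assumption drawn from related work, not the paper's exact heuristic.

```java
import java.util.ArrayList;
import java.util.List;

// Minimal sketch (not the paper's implementation) of automatically
// building a labeled training set for warning prioritization.
public class TrainingSetBuilder {

    // A static-analysis warning with example impact-factor attributes,
    // one per assumed category: warning descriptor, code history,
    // and code characteristics.
    static final class Warning {
        String category;        // e.g., a FindBugs bug-pattern category
        int fileRevisionCount;  // history factor: how often the file changed
        int methodLength;       // code-characteristic factor
        boolean presentInHead;  // does the warning survive to the latest revision?
        boolean fileWasFixed;   // was the containing file later touched by a fix?

        Warning(String category, int fileRevisionCount, int methodLength,
                boolean presentInHead, boolean fileWasFixed) {
            this.category = category;
            this.fileRevisionCount = fileRevisionCount;
            this.methodLength = methodLength;
            this.presentInHead = presentInHead;
            this.fileWasFixed = fileWasFixed;
        }
    }

    // Assumed labeling heuristic: a warning is actionable if it
    // disappeared in a later revision while its file was modified by
    // a fix. This stands in for the paper's heuristic, which is not
    // reproduced in this excerpt.
    static boolean isActionable(Warning w) {
        return !w.presentInHead && w.fileWasFixed;
    }

    // Emit one CSV row per warning: input attributes plus the
    // automatically assigned label.
    public static void main(String[] args) {
        List<Warning> warnings = new ArrayList<>();
        warnings.add(new Warning("NP_NULL_ON_SOME_PATH", 12, 80, false, true));
        warnings.add(new Warning("DM_DEFAULT_ENCODING", 3, 25, true, false));

        System.out.println("category,fileRevisionCount,methodLength,actionable");
        for (Warning w : warnings) {
            System.out.printf("%s,%d,%d,%b%n",
                    w.category, w.fileRevisionCount, w.methodLength,
                    isActionable(w));
        }
    }
}
```

The resulting attribute/label rows could then be fed to any off-the-shelf classifier to learn a ranking score; the specific attributes shown are placeholders for whatever factors the approach actually selects.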