Context-Aware Code Change Embedding for Better Patch Correctness Assessment

Authors:
Bo Lin, National University of Defense Technology, Changsha, China (ORCID: 0000-0001-5905-4677)
Shangwen Wang, National University of Defense Technology, Changsha, China (ORCID: 0000-0003-1469-2063)
Ming Wen, Huazhong University of Science and Technology, Wuhan, China (ORCID: 0000-0001-5588-9618)
Xiaoguang Mao, National University of Defense Technology, Changsha, China (ORCID: 0000-0003-4204-7424)

ACM Transactions on Software Engineering and Methodology, Volume 31, Issue 3, Article No. 51, pp. 1–29. https://doi.org/10.1145/3505247

Published: 18 May 2022

Abstract

Despite their ability to fix more and more real-world bugs, existing Automated Program Repair (APR) techniques are still challenged by the long-standing overfitting problem (i.e., a generated patch that passes all tests is actually incorrect). Plenty of approaches have been proposed for automated patch correctness assessment (APCA). Nonetheless, dynamic ones (i.e., those that need to execute tests) are time-consuming, while static ones (i.e., those built on top of static code features) are less precise. Therefore, embedding techniques have recently been proposed, which assess patch correctness by embedding token sequences extracted from the changed code of a generated patch. However, existing techniques rarely consider the context information and program structures of a generated patch, which are crucial for patch correctness assessment as revealed by existing studies. In this study, we explore the idea of context-aware code change embedding that considers program structures for patch correctness assessment. Specifically, given a patch, we not only focus on the changed code but also take the correlated unchanged part into consideration, through which the context information can be extracted and leveraged. We then utilize the AST path technique for representation, through which the structure information from AST nodes can be captured. Finally, based on several pre-defined heuristics, we build a deep learning based classifier to predict the correctness of the patch. We implemented this idea as Cache and performed extensive experiments to assess its effectiveness. Our results demonstrate that Cache can (1) perform better than previous representation learning based techniques (e.g., Cache relatively outperforms existing techniques by ≈6%, ≈3%, and ≈16%, respectively, under three diverse experiment settings), and (2) achieve overall higher performance than existing APCA techniques while even being more precise than certain dynamic ones including PATCH-SIM (92.9% vs. 83.0%). Further results reveal that the context information and program structures leveraged by Cache contributed significantly to its outstanding performance.
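
To make the two ingredients named in the abstract more concrete (AST paths and the unchanged context of a patch), the following is a minimal sketch and not Cache's actual implementation. It makes several assumptions: it uses Python's standard `ast` module on a toy Python function, whereas the APR tools discussed here target other programs; the helper names `ast_paths` and `split_patch` are hypothetical; and "context" is approximated simply as the set of leaf-to-leaf paths that appear in both the buggy and the patched version. The classifier and the pre-defined heuristics mentioned in the abstract are not modeled.

```python
import ast
from itertools import combinations


def _token(node):
    """Surface token for a terminal node; falls back to the node type name."""
    if isinstance(node, ast.Name):
        return node.id
    if isinstance(node, ast.arg):
        return node.arg
    if isinstance(node, ast.Constant):
        return repr(node.value)
    return type(node).__name__


def ast_paths(source, max_len=8):
    """Collect (token, path, token) triples between pairs of AST leaves.

    Hypothetical helper: a path lists node type names from one leaf up to
    the lowest common ancestor and down to the other leaf.
    """
    leaves = []  # each entry: (token, [nodes from the root down to the leaf])

    def walk(node, trail):
        trail = trail + [node]
        kids = [c for c in ast.iter_child_nodes(node)
                if not isinstance(c, ast.expr_context)]  # skip Load/Store markers
        if isinstance(node, (ast.Name, ast.arg, ast.Constant)) or not kids:
            leaves.append((_token(node), trail))
            return
        for kid in kids:
            walk(kid, trail)

    walk(ast.parse(source), [])
    paths = set()
    for (tok_a, trail_a), (tok_b, trail_b) in combinations(leaves, 2):
        k = 0  # length of the root prefix shared by both leaves
        while k < min(len(trail_a), len(trail_b)) and trail_a[k] is trail_b[k]:
            k += 1
        hops = ([type(n).__name__ for n in reversed(trail_a[k:])]
                + [type(trail_a[k - 1]).__name__]  # lowest common ancestor
                + [type(n).__name__ for n in trail_b[k:]])
        if len(hops) <= max_len:
            paths.add((tok_a, "|".join(hops), tok_b))
    return paths


def split_patch(buggy, patched):
    """Assumed simplification: shared paths stand in for the unchanged context."""
    before, after = ast_paths(buggy), ast_paths(patched)
    return before - after, after - before, before & after


# Toy patch (illustrative only): the comparison operator changes from > to >=.
BUGGY = "def f(a, b):\n    if a > b:\n        return a\n    return b\n"
PATCHED = "def f(a, b):\n    if a >= b:\n        return a\n    return b\n"

if __name__ == "__main__":
    removed, added, context = split_patch(BUGGY, PATCHED)
    print("removed:", sorted(removed))
    print("added:", sorted(added))
    print("context paths:", len(context))
```

In this sketch the changed paths (`removed`, `added`) and the shared paths (`context`) would be embedded separately before being fed to a classifier; how Cache actually selects the correlated unchanged code and encodes paths is described in the full paper, not here.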

Index Terms

Software and its engineering → Software creation and management → Software verification and validation → Software defect analysis → Software testing and debugging

Published in
ACM Transactions on Software Engineering and Methodology, Volume 31, Issue 3
July 2022
912 pages
ISSN: 1049-331X
EISSN: 1557-7392
DOI: 10.1145/3514181
Editor: Mauro Pezzè, USI Università della Svizzera italiana and SIT Schaffhausen Institute of Technology, Switzerland
Copyright © 2022 Association for Computing Machinery.
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 18 May 2022
- Revised: 1 December 2021
- Accepted: 1 December 2021
- Received: 1 July 2021
Author Tags
Automated program repair
patch correctness
deep learning
Qualifiers
- research-article
- Refereed