Research Article · Open Access · Artifacts Available (v1.1)

Two Birds with One Stone: Boosting Code Generation and Code Search via a Generative Adversarial Network

Published: 16 October 2023

Abstract

Automatically transforming developers' natural language descriptions into source code has been a longstanding goal in software engineering research. Two types of approaches have been proposed in the literature to achieve this: code generation, which involves generating a new code snippet, and code search, which involves reusing existing code. However, despite existing efforts, the effectiveness of the state-of-the-art techniques remains limited. To seek further advancement, our insight is that code generation and code search can help overcome each other's limitations: the code generator can benefit from feedback on the quality of its generated code, which can be provided by the code searcher, while the code searcher can benefit from the additional training data augmented by the code generator to better understand code semantics. Drawing on this insight, we propose a novel approach that combines code generation and code search techniques using a generative adversarial network (GAN), enabling mutual improvement through adversarial training. Specifically, we treat code generation and code search as the generator and discriminator in the GAN framework, respectively, and incorporate several customized designs for our tasks. We evaluate our approach in eight different settings, and consistently observe significant performance improvements for both code generation and code search. For instance, when using NatGen, a state-of-the-art code generator, as the generator and GraphCodeBERT, a state-of-the-art code searcher, as the discriminator, we achieve a 32% increase in CodeBLEU score for code generation, and a 12% increase in mean reciprocal rank for code search on a large-scale Python dataset, compared to their original performance.
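The adversarial coupling described above can be illustrated with a deliberately minimal sketch. The real system pairs large pre-trained models (e.g., NatGen as generator, GraphCodeBERT as discriminator); here, as a stated simplification, each model is replaced by a single scalar "quality" stand-in, and the discriminator's feedback acts as a reward signal that pushes the generator toward the distribution of real code. All names and numbers below are illustrative assumptions, not the paper's actual training procedure.

```python
import random

random.seed(0)

# "Real" code snippets are modeled as quality scores clustered around 1.0.
def real_sample():
    return 1.0 + random.gauss(0, 0.05)

gen_quality = 0.2      # stand-in for the generator's output quality
disc_threshold = 0.5   # stand-in for the discriminator's decision boundary
lr = 0.05              # shared learning rate for both players

for step in range(200):
    fake = gen_quality + random.gauss(0, 0.05)
    real = real_sample()
    # Discriminator step: the "code searcher" adjusts its boundary to sit
    # between real and generated samples, judging generated code quality.
    disc_threshold += lr * ((real + fake) / 2 - disc_threshold)
    # Generator step: when the generated sample falls below the boundary,
    # the discriminator's margin serves as a reward that nudges the
    # generator upward (loosely mimicking policy-gradient sequence GANs).
    if fake < disc_threshold:
        gen_quality += lr * (disc_threshold - fake)

print(round(gen_quality, 2))  # should end much closer to the real cluster
```

The point of the sketch is the feedback loop: the discriminator improves by seeing the generator's increasingly realistic fakes (its "augmented training data"), while the generator improves by chasing the discriminator's moving boundary, which is the mutual-improvement dynamic the abstract claims for code generation and code search.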

