
CoditT5: Pretraining for Source Code and Natural Language Editing

by Jiyang Zhang, Sheena Panthaplackel, Pengyu Nie, Junyi Jessy Li, Milos Gligoric

Released as an article.

2022  

Abstract

Pretrained language models have been shown to be effective in many software-related generation tasks; however, they are not well-suited for editing tasks as they are not designed to reason about edits. To address this, we propose a novel pretraining objective which explicitly models edits and use it to build CoditT5, a large language model for software-related editing tasks that is pretrained on large amounts of source code and natural language comments. We fine-tune it on various downstream editing tasks, including comment updating, bug fixing, and automated code review. By outperforming standard generation-based models, we demonstrate the generalizability of our approach and its suitability for editing tasks. We also show how a standard generation model and our edit-based model can complement one another through simple reranking strategies, with which we achieve state-of-the-art performance for the three downstream editing tasks.
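
The "simple reranking strategies" mentioned in the abstract can be illustrated with a short sketch: one model proposes candidate outputs, and the complementary model's score is used to reorder them. The candidate list and the scoring function below are hypothetical placeholders for illustration, not the paper's actual models or implementation.

    # Minimal reranking sketch (Python). In the paper's setting, candidates
    # would come from one model (e.g., a generation-based model) and the
    # scorer would stand in for the complementary edit-based model's
    # likelihood; both are placeholders here.
    from typing import Callable, List, Tuple

    def rerank(candidates: List[Tuple[str, float]],
               other_model_score: Callable[[str], float]) -> List[str]:
        # Reorder beam candidates (text, own_score) by the score the
        # complementary model assigns to each candidate text.
        return [text for text, _ in
                sorted(candidates,
                       key=lambda c: other_model_score(c[0]),
                       reverse=True)]

    if __name__ == "__main__":
        beam = [("return x + 1;", -0.7), ("return x - 1;", -0.9)]  # hypothetical candidates
        edit_model_score = lambda text: -len(text)                 # placeholder scorer
        print(rerank(beam, edit_model_score))
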

Archived Files and Locations

application/pdf  299.3 kB
file_sqfxyehdkzfv7icz2fngypibcq
arxiv.org (repository)
web.archive.org (webarchive)
Type  article
Stage   submitted
Date   2022-09-14
Version   v2
Language   en
arXiv  2208.05446v2
Work Entity
Access all versions, variants, and formats of this work (e.g., pre-prints)
Catalog Record
Revision: b6ef08a4-0cce-4aaa-bba8-286b3072d288