Abstract
Bug localization, which helps programmers identify the location of bugs
in source code, is an essential task in software development. Researchers
have already sought to harness powerful deep learning (DL) techniques to
automate it. However, training a bug localization model is usually
challenging because it requires a
large quantity of data labeled with the bug’s exact location, which is
difficult and time-consuming to collect. By contrast, obtaining bug
detection data with binary labels of whether there is a bug in the
source code is much simpler. This paper proposes a WEakly supervised bug
LocaLization (WELL) method, which only uses the bug detection data with
binary labels to train a bug localization model. With CodeBERT fine-tuned
on buggy-or-not binary-labeled data, WELL can address bug
localization in a weakly supervised manner. Evaluations on three
method-level synthetic datasets and one file-level real-world dataset
show that WELL significantly outperforms the existing state-of-the-art
(SOTA) model on typical bug localization tasks, such as variable misuse
and other programming bugs.
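
To make the weakly supervised setting concrete, the following is a minimal sketch of how a model trained only on binary buggy-or-not labels could still yield a location. It assumes that the detector exposes a per-token importance score alongside its bug probability; the abstract does not specify how WELL derives such scores, so the function name `localize_bug`, the `importance` values, and the `threshold` and `top_k` parameters are all hypothetical and chosen for illustration only.

```python
from typing import List, Sequence


def localize_bug(
    tokens: Sequence[str],
    importance: Sequence[float],
    bug_prob: float,
    threshold: float = 0.5,
    top_k: int = 1,
) -> List[int]:
    """Return indices of the tokens most likely to contain the bug.

    Hypothetical interface: `bug_prob` is the binary detector's output and
    `importance` is an assumed per-token score from the same detector.
    """
    if len(tokens) != len(importance):
        raise ValueError("tokens and importance must have the same length")
    # Detection step: if the code is judged clean, there is nothing to localize.
    if bug_prob < threshold:
        return []
    # Localization step: rank token positions by importance, highest first.
    ranked = sorted(range(len(tokens)), key=lambda i: importance[i], reverse=True)
    return sorted(ranked[:top_k])


# Toy variable-misuse example: the last `w` should have been `h`.
tokens = ["def", "area", "(", "w", ",", "h", ")", ":", "return", "w", "*", "w"]
# Made-up scores for illustration; a real detector would produce these.
importance = [0.01, 0.02, 0.01, 0.05, 0.01, 0.06, 0.01, 0.01, 0.03, 0.10, 0.04, 0.65]

predicted = localize_bug(tokens, importance, bug_prob=0.93)
print([(i, tokens[i]) for i in predicted])
```

The point of the sketch is that no location labels appear anywhere: the only supervised signal is the binary `bug_prob`, and the location is read off from scores the detector produces as a by-product.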