Speaker: Mati Shomrat
Title: Public PhD lecture: Detecting Refactored Clones
Abstract:
Software systems tend to contain sections
of code that are very similar, known as code clones. Such code duplication
may occur for a variety of reasons. Once duplicated, each copy takes on
a life of its own, as different transformations may be applied to it. Some
of these transformations are behavior-preserving (that is, refactorings),
while others may modify the behavior of a copy.
Cloning, therefore, creates the risk
of changes not being propagated to all copies and errors arising due to
diverging clones.
Developers often copy and paste code
to quickly implement functionality that has been implemented before.
With the proliferation of open-source repositories, this kind of
reuse is easier than ever. Sometimes, however, the code is copied
illegally; this may be intentional plagiarism or, given the complexity
of software licenses, an innocent mistake on the part of a developer
who believes that a certain piece of code can be legally copied.
Software-development companies need to protect themselves
against both kinds of violations.
The availability of automated refactoring
support in modern development environments further complicates the task
of clone detection, as these tools make it very easy for developers (and
plagiarists) to make significant and wide-ranging syntactic changes to
code without changing its functionality.
We present Cider, a general tool for
the identification of refactored clones. Cider is a semantic clone
detector, based on a graph representation of programs. The graph
abstraction allows Cider to detect semantically similar code fragments,
while abstracting away from the concrete syntax, thus avoiding the syntactic
effects of refactoring. Refactorings may change not only the intraprocedural
structure of code, but its interprocedural organization as well. Some
refactorings, such as Extract Method and Introduce Factory, introduce new
methods, while others, such as Inline Method, remove method calls
and may remove the called methods altogether. Cider is able to cope
with such interprocedural refactorings, and is unique in doing so.
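
To make the notion of an interprocedural refactored clone concrete, here is a small illustrative pair, written in Python purely for brevity. All names are hypothetical and do not come from the talk, and the sketch shows only what such a clone looks like, not how Cider detects it. The second version is what a copy might look like after renaming variables and applying an Extract Method refactoring; the two differ syntactically but compute the same result.

```python
def total_price_original(items):
    """Original fragment: all logic lives in one function body."""
    total = 0
    for price_cents, quantity in items:
        line = price_cents * quantity
        if quantity >= 10:
            line -= line // 10  # 10% bulk discount, integer cents
        total += line
    return total


def line_price(price_cents, quantity):
    """Helper introduced by a hypothetical Extract Method refactoring."""
    line = price_cents * quantity
    if quantity >= 10:
        line -= line // 10
    return line


def total_price_refactored(order_lines):
    """Refactored copy: renamed variables, per-line logic moved to a helper."""
    return sum(line_price(p, q) for p, q in order_lines)


sample = [(200, 3), (150, 10), (400, 1)]
# Both versions agree on the sample input despite their different structure.
assert total_price_original(sample) == total_price_refactored(sample)
```

A detector that compares only the text or tokens of one function at a time would see little in common between the two versions, because part of the logic now lives in a separate helper.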
Cider was evaluated on several open-source
projects. The results suggest that interprocedural clones are ubiquitous,
indicating the pervasive nature of the problem.