I finished my Master’s degree at Concordia University in Montreal, Canada as a research student. My thesis was about duplicate code refactoring and below is the abstract for it.

A Refactoring Technique for Large Groups of Software Clones

Code duplication, also known as software clones, is a persistent problem in software systems that is usually associated with error-proneness and poor software maintainability. Despite the fact that clone detection is a mature research field, clone refactoring has not been equally investigated. Clone refactoring requires the unification and merging of duplicated code, which is a challenging problem because of the changes that take place on the initial clones after their introduction.

In recent years, more research works attempted to address the challenges around clone refactoring by applying different techniques; however, they suffer from poor accuracy or performance issues, especially for large clone groups containing more than two clone instances. We contribute to this field by proposing an automated approach that a) finds refactorable subgroups (consisting of three clones or more) within the original group of clones, b) finds the statements that to be merged and extracted in a fast yet accurate way, and c) assesses the refactorability of clone subgroups.

We evaluated our approach in comparison to the state-of-the-art, and the results show that we have a high accuracy in matching the clone statements, while maintaining high performance. In a case study, where we carefully examined all clone groups in project JFreeChart 1.0.10, we found that around 49% of the 98 clone subgroups are actually refactorable. Finally, we conducted a large-scale study on over 44k clone groups (13.6k groups containing 3 clones or more) detected by four clone detection tools in nine open source projects to assess the refactorability for clone groups. The outcome of this study revealed the presence of 2,833 refactorable clone subgroups that contain in total 13,398 clone instances.

The complete thesis can be found here: