An Empirical Evaluation of the MuJava Mutation Operators

Ben Smith, Laurie Williams

Abstract

Mutation testing is used to assess the fault-finding effectiveness of a test suite. Information provided by mutation testing can also be used to guide the creation of additional valuable tests and/or to reveal faults in the implementation code. However, concerns about the time efficiency of mutation testing may prohibit its widespread, practical use. We conducted an empirical study using the MuClipse automated mutation testing plug-in for Eclipse on the back end of a small web-based application. The first objective of our study was to categorize the behavior of the mutants generated by selected mutation operators during successive attempts to kill the mutants. The results of this categorization can help developers select mutation operators that improve the efficiency and effectiveness of their mutation testing. The second objective of our study was to identify patterns in the implementation code that remained untested after we attempted to kill all mutants.

1. Introduction

Mutation testing is a testing methodology in which two or more program mutations (mutants for short) are executed against the same test suite to evaluate the ability of the test suite to detect these alterations [5]. The mutation testing procedure entails adding or modifying test cases until the test suite is sufficient to detect all mutants [1]. The augmented test suite that results from mutation testing may reveal latent faults and provides a stronger safeguard against errors that might be injected in the future. However, the mutation process is computationally expensive and inefficient [3]. Most often, mutation operators produce mutants which demonstrate the need to modify the test bed code or the need for more test cases [3]. Some mutation operators, however, produce mutants which cannot be detected by any test suite, and the developer must manually determine that these are “false positive” mutants. Additionally, adding a new test case will frequently detect more mutants than was intended, which calls into question the necessity of multiple variations of the same mutated statement.
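To make this terminology concrete, the following minimal Java sketch illustrates a killed mutant, an undetectable mutant, and the effect of augmenting a test suite. It is an illustration only and is not drawn from MuJava, MuClipse, or the study's test bed: the class `MutationSketch`, its methods, and the two hand-written mutants (both relational-operator replacements) are hypothetical, and test cases are modeled as simple predicates rather than as unit tests in a testing framework.

```java
import java.util.List;
import java.util.function.IntBinaryOperator;
import java.util.function.Predicate;

// Illustrative sketch only: every name here is hypothetical and none of it
// is taken from MuJava or MuClipse.
class MutationSketch {

    // Original implementation under test.
    static int max(int a, int b) {
        return a > b ? a : b;
    }

    // Mutant 1: the relational operator > is replaced by <.
    static int maxMutantLess(int a, int b) {
        return a < b ? a : b;
    }

    // Mutant 2: > is replaced by >=. The two versions differ only when
    // a == b, where both branches yield the same value, so this mutant
    // behaves exactly like the original and no test can detect it.
    static int maxMutantGreaterEqual(int a, int b) {
        return a >= b ? a : b;
    }

    static List<IntBinaryOperator> mutants() {
        return List.of(MutationSketch::maxMutantLess,
                       MutationSketch::maxMutantGreaterEqual);
    }

    // A test case passes when the implementation returns the expected value.
    static Predicate<IntBinaryOperator> expect(int a, int b, int expected) {
        return impl -> impl.applyAsInt(a, b) == expected;
    }

    // Initial suite: with equal arguments every variant returns 4.
    static List<Predicate<IntBinaryOperator>> weakSuite() {
        return List.of(expect(4, 4, 4));
    }

    // Augmented suite: one test case added to target mutant 1.
    static List<Predicate<IntBinaryOperator>> augmentedSuite() {
        return List.of(expect(4, 4, 4), expect(2, 7, 7));
    }

    // A mutant is killed when at least one test case in the suite fails on it.
    static boolean isKilled(IntBinaryOperator mutant,
                            List<Predicate<IntBinaryOperator>> suite) {
        for (Predicate<IntBinaryOperator> test : suite) {
            if (!test.test(mutant)) {
                return true;
            }
        }
        return false;
    }

    static long countKilled(List<IntBinaryOperator> mutants,
                            List<Predicate<IntBinaryOperator>> suite) {
        return mutants.stream().filter(m -> isKilled(m, suite)).count();
    }

    public static void main(String[] args) {
        // Prints "weak suite killed 0 of 2"
        System.out.println("weak suite killed "
                + countKilled(mutants(), weakSuite()) + " of " + mutants().size());
        // Prints "augmented suite killed 1 of 2"
        System.out.println("augmented suite killed "
                + countKilled(mutants(), augmentedSuite()) + " of " + mutants().size());
    }
}
```

In this sketch the added test case kills the first mutant, while the second mutant survives every possible test because it is behaviorally identical to the original. A developer would have to recognize that by inspection, which is the manual cost described above.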

As a result, empirical data about the behavior of the mutants produced by a given mutation operator can help us understand the usefulness of the operator in a given context. Our research objective is to compare the resultant behavior of mutants produced by the set of mutation operators supported in the MuJava tool to empirically determine which are the most effective. Additionally, after completion of the mutation process for a given Java class, we categorized the untested lines of code into exception handling, branch statements, method body and return statements. Finally, our research reveals several design decisions which can be implemented in future automated mutation tools to improve their efficiency for users.

A mutation testing empirical study was conducted using two versions of three major classes for the Java back end of the iTrust web healthcare application. For each Java class, we began by maximizing the efficiency of the existing unit test suite by removing redundant and incorrect tests. Next, the initial mutation score and the associated detail for each mutant were recorded. We then iteratively attempted to write tests to detect each mutant, one at a time, until every mutant had been examined. Data, such as mutation score and mutant status, were recorded after each iteration. When all mutants had been examined, a line coverage utility was used to ascertain the remaining untested lines of code. These lines of code were then categorized by their language constructs. The study was conducted using the MuClipse mutation testing plug-in for Eclipse. MuClipse was adapted from the MuJava [12] testing tool.

The remainder of this paper is organized as follows: Section 2 briefly explains mutation testing and summarizes other studies that have been conducted to evaluate its efficacy. Next, Section 3 provides information on MuClipse and its advancements for the mutation process. Section 4 details the test bed and the procedure used to gather our data, including terms specific to this study. Then, Section 5 shows the results, their interpretation, and the limitations of the study. Finally, Section 6 details some lessons learned from gathering these data which can be applied to the development of future automated mutation tools and which can be used by developers when executing mutation testing in practice.

==