It has been estimated that between 25 and 50 percent of the total cost and time in system development may be spent on software testing and debugging [4, 18], and that as much as 95 percent of the time spent on debugging may in turn be spent on fault location [11]. It is therefore important to automate, and thereby speed up, the process of locating where an error occurs in the code.

For this purpose programmers use debugging tools such as dbx, gdb, and various vendor-specific programs. These tools let the programmer keep track of the state of the program and the values of selected variables while stepping through its execution. The debugging process can be time-consuming, since the programmer must inspect values manually to determine when an error occurs.

To automate this process, most debuggers can set conditional breakpoints or watchpoints, so that the debugger stops only when some condition evaluates to true. For example, if the expected range of a variable is known, the debugger can stop as soon as the variable leaves that range. However, if the computation is "almost" correct, such information is of little help in determining where the error occurs.

Program slicing [1, 13, 16] is a way of determining which lines of code affect a certain variable. Thus, if one knows that a certain variable is incorrect, this method gives an overview of the other variables that affect it.

One must, however, know in which variables the error occurs and also when it occurs. An incorrect end result might be the consequence of a long chain of events in the program. For this reason execution backtracking [1] has been suggested. This method lets the user step backwards through the execution of the program, recreating its earlier states. For memory-intensive programs with long execution times this is not a practical procedure, as it would require storing vast amounts of data. Restricting backtracking to a limited time window saves memory, but the window might not reach back far enough to where the error originates. Moreover, this process still requires the user to inspect the program manually.
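The memory trade-off of a limited backtracking window can be made concrete with a small sketch. This is not the execution backtracking method of [1]; it only assumes, for illustration, that a hypothetical `BoundedHistory` class saves a full snapshot of the monitored state after every step and keeps at most `window` of them. Memory use is then bounded by `window`, but any state older than the window is lost.

```python
from collections import deque
from typing import Deque, Dict, Optional


class BoundedHistory:
    """Hypothetical state history that keeps only the last `window` snapshots."""

    def __init__(self, window: int) -> None:
        # deque with maxlen silently discards the oldest entry when full
        self.states: Deque[Dict[str, float]] = deque(maxlen=window)

    def record(self, state: Dict[str, float]) -> None:
        # Store a copy so later changes to `state` do not alter the history
        self.states.append(dict(state))

    def back(self, k: int) -> Optional[Dict[str, float]]:
        """Return the state k steps before the latest one, or None if evicted."""
        if k < 0 or k >= len(self.states):
            return None
        return self.states[-1 - k]


# Record 100 steps of a toy computation with a window of only 10 states.
hist = BoundedHistory(window=10)
x = 0.0
for n in range(100):
    x += 1.0
    hist.record({"step": float(n), "x": x})

latest = hist.back(0)    # state after step 99
origin = hist.back(79)   # state after step 20: already evicted, so None
```

If the error originated at step 20, the window of 10 states cannot reach it, even though memory use stayed small; enlarging the window moves the limit but does not remove it.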

The motivation for the current work came from debugging large time-stepped simulation codes. Examples include fluid dynamics computations, oil reservoir modeling, and weather forecasting. What these codes have in common is that they simulate the behavior of a complex system over time. The simulation is broken down into short time steps over which the model is able to predict the outcome with reasonable accuracy. Such simulations tend to be computationally intensive and may run for hours or even several days. These codes also tend to be very large, comprising up to several hundred thousand lines of code.

If an error occurs in such a program, locating it can be very difficult and time-consuming. This is especially true for large codes, and when the error only appears in long test runs.

The development of large numerical codes is usually carried out incrementally and over a long period of time. Thus, if parts of the code are changed and an error occurs, it is not unlikely that one has two versions of the program: an old one giving the correct answer and a new one giving the wrong one. In this setting it can be argued that the programmer knows enough about what has been changed in the program to locate the error. But adding new features and capabilities to a program might trigger an error in old parts of the code that had already been tested. Moreover, if the error only occurs for specific data sets, development of the code might have continued for some time before anyone realizes that it contains an error.

We present a framework called comparative debugging that automates this particular debugging situation. In this framework the programmer runs the old and the new versions of the program simultaneously while monitoring the values of a set of selected variables. If the value of any monitored variable differs between the two programs, the programmer is automatically notified. This directs the programmer to the first point at which the error manifests itself.
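The core idea can be sketched in a few lines. This is not the Wizard's implementation; it assumes, for illustration only, that each program version is modeled as a hypothetical Python generator yielding a snapshot of the monitored variables (name to value) after every time step, and that both versions expose the same variable names. The two runs are advanced in lockstep and the first mismatch is reported. The tolerance `tol` is also an assumption, added because reordered floating-point operations can make correct versions differ slightly.

```python
from typing import Dict, Iterator, Optional, Tuple

# A program version, modeled as a stream of per-step snapshots of the
# monitored variables (hypothetical interface).
Snapshots = Iterator[Dict[str, float]]


def first_divergence(
    old: Snapshots, new: Snapshots, tol: float = 0.0
) -> Optional[Tuple[int, str, float, float]]:
    """Advance both versions in lockstep and return the first mismatch.

    Returns (step, variable, old_value, new_value), or None if every
    monitored variable agrees (within tol) at every compared step.
    zip() stops at the shorter run, so extra steps are not compared.
    """
    for step, (a, b) in enumerate(zip(old, new)):
        for name in sorted(a):  # assumes b has the same variable names
            if abs(a[name] - b[name]) > tol:
                return step, name, a[name], b[name]
    return None


def simulate(steps: int, buggy: bool) -> Snapshots:
    """Toy time-stepped model: exponential decay with explicit Euler steps."""
    u, dt = 1.0, 0.1
    for n in range(steps):
        rate = 0.5
        if buggy and n >= 3:
            rate = 0.55  # injected regression in the "new" version
        u = u - dt * rate * u
        yield {"u": u, "t": (n + 1) * dt}


mismatch = first_divergence(simulate(10, buggy=False), simulate(10, buggy=True))
agree = first_divergence(simulate(10, buggy=False), simulate(10, buggy=False))
```

Here `mismatch` identifies step 3 and variable `u`, the first step at which the injected change takes effect, while `agree` is `None`. Stopping at the first mismatch, rather than comparing only end results, is what points the programmer to where the two executions start to diverge.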

We believe this to be a novel approach to automating, and thus speeding up, the debugging process, and one that can easily be incorporated into existing debugging tools. To illustrate this, we present a prototype tool called the Wizard that realizes the presented methods.

Aside from speeding up the debugging process, the Wizard can also be used to verify that two programs reach the same answer through the same set of computations.

The paper is organized as follows. In Section 2 we set up the theoretical framework for comparative debugging. In Section 3 we point to some applications where it might be of use. The Wizard itself is presented in Section 4, and we conclude in Section 5.