[SATV 0x06] Delta Debugging
Once we have reproduced a program failure, we must find out what’s relevant:
- Does failure really depend on 10,000 lines of code?
- Does failure really require this exact schedule of events?
- Does failure really need this sequence of function calls?
Answering these questions helps simplify a test case. But why simplify?
- Ease of communication: a simplified test case is easier to communicate
- Easier debugging: smaller test cases result in smaller states and shorter executions
- Identify duplicates: simplified test cases subsume several duplicates
Let’s look at a concrete bug report in the Mozilla bug database. Consider this HTML page.
<td align=left valign=top>
<SELECT NAME="op sys" MULTIPLE SIZE=7>
<OPTION VALUE="All">All<OPTION VALUE="Windows 3.1">Windows 3.1<OPTION VALUE="Windows 95">Windows 95<OPTION VALUE="Windows 98">Windows 98<OPTION
VALUE="Windows ME">Windows ME<OPTION VALUE="Windows 2000">Windows 2000<OPTION VALUE="Windows NT">Windows NT<OPTION VALUE="MacSystem 7">Mac
System 7<OPTION VALUE="Mac System 7.5">Mac System 7.5<OPTION VALUE="Mac System 7.6.1">Mac System 7.6.1<OPTION VALUE="Mac System 8.0">Mac System
8.0<OPTION VALUE="Mac System 8.5">Mac System 8.5<OPTION VALUE="Mac System 8.6">Mac System 8.6<OPTION VALUE="Mac System 9.x">Mac System
9.x<OPTION VALUE="MacOS X">MacOS X<OPTION VALUE="Linux">Linux<OPTION VALUE="BSDI">BSDI<OPTION VALUE="FreeBSD">FreeBSD<OPTION
VALUE="NetBSD">NetBSD<OPTION VALUE="OpenBSD">OpenBSD<OPTION VALUE="AIX">AIX<OPTION VALUE="BeOS">BeOS<OPTION VALUE="HP-UX">HP-UX<OPTION
VALUE="IRIX">IRIX<OPTION VALUE="Neutrino">Neutrino<OPTION VALUE="OpenVMS">OpenVMS<OPTION VALUE="OS/2">OS/2<OPTION VALUE="OSF/1">OSF/1<OPTION
VALUE="Solaris">Solaris<OPTION VALUE="SunOS">SunOS<OPTION VALUE="other">other</SELECT>
</td>
<td align=left valign=top>
<SELECT NAME="priority" MULTIPLE SIZE=7>
<OPTION VALUE="--">--<OPTION VALUE="P1">P1<OPTION VALUE="P2">P2<OPTION VALUE="P3">P3<OPTION VALUE="P4">P4<OPTION VALUE="P5">P5</SELECT>
</td>
<td align=left valign=top>
<SELECT NAME="bug severity" MULTIPLE SIZE=7>
<OPTION VALUE="blocker">blocker<OPTION VALUE="critical">critical<OPTION VALUE="major">major<OPTION VALUE="normal">normal<OPTION
VALUE="minor">minor<OPTION VALUE="trivial">trivial<OPTION VALUE="enhancement">enhancement</SELECT>
</tr>
</table>
Loading this page using a certain version of Mozilla’s web browser and printing it causes a segmentation fault. Somewhere in this HTML input is something that makes the browser fail. But how do we find it?
How do we go from that large input to this simple input?
<SELECT>
01 Binary Search
What do we as humans do in order to minimize test cases?
One possibility is that we might use a binary search, cutting the test case in two and testing each half of the input separately. We could even iterate this procedure to shrink the input as much as possible. Even better, binary search is a process that can be easily automated for large test cases!
Let’s see how this application of binary search might work.
Proceeding by binary search, we throw away half the input and see if the output is still wrong. If it is, repeat this operation; if not, go back to the previous state and discard the other half of the input.
Now let’s see an example. Assuming that the original input that triggers the bug is:

After testing, the output is reported to be wrong, so we discard half of the original input and test again:

The testing still reports the same error, so we repeat the process:

It is evident from the testing result that the output is correct, indicating that this particular input is not the root cause of the bug. We therefore revert to the previous state and attempt the alternative half of the input.

The testing result indicates that this half is the root cause. However, we are yet to ascertain whether this is the minimal bug-triggered input, so it is necessary to continue splitting and testing it.

As demonstrated in the figure above, both halves successfully pass the test, indicating that the previous input has been sufficiently minimized. In other words, once streamlined, the input can no longer result in the error. This is the simplified input we are looking for.

Using binary search, we can identify the following code that is triggered by a bug:
<SELECT NAME="priority" MULTIPLE SIZE=7>
The next step is to identify the specific part of the code that triggers the bug. In this regard, we can still use binary search, but with finer granularity. The following graph illustrates the process and provides the result.

Finally, we identify the source of the issue:
<SELECT>
02 Delta Debugging
Let’s generalize the binary search procedure a bit. Instead of requiring ourselves to divide inputs strictly in half at each iteration, we could allow ourselves more granularity in dividing our input.

The process of dividing the input will result in an increase in the granularity. The impact of input granularity:
| Input granularity | Finer | Coarser |
|---|---|---|
| Chance of finding a failing input subset | Higher | Lower |
| Progress of the search | Slower | Faster |
Now we give the formal description of the delta-debugging algorithm.
Inputs and Failures:
- Let be the set of possible inputs
- corresponds to an input that passes
- corresponds to an input that fails
Changes:
- We can go from one input to another input by a series of changes. A change is a mapping which takes one input and changes it to another input.
- A change can be decomposed to a number of elementary changes where and .
- We have a set of changes such that . Each subset of of is a test case.
Testing test cases:
- Given a test case , we would like to know if the input generated by applying changes in to causes the same failure as .
- We define the function such that, given , iff is a failing input.
Minimizing test case:
- Goal: find the smallest test case such that
- A failing test case is called the global minimum of if .
- The global minimum is the smallest set of changes which will make the program fail. Finding the global minimum may require performing an exponential number of tests.
03 Search for 1-minimal Input
Find a set of changes that cause the failure, but removing any single change causes the failure to go away. This is 1-minimality.
-
A failing test case is called a local minimum of if .
-
A failing test case is n-minimal if .
-
A failing test case is 1-minimal if .
Here we present a naive algorithm for finding a 1-minimal subset of : if , then is 1-minimal else recurse on for some . This algorithm can be described in pseudocode as follows:
function find_1_minimal(c):
for δ in c:
c_prime = c - {δ} // 创建一个移除了 δ 的新集合
if test(c_prime) == F:
return find_1_minimal(c_prime) // 递归调用,继续简化
return c // 如果无法移除任何元素,返回当前集合,它是 1-minimal
Asymptotic analysis: In the worst case, we have to remove one element from the set per iteration, which is potentially . This is .
Could we improve this process? The answer is yes. It is silly to remove one element at a time. We can attempt to divide the change set into two initially, and increase the number of subsets if we are unable to make progress. Should we get lucky, the search will converge quickly. This is the minimization algorithm.
The minimization algorithm finds a 1-minimal test case:
- It partitions the set to - are pairwise disjoint, and . Specially, when , and .
- Define the complement of as .
- It starts with .
- It tests each test case defined by each partition and its complement, and reduces the test case if a smaller failure inducing set is found, otherwise it refines the partition (i.e. ).
Steps of the minimization algorithm:
- Start with and as the test set
- Test each and each
- There are three possible outcomes:
- Some causes failure: Go to step 1 with and .
- Some causes failure: Go to step 1 with and 1.
- No test causes failure: If granularity can be refined, go to step 1 with and ; otherwise, done, found the 1-minimal subset.
Asymptotic analysis: Worst case is still quadratic, when it subdivides until each set is of size 1 (reduce to the naive algorithm). But luckily, for a single failure, it converges in (binary search).
In practice, it may be necessary to customize and adapt the delta debugging algorithm to each significant system in order to exploit system knowledge; that is, it is not a plug-and-play algorithm.
Quiz: A program crashes when its input contains 42. Fill in the data in each iteration of the minimization algorithm assuming character granularity.
| Iteration | n | ||
|---|---|---|---|
| 1 | 2 | 2424 | 24 |
| 2 | 4 | 2424 | 2,4,242,224,424,244 |
| 3 | 3 | 242 | 2,4,24,42,22 |
| 4 | 2 | 42 | 4,2 |
Footnotes
-
This is to keep the granularity (search length) consistent, for . ↩