小特·白色空间 小特·白色空间

[SATV 0x06] Delta Debugging

2025年10月10日

Once we have reproduced a program failure, we must find out what’s relevant:

  • Does failure really depend on 10,000 lines of code?
  • Does failure really require this exact schedule of events?
  • Does failure really need this sequence of function calls?

Answering these questions helps simplify a test case. But why simplify?

  • Ease of communication: a simplified test case is easier to communicate
  • Easier debugging: smaller test cases result in smaller states and shorter executions
  • Identify duplicates: simplified test cases subsume several duplicates

Let’s look at a concrete bug report in the Mozilla bug database. Consider this HTML page.

<td align=left valign=top>
<SELECT NAME="op sys" MULTIPLE SIZE=7>
<OPTION VALUE="All">All<OPTION VALUE="Windows 3.1">Windows 3.1<OPTION VALUE="Windows 95">Windows 95<OPTION VALUE="Windows 98">Windows 98<OPTION 
VALUE="Windows ME">Windows ME<OPTION VALUE="Windows 2000">Windows 2000<OPTION VALUE="Windows NT">Windows NT<OPTION VALUE="MacSystem 7">Mac 
System 7<OPTION VALUE="Mac System 7.5">Mac System 7.5<OPTION VALUE="Mac System 7.6.1">Mac System 7.6.1<OPTION VALUE="Mac System 8.0">Mac System 
8.0<OPTION VALUE="Mac System 8.5">Mac System 8.5<OPTION VALUE="Mac System 8.6">Mac System 8.6<OPTION VALUE="Mac System 9.x">Mac System 
9.x<OPTION VALUE="MacOS X">MacOS X<OPTION VALUE="Linux">Linux<OPTION VALUE="BSDI">BSDI<OPTION VALUE="FreeBSD">FreeBSD<OPTION 
VALUE="NetBSD">NetBSD<OPTION VALUE="OpenBSD">OpenBSD<OPTION VALUE="AIX">AIX<OPTION VALUE="BeOS">BeOS<OPTION VALUE="HP-UX">HP-UX<OPTION 
VALUE="IRIX">IRIX<OPTION VALUE="Neutrino">Neutrino<OPTION VALUE="OpenVMS">OpenVMS<OPTION VALUE="OS/2">OS/2<OPTION VALUE="OSF/1">OSF/1<OPTION 
VALUE="Solaris">Solaris<OPTION VALUE="SunOS">SunOS<OPTION VALUE="other">other</SELECT>
</td>
<td align=left valign=top>
<SELECT NAME="priority" MULTIPLE SIZE=7>
<OPTION VALUE="--">--<OPTION VALUE="P1">P1<OPTION VALUE="P2">P2<OPTION VALUE="P3">P3<OPTION VALUE="P4">P4<OPTION VALUE="P5">P5</SELECT>
</td>
<td align=left valign=top>
<SELECT NAME="bug severity" MULTIPLE SIZE=7>
<OPTION VALUE="blocker">blocker<OPTION VALUE="critical">critical<OPTION VALUE="major">major<OPTION VALUE="normal">normal<OPTION 
VALUE="minor">minor<OPTION VALUE="trivial">trivial<OPTION VALUE="enhancement">enhancement</SELECT>
</tr>
</table>

Loading this page using a certain version of Mozilla’s web browser and printing it causes a segmentation fault. Somewhere in this HTML input is something that makes the browser fail. But how do we find it?

How do we go from that large input to this simple input?

<SELECT>

What do we as humans do in order to minimize test cases?

One possibility is that we might use a binary search, cutting the test case in two and testing each half of the input separately. We could even iterate this procedure to shrink the input as much as possible. Even better, binary search is a process that can be easily automated for large test cases!

Let’s see how this application of binary search might work.

Proceeding by binary search, we throw away half the input and see if the output is still wrong. If it is, repeat this operation; if not, go back to the previous state and discard the other half of the input.

Now let’s see an example. Assuming that the original input that triggers the bug is:

After testing, the output is reported to be wrong, so we discard half of the original input and test again:

The testing still reports the same error, so we repeat the process:

It is evident from the testing result that the output is correct, indicating that this particular input is not the root cause of the bug. We therefore revert to the previous state and attempt the alternative half of the input.

The testing result indicates that this half is the root cause. However, we are yet to ascertain whether this is the minimal bug-triggered input, so it is necessary to continue splitting and testing it.

As demonstrated in the figure above, both halves successfully pass the test, indicating that the previous input has been sufficiently minimized. In other words, once streamlined, the input can no longer result in the error. This is the simplified input we are looking for.

Using binary search, we can identify the following code that is triggered by a bug:

<SELECT NAME="priority" MULTIPLE SIZE=7>

The next step is to identify the specific part of the code that triggers the bug. In this regard, we can still use binary search, but with finer granularity. The following graph illustrates the process and provides the result.

Finally, we identify the source of the issue:

<SELECT>

02 Delta Debugging

Let’s generalize the binary search procedure a bit. Instead of requiring ourselves to divide inputs strictly in half at each iteration, we could allow ourselves more granularity in dividing our input.

The process of dividing the input will result in an increase in the granularity. The impact of input granularity:

Input granularityFinerCoarser
Chance of finding a failing input subsetHigherLower
Progress of the searchSlowerFaster

Now we give the formal description of the delta-debugging algorithm.

Inputs and Failures:

  • Let RR be the set of possible inputs
  • rPRr_{P}\in R corresponds to an input that passes
  • rFRr_{F}\in R corresponds to an input that fails

Changes:

  • We can go from one input r1r_{1} to another input r2r_{2} by a series of changes. A change δ\delta is a mapping RRR\rightarrow R which takes one input and changes it to another input.
  • A change δ\delta can be decomposed to a number of elementary changes δ1,δ2,,δn\delta_{1},\delta_{2},\dots,\delta_{n} where δ=δ1δ2δn\delta=\delta_{1}\circ\delta_{2}\circ\dots\circ\delta_{n} and (δiδj)(r)=δj(δi(r))(\delta_{i}\circ\delta_{j})(r)=\delta_{j}(\delta_{i}(r)).
  • We have a set of changes cF={δ1,δ2,,δn}c_{F}=\{\delta_{1},\delta_{2},\dots,\delta_{n}\} such that rF=(δ1δ2δn)(rP)r_{F}=(\delta_{1}\circ\delta_{2}\circ\dots\circ\delta_{n})(r_{P}). Each subset of cc of cFc_{F} is a test case.

Testing test cases:

  • Given a test case cc, we would like to know if the input generated by applying changes in cc to rPr_{P} causes the same failure as rFr_{F}.
  • We define the function test:Powerset(cF){P,F}\mathrm{test}: \mathrm{Powerset}(c_{F})\rightarrow\{P,F\} such that, given c={δ1,δ2,,δn}cFc=\{\delta_{1},\delta_{2},\dots,\delta_{n}\}\subseteq c_{F}, test(c)=F\mathrm{test}(c)=F iff (δ1δ2δn)(rP)(\delta_{1}\circ\delta_{2}\circ\dots\circ\delta_{n})(r_{P}) is a failing input.

Minimizing test case:

  • Goal: find the smallest test case cc such that test(c)=F\mathrm{test}(c)=F
  • A failing test case ccFc\subseteq c_{F} is called the global minimum of cFc_{F} if ccF,c<ctest(c)F\forall c^{\prime}\subseteq c_{F},\lvert c^{\prime}\rvert<\lvert c\rvert\Rightarrow\mathrm{test}(c^{\prime})\neq F.
  • The global minimum is the smallest set of changes which will make the program fail. Finding the global minimum may require performing an exponential number of tests.

03 Search for 1-minimal Input

Find a set of changes that cause the failure, but removing any single change causes the failure to go away. This is 1-minimality.

  • A failing test case ccFc\subseteq c_{F} is called a local minimum of cFc_{F} if cc,test(c)F\forall c^{\prime}\subset c,\mathrm{test}(c^{\prime})\neq F.

  • A failing test case ccFc\subseteq c_{F} is n-minimal if cc,ccntest(c)F\forall c^{\prime}\subset c,\lvert c\rvert-\lvert c^{\prime}\rvert\le n\Rightarrow\mathrm{test}(c^{\prime})\neq F.

  • A failing test case is 1-minimal if δic,test(c{δi})F\forall\delta_{i}\in c,\mathrm{test}(c-\{\delta_{i}\})\neq F.

Here we present a naive algorithm for finding a 1-minimal subset of cc: if δic,test(c{δi})F\forall\delta_{i}\in c,\mathrm{test}(c-\{\delta_{i}\})\neq F, then cc is 1-minimal else recurse on c{δ}c-\{\delta\} for some δc,test(c{δ})=F\delta\in c,\mathrm{test}(c-\{\delta\})=F. This algorithm can be described in pseudocode as follows:

function find_1_minimal(c):
    for δ in c:
        c_prime = c - {δ}  // 创建一个移除了 δ 的新集合
        if test(c_prime) == F:
            return find_1_minimal(c_prime)  // 递归调用,继续简化
    return c  // 如果无法移除任何元素,返回当前集合,它是 1-minimal

Asymptotic analysis: In the worst case, we have to remove one element from the set per iteration, which is potentially N+(N1)+(N2)+N+(N-1)+(N-2)+\dots. This is O(N2)\Omicron(N^2).

Could we improve this process? The answer is yes. It is silly to remove one element at a time. We can attempt to divide the change set into two initially, and increase the number of subsets if we are unable to make progress. Should we get lucky, the search will converge quickly. This is the minimization algorithm.

The minimization algorithm finds a 1-minimal test case:

  • It partitions the set cFc_{F} to Δ1,Δ2,,Δn\Delta_{1},\Delta_{2},\dots,\Delta_{n} - Δ1,Δ2,,Δn\Delta_{1},\Delta_{2},\dots,\Delta_{n} are pairwise disjoint, and cF=Δ1Δ2Δnc_{F}=\Delta_{1}\cup\Delta_{2}\cup\dots\cup\Delta_{n}. Specially, when n=2n=2, Δ1=2\Delta_{1}=\nabla_{2} and Δ2=1\Delta_{2}=\nabla_{1}.
  • Define the complement of Δ1\Delta_{1} as i=cFΔi\nabla_{i}=c_{F}-\Delta_i.
  • It starts with n=2n=2.
  • It tests each test case defined by each partition and its complement, and reduces the test case if a smaller failure inducing set is found, otherwise it refines the partition (i.e. n:=n2n:=n*2).

Steps of the minimization algorithm:

  1. Start with n=2n=2 and Δ\Delta as the test set
  2. Test each Δ1,Δ2,,Δn\Delta_{1},\Delta_{2},\dots,\Delta_{n} and each 1,2,,n\nabla_{1},\nabla_{2},\dots,\nabla_{n}
  3. There are three possible outcomes:
    1. Some Δi\Delta_{i} causes failure: Go to step 1 with Δ=Δi\Delta=\Delta_{i} and n=2n=2.
    2. Some i\nabla_{i} causes failure: Go to step 1 with Δ=i\Delta=\nabla_{i} and n=n1n=n-11.
    3. No test causes failure: If granularity can be refined, go to step 1 with Δ=Δ\Delta=\Delta and n=n2n=n*2; otherwise, done, found the 1-minimal subset.

Asymptotic analysis: Worst case is still quadratic, when it subdivides until each set is of size 1 (reduce to the naive algorithm). But luckily, for a single failure, it converges in O(logN)\Omicron(\log N) (binary search).

In practice, it may be necessary to customize and adapt the delta debugging algorithm to each significant system in order to exploit system knowledge; that is, it is not a plug-and-play algorithm.


Quiz: A program crashes when its input contains 42. Fill in the data in each iteration of the minimization algorithm assuming character granularity.

IterationnΔ\DeltaΔ1,Δ2,,Δn,1,2,,n\Delta_{1},\Delta_{2},\dots,\Delta_{n},\nabla_{1},\nabla_{2},\dots,\nabla_{n}
12242424
2424242,4,242,224,424,244
332422,4,24,42,22
42424,2

Footnotes

  1. This is to keep the granularity (search length) consistent, for i+1=i+Δi=n\lvert\nabla_{i}\rvert+1=\lvert\nabla_{i}\rvert+\lvert\Delta_{i}\rvert=n.

本文采用 CC BY-NC-SA 4.0 协议,转载请注明出处。

相关文章

评论