Re: Could this be done simpler?

Junio C Hamano <gitster@xxxxxxxxx> · Thu, 25 Jun 2009 11:32:30 -0700

Matthias Andree <matthias.andree@xxxxxx> writes:

> And if I may be so bold: Please rewrite before somebody starts polishing the
> bisect facilities WRT octopus merges. These seem unrelated, as in: you don't
> need to make bisect more convenient to be able to fix the description of
> git-pull --append...

Let's have a refresher course of how bisection works with a history with
merges.

Assume that you have this history (time flows from left to right, recent
commits are known to be bad, old commits are known to be good).

                       o---o---o---o---A
                      /                 \
  ---o---o---o---o---F---o---o---o---B---M

In real life, you would start from a history with more commits on top of M
and only know that the tip of that sequence is bad, but for brevity, let's
assume we bisected and already know M is bad.

If B is good, the breakage was either introduced at M, or was on the side
branch leading to A, but not older than F where A and B forked from.

    Side note.  As in all other discussion in this message, remember
    that bisect is for finding a _single_ breakage that was left
    unfixed til the tip of the history being bisected.  "B is good"
    means "the _single_ breakage is not in the commit that would
    affect B, i.e. in B's ancestors",

If B is bad, on the other hand, the branch leading to A since the fork
point F is exonerated and we do not have to look at the side branch that
leads to A.

Which means that by seeing one the tip of a merged branch is good, you
can see that everything before the merge base is good and you need to only
look at _the other_ branch.

What happens if M is an Octopus?

                       o---o---o---o---A
                      /                 \
  ---o---o---o---o---F---o---o---o---B---M
                  \       \             /|
                   \       o---o---o---C |
                    \                    |
                     o---o---o---o---o---D

If B is good, you still need to look at histories leading to A, C, and D
individually.  Of course if B is bad, then you do not have to look at 
the histrories leading to A, C and D from their respective fork points,
but you still do have to look at the shared past.

But we could optimize further.  After knowing M, an Octopus merge, is bad,
when we are tempted to test one of the tips of the branches that was
merged (say B), we can instead give a tree that is a result of merging
only A and B (i.e. excluding C and D) for testing.  If it is good, then
the histories leading to both A and B are good, and we only need to check
side branches leading C and D since they forked from the shared common
history.  If combination of A and B is bad, on the other hand, then we do
not have to check branch histories leading to C nor D.

Doing so essentially shifts the balance between what happens if a single
test turns out to be good or bad.  If we test the tip of the branch, and
if it is bad, we will eliminate other forks (but still need to test the
shared history).  If it is good, we only eliminate that particular branch
and shared history, but all the other forks remain suspect.  So it is a
tradeoff between:

 - the size of all the other side branches since they forked == number of
   commits we do not have to test if this round says "bad";

 - the size of this side branch and the shared history == number of
   commits we do not have to test if this round says "good";

The current bisect algorithm makes this tradeoff, by computing the above
two numbers and finding the point that makes them closest to each other.
It however does not let you test two commits at the same time (i.e.
testing the merge of A and B in the above example) which could make the
tradeoff even more efficient.

I see there is another window for optimization we could make from the
above observation.  Making the number of commits eliminated when the test
is "good" and "bad" as close to equal as possible is the best strategy
when the tested commit has a 50-50 chance of being "good" or "bad".  If we
somehow know that the tested commit is likely to be "bad", we would want
to maximize the number of commits eliminated when the commit is indeed
"bad" (and vice versa).

I do not see an easy way to exploit this window offhand, though...
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html