Re: Git bisect does not find commit introducing the bug

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Sun, Feb 19, 2017 at 12:32 PM, Alex Hoffman <spec@xxxxxx> wrote:
>> At the end of the git-bisect man page (in the SEE ALSO section) there
>> is a link to https://github.com/git/git/blob/master/Documentation/git-bisect-lk2009.txt
>> which has a lot of details about how bisect works.
>
> Thanks for pointing out the SEE ALSO section. I think it makes sense
> to include/describe the entire algorithm in the man page itself,
> although I am not sure whether the graphs would be always correctly
> visually represented in the man page format.

It would possibly be very long to describe the entire algorithm, as it
can be quite complex in some cases and it is difficult to understand
without graphs. Maybe we could describe it, or some parts of it, in a
separate document and provide links at different places in the man
page.
Anyway feel free to send patches.

>> The goal is to find the first bad commit, which is a commit that has
>> only good parents.
>
> OK, bisect's mission is more exact than I thought, which is good. M

Good that you seem to agree with this goal.

>> As o1 is an ancestor of G, then o1 is considered good by the bisect algorithm.
>> If it was bad, it would means that there is a transition from bad to
>> good between o1 and G.
>> But when a good commit is an ancestor of the bad commit, git bisect
>> makes the assumption that there is no transition from bad to good in
>> the graph.
>
> The assumption that there is no transition from bad to good in the
> graph did not hold in my example and it does not hold when a feature
> was recently introduced and gets broken relative shortly afterwards.
> But I consider it is easy to change the algorithm not to assume, but
> rather to check it.

I don't think the default algorithm will change soon, but there have
been discussions for a long time about adding options to use different
algorithms.

For example people have been discussing a "--first-parent" option for
many years as well as recently. It would bisect only along the first
parents of the involved commits, and it could help find the merge
commit that introduced a bug in the mainline.

>> git bisect makes some assumptions that are true most of the time, so
>> in practice it works well most of the time.
>
> Whatever the definition of "most of the time" everyone might have, I
> think there is room for improvement.

So feel free to send patches that would implement an option with the
improvements you want.

> Below I am trying to make a small
> change to the current algorithm in order to deal with the assumption
> that sometimes does not hold (e.g in my example), by explicitly
> validating the check.
>
>> --o1--o2--o3--G--X1
>>     \                \
>>      x1--x2--x3--x4--X2--B--
>>       \              /
>>        y1--y2--y3
>
> Step 1a. (Unchanged) keep only the commits that:
>
>         a) are ancestor of the "bad" commit (including the "bad" commit itself),
>         b) are not ancestor of a "good" commit (excluding the "good" commits).
>
> The following graph results:
>       x1--x2--x3--x4--X2--B--
>        \              /
>         y1--y2--y3

I would say that the above graph is missing X1.

> Step 1b. (New) Mark all root commits of the resulting graph (i.e
> commits without parents) as unconfirmed (unconfirmed=node that has
> only bad parents). Remove all root commits that user already confirmed
> (e.g if user already marked its parent as good right before starting
> bisect run). For every unconfirmed root commit check if it has any
> good parents. In the example above check whether x1 has good parents.

I think I understand the above...

>      If the current root element has any parents and none of them is
> good, we can delete all paths from it until to the next commit that
> has a parent in the ancestors of GOOD. In the example above to delete
> the path x1-x3 and x1-y3. Also add new resulting root commits to the
> list of unconfirmed commits (commit x4).
>      Otherwise mark it as confirmed.

... but I don't understand the logic of the above. If the root element
has a bad parent, then it means that the "first bad commit" is either
the bad parent or one of its ancestors, so it is not logical to delete
it. In your example if x1 has one bad parent, this bad parent and its
ancestors should be included in the search of the first bad commit.

Otherwise the goal is not any more to find the first bad commit.

PS: I saw that you have just sent another version of the algorithm,
but I don't want to take a look at it right now. Anyway I am keeping
my above comments as they might still be useful.

> Step2. Continue the existing algorithm.
>
>
> If this improvement works (i.e you do not find any bugs in it and it
> is feasible to implement, which seems to me)

As you describe it, I don't think it is compatible with the goal of
finding the first bad commit.
Also there are many things that are feasible to implement, but it
doesn't mean that someone will soon make the effort to implement them
in a way that looks good enough to be deemed worth merging into the
current code base.

> the following would be
> its advantages:
> 1. An assumption less, as we explicitly check the assumption.

Checking can be costly. If the probability that the check will fail is
very low, while the cost of checking is high, it is less costly on
average to not check.

> 2. It might be quicker, because we delete parts of graph that cannot
> contain transitions.

I don't agree that it's a good idea to delete what you suggest above.
Or if you think that the goal should not be to find the "first bad
commit" in the above case, then you should explain what the goal
should be.

> 3. It returns more exact results.

Yeah, but checking every commit related to the good and bad commits
would also return more exact results. (This can probably be done using
`git rebase --exec ...` by the way.) One could even print a nice graph
with all the good and bad commits. The problem is that it would not be
efficient. If git bisect makes some assumptions, it is because they
have been deemed reasonable and they have worked well in practice.
It's also because the goal of git bisect is to be efficient, otherwise
there would be no point in using a binary search algorithm in the
first place.



[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]