Re: [PATCH v2] bisect: loosen halfway() check for a large number of commits

Junio C Hamano <gitster@xxxxxxxxx> · Thu, 12 Nov 2020 10:23:13 -0800

SZEDER Gábor <szeder.dev@xxxxxxxxx> writes:

> So let's loosen the check in the halfway() helper to consider commits
> within about 0.1% of the exact halfway point as halfway as well, and
> rename the function to approx_halfway() accordingly.  This will allow
> us to return early on a bigger good-bad range, even when there is no
> commit exactly at the halfway point, thereby reducing the runtime of
> the first command above considerably, from ~15s to 4.901s.

The optimization presented with this change would probably offer
more merge commits to be tested than a single-parent commit than the
original algorithm, simply because merges are inspected first before
single-parent commits so have better chance to be picked as "good
enough" among commits with similar goodness.

  Side note: This is merely an observation---I do not know if it is
  a good thing, a bad thing, or a neutral thing, but it would likely
  affect the end-user experience.

The optimization presented here gives probably more than enough
improvement, but it just occured to me when writing the entry to
explain the topic in the What's cooking report:

     "git bisect start/next" in a large span of history spends a lot of
     time trying to come up with exactly the half-way point; this can be
     optimized by stopping when we see a commit that is close enough to
     the half-way point.

The realization is that the optimization naturally will be affected
by the order the commits are visited.  If a commit that is close
enough to the half-way point happens to be visited earlier, it would
help terminate our search early.  And do_find_bisection() search
counts all merge commits before any commit on the linear ancestry
chain can be counted to optimize counting of commits on the linear
ancestry chain, which are expected to exist more than merge commits.

By sorting the "list" to somehow encourage commits near the half-way
appear early on it, we may raise the likelyhood that we'd find
good-enough commit early and terminate, no?  Perhaps sort by the
(absolute) distance between the committer timestamp of individual
commit and its median value, or something?

Thanks.