Re: [RFD] should all merge bases be equal?

Michael Haggerty <mhagger@xxxxxxxxxxxx> · Thu, 9 Feb 2017 15:44:09 +0100

On 10/18/2016 12:28 AM, Junio C Hamano wrote:
> [...]
> Being accustomed how fast my merges go, there is one merge that
> hiccups for me every once in a few days: merging back from 'master'
> to 'next'.  [...]
> 
> The reason why this merge is slow is because it typically have many
> merge bases.  [...]

I overlooked this topic until just now :-(

I spent a lot of time looking at merge bases a couple of years ago [1],
originally motivated by the crappy diffs you get from

    git diff master...branch

when the merge base is chosen poorly. In that email I included a lot of
data and suggested a different heuristic: define the "best" merge base
$M to be the one that minimizes the number of non-merge commits between
$M and either of the branch tips (it doesn't matter which tip you
choose); i.e., the one that minimizes

    git rev-list --count --no-merges $M..$TIP

The idea is that a merge base that is "closer" content-wise to the
tips will probably yield smaller diffs. I would expect that merge base
also to yield simpler merges, though I didn't test that. Relying on the
number of commits (rather than some other measure of how much the
content has been changed) is only a heuristic, but it seems to work well
and it can be implemented pretty cheaply.
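
A minimal sketch of that heuristic as a shell function, for illustration
only. The name best_merge_base is made up here, and this is not the
script mentioned below. It assumes it is run inside a git repository,
uses the first argument as $TIP, and on a tie keeps whichever candidate
"git merge-base --all" lists first:

```shell
#!/bin/sh
# best_merge_base A B (hypothetical helper): print the merge base of A
# and B that has the fewest non-merge commits between it and tip A.
best_merge_base () {
	best= best_count=
	for m in $(git merge-base --all "$1" "$2")
	do
		# Number of non-merge commits in $m..A
		count=$(git rev-list --count --no-merges "$m..$1") || return 1
		if test -z "$best" || test "$count" -lt "$best_count"
		then
			best=$m best_count=$count
		fi
	done
	# Fail (and print nothing) if A and B share no history
	test -n "$best" && echo "$best"
}
```

Something like "git diff $(best_merge_base master branch) branch" would
then stand in for "git diff master...branch".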

At GitHub we actually use an algorithm like the one I described, though
it is implemented as a script and has never been integrated into git.
And (for no particular reason) we include merge commits in the commit
count; it doesn't make much difference whether merges are included or
excluded.

Your idea to look at the first-parent histories of the two branch tips
is an interesting one and has the nice theoretical property that it is
based on the DAG topology rather than a count of commits. I'd be very
curious to see how the sizes of asymmetric diffs differ between your
method and mine, because for me smaller and more readable diffs are one
of the main benefits of better merge bases.
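
One rough way to make that comparison, sketched under assumptions: the
helper name diff_size_per_base is made up, and "size" is taken to mean
added plus deleted lines as reported by --numstat (binary files count
as zero). For each candidate merge base it prints the size of the diff
from that base to the second tip, so the bases picked by the two methods
can be looked up in the output:

```shell
#!/bin/sh
# diff_size_per_base A B (hypothetical helper): for each merge base M of
# A and B, print "<added+deleted lines> <M>" for the diff from M to B.
diff_size_per_base () {
	for m in $(git merge-base --all "$1" "$2")
	do
		git diff --numstat "$m" "$2" |
		awk -v m="$m" '{ n += $1 + $2 } END { print n + 0, m }'
	done
}
```

Piping the output through "sort -n" puts the base with the smallest
diff first.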

I would worry a bit that your proposed algorithm won't perform as well
for people who use less disciplined workflows than git.git or the Linux
kernel. For example, many people merge a lot more frequently and
chaotically, maybe even with the parents reversed from the canonical order.

Anyway, I mostly wanted to remind you of the earlier discussion of this
topic. There's a lot more information there.

Michael

[1] http://public-inbox.org/git/539A25BF.4060501@xxxxxxxxxxxx/