On Mon, May 21, 2018 at 2:50 PM, Jeff King <peff@xxxxxxxx> wrote: > On Mon, May 21, 2018 at 11:33:11AM -0700, Elijah Newren wrote: > >> > In t6024-recursive-merge.sh, we have the following commit structure: >> > >> > # 1 - A - D - F >> > # \ X / >> > # B X >> > # X \ >> > # 2 - C - E - G >> > >> > When merging F to G, there are two "best" merge-bases, A and C. With >> > core.commitGraph=false, 'git merge-base F G' returns A, while it returns C >> > when core.commitGraph=true. This is due to the new walk order when using >> > generation numbers, although I have not dug deep into the code to point out >> > exactly where the choice between A and C is made. Likely it's just whatever >> > order they are inserted into a list. >> >> Ooh, interesting. >> >> Just a guess, but could it be related to relative ordering of >> committer timestamps? Ordering of committer timestamps apparently >> affects order of merge-bases returned to merge-recursive, and although >> that shouldn't have mattered, a few bugs meant that it did and the >> order ended up determining what contents a successful merge would >> have. See this recent post: >> >> https://public-inbox.org/git/CABPp-BFc1OLYKzS5rauOehvEugPc0oGMJp-NMEAmVMW7QR=4Eg@xxxxxxxxxxxxxx/ >> >> The fact that the merge was successful for both orderings of merge >> bases was the real bug, though; it should have detected and reported a >> conflict both ways. > > Traditionally we've inserted commits into the walk queue in commit-date > ordering, but with identical dates it may depend on the order in which > you reach the commits. Many of the tests are particularly bad for > showing this off because they do not use test_tick, and so you end up > with a bunch of commits with identical timestamps. > > If we're just using generation numbers for queue ordering, we're even > more likely to hit these cases, since they're expected to increase along > parallel branches at roughly the same rate. It's probably a good idea to > have some tie-breakers to make things more deterministic (walk order > shouldn't matter, but it can be confusing if we sometimes use one order > and sometimes the other). > > Even ordering by {generation, timestamp} isn't quite enough, since you > could still tie there. Perhaps {generation, timestamp, hash} would be a > sensible ordering? The hash sounds reasonable as the definite tie breaker. git merge-base is documented as "Find as good common ancestors as possible for a merge", so in case we do not require the tie breaking to be cheap, we could go by "smallest diff output" of the two diffs against the potential merge commit. Though I don't think this is really optimal for performance reasons.