Derrick Stolee <stolee@xxxxxxxxx> writes: > On 5/22/2018 1:39 AM, Michael Haggerty wrote: >> On 05/21/2018 08:10 PM, Derrick Stolee wrote: >>> [...] >>> In the Discussion section of the `git merge-base` docs [1], we have the >>> following: >>> >>> When the history involves criss-cross merges, there can be more than >>> one best common ancestor for two commits. For example, with this topology: >>> >>> ---1---o---A >>> \ / >>> X >>> / \ >>> ---2---o---o---B >>> >>> both 1 and 2 are merge-bases of A and B. Neither one is better than >>> the other (both are best merge bases). When the --all option is not >>> given, it is unspecified which best one is output. >>> >>> This means our official documentation mentions that we do not have a >>> concrete way to differentiate between these choices. This makes me think >>> that this change in behavior is not a bug, but it _is_ a change in >>> behavior. It's worth mentioning, but I don't think there is any value in >>> making sure `git merge-base` returns the same output. >>> >>> Does anyone disagree? Is this something we should solidify so we always >>> have a "definitive" merge-base? >>> [...] >> This may be beyond the scope of what you are working on, but there are >> significant advantages to selecting a "best" merge base from among the >> candidates. Long ago [1] I proposed that the "best" merge base is the >> merge base candidate that minimizes the number of non-merge commits that >> are in >> >> git rev-list $candidate..$branch >> >> that are already in master: >> >> git rev-list $master >> >> (assuming merging branch into master), which is equivalent to choosing >> the merge base that minimizes >> >> git rev-list --count $candidate..$branch Is the above correct... >> In fact, this criterion is symmetric if you exchange branch ↔ master, >> which is a nice property, and indeed generalizes pretty simply to >> computing the merge base of more than two commits. ...as it doesn't seem to have the described symmetry. >> >> In that email I also included some data showing that the "best" merge >> base almost always results in either the same or a shorter diff than the >> more or less arbitrary algorithm that we currently use. Sometimes the >> difference in diff length is dramatic. >> >> To me it feels like the best *deterministic* merge base would be based >> on the above criterion, maybe with first-parent reachability, commit >> times, and SHA-1s used (in that order) to break ties. > > Thanks, everyone, for your perspective on this. I'm walking away with > these conclusions: > > 1. While this is a change in behavior, it is not a regression. We do > not need to act immediately to preserve old behavior in these > ambiguous cases. > > 2. We should (eventually) define tie-breaking conditions. I like > Michael's suggestion above. One thing I'd like to point out is that when searching for some algorithm to speed up merge-base calculation (which is called lowest common ancestor in graph theory, and for which I have currently found only an algorithm with O(|V|*{E|) preparation time, and U(1) query) I have found instead attempts to rigorously define single representative lowest common ancestor. It might be worth a look how it is done. Another possible source to compare against is the algorithm used by Mercurial (which as far as I know doesn't use recursive merge strategy, so it needs to chose one merge base). HTH, -- Jakub Narębski