On 5/22/2018 1:39 AM, Michael Haggerty wrote:
On 05/21/2018 08:10 PM, Derrick Stolee wrote:
[...]
In the Discussion section of the `git merge-base` docs [1], we have the
following:
When the history involves criss-cross merges, there can be more than
one best common ancestor for two commits. For example, with this topology:
---1---o---A
\ /
X
/ \
---2---o---o---B
both 1 and 2 are merge-bases of A and B. Neither one is better than
the other (both are best merge bases). When the --all option is not
given, it is unspecified which best one is output.
This means our official documentation mentions that we do not have a
concrete way to differentiate between these choices. This makes me think
that this change in behavior is not a bug, but it _is_ a change in
behavior. It's worth mentioning, but I don't think there is any value in
making sure `git merge-base` returns the same output.
Does anyone disagree? Is this something we should solidify so we always
have a "definitive" merge-base?
[...]
This may be beyond the scope of what you are working on, but there are
significant advantages to selecting a "best" merge base from among the
candidates. Long ago [1] I proposed that the "best" merge base is the
merge base candidate that minimizes the number of non-merge commits that
are in
git rev-list $candidate..$branch
that are already in master:
git rev-list $master
(assuming merging branch into master), which is equivalent to choosing
the merge base that minimizes
git rev-list --count $candidate..$branch
In fact, this criterion is symmetric if you exchange branch ↔ master,
which is a nice property, and indeed generalizes pretty simply to
computing the merge base of more than two commits.
In that email I also included some data showing that the "best" merge
base almost always results in either the same or a shorter diff than the
more or less arbitrary algorithm that we currently use. Sometimes the
difference in diff length is dramatic.
To me it feels like the best *deterministic* merge base would be based
on the above criterion, maybe with first-parent reachability, commit
times, and SHA-1s used (in that order) to break ties.
Thanks, everyone, for your perspective on this. I'm walking away with
these conclusions:
1. While this is a change in behavior, it is not a regression. We do not
need to act immediately to preserve old behavior in these ambiguous cases.
2. We should (eventually) define tie-breaking conditions. I like
Michael's suggestion above.
Thanks,
-Stolee