Re: commit-graph: change in "best" merge-base when ambiguous

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 05/21/2018 08:10 PM, Derrick Stolee wrote:
> [...]
> In the Discussion section of the `git merge-base` docs [1], we have the
> following:
> 
>     When the history involves criss-cross merges, there can be more than
> one best common ancestor for two commits. For example, with this topology:
> 
>     ---1---o---A
>         \ /
>          X
>         / \
>     ---2---o---o---B
> 
>     both 1 and 2 are merge-bases of A and B. Neither one is better than
> the other (both are best merge bases). When the --all option is not
> given,     it is unspecified which best one is output.
> 
> This means our official documentation mentions that we do not have a
> concrete way to differentiate between these choices. This makes me think
> that this change in behavior is not a bug, but it _is_ a change in
> behavior. It's worth mentioning, but I don't think there is any value in
> making sure `git merge-base` returns the same output.
> 
> Does anyone disagree? Is this something we should solidify so we always
> have a "definitive" merge-base?
> [...]

This may be beyond the scope of what you are working on, but there are
significant advantages to selecting a "best" merge base from among the
candidates. Long ago [1] I proposed that the "best" merge base is the
merge base candidate that minimizes the number of non-merge commits that
are in

    git rev-list $candidate..$branch

that are already in master:

    git rev-list $master

(assuming merging branch into master), which is equivalent to choosing
the merge base that minimizes

    git rev-list --count $candidate..$branch

In fact, this criterion is symmetric if you exchange branch ↔ master,
which is a nice property, and indeed generalizes pretty simply to
computing the merge base of more than two commits.

In that email I also included some data showing that the "best" merge
base almost always results in either the same or a shorter diff than the
more or less arbitrary algorithm that we currently use. Sometimes the
difference in diff length is dramatic.

To me it feels like the best *deterministic* merge base would be based
on the above criterion, maybe with first-parent reachability, commit
times, and SHA-1s used (in that order) to break ties.

I don't plan to work on the implementation of this idea myself (though
we've long used a script-based implementation of this algorithm
internally at GitHub).

Michael

[1] https://public-inbox.org/git/539A25BF.4060501@xxxxxxxxxxxx/
    See the rest of the thread for more interesting discussion.
[2]
https://public-inbox.org/git/8a9b3f20-eed2-c59b-f7ea-3c68b3c30bf5@xxxxxxxxxxxx/
    Higher in this thread, Junio proposes a different criterion.



[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux