Re: [RFC] Rebasing merges: a jorney to the ultimate solution (Road Clear)

Jacob Keller <jacob.keller@xxxxxxxxx> · Sat, 17 Feb 2018 20:16:34 -0800

On Fri, Feb 16, 2018 at 5:08 AM, Sergey Organov <sorganov@xxxxxxxxx> wrote:
> Hi,
>
> By accepting the challenges raised in recent discussion of advanced
> support for history rebasing and editing in Git, I hopefully figured out
> a clean and elegant method of rebasing merges that I think is "The Right
> Way (TM)" to perform this so far troublesome operation. ["(TM)" here has
> second meaning: a "Trivial Merge (TM)", see below.]
>
> Let me begin by outlining the method in git terms, and special thanks
> here must go to "Johannes Sixt" <j6t@xxxxxxxx> for his original bright
> idea to use "cherry-pick -m1" to rebase merge commits.
>
> End of preface -- here we go.
>

I hope to take a more detailed look at this, also possibly with some
attempts at re-creating the process by hand to see it in practice.

> Given 2 original branches, b1 and b2, and a merge commit M that joins
> them, suppose we've already rebased b1 to b1', and b2 to b2'. Suppose
> also that B1' and B2' happen to be the tip commits on b1' and b2',
> respectively.
>
> To produce merge commit M' that joins b1' and b2', the following
> operations will suffice:
>
> 1. Checkout b2' and cherry-pick -m2 M, to produce U2' (and new b2').
> 2. Checkout b1' and cherry-pick -m1 M, to produce U1' (and new b1').
> 3. Merge --no-ff new b2' to current new b1', to produce UM'.
> 4. Get rid of U1' and U2' by re-writing parent references of UM' from
>    U1' and U2' to  B1' and B2', respectively, to produce M'.
> 5. Mission complete.
>

Seems pretty straight forward, go to each branch and cherry-pick the
merge respective to its relative parent, and then finally re-merge
everything, and consume the intermittent commits.

> Let's now see why and how the method actually works.
>
> Firs off, let me introduce you to my new friend, the Trivial Merge, or
> (TM) for short. By definition, (TM) is a merge that introduces
> absolutely no differences to the sides of the merge. (I also like to
> sometimes call him "Angel Merge", both as the most beautiful of all
> merges, and as direct antithesis to "evil merge".)
>
> One very nice thing about (TM) is that to safely rebase it, it suffices
> to merge its (rebased) parents. It is safe in this case, as (TM) itself
> doesn't posses any content changes, and thus none could be missed by
> replacing it with another merge commit.
>
> I bet most of us have never seen (TM) in practice though, so let's see
> how (TM) can help us handle general case of some random merge. What I'm
> going to do is to create a virtual (TM) and see how it goes from there.
>
> Let's start with this history:
>
>   M
>  / \
> B1  B2
>
> And let's transform it to the following one, contextually equivalent to
> the original, by introducing 2 simple utility commits U1 and U2, and a
> new utility merge commit UM:
>
>   UM
>  /  \
> U1   U2
> |    |
> B1   B2
>
> Here content of any of the created UM, U1, and U2 is the same, and is
> exact copy of original content of M. I.e., provided [A] denotes
> "content of commit A", we have:
>
> [UM] = [U1] = [U2] = [M]
>
> Stress again how these changes to the history preserve the exact content
> of the original merge ([UM] = [M]), and how U1 an U2 represent content
> changes due to merge on either side[*], and how neither preceding nor
> subsequent commits content would be affected by the change of
> representation.
>
> Now observe that as [U1] = [UM], and [U2] = [UM], the UM happens to be
> exactly our new friend -- the "Trivial Merge (TM)" his true self,
> introducing zero changes to content.
>
> Next we rebase our new representation of the history and we get:
>
>   UM'
>  /  \
> U1'  U2'
> |    |
> B1'  B2'
>
> Here UM' is bare merge of U1' and U2', in exact accordance with the
> method of rebasing a (TM) we've already discussed above, and U1' and U2'
> are rebased versions of U1 and U2, obtained by usual rebasing methods
> for non-merge commits.
>
> (Note, however, that at this point UM' is not necessarily a (TM)
> anymore, so in real implementation it may make sense to check if UM' is
> not a (TM) and stop for possible user amendment.)
>

This might be a bit tricky for a user to understand what the process
is, especially if they don't understand how it's creating special U1'
and U2' commits. However, it *is* the cleanest method I've either seen
or thought of for presenting the conflict to the user.

> Finally, to get to our required merge commit M', we get the content of
> UM' and record two actual parents of the merge:
>
>   M'
>  / \
> B1' B2'
>
> Where [M'] = [UM'].
>
> That's it. Mission complete.
>
> I expect the method to have the following nice features:
>
> - it carefully preserves user changes by rebasing the merge commit
> itself, in a way that is semantically similar to rebasing simple
> (non-merge) commits, yet it allows changes made to branches during
> history editing to propagate over corresponding merge commit that joins
> the branches, even automatically when the changes don't conflict, as
> expected.
>

Right.

> - it has provision for detection of even slightest chances of ending up
> with surprising merge (just check if UM' is still (TM)), so that
> implementation could stop for user inspection and amendment when
> appropriate, yet it is capable of handling trivial cases smoothly and
> automatically.

Nice!

>
> - it never falls back to simple invocation of merge operation on rebased
> original branches themselves, thus avoiding the problem of lack of
> knowledge of how the merge at hand has been performed in the first
> place. It doesn't prevent implementation from letting user to manually
> perform whatever merge she wishes when suspect result is automatically
> detected though.
>

Right, since we're re-creating the intermittent commits U1' and U2'
first based on the original merge, that's how we manage to maintain
the result of the merge. I like it.

> - it extends trivially to octopus merges.
>
> - it appears shiny to the point that it will likely be able to handle
> even darkest evil merges nicely, no special treatment required.
>

Yep, and I like that it has a pretty reasonable way of presenting
conflicts for resolution. It may be a bit tricky to explain the use of
the intermittent commits U1' and U2' though.

> Footnote:
>
> [*] We may as well consider the (UM,U1,U2) trio to be semantically split
> representation of git merge commit, where U1 and U2 represent content
> changes to the sides, and UM represents pure history joint. Or, the
> other way around, we may consider git merge commit to be optimized
> representation of this trio. I think this split representation could
> help to simplify reasoning about git merges in general.

Yes, I think this concept is pretty useful. I think it could be useful
as a way of showing how the merge worked.

Thanks,
Jake

>
> -- Sergey