Re: [RFC v2] Rebasing merges: a jorney to the ultimate solution (Road Clear)

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi Buga,

On Thu, 8 Mar 2018, Igor Djordjevic wrote:

> On 07/03/2018 15:08, Johannes Schindelin wrote:
> > 
> > > > Didn't we settle on Phillip's "perform successive three-way merges
> > > > between the original merge commit and the new tips with the old tips
> > > > as base" strategy?
> > >
> > > It seems you did, dunno exactly why.
> > 
> > That is not true. You make it sound like I was the only one who liked
> > this, and not Phillip and Buga, too.
> 
> For myself, I do actually favor Sergey`s approach in general, but 
> _implemented_ through what Phillip described (or a mixture of both, to 
> be precise). But, let me explain... :)

So as you explained later in this sub-thread, Sergey's approach is
essentially the same as Phillip's.

I still do not understand Sergey's approach on a fundamental level. I
mean, I can follow his instructions how to implement his algorithm, but it
is as if I had a blindfold on and somebody guided me through a maze: I
understand *what* I am supposed to do, but I have no clue *why*.

And admittedly, I got very frustrated when a document was thrown my way
that is too long to read in one sitting, and all of my attempts at getting
clear and comprehensible answers to specific questions were met with "go
and read that document, I am sure you will understand then".

For something as fundamental to my daily workflow as an interactive rebase
(*especially* when trying to maintain the branch topology), this is no
good at all.

Since you already confirmed that there is essentially no difference
between the two approaches, I will simply go with the one I understand, in
particular I understand *why* it works.

But let's read on, maybe I will change my mind based on your explanations
(which do answer my questions, thank you so much for that)...

> > > The main problem with this decision is that we still don't see how
> > > and when to stop for user amendment using this method. OTOH, the
> > > original has this issue carefully discussed.
> > 
> > Why would we want to stop, unless there are merge conflicts?
> 
> Because we can reliably know that something "unusual" happened - and by
> that I don`t necessarily mean "wrong", but just might be worth user
> inspection.

We have a similar conundrum in recursive merges. Remember how multiple
merge bases are merged recursively? There can be merge conflicts, too, in
*any* of the individual merges involved, and indeed, there are (under
relatively rare circumstances).

Since we already faced that problem, and we already answered it by
presenting possibly nested merge conflicts, I am in strong favor of
keeping our new scenario consistent: present possibly-nested merge
conflicts.

As far as I understand, one of the arguments in favor of the current
approach was: there is no good way to tell the user where they are, and
how to continue from there. So better just to continue and present the
user with the entire set of conflicts, and have an obvious way out.

> For example, situation like this (M is made on A3 with `-s ours`, 
> obsoleting Bx commits):
> 
> (1) ---X8--X9 (master)
>        |\
>        | A1---A2---A3
>        |             \
>        |              M (topic)
>        |             /
>        \-B1---B2---B3
> 
> ... where we want to rebase M onto X9 is what I would call "usual 
> stuff", but this situation (M is still made on A3 with `-s ours`, 
> obsoleting Bx commits, but note cherry-picked B2'):
> 
> (2) ---X8--B2'--X9 (master)
>        |\
>        | A1---A2---A3
>        |             \
>        |              M (topic)
>        |             /
>        \-B1---B2---B3
> 
> ... where we still want to rebase M onto X9 is what we might consider 
> "unusual", because we noticed that something that shouldn`t be part 
> of the rebased merge commit (due to previous `-s ours`) actually got 
> in there (due to later cherry-pick), and just wanting the user to 
> check and confirm.

We already have those scenarios when performing a regular interactive
rebase, where a patch was already applied upstream. In the normal case,
the user is not even shown B2, thanks to the --cherry-pick option used in
generating the todo list.

Granted, in some cases --cherry-pick does not detect that, and then we
generate a todo list including B2, and when that patch is applied, the
interactive rebase stops, saying that there are no changes to be
committed.

And this behavior is exactly the same with --recreate-merges!

So I do not think that it would make sense to bother the user *again* when
rebasing the merge commit.

If there are merge conflicts, yes, we will have to. If there are none
(even if your U1' != U2'), it would be outright annoying to stop.

> > > "rebase sides of the merge commit and then three-way merge them back
> > > using original merge commit as base"
> > 
> > And that is also wrong, as I had proved already! Only Buga's addition
> > made it robust against dropping/modifying commits, and that addition
> > also makes it more complicated.
> 
> No, this is actually right, that sentence nicely describing _how_ it 
> works.

Does it? Because that's exactly backwards from how Phillip's approach
works: it certainly does not use the original merge commit as base. It
uses the non-rebased merge parent as base, one at a time.

Now I am curious...

> That addition of mine was just including the correct merge base (being
> the original merge commit that we are "rebasing"), and that`s what
> Sergey is talking about.

Okay, render me still unconvinced that Sergey's approach is actually
correct.

> > > > - it is *very easy* to reason about, once it is pointed out that
> > > > rebases and merges result in the same trees.
> > >
> > > The original is as easy to reason about, if not easier, especially as
> > > recursive merge strategy is not being used there in new ways.
> > 
> > So do it. I still have to hear a single-sentence, clear and obvious
> > explanation why it works.

So the closest I got was the description with the *original merge commit*
as merge base, which I disagree with. It does not make sense to me.

Consecutive three-way merges between the original merge commit and the
merge parents (with the *original merge parents* as merge base,
respectively), still makes the most sense to me: it is a merge between the
amendmends to the merge commit and the changes introduced by rebasing the
merge parents.

I am now getting the sense as if Sergey's approach (with your, let's say,
"fix") is trying to apply too much, and by using the original merge commit
as merge base then tries to undo part of that.

> > > I honestly don't see any advantages of Phillip's method over the
> > > original, except personal preferences. At the same time, I have no
> > > objection of using it either, provided consistency check problem is
> > > solved there as well.
> > 
> > Okay, let me reiterate then, because I do not want this point to be
> > missed:
> > 
> > Phillip's method is essentially merging the new tips into the original
> > merge, pretending that the new tips were not rebased but merged into
> > upstream.
> > 
> > So it exploits the duality of the rebase and merge operation, which both
> > result in identical trees (potentially after resolving merge conflicts).
> > 
> > I cannot think of any such interpretation for your proposal augmented by
> > Buga's fix-ups. And I haven't heard any such interpretation from your
> > side, either.
> 
> Ok here it goes (I might be still wrong, but please bare with me).
> 
> What Sergey originally described also uses the "duality of the rebase 
> and merge operation", too ;) Just in a bit different way (and might 
> be a more straightforward one?).

I will be the judge whether it looks more straight-forward to me. :-)

> Here`s a starting point, two commits A and B, merged into M:
> 
> (3) ---A
>         \
>          M
>         /
>     ---B
> 
> 
> According the "patch theory"[1] (which might not be too popular 
> around here, but should serve the purpose for what I`m trying to 
> explain),

I think that the patch theory is not usually quoted here because

1) originally, Darcs had this snake oil way of (over-)selling its theory
   as having "roots in quantum mechanics" [*1*] at the time Git took off,
   and

2) it does not really have a good concept of rebasing, even if it should
   be theoretically possible to integrate that into the "theory of
   patches".

> each merge commit can be "transformed" to two non-merge commits, one on
> top of each of the merge parents, where new commit brings its original
> merge parent commit tree to the state of the merge commit tree:
> 
> (4) ---A---U1
> 
> 
> 
>     ---B---U2

You get the same result if you stick to regular graph theory, of course.
All you need to do is to interpret the directed vertices between nodes as
a patch & parent relationship. Simple.

> Now, we have two new commits, U1 and U2, each having the same tree as 
> previous merge commit M, but representing changes in regards to 
> specific parents - and this is essentially what Sergey`s original 
> approach was using (whether he knew it, or not).
> 
> When it comes to rebasing, it`s pretty simple, too. As this:
> 
> (5) ---X1---o---o---o---o---o---X2 (master)
>        |\
>        | A1---A2---A3
>        |             \
>        |              M
>        |             /
>        \-B1---B2---B3
> 
> ... actually equals this:
> 
> (6) ---X1---o---o---o---o---o---X2 (master)
>        |\
>        | A1---A2---A3---U1
>        |
>        |
>        |
>        \-B1---B2---B3---U2
> 
> ... where trees of M, U1 and U2 are same,

Okay, so you basically duplicated the merge commit and dropped its
semantics as a merge commit. That is a very big difference to Phillip's
approach already.

It opens the door to ambiguities, as we will see later.

> and we can use the regular rebase semantics and rebase it to this:
> 
> (7) ---X1---o---o---o---o---o---X2 (master)
>                                 |\
>                                 | A1'--A2'--A3'--U1'
>                                 |
>                                 |
>                                 |
>                                 \-B1'--B2'--B3'--U2'
> 
> ... which is essentially this again:
> 
> (8) ---X1---o---o---o---o---o---X2 (master)
>                                 |\
>                                 | A1'--A2'--A3'
>                                 |            \
>                                 |             M'
>                                 |            /
>                                 \-B1'--B2'--B3'
> 
> ... where it is still true that trees of U1', U2' and M' are still 
> the same. So we managed to rebase a merge commit without ever doing a 
> merge :) (note that, practically, we _can_ finally even merge U1' and 
> U2' to produce M', it shouldn`t really matter as all the trees are 
> the same, so merge will be a no-op)

U1' and U2' do not have the same tree, though *especially* when the user
did not edit the todo list to insert/modify/drop/reorder commits.

This violation of expectations is of course caused by "duplicating the
merge commit" into U1 and U2.

But let's continue.

> But, as we saw (what you`ve explained), this doesn`t really work in 
> case one of the sides of the merge gets "disbalanced" (so to say), 
> like dropping a commit (which could also happen non-interactively, 
> where a commit has been cherry-picked to a different branch, but
> previously obsoleted by `-s ours` merge).

Precisely. A *very* important counter-argument to this approach so far.

> As observed, dropped commit could still wrongly get into final merge 
> commit tree (or cherry-picked commit wrongly not get there), due to 
> the nature of those rebased U1 and U2 temporary commits to hold all 
> the differences in regards to their specific merge commit parent.

The reason for this, of course, is that either U1's or U2's diff will show
those differences, *and we still try to rebase them even if the user
already dropped them*.

But let's continue.

> A natural improvement to original idea which Sergey eventually came 
> up with after my findings (which you now ended up calling a hack, even, 
> but I would respectfully disagree), is to use original merge commit M 
> as a merge base for final merge of U1' and U2' - and here is why it 
> works, too, and why I wouldn`t consider it a hack, but a proper (and 
> a solid) solution.
> 
> Merge commit M is what we are rebasing, so we should have a way to 
> somehow learn from it, alongside U1 and U2 temporary commits - and 
> that is what using original merge commit M as a base does for us. It 
> helps us spot any "disbalances" that happened in comparison to original 
> merge parent trees, which is exactly what we`re interested in.

... except that it gets the direction wrong. Rather than trying to *avoid*
rebasing possibly dropped changes, it tries to kind of "undo" them by
using a merge base that does not really make sense (unless you think of it
as a "revert").

It would make sense if M could interpreted as a branch point. But it
cannot, as we specifically did *not* continue to develop the merge parents
from that merge commit.

Instead, what we did was to branch off of the original branch point X1.

Reframing the rebase of a sub-branch (X1..A3) as merge with upstream,
however, we can interpret M and A3' as revisions we want to merge, with A3
as the merge base.

(You can think of it in terms of the "theory of patches" thusly: if X1..A3
is represented by patch K, X1..X2 by patch L, and X2..A3' by K', then what
we want to merge into M is K^(-1).L.K', which is precisely what A3..A3'
translates to.)

You can also think of it as diverging changes going from A3: one direction
was to merge (resulting in the commit M), the other direction was to
rebase onto X2 (resulting in A3'), and a 3-way merge M <- A3 -> A3' will
reconcile those changes (and you will want to repeat with B3/B3', too).

Let's put that into the context of your example: instead of introducing U1
and U2, we introduce V1 and V2 right away, as temporary *merge* commits,
where the tree of V1 is identical to the one of A3', and the tree of V2
to that of B3'.

 ---X1---o---o---o---o---o---X2 (master)
    |                        |\
    |                        | A1'--A2'--A3'--V1
    |\                       |               /
    | -A1---A2---A3----------+---------------
    |              \         |
    |               M        \-B1'--B2'--B3'--V2
    |              /                         /
     \-B1---B2---B3--------------------------

Note: the *really* important difference is that these temporary commits
are based on the *rebased* history rather than the *unrebased* history.

Second note: this is very related to Git for Windows' original strategy of
coping with the non-fast-forwarding nature of rebases: after rebasing the
Windows-specific patches on top of core Git, we merged (with `-s ours`)
the unrebased history. This 1) makes it fast-forwardable, and 2) still
rebases the patches [*2*].

Third note: my favorite mental model is still the duality of rebasing and
merging, in which case V1 and V2 would not have A3' and B3' as first
parents, but X2.

Phillip's strategy is to merge M with V1 and V2.

This translates to "merge the amendments of the merge commit M with the
changes introduced by rebasing its merge parents".

The really cute part about this is that (in contrast to using U1' and
U2'), we do not merge the amendments of the merge commit multiple times,
but exactly once. And therefore, we do not need to "undo" them by using
the original merge commit as merge base, either (which would introduce
many more opportunities for merge conflicts to creep in, oftentimes
unnecessary conflicts to begin with).

> In ideal case (as per "patch theory"[1]), final merge of rebased U1' 
> and U2' with original merge commit M as a base would be rather simple, 
> too (not to say pointless again), as both U1' and U2' would have same 
> changes in comparison to M, namely all the changes from commit we are 
> rebasing from (X1 above) to commit we are rebasing to (X2 above).
> 
> But in a more complex case like this:
> 
> (9) ---X1---B2'---o---o---o---o---X2 (master)
>                                   |\
>                                   | A12--A2'---B3'
>                                   |             \
>                                   |              M'
>                                   |             /
>                                   \-B1'--B3'---B4
> 
> ..., being what we are ultimately interested in, it helps us notice 
> all kind of changes on our rebased merge parent branches, and act 
> accordingly (as shown by my previously posted test script[2]).

To use the "theory of patches" to explain why Phillip's approach is so
much more appealing: in Sergey's approach, we will rebase U1 (which is
"sort of" B1.B2.B3 in the "theory of patches"). If the cherry-pick of B2'
caused merge conflicts that had to be resolved, then these merge conflicts
will have to be resolved *again* when rebasing U1 (because B2 is *part of*
U1). And of course they will have to be resolved *again* when merging U1'
with U2'.

In short: Phillip's approach is a short-cut that avoids unnecessary merge
conflicts because it avoids rebasing changes that would need to be undone
right away anyway.

> All this said (and hoping anyone is still reading this), to shed some 
> light on what I meant by "favoring Sergey`s approach in general, but 
> _implemented_ through what Phillip described".
> 
> I think we should use temporary commits U1 and U2, being a sound 
> approach backed up by the "patch theory"[1], as it helps us notice 
> any "disbalances" of the rebased merge commit parents trees (so we 
> can stop for user inspection), finally merging them with original 
> merge commit as a base, in order to spot additional changes on trees 
> of the merge commit parents we are rebasing.

In Phillip's approach, we do not need to rebase the amendments of the
merge commit M twice (or even more times, for octopus merges). Therefore,
there is no opportunity for these imbalances.

> But, the merging steps themselves, backed up by a need to stop for 
> conflict resolution as soon as one happens, should be performed 
> _iteratively_, like Phillip showed us... I would think :)

>From a user interface perspective, this is a bad idea: you would now have
to communicate to the user *where* in the process they are.

And for that you would have to explain that process to them. Including
"theory of patches" and all.

Taking me as a prime example, you now know how tedious and impractical
that would be.

> And, for example, that would do wonders when we introduce completely 
> new branch during an interactive rebase, too, for example:
> 
> (10) ---X1---B2'---o---o---o---o---X2 (master)
>         |\                        /|\
>         | A1---A2---A3---U1       || A12--A2'---B3'---U1'
>         |             \           ||             \
>         |              M          ||              M'
>         |             /           ||             /|
>         \-B1---B2---B3---U2       |\---B3'---B4-/-|---U2'
>                                   |               |
>                                   \-----B1'-------/
> 
> 
> In this situation, we would merge all temporary Ux commits using 
> original merge M as a merge base, and then merge _that_ resulting 
> tree with B1' (being a newly introduced merge commit parent), using 
> whatever merge base is most appropriate between the two (in this 
> case, X2).

The semantics of octopus merges are very different from regular recursive
merges. I am not sure that we want to go there in this discussion...

If, on the other hand, you do not try to turn M from a regular 2-parent
merge commit into an octopus merge during an interactive rebase, but
instead split the B branch into two branches *before* merging the result
into the rebased merge commit, we are in the same square as
reordering/dropping/inserting/modifying patches: to neither of the
presented strategies would it matter what kind of branch topology the
merge parents have.

For the record: I am still not convinced that Phillip's and Sergey's
approach are equivalent, even in terms of the "theory of patches". But if
they are, then Phillip's version is a shorter version that avoids applying
changes just to revert them right away. And in a setting where each patch
can cause merge conflicts (as the interactive rebase is), the less changes
you have to apply, the better.

Ciao,
Dscho

> [1] https://en.wikibooks.org/wiki/Understanding_Darcs/Patch_theory

Footnote *1*: https://web.archive.org/web/20050511033324/http://darcs.net/
And it must be said: the "theory of patches" are not rooted in quantum
mechanics. They might be inspired by Darc's inventor being more familiar
with it than with fundamental algebra, for sure. The "theory of patches"
is nothing else than an algebra on graph vertices, though.

Footnote *2*: We called this strategy the rebasing merge. We replaced it
by the opposite strategy ("merging rebase", i.e. starting with the `git
merge -s ours <unrebased>` and *then* applying the rebased patches)
because the latter makes it really obvious where the Windows-specific
patches *start*.

Of course, you could draw the diagram very easily also using the merging
rebase, the conclusion would still be the same.



[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux