Re: [PATCH v6 00/15] rebase -i: offer to recreate commit topology

Jacob Keller <jacob.keller@xxxxxxxxx> · Thu, 12 Apr 2018 15:14:50 -0700

On Thu, Apr 12, 2018 at 3:02 PM, Johannes Schindelin
<Johannes.Schindelin@xxxxxx> wrote:
> Hi Jake,
>
> On Thu, 12 Apr 2018, Jacob Keller wrote:
>
>> On Wed, Apr 11, 2018 at 10:42 PM, Sergey Organov <sorganov@xxxxxxxxx> wrote:
>> >
>> > Jacob Keller <jacob.keller@xxxxxxxxx> writes:
>> >> On Wed, Apr 11, 2018 at 6:13 AM, Sergey Organov <sorganov@xxxxxxxxx> wrote:
>> >>> It was rather --recreate-merges just a few weeks ago, and I've seen
>> >>> nobody actually commented either in favor or against the
>> >>> --rebase-merges.
>> >>>
>> >>> git rebase --rebase-merges
>> >>
>> >> I'm going to jump in here and say that *I* prefer --rebase-merges, as
>> >> it clearly mentions merge commits (which is the thing that changes).
>> >
>> > OK, thanks, it's fair and the first argument in favor of
>> > --rebase-merges I see.
>>
>> I'd be ok with "--keep-merges" also.
>
> My main argument against --keep-merges is that there is no good short
> option for it: -k and -m are already taken. And I really like my `git
> rebase -kir` now...
>

Right, that's unfortunate.

> A minor argument in favor of `--rebase-merges` vs `--keep-merges` is that
> we do not really keep the merge commits, we rewrite them. In the version
> as per this here patch series, we really create recursive merges from
> scratch.

I also don't have a strong opinion in regards to --keep vs --rebase..

>
> In the later patch series on which I am working, we use a variation of
> Phillip's strategy which can be construed as a generalization of the
> cherry-pick to include merges: for a cherry-pick, we perform a 3-way merge
> between the commit and HEAD, with the commit's parent commit as merge
> base. With Phillip's strategy, we perform a 3-way merge between the merge
> commit and HEAD (i.e. the rebased first parent), with the merge commit's
> first parent as merge base, followed by a 3-way merge with the rebased
> 2nd parent (with the original 2nd parent as merge base), etc
>
> However. This strategy, while it performed well in my initial tests (and
> in Buga's initial tests, too), *does* involve more than one 3-way merge,
> and therefore it risks something very, very nasty: *nested* merge
> conflicts.

Yea, it could. Finding an elegant solution around this would be ideal!
(By elegant, I mean a solution which produces merge conflicts that
users can resolve relatively easily).

>
> Now, I did see nested merge conflicts in the past, very rarely, but that
> can happen, when two developers criss-cross merge each others' `master`
> branch and are really happy to perform invasive changes that our merge
> does not deal well with, such as indentation changes.
>
> When rebasing a merge conflict, however, such nested conflicts can happen
> relatively easily. Not rare at all.

Right. This would be true regardless of what strategy we choose, I think.

>
> I found out about this by doing what I keep preaching in this thred:
> theory is often very nice *right* until the point where it hits reality,
> and then frequently turns really useless, real quickly. Theoretical
> musings can therefore be an utter waste of time, unless accompanied by
> concrete examples.

Agreed.

>
> To start, I built on the example for an "evil merge" that I gave already
> in the very beginning of this insanely chatty thread: if one branch
> changes the signature of a function, and a second branch adds a caller to
> that function, then by necessity a merge between those two branches has to
> change the caller to accommodate the signature change. Otherwise it would
> end up in a broken state.
>
> In my `sequencer-shears` branch at https://github.com/dscho/git, I added
> this as a test case, where I start out with a main.c containing a single
> function called core(). I then create one branch where this function is
> renamed to hi(), and another branch where the function caller() is added
> that calls core(). Then I merge both, amending the merge commit so that
> caller() now calls hi(). So this is the main.c after merging:
>
>         int hi(void) {
>                 printf("Hello, world!\n");
>         }
>         /* caller */
>         void caller(void) {
>                 hi();
>         }
>
> To create the kind of problems that are all too common in my daily work
> (seemingly every time some stable patch in Git for Windows gets
> upstreamed, it changes into an incompatible version, causing merge
> conflicts, and sometimes not only that... but I digress...), I then added
> an "upstream" where some maintainer decided that core() is better called
> greeting(), and also that a placeholder function for an event loop should
> be added. So in upstream, main.c looks like this:
>
>         int greeting(void) {
>                 printf("Hello, world!\n");
>         }
>         /* main event loop */
>         void event_loop(void) {
>                 /* TODO: place holder for now */
>         }
>
> Keep in mind: while this is a minimal example of disagreeing changes that
> may look unrealistic, in practice this is the exact type of problem I am
> dealing with on a daily basis, in Git for Windows as well as in GVFS Git
> (which adds a thicket of branches on top of Git for Windows) and with the
> MSYS2 runtime (where Git for Windows stacks patches on top of MSYs2, which
> in turn maintains their set of patches on top of the Cygwin runtime), and
> with BusyBox, and probably other projects I forgot spontaneously. This
> makes me convinced that this is the exact type of problem that will
> challenge whatever --rebase-merges has to deal with, or better put: what
> the user of --rebase-merges will have to deal with.
>
> (If I got a penny for every merge conflict I resolved, where test cases
> were appended to files in t/, I'd probably be rich by now. Likewise, the
> `const char *` -> `struct object_oid *` conflicts have gotten to a point
> where I can resolve them while chatting to somebody.)
>
> Now, rebasing the original patches above (renaming core() to hi(), and
> adding caller()) will obviously conflict with those upstream patches
> (renaming core() to greeting(), and adding event_loop()). That cannot be
> avoided. In the example above, I decided to override upstream's decision
> by insisting on the name hi(), and resolving the other merge conflict by
> adding *both* event_loop() and caller().
>
> The big trick, now, is to avoid forcing the user to resolve the same
> conflicts *again* when the merge commit is rebased. The better we can help
> the user here, the more powerful will this mode be.
>
> But here, Phillip's strategy (as implemented by yours truly) runs this
> problem:
>
>         int hi(void) {
>                 printf("Hello, world!\n");
>         }
>         <<<<<<< intermediate merge
>         <<<<<<< HEAD
>         /* main event loop */
>         void event_loop(void) {
>                 /* TODO: place holder for now */
>         =======
>         =======
>         }
>         >>>>>>> <HASH>... merge head #1
>         /* caller */
>         void caller(void) {
>                 hi();
>         >>>>>>> <HASH>... original merge
>         }
>
> Now, no matter who I ask, everybody so far agreed with me that this looks
> bad. Like, really bad. There are two merge conflicts, obviously, but it is
> not even clear which conflict markers belong together!
>
> It gets a little better when I take a page out of recursive merge's
> playbook, which uses different marker sizes for nested merge conflicts
> (which I of course implemented and pushed to `sequencer-shears`, currently
> still in an unpolished state):
>
>         int hi(void) {
>                 printf("Hello, world!\n");
>         }
>         <<<<<<< intermediate merge
>         <<<<<<<< HEAD
>         /* main event loop */
>         void event_loop(void) {
>                 /* TODO: place holder for now */
>         ========
>         =======
>         }
>         >>>>>>> <HASH>... merge head #1
>         /* caller */
>         void caller(void) {
>                 hi();
>         >>>>>>>> <HASH>... original merge
>         }
>
> At least now we understand which conflict markers belong together. But I
> still needed to inspect the intermediate states to understand what is
> going on:
>
> After the first 3-way merge (the one between the original merge commit and
> HEAD), we have the conflict markers around event_loop() and caller(),
> because they had been added into the same spot.
>
> The second 3-way merge would also want to add the event_loop(), but not
> caller(), so ideally it should see that event_loop() is already there and
> not add any conflict markers. But that is not the case: event_loop() was
> added *with conflict markers*.
>
> So those conflict markers in the first 3-way merge *cause* the conflicts
> in the second 3-way merge!
>
> And indeed, if we merge the other way round (original merge with 2nd
> parent, then with 1st parent), the result looks much better:
>
>         int hi(void) {
>                 printf("Hello, world!\n");
>         }
>         /* main event loop */
>         void event_loop(void) {
>                 /* TODO: place holder for now */
>         }
>         <<<<<<<< HEAD
>         ========
>         /* caller */
>         void caller(void) {
>                 hi();
>         }
>         >>>>>>>> <HASH>... intermediate merge
>
> So: the order of the 3-way merges does matter.
>
> I did implement this, too, in the `sequencer_shears` branch: if the first
> 3-way merge causes conflicts, attempt the second one, and if that one is
> clean, try merging that merge result into HEAD (forgetting about the first
> attempted 3-way merge).
>
> That is still unsatisfying, though, as it is easy to come up with a
> main2.c in the above example that requires the *opposite* merge order to
> avoid nested conflicts.

I agree, this solution won't work reliably because we can show
examples which fail the opposite way.

>
> The only way out I can see is to implement some sort of "W merge" or
> "chandelier merge" that can perform an N-way merge between one revision
> and N-1 other revisions (each of the N-1 bringing its own merge base). I
> call them "W" or "chandelier" because such a merge can be visualized by
> the original merge commit being the center of a chandelier, and each arm
> representing one of the N-1 merge heads with their own merge bases.
>

I think this approach sounds reasonable.

> Similar to the 3-way merge we have implemented in xdiff/xmerge.c, this
> "chandelier merge" would then generate the two diffs between merge base
> and both merge heads, except not only one time, but N-1 times. It would
> then iterate through all hunks ordered by file name and line range. Any
> hunk without conflicting changes would be applied as-is, and the remaining
> ones be turned into conflicts (handling those chandelier arms first where
> both diffs' hunks look identical).
>
> Have I missed any simpler alternative?

I *think* this would work well if I understand it, but it's difficult
to process without examples.

>
> Ciao,
> Johannes