Re: merge-ort and --rebase-merges, was Re: [PATCH v2 0/7] Drop support for git rebase --preserve-merges

Elijah Newren <newren@xxxxxxxxx> · Mon, 13 Sep 2021 08:53:29 -0700

On Mon, Sep 13, 2021 at 4:24 AM Johannes Schindelin
<Johannes.Schindelin@xxxxxx> wrote:
>
> Hi Elijah,
>
> On Fri, 10 Sep 2021, Elijah Newren wrote:
>
> > On Fri, Sep 10, 2021 at 5:08 AM Johannes Schindelin
> > <Johannes.Schindelin@xxxxxx> wrote:
> > >
> > > On Tue, 7 Sep 2021, Elijah Newren wrote:
> > >

[...snip...]

> > If I've understood that all correctly, then my idea will give you the
> > following conflict to resolve:
> >
> > ==> rebase-of-original-merge, before conflict resolution:
> > int hi(void) {
> >     printf("Hello, world!\n");
> > }
> > /* main event loop */
> > void event_loop(void) {
> >     /* TODO: place holder for now */
> > }
> > /* caller */
> > void caller(void) {
> > <<<<<<< HEAD
> >     greeting();
> > ||||||| auto-remerge of original-merge
> >     core();
> > =======
> >     hi();
> > >>>>>>> original-merge
> > }
>
> That looks very intriguing! I would _love_ to play with this a bit, and I
> think you provided enough guidance to get going. I am currently preparing
> to go mostly offline for the second half of September, read: I won't be
> able to play with this before October. But I am definitely interested,
> this sounds very exciting.

If you start working on it, let me know.  I was thinking of playing
with it, but don't know exactly when I'll get time to do so; very
unlikely before October, and reasonably likely not even before the end
of the year.

While I've provided the high level details in this thread which are
good enough to handle the simple cases, I think that the interesting
bits are the non-simple cases.  I have not thought all of them
through, but I'll include below some notes of mine that might be
helpful if you get to it first.  Note that I focus below on the
non-simple cases, and discuss content-based conflicts before covering
path-based ones:

* We're doing a three way merge of merges: pre-M, M, and N to get R; M
is the original merge, pre-M is (automatic) remerge of M, and N is
automatic merge of rebased parents of M.

* Note that N is what current rebase-merges uses, so we have all
information from that merge and can provide it to the user when or if
it is helpful.

* Both pre-M and N may themselves have conflicts.

* We need to programmatically handle conflict marker length when pre-M
and/or N have nested conflicts.  (must modify merge routines to return
the maximal conflict marker depth used)

* Special case that pre-M matches N (per hunk): If both pre-M and N
have conflict markers, but they happen to match, then we know to take
the version from M and the result IS clean (at least for that hunk).
So, you can still get a clean merge even if there are conflicts in
both pre-M and N.

* Special case that pre-M matches M (per hunk): Usually in the
three-way merge of "Base, Left, Right => Result", if Base matches
either side then you get a clean merge.  However, if pre-M matches M
but N has conflicts, the result is NOT clean.  Another way to look at
this is that conflict markers are special and should be treated
differently than other lines.  (And path-based conflicts probably need
special handling too, as discussed below.)

* In the case of complicated conflicts, consider providing user with
both R:resulting-file and N:resulting-file (and point them at `git log
-1 --remerge-diff M [-- resulting-file]`)

* Having either binary files or path-based conflicts (e.g.
modify/delete, file vs. directory vs. submodule, switch to symlink vs.
switch to executable, rename/add, rename/rename -- either 1to2 or
2to1, directory rename detection, etc.) in either pre-M or N -- or
both -- are going to need special care.

* One example of path-based conflicts:  Let's say pre-M had no
conflict at path P, and that pre-M:P and M:P matched.  Let's say that
N:P had a modify/delete conflict.  Note that for modify/delete
conflicts we tend to print a message to the console and leave the
modified version of the file in the working tree.  Here, despite the
fact that pre-M:P and M:P matched, we cannot just take the modified
file from N at P as the merge result.  The modify/delete conflict
should persist and the user given an opportunity to resolve it.
Representing the modify/delete might be an interesting question,
though since...

* If both pre-M and N have conflicts, then pre-M would have had up to
three versions of file in the index at higher stages, N would have had
up to three versions of file in the index at higher stages, and M
would have one.  We cannot represent all 7 versions of the file in the
index at the end, which means conflict representation might be tricky.
content-based conflicts are easier to handle here than path-based
ones, since content-based conflicts can just do similar to what
rename/rename(2to1) conflicts do today: just stick the result of the
preliminary three-way merges into the index.  path-based conflicts get
more interesting because you can't do a preliminary merge of binary
files or a preliminary merge of a modify/delete, etc.

* If both pre-M and N have path-based conflicts, but of different
types, how exactly do we mention that to the user?  Just list all the
types?  (This probably qualifies as a case of a "complicated" conflict
where we want to (also?) provide the user with N:resulting-file
instead of (just?) R:resulting-file.)  We may also need to modify
merge machinery to return types of conflicts per-path, an extension of
what my "merge-into" series (not yet submitted) provides.