Re: [RFC PATCH 0/2] Introduce new merge-tree-ort command

Johannes Schindelin <Johannes.Schindelin@xxxxxx> · Tue, 11 Jan 2022 23:30:36 +0100 (CET)

Hi Elijah,

On Mon, 10 Jan 2022, Elijah Newren wrote:

> On Mon, Jan 10, 2022 at 5:49 AM Johannes Schindelin
> <Johannes.Schindelin@xxxxxx> wrote:
> >
> > On Fri, 7 Jan 2022, Elijah Newren wrote:
> >
> > > On Fri, Jan 7, 2022 at 9:58 AM Christian Couder
> > > <christian.couder@xxxxxxxxx> wrote:
> > > >
> > > > On Wed, Jan 5, 2022 at 5:54 PM Elijah Newren <newren@xxxxxxxxx>
> > > > wrote:
> > > > >
> > > > > On Wed, Jan 5, 2022 at 8:33 AM Christian Couder
> > > > > <christian.couder@xxxxxxxxx> wrote:
> > > >
> ...
> >
> > I _bet_ it needs first and foremost the information "is this
> > mergeable?".
> >
> > That is by far the most common question the code I saw had to answer,
> > without any follow-up questions asked.
> >
> > An add-on question is then "which files/directories conflicted?". And
> > there, it really only wanted the file names, along with the OIDs of
> > the respective item in the base, the HEAD and the merge branch.
>
> This might be difficult to provide for some cases, e.g.
> rename/rename(1to2), especially if that conflict gets entangled with any
> others.  Users would also have difficulty gaining these even using the
> command line `git merge` (with either recursive or ort) for these cases.
>
> Also, does this presume three OIDs?  What about the cases where there
> are 4 or 6 for a given path?
>
> I'm a bit worried about the assumptions underlying this request, and
> that it might not be satisfiable in general depending on what those
> assumptions are.

Well, that request is driven by the reality of a web UI.

You cannot reasonably resolve just any merge conflict in a web UI. But you
can easily resolve a trivial content conflict in, say, a README file,
without opening a console window, cloning the repository, starting an
editor, then saving the result after resolving the textual conflict,
committing it, then trying to force-push to the original PR branch (if the
contributor has given you permission to push).

So what I am talking about here really is the case where no rename
conflict happened, no directory/file conflict, no type change. Just the
good ole' textual conflicts where the same lines were modified in
divergent ways.

This means that we need the base, HEAD and MERGE OIDs here (and modes, you
are absolutely correct).

> > In my work in December, I also had to implement another thing that I
> > think we will have to implement in `merge-tree` in one form or
> > another: when users provided merge conflict resolutions via the web
> > UI, we want the merge to succeed. What I implemented was this (in a
> > nutshell, a way to provide file names with associated blob OIDs that
> > resolve the content conflicts):
>
> Interesting.  I'm curious, though -- if they are providing conflict
> resolutions, then would they not have had a previous merge to find out
> what the conflicts were?  If so, wouldn't they be able to provide the
> tree to avoid the need for a second merge?

No, the conflict resolutions are apparently stored independently from the
commits. Probably to allow for the same resolutions to be applied in case
that the PR's target branch changes but leaves the same conflicts.

Again, we are dealing with the realities of a web UI ;-)

BTW I am not saying that the way I implemented it is 100% the best way to
achieve the intended goal. It is quite possible that there is a better
way, or that we need to at least provide the (mode,OID) triplet
corresponding to the conflict, too, so that the merge machinery can verify
that the resolution is applied to the _correct_ conflict ;-)

> > -- snip --
> > Author: Johannes Schindelin <Johannes.Schindelin@xxxxxx>
> > Date:   Thu Dec 16 20:46:35 2021 +0100
> > Subject: merge-ort: allow pre-resolving merge conflicts
> >
> > One of merge-ort's particular strengths is that it works well even in a
> > worktree-less setting, e.g. in a backend for a server-side merge.
> >
> > In such a scenario, it is conceivable that the merge in question not
> > only results in a conflict, but that the caller figures out some sort of
> > resolution before calling the merge again. The second call, of course,
> > is meant to apply the merge conflict resolution.
> >
> > With this commit, we allow just that. The new, optional
> > `pre_resolved_conflicts` field of `struct merge_options` maps paths to
> > the blob OID of the resolution.
> >
> > Signed-off-by: Johannes Schindelin <johannes.schindelin@xxxxxx>
> >
> > diff --git a/merge-ort.c b/merge-ort.c
> > index a74d4251c3..fa59ce2f97 100644
> > --- a/merge-ort.c
> > +++ b/merge-ort.c
> > @@ -3961,6 +3961,7 @@ static void process_entries(struct merge_options *opt,
> >         prefetch_for_content_merges(opt, &plist);
> >         for (entry = &plist.items[plist.nr-1]; entry >= plist.items; --entry) {
> >                 char *path = entry->string;
> > +               struct object_id *resolved_oid;
> >                 /*
> >                  * NOTE: mi may actually be a pointer to a conflict_info, but
> >                  * we have to check mi->clean first to see if it's safe to
> > @@ -3972,7 +3973,17 @@ static void process_entries(struct merge_options *opt,
> >                                           &dir_metadata);
> >                 if (mi->clean)
> >                         record_entry_for_tree(&dir_metadata, path, mi);
> > -               else {
> > +               else if (opt->pre_resolved_conflicts &&
> > +                        (resolved_oid =
> > +                         strmap_get(opt->pre_resolved_conflicts, path))) {
>
> You have a couple problematic assumptions here:
>
>   * What if the user's resolution required fixing a semantic conflict,
> meaning they needed to modify a file that had no conflicts?  Your code
> here would ignore those, because the clean case is handled above.
>
>   * What if the user's resolution required adding a totally new file
> (either because a rename is handled as a separate add/delete, or they
> just needed a new file)?  The loop above isn't over items in
> pre_resolved_conflicts, it just assumes that items in
> pre_resolved_conflicts are also in the plist.items being looped over.

I can see how these assumptions may look ludicrous when coming from the
command-line. But again, we are talking about the realities of a conflict
resolution via a web UI.

So I think that it is out of the question to address non-textual conflicts
in this scenario. And then the assumptions make much more sense.

>
> > +                       mi->clean = 1;
> > +                       mi->is_null = 0;
> > +                       memcpy(&mi->result.oid, resolved_oid,
> > +                              sizeof(*resolved_oid));
>
> And here's another:
>
>   * What if the provided resolution was a deletion of the file in
> question (especially after e.g. a modify/delete or rename/delete
> conflict)?  Don't you need to have a special check for the zero_oid
> here?
>
> And if I combine the things above:
>
>   * What if the user wants to remove a file and add a directory of
> files in its place at some location in order to resolve things?
> Granted, you didn't handle either deletes or new files above, but if
> you did add both, then you might have this issue.  The code at this
> point used some very carefully constructed logic and order of walking
> items specifically to handle file/delete conflicts correctly.  I'm
> worried that your conflict resolution stuff here might interact badly
> with that.
>
> > +                       if (!mi->result.mode)
> > +                               mi->result.mode = 0100644;
>
> How do you know it's not supposed to be 0100755?

I don't ;-)

This was a proof-of-concept, and I meant to look into this a bit further,
trying to figure out from where I could get this information, but I never
got to that.

Now that I think about it, the best solution would probably be for the
mode to be provided in addition to the blob OID, so that the caller has to
decide.

> > +                       record_entry_for_tree(&dir_metadata, path, mi);
> > +               } else {
> >                         struct conflict_info *ci = (struct conflict_info *)mi;
> >                         process_entry(opt, path, ci, &dir_metadata);
> >                 }
> > diff --git a/merge-recursive.h b/merge-recursive.h
> > index 0795a1d3ec..1f45effdd0 100644
> > --- a/merge-recursive.h
> > +++ b/merge-recursive.h
> > @@ -47,6 +47,13 @@ struct merge_options {
> >         const char *subtree_shift;
> >         unsigned renormalize : 1;
> >
> > +       /*
> > +        * (merge-ORT only) If non-NULL, contains a map from `path` to OID. If
> > +        * the given `path would be marked as conflict, it is instead resolved
> > +        * to the specified blob.
> > +        */
> > +       struct strmap *pre_resolved_conflicts;
> > +
> >         /* internal fields used by the implementation */
> >         struct merge_options_internal *priv;
> >  };
> > -- snap --
> >
> > It is a proof-of-concept
>
> Fair enough.
>
> >, therefore it expects the resolved conflicts to
> > be in _files_. I don't think that there is a way to reasonably handle
> > symlink target conflicts or directory/file/symlink conflicts, but there
> > might be.
>
> You really need (mode,oid) pairs to be provided by the user.  That
> fixes the executable issue I mentioned above, and makes it clear how
> to handle symlinks and file/symlink conflicts.

It's a really good point, too.

> directory/file are still handled by providing individual files, but
> ordering traversal becomes really tricky.  The directory/file case
> might even require the pre_resolved_conflicts to be pulled out of that
> loop somehow.  It'd take some investigative work, or some deep
> thought, or both.

My idea for directory/file conflicts is to essentially teach the web UI to
throw its hands in the air upon encountering them, and telling the user
that it's not possible to resolve these types of conflicts via the UI.

And since my principal driver here is the concrete need for such a
server-side merge with conflict resolution, that's as far as I want to
push `merge-tree`, too, and leave any more complicated resolution to a
future date, or never, whichever comes first.

> > A subsequent commit changed my hacky `merge-tree` hack to optionally
> > accept NUL-terminated merge conflict resolutions in <path>/<OID> pairs via
> > stdin (again, avoiding files to be written where we do not _need_ spend
> > I/O unnecessarily).
> >
> > The biggest problem we faced at the Contributor Summit was that our
> > discussion was not informed by the actual server-side needs. So I would
> > like to reiterate my challenge to make that the driver. Let's not use any
> > hypothetical scenario as the basis for the design of `git merge-tree`, but
> > let's use the concrete requirements of actual server-side merges that are
> > currently implemented using libgit2, when trying to figure out what
> > functionality we need from `merge-tree`, and in what shape.
> >
> > Here is an initial list:
> >
> > - need to determine whether a merge will succeed, quickly
> >
> > - need the tree OID for a successful merge
> >
> > - need the list of items that conflict...
>
> I think my version is good up to here.

Yes!

The only thing I would change is to not even bother providing a tree
object in case of conflicts. Either we provide it, and expect the user to
somehow reconstruct the conflict types from there, or we simply don't say
anything about the tree whose (transitive) file contents may or may not
contain conflict markers (or not, think: binary files). Providing a tree
object in case of a failed merge isn't helpful IMO.

> >  , along with OIDs and modes, if the merge failed
>
> Could you clarify what you mean here by OIDs and modes?  For a given
> path, are you just looking for a (pathname, OID, mode) tuple?  (In
> which case, you'd get the OID of a file that potentially has embedded
> conflict markers)
>
> Or are you looking for multiple OIDs and modes for each file?

This. In case of a conflict, I am looking for (mode,OID) for the merge
base (which _can_ be a virtual one in case of recursive merges) as well as
for the two divergent revisions that were supposed to be merged.

I do realize that other types of conflicts can occur, such as a
rename/rename conflict, and we would need a way to represent these in the
output, too. But in such an instance, where no clear (mode,OID) triplet
can be provided that represents the merge conflicts for this file, the
current web UI cannot offer a way to resolve the conflict, so I am a bit
less interested in that scenario.

> If you are looking for multiple OIDs and modes for each file
> (representing where the content came from on the different sides of
> the merge), are you looking for:
>   * the OID and mode of each file that played a part in the merge resolution
> OR
>   * just the OIDs and modes that would have been recorded in the index
> had the merge been done by `git merge`?

I think the latter was my idea.

But again, you made me think of rename/rename conflicts and friends, and
we would need a way to represent those, too. Even if my current use case
would need to only detect their presence in order to say "nope, can't
resolve that on the web".

> (Those last two possibilities are usually the same answer, but no they
> are not always the same.  The index cannot hold all the OIDs and modes
> in various cases and we have to squash them down to three, possibly
> tossing some information that might have been of interest to the user.
> This can even occur when you have a unique merge base.)
>
> >
> > - need a way to provide merge conflict resolutions
> >
> > And now that I wrote all this, I realize I should have sent it to the
> > `merge-tree` thread, not the `merge-tree-ort` thread...
>
> My submission was RFC and you're providing C, so it's all good in my
> book.  I'm reading both threads.  :-)

:-)

Thank you so much!
Dscho