Re: [Summit topic] Server-side merge/rebase: needs and wants?

Ævar Arnfjörð Bjarmason <avarab@xxxxxxxxx> · Tue, 09 Nov 2021 03:15:50 +0100

On Mon, Nov 08 2021, Taylor Blau wrote:

> I was discussing this with Elijah today in IRC. I thought that I sent
> the following message to the list, but somehow dropped it from the CC
> list, and only sent it to Elijah and Johannes.
>
> Here it is in its entirety, this time copying the list.
>
> On Thu, Oct 21, 2021 at 01:56:06PM +0200, Johannes Schindelin wrote:
>>  5.  The challenge is not necessarily the technical challenges, but the UX for
>>      server tools that live “above” the git executable.
>>
>>      1. What kind of output is needed? Machine-readable error messages?
>>
>>      2. What Git objects must be created: a tree? A commit?
>>
>>      3. How to handle, report, and store conflicts? Index is not typically
>>         available on the server.
>
> I looked a little bit more into what GitHub would need in order to make
> the switch. For background, we currently perform merges and rebases
> using libgit2 as the backend, for the obvious reason which is that in a
> pre-ORT world we could not write an intermediate result without having
> an index around.
>
> (As a fun aside, we used to expand our bare copy of a repository into a
> temporary working directory, perform the merge there, and then delete
> the directory. We definitely don't do that anymore ;)).
>
> It looks like our current libgit2 usage more-or-less returns an
> (object_id, list<file>) tuple, where:
>
>   - a non-NULL object_id is the result of a successful (i.e.,
>     conflict-free) merge; specifically the oid of the resulting root
>     tree
>
>   - a NULL object_id and a non-empty list of files indicates that the
>     merge could not be completed without manual conflict resolution, and
>     the list of files indicates where the conflicts were
>
> When we try to process a conflicted merge, we display the list of files
> where conflicts were present in the web UI. We do have a UI to resolve
> conflicts, but we populate the contents of that UI by telling libgit2 to
> perform the same merge on *just that file*, and writing out the file
> with its conflict markers as the result (and sending that result out to
> a web editor).
>
> So I think an ORT-powered server-side merge would have to be able to:
>
>   - write out the contents of a merge (with a tree, not a commit), and
>     indicate whether or not that merge was successful with an exit code
>
>   - write out the list of files that had conflicts upon failure
>
> Given my limited knowledge of the ORT implementation, it seems like
> writing out the conflicts themselves would be pretty easy. But GitHub
> probably wouldn't use it, or at least not immediately, since we rely
> heavily on being able to recreate the conflicts file-by-file as they are
> needed.
>
> Anyway, I happened to be looking into all of this during the summit, but
> never wrote any of it down. So I figured that this might be helpful in
> case folks are interested in pursuing this further. If so, let me know
> if there are any other questions about what GitHub might want on the
> backend, and I'll try to answer as best I can.

That's very informative, thanks.

Not that "ort" won't be much better at this, but FWIW git-merge-tree
sort of gets most of the way-ish to what you're describing already in
terms of a command interface.

I.e. I'm not the first or last to have implemented (not for anything
serious) a dry-run bare-repo merge with something like:

    git merge-tree origin/master git-for-windows/main origin/seen >diff
    # Better regex needed, but basically this
    grep "^+<<<<<<< \.our$" diff && conflict=t

So with some parsing of that command output you can get a diff with one
side or the other applied.
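
A minimal sketch of that parsing, assuming the legacy merge-tree output
where each conflict region shows up as an added line reading
"<<<<<<< .our" (the marker from the grep above). count_conflicts is a
made-up helper name, not anything git ships:

```shell
# count_conflicts: read `git merge-tree` output on stdin and print how
# many conflict regions it contains (hypothetical helper, assumes the
# "+<<<<<<< .our" marker shown in the grep above).
count_conflicts () {
	# grep -c prints 0 and exits 1 when nothing matches; treat that
	# as success so callers only see real grep errors as failures.
	grep -c '^+<<<<<<< \.our$' || [ $? -eq 1 ]
}

# Usage sketch, with the same refs as the example above:
#   n=$(git merge-tree origin/master git-for-windows/main origin/seen |
#       count_conflicts)
#   [ "$n" -eq 0 ] || conflict=t
```

Counting instead of a bare yes/no costs nothing and gives a caller a
cheap number to report, but it still says nothing about which paths
conflicted.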

From there it's a matter of applying the patch, and from there you'd get
blobs/trees, which is painful with just a diff & no index, so this kind
of basic usage is a common use-case for libgit2.

But to the extent that we were talking about plumbing interfaces,
wouldn't basically a git-merge-tree on steroids (or extension thereof)
do, i.e.:

 * Ask it to merge X heads, returns whether it worked or not
 * ... and can return a diff with conflict markers like this
 * ... for just some <pathspec>
 * ... maybe with the conflict already "resolved" one way or the other?
 * ... optionally, after some markers write one/both sides, spew out the
   relevant tree/blob OIDs
 * ... which again, could be limited by the <pathspec> above.

I'm thinking of something that basically works like
git for-each-ref --format="". So:

    git merge-tree --format="..." <heads> -- <pathspec>

Where that <format> can be custom \0-delimited (or whatever) sections of
payload that could have whatever combination of the above you'd need. I
think git-for-each-ref is probably the best example we've got of a
plumbing interface in this category, i.e. being able to extract
arbitrary payloads via format specifiers & "path" (well, ref)
limitation.
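
For reference, this is roughly what the existing for-each-ref side of
that analogy looks like. The sketch below only uses for-each-ref
itself; refs_matching is a made-up wrapper name, and nothing here
implies that merge-tree accepts a --format option today:

```shell
# refs_matching: print "<oid> <type> <refname>" for every ref matching
# the pattern in $1 (hypothetical wrapper around the real plumbing).
refs_matching () {
	# %(objectname), %(objecttype) and %(refname) are for-each-ref
	# format atoms; the trailing pattern limits which refs are shown.
	git for-each-ref \
		--format='%(objectname) %(objecttype) %(refname)' "$1"
}

# Usage sketch:
#   refs_matching refs/heads
```

The point is that the caller picks the fields and the limiting pattern,
and the command only emits that payload.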

Elijah probably has much better ideas already, I'm just spitballing. 

But if something like that worked it would be mostly a matter of
stealing code from for-each-ref and the like, and then <handwaving>
mapping that to ORT callbacks somehow.

And then it could even learn a --batch mode, which with those formats
could allow calling it without paying the price for command
re-invocation, something like the update-ref/proposed cat-file interface
discussed in another thread at [1].

1. https://lore.kernel.org/git/211106.86k0hmgc8q.gmgdl@xxxxxxxxxxxxxxxxxxx/