Re: [PATCH 7/8] merge-tree: support saving merge messages to a separate file

Elijah Newren <newren@xxxxxxxxx> · Mon, 3 Jan 2022 11:46:53 -0800

On Mon, Jan 3, 2022 at 9:23 AM Fabian Stelzer <fs@xxxxxxxxxxxx> wrote:
>
> On 03.01.2022 08:51, Elijah Newren wrote:
> >On Mon, Jan 3, 2022 at 4:31 AM Fabian Stelzer <fs@xxxxxxxxxxxx> wrote:
> >>
> >> On 31.12.2021 05:04, Elijah Newren via GitGitGadget wrote:
> >> >From: Elijah Newren <newren@xxxxxxxxx>
> [...]
> >> >
> >> > static int real_merge(struct merge_tree_options *o,
> >> >@@ -442,8 +443,15 @@ static int real_merge(struct merge_tree_options *o,
> >> >        */
> >> >
> >> >       merge_incore_recursive(&opt, merge_bases, parent1, parent2, &result);
> >> >+
> >> >+      if (o->messages_file) {
> >> >+              FILE *fp = xfopen(o->messages_file, "w");
> >> >+              merge_display_update_messages(&opt, &result, fp);
> >> >+              fclose(fp);
> >>
> >> I don't know enough about how merge-ort works internally, but it looks to me
> >> like at this point the merge already happened and we just didn't clean up
> >> (finalize) yet. It feels wrong to die() at this point just because we can't
> >> open messages_file.
> >
> >Yes, the merge already happened; there now exists a new toplevel tree
> >(that nothing references).  I'm not sure I understand what's wrong
> >with die'ing here, though.  I can't tell if you want to defer the
> >die-ing until later, or just avoid the die-ing and return some kind of
> >success despite failing to complete what the user requested.
> >
>
> I think I would prefer the merge operation to abort before actually merging
> when it is not able to write its logfile. Otherwise we possibly do a whole
> lot of work that's inaccessible afterwards, don't we? (since we don't print
> the hash)

I see where you're coming from, but I don't think this is worth
worrying about, for two reasons:

(1) I'm not sure I buy the "whole lot of work" concern.

merge-ort is pretty snappy.  For a simple example of rebasing a single
patch in linux.git across a branch with 28000 renames, I get 176
milliseconds for merge_incore_nonrecursive().  Granted, linux.git is
pretty small in terms of number of files, but Stolee did some
measurements a while back on the Microsoft repos with millions of
files at HEAD.  For those, for a trivial merge he saw
merge_incore_recursive() complete in 2 milliseconds, and for a trivial
rebase he saw merge_incore_nonrecursive() complete in 4 milliseconds
(See https://lore.kernel.org/git/CABPp-BHO7bZ3H7A=E9TudhvBoNfwPvRiDMm8S9kq3mYeSXrpXw@xxxxxxxxxxxxxx/).
So huge numbers of files pose much less of a problem than lots of
interesting work like renames, and merge-ort is pretty fast in either
case.  Sure, if we were talking about traditional merge-recursive,
which would have taken 150000 milliseconds on the same single-patch
linux.git testcase (due to the 28000 renames), then we might worry
more about not letting work get tossed.  But at only 176 milliseconds
even with a crazy number of renames, it's just not worth worrying
about.

(2) Even if there is a lot of computation, I don't see why this error
path merits extra coding work to salvage the computation somehow.

By way of comparison, a regular `git merge` will abort after
completing the same amount of merge work (i.e. after creating a new
tree) when the user has a dirty working tree involving a path that
would need to be updated by the merge operation.  And that is not a
bug; it's a requirement -- we cannot first check if the user has
dirtied such a path before performing the merge because it's
impossible to do so accurately in the face of renames.
merge-recursive tried to do that, and its up-front check produced both
false positives (aborting when nothing was in the way) and false
negatives (not aborting when something was).  It was impossible to fix
the false-positives and
false-negatives without either (a) disallowing ever doing a merge with
a dirty working tree under any conditions (a backwards compatibility
break), or (b) waiting to do the notification of
dirty-files-in-the-way until after the merge tree has been computed.
I wasn't about to break that feature, so merge-ort had to delay error
notifications instead.
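
To make the ordering constraint concrete, here is a minimal sketch.
It assumes a toy model, not git's real data structures: the struct
toy_merge_result and first_dirty_path_in_the_way() are hypothetical
names invented for illustration.  The point is only that the check
takes the merge result as an input, so it cannot run before the merge.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical toy model; these are NOT git's real data structures. */
struct toy_merge_result {
	const char **updated_paths; /* paths the merged tree would rewrite */
	size_t nr_updated;
};

/*
 * Return the first dirty path that the merge would overwrite, or NULL
 * if no dirty path is in the way.  The set of updated paths is only
 * known once the merge (including rename detection) has been computed,
 * which is why this check has to run afterwards.
 */
static const char *first_dirty_path_in_the_way(const struct toy_merge_result *res,
					       const char **dirty, size_t nr_dirty)
{
	assert(res != NULL);
	for (size_t i = 0; i < res->nr_updated; i++)
		for (size_t j = 0; j < nr_dirty; j++)
			if (!strcmp(res->updated_paths[i], dirty[j]))
				return dirty[j];
	return NULL;
}
```

A caller would compute the merge first, then abort only if this
returns non-NULL.  A dirty file at a rename destination is caught
because the destination path shows up in updated_paths.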

Now, the dirty-file-in-the-way condition is for a very common case
(either for users who intentionally like keeping dirty changes around
and doing merges but the branch they are merging happens to touch a
file they didn't know about, or users who just forgot that they had
local modifications).  In contrast, this case here is for when we
cannot open a file for writing -- with the filename explicitly just
specified by the user.
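
For reference, the two orderings under discussion can be sketched as
below.  This is only an illustration under stated assumptions:
toy_merge(), merge_then_open() and open_then_merge() are hypothetical
stand-ins, and plain fopen() with an error return is used here, whereas
the patch uses xfopen(), which dies on failure.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-in for the merge computation; counts its runs. */
static int toy_merges_run;
static void toy_merge(void) { toy_merges_run++; }

/* Write placeholder messages and close; returns 0 on success, -1 on error. */
static int write_and_close(FILE *fp)
{
	assert(fp != NULL);
	int write_failed = fputs("toy merge messages\n", fp) == EOF;
	int close_failed = fclose(fp) != 0;
	return (write_failed || close_failed) ? -1 : 0;
}

/* Ordering in the patch: merge first, then open the messages file. */
static int merge_then_open(const char *messages_file)
{
	toy_merge();
	FILE *fp = fopen(messages_file, "w");
	if (!fp)
		return -1; /* the merge work has already been done */
	return write_and_close(fp);
}

/* Suggested alternative: open the file first, merge only on success. */
static int open_then_merge(const char *messages_file)
{
	FILE *fp = fopen(messages_file, "w");
	if (!fp)
		return -1; /* no merge work was done */
	toy_merge();
	return write_and_close(fp);
}
```

Both variants report the same error for an unwritable path; they
differ only in whether the merge ran before the failure was noticed.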

So, I'd rather keep the code nice and simple as it currently stands.

> Thanks for your work on this feature. I think this could open a lot of new
> possibilities.

I hope people do interesting things with it, and with the server-side
commit replaying I'm working on as well.