Re: [PATCH v2 1/4] merge-ort: barebones API of new merge strategy with empty implementation

Elijah Newren <newren@xxxxxxxxx> · Mon, 26 Oct 2020 14:18:46 -0700

On Mon, Oct 26, 2020 at 1:45 PM Junio C Hamano <gitster@xxxxxxxxx> wrote:
>
> "Elijah Newren via GitGitGadget" <gitgitgadget@xxxxxxxxx> writes:
>
> > + *   git merge [-s recursive]
> > + *
> > + * with
> > + *
> > + *   git merge -s ort
> > + *
> > + * Note: git's parser allows the space between '-s' and its argument to be
> > + * missing.  (Should I have backronymed "ham", "alsa", "kip", "nap, "alvo",
> > + * "cale", "peedy", or "ins" instead of "ort"?)
>
> One thing that is quite unpleasant is "git grep ort" gives us too
> many hits already, and it will be hard to locate ort related changes
> with "git log --grep=ort", as the name is too short to serve as an
> effective way to limit the search.

Suggestions for an alternative name?  merge-pandemic.c since it was
mostly written during the pandemic?

I'm really not good at naming things...

> > diff --git a/merge-ort.h b/merge-ort.h
> > new file mode 100644
> > index 0000000000..47d30cf538
> > --- /dev/null
> > +++ b/merge-ort.h
> > @@ -0,0 +1,49 @@
> > +#ifndef MERGE_ORT_H
> > +#define MERGE_ORT_H
> > +
> > +#include "merge-recursive.h"
> > +
> > +struct commit;
> > +struct tree;
> > +
> > +struct merge_result {
> > +     /* whether the merge is clean */
> > +     int clean;
> > +
> > +     /* Result of merge.  If !clean, represents what would go in worktree */
> > +     struct tree *tree;
>
> Curious.  Because there is no way for "struct tree" to hold an
> in-core pointer to a "struct blob" (iow, for a blob to be in a
> "struct tree", it has to have been assigned an object name), unless
> we are using the "pretend" mechanism, which has its own downsides,
> we are committed to create a throw-away blob objects with conflict
> markers in them, and write them to the object store.

This is something merge-recursive already does (and I've copied some
of that code over, around merge_3way() and the call to
write_object_file() with the results).  I thought the reasoning behind
this was memory -- we're okay assuming any given file fits in memory
(and perhaps up to three copies of it so we can do a three-way merge),
but we're not okay assuming all (changed) files from a commit
simultaneously fit in memory.

> If we were writing a new merge machinery from scratch, I would have
> preferred a truly in-core implementation that does not have to write
> out to the object store but if this makes the implementation simpler,
> perhaps it is a small enough price to pay.

I thought about that early on, but I was worried about out-of-memory
situations if we attempt to do truly in-memory, at least for large
changes in large repositories.

And as you have seen above, I do rely on being able to create trees.

> > +     /*
> > +      * Additional metadata used by merge_switch_to_result() or future calls
> > +      * to merge_inmemory_*().  Not for external use.
> > +      */
> > +     void *priv;
> > +     unsigned ate;
>
> I'd prefer to see this named not so cute.  Will we hang random
> variations of things, or would this be better to be made into a
> pointer to union, with an enum that tells us which kind it is in
> use?

I don't understand the union suggestion.  Both fields are used.

Would you object if 'ate' was named '_'?  That was my original name,
but Taylor didn't like it.  It is used on about 4 lines of code, I'm
99.9% sure it will never be used in additional locations, and callers
shouldn't mess with it.  I just don't have a good name for it.  I
guess maybe I should just call it "properly_initialized" or something.

> > +};
>
>
> > +/* rename-detecting three-way merge with recursive ancestor consolidation. */
> > +void merge_inmemory_recursive(struct merge_options *opt,
> > +                           struct commit_list *merge_bases,
> > +                           struct commit *side1,
> > +                           struct commit *side2,
> > +                           struct merge_result *result);
>
> I've seen "incore" spelled as a squashed-into-a-single-word, but not
> "in_memory".

I can add an underscore.  Or switch to incore.  Preference?

> > +/* rename-detecting three-way merge, no recursion. */
> > +void merge_inmemory_nonrecursive(struct merge_options *opt,
> > +                              struct tree *merge_base,
> > +                              struct tree *side1,
> > +                              struct tree *side2,
> > +                              struct merge_result *result);
> > +
> > +/* Update the working tree and index from head to result after inmemory merge */
> > +void merge_switch_to_result(struct merge_options *opt,
> > +                         struct tree *head,
> > +                         struct merge_result *result,
> > +                         int update_worktree_and_index,
> > +                         int display_update_msgs);
>
> To those who have known how our merge works, a natural expectation
> for an "in-core" merge is that when the "in-core" merge finishes,
> the index would hold the higher stages for the conflicted paths, and
> cleanly merged paths would have the result at stage 0, and there is
> an extra thing that we haven't had that represents what the working
> tree files for conflicted paths should look like (historically we
> wrote out the conflicted result to the working tree files---being
> in-core operation we cannot afford to), so that (1) cleanly merged
> paths can be externalized by writing from their stage 0 entries and
> (2) contents with conflicts can be externalized by that "extra
> thing".
>
> But this helper says "working tree and index" are both updated, so
> the "in-core" merge it expects must have not just the working tree
> result (in result->tree, as the comment in the structure says) but
> also how the higher stages of the index should look like somewhere
> in the result structure.  How the latter is done is not at all clear
> at this point in the mock-up.  Leaving it opaque is fine, but the
> function, and the result structure, deserve clarification to avoid
> confusing readers by highlighting how it is different from the
> traditional ways (e.g. "we don't touch the index at all---instead we
> store that in the priv/ate fields", if that is what is going on).

Yes, your reading is correct.  We don't touch the index (or any index,
or any cache_entry) at all.  Among other things, data that can be used
to update the index are in the "priv" field.

I'll try to add some notes to the file.