Re: [PATCH 3/3] merge-ort: implement merge_incore_recursive()

Elijah Newren <newren@xxxxxxxxx> · Tue, 15 Dec 2020 20:09:14 -0800

On Tue, Dec 15, 2020 at 6:07 PM Junio C Hamano <gitster@xxxxxxxxx> wrote:
>
> "Elijah Newren via GitGitGadget" <gitgitgadget@xxxxxxxxx> writes:
>
> > +/*
> > + * Originally from merge_recursive_internal(); somewhat adapted, though.
> > + */
> > +static void merge_ort_internal(struct merge_options *opt,
> > +                            struct commit_list *merge_bases,
> > +                            struct commit *h1,
> > +                            struct commit *h2,
> > +                            struct merge_result *result)
> > +{
> > +     struct commit_list *iter;
> > +     struct commit *merged_merge_bases;
> > +     const char *ancestor_name;
> > +     struct strbuf merge_base_abbrev = STRBUF_INIT;
> > +
> > +     if (!merge_bases) {
> > +             merge_bases = get_merge_bases(h1, h2);
> > +             merge_bases = reverse_commit_list(merge_bases);
>
> Do we want to say why we reverse here, or is the reason so well
> known why in the original merge-recursive case?

Oh, good point.  After starting on merge-ort, I shifted back and forth
between it and cleaning up merge-recursive for a while...and it looks
like this is one of the things I forgot to copy over.  The reason was
totally opaque to me until Johannes spelled it out over here:
https://lore.kernel.org/git/nycvar.QRO.7.76.6.1907252055500.21907@xxxxxxxxxxxxxxxxx/

Note that the same reversing of merge bases is done in builtin/merge.c
and sequencer.c as well.  It resulted in me adding the following note
to the declaration of merge_recursive() in merge-recursive.h:

 * NOTE: empirically, about a decade ago it was determined that with more
 *       than two merge bases, optimal behavior was found when the
 *       merge_bases were passed in the order of oldest commit to newest
 *       commit.  Also, merge_bases will be consumed (emptied) so make a
 *       copy if you need it.

but I never copied that comment over to merge_incore_recursive().  I
should do that, and perhaps reference that comment at this point in
the code.

> > +     }
> > +
> > +     merged_merge_bases = pop_commit(&merge_bases);
> > +     if (merged_merge_bases == NULL) {
> > +             /* if there is no common ancestor, use an empty tree */
> > +             struct tree *tree;
> > +
> > +             tree = lookup_tree(opt->repo, opt->repo->hash_algo->empty_tree);
> > +             merged_merge_bases = make_virtual_commit(opt->repo, tree,
> > +                                                      "ancestor");
> > +             ancestor_name = "empty tree";
> > +     } else if (opt->ancestor && !opt->priv->call_depth) {
> > +             ancestor_name = opt->ancestor;
> > +     } else if (merge_bases) {
> > +             ancestor_name = "merged common ancestors";
> > +     } else {
> > +             strbuf_add_unique_abbrev(&merge_base_abbrev,
> > +                                      &merged_merge_bases->object.oid,
> > +                                      DEFAULT_ABBREV);
> > +             ancestor_name = merge_base_abbrev.buf;
> > +     }
>
> So, up to this point we learned:
>
>  - merge bases either given by the caller, or computed from h1 and
>    h2 when the caller just told us to figure it out ourselves.
>
>  - if we have
>
>    - 0 merge base between h1 and h2, in which case we would use an
>      empty tree as an imaginary common
>
>    - 1 merge base between h1 and h2, in which case the common
>      ancestor of the resuting merge between h1 and h2 is that single
>      merge base
>
>    - 2 or more bases, in which case we'd use that would eventually
>      come back when we merged recursively all bases.
>
> and the primary products of the above procedure are
>
>  - ancestor_name (the string used when presenting conflicts while
>    merging h1 and h2)
>
>  - merged_merge_bases (one starting commit among the merge bases)
>
> And then the loop will iterate over the remaining merge bases,
> merging one popped from it in the current merged_merge_bases,
> until we run out.  At that point when we leave the loop, we'd
> have merged_merge_bases that is a virtual commit to be used as
> a single merge base to use while merging trees of h1 and h2.
>
> > +     for (iter = merge_bases; iter; iter = iter->next) {
> > +             const char *saved_b1, *saved_b2;
> > +             struct commit *prev = merged_merge_bases;
> > +
> > +             opt->priv->call_depth++;
> > +             /*
> > +              * When the merge fails, the result contains files
> > +              * with conflict markers. The cleanness flag is
> > +              * ignored (unless indicating an error), it was never
> > +              * actually used, as result of merge_trees has always
> > +              * overwritten it: the committed "conflicts" were
> > +              * already resolved.
> > +              */
> > +             saved_b1 = opt->branch1;
> > +             saved_b2 = opt->branch2;
> > +             opt->branch1 = "Temporary merge branch 1";
> > +             opt->branch2 = "Temporary merge branch 2";
> > +             merge_ort_internal(opt, NULL, prev, iter->item, result);
> > +             if (result->clean < 0)
> > +                     return;
> > +             opt->branch1 = saved_b1;
> > +             opt->branch2 = saved_b2;
> > +             opt->priv->call_depth--;
> > +
> > +             merged_merge_bases = make_virtual_commit(opt->repo,
> > +                                                      result->tree,
> > +                                                      "merged tree");
> > +             commit_list_insert(prev, &merged_merge_bases->parents);
> > +             commit_list_insert(iter->item,
> > +                                &merged_merge_bases->parents->next);
>
> We need to record these parents because...?  When merged_merge_bases
> we just created is used as one side of a merge in the next iteration,
> we'd need to compute the merge base between it and the one we'd pop
> out of merge_bases, and that is why.
>
> > +             clear_or_reinit_internal_opts(opt->priv, 1);
> > +     }
>
> OK.  I think I understood this loop.  It looks mostly straight-forward.
>
> > +     opt->ancestor = ancestor_name;
>
> And the label to be used, that was computed before the above loop,
> is used here...
>
> > +     merge_ort_nonrecursive_internal(opt,
> > +                                     repo_get_commit_tree(opt->repo,
> > +                                                          merged_merge_bases),
> > +                                     repo_get_commit_tree(opt->repo, h1),
> > +                                     repo_get_commit_tree(opt->repo, h2),
> > +                                     result);
>
> ... to finally compute the 3-way merge between h1 and h2.
>
> > +     strbuf_release(&merge_base_abbrev);
>
> And the storage that may have been holding the .ancestor name is
> cleared, as we no longer need it.
>
> > +     opt->ancestor = NULL;  /* avoid accidental re-use of opt->ancestor */
> > +}
> > +
> >  void merge_incore_nonrecursive(struct merge_options *opt,
> >                              struct tree *merge_base,
> >                              struct tree *side1,
> > @@ -1493,7 +1577,9 @@ void merge_incore_recursive(struct merge_options *opt,
> >                           struct commit *side2,
> >                           struct merge_result *result)
> >  {
> > -     (void)reverse_commit_list;
> > -     (void)make_virtual_commit;
> > -     die("Not yet implemented");
> > +     assert(opt->ancestor == NULL ||
> > +            !strcmp(opt->ancestor, "constructed merge base"));
>
> Where does this string come from?  The recursive backend does
> asssign a fixed string with that value to opt->ancestor, but we
> don't expect that string to come here, no?

It's specifically the merge_recursive_generic() function from
merge-recursive.c that sets this, which was part of the
merge-recursive API.  merge-ort does not (yet?) have an equivalent
function (anything calling merge_recursive_generic() just can't use
merge-ort right now -- a list that includes 'am', 'stash', and
'merge-recursive').  For now, I am just letting those callers continue
to use merge-recursive.c.  I never figured out if I wanted to make
that function part of merge-ort's API, whether I just wanted to add a
wrapper to merge-ort-wrappers.[ch] for it, or if I should rewrite the
callers to do something else.

However, looking a little closer, the name for opt->ancestor is
slightly phony -- I think it only makes sense for "am", not for either
of "stash" or "merge-recursive".  Perhaps I should instead count the
number of merge_bases, and assert that either opt->ancestor == NULL
(exclusive)OR  num_merge_bases == 1.  merge_recursive_generic() should
also be made to stop setting opt->ancestor, and then callers like
"am", "stash", and "merge-recursive" should be responsible to provide
a reasonable ancestor name for merge.conflictStyle=diff3 to use when
it's clear they are providing the sole ancestor.

Then at some point I can decide what to do with
merge_recursive_generic().  I'll probably just make it a wrapper at
some point; since that lets me kick the can down the road even
further.  :-)

> > +     merge_start(opt, result);
> > +     merge_ort_internal(opt, merge_bases, side1, side2, result);
> >  }