Re: [PATCH v2 5/5] merge-ort: add prefetching for content merges

Elijah Newren <newren@xxxxxxxxx> · Tue, 22 Jun 2021 01:02:38 -0700

On Wed, Jun 16, 2021 at 10:04 PM Junio C Hamano <gitster@xxxxxxxxx> wrote:
>
> "Elijah Newren via GitGitGadget" <gitgitgadget@xxxxxxxxx> writes:
>
> > +             /* Ignore clean entries */
> > +             if (ci->merged.clean)
> > +                     continue;
> > +
> > +             /* Ignore entries that don't need a content merge */
> > +             if (ci->match_mask || ci->filemask < 6 ||
> > +                 !S_ISREG(ci->stages[1].mode) ||
> > +                 !S_ISREG(ci->stages[2].mode) ||
> > +                 oideq(&ci->stages[1].oid, &ci->stages[2].oid))
> > +                     continue;
> > +
> > +             /* Also don't need content merge if base matches either side */
> > +             if (ci->filemask == 7 &&
> > +                 S_ISREG(ci->stages[0].mode) &&
> > +                 (oideq(&ci->stages[0].oid, &ci->stages[1].oid) ||
> > +                  oideq(&ci->stages[0].oid, &ci->stages[2].oid)))
> > +                     continue;
>
> Even though this is unlikely to change, it is unsatisfactory that we
> reproduce the knowledge on the situations when a merge will
> trivially resolve and when it will need to go content level.

I agree, it's not the nicest.

> One obvious way to solve it would be to fold this logic into the
> main code that actually merges a list of "ci"s by making it a two
> pass process (the first pass does essentially the same as this new
> function, the second pass does the tree-level merge where the above
> says "continue", fills mmfiles with the loop below, and calls into
> ll_merge() after the loop to merge), but the logic duplication is
> not too big and it may not be worth such a code churn.

I'm worried even more about the resulting complexity than the code
churn.  The two-pass model, which I considered, would require
special-casing so many of the branches of process_entry() that it
feels like it would increase code complexity more than introducing a
function with a few duplicated checks does.  Stolee already reported
in earlier rounds of review that process_entry() came across as pretty
complex to him, but that seems intrinsic given the number of special
cases it handles: entries with D/F conflicts, different file types,
match_mask being precomputed, recursive vs. normal cases,
modify/delete, normalization, added on one side, deleted on both
sides, and three-way content merges.  The three-way content merges are
just one of 9-ish different branches, and are the only one that we're
prefetching for.  It just seems easier and cleaner overall to add
these three checks to pick off the cases that will end up going
through the three-way content merges.
I've looked at it again a couple of times over the past few days based
on your comment, but I still can't see a way to restructure it that
feels cleaner than what I've currently got.

Also, it may be worth noting here that if these checks fell out of
date with process_entry() in some manner, it still would not affect
the correctness of the code.  At worst, it would only affect whether
too few or too many objects are prefetched.  If too many, then some
extra objects would be downloaded, and if too few, then we'd end up
fetching the remaining objects one by one on demand later.

So I'm going to agree with the not-worth-it portion of your final
sentence and leave this out of the next roll.