Re: [RFC PATCH] diff: only prefetch for certain output formats

Jeff King <peff@xxxxxxxx> · Thu, 30 Jan 2020 19:14:16 -0500

On Thu, Jan 30, 2020 at 03:20:02PM -0800, Jonathan Tan wrote:

> > +	/*
> > +	 * At this point we know there's actual work to do: we have rename
> > +	 * destinations that didn't find an exact match, and we have potential
> > +	 * sources. So we'll have to do inexact rename detection, which
> > +	 * requires looking at the blobs. It's worth pre-fetching them as a
> > +	 * group now.
> > +	 */
> > +	for (i = 0; i < rename_dst_nr; i++) {
> [...]
> 
> And also the equivalent code in diffcore_break() and in diffcore_std()
> after both these functions are invoked (in case nothing got prefetched,
> but the diff still requires blobs).

I think diffcore_break() would probably be OK to just pre-fetch
everything if it's enabled, since it has to look at the content of all
modifications. Though I suppose _technically_ added/deleted entries do
not get looked at, I doubt anybody would care in practice since the
primary use is to then feed all of the pairs into the rename code.

The diffcore_std() logic would be similar to what you wrote earlier
based on the formats. I think you'd want it to come first, before
diffcore_rename(), because it fetches a superset of what rename
detection needs (if it fetches anything at all). I.e., for "diff -M -p",
you'd want:

  1. diffcore_std() sees "-p" and fetches everything

  2. diffcore_rename() sees there's nothing we don't already have

rather than:

  1. diffcore_rename() fetches a few blobs to do rename detection

  2. diffcore_std() fetches a few more blobs that weren't rename
     candidates but are needed for "-p"
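
To make the ordering argument concrete, here is a toy model (not git
code). Everything in it is made up for illustration: toy_repo,
toy_prefetch(), toy_count_batches(), and the blob numbering are all
hypothetical, and it assumes a prefetch sends one batched request only
when at least one wanted blob is missing:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NR_BLOBS 6

/* Hypothetical stand-in for a partial clone's local object store. */
struct toy_repo {
	bool present[NR_BLOBS];	/* is this blob already available locally? */
	int batches;		/* number of batched fetch requests issued */
};

/*
 * Fetch all missing blobs from "want" as a single batch. If nothing
 * is missing, no request is made at all.
 */
static void toy_prefetch(struct toy_repo *r, const int *want, size_t nr)
{
	bool missing = false;
	size_t i;

	for (i = 0; i < nr; i++) {
		assert(want[i] >= 0 && want[i] < NR_BLOBS);
		if (!r->present[want[i]]) {
			r->present[want[i]] = true;
			missing = true;
		}
	}
	if (missing)
		r->batches++;
}

/* Blobs that inexact rename detection would read (made-up subset). */
static const int rename_candidates[] = { 1, 4 };
/* Blobs that "-p" output would read: every blob in the diff. */
static const int patch_blobs[] = { 0, 1, 2, 3, 4, 5 };

/* Count batches for "diff -M -p" under either ordering. */
int toy_count_batches(bool std_first)
{
	struct toy_repo r = { { false }, 0 };

	if (std_first) {
		toy_prefetch(&r, patch_blobs, 6);	/* the superset */
		toy_prefetch(&r, rename_candidates, 2);	/* nothing left */
	} else {
		toy_prefetch(&r, rename_candidates, 2);	/* a few blobs */
		toy_prefetch(&r, patch_blobs, 6);	/* the rest */
	}
	return r.batches;
}
```

In this model toy_count_batches(true) issues one batch and
toy_count_batches(false) issues two, which is the difference between
the two orderings above.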

-Peff