Re: [PATCH 1/2] diff: Fix modified lines stats with --stat and --numstat

Jeff King <peff@xxxxxxxx> · Fri, 18 Sep 2020 13:37:39 -0400

On Fri, Sep 18, 2020 at 11:10:45AM -0400, Thomas Guyot-Sionnest wrote:

> > So, my last point is the most important of the three. I'd expect
> > something more along the lines of:
> >
> >   1. diff_fill_oid_info resolve the link to the pipe, and
> >   2. index_path handles the resolved fd.
> >
> > ...but it looks like that is already what it's doing? I'm confused why
> > this doesn't work as-is.
> 
> So the idea is to checksum the data and write a valid oid. I'll see if
> I can figure that out. Thanks for the hint though else I would likely
> have gone with a buffer and memcmp. Your solution seems cleaner, and
> there is a few other uses of oideq's that look dubious at best with
> the case of null oids / buffered data so it's definitely a better
> approach.

You're generally better off not to compute the oid just to compare two
buffers:

  - a byte-by-byte comparison can quit early as soon as it sees a
    difference, whereas computing two hashes has to cover each byte

  - even in the worst case that the byte comparison has to go all the
    way to the end, it's way faster than computing a sha1

So generally in the diff code we compare oids if we got them for free
(from the index, or from a tree), but otherwise it's OK to have
filespecs without the oid_valid flag set, and compare them byte-wise
when necessary. And there something like:

  if (one->size == two->size &&
      !memcmp(one->data, two->data, one->size))

is what you'd want.

Note that filespecs may not have their data or size loaded yet, though.
Looking at that part of builtin_diffstat(), I'm pretty sure that is
possible here (see how later code calls diff_populate_filespec() to make
sure it has data). OTOH, I guess if they're from stdin we'd always have
the data (since we'd have no oid to load from), so it might be OK under
that conditional.

-Peff