Re: [PATCH v2 01/24] builtin/pack-objects: make hash agnostic

Jeff King <peff@xxxxxxxx> · Sun, 23 Feb 2020 22:01:42 -0500

On Sun, Feb 23, 2020 at 04:57:59PM -0500, Jeff King wrote:

> It would be nice to get rid of [nth_packed_object_sha1] entirely. In most cases,
> it's either free to do so (we end up copying the result into an oid
> anyway) or we pay for one extra hashcpy (out of the mmap into a local
> struct). But the one in pack-check.c:verify_packfile() is tricky; we
> store a pointer per object into the index mmap, and we'd have to switch
> to storing an oid per object. Given that the code isn't commonly run
> (and other operations like _generating_ the index in the first place are
> clearly OK with the extra memory hit), I think I'd be OK with that in
> the name of cleanliness.
> 
> All of that is sort of orthogonal to your goals, I think, so I don't
> mind at all calling it out of scope for your series. As long as we use
> the_hash_algo->rawsz, these bare pointer accesses aren't wrong (it's
> just that it's easy to accidentally get it wrong, as this code shows).

I looked into this a bit further. It turns out that the current code in
verify_packfile() was explicitly trying to avoid that extra memory. But
the good news is that I think we can improve it further, cleaning up the
existing type-punning and using even less memory than now.

I'll prepare a separate series to do that. I had thought the cleanup
might also make this whole 20 / the_hash_algo->rawsz thing go away, but
it doesn't (the name hashwrite() made me think we ought to be using an
oidwrite(), but of course that's silly; the "hash" here is a "file with
a hash checksum" and not "an object id hash"). So the two topics really
are independent.

-Peff